How Do You Delete Web Pages?
If you haven’t figured it out yet, this post is part of a series continuing from where I left off back in November, after I deleted over 140 blog posts.
With that being said, how do you delete your web pages or blog posts?
I’m a firm believer that deleting or removing web pages isn’t as simple as deleting them from the web server and walking away. A few minor adjustments need to be made so that both people and bots know exactly what has happened to the content.
One way of letting the search bots know what we’ve done with certain content is to make use of HTTP status codes such as:
- 200 OK
- 301 Moved Permanently
- 302 Found
- 304 Not Modified
- 307 Temporary Redirect
- 400 Bad Request
- 401 Unauthorized
- 403 Forbidden
- 404 Not Found
- 410 Gone
- 500 Internal Server Error
- 501 Not Implemented
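If you script any of your own checks in Python, the standard library’s `http.HTTPStatus` enum carries the same codes and reason phrases as the list above. A small sketch (the `describe` helper is just an illustration, not part of any library):

```python
from http import HTTPStatus

# The status codes listed above, as plain integers.
CODES = [200, 301, 302, 304, 307, 400, 401, 403, 404, 410, 500, 501]


def describe(code: int) -> str:
    """Return a label such as '410 Gone' using the stdlib HTTPStatus enum."""
    status = HTTPStatus(code)
    return f"{status.value} {status.phrase}"


if __name__ == "__main__":
    # Prints one "code phrase" line per entry in CODES.
    for code in CODES:
        print(describe(code))
```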
People use all sorts of wacky configurations when a web page is deleted from the server, but the correct status code to serve would be 410 Gone. This is debatable and not widely practised (more on this later).
Never Use 301 Moved Permanently On Deleted Pages Or Error Pages
It’s very common practice for people to redirect their error pages back to their home page. Bad idea, and here’s why:
- It’s a misuse of HTTP status codes
- It tells bots content has moved when it actually hasn’t (confusing them?)
- Confused bots can do horrible things
- It could confuse people about why the page they requested has redirected to the home page
OK, so that’s some exaggeration, but hopefully you get my point about why using a 301 redirect on an error page could be a bad idea.
Make Sure Not To Serve 200 OK Status Code For Deleted Content
As the name says, 200 OK is the code for content being found on the web server. One common mistake is failing to configure error pages correctly, so that when a deleted page is requested, the error page comes back with a 200 OK. This tells bots the content is still there when it’s not. One way to check is to use a tool such as HTTP Status Codes Checker and see what status your error page returns by entering a made-up URL on your domain. There is plenty of information online about this issue; I particularly like this GSiteCrawler article.
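If you’d rather run the check yourself, here is a rough Python sketch using only the standard library. It requests a URL that shouldn’t exist, deliberately does not follow redirects (so a 301 back to the home page shows up as what it is), and labels the result. The example URL and the `diagnose` wording are assumptions for illustration; swap in your own domain.

```python
import urllib.error
import urllib.request
from typing import Optional, Tuple


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop urllib from following redirects so the raw 3xx response is visible."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # returning None makes urllib raise HTTPError for the 3xx


def fetch_status(url: str) -> Tuple[int, Optional[str]]:
    """Return (status code, Location header) for url without following redirects."""
    opener = urllib.request.build_opener(NoRedirect)
    try:
        with opener.open(url, timeout=10) as resp:
            return resp.status, resp.headers.get("Location")
    except urllib.error.HTTPError as err:
        # 3xx (with redirects disabled), 4xx and 5xx responses all land here.
        return err.code, err.headers.get("Location")


def diagnose(status: int, location: Optional[str]) -> str:
    """Judge how a server answered a request for a URL that should not exist."""
    if status == 200:
        return "soft 404: error page served with 200 OK"
    if 300 <= status < 400:
        return f"redirects to {location} instead of returning an error"
    if status in (404, 410):
        return f"ok: {status}"
    return f"unexpected status {status}"


# Example (needs network access; the URL is hypothetical):
# print(diagnose(*fetch_status("https://www.example.com/no-such-page-12345")))
```

Anything other than a 404 or 410 for a made-up URL is worth a closer look at your server’s error page configuration.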
The 410 Gone Status Code
The requested resource is no longer available at the server and no forwarding address is known.
The 404 Not Found Status Code
The server has not found anything matching the Request-URI. No indication is given of whether the condition is temporary or permanent.
As you can clearly see, the definitions above explain the exact difference between a 404 and a 410, so why are most people using a 404 status code on a URL that has been deleted forever? Ah, shite, why am I even doing it?
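To make the 404/410 distinction concrete, here is a minimal sketch of a server that knows which pages were removed for good. It uses Python’s built-in `http.server`; the page paths and the `status_for` helper are hypothetical, and a real blog would make this decision in its web server or CMS instead.

```python
from http import HTTPStatus
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical paths, for illustration only.
LIVE_PAGES = {"/": "Home", "/about/": "About me"}
DELETED_FOREVER = {"/2007/11/old-post/", "/2007/10/another-old-post/"}


def status_for(path: str) -> HTTPStatus:
    """Pick the status code for a request path."""
    if path in LIVE_PAGES:
        return HTTPStatus.OK
    if path in DELETED_FOREVER:
        # We know this page existed and is permanently gone.
        return HTTPStatus.GONE
    # We know nothing about this URL: a typo, a bad link, or something temporary.
    return HTTPStatus.NOT_FOUND


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        path = self.path.split("?", 1)[0]  # ignore any query string
        status = status_for(path)
        # The body can be a friendly error page; the status line is what bots read.
        body = LIVE_PAGES.get(path, f"{status.value} {status.phrase}").encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


# To try it locally (blocks until interrupted):
# HTTPServer(("127.0.0.1", 8000), Handler).serve_forever()
```

On Apache, mod_alias’s `Redirect gone /path/` directive does the same job for individual URLs without any code.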
Google engineer Matt Cutts states:
So many webmasters misuse 404 vs. 410 that I don’t expect we’ll distinguish between them any time soon.
Right now I’m staring at 16 URLs serving the 404 (Not Found) status code. These are remnants of the blog posts I deleted late last year. The most recent date Google requested these URLs was Feb 21, 2008, so Google is still coming back looking for them, exactly as it should, considering I haven’t indicated whether the deletion was temporary or permanent.
So will a 410 status code stop Google from requesting those URLs? I know I can use Google Webmaster Tools to remove URLs from Google’s database, but I’m more interested in the 410 Gone method for now, and judging from Matt’s comment and others, the two are not going to be treated differently.
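You can pull similar information out of your own server logs. The sketch below assumes Apache’s "combined" log format and treats any user agent containing "Googlebot" as Google’s crawler (user agents can be faked, so this is approximate). It reports, for each URL that returned a 404 or 410 to Googlebot, the last time it was requested.

```python
import re
from datetime import datetime
from typing import Dict, Iterable

# Assumes Apache "combined" format:
# host ident user [time] "METHOD path PROTO" status size "referer" "agent"
LOG_LINE = re.compile(
    r'\S+ \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def last_googlebot_misses(lines: Iterable[str]) -> Dict[str, datetime]:
    """Map each path Googlebot got a 404/410 for to its most recent request time."""
    last_seen: Dict[str, datetime] = {}
    for line in lines:
        m = LOG_LINE.match(line)
        if not m:
            continue  # skip lines in some other format
        if "Googlebot" not in m["agent"] or m["status"] not in ("404", "410"):
            continue
        when = datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z")
        path = m["path"]
        if path not in last_seen or when > last_seen[path]:
            last_seen[path] = when
    return last_seen


# Usage (log file name is hypothetical):
# with open("access.log") as f:
#     for path, when in sorted(last_googlebot_misses(f).items()):
#         print(when.date(), path)
```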
Mark Pilgrim gives a brilliant guide and an interesting discussion in HTTP Error 410: Gone, even if it is a five-year-old blog post.
I’ve been looking at a few sites to see how people use status codes and what information they provide for non-existent URLs. It’s interesting how little attention people pay to their own error pages, which leads me on to a follow-up post about putting your error pages to good use, more specifically a 404 Not Found page. Take a look at mine; it’s very boring.
I know this is a very old topic, but it’s something that has been sparking my interest since I deleted my previous blog.
Comments
8 Responses to “How Do You Delete Web Pages?”
If you run up a new version of a site with a few page deletions, is it necessary to inform the search bots that you have removed the pages from the site, or is the standard server 404 enough?
Most times the 404 is the only option, but for the likes of Google it’s not good enough. Googlebot will always come back looking for the removed content because, remember, 404 simply means Not Found and can be caused by a number of things, e.g. the server being offline during the bot’s visit.
If a search engine has an option available to inform it of the removed URL, then yes, it is good practice to use it.
You can also block the bots from trying to crawl the removed URL by using the robots.txt file.
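Before relying on robots.txt, it’s worth confirming the rules actually match the removed URLs. Python’s standard `urllib.robotparser` can check that offline; the robots.txt contents and URLs below are hypothetical examples.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt blocking a removed post and a whole folder.
ROBOTS_TXT = """\
User-agent: *
Disallow: /2007/11/old-post/
Disallow: /pdf/
"""


def is_blocked(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if robots_txt forbids `agent` from crawling `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, url)
```

One trade-off to keep in mind: a URL blocked in robots.txt isn’t fetched at all, so the bot also never sees the 404 or 410 your server would return for it.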
I use the 404 error page, and this seems to work well for me.
Would it have any ranking ramifications for the site if Googlebot was hitting 404 error pages? We have run a few versions of the site in the past few months, with pages being indexed and then deleted with each new version.
Not really, provided you’ve fixed any pages that link to the old deleted pages, making sure you have no broken links throughout the site.
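A quick way to hunt for those leftover links is to scan each page’s HTML for anchors pointing at removed paths. A sketch using Python’s built-in `html.parser`; the page URL and the set of deleted paths are hypothetical, and fetching the pages is left out.

```python
from html.parser import HTMLParser
from typing import List, Set
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def links_to_deleted(html: str, page_url: str, deleted_paths: Set[str]) -> List[str]:
    """Return same-site links in `html` (resolved against page_url) to deleted paths."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    site = urlparse(page_url).netloc
    hits = []
    for href in collector.hrefs:
        target = urlparse(urljoin(page_url, href))  # handles relative links
        if target.netloc == site and target.path in deleted_paths:
            hits.append(target.path)
    return hits
```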
Will do, thanks.
I recently completed a largish site. All the product pages had a corresponding PDF file with the same written content, and Google indexed all the PDF product pages and not the HTML. I have used robots.txt to exclude the PDF folder; should I still remove the PDF URLs from Google? And as some of the PDFs are in position 1 on Google, will the HTML versions drop in at the same position eventually? Any thoughts?
Thanks, Mick
Mick, I haven’t seen this first-hand myself, but I would tend to think using PDFs should be OK alongside HTML files, even with the same content.
You could exclude the PDF files from Google and see what effect that has on the HTML pages; it could rule out the PDFs as being part of the problem.