How Do You Delete Web Pages?

If you haven’t figured it out yet, this is a series of posts continuing from where I left off back in November, after deleting over 140 blog posts.

With that being said, how do you delete your web pages or blog posts?
I’m a firm believer that removing web pages isn’t as simple as deleting them from the web server and walking away. A few minor adjustments need to be made so that both people and bots know exactly what has happened to the content.

One way of letting the search bots know what we’ve done with certain content is by making use of HTTP Status Codes such as:

People use all sorts of wacky configurations when a web page is deleted from the server, but the correct status code to serve would be 410 Gone. This is debatable and not widely practised (more on this later).

Never use 301 Moved Permanently On Deleted Pages Or Error Pages

It’s very common practice for people to redirect their error pages back to their home page. Bad idea, and here’s why:

OK, so some exaggeration there, but hopefully you get my point on why using a 301 redirect on an error page could be a bad idea.

Make Sure Not To Serve 200 OK Status Code For Deleted Content

As the status code says, 200 OK. This is the code for content being found on the web server. One common mistake is failing to configure error pages correctly, so that when a deleted page is requested and the error page is served, it comes back with a 200 OK. This confirms the content is still there when it’s not. One way to check is to use a tool such as HTTP Status Codes Checker and see what status your error page returns when you enter a false URL on your domain. There is plenty of information online regarding this issue; I particularly like this GSiteCrawler article.
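If you would rather script the check than use an online tool, here is a minimal sketch in Python. The helper names (bogus_url, fetch_status, classify_missing_page) are made up for illustration and are not part of any tool mentioned in this post. Note that urlopen follows redirects, so a missing page that gets redirected to the home page will also show up here as a 200.

```python
import urllib.error
import urllib.request
import uuid


def bogus_url(base_url: str) -> str:
    """Build a URL on the site that should not exist (random slug)."""
    return base_url.rstrip("/") + "/no-such-page-" + uuid.uuid4().hex


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the status code of the final response for a GET request."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urlopen raises on 4xx/5xx, but the exception still carries the code.
        return err.code


def classify_missing_page(status: int) -> str:
    """Judge the status code a site returns for a page that does not exist."""
    if status in (404, 410):
        return "ok"  # the server admits the page is not there
    if status == 200:
        return "soft 404"  # error page (or a redirect target) served as 200 OK
    return "unexpected"  # anything else deserves a closer look
```

To try it against your own site, call classify_missing_page(fetch_status(bogus_url("http://www.example.com"))) with your own domain in place of the placeholder.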

The 410 Gone Status Code

The requested resource is no longer available at the server and no forwarding address is known.

The 404 Not Found Status Code

The server has not found anything matching the Request-URI. No indication is given of whether the condition is temporary or permanent.

As you can clearly see, the above explains the exact difference between a 404 error and a 410, so why are most people using a 404 status code on a URL that has been deleted forever? Ah shite, why am I even doing it?
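To make the distinction concrete, here is a rough sketch of a server that answers 410 Gone for pages it knows were deleted on purpose and 404 Not Found for everything else it cannot find. It uses Python's built-in http.server module. The paths in DELETED_PATHS and LIVE_PAGES are hypothetical placeholders, and this is not how my own blog is set up; on a real site you would more likely do this in the web server or CMS configuration.

```python
from http import HTTPStatus
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical data: pages that exist, and pages deleted on purpose.
LIVE_PAGES = {"/": "<h1>Home</h1>"}
DELETED_PATHS = {"/old-post-1/", "/old-post-2/"}


def status_for(path: str) -> HTTPStatus:
    """Pick the status code for a requested path."""
    if path in LIVE_PAGES:
        return HTTPStatus.OK
    if path in DELETED_PATHS:
        return HTTPStatus.GONE  # 410: removed permanently, no forwarding address
    return HTTPStatus.NOT_FOUND  # 404: nothing here, no claim about permanence


class DeletedAwareHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        path = self.path.split("?", 1)[0]  # ignore any query string
        status = status_for(path)
        body = LIVE_PAGES.get(path, f"<h1>{status.value} {status.phrase}</h1>")
        payload = body.encode("utf-8")
        # The status line carries the real code, so an error page never goes out as 200.
        self.send_response(status.value)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def run(port: int = 8000) -> None:
    """Serve on localhost until interrupted."""
    HTTPServer(("127.0.0.1", port), DeletedAwareHandler).serve_forever()
```

Calling run() starts the server locally; the point is only that the decision between 404 and 410 comes down to whether you keep a record of what you deleted.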

Google Engineer Matt Cutts states:

So many webmasters misuse 404 vs. 410 that I don’t expect we’ll distinguish between them any time soon.

Right now I’m staring at 16 URLs serving the 404 (Not Found) status code. These are the remainders of the blog posts that I deleted late last year. The most recent date Google requested these URLs is Feb 21, 2008, so they are still coming back looking for them, exactly as they should, considering I haven’t indicated whether it was a temporary or permanent deletion.

So will a 410 error code stop Google from requesting those URLs? I know I can use Google Webmaster Tools to remove URLs from Google’s database, but I’m more interested in the 410 Gone method for now, even though going by Matt’s comment and other comments, the two codes are not going to be treated differently.

Mark Pilgrim gives a brilliant guide and interesting discussion on HTTP Error 410: Gone, even if it is a five-year-old blog post.

I’ve been looking at a few sites to see how people use the status codes and what information they provide for non-existent URLs. It’s interesting to see how little attention people pay to their own error pages, which leads me on to a follow-up post on putting your error pages to good use, more specifically the 404 Not Found page. Take a look at mine, it’s very boring. :-(

I know this is a very old topic, but it’s something that has been sparking my interest since I deleted my previous blog.


Comments

8 Responses to “How Do You Delete Web Pages?”

  1. tipperary design on March 15th, 2008 3:58 pm

    If you run up a new version of a site with a few page deletions, is it necessary to inform the search bots that you have removed the pages from the site, or is the standard server 404 enough?

  2. Gavin on March 20th, 2008 8:22 pm

    Most times the 404 is the only option, but for the likes of Google it’s not good enough. The Google bot will always come back looking for the removed content because, remember, 404 simply means Not Found and can be caused by a number of things, e.g. the server being offline during the bot visit.

    If a search engine has an option available to inform them of the removed URL, then yes, it is good practice to inform them.

    You can also block the bots from trying to crawl the removed URL by using the robots.txt file.

  3. new zealand on March 23rd, 2008 1:51 am

    I use the 404 error page, and this seems to work well for me.

  4. tipperary design on March 24th, 2008 6:03 pm

    Would it have any ranking ramifications for the site if the Google bot was hitting 404 error pages? We have run a few versions of the site in the past few months, with pages being indexed and then deleted with each new version.

  5. Gavin on March 25th, 2008 5:05 pm

    Not really, but that’s keeping in mind that you’ve fixed any pages that are linking to the old deleted pages, making sure you have no broken links throughout the site.

  6. tipperary design on March 25th, 2008 6:29 pm

    will do thanks

  7. tipperary design on April 3rd, 2008 8:54 pm

    I recently completed a largish site where all the product pages had a corresponding PDF file with the same written content. Google indexed all the PDF product pages and not the HTML. I have set up a robots.txt to exclude the PDF folder; should I still remove the PDF URLs from Google? And as some of the PDFs are at position 1 with Google, will the HTML version drop in at the same position eventually? Any thoughts?
    Thanks, Mick

  8. Gavin on April 7th, 2008 7:50 pm

    Mick, I haven’t seen this first hand myself, but I would tend to think using PDFs alongside HTML files should be OK, even with the same content.

    You could exclude the PDF files from Google and see what effect it has on the HTML pages; it could rule out the PDFs as being part of the problem.
