
Wednesday, December 07, 2005

Communication Breakdown - Need To Keep Your Blogs or Webpages From The Search Engines?

Several weeks ago, while I was toying around with the WordPress blogging platform for a test blog, I set the software up live for evaluation. Within an hour, four search engines had crawled and indexed my WordPress directory. I had no intention of making that directory public at all. So how did the engines crawl it all, and so fast at that?

The answer is twofold. Firstly, WordPress itself automatically sends out a ping to various search engines and blog directories whenever you publish a new blog entry. I believe that you can turn this off, but I haven't yet explored all the nuances of WordPress.
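To give a rough idea of what such a ping looks like, here is a minimal Python sketch. It assumes the conventional "weblogUpdates.ping" XML-RPC call that many blog ping services accept; I have not confirmed that this is exactly what WordPress sends. The function name build_ping_request and the example.com URL are made up for illustration, and the sketch only builds the request body without sending anything.

```python
import xmlrpc.client


def build_ping_request(blog_name: str, blog_url: str) -> str:
    """Build the XML-RPC body of a weblogUpdates.ping call (nothing is sent).

    Assumption: the ping service follows the common weblogUpdates.ping
    convention, which takes the blog name and the blog URL as parameters.
    """
    return xmlrpc.client.dumps(
        (blog_name, blog_url),
        methodname="weblogUpdates.ping",
    )


if __name__ == "__main__":
    # Hypothetical blog name and URL, for illustration only.
    print(build_ping_request("Test Blog", "http://example.com/wordpress/"))
```

Seeing the payload makes the problem obvious: the ping hands your blog's URL straight to whoever is listening, so a "private" test blog stops being private the moment you publish.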

Many search engines also love platforms such as WordPress, and come a-crawling faster if your URL includes "/wordpress" in it. (That's apparently also true for URLs with "/blog" in them.)

So how do you stop search engines from crawling a directory you don't really want public? That brings us to the second part of our answer. The quick solution is to put a "robots.txt" file at the top of your web server that tells crawlers to stay out. Simply removing the file will not help: when there is no robots.txt, crawlers assume they are free to fetch everything.

If you already have a robots.txt file because you have other directories that are live, then tweak it to "disallow" only certain directories. The big engines will respect any instructions to "disallow" certain directories.
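Here is a small Python sketch of what a "disallow" rule looks like and how a well-behaved crawler reads it. The "/wordpress/" path, the example.com URLs, and the helper name is_allowed are only assumptions for illustration; the rules are checked with the robots.txt parser from Python's standard library, which may differ in small ways from what any particular search engine does.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents: every crawler is asked to stay out
# of /wordpress/, while the rest of the site remains crawlable.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wordpress/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("http://example.com/index.html",
                "http://example.com/wordpress/index.php"):
        print(url, is_allowed(ROBOTS_TXT, "Googlebot", url))
```

The file itself has to live at the top of the site (for example, http://example.com/robots.txt) for crawlers to find it.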

If you are just testing WordPress (or any blogging platform), it's best to test the software locally on your computer before going live with it.

Unfortunately, a robots.txt file is only a request, and anyone can read it. Spammers often use it to see which directories you are trying to "hide". If someone wants to index your site and knows where your directories are, they can still index your private directories, and a robots.txt file basically reveals this info. Spammers often send their own spiders out across the net to scrape for content or parse for email addresses for spamming.
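To see how little effort this takes, here is a minimal Python sketch that pulls the "hidden" paths out of a robots.txt file. The helper name disallowed_paths and the sample paths are hypothetical; a real scraper would first download the file from the top of the site, which is left out here.

```python
def disallowed_paths(robots_txt: str) -> list[str]:
    """Collect every non-empty Disallow path listed in a robots.txt body."""
    paths = []
    for line in robots_txt.splitlines():
        # Drop trailing comments, then split "Field: value".
        line = line.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


if __name__ == "__main__":
    # Hypothetical file contents, for illustration only.
    sample = "User-agent: *\nDisallow: /wordpress/\nDisallow: /private/\n"
    print(disallowed_paths(sample))
```

A dozen lines is all it takes to turn your list of "please stay out" directories into a list of places worth looking.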

A more dependable way to stop such people is to implement a site-wide script (PHP, Perl, or whatever) that blocks requests from a list of IP addresses. This requires some technical knowledge of webmastering, or hiring someone to do it for you.
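The core of such a script is just a lookup. Here is a minimal sketch in Python rather than PHP or Perl; the addresses are reserved documentation ranges standing in for a real blocklist, and the name is_blocked is my own. Hooking the check into your web server (and answering blocked visitors with a "403 Forbidden") depends on your setup and is not shown.

```python
import ipaddress

# Hypothetical blocklist: one whole range and one single address.
# These are reserved documentation addresses, not real offenders.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.7/32"),
]


def is_blocked(client_ip: str) -> bool:
    """Return True if client_ip falls inside any blocked network."""
    try:
        address = ipaddress.ip_address(client_ip)
    except ValueError:
        # Design choice for this sketch: refuse anything unparseable.
        return True
    return any(address in network for network in BLOCKED_NETWORKS)


if __name__ == "__main__":
    for ip in ("203.0.113.42", "198.51.100.7", "192.0.2.10"):
        print(ip, "blocked" if is_blocked(ip) else "allowed")
```

Keep in mind that a blocklist is only as good as its upkeep, since scrapers can and do change addresses.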

I don't want to get into a lengthy discussion about the details of the robots.txt file or IP-blocking just yet, but if you need to know, please feel free to contact me at rdash001-at-yahoo-dot-ca (email address mangled to fool spambots). I will eventually post a "resource" page for both robots.txt and IP-blocking details.

Links: Wordpress.

(c) Copyright: 2005-present, Raj Kumar Dash, http://blogspinner.countwordula.com/


