Monthly Archives: March 2007

Welcome

Welcome to my online world.

I cannot say exactly where this blog will go. I have started at least 10 other blogs and have been all over the map on subject matter.

The fact is that I am an online junkie. I spend the majority of any free time I have surfing, blogging, hacking, emailing and on and on.

Having been involved with PCs since the late 80s, I have had the opportunity to work and play with all types of software and hardware. For some reason the software and programming side of things was always where I felt most comfortable, and I expect that trend to continue here.

WordPress Duplicate Content

One of the problems with WordPress is that it creates a lot of duplicate content, and there have been a lot of conversations on whether search engines, especially Google, penalize you for duplicate content. It seems that it is not a penalty but a filter.

If you create a standard WordPress site, the same post will display under the specific post, Archives, Categories, Feeds and Trackbacks. The Archives and Categories can create several duplicates by themselves, depending on the site settings.

When a crawler visits the site it must decide which of these pages is most relevant, and most of the time it does not pick the one you would. I have tested this on several sites and found that the Category pages usually get listed ahead of, or in place of, the actual post.

There are two ways to cure this. One is with a robots.txt file and the other is with an IF statement in the code. The downside of relying only on robots.txt is that it throws a blanket over a specific problem, and sometimes you can trap a good page by accident.

A robots.txt file should be used to block crawlers from some of the core files and directories in WordPress, as follows.

User-agent: *
Disallow: /category/
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-
Disallow: /about/trackback/
Disallow: /wp-register.php
Disallow: /wp-login.php
Disallow: /trackback/
Disallow: /feed/
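
Because Disallow rules match by path prefix, it helps to check a few URLs against the list before uploading it. The following is only a rough sketch: the is_disallowed() helper and the sample paths are made up for illustration, and it ignores Allow lines, wildcards and per-crawler groups, so a real crawler may behave differently.

```php
<?php
// Disallow prefixes copied from the robots.txt rules above.
$disallow = array(
    '/category/',
    '/wp-admin/',
    '/wp-includes/',
    '/wp-',
    '/about/trackback/',
    '/wp-register.php',
    '/wp-login.php',
    '/trackback/',
    '/feed/',
);

// Hypothetical helper: true when $path starts with any Disallow prefix.
// Simplification: plain prefix matching only, no Allow lines or wildcards.
function is_disallowed($path, $rules) {
    foreach ($rules as $prefix) {
        if (strpos($path, $prefix) === 0) {
            return true;
        }
    }
    return false;
}

// Sample paths, invented for this example.
$paths = array(
    '/2007/03/welcome/',
    '/category/wordpress/',
    '/wp-content/uploads/photo.jpg',
);

foreach ($paths as $path) {
    $status = is_disallowed($path, $disallow) ? 'blocked' : 'allowed';
    echo $path . ' => ' . $status . "\n";
}
```

The last sample path shows the blanket effect: the short /wp- rule also catches anything under /wp-content/, including uploaded images, which may or may not be what you want.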

Now, to stop all the other duplicate content, place the following statement in the header file, right before the first occurrence of a meta tag:

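<?php if (is_home() || is_single() || is_page()) {
    // Home page, single posts and static pages: allow indexing.
    echo '<meta name="robots" content="index,follow">';
} else {
    // Everything else (archives, categories and so on): follow links, do not index.
    echo '<meta name="robots" content="noindex,follow">';
} ?>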

This makes every page that is not the home page, a post page or a static page tell robots NOT to index it, but still to follow all of its links.

With these changes the site should get a very clean and accurate index listing.