Google Magic
Just to get started, I am reposting an interesting article:
From: Reggie
Okay - what is the "magic" to getting listed at the top of Google? I've done a keyword search and then torn apart the sites that pop up. Most have several things in common, one of which is this in their header:
<meta name="GOOGLEBOT" content="NOARCHIVE">
Or they may have:
<META NAME="robots" CONTENT="index,follow">
As a newbie to Web page design, what are these telling the robots to do, and does it help in the rankings?
Thanks!
Reggie
~~~Jill's Response~~~
Hi Reggie,
As I told my seminar participants the other day, everyone knows that the magic secret to ranking highly in Google is to simply place all the secret ingredients into a big pot, mix them all together, wave your magic wand, say the magic words and -- POOF! -- you'll have high rankings for life! Who needs some stinkin' code when you have a good magic wand? (I'll be selling my designer magic wands at the next seminar so be sure to be there!)
All right...lest some of you green newbies think I'm serious... of course, I'm just kidding!
But the truth is that the code you mentioned is no more the secret to success than my magic wand.
The first code you mentioned, "noarchive," could actually do you more harm than good. That's the tag you use when you *don't* want Google to place your page in their cache. (There's a little cache link next to most pages in Google's results, which brings you to Google's latest copy of your page.) Most site owners don't care if their pages are archived and show up in Google's cache and therefore they don't use the noarchive tag. Those that might care are ones who feel that Google is somehow infringing on their privacy or copyright by storing their pages in the Google cache.
The others who would prefer to have their pages stay out of Google's cache are generally those who are doing something sneaky that they don't want the good people at Google to find out about. Those who are using cloaking methods to show the search engines one thing and the users something else will often use the noarchive tag to make it less obvious what they are doing. However, since most people *don't* use that tag, those that do open their sites up to some scrutiny, which is why I said that it might do you more harm than good.
What happens if Google finds that 90% of the pages using that tag are cloaking? There's nothing to stop them from deciding one day that they won't index any page that uses the noarchive tag. Remember, it's Google's index and they are a private company. Now, I know that Google would prefer to have all legitimate pages in their database, and therefore, I doubt they'd go to that extreme. But if you want your pages indexed, I would avoid using the noarchive tag at all costs.
The second piece of code you asked about also has nothing to do with a page's High Rankings® in Google (or any other search engine). Supposedly that code is there to tell the search engine robots that it's okay to index the page and to follow the links to the inner pages. However, the default for the robots is to index all pages unless they are told *not* to do so. In other words, you might use the robots Meta tag if you *didn't* want the search engines to add your page to their database for whatever reason. And in that case, you'd say "nofollow" and "noindex" in the tag. Under those circumstances, to be on the safe side, you should also put up a robots.txt file on your server which excludes the robots from wherever you *don't* want them to go.
You can learn more about how to use these tags at this Google help page.
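To make the defaults a bit more concrete, here is a minimal Python sketch of how a crawler might read these tags. It is only an illustration, not how Google actually does it: the names `RobotsMetaParser` and `page_policy` are made up for this example, and lumping the "robots" and "googlebot" tags together is a simplifying assumption. The point it shows is that a page with no robots tag at all, and a page that says "index,follow", end up with exactly the same policy.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> and <meta name="googlebot"> tags.

    Hypothetical helper for illustration only.
    """

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            for part in attr.get("content", "").split(","):
                if part.strip():
                    self.directives.add(part.strip().lower())


def page_policy(html):
    """Return what a robot may do with the page; everything is allowed by default."""
    parser = RobotsMetaParser()
    parser.feed(html)
    found = parser.directives
    return {
        "index": "noindex" not in found,
        "follow": "nofollow" not in found,
        "archive": "noarchive" not in found,
    }


# No tag at all and an explicit "index,follow" tag give the same answer.
print(page_policy("<html><head><title>Plain page</title></head></html>"))
print(page_policy('<META NAME="robots" CONTENT="index,follow">'))
# Only the "no..." directives actually change anything.
print(page_policy('<meta name="GOOGLEBOT" content="NOARCHIVE">'))
print(page_policy('<meta name="robots" content="noindex, nofollow">'))
```

In this sketch, only the "no..." values ever flip a setting, which is why the "index,follow" tag is harmless but does nothing for you.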
In the past, I couldn't understand why anyone would want to keep the search engine spiders out of their site, but I have found a few reasons for this over the years. For instance, if you have a downloadable product page that needs to be paid for before one is allowed access, you'd want to exclude that page or directory from being spidered in your robots.txt document.
When using the robots.txt file for exclusions, be sure that you don't list actual file URLs you want excluded, and instead place the file in an excluded directory. Otherwise, you're actually giving hackers a roadmap to your juicy stuff. Anyone can visit a site and look at its robots.txt file. It's pretty fun actually, as you can find all sorts of interesting tidbits that nobody wants you to find! Try it by using your favorite site's URL with /robots.txt tacked onto the end.
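If you'd like to see how a directory exclusion behaves, the following is a rough Python sketch using the standard library's `urllib.robotparser`. The `/private/` directory, the `ExampleBot` name, and the example.com URLs are all invented for illustration. Notice that the robots.txt content names only the directory, never the file inside it.

```python
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt that excludes a whole directory
# instead of listing the paid download itself.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def robots_url(site_url):
    """Build the robots.txt address for whatever site a URL belongs to."""
    return urljoin(site_url, "/robots.txt")


def build_parser(robots_txt):
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Where anyone (robot or human) would look for the file.
print(robots_url("https://www.example.com/some/page.html"))

# A well-behaved robot stays out of the excluded directory...
print(parser.can_fetch("ExampleBot", "https://www.example.com/private/download.zip"))
# ...but is free to visit everything else.
print(parser.can_fetch("ExampleBot", "https://www.example.com/index.html"))
```

Keep in mind that robots.txt is only a polite request to robots. It doesn't password-protect anything, so the file names inside the excluded directory still need to be hard to guess.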
