Article 3: SEARCH ENGINE OPTIMIZATION
Again, there are all sorts of books and experts who can help you improve your SEO and win a top spot for a site, but there are also a few simple things, and a number of creative things, that you can do for yourself.
3.1 Control Your Search Engine Appearances with Robots.txt
One little-known fact about search engines is that you get to say which pages are indexed by the search engines.
Or rather, when the search engines’ robots come around, you can, if you wish, tell them to go away.
You do that with a file called robots.txt.
Robots.txt simply contains a record of which robots should index which pages.
Without going into too much detail, there are two conventions used in a robots.txt file:
User-agent: [Defines which robots the rules that follow are addressing.]
Disallow: [Allows you to list the pages or directories you want those robots to stay out of.]
In general, you’re probably going to use “User-agent: *” to make sure that you’re addressing the robots of every search engine, and you’ll probably want to include all of your pages (although you might want to exclude some directories, such as your scripts directory: “Disallow: /cgi-bin/”).
That will be the default setting on most websites and it allows all of your pages to be reached by search engines.
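To see how a robot would read that default setting, here is a minimal sketch using Python's standard urllib.robotparser module. The robot name "ExampleBot" and the example.com URLs are assumptions made up for illustration; only the two robots.txt lines come from the text above.

```python
from urllib.robotparser import RobotFileParser

# The default-style robots.txt described above: every robot is addressed,
# and only the scripts directory is off limits.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# "ExampleBot" and these URLs are hypothetical, used only to test the rules.
article_allowed = parser.can_fetch(
    "ExampleBot", "https://www.example.com/articles/seo-tips.html"
)
script_allowed = parser.can_fetch(
    "ExampleBot", "https://www.example.com/cgi-bin/form.pl"
)

print("Article page allowed:", article_allowed)
print("Script directory allowed:", script_allowed)
```

Under these assumptions, the ordinary page is reported as reachable and anything under /cgi-bin/ is not, which matches the behavior described above.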
So when would you want to edit the robots.txt file and start keeping pages and robots away?
If you’re charging for content, you might not want to allow people to access your articles through search engines, and especially their cached entries. When News International put up its paywall, for example, it also prevented Google and other search engines from showing its content in search results. If people want to know what’s on its sites, they have to visit and pay. The same would apply to membership sites. If you’re keeping a part of your content behind a paywall for premium members, then you might want to restrict access to that content as much as possible.
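As a rough illustration of the membership-site case, the sketch below keeps every robot out of two paid areas and turns one particular robot away from the whole site. The directory names /premium/ and /members/, the robot names, and the example.com URLs are all hypothetical; substitute whatever your own site uses.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for a site with paid content:
# one named robot is excluded entirely, and all other robots
# are kept out of the paid directories only.
ROBOTS_TXT = """\
User-agent: ExampleArchiveBot
Disallow: /

User-agent: *
Disallow: /premium/
Disallow: /members/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A general search robot can still reach the free pages...
free_allowed = parser.can_fetch(
    "ExampleSearchBot", "https://www.example.com/blog/free-post.html"
)
# ...but not the paid ones.
premium_allowed = parser.can_fetch(
    "ExampleSearchBot", "https://www.example.com/premium/report.html"
)
# The excluded robot is told to stay away from everything.
archive_allowed = parser.can_fetch(
    "ExampleArchiveBot", "https://www.example.com/blog/free-post.html"
)

print("Free page, search robot:", free_allowed)
print("Premium page, search robot:", premium_allowed)
print("Free page, archive robot:", archive_allowed)
```

Keep in mind that robots.txt is only a request that well-behaved robots honor. It keeps paid pages out of search results, but the paywall or member login is what actually stops visitors from reading them.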