
Search Engine Optimisation (SEO) is the “black art” of ensuring that your web site ranks highly in the search results returned for queries that relate directly to your specific area of interest. There are a number of factors to consider when optimising your web site, from the domain used to its accessibility to search engine “robots”. In this article I shall examine what I have found useful when configuring Drupal for Search Engine Optimisation. Drupal (http://drupal.org) is the free, open source Content Management System (CMS) that I selected for the Brightpoint GB Blog (http://blog.brightpointuk.co.uk).
Note – in this article I shall focus specifically on how to configure the Drupal platform, but many of the concepts detailed here apply to SEO in general.
The specific areas that I shall look at include:
You will notice that “content” does not appear even in the top 5: having a well-constructed web site is at least as important as the actual content itself, strange though that may seem.
If you wish to target the UK market, then your web site should ideally have a .co.uk domain.
If possible, you should register both the .com and .co.uk domains for your chosen web site name (to cater for users entering the domain incorrectly) and point both domains to the same site. NOTE – the .com domain should redirect seamlessly to the .co.uk domain, and the redirect should return an HTTP “301 Moved Permanently” status, so that search engines know the redirect is permanent and index only the .co.uk site.
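In practice this redirect is usually configured in Apache itself (for instance with mod_alias's Redirect permanent directive) or at your hosting provider. The decision it has to make can be sketched in Python as follows; the domain names are hypothetical placeholders, and the point shown is that the path and query string should survive the move so that deep links still land on the right article.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical domains, used only for illustration.
CANONICAL_HOST = "www.example.co.uk"
ALIAS_HOSTS = {"example.com", "www.example.com", "example.co.uk"}


def canonical_redirect(url):
    """Return (301, target_url) if url is on a non-canonical host, else None."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host in ALIAS_HOSTS:
        # Keep the scheme, path and query so deep links are preserved;
        # the fragment is dropped because browsers never send it.
        target = urlunsplit((parts.scheme, CANONICAL_HOST, parts.path, parts.query, ""))
        return 301, target
    # Already on the canonical host: serve the page normally.
    return None
```

For example, canonical_redirect("http://example.com/blog?page=2") gives a 301 pointing at the same path on www.example.co.uk.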
Ideally the web site itself should be hosted in the UK with an external IP address known to reside within the UK.
The structure of the addressing scheme used by the site is also important; by that I mean the URLs that are assigned to individual articles on your site. By default, Drupal creates links to articles that include the characters “/?q=” in the URL (for example “/?q=node/1”). This can confuse some search engines, so these characters should be removed. Drupal can be configured to use what are called “Clean URLs”, which do not contain the offending characters. However, to enable this feature a change must be made to the default Apache web server configuration file on the host Linux operating system, so that the rewrite rules in Drupal's bundled .htaccess file are allowed to take effect. To do this, locate the file “/etc/httpd/conf/httpd.conf” and edit it. Locate the section beginning:
# AllowOverride controls what directives may be placed in .htaccess files.
# It can be "All", "None", or any combination of the keywords:
# Options FileInfo AuthConfig Limit
#
AllowOverride None
and change the AllowOverride value from 'None' to 'All'. Save the file and restart the httpd service.
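Once .htaccess overrides are allowed, Drupal's rewrite rules hand every request that is not for a real file or directory to index.php, passing the clean path in the q parameter. The Python sketch below roughly mimics that mapping in both directions; it is an illustration of the idea, not Drupal's or Apache's actual code, and the function names are hypothetical.

```python
from urllib.parse import parse_qs, urlsplit


def clean_to_internal(path, query="", existing_files=frozenset()):
    """Roughly mimic the rewrite: /node/1 -> /index.php?q=node/1.

    Real files (images, CSS and so on) are served untouched; everything
    else is passed to index.php with the clean path in the q parameter.
    """
    if path in existing_files:
        return path + ("?" + query if query else "")
    internal = "/index.php?q=" + path.lstrip("/")
    # Like Apache's QSA flag, append any original query string.
    return internal + ("&" + query if query else "")


def internal_to_clean(url):
    """Turn an old-style /?q=node/1 link into its clean form /node/1."""
    parts = urlsplit(url)
    q = parse_qs(parts.query).get("q")
    return "/" + q[0] if q else parts.path
```

Visitors and search engines only ever see the clean form; the q= form exists purely inside the server.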
Within the Drupal admin interface, browse to Administer --> Site Configuration --> Clean URLs and enable Clean URLs:

This will now create Clean URLs for all articles. However, the addresses created will all be in the form “/node/1”, “/node/2”, etc. For SEO purposes, it is better for URLs to be generated automatically from the keywords in the article rather than from its entry in the underlying MySQL database, giving “/nokia-e75-exchange-email-setup”, for example.
To achieve this, “URL Aliases” must be configured within Drupal.
Within the Drupal admin interface, browse to Administer --> Site Building --> Modules. Ensure that the “Path” module is enabled.
Once the module is enabled, a new admin area will be listed; browse to Administer --> Site Building --> URL Aliases:

Here the administrator can generate aliases for articles, replacing the default “/node” address with a more suitable one, providing useful keywords for search engines to index and display in the search results.
This is a manual process, however. To have URL aliases created automatically when an article is created, an optional third-party Drupal module called ‘PathAuto’ must be downloaded and installed; it is available from the Drupal web site (http://drupal.org/project/pathauto). PathAuto depends on the Token module (http://drupal.org/project/token), which must be installed alongside it.
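PathAuto builds each alias from a configurable pattern, typically based on the article title. The kind of title-to-alias conversion involved can be sketched in Python as below. This is a simplified illustration, not PathAuto's actual algorithm, and the stop-word list is a hypothetical example (PathAuto keeps its own configurable list of words to remove).

```python
import re
import unicodedata

# Hypothetical stop words; PathAuto has its own configurable list.
STOP_WORDS = {"a", "an", "the", "and", "of", "to", "for", "in", "on", "with"}


def slugify(title, max_length=100):
    """Turn an article title into a keyword-rich URL alias."""
    # Fold accented characters to plain ASCII where possible.
    text = unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
    # Keep runs of letters and digits; punctuation and spaces become separators.
    words = re.findall(r"[a-z0-9]+", text.lower())
    words = [w for w in words if w not in STOP_WORDS]
    slug = "-".join(words)
    # Trim to a sensible length without leaving a dangling hyphen.
    return slug[:max_length].rstrip("-")
```

With this sketch, slugify("Nokia E75: Exchange Email Setup") produces "nokia-e75-exchange-email-setup", the style of address used as an example earlier.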
To install additional Drupal modules, extract the contents of the archive file that you have downloaded, and save the extracted folder to /var/www/html/sites/all/modules/ on the Drupal server. Note – you will have to create the modules folder if none is listed.
Once saved, within the administration interface, browse to Administer --> Site Building --> Modules. The new module will be listed, simply tick it to enable it and save the new configuration.
Once PathAuto is enabled, any articles created subsequently will have a URL alias created automatically, based on the article title:

If desired, the alias can instead be set manually when the article is created, by unchecking the option to alias automatically, as shown above.
Global Alias settings can be edited within Administer --> Site Building --> URL Aliases --> Automated Alias Settings.
NOTE – this is not a retroactive process; any articles created prior to installing PathAuto will need to have aliases created manually.
IMPORTANT – using URL aliases can effectively make the site appear to search engines to have two copies of the same article, because the article remains reachable at both its original “/node/1” address and its new alias. Having duplicate content on a site can cause search engines to penalise that site or rank it less highly.
As well as creating an alias for an article, a URL redirect should be created indicating that “/node/1” has moved permanently (i.e. an HTTP 301 Moved Permanently response) to “/nokia-e75-exchange-email-setup”. This ensures that search engines index only the alias.
An optional Drupal module called ‘Global Redirect’ (http://drupal.org/project/globalredirect) can be installed to manage these redirects automatically and retroactively.
Should you alter an alias at a later date, but want the old alias to still be available, or remove an article and wish any requests for it to be directed to an alternate article, then a URL redirect should be created manually.
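The redirect behaviour described above can be sketched in Python: a request for a system path that has an alias gets a 301 to the alias, a request for an old alias or removed article gets a 301 to its replacement, and anything else is served as-is. The two dictionaries are hypothetical stand-ins for the alias and redirect data Drupal keeps in its database; this is not Global Redirect's actual code.

```python
# Hypothetical data standing in for Drupal's stored aliases and redirects.
ALIASES = {"node/1": "nokia-e75-exchange-email-setup"}
MANUAL_REDIRECTS = {"nokia-e75-email": "nokia-e75-exchange-email-setup"}


def resolve(path):
    """Return (status, path) for a request, redirecting to the canonical alias."""
    path = path.strip("/")
    if path in ALIASES:
        # System path requested directly: permanent redirect to its alias,
        # so only one copy of the article is ever indexed.
        return 301, "/" + ALIASES[path]
    if path in MANUAL_REDIRECTS:
        # Old alias or removed article: permanent redirect to its replacement.
        return 301, "/" + MANUAL_REDIRECTS[path]
    # Canonical address: serve the page normally.
    return 200, "/" + path
```

Because both kinds of redirect are permanent, search engines transfer their index entry to the new address instead of treating the two addresses as duplicate content.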
Meta tags, descriptions and page titles are all elements of a page that search engines use to index it. By default, the Meta Tags applied to the Drupal web site are propagated to all pages on the site. This can result in pages carrying tags that are not relevant to them, and this blanket allocation of tags can cause search engines to penalise web sites.
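To make the idea of per-page tags concrete, the Python sketch below builds the title and meta tags for a single page from that page's own data. The helper name is hypothetical and this is not the MetaTags module's code; it simply shows the markup that ends up in each page's head, with values HTML-escaped so characters such as “&” do not break validation.

```python
from html import escape


def head_tags(title, description, keywords):
    """Build per-page <title> and meta tags (hypothetical helper)."""
    lines = [
        f"<title>{escape(title)}</title>",
        # escape() also escapes double quotes, so attribute values stay valid.
        f'<meta name="description" content="{escape(description)}" />',
        f'<meta name="keywords" content="{escape(", ".join(keywords))}" />',
    ]
    return "\n".join(lines)
```

For example, head_tags("Nokia E75 & Exchange", "Setting up Exchange email on the Nokia E75", ["nokia", "e75"]) produces a title and description specific to that article rather than the site-wide defaults.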
An optional Drupal module, “MetaTags”, can be installed that allows individual pages to be assigned their own tags and description at the time of creation (http://drupal.org/project/nodewords):

Another optional module, PageTitle (http://drupal.org/project/page_title), allows unique titles to be assigned to pages, distinct from the title of the article itself; this is the content of the <title> element in the page head, which is displayed in any returned search engine results:

Sitemaps are an easy way for webmasters to inform search engines about pages on their sites that are available for crawling. In its simplest form, a Sitemap is an XML file that lists URLs for a site along with additional metadata about each URL (when it was last updated, how often it usually changes, and how important it is, relative to other URLs in the site) so that search engines can more intelligently crawl the site.
Web crawlers usually discover pages from links within the site and from other sites. Sitemaps supplement this data to allow crawlers that support Sitemaps to pick up all URLs in the Sitemap and learn about those URLs using the associated metadata. Using the Sitemap protocol does not guarantee that web pages are included in search engines, but provides hints for web crawlers to do a better job of crawling your site.
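To show what such a file contains, the Python sketch below generates a minimal Sitemap in the sitemaps.org format (a urlset element holding one url element per page, each with loc and optional lastmod, changefreq and priority). The page data is hypothetical; the XML Sitemap module produces this automatically from the site's content.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """pages: iterable of dicts with 'loc' and optional lastmod/changefreq/priority."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        url = ET.SubElement(urlset, "url")
        for field in ("loc", "lastmod", "changefreq", "priority"):
            if field in page:
                # ElementTree escapes characters such as & in the text for us.
                ET.SubElement(url, field).text = str(page[field])
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```

A call such as build_sitemap([{"loc": "http://www.example.co.uk/", "changefreq": "daily", "priority": 1.0}]) yields a file that crawlers supporting the protocol can read.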
An optional Drupal module, XML Sitemaps (http://drupal.org/project/xmlsitemap), can be downloaded and installed to create a sitemap automatically, define how often the site changes and should be re-indexed, can detail pages that should be excluded from indexing and can be submitted automatically to Google, Microsoft, Yahoo, Ask and Moreover:

Search engine rankings will be higher if the source code of the site complies with the W3C standards for HTML, XHTML and CSS.
The site can be checked for errors online automatically using the W3C validator tools (http://validator.w3.org/).
Ideally you should see no errors:

This lets you, should you so desire, add an image to your site indicating that it has passed validation:

There is an optional Drupal module, HTML Purifier, that can automatically re-render HTML before it is served to clients and search engines (http://drupal.org/project/htmlpurifier), however in my experience this module adds an unacceptable performance lag on the web server. Instead, the effort should be taken to ensure that the original code is correct when submitted.
The text of the web site should be well written: grammatically correct and free from spelling mistakes. Although search engines cannot necessarily judge the quality of the writing on a page, your readers can, and they won’t take seriously a site that can’t spell. Use spelling and grammar checkers when creating articles.
Articles should naturally also be relevant to your target audience, factually correct and ‘fun’ to read.
All inbound and outbound links to and from your site should be live and available. Linking to expired content from your pages can cause you to be penalised by search engines.
The W3C Link Checker can automatically verify all links on your site, both internal links between pages and external links to other sites. It can also report on the status of any URL redirects configured on your site and highlight any errors or warnings (http://validator.w3.org/checklink).
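For a rough idea of what a link checker does, the Python sketch below collects the links from a page's HTML, resolves them against the page address and requests each one to get its HTTP status. It is a minimal sketch, not the W3C tool: it sends HEAD requests (some servers reject these), and urlopen follows redirects, so the status returned is that of the final destination.

```python
from html.parser import HTMLParser
from urllib.error import HTTPError, URLError
from urllib.parse import urljoin
from urllib.request import Request, urlopen


class LinkCollector(HTMLParser):
    """Collect absolute URLs from the href attributes of <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            # Skip links that are not web pages.
            if href and not href.startswith(("mailto:", "#", "javascript:")):
                self.links.append(urljoin(self.base_url, href))


def extract_links(html, base_url):
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links


def check_link(url, timeout=10):
    """Return the HTTP status for url, or None if it could not be reached."""
    request = Request(url, method="HEAD")
    try:
        with urlopen(request, timeout=timeout) as response:
            return response.status
    except HTTPError as err:
        return err.code  # e.g. 404 for expired content
    except (URLError, TimeoutError):
        return None  # DNS failure, refused connection, timeout
```

Any link for which check_link returns None or a 4xx/5xx status is a candidate for fixing or removing.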
Web analysis software can provide useful information on who is accessing your site, where from, using what platform, at what time and how they get there (which search engine they were referred by and what they typed in to get to you).
Google Analytics is a free service which, once you have registered, can report on your site: you simply add a short code snippet to every page that you want monitored. Drupal can be configured to add the code to all served pages automatically using the Google Analytics module (http://drupal.org/project/google_analytics).
Should you want to exercise more control over your analysis than Google provides, you can install your own web analysis server using the free open source Piwik application (http://www.piwik.org). This provides detailed information on all search engines and users, down to what browser they are using and at what screen resolution!

Your web site should have a prominently placed search utility, to encourage users who come across the site to stay on it longer and browse internally.
Drupal has a search feature built in, and can be configured to only index words over a certain length, and to automatically re-index upon new content submission.
One optional module available for Drupal is “Search 404” (http://drupal.org/project/search404). It replaces the site’s default 404 error page (i.e. “Page Not Found”) with the site’s search page, so that if users mistype a URL they can enter their desired search phrase there and then, without having to press back or re-type the URL in the address bar.
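The same idea can go a step further by using the words of the mistyped address themselves as the initial search. The Python sketch below shows one simple way to turn a requested path into search keywords; the function is hypothetical and only illustrates the approach, it is not the Search 404 module's code.

```python
import re
from urllib.parse import unquote


def search_keys_from_path(path):
    """Turn a mistyped path such as /nokia-e75_email%20setup into search keywords."""
    # Decode %20 and friends, then split on anything that is not a letter or digit.
    words = re.split(r"[^A-Za-z0-9]+", unquote(path))
    return " ".join(w.lower() for w in words if w)
```

A visitor who mistypes /nokia-e75_email%20setup would then see results for “nokia e75 email setup” rather than a bare error page.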
Another module available for Drupal is PorterStemmer (http://drupal.org/project/porterstemmer), which, once installed, reduces variants of the same word to a common stem, so that searches for ‘blog’, ‘blogs’ and ‘blogging’ all return the same results, widening the available search results and also compensating for users’ poor choice of search phrase.
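The principle behind stemming is that both the indexed text and the search phrase are reduced to stems before they are compared. The Python sketch below illustrates this with a deliberately crude suffix stripper; it is NOT the Porter algorithm, which applies several ordered rule steps with conditions on the stem, and the function names are hypothetical.

```python
import re


def crude_stem(word):
    """Very rough suffix stripping for illustration; not the Porter algorithm."""
    w = word.lower()
    for suffix in ("ing", "ed", "s"):
        # Only strip when a stem of at least three letters would remain.
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            if suffix == "s" and w.endswith("ss"):
                break  # leave words such as "bless" or "class" alone
            w = w[: -len(suffix)]
            # Undo consonant doubling: "blogg" -> "blog".
            if suffix != "s" and w[-1] == w[-2] and w[-1] not in "aeiouslz":
                w = w[:-1]
            break
    return w


def stem_set(text):
    return {crude_stem(w) for w in re.findall(r"[a-z]+", text.lower())}


def matches(query, text):
    """True if every word of the query shares a stem with some word in the text."""
    return stem_set(query) <= stem_set(text)
```

With this sketch, a search for “blogs” matches an article that only contains “blogging”, because both reduce to “blog”.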
Finally, some of the above modules can place some additional overhead on the PHP engine on the web server, resulting in reduced performance. By default, the amount of memory allocated to PHP is limited to 8MB (PHP4) and 16MB (PHP5). This allocation can be increased by editing the web server’s PHP configuration file.
Locate the file “/etc/php.ini” and edit it.
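Find the memory_limit directive (depending on your PHP version, its value will be “8M” or “16M”) and replace the value with a more suitable memory allocation – 64M should be sufficient. Save the file and restart the httpd service for the change to take effect.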
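Searching for the directive name is safer than searching for “8M” or “16M”, which could match other settings. The Python sketch below makes the same edit programmatically under that assumption: it rewrites an active memory_limit line, leaves commented-out lines (those starting with “;”) alone, and appends the directive if no active one exists. The function name is hypothetical.

```python
import re

# Matches an active (uncommented) memory_limit line and captures everything
# up to the value, so trailing comments on the line are preserved.
MEMORY_LIMIT = re.compile(r"^(\s*memory_limit\s*=\s*)\S+", re.MULTILINE)


def set_memory_limit(ini_text, new_limit="64M"):
    """Return php.ini text with memory_limit set to new_limit."""
    if MEMORY_LIMIT.search(ini_text):
        return MEMORY_LIMIT.sub(lambda m: m.group(1) + new_limit, ini_text, count=1)
    # No active directive found: append one at the end of the file.
    return ini_text.rstrip("\n") + f"\nmemory_limit = {new_limit}\n"
```

Usage would be along the lines of reading /etc/php.ini, passing its contents through set_memory_limit and writing the result back before restarting httpd.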
Taxonomy is the practice of assigning individual keywords to articles, distinct from and not to be confused with Meta Tags. This feature allows articles to be assigned distinct “keywords”, which visitors can use to find other articles on the site sharing those same keywords, for the purpose of offering “if you liked this, you may also like”-style functionality. This “related articles” feature is not native to Drupal, but I have written a small script which can provide it. This sort of feature is essential for ensuring that users stay on your site even if they have arrived there by mistake.
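One simple way to build such a “you may also like” list is to rank other articles by how many taxonomy terms they share with the current one. The Python sketch below does this using Jaccard similarity (shared terms divided by all terms of the pair). It is a hypothetical illustration of the approach, not the script mentioned above, and the article data is invented.

```python
def related_articles(target, articles, limit=3):
    """Rank other articles by Jaccard similarity of their taxonomy terms.

    articles: dict mapping article title -> list of taxonomy terms.
    """
    target_terms = set(articles[target])
    scored = []
    for title, terms in articles.items():
        if title == target:
            continue
        shared = target_terms & set(terms)
        if shared:
            union = target_terms | set(terms)
            scored.append((len(shared) / len(union), title))
    # Highest similarity first; ties broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for _, title in scored[:limit]]
```

Normalising by the union rather than just counting shared terms stops heavily tagged articles from appearing as “related” to everything.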