URL Rewriting Simplified - An academic view on URL Rewriting
Article Written by Pramod S Nair for www.wisdombay.com
This article explains the technical aspects of URL Rewriting. It is a starter guide for beginners who need to know the basics of URL Rewriting.
What is URL Rewriting? - An explanation of the process called URL Rewriting.
URL Rewriting is the process of manipulating a URL or link that is sent to a web server, so that the link is dynamically modified at the server to include additional parameters and information, along with a server-initiated redirection. The Rewriting Engine scans incoming URLs and restructures them based on pre-defined conditions and rules. The web server performs all these manipulations on the fly, so the browser is kept out of the loop regarding the change made to the URL and the redirection. The client is therefore given the impression that the content is fetched from the original location mentioned in the URL it requested.
Let's illustrate this with a real life example.
Consider the 2 following URLs.
http://www.mywebsite.com/products.php?shprod=345
Both of the URLs given above carry a query string, which is a common sight on almost all dynamically driven websites. These links are supposed to show the user the details of a product with an id of 345, or to show products with a category name of "cars", from the website's database. URLs of this kind have several limitations: they are cryptic for humans, they are not very search engine friendly, and the query string is easy to manipulate from the client side. So let's check out how these URLs can be written in a friendlier way.
Both of the links given above can be called friendly URLs, since they are free of cryptic parameters and are much easier on human eyes. On the server side, these 2 clean URLs can be transformed back into application-specific URLs by using various URL Rewriting methods. The rewriting module at the server will convert those links back to their original form by creating the query strings, filling them with the values sent from the client through the clean URL, and then initiating a redirect on the server side to the new URL. One thing to note at this point is that the redirection initiated by the URL Rewrite module is totally different from a normal HTTP Redirect. When an HTTP Redirect happens, the browser is told to fetch the new URL, whereas a rewritten URL is fetched without the knowledge of the client side. The transfer to the new URL happens behind the scenes as far as the requesting browser is concerned.
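The transformation described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how any particular web server implements it. The script name products.php and the parameter shprod come from the example URL in this article; the clean path layout /products/345 and the names RULES and rewrite are assumptions made for the sketch.

```python
import re
from urllib.parse import urlencode

# Hypothetical rule table: (pattern for the clean path, internal script, query parameter).
# Only products.php and shprod come from the article; the clean path layout is assumed.
RULES = [
    (re.compile(r"^/products/(\d+)/?$"), "/products.php", "shprod"),
]


def rewrite(path):
    """Return the internal URL for a clean path, or the path unchanged if no rule matches."""
    for pattern, script, param in RULES:
        match = pattern.match(path)
        if match:
            # Rebuild the query string from the value captured in the clean URL.
            return script + "?" + urlencode({param: match.group(1)})
    return path


if __name__ == "__main__":
    print(rewrite("/products/345"))  # /products.php?shprod=345
    print(rewrite("/about"))         # /about
```

The browser only ever sees the clean path. The server would go on to serve the rewritten target internally, which is what separates a rewrite from an HTTP Redirect.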
I will explain how to do URL Rewriting on an Apache web server with the mod_rewrite module later, in a follow-up article.
Why is URL Rewriting used?
Let's now take a look at the various scenarios where URL Rewriting is useful.
1. Making URLs User Friendly
Consider a content management system that you use as a platform to publish your recipes. In such a scenario there will normally be only a single server-side page or script file, tasked with serving recipe details to clients based on an identifier supplied for each recipe. Links to access recipes will be like the one given below
These URLs have a major drawback! They are not friendly from a user's perspective. They lack usability since they are not short, they are not easy to memorize, and they do not reflect the structure of our recipe library.
In such a scenario we can utilize URL Rewriting to bring a bit of user friendliness to our links.
2. Making Spiders Happy by providing Search Engines Friendly URLs
Think about our previous scenario of the CMS for our Recipe Collection. Even if we have a collection of 500 recipes, there will only be a single page responsible for serving them. This page serves the content dynamically, and this causes some problems with Search Engine Optimization. Most search engines are not happy with the idea of indexing dynamic links with obscure characters in them, which means your website won't be indexed correctly by those search engines. Since most search engines give more importance to static links and index sites with static links faster, we can use URL Rewriting to make search engine friendly links for our dynamic website.
3. Keeping the Link Structure of a website Permanent and keep Link Rot in check
Restructuring a website can often cause havoc with the incoming links pointing to it. Links from external sources, such as other websites and bookmarks, may end up pointing to non-existent resources. When links out there on the net point to resources that are no longer where they used to be on your website, the result is broken links. This is called Link Rot, and URL Rewriting can be used to prevent it.
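One way to picture this is a lookup table that maps retired paths to their current locations. The sketch below is a minimal illustration under assumed data: the paths in MOVED and the function name resolve are hypothetical. It answers a moved path with a permanent redirect (HTTP status 301), although a rewriting rule could just as well serve the new location internally.

```python
# Hypothetical mapping from retired paths to their current locations.
MOVED = {
    "/old-recipes/pasta.html": "/recipes/pasta",
    "/old-recipes/soup.html": "/recipes/soup",
}


def resolve(path):
    """Return (status, location): 301 plus the new location for a moved path, else 200 and the path."""
    new_path = MOVED.get(path)
    if new_path is not None:
        return 301, new_path
    return 200, path


if __name__ == "__main__":
    print(resolve("/old-recipes/pasta.html"))
    print(resolve("/recipes/soup"))
```

Old bookmarks and external links keep working because every retired path still leads somewhere, while pages that never moved pass through untouched.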
4. Restricting Hot linking to keep bandwidth thieves at bay
URL Rewriting can be used to stop other websites from Hot Linking to content hosted on your server. Hot linking is the use of a media object (usually, but not only, an image file) on a website by embedding it directly in a page while the actual resource resides on another server. This wastes bandwidth on the original server each time the page that is piggybacking on the content is displayed.
We can utilize URL Rewriting to restrict hot linking by checking the HTTP Referer header (spelled that way in the HTTP specification) before serving the content.
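The referrer check can be sketched as follows. This is a simplified illustration, not a complete protection: the header is supplied by the client and can be omitted or forged. The host names in ALLOWED_HOSTS, the extension list, and the function name should_block are assumptions for the example.

```python
from urllib.parse import urlsplit

# Assumed values for illustration: hosts that may embed our media, and protected file types.
ALLOWED_HOSTS = {"www.mywebsite.com", "mywebsite.com"}
PROTECTED_EXTENSIONS = (".jpg", ".png", ".gif")


def should_block(path, referer):
    """Return True when a protected file is requested from a page on a foreign host."""
    if not path.lower().endswith(PROTECTED_EXTENSIONS):
        return False  # only media files are protected
    if not referer:
        return False  # no referrer sent (direct visit, privacy tools): let it through
    host = urlsplit(referer).hostname
    return host not in ALLOWED_HOSTS


if __name__ == "__main__":
    print(should_block("/images/car.jpg", "http://other.example/gallery"))
    print(should_block("/images/car.jpg", "http://www.mywebsite.com/products"))
```

Letting requests without a referrer through is a design choice in this sketch. A stricter rule would block them as well, at the cost of turning away some legitimate visitors.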
5. Preventing External parties from identifying the technology that is powering a website at a casual glance
Even though it is not a foolproof defence against a person footprinting a website, URL Rewriting prevents someone from identifying the server-side technologies that drive a site just by looking at a URL. The file extension of pages can be removed from the public links and restored by a rewrite on the server side.
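As a rough sketch of the idea, the server can keep a private table from extension-free public names to the real script files. The page names, the .php files, and the function name internal_script below are hypothetical.

```python
# Hypothetical table from public, extension-free names to the real script files.
KNOWN_PAGES = {
    "about": "about.php",
    "contact": "contact.php",
}


def internal_script(path):
    """Return the internal script path for a public path, or None if the page is unknown."""
    name = path.strip("/")
    if name in KNOWN_PAGES:
        return "/" + KNOWN_PAGES[name]
    return None


if __name__ == "__main__":
    print(internal_script("/about"))
    print(internal_script("/missing"))
```

The extension appears only on the server, so the public link /about says nothing about the language behind it.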
6. Considering Other Security Aspects
Links are made more secure and a bit more resilient against casual query string manipulation and other malicious injection techniques. The URL Rewrite technique can also be used to make on-the-fly conditional checks before serving a page, and these checks can serve different outputs to different clients based on environmental parameters such as the IP address from which the request originates, the User Agent making the request, etc. Further reference on URL Rewriting from a security viewpoint can be read from Here
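Such conditional checks might look like the sketch below. All names and values are assumptions for illustration: the blocked network is a range reserved for documentation, the user agent markers are a crude heuristic, and /forbidden.html and the /mobile prefix are made-up targets.

```python
import ipaddress

# Placeholder network (a range reserved for documentation) standing in for unwanted clients.
BLOCKED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]
# Crude, assumed markers for detecting mobile browsers from the User-Agent header.
MOBILE_MARKERS = ("Mobile", "Android", "iPhone")


def choose_target(path, client_ip, user_agent):
    """Pick the internal target for a request based on client IP address and user agent."""
    address = ipaddress.ip_address(client_ip)
    if any(address in network for network in BLOCKED_NETWORKS):
        return "/forbidden.html"
    if any(marker in user_agent for marker in MOBILE_MARKERS):
        return "/mobile" + path
    return path


if __name__ == "__main__":
    print(choose_target("/recipes", "203.0.113.7", "Mozilla/5.0"))
    print(choose_target("/recipes", "198.51.100.4", "Mozilla/5.0 (iPhone)"))
```

The requested URL is the same in every case. Only the content chosen on the server differs, which is why the client never notices the check.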
How can I implement URL Rewriting?
URL Rewriting can be implemented using various methods, depending on the web server. On IIS it can be done by creating an ISAPI filter, or by using an HTTP handler together with the System.Web.HttpContext class available in ASP.NET. On Apache, URL Rewriting can be achieved with the very powerful but fear-invoking mod_rewrite module.
I am planning a follow-up article in the form of a starter's guide to using the mod_rewrite module on Apache.