Valentin Agachi

freelance web developer

Rewriting dynamic URLs into friendly URLs

Let's suppose you are building a small database-driven website or web-based application that needs good search engine coverage. That is not the only reason to give a web application search engine friendly URLs; usability is another important aspect. One way to achieve this is Apache's mod_rewrite module. Another way is to build a URL rewrite engine into your web application itself.

Reasons

Two main reasons should drive you towards rewriting your URLs into friendly URLs. The most important one is that search engines can index your pages: most search engines do not index dynamic URLs that contain a question mark (?) or an equals sign (=). The other main reason is usability: friendly URLs make a site more usable. If you care to read more about the usability side, see Adaptive Path's User-Centered URL Design or Adam Baker's How to make URLs user-friendly.

Update: Someone pointed out in the comments that the scripts I originally posted were flawed, and after looking over the code again I agree. I have fixed the scripts here in the post, so go ahead and look over them again. I have also prepared a live example of this method so you can see it in action, and the files from the example are available for download (ZIP file, 2 KB).

Implementation

Although the idea of rewriting URLs is old by now, and a must-have in a dynamic website or web-based application, I still see sites using the index.php?page=X URL structure.

I think every web-based application or site should use a global loading and unloading script. By that I mean that every script that is called by the browser should call the loading script at the beginning and the unloading script at the end. This way you wrap your scripts with a customizable set of operations both at the start and at the end of them.

I rely on $_SERVER['PATH_INFO'] when creating friendly URLs. When a request is made to a PHP-enabled server, PHP fills the $_SERVER global array with variables describing the server's environment and the request. Among these, PATH_INFO contains the part of the requested URL that sits between the path of the script and the query string. Let me explain with some examples. Take this URL:

http://www.example.com/path/to/script.php?var1=value1&var2=value2

For this request, the URL path would be /path/to/script.php and the query string ?var1=value1&var2=value2. Note that there is not any information that would go in the PATH_INFO variable. But if you take this next example:

http://www.example.com/path/to/script.php/foo/bar/?var1=value1&var2=value2

For this request, the URL path and query string are the same as above, but this time the PATH_INFO variable contains /foo/bar/.
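To make the three parts concrete, here is a minimal sketch that splits a URL the same way. The function name splitRequest() and the idea of passing the script path in by hand are assumptions made for this illustration; on a live server PHP fills in $_SERVER['PATH_INFO'] for you.

```php
<?php
// Hypothetical helper: splits a requested URL into the parts discussed above.
// $scriptPath is the path of the script that handles the request; it is
// supplied by hand here, which is an assumption of this sketch.
function splitRequest($url, $scriptPath) {
	$path  = (string) parse_url($url, PHP_URL_PATH);
	$query = (string) parse_url($url, PHP_URL_QUERY);

	// Whatever follows the script path inside the URL path is the PATH_INFO part
	$pathInfo = '';
	if (strpos($path, $scriptPath) === 0) {
		$pathInfo = (string) substr($path, strlen($scriptPath));
	}

	return array(
		'script'    => $scriptPath,
		'path_info' => $pathInfo,
		'query'     => $query,
	);
}

print_r(splitRequest(
	'http://www.example.com/path/to/script.php/foo/bar/?var1=value1&var2=value2',
	'/path/to/script.php'
));
```

For the second example URL this reports /foo/bar/ as the path info, and for the first one it reports an empty string.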

Planning our URLs

Now that you know what will help you, decide what your new URLs are going to look like. Suppose you have URLs like these:

http://www.example.com/categories.php?cat_id=3
http://www.example.com/articles.php?art_id=15&page=2

You would want to try and convert them to something like this:

http://www.example.com/categories.php/cat_id/3/
http://www.example.com/articles.php/art_id/15/page/2/

But wait, that is not enough, because many search engines would still not index you properly. The problem is with the .php extension in the URL followed by /. So what you need to do is transform them into the following:

http://www.example.com/categories/cat_id/3/
http://www.example.com/articles/art_id/15/page/2/

To do this, copy or move the categories.php and articles.php scripts to files named categories and articles. Now you have to tell the server that those two files need to be parsed by PHP. You can do this by creating a .htaccess file in your root directory (or in whichever directory contains the files you want parsed by PHP). Then write in that file:

# Anchored so that only files named exactly "categories" or "articles" match
<Files ~ "^(categories|articles)$">
ForceType application/x-httpd-php
</Files>

Those two files are now parsed by PHP. Give it a try: access http://www.example.com/categories?cat_id=3. It will have the same effect as accessing categories.php?cat_id=3.

Parsing the URL

How you parse the URL (that is, the data contained in $_SERVER['PATH_INFO']) is the most important part. The code below belongs in the global loading script.

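if (!empty($_SERVER['PATH_INFO'])) {
	// Strip the leading and trailing slashes, then split into segments:
	// "/art_id/15/page/2/" becomes art_id, 15, page, 2
	$urlParts = explode('/', trim($_SERVER['PATH_INFO'], '/'));

	// Consecutive segments form name/value pairs
	for ($i = 0, $urlPartsCount = count($urlParts); $i < $urlPartsCount; $i += 2) {
		if ($urlParts[$i] === '') {
			continue; // no variable name in this position
		}
		// A trailing name without a value gets an empty string instead of an undefined index
		$_GET[$urlParts[$i]] = isset($urlParts[$i + 1]) ? $urlParts[$i + 1] : '';
	}
}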
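The pairing rule can be tried out in isolation. The sketch below wraps the same idea in a function that returns an array instead of writing into $_GET, which makes it easy to run from the command line. The name parsePathInfo() is made up for this illustration.

```php
<?php
// Hypothetical function name; it pairs up path segments as name/value,
// the same rule the loading script applies to PATH_INFO.
function parsePathInfo($pathInfo) {
	$vars  = array();
	$parts = explode('/', trim($pathInfo, '/'));

	for ($i = 0, $n = count($parts); $i < $n; $i += 2) {
		if ($parts[$i] === '') {
			continue; // nothing useful in this position
		}
		// A name without a following value gets an empty string
		$vars[$parts[$i]] = isset($parts[$i + 1]) ? $parts[$i + 1] : '';
	}

	return $vars;
}

var_export(parsePathInfo('/art_id/15/page/2/'));
echo "\n";
var_export(parsePathInfo('/cat_id/'));
echo "\n";
```

The first call yields art_id => '15' and page => '2'; the second yields cat_id with an empty value, because the URL carries a name but no value for it.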

The next level

Now you have a functional PHP-based engine for interpreting friendly URLs. You have two choices: either hard-code every URL in your links to the new pattern (example: /articles/art_id/15/page/2/), or leave every link intact and build a PHP-based URL rewrite engine. Since that is what this post is named after, I will continue by describing the second option.

This is done with the preg_replace_callback function. You feed it an array of URL patterns, which you construct to cover every URL in your application that you wish to convert to a friendly URL, and a replacement callback function. First, the array of URL patterns:

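$urlPatterns = array(
	// Group 1: script name without ".php"; group 3: query string without the "?"
	// preg_quote() is told about the "~" delimiter so BASE_DIR cannot break the pattern
	'~'.preg_quote(BASE_DIR, '~').'([^\.]+)\.php(\?([0-9a-zA-Z]+[^#"\']*))?~i',
);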

The BASE_DIR constant used in this example is the base directory of your application. It can be either / or the entire http://www.example.com/, depending on how you wrote the links in your application.
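To see what the pattern captures, it can be run against a few sample links with preg_match. This is only a sketch: it assumes BASE_DIR is / (root-relative links), and inspectLink() is a helper invented for the demonstration.

```php
<?php
// Assumption for this sketch: links in the application are root-relative
define('BASE_DIR', '/');

$pattern = '~'.preg_quote(BASE_DIR, '~').'([^\.]+)\.php(\?([0-9a-zA-Z]+[^#"\']*))?~i';

// Hypothetical helper: returns the script name and query string the pattern sees,
// or null when the link is not one the pattern would rewrite
function inspectLink($pattern, $link) {
	if (!preg_match($pattern, $link, $match)) {
		return null;
	}
	return array(
		'script' => $match[1],
		// Group 3 is missing from $match when the link has no query string
		'query'  => isset($match[3]) ? $match[3] : '',
	);
}

print_r(inspectLink($pattern, '/articles.php?art_id=15&page=2'));
print_r(inspectLink($pattern, '/categories.php'));
var_dump(inspectLink($pattern, '/style.css'));
```

The first link yields the script articles with the query art_id=15&page=2, the second yields categories with an empty query, and the stylesheet link is left alone because it does not end in .php.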

The next step is to catch the entire output of a request before it is sent to the browser. You can do this with PHP's built-in output buffering functions. For more on output buffering, read Output Control Functions on PHP.net. First, insert this in the global loading script:

ob_start();

And in the global unloading script:

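// Fetch everything the page has printed so far and stop buffering
$pageContents = ob_get_clean();
// Rewrite every matching link before the page is sent to the browser
echo preg_replace_callback($urlPatterns, 'urlRewriteCallback', $pageContents);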
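If output buffering is new to you, the following sketch shows the mechanism on its own, without any URL rewriting. captureOutput() is a hypothetical helper written for this example; the loading and unloading scripts do the same thing, only split across two files.

```php
<?php
// Hypothetical helper: runs a piece of page code and returns what it printed
// instead of letting it reach the browser.
function captureOutput($render) {
	ob_start();               // from here on, echo/print goes into a buffer
	call_user_func($render);  // the "page" runs and prints as usual
	return ob_get_clean();    // hand back the buffer contents and stop buffering
}

$html = captureOutput(function () {
	echo '<a href="/articles.php?art_id=15">Read more</a>';
});

// Nothing has been sent yet, so the markup can still be changed before printing
echo str_replace('Read more', 'Continue reading', $html);
```

The point is that the page code itself does not need to change: it keeps echoing, and the final say over what is sent belongs to whoever reads the buffer.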

The final task is creating the callback function to handle the replacement of the URLs.

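function urlRewriteCallback($match) {
	$extra = '';
	// Group 3 (the query string without "?") is absent when the link has no parameters
	if (!empty($match[3])) {
		// Links inside HTML usually separate parameters with "&amp;", so accept both forms
		$params = preg_split('~&(?:amp;)?~', $match[3], -1, PREG_SPLIT_NO_EMPTY);
		foreach ($params as $param) {
			// Split on the first "=" only, so a value containing "=" stays intact
			$paramEx = explode('=', $param, 2);
			$value = isset($paramEx[1]) ? $paramEx[1] : '';
			$extra .= $paramEx[0].'/'.$value.'/';
		}
	}
	return BASE_DIR.$match[1].'/'.$extra;
}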

Now you should have a ready-to-work PHP-based URL rewriting engine.
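As a quick end-to-end check, the pieces can be put together in a single file and run on a sample of HTML. This is a sketch under a few assumptions: BASE_DIR is /, the sample markup is invented, and the callback is a compact variant that accepts both & and &amp; as parameter separators.

```php
<?php
define('BASE_DIR', '/'); // assumption: root-relative links

$urlPatterns = array(
	'~'.preg_quote(BASE_DIR, '~').'([^\.]+)\.php(\?([0-9a-zA-Z]+[^#"\']*))?~i',
);

function urlRewriteCallback($match) {
	$extra = '';
	if (!empty($match[3])) {
		// Accept both "&" and the HTML-escaped "&amp;" as separators
		foreach (preg_split('~&(?:amp;)?~', $match[3], -1, PREG_SPLIT_NO_EMPTY) as $param) {
			$pair = explode('=', $param, 2);
			$extra .= $pair[0].'/'.(isset($pair[1]) ? $pair[1] : '').'/';
		}
	}
	return BASE_DIR.$match[1].'/'.$extra;
}

// Invented sample page with two old-style links
$page = '<a href="/categories.php?cat_id=3">Category</a> '
      . '<a href="/articles.php?art_id=15&amp;page=2">Article</a>';

$rewritten = preg_replace_callback($urlPatterns, 'urlRewriteCallback', $page);
echo $rewritten, "\n";
```

Both links come out in the friendly form, /categories/cat_id/3/ and /articles/art_id/15/page/2/. Keep in mind that the pattern is deliberately loose: it rewrites any text of the form /name.php in the page, not only the values of href attributes.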

Thinking ahead

Of course, you shouldn't stop here. You can do a lot more to improve the URLs.

For example, as a first step you could drop the variable names. To do that, you must map the values you get in a request, by position, to the variables your application expects. That way you get:

http://www.example.com/categories/3/
http://www.example.com/articles/15/2/

For this to work you must modify the parsing of the PATH_INFO in the loading script. Using a switch on the basename($_SERVER['SCRIPT_NAME']) you can map the URL data into different variables your application uses.
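Several readers asked in the comments what such a switch might look like. Here is one possible sketch. The function name mapPositionalParams() is hypothetical, and the script and variable names are simply the ones from the examples above.

```php
<?php
// Hypothetical helper: maps positional URL segments to named variables.
function mapPositionalParams($scriptName, $pathInfo) {
	$trimmed = trim($pathInfo, '/');
	$values  = ($trimmed === '') ? array() : explode('/', $trimmed);

	// Each script declares, in order, which variable every segment feeds
	switch (basename($scriptName)) {
		case 'categories':
			$names = array('cat_id');
			break;
		case 'articles':
			$names = array('art_id', 'page');
			break;
		default:
			$names = array(); // unknown script: map nothing
	}

	$vars = array();
	foreach ($names as $i => $name) {
		if (isset($values[$i])) {
			$vars[$name] = $values[$i];
		}
	}
	return $vars;
}

print_r(mapPositionalParams('/articles', '/15/2/'));
```

In the loading script the result could then be merged into $_GET, with $_SERVER['SCRIPT_NAME'] and $_SERVER['PATH_INFO'] as the two arguments, so the rest of the application keeps reading art_id and page as before.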

Another good thing to do is to drop the numeric ids and replace them with words. For example, instead of 3, use the title of the category, Web development. This way you will have even friendlier URLs:

http://www.example.com/categories/web-development/
http://www.example.com/articles/rewriting-dynamic-urls/2/
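One way to get from a title to such a URL segment is a small helper like the one below. It is only a sketch: slugify() is a made-up name, and it handles plain ASCII titles only. The resulting text would be stored in a field with a unique key, so it can be used to look up the record in place of the numeric id.

```php
<?php
// Hypothetical helper: turns a title into a lowercase, hyphen-separated URL segment.
// Only ASCII letters and digits are kept; any other run of characters becomes one hyphen.
function slugify($title) {
	$slug = strtolower($title);
	$slug = preg_replace('~[^a-z0-9]+~', '-', $slug);
	return trim($slug, '-');
}

echo slugify('Web development'), "\n";
echo slugify('Rewriting dynamic URLs'), "\n";
```

These two calls produce web-development and rewriting-dynamic-urls, the segments used in the URLs above.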

I hope this post helps everyone who reads it. And let's pray together for a web made of pretty URLs. Any comments on my method would be appreciated.

Comments

at 20:03 on 26/Oct/2005

Comment by dantefoxfox

  • global loading -->at the start of the page?
  • unloading script --> at the end of the page?
  • It is to put in a separate file?
  • All the code you show in this post where is to put?
  • Please can you tell me more about to lose the number and replace the words, how can I do this? Please be more specific.

Thanks

at 19:06 on 03/Nov/2005

Comment by Valentin A.

  • 1, 2 - correct.
  • 3, 4 - read the paragraphs before each piece of code; it is specified there.
  • 5 - instead of creating a key on a numeric field in a database table, simply create a unique key on a text field.

at 21:28 on 05/Nov/2005

Comment by Chris

Hi Valentin,

this is a really nice howto ;-) Nonetheless i am getting everytime i try it out the following error:

PHP Warning: preg_replace_callback() [function.preg-replace-callback]: requires argument 2, 'urlRewriteCallback', to be a valid callback in contentm.php on line 221

So would it be possible for you if u could send me a working example of your code als zip file?

Thank you very much

Ciao
Chris ;-)

at 15:27 on 09/Nov/2005

Comment by dantefoxfox

Excuse me, this script where is to put?
$urlPatterns = array(
	'~href="'.preg_quote(BASE_DIR).'([^\.]+).php(\?([0-9a-zA-Z]+[^"\']*))?~',
);
?>

at 15:48 on 09/Nov/2005

Comment by dantefoxfox

Please can you show some example?
I don't understand what it is to do with those 5 scripts.
1. the first script (16 lines) it is to include to all the pages?
2. the third (3 lines) it is to put in the first script[global loading script]?
3. the fouth (5 lines) it is to include in the end of the page?
4. The five scripts are there one script or we must saved as each one?
Please make an example.
Please help me...

at 21:36 on 27/Nov/2005

Comment by Valentin A.

If you have trouble with the scripts, try the following: make sure the replacement callback function urlRewriteCallback() is defined before this line:

echo preg_replace_callback($urlPatterns, 'urlRewriteCallback', $pageContents);

at 02:38 on 06/Dec/2005

Comment by Dominik

thx Valentin. great stuff!
but could you please put a little downloadable sample together so we can see how things work exactly?

anyway, i appreciate your work!

at 06:37 on 11/Dec/2005

Comment by Dominik

cool!

at 22:52 on 06/Jan/2006

Comment by Wladston

I've seen this solution, to avoid using an extension less file:

Options +MultiViews
DirectoryIndex index index.php index.html

MultiViews get the request to "index" and try to load "index.*" if "index" is not found.

at 00:47 on 28/Jan/2006

Comment by Gold_Hunter

I use this for my site. very nice script

Thank You

at 06:08 on 21/Feb/2006

Comment by Andres Santos

What if i have

www.blablabla.com/products/snickers
and i want to redirect it to
www.blablabla.com/?section_id=8

and i also have
www.blablabla.com/products/
and i want to redirect it to
www.blablabla.com/?section_id=4

?

at 17:28 on 05/Apr/2006

Comment by HYIP monitor

Good script. Use for my sites

at 07:40 on 28/Apr/2006

Comment by adam

mmmmm goood but have i to rebuild my sitemap.xml , and does spider understand the new generated links , is this good for indexing

at 16:02 on 28/Apr/2006

Comment by Valentin A.

@adam: Yes, you do have to rebuild your sitemap.xml file. But if you already generate it dynamically (which is a very good idea), you just add the URL replace function at the bottom of the script. It will then treat the sitemap file as a normal file and replace all the URLs with friendly ones.

This, the whole technique, is especially good for indexing.

at 18:03 on 16/May/2006

Comment by Gui

I want to drop the variable names like you described in your 'thinking ahead' section, but I am having trouble with my switch statement. Could you provide an example of how to do this? Great scripts by the way.

at 00:28 on 19/Jun/2006

Comment by feha

Why this does not work on IIS5 (Windoze)servers?

at 12:02 on 21/Jun/2006

Comment by neo74

This script is useful! I like use the script, but not working in my localhost server.
I use win xp and apache.
My error:

Not Found
The requested URL /categories/id/music/ was not found on this server.

Apache/1.3.34 Server at localhost Port 80

please hel me!

Thx!

at 19:57 on 21/Jun/2006

Comment by Valentin A.

There's a setting you need to set in the httpd.conf of Apache. Under the

<Directory "path/to/localhost">

tag, you should write

AllowOverride All

at 12:40 on 26/Feb/2007

Comment by Raj Kumar Santoshi

we are using IBM WebSphere 6.0 Server for this website and for development JBOSS server 4.0. We have developed it in JSF/Hibernate/DB2. How we can use method for converting dynamic URL to static URL. Can you guide please. We have used Filter and now we want to build a wrapper class. But how the URL will be changed and vice-versa for response redirencting? Can any other method be helpful?
I will be very thankful to you.

at 04:09 on 18/Apr/2007

Comment by Jim k

Very good one. I get friendly urls but i get page error 404. I gave inserted this code to the .htaccess file, where test is my test.php Any idea why is happening?

<Files ~ "test">
ForceType application/x-httpd-php
</Files>

at 04:22 on 18/Apr/2007

Comment by Jim k

Refering to the previous comment. I have set to AllowOverride All in the apche conf file

at 19:20 on 19/Apr/2007

Comment by George Sorof

I've seen this solution, to avoid using an extension less file.

at 18:06 on 16/Oct/2007

Comment by Scott L.

Valentin, I love this script but I have one strange problem. When I have the text "php" in my html without a dot "." before it then the "php" will render as a forward slash "/". It doesn't do this on my localhost, only on the web server. php.ini, httpd.conf, preg_replace_callback in url_deint.inc? Thanks in advance for your help.

at 10:43 on 17/Oct/2007

Comment by Valentin A.

Thanks Scott for pointing that out. There was a missing "\" from the pattern. Check the new $urlPatterns in the post.

at 04:55 on 24/Dec/2008

Comment by Raksmey

Hi, I have a problem!
when I move, for example the categories.php scripts to categories then the categories become a down loadable file as I click on the link! but it work ok on my localhost not on the hosting site!
Thanks in advance for your help.

at 08:05 on 19/Mar/2009

Comment by Zeck

Wow, very nice script.
Thanks very much,
Zeck

at 20:34 on 20/Mar/2009

Comment by kevin

Need some help..

suppose my link is

index.php?page=OngoingAnime

using your script (which is great).. is rewrites my url to

/index/page/OngoingAnime/

what i want is to remove the index/page ... so that it's only

http://www.mysite.com/OngoingAnime/

instead of http://www.mysite.com/index/page/OngoingAnime/

can you help me plz :(

at 11:21 on 25/Sep/2009

Comment by tonier

I've already used it with my projects. So far so good, except I haven't changed my page.php into page using httaccess rewrite trick.

Thanks for this article.

at 13:04 on 26/Dec/2009

Comment by mahesh

Hi All,
I need help to rewrite my URL for better promotion.

My link is

http://www.domain.com/cat_sell.php?cid=1

where cid=1 is Agriculture

Now i want url like this

http://www.domain.com/category/Agriculture.html

Plz help.

Who's this guy?

Hello! I am Valentin Agachi, an ambitious web developer, and you are viewing my site and weblog.

If you care to find more details about me, you can check out my CV page, or you can contact me.

About this entry

You are reading a post entitled "Rewriting dynamic URLs into friendly URLs".

It was posted on 30 January 2005 and tagged with PHP, Search engines.

© 2003-2010 Valentin Agachi. All rights reserved.