Dessert #11 - Welcome back, Friendly URL's
Posted on 18/9/06 by Felix Geisendörfer
Not too long ago you all heard me say Bye, bye Friendly URL’s. Now, after a lot of feedback and some more thought, I've arrived at what I find a nice compromise between REST'ness, SEO'ness, and FRIENDLY'ness.
For those of you who didn't read the old post, the basic problem was this: it's very nice and convenient to have full control over your URLs. So instead of having /posts/view/1 you might prefer /a-cool-thing or /post/a-cool-thing or something else that is friendlier to both humans and search engines. However, the problem is that you easily break bookmarks and links to your site when you change the title of a post. Another issue is that this method requires all title fields to be unique. Now you can work around all of those, but what I decided in the Bye, bye Friendly URL’s article was that it is too much of a hassle and that you lose several advantages, like the ease of building web APIs.
So for my current project I started to rethink the issue again and remembered the idea brought up by tomo (and Peter Goodman). Instead of using the post title to identify a post in the url, just use it as 'decoration' together with the real id. So for example you could use urls like this in your application: /post/1/my-cool-post. Your action will only check the first parameter (1) in order to identify the post, so if the title ever changes, the link will still work, or to say it differently /post/1* will all lead to the same page.
What I didn't like about this structure was that, to both search engines and people, it could look like "my-cool-post" is a sub-item of /post/1/, which isn't the case. So a couple of days ago I came up with a syntax I really like a lot: /post/1:my-cool-post
Maybe it's just personal preference. But to me it has all the good stuff:
- When the title changes, all other urls (/post/1:*) will remain usable
- The hierarchy is correct: "my-post-title" is the "1"st "post"
- It's REST'ful: /post/1 instead of /posts/view/1 (which is RPC)
- ... and, I think it just looks very pretty ; )
So however you may want to structure your URLs, here is how to accomplish this stuff in CakePHP. The first thing is a route that will send your /post/* URLs to PostsController::show();. My example is from CakePHP 1.2's Router, however it should work just the same with the old one:
In your PostsController::show() action you'll just need a tiny bit of logic to split the id and post title (which I call url_suffix since I have an additional field for it in my project to gain more control over it):
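// Split "1:my-cool-post" into the id and the decorative suffix (@ silences the notice when there is no suffix).
@list($id, $url_suffix) = preg_split('/[:]/', $id, 2);
// Only the id is used to look up the post.
$post = $this->Post->findById($id);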
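To see what the split does without a running CakePHP app, here is a minimal plain-PHP sketch. The function name split_post_param() is made up for illustration and is not part of CakePHP; it uses explode() instead of preg_split() and returns null when no suffix is present.

```php
<?php
// Hypothetical helper for illustration; not part of CakePHP or the original post.
function split_post_param($param)
{
    // A limit of 2 keeps any further colons inside the suffix.
    $parts = explode(':', $param, 2);
    $id = (int) $parts[0];
    // A bare "/post/1" request carries no suffix at all.
    $url_suffix = isset($parts[1]) ? $parts[1] : null;
    return array($id, $url_suffix);
}

// Both a decorated and a bare URL parameter resolve to the same id.
list($id, $url_suffix) = split_post_param('1:my-cool-post');
list($bare_id, $bare_suffix) = split_post_param('1');
```

Since only the id takes part in the lookup, every URL of the form /post/1:* ends up at the same post.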
Ok and for those of you who want to get even fancier here is an improved version:
@list($id, $url_suffix) = preg_split('/[:]/', $id, 2);
$post = $this->Post->findById($id);
The difference between this function and the first one is that it checks whether the $url_suffix in the user's request is the same as the one currently assigned to our post. In case it has changed (and our user is using an old bookmark, for example), we redirect them to a page with the proper url_suffix attached to the URL. This is not all that important for normal users, but it's a very efficient way of telling search engines that your page has moved.
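The check described above can be sketched in plain PHP, outside of any controller. The function name canonical_post_url() and the flat $post array with 'id' and 'url_suffix' keys are assumptions for illustration; in a CakePHP action the returned URL would be handed to $this->redirect() together with a 301 status.

```php
<?php
// Hypothetical helper, not part of CakePHP or the original post.
// Returns the URL to redirect to, or null when the requested suffix is current.
function canonical_post_url($param, $post)
{
    // "1:old-title" -> array('1', 'old-title'); a bare "1" has no suffix.
    $parts = explode(':', $param, 2);
    $requested_suffix = isset($parts[1]) ? $parts[1] : '';

    if ($requested_suffix === $post['url_suffix']) {
        return null; // Already canonical, nothing to do.
    }
    // urlencode() guards against suffixes that contain unsafe characters.
    return '/post/' . $post['id'] . ':' . urlencode($post['url_suffix']);
}

// Assumed sample record; a real one would come from the database.
$post = array('id' => 1, 'url_suffix' => 'my-cool-post');
$target = canonical_post_url('1:old-title', $post);
```

Returning the target instead of redirecting directly keeps the decision testable on its own; the permanent (301) status is what tells search engines that the old URL has moved.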
Oh and I'll post a function that'll convert "This is my post" type of titles into url ones like "this-is-my-post". It's not all that difficult, but I want to make sure it takes care of a lot of eventualities (Umlaute/Utf8 chars, double spaces, ...).
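As a rough idea of what such a conversion involves, here is a small sketch. It is not the function announced above: the name make_url_suffix() and the short character map are assumptions, and the map only covers a handful of German characters rather than UTF-8 in general.

```php
<?php
// Hypothetical helper for illustration; the character map is deliberately tiny.
function make_url_suffix($title)
{
    // Transliterate a few common German characters before lowercasing.
    $map = array(
        'ä' => 'ae', 'ö' => 'oe', 'ü' => 'ue',
        'Ä' => 'Ae', 'Ö' => 'Oe', 'Ü' => 'Ue', 'ß' => 'ss',
    );
    $slug = strtolower(strtr($title, $map));
    // Collapse every run of other characters (spaces, punctuation) into one hyphen.
    $slug = preg_replace('/[^a-z0-9]+/', '-', $slug);
    // Drop hyphens left over at either end.
    return trim($slug, '-');
}

$suffix = make_url_suffix('This is my post');
```

Because whole runs of non-alphanumeric characters are replaced at once, double spaces and trailing punctuation do not produce double or dangling hyphens.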
--Felix Geisendörfer aka the_undefined
Madarco: I think the redirect solves the problem you mentioned pretty well. Even though the link will work, the URL in the address bar of the user's client will change after clicking on it. At least that's good enough for my needs.
Sweet. I like it. It is pretty and makes sense BUT how do machines like it? I mean having a colon in the URL?? Hmm... Aren't colons reserved for declaring a port number? Just a thought. Anyone know anything more about this?
I think you have a little typo on the second version of the function.
you set $post to the database query but then you use the $page variable...
Am I wrong, or did I miss something?
Gustavo: You are correct, I messed that up. The Controller I use this stuff for in my project is called PagesController, but since CakePHP comes with a "PagesController" as well I changed the name for my example so people (especially newbies) wouldn't get confused. Well I messed that up, didn't I ; )? Anyway, it's corrected now.
Ben: Well, that question cannot be answered. Google will not tell you how their algorithm works, and neither will Yahoo or any of the other big "machines" we usually care about. Anyway, I've seen applications using the colon in URLs before, like DokuWiki does, and so I think it should be alright.
I like it felix. I'll call on some SEO specialists and check on that colon thing. I'm slightly concerned about its legality, though.
For now, it's gone straight into my site. :-D
you might also want to consider running the corrected slug through urlencode();
$this->redirect( '/post/' . $post['Post']['id'] . ':'
. urlencode( $post['Post']['url_suffix'] ), 301 );
Hey Cheekysoft, let me know what those seo guys think about it, that would be cool to know.
About the urlencode(): I use a function to create the url_suffix which already uses urlencode() as the last filter, so I don't need it at that point of my code, but other than that, I agree. Oh, and the reason I use the term suffix instead of slug is that I consider it more descriptive. The first time I read slug in WordPress I was pretty confused about its meaning ... ; ).
It would be nice to put the urlencode() in there, as I expect people (like me :-) ) will copy and paste the example code.
Mail has been sent to the SEO people asking for their assessment of the approach. I'll let you know what they have to say when I get the reply.
Cheekysoft: Alright, let me know when you get a reply : ).
More of a "real world" example:
Brandon: I like having the id first and the look it creates. To me:
looks better than:
I also think having id:title makes more sense to somebody looking at it, instead of appending this seemingly unrelated number at the end. Search engines certainly want the meat first, but to me they come second after humans.
Brandon: Feel free to use whatever formatting you like. I personally prefer the id:title style as it seems more human friendly to me. However, your version is probably nicer for search engines. As long as you stay consistent about it, there shouldn't be any issue ; ).
[...] As those of you who run a WordPress install probably already know, WP has a nice feature that converts the title of any post one writes into a more url friendly version, a so called post slug. The method it uses is pretty simple: lowercase everything, replace whitespaces with hyphens and convert non url friendly characters into ones that are. Now as I already mentioned in a post a while back, I'm using pretty url's that are RESTful these days. So in the early phase of the app that I'm finishing up right now, I simply had a field called URL Slug where I had to enter this url suffix manually. But since neither I, nor the client this app will ship to are into filling in this field all the times, I made it optional and created a WP-like function for creating the url slug from the title if the field was left blank by the user. [...]
This doesn't seem to be working in the latest build of CakePHP 1.2.
Aaron: Yeah, that's because named parameters use the colon too. If you don't want this you can put:
var $namedArgs = false;
In your AppController. I believe it also takes an array of actions where you can specify which ones use named args and which don't. Give it a try.
Another great article from ThinkingPHP
I am writing a financial system and I've struggled on how I want to do the URLs.
/payments/add seems obvious
but for editing a payment, I like:
I wonder if I will see any problems doing that because I also want to release an API in the future.
cbmeeks: That is a router question. I think the 1.2 router in CakePHP can handle it, but you would need to do some research of your own on it. In 1.1 you can use the magic $from_url hack I came up with a long time ago:
$from_url = preg_replace('/([^\/]+)\/([^\/]+)\/([^\/]+)/', '\\1/\\3/\\2', $from_url);
The regex above is off the cuff and probably not quite right / complete but should give you an idea how it works. If you decide to use it put it anywhere in routes.php.
[...] Important: A while after publishing this post I came up with a solution to the problem which I’ve documented in this post. “aas” [...]
[...] If you've been reading my blog for a while then you might know that I'm in love with a certain url pattern. For those who don't, here is the synopsis. Instead of /posts/view/5 I like my urls to look like /posts/5:my-post-title. I don't want to go into the many advantages of those urls right now (this will be a separate post). But instead I want to show you how they can be accomplished using the 1.2 router without any custom hacking: [...]
But, isn't a colon in a url used to denote a port?
This seems like a misguided approach. For one thing, there is no reason to regenerate the slug if you change the title after a post is created. Secondly, it's very easy to avoid collisions.
If you still want the id, then suffix or prefix it to the slug. I agree that making it a URL segment is a bad idea, it implies hierarchy. But the colon is messy, and violates the spec.
I think suffixing the id is the way to go:
jo Andrew: My colon isn't messy nor violating anybody. Those are serious accusations, man ; ).
I'd say the least you can do in order to get the forgiveness of my colon is study RFC 1738 and RFC 3986. They have some cool stuff to say on legit chars for 'path' components of an URI.
Now on to your accusations regarding the concept:
* The reason I change titles of my post is because I am an idiot
* I usually have typos or bad mistakes in my initial titles
* I sometimes don't notice for days at a time, after people already linked to the wrong URL
* Thus changing the url along with the post title makes me look less like an idiot because I can hide my traces
And as far as the position of the id goes:
* Semantics matter, even in urls. Or do you write a numbered list like 'Start your IDE 1.', 'Type in foo() 2.', etc.?
* Avoid confusion: cakephp_tutorial_57 < - is that the 57th tutorial on CakePHP? '57:cakephp-tutorial' is much clearer.
Last but not least: Stop being one of them people who talk standards and specs and shit if you a) don't bother to read them and b) aren't willing to violate specs in order to serve the greater good.
Anyway, next time use a real e-mail address, usually I make comments like these in private ; ).
I didn't realize that my comments could be easily construed as mean-spirited, please accept my apologies. I've had to maintain an application that used colons in urls, and it was a PITA, perhaps some of that annoyance affected my remarks.
I don't think I'm a blind adherent to standards, and I agree that sometimes they should be violated.
I also agree that I was wrong to suggest suffixing, and I'll now say I think it should be prefixed.
I still don't think colons should be in URLs. I feel like they have specific meanings, none of which really relates to this kind of usage. I also don't think IIS will serve them, and I think there are some other common tools that won't work with them. I should not have said they violate either RFC, they are not specifically prohibited.
I wonder if an equals sign would be OK:
Finally, if you don't like anonymous comments on your blog, that's fine, delete mine.
Andrew: Hey no worries, I just felt like going on a little rant and your comment came just in time ; ).
Anyway, I'll continue to fight for the char rights of my little colon. RFC 3986 makes mention of the colon being used to denote a 'this:that' situation. Which I think is exactly what I'm doing: '<id>:<title>' essentially means this id belongs to that title. Also, both RFCs state that the colon can be a reserved character, which is what I'm using it for here. Btw. CakePHP also uses the colon for named parameters and I haven't heard of any problems with this so far. Also, IIS is not on my list of considerations, since I have control over the server side. It's a different story with IE on the client side : /.
The '=' is probably a legitimate consideration; at this point it really comes down to personal preference. I like my colon and if others don't, that's OK. But please don't call him messy and stuff, he doesn't like it ; p.
Regarding anonymous comments: I can live with them. But since the e-mail addresses aren't shown I don't really see what you are trying to achieve, especially since you used the same IP for both posts which indicates that you are only half-serious about hiding your real identity ; ).
Anyway, don't worry - the colon police is busy enough fighting evil products made by microsoft these days so there is little chance they'll try to catch you at any point soon ; p.
PS: I accept and appreciate the apologies, especially considering my semi-provoking response to your comment, keep the good attitude!
I am stuck with a similar problem.
Users can upload files alright, but when it comes to displaying or downloading them, I run into the same controller issues due to the way CakePHP parses the URLs.
I would like the end user to be able to download or display a text file or image when clicking on an HREF hyperlink.
Can your snippet help me with this?
Just a quick comment here for the folk who are worried about a colon (:) denoting a port, and therefore being bad to use in the URL...
As far as I've always been able to tell, IF you are using the colon to denote a port, it would be used at the end of the domain name (example - http://debuggable.com:80/) and therefore should not be a worry when using it further to the right (in the path component of the URL).
This is one of those cases where position matters. If it's not right there at the end of the domain name, then it's not a port. It's something else...
AFAIK, there's not a web browser in existence that sees it any other way.
(I'm not 101% sure about this, but I am fairly certain this is the way it works. 99.9% sure mebbe? Correct me if I'm wrong. I'm only human after all.)
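One quick way to check this is PHP's own parse_url(), sketched below. The example.com URLs are placeholders, and this only shows how PHP splits a full URL, not how every browser or server behaves.

```php
<?php
// Placeholder URLs for illustration.
// A colon directly after the host name is read as the port ...
$with_port = parse_url('http://example.com:80/post/1:my-cool-post');
// ... while a colon further to the right simply stays part of the path.
$without_port = parse_url('http://example.com/post/1:my-cool-post');

$has_port = isset($without_port['port']); // false: no port was given
$path = $without_port['path'];            // '/post/1:my-cool-post'
```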
One more quick comment. I did not expect that my "example" link would be converted into a live URL, else I would have placed spaces around the parenthesis so that it would lead to the correct page rather than the 404 error page. Please feel free to edit it and/or delete this comment. :)
I think that the same thing has been implemented in Joomla 1.5. At first I wondered what the use of the ids before the actual title aliases was, like www.somedomain.com/40-article-name
and I browsed through sites to see how Joomla implements this kind of SEF URL and why. Thanks to your posts, I now understand why things were implemented that way.
After contemplating this SEO friendly URLs thing for a day, I was about to come back and ask the question "How do the search engines link to this properly if the article slug part is not actually used to look up the article?" but then I got the idea to just check on it myself. Here's what I've discovered...
I did not test other search engines yet, because I do not know exactly how, but I DO know how to easily test using Google. You use the "site:www.sitetosearch.com" query syntax (and so that is what I did). I did a search for "site:debuggable.com friendly url" and what I found is that the Google search robot DOES index the entire URL, including the UID (unique ID) after the colon (:), and so (at least for Google) this type of URL scheme is not at all a worry or an issue.
Knowing this, I will be using this scheme for anything I code which requires it (blogs, wikis, etc.) and I thank you guys for thinking about it and discussing it with enough people to work out the kinks. :)
P.S.: I would maybe not use it until testing other search engines, except these days Google holds such a huge share of the search engine market that they are pretty much the only one I am personally concerned with being properly listed in. The rest of them can just list the top-level of my sites and static pages, and I'll be happy (though of course it'll be nice if other search engines also index the site like Google does). ;)
Great post! I have a question though: what do you do to reflect hierarchy in this case?
For example: www.example.com/post/1:how-to-make-soup
That's fine, but what if you have sub-pages/posts?
For example: www.example.com/post/1:how-to-make-soup/7:tomato-soup
Would you do it that way? If so... how do you configure CakePHP to make that work?
If I look at the URLs on that site I can see that you changed your mind because now your IDs come after your title. If I am right you also wrote an article about that. Maybe you could link both articles.
hmm interesting post by the way :)
Thanks for rss comments haha.
Hi, I use this structure too :)
and I've found that in wordpress it can be achieved by simply setting the permalinks to:
wordpress will ignore the postname and use only the id.
I use it, but I wonder if this can be a problem: for example someone can make the link yourblog.com/1/F**CK appear as a page of your blog