- In Depth Magento Dispatch: Top Level Routers
- In Depth Magento Dispatch: Standard Router
- In Depth Magento Dispatch: Stock Routers
- In Depth Magento Dispatch: Rewrites
- In Depth Magento Dispatch: Advanced Rewrites
We’re in the middle of a series covering the various Magento sub-systems responsible for routing a URL to a particular code entry point. So far we’ve covered the general front controller architecture and four router objects that ship with the system, as well as the basics of Magento’s two request rewrite systems. Today we’ll be diving deeper into Magento’s database-based request rewrite system. You may want to bone up on parts one, two, three, and four before continuing.
Loading a Rewrite
Last time, we glossed over how a rewrite object would take the two path variables (one slashed, one non-slashed)
/foo/baz/bar.html
/foo/baz/bar.html/
and use them to load a rewrite object’s data. Let’s de-gloss those details. If we look at the definition for loadByRequestPath (remembering that $path is actually the array with our two string paths)
#File: app/code/core/Mage/Core/Model/Url/Rewrite.php
public function loadByRequestPath($path)
{
    $this->setId(null);
    $this->_getResource()->loadByRequestPath($this, $path);
    $this->_afterLoad();
    $this->setOrigData();
    $this->_hasDataChanges = false;
    return $this;
}
we see a pretty standard loading pattern. The code that ultimately does the loading is (as expected) in the resource model, in a method with the same name
$this->_getResource()->loadByRequestPath($this, $path);
The loadByRequestPath method on the resource model ends up being a bit more complex than a standard model select SQL query, for reasons we’ll explore below.
Custom Query
The first step in loading our rewrite object is to query the database for any records that match our path values and match the current store id (set earlier on the passed in rewrite $object).
That’s what the first half of loadByRequestPath is for.
#File: app/code/core/Mage/Core/Model/Resource/Url/Rewrite.php
public function loadByRequestPath(Mage_Core_Model_Url_Rewrite $object, $path)
{
    if (!is_array($path)) {
        $path = array($path);
    }
    $pathBind = array();
    foreach ($path as $key => $url) {
        $pathBind['path' . $key] = $url;
    }
    // Form select
    $adapter = $this->_getReadAdapter();
    $select = $adapter->select()
        ->from($this->getMainTable())
        ->where('request_path IN (:' . implode(', :', array_flip($pathBind)) . ')')
        ->where('store_id IN(?)', array(Mage_Core_Model_App::ADMIN_STORE_ID, (int)$object->getStoreId()));
    $items = $adapter->fetchAll($select, $pathBind);
This is much simpler than it looks. The query we’re constructing uses the Zend Framework’s object oriented database syntax along with named parameters. If you’re not familiar with the concept, the code above ultimately generates a query string that looks like this (notice :path0 and :path1)
SELECT `core_url_rewrite`.*
FROM `core_url_rewrite`
WHERE
(request_path IN (:path0, :path1))
AND
(store_id IN(0, 1))
Then, the select object and the $pathBind array are passed to the fetchAll method. The $pathBind array contains our two normalized path information strings, and looks something like this
array
'path0' => string 'electronics/cell-phones.html'
'path1' => string 'electronics/cell-phones.html/'
The values from $pathBind are swapped in for the named parameters in the query, with the array keys (path0, path1) matched against the named query parameter tokens (:path0, :path1). These key names were generated at the beginning of the method with the following
#File: app/code/core/Mage/Core/Model/Resource/Url/Rewrite.php
$pathBind = array();
foreach ($path as $key => $url) {
    $pathBind['path' . $key] = $url;
}
A key was generated for each $url, and used to create the $pathBind array. Then, when creating the bind parameters for the query
->where('request_path IN (:' . implode(', :', array_flip($pathBind)) . ')')
we use the same array keys by flipping $pathBind (so the generated key names become the array’s values) and imploding the results with ", :" as the separator.
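The placeholder bookkeeping can be tried out without a database. What follows is a minimal standalone sketch, not Magento code: the sample paths are the ones used above, and the comparison with array_keys is my own addition to show that flipping the array is just one way of getting at the key names.

```php
<?php
// Standalone sketch: only the key/placeholder bookkeeping, no database involved.
$path = array('electronics/cell-phones.html', 'electronics/cell-phones.html/');

// Give each path a named-parameter key: path0, path1, ...
$pathBind = array();
foreach ($path as $key => $url) {
    $pathBind['path' . $key] = $url;
}

// array_flip() swaps keys and values, so implode() ends up joining the key names.
$flipped = implode(', :', array_flip($pathBind));

// array_keys() produces the same list of names more directly.
$keyed = implode(', :', array_keys($pathBind));

$where = 'request_path IN (:' . $flipped . ')';

echo $where, PHP_EOL;          // request_path IN (:path0, :path1)
var_dump($flipped === $keyed); // bool(true)
```

The resulting string contains only placeholder names. The actual path values travel separately in $pathBind when the query is executed.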
Next, you may have noticed that in addition to searching for our two request_path variables, we’re also searching for two store ids
(store_id IN(0, 1))
Magento allows you to create different rewrites for different stores, so that explains one of the store ids, but what about the other one? In addition to searching for a rewrite with the store id on the passed in rewrite object ($object->getStoreId()), Magento will also always search for items that match the admin store_id (typically 0). This indicates that in addition to providing rewrites for the frontend application, the request rewrite system can provide rewrites for the admin console. It also creates the possibility, although rare, that a rewrite would apply to both the admin console and the frontend. The logic that resolves these conflicts is our next topic.
Resolving the Right Rewrite
This combination of multiple store ids and multiple rewrite paths creates a problem. Once our query runs and populates the result set $items, we could have up to four different rows returned. After querying, we need a way of prioritizing which of these four items to use.
Magento solves this problem with a compact piece of bitwise logic that implements a priority system: each row is assigned a penalty, and the loop keeps the row with the lowest penalty, stopping early if it finds a row with a penalty of zero
#File: app/code/core/Mage/Core/Model/Resource/Url/Rewrite.php
$mapPenalty = array_flip(array_values($path)); // we got mapping array(path => index), lower index - better
$currentPenalty = null;
$foundItem = null;
foreach ($items as $item) {
    $penalty = $mapPenalty[$item['request_path']] << 1 + ($item['store_id'] ? 0 : 1);
    if (!$foundItem || $currentPenalty > $penalty) {
        $foundItem = $item;
        $currentPenalty = $penalty;
        if (!$currentPenalty) {
            break; // Found best matching item with zero penalty, no reason to continue
        }
    }
}
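To see what numbers this loop actually produces, here is a standalone sketch that wraps the same loop in a function and feeds it hand-written rows. The function name pickRewrite and the sample rows are hypothetical. Only the penalty expression is copied from the listing above, so treat the results as a reading of that one line and not as a statement about every Magento version.

```php
<?php
// Hypothetical helper: the loop body mirrors the listing above, the
// function name and the sample rows are invented for illustration.
function pickRewrite(array $items, array $path)
{
    $mapPenalty     = array_flip(array_values($path)); // path => index
    $currentPenalty = null;
    $foundItem      = null;
    foreach ($items as $item) {
        // In PHP "+" binds tighter than "<<", so the index is shifted
        // by 1 (store row) or by 2 (admin row, store_id 0).
        $penalty = $mapPenalty[$item['request_path']] << 1 + ($item['store_id'] ? 0 : 1);
        if (!$foundItem || $currentPenalty > $penalty) {
            $foundItem      = $item;
            $currentPenalty = $penalty;
            if (!$currentPenalty) {
                break; // zero penalty, nothing can beat it
            }
        }
    }
    return array($foundItem, $currentPenalty);
}

$path = array('electronics/foo', 'electronics/foo/'); // index 0 is the path as requested

// Penalty for each of the four possible rows, taken one at a time.
$penalties = array();
foreach ($path as $requestPath) {
    foreach (array(1, 0) as $storeId) {
        $row = array('request_path' => $requestPath, 'store_id' => $storeId);
        list(, $penalty) = pickRewrite(array($row), $path);
        $penalties[$requestPath . ' store ' . $storeId] = $penalty;
    }
}
print_r($penalties);

// Two rows for the as-requested path: the first row returned wins.
$natural = array(
    array('request_path' => 'electronics/foo', 'store_id' => 0),
    array('request_path' => 'electronics/foo', 'store_id' => 1),
);
list($first)  = pickRewrite($natural, $path);
list($second) = pickRewrite(array_reverse($natural), $path);

// Two rows for the alternate-slash path: the lower penalty wins.
$alternate = array(
    array('request_path' => 'electronics/foo/', 'store_id' => 0),
    array('request_path' => 'electronics/foo/', 'store_id' => 1),
);
list($winner) = pickRewrite($alternate, $path);

echo $first['store_id'], ' ', $second['store_id'], ' ', $winner['store_id'], PHP_EOL;
```

Read this way, both rows for the path as requested score 0, so the loop stops at whichever one comes back first. For the alternate-slash path the store row scores 2 and the admin row scores 4.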
If you’re not up for a quick primer on bitwise operations, feel free to skip ahead to the next section.
Bitwise operators are always an interesting trip in the land of non-compiled programming languages. They come from the early days of programming, and operate directly on the binary representation of a variable. For example, let’s consider the bitwise OR operator (|)
$a = 5; //expressed as the binary 0101
$b = 3; //expressed as the binary 0011
$c = $a | $b;
echo $c;
The above bit of code will output the number 7. How do you take 5 and 3 and get 7? Take a look at the binary version of the numbers
0101 #binary 5
0011 #binary 3
-----
0111
A bitwise OR looks directly at each binary column (also known as a “bit” of memory) and will return a new binary number with the columns set true (or 1) where either of the original numbers’ columns was set true.
In the above example, the rightmost column of binary 5 is “1” (0101), and the rightmost column of binary 3 is also “1” (0011). This means the rightmost column of our result is a “1” (1 OR 1 == true). Follow this logic through for the remaining columns, and you arrive at 0111.
The binary number 0111 translates to 7 in decimal, and that’s how 5 and 3 make 7.
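If you’d like to watch the columns line up yourself, this small sketch prints each number in binary next to its decimal value. It relies only on PHP’s built-in printf, whose %b conversion prints an integer in binary.

```php
<?php
$a = 5;
$b = 3;
$c = $a | $b; // bitwise OR, column by column

// %04b prints an integer as binary, zero-padded to four digits.
printf("%04b  (%d)\n", $a, $a);
printf("%04b  (%d)\n", $b, $b);
echo '----', PHP_EOL;
printf("%04b  (%d)\n", $c, $c);
```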
In the early days of programming, when the performance of every instruction mattered, bitwise operators allowed programmers to come up with a number of clever tricks that took fewer instructions on the then simple processors than the more complicated algorithms for doing regular math. Wikipedia has a good overview of the topic if you want to learn more.
If you’re using bitwise operators every day (i.e. you’re a C programmer or in school) and are binary inclined, their logic becomes relatively simple. If you’re not dealing with binary math on a regular basis, they become a giant headache. Additionally, in the higher level languages you often lose the performance improvements that are available with the lower level languages, or the performance increases are trivial enough not to warrant the added complexity to you and your team.
In Plain English (or, What Language do they Speak in What?)
That’s all we’re going to say on bitwise operators. If you’re so inclined, picking apart how this particular chunk of code works would be a useful exercise in bitwise shifts (the << operator).
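For anyone taking up that exercise, here is a small starting point. It shows what a left shift does to a number, and how PHP groups a shift that sits next to an addition. The variable names are mine, and the only language facts relied on are the behavior of << and PHP’s operator precedence.

```php
<?php
// Shifting left by n is the same as multiplying by 2 to the power of n.
$shiftOne  = 1 << 1; // binary 0001 becomes 0010
$shiftTwo  = 1 << 2; // binary 0001 becomes 0100
$shiftZero = 0 << 2; // zero stays zero no matter how far it moves

// "+" has higher precedence than "<<", so the addition happens first.
$unparenthesized = 1 << 1 + 1;   // grouped as 1 << (1 + 1)
$parenthesized   = (1 << 1) + 1; // shift first, then add

var_dump($shiftOne, $shiftTwo, $shiftZero, $unparenthesized, $parenthesized);
```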
More directly useful is a plain English explanation of the priority levels, and which items Magento will pick over the others.
Restating the problem, we have a URL string in its natural state (matches the current request)
/electronics/cell-phones.html
/electronics/foo/
and a URL string in its unnatural state, with the trailing slash removed or added (has or does not have a slash, depending on the natural state)
/electronics/cell-phones.html/
/electronics/foo
We also have a real store id, and the admin store id. This gives us four possible rewrites that could be returned: Natural with a Store ID, Natural with the Admin Store ID, Adulterated with a Store ID, Adulterated with the Admin Store ID.
When loading a URL rewrite, the penalty expression as written in the core code works out to the following order of preference (PHP’s + binds tighter than <<, so the path index is shifted by 1 for a store row and by 2 for an admin row)
- A URL in its Natural state, with either the Admin Store ID or the Non-Admin Store ID (both score a penalty of zero, so whichever row the database returns first wins)
- A URL in its adulterated state, with the Non-Admin Store ID set
- A URL in its adulterated state, with the Admin Store ID set
This can be somewhat confusing to figure out and trust, as the result for the Natural rows is dependent on the order the rows are returned in, and there’s no ORDER BY for the query.
It’s a confusing and confounding state of affairs. Although this code is here to let you be lazy about your trailing slashes, I’d highly recommend you don’t add to the confusion with that laziness. If your team (or the previous team) hasn’t been able to manage that, then just keep the above ordering in mind, and don’t be afraid to drop a version of Rewrite.php into your local code pool with some logging added to suss out why your rewrites aren’t being applied.
Where do Rewrite Objects Come From
So far we’ve treated the database rewrite system as a generalized, system level tool. However, throughout Magento’s life (either by design or engineering expedience), the database rewrite system started to become intimately, and inseparably, intertwined with the shopping cart application. We’ve already seen hints of this with the store id situation mentioned above, but things shift into overdrive when we consider category and product URLs.
The key problem is this. No store owner wants their product or category pages to have URLs which look like this
catalog/category/view/id/8
catalog/product/view/id/16/
They want the category and product name to be in the URL so search engines (i.e. Google) will drive more traffic to these pages. Google rewards sites that use semantic URLs, or put a different way, Google saw a pattern in the early web where sites that used semantic URLs tended to be more relevant, so those sites ranked higher in the search engines. It’s inevitable that rewrites would need to become a core part of the cart offering, and not a stand-alone system. From a system developer’s point of view, this presents a tricky problem because there’s no set list of categories or product names, which means they can’t be incorporated directly into the routing/MVC system. Here’s what Magento did.
When you create a category in the Magento admin console, one of the fields is named URL Key.
If a category has this field set, Magento will automatically create a rewrite for the category landing page, as well as a product URL based on the full category tree path. Product objects have their own URL key as well.
Additionally, if you change the URL Key for a category, not only will Magento create a new set of rewrites for you, it will also use the rewrite’s “options” field to create permanent HTTP 301 redirects from the old pages to the new, attempting to preserve any existing SEO juice. An HTTP status code of 301 is meant to indicate that a web page has moved somewhere else, and you should stop looking for it here. Sort of like a forwarding address. Without these redirects in place, Google treats the moved page as brand new and ranks it accordingly.
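As a rough illustration of how an options value could translate into a redirect, here is a standalone sketch. It is not Magento’s code. The function name is hypothetical, the sample rows are invented, and the option values RP (permanent) and R (temporary) are an assumption on my part about how Magento 1 stores the redirect type, so check them against the rows in your own core_url_rewrite table.

```php
<?php
// Hypothetical helper, not Magento code. The 'RP' and 'R' option values are
// an assumption about how the redirect type is stored; verify them locally.
function redirectStatusForOptions($options)
{
    $flags = explode(',', (string) $options);
    if (in_array('RP', $flags, true)) {
        return 301; // permanent: the old URL has moved for good
    }
    if (in_array('R', $flags, true)) {
        return 302; // temporary redirect
    }
    return null;    // no redirect, the target path is dispatched internally
}

// Invented sample rows: an old URL key redirecting to the new one, and the
// new URL key pointing at the real controller path.
$rows = array(
    array('request_path' => 'old-name.html', 'target_path' => 'new-name.html', 'options' => 'RP'),
    array('request_path' => 'new-name.html', 'target_path' => 'catalog/category/view/id/8', 'options' => null),
);

foreach ($rows as $row) {
    $status = redirectStatusForOptions($row['options']);
    $label  = ($status === null) ? 'internal rewrite' : ('HTTP ' . $status);
    echo $row['request_path'], ' => ', $row['target_path'], ' (', $label, ')', PHP_EOL;
}
```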
That’s what all those extra data properties on the rewrite object are for
array(
'url_rewrite_id' => string '213' (length=3)
'store_id' => string '1' (length=1)
'category_id' => string '25' (length=2)
'product_id' => string '133' (length=3)
'id_path' => string 'product/133/25' (length=14)
'request_path' => string 'electronics/cameras/accessories/universal-camera-case.html' (length=58)
'target_path' => string 'catalog/product/view/id/133/category/25' (length=39)
'is_system' => string '1' (length=1)
'options' => null
'description' => null
);
The category_id, product_id, and id_path properties are there so Magento can keep track of which rewrite applies to a particular category, product, or both. The is_system property might be more accurately named is_canonical_rewrite_for_category_or_product_category_combo. That is, is_system is a boolean flag that Magento sets to let itself know which rows are system level rewrites, created by Magento, that currently represent the “main” URL for a particular entity (as opposed to the redirection rewrites, which are also created by the system, but have their is_system flag set to false).
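To make the role of is_system a little more concrete, here is a standalone sketch that picks the “main” URL for a product and category combination out of a handful of rows. The function name is hypothetical. The first sample row reuses values from the dump above, while the second (redirect) row, including its id_path, is invented for illustration.

```php
<?php
// Hypothetical helper: returns the request_path of the row flagged as the
// system ("main") rewrite for a product/category pair, or null if none.
function canonicalRequestPath(array $rows, $productId, $categoryId)
{
    foreach ($rows as $row) {
        if ((int) $row['product_id'] === (int) $productId
            && (int) $row['category_id'] === (int) $categoryId
            && (int) $row['is_system'] === 1
        ) {
            return $row['request_path'];
        }
    }
    return null;
}

$rows = array(
    // Invented redirect row left behind by an old URL key.
    array(
        'product_id'   => '133',
        'category_id'  => '25',
        'id_path'      => 'example-redirect-id',
        'request_path' => 'electronics/cameras/accessories/old-camera-case.html',
        'is_system'    => '0',
    ),
    // Values taken from the dump above.
    array(
        'product_id'   => '133',
        'category_id'  => '25',
        'id_path'      => 'product/133/25',
        'request_path' => 'electronics/cameras/accessories/universal-camera-case.html',
        'is_system'    => '1',
    ),
);

$canonical = canonicalRequestPath($rows, 133, 25);
$missing   = canonicalRequestPath($rows, 999, 25);

var_dump($canonical, $missing);
```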
This data is also used to drive the Admin Console’s rewrite UI at
Catalog -> URL Rewrite Management
as well as to determine which URL should be used when a programmer uses the Category or Product object’s URL helper methods.
Catalog Rewrite Generation Code
This raises the question of where the code for automatically generating these rewrites lives. You might think it lives in the category and product save methods (it doesn’t), or maybe in a post save event (wrong again). Magento’s automatically generated rewrites are managed by the indexing engine. If you browse to
System -> Index Management
you’ll see one of the index processors is named Catalog URL Rewrites. This is the process responsible for ensuring the database request rewrites are up to date, and reflect the information stored with a product or category object. (With apologies, the indexing system is an article series in and of itself, so we’ll be skipping over some of its nuances)
When you run the Catalog URL Rewrites index, it ultimately instantiates a catalog/indexer_url model, and calls its reindexAll method
#File: app/code/core/Mage/Catalog/Model/Indexer/Url.php
public function reindexAll()
{
    Mage::getSingleton('catalog/url')->refreshRewrites();
}
As you can see, “indexing” the catalog urls simply means instantiating a catalog/url model and calling its refreshRewrites method. If we dive into that method
#File: app/code/core/Mage/Catalog/Model/Url.php
public function refreshRewrites($storeId = null)
{
    if (is_null($storeId)) {
        foreach ($this->getStores() as $store) {
            $this->refreshRewrites($store->getId());
        }
        return $this;
    }
    $this->clearStoreInvalidRewrites($storeId);
    $this->refreshCategoryRewrite($this->getStores($storeId)->getRootCategoryId(), $storeId, false);
    $this->refreshProductRewrites($storeId);
    $this->getResource()->clearCategoryProduct($storeId);
    return $this;
}
We can see that, from a high level, the work of the system automatically creating rewrites is, for each store id,
- Clearing out old rewrites for deleted products and root categories (clearStoreInvalidRewrites)
- Updating rewrites for category pages (refreshCategoryRewrite)
- Updating rewrites for product pages (refreshProductRewrites)
- Cleaning up rewrites for products no longer in a particular category (clearCategoryProduct)
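The control flow here (no store id means “do every store”, and each store gets the same four steps in the same order) can be sketched without Magento at all. The class below is a hypothetical stand-in: its step names are borrowed from the list above, but the steps only record that they ran, and the store ids are made up.

```php
<?php
// Hypothetical stand-in class: only the control flow mirrors refreshRewrites(),
// the four steps just record that they were called.
class UrlRefreshSketch
{
    public $log = array();
    private $storeIds;

    public function __construct(array $storeIds)
    {
        $this->storeIds = $storeIds;
    }

    public function refreshRewrites($storeId = null)
    {
        if ($storeId === null) {
            // No store given: run the whole sequence once per store.
            foreach ($this->storeIds as $id) {
                $this->refreshRewrites($id);
            }
            return $this;
        }
        $steps = array(
            'clearStoreInvalidRewrites',
            'refreshCategoryRewrite',
            'refreshProductRewrites',
            'clearCategoryProduct',
        );
        foreach ($steps as $step) {
            $this->log[] = $step . ' store=' . $storeId;
        }
        return $this;
    }
}

$sketch = new UrlRefreshSketch(array(1, 2)); // made-up store ids
$sketch->refreshRewrites();
echo implode(PHP_EOL, $sketch->log), PHP_EOL;
```

Running it lists the four steps for store 1 followed by the same four for store 2, which is the shape of work a full reindex performs.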
The specifics of these methods are left as an exercise for the reader. The important takeaway here is that Magento is constantly updating and refreshing this rewrite table on its own, which means your store may be generating numerous URLs behind the scenes without any explicit action by you. If you’re looking to seriously overhaul how Magento handles URLs, this is the place you’ll need to dig deep into.
Wrap Up
With that, we’ll need to end our journey into the Magento routing system. There’s plenty more here to dig into (specifics of auto-generated rewrites, how the indexing system works), but there’s always more to explore with Magento. The way the routing system interacts with the rewrite system, which in turn is slowly being consumed by the needs of Magento’s SEO system, which in turn is managed by the indexing system, is a perfect example of how Magento’s various sub-systems interlock into the full system, and how it’s often impossible to know one part of Magento without understanding five others.
Junior and intermediate level developers may know one or two Magento sub-systems well, but if you’re looking for true Magento mastery it’s better to understand how the systems interact, which will allow you to dive in and discover the correct solution to your particular problem in your particular installation.
I hope these five articles have peeled back the layers enough for you to feel more comfortable exploring the routing and rewrite systems on your own, as well as encouraged you to develop your own best practices so you can avoid spending time down at this level debugging.
If this series has been in any way useful, I’d encourage you to browse through Pulse Storm’s Magento related products like the Commerce Bug debugging extension, or No Frills Magento Layout, the only layout book you’ll ever need.
If digital products don’t strike your fancy, please consider sending a few dollars my way to encourage future article writing.
For those of you who’ve already purchased a product, or donated a few bucks, you have my gratitude. These articles and my work with Magento wouldn’t exist without you.