Tuesday, October 7, 2008

Micro-Optimize Your Time, not Your Code

I have to admit... I'm a bit of a thief. We have an internal blog at work for our developers to use, and I originally wrote this article for that blog -- before I started this one. But of course, this information is as useful for you as it is for them. And I wrote it, so whatever :)

16 OpCodes.

Unless you use the proper quoting and interpolation techniques, that's how many instructions it might take to turn your 3-line "Hello World" script into its intermediate language.

13 Hours.

That's how much time would be spent in a year by a developer who spends "only" 3 minutes each work day refactoring old code (or even new code) to be "more efficient" by optimizing quoted strings, using a different function, etc. to reduce opcodes.
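The arithmetic behind that figure is quick to check. This sketch assumes roughly 260 working days in a year (52 weeks of 5 days), which is my assumption rather than a number stated above:

```php
<?php
// Assumption: about 260 working days per year (52 weeks x 5 days).
$iMinutesPerDay   = 3;
$iWorkDaysPerYear = 260;

// Total minutes per year, converted to hours.
$fHoursPerYear = ( $iMinutesPerDay * $iWorkDaysPerYear ) / 60;

echo $fHoursPerYear . " hours\n";
```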

A while ago, I passed around an article making fun of micro-optimizations to my co-workers. A few weeks later, a co-worker passed around an article about the inefficiencies of string parsing in PHP. Which article you took to the most probably had everything to do with your previous thoughts on the subject. If you haven't read these two articles, I encourage you to do so before you continue:

  1. Better Benchmarks
  2. How long is a Piece of String

I'd like to show you that in the end, it doesn't really matterrrrrrrrrrr (for my Linkin Park fans). Given the second article above, you might immediately conclude that you should never, ever use a HEREDOC, because they're so inefficient even if no interpolation happens. But you might be wrong.

The Test.

I wrote a test which takes a 40-paragraph (3727 words, 25409 bytes) Lorem Ipsum, and assigns it to a variable 10,000 times. The Lorem Ipsum also has 65 seed variables inserted throughout it, which need to be interpolated. I did this with both single quotes (interpolation using the dot concatenator) and a HEREDOC (inline interpolation). Who won?

=== CONCAT RESULTS ===
Single Quotes: 2513.158082962ms
Heredoc: 387.84980773926ms
Difference: 2125.3082752228ms

Surprising? To be honest, it was to me too. Even though the Heredoc uses many, many more opcodes to do its work, in the real world it still performs faster on large datasets. 2 seconds would be significant if it weren't for the unrealistic scale of the test. Most of us will never be performing that many operations on a dataset that large all at once.

Out of curiosity, I wanted to see how single quotes and heredocs matched up face-to-face, without any variable interpolation whatsoever. So I took the same 40-paragraph Lorem Ipsum, and assigned it to a variable 10,000 times using single quotes and heredoc syntax.

=== NO CONCAT RESULTS ===
Single Quotes: 39.492130279541ms
Heredoc: 38.956880569458ms
Difference: 0.53524971008301ms
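
For anyone who wants to repeat the comparison, here is a minimal sketch of a timing harness along the same lines. It is not the original test script: the one-line text and the two seed variables ($sSeedA, $sSeedB) are hypothetical stand-ins for the 40-paragraph Lorem Ipsum and its 65 seeds, so the numbers it prints will not match the results above.

```php
<?php
// Hypothetical stand-ins for the article's data: one short sentence and
// two seed variables instead of 40 paragraphs and 65 seeds.
$sSeedA      = 'lorem';
$sSeedB      = 'ipsum';
$iIterations = 10000;

// Single quotes with the dot concatenator.
$fStart = microtime( true );
for ( $i = 0; $i < $iIterations; $i++ ) {
    $sConcat = 'Start ' . $sSeedA . ' dolor sit amet, ' . $sSeedB . ' consectetur.';
}
$fConcatMs = ( microtime( true ) - $fStart ) * 1000;

// Heredoc with inline interpolation.
$fStart = microtime( true );
for ( $i = 0; $i < $iIterations; $i++ ) {
    $sHeredoc = <<<EOT
Start {$sSeedA} dolor sit amet, {$sSeedB} consectetur.
EOT;
}
$fHeredocMs = ( microtime( true ) - $fStart ) * 1000;

printf( "Single Quotes: %.3fms\n", $fConcatMs );
printf( "Heredoc: %.3fms\n", $fHeredocMs );
printf( "Difference: %.3fms\n", abs( $fConcatMs - $fHeredocMs ) );
```

Both loops build the same string, so any difference in the timings comes from how the string is assembled rather than from what it contains.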

37 Minutes.

That's how much time I wasted writing the tests and this article. As you can see above, the time you wasted micro-optimizing would have been better spent writing a more efficient database query, or drinking a beer.

I recommend the beer.

Sunday, October 5, 2008

Zend Framework: Content Types in Your Routes

I love information. Recently, I caught myself complaining because it took a friend more than 5 minutes to Google something we were arguing about while sitting in a hot tub. Five years ago, we would have been lucky if we could even connect to the internet on our cell phones. Mmmm, information is delicious.

One of the things that makes a useful website useful is openness and usefulness of information. Whenever you provide some sort of interesting or unique information, anyone who wants it is going to get it. If you don't make it easy for them, they'll either go somewhere else or find a way to get it. Imagine if everyone who wanted a stock ticker on their site had to write their own screen scraper? Do I even have to tell you why RSS/Atom feeds are a good thing?

The Problem

My new project will most definitely benefit from open data through a web service. Since I felt this was high on the prioritized features list, I wanted to make sure I had an architecturally sound mechanism to handle all types of requests.

In my case, I wanted almost any page which interacts with data to be accessible in RDF format. Additionally, I wanted it to be easy to add new formats at the view level, without having to do anything to the code (for a true MVC architecture). This means the controller itself should only be concerned with which set of views to push data to, not how to translate the data into any particular format.

My first attempt was rather complicated. I created a router which used one set of routes to strip values from a path, and then a second to do the actual routing. On top of that, the routes had to be confusing in order to get it to work. It wasn't very innovative, but it worked. You can see that attempt here: Zend Framework: Using Transparent Routes.

The Solution: Content Type Filtering

After sleeping on it, I decided to keep it simple. Instead of writing complicated regex routes to do all of the work for me, why not just have a router that knows how to detect the content type specification, and strip it? That's true transparency: don't pass any information to anything that doesn't need it.

To revisit, here are my needs:

  • User-specified content type through the URI. Receiving the same data in a different format is as simple as changing the content-type part of the URI.
  • The content type does not have to be specified. In that case, the content type will default to html.
  • Only content types I wish to support will be caught by the detection mechanism. For example, /foo/bar.html will work, but /foo/bar.bazml will not result in 'bazml' being detected as a content type.
  • I do not want to break the default parameterization mechanism.

With those requirements in mind, I went to work. The first step is to create a router which is aware of the 'content type' concept. It allows a user to specify a list of content types which it will attempt to detect, and the regex patterns to use in order to detect them. Once it's done its job of detecting the content type, it will strip that portion of the URI, and pass the result back to the request object.

The Routing Schema

After some deliberation, I decided to change my "route schema". At first, I wanted the content type to be specified using an extension:

/<controller>/<action>.<contentType>

For example, the following calls AuthorsController::browseAction() and displays the results in RDF format:

/authors/browse.rdf/page/1

The problem with this is simply the way the URIs look. I'm not sure of any implications other than it annoying me. So, I went with my original route schema:

/<contentType>/<controller>/<action>

However, pretend I never said anything. For the purposes of this post, we'll be using the first routing schema. That schema points out some extra efforts that are needed to make this work universally, and provides the best demonstration.

The configuration

<routing>
        <contentTypes>html,xml</contentTypes>
        <contentTypePattern>\w+\.(#)</contentTypePattern>
        <contentTypeReplacePattern>\.(#)</contentTypeReplacePattern>
</routing>

This tells our router that we're only looking for xml and html. We'll use contentTypePattern to detect the content type, and contentTypeReplacePattern to strip it from the path before routing.

Whoa, what's up with the #? Simple: the # character will be replaced with the list of content types from our configuration. This allows us to abstract the list of content types we want to detect from the pattern used to find them. Here's what the regex will look like after the str_replace is done to insert the content types:

\w+\.(html|xml)
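
As a quick stand-alone check of that substitution, here is a small sketch using the values from the configuration above (html and xml). It runs outside the framework; the sample path is just an illustration.

```php
<?php
// Values taken from the configuration shown above.
$aContentTypes = explode( ',', 'html,xml' );
$sPattern      = '\w+\.(#)';

// Replace the # placeholder with an alternation of the supported types.
$sRegex = str_replace( '#', implode( '|', $aContentTypes ), $sPattern );

echo $sRegex . "\n";

// The first capture group holds the detected content type.
$aMatch = array();
if ( 1 === preg_match( "/$sRegex/", 'authors/browse.xml', $aMatch ) ) {
    echo $aMatch[1] . "\n";
}
```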

That makes sense, but why do we need a detection pattern and a stripping pattern? This is exactly why the .extension content type schema makes a better demonstration. Let's say we're trying to determine the content type of a request to `controller/action.xml`. What we want is the 'xml' part. If we write a regex to match the 'xml' part, and use that same pattern to strip it out, we'll end up passing 'controller/action.' as the request path. Obviously, we don't want that stray dot.

Enter the replacement pattern. If we tell our router the entire string that we want to remove, we can also match the dot, and pass 'controller/action' as the request path. This adds much more flexibility to our routes.
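
The difference is easy to see in isolation. This is a small sketch, separate from the router, that strips the same path twice: once with a pattern that matches only the content type, and once with a replacement pattern that also matches the dot.

```php
<?php
$sPath  = 'controller/action.xml';
$sTypes = 'html|xml';

// Detection: capture only the content type itself.
$aMatch = array();
preg_match( '/\w+\.(' . $sTypes . ')/', $sPath, $aMatch );
$sContentType = $aMatch[1];

// Stripping with a pattern that matches only the type leaves the dot behind.
$sNaive = preg_replace( '/(' . $sTypes . ')$/', '', $sPath );

// The replacement pattern matches the dot too, so the whole suffix goes away.
$sStripped = preg_replace( '/\.(' . $sTypes . ')/', '', $sPath );

echo $sContentType . "\n";
echo $sNaive . "\n";
echo $sStripped . "\n";
```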

The Router

class ContentTypeRouter extends Zend_Controller_Router_Rewrite
{

 protected $_aContentTypes      = array();
 protected $_aContentTypeRegex    = array();
 protected $_sDefaultContentType   = 'html';

 public function addContentType( $sContentType )
 {
  assert( is_string( $sContentType ) );

  $sContentType = strtolower( $sContentType );

  if ( !in_array( $sContentType, $this->_aContentTypes ) )
  {
   $this->_aContentTypes[] = $sContentType;
  }
 }

 public function setDefaultContentType( $sContentType )
 {
  // store the default in the property that route() actually reads
  $this->_sDefaultContentType = strtolower( $sContentType );
 }

 public function addFilterRegex( $sRegex, $sReplacePattern, array $aMap = array()  )
 {

  $this->_aContentTypeRegex[]  = array(
   'regex'  => $sRegex,
   'replace'=> $sReplacePattern,
   'map'    => $aMap );
 }

 /**
  * @see Zend_Controller_Router_Rewrite
  *
  * @param Zend_Controller_Request_Abstract $oRequest
  * @return Zend_Controller_Request_Abstract
  */
 public function route( Zend_Controller_Request_Abstract $oRequest )
 {

  $sContentType = $this->_sDefaultContentType;

  $sContentTypes = join( '|', $this->_aContentTypes );
  $sPath = $this->_getPathInfo( $oRequest );

  foreach( $this->_aContentTypeRegex as $aRegex )
  {
   $sRegex = str_replace(
    '#',
    $sContentTypes,
    $aRegex['regex'] );

   $aMatch = array();
   $sRegex = "/$sRegex/";

   if ( 1 === preg_match( $sRegex, $sPath, $aMatch ) )
   {

    //grab the content type from the match
    $sContentType = $aMatch[ 1 ];

    //---------------------------------------
    //- replace the content type in the route
    //---------------------------------------

    $sRegex = str_replace( '#', $sContentTypes, $aRegex['replace'] );
    $sPath = preg_replace( "/$sRegex/", '', $sPath );

    break;
   }
  }

  $oRequest->setPathInfo( $sPath );
  $oRequest->setParam( 'requestedContentType', $sContentType );

  $oRequest = parent::route( $oRequest );

  return $oRequest;

 }

 private function _getPathInfo( Zend_Controller_Request_Abstract $oRequest )
 {
  // preg_match() needs the path as a string, so always ask the request for it
  return $oRequest->getPathInfo();
 }
}

One thing you will notice about the router is that you can stack filters. This means we can support more than one routing schema, and the first one to match will be applied. Now, when I decide to switch back to the `<controller>/<action>.<contentType>` schema, I can do so without breaking existing URLs. My users will love me.
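
To illustrate the stacking idea without the framework, here is a sketch built around a hypothetical stand-alone helper, detectContentType(), which is not part of ContentTypeRouter. It walks a list of filters in order and uses the first one that matches. The prefix-schema patterns are my own guesses at what such a filter could look like, and the helper uses ~ as the regex delimiter (the router above uses /) so the patterns can contain slashes.

```php
<?php
/**
 * Hypothetical helper, not part of ContentTypeRouter.
 * Returns array( contentType, strippedPath ) from the first matching filter.
 */
function detectContentType( $sPath, array $aFilters, array $aTypes, $sDefault = 'html' )
{
    $sTypes = implode( '|', $aTypes );

    foreach ( $aFilters as $aFilter ) {
        // "~" is the delimiter here so that patterns may contain slashes.
        $sDetect = '~' . str_replace( '#', $sTypes, $aFilter['regex'] ) . '~';
        $sStrip  = '~' . str_replace( '#', $sTypes, $aFilter['replace'] ) . '~';

        $aMatch = array();
        if ( 1 === preg_match( $sDetect, $sPath, $aMatch ) ) {
            // First match wins: strip the content type once and stop looking.
            return array( $aMatch[1], preg_replace( $sStrip, '', $sPath, 1 ) );
        }
    }

    // No filter matched: fall back to the default and leave the path alone.
    return array( $sDefault, $sPath );
}

$aFilters = array(
    // /<contentType>/<controller>/<action>
    array( 'regex' => '^/(#)/', 'replace' => '^/(#)(?=/)' ),
    // /<controller>/<action>.<contentType>
    array( 'regex' => '\w+\.(#)', 'replace' => '\.(#)' ),
);
$aTypes = array( 'html', 'xml' );

print_r( detectContentType( '/xml/authors/browse', $aFilters, $aTypes ) );
print_r( detectContentType( '/authors/browse.xml/page/1', $aFilters, $aTypes ) );
print_r( detectContentType( '/authors/browse', $aFilters, $aTypes ) );
```

Both URL styles end up as the same stripped path for the standard routes to handle, which is what makes it possible to support old and new schemas side by side.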

Usage Example

$oRouter = new ContentTypeRouter();

// load supported content types from config
$aContentTypes = $this->getMainConfiguration()->routing->contentTypes;
$aContentTypes = explode( ',', $aContentTypes );

// add our regex filters
$oRouter->addFilterRegex(
 $this->getMainConfiguration()->routing->contentTypePattern,
 $this->getMainConfiguration()->routing->contentTypeReplacePattern );


// add each supported content type to the router
foreach ( $aContentTypes as $sContentType )
{
 $oRouter->addContentType( $sContentType );
}

// make use of the router before dispatch
$oController = Zend_Controller_Front::getInstance();
$oController->setRouter( $oRouter );

Thursday, October 2, 2008

Zend Framework: Using Transparent Routes

I love Zend Framework. I have to admit, my only other real in-depth experience with a rapid development framework in PHP has been CakePHP, though. I hope that doesn't discredit me too much.

Routing with Zend Framework takes on the best of both worlds: it's very simple in its default form, but tools are provided that allow you to take on as much complexity as you desire. Even if the shipped classes aren't enough for you, you can easily write your own route types, or even your own router.

Update

I've decided that this method of solving my problem was a little complicated and not very scalable. Please check out this article for a better solution.

The Problem

For my current project, I wanted my routes to work like the default routes, with a slight twist. The first part of the URL needed to be the content type to return to the user, followed by the controller, action, and parameters. So, for example,

/xml/auth/login
     controller: auth
     action: login
     requestedContentType: xml

/html/auth/login
     controller: auth
     action: login
     requestedContentType: html

This is rather simple using a standard route in Zend Framework:

$oRouter = $oController->getRouter();
$oRoute = new Zend_Controller_Router_Route(
  ':requestedContentType/:controller/:action/*'
);
$oRouter->addRoute( 'awesomeRoute', $oRoute );
$oController->setRouter( $oRouter );

Perfect. This accomplishes everything we need, and is very simple. But then I had to go and throw a wrench in the works. In addition to having a URI-supplied requestedContentType, I wanted to:

  • Make sure that only supported content types were parsed in this manner.
  • Be able to not specify a content type, and have it default to html.
  • Not break Zend Framework's default parameterization behavior.

Example parsed routes:

/xml/content/books
     controller: content
     action: books
     requestedContentType: xml

/html/content/books
     controller: content
     action: books
     requestedContentType: html

/content/books
     controller: content
     action: books
     requestedContentType: html

/rss/content/books/page/1
     controller: content
     action: books
     requestedContentType: rss
     page: 1

/content/books/foo/bar
     controller: content
     action: books
     requestedContentType: html
     foo: bar
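
The parsing rules behind those examples can be written down as a small stand-alone function. This is only a sketch of the desired behavior, not Zend code; parsePath() and its parameters are hypothetical names, and the list of supported types (html, xml, rss) is inferred from the examples above.

```php
<?php
/**
 * Hypothetical helper that mimics the desired parsing; it is not Zend code.
 */
function parsePath( $sPath, array $aSupportedTypes, $sDefaultType = 'html' )
{
    $aSegments = explode( '/', trim( $sPath, '/' ) );

    // Only treat the first segment as a content type if it is supported.
    $sType = $sDefaultType;
    if ( in_array( $aSegments[0], $aSupportedTypes, true ) ) {
        $sType = array_shift( $aSegments );
    }

    $aParams = array(
        'controller'           => array_shift( $aSegments ),
        'action'               => array_shift( $aSegments ),
        'requestedContentType' => $sType,
    );

    // Remaining segments are key/value pairs, as in the default route.
    for ( $i = 0; $i + 1 < count( $aSegments ); $i += 2 ) {
        $aParams[ $aSegments[ $i ] ] = $aSegments[ $i + 1 ];
    }

    return $aParams;
}

$aTypes = array( 'html', 'xml', 'rss' );

print_r( parsePath( '/rss/content/books/page/1', $aTypes ) );
print_r( parsePath( '/content/books/foo/bar', $aTypes ) );
```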

By now, you may have guessed it -- a custom router is in order. Unfortunately, I messed around with chaining routes of different types for hours before coming to this conclusion. This was mostly out of fear, though.

The Solution

The approach I took was slightly different than the built-in routing flow. As shipped, Zend's router loops through the routes you pass to it until it finds the first one that matches (starting with the most recently added route). Once a route matches, the parameters extracted through that route are assigned to the request, and control is returned to the dispatcher.

I decided to rework this a little bit to meet my needs. I essentially wanted to pass in a set of filters which could extract parameters from a URI before the actual routing takes place. This would allow me to extract the content type from the URI if it existed, but allow other routes to actually handle the routing itself. I called these routes 'Transparent Routes', since they are applied the same way as Zend's routes, however they do not actually invoke routing.

Here is my flow:

  1. Set Up Your Routes
  2. Assign Transparent Routes
  3. Assign Routes
  4. When route() is invoked, the transparent routes are applied, which extracts parameters from the URI
  5. The standard routing mechanism is triggered. When a route is matched, only parameters which are also extracted from this route will be applied.

This completely solved my immediate issue, and allowed opportunities for much more down the road. Almost any parameter can be generated this way.
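
The flow can be modeled in plain PHP to make the two phases easier to see. This is a rough sketch under my own assumptions, not the router itself: regular expressions with named groups stand in for Zend route objects, and extractParams() and routeWithTransparentRoutes() are hypothetical names.

```php
<?php
// Plain-PHP model of the flow; named regex groups stand in for Zend routes.

/** Returns the named groups captured by $sRegex, or false if it does not match. */
function extractParams( $sRegex, $sPath )
{
    $aMatch = array();
    if ( 1 !== preg_match( $sRegex, $sPath, $aMatch ) ) {
        return false;
    }

    // preg_match() returns numeric and named keys; keep only the named ones.
    $aParams = array();
    foreach ( $aMatch as $mKey => $sValue ) {
        if ( is_string( $mKey ) ) {
            $aParams[ $mKey ] = $sValue;
        }
    }
    return $aParams;
}

function routeWithTransparentRoutes( $sPath, array $aTransparent, array $aRoutes, array $aDefaults )
{
    $aParams = $aDefaults;

    // Transparent routes: every match contributes parameters, none ends routing.
    foreach ( $aTransparent as $sRegex ) {
        $aFound = extractParams( $sRegex, $sPath );
        if ( false !== $aFound ) {
            $aParams = array_merge( $aParams, $aFound );
        }
    }

    // Standard routes: the first match wins and adds its own parameters.
    foreach ( $aRoutes as $sRegex ) {
        $aFound = extractParams( $sRegex, $sPath );
        if ( false !== $aFound ) {
            return array_merge( $aParams, $aFound );
        }
    }

    return $aParams;
}

$aTransparent = array( '~^(?P<requestedContentType>html|xml)/~' );
$aRoutes      = array( '~^(?:(?:html|xml)/)?(?P<controller>\w+)/(?P<action>\w+)~' );
$aDefaults    = array( 'requestedContentType' => 'html' );

print_r( routeWithTransparentRoutes( 'xml/auth/login', $aTransparent, $aRoutes, $aDefaults ) );
print_r( routeWithTransparentRoutes( 'auth/login', $aTransparent, $aRoutes, $aDefaults ) );
```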

The TransparencyRouter Class

/**
 * Extended router which allows applying transparent routes which
 * only serve to extract parameters.  Once transparent routes
 * have been applied, control is returned to the standard
 * Zend_Controller_Router_Rewrite.  
 * 
 * @author A.J. Brown
 * @version 1.0
 *
 */
class TransparencyRouter extends Zend_Controller_Router_Rewrite
{
 protected $_transparentRoutes = array();

 /**
  * Adds a route which will be used for extracting parameters only.
  *
  * @param string $sName the name for this route
  * @param Zend_Controller_Router_Route_Abstract $oRoute
  * @return TransparencyRouter
  */
 public function addTransparentRoute(
  $sName,
  Zend_Controller_Router_Route_Abstract $oRoute
 )
 {
        if (method_exists($oRoute, 'setRequest')) {
            $oRoute->setRequest($this->getFrontController()->getRequest());
        }

        $this->_transparentRoutes[$sName] = $oRoute;

        return $this;
 }

 /**
  * @see Zend_Controller_Router_Rewrite
  *
  * @param Zend_Controller_Request_Abstract $oRequest
  * @return Zend_Controller_Request_Abstract
  */
 public function route( Zend_Controller_Request_Abstract $oRequest )
 {

  foreach (array_reverse($this->_transparentRoutes) as $name => $route) {

   // version 1 routes match against the path, newer ones against the request
   if (!method_exists($route, 'getVersion') || $route->getVersion() == 1) {
    $match = $oRequest->getPathInfo();
   } else {
    $match = $oRequest;
   }

   // apply any extracted parameters without ending the routing process
   if ($params = $route->match( $match ) ) {
    $this->_setRequestParams($oRequest, $params);
   }
  }

  return parent::route( $oRequest );
 }
}

Example Usage

//--------------------------
// Configure routes
//--------------------------

$oRouter = new TransparencyRouter();


$oInterpreterRoute = new Zend_Controller_Router_Route(
 ':controller/:action/*' );


$oNonContentTypeRoute = new Zend_Controller_Router_Route_Regex(
 '(\w+)/(\w+)(\/.*)?',
 array(
  'requestedContentType' => 'html'),
 array(
  1 => 'controller',
  2 => 'action' )
);

$oContentTypeRoute = new Zend_Controller_Router_Route_Regex(
 // TODO the content types themselves should be
        // pulled in from a config file.
        '(html|xml)\/(\w+)\/(\w+)(\/.*)?',
 array(),
 array(
  1 => 'requestedContentType',
  2 => 'controller',
  3 => 'action' )
);

$oRouter->addTransparentRoute( 'nonContentType', $oNonContentTypeRoute  );
$oRouter->addTransparentRoute( 'contentType', $oContentTypeRoute );
$oRouter->addRoute( 'generic', $oInterpreterRoute );
$oRouter->addRoute( 'contentType', $oContentTypeRoute );

//-------------------------
// Setup controller
//-------------------------
$oController = Zend_Controller_Front::getInstance();
$oController->setRouter( $oRouter );