My Symfony Plugin for Redirecting Database Connections on Read Operations

A few days ago I was assigned to set up a replication scheme, for test purposes, over the application my new colleagues have been building for some time now. After installing a pair of Ubuntu-based servers and configuring the associated MySQL servers to run a replication process, I got to the part I wanted most: how to easily integrate the new slave servers without forcing the other programmers to drastically change their application code.

The existing system is built around the Symfony PHP framework, version 1.2, with a database whose tables are powered by the InnoDB engine. The idea behind using a replication scheme is to have the information from the original database replicated to slave servers that are used only for read operations, letting the main database handle just the writes, hence reducing locks and improving performance, especially if you take into account that tables on the slave servers can use the MyISAM engine. This principle is far from new; in fact it is broadly described in the MySQL reference manual (the place I recommend before any MySQL book). The key thing about the locks is that the master's records no longer need to be locked for read operations, while MyISAM tables perform nicely as the read and write counts diverge widely.

However, there is no silver bullet, and as the lead programmer of the application wisely pointed out, there are more than a few places where accurate information is mandatory, and there is certainly no cheap or convenient way to ensure from code that the slaves have no work left to do (in fact, MySQL replication is by definition asynchronous). So in practice we can't just say SELECT queries go to the slaves and everything else goes to the master. We needed more or less the opposite concept: keep reading from the master, but for a few tables, and while certain conditions apply, use the slave servers.

That led me to this post, in which I will share the details of a Symfony plugin I wrote that allows you to "redirect" to an alternate database connection, under certain conditions, before read operations get executed.

Symfony and Propel over Multiple Slave Servers

If you have read the Symfony documentation and related books you may know that there is no mention of using slave servers in a Symfony application. If you Google about it you will hopefully find a useful reply on the Symfony forums that teaches you that Propel (one of the ORMs the framework uses) does support slave servers associated with a master database, and that it selects the slave randomly, but the Propel engine never gets this information because Symfony doesn't pass it along. The solution in that thread was to fix this issue and let Propel finally know about those servers by modifying two files: sfDatabaseConfigHandler.class.php (/sf-lib/config) and sfPropelDatabase.class.php (/sf-lib/plugins/sfPropelPlugin/lib/database).

This was a great starting point, but I didn't want to alter the framework just like that (don't misunderstand me, it was a good rewrite), and since I needed some extra configuration anyway I had more room to maneuver if, instead of reading from the global databases.yml, the envisioned plugin read the slave server information from the app.yml file in the application's config directory. The following snippet is an example of the kind of app.yml file you can use with the plugin we will end up with in this post:

all:
  redirection:
    databases:
      #Identifier of the database in databases.yml
      propel:
        slaves:        
          #The slave identifiers have no special meaning.
          slave1:
            dsn:        mysql:dbname=sayeet_repl;host=localhost
            username:   symfony
            password:   symfony
          slave2:
            dsn:        mysql:dbname=sayeet_repl;host=localhost
            username:   symfony
            password:   symfony

Take into account that, since this configuration is not global, the slave servers can be configured differently for each of the framework's applications. It's easy to get all those keys and values using the sfConfig class, as we shall see later; now let's continue with the idea.
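
For instance, since Symfony exposes app.yml settings through keys prefixed with app_, the whole redirection block can be fetched in a single call. A quick sketch, using the configuration above:

    //Returns the nested array defined under all > redirection > databases in app.yml.
    $redirection = sfConfig::get('app_redirection_databases');

    $slaves = $redirection['propel']['slaves'];
    echo $slaves['slave1']['dsn']; //mysql:dbname=sayeet_repl;host=localhost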

sfMixer and the Extension Points Symfony's Base Peers Deliver

The second step was to figure out how to intercept read operations and provide a different database connection. Using the smooth code navigation of JetBrains' PhpStorm, I went from the invocation of a doSelect method in a peer class to a doSelectStmt method in a base peer class that luckily defines an extension point via the sfMixer class. The sfMixer class is deprecated now but still in use in the model layer; in a nutshell, it allows the definition of extension points that can later be filled in from any part of the framework code (yes, it is a kind of aspect-oriented programming in PHP). The following code shows the doSelectStmt method generated for a Message entity:

 
    public static function doSelectStmt(Criteria $criteria, PropelPDO $con = null) {
        foreach (sfMixer::getCallables('BaseMessagePeer:doSelectStmt:doSelectStmt') as $callable)
        {
            call_user_func($callable, 'BaseMessagePeer', $criteria, $con);
        }
        if ($con === null) {
            $con = Propel::getConnection(MessagePeer::DATABASE_NAME, Propel::CONNECTION_READ);
        }
        if (!$criteria->hasSelectClause()) {
            $criteria = clone $criteria;
            MessagePeer::addSelectColumns($criteria);
        }
        // Set the correct dbName
        $criteria->setDbName(self::DATABASE_NAME);
        // BasePeer returns a PDOStatement
        return BasePeer::doSelect($criteria, $con);
    }

As you can see, sfMixer lets you retrieve the list of functions registered under a string key, which are then executed via the PHP function call_user_func. Please note that this is maybe one of the best places to use a mechanism like aspect-oriented programming: the code you register on the extension point is invoked with all the information of the context: the criteria, the connection and the name of the peer. Also note that after the registered functions, if any, are executed, the $con parameter is checked for a value different from null; if it is still null, a new connection is created, explicitly intended for reading purposes.
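
Registering code on that extension point takes a single call to sfMixer::register. A minimal sketch (the listener class and method names below are made up for illustration):

    class myQueryListener
    {
        public static function onDoSelectStmt($peerClassName, Criteria $criteria, $con)
        {
            //Inspect $criteria or $con here; note that simply assigning to $con has no
            //effect on the caller, because call_user_func does not pass it by reference.
        }
    }

    //Hook the listener into the extension point fired by BaseMessagePeer::doSelectStmt().
    sfMixer::register('BaseMessagePeer:doSelectStmt:doSelectStmt',
                      array('myQueryListener', 'onDoSelectStmt'));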

Everything pointed to it being easy to evaluate some conditions (even read the criteria object) and decide when to provide a connection to the slave servers, but there is a problem with this idea and the code above: for the desired magic to happen, the $con object must be passed by reference, and call_user_func doesn't pass references, so putting an & in the signature of the registered function is not enough.
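
A tiny sketch of the problem (hypothetical function name, behavior as documented for call_user_func):

    function provideSlave(&$con)
    {
        $con = 'a slave connection';
    }

    $con = null;
    //call_user_func hands over a copy, so the by-reference parameter is not honored
    //(PHP even emits a warning) and $con is still null afterwards.
    call_user_func('provideSlave', $con);
    var_dump($con); //NULL

    //A direct call does honor the reference.
    provideSlave($con);
    var_dump($con); //string(18) "a slave connection"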

The workaround is easy but breaks the whole idea of a smooth integration: for any model that may read from the slaves we must override doSelectStmt in the peer and use &$con instead of $con at the point where the extension functions are invoked (this relies on call-time pass-by-reference, which still works on the PHP 5.2 releases Symfony 1.2 targets). The following code shows the trick; now that I look at it, we don't need to write that much code after all.

class MessagePeer extends BaseMessagePeer {

    public static function doSelectStmt(Criteria $criteria, PropelPDO $con = null) {
        foreach (sfMixer::getCallables('MessagePeer:doSelectStmt:doSelectStmt') as $callable)
        {
            call_user_func($callable, 'MessagePeer', $criteria, &$con);
        }
        return BaseMessagePeer::doSelectStmt($criteria, $con);
    }

    public static function doCount(Criteria $criteria, $distinct = false, PropelPDO $con = null) {
        foreach (sfMixer::getCallables('MessagePeer:doCount:doCount') as $callable)
        {
            call_user_func($callable, 'MessagePeer', $criteria, &$con);
        }
        return BaseMessagePeer::doCount($criteria,$distinct,$con);
    }
}

Actually we also need to override the doCount method for the same reason. Frankly, I don't see why the base peers are not generated to pass this parameter by reference; if you can't change these parameters from the extension functions, there are far fewer useful extensions you can write beyond things like logging services. Of course there are security concerns, but nothing here ships in binary form, so as a developer I am able to review each of the plugins I use before going to production.

The sfPropelRedirectionPlugin

Now that we know where and how to "redirect" the database connection, let's take a look at a more complete (and commented) app.yml that shows how we can configure the still unnamed plugin. Remember that the intention is to read from the slaves only when it's safe to say we won't miss anything:


all:
  redirection:
    databases:
      #Identifier of the database in databases.yml
      propel:
        slaves:        
          #The slave identifiers have no special meaning.
          slave1:
            dsn:        mysql:dbname=sayeet_repl;host=localhost
            username:   symfony
            password:   symfony
          slave2:
            dsn:        mysql:dbname=sayeet_repl;host=localhost
            username:   symfony
            password:   symfony
        #Entities that may be read from the slave servers
        entities:
          Message:
            #Column that indicates the "age" of a record.
            gen_column: posted_at
            #Conditions dictate when it is safe to read from the slave servers; only one needs to hold.
            #The ctx component indicates the module/action pair for which it is valid to read from the slaves.
            #The added component enforces a check on the criteria object to ensure that the records
            #sought are "old" enough to already be on the slave servers; added has the form
            #{s: seconds, m: minutes, h: hours, D: days, M: months, Y: years}
            conditions: [{ctx: stats/index, added: {D: 2}}, {ctx: board/index, added: {h: 1}}]
          #No conditions or an empty condition means to ALWAYS read from the slaves.
          Topic:
            conditions: []
          Student:
            conditions: [{ctx: people/index},{}]

The new thing in this version is the entities key, defined at the same level as the slaves key; in this section we basically add those entities that may read from the slave servers. I say may because there is the option to add conditions so the redirection doesn't happen blindly; it's sufficient for one condition to hold for us to switch to the slave servers. Conditions can be made of two components, the context (ctx) and the time since addition (added); the latter is meaningful only if we provide a value for the gen_column key. The gen_column must refer to a column that defines the "age" of a record, or the last time a record was updated; this semantic is used to grasp some information from the criteria object and determine whether the records needed are probably already on the slave servers. Please note that an empty condition, or no conditions at all, means the slave servers are always used.

The first thing I needed to do to start building the plugin was to create a folder under the plugins directory suffixed with the word Plugin; this ensures that Symfony will recognize that folder as the root of a new plugin and will automatically execute a file named config.php if it is located under a config directory.
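
For reference, using the file names from the snippets below, the plugin folder ends up looking like this:

    plugins/
      sfPropelRedirectionPlugin/
        config/
          config.php
        lib/
          sfPropelRedirection.class.php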

The config.php file is executed before the application but after the Symfony bootstrap, so it is the right place to require other files and initialize services. I like to keep config.php clean, as the following snippet shows:


require_once dirname(__FILE__) . '/../lib/sfPropelRedirection.class.php';

sfPropelRedirection::init();

The code of the init method explicitly enforces singleton-like behavior so the data is loaded just once. It first calls sfConfig::get('app_redirection_databases') to get the whole configuration and then starts processing it; the process is very straightforward so I will just show the code with its comments:

 
    public static function init() 
    {
        if (self::$_started) return;
        $redirection_conf = sfConfig::get('app_redirection_databases', null);

        if ($redirection_conf === null) return;

        //Iterate over the databases configurations.
        foreach ($redirection_conf as $dbName => $dbOptions)
        {
            //Check if there are any slave servers configured.
            if ($dbOptions['slaves'] === null) continue;
            foreach ($dbOptions['slaves'] as &$slave)
            {
                foreach ($slave as &$param) $param = sfConfigHandler::replaceConstants($param);
                //Symfony uses 'username' but Propel expects 'user'.
                $slave['user'] = $slave['username'];
                unset($slave['username']);
            }

            self::$_slaveConfig[$dbName] = $dbOptions['slaves'];

            //Check if there is any entity that may be redirected to the slaves.
            if ($dbOptions['entities'] === null) continue;

            //Iterate over the entities.
            foreach ($dbOptions['entities'] as $model => $options)
            {
                $peerClass = "{$model}Peer";

                //Check if the peer class exists.
                if (!class_exists($peerClass)) continue;

                $doSelectStmtHook = "{$peerClass}:doSelectStmt:doSelectStmt";
                $doCountHook = "{$peerClass}:doCount:doCount";

                //Register the interceptor function on the peer hooks.
                $interceptor = array('sfPropelRedirection', 'slaveConnection');
                sfMixer::register($doSelectStmtHook, $interceptor);
                sfMixer::register($doCountHook, $interceptor);

                //Check if the peer has conditions in order to be redirected to the slave.
                if (!isset($options['conditions'])) continue;

                self::$_peerOptions[$peerClass]['conditions'] = $options['conditions'];

                //If no gen_column is configured there is nothing else to register for this peer.
                if (!isset($options['gen_column'])) continue;

                $columnName = strtolower($model) . '.' . strtoupper($options['gen_column']);

                //Check if the gen column really exists in the model
                if (!in_array($columnName, $peerClass::getFieldNames(BasePeer::TYPE_COLNAME))) continue;

                self::$_peerOptions[$peerClass]['gen_column'] = $columnName;
            }
        }
        self::$_started = true;
    }

Once the configuration has been loaded the class has two member variables full of information waiting to be used. The first time an extension point is reached there may be modifications to the data associated with the corresponding peer; for example, the added components of the conditions are converted to seconds, making it easier to compare the date information.

 
    public static function slaveConnection($peerClassName, Criteria $criteria, &$con) {
        //Retrieve options for the current Peer
        $options = &self::$_peerOptions[$peerClassName];
        $ops = array(Criteria::LESS_EQUAL, Criteria::LESS_THAN, Criteria::EQUAL);
        $apply = true;

        //Check if the peer options include conditions; if not, always use the slave.
        if (isset($options['conditions']) && sizeof($options['conditions']) > 0) {
            //Use slave when ANY of the conditions is satisfied.
            $apply = false;
            //Build the current context pattern to check against the ctx condition.
            $pattern = self::getContextPattern();

            //Iterate over the conditions expressions.
            foreach ($options['conditions'] as &$cond)
            {
                //Check if the ctx is given.
                if (isset($cond['ctx']) && !(preg_match($pattern, $cond['ctx']) == 1))
                    continue;

                if (isset($options['gen_column']) && isset($cond['added'])) {
                    //If it is the first time, convert the added component to be a timestamp
                    if (is_array($cond['added']))
                        $cond['added'] = self::calcTimeStamp($cond['added']);

                    //Only redirect when the criterion restricts the gen column with one of
                    //the "safe" comparisons (<, <=, =); anything else (e.g. >) could miss
                    //records inserted moments ago, so in that case the condition is skipped.
                    $criterion = $criteria->getCriterion($options['gen_column']);
                    if ($criterion === null || !in_array($criterion->getComparison(), $ops))
                        continue;

                    $added = strtotime($criterion->getValue()) <= strtotime(date('Y-m-d')) - $cond['added'];
                    if (!$added) continue;
                }
                //Redirect to the slave, and stop iterating.
                $apply = true;
                break;
            }
        }

        if ($apply) $con = self::getConnectionToSlave($peerClassName::DATABASE_NAME);
    }
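
The getContextPattern and calcTimeStamp helpers are part of the plugin but not listed in this post; a rough sketch of what they might look like inside the sfPropelRedirection class, assuming the {s, m, h, D, M, Y} spec from app.yml and sfContext for the current module/action:

    protected static function getContextPattern()
    {
        //Build a pattern such as '#^stats/index$#' from the current module/action,
        //so it can be matched against the ctx value of a condition.
        $context = sfContext::getInstance();
        return '#^' . preg_quote($context->getModuleName() . '/' . $context->getActionName(), '#') . '$#';
    }

    protected static function calcTimeStamp(array $added)
    {
        //Convert an added spec like {D: 2} or {h: 1} into a number of seconds
        //(months and years approximated as 30 and 365 days).
        $factors = array('s' => 1, 'm' => 60, 'h' => 3600,
                         'D' => 86400, 'M' => 2592000, 'Y' => 31536000);
        $seconds = 0;
        foreach ($added as $unit => $amount)
        {
            if (isset($factors[$unit])) $seconds += $amount * $factors[$unit];
        }
        return $seconds;
    }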

Note that the criteria object is not checked merely because gen_column is provided: the criteria must also have a criterion associated with that field, with a comparison that uses one of the operators <, <= or =. Then a simple calculation checks that the upper bound lies in the range where it is safe to redirect. For example, if the > operator were used, a record inserted a few seconds ago could be missing from the result set if the connection targets a slave database; so far that is the only constraint I have coded to decide, based on the criteria object, whether it is safe to use the slaves. Based on the above app.yml file and the current date, the following queries are redirected to the slaves:

    //Executed in executeIndex of the statsActions controller.
    $c = new Criteria();
    //Today is 30/10/2010, so it is safe to seek records posted at least 2 days ago.
    $c->add(MessagePeer::POSTED_AT,'2010-10-25',Criteria::LESS_THAN);
    $this->all_messages = MessagePeer::doSelect($c);

    $c = new Criteria();
    $c->add(MessagePeer::POSTED_AT,'2010-10-25',Criteria::LESS_THAN);
    $c->addJoin(MessagePeer::TOPIC_ID,TopicPeer::ID);
    $c->add(TopicPeer::ID,2,Criteria::EQUAL);
    $this->programming_messages = MessagePeer::doSelect($c);
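
By contrast, a criteria like the following one would stay on the master, since the > comparison could reach records posted just seconds ago, which may not have arrived at the slaves yet:

    //Still served by the master: the > comparison could include very fresh records.
    $c = new Criteria();
    $c->add(MessagePeer::POSTED_AT, '2010-10-25', Criteria::GREATER_THAN);
    $this->recent_messages = MessagePeer::doSelect($c);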

Now, please note the call to the getConnectionToSlave function at the end of the slaveConnection function. If you are sure you can use the slaves, this is the shortest way to get a connection that bypasses all the above filters and conditions, and it proves handy in case you need to run a specific SQL query:

    $con = sfPropelRedirection::getConnectionToSlave(MessagePeer::DATABASE_NAME);
    $sql = "SELECT c.name, c.cust_id, COUNT(o.num) AS num_ord
            FROM customer c INNER JOIN orders o ON c.cust_id = o.cust_id
            GROUP BY c.cust_id";
    //The connection is a regular PDO object, so plain PDO calls work here.
    $stmt = $con->prepare($sql);
    $stmt->execute();
    $rows = $stmt->fetchAll(PDO::FETCH_NUM);

You can download the plugin code from here. Over the next days I will be using this plugin together with other developers, so I will keep updating this post if I find anything worth adding; any feedback is welcome.

hope you like it

-horacio
