My Symfony Plugin for Redirecting Database Connections on Read Operations

A few days ago I was assigned to set up a replication scheme, for testing purposes, over the application my new colleagues have been building for some time now. After installing a pair of Ubuntu-based servers and configuring the associated MySQL instances to run a replication process, I got to the part I wanted most: how to easily integrate the new slave servers without forcing the other programmers to drastically change their application code.

The system is built around the Symfony PHP framework, version 1.2, with a database whose tables are powered by the InnoDB engine. The idea behind a replication scheme is to have the information from the original database replicated to slave servers that are used only for read operations, letting the main database handle only the writes, hence reducing locks and improving performance, especially since tables on the slave servers can run on the MyISAM engine. This principle is far from new; in fact it is broadly described in the MySQL reference manual (the place I recommend before any MySQL book). The key point regarding locks is that the master's records need not be locked for read operations, while MyISAM tables exhibit nice performance when reads largely outnumber writes.

However, there is no silver bullet, and as the lead programmer of the application wisely pointed out, there are more than a few places where accurate information is mandatory, and there is no cheap or convenient way to ensure from code that the slaves have no work left to do (MySQL replication is by definition asynchronous). So in practice we can't just say that SELECT queries go to the slaves and everything else goes to the master. We needed more or less the opposite concept: keep reading from the master, but for a few tables, and only while certain conditions apply, use the slave servers.

That leads me to this post, in which I will share the details of a Symfony plugin I wrote that allows you to "redirect" reads to an alternate database connection, under certain conditions, before the read operations get executed.

Symfony and Propel over Multiple Slave Servers

If you have read the Symfony documentation and related books, you may know that there is no mention of using slave servers in a Symfony application. If you Google about it, you will hopefully find this useful reply on the Symfony forums, which teaches that Propel (one of the ORMs the framework supports) does support slave servers associated with a master database, and that it even picks the slave randomly; however, the Propel engine never receives this information because Symfony doesn't pass it along. The solution in that thread was to fix this issue and finally let Propel know about those servers by modifying two files: sfDatabaseConfigHandler.class.php (/sf-lib/config) and sfPropelDatabase.class.php (/sf-lib/plugins/sfPropelPlugin/lib/database).

This was a great starting point, but I didn't want to alter the framework just like that (don't misunderstand me, that was a good rewrite), and since I needed some extra configuration anyway, I had more room to maneuver if, instead of reading from the global databases.yml, the envisioned plugin read the slave server information from the app.yml file in the application's config directory. The following snippet is an example of the kind of app.yml file you can use with the plugin we will end up with in this post:

all:
  redirection:
    databases:
      #Identifier for the database at the databases.yml
      propel:
        slaves:        
          #The slave identifiers have no special meaning.
          slave1:
            dsn:        mysql:dbname=sayeet_repl;host=localhost
            username:   symfony
            password:   symfony
          slave2:
            dsn:        mysql:dbname=sayeet_repl;host=localhost
            username:   symfony
            password:   symfony

Note that, being non-global, the slave server configuration can differ between the framework's apps. It is easy to get all these keys and values using the sfConfig class, as we shall see later; now let's continue with the idea.

sfMixer and the Extension Points Symfony's Base Peers Deliver

The second step was to figure out how to intercept read operations and provide a different database connection. Using the smooth navigation of JetBrains' PhpStorm, I went from an invocation of a doSelect method in a peer class to a doSelectStmt method in a base peer class that luckily defines an extension point via the sfMixer class. The sfMixer class is deprecated now but still in use in the model layer; in a nutshell, it allows the definition of extension points that can later be filled in from any part of the framework code (yes, it is a kind of aspect-oriented programming in PHP). The following code shows the doSelectStmt method generated for a Message entity:

 
    public static function doSelectStmt(Criteria $criteria, PropelPDO $con = null) {
        foreach (sfMixer::getCallables('BaseMessagePeer:doSelectStmt:doSelectStmt') as $callable)
        {
            call_user_func($callable, 'BaseMessagePeer', $criteria, $con);
        }
        if ($con === null) {
            $con = Propel::getConnection(MessagePeer::DATABASE_NAME, Propel::CONNECTION_READ);
        }
        if (!$criteria->hasSelectClause()) {
            $criteria = clone $criteria;
            MessagePeer::addSelectColumns($criteria);
        }
        // Set the correct dbName
        $criteria->setDbName(self::DATABASE_NAME);
        // BasePeer returns a PDOStatement
        return BasePeer::doSelect($criteria, $con);
    }

As you can see, sfMixer retrieves a list of functions registered under a string key, which are then executed via the PHP function call_user_func. Please note that this is maybe one of the best places to use a mechanism like aspect-oriented programming: the code you register within the extension point will be invoked with all the information of the context: the criteria, the connection and the name of the peer. Also note that after the registered functions, if any, are executed, the $con parameter is checked for a value different from null; if it is null, a new connection is created, explicitly intended for reading purposes.
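The registry-of-callables idea behind sfMixer is easy to picture outside PHP. Here is a minimal Python sketch of the mechanism; the names register and get_callables are illustrative, not Symfony's API:

```python
# Minimal sketch of the sfMixer idea: callables registered under a
# string key, all looked up (and then invoked) when the extension
# point is reached.
_callables = {}

def register(hook, fn):
    """Append fn to the list of callables for the given hook key."""
    _callables.setdefault(hook, []).append(fn)

def get_callables(hook):
    """Return every callable registered for hook (empty list if none)."""
    return _callables.get(hook, [])
```

Like call_user_func, a plain positional call into these callables cannot rebind a variable in the caller's scope, which is exactly the limitation described next.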

Everything suggests that it would be easy to evaluate some conditions (even inspect the criteria object) and decide when to provide a connection to the slave servers, but there is a problem with this idea and the above code: for the desired magic to happen, the $con object must be passed by reference, and call_user_func doesn't pass references, so declaring the parameter with & in the registered function is not enough.

The workaround is easy but breaks the idea of a perfectly smooth integration: for any model susceptible to reading from slaves, we must override doSelectStmt in the peer and use &$con instead of $con in the place where the extension functions are invoked. The following code shows the trick; mmm, now I see that after all we don't need to write too much code.

class MessagePeer extends BaseMessagePeer {

    public static function doSelectStmt(Criteria $criteria, PropelPDO $con = null) {
        foreach (sfMixer::getCallables('MessagePeer:doSelectStmt:doSelectStmt') as $callable)
        {
            call_user_func($callable, 'MessagePeer', $criteria, &$con);
        }
        return BaseMessagePeer::doSelectStmt($criteria, $con);
    }

    public static function doCount(Criteria $criteria, $distinct = false, PropelPDO $con = null) {
        foreach (sfMixer::getCallables('MessagePeer:doCount:doCount') as $callable)
        {
            call_user_func($callable, 'MessagePeer', $criteria, &$con);
        }
        return BaseMessagePeer::doCount($criteria,$distinct,$con);
    }
}

Actually, we also need to override the doCount method for the same reason. Frankly, I don't see why the base peers are not generated to pass this parameter by reference; if you can't change these parameters in the extension functions, there are far fewer useful extensions beyond providing logging services. Of course there are security concerns, but nothing here ships in binary form, so as a developer I am able to review each of the plugins I use before going to production.

The sfPropelRedirectionPlugin

Now that we know where and how to "redirect" the database connection, let's take a look at a more complete (and commented) app.yml that shows how we can configure the still-unknown plugin. Remember that the intention is to read from the slaves only when it is safe to say we won't miss anything:


all:
  redirection:
    databases:
      #Identifier for the database at the databases.yml
      propel:
        slaves:        
          #The slave identifiers have no special meaning.
          slave1:
            dsn:        mysql:dbname=sayeet_repl;host=localhost
            username:   symfony
            password:   symfony
          slave2:
            dsn:        mysql:dbname=sayeet_repl;host=localhost
            username:   symfony
            password:   symfony
        #Entities that may be read from the slave servers
        entities:
          Message:
            #Column that indicates the "age" of a record.
            gen_column: posted_at
            #Conditions dictate when it is safe to read from the slave servers; just one needs to hold.
            #The ctx indicates the module/action pair in which it is valid to read from slaves;
            #added enforces a check over the criteria object to ensure that the records
            #sought are "old" enough to be on the slave servers. added has the form
            #{s: seconds, m: minutes, h: hours, D: days, M: months, Y: years}
            conditions: [{ctx: stats/index,added:{D: 2}},{ctx: board/index,added:{h: 1}}]
          #No conditions, or any empty condition, means ALWAYS read from the slaves.
          Topic:
            conditions: []
          Student:
            conditions: [{ctx: people/index},{}]

The new thing in this version is the entities key, defined at the same level as the slaves key; in this section we basically add those entities that may read from the slave servers. I say may because there is the option to add conditions to ensure the redirection doesn't happen blindly; it is sufficient for one condition to hold for us to switch to the slave servers. Conditions can be made of two components, the context (ctx) and the time since addition (added); the latter is meaningful only if we provide a value for the gen_column key. The gen_column must refer to a column that defines the "age" of a record, or the last time a record was updated; this semantic is intended to extract some information from a criteria object and determine whether the records needed are probably already on the slave servers. Please note that an empty condition, or no conditions at all, means the slaves will always be used.
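Since the added component mixes several units, the plugin reduces it to seconds before comparing dates. The following Python sketch shows what such a conversion might look like (the plugin's own helper is calcTimeStamp; the month and year lengths here are assumed approximations of 30 and 365 days):

```python
# Rough sketch of converting an "added" spec such as {D: 2} or
# {h: 1, m: 30} into a total number of seconds.
UNIT_SECONDS = {
    's': 1,
    'm': 60,
    'h': 3600,
    'D': 86400,
    'M': 30 * 86400,   # assumed approximation
    'Y': 365 * 86400,  # assumed approximation
}

def calc_seconds(added):
    """Sum each unit's contribution, e.g. {'D': 2} -> 172800."""
    return sum(UNIT_SECONDS[unit] * amount for unit, amount in added.items())
```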

The first thing I needed to do to start building a plugin was to create a folder under the plugins directory suffixed with the word Plugin; this ensures that Symfony will recognize that folder as the root of a new plugin and will automatically execute a file named config.php if it is located under a config directory. The following picture shows how the plugin folder should look:

The config.php is executed before the application but after the Symfony bootstrap, so it is the right place to require other files and initialize services. I like to keep config.php clean, as the following snippet shows:


require_once dirname(__FILE__) . '/../lib/sfPropelRedirection.class.php';

sfPropelRedirection::init();

The code for the init method explicitly implements singleton-like behavior so the data is loaded just once. It first executes sfConfig::get('app_redirection_databases') to get the whole configuration and then starts processing it; the process is very straightforward, so I will just put the code with its comments:

 
    public static function init() 
    {
        if (self::$_started) return;
        $redirection_conf = sfConfig::get('app_redirection_databases', null);

        if ($redirection_conf === null) return;

        //Iterate over the databases configurations.
        foreach ($redirection_conf as $dbName => $dbOptions)
        {
            //Check if there are any slave servers configured.
            if ($dbOptions['slaves'] === null) continue;
            foreach ($dbOptions['slaves'] as &$slave)
            {
                foreach ($slave as &$param) $param = sfConfigHandler::replaceConstants($param);
                #Symfony uses 'username' but Propel expects 'user'.
                $slave['user'] = $slave['username'];
                unset($slave['username']);
            }

            self::$_slaveConfig[$dbName] = $dbOptions['slaves'];

            //Check if there is any entity that may be redirected to the slave.
            if ($dbOptions['entities'] === null) continue;

            //Iterate over the entities.
            foreach ($dbOptions['entities'] as $model => $options)
            {
                $peerClass = "{$model}Peer";

                //Check if the peer exists.
                if (!class_exists($peerClass)) continue;

                $doSelectStmtHook = "{$peerClass}:doSelectStmt:doSelectStmt";
                $doCountHook = "{$peerClass}:doCount:doCount";

                //Register the interceptor function on the peer hooks.
                $interceptor = array('sfPropelRedirection', 'slaveConnection');
                sfMixer::register($doSelectStmtHook, $interceptor);
                sfMixer::register($doCountHook, $interceptor);

                //Check if the peer has conditions in order to be redirected to the slave.
                if (!isset($options['conditions'])) continue;

                self::$_peerOptions[$peerClass]['conditions'] = $options['conditions'];

                //If there are zero conditions then we don't need to check a gen_column.
                if (!isset($options['gen_column'])) continue;

                $columnName = strtolower($model) . '.' . strtoupper($options['gen_column']);

                //Check if the gen column really exists in the model
                if (!in_array($columnName, $peerClass::getFieldNames(BasePeer::TYPE_COLNAME))) continue;

                self::$_peerOptions[$peerClass]['gen_column'] = $columnName;
            }
        }
        self::$_started = true;
    }

Once the configuration has been loaded, the class has two variables full of information waiting to be used. The first time an extension point is reached, the data associated with the corresponding peer may be modified; for example, the added components of the conditions are converted to seconds, making it easier to compare the date information.

 
    public static function slaveConnection($peerClassName, Criteria $criteria, &$con) {
        //Retrieve options for the current Peer
        $options = &self::$_peerOptions[$peerClassName];
        $ops = array(Criteria::LESS_EQUAL, Criteria::LESS_THAN, Criteria::EQUAL);
        $apply = true;

        //Check if the peer options includes conditions, if not use the slave.
        if (isset($options['conditions']) && sizeof($options['conditions']) > 0) {
            //Use slave when ANY of the conditions is satisfied.
            $apply = false;
            //Build the current context pattern to check against the ctx condition.
            $pattern = self::getContextPattern();

            //Iterate over the conditions expressions.
            foreach ($options['conditions'] as &$cond)
            {
                //Check if the ctx is given.
                if (isset($cond['ctx']) && !(preg_match($pattern, $cond['ctx']) == 1))
                    continue;

                if (isset($options['gen_column']) && isset($cond['added'])) {
                    //If it is the first time, convert the added component to be a timestamp
                    if (is_array($cond['added']))
                        $cond['added'] = self::calcTimeStamp($cond['added']);

                    $criterion = $criteria->getCriterion($options['gen_column']);
                    if ($criterion !== null && in_array($criterion->getComparison(), $ops)) {
                        $added = strtotime($criterion->getValue()) <= strtotime(date('Y-m-d')) - $cond['added'];
                        if (!$added) continue;
                    }
                }
                //Redirect to the slave, and stop iterating.
                $apply = true;
                break;
            }
        }

        if ($apply) $con = self::getConnectionToSlave($peerClassName::DATABASE_NAME);
    }

Note that the criteria object is checked for more than just the presence of gen_column: the criteria must also have a criterion associated with that field, with a comparison involving one of the operators <, <= or =. Then a simple arithmetic check verifies that the upper bound lies in the range where redirection is safe. For example, if the > operator were used, a record inserted a few seconds ago could be missing from the result set when the connection targets the slave database. So far that is the only constraint I have coded to decide, based on the criteria object, that it is safe to use the slaves. Based on the above app.yml file and the current time, the following queries are redirected to the slaves:

    //Executed in executeIndex of the statsActions controller.
    $c = new Criteria();
    //Today is 30/10/2010, so it is safe to seek records posted at least 2 days ago.
    $c->add(MessagePeer::POSTED_AT,'2010-10-25',Criteria::LESS_THAN);
    $this->all_messages = MessagePeer::doSelect($c);

    $c = new Criteria();
    $c->add(MessagePeer::POSTED_AT,'2010-10-25',Criteria::LESS_THAN);
    $c->addJoin(MessagePeer::TOPIC_ID,TopicPeer::ID);
    $c->add(TopicPeer::ID,2,Criteria::EQUAL);
    $this->programming_messages = MessagePeer::doSelect($c);
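The decision itself boils down to a small comparison. The following Python sketch shows the idea behind the check (safe_for_slave and its parameters are hypothetical names for illustration, not part of the plugin):

```python
import time

# A criterion on the "age" column makes the slave safe only when its
# operator bounds the results from above (<, <=, =) and the bound is
# older than now minus the condition's "added" seconds.
SAFE_OPS = ('<', '<=', '=')

def safe_for_slave(op, bound_ts, added_seconds, now=None):
    now = time.time() if now is None else now
    return op in SAFE_OPS and bound_ts <= now - added_seconds
```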

Now, please note the call to the getConnectionToSlave function at the end of the slaveConnection function. If you are sure you can use the slaves, this is the shortest way to create a connection that bypasses all the filters and conditions above, and it proves handy in case you need to run a specific SQL query:

    $con = sfPropelRedirection::getConnectionToSlave(MessagePeer::DATABASE_NAME);
    $sql = "SELECT customers.name, customers.cust_id, COUNT(orders.num) AS num_ord
            FROM customers INNER JOIN orders ON customers.cust_id = orders.cust_id
            GROUP BY customers.cust_id";
    //The connection is a PropelPDO instance, so plain PDO calls work.
    $stmt = $con->prepare($sql);
    $stmt->execute();
    $rs = $stmt->fetchAll(PDO::FETCH_NUM);

You can download the plugin code from here. Over the next few days I will be using this plugin directly with other developers, so I will keep updating this post if I find anything worth adding; any feedback is welcome.

hope you like it

-horacio


Checking if a matrix is a Sudoku with one loop and none

In this post I will share a little snippet I wrote for my programming class some time ago. We were assigned to code a Sudoku solver using MATLAB. If you have worked with this tool before, you may agree with me that it is an excellent language for the task, mostly because of its syntax for expressing submatrices. The intention was to use some simple math to remove lines of code without affecting performance.

I will not discuss the solver implementation, only the routine that checks whether a full matrix of size 9×9 is a Sudoku; if you want to see the code of the solver, follow this link.

The condition for a given matrix to be a Sudoku is that every row, every column and every 3×3 block must contain all the natural numbers from 1 to 9 without repetition; viz., there can't be any repetition in any of the 27 vectors we can form. An elegant way of checking this is to use a property of the powers of two: a sum of distinct powers of two uniquely identifies the exponents involved. So, to check that a nine-element vector v is a valid Sudoku row, the following expression must hold:

2^v[1] + 2^v[2] + 2^v[3] + 2^v[4] + 2^v[5] + 2^v[6] + 2^v[7] + 2^v[8] + 2^v[9] == 1022

Or, in other words, the total sum of the powers divided by 1022 is 1:

(2^v[1] + 2^v[2] + 2^v[3] + 2^v[4] + 2^v[5] + 2^v[6] + 2^v[7] + 2^v[8] + 2^v[9]) / 1022 == 1
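The check is sound because 1022 is 1111111110 in binary: nine one-bits, and a sum of nine powers of two can only reach a number with nine one-bits when no carries occur, i.e. when the exponents are exactly 1 through 9. A quick Python sketch of the test:

```python
# A length-9 vector over 1..9 is a permutation of 1..9 exactly when
# the sum of 2**x over its entries equals 2^1 + ... + 2^9 = 1022.
def is_valid_group(v):
    return sum(2 ** x for x in v) == 1022
```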

On the other hand, if we group the 27 vectors in terns of the form (row, column, block), we get a set of nine terns, each identified by a number from 1 to 9. This enumeration guides us to use only one loop to check whether the above condition holds for each component of a tern made of a row, a column and a block. Denoting the above condition C, the following expression is sufficient to test whether the nth tern is made of valid Sudoku vectors:

C(Row[n]) + C(Column[n]) + C(Block[n]) = 3 

Any other value implies that one of the vectors doesn't contain the first nine natural numbers, and then the matrix is not a Sudoku. Taking these ideas to code, a MATLAB version can be written in the following way:

function R = IsSudoku(SU) 
f = @(x) sum(sum(power(2,min(10,max(0,x))))) / 1022; 
R = 1; 
for n = 1:9 
    r = sum([1 2 3 4 4 5 5 6 6 7 7 7 8 8 8 9 9 9] == n); 
    c = sum([1 2 2 3 3 3 4 5 5 6 6 6 7 8 8 9 9 9] == n); 
    if sum([f(SU(n,:)) f(SU(:,n)) f(SU(3*r-2:3*r,3*c-2:3*c))]) ~= 3 
        R = 0; return; 
    end 
end 

You could say it uses just one for loop. The code is very simple, the hardest part to read being the computation of the nth block boundaries; the procedure for obtaining that formula is explained in the following steps:

Step 1

(1) Let's assign an index to each block and track its boundaries.

(image: the nine blocks with their indices and start positions)

(2) Write the start positions for each block, comprising the row and the column (first column in the image). Think of a formula that maps the block to its beginning positions; no IF statements are allowed =). We can create simple integer mappings in MATLAB by using a vector, the sum function and the == operator, as in the following example, which is a very limited mapping of the Fibonacci sequence:

@(x) sum ([1 2 2 3 3 3 4 4 4 4 5 5 5 5 5] == x)

The == operator applied to a vector and the x parameter produces a vector containing 0 in the positions where the element is not equal to x, and 1 in the opposite case; the sum function does the rest, as you may infer. If we follow this approach directly, we end up with very large vectors for both the rows and the columns. The trick to using a smaller vector is to introduce a helper function that reduces the value, whose inverse transforms it back to the correct start positions. In the second column of the image you see the result of this approach using the linear function x = 3k - 2 and its inverse k = (x + 2)/3. Now the vectors for both columns and rows can be simplified to the ones in the MATLAB code; notice that before computing the submatrix we must transform the reduced values back.
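The mapping and its inverse transform are easy to mirror in Python (block_bounds is an illustrative helper; indices are 1-based, as in MATLAB):

```python
# Counting how many entries of a lookup vector equal n gives the
# reduced block coordinate k; the inverse x = 3k - 2 then recovers
# the block's starting row/column.
ROW_MAP = [1, 2, 3, 4, 4, 5, 5, 6, 6, 7, 7, 7, 8, 8, 8, 9, 9, 9]
COL_MAP = [1, 2, 2, 3, 3, 3, 4, 5, 5, 6, 6, 6, 7, 8, 8, 9, 9, 9]

def block_bounds(n):
    r = sum(1 for v in ROW_MAP if v == n)
    c = sum(1 for v in COL_MAP if v == n)
    # Block n spans rows 3r-2..3r and columns 3c-2..3c (1-based).
    return (3 * r - 2, 3 * r), (3 * c - 2, 3 * c)
```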

An entirely functional implementation (no state, no loops) in the F# language can be coded using the same ideas, replacing the remaining loop with a sequence expression and a consolidation operator:

let isSudoku (m : FMatrix) =
    requires (m.Dimension = (9,9)) "Not a 9x9 matrix"
    let f (a : FMatrix) = (a.Items |> Seq.map (fun x -> Math.Pow(2.0, Math.Min(10.0, Math.Max(0.0, x))))
                                   |> Seq.sum) / 1022.0
    let pos arr n = arr |> Array.filter ((=) n) |> Array.length
    let h n = 
        let r,c = pos [|1;2;3;4;4;5;5;6;6;7;7;7;8;8;8;9;9;9|] n, 
                  pos [|1;2;2;3;3;3;4;5;5;6;6;6;7;8;8;9;9;9|] n
        f(m.[n .. n, *]) + f(m.[*, n .. n]) + f(m.[3 * r - 2 .. 3 * r, 3 * c - 2 .. 3 * c])
    seq {1..9} |> Seq.map h |> Seq.forall ((=) 3.0)

The data type FMatrix can be seen on this page; this type supports the extraction of submatrices using the F# array slice notation, which is similar to the one MATLAB offers. In this version of the routine, for the sake of using zero loops, we rely on a sequence expression ({1 .. 9}) as the input for a map (Seq.map) operation, later processed by the forall operator to check that the condition holds everywhere.

The performance of both codes is good, and thanks to the min(10, max(0, x)) clamp there is no risk of evaluating a huge power of 2. In the MATLAB case I'm not 100% sure there isn't a loop working internally when retrieving a submatrix; for F# we can check that even within the chained iterations there is no use of mutable state (and so no explicit loops).
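For readers without MATLAB or F# at hand, the whole routine translates almost line by line into Python. This is only a sketch of the same idea, not part of the original post:

```python
# A 9x9 grid (list of lists, values 1..9) is a Sudoku when every row,
# column and 3x3 block passes the 1022 power-of-two check.
def is_sudoku(su):
    ok = lambda v: sum(2 ** min(10, max(0, x)) for x in v) == 1022
    row_map = [1, 2, 3, 4, 4, 5, 5, 6, 6, 7, 7, 7, 8, 8, 8, 9, 9, 9]
    col_map = [1, 2, 2, 3, 3, 3, 4, 5, 5, 6, 6, 6, 7, 8, 8, 9, 9, 9]
    for n in range(1, 10):
        r, c = row_map.count(n), col_map.count(n)
        row = su[n - 1]
        col = [su[i][n - 1] for i in range(9)]
        block = [su[i][j] for i in range(3 * r - 3, 3 * r)
                          for j in range(3 * c - 3, 3 * c)]
        if not (ok(row) and ok(col) and ok(block)):
            return False
    return True
```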

hope you like it

-horacio

My F# article on dotNetMania 73

(image: cover of dotNetMania issue 73)

For the first time I have decided to do some propaganda for an article of mine published in the Spanish magazine dotNetMania. As far as I know, this magazine is the oldest one in Spanish dedicated only to .NET, and I can assure you that every month it delivers a range of interesting articles on widely used .NET technologies. I said "first time" because the editors have already given me the honor of participating twice in the past; this time, however, it means a bit more to me, as it features an F# solution that binds together lessons from algebra with the functional programming paradigm. The translated title of the work is "Functional representation of numerical matrices"; as it suggests, the article introduces a functional abstraction of the concept of a numerical matrix, which in turn means that it is possible to represent a matrix without using a single array; the alternative I propose is via functions…

If you find the subject of the article interesting, you can check the project behind it on this codeplex site. Currently I use CodePlex as a frontend while the source is hosted on GitHub. The project also includes a customization of the F# Interactive to start playing with the new data type.

Also, if you are able to read Spanish, I encourage you to visit the magazine site and subscribe to it. One of the benefits of being a subscriber is that, from time to time, hands-on-labs books written by experts and targeting .NET are included.

Just to finish, I leave you a few examples of how you can build some easy matrix objects via a function:


    let empty m n = 
        requires (m = 0 || n = 0) MATRIX_EMPTYCREATION_BADDIMENSIONS
        new FMatrix(m,n, fun (i,j) -> failwith MATRIX_EMPTY )

    let identity m =
        requires (m >= 0) MATRIX_IDENTITYCREATION_BADDIMENSIONS
        if m = 0 then
           empty 0 0
        else
            new FMatrix(m,m, fun (i,j) -> if i = j then 1.0 else 0.0)

    let diagonal m v =
        requires (m > 0) MATRIX_CUSTOMCREATION_BADDIMENSIONS
        new FMatrix(m,m, fun (i,j) -> if i = j then v else 0.0)

    let zero m n =
        requires (m > 0 && n > 0) MATRIX_CUSTOMCREATION_BADDIMENSIONS
        new FMatrix(m,n, fun (i,j) -> 0.0)

    let scalar m n value =
        requires (m > 0 && n > 0) MATRIX_CUSTOMCREATION_BADDIMENSIONS
        new FMatrix(m,n, fun (i,j) -> value)

    let hilbert m n =
        requires (m > 0 && n > 0) MATRIX_CUSTOMCREATION_BADDIMENSIONS
        let newGen = (fun (i,j) -> 1.0 / (float(i) + float(j) - 1.0))
        new FMatrix(m,n,newGen)

    let upperTriangular m n v =
        requires (m > 0 && n > 0) MATRIX_CUSTOMCREATION_BADDIMENSIONS
        let newGen = (fun (i,j) -> if float(i) > float(j) then 0.0 else v)
        new FMatrix(m,n,newGen)

    let lowerTriangular m n v =
        requires (m > 0 && n > 0) MATRIX_CUSTOMCREATION_BADDIMENSIONS
        let newGen = (fun (i,j) -> if float(i) < float(j) then 0.0 else v)
        new FMatrix(m,n,newGen) 
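The function-as-matrix idea is language-agnostic; here is a tiny Python sketch in the same spirit (FnMatrix is a hypothetical stand-in for the article's FMatrix type, not its actual API):

```python
# A minimal function-backed matrix: it stores only its dimensions and
# a generator function; entries are computed on demand, no array needed.
class FnMatrix:
    def __init__(self, m, n, gen):
        self.m, self.n, self.gen = m, n, gen

    def __getitem__(self, idx):
        i, j = idx  # 1-based indices, like the article's examples
        return self.gen(i, j)

# A couple of the constructors above, expressed as generator functions.
identity = lambda m: FnMatrix(m, m, lambda i, j: 1.0 if i == j else 0.0)
hilbert = lambda m, n: FnMatrix(m, n, lambda i, j: 1.0 / (i + j - 1.0))
```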

hope you like it

-horacio