In ORM's Defence

10 Apr 2012

For some reason, as of late, I can't seem to attend any user group or conference without a speaker slating ORMs. Several speakers at the PHP UK Conference this year expressed their disapproval, as did the speaker at this month's PHP London talk. However, no one is giving me a strong enough argument not to use an ORM. Remarks such as "That's a whole other talk" or "Don't get me started on ORMs" seem to be thrown about, but whenever I get a chance to talk about the concerns or issues they're having, the conversation just seems to deflate. Am I missing something really terrible about ORMs that's going to creep up and bite me?

This growing dislike within the community is concerning, as the only reason I can think of for it gathering pace is that developers haven't had a chance to try out Doctrine 2 yet.

I've been using it commercially for about six months now and have found it so much better than the predecessors that gave ORMs a bad name. Things have changed a huge amount, and I think it's time for the community to give ORMs another go. If you've not had a chance yet, I'd advise staying away from it until you have a reasonable chunk of time to dedicate. It's a complete paradigm shift from previous implementations and at first glance can be a little daunting. Trying to understand what your repositories are, why you can't edit your proxy classes or how to set up an event might confuse you a little, or even throw you off track, but if you stick with it, I promise you, the time invested in getting over the learning curve is certainly worthwhile.

So in their defence, and with Doctrine now firmly established on my tool belt, I'm going to start banging the ORM drum. I feel they're a great addition to a project and in my opinion, a worthy abstraction. They carry a great suite of benefits to an application.

Once you've done your research, you'll start to see how the ORM should be used, and will be better protected against a flaky implementation. One of the most important things when adopting any 3rd party library is to protect yourself from "doing it wrong" by RTFM. I feel it's exactly this problem that has led developers to be dismissive of ORMs and has fuelled the general disapproval. Getting a prototype up and running may take no time at all. Once you've established your properties and relations, and all your getters and setters are in place, you can easily traverse through your objects like a deck of cards, but it doesn't end there. Developers seem to be skipping over the fine tuning part and then blaming the ORM for being slow. Things are often overlooked, such as eagerly loading relationships that aren't needed, not using DQL correctly for complex queries, or not applying the right hydrator for the use case.

I thought I'd share my responses to some of the common reasons developers give for not using an ORM in their application, along with my personal experiences of using Doctrine (version 2.2), and hopefully convince you that they're not all bad, maybe just slightly misunderstood.

"Using ORM's means having a one to one relation between object and table"

In the general sense this is probably the most common use case, but in my experience, you're not forced to architect this way. When working on a project many moons ago I needed to build a scheduling tool. It was to be used to schedule various components such as articles, images or products to go live on a site at a specified time. These components had no relation to each other, needed to have their own space for applying business logic, and should all share the common ability of being scheduled. Using column aggregation meant I could have three separate classes (ScheduleProduct, ScheduleImage, ScheduleArticle) that fed data into a single scheduling table, aggregated by a type column. I could now apply any logic specific to scheduling an image, keep it separate from my other models (articles / products), and persist into a single table. Inheritance mapping in Doctrine enables you to structure your objects in a completely different manner to your underlying table schema. You create your object graph, and then you tell Doctrine how it should be persisted.
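As a rough sketch of how that might look in Doctrine 2, where this style of mapping is called single table inheritance: the three class names come from the project described above, but the abstract Schedule parent, the "schedule" table, the "type" discriminator column and the goLiveAt field are all assumptions made up for illustration. The mapping lives in docblock annotations, so the classes themselves stay plain PHP.

```php
<?php

use Doctrine\ORM\Mapping as ORM;

/**
 * Hypothetical parent class: every subclass is stored in one "schedule"
 * table, and the "type" column records which class a row belongs to.
 *
 * @ORM\Entity
 * @ORM\Table(name="schedule")
 * @ORM\InheritanceType("SINGLE_TABLE")
 * @ORM\DiscriminatorColumn(name="type", type="string")
 * @ORM\DiscriminatorMap({
 *     "product" = "ScheduleProduct",
 *     "image"   = "ScheduleImage",
 *     "article" = "ScheduleArticle"
 * })
 */
abstract class Schedule
{
    /** @ORM\Id @ORM\Column(type="integer") @ORM\GeneratedValue */
    protected $id;

    /** @ORM\Column(type="datetime") */
    protected $goLiveAt;

    public function __construct(\DateTime $goLiveAt)
    {
        $this->goLiveAt = $goLiveAt;
    }

    // Shared behaviour: anything schedulable can say whether it is due.
    public function isDue(\DateTime $now)
    {
        return $this->goLiveAt <= $now;
    }
}

/** @ORM\Entity */
class ScheduleProduct extends Schedule
{
    // product-specific scheduling rules would live here
}

/** @ORM\Entity */
class ScheduleImage extends Schedule
{
    // image-specific scheduling rules would live here
}

/** @ORM\Entity */
class ScheduleArticle extends Schedule
{
    // article-specific scheduling rules would live here
}
```

Each subclass keeps its own space for business logic, while the object graph and the table layout are free to differ.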

"ORM's produce sub-optimal SQL and far too many queries"

This is not true at all. Doctrine gives you an abstraction of SQL through DQL, which lets you query exactly the fields you want and join any relations you need. You pretty much get to produce the SQL yourself, albeit through an abstraction layer, and that abstraction comes with a bunch of helpful functions / operators that work across database vendors.

I think the problem lies, again, in implementation. Take for example:

$users = $em->getRepository('Entities\User')->findAll();
foreach($users as $user)
{
    $profile = $user->getProfile();
}

If the relation between users and profile is set to lazy load, then this will incur an additional query to retrieve each profile row. We know this can be done much better by simply applying a join in a DQL query.

$q = $em->createQuery('SELECT u,p FROM Entities\User u LEFT JOIN u.profile p');
$users = $q->getResult();

These are exactly the kinds of thing you need to think about when doing your data retrieval. The tools are there to help you tailor the result set to your needs, but you can't expect Doctrine to know what you plan to use it for.

"Using ORM's means using active record"

This was certainly the case some years ago, but not anymore. One of Doctrine's major failings in the 1.x release was the huge weight your models would inherit by extending the Doctrine_Record class. This gave your models the ability to save themselves (being active) and, if you so wished, they could act like a service layer, pulling in / saving any data you wanted. With this came a bunch of problems, the most frustrating for me being memory consumption. With memory leaks frequently caused by circular references, you could just forget about using it for data processing.

The Doctrine 2.x release focuses strongly on being a data mapper. This means your model classes (entities) are plain old PHP objects that only deal with themselves. They have no external dependencies and are not coupled with anything in the ORM library.
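To make that concrete, here is a minimal sketch of what such an entity might look like. The User class, its email field and the changeEmail() rule are hypothetical; the point is only that the class extends nothing and carries its mapping in docblock annotations, so it can be created and tested without a database or the ORM being involved.

```php
<?php

use Doctrine\ORM\Mapping as ORM;

/**
 * Hypothetical entity: no base class, no save() method, just state and rules.
 *
 * @ORM\Entity
 * @ORM\Table(name="users")
 */
class User
{
    /** @ORM\Id @ORM\Column(type="integer") @ORM\GeneratedValue */
    private $id;

    /** @ORM\Column(type="string") */
    private $email;

    public function __construct($email)
    {
        $this->changeEmail($email);
    }

    public function getEmail()
    {
        return $this->email;
    }

    // Business rule kept inside the entity: only valid addresses are accepted.
    public function changeEmail($email)
    {
        if (filter_var($email, FILTER_VALIDATE_EMAIL) === false) {
            throw new \InvalidArgumentException('Invalid email address');
        }
        $this->email = $email;
    }
}

// Persisting is the entity manager's job, not the entity's:
//   $em->persist($user);
//   $em->flush();
```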

"ORM is slower than just using SQL, Unlike other abstraction layers, which make up for their performance hit with faster development, ORM layers add almost nothing."

This is taken from a post by Laurie Voss on seldo.com named "In defence of SQL", which was later followed up with "ORM is an antipattern". It's common knowledge that, in a general sense, abstractions will slow things down. Any additional layers added to your stack will take you further away from the metal, and will generally incur a speed cost to your application. Whenever you implement an abstraction you should be able to justify the decrease in speed with a benefit gained from adding it. Common arguments for adding an ORM might be:

Getting your application released to market quicker.
Readability / Maintainability through code clarity and clean design.
Testability / Stability, keeping a clean domain model encapsulating your logic makes for easy testing.
The option to farm off intensive processing. A good abstraction may enable you to snip out the intensive (CPU / Memory / Disk IO) parts of your application and have them processed asynchronously on a separate resource to your application.

All of these points are valid in the case of incorporating an ORM into your project, but there's one other that you might not have expected. Doctrine can actually improve the speed of your processing! You wouldn't have thought it, but it's true. Let me explain:

I'm sure you've often come across the situation where you need to write a large set of rows to the same table. A typical programmer might iterate over a dataset and insert each row one at a time. A better programmer might prepare the data for a single insert. Now, take that idea and apply it across the entire runtime of your application. This is exactly what Doctrine does using the unit of work pattern. Once your entities have been persisted, or retrieved from the database, they are in a managed state. Changes can be applied several times throughout the runtime of your application. Once you've finished tinkering with your objects, your manager can be flushed and optimal queries are written to perform the inserts / updates.

This is a great feature of Doctrine which hugely reduces the round trips you'd typically be making to your database. A benchmark was done to compare 20 inserts using Doctrine's unit of work vs mysql_query, and Doctrine came up trumps with a completion time of 0.0094s, almost half of what it took to do 20 individual mysql inserts (0.0165s).

More information about Doctrine's unit of work can be found here.
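In code, the pattern described above boils down to calling persist() as often as you like and flush() once. The sketch below assumes $em is a Doctrine\ORM\EntityManager; the importAll() helper is a hypothetical name, and the parameter is deliberately left untyped so the function can also be exercised with a test double.

```php
<?php

/**
 * Queue a batch of new entities and write them with a single flush.
 *
 * $em is assumed to be a Doctrine\ORM\EntityManager (or anything exposing
 * the same persist() / flush() methods, such as a test double).
 */
function importAll($em, array $entities)
{
    foreach ($entities as $entity) {
        // persist() only registers the entity with the unit of work;
        // nothing is sent to the database at this point.
        $em->persist($entity);
    }

    // A single flush: the unit of work computes the pending changes
    // and executes them together.
    $em->flush();
}
```

Calling flush() inside the loop would still work, but it throws away exactly the batching that makes the unit of work attractive.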

"But just pulling out arrays are quicker"

One argument I've heard for not using an ORM is a dislike of it always retrieving a data object by default. This carries more weight than an associative array and is not always necessary. Although this can be changed by using the array hydrator, I find in the majority of cases my application requires an object.
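For the cases where an array really is all you need, switching hydrator is a one-line change. This is a small sketch, assuming $em is a Doctrine\ORM\EntityManager and reusing the Entities\User entity from earlier; fetchUserRows() is a hypothetical helper name.

```php
<?php

/**
 * Fetch users as nested associative arrays instead of entity objects.
 *
 * $em is assumed to be a Doctrine\ORM\EntityManager.
 */
function fetchUserRows($em)
{
    $q = $em->createQuery('SELECT u FROM Entities\User u');

    // getArrayResult() is shorthand for getResult(Query::HYDRATE_ARRAY):
    // no entities are built, so the result is lighter but read-only.
    return $q->getArrayResult();
}
```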

I think if your data is being punched straight into a view with no manipulation then you're right, an array would make more sense, but this is seldom the case. You'll often find your data requiring additional manipulation. If you're going to be processing an order, or calculating a total cost, then you'll need to apply business logic, which should be encapsulated in your domain model. No matter how hard you try, you'll seriously struggle to encapsulate this logic in an array! Remember, objects can provide state and behaviour well beyond what a key / value pair can, so should really be favoured.
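As a small illustration of the order example, here is a sketch in which the total is calculated by the objects themselves. The Order and OrderLine classes, and the choice of storing prices in pence, are assumptions made purely for this example.

```php
<?php

// Hypothetical order line: knows its own price and quantity.
class OrderLine
{
    private $unitPriceInPence;
    private $quantity;

    public function __construct($unitPriceInPence, $quantity)
    {
        if ($quantity < 1) {
            throw new \InvalidArgumentException('Quantity must be at least 1');
        }
        $this->unitPriceInPence = $unitPriceInPence;
        $this->quantity = $quantity;
    }

    public function getSubtotalInPence()
    {
        return $this->unitPriceInPence * $this->quantity;
    }
}

// Hypothetical order: the rule for totalling lives with the data it uses.
class Order
{
    private $lines = array();

    public function addLine(OrderLine $line)
    {
        $this->lines[] = $line;
    }

    public function getTotalInPence()
    {
        $total = 0;
        foreach ($this->lines as $line) {
            $total += $line->getSubtotalInPence();
        }
        return $total;
    }
}

$order = new Order();
$order->addLine(new OrderLine(250, 2)); // 2 x 2.50
$order->addLine(new OrderLine(999, 1)); // 1 x 9.99
echo $order->getTotalInPence(), "\n";
```

With plain arrays, the same calculation and the quantity check would have to be repeated wherever the data is used.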

"Incorrect abstraction - if you don't need relational data features you're using the wrong data store"

I completely agree with this, and Doctrine has the perfect solution. The project I'm working on at the moment has a few entities that have no relations, and will only be used for reporting. Consistency is not important, and we're not going to be providing any real-time statistics. However, we will need to ensure availability under heavy load. We've settled on using MongoDB for storing these entities, and implementing this using Doctrine couldn't be simpler. Once I'd bootstrapped the Doctrine MongoDB ODM, all I had to do was remove the ORM annotations from my entities and replace them with annotations compatible with the ODM. Now these were happily persisted to MongoDB and ready to scale with load.

As the annotation mappings are pulled into your entity under a namespace, you could maintain both ORM and ODM definitions (if you wanted). This flexibility could be useful for using SQLite to quickly test your entities' business logic.
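A rough sketch of what carrying both sets of mappings might look like is below. It assumes both the ORM and the MongoDB ODM are installed; the PageView class, its url field and the "page_views" table / collection name are hypothetical. Each library only reads the annotations under its own alias.

```php
<?php

use Doctrine\ORM\Mapping as ORM;
use Doctrine\ODM\MongoDB\Mapping\Annotations as ODM;

/**
 * Hypothetical reporting entity mapped for both the ORM and the ODM.
 *
 * @ORM\Entity
 * @ORM\Table(name="page_views")
 * @ODM\Document(collection="page_views")
 */
class PageView
{
    /**
     * @ORM\Id @ORM\Column(type="integer") @ORM\GeneratedValue
     * @ODM\Id
     */
    private $id;

    /**
     * @ORM\Column(type="string")
     * @ODM\Field(type="string")
     */
    private $url;

    public function __construct($url)
    {
        $this->url = $url;
    }

    public function getUrl()
    {
        return $this->url;
    }
}
```

Which store the object ends up in then depends only on whether it is handed to the ORM's entity manager or the ODM's document manager.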

Closing note

So please don't follow the general opinion on ORMs before giving them a try. Things have changed for the better and it's time to give them another go. Remember that, just as with any application not using an ORM, you still need to optimise. As I said, it can take some time to pick up, but you'll find plenty of help and examples in the online documentation. There is also a reasonably active IRC channel where a lot of the core developers loiter.