PHP UK Conference

06 Mar 2012

In late February I attended the annual PHP UK Conference, hosted by the chaps who run the PHP London user group and sponsored by Automattic. I had a great time at the two-day event and got to catch up with several colleagues from the past. There were some great talks lined up and I took a fair bit out of the event. I was so keen I'd scheduled the exact talks I was attending weeks before the event (sad huh!), which I kind of regret, as I heard the unconference track had some great speakers. I really should have given it more of my time. The main hall was littered with the typical stands you'd expect. O'Reilly were selling books, Zend were selling Zend Studio and Zend Server (where I picked up a cool PHP hero pin badge), and iBuildings, Engine Yard, WordPress and many others were happy to chat. Not surprisingly, there were a fair number of recruiters going round taking names too. So, as with the jQuery conference, I thought I'd write up the talks I enjoyed the most and share what I took from the event with you. There were a few other talks I really wanted to write up but have struggled to find the time for; I may add my thoughts on the following great sessions at a later date:

Data abstraction in large web applications - Brandon Savage
Distribute the workload - Helgi Þorbjörnsson
Challenges at scale - Hugh Williams
MongoDB with PHP - Derick Rethans

Rasmus Lerdorf - A Look at PHP in 2012

Slides: http://talks.php.net/show/phpuk2012
Twitter: @rasmus

Rasmus started off with some background on web technologies from when it all first started getting cool, back in 1993. You'd see webpages embedded with nasty Perl CGI scripts that took an age to run just so you could have a hit counter. Server side includes were about in '94 but weren't comprehensive enough to solve many of the web's problems. So in '95 PHP was launched (yay!), albeit with some obvious mistakes. When reading up on SGML processing instructions, Rasmus missed the bit about setting a target name, so PHP was initially opened with "<?", which later caused a rattle with XML as the two conflicted. PHP was also closed with a single greater-than character. Other than that, PHP back then looked very similar to syntax that is still valid today, and according to Rasmus he'd (jokingly, I'm sure) prefer it that way.

People generally put PHP's success down to the fact that it's a tool you can have up and running with ease, and that it was available at the right time. But there was also a focus on the web ecosystem in PHP's early days, and LAMP was far from an accident. Shared hosting ISPs needed a technology stack that would keep their clients from interfering with each other's set-ups. The reason mod_perl wasn't widely adopted by ISPs is that it allowed you to mess deeply with Apache's configuration, potentially jeopardising the set-up of anyone else sharing the same box.

So onto 2012, and first off some bad stuff: denial of service by exploiting hash collisions. Hash tables are a data structure that PHP (and many other languages) use to map an identifying value (key) to an associated data value via a hash function. For string keys PHP uses DJBX33A, created by Daniel Julius Bernstein (http://en.wikipedia.org/wiki/Daniel_J._Bernstein). Different keys will sometimes produce the same hash, and when that happens PHP resolves it internally by chaining the colliding entries together, which means an additional step to find the associated data.
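But by sending the right keys to be hashed (via POST, for example), an attacker can force thousands of these collisions in a single request, so every insert has to walk an ever-longer chain and the server burns CPU time. Rasmus demonstrated the exploit in his talk; a userland port of the hash function shows how different keys can end up with exactly the same hash:

// Userland port of DJBX33A: start at 5381, then hash = hash * 33 + byte for each character.
// Assumes 64-bit PHP, so these short keys don't overflow the integer range.
function djbx33a($key, $len) {
    $hash = 5381;
    for ($i = 0; $i < $len; $i++) {
        // ord() returns the byte value of the character
        $hash = $hash * 33 + ord($key[$i]);
    }
    return $hash;
}

// "Ez" and "FY" hash identically, so any mix of the two collides
echo djbx33a('EzEz', 4) . "<br>\n";
echo djbx33a('EzFY', 4) . "<br>\n";
echo djbx33a('FYEz', 4) . "<br>\n";
echo djbx33a('FYFY', 4) . "<br>\n";

// Outputs
// 6384055811
// 6384055811
// 6384055811
// 6384055811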
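To see why collisions hurt, here is a toy chained hash table in PHP. It is a sketch under simplifying assumptions (a fixed table of 8 buckets, a 64-bit build, and a hypothetical helper bucketFor()), not PHP's internal HashTable. The bucket is taken from the low bits of the hash, and because "Ez" and "FY" hash to the same value, the keys EzEz, EzFY, FYEz and FYFY all land in one chain. A lookup has to compare against every entry in its chain, so a request stuffed with n colliding keys costs roughly n² comparisons to build.

```php
<?php
// Toy separate-chaining table: a sketch of the idea, not PHP's internal HashTable.
// Assumes 64-bit PHP so these short keys hash without integer overflow.
function djbx33a($key)
{
    $hash = 5381;
    for ($i = 0, $len = strlen($key); $i < $len; $i++) {
        $hash = $hash * 33 + ord($key[$i]);
    }
    return $hash;
}

// With a power-of-two table size, the bucket is the low bits of the hash
function bucketFor($key, $size)
{
    return djbx33a($key) & ($size - 1);
}

$size = 8;
$buckets = array_fill(0, $size, []);
foreach (['EzEz', 'EzFY', 'FYEz', 'FYFY', 'php'] as $key) {
    // Colliding keys are appended to the same chain
    $buckets[bucketFor($key, $size)][] = $key;
}

// Print only the non-empty chains
foreach ($buckets as $index => $chain) {
    if ($chain) {
        echo $index, ': ', implode(' -> ', $chain), "\n";
    }
}
```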

To help prevent this, a new setting called "max_input_vars" was introduced in PHP 5.3.9 (with a follow-up security fix in 5.3.10), which limits the number of input variables accepted per request to 1000 by default. But the root problem is still there, and things like randomised hashing would cause compatibility complications with other components such as the APC cache. So this did kind of feel like a plea from Rasmus to the PHP community to jump on board and get their thinking caps on to solve it.
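A minimal userland sketch of the idea behind max_input_vars, assuming a plain query string: stop registering variables once a fixed count is reached, so an attacker can't push an unbounded number of colliding keys into one array. The function parseLimited() is hypothetical; the real check happens inside PHP's C code when it populates $_GET, $_POST and $_COOKIE (the limit applies to each separately).

```php
<?php
// Hypothetical userland sketch of the idea behind max_input_vars:
// stop registering variables once a fixed count is reached.
function parseLimited($query, $limit)
{
    $vars = [];
    foreach (explode('&', $query) as $i => $pair) {
        if ($i >= $limit) {
            break; // PHP itself also raises a warning when it truncates input
        }
        $parts = explode('=', $pair, 2);
        $name = urldecode($parts[0]);
        $vars[$name] = isset($parts[1]) ? urldecode($parts[1]) : '';
    }
    return $vars;
}

// The real setting can be read at runtime (1000 unless php.ini changes it)
echo 'max_input_vars = ', ini_get('max_input_vars'), "\n";
print_r(parseLimited('a=1&b=2&c=3', 2));
```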
Anyway, after that he moved on to the good stuff: shiny new features in PHP 5.4. He started with broad performance improvements such as better memory handling, FastCGI request handling and improved startup/shutdown. He then listed a bunch of features that are being removed. Good to see register_globals on that list, along with magic_quotes. He did raise the concern, though, that anyone relying on magic quotes (perhaps without realising it) to save their arses from SQL injection may find themselves open to attack after an upgrade. But a point well made is that we have to move on.

Traits! I could probably talk forever on this topic, as there's been a real mix of feelings about it within the community, with some seeing it as a way to badly implement multiple inheritance. But Rasmus kept it simple: think of it as compiler-assisted copy and paste, and that's it! It's a snippet of code that you want several of your classes to consume. Personally, when I initially read about this feature I wasn't much of a fan, until about three months later when I came across a design issue. I wanted to reuse a common piece of functionality, but using composition for it just didn't make sense, and inheritance was out of the question. A trait would be a better solution. Rasmus listed a few other features, such as short array syntax, so you can leave out array() and just use square brackets when building an array (just how you would in JavaScript). And Function Array Dereferencing, which won't just be a fad (<- heh, see what I did there). It means you can reference an array offset directly off the back of a method call; for example, $obj->gimmieArray()[0] gives you the first element. We'll also see a built-in web server shipped in 5.4 (not to be used in production), along with some improvements to the JSON functions.
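The three 5.4 features above can be sketched in a few lines. The class names and the greet() method are made up for illustration; gimmieArray() is borrowed from the example in the text. Note that instanceof does not recognise a trait, since a trait is not a type, while class_uses() lists the traits a class pulls in.

```php
<?php
// A trait: methods the compiler effectively copies into each class that uses it
trait Greets
{
    public function greet()
    {
        return 'Hello from ' . get_class($this);
    }
}

class Author
{
    use Greets;
}

class Speaker
{
    use Greets;

    public function gimmieArray()
    {
        return ['first', 'second']; // short array syntax, new in 5.4
    }
}

$speaker = new Speaker();
echo $speaker->greet(), "\n";          // Hello from Speaker
echo (new Author())->greet(), "\n";    // Hello from Author
echo $speaker->gimmieArray()[0], "\n"; // first (function array dereferencing)
var_dump($speaker instanceof Greets);  // bool(false): a trait is not a type
print_r(class_uses($speaker));         // lists the traits the class uses
```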

As a whole it was a great talk from the general, who finished up by pitching to the delegates to contribute. He showed us just how easy it is to pull out a random bug report, or to see the untested code areas in a test coverage report. With only around 15-20 frequent core contributors, PHP really needs to get as many people on board as possible. PHP needs you!

Ian Barber - Teaching your machine to find fraudsters

Slides: speakerdeck.com
Twitter: @ianbarber
Website: PHP/ir

Ian is the development manager at Virgin Management in London. I absolutely love hearing this guy speak; I think I'd go as far as to say he's the best technical speaker I've had the pleasure to listen to. I've been hugely inspired by his previous talks, such as his ZeroMQ talk, and he did a great keynote at PHPNW last year. He's clear, informative, engages with the audience well, and the man knows his shit. You can pick up some great bits of wisdom from his blog, PHP/ir. Anyway, the talk. Ian's talk centred on the PHP Support Vector Machine extension and how it can be used to predict... wait for it... the future! Yep, Marty McFly and Mystic Meg step aside, Ian's got this covered. He spoke about how you can apply a support vector machine (a machine learning method) to problems like transaction fraud or spam prevention. The idea is to train your machine by feeding it a whole bunch of example cases. Each case is described by several pieces of data, which are analysed to build a classification. A good example when preventing fraud might be whether the buyer has ever bought anything before, or whether they have a positive user rating. All these pieces of data are bundled together, and an outcome is scored as positive or negative based on patterns learned from previous results. Any false positives (or false negatives) are passed back into the machine to train it further. I hope that makes sense; either way, I'd advise you to check out Ian's blog for a better (more in-depth) explanation of how the SVM extension works. As always, Ian never fails to impress. You can always expect to learn something new from this guy.
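The PHP SVM extension's own API isn't reproduced here. Instead, as a rough stand-in, here is a toy perceptron (a much simpler linear classifier than an SVM) in plain PHP that shows the same train/predict loop Ian described. The two features (has bought before, user rating from 0 to 5), the sample data and the function names are hypothetical.

```php
<?php
// Toy perceptron used as a stand-in for the SVM extension: it is NOT the svm API.
// Each sample is [features, label]; label +1 = genuine, -1 = fraud.
// Hypothetical features: [has bought before (0/1), user rating (0-5)].
function score(array $weights, $bias, array $features)
{
    $sum = $bias;
    foreach ($features as $i => $x) {
        $sum += $weights[$i] * $x;
    }
    return $sum;
}

function train(array $samples, $epochs = 20)
{
    $weights = array_fill(0, count($samples[0][0]), 0.0);
    $bias = 0.0;
    for ($e = 0; $e < $epochs; $e++) {
        foreach ($samples as list($features, $label)) {
            // Misclassified (or on the boundary): nudge the model towards the right label
            if ($label * score($weights, $bias, $features) <= 0) {
                foreach ($features as $i => $x) {
                    $weights[$i] += $label * $x;
                }
                $bias += $label;
            }
        }
    }
    return [$weights, $bias];
}

function predict(array $model, array $features)
{
    list($weights, $bias) = $model;
    return score($weights, $bias, $features) > 0 ? 1 : -1;
}

$samples = [
    [[1, 5], 1], [[1, 4], 1], [[0, 4], 1],    // genuine buyers
    [[0, 0], -1], [[0, 1], -1], [[1, 0], -1], // past fraud
];
$model = train($samples);
echo predict($model, [1, 3]), "\n"; // 1
```

To feed a misclassified transaction back in, append it to $samples with its correct label and call train() again. An SVM differs mainly in how it picks the separating boundary (maximising the margin between the classes), so treat this purely as an illustration of the workflow.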

Davey Shafik - PHP 5.4 The new bits

Slides: speakerdeck.com
Twitter: @dshafik
Website: daveyshafik.com/

I was really keen to catch this talk, as I've read a fair bit about 5.4 and some of the really complicated quirks coming with it. Davey did a general run-through of the small changes, such as new and removed php.ini settings. Then he went on to talk about closures and how they change in 5.4. Closures can now reference the object they're created in ($this). I can see how this would be useful, as it stops you having to copy $this into another variable and pass that in with the 'use' keyword. But it means the closure is bound to that object's scope. However, this isn't a problem: 5.4 ships with tools (Closure::bind() and Closure::bindTo()) that allow you to break out of that scope and rebind. Confusing, eh! Davey mentioned (and I completely agree) that it's very difficult to see a use case for this. One thing I don't like about this feature (and I know it's already possible using PHP's Reflection API) is that it allows you to easily access private methods and properties from another context. Check out Davey's blog for some examples of this, and my thoughts about it in the comments section. So the general message is that you can do some really nasty things with this feature, which you may find useful when you understand what's happening, but it can be a sure way to get your colleagues to bitchslap you when they have to pick up your code.
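A short sketch of both behaviours, assuming PHP 5.4 or later; the Talk and Wallet classes are hypothetical. The first closure picks up $this automatically because it is created inside a method. The second is an unbound closure that Closure::bind() attaches to an object and its class scope, which is exactly how it reaches a private property from outside the class.

```php
<?php
class Talk
{
    private $title = 'PHP 5.4 The new bits';

    public function titleGetter()
    {
        // Created inside a method, so $this is bound automatically in 5.4+
        return function () {
            return $this->title;
        };
    }
}

$getter = (new Talk())->titleGetter();
echo $getter(), "\n"; // PHP 5.4 The new bits

class Wallet
{
    private $balance = 100;
}

$peek = function () {
    return $this->balance;
};

// Rebind $this to a Wallet and the scope to the Wallet class,
// which grants access to its private members
$peekIntoWallet = Closure::bind($peek, new Wallet(), 'Wallet');
echo $peekIntoWallet(), "\n"; // 100
```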

Traits were the next big topic, and they're nowhere near as complex as the previous feature. Rasmus summarised them well in his talk: compiler-assisted copy and paste. They're not classes, so don't try to treat or detect them as such; PHP ships a bunch of tools you can use to detect a trait, or a method that originates from a trait. Davey finished off by demoing the new web server shipped with 5.4. It all seems just too easy! You go to your public root on the command line, fire up the server stating which port you want to use (for example, php -S localhost:8000), and you're serving. I can't believe just how easy it is. Davey went through some tips on serving files from the public root when they exist and pushing everything else through your bootstrap (index.php), which you'd typically do with Apache's .htaccess file. So a great talk from Davey, and a fantastic insight into the new features. Glad I sat in on this one. If there's anything negative to say about this talk, it would be that I didn't win a copy of his book (sad face).
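One way to get the .htaccess-style behaviour Davey described is a router script passed to the built-in server. This is a sketch: the file name router.php and the helper isStaticFile() are hypothetical, but returning false from a router script is how the built-in server is told to serve the requested file as-is.

```php
<?php
// router.php (hypothetical name), started from the public root with:
//   php -S localhost:8000 router.php

// True when the request maps to an existing file under $docRoot
function isStaticFile($uri, $docRoot)
{
    $path = parse_url($uri, PHP_URL_PATH); // drops any query string
    return $path !== '/' && is_file($docRoot . $path);
}

// Only route when running under the built-in server
if (PHP_SAPI === 'cli-server') {
    if (isStaticFile($_SERVER['REQUEST_URI'], __DIR__)) {
        return false; // existing file: let the server send it as-is
    }
    require __DIR__ . '/index.php'; // everything else goes through the bootstrap
}
```

The PHP_SAPI check keeps the routing logic from running if the file is ever loaded outside the built-in server.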

Harrie Verveer - Recognising smelly code

Website: http://www.harrieverveer.nl/
Slides: harrie.friends.ibuildings.com/
Twitter: @harrieverveer

Harrie went through a number of tells you can use to detect when your code is becoming smelly. Some great tips here, although personally I didn't learn anything new. I think they're a great set of guidelines to work to, but don't get caught up in them. Remember that quality is relative to time and cost: creating too many abstractions, or not releasing code because you think a method needs to be broken up, could end up costing you (or your employer) more.
Levels of nesting - When you start seeing large amounts of indentation, this typically means you have far too many conditions before a certain snippet of code is executed. It's advised you break routines down into small tasks that describe what they do. An example would be: if an alert has been triggered and the user has an email address, then send an email. Test / perform these checks separately.
Long methods - If you start seeing your method creep over 20 lines, then you're doing something wrong. It's time to break it down into separate methods, or see if you can abstract these routines into a reusable class.
Inline comments - Not sure I entirely agree with this one, but Harrie thinks it's often the case that an inline comment means you're describing the next action, which could typically be broken into a separate routine.
Uncommunicative names - Method names such as "run(), go() or execute()" don't describe what's actually happening. Also try not to be too descriptive, as you may find yourself with 30+ character method names.
Too many parameters - I hate seeing this myself. Once I've started going over 3 input parameters, I begin to question if I'm doing something wrong. The more input parameters, the more dependencies that routine has. Could there be parts of the routine that don't depend on any of the input parameters? Could you use a configuration object? Are there any parts of the routine you're likely to reuse? Think.
The god object - This is one I can relate to. It does make my blood boil (maybe I take my work too seriously) when I see code structured like this. When a single object constructs everything, holds all your data or performs a whole bunch of unrelated methods, we call it the god object. It's something you're going to become dependent on throughout the rest of your application. If you write code this way, seriously, don't talk to me.
New() - Constructing objects within another class's constructor means that each one becomes both a dependency and unchangeable. These objects are now coupled to the class that instantiated them. Harrie demonstrated how dependency injection containers can solve this coupling issue.
Inappropriate intimacy and indecent exposure - OK, you can stop chuckling now. I believe Harrie was trying to make the point that objects should only really know about and deal with themselves. Object A should never depend on a property / method existing in Object B, or even know about it for that matter.
Primitive obsession - Don't use an unsuitable data type when it's clearly time to create a class. People often throw data into multi-dimensional arrays and then have to depend on an offset being there. If this data became an object, you could write simple routines to interact with it. Do so.
Duplicated code - needs no explanation.
Type embedded in the name - Don't create a method called getArticlesArray(), as one day you might want it to return an object (see primitive obsession above). Never describe the dataset being returned in a method name. The same goes for naming properties.
Speculative generality - Code for today's requirements; don't try to cater for every single edge case. It's unlikely you'll ever need to.
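A small sketch of two of the smells above, built on Harrie's alert/email example. The interface and class names are hypothetical. It uses plain constructor injection rather than a dependency injection container, and guard clauses instead of nested ifs.

```php
<?php
interface Mailer
{
    public function send($to, $message);
}

class AlertNotifier
{
    private $mailer;

    // The mailer is injected rather than created with new() in here,
    // so callers can swap in any Mailer implementation
    public function __construct(Mailer $mailer)
    {
        $this->mailer = $mailer;
    }

    // Guard clauses keep each check separate and the happy path unindented
    public function notify($alertTriggered, $email)
    {
        if (!$alertTriggered) {
            return false;
        }
        if ($email === null || $email === '') {
            return false;
        }
        $this->mailer->send($email, 'Alert triggered');
        return true;
    }
}

// A fake mailer for testing, possible only because the dependency is injected
class RecordingMailer implements Mailer
{
    public $sent = [];

    public function send($to, $message)
    {
        $this->sent[] = [$to, $message];
    }
}

$mailer = new RecordingMailer();
$notifier = new AlertNotifier($mailer);
$notifier->notify(true, 'dev@example.com');  // sends
$notifier->notify(false, 'dev@example.com'); // no alert, nothing sent
```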
There is an article by Jeff Atwood on his Coding Horror blog (which I believe to be the source of Harrie's talk; the pictures of cheese are just too much of a coincidence). It outlines a lot of the code smells that Harrie mentioned, plus a few others. Overall it was a great talk, and certainly something useful for the rookies.

Johannes Schlüter - PHP Under the hood

Website: schlueters.de
Slides: schlueters.de
Twitter: @phperror

This was a really interesting talk exposing the workings underneath PHP at the C level. Johannes went through some of the typical micro-optimisation arguments you always hear within the community, such as which is quicker: testing if a module is loaded, or testing if one of the module's functions exists. There are tens, maybe even hundreds, of these debated within the community, and the general message I got from Johannes was that these tests often do different things, so you can't always conclude which is faster. Your efforts are best focused on exactly what it is you're testing for. I have to say I completely agree; people spend far too much time on micro-optimisations ("print is quicker than echo", "isset is quicker than in_array"). Although these things have a place and could be addressed at some point, there really is no point throwing all your toys out of the cot over them. Johannes then went on to cover the hash collision issue that Rasmus mentioned in his talk, going into a bit more detail on the problem before urging people to upgrade anything earlier than PHP 5.3.10.
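The "isset is quicker than in_array" argument illustrates Johannes's point well: the two checks answer different questions, so comparing their speed says little. A short sketch with a hypothetical $config array, adding array_key_exists() for contrast:

```php
<?php
$config = ['debug' => null, 'env' => 'prod'];

var_dump(isset($config['debug']));            // bool(false): key exists, but its value is null
var_dump(array_key_exists('debug', $config)); // bool(true): only asks whether the key exists
var_dump(in_array('prod', $config, true));    // bool(true): searches values, not keys
var_dump(in_array('env', $config, true));     // bool(false): 'env' is a key, not a value
```

Pick the check that matches the question you're asking first; only then does its speed matter.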