13 April 2017

Is PHP a good fit for an API server?

Image: Pixabay
Calling PHP a double-claw hammer is a bit of an in-joke in the PHP community.  A lot of people bemoan PHP as a language – it's fashionable to do so and it seems to be a way to look clever.  The joke came about from a blog post where somebody pointed out all of the problems with PHP (here's a rebuttal - https://blog.codinghorror.com/the-php-singularity/ )

Anyway, PHP is a warty language that sucks in academic circles but it doesn't matter because it's really good at web stuff, there are lots of people who know it (so it's cheap to hire), there are lots of libraries and frameworks (so it's cheap and fast to develop in).  The commercial world is willing to overlook the academic warts.

I'm busy helping to improve the performance of an API server.  As part of my effort I'm profiling calls to the endpoints.  I'm using Blackfire to generate call graphs and also logging the sql queries that the ORM is producing so that I can check the indexes and joins.

Here's a callgraph for a call to the endpoint where we are looking to run a paginated SQL query.  We're not applying any business logic or having any side-effects - all we're trying to do is query the database and return a JSON string to the frontend.

Blackfire call graph
That's a pretty substantial call graph for what sounds like a simple task right?  All I want to do is route the request to a controller, query the database, and send the results back.

Blackfire tells me that 172 different MySQL queries are being run.  The PHP code responsible is using the ORM to build up the joins and so on.  I suspect that the problem is that there is pagination being applied and the ORM is not able to optimize the queries it needs to do in order to paginate efficiently.

Okay, so what questions do I have?

Why are we not querying the database more directly?  I appreciate that developer productivity is a good reason to use ORM but is it a good reason in this case?  172 queries is an awful lot, especially when a lot of them are related to querying the schema so our ORM can run.

Why on earth does PHP have to spend so much time in disk I/O reading all of those source files when really what we need is request routing, a database query, and a response handler?  

Blackfire reports that 304kb of data was transmitted across the wire for this.  That seems like a lot of data for the five or six records that I'm returning to the frontend.

The call graph is frustrating – I'm lumbered with a whole lot of black box code and I have no control over the SQL that is being run.  How can I improve the performance of this transaction?

So is PHP the best tool for this job?

I have previously had intractable problems with PHP when it comes to memory management.  It's pretty complicated and it differs depending on the way that PHP is run but I do not have 100% confidence in PHP's garbage collection.  

Circular object references (which I encountered while using an ORM where a model referenced itself as a parent to form a hierarchy) cannot be completely collected by PHP.  PHP actually relies on the container the machine runs in to collect this memory.  

PHP is not built for being a long-running program.  It was never designed for this and it should never be used for this.  It was built to handle a request for a page and then terminate. 

The application is bootstrapped for every request.  How much overhead does this add?  Well there's a question that Blackfire is raising for me.  Take a look at the timeline for the transaction from before:
Blackfire call timeline
The timeline shows when a PHP function was called in relation to the time taken to generate the response. 

My controller function starts at around 750ms into the transaction.  The actual time is irrelevant as a benchmark, but the fact that the first time *my* code runs is half-way into the transaction is what is relevant.   

Until halfway into my application I've been waiting for PHP to bootstrap my application.  You could argue that this is because of the PHP framework I'm using, but actually it is the limitation of PHP not being able to maintain state that requires us to continuously bootstrap the application.

Bootstrapping our application might involve disk I/O (depending on OpCache).  It definitely involves network I/O because we have to connect to MySQL and wait for it to authenticate us.  I know that there are ways to improve this, like by not using a framework and by tuning OpCache to improve compile time.

I'm concerned about what will happen when the application has 50,000 concurrent users.  How much of a strain will it place on my database server to be constantly connecting (and authenticating)?

I think PHP is brilliant at web pages and not so good at being a long-running application that is capable of reusing resources.  I'm a huge PHP fan but as an architect I do not want it to be my only tool.  

I'm busy learning Elixir and the Phoenix framework (again with the frameworks!) response in microseconds (not milliseconds).  I don't think we should using PHP like the hammer we use for everything.