Skip to main content

Grokking PHP monolog context into Elastic


An indexed and searchable centralized log is one of those tools that once you've had it you'll wonder how you managed without it. 
 
I've experienced a couple of advantages to using a central log - debugging, monitoring performance, and catching unknown problems.

Debugging

Debugging becomes easier because instead of poking around grepping text logs on servers you're able to use a GUI to contrast and compare values between different time ranges.

A ticket will often include sparse information about the problem and observed error, but if you know more or less when a problem occurred then you can check the logs of all your systems at that time.

Problem behaviour in your application can occur as a result of the services you depend on.  A database fault will produce errors in your application, for example.

If you log your database errors and your application errors in the same central platform then it's much more convenient to compare behaviour between the systems.


I've often found that knowing the application context makes it a lot easier to debug a problem.  I can also more easily reason about the path my code would have taken.  If I'm feeling energetic I can make an attempt to reproduce it and play it back in slow motion with my debugger.

Monitoring performance

Indexing your logs lets you use a query language to search them.  All of the popular logging platforms include a DSL that lets you perform aggregations and other statistical functions on values contained in your logs.

This makes it a lot easier for your team to ask and answer questions about historical behaviour.

You can very easily ask "find me a list of all endpoints that take longer than 0.5 seconds to respond", or "find me all the requests that took longer than two standard deviations above mean to respond to".

Central logging doesn't enable you to answer questions, you could already do that manually.  Central logging makes it easy to answer questions, which encourages you to ask more of them.

Being able to query patterns makes it easier to identify client use patterns that are harmful to your server.  This lets you mitigate them on either the server or the client.

You could notice, for example, that a client is not respecting your cache control headers and is performing a repeat call to the API when it should be relying on a local cache.  This lets you engage the client developers with alternatives and SDK's to help them take advantage of your caching strategy.

Managing verbosity and unknowns

Anomaly detection is a powerful addition to central logging.  Logging is great for analyzing historical performance, but is too verbose to be useful for monitoring in any level of detail. 

Machine learning algorithms can analyze your logs and find outliers in message frequency or values.  They can do this quite cheaply in real time after an initial learning process.  Your central logs can become an early warning system for problems that your customers haven't reported yet.

I'm a fan of Unomaly, but there are lots of other options out there, Splunk and Elastic both include machine learning options for example.

A PHP implementation

So now that you're convinced that central logging will improve your debugging, allow you to identify performance bottlenecks, and make auditing more convenient what is the next step in your PHP project?

Monolog is the de facto standard PHP logging library. Part of its fancy features include the ability to log a contextual array of information, which is provided as a second parameter, like this:

$log->info("Example of something being logged", $visitDetails);

This results in the array being output as a JSON string at the end of your log message and can be very useful indeed.

Some examples of the useful information you can put in the context could be a request id that lets you group log entries from any particular request.  You'd assign a request a unique id when you first start processing it, and then consistently include this id in your logs.

Monolog lets you push logs straight into Elastic, but if we push our logs to a local logstash instance then we save ourselves some network I/O and allow for some buffering in the event that Elastic is down.

Logstash also lets you grok your logs and massage them into a form that Elastic can understand.

Actually writing the grok pattern is a lot like writing a regex pattern, but luckily there are some tools that will greatly speed this up for you.

Take a look at:
  1. https://grokdebug.herokuapp.com/
  2. https://github.com/logstash-plugins/logstash-patterns-core/blob/master/patterns/grok-patterns
  3. https://github.com/kkos/oniguruma/blob/master/doc/RE
Oniguruma lets you use regex in the Grok pattern, like this: (?<field_name>the pattern here)

For my out of the box installation of Monolog the logstash configuration looked like this:

The aim here is to use grok to tokenize your log line into fields. You can then use the json filter to further expand any json fields that you've grokked.

In my example the field "context" is grokked from the context that can be added as "extra data". See the Monolog manual for details.


I was able to get Monolog to push logs to Logstash by using a SockerHandler, like this:

When you mix this up with AWS managed Elasticsearch (and their plugin) you can very quickly get up and running with Kibana.

Example repository is at https://github.com/andybeak/monolog-elk-demo


Comments

Popular posts from this blog

Separating business logic from persistence layer in Laravel

There are several reasons to separate business logic from your persistence layer.  Perhaps the biggest advantage is that the parts of your application which are unique are not coupled to how data are persisted.  This makes the code easier to port and maintain. I'm going to use Doctrine to replace the Eloquent ORM in Laravel.  A thorough comparison of the patterns is available  here . By using Doctrine I am also hoping to mitigate the risk of a major version upgrade on the underlying framework.  It can be expected for the ORM to change between major versions of a framework and upgrading to a new release can be quite costly. Another advantage to this approach is to limit the access that objects have to the database.  Unless a developer is aware of the business rules in place on an Eloquent model there is a chance they will mistakenly ignore them by calling the ActiveRecord save method directly. I'm not implementing the repository pattern in all its ...

"Word of the Day" PHP script (with word list)

I was looking around for a way to generate a word of the day on the web and didn't find anything. So I coded a quick and dirty script to do it. Just in case anybody does a Google search and manages to find my blog: here is my Word of the Day PHP script : Copy this code snippet into a wordoftheday.php file: $file = fopen("interesting_words.txt","r"); $raw_string = fread($file,filesize("interesting_words.txt")); fclose($file); $words_array = explode("|",$raw_string); echo $words_array[array_rand($words_array)]; Of course the real issue I had was finding a list of interesting words in the right format. Here is the list of interesting words that I used: Copy this into a file called interesting_words.txt : ubiquitous : being or seeming to be everywhere at the same time; omnipresent| ecdysiast : a striptease artist| eleemosynary : of, relating to, or dependent on charity| gregious : c...

Using Azure Active directory as an OAuth2 provider for Django

Azure Active Directory is a great product and is invaluable in the enterprise space. In this article we'll be setting it up to provide tokens for the OAuth2 client credentials grant. This authorization flow is useful when you want to authorize server-to-server communication that might not be on behalf of a user. This diagram, by Microsoft, shows the client credentials grant flow. From Microsoft documentation  The flow goes like this: The client sends a request to Azure AD for a token Azure AD verifies the attached authentication information and issues an access token The client calls the API with the access token. The API server is able to verify the validity of the token and therefore the identity of the client. The API responds to the client Setting up Azure AD as an OAuth2 identity provider The first step is to create applications in your AD for both your API server and the client. You can find step-by-step instructions on how to register the applications o...