24 September 2016

Connecting to Elixir web channels from the Angular 2 quickstart application

I am busy learning Elixir, a language that adds syntactic sugar to the awesomely scalable and concurrent Erlang language.  The "go to" framework in Elixir is Phoenix and I'm busy writing my "hello world" application which will serve up data across a web channel.

I followed the TypeScript version of the quickstart guide for Angular 2.  I really like what I've seen of TypeScript so far.  Dependencies are easy to manage, and the ability to define interfaces is a good sign of how well structured the language is.  I think Satya Nadella should be made an open source hero, if such an award exists.

Anyway, what I wanted to do was get my Angular 2 application to be able to connect to the Elixir server channel and send a request to listen to a particular stream.  The idea is to use the Actor concurrency model (explained brilliantly in "The Little Elixir & OTP Book") to start up a new process whenever a request for a stream arrives.  This article focuses on setting up the Angular 2 connection.

The official Phoenix JavaScript client is packaged up and can be installed with "npm install --save phoenix".  Once it's installed we need to tell SystemJS where to find it, so we amend systemjs.config.js and include it in our map object.
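The map entry might look something like this (a sketch based on the quickstart's default systemjs.config.js layout; only the 'phoenix' line is new):

```javascript
// systemjs.config.js (excerpt) — the surrounding entries are the Angular 2
// quickstart defaults; the 'phoenix' line tells SystemJS where npm put the
// Phoenix client
var map = {
  'app':      'app',
  '@angular': 'node_modules/@angular',
  'rxjs':     'node_modules/rxjs',
  'phoenix':  'node_modules/phoenix'
};
```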

Now we'll be able to import the Phoenix classes from it wherever we need to. We'll need them in the service that we're going to use to wrap the Phoenix channel support. Let's take a look…

We import the library using the map name that we set up in our systemjs config to make it available to our class.  We then copy the code that Phoenix shows on the channels manual page to actually handle the connection.
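The service ends up looking something like this. This is a framework-free sketch rather than the post's actual code: the Socket constructor is injected so the sketch stands alone, whereas the real service would `import { Socket } from 'phoenix'` and carry Angular's @Injectable() decorator. The endpoint and topic names are placeholders.

```typescript
// Structural type for the slice of the Phoenix socket API this sketch uses.
interface PhoenixSocket {
  connect(): void;
  channel(topic: string, params: object): { join(): any };
}

export class ChannelService {
  private socket: PhoenixSocket | null = null;

  // In the real app, socketCtor would be Phoenix's Socket class.
  constructor(private socketCtor: new (url: string) => PhoenixSocket) {}

  // Connect and join a topic, following the shape of the connection
  // example in the Phoenix channels guide.
  connect(url: string, topic: string) {
    const socket = new this.socketCtor(url);
    this.socket = socket;
    socket.connect();
    return socket.channel(topic, {}).join();
  }
}
```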

This gives us a channel service that we'll be able to inject into components.

Angular 2 uses constructor injection, so we'll pass the channel service into the component's constructor. Before we do that, though, we need to register the channel service as a provider on the component so that the injected variable can be properly typed. Once all this is saved, run your app with 'npm start' and pop over to your browser window: you should see error messages in the console saying that the channel connection was refused, unless of course you're already running your Phoenix server.

06 September 2016

Laravel refusing to start up

I'm very much a fan of the clean implementation of Laravel but really dislike the fact that if there is something wrong with the .env file it refuses to give any meaningful information.

Laravel uses the vlucas/phpdotenv package to manage its environment.

It's pretty well known that if you have a space on either side of the = sign in a key-value pair then the .env file is invalid, but I had checked for this (and checked again).

Laravel will try to use its standard logging methods before they have actually had a chance to be booted, with the result that you're left with a "reflection error" exception message on the CLI rather than the actual cause of the problem in the dotenv package.

Debugging this is not trivial and I resorted to using strace to try and determine exactly what was going on.  Don't do this at home kids!  The easier solution is at the end of the article.

I used the following command to generate a trace of the system calls being made by PHP while trying (and failing) to run artisan optimize.

 strace php artisan optimize &> /tmp/strace.txt  

That let me walk through the calls and eventually confirm that the first PHP exception was thrown in the package that deals with reading the environment file.

 access("/var/www/raffle.bhf.org.uk.new/vendor/vlucas/phpdotenv/src/Dotenv.php", F_OK) = 0  
 ... more lines loading up more of the package and showing us processing it ....  
 access("/var/www/raffle.bhf.org.uk.new/vendor/vlucas/phpdotenv/src/Exception/InvalidFileException.php", F_OK) = 0  

But sadly there was no indication of exactly what the problem was with the file!

I decided that creating a minimal project dedicated to debugging my .env file was going to be faster than anything else.

I created a temporary directory and ran "composer require vlucas/phpdotenv".  Then I placed my faulty .env file into the directory and ran the following PHP file:

 require __DIR__ . '/vendor/autoload.php';  
 $dotenv = new Dotenv\Dotenv(__DIR__);  
 $dotenv->load();  

This gave me the actual exception in DotEnv which was that "Dotenv values containing spaces must be surrounded by quotes".  So it wasn't a space around the = sign but rather a space in one of my values, which made my life a lot easier!  As an extra bonus the first line in the stack showed exactly which key was problematic.
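For illustration, this is the kind of line that triggers that exception (the key name is made up), together with the quoted form that fixes it:

```
# Throws InvalidFileException: values containing spaces must be quoted
MAIL_FROM_NAME=Acme Raffle

# Fine
MAIL_FROM_NAME="Acme Raffle"
```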

21 August 2016

Solving a Docker in VirtualBox DNS issue

I've recently been playing with Docker on Windows in conjunction with Linux on Windows.  I'm really amazed at how cool the stuff coming out of Microsoft is under Satya Nadella.  

When I was using Docker Toolbox on Windows my Dockerfiles would build correctly, but as soon as I tried to run them in a VirtualBox host they would fail with the error "Something wicked happened resolving 'archive.ubuntu.com:http' (-5 - No address associated with hostname)".

Of course this error wasn't distribution specific and none of the distros I tried were working.

The stack that I am using is Windows Home hosting Ubuntu on VirtualBox which is running Docker.  I'm using bash on Linux for Windows because it's easier to do stuff like ssh but it's not relevant to this setup.

I tried setting the DNS in the Docker setup by using RUN steps to update /etc/resolv.conf.  This felt a bit hacky and didn't work anyway.

In the end the fix was to go to my Windows shell and run ipconfig to get its DNS servers.  I then went and edited /etc/default/docker in my VirtualBox guest server and set the Docker options to use this DNS.  After restarting the docker service it was able to resolve DNS properly.

 # Use DOCKER_OPTS to modify the daemon startup options.  
 DOCKER_OPTS="--dns <dns-server-from-ipconfig>"  

I had installed Docker with the package but I think if you install with binary you would set the DNS when you start up the daemon.

Ironically Docker was easier to use with the Windows toolkit but I'll be deploying to Linux machines so I wanted to get this working under Ubuntu.

12 July 2016

Limiting how often you run seeds and migrations in Laravel 5.2 testing

Image: Pixabay
If integration tests take too long to run then we stop running them because they're just an annoyance.  This is obviously not ideal and so getting my tests to run quickly is important.

I'm busy writing a test suite for Laravel and wanted more flexibility than the built-in options that Laravel offers for working with databases between tests.

Using the DatabaseMigrations trait meant that my seed data would be lost after every test.  Migrating and seeding my entire database for every test in my suite is very time consuming.

Even when using an in-memory SQLite database, doing a complete migration and seed was taking several seconds on my dev box.

If I used the DatabaseTransactions trait then my migrations and seeds would never run, and so my tests would fail because the fixture data would be missing.

My solution was to use a static variable in the base TestCase class that Laravel supplies.  It has to be static so that it retains its value between tests.  This variable is a boolean flag that tracks whether we need to run the migrations and seeds.  We initialise the class with the flag set to true, so the migrations and seeds run the first time that any test runs.
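A framework-free sketch of the idea (migrateAndSeed() is a hypothetical stand-in for running the artisan migrate and seed commands; the real class extends Laravel's base TestCase):

```php
<?php
// Sketch of the static-flag pattern. In the real suite this lives in the
// base TestCase that Laravel supplies.
abstract class TestCase
{
    /** Static so the flag survives across test-case instances. */
    protected static $needsRefresh = true;

    public function setUp()
    {
        if (static::$needsRefresh) {
            $this->migrateAndSeed();       // only runs before the first test
            static::$needsRefresh = false;
        }
    }

    // Hypothetical helper: would run `artisan migrate` and `artisan db:seed`.
    abstract protected function migrateAndSeed();
}
```

Setting `static::$needsRefresh = true` from any test forces a full migrate-and-seed before the next test runs.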

Now the fact that I'm using the Laravel DatabaseTransactions trait should spare me from affecting the database between tests, but if I wanted to be 100% certain I could set the flag and have my database refreshed when the next test runs.

This also means that if the DatabaseTransactions trait is not sufficient (for example when I'm using multiple database connections) then I can manually refresh whenever I want to.

06 July 2016

Implementing a very simple "all of the above" checkbox


Implementing a checkbox that lets the user "select all of the above" is trivial with jQuery.

The idea is that when the user selects the "all" checkbox then all of the other checkboxes are set to the same value. Similarly if they deselect any of the other options we need to turn off the "all" checkbox.
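The wiring is only a few lines of jQuery (a sketch: the #select-all and .option selectors are assumptions about the markup):

```javascript
// When "all of the above" changes, mirror its state onto every option
$('#select-all').on('change', function () {
  $('.option').prop('checked', this.checked);
});

// Deselecting any single option means "all of the above" no longer holds
$('.option').on('change', function () {
  if (!this.checked) {
    $('#select-all').prop('checked', false);
  }
});
```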

28 June 2016

Why am I so late on the Bitcoin train?

Image: Pixabay
I've been somewhat of a Bitcoin sceptic for quite some time.  When it first became a thing I was worried that governments would legislate it out of existence.

It has had a pretty bad rap of being associated with the dark web and it is definitely the choice of currency for malware authors.

In its normal usage Bitcoin is more transparent than cash.  If I give you a cash note there is no permanent record of the transaction and the tax man can't get a sniff into our business.

Governments hate transactions they can't tax or police and so in the beginning there was a concern that Bitcoin would be outlawed.

In contrast to cash, if I transfer you Bitcoin then there is a record of the transaction that anybody in the world can inspect.  It's possible to trace the coins in your Bitcoin wallet back through the various people who owned them.  Anybody in the world can watch the contents of your wallet and see where you spend your money.

This is exactly the sort of thing that governments love.

Of course not everybody wants to share their transactions with the world and so there are Bitcoin laundering services that attempt to anonymise the coins in your wallet.  This puts us back to square one with Bitcoin being very convenient for criminals to use in order to evade financial intelligence controls.

I suspect that the process of banning Bitcoin transactions would impinge too much on the freedom of citizens.  Some governments are talking about banning cryptography in order to maintain surveillance on their citizens so it's not a stretch to imagine them being displeased with Bitcoin laundering services.  *sigh*.

Anyway, back to my killer app for Bitcoin... I've recently emigrated and am still paying off debt in South Africa.  Sending money back to South Africa costs about £30 and takes 2-4 days if I use the banking system.  If I use Bitcoin the process costs around £3 and I can have the money in my South African bank account on the very same day.

Bitcoin costs one tenth the price of using banks and is at least twice as fast.

My transaction is not at all anonymous and a government can trace the funds to me on either end of the transaction where copies of my passport are stored with the exchanges.  If I wanted to hide this from thieves, the government, spear-phishers, and other people who want to take my money without giving anything in return then I would use a Bitcoin laundry service.

10 June 2016

Restarting BOINC automatically

Image: https://boinc.berkeley.edu, fair use
BOINC is a program curated by the University of California, Berkeley that allows people around the world to contribute to science projects.

It works by using spare cycles from your computer to perform calculations that help do things like folding proteins to find candidates for cancer treatment, mapping the Milky Way galaxy, searching for pulsars, and improving our understanding of climate change and its effects.

It runs as a background process and is easily configured to only run in certain conditions - like when you haven't used your computer for 10 minutes for example.

It comes with a nifty GUI manager and for most people using it on their desktop this post is not going to be at all relevant.  This post deals with the case where a person is running it on a server without the GUI manager.

Anyway, the easiest solution I found to restarting BOINC on a headless server was to use supervisord.  It's pretty much the "go to" tool for simple process management and adding the BOINC program was as easy as would be expected:

Here's the program definition from my /etc/supervisord.conf file:

 [program:boinc]  
 command=sh /root/boinc/startup.sh  

I use a script to restart BOINC because I want to make sure that I get reconnected to my account manager in case something goes wrong.

Here's what /root/boinc/startup.sh script looks like:

 /etc/init.d/boinc-client start  
 sleep 10  
 boinccmd --join_acct_mgr http://bam.boincstats.com <user> <pass>  

If BOINC crashes it will automatically get restarted and reconnected to my account manager.  This means I don't need to monitor that process on all the servers I install it on.

01 June 2016

Associating Vagrant 1.7.2 with an existing VM

My Vagrant 1.7.2 machine bugged out and when I tried to `vagrant up` it spawned a new box instead of bringing up my existing machine.

Naturally this was a problem because I had made some manual changes to the config that I hadn't had a chance to persist to my puppet config files yet.

To fix the problem I used the command `VBoxManage list vms` in the directory where my Vagrantfile is.  This provided me with a list of the machine images it could find.

I then went and edited the file at .vagrant/machines/default/virtualbox/id and replaced the UUID that was in there with the one that the VBoxManage command had output.
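The whole fix looks roughly like this (the machine name and UUID are placeholders for whatever VBoxManage prints on your machine):

```
$ VBoxManage list vms
"myproject_default" {f1aa9c7c-0000-0000-0000-000000000000}

$ echo -n 'f1aa9c7c-0000-0000-0000-000000000000' > .vagrant/machines/default/virtualbox/id
$ vagrant up
```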

Now when I run 'vagrant up' it spins up the correct VM.  Happy days.

27 May 2016

Redirecting non-www urls to www and http to https in Nginx web server

Image: Pixabay
Although I'm playing with Elixir and its HTTP servers like Cowboy at the moment, Nginx is still my go-to server for production PHP.

If you haven't already swapped your web server from Apache then you really should consider installing Nginx on a test server and running some stress tests on it.  I wrote about stress testing in my book on scaling PHP.

Redirecting non-www traffic to www in nginx is best accomplished by using the "return" verb.  You could use a rewrite but the Nginx manual suggests that a return is better in the section on "Taxing Rewrites".

Server blocks are cheap in Nginx and I find it's simplest to have two redirects for the person who arrives on the non-secure non-canonical form of my link.  I wouldn't expect many people to reach this link because obviously every link that I create will be properly formatted so being redirected twice will only affect a small minority of people.

Anyway, here's the config:
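Here's a sketch of that config (example.com is a placeholder and the ssl_certificate directives are omitted for brevity):

```nginx
# 1) anything on http goes to https, keeping the host as-is
server {
    listen 80;
    server_name example.com www.example.com;
    return 301 https://$host$request_uri;
}

# 2) https on the bare domain goes to the www form
server {
    listen 443 ssl;
    server_name example.com;
    return 301 https://www.example.com$request_uri;
}

# the canonical site
server {
    listen 443 ssl;
    server_name www.example.com;
    # ... actual site configuration ...
}
```

A visitor arriving on http://example.com passes through both redirects, which is the two-hop case described above.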

11 May 2016

Logging as a debugging tool

Image: https://www.pexels.com
Logging is such an important part of my approach to debugging that I sometimes struggle to understand how programmers avoid including logging in their applications.

Having sufficiently detailed logs enables me to avoid having to make assumptions about variable values and program logic flow.

For example, when a customer wants to know why their credit card was charged twice I want to be able to answer with certainty that we processed the transaction only once and be able to produce the data that I sent to the payment provider.

I have three very simple rules for logging that I follow whenever I'm feeling like being nice to future me.  If I hate future me and want him to spend more time answering queries than is needed then I forget these rules:

  1. The first command in any function I write is a debug statement confirming entry into the function.
  2. Any time the script terminates with an error, the error condition is logged along with the exception message and variable values if applicable.
  3. When I catch an exception I log it as a debug message in the place that I catch it rather than letting it bubble up the stack.  If I'm the one throwing the exception then I log in the place where I throw it.

These rules have grown on me from the experience of debugging code and having to deal with an assortment of customer queries that have been escalated to the development team.

By logging the entry into functions I can go back on my logs and see the path that a request took through the code.  Instead of wondering how a particular state of execution came about I have a good trail of functions that led me to that point.  

To me errors that fail silently are the worst possible errors.  I don't expect that the user needs to be alerted to every error or its details, but if I am unable to send a transactional email then I expect there to be a log of that fact.  That might sound self-evident, but I've recently worked on a project where we sent mail without checking whether sending succeeded.  Failures only happened occasionally and we only noticed something was amiss when a customer complained.  I was not able to determine when the problem arose, how often it happened, or to whom I should resend transactional mails along with an apology.

Logging an exception in the place I catch it has consistently proven to be helpful.  Having a log as close as possible to the source of the error condition helps to narrow down the stack that it occurred in.  This is especially valuable when I rethrow the exception with a user friendly message because I don't lose the technical details of the program state.

Because my logs can get very spammy in production I use the "Fingers Crossed" feature of Monolog.  I prefer this to the alternative of increasing the bar for logging to "info" and above because when an error occurs then I have a verbose track of my program state.  I've created a gist showing the setup in Laravel 5.1 and 5.2 but the approach will work anywhere that you use Monolog.
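Stripped of the Laravel wiring, the shape of a fingers-crossed setup looks roughly like this (the log path and channel name are placeholders; requires the monolog/monolog package):

```php
<?php
require __DIR__ . '/vendor/autoload.php';

use Monolog\Logger;
use Monolog\Handler\StreamHandler;
use Monolog\Handler\FingersCrossedHandler;

// Buffer everything from DEBUG up, but only flush the buffer to the
// stream once something at ERROR or above comes through.
$stream  = new StreamHandler('/var/log/app.log', Logger::DEBUG);
$handler = new FingersCrossedHandler($stream, Logger::ERROR);

$log = new Logger('app');
$log->pushHandler($handler);

$log->debug('held in memory');   // nothing written yet
$log->error('boom');             // flushes both lines to the log
```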

Another useful trick I've learned is to integrate my application errors into my log aggregating platform.  I use Loggly to aggregate my logs and push application error messages to it.  This lets me easily view my application errors in the context of my various server logs so spotting an nginx error or something in syslog that could contribute to the application problem is a lot easier.  The gist that I linked above shows my Loggly setup, but you can also read their documentation.

Useful and appropriate logging is an indispensable tool for debugging and if you're not working on developing your own logging style to support your approach to debugging then hop on it!

08 April 2016

Are tokens enough to prevent CSRF?

Image: Pixabay
CSRF attacks exploit the trust that a website has in a client like a web browser.  These attacks rely on the website trusting that a request from a client is actually the intention of the person using that client.

An attacker will try to trick the web browser into issuing a request to the server.  The server will assume that the request is valid because it trusts the client.

At its most simple a CSRF attack could involve making a malicious form on a webpage that causes the client to send a POST request to a URL.

As an example, imagine that a user called Alice is logged into Facebook in one tab and is browsing the internet in another tab.  A filthy pirate Bob creates a malicious form in a webpage that submits a POST request to Facebook to share a link of Rick Astley dancing.  Alice arrives on Bob's page and Javascript submits the form to Facebook.  Facebook trusts Alice's web browser and there is a valid session for her, so it processes the request.  Before she knows it her Facebook status is a link to Rick Astley (who, by the way, will never give you up).

Of course Facebook is not vulnerable to this, and neither should your code be.

The best way to mitigate CSRF attacks is to generate a cryptographically random token which you store in Alice's session.  You then make sure that whenever you output a form on your site you include this token in the form.  Alice will send the token whenever she submits the form and you can compare it to the one stored in her session to make sure that the request originated from your site.

Bob has no way of knowing what the token in Alice's session is and so he can't trick her browser into submitting it to our site.  Our site will get a request from Alice's client but because it doesn't have the token we can reject it.
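A minimal sketch of the token scheme in PHP (function names are illustrative; frameworks like Laravel ship an equivalent built in, and the code assumes session_start() has already been called):

```php
<?php
// Generate the per-session token, creating it on first use.
function csrf_token(): string
{
    if (empty($_SESSION['csrf_token'])) {
        // Unguessable by Bob: 32 cryptographically random bytes
        $_SESSION['csrf_token'] = bin2hex(random_bytes(32));
    }
    return $_SESSION['csrf_token'];
}

// Compare a submitted token against the session's token.
function csrf_check(string $submitted): bool
{
    return isset($_SESSION['csrf_token'])
        && hash_equals($_SESSION['csrf_token'], $submitted); // constant-time compare
}
```

Render csrf_token() into a hidden field on every form and call csrf_check() on every POST before doing any work.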

In other words the effect of the token is to stop relying on implicit trust for the client and rather set up a challenge response system whereby the client proves it is trustworthy.  If Bob wants to send a request that will be accepted he must find a way to read a token off a form that your site has rendered for Alice.  This is not a trivial task but can possibly be done - there are very creative ways (like this attack) to abuse requests.

Another way to prevent CSRF is to rely on multi-factor authentication.  We can group ways to authenticate into knowledge (something you know, like a password), possession (something you have, like a USB dongle), and inherence (something you are).

Instead of relying on just one of these mechanisms we can use two (or more) in order to authenticate.  For example we can ask a person for a password and also require that they enter a code sent to their mobile phone, which proves they possess the phone linked to their account.

CSRF will become much harder for Bob to accomplish if our form is protected with multi-factor authentication (MFA).  Of course this comes with a user experience cost so only critical forms need to be protected with MFA.  For less critical forms the single authentication method of a CSRF token will suffice.

There is debate around whether checking that the referrer header matches your site is helpful in deterring CSRF.  It is true that it is trivial to spoof this header in a connection that you control.  However it is more difficult to get this level of control in a typical CSRF attack, where browsers will rewrite the referrer header in an ajax call (see the specification).  By itself it is not sufficient to deter CSRF, but it can raise the difficulty level for attackers.

Cookies should obviously not be used to mitigate CSRF.  They are sent along with any request to the domain whether the user intended to make the request or not.

Setting a session timeout window can help a little bit as it will narrow the window that requests will be trusted by your application.  This will also improve your session security by making it harder for fixation attacks to be effective.

Tokens are the most convenient way to make CSRF harder to accomplish on your site.  When used in conjunction with referrer checks and a narrow session window you can make it significantly harder for an opponent to accomplish a successful attack.

For critically important forms multi-factor authentication is the way to go.  It interrupts the user experience and enforces explicit authentication.  This has a negative effect on your UX but makes it impossible (I think!) for an automated CSRF attack to be effective.

11 March 2016

Exploring Russian Doll Caching

This technique was developed in the Ruby community and is a great way to approach caching partial views. In Ruby rendering views is more expensive than in PHP, but this technique is worth understanding as it could be applied to data models and not just views.

In the Ruby world Russian Doll caching is synonymous with key-based expiration caching.  I think it's useful to rather view the approach as being the blend of two ideas.  That's why I introduce key-based expiration separately.

Personally I think Russian Dolls are a bit of a counter-intuitive analogy.  Real life Russian Dolls each contain one additional doll, but the power of this technique rests on the fact that "dolls" can contain many other "dolls".  I find the easiest way to think about it is to say that if a child node is invalidated then its siblings and their children are not affected.  When the parent is regenerated those sibling nodes do not need to be rendered again.

Cache Invalidation

I use Laravel, which luckily allows tagging cache entries as a way of grouping them.  I've developed the habit of tagging my cache keys with the name of the model; then whenever I update the model I invalidate the tag, which clears out all the related keys for that model.

In the absence of the ability to tag keys the next best approach to managing cache invalidation is to use key-based expiration.

The idea behind key-based expiration is to change your key name by adding a timestamp.  You store the timestamp separately and fetch it whenever you want to fetch the key.

If you change the value in the key then the timestamp changes.  This means the key name changes and so does the stored timestamp.  You'll always be able to get the most recent value, and Memcached or Redis will handle expiring the old key names.

The practical effect of this strategy is that you must change your model to update the stored timestamp whenever you write to the cache.  You also have to retrieve the current timestamp whenever you want to get something out of the cache.
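The mechanics can be sketched against a plain array standing in for Memcached or Redis (function names are illustrative):

```php
<?php
// Key-based expiration: the version is stored separately AND baked into
// the real key name, so every write effectively creates a new key.
function cache_write(array &$cache, string $key, $value): void
{
    $version = bin2hex(random_bytes(8));     // fresh version on every write
    $cache["{$key}:version"] = $version;     // the separately stored "timestamp"
    $cache["{$key}:{$version}"] = $value;    // version baked into the real key
}

function cache_read(array &$cache, string $key)
{
    if (!isset($cache["{$key}:version"])) {
        return null;                         // never written
    }
    $version = $cache["{$key}:version"];     // always fetch the version first
    return $cache["{$key}:{$version}"] ?? null;
}
```

Old versioned keys are simply never read again; a real Memcached or Redis backend evicts them through its normal expiry.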

Nested view fragments, nested cache structure

Typically a page is rendered as a template which is filled out with a view.  Blocks are inserted into the view as partial views, and these can be nested.

The idea behind Russian Doll caching is to cache each nested part of the page.  We use cache keys that mimic the frontend nesting.

If a view fragment's cache key is invalidated then all of the wrapping items' keys are also invalidated.  We'll look at how to implement this in a moment.

The wrapping items are invalidated, so will need to be rendered, but the *other* nested fragments that have not changed still remain in the cache and can be reused.  This means that only part of the page will need to be rendered from scratch.

Implementing automatic busting of the containing layers

We can see that the magic of Russian Doll caching lies in the ability to bust the caches of the wrapping layers.  We'll use key-based expiration together with another refinement to implement this.

The actual implementation is non-trivial and you'll need to write your own helper class.  There are GitHub projects like Corollarium which implement Russian Doll caching for you.

In any case, let's outline the requirements.

Let's have a two-level cache, for simplicity, that looks like this:

Parent (version 1)
- Child (version 1)
- Child (version 1)
- Child (version 1)

I've created a basic two-tier cache where every item is at version 1, freshly generated.  Expanding this to multiple tiers requires letting child nodes act as parents, but while I'm talking through this example let's constrain ourselves to just one parent and multiple children.

Additional cache storage needs

First, let's define our storage requirements.

We want keys to be automatically invalidated when they are updated and key-based expiration is the most convenient way to accomplish this.

This means that we'll have a value stored for each of them that holds the most recent version.  Currently all of these values are "version 1".

In addition to storing the current version of each key we will also need to store and maintain a list of dependencies for the key.  These are cache items which the key is built from.

We need to be certain that the dependencies have not changed since our current item was cached.  This means that our dependency list must store the version that each dependency was at when the current key was generated.

The parent node will need to store its list of dependencies and the version that they were when it was cached.  When we retrieve the parent key we need to check its list of dependencies and make sure that none of them have changed.
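That freshness check can be sketched like this (the data shapes are illustrative):

```php
<?php
// The parent's cache entry records each child's version at build time;
// if any child has moved on, the parent is stale.
function parent_is_fresh(array $dependencyVersions, array $currentVersions): bool
{
    foreach ($dependencyVersions as $childKey => $versionWhenCached) {
        if (($currentVersions[$childKey] ?? null) !== $versionWhenCached) {
            return false;   // this child changed since the parent was cached
        }
    }
    return true;
}
```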

Putting it together

Now that we've stored all the information we need to manage our structure, let's see how it works.

Let's say that one of the children changes and is now version 2.  As part of writing the new value, our key-based expiration implementation updates the key that stores the child's current version.

On the next page render our class will try to pull the parent node from cache.  It first inspects the dependency list and it realises that one of the children is currently on version 2 and not the same version it was when the parent was cached.

We invalidate the parent cache object when we discover that a dependency has changed, which means we need to regenerate the parent.  You may want to implement a dogpile lock here if you're expecting concurrent requests for the page.

Only the child that has changed needs to be regenerated, and not the other two.  So the parent node can be rebuilt by generating one child node and reading the other two from cache.  This obviously results in a much less expensive operation.

03 February 2016

Working with classic ASP years after it died

I searched for "dead clown" but all the pictures were too
disturbing.  I suppose that's kind of like the experience
of trying to get classic ASP up and running with today's libraries.

I'm having to work on a legacy site that runs on classic ASP.  The real challenge is trying to get the old code to run on my Ubuntu virtual machine.

There is a lot of old advice on the web and most of it was based on much older software versions, but I persevered and have finally managed to get classic ASP running on Apache 2.4 in Ubuntu.

The process will allow you to have a shot at getting your code running, but my best advice is to use a small Windows VM.  There's no guarantee that your code will actually compile and run using this solution, and the effort required is hardly worthwhile.

The Apache module you're looking for is Apache::ASP.  You will need to build it manually and be prepared to copy pieces of it to your perl include directories.  You will also need to manually edit one of the module files.

The best instructions I found for getting Apache::ASP installed were on the cpan site.  You'll find the source tarball on the cpan modules download page.

I'm assuming that you're able to install the pre-requisites and build the package by following those instructions.  I was able to use standard Ubuntu packages and didn't have to build everything from source:

 sudo apt-get install libapreq2-3 libapache2-request-perl  

Once you've built and installed Apache::ASP you need to edit your apache.conf file to make sure it's loaded:

 PerlModule Apache2::ASP  
  # All *.asp files are handled by Apache2::ASP  
  <Files ~ (\.asp$)>  
   SetHandler perl-script  
   PerlHandler Apache::ASP  
  </Files>  

If you try to start Apache at this point you will get an error something like Can't locate Apache2/ASP.pm in @INC (you may need to install the Apache2::ASP module)

Unfortunately the automated installs don't place the modules correctly.  I'm not a perl developer and didn't find an easy standard way to add an external path to the include path, so I just copied the modules into my existing perl include path.  You'll find the files it asks for in the directory where you built Apache2::ASP.

The next problem that I encountered was that Apache 2.4 has a different function name to retrieve the ip of the connecting request.  You'll spot an error in your log like this : Can't locate object method "remote_ip" via package "Apache2::Connection".

The bug fix is pretty simple and is documented at cpan.  You'll need to change line 85 of StateManager.pm.  You'll find the file in the directory where you copied the modules into the perl include directory, and its location is in your error log.

 # See https://rt.cpan.org/Public/Bug/Display.html?id=107118  
 # Replace line 85, which reads:  
 #     $self->{remote_ip}     = $r->connection()->remote_ip();  
 # with:  
     if (defined $r->useragent_ip()) {  
         $self->{remote_ip} = $r->useragent_ip();  
     } else {  
         $self->{remote_ip} = $r->connection->remote_ip();  
     }  

Finally after all that my code doesn't run because of compile issues - but known good test code does work.

This is in no way satisfactory for production purposes, but does help in getting a development environment up and running.

25 January 2016

Laravel - Using route parameters in middleware

I'm busy writing an application which is heavily dependent on personalized URLs.  Each visitor to the site will have a PURL which I need to communicate to the frontend so that my analytics tags can be associated with the user.

Before I go any further I should note that I'm using Piwik as my analytics package, and it respects "Do Not Track" requests.  We're not using this to track people, but we are tying it to our clients existing database of their user interests.

I want the process of identifying the user to be as magical as possible so that my controllers can stay nice and skinny.  Nobody likes a fat controller right?

I decided to use middleware to trap all my web requests to assign a "responder" to the request.  Then I'll use a view composer to make sure that all of the output views have this information readily available.

The only snag in this plan was that the Laravel documentation was a little sketchy on how to get the value of a route parameter in middleware.  It turns out that the syntax I was looking for was $request->route()->parameters(), which neatly returns the route parameters in my middleware.

The result is that every web request to my application is associated with a visitor in my database and this unique id is sent magically to my frontend analytics.

So, here are enough of the working pieces to explain what my approach was:
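A minimal sketch of the middleware half looks something like this.  The class name, the "purl" parameter and the "responder" attribute are hypothetical names of my own, not the actual application code:

```php
<?php

namespace App\Http\Middleware;

use Closure;
use Illuminate\Http\Request;

// Hypothetical middleware: pull the personalized-URL route parameter
// off the request and attach the matching visitor for later use.
class IdentifyResponder
{
    public function handle(Request $request, Closure $next)
    {
        // The key piece: route parameters are available in middleware
        // via $request->route()->parameters()
        $parameters = $request->route()->parameters();

        if (isset($parameters['purl'])) {
            // Stash the identifier on the request so a view composer
            // can later share it with every rendered view.
            $request->attributes->set('responder', $parameters['purl']);
        }

        return $next($request);
    }
}
```

Register the middleware on the web routes, and the view composer only has to read the attribute back off the request.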

19 January 2016

Using OpenSSH to setup an SFTP server on Ubuntu 14.04

I'm busy migrating an existing server to the cloud and need to replicate the SFTP setup.  They're using a password to authenticate a user and then uploading data files for a web service to consume.

YMMV - My use case is pretty specific to this legacy application so you'll need to give consideration to the directories you use.

It took a surprising amount of reading to find a consistent set of instructions so I thought I should document the setup from start to finish.

Firstly, I set up the group and user that I will be needing:

 groupadd sftponly  
 useradd -G sftponly username  
 passwd username  

Then I made a backup copy of /etc/ssh/sshd_config and edited it.

Right at the end of the file add the following:

  Match group sftponly   
    ChrootDirectory /usr/share/nginx/html/website_directory/chroot   
    X11Forwarding no   
    AllowTcpForwarding no   
    ForceCommand internal-sftp -d /uploads   

Note that if this block appears before the UsePAM setting then your sshd_config is broken and you won't be able to connect on port 22.  A Match block extends from its Match line to the end of the file (or to the next Match line), so any global settings placed after it, like UsePAM, get swallowed into the conditional block, where they aren't allowed.

We force the user into the /uploads directory by default when they login using the ForceCommand setting.

Now change the Subsystem setting.  I've left the original as a comment in here.  The parameter "-u 0002" sets the default umask for the user.

 #Subsystem sftp /usr/lib/openssh/sftp-server  
 Subsystem sftp internal-sftp -u 0002  
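You can see what a 0002 umask does to newly created files with a quick local experiment; it's just the umask arithmetic (666 & ~0002 = 664), nothing sshd-specific:

```shell
# Show the permissions a file gets when created under a 0002 umask
tmpdir=$(mktemp -d)
( umask 0002 && touch "$tmpdir/example.txt" )
stat -c '%a' "$tmpdir/example.txt"   # prints 664: owner and group can write
rm -r "$tmpdir"
```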

I elected to place the base chroot folder inside the website directory for a few reasons.  Firstly, this is the only website or service running on this VM so it doesn't need to play nicely with other use cases.  Secondly I want the next sysadmin who is trying to work out how this all works to be able to immediately spot what is happening when she looks in the directory.

Then because my use case demanded it I enabled password logins for the sftp user by finding and changing the line in /etc/ssh/sshd_config like this:

 # Change to no to disable tunnelled clear text passwords  
 PasswordAuthentication yes  

The base chroot directory must be owned by root and not be writeable by any other groups.

 cd /usr/share/nginx/html/website_directory  
 mkdir chroot  
 chown root:root chroot/  
 chmod 755 chroot/  

If you skip this step then your connection will be dropped with a "broken pipe" message as soon as you connect.  Looking in your /var/log/auth.log file will reveal errors like this: fatal: bad ownership or modes for chroot directory

The next step is to make a directory that the user has write privileges to.  The base chroot folder is not writeable by your sftp user, so make an uploads directory and give them "writes" (ha!) to it:

 mkdir uploads  
 chown username:username uploads  
 chmod 755 uploads  

If you skip that step then you won't have any write privileges when you connect.  This is why we had to create a chroot base directory and then place the uploads folder under it.  I chose to put the base in the web directory to make it easy to spot, but in more general cases you would choose a more sensible location.

Finally I link the uploads directory in the chroot jail to the uploads directory where the web service expects to find files.

 cd /usr/share/nginx/html/website_directory  
 ln -s chroot/uploads uploads  

I feel a bit uneasy about a password login being used to write files to a directory consumed by a web service, but in my particular use case the firewall whitelists our office IP address on port 22, so nobody outside of our office can connect.  I'm also running fail2ban just in case somebody manages to get access to our VPN.
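For completeness, the whitelist idea looks something like this with ufw.  The office address here, 203.0.113.10, is a placeholder from the documentation range, not a real value:

```shell
# Placeholder office IP: replace 203.0.113.10 with your own address
sudo ufw default deny incoming
sudo ufw allow from 203.0.113.10 to any port 22 proto tcp
sudo ufw enable
```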