Hi folks!  I'm fucking terrible at keeping up with personal projects.  I have a few dozen draft articles I started over the last two years that I've simply never found the time to finish.  (Too many projects and tasks on the go;  never enough time for them all!)

Yesterday, for a work-related task, I was asked to come up with a way to log a PHP application response header WITHOUT sending that header back to the requester.

We're doing this at work to collect some metrics generated by the application that we don't want to share with clients: they are mostly a slew of timings and counts, but some of them DO contain flags indicating whether or not the request belongs to an A/B group or a specific user group, and revealing that to the requesting user would skew the results of the test.

I head a team that does the systems architecture for a fairly involved, complex, internally written PHP-based application that serves a few million users around the world. This brings a few complexities that many people simply gloss over when talking about "applications" and "systems".

Part of the operations side of this involves collecting and analyzing access logs generated by the application.  (We collect the load balancer logs along with the logs from our CDN, but some of the application-specific information needs to be collected from the actual nodes themselves.)

We currently run a somewhat complex architecture: we operate in 11 different countries around the world and have recently completed a migration from a variety of providers to AWS.

PHP on AWS in Docker

The short summary is "Cloudflare to AWS Elastic Load Balancer to Docker containers on EC2-backed ECS, running NGINX and FPM".

Handling NGINX and FPM communication is a complex topic in itself; we cannot actually "centralize" our NGINX onto a single node (or small pool of nodes) because we require fairly tight coupling between it and FPM, and doing so would introduce bandwidth constraints and unnecessary additional latency (which I'll revisit later!).

TL;DR: we have a PHP application. Here's a "silly" example that returns a single header, and nothing else on the page:

<?php
// Emit a custom "test" response header containing the current timestamp.
header("test: " . date("Y-m-d H:i:s"));

This simple "application" will return just that single header (alongside the other application headers and whatnot).

We write our access logs in JSON format, which essentially looks like this (I've omitted MANY fields; they are not important for this topic):

log_format json_combined escape=json
    '{"time_local":"$time_local",'
    '"request":"$request",'
    '"status": "$status",'
    '"request_time":"$request_time"}';

and by making use of NGINX's special variable prefix upstream_http_ we can capture our test header in our NGINX logs:

log_format json_combined escape=json
    '{"time_local":"$time_local",'
    '"request":"$request",'
    '"status": "$status",'
    '"request_time":"$request_time",'
    '"test":"$upstream_http_test"}';

This gets us an access log line containing the expected value:

app              | {"time_local":"27/Feb/2020:06:06:11 +0000","request":"GET / HTTP/1.1","status": "200","request_time":"0.002","test":"2020-02-27 06:06:11"}
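As an aside, this is why escape=json matters: every log line is valid JSON, so downstream tooling can parse it directly instead of wrestling with a bespoke format. A minimal Python sketch using the example line above (the field names are just the ones shown in our log_format):

```python
import json

# The example access-log line from above, exactly as NGINX emits it.
line = ('{"time_local":"27/Feb/2020:06:06:11 +0000",'
        '"request":"GET / HTTP/1.1","status": "200",'
        '"request_time":"0.002","test":"2020-02-27 06:06:11"}')

# escape=json ensures quotes/backslashes in header values are escaped,
# so json.loads() never chokes on a log line.
entry = json.loads(line)

print(entry["status"])  # the HTTP status as a string
print(entry["test"])    # the captured upstream "test" header
```

This is essentially what the ingest side of a log pipeline does with each line before indexing it.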

...But this value is also sent back to the client who requested the page.

To get around this, you can hide the header using fastcgi_hide_header, which removes it from the response sent to the client while leaving the upstream_http_ variables intact and available for logging:

Where this goes will depend on your application, but it can live either at the server level or in the specific location block:

location / {
  try_files $uri /index.php$is_args$args;
}

# note: the upstream block lives at the http level, not inside server
upstream php-fpm-sock {
  server unix:/var/run/php-fpm.sock;
}

location ~ \.php$ {
  fastcgi_hide_header test;
  fastcgi_pass php-fpm-sock;
  include fastcgi_params;
}
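To tie the pieces together, here's a minimal illustrative server configuration combining the log_format, the upstream, and the header hiding. The listen port, socket path, and stdout log destination are assumptions for the sketch, not our production values:

```nginx
http {
    # Valid-JSON access log; "test" is the upstream header we want to keep.
    log_format json_combined escape=json
        '{"time_local":"$time_local",'
        '"request":"$request",'
        '"status": "$status",'
        '"request_time":"$request_time",'
        '"test":"$upstream_http_test"}';

    upstream php-fpm-sock {
        server unix:/var/run/php-fpm.sock;
    }

    server {
        listen 80;
        access_log /dev/stdout json_combined;

        location / {
            try_files $uri /index.php$is_args$args;
        }

        location ~ \.php$ {
            # Strip the header from the client response...
            fastcgi_hide_header test;
            # ...but $upstream_http_test is still populated for the log above.
            fastcgi_pass php-fpm-sock;
            include fastcgi_params;
        }
    }
}
```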

and magically, your logs will retain the value without sending it along to the user.

We intend to use this to log a few bits of information, like the state of a user, what permissions they have (as a single byte), and some other bits useful in helping us identify large problems at an ultra-granular level, with very minimal performance impact on the application itself. In our case, logs are written to the container's stdout, which is collected by Filebeat on the parent ECS node and sent along to an ELK cluster for processing, retention, and interrogation. This all happens with various levels of queuing and potential loss, but it gives us the flexibility to run an ultra-lean stack and keep our cost-per-request orders of magnitude lower than many of the other large sites in the world.

Hopefully you find this helpful; I cannot believe how hard it was for my coworkers to find this information online! (They looked before they asked me; it seems this is a very unusual or rare problem that people don't often face!)