Set keepalive to 60 on gunicorn
Gunicorn's default is 2 seconds, while the default for ALBs is 30 seconds.
This mismatch can cause a race condition where gunicorn closes the connection
just as the ALB sends a request, resulting in a 502.
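A minimal sketch of the setting as a gunicorn config file; the actual change is
presumably the equivalent `--keep-alive` flag in bin/docker-server:

```python
# gunicorn.conf.py -- illustrative only; bin/docker-server may pass this as
# --keep-alive on the command line instead.
# Keep idle keep-alive connections open longer than the ALB's idle timeout,
# so the ALB (not gunicorn) is the side that closes idle connections.
keepalive = 60  # seconds; gunicorn's default is 2
```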
Add Kafka rack-aware injection on docker-server
Currently we only inject this in plugin-server (i.e. reading from Kafka).
Let's add it to docker-server as well (i.e. capture/writing to Kafka).
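A sketch of what the rack-aware wiring could look like on the producer side,
assuming a librdkafka-based client and a rack id supplied via an environment
variable (both the client choice and the env var names are assumptions, not
necessarily what this commit does):

```python
import os

from confluent_kafka import Producer  # assumption: librdkafka-based client

# KAFKA_HOSTS and KAFKA_CLIENT_RACK are illustrative names; in Kubernetes the
# rack would typically be injected from the node's
# topology.kubernetes.io/zone label.
conf = {"bootstrap.servers": os.environ.get("KAFKA_HOSTS", "localhost:9092")}

rack = os.environ.get("KAFKA_CLIENT_RACK")
if rack:
    conf["client.rack"] = rack  # librdkafka's rack-awareness setting

producer = Producer(conf)
```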
* chore(gunicorn): increase thread count for workers
We currently run with 2 worker processes, each with 4 threads. Occasionally
we see spikes in the number of pending requests within a worker in the 20-25
range. I suspect this is due to 4 slow requests blocking the thread pool.
The majority of the work should be IO bound, so it hopefully won't peg the
CPU. If it does, it should trigger the HPA and hopefully resolve itself
:fingerscrossed:
There is, however, gzip running on events, which could be CPU intensive
(suggest we offload this to a downstream service at some point). If lots of
large requests come in this could be an issue. Some profiling here
wouldn't go amiss.
Note these are the same settings for both the web and event ingestion
workloads. At some point we may want to split them.
I've added a dashboard for gunicorn worker stats
[here](https://github.com/PostHog/charts-clickhouse/pull/559) which we
can monitor to see the effects.
Aside: it would be wise to be able to specify these settings from the
chart itself, so that we do not have to consider chart/posthog version
combos and can tweak them per deployment.
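A minimal sketch of these knobs as a gunicorn config, with the values driven
by environment variables as the aside suggests (the env var names and
defaults are illustrative; the real flags live in bin/docker-server):

```python
# gunicorn.conf.py -- illustrative sketch, not the actual bin/docker-server flags.
import os

# A small number of worker processes, each with a thread pool; the work is
# mostly IO bound, so threads rather than extra processes keep memory down.
workers = int(os.environ.get("GUNICORN_WORKERS", "2"))
threads = int(os.environ.get("GUNICORN_THREADS", "8"))

# gunicorn switches to the threaded worker automatically when threads > 1;
# set it explicitly for clarity.
worker_class = "gthread"
```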
* reduce to 8 threads
Prior to this, the containing script was receiving the TERM signal (e.g.
from Kubernetes eviction) and, as a result, terminating the root
process without waiting on gunicorn.
We solve this by not spawning a new child process and instead having
gunicorn replace the current process.
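In a shell entrypoint the usual way to do this is `exec gunicorn ...`; purely
to illustrate the process-replacement semantics (module path and flags are
illustrative), the Python equivalent would be:

```python
import os

# Replace the current process image with gunicorn instead of spawning a
# child. The PID stays the same, so SIGTERM from Kubernetes goes straight to
# gunicorn, which can then drain its workers before exiting.
os.execvp("gunicorn", ["gunicorn", "posthog.wsgi", "--config", "gunicorn.conf.py"])
# (never reached -- execvp does not return on success)
```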
* chore: Allow instrumentation of gunicorn with statsd (#11372)
* chore: Allow instrumentation of gunicorn with statsd
In order to ensure that gunicorn is performing optimally, it helps to
monitor it with statsd.
This change allows us to include the flags needed to send UDP packets to
a statsd instance.
Docs: https://docs.gunicorn.org/en/stable/instrumentation.html
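A sketch of the relevant settings, assuming the host and port come from
environment variables (STATSD_HOST is an assumed name; STATSD_PORT is the one
mentioned below):

```python
# gunicorn.conf.py -- illustrative; bin/docker-server passes the equivalent
# --statsd-host / --statsd-prefix flags on the command line.
import os

statsd_host_env = os.environ.get("STATSD_HOST")
if statsd_host_env:
    # gunicorn emits its metrics as UDP statsd packets to host:port.
    statsd_host = f"{statsd_host_env}:{os.environ.get('STATSD_PORT', '8125')}"
    statsd_prefix = "posthog.gunicorn"  # illustrative prefix
```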
* Update bin/docker-server
Co-authored-by: Harry Waye <harry@posthog.com>
* Update bin/docker-server
Co-authored-by: Harry Waye <harry@posthog.com>
Co-authored-by: Harry Waye <harry@posthog.com>
* Include the STATSD_PORT correctly
Co-authored-by: Harry Waye <harry@posthog.com>
* chore(web): add django-prometheus exposed on /_metrics
This exposes a number of metrics; see
97d5748664/documentation/exports.md
for details. It includes histograms of timings by view name, before and
after middleware.
I'm not particularly interested in these right now, but rather would
like to expose Kafka Producer metrics as per
https://github.com/PostHog/posthog/pull/10997
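A sketch of the wiring, assuming the standard django-prometheus setup with
the exporter view mounted at /_metrics (the exact settings and URL layout in
the PR may differ):

```python
# settings.py (sketch) -- register the app and sandwich the middleware stack.
INSTALLED_APPS = [
    "django_prometheus",
    "posthog",  # existing apps stay as they are
]
MIDDLEWARE = [
    "django_prometheus.middleware.PrometheusBeforeMiddleware",
    "django.middleware.common.CommonMiddleware",  # existing middleware in between
    "django_prometheus.middleware.PrometheusAfterMiddleware",
]

# urls.py (sketch) -- mount the exporter on /_metrics instead of the
# package's default /metrics route.
from django.urls import path
from django_prometheus.exports import ExportToDjangoView

urlpatterns = [
    path("_metrics", ExportToDjangoView, name="prometheus_metrics"),
]
```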
* Refactor to use gunicorn server hooks
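The commit doesn't say which hooks are used; a common pattern when pairing
gunicorn with prometheus_client's multiprocess mode is sketched below (an
assumption, not necessarily what the refactor does):

```python
# gunicorn.conf.py -- sketch of server hooks; the hook names are part of
# gunicorn's documented API, the bodies here are assumptions.

def when_ready(server):
    # Runs once in the master after the server starts.
    server.log.info("gunicorn master ready")

def post_fork(server, worker):
    # Runs in each worker just after fork; a natural place to (re)create
    # per-process clients such as a Kafka producer.
    server.log.info("worker spawned (pid: %s)", worker.pid)

def child_exit(server, worker):
    # Runs in the master when a worker exits; with prometheus_client's
    # multiprocess mode this is where stale per-worker metric files are
    # cleaned up.
    from prometheus_client import multiprocess
    multiprocess.mark_process_dead(worker.pid)
```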
* also add EXPOSE to the Dockerfile
* wip