* chore(logging): use structlog for stdlib log and gunicorn
This change ensures that we have a common format for all log lines,
additionally adding e.g. pid and tid to gunicorn logs.
Previously there were large plain-text log entries spanning multiple
lines, which were very awkward to read when ingested into e.g. CloudWatch
or Loki, not least because these log apps typically show the most recent
line first.
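For illustration only, here is a stdlib-only sketch of the "one event per line" idea. It is not the structlog configuration from this change; the class name `OneLineJSONFormatter` and the field names are assumptions chosen for the example.

```python
import json
import logging


class OneLineJSONFormatter(logging.Formatter):
    """Render each record as a single JSON line, with pid and tid attached.

    Hypothetical helper for illustration; the real change uses structlog.
    """

    def format(self, record: logging.LogRecord) -> str:
        event = {
            "timestamp": self.formatTime(record),
            "level": record.levelname.lower(),
            "logger": record.name,
            "event": record.getMessage(),
            "pid": record.process,
            "tid": record.thread,
        }
        if record.exc_info:
            # Keep the traceback inside the same JSON object so it does not
            # spill over several log lines; json.dumps escapes the newlines.
            event["exception"] = self.formatException(record.exc_info)
        return json.dumps(event)
```

Attaching such a formatter to the handlers used by both the application loggers and the `gunicorn.error` / `gunicorn.access` loggers would give every line the same shape, which is the property the log aggregators need.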
* Update gunicorn.config.py
Co-authored-by: Guido Iaquinti <4038041+guidoiaquinti@users.noreply.github.com>
* remove print_warning
Co-authored-by: Guido Iaquinti <4038041+guidoiaquinti@users.noreply.github.com>
This is not the request timeout, but rather the time between the worker
notifying the arbiter that it is still alive. The notification runs in the
main loop of the [gthread
worker](https://labs.openai.com/s/YaxC326oohNreRzdYEzkmQdd), which should
be pretty tight.
We might even consider reducing this further.
TODO: allow these vars to be set by whatever is running this. There are
dependencies between this value and e.g. the downstream proxy server
timeouts.
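To make the heartbeat-versus-request-timeout distinction concrete, here is a toy model of the liveness check. It is not gunicorn's code: `WorkerHeartbeat` and its methods are hypothetical names, and the injectable clock exists only to keep the example deterministic.

```python
import time
from typing import Callable


class WorkerHeartbeat:
    """Toy model of the worker -> arbiter liveness signal (not gunicorn code)."""

    def __init__(self, timeout: float, clock: Callable[[], float] = time.monotonic):
        self.timeout = timeout
        self._clock = clock
        self._last_notify = clock()

    def notify(self) -> None:
        # Called by the worker on every pass through its main loop.
        self._last_notify = self._clock()

    def is_stale(self) -> bool:
        # The arbiter's view: has the worker been silent for longer than `timeout`?
        return self._clock() - self._last_notify > self.timeout
```

In this model a slow request handled on a pool thread does not make the worker stale, because the main loop keeps calling `notify()`. Only a blocked main loop does, which is why a tight value is plausible here.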
* style(annotations): Revamp Annotations page
* Add annotations to API builder
* Re-add base of annotation modal
* Update `LemonModal` and `IconClose` for visual polish
* Fix missing export
* Fix and align Date and time + Scope fields
* Hook up all the logic to the annotation modal
* Add annotations page story
* Fix typos
* Make date picker fit in
* Prevent ugly text wrapping
* Sync `AnnotationType` with API
* Clarify experience of insight-scoped annotations
* Restore Cypress instrumentation
* Fix typing
* Remove `data-tooltip`
* Rewrite Annotations page description
* Improve edge case with downgrading annotation scope
* Remove redundant function in logic
We add a couple of monitor threads:
1. to monitor the number of connection requests that are yet to be
accepted. This should give us an idea of how well the workers are
getting through the requests. A backup here could suggest that the
workers are completely saturated.
2. to monitor the number of idle threads, and number of active
connections the worker has.
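A minimal sketch of what such a monitor thread could look like. All names here (`start_monitor`, `gauges`, `sample`) are hypothetical, and how the backlog or idle-thread count is actually read is left to the `sample` callable, since that detail is not spelled out above.

```python
import threading
from typing import Callable, Dict


def start_monitor(
    name: str,
    sample: Callable[[], int],
    gauges: Dict[str, int],
    interval: float,
    stop: threading.Event,
) -> threading.Thread:
    """Start a daemon thread that periodically stores `sample()` under `gauges[name]`."""

    def run() -> None:
        while True:
            gauges[name] = sample()
            # Event.wait doubles as an interruptible sleep: it returns True
            # as soon as `stop` is set, which ends the loop.
            if stop.wait(interval):
                break

    thread = threading.Thread(target=run, name=f"monitor-{name}", daemon=True)
    thread.start()
    return thread
```

One thread per gauge keeps each sampler independent, so a slow or failing sampler for one value does not delay the other.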
* chore(web): add django-prometheus exposed on /_metrics
This exposes a number of metrics, see
97d5748664/documentation/exports.md
for details. It includes a histogram of timings by viewname, before and
after middleware.
I'm not particularly interested in these right now, but rather would
like to expose Kafka Producer metrics as per
https://github.com/PostHog/posthog/pull/10997
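For reference, a sketch of the settings side of a typical django-prometheus setup. This is a fragment under assumptions, not the exact diff: the placeholder comments stand in for the project's existing entries, and the URL routing for `/_metrics` is not shown.

```python
# settings.py (fragment, illustrative only)
INSTALLED_APPS = [
    # ... existing apps ...
    "django_prometheus",
]

MIDDLEWARE = [
    # First in the list, so the "before middleware" timings include the
    # whole middleware stack.
    "django_prometheus.middleware.PrometheusBeforeMiddleware",
    # ... existing middleware ...
    # Last in the list, so the "after middleware" timings cover the view only.
    "django_prometheus.middleware.PrometheusAfterMiddleware",
]
```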
* Refactor to use gunicorn server hooks
* also add expose to dockerfile
* wip
We're hosting gunicorn behind an ELB with idle timeout of 120 seconds.
I think this is leading to sporadic 504 errors:
https://docs.aws.amazon.com/elasticloadbalancing/latest/classic/ts-elb-error-message.html
> HTTP 504: Gateway timeout
> Cause 2: Registered instances closing the connection to Elastic Load Balancing.
> Solution 2: Enable keep-alive settings on your EC2 instances and make sure that the keep-alive timeout is greater than the idle timeout settings of your load balancer.
Related reading: https://serverfault.com/questions/782022/keepalive-setting-for-gunicorn-behind-elb-without-nginx
Not 100% sure this is appropriate for setups without an ELB, but let's
deploy, see if it makes a dent in the monitoring, and then adjust as
appropriate (e.g. make this an env variable).
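A possible shape for the env-variable follow-up, as a sketch only. The variable name `GUNICORN_KEEPALIVE`, the helper `resolve_keepalive`, and the 5 second margin are assumptions; the 120 second idle timeout is the ELB value quoted above.

```python
import os
from typing import Mapping

ELB_IDLE_TIMEOUT_SECONDS = 120  # ELB idle timeout mentioned in this message


def resolve_keepalive(
    environ: Mapping[str, str] = os.environ,
    idle_timeout: int = ELB_IDLE_TIMEOUT_SECONDS,
    margin: int = 5,  # hypothetical safety margin
) -> int:
    """Return a keep-alive value that is strictly greater than the LB idle timeout."""
    raw = environ.get("GUNICORN_KEEPALIVE")  # hypothetical variable name
    keepalive = int(raw) if raw else idle_timeout + margin
    if keepalive <= idle_timeout:
        raise ValueError(
            f"keepalive ({keepalive}s) must exceed the load balancer idle "
            f"timeout ({idle_timeout}s)"
        )
    return keepalive
```

In `gunicorn.config.py` the result would be assigned to gunicorn's `keepalive` setting. Failing fast on a too-small value keeps a misconfigured deploy from silently reintroducing the 504s.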