* chore(recordings): remove hub dependency on recordings ingestion
Hub is a grab bag of dependencies that are not all required for
recordings ingestion. To keep the recordings ingestion lean, we
remove the hub dependency and use the postgres and kafka clients
directly.
This should increase the availability of the session recordings
workload, e.g. it should not go down if Redis or ClickHouse is down.
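A minimal sketch (with assumed names and config fields, not the actual plugin-server code) of what using the clients directly means here: the recordings consumer constructs only the Kafka and Postgres clients it needs, so no Redis or ClickHouse connection is ever created.
```typescript
import { Kafka, Consumer } from 'kafkajs'
import { Pool } from 'pg'

// Hypothetical config shape; the real code reads these values from the
// plugin-server configuration.
interface RecordingsConfig {
    kafkaBrokers: string[]
    databaseUrl: string
}

export const startSessionRecordingsConsumer = async (
    config: RecordingsConfig
): Promise<{ consumer: Consumer; postgres: Pool }> => {
    // Build only the two dependencies this workload needs, rather than the
    // whole Hub (which also wires up Redis, ClickHouse, and more).
    const kafka = new Kafka({ clientId: 'session-recordings', brokers: config.kafkaBrokers })
    const postgres = new Pool({ connectionString: config.databaseUrl })

    const consumer = kafka.consumer({ groupId: 'session-recordings' })
    await consumer.connect()
    // ...subscribe and run the batch handler here.
    return { consumer, postgres }
}
```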
* fix capabilities call
* reuse clients if available
* wip
* wip
* wip
* fix tests
* fix healthcheck
* refactor(recordings): remove session code from event pipeline
We have moved session recordings to a separate topic and consumer. There
may be session recordings in the old topic, but we divert these to the
new logic for processing them.
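A hedged sketch of that diversion: a `$snapshot` event that still arrives via the old events topic gets forwarded to the new recordings topic instead of continuing through the event pipeline. The topic name and message shape below are illustrative, not the real identifiers.
```typescript
import { Producer } from 'kafkajs'

// Assumed topic name, for illustration only.
const SESSION_RECORDING_EVENTS_TOPIC = 'session_recording_events'

export async function divertIfRecording(
    producer: Producer,
    message: { event: string; distinct_id: string; data: string }
): Promise<boolean> {
    if (message.event !== '$snapshot') {
        return false // not a recording, let the normal event pipeline handle it
    }
    // Forward to the new topic; the dedicated recordings consumer picks it up there.
    await producer.send({
        topic: SESSION_RECORDING_EVENTS_TOPIC,
        messages: [{ key: message.distinct_id, value: JSON.stringify(message) }],
    })
    return true // diverted, skip the rest of the pipeline
}
```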
* refactor to just send to the new topic!
* fix import
* remove empty line
* fix no team_id test
* implement recordings opt in
* remove old $snapshot unit tests
* remove performance tests
* Update plugin-server/functional_tests/session-recordings.test.ts
Co-authored-by: Tiina Turban <tiina303@gmail.com>
* Update plugin-server/functional_tests/session-recordings.test.ts
Co-authored-by: Tiina Turban <tiina303@gmail.com>
* add back $snapshot format test
* Add comment re functional test assumptions
Co-authored-by: Tiina Turban <tiina303@gmail.com>
The timestamp is a requirement for the alert defined in
https://github.com/PostHog/charts-clickhouse/pull/669
The batch size metric is added because I'm curious about 1. how many
batches we fetch and 2. what effect setting [KafkaJS's
`minBytes`](https://kafka.js.org/docs/consuming#a-name-options-a-options)
might have on the number and size of batches we fetch, perhaps reducing
the amount of IO we're performing both when consuming and when flushing
the producer wrapper.
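Roughly what both of those look like in KafkaJS terms; the metric and topic names are assumed, and none of this is copied from the actual code.
```typescript
import { Kafka } from 'kafkajs'
import { StatsD } from 'hot-shots'

const statsd = new StatsD()
const kafka = new Kafka({ clientId: 'session-recordings', brokers: ['localhost:9092'] })

// `minBytes` is a fetch option on the consumer: raising it asks the broker to
// hold the fetch until it has that much data (bounded by `maxWaitTimeInMs`),
// which should mean fewer, larger batches.
const consumer = kafka.consumer({
    groupId: 'session-recordings',
    minBytes: 1, // the default; the experiment would be raising this
    maxWaitTimeInMs: 5000,
})

export async function run(): Promise<void> {
    await consumer.connect()
    await consumer.subscribe({ topic: 'session_recording_events' })
    await consumer.run({
        eachBatch: async ({ batch }) => {
            // The batch size metric: how many messages each fetched batch contains.
            statsd.histogram('session_recordings.consumer_batch_size', batch.messages.length)
            // ...process the batch, then flush the producer wrapper once per batch.
        },
    })
}
```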
* fix(session-recordings): fix missing distinct_id for session recordings
Previously I'd assumed that the distinct_id would be in the event.
That's not true; rather, it is at the top level of the Kafka message
value JSON.
This commit fixes that, and also updates all functional tests to not
include the `distinct_id` within the event body.
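A sketch of the parsing the fix implies, with field names written from memory rather than copied from the capture payload: `distinct_id` sits next to the serialized event in the Kafka message value, so it has to be read there.
```typescript
// Assumed shape of the Kafka message value; only the fields relevant to the
// fix are shown.
interface RawRecordingMessage {
    distinct_id: string // top-level, NOT inside the event body
    team_id: number | null
    token: string | null
    data: string // the JSON-serialized event itself
}

export function parseRecordingMessage(value: Buffer): {
    distinctId: string
    event: Record<string, unknown>
} {
    const raw: RawRecordingMessage = JSON.parse(value.toString())
    return {
        distinctId: raw.distinct_id, // read from the message, not from the event
        event: JSON.parse(raw.data),
    }
}
```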
* Revert "chore(session-recordings): revert to sending events to old topic (#13756)"
This reverts commit 41874de277.
* add test for session without team_id, only token
* pull out event names as variable
* Change info -> debug, otherwise it's very noisy
* chore(session-recordings): separate topics for events as recordings
WIP
* fix tests
* Use simpler consumer for session recordings
* wip
* still batch things by batchSize
* add tests, improve comments
* rename topic var
* push performance_events to session recordings topic also
* Add completely separate consumer for session-recordings
* wip
* use session_id for partition key
* fix test
* handle team_id/token null
* wip
* fix tests
* wip
* use kafka_topic var in logs
* use logger
* fix test
* Fix $performance_event topic usage
* fix tests
* fix check for null/undefined
* Update posthog/api/capture.py
Co-authored-by: Tomás Farías Santana <tomas@tomasfarias.dev>
* Add test for kafka error handling
* Remove falsy teamId check
* fix statsd error
* kick ci
* Use existing getTeamByToken
* remove partition key from recordings
* Make sure producer is connected!
* fix session id kafka key test
* add back throws!
* set producer on each test
* skip flaky test
* add flush error logs
* wait for persons to be ingested
* fix skip
Co-authored-by: Tomás Farías Santana <tomas@tomasfarias.dev>
* chore(plugin-server): split functional tests into feature based files
This is intended to make it more obvious what we are testing, and to try
to identify the major themes of the plugin-server functionality.
As a by-product it should make things more parallelizable for Jest, as
the tests in different files will be isolated and runnable in separate
workers.
* use random api token, avoid db constraints
* make tests silent
* format
* chore: set number of jest workers
These tests should be pretty light given they just hit other APIs and
don't do much themselves. Memory could be an issue in constrained
environments. We shall see.
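A guess at what setting the worker count looks like; the actual number, and whether it lives in `jest.config.ts` or a CLI flag, may differ in the repo.
```typescript
// jest.config.ts (equivalently, `jest --maxWorkers=4` on the CLI)
import type { Config } from '@jest/types'

const config: Config.InitialOptions = {
    // The functional tests mostly wait on other services rather than burn CPU,
    // so a fixed worker count mainly trades throughput against memory use.
    maxWorkers: 4,
}

export default config
```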