Problem
We're not able to play back all of the recordings captured by the blob ingester, the offset high water mark processing is new and if it isn't working correctly would lead to us skipping data we should not have
Changes
Allow us to set an environment variable that uses a no-op high-water mark processor
sneaks in removal of the no-longer-used SESSION_RECORDING_BLOB_PROCESSING_TEAMS env var
Problem
see #15200 (comment)
When we store session recording events we materialize a lot of information using the snapshot data column.
We'll soon not be storing the snapshot data so won't be able to use that to materialize that information, so we need to capture it earlier in the pipeline. Since this is only used for searching for/summarizing recordings we don't need to store every event.
Changes
We'll push a summary event to a new kafka topic during ingestion. ClickHouse can ingest from that topic into an aggregating merge tree table. So that we store (in theory, although not in practice) only one row per session.
add config to turn this on and off by team in plugin server
add code behind that to write session recording summary events to a new topic in kafka
add ClickHouse tables to ingest and aggregate those summary events
* refactor: Eliminate the `KAFKA_ENABLED` setting
* Remove dead code
* Consolidate plugin server test scripts and CI
* Fix CI command
* Remove Celery queues
* Rearrange test directories
* Update import paths
* feat(object_storage): add unused object storage with health checks
* only prompt debug users if object storage not available at preflight
* safe plugin server health check for unused object storage
* explicit object storage settings
* explicit object storage settings
* explicit object storage settings
* downgrade pip tools
* without spaces?
* like this?
* without updating pip?
* remove object_storage from dev volumes
* named volume on hobby
* lazily init object storage
* simplify conditional check
* reproduced error locally
* reproduced error locally
* object_storage_endpoint not host and port
* log more when checking kafka and clickhouse
* don't filter docker output
* add kafka to hosts before starting stack?
* silly cloud tests (not my brain)