* fix(plugin-server): write out fewer overrides on behalf of personless mode
* don't skip writing overrides (so we can backfill posthog_personlessdistinctid)
* swap ugly double-array params for array of objects
* bigint id for PersonlessDistinctId
* add LRU cache for posthog_personlessdistinctid inserts (see the sketch after this list)
* fix tests
* overzealous search and replace
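A minimal sketch of the LRU guard mentioned above, assuming a cache keyed by team and distinct id in front of a hypothetical insertPersonlessDistinctId() helper; the cache size and helper names are illustrative, not the actual plugin-server code:

```ts
// Sketch only: an in-memory LRU so we don't re-INSERT the same
// (team_id, distinct_id) pair on every event.

class SimpleLru {
    private entries = new Map<string, true>()

    constructor(private readonly max: number) {}

    has(key: string): boolean {
        if (!this.entries.has(key)) {
            return false
        }
        // refresh recency: Map iterates in insertion order
        this.entries.delete(key)
        this.entries.set(key, true)
        return true
    }

    add(key: string): void {
        if (this.entries.size >= this.max) {
            const oldest = this.entries.keys().next().value
            if (oldest !== undefined) {
                this.entries.delete(oldest)
            }
        }
        this.entries.set(key, true)
    }
}

// stands in for the real `INSERT ... ON CONFLICT DO NOTHING` round trip
async function insertPersonlessDistinctId(teamId: number, distinctId: string): Promise<void> {}

const seen = new SimpleLru(10_000) // size is a guess

async function recordPersonlessDistinctId(teamId: number, distinctId: string): Promise<void> {
    const key = `${teamId}:${distinctId}`
    if (seen.has(key)) {
        return // written recently, skip the database round trip
    }
    await insertPersonlessDistinctId(teamId, distinctId)
    seen.add(key)
}
```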
* chore(plugin-server): change captureIngestionWarning to not await acks
* chore(plugin-server): move person kafka ack awaits to the batch-level await (#22772) (see the sketch below)
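A sketch of the pattern, assuming a produce() that resolves once Kafka acks; the point is that per-event code only queues the promises and the batch loop awaits them all at once:

```ts
// Sketch: collect produce() ack promises per event, await once per batch
// (before offsets are committed) instead of blocking on every event.

async function processBatch(
    events: unknown[],
    produce: (event: unknown) => Promise<void>
): Promise<void> {
    const ackPromises: Promise<void>[] = []

    for (const event of events) {
        // fire the produce, keep the ack promise for later
        ackPromises.push(produce(event))
    }

    // a single batch-level await replaces many per-event awaits
    await Promise.all(ackPromises)
}
```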
* fix: activity detection for rrweb data (see the sketch after this list)
* move tests out of process event - they don't need any of the setup there
* Add tests
* fix test name
* fix test name
* fix
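A sketch of what activity detection over rrweb data can look like; which incremental sources count as "activity" here is an assumption, not necessarily the exact set the fix uses:

```ts
// rrweb event type 3 is an IncrementalSnapshot; only some incremental
// sources represent user activity. The set below (MouseMove,
// MouseInteraction, Scroll, Input, TouchMove) is an assumption.

const INCREMENTAL_SNAPSHOT_EVENT_TYPE = 3
const ACTIVE_SOURCES = new Set([1, 2, 3, 5, 6])

interface RRWebEvent {
    type: number
    timestamp: number
    data?: { source?: number }
}

function isActiveEvent(event: RRWebEvent): boolean {
    return (
        event.type === INCREMENTAL_SNAPSHOT_EVENT_TYPE &&
        ACTIVE_SOURCES.has(event.data?.source ?? -1)
    )
}
```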
* Move UUID validation from `process` step to `prepare` step
* Move uuid validation all the way up to `populateTeamData` (see the sketch after this list)
* Update tests to use `processEvent` instead of `eventsProcessor.processEvent` since we moved the logic
* Add temporary vscode config to just run relevant tests
* Tests are fixed; they were looking for an event that won't be produced
* Remove temp vscode debug config
* nit: remove added newline
* Fix another test that wasn't using a proper UUID
* Add missing import after git stupidity
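A sketch of the early validation these commits describe, assuming a simple regex check at the top of `populateTeamData`; the warning type string is illustrative:

```ts
// Reject malformed UUIDs as early as possible, before any per-team work.

const UUID_REGEX = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i

function hasValidUuid(event: { uuid?: unknown }): boolean {
    return typeof event.uuid === 'string' && UUID_REGEX.test(event.uuid)
}

// Inside populateTeamData (sketch):
//
//   if (!hasValidUuid(event)) {
//       captureIngestionWarning(teamId, 'skipping_event_invalid_uuid', { eventUuid: event.uuid })
//       return null // drop the event from the pipeline
//   }
```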
* fix: we can receive strings that ClickHouse can't accept (sanitization sketched below)
* let's not throw when gathering console logs
* don't accept arbitrary length content
* alright... a little longer content
* fix
* migration to add the new column
* and populate it
* fix
* Update query snapshots
* test assertions
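A sketch of the kind of sanitization involved, assuming the problem characters are null bytes and lone UTF-16 surrogates and that content is length-capped; the cap value is an assumption:

```ts
// Make console log content safe for ClickHouse: never throw, strip null
// bytes and unpaired surrogate halves, and cap the length. The
// 3000-character cap is an assumed value.

const MAX_CONSOLE_LOG_LENGTH = 3000

function safeConsoleLogContent(payload: unknown[]): string {
    const joined = payload
        .filter((part): part is string => typeof part === 'string')
        .join(' ')
    return joined
        .replace(/\0/g, '') // null bytes are not welcome downstream
        // drop lone surrogate halves, which are invalid UTF-8 when encoded
        .replace(/[\uD800-\uDBFF](?![\uDC00-\uDFFF])|(?<![\uD800-\uDBFF])[\uDC00-\uDFFF]/g, '')
        .slice(0, MAX_CONSOLE_LOG_LENGTH)
}
```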
---------
Co-authored-by: github-actions <41898282+github-actions[bot]@users.noreply.github.com>
We collect console logs in session recordings but you can't filter to find recordings that have them.
Worse, when we move to storing recordings in blob storage it would be impossible to filter for them.
Changes
adds new columns to the session replay summary table to store counts of log-, warn-, and error-level console logs collected with recordings
alters the recordings-ingestion consumer to count those three log levels (sketched below)
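A sketch of the counting the consumer could do per session; how the less common levels (info, debug, ...) fold into the three stored buckets is an assumption:

```ts
interface ConsoleLogLevelCounts {
    consoleLogCount: number
    consoleWarnCount: number
    consoleErrorCount: number
}

function countConsoleLogLevels(levels: string[]): ConsoleLogLevelCounts {
    const counts: ConsoleLogLevelCounts = {
        consoleLogCount: 0,
        consoleWarnCount: 0,
        consoleErrorCount: 0,
    }
    for (const level of levels) {
        if (level === 'warn') {
            counts.consoleWarnCount += 1
        } else if (level === 'error') {
            counts.consoleErrorCount += 1
        } else {
            // log, info, debug, ... all fold into the "log" bucket here
            counts.consoleLogCount += 1
        }
    }
    return counts
}
```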
Problem
see #15200 (comment)
When we store session recording events we materialize a lot of information using the snapshot data column.
We'll soon stop storing the snapshot data, so we won't be able to materialize that information from it; we need to capture it earlier in the pipeline. Since this is only used for searching for and summarizing recordings, we don't need to store every event.
Changes
We'll push a summary event to a new Kafka topic during ingestion. ClickHouse can ingest from that topic into an aggregating merge tree table, so that we store (in theory, although not in practice) only one row per session. A sketch of the producer side follows the list below.
add config to turn this on and off by team in plugin server
add code behind that to write session recording summary events to a new topic in kafka
add ClickHouse tables to ingest and aggregate those summary events
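A sketch of the producer side, assuming a summary shape and topic name; both are illustrative, and the aggregation itself happens in ClickHouse:

```ts
// Key by session_id so every summary for a session lands on the same
// partition, letting the aggregating merge tree collapse rows per session.

interface SessionReplaySummaryEvent {
    session_id: string
    team_id: number
    distinct_id: string
    first_timestamp: string
    last_timestamp: string
}

const SUMMARY_TOPIC = 'session_replay_summary_events' // assumed name

async function emitSummary(
    produce: (topic: string, key: Buffer, value: Buffer) => Promise<void>,
    summary: SessionReplaySummaryEvent
): Promise<void> {
    await produce(SUMMARY_TOPIC, Buffer.from(summary.session_id), Buffer.from(JSON.stringify(summary)))
}
```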
* Revert "Revert "perf(recordings): use node-librdkafka for ingester production" (#15069)"
This reverts commit ac5e084f48.
* fix(plugin-server): only set ssl config when defined (sketched below)
Hopefully this means it will use the global CA bundle.
* hack: enable debug logs
* honor KAFKAJS_LOG_LEVEL envvar
* add SegfaultHandler
* disable ssl verification
* debug -> info
* only log brokers
* Revert "add SegfaultHandler"
This reverts commit b22f40b802.
---------
Co-authored-by: Xavier Vello <xavier@posthog.com>
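For the "only set ssl config when defined" fix above, a sketch of the conditional spread, assuming librdkafka-style config keys; the key names are illustrative:

```ts
// Only pin a CA when one is configured, so the client can fall back to the
// platform's global CA bundle.

function kafkaSecurityConfig(caLocation?: string): Record<string, string> {
    return {
        'security.protocol': 'ssl',
        ...(caLocation ? { 'ssl.ca.location': caLocation } : {}),
    }
}
```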
Previously we've been using the KafkaJS Producer with a wrapper around
it to handle batching. There are a number of issues with the batching
implementation, e.g. no way to provide guarantees on delivery. Rather
than fix that, we can simply use the librdkafka Producer, which is a
lot more mature and battle-tested.
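A sketch of what that buys us, using node-rdkafka's HighLevelProducer, whose produce callback fires on broker acknowledgement; the broker address and linger value are illustrative:

```ts
import { HighLevelProducer } from 'node-rdkafka'

const producer = new HighLevelProducer({
    'metadata.broker.list': 'localhost:9092', // illustrative
    'linger.ms': 20, // let librdkafka batch internally
})

producer.connect() // real code should wait for the 'ready' event

function produceWithAck(topic: string, key: string, value: Buffer): Promise<void> {
    return new Promise((resolve, reject) => {
        // the callback fires when the broker acknowledges (or rejects)
        // delivery, giving the guarantee the KafkaJS wrapper lacked
        producer.produce(topic, null, value, Buffer.from(key), Date.now(), (err) => {
            err ? reject(err) : resolve()
        })
    })
}
```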
* refactor(recordings): remove session code from event pipeline
We have moved session recordings to a separate topic and consumer. There
may be session recordings in the old topic, but we divert these to the
new logic for processing them.
* refactor to just send to the new topic!
* fix import
* remove empty line
* fix no team_id test
* implement recordings opt in
* remove old $snapshot unit tests
* remove performance tests
* Update plugin-server/functional_tests/session-recordings.test.ts
Co-authored-by: Tiina Turban <tiina303@gmail.com>
* Update plugin-server/functional_tests/session-recordings.test.ts
Co-authored-by: Tiina Turban <tiina303@gmail.com>
* add back $snapshot format test
* Add comment re functional test assumptions
Co-authored-by: Tiina Turban <tiina303@gmail.com>
* migration for person/group property support in property definitions table
* Use database default
* Validate correct constraint
* Ingest person and group type property definitions
* Exclude person/group type definitions from API
* Update property definitions test
* Ignore $groups
* Add new unique index which accounts for type and group_type_index
* Run new code only in test
* Ignore errors from propertyDefinitionsManager which may occur due to migrations
* Update constraint name
* Update test describe
* ON CONFLICT based on the index expression (sketched below)
* Add a -- not-null-ignore
* Combine migrations
* Remove some test code temporarily
* fixup latest_migrations
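A sketch of the conflict target implied by "ON CONFLICT based on the index expression": since group_type_index is nullable, the unique index coalesces it, and the upsert has to name the same expression. The column list and sentinel value are assumptions:

```ts
// Hypothetical upsert; the conflict target must match the expression index,
// including the coalesce over the nullable group_type_index column.
const UPSERT_PROPERTY_DEFINITION = `
    INSERT INTO posthog_propertydefinition (id, name, type, group_type_index, team_id)
    VALUES ($1, $2, $3, $4, $5)
    ON CONFLICT (team_id, name, type, coalesce(group_type_index, -1))
    DO NOTHING
`
```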