0
0
mirror of https://github.com/PostHog/posthog.git synced 2024-11-28 18:26:15 +01:00
posthog/ee/kafka_client/topics.py
Karl-Aksel Puulmann e3bf0cb31d
Session recording on clickhouse, separate tables and retention cronjob (#2051)
* Add scheduled task to wipe session recordings

* Create a new table for session recording

* Save snapshot events to different table

* Use SessionRecordingEvent over Events everywhere

We can remove a ton of cruft this way as well

* Add missing signature

* Extract util from models/event

* Attempt to update ingest side of clickhouse session recording events

Note that it's using main kafka topic - not sure if a good idea.

* Get separate table in ch working for session recording events

* WIP: query sessions

* Make both session recording queries work

* Make linter happy

* Rebase migration

* Make tests work

* Apply a TTL to session recordings and other configuration:

- toYYYYMMDD partitioning should be smoother with TTL setup
- TTL achieves not needing to archive the data ourselves
- index_granularity will enable smaller reads per session_id
- ORDER BY clause is to make single session as well as time range query
  reasonable

* Convert retention cronjob to new model

* Add tests to process_event changes

* Add test for ee_capture change

* Fixup migration

* Make clickhouse tests drop/create session recording tables

* Make TTL not be there in tests

Otherwise writes get eaten by it during tests when mocking time

* Fix retention task

Co-authored-by: Tim Glaser <tim@glsr.nl>
2020-10-28 21:22:16 +01:00

8 lines
280 B
Python

KAFKA_EVENTS = "clickhouse_events"
KAFKA_ELEMENTS = "clickhouse_elements"
KAFKA_PERSON = "clickhouse_person"
KAFKA_PERSON_UNIQUE_ID = "clickhouse_person_unique_id"
KAFKA_SESSION_RECORDING_EVENTS = "clickhouse_session_recording_events"
KAFKA_EVENTS_WAL = "events_write_ahead_log"