mirror of https://github.com/PostHog/posthog.git synced 2024-11-28 18:26:15 +01:00
Commit Graph

16 Commits

Author SHA1 Message Date
Eric Duong
96a614ee25
Prune person materialized (#2166)
* remove person properties up to date

* remove person props mv

* move latest person

* prune rest of person materialized

* missing parenth

* add type

* remove migration
2020-11-02 15:44:38 -05:00
Paolo D'Amico
066721e3c1
Stability & dev experience improvements (#2152)
2020-11-02 14:55:20 +00:00
Karl-Aksel Puulmann
e3bf0cb31d
Session recording on clickhouse, separate tables and retention cronjob (#2051)
* Add scheduled task to wipe session recordings

* Create a new table for session recording

* Save snapshot events to different table

* Use SessionRecordingEvent over Events everywhere

We can remove a ton of cruft this way as well

* Add missing signature

* Extract util from models/event

* Attempt to update ingest side of clickhouse session recording events

Note that it's using main kafka topic - not sure if a good idea.

* Get separate table in ch working for session recording events

* WIP: query sessions

* Make both session recording queries work

* Make linter happy

* Rebase migration

* Make tests work

* Apply a TTL to session recordings and other configuration:

- toYYYYMMDD partitioning should be smoother with TTL setup
- TTL achieves not needing to archive the data ourselves
- index_granularity will enable smaller reads per session_id
- ORDER BY clause is chosen so that both single-session and time-range queries
  are reasonable

* Convert retention cronjob to new model

* Add tests to process_event changes

* Add test for ee_capture change

* Fixup migration

* Make clickhouse tests drop/create session recording tables

* Make TTL not be there in tests

Otherwise writes get eaten by it during tests when mocking time

* Fix retention task

Co-authored-by: Tim Glaser <tim@glsr.nl>
2020-10-28 21:22:16 +01:00
Eric Duong
51105ace47
Add new person materialized (#1944)
* add new table migrations and change table names

* include necessary config for new tables in tests

* fix tests and table

* fix table name param

* add populate clause

* added table for key value person props

* adjust person filtering to use new table

* .

* add ordering on updated_at

* add back all the condition handling on persons filtering endpoint

* fix typing

* remove print

* re-order sort key for persons_up_to_date

Co-authored-by: James Greenhill <fuziontech@gmail.com>
2020-10-22 13:22:43 -07:00
James Greenhill
7ab30a836c
Remove Omni-Person logic for ee (#1972)
* Remove Omni-Person logic for ee

* remove more omni person references
2020-10-21 14:06:45 -07:00
Tim Glaser
80d20e385b
Clickhouse use elements chain (#1849)
* Clickhouse use elements chain

* Fix stuff

* Add action tests and start regex

* Progress

* Progress part deux

* Fix everything

* Add tag name filtering

* Fix funnels

* Fix tag name regex

* Fix ordering

* Fix type issues

* Fix empty nth-child

* Remove commented code

* Split with semicolon and escaped quotes

* Specify all select columns
2020-10-16 14:07:03 +02:00
James Greenhill
1ed6263a71
Create Omni-Person model for managing people in Clickhouse (#1712)
* Create Omni-Person model for managing people in Clickhouse

* type fixes

* rebase all the things

* cleanups

* id -> uuid for events in clickhouse

* cleanups and type checks

* Further cleanups and uuid conversions

* kafka fix

* break out serializer across kafka clients

* fix a few bugs w/ datetime types

* basic fix for people kafka table

* fix migration errors (copy pasta errors)

* Use KafkaProducer for Omni Person emitting

* setup mock kafka producer

* undo some work for inserting

* Test TestKafkaProducer

* change if order, obvious mistake

* remove unnecessary function arg

* Fix getters for new column

* Test fixes

* mirror columns across element queries

* firm up handling of timestamps

* only return timestamps for handle_timestamp

* Correct heroku config for Kafka
2020-09-25 11:05:50 +01:00
Marius Andra
dd7e38c5b5
Clickhouse Elements Dedup (based on master) (#1698)
* Use ReplacingMergeTree for elements, remove element_groups and use elements_hash as a virtual "pk"

* remove unused ELEMENT_GROUP_TABLE_SQL

* merge fixes

* use redis cache to avoid writing duplicate elements to clickhouse

* move fakeredis to requirements.txt

* add team_id to cache key

* remove elements_group kafka table references

* add elements_hash to clickhouse element serializer

* fix cache key

* rename few keys

* add test runner to ease pycharm dev

* fix some mypy errors

* remove typo

Co-authored-by: Eric <eeoneric@gmail.com>
2020-09-24 06:47:28 -04:00
Marius Andra
1eeed28751
Fix Master EE code (#1701)
* add test runner to ease pycharm dev

* fix broken import

* drop and recreate the clickhouse test db before running tests

* fix person uuid str json serialization issue

* make kafka optional in tests

* fix inits

* remove need for kafka in person.py

* fix a bunch of mypy errors

* fix function and add process_event to pipeline

* fixed missing params and tests

* change uuid and fix types

* types

* optimize for merge prop test

* make ClickhouseProducer produce to clickhouse one way or another

* annotate types

Co-authored-by: Eric <eeoneric@gmail.com>
Co-authored-by: James Greenhill <fuziontech@gmail.com>
2020-09-24 06:14:17 -04:00
James Greenhill
2fd4fc90d4
Materialize Views to wrap data coming in from Kafka for Events, Elements, People (#1678)
2020-09-17 11:22:00 -07:00
James Greenhill
adeb7694cc
Publish events to Kafka for consumption (#1644)
* Publish events to Kafka for consumption

* Commit avro idl's for event schemas

* convert client to use github.com/dpkp/kafka-python

* events loaded into clickhouse from Kafka

* remove cruft

* Publish events to Kafka for consumption

* convert client to use github.com/dpkp/kafka-python

* remove cruft

* include kafka migrations

* bugfixes for migrations

* use constants for consistency

* wrap up local migrations

* small fixes

* tune ups
2020-09-15 20:04:38 -07:00
James Greenhill
dedf5582c1
first chunk of clickhouse framework (#1613)
* first chunk of clickhouse framework

* prod, not dev docker-compose.yml

* add clickhouse sql files
2020-09-08 16:12:27 -07:00
James Greenhill
c2e03a3a46
Revert this to bring back the working copy of ee / clickhouse (#1588)
2020-09-04 21:12:07 -07:00
James Greenhill
5506135c3c
Bring back clickhouse changes along with queue size metrics (#1579)
* Revert "Revert "Clickhouse setup (#1463)""

This reverts commit 7f2cab4b93.

* add queue backlog to _stats endpoint

* celery queue length into heartbeat -> statsd

* reformat

* type check fixes

* bump expected number of queries to 8

Co-authored-by: Eric <eeoneric@gmail.com>
2020-09-04 13:44:53 -07:00
James Greenhill
7f2cab4b93
Revert "Clickhouse setup (#1463)"
This reverts commit a0327587cb.
Time to process events shot way up and logs are missing.
2020-09-03 19:27:02 -07:00
Eric Duong
a0327587cb
Clickhouse setup (#1463)
* initial

* migration command

* migrations working

* add modelless views for clickhouse

* initial testing structure

* use test factory

* scaffold for all tests

* add insight and person api

* add basic readme

* add client

* change how migrations are run

* add base tables

* ingesting events

* restore delay

* remove print

* updated testing flow

* changed sessions tests

* update tests

* reorganized sql

* parametrize strings

* element list query

* change to serializer

* add values endpoint

* retrieve with filter

* pruned code to prepare for staged merge

* working ingestion again

* tests for ee

* undo unneeded tests right now

* fix linting

* more typing errors

* fix tests

* add clickhouse image to workflow

* move to right job

* remove django_clickhouse

* return database url

* run super

* remove keepdb

* reordered calls

* fix type

* fractional seconds

* fix type error

* add checks

* remove retention sql

* fix tests

* add property storage and tests

* merge master

* fix tests

* fix tests

* .

* remove keepdb

* format python files

* update CI env vars

* Override defaults and insecure tests

* Update how ClickHouse database gets evaluated

* remove bootstrapping clickhouse database routine

* Don't initialize the clickhouse connection unless we say it's primary

* .

* fixed id generation

* remove dump

* black settings

* empty client

* add param

* move docker-compose for ch to ee dir

* Add _public_ key to repo for verifying self signed cert on server

* update ee compose file for ee dir

* fix a few issues with tls in migrations

* update migrations to be flexible about storage profile and engine

* black settings

* add elements prop tables

Co-authored-by: James Greenhill <jams@uber.com>
2020-09-03 10:27:45 -07:00