0
0
mirror of https://github.com/PostHog/posthog.git synced 2024-11-30 19:41:46 +01:00
posthog/ee/clickhouse
James Greenhill 379518e285
"Clickhouse Features V2 (#1565)" (#1750)
* initial

* migration command

* migrations working

* add modelless views for clickhouse

* initial testing structure

* use test factory

* scaffold for all tests

* add insight and person api

* add basic readme

* add client

* change how migrations are run

* add base tables

* ingesting events

* restore delay

* remove print

* updated testing flow

* changed sessions tests

* update tests

* reorganized sql

* parametrize strings

* element list query

* change to seralizer

* add values endpoint

* retrieve with filter

* pruned code to prepare for staged merge

* working ingestion again

* tests for ee

* undo unneeded tests right now

* fix linting

* more typing errors

* fix tests

* add clickhouse image to workflow

* move to right job

* remove django_clickhouse

* return database url

* run super

* remove keepdb

* reordered calls

* fix type

* fractional seconds

* fix type error

* add checks

* remove retention sql

* fix tests

* add property storage and tests

* merge master

* fix tests

* fix tests

* .

* remove keepdb

* format python files

* update CI env vars

* Override defaults and insecure tests

* Update how ClickHouse database gets evaluated

* remove bootstrapping clickhouse database routine

* Don't initialize the clickhouse connection unless we say it's primary

* .

* fixed id generation

* remove dump

* black settings

* empty client

* add param

* move docker-compose for ch to ee dir

* Add _public_ key to repo for verifying self signed cert on server

* update ee compose file for ee dir

* fix a few issues with tls in migrations

* update migrations to be flexible about storage profile and engine

* black settings

* add elements prop tables

* add elements prop tables

* working filter

* refactored

* better url handling

* add mapping table

* add processing to worker task

* working cohort with actions

* add cohort property filtering

* add cohort property filtering

* reformat and add cohort processing

* prop clauses

* add util

* add more util

* add clickhouse modifier

* Clickhouse Sessions (#1623)

* sessions sql

* skeleton

* add endpoint

* better tests

* sessions list

* merge clickhouse-actions

* added session endpoint

* sessions sql working again

* add clickhouse modifier

* session avg with props working

* add dist

* tests working (no list)

* list working

* add formatting

* more formatting

* fix tests

* dummy commit

* fix types

* remove unnecessary improt

* ignore type when importing from ee in task

* fix test running

* Clickhouse Trends Base (#1609)

* initial working

* date param almost working

* fix date range and labels

* fixed monthly math

* handle compare

* change table

* using new event ingestion

* direct query actions working

* remove interface

* fix date range

* properties initial working

* handle operator

* handle operator

* move timestamp parse

* move more to util

* inital breaking down working

* working cohort breakdown

* some tests running

* fix sessions

* cohort tests

* action and interval test

* reorder cohort filtering

* rename retention test

* fix inits

* change multitenancy tests

* fix types

* fix optional types

* replace ch_client.execute with sync_execute

* replace ch_client.execute with sync_execute, part 2

* Clickhouse Stickiness + Process Event (#1654)

* generate clickhouse uuid script

* set CLICKHOUSE_SECURE=False by default if running in TEST or DEBUG

* convert person_id to UUID, make adding `person_id` optional, add distinct_ids already in the `create_person` function

* Fix test_process_event_ee.py, remove all calls to Person.objects.*

* add back util

* fix broken imports

* improve process_event test clickhouse queries

* Basic stickiness query

* Clickhouse Stickiness tests

* stickiness test [WIP, actions fail]

* generate clickhouse uuid script

* change default test runner if PRIMARY_DB=clickhouse

* fix stickiness test for actions

* fix merge bug

* remove _create_person stub; cohort person_id is UUID now

* fix typing

* Clickhouse trends process math (#1660)

* most of process math works

* all process math

* fix ordering issue

* unusued imports

* update property comparison for process_event_ee

* indentation wrong missing calls

* demo users and events (#1661)

* finish breakdown filtering tests and reformat label function

* add increment to demo_data

* update demo data populating

* Add people endpoint for ch (#1670)

* add people endpoint for ch

* stickiness people

* fix value padding

* add process math to breakdown and

* add limit

* fix tests

* condensed code

* converted test to factory

* add people tests

* add month handling

* add typing fix

* change people test handling

* fix tests

* Clickhouse funnels 2 (#1668)

* add elements to create_event

* WIP closes #1663 Add funnels to clickhouse

* Make funnels work

* Clean up

* Move filtering around

* Add mypy tests and fix

* Performance improvements

* fix person tests again

* add people for funnel endpoint

* fix prop numbering

Co-authored-by: Marius Andra <marius.andra@gmail.com>
Co-authored-by: Eric <eeoneric@gmail.com>

* merge master

* add retention

* update types

* more typing errors

* fix types

* bug with kafka payload, elements insert, and demo data

* Clickhouse Paths (#1657)

* paths clickhouse test (fails)

* add elements to create_event

* make this fail for clickhouse

* hardcoded query that returns good results for $pageviews, no filters yet

* clean up queries

* bound by time, fix 30min new session boundary

* support screen and custom events

* add properties filter

* paths url

* filter by path start

* better path start test

* even better path start test

* start from the first "path start" in a group

* test for person_id in paths

* partition by person_id for POSTGRES paths

* partition by person_id for Clickhouse paths

* clean up order in paths test

* clean up order in paths test

* join elements

* force element order on element group creation

* remove "order" when creating elements in tests and demo

* get list of elements for paths

* add limit to paths query

* use materialized view

* rename "element_hash" to "elements_hash" (no change in db)

* cull rows that are definitely unused

* simplify query

* New highly optimized paths clickhouse query

* start_point for $autocapture paths

* extract event property values from clickhouse

* prevent crash

* select one element sql

* get elements for event

* remove lodash

* remove host from $pageview path elements if same domain as incoming path

* show metadata based on loaded paths filter, not in flight filter

* fix order (all soures and targets in order, not all sources first, then all targets after) - makes for a better looking graph

* add test that makes the Postgres paths query fail

* fix postgres paths --> no fuzzy matching, breaks "starts with" for urls and gives too many incorrect start points

* create automatic /demo urls that match the real urls (no ending /)

* fix elements queries

* path element joins

* create persons via postgres in paths test

* change serializers back to id

* fix tests with uuid

* fix demo

* more bugs

* fix type

* change now to timezone aware

* [clickhouse] retention filters (#1725)

* implemented target entity and prop filtering

* add insight view override

* fix endpoint and filters

* include tests

* fix tests

* add period filtering

* .

* fix pg param name

* add filtering params to both queries in retention sql

* fix param again

* change to todatetime

* change tz to timezone

* add back timezone in model/event

* [clickhouse] feature flag endpoint requests (#1731)

* add feature flags to endpoints

* add flags to endpoints that check on request

* remove magic strings and fill in missing flags

* fix types

* add missing flag

* change from iso

* fix more timestamps and comparator

* change _people to get_people in actions view

* remove action and cohort populating

* change inheritance

* "Clickhouse Features V2 (#1565)"

This reverts commit 0b371d43ec.

* fix types

* change to super

* change to super x2

Co-authored-by: Eric <eeoneric@gmail.com>
Co-authored-by: Marius Andra <marius.andra@gmail.com>
Co-authored-by: Tim Glaser <tim.glaser@hiberly.com>
2020-09-29 15:17:26 +01:00
..
migrations Create Omni-Person model for managing people in Clickhouse (#1712) 2020-09-25 11:05:50 +01:00
models "Clickhouse Features V2 (#1565)" (#1750) 2020-09-29 15:17:26 +01:00
queries "Clickhouse Features V2 (#1565)" (#1750) 2020-09-29 15:17:26 +01:00
sql "Clickhouse Features V2 (#1565)" (#1750) 2020-09-29 15:17:26 +01:00
test "Clickhouse Features V2 (#1565)" (#1750) 2020-09-29 15:17:26 +01:00
views "Clickhouse Features V2 (#1565)" (#1750) 2020-09-29 15:17:26 +01:00
__init__.py Fix Master EE code (#1701) 2020-09-24 06:14:17 -04:00
clickhouse_test_runner.py Fix Master EE code (#1701) 2020-09-24 06:14:17 -04:00
client.py Fix Master EE code (#1701) 2020-09-24 06:14:17 -04:00
demo.py "Clickhouse Features V2 (#1565)" (#1750) 2020-09-29 15:17:26 +01:00
process_event.py "Clickhouse Features V2 (#1565)" (#1750) 2020-09-29 15:17:26 +01:00
README.md first chunk of clickhouse framework (#1613) 2020-09-08 16:12:27 -07:00
util.py "Clickhouse Features V2 (#1565)" (#1750) 2020-09-29 15:17:26 +01:00

Clickhouse Support (Enterprise Feature)

To accomodate high volume deployments, Posthog can use Clickhouse instead of Postgres. Clickhouse isn't used by default because Postgres is easier to deploy and maintain on smaller instances and on platforms such as Heroku.

Clickhouse Support works by swapping in separate queries and classes in the system for models that are most impacted by high volume usage (ex: events and actions).

Migrations and Models

The django_clickhouse orm is used to manage migrations and models. The ORM is used to mimic the django model and migration structure in the main folder.

Queries

Queries parallel the queries folder in the main folder however, clickhouse queries are written in SQL and do not utilize the ORM.

Tests

The tests are inherited from the main folder. The Clickhouse query classes are based off BaseQuery so their run function should work just as the Django ORM backed query classes. These classes are called with the paramterized tests declared in the main folder which allows the same suite of tests to be run with different implementations.

Views

Views contain Viewset classes that are not backed by models. Instead the views query Clickhouse tables using SQL. These views match the interface provide by the views in the main folder.