0
0
mirror of https://github.com/PostHog/posthog.git synced 2024-12-01 04:12:23 +01:00
posthog/ee/clickhouse/sql/clickhouse.py

80 lines
2.5 KiB
Python
Raw Normal View History

"Clickhouse Features V2 (#1565)" (#1750) * initial * migration command * migrations working * add modelless views for clickhouse * initial testing structure * use test factory * scaffold for all tests * add insight and person api * add basic readme * add client * change how migrations are run * add base tables * ingesting events * restore delay * remove print * updated testing flow * changed sessions tests * update tests * reorganized sql * parametrize strings * element list query * change to seralizer * add values endpoint * retrieve with filter * pruned code to prepare for staged merge * working ingestion again * tests for ee * undo unneeded tests right now * fix linting * more typing errors * fix tests * add clickhouse image to workflow * move to right job * remove django_clickhouse * return database url * run super * remove keepdb * reordered calls * fix type * fractional seconds * fix type error * add checks * remove retention sql * fix tests * add property storage and tests * merge master * fix tests * fix tests * . * remove keepdb * format python files * update CI env vars * Override defaults and insecure tests * Update how ClickHouse database gets evaluated * remove bootstrapping clickhouse database routine * Don't initialize the clickhouse connection unless we say it's primary * . * fixed id generation * remove dump * black settings * empty client * add param * move docker-compose for ch to ee dir * Add _public_ key to repo for verifying self signed cert on server * update ee compose file for ee dir * fix a few issues with tls in migrations * update migrations to be flexible about storage profile and engine * black settings * add elements prop tables * add elements prop tables * working filter * refactored * better url handling * add mapping table * add processing to worker task * working cohort with actions * add cohort property filtering * add cohort property filtering * reformat and add cohort processing * prop clauses * add util * add more util * add clickhouse modifier * Clickhouse Sessions (#1623) * sessions sql * skeleton * add endpoint * better tests * sessions list * merge clickhouse-actions * added session endpoint * sessions sql working again * add clickhouse modifier * session avg with props working * add dist * tests working (no list) * list working * add formatting * more formatting * fix tests * dummy commit * fix types * remove unnecessary improt * ignore type when importing from ee in task * fix test running * Clickhouse Trends Base (#1609) * initial working * date param almost working * fix date range and labels * fixed monthly math * handle compare * change table * using new event ingestion * direct query actions working * remove interface * fix date range * properties initial working * handle operator * handle operator * move timestamp parse * move more to util * inital breaking down working * working cohort breakdown * some tests running * fix sessions * cohort tests * action and interval test * reorder cohort filtering * rename retention test * fix inits * change multitenancy tests * fix types * fix optional types * replace ch_client.execute with sync_execute * replace ch_client.execute with sync_execute, part 2 * Clickhouse Stickiness + Process Event (#1654) * generate clickhouse uuid script * set CLICKHOUSE_SECURE=False by default if running in TEST or DEBUG * convert person_id to UUID, make adding `person_id` optional, add distinct_ids already in the `create_person` function * Fix test_process_event_ee.py, remove all calls to Person.objects.* * add back util * fix broken imports * improve process_event test clickhouse queries * Basic stickiness query * Clickhouse Stickiness tests * stickiness test [WIP, actions fail] * generate clickhouse uuid script * change default test runner if PRIMARY_DB=clickhouse * fix stickiness test for actions * fix merge bug * remove _create_person stub; cohort person_id is UUID now * fix typing * Clickhouse trends process math (#1660) * most of process math works * all process math * fix ordering issue * unusued imports * update property comparison for process_event_ee * indentation wrong missing calls * demo users and events (#1661) * finish breakdown filtering tests and reformat label function * add increment to demo_data * update demo data populating * Add people endpoint for ch (#1670) * add people endpoint for ch * stickiness people * fix value padding * add process math to breakdown and * add limit * fix tests * condensed code * converted test to factory * add people tests * add month handling * add typing fix * change people test handling * fix tests * Clickhouse funnels 2 (#1668) * add elements to create_event * WIP closes #1663 Add funnels to clickhouse * Make funnels work * Clean up * Move filtering around * Add mypy tests and fix * Performance improvements * fix person tests again * add people for funnel endpoint * fix prop numbering Co-authored-by: Marius Andra <marius.andra@gmail.com> Co-authored-by: Eric <eeoneric@gmail.com> * merge master * add retention * update types * more typing errors * fix types * bug with kafka payload, elements insert, and demo data * Clickhouse Paths (#1657) * paths clickhouse test (fails) * add elements to create_event * make this fail for clickhouse * hardcoded query that returns good results for $pageviews, no filters yet * clean up queries * bound by time, fix 30min new session boundary * support screen and custom events * add properties filter * paths url * filter by path start * better path start test * even better path start test * start from the first "path start" in a group * test for person_id in paths * partition by person_id for POSTGRES paths * partition by person_id for Clickhouse paths * clean up order in paths test * clean up order in paths test * join elements * force element order on element group creation * remove "order" when creating elements in tests and demo * get list of elements for paths * add limit to paths query * use materialized view * rename "element_hash" to "elements_hash" (no change in db) * cull rows that are definitely unused * simplify query * New highly optimized paths clickhouse query * start_point for $autocapture paths * extract event property values from clickhouse * prevent crash * select one element sql * get elements for event * remove lodash * remove host from $pageview path elements if same domain as incoming path * show metadata based on loaded paths filter, not in flight filter * fix order (all soures and targets in order, not all sources first, then all targets after) - makes for a better looking graph * add test that makes the Postgres paths query fail * fix postgres paths --> no fuzzy matching, breaks "starts with" for urls and gives too many incorrect start points * create automatic /demo urls that match the real urls (no ending /) * fix elements queries * path element joins * create persons via postgres in paths test * change serializers back to id * fix tests with uuid * fix demo * more bugs * fix type * change now to timezone aware * [clickhouse] retention filters (#1725) * implemented target entity and prop filtering * add insight view override * fix endpoint and filters * include tests * fix tests * add period filtering * . * fix pg param name * add filtering params to both queries in retention sql * fix param again * change to todatetime * change tz to timezone * add back timezone in model/event * [clickhouse] feature flag endpoint requests (#1731) * add feature flags to endpoints * add flags to endpoints that check on request * remove magic strings and fill in missing flags * fix types * add missing flag * change from iso * fix more timestamps and comparator * change _people to get_people in actions view * remove action and cohort populating * change inheritance * "Clickhouse Features V2 (#1565)" This reverts commit 0b371d43eca149cd3632410ea5653b85b2173e39. * fix types * change to super * change to super x2 Co-authored-by: Eric <eeoneric@gmail.com> Co-authored-by: Marius Andra <marius.andra@gmail.com> Co-authored-by: Tim Glaser <tim.glaser@hiberly.com>
2020-09-29 16:17:26 +02:00
from typing import Optional
from posthog.settings import CLICKHOUSE_ENABLE_STORAGE_POLICY, CLICKHOUSE_REPLICATION, KAFKA_HOSTS, TEST
STORAGE_POLICY = "SETTINGS storage_policy = 'hot_to_cold'" if CLICKHOUSE_ENABLE_STORAGE_POLICY else ""
REPLACING_TABLE_ENGINE = (
"ReplicatedReplacingMergeTree('/clickhouse/tables/{{shard}}/posthog.{table}', '{{replica}}', {ver})"
if CLICKHOUSE_REPLICATION
else "ReplacingMergeTree({ver})"
)
MERGE_TABLE_ENGINE = (
"Clickhouse Features V2 (#1565)" (#1750) * initial * migration command * migrations working * add modelless views for clickhouse * initial testing structure * use test factory * scaffold for all tests * add insight and person api * add basic readme * add client * change how migrations are run * add base tables * ingesting events * restore delay * remove print * updated testing flow * changed sessions tests * update tests * reorganized sql * parametrize strings * element list query * change to seralizer * add values endpoint * retrieve with filter * pruned code to prepare for staged merge * working ingestion again * tests for ee * undo unneeded tests right now * fix linting * more typing errors * fix tests * add clickhouse image to workflow * move to right job * remove django_clickhouse * return database url * run super * remove keepdb * reordered calls * fix type * fractional seconds * fix type error * add checks * remove retention sql * fix tests * add property storage and tests * merge master * fix tests * fix tests * . * remove keepdb * format python files * update CI env vars * Override defaults and insecure tests * Update how ClickHouse database gets evaluated * remove bootstrapping clickhouse database routine * Don't initialize the clickhouse connection unless we say it's primary * . * fixed id generation * remove dump * black settings * empty client * add param * move docker-compose for ch to ee dir * Add _public_ key to repo for verifying self signed cert on server * update ee compose file for ee dir * fix a few issues with tls in migrations * update migrations to be flexible about storage profile and engine * black settings * add elements prop tables * add elements prop tables * working filter * refactored * better url handling * add mapping table * add processing to worker task * working cohort with actions * add cohort property filtering * add cohort property filtering * reformat and add cohort processing * prop clauses * add util * add more util * add clickhouse modifier * Clickhouse Sessions (#1623) * sessions sql * skeleton * add endpoint * better tests * sessions list * merge clickhouse-actions * added session endpoint * sessions sql working again * add clickhouse modifier * session avg with props working * add dist * tests working (no list) * list working * add formatting * more formatting * fix tests * dummy commit * fix types * remove unnecessary improt * ignore type when importing from ee in task * fix test running * Clickhouse Trends Base (#1609) * initial working * date param almost working * fix date range and labels * fixed monthly math * handle compare * change table * using new event ingestion * direct query actions working * remove interface * fix date range * properties initial working * handle operator * handle operator * move timestamp parse * move more to util * inital breaking down working * working cohort breakdown * some tests running * fix sessions * cohort tests * action and interval test * reorder cohort filtering * rename retention test * fix inits * change multitenancy tests * fix types * fix optional types * replace ch_client.execute with sync_execute * replace ch_client.execute with sync_execute, part 2 * Clickhouse Stickiness + Process Event (#1654) * generate clickhouse uuid script * set CLICKHOUSE_SECURE=False by default if running in TEST or DEBUG * convert person_id to UUID, make adding `person_id` optional, add distinct_ids already in the `create_person` function * Fix test_process_event_ee.py, remove all calls to Person.objects.* * add back util * fix broken imports * improve process_event test clickhouse queries * Basic stickiness query * Clickhouse Stickiness tests * stickiness test [WIP, actions fail] * generate clickhouse uuid script * change default test runner if PRIMARY_DB=clickhouse * fix stickiness test for actions * fix merge bug * remove _create_person stub; cohort person_id is UUID now * fix typing * Clickhouse trends process math (#1660) * most of process math works * all process math * fix ordering issue * unusued imports * update property comparison for process_event_ee * indentation wrong missing calls * demo users and events (#1661) * finish breakdown filtering tests and reformat label function * add increment to demo_data * update demo data populating * Add people endpoint for ch (#1670) * add people endpoint for ch * stickiness people * fix value padding * add process math to breakdown and * add limit * fix tests * condensed code * converted test to factory * add people tests * add month handling * add typing fix * change people test handling * fix tests * Clickhouse funnels 2 (#1668) * add elements to create_event * WIP closes #1663 Add funnels to clickhouse * Make funnels work * Clean up * Move filtering around * Add mypy tests and fix * Performance improvements * fix person tests again * add people for funnel endpoint * fix prop numbering Co-authored-by: Marius Andra <marius.andra@gmail.com> Co-authored-by: Eric <eeoneric@gmail.com> * merge master * add retention * update types * more typing errors * fix types * bug with kafka payload, elements insert, and demo data * Clickhouse Paths (#1657) * paths clickhouse test (fails) * add elements to create_event * make this fail for clickhouse * hardcoded query that returns good results for $pageviews, no filters yet * clean up queries * bound by time, fix 30min new session boundary * support screen and custom events * add properties filter * paths url * filter by path start * better path start test * even better path start test * start from the first "path start" in a group * test for person_id in paths * partition by person_id for POSTGRES paths * partition by person_id for Clickhouse paths * clean up order in paths test * clean up order in paths test * join elements * force element order on element group creation * remove "order" when creating elements in tests and demo * get list of elements for paths * add limit to paths query * use materialized view * rename "element_hash" to "elements_hash" (no change in db) * cull rows that are definitely unused * simplify query * New highly optimized paths clickhouse query * start_point for $autocapture paths * extract event property values from clickhouse * prevent crash * select one element sql * get elements for event * remove lodash * remove host from $pageview path elements if same domain as incoming path * show metadata based on loaded paths filter, not in flight filter * fix order (all soures and targets in order, not all sources first, then all targets after) - makes for a better looking graph * add test that makes the Postgres paths query fail * fix postgres paths --> no fuzzy matching, breaks "starts with" for urls and gives too many incorrect start points * create automatic /demo urls that match the real urls (no ending /) * fix elements queries * path element joins * create persons via postgres in paths test * change serializers back to id * fix tests with uuid * fix demo * more bugs * fix type * change now to timezone aware * [clickhouse] retention filters (#1725) * implemented target entity and prop filtering * add insight view override * fix endpoint and filters * include tests * fix tests * add period filtering * . * fix pg param name * add filtering params to both queries in retention sql * fix param again * change to todatetime * change tz to timezone * add back timezone in model/event * [clickhouse] feature flag endpoint requests (#1731) * add feature flags to endpoints * add flags to endpoints that check on request * remove magic strings and fill in missing flags * fix types * add missing flag * change from iso * fix more timestamps and comparator * change _people to get_people in actions view * remove action and cohort populating * change inheritance * "Clickhouse Features V2 (#1565)" This reverts commit 0b371d43eca149cd3632410ea5653b85b2173e39. * fix types * change to super * change to super x2 Co-authored-by: Eric <eeoneric@gmail.com> Co-authored-by: Marius Andra <marius.andra@gmail.com> Co-authored-by: Tim Glaser <tim.glaser@hiberly.com>
2020-09-29 16:17:26 +02:00
"ReplicatedReplacingMergeTree('/clickhouse/tables/{{shard}}/posthog.{table}', '{{replica}}')"
if CLICKHOUSE_REPLICATION
else "MergeTree()"
)
COLLAPSING_TABLE_ENGINE = (
"ReplicatedCollapsingMergeTree('/clickhouse/tables/noshard/posthog.{table}', '{{replica}}-{{shard}}', {ver})"
if CLICKHOUSE_REPLICATION
else "CollapsingMergeTree({ver})"
)
KAFKA_ENGINE = "Kafka('{kafka_host}', '{topic}', '{group}', '{serialization}')"
KAFKA_PROTO_ENGINE = """
Kafka () SETTINGS
kafka_broker_list = '{kafka_host}',
kafka_topic_list = '{topic}',
kafka_group_name = '{group}',
kafka_format = 'Protobuf',
kafka_schema = '{proto_schema}',
kafka_skip_broken_messages = {skip_broken_messages}
"""
GENERATE_UUID_SQL = """
SELECT generateUUIDv4()
"""
KAFKA_COLUMNS = """
, _timestamp DateTime
, _offset UInt64
"""
COLLAPSING_MERGE_TREE = "collapsing_merge_tree"
REPLACING_MERGE_TREE = "replacing_merge_tree"
def table_engine(table: str, ver: Optional[str] = None, engine_type: Optional[str] = None) -> str:
if engine_type == COLLAPSING_MERGE_TREE and ver:
return COLLAPSING_TABLE_ENGINE.format(table=table, ver=ver)
elif engine_type == REPLACING_MERGE_TREE and ver:
return REPLACING_TABLE_ENGINE.format(table=table, ver=ver)
"Clickhouse Features V2 (#1565)" (#1750) * initial * migration command * migrations working * add modelless views for clickhouse * initial testing structure * use test factory * scaffold for all tests * add insight and person api * add basic readme * add client * change how migrations are run * add base tables * ingesting events * restore delay * remove print * updated testing flow * changed sessions tests * update tests * reorganized sql * parametrize strings * element list query * change to seralizer * add values endpoint * retrieve with filter * pruned code to prepare for staged merge * working ingestion again * tests for ee * undo unneeded tests right now * fix linting * more typing errors * fix tests * add clickhouse image to workflow * move to right job * remove django_clickhouse * return database url * run super * remove keepdb * reordered calls * fix type * fractional seconds * fix type error * add checks * remove retention sql * fix tests * add property storage and tests * merge master * fix tests * fix tests * . * remove keepdb * format python files * update CI env vars * Override defaults and insecure tests * Update how ClickHouse database gets evaluated * remove bootstrapping clickhouse database routine * Don't initialize the clickhouse connection unless we say it's primary * . * fixed id generation * remove dump * black settings * empty client * add param * move docker-compose for ch to ee dir * Add _public_ key to repo for verifying self signed cert on server * update ee compose file for ee dir * fix a few issues with tls in migrations * update migrations to be flexible about storage profile and engine * black settings * add elements prop tables * add elements prop tables * working filter * refactored * better url handling * add mapping table * add processing to worker task * working cohort with actions * add cohort property filtering * add cohort property filtering * reformat and add cohort processing * prop clauses * add util * add more util * add clickhouse modifier * Clickhouse Sessions (#1623) * sessions sql * skeleton * add endpoint * better tests * sessions list * merge clickhouse-actions * added session endpoint * sessions sql working again * add clickhouse modifier * session avg with props working * add dist * tests working (no list) * list working * add formatting * more formatting * fix tests * dummy commit * fix types * remove unnecessary improt * ignore type when importing from ee in task * fix test running * Clickhouse Trends Base (#1609) * initial working * date param almost working * fix date range and labels * fixed monthly math * handle compare * change table * using new event ingestion * direct query actions working * remove interface * fix date range * properties initial working * handle operator * handle operator * move timestamp parse * move more to util * inital breaking down working * working cohort breakdown * some tests running * fix sessions * cohort tests * action and interval test * reorder cohort filtering * rename retention test * fix inits * change multitenancy tests * fix types * fix optional types * replace ch_client.execute with sync_execute * replace ch_client.execute with sync_execute, part 2 * Clickhouse Stickiness + Process Event (#1654) * generate clickhouse uuid script * set CLICKHOUSE_SECURE=False by default if running in TEST or DEBUG * convert person_id to UUID, make adding `person_id` optional, add distinct_ids already in the `create_person` function * Fix test_process_event_ee.py, remove all calls to Person.objects.* * add back util * fix broken imports * improve process_event test clickhouse queries * Basic stickiness query * Clickhouse Stickiness tests * stickiness test [WIP, actions fail] * generate clickhouse uuid script * change default test runner if PRIMARY_DB=clickhouse * fix stickiness test for actions * fix merge bug * remove _create_person stub; cohort person_id is UUID now * fix typing * Clickhouse trends process math (#1660) * most of process math works * all process math * fix ordering issue * unusued imports * update property comparison for process_event_ee * indentation wrong missing calls * demo users and events (#1661) * finish breakdown filtering tests and reformat label function * add increment to demo_data * update demo data populating * Add people endpoint for ch (#1670) * add people endpoint for ch * stickiness people * fix value padding * add process math to breakdown and * add limit * fix tests * condensed code * converted test to factory * add people tests * add month handling * add typing fix * change people test handling * fix tests * Clickhouse funnels 2 (#1668) * add elements to create_event * WIP closes #1663 Add funnels to clickhouse * Make funnels work * Clean up * Move filtering around * Add mypy tests and fix * Performance improvements * fix person tests again * add people for funnel endpoint * fix prop numbering Co-authored-by: Marius Andra <marius.andra@gmail.com> Co-authored-by: Eric <eeoneric@gmail.com> * merge master * add retention * update types * more typing errors * fix types * bug with kafka payload, elements insert, and demo data * Clickhouse Paths (#1657) * paths clickhouse test (fails) * add elements to create_event * make this fail for clickhouse * hardcoded query that returns good results for $pageviews, no filters yet * clean up queries * bound by time, fix 30min new session boundary * support screen and custom events * add properties filter * paths url * filter by path start * better path start test * even better path start test * start from the first "path start" in a group * test for person_id in paths * partition by person_id for POSTGRES paths * partition by person_id for Clickhouse paths * clean up order in paths test * clean up order in paths test * join elements * force element order on element group creation * remove "order" when creating elements in tests and demo * get list of elements for paths * add limit to paths query * use materialized view * rename "element_hash" to "elements_hash" (no change in db) * cull rows that are definitely unused * simplify query * New highly optimized paths clickhouse query * start_point for $autocapture paths * extract event property values from clickhouse * prevent crash * select one element sql * get elements for event * remove lodash * remove host from $pageview path elements if same domain as incoming path * show metadata based on loaded paths filter, not in flight filter * fix order (all soures and targets in order, not all sources first, then all targets after) - makes for a better looking graph * add test that makes the Postgres paths query fail * fix postgres paths --> no fuzzy matching, breaks "starts with" for urls and gives too many incorrect start points * create automatic /demo urls that match the real urls (no ending /) * fix elements queries * path element joins * create persons via postgres in paths test * change serializers back to id * fix tests with uuid * fix demo * more bugs * fix type * change now to timezone aware * [clickhouse] retention filters (#1725) * implemented target entity and prop filtering * add insight view override * fix endpoint and filters * include tests * fix tests * add period filtering * . * fix pg param name * add filtering params to both queries in retention sql * fix param again * change to todatetime * change tz to timezone * add back timezone in model/event * [clickhouse] feature flag endpoint requests (#1731) * add feature flags to endpoints * add flags to endpoints that check on request * remove magic strings and fill in missing flags * fix types * add missing flag * change from iso * fix more timestamps and comparator * change _people to get_people in actions view * remove action and cohort populating * change inheritance * "Clickhouse Features V2 (#1565)" This reverts commit 0b371d43eca149cd3632410ea5653b85b2173e39. * fix types * change to super * change to super x2 Co-authored-by: Eric <eeoneric@gmail.com> Co-authored-by: Marius Andra <marius.andra@gmail.com> Co-authored-by: Tim Glaser <tim.glaser@hiberly.com>
2020-09-29 16:17:26 +02:00
else:
return MERGE_TABLE_ENGINE.format(table=table)
def kafka_engine(
topic: str,
kafka_host=KAFKA_HOSTS,
group="group1",
serialization="JSONEachRow",
proto_schema=None,
skip_broken_messages=100,
):
if serialization == "JSONEachRow":
return KAFKA_ENGINE.format(topic=topic, kafka_host=kafka_host, group=group, serialization=serialization)
elif serialization == "Protobuf":
return KAFKA_PROTO_ENGINE.format(
topic=topic,
kafka_host=kafka_host,
group=group,
proto_schema=proto_schema,
skip_broken_messages=skip_broken_messages,
)
Plugin log entries (#3482) * Add Postgres model PluginLogEntry * Add equivalent PluginLogEntry to Kafka+ClickHouse * Add migration * Add PluginLogEntry.Type.LOG & make PluginLogEntry.message a TextField * Update 0130_pluginlogentry.py * Add PluginLogEntry.instance_id * Update migration * Update migration * Add plugin log entries API * Test plugin log entries DB fetching * Add PluginLogs component prototype * Fix API * Improve PluginLogs component * Remove almost unused plugin Feedback button * Update migration * Fixed typing * Fix org permission error test asserts * Fix plugin log entry tests * Fix CH plugin log entry timestamp string * Update CH test_plugin_log_entry.py * Fix plugin log entry tests across PG/CH * Satisfy mypy * Add search and limit to plugin log entry API * Send team_id in plugin config API * Rework plugin logs UI * Add plugin config team ID in tests * Add plugin config team ID in tests actually * Fix code quality * Make logs plugin config-based * Fix CH queries * Fix typing * Improve UX and fix things * Polish plugin logs logic * Update migration * Add Celery task to delete old plugin logs * Fix UX bug with loading more plugin logs * Fix missing import * Remove OrganizationMemberPermissions message change * Make mypy happy * Add PluginLogEntry.is_system * Optimize CH plugin_log_entires PARTITION/ORDER * Increment migration * Adjust plugin logs drawer display * Fix plugin_log_factory_ch * Fix plugin_log_factory_ch fix * Replace PluginLogEntry.is_system with source * Adjust PluginLogEntrySerializer * Update CH fetch_plugin_log_entries * Make kea-typegen happy
2021-05-06 09:54:32 +02:00
def ttl_period(field: str = "created_at", weeks: int = 3):
return "" if TEST else f"TTL toDate({field}) + INTERVAL {weeks} WEEK"