mirror of https://github.com/PostHog/posthog.git synced 2024-12-01 12:21:02 +01:00
52 Commits

Author SHA1 Message Date
Karl-Aksel Puulmann
8d8705d1bb
Materialize person columns automatically (#5760)
* Hotfix: Use materialized columns on cloud

This was broken since default_kind was different on Distributed tables
on cloud

* Improve __init__.py

* Make reverting DEFAULT column async, only ON CLUSTER for events table
2021-08-28 12:14:30 +03:00
Michael Matloka
c2bc2fecd0
Use proper interval calculation in the funnel trends query (#5467)
* Use proper interval calculation in the funnel trends query

* Add some comments

* Update `test_filter`

* Rework `NULL_SQL` to use CH `INTERVAL` too

* Fix week-based relative `date_from` support not existing

* Make use of `toInterval*` functions and inject less

* Add fallback for `date_from` in `ClickhouseSessionsAvg`
2021-08-06 11:29:35 +02:00
Karl-Aksel Puulmann
fc5c6476a0
Revert "Revert "Add is_deleted column to person_distinct_id"" (#5194)
* Revert "Revert "Add is_deleted column to person_distinct_id (#5151)" (#5193)"

This reverts commit 401268bdba.

* A tweak for docker-compose builds

Co-authored-by: James Greenhill <fuziontech@gmail.com>
2021-07-19 19:47:41 -07:00
Karl-Aksel Puulmann
401268bdba
Revert "Add is_deleted column to person_distinct_id (#5151)" (#5193)
This reverts commit b1c11ba7dc.
2021-07-19 12:57:42 +03:00
Karl-Aksel Puulmann
b1c11ba7dc
Add is_deleted column to person_distinct_id (#5151)
* Update PERSONS_ACTIVE_USER_SQL query

* Remove dead import

* Update lifecycle queries

* Update BREAKDOWN_ACTIVE_USER_INNER_SQL to use new persons query

* Update STICKINESS_SQL

* Update STICKINESS_PEOPLE_SQL

* Update STICKINESS_ACTIONS_SQL

* Update paths query

* Update events query

* Update CALCULATE_COHORT_PEOPLE_SQL

* Update retention queries

* Update TOP_PERSON_PROPS_ARRAY_OF_KEY_SQL

* Update EVENT_JOIN_PERSON_SQL

* Update GET_PERSON_ID_BY_ENTITY_COUNT_SQL

* Remove remaining references to old get latest person query

* Update GET_DISTINCT_IDS_BY_PROPERTY_SQL

* Fix code style issue

* Update table engine for person_distinct_id table

* don't select team_id

* Make person deletion work

* Use replacingmergetree over collapsing with is_deleted

Replacing an existing engine is hard, let's not do it

* Update query in test

* add migration

* set database on materialized views

* Update plugin server to 1.1.6

Co-authored-by: James Greenhill <fuziontech@gmail.com>
Co-authored-by: posthog-bot <posthog-bot@users.noreply.github.com>
2021-07-19 12:43:44 +03:00
James Greenhill
751a35cd35
Make DDLs more friendly towards running on a cluster and cleanups (#5091)
* Make DDLs more friendly towards running on a cluster

* Use primary CLICKHOUSE host for migrations and DDL

* loose ends on person kafka create

* posthog -> cluster typo

* add cluster to KAFKA create for plugin logs

* Feed the type monster

* clusterfy local clickhouse

* test docker-compose backed github action

* run just clickhouse and postgres from docker-compose

* move option to between up and <services>

* posthog all the things

* suggest tests run on  cluster

* posthog cluster for ci

* use deploy path for docker-compose

* fix for a clickhouse bug 🐛

* complete CH bug fixes

* 5439 the github actions pg configs

* remove CLICKHOUSE_DATABASE (handled automatically)

* update DATABASE_URL for code quality checks

* Missed a few DDLs on Person

* 5439 -> 5432 to please the people

* cleanup persons and use f strings <3 f strings

* remove auto parens

* Update requirements to use our fork of infi.clickhouse_orm

* fix person.py formatting

* Include boilerplate macros for a cluster
2021-07-15 17:20:37 -07:00
Michael Matloka
d655ca7c8d
Fix some insights query building 500s (#4971)
* Fix handling of cohort without match groups

* Fix bug with CH precalculated cohort query building

* Fix `get_person_ids_by_cohort_id`
2021-07-05 11:49:55 +02:00
Michael Matloka
4b0c163eb8
New funnel trends query (#4875)
* wip: pagination for persons on clickhouse funnels

* wip: added offset support for getting a list of persons; added support for conversion window;

* fixed mypy exception

* helper function to insert data for local testing

* moved generate code into separate class for more functionality later

* corrected person_distinct_id to use the person id from postgres

* minor corrections to generate local class along with addition of data cleanup via destroy() method

* reduce the number of persons who make it to each step

* moved funnel queries to a new folder for better organization; separated funnel_persons and funnel_trends_persons into individual classes;

* funnel persons and tests

* initial implementation

* invoke the funnel or funnel trends class respectively

* add a test

* add breakdown handling and first test

* add test stubs

* remove repeats

* mypy corrections and PR feedback

* run funnel test suite on new query implementation

* remove imports

* corrected tests

* minor test updates

* correct func name

* fix types

* func name change

* Make `SHELL_PLUS_PRINT_SQL` clearer

* Add ClickhouseFunnelTrendsNew

* Create test_funnel_trends_new.py

* Create test_funnel_trends_v2.py

* move builder functions to funnel base

* add test classe for new funnel

* Inherit from `ClickhouseFunnelNew` and fix intervals

* Add proper formatting of trends results

* Clean tests up a little bit

* Group `FunnelWindowDaysMixin` tests in `test_funnel_persons`

* Rename `ClickhouseFunnelTrendsNew` things for clarity

* Port some original `ClickhouseFunnel` trends tests for the new query

* Only fetch initial page (100) of persons in trends query

* Describe assumptions and rename things

* Finish porting old ClickhouseFunnelTrends tests and add some new ones

* Remove unused imports

* Try to fix `test_period_not_final`

* Try to fix `test_period_not_final` again

* remove persons lists

* rename

* fix test

* add timezone to results

* add funnel trends new to api path

* revert random change

Co-authored-by: Buddy Williams <buddy@posthog.com>
Co-authored-by: eric <eeoneric@gmail.com>
2021-06-28 18:48:35 -04:00
Buddy Williams
7d045c86f3
Refactoring funnel trends (#4419)
* checkpoint: refactoring funnel trends so that they work correctly

* wip: refactoring funnel trends query to return the results we actually need

* wip: added in new query for testing

* wip: moved sql into a separate file, converted list to dictionary, and added several tests to check data quality

* wip: with a better understaning of funnel trends I've refactored the query so that I can write a data transformer in python

* moved code into funnel_trends for both logic and tests to isolate the concern

* reordered methods for readability

* wip: refactoring funnel trends to support filters

* wip: added updated SQL which will replace the existing FUNNEL_TREND_SQL

* correct column name so that it's clearer

* added substitution variables to new query

* fixed missing substitution variable

* wip: integrating new query with correct params, added mixins for funnel_window, and working toward working test

* several query corrections

* summarize funnel trends

* moved method down

* removed unused code

* added data quality checks

* corrected cohort size for tests

* test window size and incomplete status

* corrected a few names

* removed unnecessary comment

* commented out old funnel trends tests

* removed print statement

* removed old funnel trend code

* made funnel trends response match existing data structure layout

* removed unused imports

* removed more unused imports

* fixed mypy errors

* Added ClickhouseFunnelBase to extract common methods for both ClickhouseFunnelTrends and ClickhouseFunnel; this also fixes issues with tests;

* removed unused type comment

* corrected test to account for new funnel_window_days mixin

* fixed clickhouse funnel tests

* fixes for automated tests

* changed team_id to use client substitution to avoid sql injection attempts in the future but since it's not user supplied it's not currently an issue

* corrections prompted by PR review

* corrected test to dict test with funnel_window_days
2021-06-03 17:06:08 -04:00
Eric Duong
7471ade922
Fixing breakdown bug with retrieving top elements (#4313)
* fixing bug

* conslidate breakdown query

* remove clause

* add missing param

* fix name and bracket

* fix tests add none clause

* fix more tests

* more test adjustments

* add person prop filter test

* only use None when offset is 0 (first query)

* add null check and fix api helper
2021-05-28 10:24:50 -04:00
Eric Duong
be301e7112
Create collapsing table for cohortpeople (#4477)
* just create the table

* add replication logic

* add param

* change replacing name
2021-05-27 17:44:43 +03:00
Eric Duong
954069b00d
Swap out person_distinct_id in queries with subquery (#3828)
* swap out distinct_id table in queries with a subquery that will only consider latest distinct_ids

* wrong import

* fix missin params

* more missing params
2021-04-01 12:56:42 -04:00
James Greenhill
a0a637bb49
Don't include materialized columns in kafka table (#3026)
2021-01-20 20:39:20 -08:00
Tim Glaser
02044c616f
Denormalize clickhouse props (#2903)
* Denormalize clickhouse props

* Add allow_denormalized_props option

* Use funnels

* fix funnel query

* Add more denormalized props

* Fix comma

* Use materialized columns instead of mat views

* duplicate ,'s

Co-authored-by: James Greenhill <fuziontech@gmail.com>
2021-01-20 12:38:27 +01:00
Eric Duong
d420c931a9
restore path custom events (#2923)
* add pg logic

* add in custom event values
2021-01-13 15:05:06 +01:00
Eric Duong
6241a2b8a3
Quick fix to limit time range on event prop query (#2844)
* add range limit on props

* add exception handler

* fix time limit
2021-01-04 14:02:38 -05:00
James Greenhill
2935a65dc5
MV -> View for events_with_array_props_view and remove EVENT_PROP_TABLE_SQL (#2766)
* MV -> View for events_with_array_props_view

* sort

* convert another mv into view because mv's are not triggered by views

* pedantic - no mat in naming

* remove unused tables
2020-12-15 10:59:28 -08:00
Eric Duong
8fbbe679f5
Stickiness improvement and filter refactor (#2638)
* stickiness filter refactor

* stickiness clickhouse

* parametrize clickhouse trunc

* add interval tests

* fix type and casing

* change name to interval not period

* change defaults

* remove offsets

* move stickinses people endpoint

* move imports

* remove unused imports

* fix time defaults

* swap endpoint

* add interval tests

* move api test

* fix all time calculation and add team_id filter to earliest timestamp ch
2020-12-04 20:42:01 +01:00
Tim Glaser
9823e38163
Move away from ewap to speed up queries (#2540)
* Move away from ewap to speed up queries

* Lower limit to 10
2020-11-30 11:22:31 +01:00
Tim Glaser
ac24aa903e
Revert "Event usage split clickhouse queries (#2388)" (#2469)
This reverts commit 7e343c2a89.
2020-11-23 13:31:02 +01:00
Tim Glaser
7e343c2a89
Event usage split clickhouse queries (#2388)
* Event usage split clickhouse queries

* Ignore

* Remove unnecessary code

* Use Karl's refactor
2020-11-23 12:40:09 +01:00
Tim Glaser
7bd0ea7039
Fix fetch single event clickhouse (#2436)
2020-11-19 11:27:29 +01:00
Tim Glaser
4e610a0136
Fix aggregates clickhouse (#2432)
* Fix aggregates clickhouse

* remove join
2020-11-18 16:30:09 +01:00
Tim Glaser
25a053cda4
Manage events view (#2319)
* Finish the local dev w/ proto setup

* WIP manage events view

* Add task, add interface etc

* Move everything to 'manage events' view

* Move all settings into single dropdown (can be reverted)

* Urls for tabs

* Fix migration

* Clickhouse and humanize volume

* Fix cypress test

* Fix sidebar cypress

* Fix cypress again

* Fix some small issues

* Address comments

* Corect naming

* Fix test'

Co-authored-by: James Greenhill <fuziontech@gmail.com>
2020-11-13 14:59:08 +01:00
James Greenhill
ccd70fa3f9
Finish the local dev w/ proto setup (#2254)
2020-11-06 17:15:19 -08:00
Tim Glaser
efc62e1999
Fix feature flags clickhouse (#2170)
* remove person properties up to date

* remove person props mv

* move latest person

* prune rest of person materialized

* missing parenth

* add type

* remove migration

* Fix feature flags clickhouse

* Fix feature flags clickhouse

* Fix types

* Fix stuff

* Silly me

Co-authored-by: Eric <eeoneric@gmail.com>
2020-11-05 16:01:32 +01:00
Eric Duong
06cf6562e3
Filter person_distinct_id table further before joining (#2028)
* add aliases

* select specific
2020-10-26 09:18:11 -04:00
James Greenhill
b74d06a96a
Create a write ahead log for cloud event processing (#1962)
* Create a write ahead log for cloud event processing

* mypy fix

* if we are on app (ee) don't log to postgres

* don't disable writing to postgres
2020-10-21 20:35:07 +02:00
Tim Glaser
00153764ef
Simplify clickhouse query (#1948)
2020-10-21 09:48:08 +02:00
Tim Glaser
484f33b951
Use todate to speed up ordering (#1908)
* Clickhouse use elements chain

* Fix stuff

* Add action tests and start regex

* Progress

* Progress part deux

* Fix everything

* Add tag name filtering

* Fix funnels

* Fix tag name regex

* Fix ordering

* Fix type issues

* Fix empty nth-child

* Remove commented code

* Split with semicolon and escaped quotes

* Specify all select columns

* Use toDate to speed up ordering

* Add another check

* Try debugging test

* Bla

* aaaaa

* cba

* Only return 100 results

* not even sure

* order by desc both

* Add proper test

Co-authored-by: Eric <eeoneric@gmail.com>
2020-10-20 10:37:42 +02:00
Eric Duong
8e5347b4e1
Implement property filtering operators (#1886)
* change parsing to include operators'

* make properties test into factory

* add clickhouse test implementation and fix another test

* add custom test to clickhouse filter tests

* all tests besides json filtering

* add json test

* fix tests

* fix type errors
2020-10-19 06:01:01 -04:00
Tim Glaser
80d20e385b
Clickhouse use elements chain (#1849)
* Clickhouse use elements chain

* Fix stuff

* Add action tests and start regex

* Progress

* Progress part deux

* Fix everything

* Add tag name filtering

* Fix funnels

* Fix tag name regex

* Fix ordering

* Fix type issues

* Fix empty nth-child

* Remove commented code

* Split with semicolon and escaped quotes

* Specify all select columns
2020-10-16 14:07:03 +02:00
Tim Glaser
2cad2b16ec
[Clickhouse] fix event filtering (#1804)
2020-10-02 20:31:53 +02:00
Tim Glaser
88c54896d0
Fix person querying (#1797)
* convert sessions table logic to TS

* convert rest of sessions to TS

* sessions table logic refactor, store date in the url

* add back/forward buttons

* load sessions based on the URL, not after mount --> avoids duplicate query if opening an url with a filter

* prevent multiple queries

* throw error if failed instead of returning an empty list

* date from filters

* rename offset to nextOffset

* initial limit/offset block

* indent sql

* support limit + offset

* load LIMIT+1 sessions in postgres, pop last and show load more sign. (was: show sign if exactly LIMIT fetched)

* based offset is always 0

* default limit to 50

* events in clickhouse sessions

* add elements to query results

* add person properties to sessions query response

* show seconds with two digits

* fix pagination, timestamp calculation and ordering on pages 2 and beyond

* mypy

* fix test

* add default time to fix test, fix some any(*) filter issues

* remove reverse

* WIP event list

* Events progress

* Finish off event listing, skip live actions for now

* Fix mypy

* Fix mypy again

* Try fixing mypy

* Fix assertnumqueries

* Fix tests

* Fix tests

* fix test

* Fix tests

* Fix tests

* Fix tests again

* Fix person querying

* Fix flake

* Fix person stuff

* Fix test

Co-authored-by: Marius Andra <marius.andra@gmail.com>
Co-authored-by: Eric <eeoneric@gmail.com>
2020-10-02 19:30:05 +02:00
Tim Glaser
64a5ad7c9a
Fix ambiguous timestamp ordering (#1792)
2020-10-01 16:48:25 +02:00
Tim Glaser
92e8bbd283
[Clickhouse] Event list (#1787)
* convert sessions table logic to TS

* convert rest of sessions to TS

* sessions table logic refactor, store date in the url

* add back/forward buttons

* load sessions based on the URL, not after mount --> avoids duplicate query if opening an url with a filter

* prevent multiple queries

* throw error if failed instead of returning an empty list

* date from filters

* rename offset to nextOffset

* initial limit/offset block

* indent sql

* support limit + offset

* load LIMIT+1 sessions in postgres, pop last and show load more sign. (was: show sign if exactly LIMIT fetched)

* based offset is always 0

* default limit to 50

* events in clickhouse sessions

* add elements to query results

* add person properties to sessions query response

* show seconds with two digits

* fix pagination, timestamp calculation and ordering on pages 2 and beyond

* mypy

* fix test

* add default time to fix test, fix some any(*) filter issues

* remove reverse

* WIP event list

* Events progress

* Finish off event listing, skip live actions for now

* Fix mypy

* Fix mypy again

* Try fixing mypy

* Fix assertnumqueries

* Fix tests

* Fix tests

* fix test

* Fix tests

* Fix tests

* Fix tests again

Co-authored-by: Marius Andra <marius.andra@gmail.com>
Co-authored-by: Eric <eeoneric@gmail.com>
2020-10-01 15:47:35 +02:00
James Greenhill
379518e285
"Clickhouse Features V2 (#1565)" (#1750)
* initial

* migration command

* migrations working

* add modelless views for clickhouse

* initial testing structure

* use test factory

* scaffold for all tests

* add insight and person api

* add basic readme

* add client

* change how migrations are run

* add base tables

* ingesting events

* restore delay

* remove print

* updated testing flow

* changed sessions tests

* update tests

* reorganized sql

* parametrize strings

* element list query

* change to seralizer

* add values endpoint

* retrieve with filter

* pruned code to prepare for staged merge

* working ingestion again

* tests for ee

* undo unneeded tests right now

* fix linting

* more typing errors

* fix tests

* add clickhouse image to workflow

* move to right job

* remove django_clickhouse

* return database url

* run super

* remove keepdb

* reordered calls

* fix type

* fractional seconds

* fix type error

* add checks

* remove retention sql

* fix tests

* add property storage and tests

* merge master

* fix tests

* fix tests

* .

* remove keepdb

* format python files

* update CI env vars

* Override defaults and insecure tests

* Update how ClickHouse database gets evaluated

* remove bootstrapping clickhouse database routine

* Don't initialize the clickhouse connection unless we say it's primary

* .

* fixed id generation

* remove dump

* black settings

* empty client

* add param

* move docker-compose for ch to ee dir

* Add _public_ key to repo for verifying self signed cert on server

* update ee compose file for ee dir

* fix a few issues with tls in migrations

* update migrations to be flexible about storage profile and engine

* black settings

* add elements prop tables

* add elements prop tables

* working filter

* refactored

* better url handling

* add mapping table

* add processing to worker task

* working cohort with actions

* add cohort property filtering

* add cohort property filtering

* reformat and add cohort processing

* prop clauses

* add util

* add more util

* add clickhouse modifier

* Clickhouse Sessions (#1623)

* sessions sql

* skeleton

* add endpoint

* better tests

* sessions list

* merge clickhouse-actions

* added session endpoint

* sessions sql working again

* add clickhouse modifier

* session avg with props working

* add dist

* tests working (no list)

* list working

* add formatting

* more formatting

* fix tests

* dummy commit

* fix types

* remove unnecessary improt

* ignore type when importing from ee in task

* fix test running

* Clickhouse Trends Base (#1609)

* initial working

* date param almost working

* fix date range and labels

* fixed monthly math

* handle compare

* change table

* using new event ingestion

* direct query actions working

* remove interface

* fix date range

* properties initial working

* handle operator

* handle operator

* move timestamp parse

* move more to util

* inital breaking down working

* working cohort breakdown

* some tests running

* fix sessions

* cohort tests

* action and interval test

* reorder cohort filtering

* rename retention test

* fix inits

* change multitenancy tests

* fix types

* fix optional types

* replace ch_client.execute with sync_execute

* replace ch_client.execute with sync_execute, part 2

* Clickhouse Stickiness + Process Event (#1654)

* generate clickhouse uuid script

* set CLICKHOUSE_SECURE=False by default if running in TEST or DEBUG

* convert person_id to UUID, make adding `person_id` optional, add distinct_ids already in the `create_person` function

* Fix test_process_event_ee.py, remove all calls to Person.objects.*

* add back util

* fix broken imports

* improve process_event test clickhouse queries

* Basic stickiness query

* Clickhouse Stickiness tests

* stickiness test [WIP, actions fail]

* generate clickhouse uuid script

* change default test runner if PRIMARY_DB=clickhouse

* fix stickiness test for actions

* fix merge bug

* remove _create_person stub; cohort person_id is UUID now

* fix typing

* Clickhouse trends process math (#1660)

* most of process math works

* all process math

* fix ordering issue

* unusued imports

* update property comparison for process_event_ee

* indentation wrong missing calls

* demo users and events (#1661)

* finish breakdown filtering tests and reformat label function

* add increment to demo_data

* update demo data populating

* Add people endpoint for ch (#1670)

* add people endpoint for ch

* stickiness people

* fix value padding

* add process math to breakdown and

* add limit

* fix tests

* condensed code

* converted test to factory

* add people tests

* add month handling

* add typing fix

* change people test handling

* fix tests

* Clickhouse funnels 2 (#1668)

* add elements to create_event

* WIP closes #1663 Add funnels to clickhouse

* Make funnels work

* Clean up

* Move filtering around

* Add mypy tests and fix

* Performance improvements

* fix person tests again

* add people for funnel endpoint

* fix prop numbering

Co-authored-by: Marius Andra <marius.andra@gmail.com>
Co-authored-by: Eric <eeoneric@gmail.com>

* merge master

* add retention

* update types

* more typing errors

* fix types

* bug with kafka payload, elements insert, and demo data

* Clickhouse Paths (#1657)

* paths clickhouse test (fails)

* add elements to create_event

* make this fail for clickhouse

* hardcoded query that returns good results for $pageviews, no filters yet

* clean up queries

* bound by time, fix 30min new session boundary

* support screen and custom events

* add properties filter

* paths url

* filter by path start

* better path start test

* even better path start test

* start from the first "path start" in a group

* test for person_id in paths

* partition by person_id for POSTGRES paths

* partition by person_id for Clickhouse paths

* clean up order in paths test

* clean up order in paths test

* join elements

* force element order on element group creation

* remove "order" when creating elements in tests and demo

* get list of elements for paths

* add limit to paths query

* use materialized view

* rename "element_hash" to "elements_hash" (no change in db)

* cull rows that are definitely unused

* simplify query

* New highly optimized paths clickhouse query

* start_point for $autocapture paths

* extract event property values from clickhouse

* prevent crash

* select one element sql

* get elements for event

* remove lodash

* remove host from $pageview path elements if same domain as incoming path

* show metadata based on loaded paths filter, not in flight filter

* fix order (all soures and targets in order, not all sources first, then all targets after) - makes for a better looking graph

* add test that makes the Postgres paths query fail

* fix postgres paths --> no fuzzy matching, breaks "starts with" for urls and gives too many incorrect start points

* create automatic /demo urls that match the real urls (no ending /)

* fix elements queries

* path element joins

* create persons via postgres in paths test

* change serializers back to id

* fix tests with uuid

* fix demo

* more bugs

* fix type

* change now to timezone aware

* [clickhouse] retention filters (#1725)

* implemented target entity and prop filtering

* add insight view override

* fix endpoint and filters

* include tests

* fix tests

* add period filtering

* .

* fix pg param name

* add filtering params to both queries in retention sql

* fix param again

* change to todatetime

* change tz to timezone

* add back timezone in model/event

* [clickhouse] feature flag endpoint requests (#1731)

* add feature flags to endpoints

* add flags to endpoints that check on request

* remove magic strings and fill in missing flags

* fix types

* add missing flag

* change from iso

* fix more timestamps and comparator

* change _people to get_people in actions view

* remove action and cohort populating

* change inheritance

* "Clickhouse Features V2 (#1565)"

This reverts commit 0b371d43ec.

* fix types

* change to super

* change to super x2

Co-authored-by: Eric <eeoneric@gmail.com>
Co-authored-by: Marius Andra <marius.andra@gmail.com>
Co-authored-by: Tim Glaser <tim.glaser@hiberly.com>
2020-09-29 15:17:26 +01:00
James Greenhill
0b371d43ec
Revert "Clickhouse Features (#1565)" (#1748)
This reverts commit 24713b923d.
2020-09-29 12:11:28 +01:00
Eric Duong
24713b923d
Clickhouse Features (#1565)
* initial

* migration command

* migrations working

* add modelless views for clickhouse

* initial testing structure

* use test factory

* scaffold for all tests

* add insight and person api

* add basic readme

* add client

* change how migrations are run

* add base tables

* ingesting events

* restore delay

* remove print

* updated testing flow

* changed sessions tests

* update tests

* reorganized sql

* parametrize strings

* element list query

* change to seralizer

* add values endpoint

* retrieve with filter

* pruned code to prepare for staged merge

* working ingestion again

* tests for ee

* undo unneeded tests right now

* fix linting

* more typing errors

* fix tests

* add clickhouse image to workflow

* move to right job

* remove django_clickhouse

* return database url

* run super

* remove keepdb

* reordered calls

* fix type

* fractional seconds

* fix type error

* add checks

* remove retention sql

* fix tests

* add property storage and tests

* merge master

* fix tests

* fix tests

* .

* remove keepdb

* format python files

* update CI env vars

* Override defaults and insecure tests

* Update how ClickHouse database gets evaluated

* remove bootstrapping clickhouse database routine

* Don't initialize the clickhouse connection unless we say it's primary

* .

* fixed id generation

* remove dump

* black settings

* empty client

* add param

* move docker-compose for ch to ee dir

* Add _public_ key to repo for verifying self signed cert on server

* update ee compose file for ee dir

* fix a few issues with tls in migrations

* update migrations to be flexible about storage profile and engine

* black settings

* add elements prop tables

* add elements prop tables

* working filter

* refactored

* better url handling

* add mapping table

* add processing to worker task

* working cohort with actions

* add cohort property filtering

* add cohort property filtering

* reformat and add cohort processing

* prop clauses

* add util

* add more util

* add clickhouse modifier

* Clickhouse Sessions (#1623)

* sessions sql

* skeleton

* add endpoint

* better tests

* sessions list

* merge clickhouse-actions

* added session endpoint

* sessions sql working again

* add clickhouse modifier

* session avg with props working

* add dist

* tests working (no list)

* list working

* add formatting

* more formatting

* fix tests

* dummy commit

* fix types

* remove unnecessary improt

* ignore type when importing from ee in task

* fix test running

* Clickhouse Trends Base (#1609)

* initial working

* date param almost working

* fix date range and labels

* fixed monthly math

* handle compare

* change table

* using new event ingestion

* direct query actions working

* remove interface

* fix date range

* properties initial working

* handle operator

* handle operator

* move timestamp parse

* move more to util

* inital breaking down working

* working cohort breakdown

* some tests running

* fix sessions

* cohort tests

* action and interval test

* reorder cohort filtering

* rename retention test

* fix inits

* change multitenancy tests

* fix types

* fix optional types

* replace ch_client.execute with sync_execute

* replace ch_client.execute with sync_execute, part 2

* Clickhouse Stickiness + Process Event (#1654)

* generate clickhouse uuid script

* set CLICKHOUSE_SECURE=False by default if running in TEST or DEBUG

* convert person_id to UUID, make adding `person_id` optional, add distinct_ids already in the `create_person` function

* Fix test_process_event_ee.py, remove all calls to Person.objects.*

* add back util

* fix broken imports

* improve process_event test clickhouse queries

* Basic stickiness query

* Clickhouse Stickiness tests

* stickiness test [WIP, actions fail]

* generate clickhouse uuid script

* change default test runner if PRIMARY_DB=clickhouse

* fix stickiness test for actions

* fix merge bug

* remove _create_person stub; cohort person_id is UUID now

* fix typing

* Clickhouse trends process math (#1660)

* most of process math works

* all process math

* fix ordering issue

* unusued imports

* update property comparison for process_event_ee

* indentation wrong missing calls

* demo users and events (#1661)

* finish breakdown filtering tests and reformat label function

* add increment to demo_data

* update demo data populating

* Add people endpoint for ch (#1670)

* add people endpoint for ch

* stickiness people

* fix value padding

* add process math to breakdown and

* add limit

* fix tests

* condensed code

* converted test to factory

* add people tests

* add month handling

* add typing fix

* change people test handling

* fix tests

* Clickhouse funnels 2 (#1668)

* add elements to create_event

* WIP closes #1663 Add funnels to clickhouse

* Make funnels work

* Clean up

* Move filtering around

* Add mypy tests and fix

* Performance improvements

* fix person tests again

* add people for funnel endpoint

* fix prop numbering

Co-authored-by: Marius Andra <marius.andra@gmail.com>
Co-authored-by: Eric <eeoneric@gmail.com>

* merge master

* add retention

* update types

* more typing errors

* fix types

* bug with kafka payload, elements insert, and demo data

* Clickhouse Paths (#1657)

* paths clickhouse test (fails)

* add elements to create_event

* make this fail for clickhouse

* hardcoded query that returns good results for $pageviews, no filters yet

* clean up queries

* bound by time, fix 30min new session boundary

* support screen and custom events

* add properties filter

* paths url

* filter by path start

* better path start test

* even better path start test

* start from the first "path start" in a group

* test for person_id in paths

* partition by person_id for POSTGRES paths

* partition by person_id for Clickhouse paths

* clean up order in paths test

* clean up order in paths test

* join elements

* force element order on element group creation

* remove "order" when creating elements in tests and demo

* get list of elements for paths

* add limit to paths query

* use materialized view

* rename "element_hash" to "elements_hash" (no change in db)

* cull rows that are definitely unused

* simplify query

* New highly optimized paths clickhouse query

* start_point for $autocapture paths

* extract event property values from clickhouse

* prevent crash

* select one element sql

* get elements for event

* remove lodash

* remove host from $pageview path elements if same domain as incoming path

* show metadata based on loaded paths filter, not in flight filter

* fix order (all sources and targets in order, not all sources first, then all targets after) - makes for a better looking graph

* add test that makes the Postgres paths query fail

* fix postgres paths --> no fuzzy matching, breaks "starts with" for urls and gives too many incorrect start points

* create automatic /demo urls that match the real urls (no ending /)

* fix elements queries

* path element joins

* create persons via postgres in paths test

* change serializers back to id

* fix tests with uuid

* fix demo

* more bugs

* fix type

* change now to timezone aware

* [clickhouse] retention filters (#1725)

* implemented target entity and prop filtering

* add insight view override

* fix endpoint and filters

* include tests

* fix tests

* add period filtering

* .

* fix pg param name

* add filtering params to both queries in retention sql

* fix param again

* change to todatetime

* change tz to timezone

* add back timezone in model/event

* [clickhouse] feature flag endpoint requests (#1731)

* add feature flags to endpoints

* add flags to endpoints that check on request

* remove magic strings and fill in missing flags

* fix types

* add missing flag

* change from iso

* fix more timestamps and comparator

* change _people to get_people in actions view

* remove action and cohort populating

Co-authored-by: James Greenhill <jams@uber.com>
Co-authored-by: Marius Andra <marius.andra@gmail.com>
Co-authored-by: Tim Glaser <tim.glaser@hiberly.com>
2020-09-29 06:36:50 -04:00
James Greenhill
2bb97b8efa
Do not shadow Kafka default columns _timestamp and _offset (#1718)
* Do not shadow Kafka default columns _timestamp and _offset

* drop the columns only on the kafka_ tables

* include data types

* fields cleanups

* datetime -> datetime64

* double import datetime
2020-09-25 16:23:48 +01:00
James Greenhill
1ed6263a71
Create Omni-Person model for managing people in Clickhouse (#1712)
* Create Omni-Person model for managing people in Clickhouse

* type fixes

* rebase all the things

* cleanups

* id -> uuid for events in clickhouse

* cleanups and type checks

* Further cleanups and uuid conversions

* kafka fix

* break out serializer across kafka clients

* fix a few bugs w/ datetime types

* basic fix for people kafka table

* fix migration errors (copy pasta errors)

* Use KafkaProducer for Omni Person emitting

* setup mock kafka producer

* undo some work for inserting

* Test TestKafkaProducer

* change if order, obvious mistake

* remove unnecessary function arg

* Fix getters for new column

* Test fixes

* mirror columns across element queries

* firm up handling of timestamps

* only return timestamps for handle_timestamp

* Correct heroku config for Kafka
2020-09-25 11:05:50 +01:00
Marius Andra
dd7e38c5b5
Clickhouse Elements Dedup (based on master) (#1698)
* Use ReplacingMergeTree for elements, remove element_groups and use elements_hash as a virtual "pk"

* remove unused ELEMENT_GROUP_TABLE_SQL

* merge fixes

* use redis cache to avoid writing duplicate elements to clickhouse

* move fakeredis to requirements.txt

* add team_id to cache key

* remove elements_group kafka table references

* add elements_hash to clickhouse element serializer

* fix cache key

* rename few keys

* add test runner to ease pycharm dev

* fix some mypy errors

* remove typo

Co-authored-by: Eric <eeoneric@gmail.com>
2020-09-24 06:47:28 -04:00
Marius Andra
1eeed28751
Fix Master EE code (#1701)
* add test runner to ease pycharm dev

* fix broken import

* drop and recreate the clickhouse test db before running tests

* fix person uuid str json serialization issue

* make kafka optional in tests

* fix inits

* remove need for kafka in person.py

* fix a bunch of mypy errors

* fix function and add process_event to pipeline

* fixed missing params and tests

* change uuid and fix types

* types

* optimize for merge prop test

* make ClickhouseProducer to produce to clickhouse one way or another

* annotate types

Co-authored-by: Eric <eeoneric@gmail.com>
Co-authored-by: James Greenhill <fuziontech@gmail.com>
2020-09-24 06:14:17 -04:00
James Greenhill
97a665d277
Leverage Postgres for Persons and reorder order by's on clickhouse (#1681)
* Query optimizations

* more sql optimizations

* checkpoint

* fix migration

* add UUID field to person

* use django signals to signal that clickhouse needs to be updated

* cleanup person logic

* cleanups

* update migration

* Don't setup the django signals unless we are for sure using ee setup

* expecting back to 30 queries for capturing with new person

* add .venv to .gitignore

* add env back to .gitignore

Co-authored-by: Ubuntu <ubuntu@ip-172-31-73-18.ec2.internal>
2020-09-22 13:41:08 +01:00
James Greenhill
2fd4fc90d4
Materialize Views to wrap data coming in from Kafka for Events, Elements, People (#1678)
2020-09-17 11:22:00 -07:00
James Greenhill
adeb7694cc
Publish events to Kafka for consumption (#1644)
* Publish events to Kafka for consumption

* Commit avro idl's for event schemas

* convert client to use github.com/dpkp/kafka-python

* events loaded into clickhouse from Kafka

* remove cruft

* Publish events to Kafka for consumption

* convert client to use github.com/dpkp/kafka-python

* remove cruft

* include kafka migrations

* bugfixes for migrations

* use constants for consistency

* wrap up local migrations

* small fixes

* tune ups
2020-09-15 20:04:38 -07:00
Marius Andra
50c683e691
Clickhouse process event (#1652)
* generate clickhouse uuid script

* set CLICKHOUSE_SECURE=False by default if running in TEST or DEBUG

* convert person_id to UUID, make adding `person_id` optional, add distinct_ids already in the `create_person` function

* Fix test_process_event_ee.py, remove all calls to Person.objects.*

* add back util

* fix broken imports

* improve process_event test clickhouse queries

* change property parsing

* indentation wrong missing calls

* uuid4 instead of call to CH

Co-authored-by: Eric <eeoneric@gmail.com>
Co-authored-by: James Greenhill <fuziontech@gmail.com>
2020-09-15 12:40:35 -07:00
James Greenhill
dedf5582c1
first chunk of clickhouse framework (#1613)
* first chunk of clickhouse framework

* prod, not dev docker-compose.yml

* add clickhouse sql files
2020-09-08 16:12:27 -07:00
James Greenhill
c2e03a3a46
Revert this to bring back the working copy of ee / clickhouse (#1588)
2020-09-04 21:12:07 -07:00
James Greenhill
5506135c3c
Bring back clickhouse changes along with queue size metrics (#1579)
* Revert "Revert "Clickhouse setup (#1463)""

This reverts commit 7f2cab4b93.

* add queue backlog to _stats endpoint

* celery queue length into heartbeat -> statsd

* reformat

* type check fixes

* bump expected number of queries to 8

Co-authored-by: Eric <eeoneric@gmail.com>
2020-09-04 13:44:53 -07:00