* cherry-pick client from metahog
* cherry-pick celery and requirements from metahog
* Change key to be based on hash of query and add test
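A minimal sketch of what a hash-based cache key could look like. The helper name `make_cache_key`, the `query_cache:` prefix, and the choice of SHA-256 over the query text plus JSON-serialized parameters are assumptions for illustration, not necessarily what this change implements.

```python
import hashlib
import json
from typing import Any, Dict, Optional


def make_cache_key(query: str, args: Optional[Dict[str, Any]] = None) -> str:
    """Derive a deterministic cache key from the query text and its parameters."""
    # sort_keys makes the key independent of dict insertion order;
    # default=str is a fallback for values json cannot encode (e.g. datetimes).
    payload = json.dumps({"query": query, "args": args or {}}, sort_keys=True, default=str)
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return f"query_cache:{digest}"  # hypothetical key prefix
```

Because the key depends only on the query and its parameters, two identical queries map to the same entry, which is why the key is no longer a uuid.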
* test that caching works
* black formatting
* Remove last references to uuid since it's not a uuid anymore
* don't statically set CLICKHOUSE_DATABASE
* mypy fixes (oof)
* add more tests!
* last test
* black format again
* only a bit of feedback incorporated - more to come
* add query_id override, force, and tests
* black format
* Flake8 and test docs
* black format tests
* mypy fixes
* from_json typing pains
* Feedbacked
* mypy feedback
* pin redis to 6.x
* fix(sharding): Improve the 0004 sharding migration
This change:
- Makes the rollbacks always work by:
  - Fixing some operations from before
  - Creating/deleting materialized views correctly
  - Ensuring ZooKeeper paths are unique
- Handles an edge case around moving tables by retrying
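The retry around moving tables might be sketched as below. The helper `retry`, its parameters, and the fixed delay are hypothetical; the migration's real retry logic and the exceptions it catches are not shown in this log.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry(
    operation: Callable[[], T],
    attempts: int = 3,
    delay_seconds: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
) -> T:
    """Call `operation`, retrying on the given exceptions up to `attempts` times."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            time.sleep(delay_seconds)
    raise AssertionError("unreachable")
```

In practice `retry_on` would be narrowed to the specific error raised when the table move collides with another operation, so unrelated failures are not retried.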
* Rename a parameter
* Verify migration status
* Update a test
* >=
* Update ee/clickhouse/errors.py
Co-authored-by: Tiina Turban <tiina303@gmail.com>
* Update code comment
* Update log message
* Type:ignore
Co-authored-by: Tiina Turban <tiina303@gmail.com>
* Make CLICKHOUSE_REPLICATION default to True
* Update some insert statements
* Create distributed tables during tests
* Delete from sharded_events
* Update test_migrations_not_required.py
* Improve 0002_events_sample_by is_required
1. SHOW CREATE TABLE output is truncated if the table has tens of materialized
columns, which understandably causes failures
2. We need to handle CLICKHOUSE_REPLICATED setups
* Update test_schema to handle CLICKHOUSE_REPLICATED, better test naming
* Fix issue with materialized columns
Note: Should make sure that these tests have coverage both ways
* Update test for recordings TTL
* Reorder table creation
* Correct schema for materialized columns on distributed tables
* Do correct setup in test_columns
* Lazily decide table to delete data from
* Make test_columns resilient to CLICKHOUSE_REPLICATION
* Make inserts resilient to CLICKHOUSE_REPLICATION
* Reset CLICKHOUSE_REPLICATION
* Create distributed tables conditionally
* Update snapshots, tests
* Fixup conftest
* Remove Event dependency on action api tests
* Remove a dead function
* Remove BaseQuery
* Remove dead imports
* Remove Event creation from posthog/test/test_person_model.py
* Remove Event.earliest_timestamp function
* Remove some unused event model methods
* Remove query_db_by_action + associated migration code
* Remove dead filtering methods from Events model
* Remove a dead test class
* Remove some event model usage
* Remove events model usage from actions test
* Remove session recording related views
* Remove model usage in posthog/queries/session_recordings/session_recording.py
* Remove old pg-session recording code
* Remove dead import
* Re-add missing dependency
* Make lint/tests pass
* Make filter tests uuid-based
* Remove event admin
* Move posthog/tasks/test/test_org_usage_report.py clickhouse version inline
* Remove postgres-specific code from org usage report
* Kill dead on_perform method
* Remove dead EventSerializer
* Remove a dead import
* Remove a dead command
* Clean up test, don't create a model
* Remove dead code
* Clean up test_element
* Clean up test event code
* Remove a dead function
* Clean up dead imports
* Remove dead code
* Code style cleanup
* Fix foss test
* Simplify fn
* Org usage fixup #3
* Add logging to all postgresql queries with query context
Uses the exact same pattern as we do currently for clickhouse, just
hooking in there differently
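One way to attach query context to a statement is to prefix it with a SQL comment, sketched below. The function name `annotate_sql` and the JSON comment format are assumptions, and the sketch handles plain `str` statements only. In Django, a function like this could be called from a wrapper installed with `connection.execute_wrapper`, though this log does not say which hook the change uses.

```python
import json
from typing import Any, Dict


def annotate_sql(sql: str, context: Dict[str, Any]) -> str:
    """Prefix a SQL statement with a comment carrying the query context."""
    if not context:
        return sql
    tag = json.dumps(context, sort_keys=True, default=str)
    # Break up comment delimiters so context values can neither close the
    # comment early nor open a nested one (PostgreSQL nests block comments).
    tag = tag.replace("*/", "* /").replace("/*", "/ *")
    return f"/* {tag} */ {sql}"
```

For example, `annotate_sql("SELECT 1", {"user_id": 7})` yields `/* {"user_id": 7} */ SELECT 1`, so the context shows up next to the statement in database logs.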
* Support psycopg2.sql.SQL
* Better docs
* update a test
* implement multi property breakdown as an array from the spike
* correct type hint on method
* really resolve the conflict
* don't break groups
* refactor test assertions for breakdown cases
* adds a test to prove that funnels can receive a string and not an array
* protect saved dashboards from multi property changeover
* WIP
* multi breakdown working with funnel step breakdown
* prove funnel step person breakdown works with multi property breakdown
* don't need to protect cached dashboards from multi property breakdowns when they can't be set from the UI
* capitalise keywords in SQL
* convert a single test to journey helper
* wip
* account for funnel step breakdown sometimes being an array sent as a string
* safer handling of funnel step breakdown
* convert a test
* revert commits that made things worse
* simpler handling of funnel step breakdown
* no need to change funnel step breakdown type hint
* update imports
* guard against integer properties
* compare funnel step breakdown differently now there are arrays involved
* look for strict intersection for funnel step breakdown
* update test snapshots
* need to set breakdown_values earlier in processing
* remove tests that cover speculative functionality
* update snapshot
* move setup of breakdown values back out of update_filters
* update snapshots
* remove a sql parameter that was never assigned to
* Update ee/clickhouse/models/test/test_property.py
Co-authored-by: Harry Waye <harry@posthog.com>
* Update ee/clickhouse/queries/funnels/base.py
Co-authored-by: Harry Waye <harry@posthog.com>
* address review comment to simplify reading json expressions for breakdown
* clarify why some uses of get_property_string_expr escape params before passing
* add keyword arguments for calls to getting property string expressions in funnels
* switch to keyword arguments in test helper method
* fix parameterised test
* add multi property materialized column tests
* introduce the shim to allow new API for breakdown properties
* can't remove the naive funnel step breakdown list detection
* move funnel step breakdown list handling
* better handling of numeric funnel step breakdown values
* update snapshots
Co-authored-by: Harry Waye <harry@posthog.com>
* dev(clickhouse): strip out comments before executing sql
This is so we can easily copy/paste from e.g. Metabase by querying the
system.query_log. Metabase doesn't display new lines (although you
can download to a file for this), which is not very practical.
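A minimal sketch of stripping comments from a SQL string, assuming `--` line comments and non-nested `/* ... */` block comments. The name `strip_sql_comments` is hypothetical and this is not necessarily how the change implements it. String literals are matched first so that comment markers inside them survive; quoted identifiers are not handled.

```python
import re

# String literals are matched first so comment markers inside them are kept.
_TOKEN = re.compile(
    r"""
    ('(?:[^'\\]|\\.|'')*')   # group 1: single-quoted string literal
    | (--[^\n]*)             # group 2: line comment, up to end of line
    | (/\*.*?\*/)            # group 3: block comment, non-nested
    """,
    re.VERBOSE | re.DOTALL,
)


def strip_sql_comments(sql: str) -> str:
    """Remove SQL comments while leaving string literals untouched."""

    def replace(match: "re.Match[str]") -> str:
        if match.group(1) is not None:
            return match.group(1)  # string literal: keep verbatim
        if match.group(2) is not None:
            return ""  # line comment: the newline after it is preserved
        return " "  # block comment: a space keeps neighbouring tokens apart

    return _TOKEN.sub(replace, sql).strip()
```

Replacing block comments with a space rather than nothing avoids gluing adjacent tokens together, e.g. in `SELECT/**/1`.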
* test(clickhouse): use `capture_select_queries` in comment strip test
* test(clickhouse): only sub. params if non-insert query
This parallels `clickhouse_driver` behaviour.
* chore(clickhouse): move sql preparation to dedicated function
* refactor: rearrange func and type definitions
* Show clickhouse disk and system.stats on /instance/status
Part of https://github.com/PostHog/vpc/issues/45
* Show stats on clickhouse table sizes, remove postgres table size stats
* Add metric for whether clickhouse is alive
* Move clickhouse stats above redis
* Compile requirements-dev.txt with latest pip-tools
* Install pytest
* Avoid picking up factories as tests
* New runner
* Always set TEST env variable running tests
Some of our tests rely on it.
* Remove repetition
* Fix a broken test
* Cut down noise from bin/tests
* Rename test factory
* Fix stickiness filter
* Skip a broken test
This has been broken since the numpy removal PR. Sadly, tests were not
running for this submodule
* Fix import on ee
* Run ee tests properly
The django_db_setup fixture will be automatically run when running ee/
module tests.
* Make tests run on CI
* Include REDIS_URL, fix cloud
* Set TEST env variable
* Hack cloud tests to work
* Attempt at workflow fix
* Import Person model when running ee tests
This module implicitly adds hooks, so this is needed when running tests
* Respect reuse-db for clickhouse
* Add custom markers to avoid warnings
* pytest: use ch test database always
Accidentally wiped by ch setup a few times without this. Oops
* Remove repetition in tests
* Pytest: Always run migrations
Testing a state cleanup fix
* Use same DB in conftest and main code
* Pytest: autoset TEST setting without env variable
* fix broken test
Co-authored-by: eric <eeoneric@gmail.com>
* Debug CH queries
* tests
* Logout when impersonated session
* Put "Debug ClickHouse queries" in its own command
* Clean up ClickHouse modal
Co-authored-by: Michael Matloka <dev@twixes.com>
* Finish the local dev w/ proto setup
* WIP manage events view
* Add task, add interface etc
* Move everything to 'manage events' view
* Move all settings into single dropdown (can be reverted)
* Urls for tabs
* Fix migration
* Clickhouse and humanize volume
* Fix cypress test
* Fix sidebar cypress
* Fix cypress again
* Fix some small issues
* Address comments
* Correct naming
* Fix test
Co-authored-by: James Greenhill <fuziontech@gmail.com>
* Basic caching for Clickhouse to redis
* Use redis for caching results
* add tests and fix bugs
* fix mypy
* add fakeredis as req
* add fakeredis to github action for testing
* add fakeredis to cloud tests too
* pickle -> json
* bytes
* json in tests
* tuplefy
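One reading of the pickle-to-JSON commits above is that JSON turns row tuples into lists, so cached results need converting back on read. The sketch below illustrates that reading; the function names and the bytes encoding are assumptions, not the actual implementation.

```python
import json
from typing import Any, List, Sequence, Tuple


def tuplefy(rows: Sequence[Sequence[Any]]) -> List[Tuple[Any, ...]]:
    """Convert each row back into a tuple after a JSON round trip."""
    return [tuple(row) for row in rows]


def dumps_result(rows: Sequence[Sequence[Any]]) -> bytes:
    """Serialize query rows to bytes suitable for storing in a cache."""
    return json.dumps(rows).encode("utf-8")


def loads_result(raw: bytes) -> List[Tuple[Any, ...]]:
    """Deserialize cached bytes into the list-of-tuples shape callers expect."""
    return tuplefy(json.loads(raw.decode("utf-8")))
```

Unlike pickle, JSON cannot execute code on load, at the cost of only supporting JSON-compatible values inside each row.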
* skip celery for ee path
* mypy fixes
* take celery out and fix types as cleanly and performantly as possible
* add timing
* setup statsd, need to clean this up
* use sane defaults for statsd
* Add scheduled task to wipe session recordings
* Create a new table for session recording
* Save snapshot events to different table
* Use SessionRecordingEvent over Events everywhere
We can remove a ton of cruft this way as well
* Add missing signature
* Extract util from models/event
* Attempt to update ingest side of clickhouse session recording events
Note that it's using main kafka topic - not sure if a good idea.
* Get separate table in ch working for session recording events
* WIP: query sessions
* Make both session recording queries work
* Make linter happy
* Rebase migration
* Make tests work
* Apply a TTL to session recordings and other configuration:
- toYYYYMMDD partitioning should be smoother with TTL setup
- TTL achieves not needing to archive the data ourselves
- index_granularity will enable smaller reads per session_id
- ORDER BY clause makes both single-session and time-range queries
reasonable
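The configuration above could be expressed roughly as the table definition below, built in Python so the TTL can be left out (for example in tests). The table name, columns, sort key, three-week TTL and `index_granularity` value are all hypothetical placeholders, not the values used by this change.

```python
from typing import Optional


def session_recording_table_sql(
    table: str = "session_recording_events",  # hypothetical table name
    ttl_weeks: Optional[int] = 3,  # hypothetical retention; None omits the TTL
) -> str:
    """Build an illustrative ClickHouse CREATE TABLE statement."""
    ttl_clause = (
        f"TTL toDate(timestamp) + INTERVAL {ttl_weeks} WEEK\n" if ttl_weeks is not None else ""
    )
    return (
        f"CREATE TABLE {table} (\n"
        "    uuid UUID,\n"
        "    timestamp DateTime64(6, 'UTC'),\n"
        "    team_id Int64,\n"
        "    session_id String,\n"
        "    snapshot_data String\n"
        ") ENGINE = MergeTree()\n"
        "PARTITION BY toYYYYMMDD(timestamp)\n"  # daily parts line up with the TTL
        "ORDER BY (team_id, session_id, timestamp, uuid)\n"
        f"{ttl_clause}"
        "SETTINGS index_granularity = 512\n"  # smaller granules, smaller reads
    )
```

Calling `session_recording_table_sql(ttl_weeks=None)` produces the same statement without the TTL clause.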
* Convert retention cronjob to new model
* Add tests to process_event changes
* Add test for ee_capture change
* Fixup migration
* Make clickhouse tests drop/create session recording tables
* Make TTL not be there in tests
Otherwise writes get eaten by it during tests when mocking time
* Fix retention task
Co-authored-by: Tim Glaser <tim@glsr.nl>
* add new table migrations and change table names
* include necessary config for new tables in tests
* fix tests and table
* fix table name param
* add populate clause
* added table for key value person props
* adjust person filtering to use new table
* .
* add ordering on updated_at
* add back all the condition handling on persons filtering endpoint
* fix typing
* remove print
* re-order sort key for persons_up_to_date
Co-authored-by: James Greenhill <fuziontech@gmail.com>
* Clickhouse use elements chain
* Fix stuff
* Add action tests and start regex
* Progress
* Progress part deux
* Fix everything
* Add tag name filtering
* Fix funnels
* Fix tag name regex
* Fix ordering
* Fix type issues
* Fix empty nth-child
* Remove commented code
* Split with semicolon and escaped quotes
* Specify all select columns
* initial
* migration command
* migrations working
* add modelless views for clickhouse
* initial testing structure
* use test factory
* scaffold for all tests
* add insight and person api
* add basic readme
* add client
* change how migrations are run
* add base tables
* ingesting events
* restore delay
* remove print
* updated testing flow
* changed sessions tests
* update tests
* reorganized sql
* parametrize strings
* element list query
* change to serializer
* add values endpoint
* retrieve with filter
* pruned code to prepare for staged merge
* working ingestion again
* tests for ee
* undo unneeded tests right now
* fix linting
* more typing errors
* fix tests
* add clickhouse image to workflow
* move to right job
* remove django_clickhouse
* return database url
* run super
* remove keepdb
* reordered calls
* fix type
* fractional seconds
* fix type error
* add checks
* remove retention sql
* fix tests
* add property storage and tests
* merge master
* fix tests
* fix tests
* .
* remove keepdb
* format python files
* update CI env vars
* Override defaults and insecure tests
* Update how ClickHouse database gets evaluated
* remove bootstrapping clickhouse database routine
* Don't initialize the clickhouse connection unless we say it's primary
* .
* fixed id generation
* remove dump
* black settings
* empty client
* add param
* move docker-compose for ch to ee dir
* Add _public_ key to repo for verifying self signed cert on server
* update ee compose file for ee dir
* fix a few issues with tls in migrations
* update migrations to be flexible about storage profile and engine
* black settings
* add elements prop tables
* add elements prop tables
* working filter
* refactored
* better url handling
* add mapping table
* add processing to worker task
* working cohort with actions
* add cohort property filtering
* add cohort property filtering
* reformat and add cohort processing
* prop clauses
* add util
* add more util
* add clickhouse modifier
* Clickhouse Sessions (#1623)
* sessions sql
* skeleton
* add endpoint
* better tests
* sessions list
* merge clickhouse-actions
* added session endpoint
* sessions sql working again
* add clickhouse modifier
* session avg with props working
* add dist
* tests working (no list)
* list working
* add formatting
* more formatting
* fix tests
* dummy commit
* fix types
* remove unnecessary import
* ignore type when importing from ee in task
* fix test running
* Clickhouse Trends Base (#1609)
* initial working
* date param almost working
* fix date range and labels
* fixed monthly math
* handle compare
* change table
* using new event ingestion
* direct query actions working
* remove interface
* fix date range
* properties initial working
* handle operator
* handle operator
* move timestamp parse
* move more to util
* initial breakdown working
* working cohort breakdown
* some tests running
* fix sessions
* cohort tests
* action and interval test
* reorder cohort filtering
* rename retention test
* fix inits
* change multitenancy tests
* fix types
* fix optional types
* replace ch_client.execute with sync_execute
* replace ch_client.execute with sync_execute, part 2
* Clickhouse Stickiness + Process Event (#1654)
* generate clickhouse uuid script
* set CLICKHOUSE_SECURE=False by default if running in TEST or DEBUG
* convert person_id to UUID, make adding `person_id` optional, add distinct_ids already in the `create_person` function
* Fix test_process_event_ee.py, remove all calls to Person.objects.*
* add back util
* fix broken imports
* improve process_event test clickhouse queries
* Basic stickiness query
* Clickhouse Stickiness tests
* stickiness test [WIP, actions fail]
* generate clickhouse uuid script
* change default test runner if PRIMARY_DB=clickhouse
* fix stickiness test for actions
* fix merge bug
* remove _create_person stub; cohort person_id is UUID now
* fix typing
* Clickhouse trends process math (#1660)
* most of process math works
* all process math
* fix ordering issue
* unused imports
* update property comparison for process_event_ee
* indentation wrong missing calls
* demo users and events (#1661)
* finish breakdown filtering tests and reformat label function
* add increment to demo_data
* update demo data populating
* Add people endpoint for ch (#1670)
* add people endpoint for ch
* stickiness people
* fix value padding
* add process math to breakdown and
* add limit
* fix tests
* condensed code
* converted test to factory
* add people tests
* add month handling
* add typing fix
* change people test handling
* fix tests
* Clickhouse funnels 2 (#1668)
* add elements to create_event
* WIP closes #1663 Add funnels to clickhouse
* Make funnels work
* Clean up
* Move filtering around
* Add mypy tests and fix
* Performance improvements
* fix person tests again
* add people for funnel endpoint
* fix prop numbering
Co-authored-by: Marius Andra <marius.andra@gmail.com>
Co-authored-by: Eric <eeoneric@gmail.com>
* merge master
* add retention
* update types
* more typing errors
* fix types
* bug with kafka payload, elements insert, and demo data
* Clickhouse Paths (#1657)
* paths clickhouse test (fails)
* add elements to create_event
* make this fail for clickhouse
* hardcoded query that returns good results for $pageviews, no filters yet
* clean up queries
* bound by time, fix 30min new session boundary
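The 30-minute boundary can be illustrated in plain Python: a new session starts whenever the gap since the previous event exceeds 30 minutes. Whether the real query treats a gap of exactly 30 minutes as a new session is not stated here, so the strict comparison below is an assumption, as is the helper name `split_sessions`.

```python
from datetime import datetime, timedelta
from typing import List

SESSION_GAP = timedelta(minutes=30)


def split_sessions(timestamps: List[datetime]) -> List[List[datetime]]:
    """Group one person's event timestamps into sessions by inactivity gap."""
    sessions: List[List[datetime]] = []
    for ts in sorted(timestamps):
        # Continue the current session if the gap is within the limit,
        # otherwise start a new one.
        if sessions and ts - sessions[-1][-1] <= SESSION_GAP:
            sessions[-1].append(ts)
        else:
            sessions.append([ts])
    return sessions
```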
* support screen and custom events
* add properties filter
* paths url
* filter by path start
* better path start test
* even better path start test
* start from the first "path start" in a group
* test for person_id in paths
* partition by person_id for POSTGRES paths
* partition by person_id for Clickhouse paths
* clean up order in paths test
* clean up order in paths test
* join elements
* force element order on element group creation
* remove "order" when creating elements in tests and demo
* get list of elements for paths
* add limit to paths query
* use materialized view
* rename "element_hash" to "elements_hash" (no change in db)
* cull rows that are definitely unused
* simplify query
* New highly optimized paths clickhouse query
* start_point for $autocapture paths
* extract event property values from clickhouse
* prevent crash
* select one element sql
* get elements for event
* remove lodash
* remove host from $pageview path elements if same domain as incoming path
* show metadata based on loaded paths filter, not in flight filter
* fix order (all sources and targets in order, not all sources first, then all targets after) - makes for a better looking graph
* add test that makes the Postgres paths query fail
* fix postgres paths --> no fuzzy matching, breaks "starts with" for urls and gives too many incorrect start points
* create automatic /demo urls that match the real urls (no ending /)
* fix elements queries
* path element joins
* create persons via postgres in paths test
* change serializers back to id
* fix tests with uuid
* fix demo
* more bugs
* fix type
* change now to timezone aware
* [clickhouse] retention filters (#1725)
* implemented target entity and prop filtering
* add insight view override
* fix endpoint and filters
* include tests
* fix tests
* add period filtering
* .
* fix pg param name
* add filtering params to both queries in retention sql
* fix param again
* change to todatetime
* change tz to timezone
* add back timezone in model/event
* [clickhouse] feature flag endpoint requests (#1731)
* add feature flags to endpoints
* add flags to endpoints that check on request
* remove magic strings and fill in missing flags
* fix types
* add missing flag
* change from iso
* fix more timestamps and comparator
* change _people to get_people in actions view
* remove action and cohort populating
* change inheritance
* Revert "Clickhouse Features V2 (#1565)"
This reverts commit 0b371d43ec.
* fix types
* change to super
* change to super x2
Co-authored-by: Eric <eeoneric@gmail.com>
Co-authored-by: Marius Andra <marius.andra@gmail.com>
Co-authored-by: Tim Glaser <tim.glaser@hiberly.com>
* initial
* migration command
* migrations working
* add modelless views for clickhouse
* initial testing structure
* use test factory
* scaffold for all tests
* add insight and person api
* add basic readme
* add client
* change how migrations are run
* add base tables
* ingesting events
* restore delay
* remove print
* updated testing flow
* changed sessions tests
* update tests
* reorganized sql
* parametrize strings
* element list query
* change to serializer
* add values endpoint
* retrieve with filter
* pruned code to prepare for staged merge
* working ingestion again
* tests for ee
* undo unneeded tests right now
* fix linting
* more typing errors
* fix tests
* add clickhouse image to workflow
* move to right job
* remove django_clickhouse
* return database url
* run super
* remove keepdb
* reordered calls
* fix type
* fractional seconds
* fix type error
* add checks
* remove retention sql
* fix tests
* add property storage and tests
* merge master
* fix tests
* fix tests
* .
* remove keepdb
* format python files
* update CI env vars
* Override defaults and insecure tests
* Update how ClickHouse database gets evaluated
* remove bootstrapping clickhouse database routine
* Don't initialize the clickhouse connection unless we say it's primary
* .
* fixed id generation
* remove dump
* black settings
* empty client
* add param
* move docker-compose for ch to ee dir
* Add _public_ key to repo for verifying self signed cert on server
* update ee compose file for ee dir
* fix a few issues with tls in migrations
* update migrations to be flexible about storage profile and engine
* black settings
* add elements prop tables
* add elements prop tables
* working filter
* refactored
* better url handling
* add mapping table
* add processing to worker task
* working cohort with actions
* add cohort property filtering
* add cohort property filtering
* reformat and add cohort processing
* prop clauses
* add util
* add more util
* add clickhouse modifier
* Clickhouse Sessions (#1623)
* sessions sql
* skeleton
* add endpoint
* better tests
* sessions list
* merge clickhouse-actions
* added session endpoint
* sessions sql working again
* add clickhouse modifier
* session avg with props working
* add dist
* tests working (no list)
* list working
* add formatting
* more formatting
* fix tests
* dummy commit
* fix types
* remove unnecessary import
* ignore type when importing from ee in task
* fix test running
* Clickhouse Trends Base (#1609)
* initial working
* date param almost working
* fix date range and labels
* fixed monthly math
* handle compare
* change table
* using new event ingestion
* direct query actions working
* remove interface
* fix date range
* properties initial working
* handle operator
* handle operator
* move timestamp parse
* move more to util
* initial breakdown working
* working cohort breakdown
* some tests running
* fix sessions
* cohort tests
* action and interval test
* reorder cohort filtering
* rename retention test
* fix inits
* change multitenancy tests
* fix types
* fix optional types
* replace ch_client.execute with sync_execute
* replace ch_client.execute with sync_execute, part 2
* Clickhouse Stickiness + Process Event (#1654)
* generate clickhouse uuid script
* set CLICKHOUSE_SECURE=False by default if running in TEST or DEBUG
* convert person_id to UUID, make adding `person_id` optional, add distinct_ids already in the `create_person` function
* Fix test_process_event_ee.py, remove all calls to Person.objects.*
* add back util
* fix broken imports
* improve process_event test clickhouse queries
* Basic stickiness query
* Clickhouse Stickiness tests
* stickiness test [WIP, actions fail]
* generate clickhouse uuid script
* change default test runner if PRIMARY_DB=clickhouse
* fix stickiness test for actions
* fix merge bug
* remove _create_person stub; cohort person_id is UUID now
* fix typing
* Clickhouse trends process math (#1660)
* most of process math works
* all process math
* fix ordering issue
* unused imports
* update property comparison for process_event_ee
* indentation wrong missing calls
* demo users and events (#1661)
* finish breakdown filtering tests and reformat label function
* add increment to demo_data
* update demo data populating
* Add people endpoint for ch (#1670)
* add people endpoint for ch
* stickiness people
* fix value padding
* add process math to breakdown and
* add limit
* fix tests
* condensed code
* converted test to factory
* add people tests
* add month handling
* add typing fix
* change people test handling
* fix tests
* Clickhouse funnels 2 (#1668)
* add elements to create_event
* WIP closes #1663 Add funnels to clickhouse
* Make funnels work
* Clean up
* Move filtering around
* Add mypy tests and fix
* Performance improvements
* fix person tests again
* add people for funnel endpoint
* fix prop numbering
Co-authored-by: Marius Andra <marius.andra@gmail.com>
Co-authored-by: Eric <eeoneric@gmail.com>
* merge master
* add retention
* update types
* more typing errors
* fix types
* bug with kafka payload, elements insert, and demo data
* Clickhouse Paths (#1657)
* paths clickhouse test (fails)
* add elements to create_event
* make this fail for clickhouse
* hardcoded query that returns good results for $pageviews, no filters yet
* clean up queries
* bound by time, fix 30min new session boundary
* support screen and custom events
* add properties filter
* paths url
* filter by path start
* better path start test
* even better path start test
* start from the first "path start" in a group
* test for person_id in paths
* partition by person_id for POSTGRES paths
* partition by person_id for Clickhouse paths
* clean up order in paths test
* clean up order in paths test
* join elements
* force element order on element group creation
* remove "order" when creating elements in tests and demo
* get list of elements for paths
* add limit to paths query
* use materialized view
* rename "element_hash" to "elements_hash" (no change in db)
* cull rows that are definitely unused
* simplify query
* New highly optimized paths clickhouse query
* start_point for $autocapture paths
* extract event property values from clickhouse
* prevent crash
* select one element sql
* get elements for event
* remove lodash
* remove host from $pageview path elements if same domain as incoming path
* show metadata based on loaded paths filter, not in flight filter
* fix order (all sources and targets in order, not all sources first, then all targets after) - makes for a better looking graph
* add test that makes the Postgres paths query fail
* fix postgres paths --> no fuzzy matching, breaks "starts with" for urls and gives too many incorrect start points
* create automatic /demo urls that match the real urls (no ending /)
* fix elements queries
* path element joins
* create persons via postgres in paths test
* change serializers back to id
* fix tests with uuid
* fix demo
* more bugs
* fix type
* change now to timezone aware
* [clickhouse] retention filters (#1725)
* implemented target entity and prop filtering
* add insight view override
* fix endpoint and filters
* include tests
* fix tests
* add period filtering
* .
* fix pg param name
* add filtering params to both queries in retention sql
* fix param again
* change to todatetime
* change tz to timezone
* add back timezone in model/event
* [clickhouse] feature flag endpoint requests (#1731)
* add feature flags to endpoints
* add flags to endpoints that check on request
* remove magic strings and fill in missing flags
* fix types
* add missing flag
* change from iso
* fix more timestamps and comparator
* change _people to get_people in actions view
* remove action and cohort populating
Co-authored-by: James Greenhill <jams@uber.com>
Co-authored-by: Marius Andra <marius.andra@gmail.com>
Co-authored-by: Tim Glaser <tim.glaser@hiberly.com>