* refactor(ingestion): establish setup for json consumption from kafka into clickhouse [nuke protobuf pt. 1]
* address review
* fix kafka table name across the board
* Update posthog/async_migrations/test/test_0004_replicated_schema.py
* run checks
* feat(persons-on-events): add required person and group columns to events table
* rename
* update snapshots
* address review
* Revert "update snapshots"
This reverts commit 63d7126e08.
* address review
* update snapshots
* update more snapshots
* use RunPython
* update schemas
* update more queries
* some improvements :D
* fix naming
* fix breakdown prop name
* update snapshot
* fix naming
* fix ambiguous test
* fix queries
* last bits
* fix typo to retrigger tests
* also handle kafka and mv tables in migration
* update snapshots
* drop tables if exists
Co-authored-by: eric <eeoneric@gmail.com>
* Check async migrations instead of CLICKHOUSE_REPLICATION for materialized columns
* Update a comment
* Default for CLICKHOUSE_REPLICATION
* add replication file
* Assert is replicated in tests
* Remove DROP TABLE query from cohortpeople migration
* Update snapshots
* Ignore migration in typechecker
* Truncate right table
* Add KAFKA_COLUMNS to distributed tables
* Make CLICKHOUSE_REPLICATION default to True
* Update some insert statements
* Create distributed tables during tests
* Delete from sharded_events
* Update test_migrations_not_required.py
* Improve 0002_events_sample_by is_required
1. SHOW CREATE TABLE output gets truncated when a table has tens of materialized columns, causing failures
2. We need to handle CLICKHOUSE_REPLICATION setups
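A minimal sketch of the safer check, assuming a `sync_execute` helper (import path assumed) and reading `system.tables` instead of parsing truncated `SHOW CREATE TABLE` output; table and sampling-key names are illustrative:

```python
from ee.clickhouse.client import sync_execute  # assumed import path

def is_required() -> bool:
    # Read the sampling key directly from system.tables rather than parsing
    # SHOW CREATE TABLE, whose output ClickHouse truncates for wide tables.
    rows = sync_execute(
        """
        SELECT sampling_key
        FROM system.tables
        WHERE database = currentDatabase()
          AND name IN ('events', 'sharded_events')
        """
    )
    # The migration is needed if no events table has the new sampling key yet.
    return not any("cityHash64(distinct_id)" in (key or "") for (key,) in rows)
```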
* Update test_schema to handle CLICKHOUSE_REPLICATION, improve test naming
* Fix issue with materialized columns
Note: Should make sure that these tests have coverage both ways
* Update test for recordings TTL
* Reorder table creation
* Correct schema for materialized columns on distributed tables
* Do correct setup in test_columns
* Lazily decide table to delete data from
* Make test_columns resilient to CLICKHOUSE_REPLICATION
* Make inserts resilient to CLICKHOUSE_REPLICATION
* Reset CLICKHOUSE_REPLICATION
* Create distributed tables conditionally
* Update snapshots, tests
* Fixup conftest
* Remove event admin
* Move the clickhouse version of posthog/tasks/test/test_org_usage_report.py inline
* Remove postgres-specific code from org usage report
* Kill dead on_perform method
* Remove dead EventSerializer
* Remove a dead import
* Remove a dead command
* Clean up test, don't create a model
* Remove dead code
* Clean up test_element
* Clean up test event code
* Remove a dead function
* Clean up dead imports
* Remove dead code
* Code style cleanup
* Fix foss test
* Simplify fn
* Org usage fixup #3
* version insights
* version and lock update
* make sure all tests work
* restore exception
* fix test
* fix test
* add specific id
* update plugin server test utils
* cleanup
* match filtering
* use timestamp comparison
* make tests work
* one more test field
* fix more tests
* more cleanup
* lock frontend when updating and restore refresh
* pass undefined
* add timestamp to background update
* use incrementer
* add field
* snapshot
* types
* more cleanup
* update tests
* remove crumbs
* use expressions
* make nullable
* batch delete
* fill null for static cohorts
* batch_delete
* typing
* remove queryset function
* working for unique_groups math
* fix types
* add null check
* update snapshots
* update payload
* update snapshots
* use constructor
* adjust queries
* introduce base class
* consolidate querying
* shared serializer and typed
* sort imports
* snapshots
* typing
* change name
* Add group model
```sql
BEGIN;
--
-- Create model Group
--
CREATE TABLE "posthog_group" ("id" serial NOT NULL PRIMARY KEY, "group_key" varchar(400) NOT NULL, "group_type_index" integer NOT NULL, "group_properties" jsonb NOT NULL, "created_at" timestamp with time zone NOT NULL, "properties_last_updated_at" jsonb NOT NULL, "properties_last_operation" jsonb NOT NULL, "version" bigint NOT NULL, "team_id" integer NOT NULL);
--
-- Create constraint unique team_id/group_key/group_type_index combo on model group
--
ALTER TABLE "posthog_group" ADD CONSTRAINT "unique team_id/group_key/group_type_index combo" UNIQUE ("team_id", "group_key", "group_type_index");
ALTER TABLE "posthog_group" ADD CONSTRAINT "posthog_group_team_id_b3aed896_fk_posthog_team_id" FOREIGN KEY ("team_id") REFERENCES "posthog_team" ("id") DEFERRABLE INITIALLY DEFERRED;
CREATE INDEX "posthog_group_team_id_b3aed896" ON "posthog_group" ("team_id");
COMMIT;
```
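For reference, a minimal Django model sketch matching the DDL above (not necessarily the exact source: `models.JSONField` assumes Django 3.1+, and `on_delete=models.CASCADE` is an assumption since the DDL only shows a deferrable FK):

```python
from django.db import models

class Group(models.Model):
    team = models.ForeignKey("Team", on_delete=models.CASCADE)  # on_delete assumed
    group_key = models.CharField(max_length=400)
    group_type_index = models.IntegerField()
    group_properties = models.JSONField(default=dict)
    created_at = models.DateTimeField()
    properties_last_updated_at = models.JSONField(default=dict)
    properties_last_operation = models.JSONField(default=dict)
    version = models.BigIntegerField()

    class Meta:
        constraints = [
            models.UniqueConstraint(
                fields=["team", "group_key", "group_type_index"],
                name="unique team_id/group_key/group_type_index combo",
            )
        ]
```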
* Remove a dead import
* Improve typing for groups
* Make groups updating more generic, avoid mutation
This simplifies using the same logic for groups
Note there's a behavioral change: we no longer produce a new Kafka message if nothing has been updated.
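The change described, as an illustrative Python sketch (the real logic lives in the plugin server; helper names here are hypothetical):

```python
def update_properties(stored: dict, incoming: dict) -> tuple[dict, bool]:
    """Merge without mutating `stored`; report whether anything changed."""
    updated = {**stored, **incoming}
    return updated, updated != stored

stored = {"name": "Acme"}
updated, changed = update_properties(stored, {"name": "Acme"})
if changed:  # False here, so no Kafka message is produced
    pass  # produce_to_kafka(updated) -- hypothetical producer helper
```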
* Rename a function
* WIP: Handle group property updates
... by storing them in Postgres
Uses an identical pattern to person property updates, except we also handle
the first-seen case within updates.
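A hedged sketch of that pattern (all helpers on `pg` are hypothetical; this only illustrates the flow):

```python
def upsert_group(pg, team_id, group_type_index, group_key, properties, timestamp):
    group = pg.fetch_group(team_id, group_type_index, group_key)
    if group is None:
        # First-seen case is handled inside the update path itself,
        # unlike person property updates.
        pg.insert_group(team_id, group_type_index, group_key, properties, created_at=timestamp)
    else:
        merged = {**group["properties"], **properties}
        if merged != group["properties"]:
            pg.update_group(team_id, group_type_index, group_key, merged)
```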
* Get rid of boolean option
* WIP continued
* fetchGroup() and upsertGroup()
* Test more edge cases
* Add tests for upsertGroup() in properties-updater
* Rename to PropertyUpdateOperation
* Followup
* Solve typing issues
* changed implementation to use pg
* unused
* update type
* update snapshots
* rename and remove inlining
* restore bad merge code
* adjust types
* add flag
* remove var
* misnamed
* change to uuid
* make sure to use string when passing result
* remove group join logic from ColumnOptimizer and implement it in each insight's event query class
* remove unnecessary logic
* typing
* remove dead imports
* remove verbosity
* update snapshots
* typos
* remove signals
* remove plugin excess
Co-authored-by: Karl-Aksel Puulmann <oxymaccy@gmail.com>
* Extract GroupsJoinQuery
* Add test for breakdown filtering
* Unify breakdown mixins
* Allow passing breakdown_type == 'group' with breakdown_group_type_index
* Allow breakdown by group props in trends
* Add tests for trends breakdown_props function on group breakdowns
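A hypothetical filter payload exercising the new fields ("industry" is an invented group property; `$pageview` is just a familiar event):

```python
filter = {
    "events": [{"id": "$pageview"}],
    "breakdown": "industry",          # invented group property
    "breakdown_type": "group",
    "breakdown_group_type_index": 0,  # which group type to break down by
}
```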
* Solve common issues
* Output snapshot diff into console
* Clean up materialized columns after tests
* Add zero protection
* Solve test failure
* Type math in Entity
* Allow passing group_type_index from FE to BE
* Get an initial query running
* Add group value filter if aggregating by groups
* Add snapshot testing for trends queries
* isort
* Update tests
* Add test for column_optimizer
* Update ee/clickhouse/queries/trends/util.py
Co-authored-by: Neil Kakkar <neilkakkar@gmail.com>
* Hotfix: Use materialized columns on cloud
This was broken since default_kind was different on Distributed tables
on cloud
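A sketch of the idea behind the fix, assuming a `sync_execute` helper: discover materialized columns via `system.columns` while accounting for the differing `default_kind` on Distributed tables (values illustrative):

```python
from ee.clickhouse.client import sync_execute  # assumed import path

def get_materialized_columns(table: str):
    # On cloud, the Distributed table reports these columns with
    # default_kind = 'DEFAULT' rather than 'MATERIALIZED', so accept both.
    return sync_execute(
        """
        SELECT name, default_kind
        FROM system.columns
        WHERE database = currentDatabase()
          AND table = %(table)s
          AND default_kind IN ('MATERIALIZED', 'DEFAULT')
        """,
        {"table": table},
    )
```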
* Improve __init__.py
* Make reverting DEFAULT column async, only ON CLUSTER for events table
* wip: pagination for persons on clickhouse funnels
* wip: added offset support for getting a list of persons; added support for conversion window
* fixed mypy exception
* helper function to insert data for local testing
* moved generation code into a separate class for more functionality later
* corrected person_distinct_id to use the person id from postgres
* minor corrections to the local-data generation class, plus data cleanup via a destroy() method
* reduce the number of persons who make it to each step
* moved funnel queries to a new folder for better organization; separated funnel_persons and funnel_trends_persons into individual classes
* funnel persons and tests
* initial implementation
* invoke the funnel or funnel trends class respectively
* add a test
* add breakdown handling and first test
* add test stubs
* remove repeats
* mypy corrections and PR feedback
* run funnel test suite on new query implementation
* remove imports
* corrected tests
* minor test updates
* correct func name
* fix types
* func name change
* move builder functions to funnel base
* add test class for new funnel
* Handle multiple same events in the funnel (#4863)
* dedup + tests
* deep equality. Tests to come
* write test for entity equality
* finish testing funnels
* clean up comments
* add ability to specify per step or dropoff persons
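An illustrative request parameter for this; the sign convention is an assumption for the sketch:

```python
# Hypothetical params: a positive funnel_step selects persons who completed
# that step; a negative value selects those who dropped off there.
completed_step_2 = {"funnel_step": 2}
dropped_at_step_2 = {"funnel_step": -2}
```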
* remove defaults
* remove funnel_window parameter unless it's needed
* add param to filters
* test api
* remove print
* fix tests
* change distribution
* add none condition for funnel step
* add order by
* remove funnel window days
Co-authored-by: Buddy Williams <buddy@posthog.com>
Co-authored-by: Neil Kakkar <neilkakkar@gmail.com>
* checkpoint: refactoring funnel trends so that they work correctly
* wip: refactoring funnel trends query to return the results we actually need
* wip: added in new query for testing
* wip: moved SQL into a separate file, converted list to dictionary, and added several tests to check data quality
* wip: with a better understanding of funnel trends, I've refactored the query so that I can write a data transformer in Python
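A sketch of the sort of transformer meant here, under an assumed row shape of `(day, person_id, completed_funnel)` tuples:

```python
from collections import defaultdict

def summarize(rows):
    # Collapse raw per-person rows into a conversion rate per day.
    totals: dict = defaultdict(int)
    converted: dict = defaultdict(int)
    for day, _person_id, completed in rows:  # assumed row shape
        totals[day] += 1
        converted[day] += int(completed)
    return {day: converted[day] / totals[day] for day in totals}
```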
* moved code into funnel_trends for both logic and tests to isolate the concern
* reordered methods for readability
* wip: refactoring funnel trends to support filters
* wip: added updated SQL which will replace the existing FUNNEL_TREND_SQL
* correct column name so that it's clearer
* added substitution variables to new query
* fixed missing substitution variable
* wip: integrating the new query with correct params, added mixins for funnel_window, working toward a passing test
* several query corrections
* summarize funnel trends
* moved method down
* removed unused code
* added data quality checks
* corrected cohort size for tests
* test window size and incomplete status
* corrected a few names
* removed unnecessary comment
* commented out old funnel trends tests
* removed print statement
* removed old funnel trend code
* made funnel trends response match existing data structure layout
* removed unused imports
* removed more unused imports
* fixed mypy errors
* Added ClickhouseFunnelBase to extract common methods for ClickhouseFunnelTrends and ClickhouseFunnel; this also fixes issues with tests
* removed unused type comment
* corrected test to account for new funnel_window_days mixin
* fixed clickhouse funnel tests
* fixes for automated tests
* changed team_id to use client-side parameter substitution to guard against SQL injection in the future; since it's not user-supplied, it's not currently an issue
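A short sketch of the difference, assuming a `sync_execute` helper using clickhouse-driver style `%(name)s` placeholders:

```python
from ee.clickhouse.client import sync_execute  # assumed import path

team_id = 2  # example value
sync_execute(
    "SELECT count() FROM events WHERE team_id = %(team_id)s",  # placeholder, substituted client-side
    {"team_id": team_id},
)
# ...instead of interpolating the value directly:
# sync_execute(f"SELECT count() FROM events WHERE team_id = {team_id}")
```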
* corrections prompted by PR review
* corrected test to dict test with funnel_window_days
* Use `statshog` over python-statsd
More support for tags!
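A sketch of what the switch buys, assuming statshog keeps python-statsd's `StatsClient` interface and adds a `tags` kwarg (which is the point of the change); metric and tag names are illustrative:

```python
from statshog import StatsClient  # assumed import path

statsd = StatsClient(host="localhost", port=8125)
# Tags on a plain counter -- not possible with python-statsd.
statsd.incr("clickhouse_query", tags={"team_id": "2", "kind": "celery"})
```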
* Include custom tags for every query + add annotation to query
After this we can:
- Figure out from query logs where queries are coming from (speeding up debugging)
- Break down query speeds by user queries vs. others (e.g. celery), which better represents overall speed
- Figure out how fast queries are on average for various teams
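An illustrative sketch of the combination, reusing the `statsd` client and `sync_execute` helper from the sketches above (metric name and tag keys invented): tag the timing metric and embed the same tags as a SQL comment so they surface in ClickHouse's query_log.

```python
import json

def execute_with_tags(query: str, args: dict, tags: dict):
    annotated = f"/* {json.dumps(tags)} */ {query}"  # annotation lands in system.query_log
    with statsd.timer("clickhouse_query_time", tags=tags):  # invented metric name
        return sync_execute(annotated, args)

execute_with_tags(
    "SELECT count() FROM events WHERE team_id = %(team_id)s",
    {"team_id": 2},
    {"kind": "request", "id": "trends"},  # illustrative tags
)
```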
* Use tags in more queries over interpolation
This way we can set up more interesting graphs \o/
* Solve mypy error
* Fix a flaky test (due to ordering)