* add recordings to path query
* uncomment cache
* add clarifying comment
* works for start/end paths
* move to extra fields/properties
* add tests
* cleanup
* update ff name
* fix flaky test
* test and handle path_dropoff_key case
* refactor(lifecycle): simplify clickhouse sql logic
This updates the SQL to be comprised of two queries, one for getting
new, returning, and resurrecting periods of activity, one for getting
dormant periods right after periods of activity.
Refers to https://github.com/PostHog/posthog/issues/7382
* refactor(lifecyle): use `ClickhouseEventQuery` to build event query
* format
* Use bounded_person_activity_by_period for both sides of dormant join
* refactor(lifecycle): reduce pdi2 join by one
This means we're now under the current query memory limit for orgs with
around 20m distinct_ids. It does remove some readability though :(
* update snapshot
* Add further comments to query
* Add further comments to query
* Add further comments to query
* Remove dead variables
* Refactor person_query overriding
* Lifecycle refactoring continued
* Update lifecycle tests (except people ones)
* Make lifecycle people endpoint happy
* Remove django lifecycle tests
* Add some edge case tests
* Add missing type
Co-authored-by: Harry Waye <harry@posthog.com>
* Set time bounds for "all of time" filter
We won't display data points from before 2015 anymore, avoiding
confusion like in https://github.com/PostHog/posthog/issues/7626
* Disable dates before 2015, add tooltip
* Run queries against person_distinct_id2 when async migration is done
* Only write to clickhouse_person_unique_id topic if async migration is incomplete
* Update query snapshots
* Update plugin-server
* Adjust caching logic
* chore(lifecycle): add comments and CTEs to clickhouse sql
It was really big and confusing, but hopefully this clarifies a little
what is going on. As a followup PR I'll be doing some work to make the
query faster :fingerscrossed: but I think worth at least getting this
in, assuming I haven't broken any tests!
* update snapshots
* remove day references
* update snapshots
* fix(retention): fix breakdown people urls
This change returns people_url for each breakdown cohort in the
response. We also merge the initial and returning queries together,
as this makes it easier to align the people query also.
Note that I'm talking about person_id as opposed to actor_type etc.
but perhaps that can be a followup.
* clean up clickhouse params
* tidy up a little
* remove import
* remove non-breakdown specific code
* make cohort by initial event date a special breakdown case
* keep date for backwards compat
* Remove unused sql
* make test stable
* wip
* Get most of the tests working
* test(retention): remove graph retention test
We no longer need this, we have all the information we need from the
table response for retention, and can construct this on the frontend.
* revert any changes to posthog/queries/retention.py
* revert any changes to ee/clickhouse/models/person.py
* Revert posthog/queries/retention.py to merge-base
* Ensure actor id is a str
* Add type for actor serialiser for type narrowing
* run black
* sort imports
* Remove retention_actors.py
* fix typings
* format
* reverse str type
* sort imports
* rename
* split out functions
* remove deuplicate logic
* working
* fix type
* don't stringify
* fix test
* ordering doesn't matter
* trigger ci
Co-authored-by: eric <eeoneric@gmail.com>
* convert to actor form
* change var name
* remove unused imports
* typing issue
* use subquery
* bad import
* groups for general retention query
* actor in period
* update imports
* update test
* remove comment
* Create a new way to get distinct id queries thats gated by team_id
* Update most cases to use the new query
* Convert EVENT_JOIN_PERSON_SQL to new query
* Mostly convert GET_DISTINCT_IDS_BY_PERSON_ID_FILTER
* Mostly convert GET_DISTINCT_IDS_BY_PROPERTY_SQL
* Convert GET_PERSON_IDS_BY_FILTER
* Flag benchmarks
* Resolve circular imports
* Update a snapshot test
* Add a test for the new query logic
* refactor(stickiness): refactor one stickiness test to use api
This change demonstrates how to migrate a `Query` object level test to
an api based test. It purely focuses on the method of action invocation
and not on any of the e.g. setup or assertions. The StickinessQuery
object is only used by the REST API (and benchmarking), where as the
REST stickiness API is used by external users including our own frontend
developers, so makes sense to test at this level.
* Migrate stickiness query tests to api
This doesn't touch the stickiness people API however
* Migrate clickhouse specific stickiness tests
* Migrate stickiness people query tests to http api level
NOTE! This isn't just a straight migration, but also makes one important
change to application code that would otherwise result in a test
failure. Specifically, when trying to find an action based on the
`entity_id` query param, we need to consider that the entity_id is a
string. This is fine for when trying to find events, as we are comparing
event ids which are strings, but for actions the id is an int, so we
need to ensure we cast the action id to a string before comparison.
* Move stickiness query tests to api tests location
* make stickiness tests stable across postgres/clickhouse
* Add comment regarding casting action ids to strings
* Extract common subquery into a variable
* BE: handle group properties in more cases
* Add tests for lifecycle and sessions query changes
* Better docs
* Stable date range
* working for unique_groups math
* fix types
* add null check
* update snapshots
* update payload
* update snapshots
* use constructor
* adjust queries
* introduce base class
* consolidate querying
* shared serializer and typed
* sort imports
* snapshots
* typing
* change name
* Add group model
```sql
BEGIN;
--
-- Create model Group
--
CREATE TABLE "posthog_group" ("id" serial NOT NULL PRIMARY KEY, "group_key" varchar(400) NOT NULL, "group_type_index" integer NOT NULL, "group_properties" jsonb NOT NULL, "created_at" timestamp with time zone NOT NULL, "properties_last_updated_at" jsonb NOT NULL, "properties_last_operation" jsonb NOT NULL, "version" bigint NOT NULL, "team_id" integer NOT NULL);
--
-- Create constraint unique team_id/group_key/group_type_index combo on model group
--
ALTER TABLE "posthog_group" ADD CONSTRAINT "unique team_id/group_key/group_type_index combo" UNIQUE ("team_id", "group_key", "group_type_index");
ALTER TABLE "posthog_group" ADD CONSTRAINT "posthog_group_team_id_b3aed896_fk_posthog_team_id" FOREIGN KEY ("team_id") REFERENCES "posthog_team" ("id") DEFERRABLE INITIALLY DEFERRED;
CREATE INDEX "posthog_group_team_id_b3aed896" ON "posthog_group" ("team_id");
COMMIT;
```
* Remove a dead import
* Improve typing for groups
* Make groups updating more generic, avoid mutation
This simplifies using the same logic for groups
Note there's a behavioral change: We don't produce a new kafka message
if nothing has been updated anymore.
* Rename a function
* WIP: Handle group property updates
... by storing them in postgres
Uses identical pattern to person property updates, except we handle
first-seen case within updates as well.
* Get rid of boolean option
* WIP continued
* fetchGroup() and upsertGroup()
* Test more edge cases
* Add tests for upsertGroup() in properties-updater
* Rename to PropertyUpdateOperation
* Followup
* Solve typing issues
* changed implementation to use pg
* unusd
* update type
* update snapshots
* rename and remove inlining
* restore bad merge code
* adjust types
* add flag
* remove var
* misnamed
* change to uuid
* make sure to use string when passing result
* remove from columnoptimizer logic and have group join logic implemented by event query classes per insight
* remove unnecessary logic
* typing
* remove dead imports
* remove verbosity
* update snapshots
* typos
* remove signals
* remove plugin excess
Co-authored-by: Karl-Aksel Puulmann <oxymaccy@gmail.com>
* Refactor stickiness to have its own event_query
This will speed up queries significantly and allow for filtering by
group properties
* Use same event_query for stickiness people
* Minor cleanup
* Add tests (and missing file) to group filtering in stickiness
* Allow aggregating by groups in stickiness
* Show group property filters in FE for stickiness
* Paths filtering by groups backend
* update correlation tests, now that CTEs are included in sqls
* use decorator for materialising to ensure clean up happens
* cleanup offending tests
* Migration to use materialized columns for groups
Workaround for https://github.com/PostHog/posthog/issues/6422
* Use groups materialized columns in queries
* Update mat column creation tests
* Simplify aggregation_target_field
* Fix migration
* Update snapshots
* Improve process_math
* Add test for overlapping group keys
* Improve event query tests
* Add test for filtering by person properties together with groups
* Avoid flaky tests due to cohort_id changing
* Update queries and snapshots
* Add groups stuff
* Rename column from person_id to `target` in retention queries
No behavioral change, preparing for groups work :)
* Remove dead if statement
* WIP: Retention aggregation by groups
* Handle aggregation by groups in retention
Also handles the case where not every event has a property defined
* Test groups validation mixin
* Reformat
* Improve test for aggregation in retention
* Extract GroupsJoinQuery
* Add test for breakdown filtering
* Unify breakdown mixins
* Allow passing breakdown_type == 'group' with breakdown_group_type_index
* Allow breakdown by group props in trends
* Add tests for trends breakdown_props function on group breakdowns
* Solve common issues
* Output snapshot diff into console
* Clean up materialized columns after tests
* Add zero protection
* Solve test failure
* Type math in Entity
* Allow passing group_type_index from FE to BE
* Get a initial query running
* Add group value filter if aggregating by groups
* Add snapshot testing for trends queries
* isort
* Update tests
* Add test for column_optimizer
* Update ee/clickhouse/queries/trends/util.py
Co-authored-by: Neil Kakkar <neilkakkar@gmail.com>
Co-authored-by: Neil Kakkar <neilkakkar@gmail.com>
* Add group type, group_type_index
* Raise an error when handling unsupported properties in CH
* Improve repr
* Fix is_superset function
This was previously broken - sorting and zipping doesn't really work for
this intent.
* Add group_type_index to analysis results
* Add `group_types_to_query`
* Minor typing fixes
* Create groups tables in tests
* Simple first filter by groups query
* isort
* Use snapshot testing in event_query tests, add test for groups
* backend fixes and test
* add breakdown value to pie chart
* adjust test
* fix faulty test
* fill param
* fix formula tests
* more date passing
* more cleanup
* all tests working
* make test data explicit and add better checks
* support both ee and postgres
* length checks
* paginate recording compression
* some tests
* more accurate duration calculation
* add tests and types
* tons of decompression fixes
* rename test file to avoid conflict
* move decompression to helper
* add test for helper
* type fix
* rename method
* simplify paginated decomression
* handle case where offset exceeds length
* clean up
* test fixes
* clean up on aisle 12
* Add surrounding object for metadata response
* initial refactoring
* popup UI
* refactor path cleaning logic
* add nullable
* all ui working
* fix migration
* use regex replacement from team object
* add flag
* add switch
* fix type
* fix type
* UI update
* restore removed arg
* add local path cleaning filters to api
* add test for local path filters
* working new UI
* reduced repeated code
* fix numbering
* minor refactoring
* update copy
* add under advanced features
* address comments, minor cleanup
Co-authored-by: Neil Kakkar <neilkakkar@gmail.com>
* Refactor column_optimizer to work differently
* WIP: Use counter over set
* Handle person filters in person query
* Remove a dead argument
* Use enum over parameter for determining behavior
* Allow excluding person properties mode when handled in person query
* Fix _get_person_query type
* Use correct table for funnel_event_query
* Remove unneeded override
* Add extra typing
* Filter by entity.properties in person query for trends
* Handle error 184 due to naming clash
* Better default for prop_filter_json_extract
* Update column_optimizer tests for Counter
* Handle person_props as extra_fields
* Handle breakdowns and person property filter pushdown
* Transform values correctly
* Simplify get_entity_filtering_params
* Fix funnel correlations
* Solve caching issues in trend people queries
* Remove @skip test
* Add syrupy tests for parse_prop_clauses
Can update these via --snapshot-update
* Add snapshot tests for person queries
* Add a few notes
* Update test to avoid collision
* Kill dead code
* Handle PR comments
* Update ee/clickhouse/queries/person_query.py
Co-authored-by: Neil Kakkar <neilkakkar@gmail.com>
Co-authored-by: Neil Kakkar <neilkakkar@gmail.com>
* WIP: Create new property types for simplified cohorts
* Add documentation on simplified_cohort_filter_properties
* Handle static-cohort/precalculated-cohort property types
* Handle new property filters properly
* Add casting
* Test cohorts in more cases
* Fix a bug
* Fix benchmark simplifying
* Avoid redoing work every setup for benchmarks
* Update typing;
* Remove unneeded scope
* Add tests for simplifying and cohorts
* Roll more of "do we need to join persons table" behavior into ClickhousePersonQuery class
* Handle precalculated cohort logic in sessions
* Simplify event query
* More tests without any JSONExtract
* Simplify entity properties as well
* Improve docstring
* Add test for breakdown & precalculated cohorts
* Add test for filtering sessions by precalculated cohorts
* Reset unneeded change
* Update cohort
* Solve some typing issues
* Update benchmarking
* Fix cohort filtering tests
* Fix cohort tests
* Fix a caching issue
* Typecheck
* Handle exclusion filters
* Simplify filters code
* Simplify filters ASAP if filter is created
* Simplify route
* Remove simplification-specific logic from queries
* Remove recursion, update tests
* Pass team in more cases
* Update column optimizer specs
* Test simplify
* Update trends test
* Fix rebase fail