# posthog/ee/clickhouse/queries/funnels/funnel_trends.py

from datetime import date, datetime
from itertools import groupby
from typing import Optional, Tuple, Type, Union, cast
from dateutil.relativedelta import relativedelta
from ee.clickhouse.queries.funnels.base import ClickhouseFunnelBase
from ee.clickhouse.queries.funnels.funnel import ClickhouseFunnel
from ee.clickhouse.queries.util import format_ch_timestamp, get_earliest_timestamp, get_time_diff, get_trunc_func_ch
from posthog.models.cohort import Cohort
from posthog.models.filters.filter import Filter
from posthog.models.team import Team
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"
HUMAN_READABLE_TIMESTAMP_FORMAT = "%-d-%b-%Y"
class ClickhouseFunnelTrends(ClickhouseFunnelBase):
    """
## Funnel trends assumptions

Funnel trends are a graph of conversion over time, meaning a Y ({conversion_rate}) for each X ({entrance_period}).

### What is {entrance_period}?

A funnel is considered entered by a user when they have performed its first step.
When that happens, we count it as an entrance into the funnel.

Now, our time series is based on a sequence of {entrance_period}s, each starting at {entrance_period_start}
and ending _right before the next_ {entrance_period_start}. A person is then counted at most once in each
{entrance_period}.
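
As a rough illustration of the bucketing described above, here is a minimal sketch in plain Python. It assumes a daily interval; the helper `entrance_period_start` and the sample data are hypothetical and only mirror the idea of truncating a timestamp to the start of its period, not the actual ClickHouse query.

```python
from datetime import datetime


def entrance_period_start(ts: datetime) -> datetime:
    # Hypothetical helper: truncate to the start of the day (daily interval assumed).
    return ts.replace(hour=0, minute=0, second=0, microsecond=0)


# (person_id, timestamp of the funnel's first step); illustrative data only.
entrances = [
    ("alice", datetime(2021, 6, 7, 9, 30)),
    ("alice", datetime(2021, 6, 7, 18, 0)),  # same period: alice is still counted once
    ("alice", datetime(2021, 6, 8, 8, 15)),  # next period: counted again there
    ("bob", datetime(2021, 6, 7, 23, 59)),
]

# A set per period guarantees a person is counted at most once in each period.
persons_per_period = {}
for person_id, ts in entrances:
    persons_per_period.setdefault(entrance_period_start(ts), set()).add(person_id)

for period_start, persons in sorted(persons_per_period.items()):
    print(period_start.date(), len(persons))
```

Using a set per period is just one way to express the "at most once" rule; the query itself achieves the same effect by grouping on person and period.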
### What is {conversion_rate}?
Each time a funnel is entered by a person, they have exactly {funnel_window_interval} {funnel_window_interval_unit} to go
through the funnel's steps. Later events are just not taken into account.

For {conversion_rate}, we need to know the reference steps: {from_step} and {to_step}.
By default they are respectively the first and the last steps of the funnel.

Then for each {entrance_period} we calculate {reached_from_step_count}, the number of persons
who entered the funnel and reached step {from_step} (along with all the steps leading up to it, if there are any).
Similarly we calculate {reached_to_step_count}, which is the number of persons from {reached_from_step_count}
who also reached step {to_step} (along with all the steps leading up to it, including of course step {from_step}).

{conversion_rate} is simply {reached_to_step_count} divided by {reached_from_step_count},
multiplied by 100 to be a percentage.
If no people have reached step {from_step} in the period, {conversion_rate} is zero.
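
The arithmetic above can be sketched for a single {entrance_period} as follows. This is an illustration under stated assumptions, not the code used by this class: the function name, the 1-indexed step numbers, the sample data, and the rounding to two decimals are all choices made for this example.

```python
def conversion_rate(reached_from_step_count: int, reached_to_step_count: int) -> float:
    # Nobody reached from_step in the period: the rate is defined as zero.
    if reached_from_step_count == 0:
        return 0.0
    # Rounding to two decimals is an assumption of this sketch.
    return round(reached_to_step_count / reached_from_step_count * 100, 2)


# Highest step each person completed within one entrance period (illustrative data).
steps_completed = {"alice": 3, "bob": 1, "carol": 2, "dave": 3}
from_step, to_step = 1, 3  # 1-indexed here; an assumption of this sketch

reached_from_step_count = sum(1 for steps in steps_completed.values() if steps >= from_step)
reached_to_step_count = sum(1 for steps in steps_completed.values() if steps >= to_step)

rate = conversion_rate(reached_from_step_count, reached_to_step_count)
print(reached_from_step_count, reached_to_step_count, rate)
```

Handling the empty period explicitly avoids a division by zero and matches the rule that a period with no entrants has a {conversion_rate} of zero.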
    """

    def __init__(
        self, filter: Filter, team: Team, funnel_order_class: Type[ClickhouseFunnelBase] = ClickhouseFunnel
    ) -> None:
        super().__init__(filter, team)
        self.funnel_order = funnel_order_class(filter, team)

    def _exec_query(self):
        return self._summarize_data(super()._exec_query())

    def get_step_counts_without_aggregation_query(
        self, *, specific_entrance_period_start: Optional[datetime] = None
    ) -> str:
        steps_per_person_query = self.funnel_order.get_step_counts_without_aggregation_query()
        interval_method = get_trunc_func_ch(self._filter.interval)

        # This is used by funnel trends when we only need data for one period, e.g. person per data point
        if specific_entrance_period_start:
            self.params["entrance_period_start"] = specific_entrance_period_start.strftime(TIMESTAMP_FORMAT)

        breakdown_clause = self._get_breakdown_prop()
        return f"""
            SELECT
                person_id,
                {interval_method}(timestamp) AS entrance_period_start,
                max(steps) AS steps_completed
                {breakdown_clause}
            FROM (
                {steps_per_person_query}
            )
            {"WHERE toDateTime(entrance_period_start) = %(entrance_period_start)s" if specific_entrance_period_start else ""}
            GROUP BY person_id, entrance_period_start {breakdown_clause}"""

    def get_query(self) -> str:
        step_counts = self.get_step_counts_without_aggregation_query()
        # Expects multiple rows for same person, first event time, steps taken.
        self.params.update(self.funnel_order.params)
        reached_from_step_count_condition, reached_to_step_count_condition, _ = self.get_steps_reached_conditions()
        interval_method = get_trunc_func_ch(self._filter.interval)

        if self._filter.date_from is None:
            _date_from = get_earliest_timestamp(self._team.pk)
        else:
            _date_from = self._filter.date_from
        num_intervals, seconds_in_interval, _ = get_time_diff(
            self._filter.interval or "day", _date_from, self._filter.date_to, team_id=self._team.pk
        )
        breakdown_clause = self._get_breakdown_prop()
        formatted_date_from = format_ch_timestamp(_date_from, self._filter)

        self.params.update(
            {
                "formatted_date_from": formatted_date_from,
                "seconds_in_interval": seconds_in_interval,
                "num_intervals": num_intervals,
            }
        )
        query = f"""
            SELECT
                entrance_period_start,
                reached_from_step_count,
                reached_to_step_count,
                if(reached_from_step_count > 0, round(reached_to_step_count / reached_from_step_count * 100, 2), 0) AS conversion_rate
                {breakdown_clause}
            FROM (
                SELECT
                    entrance_period_start,
                    countIf({reached_from_step_count_condition}) AS reached_from_step_count,
                    countIf({reached_to_step_count_condition}) AS reached_to_step_count
                    {breakdown_clause}
                FROM (
                    {step_counts}
                ) GROUP BY entrance_period_start {breakdown_clause}
            ) data
            RIGHT OUTER JOIN (
                SELECT
                    {interval_method}(toDateTime(%(formatted_date_from)s) + number * %(seconds_in_interval)s) AS entrance_period_start
                    {', breakdown_value as prop' if breakdown_clause else ''}
                FROM numbers(%(num_intervals)s) AS period_offsets
                {'ARRAY JOIN (%(breakdown_values)s) AS breakdown_value' if breakdown_clause else ''}
            ) fill
            USING (entrance_period_start {breakdown_clause})
            ORDER BY entrance_period_start ASC
            SETTINGS allow_experimental_window_functions = 1"""

        return query
    def get_steps_reached_conditions(self) -> Tuple[str, str, str]:
        # How many steps must have been done to count for the denominator of a funnel trends data point
        from_step = self._filter.funnel_from_step or 0
        # How many steps must have been done to count for the numerator of a funnel trends data point
        to_step = self._filter.funnel_to_step or len(self._filter.entities) - 1

        # Those who converted OR dropped off
        reached_from_step_count_condition = f"steps_completed >= {from_step+1}"
        # Those who converted
        reached_to_step_count_condition = f"steps_completed >= {to_step+1}"
        # Those who dropped off
        did_not_reach_to_step_count_condition = f"{reached_from_step_count_condition} AND steps_completed < {to_step+1}"
        return reached_from_step_count_condition, reached_to_step_count_condition, did_not_reach_to_step_count_condition
    def _summarize_data(self, results):
        breakdown_clause = self._get_breakdown_prop()

        summary = []
        for period_row in results:
            serialized_result = {
                "timestamp": period_row[0],
                "reached_from_step_count": period_row[1],
                "reached_to_step_count": period_row[2],
                "conversion_rate": period_row[3],
                "is_period_final": self._is_period_final(period_row[0]),
            }

            if breakdown_clause:
                serialized_result.update(
                    {
                        "breakdown_value": period_row[-1]
                        if isinstance(period_row[-1], str)
                        else Cohort.objects.get(pk=period_row[-1]).name
                    }
                )

            summary.append(serialized_result)
        return summary
    def _format_results(self, summary):
        if self._filter.breakdown:
            grouper = lambda row: row["breakdown_value"]
            sorted_data = sorted(summary, key=grouper)
            final_res = []
            for key, value in groupby(sorted_data, grouper):
                breakdown_res = self._format_single_summary(list(value))
                final_res.append({**breakdown_res, "breakdown_value": key})
            return final_res
        else:
            res = self._format_single_summary(summary)
            return [res]

    def _format_single_summary(self, summary):
        count = len(summary)
        data = []
        days = []
        labels = []
        for row in summary:
            timestamp: datetime = row["timestamp"]
            data.append(row["conversion_rate"])
            hour_min_sec = " %H:%M:%S" if self._filter.interval == "hour" or self._filter.interval == "minute" else ""
            days.append(timestamp.strftime(f"%Y-%m-%d{hour_min_sec}"))
            labels.append(timestamp.strftime(HUMAN_READABLE_TIMESTAMP_FORMAT))
        return {
            "count": count,
            "data": data,
            "days": days,
            "labels": labels,
        }

    def _is_period_final(self, timestamp: Union[datetime, date]):
        # A period is final once at least one full conversion window
        # has elapsed between its timestamp and the current UTC date.
        now = datetime.utcnow().date()
        intervals_to_subtract = cast(int, self._filter.funnel_window_interval) * -1
        interval_unit = (
            "day" if self._filter.funnel_window_interval_unit is None else self._filter.funnel_window_interval_unit
        )
        delta = relativedelta(**{f"{interval_unit}s": intervals_to_subtract})  # type: ignore
        completed_end = now + delta
        compare_timestamp = timestamp.date() if isinstance(timestamp, datetime) else timestamp
        is_final = compare_timestamp <= completed_end
        return is_final
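
A minimal standalone sketch of the finality check performed by `_is_period_final`, assuming the conversion window is expressed in whole days. It swaps `relativedelta` for the standard library's `timedelta`. The function name `is_period_final` and the `window_days` and `today` parameters are hypothetical, introduced only for illustration and testing.

```python
from datetime import date, datetime, timedelta, timezone
from typing import Optional, Union


def is_period_final(
    timestamp: Union[datetime, date],
    window_days: int,
    today: Optional[date] = None,
) -> bool:
    """Return True when `timestamp` lies at least `window_days` days before `today`."""
    # `today` is a hypothetical injection point so the check can be tested
    # deterministically; by default the current UTC date is used.
    if today is None:
        today = datetime.now(timezone.utc).date()

    # Latest date whose conversion window has fully elapsed.
    completed_end = today - timedelta(days=window_days)

    # datetime is a subclass of date, so test for datetime first.
    compare = timestamp.date() if isinstance(timestamp, datetime) else timestamp
    return compare <= completed_end
```

With a 7-day window and `today=date(2021, 6, 10)`, a period starting on 2021-06-03 or earlier counts as final, while one starting on 2021-06-05 could still gain conversions.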