* make migration
* general flow
* abstract shared methods
* generate input
* remove postgres migration
* generate embedding strings
* remove random file
* Update query snapshots
* Update query snapshots
* feat: create periodic replay embedding
* first sketch of table
* batch and flush embeddings
* add default to timestamp generation
* fetch recordings query
* save first embeddings to CH
* dump session metadata into tokens
* fix lint
* brain dump to help the future traveller
* prom timing instead
* fix input concatenation
* add an e :/
* obey mypy
* some time limits to reduce what we query
* a little fiddling to get it to run locally
* paging and counting
* Update query snapshots
* Update query snapshots
* move the AI stuff to EE for now
* Update query snapshots
* kick off the task with interval from settings
* push embedding generation onto its own queue
* on a different queue
* EE to the max
* doh
* fix
* fangling
* Remove clashes so we can merge this into the other PR
* Remove clashes so we can merge this into the other PR
* start wiring up Celery task
* hmmm
* it's a chord
* wire up celery simple version
* rename
* why is worker failing
* Update .run/Celery.run.xml
* update embedding input to remove duplicates
* ttl on the table
* Revert "update embedding input to remove duplicates"
This reverts commit 9a09d9c9f0.
---------
Co-authored-by: github-actions <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Paul D'Ambra <paul@posthog.com>
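Pieced together, the commits above describe a periodic Celery job that pages through recent recordings, builds one embedding input string per session, fans out embedding generation on its own queue, and batch-flushes the vectors to a TTL'd ClickHouse table. A minimal sketch of that flow, assuming hypothetical task, queue, and table names (none of these are the real identifiers):

```python
# A minimal sketch, not the real implementation: task, queue, and table
# names here are assumptions for illustration.
from datetime import timedelta

from celery import Celery, chord

app = Celery("sketch")
app.conf.beat_schedule = {
    # "kick off the task with interval from settings"
    "generate-replay-embeddings": {
        "task": "sketch.generate_replay_embeddings",
        "schedule": timedelta(minutes=30),
    },
}

BATCH_SIZE = 100  # "paging and counting"


def fetch_recordings_without_embeddings(limit: int) -> list[dict]:
    """Placeholder for the time-limited, paged recordings query."""
    return []


def build_embedding_input(recording: dict) -> str:
    """Placeholder: concatenate session metadata into one input string."""
    return ""


@app.task(name="sketch.embed_recording")
def embed_recording(session_id: str, input_text: str):
    vector = [0.0] * 1536  # call your embedding model here
    return session_id, vector


@app.task(name="sketch.flush_embeddings")
def flush_embeddings(rows):
    from clickhouse_driver import Client

    # Batch insert into the TTL'd embeddings table in one round trip.
    Client(host="localhost").execute(
        "INSERT INTO session_replay_embeddings (session_id, embedding) VALUES", rows
    )


@app.task(name="sketch.generate_replay_embeddings")
def generate_replay_embeddings():
    page = fetch_recordings_without_embeddings(limit=BATCH_SIZE)
    if not page:
        return
    # "it's a chord": fan out one embedding task per recording on a dedicated
    # queue, then flush the whole batch to ClickHouse in the callback.
    header = [
        embed_recording.s(r["session_id"], build_embedding_input(r)).set(
            queue="session_replay_embeddings"
        )
        for r in page
    ]
    chord(header)(flush_embeddings.s().set(queue="session_replay_embeddings"))
```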
* Add Celery queues env file with default queues
Reasoning:
We need to configure Celery workers in several places to consume
from a specific set of queues.
* Define some queues
first pass at using the EE-licensed replay transformer in playback
Co-authored-by: github-actions <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: David Newell <d.newell1@outlook.com>
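The queues-from-env change reads naturally as a small helper; a sketch under assumed names (the env var and default list stand in for the real ones):

```python
import os

# Assumed names: the env var and default list stand in for the real ones.
DEFAULT_QUEUES = ["celery", "stats", "session_replay_embeddings"]


def get_celery_queues() -> list[str]:
    """Queues this worker consumes, e.g. CELERY_WORKER_QUEUES=celery,stats."""
    raw = os.getenv("CELERY_WORKER_QUEUES", "")
    queues = [q.strip() for q in raw.split(",") if q.strip()]
    return queues or DEFAULT_QUEUES
```

Every place that starts a worker then reads the same helper (or env file), so the default set of queues is defined exactly once.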
Problem
We're not able to play back all of the recordings captured by the blob ingester. The offset high-water-mark processing is new, and if it isn't working correctly it could lead to us skipping data we should not skip.
Changes
Allow us to set an environment variable that swaps in a no-op high-water-mark processor.
Sneaks in removal of the no-longer-used SESSION_RECORDING_BLOB_PROCESSING_TEAMS env var.
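The escape hatch is small: the consumer asks a high-water-mark processor whether an offset has already been persisted, and the env var swaps in an implementation that never says yes. A Python sketch of the shape (the real processor lives in the TypeScript plugin server; all names here are assumptions):

```python
import os


class OffsetHighWaterMark:
    """Remembers the highest persisted offset per partition so already
    flushed messages can be skipped on replay."""

    def __init__(self) -> None:
        self.marks: dict[int, int] = {}

    def update(self, partition: int, offset: int) -> None:
        self.marks[partition] = max(self.marks.get(partition, -1), offset)

    def should_skip(self, partition: int, offset: int) -> bool:
        return offset <= self.marks.get(partition, -1)


class NoopHighWaterMark(OffsetHighWaterMark):
    """Never skips: a safe fallback while the real processor is unproven."""

    def should_skip(self, partition: int, offset: int) -> bool:
        return False


def high_water_mark_from_env() -> OffsetHighWaterMark:
    # Hypothetical variable name; the real flag lives in plugin server config.
    if os.getenv("SESSION_RECORDING_DISABLE_OFFSET_HIGH_WATER_MARK"):
        return NoopHighWaterMark()
    return OffsetHighWaterMark()
```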
* Swapped to use KAFKA_HOSTS everywhere
* Fixed up type of kafka config options and setup separate kafka hosts for blob consumer
* allow session recordings to have its own kafka security protocol
* remove slash commands from this pr
* syntax must be obeyed
* Update UI snapshots for `chromium` (2)
* Update UI snapshots for `chromium` (2)
* fix
* Update query snapshots
* no empty strings in kafka hosts
* fix snapshot
* fix test
---------
Co-authored-by: Paul D'Ambra <paul@posthog.com>
Co-authored-by: github-actions <41898282+github-actions[bot]@users.noreply.github.com>
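The Kafka settings bullets above boil down to one parsing rule reused everywhere, plus optional overrides for the blob consumer; a sketch with assumed setting names:

```python
import os


def kafka_hosts(env_var: str = "KAFKA_HOSTS") -> list[str]:
    """Comma-separated host list; no empty strings in Kafka hosts."""
    return [h.strip() for h in os.getenv(env_var, "").split(",") if h.strip()]


# KAFKA_HOSTS everywhere, with the blob consumer allowed its own brokers and
# security protocol (these override names are assumptions).
KAFKA_BROKERS = kafka_hosts()
SESSION_RECORDING_KAFKA_BROKERS = (
    kafka_hosts("SESSION_RECORDING_KAFKA_HOSTS") or KAFKA_BROKERS
)
SESSION_RECORDING_KAFKA_SECURITY_PROTOCOL = os.getenv(
    "SESSION_RECORDING_KAFKA_SECURITY_PROTOCOL",
    os.getenv("KAFKA_SECURITY_PROTOCOL", "PLAINTEXT"),
)
```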
Problem
see #15200 (comment)
When we store session recording events we materialize a lot of information using the snapshot data column.
We'll soon stop storing the snapshot data, so we won't be able to use it to materialize that information; we need to capture it earlier in the pipeline instead. Since this is only used for searching for and summarizing recordings, we don't need to store every event.
Changes
We'll push a summary event to a new Kafka topic during ingestion. ClickHouse can ingest from that topic into an AggregatingMergeTree table, so that we store (in theory, although not in practice) only one row per session.
* add config to turn this on and off by team in the plugin server
* add code behind that config to write session recording summary events to a new Kafka topic
* add ClickHouse tables to ingest and aggregate those summary events
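The ingestion path described above (Kafka topic → ClickHouse Kafka engine table → materialized view → aggregating table) hinges on the table engine. A hedged sketch of what such a table could look like, via the Python driver; the schema and names are illustrative, not the real DDL:

```python
from clickhouse_driver import Client

client = Client(host="localhost")

# Illustrative schema only: column names and types are assumptions. Summary
# events land on a Kafka topic; a Kafka engine table plus a materialized view
# (omitted here) feed them into this aggregating table.
client.execute(
    """
    CREATE TABLE IF NOT EXISTS session_replay_summary
    (
        team_id Int64,
        session_id String,
        min_first_timestamp SimpleAggregateFunction(min, DateTime64(6)),
        max_last_timestamp SimpleAggregateFunction(max, DateTime64(6)),
        click_count SimpleAggregateFunction(sum, Int64),
        keypress_count SimpleAggregateFunction(sum, Int64)
    )
    ENGINE = AggregatingMergeTree()
    ORDER BY (team_id, session_id)
    """
)
```

Rows sharing the ORDER BY key collapse into one when parts merge, which is why it's one row per session in theory but not in practice: merges are asynchronous, so readers still aggregate at query time.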
You can now switch insights to "query" mode, but the UI flow to edit them differs from that of other insights
Was a stacked branch... work really begins at commit e3b979c
# Problem
You can add GIFs and images to text cards, but you need to know how to write Markdown (and realise it is possible)
# Changes
* Adds an API (authenticated) to allow image upload
* Adds an endpoint to view images (immutable cache headers set)
* Adds some basic validation
* Adds UI to allow dropping a file onto a text card (well, any component using LemonTextMarkdown) to upload the image and insert a link to it in the Markdown content
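A minimal sketch of the upload/view pair described in this list (hypothetical Django views; the endpoint names, limits, storage, and auth wiring are assumptions, not the real API):

```python
import secrets

from django.http import HttpResponse, JsonResponse

# A toy in-memory store standing in for real media storage.
ALLOWED_TYPES = {"image/png", "image/jpeg", "image/gif"}
MAX_SIZE = 4 * 1024 * 1024  # assumed limit
_STORE: dict[str, tuple[str, bytes]] = {}


def upload_image(request):
    """Authenticated upload with basic validation (auth decorator omitted)."""
    upload = request.FILES.get("image")
    if (
        upload is None
        or upload.content_type not in ALLOWED_TYPES
        or upload.size > MAX_SIZE
    ):
        return JsonResponse({"error": "invalid image"}, status=400)
    image_id = secrets.token_urlsafe(16)
    _STORE[image_id] = (upload.content_type, upload.read())
    return JsonResponse({"id": image_id, "url": f"/uploaded_media/{image_id}"})


def serve_image(request, image_id: str):
    """The content for an id never changes, so the browser may cache forever."""
    content_type, body = _STORE[image_id]
    response = HttpResponse(body, content_type=content_type)
    response["Cache-Control"] = "public, max-age=31536000, immutable"
    return response
```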
* launch celery with debug logging
* autoimport a single task which decides what type of export to run
* still need to manually inject root folder so tests can clear up
* fix mock
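The "single task" shape is a dispatcher: one autodiscovered Celery task inspects the export and hands off to the right exporter. A sketch with assumed names (the stand-in exporters would really render CSVs and screenshots):

```python
from celery import shared_task


# Stand-in exporters; names and signatures are assumptions.
def export_csv(asset_id: int) -> None: ...
def export_image(asset_id: int) -> None: ...


@shared_task
def export_asset(asset_id: int, export_format: str) -> None:
    """Single autodiscovered task that decides what type of export to run."""
    if export_format == "text/csv":
        export_csv(asset_id)
    else:
        export_image(asset_id)
```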
* refactor: Eliminate the `KAFKA_ENABLED` setting
* Remove dead code
* Consolidate plugin server test scripts and CI
* Fix CI command
* Remove Celery queues
* Rearrange test directories
* Update import paths
* feat(object_storage): add unused object storage with health checks
* only prompt debug users if object storage not available at preflight
* safe plugin server health check for unused object storage
* explicit object storage settings
* explicit object storage settings
* explicit object storage settings
* downgrade pip tools
* without spaces?
* like this?
* without updating pip?
* remove object_storage from dev volumes
* named volume on hobby
* lazily init object storage
* simplify conditional check
* reproduced error locally
* reproduced error locally
* object_storage_endpoint not host and port
* log more when checking kafka and clickhouse
* don't filter docker output
* add kafka to hosts before starting stack?
* silly cloud tests (not my brain)
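The object storage commits point at a lazily initialized client plus a health check that can fail without taking anything down; a sketch (env var names are assumptions, echoing the "object_storage_endpoint not host and port" fix):

```python
import os
from functools import lru_cache

import boto3
from botocore.client import Config


@lru_cache(maxsize=1)
def object_storage_client():
    """Lazily build the S3 client so importing settings opens no connections."""
    return boto3.client(
        "s3",
        # One full endpoint URL, not separate host and port.
        endpoint_url=os.environ["OBJECT_STORAGE_ENDPOINT"],
        aws_access_key_id=os.environ["OBJECT_STORAGE_ACCESS_KEY_ID"],
        aws_secret_access_key=os.environ["OBJECT_STORAGE_SECRET_ACCESS_KEY"],
        config=Config(signature_version="s3v4"),
    )


def object_storage_healthy(bucket: str) -> bool:
    """Health check that degrades gracefully while storage is optional."""
    try:
        object_storage_client().head_bucket(Bucket=bucket)
        return True
    except Exception:
        return False
```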
* fix router redirect
* remove dependence on user var
* split scenes and sceneTypes out of sceneLogic
* rename LoadedScene type to SceneExport
* export SceneExport from most scenes
* use exported scene objects, warn if not exported
* fix type import bugs
* remove dashboard
* keep all loaded scene logics in memory
* fix sorting bugs
* support scenes with params, make it work with dashboards
* fetch result from dashboard if mounted
* fix mutations
* add lastTouch
* refactor scene parameters to include searchParams and hashParams
* add insights scene to scene router
* add insight router scene to scene router
* fix cohort back/forward bug
* this works
* bring back delayed loading of scenes
* set insight from dashboard also in afterMount
* split events, actions, event stats and properties stats into their own scenes
* refactor to options object for setInsight
* override filters
* clean filters for comparison
* fix cohort bug
* get a better feature flag
* make turbo mode faster by making non-turbo-mode slower
* less noise in failed tests
* fix tests
* flyby: add jest tests pycharm launcher
* clean up scenes
* add test for loading from dashboardLogic
* fix bug
* split test init code into two
* have the same data in the context and in the api
* add basic tests for sceneLogic
* run the latest and greatest
* fix menu highlight
* implement screaming snake
* only show scenes with logics