* meaningful histogram buckets for tracking tokens (see sketch below)
* we're not being clever about deduplicating, let's only ever look at the most recent
* Select a random sample of eligible recordings
* make migration
* general flow
* abstract shared methods
* generate input
* remove postgres migration
* generate embedding strings
* remove random file
* Update query snapshots
* Update query snapshots
* feat: create periodic replay embedding
* first sketch of table
* batch and flush embeddings (see sketch below)
* add default to timestamp generation
* fetch recordings query
* save first embeddings to CH
* dump session metadata into tokens
* fix lint
* brain dump to help the future traveller
* prom timing instead
* fix input concatenation
* add an e :/
* obey mypy
* some time limits to reduce what we query
* a little fiddling to get it to run locally
* paging and counting
* Update query snapshots
* Update query snapshots
* move the AI stuff to EE for now
* Update query snapshots
* kick off the task with interval from settings (see sketch below)
* push embedding generation onto its own queue
* on a different queue
* EE to the max
* doh
* fix
* fangling
* Remove clashes so we can merge this into the other PR
* Remove clashes so we can merge this into the other PR
* start wiring up Celery task
* hmmm
* it's a chord (see sketch below)
* wire up celery simple version
* rename
* why is worker failing
* Update .run/Celery.run.xml
* update embedding input to remove duplicates
* ttl on the table
* Revert "update embedding input to remove duplicates"
This reverts commit 9a09d9c9f0.
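
For the "meaningful histogram buckets for tracking tokens" item, a minimal sketch of bucketing token counts with `prometheus_client`; the metric name and bucket edges are illustrative assumptions, not the values used in this PR:

```python
from prometheus_client import Histogram

# Metric name and bucket edges are illustrative assumptions, not this PR's values.
EMBEDDING_TOKEN_COUNT = Histogram(
    "replay_embedding_input_tokens",
    "Token count of each embedding input",
    buckets=(128, 256, 512, 1024, 2048, 4096, 8192, float("inf")),
)

# Observe the token count for one generated embedding input.
EMBEDDING_TOKEN_COUNT.observe(1536)
```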
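The "batch and flush embeddings" / "save first embeddings to CH" steps roughly amount to buffering rows and writing them in a single insert. A sketch under assumptions: the `clickhouse_driver` client, batch size, table and column names below are made up for illustration and are not the PR's actual schema.

```python
from clickhouse_driver import Client

client = Client(host="localhost")  # assumed connection; the real app wires this differently
BATCH_SIZE = 100  # hypothetical flush threshold
_batch: list[tuple[str, list[float]]] = []


def queue_embedding(session_id: str, embedding: list[float]) -> None:
    """Buffer a row and flush once the batch is full."""
    _batch.append((session_id, embedding))
    if len(_batch) >= BATCH_SIZE:
        flush_embeddings()


def flush_embeddings() -> None:
    """Write any buffered rows to ClickHouse in one INSERT."""
    if not _batch:
        return
    # Table and columns are illustrative only.
    client.execute(
        "INSERT INTO session_replay_embeddings (session_id, embeddings) VALUES",
        _batch,
    )
    _batch.clear()
```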
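For "kick off the task with interval from settings" and "push embedding generation onto its own queue", the Celery wiring is roughly a beat schedule entry routed to a dedicated queue. The task path, settings key, and queue name below are assumptions for illustration, not this PR's identifiers.

```python
from celery import Celery
from django.conf import settings

app = Celery("posthog")

# Task path, settings key, and queue name are assumptions, not this PR's identifiers.
app.conf.beat_schedule = {
    "generate-replay-embeddings": {
        "task": "ee.tasks.replay.generate_recordings_embeddings_batch",
        "schedule": settings.REPLAY_EMBEDDINGS_SYNC_INTERVAL_SECONDS,
        # Route onto its own queue so embedding work doesn't compete with default tasks.
        "options": {"queue": "session_replay_embeddings"},
    },
}
```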
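The "it's a chord" commit refers to Celery's chord primitive: fan a group of tasks out and run a callback once they have all completed. A minimal, self-contained sketch with hypothetical task names:

```python
from celery import chord, shared_task


@shared_task
def embed_recordings(recording_ids: list[str]) -> int:
    # ... generate and store embeddings for one page of recordings ...
    return len(recording_ids)


@shared_task
def log_completion(counts: list[int]) -> None:
    print(f"embedded {sum(counts)} recordings")


# One task per page of recordings; log_completion runs when every page is done.
pages = [["session-1", "session-2"], ["session-3"]]
chord(embed_recordings.s(page) for page in pages)(log_completion.s())
```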
---------
Co-authored-by: github-actions <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Paul D'Ambra <paul@posthog.com>