mirror of https://github.com/PostHog/posthog.git synced 2024-11-22 08:40:03 +01:00

43 Commits

Author SHA1 Message Date
Paul D'Ambra
432396c170
feat: using gzip by hand in the replay pipeline (#23479) 2024-07-08 07:48:20 +01:00
Tom Owers
08e2129068
feat(dev-ex): Auto reload celery without an external watcher, and with debugging su… (#23402)
Co-authored-by: github-actions <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Julian Bez <julian@posthog.com>
2024-07-03 10:55:52 +02:00
Julian Bez
69927a7675
feat(celery): Restart Celery automatically on code changes (#23279) 2024-06-27 17:02:11 +02:00
Paul D'Ambra
916c4c5527
chore: add even more replay ingestion metrics to the hourly task (#23169) 2024-06-22 13:15:05 +00:00
Sandy Spicer
23a789d9fe
chore: upgrade python to 3.11 🐍 (#22932)
🐍
2024-06-21 16:45:42 +00:00
Sandy Spicer
9cdbbcfefc
feat: move query performance polling to its own celery task in a performant manner (#22497) 2024-05-30 22:25:08 -07:00
Ben White
1b15f27ba8
feat: Swapped to express for plugin server http server (#20506) 2024-02-22 17:03:23 +00:00
Julian Bez
d13ffcf7f6
chore(celery): Change local Celery Pycharm run to solo for better debugging (#20439) 2024-02-20 10:50:10 +00:00
David Newell
4f6d9c8673
feat: generate recording text embeddings (#20046)
* make migration

* general flow

* abstract shared methods

* generate input

* remove postgres migration

* generate embedding strings

* remove random file

* Update query snapshots

* Update query snapshots

* feat: create periodic replay embedding

* first sketch of table

* batch and flush embeddings

* add default to timestamp generation

* fetch recordings query

* save first embeddings to CH

* dump session metadata into tokens

* fix lint

* brain dump to help the future traveller

* prom timing instead

* fix input concatenation

* add an e :/

* obey mypy

* some time limits to reduce what we query

* a little fiddling to get it to run locally

* paging and counting

* Update query snapshots

* Update query snapshots

* move the AI stuff to EE for now

* Update query snapshots

* kick off the task with interval from settings

* push embedding generation onto its own queue

* on a different queue

* EE to the max

* doh

* fix

* fangling

* Remove clashes so we can merge this into the other PR

* Remove clashes so we can merge this into the other PR

* start wiring up Celery task

* hmmm

* it's a chord

* wire up celery simple version

* rename

* why is worker failing

* Update .run/Celery.run.xml

* update embedding input to remove duplicates

* ttl on the table

* Revert "update embedding input to remove duplicates"

This reverts commit 9a09d9c9f0.

---------

Co-authored-by: github-actions <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Paul D'Ambra <paul@posthog.com>
2024-02-14 12:50:42 +00:00
Julian Bez
3e14b183da
chore: Remove without-gossip from PyCharm Celery run config (#19960) 2024-01-25 10:50:45 +00:00
Julian Bez
95fec19aaf
feat(celery): Prepare to run on multiple queues (#19157)
* Add Celery queues env file with default queues

Reasoning:
We need to configure Celery workers in several places to consume
from a specific set of queues.

* Define some queues
2024-01-17 11:54:12 +00:00
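A minimal sketch of the idea in 95fec19aaf, where one shared setting tells every worker launcher which queues to consume from. The variable name `CELERY_WORKER_QUEUES`, the helper names, and the `posthog` app name passed to `-A` are assumptions for illustration, not taken from the commit; `-Q` is Celery's standard option for selecting queues.

```python
import os

# "celery" is Celery's default queue name; used here as the fallback.
DEFAULT_QUEUES = ("celery",)


def worker_queues(env=None):
    """Read a comma-separated queue list from the environment (hypothetical variable name)."""
    env = os.environ if env is None else env
    raw = env.get("CELERY_WORKER_QUEUES", "")
    # Drop empty entries so stray commas or spaces do not create blank queue names.
    queues = [name.strip() for name in raw.split(",") if name.strip()]
    return queues or list(DEFAULT_QUEUES)


def worker_command(queues):
    """Build the argument list for a worker that consumes only the given queues."""
    return ["celery", "-A", "posthog", "worker", "-Q", ",".join(queues)]
```

Keeping the list in one place means a run configuration, a shell script, and a container entrypoint can all build the same worker command.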
Paul D'Ambra
16323959fd
feat: add ee licensed replay transformer (#18874)
first pass through using the EE licensed replay transformer in playback

Co-authored-by: github-actions <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: David Newell <d.newell1@outlook.com>
2023-11-29 10:41:18 +00:00
Julian Bez
c7e48b1c66
chore(pycharm): Include compound run configuration (#18739)
Include compound run configuration to run all four things at once
2023-11-20 11:06:30 +01:00
Paul D'Ambra
c3ec949474
fix: only use stringy distinct ids (#17255)
* fix: only use stringy distinct ids

* fix
2023-08-31 09:26:19 +01:00
Paul D'Ambra
0465857218
feat: refactor listing to join persons twice (#16760) 2023-07-26 12:12:37 +01:00
Ben White
d8df34f4ab
feat: Replay events consumer (#16642) 2023-07-20 14:41:25 +00:00
Paul D'Ambra
9ecd34553e
fix: delete is sometimes timing out (#16569)
* fix: delete is sometimes timing out

* but without importing from FE code
2023-07-14 08:48:40 +01:00
Paul D'Ambra
1621ad956c
feat: allow disabling high water mark processing (#16399)
Problem
We're not able to play back all of the recordings captured by the blob ingester. The offset high water mark processing is new, and if it isn't working correctly it would lead to us skipping data we should not skip.

Changes
Allow us to set an environment variable that uses a no-op high-water mark processor.
Also sneaks in removal of the no-longer-used SESSION_RECORDING_BLOB_PROCESSING_TEAMS env var.
2023-07-06 10:37:06 +01:00
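A rough sketch of the switch described in 1621ad956c: a high water marker that skips offsets it has already seen, and a no-op variant selected by an environment variable. The class names, method names, and the variable `DISABLE_HIGH_WATER_MARK` are hypothetical, and the real change lives in the plugin server rather than in Python.

```python
import os


class OffsetHighWaterMarker:
    """Remembers the highest offset seen per key so older messages can be skipped."""

    def __init__(self):
        self._marks = {}

    def add(self, key, offset):
        self._marks[key] = max(offset, self._marks.get(key, -1))

    def is_below(self, key, offset):
        # True means "already processed, safe to skip".
        return offset <= self._marks.get(key, -1)


class NoOpHighWaterMarker:
    """Never reports a message as already processed, so nothing is skipped."""

    def add(self, key, offset):
        return None

    def is_below(self, key, offset):
        return False


def build_high_water_marker(env=None):
    """Pick the no-op marker when the (hypothetical) env var is set to a truthy value."""
    env = os.environ if env is None else env
    disabled = env.get("DISABLE_HIGH_WATER_MARK", "").lower() in {"1", "true", "yes"}
    return NoOpHighWaterMarker() if disabled else OffsetHighWaterMarker()
```

Because both markers expose the same two methods, the consumer code does not need to know which one it was given. With the no-op marker every message is processed, which trades some duplicate work for the certainty that no data is skipped.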
Paul D'Ambra
892d829cbe
chore: log message too large error (#16182)
* chore: log message too large error

* send data to loki not sentry, they rhyme
2023-06-22 12:41:40 +01:00
Paul D'Ambra
6835b4d658
fix: message size and compression settings for session recording production (#16171)
* fix: message size and compression settings for session recording production

* fix config setting

* actually use the message size

* Update query snapshots

* Update query snapshots

* fix default and tests

---------

Co-authored-by: github-actions <41898282+github-actions[bot]@users.noreply.github.com>
2023-06-21 14:35:12 +01:00
Paul D'Ambra
5e6af4300c
feat: separate blob ingestion topic (#16145) 2023-06-20 14:25:39 +02:00
Paul D'Ambra
6aaec89a0a
chore: add a test that shows the session recording kafka hosts end up in the session recording kafka consumer (#16132) 2023-06-19 14:27:20 +01:00
Ben White
8deaf4e8ea
feat: Swapped to use KAFKA_HOSTS everywhere (#16109)
* Swapped to use KAFKA_HOSTS everywhere

* Fixed up type of kafka config options and setup separate kafka hosts for blob consumer

* allow session recordings to have its own kafka security protocol

* remove slash commands from this pr

* syntax must be obeyed

* Update UI snapshots for `chromium` (2)

* Update UI snapshots for `chromium` (2)

* fix

* Update query snapshots

* no empty strings in kafka hosts

* fix snapshot

* fix test

---------

Co-authored-by: Paul D'Ambra <paul@posthog.com>
Co-authored-by: github-actions <41898282+github-actions[bot]@users.noreply.github.com>
2023-06-19 12:15:17 +01:00
Ben White
27b75226b0
feat: Completely separate ingestion for replay events (#16024)
---------

Co-authored-by: Paul D'Ambra <paul@posthog.com>
Co-authored-by: github-actions <41898282+github-actions[bot]@users.noreply.github.com>
2023-06-15 14:13:28 +01:00
Paul D'Ambra
067d73cb4f
feat: write recording summary events (#15245)
Problem
see #15200 (comment)

When we store session recording events we materialize a lot of information using the snapshot data column.

We'll soon not be storing the snapshot data so won't be able to use that to materialize that information, so we need to capture it earlier in the pipeline. Since this is only used for searching for/summarizing recordings we don't need to store every event.

Changes
We'll push a summary event to a new Kafka topic during ingestion. ClickHouse can ingest from that topic into an aggregating merge tree table, so that we store (in theory, although not in practice) only one row per session.

add config to turn this on and off by team in plugin server
add code behind that to write session recording summary events to a new topic in kafka
add ClickHouse tables to ingest and aggregate those summary events
2023-05-09 14:41:16 +00:00
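To illustrate the "one row per session" goal in 067d73cb4f, here is a small in-memory sketch of folding many summary events into a single row per session. The field names and the choice of min, max, and sum as merge rules are assumptions for illustration; the actual aggregation happens inside ClickHouse, not in Python.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SummaryEvent:
    # Hypothetical fields; the real summary events carry more information.
    session_id: str
    first_timestamp: int
    last_timestamp: int
    click_count: int


def merge(a, b):
    """Combine two partial summaries for the same session into one."""
    return SummaryEvent(
        session_id=a.session_id,
        first_timestamp=min(a.first_timestamp, b.first_timestamp),
        last_timestamp=max(a.last_timestamp, b.last_timestamp),
        click_count=a.click_count + b.click_count,
    )


def aggregate(events):
    """Fold a stream of summary events into one row per session_id."""
    rows = {}
    for event in events:
        existing = rows.get(event.session_id)
        rows[event.session_id] = event if existing is None else merge(existing, event)
    return rows
```

Each merge rule is associative and commutative, so partial rows can be combined in any order and still converge on the same result. That property is what lets background merges collapse rows gradually, which is consistent with the commit's note that one row per session holds in theory but not always in practice.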
Paul D'Ambra
359177127d
fix: push the buffer files storage down a level (#15295) 2023-05-02 08:12:52 +01:00
Paul D'Ambra
9e640bb9eb
feat: query into log comment (#14722)
* add the capture time to see env variable but disabled cos I always have to check what it is

* feat: tag query onto log comment
2023-03-13 20:37:32 +00:00
Paul D'Ambra
de4c83040d
feat: show editor panel for query-based insights (#14467)
You can now switch insights to "query" mode but the UI flow to edit them is different to other insights

Was a stacked branch... work really begins at commit e3b979c
2023-03-02 22:34:39 +00:00
Paul D'Ambra
81e33ffb2b
feat: add verbose logging for kea (#14457)
* feat: add verbose logging for kea

* Update UI snapshots for `chromium` (2)

* Update UI snapshots for `chromium` (2)

* Update UI snapshots for `chromium` (2)

* Update UI snapshots for `chromium` (2)

* Update frontend/src/initKea.ts

Co-authored-by: Thomas Obermüller <thomas.obermueller@gmail.com>

* present for discovery, but off

* run prettier

* Update UI snapshots for `chromium` (2)

* Update UI snapshots for `chromium` (2)

* Update UI snapshots for `chromium` (2)

---------

Co-authored-by: github-actions <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Thomas Obermüller <thomas.obermueller@gmail.com>
2023-02-28 21:52:24 +00:00
Paul D'Ambra
e04f7210e1
chore: pycharm run config after updating to 3.10 (#14369)
* chore:pycharm run config after updating to 3.9

* now for 3.10
2023-02-24 14:23:33 +00:00
Ben White
cb7e7d5e5e
feat: Added performance API (#13452) 2023-01-06 09:51:51 +01:00
Paul D'Ambra
525ac12045
feat: markdown upload media (#12231)
# Problem

You can add gifs and images to text cards but need to know how to write markdown (and realise it is possible)

# Changes

* Adds an API (authenticated) to allow image upload
* Adds an endpoint to view images (immutable cache headers set)
* Adds some basic validation
* Adds UI to allow drop of file onto a text card (well, any component using the LemonTextMarkdown) to upload the image and insert a link to it in the markdown content
2022-10-14 11:27:44 +01:00
Paul D'Ambra
18c5927f57
fix: celery autoimport was ignoring CSV exports (#10586)
* launch celery with debug logging

* autoimport a single task which decides what type of export to run

* still need to manually inject root folder so tests can clear up

* fix mock
2022-06-30 17:42:28 +01:00
Michael Matloka
64317238e6
refactor: Eliminate the KAFKA_ENABLED setting (#10059)
* refactor: Eliminate the `KAFKA_ENABLED` setting

* Remove dead code

* Consolidate plugin server test scripts and CI

* Fix CI command

* Remove Celery queues

* Rearrange test directories

* Update import paths
2022-05-30 18:39:33 +00:00
Paul D'Ambra
49e3ceef5c
feat(object storage): add unused object storage (#9846)
* feat(object_storage): add unused object storage with health checks

* only prompt debug users if object storage not available at preflight

* safe plugin server health check for unused object storage

* explicit object storage settings

* explicit object storage settings

* explicit object storage settings

* downgrade pip tools

* without spaces?

* like this?

* without updating pip?

* remove object_storage from dev volumes

* named volume on hobby

* lazily init object storage

* simplify conditional check

* reproduced error locally

* reproduced error locally

* object_storage_endpoint not host and port

* log more when checking kafka and clickhouse

* don't filter docker output

* add kafka to hosts before starting stack?

* silly cloud tests (not my brain)
2022-05-20 09:56:50 +01:00
Michael Matloka
500d4623ba
refactor: Yeet PRIMARY_DB (#9017)
* refactor: Yeet `PRIMARY_DB`

* Remove `db_backend`

* Eliminate "Analytics database in use"

* Satisfy mypy
2022-03-21 13:15:50 +01:00
Marius Andra
c16354c3d2
update pycharm configs (#8882) 2022-03-07 08:46:21 +00:00
Paul D'Ambra
3b14f7f20a
fix the date formatting typo (#8253) 2022-01-25 13:14:09 +01:00
Marius Andra
ffa638bbb2
update pycharm scripts for "postgres in docker" scenario (#7631) 2021-12-10 11:28:42 +00:00
Alex Gyujin Kim
e3690f3c4d
Update pycharm configs with new plugin server scripts (#7140)
* change start scripts for plugin-server

* fix'
2021-11-16 08:19:26 +01:00
Marius Andra
05d7817d14
update pycharm plugin server run configurations (#6742) 2021-10-29 13:48:39 +02:00
Marius Andra
f76d0b6521
Turbo mode (#6632)
* fix router redirect

* remove dependence on user var

* split scenes and sceneTypes out of sceneLogic

* rename LoadedScene type to SceneExport

* export SceneExport from most scenes

* use exported scene objects, warn if not exported

* fix type import bugs

* remove dashboard

* keep all loaded scene logics in memory

* fix sorting bugs

* support scenes with params, make it work with dashboards

* fetch result from dashboard if mounted

* fix mutations

* add lastTouch

* refactor scene parameters to include searchParams and hashParams

* add insights scene to scene router

* add insight router scene to scene router

* fix cohort back/forward bug

* this works

* bring back delayed loading of scenes

* set insight from dashboard also in afterMount

* split events, actions, event stats and properties stats into their own scenes

* refactor to options object for setInsight

* override filters

* clean filters for comparison

* fix cohort bug

* get a better feature flag

* make turbo mode faster by making non-turbo-mode slower

* less noise in failed tests

* fix tests

* flyby: add jest tests pycharm launcher

* clean up scenes

* add test for loading from dashboardLogic

* fix bug

* split test init code into two

* have the same data in the context and in the api

* add basic tests for sceneLogic

* run the latest and greatest

* fix menu highlight

* implement screaming snake

* only show scenes with logics
2021-10-26 20:08:45 +00:00
Marius Andra
06f3f3a3f3
PyCharm run configurations (#6026)
Adds a bunch of environment-agnostic run configurations that'll make it easier for anyone to get started on PyCharm
2021-09-20 13:03:37 +01:00