mirror of https://github.com/PostHog/posthog.git synced 2024-11-29 03:04:16 +01:00
326 Commits

Author SHA1 Message Date
Harry Waye
9dcb8aa030
chore: fix plugin server tests for acitonMatcher (#16548)
* chore: fix plugin server tests for acitonMatcher

I broke these but there was an issue with CI so they were merged broken
but now maybe they are fixed?

* more fixes

* wip

* wip

* wip
2023-07-13 10:56:43 +01:00
Ben White
f3fedaa91d
fix: Session manager cleanup and deletion (#16521) 2023-07-13 11:48:49 +02:00
Paul D'Ambra
78a4ade041
chore: switch back to sync commit and gauge actual offset committed (#16540) 2023-07-12 21:30:09 +01:00
Ben White
cd2f7f398a
feat: Optimised blob storage team loading (#16486) 2023-07-12 10:21:06 +02:00
Ben White
0a84d018fc
feat: Reduce offset tracking (#16482) 2023-07-11 16:56:45 +02:00
Harry Waye
634472c3ad
fix: plugin mode string check (#16490)
* fix: plugin mode string check

Previously we had to keep these in sync, but now we can just use the
PluginServerMode enum directly.

* add test
2023-07-11 14:09:28 +00:00
Ben White
5a636f6bd1
feat: Optimise resource usage for blob ingester (#16478) 2023-07-11 15:11:36 +02:00
Paul D'Ambra
d068ba9410
feat: don't track byte size, it isn't useful enough (#16481)
* feat: track file line count not byte size

* just remove it
2023-07-11 13:23:01 +01:00
Harry Waye
8adac54130
chore(webhooks): remove abstractions from webhook consumer logic (#16418)
* chore(webhooks): remove abstractions from webhook consumer logic

Previously we were jumping through a few hoops to make webhook calls
e.g. still using the piscina abstraction, still using the runner code.
This commit removes those abstractions while still maintaining the
existing functionality wrt error handling and metrics gathering.

I'll leave further refactoring of the webhook consumer code to a
separate PR. For example, moving the statsd metrics to be based on
OpenMetrics instead. And further adding some tracing around key parts of
the webhook matching and firing logic.

* fix typing

* fix typing

* fix typing

* fix unit tests

* fix tests

* chore(plugin-server): simplify action manager deps

Previously we were passing in the kitchen sink, but the only
dependency is postgres. This should make it easier to e.g. refactor to
not need to load the kitchen sink on some deployments.

* chore(plugin-server): simplify hook commander deps

Previously we passed in DB which is a lot of stuff. Now we just pass in
the postgres pool.

* fix import
2023-07-10 11:02:04 +00:00
Harry Waye
12d0a29957
chore(plugin-server): simplify action matcher deps (#16429)
* chore(plugin-server): simplify action matcher deps

Specifically this only depends on postgres, so passing in the `DB`
object is unnecessary. This should make refactors easier to e.g. only
load the required dependencies when they are needed.

* pass only postgres to ActionMatcher
2023-07-07 15:02:46 +01:00
Paul D'Ambra
efb77278b4
feat: async commits in the blob ingester (#16387)
* feat: async commits in the blob ingester

* Fix
2023-07-05 14:03:29 +01:00
Tiina Turban
a45c51c8c9
feat: Break up webhooks from onEvent (#16316) 2023-07-05 12:29:28 +02:00
Tiina Turban
876ce6a43c
fix: Nuke siteUrlManager reducing PG load (#16290) 2023-07-05 12:19:32 +02:00
Paul D'Ambra
0130bea5be
chore: add some realtime snapshot playback observability (#16363)
* chore: add some realtime snapshot playback observability

* fixes
2023-07-04 12:32:16 +00:00
Harry Waye
055776d461
chore(exports): try calling heartbeat a bit more often (#16295)
* chore(exports): try calling heartbeat a bit more often

Looks like we end up rebalancing often. Possibly because we're not
sending the heartbeats in time and the session is timing out.

* wip

* fix tests
2023-06-29 13:55:31 +01:00
Paul D'Ambra
8a29cc679f
feat: estimate size of session on ingestion (#16241)
* migration to add the new column

* and populate it

* fix

* Update query snapshots

* test assertions

---------

Co-authored-by: github-actions <41898282+github-actions[bot]@users.noreply.github.com>
2023-06-27 08:12:42 +01:00
Paul D'Ambra
4d0b51f764
fix: fake timer blob consumer tests (#16245)
* fix: fake timer blob consumer tests

* Update plugin-server/tests/main/ingestion-queues/session-recording/session-recordings-blob-consumer.test.ts

* remove unnecessary advance time
2023-06-26 12:52:01 +01:00
Paul D'Ambra
2623a77c05
chore: skip test to unblock CI (#16243)
YOLO to unblock CI
2023-06-26 12:04:13 +01:00
Ben White
34f6dde752
feat: New offset committing logic (#16220) 2023-06-23 11:18:35 +00:00
Ben White
ffdda1d392
feat: Realtime playback for new ingestion flow (#15627) 2023-06-23 12:39:07 +02:00
Paul D'Ambra
6480a2d30f
fix: high-water mark committing before updating (#16202)
* fix: high water mark committing before updating

* make it safe to set as well as get before getAll has run

* hand the shopkeeper my wallet 😂

* 🤦
2023-06-22 16:10:33 +00:00
Paul D'Ambra
756e2ed91a
feat: Do not re process s3 writes (#15777) 2023-06-22 11:44:56 +02:00
Paul D'Ambra
ccbbee93e1
feat: control accepted session replay dates (#16142) 2023-06-20 15:33:16 +01:00
Xavier Vello
3414f8ebbc
chore(ingestion): re-introduce rdkafka consumer alongside kafkajs (#16048) 2023-06-20 14:29:26 +02:00
Harry Waye
924deae8dc
fix(ingestion): add DLQ for non-retriable errors (#16124)
* fix(ingestion): add DLQ for non-retriable errors

This is due to
https://posthog.slack.com/archives/C0185UNBSJZ/p1687006425094159 which
is causing some lag on ingestion.

* fix error handling

* fix tests
2023-06-17 22:14:00 +01:00
Ben White
27b75226b0
feat: Completely separate ingestion for replay events (#16024)
---------

Co-authored-by: Paul D'Ambra <paul@posthog.com>
Co-authored-by: github-actions <41898282+github-actions[bot]@users.noreply.github.com>
2023-06-15 14:13:28 +01:00
Paul D'Ambra
bdc346de41
feat: summarise console logs too (#15954)
We collect console logs in session recordings but you can't filter to find recordings that have them.

Worse, when we move to storing recordings in blob storage it would be impossible to filter for them.

Changes
* adds new columns to the session replay summary table that will let us add counts of levels log, warn, and error from console logs collected with recordings
* alters the recordings-ingestion consumer to count those three levels of log
2023-06-14 15:26:34 +01:00
Tiina Turban
7376e4fdff
feat: Person creation and update retries (#15925) 2023-06-12 16:07:13 +02:00
Michael Matloka
1aee725409
feat(webhooks): Support person display name preferences and nested properties (#15882)
* Use same logic for `[person]` webhook token as person display in the app

* Allow accessing nested properties in webhook message

* Update hooks.test.ts

* Fix team fetching test

* Update query snapshots

* Update query snapshots

---------

Co-authored-by: github-actions <41898282+github-actions[bot]@users.noreply.github.com>
2023-06-05 16:25:39 +00:00
Xavier Vello
6ff188c09e
perf(ingestion): don't block for each event's kafka produce (#15835) 2023-06-01 15:05:21 +02:00
Paul D'Ambra
dd796c8cab
fix: only track timestamps of completed chunks (#15801) 2023-05-30 22:00:24 +01:00
Paul D'Ambra
2e2721cb86
fix: idle partitions never flush (#15776) 2023-05-30 17:35:21 +01:00
Xavier Vello
bece269c32
perf(ingestion): revert rdkafka consumer for both ingestion queues (#15711)
* Revert "perf(ingestion): use rdkafka consumer for both ingestion queues (#15695)"

This reverts commit fea9e4d77c.

* format

* fix split test

* no really, fix the test

---------

Co-authored-by: Harry Waye <harry@posthog.com>
2023-05-30 17:02:11 +01:00
Xavier Vello
1ea6619e3a
perf(ingestion): re-add: don't batch events by distinct_id when consuming from overflow (#15785) 2023-05-30 15:17:01 +02:00
Xavier Vello
81c4214f5b
perf(ingestion): revert: don't batch events by distinct_id when consuming from overflow (#15782)
Revert "perf(ingestion): don't batch events by distinct_id when consuming from overflow (#15744)"

This reverts commit 89bd1d30aa.
2023-05-30 11:39:27 +01:00
Paul D'Ambra
42b54af3bd
chore: very specific logging (#15769)
* chore: add some very specific logging to figure out the impossible

* even more very specific logging

* track offsets too

* even more careful
2023-05-29 18:50:22 +01:00
Xavier Vello
89bd1d30aa
perf(ingestion): don't batch events by distinct_id when consuming from overflow (#15744) 2023-05-26 11:18:31 +02:00
Michael Matloka
39ad3cd68c
feat(actions): Support "Link target contains/matches regex" (#15535)
* Add `ActionStep.href_matching` + `ActionStep.text_matching`

* Use `href_matching` and `text_matching` in matching

* Show new matching options in the UI

* Update query snapshots

* Update UI snapshots for `chromium` (1)

* Update UI snapshots for `chromium` (2)

* Add support in the API

* Fix `LemonLabel` overflow

* Add support in the toolbar

* Update plugin server tests

* Add Django support

* Update query snapshots

* Add Django test

* Don't italicize text input placeholder

* Update UI snapshots for `chromium` (2)

* Update UI snapshots for `chromium` (1)

* Update query snapshots

* Fix typing

* Update query snapshots

* Fix typing more

* Update query snapshots

* Update UI snapshots for `chromium` (2)

* Update UI snapshots for `chromium` (1)

* Update query snapshots

---------

Co-authored-by: github-actions <41898282+github-actions[bot]@users.noreply.github.com>
2023-05-25 16:35:02 -07:00
Paul D'Ambra
0edad3ca33
chore: tidy up after blob tests (#15701) 2023-05-25 12:02:08 +01:00
Ben White
57e7ed7cfa
fix: Still commit offsets for empty buffer (#15710) 2023-05-25 08:45:59 +00:00
Xavier Vello
fea9e4d77c
perf(ingestion): use rdkafka consumer for both ingestion queues (#15695) 2023-05-25 10:42:55 +02:00
Xavier Vello
d389cf0ead
chore(ingestion): remove old batching code (#15689) 2023-05-25 09:58:55 +02:00
Paul D'Ambra
09164ab35d
fix: commit sync and commit one more (#15703) 2023-05-25 08:43:27 +02:00
Ben White
9dd75c770c
fix: Change the offset tracking logic for testing purposes (#15697) 2023-05-24 15:24:40 +00:00
Paul D'Ambra
631f799c6a
fix: remove empty session managers (#15683) 2023-05-24 13:04:30 +00:00
Tiina Turban
bf535847a5
chore: nuke person redis cache (#15458) 2023-05-24 13:46:12 +02:00
Xavier Vello
6cb7e53893
chore(ingestion): compute and report actual parallelism (#15688) 2023-05-24 13:31:24 +02:00
Paul D'Ambra
f3f6d9a77f
fix: less blocking chunks (#15687)
* fix: less blocking chunks

* fix
2023-05-24 10:12:04 +01:00
Paul D'Ambra
16d7a605dd
fix: pending chunks idle test (#15681)
* fix: pending chunks idle test

* remove dangling test

* fix
2023-05-23 18:28:54 +01:00
Paul D'Ambra
7858d5ad4c
chore: worrying at ingestion still (#15673)
* less pending chunk logging

* the flush threshold multiplier is too confusing to operate

* must provide all chunk offsets and the correct chunks when processing pending chunks
2023-05-23 13:48:55 +00:00
Paul D'Ambra
e6551c31c2
feat: key blobs by event times (#15640)
* feat: key blobs by event times

* remove TODO

* softly softly

* Update plugin-server/src/main/ingestion-queues/session-recording/blob-ingester/session-manager.ts

Co-authored-by: Ben White <ben@posthog.com>

---------

Co-authored-by: Ben White <ben@posthog.com>
2023-05-22 16:36:30 +01:00
Paul D'Ambra
10e34efd93
feat: handle chunks less aggresively on flush (#15628)
Co-authored-by: Ben White <ben@posthog.com>
2023-05-22 14:04:00 +01:00
Paul D'Ambra
473b31bf58
feat: back off buffer age threshold as lag grows (#15626)
* feat: back off buffer age threshold as lag grows

* but not _forever_

* max backoff configurable
2023-05-19 17:20:09 +01:00
Xavier Vello
2f27f43406
perf(ingestion): port overflow logic into eachBatchParallelIngestion (#15621) 2023-05-19 16:37:27 +02:00
Paul D'Ambra
5f936b45f0
feat: be more tolerant of lag when bundling (#15596) 2023-05-19 13:48:06 +01:00
Xavier Vello
4f8b981057
perf(ingestion): increase event processing concurrency (#15612) 2023-05-19 08:46:34 +00:00
Paul D'Ambra
9a7bcfce5c
chore: track blocking session when committing offsets (#15606)
* chore: track blocking session when committing offsets

* track lowest session id even after removal
2023-05-18 11:06:15 +01:00
Paul D'Ambra
aa4ec230c2
fix: respect opt out (#15583) 2023-05-17 12:19:02 +01:00
Xavier Vello
10f1bcc28d
chore(ingestion): dont wait on logs to be persisted to Kafka (#15579)
* chore(ingestion): dont wait on logs to be persisted to Kafka

With how bad concurrency is atm re. batching, we end up waiting a long
time on logs to be persisted to Kafka. We don't need to guarantee logs
so instead we let these be async. We still want to await for the message
to have been handled by librdkafka queuing internals though so we still
await but ask that we resolve as soon as we've handed off the message to
librdkafka.

---------

Co-authored-by: Harry Waye <harry@posthog.com>
2023-05-17 10:21:26 +00:00
Paul D'Ambra
d068f2fd53
fix: ensure that offsets are sorted (#15580) 2023-05-17 11:09:09 +01:00
Paul D'Ambra
4509f98628
feat: remove summary config guard (#15572)
* feat: remove summary config guard

* fix
2023-05-17 09:14:48 +01:00
Paul D'Ambra
d1c2fa84fa
fix: support double decoding base64 on decompress (#15542) 2023-05-15 13:48:17 +01:00
Paul D'Ambra
e962afabe1
fix: from logs to changes (#15539)
* fix: from logs to changes

* Update plugin-server/src/main/ingestion-queues/session-recording/blob-ingester/session-manager.ts

* use the key from the map instead of calculating it

* 🙈
2023-05-14 11:59:21 +01:00
Paul D'Ambra
9305651289
chore: worrying at what is happening (#15479)
* chore: worrying at why the blob ingester gets unhappy

* log when file deletion succeeds

* can you wait for e2e ci step without a specified build name

* wait on build posthog cloud?

* disable the step for now

* rugh
2023-05-11 10:57:39 +01:00
Harry Waye
616389713b
revert: use rdkafka consumer for analytics ingestion and onEvent (#15469)
Revert "chore: use rdkafka consumer for analytics ingestion and onEvent (#15432)"

This reverts commit 85bb582cee.
2023-05-10 12:22:38 +00:00
Harry Waye
85bb582cee
chore: use rdkafka consumer for analytics ingestion and onEvent (#15432)
This _should_ give us better performance and reliability, but it's
hard to tell without a lot of testing. Will monitor closely on rollout.

Note that this will require a delete on the old consumer members as they
are using the eager round-robin partition strategy, whereas this is using
the cooperative sticky partition strategy. librdkafka doesn't support
mixing the two, unlike the Java Kafka Client.

---------

Co-authored-by: Tiina Turban <tiina303@gmail.com>
2023-05-10 11:15:02 +00:00
Ben White
6378d66d30
fix: Correct decision for oldest timestamp in blob consumer (#15452) 2023-05-09 15:40:18 +00:00
Paul D'Ambra
067d73cb4f
feat: write recording summary events (#15245)
Problem
see #15200 (comment)

When we store session recording events we materialize a lot of information using the snapshot data column.

We'll soon not be storing the snapshot data so won't be able to use that to materialize that information, so we need to capture it earlier in the pipeline. Since this is only used for searching for/summarizing recordings we don't need to store every event.

Changes
We'll push a summary event to a new kafka topic during ingestion. ClickHouse can ingest from that topic into an aggregating merge tree table, so that we store (in theory, although not in practice) only one row per session.

* add config to turn this on and off by team in plugin server
* add code behind that to write session recording summary events to a new topic in kafka
* add ClickHouse tables to ingest and aggregate those summary events
2023-05-09 14:41:16 +00:00
Tiina Turban
4cd5447f0e
chore: Nuke buffer pipeline code (#15404) 2023-05-09 14:50:18 +02:00
Paul D'Ambra
92b04ae84b
fix: flush sessions when idle not when buffer has reached an age (#15405)
* get rid of the annoying type errors

* fix: flush sessions when idle not based on buffer file age

* inline

* push timestamp into metadata arg
2023-05-05 17:45:12 +01:00
Harry Waye
cff0dab1ee
fix(plugin-server): send headers as well with KafkaProducerWrapper (#15382)
I forgot to pass this through. I think we nuked the buffer tests so was
only apparent in production :grimace:
2023-05-04 15:28:53 +00:00
Harry Waye
2f9e2928fe
chore(plugin-server): use librdkafka producer everywhere (#15314)
* chore(plugin-server): use librdkafka producer everywhere

We saw some 10x improvements in the throughput for session recordings.
Hopefully there will be more improvements here as well, although it's a
little less clear cut.

I don't try to provide any improvements in guarantees around message
production here.

* we still need to enable snappy for kafkajs
2023-05-04 13:02:44 +00:00
Tiina Turban
a5544cf7e4
feat: Async handlers use person info from event (#15307) 2023-05-04 13:25:56 +02:00
Ben White
83d57c5d77
fix: Fixed the logic for assigning and revoking partitions (#15350)
* fix: Fixed the logic for assigning and revoking partitions

* Fix

* reverse the offset manager revoke logic

---------

Co-authored-by: Paul D'Ambra <paul@posthog.com>
2023-05-03 17:42:54 +01:00
Harry Waye
7ba6fa7148
chore(plugin-server): remove piscina workers (#15327)
* chore(plugin-server): remove piscina workers

Using Piscina workers introduces complexity that would rather be
avoided. It does offer the ability to scale work across multiple CPUs,
but we can achieve this via starting multiple processes instead. It may
also provide some protection from deadlocking the worker process, which
I believe Piscina will handle by killing worker processes and
respawning, but we have K8s liveness checks that will also handle this.

This should simplify 1. prom metrics exporting, and 2. using
node-rdkafka.

* remove piscina from package.json

* use createWorker

* wip

* wip

* wip

* wip

* fix export test

* wip

* wip

* fix server stop tests

* wip

* mock process.exit everywhere

* fix health server tests

* Remove collectMetrics

* wip
2023-05-03 14:42:16 +00:00
Harry Waye
96fe16fd3c
chore(recordings): use cooperative-sticky rebalance strategy (#15260)
Revert "revert(recordings): use cooperative-sticky rebalance strategy (#15211)"

This reverts commit a40f01138e.
2023-04-26 13:09:13 +00:00
Ben White
fdb2c71a39
feat: S3 backed recording ingestion (take 2) (#14864) 2023-04-25 09:43:07 +00:00
Harry Waye
a40f01138e
revert(recordings): use cooperative-sticky rebalance strategy (#15211)
Revert "chore(recordings): use cooperative-sticky rebalance strategy (#15197)"

This reverts commit 3eddb96b9b.
2023-04-24 15:06:33 +00:00
Harry Waye
3eddb96b9b
chore(recordings): use cooperative-sticky rebalance strategy (#15197)
* chore(recordings): use cooperative-sticky rebalance strategy

This should make rebalances and lag during deploys a little less
painful. I'm setting this as the globally used strategy, when we e.g.
want to use another strategy for a specific consumer group, we can make
this configurable.

* disable rebalance_callback

* use node-rdkafka-acosom fork instead, for cooperative support
2023-04-24 13:25:24 +00:00
Xavier Vello
5058f71ccf
chore(ingestion): move mmdb database in worker process (#15173) 2023-04-24 11:34:52 +02:00
Harry Waye
67465fce9c
chore(recordings): Add librdkafka to recordings consumer (#15091)
* chore(recordings): Add librdkafka to recordings consumer

This is the sister PR to the change to use the librdkafka producer in
the recordings consumer.

Things of interest here:

 1. we use offset auto commit
 2. we handle storing the offset ourselves, after the message has been
    processed
 3. we do everything concurrently
 4. we implement back pressure based on the number of messages in
    flight

* Update plugin-server/src/kafka/admin.ts

Co-authored-by: Xavier Vello <xavier.vello@gmail.com>

* Update plugin-server/src/kafka/admin.ts

Co-authored-by: Xavier Vello <xavier.vello@gmail.com>

* Update plugin-server/src/kafka/admin.ts

Co-authored-by: Xavier Vello <xavier.vello@gmail.com>

* Update plugin-server/src/kafka/consumer.ts

Co-authored-by: Xavier Vello <xavier.vello@gmail.com>

* Update plugin-server/src/kafka/consumer.ts

Co-authored-by: Xavier Vello <xavier.vello@gmail.com>

* add default queued values

* clarify linger

---------

Co-authored-by: Xavier Vello <xavier.vello@gmail.com>
2023-04-19 14:42:34 +00:00
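The back pressure mentioned in point 4 of the commit above can be illustrated with a small counter that caps how many messages are being processed at once. This is only a sketch under stated assumptions: `InFlightLimiter`, `available`, `start` and `finish` are hypothetical names invented for illustration, and the actual consumer in `plugin-server/src/kafka/consumer.ts` may work quite differently.

```typescript
// Hypothetical sketch of back pressure based on the number of in-flight messages.
// None of these names come from the plugin-server code base.
class InFlightLimiter {
    private maxInFlight: number
    private inFlight = 0

    constructor(maxInFlight: number) {
        this.maxInFlight = maxInFlight
    }

    // How many more messages may be handed out for processing right now.
    available(): number {
        return Math.max(0, this.maxInFlight - this.inFlight)
    }

    // Mark `count` messages as started; refuses to exceed the configured limit.
    start(count: number): void {
        if (count > this.available()) {
            throw new Error(`cannot start ${count} messages, only ${this.available()} slots free`)
        }
        this.inFlight += count
    }

    // Mark `count` messages as fully processed, freeing their slots.
    finish(count: number): void {
        this.inFlight = Math.max(0, this.inFlight - count)
    }
}
```

A consumer loop built on this would ask `available()` before each fetch and skip fetching while it returns 0, so that slow processing naturally slows consumption.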
Harry Waye
997b3ff9dd
fix(plugin-server): only set ssl config when defined (#15071)
* Revert "Revert "perf(recordings): use node-librdkafka for ingester production" (#15069)"

This reverts commit ac5e084f48.

* fix(plugin-server): only set ssl config when defined

Hopefully this means it will use the global CA bundle.

* hack: enable debug logs

* honor KAFKAJS_LOG_LEVEL envvar

* add SegfaultHandler

* disable ssl verification

* debug -> info

* only log brokers

* Revert "add SegfaultHandler"

This reverts commit b22f40b802.

---------

Co-authored-by: Xavier Vello <xavier@posthog.com>
2023-04-13 16:52:51 +01:00
Xavier Vello
ac5e084f48
Revert "perf(recordings): use node-librdkafka for ingester production" (#15069)
Revert "perf(recordings): use node-librdkafka for ingester production (#15041)"

This reverts commit 7f852ab618.
2023-04-13 11:23:22 +01:00
Xavier Vello
7f852ab618
perf(recordings): use node-librdkafka for ingester production (#15041) 2023-04-13 11:55:16 +02:00
Xavier Vello
adc0acc4bc
chore(recordings): revert: use node-librdkafka for ingester production (#15032)
Revert "chore(recordings): use node-librdkafka for ingester production (#14460)"

This reverts commit c34979853e.
2023-04-11 17:10:49 +00:00
Harry Waye
c34979853e
chore(recordings): use node-librdkafka for ingester production (#14460)
Previously we've been using the KafkaJS Producer with a wrapper around
it to handle batching. There are a number of issues with the batching
implementation e.g. not having a way to provide guarantees on delivery
and rather than fix that, we can simply use the librdkafka Producer
which is a lot more mature and battle-tested.
2023-04-11 16:44:39 +01:00
Xavier Vello
2a5f6d3691
fix(tests): make getEventsByPerson output stable to avoid flakes (#15009) 2023-04-07 12:00:11 +00:00
Tiina Turban
0163b63dad
fix: Remove sentry noise for missing group_type (#14825) 2023-03-20 16:16:54 +01:00
Tiina Turban
f065757ae7
revert "chore: person props last-op/ts clean-up" (#14741)
Revert "chore: person props last-op/ts clean-up (#14316)"

This reverts commit 4bf7ddd8e9.
2023-03-14 16:36:44 +00:00
Tiina Turban
4bf7ddd8e9
chore: person props last-op/ts clean-up (#14316) 2023-03-14 16:35:53 +01:00
Xavier Vello
504a3edab8
chore: rename runLightweightCaptureEndpointEventPipeline to runEventPipeline (#14365) 2023-03-14 15:38:30 +01:00
Tiina Turban
d35239c658
feat: Add merge_dangerously event (#14625) 2023-03-08 14:37:48 +01:00
Harry Waye
d840c5ac09
chore: fix flakey session recordings error case test (#14562)
* chore: fix flakey session recordings error case test

We need to make sure that we hit the case where the events are produced
for ClickHouse ingestion. The signature of the `queueMessage` function
is interesting in that its behaviour depends on some heuristics as to
whether it will flush and therefore reject or not.

I would like to change this behavior but my preference would be to
update to use rdkafka and update to have more sensible behaviour then.

* Add call expects
2023-03-06 13:01:23 +00:00
Harry Waye
fdf97c9a92
chore(kafka): ensure retry on kafkajs produce failure (#14543)
* chore(kafka): ensure retry on kafkajs produce failure

This is a fix to ensure that we do not simply drop events when Kafka is
e.g. down. We were previously catching the KafkaJSError but it seems the
errors are always run using the `retry` function, which means we always
get a KafkaJSNumberOfRetriesExceeded error.

* wip

* use real timers
2023-03-06 10:37:16 +00:00
Harry Waye
dcc9acc47d
chore(recordings): remove hub dependency on recordings ingestion (#14418)
* chore(recordings): remove hub dependency on recordings ingestion

Hub is a grab bag of dependencies that are not all required for
recordings ingestion. To keep the recordings ingestion lean, we
remove the hub dependency and use the postgres and kafka client
directly.

This should increase the availability of the session recordings
workload, e.g. it should not go down if Redis or ClickHouse is down.

* fix capabilities call

* reuse clients if available

* wip

* wip

* wip

* fix tests

* fix healthcheck
2023-02-28 10:23:07 +00:00
Raquel Smith
bc9b449ae5
fix: set msclkid on person like gclid (#14386)
* set msclkid on person like gclid

* fix typos (thx copilot)
2023-02-24 19:35:35 +00:00
Xavier Vello
25cb653c1b
feat(ingestion): run all analytic events through populateTeamDataStep (#14341)
* team-manager: expire negative lookups after 5 minutes, improve docs
* populateTeamDataStep: don't drop token, keep team_id from capture if present, report results
* ingestEvent: run all analytic events through runLightweightCaptureEndpointEventPipeline
* continue accepting events with no token but a team_id
2023-02-22 14:24:05 +01:00
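The first bullet of the commit above (expiring negative lookups after 5 minutes) can be sketched as a small cache in which only "not found" results carry a time-to-live. Everything here is an assumption made for illustration: `TeamLookupCache`, `NEGATIVE_TTL_MS` and the injected `lookup` function are hypothetical, and the real team-manager very likely handles positive entries and storage differently.

```typescript
// Hypothetical sketch: negative lookups (token not found) expire after 5 minutes.
// Positive results are kept indefinitely here purely to keep the sketch short.
const NEGATIVE_TTL_MS = 5 * 60 * 1000

type CacheEntry = { teamId: number | null; cachedAtMs: number }

class TeamLookupCache {
    private entries: { [token: string]: CacheEntry | undefined } = {}
    private lookup: (token: string) => number | null

    constructor(lookup: (token: string) => number | null) {
        this.lookup = lookup
    }

    // `nowMs` is passed in explicitly so the behaviour is easy to test.
    get(token: string, nowMs: number): number | null {
        const cached = this.entries[token]
        if (cached !== undefined) {
            const isNegative = cached.teamId === null
            const expired = isNegative && nowMs - cached.cachedAtMs >= NEGATIVE_TTL_MS
            if (!expired) {
                return cached.teamId
            }
        }
        // Cache miss or expired negative entry: ask the backing store again.
        const teamId = this.lookup(token)
        this.entries[token] = { teamId, cachedAtMs: nowMs }
        return teamId
    }
}
```

Expiring the negative entries means a token that was unknown at first (for example, a team created moments later) is retried after the TTL rather than being rejected forever, while repeated lookups of a bad token still avoid hitting the database on every event.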
Tiina Turban
5f24e281b1
fix: groupidentify to always update props and ignore timestamps (#14240) 2023-02-21 17:07:47 +01:00
Tomás Farías Santana
4f2412ea9a
fix(slowlane-ingestion): Access db attribute instead of hub (#14292) 2023-02-17 15:27:36 +01:00
Tomás Farías Santana
66750cf2cc
refactor: Set key to null during batching to process in parallel (#14278)
* fix(slowlane-ingestion): Clarify in the warning we are still processing

* docs(slowlane-ingestion): Clarify we are re-producing when running with ingestionOverflow enabled

* refactor(slowlane-ingestion): Set key to null during batching to process in parallel

* refactor(slowlane-ingestion): Simplify batching logic and send warning on eachMessage

* fix: Add missing whitespace in comment

Co-authored-by: Tiina Turban <tiina303@gmail.com>

* test(slowlane-ingestion): Assert event pipeline doesn't run if overflowing

* fix: Check for batch length bigger than  batchSize

Co-authored-by: Tiina Turban <tiina303@gmail.com>

* refactor(slowlane-ingestion): Raise warning on overflow consumer instead

* test(slowlane-ingestion): Add tests for overflow consumer

* refactor(slowlane-ingestion): Use groupIntoBatches utility in overflow consumer

---------

Co-authored-by: Tiina Turban <tiina303@gmail.com>
2023-02-17 10:42:58 +01:00
Tomás Farías Santana
1e94d8e138
feat(ingestion-slowlane): Re-route events in plugin-server on capacity exceeded (#14211)
* feat(ingestion-slowlane): Add token-bucket utility

* feat(ingestion-slowlane): Re-route overflow events

* fix: Import missing stringToBoolean

* fix(ingestion-slowlane): Flip around kafka topics according to mode

* refactor(ingestion-slowlane): Use dash instead of underscore in filename

* fix(ingestion-slowlane): Do not increase tokens beyond bucket capacity

* feat(ingestion-slowlane): Add ingestion-overflow mode/capability/consumer

* feat(ingestion-slowlane): Add ingestion warning for capacity overflow

* test(ingestion-slowlane): Add test for ingestion of overflow events

* fix(ingestion-slowlane): Rate limit warnings to 1 per hour

* test(ingestion-slowlane): Add a couple more tests for overflow re-route

* fix(slowlane-ingestion): Look at batch topic to determine message topic

* refactor(slowlane-ingestion): Use refactored consumer model

* fix(slowlane-ingestion): Undo topic requirement in eachMessageIngestion

* refactor(slowlane-ingestion): Only produce events if ingestionOverflow is also enabled

* refactor(slowlane-ingestion): Use an env variable to determine if ingestionOverflow is enabled

* chore(slowlane-ingestion): Add a comment explaining env variable
2023-02-16 14:30:13 +01:00
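The token-bucket utility and overflow re-routing described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the plugin-server's implementation: `TokenBucket`, `consume`, `chooseTopic` and the `'main'`/`'overflow'` labels are hypothetical names, and the capacity and refill rate are arbitrary. It does reflect one detail the commit calls out, namely that refilling never increases tokens beyond the bucket capacity.

```typescript
// Hypothetical token-bucket limiter; names and parameters are illustrative only.
class TokenBucket {
    private capacity: number
    private refillPerSecond: number
    private tokens: number
    private lastRefillMs: number

    constructor(capacity: number, refillPerSecond: number, nowMs: number) {
        this.capacity = capacity
        this.refillPerSecond = refillPerSecond
        this.tokens = capacity // start with a full bucket
        this.lastRefillMs = nowMs
    }

    // Add tokens for the elapsed time, never exceeding the bucket capacity.
    private refill(nowMs: number): void {
        const elapsedSeconds = Math.max(0, nowMs - this.lastRefillMs) / 1000
        this.tokens = Math.min(this.capacity, this.tokens + elapsedSeconds * this.refillPerSecond)
        this.lastRefillMs = nowMs
    }

    // Returns true if enough tokens were available and have been consumed.
    consume(nowMs: number, cost = 1): boolean {
        this.refill(nowMs)
        if (this.tokens < cost) {
            return false
        }
        this.tokens -= cost
        return true
    }
}

// Route an event to the overflow topic once its bucket is exhausted.
function chooseTopic(bucket: TokenBucket, nowMs: number): 'main' | 'overflow' {
    return bucket.consume(nowMs) ? 'main' : 'overflow'
}
```

In a setup like this there would be one bucket per key that can burst (how the real code keys its buckets is not stated in the commit), and a second bucket with capacity 1 and a very slow refill rate could plausibly serve the "1 warning per hour" limit mentioned above.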
Harry Waye
ce777f7efa
fix: make sort order deterministic in property definitions manager test (#14266)
* fix: make sort order deterministic in property definitions manager test

I'd added an order in a previous PR, but it's a UUID.

* fix sql statements
2023-02-16 12:10:53 +00:00
Harry Waye
f62041a833
refactor(ingestion): pull out topic/groupid from kafka-queue (#14249)
* refactor(ingestion): pull out topic/groupid from kafka-queue

We have `IngestionConsumer` at the moment that holds a lot of complexity
in it regarding topics/groupid/message handlers. This is a step towards
moving that logic out of the `IngestionConsumer`, and making the top
level of the pluginsServer simpler to reason about.

* wip

* wip

* wip

* wip
2023-02-15 14:16:23 +00:00
Harry Waye
235b379707
refactor(plugin-server): remove nextStep functionality from pipeline (#13964)
This is intended to make the pipeline a little more readable by moving
the control flow out of the steps and into the runner. It also makes
it easier to add new steps to the pipeline.

Co-authored-by: Tiina Turban <tiina303@gmail.com>
2023-01-31 10:21:05 +00:00
Harry Waye
da482a3cba
refactor(recordings): remove session code from event pipeline (#13919)
* refactor(recordings): remove session code from event pipeline

We have moved session recordings to a separate topic and consumer. There
may be session recordings in the old topic, but we divert these to the
new logic for processing them.

* refactor to just send to the new topic!

* fix import

* remove empty line

* fix no team_id test

* implement recordings opt in

* remove old $snapshot unit tests

* remove performance tests

* Update plugin-server/functional_tests/session-recordings.test.ts

Co-authored-by: Tiina Turban <tiina303@gmail.com>

* Update plugin-server/functional_tests/session-recordings.test.ts

Co-authored-by: Tiina Turban <tiina303@gmail.com>

* add back $snapshot format test

* Add comment re functional test assumptions

Co-authored-by: Tiina Turban <tiina303@gmail.com>
2023-01-27 12:36:45 +00:00
Tiina Turban
3cdad732fd
feat: PoE placeholder for ingestion and testing enabling (#13881) 2023-01-26 15:18:25 +01:00
Karl-Aksel Puulmann
8930d9e460
feat: capture person/group property definitions (2/2) (#13816)
* feat: ingest person and group property definitions 2/2

* Update test
2023-01-20 15:42:00 +02:00
Karl-Aksel Puulmann
15b6ade4a0
feat: capture person/group property definitions (1/2) (#13809)
* migration for person/group property support in property definitions table

* Use database default

* Validate correct constraint

* Ingest person and group type property definitions

* Exclude person/group type definitions from API

* Update property definitions test

* Ignore $groups

* Add new unique index which accounts for type and group_type_index

* Run new code only in test

* Ignore errors from propertyDefinitionsManager which may occur due to migrations

* Update constraint name

* Update test describe

* ON CONFLICT based on the index expression

* Add a -- not-null-ignore

* Combine migrations

* Remove some test code temporarily

* fixup latest_migrations
2023-01-20 14:47:32 +02:00
Harry Waye
51e134e98c
chore(session-recordings): separate topics for events as recordings (#13654)
* chore(session-recordings): separate topics for events as recordings

WIP

* fix tests

* Use simpler consumer for session recordings

* wip

* still batch things by batchSize

* add tests, improve comments

* rename topic var

* push performance_events to session recordings topic also

* Add completely separate consumer for session-recordings

* wip

* use session_id for partition key

* fix test

* handle team_id/token null

* wip

* fix tests

* wip

* use kafka_topic var in logs

* use logger

* fix test

* Fix $performance_event topic usage

* fix tests

* fix check for null/undefined

* Update posthog/api/capture.py

Co-authored-by: Tomás Farías Santana <tomas@tomasfarias.dev>

* Add test for kafka error handling

* Remove falsy teamId check

* fix statsd error

* kick ci

* Use existing getTeamByToken

* remove partition key from recordings

* Make sure producer is connected !

* fix session id kafka key test

* add back throws!

* set producer on each test

* skip flaky test

* add flush error logs

* wait for persons to be ingested

* fix skip

Co-authored-by: Tomás Farías Santana <tomas@tomasfarias.dev>
2023-01-17 12:04:03 +00:00
Ben White
cb7e7d5e5e
feat: Added performance API (#13452) 2023-01-06 09:51:51 +01:00
Harry Waye
a27d452171
feat(person-on-events): add option to delay all events (#13505)
* feat(person-on-events): add option to delay all events

This change implements the option outlined in
https://github.com/PostHog/product-internal/pull/405

Here I do not try to make any large structural changes to the code; I'll
leave that for later, although it does mean the code has a few loose
couplings between pipeline steps that probably should be strongly
coupled. I've commented these to make the couplings clear.

I've also added a workflow to run the functional tests against both
configurations, which we can remove once we're happy with the new
implementation.

Things of note:

 1. We can't enable this for all users yet, not without the live events
    view and not without verifying that the buffer size is sufficiently
    large. We can however enable this for the test team and verify that
    it functions as expected.
 2. I have not handled the case mentioned in the above PR regarding
    guarding against processing the delayed events before all events in
    the delay window have been processed.

wip

test(person-on-events): add currently failing test for person on events

This test doesn't work with the previous behaviour of the
person-on-events implementation, but should pass with the new delay all
events behaviour.

* add test for KafkaJSError behaviour

* add comment re delay

* add test for create_alias

* chore: increase exports timeout

It seems to fail in CI, but only for the delayed events enabled tests.
I'm not sure why, but I'm guessing it's because the events are further
delayed by the new implementation.

* chore: fix test

* add test for ordering of person properties

* use ubuntu-latest-8-cores runner

* add tests for plugin processEvent

* chore: ensure plugin processEvent isn't run multiple times

* expand on person properties ordering test

* wip

* wip

* add additional test

* change fullyProcessEvent to onlyUpdatePersonIdAssociations

* update test

* add test to ensure person properties do not propagate backwards in time

* simplify person property tests

* weaken guarantee in test

* chore: make sure we don't update properties on the first parse

We should only be updating person_id and associated distinct_ids on first
parse.

* add tests for dropping events

* increase export timeout

* increase historical exports timeout

* increase default waitForExpect interval to 1 second
2023-01-05 16:38:43 +00:00
Tiina Turban
a051f37a7a
feat(plugin-server): track person creation event uuid (#13102) 2022-12-05 20:11:23 +01:00
Harry Waye
1e6c062095
feat(plugin-server): distribute scheduled tasks i.e. runEveryX (#13124)
* chore(plugin-server): distribute scheduled tasks

Changes I've made here from the original PR:

 1. add some logging of task run times
 2. add concurrency, except only one task of a plugin will run at a time
 3. add a timeout to task run times

This reverts commit 23db43a0dc.

* chore: add timings for scheduled tasks runtime

* chore: add timeouts for scheduled tasks

* chore: clarify duration unit

* chore: deduplicate tasks in a batch, add partition concurrency

* chore: add flag to switch between old and new behaviour

This defaults to new, but can be set to old by setting environment
variable `USE_KAFKA_FOR_SCHEDULED_TASKS=false`

* fix tests

* enable USE_KAFKA_FOR_SCHEDULED_TASKS in tests
2022-12-05 12:30:52 +00:00
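The commit above mentions deduplicating tasks in a batch and adding a timeout to task run times. A minimal sketch of those two ideas follows; `ScheduledTask`, `deduplicateTasks` and `runWithTimeout` are hypothetical names chosen for illustration and are not taken from the plugin-server code.

```typescript
// Hypothetical shape of a scheduled task; the real message format is not shown in the commit.
interface ScheduledTask {
    taskType: string // e.g. one of the runEveryX task types
    pluginConfigId: number
}

// Keep only the first occurrence of each (taskType, pluginConfigId) pair in a batch.
function deduplicateTasks(tasks: ScheduledTask[]): ScheduledTask[] {
    const seen = new Set<string>()
    const unique: ScheduledTask[] = []
    for (const task of tasks) {
        const key = `${task.taskType}:${task.pluginConfigId}`
        if (!seen.has(key)) {
            seen.add(key)
            unique.push(task)
        }
    }
    return unique
}

// Reject if the work does not settle within timeoutMs; the timer is always cleared.
function runWithTimeout<T>(work: Promise<T>, timeoutMs: number): Promise<T> {
    return new Promise<T>((resolve, reject) => {
        const timer = setTimeout(() => reject(new Error(`Task timed out after ${timeoutMs}ms`)), timeoutMs)
        work.then(
            (value) => {
                clearTimeout(timer)
                resolve(value)
            },
            (error) => {
                clearTimeout(timer)
                reject(error)
            }
        )
    })
}
```

Deduplicating before running means a backlog of repeated messages for the same plugin config triggers the task only once per batch.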
Harry Waye
23db43a0dc
Revert "Revert "Revert "fix(plugin-server): ignore old cron tasks from graphile-worker """ (#13107)
Revert "Revert "Revert "fix(plugin-server): ignore old cron tasks from graphile-worker "" (#13100)"

This reverts commit 8eec4c9346.
2022-12-03 00:13:27 +00:00
Harry Waye
8eec4c9346
Revert "Revert "fix(plugin-server): ignore old cron tasks from graphile-worker "" (#13100)
Revert "Revert "fix(plugin-server): ignore old cron tasks from graphile-worker " (#13095)"

This reverts commit 5634ab4d7f.
2022-12-02 18:11:17 +00:00
Harry Waye
5634ab4d7f
Revert "fix(plugin-server): ignore old cron tasks from graphile-worker " (#13095)
Revert "fix(plugin-server): ignore old cron tasks from graphile-worker  (#13094)"

This reverts commit b079a8cc8e.
2022-12-02 16:28:30 +00:00
Harry Waye
b079a8cc8e
fix(plugin-server): ignore old cron tasks from graphile-worker (#13094)
* Revert "Revert "feat(plugin-server): distribute scheduled tasks i.e. runEveryX" (#13087)"

This reverts commit 78e6f48660.

* fix(plugin-server): ignore old cron tasks from graphile-worker

When we are backed up on jobs, we end up still creating tasks in the
graphile-worker job table, i.e. there is no backpressure. This change
makes us skip over old tasks, so that we don't get backed up.

* fix tests
2022-12-02 15:20:16 +00:00
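The fix above skips over old tasks so that a backlog does not keep growing. One way to express that check is sketched below, assuming each task carries the time it was scheduled for; `TimedTask`, `dropStaleTasks` and the `maxAgeMs` threshold are hypothetical and not taken from the plugin-server code.

```typescript
// Hypothetical task shape: only the scheduled time matters for the staleness check.
interface TimedTask {
    pluginConfigId: number
    scheduledAt: Date
}

// A task is stale when it was scheduled more than maxAgeMs before `now`.
function isStale(task: TimedTask, now: Date, maxAgeMs: number): boolean {
    return now.getTime() - task.scheduledAt.getTime() > maxAgeMs
}

// Drop stale tasks instead of running them late.
function dropStaleTasks(tasks: TimedTask[], now: Date, maxAgeMs: number): TimedTask[] {
    return tasks.filter((task) => !isStale(task, now, maxAgeMs))
}
```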
Harry Waye
78e6f48660
Revert "feat(plugin-server): distribute scheduled tasks i.e. runEveryX" (#13087)
Revert "feat(plugin-server): distribute scheduled tasks i.e. runEveryX (#13037)"

This reverts commit 45912e839c.
2022-12-02 10:40:58 +00:00
Harry Waye
45912e839c
feat(plugin-server): distribute scheduled tasks i.e. runEveryX (#13037)
* feat(plugin-server): distribute scheduled tasks i.e. runEveryX

At the moment we only run on whichever Graphile worker node picks up
the scheduled tasks. Tasks are run in sequence, running through each of
the associated pluginConfigIds.

We tried to spread the workload by creating a Graphile Worker job for
each pluginConfigId, but this caused a lot of load on the Graphile
Worker database.

One thing this PR doesn't tackle is what happens if we end up having the
jobs back up. There is probably some logic we should add to avoid really
old scheduled tasks from running.

* wip

* wip

* fix tests

* fix tests

* types

* update unit test

* add key

* fix order

* Update plugin-server/src/main/ingestion-queues/scheduled-tasks-consumer.ts

* chore: skip stale scheduled tasks

* update comments

* add statsd counter
2022-12-02 09:42:55 +00:00
Yakko Majuri
aa89545a66
fix(ingestion): do not create or update person from $snapshot events (#13048)
* fix(ingestion): do not create or update person from $snapshot events

* fix tests
2022-12-01 10:37:53 -03:00
Yakko Majuri
90f1b16285
feat(ingestion): remove postgres dependency from capture endpoint (#12802)
* add support for token field in kafka message

* formPipelineEvent

* rename pipeline files according to new order

* wip team_id and anonymize ips

* conditional handlers and tests

* some plugin server fixes

* fix capture bug

* fix

* more fixes

* fix capture tests

* pipeline update

* fix + investigate database resets

* fix import order

* testing and typing updates

* add test for capture endpoint

* testing

* python typing

* plugin server test

* functional test

* fix test

* another fix

* make sure no team ids clash in tests

* fix

* add more metrics and logs

* cache nulls

* updates

* add more metrics
2022-11-23 09:55:26 -03:00
Tiina Turban
41c983cc93
chore: throw when we ran out of wait time waiting for CH ingestion (#12126) 2022-11-09 15:26:52 +01:00
Yakko Majuri
469057b905
refactor(plugin-server): rename KafkaQueue to IngestionConsumer (#12540)
* refactor(plugin-server): rename KafkaQueue to IngestionConsumer

* fix

* final fix

* welp
2022-11-08 13:44:29 -03:00
Harry Waye
ac5a40f5b2
chore(ingestion): remove graphile as dependency of ingestion pipeline (#12551)
* chore(ingestion): remove graphile as dependency of ingestion pipeline

This allows us to run just the ingestion part of the plugin-server
without needing to perform any graphile operations e.g. creating
connections to the graphile database.

This has the advantage that:

 1. if the graphile database is down, the ingestion pods can still start
    up and will function correctly.
 2. avoids creating a connection pool to the graphile database for each
    ingestion pod, which could be a lot of connections and could cause
    the database to scale.
 3. avoids running the graphile migrations on each ingestion pod, which
    is unnecessary and could cause unnecessary database load.

* wip

* wip

* wip

* wip
2022-11-01 16:01:08 +00:00
Karl-Aksel Puulmann
f07f8763e4
fix(person-on-events): Fix groups caching in ingestion (#12547)
* fix(person-on-events): Fix groups caching in ingestion

We were seeing some groups-related events never get ingested in
playground. Digging in, it turned out that these events were serialized
with invalid timestamps due to cache containing dates in different
formats.

The bug was introduced in https://github.com/PostHog/posthog/pull/12403
and makes for a good case study for this common class of errors

There were multiple practices that could have indicated the error sooner:
1. Tests for the feature mocked out the DB and used a different data
format than what is used properly
2. Some methods related to caching were not properly updated to test the
caching logic
3. timestamps-as-strings: we deal with both ISO and clickhouse-format
timestamps, and the code didn't differentiate between them properly
4. `getGroupsColumns` signature was very loose, allowing for everything
to pass by

This change fixes the issue as well as updates relevant code to be more
in-line with best practices.

* Solve minor typing related issue
2022-11-01 14:27:34 +02:00
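Point 3 in the commit above (timestamps as strings in two formats) is the kind of mix-up that branded string types can surface at compile time. A minimal sketch follows; the type and function names are hypothetical and are not the plugin-server's actual definitions.

```typescript
// Branded string types: structurally strings, but not assignable to each other.
type ISOTimestamp = string & { readonly __brand: 'ISOTimestamp' }
type ClickHouseTimestamp = string & { readonly __brand: 'ClickHouseTimestamp' }

// ISO 8601 text as produced by Date#toISOString, with "T" separator and "Z" suffix.
function toISOTimestamp(date: Date): ISOTimestamp {
    return date.toISOString() as ISOTimestamp
}

// ClickHouse-style text: space separator, no "Z" suffix, interpreted here as UTC.
function toClickHouseTimestamp(date: Date): ClickHouseTimestamp {
    return date.toISOString().replace('T', ' ').replace('Z', '') as ClickHouseTimestamp
}

// Accepts only the ClickHouse-branded type, so passing an ISOTimestamp is a type error.
function parseClickHouseTimestamp(value: ClickHouseTimestamp): Date {
    return new Date(value.replace(' ', 'T') + 'Z')
}
```

With this in place, a cache that stores `ClickHouseTimestamp` values cannot be handed an `ISOTimestamp` without an explicit conversion.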
Yakko Majuri
5aafc7a115
feat(ingestion): buffer events in kafka if postgres is down (#12532)
* feat(ingestion): buffer events in kafka if postgres is down

* also add DependencyUnavailableError to transaction

* Update plugin-server/src/utils/db/db.ts
2022-10-31 19:11:10 +00:00
Harry Waye
13bca71383
chore(ingestion): remove old graphile bufferJob handling (#12528)
* chore(ingestion): remove old graphile bufferJob handling

This removes the emitting of graphile-worker events from the ingestion
anonymous events path. Note that we still have the graphile worker
running on ingestion, as we need to ensure that we have drained all of
these jobs. I'll handle this by first enabling the topic for all users
on prod then deploying this.

For self hosted I suggest we just go with adding a comment that
anonymous events that have been sent to graphile in the meantime will be
lost. Or something else that makes sense.

* fix typing

* remove test
2022-10-31 12:09:20 +00:00
Harry Waye
cc2f424452
chore(plugins-server): use Kafka to buffer app jobs requests (#12345)
* chore(plugins-server): use Kafka to buffer app jobs requests

To remove the dependency on the Graphile Worker database on things that
may be requesting app job runs we push the jobs to a Kafka topic.

* chore: use KAFKA_JOBS instead of string literal `'jobs'`

* chore: rename startJobsBufferConsumer -> startJobsConsumer

* avoid checking eventId

* fix lint

* fix producer wrapper tests

* fix retries test

* handle offset sync

* wip

* wip

* remove exports

* do better

* use Producer not wrapper

* reset db

* mock once

* Add test for raising to the consumer

* Update plugin-server/tests/main/ingestion-queues/run-async-handlers-event-pipeline.test.ts

Co-authored-by: Yakko Majuri <38760734+yakkomajuri@users.noreply.github.com>

* and in the darkness bind them

* fix tests

* don't forget the name update!

* rename DependencyError to DependencyUnavailable

* separate dlq

* update comment

Co-authored-by: Yakko Majuri <38760734+yakkomajuri@users.noreply.github.com>
2022-10-28 11:05:15 +01:00
Yakko Majuri
1c2713a7b9
Revert "feat(scheduler): allow spreading scheduled tasks load across the fleet" (#12482)
Revert "feat(scheduler): allow spreading scheduled tasks load across the fleet (#12477)"

This reverts commit 98a14fc7c8.
2022-10-27 15:34:16 -03:00
Yakko Majuri
98a14fc7c8
feat(scheduler): allow spreading scheduled tasks load across the fleet (#12477)
* feat(scheduler): allow spreading scheduled tasks load across the fleet

* update test

* Update plugin-server/src/main/graphile-worker/worker-setup.ts

Co-authored-by: Harry Waye <harry@posthog.com>

* tweaks

Co-authored-by: Harry Waye <harry@posthog.com>
2022-10-27 17:35:45 +00:00
Yakko Majuri
4f372c05f9
feat(plugin-server): simplify groups caching (#12403)
* refactor(plugin-server): simplify groups caching

* add multi groups test

* remove comments

* fix type, add debug

* fix

* stringify

* add groups created_at to types

* more test fixes

* use the right clickhouse timestamp format

* update created at to ch format in tests

* finally

* more fixes
2022-10-25 15:35:47 -03:00
Yakko Majuri
8ed495b327
fix: groups data fetching bugs (#12371)
* fix: groups data fetching bugs

* add tests
2022-10-21 12:33:36 -03:00
Yakko Majuri
c47a73165a
feat(plugin-server): use graphile-worker crontab (#12242)
* yeet references to redlock

* rename jobs/ to graphile-worker/

* feat(plugin-server): use graphile-worker crontab

* remove debugging

* yeet redlock dependency

* remove legacy test

* Update comment

* Update plugin-server/src/main/pluginsServer.ts

Co-authored-by: Harry Waye <harry@posthog.com>

* address review, update tests

* fix old tests

* testing, testing

* maybe fix sigterm

Co-authored-by: Harry Waye <harry@posthog.com>
2022-10-18 11:44:41 -03:00
Tiina Turban
377f2ae47f
chore: Rollout groups properties writes to events (#12233)
* chore: Rollout groups properties writes to events

* forgotten save

* fix test
2022-10-17 09:50:20 -03:00
Yakko Majuri
53b527dbbe
refactor(graphile-worker): update terminology, clearer capabilities approach for setup (#12203)
* rename legacy references to queue to more appropriate worker terminology

* rename startJobsConsumer -> startGraphileWorker, no-op refactor

* add back enqueue success and failure metrics

* fix mock import

* fix test for good
2022-10-12 10:24:22 -03:00
Yakko Majuri
a34228c49f
refactor: yeet job queues scaffolding in favor of only graphile worker (#12178)
* refactor: rename graphile queue to graphile worker

* refactor: rename job-queues/ to jobs/

* refactor: move graphile-worker to top level jobs/ dir

* refactor: remove references to jobQueueManager

* remove promise from startJobQueueConsumer

* remove job-queue-manager.ts

* remove non-test references to JobQueueBase

* make fs-queue independent from JobQueueBase

* rename FsQueue to MockGraphileWorker

* add missing pauseConsumer method to MockGraphileWorker

* rename fs-queue.ts --> mock-graphile-worker.ts

* delete job-queue-base.ts

* get rid of JobQueue type

* rename graphileQueue --> graphileWorker

* rename JobQueueConsumerControl --> JobsConsumerControl

* remove unused jobs test

* rename startJobQueueConsumer --> startJobsConsumer

* fix tests job imports

* rename jobQueueManager --> graphileWorker in tests

* remove JobQueueManager tests

* fix import

* handle metrics and retries on graphileWorker.enqueue

* minor fix

* Delete buffer.ts

* Revert "Delete buffer.ts"

This reverts commit 40f1761d31.

* add initial test scaffolding

* bring back relevant worker control promises

* fix existing tests

* add tests for graphile worker

* fix exportEvents retries test

* update e2e buffer test
2022-10-11 15:40:34 -03:00
Tiina Turban
20b9205877
fix: Always update is_identified = true for first identify or alias user (#12121) 2022-10-11 15:22:10 +02:00
Tiina Turban
c6b1da5932
fix: hide initial referrer as event property (#11536) 2022-08-30 18:07:02 +02:00
Karl-Aksel Puulmann
14b420da0a
fix(plugin-server): Fix cohort matching in actions (#11388)
* fix(plugin-server): Remove wild clickhouseQuery in ingestion pipeline

Point queries against clickhouse are slow and we should avoid them.
They're also not instrumented.

The postgres table already used in the method previously contains the
right data. Use that instead.

Reference: https://github.com/PostHog/posthog/blob/master/posthog/models/cohort/cohort.py#L274-L316

* Fixup and test doesPersonBelongToCohort

* Handle NULLs
2022-08-22 11:07:56 +03:00
Michael Matloka
7bd3cac2f5
refactor(plugin-server): Unify event types (#10612)
* Simplify Event, ClickHouseEvent, PreIngestionEvent, IngestionEvent

* Unify `ClickhouseEventKafka` with `RawEvent`

* Fix imports

* Eliminate PostgresSessionRecordingEvent

* Parse `Event.elements_chain` too

* Update process-event.test.ts

* Update tests

* Make `IngestionEvent['timestamp']` consistent

* Update tests

* Restore `PreIngestionEvent` vs. `PostIngestionEvent` split

* Update worker.test.ts

* Improve typing a bit

* Update tests to work with mandatory `DateTime`

* Remove ClonableIngestionEvent

* Rename RawEvent -> RawClickHouseEvent

* Rename Event -> ClickHouseEvent

* Update prepareEventStep tests

* Update convertToIngestionEvent behavior back to master

* Update tests to compile

* Use branded types for ISO/Clickhouse timestamp string disambiguation

* Test for parseRawClickHouseEvent()

* Update each-batch tests

* Tests for clickHouseTimestampToDateTime()

Co-authored-by: Karl-Aksel Puulmann <oxymaccy@gmail.com>
2022-08-15 10:54:09 +03:00
Harry Waye
635fc7b23d
chore(plugin-server): remove healthcheck topic references (#11252)
* chore(plugin-server): remove healthcheck topic references

Rather than doing an end to end produce/consume from this topic, we
instead rely on the instrumentation of KafkaJS to understand if the
consumer is ready.

Note that this code is not being used since the change to just return an
HTTP 200 from the liveness endpoint:
https://github.com/PostHog/posthog/pull/11234

This is just a cleanup of dead code.

* Remove Kafka healthcheck tests
2022-08-11 12:11:43 +00:00
Karl-Aksel Puulmann
f8c203fe5a
fix(plugin-server): refactor groups caching (#11141)
* Remove unneeded method

* Refactor how groups are handled

* Remove .only
2022-08-05 09:26:45 +03:00
Karl-Aksel Puulmann
4f648268f2
feat(ingestion): Make person loading lazy (#11091)
* fix issues with fetchPerson() and add tests

- fetchPerson() returned extra columns that were not needed

* Add LazyPersonContainer class

* Load person data lazily through the event pipeline

* Make webhooks and action matching lazy

* Update runAsyncHandlersStep

* Return own person properties in process-event.ts

* Remove snapshots that caused pain

* Handle serialization of LazyPersonContainer

* Merge: Handle LHS only existing

.get() would be cached in that case not to do a query, which we can
avoid

* Serialize result args as well

* Make personContainer functional

* Resolve feedback
2022-08-04 09:57:43 +03:00
Karl-Aksel Puulmann
9c6f20b697
chore(plugin-server): Improve tracing (#11042)
* Include kafka topic for setup

* Sample runEventPipeline/runBufferEventPipeline less frequently comparatively

This is done by duration - we still want the long transactions, but not
the short ones

* Trace enqueue plugin jobs

* Trace node-fetch

* Trace worker creation

* Various fixes

* Line up query tags properly

* Make fetch mocking work

* Resolve typing-related issues
2022-08-03 16:12:56 +03:00
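The tracing commit above samples by duration: long transactions are kept, short ones are sampled less often. A rough sketch of such a decision is shown below; `shouldKeepTrace`, the threshold and the sample rate are hypothetical and do not reflect the actual tracing configuration.

```typescript
// Always keep slow transactions; keep fast ones only with probability shortSampleRate.
// `random` is injectable so the decision can be tested deterministically.
function shouldKeepTrace(
    durationMs: number,
    slowThresholdMs: number,
    shortSampleRate: number,
    random: () => number = Math.random
): boolean {
    if (durationMs >= slowThresholdMs) {
        return true
    }
    return random() < shortSampleRate
}
```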
Ben White
f0f0cd4e15
feat: Testing alpha releases of JS libs (#11011)
* feat: Updated to alpha version of posthog-js
* Swap to alpha versions of other libs
2022-07-28 11:19:56 +00:00
Karl-Aksel Puulmann
156fa2353f
feat(plugin-server): Use Snappy compression codec for kafka production (#10974)
* feat(plugin-server): Use Snappy compression codec for kafka production

This helps avoid 'message too large' type errors (see
https://github.com/PostHog/posthog/pull/10968) by compressing in-flight
messages.

I would have preferred to use zstd, but the libraries did not compile
cleanly on my machine.

* Update tests
2022-07-28 11:58:33 +03:00
Karl-Aksel Puulmann
d00d587b1c
chore(plugin-server): Improve kafka producer wrapper (#10968)
* chore(plugin-server): include extra information on kafka producer errors

We're failing to send batches of messages to kafka on a semi-regular
basis due to message sizes. It's unclear why this is the case as we try
to limit each message batch size.

This PR adds information on these failed batches to sentry error
messages.

Example error: https://sentry.io/organizations/posthog2/issues/3291755686/?project=6423401&query=is%3Aunresolved+level%3Aerror

* refactor(plugin-server): Remove Buffer.from from kafka messages

This allows us to be much more accurate estimating message sizes,
hopefully eliminating a class of errors

* estimateMessageSize

* Track histogram with message sizes

* Flush immediately for too large messages

* fud
2022-07-27 11:26:19 +00:00
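The commit above estimates message sizes from string payloads and flushes immediately when a message is too large for the current batch. A simplified sketch of that logic follows, assuming string keys and values and ignoring headers and protocol overhead; `OutgoingMessage`, `estimateSizeBytes` and `shouldFlushBeforeAdding` are hypothetical names, not the producer wrapper's actual API.

```typescript
// Hypothetical message shape with string payloads only.
interface OutgoingMessage {
    key?: string
    value: string
}

const utf8Encoder = new TextEncoder()

// Approximate size as the UTF-8 byte length of key plus value.
function estimateSizeBytes(message: OutgoingMessage): number {
    const keyBytes = message.key === undefined ? 0 : utf8Encoder.encode(message.key).length
    return keyBytes + utf8Encoder.encode(message.value).length
}

// Flush first if adding this message would push the batch past the limit.
function shouldFlushBeforeAdding(currentBatchBytes: number, message: OutgoingMessage, maxBatchBytes: number): boolean {
    return currentBatchBytes + estimateSizeBytes(message) > maxBatchBytes
}
```

Measuring encoded bytes rather than string length matters for non-ASCII payloads, where one character can take several bytes.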
Yakko Majuri
4bce5dfa8a
feat: (bring back) buffer 3.0 again (#10896)
* Revert "Revert "feat: (bring back) buffer 3.0  (#10874)" (#10883)"

This reverts commit e203bc7cfa.

* reduce graphile load
2022-07-20 12:16:13 +00:00
Yakko Majuri
e203bc7cfa
Revert "feat: (bring back) buffer 3.0 (#10874)" (#10883)
This reverts commit 3e772b8614.
2022-07-19 17:50:06 +00:00
Yakko Majuri
3e772b8614
feat: (bring back) buffer 3.0 (#10874)
* Revert "Revert "feat: buffer 3.0 (graphile) (#10735)" (#10802)"

This reverts commit ca8c4d0271.

* add metrics and error tracking
2022-07-19 16:34:07 +00:00
Yakko Majuri
ca8c4d0271
Revert "feat: buffer 3.0 (graphile) (#10735)" (#10802)
This reverts commit 9a2a9046cb.
2022-07-14 18:24:58 +00:00
Yakko Majuri
9a2a9046cb
feat: buffer 3.0 (graphile) (#10735)
* feat: buffer 3.0 (graphile)

* fixes

* test

* address review

* add test for buffer processAt
2022-07-13 11:32:00 +00:00
Yakko Majuri
985148ee7e
feat: buffer 2.0 (#10653)
* feat: buffer 2.0 proposal

* add tests

* prevent infinite retrying

* perf

* updates

* tweaks

* Update latest_migrations.manifest

* Update plugin-server/src/main/ingestion-queues/buffer.ts

* update

* updates

* fix migrations issue

* reliability updates

* fix tests

* test fix

* e2e test

* test

* test

* ??

* cleanup
2022-07-08 10:48:25 +00:00
Yakko Majuri
58a1fea111
fix: handle stale batches in buffer (#10643)
* Revert "Revert "fix: handle stale batches in buffer (#10641)" (#10642)"

This reverts commit b564688ad8.

* fix test
2022-07-05 18:16:49 +00:00
Michael Matloka
b04015f25e
chore(plugin-server): Consume from buffer topic (#10475)
* chore(plugin-server): Consume from buffer topic

* Refactor `posthog` extension for buffering

* Properly form `bufferEvent` and don't throw error

* Add E2E test

* Test buffer more end-to-end and properly

* Put buffer-enabled test in a separate file

* Update each-batch.test.ts

* Test that the event goes through the buffer topic

* Fix formatting

* Refactor out `spyOnKafka()`

* Ensure reliability batching-wise

* Send heartbeats every so often

* Make test less flaky

* Commit offsets if necessary before sleep too

* Update tests

* Use seek-based mechanism (with KafkaJS 2.0.2)

* Add comment to clarify seeking

* Update each-batch.test.ts

* Make minor improvements
2022-06-28 13:30:10 +02:00
Yakko Majuri
a598c7b664
feat(persons-on-events): cache + send persons and groups created_at with events (#10457)
* feat(persons-on-events): cache + send persons and groups created_at with events

* more testing

* Update plugin-server/src/utils/db/db.ts

* better naming

* fixes

* testing

* update test
2022-06-27 11:39:58 +00:00
Neil Kakkar
9712fd9bb5
chore(feature-flags): Upsert hash key overrides on people merges (#10418) 2022-06-24 10:58:42 +01:00
Karl-Aksel Puulmann
773f922eef
feat(apps): Remove onAction plugin function (#10414)
* Remove onAction

* Avoid fetching actions that dont deal with REST - 99% reduction

* Plural hooks

* Avoid hook fetching where not needed

* Remove dead code

* Update lazy VM test

* Rename a function

* Update README

* Explicit reload actions in tests

* Only reload actions which are relevant for plugin server

* Remove excessive logging

* Reload actions when hooks are updated

* update action matching tests

* Remove commented code

* Solve naming issues
2022-06-24 12:29:10 +03:00
Karl-Aksel Puulmann
f4668ed855
refactor(plugin-server): move buffer as first step of event pipeline & more (#10360)
* WIP: Move person creation earlier

* WIP: move person updating, handle person property changing

* WIP: leverage person information

* Update `updatePersonDeprecated` signature

* Avoid (and test avoiding) unneeded lookups whether 'creating' person is needed

Note there were two tricky interactions within handleIdentify, which
again got solved by indirect message passing.

* Solve TODO

* Normalize event before updatePersonIfTouchedByPlugins

* Avoid another lookup for person in updatePersonProperties

* Avoid lookup for newPerson in handleIdentifyOrAlias

* Add kludge comments

* Fix runBufferEventPipeline

* Rename upsertPersonsStep => processPersonsStep

* Update emitToBufferStep tests

* Update some event pipeline step tests

* Update prepareEventStep tests

* Test processPersonStep

* Add tests for updatePersonIfTouchedByPlugins step

* Update runner tests

* verify person version in event-pipeline-integration test

* Update process-event test suite

* Argument ordering for person state tests

* Update runner test snapshots

* Cast to UTC

* Fixup person-state tests

* Dont refetch persons needlessly on $identify

* Add missing version assertion

* Cast everything to UTC

* Remove version assertion

* Undo radical change to event pipeline - will re-add it later!

* Resolve comments
2022-06-23 10:27:01 +03:00
Tiina Turban
c659bad2ef
Revert "revert: Rollout ingestion batch breakup by distinctId (#10393)" (#10398)
This reverts commit 744d4ddf84.
2022-06-21 14:34:45 -07:00
Michael Matloka
744d4ddf84
revert: Rollout ingestion batch breakup by distinctId (#10393)
This reverts commit 9a085cb1f6.
2022-06-21 19:06:31 +02:00
Tiina Turban
9a085cb1f6
chore: Rollout ingestion batch breakup by distinctId (#10370)
* chore: Rollout ingestion batch breakup by distinctId

* Update task-definition.plugins-ingestion.json

Co-authored-by: Michael Matloka <dev@twixes.com>
2022-06-21 17:31:53 +02:00
Karl-Aksel Puulmann
dea8c6973a
perf(plugin-server): reduce number of person lookups in the event pipeline (#10324)
* Return person in PreIngestionEvent if possible

* Avoid unnecessarily fetching person in emitToBufferStep

* Avoid unnecessarily fetching person in createEvent

* Use unified type instead of separate type for cached data

* Pass person info forward explicitly in each event-pipeline step

* minor typing improvement

* Remove person from type

* Remove unneeded `undefined`

* Add person check for prepareEventStep test

* Fix hook test

* Update getPersonData tests

* Cast created_at to UTC

* Cast created_at to utc on fetch

* Remove personUuid var - unneeded

* Add unit tests for process-event.ts#createEvent
2022-06-21 09:18:22 +03:00
Michael Matloka
313226838c
revert: revert: Revert person properties updates refactor (#10349)
* Revert "revert: Revert person properties updates refactor (#10348)"

This reverts commit 6b3c4691b3.

* sanitizeEvent -> normalizeEvent

* Ensure we handle property updates from within plugins, test

Co-authored-by: Karl-Aksel Puulmann <oxymaccy@gmail.com>
2022-06-20 09:49:11 +03:00
Neil Kakkar
6b3c4691b3
revert: Revert person properties updates refactor (#10348) 2022-06-17 17:48:20 +02:00
Karl-Aksel Puulmann
d6ec3aedc6
refactor(plugin-server): person state updating (#10321)
* Remove some excessive call signatures

* refactor: move property sanitization outside of .capture

* Move event sanitization into event sanitization logic

* Move person creation/updating logic outside of capture/createSnapshot

* refactor: remove personManager from arguments

* refactor: remove various properties from arguments

* Update `handleIdentifyOrAlias` signature

* refactor: inline timeoutGuard into personStateManager

* refactor: prefix pipeline steps with indexes

* Extract timestamp parsing logic from process-event.ts

* refactor: move timestamp tests over from process-event.ts

* refactor: update process-event.test.ts

* refactor: person-state-manager -> person-state

* Move sanitizeEvent to a more suitable module

* Fix some process-event tests
2022-06-17 09:17:08 +03:00
Karl-Aksel Puulmann
b4fee54222
refactor(plugin-server): extract person creation/handling logic from EventsProcessor (#10271)
* refactor: Start with PersonStateManager

* refactor: move createPerson to new service

* refactor: move team fetching before aliasing

* refactor: move `createPersonIfDistinctIdIsNew`

* refactor: move `updatePersonProperties`

* refactor: move `handleIdentifyOrAlias`

* refactor: `createPerson` to private

* Fix an import

* Remove weird mocking in an e2e integration test
2022-06-14 11:51:58 +03:00
Karl-Aksel Puulmann
c51a2f7bc1
fix(plugin-server): use histogram in metrics properly (#10276) 2022-06-13 16:48:57 +03:00
Karl-Aksel Puulmann
ebfc8251a7
fix(plugin-server): Properly set version in deletePerson (#10207)
* Use correct style for querying postgres

* Add test showing problems with deletePerson logic

* Fix deleting persons from clickhouse

* Fix concurrent tests

* Version + 100

* Fixup FINAL

* Remove console.log
2022-06-13 12:58:44 +03:00
Tiina Turban
569b50b4ec
feat(plugin-server): batching ingestion events to single process per distinct id (#10071) 2022-06-08 19:20:40 +02:00
Yakko Majuri
004ba66349
fix: pass ISO timestamp to onAction/onEvent (#10178)
* fix: pass ISO timestamp to onAction/onEvent

* fix prettier

* fix import

* update timestamps
2022-06-08 11:05:54 +00:00
Karl-Aksel Puulmann
c3c5eaad02
fix(plugin-server): properly unparse event.properties when PLUGIN_SERVER_MODE=async (#10156)
* Handle string properties in plugin-server convertToIngestionEvent

* Update typing

* fix: Add multi-server process event test

This got accidentally yeeted from my previous PR. Shame!

* Improve tests

* Update test to reflect reality
2022-06-07 13:42:07 +03:00
Karl-Aksel Puulmann
59797efce8
refactor(plugin-server): yeet element-group related postgres code (#10161) 2022-06-07 12:23:20 +03:00
Tiina Turban
95c9045cc1
fix: Postgres and CH Person version across other columns to match (#10135) 2022-06-06 17:57:03 +02:00
Tiina Turban
26435cb70d
feat: Add version column to person in CH (#10117) 2022-06-06 13:42:39 +02:00
Michael Matloka
64317238e6
refactor: Eliminate the KAFKA_ENABLED setting (#10059)
* refactor: Eliminate the `KAFKA_ENABLED` setting

* Remove dead code

* Consolidate plugin server test scripts and CI

* Fix CI command

* Remove Celery queues

* Rearrange test directories

* Update import paths
2022-05-30 18:39:33 +00:00