* chore: fix plugin server tests for actionMatcher
I broke these, but there was an issue with CI so they were merged broken;
now maybe they are fixed?
* more fixes
* wip
* wip
* wip
* chore(webhooks): remove abstractions from webhook consumer logic
Previously we were jumping through a few hoops to make webhook calls
e.g. still using the piscina abstraction, still using the runner code.
This commit removes those abstractions while still maintaining the
existing functionality wrt error handling and metrics gathering.
I'll leave further refactoring of the webhook consumer code to a
separate PR. For example, moving the statsd metrics to be based on
OpenMetrics instead. And further adding some tracing around key parts of
the webhook matching and firing logic.
* fix typing
* fix typing
* fix typing
* fix unit tests
* fix tests
* chore(plugin-server): simplify action manager deps
Previously we were passing in the kitchen sink, but the only
dependency is postgres. This should make it easier to e.g. refactor to
not need to load the kitchen sink on some deployments.
* chore(plugin-server): simplify hook commander deps
Previously we passed in DB which is a lot of stuff. Now we just pass in
the postgres pool.
* fix import
* chore(plugin-server): simplify action matcher deps
Specifically this only depends on postgres, so passing in the `DB`
object is unnecessary. This should make refactors easier to e.g. only
load the required dependencies when they are needed.
* pass only postgres to ActionMatcher
* chore(exports): try calling heartbeat a bit more often
Looks like we end up rebalancing often, possibly because we're not
sending the heartbeats in time and the session is timing out.
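The change is roughly of this shape, assuming a KafkaJS `eachBatch` handler (topic/group names and `exportMessage` are made up for illustration):

```ts
import { Kafka, EachBatchPayload } from 'kafkajs'

// Hypothetical per-message export work; stands in for the real handler.
declare function exportMessage(message: unknown): Promise<void>

export async function startExportsConsumer(): Promise<void> {
    const kafka = new Kafka({ clientId: 'plugin-server', brokers: ['localhost:9092'] })
    const consumer = kafka.consumer({ groupId: 'exports', sessionTimeout: 30000 })
    await consumer.connect()
    await consumer.subscribe({ topics: ['exports'] })
    await consumer.run({
        eachBatch: async ({ batch, resolveOffset, heartbeat }: EachBatchPayload) => {
            for (const message of batch.messages) {
                await exportMessage(message)
                resolveOffset(message.offset)
                // Heartbeat after every message; KafkaJS only actually sends one
                // when the heartbeat interval has elapsed, so this is cheap and
                // keeps slow batches from exceeding the session timeout.
                await heartbeat()
            }
        },
    })
}
```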
* wip
* fix tests
* migration to add the new column
* and populate it
* fix
* Update query snapshots
* test assertions
---------
Co-authored-by: github-actions <41898282+github-actions[bot]@users.noreply.github.com>
We collect console logs in session recordings but you can't filter to find recordings that have them.
Worse, when we move to storing recordings in blob storage it would be impossible to filter for them.
Changes
adds new columns to the session replay summary table that will let us store counts of log, warn, and error levels from console logs collected with recordings
alters the recordings-ingestion consumer to count those three levels of log
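The counting is roughly this shape (a sketch; the console log entry shape and the column names are assumptions, not the real payload):

```ts
// Hedged sketch: tally console log levels for a session before writing the
// summary row. The entry shape here is illustrative, not the rrweb payload.
interface ConsoleLogEntry {
    level: string // e.g. 'log', 'info', 'warn', 'error', ...
}

export interface ConsoleLogLevelCounts {
    console_log_count: number
    console_warn_count: number
    console_error_count: number
}

export function countConsoleLogLevels(entries: ConsoleLogEntry[]): ConsoleLogLevelCounts {
    const counts: ConsoleLogLevelCounts = {
        console_log_count: 0,
        console_warn_count: 0,
        console_error_count: 0,
    }
    for (const entry of entries) {
        // Collapse the browser's level names into the three buckets we store.
        if (entry.level === 'warn') {
            counts.console_warn_count += 1
        } else if (entry.level === 'error') {
            counts.console_error_count += 1
        } else {
            counts.console_log_count += 1
        }
    }
    return counts
}
```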
* Use same logic for `[person]` webhook token as person display in the app
* Allow accessing nested properties in webhook message
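The nested-property access is roughly this shape (a sketch; the exact token syntax the webhook formatter supports may differ):

```ts
// Hedged sketch: resolve a dotted path such as 'properties.plan' against an
// arbitrary object, returning undefined when any segment is missing.
export function getNestedProperty(obj: unknown, path: string): unknown {
    return path.split('.').reduce<unknown>((current, key) => {
        if (current && typeof current === 'object' && key in (current as Record<string, unknown>)) {
            return (current as Record<string, unknown>)[key]
        }
        return undefined
    }, obj)
}

// e.g. getNestedProperty(event, 'properties.$browser') -> 'Chrome'
```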
* Update hooks.test.ts
* Fix team fetching test
* Update query snapshots
* Update query snapshots
---------
Co-authored-by: github-actions <41898282+github-actions[bot]@users.noreply.github.com>
* Revert "perf(ingestion): use rdkafka consumer for both ingestion queues (#15695)"
This reverts commit fea9e4d77c.
* format
* fix split test
* no really, fix the test
---------
Co-authored-by: Harry Waye <harry@posthog.com>
* less pending chunk logging
* the flush threshold multiplier is too confusing to operate
* must provide all chunk offsets and the correct chunks when processing pending chunks
* feat: key blobs by event times
* remove TODO
* softly softly
* Update plugin-server/src/main/ingestion-queues/session-recording/blob-ingester/session-manager.ts
Co-authored-by: Ben White <ben@posthog.com>
---------
Co-authored-by: Ben White <ben@posthog.com>
* chore(ingestion): dont wait on logs to be persisted to Kafka
Given how poor concurrency currently is around batching, we end up waiting a long
time for logs to be persisted to Kafka. We don't need to guarantee delivery of
logs, so instead we let these be produced asynchronously. We still want to wait
for the message to have been handed off to librdkafka's internal queue, so we
still await, but resolve as soon as that hand-off has happened.
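Roughly, the produce helper ends up shaped like this (a sketch using node-rdkafka's HighLevelProducer; the helper name and `waitForAck` option are illustrative, not the actual implementation):

```ts
import { HighLevelProducer } from 'node-rdkafka'

// Hedged sketch: when waitForAck is false we resolve as soon as the message is
// in librdkafka's internal queue, instead of waiting for the delivery report.
export function produce(
    producer: HighLevelProducer,
    topic: string,
    value: Buffer,
    key: Buffer | null,
    waitForAck = true
): Promise<void> {
    return new Promise((resolve, reject) => {
        producer.produce(topic, null, value, key, Date.now(), (error: unknown) => {
            // Delivery report callback: only relevant when we asked for an ack.
            if (waitForAck) {
                error ? reject(error) : resolve()
            }
        })
        if (!waitForAck) {
            // Hand-off to librdkafka's internal queue is enough for logs.
            resolve()
        }
    })
}
```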
---------
Co-authored-by: Harry Waye <harry@posthog.com>
* fix: from logs to changes
* Update plugin-server/src/main/ingestion-queues/session-recording/blob-ingester/session-manager.ts
* use the key from the map instead of calculating it
* 🙈
* chore: worrying at why the blob ingester gets unhappy
* log when file deletion succeeds
* can you wait for e2e ci step without a specified build name
* wait on build posthog cloud?
* disable the step for now
* rugh
This _should_ give us better performance and reliability, but it's
hard to tell without a lot of testing. Will monitor closely on rollout.
Note that this will require deleting the old consumer group members, as they
are using the eager round robin partition strategy, whereas this uses
the cooperative sticky partition strategy. librdkafka doesn't support
mixing the two, unlike the Java Kafka Client.
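For reference, the strategy is just a consumer config property in librdkafka; a minimal sketch (broker list and group id are placeholders):

```ts
import { KafkaConsumer } from 'node-rdkafka'

const consumer = new KafkaConsumer(
    {
        'metadata.broker.list': 'localhost:9092',
        'group.id': 'session-recordings-blob',
        // Incremental (cooperative) rebalancing; cannot be mixed with members of
        // the same group still on the default eager 'range,roundrobin' strategy.
        'partition.assignment.strategy': 'cooperative-sticky',
    },
    {}
)
```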
---------
Co-authored-by: Tiina Turban <tiina303@gmail.com>
Problem
see #15200 (comment)
When we store session recording events we materialize a lot of information using the snapshot data column.
We'll soon stop storing the snapshot data, so we won't be able to use it to materialize that information; we need to capture it earlier in the pipeline. Since this is only used for searching for/summarizing recordings, we don't need to store every event.
Changes
We'll push a summary event to a new kafka topic during ingestion. ClickHouse can ingest from that topic into an aggregating merge tree table, so that we store (in theory, although not in practice) only one row per session.
add config to turn this on and off by team in plugin server
add code behind that to write session recording summary events to a new topic in kafka
add ClickHouse tables to ingest and aggregate those summary events
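The plugin-server side is roughly this shape (a sketch; the config flag name, topic name, and payload shape are illustrative assumptions, not the real ones):

```ts
// Hedged sketch of the per-team gating plus the produce step.
interface PluginServerConfig {
    SESSION_RECORDING_SUMMARY_INGESTION_ENABLED_TEAMS: string // e.g. '2,1337' or 'all'
}

function summaryEnabledForTeam(config: PluginServerConfig, teamId: number): boolean {
    const setting = config.SESSION_RECORDING_SUMMARY_INGESTION_ENABLED_TEAMS
    if (setting === 'all') {
        return true
    }
    return setting.split(',').map((t) => parseInt(t, 10)).includes(teamId)
}

async function emitSummaryEvent(
    produce: (topic: string, key: string, value: string) => Promise<void>,
    config: PluginServerConfig,
    teamId: number,
    sessionId: string,
    summary: Record<string, unknown>
): Promise<void> {
    if (!summaryEnabledForTeam(config, teamId)) {
        return
    }
    // Key by session id so all summary rows for a session land on one partition,
    // keeping the ClickHouse aggregation per session well ordered.
    await produce(
        'session_replay_summary_events', // placeholder topic name
        sessionId,
        JSON.stringify({ team_id: teamId, session_id: sessionId, ...summary })
    )
}
```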
* chore(plugin-server): use librdkafka producer everywhere
We saw some 10x improvements in the throughput for session recordings.
Hopefully there will be more improvements here as well, although it's a
little less clear cut.
I don't try to provide any improvements in guarantees around message
production here.
* we still need to enable snappy for kafkajs
* fix: Fixed the logic for assigning and revoking partitions
* Fix
* reverse the offset manager revoke logic
---------
Co-authored-by: Paul D'Ambra <paul@posthog.com>
* chore(plugin-server): remove piscina workers
Using Piscina workers introduces complexity that we would rather
avoid. It does offer the ability to scale work across multiple CPUs,
but we can achieve this via starting multiple processes instead. It may
also provide some protection from deadlocking the worker process, which
I believe Piscina will handle by killing worker processes and
respawning, but we have K8s liveness checks that will also handle this.
This should simplify 1. prom metrics exporting, and 2. using
node-rdkafka.
* remove piscina from package.json
* use createWorker
* wip
* wip
* wip
* wip
* fix export test
* wip
* wip
* fix server stop tests
* wip
* mock process.exit everywhere
* fix health server tests
* Remove collectMetrics
* wip
* chore(recordings): use cooperative-sticky rebalance strategy
This should make rebalances and lag during deploys a little less
painful. I'm setting this as the globally used strategy, when we e.g.
want to use another strategy for a specific consumer group, we can make
this configurable.
* disable rebalance_callback
* use node-rdkafka-acosom fork instead, for cooperative support
* chore(recordings): Add librdkafka to recordings consumer
This is the sister PR to the change to use the librdkafka producer in
the recordings consumer.
Things of interest here:
1. we use offset auto commit
2. we handle storing the offset ourselves, after the message has been
processed
3. we do everything concurrently
4. we implement back pressure based on the number of messages in
flight
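A rough sketch of points 1, 2 and 4 (the config keys are standard librdkafka options; broker/group/topic names and the in-flight limit are placeholders):

```ts
import { KafkaConsumer, Message } from 'node-rdkafka'

declare function handleRecordingMessage(message: Message): Promise<void>

const MAX_IN_FLIGHT = 1000
let inFlight = 0

const consumer = new KafkaConsumer(
    {
        'metadata.broker.list': 'localhost:9092',
        'group.id': 'session-recordings',
        // Commit whatever offsets we have *stored*, on librdkafka's timer...
        'enable.auto.commit': true,
        // ...but never store an offset automatically; we do that ourselves
        // once the message has actually been processed.
        'enable.auto.offset.store': false,
    },
    {}
)

consumer.connect()
consumer.on('ready', () => {
    consumer.subscribe(['session_recording_events'])
    consumer.consume()
})

consumer.on('data', (message: Message) => {
    inFlight += 1
    if (inFlight >= MAX_IN_FLIGHT) {
        // Back pressure: stop fetching until enough messages have completed.
        consumer.pause(consumer.assignments())
    }
    void handleRecordingMessage(message).then(() => {
        // Store the offset only after processing, so a crash replays the message.
        consumer.offsetsStore([{ topic: message.topic, partition: message.partition, offset: message.offset + 1 }])
        inFlight -= 1
        if (inFlight < MAX_IN_FLIGHT) {
            consumer.resume(consumer.assignments())
        }
    })
})
```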
* Update plugin-server/src/kafka/admin.ts
Co-authored-by: Xavier Vello <xavier.vello@gmail.com>
* Update plugin-server/src/kafka/admin.ts
Co-authored-by: Xavier Vello <xavier.vello@gmail.com>
* Update plugin-server/src/kafka/admin.ts
Co-authored-by: Xavier Vello <xavier.vello@gmail.com>
* Update plugin-server/src/kafka/consumer.ts
Co-authored-by: Xavier Vello <xavier.vello@gmail.com>
* Update plugin-server/src/kafka/consumer.ts
Co-authored-by: Xavier Vello <xavier.vello@gmail.com>
* add default queued values
* clarify linger
---------
Co-authored-by: Xavier Vello <xavier.vello@gmail.com>
* Revert "Revert "perf(recordings): use node-librdkafka for ingester production" (#15069)"
This reverts commit ac5e084f48.
* fix(plugin-server): only set ssl config when defined
Hopefully this means it will use the global CA bundle.
* hack: enable debug logs
* honor KAFKAJS_LOG_LEVEL envvar
* add SegfaultHandler
* disable ssl verification
* debug -> info
* only log brokers
* Revert "add SegfaultHandler"
This reverts commit b22f40b802.
---------
Co-authored-by: Xavier Vello <xavier@posthog.com>
Previously we've been using the KafkaJS Producer with a wrapper around
it to handle batching. There are a number of issues with the batching
implementation e.g. not having a way to provide guarantees on delivery
and rather than fix that, we can simply use the librdkafka Producer
which is a lot more mature and battle-tested.
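Creating the producer is roughly this shape (a sketch; the config values are illustrative defaults, not what we ship):

```ts
import { HighLevelProducer } from 'node-rdkafka'

export function createKafkaProducer(brokers: string): Promise<HighLevelProducer> {
    const producer = new HighLevelProducer({
        'metadata.broker.list': brokers,
        'linger.ms': 20, // let librdkafka batch messages for up to 20ms
        'queue.buffering.max.messages': 100000, // internal queue size before produce() errors
        'compression.codec': 'snappy',
    })
    return new Promise((resolve, reject) => {
        // Resolve only once the client is connected and has broker metadata.
        producer.connect({}, (error) => (error ? reject(error) : resolve(producer)))
    })
}
```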
* chore: fix flakey session recordings error case test
We need to make sure that we hit the case where the events are produced
for ClickHouse ingestion. The signature of the `queueMessage` function
is interesting in that its behaviour depends on some heuristics as to
whether it will flush, and therefore reject, or not.
I would like to change this behaviour, but my preference would be to
move to rdkafka first and introduce more sensible behaviour then.
* Add call expects
* chore(kafka): ensure retry on kafkajs produce failure
This is a fix to ensure that we do not simply drop events when Kafka is
e.g. down. We were previously catching the KafkaJSError, but it seems the
calls are always run through the `retry` function, which means we always
get a KafkaJSNumberOfRetriesExceeded error.
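The handling is roughly this shape (a sketch; error-name matching is used rather than assuming which classes kafkajs exports at runtime, and DependencyUnavailableError stands in for however we surface retriable failures):

```ts
// Hedged sketch: the original error ends up wrapped in
// KafkaJSNumberOfRetriesExceeded once KafkaJS exhausts its internal retries,
// so matching only on KafkaJSError at the produce call site is not enough.
async function produceWithRetryHandling(send: () => Promise<void>): Promise<void> {
    try {
        await send()
    } catch (error) {
        const name = (error as Error | undefined)?.name
        if (name === 'KafkaJSNumberOfRetriesExceeded' || name === 'KafkaJSError') {
            // Re-throw as a retriable dependency failure so the batch is retried
            // instead of the event being dropped. (Hypothetical error type.)
            throw new DependencyUnavailableError('Kafka', error as Error)
        }
        throw error
    }
}

declare class DependencyUnavailableError extends Error {
    constructor(dependency: string, cause: Error)
}
```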
* wip
* use real timers
* chore(recordings): remove hub dependency on recordings ingestion
Hub is a grab bag of dependencies that are not all required for
recordings ingestion. To keep the recordings ingestion lean, we
remove the hub dependency and use the postgres and kafka client
directly.
This should increase the availability of the session recordings
workload, e.g. it should not go down if Redis or ClickHouse is down.
* fix capabilities call
* reuse clients if available
* wip
* wip
* wip
* fix tests
* fix healthcheck
* team-manager: expire negative lookups after 5 minutes, improve docs
* populateTeamDataStep: don't drop token, keep team_id from capture if present, report results
* ingestEvent: run all analytic events through runLightweightCaptureEndpointEventPipeline
* continue accepting events with no token but a team_id
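The negative-lookup expiry mentioned above is roughly this shape (a sketch; the cache structure and TTL values are illustrative):

```ts
// Hedged sketch: cache team lookups by token, with a shorter-lived entry for
// negative (not-found) results so typos/revoked tokens don't hammer postgres,
// but a newly created team is still picked up within ~5 minutes.
interface Team {
    id: number
    name: string
}

interface CacheEntry<T> {
    value: T
    cachedAt: number
}

const TEAM_TTL_MS = 2 * 60 * 1000
const NEGATIVE_TTL_MS = 5 * 60 * 1000

const tokenCache = new Map<string, CacheEntry<Team | null>>()

export async function getTeamByToken(
    token: string,
    fetchFromDb: (token: string) => Promise<Team | null>
): Promise<Team | null> {
    const cached = tokenCache.get(token)
    if (cached) {
        const ttl = cached.value === null ? NEGATIVE_TTL_MS : TEAM_TTL_MS
        if (Date.now() - cached.cachedAt < ttl) {
            return cached.value
        }
    }
    const team = await fetchFromDb(token)
    tokenCache.set(token, { value: team, cachedAt: Date.now() })
    return team
}
```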
* fix(slowlane-ingestion): Clarify in the warning we are still processing
* docs(slowlane-ingestion): Clarify we are re-producing when running with ingestionOverflow enabled
* refactor(slowlane-ingestion): Set key to null during batching to process in parallel
* refactor(slowlane-ingestion): Simplify batching logic and send warning on eachMessage
* fix: Add missing whitespace in comment
Co-authored-by: Tiina Turban <tiina303@gmail.com>
* test(slowlane-ingestion): Assert event pipeline doesn't run if overflowing
* fix: Check for batch length bigger than batchSize
Co-authored-by: Tiina Turban <tiina303@gmail.com>
* refactor(slowlane-ingestion): Raise warning on overflow consumer instead
* test(slowlane-ingestion): Add tests for overflow consumer
* refactor(slowlane-ingestion): Use groupIntoBatches utility in overflow consumer
---------
Co-authored-by: Tiina Turban <tiina303@gmail.com>
* feat(ingestion-slowlane): Add token-bucket utility
* feat(ingestion-slowlane): Re-route overflow events
* fix: Import missing stringToBoolean
* fix(ingestion-slowlane): Flip around kafka topics according to mode
* refactor(ingestion-slowlane): Use dash instead of underscore in filename
* fix(ingestion-slowlane): Do not increase tokens beyond bucket capacity
* feat(ingestion-slowlane): Add ingestion-overflow mode/capability/consumer
* feat(ingestion-slowlane): Add ingestion warning for capacity overflow
* test(ingestion-slowlane): Add test for ingestion of overflow events
* fix(ingestion-slowlane): Rate limit warnings to 1 per hour
* test(ingestion-slowlane): Add a couple more tests for overflow re-route
* fix(slowlane-ingestion): Look at batch topic to determine message topic
* refactor(slowlane-ingestion): Use refactored consumer model
* fix(slowlane-ingestion): Undo topic requirement in eachMessageIngestion
* refactor(slowlane-ingestion): Only produce events if ingestionOverflow is also enabled
* refactor(slowlane-ingestion): Use an env variable to determine if ingestionOverflow is enabled
* chore(slowlane-ingestion): Add a comment explaining env variable
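The token-bucket utility from the first commit above is roughly this shape (a sketch; the capacity/rate numbers and the key format are illustrative):

```ts
// Hedged sketch of a token bucket keyed by e.g. `${teamId}:${distinctId}`.
export class TokenBucket {
    private buckets = new Map<string, { tokens: number; lastReplenished: number }>()

    constructor(private capacity: number, private replenishRatePerSecond: number) {}

    // Returns false when the key has exhausted its tokens, i.e. should overflow.
    consume(key: string, tokens = 1, now = Date.now()): boolean {
        let bucket = this.buckets.get(key)
        if (!bucket) {
            bucket = { tokens: this.capacity, lastReplenished: now }
            this.buckets.set(key, bucket)
        }
        // Replenish proportionally to elapsed time, never above capacity.
        const elapsedSeconds = (now - bucket.lastReplenished) / 1000
        bucket.tokens = Math.min(this.capacity, bucket.tokens + elapsedSeconds * this.replenishRatePerSecond)
        bucket.lastReplenished = now

        if (bucket.tokens < tokens) {
            return false
        }
        bucket.tokens -= tokens
        return true
    }
}

// e.g. const limiter = new TokenBucket(1000, 10) // burst of 1000, 10 events/s sustained
// if (!limiter.consume(`${teamId}:${distinctId}`)) { /* re-route to the overflow topic */ }
```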
* refactor(ingestion): pull out topic/groupid from kafka-queue
We have `IngestionConsumer` at the moment, which holds a lot of complexity
regarding topics/groupid/message handlers. This is a step towards
moving that logic out of the `IngestionConsumer`, and making the top
level of the pluginsServer simpler to reason about.
* wip
* wip
* wip
* wip
This is intended to make the pipeline a little more readable by moving
the control flow out of the steps and into the runner. It also makes
it easier to add new steps to the pipeline.
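Conceptually the runner ends up looking something like this (a sketch, not the real step names or types):

```ts
// Hedged sketch of a runner that owns the control flow: each step returns the
// input for the next step, or null to stop the pipeline.
type Step<In, Out> = (input: In) => Promise<Out | null>

export async function runPipeline<T>(event: T, steps: Array<Step<any, any>>): Promise<unknown | null> {
    let current: unknown = event
    for (const step of steps) {
        current = await step(current)
        if (current === null) {
            // A step decided to drop the event (e.g. an opted-out team); the
            // runner, not the step, decides that nothing further runs.
            return null
        }
    }
    return current
}
```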
Co-authored-by: Tiina Turban <tiina303@gmail.com>
* refactor(recordings): remove session code from event pipeline
We have moved session recordings to a separate topic and consumer. There
may be session recordings in the old topic, but we divert these to the
new logic for processing them.
* refactor to just send to the new topic!
* fix import
* remove empty line
* fix no team_id test
* implement recordings opt in
* remove old $snapshot unit tests
* remove performance tests
* Update plugin-server/functional_tests/session-recordings.test.ts
Co-authored-by: Tiina Turban <tiina303@gmail.com>
* Update plugin-server/functional_tests/session-recordings.test.ts
Co-authored-by: Tiina Turban <tiina303@gmail.com>
* add back $snapshot format test
* Add comment re functional test assumptions
Co-authored-by: Tiina Turban <tiina303@gmail.com>
* migration for person/group property support in property definitions table
* Use database default
* Validate correct constraint
* Ingest person and group type property definitions
* Exclude person/group type definitions from API
* Update property definitions test
* Ignore $groups
* Add new unique index which accounts for type and group_type_index
* Run new code only in test
* Ignore errors from propertyDefinitionsManager which may occur due to migrations
* Update constraint name
* Update test describe
* ON CONFLICT based on the index expression
* Add a -- not-null-ignore
* Combine migrations
* Remove some test code temporarily
* fixup latest_migrations
* chore(session-recordings): separate topics for events as recordings
WIP
* fix tests
* Use simpler consumer for session recordings
* wip
* still batch things by batchSize
* add tests, improve comments
* rename topic var
* push performance_events to session recordings topic also
* Add completely separate consumer for session-recordings
* wip
* use session_id for partition key
* fix test
* handle team_id/token null
* wip
* fix tests
* wip
* use kafka_topic var in logs
* use logger
* fix test
* Fix $performance_event topic usage
* fix tests
* fix check for null/undefined
* Update posthog/api/capture.py
Co-authored-by: Tomás Farías Santana <tomas@tomasfarias.dev>
* Add test for kafka error handling
* Remove falsy teamId check
* fix statsd error
* kick ci
* Use existing getTeamByToken
* remove partition key from recordings
* Make sure producer is connected !
* fix session id kafka key test
* add back throws!
* set producer on each test
* skip flaky test
* add flush error logs
* wait for persons to be ingested
* fix skip
Co-authored-by: Tomás Farías Santana <tomas@tomasfarias.dev>
* feat(person-on-events): add option to delay all events
This change implements the option outlined in
https://github.com/PostHog/product-internal/pull/405
Here I do not try to make any large structural changes to the code; I'll
leave that for later, although it does mean the code has a few loose
couplings between pipeline steps that probably should be strongly
coupled. I've tried to comment these to make the couplings clear.
I've also added a workflow to run the functional tests against both
configurations, which we can remove once we're happy with the new
implementation.
Things of note:
1. We can't enable this for all users yet, not without the live events
view and not without verifying that the buffer size is sufficiently
large. We can however enable this for the test team and verify that
it functions as expected.
2. I have not handled the case mentioned in the above PR regarding
guarding against processing the delayed events before all events in
the delay window have been processed.
wip
test(person-on-events): add currently failing test for person on events
This test doesn't work with the previous behaviour of the
person-on-events implementation, but should pass with the new delay all
events behaviour.
* add test for KafkaJSError behaviour
* add comment re delay
* add test for create_alias
* chore: increase exports timeout
It seems to fail in CI, but only for the tests with delayed events enabled.
I'm not sure why, but I'm guessing it's because the events are further
delayed by the new implementation.
* chore: fix test
* add test for ordering of person properties
* use ubuntu-latest-8-cores runner
* add tests for plugin processEvent
* chore: ensure plugin processEvent isn't run multiple times
* expand on person properties ordering test
* wip
* wip
* add additional test
* change fullyProcessEvent to onlyUpdatePersonIdAssociations
* update test
* add test to ensure person properties do not propagate backwards in time
* simplify person property tests
* weaken guarantee in test
* chore: make sure we don't update properties on the first parse
We should only be updating person_id and associated distinct_ids on first
parse.
* add tests for dropping events
* increase export timeout
* increase historical exports timeout
* increase default waitForExpect interval to 1 second
* chore(plugin-server): distribute scheduled tasks
Changes I've made here from the original PR:
1. add some logging of task run times
2. add concurrency, except only one task of a plugin will run at a time
3. add a timeout to task run times
This reverts commit 23db43a0dc.
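Points 1 and 3 above look roughly like this (a sketch; the timeout value and the logging are illustrative):

```ts
// Hedged sketch: time each scheduled task and stop waiting on it after a
// timeout. Note that Promise.race only stops *waiting*; it does not cancel
// the underlying task.
const TASK_TIMEOUT_MS = 30 * 1000

export async function runScheduledTask(taskType: string, pluginConfigId: number, run: () => Promise<void>): Promise<void> {
    const startedAt = Date.now()
    let timer: NodeJS.Timeout | undefined
    try {
        await Promise.race([
            run(),
            new Promise<never>((_, reject) => {
                timer = setTimeout(
                    () => reject(new Error(`${taskType} for plugin config ${pluginConfigId} timed out`)),
                    TASK_TIMEOUT_MS
                )
            }),
        ])
    } finally {
        if (timer) {
            clearTimeout(timer)
        }
        // Point 1: record how long the task took.
        console.info(`scheduled task ${taskType} (pluginConfigId=${pluginConfigId}) took ${Date.now() - startedAt}ms`)
    }
}
```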
* chore: add timings for scheduled tasks runtime
* chore: add timeouts for scheduled tasks
* chore: clarify duration unit
* chore: deduplicate tasks in a batch, add partition concurrency
* chore: add flag to switch between old and new behaviour
This defaults to new, but can be set to old by setting environment
variable `USE_KAFKA_FOR_SCHEDULED_TASKS=false`
* fix tests
* enable USE_KAFKA_FOR_SCHEDULED_TASKS in tests
* Revert "Revert "feat(plugin-server): distribute scheduled tasks i.e. runEveryX" (#13087)"
This reverts commit 78e6f48660.
* fix(plugin-server): ignore old cron tasks from graphile-worker
When we are backed up on jobs, we end up still creating tasks in the
graphile-worker job table, i.e. there is no backpressure. This change
makes us skip over old tasks, so that we don't get backed up.
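Roughly (a sketch; the age threshold, logging, and the task/helper names are illustrative):

```ts
import { Task } from 'graphile-worker'

// Hypothetical helper standing in for running the scheduled task for all plugin configs.
declare function runTasksForAllPluginConfigs(taskType: string): Promise<void>

// Hedged sketch: drop cron tasks that are too old to be worth running.
const MAX_TASK_AGE_MS = 60 * 60 * 1000

export const runEveryMinute: Task = async (_payload, helpers) => {
    const scheduledFor = helpers.job.run_at.getTime()
    if (Date.now() - scheduledFor > MAX_TASK_AGE_MS) {
        // We're backed up; skipping stale work is better than falling further behind.
        helpers.logger.info('Skipping stale scheduled task', { jobId: helpers.job.id })
        return
    }
    await runTasksForAllPluginConfigs('runEveryMinute')
}
```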
* fix tests
* feat(plugin-server): distribute scheduled tasks i.e. runEveryX
At the moment we only run on whichever Graphile worker node picks up
the scheduled tasks. Tasks are run in sequence, running through each of
the associated pluginConfigIds.
We tried to spread the workload by creating a Graphile Worker job for
each pluginConfigId, but this caused a lot of load on the Graphile
Worker database.
One thing this PR doesn't tackle is what happens if we end up with the
jobs backing up. There is probably some logic we should add to avoid really
old scheduled tasks from running.
* wip
* wip
* fix tests
* fix tests
* types
* update unit test
* add key
* fix order
* Update plugin-server/src/main/ingestion-queues/scheduled-tasks-consumer.ts
* chore: skip stale scheduled tasks
* update comments
* add statsd counter
* add support for token field in kafka message
* formPipelineEvent
* rename pipeline files according to new order
* wip team_id and anonymize ips
* conditional handlers and tests
* some plugin server fixes
* fix capture bug
* fix
* more fixes
* fix capture tests
* pipeline update
* fix + investigate database resets
* fix import order
* testing and typing updates
* add test for capture endpoint
* testing
* python typing
* plugin server test
* functional test
* fix test
* another fix
* make sure no team ids clash in tests
* fix
* add more metrics and logs
* cache nulls
* updates
* add more metrics
* chore(ingestion): remove graphile as dependency of ingestion pipeline
This allows us to run just the ingestion part of the plugin-server
without needing to perform any graphile operations e.g. creating
connections to the graphile database.
This has the advantages that:
1. if the graphile database is down, the ingestion pods can still start
up and will function correctly.
2. it avoids creating a connection pool to the graphile database for each
ingestion pod, which could be a lot of connections and could force
the database to scale up.
3. it avoids running the graphile migrations on each ingestion pod, which
is unnecessary and could cause unnecessary database load.
* wip
* wip
* wip
* wip
* fix(person-on-events): Fix groups caching in ingestion
We were seeing some groups-related events never get ingested in
playground. Digging in, it turned out that these events were serialized
with invalid timestamps due to cache containing dates in different
formats.
The bug was introduced in https://github.com/PostHog/posthog/pull/12403
and makes for a good case study for this common class of errors.
There were multiple practices that could have indicated the error sooner:
1. Tests for the feature mocked out the DB and used a different data
format than the one actually used
2. Some methods related to caching were not properly updated to test the
caching logic
3. timestamps-as-strings: we deal with both ISO and clickhouse-format
timestamps, and the code didn't differentiate between them properly
4. `getGroupsColumns` signature was very loose, allowing almost anything
to pass through
This change fixes the issue as well as updates relevant code to be more
in-line with best practices.
* Solve minor typing related issue
* feat(ingestion): buffer events in kafka if postgres is down
* also add DependencyUnavailableError to transaction
* Update plugin-server/src/utils/db/db.ts
* chore(ingestion): remove old graphile bufferJob handling
This removes the emitting of graphile-worker events from the ingestion
anonymous events path. Note that we still have the graphile worker
running on ingestion, as we need to ensure that we have drained all of
these jobs. I'll handle this by first enabling the topic for all users
on prod then deploying this.
For self hosted I suggest we just go with adding a comment that
anonymous events that have been sent to graphile in the meantime will be
lost. Or something else that makes sense.
* fix typing
* remove test
* chore(plugins-server): use Kafka to buffer app jobs requests
To remove the dependency on the Graphile Worker database on things that
may be requesting app job runs we push the jobs to a Kafka topic.
* chore: use KAFKA_JOBS instead of string literal `'jobs'`
* chore: rename startJobsBufferConsumer -> startJobsConsumer
* avoid checking eventId
* fix lint
* fix producer wrapper tests
* fix retries test
* handle offset sync
* wip
* wip
* remove exports
* do better
* use Producer not wrapper
* reset db
* mock once
* Add test for raising to the consumer
* Update plugin-server/tests/main/ingestion-queues/run-async-handlers-event-pipeline.test.ts
Co-authored-by: Yakko Majuri <38760734+yakkomajuri@users.noreply.github.com>
* and in the darkness bind them
* fix tests
* don't forget the name update!
* rename DependencyError to DependencyUnavailable
* separate dlq
* update comment
Co-authored-by: Yakko Majuri <38760734+yakkomajuri@users.noreply.github.com>
* refactor(plugin-server): simplify groups caching
* add multi groups test
* remove comments
* fix type, add debug
* fix
* stringify
* add groups created_at to types
* more test fixes
* use the right clickhouse timestamp format
* update created at to ch format in tests
* finally
* more fixes
* rename legacy references to queue to more appropriate worker terminology
* rename startJobsConsumer -> startGraphileWorker, no-op refactor
* add back enqueue success and failure metrics
* fix mock import
* fix test for good
* fix(plugin-server): Remove wild clickhouseQuery in ingestion pipeline
Point queries against clickhouse are slow and we should avoid them.
They're also not instrumented.
The postgres table already used in the method previously contains the
right data. Use that instead.
Reference: https://github.com/PostHog/posthog/blob/master/posthog/models/cohort/cohort.py#L274-L316
* Fixup and test doesPersonBelongToCohort
* Handle NULLs
* chore(plugin-server): remove healthcheck topic references
Rather than doing an end to end produce/consume from this topic, we
instead rely on the instrumentation of KafkaJS to understand if the
consumer is ready.
Note that this code is not being used since the change to just return an
HTTP 200 from the liveness endpoint:
https://github.com/PostHog/posthog/pull/11234
This is just a cleanup of dead code.
* Remove Kafka healthcheck tests
* fix issues with fetchPerson() and add tests
- fetchPerson() returned extra columns that were not needed
* Add LazyPersonContainer class
* Load person data lazily through the event pipeline
* Make webhooks and action matching lazy
* Update runAsyncHandlersStep
* Return own person properties in process-event.ts
* Remove snapshots that caused pain
* Handle serialization of LazyPersonContainer
* Merge: Handle LHS only existing
.get() would be cached in that case not to do a query, which we can
avoid
* Serialize result args as well
* Make personContainer functional
* Resolve feedback
* Include kafka topic for setup
* Sample runEventPipeline/runBufferEventPipeline comparatively less frequently
This is done by duration: we still want the long transactions, but not
the short ones
* Trace enqueue plugin jobs
* Trace node-fetch
* Trace worker creation
* Various fixes
* Line up query tags properly
* Make fetch mocking work
* Resolve typing-related issues
* feat(plugin-server): Use Snappy compression codec for kafka production
This helps avoid 'message too large' type errors (see
https://github.com/PostHog/posthog/pull/10968) by compressing in-flight
messages.
I would have preferred to use zstd, but the libraries did not compile
cleanly on my machine.
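For reference, wiring the codec in is roughly this (a sketch assuming the kafkajs-snappy package; topic and message contents are illustrative):

```ts
import { CompressionTypes, CompressionCodecs, Kafka } from 'kafkajs'
// kafkajs has no built-in snappy support; the codec comes from a separate package.
import SnappyCodec from 'kafkajs-snappy'

CompressionCodecs[CompressionTypes.Snappy] = SnappyCodec

export async function sendCompressed(): Promise<void> {
    const kafka = new Kafka({ clientId: 'plugin-server', brokers: ['localhost:9092'] })
    const producer = kafka.producer()
    await producer.connect()
    // Compression is then requested per send().
    await producer.send({
        topic: 'events_plugin_ingestion', // illustrative topic
        compression: CompressionTypes.Snappy,
        messages: [{ key: 'some-uuid', value: JSON.stringify({ event: '$pageview' }) }],
    })
}
```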
* Update tests
* chore(plugin-server): include extra information on kafka producer errors
We're failing to send batches of messages to kafka on a semi-regular
basis due to message sizes. It's unclear why this is the case as we try
to limit each message batch size.
This PR adds information on these failed batches to sentry error
messages.
Example error: https://sentry.io/organizations/posthog2/issues/3291755686/?project=6423401&query=is%3Aunresolved+level%3Aerror
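The extra context is roughly this shape (a sketch; the field names attached to the Sentry event are illustrative):

```ts
import * as Sentry from '@sentry/node'

// Hedged sketch: attach batch statistics to the Sentry event so 'message too
// large' failures can be debugged from the error itself.
export function reportProduceFailure(
    error: Error,
    batch: { topic: string; messages: Array<{ value: Buffer | string | null }> }
): void {
    Sentry.captureException(error, {
        extra: {
            topic: batch.topic,
            messageCount: batch.messages.length,
            estimatedBatchSizeBytes: batch.messages.reduce(
                (sum, m) => sum + (m.value ? Buffer.byteLength(m.value) : 0),
                0
            ),
        },
    })
}
```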
* refactor(plugin-server): Remove Buffer.from from kafka messages
This allows us to be much more accurate estimating message sizes,
hopefully eliminating a class of errors
* estimateMessageSize
* Track histogram with message sizes
* Flush immediately for too large messages
* fud
* chore(plugin-server): Consume from buffer topic
* Refactor `posthog` extension for buffering
* Properly form `bufferEvent` and don't throw error
* Add E2E test
* Test buffer more end-to-end and properly
* Put buffer-enabled test in a separate file
* Update each-batch.test.ts
* Test that the event goes through the buffer topic
* Fix formatting
* Refactor out `spyOnKafka()`
* Ensure reliability batching-wise
* Send heartbeats every so often
* Make test less flaky
* Commit offsets if necessary before sleep too
* Update tests
* Use seek-based mechanism (with KafkaJS 2.0.2)
* Add comment to clarify seeking
* Update each-batch.test.ts
* Make minor improvements
* Remove onAction
* Avoid fetching actions that don't deal with REST - 99% reduction
* Plural hooks
* Avoid hook fetching where not needed
* Remove dead code
* Update lazy VM test
* Rename a function
* Update README
* Explicit reload actions in tests
* Only reload actions which are relevant for plugin server
* Remove excessive logging
* Reload actions when hooks are updated
* update action matching tests
* Remove commented code
* Solve naming issues
* WIP: Move person creation earlier
* WIP: move person updating, handle person property changing
* WIP: leverage person information
* Update `updatePersonDeprecated` signature
* Avoid (and test avoiding) unneeded lookups of whether 'creating' a person is needed
Note there were two tricky interactions within handleIdentify, which
again got solved by indirect message passing.
* Solve TODO
* Normalize event before updatePersonIfTouchedByPlugins
* Avoid another lookup for person in updatePersonProperties
* Avoid lookup for newPerson in handleIdentifyOrAlias
* Add kludge comments
* Fix runBufferEventPipeline
* Rename upsertPersonsStep => processPersonsStep
* Update emitToBufferStep tests
* Update some event pipeline step tests
* Update prepareEventStep tests
* Test processPersonStep
* Add tests for updatePersonIfTouchedByPlugins step
* Update runner tests
* verify person version in event-pipeline-integration test
* Update process-event test suite
* Argument ordering for person state tests
* Update runner test snapshots
* Cast to UTC
* Fixup person-state tests
* Don't refetch persons needlessly on $identify
* Add missing version assertion
* Cast everything to UTC
* Remove version assertion
* Undo radical change to event pipeline - will re-add it later!
* Resolve comments
* Return person in PreIngestionEvent if possible
* Avoid unnecessarily fetching person in emitToBufferStep
* Avoid unnecessarily fetching person in createEvent
* Use unified type instead of separate type for cached data
* Pass person info forward explicitly in each event-pipeline step
* minor typing improvement
* Remove person from type
* Remove unneeded `undefined`
* Add person check for prepareEventStep test
* Fix hook test
* Update getPersonData tests
* Cast created_at to UTC
* Cast created_at to utc on fetch
* Remove personUuid var - unneeded
* Add unit tests for process-event.ts#createEvent
* refactor: Start with PersonStateManager
* refactor: move createPerson to new service
* refactor: move team fetching before aliasing
* refactor: move `createPersonIfDistinctIdIsNew`
* refactor: move `updatePersonProperties`
* refactor: move `handleIdentifyOrAlias`
* refactor: `createPerson` to private
* Fix an import
* Remove weird mocking in an e2e integration test
* Use correct style for querying postgres
* Add test showing problems with deletePerson logic
* Fix deleting persons from clickhouse
* Fix concurrent tests
* Version + 100
* Fixup FINAL
* Remove console.log
* Handle string properties in plugin-server convertToIngestionEvent
* Update typing
* fix: Add multi-server process event test
This got accidentally yeeted from my previous PR. Shame!
* Improve tests
* Update test to reflect reality
* refactor: Eliminate the `KAFKA_ENABLED` setting
* Remove dead code
* Consolidate plugin server test scripts and CI
* Fix CI command
* Remove Celery queues
* Rearrange test directories
* Update import paths