* chore(plugin-server): distribute scheduled tasks
Changes I've made here from the original PR:
1. add some logging of task run times
2. add concurrency, except only one task of a plugin will run at a time
3. add a timeout to task run times
This reverts commit 23db43a0dc.
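The three changes above (run-time logging, concurrency with at most one task per plugin in flight, and a timeout) can be sketched roughly as follows. This is a minimal illustration, not the plugin-server code: the `ScheduledTask` shape, `withTimeout`, `runTasks` and the caller-supplied `run` function are all hypothetical names.

```typescript
// Hypothetical task shape; the real message fields may differ.
interface ScheduledTask {
    taskType: 'runEveryMinute' | 'runEveryHour' | 'runEveryDay'
    pluginConfigId: number
}

interface TaskOutcome {
    task: ScheduledTask
    durationMs: number
    error?: string
}

// Rejects if `promise` has not settled within `timeoutMs`. The underlying
// work is NOT cancelled; we only stop waiting for it.
function withTimeout<T>(promise: Promise<T>, timeoutMs: number): Promise<T> {
    let timer: ReturnType<typeof setTimeout> | undefined
    const timeout = new Promise<never>((_, reject) => {
        timer = setTimeout(() => reject(new Error(`task timed out after ${timeoutMs}ms`)), timeoutMs)
    })
    return Promise.race([promise, timeout]).then(
        (value) => {
            clearTimeout(timer)
            return value
        },
        (error) => {
            clearTimeout(timer)
            throw error
        }
    )
}

// Runs tasks for different plugins concurrently, but tasks for the same
// pluginConfigId one after another, recording how long each one took.
async function runTasks(
    tasks: ScheduledTask[],
    run: (task: ScheduledTask) => Promise<void>,
    timeoutMs: number
): Promise<TaskOutcome[]> {
    const queues = new Map<number, ScheduledTask[]>()
    for (const task of tasks) {
        const queue = queues.get(task.pluginConfigId) ?? []
        queue.push(task)
        queues.set(task.pluginConfigId, queue)
    }

    const outcomes: TaskOutcome[] = []
    await Promise.all(
        Array.from(queues.values()).map(async (queue) => {
            for (const task of queue) {
                const startedAt = Date.now()
                try {
                    await withTimeout(run(task), timeoutMs)
                    outcomes.push({ task, durationMs: Date.now() - startedAt })
                } catch (error) {
                    outcomes.push({ task, durationMs: Date.now() - startedAt, error: String(error) })
                }
            }
        })
    )
    return outcomes
}
```

Note that a timed-out task keeps running in the background; the timeout only frees the per-plugin queue to move on.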
* chore: add timings for scheduled tasks runtime
* chore: add timeouts for scheduled tasks
* chore: clarify duration unit
* chore: deduplicate tasks in a batch, add partition concurrency
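A minimal sketch of the deduplication half, assuming each Kafka message carries a `taskType` and `pluginConfigId` (a hypothetical shape). In KafkaJS, consuming several partitions at once is typically done with the `partitionsConsumedConcurrently` option of `consumer.run`; that part isn't shown here.

```typescript
// Hypothetical message shape for a scheduled task read from Kafka.
interface ScheduledTaskMessage {
    taskType: string
    pluginConfigId: number
}

// Keeps the first occurrence of each (taskType, pluginConfigId) pair, so a
// backed-up topic doesn't make a plugin run the same task several times per batch.
function dedupeTasks(messages: ScheduledTaskMessage[]): ScheduledTaskMessage[] {
    const seen = new Set<string>()
    const unique: ScheduledTaskMessage[] = []
    for (const message of messages) {
        const key = `${message.taskType}:${message.pluginConfigId}`
        if (!seen.has(key)) {
            seen.add(key)
            unique.push(message)
        }
    }
    return unique
}
```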
* chore: add flag to switch between old and new behaviour
This defaults to the new behaviour, but the old behaviour can be restored
by setting the environment variable `USE_KAFKA_FOR_SCHEDULED_TASKS=false`.
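One way to read the flag, sketched under the assumption that only `true`/`false` are meaningful values; the helper name and the strict parsing are hypothetical choices, not necessarily how the plugin-server config does it.

```typescript
// Reads USE_KAFKA_FOR_SCHEDULED_TASKS from an environment-like record.
// Unset or empty means the default (the new, Kafka-based behaviour); anything
// other than "true"/"false" is rejected rather than silently guessed.
function useKafkaForScheduledTasks(env: Record<string, string | undefined>): boolean {
    const raw = env['USE_KAFKA_FOR_SCHEDULED_TASKS']
    if (raw === undefined || raw.trim() === '') {
        return true
    }
    const value = raw.trim().toLowerCase()
    if (value === 'true') return true
    if (value === 'false') return false
    throw new Error(`USE_KAFKA_FOR_SCHEDULED_TASKS must be "true" or "false", got "${raw}"`)
}
```

In Node this would be called as `useKafkaForScheduledTasks(process.env)`; passing the record in keeps it easy to test.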
* fix tests
* enable USE_KAFKA_FOR_SCHEDULED_TASKS in tests
* Revert "Revert "feat(plugin-server): distribute scheduled tasks i.e. runEveryX" (#13087)"
This reverts commit 78e6f48660.
* fix(plugin-server): ignore old cron tasks from graphile-worker
When we are backed up on jobs, we end up still creating tasks in the
graphile-worker job table, i.e. there is no backpressure. This change
makes us skip over old tasks, so that we don't get backed up.
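A sketch of the "skip old tasks" idea, assuming each task records when it was scheduled and that a task counts as stale once the next run of the same type is already due. Both the `scheduledAtMs` field and that threshold rule are assumptions for illustration.

```typescript
type TaskType = 'runEveryMinute' | 'runEveryHour' | 'runEveryDay'

// Hypothetical: each task carries the time it was scheduled, in epoch milliseconds.
interface ScheduledTask {
    taskType: TaskType
    pluginConfigId: number
    scheduledAtMs: number
}

// Assumed rule: a task is stale once the next run of the same type is due.
const TASK_INTERVAL_MS: Record<TaskType, number> = {
    runEveryMinute: 60 * 1000,
    runEveryHour: 60 * 60 * 1000,
    runEveryDay: 24 * 60 * 60 * 1000,
}

function isStale(task: ScheduledTask, nowMs: number): boolean {
    return nowMs - task.scheduledAtMs > TASK_INTERVAL_MS[task.taskType]
}

// Drops stale tasks, returning the ones to run and how many were skipped
// (the count is the kind of thing you would report to a metrics counter).
function filterStale(tasks: ScheduledTask[], nowMs: number): { fresh: ScheduledTask[]; skipped: number } {
    const fresh = tasks.filter((task) => !isStale(task, nowMs))
    return { fresh, skipped: tasks.length - fresh.length }
}
```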
* fix tests
* feat(plugin-server): distribute scheduled tasks i.e. runEveryX
At the moment, scheduled tasks run only on whichever Graphile Worker node
picks them up. Tasks are run in sequence, iterating through each of the
associated pluginConfigIds.
We tried to spread the workload by creating a Graphile Worker job for
each pluginConfigId, but this caused a lot of load on the Graphile
Worker database.
One thing this PR doesn't tackle is what happens if we end up having the
jobs back up. There is probably some logic we should add to avoid really
old scheduled tasks from running.
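A rough sketch of the fan-out: one cron tick becomes one Kafka message per pluginConfigId, which any consumer can then pick up. The function name, the payload fields and `scheduledAtMs` are hypothetical. The `{ key, value }` entries match what KafkaJS's `producer.send({ topic, messages })` accepts.

```typescript
type TaskType = 'runEveryMinute' | 'runEveryHour' | 'runEveryDay'

interface OutgoingMessage {
    key: string
    value: string
}

// Turns one cron tick into one Kafka message per pluginConfigId. Keying by
// pluginConfigId sends all of a plugin's tasks to the same partition (with a
// key-hashing partitioner), so a single consumer handles them in order.
function scheduledTaskMessages(taskType: TaskType, pluginConfigIds: number[], nowMs: number): OutgoingMessage[] {
    return pluginConfigIds.map((pluginConfigId) => ({
        key: String(pluginConfigId),
        value: JSON.stringify({ taskType, pluginConfigId, scheduledAtMs: nowMs }),
    }))
}
```

Recording `scheduledAtMs` in the payload is what would later let consumers skip tasks that are too old.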
* wip
* wip
* fix tests
* fix tests
* types
* update unit test
* add key
* fix order
* Update plugin-server/src/main/ingestion-queues/scheduled-tasks-consumer.ts
* chore: skip stale scheduled tasks
* update comments
* add statsd counter
* ci(plugin-server): fix functional tests running forever
It seems that one of the changes I made resulted in the tests running
forever in GitHub Actions.
* make sure pino transport closed in workers
* chore(plugin-server): split functional tests into feature based files
This is intended to make it more obvious what we are testing, and to try
and identify the major themes of the plugin-server functionality.
As a by-product, it should make things more parallelizable for Jest, as
the tests in different files will be isolated and runnable in separate
workers.
* use random api token, avoid db constraints
* make tests silent
* format
* chore: set number of jest workers
These tests should be pretty light given they just hit other APIs and
don't do much themselves. Memory could be an issue on constrained
environments. We shall see.
* add support for token field in kafka message
* formPipelineEvent
* rename pipeline files according to new order
* wip team_id and anonymize ips
* conditional handlers and tests
* some plugin server fixes
* fix capture bug
* fix
* more fixes
* fix capture tests
* pipeline update
* fix + investigate database resets
* fix import order
* testing and typing updates
* add test for capture endpoint
* testing
* python typing
* plugin server test
* functional test
* fix test
* another fix
* make sure no team ids clash in tests
* fix
* add more metrics and logs
* cache nulls
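A sketch of what "cache nulls" likely refers to in the token-to-team lookup added above: misses are cached alongside hits, with a TTL so a newly created team is eventually picked up. `TeamIdByTokenCache`, the lookup signature and the TTL are hypothetical.

```typescript
// Hypothetical lookup: resolves an API token to a team id, or null if unknown.
type TeamIdLookup = (token: string) => Promise<number | null>

interface CacheEntry {
    teamId: number | null
    expiresAtMs: number
}

// Caches lookups by token, including misses (null), so that events carrying an
// unknown token don't trigger a database query each time.
class TeamIdByTokenCache {
    private readonly entries = new Map<string, CacheEntry>()
    private readonly lookup: TeamIdLookup
    private readonly ttlMs: number
    private readonly now: () => number

    constructor(lookup: TeamIdLookup, ttlMs: number, now: () => number = () => Date.now()) {
        this.lookup = lookup
        this.ttlMs = ttlMs
        this.now = now
    }

    async get(token: string): Promise<number | null> {
        const cached = this.entries.get(token)
        if (cached !== undefined && cached.expiresAtMs > this.now()) {
            return cached.teamId
        }
        const teamId = await this.lookup(token)
        this.entries.set(token, { teamId, expiresAtMs: this.now() + this.ttlMs })
        return teamId
    }
}
```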
* updates
* add more metrics
* ci(plugin-server): make function test output less confusing
Redirect logs to file, output only on test failure.
* kick ci
* put setup in group
* wip
* wip
* wip
Sometimes they look like numbers and get misidentified as such. This
makes them always strings.
Partially resolves https://github.com/PostHog/posthog/issues/12529, except
that we do not update old property definitions. I'll do this separately.
* refactor(plugin-server): separate api from functional_tests
This just moves the api helpers to a separate file, such that we can
import from other files.
* test(plugin-server): add functional tests for property definitions
I was going to take a stab at
https://github.com/PostHog/posthog/issues/12529 but I wasn't sure how
the definition bits worked, so thought I'd add some tests first.
This doesn't just add tests but also:
1. starts demonstrating how we can split up the tests into
different files, thereby also allowing jest test isolation.
2. removes --runInBand, such that isolated tests can run in parallel
* ci(plugin-server): run functional tests in parallel
By running the tests in parallel we should be able to keep test times
pretty low. The only test that is a little awkward is the
`runEveryMinute` 🤔 maybe we can do something else for this
test.
* run migrations with test = true
* wip
* wip
* wip
* wip
* wip
* wip
* wip
* chore(ingestion): remove old graphile bufferJob handling
This removes the emitting of graphile-worker events from the ingestion
anonymous events path. Note that we still have the graphile worker
running on ingestion, as we need to ensure that we have drained all of
these jobs. I'll handle this by first enabling the topic for all users
on prod, then deploying this.
For self-hosted, I suggest we just add a comment noting that anonymous
events that have been sent to Graphile in the meantime will be lost. Or
something else that makes sense.
* fix typing
* remove test
* test(plugin-server): add historical exports functional test
Note that historical exports v1 does not include the elements property.
chore(plugin-server): add tests for historical exports of $autocapture
Triggered by [this
issue](https://github.com/PostHog/snowflake-export-plugin/issues/31)
regarding Snowflake not receiving elements.
chore: fix disparity between processEvent on ingest and historically
There appear to be a number of things that differ between on-ingest and
historical processing of events, including but not limited to:
1. ip format
2. elements chain handling
* wip
* wip
* wip
* add test for webhook
* wip
* add export for non-$autocapture event
* refactor(plugin-server): remove graphile dep for anonymous events
This removes the Graphile dependency for delaying events. Instead, we just
pause the partition. Note that we also disable the eachBatch auto-commit
option in KafkaJS. This is required, as otherwise KafkaJS will simply
commit the offset. Throwing an error would also avoid committing, but I
think disabling auto-commit is probably better.
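The decision the consumer has to make per message can be sketched as a pure function. KafkaJS provides `consumer.pause([{ topic, partitions: [partition] }])`, `consumer.resume(...)` and the `autoCommit` option on `consumer.run`; the `bufferDecision` helper and the `bufferMs` parameter below are hypothetical.

```typescript
// What to do with the next message in a partition when events must be held
// back for `bufferMs` after they were received.
type BufferDecision = { action: 'process' } | { action: 'pause'; resumeInMs: number }

function bufferDecision(receivedAtMs: number, bufferMs: number, nowMs: number): BufferDecision {
    const readyAtMs = receivedAtMs + bufferMs
    if (nowMs >= readyAtMs) {
        return { action: 'process' }
    }
    // Not ready yet: the caller pauses the partition, leaves the offset
    // uncommitted, and resumes after this many milliseconds.
    return { action: 'pause', resumeInMs: readyAtMs - nowMs }
}
```

Pausing the whole partition (rather than skipping one message) makes sense if messages arrive roughly in receive order: once one message isn't ready, the ones after it generally aren't either.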
* remove graphile test
* chore(plugins-server): use Kafka to buffer app jobs requests
To remove the dependency on the Graphile Worker database from anything
that may request app job runs, we push the jobs to a Kafka topic instead.
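A sketch of the producer/consumer contract for buffered job requests: serialize on the way in, validate on the way out. The `AppJob` fields and both helper names are hypothetical; the real job payload may look different.

```typescript
// Hypothetical job payload; the real field names may differ.
interface AppJob {
    type: string
    pluginConfigId: number
    payload: Record<string, unknown>
    timestamp: number
}

// Serializes a job request for the jobs topic. Keying by pluginConfigId keeps a
// plugin's jobs on one partition (assuming a key-hashing partitioner).
function jobToMessage(job: AppJob): { key: string; value: string } {
    return { key: String(job.pluginConfigId), value: JSON.stringify(job) }
}

// Parses a message value back into a job, rejecting malformed payloads early
// rather than enqueueing them.
function messageToJob(value: string): AppJob {
    const parsed: unknown = JSON.parse(value)
    if (typeof parsed !== 'object' || parsed === null) {
        throw new Error('job message is not an object')
    }
    const job = parsed as Partial<AppJob>
    if (typeof job.type !== 'string' || typeof job.pluginConfigId !== 'number' || typeof job.timestamp !== 'number') {
        throw new Error('job message is missing required fields')
    }
    return { type: job.type, pluginConfigId: job.pluginConfigId, payload: job.payload ?? {}, timestamp: job.timestamp }
}
```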
* chore: use KAFKA_JOBS instead of string literal `'jobs'`
* chore: rename startJobsBufferConsumer -> startJobsConsumer
* avoid checking eventId
* fix lint
* fix producer wrapper tests
* fix retries test
* handle offset sync
* wip
* wip
* remove exports
* do better
* use Producer not wrapper
* reset db
* mock once
* Add test for raising to the consumer
* Update plugin-server/tests/main/ingestion-queues/run-async-handlers-event-pipeline.test.ts
Co-authored-by: Yakko Majuri <38760734+yakkomajuri@users.noreply.github.com>
* and in the darkness bind them
* fix tests
* don't forget the name update!
* rename DependencyError to DependencyUnavailable
* separate dlq
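A sketch of how a `DependencyUnavailable`-style error and a separate dead letter queue might fit together: transient failures are raised to the consumer, assuming it retries without committing, while anything else is treated as a bad message. `DependencyUnavailableError`, `processWithDlq` and `sendToDlq` are hypothetical names.

```typescript
// Hypothetical error type for transient failures (e.g. a database that is down).
class DependencyUnavailableError extends Error {
    constructor(message: string) {
        super(message)
        this.name = 'DependencyUnavailableError'
        // Keep `instanceof` working when compiling to ES5.
        Object.setPrototypeOf(this, DependencyUnavailableError.prototype)
    }
}

type Outcome = 'processed' | 'sent-to-dlq'

// Dependency failures are re-raised to the consumer so the message can be
// retried; any other error is routed to the dead letter queue so one bad
// message does not block the partition.
async function processWithDlq(
    handle: () => Promise<void>,
    sendToDlq: (error: Error) => Promise<void>
): Promise<Outcome> {
    try {
        await handle()
        return 'processed'
    } catch (error) {
        if (error instanceof DependencyUnavailableError) {
            throw error
        }
        await sendToDlq(error instanceof Error ? error : new Error(String(error)))
        return 'sent-to-dlq'
    }
}
```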
* update comment
Co-authored-by: Yakko Majuri <38760734+yakkomajuri@users.noreply.github.com>
Now that we've moved to the ingestion/exports/jobs/scheduler split, we
don't need these tests anymore. We'll keep the server mode, but it's
likely best that people use either the all-in-one or the complete split,
which reduces the number of configurations we are expecting to support.
This doesn't try to do any comparison to base yet, although that would be
great; as it stands, it offers some useful insights into where we might
be missing coverage.
* refactor(plugin-server): split out plugin server functionality
To get better isolation we want to allow specific functionality to run
in separate pods. We already have the ingestion / async split, but there
are further divides we can make, e.g. the cron-style scheduler for plugin
server `runEveryMinute` tasks.
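One way to express the split is a mode-to-capabilities map, where each pod enables exactly one piece of functionality and no mode means all-in-one. Only `processAsyncHandlers` is named in these commits; the other flag names and the exact mapping are assumptions.

```typescript
// Hypothetical capability flags; only processAsyncHandlers appears in these commits.
interface Capabilities {
    ingestion: boolean
    processAsyncHandlers: boolean
    processJobs: boolean
    pluginScheduledTasks: boolean
}

type PluginServerMode = 'ingestion' | 'exports' | 'jobs' | 'scheduler'

const NOTHING: Capabilities = {
    ingestion: false,
    processAsyncHandlers: false,
    processJobs: false,
    pluginScheduledTasks: false,
}

// Each split mode enables exactly one piece of functionality, so each can be
// scaled and isolated in its own pods.
const CAPABILITIES_BY_MODE: Record<PluginServerMode, Capabilities> = {
    ingestion: { ...NOTHING, ingestion: true },
    exports: { ...NOTHING, processAsyncHandlers: true },
    jobs: { ...NOTHING, processJobs: true },
    scheduler: { ...NOTHING, pluginScheduledTasks: true },
}

// No mode means the all-in-one server, which runs everything.
function getCapabilities(mode: PluginServerMode | null): Capabilities {
    if (mode === null) {
        return { ingestion: true, processAsyncHandlers: true, processJobs: true, pluginScheduledTasks: true }
    }
    // Return a copy so callers can't mutate the shared table.
    return { ...CAPABILITIES_BY_MODE[mode] }
}
```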
* split jobs as well
* Also start Kafka consumers on processAsyncHandlers
* add status for async
* add runEveryMinute test
* avoid fake timers, just accept slower tests
* make e2e concurrent
* chore: also test ingestion/async split
* increase timeouts
* increase timeouts
* lint
* Add functional tests dir
* fix
* fix
* hack
* hack
* fix
* fix
* fix
* wip
* wip
* wip
* wip
* wip
* fix
* remove concurrency
* remove async-worker mode
* add async-handlers
* wip
* add modes to overrideWithEnv validation
* fix: async-handlers -> exports
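A sketch of validating the mode from the environment so that a typo, or a removed mode like `async-worker`, fails at startup rather than starting a server that runs nothing. The `PLUGIN_SERVER_MODE` variable name and the mode list are assumptions; this does not reproduce `overrideWithEnv` itself.

```typescript
// Modes after the async-handlers -> exports rename; the list is an assumption.
const PLUGIN_SERVER_MODES = ['ingestion', 'exports', 'jobs', 'scheduler'] as const
type PluginServerMode = (typeof PLUGIN_SERVER_MODES)[number]

// Validates a mode string from the environment. Unset or empty means
// all-in-one (null); an unknown mode fails fast.
function parsePluginServerMode(value: string | undefined): PluginServerMode | null {
    if (value === undefined || value === '') {
        return null
    }
    const modes: readonly string[] = PLUGIN_SERVER_MODES
    if (modes.indexOf(value) === -1) {
        throw new Error(`Invalid PLUGIN_SERVER_MODE "${value}", expected one of: ${modes.join(', ')}`)
    }
    return value as PluginServerMode
}
```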
* update comment