* cleanup: remove unused team arg from registerLastStep
* cleanup: rename promises to ackPromises to make it clearer that's what they are
* cleanup(plugin-server): make waitForAck explicit/required
* add Kafka produce/ack metrics
* Clarify Kafka produce metric/labels
* chore(plugin-server): use librdkafka producer everywhere
We saw some 10x improvements in the throughput for session recordings.
Hopefully there will be more improvements here as well, although it's a
little less clear cut.
I don't try to improve the guarantees around message production here.
* we still need to enable snappy for kafkajs
* chore(plugin-server): distribute scheduled tasks
Changes I've made here from the original PR:
1. add some logging of task run times
2. add concurrency, except that only one task per plugin runs at a time
3. add a timeout to task run times
This reverts commit 23db43a0dc.
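The three changes listed above can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: the `ScheduledTask` shape, the `runWithTimeout` and `runBatch` names, and the use of `pluginConfigId` as the per-plugin key are all assumptions made for the example.

```typescript
// Hypothetical task shape; not the plugin-server's actual type.
type ScheduledTask = { pluginConfigId: number; name: string; run: () => Promise<void> }

// Reject if the task has not settled within timeoutMs. This does not cancel
// the underlying work, it only stops waiting for it.
async function runWithTimeout<T>(run: () => Promise<T>, timeoutMs: number): Promise<T> {
    let timer: ReturnType<typeof setTimeout> | undefined
    const timeout = new Promise<never>((_, reject) => {
        timer = setTimeout(() => reject(new Error(`timed out after ${timeoutMs}ms`)), timeoutMs)
    })
    try {
        return await Promise.race([run(), timeout])
    } finally {
        clearTimeout(timer)
    }
}

// Run groups for different plugins concurrently, but run the tasks within one
// plugin's group strictly one after another. Returns the names of failed tasks.
async function runBatch(tasks: ScheduledTask[], timeoutMs: number): Promise<string[]> {
    const byPlugin = new Map<number, ScheduledTask[]>()
    for (const task of tasks) {
        const group = byPlugin.get(task.pluginConfigId) ?? []
        group.push(task)
        byPlugin.set(task.pluginConfigId, group)
    }

    const failed: string[] = []
    await Promise.all(
        Array.from(byPlugin.values()).map(async (group) => {
            for (const task of group) {
                const startedAt = Date.now()
                try {
                    await runWithTimeout(task.run, timeoutMs)
                } catch {
                    failed.push(task.name)
                } finally {
                    // Log the run time in milliseconds, whether or not the task succeeded.
                    console.log(`task ${task.name} took ${Date.now() - startedAt}ms`)
                }
            }
        })
    )
    return failed
}
```

One caveat with this sketch: because the timeout only stops waiting, a timed-out task may still be running when the next task for the same plugin starts.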
* chore: add timings for scheduled tasks runtime
* chore: add timeouts for scheduled tasks
* chore: clarify duration unit
* chore: deduplicate tasks in a batch, add partition concurrency
* chore: add flag to switch between old and new behaviour
This defaults to the new behaviour, but can be switched back to the old
one by setting the environment variable
`USE_KAFKA_FOR_SCHEDULED_TASKS=false`
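As an illustration of the default described above, a flag like this could be read as follows. This is a sketch only: the helper name `useKafkaForScheduledTasks` is hypothetical, and the plugin-server's real config parsing may differ.

```typescript
// Hypothetical helper: returns true (new behaviour) unless the variable is
// explicitly set to "false". The env record is passed in so the function is
// easy to test; in Node one would pass process.env.
function useKafkaForScheduledTasks(env: Record<string, string | undefined>): boolean {
    const raw = env['USE_KAFKA_FOR_SCHEDULED_TASKS']
    if (raw === undefined) {
        return true
    }
    return raw.trim().toLowerCase() !== 'false'
}
```

Treating only the literal value `false` as "old behaviour" is an assumption of this sketch; whether other values such as `0` are accepted is not stated here.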
* fix tests
* enable USE_KAFKA_FOR_SCHEDULED_TASKS in tests
* Revert "Revert "feat(plugin-server): distribute scheduled tasks i.e. runEveryX" (#13087)"
This reverts commit 78e6f48660.
* fix(plugin-server): ignore old cron tasks from graphile-worker
When we are backed up on jobs, we still end up creating tasks in the
graphile-worker job table, i.e. there is no backpressure. This change
makes us skip over old tasks, so that we don't get backed up.
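The skip could look roughly like the check below. It is a sketch under assumptions: the `CronJob` shape, the `isStale` and `dropStaleJobs` names, and the 60 second cutoff are illustrative and are not taken from this change.

```typescript
// Hypothetical shape of a cron-style job; field names are illustrative.
type CronJob = { taskType: string; scheduledAtMs: number }

// Assumed cutoff; the real threshold is not stated in this PR.
const MAX_TASK_AGE_MS = 60000

// A job is stale when it was scheduled more than maxAgeMs before "now".
function isStale(job: CronJob, nowMs: number, maxAgeMs: number = MAX_TASK_AGE_MS): boolean {
    return nowMs - job.scheduledAtMs > maxAgeMs
}

// Keep only jobs that are still fresh enough to be worth running.
function dropStaleJobs(jobs: CronJob[], nowMs: number): CronJob[] {
    return jobs.filter((job) => !isStale(job, nowMs))
}
```

Passing `nowMs` in explicitly, rather than calling `Date.now()` inside, keeps the check deterministic in tests.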
* fix tests
* feat(plugin-server): distribute scheduled tasks i.e. runEveryX
At the moment we only run on whichever Graphile Worker node picks up
the scheduled tasks. Tasks are run in sequence, running through each of
the associated pluginConfigIds.
We tried to spread the workload by creating a Graphile Worker job for
each pluginConfigId, but this caused a lot of load on the Graphile
Worker database.
One thing this PR doesn't tackle is what happens if the jobs back up.
There is probably some logic we should add to prevent really old
scheduled tasks from running.
* wip
* wip
* fix tests
* fix tests
* types
* update unit test
* add key
* fix order
* Update plugin-server/src/main/ingestion-queues/scheduled-tasks-consumer.ts
* chore: skip stale scheduled tasks
* update comments
* add statsd counter
* rename legacy references to queue to more appropriate worker terminology
* rename startJobsConsumer -> startGraphileWorker, no-op refactor
* add back enqueue success and failure metrics
* fix mock import
* fix test for good