* cleanup: remove unused team arg from registerLastStep
* cleanup: rename promises to ackPromises to make it more clear thats what they are
* cleanup(plugin-server): make waitForAck explicit/required
* add Kafka produce/ack metrics
* Clarify Kafka produce metric/labels
There is a race condition in these tests where the consumer isn't
consuming in time to pick up bad messages, so we ensure that we set the
offsets to the earliest messages.
* test(plugin-server): use librdkafka for functional tests
While trying to port the session recordings to use node-librdkafka I
found it useful to first implement it in the functional tests.
* use obj destructuring to make calls more self explanatory
* chore(plugin-server): Add metrics for time of last processed message
Previously we have been alerting on Kafka consumer group offset lag.
However, really we care about the delay between messages being written
to Kafka and being processed by the plugin server.
By adding the last processed timestamp, as a gauge, we can then alert on
if that time and now is greater than a threshold.
This alert would not require the plugin-server to be up to trigger, just
that there be some time registered so it handles complete failure also.
For the case that there are no messages past the committed offsets, we
will end up triggering the alert if we do not also take into
consideration the production rate into the topic.
* wip
* wip
* fix imports order
* fix group id
* Add and use waitForExpect instead
* remove yarn.lock
* move comment
* wip
* chore(plugin-server): disrtibute scheduled tasks
Changes I've made here from the original PR:
1. add some logging of task run times
2. add concurrency, except only one task of a plugin will run at a time
3. add a timeout to task run times
This reverts commit 23db43a0dc.
* chore: add timings for scheduled tasks runtime
* chore: add timeouts for scheduled tasks
* chore: clarify duration unit
* chore: deduplicate tasks in a batch, add partition concurrency
* chore: add flag to switch between old and new behaviour
This defaults to new, but can be set to old by setting environment
variable `USE_KAFKA_FOR_SCHEDULED_TASKS=false`
* fix tests
* enable USE_KAFKA_FOR_SCHEDULED_TASKS in tests
* Revert "Revert "feat(plugin-server): distribute scheduled tasks i.e. runEveryX" (#13087)"
This reverts commit 78e6f48660.
* fix(plugin-server): ignore old cron tasks from graphile-worker
When we are backed up on jobs, we end up still creating tasks in the
graphile-worker job table, i.e. there is no backpressure. This change
makes us skip over old tasks, so that we don't get backed up.
* fix tests
* feat(plugin-server): distribute scheduled tasks i.e. runEveryX
At the moment we only run on which ever Graphile worker node picks up
the scheduled tasks. Tasks are run in sequence, running through each of
the associated pluginConfigIds.
We tried to spread the workload by creating a Graphile Worker job for
each pluginConfigId, but this caused a lot of load on the Graphile
Worker database.
One thing this PR doesn't tackle is what happens if we end up having the
jobs back up. There is probably some logic we should add to avoid really
old scheduled tasks from running.
* wip
* wip
* fix tests
* fix tests
* types
* update unit test
* add key
* fix order
* Update plugin-server/src/main/ingestion-queues/scheduled-tasks-consumer.ts
* chore: skip stale scheduled tasks
* update comments
* add statsd counter