* chore(plugin-server): remove piscina workers
Using Piscina workers introduces complexity that we would rather avoid.
It does offer the ability to scale work across multiple CPUs, but we can
achieve this by starting multiple processes instead. It may
also provide some protection from deadlocking the worker process, which
I believe Piscina will handle by killing worker processes and
respawning, but we have K8s liveness checks that will also handle this.
This should simplify 1. prom metrics exporting, and 2. using
node-rdkafka.
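As a rough illustration of the multi-process approach mentioned above, here is a minimal sketch using Node's built-in `child_process.fork`. The helper names (`resolveProcessCount`, `startProcesses`), the `WORKER_INDEX` variable and the cap at the CPU count are assumptions made for the example, not what this change implements. In a K8s deployment the same effect may simply come from running more pods.

```typescript
import { ChildProcess, fork } from 'node:child_process'

// Hypothetical helper: turn an optional setting (e.g. an env var) into a
// process count, falling back to the CPU count when unset or invalid.
function resolveProcessCount(requested: string | undefined, cpuCount: number): number {
    const parsed = Number.parseInt(requested ?? '', 10)
    if (Number.isNaN(parsed) || parsed < 1) {
        return cpuCount
    }
    return Math.min(parsed, cpuCount)
}

// Hypothetical helper: start `count` independent copies of the server.
// Each copy is an ordinary process, so metrics and Kafka clients live in
// one place per process rather than being split across worker threads.
function startProcesses(entrypoint: string, count: number): ChildProcess[] {
    const children: ChildProcess[] = []
    for (let index = 0; index < count; index++) {
        const child = fork(entrypoint, [], {
            env: { ...process.env, WORKER_INDEX: String(index) },
        })
        // No respawn logic here: restarting is left to an external
        // supervisor, such as the K8s liveness checks mentioned above.
        child.on('exit', (code, signal) => {
            console.error(`process ${index} exited (code=${code}, signal=${signal})`)
        })
        children.push(child)
    }
    return children
}
```

A caller would pass its own entrypoint path and the host's CPU count (for example from `os.cpus().length`) to these helpers.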
* remove piscina from package.json
* use createWorker
* wip
* wip
* wip
* wip
* fix export test
* wip
* wip
* fix server stop tests
* wip
* mock process.exit everywhere
* fix health server tests
* Remove collectMetrics
* wip
* refactor(ingestion): pull out topic/groupid from kafka-queue
At the moment `IngestionConsumer` holds a lot of complexity regarding
topics, group IDs and message handlers. This is a step towards
moving that logic out of the `IngestionConsumer`, and making the top
level of the pluginsServer simpler to reason about.
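The direction described above could look roughly like the sketch below, where the topic, group ID and handler are passed in as plain configuration instead of being decided inside the consumer. All names here (`ConsumerConfig`, `BatchConsumer`, `startIngestionConsumer`, the topic and group ID strings) are hypothetical, and no real Kafka client is involved.

```typescript
// Hypothetical message and handler shapes, for illustration only.
interface KafkaMessage {
    value: string
}
type BatchHandler = (messages: KafkaMessage[]) => Promise<void>

// Everything that used to be decided inside the consumer is now data.
interface ConsumerConfig {
    topic: string
    groupId: string
    handleBatch: BatchHandler
}

// Generic consumer: it has no knowledge of which topic it serves or
// what the handler does with the messages.
class BatchConsumer {
    private readonly config: ConsumerConfig

    constructor(config: ConsumerConfig) {
        this.config = config
    }

    describe(): string {
        return `${this.config.groupId} -> ${this.config.topic}`
    }

    // Stand-in for the point where a real client would deliver a batch.
    async dispatch(messages: KafkaMessage[]): Promise<void> {
        await this.config.handleBatch(messages)
    }
}

// The top level wires up one consumer per concern, so the mapping from
// topic and group ID to handler is visible in one place.
function startIngestionConsumer(handleBatch: BatchHandler): BatchConsumer {
    return new BatchConsumer({
        topic: 'example_events_topic',
        groupId: 'example-ingestion-group',
        handleBatch,
    })
}
```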
* wip
* wip
* wip
* wip
* chore(ingestion): remove graphile as dependency of ingestion pipeline
This allows us to run just the ingestion part of the plugin-server
without needing to perform any graphile operations e.g. creating
connections to the graphile database.
This has the following advantages:
1. if the graphile database is down, the ingestion pods can still start
up and will function correctly.
2. it avoids creating a connection pool to the graphile database for each
ingestion pod, which could add up to a lot of connections and force
us to scale the database.
3. it avoids running the graphile migrations on each ingestion pod, which
is unnecessary and could add avoidable database load.
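One way to get the behaviour listed above is to make the job queue connection lazy, so that a pod running only ingestion never opens it. The sketch below assumes hypothetical names (`lazy`, `startServer`, `JobQueue`, the `'ingestion'` and `'jobs'` modes) and is not the plugin-server's actual startup code.

```typescript
// Hypothetical server modes and job queue shape, for illustration only.
type Mode = 'ingestion' | 'jobs'

interface JobQueue {
    enqueue(job: string): Promise<void>
}

// Wraps an expensive async factory so that it runs at most once, and
// only when the returned getter is first called.
function lazy<T>(factory: () => Promise<T>): () => Promise<T> {
    let cached: Promise<T> | undefined
    return () => {
        if (cached === undefined) {
            cached = factory()
        }
        return cached
    }
}

// Starts only what the given mode needs and reports what was started.
async function startServer(mode: Mode, getJobQueue: () => Promise<JobQueue>): Promise<string[]> {
    const started: string[] = []
    if (mode === 'ingestion') {
        // Ingestion path: the job queue getter is never called, so no
        // connection pool is created and no migrations run.
        started.push('ingestion-consumer')
    } else {
        // Connection (and, in a real setup, migrations) happen here.
        await getJobQueue()
        started.push('graphile-worker')
    }
    return started
}
```

With this shape, an ingestion pod can start even when the job database is unreachable, because the factory passed to `lazy` is never invoked in that mode.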
* wip
* wip
* wip
* wip
* chore(ingestion): remove old graphile bufferJob handling
This removes the emitting of graphile-worker events from the ingestion
anonymous events path. Note that we still have the graphile worker
running on ingestion, as we need to ensure that we have drained all of
these jobs. I'll handle this by first enabling the topic for all users
on prod then deploying this.
For self-hosted I suggest we just add a note that anonymous events that
have been sent to graphile in the meantime will be lost, or something
else that makes sense.
* fix typing
* remove test
* rename legacy references to queue to more appropriate worker terminology
* rename startJobsConsumer -> startGraphileWorker, no-op refactor
* add back enqueue success and failure metrics
* fix mock import
* fix test for good
* refactor: Eliminate the `KAFKA_ENABLED` setting
* Remove dead code
* Consolidate plugin server test scripts and CI
* Fix CI command
* Remove Celery queues
* Rearrange test directories
* Update import paths