* feat(k8s): add kafka connection status to health check endpoint
This change adds a Kafka check to the existing health check, using the
underlying Kafka Python library's `bootstrap_connected` method to
verify that we are connected to Kafka.
To accommodate the extra check in the response, I have updated the
endpoint to return a JSON response mapping each `"check_name"` to its
status. The endpoint reports failure if any of the checks return `False`.
Something I didn't do was allow for running each check in isolation,
e.g. exposing just the kafka_connected check at `/_health/kafka`
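A minimal sketch of the endpoint described above, assuming a Django-style
view and kafka-python (which provides `bootstrap_connected`); the producer
handle and the postgres check are illustrative stand-ins, not the shipped
code:
```python
# Sketch of the health endpoint described above. Assumes kafka-python and a
# Django view; the producer handle and the postgres check are stand-ins.
from django.db import connection
from django.http import JsonResponse
from kafka import KafkaProducer

producer = KafkaProducer(bootstrap_servers="kafka:9092")  # illustrative config

def postgres_connected() -> bool:
    # Stand-in for the pre-existing database check.
    try:
        connection.ensure_connection()
        return True
    except Exception:
        return False

def health(request):
    checks = {
        "postgres_connected": postgres_connected(),
        "kafka_connected": producer.bootstrap_connected(),
    }
    # Fail the whole endpoint if any individual check reports False.
    return JsonResponse(checks, status=200 if all(checks.values()) else 503)
```
Returning a non-200 status on any failed check also keeps the endpoint
usable as a k8s liveness/readiness probe target.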
* sort imports
* Remove unused requests arg; maybe useful later, but not now.
* Add readyz and livez endpoint
* Add some docs
* link to k8s
* be specific about postgres
* some tests don't need postgres
* kubernetes -> process orchestration system
* update how we check kafka connection
* remove return
* Migration to add version to person_distinct_id
* Update plugin-server type
* Use queueMessages instead of for loops
* Update distinct id versions in postgres
* Add commented-out new query
* Add person_distinct_id2 table setup/migration
This will be used for more efficient person_distinct_id queries
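As a hedged sketch of the idea (the shipped column list and engine
arguments may differ), a ReplacingMergeTree keyed on the distinct id with
a `version` column lets the latest mapping win at merge time:
```python
# Hedged sketch only: a ReplacingMergeTree keyed on the distinct id, where the
# row with the highest `version` wins at merge time. Columns are illustrative.
PERSON_DISTINCT_ID2_TABLE_SQL = """
CREATE TABLE IF NOT EXISTS person_distinct_id2 (
    team_id Int64,
    distinct_id VARCHAR,
    person_id UUID,
    is_deleted Int8,
    version Int64
) ENGINE = ReplacingMergeTree(version)
ORDER BY (team_id, distinct_id)
"""
```
The `version` column added by the migration above is what lets ClickHouse
collapse stale mappings instead of keeping the full mutation history.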
* Avoid sharding person_distinct_id2 on cloud
* Write to new distinct ids topic
* Attempt to use version in tests
* Tests attempt 2
* Fixup version - don't send with all messages
* Flush kafka more frequently
* Actually fix tests
* Add another await
* Add partition to person_distinct_id2 table
* Add a comment to keep topics in sync
* Clean up code relating to table engines
* Add snapshots for table creation queries
* Remove optional import
* Add snapshot tests for CLICKHOUSE_REPLICATION schemas
Note that these are out of sync with cloud in most cases
* Add another warning comment
* Improve naming
Without this, running plugin-server tests requires resetting all
containers every time, since otherwise both the test and non-test
ClickHouse would attempt to read from the same topic.
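One way to picture the fix (a sketch assuming settings-driven topic names;
the env var and topics shown are illustrative): prefix every topic in test
runs so the test stack consumes only from its own topics.
```python
# Illustrative: derive Kafka topic names from a test-aware prefix so the test
# ClickHouse/plugin-server stack never shares a topic with the dev instance.
import os

KAFKA_PREFIX = "test_" if os.environ.get("TEST") else ""

KAFKA_EVENTS = f"{KAFKA_PREFIX}events_topic"  # topic names illustrative
KAFKA_SESSION_RECORDING_EVENTS = f"{KAFKA_PREFIX}session_recording_events"
```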
* Add table for group_type_mapping
* Remove materialized columns from events table schema
These are not used, and are not needed with the new materialized columns work
* WIP: Migration to add group analytics columns
* Remove event table changes temporarily
* events dead letter queue CH table
* format
* update schemas
* also store raw payload
* better naming
* make table name more clear
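For a sense of the table's shape (a sketch; the real columns, ordering, and
TTL differ in detail), the dead letter queue keeps the failed event, its raw
payload, and error metadata side by side:
```python
# Illustrative shape only; the shipped columns, ORDER BY, and TTL differ.
# Keeping the raw payload verbatim is what makes failed events replayable.
EVENTS_DEAD_LETTER_QUEUE_TABLE_SQL = """
CREATE TABLE IF NOT EXISTS events_dead_letter_queue (
    id UUID,
    event_uuid UUID,
    event VARCHAR,
    properties VARCHAR,
    distinct_id VARCHAR,
    team_id Int64,
    raw_payload VARCHAR,
    error_timestamp DateTime64(6, 'UTC'),
    error_location VARCHAR,
    error VARCHAR
) ENGINE = MergeTree()
ORDER BY (id, event_uuid, distinct_id, team_id)
TTL toDate(error_timestamp) + INTERVAL 4 WEEK
"""
```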
* wip better testing
* remove unused imports
* remove kafka test
* prevent non-null test from running on CH migrations
* add kafka testing
* minor tests cleanup
* test a naive longer sleep
* make test end-to-end
* address review
* update ttl, format
* refactor delay func, address review
* Test Kafka
* black format python
* fix imports
* add kafka and zk deps for testing
* Include ZK and Kafka for all tests
* fix signature for kafka helper
* Connect to localhost for kafka
* update kafka host for all test runs
* Wrong env var for kafka
* consolidate env vars for github actions
* set the advertised hostname from the broker to localhost
* add env var to docker-compose for kafka broker advert host
* resort to what we do locally with /etc/hosts
* Remove configs for kafka that won't be used
* Enable PLUGIN_SERVER_INGESTION_HANDOFF via get_bool_from_env("PLUGIN_SERVER_INGESTION_HANDOFF")
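For context, `get_bool_from_env` is presumably a small helper along these
lines (a sketch; the real implementation and signature may differ):
```python
import os

def get_bool_from_env(name: str, default: bool = False) -> bool:
    # Treat common truthy strings as True; anything else falls back to default.
    value = os.environ.get(name)
    if value is None:
        return default
    return value.strip().lower() in ("true", "1", "yes", "on")
```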
* Don't set PLUGIN_SERVER_INGESTION_HANDOFF in worker
* Add comments
* Remove _HANDOFF from PLUGIN_SERVER_INGESTION
* add stats counter for plugin server handoff, so we can verify events out and events in
* add whitelisted posthog and kea organizations
* disable ingestion this round --> first let's just check that the plugin server can talk to Kafka & ClickHouse before sending real events to it
* enable ingestion in docker-compose.ch.yml
* eliminate bad merge
* async action event matching when using postgres plugin server ingestion (#3182)
* fix org
* remove _HANDOFF from topic
* add plugin_ to plugin server ingestion topic
* update plugin server to 0.7.0
Co-authored-by: Marius Andra <marius.andra@gmail.com>
* Add setting for handing off process_event_ee to plugin server
* Add StatsD settings to KEYS
* bin/plugin-server → start-plugin-server & docker-plugin-server
* Simplify to only add docker-plugin-server
* Bring back original comment
* Turn down verbosity of plugin server install
* Remove redundant if
* Fix comment
* Remove lone newline
* Roll back unsafe script changes
* Simplify dockerized plugins
* Add some depends_on
* Clarify HAND_OFF_INGESTION env var
* Use posthog-plugin-server 1.0.0-alpha.1
* Enhance bin/plugin-server and rm bin/docker-plugin-server
* Move around PLUGIN_SERVER_INGESTION_HANDOFF ifs
* Use posthog-plugin-server@1.0.0-alpha.2
* Support kafka+ssl:// in plugin-server
* Produce to topic events_ingestion_handoff for plugin server
* Use posthog-plugin-server@1.0.0-alpha.3
* Don't import Kafka topics in FOSS
* Use @posthog/plugin-server
* Update yarn.lock
* Add commands for external ClickHouse setup/teardown
* Actually delete test CH teardown command
* ClickhouseTestRunner.setup_test_environment() in setup_test_clickhouse
* Rework test setup script to work with Postgres too
* Restore master plugins dir for merge
* Unset PLUGIN_SERVER_INGESTION_HANDOFF in docker-compose.ch.yml
* Fix unimportant typo
* Build log_event data dict only once
* Make it clear in bin/plugin-server help that it's bin
* Space space
* Add relevant settings to KEYS in bin/plugin-server
* Log all EE events to events_handoff Kafka topic for plugin server
* Clean up settings
* Fix FOSS
* Don't introduce KAFKA_EVENTS_HANDOFF
* Add cosmetic newline
* Add DEBUG WAL print()
* add worker to the ecs config and deploy
* for testing
* pull from this branch for testing
* chain config renders
* split out events pipe
* Set is_heroku to true because of Heroku Kafka
* update /e/ service to run on port 8001
* add 8001 to the container definition as well
* simplify
* test migrating w/ ecs task using aws cli
* split services
* typo in task def
* remove networkConfiguration from task definition
* duplicate
* task-def-web specific
* update events service name
* Handle base64 encoded kafka certs
* if it's empty, then try to set it from env vars
* fix b64 decode call
* cleanups
* enable base64 encoding of keys for kafka
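The cert handling in the commits above amounts to something like this
sketch (env var and path names are illustrative): if the cert file is
missing or empty, fall back to a base64-encoded env var and write the
decoded bytes out.
```python
# Sketch of the fallback described above; env var and path names illustrative.
import base64
import os

def ensure_kafka_cert(path: str, env_var: str) -> None:
    # If the cert file is already provisioned, leave it alone.
    if os.path.exists(path) and os.path.getsize(path) > 0:
        return
    encoded = os.environ.get(env_var)
    if encoded:
        # The env var carries the cert base64-encoded to survive env handling.
        with open(path, "wb") as f:
            f.write(base64.b64decode(encoded))

ensure_kafka_cert("/tmp/kafka_client.cert", "KAFKA_CLIENT_CERT_B64")
```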
* depend on kafka-helper for deps
* reformat
* sort imports
* type fixes
* it's late, I can't type. typos.
* use get_bool_from_env
* remove debug bits. Trigger on master/main
* prettier my yaml
* add notes about ref in GA
* up cpu and memory
* Protobuf all the things
* oops
* Protobufize events to protect against malformed JSON
* format the generated files (will need to remember this for future)
* format
* clean up kafka produce serializer
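A hedged sketch of the produce-side serializer idea (the generated
`events_pb2.Event` class and its fields are illustrative): events are
encoded as protobuf bytes rather than JSON strings, so malformed JSON
can't land on the topic.
```python
# Hedged sketch: `events_pb2` stands in for a protoc-generated module, and the
# Event fields shown are illustrative. The value_serializer runs on produce,
# so only valid protobuf bytes ever reach the topic.
from kafka import KafkaProducer

import events_pb2  # hypothetical module generated from an events.proto

def serialize_event(event: dict) -> bytes:
    message = events_pb2.Event(
        uuid=str(event["uuid"]),
        event=event["event"],
        properties=event["properties"],  # assumed to be a JSON string field
    )
    return message.SerializeToString()

producer = KafkaProducer(
    bootstrap_servers="kafka:9092",  # illustrative config
    value_serializer=serialize_event,
)
```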
* fixes
* Add scheduled task to wipe session recordings
* Create a new table for session recording
* Save snapshot events to different table
* Use SessionRecordingEvent over Events everywhere
We can remove a ton of cruft this way as well
* Add missing signature
* Extract util from models/event
* Attempt to update ingest side of clickhouse session recording events
Note that it's using the main Kafka topic - not sure if that's a good idea.
* Get separate table in ch working for session recording events
* WIP: query sessions
* Make both session recording queries work
* Make linter happy
* Rebase migration
* Make tests work
* Apply a TTL to session recordings, plus some other configuration:
- toYYYYMMDD partitioning should make cleanup smoother with the TTL setup
- the TTL means we don't need to archive the data ourselves
- index_granularity will enable smaller reads per session_id
- the ORDER BY clause makes both single-session and time-range queries
reasonable
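Put together, the configuration above looks roughly like this (a sketch;
columns and exact values are illustrative, not the shipped schema):
```python
# Sketch of the session recording table configuration described above.
SESSION_RECORDING_EVENTS_TABLE_SQL = """
CREATE TABLE session_recording_events (
    uuid UUID,
    timestamp DateTime64(6, 'UTC'),
    team_id Int64,
    distinct_id VARCHAR,
    session_id VARCHAR,
    snapshot_data VARCHAR,
    created_at DateTime64(6, 'UTC')
) ENGINE = MergeTree()
PARTITION BY toYYYYMMDD(timestamp)  -- daily partitions drop cleanly as TTL expires
ORDER BY (team_id, toHour(timestamp), session_id, timestamp, uuid)  -- cheap single-session and time-range reads
TTL toDate(created_at) + INTERVAL 3 WEEK  -- no manual archiving needed
SETTINGS index_granularity = 512  -- smaller reads per session_id
"""
```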
* Convert retention cronjob to new model
* Add tests to process_event changes
* Add test for ee_capture change
* Fixup migration
* Make clickhouse tests drop/create session recording tables
* Disable the TTL in tests
Otherwise writes get eaten by it during tests when time is mocked
* Fix retention task
Co-authored-by: Tim Glaser <tim@glsr.nl>