mirror of https://github.com/PostHog/posthog.git synced 2024-12-01 04:12:23 +01:00

28 Commits

Author SHA1 Message Date
Karl-Aksel Puulmann
78a787bc3b
Create and populate person_distinct_id2 table, add versioning to person_distinct_id (#7576)
* Migration to add version to person_distinct_id

* Update plugin-server type

* Use queueMessages instead of for loops

* Update distinct id versions in postgres

* Add commented out new query

* Add person_distinct_id2 table setup/migration

This will be used for more efficient person_distinct_id queries

* Avoid sharding person_distinct_id2 on cloud

* Write to new distinct ids topic

* Attempt to use version in tests

* Tests attempt 2

* Fixup version - dont send with all messages

* Flush kafka more frequently

* Actually fix tests

* Add another await

* Add partition to person_distinct_id2 table
2021-12-08 16:47:57 +02:00
Karl-Aksel Puulmann
ba405c823c
Add snapshot tests for clickhouse table schemas (#7572)
* Add a comment to keep topics in sync

* Clean up code relating to table engines

* Add snapshots for table creation queries

* Remove optional import

* Add snapshot tests for CLICKHOUSE_REPLICATION schemas

Note that these are out of sync with cloud in most cases

* Add another warning comment

* Improve naming
2021-12-08 16:07:34 +02:00
Karl-Aksel Puulmann
55bf75a40a
Make KAFKA_EVENTS_PLUGIN_INGESTION_TOPIC configurable (#7349)
This is for razorpay - they're planning to hook into this with spark
2021-11-25 11:47:20 +01:00
Karl-Aksel Puulmann
8ac9c590ec
Proposal: Use unique topic names for kafka in test (#6746)
Without this, to run plugin-server tests you need to reset all
containers every time since otherwise both test- and non-test clickhouse
would attempt to read from the same topic.
2021-11-01 11:42:56 +02:00
James Greenhill
0937f0b0df
Add retries to kafka producer to mitigate event loss (#6638)
2021-10-25 22:40:32 -07:00
Karl-Aksel Puulmann
cef2af5e4c
Group analytics: Initial schema (#6462)
* Add table for group_type_mapping

* Remove materialized columns from events table schema

These are not used and not needed w/ new mat columns work

* WIP: Migration to add group analytics columns

* Remove event table changes temporarily
2021-10-25 15:05:58 +03:00
Yakko Majuri
7b50b0e35f
Events dead letter queue CH table (#6193)
* events dead letter queue CH table

* format

* update schemas

* also store raw payload

* better naming

* make table name more clear

* wip better testing

* remove unused imports

* remove kafka test

* prevent non null test from running on CH migrations

* add kafka testing

* minor tests cleanup

* test naive longer sleep

* make test end-to-end

* address review

* update ttl, format

* refactor delay func, address review
2021-10-07 08:30:13 +00:00
James Greenhill
d5fb987d53
Create Kafka consumer and write tests for consumer and producer (#6170)
* Test Kafka

* black format python

* fix imports

* add kafka and zk deps for testing

* Include ZK and Kafka for all tests

* fix signature for kafka helper

* Connect to localhost for kafka

* update kafka host for all test runs

* Wrong env var for kafka

* consolidate env vars for github actions

* set the advertised hostname from the broker to localhost

* add env var to docker-compose for kafka broker advert host

* resort to what we do locally with /etc/hosts

* Remove configs for kafka that won't be used
2021-10-01 09:43:50 +01:00
James Greenhill
22b574e50c
Don't provide key to kafka for now (broken partitioning) (#6127)
2021-09-27 19:22:54 +01:00
James Greenhill
ca2ae2a8ad
Default to None if there is no key, encode only if there is data (#6122)
2021-09-27 15:46:37 +01:00
James Greenhill
c2f7ba0d08
Test if key for kafka is none and set to empty string (#6121)
2021-09-27 14:56:49 +01:00
Yakko Majuri
dbd31b91ba
fix kafka partition key (#6117)
2021-09-27 14:35:34 +01:00
James Greenhill
2ae020d6ff
Partition events_plugin_ingestion by IP (#6091)
* Partition  by IP

* use correct version of black...

* fix kafka test

* picky tests

* use value vs data for test kafka
2021-09-27 14:08:00 +01:00
Michael Matloka
e96f95ef5a
Plugin log entries (#3482)
* Add Postgres model PluginLogEntry

* Add equivalent PluginLogEntry to Kafka+ClickHouse

* Add migration

* Add PluginLogEntry.Type.LOG & make PluginLogEntry.message a TextField

* Update 0130_pluginlogentry.py

* Add PluginLogEntry.instance_id

* Update migration

* Update migration

* Add plugin log entries API

* Test plugin log entries DB fetching

* Add PluginLogs component prototype

* Fix API

* Improve PluginLogs component

* Remove almost unused plugin Feedback button

* Update migration

* Fixed typing

* Fix org permission error test asserts

* Fix plugin log entry tests

* Fix CH plugin log entry timestamp string

* Update CH test_plugin_log_entry.py

* Fix plugin log entry tests across PG/CH

* Satisfy mypy

* Add search and limit to plugin log entry API

* Send team_id in plugin config API

* Rework plugin logs UI

* Add plugin config team ID in tests

* Add plugin config team ID in tests actually

* Fix code quality

* Make logs plugin config-based

* Fix CH queries

* Fix typing

* Improve UX and fix things

* Polish plugin logs logic

* Update migration

* Add Celery task to delete old plugin logs

* Fix UX bug with loading more plugin logs

* Fix missing import

* Remove OrganizationMemberPermissions message change

* Make mypy happy

* Add PluginLogEntry.is_system

* Optimize CH plugin_log_entires PARTITION/ORDER

* Increment migration

* Adjust plugin logs drawer display

* Fix plugin_log_factory_ch

* Fix plugin_log_factory_ch fix

* Replace PluginLogEntry.is_system with source

* Adjust PluginLogEntrySerializer

* Update CH fetch_plugin_log_entries

* Make kea-typegen happy
2021-05-06 10:54:32 +03:00
James Greenhill
1849223296
Remove logging to WAL, no longer used and duplicate of events_plugin_ingestion (#4132)
* Remove logging to WAL, no longer used and duplicate of events_plugin_ingestion

* Simplify log_event

Co-authored-by: Michael Matloka <dev@twixes.com>
2021-04-27 19:15:56 +00:00
Michael Matloka
1f3145128c
Enable PLUGIN_SERVER_INGESTION (#3107)
* Enable PLUGIN_SERVER_INGESTION_HANDOFF = get_bool_from_env("PLUGIN_SERVER_INGESTION_HANDOFF

* Don't set PLUGIN_SERVER_INGESTION_HANDOFF in worker

* Add comments

* Remove _HANDOFF from PLUGIN_SERVER_INGESTION

* add stats counter for plugin server handoff, so we can verify events out and events in

* add whitelisted posthog and kea organizations

* disable ingestion this round --> first let's just check the plugin server can talk to kafka & clickhouse before sending real events to it

* enable ingestion in docker-compose.ch.yml

* eliminate bad merge

* async action event matching when using postgres plugin server ingestion (#3182)

* fix org

* remove _HANDOFF from topic

* add plugin_ to plugin server ingestion topic

* update plugin server to 0.7.0

Co-authored-by: Marius Andra <marius.andra@gmail.com>
2021-02-04 16:17:24 +01:00
Michael Matloka
eaa169100a
Add handing off event ingestion to plugin server (#2898)
* Add setting for handing off process_event_ee to plugin server

* Add StatsD settings to KEYS

* bin/plugin-server → start-plugin-server & docker-plugin-server

* Simplify to only add docker-plugin-server

* Bring back original comment

* Turn down verbosity of plugin server install

* Remove redundant if

* Fix comment

* Remove lone newline

* Roll back unsafe script changes

* Simplify dockerized plugins

* Add some depends_on

* Clarify HAND_OFF_INGESTION env var

* Use posthog-plugin-server 1.0.0-alpha.1

* Enhance bin/plugin-server and rm bin/docker-plugin-server

* Move around PLUGIN_SERVER_INGESTION_HANDOFF ifs

* Use posthog-plugin-server@1.0.0-alpha.2

* Support kafka+ssl:// in plugin-server

* Produce to topic events_ingestion_handoff for plugin server

* Use posthog-plugin-server@1.0.0-alpha.3

* Don't import Kafka topics in FOSS

* Use @posthog/plugin-server

* Update yarn.lock

* Add commands for external ClickHouse setup/teardown

* Actually delete test CH teardown command

* ClickhouseTestRunner.setup_test_environment() in setup_test_clickhouse

* Rework test setup script to work with Postgres too

* Restore master plugins dir for merge

* Unset PLUGIN_SERVER_INGESTION_HANDOFF in docker-compose.ch.yml

* Fix unimportant typo

* Build log_event data dict only once

* Make it clear in bin/plugin-server help that it's bin

* Space space
2021-01-21 15:39:44 +01:00
Michael Matloka
7ba9f7de09
Plugin server ingestion base (#2732)
* Add relevant settings to KEYS in bin/plugins-server

* Log all EE events to events_handoff Kafka topic for plugin server

* Clean up settings

* Fix FOSS

* Don't introduce KAFKA_EVENTS_HANDOFF

* Add cosmetic newline

* Add DEBUG WAL print()
2020-12-14 16:05:18 +01:00
James Greenhill
ed6eb5e796
Setup ecs configs for web, worker, migration tasks and services (#2458)
* add worker to the ecs config and deploy

* for testing

* pull from this branch for testing

* chain config renders

* split out events pipe

* Set is_heroku true because of heroku kafka

* update /e/ service to run on port 8001

* add 8001 to the container definition as well

* simplify

* test migrating w/ ecs task using aws cli

* split services

* typo in task def

* remove networkConfiguration from task definition

* duplicate

* task-def-web specific

* update events service name

* Handle base64 encoded kafka certs

* if it's empty then try to set it for env vars

* fix b64 decode call

* cleanups

* enable base64 encoding of keys for kafka

* depend on kafka-helper for deps

* reformat

* sort imports

* type fixes

* it's late, I can't type. typos.

* use get_bool_from_env

* remove debug bits. Trigger on master/main

* prettier my yaml

* add notes about ref in GA

* up cpu and memory
2020-12-03 15:51:37 -08:00
James Greenhill
39081364e6
Watch person and person_distinct_id tables for lag (#2360)
* Watch person and person_distinct_id tables for lag

* record row counts as well

* add session_recording_events as well

* gofmt
2020-11-12 19:09:40 -08:00
Paolo D'Amico
066721e3c1
Stability & dev experience improvements (#2152)
2020-11-02 14:55:20 +00:00
James Greenhill
b64673ca4e
wire up the length to the proto message (#2089)
* wire up the length to the proto message

* we are so deep into the proto weeds we are using proto private methods
2020-10-28 17:41:13 -07:00
James Greenhill
601696456f
Start with a new topic (#2088)
2020-10-28 17:12:58 -07:00
James Greenhill
01099a5ffd
Provide required proto message length for our clickhouse overlords (#2087)
2020-10-28 16:48:05 -07:00
James Greenhill
83b5273113
Protobufize events to protect from malformed JSON (#2085)
* Protobuf all the things

* oops

* Protobufize events to protect from malformed JSON

* format the generated files (will need to remember this for future)

* format

* clean up kafka produce serializer

* fixes
2020-10-28 15:18:52 -07:00
Karl-Aksel Puulmann
e3bf0cb31d
Session recording on clickhouse, separate tables and retention cronjob (#2051)
* Add scheduled task to wipe session recordings

* Create a new table for session recording

* Save snapshot events to different table

* Use SessionRecordingEvent over Events everywhere

We can remove a ton of cruft this way as well

* Add missing signature

* Extract util from models/event

* Attempt to update ingest side of clickhouse session recording events

Note that it's using main kafka topic - not sure if a good idea.

* Get separate table in ch working for session recording events

* WIP: query sessions

* Make both session recording queries work

* Make linter happy

* Rebase migration

* Make tests work

* Apply a TTL to session recordings and other configuration:

- toYYYYMMDD partitioning should be smoother with TTL setup
- TTL achieves not needing to archive the data ourselves
- index_granularity will enable smaller reads per session_id
- ORDER BY clause is to make single session as well as time range query
  reasonable

* Convert retention cronjob to new model

* Add tests to process_event changes

* Add test for ee_capture change

* Fixup migration

* Make clickhouse tests drop/create session recording tables

* Make TTL not be there in tests

Otherwise writes get eaten by it during tests when mocking time

* Fix retention task

Co-authored-by: Tim Glaser <tim@glsr.nl>
2020-10-28 21:22:16 +01:00
James Greenhill
7ab30a836c
Remove Omni-Person logic for ee (#1972)
* Remove Omni-Person logic for ee

* remove more omni person references
2020-10-21 14:06:45 -07:00
James Greenhill
b74d06a96a
Create a write ahead log for cloud event processing (#1962)
* Create a write ahead log for cloud event processing

* mypy fix

* if we are on app (ee) don't log to postgres

* don't disable writing to postgres
2020-10-21 20:35:07 +02:00