mirror of https://github.com/PostHog/posthog.git synced 2024-11-30 19:41:46 +01:00

17 Commits

Author SHA1 Message Date
James Greenhill
2bb97b8efa
Do not shadow Kafka default columns _timestamp and _offset (#1718)
* Do not shadow Kafka default columns _timestamp and _offset

* drop the columns only on the kafka_ tables

* include data types

* fields cleanups

* datetime -> datetime64

* double import datetime
2020-09-25 16:23:48 +01:00
James Greenhill
1ed6263a71
Create Omni-Person model for managing people in Clickhouse (#1712)
* Create Omni-Person model for managing people in Clickhouse

* type fixes

* rebase all the things

* cleanups

* id -> uuid for events in clickhouse

* cleanups and type checks

* Further cleanups and uuid conversions

* kafka fix

* break out serializer across kafka clients

* fix a few bugs w/ datetime types

* basic fix for people kafka table

* fix migration errors (copy pasta errors)

* Use KafkaProducer for Omni Person emitting

* setup mock kafka producer

* undo some work for inserting

* Test TestKafkaProducer

* change if order, obvious mistake

* remove unnecessary function arg

* Fix getters for new column

* Test fixes

* mirror columns across element queries

* firm up handling of timestamps

* only return timestamps for handle_timestamp

* Correct heroku config for Kafka
2020-09-25 11:05:50 +01:00
Marius Andra
dd7e38c5b5
Clickhouse Elements Dedup (based on master) (#1698)
* Use ReplacingMergeTree for elements, remove element_groups and use elements_hash as a virtual "pk"

* remove unused ELEMENT_GROUP_TABLE_SQL

* merge fixes

* use redis cache to avoid writing duplicate elements to clickhouse

* move fakeredis to requirements.txt

* add team_id to cache key

* remove elements_group kafka table references

* add elements_hash to clickhouse element serializer

* fix cache key

* rename few keys

* add test runner to ease pycharm dev

* fix some mypy errors

* remove typo

Co-authored-by: Eric <eeoneric@gmail.com>
2020-09-24 06:47:28 -04:00
Marius Andra
1eeed28751
Fix Master EE code (#1701)
* add test runner to ease pycharm dev

* fix broken import

* drop and recreate the clickhouse test db before running tests

* fix person uuid str json serialization issue

* make kafka optional in tests

* fix inits

* remove need for kafka in person.py

* fix a bunch of mypy errors

* fix function and add process_event to pipeline

* fixed missing params and tests

* change uuid and fix types

* types

* optimize for merge prop test

* make ClickhouseProducer to produce to clickhouse one way or another

* annotate types

Co-authored-by: Eric <eeoneric@gmail.com>
Co-authored-by: James Greenhill <fuziontech@gmail.com>
2020-09-24 06:14:17 -04:00
Michael Matloka
8a629179a9
Organizations – models (#1674)
* Update only models

* Bring in line with master and use uuid1_macless

* Update models and annotation scope support

* Delete test_team_model.py

* Update user creation, team retrieval and fix tests

* Make fixes

* Rename migration

* Fix migrating from master

* Bring back previous company_name max_length

* Use get_price_id()

* Temporarily disable team member deletion

* Update user joining and leaving, and billing

* Improve first_name handling

* Update warning

* Update TestTeamUser

* Fix migration

* Update 0085_org_models.py

* Improve bootstrapping

* Move multitenancy price tests to posthog-production

* Update team_user.py

* Update setup_review.py

* Enhance opt_slash_path

* Update team.py

* Fix default test email

* Fix typing
2020-09-24 00:53:51 +02:00
James Greenhill
97a665d277
Leverage Postgres for Persons and reorder order by's on clickhouse (#1681)
* Query optimizations

* more sql optimizations

* checkpoint

* fix migration

* add UUID field to person

* use django signals to signal that clickhouse needs to be updated

* cleanup person logic

* cleanups

* update migration

* Don't setup the django signals unless we are for sure using ee setup

* expecting back to 30 queries for capturing with new person

* add .venv to .gitignore

* add env back to .gitignore

Co-authored-by: Ubuntu <ubuntu@ip-172-31-73-18.ec2.internal>
2020-09-22 13:41:08 +01:00
James Greenhill
463ce40ba6
Make get_is_identified more tolerant of missing person (#1675)
* Make get_is_identified more tolerant of missing person

* better test of out of range
2020-09-17 09:26:58 -07:00
James Greenhill
adeb7694cc
Publish events to Kafka for consumption (#1644)
* Publish events to Kafka for consumption

* Commit avro idl's for event schemas

* convert client to use github.com/dpkp/kafka-python

* events loaded into clickhouse from Kafka

* remove cruft

* Publish events to Kafka for consumption

* convert client to use github.com/dpkp/kafka-python

* remove cruft

* include kafka migrations

* bugfixes for migrations

* use constants for consistency

* wrap up local migrations

* small fixes

* tune ups
2020-09-15 20:04:38 -07:00
Marius Andra
50c683e691
Clickhouse process event (#1652)
* generate clickhouse uuid script

* set CLICKHOUSE_SECURE=False by default if running in TEST or DEBUG

* convert person_id to UUID, make adding `person_id` optional, add distinct_ids already in the `create_person` function

* Fix test_process_event_ee.py, remove all calls to Person.objects.*

* add back util

* fix broken imports

* improve process_event test clickhouse queries

* change property parsing

* indentation wrong missing calls

* uuid4 instead of call to CH

Co-authored-by: Eric <eeoneric@gmail.com>
Co-authored-by: James Greenhill <fuziontech@gmail.com>
2020-09-15 12:40:35 -07:00
James Greenhill
70868fc7db
Final ClickHouse module before wiring up to Posthog (#1617)
2020-09-08 21:00:37 -07:00
James Greenhill
f2680a61e3
clickhouse models (#1614)
2020-09-08 18:55:01 -07:00
James Greenhill
dedf5582c1
first chunk of clickhouse framework (#1613)
* first chunk of clickhouse framework

* prod, not dev docker-compose.yml

* add clickhouse sql files
2020-09-08 16:12:27 -07:00
James Greenhill
c2e03a3a46
Revert this to bring back the working copy of ee / clickhouse (#1588)
2020-09-04 21:12:07 -07:00
James Greenhill
8e26357770
Async writes to clickhouse (#1585)
* Initial commit async-ing clickhouse

* Async writes to clickhouse

* deconflict master

* sync events listing

* mypy checks
2020-09-04 17:46:02 -07:00
James Greenhill
5506135c3c
Bring back clickhouse changes along with queue size metrics (#1579)
* Revert "Revert "Clickhouse setup (#1463)""

This reverts commit 7f2cab4b93.

* add queue backlog to _stats endpoint

* celery queue length into heartbeat -> statsd

* reformat

* type check fixes

* bump expected number of queries to 8

Co-authored-by: Eric <eeoneric@gmail.com>
2020-09-04 13:44:53 -07:00
James Greenhill
7f2cab4b93
Revert "Clickhouse setup (#1463)"
This reverts commit a0327587cb.
Time to process events shot way up and logs are missing.
2020-09-03 19:27:02 -07:00
Eric Duong
a0327587cb
Clickhouse setup (#1463)
* initial

* migration command

* migrations working

* add modelless views for clickhouse

* initial testing structure

* use test factory

* scaffold for all tests

* add insight and person api

* add basic readme

* add client

* change how migrations are run

* add base tables

* ingesting events

* restore delay

* remove print

* updated testing flow

* changed sessions tests

* update tests

* reorganized sql

* parametrize strings

* element list query

* change to serializer

* add values endpoint

* retrieve with filter

* pruned code to prepare for staged merge

* working ingestion again

* tests for ee

* undo unneeded tests right now

* fix linting

* more typing errors

* fix tests

* add clickhouse image to workflow

* move to right job

* remove django_clickhouse

* return database url

* run super

* remove keepdb

* reordered calls

* fix type

* fractional seconds

* fix type error

* add checks

* remove retention sql

* fix tests

* add property storage and tests

* merge master

* fix tests

* fix tests

* .

* remove keepdb

* format python files

* update CI env vars

* Override defaults and insecure tests

* Update how ClickHouse database gets evaluated

* remove bootstrapping clickhouse database routine

* Don't initialize the clickhouse connection unless we say it's primary

* .

* fixed id generation

* remove dump

* black settings

* empty client

* add param

* move docker-compose for ch to ee dir

* Add _public_ key to repo for verifying self signed cert on server

* update ee compose file for ee dir

* fix a few issues with tls in migrations

* update migrations to be flexible about storage profile and engine

* black settings

* add elements prop tables

Co-authored-by: James Greenhill <jams@uber.com>
2020-09-03 10:27:45 -07:00