- catch almost all errors: they are sent to Sentry anyway, and there is no benefit at this point in making the Celery task fail.
- consolidate the query status `complete` flag so it covers both completed results and errors
- this makes it easier to use: no separate error check is needed (see the sketch after this list)
- make results live longer: previously, if the chain took some time to complete, results expired/vanished while a dashboard was still polling for them
- this fixes POSTHOG-1BC6
- other refactorings
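As a rough illustration of what the consolidated flag means for callers (the names below are hypothetical, not the actual PostHog query-status API), a poller only has to look at `complete` and can read any error off the same status object:

```python
# Hypothetical shape of the consolidated status; illustrative only.
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class QueryStatus:
    complete: bool = False            # True for finished results AND for errors
    error: bool = False               # set alongside complete when the task failed
    error_message: Optional[str] = None
    results: Optional[list] = None


def poll(get_status, interval: float = 1.0, timeout: float = 600.0) -> QueryStatus:
    """Poll until `complete` is set; a single flag covers both outcomes."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_status()
        if status.complete:           # no separate error check needed
            return status
        time.sleep(interval)
    raise TimeoutError("query did not complete in time")
```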
* Upgrade dependencies
* Fix middleware error
Among other things, this relates to the features removed in Django 4.0:
https://docs.djangoproject.com/en/4.2/releases/4.0/#features-removed-in-4-0
* Upgrade psycopg
We need to be on psycopg >= 3.1.8.
Locally there is an additional problem: psycopg2 somehow
overshadows psycopg, making it appear that 3.1 works.
I had to run pip install "psycopg[binary,pool]==3.1.2" to
reproduce the problem.
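A throwaway check along these lines (assuming the `packaging` library is available) makes the overshadowing visible locally, because it reports whatever version the `psycopg` import actually resolves to:

```python
# Illustrative local sanity check, not part of the codebase.
from packaging.version import Version

import psycopg

installed = Version(psycopg.__version__)
if installed < Version("3.1.8"):
    raise RuntimeError(f"psycopg {installed} found, but >= 3.1.8 is required")
print(f"psycopg {installed} OK")
```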
* Go to Django 4.1 because of problems with psycopg3
We use custom SQL that somehow doesn't get formatted correctly
with either server-side or client-side cursors.
* Update query snapshots
* Update query snapshots
* Update query snapshots
* Update query snapshots
* Switch TaggedItem tests to assert ValidationError
Because full_clean validates constraints since Django 4.1, see
https://docs.djangoproject.com/en/4.2/releases/4.1/#validation-of-constraints
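Roughly the pattern the tests moved to, as a generic sketch; the model and constraint below are illustrative stand-ins, not the real TaggedItem definition:

```python
from django.core.exceptions import ValidationError
from django.db import models
from django.test import TestCase


class Item(models.Model):
    quantity = models.IntegerField()

    class Meta:
        app_label = "example"  # illustrative app, not a real PostHog app
        constraints = [
            models.CheckConstraint(check=models.Q(quantity__gte=0), name="quantity_non_negative"),
        ]


class ConstraintValidationTest(TestCase):
    def test_full_clean_validates_constraints(self):
        # Since Django 4.1, full_clean() also runs validate_constraints(),
        # so the violation surfaces as ValidationError instead of an
        # IntegrityError at save() time.
        with self.assertRaises(ValidationError):
            Item(quantity=-1).full_clean()
```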
* Remove type: ignore comments
These now come up as:
error: Unused "type: ignore" comment
* Update query snapshots
* Figure out psycopg problem and try Django 4.2 again
* Update query snapshots
* Fix other IN errors
* Fix getting status
* Fix psycopg3 issues
* Fix psycopg issues
* Update query snapshots
* Update query snapshots
* Update query snapshots
* Update query snapshots
* Update deps
* Update query snapshots
* Update query snapshots
* Update query snapshots
* Update query snapshots
* Fix more tests
* Adjust baseline
* Remove sqlcommenter (should be PostgreSQL only anyway)
* Fix file
* Update query snapshots
* Update query snapshots
* Update query snapshots
* Fix queries
* Fix query
* Revert
* Update requirements.in
* Remove restore-virtualenv
Because it is not maintained anymore
* Revert "Remove restore-virtualenv"
This reverts commit c2a7ef8a1e.
* mypy
* Adjust num queries
* Adjust num queries
* Adjust num queries
* Update query snapshots
* Add to updated_fields
---------
Co-authored-by: github-actions <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Neil Kakkar <neilkakkar@gmail.com>
* feat(batch-exports): Add HTTP Batch Export destination
It may be reused for other destinations in the future, but for now it only submits
payloads in the PostHog /batch format.
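For reference, a minimal sketch of the kind of payload the destination posts; the field values are placeholders, and the exact /batch shape should be checked against the PostHog docs:

```python
import requests

payload = {
    "api_key": "phc_example",  # placeholder project API key
    "batch": [
        {
            "event": "my_event",
            "distinct_id": "user-123",
            "timestamp": "2023-01-01T00:00:00Z",
            "properties": {"plan": "free"},
        }
    ],
}
response = requests.post("https://app.posthog.com/batch", json=payload, timeout=10)
response.raise_for_status()
```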
* add `geoip_disable`, stop applying `toJSONString` to `elements_chain`, and mark some HTTP status codes as non-retryable
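A sketch of what marking status codes as non-retryable can look like; the exact set of codes below is an assumption, not necessarily the one used by the exporter:

```python
# Illustrative choice of non-retryable codes.
NON_RETRYABLE_STATUSES = {400, 401, 403, 404, 413}


def is_retryable(status_code: int) -> bool:
    """Client errors (except rate limiting) won't succeed on retry."""
    if status_code in NON_RETRYABLE_STATUSES:
        return False
    return status_code == 429 or status_code >= 500
```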
* Add heartbeating to HTTP batch export
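A minimal sketch of activity heartbeating with the Temporal Python SDK; the activity name and the heartbeat detail (the last uploaded timestamp) are assumptions about what this export tracks:

```python
from temporalio import activity


@activity.defn
async def post_batch(rows: list[dict]) -> None:
    last_uploaded_timestamp: str | None = None
    for row in rows:
        ...  # POST the row (or a buffered page of rows) to the destination
        last_uploaded_timestamp = row["timestamp"]
        # Record progress so Temporal knows the activity is alive and a
        # retry can resume from the last uploaded timestamp.
        activity.heartbeat(last_uploaded_timestamp)
```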
* Update query snapshots
* Update query snapshots
* fix: Re-use client session
* refactor: Rename last_uploaded_part_timestamp to last_uploaded_timestamp
---------
Co-authored-by: github-actions <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Tomás Farías Santana <tomas@tomasfarias.dev>
* Upgrade pydantic and all related dependencies
* Upgrade mypy
* Add mypy-baseline
To update the baseline when you fix something (only then!), use:
[mypy cmd] | mypy-baseline sync
* just code for migrations
* use all timezones because the update would make common_timezones less inclusive
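The distinction, assuming pytz as the timezone source: `common_timezones` is a curated subset of `all_timezones`, so switching to the full list keeps every previously valid choice valid:

```python
import pytz

# common_timezones is a curated subset; all_timezones includes every zone,
# so teams already using a less common timezone keep a valid choice.
assert set(pytz.common_timezones) <= set(pytz.all_timezones)
print(len(pytz.common_timezones), "common vs", len(pytz.all_timezones), "total")
```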
* install new dependencies
* add comment
* revert
* restore
* fix type
* chore(batch-exports): add snowflake export workflow
This workflow uses Snowflake internal stages to load data from
ClickHouse into a Snowflake table. We maintain the existing events table
schema as used in the existing Snowflake App.
Things I haven't done yet:
1. make sure we get e.g. the `elements` and `person_set` data into
Snowflake.
2. add the additional frontend to enable configuring the Snowflake
connection.
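A very rough sketch of the stage-based load with the Snowflake Python connector; connection parameters, table/stage names, and the COPY options are illustrative, not what the workflow actually uses:

```python
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account", user="loader", password="...", warehouse="COMPUTE_WH",
    database="POSTHOG", schema="PUBLIC",
)
with conn.cursor() as cur:
    # Upload the locally spooled JSONL file to the table's internal stage...
    cur.execute("PUT file:///tmp/events.jsonl @%events")
    # ...then load it into the events table in one bulk operation.
    cur.execute(
        "COPY INTO events FROM @%events "
        "FILE_FORMAT = (TYPE = 'JSON') "
        "MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE PURGE = TRUE"
    )
conn.close()
```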
* remove unused var
* include excluded events in test
* feat(batch_exports): add backend API and S3 temporal workflow
This adds the backend API for batch exports, which will handle reverse
ETL exports to e.g. S3, Snowflake etc.
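A skeleton of what such a Temporal workflow can look like with the Python SDK; the workflow name, inputs, and activity below are hypothetical, not the ones added in this commit:

```python
from dataclasses import dataclass
from datetime import timedelta

from temporalio import activity, workflow


@dataclass
class S3ExportInputs:
    team_id: int
    bucket: str
    data_interval_start: str
    data_interval_end: str


@activity.defn
async def insert_into_s3(inputs: S3ExportInputs) -> None:
    # Read the interval from ClickHouse and upload the result to S3
    # (details omitted in this sketch).
    ...


@workflow.defn(name="s3-export")
class S3ExportWorkflow:
    @workflow.run
    async def run(self, inputs: S3ExportInputs) -> None:
        # Hand the interval off to the activity that does the actual export.
        await workflow.execute_activity(
            insert_into_s3,
            inputs,
            start_to_close_timeout=timedelta(hours=1),
        )
```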
---------
Co-authored-by: github-actions <41898282+github-actions[bot]@users.noreply.github.com>
* feat: Add Temporal to the dev and hobby stacks
* disable elastic for hobby because of resource constraints
* checkpoint
* update requirements
* worker is up, but without the sandbox
* ensure temporal does not depend on elastic
* Address feedback
* pip-compile dev
* mypy fixes
* add a bit of colorful logging
* add django temporal worker to the mix
* checkpoint for dev-full docker
* Working on docker-full, but checkpointing for now
* add migration bits for full
* chore(recordings): add command to generate session recording events
The intention is to be able to generate somewhat realistic
session recording events to test ingestion pipeline performance,
so that we don't need to rely on pushing to production.
The command outputs to stdout a single JSON-encoded line per
event, which can then e.g. be piped into kafkacat or
kafka-console-producer, or written to a file to be used by vegeta to
load test the capture endpoint.
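A tiny sketch of the "one JSON line per event" output; the event shape below is a guess at a minimal $snapshot payload, not the command's actual output:

```python
import json
import sys
import uuid
from datetime import datetime, timezone


def emit_fake_recording_events(count: int = 100) -> None:
    session_id = str(uuid.uuid4())
    for _ in range(count):
        now_ms = int(datetime.now(timezone.utc).timestamp() * 1000)
        event = {
            "event": "$snapshot",
            "distinct_id": "load-test-user",
            "properties": {
                "$session_id": session_id,
                "$window_id": session_id,
                "$snapshot_data": {"type": 3, "timestamp": now_ms},
            },
        }
        # One JSON-encoded event per line, ready to pipe into kafkacat etc.
        sys.stdout.write(json.dumps(event) + "\n")


if __name__ == "__main__":
    emit_fake_recording_events()
```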
* Python 3.10
Performance gains go brrr
* Add missing SAML deps
* Add missing dep to dockerfile
* Update mypy to 0.981 for 3.10.7 compatibility
Needed this bug to be fixed: https://github.com/python/mypy/issues/13627
This also incidentally fixed the mypy bug in csv_exporter.py
* bump to 3.10.10