0
0
mirror of https://github.com/PostHog/posthog.git synced 2024-11-28 18:26:15 +01:00
posthog/Dockerfile

20 lines
403 B
Docker
Raw Normal View History

FROM node:14 AS builder
WORKDIR /code/
2021-01-22 13:22:56 +01:00
2021-02-22 20:24:05 +01:00
COPY package.json yarn.lock .eslintrc.js .prettierrc tsconfig.json tsconfig.eslint.json ./
Port Python process_event_ee over to plugin server (#34) * Integrate fastify-postgres and fastify-kafka * Explicitly add node-rdkafka for its types * Port UUIDT from Python * Consume events from Kafka * Update yarn.lock * Port handle_timestamp * Enhance startFastifyInstance with config * Prettier * Handle dates with Luxon for similarity to Python's datetime * Fix UUIDT stringification * Port _capture_ee * Create EventsProcessor with plugins server access * Rework ingestion to separate from Fastify, port over more Django * Update utils.test.ts * Remove fastify-kafka and fastify-postgres * Add protobuf * Add types * Update yarn.lock * Improve info * Update server.ts * Port element functions * Use Kafka producer stream and make overall optimizations * Add session recording * Add plugin processing * Fix castTimestampOrNow * Update yarn.lock * Improve typing and consume event messages in batches of 100 * Improve code clarity * Add timing with StatsD * Update style * Format * Merge and alias * Are last stuff * Reimplement Protobuf with protos compiled to JS+TS * Fix things * Fix UUID#toString implementation * Update UUID number handling * Unify * Prettier * Unify utils * Add newline * Prettier but correctly * Fix types * Improve types * Bump version to 0.5.0 * Update worker.test.ts * Update types.ts * Update yarn.lock * Update compile:protobuf with eslint --fix * Fix typing * Don't bump version * Fix style * Use @posthog/node-rdkafka for Worker thread support * Improve logging * Unify Redis and Kafka approach to queuing events * Fix consuming from Kafka * Make some optimizations * Don't introduce KAFKA_EVENTS_HANDOFF * Consume 1 kiloevent / 50 ms = 20 kiloevents / s * Don't start web server by default We don't need it at all yet. * Improve Redis logging * Fix connecting to dockerized Redis No idea why this fixes the issue, but it does. * Update yarn.lock * Fix merge * Clear Kafka consumption interval on graceful exit * Improve logging and error handling * Smooth out ingestion errors * Fix StatsD prefixing * Move @types/luxon to devDependencies * Use Kafka.js Producer instead of node-rdkafka * Use EventProto#encodeDelimited instead of #encode * Start UUIDT series from 0 instead of 1 * Use event UUID from Django server * Make some fixes and improvements * Remove console.logs * Use plugin-scaffold 0.2.7 * Fix style camelCase FTW * Simplify compile:typescript * await startQueue * Clean up castTimestampOrNow * Don't use pesky ||= * Change consumer group to 'group1' from main repo * Revert "await startQueue" This reverts commit 1ca29ba2c6db22dceef4fb02dac5136846d4dfa1. * Exit if queue failed to start * Clean up elements handling * Don't commit compiled protos * Rename consumer group to clickhouse-ingestion * Consume from topic events_ingestion_handoff * ee dev script * Backport set_once and event name sanitization * Add direct ClickHouse support base * Upgrade plugin-scaffold to 0.2.8 * Add clickhouse to server object * Support ClickHouse persons and person distinct IDs * Update README with ClickHouse settings * Fix sanitizeEventName * Await queue start * Prettier README * Try ClickHouse CI skeleton * Update yarn.lock * Possibly fix CI * Revert basic tests job to old name * Cache pip install * Fix ClickHouse teardown and add Yarn installation * Await startQueue in tests * Try existing test in new workflow * Don't cache pip requirements * Add Redis to CH CI * Make some env var changes * Try test_default as Django test DB name * Try test_posthog as Django test DB name * Don't tear down ClickHouse * Make env vars common to all tests-clickhouse steps * Debug Postgres * Debug Postgres actually * Revert "Make env vars common to all tests-clickhouse steps" This reverts commit 07bfd2917fc18b6674649fdd196432ed5c8e6a37. * Try fixing DB name discrepancy * Make some env vars common * Debug Postgres better * Remove Postgres debug code * Fix inserting * Fix posthog_team.app_urls * Catch insertRow errors * Add plugin.organization_id in base tests * DELETE FROM posthog_organization * Rework CI to use Django everywhere * Reorder tests for readability * Fix missing KAFKA_HOSTS in CI * Optimize createPosthog * Start separating Postgres- and ClickHouse-based suites * Fix regexes possibly * Fix regexes actually possibly * Improve fatal error handling * Fix type * Fix import * Debug Kafka queue crash * Debug consumer * Fix organizationId * Make some more suites Postgres-specific * Check out master branch of main repo of Django server in CI * Start schedule before queue * Clean up logging * Update Dockerfile * Refactor Postgres processing to DB class * Use more of DB methods * Quit Redis * Update test database name * Debug consumer * Don't mock KafkaJS! * Add new Kafka testing system incomplete * Clean up new testing system * Make prettier compatible with ESLint * Add compatibility with Plugin.latest_tag in test * Fix minor issues * Try not terminating program on Kafka consumer crash * Change KAFKA_ADVERTISED_HOST_NAME to localhost * Address feedback * Separate Postgres and DB classes * Consume from kafka_events topic in KafkaObserver * Fix process-event test file name * Check for UUID in handed off events * Return from EventsProcessor.processEvent * use `db.postgresQuery` instead of vague `db.query` * Run the plugins suite universally * Fix closeServer * javaScriptVariableCase * Capture handleTimestamp error with Sentry * Hand off message in test * Fix plugins suite imports * Run the vm suite universally * more javaScriptVariableCase * disconnect without waiting for pending replies (.diconnect() instead of .quit()) * dot quit seems to work better? * Add sanitizeSqlIdentifier * Fix updating person properties * Fix problems * Don't use status in piscina.js * Revert "Run the vm suite universally" This reverts commit 966056fe8b1479c85cdcd0494df5ce58e50f103a. * Fix sanitizeSqlIdentifier test * Use waitForProcessedMessages * Optimize updatePersonProperties * Rework DummyPostHog to capturing internal server events efficiently * Fix nested test files not being ran * Run prettier over Markdown * Fix mock paths * Fix issues * Fix some * Don't truncate Kafka tables * Start kafkaObserver inside test * Support Team.is_demo * Increase process-event tests timeout to 60 s * Cache Python location in ClickHouse CI * Debug KafkaObserver * Cache pip in all CI jobs * Add bash script for running tests locally * Try out a different way of watching Kafka messages * Simplify DummyPostHog back again * Use posthog-js-lite in EventsProcessor * Sanitize SQL identifiers more strictly Remove all characters in identifiers that are neither letter, digit or underscore. * Don't resolve KafkaObserver start before connection * Add ee.tasks.webhooks_ee.post_event_to_webhook_ee * Expect Client to have been called twice * Adjust resetTestDatabase for Organization.personalization * Update Cliient call asserts * Postgres ingestion + process event tests (#116) * start with the postgres event ingestion process event tests * get first test to work * remove siteUrl * pass partial event to test that it still works and retain parity with the django test * another test * refactor * add more tests * opt out of posthog in test mode * add first alias test * always use UTC times when talking to postgres * prevent a crash * bit of clarity to help debug * fix bug with table name * fix bug with passing object instead of id * add some alias tests (all green now) * save merged properties * few more tests * more missing tests * fix test * team event properties test * fix bug * another test (partial) * clarify postgres magic * different timestamp format for creating event in clickhouse & postgresql * make element tests fail * capture first team event test * insert session recording events * generate element hashes * create elements and element groups * "key in object" only works with objects, not arrays * test an extra thing * add missing awaits that caused things to be done out of order * few extra tests * another test * await for things to happen * fix to work with latest master * client is called twice - it's initialized for sending celery tasks on webhooks as well * check webhook celery client queue * split into postgres & shared process event test * add query counter * clickhouse process event tests v0.1 * fix vm test * fix "Your test suite must contain at least one test." error for shared tests * also run non-ingestion tests under test:postgres * Clean up utils.ts * Clean up TimestampFormat * Put get* type Postgres functions in DB class Co-authored-by: Michael Matloka <dev@twixes.com> * postgres e2e ingestion test (#120) * Kafka e2e tests (#121) * kafka e2e test * kafka e2e test * kafka host from env * get some ch process event tests working * more working clickhouse tests * wait a bit longer * add chainToElements, fix elementsToString bug * remove quotes from inside sanitizeSqlIdentifier to also work with clickhouse * split dev setup command * fetch elements from clickhouse * more elements * bugfix * refactor and reuse wait function * ingest kafka events with the right structure * remove leftover * fix clickhouse timestamp * simplify process event creation and call the methods directly * fix uuid test * refactor delayed event fetching to support session recording events * catch bad uuids * wait for session recording events in test * use right timestamp for session recording events * use the same database as posthog (the app) * use local db * deserialize clickhouse session recording events * split dev scripts * try to make tests work by specifying db * increase kafka log level * cleanup * pass idl protos to clickhouse in github actions * start the clickhouse container in another step * let's try like this * WIP * WIP * sudo * also alias zookeeper * export zookeeper * use docker-compose.ch.yml * detached * element group test * create tests * debug test * remove some redundancy * reduce some noise * try to make topics * compatible with posthog migration 0122 * hide error * try to close e2e open handles * reuse kafkaProducer on server * Add DB.clickhouseQuery * Put isUUIDFormat on the UUID class Co-authored-by: Michael Matloka <dev@twixes.com> * Update README.md with PLUGIN_SERVER_INGESTION * rename CLICKHOUSE_USERNAME --> CLICKHOUSE_USER for consistency with the rest of the app * Delete run-tests-locally.sh * Resolve all kafka offsets (#124) * resolve all kafka offsets * remove a few lines * clean up * sort events according to offsets * add sort info * simplify offset line * add comment * Type pluginEvents as PluginEvent[] * Commit last offset of batch manually Co-authored-by: Michael Matloka <dev@twixes.com> * Add KAFKA_INCOMING_TOPIC and clean up unused code (#125) * ingest to clickhouse only if PLUGIN_SERVER_INGESTION is enabled * KAFKA_INCOMING_TOPIC * test clickhouse connection * remove some unused code * Replace _INCOMING_ with _CONSUMPTION_ * Add KAFKA_CONSUMPTION_TOPIC to README.md * add // Co-authored-by: Michael Matloka <dev@twixes.com> * Postgres parity tests (#126) * create postgres parity tests, fix some bugs * create all topics * person and distinct id parity tests * add TODOs * fetch distinct ids from clickhouse * create a specialised function for moving distinct ids and fix postgres/clickhouse person_id difference (number vs string) * test for updating distinct ids * createPerson, updatePerson and deletePerson * add a debug line to help debug flaky github action * remove falsehood * Remove "_handoff" from Kafka topic (#127) * remove _HANDOFF from topic * add plugin_ to plugin server ingestion topic * User merge test (#129) * merge test and postgres query simplification * postgres fix * remove "_HANDOFF" * small comment * Switch ClickHouse driver (#128) * Use driver @posthog/clickhouse instead of clickhouse * Update CH querying * Fix clickhouseQuery usage * Don't quote 64-bit ints in JSON from CH Co-authored-by: Marius Andra <marius.andra@gmail.com>
2021-02-04 14:59:46 +01:00
COPY src/idl/ src/idl/
2021-01-22 13:22:56 +01:00
RUN yarn install --frozen-lockfile
COPY ./ ./
Port Python process_event_ee over to plugin server (#34) * Integrate fastify-postgres and fastify-kafka * Explicitly add node-rdkafka for its types * Port UUIDT from Python * Consume events from Kafka * Update yarn.lock * Port handle_timestamp * Enhance startFastifyInstance with config * Prettier * Handle dates with Luxon for similarity to Python's datetime * Fix UUIDT stringification * Port _capture_ee * Create EventsProcessor with plugins server access * Rework ingestion to separate from Fastify, port over more Django * Update utils.test.ts * Remove fastify-kafka and fastify-postgres * Add protobuf * Add types * Update yarn.lock * Improve info * Update server.ts * Port element functions * Use Kafka producer stream and make overall optimizations * Add session recording * Add plugin processing * Fix castTimestampOrNow * Update yarn.lock * Improve typing and consume event messages in batches of 100 * Improve code clarity * Add timing with StatsD * Update style * Format * Merge and alias * Are last stuff * Reimplement Protobuf with protos compiled to JS+TS * Fix things * Fix UUID#toString implementation * Update UUID number handling * Unify * Prettier * Unify utils * Add newline * Prettier but correctly * Fix types * Improve types * Bump version to 0.5.0 * Update worker.test.ts * Update types.ts * Update yarn.lock * Update compile:protobuf with eslint --fix * Fix typing * Don't bump version * Fix style * Use @posthog/node-rdkafka for Worker thread support * Improve logging * Unify Redis and Kafka approach to queuing events * Fix consuming from Kafka * Make some optimizations * Don't introduce KAFKA_EVENTS_HANDOFF * Consume 1 kiloevent / 50 ms = 20 kiloevents / s * Don't start web server by default We don't need it at all yet. * Improve Redis logging * Fix connecting to dockerized Redis No idea why this fixes the issue, but it does. * Update yarn.lock * Fix merge * Clear Kafka consumption interval on graceful exit * Improve logging and error handling * Smooth out ingestion errors * Fix StatsD prefixing * Move @types/luxon to devDependencies * Use Kafka.js Producer instead of node-rdkafka * Use EventProto#encodeDelimited instead of #encode * Start UUIDT series from 0 instead of 1 * Use event UUID from Django server * Make some fixes and improvements * Remove console.logs * Use plugin-scaffold 0.2.7 * Fix style camelCase FTW * Simplify compile:typescript * await startQueue * Clean up castTimestampOrNow * Don't use pesky ||= * Change consumer group to 'group1' from main repo * Revert "await startQueue" This reverts commit 1ca29ba2c6db22dceef4fb02dac5136846d4dfa1. * Exit if queue failed to start * Clean up elements handling * Don't commit compiled protos * Rename consumer group to clickhouse-ingestion * Consume from topic events_ingestion_handoff * ee dev script * Backport set_once and event name sanitization * Add direct ClickHouse support base * Upgrade plugin-scaffold to 0.2.8 * Add clickhouse to server object * Support ClickHouse persons and person distinct IDs * Update README with ClickHouse settings * Fix sanitizeEventName * Await queue start * Prettier README * Try ClickHouse CI skeleton * Update yarn.lock * Possibly fix CI * Revert basic tests job to old name * Cache pip install * Fix ClickHouse teardown and add Yarn installation * Await startQueue in tests * Try existing test in new workflow * Don't cache pip requirements * Add Redis to CH CI * Make some env var changes * Try test_default as Django test DB name * Try test_posthog as Django test DB name * Don't tear down ClickHouse * Make env vars common to all tests-clickhouse steps * Debug Postgres * Debug Postgres actually * Revert "Make env vars common to all tests-clickhouse steps" This reverts commit 07bfd2917fc18b6674649fdd196432ed5c8e6a37. * Try fixing DB name discrepancy * Make some env vars common * Debug Postgres better * Remove Postgres debug code * Fix inserting * Fix posthog_team.app_urls * Catch insertRow errors * Add plugin.organization_id in base tests * DELETE FROM posthog_organization * Rework CI to use Django everywhere * Reorder tests for readability * Fix missing KAFKA_HOSTS in CI * Optimize createPosthog * Start separating Postgres- and ClickHouse-based suites * Fix regexes possibly * Fix regexes actually possibly * Improve fatal error handling * Fix type * Fix import * Debug Kafka queue crash * Debug consumer * Fix organizationId * Make some more suites Postgres-specific * Check out master branch of main repo of Django server in CI * Start schedule before queue * Clean up logging * Update Dockerfile * Refactor Postgres processing to DB class * Use more of DB methods * Quit Redis * Update test database name * Debug consumer * Don't mock KafkaJS! * Add new Kafka testing system incomplete * Clean up new testing system * Make prettier compatible with ESLint * Add compatibility with Plugin.latest_tag in test * Fix minor issues * Try not terminating program on Kafka consumer crash * Change KAFKA_ADVERTISED_HOST_NAME to localhost * Address feedback * Separate Postgres and DB classes * Consume from kafka_events topic in KafkaObserver * Fix process-event test file name * Check for UUID in handed off events * Return from EventsProcessor.processEvent * use `db.postgresQuery` instead of vague `db.query` * Run the plugins suite universally * Fix closeServer * javaScriptVariableCase * Capture handleTimestamp error with Sentry * Hand off message in test * Fix plugins suite imports * Run the vm suite universally * more javaScriptVariableCase * disconnect without waiting for pending replies (.diconnect() instead of .quit()) * dot quit seems to work better? * Add sanitizeSqlIdentifier * Fix updating person properties * Fix problems * Don't use status in piscina.js * Revert "Run the vm suite universally" This reverts commit 966056fe8b1479c85cdcd0494df5ce58e50f103a. * Fix sanitizeSqlIdentifier test * Use waitForProcessedMessages * Optimize updatePersonProperties * Rework DummyPostHog to capturing internal server events efficiently * Fix nested test files not being ran * Run prettier over Markdown * Fix mock paths * Fix issues * Fix some * Don't truncate Kafka tables * Start kafkaObserver inside test * Support Team.is_demo * Increase process-event tests timeout to 60 s * Cache Python location in ClickHouse CI * Debug KafkaObserver * Cache pip in all CI jobs * Add bash script for running tests locally * Try out a different way of watching Kafka messages * Simplify DummyPostHog back again * Use posthog-js-lite in EventsProcessor * Sanitize SQL identifiers more strictly Remove all characters in identifiers that are neither letter, digit or underscore. * Don't resolve KafkaObserver start before connection * Add ee.tasks.webhooks_ee.post_event_to_webhook_ee * Expect Client to have been called twice * Adjust resetTestDatabase for Organization.personalization * Update Cliient call asserts * Postgres ingestion + process event tests (#116) * start with the postgres event ingestion process event tests * get first test to work * remove siteUrl * pass partial event to test that it still works and retain parity with the django test * another test * refactor * add more tests * opt out of posthog in test mode * add first alias test * always use UTC times when talking to postgres * prevent a crash * bit of clarity to help debug * fix bug with table name * fix bug with passing object instead of id * add some alias tests (all green now) * save merged properties * few more tests * more missing tests * fix test * team event properties test * fix bug * another test (partial) * clarify postgres magic * different timestamp format for creating event in clickhouse & postgresql * make element tests fail * capture first team event test * insert session recording events * generate element hashes * create elements and element groups * "key in object" only works with objects, not arrays * test an extra thing * add missing awaits that caused things to be done out of order * few extra tests * another test * await for things to happen * fix to work with latest master * client is called twice - it's initialized for sending celery tasks on webhooks as well * check webhook celery client queue * split into postgres & shared process event test * add query counter * clickhouse process event tests v0.1 * fix vm test * fix "Your test suite must contain at least one test." error for shared tests * also run non-ingestion tests under test:postgres * Clean up utils.ts * Clean up TimestampFormat * Put get* type Postgres functions in DB class Co-authored-by: Michael Matloka <dev@twixes.com> * postgres e2e ingestion test (#120) * Kafka e2e tests (#121) * kafka e2e test * kafka e2e test * kafka host from env * get some ch process event tests working * more working clickhouse tests * wait a bit longer * add chainToElements, fix elementsToString bug * remove quotes from inside sanitizeSqlIdentifier to also work with clickhouse * split dev setup command * fetch elements from clickhouse * more elements * bugfix * refactor and reuse wait function * ingest kafka events with the right structure * remove leftover * fix clickhouse timestamp * simplify process event creation and call the methods directly * fix uuid test * refactor delayed event fetching to support session recording events * catch bad uuids * wait for session recording events in test * use right timestamp for session recording events * use the same database as posthog (the app) * use local db * deserialize clickhouse session recording events * split dev scripts * try to make tests work by specifying db * increase kafka log level * cleanup * pass idl protos to clickhouse in github actions * start the clickhouse container in another step * let's try like this * WIP * WIP * sudo * also alias zookeeper * export zookeeper * use docker-compose.ch.yml * detached * element group test * create tests * debug test * remove some redundancy * reduce some noise * try to make topics * compatible with posthog migration 0122 * hide error * try to close e2e open handles * reuse kafkaProducer on server * Add DB.clickhouseQuery * Put isUUIDFormat on the UUID class Co-authored-by: Michael Matloka <dev@twixes.com> * Update README.md with PLUGIN_SERVER_INGESTION * rename CLICKHOUSE_USERNAME --> CLICKHOUSE_USER for consistency with the rest of the app * Delete run-tests-locally.sh * Resolve all kafka offsets (#124) * resolve all kafka offsets * remove a few lines * clean up * sort events according to offsets * add sort info * simplify offset line * add comment * Type pluginEvents as PluginEvent[] * Commit last offset of batch manually Co-authored-by: Michael Matloka <dev@twixes.com> * Add KAFKA_INCOMING_TOPIC and clean up unused code (#125) * ingest to clickhouse only if PLUGIN_SERVER_INGESTION is enabled * KAFKA_INCOMING_TOPIC * test clickhouse connection * remove some unused code * Replace _INCOMING_ with _CONSUMPTION_ * Add KAFKA_CONSUMPTION_TOPIC to README.md * add // Co-authored-by: Michael Matloka <dev@twixes.com> * Postgres parity tests (#126) * create postgres parity tests, fix some bugs * create all topics * person and distinct id parity tests * add TODOs * fetch distinct ids from clickhouse * create a specialised function for moving distinct ids and fix postgres/clickhouse person_id difference (number vs string) * test for updating distinct ids * createPerson, updatePerson and deletePerson * add a debug line to help debug flaky github action * remove falsehood * Remove "_handoff" from Kafka topic (#127) * remove _HANDOFF from topic * add plugin_ to plugin server ingestion topic * User merge test (#129) * merge test and postgres query simplification * postgres fix * remove "_HANDOFF" * small comment * Switch ClickHouse driver (#128) * Use driver @posthog/clickhouse instead of clickhouse * Update CH querying * Fix clickhouseQuery usage * Don't quote 64-bit ints in JSON from CH Co-authored-by: Marius Andra <marius.andra@gmail.com>
2021-02-04 14:59:46 +01:00
RUN yarn compile:typescript
2021-01-22 13:22:56 +01:00
FROM node:14 AS runner
WORKDIR /code/
COPY --from=builder /code/ ./
HEALTHCHECK --start-period=10s CMD [ "node", "dist/healthcheck.js" ]
2021-02-17 04:01:09 +01:00
CMD [ "node", "dist/index.js" ]