mirror of https://github.com/PostHog/posthog.git synced 2024-12-01 12:21:02 +01:00
posthog/production.Dockerfile
Harry Waye cd82caab01
feat(ingestion): remove Graphile worker as initial ingest dependency (#12075)
* feat(ingestion): remove Graphile worker as initial ingest dependency

At the moment, if Graphile enqueuing of an anonymous event fails, e.g.
because the database it uses to store scheduling information is down,
we end up pushing the event to the Dead Letter Queue and doing nothing
further with it.

Here, instead of sending the event directly to the DB, we first push it
to Kafka, an `anonymous_events_buffer` topic, which is then committed to
the Graphile database. This means that if the Graphile DB is down but
then comes back up, we will end up with the same results as if it had
always been up*

(*) not entirely true as what is ingested also depends on the timings of
other events being ingested

* narrow typing for anonymous event consumer

* fix types import

* chore: add comment re todos for consumer

* wip

* wip

* wip

* wip

* wip

* wip

* fix typing

* Include error message in warning log

* Update plugin-server/jest.setup.fetch-mock.js

Co-authored-by: Guido Iaquinti <4038041+guidoiaquinti@users.noreply.github.com>

* Update plugin-server/src/main/ingestion-queues/anonymous-event-buffer-consumer.ts

Co-authored-by: Guido Iaquinti <4038041+guidoiaquinti@users.noreply.github.com>

* include warning icon

* fix crash message

* Update plugin-server/src/main/ingestion-queues/anonymous-event-buffer-consumer.ts

* Update plugin-server/src/main/ingestion-queues/anonymous-event-buffer-consumer.ts

Co-authored-by: Yakko Majuri <38760734+yakkomajuri@users.noreply.github.com>

* setup event handlers as KafkaQueue

* chore: instrument buffer consumer

* missing import

* avoid passing hub to buffer consumer

* fix statsd reference.

* pass graphile explicitly

* explicitly cast

* add todo for buffer healthcheck

* set NODE_ENV=production

* Update comment re. failed batches

* fix: call flush on emitting to buffer.

* chore: flush to producer

* accept that we may drop some anonymous events

* Add metrics for enqueue error/enqueued

* fix comment

* chore: add CONVERSION_BUFFER_TOPIC_ENABLED_TEAMS to switch on buffer
topic

Co-authored-by: Guido Iaquinti <4038041+guidoiaquinti@users.noreply.github.com>
Co-authored-by: Yakko Majuri <38760734+yakkomajuri@users.noreply.github.com>
2022-10-10 15:40:43 +01:00


#
# This Dockerfile is used for self-hosted production builds.
#
# Note: for PostHog Cloud remember to update 'Dockerfile.cloud' as appropriate.
#
#
# Build the frontend artifacts
#
FROM node:18.8-alpine3.16 AS frontend
WORKDIR /code
COPY package.json yarn.lock ./
RUN yarn config set network-timeout 300000 && \
    yarn install --frozen-lockfile
COPY frontend/ frontend/
COPY ./bin/ ./bin/
COPY babel.config.js tsconfig.json webpack.config.js ./
RUN yarn build
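
Because this is a multi-stage build, a single stage such as `frontend` can be built in isolation with `--target`, which is handy when debugging one stage without waiting for the rest. The image tag below is just an illustrative local name:

```shell
# Build only the "frontend" stage (name comes from the FROM ... AS line above).
# Requires a local Docker daemon; the tag "posthog-frontend" is illustrative.
docker build -f production.Dockerfile --target frontend -t posthog-frontend .
```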
#
# Build the plugin-server artifacts. Note that we still need to install the
# runtime deps in the main image
#
FROM node:18.8-alpine3.16 AS plugin-server
WORKDIR /code/plugin-server
# Install python, make and gcc as they are needed for the yarn install
RUN apk --update --no-cache add \
    "make~=4.3" \
    "g++~=11.2" \
    "gcc~=11.2" \
    "python3~=3.10"
# Compile and install Yarn dependencies.
#
# Notes:
#
# - we explicitly COPY the files so that we don't need to rebuild
# the container every time a dependency changes
COPY ./plugin-server/package.json ./plugin-server/yarn.lock ./plugin-server/tsconfig.json ./
RUN yarn config set network-timeout 300000 && \
    yarn install
# Build the plugin server
#
# Note: we run the build as a separate step to increase
# the cache hit ratio of the layers above.
COPY ./plugin-server/src/ ./src/
RUN yarn build
# Build the posthog image, incorporating the Django app along with the frontend,
# as well as the plugin-server
FROM python:3.8.14-alpine3.16
ENV PYTHONUNBUFFERED 1
WORKDIR /code
# Install OS dependencies needed to run PostHog
#
# Note: please add only runtime dependencies in this section.
# If you temporarily need a package to build a Python or npm
# dependency, take a look at the sections below.
RUN apk --update --no-cache add \
    "libpq~=14" \
    "libxslt~=1.1" \
    "nodejs-current~=18" \
    "chromium~=102" \
    "chromium-chromedriver~=102" \
    "xmlsec~=1.2"
# Curl the GeoLite2-City database that will be used for IP geolocation within Django
#
# Notes:
#
# - We are doing this here because it makes sense to ensure the stack will work
#   even if the database is not available at boot time.
#   It's better to fail at build time than at boot time.
RUN apk --update --no-cache --virtual .geolite-deps add \
    "curl~=7" \
    "brotli~=1.0.9" \
    && \
    mkdir share \
    && \
    ( curl -L "https://mmdbcdn.posthog.net/" | brotli --decompress --output=./share/GeoLite2-City.mmdb ) \
    && \
    chmod -R 755 ./share/GeoLite2-City.mmdb \
    && \
    apk del .geolite-deps
# Compile and install Python dependencies.
#
# Notes:
#
# - we explicitly COPY the files so that we don't need to rebuild
# the container every time a dependency changes
#
# - we need a few additional OS packages for this. Let's install
#   them and then uninstall them when the compilation is completed.
COPY requirements.txt ./
RUN apk --update --no-cache --virtual .build-deps add \
    "bash~=5.1" \
    "g++~=11.2" \
    "gcc~=11.2" \
    "cargo~=1.60" \
    "git~=2" \
    "make~=4.3" \
    "libffi-dev~=3.4" \
    "libxml2-dev~=2.9" \
    "libxslt-dev~=1.1" \
    "xmlsec-dev~=1.2" \
    "postgresql13-dev~=13" \
    "libmaxminddb~=1.6" \
    && \
    pip install -r requirements.txt --compile --no-cache-dir \
    && \
    apk del .build-deps
RUN addgroup -S posthog && \
    adduser -S posthog -G posthog
RUN chown posthog.posthog /code
USER posthog
# Add in Django deps and generate Django's static files
COPY manage.py manage.py
COPY posthog posthog/
COPY ee ee/
COPY --from=frontend /code/frontend/dist /code/frontend/dist
RUN SKIP_SERVICE_VERSION_REQUIREMENTS=1 SECRET_KEY='unsafe secret key for collectstatic only' DATABASE_URL='postgres:///' REDIS_URL='redis:///' python manage.py collectstatic --noinput
# Add in the plugin-server compiled code, as well as the runtime dependencies
WORKDIR /code/plugin-server
COPY ./plugin-server/package.json ./plugin-server/yarn.lock ./
# Switch to root and install yarn so we can install runtime deps. Note that we
# still need yarn to run the plugin-server, so we do not remove it.
USER root
RUN apk --update --no-cache add "yarn~=1"
# NOTE: we need make and g++ for node-gyp
# NOTE: npm is required for re2
RUN apk --update --no-cache add "make~=4.3" "g++~=11.2" "npm~=8" --virtual .build-deps \
    && yarn install --frozen-lockfile --production=true \
    && yarn cache clean \
    && apk del .build-deps
USER posthog
# Add in the compiled plugin-server
COPY --from=plugin-server /code/plugin-server/dist/ ./dist/
WORKDIR /code/
USER root
COPY ./plugin-server/package.json ./plugin-server/
# We need bash to run the bin scripts
RUN apk --update --no-cache add "bash~=5.1"
COPY ./bin ./bin/
USER posthog
ENV CHROME_BIN=/usr/bin/chromium-browser \
    CHROME_PATH=/usr/lib/chromium/ \
    CHROMEDRIVER_BIN=/usr/bin/chromedriver
COPY gunicorn.config.py ./
ENV NODE_ENV=production
# Expose container port and run entry point script
EXPOSE 8000
# Expose the port from which we serve OpenMetrics data
EXPOSE 8001
CMD ["./bin/docker"]
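
A minimal sketch of building and running this image, mapping the two exposed ports (8000 for the app, 8001 for OpenMetrics). The tag and the environment variable values below are placeholders; a real deployment needs reachable Postgres, Redis, and the rest of the PostHog stack:

```shell
# Build the production image (tag "posthog:local" is illustrative).
docker build -f production.Dockerfile -t posthog:local .

# Run it, publishing the app and metrics ports. Connection strings are
# placeholders and must point at real services for the app to come up.
docker run --rm -p 8000:8000 -p 8001:8001 \
  -e SECRET_KEY='change-me' \
  -e DATABASE_URL='postgres://user:pass@db-host:5432/posthog' \
  -e REDIS_URL='redis://redis-host:6379/' \
  posthog:local
```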