mirror of https://github.com/PostHog/posthog.git synced 2024-11-21 13:39:22 +01:00

perf: Parse HogQL with C++ for a huge speedup (#17659)

* Add partial C++ parser of HogQL

* Support all the rules and add proper error handling

* Use `AlignAfterOpenBracket: BlockIndent`

* Reuse all the parser tests for the C++ backend

* Fix relationship between headers and implementations

* Add more error handling and fix minor issues

* Build both Python and C++ parsers in the package.json script

* Update ARRAY JOIN error assertion

* Improve timeit

* Move the C extension to the top level

* Refactor `vector_to_list_string`

* Build the parser on Linux

* Build wheels for the parser

* Simplify Linux build and fix macOS

* Fix Homebrew paths on x86 and don't fail fast in CI

* Set MACOSX_DEPLOYMENT_TARGET for C++20

* Set up QEMU for Linux ARM builds

* Publish the wheels on PyPI

* Avoiding Linux ARM emulation in CI for now

* Build sdist too

* Revert Dockerfile changes

* Fix PyPI publish

* Add README and optimize sdist build

* Use setup.py directly instead of build

* Use PyPI hogql-parser instead of local

* Also revert production-unit.Dockerfile

* Fix sdist upload and add Linux ARM back

* No Linux ARM build in the end

* Fix artifact uploading

* Do try building Linux ARM

We need this for prod.

* Use `npm` in `grammar:build`

`pnpm` is not available in that job.

* Fix formatting of hogql_parser

* Build everything on macOS

* Revert "Build everything on macOS"

Not so fast actually.

* Use hogql-parser=0.1.1

* Fix dylib in macOS wheel

* Bump hogql-parser version

* Fix missing module error

* Delete timeit.py

* Make error handling robust

* Format the C++

* Use `hogql-parser==0.1.1`

* Fix reserved keyword error assertions

* Use HEAD hogql_parser in CI

* Fix `apt` usage

* Add some sudo in CI

* Ensure package will be releasable before build

* Bump version to 0.1.3

* Cover C++ `unquote_string` with tests

* Use BuildJet ARM runners for ARM builds

* Add some instructions

* Add HogQL version check to backend CI

* Update requirements.txt

* Use `setuptools` instead of the deprecated `distutils`

* Fix working dir in backend CI

* Align ANTLR versions

* Add test for "mismatched input"

This is thrown differently than other HogQLSyntaxExceptions in C++, so might help reveal what's going on with tests failing only on Linux CI and not macOS dev

* Add types and bump version

* Comment instead of failing version check

* Automate hogql-release version bump

* Fix checkout token

* Don't build hogql-parser if there were no changes

* Update query snapshots

* Update query snapshots

* Update query snapshots

* Update query snapshots

* Improve documentation

* Use new hogql-parser version

* Fix error start and end initialization

* Note `antlr4-cpp-runtime`

Co-authored-by: Marius Andra <marius.andra@gmail.com>

* Also remove NUL chars in C++

* Check ANTLR4 runtime archive checksum for security

* Note more decrefs to add

* Add vector size checks

* Use new hogql-parser version

* Don't support the `start` arg in C++ `parse_expr`

* Use new hogql-parser version

---------

Co-authored-by: github-actions <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Marius Andra <marius.andra@gmail.com>
Michael Matloka 2023-10-13 15:58:08 +02:00 committed by GitHub
parent d8e67c0dc7
commit 16a71f60c9
50 changed files with 18577 additions and 1328 deletions


@ -33,4 +33,4 @@
!plugin-server/.prettierrc
!share/GeoLite2-City.mmdb
!hogvm/python
!unit.json
!unit.json


@ -49,11 +49,17 @@ runs:
python-version: ${{ inputs.python-version }}
token: ${{ inputs.token }}
- name: Determine if hogql-parser has changed compared to master
shell: bash
id: hogql-parser-diff
run: |
changed=$(git diff --quiet HEAD master -- hogql_parser/ && echo "false" || echo "true")
echo "::set-output name=changed::$changed"
- name: Install SAML (python3-saml) dependencies
shell: bash
run: |
sudo apt-get update
sudo apt-get install libxml2-dev libxmlsec1-dev libxmlsec1-openssl
sudo apt-get update && sudo apt-get install libxml2-dev libxmlsec1-dev libxmlsec1-openssl
- uses: syphar/restore-virtualenv@v1
id: cache-backend-tests
@ -63,12 +69,37 @@ runs:
- uses: syphar/restore-pip-download-cache@v1
if: steps.cache-backend-tests.outputs.cache-hit != 'true'
- name: Install python dependencies
- name: Install Python dependencies
if: steps.cache-backend-tests.outputs.cache-hit != 'true'
shell: bash
run: |
python -m pip install -r requirements-dev.txt
python -m pip install -r requirements.txt
pip install -r requirements.txt -r requirements-dev.txt
- name: Install the working version of hogql-parser
if: steps.hogql-parser-diff.outputs.changed == 'true'
shell: bash
# This is not cached currently, as it's important to build the current HEAD version of hogql-parser if it has
# changed (requirements.txt has the already-published version)
run: |
sudo apt-get install libboost-all-dev unzip cmake curl uuid pkg-config
curl https://www.antlr.org/download/antlr4-cpp-runtime-4.13.0-source.zip --output antlr4-source.zip
# Check that the downloaded archive is the expected runtime - a security measure
antlr_known_md5sum="ff214b65fb02e150b4f515d7983bca92"
antlr_found_md5sum="$(md5sum antlr4-source.zip | cut -d' ' -f1)"
if [[ "$antlr_known_md5sum" != "$antlr_found_md5sum" ]]; then
echo "Unexpected MD5 sum of antlr4-source.zip!"
echo "Known: $antlr_known_md5sum"
echo "Found: $antlr_found_md5sum"
exit 64
fi
unzip antlr4-source.zip -d antlr4-source && cd antlr4-source
cmake .
DESTDIR=out make install
sudo cp -r out/usr/local/include/antlr4-runtime /usr/include/
sudo cp out/usr/local/lib/libantlr4-runtime.so* /usr/lib/
sudo ldconfig
cd ..
pip install ./hogql_parser
- name: Set up needed files
shell: bash
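
The checksum guard in the "Install the working version of hogql-parser" step can be sketched in Python. This is only an illustration of the technique (compare a digest of the downloaded archive against a pinned value before unpacking it). The names `file_md5` and `verify_archive` are hypothetical and not part of the workflow, and MD5 appears only because the step uses `md5sum`.

```python
import hashlib
import hmac


def file_md5(path: str, chunk_size: int = 1 << 16) -> str:
    """Return the hex MD5 digest of the file at `path`, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path: str, known_md5: str) -> bool:
    """Return True only if the archive's digest matches the pinned digest."""
    return hmac.compare_digest(file_md5(path), known_md5.lower())
```

A caller would abort the build when `verify_archive(...)` returns `False`, which is what the `exit 64` branch of the shell step does.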

130
.github/workflows/build-hogql-parser.yml vendored Normal file

@ -0,0 +1,130 @@
name: Release hogql-parser
on:
push:
branches:
- master
paths:
- hogql_parser/**
- .github/workflows/build-hogql-parser.yml
pull_request:
paths:
- hogql_parser/**
- .github/workflows/build-hogql-parser.yml
concurrency:
group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
cancel-in-progress: true
jobs:
check-version:
name: Check version legitimacy
runs-on: ubuntu-22.04
outputs:
parser_any_changed: ${{ steps.changed-files-yaml.outputs.parser_any_changed }}
steps:
- uses: actions/checkout@v4
with:
fetch-depth: 0 # Fetching all for comparison since last push (not just last commit)
- name: Check if hogql_parser/ has changed
id: changed-files-yaml
uses: tj-actions/changed-files@v39
with:
since_last_remote_commit: true
files_yaml: |
parser:
- hogql_parser/**
- name: Notify about release needed
if: steps.changed-files-yaml.outputs.parser_any_changed == 'true'
shell: bash
run: |
published=$(curl -fSsl https://pypi.org/pypi/hogql-parser/json | jq -r '.info.version')
local=$(python hogql_parser/setup.py --version)
# TODO: Only comment if no comment already exists for $local
if [[ "$published" == "$local" ]]; then
# Backticks are escaped so that bash doesn't treat them as command substitution
MESSAGE_BODY="It looks like the code of \`hogql-parser\` has changed since last push, but its version stayed the same at $local. 👀\nMake sure to resolve this in \`hogql_parser/setup.py\` before merging!"
curl -s -u posthog-bot:${{ secrets.POSTHOG_BOT_GITHUB_TOKEN || secrets.GITHUB_TOKEN }} -X POST -d "{ \"body\": \"$MESSAGE_BODY\" }" "https://api.github.com/repos/${{ github.repository }}/issues/${{ github.event.pull_request.number }}/comments"
fi
build-wheels:
name: Build wheels on ${{ matrix.os }}
needs: check-version
runs-on: ${{ matrix.os }}
timeout-minutes: 30
if: ${{ needs.check-version.outputs.parser_any_changed == 'true' }}
strategy:
matrix:
# As of October 2023, GitHub doesn't have ARM Actions runners… and ARM emulation is insanely slow
# (20x longer) on the Linux runners (while being reasonable on the macOS runners). Hence, we use
# BuildJet as a provider of ARM runners - this solution saves a lot of time and consequently some money.
os: [ubuntu-22.04, buildjet-2vcpu-ubuntu-2204-arm, macos-12]
steps:
- uses: actions/checkout@v4
- if: ${{ !endsWith(matrix.os, '-arm') }}
uses: actions/setup-python@v4
with:
python-version: '3.11'
- if: ${{ endsWith(matrix.os, '-arm') }}
uses: deadsnakes/action@v3.0.1 # Unfortunately actions/setup-python@v4 just doesn't work on ARM! This does
with:
python-version: '3.11'
- name: Build sdist
if: matrix.os == 'ubuntu-22.04' # Only build the sdist once
run: cd hogql_parser && python setup.py sdist
- name: Install cibuildwheel
run: python -m pip install cibuildwheel==2.16.*
- name: Build wheels
run: cd hogql_parser && python -m cibuildwheel --output-dir dist
env:
MACOSX_DEPLOYMENT_TARGET: '12' # A modern target allows us to use C++20
- uses: actions/upload-artifact@v3
with:
path: |
hogql_parser/dist/*.whl
hogql_parser/dist/*.tar.gz
if-no-files-found: error
publish:
name: Publish on PyPI
needs: build-wheels
environment: pypi-hogql-parser
permissions:
id-token: write
runs-on: ubuntu-22.04
steps:
- name: Fetch wheels
uses: actions/download-artifact@v3
with:
name: artifact
path: dist/
- name: Publish package to PyPI
uses: pypa/gh-action-pypi-publish@release/v1
- uses: actions/checkout@v4
with:
token: ${{ secrets.POSTHOG_BOT_GITHUB_TOKEN }}
ref: ${{ github.event.pull_request.head.ref }}
- name: Update hogql-parser in requirements
shell: bash
run: |
local=$(python hogql_parser/setup.py --version)
sed -i "s/hogql-parser==.*/hogql-parser==${local}/g" requirements.in
sed -i "s/hogql-parser==.*/hogql-parser==${local}/g" requirements.txt
- uses: EndBug/add-and-commit@v9
with:
add: '["requirements.in", "requirements.txt"]'
message: 'Use new hogql-parser version'
default_author: github_actions
github_token: ${{ secrets.POSTHOG_BOT_GITHUB_TOKEN }}
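
Two pieces of logic in this workflow are easy to misread in shell form: the version check (a release is blocked when the local version equals the published one) and the `sed` rewrite of the pinned requirement. The sketch below restates both in Python under stated assumptions. The function names `version_bump_needed` and `pin_requirement` are hypothetical, and the versions are passed in as plain strings rather than fetched from PyPI or `setup.py`.

```python
import re


def version_bump_needed(published: str, local: str) -> bool:
    """Mirror the check-version job: the parser code changed, yet the local
    version is still the one already published, so a bump is required."""
    return published.strip() == local.strip()


def pin_requirement(requirements: str, package: str, version: str) -> str:
    """Rewrite every `package==<anything>` pin to `package==version`,
    like `sed -i "s/hogql-parser==.*/hogql-parser==${local}/g"`."""
    pattern = re.compile(rf"{re.escape(package)}==.*")
    # A function is used as the replacement so the version is inserted verbatim
    return pattern.sub(lambda _match: f"{package}=={version}", requirements)
```

As with the `sed` expression, `.*` stops at the end of the line, so only the pin itself is replaced and neighbouring lines are left alone.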


@ -98,7 +98,6 @@ jobs:
- uses: actions/checkout@v3
with:
fetch-depth: 1
path: 'current/'
- name: Set up Python
uses: actions/setup-python@v4
@ -119,38 +118,30 @@ jobs:
sudo apt-get update
sudo apt-get install libxml2-dev libxmlsec1 libxmlsec1-dev libxmlsec1-openssl
- name: Install python dependencies
- name: Install Python dependencies
if: steps.cache-backend-tests.outputs.cache-hit != 'true'
run: |
cd current
python -m pip install -r requirements.txt -r requirements-dev.txt
- name: Check for syntax errors, import sort, and code style violations
run: |
cd current
ruff .
- name: Check formatting
run: |
cd current
black --exclude posthog/hogql/grammar --check .
- name: Check static typing
run: |
cd current
mypy -p posthog --exclude bin/migrate_kafka_data.py --exclude posthog/hogql/grammar/HogQLParser.py --exclude gunicorn.config.py --enable-recursive-aliases
- name: Check if "schema.py" is up to date
run: |
cd current
npm run schema:build:python && git diff --exit-code
- name: Check if antlr definitions are up to date
- name: Check if ANTLR definitions are up to date
run: |
# Installing a version of ant compatible with what we use in development from homebrew (4.13)
# "apt-get install antlr" would install 4.7 which is incompatible with our grammar.
export ANTLR_VERSION=4.13.0
# java version doesn't matter
cd ..
sudo apt-get install default-jre
mkdir antlr
cd antlr
@ -162,9 +153,13 @@ jobs:
export CLASSPATH=".:$PWD/antlr.jar:$CLASSPATH"
export PATH="$PWD:$PATH"
cd ../current
cd ../posthog
antlr | grep "Version"
npm run grammar:build && git diff --exit-code
env:
# Installing a version of ANTLR compatible with what's in Homebrew as of October 2023 (version 4.13),
# as apt-get is quite out of date. The same version must be set in hogql_parser/pyproject.toml
ANTLR_VERSION: '4.13.0'
check-migrations:
needs: changes

25
.vscode/launch.json vendored

@ -96,6 +96,31 @@
"console": "integratedTerminal",
"python": "${workspaceFolder}/env/bin/python",
"cwd": "${workspaceFolder}"
},
{
"name": "Pytest: Current File",
"type": "python",
"request": "launch",
"module": "pytest",
"args": ["${file}", "-vvv"],
"console": "integratedTerminal",
"justMyCode": true
},
{
"name": "(lldb) Attach",
"type": "cppdbg",
"request": "attach",
"program": "/Users/twixes/.pyenv/versions/3.10.10/envs/posthog-3.10/bin/python",
"MIMode": "lldb"
},
{
"name": "Python C++ Debugger: Current File",
"type": "pythoncpp",
"request": "launch",
"pythonConfig": "custom",
"pythonLaunchName": "Pytest: Current File",
"cppConfig": "custom",
"cppAttachName": "(lldb) Attach"
}
],
"compounds": [


@ -0,0 +1,3 @@
BasedOnStyle: Chromium
ColumnLimit: 120
AlignAfterOpenBracket: BlockIndent

5
hogql_parser/.gitignore vendored Normal file

@ -0,0 +1,5 @@
# Build
build/
*.egg-info
*.so
dist/


@ -0,0 +1,50 @@
# Developing `hogql-parser`
## Mandatory reading
If you're new to Python C/C++ extensions, there are some things you must keep in mind.
### [Objects, Types and Reference Counts in CPython](https://docs.python.org/3/c-api/intro.html#objects-types-and-reference-counts)
Key takeaways:
1. `Py_INCREF()` and `Py_DECREF()` need to be used accurately, or there'll be memory leaks (or, less likely, segfaults).
1. `Py_None`, `Py_True`, and `Py_False` are singletons, but they still need to be incref'd/decref'd - the best way to create a new reference to them is to wrap them in `Py_NewRef()`.
1. Pretty much only `PyList_SET_ITEM()` _steals_ references (i.e. assumes ownership of objects passed into it). If you pass an object into any other function and no longer need it after that, remember to `Py_DECREF` it!
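
The ownership rules above can also be observed from the Python side. The following is a minimal sketch, specific to CPython, that uses `sys.getrefcount()` to show a container taking its own reference to an object. The helper name `refcount_delta_when_stored` is hypothetical, and the absolute counts are an implementation detail, so only the difference is measured.

```python
import sys


def refcount_delta_when_stored(obj: object) -> int:
    """Return how much CPython's refcount for `obj` grows while a list holds it."""
    before = sys.getrefcount(obj)
    holder = [obj]  # the list takes its own strong reference to obj
    after = sys.getrefcount(obj)
    del holder  # dropping the list releases that reference again
    return after - before
```

In C++ the same bookkeeping is manual: every reference the extension creates and does not hand over to another owner has to be released with `Py_DECREF`.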
### [Building Values in CPython](https://docs.python.org/3/c-api/arg.html#building-values)
Key takeaways:
1. Use `Py_BuildValue()` for building tuples, dicts, and lists of static size. Use type-specific functions (e.g. `PyUnicode_FromString()` or `PyList_New()`) otherwise.
1. `str`-building with `s` involves `strlen`, while `s#` doesn't - it's better to use the latter with C++ strings.
1. `object`-passing with `O` increments the object's refcount, while doing it with `N` doesn't - we should use `N` pretty much exclusively, because the parse tree converter is about creating new objects (not borrowing).
## Conventions
1. Use `snake_case`. ANTLR is `camelCase`-heavy because of its Java heritage, but both the C++ stdlib and CPython are snaky.
2. Use the `auto` type for ANTLR and ANTLR-derived types, since they can be pretty verbose. Otherwise specify the type explicitly.
3. Stay out of Python land as long as possible. E.g. avoid using `PyObject*`s for bools or strings.
Do use Python for parsing numbers though - that way we don't need to consider integer overflow.
4. If any child rule results in an AST node, so must the parent rule - once in Python land, always in Python land.
E.g. it doesn't make sense to create a `vector<PyObject*>`; that should just be a `PyObject*` of Python type `list`.
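
Convention 3 relies on Python integers being arbitrary-precision. The sketch below shows why delegating number parsing to Python sidesteps overflow. It is an illustration only: `parse_number` is a hypothetical helper, not the parser's actual conversion code, and it ignores any HogQL-specific literal forms.

```python
from typing import Union

INT64_MAX = 2**63 - 1  # largest value a signed 64-bit C++ integer can hold


def parse_number(text: str) -> Union[int, float]:
    """Parse a decimal literal, preferring an exact int over a float."""
    try:
        return int(text)  # Python ints grow as needed, so no overflow here
    except ValueError:
        return float(text)
```

For example, `parse_number("9223372036854775808")` yields an `int` one larger than `INT64_MAX`, a value that a fixed-width 64-bit integer in C++ could not represent.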
## How to develop locally on macOS
1. Install libraries:
```bash
brew install boost antlr4-cpp-runtime
```
1. Install `hogql_parser` by building from local sources:
```bash
pip install ./hogql_parser
```
1. If you now run tests, the locally-built version of `hogql_parser` will be used:
```bash
pytest posthog/hogql/
```
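
Before running the tests, it can help to confirm that the extension is importable in the active environment. This is a small sketch using only the standard library. The helper name `is_installed` is hypothetical, and it assumes the module is imported under the name `hogql_parser`.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module of that name can be found for import."""
    return importlib.util.find_spec(module_name) is not None
```

Calling `is_installed("hogql_parser")` after the `pip install` step should return `True` if the build succeeded.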

1048
hogql_parser/HogQLLexer.cpp Normal file

File diff suppressed because it is too large

94
hogql_parser/HogQLLexer.h Normal file

@ -0,0 +1,94 @@
// Generated from HogQLLexer.g4 by ANTLR 4.13.0
#pragma once
#include "antlr4-runtime.h"
class HogQLLexer : public antlr4::Lexer {
public:
enum {
ADD = 1, AFTER = 2, ALIAS = 3, ALL = 4, ALTER = 5, AND = 6, ANTI = 7,
ANY = 8, ARRAY = 9, AS = 10, ASCENDING = 11, ASOF = 12, AST = 13, ASYNC = 14,
ATTACH = 15, BETWEEN = 16, BOTH = 17, BY = 18, CASE = 19, CAST = 20,
CHECK = 21, CLEAR = 22, CLUSTER = 23, CODEC = 24, COHORT = 25, COLLATE = 26,
COLUMN = 27, COMMENT = 28, CONSTRAINT = 29, CREATE = 30, CROSS = 31,
CUBE = 32, CURRENT = 33, DATABASE = 34, DATABASES = 35, DATE = 36, DAY = 37,
DEDUPLICATE = 38, DEFAULT = 39, DELAY = 40, DELETE = 41, DESC = 42,
DESCENDING = 43, DESCRIBE = 44, DETACH = 45, DICTIONARIES = 46, DICTIONARY = 47,
DISK = 48, DISTINCT = 49, DISTRIBUTED = 50, DROP = 51, ELSE = 52, END = 53,
ENGINE = 54, EVENTS = 55, EXISTS = 56, EXPLAIN = 57, EXPRESSION = 58,
EXTRACT = 59, FETCHES = 60, FINAL = 61, FIRST = 62, FLUSH = 63, FOLLOWING = 64,
FOR = 65, FORMAT = 66, FREEZE = 67, FROM = 68, FULL = 69, FUNCTION = 70,
GLOBAL = 71, GRANULARITY = 72, GROUP = 73, HAVING = 74, HIERARCHICAL = 75,
HOUR = 76, ID = 77, IF = 78, ILIKE = 79, IN = 80, INDEX = 81, INF = 82,
INJECTIVE = 83, INNER = 84, INSERT = 85, INTERVAL = 86, INTO = 87, IS = 88,
IS_OBJECT_ID = 89, JOIN = 90, KEY = 91, KILL = 92, LAST = 93, LAYOUT = 94,
LEADING = 95, LEFT = 96, LIFETIME = 97, LIKE = 98, LIMIT = 99, LIVE = 100,
LOCAL = 101, LOGS = 102, MATERIALIZE = 103, MATERIALIZED = 104, MAX = 105,
MERGES = 106, MIN = 107, MINUTE = 108, MODIFY = 109, MONTH = 110, MOVE = 111,
MUTATION = 112, NAN_SQL = 113, NO = 114, NOT = 115, NULL_SQL = 116,
NULLS = 117, OFFSET = 118, ON = 119, OPTIMIZE = 120, OR = 121, ORDER = 122,
OUTER = 123, OUTFILE = 124, OVER = 125, PARTITION = 126, POPULATE = 127,
PRECEDING = 128, PREWHERE = 129, PRIMARY = 130, PROJECTION = 131, QUARTER = 132,
RANGE = 133, RELOAD = 134, REMOVE = 135, RENAME = 136, REPLACE = 137,
REPLICA = 138, REPLICATED = 139, RIGHT = 140, ROLLUP = 141, ROW = 142,
ROWS = 143, SAMPLE = 144, SECOND = 145, SELECT = 146, SEMI = 147, SENDS = 148,
SET = 149, SETTINGS = 150, SHOW = 151, SOURCE = 152, START = 153, STOP = 154,
SUBSTRING = 155, SYNC = 156, SYNTAX = 157, SYSTEM = 158, TABLE = 159,
TABLES = 160, TEMPORARY = 161, TEST = 162, THEN = 163, TIES = 164, TIMEOUT = 165,
TIMESTAMP = 166, TO = 167, TOP = 168, TOTALS = 169, TRAILING = 170,
TRIM = 171, TRUNCATE = 172, TTL = 173, TYPE = 174, UNBOUNDED = 175,
UNION = 176, UPDATE = 177, USE = 178, USING = 179, UUID = 180, VALUES = 181,
VIEW = 182, VOLUME = 183, WATCH = 184, WEEK = 185, WHEN = 186, WHERE = 187,
WINDOW = 188, WITH = 189, YEAR = 190, JSON_FALSE = 191, JSON_TRUE = 192,
ESCAPE_CHAR = 193, IDENTIFIER = 194, FLOATING_LITERAL = 195, OCTAL_LITERAL = 196,
DECIMAL_LITERAL = 197, HEXADECIMAL_LITERAL = 198, STRING_LITERAL = 199,
PLACEHOLDER = 200, ARROW = 201, ASTERISK = 202, BACKQUOTE = 203, BACKSLASH = 204,
COLON = 205, COMMA = 206, CONCAT = 207, DASH = 208, DOLLAR = 209, DOT = 210,
EQ_DOUBLE = 211, EQ_SINGLE = 212, GT_EQ = 213, GT = 214, HASH = 215,
IREGEX_SINGLE = 216, IREGEX_DOUBLE = 217, LBRACE = 218, LBRACKET = 219,
LPAREN = 220, LT_EQ = 221, LT = 222, NOT_EQ = 223, NOT_IREGEX = 224,
NOT_REGEX = 225, NULLISH = 226, PERCENT = 227, PLUS = 228, QUERY = 229,
QUOTE_DOUBLE = 230, QUOTE_SINGLE = 231, REGEX_SINGLE = 232, REGEX_DOUBLE = 233,
RBRACE = 234, RBRACKET = 235, RPAREN = 236, SEMICOLON = 237, SLASH = 238,
UNDERSCORE = 239, MULTI_LINE_COMMENT = 240, SINGLE_LINE_COMMENT = 241,
WHITESPACE = 242
};
explicit HogQLLexer(antlr4::CharStream *input);
~HogQLLexer() override;
std::string getGrammarFileName() const override;
const std::vector<std::string>& getRuleNames() const override;
const std::vector<std::string>& getChannelNames() const override;
const std::vector<std::string>& getModeNames() const override;
const antlr4::dfa::Vocabulary& getVocabulary() const override;
antlr4::atn::SerializedATNView getSerializedATN() const override;
const antlr4::atn::ATN& getATN() const override;
// By default the static state used to implement the lexer is lazily initialized during the first
// call to the constructor. You can call this function if you wish to initialize the static state
// ahead of time.
static void initialize();
private:
// Individual action functions triggered by action() above.
// Individual semantic predicate functions triggered by sempred() above.
};

File diff suppressed because one or more lines are too long


@ -0,0 +1,282 @@
ADD=1
AFTER=2
ALIAS=3
ALL=4
ALTER=5
AND=6
ANTI=7
ANY=8
ARRAY=9
AS=10
ASCENDING=11
ASOF=12
AST=13
ASYNC=14
ATTACH=15
BETWEEN=16
BOTH=17
BY=18
CASE=19
CAST=20
CHECK=21
CLEAR=22
CLUSTER=23
CODEC=24
COHORT=25
COLLATE=26
COLUMN=27
COMMENT=28
CONSTRAINT=29
CREATE=30
CROSS=31
CUBE=32
CURRENT=33
DATABASE=34
DATABASES=35
DATE=36
DAY=37
DEDUPLICATE=38
DEFAULT=39
DELAY=40
DELETE=41
DESC=42
DESCENDING=43
DESCRIBE=44
DETACH=45
DICTIONARIES=46
DICTIONARY=47
DISK=48
DISTINCT=49
DISTRIBUTED=50
DROP=51
ELSE=52
END=53
ENGINE=54
EVENTS=55
EXISTS=56
EXPLAIN=57
EXPRESSION=58
EXTRACT=59
FETCHES=60
FINAL=61
FIRST=62
FLUSH=63
FOLLOWING=64
FOR=65
FORMAT=66
FREEZE=67
FROM=68
FULL=69
FUNCTION=70
GLOBAL=71
GRANULARITY=72
GROUP=73
HAVING=74
HIERARCHICAL=75
HOUR=76
ID=77
IF=78
ILIKE=79
IN=80
INDEX=81
INF=82
INJECTIVE=83
INNER=84
INSERT=85
INTERVAL=86
INTO=87
IS=88
IS_OBJECT_ID=89
JOIN=90
KEY=91
KILL=92
LAST=93
LAYOUT=94
LEADING=95
LEFT=96
LIFETIME=97
LIKE=98
LIMIT=99
LIVE=100
LOCAL=101
LOGS=102
MATERIALIZE=103
MATERIALIZED=104
MAX=105
MERGES=106
MIN=107
MINUTE=108
MODIFY=109
MONTH=110
MOVE=111
MUTATION=112
NAN_SQL=113
NO=114
NOT=115
NULL_SQL=116
NULLS=117
OFFSET=118
ON=119
OPTIMIZE=120
OR=121
ORDER=122
OUTER=123
OUTFILE=124
OVER=125
PARTITION=126
POPULATE=127
PRECEDING=128
PREWHERE=129
PRIMARY=130
PROJECTION=131
QUARTER=132
RANGE=133
RELOAD=134
REMOVE=135
RENAME=136
REPLACE=137
REPLICA=138
REPLICATED=139
RIGHT=140
ROLLUP=141
ROW=142
ROWS=143
SAMPLE=144
SECOND=145
SELECT=146
SEMI=147
SENDS=148
SET=149
SETTINGS=150
SHOW=151
SOURCE=152
START=153
STOP=154
SUBSTRING=155
SYNC=156
SYNTAX=157
SYSTEM=158
TABLE=159
TABLES=160
TEMPORARY=161
TEST=162
THEN=163
TIES=164
TIMEOUT=165
TIMESTAMP=166
TO=167
TOP=168
TOTALS=169
TRAILING=170
TRIM=171
TRUNCATE=172
TTL=173
TYPE=174
UNBOUNDED=175
UNION=176
UPDATE=177
USE=178
USING=179
UUID=180
VALUES=181
VIEW=182
VOLUME=183
WATCH=184
WEEK=185
WHEN=186
WHERE=187
WINDOW=188
WITH=189
YEAR=190
JSON_FALSE=191
JSON_TRUE=192
ESCAPE_CHAR=193
IDENTIFIER=194
FLOATING_LITERAL=195
OCTAL_LITERAL=196
DECIMAL_LITERAL=197
HEXADECIMAL_LITERAL=198
STRING_LITERAL=199
PLACEHOLDER=200
ARROW=201
ASTERISK=202
BACKQUOTE=203
BACKSLASH=204
COLON=205
COMMA=206
CONCAT=207
DASH=208
DOLLAR=209
DOT=210
EQ_DOUBLE=211
EQ_SINGLE=212
GT_EQ=213
GT=214
HASH=215
IREGEX_SINGLE=216
IREGEX_DOUBLE=217
LBRACE=218
LBRACKET=219
LPAREN=220
LT_EQ=221
LT=222
NOT_EQ=223
NOT_IREGEX=224
NOT_REGEX=225
NULLISH=226
PERCENT=227
PLUS=228
QUERY=229
QUOTE_DOUBLE=230
QUOTE_SINGLE=231
REGEX_SINGLE=232
REGEX_DOUBLE=233
RBRACE=234
RBRACKET=235
RPAREN=236
SEMICOLON=237
SLASH=238
UNDERSCORE=239
MULTI_LINE_COMMENT=240
SINGLE_LINE_COMMENT=241
WHITESPACE=242
'false'=191
'true'=192
'->'=201
'*'=202
'`'=203
'\\'=204
':'=205
','=206
'||'=207
'-'=208
'$'=209
'.'=210
'=='=211
'='=212
'>='=213
'>'=214
'#'=215
'~*'=216
'=~*'=217
'{'=218
'['=219
'('=220
'<='=221
'<'=222
'!~*'=224
'!~'=225
'??'=226
'%'=227
'+'=228
'?'=229
'"'=230
'\''=231
'~'=232
'=~'=233
'}'=234
']'=235
')'=236
';'=237
'/'=238
'_'=239
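
The `.tokens` listing above maps symbolic token names and quoted literals to the numeric token types from the `HogQLLexer` enum. Here is a minimal sketch of reading that format, assuming one `key=value` entry per line as shown. The function name `parse_tokens` is hypothetical, and escape sequences inside literals (such as `'\\'`) are deliberately left unprocessed.

```python
from typing import Dict, Tuple


def parse_tokens(text: str) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Split a .tokens listing into (symbolic names, quoted literals)."""
    names: Dict[str, int] = {}
    literals: Dict[str, int] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the LAST "=", because literals such as '==' contain "=" themselves
        key, _, value = line.rpartition("=")
        if key.startswith("'") and key.endswith("'"):
            literals[key[1:-1]] = int(value)  # strip the surrounding quotes
        else:
            names[key] = int(value)
    return names, literals
```

Splitting on the last `=` is the important design choice: a naive split on the first `=` would break on entries like `'=='=211` and `'='=212`.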

9562
hogql_parser/HogQLParser.cpp Normal file

File diff suppressed because it is too large

1990
hogql_parser/HogQLParser.h Normal file

File diff suppressed because it is too large

File diff suppressed because one or more lines are too long


@ -0,0 +1,282 @@
ADD=1
AFTER=2
ALIAS=3
ALL=4
ALTER=5
AND=6
ANTI=7
ANY=8
ARRAY=9
AS=10
ASCENDING=11
ASOF=12
AST=13
ASYNC=14
ATTACH=15
BETWEEN=16
BOTH=17
BY=18
CASE=19
CAST=20
CHECK=21
CLEAR=22
CLUSTER=23
CODEC=24
COHORT=25
COLLATE=26
COLUMN=27
COMMENT=28
CONSTRAINT=29
CREATE=30
CROSS=31
CUBE=32
CURRENT=33
DATABASE=34
DATABASES=35
DATE=36
DAY=37
DEDUPLICATE=38
DEFAULT=39
DELAY=40
DELETE=41
DESC=42
DESCENDING=43
DESCRIBE=44
DETACH=45
DICTIONARIES=46
DICTIONARY=47
DISK=48
DISTINCT=49
DISTRIBUTED=50
DROP=51
ELSE=52
END=53
ENGINE=54
EVENTS=55
EXISTS=56
EXPLAIN=57
EXPRESSION=58
EXTRACT=59
FETCHES=60
FINAL=61
FIRST=62
FLUSH=63
FOLLOWING=64
FOR=65
FORMAT=66
FREEZE=67
FROM=68
FULL=69
FUNCTION=70
GLOBAL=71
GRANULARITY=72
GROUP=73
HAVING=74
HIERARCHICAL=75
HOUR=76
ID=77
IF=78
ILIKE=79
IN=80
INDEX=81
INF=82
INJECTIVE=83
INNER=84
INSERT=85
INTERVAL=86
INTO=87
IS=88
IS_OBJECT_ID=89
JOIN=90
KEY=91
KILL=92
LAST=93
LAYOUT=94
LEADING=95
LEFT=96
LIFETIME=97
LIKE=98
LIMIT=99
LIVE=100
LOCAL=101
LOGS=102
MATERIALIZE=103
MATERIALIZED=104
MAX=105
MERGES=106
MIN=107
MINUTE=108
MODIFY=109
MONTH=110
MOVE=111
MUTATION=112
NAN_SQL=113
NO=114
NOT=115
NULL_SQL=116
NULLS=117
OFFSET=118
ON=119
OPTIMIZE=120
OR=121
ORDER=122
OUTER=123
OUTFILE=124
OVER=125
PARTITION=126
POPULATE=127
PRECEDING=128
PREWHERE=129
PRIMARY=130
PROJECTION=131
QUARTER=132
RANGE=133
RELOAD=134
REMOVE=135
RENAME=136
REPLACE=137
REPLICA=138
REPLICATED=139
RIGHT=140
ROLLUP=141
ROW=142
ROWS=143
SAMPLE=144
SECOND=145
SELECT=146
SEMI=147
SENDS=148
SET=149
SETTINGS=150
SHOW=151
SOURCE=152
START=153
STOP=154
SUBSTRING=155
SYNC=156
SYNTAX=157
SYSTEM=158
TABLE=159
TABLES=160
TEMPORARY=161
TEST=162
THEN=163
TIES=164
TIMEOUT=165
TIMESTAMP=166
TO=167
TOP=168
TOTALS=169
TRAILING=170
TRIM=171
TRUNCATE=172
TTL=173
TYPE=174
UNBOUNDED=175
UNION=176
UPDATE=177
USE=178
USING=179
UUID=180
VALUES=181
VIEW=182
VOLUME=183
WATCH=184
WEEK=185
WHEN=186
WHERE=187
WINDOW=188
WITH=189
YEAR=190
JSON_FALSE=191
JSON_TRUE=192
ESCAPE_CHAR=193
IDENTIFIER=194
FLOATING_LITERAL=195
OCTAL_LITERAL=196
DECIMAL_LITERAL=197
HEXADECIMAL_LITERAL=198
STRING_LITERAL=199
PLACEHOLDER=200
ARROW=201
ASTERISK=202
BACKQUOTE=203
BACKSLASH=204
COLON=205
COMMA=206
CONCAT=207
DASH=208
DOLLAR=209
DOT=210
EQ_DOUBLE=211
EQ_SINGLE=212
GT_EQ=213
GT=214
HASH=215
IREGEX_SINGLE=216
IREGEX_DOUBLE=217
LBRACE=218
LBRACKET=219
LPAREN=220
LT_EQ=221
LT=222
NOT_EQ=223
NOT_IREGEX=224
NOT_REGEX=225
NULLISH=226
PERCENT=227
PLUS=228
QUERY=229
QUOTE_DOUBLE=230
QUOTE_SINGLE=231
REGEX_SINGLE=232
REGEX_DOUBLE=233
RBRACE=234
RBRACKET=235
RPAREN=236
SEMICOLON=237
SLASH=238
UNDERSCORE=239
MULTI_LINE_COMMENT=240
SINGLE_LINE_COMMENT=241
WHITESPACE=242
'false'=191
'true'=192
'->'=201
'*'=202
'`'=203
'\\'=204
':'=205
','=206
'||'=207
'-'=208
'$'=209
'.'=210
'=='=211
'='=212
'>='=213
'>'=214
'#'=215
'~*'=216
'=~*'=217
'{'=218
'['=219
'('=220
'<='=221
'<'=222
'!~*'=224
'!~'=225
'??'=226
'%'=227
'+'=228
'?'=229
'"'=230
'\''=231
'~'=232
'=~'=233
'}'=234
']'=235
')'=236
';'=237
'/'=238
'_'=239


@ -0,0 +1,7 @@
// Generated from HogQLParser.g4 by ANTLR 4.13.0
#include "HogQLParserBaseVisitor.h"


@ -0,0 +1,444 @@
// Generated from HogQLParser.g4 by ANTLR 4.13.0
#pragma once
#include "antlr4-runtime.h"
#include "HogQLParserVisitor.h"
/**
* This class provides an empty implementation of HogQLParserVisitor, which can be
* extended to create a visitor which only needs to handle a subset of the available methods.
*/
class HogQLParserBaseVisitor : public HogQLParserVisitor {
public:
virtual std::any visitSelect(HogQLParser::SelectContext *ctx) override {
return visitChildren(ctx);
}
virtual std::any visitSelectUnionStmt(HogQLParser::SelectUnionStmtContext *ctx) override {
return visitChildren(ctx);
}
virtual std::any visitSelectStmtWithParens(HogQLParser::SelectStmtWithParensContext *ctx) override {
return visitChildren(ctx);
}
virtual std::any visitSelectStmt(HogQLParser::SelectStmtContext *ctx) override {
return visitChildren(ctx);
}
virtual std::any visitWithClause(HogQLParser::WithClauseContext *ctx) override {
return visitChildren(ctx);
}
virtual std::any visitTopClause(HogQLParser::TopClauseContext *ctx) override {
return visitChildren(ctx);
}
virtual std::any visitFromClause(HogQLParser::FromClauseContext *ctx) override {
return visitChildren(ctx);
}
virtual std::any visitArrayJoinClause(HogQLParser::ArrayJoinClauseContext *ctx) override {
return visitChildren(ctx);
}
virtual std::any visitWindowClause(HogQLParser::WindowClauseContext *ctx) override {
return visitChildren(ctx);
}
virtual std::any visitPrewhereClause(HogQLParser::PrewhereClauseContext *ctx) override {
return visitChildren(ctx);
}
virtual std::any visitWhereClause(HogQLParser::WhereClauseContext *ctx) override {
return visitChildren(ctx);
}
virtual std::any visitGroupByClause(HogQLParser::GroupByClauseContext *ctx) override {
return visitChildren(ctx);
}
virtual std::any visitHavingClause(HogQLParser::HavingClauseContext *ctx) override {
return visitChildren(ctx);
}
virtual std::any visitOrderByClause(HogQLParser::OrderByClauseContext *ctx) override {
return visitChildren(ctx);
}
virtual std::any visitProjectionOrderByClause(HogQLParser::ProjectionOrderByClauseContext *ctx) override {
return visitChildren(ctx);
}
virtual std::any visitLimitAndOffsetClause(HogQLParser::LimitAndOffsetClauseContext *ctx) override {
return visitChildren(ctx);
}
virtual std::any visitOffsetOnlyClause(HogQLParser::OffsetOnlyClauseContext *ctx) override {
return visitChildren(ctx);
}
virtual std::any visitSettingsClause(HogQLParser::SettingsClauseContext *ctx) override {
return visitChildren(ctx);
}
virtual std::any visitJoinExprOp(HogQLParser::JoinExprOpContext *ctx) override {
return visitChildren(ctx);
}
virtual std::any visitJoinExprTable(HogQLParser::JoinExprTableContext *ctx) override {
return visitChildren(ctx);
}
virtual std::any visitJoinExprParens(HogQLParser::JoinExprParensContext *ctx) override {
return visitChildren(ctx);
}
virtual std::any visitJoinExprCrossOp(HogQLParser::JoinExprCrossOpContext *ctx) override {
return visitChildren(ctx);
}
virtual std::any visitJoinOpInner(HogQLParser::JoinOpInnerContext *ctx) override {
return visitChildren(ctx);
}
virtual std::any visitJoinOpLeftRight(HogQLParser::JoinOpLeftRightContext *ctx) override {
return visitChildren(ctx);
}
virtual std::any visitJoinOpFull(HogQLParser::JoinOpFullContext *ctx) override {
return visitChildren(ctx);
}
virtual std::any visitJoinOpCross(HogQLParser::JoinOpCrossContext *ctx) override {
return visitChildren(ctx);
}
virtual std::any visitJoinConstraintClause(HogQLParser::JoinConstraintClauseContext *ctx) override {
return visitChildren(ctx);
}
virtual std::any visitSampleClause(HogQLParser::SampleClauseContext *ctx) override {
return visitChildren(ctx);
}
virtual std::any visitOrderExprList(HogQLParser::OrderExprListContext *ctx) override {
return visitChildren(ctx);
}
virtual std::any visitOrderExpr(HogQLParser::OrderExprContext *ctx) override {
return visitChildren(ctx);
}
virtual std::any visitRatioExpr(HogQLParser::RatioExprContext *ctx) override {
return visitChildren(ctx);
}
virtual std::any visitSettingExprList(HogQLParser::SettingExprListContext *ctx) override {
return visitChildren(ctx);
}
virtual std::any visitSettingExpr(HogQLParser::SettingExprContext *ctx) override {
return visitChildren(ctx);
}
virtual std::any visitWindowExpr(HogQLParser::WindowExprContext *ctx) override {
return visitChildren(ctx);
}
virtual std::any visitWinPartitionByClause(HogQLParser::WinPartitionByClauseContext *ctx) override {
return visitChildren(ctx);
}
virtual std::any visitWinOrderByClause(HogQLParser::WinOrderByClauseContext *ctx) override {
return visitChildren(ctx);
}
virtual std::any visitWinFrameClause(HogQLParser::WinFrameClauseContext *ctx) override {
return visitChildren(ctx);
}
virtual std::any visitFrameStart(HogQLParser::FrameStartContext *ctx) override {
return visitChildren(ctx);
}
virtual std::any visitFrameBetween(HogQLParser::FrameBetweenContext *ctx) override {
return visitChildren(ctx);
}
virtual std::any visitWinFrameBound(HogQLParser::WinFrameBoundContext *ctx) override {
return visitChildren(ctx);
}
virtual std::any visitExpr(HogQLParser::ExprContext *ctx) override {
return visitChildren(ctx);
}
virtual std::any visitColumnTypeExprSimple(HogQLParser::ColumnTypeExprSimpleContext *ctx) override {
return visitChildren(ctx);
}
virtual std::any visitColumnTypeExprNested(HogQLParser::ColumnTypeExprNestedContext *ctx) override {
return visitChildren(ctx);
}
virtual std::any visitColumnTypeExprEnum(HogQLParser::ColumnTypeExprEnumContext *ctx) override {
return visitChildren(ctx);
}
virtual std::any visitColumnTypeExprComplex(HogQLParser::ColumnTypeExprComplexContext *ctx) override {
return visitChildren(ctx);
}
virtual std::any visitColumnTypeExprParam(HogQLParser::ColumnTypeExprParamContext *ctx) override {
return visitChildren(ctx);
}
virtual std::any visitColumnExprList(HogQLParser::ColumnExprListContext *ctx) override {
return visitChildren(ctx);
}
virtual std::any visitColumnExprTernaryOp(HogQLParser::ColumnExprTernaryOpContext *ctx) override {
return visitChildren(ctx);
}
virtual std::any visitColumnExprAlias(HogQLParser::ColumnExprAliasContext *ctx) override {
return visitChildren(ctx);
}
virtual std::any visitColumnExprExtract(HogQLParser::ColumnExprExtractContext *ctx) override {
return visitChildren(ctx);
}
virtual std::any visitColumnExprNegate(HogQLParser::ColumnExprNegateContext *ctx) override {
return visitChildren(ctx);
}
virtual std::any visitColumnExprSubquery(HogQLParser::ColumnExprSubqueryContext *ctx) override {
return visitChildren(ctx);
}
virtual std::any visitColumnExprLiteral(HogQLParser::ColumnExprLiteralContext *ctx) override {
return visitChildren(ctx);
}
virtual std::any visitColumnExprArray(HogQLParser::ColumnExprArrayContext *ctx) override {
return visitChildren(ctx);
}
virtual std::any visitColumnExprSubstring(HogQLParser::ColumnExprSubstringContext *ctx) override {
return visitChildren(ctx);
}
virtual std::any visitColumnExprCast(HogQLParser::ColumnExprCastContext *ctx) override {
return visitChildren(ctx);
}
virtual std::any visitColumnExprOr(HogQLParser::ColumnExprOrContext *ctx) override {
return visitChildren(ctx);
}
virtual std::any visitColumnExprPrecedence1(HogQLParser::ColumnExprPrecedence1Context *ctx) override {
return visitChildren(ctx);
}
virtual std::any visitColumnExprPrecedence2(HogQLParser::ColumnExprPrecedence2Context *ctx) override {
return visitChildren(ctx);
}
virtual std::any visitColumnExprPrecedence3(HogQLParser::ColumnExprPrecedence3Context *ctx) override {
return visitChildren(ctx);
}
virtual std::any visitColumnExprInterval(HogQLParser::ColumnExprIntervalContext *ctx) override {
return visitChildren(ctx);
}
virtual std::any visitColumnExprIsNull(HogQLParser::ColumnExprIsNullContext *ctx) override {
return visitChildren(ctx);
}
virtual std::any visitColumnExprWinFunctionTarget(HogQLParser::ColumnExprWinFunctionTargetContext *ctx) override {
return visitChildren(ctx);
}
virtual std::any visitColumnExprTrim(HogQLParser::ColumnExprTrimContext *ctx) override {
return visitChildren(ctx);
}
virtual std::any visitColumnExprTuple(HogQLParser::ColumnExprTupleContext *ctx) override {
return visitChildren(ctx);
}
virtual std::any visitColumnExprArrayAccess(HogQLParser::ColumnExprArrayAccessContext *ctx) override {
return visitChildren(ctx);
}
virtual std::any visitColumnExprBetween(HogQLParser::ColumnExprBetweenContext *ctx) override {
return visitChildren(ctx);
}
virtual std::any visitColumnExprPropertyAccess(HogQLParser::ColumnExprPropertyAccessContext *ctx) override {
return visitChildren(ctx);
}
virtual std::any visitColumnExprParens(HogQLParser::ColumnExprParensContext *ctx) override {
return visitChildren(ctx);
}
virtual std::any visitColumnExprTimestamp(HogQLParser::ColumnExprTimestampContext *ctx) override {
return visitChildren(ctx);
}
virtual std::any visitColumnExprNullish(HogQLParser::ColumnExprNullishContext *ctx) override {
return visitChildren(ctx);
}
virtual std::any visitColumnExprAnd(HogQLParser::ColumnExprAndContext *ctx) override {
return visitChildren(ctx);
}
virtual std::any visitColumnExprTupleAccess(HogQLParser::ColumnExprTupleAccessContext *ctx) override {
return visitChildren(ctx);
}
virtual std::any visitColumnExprCase(HogQLParser::ColumnExprCaseContext *ctx) override {
return visitChildren(ctx);
}
virtual std::any visitColumnExprDate(HogQLParser::ColumnExprDateContext *ctx) override {
return visitChildren(ctx);
}
virtual std::any visitColumnExprNot(HogQLParser::ColumnExprNotContext *ctx) override {
return visitChildren(ctx);
}
virtual std::any visitColumnExprWinFunction(HogQLParser::ColumnExprWinFunctionContext *ctx) override {
return visitChildren(ctx);
}
virtual std::any visitColumnExprIdentifier(HogQLParser::ColumnExprIdentifierContext *ctx) override {
return visitChildren(ctx);
}
virtual std::any visitColumnExprFunction(HogQLParser::ColumnExprFunctionContext *ctx) override {
return visitChildren(ctx);
}
virtual std::any visitColumnExprAsterisk(HogQLParser::ColumnExprAsteriskContext *ctx) override {
return visitChildren(ctx);
}
virtual std::any visitColumnArgList(HogQLParser::ColumnArgListContext *ctx) override {
return visitChildren(ctx);
}
virtual std::any visitColumnArgExpr(HogQLParser::ColumnArgExprContext *ctx) override {
return visitChildren(ctx);
}
virtual std::any visitColumnLambdaExpr(HogQLParser::ColumnLambdaExprContext *ctx) override {
return visitChildren(ctx);
}
virtual std::any visitWithExprList(HogQLParser::WithExprListContext *ctx) override {
return visitChildren(ctx);
}
virtual std::any visitWithExprSubquery(HogQLParser::WithExprSubqueryContext *ctx) override {
return visitChildren(ctx);
}
virtual std::any visitWithExprColumn(HogQLParser::WithExprColumnContext *ctx) override {
return visitChildren(ctx);
}
virtual std::any visitColumnIdentifier(HogQLParser::ColumnIdentifierContext *ctx) override {
return visitChildren(ctx);
}
virtual std::any visitNestedIdentifier(HogQLParser::NestedIdentifierContext *ctx) override {
return visitChildren(ctx);
}
virtual std::any visitTableExprIdentifier(HogQLParser::TableExprIdentifierContext *ctx) override {
return visitChildren(ctx);
}
virtual std::any visitTableExprPlaceholder(HogQLParser::TableExprPlaceholderContext *ctx) override {
return visitChildren(ctx);
}
virtual std::any visitTableExprSubquery(HogQLParser::TableExprSubqueryContext *ctx) override {
return visitChildren(ctx);
}
virtual std::any visitTableExprAlias(HogQLParser::TableExprAliasContext *ctx) override {
return visitChildren(ctx);
}
virtual std::any visitTableExprFunction(HogQLParser::TableExprFunctionContext *ctx) override {
return visitChildren(ctx);
}
virtual std::any visitTableFunctionExpr(HogQLParser::TableFunctionExprContext *ctx) override {
return visitChildren(ctx);
}
virtual std::any visitTableIdentifier(HogQLParser::TableIdentifierContext *ctx) override {
return visitChildren(ctx);
}
virtual std::any visitTableArgList(HogQLParser::TableArgListContext *ctx) override {
return visitChildren(ctx);
}
virtual std::any visitDatabaseIdentifier(HogQLParser::DatabaseIdentifierContext *ctx) override {
return visitChildren(ctx);
}
virtual std::any visitFloatingLiteral(HogQLParser::FloatingLiteralContext *ctx) override {
return visitChildren(ctx);
}
virtual std::any visitNumberLiteral(HogQLParser::NumberLiteralContext *ctx) override {
return visitChildren(ctx);
}
virtual std::any visitLiteral(HogQLParser::LiteralContext *ctx) override {
return visitChildren(ctx);
}
virtual std::any visitInterval(HogQLParser::IntervalContext *ctx) override {
return visitChildren(ctx);
}
virtual std::any visitKeyword(HogQLParser::KeywordContext *ctx) override {
return visitChildren(ctx);
}
virtual std::any visitKeywordForAlias(HogQLParser::KeywordForAliasContext *ctx) override {
return visitChildren(ctx);
}
virtual std::any visitAlias(HogQLParser::AliasContext *ctx) override {
return visitChildren(ctx);
}
virtual std::any visitIdentifier(HogQLParser::IdentifierContext *ctx) override {
return visitChildren(ctx);
}
virtual std::any visitEnumValue(HogQLParser::EnumValueContext *ctx) override {
return visitChildren(ctx);
}
};


@ -0,0 +1,7 @@
// Generated from HogQLParser.g4 by ANTLR 4.13.0
#include "HogQLParserVisitor.h"


@ -0,0 +1,236 @@
// Generated from HogQLParser.g4 by ANTLR 4.13.0
#pragma once
#include "antlr4-runtime.h"
#include "HogQLParser.h"
/**
* This class defines an abstract visitor for a parse tree
* produced by HogQLParser.
*/
class HogQLParserVisitor : public antlr4::tree::AbstractParseTreeVisitor {
public:
/**
* Visit parse trees produced by HogQLParser.
*/
virtual std::any visitSelect(HogQLParser::SelectContext *context) = 0;
virtual std::any visitSelectUnionStmt(HogQLParser::SelectUnionStmtContext *context) = 0;
virtual std::any visitSelectStmtWithParens(HogQLParser::SelectStmtWithParensContext *context) = 0;
virtual std::any visitSelectStmt(HogQLParser::SelectStmtContext *context) = 0;
virtual std::any visitWithClause(HogQLParser::WithClauseContext *context) = 0;
virtual std::any visitTopClause(HogQLParser::TopClauseContext *context) = 0;
virtual std::any visitFromClause(HogQLParser::FromClauseContext *context) = 0;
virtual std::any visitArrayJoinClause(HogQLParser::ArrayJoinClauseContext *context) = 0;
virtual std::any visitWindowClause(HogQLParser::WindowClauseContext *context) = 0;
virtual std::any visitPrewhereClause(HogQLParser::PrewhereClauseContext *context) = 0;
virtual std::any visitWhereClause(HogQLParser::WhereClauseContext *context) = 0;
virtual std::any visitGroupByClause(HogQLParser::GroupByClauseContext *context) = 0;
virtual std::any visitHavingClause(HogQLParser::HavingClauseContext *context) = 0;
virtual std::any visitOrderByClause(HogQLParser::OrderByClauseContext *context) = 0;
virtual std::any visitProjectionOrderByClause(HogQLParser::ProjectionOrderByClauseContext *context) = 0;
virtual std::any visitLimitAndOffsetClause(HogQLParser::LimitAndOffsetClauseContext *context) = 0;
virtual std::any visitOffsetOnlyClause(HogQLParser::OffsetOnlyClauseContext *context) = 0;
virtual std::any visitSettingsClause(HogQLParser::SettingsClauseContext *context) = 0;
virtual std::any visitJoinExprOp(HogQLParser::JoinExprOpContext *context) = 0;
virtual std::any visitJoinExprTable(HogQLParser::JoinExprTableContext *context) = 0;
virtual std::any visitJoinExprParens(HogQLParser::JoinExprParensContext *context) = 0;
virtual std::any visitJoinExprCrossOp(HogQLParser::JoinExprCrossOpContext *context) = 0;
virtual std::any visitJoinOpInner(HogQLParser::JoinOpInnerContext *context) = 0;
virtual std::any visitJoinOpLeftRight(HogQLParser::JoinOpLeftRightContext *context) = 0;
virtual std::any visitJoinOpFull(HogQLParser::JoinOpFullContext *context) = 0;
virtual std::any visitJoinOpCross(HogQLParser::JoinOpCrossContext *context) = 0;
virtual std::any visitJoinConstraintClause(HogQLParser::JoinConstraintClauseContext *context) = 0;
virtual std::any visitSampleClause(HogQLParser::SampleClauseContext *context) = 0;
virtual std::any visitOrderExprList(HogQLParser::OrderExprListContext *context) = 0;
virtual std::any visitOrderExpr(HogQLParser::OrderExprContext *context) = 0;
virtual std::any visitRatioExpr(HogQLParser::RatioExprContext *context) = 0;
virtual std::any visitSettingExprList(HogQLParser::SettingExprListContext *context) = 0;
virtual std::any visitSettingExpr(HogQLParser::SettingExprContext *context) = 0;
virtual std::any visitWindowExpr(HogQLParser::WindowExprContext *context) = 0;
virtual std::any visitWinPartitionByClause(HogQLParser::WinPartitionByClauseContext *context) = 0;
virtual std::any visitWinOrderByClause(HogQLParser::WinOrderByClauseContext *context) = 0;
virtual std::any visitWinFrameClause(HogQLParser::WinFrameClauseContext *context) = 0;
virtual std::any visitFrameStart(HogQLParser::FrameStartContext *context) = 0;
virtual std::any visitFrameBetween(HogQLParser::FrameBetweenContext *context) = 0;
virtual std::any visitWinFrameBound(HogQLParser::WinFrameBoundContext *context) = 0;
virtual std::any visitExpr(HogQLParser::ExprContext *context) = 0;
virtual std::any visitColumnTypeExprSimple(HogQLParser::ColumnTypeExprSimpleContext *context) = 0;
virtual std::any visitColumnTypeExprNested(HogQLParser::ColumnTypeExprNestedContext *context) = 0;
virtual std::any visitColumnTypeExprEnum(HogQLParser::ColumnTypeExprEnumContext *context) = 0;
virtual std::any visitColumnTypeExprComplex(HogQLParser::ColumnTypeExprComplexContext *context) = 0;
virtual std::any visitColumnTypeExprParam(HogQLParser::ColumnTypeExprParamContext *context) = 0;
virtual std::any visitColumnExprList(HogQLParser::ColumnExprListContext *context) = 0;
virtual std::any visitColumnExprTernaryOp(HogQLParser::ColumnExprTernaryOpContext *context) = 0;
virtual std::any visitColumnExprAlias(HogQLParser::ColumnExprAliasContext *context) = 0;
virtual std::any visitColumnExprExtract(HogQLParser::ColumnExprExtractContext *context) = 0;
virtual std::any visitColumnExprNegate(HogQLParser::ColumnExprNegateContext *context) = 0;
virtual std::any visitColumnExprSubquery(HogQLParser::ColumnExprSubqueryContext *context) = 0;
virtual std::any visitColumnExprLiteral(HogQLParser::ColumnExprLiteralContext *context) = 0;
virtual std::any visitColumnExprArray(HogQLParser::ColumnExprArrayContext *context) = 0;
virtual std::any visitColumnExprSubstring(HogQLParser::ColumnExprSubstringContext *context) = 0;
virtual std::any visitColumnExprCast(HogQLParser::ColumnExprCastContext *context) = 0;
virtual std::any visitColumnExprOr(HogQLParser::ColumnExprOrContext *context) = 0;
virtual std::any visitColumnExprPrecedence1(HogQLParser::ColumnExprPrecedence1Context *context) = 0;
virtual std::any visitColumnExprPrecedence2(HogQLParser::ColumnExprPrecedence2Context *context) = 0;
virtual std::any visitColumnExprPrecedence3(HogQLParser::ColumnExprPrecedence3Context *context) = 0;
virtual std::any visitColumnExprInterval(HogQLParser::ColumnExprIntervalContext *context) = 0;
virtual std::any visitColumnExprIsNull(HogQLParser::ColumnExprIsNullContext *context) = 0;
virtual std::any visitColumnExprWinFunctionTarget(HogQLParser::ColumnExprWinFunctionTargetContext *context) = 0;
virtual std::any visitColumnExprTrim(HogQLParser::ColumnExprTrimContext *context) = 0;
virtual std::any visitColumnExprTuple(HogQLParser::ColumnExprTupleContext *context) = 0;
virtual std::any visitColumnExprArrayAccess(HogQLParser::ColumnExprArrayAccessContext *context) = 0;
virtual std::any visitColumnExprBetween(HogQLParser::ColumnExprBetweenContext *context) = 0;
virtual std::any visitColumnExprPropertyAccess(HogQLParser::ColumnExprPropertyAccessContext *context) = 0;
virtual std::any visitColumnExprParens(HogQLParser::ColumnExprParensContext *context) = 0;
virtual std::any visitColumnExprTimestamp(HogQLParser::ColumnExprTimestampContext *context) = 0;
virtual std::any visitColumnExprNullish(HogQLParser::ColumnExprNullishContext *context) = 0;
virtual std::any visitColumnExprAnd(HogQLParser::ColumnExprAndContext *context) = 0;
virtual std::any visitColumnExprTupleAccess(HogQLParser::ColumnExprTupleAccessContext *context) = 0;
virtual std::any visitColumnExprCase(HogQLParser::ColumnExprCaseContext *context) = 0;
virtual std::any visitColumnExprDate(HogQLParser::ColumnExprDateContext *context) = 0;
virtual std::any visitColumnExprNot(HogQLParser::ColumnExprNotContext *context) = 0;
virtual std::any visitColumnExprWinFunction(HogQLParser::ColumnExprWinFunctionContext *context) = 0;
virtual std::any visitColumnExprIdentifier(HogQLParser::ColumnExprIdentifierContext *context) = 0;
virtual std::any visitColumnExprFunction(HogQLParser::ColumnExprFunctionContext *context) = 0;
virtual std::any visitColumnExprAsterisk(HogQLParser::ColumnExprAsteriskContext *context) = 0;
virtual std::any visitColumnArgList(HogQLParser::ColumnArgListContext *context) = 0;
virtual std::any visitColumnArgExpr(HogQLParser::ColumnArgExprContext *context) = 0;
virtual std::any visitColumnLambdaExpr(HogQLParser::ColumnLambdaExprContext *context) = 0;
virtual std::any visitWithExprList(HogQLParser::WithExprListContext *context) = 0;
virtual std::any visitWithExprSubquery(HogQLParser::WithExprSubqueryContext *context) = 0;
virtual std::any visitWithExprColumn(HogQLParser::WithExprColumnContext *context) = 0;
virtual std::any visitColumnIdentifier(HogQLParser::ColumnIdentifierContext *context) = 0;
virtual std::any visitNestedIdentifier(HogQLParser::NestedIdentifierContext *context) = 0;
virtual std::any visitTableExprIdentifier(HogQLParser::TableExprIdentifierContext *context) = 0;
virtual std::any visitTableExprPlaceholder(HogQLParser::TableExprPlaceholderContext *context) = 0;
virtual std::any visitTableExprSubquery(HogQLParser::TableExprSubqueryContext *context) = 0;
virtual std::any visitTableExprAlias(HogQLParser::TableExprAliasContext *context) = 0;
virtual std::any visitTableExprFunction(HogQLParser::TableExprFunctionContext *context) = 0;
virtual std::any visitTableFunctionExpr(HogQLParser::TableFunctionExprContext *context) = 0;
virtual std::any visitTableIdentifier(HogQLParser::TableIdentifierContext *context) = 0;
virtual std::any visitTableArgList(HogQLParser::TableArgListContext *context) = 0;
virtual std::any visitDatabaseIdentifier(HogQLParser::DatabaseIdentifierContext *context) = 0;
virtual std::any visitFloatingLiteral(HogQLParser::FloatingLiteralContext *context) = 0;
virtual std::any visitNumberLiteral(HogQLParser::NumberLiteralContext *context) = 0;
virtual std::any visitLiteral(HogQLParser::LiteralContext *context) = 0;
virtual std::any visitInterval(HogQLParser::IntervalContext *context) = 0;
virtual std::any visitKeyword(HogQLParser::KeywordContext *context) = 0;
virtual std::any visitKeywordForAlias(HogQLParser::KeywordForAliasContext *context) = 0;
virtual std::any visitAlias(HogQLParser::AliasContext *context) = 0;
virtual std::any visitIdentifier(HogQLParser::IdentifierContext *context) = 0;
virtual std::any visitEnumValue(HogQLParser::EnumValueContext *context) = 0;
};
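
The relationship between the abstract visitor (all methods pure virtual) and the generated base visitor (every method defaults to `visitChildren`) can be illustrated with a small Python sketch. This is a simplified model, not ANTLR's runtime: `Node`, `BaseVisitor` and `NumberCollector` are hypothetical names, and returning the last child's result from `visit_children` is an assumption about the default aggregation.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class Node:
    """Hypothetical parse tree node: a rule name, optional text, and children."""

    rule: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)


class BaseVisitor:
    """Dispatches to visit<Rule> if a subclass defines it, else walks the children."""

    def visit(self, node: Node) -> Any:
        handler = getattr(self, f"visit{node.rule}", self.visit_children)
        return handler(node)

    def visit_children(self, node: Node) -> Any:
        # Assumption: the result of the last visited child is returned
        result = None
        for child in node.children:
            result = self.visit(child)
        return result


class NumberCollector(BaseVisitor):
    """Overrides a single rule; every other rule falls through to visit_children."""

    def __init__(self) -> None:
        self.numbers: List[int] = []

    def visitNumberLiteral(self, node: Node) -> int:
        value = int(node.text)
        self.numbers.append(value)
        return value


tree = Node(
    "ColumnExprPrecedence2",
    children=[
        Node("ColumnExprLiteral", children=[Node("NumberLiteral", "1")]),
        Node("ColumnExprLiteral", children=[Node("NumberLiteral", "2")]),
    ],
)
collector = NumberCollector()
last_result = collector.visit(tree)
```

A converter only needs to override the rules it cares about; unhandled rules are traversed transparently.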

3
hogql_parser/README.md Normal file

@ -0,0 +1,3 @@
# HogQL Parser
Blazing fast HogQL parsing. This package can only work in the context of the PostHog Django app, as it imports from `posthog.hogql`.

18
hogql_parser/__init__.pyi Normal file

@ -0,0 +1,18 @@
from posthog.hogql.ast import SelectQuery, SelectUnionQuery
from posthog.hogql.base import AST
def parse_expr(expr: str, /) -> AST:
"""Parse the HogQL expression string into an AST"""
...
def parse_order_expr(expr: str, /) -> AST:
"""Parse the ORDER BY clause string into an AST"""
...
def parse_select(expr: str, /) -> SelectQuery | SelectUnionQuery:
"""Parse the HogQL SELECT statement string into an AST"""
...
def unquote_string(value: str, /) -> str:
"""Unquote the string (an identifier or a string literal)"""
...
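
The trailing `/` in each stub signature marks the parameter as positional-only. A minimal sketch of what that means for callers, using a hypothetical pure-Python stand-in rather than the compiled extension (the stand-in returns a string, not an AST):

```python
def parse_expr(expr: str, /) -> str:
    """Hypothetical stand-in for the C++-backed function; returns a string instead of an AST."""
    return f"AST({expr})"


# Accepted: the argument is passed by position
positional_result = parse_expr("1 + 2")

try:
    # Rejected: `expr` is positional-only, so it cannot be passed by keyword
    parse_expr(expr="1 + 2")
    keyword_rejected = False
except TypeError:
    keyword_rejected = True
```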

15
hogql_parser/error.cpp Normal file

@ -0,0 +1,15 @@
#include "error.h"
using namespace std;
#define EXCEPTION_CLASS_IMPLEMENTATION(NAME, BASE) \
NAME::NAME(const string& message, size_t start, size_t end) : BASE(message), start(start), end(end) {} \
NAME::NAME(const char* message, size_t start, size_t end) : BASE(message), start(start), end(end) {} \
NAME::NAME(const string& message) : BASE(message), start(0), end(0) {} \
NAME::NAME(const char* message) : BASE(message), start(0), end(0) {}
EXCEPTION_CLASS_IMPLEMENTATION(HogQLException, runtime_error)
EXCEPTION_CLASS_IMPLEMENTATION(HogQLSyntaxException, HogQLException)
EXCEPTION_CLASS_IMPLEMENTATION(HogQLNotImplementedException, HogQLException)
EXCEPTION_CLASS_IMPLEMENTATION(HogQLParsingException, HogQLException)

26
hogql_parser/error.h Normal file

@ -0,0 +1,26 @@
#pragma once
#include <stdexcept>
#include <string>
#define EXCEPTION_CLASS_DEFINITION(NAME, BASE) \
class NAME : public BASE { \
public: \
size_t start; \
size_t end; \
explicit NAME(const std::string& message, size_t start, size_t end); \
explicit NAME(const char* message, size_t start, size_t end); \
explicit NAME(const std::string& message); \
explicit NAME(const char* message); \
};
EXCEPTION_CLASS_DEFINITION(HogQLException, std::runtime_error)
// The input does not conform to HogQL syntax.
EXCEPTION_CLASS_DEFINITION(HogQLSyntaxException, HogQLException)
// This feature isn't implemented in HogQL (yet).
EXCEPTION_CLASS_DEFINITION(HogQLNotImplementedException, HogQLException)
// An internal problem in the parser layer.
EXCEPTION_CLASS_DEFINITION(HogQLParsingException, HogQLException)
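
For orientation, the hierarchy generated by the macro above can be mirrored in Python roughly as follows. This is a sketch only: the constructor signature, the default of `0` for `start` and `end` (copied from the C++ constructors), and the `underline` helper are assumptions made for illustration, not the contents of `posthog/hogql/errors.py`.

```python
class HogQLException(Exception):
    """Base error carrying the [start, end) character span of the offending input."""

    def __init__(self, message: str, *, start: int = 0, end: int = 0) -> None:
        super().__init__(message)
        self.start = start
        self.end = end


class SyntaxException(HogQLException):
    """The input does not conform to HogQL syntax."""


class NotImplementedException(HogQLException):
    """This feature isn't implemented in HogQL (yet)."""


class ParsingException(HogQLException):
    """An internal problem in the parser layer."""


def underline(query: str, error: HogQLException) -> str:
    """Hypothetical helper: mark the error span under the query text."""
    return query + "\n" + " " * error.start + "^" * (error.end - error.start)


error = SyntaxException("Unexpected FROM", start=7, end=11)
report = underline("SELECT FROM x", error)
```

Because every subclass shares the same span fields, a caller can catch `HogQLException` once and still point at the exact position in the query.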

1360
hogql_parser/parser.cpp Normal file

File diff suppressed because it is too large

11
hogql_parser/parser.h Normal file

@ -0,0 +1,11 @@
#define PY_SSIZE_T_CLEAN
#include <Python.h>
// MODULE STATE
// Module state, primarily for storing references to Python objects used throughout the parser (such as imports)
typedef struct {
PyObject* ast_module;
PyObject* base_module;
PyObject* errors_module;
} parser_state;

0
hogql_parser/py.typed Normal file

@ -0,0 +1,45 @@
[tool.black]
line-length = 120
target-version = ['py310']
[tool.cibuildwheel]
build = [ # Build CPython wheels on Linux and macOS, for x86 as well as ARM
"cp3*-macosx_x86_64",
"cp3*-macosx_arm64",
"cp3*-manylinux_x86_64",
"cp3*-manylinux_aarch64",
]
build-frontend = "build" # This is the successor to building with pip
[tool.cibuildwheel.macos]
archs = [ # We could also build a universal wheel, but separate ones are lighter individually
"x86_64",
"arm64",
]
before-build = [ # We need to install the libraries for each architecture separately
"brew uninstall --force boost antlr4-cpp-runtime",
"brew fetch --force --bottle-tag=${ARCHFLAGS##'-arch '}_monterey boost antlr4-cpp-runtime",
"brew install $(brew --cache --bottle-tag=${ARCHFLAGS##'-arch '}_monterey boost antlr4-cpp-runtime)",
]
[tool.cibuildwheel.linux]
before-all = [
# manylinux_2_28 is based on AlmaLinux 8, which uses Fedora's dnf as its package manager
"dnf install -y boost-devel unzip cmake curl uuid pkg-config",
"curl https://www.antlr.org/download/antlr4-cpp-runtime-4.13.0-source.zip --output antlr4-source.zip",
# Check that the downloaded archive is the expected runtime - a security measure
"antlr_known_md5sum=\"ff214b65fb02e150b4f515d7983bca92\"",
"antlr_found_md5sum=\"$(md5sum antlr4-source.zip | cut -d' ' -f1)\"",
'if [[ "$antlr_known_md5sum" != "$antlr_found_md5sum" ]]; then exit 64; fi',
"unzip antlr4-source.zip -d antlr4-source && cd antlr4-source",
"cmake .",
"DESTDIR=out make install",
"cp -r out/usr/local/include/antlr4-runtime /usr/include/",
"cp out/usr/local/lib64/libantlr4-runtime.so* /usr/lib64/",
"ldconfig",
]
archs = [
"native", # We run x86_64 and aarch64 as separate CI jobs, and we want native in each case as emulation is slow
]
manylinux-x86_64-image = "manylinux_2_28"
manylinux-aarch64-image = "manylinux_2_28"
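
The archive check in `before-all` compares the MD5 digest of the downloaded runtime against a pinned value and aborts on mismatch. The same idea can be sketched in Python; `file_md5` and `verify_archive` are hypothetical names, and raising `ValueError` stands in for the `exit 64` used in the shell version.

```python
import hashlib
import hmac
from pathlib import Path


def file_md5(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path: Path, expected_md5: str) -> None:
    """Raise ValueError if the file's digest differs from the pinned one."""
    found = file_md5(path)
    if not hmac.compare_digest(found, expected_md5.lower()):
        raise ValueError(f"Checksum mismatch for {path}: expected {expected_md5}, found {found}")
```

MD5 guards against corrupted or swapped downloads but is not collision-resistant; a SHA-256 digest would be a stronger pin if the upstream publishes one.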

57
hogql_parser/setup.py Normal file

@ -0,0 +1,57 @@
from setuptools import setup, Extension
import platform
system = platform.system()
if system not in ("Darwin", "Linux"):
raise Exception("Only Linux and macOS are supported by hogql_parser")
is_macos = system == "Darwin"
homebrew_location = "/opt/homebrew" if platform.machine() == "arm64" else "/usr/local"
module = Extension(
"hogql_parser",
sources=[
"HogQLLexer.cpp",
"HogQLParser.cpp",
"HogQLParserBaseVisitor.cpp",
"HogQLParserVisitor.cpp",
"error.cpp",
"string.cpp",
"parser.cpp",
],
include_dirs=[
f"{homebrew_location}/include/",
f"{homebrew_location}/include/antlr4-runtime/",
]
if is_macos
else ["/usr/include/", "/usr/include/antlr4-runtime/"],
library_dirs=[f"{homebrew_location}/lib/"] if is_macos else ["/usr/lib/", "/usr/lib64/"],
libraries=["antlr4-runtime"],
extra_compile_args=["-std=c++20"],
)
setup(
name="hogql_parser",
version="0.1.7",
url="https://github.com/PostHog/posthog/tree/master/hogql_parser",
author="PostHog Inc.",
author_email="hey@posthog.com",
maintainer="PostHog Inc.",
maintainer_email="hey@posthog.com",
description="HogQL parser for internal PostHog use",
long_description=open("README.md").read(),
long_description_content_type="text/markdown",
package_data={"hogql_parser": ["__init__.pyi", "py.typed"]},
ext_modules=[module],
python_requires=">=3.10",
classifiers=[
"Development Status :: 5 - Production/Stable",
"License :: OSI Approved :: MIT License",
"Operating System :: MacOS",
"Operating System :: POSIX :: Linux",
"Programming Language :: Python",
"Programming Language :: Python :: 3.10",
"Programming Language :: Python :: 3.11",
"Programming Language :: Python :: 3.12",
],
)
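
The platform-dependent search paths in `setup.py` can be restated as a pure function, which makes the selection logic easy to test without building anything. A sketch under that assumption; `native_paths` is a hypothetical name and takes the system and machine as arguments instead of calling `platform` directly:

```python
from typing import List, Tuple


def native_paths(system: str, machine: str) -> Tuple[List[str], List[str]]:
    """Return (include_dirs, library_dirs) for the given platform."""
    if system not in ("Darwin", "Linux"):
        raise Exception("Only Linux and macOS are supported by hogql_parser")
    if system == "Darwin":
        # Homebrew installs under /opt/homebrew on Apple Silicon and /usr/local on Intel
        homebrew = "/opt/homebrew" if machine == "arm64" else "/usr/local"
        return (
            [f"{homebrew}/include/", f"{homebrew}/include/antlr4-runtime/"],
            [f"{homebrew}/lib/"],
        )
    return (
        ["/usr/include/", "/usr/include/antlr4-runtime/"],
        ["/usr/lib/", "/usr/lib64/"],
    )
```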

56
hogql_parser/string.cpp Normal file

@ -0,0 +1,56 @@
#include <boost/algorithm/string.hpp>
#include "error.h"
#include "string.h"
using namespace std;
string unquote_string(string text) {
size_t original_text_size = text.size();
if (original_text_size == 0) {
throw HogQLParsingException("Encountered an unexpected empty string input");
}
const char first_char = text.front();
const char last_char = text.back();
if (first_char == '\'' && last_char == '\'') {
text = text.substr(1, original_text_size - 2);
boost::replace_all(text, "''", "'");
boost::replace_all(text, "\\'", "'");
} else if (first_char == '"' && last_char == '"') {
text = text.substr(1, original_text_size - 2);
boost::replace_all(text, "\"\"", "\"");
boost::replace_all(text, "\\\"", "\"");
} else if (first_char == '`' && last_char == '`') {
text = text.substr(1, original_text_size - 2);
boost::replace_all(text, "``", "`");
boost::replace_all(text, "\\`", "`");
} else if (first_char == '{' && last_char == '}') {
text = text.substr(1, original_text_size - 2);
boost::replace_all(text, "{{", "{");
boost::replace_all(text, "\\{", "{");
} else {
throw HogQLSyntaxException("Invalid string literal, must start and end with the same quote type: " + text);
}
// Copied from clickhouse_driver/util/escape.py
boost::replace_all(text, "\\a", "\a");
boost::replace_all(text, "\\b", "\b");
boost::replace_all(text, "\\f", "\f");
boost::replace_all(text, "\\n", "\n");
boost::replace_all(text, "\\r", "\r");
boost::replace_all(text, "\\t", "\t");
boost::replace_all(text, "\\v", "\v");
boost::replace_all(text, "\\0", ""); // NUL characters are ignored
boost::replace_all(text, "\\\\", "\\");
return text;
}
string unquote_string_terminal(antlr4::tree::TerminalNode* node) {
string text = node->getText();
try {
return unquote_string(text);
} catch (HogQLException& e) {
throw HogQLSyntaxException(e.what(), node->getSymbol()->getStartIndex(), node->getSymbol()->getStopIndex() + 1);
}
}
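
The unquoting rules above are easier to follow with concrete inputs. The sketch below ports the same sequence of replacements to Python for illustration; it raises `ValueError` instead of the HogQL exception types, and the `CLOSING_QUOTE` table is a hypothetical restructuring of the four `if` branches.

```python
# Opening delimiter mapped to its required closing delimiter
CLOSING_QUOTE = {"'": "'", '"': '"', "`": "`", "{": "}"}

# Escape sequences, applied in the same order as in the C++ function
ESCAPES = (
    ("\\a", "\a"),
    ("\\b", "\b"),
    ("\\f", "\f"),
    ("\\n", "\n"),
    ("\\r", "\r"),
    ("\\t", "\t"),
    ("\\v", "\v"),
    ("\\0", ""),  # NUL characters are ignored
    ("\\\\", "\\"),
)


def unquote_string(text: str) -> str:
    if not text:
        raise ValueError("Encountered an unexpected empty string input")
    first, last = text[0], text[-1]
    if CLOSING_QUOTE.get(first) != last:
        raise ValueError(f"Invalid string literal, must start and end with the same quote type: {text}")
    text = text[1:-1]
    # A doubled or backslash-escaped opening delimiter stands for the delimiter itself
    text = text.replace(first * 2, first)
    text = text.replace("\\" + first, first)
    for escaped, replacement in ESCAPES:
        text = text.replace(escaped, replacement)
    return text
```

For example, `unquote_string("'it''s'")` yields `it's`, and a literal containing `\0` has that sequence dropped rather than converted to a NUL character.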

9
hogql_parser/string.h Normal file

@ -0,0 +1,9 @@
#pragma once
#include <string>
#include "antlr4-runtime.h"
std::string unquote_string(std::string text);
std::string unquote_string_terminal(antlr4::tree::TerminalNode* node);


@ -39,7 +39,9 @@
"schema:build": "pnpm run schema:build:json && pnpm run schema:build:python",
"schema:build:json": "ts-json-schema-generator -f tsconfig.json --path 'frontend/src/*.ts' --type 'QuerySchema' --no-type-check > frontend/src/queries/schema.json && prettier --write frontend/src/queries/schema.json",
"schema:build:python": "datamodel-codegen --collapse-root-models --disable-timestamp --use-one-literal-as-default --use-default-kwarg --use-subclass-enum --input frontend/src/queries/schema.json --input-file-type jsonschema --output posthog/schema.py --output-model-type pydantic_v2.BaseModel && black posthog/schema.py",
"grammar:build": "cd posthog/hogql/grammar && antlr -Dlanguage=Python3 HogQLLexer.g4 && antlr -visitor -no-listener -Dlanguage=Python3 HogQLParser.g4",
"grammar:build": "npm run grammar:build:python && npm run grammar:build:cpp",
"grammar:build:python": "cd posthog/hogql/grammar && antlr -Dlanguage=Python3 HogQLLexer.g4 && antlr -visitor -no-listener -Dlanguage=Python3 HogQLParser.g4",
"grammar:build:cpp": "cd posthog/hogql/grammar && antlr -o ../../../hogql_parser -Dlanguage=Cpp HogQLLexer.g4 && antlr -o ../../../hogql_parser -visitor -no-listener -Dlanguage=Cpp HogQLParser.g4",
"packages:build": "pnpm packages:build:apps-common && pnpm packages:build:lemon-ui",
"packages:build:apps-common": "cd frontend/@posthog/apps-common && pnpm i && pnpm build",
"packages:build:lemon-ui": "cd frontend/@posthog/lemon-ui && pnpm i && pnpm build",


@ -12,3 +12,6 @@ max_line_length = 120
[*.md]
trim_trailing_whitespace = false
[*.{yml,yaml}]
indent_size = 2


@ -23,7 +23,7 @@ class HogQLException(Exception):
class SyntaxException(HogQLException):
"""Invalid HogQL syntax."""
"""The input does not conform to HogQL syntax."""
pass
@ -40,6 +40,12 @@ class NotImplementedException(HogQLException):
pass
class ParsingException(HogQLException):
"""An internal problem in the parser layer."""
pass
class ResolverException(HogQLException):
"""An internal problem in the resolver layer."""


@ -1,6 +1,6 @@
from antlr4 import ParserRuleContext
from posthog.hogql.errors import HogQLException
from posthog.hogql.errors import SyntaxException
def parse_string(text: str) -> str:
@ -22,7 +22,7 @@ def parse_string(text: str) -> str:
text = text.replace("{{", "{")
text = text.replace("\\{", "{")
else:
raise HogQLException(f"Invalid string literal, must start and end with the same quote type: {text}")
raise SyntaxException(f"Invalid string literal, must start and end with the same quote type: {text}")
# copied from clickhouse_driver/util/escape.py
text = text.replace("\\b", "\b")
@ -30,7 +30,7 @@ def parse_string(text: str) -> str:
text = text.replace("\\r", "\r")
text = text.replace("\\n", "\n")
text = text.replace("\\t", "\t")
text = text.replace("\\0", "\0")
text = text.replace("\\0", "") # NUL characters are ignored
text = text.replace("\\a", "\a")
text = text.replace("\\v", "\v")
text = text.replace("\\\\", "\\")


@ -12,6 +12,24 @@ from posthog.hogql.grammar.HogQLParser import HogQLParser
from posthog.hogql.parse_string import parse_string, parse_string_literal
from posthog.hogql.placeholders import replace_placeholders
from posthog.hogql.timings import HogQLTimings
from hogql_parser import (
parse_expr as _parse_expr_cpp,
parse_order_expr as _parse_order_expr_cpp,
parse_select as _parse_select_cpp,
)
RULE_TO_PARSE_FUNCTION = {
"python": {
"expr": lambda string, start: HogQLParseTreeConverter(start=start).visit(get_parser(string).expr()),
"order_expr": lambda string: HogQLParseTreeConverter().visit(get_parser(string).orderExpr()),
"select": lambda string: HogQLParseTreeConverter().visit(get_parser(string).select()),
},
"cpp": {
"expr": lambda string, _: _parse_expr_cpp(string), # The start arg is ignored in the C++ version
"order_expr": lambda string: _parse_order_expr_cpp(string),
"select": lambda string: _parse_select_cpp(string),
},
}
def parse_expr(
@ -19,12 +37,13 @@ def parse_expr(
placeholders: Optional[Dict[str, ast.Expr]] = None,
start: Optional[int] = 0,
timings: Optional[HogQLTimings] = None,
*,
backend: Literal["python", "cpp"] = "python",
) -> ast.Expr:
if timings is None:
timings = HogQLTimings()
with timings.measure("parse_expr"):
parse_tree = get_parser(expr).expr()
node = HogQLParseTreeConverter(start=start).visit(parse_tree)
with timings.measure(f"parse_expr_{backend}"):
node = RULE_TO_PARSE_FUNCTION[backend]["expr"](expr, start)
if placeholders:
with timings.measure("replace_placeholders"):
return replace_placeholders(node, placeholders)
@ -32,13 +51,16 @@ def parse_expr(
def parse_order_expr(
order_expr: str, placeholders: Optional[Dict[str, ast.Expr]] = None, timings: Optional[HogQLTimings] = None
order_expr: str,
placeholders: Optional[Dict[str, ast.Expr]] = None,
timings: Optional[HogQLTimings] = None,
*,
backend: Literal["python", "cpp"] = "python",
) -> ast.Expr:
if timings is None:
timings = HogQLTimings()
with timings.measure("parse_order_expr"):
parse_tree = get_parser(order_expr).orderExpr()
node = HogQLParseTreeConverter().visit(parse_tree)
with timings.measure(f"parse_order_expr_{backend}"):
node = RULE_TO_PARSE_FUNCTION[backend]["order_expr"](order_expr)
if placeholders:
with timings.measure("replace_placeholders"):
return replace_placeholders(node, placeholders)
@ -46,13 +68,16 @@ def parse_order_expr(
def parse_select(
statement: str, placeholders: Optional[Dict[str, ast.Expr]] = None, timings: Optional[HogQLTimings] = None
statement: str,
placeholders: Optional[Dict[str, ast.Expr]] = None,
timings: Optional[HogQLTimings] = None,
*,
backend: Literal["python", "cpp"] = "python",
) -> ast.SelectQuery | ast.SelectUnionQuery:
if timings is None:
timings = HogQLTimings()
with timings.measure("parse_select"):
parse_tree = get_parser(statement).select()
node = HogQLParseTreeConverter().visit(parse_tree)
with timings.measure(f"parse_select_{backend}"):
node = RULE_TO_PARSE_FUNCTION[backend]["select"](statement)
if placeholders:
with timings.measure("replace_placeholders"):
node = replace_placeholders(node, placeholders)
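
The backend switch used by `parse_expr`, `parse_order_expr` and `parse_select` above is a two-level lookup table keyed by backend and rule. A stripped-down sketch of the pattern, with hypothetical stand-in parsers that return strings so it runs without ANTLR or the compiled extension:

```python
from typing import Callable, Dict, Literal

Backend = Literal["python", "cpp"]


def _parse_expr_python(source: str, start: int) -> str:
    """Stand-in for the ANTLR-based Python path, which honors `start`."""
    return f"python:{source}@{start}"


def _parse_expr_cpp(source: str) -> str:
    """Stand-in for the C++ extension, which takes only the source string."""
    return f"cpp:{source}"


RULE_TO_PARSE_FUNCTION: Dict[str, Dict[str, Callable[..., str]]] = {
    "python": {"expr": lambda source, start: _parse_expr_python(source, start)},
    # The lambda absorbs `start` so both backends share one call signature
    "cpp": {"expr": lambda source, _: _parse_expr_cpp(source)},
}


def parse_expr(expr: str, start: int = 0, *, backend: Backend = "python") -> str:
    return RULE_TO_PARSE_FUNCTION[backend]["expr"](expr, start)
```

Making `backend` keyword-only with a `"python"` default keeps existing call sites unchanged while letting tests run the same cases against both implementations.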
@ -166,7 +191,7 @@ class HogQLParseTreeConverter(ParseTreeVisitor):
if ctx.arrayJoinClause():
array_join_clause = ctx.arrayJoinClause()
if select_query.select_from is None:
raise HogQLException("Using ARRAY JOIN without a FROM clause is not permitted")
raise SyntaxException("Using ARRAY JOIN without a FROM clause is not permitted")
if array_join_clause.LEFT():
select_query.array_join_op = "LEFT ARRAY JOIN"
elif array_join_clause.INNER():
@ -176,7 +201,7 @@ class HogQLParseTreeConverter(ParseTreeVisitor):
select_query.array_join_list = self.visit(array_join_clause.columnExprList())
for expr in select_query.array_join_list:
if not isinstance(expr, ast.Alias):
raise HogQLException("ARRAY JOIN arrays must have an alias", start=expr.start, end=expr.end)
raise SyntaxException("ARRAY JOIN arrays must have an alias", start=expr.start, end=expr.end)
if ctx.topClause():
raise NotImplementedException(f"Unsupported: SelectStmt.topClause()")
@ -301,7 +326,7 @@ class HogQLParseTreeConverter(ParseTreeVisitor):
def visitJoinOpFull(self, ctx: HogQLParser.JoinOpFullContext):
tokens = []
if ctx.LEFT():
if ctx.FULL():
tokens.append("FULL")
if ctx.OUTER():
tokens.append("OUTER")
@ -421,6 +446,7 @@ class HogQLParseTreeConverter(ParseTreeVisitor):
)
def visitColumnExprAlias(self, ctx: HogQLParser.ColumnExprAliasContext):
alias: str
if ctx.alias():
alias = self.visit(ctx.alias())
elif ctx.identifier():
@ -431,8 +457,8 @@ class HogQLParseTreeConverter(ParseTreeVisitor):
raise NotImplementedException(f"Must specify an alias")
expr = self.visit(ctx.columnExpr())
if alias in RESERVED_KEYWORDS:
raise HogQLException(f"Alias '{alias}' is a reserved keyword")
if alias.lower() in RESERVED_KEYWORDS:
raise SyntaxException(f'"{alias}" cannot be an alias or identifier, as it\'s a reserved keyword')
return ast.Alias(expr=expr, alias=alias)
@ -749,9 +775,9 @@ class HogQLParseTreeConverter(ParseTreeVisitor):
return ast.Placeholder(field=parse_string_literal(ctx.PLACEHOLDER()))
def visitTableExprAlias(self, ctx: HogQLParser.TableExprAliasContext):
alias = self.visit(ctx.alias() or ctx.identifier())
if alias in RESERVED_KEYWORDS:
raise HogQLException(f"Alias '{alias}' is a reserved keyword")
alias: str = self.visit(ctx.alias() or ctx.identifier())
if alias.lower() in RESERVED_KEYWORDS:
raise SyntaxException(f'"{alias}" cannot be an alias or identifier, as it\'s a reserved keyword')
table = self.visit(ctx.tableExpr())
if isinstance(table, ast.JoinExpr):
table.alias = alias
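
Note on the hunks above: each `parse_*` function now looks up its implementation in `RULE_TO_PARSE_FUNCTION[backend][rule]`. The table's definition is not shown in this part of the diff, so the following is only a minimal sketch of how such a dispatch table could be laid out. The two `_parse_expr_*` helpers and their string return values are hypothetical stand-ins, not the real parse functions, and the real `expr` entry also receives a `start` argument.

```python
from typing import Callable, Dict, Literal

Backend = Literal["python", "cpp"]


# Hypothetical stand-ins for the real backends; they only tag the input
# so that the dispatch is observable.
def _parse_expr_python(source: str) -> str:
    return f"python:{source}"


def _parse_expr_cpp(source: str) -> str:
    return f"cpp:{source}"


# Outer key: backend name. Inner key: grammar rule name.
RULE_TO_PARSE_FUNCTION: Dict[Backend, Dict[str, Callable[[str], str]]] = {
    "python": {"expr": _parse_expr_python},
    "cpp": {"expr": _parse_expr_cpp},
}


def parse_expr(source: str, *, backend: Backend = "python") -> str:
    # Keyword-only `backend` mirrors the signatures in the diff above.
    return RULE_TO_PARSE_FUNCTION[backend]["expr"](source)
```

Keeping `backend` keyword-only with a `"python"` default means existing call sites keep their behavior until they opt in to `"cpp"`.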


@ -0,0 +1,61 @@
from typing import Literal
from posthog.hogql.errors import SyntaxException
from posthog.hogql.parse_string import parse_string as parse_string_py
from hogql_parser import unquote_string as unquote_string_cpp
from posthog.test.base import BaseTest
def parse_string_test_factory(backend: Literal["python", "cpp"]):
parse_string = parse_string_py if backend == "python" else unquote_string_cpp
class TestParseString(BaseTest):
def test_quote_types(self):
self.assertEqual(parse_string("`asd`"), "asd")
self.assertEqual(parse_string("'asd'"), "asd")
self.assertEqual(parse_string('"asd"'), "asd")
self.assertEqual(parse_string("{asd}"), "asd")
def test_escaped_quotes(self):
self.assertEqual(parse_string("`a``sd`"), "a`sd")
self.assertEqual(parse_string("'a''sd'"), "a'sd")
self.assertEqual(parse_string('"a""sd"'), 'a"sd')
self.assertEqual(parse_string("{a{{sd}"), "a{sd")
self.assertEqual(parse_string("{a}sd}"), "a}sd")
def test_escaped_quotes_slash(self):
self.assertEqual(parse_string("`a\\`sd`"), "a`sd")
self.assertEqual(parse_string("'a\\'sd'"), "a'sd")
self.assertEqual(parse_string('"a\\"sd"'), 'a"sd')
self.assertEqual(parse_string("{a\\{sd}"), "a{sd")
def test_slash_escape(self):
self.assertEqual(parse_string("`a\nsd`"), "a\nsd")
self.assertEqual(parse_string("`a\\bsd`"), "a\bsd")
self.assertEqual(parse_string("`a\\fsd`"), "a\fsd")
self.assertEqual(parse_string("`a\\rsd`"), "a\rsd")
self.assertEqual(parse_string("`a\\nsd`"), "a\nsd")
self.assertEqual(parse_string("`a\\tsd`"), "a\tsd")
self.assertEqual(parse_string("`a\\asd`"), "a\asd")
self.assertEqual(parse_string("`a\\vsd`"), "a\vsd")
self.assertEqual(parse_string("`a\\\\sd`"), "a\\sd")
self.assertEqual(parse_string("`a\\0sd`"), "asd")
def test_slash_escape_not_escaped(self):
self.assertEqual(parse_string("`a\\xsd`"), "a\\xsd")
self.assertEqual(parse_string("`a\\ysd`"), "a\\ysd")
self.assertEqual(parse_string("`a\\osd`"), "a\\osd")
def test_slash_escape_slash_multiple(self):
self.assertEqual(parse_string("`a\\\\nsd`"), "a\\\nsd")
self.assertEqual(parse_string("`a\\\\n\\sd`"), "a\\\n\\sd")
self.assertEqual(parse_string("`a\\\\n\\\\tsd`"), "a\\\n\\\tsd")
def test_raises_on_mismatched_quotes(self):
self.assertRaisesMessage(
SyntaxException,
"Invalid string literal, must start and end with the same quote type: `asd'",
parse_string,
"`asd'",
)
return TestParseString
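
The factory above lets one set of assertions run against both implementations. A self-contained sketch of the same pattern using plain `unittest` is given below; `unquote_slice`, `unquote_strip`, and the backend names are hypothetical and exist only to make the example runnable without PostHog or `hogql_parser` installed.

```python
import unittest
from typing import Callable, Literal, Type


# Two hypothetical implementations of the same behavior.
def unquote_slice(text: str) -> str:
    return text[1:-1]


def unquote_strip(text: str) -> str:
    return text.strip("'")


def unquote_test_factory(backend: Literal["slice", "strip"]) -> Type[unittest.TestCase]:
    unquote: Callable[[str], str] = unquote_slice if backend == "slice" else unquote_strip

    # The class is local to the factory, so a test runner only collects
    # the concrete subclasses defined below, once per backend.
    class TestUnquote(unittest.TestCase):
        def test_strips_quotes(self) -> None:
            self.assertEqual(unquote("'asd'"), "asd")

    return TestUnquote


class TestUnquoteSlice(unquote_test_factory("slice")):
    pass


class TestUnquoteStrip(unquote_test_factory("strip")):
    pass
```

Each subclass gets its own closure over `unquote`, so a failure is reported under the backend-specific class name.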

File diff suppressed because it is too large


@ -84,7 +84,7 @@ class TestMetadata(ClickhouseTestMixin, APIBaseTest):
"inputSelect": None,
"errors": [
{
"message": "Alias 'true' is a reserved keyword",
"message": '"true" cannot be an alias or identifier, as it\'s a reserved keyword',
"start": 0,
"end": 9,
"fix": None,
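
The new message asserted above comes from the alias check in the parser hunks, which now lowercases the alias before the lookup. A minimal sketch of that check in isolation follows. The contents of `RESERVED_KEYWORDS` here are an assumption based on the keywords exercised in the tests, and `SyntaxException` is a local stand-in for `posthog.hogql.errors.SyntaxException`.

```python
# Assumed subset of reserved keywords; the real list lives in the HogQL code.
RESERVED_KEYWORDS = frozenset({"true", "false", "null", "team_id"})


class SyntaxException(Exception):
    """Local stand-in for posthog.hogql.errors.SyntaxException."""


def check_alias(alias: str) -> str:
    # Lowercasing first means "TRUE" and "True" are rejected as well,
    # which a plain `alias in RESERVED_KEYWORDS` test would let through.
    if alias.lower() in RESERVED_KEYWORDS:
        raise SyntaxException(f'"{alias}" cannot be an alias or identifier, as it\'s a reserved keyword')
    return alias
```

The original casing is kept in the error message, so the user sees the alias exactly as they typed it.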


@ -1,45 +0,0 @@
from posthog.hogql.parse_string import parse_string
from posthog.test.base import BaseTest
class TestParseString(BaseTest):
def test_quote_types(self):
self.assertEqual(parse_string("`asd`"), "asd")
self.assertEqual(parse_string("'asd'"), "asd")
self.assertEqual(parse_string('"asd"'), "asd")
self.assertEqual(parse_string("{asd}"), "asd")
def test_escaped_quotes(self):
self.assertEqual(parse_string("`a``sd`"), "a`sd")
self.assertEqual(parse_string("'a''sd'"), "a'sd")
self.assertEqual(parse_string('"a""sd"'), 'a"sd')
self.assertEqual(parse_string("{a{{sd}"), "a{sd")
self.assertEqual(parse_string("{a}sd}"), "a}sd")
def test_escaped_quotes_slash(self):
self.assertEqual(parse_string("`a\\`sd`"), "a`sd")
self.assertEqual(parse_string("'a\\'sd'"), "a'sd")
self.assertEqual(parse_string('"a\\"sd"'), 'a"sd')
self.assertEqual(parse_string("{a\\{sd}"), "a{sd")
def test_slash_escape(self):
self.assertEqual(parse_string("`a\nsd`"), "a\nsd")
self.assertEqual(parse_string("`a\\bsd`"), "a\bsd")
self.assertEqual(parse_string("`a\\fsd`"), "a\fsd")
self.assertEqual(parse_string("`a\\rsd`"), "a\rsd")
self.assertEqual(parse_string("`a\\nsd`"), "a\nsd")
self.assertEqual(parse_string("`a\\tsd`"), "a\tsd")
self.assertEqual(parse_string("`a\\0sd`"), "a\0sd")
self.assertEqual(parse_string("`a\\asd`"), "a\asd")
self.assertEqual(parse_string("`a\\vsd`"), "a\vsd")
self.assertEqual(parse_string("`a\\\\sd`"), "a\\sd")
def test_slash_escape_not_escaped(self):
self.assertEqual(parse_string("`a\\xsd`"), "a\\xsd")
self.assertEqual(parse_string("`a\\ysd`"), "a\\ysd")
self.assertEqual(parse_string("`a\\osd`"), "a\\osd")
def test_slash_escape_slash_multiple(self):
self.assertEqual(parse_string("`a\\\\nsd`"), "a\\\nsd")
self.assertEqual(parse_string("`a\\\\n\\sd`"), "a\\\n\\sd")
self.assertEqual(parse_string("`a\\\\n\\\\tsd`"), "a\\\n\\\tsd")


@ -0,0 +1,5 @@
from ._test_parse_string import parse_string_test_factory
class TestParseStringCpp(parse_string_test_factory("cpp")):
pass


@ -0,0 +1,5 @@
from ._test_parse_string import parse_string_test_factory
class TestParseStringPython(parse_string_test_factory("python")):
pass

File diff suppressed because it is too large


@ -0,0 +1,5 @@
from ._test_parser import parser_test_factory
class TestParserCpp(parser_test_factory("cpp")):
pass


@ -0,0 +1,5 @@
from ._test_parser import parser_test_factory
class TestParserPython(parser_test_factory("python")):
pass


@ -357,9 +357,13 @@ class TestPrinter(BaseTest):
self.assertEqual(context.values, {"hogql_val_0": "E", "hogql_val_1": "lol", "hogql_val_2": "hoo"})
def test_alias_keywords(self):
self._assert_expr_error("1 as team_id", "Alias 'team_id' is a reserved keyword")
self._assert_expr_error("1 as true", "Alias 'true' is a reserved keyword")
self._assert_select_error("select 1 as team_id from events", "Alias 'team_id' is a reserved keyword")
self._assert_expr_error(
"1 as team_id", '"team_id" cannot be an alias or identifier, as it\'s a reserved keyword'
)
self._assert_expr_error("1 as true", '"true" cannot be an alias or identifier, as it\'s a reserved keyword')
self._assert_select_error(
"select 1 as team_id from events", '"team_id" cannot be an alias or identifier, as it\'s a reserved keyword'
)
self.assertEqual(
self._select("select 1 as `-- select team_id` from events"),
f"SELECT 1 AS `-- select team_id` FROM events WHERE equals(events.team_id, {self.team.pk}) LIMIT 10000",


@ -86,4 +86,5 @@ more-itertools==9.0.0
django-two-factor-auth==1.14.0
phonenumberslite==8.13.6
openai==0.27.8
nh3==0.2.14
nh3==0.2.14
hogql-parser==0.1.7


@ -248,6 +248,8 @@ gunicorn==20.1.0
# via -r requirements.in
h11==0.13.0
# via wsproto
hogql-parser==0.1.7
# via -r requirements.in
idna==2.8
# via
# -r requirements.in