0
0
mirror of https://github.com/PostHog/posthog.git synced 2024-11-24 09:14:46 +01:00
posthog/hogql_parser/CONTRIBUTING.md
Thomas Obermüller 1812d8a523
feat(hogql): support join USING (replaces #21916) (#22378)
---------

Co-authored-by: Nikita Vorobev <nikitaevg@google.com>
2024-05-21 20:46:36 +02:00

76 lines
4.5 KiB
Markdown

# Developing `hogql-parser`
## Mandatory reading
If you're new to Python C/C++ extensions, there are some things you must have in mind. The [Python/C API Reference Manual](https://docs.python.org/3/c-api/index.html) is worth a read as a whole.
The three pages below are must-reads though. They're key to writing production-ready code:
### [Objects, Types and Reference Counts](https://docs.python.org/3/c-api/intro.html#objects-types-and-reference-counts)
Key takeaways:
1. `Py_INCREF()` and `Py_DECREF()` need to be used accurately, or there'll be memory leaks (or, less likely, segfaults). This also applies to early exits, such as these caused by an error.
1. `Py_None`, `Py_True`, and `Py_False` are singletons, but they still need to be incref'd/decref'd - the best way to do create a new reference to them is wrapping them in `Py_NewRef()`.
1. Pretty much only `PyList_SET_ITEM()` _steals_ references (i.e. assumes ownership of objects passed into it) - if you pass an object into any other function and no longer need it after that, remember to `Py_DECREF` it!
### [Exception Handling](https://docs.python.org/3/c-api/exceptions.html)
Key takeaways:
1. If a Python exception has been raised, the module method that was called from Python must stop execution and return `NULL` immediately.
> In `HogQLParseTreeConverter`, we are able to use C++ exceptions: throwing `SyntaxException`,
> `NotImplementedException`, or `ParsingException` results in the same exception being raised in Python as
> expected. Note that if a `visitAsFoo` call throws an exception and there are `PyObject*`s in scope, we have to
> remember about cleaning up their refcounts. At such call sites, a `try {} catch (...) {}` block is appropriate.
1. For all Python/C API calls returning `PyObject*`, make sure `NULL` wasn't returned - if it was, then something failed and the Python runtime has already set an exception (e.g. a `MemoryError`). The same applies to calls returning `int` - there the error value is `-1`. Exception: in `PyArg_Foo` functions failure is signaled by `0` and success by `1`.
> In `HogQLParseTreeConverter`, these internal Python failures are handled simply by throwing
> `PyInternalException`.
### [Building Values](https://docs.python.org/3/c-api/arg.html#building-values)
Key takeaways:
1. Use `Py_BuildValue()` for building tuples, dicts, and lists of static size. Use type-specific functions (e.g. `PyUnicode_FromString()` or `PyList_New()`) otherwise.
1. `str`-building with `s` involves `strlen`, while `s#` doesn't - it's better to use the latter with C++ strings.
1. `object`-passing with `O` increments the object's refcount, while doing it with `N` doesn't - we should use `N` pretty much exclusively, because the parse tree converter is about creating new objects (not borrowing).
## Conventions
1. Use `snake_case`. ANTLR is `camelCase`-heavy because of its Java heritage, but both the C++ stdlib and CPython are snaky.
2. Use the `auto` type for ANTLR and ANTLR-derived types, since they can be pretty verbose. Otherwise, specify the type explicitly.
3. Stay out of Python land as long as possible. E.g. avoid using `PyObject*`s` for bools or strings.
Do use Python for parsing numbers though - that way we don't need to consider integer overflow.
4. If any child rule results in an AST node, so must the parent rule - once in Python land, always in Python land.
E.g. it doesn't make sense to create a `vector<PyObject*>`, that should just be a `PyObject*` of Python type `list`.
## How to develop locally on macOS
1. Install libraries:
```bash
brew install boost antlr4-cpp-runtime
```
1. Install `hogql_parser` by building from local sources:
```bash
pip install ./hogql_parser
```
> If you're getting compilation errors like this on macOS Sonoma:
> `/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/c++/v1/cstring:66:5: error: <cstring> tried including <string.h> but didn't find libc++'s <string.h> header.`
> Then you may need to remove Xcode Command Line Tools:
> `sudo rm -rf /Library/Developer/CommandLineTools`
1. If you now run tests, the locally-built version of `hogql_parser` will be used:
```bash
pytest posthog/hogql/
```
## How to install dependencies on Ubuntu
Antlr runtime provided in Ubuntu packages might be of an older version, which results in compilation errors.
In that case run commands from [this step](https://github.com/PostHog/posthog/blob/4fba6a63e351131fdb27b85e7ba436446fdb3093/.github/actions/run-backend-tests/action.yml#L100).