Developing hogql-parser
Mandatory reading
If you're new to Python C/C++ extensions, there are some things you must keep in mind. The Python/C API Reference Manual is worth a read as a whole.
The three pages below are must-reads though. They're key to writing production-ready code:
Objects, Types and Reference Counts
Key takeaways:
- `Py_INCREF()` and `Py_DECREF()` need to be used accurately, or there'll be memory leaks (or, less likely, segfaults). This also applies to early exits, such as those caused by an error.
- `Py_None`, `Py_True`, and `Py_False` are singletons, but they still need to be incref'd/decref'd - the best way to create a new reference to them is wrapping them in `Py_NewRef()` (available since Python 3.10).
- Pretty much only `PyList_SET_ITEM()` (and its relatives such as `PyTuple_SET_ITEM()`) steals references, i.e. assumes ownership of objects passed into it. If you pass an object into any other function and no longer need it after that, remember to `Py_DECREF` it!
Exception Handling
Key takeaways:
- If a Python exception has been raised, the module method that was called from Python must stop execution and return `NULL` immediately.

  In `HogQLParseTreeConverter`, we are able to use C++ exceptions: throwing `SyntaxException`, `NotImplementedException`, or `ParsingException` results in the same exception being raised in Python, as expected. Note that if a `visitAsFoo` call throws an exception and there are `PyObject*`s in scope, we have to remember to release their references. At such call sites, a `try {} catch (...) {}` block that cleans up and then rethrows is appropriate.
- For all Python/C API calls returning `PyObject*`, make sure `NULL` wasn't returned. If it was, then something failed and the Python runtime has already set an exception (e.g. a `MemoryError`). The same applies to calls returning `int`, where the error value is `-1`. Exception: in `PyArg_*` functions (such as `PyArg_ParseTuple()`), failure is signaled by `0` and success by `1`.

  In `HogQLParseTreeConverter`, these internal Python failures are handled simply by throwing `PyInternalException`.
Building Values
Key takeaways:
- Use `Py_BuildValue()` for building tuples, dicts, and lists of static size. Use type-specific functions (e.g. `PyUnicode_FromString()` or `PyList_New()`) otherwise.
- `str`-building with `s` involves `strlen`, while `s#` takes an explicit length and doesn't. It's better to use the latter with C++ strings.
- `object`-passing with `O` increments the object's refcount, while doing it with `N` doesn't (the reference is handed over). We should use `N` pretty much exclusively, because the parse tree converter is about creating new objects (not borrowing).
Conventions
- Use `snake_case`. ANTLR is `camelCase`-heavy because of its Java heritage, but both the C++ stdlib and CPython are snaky.
- Use the `auto` type for ANTLR and ANTLR-derived types, since they can be pretty verbose. Otherwise specify the type explicitly.
- Stay out of Python land as long as possible. E.g. avoid using `PyObject*`s for bools or strings. Do use Python for parsing numbers though - that way we don't need to consider integer overflow.
- If any child rule results in an AST node, so must the parent rule - once in Python land, always in Python land. E.g. it doesn't make sense to create a `vector<PyObject*>`; that should just be a `PyObject*` of Python type `list`.
How to develop locally on macOS
- Install libraries: `brew install boost antlr4-cpp-runtime`
- Install `hogql_parser` by building from local sources: `pip install ./hogql_parser`
- If you now run tests, the locally-built version of `hogql_parser` will be used: `pytest posthog/hogql/`