Developing hogql-parser
Mandatory reading
If you're new to Python C/C++ extensions, there are some things you must keep in mind. The Python/C API Reference Manual is worth a read as a whole.
The three pages below are must-reads, though. They're key to writing production-ready code:
Objects, Types and Reference Counts
Key takeaways:
- `Py_INCREF()` and `Py_DECREF()` need to be used accurately, or there'll be memory leaks (or, less likely, segfaults). This also applies to early exits, such as those caused by an error.
- `Py_None`, `Py_True`, and `Py_False` are singletons, but they still need to be incref'd/decref'd - the best way to create a new reference to them is wrapping them in `Py_NewRef()`.
- Pretty much only `PyList_SET_ITEM()` steals references (i.e. assumes ownership of objects passed into it) - if you pass an object into any other function and no longer need it after that, remember to `Py_DECREF` it!
Exception Handling
Key takeaways:
- If a Python exception has been raised, the module method that was called from Python must stop execution and return `NULL` immediately.
  In `HogQLParseTreeConverter`, we are able to use C++ exceptions: throwing `SyntaxException`, `NotImplementedException`, or `ParsingException` results in the same exception being raised in Python as expected. Note that if a `visitAsFoo` call throws an exception and there are `PyObject*`s in scope, we have to remember about cleaning up their refcounts. At such call sites, a `try {} catch (...) {}` block is appropriate.
- For all Python/C API calls returning `PyObject*`, make sure `NULL` wasn't returned - if it was, then something failed and the Python runtime has already set an exception (e.g. a `MemoryError`). The same applies to calls returning `int` - there the error value is `-1`. Exception: in `PyArg_Foo` functions failure is signaled by `0` and success by `1`.
  In `HogQLParseTreeConverter`, these internal Python failures are handled simply by throwing `PyInternalException`.
Building Values
Key takeaways:
- Use `Py_BuildValue()` for building tuples, dicts, and lists of static size. Use type-specific functions (e.g. `PyUnicode_FromString()` or `PyList_New()`) otherwise.
- `str`-building with `s` involves `strlen`, while `s#` doesn't - it's better to use the latter with C++ strings.
- `object`-passing with `O` increments the object's refcount, while doing it with `N` doesn't - we should use `N` pretty much exclusively, because the parse tree converter is about creating new objects (not borrowing).
Conventions
- Use `snake_case`. ANTLR is `camelCase`-heavy because of its Java heritage, but both the C++ stdlib and CPython are snaky.
- Use the `auto` type for ANTLR and ANTLR-derived types, since they can be pretty verbose. Otherwise, specify the type explicitly.
- Stay out of Python land as long as possible. E.g. avoid using `PyObject*`s for bools or strings. Do use Python for parsing numbers though - that way we don't need to consider integer overflow.
- If any child rule results in an AST node, so must the parent rule - once in Python land, always in Python land. E.g. it doesn't make sense to create a `vector<PyObject*>`, that should just be a `PyObject*` of Python type `list`.
How to develop locally on macOS
- Install libraries: `brew install boost antlr4-cpp-runtime`
- Install `hogql_parser` by building from local sources: `pip install ./hogql_parser`
  If you're getting compilation errors like this on macOS Sonoma:
  `/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/c++/v1/cstring:66:5: error: <cstring> tried including <string.h> but didn't find libc++'s <string.h> header.`
  Then you may need to remove Xcode Command Line Tools: `sudo rm -rf /Library/Developer/CommandLineTools`
- If you now run tests, the locally-built version of `hogql_parser` will be used: `pytest posthog/hogql/`
How to install dependencies on Ubuntu
The ANTLR runtime provided in Ubuntu packages might be an older version, which results in compilation errors.
In that case run commands from this step.