0
0
mirror of https://github.com/PostHog/posthog.git synced 2024-11-25 11:17:50 +01:00
posthog/hogql_parser/CONTRIBUTING.md
Michael Matloka 8f62e85c9d
chore(hogql): Full error & NULL handling in C++ parser (1.0.0) (#18240)
* chore(hogql): Be defensive against NULLs in the C++ parser

* Clean up on C++ exceptions

* Add to CONTRIBUTING guide

* Revert `AllowShortFunctionsOnASingleLine` change

* Update HogQLX additions too

* Bump version to 1.0.0

* Use new hogql-parser version

---------

Co-authored-by: github-actions <41898282+github-actions[bot]@users.noreply.github.com>
2023-10-27 13:37:15 +02:00

3.9 KiB

Developing hogql-parser

Mandatory reading

If you're new to Python C/C++ extensions, there are some things you must have in mind. The Python/C API Reference Manual is worth a read as a whole.

The three pages below are must-reads though. They're key to writing production-ready code:

Objects, Types and Reference Counts

Key takeaways:

  1. Py_INCREF() and Py_DECREF() need to be used accurately, or there'll be memory leaks (or, less likely, segfaults). This also applies to early exits, such as these caused by an error.
  2. Py_None, Py_True, and Py_False are singletons, but they still need to be incref'd/decref'd - the best way to do create a new reference to them is wrapping them in Py_NewRef().
  3. Pretty much only PyList_SET_ITEM() steals references (i.e. assumes ownership of objects passed into it) - if you pass an object into any other function and no longer need it after that, remember to Py_DECREF it!

Exception Handling

Key takeaways:

  1. If a Python exception has been raised, the module method that was called from Python must stop execution and return NULL immediately.

    In HogQLParseTreeConverter, we are able to use C++ exceptions: throwing SyntaxException, NotImplementedException, or ParsingException results in the same exception being raised in Python as expected. Note that if a visitAsFoo call throws an exception and there are PyObject*s in scope, we have to remember about cleaning up their refcounts. At such call sites, a try {} catch (...) {} block is appropriate.

  2. For all Python/C API calls returning PyObject*, make sure NULL wasn't returned - if it was, then something failed and the Python runtime has already set an exception (e.g. a MemoryError). The same applies to calls returning int - there the error value is -1. Exception: in PyArg_Foo functions failure is signaled by 0 and success by 1.

    In HogQLParseTreeConverter, these internal Python failures are handled simply by throwing PyInternalException.

Building Values

Key takeaways:

  1. Use Py_BuildValue() for building tuples, dicts, and lists of static size. Use type-specific functions (e.g. PyUnicode_FromString() or PyList_New()) otherwise.
  2. str-building with s involves strlen, while s# doesn't - it's better to use the latter with C++ strings.
  3. object-passing with O increments the object's refcount, while doing it with N doesn't - we should use N pretty much exclusively, because the parse tree converter is about creating new objects (not borrowing).

Conventions

  1. Use snake_case. ANTLR is camelCase-heavy because of its Java heritage, but both the C++ stdlib and CPython are snaky.
  2. Use the auto type for ANTLR and ANTLR-derived types, since they can be pretty verbose. Otherwise specify the type explictly.
  3. Stay out of Python land as long as possible. E.g. avoid using PyObject*s` for bools or strings. Do use Python for parsing numbers though - that way we don't need to consider integer overflow.
  4. If any child rule results in an AST node, so must the parent rule - once in Python land, always in Python land. E.g. it doesn't make sense to create a vector<PyObject*>, that should just be a PyObject* of Python type list.

How to develop locally on macOS

  1. Install libraries:

    brew install boost antlr4-cpp-runtime
    
  2. Install hogql_parser by building from local sources:

    pip install ./hogql_parser
    
  3. If you now run tests, the locally-built version of hogql_parser will be used:

    pytest posthog/hogql/