After migrating a Wagtail-based site from MySQL to Postgres, we
noticed that malicious requests to the site that included percent-
encoded Unicode NULLs (`%00`) raised a `ValueError` exception that we
hadn't seen when using MySQL: `A string literal cannot contain NUL
(0x00) characters.` This appears to relate to `psycopg2`'s decision to
raise an exception in these situations, as discussed here:
https://github.com/psycopg/psycopg2/issues/420
While newer versions of Django appear to provide some field validation
that addresses these characters, it doesn't look like Wagtail's
redirect middleware is making use of those validators, and so it seemed
reasonable to clean these characters in the context of 'normalizing'
the paths before looking for corresponding redirects -- especially
since a quick investigation on the internet suggests that U+0000 in
URLs can be used as a means of attack, and also since RFC 3986 says:
> Note, however, that the "%00" percent-encoding (NUL) may require
> special handling and should be rejected if the application is not
> expecting to receive raw data within a component.
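As a rough illustration of the cleaning step described above, here is a minimal sketch. The function name `strip_nul_characters` is hypothetical and this is not necessarily how Wagtail's path normalization implements it; it only assumes the path arrives as a string that may contain either the percent-encoded form (`%00`) or a raw U+0000.

```python
def strip_nul_characters(path: str) -> str:
    """Remove raw and percent-encoded NUL characters from a URL path.

    Hypothetical helper for illustration only; not Wagtail's actual code.
    """
    previous = None
    # Repeat until stable: removing one "%00" can join the surrounding
    # characters into a new "%00" (for example "%0%000").
    while previous != path:
        previous = path
        path = path.replace("%00", "").replace("\x00", "")
    return path
```

The cleaned path can then be passed to the redirect lookup, so the database never receives a string literal containing a NUL.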
We currently index all items in Elasticsearch using the root bulk API
(at ``/_bulk``). That endpoint exists so that a single request can
insert into multiple indices at once. However, Wagtail inserts into one
index at a time, so it is not needed. If we pass the index name as a
parameter in the call to ``bulk()``, the index-specific bulk API is used
instead (at ``/<index name>/_bulk``).
The advantage of this change is that it makes it possible to implement
access control by checking the URL an application is using. This is
required in order for the Bulk API to work on certain hosts (such as
Divio).
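To make the URL difference concrete, the sketch below builds the two endpoint paths and applies a simple prefix check of the kind a host might use for access control. Both function names and the prefix rule are assumptions for illustration; they are not part of Wagtail, the Elasticsearch client, or any particular host's configuration.

```python
from typing import Optional
from urllib.parse import quote


def bulk_endpoint(index: Optional[str] = None) -> str:
    """Return the bulk API path: root when no index is given, else index-specific."""
    if index is None:
        return "/_bulk"
    return "/" + quote(index, safe="") + "/_bulk"


def is_request_allowed(path: str, allowed_index: str) -> bool:
    """Hypothetical host-side rule: only paths under the app's own index pass."""
    return path.startswith("/" + quote(allowed_index, safe="") + "/")
```

Under such a rule a request to the root ``/_bulk`` path would be rejected, because the URL alone does not show which index is being written to, while the index-specific path would pass.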