The FileMaker Pro File Format
--
(By Evan Miller, 2025)
https://github.com/evanmiller/fmptools/blob/02eb770e59e0866dab213d80e5f7d88e17648031/HACKING

FileMaker Pro is a consumer-grade database program that uses a binary, proprietary file format for storing tabular and non-tabular data. This file describes the knowledge necessary to extract tabular data from files with extension fp3, fp5, fp7, or fmp12.

There are two basic kinds of FileMaker files, fp3/fp5 and fp7/fmp12. The two varieties have a similar overall structure and design philosophy but are otherwise incompatible. The rest of this document describes their respective layouts and refers to them by their latest incarnations, fp5 and fmp12.

This document is based on the fp5dump project combined with my own efforts. The fp5dump project is here:

https://github.com/qwesda/fp5dump

Its source code has more information about the fp5 format than you will find in here. I welcome any attempts to merge that information into this document.

Preliminaries: Text Encoding
==

Text data in fp5 files uses the native character encoding of the machine that created them; in most cases, this encoding is MacRoman. iconv can be used to convert this text data to a more modern encoding, e.g. UTF-8.

The story with fmp12 is more complicated. FileMaker began supporting Unicode characters before UTF-8 achieved widespread popularity, and appears to use the now-deprecated Standard Compression Scheme for Unicode (SCSU), which is documented here:

https://www.unicode.org/reports/tr6/tr6-4.html

SCSU is Latin-1 compatible, so treating the raw bytes as ISO-8859-1 is a good start. But SCSU then uses control codes to switch to other "windows" of Unicode characters, including full support for UTF-16BE and extended Unicode planes.

Preliminaries: Integer Encoding
==

Most integer data (e.g. lengths) are encoded big-endian. However, certain values appear to use a quasi-variable-length encoding.
The encoding was fully variable-length in fp5, but seems to have been modified in fmp12. For reasons that will become clear later, these will be referred to as "path integers"; they consist of one to three bytes. In all cases, the actual length of the integer can be determined from context, but the encoding appears to be designed so that each integer self-reports its length, similar to UTF-8 sequences. This feature is not necessary to parse them, so for simplicity the sequences are described below assuming the total length is known in advance.

One-byte integers have a range of 0 - 127, with the highest bit ignored.

Two-byte integers have a range of 128 - 32895. Ignore the highest bit of the first byte, treat the remaining 15 bits as a big-endian number, and add 128.

[fp5 only] Three-byte integers have a range of 49152 and up. Ignore the highest two bits of the first byte, treat the remaining 22 bits as a big-endian number, and add 0xC000.

[fmp12 only] Three-byte integers have a range of 128 and up. Ignore the first byte, treat the remaining two bytes as a big-endian number, and add 128.

File Structure
==

Files consist of a header sector followed by one or more body sectors. Each sector contains 1024 bytes (fp5) or 4096 bytes (fmp12). In fp5 files, the first body sector can be ignored, with the "real" processing starting at offset 2048.

Header Structure
==

The header begins with a 15-byte magic number:

    00 01 00 00 00 02 00 01 00 05 00 02 00 02 C0

In fmp12, the magic number is followed by the ASCII sequence "HBAM7". This sequence can be used to distinguish fp5 files from fmp12 files.

The name of the software that created the file can be found at byte offset 541 in the header. It is stored as a Pascal string: a one-byte length at offset 541 followed by a non-terminated ASCII string, usually of the form "Pro X.0", where X is the version number.
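The integer and header rules above are small enough to sketch directly. The Python below is a minimal illustration, not code from fmptools: `path_int`, `sniff_header`, and `MAGIC` are hypothetical names, the caller is assumed to already know each path integer's byte length (as described above), and `sniff_header` expects the raw bytes of the header sector.

```python
from typing import Dict

# The 15-byte magic number that starts every fp5 and fmp12 header.
MAGIC = bytes.fromhex("00 01 00 00 00 02 00 01 00 05 00 02 00 02 C0")


def path_int(data: bytes, fmp12: bool) -> int:
    """Decode a 1-3 byte "path integer" whose length is already known."""
    if len(data) == 1:
        return data[0] & 0x7F  # highest bit ignored
    if len(data) == 2:
        # Drop the top bit of the first byte, read 15 bits big-endian, add 128.
        return 128 + (((data[0] & 0x7F) << 8) | data[1])
    if len(data) == 3:
        if fmp12:
            # fmp12: ignore the first byte, read the last two big-endian, add 128.
            return 128 + ((data[1] << 8) | data[2])
        # fp5: drop the top two bits of the first byte, read 22 bits, add 0xC000.
        return 0xC000 + (((data[0] & 0x3F) << 16) | (data[1] << 8) | data[2])
    raise ValueError(f"path integers are 1-3 bytes long, got {len(data)}")


def sniff_header(header: bytes) -> Dict[str, object]:
    """Classify a raw header sector as fp5 or fmp12 and read the creator string."""
    if header[:15] != MAGIC:
        raise ValueError("missing FileMaker magic number")
    # fp3/fp5 headers lack the "HBAM7" marker that follows the magic in fmp12.
    is_fmp12 = header[15:20] == b"HBAM7"
    length = header[541]  # Pascal string: one length byte, then the text
    creator = header[542:542 + length].decode("ascii", errors="replace")
    return {
        "format": "fmp12" if is_fmp12 else "fp5",
        "sector_size": 4096 if is_fmp12 else 1024,
        "creator": creator,
    }
```

For example, `path_int(b"\x00\x00\x02", fmp12=True)` yields 130, a typical fmp12 table root.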
Sector Structure
==

Sectors may be unordered; they are arranged as a doubly linked list, and each sector contains the ID of the previous sector as well as the next sector in the list. By following the linked list from the beginning, you can traverse the data in order.

fp5 sector layout:

Offset    Length        Value
0         1             Deleted? 1=Yes 0=No
1         1             Level (Integer)
2         4             Previous Sector ID (Integer)
6         4             Next Sector ID (Integer)
12        2             Payload Length = N (Integer)
14        N             Payload

fmp12 sector layout:

Offset    Length        Value
0         1             Deleted? 1=Yes 0=No
1         1             Level (Integer)
4         4             Previous Sector ID (Integer)
8         4             Next Sector ID (Integer)
20        4076          Payload

The "Payload" is a byte-code stream that can be used to construct a series of data chunks. For our purposes, there are six kinds of chunks:

* Path "push" operation (integer or byte sequence)
* Path "pop" operation
* Simple data (byte sequence)
* Segmented data (segment index + byte sequence)
* Simple key-value pair (integer => byte sequence)
* Long key-value pair (byte sequence => byte sequence)

The path operations define the logical position of the other kinds of data, and are central to extracting data from the file. Together they form a primitive sort of "file system" whose "folders" are usually (but not always) integers. For example, the file may "push" the numbers 3, 1, and 5 onto the path, in which case the next piece of data will have a path address of [3].[1].[5]. After a "pop" operation, the next piece of data will have the address [3].[1], and so on.

A "simple data" chunk is just a sequence of bytes; its path determines how to interpret its contents. Most byte sequences in fmp12 need to be "decrypted" by XOR'ing every byte with the hex value 0x5A.

Segmented data refers to data that does not fit into a single chunk, or even into a single sector. Typically, large strings or objects are split into 1000-byte segments that share a path. Each segmented data chunk includes a sequential index that can be used to reconstruct the large object.
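The sector layout, path stack, XOR step, and segment joining described above can be sketched as follows. This is a hedged illustration for fmp12 only, and several details are assumptions not stated in this document: the caller supplies an already-loaded mapping from sector ID to 4096-byte sector (how IDs map to file offsets is not covered here), the starting ID is passed in explicitly, and a next-sector ID of 0 is taken to end the list. All function and class names are hypothetical.

```python
import struct
from typing import Dict, Iterator, List, Tuple

PAYLOAD_OFFSET = 20  # fmp12: the 4076-byte payload starts at byte 20 of each sector


def fmp12_sector_header(sector: bytes) -> Tuple[bool, int, int, int]:
    """Return (deleted, level, prev_id, next_id) for one fmp12 sector."""
    deleted = sector[0] == 1
    level = sector[1]
    prev_id, next_id = struct.unpack_from(">II", sector, 4)  # big-endian, offsets 4 and 8
    return deleted, level, prev_id, next_id


def walk_sectors(sectors: Dict[int, bytes], first_id: int) -> Iterator[Tuple[int, bytes]]:
    """Follow next-sector links from first_id, yielding (sector_id, payload).

    Assumption: an ID of 0, or an ID missing from `sectors`, ends the list.
    A visited set guards against cycles in damaged files.
    """
    seen = set()
    sector_id = first_id
    while sector_id != 0 and sector_id in sectors and sector_id not in seen:
        seen.add(sector_id)
        sector = sectors[sector_id]
        _deleted, _level, _prev, next_id = fmp12_sector_header(sector)
        yield sector_id, sector[PAYLOAD_OFFSET:]
        sector_id = next_id


class PathTracker:
    """Keeps the current path address as push/pop chunks are decoded."""

    def __init__(self) -> None:
        self._stack: List[object] = []

    def push(self, component: object) -> None:
        self._stack.append(component)

    def pop(self) -> None:
        self._stack.pop()

    @property
    def address(self) -> Tuple[object, ...]:
        return tuple(self._stack)


def xor_decrypt(data: bytes) -> bytes:
    """Undo the fmp12 obfuscation by XOR'ing every byte with 0x5A."""
    return bytes(b ^ 0x5A for b in data)


def reassemble(segments: List[Tuple[int, bytes]]) -> bytes:
    """Join (segment_index, bytes) pairs that share a path, in index order."""
    return b"".join(data for _index, data in sorted(segments, key=lambda s: s[0]))
```

Because XOR with 0x5A is its own inverse, the same helper both obfuscates and recovers the bytes. Applying the push/pop sequence from the example above to a `PathTracker` (push 3, 1, 5, then pop) leaves its address at `(3, 1)`.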
Key-value pairs are the most common kind of chunk; multiple key-value pairs with the same path can represent associative arrays or records. The keys may be integers or strings (but are usually integers), and the values are byte sequences.

The "Codes" sections describe the byte codes that can be used to decode the six chunk types. By implementing them, any FileMaker file can be read into memory. The "Path Structure" sections describe how to convert these raw chunks into meaningful data structures.

fp5 Codes
==

Each chunk can usually be identified by its first byte, although in a few cases examining the second byte is necessary. The possible chunk types and structures in fp5 files are:

Simple key-value
~~

Offset    Length        Value
0         1             0x00
1         1             N = Length (Integer)
2         N             Value (Bytes)

Key = 0x00 (Integer)

Offset    Length        Value
0         1             0x40 <= C <= 0x7F
1         1             N = Length (Integer)
2         N             Value (Bytes)

Key = C - 0x40 (Integer)

Offset    Length        Value
0         2             0xFF (0x40 <= C <= 0x80)
2         C-0x40        Key (Bytes)
C-0x3E    2             N = Length (Integer)
C-0x3C    N             Value (Bytes)

Long key-value
~~

Offset    Length        Value
0         1             0x01 <= C <= 0x3F
1         1             K = Key Length (Integer)
2         K             Key (Bytes)
2+K       1             N = Length (Integer)
2+K+1     N             Value (Bytes)

Offset    Length        Value
0         2             0xFF (0x01 <= K <= 0x04)
2         K             Key (Bytes)
2+K       2             N = Length (Integer)
2+K+2     N             Value (Bytes)

Simple data
~~

Offset    Length        Value
0         1             0x80 <= C <= 0xBF
1         C-0x80        Value (Bytes)

Path pop
~~

Offset    Length        Value
0         1             0xC0

Path push
~~

Offset    Length        Value
0         1             0xC1 <= C <= 0xFE
1         C-0xC0        Value (Bytes)

fmp12 Codes
==

As with the fp5 codes, each chunk can usually be identified by its first byte, although in a few cases examining the second byte is necessary.
The possible chunk types and structures are:

Simple data
~~

Offset    Length        Value
0         1             0x00
1         1             Value (Bytes)

Offset    Length        Value
0         1             0x08
1         2             Value (Bytes)

Offset    Length        Value
0         2             0x0E 0xFF
2         5             Value (Bytes)

Offset    Length        Value
0         1             0x10 <= C <= 0x11
1         3+(C-0x10)    Value (Bytes)

Offset    Length        Value
0         1             0x12 <= C <= 0x15
1         1+2*(C-0x10)  Value (Bytes)

Offset    Length        Value
0         1             0x19 | 0x23
1         1             Value (Bytes)

Offset    Length        Value
0         1             0x1A <= C <= 0x1D
1         2*(C-0x19)    Value (Bytes)

Simple key-value
~~

Offset    Length        Value
0         1             0x01
1         1             Key (Integer)
2         1             Value (Bytes)

Offset    Length        Value
0         1             0x02 <= C <= 0x05
1         1             Key (Integer)
2         2*(C-1)       Value (Bytes)

Offset    Length        Value
0         1             0x06
1         1             Key (Integer)
2         1             N = Length (Integer)
3         N             Value (Bytes)

Offset    Length        Value
0         1             0x09
1         2             Key (Path Integer)
3         1             Value (Bytes)

Offset    Length        Value
0         1             0x0A <= C <= 0x0D
1         2             Key (Path Integer)
3         2*(C-9)       Value (Bytes)

Offset    Length        Value
0         1             0x0E
1         2             Key (Path Integer)
3         1             N = Length (Integer)
4         N             Value (Bytes)

Long key-value
~~

Offset    Length        Value
0         1             0x16
1         3             Key (Bytes)
4         1             N = Length (Integer)
5         N             Value (Bytes)

Offset    Length        Value
0         1             0x17
1         3             Key (Bytes)
4         2             N = Length (Integer)
6         N             Value (Bytes)

Offset    Length        Value
0         1             0x1E
1         1             K = Key Length (Integer)
2         K             Key (Bytes)
2+K       1             N = Value Length (Integer)
2+K+1     N             Value (Bytes)

Offset    Length        Value
0         1             0x1F
1         1             K = Key Length (Integer)
2         K             Key (Bytes)
2+K       2             N = Value Length (Integer)
2+K+2     N             Value (Bytes)

Segmented data
~~

Offset    Length        Value
0         1             0x07
1         1             Segment index (Integer)
2         2             N = Length (Integer)
4         N             Value (Bytes)

Offset    Length        Value
0         1             0x0F
1         2             Segment index (Path Integer)
3         2             N = Length (Integer)
5         N             Value (Bytes)

Path push
~~

Offset    Length        Value
0         1             0x20 | 0x0E
1         1             Value (Integer)

Offset    Length        Value
0         2             (0x20 | 0x0E) 0xFE
2         8             Value (Bytes)

Offset    Length        Value
0         1             0x28
1         2             Value (Path Integer)

Offset    Length        Value
0         1             0x30
1         3             Value (Path Integer)

Offset    Length        Value
0         1             0x38
1         1             N = Length (Integer)
2         N             Value (Bytes)

Path pop
~~

Offset    Length        Value
0         1             0x3D | 0x40

No-op
~~

Offset    Length        Value
0         1             0x80

fp5 Path Structure
==

fp5 files can contain only one table, which makes things easy. The known paths are:

[1]: Some kind of word index?

[3].[1]: Column names => Index pairs (String key, Integer value)
    These column names are uppercase.

[3].[5].[X]: Metadata for the Xth column (Key-value pairs)
    [1] => Column name
    [2] => Second byte indicates column type (1=String, 2=Integer)

[5].[X]: Xth record in the table (Path Integer key, String or Integer value)

It appears that later paths, located at [32] and up, are references to external FileMaker files on the same hard drive.

fmp12 Path Structure
==

fmp12 introduced the ability to store multiple tables in one file. Individual tables have a similar layout to fp5 files, but are stored under a root path with a value of 128 or above. For example, if the first table is stored at path [130], that table's column metadata can be found at [130].[3].[5]. The semantics are slightly changed, as documented below. fmp12 appears to eliminate the Integer column type in favor of all Strings.

[4].[1].[7].[X]: Metadata about the Xth table
    [16] => Table name

[128+X].[3].[5].[Y]: Metadata for the Yth column of the Xth table

[128+X].[5].[Y]: Yth record in the Xth table (Path Integer key, String value)

Note that the sequence of tables is not necessarily compact.
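Once chunks are decoded, the path layouts above reduce to a grouping step. The sketch below assumes a hypothetical intermediate representation in which every simple key-value chunk has already become a `(path, key, value)` tuple, with path components and keys as plain integers; fmp12 values would still need the 0x5A XOR step and SCSU decoding, which are not done here. `records_under` and `fmp12_table_names` are illustrative names, not fmptools functions.

```python
from typing import Dict, Iterable, Tuple

Path = Tuple[int, ...]
# Hypothetical decoded form of a simple key-value chunk: (path, key, value).
KeyValue = Tuple[Path, int, bytes]


def records_under(chunks: Iterable[KeyValue], prefix: Path) -> Dict[int, Dict[int, bytes]]:
    """Collect key-value chunks at prefix.[Y] into {Y: {key: value}}.

    fp5 records:          prefix (5,)
    fp5 column metadata:  prefix (3, 5)
    fmp12 table X:        prefix (128 + X, 5)
    """
    depth = len(prefix)
    result: Dict[int, Dict[int, bytes]] = {}
    for path, key, value in chunks:
        # Keep only chunks exactly one level below the prefix.
        if len(path) == depth + 1 and path[:depth] == prefix:
            result.setdefault(path[depth], {})[key] = value
    return result


def fmp12_table_names(chunks: Iterable[KeyValue]) -> Dict[int, bytes]:
    """Map table number X to the raw name stored under [4].[1].[7].[X], key 16."""
    names: Dict[int, bytes] = {}
    for path, key, value in chunks:
        if len(path) == 4 and path[:3] == (4, 1, 7) and key == 16:
            names[path[3]] = value
    return names
```

Because the sequence of tables is not necessarily compact, a caller would iterate over the table numbers actually returned by `fmp12_table_names` (or the root paths actually present) rather than assuming consecutive roots.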