The FileMaker Pro File Format
-----------------------------

(By Evan Miller, 2025) - https://github.com/evanmiller/fmptools/blob/02eb770e59e0866dab213d80e5f7d88e17648031/HACKING

FileMaker Pro is a consumer-grade database program that uses a binary,
proprietary file format for storing tabular and non-tabular data. This
file describes the knowledge necessary to extract tabular data from files
with extension fp3, fp5, fp7, or fmp12.

There are two basic kinds of FileMaker files, fp3/fp5 and fp7/fmp12. The
two varieties have a similar overall structure and design philosophy but are
otherwise incompatible. The rest of this document will describe their
respective layouts and refer to them by their latest incarnations, fp5 and
fmp12. It is based on the fp5dump project combined with my own efforts.
The fp5dump project is here:

https://github.com/qwesda/fp5dump

The source code has more information about the fp5 type than you will find in
here. I welcome any attempts to merge that information into this document.


Preliminaries: Text Encoding
============================

Text data in fp5 files uses the native character encoding of the machine that
created the file; in most cases, this encoding is MacRoman. iconv can be used
to convert this text data to a more modern encoding, e.g. UTF-8.
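
Python ships a `mac_roman` codec in its standard library, so the iconv step can also be done in-process. A minimal sketch; the function name is hypothetical, and defaulting to MacRoman simply follows the "most cases" observation above.

```python
def decode_fp5_text(raw: bytes, encoding: str = "mac_roman") -> str:
    """Decode fp5 text bytes, defaulting to MacRoman.

    The default is an assumption about the machine that created the file;
    pass another Python codec name if the file came from elsewhere.
    """
    return raw.decode(encoding)


# 0x8E is "é" in MacRoman.
print(decode_fp5_text(b"Caf\x8e"))  # Café
```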

The story with fmp12 is more complicated. FileMaker began supporting Unicode
characters before UTF-8 achieved widespread popularity, and appears to use the
now-deprecated Standard Compression Scheme for Unicode (SCSU), which is
documented here:

https://www.unicode.org/reports/tr6/tr6-4.html

SCSU is Latin-1 compatible, so treating the raw bytes as ISO-8859-1 is a good
start. But then it uses control codes to switch to other "windows" of Unicode
characters, including full support for UTF-16BE and extended Unicode planes.
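
Treating the bytes as ISO-8859-1 is only safe while the SCSU stream stays in its initial state. There, the active window starts at U+0080, so bytes 0x20-0xFF plus NUL, TAB, LF and CR map to the code point with the same value; the other control bytes (0x01-0x08, 0x0B, 0x0C, 0x0E-0x1F) are tags or reserved, and they change how the following bytes are read. The sketch below decodes the pass-through case and stops at the first such byte instead of guessing; anything beyond that needs a full decoder written against the SCSU report. The function and exception names are hypothetical.

```python
class SCSUTagError(ValueError):
    """Raised at the first byte that SCSU treats as a tag or reserved code."""


# Control bytes that SCSU passes through unchanged in single-byte mode.
PASS_THROUGH = {0x00, 0x09, 0x0A, 0x0D}


def decode_scsu_initial(data: bytes) -> str:
    """Decode bytes that never leave SCSU's initial state.

    In that state the active window starts at U+0080, so every pass-through
    byte maps to the code point with the same value, exactly like Latin-1.
    """
    chars = []
    for offset, byte in enumerate(data):
        if byte >= 0x20 or byte in PASS_THROUGH:
            chars.append(chr(byte))
        else:
            raise SCSUTagError(f"SCSU tag byte 0x{byte:02X} at offset {offset}")
    return "".join(chars)


print(decode_scsu_initial(b"Caf\xe9"))  # Café
```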


Preliminaries: Integer Encoding
===============================

Most integer data (e.g. lengths) are encoded big-endian. However, certain
values appear to use a quasi-variable-length encoding. The encoding was fully
variable length in fp5, but seems to have been modified in fmp12. For reasons
that will become clear later, these will be referred to as "path integers" that
consist of one to three bytes.

In all cases, the actual length of the integer can be determined from context,
but they seem designed in a way that they self-report their length, similar to
UTF-8 sequences. This feature is not necessary to parse them, so for simplicity
the sequences will be described assuming the total length is known in advance.

One-byte integers have a range of 0 - 127, with the highest bit ignored.

Two-byte integers have a range of 128 - 32895. Ignore the highest bit of the
first byte, treat the remaining 15 bits as a big-endian number, and add 128.

[fp5 only] Three-byte integers have a range of 49152 and up. Ignore the highest
two bits of the first byte, treat the remaining 22 bits as a big-endian number,
and add 0xC000.

[fmp12 only] Three-byte integers have a range of 128 and up. Ignore the first
byte and add 128 to the last two bytes, treated as a big-endian number.
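
The rules above can be sketched as two decoders, one per format, each taking a byte string whose length is already known from context. The function names are hypothetical. For example, `path_int_fmp12(b"\x99\x01\x02")` ignores the first byte and returns 0x0102 + 128 = 386.

```python
def path_int_fp5(b: bytes) -> int:
    """Decode a 1-3 byte fp5 path integer whose length is known from context."""
    if len(b) == 1:
        return b[0] & 0x7F                                    # 0 - 127
    if len(b) == 2:
        return 0x80 + (((b[0] & 0x7F) << 8) | b[1])           # 128 - 32895
    if len(b) == 3:
        # Drop the top two bits, keep 22 bits, add 0xC000.
        return 0xC000 + (((b[0] & 0x3F) << 16) | (b[1] << 8) | b[2])
    raise ValueError(f"unsupported path integer length {len(b)}")


def path_int_fmp12(b: bytes) -> int:
    """Decode a 1-3 byte fmp12 path integer whose length is known from context."""
    if len(b) in (1, 2):
        return path_int_fp5(b)                                # same rules as fp5
    if len(b) == 3:
        return 0x80 + ((b[1] << 8) | b[2])                    # first byte ignored
    raise ValueError(f"unsupported path integer length {len(b)}")
```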


File Structure
==============

Files consist of a header sector followed by one or more body sectors. Each
sector contains 1024 bytes (fp5) or 4096 bytes (fmp12). In fp5 files, the first
body sector can be ignored, with the "real" processing starting at offset 2048.
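
Slicing a file into sectors follows directly from those sizes. A minimal sketch: it skips the header (and, for fp5, the ignorable first body sector) and returns only complete sectors in file order. It deliberately assumes nothing about how sector IDs map to file positions, which is not specified here; the names are hypothetical.

```python
SECTOR_SIZE = {"fp5": 1024, "fmp12": 4096}
# Where sector processing starts: fp5 skips the header and the first body
# sector, fmp12 skips only the header sector.
FIRST_OFFSET = {"fp5": 2048, "fmp12": 4096}


def split_sectors(data: bytes, fmt: str) -> list[bytes]:
    """Return the complete body sectors of a file, in file order."""
    size = SECTOR_SIZE[fmt]
    return [data[off:off + size]
            for off in range(FIRST_OFFSET[fmt], len(data) - size + 1, size)]
```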


Header Structure
================

The header begins with a 15-byte magic number:

    00 01 00 00 00 02 00 01 00 05 00 02 00 02 C0

In fmp12, the magic number is followed by the ASCII sequence "HBAM7". This
sequence can be used to distinguish fp5 files from fmp12 files.

The name of the software that created the file can be found at byte offset
541 in the header. This string is a Pascal string, consisting of a one-byte
length at offset 541 followed by an ASCII, non-terminated string, usually
of the form "Pro X.0", where X is the version number.
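
Both header checks fit in a few lines. This sketch assumes "HBAM7" starts immediately after the magic number (offset 15) and that the creator string is plain ASCII; the function names are hypothetical.

```python
MAGIC = bytes.fromhex("00 01 00 00 00 02 00 01 00 05 00 02 00 02 C0")


def identify_format(header: bytes) -> str:
    """Return "fp5" or "fmp12" based on the header's magic bytes."""
    if not header.startswith(MAGIC):
        raise ValueError("not a FileMaker file: magic number mismatch")
    # Assumption: "HBAM7" directly follows the 15-byte magic number.
    if header[len(MAGIC):len(MAGIC) + 5] == b"HBAM7":
        return "fmp12"
    return "fp5"


def creator_name(header: bytes) -> str:
    """Read the Pascal string (1-byte length + ASCII text) at offset 541."""
    length = header[541]
    return header[542:542 + length].decode("ascii", errors="replace")
```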


Sector Structure
================

Sectors may be unordered; they are arranged as a doubly linked list, and
contain the ID of the previous sector as well as the next sector in the list.
By following the linked list from the beginning, you can traverse the data in
order.

fp5 sector layout:

    Offset  Length  Value
    0       1       Deleted? 1=Yes 0=No
    1       1       Level (Integer)
    2       4       Previous Sector ID (Integer)
    6       4       Next Sector ID (Integer)
    12      2       Payload Length = N (Integer)
    14      N       Payload

fmp12 sector layout:

    Offset  Length  Value
    0       1       Deleted? 1=Yes 0=No
    1       1       Level (Integer)
    4       4       Previous Sector ID (Integer)
    8       4       Next Sector ID (Integer)
    20      4076    Payload
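
A sketch of reading both sector layouts and following the next-sector links. Assumptions not stated above: the 4-byte IDs and the fp5 payload length are big-endian (like most integers in the file), bytes missing from the tables (fp5 offsets 10-11, fmp12 offsets 2-3 and 12-19) are skipped, and the caller has already built a mapping from sector ID to parsed sector. Because the end-of-list marker is not documented, the walk simply stops at an unknown or repeated ID. All names are hypothetical.

```python
import struct
from typing import Iterator, NamedTuple


class Sector(NamedTuple):
    deleted: bool
    level: int
    prev_id: int
    next_id: int
    payload: bytes


def parse_sector_fp5(raw: bytes) -> Sector:
    # Bytes 0-9: deleted flag, level, previous ID, next ID (big-endian).
    deleted, level, prev_id, next_id = struct.unpack_from(">BBII", raw, 0)
    (length,) = struct.unpack_from(">H", raw, 12)     # payload length N
    return Sector(deleted == 1, level, prev_id, next_id, raw[14:14 + length])


def parse_sector_fmp12(raw: bytes) -> Sector:
    prev_id, next_id = struct.unpack_from(">II", raw, 4)
    return Sector(raw[0] == 1, raw[1], prev_id, next_id, raw[20:4096])


def walk(sectors: dict[int, Sector], first_id: int) -> Iterator[Sector]:
    """Yield sectors in list order by following next_id links."""
    seen: set[int] = set()
    sid = first_id
    while sid in sectors and sid not in seen:
        seen.add(sid)
        yield sectors[sid]
        sid = sectors[sid].next_id
```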

The "Payload" is a byte-code stream that can be used to construct a series
of data chunks. For our purposes, there are six kinds of chunks:

* Path "push" operation (integer or byte sequence)
* Path "pop" operation
* Simple data (byte sequence)
* Segmented data (segment index + byte sequence)
* Simple key-value pair (integer => byte sequence)
* Long key-value pair (byte sequence => byte sequence)

The path operations define the logical position of the other kinds of data,
and are central to extracting data from the file. Together they form a
primitive sort of "file system" whose "folders" are usually (but not always)
integers.

For example, the file may "push" the numbers 3, 1, and 5 onto the path, in
which case the next piece of data will have a path address of [3].[1].[5].
After a "pop" operation, the next piece of data will have the address [3].[1],
and so on.
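
A small stack is enough to follow these push and pop operations. This is a sketch: the class name, returning the address as a tuple (handy as a dictionary key), and raising on a pop from an empty path are choices made for this example, not documented behavior.

```python
class PathTracker:
    """Track the current path address while decoding chunks in order."""

    def __init__(self) -> None:
        self._stack: list = []

    def push(self, component) -> None:
        # Components are usually integers, but may be byte sequences.
        self._stack.append(component)

    def pop(self) -> None:
        if not self._stack:
            raise ValueError("path pop with an empty path")
        self._stack.pop()

    def address(self) -> tuple:
        return tuple(self._stack)

    def __str__(self) -> str:
        return ".".join(f"[{c}]" for c in self._stack)


p = PathTracker()
for n in (3, 1, 5):
    p.push(n)
print(p)  # [3].[1].[5]
p.pop()
print(p)  # [3].[1]
```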

A "simple data" chunk is just a sequence of bytes; its path will determine how
to interpret its contents. Most byte sequences in fmp12 need to be "decrypted"
by XOR'ing every byte with the hex value 0x5A.
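
The XOR step is a one-liner. Which byte sequences need it depends on their path, so this sketch leaves that decision to the caller; the function name is hypothetical.

```python
def fmp12_unxor(data: bytes) -> bytes:
    """Undo fmp12's XOR obfuscation; applying it twice returns the input."""
    return bytes(b ^ 0x5A for b in data)


print(fmp12_unxor(b"\x12\x33"))  # b'Hi'
```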

Segmented data refers to data that does not fit into a single chunk, or even
into a single sector. Typically, large strings or objects are split into
1000-byte segments that share a path. Each segmented data chunk includes a
sequential index that can be used to reconstruct the large object.
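
Reassembly amounts to grouping segments by path and joining them in index order. A sketch, assuming the caller supplies hypothetical `(path, segment_index, data)` tuples; sorting makes it indifferent to whether indexes start at 0 or 1, and it does not check for missing segments.

```python
from collections import defaultdict


def reassemble_segments(chunks):
    """Group (path, segment_index, data) tuples by path and join each group
    in index order. Returns {path: bytes}."""
    by_path = defaultdict(dict)
    for path, index, data in chunks:
        by_path[path][index] = data
    return {path: b"".join(segs[i] for i in sorted(segs))
            for path, segs in by_path.items()}
```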

Key-value pairs are the most common kind of chunk; multiple key-value pairs
with the same path can represent associative arrays or records. The keys may be
integers or strings (but usually integers), and the values are byte sequences.
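
Collecting key-value pairs into per-path dictionaries gives those records a convenient shape. A sketch, assuming hypothetical `(path, key, value)` tuples with paths as tuples; letting a later duplicate key overwrite an earlier one is a choice of this example, not documented behavior.

```python
from collections import defaultdict


def group_key_values(chunks):
    """Collect (path, key, value) tuples into {path: {key: value}}.

    A later pair with the same path and key overwrites an earlier one."""
    records = defaultdict(dict)
    for path, key, value in chunks:
        records[path][key] = value
    return dict(records)
```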

The "Codes" sections will describe the byte codes that can be used to decode
the six chunk types. By implementing them, any FileMaker file can be read
into memory. The "Path Structure" sections will describe how to convert these
raw chunks into meaningful data structures.


fp5 Codes
=========

Each chunk can usually be identified by its first byte, although in a few cases
examining the second byte is necessary.

The possible chunk types and structures in fp5 files are:


Simple key-value
~~~~~~~~~~~~~~~~

    Offset  Length  Value
    0       1       0x00
    1       1       N = Length (Integer)
    2       N       Value (Bytes)

Key = 0x00 (Integer)

    Offset  Length  Value
    0       1       0x40 <= C <= 0x7F
    1       1       N = Length (Integer)
    2       N       Value (Bytes)

Key = C - 0x40 (Integer)

    Offset  Length  Value
    0       2       0xFF C, where 0x40 <= C <= 0x80
    2       C-0x40  Key (Bytes)
    C-0x3E  2       N = Length (Integer)
    C-0x3C  N       Value (Bytes)


Long key-value
~~~~~~~~~~~~~~

    Offset  Length  Value
    0       1       0x01 <= C <= 0x3F
    1       1       K = Key Length (Integer)
    2       K       Key (Bytes)
    2+K     1       N = Length (Integer)
    2+K+1   N       Value (Bytes)

    Offset  Length  Value
    0       2       0xFF K, where 0x01 <= K <= 0x04
    2       K       Key (Bytes)
    2+K     2       N = Length (Integer)
    2+K+2   N       Value (Bytes)


Simple data
~~~~~~~~~~~

    Offset  Length  Value
    0       1       0x80 <= C <= 0xBF
    1       C-0x80  Value (Bytes)


Path pop
~~~~~~~~

    Offset  Length  Value
    0       1       0xC0


Path push
~~~~~~~~~

    Offset  Length  Value
    0       1       0xC1 <= C <= 0xFE
    1       C-0xC0  Value (Bytes)
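
The fp5 codes above fit in one dispatch loop. This sketch decodes a single sector payload into tuples such as `("push", bytes)` or `("kv", key, value)`. It assumes the 2-byte lengths are big-endian and that no chunk spans two sectors, leaves path integers and text undecoded, and raises on truncated input or on an 0xFF code whose second byte falls outside the documented ranges. The function name and tuple shapes are hypothetical.

```python
import struct


def parse_fp5_chunks(payload: bytes) -> list:
    """Decode an fp5 sector payload into a list of chunk tuples."""
    chunks = []
    i = 0

    def take(start: int, length: int) -> bytes:
        if start + length > len(payload):
            raise ValueError(f"chunk at offset {i} runs past end of payload")
        return payload[start:start + length]

    while i < len(payload):
        c = payload[i]
        if c == 0x00 or 0x40 <= c <= 0x7F:           # simple key-value
            n = take(i + 1, 1)[0]
            key = 0 if c == 0x00 else c - 0x40
            chunks.append(("kv", key, take(i + 2, n)))
            i += 2 + n
        elif 0x01 <= c <= 0x3F:                       # long key-value, 1-byte length
            k = take(i + 1, 1)[0]
            n = take(i + 2 + k, 1)[0]
            chunks.append(("long_kv", take(i + 2, k), take(i + 3 + k, n)))
            i += 3 + k + n
        elif 0x80 <= c <= 0xBF:                       # simple data
            n = c - 0x80
            chunks.append(("data", take(i + 1, n)))
            i += 1 + n
        elif c == 0xC0:                               # path pop
            chunks.append(("pop",))
            i += 1
        elif 0xC1 <= c <= 0xFE:                       # path push
            n = c - 0xC0
            chunks.append(("push", take(i + 1, n)))
            i += 1 + n
        else:                                         # 0xFF: second byte decides
            c2 = take(i + 1, 1)[0]
            if 0x01 <= c2 <= 0x04:                    # long key-value, 2-byte length
                k, kind = c2, "long_kv"
            elif 0x40 <= c2 <= 0x80:                  # simple key-value, byte key
                k, kind = c2 - 0x40, "kv"
            else:
                raise ValueError(f"unknown code 0xFF 0x{c2:02X} at offset {i}")
            (n,) = struct.unpack(">H", take(i + 2 + k, 2))
            chunks.append((kind, take(i + 2, k), take(i + 4 + k, n)))
            i += 4 + k + n
    return chunks
```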


fmp12 Codes
===========

As with the fp5 codes, each chunk can usually be identified by its first byte,
although in a few cases examining the second byte is necessary.

The possible chunk types and structures are:


Simple data
~~~~~~~~~~~

    Offset  Length        Value
    0       1             0x00
    1       1             Value (Bytes)

    Offset  Length        Value
    0       1             0x08
    1       2             Value (Bytes)

    Offset  Length        Value
    0       2             0x0E 0xFF
    2       5             Value (Bytes)

    Offset  Length        Value
    0       1             0x10 <= C <= 0x11
    1       3+(C-0x10)    Value (Bytes)

    Offset  Length        Value
    0       1             0x12 <= C <= 0x15
    1       1+2*(C-0x10)  Value (Bytes)

    Offset  Length        Value
    0       1             0x19 or 0x23
    1       1             Value (Bytes)

    Offset  Length        Value
    0       1             0x1A <= C <= 0x1D
    1       2*(C-0x19)    Value (Bytes)


Simple key-value
~~~~~~~~~~~~~~~~

    Offset  Length        Value
    0       1             0x01
    1       1             Key (Integer)
    2       1             Value (Bytes)

    Offset  Length        Value
    0       1             0x02 <= C <= 0x05
    1       1             Key (Integer)
    2       2*(C-1)       Value (Bytes)

    Offset  Length        Value
    0       1             0x06
    1       1             Key (Integer)
    2       1             N = Length (Integer)
    3       N             Value (Bytes)

    Offset  Length        Value
    0       1             0x09
    1       2             Key (Path Integer)
    3       1             Value (Bytes)

    Offset  Length        Value
    0       1             0x0A <= C <= 0x0D
    1       2             Key (Path Integer)
    3       2*(C-9)       Value (Bytes)

    Offset  Length        Value
    0       1             0x0E
    1       2             Key (Path Integer)
    3       1             N = Length (Integer)
    4       N             Value (Bytes)


Long key-value
~~~~~~~~~~~~~~

    Offset  Length        Value
    0       1             0x16
    1       3             Key (Bytes)
    4       1             N = Length (Integer)
    5       N             Value (Bytes)

    Offset  Length        Value
    0       1             0x17
    1       3             Key (Bytes)
    4       2             N = Length (Integer)
    6       N             Value (Bytes)

    Offset  Length        Value
    0       1             0x1E
    1       1             K = Key Length (Integer)
    2       K             Key (Bytes)
    2+K     1             N = Value Length (Integer)
    2+K+1   N             Value (Bytes)

    Offset  Length        Value
    0       1             0x1F
    1       1             K = Key Length (Integer)
    2       K             Key (Bytes)
    2+K     2             N = Value Length (Integer)
    2+K+2   N             Value (Bytes)


Segmented data
~~~~~~~~~~~~~~

    Offset  Length        Value
    0       1             0x07
    1       1             Segment index (Integer)
    2       2             N = Length (Integer)
    4       N             Value (Bytes)

    Offset  Length        Value
    0       1             0x0F
    1       2             Segment index (Path Integer)
    3       2             N = Length (Integer)
    5       N             Value (Bytes)


Path push
~~~~~~~~~

    Offset  Length        Value
    0       1             0x20 or 0x0E
    1       1             Value (Integer)

    Offset  Length        Value
    0       2             (0x20 or 0x0E) 0xFE
    2       8             Value (Bytes)

    Offset  Length        Value
    0       1             0x28
    1       2             Value (Path Integer)

    Offset  Length        Value
    0       1             0x30
    1       3             Value (Path Integer)

    Offset  Length        Value
    0       1             0x38
    1       1             N = Length (Integer)
    2       N             Value (Bytes)


Path pop
~~~~~~~~

    Offset  Length        Value
    0       1             0x3D or 0x40


No-op
~~~~~

    Offset  Length        Value
    0       1             0x80
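
The fmp12 tables can be turned into a single decoder in the same style. This is a sketch under several assumptions: 2-byte lengths are big-endian, two-byte path-integer keys and segment indexes are decoded with the two-byte rule from the integer section, push values and other byte sequences are returned raw (no XOR or text decoding), and the payload is decoded until the buffer runs out, since how the meaningful part of a 4076-byte payload ends is not covered here. A bare 0x0E (not followed by 0xFF or 0xFE) is listed both as a key-value and as a path push, and this document does not say how to tell them apart, so the sketch raises rather than guess. Names and tuple shapes are hypothetical.

```python
import struct


def parse_fmp12_chunks(payload: bytes) -> list:
    """Decode an fmp12 sector payload into a list of chunk tuples."""
    chunks = []
    i = 0

    def take(start: int, length: int) -> bytes:
        if start + length > len(payload):
            raise ValueError(f"chunk at offset {i} runs past end of payload")
        return payload[start:start + length]

    def byte(pos: int) -> int:
        return take(pos, 1)[0]

    def u16(pos: int) -> int:  # assumed big-endian
        return struct.unpack(">H", take(pos, 2))[0]

    def path_int2(pos: int) -> int:  # two-byte path integer
        b = take(pos, 2)
        return 0x80 + (((b[0] & 0x7F) << 8) | b[1])

    while i < len(payload):
        c = payload[i]
        nxt = payload[i + 1] if i + 1 < len(payload) else None
        if c == 0x80:                                   # no-op
            chunk, size = None, 1
        elif c in (0x3D, 0x40):                         # path pop
            chunk, size = ("pop",), 1
        elif c in (0x20, 0x0E) and nxt == 0xFE:         # path push, 8 bytes
            chunk, size = ("push", take(i + 2, 8)), 10
        elif c == 0x0E and nxt == 0xFF:                 # simple data, 5 bytes
            chunk, size = ("data", take(i + 2, 5)), 7
        elif c == 0x0E:                                 # key-value or push?
            raise ValueError(f"ambiguous code 0x0E at offset {i}")
        elif c == 0x20:                                 # path push, 1-byte integer
            chunk, size = ("push", take(i + 1, 1)), 2
        elif c in (0x28, 0x30):                         # path push, path integer
            n = 2 if c == 0x28 else 3
            chunk, size = ("push", take(i + 1, n)), 1 + n
        elif c == 0x38:                                 # path push, byte sequence
            n = byte(i + 1)
            chunk, size = ("push", take(i + 2, n)), 2 + n
        elif c in (0x00, 0x19, 0x23):                   # simple data
            chunk, size = ("data", take(i + 1, 1)), 2
        elif c == 0x08:
            chunk, size = ("data", take(i + 1, 2)), 3
        elif 0x10 <= c <= 0x15 or 0x1A <= c <= 0x1D:
            if c <= 0x11:
                n = 3 + (c - 0x10)
            elif c <= 0x15:
                n = 1 + 2 * (c - 0x10)
            else:
                n = 2 * (c - 0x19)
            chunk, size = ("data", take(i + 1, n)), 1 + n
        elif 0x01 <= c <= 0x05:                         # key-value, 1-byte key
            n = 1 if c == 0x01 else 2 * (c - 1)
            chunk, size = ("kv", byte(i + 1), take(i + 2, n)), 2 + n
        elif c == 0x06:
            n = byte(i + 2)
            chunk, size = ("kv", byte(i + 1), take(i + 3, n)), 3 + n
        elif 0x09 <= c <= 0x0D:                         # key-value, path-integer key
            n = 1 if c == 0x09 else 2 * (c - 9)
            chunk, size = ("kv", path_int2(i + 1), take(i + 3, n)), 3 + n
        elif c == 0x07:                                 # segmented data
            n = u16(i + 2)
            chunk, size = ("segment", byte(i + 1), take(i + 4, n)), 4 + n
        elif c == 0x0F:
            n = u16(i + 3)
            chunk, size = ("segment", path_int2(i + 1), take(i + 5, n)), 5 + n
        elif c in (0x16, 0x17):                         # long key-value, 3-byte key
            n, off = (byte(i + 4), 5) if c == 0x16 else (u16(i + 4), 6)
            chunk, size = ("long_kv", take(i + 1, 3), take(i + off, n)), off + n
        elif c in (0x1E, 0x1F):                         # long key-value, K-byte key
            k = byte(i + 1)
            n, off = (byte(i + 2 + k), 3 + k) if c == 0x1E else (u16(i + 2 + k), 4 + k)
            chunk, size = ("long_kv", take(i + 2, k), take(i + off, n)), off + n
        else:
            raise ValueError(f"unknown code 0x{c:02X} at offset {i}")
        if chunk is not None:
            chunks.append(chunk)
        i += size
    return chunks
```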


fp5 Path Structure
==================

fp5 files can contain only one table, which makes things easy. The
known paths are:

[1]: Some kind of word index?

[3].[1]: Column names => Index pairs (String key, Integer value)

    These column names are uppercase.

[3].[5].[X]: Metadata for the Xth column (Key-value pairs)

    [1] => Column name
    [2] => Second byte indicates column type (1=String, 2=Integer)

[5].[X]: Xth record in the table (Path Integer key, String or Integer value)

It appears that later paths located at [32] and up are references to external
FileMaker files on the same hard drive.
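
Given key-value chunks grouped by path, the fp5 columns and records come out of two filters. This sketch assumes a hypothetical input shape of `{path_tuple: {key: value}}` with path components already decoded to integers, decodes column names as MacRoman per the text-encoding section, and reads the column type from the second byte of key [2]; the function names are hypothetical.

```python
COLUMN_TYPES = {1: "string", 2: "integer"}


def fp5_columns(kv: dict) -> dict:
    """Map column number X to (name, type) using paths [3].[5].[X]."""
    columns = {}
    for path, pairs in kv.items():
        if len(path) == 3 and path[:2] == (3, 5):
            name = pairs.get(1, b"").decode("mac_roman")
            type_field = pairs.get(2, b"")
            ctype = "unknown"
            if len(type_field) > 1:
                ctype = COLUMN_TYPES.get(type_field[1], "unknown")
            columns[path[2]] = (name, ctype)
    return columns


def fp5_records(kv: dict) -> dict:
    """Map record number X to its {key: value} pairs using paths [5].[X]."""
    return {path[1]: pairs for path, pairs in kv.items()
            if len(path) == 2 and path[0] == 5}
```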


fmp12 Path Structure
====================

fmp12 introduced the ability to store multiple tables in one file. Individual
tables have a similar layout to the fp5 files, but are stored in a root path
with a value of 128 or above.

For example, if the first table is stored at path [130], that table's column
metadata can be found at [130].[3].[5].

The semantics are slightly changed, as documented below. fmp12 appears to
eliminate the Integer column type in favor of all Strings.

[4].[1].[7].[X]: Metadata about the Xth table

    [16] => Table name

[128+X].[3].[5].[Y]: Metadata for the Yth column of the Xth table

[128+X].[5].[Y]: Yth record in the Xth table (Path Integer key, String value)

Note that the table numbers X are not necessarily contiguous.