Introduces a proper imperative description of how the current API documentation build system works. Refs: https://github.com/nodejs/next-10/issues/169 PR-URL: https://github.com/nodejs/node/pull/45270 Reviewed-By: Michael Dawson <midawson@redhat.com>
Node.js API Documentation Tooling
The Node.js API documentation is generated by in-house tooling that resides within the tools/doc directory.
The build process (using make doc) uses this tooling to parse the Markdown files in doc/api and generate the following:
- Human-readable HTML in out/doc/api/*.html
- A JSON representation in out/doc/api/*.json
These are published to nodejs.org for multiple versions of Node.js. As an example, the latest version of the human-readable HTML is published to nodejs.org/en/doc, and the latest version of the JSON documentation is published to nodejs.org/api/all.json.
The key things to know about the tooling include:
- The entry-point is tools/doc/generate.mjs.
- The tooling supports the CLI arguments listed in the table below.
- The tooling processes one file at a time.
- The tooling uses a set of dependencies as described in the dependencies section.
- The tooling parses the input files and does several transformations to the AST (Abstract Syntax Tree).
- The tooling generates a JSON output that contains the metadata and content of the Markdown file.
- The tooling generates an HTML output that contains a human-readable and ready-to-view version of the file.
This documentation explains the existing tooling processes to allow easier maintenance and evolution of the tooling. It is not meant to be a guide on how to write documentation for Node.js.
Vocabulary & Good to Know's
- AST means "Abstract Syntax Tree" and it is a data structure that represents the structure of a certain data format. In our case, the AST is a "graph" representation of the contents of the Markdown file.
- MDN means Mozilla Developer Network and it is a website that contains documentation for web technologies. We use it as a reference for the structure of the documentation.
- The Stability Index is used to communicate the stability of a given Node.js module. The stability levels include:
- Stability 0: Deprecated. (This module is Deprecated)
- Stability 1: Experimental. (This module is Experimental)
- Stability 2: Stable. (This module is Stable)
- Stability 3: Legacy. (This module is Legacy)
- Within Remark, YAML snippets <!-- something --> are considered HTML nodes. That's because YAML isn't valid Markdown content (it doesn't abide by the Markdown spec).
- "New Tooling" refers to the (written from scratch) API build tooling introduced in nodejs/nodejs.dev that might replace the current one from nodejs/node.
CLI Arguments
The tooling requires a filename argument and supports extra arguments (some also required) as shown below:
Argument | Description | Required | Example |
---|---|---|---|
--node-version= | The version of Node.js that is being documented. It defaults to process.version, which is supplied by Node.js itself. | No | v19.0.0 |
--output-directory= | The directory where the output files will be generated. | Yes | ./out/api/ |
--apilinks= | This file is used as an index to specify the source file for each module. | No | ./out/doc/api/apilinks.json |
--versions-file= | This file is used to specify an index of all previous versions of Node.js. It is used for the Version Navigation on the API docs page. | No | ./out/previous-doc-versions.json |
Note: The files passed to both the apilinks and versions-file parameters are generated by the Node.js build process (Makefile). Each of them contains a JSON object.
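As an illustration of the argument shape described above, here is a minimal sketch of parsing a positional filename plus `--key=value` flags. The `parseDocArgs` helper and its option names are hypothetical and are not taken from the tooling's source; only the flag names, the `process.version` default, and the required output directory come from the table.

```javascript
// Hypothetical helper: parses `--key=value` flags plus one positional filename.
function parseDocArgs(argv) {
  // `--node-version` defaults to process.version, as stated in the table.
  const options = { filename: null, nodeVersion: process.version };
  const flagNames = {
    '--node-version': 'nodeVersion',
    '--output-directory': 'outputDirectory',
    '--apilinks': 'apilinks',
    '--versions-file': 'versionsFile',
  };

  for (const arg of argv) {
    if (!arg.startsWith('--')) {
      options.filename = arg; // The positional argument is the Markdown file.
      continue;
    }
    const separator = arg.indexOf('=');
    const name = separator === -1 ? arg : arg.slice(0, separator);
    const value = separator === -1 ? '' : arg.slice(separator + 1);
    if (flagNames[name]) options[flagNames[name]] = value;
  }

  // The filename and the output directory are the required inputs.
  if (!options.filename) throw new Error('No input file specified');
  if (!options.outputDirectory) throw new Error('No output directory specified');
  return options;
}

const parsed = parseDocArgs([
  'doc/api/fs.md',
  '--node-version=v19.0.0',
  '--output-directory=./out/api/',
]);
console.log(parsed);
```

Unknown flags are silently ignored in this sketch; whether the real tooling does the same is not stated here.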
Basic Usage
# cd tools/doc
npm run node-doc-generator ${filename}
OR
# nodejs/node root directory
make doc
Dependencies and how the Tooling works internally
The API tooling uses an AST-based library called unified to process the input file as a graph that supports easy modification and update of its nodes.
In addition to unified, we also use Remark for manipulating the Markdown part, and Rehype to help convert to and from Markdown.
What are the steps of the internal tooling?
The tooling uses unified's pipe-like engine to chain each part of the process.
(The description below is a simplified version)
- It starts by reading the Frontmatter section of the Markdown file with remark-frontmatter.
- Then the tooling parses the Markdown by using remark-parse and adds support for GitHub Flavoured Markdown.
- The tooling proceeds by parsing some of the Markdown nodes and transforming them to HTML.
- The tooling proceeds to generate the JSON output of the file.
- Finally it does its final node transformations and generates a stringified HTML.
- It then stores the output to a JSON file and adds extra styling to the HTML and then stores the HTML file.
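The chained steps above can be pictured with a small, dependency-free sketch. This is a hand-rolled stand-in for a pipe-like processor, not the unified API: `createPipeline`, `parseMarkdown`, `toJson` and `toHtml` are hypothetical names, and the "parsing" is deliberately naive.

```javascript
// Hand-rolled stand-in for a pipe-like processor; NOT the unified API.
function createPipeline() {
  const steps = [];
  return {
    // Registers a step and returns the pipeline so calls can be chained.
    use(step) {
      steps.push(step);
      return this;
    },
    // Feeds the output of each step into the next one.
    run(input) {
      return steps.reduce((value, step) => step(value), input);
    },
  };
}

// Hypothetical, heavily simplified steps operating on a plain "file" object.
const parseMarkdown = (file) => ({
  ...file,
  tree: file.source
    .split('\n')
    .filter(Boolean)
    .map((line) =>
      line.startsWith('# ')
        ? { type: 'heading', value: line.slice(2) }
        : { type: 'paragraph', value: line }),
});

const toJson = (file) => ({ ...file, json: JSON.stringify({ nodes: file.tree }) });

const toHtml = (file) => ({
  ...file,
  html: file.tree
    .map((node) =>
      node.type === 'heading' ? `<h1>${node.value}</h1>` : `<p>${node.value}</p>`)
    .join('\n'),
});

const result = createPipeline()
  .use(parseMarkdown)
  .use(toJson)
  .use(toHtml)
  .run({ source: '# File system\n\nStability: 2 - Stable' });

console.log(result.html);
```

The point of the design is that every step receives the result of the previous one, so the JSON and HTML outputs are derived from the same parsed tree.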
What each file is responsible for?
The files listed below are the ones referenced and actually used during the build process of the API docs as we see on https://nodejs.org/api. The remaining files from the directory might be used by other steps of the Node.js Makefile or might even be deprecated/remnant of old processes and might need to be revisited/removed.
- html.mjs: Responsible for transforming nodes by decorating them with visual artifacts for the HTML pages.
  - For example, transforming man or JS doc references to links correctly referring to the respective external documentation.
- json.mjs: Responsible for generating the JSON output of the file.
  - It is mostly responsible for going through the whole Markdown file and generating a JSON object that represents the metadata of a specific module.
  - For example, for the FS module, it will generate an object with all its methods, events and classes, and use several regular expressions (RegEx) for extracting the information needed.
- generate.mjs: Main entry-point of doc generation for a specific file. It does e2e processing of a documentation file.
- allhtml.mjs: A script executed after all files are generated to create a single "all" page containing all the HTML documentation.
- alljson.mjs: A script executed after all files are generated to create a single "all" page containing all the JSON entries.
- markdown.mjs: Contains a utility to replace Markdown links to work with the https://nodejs.org/api/ website.
- common.mjs: Contains a few utility functions that are used by the other files.
- type-parser.mjs: Used to replace "type references" (e.g. "String", or "Buffer") with links to the correct internal/external documentation pages (i.e. MDN or other Node.js documentation pages).
Note: Other files not mentioned here might be used during the process but are not relevant to the generation of the API docs themselves. You will notice that a lot of the logic within the build process is specific to the current https://nodejs.org/api/ infrastructure, such as adding JavaScript snippets and styles, transforming certain Markdown elements into HTML, and adding certain HTML classes.
Note: Regarding the previous note, we're currently working on an API tooling that is generic and independent of the current nodejs.org infrastructure. The new tooling, which is already functional, is available at the nodejs.dev repository and uses plain regular expressions (no AST) and MDX.
The Build Process
The build process that happens in generate.mjs follows the steps below:
- Links within the Markdown are replaced directly within the source Markdown (AST) (markdown.replaceLinks)
  - This happens within markdown.mjs and basically it adds suffixes or modifies link references within the Markdown
  - This is necessary for the https://nodejs.org infrastructure as all pages are suffixed with .html
- Text (and some YAML) Nodes are transformed/modified through html.preprocessText
- JSON output is generated through json.jsonAPI
- The title of the page is inferred through html.firstHeader
- Nodes are transformed into HTML Elements through html.preprocessElements
- The HTML Table of Contents (ToC) is generated through html.buildToc
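To make the first step more concrete, here is a minimal sketch of the kind of rewrite that link replacement performs. It is a string-based approximation under the assumption that relative links point at `.md` files; the real `markdown.replaceLinks` works on link references in the AST, and `replaceMarkdownLinks` is a hypothetical name.

```javascript
// Hypothetical rewrite: `[text](fs.md#anchor)` becomes `[text](fs.html#anchor)`.
// Only bare relative file names are matched; absolute URLs are left untouched
// because ":" and "/" are not part of the allowed file-name characters.
function replaceMarkdownLinks(source) {
  return source.replace(
    /\]\(([\w-]+)\.md(#[^)]*)?\)/g,
    (match, page, hash = '') => `](${page}.html${hash})`,
  );
}

const rewritten = replaceMarkdownLinks(
  'See [fs](fs.md#fsreadfilepath), [http](http.md) and [docs](https://example.com/a.md).',
);
console.log(rewritten);
```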
html.mjs
This file is responsible for AST node transformations that either decorate Markdown nodes with more data or transform them into HTML nodes that serve a certain visual purpose; for example, generating the "Added in" label, the Source Links, the Stability Index, or the History table.
Note: Methods not listed below are either not relevant or are utility methods for string/array/object manipulation (i.e. they are used by the other methods mentioned below).
preprocessText
New Tooling: Most of the features within this method are available within the new tooling.
This method does two things:
- Replaces the Source Link YAML entry <!-- source_link= --> with a "Source Link" HTML anchor element.
- Replaces type references within the Markdown (text) (e.g. "String", "Buffer") with the correct HTML anchor element that links to the correct documentation page.
  - The original node then gets mutated from text to HTML.
  - It also updates references to Linux "MAN" pages to Web versions of them.
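The type-reference replacement can be sketched as follows. This assumes type references are written in curly braces (for example `{Buffer}`), and the `typeLinks` table, its URLs, the `class="type"` attribute and both function names are illustrative only; they are not copied from type-parser.mjs or html.mjs.

```javascript
// Illustrative lookup table; NOT the table used by the real tooling.
const typeLinks = {
  string: 'https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String',
  Buffer: 'buffer.html#class-buffer',
};

// Replaces known `{type}` references with anchors; unknown ones stay as-is.
function linkTypes(text) {
  return text.replace(/\{([\w.]+)\}/g, (match, name) =>
    Object.prototype.hasOwnProperty.call(typeLinks, name)
      ? `<a href="${typeLinks[name]}" class="type">&lt;${name}&gt;</a>`
      : match);
}

// Mutates a text node into an HTML node when at least one type was linked.
function preprocessTextNode(node) {
  const value = linkTypes(node.value);
  if (value !== node.value) {
    node.type = 'html';
    node.value = value;
  }
  return node;
}

const node = preprocessTextNode({
  type: 'text',
  value: 'Returns a {Buffer} or a {string}; see {Foo}.',
});
console.log(node.type, node.value);
```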
firstHeader
New Tooling: All features within this method are available within the new Tooling.
This method attempts to extract the first heading of the page (recursively) to define the "title" of the page.
Note: As all API Markdown files start with a Heading, this could possibly be simplified.
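A recursive search of this kind might look like the sketch below. The node shapes follow the usual mdast layout (`type`, `children`, `value`), while `findFirstHeading` and `textOf` are hypothetical helpers rather than the functions in html.mjs.

```javascript
// Depth-first search for the first node of type "heading".
function findFirstHeading(node) {
  if (node.type === 'heading') return node;
  for (const child of node.children || []) {
    const found = findFirstHeading(child);
    if (found) return found;
  }
  return null;
}

// Concatenates the text of a node and all of its descendants.
function textOf(node) {
  if (typeof node.value === 'string') return node.value;
  return (node.children || []).map(textOf).join('');
}

const tree = {
  type: 'root',
  children: [
    { type: 'html', value: '<!--introduced_in=v0.10.0-->' },
    {
      type: 'heading',
      depth: 1,
      children: [{ type: 'text', value: 'File system' }],
    },
    { type: 'paragraph', children: [{ type: 'text', value: 'Some text.' }] },
  ],
};

const firstHeading = findFirstHeading(tree);
const title = firstHeading ? textOf(firstHeading) : '';
console.log(title);
```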
preprocessElements
New Tooling: All features within this method are available within the new tooling.
This method is responsible for multiple transformations of the AST nodes, mostly transforming source nodes into their respective HTML elements, such as:
- Updating Markdown code blocks by adding language highlighting
  - It also adds the "CJS"/"MJS" switch to nodes that are followed by their CJS/ESM equivalents.
- Increasing the heading level of each heading
- Parsing YAML blocks and transforming them into HTML elements (see more at the parseYAML method)
- Updating blockquotes that are prefixed by the word "Stability" into a Stability Index HTML element.
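Two of the transformations listed above, the heading level increase and the Stability blockquote, can be sketched like this. The CSS class names, the cap at heading level 6, and the `transformElement` and `textOf` helpers are assumptions made for illustration and may differ from what html.mjs emits.

```javascript
// Concatenates the text of an mdast-like node (hypothetical helper).
function textOf(node) {
  if (typeof node.value === 'string') return node.value;
  return (node.children || []).map(textOf).join('');
}

function transformElement(node) {
  if (node.type === 'heading') {
    // Increase the heading level; capping at 6 is an assumption of this sketch.
    return { ...node, depth: Math.min(node.depth + 1, 6) };
  }
  if (node.type === 'blockquote') {
    const text = textOf(node);
    const match = /^Stability: (\d)/.exec(text);
    if (match) {
      // Illustrative markup only; the real class names may be different.
      return {
        type: 'html',
        value: `<div class="stability stability-${match[1]}">${text}</div>`,
      };
    }
  }
  return node; // Everything else is passed through unchanged.
}

const heading = transformElement({
  type: 'heading',
  depth: 1,
  children: [{ type: 'text', value: 'File system' }],
});
const stability = transformElement({
  type: 'blockquote',
  children: [
    { type: 'paragraph', children: [{ type: 'text', value: 'Stability: 2 - Stable' }] },
  ],
});
console.log(heading.depth, stability.value);
```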
parseYAML
New Tooling: Most of the features within this method are available within the new tooling.
This method is responsible for parsing the <!-- YAML snippets --> and transforming them into HTML elements.
It follows a "schema" that consists of the following options:
YAML Key | Description | Example | Example Result | Available on new tooling |
---|---|---|---|---|
added | It's used to reference when a certain "module", "class" or "method" was added on Node.js | added: v0.1.90 | Added in: v0.1.90 | Yes |
deprecated | It's used to reference when a certain "module", "class" or "method" was deprecated on Node.js | deprecated: v0.1.90 | Deprecated since: v0.1.90 | Yes |
removed | It's used to reference when a certain "module", "class" or "method" was removed on Node.js | removed: v0.1.90 | Removed in: v0.1.90 | No |
changes | It's used to describe all the changes (historical ones) that happened within a certain "module", "class" or "method" in Node.js | [{ version: v0.1.90, pr-url: '', description: '' }] | -- | Yes |
napiVersion | It's used to describe in which version of the N-API this "module", "class" or "method" is available within Node.js | napiVersion: 1 | N-API version: 1 | Yes |
Note: The changes field gets prepended with the added, deprecated and removed fields if they exist. The table only gets generated if a changes field exists. In the new tooling only "added" is prepended for now.
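The schema and the note above can be summarised in a short sketch that starts from an already parsed metadata object, so no YAML parser is involved. The label texts come from the "Example Result" column; `renderMeta`, `buildHistory`, the `<span>` markup and the wording of the prepended entries are hypothetical.

```javascript
// Labels taken from the "Example Result" column of the table.
const labels = {
  added: 'Added in',
  deprecated: 'Deprecated since',
  removed: 'Removed in',
  napiVersion: 'N-API version',
};

// Renders one label per known key; the <span> markup is illustrative only.
function renderMeta(meta) {
  const parts = [];
  for (const [key, label] of Object.entries(labels)) {
    if (meta[key] === undefined) continue;
    const versions = [].concat(meta[key]).join(', ');
    parts.push(`<span>${label}: ${versions}</span>`);
  }
  return parts.join(' ');
}

// Builds the history rows: no `changes` field means no table at all;
// otherwise added/deprecated/removed entries are placed before the changes.
function buildHistory(meta) {
  if (!Array.isArray(meta.changes)) return [];
  const lifecycle = [];
  for (const key of ['added', 'deprecated', 'removed']) {
    if (meta[key] !== undefined) {
      lifecycle.push({ version: meta[key], description: `${labels[key]}: ${meta[key]}` });
    }
  }
  return [...lifecycle, ...meta.changes];
}

const meta = {
  added: 'v0.1.90',
  changes: [{ version: 'v10.0.0', 'pr-url': '', description: 'Example change.' }],
};
console.log(renderMeta(meta));
console.log(buildHistory(meta));
```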
buildToc
New Tooling: This feature is natively available within the new tooling through MDX.
This method generates the Table of Contents based on all the Headings of the Markdown file.
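As a rough illustration, a Table of Contents can be derived from a flat list of headings as in the sketch below. The slug scheme, the indentation-based nesting and the standalone `buildToc` function shown here are assumptions for the example, not the behaviour of html.buildToc.

```javascript
// Builds an indented list of anchors from { depth, text } heading records.
function buildToc(headings) {
  return headings
    .map(({ depth, text }) => {
      // Hypothetical slug scheme: lowercase, non-alphanumerics become dashes.
      const id = text
        .toLowerCase()
        .replace(/[^a-z0-9]+/g, '-')
        .replace(/^-|-$/g, '');
      return `${'  '.repeat(depth - 1)}- <a href="#${id}">${text}</a>`;
    })
    .join('\n');
}

const toc = buildToc([
  { depth: 1, text: 'File system' },
  { depth: 2, text: 'fs.readFile(path)' },
]);
console.log(toc);
```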
altDocs
New Tooling: All features within this method are available within the new tooling.
This method generates a version picker for the current page to be shown in older versions of the API docs.
json.mjs
This file is responsible for generating a JSON object that (supposedly) is used for IDE-Intellisense or for indexing of all the "methods", "classes", "modules", "events", "constants" and "globals" available within a certain Markdown file.
It attempts a best-effort extraction of the data by using several regular expression patterns (RegEx).
Note: JSON output generation is currently not supported by the new tooling, but it is in the pipeline for development.
jsonAPI
This method traverses all the AST nodes by iterating through each one of them and infers the kind of information each node contains through RegEx. Then it mutates the data and appends it to the final JSON object.
For more in-depth information, refer to the json.mjs file, as it contains a lot of comments.
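The idea of inferring the kind of an entry from its heading text can be sketched as follows. The patterns and the `classifyHeading` function are illustrative assumptions; the regular expressions in json.mjs are more numerous and more thorough.

```javascript
// Illustrative patterns only; NOT the ones used by json.mjs.
const patterns = [
  { kind: 'class', regex: /^Class: +(\w[\w.]*)$/ },
  { kind: 'event', regex: /^Event: +['"]?([^'"]+)['"]?$/ },
  { kind: 'method', regex: /^([\w.]+)\(([^)]*)\)$/ },
];

// Returns the first matching kind, falling back to "misc".
function classifyHeading(text) {
  for (const { kind, regex } of patterns) {
    const match = regex.exec(text);
    if (match) return { kind, name: match[1] };
  }
  return { kind: 'misc', name: text };
}

const entries = [
  'Class: Buffer',
  "Event: 'close'",
  'fs.readFile(path[, options], callback)',
  'Usage',
].map(classifyHeading);

console.log(JSON.stringify(entries, null, 2));
```

Each classified entry would then be appended to the JSON object for the module, which is the "mutate and append" step described above.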