mirror of https://github.com/nodejs/node.git synced 2024-11-25 08:19:38 +01:00
nodejs/doc/contributing/api-documentation.md
Claudio Wunder bf28da8617 tools: add documentation regarding our api tooling
Introduces a proper imperative description of how the
current API documentation build system works.

Refs: https://github.com/nodejs/next-10/issues/169

PR-URL: https://github.com/nodejs/node/pull/45270
Reviewed-By: Michael Dawson <midawson@redhat.com>
2022-11-09 13:18:46 -05:00

Node.js API Documentation Tooling

The Node.js API documentation is generated by in-house tooling that resides in the tools/doc directory.

The build process (using make doc) uses this tooling to parse the markdown files in doc/api and generate the following:

  1. Human-readable HTML in out/doc/api/*.html
  2. A JSON representation in out/doc/api/*.json

These are published to nodejs.org for multiple versions of Node.js. For example, the latest version of the human-readable HTML is published to nodejs.org/en/doc, and the latest version of the JSON documentation is published to nodejs.org/api/all.json.

The key things to know about the tooling include:

  1. The entry-point is tools/doc/generate.mjs.
  2. The tooling supports the CLI arguments listed in the table below.
  3. The tooling processes one file at a time.
  4. The tooling uses a set of dependencies as described in the dependencies section.
  5. The tooling parses the input files and does several transformations to the AST (Abstract Syntax Tree).
  6. The tooling generates a JSON output that contains the metadata and content of the Markdown file.
  7. The tooling generates an HTML output that contains a human-readable, ready-to-view version of the file.

This document explains how the existing tooling works, to make the tooling easier to maintain and evolve. It is not meant to be a guide on how to write documentation for Node.js.

Vocabulary & Good to Know's

  • AST means "Abstract Syntax Tree" and it is a data structure that represents the structure of a certain data format. In our case, the AST is a "graph" representation of the contents of the Markdown file.
  • MDN means Mozilla Developer Network and it is a website that contains documentation for web technologies. We use it as a reference for the structure of the documentation.
  • The Stability Index is used to communicate the stability of a given Node.js module. The stability levels include:
    • Stability 0: Deprecated. (This module is Deprecated)
    • Stability 1: Experimental. (This module is Experimental)
    • Stability 2: Stable. (This module is Stable)
    • Stability 3: Legacy. (This module is Legacy)
  • Within Remark, YAML snippets <!-- something --> are considered HTML nodes because YAML isn't valid Markdown content (it doesn't abide by the Markdown spec).
  • "New Tooling" refers to the (written from scratch) API build tooling introduced in nodejs/nodejs.dev that might replace the current one from nodejs/node.
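
To make the AST and HTML-node points above concrete, here is a minimal sketch of what a parsed Markdown tree might look like. The node shapes loosely follow the mdast convention used by Remark (type, children, value), but the tree is hand-written for illustration and the walk helper is hypothetical, not part of the tooling.

```javascript
// Hand-written tree that loosely mirrors the mdast shape (assumption: real
// trees carry more fields, such as position data).
const tree = {
  type: 'root',
  children: [
    { type: 'heading', depth: 1, children: [{ type: 'text', value: 'File system' }] },
    // A YAML snippet lives inside an HTML comment, so it shows up as an `html` node.
    { type: 'html', value: '<!-- YAML\nadded: v0.1.90\n-->' },
  ],
};

// Hypothetical helper: depth-first walk that calls `visitor` on every node.
function walk(node, visitor) {
  visitor(node);
  for (const child of node.children ?? []) walk(child, visitor);
}

const types = [];
walk(tree, (node) => types.push(node.type));
console.log(types.join(' > '));
```

Because every node is a plain object, a transformation step can replace or mutate nodes in place while walking, which is the property the tooling relies on.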

CLI Arguments

The tooling requires a filename argument and supports extra arguments (some also required) as shown below:

| Argument | Description | Required | Example |
| --- | --- | --- | --- |
| --node-version= | The version of Node.js that is being documented. It defaults to process.version, which is supplied by Node.js itself. | No | v19.0.0 |
| --output-directory= | The directory where the output files will be generated. | Yes | ./out/api/ |
| --apilinks= | This file is used as an index to specify the source file for each module. | No | ./out/doc/api/apilinks.json |
| --versions-file= | This file is used to specify an index of all previous versions of Node.js. It is used for the Version Navigation on the API docs page. | No | ./out/previous-doc-versions.json |

Note: Both the apilinks and versions-file parameters are files generated by the Node.js build process (Makefile), and each contains a JSON object.
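
As a rough illustration of how such arguments could be read, the sketch below parses `--key=value` flags plus one positional filename. The flag names come from the table above; the function name parseDocArgs, the option keys, and the error messages are assumptions made for this example and are not taken from generate.mjs.

```javascript
// Hypothetical parser for `--key=value` flags plus one positional filename.
// The flag names come from the table above; the parsing logic is an assumption.
function parseDocArgs(argv, defaultVersion = process.version) {
  const options = {
    filename: null,
    nodeVersion: defaultVersion,
    outputDirectory: null,
    apilinks: null,
    versionsFile: null,
  };
  const flagToKey = {
    '--node-version': 'nodeVersion',
    '--output-directory': 'outputDirectory',
    '--apilinks': 'apilinks',
    '--versions-file': 'versionsFile',
  };
  for (const arg of argv) {
    // Anything that is not a flag is treated as the input filename.
    if (!arg.startsWith('--')) { options.filename = arg; continue; }
    const eq = arg.indexOf('=');
    const key = flagToKey[eq === -1 ? arg : arg.slice(0, eq)];
    if (key === undefined || eq === -1) throw new Error(`Unsupported argument: ${arg}`);
    options[key] = arg.slice(eq + 1);
  }
  if (!options.filename) throw new Error('A filename argument is required');
  if (!options.outputDirectory) throw new Error('--output-directory is required');
  return options;
}

const parsed = parseDocArgs([
  'doc/api/fs.md',
  '--node-version=v19.0.0',
  '--output-directory=./out/api/',
]);
console.log(parsed);
```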

Basic Usage

# cd tools/doc
npm run node-doc-generator ${filename}

OR

# nodejs/node root directory
make doc

Dependencies and how the Tooling works internally

The API tooling uses an AST-like library called unified to process the input file as a graph that supports easy modification and update of its nodes.

In addition to unified, we also use Remark for manipulating the Markdown part and Rehype to help convert to and from Markdown.

What are the steps of the internal tooling?

The tooling uses unified's pipe-like engine to pipe each part of the process. (The description below is a simplified version.)

  • It starts by reading the Frontmatter section of the Markdown file with remark-frontmatter.
  • It then parses the Markdown using remark-parse and adds support for GitHub Flavoured Markdown.
  • It proceeds by parsing some of the Markdown nodes and transforming them into HTML.
  • It then generates the JSON output of the file.
  • Finally, it does its last node transformations and generates stringified HTML.
  • It then stores the JSON output in a JSON file, adds extra styling to the HTML, and stores the HTML file.
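
The pipe-like flow described above can be pictured with a small stand-in. The sketch below is not unified's API; createPipeline and the three placeholder steps are hypothetical and exist only to show how steps registered in order can each extend a shared state.

```javascript
// Stand-in for a pipe-like processor. This is NOT unified's API; it only
// mimics the idea of registering steps that run in order over shared state.
function createPipeline() {
  const steps = [];
  return {
    use(name, step) { steps.push({ name, step }); return this; },
    run(input) {
      return steps.reduce((state, { name, step }) => {
        const next = step(state);
        // Record which steps ran, in order, for inspection.
        return { ...next, log: [...next.log, name] };
      }, { source: input, log: [] });
    },
  };
}

// Placeholder steps (assumptions): a line-based "parser" and trivial outputs.
const result = createPipeline()
  .use('parse markdown', (state) => ({
    ...state,
    tree: {
      type: 'root',
      children: state.source.split('\n').filter(Boolean)
        .map((value) => ({ type: 'text', value })),
    },
  }))
  .use('generate json', (state) => ({
    ...state,
    json: JSON.stringify({ lines: state.tree.children.length }),
  }))
  .use('stringify html', (state) => ({
    ...state,
    html: state.tree.children.map((node) => `<p>${node.value}</p>`).join('\n'),
  }))
  .run('Hello\nWorld');

console.log(result.log, result.json, result.html);
```

The point of the design is that the JSON step and the HTML step both read the same tree, so the file only needs to be parsed once.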

What is each file responsible for?

The files listed below are the ones referenced and actually used during the build process of the API docs as seen on https://nodejs.org/api. The remaining files in the directory might be used by other steps of the Node.js Makefile, or might be deprecated remnants of old processes that need to be revisited or removed.

  • html.mjs: Responsible for transforming nodes by decorating them with visual artifacts for the HTML pages;
    • For example, transforming man or JS doc references to links correctly referring to respective External documentation.
  • json.mjs: Responsible for generating the JSON output of the file;
    • It is mostly responsible for going through the whole Markdown file and generating a JSON object that represents the metadata of a specific module.
    • For example, for the FS module, it will generate an object with all its methods, events, and classes, using several regular expressions (ReGeX) to extract the information needed.
  • generate.mjs: Main entry-point of doc generation for a specific file. It does e2e processing of a documentation file;
  • allhtml.mjs: A script executed after all files are generated to create a single "all" page containing all the HTML documentation;
  • alljson.mjs: A script executed after all files are generated to create a single "all" page containing all the JSON entries;
  • markdown.mjs: Contains utility to replace Markdown links to work with the https://nodejs.org/api/ website.
  • common.mjs: Contains a few utility functions that are used by the other files.
  • type-parser.mjs: Used to replace "type references" (e.g. "String" or "Buffer") with links to the correct internal/external documentation pages (i.e. MDN or other Node.js documentation pages).

Note: Other files not mentioned here might be used during the process but are not relevant to the generation of the API docs themselves. You will notice that a lot of the logic within the build process is specific to the current https://nodejs.org/api/ infrastructure, such as adding JavaScript snippets and styles, transforming certain Markdown elements into HTML, and adding certain HTML classes.

Note: Regarding the previous note, we're currently working on API tooling that is generic and independent of the current nodejs.org infrastructure. The new tooling, which is functional, is available in the nodejs.dev repository and uses plain ReGeX (no AST) and MDX.

The Build Process

The build process that happens on generate.mjs follows the steps below:

  • Links within the Markdown are replaced directly within the source Markdown (AST) (markdown.replaceLinks)
    • This happens within markdown.mjs and basically it adds suffixes or modifies link references within the Markdown
    • This is necessary for the https://nodejs.org infrastructure as all pages are suffixed with .html
  • Text (and some YAML) Nodes are transformed/modified through html.preprocessText
  • JSON output is generated through json.jsonAPI
  • The title of the page is inferred through html.firstHeader
  • Nodes are transformed into HTML Elements through html.preprocessElements
  • The HTML Table of Contents (ToC) is generated through html.buildToc
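
The link replacement step at the top of the list can be sketched as follows. The pattern and the helper names (LOCAL_MD_LINK, toHtmlLink, replaceLocalLinks) are assumptions written for this example; the actual logic lives in markdown.mjs and may differ.

```javascript
// Assumption: a link is "local" when it has no URL scheme and ends in `.md`,
// optionally followed by a `#fragment`.
const LOCAL_MD_LINK = /^(?![a-z][a-z+.-]*:)([^#?]+)\.md(#.*)?$/i;

// Swap the `.md` suffix for `.html`, keeping any fragment intact.
function toHtmlLink(url) {
  return url.replace(LOCAL_MD_LINK, (_, file, hash = '') => `${file}.html${hash}`);
}

// Hypothetical walker over an mdast-like tree: rewrite every node with a `url`.
function replaceLocalLinks(node) {
  if (typeof node.url === 'string') node.url = toHtmlLink(node.url);
  for (const child of node.children ?? []) replaceLocalLinks(child);
  return node;
}

const linkTree = replaceLocalLinks({
  type: 'root',
  children: [
    { type: 'link', url: 'fs.md#fsreadfilepath-options-callback', children: [] },
    { type: 'link', url: 'https://example.com/readme.md', children: [] },
  ],
});
console.log(linkTree.children.map((node) => node.url));
```

External URLs are left untouched because the pattern rejects anything that starts with a scheme.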

html.mjs

This file is responsible for AST node transformations that either update Markdown nodes to decorate them with more data or transform them into HTML nodes that serve a specific visual purpose, for example, to generate the "Added at" label, the Source Links, the Stability Index, or the History table.

Note: Methods not listed below are either not relevant or are utility methods for string/array/object manipulation (i.e., they are used by the methods mentioned below).

preprocessText

New Tooling: Most of the features within this method are available within the new tooling.

This method does two things:

  • Replaces the Source Link YAML entry <!-- source_link= --> with a "Source Link" HTML anchor element.
  • Replaces type references within the Markdown (text) (i.e.: "String", "Buffer") with the correct HTML anchor element that links to the correct documentation page.
    • The original node then gets mutated from text to HTML.
    • It also updates references to Linux "MAN" pages to Web versions of them.
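
A minimal sketch of the type-reference replacement follows. It assumes that type references are written in curly braces (for example `{string}`); the lookup table, the `type` CSS class, and the function names are hypothetical, and the real mapping in type-parser.mjs is larger and may be structured differently.

```javascript
// Hypothetical lookup table; the real tooling's mapping is larger and may differ.
const MDN_GLOBALS =
  'https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects';
const TYPE_LINKS = new Map([
  ['string', `${MDN_GLOBALS}/String`],
  ['Buffer', 'buffer.html#class-buffer'],
]);

// Assumption: type references are written in curly braces, e.g. `{string}`.
function linkTypes(text) {
  return text.replace(/\{([^{}]+)\}/g, (match, name) => {
    const url = TYPE_LINKS.get(name);
    // Unknown types are left as they were.
    return url ? `<a href="${url}" class="type">&lt;${name}&gt;</a>` : match;
  });
}

// A text node becomes an html node once at least one replacement happened.
function preprocessTextNode(node) {
  const value = linkTypes(node.value);
  return value === node.value ? node : { type: 'html', value };
}

const linked = preprocessTextNode({ type: 'text', value: 'Returns: {Buffer}' });
const untouched = preprocessTextNode({ type: 'text', value: 'Returns: {Unknown}' });
console.log(linked, untouched);
```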

firstHeader

New Tooling: All features within this method are available within the new Tooling.

This method attempts to extract the first heading of the page (recursively) to define the "title" of the page.

Note: As all API Markdown files start with a heading, this could likely be simplified.
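
A recursive search of this kind might look like the sketch below. It operates on an mdast-like tree; findFirstHeading and headingText are hypothetical names chosen for this example and do not reproduce the code in html.mjs.

```javascript
// Hypothetical recursive search for the first heading in an mdast-like tree.
function findFirstHeading(node) {
  if (node.type === 'heading') return node;
  for (const child of node.children ?? []) {
    const found = findFirstHeading(child);
    if (found) return found;
  }
  return null;
}

// Concatenate the text of a node and all of its descendants.
function headingText(node) {
  if (typeof node.value === 'string') return node.value;
  return (node.children ?? []).map(headingText).join('');
}

const page = {
  type: 'root',
  children: [
    { type: 'html', value: '<!-- introduced_in=v0.10.0 -->' },
    {
      type: 'heading',
      depth: 1,
      children: [
        { type: 'text', value: 'File ' },
        { type: 'inlineCode', value: 'system' },
      ],
    },
    { type: 'heading', depth: 2, children: [{ type: 'text', value: 'Promises API' }] },
  ],
};

const heading = findFirstHeading(page);
const title = heading ? headingText(heading) : '';
console.log(title);
```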

preprocessElements

New Tooling: All features within this method are available within the new tooling.

This method is responsible for multiple transformations of the AST nodes, mostly transforming source nodes into their respective HTML elements with diverse responsibilities, such as:

  • Updating Markdown code blocks by adding Language highlighting
    • It also adds the "CJS"/"MJS" switch to Nodes that are followed by their CJS/ESM equivalents.
  • Increasing the Heading level of each Heading
  • Parsing YAML blocks and transforming them into HTML elements (see the parseYAML method for more)
  • Updating blockquotes that are prefixed by the word "Stability" into a Stability Index HTML element.
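
Two of the transformations listed above, the heading level increase and the Stability Index conversion, can be sketched on an mdast-like tree as follows. The regular expression, the CSS class names, the cap at heading level 6, and the function names are assumptions made for illustration only.

```javascript
// Assumption: stability blockquotes start with "Stability: <digit>".
const STABILITY = /^Stability: (\d)/;

// Concatenate the text of a node and all of its descendants.
function plainText(node) {
  if (typeof node.value === 'string') return node.value;
  return (node.children ?? []).map(plainText).join('');
}

// Hypothetical per-node transformation returning the node to keep in the tree.
function preprocessElement(node) {
  if (node.type === 'heading') {
    // Shift headings down one level; HTML only defines h1 to h6.
    return { ...node, depth: Math.min(node.depth + 1, 6) };
  }
  if (node.type === 'blockquote') {
    const text = plainText(node);
    const match = STABILITY.exec(text);
    if (match) {
      // Illustrative markup; the text is not HTML-escaped in this sketch.
      const cls = `api_stability api_stability_${match[1]}`;
      return { type: 'html', value: `<div class="${cls}">${text}</div>` };
    }
  }
  return node;
}

const shifted = preprocessElement({ type: 'heading', depth: 2, children: [] });
const stability = preprocessElement({
  type: 'blockquote',
  children: [{ type: 'paragraph', children: [{ type: 'text', value: 'Stability: 2 - Stable' }] }],
});
console.log(shifted.depth, stability.value);
```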

parseYAML

New Tooling: Most of the features within this method are available within the new tooling.

This method is responsible for parsing the <!-- YAML snippets --> and transforming them into HTML elements.

It follows a "schema" of sorts that consists of the following options:

| YAML Key | Description | Example | Example Result | Available on new tooling |
| --- | --- | --- | --- | --- |
| added | It's used to reference when a certain "module", "class" or "method" was added on Node.js | added: v0.1.90 | Added in: v0.1.90 | Yes |
| deprecated | It's used to reference when a certain "module", "class" or "method" was deprecated on Node.js | deprecated: v0.1.90 | Deprecated since: v0.1.90 | Yes |
| removed | It's used to reference when a certain "module", "class" or "method" was removed on Node.js | removed: v0.1.90 | Removed in: v0.1.90 | No |
| changes | It's used to describe all the changes (historical ones) that happened within a certain "module", "class" or "method" in Node.js | [{ version: v0.1.90, pr-url: '', description: '' }] | -- | Yes |
| napiVersion | It's used to describe in which version of the N-API this "module", "class" or "method" is available within Node.js | napiVersion: 1 | N-API version: 1 | Yes |

Note: The changes field gets prepended with the added, deprecated and removed fields if they exist. The table only gets generated if a changes field exists. In the new tooling only "added" is prepended for now.
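
The mapping from YAML keys to rendered labels can be sketched as below. The label texts come from the table above; everything else is an assumption. In particular, parseSimpleYaml only understands flat `key: value` lines, whereas the nested changes list needs a real YAML parser, and the span markup is purely illustrative.

```javascript
// Label texts taken from the "Example Result" column of the table above.
const LABELS = {
  added: 'Added in',
  deprecated: 'Deprecated since',
  removed: 'Removed in',
  napiVersion: 'N-API version',
};

// Assumption: only flat `key: value` lines are handled here; nested `changes`
// lists are out of scope for this sketch.
function parseSimpleYaml(comment) {
  const body = comment.replace(/^<!--\s*YAML\s*/, '').replace(/-->\s*$/, '');
  const meta = {};
  for (const line of body.split('\n')) {
    const match = /^(\w+):\s*(.+)$/.exec(line.trim());
    if (match) meta[match[1]] = match[2];
  }
  return meta;
}

// Render one illustrative <span> per known key, in a fixed order.
function renderLabels(meta) {
  return Object.keys(LABELS)
    .filter((key) => key in meta)
    .map((key) => `<span>${LABELS[key]}: ${meta[key]}</span>`);
}

const meta = parseSimpleYaml('<!-- YAML\nadded: v0.1.90\ndeprecated: v10.0.0\n-->');
const labels = renderLabels(meta);
console.log(labels.join('\n'));
```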

buildToc

New Tooling: This feature is natively available within the new tooling through MDX.

This method generates the Table of Contents based on all the Headings of the Markdown file.
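
As a rough picture of the idea, the sketch below turns a flat list of headings into an indented list of anchors. It emits a Markdown-style list rather than the HTML that the tooling produces, and both the slug rule and the function names (slugify, sketchToc) are assumptions for this example.

```javascript
// Hypothetical slug rule: lowercase, runs of non-alphanumerics become one dash.
function slugify(text) {
  return text.toLowerCase().replace(/[^a-z0-9]+/g, '-').replace(/^-|-$/g, '');
}

// Build an indented list from flat headings, two spaces per heading level.
function sketchToc(headings) {
  return headings
    .map(({ depth, text }) => `${'  '.repeat(depth - 1)}* [${text}](#${slugify(text)})`)
    .join('\n');
}

const toc = sketchToc([
  { depth: 1, text: 'File system' },
  { depth: 2, text: 'Promises API' },
  { depth: 3, text: 'Class: FileHandle' },
]);
console.log(toc);
```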

altDocs

New Tooling: All features within this method are available within the new tooling.

This method generates a version picker for the current page to be shown in older versions of the API docs.

json.mjs

This file is responsible for generating a JSON object that (supposedly) is used for IDE-Intellisense or for indexing of all the "methods", "classes", "modules", "events", "constants" and "globals" available within a certain Markdown file.

It attempts a best effort extraction of the data by using several regular expression patterns (ReGeX).

Note: JSON output generation is currently not supported by the new tooling, but it is in the pipeline for development.

jsonAPI

This method traverses all the AST nodes by iterating through each one of them and infers the kind of information each node contains through ReGeX. It then mutates the data and appends it to the final JSON object.
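
The inference step can be pictured with the sketch below, which classifies heading texts with a few regular expressions and appends the result to a JSON-like object. The patterns, the bucket names, and the output shape are hypothetical simplifications; json.mjs uses more patterns and handles many more cases.

```javascript
// Hypothetical patterns; the real file uses more (and more careful) regexes.
const KIND_PATTERNS = [
  ['class', /^Class: +(.+)$/],
  ['event', /^Event: +'(.+)'$/],
  ['method', /^([\w.]+)\((.*)\)$/],
];

// Infer what a heading describes from its text alone.
function classifyHeading(text) {
  for (const [kind, pattern] of KIND_PATTERNS) {
    const match = pattern.exec(text);
    if (match) return { kind, name: match[1] };
  }
  return { kind: 'misc', name: text };
}

// Append each classified heading to the matching bucket of the result object.
function sketchJson(headings) {
  const result = { classes: [], events: [], methods: [], miscs: [] };
  const buckets = { class: 'classes', event: 'events', method: 'methods', misc: 'miscs' };
  for (const text of headings) {
    const { kind, name } = classifyHeading(text);
    result[buckets[kind]].push({ name, textRaw: text });
  }
  return result;
}

const json = sketchJson([
  'Class: fs.Stats',
  "Event: 'close'",
  'fs.readFile(path[, options], callback)',
  'Caveats',
]);
console.log(JSON.stringify(json, null, 2));
```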

For more in-depth information, we recommend referring to the json.mjs file, as it contains a lot of comments.