# String Decoder
> Stability: 2 - Stable

The `string_decoder` module provides an API for decoding `Buffer` objects into
strings in a manner that preserves encoded multi-byte UTF-8 and UTF-16
characters. It can be accessed using:
```js
const StringDecoder = require('string_decoder').StringDecoder;
```
The following example shows the basic use of the `StringDecoder` class.
```js
const StringDecoder = require('string_decoder').StringDecoder;
const decoder = new StringDecoder('utf8');
const cent = Buffer.from([0xC2, 0xA2]);
console.log(decoder.write(cent)); // Prints: ¢
const euro = Buffer.from([0xE2, 0x82, 0xAC]);
console.log(decoder.write(euro)); // Prints: €
```
When a `Buffer` instance is written to the `StringDecoder` instance, an
internal buffer is used to ensure that the decoded string does not contain
any incomplete multibyte characters. The bytes of an incomplete character are
held in the buffer until the next call to `stringDecoder.write()` or until
`stringDecoder.end()` is called.

In the following example, the three UTF-8 encoded bytes of the Euro
symbol (`€`) are written over three separate operations:
```js
const StringDecoder = require('string_decoder').StringDecoder;
const decoder = new StringDecoder('utf8');
decoder.write(Buffer.from([0xE2]));
decoder.write(Buffer.from([0x82]));
console.log(decoder.end(Buffer.from([0xAC]))); // Prints: €
```
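To illustrate why the internal buffer matters, the following sketch compares
decoding each chunk independently with `buf.toString()` against feeding the
same chunks through a `StringDecoder`. The chunk boundary used here is an
assumption chosen for illustration only.

```js
const StringDecoder = require('string_decoder').StringDecoder;

// The three UTF-8 bytes of '€', split across two chunks (an assumed split).
const euro = Buffer.from([0xE2, 0x82, 0xAC]);
const chunks = [euro.subarray(0, 1), euro.subarray(1)];

// Decoding each chunk on its own cannot reassemble the character.
const naive = chunks.map((chunk) => chunk.toString('utf8')).join('');

// The decoder holds the partial character until it is complete.
const decoder = new StringDecoder('utf8');
const decoded = chunks.map((chunk) => decoder.write(chunk)).join('') +
                decoder.end();

console.log(naive === '€');   // Prints: false
console.log(decoded === '€'); // Prints: true
```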
## Class: new StringDecoder([encoding])
<!-- YAML
added: v0.1.99
-->
* `encoding` {string} The character encoding the `StringDecoder` will use.
  Defaults to `'utf8'`.

Creates a new `StringDecoder` instance.
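As a minimal sketch of passing a non-default `encoding`, the example below
creates a decoder for `'utf16le'` and writes the two bytes of a single UTF-16
code unit in separate calls. The variable names are illustrative only.

```js
const StringDecoder = require('string_decoder').StringDecoder;

// '€' is U+20AC, encoded in UTF-16LE as the two bytes 0xAC 0x20.
const decoder = new StringDecoder('utf16le');

const first = decoder.write(Buffer.from([0xAC]));  // Incomplete code unit.
const second = decoder.write(Buffer.from([0x20])); // Completes the character.

console.log(JSON.stringify(first)); // Prints: ""
console.log(second);                // Prints: €
```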
### stringDecoder.end([buffer])
<!-- YAML
added: v0.9.3
-->
* `buffer` {Buffer} A `Buffer` containing the bytes to decode.

Returns any remaining input stored in the internal buffer as a string. Bytes
representing incomplete UTF-8 and UTF-16 characters will be replaced with
substitution characters appropriate for the character encoding.

If the `buffer` argument is provided, one final call to `stringDecoder.write()`
is performed before returning the remaining input.
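The substitution behavior can be sketched as follows, assuming the input stops
partway through a multibyte character. Only the presence of the Unicode
replacement character (U+FFFD) is checked here, not how many are produced.

```js
const StringDecoder = require('string_decoder').StringDecoder;
const decoder = new StringDecoder('utf8');

// Only the first two of the three UTF-8 bytes of '€' arrive.
const partial = decoder.write(Buffer.from([0xE2, 0x82]));

// Flushing the decoder substitutes the incomplete sequence.
const flushed = decoder.end();

console.log(JSON.stringify(partial));    // Prints: ""
console.log(flushed.includes('\uFFFD')); // Prints: true
```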
### stringDecoder.write(buffer)
<!-- YAML
added: v0.1.99
changes:
  - version: REPLACEME
    pr-url: https://github.com/nodejs/node/pull/9618
    description: Each invalid character is now replaced by a single replacement
                 character instead of one for each individual byte.
-->
* `buffer` {Buffer} A `Buffer` containing the bytes to decode.

Returns a decoded string, ensuring that any incomplete multibyte characters at
the end of the `Buffer` are omitted from the returned string and stored in an
internal buffer for the next call to `stringDecoder.write()` or
`stringDecoder.end()`.
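A short sketch of this behavior, using an assumed input in which a chunk ends
in the middle of a character: the complete characters are returned
immediately, and the trailing partial bytes are returned only once the next
call completes them.

```js
const StringDecoder = require('string_decoder').StringDecoder;
const decoder = new StringDecoder('utf8');

// 'a' is complete; 0xE2 only starts the three-byte sequence for '€'.
const first = decoder.write(Buffer.from([0x61, 0xE2]));

// The remaining two bytes complete '€', and 'b' follows it.
const second = decoder.write(Buffer.from([0x82, 0xAC, 0x62]));

console.log(first);  // Prints: a
console.log(second); // Prints: €b
```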