```sh
npm install hyparquet-writer
```
- License: MIT
- Latest version: 0.6.0 (hyparquet-writer@0.6.0), published on Jul 04, 2025
- Size: 21.86 kB (85.25 kB unpacked), 46 files
- Published with npm 11.3.0 on Node 22.16.0
- Language: JavaScript (100%)
- Repository: 26 stars, 83 commits, 3 forks, 1 watcher, 2 branches, 1 contributor; updated on Jul 13, 2025
Hyparquet Writer is a JavaScript library for writing Apache Parquet files. It is designed to be lightweight and fast, and to store data efficiently. It is a companion to the hyparquet library, a JavaScript library for reading parquet files.
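As a quick orientation, here is a minimal round-trip sketch: write an in-memory parquet file with hyparquet-writer, then read it back with hyparquet. `parquetWriteBuffer` is covered in the next section; the read side assumes hyparquet is installed and exposes a `parquetReadObjects` helper.

```js
import { parquetWriteBuffer } from 'hyparquet-writer'
import { parquetReadObjects } from 'hyparquet' // assumption: hyparquet is installed alongside

// Write two columns to an in-memory parquet file (returns an ArrayBuffer).
const file = parquetWriteBuffer({
  columnData: [
    { name: 'name', data: ['Alice', 'Bob'], type: 'STRING' },
    { name: 'age', data: [25, 30], type: 'INT32' },
  ],
})

// Read the rows back as plain objects.
const rows = await parquetReadObjects({ file })
// expected shape: [{ name: 'Alice', age: 25 }, { name: 'Bob', age: 30 }]
```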
To write a parquet file to an ArrayBuffer, use `parquetWriteBuffer` with argument `columnData`. Each column in `columnData` should contain:

- `name`: the column name
- `data`: an array of same-type values
- `type`: the parquet schema type (optional)

```js
import { parquetWriteBuffer } from 'hyparquet-writer'

const arrayBuffer = parquetWriteBuffer({
  columnData: [
    { name: 'name', data: ['Alice', 'Bob', 'Charlie'], type: 'STRING' },
    { name: 'age', data: [25, 30, 35], type: 'INT32' },
  ],
})
```
Note: if `type` is not provided, the type will be guessed from the data. The supported types are a superset of the parquet types:
| Type | Schema element |
|---|---|
| `BOOLEAN` | `{ type: 'BOOLEAN' }` |
| `INT32` | `{ type: 'INT32' }` |
| `INT64` | `{ type: 'INT64' }` |
| `FLOAT` | `{ type: 'FLOAT' }` |
| `DOUBLE` | `{ type: 'DOUBLE' }` |
| `BYTE_ARRAY` | `{ type: 'BYTE_ARRAY' }` |
| `STRING` | `{ type: 'BYTE_ARRAY', converted_type: 'UTF8' }` |
| `JSON` | `{ type: 'BYTE_ARRAY', converted_type: 'JSON' }` |
| `TIMESTAMP` | `{ type: 'INT64', converted_type: 'TIMESTAMP_MILLIS' }` |
| `UUID` | `{ type: 'FIXED_LEN_BYTE_ARRAY', type_length: 16, logical_type: { type: 'UUID' } }` |
| `FLOAT16` | `{ type: 'FIXED_LEN_BYTE_ARRAY', type_length: 2, logical_type: { type: 'FLOAT16' } }` |
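For example, a minimal sketch of the guessing behavior described above, leaving `type` off entirely; the per-column comments are assumptions about what the writer will infer, not guarantees:

```js
import { parquetWriteBuffer } from 'hyparquet-writer'

// No types given: the writer infers the schema from the data.
const buffer = parquetWriteBuffer({
  columnData: [
    { name: 'name', data: ['Alice', 'Bob'] },           // strings: presumably STRING (BYTE_ARRAY / UTF8)
    { name: 'active', data: [true, false] },            // booleans: presumably BOOLEAN
    { name: 'joined', data: [new Date(), new Date()] }, // Date objects: presumably TIMESTAMP
    { name: 'score', data: [1.5, 2.25] },               // numbers: presumably DOUBLE (whole numbers may become ints)
  ],
})
```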
More types are supported but require defining the `schema` explicitly. See the advanced usage section for more details.
To write a local parquet file in node.js, use `parquetWriteFile` with arguments `filename` and `columnData`:

```js
const { parquetWriteFile } = await import('hyparquet-writer')

parquetWriteFile({
  filename: 'example.parquet',
  columnData: [
    { name: 'name', data: ['Alice', 'Bob', 'Charlie'], type: 'STRING' },
    { name: 'age', data: [25, 30, 35], type: 'INT32' },
  ],
})
```
Note: hyparquet-writer is published as an ES module, so dynamic `import()` may be required on the command line.
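For example, a sketch of calling it from a CommonJS script, where a static `require()` of an ES module would fail:

```js
// CommonJS entry point: load the ES module with dynamic import().
async function main() {
  const { parquetWriteFile } = await import('hyparquet-writer')
  parquetWriteFile({
    filename: 'example.parquet',
    columnData: [{ name: 'id', data: [1, 2, 3], type: 'INT32' }],
  })
}

main()
```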
Options can be passed to `parquetWrite` to adjust parquet file writing behavior:

- `writer`: a generic writer object
- `schema`: parquet schema object (optional)
- `compressed`: use snappy compression (default true)
- `statistics`: write column statistics (default true)
- `rowGroupSize`: number of rows in each row group (default 100000)
- `kvMetadata`: extra key-value metadata to be stored in the parquet footer

```js
import { ByteWriter, parquetWrite } from 'hyparquet-writer'

const writer = new ByteWriter()
parquetWrite({
  writer,
  columnData: [
    { name: 'name', data: ['Alice', 'Bob', 'Charlie'] },
    { name: 'age', data: [25, 30, 35] },
    { name: 'dob', data: [new Date(1000000), new Date(2000000), new Date(3000000)] },
  ],
  // explicit schema:
  schema: [
    { name: 'root', num_children: 3 },
    { name: 'name', type: 'BYTE_ARRAY', converted_type: 'UTF8' },
    { name: 'age', type: 'FIXED_LEN_BYTE_ARRAY', type_length: 4, converted_type: 'DECIMAL', scale: 2, precision: 4 },
    { name: 'dob', type: 'INT32', converted_type: 'DATE' },
  ],
  compressed: false,
  statistics: false,
  rowGroupSize: 1000,
  kvMetadata: [
    { key: 'key1', value: 'value1' },
    { key: 'key2', value: 'value2' },
  ],
})
const arrayBuffer = writer.getBuffer()
```
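The `ByteWriter` holds the finished file in memory; what you do with the resulting ArrayBuffer is up to you. A minimal sketch, assuming a Node environment, persisting it with the standard `fs` module:

```js
import fs from 'node:fs'
import { ByteWriter, parquetWrite } from 'hyparquet-writer'

const writer = new ByteWriter()
parquetWrite({
  writer,
  columnData: [{ name: 'id', data: [1, 2, 3], type: 'INT32' }],
})

// Persist the in-memory buffer to disk (similar in effect to parquetWriteFile).
fs.writeFileSync('out.parquet', new Uint8Array(writer.getBuffer()))
```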
Parquet requires an explicit schema to be defined. You can provide schema information in three ways:

1. Provide `type` in the `columnData` elements; the type will be used as the schema type.
2. Provide a `schema` parameter that explicitly defines the parquet schema. The schema should be an array of `SchemaElement` objects (see parquet-format), each containing the following properties (a sketch using these follows after this list):
   - `name`: column name
   - `type`: parquet type
   - `num_children`: number of children in a parquet nested schema (optional)
   - `converted_type`: parquet converted type (optional)
   - `logical_type`: parquet logical type (optional)
   - `repetition_type`: parquet repetition type (optional)
   - `type_length`: length for `FIXED_LEN_BYTE_ARRAY` types (optional)
   - `scale`: the scale factor for `DECIMAL` converted types (optional)
   - `precision`: the precision for `DECIMAL` converted types (optional)
   - `field_id`: the field id for the column (optional)
3. Provide no type information: most converted types will be auto-detected if you just provide data with no types. However, it is still recommended that you provide type information when possible (zero rows would throw an exception, floats might be typed as int, etc).
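For illustration, a sketch of a fully explicit schema with a nullable column. The assumption here is that `repetition_type` accepts the standard parquet values (`'REQUIRED'`, `'OPTIONAL'`) and that `null` entries in `data` are written as missing values for `OPTIONAL` columns:

```js
import { ByteWriter, parquetWrite } from 'hyparquet-writer'

const writer = new ByteWriter()
parquetWrite({
  writer,
  columnData: [
    { name: 'id', data: [1, 2, 3] },
    { name: 'nickname', data: ['Al', null, 'Chuck'] },
  ],
  schema: [
    { name: 'root', num_children: 2 },
    { name: 'id', type: 'INT32', repetition_type: 'REQUIRED' },
    // assumption: OPTIONAL marks the column as nullable, so nulls in data are allowed
    { name: 'nickname', type: 'BYTE_ARRAY', converted_type: 'UTF8', repetition_type: 'OPTIONAL' },
  ],
})
const arrayBuffer = writer.getBuffer()
```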
You can use mostly automatic schema detection but override the schema for specific columns. This is useful when most column types can be determined automatically, but you want to use a specific schema element for one particular column.
```js
import { parquetWrite, schemaFromColumnData } from 'hyparquet-writer'

const columnData = [
  { name: 'unsigned_int', data: [1000000, 2000000] },
  { name: 'signed_int', data: [1000000, 2000000] },
]
parquetWrite({
  columnData,
  // override schema for uint column
  schema: schemaFromColumnData({
    columnData,
    schemaOverrides: {
      unsigned_int: {
        type: 'INT32',
        converted_type: 'UINT_32',
      },
    },
  }),
})
```
No vulnerabilities found.