```sh
npm install hyparquet-writer
```
- License: MIT
- Latest version: 0.6.0 (hyparquet-writer@0.6.0), published on Jul 04, 2025
- Size: 21.86 kB (85.25 kB unpacked), 46 files
- Published with npm 11.3.0 on Node 22.16.0
- Language: JavaScript (100%)
- Repository: 26 stars, 83 commits, 3 forks, 1 watcher, 2 branches, 1 contributor; updated on Jul 13, 2025
Hyparquet Writer is a JavaScript library for writing Apache Parquet files. It is designed to be lightweight and fast, and to store data efficiently. It is a companion to the hyparquet library, a JavaScript library for reading parquet files.
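As a quick orientation, here is a minimal round-trip sketch: write an in-memory parquet file with hyparquet-writer, then read it back with hyparquet. `parquetWriteBuffer` is covered in the next section; the read side assumes hyparquet is installed and exposes a `parquetReadObjects` helper.

```js
import { parquetWriteBuffer } from 'hyparquet-writer'
import { parquetReadObjects } from 'hyparquet' // assumption: hyparquet is installed alongside

// Write two columns to an in-memory parquet file (returns an ArrayBuffer).
const file = parquetWriteBuffer({
  columnData: [
    { name: 'name', data: ['Alice', 'Bob'], type: 'STRING' },
    { name: 'age', data: [25, 30], type: 'INT32' },
  ],
})

// Read the rows back as plain objects.
const rows = await parquetReadObjects({ file })
// expected shape: [{ name: 'Alice', age: 25 }, { name: 'Bob', age: 30 }]
```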
To write a parquet file to an ArrayBuffer, use `parquetWriteBuffer` with argument `columnData`. Each column in `columnData` should contain:

- `name`: the column name
- `data`: an array of same-type values
- `type`: the parquet schema type (optional)

```js
import { parquetWriteBuffer } from 'hyparquet-writer'

const arrayBuffer = parquetWriteBuffer({
  columnData: [
    { name: 'name', data: ['Alice', 'Bob', 'Charlie'], type: 'STRING' },
    { name: 'age', data: [25, 30, 35], type: 'INT32' },
  ],
})
```
Note: if `type` is not provided, the type will be guessed from the data. The supported types are a superset of the parquet types:
| Type | Schema element |
|---|---|
| `BOOLEAN` | `{ type: 'BOOLEAN' }` |
| `INT32` | `{ type: 'INT32' }` |
| `INT64` | `{ type: 'INT64' }` |
| `FLOAT` | `{ type: 'FLOAT' }` |
| `DOUBLE` | `{ type: 'DOUBLE' }` |
| `BYTE_ARRAY` | `{ type: 'BYTE_ARRAY' }` |
| `STRING` | `{ type: 'BYTE_ARRAY', converted_type: 'UTF8' }` |
| `JSON` | `{ type: 'BYTE_ARRAY', converted_type: 'JSON' }` |
| `TIMESTAMP` | `{ type: 'INT64', converted_type: 'TIMESTAMP_MILLIS' }` |
| `UUID` | `{ type: 'FIXED_LEN_BYTE_ARRAY', type_length: 16, logical_type: { type: 'UUID' } }` |
| `FLOAT16` | `{ type: 'FIXED_LEN_BYTE_ARRAY', type_length: 2, logical_type: { type: 'FLOAT16' } }` |
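For example, a minimal sketch of the guessing behavior described above, leaving `type` off entirely; the per-column comments are assumptions about what the writer will infer, not guarantees:

```js
import { parquetWriteBuffer } from 'hyparquet-writer'

// No types given: the writer infers the schema from the data.
const buffer = parquetWriteBuffer({
  columnData: [
    { name: 'name', data: ['Alice', 'Bob'] },           // strings: presumably STRING (BYTE_ARRAY / UTF8)
    { name: 'active', data: [true, false] },            // booleans: presumably BOOLEAN
    { name: 'joined', data: [new Date(), new Date()] }, // Date objects: presumably TIMESTAMP
    { name: 'score', data: [1.5, 2.25] },               // numbers: presumably DOUBLE (whole numbers may become ints)
  ],
})
```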
More types are supported but require defining the `schema` explicitly. See the advanced usage section for more details.
To write a local parquet file in node.js, use `parquetWriteFile` with arguments `filename` and `columnData`:

```js
const { parquetWriteFile } = await import('hyparquet-writer')

parquetWriteFile({
  filename: 'example.parquet',
  columnData: [
    { name: 'name', data: ['Alice', 'Bob', 'Charlie'], type: 'STRING' },
    { name: 'age', data: [25, 30, 35], type: 'INT32' },
  ],
})
```
Note: hyparquet-writer is published as an ES module, so dynamic `import()` may be required on the command line.
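For example, a sketch of calling it from a CommonJS script, where a static `require()` of an ES module would fail:

```js
// CommonJS entry point: load the ES module with dynamic import().
async function main() {
  const { parquetWriteFile } = await import('hyparquet-writer')
  parquetWriteFile({
    filename: 'example.parquet',
    columnData: [{ name: 'id', data: [1, 2, 3], type: 'INT32' }],
  })
}

main()
```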
Options can be passed to `parquetWrite` to adjust parquet file writing behavior:

- `writer`: a generic writer object
- `schema`: parquet schema object (optional)
- `compressed`: use snappy compression (default true)
- `statistics`: write column statistics (default true)
- `rowGroupSize`: number of rows in each row group (default 100000)
- `kvMetadata`: extra key-value metadata to be stored in the parquet footer

```js
import { ByteWriter, parquetWrite } from 'hyparquet-writer'

const writer = new ByteWriter()
parquetWrite({
  writer,
  columnData: [
    { name: 'name', data: ['Alice', 'Bob', 'Charlie'] },
    { name: 'age', data: [25, 30, 35] },
    { name: 'dob', data: [new Date(1000000), new Date(2000000), new Date(3000000)] },
  ],
  // explicit schema:
  schema: [
    { name: 'root', num_children: 3 },
    { name: 'name', type: 'BYTE_ARRAY', converted_type: 'UTF8' },
    { name: 'age', type: 'FIXED_LEN_BYTE_ARRAY', type_length: 4, converted_type: 'DECIMAL', scale: 2, precision: 4 },
    { name: 'dob', type: 'INT32', converted_type: 'DATE' },
  ],
  compressed: false,
  statistics: false,
  rowGroupSize: 1000,
  kvMetadata: [
    { key: 'key1', value: 'value1' },
    { key: 'key2', value: 'value2' },
  ],
})
const arrayBuffer = writer.getBuffer()
```
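The `ByteWriter` holds the finished file in memory; what you do with the resulting ArrayBuffer is up to you. A minimal sketch, assuming a Node environment, persisting it with the standard `fs` module:

```js
import fs from 'node:fs'
import { ByteWriter, parquetWrite } from 'hyparquet-writer'

const writer = new ByteWriter()
parquetWrite({
  writer,
  columnData: [{ name: 'id', data: [1, 2, 3], type: 'INT32' }],
})

// Persist the in-memory buffer to disk (similar in effect to parquetWriteFile).
fs.writeFileSync('out.parquet', new Uint8Array(writer.getBuffer()))
```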
Parquet requires an explicit schema to be defined. You can provide schema information in three ways:

1. Provide `type` in the `columnData` elements; the type will be used as the schema type.
2. Provide a `schema` parameter that explicitly defines the parquet schema. The schema should be an array of `SchemaElement` objects (see parquet-format), each containing the following properties (a sketch using these follows after this list):
   - `name`: column name
   - `type`: parquet type
   - `num_children`: number of children in a parquet nested schema (optional)
   - `converted_type`: parquet converted type (optional)
   - `logical_type`: parquet logical type (optional)
   - `repetition_type`: parquet repetition type (optional)
   - `type_length`: length for `FIXED_LEN_BYTE_ARRAY` types (optional)
   - `scale`: the scale factor for `DECIMAL` converted types (optional)
   - `precision`: the precision for `DECIMAL` converted types (optional)
   - `field_id`: the field id for the column (optional)
3. Provide no type information: most converted types will be auto-detected if you just provide data with no types. However, it is still recommended that you provide type information when possible (zero rows would throw an exception, floats might be typed as int, etc).
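For illustration, a sketch of a fully explicit schema with a nullable column. The assumption here is that `repetition_type` accepts the standard parquet values (`'REQUIRED'`, `'OPTIONAL'`) and that `null` entries in `data` are written as missing values for `OPTIONAL` columns:

```js
import { ByteWriter, parquetWrite } from 'hyparquet-writer'

const writer = new ByteWriter()
parquetWrite({
  writer,
  columnData: [
    { name: 'id', data: [1, 2, 3] },
    { name: 'nickname', data: ['Al', null, 'Chuck'] },
  ],
  schema: [
    { name: 'root', num_children: 2 },
    { name: 'id', type: 'INT32', repetition_type: 'REQUIRED' },
    // assumption: OPTIONAL marks the column as nullable, so nulls in data are allowed
    { name: 'nickname', type: 'BYTE_ARRAY', converted_type: 'UTF8', repetition_type: 'OPTIONAL' },
  ],
})
const arrayBuffer = writer.getBuffer()
```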
You can use mostly automatic schema detection but override the schema for specific columns. This is useful when most column types can be determined automatically, but you want to use a specific schema element for one particular column.
```js
import { parquetWrite, schemaFromColumnData } from 'hyparquet-writer'

const columnData = [
  { name: 'unsigned_int', data: [1000000, 2000000] },
  { name: 'signed_int', data: [1000000, 2000000] },
]
parquetWrite({
  columnData,
  // override schema for uint column
  schema: schemaFromColumnData({
    columnData,
    schemaOverrides: {
      unsigned_int: {
        type: 'INT32',
        converted_type: 'UINT_32',
      },
    },
  }),
})
```
No vulnerabilities found.