Installations

npm install pelias-parser

Pull Requests

Open: 9
Total: 124
Closed: 8
Merged: 107

Issues

Open: 27
Total: 59
Closed: 32

Releases: 83

v2.6.0, published on 26 Nov 2024
v2.5.0, published on 15 Aug 2023
v2.4.0, published on 30 Jan 2023
v2.3.1, published on 13 Jan 2023
v2.3.0, published on 06 Sept 2022
v2.2.1, published on 27 Jun 2022

Developer

pelias

Developer Guide

BETA

Module System: CommonJS
Min. Node Version: >= 10.0.0
TypeScript Support: No
Node Version: 16.20.2
NPM Version: 6.14.18

Statistics

55 Stars

268 Commits

28 Forks

11 Watching

10 Branches

21 Contributors

Updated on 26 Nov 2024

Languages

JavaScript (97.78%)

HTML (1.91%)

Shell (0.21%)

Dockerfile (0.09%)

Total Downloads

Cumulative downloads: 55,172

Last day: 46 (130% compared to previous day)
Last week: 129 (40.2% compared to previous week)
Last month: 426 (-0.9% compared to previous month)
Last year: 10,647 (23.5% compared to previous year)

Dependencies

5

cluster express pluralize remove-accents stringbuffer

Dev Dependencies

9

better-sqlite3 chalk csv-parse deep-eql glob precommit-hook standard tap-spec tape

A modular, open-source search engine for our world.

Pelias is a geocoder powered completely by open data, available freely to everyone.

Local Installation · Cloud Webservice · Documentation · Community Chat

What is Pelias?

Pelias is a search engine for places worldwide, powered by open data. It turns addresses and place names into geographic coordinates, and turns geographic coordinates into places and addresses. With Pelias, you’re able to turn your users’ place searches into actionable geodata and transform your geodata into real places.

We think open data, open source, and open strategy win over proprietary solutions at any part of the stack and we want to ensure the services we offer are in line with that vision. We believe that an open geocoder improves over the long-term only if the community can incorporate truly representative local knowledge.

Pelias Parser

A natural language classification engine for geocoding.

This library contains primitive 'building blocks' which can be composed together to produce a powerful and flexible natural language parser.

The project was designed and built to work with the Pelias geocoder, so it comes bundled with a parser called AddressParser, which can be included in other npm projects independently of Pelias.

It is also possible to modify the configuration of AddressParser, the dictionaries or the semantics. You can also easily create a completely new parser to suit your own domain.

AddressParser Example

30 w 26 st nyc 10010

(0.95) ➜ [
  { housenumber: '30' },
  { street: 'w 26 st' },
  { locality: 'nyc' },
  { postcode: '10010' }
]

Application Interfaces

You can access the library via three different interfaces:

all parts of the codebase are available in JavaScript via npm
on the command line via the node bin/cli.js script
through a web service via the node server/http.js script

The web service provides an interactive demo at the URL /parser/parse

Quick Start

A quick and easy way to get started with the library is to use the command-line interface:

node bin/cli.js West 26th Street, New York, NYC, 10010

[Screenshot: CLI output]

Architecture Description

Please refer to the CLI screenshot above for a visual reference.

Tokenization

Tokenization is the process of splitting text into individual words.

The splitting process used by the engine maintains token positions, so it's able to 'remember' where each character was in the original input text.

Tokenization is coloured blue on the command-line.

Span

The most primitive element is called a span. This is essentially just a single string of text with some metadata attached.

The terms word, phrase and section (explained below) are all just ways of using a span.
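As a rough illustration of the idea (not the library's own Span implementation), a span can be modelled as a substring plus the offsets it occupies in the original input. The `makeSpan` helper and its field names are hypothetical.

```javascript
// Hypothetical sketch: a span is a piece of text plus metadata that
// records where it came from in the original input.
function makeSpan(input, start, end) {
  return {
    body: input.slice(start, end), // the text covered by the span
    start,                         // offset of the first character
    end,                           // offset just past the last character
    classifications: {}            // labels attached later, keyed by name
  };
}

const input = '30 w 26 st nyc 10010';
const span = makeSpan(input, 3, 10);
console.log(span.body); // 'w 26 st'
```

Because the offsets are kept, the original text can always be recovered with `input.slice(span.start, span.end)`.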

Section Boundaries

Some parsers like libpostal ignore characters such as comma, tab, newline and quote.

While it's unrealistic to expect commas to always be present, it's very useful to record their positions when they are.

These boundary positions help to avoid parsing errors, such as the query 'Main St, East Village' being parsed as 'Main St East' in 'Village'.

Once sections are established there is no 'bleeding' of information between sections, avoiding the issue above.
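A minimal sketch of section splitting, assuming comma, tab, newline and double quote as the boundary characters. The `splitSections` function is hypothetical and only illustrates how boundary positions can be recorded; it is not the library's code.

```javascript
// Hypothetical sketch: split the input on boundary characters while
// remembering the offsets of each section in the original text.
function splitSections(input, boundary = /[,\t\n"]/) {
  const sections = [];
  let start = 0;
  for (let i = 0; i <= input.length; i++) {
    // Close the current section at a boundary character or end of input.
    if (i === input.length || boundary.test(input[i])) {
      if (i > start) {
        sections.push({ body: input.slice(start, i), start, end: i });
      }
      start = i + 1;
    }
  }
  return sections;
}

// Two sections: 'Main St' (0-7) and ' East Village' (8-21).
// Leading whitespace is left in place for a later word-splitting step.
console.log(splitSections('Main St, East Village'));
```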

Word Splitting

Each section is then split into individual words. By default, whitespace is simply treated as a word boundary.

As with sections, the original token positions are maintained.
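The following sketch shows one way whitespace splitting can preserve positions. The `splitWords` function and the shape of its input object are assumptions made for illustration.

```javascript
// Hypothetical sketch: split a section into words on whitespace,
// keeping offsets relative to the original input text.
function splitWords(section) {
  const words = [];
  const pattern = /\S+/g; // a run of non-whitespace characters is a word
  let match;
  while ((match = pattern.exec(section.body)) !== null) {
    const start = section.start + match.index;
    words.push({ body: match[0], start, end: start + match[0].length });
  }
  return words;
}

// The section ' East Village' begins at offset 8 of 'Main St, East Village'.
const words = splitWords({ body: ' East Village', start: 8, end: 21 });
console.log(words);
```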

Phrase Generation

Many terms such as 'New York City' span multiple words; these multi-word tokens are called phrases.

In order to be able to classify phrase terms, permutations of adjacent words are generated.

Phrase generation is performed per-section, so it will not generate a phrase which contains words from more than one section.

Phrase generation is controlled by a configuration which specifies things like the minimum and maximum number of words allowed in a phrase.
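As a sketch of the idea, the function below builds every run of adjacent words within a length range. The name `generatePhrases` and the option names `minWords` and `maxWords` are hypothetical, not the library's configuration keys. It would be called once per section so that phrases never cross a section boundary.

```javascript
// Hypothetical sketch: build every run of adjacent words whose length
// falls between minWords and maxWords (inclusive).
function generatePhrases(words, { minWords = 1, maxWords = 3 } = {}) {
  const phrases = [];
  for (let i = 0; i < words.length; i++) {
    for (let n = minWords; n <= maxWords && i + n <= words.length; n++) {
      phrases.push(words.slice(i, i + n).join(' '));
    }
  }
  return phrases;
}

console.log(generatePhrases(['New', 'York', 'City'], { minWords: 2 }));
// [ 'New York', 'New York City', 'York City' ]
```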

Token Graph

A graph is used to associate word, phrase and section elements to each other.

The graph is free-form, so it's easy to add a new relationship between terms in the future, as required.

Graph Example:

// find the next word in this section
word.findOne('next')

// find all words in this phrase
phrase.findAll('child')
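To illustrate what 'free-form' can mean here, the sketch below stores related nodes under arbitrary relationship names. The `SketchGraph` class is a hypothetical stand-in written for this example; only the `findOne` and `findAll` method names are taken from the snippet above.

```javascript
// Hypothetical sketch of a free-form graph: relationships are plain
// string keys, so adding a new relationship type needs no schema change.
class SketchGraph {
  constructor() {
    this.edges = {};
  }
  // Record that `node` is related to the owner under `relationship`.
  add(relationship, node) {
    (this.edges[relationship] = this.edges[relationship] || []).push(node);
  }
  // Return every node stored under `relationship` (possibly none).
  findAll(relationship) {
    return this.edges[relationship] || [];
  }
  // Return the first node stored under `relationship`, or null.
  findOne(relationship) {
    return this.findAll(relationship)[0] || null;
  }
}

const phrase = new SketchGraph();
phrase.add('child', 'New');
phrase.add('child', 'York');
console.log(phrase.findAll('child')); // [ 'New', 'York' ]
console.log(phrase.findOne('next'));  // null
```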

Classification

Classification is the process of establishing that a word or phrase represents a 'concept' (such as a street name).

Classification can be based on:

Dictionary matching (usually with normalization applied)
Pattern matching (such as regular expressions)
Composite matching (such as relative positioning)
External API calls (such as calling other services)
Other semantic matching techniques

Classification is coloured green and red on the command-line.
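The first two techniques can be sketched in a few lines. The word list, the 5-digit postcode pattern, the label names and the function names below are all illustrative assumptions rather than the library's data or API.

```javascript
// Hypothetical sketch of two classification strategies.
const STREET_SUFFIXES = new Set(['st', 'street', 'ave', 'avenue', 'rd', 'road']);

// Dictionary matching, with simple normalization (lowercase, drop periods).
function isStreetSuffix(word) {
  return STREET_SUFFIXES.has(word.toLowerCase().replace(/\./g, ''));
}

// Pattern matching: an illustrative 5-digit postcode pattern.
function isPostcode(word) {
  return /^\d{5}$/.test(word);
}

// Collect every label that applies to the word.
function classify(word) {
  const labels = [];
  if (isStreetSuffix(word)) labels.push('street_suffix');
  if (isPostcode(word)) labels.push('postcode');
  return labels;
}

console.log(classify('St.'));   // [ 'street_suffix' ]
console.log(classify('10010')); // [ 'postcode' ]
```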

Classifier Types

The library comes with three generic classifiers which can be extended in order to create a new classifier:

WordClassifier
PhraseClassifier
SectionClassifier
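The extension pattern might look roughly like the sketch below. Both classes are hypothetical stand-ins written for this example; they are not the library's WordClassifier, and the `each` hook and the shape of the word objects are assumptions.

```javascript
// Hypothetical base class: visits every word and delegates to each().
class SketchWordClassifier {
  classify(words) {
    words.forEach((word) => this.each(word));
    return words;
  }
  each(word) {} // subclasses override this hook
}

// A new classifier only needs to supply the per-word logic.
class DirectionalSketchClassifier extends SketchWordClassifier {
  each(word) {
    if (/^(n|s|e|w|north|south|east|west)$/i.test(word.body)) {
      word.classifications.push('directional');
    }
  }
}

const words = [
  { body: 'w', classifications: [] },
  { body: '26', classifications: [] }
];
new DirectionalSketchClassifier().classify(words);
console.log(words.map((w) => w.classifications));
// [ [ 'directional' ], [] ]
```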

Classifiers

The library comes bundled with a range of classifiers out of the box.

You can find them in the /classifier directory. Dictionary-based classifiers usually store their data in the /resources directory.

Example of some of the included classifiers:

// word classifiers
HouseNumberClassifier
PostcodeClassifier
StreetPrefixClassifier
StreetSuffixClassifier
CompoundStreetClassifier
DirectionalClassifier
OrdinalClassifier
StopWordClassifier

// phrase classifiers
IntersectionClassifier
PersonClassifier
GivenNameClassifier
SurnameClassifier
PersonalSuffixClassifier
PersonalTitleClassifier
ChainClassifier
PlaceClassifier
WhosOnFirstClassifier

Solvers

Solving is the final process, where solutions are generated based on all the classifications that have been made.

Each parse can contain multiple solutions. Each solution is given a confidence score, and solutions are displayed sorted from highest to lowest score.

The core of this process is the ExclusiveCartesianSolver module.

This solver generates all the possible permutations of the different classifications while taking care to:

ensure the same span position is not used more than once
ensure that the same classification is not used more than once.
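The two constraints above can be sketched as follows. This is a simplified illustration of the idea, not the ExclusiveCartesianSolver itself: the `solve` and `overlaps` functions and the candidate object shape are hypothetical, and confidence scoring is left out.

```javascript
// Hypothetical sketch: combine classified spans into candidate solutions,
// using each label at most once and never reusing a character position.
function overlaps(a, b) {
  return a.start < b.end && b.start < a.end;
}

function solve(candidates) {
  // Group candidates by label so each label is chosen at most once.
  const byLabel = {};
  for (const c of candidates) {
    (byLabel[c.label] = byLabel[c.label] || []).push(c);
  }

  let solutions = [[]];
  for (const group of Object.values(byLabel)) {
    const next = [];
    for (const partial of solutions) {
      next.push(partial); // option: leave this label unused
      for (const c of group) {
        // Only extend if the span does not collide with one already used.
        if (!partial.some((p) => overlaps(p, c))) next.push([...partial, c]);
      }
    }
    solutions = next;
  }
  return solutions.filter((s) => s.length > 0);
}

const solutions = solve([
  { label: 'housenumber', body: '30', start: 0, end: 2 },
  { label: 'street', body: 'w 26 st', start: 3, end: 10 },
  { label: 'housenumber', body: '26', start: 5, end: 7 }
]);
console.log(solutions.map((s) => s.map((c) => c.body).join(' | ')));
// [ 'w 26 st', '30', '30 | w 26 st', '26' ]
```

Note that '26' and 'w 26 st' never appear in the same solution because their positions overlap, and '30' and '26' never appear together because both are house numbers.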

After the ExclusiveCartesianSolver has run there are additional solvers which can:

filter the solutions to remove inconsistencies
add new solutions to provide additional functionality (such as intersections)

Solution Masks

It is possible to produce a simple mask for any generated solution. This is useful for comparing the solution to the original text:

VVV VVVV NN SSSSSSS AAAAAA PPPPP
Foo Cafe 10 Main St London 10010 Earth
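A mask like this could be produced along the following lines. The `buildMask` function, the solution object shape and the label-to-letter mapping (inferred from the example above) are assumptions made for illustration.

```javascript
// Hypothetical sketch: build a mask the same length as the input, writing
// one code letter for every character covered by a classified span.
function buildMask(input, solution, codes) {
  const mask = new Array(input.length).fill(' ');
  for (const { label, start, end } of solution) {
    for (let i = start; i < end; i++) {
      mask[i] = codes[label];
    }
  }
  return mask.join('');
}

// Letter codes inferred from the example mask: N, S and A.
const codes = { housenumber: 'N', street: 'S', locality: 'A' };
const text = '10 Main St London';
const mask = buildMask(text, [
  { label: 'housenumber', start: 0, end: 2 },
  { label: 'street', start: 3, end: 10 },
  { label: 'locality', start: 11, end: 17 }
], codes);

console.log(mask); // 'NN SSSSSSS AAAAAA'
console.log(text); // '10 Main St London'
```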

Contributing

Please fork and pull request against upstream master on a feature branch. Pretty please; provide unit tests.

Unit tests

You can run the unit test suite using the command:

$ npm test

Continuous Integration

CI tests every release against all supported Node.js versions.

Versioning

We rely on semantic-release and Greenkeeper to maintain our module and dependency versions.

No vulnerabilities found.

Dangerous-Workflow (10/10): Determines if the project's GitHub Action workflows avoid dangerous patterns.
Binary-Artifacts (10/10): Determines if the project has generated executable (binary) artifacts in the source repository.
Vulnerabilities (10/10): Determines if the project has open, known unfixed vulnerabilities.
License (10/10): Determines if the project has defined a license.
Code-Review (6/10): Determines if the project requires human code review before pull requests (aka merge requests) are merged.
Maintained (2/10): Determines if the project is "actively maintained".
Token-Permissions (0/10): Determines if the project's workflows follow the principle of least privilege.
CII-Best-Practices (0/10): Determines if the project has an OpenSSF (formerly CII) Best Practices Badge.
Fuzzing (0/10): Determines if the project uses fuzzing.
Security-Policy (0/10): Determines if the project has published a security policy.
Pinned-Dependencies (0/10): Determines if the project has declared and pinned the dependencies of its build process.
SAST (0/10): Determines if the project uses static code analysis.

Score: 4.6/10

Last Scanned on 2024-11-18

The Open Source Security Foundation is a cross-industry collaboration to improve the security of open source software (OSS). The Scorecard provides security health metrics for open source projects.
