npmpackage.info

Gathering detailed insights and metrics for @us3r-network/metadata-scraper

@us3r-network/metadata-scraper - 0.3.3 | npmpackage.info

@us3r-network/metadata-scraper

🏷️ A JavaScript library for scraping/parsing metadata from a web page.

0.3.3

120

MIT

TypeScript

118.42 kB

2,039 2.7

Installations

npm install @us3r-network/metadata-scraper

Developer Guide

BETA

Typescript

Yes

Module System

CommonJS

Node Version

20.10.0

NPM Version

10.2.3 Score

70.1

Supply Chain

98.9

Quality

80.9

Maintenance

100

Vulnerability

100

License

Pull Requests

Open

2

Total

287

Closed

145

Merged

140

Issues

Open

0

Total

4

Closed

4

Releases

v0.2.61

Updated on Oct 03, 2022

v0.2.60

Updated on Sep 12, 2022

v0.2.59

Updated on Aug 29, 2022

v0.2.58

Updated on Aug 13, 2022

v0.2.57

Updated on Jun 27, 2022

v0.2.56

Updated on Jun 13, 2022

View All 64 releases

Languages

TypeScript

JavaScript

TypeScript (97.77%)

JavaScript (2.23%)

Developer

BetaHuhn

Download Statistics

Total Downloads

2,039

Last Day

Last Week

Last Month

Last Year

775

GitHub Statistics

MIT License

120 Stars

414 Commits

18 Forks

3 Watchers

5 Branches

4 Contributors

Updated on Apr 09, 2025

Bundle Size

433.37 kB

Minified

118.42 kB

Minified + Gzipped

Bundlephobia

Maintainers

View All 4 Contributors

Package Meta Information

Latest Version

0.3.3

Package Id

@us3r-network/metadata-scraper@0.3.3

Unpacked Size

46.24 kB

Size

9.11 kB

File Count

NPM Version

10.2.3

Node Version

20.10.0

Published on

Feb 03, 2024

Total Downloads

Cumulative downloads

Total Downloads

2,039

Last Day

Compared to previous day

Last Week

Compared to previous week

Last Month

-39.1%

Compared to previous month

Last Year

-38.7%

775

Compared to previous year

Weekly Downloads

Monthly Downloads

Yearly Downloads

Dependencies

domino got

Dev Dependencies

@betahuhn/config @typescript-eslint/eslint-plugin @typescript-eslint/parser eslint ts-node tsc-watch typescript

metadata-scraper

David

A Javascript library for scraping/parsing metadata from a web page.

👋 Introduction

metadata-scraper is a Javascript library which scrapes/parses metadata from web pages. You only need to supply it with a URL or an HTML string and it will use different rules to find the most relevant metadata like:

Title
Description
Favicons/Images
Language
Keywords
Author
and more (full list below)

🚀 Get started

Install metadata-scraper via npm:

1npm install metadata-scraper

📚 Usage

Import metadata-scraper and pass it a URL or options object:

1const getMetaData = require('metadata-scraper')
2
3const url = 'https://github.com/BetaHuhn/metadata-scraper'
4
5getMetaData(url).then((data) => {
6	console.log(data)
7})

Or with async/await:

1const getMetaData = require('metadata-scraper')
2
3async function run() {
4	const url = 'https://github.com/BetaHuhn/metadata-scraper'
5	const data = await getMetaData(url)
6	console.log(data)
7}
8
9run()

This will return:

1{
2	title: 'BetaHuhn/metadata-scraper',
3	description: 'A Javascript library for scraping/parsing metadata from a web page.',
4	language: 'en',
5	url: 'https://github.com/BetaHuhn/metadata-scraper',
6	provider: 'GitHub',
7	twitter: '@github',
8	image: 'https://avatars1.githubusercontent.com/u/51766171?s=400&v=4',
9	icon: 'https://github.githubassets.com/favicons/favicon.svg'
10}

You can see a list of all metadata which metadata-scraper tries to scrape below.

⚙️ Configuration

You can change the behaviour of metadata-scraper by passing an options object:

1const getMetaData = require('metadata-scraper')
2
3const options = {
4	url: 'https://github.com/BetaHuhn/metadata-scraper', // URL of web page
5	maxRedirects: 0, // Maximum number of redirects to follow (default: 5)
6	ua: 'MyApp', // Specify User-Agent header
7	lang: 'de-CH', // Specify Accept-Language header
8	timeout: 1000, // Request timeout in milliseconds (default: 10000ms)
9	forceImageHttps: false, // Force all image URLs to use https (default: true)
10	customRules: {} // more info below
11}
12
13getMetaData(options).then((data) => {
14	console.log(data)
15})

You can specify the URL by either passing it as the first parameter, or by setting it in the options object.

📖 Examples

Here are some examples on how to use metadata-scraper:

Basic

Pass a URL as the first parameter and metadata-scraper automatically scrapes it and returns everything it finds:

1const getMetaData = require('metadata-scraper')
2const data = await getMetaData('https://github.com/BetaHuhn/metadata-scraper')

Example file located at examples/basic.js.

HTML String

If you already have an HTML string and don't want metadata-scraper to make an http request, specify it in the options object:

1const getMetaData = require('metadata-scraper')
2
3const html = `
4	<meta name="og:title" content="Example">
5	<meta name="og:description" content="This is an example.">
6`
7
8const options {
9	html: html, 
10	url: 'https://example.com' // Optional URL to make relative image paths absolute
11}
12
13const data = await getMetaData(options)

Example file located at examples/html.js.

Custom Rules

Look at the rules.ts file in the src directory to see all rules which will be used.

You can expand metadata-scraper easily by specifying custom rules:

1const getMetaData = require('metadata-scraper')
2
3const options = {
4	url: 'https://github.com/BetaHuhn/metadata-scraper',
5	customRules: {
6		name: {
7			rules: [
8				[ 'meta[name="customName"][content]', (element) => element.getAttribute('content') ]
9			],
10			processor: (text) => text.toLowerCase()
11		}
12	}
13}
14
15const data = await getMetaData(options)

customRules needs to contain one or more objects, where the key (name above) will identify the value in the returned data.

You can then specify different rules for each item in the rules array.

The first item is the query which gets inserted into the browsers querySelector function, and the second item is a function which gets passed the HTML element:

1[ 'querySelector', (element) => element.innerText ]

You can also specify a processor function which will process/transform the result of one of the matched rules:

1{
2	processor: (text) => text.toLowerCase()
3}

If you find a useful rule, let me know and I will add it (or create a PR yourself).

Example file located at examples/custom.js.

📇 All metadata

Here's what metadata-scraper currently tries to scrape:

1{
2	title: 'Title of page or article',
3	description: 'Description of page or article',
4	language: 'Language of page or article',
5	type: 'Page type',
6	url: 'URL of page',
7	provider: 'Page provider',
8	keywords: ['array', 'of', 'keywords'],
9	section: 'Section/Category of page',
10	author: 'Article author',
11	published: 1605221765, // Date the article was published
12	modified: 1605221765, // Date the article was modified
13	robots: ['array', 'for', 'robots'],
14	copyright: 'Page copyright',
15	email: 'Contact email',
16	twitter: 'Twitter handle',
17	facebook: 'Facebook account id',
18	image: 'Image URL',
19	icon: 'Favicon URL',
20	video: 'Video URL',
21	audio: 'Audio URL'
22}

If you find a useful metatag, let me know and I will add it (or create a PR yourself).

💻 Development

Issues and PRs are very welcome!

Please check out the contributing guide before you start.

This project adheres to Semantic Versioning. To see differences with previous versions refer to the CHANGELOG.

❔ About

This library was developed by me (@betahuhn) in my free time. If you want to support me:

Credits

This library is based on Mozilla's page-metadata-parser. I converted it to TypeScript, implemented a few new features, and added more rules.

License

This project is licensed under the MIT License - see the LICENSE file for details.

No vulnerabilities found.

10

Binary-Artifacts

Determines if the project has generated executable (binary) artifacts in the source repository.

10

Dangerous-Workflow

Determines if the project's GitHub Action workflows avoid dangerous patterns.

10

License

Determines if the project has defined a license.

3

Pinned-Dependencies

Determines if the project has declared and pinned the dependencies of its build process.

Reason

dependency not pinned by hash detected -- score normalized to 3

Details

Warn: third-party GitHubAction not pinned by hash: .github/workflows/dependabot.yml:12: update your workflow using https://app.stepsecurity.io/secureworkflow/BetaHuhn/metadata-scraper/dependabot.yml/master?enable=pin
Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/node.yml:16: update your workflow using https://app.stepsecurity.io/secureworkflow/BetaHuhn/metadata-scraper/node.yml/master?enable=pin
Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/node.yml:18: update your workflow using https://app.stepsecurity.io/secureworkflow/BetaHuhn/metadata-scraper/node.yml/master?enable=pin
Warn: third-party GitHubAction not pinned by hash: .github/workflows/node.yml:22: update your workflow using https://app.stepsecurity.io/secureworkflow/BetaHuhn/metadata-scraper/node.yml/master?enable=pin
Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/node.yml:33: update your workflow using https://app.stepsecurity.io/secureworkflow/BetaHuhn/metadata-scraper/node.yml/master?enable=pin
Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/node.yml:35: update your workflow using https://app.stepsecurity.io/secureworkflow/BetaHuhn/metadata-scraper/node.yml/master?enable=pin
Warn: third-party GitHubAction not pinned by hash: .github/workflows/node.yml:39: update your workflow using https://app.stepsecurity.io/secureworkflow/BetaHuhn/metadata-scraper/node.yml/master?enable=pin
Warn: third-party GitHubAction not pinned by hash: .github/workflows/release-scheduler.yml:11: update your workflow using https://app.stepsecurity.io/secureworkflow/BetaHuhn/metadata-scraper/release-scheduler.yml/master?enable=pin
Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/release.yml:13: update your workflow using https://app.stepsecurity.io/secureworkflow/BetaHuhn/metadata-scraper/release.yml/master?enable=pin
Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/release.yml:15: update your workflow using https://app.stepsecurity.io/secureworkflow/BetaHuhn/metadata-scraper/release.yml/master?enable=pin
Warn: third-party GitHubAction not pinned by hash: .github/workflows/release.yml:19: update your workflow using https://app.stepsecurity.io/secureworkflow/BetaHuhn/metadata-scraper/release.yml/master?enable=pin
Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/release.yml:31: update your workflow using https://app.stepsecurity.io/secureworkflow/BetaHuhn/metadata-scraper/release.yml/master?enable=pin
Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/release.yml:33: update your workflow using https://app.stepsecurity.io/secureworkflow/BetaHuhn/metadata-scraper/release.yml/master?enable=pin
Warn: third-party GitHubAction not pinned by hash: .github/workflows/release.yml:37: update your workflow using https://app.stepsecurity.io/secureworkflow/BetaHuhn/metadata-scraper/release.yml/master?enable=pin
Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/stale.yml:9: update your workflow using https://app.stepsecurity.io/secureworkflow/BetaHuhn/metadata-scraper/stale.yml/master?enable=pin
Info: 0 out of 9 GitHub-owned GitHubAction dependencies pinned
Info: 0 out of 6 third-party GitHubAction dependencies pinned
Info: 4 out of 4 npmCommand dependencies pinned

0

Code-Review

Determines if the project requires human code review before pull requests (aka merge requests) are merged.

0

Maintained

Determines if the project is "actively maintained".

0

Token-Permissions

Determines if the project's workflows follow the principle of least privilege.

0

CII-Best-Practices

Determines if the project has an OpenSSF (formerly CII) Best Practices Badge.

0

Fuzzing

Determines if the project uses fuzzing.

0

Security-Policy

Determines if the project has published a security policy.

0

Branch-Protection

Determines if the default and release branches are protected with GitHub's branch protection settings.

0

SAST

Determines if the project uses static code analysis.

0

Vulnerabilities

Determines if the project has open, known unfixed vulnerabilities.

Score

2.7

/10

Last Scanned on 2025-06-30

The Open Source Security Foundation is a cross-industry collaboration to improve the security of open source software (OSS). The Scorecard provides security health metrics for open source projects.

Learn More