Installations
npm install metadata-scraper
Developer Guide
TypeScript: Yes
Module System: CommonJS
Node Version: 14.20.0
NPM Version: 7.21.1
Score: 77.5
Supply Chain: 99.5
Quality: 75.7
Maintenance: 100
Vulnerability: 100
Languages
TypeScript (97.77%)
JavaScript (2.23%)
Download Statistics
Total Downloads: 839,450
Last Day: 1,439
Last Week: 7,804
Last Month: 26,834
Last Year: 420,801
GitHub Statistics
113 Stars
414 Commits
18 Forks
4 Watching
6 Branches
4 Contributors
Bundle Size
Minified: 430.59 kB
Minified + Gzipped: 118.19 kB
Package Meta Information
Latest Version: 0.2.61
Package Id: metadata-scraper@0.2.61
Unpacked Size: 42.20 kB
Size: 8.73 kB
File Count: 11
Total Downloads
Cumulative: 839,450
Last day: 1,439 (12.2% compared to previous day)
Last week: 7,804 (4.8% compared to previous week)
Last month: 26,834 (-31.2% compared to previous month)
Last year: 420,801 (35.4% compared to previous year)
👋 Introduction
metadata-scraper is a JavaScript library that scrapes and parses metadata from web pages. Supply a URL or an HTML string, and it applies a set of rules to find the most relevant metadata, such as:
- Title
- Description
- Favicons/Images
- Language
- Keywords
- Author
- and more (full list below)
🚀 Get started
Install metadata-scraper via npm:
npm install metadata-scraper
📚 Usage
Import metadata-scraper and pass it a URL or options object:
const getMetaData = require('metadata-scraper')

const url = 'https://github.com/BetaHuhn/metadata-scraper'

getMetaData(url).then((data) => {
  console.log(data)
})
Or with async/await:
const getMetaData = require('metadata-scraper')

async function run() {
  const url = 'https://github.com/BetaHuhn/metadata-scraper'
  const data = await getMetaData(url)
  console.log(data)
}

run()
This will return:
{
  title: 'BetaHuhn/metadata-scraper',
  description: 'A Javascript library for scraping/parsing metadata from a web page.',
  language: 'en',
  url: 'https://github.com/BetaHuhn/metadata-scraper',
  provider: 'GitHub',
  twitter: '@github',
  image: 'https://avatars1.githubusercontent.com/u/51766171?s=400&v=4',
  icon: 'https://github.githubassets.com/favicons/favicon.svg'
}
A full list of the metadata that metadata-scraper tries to scrape appears below.
⚙️ Configuration
You can change the behaviour of metadata-scraper by passing an options object:
const getMetaData = require('metadata-scraper')

const options = {
  url: 'https://github.com/BetaHuhn/metadata-scraper', // URL of web page
  maxRedirects: 0, // Maximum number of redirects to follow (default: 5)
  ua: 'MyApp', // Specify User-Agent header
  lang: 'de-CH', // Specify Accept-Language header
  timeout: 1000, // Request timeout in milliseconds (default: 10000ms)
  forceImageHttps: false, // Force all image URLs to use https (default: true)
  customRules: {} // more info below
}

getMetaData(options).then((data) => {
  console.log(data)
})
You can specify the URL by either passing it as the first parameter, or by setting it in the options object.
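To make the documented defaults concrete, the sketch below merges caller-supplied options with the default values listed in the option comments above. `DEFAULTS` and `withDefaults` are hypothetical names used only for illustration; they are not part of metadata-scraper, and the library's own option handling may differ.

```javascript
// Defaults copied from the option comments above.
// Options without a documented default are left out.
const DEFAULTS = {
  maxRedirects: 5,
  timeout: 10000,
  forceImageHttps: true
}

// Hypothetical helper: accepts either a URL string or an options object,
// mirroring the two call styles described above.
function withDefaults(input) {
  const options = typeof input === 'string' ? { url: input } : input
  // Later spreads win, so caller-supplied values override the defaults.
  return { ...DEFAULTS, ...options }
}

console.log(withDefaults('https://example.com'))
console.log(withDefaults({ url: 'https://example.com', timeout: 1000 }))
```

Both calls produce an object with the same shape, which is one way a function can support a plain URL and an options object through a single code path.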
📖 Examples
Here are some examples of how to use metadata-scraper:
Basic
Pass a URL as the first parameter and metadata-scraper automatically scrapes it and returns everything it finds:
const getMetaData = require('metadata-scraper')

// Top-level await is unavailable in CommonJS, so use the returned promise
getMetaData('https://github.com/BetaHuhn/metadata-scraper').then((data) => console.log(data))
Example file located at examples/basic.js.
HTML String
If you already have an HTML string and don't want metadata-scraper to make an HTTP request, specify it in the options object:
const getMetaData = require('metadata-scraper')

const html = `
  <meta name="og:title" content="Example">
  <meta name="og:description" content="This is an example.">
`

const options = {
  html: html,
  url: 'https://example.com' // Optional URL to make relative image paths absolute
}

// Top-level await is unavailable in CommonJS, so use the returned promise
getMetaData(options).then((data) => console.log(data))
Example file located at examples/html.js.
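The optional url matters because metadata in an HTML string may reference images by a relative path. Below is a minimal sketch of that resolution step using Node's built-in URL class. `toAbsolute` is a hypothetical helper, and this is not necessarily how metadata-scraper resolves paths internally.

```javascript
// Hypothetical helper: resolves a possibly relative path against a base URL.
// Without a base URL the path is returned unchanged.
function toAbsolute(path, baseUrl) {
  if (!baseUrl) return path
  // The URL constructor leaves already-absolute URLs untouched
  // and resolves relative ones against the base.
  return new URL(path, baseUrl).href
}

console.log(toAbsolute('/images/cover.png', 'https://example.com'))
console.log(toAbsolute('https://cdn.example.com/a.png', 'https://example.com'))
console.log(toAbsolute('/images/cover.png'))
```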
Custom Rules
Look at the rules.ts file in the src directory to see all rules which will be used.
You can expand metadata-scraper easily by specifying custom rules:
const getMetaData = require('metadata-scraper')

const options = {
  url: 'https://github.com/BetaHuhn/metadata-scraper',
  customRules: {
    name: {
      rules: [
        [ 'meta[name="customName"][content]', (element) => element.getAttribute('content') ]
      ],
      processor: (text) => text.toLowerCase()
    }
  }
}

// Top-level await is unavailable in CommonJS, so use the returned promise
getMetaData(options).then((data) => console.log(data))
customRules needs to contain one or more objects, where the key (name above) identifies the value in the returned data.
You can then specify different rules for each item in the rules array.
In each rule, the first item is the query that is passed to querySelector, and the second item is a function that receives the matched HTML element:
[ 'querySelector', (element) => element.innerText ]
You can also specify a processor function which will process/transform the result of one of the matched rules:
{
  processor: (text) => text.toLowerCase()
}
If you find a useful rule, let me know and I will add it (or create a PR yourself).
Example file located at examples/custom.js.
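To illustrate how a rule set of this shape could be evaluated, here is a self-contained sketch that does not use the library. `applyRuleSet` and `fakeDoc` are hypothetical, and the "first non-empty match wins" order is an assumption; see rules.ts for how metadata-scraper actually selects a value.

```javascript
// Hypothetical evaluator: tries each [selector, extractor] pair in order and
// returns the first non-empty value, passed through the optional processor.
function applyRuleSet(doc, ruleSet) {
  for (const [selector, extract] of ruleSet.rules) {
    const element = doc.querySelector(selector)
    if (!element) continue
    const value = extract(element)
    if (value) {
      return ruleSet.processor ? ruleSet.processor(value) : value
    }
  }
  return undefined
}

// Stand-in for a parsed document: it only recognises one selector.
const fakeDoc = {
  querySelector(selector) {
    if (selector === 'meta[name="customName"][content]') {
      return { getAttribute: (attr) => (attr === 'content' ? 'My Custom NAME' : null) }
    }
    return null
  }
}

const nameRule = {
  rules: [
    ['meta[name="missing"][content]', (element) => element.getAttribute('content')],
    ['meta[name="customName"][content]', (element) => element.getAttribute('content')]
  ],
  processor: (text) => text.toLowerCase()
}

console.log(applyRuleSet(fakeDoc, nameRule)) // my custom name
```

The first rule finds no element, so the second rule supplies the value and the processor lowercases it.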
📇 All metadata
Here's what metadata-scraper currently tries to scrape:
{
  title: 'Title of page or article',
  description: 'Description of page or article',
  language: 'Language of page or article',
  type: 'Page type',
  url: 'URL of page',
  provider: 'Page provider',
  keywords: ['array', 'of', 'keywords'],
  section: 'Section/Category of page',
  author: 'Article author',
  published: 1605221765, // Date the article was published
  modified: 1605221765, // Date the article was modified
  robots: ['array', 'for', 'robots'],
  copyright: 'Page copyright',
  email: 'Contact email',
  twitter: 'Twitter handle',
  facebook: 'Facebook account id',
  image: 'Image URL',
  icon: 'Favicon URL',
  video: 'Video URL',
  audio: 'Audio URL'
}
If you find a useful metatag, let me know and I will add it (or create a PR yourself).
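The published and modified values in the listing look like Unix timestamps in seconds. Assuming that is the unit (the listing does not state it), they can be converted to JavaScript Date objects as sketched here. `toDate` is a hypothetical helper, not part of the library.

```javascript
// Assumption: the timestamp is in seconds since the Unix epoch.
// JavaScript Dates expect milliseconds, hence the multiplication.
function toDate(unixSeconds) {
  return new Date(unixSeconds * 1000)
}

console.log(toDate(1605221765).toISOString()) // 2020-11-12T22:56:05.000Z
```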
💻 Development
Issues and PRs are very welcome!
Please check out the contributing guide before you start.
This project adheres to Semantic Versioning. To see differences with previous versions refer to the CHANGELOG.
❔ About
This library was developed by me (@betahuhn) in my free time.
Credits
This library is based on Mozilla's page-metadata-parser. I converted it to TypeScript, implemented a few new features, and added more rules.
License
Copyright 2020 Maximilian Schiller
This project is licensed under the MIT License - see the LICENSE file for details.
No vulnerabilities found.
Reason
no binaries found in the repo
Reason
no dangerous workflow patterns detected
Reason
license file detected
Details
- Info: project has a license file: LICENSE:0
- Info: FSF or OSI recognized license: MIT License: LICENSE:0
Reason
dependency not pinned by hash detected -- score normalized to 3
Details
- Warn: third-party GitHubAction not pinned by hash: .github/workflows/dependabot.yml:12: update your workflow using https://app.stepsecurity.io/secureworkflow/BetaHuhn/metadata-scraper/dependabot.yml/master?enable=pin
- Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/node.yml:16: update your workflow using https://app.stepsecurity.io/secureworkflow/BetaHuhn/metadata-scraper/node.yml/master?enable=pin
- Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/node.yml:18: update your workflow using https://app.stepsecurity.io/secureworkflow/BetaHuhn/metadata-scraper/node.yml/master?enable=pin
- Warn: third-party GitHubAction not pinned by hash: .github/workflows/node.yml:22: update your workflow using https://app.stepsecurity.io/secureworkflow/BetaHuhn/metadata-scraper/node.yml/master?enable=pin
- Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/node.yml:33: update your workflow using https://app.stepsecurity.io/secureworkflow/BetaHuhn/metadata-scraper/node.yml/master?enable=pin
- Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/node.yml:35: update your workflow using https://app.stepsecurity.io/secureworkflow/BetaHuhn/metadata-scraper/node.yml/master?enable=pin
- Warn: third-party GitHubAction not pinned by hash: .github/workflows/node.yml:39: update your workflow using https://app.stepsecurity.io/secureworkflow/BetaHuhn/metadata-scraper/node.yml/master?enable=pin
- Warn: third-party GitHubAction not pinned by hash: .github/workflows/release-scheduler.yml:11: update your workflow using https://app.stepsecurity.io/secureworkflow/BetaHuhn/metadata-scraper/release-scheduler.yml/master?enable=pin
- Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/release.yml:13: update your workflow using https://app.stepsecurity.io/secureworkflow/BetaHuhn/metadata-scraper/release.yml/master?enable=pin
- Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/release.yml:15: update your workflow using https://app.stepsecurity.io/secureworkflow/BetaHuhn/metadata-scraper/release.yml/master?enable=pin
- Warn: third-party GitHubAction not pinned by hash: .github/workflows/release.yml:19: update your workflow using https://app.stepsecurity.io/secureworkflow/BetaHuhn/metadata-scraper/release.yml/master?enable=pin
- Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/release.yml:31: update your workflow using https://app.stepsecurity.io/secureworkflow/BetaHuhn/metadata-scraper/release.yml/master?enable=pin
- Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/release.yml:33: update your workflow using https://app.stepsecurity.io/secureworkflow/BetaHuhn/metadata-scraper/release.yml/master?enable=pin
- Warn: third-party GitHubAction not pinned by hash: .github/workflows/release.yml:37: update your workflow using https://app.stepsecurity.io/secureworkflow/BetaHuhn/metadata-scraper/release.yml/master?enable=pin
- Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/stale.yml:9: update your workflow using https://app.stepsecurity.io/secureworkflow/BetaHuhn/metadata-scraper/stale.yml/master?enable=pin
- Info: 0 out of 9 GitHub-owned GitHubAction dependencies pinned
- Info: 0 out of 6 third-party GitHubAction dependencies pinned
- Info: 4 out of 4 npmCommand dependencies pinned
Reason
0 commit(s) and 0 issue activity found in the last 90 days -- score normalized to 0
Reason
Found 1/16 approved changesets -- score normalized to 0
Reason
detected GitHub workflow tokens with excessive permissions
Details
- Warn: no topLevel permission defined: .github/workflows/dependabot.yml:1
- Warn: no topLevel permission defined: .github/workflows/node.yml:1
- Warn: no topLevel permission defined: .github/workflows/release-scheduler.yml:1
- Warn: no topLevel permission defined: .github/workflows/release.yml:1
- Warn: no topLevel permission defined: .github/workflows/stale.yml:1
- Info: no jobLevel write permissions found
Reason
no effort to earn an OpenSSF best practices badge detected
Reason
project is not fuzzed
Details
- Warn: no fuzzer integrations found
Reason
branch protection not enabled on development/release branches
Details
- Warn: branch protection not enabled for branch 'master'
Reason
security policy file not detected
Details
- Warn: no security policy file detected
- Warn: no security file to analyze
Reason
SAST tool is not run on all commits -- score normalized to 0
Details
- Warn: 0 commits out of 15 are checked with a SAST tool
Reason
34 existing vulnerabilities detected
Details
- Warn: Project is vulnerable to: GHSA-93q8-gq69-wqmw
- Warn: Project is vulnerable to: GHSA-grv7-fg5c-xmjg
- Warn: Project is vulnerable to: GHSA-3xgq-45jj-v275
- Warn: Project is vulnerable to: GHSA-w573-4hg7-7wgq
- Warn: Project is vulnerable to: GHSA-ww39-953v-wcq6
- Warn: Project is vulnerable to: GHSA-pfrx-2q88-qq97
- Warn: Project is vulnerable to: GHSA-rc47-6667-2j5j
- Warn: Project is vulnerable to: GHSA-78xj-cgh5-2h22
- Warn: Project is vulnerable to: GHSA-2p57-rm9w-gvfp
- Warn: Project is vulnerable to: GHSA-896r-f27r-55mw
- Warn: Project is vulnerable to: GHSA-5v2h-r2cx-5xgj
- Warn: Project is vulnerable to: GHSA-rrrm-qjm4-v8hf
- Warn: Project is vulnerable to: GHSA-952p-6rrq-rcjv
- Warn: Project is vulnerable to: GHSA-f8q6-p94x-37v3
- Warn: Project is vulnerable to: GHSA-xvch-5gv4-984h
- Warn: Project is vulnerable to: GHSA-r683-j2x4-v87g
- Warn: Project is vulnerable to: GHSA-hj9c-8jmm-8c52
- Warn: Project is vulnerable to: GHSA-3j8f-xvm3-ffx4
- Warn: Project is vulnerable to: GHSA-4p35-cfcx-8653
- Warn: Project is vulnerable to: GHSA-7f3x-x4pr-wqhj
- Warn: Project is vulnerable to: GHSA-jpp7-7chh-cf67
- Warn: Project is vulnerable to: GHSA-q6wq-5p59-983w
- Warn: Project is vulnerable to: GHSA-j9fq-vwqv-2fm2
- Warn: Project is vulnerable to: GHSA-pqw5-jmp5-px4v
- Warn: Project is vulnerable to: GHSA-hrpp-h998-j3pp
- Warn: Project is vulnerable to: GHSA-p8p7-x288-28g6
- Warn: Project is vulnerable to: GHSA-x2pg-mjhr-2m5x
- Warn: Project is vulnerable to: GHSA-c2qf-rxjj-qqgw
- Warn: Project is vulnerable to: GHSA-44c6-4v22-4mhx
- Warn: Project is vulnerable to: GHSA-4x5v-gmq8-25ch
- Warn: Project is vulnerable to: GHSA-f5x3-32g6-xq36
- Warn: Project is vulnerable to: GHSA-72xf-g2v4-qvf3
- Warn: Project is vulnerable to: GHSA-38fc-wpqx-33j7
- Warn: Project is vulnerable to: GHSA-j8xg-fqg3-53r7
Score: 2.7/10
Last Scanned on 2025-01-13
The Open Source Security Foundation is a cross-industry collaboration to improve the security of open source software (OSS). The Scorecard provides security health metrics for open source projects.
Other packages similar to metadata-scraper
open-graph-scraper
Node.js scraper module for Open Graph and Twitter Card info
@future-scholars/paperlib-metadata-scrape-extension
This extension scrapes the metadata for a paper from the web database for Paperlib.
url-metadata
Request a url and scrape the metadata from its HTML using Node.js or the browser.
@us3r-network/metadata-scraper
A Javascript library for scraping/parsing metadata from a web page.