Gathering detailed insights and metrics for @extractus/feed-extractor
Simplest way to read & normalize RSS/ATOM/JSON feed data
npm install @extractus/feed-extractor
Supply Chain: 57.4
Quality: 99
Maintenance: 81.8
Vulnerability: 100
License: 100
Languages: JavaScript (100%)
Total Downloads: 231,572
Last Day: 238
Last Week: 2,409
Last Month: 13,754
Last Year: 143,182
170 Stars
272 Commits
33 Forks
5 Watching
1 Branch
11 Contributors
Latest Version: 7.1.3
Package Id: @extractus/feed-extractor@7.1.3
Unpacked Size: 118.61 kB
Size: 29.96 kB
File Count: 28
NPM Version: 10.7.0
Node Version: 20.12.2
Published On: 07 May 2024
Downloads compared to the previous period:
Last day: 238 (-66.9%)
Last week: 2,409 (-36%)
Last month: 13,754 (-23.8%)
Last year: 143,182 (+63.5%)
To read & normalize RSS/ATOM/JSON feed data.
(This library was formerly published as feed-reader and has been renamed.)
```bash
npm i @extractus/feed-extractor
```
```js
import { extract } from '@extractus/feed-extractor'

// extract an RSS feed
const result = await extract('https://news.google.com/rss')
console.log(result)
```
In Deno:

```js
import { extract } from 'npm:@extractus/feed-extractor'
```
Or import the ESM build directly from a CDN in the browser:

```js
import { extract } from 'https://esm.sh/@extractus/feed-extractor'
```
Please check the examples for reference.
RSS Feed Fetch Action is a GitHub Action designed to automate the fetching of RSS feeds. It fetches an RSS feed from a given URL and saves it to a specified file in your GitHub repository. This action is particularly useful for populating content on GitHub Pages websites or other static site generators.
CJS is deprecated for this package. When calling require('@extractus/feed-extractor'), a deprecation warning is now logged. You should update your code to use the ESM export.

Set FEED_EXTRACTOR_CJS_IGNORE_WARNING=true to suppress the deprecation warning, or set FEED_EXTRACTOR_CJS_TRACE_WARNING=true to log a stack trace with the warning so you can locate where the CJS entry point is being required.
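If you cannot switch to ESM right away, a dynamic import() from CommonJS is one way to reach the ESM export; the sketch below is an assumption about your setup, not part of this package's documented API.

```js
// CommonJS file that still needs the library (a minimal sketch; the wrapper name is arbitrary)
async function loadFeed (url) {
  // dynamic import() works from CJS and avoids the deprecated require() path
  const { extract } = await import('@extractus/feed-extractor')
  return extract(url)
}

loadFeed('https://news.google.com/rss')
  .then((feed) => console.log(feed.title))
  .catch(console.error)
```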
read() has been marked as deprecated and will be removed in the next major release. Use extract() instead.

extract()
Load and extract feed data from a given RSS/ATOM/JSON source. Returns a Promise object.
```
extract(String url)
extract(String url, Object parserOptions)
extract(String url, Object parserOptions, Object fetchOptions)
```
Example:
```js
import { extract } from '@extractus/feed-extractor'

const result = await extract('https://news.google.com/atom')
console.log(result)
```
Without any options, the result should have the following structure:
```
{
  title: String,
  link: String,
  description: String,
  generator: String,
  language: String,
  published: ISO Date String,
  entries: Array[
    {
      id: String,
      title: String,
      link: String,
      description: String,
      published: ISO Datetime String
    },
    // ...
  ]
}
```
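Assuming that shape, the normalized entries can be consumed directly; the feed URL below is only an illustrative choice.

```js
import { extract } from '@extractus/feed-extractor'

const feed = await extract('https://news.google.com/rss')

console.log(feed.title)
// each normalized entry exposes the same fields regardless of the source format
for (const entry of feed.entries) {
  console.log(entry.published, entry.title)
  console.log(entry.link)
}
```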
url
required. URL of a valid feed source.
Feed content must be accessible and conform to one of the following standards: RSS, Atom, or JSON Feed.
parserOptions
optional. Object with all or several of the following properties:
- normalization: Boolean, normalize feed data or keep original. Default true.
- useISODateFormat: Boolean, convert datetime to ISO format. Default true.
- descriptionMaxLen: Number, to truncate description. Default 250 characters. Set to 0 for no truncation.
- xmlParserOptions: Object, used by the XML parser; see fast-xml-parser's docs.
- getExtraFeedFields: Function, to get more fields from feed data.
- getExtraEntryFields: Function, to get more fields from feed entry data.
- baseUrl: URL string, to absolutify the links within feed content.

For example:
```js
import { extract } from '@extractus/feed-extractor'

await extract('https://news.google.com/atom', {
  useISODateFormat: false
})

// small helper used below; not exported by the library
const isString = (value) => typeof value === 'string'

await extract('https://news.google.com/rss', {
  useISODateFormat: false,
  getExtraFeedFields: (feedData) => {
    return {
      subtitle: feedData.subtitle || ''
    }
  },
  getExtraEntryFields: (feedEntry) => {
    const {
      enclosure,
      category
    } = feedEntry
    return {
      enclosure: {
        url: enclosure['@_url'],
        type: enclosure['@_type'],
        length: enclosure['@_length']
      },
      category: isString(category) ? category : {
        text: category['@_text'],
        domain: category['@_domain']
      }
    }
  }
})
```
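The remaining options can be combined in a single call. Everything in the sketch below (the feed URL, the baseUrl value, the fast-xml-parser setting) is an illustrative assumption rather than part of this package's documentation.

```js
import { extract } from '@extractus/feed-extractor'

// hypothetical feed whose entries contain relative links
const feed = await extract('https://example.com/blog/feed.xml', {
  descriptionMaxLen: 0,           // 0 disables description truncation
  baseUrl: 'https://example.com', // absolutify relative links in the feed content
  xmlParserOptions: {
    ignoreAttributes: false       // forwarded to fast-xml-parser
  }
})
console.log(feed.entries.length)
```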
fetchOptions
optional. fetchOptions is an object that can have the following properties:
- headers: to set request headers
- proxy: another endpoint to forward the request to
- agent: an HTTP proxy agent
- signal: AbortController signal or AbortSignal timeout to terminate the request

For example, you can use this param to set request headers for the fetch, as below:
```js
import { extract } from '@extractus/feed-extractor'

const url = 'https://news.google.com/rss'
await extract(url, null, {
  headers: {
    'user-agent': 'Opera/9.60 (Windows NT 6.0; U; en) Presto/2.1.1'
  }
})
```
You can also specify a proxy endpoint to load remote content, instead of fetching directly.
For example:
```js
import { extract } from '@extractus/feed-extractor'

const url = 'https://news.google.com/rss'

await extract(url, null, {
  headers: {
    'user-agent': 'Opera/9.60 (Windows NT 6.0; U; en) Presto/2.1.1'
  },
  proxy: {
    target: 'https://your-secret-proxy.io/loadXml?url=',
    headers: {
      'Proxy-Authorization': 'Bearer YWxhZGRpbjpvcGVuc2VzYW1l...'
    }
  }
})
```
Passing requests through a proxy is useful when running @extractus/feed-extractor in the browser. View examples/browser-feed-reader as a reference example.
Another way to work with a proxy is to use the agent option instead of proxy, as below:
```js
import { extract } from '@extractus/feed-extractor'
import { HttpsProxyAgent } from 'https-proxy-agent'

const proxy = 'http://abc:RaNdoMpasswORd_country-France@proxy.packetstream.io:31113'

const url = 'https://news.google.com/rss'

const feed = await extract(url, null, {
  agent: new HttpsProxyAgent(proxy),
})
console.log('Run feed-extractor with proxy:', proxy)
console.log(feed)
```
For more info about https-proxy-agent, check its repo.
By default, there is no request timeout. You can use the signal option to cancel the request at the right time.
The common way is to use an AbortController:
```js
const controller = new AbortController()

// stop after 5 seconds
setTimeout(() => {
  controller.abort()
}, 5000)

const data = await extract(url, null, {
  signal: controller.signal,
})
```
A newer solution is AbortSignal's timeout() static method:
```js
// stop after 5 seconds
const data = await extract(url, null, {
  signal: AbortSignal.timeout(5000),
})
```
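When the signal fires before the response arrives, the promise returned by extract() rejects. A minimal sketch of handling that case follows; the exact error name depends on the runtime, so the check below is an assumption.

```js
import { extract } from '@extractus/feed-extractor'

const url = 'https://news.google.com/rss'

try {
  const data = await extract(url, null, {
    signal: AbortSignal.timeout(5000),
  })
  console.log(data.title)
} catch (err) {
  // a timed-out or aborted request typically rejects with TimeoutError or AbortError
  if (err.name === 'TimeoutError' || err.name === 'AbortError') {
    console.error('Feed request was cancelled before it completed')
  } else {
    throw err
  }
}
```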
For more info, see the MDN documentation on AbortController and AbortSignal.timeout().
extractFromJson()
Extract feed data from a JSON string. Returns an object which contains the feed data.
```
extractFromJson(String json)
extractFromJson(String json, Object parserOptions)
```
Example:
```js
import { extractFromJson } from '@extractus/feed-extractor'

const url = 'https://www.jsonfeed.org/feed.json'
// this resource provides data in JSON feed format
// so we fetch remote content as json
// then pass to feed-extractor
const res = await fetch(url)
const json = await res.json()

const feed = extractFromJson(json)
console.log(feed)
```
json
required. JSON string loaded from a JSON feed resource.
parserOptions
optional. See parserOptions above.
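The same parserOptions documented above apply here as well. For example, a sketch that keeps full descriptions and passes an extra field through; the "author" field is an assumption about the feed's content, not something this package guarantees.

```js
import { extractFromJson } from '@extractus/feed-extractor'

const res = await fetch('https://www.jsonfeed.org/feed.json')
const json = await res.json()

const feed = extractFromJson(json, {
  descriptionMaxLen: 0, // 0 disables description truncation
  getExtraEntryFields: (item) => {
    // pass the JSON Feed "author" object through, if the item has one
    return item.author ? { author: item.author } : {}
  }
})
console.log(feed.entries[0])
```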
extractFromXml()
Extract feed data from an XML string. Returns an object which contains the feed data.
```
extractFromXml(String xml)
extractFromXml(String xml, Object parserOptions)
```
Example:
```js
import { extractFromXml } from '@extractus/feed-extractor'

const url = 'https://news.google.com/atom'
// this resource provides data in ATOM feed format
// so we fetch remote content as text
// then pass to feed-extractor
const res = await fetch(url)
const xml = await res.text()

const feed = extractFromXml(xml)
console.log(feed)
```
xml
required. XML string loaded from an RSS/ATOM feed resource.
parserOptions
optional. See parserOptions above.
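As with extract(), parserOptions can change how the XML is handled. For instance, disabling normalization returns the parsed feed in its original shape instead of the normalized one; the sketch below reuses the same Atom feed as above.

```js
import { extractFromXml } from '@extractus/feed-extractor'

const res = await fetch('https://news.google.com/atom')
const xml = await res.text()

// with normalization disabled, the original parsed structure is returned
// instead of the normalized { title, link, entries, ... } shape
const rawFeed = extractFromXml(xml, {
  normalization: false
})
console.log(rawFeed)
```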
To set up the project and run the test suite:

```bash
git clone https://github.com/extractus/feed-extractor.git
cd feed-extractor
pnpm i
pnpm test
```
To quickly evaluate extraction against a live feed:

```bash
git clone https://github.com/extractus/feed-extractor.git
cd feed-extractor
pnpm i
pnpm eval https://news.google.com/rss
```
The MIT License (MIT)
If you find value in this open source project, you can support it in several ways.
Thank you.
No vulnerabilities found.