A lightweight robots.txt parser for Node.js with support for wildcards, caching and promises.
npm install robots-txt-parser
Language: JavaScript (100%)
MIT License · 14 Stars · 80 Commits · 9 Forks · 1 Watcher · 3 Branches · 2 Contributors · Updated on Jul 09, 2025
Latest Version: 2.0.3
Package Id: robots-txt-parser@2.0.3
Unpacked Size: 58.23 kB
Size: 14.52 kB
File Count: 41
NPM Version: 7.14.0
Node Version: 16.2.0
Via NPM: `npm install robots-txt-parser --save`.
After installing robots-txt-parser, it needs to be required and initialised:
```js
const robotsParser = require('robots-txt-parser');
const robots = robotsParser({
  userAgent: 'Googlebot', // The default user agent to use when looking for allow/disallow rules; if this agent isn't listed in the active robots.txt, * is used.
  allowOnNeutral: false, // The value to use when the robots.txt allow and disallow rules are balanced on whether a link can be crawled.
});
```
Example Usage:
```js
const robotsParser = require('robots-txt-parser');

const robots = robotsParser({
  userAgent: 'Googlebot', // The default user agent to use when looking for allow/disallow rules; if this agent isn't listed in the active robots.txt, * is used.
  allowOnNeutral: false, // The value to use when the robots.txt allow and disallow rules are balanced on whether a link can be crawled.
});

robots.useRobotsFor('http://example.com')
  .then(() => {
    robots.canCrawlSync('http://example.com/news'); // Returns true if the link can be crawled, false if not.
    robots.canCrawl('http://example.com/news', (value) => {
      console.log('Crawlable: ', value);
    }); // Calls the callback with true if the link is crawlable, false if not.
    robots.canCrawl('http://example.com/news') // If no callback is provided, returns a promise which resolves with true if the link is crawlable, false if not.
      .then((value) => {
        console.log('Crawlable: ', value);
      });
  });
```
Below is a condensed form of the documentation; each entry is a function that can be found on the robotsParser object.
Method | Parameters | Return |
---|---|---|
parseRobots(key, string) | key:String, string:String | None |
isCached(domain) | domain:String | Boolean for whether the robots.txt for the domain is cached. |
fetch(url) | url:String | Promise, resolved when the robots.txt has been retrieved. |
useRobotsFor(url) | url:String | Promise, resolved when the robots.txt is fetched. |
canCrawl(url, callback) | url:String, callback:Func (Opt) | Promise, resolves with Boolean. |
getSitemaps(callback) | callback:Func (Opt) | Promise if no callback provided, resolves with [String]. |
getCrawlDelay(callback) | callback:Func (Opt) | Promise if no callback provided, resolves with Number. |
getCrawlableLinks(links, callback) | links:[String], callback:Func (Opt) | Promise if no callback provided, resolves with [String]. |
getPreferredHost(callback) | callback:Func (Opt) | Promise if no callback provided, resolves with String. |
setUserAgent(userAgent) | userAgent:String | None. |
setAllowOnNeutral(allow) | allow:Boolean | None. |
clearCache() | None | None. |
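For orientation, here is a short end-to-end sketch that strings several of these methods together using async/await. It only uses the API documented above; the site URL and user agent are placeholders.

```js
const robotsParser = require('robots-txt-parser');

// A minimal crawl set-up sketch; 'Googlebot' and example.com are placeholders.
const robots = robotsParser({ userAgent: 'Googlebot', allowOnNeutral: false });

async function inspectSite(siteUrl) {
  await robots.useRobotsFor(siteUrl);             // Fetch (or reuse) the site's robots.txt.
  const allowed = await robots.canCrawl(siteUrl); // Boolean for a single URL.
  const sitemaps = await robots.getSitemaps();    // Sitemap URLs listed in the robots.txt.
  const delay = await robots.getCrawlDelay();     // Crawl delay for the active user agent.
  console.log({ allowed, sitemaps, delay });
}

inspectSite('http://example.com');
```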
robots.parseRobots(key, string)
Parses a string representation of a robots.txt file and caches it under the given key.
Returns nothing.
```js
robots.parseRobots('https://example.com',
  `
  User-agent: *
  Allow: /*.php$
  Disallow: /
  `);
```
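As a rough sketch of offline testing, a manually parsed robots.txt can then be queried without any network access. This assumes the cache key matches the domain later passed to useRobotsFor, which the documentation above does not spell out.

```js
// Sketch: parse a robots.txt string, then query it without hitting the network.
// Assumes the cache key matches the domain passed to useRobotsFor below.
robots.parseRobots('https://example.com',
  `
  User-agent: *
  Allow: /*.php$
  Disallow: /
  `);

robots.useRobotsFor('https://example.com') // Should resolve from the cache.
  .then(() => {
    console.log(robots.canCrawlSync('https://example.com/index.php')); // Expected: true
    console.log(robots.canCrawlSync('https://example.com/private/'));  // Expected: false
  });
```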
robots.isCached(domain)
A method used to check if a robots.txt has already been fetched and parsed.
Returns true if a robots.txt has already been fetched and cached by the robots-txt-parser.
```js
robots.isCached('https://example.com'); // true or false
robots.isCached('example.com'); // Attempts to check the cache for only http:// and returns true or false.
```
robots.fetch(url)
Attempts to fetch and parse a robots.txt file located at the url. This method bypasses the built-in cache and always retrieves a fresh copy of the robots.txt.
Returns a Promise which will resolve with the parsed robots.txt once it has been fetched.
```js
robots.fetch('https://example.com/robots.txt')
  .then((tree) => {
    console.log(Object.keys(tree)); // Will log sitemap and any user agents.
  });
```
robots.useRobotsFor(url)
Attempts to download and use the robots.txt at the given url; if it has already been downloaded, the cached copy is used instead.
Returns a Promise that resolves once the URL is fetched and parsed.
```js
robots.useRobotsFor('https://example.com/news')
  .then(() => {
    // Logic to check if links are crawlable.
  });
```
robots.canCrawl(url, callback)
Tests whether a url can be crawled for the current active robots.txt and user agent. If a robots.txt isn't cached for the domain of the url, it is fetched and parsed before returning a boolean value.
Returns a Promise which will resolve with a boolean value.
```js
robots.canCrawl('https://example.com/news')
  .then((crawlable) => {
    console.log(crawlable); // Will log a boolean value.
  });
```
robots.getSitemaps(callback)
Returns a list of sitemaps present on the active robots.txt.
Returns a Promise which will resolve with an array of strings.
```js
robots.getSitemaps()
  .then((sitemaps) => {
    console.log(sitemaps); // Will log a list of strings.
  });
```
robots.getCrawlDelay(callback)
Returns the crawl delay specified in the current active robots.txt for the active user agent.
Returns a Promise which will resolve with an Integer.
```js
robots.getCrawlDelay()
  .then((crawlDelay) => {
    console.log(crawlDelay); // Will be an Integer greater than or equal to 0.
  });
```
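As a usage sketch, the delay can be used to pace requests. This assumes the value is in seconds (the usual robots.txt convention); fetchPage is a hypothetical placeholder for your own request logic, not part of robots-txt-parser.

```js
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical placeholder for real request logic, not part of robots-txt-parser.
const fetchPage = (url) => Promise.resolve(console.log('Fetching', url));

async function politeCrawl(urls) {
  const delaySeconds = await robots.getCrawlDelay(); // 0 if no crawl delay is set.
  for (const url of urls) {
    if (await robots.canCrawl(url)) {
      await fetchPage(url);
    }
    await sleep(delaySeconds * 1000); // Assumes the delay is given in seconds.
  }
}
```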
robots.getCrawlableLinks(links, callback)
Takes an array of links and returns an array of the links which are crawlable for the current active robots.txt.
Returns a Promise which will resolve with an Array of all the crawlable links.
```js
robots.getCrawlableLinks([])
  .then((links) => {
    console.log(links);
  });
```
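A slightly fuller sketch with placeholder links (the URLs are illustrative, not from the package docs):

```js
// Sketch: filter a candidate link list down to the crawlable ones.
// The URLs are placeholders for illustration only.
robots.getCrawlableLinks([
  'https://example.com/news',
  'https://example.com/admin',
  'https://example.com/news/article-1',
])
  .then((crawlable) => {
    console.log(crawlable); // Only the links permitted by the active robots.txt.
  });
```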
robots.getPreferredHost(callback)
Returns the preferred host name specified in the active robots.txt's host: directive, or undefined if there isn't one.
Returns a Promise which will resolve with a String if the host is defined, undefined otherwise.
```js
robots.getPreferredHost()
  .then((host) => {
    console.log(host);
  });
```
robots.setUserAgent(userAgent)
Sets the current user agent to use when checking if a link can be crawled.
Returns undefined.
```js
robots.setUserAgent('exampleBot'); // When interacting with the robots.txt we now look for records for 'exampleBot'.
robots.setUserAgent('testBot'); // When interacting with the robots.txt we now look for records for 'testBot'.
```
robots.setAllowOnNeutral(allow)
Sets whether canCrawl returns true or false when the robots.txt allow and disallow rules are balanced on whether a link should be crawled.
Returns undefined.
```js
robots.setAllowOnNeutral(true); // If the allow/disallow rules are balanced, canCrawl returns true.
robots.setAllowOnNeutral(false); // If the allow/disallow rules are balanced, canCrawl returns false.
```
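A rough sketch of what a "balanced" case can look like, assuming an Allow and a Disallow rule of equal specificity for the same path count as neutral; the domain and robots.txt content are purely illustrative.

```js
// Sketch: equally specific Allow and Disallow rules for the same path.
// Whether this exact case counts as "neutral" depends on the library's
// rule-weighing logic; the robots.txt below is purely illustrative.
robots.parseRobots('https://neutral.example',
  `
  User-agent: *
  Allow: /news
  Disallow: /news
  `);

robots.useRobotsFor('https://neutral.example')
  .then(() => {
    robots.setAllowOnNeutral(true);
    console.log(robots.canCrawlSync('https://neutral.example/news')); // Likely true.
    robots.setAllowOnNeutral(false);
    console.log(robots.canCrawlSync('https://neutral.example/news')); // Likely false.
  });
```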
robots.clearCache()
The cache can grow very large over extended crawling; this method resets it.
Takes no parameters.
Returns nothing.
```js
robots.clearCache();
```
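A sketch of when this might be called during a long-running crawl; the 10,000-domain threshold is arbitrary and purely illustrative.

```js
// Sketch: reset the cache periodically during a long crawl to bound memory.
// The 10,000-domain threshold is an arbitrary illustration.
let domainsVisited = 0;

async function visitDomain(url) {
  await robots.useRobotsFor(url);
  domainsVisited += 1;
  if (domainsVisited % 10000 === 0) {
    robots.clearCache(); // Later domains will simply be re-fetched as needed.
  }
}
```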
Synchronous variants of the API; these will be deprecated in a future version.
robots.canCrawlSync(url)
Tests whether a url can be crawled for the current active robots.txt and user agent. This won't attempt to fetch the robots.txt if it is not cached.
Returns a boolean value depending on whether the url is crawlable. If there is no cached robots.txt for this url, it will always return true.
```js
robots.canCrawlSync('https://example.com/news'); // true or false.
```
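Because the sync variant never fetches, a common pattern is to warm the cache with useRobotsFor first and then check links synchronously. A minimal sketch (the URLs are placeholders):

```js
// Sketch: warm the cache once, then do cheap synchronous checks.
// Without the useRobotsFor call, canCrawlSync would return true for every
// link on an unseen domain, since nothing is cached yet.
robots.useRobotsFor('https://example.com')
  .then(() => {
    const links = ['https://example.com/news', 'https://example.com/private'];
    const crawlable = links.filter((link) => robots.canCrawlSync(link));
    console.log(crawlable);
  });
```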
robots.getSitemapsSync()
Returns a list of sitemaps present on the active robots.txt.
Takes no parameters.
Returns an Array of Strings.
```js
robots.getSitemapsSync(); // Will be an array, e.g. ['http://example.com/sitemap1.xml', 'http://example.com/sitemap2.xml'].
```
robots.getCrawlDelaySync()
Returns the crawl delay specified in the active robots.txt for the active user agent.
Takes no parameters.
Returns an Integer greater than or equal to 0.
```js
robots.getCrawlDelaySync(); // Will be an Integer.
```
robots.getCrawlableLinksSync(links)
Takes an array of links and returns an array of the links which are crawlable for the current active robots.txt.
Returns an Array of all the links that are crawlable.
```js
robots.getCrawlableLinksSync(['example.com/test/news', 'example.com/test/news/article']); // Will return an array of the links that can be crawled.
```
robots.getPreferredHostSync()
Returns the preferred host name specified in the active robots.txt's host: directive or undefined if there isn't one.
Takes no parameters.
Returns a String if the host is defined, undefined otherwise.
```js
robots.getPreferredHostSync(); // Will be a string if the host directive is defined.
```
See LICENSE file.
No vulnerabilities found.
OpenSSF Scorecard findings:
- No dangerous workflow patterns detected.
- No binaries found in the repo.
- 0 existing vulnerabilities detected.
- License file detected.
- Found 1/29 approved changesets -- score normalized to 0.
- 0 commit(s) and 0 issue activity found in the last 90 days -- score normalized to 0.
- Detected GitHub workflow tokens with excessive permissions.
- Dependency not pinned by hash detected -- score normalized to 0.
- No effort to earn an OpenSSF best practices badge detected.
- Security policy file not detected.
- Project is not fuzzed.
- Branch protection not enabled on development/release branches.
- SAST tool is not run on all commits -- score normalized to 0.
Last Scanned on 2025-07-14
The Open Source Security Foundation is a cross-industry collaboration to improve the security of open source software (OSS). The Scorecard provides security health metrics for open source projects.