Installation
npm install got-scraping
Developer Guide
TypeScript: Yes
Module System: ESM
Min. Node Version: >=16
Node Version: 20.18.0
NPM Version: 10.8.2
Score: 92.6
Supply Chain: 98.3
Quality: 93.3
Maintenance: 100
Vulnerability: 99.3
Languages: TypeScript (96.71%), JavaScript (3.29%)
Developer: apify
Download Statistics
Total Downloads: 4,420,075
Last Day: 8,638
Last Week: 37,549
Last Month: 163,050
Last Year: 2,504,848
GitHub Statistics
Stars: 594
Commits: 156
Forks: 49
Watching: 15
Branches: 13
Contributors: 27
Package Meta Information
Latest Version: 4.0.8
Package Id: got-scraping@4.0.8
Unpacked Size: 113.38 kB
Size: 26.75 kB
File Count: 5
NPM Version: 10.8.2
Node Version: 20.18.0
Published On: 20 Nov 2024
Total Downloads
Cumulative downloads: 4,420,075
Last day: 8,638 (+8.2% compared to previous day)
Last week: 37,549 (-8.6% compared to previous week)
Last month: 163,050 (+7.6% compared to previous month)
Last year: 2,504,848 (+73.1% compared to previous year)
Got Scraping
Got Scraping is a small but powerful got extension whose purpose is to send browser-like requests out of the box. This is essential in the web scraping industry to blend in with regular website traffic.
Installation
$ npm install got-scraping
The module is now ESM only. This means you have to import it by using an import expression or the import() method. You can do so by either migrating your project to ESM, or importing got-scraping in an async context:
```diff
-const { gotScraping } = require('got-scraping');
+import { gotScraping } from 'got-scraping';
```
If you cannot migrate to ESM, here's an example of how to import it in an async context:
```js
let gotScraping;

async function fetchWithGotScraping(url) {
    gotScraping ??= (await import('got-scraping')).gotScraping;

    return gotScraping.get(url);
}
```
Note:
- Node.js >=16 is required due to instability of HTTP/2 support in lower versions.
API
The Got Scraping package is built using the got.extend(...) functionality, so it supports all the features Got has. Interested in what's under the hood?
```js
import { gotScraping } from 'got-scraping';

gotScraping
    .get('https://apify.com')
    .then(({ body }) => console.log(body));
```
options
proxyUrl
Type: string
URL of the HTTP or HTTPS based proxy. HTTP/2 proxies are supported as well.
```js
import { gotScraping } from 'got-scraping';

gotScraping
    .get({
        url: 'https://apify.com',
        proxyUrl: 'http://username:password@myproxy.com:1234',
    })
    .then(({ body }) => console.log(body));
```
useHeaderGenerator
Type: boolean
Default: true
Whether to generate browser-like headers.
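If you prefer to supply every header yourself, the generator can be switched off. A minimal sketch, reusing the browser-info endpoint from the examples above purely for illustration:

```js
import { gotScraping } from 'got-scraping';

// Header generation disabled: only the headers listed below are sent.
const response = await gotScraping({
    url: 'https://api.apify.com/v2/browser-info',
    useHeaderGenerator: false,
    headers: {
        'user-agent': 'my-custom-agent/1.0',
    },
});

console.log(response.body);
```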
headerGeneratorOptions
See the HeaderGeneratorOptions docs.
```js
const response = await gotScraping({
    url: 'https://api.apify.com/v2/browser-info',
    headerGeneratorOptions: {
        browsers: [
            {
                name: 'chrome',
                minVersion: 87,
                maxVersion: 89,
            },
        ],
        devices: ['desktop'],
        locales: ['de-DE', 'en-US'],
        operatingSystems: ['windows', 'linux'],
    },
});
```
sessionToken
A non-primitive unique object which describes the current session. By default, it's undefined, so new headers will be generated every time. Headers generated with the same sessionToken never change.
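For instance, reusing one token object across requests should keep the generated headers stable. A minimal sketch, with the browser-info endpoint used only as an example target:

```js
import { gotScraping } from 'got-scraping';

// Any unique, non-primitive object can serve as the session token.
const sessionToken = {};

// Both requests reuse the same token, so they receive identical generated headers.
const first = await gotScraping({
    url: 'https://api.apify.com/v2/browser-info',
    sessionToken,
});
const second = await gotScraping({
    url: 'https://api.apify.com/v2/browser-info',
    sessionToken,
});
```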
Under the hood
Thanks to the included header-generator package, you can choose various browsers from different operating systems and devices. It generates all the headers automatically so you can focus on the important stuff instead.
Yet another goal is to simplify the usage of proxies. Just pass the proxyUrl option and you are set. Got Scraping automatically detects the HTTP protocol that the proxy server supports. After the connection is established, it does another ALPN negotiation for the end server. Once that is complete, Got Scraping can proceed with HTTP requests.
Using the same HTTP version that browsers do is important as well. Most modern browsers use HTTP/2, so Got Scraping makes use of it too. Fortunately, this is already supported by Got - it automatically handles ALPN protocol negotiation to select the best available protocol.
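If you ever need to opt out of HTTP/2 for a problematic server, Got's standard http2 option should be passed through like any other Got option. A hedged sketch, not an officially documented got-scraping feature:

```js
import { gotScraping } from 'got-scraping';

// Disable HTTP/2 for this request and fall back to HTTP/1.1.
const response = await gotScraping({
    url: 'https://apify.com',
    http2: false,
});
```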
HTTP/1.1 headers are always automatically formatted in Pascal-Case. However, there is an exception: x- headers are not modified in any way.
By default, Got Scraping uses an insecure HTTP parser, which allows accessing websites with non-spec-compliant web servers.
Last but not least, Got Scraping comes with an updated TLS configuration. Some websites make a fingerprint of it and compare it with real browsers. While Node.js doesn't support OpenSSL 3 yet, the current configuration should still work flawlessly.
To get more detailed information about the implementation, please refer to the source code.
Tips
This package can only generate the standard attributes. You might want to add the referer header if necessary. Please bear in mind that these headers are made for GET requests for HTML documents. If you want to make POST requests or GET requests for any other content type, you should alter these headers according to your needs. You can do so by passing a headers option or writing a custom Got handler.
This package should provide a solid start for your browser request emulation process. All websites are built differently, and some of them might require some additional special care.
Overriding request headers
```js
const response = await gotScraping({
    url: 'https://apify.com/',
    headers: {
        'user-agent': 'test',
    },
});
```
For more advanced usage please refer to the Got documentation.
JSON mode
You can parse JSON with this package too, but please bear in mind that the request header generation is done specifically for the HTML content type. You might want to alter the generated headers to match the browser ones.
```js
const response = await gotScraping({
    responseType: 'json',
    url: 'https://api.apify.com/v2/browser-info',
});
```
Error recovery
This section covers possible errors that might happen due to different site implementations.
RequestError: Client network socket disconnected before secure TLS connection was established
The error above can be a result of the server not supporting the provided TLS settings. Try changing the ciphers parameter to either undefined or a custom value.
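A minimal sketch of that workaround, assuming Got's standard https.ciphers option is used to reset the cipher list (the target URL is only a placeholder):

```js
import { gotScraping } from 'got-scraping';

const response = await gotScraping({
    url: 'https://example.com',
    https: {
        // Revert to the Node.js default ciphers instead of the customized TLS configuration.
        ciphers: undefined,
    },
});
```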
No vulnerabilities found.
Reason
no dangerous workflow patterns detected
Reason
no binaries found in the repo
Reason
Found 11/29 approved changesets -- score normalized to 3
Reason
4 commit(s) and 0 issue activity found in the last 90 days -- score normalized to 3
Reason
detected GitHub workflow tokens with excessive permissions
Details
- Warn: no topLevel permission defined: .github/workflows/check.yml:1
- Warn: no topLevel permission defined: .github/workflows/release.yml:1
- Info: no jobLevel write permissions found
Reason
no effort to earn an OpenSSF best practices badge detected
Reason
dependency not pinned by hash detected -- score normalized to 0
Details
- Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/check.yml:22: update your workflow using https://app.stepsecurity.io/secureworkflow/apify/got-scraping/check.yml/master?enable=pin
- Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/check.yml:25: update your workflow using https://app.stepsecurity.io/secureworkflow/apify/got-scraping/check.yml/master?enable=pin
- Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/check.yml:43: update your workflow using https://app.stepsecurity.io/secureworkflow/apify/got-scraping/check.yml/master?enable=pin
- Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/check.yml:46: update your workflow using https://app.stepsecurity.io/secureworkflow/apify/got-scraping/check.yml/master?enable=pin
- Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/release.yml:25: update your workflow using https://app.stepsecurity.io/secureworkflow/apify/got-scraping/release.yml/master?enable=pin
- Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/release.yml:28: update your workflow using https://app.stepsecurity.io/secureworkflow/apify/got-scraping/release.yml/master?enable=pin
- Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/release.yml:46: update your workflow using https://app.stepsecurity.io/secureworkflow/apify/got-scraping/release.yml/master?enable=pin
- Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/release.yml:49: update your workflow using https://app.stepsecurity.io/secureworkflow/apify/got-scraping/release.yml/master?enable=pin
- Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/release.yml:66: update your workflow using https://app.stepsecurity.io/secureworkflow/apify/got-scraping/release.yml/master?enable=pin
- Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/release.yml:68: update your workflow using https://app.stepsecurity.io/secureworkflow/apify/got-scraping/release.yml/master?enable=pin
- Warn: npmCommand not pinned by hash: .github/workflows/check.yml:31
- Warn: npmCommand not pinned by hash: .github/workflows/check.yml:51
- Warn: npmCommand not pinned by hash: .github/workflows/release.yml:34
- Warn: npmCommand not pinned by hash: .github/workflows/release.yml:54
- Warn: npmCommand not pinned by hash: .github/workflows/release.yml:84
- Info: 0 out of 10 GitHub-owned GitHubAction dependencies pinned
- Info: 0 out of 5 npmCommand dependencies pinned
Reason
license file not detected
Details
- Warn: project does not have a license file
Reason
project is not fuzzed
Details
- Warn: no fuzzer integrations found
Reason
security policy file not detected
Details
- Warn: no security policy file detected
- Warn: no security file to analyze
- Warn: no security file to analyze
- Warn: no security file to analyze
Reason
branch protection not enabled on development/release branches
Details
- Warn: branch protection not enabled for branch 'v2'
- Warn: 'allow deletion' enabled on branch 'master'
- Warn: 'force pushes' enabled on branch 'master'
- Warn: 'branch protection settings apply to administrators' is disabled on branch 'master'
- Warn: 'stale review dismissal' is disabled on branch 'master'
- Warn: branch 'master' does not require approvers
- Warn: codeowners review is not required on branch 'master'
- Warn: 'last push approval' is disabled on branch 'master'
- Warn: no status checks found to merge onto branch 'master'
- Info: PRs are required in order to make changes on branch 'master'
Reason
SAST tool is not run on all commits -- score normalized to 0
Details
- Warn: 0 commits out of 22 are checked with a SAST tool
Reason
15 existing vulnerabilities detected
Details
- Warn: Project is vulnerable to: GHSA-qwcr-r2fm-qrc7
- Warn: Project is vulnerable to: GHSA-grv7-fg5c-xmjg
- Warn: Project is vulnerable to: GHSA-pxg6-pf52-xh8x
- Warn: Project is vulnerable to: GHSA-3xgq-45jj-v275
- Warn: Project is vulnerable to: GHSA-rv95-896h-c2vc
- Warn: Project is vulnerable to: GHSA-qw6h-vgh9-j6wx
- Warn: Project is vulnerable to: GHSA-p6mc-m468-83gw
- Warn: Project is vulnerable to: GHSA-35jh-r3h4-6jhm
- Warn: Project is vulnerable to: GHSA-952p-6rrq-rcjv
- Warn: Project is vulnerable to: GHSA-9wv6-86v2-598j
- Warn: Project is vulnerable to: GHSA-rhx6-c78j-4q9w
- Warn: Project is vulnerable to: GHSA-gcx4-mw62-g8wm
- Warn: Project is vulnerable to: GHSA-m6fv-jmcg-4jfg
- Warn: Project is vulnerable to: GHSA-cm22-4g7w-348p
- Warn: Project is vulnerable to: GHSA-9crc-q9x8-hgqq
Score
2.8
/10
Last Scanned on 2025-02-03
The Open Source Security Foundation is a cross-industry collaboration to improve the security of open source software (OSS). The Scorecard provides security health metrics for open source projects.
Other packages similar to got-scraping
crawler
Crawler is a ready-to-use web spider that works with proxies, asynchrony, rate limit, configurable request pools, jQuery, and HTTP/2 support.
@petrpatek/got-scraping
This is a boilerplate of an Apify actor.
fdy-scraping
`fdy-scraping` is a versatile HTTP client designed for making API requests with support for proxy configuration, debugging, and detailed error handling. It utilizes the [`got-scraping`](https://github.com/apify/got-scraping) library for HTTP operations.
got-scraping-export
HTTP client made for scraping based on got.