Puppeteer Scraping Tutorial
Puppeteer is a Node.js library that lets you control a browser, headless by default, through scripting.
It drives a real browser, so pages load and behave just as they would for a normal user.
This makes it useful for both automated testing and scraping.
Scraping is an automated way to extract data from a website.
Always make sure you have permission to scrape!
TestingBot supports a Scrape Function to scrape a page without writing any code.
Example
To get started, see the example below, which scrapes some text from the TestingBot website.
It starts a new Chrome browser on the TestingBot browser grid,
instructs the browser to navigate to the TestingBot website, and reads the text from a specific DOM element.
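const puppeteer = require('puppeteer')

// TestingBot credentials are read from the environment.
const capabilities = {
  'tb:options': {
    key: process.env.TB_KEY,
    secret: process.env.TB_SECRET
  },
  browserName: 'chrome',
  browserVersion: 'latest'
}

// CommonJS has no top-level await, so the steps run inside an async function.
async function main () {
  // Connect to a remote Chrome browser on the TestingBot grid.
  const browser = await puppeteer.connect({
    browserWSEndpoint: `wss://cloud.testingbot.com/puppeteer?capabilities=${encodeURIComponent(JSON.stringify(capabilities))}`
  })
  try {
    const page = await browser.newPage()
    await page.goto('https://testingbot.com')
    // This callback runs inside the page and returns the element's text.
    const title = await page.evaluate(() => {
      return document.querySelector('body > div.main > div.hero.home > div > div > p').textContent.trim()
    })
    console.log(title)
  } finally {
    // Always end the remote session, even if scraping fails.
    await browser.close()
  }
}

main().catch(console.error)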
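The endpoint passed to puppeteer.connect in the example carries the capabilities as a URL-encoded JSON string. The sketch below isolates that step so it can be checked without opening a browser session. The helper name buildEndpoint and the placeholder credentials are assumptions made for illustration; they are not part of Puppeteer or the TestingBot API.

```javascript
// Hypothetical helper (not part of Puppeteer or TestingBot):
// serializes the capabilities object into the WebSocket endpoint URL.
function buildEndpoint (capabilities) {
  const encoded = encodeURIComponent(JSON.stringify(capabilities))
  return `wss://cloud.testingbot.com/puppeteer?capabilities=${encoded}`
}

// Placeholder credentials, for illustration only.
const endpoint = buildEndpoint({
  'tb:options': { key: 'example-key', secret: 'example-secret' },
  browserName: 'chrome',
  browserVersion: 'latest'
})

// Decoding the query parameter gives back the original object.
const decoded = JSON.parse(new URL(endpoint).searchParams.get('capabilities'))
console.log(decoded.browserName) // prints "chrome"
```

Note that JSON.stringify omits properties whose value is undefined, so if TB_KEY or TB_SECRET is not set in the environment, the corresponding field is silently left out of the URL. Logging the decoded capabilities, as above, is one way to spot that before connecting.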