Puppeteer Scraping Tutorial

Puppeteer is a Node.js library that lets you control a browser, headless or not, through scripting.
It drives a real browser, so pages load and behave just as they would for a normal user.

This makes it useful for both automated testing and scraping.
Scraping is an automated way to extract data from a website.

Always make sure you have permission to scrape!

TestingBot supports a Scrape Function to scrape a page without writing any code.

Example

To get started, see the example below, which scrapes some text from the TestingBot website.
It starts a new Chrome browser on the TestingBot browser grid, instructs the browser
to navigate to the TestingBot website and reads the text of a specific DOM element.

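const puppeteer = require('puppeteer')

// TestingBot credentials are read from the environment
const capabilities = {
    'tb:options': {
        key: process.env.TB_KEY,
        secret: process.env.TB_SECRET
    },
    browserName: 'chrome',
    browserVersion: 'latest'
}

// await cannot be used at the top level of a CommonJS script,
// so the steps are wrapped in an async function
async function main() {
    // Connect to a remote Chrome browser on the TestingBot grid
    const browser = await puppeteer.connect({
        browserWSEndpoint: `wss://cloud.testingbot.com/puppeteer?capabilities=${encodeURIComponent(JSON.stringify(capabilities))}`
    })

    try {
        const page = await browser.newPage()
        await page.goto('https://testingbot.com')
        // The callback runs inside the page and returns the trimmed text of the element
        const title = await page.evaluate(() => {
            return document.querySelector('body > div.main > div.hero.home > div > div > p').textContent.trim()
        })
        console.log(title)
    } finally {
        // Close the browser even if one of the steps above fails
        await browser.close()
    }
}

main().catch((err) => {
    console.error(err)
    process.exitCode = 1
})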
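
The endpoint in the example above packs the capabilities into a single query parameter. If you connect from more than one script, it may help to move that step into a small helper. The sketch below is one way to do it. The names buildEndpoint and credentialsFromEnv are hypothetical and not part of Puppeteer or TestingBot; only the URL format and the TB_KEY / TB_SECRET variables come from the example.

```javascript
// Hypothetical helper (not a Puppeteer or TestingBot API): builds the
// WebSocket endpoint for puppeteer.connect() from a capabilities object,
// using the same URL format as the example above.
function buildEndpoint(capabilities) {
  const encoded = encodeURIComponent(JSON.stringify(capabilities))
  return `wss://cloud.testingbot.com/puppeteer?capabilities=${encoded}`
}

// Hypothetical helper: reads the credentials from the environment and
// fails early with a readable message when one of them is missing.
function credentialsFromEnv(env = process.env) {
  const { TB_KEY: key, TB_SECRET: secret } = env
  if (!key || !secret) {
    throw new Error('Set TB_KEY and TB_SECRET before connecting')
  }
  return { key, secret }
}
```

With these in place, the connection step could read puppeteer.connect({ browserWSEndpoint: buildEndpoint(capabilities) }), where capabilities uses credentialsFromEnv() for its 'tb:options' entry. Checking the credentials before connecting keeps a missing variable from surfacing later as an unclear connection error.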