Features

Scrape Function

With the Scrape function you can visit any webpage on the internet with a remote, real browser running on TestingBot.

You can fetch specific elements from the page, and TestingBot will return them as a structured JSON response.

By default, TestingBot navigates to the URL and waits for all content to load. If a requested element is not yet in the DOM, TestingBot then waits up to 30 seconds for it to appear.

Example

curl -X POST "https://cloud.testingbot.com/scrape?key=YOUR_KEY&secret=YOUR_SECRET&browserName=chrome&version=latest&platform=WIN10" \
-H 'Content-Type: application/json' \
-d '{"url":"https://testingbot.com", "elements":[{"selector":"body h2"}]}'

This simple example will run a Puppeteer script on our service and do the following:

  • Start a Headless Browser (latest version) on our cloud
  • Connect with Puppeteer to the Browser and navigate to the URL you specified
  • Look for any elements in the DOM that match the selector body h2
  • Return a JSON response with the results that are found (see the example response below)

Example response:
[{"selector":"body h2","results":[{"html":"Automated Testing","text":"","width":340,"height":84,"top":1852.40625,"left":780,"attributes":[]},{"html":"Live Testing","text":"","width":150,"height":84,"top":2727.890625,"left":-146.71875,"attributes":[]},{"html":"+4800 Browsers & Devices","text":"","width":184,"height":168,"top":3455.265625,"left":780,"attributes":[]},{"html":"Integrate TestingBot into your setup","text":"Integrate TestingBot into your setup","width":549,"height":35,"top":2097.9375,"left":117.03125,"attributes":[]}]}]
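To work with this response in a script, the JSON can be parsed and reduced to the fields you need. The following is a minimal sketch in Python that relies only on the response shape shown above; the helper name html_by_selector is hypothetical, and the sample data is a trimmed copy of the example response.

```python
import json

# Trimmed copy of the example response above (first two results only).
SAMPLE_RESPONSE = """[{"selector":"body h2","results":[
  {"html":"Automated Testing","text":"","width":340,"height":84,"top":1852.40625,"left":780,"attributes":[]},
  {"html":"Live Testing","text":"","width":150,"height":84,"top":2727.890625,"left":-146.71875,"attributes":[]}
]}]"""


def html_by_selector(raw):
    """Map each selector to the list of "html" values of its matched elements."""
    parsed = json.loads(raw)
    return {
        entry["selector"]: [result["html"] for result in entry["results"]]
        for entry in parsed
    }


matches = html_by_selector(SAMPLE_RESPONSE)
print(matches["body h2"])  # prints ['Automated Testing', 'Live Testing']
```

Note that in the example response the "text" field is empty for several results, which is why this sketch reads "html" instead.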

Specifying browser and version

You can choose the platform configuration used to scrape a webpage by passing browserName, version and platform as query parameters on the request URL, as the curl examples on this page do.
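As an illustration, the request URL can be assembled from these parameters programmatically. This is a sketch in Python, not an official client: the function name scrape_url is hypothetical, and the default values (chrome, latest, WIN10) are simply the ones used in the curl examples in this document. Which other values are accepted is not listed in this section.

```python
from urllib.parse import urlencode

SCRAPE_ENDPOINT = "https://cloud.testingbot.com/scrape"


def scrape_url(key, secret, browser_name="chrome", version="latest", platform="WIN10"):
    """Build the scrape URL; the defaults mirror the curl examples in this document."""
    query = urlencode({
        "key": key,
        "secret": secret,
        "browserName": browser_name,
        "version": version,
        "platform": platform,
    })
    return f"{SCRAPE_ENDPOINT}?{query}"


print(scrape_url("YOUR_KEY", "YOUR_SECRET"))
```

Building the query string with urlencode also takes care of escaping any special characters in the key or secret.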

Scrape Options

You can pass additional options in the JSON body of a scrape request.

Authenticate options

You can specify the credentials passed to page.authenticate, which Puppeteer uses for HTTP authentication.

curl -X POST "https://cloud.testingbot.com/scrape?key=YOUR_KEY&secret=YOUR_SECRET&browserName=chrome&version=latest&platform=WIN10" \
-H 'Content-Type: application/json' \
-d '{"url":"https://testingbot.com", "authenticate":{ "username": "user", "password": "passwd" }, "elements":[{"selector":"body h2"}]}'

Goto options

You can pass options to page.goto, such as timeout and waitUntil settings.

curl -X POST "https://cloud.testingbot.com/scrape?key=YOUR_KEY&secret=YOUR_SECRET&browserName=chrome&version=latest&platform=WIN10" \
-H 'Content-Type: application/json' \
-d '{"url":"https://testingbot.com", "gotoOptions":{ "waitUntil": "networkidle2" }, "elements":[{"selector":"body h2"}]}'
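The curl example above sets only waitUntil. A request body that also carries a timeout could be built as in the Python sketch below. It assumes the timeout is passed as a "timeout" key inside gotoOptions and is expressed in milliseconds, as in Puppeteer's page.goto; neither detail is confirmed in this section. The function name goto_payload is hypothetical.

```python
import json


def goto_payload(url, selector, wait_until="networkidle2", timeout_ms=None):
    """Return the JSON body for a scrape request with gotoOptions set."""
    goto_options = {"waitUntil": wait_until}
    if timeout_ms is not None:
        # Assumption: forwarded to page.goto unchanged, in milliseconds.
        goto_options["timeout"] = timeout_ms
    return json.dumps({
        "url": url,
        "gotoOptions": goto_options,
        "elements": [{"selector": selector}],
    })


print(goto_payload("https://testingbot.com", "body h2", timeout_ms=10000))
```

The resulting string can be passed to curl with -d in place of the inline JSON.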

Extra Headers

You can specify the page.setExtraHTTPHeaders options to add extra HTTP headers to the request that the TestingBot browser makes.

curl -X POST "https://cloud.testingbot.com/scrape?key=YOUR_KEY&secret=YOUR_SECRET&browserName=chrome&version=latest&platform=WIN10" \
-H 'Content-Type: application/json' \
-d '{"url":"https://testingbot.com", "extraHeaders":{ "foo": "bar" }, "elements":[{"selector":"body h2"}]}'

Disable JavaScript

You can disable JavaScript with the page.setJavaScriptEnabled option.

curl -X POST "https://cloud.testingbot.com/scrape?key=YOUR_KEY&secret=YOUR_SECRET&browserName=chrome&version=latest&platform=WIN10" \
-H 'Content-Type: application/json' \
-d '{"url":"https://testingbot.com", "javascriptEnabled":false, "elements":[{"selector":"body h2"}]}'

Emulate Media

Change the CSS media type of the page with page.emulateMediaType.

curl -X POST "https://cloud.testingbot.com/scrape?key=YOUR_KEY&secret=YOUR_SECRET&browserName=chrome&version=latest&platform=WIN10" \
-H 'Content-Type: application/json' \
-d '{"url":"https://testingbot.com", "emulateMedia":"print", "elements":[{"selector":"body h2"}]}'
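Each option in this section is a top-level key of the JSON body. The sketch below prepares the same kind of POST request in Python using only the standard library. It assumes that several options may be combined in a single request, which this section does not state, and the helper name build_scrape_request is hypothetical. The request is only constructed here, not sent.

```python
import json
from urllib.request import Request


def build_scrape_request(endpoint_url, page_url, selectors, **options):
    """Prepare (but do not send) a scrape request with optional top-level options."""
    body = {"url": page_url, "elements": [{"selector": s} for s in selectors]}
    body.update(options)  # e.g. javascriptEnabled=False, emulateMedia="print"
    return Request(
        endpoint_url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_scrape_request(
    "https://cloud.testingbot.com/scrape?key=YOUR_KEY&secret=YOUR_SECRET"
    "&browserName=chrome&version=latest&platform=WIN10",
    "https://testingbot.com",
    ["body h2"],
    javascriptEnabled=False,
    emulateMedia="print",
)
print(req.get_method(), req.data.decode("utf-8"))
```

Passing req to urllib.request.urlopen would send it; that step is left out because it needs real credentials.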