Scrape Function
With the Scrape function you can visit any webpage on the internet with a remote, real browser running on TestingBot.
You can fetch specific elements from the page and TestingBot will return them as a structured JSON response.
By default, TestingBot navigates to the URL and waits for all content to load. If a requested element is not yet in the DOM, TestingBot then waits up to 30 seconds for it to appear.
Example
curl -X POST "https://cloud.testingbot.com/scrape?key=YOUR_KEY&secret=YOUR_SECRET&browserName=chrome&version=latest&platform=WIN10" \
-H 'Content-Type: application/json' \
-d '{"url":"https://testingbot.com", "elements":[{"selector":"body h2"}]}'
This simple example will run a Puppeteer script on our service and do the following:
- Start a Headless Browser (latest version) on our cloud
- Connect with Puppeteer to the Browser and navigate to the URL you specified
- Look for any elements in the DOM that match the selector body h2
- Return a JSON response with the results that are found, see the example response below.
[{"selector":"body h2","results":[{"html":"Automated Testing","text":"","width":340,"height":84,"top":1852.40625,"left":780,"attributes":[]},{"html":"Live Testing","text":"","width":150,"height":84,"top":2727.890625,"left":-146.71875,"attributes":[]},{"html":"+4800 Browsers & Devices","text":"","width":184,"height":168,"top":3455.265625,"left":780,"attributes":[]},{"html":"Integrate TestingBot into your setup","text":"Integrate TestingBot into your setup","width":549,"height":35,"top":2097.9375,"left":117.03125,"attributes":[]}]}]
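The response is an array with one entry per requested selector, and each entry carries a results list with one object per matched element. As a minimal sketch of reading it from the command line, the helper below prints the html value of every match. It assumes jq is installed; the function name scrape_html_values is hypothetical and not part of TestingBot.

```shell
# Hypothetical helper (not part of TestingBot): reads a scrape response
# on stdin and prints the "html" field of every matched element,
# one per line. Assumes jq is available on the PATH.
scrape_html_values() {
  jq -r '.[] | .results[] | .html'
}

# Usage, assuming the response was saved to response.json:
#   scrape_html_values < response.json
```

The same filter can be pointed at text, width or any other field in the result objects.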
Specifying browser and version
You can choose the platform configuration used to scrape a webpage by passing browserName, version and platform as query parameters in the request URL.
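As a rough sketch of how these three parameters fit into the request, the helper below builds the endpoint URL from its arguments. The function name scrape_endpoint and the TB_KEY and TB_SECRET environment variables are assumptions made for this example, and the only values known from this page are chrome, latest and WIN10; other combinations should be checked against the configurations TestingBot supports.

```shell
# Hypothetical helper (not part of TestingBot): builds the scrape URL.
# $1 = browserName, $2 = version, $3 = platform
# Credentials come from TB_KEY and TB_SECRET (names chosen for this
# sketch) and fall back to the placeholders used on this page.
scrape_endpoint() {
  printf 'https://cloud.testingbot.com/scrape?key=%s&secret=%s&browserName=%s&version=%s&platform=%s' \
    "${TB_KEY:-YOUR_KEY}" "${TB_SECRET:-YOUR_SECRET}" "$1" "$2" "$3"
}

# Usage (sends a real request, so it is left commented out):
# curl -X POST "$(scrape_endpoint chrome latest WIN10)" \
#   -H 'Content-Type: application/json' \
#   -d '{"url":"https://testingbot.com", "elements":[{"selector":"body h2"}]}'
```

Quoting the URL matters here: an unquoted & would make the shell run the command in the background and drop the remaining parameters.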
Scrape Options
You can pass additional options in the JSON request body to control how the page is scraped.
Authenticate options
You can specify the credentials for page.authenticate, which the browser uses for HTTP authentication.
curl -X POST "https://cloud.testingbot.com/scrape?key=YOUR_KEY&secret=YOUR_SECRET&browserName=chrome&version=latest&platform=WIN10" \
-H 'Content-Type: application/json' \
-d '{"url":"https://testingbot.com", "authenticate":{ "username": "user", "password": "passwd" }, "elements":[{"selector":"body h2"}]}'
Goto options
You can specify the page.goto options and add timeout and waitUntil settings.
curl -X POST "https://cloud.testingbot.com/scrape?key=YOUR_KEY&secret=YOUR_SECRET&browserName=chrome&version=latest&platform=WIN10" \
-H 'Content-Type: application/json' \
-d '{"url":"https://testingbot.com", "gotoOptions":{ "waitUntil": "networkidle2" }, "elements":[{"selector":"body h2"}]}'
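The example above only sets waitUntil. A sketch that also sets timeout might look like the following. It assumes gotoOptions is handed to Puppeteer's page.goto unchanged, in which case timeout is given in milliseconds and waitUntil accepts load, domcontentloaded, networkidle0 or networkidle2. The helper name goto_payload is hypothetical.

```shell
# Hypothetical helper (not part of TestingBot): builds a request body
# that sets both gotoOptions fields.
# $1 = navigation timeout in milliseconds, $2 = waitUntil value
goto_payload() {
  printf '{"url":"https://testingbot.com", "gotoOptions":{ "timeout": %d, "waitUntil": "%s" }, "elements":[{"selector":"body h2"}]}' \
    "$1" "$2"
}

# Usage (sends a real request, so it is left commented out):
# curl -X POST "https://cloud.testingbot.com/scrape?key=YOUR_KEY&secret=YOUR_SECRET&browserName=chrome&version=latest&platform=WIN10" \
#   -H 'Content-Type: application/json' \
#   -d "$(goto_payload 60000 domcontentloaded)"
```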
Extra Headers
You can specify the page.setExtraHTTPHeaders options to add extra HTTP headers to the request that the TestingBot browser makes.
curl -X POST "https://cloud.testingbot.com/scrape?key=YOUR_KEY&secret=YOUR_SECRET&browserName=chrome&version=latest&platform=WIN10" \
-H 'Content-Type: application/json' \
-d '{"url":"https://testingbot.com", "extraHeaders":{ "foo": "bar" }, "elements":[{"selector":"body h2"}]}'
Disable JavaScript
You can disable JavaScript with the page.setJavaScriptEnabled option.
curl -X POST "https://cloud.testingbot.com/scrape?key=YOUR_KEY&secret=YOUR_SECRET&browserName=chrome&version=latest&platform=WIN10" \
-H 'Content-Type: application/json' \
-d '{"url":"https://testingbot.com", "javascriptEnabled":false, "elements":[{"selector":"body h2"}]}'
Emulate Media
Change the CSS media type of the page with page.emulateMediaType.