Mock data preview

This page temporarily contains mock data. Real benchmark results are being prepared and will be published soon; public scores are hidden until then.

Mock preview · Real data coming soon

Scraping API benchmarks with visible evidence.

The interface is live, but the benchmark values are placeholders while the first real run is prepared.


Run control · Public

Current exported benchmark contract.

Run	mock preview
Status	scores hidden
Corpus	Pending
Commit	Pending
Scores	Pending

Current leader

Pending

Real benchmark data is being prepared and will replace these placeholders soon.

Attempts

Pending

Attempt counts are hidden while the site is in mock-data preview.

Coverage

Pending

Provider and track coverage will be published with the real run.

Validator

Pending

Validator metadata will ship with the first real export.

Mock trend preview

Scores hidden · JS Render

How to read it

Success

Share of attempts that returned usable content, not just a transport-level success.

Latency

Median end-to-end response time for successful attempts in that track.

Cost

Estimated provider spend normalized to a standard successful-attempt volume.

Retries

Average attempts needed before a run is marked usable or failed.

Evidence

Per-target logs, timestamps, failure reasons, byte counts, and hashes.
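
As a rough illustration of how the Success, Latency, Cost, and Retries definitions above could be computed from raw attempt records, here is a minimal Python sketch. The `Attempt` fields, the `summarize` helper, and the default normalization volume of 1,000 successful attempts are assumptions made for this example; they are not taken from the ScrapingEvals runner.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Attempt:
    target: str        # hypothetical target identifier
    usable: bool       # True only if the returned content was usable
    latency_ms: float  # end-to-end response time
    cost_usd: float    # estimated provider spend for this attempt
    tries: int         # attempts made before it was marked usable or failed


def summarize(attempts, volume=1000):
    """Summarize one provider on one track; `volume` is an assumed normalization size."""
    if not attempts:
        raise ValueError("no attempts to summarize")
    ok = [a for a in attempts if a.usable]
    total_cost = sum(a.cost_usd for a in attempts)
    return {
        # Share of attempts that returned usable content.
        "success": len(ok) / len(attempts),
        # Median latency over successful attempts only.
        "latency_ms": median(a.latency_ms for a in ok) if ok else None,
        # Total spend scaled to `volume` successful attempts.
        "cost_per_volume": total_cost / len(ok) * volume if ok else None,
        # Average number of tries per attempt record.
        "retries": sum(a.tries for a in attempts) / len(attempts),
    }
```

Counting failed attempts in the spend but not in the denominator means a provider that fails often looks more expensive per usable result, which matches the "successful-attempt volume" wording above.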

Data feed

Source

Pending

Contract

Pending

Score rows

Pending

Attempts

Pending

Validators

Pending

LLM audit

Pending

Runner writes → Website reads → Scores shown → Evidence retained

Leaderboard

Provider preview

Scores are intentionally hidden while mock data is on the page. Use this as a preview of the comparison workflow.

Viewing overall

Status	Score	Best track
Preview	TBD	Pending (real run required)
Preview	TBD	Pending (real run required)
Preview	TBD	Pending (real run required)
Preview	TBD	Pending (real run required)
Preview	TBD	Pending (real run required)
Preview	TBD	Pending (real run required)
Preview	TBD	Pending (real run required)
Preview	TBD	Pending (real run required)
Preview	TBD	Pending (real run required)
Preview	TBD	Pending (real run required)

Scores hidden. Track cells are placeholders until the first real benchmark run is published.

Browse by job

Benchmark tracks

Scraping APIs are evaluated by the job they are meant to solve, so simple HTML APIs are not judged like hosted browsers or agent workflows.

Track 1

HTML scrape

URL in → HTML/text out. Static pages, no JS, simple anti-bot.

Corpus pending

Track 2

JS Render

Pages requiring a real browser to populate the DOM.

Corpus pending

Track 3

Browser Session

Multi-step CDP flows: click, fill, scroll, persist cookies.

Corpus pending

Track 4

Structured Extract

Field-level accuracy against ground-truth schemas.

Corpus pending

Track 5

Agentic

LLM-controlled browser tasks. Non-deterministic; reported separately.

Corpus pending
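
For the Structured Extract track, field-level accuracy can be pictured as the share of ground-truth fields that an extraction reproduces. The sketch below assumes exact matching after whitespace and case normalization; the `field_accuracy` name and the matching rule are illustrative, and the real validator may differ.

```python
def field_accuracy(extracted: dict, truth: dict) -> float:
    """Fraction of ground-truth fields whose extracted value matches."""
    if not truth:
        raise ValueError("ground truth must contain at least one field")

    def norm(value):
        # Collapse whitespace and ignore case; an assumed matching rule.
        return " ".join(str(value).split()).casefold()

    hits = sum(
        1
        for key, expected in truth.items()
        if key in extracted and norm(extracted[key]) == norm(expected)
    )
    return hits / len(truth)
```

Missing fields count as misses here, so an extractor cannot raise its score by returning fewer fields.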

Methodology

Corpus and scoring

Same URLs, same parameters, same published run for every provider.

Same target set

Every provider receives the same sampled URLs and task definitions for the published run.

Track-specific adapters

Fetch APIs, render APIs, hosted browsers, extraction APIs, and agent flows are scored separately.

Usable-output scoring

A response only counts when the returned page or fields are usable for the intended scraping task.

Disclosure-first

ScrapeDrive is operated by the team that runs ScrapingEvals. It appears in the benchmark and is not excluded when it loses.
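
To make the usable-output rule concrete, here is a minimal sketch of a check that rejects responses which succeed at the transport level but carry no usable page. The block markers, the minimum length, and the `is_usable` name are assumptions for illustration, not the benchmark's validator.

```python
BLOCK_MARKERS = (  # hypothetical phrases that suggest a challenge or block page
    "verify you are human",
    "access denied",
    "enable javascript and cookies",
)


def is_usable(status: int, body: str, required: tuple = (), min_chars: int = 500) -> bool:
    """Return True only when the response looks like the page the task wanted."""
    if status != 200:
        return False
    text = body.casefold()
    if len(text) < min_chars:
        return False  # too short to be a real page
    if any(marker in text for marker in BLOCK_MARKERS):
        return False  # HTTP 200, but the body is a challenge page
    # Every task-specific marker (for example a product title) must appear.
    return all(marker.casefold() in text for marker in required)
```

A check like this is why an HTTP 200 alone would not count as a success: the body has to contain what the scraping task was after.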

Anti-bot vendor	URLs
Akamai-protected	TBD
Cloudflare	TBD
PerimeterX	TBD
DataDome	TBD
Unprotected control	TBD
Total sampled	TBD

Target	Protection	Category	URLs
Amazon · product detail	Akamai	ecom	TBD
LinkedIn · company page	PerimeterX	social	TBD
Indeed · search results	Cloudflare	jobs	TBD
G2 · software listing	DataDome	review	TBD
Yelp · business page	PerimeterX	local	TBD
Zillow · listing detail	Akamai	realestate	TBD
TripAdvisor · hotel detail	Cloudflare	travel	TBD
eBay · search results	Akamai	ecom	TBD
Static news article (control)	control	control	TBD
SPA-only shop (control)	control	control	TBD
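
One way to guarantee that every provider receives the same sampled URLs is to draw the sample once from a fixed seed and reuse it for all adapters. The sketch below assumes a hypothetical candidate pool and derives the seed from the run id; neither detail is confirmed by the published methodology.

```python
import hashlib
import random


def sample_targets(candidates, k, run_id):
    """Draw the same k URLs for a given run id, regardless of input order."""
    # Derive a stable integer seed from the run id (an assumed convention).
    digest = hashlib.sha256(run_id.encode("utf-8")).digest()
    seed = int.from_bytes(digest[:8], "big")
    rng = random.Random(seed)
    # Deduplicate and sort so the sample does not depend on collection order.
    return rng.sample(sorted(set(candidates)), k)
```

Because the sample depends only on the candidate set and the run id, rerunning the benchmark for a new provider reuses exactly the same targets.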

Provider index

Scraping APIs in the benchmark

A browsable directory for comparing provider fit, pricing shape, and supported modes. Scores will appear after the real benchmark run.

Bright Data

brightdata.com

Score

TBD

Enterprise proxy + scraping infrastructure. Web Unlocker, Scraping Browser, SERP API.

Best track

Pending

Pricing

Pending

fetch · render · session · extract

Zyte API

zyte.com

Score

TBD

Smart proxy with auto-rotation, ban detection, JS rendering. Scrapy creators.

Best track

Pending

Pricing

Pending

fetch · render · extract

Firecrawl

firecrawl.dev

Score

TBD

URL → clean Markdown for LLMs. Crawl, scrape, extract.

Best track

Pending

Pricing

Pending

fetch · render · extract

ScrapeDrive (ours)

scrapedrive.com

Score

TBD

Lightweight scraping API. Built by the team that runs ScrapingEvals.

Best track

Pending

Pricing

Pending

fetch · render · extract

ScrapingBee

scrapingbee.com

Score

TBD

Headless browser API with proxy rotation. Aimed at developers.

Best track

Pending

Pricing

Pending

fetch · render

ScraperAPI

scraperapi.com

Score

TBD

Rotating proxies + headless browsers. High-volume general scraping.

Best track

Pending

Pricing

Pending

fetch · render

Browserless

browserless.io

Score

TBD

Hosted Chrome over CDP. Best for browser sessions and agentic flows.

Best track

Pending

Pricing

Pending

render · session

Scrapfly

scrapfly.io

Score

TBD

ASP anti-scraping bypass, screenshots, sessions, extraction rules.

Best track

Pending

Pricing

Pending

fetch · render · extract

Scrape.do

scrape.do

Score

TBD

Simple URL-in / HTML-out API. Cheap, broad coverage.

Best track

Pending

Pricing

Pending

fetch · render

Apify

apify.com

Score

TBD

Actor marketplace + crawler infrastructure. Wide capability surface.

Best track

Pending

Pricing

Pending

fetch · render · session · extract

FAQ

Common questions

What is ScrapingEvals?

ScrapingEvals is a public benchmark for scraping APIs. It compares providers on real-world scraping jobs instead of only listing features or pricing pages.

Why are there separate tracks?

A raw HTML fetch API, JavaScript renderer, persistent browser session, structured extractor, and agent-controlled browser solve different jobs. One overall score would hide those differences.

How should a beginner read the leaderboard?

Start with the track that matches your job. HTML scrape is for static pages, JS Render is for browser-built pages, Browser Session is for multi-step workflows, Structured Extract is for field accuracy, and Agentic is for LLM-driven browsing.

Can the results be reproduced?

The site links to run ids, commit hashes, adapter source, target categories, and evidence logs so that another engineer can inspect how the numbers were produced.

Is ScrapeDrive favored?

No. ScrapeDrive is marked because the ScrapingEvals team operates it. The same tables, evidence view, tracks, and scoring language apply to every provider.

Reproducibility

Published artifacts

Run id

mock-2026-05-17

Commit

9c4f1ae

Methodology

scoring-v1

Adapters

10 providers

Tasks

Pending

Artifacts

Pending
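
As an illustration of how an engineer might check a retrieved response against an evidence log entry, here is a minimal sketch. It assumes each entry stores a SHA-256 digest and a byte count for the response body; the `sha256` and `bytes` field names are hypothetical.

```python
import hashlib


def verify_evidence(body: bytes, record: dict) -> bool:
    """Compare a response body with the digest and size stored in an evidence record."""
    digest = hashlib.sha256(body).hexdigest()
    return digest == record["sha256"] and len(body) == record["bytes"]
```

A mismatch in either the digest or the byte count would indicate that the stored body is not the one the run actually scored.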