Antidetect browsers as web scraping tools?
Anti-bot and user profiling techniques are becoming more and more invasive, and this has given rise to a new niche of browsers, called antidetect browsers. Why use them as web scraping tools, though? They give the user a higher level of privacy than a regular browser, plus a set of features tailored to these new threats:
- Fingerprint spoofing
- Canvas and WebGL disabling
- Proxy integration
- API for Selenium/Playwright integration
Typically they let you create a set of different profiles that you can use simultaneously, each applied to its own browser window. These profiles can be temporary and clean, so each time it looks as if a different person is browsing, rather than you with your cookie history and your real fingerprints. Your favourite Chrome features (like extensions) can also be started automatically.
Let’s make a quick tour of the most famous ones – and see how we can use them as web scraping services.
GoLogin is a relatively new player in the antidetect browser industry, founded in 2019. It focuses on data protection and user anonymity: it makes the user’s parameters as common as possible so that the browser does not stand out, which is critical for scraping.
You can create unlimited profiles that don’t overlap with each other, and use them either locally in the browser or remotely through the API. GoLogin integrates with Selenium (there is a tutorial on their website) or Playwright.
In GoLogin’s GitHub repository you can find the code to integrate a GoLogin user profile into a Selenium or Pyppeteer project. I’ve also written an example of integration for Playwright, which you can find in The Web Scraping Club GitHub repository, together with the code of past articles.
Basically, instead of opening a standard Playwright Chromium, we connect to an already open GoLogin browser instance, which can be remote or local.
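To give an idea of what "connecting to an already open browser" looks like, here is a minimal sketch using Playwright's `connect_over_cdp`. It is not the code from the repositories mentioned above. It assumes the antidetect browser exposes a Chrome DevTools debugger address in `host:port` form once a profile has been started, and the helper names `cdp_endpoint` and `fetch_title` are made up for this illustration.

```python
def cdp_endpoint(debugger_address: str) -> str:
    """Turn a 'host:port' debugger address into a CDP endpoint URL."""
    # Full URLs (http or websocket) are passed through unchanged.
    if debugger_address.startswith(("http://", "https://", "ws://", "wss://")):
        return debugger_address
    return "http://" + debugger_address


def fetch_title(debugger_address: str, url: str) -> str:
    """Attach to an already running browser and return the page title."""
    # Imported lazily so cdp_endpoint works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        # Attach to the running instance instead of launching Chromium.
        browser = p.chromium.connect_over_cdp(cdp_endpoint(debugger_address))
        # Reuse the existing context, which carries the profile's cookies.
        context = browser.contexts[0] if browser.contexts else browser.new_context()
        page = context.new_page()
        page.goto(url)
        title = page.title()
        page.close()
        return title
```

With a profile running locally, a call such as `fetch_title("127.0.0.1:9222", "https://example.com")` would then drive that browser. The address here is a placeholder, so use whatever your antidetect browser reports when the profile starts.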
GoLogin vs Playwright Chromium
Let’s have a look at how the fingerprint differs when using Playwright’s standard Chromium instance and then a GoLogin one.
In the first case, with Playwright Chromium, the test is able to detect my device and my browser.
With GoLogin, the fingerprint uniqueness changes, probably because of the added noise, and the test fails to detect my machine type and browser.
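The fingerprinting test used above is an external service, so the following is only a rough illustration of the idea behind it, not a reproduction. It reads a handful of browser attributes through Playwright's `page.evaluate` and hashes them, so that two sessions can be compared. The choice of attributes and the names `COLLECT_JS`, `fingerprint_hash` and `collect_attributes` are assumptions made for this sketch. Real fingerprinting scripts use many more signals, such as canvas and WebGL rendering.

```python
import hashlib
import json

# JavaScript run inside the page; the attribute selection is an assumption.
COLLECT_JS = """() => ({
    userAgent: navigator.userAgent,
    platform: navigator.platform,
    languages: navigator.languages,
    hardwareConcurrency: navigator.hardwareConcurrency,
    webdriver: navigator.webdriver,
    screen: [screen.width, screen.height],
})"""


def fingerprint_hash(attrs: dict) -> str:
    """Hash the attributes so the result does not depend on key order."""
    canonical = json.dumps(attrs, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def collect_attributes(page) -> dict:
    """Read the attributes from an open Playwright page object."""
    return page.evaluate(COLLECT_JS)
```

Calling `fingerprint_hash(collect_attributes(page))` once on a plain Playwright Chromium page and once on a page from an antidetect profile should give two different hashes whenever the profile spoofs any of these values.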
If we make the same test using a Cloudflare-protected website (https://www.off---white.com/) with a standard Playwright Chromium, we can see the home page, but not the product catalog. With GoLogin, however, we can see both.
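A check like this can be automated with a simple heuristic. The sketch below flags a page as blocked when the HTTP status or the HTML suggests a challenge or block page. The status codes and text markers are assumptions based on what Cloudflare pages commonly look like and may change over time, and `looks_blocked` and `check_pages` are names invented for this example.

```python
# Statuses and markers are heuristics, not an official Cloudflare contract.
BLOCK_STATUSES = {403, 429, 503}
CHALLENGE_MARKERS = ("just a moment", "attention required", "cf-chl")


def looks_blocked(status: int, body: str) -> bool:
    """Guess whether a response is a block or challenge page."""
    text = body.lower()
    return status in BLOCK_STATUSES or any(m in text for m in CHALLENGE_MARKERS)


def check_pages(page, urls):
    """Visit each URL with a Playwright page and report which look blocked."""
    results = {}
    for url in urls:
        response = page.goto(url)
        # goto can return None; in that case only the HTML markers are used.
        status = response.status if response else 0
        results[url] = looks_blocked(status, page.content())
    return results
```

Running `check_pages` with the home page and a catalog URL, first on a standard Playwright page and then on a page from the antidetect browser, gives a quick side-by-side view of which setup gets through.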
Compared to the other web scraping tools we’ll see later, GoLogin is one of the most affordable. With its cheapest plan, we have all we need for our web scraping projects.
Incogniton is very similar to GoLogin, with the same features and integrations with Selenium or Playwright.
Unfortunately, the browser that our scrapers need to connect to is available only for Mac and Windows and has no cloud version. This can be a limitation for larger web scraping projects.
There’s no free plan that includes the web scraping API, so I could not run any tests. Still, it is one of the most well-known solutions in the field.
Octo Browser is an alternative to GoLogin and Incogniton. It has an API for integration with Selenium or Playwright, and an interesting feature is its database of real fingerprints to use in your profiles.
In this case too, the browser is available only for Mac and Windows and has no cloud version, which limits its scalability. There’s no free plan either, so I could not test it.
VMLogin, a Chinese product, is another alternative to the previous antidetect browsers, and a bit more expensive. The client is available only for Windows, and the API is included from the cheapest plan.
Kameleo sets itself apart from the previous browsers starting with its website, which has an extensive section about automation and integration with Selenium or Playwright. Unique in this selection, it also has a mobile app that lets you use mobile fingerprints for your profiles, but unfortunately the desktop client is Windows only.
From the outside, it seems a more mature product, but its price is also one of the highest. Full automation support plus the web scraping API costs around $200 a month per user.
We have seen five antidetect browser solutions that can be integrated into a project for scraping data from websites.
With the quick test we made earlier, we have seen that an antidetect browser can be an interesting web scraping option for some websites. A Cloudflare-protected website like off---white.com can be read using GoLogin, so the technique is worth a try for the hardest cases. However, costs, infrastructure and licenses can become a problem when scaling to larger projects.
Download GoLogin here and explore safe scraping with our free plan!