So, is there any point in building your own mobile web scraping proxy station?
Back in the early days of my web scraping career, I ran into a small e-commerce website that blocked every request coming from a data center. Since it was the only site in our scope that needed proxies, I wanted to solve the problem without paying for a plan from a proxy provider, which would not have been cost-effective for a single website.
We had a spare mobile SIM and I had just bought a Raspberry Pi board for my experiments, so the idea of building a homemade mobile proxy came to mind.
Why I Needed a Mobile Web Scraping Proxy
I needed to bypass the block on data center IPs: requests from AWS, Azure, and GCP were all being rejected. Since mobile IPs are, by their nature, almost unbannable for websites, I decided to give them a try.
What Is a Mobile Proxy?
A mobile proxy is a proxy server that reaches the Internet through a mobile IP; that’s the almost tautological explanation. You can get one by running a proxy endpoint on a real mobile device, or on a device that connects to the Internet through a mobile router.
How a Mobile IP Works
I mentioned before that mobile IPs are almost unbannable for websites. The main reason is the technology behind IP assignment on 3G/4G/5G networks.
To tackle the IPv4 address shortage, mobile ISPs use a technology called CGNAT (Carrier-Grade NAT). It’s similar to the NAT we have in our home networks: we have a single public IP that identifies us on the Internet, while every device connected to our router has its own private address, which can be mapped to a port of our public IP to make the device reachable from outside the network.
The same principle applies to Carrier-Grade NAT. The ISP assigns one public IP address to a group of devices belonging to different people, each mapped to different ports of that public IP.
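To make the idea concrete, here is a toy sketch of such a translation table in Python. The class name, the addresses, and the starting port are made up for illustration, and a real carrier-grade NAT is far more involved; the point is only that two subscribers end up behind the same public IP and differ by port.

```python
from itertools import count


class ToyCgnat:
    """Toy model of carrier-grade NAT: many subscribers share one public IP."""

    def __init__(self, public_ip, first_port=20000):
        self.public_ip = public_ip
        self._next_port = count(first_port)  # next free public port
        self._table = {}  # (private_ip, private_port) -> public_port

    def translate(self, private_ip, private_port):
        """Return the (public_ip, public_port) pair a website would see."""
        key = (private_ip, private_port)
        if key not in self._table:
            # First packet from this device/port: allocate a new public port.
            self._table[key] = next(self._next_port)
        return self.public_ip, self._table[key]


# 203.0.113.7 is a documentation-range address, not a real carrier IP.
nat = ToyCgnat("203.0.113.7")
alice = nat.translate("10.0.0.11", 51000)  # one subscriber's phone
bob = nat.translate("10.0.0.42", 51000)    # a different subscriber
print(alice, bob)
```

From the website’s point of view, both visitors come from the same IP address, which is why banning that address would hit many unrelated users at once.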
Why Web Scraping Proxy Matters
Once you understand how mobile IPs work, it’s easy to see why they are almost unbannable. A website cannot take the risk of banning a mobile IP because of a few suspicious requests, since that would cut out hundreds of potential visitors. Even a high rate of requests from the same IP cannot be considered suspicious by itself, since the requests may come from many different users.
Building a Mobile Proxy, The Nerdish Way
When I started this project in 2019, my idea was to use a Raspberry Pi board with a mobile modem as a mobile proxy, for the reasons described above. It was only one IP, but it would be enough for the website I needed to scrape.
First of all, I needed a Raspberry Pi board. At that time I had a Model 3.
OT: Having a look at my Amazon account, I see that I bought it for 19 EUR. Current price: 89 EUR, VAT excluded. Good job, inflation.
Then I needed a GSM HAT, a 4G modem that takes my SIM and connects the Raspberry Pi to the Internet. I opted for this model, but I’m sure there’s something more recent and better on the market nowadays.
Then I bought some pins and supports to assemble a wonderful artifact.
And this is the result: far from perfect aesthetically, but still working after 4 years.
First of all, I needed to make the modem work on the Raspberry Pi. I spent days figuring out how to do it, since the signal was not stable and the modem seemed to crash after some time. In the end, I wrote a script that checked whether the modem was available and, if it was not, rebooted it programmatically. That was enough to keep the proxy reliable for my usage.
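The original script is not reproduced here, but a watchdog of this kind could look roughly like the Python sketch below. The interface name `ppp0` and the reset script path are assumptions for illustration only; how a modem is actually detected and power-cycled depends on the specific HAT and its drivers.

```python
import os
import subprocess
import time

# Assumption: the modem shows up as a network interface with this name.
MODEM_INTERFACE = "ppp0"
# Hypothetical script that power-cycles the modem; replace with whatever
# your modem actually needs.
RESET_COMMAND = ["/usr/local/bin/reset-modem.sh"]


def modem_is_up(interface=MODEM_INTERFACE):
    """On Linux, every existing network interface has an entry in /sys/class/net."""
    return os.path.isdir(f"/sys/class/net/{interface}")


def reset_modem():
    """Run the (hypothetical) reset script; do not raise on a non-zero exit code."""
    subprocess.run(RESET_COMMAND, check=False)


def watchdog_step(is_up=modem_is_up, reset=reset_modem):
    """Run one check and reset the modem if it is gone.

    Returns True if a reset was triggered, False otherwise.
    """
    if is_up():
        return False
    reset()
    return True


def run_forever(poll_seconds=60):
    """Check the modem every poll_seconds seconds, forever."""
    while True:
        watchdog_step()
        time.sleep(poll_seconds)
```

Calling `run_forever()` from a service that starts at boot would keep the check running. Note that the presence of the interface only shows that the modem is attached, not that it has connectivity, so a stricter check may be needed in practice.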
The second step was to set up a proxy on the Raspberry Pi. I’m far from an expert on this topic, and my research at the time led me to choose Squid. The configuration is pretty straightforward, and since I don’t want to write the 100th tutorial on how to configure a proxy with Squid, here’s a link that explains how to do it very clearly.
The difficulties started when I wanted to connect to the proxy from another machine because, as we said before, my proxy sits behind a CGNAT.
After some research and study, I had my moment of enlightenment: from the Raspberry Pi, I created a reverse SSH tunnel to one of my servers that is always on. The scrapers could then use this server’s IP and port, which are known, to reach the Squid port on my Raspberry Pi. Unluckily, this great answer had not been written yet; it would have saved me a lot of time, and it describes exactly the way I would do it.
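As a rough sketch of this setup, the Python snippet below builds the `ssh` command that would run on the Raspberry Pi and shows how a scraper could point `urllib` at the tunnel endpoint. The host name `server.example.com`, the user `tunnel`, and the helper function names are hypothetical, and port 3128 is assumed only because it is Squid’s default.

```python
import urllib.request


def reverse_tunnel_command(server, remote_port=3128, local_port=3128):
    """Build the ssh command to run on the Raspberry Pi.

    -N : do not run a remote command, only forward ports
    -R : open remote_port on the server and forward it to local_port here
    """
    return [
        "ssh", "-N",
        "-o", "ServerAliveInterval=30",     # send keep-alives on a flaky link
        "-o", "ExitOnForwardFailure=yes",   # exit if the port cannot be forwarded
        "-R", f"{remote_port}:localhost:{local_port}",
        server,
    ]


def proxy_opener(proxy_host, proxy_port=3128):
    """Return a urllib opener that sends HTTP and HTTPS requests through the proxy."""
    proxy_url = f"http://{proxy_host}:{proxy_port}"
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


# Hypothetical user and host; this only prints the command, it does not run it.
cmd = reverse_tunnel_command("tunnel@server.example.com")
print(" ".join(cmd))
```

A scraper would then call something like `proxy_opener("server.example.com").open(url)`. Keep in mind that, by default, OpenSSH binds remote forwarded ports to the server’s loopback interface only, so the scrapers either need to run on that server or the server’s `GatewayPorts` setting has to allow external connections.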
With one extra little trick (we force a reboot of the machine every day to get a new IP), we can say that after more than 3 years of operation, this little machine has saved us hundreds if not thousands of USD in mobile proxies.
Web Scraping Proxy: The Easy Way
With the number of websites needing a proxy growing, in early 2020 we wanted to increase the number of IPs. I had a spare little 4G modem left over from my last house move and wanted to test a new solution. The Raspberry Pi was doing its job well, but with more incoming requests it started to freeze randomly, so I needed more power.
We bought a Mini-PC (and later added a second one) and installed Ubuntu on it.
The only job to do here was to set up Squid and the SSH tunnel, so after connecting to the Internet with my 4G modem, we had another mobile proxy ready to use.
Pros and Cons versus Commercial Alternatives
Of course, this solution has some pros and cons compared with standard commercial alternatives such as proxy providers or in-house modems like Proxidize.
The whole Raspberry Pi setup cost me something like 50$ 2 years ago, while the Mini-PC setup, including the external 4G modem, is something like 150$. In the long run, these costs are irrelevant.
Bandwidth costs, instead, are the most significant ones. In Italy, the monthly plan we have now for “Unlimited GB”, where unlimited means 500GB a month, costs 25$ (when we started it was 80$). For the whole month, that’s roughly half of what you would pay for a single GB with most mobile proxy providers.
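As a quick back-of-the-envelope check, the per-GB cost of the plan can be computed as below. The figures are the ones quoted above; applying the 500GB cap to the old 80$ plan is an assumption, since the original cap is not stated.

```python
def cost_per_gb(monthly_price_usd, monthly_gb):
    """Effective price of one GB if the whole monthly allowance is used."""
    return monthly_price_usd / monthly_gb


current = cost_per_gb(25, 500)  # current plan: 25$ for 500GB
initial = cost_per_gb(80, 500)  # old price, assuming the same 500GB cap
print(f"current: {current:.2f} $/GB, initial: {initial:.2f} $/GB")
```

These numbers only hold if the allowance is actually consumed; with lighter usage, the effective price per GB goes up accordingly.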
The drawback of this solution is that you have very few IPs to rotate between. Even if they are not blocked as often as datacenter or residential IPs, it can still happen.
If I need more GBs or IPs in the future, I will have to buy new hardware and SIMs and set up the whole infrastructure again. It’s not a huge task, but it is more time-consuming than simply doing nothing and using as many GBs and IPs as I like from a commercial proxy provider.
What really kept me from enlarging our fleet of internal mobile proxies was the cost of the data plans. It’s a fixed monthly cost, and 80$ per month is a considerable amount of money if you want to buy 10 or 20 SIMs. But given the current price of 20$, it could be that in the future I’ll begin another experiment, this time using older mobile phones to browse websites and scrape data from them.
This article was kindly provided by Pierluigi Vinciguerra, web scraping expert and founder of Web Scraping Club. Follow this link to see the original post.