When people hear the term data scraping, their first thought is often about how companies use this technology for competitive reasons – specifically to pull publicly-available data from millions of websites in order to remain more competitive for pricing or for market awareness. This could include monitoring a competitor’s inventory, especially in these times of supply chain malaise. But data scraping has other uses, too, and one of these is the act of partially “making people’s internet experience safer.”
There is, sadly, a great deal of offensive material online. The internet, after all, is a technology with no filters other than those put up by individual operators. Instagram, YouTube, and Twitter, for example, all have rules about content, and they routinely ban offensive material, but even they cannot keep track of all of it, and they rely on users to bring issues to their attention. Patrolling for offensive material is made even more difficult when it comes to images, since they tend to slip through text-based filters.
Illegal content detection
Public data scraping solutions provider Oxylabs has contributed to the cleanup effort. As one example, described in more detail in a blog available on their website, they worked on a pro bono partnership with the Communications Regulatory Authority of Lithuania (RRT), after winning a hackathon that focused on automating illegal content detection such as child sexual abuse or pornography, specifically in the Lithuanian IP address space.
Their blend of web scraping technology and AI-driven recognition tools made the impossible now possible: to monitor most sites on the Lithuanian web, scanning thousands of pages and searching for potentially harmful images. Once located, the technology forwards those images to specialists to review. The Oxylabs-created tool reports the detected harmful images to a hotline that had, up until then, depended solely on voluntary reports. The tool allowed more proactivity in the process by monitoring the web in the background and making the reports constant. This means it does not depend too much on the changing habits and research styles of the volunteers.
This type of technology has great potential for the global internet, too. With a little add-on of AI it can be used for retailers and manufacturers to identify and track counterfeit goods and the companies that traffic in them.
Searching for insider risk
Such developments go a long way towards turning the tables on cybercriminals of all types. A great deal of cybersecurity is based on preventing attack and searching for insider risk. But with such an elusive and shapeless criminal element trafficking in data and stolen goods, it becomes necessary to proactively go out and search for the offending material with the same amount of energy and drive that is used in reactive searches and mitigation.
Data scraping has the capacity to search publicly visible websites. It is able to use that same level of persistence that spammers and other dark-side actors have always been using, which creates a force – a collection of bots constantly on the hunt for illegal material.
Data gathering solutions
In this way, the technologies made available by Oxylabs represent a vital new approach to cybersecurity overall, a proactive style of hunting, to complement the more traditional forensic style of fixing. Established in 2015, Oxylabs describes itself as a premium proxy and public web data acquisition solution provider, which enables its clients, companies of all sizes, to fully leverage the power of big data. Its tradition of constant innovation, backed up by a large patent portfolio, and a focus on ethics have allowed Oxylabs to become a global leader in the web data acquisition industry and forge close ties with dozens of Fortune Global 500 companies. In 2022, Oxylabs was named the fastest-growing public data gathering solutions company in Europe in the Financial Times’ FT 1000 list.
For more information about Oxylabs, you can visit them here.
By Steve Prentice
Steve Prentice is a project manager, writer, speaker and expert on productivity in the workplace, specifically the juncture where people and technology intersect. He is a senior writer for CloudTweaks.