Web Scraping: Understanding the Data Collection Technique

In early April, a new data leak exposed information on 533 million Facebook users worldwide, including the social network's founder, Mark Zuckerberg, and around 8 million Brazilians with profiles on the service.

According to the platform, the exposure did not result from any breach of its servers. The information, which ended up on a hacking forum, was obtained through a technique known as scraping.

The method, used by marketing agencies, journalists and data scientists, has already made headlines on other occasions, such as in September 2020, when data from 235 million users on YouTube, Instagram and TikTok were leaked. But the most famous case is perhaps the Cambridge Analytica scandal, in which information from Facebook profiles was used to generate behavioral maps of voters.

Facebook is often one of the targets of scrapers. Source: Rawpixel

What is scraping?

Also called web scraping, scraping is a technique for automatically collecting information from the internet, drawing on public data available on websites, social networks and other online services.

Generally, it is used to speed up the lookup and collection of this information, work that would take far longer if done by hand. The speed comes from dedicated applications, programming languages or scripts that copy data on a large scale.
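To make the idea of a script that copies data concrete, here is a minimal sketch in Python using only the standard library. It is an illustration, not the method used in any of the cases mentioned in this article: the sample HTML, the class names "profile", "name" and "city", and the helper scrape_profiles are all hypothetical. A real scraper would download the page first and repeat the extraction across many pages.

```python
from html.parser import HTMLParser

# Hypothetical sample page; a real scraper would download this HTML instead.
SAMPLE_HTML = """
<html><body>
  <div class="profile"><span class="name">Ana Silva</span><span class="city">Recife</span></div>
  <div class="profile"><span class="name">Bruno Costa</span><span class="city">Curitiba</span></div>
</body></html>
"""


class ProfileParser(HTMLParser):
    """Collects the text of <span class="name"> and <span class="city"> elements."""

    FIELDS = {"name", "city"}

    def __init__(self):
        super().__init__()
        self.profiles = []   # one dict per <div class="profile">
        self._field = None   # field currently being read, if any

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "div" and css_class == "profile":
            # A new profile block starts: open a fresh record.
            self.profiles.append({})
        elif tag == "span" and css_class in self.FIELDS and self.profiles:
            # Remember which field the upcoming text belongs to.
            self._field = css_class

    def handle_data(self, data):
        if self._field is not None:
            self.profiles[-1][self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None


def scrape_profiles(html):
    """Return a list of dicts extracted from the given HTML text."""
    parser = ProfileParser()
    parser.feed(html)
    return parser.profiles


profiles = scrape_profiles(SAMPLE_HTML)
for profile in profiles:
    print(profile)
```

The same few lines of logic applied to thousands of pages is what makes automated collection so much faster than copying the data by hand.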

Specific programs are used to automatically collect public information. Source: Unsplash

Scraping comes into play when a researcher, scientist, journalist or other professional needs a large amount of data to feed a study, survey or report, and automates its collection from a public database, whether from the federal government or any other source.

With data scraping, it is also possible to obtain information from open social network profiles (name, photo, address, phone, email, etc.) and through Google, for a wide range of purposes, such as segmenting advertising campaigns and monitoring competitors.

Is data scraping legal?

Collecting data by scraping is not considered illegal as long as it targets public sources, that is, information accessible to any internet user. Just as visiting someone's profile and viewing the data made available there is not a crime, using an automated tool to do the same work does not break the law.

Public data on social media can be “scraped”. Source: Unsplash

However, it is worth noting that Facebook, Instagram, YouTube and TikTok, among other platforms, currently treat the automated copying of data they store as a violation of their terms of use.
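Terms of use are written for people, but many sites also publish a machine-readable robots.txt file stating which paths automated tools are asked to avoid. This is a separate convention that the platforms' rules above do not depend on, and passing the check does not mean the terms of use allow scraping. The sketch below shows how a script might consult such a file with Python's standard urllib.robotparser module. The rules, the "example-bot" user agent, the example.com URLs and the helper is_allowed are all hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real script would fetch it from the site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /profiles/
Allow: /
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


for url in ("https://example.com/about", "https://example.com/profiles/123"):
    verdict = "allowed" if is_allowed(ROBOTS_TXT, "example-bot", url) else "disallowed"
    print(url, verdict)
```

A well-behaved scraper would skip any URL reported as disallowed, and would still need to respect the site's terms of use separately.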

Are there risks for those who have the data copied?

Through scraping, people and companies can gain access to the public information of any individual included in a given database, such as phone number, email, profile picture, age and sex, depending on the type of source the automated tool accesses.

In the case of a social network, scrapers also get details such as number of followers, engagement and even shared links, in addition to public posts and other content open to other users, if the platform grants such access.

Photos collected by scraping have already been used in facial recognition programs. Source: Pixabay

In general, collecting this information does not cause any major harm if it is done to support research and campaigns with user authorization.

However, the method can also be used with malicious intent, by cybercriminals looking for data to carry out scams and other types of fraud, or in operations like that of Cambridge Analytica, in which the data is suspected to have been used to favor Donald Trump in the 2016 United States elections.

How to reduce the dangers?

To prevent further leaks, platforms have blocked data collection by bots and launched new privacy options. But since it is not always possible to stop hackers and bots, users should take great care to keep their information from falling into the wrong hands.

Review your profile privacy settings. Source: Facebook / Reproduction

Increasing the privacy of your social network profiles is therefore a good way to reduce the chances that your data will be collected by scraping. A useful tip is to restrict the visibility of your information and posts to followers only by changing your account settings.

This privacy review should be done on every platform, leaving as little data as possible publicly visible.
