Researchers have developed a new tool that reveals which data in a web account, such as emails, searches, or viewed products, are being used to target which outputs, such as ads, recommended products, or prices.
Roxana Geambasu and Augustin Chaintreau, both assistant professors of computer science at Columbia Engineering, along with PhD student Mathias Lecuyer, created the tool, called XRay, to understand how personal data is being used on web services like Google, Amazon, Facebook and YouTube.
"Today we have a problem: the web is not transparent. We see XRay as an important first step in exposing how websites are using your personal data," said Geambasu.
While harnessing big data can certainly improve our daily lives, these beneficial uses have also generated a big data frenzy, with web services aggressively pursuing new ways to acquire and commercialise the information.
"It's critical, now more than ever, to reconcile our privacy needs with the exponential progress in leveraging this big data," said Chaintreau.
"If we leave it unchecked, big data's exciting potential could become a breeding ground for data abuses, privacy vulnerabilities, and unfair or deceptive business practices," Geambasu added.
For example, one can use the XRay prototype to study why a user might be shown a specific ad in Gmail. Geambasu and Chaintreau found that a Gmail user who sees ads about various forms of spiritualism might have received them because he or she sent an email message about depression.
"The theoretical results were encouraging, but seemed too good to be true. So we tested XRay in actual situations, learning from experiments we ran on Gmail, Amazon, and YouTube, and refining the design multiple times."
The current XRay system works with Gmail, Amazon, and YouTube. However, XRay's core functions are service-agnostic and easy to instantiate for new services, and they can track data within and across services.
The key idea in XRay is to use black-box correlation of data inputs and outputs to detect data use.
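The idea of black-box correlation can be illustrated with a small sketch. The Python below is not XRay's actual algorithm, which the article does not describe in detail; it is a simplified stand-in that assumes a handful of hypothetical test accounts, each holding a different subset of inputs (emails), and records which outputs (ads) each account was shown. An input is scored by how much more often an ad appears in accounts that contain the input than in accounts that do not. The function name `score_inputs` and all labels in the sample data are invented for illustration.

```python
from typing import Dict, List, Set, Tuple

# One observation per hypothetical test account: (inputs placed in it, outputs seen).
Account = Tuple[Set[str], Set[str]]


def score_inputs(accounts: List[Account], output: str) -> Dict[str, float]:
    """Score each input by how strongly it co-occurs with `output`.

    The score is the share of accounts containing the input that saw the
    output, minus the share of accounts lacking the input that saw it.
    """
    all_inputs: Set[str] = set().union(*(ins for ins, _ in accounts))
    scores: Dict[str, float] = {}
    for item in all_inputs:
        with_item = [outs for ins, outs in accounts if item in ins]
        without_item = [outs for ins, outs in accounts if item not in ins]
        rate_with = (
            sum(output in outs for outs in with_item) / len(with_item)
            if with_item else 0.0
        )
        rate_without = (
            sum(output in outs for outs in without_item) / len(without_item)
            if without_item else 0.0
        )
        scores[item] = rate_with - rate_without
    return scores


# Made-up observations, purely for illustration.
accounts: List[Account] = [
    ({"email:depression", "email:travel"}, {"ad:spiritual", "ad:flights"}),
    ({"email:depression"}, {"ad:spiritual"}),
    ({"email:travel"}, {"ad:flights"}),
    ({"email:cooking"}, set()),
]

scores = score_inputs(accounts, "ad:spiritual")
best = max(scores, key=scores.get)  # input most associated with the ad
print(best, scores[best])
```

With this sample data the depression email scores highest for the spiritualism ad, because the ad shows up in every account containing that email and in none of the others. A real system would need many more accounts and some statistical care to separate targeting from coincidence.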
To assess XRay's practical value, the researchers created an XRay-based demo service that continuously collects and diagnoses Gmail ads related to a set of sensitive topics, including various diseases, pregnancy, race, sexual orientation, divorce and debt.