Summary
This post presents research on the privacy harms and risks of Google’s recent Related Website Sets feature, to be presented at the 2024 Internet Measurement Conference. The research finds both that the Related Website Sets feature would reverse some of the privacy benefits of deprecating third-party cookies, and that Google’s justification for reintroducing this privacy harm (i.e., that Web users can tell when two different sites are run by the same organization) is untrue for many, potentially most, Web users. The study supports other browsers’ decision to reject the feature because of its privacy risks, and highlights the risk Related Website Sets poses to Chrome users.
The study was conducted by researchers at the University of St Andrews, Imperial College London, Hong Kong University of Science & Technology (GZ), and Brave Software. This post was written by Principal Privacy Researcher Peter Snyder.
Related Website Sets (RWS) is a recent Chrome feature, proposed by Google in anticipation of the end of third-party cookies. The privacy and security harms caused by third-party cookies are well documented, and have led every major Web browser to either block third-party cookies, or announce plans to do so (even if Google has, again, pushed its planned deprecation date back).
According to Google, Related Website Sets reenables third-party-cookie-like behavior where it benefits users, without reintroducing the broader privacy harms of third-party cookies. In reality, RWS aims to allow (for example) Google to link the videos you watch on YouTube to your Google profile, even when you’re not logged into YouTube, and even after third-party cookies have been deprecated in Chrome. While the research described in this post presents and evaluates Google’s stated motivations for RWS, the core truth is that RWS exists for advertiser-serving situations like the above.
The broad idea behind RWS is that if two different sites are run by the same organization (for example, instagram.com and facebook.com are both run by Meta), then there is no need for the browser to block third-party cookies between the two sites, since the user already expects that both sites will share information with each other.
More casually, the motivation behind RWS is something like this: there’s no point in telling your mom a secret, and then trying to keep that secret from your dad; you should assume your parents are going to share everything with each other.
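The broad idea above can be sketched in a few lines of Python. This is a toy model for illustration only, not Chrome’s implementation: the `TOY_SETS` data, the set label, and both function names are hypothetical, and the Meta grouping simply reuses the example from the text rather than reflecting the actual contents of the RWS list.

```python
# Toy model of the RWS premise: sites in the same set are treated as "related",
# so the browser relaxes third-party cookie blocking between them.
# TOY_SETS and its label are hypothetical; they are not taken from the real list.
TOY_SETS = {
    "meta-example": {"facebook.com", "instagram.com"},
}


def in_same_set(site_a, site_b, sets=TOY_SETS):
    """Return True if both sites appear together in at least one set."""
    return any(site_a in members and site_b in members
               for members in sets.values())


def cross_site_cookies_allowed(top_level_site, embedded_site, sets=TOY_SETS):
    """Same-site access is always first-party; cross-site access is
    allowed only when the two sites share a set."""
    if top_level_site == embedded_site:
        return True
    return in_same_set(top_level_site, embedded_site, sets)


print(cross_site_cookies_allowed("facebook.com", "instagram.com"))
print(cross_site_cookies_allowed("facebook.com", "example.org"))
```

Note that nothing in this check involves the user: the decision depends entirely on a list the user never sees, which is why the question of whether users can anticipate set membership matters.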
RWS is a user-hostile weakening of the Web’s privacy model, plainly designed to benefit websites and advertisers, to the detriment of user privacy. Google argues that RWS actually benefits users, either because the privacy exceptions help fix “site compatibility issues” or because they keep users “signed in” across related domains. But a quick look at the actual Related Website Sets exceptions list reveals many examples unrelated to even these hypothetically user-benefiting use cases, and these sites work correctly in browsers that do not implement RWS (i.e., almost all other browsers).
In reality, the primary motivation behind Related Website Sets is as frustrating as it is unsurprising: to benefit advertisers to the detriment of users (or, as Google euphemistically says, to “show you personalized content”). As with so many other user-harmful and needlessly complex choices in Chrome’s overarching “Privacy Sandbox” proposal, RWS exists to make sure Chrome continues to serve advertisers’ needs first, even once Google has been shamed into (finally) deprecating third-party cookies.
Study Description: Users (Understandably) Do Not Anticipate Site Relations
The study considered RWS’s impact on Web privacy by testing whether the underlying assumption in RWS is correct: can Web users accurately determine if two different sites are related to each other? More specifically, if a Web user is presented with two different websites, how accurately are they able to decide whether the two sites are related to each other, given the existing site relationships defined by Chrome’s RWS list?
In general, we found that Web users cannot accurately determine if two sites are related to each other (as determined by the Related Website Sets feature). We conducted a user study with 30 Web users, recruited over social media, and presented them each with 20 pairs of websites. Website pairs were randomly selected from both the Related Website Sets list (i.e., sites Google designates as “related”, and so warranting reduced privacy protections), and the Tranco list of popular websites. Each user was presented with different pairs of websites, asked to view the sites, and then asked to decide if they thought the two sites were operated by the same organization. This resulted in 430 determinations of whether unique pairs of websites were related (some of the 30 users did not provide an answer for all 20 of the website pairs they were presented).
We found that users’ expectations for which sites were related often didn’t match the Related Website Sets list, and as a result, the RWS feature re-enables third-party cookie-like behavior in many cases users could not anticipate. In our study, the large majority of users (~73%) made at least one incorrect determination of whether two sites were related to each other, and almost half (~42%) of the determinations made during the study (i.e., all determinations from all users) were incorrect. Most concerning, of the cases where both sites were related (according to the RWS feature), users guessed that the sites were unrelated ~37% of the time, meaning that users would have thought Chrome was protecting them when it was not.
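To make the three figures above concrete, here is a minimal sketch of how such rates could be computed from a list of determinations. The record layout, the `summarize` function, and the `TOY_RESPONSES` data are all assumptions made for illustration; they are not the study’s data or analysis code.

```python
# Each record: (participant id, pair is related per the RWS list,
#               participant answered "related").
# Toy data for illustration only; these are NOT the study's responses.
TOY_RESPONSES = [
    ("p1", True, True),
    ("p1", True, False),
    ("p1", False, False),
    ("p2", False, False),
    ("p2", True, True),
    ("p3", False, True),
    ("p3", True, False),
    ("p3", True, True),
]


def summarize(responses):
    """Compute the three kinds of rates discussed in the text."""
    wrong = [r for r in responses if r[1] != r[2]]
    users = {r[0] for r in responses}
    users_with_error = {r[0] for r in wrong}
    related = [r for r in responses if r[1]]
    # Related pairs judged unrelated: the user believes they are protected
    # when, under RWS, they are not.
    missed = [r for r in related if not r[2]]
    return {
        "share_incorrect": len(wrong) / len(responses),
        "share_users_with_error": len(users_with_error) / len(users),
        "share_related_judged_unrelated": len(missed) / len(related),
    }


print(summarize(TOY_RESPONSES))
```

The third rate is the one that matters most for privacy, since it counts only the mistakes where a user wrongly assumes two sites cannot share information.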
We conclude from this that the premise underlying RWS is fundamentally incorrect; Web users are (understandably, predictably) not able to accurately determine whether two sites are owned by the same organization. And as a result, RWS is reintroducing exactly the kinds of privacy harms that third-party cookies cause.
Lest anyone judge the study participants for being uninformed, or not taking the study seriously, consider for yourself: which of the following pairs of sites are related?
1. hindustantimes.com and healthshots.com
2. vwo.com and wingify.com
3. economictimes.com and cricbuzz.com
4. indiatoday.in and timesofindia.com
Keep in mind, a user needs to determine whether two domains are related before clicking on a link; once a site has been loaded, any information sharing and tracking has already occurred.
In conclusion, we find that RWS will be harmful to user privacy, and reintroduce the kinds of privacy harms the Web has been moving away from by removing third-party cookies. The full paper will be presented at the 2024 Internet Measurement Conference.
(For the above quiz, if you chose “4”, then unfortunately that is incorrect. That is in fact the only pair of the four that isn’t considered “related” to each other.)
However, beyond the findings from the user study, we note a more fundamental privacy harm with RWS. RWS rests on the idea that if two sites are related to each other, then it’s harmless (or, at least “acceptable”) for the browser to reduce privacy protections between those two sites. Or, to go back to the previous analogy, if mom already knows something, then there’s no harm in telling dad; dad is going to find out regardless.
This assumption is wrong; modern Web browsers are perfectly capable of preventing (say) Meta from knowing that your Facebook account and your Instagram account are owned by the same person if you register them with different email addresses and information. In fact, this is the default behavior of most Web browsers today, both browsers aimed at a popular audience (e.g., Brave, Firefox, Safari) and browsers targeting specialized audiences (e.g., Tor Browser, Icefox). Unless you use the same credentials to register an account on two different sites, modern browsers can absolutely prevent two sites operated by the same organization from linking your behaviors across those sites. Or, in other words, modern Web browsers can absolutely prevent Mom from telling Dad your secrets.
Finally, we acknowledge that some companies do try to circumvent the privacy protections in Web browsers, to try to allow two sites run by the same organization to link your accounts across sites. Some sites use techniques like link decoration or bounce tracking to try to continue tracking you. But the difference here between privacy-respecting browsers (which include link decoration and bounce tracking protections) and Chrome (which is explicitly designed to allow cross-site linkage) is damning: some browsers are experimenting with techniques to prevent organizations from tracking you across sites, and some browsers are designing features with the explicit intent of allowing such tracking.
Conclusions
In conclusion, our study finds that RWS is harmful for Web privacy in three ways:
First, RWS assumes users can anticipate which sites are related to each other, but in practice users cannot.
Second, RWS introduces privacy harm even before users have the opportunity to decide if two sites are operated by the same organization; by the time users can view a webpage to try to decide if two sites are related to each other, the privacy harm has already occurred, and sites have had the opportunity to track the user across site boundaries.
And third, RWS entrenches a privacy-harmful assumption in the Web platform, instead of working to excise it. RWS assumes that if two sites are owned by the same organization, then the organization should be allowed to track you across those two sites. In contrast, privacy-respecting browsers have gone in the opposite direction, and tried to prevent all sites from tracking you, regardless of what organization owns them.
Although Related Website Sets is being presented as a general Web proposal, the truth is that most of the Web has already considered and rejected it. Most browsers, including Brave, Firefox, and Safari, have publicly stated that they believe Related Website Sets (previously called First-Party Sets) is bad for users, and bad for the Web. The proposal has been removed from the W3C Privacy Community Group and is no longer being considered by any privacy-focused group in the W3C.
When Websites Change Hands
What happens if / when the domains in the list change hands? This is a common concern with all sorts of “pin trust to a domain” proposals across the history of the Web. Just because domains A, B, and C are operated by the same organization today does not (at all) guarantee that they’ll be owned by the same organization tomorrow.
Security and privacy attacks from exactly these kinds of assumptions have happened with browser extensions that have been sold from “trustworthy” parties to malicious parties, or when popular software libraries / dependencies have been taken over by a malicious actor.
The broader concern is that, even if these sites are meaningfully related at the time they’re included in the list, there is no mechanism that will remove them when they (often silently) change hands.
Language / Perception Concerns
As mentioned above, the underlying justification (as flimsy as it is) for RWS is that users can perceive when sites are operated by the same organization. Our study finds that, even for English-speaking users evaluating English-language sites, users can’t anticipate what sites Google judges to be related. This problem will (of course) get much worse when people are visiting sites in languages they do not speak.
Timing
The intuition behind RWS is that users will be able to determine if site B is related to site A, and then only visit site B if that arrangement is acceptable. However, this is a catch-22. In order to determine if site B is related to site A, I need to visit site B and see the “shared branding or logo” (or similar), indicating the relationship between these sites. However, once I’ve loaded the site to view it, it’s already too late, and my information has been shared between the two sites.