WebBundles Harmful to Content Blocking, Security Tools, and the Open Web (Standards Updates #2)
This is the second in a series of blog posts describing new and proposed web standards and how they support or threaten web privacy. This post is written by Senior Privacy Researcher Peter Snyder (@pes10k).
In a Nutshell…
Google is proposing a new standard called WebBundles. This standard allows websites to “bundle” resources together, and will make it impossible for browsers to reason about sub-resources by URL. This threatens to change the Web from a hyperlinked collection of resources (that can be audited, selectively fetched, or even replaced), to opaque all-or-nothing “blobs” (like PDFs or SWFs). Organizations, users, researchers and regulators who believe in an open, user-serving, transparent Web should oppose this standard.
While we appreciate the problems the WebBundles and related proposals aim to solve, we believe there are other, better ways of achieving the same ends without compromising the open, transparent, user-first nature of the Web. One potential alternative is to use signed commitments over independently-fetched subresources. These alternatives would fill a separate post, and some have already been shared with spec authors.
The Web Is Uniquely Open, and URLs Are Why
The Web is valuable because it’s user-centric, user-controllable, user-editable. Users, with only a small amount of expertise, can see what web-resources a page includes, and decide which, if any, their browser should load; and non-expert users can take advantage of this knowledge by installing extensions or privacy protecting tools.
The user-centric nature of the Web is very different from most application and information distribution systems. Most applications are compiled collections of code and resources which are difficult-to-impossible to distinguish and reason about. This difference is important, and is part of the reason there are many privacy-protecting tools for the Web, but very few for “binary” application systems.
At root, what makes the Web different, more open, more user-centric than other application systems, is the URL. Because URLs (generally) point to one thing, researchers and activists can measure, analyze and reason about those URLs in advance; other users can then use this information to make decisions about whether, and in what way, they’d like to load the thing the URL points to. More important, experts can load https://tracker.com/code.js, determine that it’s privacy-violating, and share that information with other users so that they know not to load that code in the future.
WebBundles Make URLs Meaningless
Google has recently proposed three related standards: WebBundles, Signed HTTP Exchanges (sometimes abbreviated to SXG), and Loading. Unless otherwise noted, this post will use the single term “WebBundles” to refer to all three specs. So far, WebBundles have been pitched for use in proposed advertising systems (e.g., TURTLEDOVE, SPARROW) and as part of a follow-up to Google’s AMP system, although I suspect this is just the tip of the iceberg.
In effect, WebBundles make Websites behave like PDFs (or Flash SWFs). A PDF includes all the images, videos, and scripts needed to render the PDF; you don’t download each item individually. This has some convenience benefits, but also makes it near-impossible to reason about an image in a PDF independently from the PDF itself. This is, for example, why there are no content-blocking tools for PDFs. PDFs are effectively all-or-nothing propositions, and WebBundles would turn Websites into the same.
By changing URLs from meaningful, global identifiers into arbitrary, package-relative indexes, WebBundles give advertisers and trackers enormously powerful new ways to evade privacy and security protecting web tools. The next section gives some examples of how.
WebBundles Allow Sites to Evade Privacy and Security Tools
URLs in WebBundles are arbitrary references to resources in the bundle, and not globally shared references to resources. This will allow sites to evade privacy and security tools in several ways.
At root, the common cause of all these evasions is that WebBundles create a local namespace for resources, independent of what the rest of the world sees. This can cause all sorts of name confusion, undoing years of work by privacy activists and researchers to improve privacy and security. The sections below discuss just three ways that websites using WebBundles could exploit this confusion.
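To make the “local namespace” point concrete, here is a deliberately simplified Python model of a bundle. It is not the real WebBundle format or any browser’s loading logic; the `Bundle` class, `network_requests`, `resolve` and the example URLs are all hypothetical. The model assumes that names inside a bundle are resolved against the bundle’s own index rather than fetched by their own URLs.

```python
from dataclasses import dataclass, field


@dataclass
class Bundle:
    """Hypothetical, simplified model of a WebBundle (not the real format):
    a single fetched file whose index maps names to resource bytes."""
    url: str  # the only URL this model puts on the network
    resources: dict[str, bytes] = field(default_factory=dict)


def network_requests(bundle: Bundle) -> list[str]:
    # The bundle is fetched once; in this model its contents are not
    # requested individually, so no per-resource URL is ever seen.
    return [bundle.url]


def resolve(bundle: Bundle, name: str) -> bytes:
    # Names are looked up only in the bundle's own index (a local namespace).
    return bundle.resources[name]


page = Bundle(
    url="https://example.org/page.wbn",
    resources={
        "https://example.org/index.html": b"<script src='/tracker.js'></script>",
        "https://example.org/tracker.js": b"/* fingerprinting code */",
    },
)

print(network_requests(page))  # ['https://example.org/page.wbn']
print(resolve(page, "https://example.org/tracker.js"))
```

In this model, a tool that keys its decisions on network URLs sees only the bundle’s URL, so its only choices are to load the whole bundle or none of it: the same all-or-nothing property described for PDFs.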
Evading Privacy Tools By Randomizing URLs
Previously, if a site wanted to include (say) a fingerprinting script, it would include a <script> tag pointing to the fingerprinting script on the site. Each page on the site would refer to the same fingerprinting script by the same URL. Researchers or crowd-sourcers could then record the URL of that fingerprinting script in a list like EasyPrivacy, so that privacy-minded users could visit the site without fetching the fingerprinting script. This is how the vast majority of blocking and privacy tools work on the Web today.
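A URL blocklist of this kind can be sketched in a few lines of Python. This is a much simplified stand-in: real lists such as EasyPrivacy use a richer filter syntax, and `BLOCKLIST`, `is_blocked` and the URLs here are hypothetical. What matters is that the decision is made on the URL alone, which only works because the URL means the same thing on every page and for every user.

```python
from urllib.parse import urlsplit

# Hypothetical blocklist entries (host, path), as a researcher might
# record them after identifying a fingerprinting script.
BLOCKLIST = {
    ("example.org", "/fingerprint.js"),
    ("tracker.com", "/code.js"),
}


def is_blocked(url: str) -> bool:
    # Decide purely from the URL: no need to fetch or inspect the script.
    parts = urlsplit(url)
    return (parts.hostname, parts.path) in BLOCKLIST


page_requests = [
    "https://example.org/index.html",
    "https://example.org/fingerprint.js",
    "https://example.org/style.css",
]

allowed = [u for u in page_requests if not is_blocked(u)]
print(allowed)  # ['https://example.org/index.html', 'https://example.org/style.css']
```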
WebBundles make it easy for sites to evade privacy tools by randomizing URLs for unwanted resources. What on the current Web is referred to everywhere as, say, example.org/tracker.js, could in one WebBundle be called 1.js, in the next 2.js, in the third 3.js, etc. WebBundles encourage this by removing all costs to the site; caching becomes a wash (because you’re already returning all resources to every user and caching the entire bundle), and there is no need to maintain a URL mapping (because the bundle you sent to the user already has the randomized URL).
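The following sketch shows why randomized names defeat a URL list, under simplifying assumptions: a bundle is modeled as a dict from bundle-relative name to bytes, and `build_bundle`, `TRACKER` and the URLs are hypothetical. A name a researcher records from one bundle matches nothing in the next, while the tracker itself is still delivered.

```python
import secrets

TRACKER = b"/* fingerprinting code */"


def build_bundle(tracker_body: bytes) -> dict[str, bytes]:
    """Hypothetical bundler: gives the tracker a fresh random name each time.

    The page that references the script travels in the same bundle, so the
    site never has to store or look up the name it chose."""
    name = f"https://example.org/{secrets.token_hex(8)}.js"
    page = f'<script src="{name}"></script>'.encode()
    return {"https://example.org/index.html": page, name: tracker_body}


first = build_bundle(TRACKER)
# A researcher inspects one bundle and records the tracker's name.
blocklist = {name for name, body in first.items() if body == TRACKER}

second = build_bundle(TRACKER)
# Same script, new name: the recorded entry (almost certainly) matches nothing.
print(any(name in blocklist for name in second))
print(TRACKER in second.values())  # True: the tracker is still delivered
```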
Evading Privacy Tools By Reusing URLs
Even worse, WebBundles would allow sites to evade blocking tools by making the same URL point to different things in each bundle. On the current Web, https://example.org/ad.jpg points to the same thing for everyone. It’s difficult for a website to make the same URL return two different images. As a result, blocking tools can block ad.jpg knowing that they’re blocking an advertisement for everyone; there is little risk that it’s an advertisement for some users, and the company logo for others.
WebBundles again change this in a dangerous way. Example.org could build a WebBundle so that https://example.org/ad.jpg in one bundle refers to an advertisement, in another bundle refers to the site’s logo, and in a third bundle refers to something else. Not only does this make building lists for researchers difficult-to-impossible, but it gives sites a powerful new capability to poison blocking lists.
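A small sketch of the reuse problem, again modeling each bundle as a dict from name to bytes; `bundles`, `blocked_urls` and `removed_by_rule` are hypothetical. A rule written after seeing the ad in one user’s bundle removes the site logo from another user’s bundle, because the URL alone no longer says what it refers to.

```python
AD_URL = "https://example.org/ad.jpg"

# Hypothetical bundles the same site might serve to two different users.
bundles = {
    "user_a": {AD_URL: b"advertisement"},
    "user_b": {AD_URL: b"site logo"},
}

# A URL-based rule, written after a researcher saw the ad in user A's bundle.
blocked_urls = {AD_URL}


def removed_by_rule(bundle: dict[str, bytes]) -> list[bytes]:
    """Return what a URL-only rule actually removes from this bundle."""
    return [body for url, body in bundle.items() if url in blocked_urls]


for user, bundle in bundles.items():
    print(user, removed_by_rule(bundle))
# user_a [b'advertisement']
# user_b [b'site logo']
```

The reverse also holds: a researcher who happens to receive the logo version would judge the URL harmless, which is how a site could poison a list.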
Evading Privacy Tools By Hiding Dangerous URLs
Finally, WebBundles enable an even more dangerous form of evasion. Currently, groups like uBlock Origin and Google’s Safe Browsing project build lists of URLs of harmful and dangerous web resources. Projects such as these two consider the URL either the only, or a significant, input when determining whether a resource is harmful. The universal, global nature of a URL is again what makes these lists useful.
WebBundles again enable sites to evade these protections, by allowing sites to refer to known-bad resources by known-good URLs. On the wider Web, it would be very difficult for a site to get, say, https://cdn.example.org/cryptominer.js treated as if it were https://cdn.example.org/jquery.js (or vice versa); in a WebBundle it would be trivial.
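A minimal sketch of why a URL-keyed verdict misses this. `KNOWN_BAD`, `url_verdict` and `MINER` are hypothetical stand-ins for lists that treat the URL as the only, or a significant, signal; they do not reflect how uBlock Origin or Safe Browsing are actually implemented. The bundle is modeled as a dict from name to bytes.

```python
# Hypothetical URL-reputation data, standing in for lists that treat the
# URL as the only, or a significant, signal.
KNOWN_BAD = {"https://cdn.example.org/cryptominer.js"}
MINER = b"/* cryptominer payload */"


def url_verdict(url: str) -> str:
    return "block" if url in KNOWN_BAD else "allow"


# On the wider Web, the miner is fetched by its own URL and is caught.
print(url_verdict("https://cdn.example.org/cryptominer.js"))  # block

# Inside a bundle (modeled as name -> bytes), the same bytes can sit
# under a trusted-looking name, and a URL-only check lets them through.
bundle = {"https://cdn.example.org/jquery.js": MINER}
for url, body in bundle.items():
    print(url, url_verdict(url), body == MINER)
# https://cdn.example.org/jquery.js allow True
```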
WebBundles Make Privacy Violations that are Currently Difficult, Easy
The designers and advocates of the WebBundle specs argue that none of this is new, and that all of the above ways of circumventing privacy protections are already possible. This is technically true, but misses the big picture by ignoring economics. WebBundles take circumvention techniques that are currently expensive, fragile and difficult, and make them cheap or even free.
For example, it’s true that websites can use a large number of URLs to refer to the same file to make things difficult for blocking tools, but in practice this is hard to do. Randomizing URLs harms caching, requires that a mapping from each random URL to the real resource be stored somewhere persistently and pushed out to CDNs, and so on. It’s possible for sites to do this, but it’s difficult and costly, and so it’s uncommon.
In general, WebBundles make undesirable behavior dramatically easier by making it cheaper.
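The economics argument can be illustrated with a toy cache. `EdgeCache`, `serve` and the URLs are hypothetical, and real CDNs are far more complicated; the sketch only counts two of the costs named above for randomizing URLs on today’s Web: cache misses, and the size of the URL mapping the site must persist.

```python
import secrets

SCRIPT = b"/* tracker */"
STABLE_URL = "https://example.org/tracker.js"


class EdgeCache:
    """Toy URL-keyed CDN cache (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        self.store: dict[str, bytes] = {}
        self.misses = 0

    def get(self, url: str, origin: dict[str, bytes]) -> bytes:
        if url not in self.store:
            self.misses += 1  # a miss means a trip back to the origin server
            self.store[url] = origin[url]
        return self.store[url]


def serve(randomize: bool, visits: int) -> tuple[int, int]:
    """Return (cache misses, size of the URL mapping the site must persist)."""
    cache = EdgeCache()
    origin: dict[str, bytes] = {}  # mapping from served URL to content
    for _ in range(visits):
        if randomize:
            url = f"https://example.org/{secrets.token_hex(8)}.js"
        else:
            url = STABLE_URL
        origin[url] = SCRIPT  # each new random URL adds a mapping entry
        cache.get(url, origin)
    return cache.misses, len(origin)


print(serve(randomize=False, visits=1000))  # (1, 1)
print(serve(randomize=True, visits=1000))
```

With a stable URL, one origin fetch serves every visitor; with per-visit random URLs, every visit misses the cache and the mapping grows with each visit. A bundle sidesteps both costs, because the randomized name and the page that uses it ship together.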
This post focuses on the harm WebBundles will do to privacy and security tools. We have additional concerns with WebBundles and the related standards. We may write about them in future posts, but a partial list includes:
- SXG lacks a repudiation system: If a site accidentally includes, say, malware today, the site can solve the problem by just updating the site. If a site signs a WebBundle with SXG, there is no clear way for the signer to indicate “no longer trust this specific bundle.”
- Interactions with Manifest v3: Manifest v3 limits extensions to using URL patterns for blocking; WebBundles make those URLs meaningless. These two features together will allow sites to completely circumvent blocking.
- Origin Confusion: Loading + SXG allow you to fetch content from one server, but execute it with the privacy and security properties of another server. The potential for user confusion is enormous, and while we are confident that Googlers are working hard to address the UI/UX issues here, the risk to users has not yet been reduced to a manageable level.
Brave works to improve privacy on the Web, in the web browser we build, the tools we build and share, and the advocacy we do in standards bodies. The concerns shared in this post are just one example of the work Brave does to try and make sure Web standards stay focused on privacy, transparency and user control.
We’ve tried to work at length with the WebBundle authors to address these concerns, with no success. We strongly encourage Google and the WebBundle group to pause development on this proposal until the privacy and security issues discussed in this post have been addressed. We also encourage others in the Web privacy and security community to engage in the conversation too, and to not implement the spec until these concerns have been resolved.
One way to join the conversation would be to comment on this issue describing these concerns with WebBundles (both the issue and this blog post were written by the same person). Other options include opening new issues against the spec, or letting your web browser’s developers know how important privacy tools are to you, and the risk this proposal poses to those tools.
Notes
- On the problems WebBundles and related proposals aim to solve: particularly in ensuring the integrity of the initial page and its subresources.
- On URLs “(generally)” pointing to one thing: there are exceptions, and there is nothing in the web platform that requires this, but it’s nevertheless the case that URLs are generally expected to be semi-permanent. This expectation is reflected across the Web platform, including aspects of cache policy, how libraries instruct people to deploy code, etc.
- On URLs in WebBundles being references to resources in the bundle: they can also be references to resources outside the bundle, but doing so defeats the purpose of the bundle in the first place, so it’s not discussed further in this post.
- On it being difficult for a website to make the same URL return two different images: as discussed later, not impossible, but difficult. The point is that WebBundles make evasion methods that are currently difficult and fragile easy and effortless for attackers.