Sep 21, 2021
We at Brave Research just published a technical report called “Privacy and Security Issues in Web 3.0” on arXiv. This blog post summarizes our findings and puts them in perspective for Brave users.
Authors: Jordan Jueckstock (North Carolina State University), Peter Snyder (Brave Software), Shaown Sarker (North Carolina State University), Alexandros Kapravelos (North Carolina State University), Benjamin Livshits (Brave Software & Imperial College London)
Despite much web privacy research on sophisticated tracking techniques (e.g., fingerprinting, cache collusion, bounce tracking), most tracking on the web is still done by transmitting stored identifiers across site boundaries. “Stateful” tracking is not a bug but a misfeature of classical browser storage policies: per-site storage is shared across all visits, from both first- and third-party (i.e., embedded in other sites) contexts, enabling the most pervasive forms of online tracking.
In response, some browser vendors have implemented alternate, privacy-preserving storage policies, especially for third-party site context. However, such changes risk breaking websites that presume the traditional model of non-partitioned third-party storage. Such breakage can itself harm web privacy: browsers that frustrate user expectations will be abandoned for more permissive, privacy-harming browsers, cementing rather than disrupting the status quo.
Our work improves the state of web privacy by measuring the privacy vs. compatibility trade-offs of representative third-party storage policies, with the end goal of enabling the design of browsers that are both compatible and privacy-respecting. Our contributions include web-scale measurements of page behaviors under multiple third-party storage policies representative of those deployed in several production browsers. We define metrics for measuring aggregate effects on web privacy and compatibility, including a novel system for programmatically estimating aggregate website breakage under different policies. We find that partitioning third-party storage by first party, with storage lifetimes scoped to the site session, achieves the best privacy and compatibility trade-off. We provide complete datasets and implementations for our measurements and tools.
Authors: Lorenzo Minto (Brave Software), Moritz Haller (Brave Software), Hamed Haddadi (Brave Software), Benjamin Livshits (Brave Software, Imperial College London)
Recommender systems are commonly trained on centrally collected user interaction data such as views or clicks. This practice, however, raises serious privacy concerns regarding the recommender’s collection and handling of potentially sensitive data. Several privacy-aware recommender systems have been proposed in recent literature, but comparatively little attention has been given to systems at the intersection of implicit feedback and privacy. To address this shortcoming, we propose a practical federated recommender system for implicit data under user-level local differential privacy (LDP). The privacy-utility trade-off is controlled by parameters ε and k, regulating the per-update privacy budget and the number of ε-LDP gradient updates sent by each user, respectively. To further protect the user’s privacy, we introduce a proxy network to reduce the fingerprinting surface by anonymizing and shuffling the reports before forwarding them to the recommender. We empirically demonstrate the effectiveness of our framework on the MovieLens dataset, achieving a Hit Ratio at 10 (HR@10) of up to 0.68 on 50,000 users with 5,000 items. Even on the full dataset, we show that it is possible to achieve reasonable utility with HR@10 > 0.5 without compromising user privacy.
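As intuition for how the ε and k parameters interact, here is a minimal sketch (not the paper's exact protocol; all function names are ours) of per-coordinate ε-LDP randomized response over sign-quantized gradient updates, where a user sending k reports consumes a total budget of k·ε by composition:

```python
import math
import random

def ldp_sign(value: float, epsilon: float) -> int:
    """Privatize the sign of one gradient coordinate with epsilon-LDP
    randomized response: report the true sign with probability
    e^eps / (e^eps + 1), otherwise flip it."""
    p_truth = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    true_sign = 1 if value >= 0 else -1
    return true_sign if random.random() < p_truth else -true_sign

def privatize_updates(gradient: list, epsilon: float, k: int) -> list:
    """Each user sends k independently privatized (sign-quantized)
    gradient reports; by composition the total budget is k * epsilon."""
    return [[ldp_sign(g, epsilon) for g in gradient] for _ in range(k)]
```

In the paper's design these noisy reports would additionally pass through the anonymizing proxy network before reaching the recommender.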
Authors: Michael Smith (University of San Diego), Peter Snyder (Brave Software), Benjamin Livshits (Brave Software), Deian Stefan (University of San Diego)
We designed SugarCoat to generate resource replacements compatible with existing content blocking tools, including uBlock Origin and the Brave Browser, and evaluate our implementation by automatically replacing scripts exempted by the 6,000+ exception rules in the popular EasyList, EasyPrivacy, and uBlock Origin filter lists. Crawling a sample of pages from the Alexa 10k, we find that SugarCoat preserves the functionality of existing pages—our replacements result in Web-compatibility properties similar to exempting scripts—while providing privacy properties most similar to blocking those scripts.
SugarCoat is intended for real-world practical deployment, to protect Web users from privacy harms current tools are unable to protect against. Our design choices emphasize compatibility with existing tools, policy flexibility, and extensibility. SugarCoat is open source and is being integrated into Brave’s content blocking tools.
Authors: Panagiotis Papadopoulos, Inigo Querejeta Azurmendi, Jiexin Zhang, Matteo Varvello, Antonio Nappa, Benjamin Livshits
CAPTCHA systems have been widely deployed to identify and block fraudulent bot traffic. However, current solutions, such as Google’s reCAPTCHA, often either (i) require additional user actions (e.g., users solving mathematical or image-based puzzles), or (ii) need to send the attestation data back to the server (e.g., user behavioral data, device fingerprints, etc.), thus raising significant privacy concerns.
To address both of the above, in this paper we present ZKSENSE: the first zero-knowledge-proof-based bot detection system, specifically designed for mobile devices. Our approach is completely transparent to the users and does not reveal any sensitive sensor data to the service provider. To achieve this, ZKSENSE studies the mobile device’s motion sensor outputs during user actions and assesses their humanness locally, using an ML-based classifier trained on sensor data from public sources and data collected from a small set of volunteers.
We implement a proof of concept of our system as an Android service to demonstrate its feasibility and effectiveness. In our evaluation we show that ZKSENSE detects bots without degrading the end-user experience or jeopardizing their privacy, with 91% accuracy across a variety of bot scenarios, including: (i) when the device is resting (e.g., on a table), (ii) when there is artificial movement from the device’s vibration, and (iii) when the device is docked on a swinging cradle.
Authors: Quan Chen (North Carolina State University), Pete Snyder (Brave Software), Ben Livshits (Brave Software), Alexandros Kapravelos (North Carolina State University)
We share the implementation of our signature-generation system, the dataset from applying it to the Alexa 100K, and 586 AdBlock Plus-compatible filter list rules that block instances of currently blocked code being moved to new URLs.
Authors: Jordan Jueckstock (North Carolina State University), Shaown Sarker (North Carolina State University), Peter Snyder (Brave Software), Aidan Beggs (North Carolina State University), Panagiotis Papadopoulos (Telefonica Research), Matteo Varvello (Nokia Bell Labs), Ben Livshits (Brave Software, Imperial College London), Alexandros Kapravelos (North Carolina State University)
Accurate web measurement is critical for understanding and improving security and privacy online. Implicit in these measurements is the assumption that automated crawls generalize to the experiences of typical web users, despite significant anecdotal evidence to the contrary: the web appears to behave differently when approached from well-known measurement endpoints or with well-known measurement and automation frameworks, for reasons ranging from DDoS mitigation and bot detection to the hiding of malicious behavior.
This work improves the state of web privacy and security by investigating how, and in what ways, privacy and security measurements change when using typical web measurement tools, compared to measurement configurations intentionally designed to match “real” web users. We build a web measurement framework encompassing network endpoints and browser configurations ranging from off-the-shelf defaults commonly used in research studies to configurations more representative of typical web users, and we note the effect of realism factors on security and privacy relevant measurements when applied to the Tranco top 25k web domains.
Authors: Neil Agarwal (UCLA), Matteo Varvello (Nokia, Bell Labs), Andrius Aucinas (Brave Software), Fabián Bustamante (Northwestern University), Ravi Netravali (UCLA)
The last three decades have seen much evolution in web and network protocols: amongst them, a transition from HTTP/1.1 to HTTP/2 and a shift from loss-based to delay-based TCP congestion control algorithms. This paper argues that these two trends are at odds with one another, ultimately hurting web performance. Using a controlled synthetic study, we show how delay-based congestion control protocols (e.g., BBR and CUBIC + Hybrid Slow Start) result in the underestimation of the available congestion window in mobile networks, and how that dramatically hampers the effectiveness of HTTP/2. To quantify the impact of this finding on the current web, we evolve the web performance toolbox in two ways. First, we develop Igor, a client-side TCP congestion control detection tool that can differentiate between loss-based and delay-based algorithms by focusing on their behavior during slow start. Second, we develop a Chromium patch which allows fine-grained control of the HTTP version to be used per domain. Using these new web performance tools, we analyze over 300 real websites and find that 67% of sites relying solely on delay-based congestion control algorithms have better performance with HTTP/1.1.
Authors: Jordan Jueckstock (North Carolina State University), Peter Snyder (Brave Software), Shaown Sarker (North Carolina State University), Alexandros Kapravelos (North Carolina State University), Benjamin Livshits (Brave Software)
While much current web privacy research focuses on browser fingerprinting, the boring fact is that the majority of current third-party web tracking is conducted using traditional, persistent-state identifiers. One possible explanation for the privacy community’s focus on fingerprinting is that to date browsers have faced a lose-lose dilemma when dealing with third-party stateful identifiers: block state in third-party frames and break a significant number of webpages, or allow state in third-party frames and enable pervasive tracking. The alternative, middle-ground solutions that have been deployed all trade privacy for compatibility, rely on manually curated lists, or depend on the user to manage state and state-access themselves.
This work furthers privacy on the web by presenting a novel system for managing the lifetime of third-party storage, “page-length storage”. We compare page-length storage to existing approaches for managing third-party state and find that page-length storage has the privacy protections of the most restrictive current option (i.e., blocking third-party storage) but web-compatibility properties mostly similar to the least restrictive option (i.e., allowing all third-party storage). This work further compares page-length storage to an alternative third-party storage partitioning scheme inspired by elements of Safari’s tracking protections and finds that page-length storage provides superior privacy protections with comparable web-compatibility.
We provide a dataset of the privacy and compatibility behaviors observed when applying the compared third-party storage strategies on a crawl of the Tranco 1k and the quantitative metrics used to demonstrate that page-length storage matches or surpasses existing approaches. Finally, we provide an open-source implementation of our page-length storage approach, implemented as patches against Chromium.
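As a toy illustration of the idea (the actual implementation ships as Chromium patches, not Python; class and method names are ours), page-length storage behaves like a key-value map partitioned per embedding page and discarded when that page is unloaded:

```python
class PageLengthStorage:
    """Model of 'page-length' third-party storage: each third-party
    frame gets storage partitioned by the top-level page that embeds
    it, and the partition dies with that page."""

    def __init__(self):
        # (page_id, frame_origin) -> {key: value}
        self._partitions = {}

    def set(self, page_id, frame_origin, key, value):
        self._partitions.setdefault((page_id, frame_origin), {})[key] = value

    def get(self, page_id, frame_origin, key):
        return self._partitions.get((page_id, frame_origin), {}).get(key)

    def page_unloaded(self, page_id):
        # Lifetime ends with the page: drop every partition tied to it,
        # so an embedded tracker cannot re-read its identifier later.
        for part in [p for p in self._partitions if p[0] == page_id]:
            del self._partitions[part]
```

The privacy property follows directly: a tracker embedded on two different pages (or on the same page across two visits) sees two unrelated storage partitions.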
Authors: Zain ul Abi Din, Panagiotis Tigas, Samuel T. King, and Benjamin Livshits
Online advertising has been a long-standing concern for user privacy and overall web experience. Several techniques have been proposed to block ads, mostly based on filter lists and manually written rules. While a typical ad blocker relies on manually curated block lists, these inevitably get out of date, compromising the ultimate utility of this ad-blocking approach.
In this paper we present PERCIVAL, a browser-embedded, lightweight, deep learning-powered ad blocker. PERCIVAL embeds itself within the browser’s image rendering pipeline, which makes it possible to intercept every image obtained during page execution and to perform blocking based on applying machine learning for image classification to flag potential ads.
Our implementation inside both the Chromium and Brave browsers shows only a minor rendering performance overhead of 4.55%, demonstrating the feasibility of deploying traditionally heavy models (i.e., deep neural networks) inside the critical path of a browser’s rendering engine. We show that our image-based ad blocker can replicate EasyList rules with an accuracy of 96.76%. To show the versatility of PERCIVAL’s approach, we present case studies demonstrating that PERCIVAL (1) does surprisingly well on ads in languages other than English, and (2) performs well on blocking first-party Facebook ads, which have presented issues for other ad blockers. PERCIVAL proves that image-based perceptual ad blocking is an attractive complement to today’s dominant approach of block lists.
Authors: Peter Snyder, Antoine Vastel, Benjamin Livshits
Ad and tracking blocking extensions are among the most popular browser extensions. These extensions typically rely on filter lists to decide whether a URL is associated with tracking or advertising, and millions of web users rely on these lists to protect their privacy and improve their browsing experience. Despite their importance, the growth and health of these filter lists are poorly understood. The lists are maintained by a small number of contributors who use a variety of undocumented heuristics to determine what rules should be included. They quickly accumulate rules over time, and rules are rarely removed. As a result, users' browsing experiences are degraded as the number of stale, dead, or otherwise not useful rules increasingly dwarfs the number of useful rules, with no attenuating benefit. This paper improves the understanding of crowdsourced filter lists by studying EasyList, the most popular filter list. We find that, over its nine-year history, EasyList has grown from several hundred rules to well over 60,000. We then apply EasyList to a sample of 10,000 websites and find that 90.16% of the resource-blocking rules in EasyList provide no benefit to users in common browsing scenarios. Based on these results, we provide a taxonomy of the ways advertisers evade EasyList rules. Finally, we propose optimizations for popular ad-blocking tools that provide over 99% of the coverage of existing tools while being 62.5% faster.
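The core measurement here (tallying which rules actually fire during a crawl) can be sketched as follows. Note that this handles only a tiny, illustrative subset of Adblock filter syntax ("||" domain anchor, "*" wildcard, plain substring), not EasyList's full grammar, and the function names are ours:

```python
import re

def rule_matches(rule: str, url: str) -> bool:
    """Match a small subset of Adblock filter syntax: '||' anchors the
    rule to a domain (including subdomains), '*' is a wildcard, and
    plain rules are substring matches."""
    if rule.startswith("||"):
        pattern = r"^https?://([^/]*\.)?" + re.escape(rule[2:]).replace(r"\*", ".*")
        return re.match(pattern, url) is not None
    pattern = re.escape(rule).replace(r"\*", ".*")
    return re.search(pattern, url) is not None

def count_rule_hits(rules, request_urls):
    """Tally how many crawled requests each rule would block; rules
    with zero hits over a large crawl are candidates for removal."""
    return {r: sum(rule_matches(r, u) for u in request_urls) for r in rules}
```

Applied to the request logs of a large crawl, the zero-hit bucket is exactly the "stale, dead or otherwise not useful" population the paper quantifies.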
Authors: Marc Anthony Warrior (Northwestern University), Yunming Xiao (Northwestern University), Matteo Varvello (Brave Software), Aleksandar Kuzmanovic (Northwestern University)
Free and open source media centers are currently experiencing a boom in popularity for the convenience and flexibility they offer users seeking to remotely consume digital content. This newfound fame is matched by increasing notoriety, given their potential to serve as hubs for illegal content, and a presumably ever-increasing network footprint. It is fair to say that a complex ecosystem has developed around Kodi, composed of millions of users, thousands of “add-ons” (Kodi extensions from third-party developers), and content providers. Motivated by these observations, this paper conducts the first analysis of the Kodi ecosystem. Our approach is to build “crawling” software around Kodi which can automatically install an addon, explore its menu, and locate (video) content. This is challenging for many reasons. First, Kodi largely relies on visual information and user input, which intrinsically complicates automation. Second, no central aggregators for Kodi addons exist. Third, the potential sheer size of this ecosystem requires a highly scalable crawling solution. We address these challenges with de-Kodi, a full-fledged crawling system capable of discovering and crawling large cross-sections of Kodi’s decentralized ecosystem. With de-Kodi, we discovered and tested over 9,000 distinct Kodi addons. Our results demonstrate de-Kodi, which we make available to the general public, to be an essential asset in studying one of the largest multimedia platforms in the world. Our work further serves as the first transparent and repeatable analysis of the Kodi ecosystem at large.
Authors: Alexander Sjosten, Peter Snyder, Antonio Pastor, Panagiotis Papadopoulos, Benjamin Livshits
Filter lists play a large and growing role in protecting and assisting web users. The vast majority of popular filter lists are crowd-sourced: a large number of people manually label undesirable web resources (e.g., ads, trackers, paywall libraries) so that they can be blocked by browsers and extensions.
Because only a small percentage of web users participate in the generation of filter lists, a crowd-sourcing strategy works well for blocking either uncommon resources that appear on “popular” websites, or resources that appear on a large number of “unpopular” websites. A crowd-sourcing strategy performs poorly, however, for parts of the web with small “crowds”, such as regions of the web serving languages with (relatively) few speakers.
This work addresses this problem through the combination of two novel techniques: (i) deep browser instrumentation that allows for the accurate generation of request chains, in a way that is robust in situations that confuse existing measurement techniques, and (ii) an ad classifier that uniquely combines perceptual and page-context features to remain accurate across multiple languages.
We apply our unique two-step filter list generation pipeline to three regions of the web that currently have poorly maintained filter lists: Sri Lanka, Hungary, and Albania. We generate new filter lists that complement existing filter lists. Our complementary lists block an additional 2,270 ad and ad-related resources (1,901 unique) when applied to 6,475 pages targeting these three regions.
We hope that this work can be part of an increased effort at ensuring that the security, privacy, and performance benefits of web resource blocking can be shared with all users, and not only those in dominant linguistic or economic regions.
Authors: Mohammad Malekzadeh, Dimitrios Athanasakis, Hamed Haddadi, Ben Livshits
Contextual bandit algorithms (CBAs) often rely on personal data to provide recommendations, meaning that potentially sensitive data from past interactions are used to personalize for end-users. Using a local agent on the user’s device protects the user’s privacy by keeping the data local; however, the agent takes longer to produce useful recommendations, as it does not leverage feedback from other users. This paper proposes a technique we call Privacy-Preserving Bandits (P2B), a system that updates local agents by collecting feedback from other agents in a differentially private manner. Comparisons of our proposed approach with a non-private as well as a fully private (local) system show competitive performance on both synthetic benchmarks and real-world data. Specifically, we observed a decrease of 2.6% and 3.6% in multi-label classification accuracy, and a CTR increase of 0.0025 in online advertising, for a privacy budget ε ≈ 0.693. These results suggest P2B is an effective approach to problems arising in on-device privacy-preserving personalization.
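For intuition about the quoted budget: ε ≈ 0.693 is ln 2, so in binary randomized response each report is only twice as likely to be honest as flipped, yet an aggregator can still recover unbiased population statistics. A hedged sketch (not P2B's exact mechanism; names are ours):

```python
import math
import random

def randomized_response(bit: int, epsilon: float) -> int:
    """Report the true bit with probability e^eps / (e^eps + 1);
    for eps = ln(2) that probability is exactly 2/3."""
    p = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    return bit if random.random() < p else 1 - bit

def debiased_mean(reports, epsilon: float) -> float:
    """Unbiased estimate of the true fraction f of 1s: with truth
    probability p, E[mean] = p*f + (1-p)*(1-f), hence
    f_hat = (mean - (1 - p)) / (2p - 1)."""
    p = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    mean = sum(reports) / len(reports)
    return (mean - (1 - p)) / (2 * p - 1)
```

Individual reports are nearly deniable, but averaged over many users the debiased estimate converges to the true rate, which is the trade-off the paper's ε and k parameters tune.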
Authors: Panagiotis Papadopoulos, Peter Snyder, Benjamin Livshits
Funding the production and distribution of quality online content is an open problem for content producers. Selling subscriptions to content, once considered passé, has been growing in popularity recently. Decreasing revenues from digital advertising, along with increasing ad fraud, have driven publishers to “lock” their content behind paywalls, thus denying access to non-subscribed users. How much do we know about the technology that may obliterate what we know as the free web? What is its prevalence? How does it work? Is it better than ads when it comes to user privacy? How well is the premium content of publishers protected? In this study, we aim to address all of the above by building a paywall detection mechanism and performing the first full-scale analysis of real-world paywall systems. Our results show that the prevalence of paywalls reaches 4.2% among the top sites in Great Britain, 4.1% in Australia, 3.6% in France, and 7.6% globally. We find that paywall use is especially pronounced among news sites: 33.4% of sites in the Alexa 1k ranking for global news sites have adopted paywalls. Further, we see a remarkable 25% of paywalled sites outsourcing their paywall functionality (including user tracking and access control enforcement) to third parties. Putting aside the significant privacy concerns, these paywall deployments can be easily circumvented and are thus mostly unable to protect publisher content.
Authors: Ruba Abu-Salma, Benjamin Livshits
Nowadays, all major web browsers have a private browsing mode. However, the mode’s benefits and limitations are not well understood. Through survey studies, prior work has found that most users are either unaware of private browsing or do not use it. Further, those who do use private browsing generally have misconceptions about what protection it provides.
However, prior work has not investigated why users misunderstand the benefits and limitations of private browsing. In this work, we do so by designing and conducting a two-part user study with 20 demographically-diverse participants: (1) a qualitative, interview-based study to explore users' mental models of private browsing and its security goals; (2) a participatory design study to investigate whether existing browser disclosures, the in-browser explanations of private browsing mode, communicate the security goals of private browsing to users. We asked our participants to critique the browser disclosures of three web browsers: Brave, Firefox, and Google Chrome, and then design new ones.
We find that most participants had incorrect mental models of private browsing, influencing their understanding and usage of private browsing mode. Further, we find that existing browser disclosures are not only vague, but also misleading. None of the three studied browser disclosures communicates or explains the primary security goal of private browsing. Drawing from the results of our user study, we distill a set of design recommendations that we encourage browser designers to implement and test, in order to design more effective browser disclosures.
Authors: Umar Iqbal (The University of Iowa), Peter Snyder (Brave Software), Shitong Zhu (University of California Riverside), Benjamin Livshits (Brave Software and Imperial College London), Zhiyun Qian (University of California Riverside), Zubair Shafiq (The University of Iowa)
Authors: Matteo Varvello (Brave Software), Kleomenis Katevas (Imperial College London), Mihai Plesa (Brave Software), Hamed Haddadi (Brave Software and Imperial College London), Ben Livshits (Brave Software and Imperial College London)
Recent advances in cloud computing have simplified the way that both software development and testing are performed. Unfortunately, this is not true for battery testing for which state of the art test-beds simply consist of one phone attached to a power meter. These test-beds have limited resources, access, and are overall hard to maintain; for these reasons, they often sit idle with no experiment to run. In this paper, we propose to share existing battery testing setups and build BatteryLab, a distributed platform for battery measurements. Our vision is to transform independent battery testing setups into vantage points of a planetary-scale measurement platform offering heterogeneous devices and testing conditions. In the paper, we design and deploy a combination of hardware and software solutions to enable BatteryLab’s vision. We then evaluate BatteryLab’s accuracy of battery reporting, along with some system benchmarking. We also demonstrate how BatteryLab can be used by researchers to investigate a simple research question.
Authors: Mohammad Ghasemisharif, Peter Snyder, Andrius Aucinas, Benjamin Livshits
In this work, we consider whether reader mode can be widened to also provide performance and privacy improvements. Instead of using it as a post-render feature to clean up the clutter on a page, we propose SpeedReader, an alternative multistep pipeline that is part of the rendering pipeline itself. Once the tool decides, during the initial phase of a page load, that a page is suitable for reader mode, it applies document tree translation directly, before the page is rendered.
Based on our measurements, we believe that SpeedReader can be continuously enabled to drastically improve the end-user experience, especially on slower mobile connections. Combined with our approach to predicting which pages should be rendered in reader mode, which is 91% accurate, it achieves average speedups and bandwidth reductions of up to 27x and 84x, respectively. We further find that our novel reader mode approach brings significant privacy improvements: on transformed pages it effectively removes all commonly recognized trackers, issuing 115 fewer requests to third parties and interacting with 64 fewer trackers on average.
Authors: Jordan Jueckstock, Shaown Sarker, Peter Snyder, Panagiotis Papadopoulos, Matteo Varvello, Benjamin Livshits, Alexandros Kapravelos
May 21, 2019
In this paper, we design and deploy a synchronized multi-vantage point web measurement study to explore the comparability of web measurements across vantage points (VPs). We describe in reproducible detail the system with which we performed synchronized crawls on the Alexa top 5K domains from four distinct network VPs: research university, cloud datacenter, residential network, and Tor gateway proxy. Apart from the expected poor results from Tor, we observed no shocking disparities across VPs, but we did find significant impact from the residential VP’s reliability and performance disadvantages. We also found subtle but distinct indicators that some third-party content consistently avoided crawls from our cloud VP. In summary, we infer that cloud VPs do fail to observe some content of interest to security and privacy researchers, who should consider augmenting cloud VPs with alternate VPs for cross-validation. Our results also imply that the added visibility provided by residential VPs over university VPs is marginal compared to the infrastructure complexity and network fragility they introduce.
Apr 27, 2020
New data from Brave reveals that European governments have not equipped their national authorities to enforce the GDPR. Brave has called on the European Commission to launch an infringement procedure against 27 European governments.
Feb 4, 2020
Brave has uncovered widespread surveillance of UK citizens by private companies embedded on UK council websites. “Surveillance on UK council websites”, a new report from Brave, reveals the extent of private companies’ surveillance of UK citizens when they seek help for addiction, disability, and poverty from their local government authorities.
Oct 21, 2019
We have written before on Brave’s performance, energy and bandwidth benefits for the user. Brave Shields is our primary mechanism for protecting user privacy, but many users know by now that ad and tracker blocking (or just ad blocking for short) makes the web faster and generally better for them. So far Brave’s estimates of the users’ time saved have been very conservative and somewhat naive: we take the total number of ads and trackers blocked, and multiply that by 50 milliseconds.
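That back-of-the-envelope calculation is simple to reproduce (the function name is ours; the 50 ms-per-blocked-request figure is the conservative one described in this post):

```python
def estimated_time_saved_seconds(items_blocked: int,
                                 ms_per_item: float = 50.0) -> float:
    """Brave's conservative estimate: each blocked ad or tracker
    request saves roughly 50 milliseconds of page load time."""
    return items_blocked * ms_per_item / 1000.0

# e.g., a million blocked requests -> 50,000 seconds, about 13.9 hours
print(estimated_time_saved_seconds(1_000_000))
```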
Feb 27, 2019
Brave mobile users can expect up to two and a half extra hours of browsing per battery charge.
Feb 15, 2019
In this post we demonstrate that Brave’s privacy benefits from ad-blocking go hand-in-hand with performance improvements. Specifically, a well-implemented ad blocker can deliver 33% to 66% memory savings or 500 MB to as much as 1.9 GB across just 10 pages open in a single session.
Dec 4, 2018
Investigation needed to stop anticompetitive practices that hurt publishers, restrict innovation, and limit consumer choice.
Dec 3, 2018
Brave and a coalition of more than 30 businesses and organizations urges European Governments to break the deadlock on the ePrivacy Regulation in an open letter.
Nov 20, 2018
French regulator's decision against Vectaury confirms that IAB “Transparency & Consent Framework” does not obtain valid consent, and illustrates how even tiny adtech companies can unlawfully gather millions of people’s personal data from the online advertising “real time bidding system” (RTB).
Nov 6, 2018
Brave presents the case for a US federal privacy law that builds on the GDPR, protecting innovation, interoperability, and supporting US leadership.
Oct 10, 2018
Brave's submission to Margrethe Vestager, the EU Anti-Trust Chief.
Oct 1, 2018
Brendan Eich, the CEO of Brave, has written to the US Senate Committee on Commerce, Science, and Transportation, to present the case for GDPR-like data protection standards in the United States.