Blocked or Broken? Automatically Detecting When Privacy Interventions Break Websites
Michael Smith (University of California, San Diego), Peter Snyder (Brave Software), Moritz Haller (Brave Software), Benjamin Livshits (Brave Software, Imperial College London), Deian Stefan (University of California, San Diego), Hamed Haddadi (Brave Software, Imperial College London) | Privacy
A core problem in the development and maintenance of crowdsourced filter lists is that their maintainers cannot confidently predict whether (and where) a new filter list rule will break websites. The sheer size of the Web prevents filter list authors from broadly understanding the compatibility impact of a new blocking rule before shipping it to millions of users. This severely limits the benefits of filter-list-based content blocking: filter lists are both overly conservative (i.e. rules are tailored narrowly to reduce the risk of breaking things) and error-prone (i.e. blocking tools still break large numbers of sites). To scale to the size and scope of the Web, filter list authors need something better than the current status quo of user reports and manual review, to stop breakage before it reaches end users.
In this work, we design and implement the first automated system for predicting when a filter list rule breaks a website. We build a classifier, trained on a dataset generated by a combination of compatibility data extracted from the EasyList filter project and novel browser instrumentation, and find that our classifier is accurate to practical levels (AUC 0.88). Our open-source system requires no human interaction when assessing the compatibility risk of a proposed privacy intervention. We also present the 40 page behaviors that most strongly predict breakage on observed websites.
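To make the classification setup concrete, the sketch below shows one plausible shape of the task described above: page-behavior features as inputs, a broke/did-not-break label as the target, and ROC AUC as the evaluation metric. The feature semantics, model choice, and data here are illustrative assumptions; only the use of AUC as the reported metric comes from the abstract.

```python
# Hypothetical sketch of a breakage classifier of the kind described above.
# Features, labels, and the model choice are assumptions for illustration.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Each row: counts of instrumented page behaviors observed with a candidate
# filter rule applied (e.g. requests blocked, DOM nodes hidden, JS errors).
# Label 1 = the rule broke the site, 0 = it did not. Synthetic data here.
rng = np.random.default_rng(0)
X = rng.poisson(lam=3.0, size=(1000, 40)).astype(float)
y = (X[:, 0] + rng.normal(size=1000) > 4).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

clf = GradientBoostingClassifier(random_state=0)
clf.fit(X_train, y_train)

# Evaluate with ROC AUC, the metric reported in the abstract (0.88 there).
scores = clf.predict_proba(X_test)[:, 1]
print("AUC:", roc_auc_score(y_test, scores))
```

A model of this form also exposes per-feature importances, which is one way the 40 most breakage-predictive page behaviors mentioned above could be ranked.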