How We Choose and Rank Content in Brave Today
Today we announced Brave Today, a news reader integrated into the browser that can be found by scrolling down on new tab pages. We use Brave’s new private CDN to fetch RSS feeds anonymously via the browser, and the browser’s personalization capabilities to rank the headlines with a simple algorithm designed to make the experience interesting for everyone.
In a future release, users will determine what ‘interesting’ means for themselves as they add any RSS feed they can find, but we wanted to start with something that everyone could use and understand. So we found the RSS feeds from publishers leading the Comscore, Alexa and Feedly rankings across a range of content categories. We focused on sources that clearly had a number of paid contributors and were supported by advertising, subscriptions or some combination of the two. Lastly, they had to have reliable and valid RSS feeds, which was not the case for several of the top publishers.
We reduced the list to about 300 sources to control the amount of data being sent to the browser over the network. And then we devised a ranking algorithm that balanced a few different attributes that would make the content in the feed feel fresh and relevant.
The key to making it all work was in using Brave’s capabilities.
The table stakes here were that Brave Today had to be private. As we detailed in the Private CDN blog post, the content in the feed must be collected directly by the user’s browser without knowing anything about that browser and leaving no data trail available for anyone to capture or follow, including us at Brave. There must be no way to know which content has been viewed or clicked or which sources a user has enabled or disabled.
Then we looked at how to apply some initial learning capabilities to personalize content in the feed. While the content feed had to work well by default with no personalization, we needed a model that could rank content without user actions, behaviors or derived data ever leaving the device. In our first iteration, the browser looks for domain matches between the list of available feeds and the recent browsing history to rank content for the user. Subsequent iterations will use techniques similar to those applied by Brave’s User Ads system, which can learn intentions and interests that, again, are only ever accessible on the user’s local browser.
Brave users also love the browser for its speed. Of course, we would love to pack Brave Today full of content from tons of sources, but until we get feedback on performance from users and test various methods of content delivery, Brave Today must not slow things down for users. Unlimited local RSS subscriptions will come, but they will have to wait for now.
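As a rough illustration of the domain matching described above, the following Python sketch compares the base domains of feed sources with those of recently visited URLs, entirely on the local machine. The function names and the naive "last two labels" base-domain rule are assumptions made for this example, not Brave's actual implementation; a production version would likely consult the Public Suffix List to handle domains such as example.co.uk.

```python
from urllib.parse import urlparse


def base_domain(url: str) -> str:
    """Return a naive base domain: the last two labels of the hostname.

    Assumption for illustration only; this mishandles multi-part
    suffixes such as "co.uk".
    """
    host = (urlparse(url).hostname or "").lower()
    labels = host.split(".")
    return ".".join(labels[-2:]) if len(labels) >= 2 else host


def matching_feed_domains(feed_urls, history_urls):
    """Return the base domains that appear in both the feed list and the history."""
    history = {base_domain(u) for u in history_urls}
    feeds = {base_domain(u) for u in feed_urls}
    matches = feeds & history
    matches.discard("")  # ignore URLs that had no parseable hostname
    return matches
```

Nothing in this sketch sends data anywhere: both inputs and the result stay in local memory, which is the property the paragraph above calls for.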
We also focused on the look and feel of the cards and the flow of content as you scroll using several different card layouts including large cards, small cards and cards with lists of items. We wanted the visual effects to reflect the same principles as the content where variety keeps it fresh and inviting as you scroll.
We then used a combination of personalization, recency, publishing frequency, and a degree of randomization to flow content into those cards to create the experience we were looking for — fresh and relevant.
We reverse-sorted content items after scoring as follows:
- Every content item starts with 0 points. A lower score is better when reverse-sorting.
- At the server, assign log(seconds_since_published) points to each item, using the natural logarithm, so we can reward freshness.
- Locally in the browser, assign -5 points for every item whose base domain matches a domain from recently visited sites. This helps to prioritize content from your favorite publishers over others.
So, for example, an article published 24 hours ago would initially have a score of about 11.37 (there are 86400 seconds in a day, and log(86400) = 11.366742954792146). It would then earn -5 points if the domain of that article matched a domain in the user’s recent browser history, for a final score of about 6.37.
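The scoring steps above can be sketched in a few lines of Python. This is a minimal illustration, not Brave's code: the names `score_item`, `rank` and `HISTORY_MATCH_POINTS`, the dictionary keys, and the clamp to a minimum age of one second are all assumptions, and the sketch takes "a lower score is better" to mean that items are shown in ascending score order.

```python
import math

# -5 points for items whose base domain appears in recent history (from the post).
HISTORY_MATCH_POINTS = -5.0


def score_item(seconds_since_published: float, domain_in_history: bool) -> float:
    """Score one item; lower is better."""
    score = 0.0  # every item starts with 0 points
    # Freshness term: natural log of the item's age in seconds.
    # Clamping to 1 second (an assumption) avoids log(0) for brand-new items.
    score += math.log(max(seconds_since_published, 1.0))
    if domain_in_history:
        score += HISTORY_MATCH_POINTS
    return score


def rank(items):
    """Return items ordered best-first (ascending score)."""
    return sorted(
        items, key=lambda it: score_item(it["age_seconds"], it["in_history"])
    )
```

With these definitions, `score_item(86400, False)` gives roughly 11.37 and `score_item(86400, True)` gives roughly 6.37, matching the worked example.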
Then we include some local randomization in the feed sequence to add some variety and to ensure the feed isn’t overly news-y. We wanted to surface items you might not expect.
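The post does not say how the randomization is done, so the following Python sketch shows just one plausible approach: shuffling items within small consecutive windows of an already ranked list, so that the order varies locally while highly ranked items stay near the top. The function name, the window size of 4, and the technique itself are hypothetical.

```python
import random


def shuffle_within_windows(ranked, window=4, rng=None):
    """Shuffle items inside consecutive windows of a ranked list.

    Hypothetical technique for illustration: an item never moves
    outside its own window, so the overall ranking is roughly kept.
    """
    rng = rng or random.Random()
    result = []
    for start in range(0, len(ranked), window):
        chunk = list(ranked[start:start + window])
        rng.shuffle(chunk)  # local randomization only
        result.extend(chunk)
    return result
```

A larger window adds more variety at the cost of weakening the freshness and personalization signals.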
Of course, the user has to be in charge, and not everyone has the same tastes. So we made the source selection configurable, with enable/disable toggles for each source. In fact, we included a toggle for Brave Today itself so people could turn it off if they found it too distracting. In our testing, many of us unwittingly lost a good chunk of our day scrolling and reading things we discovered in our feeds, which gave us some ideas for the premium version, such as scheduled on and off times.
Finally, we wanted to weave in content from partners, affiliates, and our own Brave products in a seamless way. Some of those features are still in development, but at launch we introduced content cards interspersed in the feed from sources such as our own Brave Offers store where we provide discounts and deals on products and services from providers such as Amazon, 1Password, Coinbase and LinkedIn, among others. Soon we’ll also have promoted content from more Brave partners.
As with other content in the feed, users will be able to toggle promoted sources on and off to suit their tastes, and we’re working on a premium ad-free version of Brave Today, too.
Of course, we had an important objective in all this: to support publishers and creators by encouraging users to visit their web sites directly. Rather than proxy the reading experience on yet another platform, Brave Today links to publishers’ sites directly, so users can read the articles as they were intended via the web browser. Publishers will be able to maintain their relationships with their readers as they see fit through their own UX.
If a source that you would like to see in Brave Today is missing from the initial defaults, check for updates regularly: an upcoming release will include manual RSS feed subscriptions, which will give users more control over what they see in their feeds.