
Episode 47

New Use Cases For AI and the Brave Search API

Jan Piotrowski, VP and Head of Brave Search Business, discusses the importance of independent search engines and the evolution of Brave Search. Jan shares insights on scaling Brave Search to over 30 million daily queries, emphasizing the need for high-quality search results and the potential for future growth in the AI space.

Transcript

[00:00:00] Luke: From privacy concerns to limitless potential, AI is rapidly impacting our evolving society. In this new season of the Brave Technologist podcast, we’re demystifying artificial intelligence, challenging the status quo, and empowering everyday people to embrace the digital revolution. I’m your host, Luke Mulks, VP of Business Operations at Brave Software,

[00:00:21] makers of the privacy-respecting Brave browser and search engine, now powering AI with the Brave Search API. You’re listening to a new episode of The Brave Technologist, and this one features Jan Piotrowski, who is a VP at Brave Software and leads business efforts for Brave Search, the largest independent search engine outside of Google and Bing.

[00:00:39] Brave Search sees over 30 million queries per day and provides powerful, real-time data to various AI and LLM partners via the Brave Search API. Previously, Jan was an investor at American Express Ventures and a technology investment banker at Credit Suisse. He has a BS in industrial engineering from Stanford and an MBA from Chicago Booth.

[00:00:59] In this episode, we discuss the evolution of Brave Search, building the business of the Brave Search API, new use cases for AI and for the Brave Search API, and the outlook for how businesses and AI will use things like the Brave Search API and incorporate them into their businesses and business models.

[00:01:18] And now for this week’s episode of The Brave Technologist. Jan, welcome to The Brave Technologist. How are you doing today?

Jan: Good. How are you doing, Luke? Thank you.

Luke: Good, good, good. Been excited for this one. Can you tell us a little bit about how you got to Brave, what you’re doing at Brave, and why it’s important?

[00:01:38] Jan: Yeah, sure. I used to work in venture capital. I met the Brave team in the hopes of investing in the company, and as it turned out (this was after the ICO), I didn’t have the opportunity to invest; instead, I had the opportunity to join the company. So I joined the company on the business team.

[00:01:55] As you know, Luke, since then I’ve done a bunch of roles. I remember when I first joined, someone asked, do you know how to use Excel? And I was like, yeah, quite a bunch. I used to work in finance and VC. So I spent my time helping create models for the company and working on partnerships alongside you; we did stuff in marketing, finance, all over the place.

[00:02:19] Eventually, as the company grew and Brave hired people smarter than me to take on the various roles, one place where I had an interest, and where the company was really focusing, was the search side. A lot of people know us for our flagship product, the Brave browser. But what we found, and I always thought it was funny, is that a lot of people confuse a browser and a search engine.

[00:02:41] If you ask your mom how she searches the web or how she browses the web, a lot of times a browser and a search engine are interchangeable. So I think Brave always had ambitions to build a search engine; we just didn’t have the right opportunity. That opportunity came around a few years ago, when we acquired a company out of Burda in Germany, a team that had been working on private search and, importantly, on independent private search.

[00:03:11] For those who know Brave, we’ve always wanted to be independent of big tech and really run our own destiny. So it was important that this group at Burda had been developing an independent search index to build an independent search engine. The mission alignment was there. We acquired the company, they joined us about three and a half years ago now, and I started working on a lot of things on the search side. Now I help run a lot of the business side of Brave Search.

[00:03:41] These days, that can mean a lot of different things. We want to make sure that Brave Search is one of the best consumer-facing search engines in the world. And in AI, there’s a really interesting and intriguing data play, given our Brave Search API. So a lot of my time is spent on that side as well.

[00:04:04] Luke: Just to back it up a little bit: did you know much about search before you came to Brave? How much of this was a crash course versus stuff you were already familiar with?

[00:04:15] Jan: No, I didn’t know much about it at all. Like most people, I was a user of search engines, and

[00:04:22] I was familiar with the advertising models on search. I think, like a lot of people, I took search for granted. It’s so easy for us now, and it always has been: you search for something, you see the results you want, and voila, it’s done. You don’t realize the complexity behind all the processes that happen after you type that query.

[00:04:43] And so no, I didn’t know a lot about search. And it’s been, like you said, a crash course, but a fascinating one.

[00:04:50] Luke: Yeah. You mentioned independence earlier, right? Maybe we can give the audience a little more context about that. What does it mean to have an independent search engine? How many independent search engines are there in the market, and why is that important?

[00:05:05] Jan: Yeah. The dirty secret in search is that most search engines other than Google and Bing are really just skins on those search results, because it’s really expensive and time-consuming to build and maintain a search engine.

[00:05:22] And even when you do, Google, for example, has a 20-year head start. So the easier way, I should say, for most people to build a search engine is to rely on APIs, and the predominant API out there is, for example, the Bing API. So a lot of the search engines that we might be familiar with, for example Yahoo or DuckDuckGo, rely

[00:05:46] in part, or a lot, on third-party APIs such as the Bing Search API to serve search results. That’s something I don’t think we ever wanted at Brave. From the beginning, being user-first and very focused on privacy, we’ve always made a stand to say, hey, we’re going to do things a little differently than big tech.

[00:06:05] And we wanted that same philosophy to apply to search. When you look at the search landscape and see that a lot of search engines rely on third-party APIs, I think that’s a route we didn’t want to take. So again, when we saw this company in Germany that had been working for many years on building an independent search index that didn’t rely on third-party APIs,

[00:06:26] that was extremely attractive for us. Being an independent search index is important in many ways. For one, from a philosophical standpoint, it’s just good to know that your destiny is in your own hands when you’re serving search results. You’re not relying on someone else, you’re not subject to censorship that may be out of your hands, and you’re not subject to the whims of whoever that provider is, right?

[00:06:54] Simply put, technically speaking as well, there’s an advantage in that you aren’t beholden to things that are affecting that provider. If there’s an outage, you go down with it. We’ve seen this already: there was an outage on Bing a few weeks ago, and a lot of the providers using the Bing Search API also went down because they were relying on that API.

[00:07:16] So again, that’s another reason why we relish having our own independent index and not relying on others. The last point I’d make is that that data, those search results, are our own, and that’s particularly interesting because it allows us to have an API business where we’re not reselling other people’s search results.

[00:07:37] We are selling our own data, and that’s become really interesting in ways I think we didn’t anticipate. You were asking about search and how I started in search, and I went over a little bit of the evolution of search at Brave. There were things that we knew would be interesting about search from a product perspective, but there were also many things that we didn’t anticipate about search and its importance.

[00:07:59] And a lot of that we’re able to take advantage of now because we have an independent index.

[00:08:04] Luke: It’s a great point, too. I’d love to get into that area a little bit more as well. You mentioned we have this search API business, obviously after going through that whole acquisition process.

[00:08:14] It’s not a small thing to add a search engine to a browser, right? Or to Brave as a company. And I’m sure that in that process, there were discussions around, okay, we’re putting this investment in, we’re already a startup, this is a huge risk, so how do we plan to monetize this? Was having an API business line something you guys were even thinking about, or how were you thinking about monetizing then?

[00:08:39] And how has that evolved to where we’re at now?

[00:08:43] Jan: It’s a great question. I wish I could say that I knew at the time what we’d be doing. When you think of a search engine, the monetization that everyone knows is ads, search ads, which have proven extremely effective and are a great business if done well.

[00:09:01] So when we were taking on the search engine, exactly like you’re saying, how do we monetize that? That has to be a question you think about when you’re taking on something like this. We, of course, thought about search ads. I think we also knew that we would have an API business when we launched the search engine about three years ago.

[00:09:20] We always had ambitions to launch an API. In fact, in the announcement blog, we wrote that we would eventually open up an API to power other experiences, because we don’t believe in walled gardens. So doing an API was always part of the plan. What’s been surprising to us is the use cases that are coming out of the API. Traditionally, we were hoping to power other search experiences. You asked earlier about other search engines out there, and I didn’t really answer your question.

[00:09:51] Brave is the third independent search engine at scale, after Google and Bing. There are certainly other search engines working on search, but I’d say they’re at a smaller scale. And we thought we’d be working with a lot of those search engines via the API to power traditional search.

[00:10:11] What we didn’t foresee was this whole boom in AI, where a lot of new search experiences coming out would need data relevant for search in order to power their experiences. So the API, even though it was something we thought of from the start, has taken on a whole new meaning for us.

[00:10:30] Luke: Yeah, let’s dive into that a little bit more. The AI boom has been huge. How do these AI companies use our search API, and how is it helping with their strategy?

[00:10:42] Jan: Yeah, we’re seeing so many different use cases from so many different types of customers using our API. I guess I’ll start even outside of

[00:10:51] quote-unquote AI: you have traditional search engines who are using our API to serve search results. But then, as you get into other use cases, a lot of them including AI, you have companies who are training foundational models, looking at search results relevant to certain queries and training models related to that.

[00:11:10] Then you also have a lot of companies who are answering questions in real time, at inference. How do they answer questions with low latency, ideally with citations, so that the user is comfortable that the answer is real? So there are a lot of companies using us in that sense. Other use cases that have surprised us are around business intelligence.

[00:11:31] You have companies using us who want the latest news on a different company or an industry, right? Maybe I’m on a sales team that’s reaching out to a prospect, and I want to make sure that I know about their latest acquisition or their latest fundraise so I can tailor my outreach accordingly. You have companies out there who are interested in different locations and places of interest on the web, the different attributes related to those places, and how to tailor an e-commerce experience or a travel experience around them.

[00:12:07] So I think a little bit of it is that we’re learning as we go. Every time we have a call with a new API customer, it’s like, oh, that’s a new use case for us, or that’s a really interesting way of using our data. The API business launched a little over a year ago, at the very end of May 2023.

[00:12:26] So we sometimes forget how early we are in this. It feels like we’ve been reading about gen AI and playing with all of the different offerings for so long now, and there are really many impressive options out there, but in the grand scheme of things, it’s only been a year or two that consumers have had the opportunity to play with it.

[00:12:51] So we, as an API provider, are also still learning as we go and learning from companies that are delving into different use cases.

[00:13:00] Luke: That’s fascinating. And I think there’s a revenue-generating part of this, right? Obviously you’re getting all these different types of use cases, a lot of them unanticipated.

[00:13:09] What are some of the challenges you face on the business side as, on one hand, Brave is scaling the users of its search engine while, on the other, the search API is scaling customers and different types of use cases? Are you having to balance things like cost or infrastructure in the model?

[00:13:30] How challenging has that been?

[00:13:32] Jan: That’s a good question. One thing that keeps things pretty linked is that Brave Search itself is the biggest consumer of our own API. When we think about the development and the evolution of the API, it’s tightly linked to the evolution of Brave Search. And so there are the challenges that come with scaling Brave Search, a big one being, for example, quality.

[00:13:58] Quality is probably key to any search experience. If you lose confidence in a search engine and the results you see, you probably aren’t that excited to go back and try it again, right? So we’ve been scaling. Now we’re at about 30 million queries per day on Brave Search.

[00:14:20] Luke: Wow.

Jan: And we started at around zero three years ago. We need to make sure that the quality is always there, and I think that process also flows into the API. When we talk to customers, quality is extremely important to them as well. We often get told that even though our index is much smaller than, for example, Google’s,

[00:14:49] which is no secret (we have about 20 billion pages in our index), it’s very high quality. A lot of our customers say that. And I think that’s a direct result of the fact that we’re scaling Brave Search with a focus on quality, and that quality carries over to the API side as well. But you’re right.

[00:15:09] There are very different business challenges from having both sides of that. One is very consumer-focused, and the product team is always concerned with making sure that we have the best search experience for consumers. On the API side, there are different ways of approaching customers, and they have different concerns.

[00:15:29] And so you kind of have to balance both.

[00:15:32] Luke: Yeah, that makes sense. I remember, too, even on the privacy side, it’s kind of a double-edged sword, right? The API is giving you data that’s privacy-first, or privacy-preserving. It’s not dirty data, right? But monetizing with privacy is also a challenge in a world where people are used to incentivizing things that are not very friendly to privacy.

[00:15:55] So it seems like kind of a double-edged thing, right? You said 30 million searches a day, in about three years. How does that rank in terms of scaling compared to others? Search engines are so huge, and there are only a few of them in the market. How does Brave’s scaling compare to other search engines in the space?

[00:16:17] Jan: There’s not a ton of information out there in terms of how search engines have grown. One public data point has been DuckDuckGo, which

[00:16:26] provided user data. For us to go from zero to 20 or 30 million queries per day, that was something that took DuckDuckGo, I think, close to seven years, based on that data. Now, DuckDuckGo, of course, has been in the market much longer than we have, and they planted their alternative search engine flag a long time ago, so kudos to them.

[00:16:52] They scaled in a very different time than we did. But I think we have had the benefit of being linked to the Brave browser and having a really loyal user base that quickly saw that our privacy mission in browsing was going to carry over to private search. That allowed us to scale: whereas it took them seven years, it took us much less, we’re talking one or two years,

[00:17:20] to get to the same scale. In the overall scheme of things, like I said, there are much larger indices out there. There’s no exact public number for Google that I know of, but you have to imagine they have a couple hundred billion pages, and we have just 20 billion. But the way we built our index deliberately optimizes for quality.

[00:17:45] Every page in our search index is a page that’s been visited by an actual search user, right? So there’s this long tail of garbage sites that, in my opinion, shouldn’t be part of any search index: duplicate pages, spam pages. We don’t want those in our index. So we like to think of our 20-billion-page index as the 99 percent of the web that actually matters.

[00:18:15] And so we think it’s a large enough scale that’s exciting for a lot of customers. It’s high quality, and hopefully we continue to grow it quickly like we have so far.

[00:18:25] Luke: That’s awesome. So these AI companies could be using our search API for real-time data. What else are they using if they’re not using us?

[00:18:33] Are most of these companies using the same data sets? How much diversity is there among the data sources these companies are using?

[00:18:43] Jan: Yeah, it’s a fascinating question. Again, you’d be surprised. There are so many smart people working in AI, and yet the starting point for a lot of people is the same.

[00:18:56] You look out there and ask, what are the huge data sets that I could take advantage of to train a model or to scale whatever AI initiative I’m doing? And of course, a common one is Common Crawl, which is a snapshot of the web. If you’re not familiar with it, they provide free snapshots of the web every few months.

[00:19:17] And we’re talking huge, huge data sets. The funny thing is, researchers have pointed out that despite the breadth of the data in Common Crawl, there are a lot of things wrong quality-wise with the data underpinning it, right? Look, for example, at biases in the data. On the web itself, Wikipedia and Reddit are huge segments, and, by extension, huge parts of Common Crawl as well.

[00:19:45] And you look at the demographics of those sites, and they’re very young, they skew toward developed countries, and they’re very male. You might guess that just by thinking about the average internet user, or the average user of Reddit. Ultimately, when you want a data set that spans a whole bunch of different topics and genres and is generally applicable, you need to be careful of biases like that in the data, right?

[00:20:15] And in addition to biases in the data, there are probably also things that are lacking in the data, right? Representation from certain groups, certain religions, certain cohorts of people. So if we all agree that the starting point of any machine learning process is data, then we want that data to be good and high quality.

[00:20:41] We want it to be diverse. Common Crawl is a great starting point, but when you dig into the data underpinning Common Crawl, I think you’ll quickly want to look for other sources of data.

[00:20:55] Luke: Yeah, no, that’s fascinating. It sounds like the web looks very much the same; things tend to look the same across everything if you don’t have another set of data you’re pulling from, right?

[00:21:05] We’re using this basically from our index as-is. Looking into the future, do you see the API business branching out into other areas or use cases? Or are any customers or partners introducing new use cases that you didn’t really think of that could end up being part of the API

[00:21:23] business over time? Share whatever you can; if you can’t share anything, no sweat. I’m just kind of curious.

[00:21:29] Jan: Yeah, what’s been interesting to me is that training foundational AI models is a use case that seemed very obvious to me. And while it’s very important, it was a starting point. As you look at some of

[00:21:48] the newer companies coming to us with more verticalized approaches to how they’re using AI, the use cases become so much more fascinating, right? Think about the sales team example I mentioned before, or the news example, where you want to make sure that you have the latest news on a certain company. Or let’s say you’re in automotive and you’re bringing in research and market data on crash tests or insurance.

[00:22:22] There is so much power in bringing in data from the web to augment whatever you’re doing internally. A lot of times, Brave Search is used as the web search component of a RAG (retrieval-augmented generation) system. You look at all these companies that have troves of internal data, great private data, and you have people looking at how to synthesize that data into memos, emails, reports, and actionable data internally.

[00:22:52] A lot of that can be augmented by what’s out there on the web. So again, take the sales team example, where I’m bringing in data about other companies: I’m doing research, and I need to supplement it with web search data. That stuff is really fascinating to me, and we’re seeing it in more and more verticals, which has been very cool.

[00:23:14] Luke: Super cool. Yeah. Was there anything we didn’t cover that you want to let people know about?

[00:23:22] Jan: I am extremely fascinated by where all of this is going, and obviously you are too, doing this podcast. The evolution of AI is something I think we’re all playing with and learning as we go.

[00:23:37] And I’m really enjoying what we’re doing at Brave. We’re playing with AI; we’re big fans and users of AI. And at the same time, via the API, we’re supporting a lot of those AI processes. A lot of the processes and initiatives that other companies are pursuing are ones that we couldn’t foresee, and it’s really been surprising to me how early we are in a lot of that innovation.

[00:24:07] So as I think about how Brave can support companies going through this innovation, I’m just excited to talk to anyone out there who might have data needs, or who might be working on an AI project that could benefit from web search data. Having Brave Search support that is something I’m very interested in.

[00:24:29] And I’d love to talk to you about it.

[00:24:31] Luke: Awesome. Well, on that note, where can people find you or reach out to you if they want to connect with you, or just follow what you’re up to with the search business at Brave?

[00:24:44] Jan: Sure. Yeah. I’m on X, Jan Y. Piotrowski, and you can find me on LinkedIn. You can reach out to Brave at biz dev at brave.com,

[00:24:51] or Jan at Brave to reach me directly. We’d love to talk.

[00:24:55] Luke: Awesome, man. I really appreciate you joining, Jan. This has been a great conversation, and I’d love to have you back as things evolve and new things come out. Thanks so much for joining us today.

Jan: Awesome, Luke. Thanks for having me.

Luke: Thanks for listening to The Brave Technologist podcast.

[00:25:10] To never miss an episode, make sure you hit follow in your podcast app. If you haven’t already made the switch to the Brave browser, you can download it for free today at brave.com and start using Brave Search, which enables you to search the web privately. Brave also shields you from the ads, trackers, and other creepy stuff following you across the web.

Show Notes

In this episode of The Brave Technologist Podcast, we discuss:

  • How the acquisition of a German company enabled Brave to develop its own search index
  • Ways the API business has evolved with unexpected use cases, especially in AI
  • Why diversity in data sources is essential for training AI models effectively

Guest List

The amazing cast and crew:

  • Jan Piotrowski - VP, Head of Brave Search Business

    Jan Piotrowski is a VP at Brave Software and leads business efforts for Brave Search, the largest independent search engine outside Google and Bing. Brave Search sees more than 30 million queries per day and provides powerful real-time data to various AI and LLM partners via an API. Previously, Jan was an investor at American Express Ventures and a technology investment banker at Credit Suisse. He has a BS in Industrial Engineering from Stanford and an MBA from Chicago Booth.

About the Show

Shedding light on the opportunities and challenges of emerging tech to make it digestible, less scary, and more approachable for all!
Join us as we embark on a mission to demystify artificial intelligence, challenge the status quo, and empower everyday people to embrace the digital revolution. Whether you’re a tech enthusiast, a curious mind, or an industry professional, this podcast invites you to join the conversation and explore the future of AI together.