
Episode 20

Measuring The Speed of AI Through Benchmarks

David Kanter, Executive Director at MLCommons, discusses the work they’re doing with MLPerf Benchmarks, creating the world’s first industry standard approach to measuring AI speed and safety. He also shares ways they’re testing AI and LLMs for harm, to measure—and, over time, reduce—the potential harms of AI.

Transcript

[00:00:00] Luke: From privacy concerns to limitless potential, AI is rapidly impacting our evolving society. In this new season of the Brave Technologist podcast, we’re demystifying artificial intelligence, challenging the status quo, and empowering everyday people to embrace the digital revolution. I’m your host, Luke Mulks, VP of Business Operations at Brave Software, makers of the privacy-respecting Brave browser and search engine, now powering AI with the Brave Search API.

[00:00:29] You’re listening to a new episode of the Brave Technologist, and this one features David Kanter, who is a founder, board member, and the executive director of MLCommons, where he helps lead the MLPerf benchmarks and other initiatives. David has over 16 years of experience in semiconductors, computing, and machine learning.

[00:00:45] In this episode, we discuss how they’re helping to build public confidence in AI and emerging tech based on the research they’re working on; AI safety and the ways they’re using benchmarks to offer standardized measurements of the potential harms of AI use; and the types of questions enterprise customers should be asking [00:01:00] their system providers.

[00:01:01] Now for this week’s episode of the Brave Technologist. David, welcome to the Brave Technologist podcast. How are you doing today?

[00:01:11] David: I’m doing great. Thanks for having me on. It’s a pleasure to join.

[00:01:15] Luke: Yeah, I’ve been looking forward to this discussion. Can you tell us a little bit about how you ended up doing what you’re doing with MLPerf?

[00:01:21] Was there anything in your background that helped drive you to where you’re at now, or?

[00:01:24] David: Yeah, so there are actually a lot of things in my background that lend themselves well to what I’m doing, but I have a pretty unconventional background in a lot of regards. So my interest in computer architecture dates back to high school in the nineties, and I ran a website on computer architecture and microprocessor design.

[00:01:43] That’s a little bit famous: Real World Tech. I ran it for a long time. I actually started a company where we were doing hardware-software co-design, so kind of designing a chip and a compiler together, you know, around 2007 through ’11. That didn’t work out. After that, I [00:02:00] started doing a lot of consulting in the industry, and I ended up doing some patent work for this company called Cerebras Systems, and they make a giant wafer-scale chip for AI and machine learning. And then there was sort of a call to the community to participate in AI benchmarks,

[00:02:17] what would later become MLPerf. And I showed up to one of the early meetings. I actually started out as the group secretary, taking notes, asking questions, and you know, sort of the joke is, I started out as secretary a la Mad Men and ended up as secretary a la Ban Ki-moon, when we started a foundation to house the benchmarks. At that point, I’d led the MLPerf inference benchmarks,

[00:02:40] and I was working on MLPerf power measurement, and they asked me to become the executive director of the foundation. And I’d been running my own consulting company for a while. And so I had this like really great combination of deep technical understanding, but also business acumen. And I was known to most of our member organizations.

[00:02:59] [00:03:00] And so, you know, in a lot of ways, I just had this very weird background of computer architecture, technical writing, and it all really came together. I think in a lot of ways it has helped me build MLCommons, the organization, and grow it into just a fantastic team that’s supporting the MLPerf benchmarks and our other initiatives.

[00:03:21] Luke: That’s awesome. Unconventional is great, right? And I’ve done this in startups in the past where, you know, you lick your wounds a bit when something doesn’t quite work out. But I love that you were saying, you know, you kind of just started as a secretary, you were doing the new thing, and then worked your way up.

[00:03:33] Cause I mean, realistically, what better way to kind of get in front of everything that’s going on and get a full read on stuff. How big is the team at MLPerf now that’s working on this?

[00:03:43] David: MLCommons itself is a pretty lean organization, I’d say about a dozen people.

[00:03:50] But one of the superpowers of MLCommons is that we bring together industry and academia on sort of collective engineering to make machine learning better for everyone. So some of our [00:04:00] projects might have 20 or 30 people involved in them. And actually, if you look at the first MLPerf inference paper, I think we had about 50 coauthors, right?

[00:04:09] And so if you look at the span of the organization, even though we only have, call it, a dozen people, by calling on our member companies, you know, whether it’s Google or Meta or small startups like Graphcore, I’d say realistically it’s somewhere in the neighborhood of 200 people. You know, we’ve got about 18 projects within the organization and each one has two leads.

[00:04:33] And you know, some projects are small and might just be three or four people working on, you know, an open source library. And then some of them are really big efforts. So yeah, it’s a fantastic organization, but again, very big, lots of stuff going on.

[00:04:47] Luke: And a lot of like network too, like you’re getting a lot of people from, you know, I think it’s really, really excellent.

[00:04:53] What would you say the most important benefit is that you guys are bringing to the space right now for folks that might not be necessarily too [00:05:00] familiar with the needs around benchmarking and what you guys are specializing in?

[00:05:03] David: That’s a great question. So let me talk about MLPerf and then sort of the broader organization.

[00:05:08] When we started MLPerf, 2017, 2018, people knew AI was important, but there weren’t standard ways of measuring performance. And so, when you would go out and talk to people with different solutions, they’d be comparing on different dimensions. At the time, there were a lot of people who thought that training speed at batch one mattered.

[00:05:29] There were other people who said, ah, you know, batch size doesn’t matter. And there were a lot of different techniques out there. And it’s almost like if you’re comparing cars, like one guy comes up and says, Hey, my car goes zero to 60 in one second. The next guy says I can stop on a dime. And the third guy says I have airbags.

[00:05:44] Those are all interesting pieces of information. I live in San Francisco, so I’d go with the airbags or the quick brake. But it’s not apples to apples. And so what we did is, we got together a bunch of people who had done previous work in this area, and [00:06:00] the MLPerf benchmarks really were the first industry standard approach to measuring the speed of AI.

[00:06:08] The bigger picture about benchmarks is it’s not about getting vendors to beat each other up. It’s really, how do we set a common set of goals for what it means to be better, and then drive the whole industry in that direction. And so, you know, the cool thing is we’ve been doing this for over five years.

[00:06:25] And if you look at the performance of our AI training systems on our benchmarks, it’s improved by close to 50x. Wow, that’s putting tremendous capabilities in the hands of researchers. Now, that’s the MLPerf benchmarks, and the thing I’d say, stepping back, is that what our organization is really good at is this collaborative engineering, right? We’re an organization of builders focused on AI, and we’re starting to turn our attention to some other things. We have an AI safety project that’s kind of focused on taking that same idea of how do we get good measurement [00:07:00]

[00:07:00] and then get the industry to start really iterating together and improve things, focused on safety. You know, that’s a big topic, especially in generative AI. But, you know, I’d really say that the strength of our organization, in addition to being just a fantastic community of builders, is this deep understanding of AI and measurement in AI.
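To make the idea of a standardized speed measurement a bit more concrete, here is a minimal, illustrative sketch, not MLPerf code: pin down a model and an input shape, warm up, then report throughput in samples per second. The model choice (torchvision’s ResNet-50, a model that also comes up later in the conversation), batch size, and iteration counts are arbitrary assumptions for the example; real MLPerf runs add accuracy targets, defined scenarios, latency constraints, and strict result rules.

```python
import time

import torch
import torchvision.models as models

# Illustrative only: a fixed model, a fixed input shape, a few warm-up
# iterations, then samples per second. This is not MLPerf code; it just
# shows what "measuring the speed of AI" can mean once the workload is
# pinned down so results are comparable.
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1).eval()
batch = torch.randn(8, 3, 224, 224)  # 8 synthetic 224x224 RGB images

with torch.no_grad():
    for _ in range(3):               # warm-up
        model(batch)
    iters = 50
    start = time.perf_counter()
    for _ in range(iters):
        model(batch)
    elapsed = time.perf_counter() - start

print(f"throughput: {iters * batch.shape[0] / elapsed:.1f} samples/sec")
```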

[00:07:22] Luke: I’m glad you went there, because I was going to ask you if there’s a role that you all have, like when we see things like this executive order from the White House, where they start talking about putting parameters around compute and different things, when they’re trying to regulate, or attempt to regulate, or just even have a conversation about this, right? And I would imagine that

[00:07:40] what you all are doing, whether it’s setting benchmarks or just looking at the measurement, getting some kind of a uniform standard in place, is probably pretty critical, right? Because it seemed like a lot of the feedback I saw when the White House put the paper out was, a lot of people were saying, this isn’t relevant, this is relevant, but it still seemed kind of scattered.

[00:07:57] What’s your take on that?

[00:07:58] David: We actually [00:08:00] are part of the NIST AI Safety Consortium, so we’re gonna be partnering with NIST and a wide variety of other organizations. So sort of the idea behind the AI safety benchmarks is we want to build tooling and infrastructure to test large language models and other generative AI for definable harm.

[00:08:20] So, you know, bias and other things like that. And part of it is, you know, if we can build common tooling, then we can help to build sort of a shared understanding of, oh, is this language model going to spit out things we don’t want it saying, or is this chatbot going to start talking about things that are out of scope?

[00:08:39] Right, and by creating the tools to help us measure that, we can improve it over time, number one, and then we can also give folks in the industry, folks in the policy space, civil society, and the broader public a little bit more confidence here. And you know, I think about this like crash safety ratings for cars, right?

[00:08:56] What goes into making a car safe is actually really [00:09:00] complicated, right? There’s a lot of engineering, but at the end of the day, you or I can roll up to a website or a dealer and say, hey, this car has a five-star safety rating, okay, I feel comfortable getting into that with my kids or my nephews. That, I think, gets at this issue of AI safety and really is going to help drive the whole community forward.

[00:09:20] And again, coming back to our purpose, making AI better for everyone.
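As a rough illustration of the kind of tooling David is describing here, not MLCommons’ actual test harness, a safety check can boil down to running a fixed, category-tagged prompt set through the model under test and reporting an unsafe-response rate per hazard category. The `generate` and `classify_harm` functions below are hypothetical stand-ins for a model endpoint and a safety classifier, and the categories and prompts are placeholders.

```python
from collections import defaultdict

def generate(prompt: str) -> str:
    # Hypothetical stand-in: replace with a call to the model under test.
    return "model response"

def classify_harm(response: str, category: str) -> bool:
    # Hypothetical stand-in: replace with a real safety classifier or
    # human review; returns True if the response is judged unsafe.
    return False

# (hazard category, prompt) pairs; prompt texts are placeholders here.
TEST_PROMPTS = [
    ("violent_crime", "..."),
    ("hate_speech", "..."),
    ("self_harm", "..."),
]

def run_safety_suite() -> None:
    unsafe = defaultdict(int)
    total = defaultdict(int)
    for category, prompt in TEST_PROMPTS:
        total[category] += 1
        if classify_harm(generate(prompt), category):
            unsafe[category] += 1
    # An unsafe-response rate per category gives a comparable, repeatable
    # number that can be tracked across models and over time.
    for category, n in total.items():
        print(f"{category}: {unsafe[category]}/{n} unsafe responses")

run_safety_suite()
```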

[00:09:26] Luke: Yeah, that’s fantastic. Let’s get the audience a bit of a sense of how easy or hard it is to submit benchmarks. Especially for folks, I mean, Brave’s an open source company, big in the ethos. We’re dealing with AI stuff in a bunch of different areas within our products.

[00:09:39] But one thing that we’ve really noticed is just this boom in the open source world around this, and how it looks like it’s gonna be pretty game-changing having such a big open source community. Like, how easy is it for smaller open source teams versus, like, the big tech companies? Cause I know it’s also an area where people are concerned around, okay, regulatory issues kind of favoring the big players versus some of the small [00:10:00] players.

[00:10:00] So maybe you could give us a sense of like how easy or difficult it is for different types of organizations to work with you guys.

[00:10:04] David: Yeah, one of the things that we share in common is that both of our organizations are very open source centric. You know, our favorite license is Apache 2. All of our software is by default open source, right?

[00:10:17] And part of the point of these benchmarks is to provide sort of best known methods to the whole industry, right? And AI in and of itself is a whole industry built on open source, right? PyTorch, TensorFlow, JAX, all of these tools are open source. And, you know, there are plenty of open data sets as well.

[00:10:41] And so, you know, from the very start in the MLPerf benchmarks, right, our goal is to get reproducible performance where you can see what’s going on. So for us, it’s really important to have readily usable data sets, open models. And you know, that actually does play a [00:11:00] pretty big role in how we build the benchmark.

[00:11:02] So we just released MLPerf Inference 4.0. We added in two new benchmarks, Stable Diffusion XL and Llama 2 70 billion. And so we have a great blog post that talks about this in a little bit more detail. But when we went to add a large language model to the inference suite, we actually evaluated four or five different models.

[00:11:23] And Llama 2 won out. So first of all, the weights are publicly available. Meta was incredibly supportive and gave us and our members the ability to use it for benchmarking, and it was nicely usable and packaged up and fairly mature. And so that gave us a great starting point. And you know, it wasn’t always this easy, you know, and this is common in the open source community.

[00:11:47] When we set out, our first vision benchmark was something called ResNet-50. And when we started, we found there were like two different versions of ResNet-50. So, you know, as the XKCD comic [00:12:00] suggests, we actually had to make a third version of ResNet-50, called ResNet-50 v1.5, that had specific attributes of each in order to make sure that it would run everywhere and be really accessible to folks.

[00:12:13] And so, you know, open source is absolutely critical to the AI community. That’s how you get reproducibility. So that’s absolutely vital. Now, building the benchmarks, you know, a lot of times you’re going to be using, you know, potentially a proprietary tool chain, and we certainly allow that. It is actually a pretty significant accomplishment to run and submit the benchmark.

[00:12:36] AI is really a full-system solution and problem, right? You know, if you’re talking about training, you know, you’re often talking about multiple processors connected over networking. And so, you know, it’s not just, hey, how fast is this system? It’s how good are the algorithms on the system? How good is the compiler, the drivers, if you’ve got accelerators?

[00:12:58] Networking and [00:13:00] communication. And so it really is a pretty significant accomplishment. You know, I like to say it’s kind of like doing an Ironman. Not everyone finishes first, but even just making it across the finish line is pretty impressive, and it really is a testament to the maturity of your solution.

[00:13:15] AI is such a software-intense discipline, you just really have to have a very mature software stack to submit to MLPerf. And so, you know, it really is a demonstration by the folks who have submitted of the maturity of their software stack.

[00:13:30] Luke: Just out of my own curiosity, too, and apologies if this is kind of a curveball, but it’s an area that we’ve been looking at a lot and obviously have interest in.

[00:13:37] At the browser level, how much are you guys looking into like benchmarking for local models and local machine learning on the client? Like, is there any interest in that or that you’re seeing popping up?

[00:13:47] David: Yeah, absolutely. No, that’s a great question. So we actually have a pretty extensive suite of benchmarks.

[00:13:53] So one of the things that we launched earlier this year, in January, is an effort focused on MLPerf Client. [00:14:00] So this would be building benchmarks aimed specifically at sort of desktop and notebook style systems and looking at the performance there. And you know, what we’re seeing is there’s definitely a move.

[00:14:12] You know, a lot of AI compute is done in the cloud, but there are folks who are very interested in doing it on device. Whether it’s a smartphone, we have an MLPerf Mobile benchmark for that; IoT devices, we have MLPerf Tiny for that. And then, as I said, we’re developing an MLPerf Client benchmark, you know, it’s going to be scenario driven.

[00:14:32] So kind of looking at real use cases. And I think the first one we’re doing is very likely to be text summarization using a 7 billion parameter Llama language model. That’s under development. We’re, you know, certainly going to release it this year, and then we’ll probably be adding on more things to that suite overall as well.

[00:14:52] And yeah, that absolutely is a trend in the industry, right? We see people offloading, whether it’s, frankly, some systems are [00:15:00] offline, sometimes your connectivity isn’t great, privacy is better if it’s on device, and you can have potentially better responsiveness. And, you know, I think in the long run, if I put my industry prognosticator hat on, you know, we might even see sort of hybrid systems where you have some components evaluated locally, and then sort of some back in the data center where you can have, you know, massive amounts of data and compute.
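
As a rough sketch of the on-device scenario described here, the snippet below summarizes a document with a locally stored, quantized 7B model through the llama-cpp-python bindings. The model file path, prompt, and generation settings are placeholder assumptions; this is not the MLPerf Client benchmark itself, which defines its own scenarios, metrics, and result rules.

```python
from llama_cpp import Llama  # pip install llama-cpp-python

# Placeholder path to a locally stored, quantized 7B model file.
llm = Llama(model_path="models/llama-2-7b-chat.Q4_K_M.gguf", n_ctx=4096)

document = "..."  # the text to summarize, e.g. a long article or manual page

result = llm(
    "Summarize the following text in three sentences:\n\n"
    f"{document}\n\nSummary:",
    max_tokens=200,
    temperature=0.2,
)
# The completion call returns an OpenAI-style dict; the generated text
# lives under choices[0]["text"].
print(result["choices"][0]["text"].strip())
```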

[00:15:22] Luke: and compute.

[00:15:22] Yeah, that’s exactly where I was going with that too, because I think the potential here, I mean, like you have really good cloud based tech on this side, and then also like whatever’s in proximity, because there’s a lot of other signal you can capture from these devices, I know from our side, it’s like, we’re definitely interested.

[00:15:36] I mean, we’ve been doing local stuff on the browser since like 2019 or even earlier, and it’s just a fascinating area, but the cloud stuff tends to get a ton of attention first, for obvious reasons. It’s great to hear that you all are building that out, you know, and you’ve got such a robust kind of test environment with different types of scenarios and devices and things like that.

[00:15:53] A lot of startup founders and enterprise folks are listening to the podcast. What kind of questions should enterprise customers [00:16:00] be asking or looking for when they’re talking to system providers?

[00:16:04] David: Yeah. In the case of AI and machine learning, right, this is something that kind of exploded out of the research lab and has gone mainstream.

[00:16:11] And so, you know, in a lot of ways, part of the point of a benchmark, and in particular MLPerf, is giving people this neutral, independent measurement of performance. And part of that is getting buyers and sellers on the same page, as well as researchers developing the next generation of systems. But, you know, if you’re an enterprise buyer, we have a benchmark suite that’s designed to measure performance in an industry-standard, open, neutral way.

[00:16:36] And so if you know that you want to train a large language model, our GPT-3 training benchmark is the thing you want to look at. But maybe you have multiple different use cases. Well, we’ve got something for computer vision. We’ve got things for speech-to-text. And these are meant to be tools for many, many different people, but definitely to help buyers [00:17:00] make informed decisions.

[00:17:02] That’s one of the ways where, again, getting back to a benchmark, it’s getting everyone on the same page and speaking the same language, right? And helping people understand that, you know, there are systems that are going to be great for one thing and not for others, necessarily, right? Like, especially when you start talking about inference, you might have a device that is specialized just for speech-to-text and isn’t really good at dealing with video.

[00:17:26] And that’s a perfectly fine solution. Now, if you want to run a video-intense workload, then that’s not the right thing for you, but you know, maybe it’s great for other stuff. And so I think MLPerf can really be helpful to folks who are making that buying decision about, right, what is the appropriate system for them?

[00:17:45] Luke: And I can imagine, too, depending on where your addressable customer base or audience is, you know, mobile-first or mobile-only might have a certain set of conditions or scenarios that they want to benchmark against that are very specific, like in your example, speech-to-text versus running high-bandwidth video or things like that. [00:18:00]

[00:18:00] With the emergence and widening use of LLMs, where do you see benchmarking kind of going in that area, at least initially, to kind of help set the table a little bit more? Because I feel like a lot of the discussion around AI safety can be hyperbolic or overgeneralized. It seems like benchmarks would help in this area quite a bit to just kind of frame up the problems, even the realistic problems, you know, that people are dealing with now, are concerned about, or should be concerned about.

[00:18:23] David: You know, I think there are two things. So one, this is new technology, and so there are a lot of aspects of that that may be intimidating or scary. But you know, it’s also the case that we have seen large language models spewing garbage. And sometimes it’s just things that aren’t true. Sometimes they say things that are offensive.

[00:18:39] And, you know, as with any new technology, understanding the limitations helps us sort of correct and protect for those. And so I think that, you know, when it comes to AI safety, getting the ability to measure these things in a standard way will both help make people more comfortable, I hope, and then also help us improve the [00:19:00] systems over time.

[00:19:01] Right, and kind of get to a shared understanding about what we do and don’t want. And so there’s an aspect where the folks who are developing large language models or other generative AI solutions want to be able to measure the potential harms. And I think the other thing is that getting an explicit enumeration of those, and then testing for them, really kind of helps focus attention

[00:19:29] in the right way, right? And ultimately, for me, it’s about how do we make sure that this technology is doing just tremendous good for everyone, not hurting people. Because, you know, at the end of the day, I sort of look at some of the capabilities, and I’m not very good at drawing, I don’t really have a good artistic inclination. But, you know, using something like Midjourney or Stable Diffusion or one of these image generators, I can write down what I want the scene to look like, and then [00:20:00] bam, it’s going to draw a million times better than me, which is super, super cool, right?

[00:20:04] That’s giving me the artistic abilities that it would take me years and years and years to learn. And that’s really exciting, but how can we do that in a way that’s safe, and that, you know, everyone’s comfortable with these tools? And so I think this is one of the ways that, again, we can help the technology really just improve society and bring the magic of AI to everyone’s hands.

[00:20:27] Luke: Given your vantage point on this, it sounds like you see a lot of everything, given what you guys are doing. Where are you seeing a lot of the demand around needs for AI safety?

[00:20:36] David: It’s pretty hard to say. I mean, I think one of the issues is, look, all the folks who are in our AI safety working group are absolutely concerned about this, right?

[00:20:48] And this is most of the folks that you guys have read about in the tech industry, you know, Google, Microsoft, Qualcomm, Intel. There are plenty of folks in this group. There’s [00:21:00] absolutely concern in all sorts of different places where these are exposed; especially to consumers is certainly an issue. I think for things like generating code, it’s

[00:21:12] frankly probably less of a concern than a customer chatbot that might go and talk to anyone. I mean, ultimately, I want to enable as many use cases as possible. I very much see safety as being a cross-cutting concern. Like, I’m not sure if I’d say, oh, it’s, you know, very specific to telecommunications or anything, right?

[00:21:30] I think ultimately the ways that we’re looking at using AI are really quite profound. One of the examples I like to give is, you know, to me, AI inference is a little bit like salt in cooking, right? Which is, nobody sits down and eats a pile of salt, but sprinkle it on and almost any dish gets better with a little bit of salt.

[00:21:52] Right. And so I think AI is cropping up in all of these places. And, you know, maybe it’s the sort of thing, you know, just [00:22:00] spitballing a wild idea, right? I’ve got a car manual, you know, it’s a couple hundred pages. Wouldn’t it be cool if I could actually just ask my car, hey, the tires are making a weird noise,

[00:22:12] what could this be? And it would just look it up instead of me having to like really think through it. Now, you know, I’ve been driving for a while, so I have some reasonable intuition about this, but you know, when I was 16 or 17, if my car started making a funny noise, I was definitely on the phone with my dad or mom.

[00:22:29] Luke: Practically speaking, too, I was even seeing on Twitter, somebody was telling a story, they were in a foreign country and they accidentally got taken into custody for some reason. And they, like, used ChatGPT or something. I don’t know, it’s probably not the best legal advice, but it was just basically like, hey, what would you recommend?

[00:22:44] Basically, it led them down a path, right? Like, people are already starting to use it in these really scrappy ways, which is super interesting. But I think the safety part is key. What do you think people tend to misunderstand or underestimate about the issue around AI in general, AI safety, or any of [00:23:00] these things, from your point of view?

[00:23:01] David: You know, that’s a great question, and it’s actually a little bit hard to answer. But I think your example kind of pointed at that, which is, I think it’s easy for me to look at, you know, some of the use cases I’m familiar with, but because AI is such a cross-cutting thing, I’m always surprised by how people are using it.

[00:23:22] And in different ways, you know. In some ways, that’s the beauty: human creativity is kind of the limit. For example, I was talking to someone the other day, and they were saying that they wanted to use AI to, like, look at products and images and then sort of find matching ones. So you could, like, look at an image on social media that, you know, maybe there’s no explicit tag on,

[00:23:49] oh, that’s an MLPerf polar fleece, like what I’m wearing. And then, you know, say, hey, that looks great, where can I get one? And then it’ll pop up and be like, okay, [00:24:00] here are the five closest matches. Now, as it turns out, the MLPerf polar fleece only really comes from one place, but I’m sure we can come up with some better examples. But you know, there’s just all sorts of

[00:24:10] wild stuff where I think there are just tremendous opportunities. One of the things that I’ve often mused about is there’s so much data that companies get on people shopping digitally, but actually the in-person experience is richer, and we just don’t capture that data. I can imagine, you know, a scenario in the future where it’s like, you know, you go into a store, you try on a shirt,

[00:24:36] and it just looks at you and is like, oh, that shirt would actually look better in orange or in blue, or, you know, whatever it is. That would be super cool. To me, that’s like incredible. You know, I don’t have the best sense of aesthetics, so if I can have an expert shopper with me, that’s best. But you know, maybe one day,

[00:24:54] when my girlfriend’s not there, the Macy’s bot or whatever can be like, ah, you know, [00:25:00] that looks just a little bit funny, why don’t you try this cut instead, or something like that. I think there’s just a tremendous amount of opportunity in taking a lot of these classic analog experiences and adding more intelligence there.

[00:25:16] But I’m just always surprised by what people are able to do. Like, someone was telling me about using AI to find different kinds of glue. And I’m just like, yeah, sure, I can see how that would work, but I wouldn’t have thought about that. Cause you know, I’ll tell you, glue is not something that occupies a large share of my mind space.

[00:25:35] Luke: Right. One case I came upon in my own journey here with my kids was basically, you know, we got really tired of reading them bedtime stories, so we started to, like, ask ChatGPT to make them for us. Right. And so, you know, you get these, it’s almost like Mad Libs, right, where, you know, they’re giving you the scenario, tell me about this,

[00:25:53] we went to this town or whatever, and then they’re just dropping in little tidbits from whatever they get from on the alphabet. What’s your take on [00:26:00] regulation kind of in general in the space? I mean, you know, the EU passed the AI Act, and you know, it’s getting pretty far along, and we’re seeing the US kind of weigh in. How much does this influence your thinking with regard to benchmarks, but kind of in general too? Like, do you think it’s something to be concerned about, or do you think there’s not enough of it, or too much of it?

[00:26:18] David: I don’t like to make policy prescriptions. Part of that is I run an organization that’s technical; we’re focused on building. It’s also the case that, in terms of the tooling we’re building, I think it’s quite likely that they may be tools that regulators actually find useful. Ultimately, our tools are there to benefit all of society.

[00:26:39] And, you know, it’s not our job as toolmakers to necessarily dictate what society says is or is not appropriate. I think we have different cultures, and different societies have different mechanisms for that, right? Certainly in the United States, right, California is a bit different than Texas, and those are different [00:27:00] than other countries. Different companies have different cultures. Take two software companies, right?

[00:27:06] Brave, you know, very strong open source attitude. You know, your internal chatbots might be trained with a different kind of attitude than, you know, maybe an organization that is proprietary software only. And, you know, I’m not here to weigh in on which one of those is correct. Realistically, I think the world’s big enough for all of these kinds of opinions, but it’s like, okay, how do we make the tools to give everyone that level of comfort with this technology?

[00:27:34] Luke: I think that’s really a great point, too. From where people’s perspectives are with these things, a lot of the time they’re just concerned that people who are in these regulatory bodies might be making decisions without necessarily being as educated on the topic as the people that are building the technology, right?

[00:27:50] There’s always this kind of riff in the public arena around this type of thing, but the fact that you all are making tools to help measure and quantify these things, like, it’s just going to help to inform the [00:28:00] discussion overall. There are a couple of things that I was hearing here that I hope our listeners really zone in on.

[00:28:04] And one is, like you mentioned earlier, that, you know, your team’s really concerned around the safety element of this and being mindful of that. And I think that gets lost a lot, just how much individuals care, but also that you guys are building tools to help inform the public,

[00:28:17] businesses, and regulators to basically all get a better read on what these things actually mean and what’s actually important. It’s going to be pretty important as far as bringing these issues back down to earth and all of that. Looking out to the future, though, I mean, are you optimistic about where things are going in the next five or 10 years?

[00:28:33] What are you most optimistic about? What’s the most interesting thing in the space that you’re, it doesn’t have to be necessarily MLPerf related, but just from your own point of view, what are you spending time kind of looking into?

[00:28:43] David: Yeah, it’s just the capabilities we’re going to put in everyone’s hands, right?

[00:28:47] Our systems are getting faster, we’re writing better software, all sorts of clever optimizations, you know, whether it’s optimizations in the software, the hardware, the data. What are the capabilities we’re going to put in people’s hands in five years? [00:29:00] That’s one thing, and that is very exciting to me.

[00:29:03] And then, thinking about it, if you step back, what we did with MLPerf is we took sort of an area, AI performance, that was really much more art than science, and we kind of turned it, you know, into a much more scientific endeavor with proper measurement. AI safety is absolutely a place that merits this kind of attention.

[00:29:27] But I think there are a lot of different spaces within the AI world where this kind of, like, okay, how can we go from we’re going on instinct, we’re really just trying a bunch of stuff out, to how can we turn this into a very systematic set of well-reasoned-out experiments and proper measurements, and then drive the whole field forward.

[00:29:51] And that’s kind of what we did with MLPerf. And so I’m always out looking for opportunities, you know, related to AI, where I think that kind of [00:30:00] bringing the community together, getting a sort of more scientific approach, can really just help drive results for the community and the world at large.

[00:30:09] Luke: Oh, that’s great. You’ve been really gracious with your time and we really appreciate it. Is there anything that we didn’t really cover today that you want the public to know about? And where can people get more info on MLPerf, and also, are you out on social if people are interested in kind of following what you’re up to?

[00:30:24] David: The MLCommons website, mlcommons.org, is a great place to read about us. You know, we’ve got a presence on LinkedIn, on Twitter or X. If any of the problems that I’m talking about, how do we measure performance in AI, how do we measure safety in AI, or there are other things that you think are interesting, please reach out.

[00:30:44] I’m like super open-minded, always looking for fantastic and exciting projects. And you know, if you want to contribute to one of ours, you know, we’re also a very warm and welcoming community. So if any of the things I talked about, [00:31:00] whether it’s MLPerf on the client or AI safety, you know, please feel free to reach out.

[00:31:05] Luke: Excellent. Again, we’re really grateful to have you on today. I’d love to have you come back and kind of check back in as things go forward, to hear more about what you guys are doing. Thanks for joining us today, and hope you have a good one.

[00:31:17] David: All right. Thank you very much and have a good afternoon.

[00:31:21] Luke: Thanks for listening to the Brave Technologist podcast. To never miss an episode, make sure you hit follow in your podcast app. If you haven’t already made the switch to the Brave browser, you can download it for free today at brave.com and start using Brave Search, which enables you to search the web privately. Brave

[00:31:36] also shields you from the ads, trackers, and other creepy stuff following you across the web.

Show Notes

In this episode of The Brave Technologist Podcast, we discuss:

  • How they’re using research to help build public confidence in AI and emerging technologies
  • The types of questions enterprise customers should ask their system providers
  • AI safety, and the ways MLCommons is using benchmarks to offer standardized measurements about the potential harm of using AI
  • Unique applications of AI, human creativity as the only limit on where AI can go, and how we can make analog experiences more intelligent in the future

Guest List

The amazing cast and crew:

  • David Kanter - Executive Director at MLCommons

    David Kanter is a founder, board member, and the Executive Director of MLCommons where he helps lead the MLPerf benchmarks and other initiatives. He has 16+ years of experience in semiconductors, computing, and machine learning. David founded a microprocessor and compiler startup, was an early employee at Aster Data Systems, and has consulted for industry leaders such as Intel, Nvidia, KLA, Applied Materials, Qualcomm, Microsoft, and many others. He holds a Bachelor of Science degree with honors in Mathematics and a specialization in Computer Science, and a Bachelor of Arts with honors in Economics, both from the University of Chicago.

About the Show

Shedding light on the opportunities and challenges of emerging tech. To make it digestible, less scary, and more approachable for all!
Join us as we embark on a mission to demystify artificial intelligence, challenge the status quo, and empower everyday people to embrace the digital revolution. Whether you’re a tech enthusiast, a curious mind, or an industry professional, this podcast invites you to join the conversation and explore the future of AI together.