AI in Legislation and How NYC Uses Data Science to Drive Policy

Episode 61


Alaa Moussawi, Chief Data Scientist for the New York City Council, shares how AI is reshaping legislative processes through smarter, data-driven laws. He discusses the council’s use of Retrieval-Augmented Generation (RAG) models and open-source AI solutions to streamline workflows and ensure legislative originality. He also shares his perspective on the future of AI in government, the importance of transparency in policy-making, and why generative AI should assist with decisions but never make them.

Transcript

Luke: [00:00:00] You’re listening to a new episode of The Brave Technologist. This one features Alaa Moussawi, the Chief Data Scientist for the New York City Council, who leads teams of data scientists and software engineers. His data team analyzes issues like pay equity, rats, and school bus delays to support data-driven legislation, emphasizing transparency and reproducibility.

The software team develops open-source tools to streamline council workflows, automate tasks, and improve efficiency, such as CRMs, dashboards, and paperless hearing systems. They’ve deployed machine learning models, including a RAG model for simplifying legal research. In this episode, we discussed the council’s deployment of AI technologies, including generative AI models that streamline legislative workflows, the importance of transparent, statistically driven legislation, and his vision for the future of AI in government.

Now for this week’s episode of The Brave Technologist.

Alaa, welcome to The Brave Technologist. How are you doing today?

Luke: Yeah, thanks for coming. I’ve been really looking forward to this one. So we’re here at the AI Summit in New York [00:01:00], and you were one of the speakers, right? I wonder if you could share a bit with our audience about what you were talking about.

Alaa: Yeah, so I gave a talk on what the New York City Council has done in deploying AI in the past, before generative AI, and what we’re currently doing to deploy generative AI to help streamline the workflow at the New York City Council. I also covered some light theory behind vectors and vector embeddings, and how they relate to some of the older technology that we’ve deployed and to RAG, Retrieval-Augmented Generation.

I also discussed the history of AI, just to give people a little understanding of why AI is actually not as scary or as intelligent as they might think it is.

Luke: Yeah, no, that’s great. Let’s dive into a little bit of the pre-generative AI machine learning. Can you share any lessons learned [00:02:00] from deploying it? Or maybe we start by giving the audience a little bit of context for what pre-generative AI is, because some folks might be less familiar with it.

Alaa: Okay, so pre-generative AI is just a term that I made up, because now that you have generative AI, whenever anybody says AI, everybody immediately thinks of ChatGPT. I use the terms machine learning and AI fairly loosely; I think more or less they overlap. I know a lot of people would disagree with that, but I’m not here to get into the semantics.

Right. Pre-gen AI is just machine learning that isn’t the generative AI stuff we’re used to nowadays. It’s often not built on transformer technology, though it doesn’t have to exclude it. Let me give you an example so we can motivate exactly what I mean. I am the Chief Data Scientist for the New York City Council.

We pass the laws for New York City and the budget. We also write the laws [00:03:00] for New York City; that’s one of the functions of the legislative division. We get tens of thousands of ideas for legislation from our 51 elected officials, the council members. If we draft a law for a council member, we need to make sure that they were the first one to have thought of that idea and that no other council member had thought of it earlier.

I run two teams, a team of software engineers and a team of data scientists, and on the software engineering side we’ve built the software that handles this whole process. It’s all built from the ground up on open-source technology. Council members or their staff can submit ideas to our system.

Each idea gets time-stamped, and that’s how we know who thought of it first. But now we have a log of tens of thousands of these ideas, and when we’re working on a [00:04:00] particular idea for one council member, we have to see if there are any duplicates. This is where we perform what’s called a duplicate check.

This is what the AI tool we’re discussing covers. Essentially, it looks through all of the existing ideas for legislation and ranks them in order of how likely each one is to be a duplicate of one chosen piece of legislation, the one that we might be interested in drafting.

That helps us identify whether other council members may have thought of that idea earlier.
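The duplicate check can be illustrated with a small ranking sketch. The episode doesn’t say how the council’s tool represents each idea, so this example assumes a simple bag-of-words vector per idea (a stand-in for the learned vector embeddings Alaa mentions from his talk) and ranks existing ideas by cosine similarity to the candidate. The `Idea` class, `rank_duplicates` function, sample ideas, and the 0.5 cutoff are all hypothetical.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Idea:
    idea_id: str
    submitted_at: str  # ISO date string; the time stamp decides who was first
    text: str


def vectorize(text: str) -> Counter:
    """Bag-of-words term counts, a simple stand-in for a learned embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_duplicates(candidate: str, ideas: list[Idea]) -> list[tuple[float, Idea]]:
    """Order existing ideas from most to least likely to duplicate the candidate."""
    target = vectorize(candidate)
    scored = [(cosine(target, vectorize(idea.text)), idea) for idea in ideas]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hypothetical log of submitted ideas
ideas = [
    Idea("A-1", "2021-03-02", "Require real-time GPS tracking for school buses"),
    Idea("A-2", "2022-07-15", "Expand rat mitigation in public housing"),
    Idea("A-3", "2023-01-09", "School bus GPS tracking for parents in real time"),
]

ranked = rank_duplicates("Real-time GPS tracking of school buses", ideas)
for score, idea in ranked:
    print(f"{idea.idea_id}  {score:.2f}  {idea.text}")

# Among likely duplicates (hypothetical 0.5 cutoff), the earliest time stamp wins
likely = [idea for score, idea in ranked if score >= 0.5]
first = min(likely, key=lambda idea: idea.submitted_at)
print("Earliest likely duplicate:", first.idea_id)
```

With these made-up ideas, A-1 and A-3 rank above the unrelated rat-mitigation idea, and A-1 is reported as the earliest likely duplicate. A production system would more likely compare dense embeddings, but the rank-then-check-the-time-stamp logic is the same.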

Luke: Is this mainly for the purposes of attribution, or are you looking at issues around previously proposed legislation, or something like that? Help me understand the importance of finding the duplicates. Why does it matter? Just to make sure you’re not duplicating the work, I guess?

I don’t [00:05:00] know. But this is super interesting to me, especially with lawmakers and this type of thing, so let’s get a little more context on this.

Alaa: When you write the legislation, you want to write it on behalf of the elected official who originally thought of the idea. So you want to make sure that you’re working with them, and you want to find the one who originally thought of the idea. That’s why you want to perform the duplicate check.

Luke: Cool. How long have you been doing this? It’s a pretty awesome effort to start. Have you had these teams for a while?

Alaa: I joined the New York City Council in 2018, so about six or seven years ago, and when I joined, this technology did not exist. There were a few people who handled the website and one or two who did data science. When I joined, I was able to build two teams: a software engineering team that built bespoke software from open [00:06:00] source, from the ground up, customized to the exact needs of the New York City Council, and a team of data scientists whose capabilities leaned much more toward statistics, so that we could drive legislation using statistics and make it more evidence-based and fact-driven.

Luke: It’s awesome, yeah, no, it’s great. What prompted the transition to RAG models and what kind of unique challenges do they address in legislative workflows?

Alaa: It’s not specifically for legislative applications, but RAG models happen to be very good for them. Let me step back a bit and explain what RAG models are.

Luke: Yeah, yeah, that’d be great.

Alaa: Yeah. So a RAG model is a model that can utilize a database of information and query that information so that it can provide you with answers to questions you may have. You query it with a question, it finds the [00:07:00] relevant information in a database, and then it brings that information back. You can think of it like feeding the information we found in the database to ChatGPT and telling it to respond based on the content of that information. The reason this is so good is that you can constrain the creativity of the model. Sometimes we don’t want the model to be very creative. Sometimes the model gets very creative, and we call that...

Luke: A hallucination.

Alaa: Exactly. So it’s one very good way of handling that. You can also have a model that responds based on the information you want it to respond with. So RAG is very applicable in many domains where you have a good source of textual information, and in the legal domain, this happens to be the case.
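The retrieve-then-generate flow described here can be sketched in a few lines. This is a simplified illustration, not the council’s implementation: retrieval just counts shared words (a real system would typically search vector embeddings), the passages are toy placeholders rather than actual legal text, and the step that sends the finished prompt to a language model is deliberately left out. The function names are hypothetical.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercased word set used for a crude relevance score."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, passages: list[str], k: int = 2) -> list[str]:
    """Return the k passages sharing the most words with the question."""
    q = tokens(question)
    return sorted(passages, key=lambda p: len(q & tokens(p)), reverse=True)[:k]


def build_prompt(question: str, context: list[str]) -> str:
    """Ground the model: answer only from the retrieved sources to limit hallucination."""
    sources = "\n".join(f"[{n}] {p}" for n, p in enumerate(context, start=1))
    return (
        "Answer using ONLY the sources below. "
        "If they do not contain the answer, say you don't know.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {question}\nAnswer:"
    )


# Toy passages standing in for a database of legal text
passages = [
    "Toy passage: agencies must report school bus delays each month.",
    "Toy passage: landlords must address rodent complaints within a set period.",
    "Toy passage: hearing testimony may be submitted electronically.",
]

question = "How often must school bus delays be reported?"
context = retrieve(question, passages)
prompt = build_prompt(question, context)
print(prompt)  # this string would then be passed to a locally hosted model
```

Instructing the model to stay within the supplied sources is one way to “constrain the creativity” of the model; it reduces hallucination but does not eliminate it.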

Luke: And have you always had an interest in [00:08:00] working with city government and things like this on the data side? Or is this something you found your way to? I’m just curious.

Alaa: I used to be a teacher in the New York City Department of Education. It was a very short stint; I have a minor in education. I quickly went back and decided to pursue a PhD in physics. When I was younger, I thought that decisions at the governmental level, decisions impacting large numbers of people, were quantitatively based.

As I got older, I found out that wasn’t true, and I’ve always been very passionate about making these decisions quantitatively based. I think that is a way to really propel our society forward on a positive trajectory.

Luke: That’s awesome. That’s great. Have there been practical limitations in dealing with RAG models in your work, or anything [00:09:00] you’d find interesting to share about hurdles, anecdotes, anything like that?

Alaa: Absolutely. I think the biggest practical limitation is that right now we’ve only prototyped. I call it a prototype, but we could be deploying very easily. We’ve been using a cluster of CPUs to run these models. There are implementations that the open-source AI community got together and developed.

Georgi Gerganov created something called llama.cpp, which allows you to run these transformer models in C/C++. For the non-technical audience, that just means that you can run them in a much more efficient way than running them in Python, which is what they would otherwise be running in. So you can run them on CPUs, which are not nearly as efficient as GPUs.

So right now, that is what we’re doing, and that’s the practical limitation. If we really want to [00:10:00] take it off the ground, we’re currently in the process of buying GPUs, and there’s a bit of complexity there, because if you want to buy a rig of, let’s say, H100s, A100s, or AMD MI300Xs or MI350Xs, you really need to take into consideration the power consumption and the cooling needs of these systems. They draw a lot of power and hence generate a lot of heat, which you need to make sure you have the cooling capacity to handle. So rather than purchasing anything that high-tech, we’re in the process of getting a rig of graphics cards, essentially, to get the job done the same way.

When we deliver a few projects demonstrating the capabilities, and the council realizes that this is absolutely essential and makes the job of staffers so much simpler and [00:11:00] improves productivity and efficiency, I think they will understand the need to upscale from there.

But just with those GPUs, we can easily get this off the ground and get it going.
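The point that these GPUs draw a lot of power and turn it into heat can be made concrete with a back-of-the-envelope calculation. Essentially all the electrical power a server draws ends up as heat, so the cooling load follows directly from the wattage (1 W is about 3.412 BTU/h, and one “ton” of cooling removes 12,000 BTU/h). The wattages below are assumptions for illustration only: about 700 W per GPU (roughly the listed maximum for an SXM H100; check the datasheet for the exact model) and 2,000 W for everything else in a hypothetical 8-GPU server.

```python
WATTS_TO_BTU_PER_HOUR = 3.412  # 1 W of electrical load is about 3.412 BTU/h of heat
BTU_PER_HOUR_PER_TON = 12_000  # 1 "ton" of cooling capacity removes 12,000 BTU/h


def rig_load(num_gpus: int, watts_per_gpu: float, other_watts: float) -> dict[str, float]:
    """Estimate a rig's electrical draw and the heat the cooling system must remove."""
    total_watts = num_gpus * watts_per_gpu + other_watts
    btu_per_hour = total_watts * WATTS_TO_BTU_PER_HOUR
    return {
        "kilowatts": total_watts / 1000,
        "btu_per_hour": btu_per_hour,
        "cooling_tons": btu_per_hour / BTU_PER_HOUR_PER_TON,
    }


# Hypothetical 8-GPU server; both wattage figures are assumptions
load = rig_load(num_gpus=8, watts_per_gpu=700, other_watts=2_000)
print(f"{load['kilowatts']:.1f} kW -> {load['btu_per_hour']:,.0f} BTU/h "
      f"-> {load['cooling_tons']:.2f} tons of cooling")
```

Under these assumptions a single server draws about 7.6 kW around the clock and needs a little over 2 tons of dedicated cooling, which is why power and cooling have to be planned before the hardware is bought.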

Luke: That’s awesome. I’m going to jump around a little, but I’m really curious as you explain this: what you’re doing here seems quite different from what typical city governments deal with, right? You’ve got a team of data scientists and software engineers.

How has it been navigating that? Have people been receptive to introducing these efforts at the City Council level, or has there been a lot of trying to convince people? Has there been skepticism? I’m just curious about your experience.

Alaa: Yeah, this doesn’t happen by accident. It was my doing in terms of the application, but it wasn’t my doing in terms of the decision to make it happen. I was hired by a very supportive supervisor who had the [00:12:00] intention of using technology to advance the way that the legislative division operated at the New York City Council.

My background is actually in computational physics, so I know how to use a computer very well, but I do not have the background of a software engineer. I was asked to do a lot of software engineering, which I hesitated to do early on. Eventually we came to an agreement, I went ahead and did it, and I learned a lot of software engineering.

But I’m a scientist at heart. So yes, I was given the opportunity. There was a vision to make us technologically competent, I guess, at the council, and they needed somebody who could come and implement that vision. So I didn’t have to do all the fighting.

Somebody was essentially willing to do all that fighting for me and get us to the point where we were supported enough to [00:13:00] just have the opportunity to sit down, work with the tech and the science, make that happen, and focus our energy where it’s actually needed.

Luke: That’s fantastic. Yeah, that’s great. So there’s a big push to move everything to the cloud. Are you a fan of moving to the cloud, or do you want to keep things on premises?

Alaa: There are pros and cons to each approach. One of the pros of moving to the cloud is stability. These cloud providers can provide you with backups. Their internet and electricity almost never go down, and if they do, they will probably switch you over to another region and make sure you’re staying online.

If you use third-party vendors, they can easily hook into cloud services, and they already know how those services are set up, so it’s very easy for them to work with that. Personally, and for the things we do at the council, I’m more of the view [00:14:00] that we should be moving on-prem and making sure that everything we do is on-prem.

That’s for a few reasons. Number one, if you want to ensure complete security of your data and your environment, it has to be on-prem. If it’s hosted by a cloud provider, then at minimum you are entrusting the cloud provider with your data, and beyond that it depends on how you have it set up, whether you’re using third-party vendors, and so on. The other reason I like to have things on-prem is that I want full control and access, and I want to be able to modify things exactly as I want.

Some cloud providers make it very easy to modify things the way you need. But I think being on-prem pairs very well with developing everything yourself from open source.

If you want to customize everything yourself and be able to [00:15:00] change any aspect, having full control of your hardware and your software is the best option. Then nobody can stop you from doing anything. Your limitations are your own limitations, not limitations that are externally enforced.

Luke: Makes sense. We’re big on that at Brave, too, so it was cool to see. We focus on doing things locally, on the client, for the user. For example, we have an ad model that can match ads locally in the browser instead of having to leak your data to third parties in the cloud to do stuff that you could just do on the device.

It’s about having that user agency and control. But you have this in another context, at the city level, where there’s a lot of information that you want to make sure doesn’t leak.

Alaa: It depends on the level of expertise of your engineers and the amount [00:16:00] of bandwidth they have. Some people are forced to rely on third-party vendors, and that works much better in a cloud environment. I think most people are tending to move toward cloud environments, and I think that’s because it makes integrations easier. But I’m glad that Brave is also...

Luke: No, we’re big into it. We have to use the cloud for some things, but we’re very much user-first in everything. And you mentioned you’re doing all this stuff open source, and it sounds like you’re doing some really interesting work. Are you trying to establish a model that other cities can use? Are you working with counterparts in other cities who are trying to do similar things? What’s your longer-term vision for this?

Alaa: So I run two teams, a team of software engineers and a team of data scientists. My biggest vision, the reason I joined the New York City Council as Chief Data Scientist, was that I wanted to see legislation be more data-driven and evidence-based. [00:17:00] To be fair, this is very separate from AI.

We don’t want to use black-box methods. We want to use statistical methods that are transparent, that we can justify and explain to the public, along with our rationale for the judgments or evaluations we’re making based on the analysis and why we’re using it.

You know, making such decisions with the data. That’s really the reason I joined the New York City Council. I’d love to have an impact, and not just at the local level. I think this is more common at the federal level, because they have more funding. It’s spoken about less at the city level.

We just don’t have the resources to do it the way they’re doing it. But we need to start doing it at the city level and at the state level; I don’t see it at the state level. If I could have a big impact in any one way, it [00:18:00] would be to make data-driven, fact-based legislation a standard across government at various levels, and if I can impact that in other places, I would love to do so.

On the software engineering side, building software that’s customized exactly to the needs of the legislative division has tremendously improved the efficiency of the legislative staff. This is something that could be applied in other municipalities as well.

But different municipalities have their own workflows, so things would have to be customized. I would love to open-source what we’ve built one day, and maybe that’s something we should consider. There are just so many things we need to do, so much low-hanging fruit, that we haven’t gotten to it yet. [00:19:00]

Luke: I think a lot of people have the question of how and where to get started with these things. But doing this in New York City, with such a huge data set and all these moving parts, is really cool. I’m excited to see what you guys do. I think it’s really great.

Let’s talk about misunderstandings. Are there common misunderstandings about AI that you see in the field or that you’ve encountered at the government level?

Alaa: Oh, absolutely.

There’s a general misunderstanding of AI, and it’s not specific to the government level. It’s even among peers in the data community.

I think the most concise way to put it is that generative AI should not be used for decision-making. There’s a huge misconception that you can use generative AI to answer questions and make decisions.

It can be an [00:20:00] assistant in the process of making a decision, but it should not be used for decision making. And this is something that I spoke about in my talk yesterday. And I can, do you have a few minutes?

Luke: Yeah, of course. That’s why you’re here, man. I want to hear this. It’s a really awesome point of view, and I want to get it out there.

Alaa: Right. So large language models have been trained to predict the next most likely word given a sequence of words.

They’ve been fed terabytes and terabytes of data: Wikipedia articles, internet articles, books, encyclopedias, you name it. So they’ve learned the human speech pattern very well, and that’s all they do. A large language model is trained to just predict the single next best token, or word.

So if I ask the model, “Why did the chicken cross the road?”, I will pump [00:21:00] in “Why did the chicken cross the road?” and it will generate the word “the.” And that’s it. The model has done its job: “Why did the chicken cross the road? The.” Then ChatGPT will take that and pump it back into the model, so now the input is “Why did the chicken cross the road? The,” and it’ll generate the word “chicken.” You repeat the process: you take “Why did the chicken cross the road? The chicken,” pump that into the model, and it’ll produce the word “crossed.” Now, if you actually pump “Why did the chicken cross the road?” into ChatGPT, it won’t necessarily say “the chicken crossed the road”; I’m just giving an example.

So what it’s doing is predicting the next most likely word, iteratively and continuously. When it predicts one word, it does not know the next word that’s going to come up in the sequence. So when it says “the chicken,” it has no idea that it’s going to say [00:22:00] “crossed” next. And when it says “the chicken crossed the road because of the water on this side,” it has no idea what it’s going to say in the next sentence.

So it’s generating this first sentence without knowing what it’s going to say in the next sentence. To be fair, it is going to use the information from the first sentence it generated to generate the next one. But the big takeaway here is that right now, as I’m speaking to you, I have an idea in my head.

I’m thinking to myself, how am I going to convey this idea to you in a way that you understand, using words? It’s an idea that exists in my mind. That thought process is not at all how AI functions. AI is simply using what it’s been trained on. It’s [00:23:00] seen patterns of words in different orders, and it’s trying to string together a very likely pattern of words given the prompt that you started it off with.

And that is all it’s doing. So, for everybody who’s scared that AI is going to take over the world, you don’t need to be scared. But also for people who think that AI could make a decision for them, you need to understand that that’s not the way that the technology was designed. That’s not what it was designed to do.

There are other machine learning models that you train on a lot of data to predict whether an image has a cat in it or not. That’s what it does, and that is a good decision-making tool.

But that’s not the point of generative [00:24:00] AI. You shouldn’t be asking it yes-or-no questions, for example. You can ask it to explain things and teach you about the history of something, but don’t ask it to make determinations.
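The word-by-word loop walked through above can be mimicked with a toy model. This sketch swaps in a bigram counter, which only looks at the last word (a real LLM conditions on the whole sequence and works on tokens rather than words), but the outer loop is the same idea: predict one next word, append it, and feed the longer sequence back in. Nothing in it plans ahead; each step only picks the single most likely next word. The corpus and function names are made up for illustration.

```python
from collections import Counter, defaultdict
from typing import Optional


def train_bigrams(corpus: list[str]) -> dict[str, Counter]:
    """Count which word follows which in the training sentences."""
    follows: dict[str, Counter] = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for current, nxt in zip(words, words[1:]):
            follows[current][nxt] += 1
    return follows


def next_word(model: dict[str, Counter], sequence: list[str]) -> Optional[str]:
    """Greedily pick the single most likely next word (toy model: uses only the last word)."""
    candidates = model.get(sequence[-1])
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]


def generate(model: dict[str, Counter], prompt: str, max_new_words: int = 5) -> str:
    """Autoregressive loop: one prediction per step, appended and fed back in."""
    sequence = prompt.lower().split()
    for _ in range(max_new_words):
        word = next_word(model, sequence)
        if word is None:  # the toy model has never seen anything follow this word
            break
        sequence.append(word)
    return " ".join(sequence)


corpus = [
    "the chicken crossed the road",
    "the dog crossed the road",
    "the cat crossed the road",
]
model = train_bigrams(corpus)
print(generate(model, "the chicken"))
```

Starting from “the chicken,” the loop adds “crossed,” then “the,” then “road” (the most common word after “the” in this tiny corpus), and stops because nothing ever followed “road.” At no step does the model know what it will produce next.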

Luke: Right. Right. No, that’s great. That’s great. I appreciate you taking the time to do this. I think these are important nuances that, you know, a lot of people don’t understand.


So, as Chief Data Scientist at the New York City Council, how do you envision the role of AI evolving in government and policymaking over the next two to five years, or any amount of time that feels reasonable and not like a joke?

Alaa: I don’t think AI is going to be making policy, but it’s going to be a tool to help policymakers be more efficient in what they do. This goes back to what we discussed earlier about decision-making: it should not be used for making the decisions themselves. If you really want to make or motivate decisions in government and policymaking, that’s where you want to use [00:25:00] statistics.

Because we can crunch the numbers, we can understand why the decision was made, or why the numbers are telling us to take one decision over another. Using generative AI to do that is not the right approach.

Luke: Cool. Well, we covered a ton of stuff today. Is there anything we didn’t cover that you think our listeners might like to learn or know about what you’re doing or anything you just want to put out there?

Alaa: Not much. I’m currently developing a LinkedIn Learning course on RAG. Essentially, I’ll be walking through how to develop a local model on your own machine using purely open-source technologies, with a little bit of light theory and the application, you know, the coding that goes with it.

That’ll probably be coming out sometime in the middle of next year.

Luke: Cool. Cool. Awesome. And where can people find you if they want to follow your work [00:26:00] or, or say hello or learn more?

Alaa: I don’t have any good outlet, but LinkedIn is probably the best.

Luke: Perfect. Okay. That’s good enough, man. Right on. Well, I really appreciate you taking the time, Alaa, to come here and talk to us today. What you’re doing is super interesting work, and I’m looking forward to seeing where it goes.

And I’d love to check back in on it, too, and see how things are going.

Alaa: Absolutely. Thank you so much for taking your time today.

Luke: Alright, man. Thanks, man.

Luke: Thanks for listening to The Brave Technologist podcast.

To never miss an episode, make sure you hit follow in your podcast app. If you haven’t already made the switch to the Brave browser, you can download it for free today at brave.com and start using Brave Search, which enables you to search the web privately. Brave also shields you from the ads, trackers, and other creepy stuff following you across the web.


Show Notes

In this episode of The Brave Technologist Podcast, we discuss:

How the New York City Council team leverages AI and data science to drive evidence-based policymaking
The importance of transparent, statistically driven legislation
The debate between cloud-based vs. on-premise AI solutions in government
Common misconceptions about generative AI and decision-making
The future of AI in government and its role in shaping policy

Guest List

The amazing cast and crew:

Alaa Moussawi - Chief Data Scientist for the New York City Council
Alaa Moussawi serves as Chief Data Scientist for the New York City Council, leading teams of data scientists and software engineers. His data team analyzes issues like pay equity, rats, and school bus delays to support data-driven legislation, emphasizing transparency and reproducibility. The software team works to streamline council workflows, automate tasks, and improve efficiency, developing open-source tools like CRMs, dashboards, and paperless hearing systems. They’ve deployed machine learning models, including a RAG model for simplifying legal research. Dr. Moussawi is also producing a LinkedIn Learning course on developing RAG models in secure environments.

About the Show

Shedding light on the opportunities and challenges of emerging tech to make it digestible, less scary, and more approachable for all!
Join us as we embark on a mission to demystify artificial intelligence, challenge the status quo, and empower everyday people to embrace the digital revolution. Whether you’re a tech enthusiast, a curious mind, or an industry professional, this podcast invites you to join the conversation and explore the future of AI together.
