Balancing Efficiency & Security: AI’s Transformation of Legal Data Analysis


What is the role of AI in Legal Research and Data Security? We spoke to Matt McKeever, CISO and Head of Cloud Engineering at LexisNexis, a company that uses GenAI and custom LLM models to help its customers with legal research, guidance and drafting. Matt spoke to us about the intersection of cloud engineering, cybersecurity and the revolutionary impact of Generative AI (GenAI) in the legal sector. He shared how LexisNexis leverages GenAI to enhance legal research, draft legal documents and summarize cases efficiently. We learn about the importance of data security in AI applications, especially in the legal industry, and the role of custom Large Language Models (LLMs) in securing and processing legal data.

Questions asked:
00:00 Introduction
00:26 LexisNexis use case for GenAI
02:37 Amazon's Generative AI services
03:24 Cybersecurity Threats when using GenAI
05:14 Where to get started with Security in GenAI?
06:53 Balancing Security and Innovation
08:20 Business reason for GenAI
09:13 Lessons from working with GenAI
11:14 Having Custom Large Language Model
13:42 Impact of AI on Cloud Security Roles
14:50 Get Started with Custom Large Language Model
15:48 Fun Questions
17:49 Where to connect with Matt McKeever?

Ashish Rajan: [00:00:00] Welcome to Cloud Security Podcast. We have Matthew, and I will let Matthew introduce himself to the audience.

Matt McKeever: So my name is Matt McKeever. I'm the Chief Information Security Officer and Head of Cloud Engineering at LexisNexis.

Ashish Rajan: That's a very interesting role. You have security and engineering under the same wing.

Matt McKeever: Yeah, it's a great combination. A lot of work, a lot of effort, a lot of overlapping priorities sometimes. But in the end, I have two really solid teams under me that help support the cloud engineering and the security side.

And we do a lot of collaboration across the business stakeholders as well.

Ashish Rajan: That's awesome. And I think it'll be interesting to hear your perspective on how the two merge together. Now, specifically, we're talking about Gen AI and obviously how you guys have been using it. Could you share the use case that, as you mentioned, was highlighted in the AWS re:Invent keynote? Let's just start with what's the use case.

Matt McKeever: We've been using AI for many years in our product. We have petabytes of data, and our customers use us as a search engine, basically to search our products. We do other things too, practical guidance, helping them draft certain documents.

With Gen AI, now we have a much more robust search engine. We can do searching. We can do drafting of legal [00:01:00] arguments, legal briefs, legal consent decrees. We can do summarization of our cases. We have many cases, and if you want to summarize a case, we can give you the summarization of that case. You can also upload your own document and we can summarize your document for you, and then you can ask general questions on that.

And you can roll that up multiple times, all with GenAI. The trigger behind all that is we have our own legally trained Anthropic model that we trained ourselves. Oh, wow. So we use that as one of our bedrock, no pun intended, with AWS Bedrock, to actually leverage that.

Ashish Rajan: Awesome. And how was this done before? Because I imagine all the paralegals, or at least CISOs in a legal firm, will listen to this and go, isn't that a paralegal's job? Is this more like how Gen AI can be used to do a paralegal's job? Or is it a bit more than that?

Matt McKeever: I think it's more about efficiency. It's an efficiency play at their firm, so they can actually do more. Maybe that associate can now get better research that they hand off to another associate or to the partner. But it's more an efficiency play. They can look across everything.

Even if we give them the summarization of a case, we still give them all the links to that case. [00:02:00] On a search, we still give them all the links to it. They can go and verify it. But basically, it's expediting the use cases.

Ashish Rajan: Awesome. The only reason I thought that is because I've been watching this show called Suits, and I remember seeing this: an entire episode is based on someone just going through mountains of documents, trying to figure out which legal, I won't say loophole on this podcast, but which legal way you could use so that you can get a client out safely or win a case. It's just one of those ones where, instead of spending an entire episode, they could literally have one query and that could be solved, right?

Matt McKeever: Yeah, it's been that way for a while though, but yes, this makes it even more important.

Yeah, awesome.

Ashish Rajan: And I think I want to talk about your journey, because obviously the work that you guys have been doing has been mentioned in the keynote by Anthropic as well. In terms of how this came into being, what are some of the services that you're using in the background from Amazon that you can talk about?

Matt McKeever: Yeah, we've been on Amazon for many years. We've been using Amazon for a while. A lot of their services are ingrained in our environment. More recently, we've been using Bedrock and their Bedrock instances. Yeah. And we use a couple [00:03:00] different models. We use the different Claude models that they have available.

And really, the key thing for our data science team and the cloud engineering team is, which model should we use for which service? There's a cost difference between the different models, there's a speed difference. And so we've been using Bedrock since it came out, probably right before it came out.

And other AWS services, a plethora of them. We just use whatever is right for the job.

Ashish Rajan: Oh, awesome. And would you say, in terms of building your own model, does the threat level change? Because a lot of the conversations that I have with CISOs are around Gen AI, or, I'll start with the very first ChatGPT that changed a lot of people's lives last November.

A lot of conversations that I'm having with CISOs are around the fact that I can't trust this, I need to have a proxy or some way of just managing what's being sent out. Has that changed the threat level that you guys had before, now that you're more into Gen AI and using your own models?

Matt McKeever: Good question.

I don't think it's changed a whole bunch, for the main reason that we were using AI before in a closed system. So when we're using Bedrock, it's still a closed system. We've [00:04:00] trained the model ourselves. Our customer queries, our customer usage of it, doesn't change the model. So it's still our model.

It's still our content, and it goes against our content. So it's all a closed-loop system, with really tight security around that. Whatever the customer types in, there's some prompt engineering that does some filtering on it before it goes to the large language model, and then it comes back again and queries against our content.

There's a famous case in New York where a lawyer submitted a fake case, a case that actually didn't exist. That will never happen in our system, because we're going against our own content that we've had for many years.
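To make that closed loop concrete, here is a minimal sketch of what a retrieval-grounded flow like the one Matt describes could look like, assuming boto3 and an Anthropic Claude model hosted on Amazon Bedrock. The filter rules, the content lookup, the model ID and the URLs are illustrative placeholders, not LexisNexis's actual implementation.

```python
# Sketch only: filter the query, retrieve from your OWN content, answer with Claude on Bedrock.
import json
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")
MODEL_ID = "anthropic.claude-3-sonnet-20240229-v1:0"  # hypothetical model choice

def filter_query(raw_query: str) -> str:
    """Basic prompt hygiene before anything reaches the model."""
    cleaned = raw_query.strip()[:2000]                  # bound the input size
    for banned in ("ignore previous instructions", "system prompt"):
        cleaned = cleaned.replace(banned, "")           # crude injection screen
    return cleaned

def retrieve_passages(query: str) -> list[dict]:
    """Hypothetical lookup against the firm's own indexed case content.
    In practice this would hit an internal search index, never the open web."""
    return [{"case_id": "example-123", "text": "…relevant excerpt…",
             "link": "https://internal.example/cases/example-123"}]

def answer(raw_query: str) -> dict:
    query = filter_query(raw_query)
    passages = retrieve_passages(query)
    context = "\n\n".join(p["text"] for p in passages)
    prompt = ("Answer using ONLY the case excerpts below. "
              "If the excerpts do not cover the question, say so.\n\n"
              f"Excerpts:\n{context}\n\nQuestion: {query}")
    body = {"anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 1024,
            "messages": [{"role": "user", "content": prompt}]}
    resp = bedrock.invoke_model(modelId=MODEL_ID, body=json.dumps(body))
    text = json.loads(resp["body"].read())["content"][0]["text"]
    # Always return the underlying source links so a lawyer can verify the answer.
    return {"answer": text, "sources": [p["link"] for p in passages]}
```

Because the model is only asked to answer from retrieved excerpts, and every response carries the source links, a fabricated citation has nowhere to come from, which is the point Matt makes about the New York case.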

Ashish Rajan: Oh! And I think it's really interesting as well, again, there's so much reference to legal things in TV shows and otherwise. If someone mentions some case from the past which is not an actual case, it shouldn't really matter, because you're just looking at your internal data instead of looking at, hey, what's on the internet about this?

So if Joe Blow decided to put a fake case on the internet, that would not get picked up over here.

Matt McKeever: Correct, it won't be picked up. [00:05:00] And actually we go one level further, we'll Shepardize it, as we call it. So even if the case was overruled, we'll bring that back to you as well. It says, oh, that's bad case law.

That's been overruled. So again, we've been doing that for many years. Gen AI just makes it a little more efficient and you get better search results.

Ashish Rajan: I think this is very interesting because your role is both security and engineering. I find that really fascinating from a people, process, technology perspective as well, because I believe, at least on Cloud Security Podcast, we have a lot of security audience who are probably trying to gear up with, how do I react to GenAI?

How do I start building my own LLMs? How do I secure my LLMs? In terms of how you guys went about thinking from a talent perspective, I don't even know where you would start this process of, oh, today we're going to be a GenAI company for months to come. How would someone start? What are some of the challenges, or some of the things you would ask them to consider as you go through this? Because you guys have been doing this for some time already.

Matt McKeever: Yes, we've been doing it for a while. From an infrastructure perspective, my role is very different.

I was the CISO for much longer than I had been doing cloud engineering, but I grew up as an [00:06:00] infrastructure guy. So it was always infrastructure security. Oh, okay. And then, a year or so ago, we made some changes and I have both. With two strong teams under me, that helps me a lot.

But really, I make it so that they collaborate. Before, the collaboration could have been better, so I'm not forcing that collaboration. From a security perspective, you look at Gen AI like any other new technology: what are the threats, what are the risks, what do you see, right?

In our case, it's following the data, following our customers' data. Our customer queries are very confidential, right? What a lawyer is searching on is obviously very confidential, they don't want that leaked out. So it's really following that through, and then our data coming back, what we give back to them.

It's all closed-loop, it's all segmented out, so we know what's talking to what. And that's key. But I think there are standard building blocks in security, right? It's authentication, it's encryption, it's the right architecture. Who's accessing what? How's that data flowing?

So really, it's the same basic security building blocks going forward.

Ashish Rajan: And in terms of building a team around this, because I don't know how many people are even experienced in [00:07:00] this, I imagine it's a lot more learning as you go as well. What was the thing that you found worked to balance the risk? Because this is like a number one challenge for a lot of companies.

I want to be creative, I want to be innovative, but at the same time I want to be secure as well. Is there something that works from a balance perspective?

Matt McKeever: In our world, we have all these data scientists and data engineers who have all these great ideas. In security, we always want to say no, we're restricted.

But rather than just saying no, now we work with them. We develop a really strong collaboration, and we work with those teams from the beginning. So right at the very beginning, we're going to do Gen AI, there's ChatGPT, and we're worried about hallucinations. Where's our data going?

What's going on outside? And we work with architecture collaboratively, because we're all learning together, right? Make it closed-end. And then we're worried about people poisoning our prompts, or sending poisoned prompts, but we have another filter in front of that.

Whatever comes in, we're actually filtering that out. Whatever's going into our system, we know what's going into our system. It's, again, the principle of check your inputs in programming, it's not much different, right? It's really those basic building [00:08:00] blocks.

And we've got to break it down and then understand the risk. There are a couple of unique risks in there. You have to watch your model: how's your model doing, who has access to your model, are you in a closed-end system? And that's where, I think, Bedrock does that for us.

Bedrock is closed-end. It's not going outside. It's all in our endpoint. Only we can talk to our endpoint, based on the private endpoints we have and how we've configured our AWS environment.
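For readers wondering what the private-endpoint piece could look like in practice, here is a minimal sketch of creating a VPC interface endpoint (PrivateLink) for the Bedrock runtime with boto3, so model calls never traverse the public internet. The VPC, subnet and security group IDs are placeholders, and the exact Bedrock endpoint service name should be confirmed for your region; this is an illustration, not LexisNexis's configuration.

```python
# Sketch: keep Bedrock traffic inside your own network with a VPC interface endpoint.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

resp = ec2.create_vpc_endpoint(
    VpcEndpointType="Interface",
    VpcId="vpc-0123456789abcdef0",                          # placeholder VPC
    ServiceName="com.amazonaws.us-east-1.bedrock-runtime",   # assumed service name
    SubnetIds=["subnet-aaa111", "subnet-bbb222"],            # placeholder subnets
    SecurityGroupIds=["sg-restrict-to-app-tier"],            # placeholder SG
    PrivateDnsEnabled=True,  # SDK calls resolve to the private endpoint automatically
)
print(resp["VpcEndpoint"]["VpcEndpointId"])
```

Paired with an endpoint policy and IAM conditions, this is one way to enforce the "only we can talk to our endpoint" property Matt describes.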

Ashish Rajan: I think it's maybe worthwhile calling out, could you share some of the business reasons to go into the Gen AI space?

I would imagine, further to what you said before, it used to take hours. Is there more productivity? What led you guys down the Gen AI path for your business?

Matt McKeever: We've been cutting edge, well, maybe not cutting edge, but from a technology perspective our data science teams were always looking at new models, new AI models.

And back a year or so ago, when they got wind of it, they started playing around with it, they started thinking, this could work. And then in January, February, March, we started getting a lot more momentum from a business perspective. We're a data company.

It's a game changer for a data company. [00:09:00] And from a business perspective, we said this will be a game changer. So lots of resources were applied to it, validating, working through customer feedback. And then the product comes out. We still get customer feedback on that product. We're still looking to improve it.

Ashish Rajan: One of the things that I wanted to cover as part of this conversation was also the changing security landscape and working in the GenAI space, but also one of the things that was called out at the keynote was the shortage of talent in general in cloud.

There are also themes around the fact that people should have a data security strategy of some sort. Is there any advice for people, and I don't know if this is something that you guys faced, where a lot of data is already unstructured, there's a lot of data which is unlabeled, and data sprawl is a real thing, as one would imagine? For people who are looking at that now, if you put your engineering hat on, or I guess your engineering and security hat, what was a challenge that you had to overcome that you would pass on as advice to other people: hey, don't do this, because it just doesn't [00:10:00] work?

Matt McKeever: From the product side, we had it pretty covered. The other side is internal use, our enterprise use, how our internal employees want to use ChatGPT. That's always a risk. So we did a lot of education. We didn't lock it down, but we educated people a lot on how to use it and when not to use it.

Basically, don't use it for anything with our customer data, our business data, any internal data. A lot of education, because if you try to lock it down, I think you're just going to create more problems. It's going to get out, and now people are going to try to break out, right?

A lot of continued education around that. And actually, we've opened up, we're piloting it now, an internal ChatGPT-style tool, against a closed-end model as well, so you can start using it for things like how to do a marketing campaign, or how do I do something?

So there are use cases where you can actually just use the same internal Bedrock connections, for internal use of it. We do a lot of Office 365 on the enterprise side. But it's really about giving people those closed-end solutions that they can use with our data. Whether it's customer data, whether it's your email, whether it's meeting minutes.

How do I summarize, how do I [00:11:00] change my code, whether it's CodeWhisperer. You've got to give them the happy path, the right way forward, because if you don't give them a happy path, they're going to find a path. They will find their own path. So really, make the safe path easy for them.

And so that's what we're doing. We have, like I said, this internal proxy we're using, like ChatGPT.

Ashish Rajan: So Cloud Security Podcast has security folks in the audience as well, and there are not a lot of people talking about the use of custom LLM models that they've created themselves, and how they have been able to successfully use them in a product as well.

In terms of the skills that would be required in a team, and specifically in a cloud security team, for lack of a better word, because of the Cloud Security Podcast. How different would it be? Because, to what you called out, you have been using AWS for a long time. Has the role of that team dramatically changed, apart from, hey, it's a new service that I'm trying to protect?

Has that dramatically changed from pre custom LLM to now that you guys are using your own LLM models?

Matt McKeever: Yeah. The one thing that we didn't quite understand, and we learned quickly, is capacity planning and cost modeling of it. Because it's easy to set up. Infrastructure-wise, you point at an endpoint, you're [00:12:00] done, use it. But within Bedrock there's token-based pricing, which is, you pay per token. But it may not be guaranteed. So if you hit a spike, you may have to peel off and retry. In an online transactional search business, between nine and five it's peak U.S.

It's the world peak. We can't risk a customer being down. So we have to buy extra. So there's a provisioned model, and really now it's managing the traffic across that. It's more capacity planning, very similar to how we do our cloud-based costs. I have a cloud business office and they look at all the financial models and make sure we're getting the right price out of it, our usage and what we're paying. And now with this it becomes, what should be token-based? What should not be token-based? We have our finance guys involved also: what's the financial model as we're using more capacity, as we're growing, as usage is growing?

When do you add more models? Do you add more models? Do you use a different model and token base? The calculus of that becomes rather interesting.
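To illustrate the calculus Matt is describing, here is a back-of-the-envelope sketch of the on-demand versus provisioned trade-off. All prices, rates and traffic volumes below are made-up placeholders; substitute current Bedrock pricing for the model you actually use.

```python
# Sketch: compare pay-per-token (on-demand) with reserved (provisioned) capacity for a month.

def monthly_on_demand_cost(tokens_in: int, tokens_out: int,
                           price_in_per_1k: float, price_out_per_1k: float) -> float:
    """Pay-per-token cost for a month of traffic."""
    return tokens_in / 1000 * price_in_per_1k + tokens_out / 1000 * price_out_per_1k

def monthly_provisioned_cost(units: int, hourly_rate: float, hours: int = 730) -> float:
    """Provisioned throughput: you pay for reserved capacity whether you use it or not."""
    return units * hourly_rate * hours

# Hypothetical month: 2B input tokens, 500M output tokens, placeholder prices.
on_demand = monthly_on_demand_cost(2_000_000_000, 500_000_000,
                                   price_in_per_1k=0.003, price_out_per_1k=0.015)
provisioned = monthly_provisioned_cost(units=2, hourly_rate=40.0)

print(f"on-demand:   ${on_demand:,.0f}/month")
print(f"provisioned: ${provisioned:,.0f}/month")
# The crossover point, plus the need for guaranteed capacity at the 9-to-5 peak,
# drives the "which model, which pricing mode" decision.
```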

Ashish Rajan: Wow. I [00:13:00] hadn't even thought about that. Because, to what you said, now that I think about it, ChatGPT and others talk about, oh, we have this token limit.

That's why they talk about that. That's where ChatGPT talks about the fact that it's a huge cost to run every single query. It's because of the tokens, like the amount of compute required per token, right?

Matt McKeever: So that's your tokens in and tokens out. But you can go to the provisioned model that Amazon has in Bedrock, which has a certain peak tokens per minute.

Oh, okay, but that's your peak per minute. Yeah. So you have to plan for the peak.
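A tiny sizing sketch of that "plan for the peak" point: how many provisioned units you would need if each unit guarantees a given tokens-per-minute ceiling. The per-unit throughput and headroom figures are placeholders, not published numbers.

```python
# Sketch: size provisioned throughput from your peak tokens-per-minute.
import math

def units_for_peak(peak_tokens_per_minute: int,
                   tokens_per_minute_per_unit: int,
                   headroom: float = 1.2) -> int:
    """Round up, with some headroom so retries and spikes don't get throttled."""
    return math.ceil(peak_tokens_per_minute * headroom / tokens_per_minute_per_unit)

print(units_for_peak(peak_tokens_per_minute=600_000,
                     tokens_per_minute_per_unit=200_000))  # -> 4
```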

Ashish Rajan: Oh, so it's not like you could have one of those instances for five minutes and then you're done by the morning. You'll be planning ahead for it.

Matt McKeever: Yeah, it's early days. It'll be interesting to see where Amazon goes with it. Everyone's trying to push on that. Like, I only want it eight hours, nine to five, and I want a lot of them.

Ashish Rajan: And I think it's worthwhile calling out, this is not like it's been there for years, right?

We're all, at this point, at almost a new frontier, for lack of a better word. Now we're all trying to figure out the best way to approach it, and as more business use cases keep coming in, I think everyone will start improving. So it's really good to at least see that. [00:14:00] From a skill set perspective, would you say the threat landscape didn't really dramatically change because it was more data focused? And with the cloud security roles, in terms of the monitoring and all of that, you didn't see a dramatic change either, it was primarily around that?

Matt McKeever: Not a dramatic change. Like I said, it's security architecture; that hasn't changed in a whole bunch of years. Validating: are we using a private endpoint? We're not going out to OpenAI's public endpoint, right? We make sure we're closed-end. Again, basic blocking and tackling from a security perspective.

The data engineering is obviously completely different. And then, as I said, there's the cost-based aspect of things. My team has the AWS cost management, and I work with the finance team. How do we manage that? Are we using it? When should we go up, when should we go down, can we move some workloads to off hours? Certain workloads can run nights and weekends so we don't have to do them during the day, and we can just use the capacity we already have, to be as efficient as possible.
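One hedged sketch of that "move some workloads to off hours" idea: batch jobs such as bulk summarization only run when the provisioned capacity is not needed for daytime interactive traffic. The peak window, time zone handling and thresholds are assumptions for illustration.

```python
# Sketch: defer batch GenAI jobs to off-peak hours so daytime traffic keeps the capacity.
from datetime import datetime, time

PEAK_START, PEAK_END = time(9, 0), time(17, 0)   # assumed 9-to-5 U.S. peak window

def should_run_batch_now(now: datetime | None = None) -> bool:
    now = now or datetime.utcnow()
    in_peak = PEAK_START <= now.time() <= PEAK_END and now.weekday() < 5  # Mon-Fri
    return not in_peak  # defer batch work to nights and weekends

if should_run_batch_now():
    print("run bulk summarization against the provisioned endpoint")
else:
    print("defer: keep capacity free for interactive traffic")
```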

Ashish Rajan: Awesome. I think I've got one more question and that's towards the tail end of this now. So any CISO or CTO who's listening to this conversation and probably is working on a custom LLM of [00:15:00] some sort.

What's a good starting point for them to start working on something like this? And they might be like what you said, they've been in cloud for a long time. And what would you say would be a good starting point from a maturity perspective? Hey, this is a good starting point. And this is the next milestone after that you work towards.

Matt McKeever: I think it's, define your use case. It's a new tool in your toolbox, right? Just because you have a hammer, not everything is a nail. But really look at your use case and really balance out which model you need. They're all different pricing, they're all different capacity.

Which do you need? It's probably a mix of them. And then continue to grow, and don't be afraid to adjust and pivot, because it's ever changing. It is a new technology, a new frontier, maybe, for some. For us, it's just a new technology that's really taken off. Just be agile in your approach. Don't be afraid to back off and do something different.

Ashish Rajan: Interesting, definitely go agile.

I would definitely say it's one of those ones. Actually, I've got a fun question for you coming up. So that's most of the technical questions. I've got three fun questions for you, not technical really, just to get to know you a bit [00:16:00] more. First one being: what do you spend most time on when you're not working on technology or cloud or security or engineering?

Matt McKeever: There's not a whole bunch of that time. Really just vegging. I've taken up golf recently, so I try a little bit of golf here and there. Just vegging, going to restaurants, I'm a foodie, and in the summertime, the beach.

Ashish Rajan: Okay to share your handicap? What's the handicap level there?

Matt McKeever: I don't keep a handicap. Not quite there yet.

Ashish Rajan: Yeah, fair enough. As long as the ball goes in, I don't really...

Matt McKeever: I'm in two digits, I'm below a hundred, but I don't really keep a handicap.

Ashish Rajan: Oh I'm more of a mini golf person. I'm just trying out one of those little lane ways as well. Second question, what is something that you're proud of, but that is not on your social media?

Matt McKeever: Proud of, and not on my social media. I think on the personal side, I've got two kids who are really successful. My daughter is down in Miami doing some marketing stuff that she never thought she'd be doing, and she's doing really well. And my son's an actuary, which I never thought he'd be, because he was one of those kids who didn't stay in high school, and he just really took off. So I'm really proud of where they've gone.

Ashish Rajan: I'll make sure this is shared with your kids so that you get some brownie points. Last one, what is your favorite cuisine or restaurant that you can share? [00:17:00] Thai food. Thai food? Number one? Oh, awesome. I'm going to add a fourth question, which wasn't there in the beginning.

Since you've been in the space for a long time, and you've come from the infrastructure space, like myself, what is something that you miss about the Waterfall days?

Matt McKeever: I really miss the pace. It was so much more laid back. It's going to come, and you have time. Now it's constant. It's changing, it's going.

It was just interesting, you were never really bored. But as I told someone, I'm busier today than I was yesterday, and that's been going on for a while. But it's interesting. I think we're just changing so fast, and you have to be able to change. Because back in the waterfall days, maybe it was set it and forget it.

There was a path, and, oh, it didn't work out, and it didn't matter.

Ashish Rajan: No, thank you for sharing that.

That was definitely something, I'm going to keep remembering that one. I appreciate that. Thank you so much. For people who want to know more about, hey, I'm building an LLM model, I want to know what security and engineering looks like,

Where can people reach out to you?

Matt McKeever: I'm on LinkedIn.

Ashish Rajan: Okay. I'll put that in the show notes as well. Thank you so much for coming on the show. Excellent. Thank you very much. I appreciate it. No problem. Thank you. Thanks everyone. See you next episode.