Breaking and Building Serverless Application Security

Serverless Series

Nov 21, 2021

•

Season Two


Episode Description

What We Discuss with Andrew Krug:

00:00 Intro
07:10 What is Serverless Security?
08:20 Building Blocks for Serverless Security
16:40 Foundational Security Pieces for Serverless
18:59 Adoption of Serverless
20:41 Serverless Security
24:35 Incident Response and Monitoring
26:06 Attack Scenarios for Serverless
29:06 WAF and Serverless
32:05 Content Security Policies
35:34 Starting point for serverless
37:38 Is Serverless Cloud Agnostic?
40:39 Benching for CSPs
45:39 Skillsets for Serverless Security
48:35 Where do companies start with Serverless?
50:22 The Fun Section and few last questions

THANKS, Andrew Krug!

If you enjoyed this session with Andrew Krug, let him know by clicking on the link below and sending him a quick shout out at Twitter:

Click here to Thank Andrew Krug on Twitter!

Click here to let Ashish know about your number one takeaway from this episode!

And if you want us to answer your questions on one of our upcoming weekly Feedback Friday episodes, drop us a line at ashish@kaizenteq.com.

Resources from This Episode:

Tools & services, discussed during the Interview
https://infosec.mozilla.org/
https://observatory.mozilla.org/analyze/
SAS Top 10 – https://github.com/puresec/sas-top-10


Ashish Rajan: [00:00:00] Hey, thanks for coming in, Andrew. How’s it going, man?

Andrew Krug: Hey, it’s going real good. Thanks for having me on.

Ashish Rajan: Not a problem. I need to talk to people about the coffee love. So actually tell me about the coffee you’re having.

Cause I think people should know about this as well.

Andrew Krug: So today I’m having a cup of Case Coffee’s Epiphany. It’s a local coffee roaster here in the area that I’m in. I brewed it in my small Chemex here, so I’m really serious about the coffee. But this is actually a really special roast for me, because this coffee shop is where I decided that I was going to really lean into AWS and do things like incident response automation and open source and all that fun stuff.

And it’s just a great roast. I think it’s available by subscription online at casecoffeeroasters.com, if you like to order small-batch roasts. They’re awesome.

Ashish Rajan: Awesome. So wait, this is probably a good segue into the question of how you got to where you are.

Maybe you can start with the coffee place if you like, but for people who don’t know who you are, can you give us a short intro on how you got into cybersecurity, or [00:01:00] cloud security, I guess?

Andrew Krug: Yeah. So for those of you that don’t know me, my name’s Andrew Krug. I’m a security evangelist at Datadog.

I came at this role kind of the long way around, I guess you would say. I came up from traditional data center work and entered the field right at the tipping point where data centers were turning virtual. So, believe it or not, I started my career migrating Novell NetWare VMs into Windows.

Yeah, I was a GroupWise administrator for a while, boots on the ground, on-prem. Then fast forward to 2012 or so, I made a conscious choice to pivot into cloud security because I saw an explosive growth curve in AWS, and it was still pretty early days for them.

But then I went one more step while I was at this coffee shop with a friend of mine, Joel. He works at Google now, I think. We were having a cup of this coffee and talking about incident response automation, how all the APIs existed and [00:02:00] we could probably do this thing.

So that’s when we started to work on a couple of open source projects under ThreatResponse, the GitHub org that we had for a while: margaritashotgun and aws_ir, which we presented a few times at re:Invent and other places. That was really the path into AWS for me.

And I think to some degree for him as well. And it’s just been a wild ride since then.

Ashish Rajan: Wow. And to your point, for people who may not know this, you’ve actually given a couple of AWS re:Invent talks on incident response as well. I don’t know if you’re coming this year, but I’ll let you talk about it if you can.

It’s always good to see your talks. We had Nathan Case as well, who did a talk with you ages ago. He was here on the show too.

Andrew Krug: Yeah. So I am speaking at re:Invent again this year. Ironically, I am speaking with Nathan Case again. The combination seems to be popular.

In the past, I’ve also spoken with Henrik Johansson, formerly of AWS; I think he’s with Salesforce now. But Nathan and I are going to be talking about something we’ve never talked about before, which is mergers and acquisitions, [00:03:00] and really how to evaluate what you need to do as part of an M&A strategy when you have to grow a business really, really fast.

Ashish Rajan: Wow. Okay. That would definitely be interesting, because we haven’t touched that topic. So maybe I should bring you back in for that as well; that’d be pretty cool. Thanks for sharing that. All right, let’s get into this, man. We spoke about your path into cloud security, but we are talking about serverless security and trying to break it, and hopefully helping some people build some security around it as well.

What does serverless security mean for you? Let’s start there.

Andrew Krug: So I think we have to start with how we define serverless, right? This is a topic that’s near and dear to me, because I’ve been talking about serverless security since 2017 or so.

When we say serverless, oftentimes we mean function as a service, and that’s compute, but we lump a whole bunch of different things into this concept that we call serverless. You could fit anything that’s managed in there, but most of the time people really mean function as a service.

So for me, that means: how do we safeguard the runtime of the [00:04:00] function? What’s happening to the actual sandbox that’s running the code, what’s happening with the code, and then what’s happening with all the stuff that gets the request to that code, whether it’s a web or a backend thing.

Ashish Rajan: Okay. And to your point, the separation helps. Maybe it’s a good segue into what the building blocks are as well, because serverless security can be a bit confusing for a lot of people. Even with that front-end and back-end separation, where is a good place to start with building blocks for serverless security?

Andrew Krug: I think a good starting point is actually just to adopt an opinionated framework, whatever that might be. There’s Serverless Framework, for example. For a while there was a variety of other projects, but I think Serverless Framework, serverless.com, has emerged as the clear market leader, or there’s the Serverless Application Model. Just starting with that framework to build on top of is a good starting point.

But then you also have to do as much as you can to understand what it is that you’re building on top of. So ingress, runtime execution, and then how that’s accessing [00:05:00] other things inside of your cloud environment.

Ashish Rajan: Oh, so if I zoom out a bit, to your point, serverless can be triggered by an event, or it could be a web request.

So that’s where the ingress comes in. What was the other one? Sorry.

Andrew Krug: It was how it actually accesses other things inside of the cloud. Either it’s going to access things inside the cloud via IAM, or it’s going to access them via the network layer. So what’s your boundary?

Ashish Rajan: Oh, okay. You see, that’s an interesting way to put it, because normally when I ask people, hey, what are some of the building blocks, a lot of them go straight to application security. But I love how you put it: there’s a lot more to serverless than just application security, although that is quite an important component. I love the fact that you’ve come at it from a cloud security perspective.

So to your point, there’s ingress, and there’s connectivity, so maybe it talks to a database or another function, and you mentioned storage. A good starting point, then, is almost to do an asset inventory: what else is the Lambda function, or whatever serverless you have, talking to?

Andrew Krug: Yeah. I’d even simplify it a little bit more: [00:06:00] there’s architecture and then there’s AppSec. And don’t get me wrong, AppSec is weighted a whole lot more for me on one side of the scale. But when we talk about this, it has to be almost an equal process: what’s the architecture of the thing that you’re running on top of your serverless infrastructure?

And then what is the application security strategy, which, believe it or not, is really, really similar to the application security strategy for anything that you’re going to run.

Ashish Rajan: Oh, it’s interesting, because the funny thing is I was going to go break first and then build, but I think we went build first.

So maybe let’s talk about breaking this, because I saw you had a talk at BSides, I think ages ago, about hacking serverless. I don’t know if you remember that talk, but it was really interesting to hear the perspective you shared on why people assume a lot of abstraction means more security. So, in the broader scheme of AWS security or Lambda security, what are some of your thoughts on breaking Lambda?

I guess we have two sets of people here. We’ve given them an [00:07:00] understanding of where to start: you have architecture, you have AppSec. We’ve done that. Now let’s start peeling some layers on how you would put cracks into these, I guess.

Andrew Krug: So, back in 2017, when we started that project, the hacking serverless runtimes thing, we presented it at BSides and Black Hat and even Code Monsters in Bulgaria. If you’re familiar with that conference, it’s a super fun, Java-focused developer conference.

It was just super fun to hear from all the people who were building on Lambda and hadn’t really thought about the potential attack surface, because they just assumed the vendor was doing everything for them. They hadn’t thought to assume breach, or assume that there were holes to poke in the runtime.

I think we demonstrated two clear things in that talk. One was that you need to be really, really concerned with how frequently the vendor throws away the container that is executing your code. There’s this idea that you sort of had persistence, but not really persistence: you had persistence for 15 [00:08:00] or 20 minutes, potentially, inside of a function if you just kept invoking it over and over again.

Another thing that we demonstrated was this concept of potential vulnerabilities in shared tenancy. When I say shared tenancy, I mean a whole bunch of functions that are part of the same app, doing the same thing. So we built this payment processor app, for example, on top of Azure Functions.

And we were able to demonstrate that by tricking the runtime, we could get it to decrypt a secret in function B that was only intended to be decrypted in function A, because they hadn’t really considered that inside of the security model for functions inside of the same app.

Ashish Rajan: It’s interesting, because a lot of people don’t even know what runs behind the scenes when they trigger a serverless function.

To your point, it’s usually a container. Is it still a container running at the cloud service provider?

Andrew Krug: So I don’t think that we get to know that in every case, right? Every vendor has varying degrees of transparency. And I think this is one of the things that we could improve.

Maybe we could riff on that a little bit later in the talk here today, [00:09:00] but we have to assume it’s a container of some kind: a Xen VM, a Docker container, a Linux container, Firecracker. I don’t know that we get to know which, but it’s a container of some kind.

Ashish Rajan: Yeah. To your point, there was the whole five-minute limit.

I mean, now you can increase it. That’s what caught my eye about the talk that you did: you said five minutes is plenty of time. I’m like, wait, did this guy just say five minutes is plenty of time to get into this Lambda container in the back?

Andrew Krug: Yeah, five minutes was the original limit and now it’s 15 for the execution. But that wasn’t really the concern. As soon as we figured it out, there were persistence behaviors, this idea of cold starts versus hot containers. And what we discovered over time as an industry, and I really wish this was one of the places where we had more transparency from our vendors, is that for the sake of performance, those containers would never get thrown away.

If you just kept re-executing, you got the same warm running container with the same temp partition and things. So you could do [00:10:00] a lot beyond 15 minutes.
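Editor’s sketch (not from the talk): the warm-container behavior Andrew describes can be illustrated with an ordinary Python handler. The names `handler`, `INVOCATION_COUNT`, and `MARKER_PATH` are hypothetical, and calling the function twice in one local process only mimics two invocations landing on the same warm container. As Andrew says just below, providers do not guarantee this reuse.

```python
import os
import tempfile

# Module-level state is initialised once per container (a cold start) and is
# then visible to every later invocation served by that same warm container.
INVOCATION_COUNT = 0

# Hypothetical marker file in the temp partition, which also survives between
# invocations for as long as the container is kept warm.
MARKER_PATH = os.path.join(tempfile.gettempdir(), "warm_container_marker.txt")


def handler(event, context):
    """Hypothetical function-as-a-service entry point."""
    global INVOCATION_COUNT
    INVOCATION_COUNT += 1

    # Anything an earlier invocation (or an attacker) left behind is still here.
    found_marker = os.path.exists(MARKER_PATH)
    with open(MARKER_PATH, "a") as marker:
        marker.write("seen\n")

    return {
        "invocations_in_this_container": INVOCATION_COUNT,
        "found_marker_from_earlier_call": found_marker,
    }
```

The design point is that neither the counter nor the temp file is reset between calls, which is why code that assumes a clean sandbox on every invocation can be surprised.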

Ashish Rajan: I would definitely encourage people who are listening to try and experiment with this. It’s really interesting.

So, to your point, if I trigger a serverless function and that creates, say, five containers in the back, I would potentially continue to work on those same five containers for as long as I keep triggering the same function. That’s what I keep thinking.

Andrew Krug: Yeah, it’s not guaranteed.

But the goal of the vendor is to provide you as much performance as possible, and the best performance comes from not taking that hit of having to spin up a new container and then load all those dependencies into memory, depending on the runtime. So it’s almost a behavior that, as we lean towards performance for serverless, we can’t get away from.

Ashish Rajan: Interesting. Wait, so we’ve already peeled so many layers on this. We spoke about the building pieces. Then we went into how the container can potentially keep running in the back, and how going from five minutes to fifteen minutes means you now have an even longer window to do what you want in that container.

That is, if you actually had malicious intent behind you. I’ve [00:11:00] got a question here from Chad that sums up quite a bit of it as well: “Thanks for the discussion. Least privilege, controlled access, asset inventory: just trying to decipher the main building blocks myself. What are the main foundational security principles to apply before diving deep into trying to secure serverless?”

That’s a good question, Chad. Thoughts on that?

Andrew Krug: Yeah. I mean, I think leaning into all the principles that you normally lean into is great. I tend to lean into the ones from infosec.mozilla.org, which are a tried and true set of security principles, like least privilege, understand what you’re running on, et cetera, as a starting point for evaluating anything.

But beyond that, with serverless you really have to deep dive and understand what it is that you’re running on. I think that’s very easy in serverless but often overlooked, because the platform is so simple and it’s so easy to get up and running with. Understanding the trade-offs and the drawbacks is something that not everybody does.

Ashish Rajan: Yeah, I think it’s worthwhile calling out, and I’m curious to know your [00:12:00] thoughts, because Chad’s question is quite interesting. When people talk about serverless, yes, from a compute perspective it functions very differently, but as you and Chad both mentioned, the basic principles of security haven’t really changed.

We’re still looking at, say, identity: that’s the least privilege component, who can do what. And because it’s serverless, the serverless function itself can have privileges, so how far can that go? And then having an asset inventory of how many Lambdas or serverless functions are running in the background.

So I think that’s definitely a great starting point: control access to your storage, and so on. Chad is riffing on a lot of interesting building blocks there, which apply at different levels, because I’ve spoken to a few people who have thousands of Lambda functions. And we spoke about how it used to be five minutes but is now 15, and there could be components that are shared across functions.

I don’t think there’s much public knowledge around this. If I sit there and do a [00:13:00] search for serverless security talks, there are not that many out there; a lot of talks are about how people have built something amazing with serverless. Why do you suppose that is? Is it because there’s not much research in there, and even though it’s been around for at least two or three years, adoption is still slow?

Andrew Krug: I think adoption is explosive, actually. I think a lot of businesses are running cloud native on top of serverless functions. I would say it’s probably on par with the adoption curve for something like Kubernetes, but don’t quote me on that. If we look at re:Invent and the companies that they’re putting on stage, or look at where the vendor ecosystem is as far as observability for serverless, you have companies like Twistlock, and of course Datadog; we have amazing serverless monitoring as well.

There’s definitely an investment there on the industry side, as far as putting more tooling in the hands of customers. And that wouldn’t be there if there wasn’t this massive adoption curve, because there is a huge incentive: ease of development, and also NoOps [00:14:00] kinds of strategies for managing these things.

Once you turn them loose, they just sort of go. And I think that’s very attractive from a business perspective.

Ashish Rajan: And to your point, it’s funny you mention observability. We had Ran from Epsagon on last weekend and we were talking about this very thing.

Monitoring serverless is pretty hard, and monitoring at scale is a challenge because you don’t know how far the rabbit hole goes for certain serverless functions. I imagine it’s the same kind of thing with security. We spoke about building blocks and we spoke about how to break it, but it’s worthwhile touching on how people approach security in a serverless environment. Thoughts?

Andrew Krug: Yeah. So I think if there’s anything that we’ve learned as a culture, it’s that the normal practices that are part of AppSec are part of serverless security, because you have fewer and fewer facilities to draw on. So it’s more important to lean into the things that we know are generally best practice, and anything that’s good for observability for [00:15:00] ops is good for security.

When I say that, I’m talking about things that aren’t really obvious to security teams, like function execution time or resource consumption, combined with some sort of anomaly detection. Because if somebody manages, let’s say, to exploit your web service API and they can pivot into a Lambda, that’s going to throw some kind of execution anomaly.

Either it’s going to be memory consumption, or it’s going to show up in a tracing graph. There’s going to be an impact there that you can detect. And I think those things are what we need to lean into going forward. The other side of that is, of course, the more obvious one, the AppSec path: SAST, software composition analysis, dynamic analysis, and also things like application logging, which nobody still gets right as an industry.

There are teams who are very, very good at it, but it’s about leaning into good logging strategies to know what the application is doing, because once it’s done executing, it’s gone. All your forensics, anything that you could go back and look at, is [00:16:00] gone outside of your logs and your performance metrics.
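Editor’s sketch (not from the talk): one very simple way to combine function execution time with “some sort of anomaly detection” is a z-score against a recent baseline. The function name `is_duration_anomalous` and the threshold of 3 standard deviations are assumptions chosen for illustration. Monitoring products use more robust models than this.

```python
from statistics import mean, stdev


def is_duration_anomalous(baseline_ms, observed_ms, threshold=3.0):
    """Flag an execution time far outside the recent baseline.

    baseline_ms: recent execution durations for one function, in milliseconds.
    observed_ms: the duration of the invocation being checked.
    threshold:   how many standard deviations count as anomalous (assumed).
    """
    if len(baseline_ms) < 2:
        raise ValueError("need at least two baseline samples")

    mu = mean(baseline_ms)
    sigma = stdev(baseline_ms)
    if sigma == 0:
        # A perfectly flat baseline: any deviation at all is unusual.
        return observed_ms != mu

    z_score = abs(observed_ms - mu) / sigma
    return z_score > threshold
```

For example, with a baseline hovering around 100 ms, an invocation that suddenly takes 900 ms would be flagged, which is the kind of signal a pivot into the function might produce. The same check can be applied to memory consumption.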

Ashish Rajan: That’s funny. That was kind of the focus last weekend as well, because to your point, people haven’t figured out application logging properly. Would you agree that security logs alone are not enough for monitoring the application from a security perspective?

Andrew Krug: I mean, certainly logging is one of the things that we can do in a serverless environment that provides a ton of value, but we need other metrics about the environment in order to determine whether something is actually breached or not. Because let’s say that you get a full RCE in the running function.

You could potentially unpack or hot patch the actual execution and override some logging. So we need to take a belt and suspenders approach. And again, it’s just leaning into that assume-breach mindset and doing some tabletops that are based on how important the services are that you’re running. That part doesn’t go away just because you’re not building AMIs and running on them or the like.

Ashish Rajan: Interesting.

So I’m glad you brought that up, because you definitely have a lot of [00:17:00] experience with the whole incident response side. By the way, Chad came back saying, “Thanks, Andrew. Understanding the environment where these functions are being executed is a great point you made and something I’ll research more for sure.”

Thanks for that, Chad. I’m glad we could answer your question. Before I ask any more questions: I think your point about AppSec playing an important role in this is key. It sounds like there are multiple components. There is a CSP side, which is your IAM, network and all that.

Then there is the AppSec side, and I’m going to use the word DevSecOps here as well. But from a monitoring perspective, to come back to your tabletop point, can you elaborate on this? Obviously it would depend on the application that people are running, because it’s a PaaS concept. How does a tabletop incident response exercise work?

What do people have to think about from that perspective?

Andrew Krug: I think you have to start with a risk assessment, right? And then do some [00:18:00] scenarios, starting with a maximum impact, maximum risk scenario, and work backwards from that. Your tabletop is going to quickly call out points that you don’t necessarily have data for in whatever platform you’re logging to.

That could be network, for example. A lot of people don’t think about network because these functions are so easy to deploy. In GCP, Azure, and AWS, all the functions deploy in sort of a default hybrid VPC environment that just lets things go to the internet. But if you want to be a little bit more intentional about that, you can deploy those things into a regular VPC, and you can get to things like VPC flow logs, or the equivalent of NetFlow data, or potentially network taps.

And you can have this very rich source of data to help you go back and reconstruct the story of an incident. It’s just taking that same mindset and applying it to every choke point as you work through your application, to see if you have the right instrumentation to be able to tell that story after an incident.
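Editor’s sketch (not from the talk): once a function runs inside a regular VPC, flow log records are one of the data sources a tabletop can lean on. The snippet below parses a record in the default AWS VPC flow log format and flags accepted traffic that leaves a given VPC range. The helper names and the example CIDR are hypothetical, and custom log formats would need a different field list.

```python
import ipaddress

# Field order of the default (version 2) VPC flow log record format.
FIELDS = (
    "version", "account_id", "interface_id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log_status",
)


def parse_flow_record(line):
    """Split one space-separated flow log line into a dict keyed by field name."""
    values = line.split()
    if len(values) != len(FIELDS):
        raise ValueError("not a default-format flow log record")
    return dict(zip(FIELDS, values))


def is_accepted_egress(record, vpc_cidr):
    """Return True when an accepted flow goes from inside vpc_cidr to outside it."""
    if record["log_status"] != "OK" or record["action"] != "ACCEPT":
        return False
    network = ipaddress.ip_network(vpc_cidr)
    src = ipaddress.ip_address(record["srcaddr"])
    dst = ipaddress.ip_address(record["dstaddr"])
    return src in network and dst not in network
```

A function that normally only talks to internal services but suddenly shows accepted egress to an unfamiliar address is exactly the kind of story this data helps reconstruct after an incident.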

Ashish Rajan: Interesting. And I’m so glad you’re talking about this, because a lot of people would [00:19:00] be thinking: great, I’ve got the context of my application, and if I’m trying to break this, what are some of the attack scenarios or attack vectors you can think of for a serverless function?

Andrew Krug: For me, these really fall into two categories: runtime breaches, and application breaches that could pivot to runtime breaches. We talked about this concept that there’s behind-the-scenes serverless, so effectively processors. You’re using them to process messages off a queue or something.

And then there are web handlers. You’re using these to build some API, some function that’s present on the web, internet-facing. So the RCE is easy, right? We assume that something in the app just allows a pivot into the environment itself: SSRF, a plain old regular RCE, a deserialization vulnerability. Those are all just regular security stuff that we do in any red team scenario.

The HTTP ones are a little bit more fun and nuanced, because we’re still figuring this out. [00:20:00] Every provider is implementing their own interface to serverless functions via some kind of gateway. API Gateway is a great example, and it does or does not build on the learnings of the last 10 years or so.

A great example is API Gateway’s support for content security policy. We throw around CSP as a term. Usually it means cloud service provider, but in this case I mean content security policy, so the headers that we send back. API Gateway just doesn’t have robust support for that out of the box.

A lot of people would say, oh, well, for my JSON API I probably don’t really need that, because I’m not serving JavaScript and I’m not serving this or that. But I’ve seen cases in the past where somebody has exploited a web application and they’re able to control content in an error page, for example, and the error page renders HTML.

And because it doesn’t have the same standard for content security policy, you’re able to get a pivot from an error page onto the browser, or get it to do something that’s unintended. So there’s just lots of surface area that I think we have to deal [00:21:00] with, both from a tooling perspective and an observability perspective, to assess, monitor, and keep metrics on the potential attack surface area.

Because that’s the other side of this: it is very easy to make microservices. If you’re in a company that is cloud native, always building on serverless, you might have a thousand microservices, and just assessing how many of those have content security policies on their error pages, for example, becomes a daunting task.
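Editor’s sketch (not from the talk): the “daunting task” of checking many microservices for a content security policy reduces, at its core, to a header check over collected responses. The function `services_missing_csp` and the idea of feeding it a mapping of service name to response headers are assumptions for illustration. How the error-page responses are collected is left out.

```python
def services_missing_csp(responses):
    """Return the sorted names of services whose response lacks a CSP header.

    responses: mapping of service name -> dict of HTTP response headers,
               for example the headers captured from each service's error page.
    """
    missing = []
    for service, headers in responses.items():
        # HTTP header names are case-insensitive, so normalise before checking.
        names = {name.lower() for name in headers}
        if "content-security-policy" not in names:
            missing.append(service)
    return sorted(missing)
```

Keeping the count of services returned by a check like this as a metric over time is one way to track the attack surface Andrew describes.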

Ashish Rajan: Interesting. Wait, no one talks about content security policies. I’m so glad you mentioned it, because normally when I talk to, say, AWS or Google Cloud or Azure, a common theme that seems to come out is: hey, you can put a WAF in front of it. Put a CDN, put a WAF, and that’ll be the end of your problems.

You don’t have to worry about this. What are your thoughts on that?

Andrew Krug: Yeah, I mean, WAFs are great, and they certainly play a role in a defense in depth strategy. But I think in general, as an industry, we have not leaned into content security policies as we should. I used to work at Mozilla, right?

I [00:22:00] worked for a while with April King, who was the creator of the Mozilla Observatory, if you’re familiar with observatory.mozilla.org. It’s just a great tool for scanning for CSPs, and not just for us as an org but also for the internet. April has a great blog article that I think she redoes every year on the Alexa top 1 million sites and how good we’re getting as an industry at adopting general good web security practices, that basic hygiene.

And this is my opinion, and I think it’s April’s opinion as well, and I’m sure if I’m wrong she’ll tweet at me after this, but I think teams only adopt these practices if we make it really easy for them to do that. So it’s a tooling problem as much as it is a security problem. The more we put good tools in front of dev teams to be able to do this, the higher the adoption curve we’re going to have, and the better web security we’re going to have in general as an industry. API Gateway, CloudFront, and WAF in combination don’t make it super easy to do this.

So as far as I know right now, the [00:23:00] blueprint for it, if you want CSP headers, is to do API Gateway, maybe have a WAF in front of that, or maybe have a WAF in front of CloudFront, and then CloudFront is running something like Lambda@Edge to put those CSP headers in as content is served. So that’s a lot of moving parts just to maintain some basic security hygiene.

Ashish Rajan: Wow. Yeah. Talk about defense in depth: it’s complex and complicated at the same time, I guess. Okay. So wait, is it any different with other cloud providers? I don’t know if you’ve worked with other cloud service providers on CSP.
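Editor’s sketch (not from the talk): the Lambda@Edge piece of the blueprint Andrew mentions could look roughly like the handler below, attached to a CloudFront response event. The policy string in `CSP_VALUE` is an assumed, deliberately strict example that suits an API with no browser-rendered content. A real site would need its own policy.

```python
# Assumed example policy; tune this for whatever the service actually serves.
CSP_VALUE = "default-src 'none'; frame-ancestors 'none'"


def handler(event, context):
    """Add a Content-Security-Policy header to a CloudFront response event."""
    # CloudFront passes the response inside the first record of the event.
    response = event["Records"][0]["cf"]["response"]
    headers = response["headers"]

    # CloudFront headers are keyed by lowercase name, and each value is a list
    # of {"key": ..., "value": ...} entries.
    headers["content-security-policy"] = [
        {"key": "Content-Security-Policy", "value": CSP_VALUE}
    ]
    return response
```

Because the header is added at the edge, it covers error pages as well as normal responses, at the cost of one more moving part to deploy and monitor.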

Andrew Krug: So I have basically tested out, kicked the tires on, all the things.

But it varies. There are different places that you can bury CSP headers; there’s a way to bury them in the actual page HTML itself. You don’t necessarily have to put them in the request-response process. But none of it would I call out as, wow, this was just so easy.

Ashish Rajan: Right. So, considering the example that you gave, where someone is able to exploit an error page to take advantage, [00:24:00] if someone had set a content security policy, that would solve the problem, right? I don’t want the whole episode to be about CSP, but for people who don’t know what CSP can help with: is it hard to implement?

Is that why people aren’t implementing it?

Andrew Krug: So I don’t think CSP is hard to implement. I think it’s not well known. A lot of traditional web sec education tends to focus on the actual application stack itself and not what we’re actually doing on the browser side, because for whatever reason, people have this high level of confidence in browser trust. But CSP is still the tool that we have in the toolbox to ensure that the browser is doing what it’s supposed to do, in terms of not allowing things like CSRF, et cetera, from embedded untrusted JavaScript, for example.

I don’t know if anybody here is old enough to remember MySpace and things, but CSP is basically what shut down “everyone is your friend” on MySpace, which was due to malicious script injection in different posts on somebody’s wall. And I’m dating myself.

Ashish Rajan: well, [00:25:00] Andrew, everyone’s going to, I want to date myself. My embarrassing stories of my space as well. So you’re not alone, my friend.

But I think it’s definitely valuable to talk about CSP from that perspective, because when a lot of people talk about serverless, they seem to focus more on the AppSec pieces, and a few point to the downtime. Then there is the code part of it. Then there is, I guess, the component for the infrastructure, which is in the cloud service provider, which is the IAM, network and all that.

And then the other component you kind of touched on was the trigger point, because it’s all about, I guess, how we are interacting with functions. And CSP seems to be quite an important one in there. So we spoke about building blocks. We spoke about some of the common attack vectors as well, which include the trigger point for how your serverless function was triggered. Now, from a maturity scale perspective, right?

So to your point, I think Chad did a good thing by kind of summarizing it, and you as well spoke about having a look at the holistic view, almost like understanding the environment where your function is running. But we live in a world where [00:26:00] unfortunately there are a lot of functions in every environment, right?

To your point, it’s not just one. Even with the background functions, it’s like 20, 30, 50, hundreds of functions running in any one environment. So how do we get our head around something like this? Maybe if you talk about it from a maturity perspective, how does one start today? Like maybe it’s just one Lambda function or one Azure function today, and then you expand. Is there a scale that people can kind of look toward for benchmarking themselves, or a good way you can recommend for monitoring and maybe responding to this?

Andrew Krug: Of course, bringing in some kind of observability. Datadog has a great serverless monitoring product, just to kind of wrap everything under one umbrella where you can see it all in one place. Even if it’s not that, projects like OpenTracing, or Amazon’s X-Ray product, allow you to sort of collect telemetry. Whatever you can do to kind of aggregate all that in one place and know as much as you can about it is a big first step in the right direction.

So if you have functions that are in isolation, in silos that you can’t see [00:27:00] telemetry for, that’s priority number one. Priority number two, I would say, is a little bit harder problem because it’s cultural, right? It’s not treating serverless like something that you don’t have to follow a process for.

And that is including it in things like your regular architecture process and sort of the paved road, if we want to get into DevOps or DevSecOps concepts: making it easy for teams to adopt your known good best practices. And where that comes into play for me is things like, if you’re in a regulated space, like GDPR, PCI, HIPAA, et cetera, maybe it’s a basic change. Instead of allowing Lambdas to egress to the internet by default, you put them in a VPC that you manage, monitor, and audit from a network perspective, instead of just letting people attach it to anything.

Ashish Rajan: Interesting.

Oh, okay. And so to your point again, to what you said earlier, logging seems to be the key, so you can at least see what’s going on, and metrics are going to be key. But is that the same across all CSPs, though? I think the reason I keep going back [00:28:00] to the whole CSP thing, and feel free to kind of say, hey, I mean, you probably focus on one or the other.

I don’t know how many of them you have dabbled in. Maybe you’ve dabbled in only a couple. But a lot of people feel serverless was the answer for being, well, I don’t want to say cloud free, but almost like cloud agnostic. But that’s not the case though, right? Because each cloud provider has their own type of implementation.

Andrew Krug: Yeah. I mean, it’s certainly a challenge. If you’re just putting on your software engineer hat for a minute, right, and saying, oh, I want to write a common business logic set of functions that are going to do some sort of processing, there is a way to make that cloud agnostic, but it’s not that different from making it cloud agnostic for anything.

Right. Like burying functionality in libraries and abstracting out the various cloud providers’ event formats. Those are always going to be the same. And it’s going to be the same if you’re even considering, from a business perspective, going between something like EC2 or VMs and Kubernetes and serverless interchangeably. You’re still going to do a library, in whatever language you’re building with, with a common business [00:29:00] object layer. And you’re going to want to test that independently of your serverless execution. Yep. You can get it whittled down to this point where the actual event that’s coming in, you control relatively rigidly, and you can have a mapping for each cloud provider environment, even though their event formats are relatively different between clouds.
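A minimal sketch of that layering, under stated assumptions: `BusinessEvent`, `total_order`, and `from_raw_http` are hypothetical names invented for illustration, and the AWS adapter assumes the API Gateway REST proxy event fields `httpMethod`, `path`, and `body`. The point is only that the business logic never sees a provider-specific event and can be tested without any serverless runtime.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class BusinessEvent:
    """Provider-neutral event that the business logic consumes (hypothetical)."""
    method: str
    path: str
    payload: dict


def from_aws_api_gateway(event: dict) -> BusinessEvent:
    """Map an API Gateway (REST, proxy integration) event dict to BusinessEvent."""
    return BusinessEvent(
        method=event["httpMethod"],
        path=event["path"],
        payload=json.loads(event.get("body") or "{}"),
    )


def from_raw_http(method: str, path: str, body: bytes) -> BusinessEvent:
    """Hypothetical adapter for providers that hand the function a request object;
    the caller pulls method, path, and body off that object."""
    return BusinessEvent(method=method, path=path, payload=json.loads(body or b"{}"))


def total_order(evt: BusinessEvent) -> int:
    """Provider-independent business logic: sum the item quantities."""
    return sum(item["qty"] for item in evt.payload.get("items", []))


def aws_handler(event, context):
    """Thin AWS entry point: translate the event, delegate, format the response."""
    total = total_order(from_aws_api_gateway(event))
    return {"statusCode": 200, "body": json.dumps({"total": total})}
```

Each additional cloud would get its own thin entry point and adapter, while `total_order` and its unit tests stay unchanged.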

Ashish Rajan: The kind of triggers you can have between, say, an AWS Lambda versus an Azure function or a Google function, would they be similar? Like they’re all...

Andrew Krug: The event paradigm is definitely similar for each one, and certainly the design patterns that your engineering teams are going to use.

They’re going to be the same as any event-driven architecture, whether you’re using a consumer for a message queue or something like that, or this serverless event pattern: this idea that you have a single event that has a finite amount of processing that it has to do, it has to assume no state, and then it exits. Those are going to be the same across all the big clouds.

Ashish Rajan: All right. So then, for whoever is listening to this and maybe working on different cloud providers, the same principles should still apply, the way we [00:30:00] started this conversation with the building blocks.

Those same building blocks can be applied across any cloud service provider that provides a serverless function.

Andrew Krug: To some degree. Certainly from just a general, classical computer science perspective, and also from an architecture perspective, that event pattern is going to be the same. The observability metrics that you’re going to want to gather from a security perspective are also going to be the same.

So event time, memory consumption, what logs they’re outputting, all of that is going to be really, really similar. Just the means that you’re going to use to exchange that telemetry is going to be different.

Ashish Rajan: Right. Right. And there’s a question coming from RKV as well. Actually, we just had the question about implementation in a typical scenario.

So RKV has the question: do we have any benchmarks you’d point to for CSPs to start with for Lambda or for serverless functions?

Andrew Krug: So it gets a little bit complicated when we start to talk about benchmarks, because this is evolving, right? So I think that there are compliance packs in AWS, for example, that, if you apply them, will address specific sets of compliance [00:31:00] regs for HIPAA or PCI, but I don’t think there’s anything universal.

Ashish Rajan: No, I think, again, maybe worth calling out, Cloud Security Alliance came out with a white paper recently. You and I are part of a CNCF white paper as well, a serverless security white paper, which will be released soon for the world. But until that comes across, I also don’t know of any benchmarking, because I don’t know of a CIS benchmark. But what are your thoughts on the whole ATT&CK matrix thing for serverless as well?

Because a lot of people kind of normally lean there for building a threat model exercise for a serverless function, which is kind of where I was going with this. We did the whole, yeah, let’s do an asset inventory. We identified all the functions that we have. Now, the next step is probably doing a threat model analysis.

And usually at that point, people lean over to something like an ATT&CK matrix. And I don’t know if that’s a good one for serverless, unless...

Andrew Krug: So I like to start with the SAS Top 10 when I talk about serverless attacks. So PureSec and Twistlock, I believe, teamed up with OWASP and they [00:32:00] created a SAS Top 10, similar to the OWASP Top 10, for serverless stuff.

And it just sort of enumerates a bunch of popular potential attacks that apply to all the different CSPs. That’s a great place to start. It’s a great place to start if you’re doing threat modeling. I have a cat.

Ashish Rajan: This is Donna. I was going to just show Donna, did you like coffee as well?

Andrew Krug: Does, but yeah, so this SAS list is a great place to start that’s provider independent. So it starts with function event data injection, so like mangling the event itself to get something unintended. Broken authentication, so overly permissive IAM, or in the case of API Gateway, for example, you might have a function that authorizes another function that has an authentication bypass. That’s not unique to serverless, right? Like we see that.

The classic example I like to give is where you send the JWT with the algorithm set to none, but that doesn’t get verified by the underlying library. Insecure serverless deployment configuration, so potential pivots. Over-privileged permissions. Inadequate monitoring and logging; we talked in [00:33:00] depth about that just a minute ago.

Insecure dependencies, like a supply chain problem. This is not a problem that’s unique to serverless, but it still does apply. Insecure secret storage definitely applies: people constantly, with Lambda functions and Azure Functions and Google Cloud Functions, put sensitive things in environment variables, and it’s not any different than Kubernetes, right? Or EC2. It just still applies here. Denial of service and financial resource exhaustion, which is a part of the threat model that I think a lot of people don’t think of very often.

They don’t use the flip side of that to justify their security strategy either, because serverless, believe it or not, is often sold as, oh my gosh, you’re going to save so much money, but it can also be really expensive if you apply the technology incorrectly.

Ashish Rajan: Yeah. Yeah. For people who have paid thousands of dollars and experienced a denial of wallet, definitely watch for that as well.

So, okay. I’ll probably put a link for that in the show notes as well for people to go back to. So we have a model to go back to, the SAS Top 10, for people to go and find out [00:34:00] what they can do from a threat modeling perspective. I’m curious about the skill set as well, and the team. Do we need AppSec people in the team to be able to do security for serverless? Bearing in mind that all of us who are listening to this are doing, or trying to do, some form of cloud security, not all of us are AppSec people.

A lot of people are network security, or cloud security. And they’re like, oh, I can do software composition analysis. That’s not bad. That’s kind of like patching, so I can kind of grab on to that. But static code analysis and the field of DAST, SAST and stuff, that’s very AppSec.

So it feels like, for people who may have a cloud security team, does the skill set of the team need to expand into AppSec as well? I’m curious what kind of skill set people should look for in their team for doing serverless security.

Andrew Krug: Yeah. I think that there’s a lot to chew on there. Even until very recently, I think if you had said SAST and DAST are part of AppSec, I would have said, oh, it’s probably not, because those were very computer science-y, very academic [00:35:00] concepts that we’re only seeing now sort of trickle into our industry. So first and foremost, I would say to people, if you’re uncomfortable with those things, don’t be afraid of them.

And just go and try and use the tools that are out there. Because the difference between you and somebody who says “I’m an expert at it” is probably like this big, right? Because the technology has existed for such a short period of time that if somebody does say, oh, I’m an expert at X, Y, Z, unless they published a thesis on it, they’re probably not any more of an expert than you, really.

Ashish Rajan: No. Yeah. That’s true. That’s a good point, by the way. Because you’re pointing to the SAST test, and by the way, I’m curious to know your thought on this bit, but I definitely don’t recommend people start with SAST first, just because of the number of false positives you get.

And then we tell people to go for SCA, that is, software composition analysis, to find the libraries and stuff first, before they go into SAST. But I don’t know. Do you recommend people go SCA first, or do you recommend people go SAST first?

Andrew Krug: So there’s a whole bunch of factors there, I think, that dictate the practicality of your strategy.

One, again, is the initial threat model or risk [00:36:00] assessment. The other is the language and the tooling support for that. So for example, I’m a big fan of r2c’s Semgrep, if

Ashish Rajan: you’ve ever yes.

Andrew Krug: Huge. It provides so much value as far as static analysis, pre-production. At the same time, I don’t think it’s a question anymore of whether we need SAST or DAST, like dynamic analysis in the runtime itself.

I think we really need both to be able to detect all the different sorts of attack vectors that we’re going to see. And for me, acceptance. So if you’re a fan of Kelly Shortridge, and I reference this a lot, Kelly Shortridge has this talk on the stages of InfoSec grief, where you get to the acceptance stage, which really means that you need both of those kinds of detection mechanisms and controls.

So you need detection in the runtime itself and you need static analysis.

Ashish Rajan: Interesting. And I’m glad you mentioned it. So from a skill set perspective, then, it sounds like we definitely need a few AppSec folks in the team to be able to do at least full-scale serverless security, depending on how many Lambda functions you have. [00:37:00] Actually, maybe, do you feel like there’s a maturity scale for the initial step? If you don’t have an AppSec person, you probably don’t have a budget for an AppSec person.

At least the initial thing could be to get an asset inventory or get a viewpoint of all the cloud security components. Where do people start with that?

Andrew Krug: So my answer depends on the size of your company and the size and complexity of your organization, right? I think that there are still huge dividends paid in not just hiring that one AppSec person who’s going to come in and do your inventory, but hiring somebody who’s going to do that.

And then kickstart an embedding or a security champions program to try and create an education campaign for your existing teams, and build that into sort of a team set of OKRs or goals for a quarter or whatever. So you’re not just creating another silo. You’re sort of farming little bits of security as you have tooling and observability to do that. And I think that’s where the big force multiplier is.

Ashish Rajan: And so that’s how you can scale as well at that point, do we need to build a security champions program for this as well? For some of the security?

Andrew Krug: So serverless security, I still say, is just security.

In fact, it’s [00:38:00] actually easier in a lot of ways, I think, because there’s a finite set of possibilities based on platform, right? So we can take the set of potential outcomes and we can really, really limit the threat model. And I think that’s what makes serverless compelling and interesting because you’re just.

Ashish Rajan: I think that was pretty valuable, man. But towards the end we do a fun section, so people get to know the other side of you as well, especially now, with next weekend being Thanksgiving weekend. I definitely wanted people to know a bit more about the personal side of the coffee person that Andrew is as well, I guess.

So I’ve only got two questions, not too many. And I’ll probably start with the first one. Where do you spend most time on when you’re not working on cloud security or technology?

Andrew Krug: So I, I like to get outside. I have a very small ish alfalfa farm in this part of the state. I have horses and so I spend most of my time dragging sprinklers around and driving tractor.

Ashish Rajan: Tractors, horses, and a farm. That would definitely keep you busy, man. I don’t know if I could do that. Do you ride horses? I’m assuming so, if you have horses.

Andrew Krug: Yeah. Yeah. I do ride them. They’re very expensive to have as a [00:39:00] pet if you’re not using them for some sort of activity.

Ashish Rajan: Oh, okay. I would probably keep that for parents who are listening to this, if the kids want a pony in the house. So it’s an expensive one to have, I guess. Yeah.

Andrew Krug: I mean, my horses, when I was competing, they had massage therapists, acupuncture, dentists, a horse shoer, it’s like, oh yeah, the horses get better care than I do.

Oh yeah, it’s super important. I mean, if you think about the size of the relative muscle groups and things, if you’re really doing a lot of vigorous physical activity, they get muscle knots, just like everybody else.

Ashish Rajan: Wow. Okay. I think we might take that, no, that’s kind of come to an interesting point, but we’ve got a couple of questions that came in.

So I might quickly answer those before we go back to the final two questions of the fun section. I’ve got the first question from Chad: when enforcing various security controls and features, are there any common mistakes to watch out for that could drastically affect end user experience or performance?

Andrew Krug: So the answer would be everything, depending on where you’re putting your instrumentation. Right? So X-Ray is a really great example, where they [00:40:00] just have a library that you can pull in where you can actually say, just patch all the functions and instrument tracing on all of them. If you do that, you’re going to take a performance hit.

Right? Same thing if you’re building in aggressive logging or things that are locking. Python, for example, is a great example because it doesn’t have awesome parallelism. The more that you instrument, the more you’re going to have to think about concurrency and kind of fan out.

So again, just having that observability and tracing so that you can know when you put things into your stack, how much you have to scale really, really important.
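A minimal sketch of why instrumentation is never free, using a hand-rolled timing decorator as a stand-in for a tracing library. This is not the X-Ray SDK; `traced`, `TIMINGS`, and `parse_event` are hypothetical names. Every wrapped call pays for two clock reads and a list append, which is the kind of per-call cost that adds up when everything is patched.

```python
import functools
import time

# In-memory store of per-function durations, in seconds (illustration only).
TIMINGS = {}


def traced(fn):
    """Record the wall-clock duration of every call to fn."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            # Runs on success and on exceptions, so failures are timed too.
            TIMINGS.setdefault(fn.__name__, []).append(time.perf_counter() - start)
    return wrapper


@traced
def parse_event(event):
    """Stand-in for real work inside the function."""
    return {"id": event["id"]}


def handler(event, context):
    """Hypothetical entry point that calls one instrumented function."""
    return parse_event(event)
```

Comparing the recorded durations with and without the decorator on the hot paths gives a rough sense of how much extra latency, and therefore scaling headroom, the instrumentation costs.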

Ashish Rajan: Interesting. So to your point, then, for having effective serverless security controls, it probably goes back, to your point, to the basics: having a balance between, hey, what control is really important, without compromising on performance, and maybe working with the developers or whoever’s producing that.

Andrew Krug: I always tell people, start with a list of five things that you would never want to have happen.

Find a way to enforce that first and then start to do continuous improvement on that until you [00:41:00] reach a point of diminishing returns, then start to look at other things. Because there’s definitely always some low-hanging fruit, like the network and the IAM.

Ashish Rajan: Yep. That’s a good point, because I don’t know how many people have a top five that they consider, and depending on how many applications you’re looking at in a company, the top five could vary.

So there’s already a lot of work right there. So hopefully that answered your question, Chad, but feel free to ask a follow-up question. Another one from RKV: do you have any examples of serverless breaches that we can deep dive a little into?

Andrew Krug: So, I don’t know. And people always ask me this when I talk about serverless, as if there’s a public serverless breach that I could deep dive on.

I don’t know of any that have actually made news headlines. And that’s also kind of an interesting artifact here: as we talk about this, a lot of people have sort of redone the same research that I did in 2017 as security conference talks, and I haven’t seen a presentation yet that cited a famous breach or had a story of their own breach.

Ashish Rajan: Oh, maybe worthwhile calling out, because a lot of the security training that takes place revolves around the [00:42:00] area of exactly what you mentioned with containers: potentially a container running in the background, having something persistent or access to something persistent, and being able to exploit that. But beyond that, there is no public reference of what else people have done.

I mean, I guess you can go into aspects connected to your serverless function, and maybe they are breached or they are vulnerable, but it’s sort of a gray area. I also have not heard of a public one yet. But saying that doesn’t mean it hasn’t happened; people may just not have shared it.

Andrew Krug: I’m sure there are some ancillary breaches that had serverless as a component, like a secret leak or an SSRF that resulted in a credential leak or something.

We just didn’t see serverless front and center as kind of the star of the show there. Yeah. Oftentimes Kubernetes gets called out because of the admin interface and there’s like kube secrets or something. We just haven’t really seen that on Hacker News or Reddit.

Ashish Rajan: No, no, definitely not yet. Although, do you feel like that’s far away, or maybe, to your point, we’ve already seen it, but because it’s not been given the spotlight, it just wasn’t even [00:43:00] paid attention to? The reason I ask is that, in the trainings that I have done in the past for my team, at Black Hat they do serverless security training as well.

I do not know if they did last year, but a few years ago they did a serverless security training. And the exercise, from what I understand from my team, was that they have a Lambda function which is rendering a webpage. And then you use that to go to, say, one of the metadata URLs for AWS or Azure or whatever.

And you’re able to kind of do a recon and get more information about what the environment is and use that to go into other resources that you have access to. That’s a common one. I don’t know if you’ve seen any others.

Andrew Krug: Yeah. The other one that I think I would call out is the authentication bypass, because oftentimes people will use this authentication proxy pattern.

So they use a function in front of the function. We see that commonly in API Gateway, where there is a potential authentication bypass just by doing header fuzzing or something along those lines. And then you can get to the subsequent protected function. Those tend to be [00:44:00] fun ones.

Ashish Rajan: Oh, I see.

That’s a good one as well. I’m going to make a note of that one. Cool. All right. Hopefully we answered your question, RKV. Thanks very much for that, and feel free to drop another one if you like. But going back to the fun section again. So we spoke about horses and massage therapists for horses.

It’s a really extreme tangent to go from serverless security to a massage therapist for a horse and get back again. So the second question that I have for you is: what is something that you’re proud of, but that is not on your social media?

Andrew Krug: Oh, I mean, there’s, there’s a bunch of things.

I guess I could, I could cite, like, my farm is definitely like one of the things I don’t social media about. We already talked about that, but I also do saxophone performance,

Kind of in secret.

Ashish Rajan: Oh, wait, are you, is there even a thing? Like the bathroom saxophone?

Andrew Krug: Yeah. I mean , there’s lots of great spaces that are great to play saxophone, but

Ashish Rajan: So, okay. So you play well, I’m assuming, and that’s why it makes you happy.

Andrew Krug: Yeah. I played saxophone in college. So my first big kid job ever, when I graduated with my computer science degree, was as a sysadmin [00:45:00] for the college that I went to, and they had a program where, if you were an employee, you could basically take classes for free.

So I was like, I want to get saxophone lessons.

Ashish Rajan: Cool. All right, well, I’ll look forward to the day when I can hear the saxophone in person as well, man. Cool. Last question. What is your favorite cuisine or restaurant?

Andrew Krug: Oh, my favorite cuisine or restaurant? It’s definitely my house. I love to cook at home.

I’ve done every cooking masterclass on masterclass.com, I think, like the Wolfgang Puck ones, the Gordon Ramsay ones. Aside from that, I’d say there’s a Mexican restaurant down the road that I love. It’s called Komal Bar and Grill. And they just make the best Guatemalan-Mexican fusion food that you’ve ever had.

It’s so good.

Ashish Rajan: Interesting. And so to your point, you have the whole farm-to-table experience because you have a farm. I’m assuming you’re growing vegetables and stuff on your farm as well.

Andrew Krug: Yeah, I have a one acre garden.

Ashish Rajan: Oh, wow. So, okay. Yeah, totally a farm. Well, you don’t have to go to a restaurant, man.

So when you have a farm-to-table experience at your house, or at your farm in this case, you definitely don’t need to do any of that. That’s definitely not required. I think [00:46:00] people like us who are not on a farm have to go and pay a high dollar value for farm-to-table.

Andrew Krug: Yeah, it’s just, for me, I do a lot of pressure canning.

I do the whole thing, like put up tomatoes, vacuum pack things, freeze them. I had 30 tomato plants this year generating like 200 pounds of tomatoes per week or something crazy like that. So

Ashish Rajan: What do you do with all that tomato?

Andrew Krug: Freeze them or give them away. Like, make boxes and give them to the neighborhood.

Ashish Rajan: I’m like, wow, that’s a lot of tomatoes for you. I definitely need to start taking some tips on farming from you, in that case. I definitely need to get a massage therapist for myself. I should have been born as a horse instead of a man; I feel like I’m missing out on so many things.

What was it again? Massage therapist, a horseshoe person, acupuncture, and a dentist. Oh my God, I’m so jealous of a horse right now. I would definitely be a horse any day. That was the conclusion.

And I definitely feel I can bring you back [00:47:00] again after re:Invent, because I definitely don’t talk much about mergers and acquisitions, and how you do security at scale after a merger or acquisition. So maybe I can bring both you and Nathan on for that after you guys are done with the talk.

So for people who want to reach out to you, maybe with some follow-up questions on serverless security, where can they find you? Where do you hang around on social media?

Andrew Krug: Yeah. So I’m just Andrew Krug, K-R-U-G. It’s the last name, if you don’t know how to spell Krug. I’m on Twitter. And you can also find my email and phone number on my personal website, andrewkrug.com.

I’m also hiring at Datadog like crazy for security advocates and security evangelists. If you’re interested in doing the kind of work that I do, presentations, blogs, prototypes, security research, definitely reach out to me. I’m always happy to have a conversation, even if you don’t think your background is the right fit.

The requirements are suggestions, not rules. So

Ashish Rajan: That, to me, is cool. Maybe what security advocates at cloud security vendors do can be a good question that people ask you as well. If they’re looking for an icebreaker to talk to you, that could be a good [00:48:00] one to start off with.

But thanks so much for coming in, man. I really appreciate the time, and I am looking forward to hopefully bringing you and Nathan together for one of the next couple of conversations. But for everyone else, thank you for joining us. Hopefully everyone enjoys the next Thanksgiving.

Thanks everyone for coming in and we will see you soon. Thank you. Peace.

‍
