Risk Analysis of Kubernetes Security


Episode Description

What We Discuss with Mark Manning:

  • What is Kubernetes & Kubernetes Security for you?
  • What are the common components of Kubernetes for Risk Analysis?
  • Where does one start as a newbie with Kubernetes Risk Analysis?
  • What is an example of a good Kubernetes Architecture?
  • What’s an anti-pattern of Kubernetes deployment?
  • What are the low hanging fruits outside of Auth/AuthZ for Kubernetes Risk Analysis?
  • How do you get inventory of all the elements in a cluster?
  • How do you analyze a cluster RBAC and how can we ensure the cluster admins implementing it properly and securely?
  • Any tool other than kubectl, which can detect these security risks configurations?
  • Thoughts on Kubernetes CIS Benchmark?
  • Security for container in runtime?
  • Is sidecar monitoring the only way to monitor run time of docker containers?
  • Thoughts on Pod Security Policy?
  • How do you do Risk Analysis for Kubernetes with 1 Cluster per Business Unit?
  • Given a choice would you go CSP Managed or bare metal deployment of Kubernetes?
  • Is GKE more forward thinking for security than AWS EKS?
  • Isn’t Rancher used to manage the Kubernetes cluster?
  • What Risk Analysis method do you use and how to influence the culture in developers?
  • Drift Detection in Kubernetes Cluster?
  • Why would one pick Kubernetes if there is drift in cluster post deployments?
  • Kubernetes is being used for infra orchestration in the cloud, like what Crossplane is trying to do. What are your security concerns when that happens?
  • And much more…

THANKS, Mark Manning!

If you enjoyed this session with Mark Manning, let him know by clicking on the link below and sending him a quick shout out at Twitter:

Click here to thank Mark Manning at Twitter!

Click here to let Ashish know about your number one takeaway from this episode!

And if you want us to answer your questions on one of our upcoming weekly Feedback Friday episodes, drop us a line at ashish@kaizenteq.com.

Resources from This Episode:

Ashish Rajan: Welcome Mark! How are you man?

Mark Manning: I’m good. How are you?

Ashish Rajan: Good. Thanks for coming on, man. It’s always good to have more people in the Kubernetes space sharing and caring, so I’m glad I have you here. Hopefully you can help me not sound like an idiot when I’m asking questions for a risk analysis of a Kubernetes cluster.

For people who don’t know Mark Manning, can you tell us a bit about yourself?

Mark Manning: Yeah. So I’ve self-identified as a hacker since growing up. I wanted to get involved in the DEF CON scene in the US, hanging out with a whole bunch of, you know, kids in hoodies and things. And that was the thing that taught me that I wanted to do this in my career,

once I realized that there was a career. So I worked with companies like Intrepidus Group and NCC Group, and they taught me a lot about network security, penetration tests, red team assessments, application security assessments, and source code reviews. So I’ve had, you know, 10 or 15 years doing that type of consulting work.

And then recently I moved over to being a security architect at Snowflake, where I’m trying to actually see if I can do [00:01:00] some of the stuff that I’ve been complaining about all this time, you know, like, oh, it’s broken and you need to go fix it. Now it’s like, I’m the person that needs to go fix it.

Let me see if I can sort that out.

Ashish Rajan: Nothing wrong with sorting out stuff for other people, especially with the things you’ve learned. I was going to ask, in terms of, let’s just say, Kubernetes security: what is Kubernetes, according to you, and where does Kubernetes security fit in?

Mark Manning: So overall, it’s really hard to define the whole Kubernetes ecosystem. When I was working at NCC Group, we had this group of people that was working on Kubernetes. What that meant was we worked on a little bit of Kubernetes stuff, which is the Kubernetes API. It orchestrates a bunch of containers.

It runs a bunch of services on this platform, things we could go on about forever. But really, Kubernetes to me represents this entire ecosystem: being able to set up load balancers, being able to set up north-south network controls, being able to do service mesh, and all this kind of stuff. It just happens to be this orchestration platform, this standard API that you can create things with.

So to secure this [00:02:00] whole thing, really the takeaway was that you’ve got to secure all the different components, and you’ve got to understand where the components are. This is why you can’t just secure Kubernetes. It’s, oh, you also have Istio, you also have this proxy, you also have this other thing that’s doing something else.

You’ve got to get a hold of all that stuff to really truly understand what it needs to do.

Ashish Rajan: For anyone who wants to see the depth of the platform, I think the CNCF has that image with hundreds of applications. If you want to have a nightmare about how big the space has become, look at that screenshot. It’s, I don’t know, hundreds of applications, and Kubernetes is one of them.

Mark Manning: They just added 19 more or something like that. All the cloud native applications, they just go through the CNCF and start saying like, Hey, help support me, help get some, some people behind me and like, wow, it’s huge.

Ashish Rajan: It is massive. And maybe one way to start talking about risk analysis would be: what are some of the common components of Kubernetes that people should be aware of?

Mark Manning: Yeah. So at the base level, we talk about [00:03:00] Kubernetes as clusters. There’s a Kubernetes cluster, and usually you’ve got a bunch of things running in it. So there’s the cluster. There’s the idea of a node, which you can basically think of as a server, one of the components of the cluster.

And then you start getting into what they call pods, which are just groups of containers that are running on those nodes. And then on top of that, there are all these different controls, but those are the primitives, you know: cluster, node, and pod. Everything else is kind of YAML abstractions from there.

Ashish Rajan: All right. And people talk about the control plane and the API. What are those?

Mark Manning: Yeah, when you say Kubernetes, it’s a whole bunch of processes, right? There’s a whole bunch of programs running in the background, and you can think of them as the control plane: the Kubernetes API, the networking components, and how you interface with the API.

How do you port-forward through different components on a node? For example, you’ve got this main control plane, but you’ve got a bunch of nodes, maybe on EC2 instances somewhere out there, and you want to be able to forward a port to one of those random nodes someplace.

You don’t need to know where [00:04:00] it is, and that happens through the control plane. So there have to be all these proxy systems that are communicating from the control plane down to the nodes and back and forth, and a bunch of observability stuff is happening in between there. So there’s this control plane that’s designed to be locked down.

You’re not supposed to do anything in there. There’s a bunch of privileged stuff. It’s the main kind of kernel, if you will, above the operating system. This is the guts of the Kubernetes platform.

Ashish Rajan: I think to your point, there’s so many moving parts to it as well because of that whole distributed system background as well.

As a newbie in Kubernetes, where do you suggest I start with Kubernetes risk analysis? What kind of questions should I be asking when I’m trying to do a risk analysis?

Mark Manning: The worst-case scenario that I saw when I was consulting on this stuff was organizations that just had Kubernetes pop up on them, right? And they’re just like, I need to understand the risk of this thing. It’s my job to secure it.

But I don’t know what Kubernetes is. I don’t know where it is. I don’t know who’s running this thing. So there are some basic questions, like, where [00:05:00] are your clusters? It sounds like a simple question, but is it hosted in the cloud? Is it hosted, you know, locally? All those things have different threat models.

And I think how big the clusters are contributes to it. When you go in, what are the attack vectors of this stuff? How many people can actually get into your Kubernetes cluster? How many people can access the thing? Who’s running it? Who has access to control anything? Who has access to deploy different stuff into it?

So it’s the who, how, when, and where, all those kinds of questions. You have to get a handle on the entire Kubernetes platform before you can start doing risk analysis on it.

Ashish Rajan: So it sounds like, the cluster is probably the core of the system as well.

When people are doing a risk analysis of a Kubernetes environment, they’re obviously finding out how many clusters they have. Do you start by assessing the risk of a cluster per se? Or is that where you go, hey, who can deploy to it, the authentication and authorization, that kind of thing?

Mark Manning: Oh yeah. I think it would start at the cluster level. Most organizations have a single cluster, and I say “most” in [00:06:00] the sense that there’s a primary production cluster, right? That’s usually a model that’s pretty common, and you go, okay, I’m going to focus all my attention on that.

So you think, okay, what access controls do I have? And then, who has access to prod? How do they access it? Because the question is, do I give my developers access to the console, like with kubectl, so they can push things into the cluster directly from their laptops? That seems kind of scary.

Or, you know, some organizations will do things like, I’m going to grant SSH access to each of the pods that you deploy, and you realize that that’s scary in itself. How do you manage any of that stuff? So the access control thing gets heavy when you have scale, right?

Like when you go, I’ve got 3,000 developers and they’re all in this thing, how do I tame them? How do I ensure that they’re only accessing the stuff they need to? So I think that’s step one for me.

Ashish Rajan: Wow. So wait, production with kubectl access to pods in a cluster, and the number could be like 3,000-plus people accessing that?

Mark Manning: I’ve seen it all, man. I’ve seen all kinds. You would never really want [00:07:00] that, realistically, right? But it’s, well, I’ve got to go debug this thing in production, so I need to have SSH access. Or the operators will be like, if this thing crashes, I need to go figure out what’s going on.

So I need kubectl. In my ideal world, you never allow direct access. There’s always the break-glass scenario where you go, something terrible is happening, we’ve got to get in there and do some stuff. But most of the time, developers shouldn’t have access to your production environment.

That’s just a universal InfoSec truth, right?

Ashish Rajan: Yeah. Maybe let’s peel it apart a different way. I guess we can start with what a good deployment structure looks like, according to you.

Anyone listening to this who wants to do a risk analysis needs to have a picture in their mind: this is what a good one looks like, and this is what a bad one looks like.

Mark Manning: Okay. In the best one that I’ve ever seen, they abstracted away access completely. So a developer needs to deploy, you know, an internet service or something into Kubernetes.

They’ll push [00:08:00] something to, like, GitHub, and they’ll have some kind of layer on there that says, here’s how I want it to look inside of Kubernetes. They can use some kind of CI/CD system, right? But the developers never have direct access to push into Kubernetes. It’s just using CI/CD and deploying the application into the place it needs to go.

And then you’re relying on support teams to help make sure that it’s actually running there. But those are the ideals.

Ashish Rajan: If I were to unpack that example a bit more: if I have an Nginx application, it’s a CI/CD pipeline that’s building the package. And when the package is ready, towards the end, it’ll make an API call to the Kubernetes API, or use kubectl if you want to use that, to deploy into the cluster. Instead of me logging in to do it, the pipeline is going to deploy the Nginx box. Yeah?

Mark Manning: Exactly. I mean, yeah, we laugh, but that happens all the time because it’s just so much easier to support, right? You just go configure it and say, here’s my cluster.

And then when you need to grant access to stuff, you’re not doing it for all your developers. You’re saying, here’s my [00:09:00] GitLab service, here’s my repo, and I’m going to grant that thing access to my cluster instead of everybody in the whole organization.

Ashish Rajan: I mean, we’ve already spoken about the bad scenario of direct SSH access.

Are there other anti-patterns like that which people doing a risk analysis can look for?

Mark Manning: From the authentication standpoint, like access controls, there are a lot of extra ways you can get back in. With kubectl, you can also do port forwarding. So what I’ve seen is they’ll have a little sidecar, which means you’ve got your Nginx instance, but then you’ll have a debug pod sitting next to it at all times.

So sometimes you’ll see things like, it’ll run an admin console that you can log into as a backdoor to the system, or maybe it’s running SSH and stuff. So there are a lot of sketchy ways that you can backdoor your system to grant access.

Ashish Rajan: So I don’t access it directly, but I access it through the back door that I just left open.

Like, hey, I did what Mark said, I don’t log in directly. I built a back door. I think this kind of goes into the next step as well. So what are the security [00:10:00] defaults that people should be looking for? Clearly we know authentication and authorization already. Are there other components as well that are low-hanging fruit?

Mark Manning: The biggest thing is runtime security as a whole. Because the problem is, you know, I do these talks and I beat up on Kubernetes, and it’s really not Kubernetes that’s at fault for most of the security issues. It’s just that we’ve chosen these insecure designs, or we’ve done something without realizing how it impacts security.

So I think a lot of organizations will go in and just try to secure the runtime. And they do that with things like OPA Gatekeeper, the thing that does policy controls over things like, okay, I’m not going to run privileged pods. If you run a privileged pod, that basically means I’ve disabled all of the security for the container. Or, I’m not going to run as the root user, meaning you’re not going to use root’s permissions, and all that kind of stuff. There are also constraints you can set, like, I don’t want to ever allow host mounts. Because if you can have a pod that mounts the host operating system inside of it, you can just [00:11:00] take over the host operating system. I did this all the time.
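As a rough illustration of the three checks Mark lists here (no privileged pods, no running as root, no host mounts), the following is a minimal Python sketch. It assumes the pod manifest has already been parsed into a dictionary; the function name `risky_settings` and the wording of the findings are made up for this example, and a real policy engine such as OPA Gatekeeper expresses these rules in its own policy language rather than like this.

```python
from typing import Any


def risky_settings(pod: dict[str, Any]) -> list[str]:
    """Return human-readable findings for a parsed Pod manifest (illustrative only)."""
    findings = []
    spec = pod.get("spec", {})
    pod_ctx = spec.get("securityContext") or {}

    # Host mounts: a hostPath volume exposes part of the node's filesystem to the pod.
    for volume in spec.get("volumes", []):
        if "hostPath" in volume:
            path = volume["hostPath"].get("path", "?")
            findings.append(f"volume {volume.get('name')} mounts host path {path}")

    for container in spec.get("containers", []):
        name = container.get("name", "?")
        ctx = container.get("securityContext") or {}
        if ctx.get("privileged") is True:
            findings.append(f"container {name} is privileged")
        # A container-level runAsNonRoot takes precedence over the pod-level value.
        non_root = ctx.get("runAsNonRoot", pod_ctx.get("runAsNonRoot"))
        if non_root is not True:
            findings.append(f"container {name} may run as root")
    return findings
```

The sketch only looks at regular containers, not init containers, and it treats a missing `runAsNonRoot` as a finding, which may be stricter than some teams want.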

Ashish Rajan: Like they break out of the container.

Mark Manning: Yeah. You don’t even need to break out of the container. That’s the best part of it. If you’ve mounted the root of the host operating system, and it’s Linux, right, from inside the container you can just chroot into that environment. You are now the host, but you’re still not breaking out of the container.

Ashish Rajan: Yeah, that’s right. Kind of like, I guess for us old-school people, it’s a virtual machine, but you give it access to a drive on the host. Well, you’re not on the host, but you actually have access to the host.

Mark Manning: Exactly.

Ashish Rajan: Right. That kind of takes me to this question that Vineet has asked over here: how do you list the inventory of a cluster? Is there an API command in Kubernetes?

Mark Manning: Yeah. I mean, one of the first things you run is, I think, “kubectl cluster-info dump”.

It’ll just extract everything: here are all the YAMLs, all the things that it’s processing. You’ll see everything there. Whether or not you can really understand what’s going on is the second thing, but it’s a decent API. Just say “kubectl get deployments”, “kubectl get pods”, “kubectl get namespaces”, and all that stuff [00:12:00] will give you a pretty decent inventory of what’s running in your cluster.
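To make the raw output of those commands easier to read as an inventory, here is a small Python sketch that counts objects per namespace and kind. It assumes its input is the JSON list that kubectl prints with `-o json` (for example from `kubectl get deployments,pods --all-namespaces -o json`); the function name `summarize` and the "(cluster-scoped)" label are invented for this example.

```python
import json
from collections import Counter


def summarize(kubectl_json: str) -> Counter:
    """Count objects per (namespace, kind) in a kubectl `-o json` list."""
    doc = json.loads(kubectl_json)
    counts: Counter = Counter()
    for item in doc.get("items", []):
        # Objects such as Namespaces and Nodes carry no namespace of their own.
        namespace = item.get("metadata", {}).get("namespace", "(cluster-scoped)")
        counts[(namespace, item.get("kind", "?"))] += 1
    return counts
```

This only tells you what exists and where. As the conversation goes on to say, working out the intent of each object is the harder part.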

Ashish Rajan: Right. So hopefully that answers the question, Vineet. The Kubernetes API is mature enough that it understands there will be requirements for these. But I think, to your point, the challenge would be understanding the information that’s been given in front of you.

Mark Manning: To say that that’s a true inventory, well, that’s collecting the information. Having an inventory, which would be actually itemizing, you know, what the intent of each one is, that’s a harder one.

And there are a lot of third-party tools, I think, to do that. A lot of cloud providers are attempting to offer that as a service, right? You go in and it’s like, here’s your inventory, everything that’s running.

Ashish Rajan: All right. Yeah. I think we answered this question. So it was pretty awesome. I think I was going to ask you then, so we kind of spoke about what an ideal deployment flow for Kubernetes could be, but when I think about scaling security, a lot of people have been taught by cloud to automate everything and have some kind of sensible defaults that they can go for, because then you can scale easily as security.

Are there things like that that we find out?

Mark Manning: Well, I think, more importantly, you need to have a [00:13:00] runtime policy. That’s kind of what I was trying to explain before. As an organization, you need to define exactly what you allow your developers to do, because Kubernetes allows you to do whatever you want, or it allows you to restrict whatever you want.

A developer’s going to say, I need to run this pod as root, I need to run as privileged. And you need to have a system that can tell them, no, you shouldn’t do this, because it’s actually going to hurt you. It’s not about the system. It’s, you shouldn’t do this because you’re gonna kill yourself.

And also, here’s the reason why, or here’s some feedback. That’s why I really like some of the OPA stuff, which is the Open Policy Agent. It’s kind of a platform for saying what you shouldn’t do: don’t run as root, don’t run as privileged,

whatever the organization wants. You put it in a policy, and it gives developers guidelines that say, okay, you can’t do this. But then the feedback on why it didn’t get deployed is also customizable. So you can say, you know, it’s per this policy, or whatever the reason is, and give them something actionable about what [00:14:00] they need to fix.

Ashish Rajan: The deeper we go into this, the more I’m enjoying the conversation, because we started off with the cluster: the most important part is to figure out what clusters you have and how many. Then you go into what a good cluster looks like and what a bad one looks like.

That was good as well. And you spoke about some of the defaults. But to your point, and I think you started off this conversation by saying this as well, it sounds like a person who is trying to do a risk analysis on a Kubernetes cluster needs to have at least some basic understanding of Kubernetes.

It’s not as simple as, I’m going to go in and ask a bunch of questions because I’ve had 20-plus years of doing security risk analysis and I can just use that.

Mark Manning: It’s even more than that. I was that person that came in with, you know, that background in network security, thinking, oh yeah, I know everything about Kubernetes.

And then I went, whoa, this is a different beast. So yeah, I a hundred percent agree that you have to treat it as something else. You have to appreciate what it’s trying to do. You have to appreciate that, number one, there are no [00:15:00] IP addresses. You throw that right out the window. All the netsec folks that I used to work with would always be trying to nmap my Kubernetes cluster.

And I’m like, this doesn’t make any sense. Everything is ephemeral. It lasts for 10 minutes to an hour. I don’t care about the IP addresses. When you give me a pen test report, I need to understand what the service is doing.

Ashish Rajan: Okay. Wait, tell me more. So there is no IP address?

Mark Manning: There are, there are totally IP addresses. But imagine, when I was working as a security consultant, I would give a pen test report. A classic nmap finding might be like, yeah,

okay, I found this host, here’s the IP address of this host, and they’re running an outdated Nginx. And then you turn to the customer and they go, oh, here’s this IP address. And they go look it up in their system, and they’re like, that’s an IP address that doesn’t exist anymore, or it’s not even relevant.

We burned down that cluster. Because Kubernetes operates so ephemerally, you can’t really assume that the same IP address is going to be the same thing. You can’t even assume the same host name a lot of the time. So you have to really understand the Kubernetes identifiers and get deep into the service to provide any actual [00:16:00] perspective on that stuff.

Ashish Rajan: So, okay. At least we clarified that you definitely need to have some respect for the platform and understand it a bit more. And it’s the ephemeral nature of the Kubernetes platform that attracts people, because it abstracts so much, I feel.

Yeah, I think that would definitely make a difference. Is there a supply chain element in there? I think you touched on CI/CD earlier. Is there a supply chain element, and how can that be exploited?

Mark Manning: Yeah, a hundred percent. I think all the stuff that normally goes into a Kubernetes cluster comes from your artifact store, or it comes from an image registry, right?

And these are all things that are hopefully built within your organization, things you can lock down. So thinking about supply chain, you’ve got to think about the images that are getting deployed. Where did they come from? Did they come from Docker Hub? Because that’s the wrong answer.

Did they come from some external GCI repo? You know, it’s tough. And even if you’ve chosen, I’m going to have a private image repo for my org, right, how do you actually verify that the stuff that you’ve built is [00:17:00] actually trusted within your organization? How do you know that your build environment didn’t just go out to Docker Hub, pull down a new Ubuntu, and bake it in there?

You know, we’re just kind of shifting it in a direction. So they’re doing a lot of work. Harbor is a container image registry, and Docker Notary v2 is coming out with different ways of signing images, so you can verify the integrity throughout the process.

But yeah, the whole supply chain going into Kubernetes, it’s tough.
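One simple starting point for the "where did this image come from" question is to compare each image reference against a list of registries the organization trusts. The Python sketch below is a minimal version of that idea. The function names and the example registry host are hypothetical, and the parsing follows the usual Docker convention that a reference without a host-like first component is pulled from Docker Hub. It says nothing about whether an image in a trusted registry was itself built from untrusted layers, which is the harder problem discussed above.

```python
def registry_of(image: str) -> str:
    """Return the registry host of an image reference, following Docker's convention."""
    first, _, rest = image.partition("/")
    # The first path component counts as a registry only if it looks like a host:
    # it contains "." or ":" (a port), or it is exactly "localhost".
    if rest and ("." in first or ":" in first or first == "localhost"):
        return first
    return "docker.io"


def untrusted_images(images: list[str], allowed: set[str]) -> list[str]:
    """Return the images whose registry is not in the allowed set."""
    return [img for img in images if registry_of(img) not in allowed]
```

For example, with `allowed = {"registry.example.internal"}` (a made-up host), a bare `nginx:1.25` would be reported because it resolves to Docker Hub.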

Ashish Rajan: That reminded me of the exact same scenario a couple of years ago, I think when we were working on ECS, the AWS container service. Someone was talking about containers and was like, oh yeah, we use our own private containers, but the source

was Docker Hub: download the Docker Hub container, add a bunch of stuff to it, and move on. And then, hey, it’s our container. It happens every time. Oh my God.

Mark Manning: And there are still so many organizations that are doing that. And it’s still a step up from pulling straight from Docker Hub, so that you don’t have the denial-of-service issues now, since Docker Hub [00:18:00] rate limits everything. So there’s a scenario like, oops, I’m going to get blocked by Docker Hub from pulling down the images, and now I can’t even deploy into my own production cluster.

Yeah. So now you have to look a little pleasant. I that’s.

Ashish Rajan: Magno has asked: how do you analyze a cluster’s RBAC, and how can we ensure the cluster admins are implementing it properly and securely?

Mark Manning: I know Magno knows to ask some pointed questions here. Yeah. So the RBAC stuff is really interesting. One thing that we do in my current company is reviews of a new application that goes into a Kubernetes cluster, right? But we’ll do an assessment too on any RBAC controls, meaning, what does the application need to have access to?

Like, we want to be able to keep track of logs, but why does this service account need an RBAC profile to connect to the Kubernetes API? It doesn’t need to do that. Why does it need to access secrets? It doesn’t need to do that. I’ve used kubectl Krew, which is a plug-in system.

It’s like pip for kubectl, if you’ve ever seen that stuff: whatever packages [00:19:00] you need, you go install them. So there’s rbac-view, and there’s a bunch of Krew access-control plug-ins that dump exactly who can access what.

It’ll say, who can access secrets in this namespace? Who can look at all this kind of stuff in this namespace? And if you search for NCC Group RBAC, there’s a blog post that I wrote a couple of years ago about the state of all the different tools that are out there.

For me, it was about visualizing it, because you go, I’ve got 3,000 devs or however many applications, and I really need to understand how each of them is able to access the others. So I think kubectl Krew is the way to go.
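The "who can access secrets" question can be sketched in a few lines of Python over parsed RBAC objects. This is only an illustration of the idea, not how the Krew plug-ins mentioned above work internally. It assumes roles and bindings have already been loaded as dictionaries (for instance from `kubectl get roles,clusterroles,rolebindings,clusterrolebindings -A -o json`), the function names are invented for this example, and it ignores `apiGroups`, `resourceNames`, and aggregated cluster roles for brevity.

```python
def can_read_secrets(rule: dict) -> bool:
    """True if a single RBAC rule grants read access to secrets."""
    resources = rule.get("resources", [])
    verbs = rule.get("verbs", [])
    touches_secrets = "secrets" in resources or "*" in resources
    return touches_secrets and any(v in verbs for v in ("get", "list", "watch", "*"))


def who_can_read_secrets(roles: list[dict], bindings: list[dict]) -> set[str]:
    """Return "Kind/name" for every subject bound to a role that can read secrets."""
    risky = set()
    for role in roles:
        if any(can_read_secrets(r) for r in role.get("rules") or []):
            meta = role["metadata"]
            # ClusterRoles have no namespace, so their key carries None.
            risky.add((role["kind"], meta.get("namespace"), meta["name"]))

    subjects = set()
    for binding in bindings:
        ref = binding["roleRef"]
        # A namespaced Role is looked up in the binding's own namespace.
        ns = binding["metadata"].get("namespace") if ref["kind"] == "Role" else None
        if (ref["kind"], ns, ref["name"]) in risky:
            for subject in binding.get("subjects") or []:
                subjects.add(f"{subject['kind']}/{subject['name']}")
    return subjects
```

A flat set of subjects is a long way from the visualization Mark is after, but it is the same underlying join between roles and bindings.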

Ashish Rajan: And I’ll leave the blog post in the show notes as well. Hopefully that answers the question, Magno.

Next, a question from Viplove: do you suggest any tool other than kubectl which can detect these risky security configurations?

Mark Manning: So there are a lot of new tools coming out to auto-scan. What I really like is there’s a bunch of tools that help you exploit Kubernetes clusters now.

They’re almost like [00:20:00] Metasploit. So there’s the kubesploit stuff, and there’s Peirates, made by Jay Beale at InGuardians, which is designed to do attacking and stuff like that. But to the question of scanning and finding things, there are tools out there like kube-scan, and there’s another one that Rory McCune wrote.

They compare your cluster against the CIS benchmarks and say, here are some vulnerabilities, here’s some feedback. The only thing I’ll warn you about, as somebody that’s been doing this forever and trying to find those vulnerabilities, is that they can give you perspective, for sure. You should run them all. But one, they might not understand the context of your cluster, and two, they might give you a false sense of security, because, oh, I ran the scan and it says everything’s good, and it doesn’t have anything for Istio or Envoy or whatever your CNI is for network controls.

You know, there are just so many pieces that I don’t really want to be like, yeah, just run this tool and you’re all good. But the CIS tools are what everybody uses.

Ashish Rajan: On the CIS benchmark, out of curiosity: I come from the cloud world, and I [00:21:00] have a mixed opinion of CIS there. It was never really updated after the first version and never matured to the scale the cloud grew to. Is that the kind of state CIS is in for Kubernetes, or is CIS for Kubernetes pretty awesome?

Mark Manning: CIS for Kubernetes is probably the most awesome CIS there is. Because he’s the only one. Yeah. So shout out to Rory, who manages the CIS benchmarks that I know.

He’s done a lot of work to keep them up to date and keep them relevant, and they’re definitely on point. They’re expanding on this kind of stuff, and they even have flavors of CIS benchmarks now. So there’s a GKE benchmark aside from the Kubernetes benchmarks.

So you can now be specific and say, well, I’m running GKE, which has different expectations than my standard Kubernetes, and things like that. So they’re good. That’s the most charitable I can be, but we all know that they have inherent weaknesses. If you rely on them alone to dictate how you do things in your cluster, that’s not what they’re designed for.

They should be kind of feedback. They give you some [00:22:00] insight, and then you’ve got to go figure out the rest, right? It’s not going to tell you about Istio. It’s not going to tell you about Calico. It’s not going to tell you about the eBPF stuff you’ve installed or your Sysdig Falco rules. It doesn’t understand any of that stuff.

And that’s the point. Kubernetes is this ecosystem of a whole bunch of other stuff. You’ve got to get your arms around everything as best you can.

Ashish Rajan: So learn about Kubernetes first and then look at the CIS benchmark. Okay, great place to start. Great answers. Magno has another one: the problem with admission control is that it only prevents the containers from being deployed. But what happens after the containers are running? Is runtime protection the only way?

Mark Manning: Again, Magno knows what’s up. Yeah. So, admission control, just to understand the question: this is the admission controller model right now,

and this is kind of why pod security policies were given up on, or one of the reasons. The way that it works is you give it a YAML that says, I want an nginx service. It says, cool, here’s my structure, my deployment, all this kind of stuff. And then there’s the admission controller. You can have an admission controller like OPA or the pod security policy [00:23:00] admission controller.

And they go in and they review your YAML, and they say, this YAML looks legit, everything’s cool. And literally, the API is yes or no. It’s almost a true or false. That’s the only feedback you get. And then it’s just released into the world. And there are some subtle things. You know, one of the features is allow privilege escalation in a Kubernetes pod.

Ashish Rajan: I thought you just made it up,

Mark Manning: No, it’s called allowPrivilegeEscalation, and any security person would go, no, no, don’t allow privilege escalation in my system. But what it really means is something like sudo, right? It’s saying: do you want a normal user to become a different user or assume a different UID?

So it’s saying, do you want to allow sudo? And in that case, there’s a scenario where, okay, I’m not running as root. Maybe I’m running as a different UID, 1000, and that passes the admission controller. The admission controller says, you’re running as non-root.

Everything looks legit. I’m going to pass it on. And then once it gets deployed out there, you go, okay, I’m going to shell in, and then I’m going to run sudo. I’ve turned [00:24:00] into UID zero, turned into root. And I’ve now bypassed the original intent of the policy from before.

So all those controllers, they stop right at the beginning. They go, okay, checked this out, everything’s good, wash my hands of it. And there’s nothing to go back and monitor: did anything change? Did anything get updated or modified? Did anything get manipulated with a patch?

Have we seen any of that stuff? So that’s why Sysdig and a bunch of other organizations are trying to do constant runtime security checks on top of admission controllers.
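
As a rough illustration of the gap Mark describes, here is a minimal Python sketch of a yes-or-no admission check over a pod manifest. It is not the API of OPA or of any real admission controller; the function names `admit_naive` and `admit_strict` are hypothetical, and treating an unset `allowPrivilegeEscalation` as "allowed" is an assumption made for this sketch. The manifest field names (`securityContext`, `runAsNonRoot`, `allowPrivilegeEscalation`) are the standard Kubernetes ones.

```python
def _security_contexts(pod: dict) -> list:
    """Collect the securityContext of every container in a pod manifest."""
    containers = pod.get("spec", {}).get("containers", [])
    return [c.get("securityContext", {}) for c in containers]


def admit_naive(pod: dict) -> bool:
    """Hypothetical check: only asks whether containers run as non-root."""
    return all(sc.get("runAsNonRoot") is True for sc in _security_contexts(pod))


def admit_strict(pod: dict) -> bool:
    """Hypothetical check: non-root AND privilege escalation explicitly disabled.

    Assumption for this sketch: a missing allowPrivilegeEscalation field is
    treated as "allowed", so only an explicit False passes.
    """
    return all(
        sc.get("runAsNonRoot") is True
        and sc.get("allowPrivilegeEscalation") is False
        for sc in _security_contexts(pod)
    )


# A pod that runs as UID 1000 but says nothing about privilege escalation.
pod = {
    "metadata": {"name": "web"},
    "spec": {
        "containers": [
            {
                "name": "nginx",
                "image": "nginx",
                "securityContext": {"runAsNonRoot": True, "runAsUser": 1000},
            }
        ]
    },
}

print(admit_naive(pod))   # passes the non-root check
print(admit_strict(pod))  # rejected: escalation was never disabled
```

Either way the answer is a single boolean at deploy time, which is the point of the discussion: nothing in this check looks at the container again once it is running.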

Ashish Rajan: Oh, because a lot of people kind of go down the path of the sidecar model for monitoring pods and containers and stuff as well.

And I guess, is that the only way, or have you seen a better way of doing it? What’s a better way, if there is one?

Mark Manning: There’s a bunch of cool ways. From just an auditing perspective, I’m really into the eBPF stuff. Sysdig Falco has kind of two modes.

For example, you can do eBPF mode or you can do Linux kernel module mode. They’re both just at each node, [00:25:00] monitoring stuff on your behalf. And they’re keeping track of all your containers, as opposed to needing to attach a sidecar to every single process and see what’s going on with all that kind of stuff.

I think that’s the more modern approach.

Ashish Rajan: Interesting. And by the way, just to your comment earlier about pod security policy, I think we had another comment: what are your thoughts on pod security policy? I guess they work in some context or another. I guess that’s the short answer, but you can expand again

if you want to add any bits in there.

Mark Manning: Pod security policies are dead. And they’ve been replaced with, I think, something like pod security guidelines. I’m sorry, I’m not up to date on the latest. They just changed this recently. So the issue is there’s this new version where they’re going to go out and say, okay, pod security policies are gone.

And then there’s going to be a version where we don’t have any pod security policies and we don’t really have any replacement, because there’ll be this alpha testing API. And about two versions after that, we’ll have a kind of replacement for pod security policy that addresses a lot of the stuff that’s there.

My perspective is what I’ve already been telling customers for years, when I was giving [00:26:00] consultations: you just go straight to OPA. Because if you look at AKS, Azure’s Kubernetes, they’ve given up on pod security policies, and they say you can’t even run any of them.

And they’ve done everything with OPA built into it. So OPA seems to be the way forward. Usually the combination is OPA plus OPA Gatekeeper plus the OPA constraint framework, which basically is a drop-in replacement for pod security policies. It provides the same functionality.

So they’ve all but been given up on, and, you know, I don’t want to give any recommendations one way or another.

Ashish Rajan: Yeah, I was going to say Magno agrees with you as well: PSP being deprecated, check out OPA or... I don’t even know how to say this.

Mark Manning: Kyverno. There’s also k-rail. There are a couple of them that are just replacements for pod security policies.

They just give us those guardrails of, here’s exactly how you’re supposed to run stuff.

Ashish Rajan: There you go. So hopefully that means the risk analysis job is going to be easy as well: just go do one of these and make sure the OPA policies are checked in. So we kind of spoke about a few elements of doing a risk analysis of [00:27:00] Kubernetes clusters. As you’ve already pointed out, there’s 3000 devs, one cluster.

That’s one scenario, but there could be a scenario where you go down the whole cloud model and go one cluster per business unit, with however many teams there are in there. How does security work at that scale? How would you do a risk analysis of that? And is there a way to secure that?

Like, what’s the word?

Mark Manning: Yeah. The coolest organization I’ve seen do that had a control plane for the Kubernetes clusters. They were running a Kubernetes cluster that would build Kubernetes clusters for them. So it would be this declarative thing, because they believed that

every single business unit, like you’re saying, should have their own Kubernetes cluster. But even better than that, they said the business units should be able to tear it down and rebuild it whenever they wanted. So they made this abstraction over the whole thing and then said, go to it.

So, yeah. Getting a handle on how you audit that, what the risk of that thing is, what guidelines you provide in the end: it gives you so much more control if you do [00:28:00] that stuff, right? Because you’re not saying, here, good luck, go build your own random Kubernetes cluster.

You’re saying, I’m going to curate the perfect Kubernetes cluster on your behalf, and I’m going to just hand it off to you. So you get to choose all the runtime stuff that you want to do at the organization level. You can just say, well, we don’t want anyone to run privileged pods, the end, and try to work it out that way.

So I think having more control is really what everyone’s trying to get to. Abstracting things away from users’ control is the end goal.
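
To make the "no privileged pods, the end" idea concrete, here is a small Python sketch of an organization-level check run across one cluster per business unit. It assumes the pod manifests have already been exported from each cluster into plain dictionaries; the function name `privileged_containers` and the cluster names are hypothetical, and only the `securityContext.privileged` field is a real Kubernetes setting.

```python
def privileged_containers(clusters: dict) -> list:
    """Return (cluster, pod, container) for every container marked privileged.

    `clusters` maps a cluster name to a list of pod manifests (plain dicts).
    How those manifests are collected is outside the scope of this sketch.
    """
    findings = []
    for cluster_name, pods in sorted(clusters.items()):
        for pod in pods:
            pod_name = pod.get("metadata", {}).get("name", "<unnamed>")
            for container in pod.get("spec", {}).get("containers", []):
                context = container.get("securityContext", {})
                if context.get("privileged") is True:
                    findings.append(
                        (cluster_name, pod_name, container.get("name", "<unnamed>"))
                    )
    return findings


# Hypothetical inventory: two business units, one cluster each.
fleet = {
    "payments": [
        {
            "metadata": {"name": "api"},
            "spec": {"containers": [{"name": "app", "securityContext": {}}]},
        }
    ],
    "analytics": [
        {
            "metadata": {"name": "collector"},
            "spec": {
                "containers": [
                    {"name": "agent", "securityContext": {"privileged": True}}
                ]
            },
        }
    ],
}

for cluster, pod_name, container in privileged_containers(fleet):
    print(f"{cluster}: pod {pod_name}, container {container} is privileged")
```

Because every cluster is stamped out from the same curated definition, the same rule applies across all of them without per-team exceptions.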

Ashish Rajan: Wow. So wait, because I always think multiple Kube clusters means multiple Kubernetes APIs, which means the problem is at a much bigger scale at that point.

And at that point you kind of venture into the territory of, would you rather go for a managed Kubernetes in one of those cloud service providers? Or would you go bare metal or, I don’t know, cloud native or whatever? What would you say your approach would be if we were to go down that path?

Mark Manning: Well, bare metal... I haven’t seen an organization do bare [00:29:00] metal in a really long time. Actually, I take that back. There were some big-name companies that were doing bare metal because they had the resources, they had giant teams to support it, they had all this stuff around it to build this system. But bare metal is all but gone for that reason.

You need Kubernetes experts and geniuses to be able to build the system correctly, because there are so many knobs you can mess up that it’s really easy to shoot yourself in the foot. So everything has been GKE, EKS, AKS, or, if you look at some of the OpenShift stuff, some of the things that Red Hat is doing.

That’s the only way, in my mind, to sanely deploy Kubernetes today. Doing it from scratch is fun and dangerous and super hard, but it should never end up in production.

Ashish Rajan: And there’s the risk of the person who built it leaving the organization and basically leaving you high and dry, I guess. A hundred percent.

Oh, my God. I definitely find that to be a piece where it’s great if you [00:30:00] have resources, but then again, those resources may choose to move out of the company. Then what do you do? Yeah.

Mark Manning: But also, just the stuff that GKE is doing in particular. They’re really making one of the most secure Kubernetes flavors out there by adding in things like Binary Authorization control, which lets you really control the stuff that’s allowed to get executed inside of your cluster.

All this stuff that you just do not have the capability to run. You’d have to write it from scratch or try to rely on some open source stuff. It’s just like, bam, it’s there. Yes, it costs money, but bam, it’s there.

Ashish Rajan: Oh, right. So you reckon GKE is much more mature compared to, say, AWS offerings or Azure offerings?

Do you feel that GKE is more security forward thinking?

Mark Manning: I think it is now. You know, Borg came out and it kind of turned into Kubernetes at Google. Google originally released Kubernetes, and then AWS kind of came in and dunked on them. They came in and were like, I’m going to take Kubernetes and I’m going to make EKS.

And at that time that was [00:31:00] an amazing service, and GKE just didn’t match up. But now I think the current state of it is that there are so many more extra features that Google has bolted on to GKE as a support system. If you just look at the Kubernetes portion, EKS and GKE are about the same, but some of the capabilities that Google provides over EKS are, in my opinion,

by far more secure. But I think I’m kind of biased.

Ashish Rajan: No, no, nothing wrong with that. At least it’s good to know the overall landscape as well. It started well with EKS and now GKE is taking on the race. You’d hope so, considering they started the thing almost 15 years ago. So it’s good that they’re catching up.

I’ve got a couple more comments, and Viplovee is asking: isn’t Rancher used to manage the Kubernetes cluster? Rancher is another platform. But what are your thoughts?

Mark Manning: I thought Rancher kind of pivoted. I think they got bought out by SUSE Linux or something like that.

But what I’ve seen Rancher for is the K3s stuff. I don’t know if you’ve seen any of that. It’s like a [00:32:00] minimized version of Kubernetes.

Ashish Rajan: I’ve got minikube, or whatever it’s called.

Mark Manning: Yeah. Yeah. It’s kind of in the same vein as minikube, but it’s also designed for, okay, I want to do maybe a small Kubernetes cluster on a bunch of Raspberry Pis.

You can use K3s because it’s got a lower resource footprint. Or maybe I want to run Kubernetes on some edge system or some IoT thing, whatever it is. It’s just so much more low profile. So you can do the same level of orchestration. And that’s what I saw Rancher was doing, to kind of go in that direction.

A really cool pivot.

Ashish Rajan: Smart play from them. I’ve got a good question from Nneka as well. So many good questions today. “Apologies if you have already covered this: what’s your opinion on threat modeling versus performing a risk analysis, especially if this analysis is being performed by the dev team?” We haven’t covered this question, and it’s a great question.

Mark Manning: Yeah, that’s an awesome one. So my take is, you know, at Snowflake I do a lot of risk analysis as a security architect, where I need to determine the risk of whatever we’re trying to do, and [00:33:00] then also mitigation packages for it. Right. I go, okay, I want to do XYZ

in Kubernetes. I need to understand the threats. I need to understand the attack vectors. I need to understand the overall impact and that kind of stuff. And then once we get into the development stages and the engineering stages, that’s actually when we do more threat modeling. It’s a subtle difference between threat modeling and risk analysis.

But risk analysis usually factors in business impact pretty heavily, looking at the assets we’re trying to protect, whereas threat modeling is super useful when you’ve already got the system there and you go, let’s just see how successful we are at defending against these threats.

So, you know, I’m biased. I work in security: do both. Right? But it depends on what phase of maturity you’re in.

Ashish Rajan: Have you seen one done with devs on Kubernetes, like a threat model being run on Kubernetes by devs? Have you seen that?

Mark Manning: Yeah. I mean, you have to factor it in. And I think this is maybe to the point: it’s important to understand that the Kubernetes world and the DevOps world really [00:34:00] focus on a lot more control for developers, and I sympathize with that. But also,

we understand them as a risk, right? We just need to factor them in. That doesn’t mean that they’re nefarious. But when we’re doing a risk analysis, we have to say, you know, a developer going in over SSH into your Kubernetes cluster is a risk. Doing stuff from a laptop is a risk.

So I think you have to factor them in.

Ashish Rajan: Yeah, appreciate that. And I think she has got a follow-on, saying introducing the culture is a challenge. What methodology would you recommend for performing risk analysis for Kubernetes? I think we’ve covered the initial part, but the culture of running analysis in the SDLC is...

Mark Manning: The thing that I’m into lately for risk analysis has been usually like a mixture of, of the Magoo system.

If you’ve, if you’ve seen this stuff, if you search for like Magoo risk analysis

Ashish Rajan: all right. Okay. I’ll put them in show notes as well.

Mark Manning: Yep. , , and he makes kind of like a, a short kind of concise version of how to do risk analysis stuff. And then the other one that that really is like more formalized is called engineering trustworthy systems is like the [00:35:00] Bible of doing risk analysis that, that I kind of have been following and believe in. So like, those are, those are how to do it. But culture part, like, man, that’s really tough, right? Like how to get people to, to accept this stuff. But like snowflake has been. Really good at this actually in reality.

Cause we, we do a, the product security team before I came there and started like a developer driven security program. That was designed to be like, look, we’ve got a security team, but they’re not doing any of the security. It’s all you guys. You’re the developers. And we helped to make sure that all of them understand how to threat model, how to define the threats and risks for each of their environments.

And they do that for each of the applications that go into Kubernetes clusters, right? Like. Analyze the risks of the stuff. And then we’ll do security reviews on them and threat model them. And like, it’s all, it’s all part of a program, but I don’t know how they got that culture, but they’ve had it there for awhile, but they’ve successfully convinced them all that, that culture is

important.

Ashish Rajan: Awesome. That’s a great answer as well. Hopefully that answered your question, Nneka, but feel free to ask if you have more questions. Another one from [00:36:00] Magno: what do you think about drift detection?

Do you think it’s a good approach to identify any suspicious cluster changes? Which tools do you recommend?

Mark Manning: I don’t do any drift detection stuff right now. I have not been successful at doing any of the drift detection stuff is what I’ll say. I’m, I’m totally supportive of it. And I haven’t done any assessments of any organizations that, that are using it right now.

So I’ll just say that that’s, that’s out of my pay grade.

Ashish Rajan: Would I be right at least to I guess peel that one more layer for people in a different direction. Would that be the change in the deployment config versus what’s in the cluster or would that be more, someone has click-ops it.

Mark Manning: Similar to the previous question, we’re just like understanding the current state versus the proposed state, right?

You have the same problem with Terraform, right? Here’s this deployment; I go out and I’ve deployed it. How do I actually know that it’s in the same state that I was expecting it to be? It means going back later and asking, has anything been modified, has it been updated, and doing diffs on stuff.

You can analyze a diff between the code that you [00:37:00] deployed, here’s what I put into the system, and what actually comes out, which is usually completely different because there are all these things that get injected. Maybe you’ve got an Istio sidecar that gets injected.

Maybe you’ve got things that get bolted on there. So it’s really hard to determine whether or not it’s still the same as what you expect.
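
The diffing problem can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a drift-detection tool: it compares container names and images between a declared manifest and the live object, both given as plain dictionaries, and it skips containers on an allow-list of expected injections. The function names are hypothetical; `istio-proxy` is the usual name of the sidecar container that Istio injects.

```python
def container_images(manifest: dict) -> dict:
    """Map container name -> image for a pod manifest."""
    containers = manifest.get("spec", {}).get("containers", [])
    return {c["name"]: c["image"] for c in containers}


def detect_drift(declared: dict, live: dict,
                 expected_injected=frozenset({"istio-proxy"})) -> list:
    """List differences between what was declared and what is running.

    Containers named in `expected_injected` are ignored when they appear
    only in the live object, since a mutating webhook is expected to add them.
    """
    want = container_images(declared)
    have = container_images(live)
    drift = []
    for name in sorted(set(want) | set(have)):
        if name in expected_injected and name not in want:
            continue  # known, intentional injection
        if name not in have:
            drift.append(f"missing container: {name}")
        elif name not in want:
            drift.append(f"unexpected container: {name}")
        elif want[name] != have[name]:
            drift.append(f"image changed for {name}: {want[name]} -> {have[name]}")
    return drift


declared = {"spec": {"containers": [{"name": "web", "image": "nginx:1.25"}]}}

# Live state with only the expected sidecar added: no drift reported.
live_ok = {"spec": {"containers": [
    {"name": "web", "image": "nginx:1.25"},
    {"name": "istio-proxy", "image": "example.invalid/proxy:1"},
]}}

# Live state where the image was patched and an extra container appeared.
live_bad = {"spec": {"containers": [
    {"name": "web", "image": "nginx:1.26"},
    {"name": "debug", "image": "busybox"},
]}}

print(detect_drift(declared, live_ok))
print(detect_drift(declared, live_bad))
```

The allow-list is the hard part in practice: every legitimately injected or defaulted field has to be accounted for before a remaining difference can be called suspicious.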

Ashish Rajan: The reason I ask is that the assumption people have when they walk into the Kubernetes world is that everything is abstracted. You just define your deployment package, and the Kubernetes API will take care of deploying the whole thing, and it will remain reliable.

That was number one, right? And then you almost pick up all these other tools that you’re kind of shoved into because of whatever may have been lacking initially, and people found an easier way. I feel like we’ve kind of pulled Kubernetes, at least to some extent, into the other things that we used to do with cloud

and on-premise, where there’s a need for a different direction now. It feels like that.

Mark Manning: Yeah. I think Kubernetes provides an API, right? It’s much better than the cloud. It’s so hard to really take an inventory of the cloud [00:38:00] when you’ve got a giant environment, knowing everything and saying it’s not drifting. Kubernetes has the opportunity of doing that.

I used to do this on every single engagement that I did for Kubernetes. I would ask, why are you deploying Kubernetes? And some people would say things like, well, for performance reasons, or scalability. Generally, I haven’t seen anybody substantiate any of those claims. But I have seen the best answer,

and this seems to be true: it’s when you want to do multi-cloud stuff and you don’t care what cloud you deploy to. Kubernetes provides a way of doing most of the stuff you’re doing in the cloud already, but across any cloud. It’s cloud agnostic. It’s a single API. You do it the same way on GKE versus EKS versus AKS, with subtle differences, of course.

I think that’s the main selling point for Kubernetes, and it’s an opportunity for us. Drift detection is much harder in the cloud. If you happen to port things over to Kubernetes, it’s much easier to do drift detection in Kubernetes than it would be across your entire cloud or your [00:39:00] fleet of systems.

Ashish Rajan: Oh, that’s a great point. So between GKE, AKS, and, I guess, the three major cloud providers, if I build it in, say, EKS, moving across to GKE or AKS is not a lot of hard work?

Mark Manning: It’s designed to be trivial. Yeah. And then it’s on the operators

who manage GKE or EKS for the organization to make sure that the security is in parity across them. But in terms of the API, it’s almost one-to-one. There are some differences in identity across the different ones and all this kind of stuff. But when you ask, does this YAML work in this system versus this system?

Yeah, you know, nine times out of 10 it will, without much manipulation at all.

Ashish Rajan: Oh, right. I’ve got another question from Magno: do you see Kubernetes being used for infra orchestration in the cloud, like what they’re trying to do with Crossplane? What are your security concerns if that happens?

Mark Manning: Yeah. Relatively recently, in the last couple of years, they made the cloud API for Kubernetes, and I was so excited that we finally said, [00:40:00] okay, where is the security boundary that you can really rely on in Kubernetes? It’s not at the pod level; that’s kind of mushy. There’s some at the node or the node pool level.

They’re trying to do some stuff there, but at the end of the day, if I wanted to be absolutely sure that there’s not going to be a hack, I would say it’s at the cluster level. And then they started doing things like, well, now we’re going to allow inter-cluster communication, and we’re going to have multi-cluster Istio deployments and all this kind of stuff.

So I think there’s a tool that just got released that tries to pull out all the container stuff from Kubernetes and just make it an orchestration platform as a whole. It could be for cloud components, it could be for running IoT boxes, it could be for whatever. It’s just the orchestration part of Kubernetes, throwing out the containers.

And I think that might be where the future of a lot of stuff goes: just taking the scalability of Kubernetes and that control API that they’ve done really well, and seeing where they can take it.

Ashish Rajan: We’ve kind of spoken about a few things for risk analysis and quite a few [00:41:00] changes that have happened. But one thing that I’ve taken away from all of this conversation is that if you’re thinking of doing a risk analysis of a Kubernetes cluster or deployment, you’re going to need to get your hands dirty and at least understand the terminology and what’s involved in it.

So keeping that in mind for people who may have listened in and gone. Man. There’s a lot of stuff to learn. Where do you start, I guess, with learning and what’s your way that worked for you. And maybe some people can take inspiration from that as well to learn Kubernetes.

Mark Manning: Yeah. I’ve been going with the hands-on stuff from the beginning. I still like minikube.

If you can set up a VM that has Kubernetes... A lot of people are doing K3s, which I mentioned before. If you’ve got some old Raspberry Pis hanging around, you can make a Kubernetes cluster pretty easily. It’s easy to set up. They even have k3sup (it’s said like “ketchup”); it’ll actually deploy K3s to the Raspberry Pis automatically, and all this kind of stuff.

So it’s actually really, really easy to play around with this stuff. I would recommend that. There’s the kind tool, which is just like minikube, but it’s [00:42:00] Kubernetes in Docker. So it’s actually inside of a Docker container and does 90% of the stuff without having a virtual machine.

So super nice for that kind of thing. And then, for me, it would be, I want to make my own unnecessary cluster. And that’s what I kept doing. I don’t need to run Kubernetes at my house, but I do it, you know, just for reasons, to mess around with it and really appreciate trying to do things in it.

So I would go, I want to implement OPA with the constraint framework. I want to implement Istio. I want to implement some CNI. And you can do most of this stuff inside minikube, and it’ll give you a feel for the primitives of how everything works together, in my opinion.

Ashish Rajan: Wow. So all the 50 people who are watching across the stream right now that that’s the answer for learning to start building it, start building it.

That’s pretty much it. All right. This is the last segment of the podcast, and this is 3 FUN questions. None, none of them technical, but hopefully not. And I just joked just to get to know you a bit more, so just want to quickly go through them as well. [00:43:00] What do you spend most time on when you’re not working on Kubernetes or technology?

Mark Manning: Man it’s for me it’s been a bunch of zero trust stuff. Like it’s, it’s still, it’s still technology, like, okay. There’s like the family things that I do around the house and that kind of thing. But I have these two topics that I’m really focused on, which is Kubernetes. And the other thing is like understanding and appreciating.

Zero trust and zero trust, authentication stuff. So that’s my, that’s my other topic that I’m working hard on. Oh,

Ashish Rajan: It sounds like I might have to bring you on for zero trust as well, but I’m going to leave that for the future. Next question: what is something that you’re proud of, but is not on your social media?

Mark Manning: Yeah, you know, I don’t talk too much about some of this stuff. Like, BSides Rochester is a local hacker conference that has been going on for more than a decade. And we built a hackerspace like 10 years ago. I make references to it, but those are the things that I’m kind of proud of: that there’s been this community in my little city

of hackers. And I like to see these generations of hackers coming out of the city, going on to do really awesome stuff. So I’m really proud of that.

Ashish Rajan: Wow. [00:44:00] I guess, for reference, there’s a talk on YouTube from you as well, right? The Rochester one.

Mark Manning: BSides Rochester. Yeah.

So yeah, it’s been going on for a while. Yeah, for sure.

Ashish Rajan: Wow. There you go. I’m glad you, yeah, they are creating community for hacking for next generation as well. Last question? What’s your favorite cuisine or restaurant that you can share?

Mark Manning: Oh, man, what’s a restaurant, since COVID?

Ashish Rajan: Oh yeah. What are those things? I never knew that they existed now.

Mark Manning: Some of my favorite, what I always ask my mother to make me on my birthday, is her handmade lasagna.

So that’s, that’s my go-to. I can’t go run like 10 miles, like work it off, but like, that’s, that’s still worth it.

Ashish Rajan: Great timing considering it’s mother’s day mother’s day as well. So everyone said, Oh, I’d like, Did he just slip it in. So if you’re listening to this, make the

Mark Manning: lasagna, I’m going to ask you guilty to like what a podcast is first and then she’ll yeah.

Ashish Rajan: Oh yeah. Fair enough. Like, why do you spend so much time on audio? Like radio slide? What about television is a video podcast. Optional. Yeah. Would you please? Oh, so that’s pretty, this is pretty good. Awesome, man. I think I’m going to say thank [00:45:00] you, but I just wanted to thank you. Thank you to you as well.

I just want to say. Thank you, Mark. I had a great time. I learned so much and clearly everyone over here learned so much as well. Where can people find you to basically, how do you for questions afterwards?

Mark Manning: Find me on Twitter as antitree. I’m happy to take DMs about all kinds of crazy stuff you’ve got going on. So feel free.

Ashish Rajan: Raspberry Pi, Kubernetes, everything zero trust. Let’s do it. If you guys were looking for social engineering, that’s the way to go. This is zero trust. Actually, the social engineering would not have happened if there was a zero trust model. Just food for thought, right?

That’s all right, but thanks so much. And thank you everyone else who joined us today and I will see you all next week, but if you haven’t followed already.

Please follow Cloud Security Podcast on its social media. That’s where we go live every weekend. Otherwise we’ll see you next week, and I’ll see you soon, Mark. Thanks. Cool. Thank you, man. Appreciate it.