What is a Security Data Lake?


Episode Description

What We Discuss with Omer Singer:

  • 00:00 Intro
  • 04:59 What is Cloud Native?
  • 06:43 What is a Security Data Lake?
  • 16:18 Do we need a Security Data Lake?
  • 19:35 Where do SOAR + Playbook fit in?
  • 22:30 Leveraging existing data analytics
  • 25:21 What’s driving adoption of data analytics in Security?
  • 28:11 How to get immediate value?
  • 33:19 Trading Usability and Effectiveness in terms of reduction over attack surface?
  • 38:11 Security Data Warehouse?
  • 40:18 SIEM vs Security Data Lake
  • 43:02 How do you measure the value of a Security Data Lake?
  • 45:27 How do you govern a Security Data Lake?
  • 46:37 AWS, Azure or GCP to create a Security Data Lake?
  • 47:47 Where do you start to automate your workload?
  • 49:36 Future roles and where to start learning about this?
  • 53:40 The Fun Section
  • And much more…

THANKS, Omer Singer!


And if you want us to answer your questions on one of our upcoming weekly Feedback Friday episodes, drop us a line at ashish@kaizenteq.com.

Resources from This Episode:

  • Tools & services, discussed during the Interview

Ashish Rajan: Welcome to another episode of Virtual Coffee with Ashish. I am your host, Ashish. We’re talking about data today, specifically the security data lake. It’s going to be an interesting topic because I’ve had so many questions about it. I can’t wait to share some more information, but before I do, let me bring my guest in. I will say he has picked a real banger of a song.

That’s the hint for the song. Hey, welcome, Omer.

Omer Singer: I’m great. I’m great. Thanks for that.

Ashish Rajan: No, thanks for coming in, man. It’s always fun to have other security professionals coming in and sharing some of their knowledge, especially in the newer areas of cybersecurity. So for people who may not know who Omer is, can you tell us a bit about yourself and how you got to where you are today?

Omer Singer: Yeah, sure. So, Omer Singer, computer engineer by trade. I live in Santa Monica, California, and I’ve been in cybersecurity for over 15 years now. I got my start when I was at USC in LA back in 2005. They were kind of early to the cyber world, and as part of the computer engineering program they let us specialize. There was a specialty in cybersecurity, [00:01:00] which I thought was cool, because I figured I could be building things, but it can be a lot more fun to be breaking things. So I gave it a shot, got into cybersecurity then, and never left.

I’ve been on both sides. I’ve been on the defensive side, in security operations center roles, and then on the offensive side, in cyber intelligence gathering roles, which was interesting as well. By the way, fun fact: you had that ad earlier, and I actually served with the co-founder and CEO of Axonius. He and I did some interesting things together back in the day. It’s a pretty small community here in cybersecurity, and we get to know each other over the years.

And yeah, for the last four years I’ve been with Snowflake, initially leading security and building up that team, building the security program on top of Snowflake as a security data lake. That’s the topic we can chat about today. For the last year or so I’ve had this title, head of cybersecurity strategy. You and I were just talking before we started about how that’s kind of a title that I invented, but I’m having a lot of fun doing it.

Ashish Rajan: Yeah. That’s pretty awesome, man. I think we definitely need to peel off the layers of the [00:02:00] strategy title, but to your point about the security data lake: this is Cloud Native Month on the Cloud Security Podcast. So what does cloud native mean for you?

Omer Singer: So cloud native is the idea that, with more and more of our security happening in the cloud, we have an opportunity to rethink our approach. Cloud native is taking a fresh approach, with things that are unique to cloud, in order to secure cloud infrastructure. I’ll give you an example. I mentioned that I started back in 2005. I was doing vulnerability assessments as kind of my first gig in cybersecurity. I’d come with the laptop, I had Nessus installed on there, and we’d plug into the network and start scanning, right? Then we’d come out with a list of vulnerabilities and we’d help the customer patch those issues.

Now you fast forward to today, and there’s still a lot of vulnerability scanning going on, even in cloud networks. If you install Tenable or Nessus on an EC2 server and scan the VPC, that’s not cloud native, right? Cloud native is taking an approach like side scanning, where you have some exciting startups in [00:03:00] cybersecurity doing side scanning techniques. They’re looking at the snapshots that the IaaS provider takes and using those to understand what the vulnerabilities are.

Now, that’s something you could only do in the cloud. You cannot do that outside of the cloud, but it has all sorts of advantages. That’s cloud native.

Ashish Rajan: That’s an awesome answer. By the way, the company that you’re hinting at there, I think they came on last month to talk about side scanning everything.

Yeah, so those guys came in as well. Awesome. And I think this is probably a good segue into our topic as well. So chip likes to do security in cloud. And so I’m going to start with the very obvious thing then: what is a security data lake? Let’s start with that.

Omer Singer: Sure. So a security data lake is the idea that you’re going to use a general-purpose data platform to do security use cases like threat detection, incident response, and evaluating security posture. You might call that security analytics. It’s as opposed to using a dedicated, separate stack for security. Security data lakes have been around for a while. You had security data lakes back in, I would say, 2013, 2014 [00:04:00] on Hadoop, which was exciting but never lived up to its promise because of some of the shortcomings of Hadoop at the time.

But the idea is that you’re taking general-purpose big data technology and you’re applying it to do whatever you need to do. And...

Ashish Rajan: And I think you said something really interesting when you say using a generic data lake platform, because when I talk about data lakes to a lot of people, they always think S3 bucket or something else. So we’re not talking about an S3 bucket over here?

Omer Singer: Well, we might be, we might be. I mean, this is where I think we have both a big problem and a big opportunity.

As an industry, in cybersecurity, we like to think that we’re hot shit, that we’re really technical, very technically savvy. Actually, as an industry, we’ve fallen way behind a lot of other industries when it comes to data analytics and making use of data. If you look at marketing and finance, they’re doing a lot more interesting things with BI, ML, predictive analytics, all sorts of really exciting things with data analytics and data science. In [00:05:00] cybersecurity, we really haven’t, and it’s been holding us back. I think the opportunity is aligning with what the rest of the enterprise is doing in order to get value from data. So look, if the company where you’re at is using an S3-based data lake in order to understand, for example, what ads to serve up or what the financial picture is going to look like for next year, and you in cybersecurity want to align to that, then do it. Basically, join the company where it’s doing its data analytics.

For a security team that works at a company that’s a Snowflake customer, that might be getting on Snowflake. If you work at a company where BigQuery is the central data platform, then you get on BigQuery. The important thing is cybersecurity aligning to the rest of the company where they’re doing interesting things with data.

And just one last thing on S3: this is not about archiving, right? That’s a big misconception, where people think about the security data lake as a place to just archive a bunch of data, dump it, and hey, if I have an incident, I’ll figure out a way to [00:06:00] restore it. That’s really leaving a lot of value on the table and probably introducing a lot of risk.

There’s really a lot of very interesting analytics that you can be doing as a security team.

Ashish Rajan: I think that is true because a lot of the SPP has a store, an archive, so you can save costs 200 as well.

Omer Singer: Good point. Yeah. So the data, right? Not just the storage problem, we need to have an analytics.

Ashish Rajan: Yeah. And to your point, since you mentioned it like that, the S3 bucket is literally just storage.

I mean, there’s nothing running on top of it. It’s just: this is your data, do whatever you want with it. You have to add that layer on top. It’s not just a matter of, hey, I’ve stored everything, now help me get all the information that I need. I think you need to run some kind of Athena or something on top of it.

Anyway, I’m wording it wrong, but where I’m going with this is that it’s really interesting you mentioned that it’s more than an S3 bucket. So are we using the data to perform analytics? Are we behind the times? I thought the SIEMs of the world are... maybe if you talk about what a SIEM is first.

Omer Singer: Let’s define that. I [00:07:00] know we even framed this topic as, what is the future of SIEM given the security data lake? I think SIEM as a category is really going to be challenged. If you think about a SIEM, it’s security information and event management: the idea that I have a lot of alerts coming from all sorts of different sensors, and I also have a lot of raw forensic data coming off of my endpoints that the EDR is generating.

My IaaS environment is also recording so many facts about things that are changing, things that are happening, connections being established. As a security team, I need a place where I can go and either use that to have accurate threat detections, or be in a position to investigate something, respond to an incident, and mitigate the threat before serious damage is done.

Generally, for most companies, that’s the SIEM, right? That’s the nerve center of the SOC. That’s where all the data is going. That’s where the alerts are ultimately coming from. The problem is that the SIEMs of today are built on technology that’s just not scaling, [00:08:00] and there’s so much security data out there now that it’s a full-time job just to have all this data in one place where you can ask questions of it.

It’s become really hard. So when you talk to people who actually use a SIEM, you hear, first of all, that it might be the biggest line item in the security budget. So a ton of spend. And then in terms of visibility, we say it’s the place where all our data goes, when in actuality it’s getting maybe a third of the data, in some cases a quarter, and much of the data can’t be pulled in because these solutions are so expensive that they limit what you can collect and how long you can keep it. That’s putting us in a really bad place from an analytics perspective, because table stakes for good analytics is having all your data in one place where you can actually use it.

As long as the SIEM is not able to support that, it’s not going to be successful. That’s why I think this is a really important topic to be talking about, and to get informed on some of the exciting changes that are happening.

Ashish Rajan: I like that you mentioned that, because the term I think I’ve read in one of your blogs is the whole vertically [00:09:00] integrated SIEM, which is not going to work. So when you say vertical integration, where everyone’s trying to come into that one point, would that be different in a security data lake? Would that not be working?

Omer Singer: It would. If you look at the SIEM solutions, we can call out who some of the big players are.

Splunk’s the biggest SIEM player by far in terms of market share. You have Elastic having their own SIEM and Sumo Logic having their own SIEM. What all those technologies have in common is that they are log management solutions. They have their search engines, and they’re really excited.

What they’re meant to solve is mainly DevOps use cases where you need to very quickly be able to ask questions of the data: observability, log management types of use cases. And they do a good job of that. But if we need to get full visibility on all the data and really be able to ask questions across petabytes of data, going back a year or more, that’s a big data challenge.

And big data is a hard problem. Take a company that tries to solve the big data analytics problem and the security-specific challenges around [00:10:00] all the integrations, around automatically identifying TTPs across all the different parts of the infrastructure, and having an easy interface for analysts to use.

It’s just too much. When you try to do too much, you end up not doing anything particularly well. So the model we now see emerging is that the data problem you put on the data platform, and that gets solved there. And you have this new category of tools coming out called XDR, for extended detection and response.

What the open XDR players are saying is: look, you put the data wherever you want to put the data. Have the data in a place where you can cost-effectively centralize it and aggregate it, and we’re going to be laser-focused on pulling the data in, normalizing it, enriching it, using it to do accurate threat detection, and supporting efficient incident response.

That kind of separation of duties, that kind of best of breed with everybody focused on what they do best, is the model that is not vertically integrated, and it ends up being much more effective. That’s where we see our customers moving.

Ashish Rajan: That’s awesome, and definitely interesting, because a few people have kind [00:11:00] of got questions, and I want to switch over to that for a second.

So, a question from MS over here: “Maybe it’s just me, but I’m curious to know, why do we need another data lake for security? Why don’t security folks build their own AI/ML-based solutions related to security scans?”

Omer Singer: It’s a great question. I see what you’re saying, and you hit on it exactly. Thank you for calling this out. I’m not proposing that security has a separate data lake, a separate data stack. In fact, what I’m saying is that security should join the rest of the company in the data stack that is serving all the other departments.

We have this really kind of silly situation where the entire enterprise is running on some data platform and security has its separate data stack. And then it’s really struggling to collect the data, to have interesting insights, and to be data-driven. So we should not try to build a separate stack.

That’s exactly what we’re trying to say here. The security data lake nomenclature is not ideal. I kind of struggle with that as well, and if anybody has a better suggestion for what we should call this architecture, I’m listening. At Snowflake we are trying to have more content on this, and we just aligned to the [00:12:00] terminology that was out there, so that people kind of know what the hell we’re talking about when we say use your standard general-purpose data platform for security use cases, as well as everything else.

But yeah, it’s definitely not separate. In fact, if the cybersecurity teams were more with it when it comes to big data, I don’t think we would need one.

You don’t have a finance data lake or a marketing data lake. No, it’s just the data platform. Whether it’s a data lake or a data warehouse doesn’t matter; just call it the data platform. They just use the central one. And I think over time, yeah, maybe we’ll just call it security analytics and everybody will know what we’re talking about.

Ashish Rajan: I was thinking that as well, because to your point, it does make sense that you don’t want to add on to the existing thing.

It’s more about reusing what your company is already doing with data analytics. Just tap into that, because they’re already everywhere.

Omer Singer: And it’s hard to get to a single source of truth. Let’s not underestimate the challenge there. That’s a really important thing to do.

It’s what then enables everything else. If you think about your security program today, [00:13:00] you probably have some forensic data out with the EDR vendor. You have SaaS logs out with a SaaS vendor. You have maybe a multi-cloud environment, and each cloud has its own logs. And then you have maybe some corporate data going into the SIEM. Look at all those silos.

How are you supposed to automate detection across that? Are analysts supposed to be piecing that together in their heads? Of course that’s not going to work. And then each one has its own retention period, so now you have some of the data for some of the time. Anyway, I think this is a really important job, especially for anybody tasked with securing a cloud-centric infrastructure where there’s just so much data: treat this as a big data problem, align to the big data solution that the company has selected, and then focus on how you use all that data to do a good job everywhere else.
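The silo problem Omer describes can be illustrated with a minimal sketch. It assumes the EDR, SaaS, and cloud logs have already been parsed into dictionaries; the field names (`user`, `time`, `action`), the sample events, and the three-sources-in-15-minutes rule are all hypothetical and chosen only to show why a detection is easier to express once the data sits in one place.

```python
from datetime import datetime, timedelta

# Hypothetical, already-parsed events from three separate silos.
# Field names and values are illustrative assumptions, not a real schema.
edr_events = [
    {"user": "alice", "time": datetime(2021, 5, 1, 9, 0), "action": "suspicious_process"},
]
saas_events = [
    {"user": "alice", "time": datetime(2021, 5, 1, 9, 5), "action": "bulk_download"},
    {"user": "bob", "time": datetime(2021, 5, 1, 10, 0), "action": "login"},
]
cloud_events = [
    {"user": "alice", "time": datetime(2021, 5, 1, 9, 8), "action": "create_access_key"},
]


def unified_timeline(**sources):
    """Tag each event with its source and merge everything into one time-ordered list."""
    merged = []
    for name, events in sources.items():
        for event in events:
            merged.append({**event, "source": name})
    return sorted(merged, key=lambda e: e["time"])


def users_active_across_sources(timeline, min_sources=3, window=timedelta(minutes=15)):
    """Return users seen in at least `min_sources` distinct sources within `window`."""
    by_user = {}
    for event in timeline:  # timeline is sorted, so each per-user list is sorted too
        by_user.setdefault(event["user"], []).append(event)

    flagged = set()
    for user, events in by_user.items():
        for i, first in enumerate(events):
            in_window = [e for e in events[i:] if e["time"] - first["time"] <= window]
            if len({e["source"] for e in in_window}) >= min_sources:
                flagged.add(user)
                break
    return flagged


timeline = unified_timeline(edr=edr_events, saas=saas_events, cloud=cloud_events)
flagged = users_active_across_sources(timeline)
print(flagged)
```

With the data split across three vendors, the same check would need three separate queries and a manual join, and differing retention periods could remove one leg of the correlation entirely.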

Ashish Rajan: Awesome answer, man. I have a question from Wasim as well: “Omer, do you see strategies such as SOAR and playbooks naturally bolting onto a security data lake easily? It’s really where everyone is spending their time to build automation.”

Omer Singer: Yeah, I do. I think SOAR is another one of these categories where everybody was super excited about SOAR, the huge promise, and I think the results have been a little bit underwhelming. [00:14:00] I don’t know, maybe, Wasim, you guys have done an exceptional job with your SOAR, but I talk to many security teams that bought a SOAR solution and were really getting into it. You ask them what they have automated with SOAR, and they’ll tell you, “Our phishing investigations are fully automated with SOAR.” And it’s like, okay, well, there’s probably more that we could be doing. I think the reason we haven’t seen SOAR really deliver is that there’s so much noise in the detection upstream from the SOAR, which automates what you do afterwards.

There’s so much noise in the detection, so many false positives, that if you try to hook up your detections to your automation, these robots will be going crazy, because they’re being sent everywhere and nothing would be done as it was supposed to be done.

What you need is high-fidelity detections so that you can actually have automation and take the human out of the loop. Today, most security teams have to have people reviewing the detections before anything can happen downstream. And once you’ve got the person in there, that’s it, you’re done with automation. It’s just not going to happen.

So I think once we have a single source of [00:15:00] truth in the security data lake, where you have all the data and an analytics engine where you can use context, you can use that complete picture to finally have accurate detections without all the false positives. Then you can have SOAR actually start automating a lot more and get a lot more value from that.

It’s just that there are these things that need to be done first, but absolutely, it’s all going in the same direction.

Ashish Rajan: Oh, that’s a great answer. I think Wasim has another one, which is: “MS, for what it’s worth, I’ve been working in a large bank that has been building the enterprise data lake.

They hired a bunch of PhD data scientists who spent years building it, and the biggest challenge is that it’s still hard to aggregate the data at scale and correlate it sensibly.” So that’s a comment on what MS was saying. And Wasim also agrees that SOAR was snake oil. I think all of us are still waiting for the promised land to be seen for SOAR.

This is really awesome content as well, man. I’m glad we clarified the fact that, hey, we’re not trying to build our own security data lake, but just trying to bolt onto the existing one. It sounds like it needs to be an organization that is already doing some kind of data [00:16:00] analytics projects.

So anyone who’s listening to this can probably go, hey, my company’s already doing data analytics, how do I tap into that to get some security information? Would that be the right way to explain what a security data lake is as well?

Omer Singer: Well, I think one of the nice things is that this movement, and I really do think that this is a movement of cybersecurity joining the rest of the company in doing analytics, has been gaining steam, and vendors have been recognizing it. We’re seeing more and more vendors in the detection and response space, including traditional SIEM vendors, saying: we’re going to embrace this architecture, and we’re not going to require you as the customer to send us a separate slice of the data for us to hold, where if you want this data, you’ve got to come to us. You’re seeing more and more vendors saying, we’re going to come to you, to where the data is.

And it’s coming from lots of different directions. We had early partnerships with XDR players like Hunters.ai. Panther Labs has this approach. Securonix, the highest-scoring SIEM vendor in the Gartner MQ, [00:17:00] now has an open, bring-your-own-data-lake architecture. Exabeam announced that they’re doing it. So you don’t need to start from scratch. You don’t need to depend on data scientists just to get started.

Over time, you can make use of the fact that you have all this data in a standard general-purpose data platform that supports data science to then hire data scientists, or partner with existing data scientists in your company, and do more with that. But there are a lot of capabilities backed by vendors that support this architecture, and you can see some great progress just by working with an open vendor that supports it.

The vendors that remain committed to that vertical integration are the ones, I think, that are going to continue to hold security teams back. I think they’re on an evolutionary dead end, because if you’re trying to do both big data and security analytics as well as companies that focus on one or the other, you’re just not going to be able to keep up.

Ashish Rajan: Yeah. And it’s really interesting what you mentioned about data analytics and data scientists, because I don’t know that many [00:18:00] companies doing this. It seems like there’s a movement happening, to what you just said. Are there a lot of companies that have already adopted this?

The only company that I know of is Netflix. When I was talking to them two or three years ago, they were hiring for data analytics, and I was like, why are you hiring a data analytics person for security? That was me being surprised three years ago.

But now it seems like it’s the norm: if you have a lot of data to work with, you definitely need some data analytics people in your team. So what’s driving the adoption, you reckon? Is it more the fact that people are realising SIEM sucks? Wait, did I say that out loud?

Omer Singer: Yeah, well, definitely companies are doing it.

It’s not just Netflix anymore. Last year, at Snowflake’s award ceremony, which is like the Oscars of Snowflake once a year, the Comcast security analytics team took home the award for the data science category. They’re doing a great job over there. And it doesn’t all have to be AI and really scary, fancy stuff.

Some of this data analytics is just BI: just having self-service reports where people outside of the [00:19:00] security org can, for the first time, have direct access to some of the data, the insights, the metrics. For example, HR might have a role to play in your security program, because they need to make sure that terminated employees get decommissioned quickly.

They don’t do the decommissioning themselves, but they need to start the process. In many companies, that process doesn’t happen flawlessly. Sometimes employees still have access to their different systems for days or weeks after their termination. That’s scary, right? So if you can reflect to the HR VP how their organization is doing in terms of opening tickets on a timely basis, they’re probably going to do a better job, because they have that visibility and they know others have that visibility. Maybe the CFO, the CIO, or the CEO has visibility in their Tableau or Looker dashboards.

So they’re going to do what it takes to do better there. That’s analytics. If you just have that, that works. People joining security sometimes ask me, what is the best skill I should learn to get into cybersecurity? And we tell them: learn SQL, because it’s in such short supply [00:20:00] today within cybersecurity.

It’s such an enabler that I think it’s a great skill to have. So I think that’s accelerating. And again, companies like Panther Labs are saying you don’t need to build this yourself: we have the integrations that are going to turn your Snowflake into a SIEM. The fact that they have that available off the shelf, ready to go, makes this accessible to many more security teams, including the teams that see themselves as engineers.

They know Python, they want to build. So having this very scalable, kind of limitless platform that has all the data, where you can crunch it as much as you want, they have a lot of fun with it. And so they’re embracing it.
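The terminated-employee report Omer mentions is, at its core, a join between HR data and access logs. On a data platform that would typically be written as a SQL join; the sketch below uses plain Python only so that it is self-contained. The record shapes, user names, and timestamps are hypothetical.

```python
from datetime import datetime

# Hypothetical HR records: user -> termination timestamp.
terminations = {
    "carol": datetime(2021, 4, 30, 17, 0),
    "dave": datetime(2021, 5, 3, 12, 0),
}

# Hypothetical login events collected from different systems.
logins = [
    {"user": "carol", "system": "crm", "time": datetime(2021, 5, 2, 8, 30)},
    {"user": "carol", "system": "vpn", "time": datetime(2021, 4, 29, 9, 0)},
    {"user": "dave", "system": "email", "time": datetime(2021, 5, 3, 9, 0)},
    {"user": "erin", "system": "crm", "time": datetime(2021, 5, 4, 9, 0)},
]


def logins_after_termination(terminations, logins):
    """Return logins that happened after the user's recorded termination time."""
    findings = []
    for login in logins:
        terminated_at = terminations.get(login["user"])
        if terminated_at is not None and login["time"] > terminated_at:
            # Keep the delay so a report can show how long access lingered.
            findings.append({**login, "delay": login["time"] - terminated_at})
    return findings


findings = logins_after_termination(terminations, logins)
for finding in findings:
    print(finding["user"], finding["system"], finding["delay"])
```

The `delay` value is the kind of metric that could feed a self-service dashboard for HR, since it shows how long access remained usable after termination.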

Ashish Rajan: Oh, that’s good to hear. I want to ask another question from Wasim here as well.

“Omer, do you find a good place to start is consuming cloud-native telemetry first, from cloud sources such as CloudTrail, then tackling on-prem next (Syslog, SHA, passive DNS)? Or do you pull as much as you can at once? What are your thoughts here to show immediate value?”

Omer Singer: That’s such an important question, because a lot of times people are approaching a security data lake project, and this is kind of what I do, I work with teams that are starting [00:21:00] out these projects, and they’ll start from the sources: what do I bring in? I think this is maybe a relic of the SIEM poisoning our minds a little bit and making this all about log management.

Guys, we’re not here to manage logs. We’re not here to check a compliance box. We’re here to do better detection and response, right? At the end of the day, that’s what we’re trying to do. And hey, if we could be proactive and lock down these environments, that’s great too. So start with that. Start with the business outcomes that you’re looking to achieve, rather than the data sources. There are lots of them. I think the question is: what are your crown jewels, who are the threat actors that are going to be going after your crown jewels, and then work backwards from there. What you may find, and this goes back to cloud native, is that a focus on proactive hardening might be more rewarding than better threat detection. Maybe you start off by not having any logs ingested, but you start snapshotting your configuration records in AWS and checking: are my security groups locked down the way they’re supposed to be or not? Maybe that’s the most valuable thing.

It’s going to be about the [00:22:00] business outcomes and being able to define and measure them. Treat this as an engineering project that you design, and afterwards you’ll get to: okay, if this is what I want to do, what are the data sets that I need? Where are they found today? How do I go and collect them? And then get to collecting them.

Maybe they’ll be in the cloud. Maybe they’ll be on-prem. Maybe they’ll be out in SaaS. But work backwards from the business outcomes that you’re trying to achieve.
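As an illustration of the configuration-snapshot idea, here is a minimal sketch that checks whether security groups are open to the whole internet. The snapshot is a hand-written list whose shape loosely follows the EC2 security group description format, trimmed to the fields used here; the group IDs and the policy that only port 443 may be public are assumptions made for the example, and IPv6 ranges are ignored for brevity.

```python
# Hypothetical configuration snapshot, trimmed to the fields this check needs.
snapshot = [
    {"GroupId": "sg-web", "IpPermissions": [
        {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
    ]},
    {"GroupId": "sg-admin", "IpPermissions": [
        {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
    ]},
    {"GroupId": "sg-db", "IpPermissions": [
        {"IpProtocol": "tcp", "FromPort": 5432, "ToPort": 5432,
         "IpRanges": [{"CidrIp": "10.0.0.0/16"}]},
    ]},
]

# Assumed policy: only these ports may be reachable from anywhere.
ALLOWED_PUBLIC_PORTS = {443}


def open_to_world(snapshot, allowed_ports=ALLOWED_PUBLIC_PORTS):
    """Return (group_id, port_range) pairs for rules exposing disallowed ports to 0.0.0.0/0."""
    violations = []
    for group in snapshot:
        for rule in group.get("IpPermissions", []):
            public = any(r.get("CidrIp") == "0.0.0.0/0" for r in rule.get("IpRanges", []))
            if not public:
                continue
            from_port, to_port = rule.get("FromPort"), rule.get("ToPort")
            # Assumption: a rule with no port range applies to all ports.
            if from_port is None or to_port is None:
                violations.append((group["GroupId"], "all"))
                continue
            exposed = set(range(from_port, to_port + 1)) - allowed_ports
            if exposed:
                violations.append((group["GroupId"], f"{from_port}-{to_port}"))
    return violations


violations = open_to_world(snapshot)
print(violations)
```

Running the same check against each periodic snapshot gives a measurable hardening outcome without ingesting any logs, which is the starting point Omer suggests.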

Ashish Rajan: Interesting. So start with the crown jewels, in the areas where their data has landed and data analytics is already taking place, and then work backwards?

Omer Singer: Crown jewels, wherever they are. Maybe what you’re trying to secure is a physical warehouse. I work with a customer called Prologis; they’re the largest operator of physical warehouses in the country.

They collect all sorts of data to Snowflake, including badge data: who’s badging into the physical warehouses. Maybe they want to make sure that if somebody is badging in, they’re really around. If they’re on vacation in Hawaii, why did they just badge in, right?

Maybe somebody stole their identity or stole their badge or whatever. So it’s not even necessarily about where your data is. It’s using the data to protect your crown jewels, [00:23:00] whatever they may be.
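The badge example comes down to joining security data with business data. A minimal sketch, assuming hypothetical leave records from HR and hypothetical badge events (the names, sites, and dates are invented for illustration and do not describe any customer’s actual setup):

```python
from datetime import date, datetime

# Hypothetical HR leave records: user -> list of (first_day, last_day) ranges.
vacations = {"frank": [(date(2021, 6, 1), date(2021, 6, 10))]}

# Hypothetical badge reader events.
badge_events = [
    {"user": "frank", "site": "warehouse-12", "time": datetime(2021, 6, 3, 2, 15)},
    {"user": "frank", "site": "warehouse-12", "time": datetime(2021, 6, 14, 8, 0)},
    {"user": "grace", "site": "warehouse-07", "time": datetime(2021, 6, 3, 8, 0)},
]


def badge_ins_during_leave(vacations, badge_events):
    """Return badge events that fall inside the badge holder's recorded leave."""
    suspicious = []
    for event in badge_events:
        day = event["time"].date()
        for start, end in vacations.get(event["user"], []):
            if start <= day <= end:
                suspicious.append(event)
                break
    return suspicious


suspicious = badge_ins_during_leave(vacations, badge_events)
for event in suspicious:
    print(event["user"], event["site"], event["time"])
```

The check is only possible because the leave calendar and the badge logs are queryable in the same place.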

Ashish Rajan: And to take another leap from what Wasim was asking about immediate value: our value proposition hasn’t really changed.

It’s just how... I’ve used the words data sources again. What we’re using to show, hey, this is how we’re giving value to the organization as security, that hasn’t changed. That still remains the same. It’s just that we’re trying to better ourselves by not using the SIEM as the way to do detection and response.

We already have these data lakes around in the company. How can we tap into some of those to find out what could be a source of truth? Would that be right?

Omer Singer: Yeah. And I think maybe the reason why, as an industry, we’re so focused on what data sets we bring in is that with the SIEM it was so hard to bring in the data.

It was like, oh, this is too much data, I’m going to spend a bunch of time shaving it down: bring in my Windows events, but if it’s event four to four and then all four to zero, I don’t want to bring that in, it’s too noisy. And then we just fixate on these sources. Let’s assume that you have all the data.

Let’s just start with that assumption. Let’s say you have all the data for as long as you need, and it’s in a [00:24:00] place where you can join it with the business data. You can join it with threat intelligence. You really have the full picture, cybersecurity 360. Now let’s start asking the hard questions: what do we care about as a security team?

What do we care about? Use that to drive your strategy.

Ashish Rajan: That’s awesome. I think the telemetry and the whole standard question was something that came up last week as well, where each application provider has their own standard for writing a log, and you’re trying to figure out, hey, what does this really mean? And then in your SIEM there’s a whole other language at that point that I have to use to try and make some sense of all these sources coming in.

So I appreciate that. Now for the questions that came in from the community, for the folks who could not make it to the live: we have the ones from the Cloud Security Forum. The first question they asked was about the layers of knobs to turn: ownership of database objects, secure view XL, encryption, even bring your own key (BYOK).

How would you prioritize them based on trading off usability and effectiveness in terms of reduction of attack surface, and long-term maintenance costs, obviously?

Omer Singer: Yeah, that sounds like a question that’s coming from somebody that’s responsible [00:25:00] for securing a Snowflake environment. And I’m glad that he’s thinking about that, because it is something where you don’t want to just trust that everything’s working as it’s supposed to be working.

As a security team, yes, think about cloud infrastructure security, but also think about data security. I mean, definitely that’s important. This is where I would say, as a security team that’s aligning to the data platform the rest of the company is using, work with the people that are managing that data platform and work with all the different advisors.

You mentioned bring your own key. That’s a specific Snowflake feature, so I’m assuming they’re a Snowflake customer. We have an entire team of security field CTOs. Their expertise is data security: how do you lock down this environment? I think it’s important, right?

You have business data, maybe you have security data going in there. You want to make sure it’s locked down. There are a lot of configurations that you can put in place to harden it, like restricting which IP addresses can connect, for example. And then there are the additional solutions that are focused on data security, that check to see, for example, whether you have a user that’s all of a sudden pulling down a thousand records with social security numbers in them.

Well, maybe that’s an insider threat, right? So [00:26:00] you get into classification, you get into monitoring. That’s an entire area. But yeah, I think we should focus on how we, as security teams, use that data platform to have more effective security.
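As a rough illustration of the monitoring idea mentioned above (a user suddenly pulling a thousand records containing social security numbers), here is a minimal sketch. It assumes a query log where classification has already tagged each query; the field names `user`, `rows_returned` and `touches_ssn` are hypothetical, and the threshold of 1000 simply mirrors the number used in the conversation.

```python
from collections import Counter

# Hypothetical query-history rows, already tagged by a classification step.
query_log = [
    {"user": "analyst_a", "rows_returned": 600, "touches_ssn": True},
    {"user": "analyst_a", "rows_returned": 500, "touches_ssn": True},
    {"user": "analyst_b", "rows_returned": 5000, "touches_ssn": False},
    {"user": "analyst_c", "rows_returned": 40, "touches_ssn": True},
]


def flag_bulk_sensitive_reads(query_log, threshold=1000):
    """Sum sensitive rows returned per user and list users at or above the threshold."""
    totals = Counter()
    for entry in query_log:
        if entry["touches_ssn"]:
            totals[entry["user"]] += entry["rows_returned"]
    return sorted(user for user, rows in totals.items() if rows >= threshold)


flagged = flag_bulk_sensitive_reads(query_log)
print(flagged)
```

A real deployment would also need a time window and a per-user baseline; this sketch only shows the aggregation step that turns classified query activity into an investigation lead.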

Ashish Rajan: All right. Okay. And to your point, there are two parts to this as well. One is the security side, like you said: we’re trying to grab information from the data lake, so how do I get my pulse of security from that particular data source? But then there’s the other side: how do I secure this thing as well?

Because I want the right people to have access to this data, because potentially some of that could be sensitive as well. There could be PII in there and a lot of other things in there. So there are two sides to this as well.

Omer Singer: Yeah. And we see attackers, they’re running a business. They’re always following where the money is.

And the more that data is the new oil, the more that companies have their most sensitive IP and a lot of value in these data platforms, the more attackers are going to go after it. So yeah, beat them to it. An added side benefit of having the security program run on the central data platform is that security becomes more familiar with that data platform, right?

Because they’re using it. It’s one of their tools now. So now it’s not some strange thing off to the [00:27:00] side. They have a better sense of it. They start feeling like, okay, I understand what RBAC means in a data platform. Well, let me ask some questions about what our RBAC policy is for this, right?

Ashish Rajan: Right. And thanks for clarifying that as well, ’cause I think it’s definitely worthwhile separating those two things out. One is just, how do you do it internally? But then the other part is, how do you mitigate the threat to the platform itself? The other question that came from that was: as S3 has a lot of different tiering options, would Snowflake consider going in this direction? A data lake may be flat, but they consume a lot of high-priority information, like a lot of sensitive information. So they obviously want it.

Omer Singer: Stay tuned. We’re always looking for ways to make our storage even more cost-effective. Last quarter we released some compression improvements behind the scenes that improved compression by 30%.

So all of a sudden storage dropped by up to 30%, very significant for some customers. And if there are ways to use different S3 options behind the scenes to achieve further savings, I think it’s something that we’ll be looking at. But the first thing is to tap into the cost-effectiveness of cloud-native storage, right?

The reality is, for most SIEMs out there today, you [00:28:00] send data there and they’re actually loading it onto some server, which is going to be very expensive storage, and not using the cloud-native storage. In Snowflake, if you’re running Snowflake on AWS, it is S3 under the hood, and that’s how we can charge $23 a terabyte a month for storage. And we charge for compressed data.

So it’s actually going to be up to 10x compression on machine data, and you might be looking at 10 terabytes for $23 a month. That’s why I can constantly say the limitations are removed. The technology is out there now to have all your data in one place for as long as you need. Now, let’s talk about what you can do.
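The arithmetic behind that claim can be written out as a small sketch. The $23 per compressed terabyte per month and the "up to 10x" compression ratio are the figures quoted in the conversation; the function name and the idea of passing the ratio as a parameter are illustrative assumptions, and actual compression depends on the data.

```python
PRICE_PER_COMPRESSED_TB_MONTH = 23.0  # USD, the figure quoted in the conversation


def monthly_storage_cost(raw_tb, compression_ratio):
    """Estimate monthly storage cost when billing is on compressed size."""
    compressed_tb = raw_tb / compression_ratio
    return compressed_tb * PRICE_PER_COMPRESSED_TB_MONTH


# 10 TB of raw machine data at the best-case 10x ratio is billed as 1 TB.
best_case = monthly_storage_cost(raw_tb=10, compression_ratio=10)

# The same 10 TB with no compression at all, for comparison.
uncompressed = monthly_storage_cost(raw_tb=10, compression_ratio=1)

print(f"best case: ${best_case:.2f}/month, uncompressed: ${uncompressed:.2f}/month")
```

Because 10x is an upper bound, the realistic monthly cost for 10 TB of raw logs falls somewhere between the two numbers this prints.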

Ashish Rajan: Awesome. And the next question from the forum was: is there a security data warehouse too? Or is it just security data lake? What is the difference between the data lake and data warehouse?

Omer Singer: We can get into that. Honestly, I had no idea what a data warehouse was before I joined Snowflake, which just goes to show how much we in cybersecurity don’t know enough about this area. They come from different places. The data warehouse was very structured. It was columnar, and you needed to have all the data very well structured. That’s hard for security, by the way, because we have all these different data sources, they have all these different shapes [00:29:00] and sizes, and formats are always changing.

Some sources will call it source IP, some sources will call it src_ip. It’s a mess, right? A lot of it is JSON, and data warehouses couldn’t hold that semi-structured data. Then you had data lakes, which were all about the semi-structured data. But they had their own problems in terms of actually working with the data and having governance on the data.

What we’re seeing now, though, is that all the data platforms are really meeting in the middle. They’re taking the best of all the different cloud warehouse stuff and the best of the data lake stuff, and they’re having it in, in this case, a data platform. And so Snowflake doesn’t call itself a data warehouse anymore.

You have other players in the space coming from the data lake side, and now they’re adding data warehouse capabilities. I wouldn’t get into those semantics. I think the point is you want to be using a cloud data platform. The important thing is that the storage is very cheap and limitless, and that when you need to ask tough questions, you have a lot of compute to throw at the problem.

Those are the things that a modern cloud data platform is gonna give you.
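To make the "source IP versus src_ip" mess concrete, here is a minimal sketch of normalizing semi-structured JSON log lines onto one field name. The alias table and the sample log lines are hypothetical; real sources will need their own mappings.

```python
import json

# Hypothetical aliases that different log sources might use for the same field.
FIELD_ALIASES = {
    "source_ip": "src_ip",
    "sourceIp": "src_ip",
    "SRC_IP": "src_ip",
}


def normalize(raw_line):
    """Parse one JSON log line and rename known aliases to a canonical key."""
    record = json.loads(raw_line)
    return {FIELD_ALIASES.get(key, key): value for key, value in record.items()}


lines = [
    '{"source_ip": "10.0.0.1", "action": "login"}',
    '{"SRC_IP": "10.0.0.2", "action": "logout"}',
    '{"src_ip": "10.0.0.3", "action": "login"}',
]
records = [normalize(line) for line in lines]
print(sorted(record["src_ip"] for record in records))
```

Keys that are not in the alias table pass through untouched, which is the property a semi-structured store preserves: unknown fields are kept rather than rejected by a rigid schema.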

Ashish Rajan: Awesome. Thanks for that. And of course, a few questions came from David Matosich that were only left on LinkedIn [00:30:00] yesterday. So how is the security data lake different from a SIEM? I think we have answered that earlier, but I think it’s worthwhile calling out for someone who’s listening to this and going, hey, a SIEM gives me the same information a security data lake would give me, but it seems like I would need a data lake platform, I guess. Am I creating a data lake from scratch, or do I have to buy from a vendor to do this, when I already have a SIEM?

Omer Singer: Don’t get too hung up on these categorizations, because a lot of this is just analysts and vendors making stuff up to have something to sell you.

I think the important thing is you need to have a place where you can put all your security data and be able to analyze it. And a SIEM has a certain architecture. It’s an architecture that is traditionally vertically integrated. It came from the log management kind of requirements. So your SIEM will typically have collectors, storage, analytics, and some investigation interface.

That’s usually what a SIEM is. What the security data lake is saying is: that storage and compute engine, let’s rip that out. Let’s [00:31:00] take everything that’s left and bolt that onto the data platform of the company. That’s the security data lake approach. It’s a different architecture.

These aren’t solutions as much as they are architectures. And then you go and look, and you have a SIEM like Securonix saying that they support bring your own Snowflake. Well, they’re still a SIEM, but they’re not a SIEM in the traditional sense, because what they’re saying is they’re going to come to wherever your data is stored.

They’re going to accommodate that. They’re going to support that direction where you’re going, of joining the rest of the company on a central data platform. And so I’m considering that a security data lake architecture. That’s the important thing.

Ashish Rajan: I’m smiling because I remember, at a lot of companies that I consulted for, there was always a regular theme where, no matter what application you architect, all logs are going into the SIEM, irrespective.

That’s almost like a Hitler rule: all logs should go into the SIEM. What happens after that, we do not care, but push them into the SIEM. But I’m starting to see even more the value that a security data lake would bring to an organization. You’re [00:32:00] not asking people, hey, I want all your logs, send them over here.

You already have the logs. You keep them where you keep them, in your data lake. I’m just tapping into that existing platform to understand how I make some sense of this so I can detect threats and respond to them in a timely manner. So I love that approach, because we’d almost be removing that friction of, hey, I want to have all the logs in this amazing platform.

But no, no, no. You already have the platform. So I love it.

The other one was: how do you measure the value a security data lake provides to a customer versus a business executive?

Omer Singer: Yeah, I think measuring value is important. How do you go about doing it? It goes back to what your security program is all about. Well, the first thing is, can you collect all the data?

I think if you can show that before, you were trying to juggle multiple data silos, you had some of the data going to the SIEM, some of the data going into some S3 bucket, some of the data staying with the EDR vendor, and now you say, look, as an incident responder, for example, I know that if I go to this data platform, I’m going to find my logs there.

Even if it happened nine months ago on some [00:33:00] server somewhere in some cloud, I go to the data platform, that single source of truth, and it’s going to be there. I think that’s valuable, right? I think if you can also start measuring key risk indicators and show that they’re trending in a good direction, that’s one.

I’ll give you an example. When I was on the security engineering team at Snowflake, we considered visibility to be really important for us, because if we don’t have visibility, everything else that we’re doing downstream of that is going to be questionable, right? And so we started tracking that. We took our inventory, all the servers that we have, and we had that in one table in the database, and then we had another table with all of the CloudTrail logs or agent logs, whatever it is.

And we started comparing inventory to logs. If we see a system in inventory that we don’t see in the logs, that’s a visibility gap. And if we boil it down to one number, just a percentage, that’s what we can then report and show to all the different stakeholders and say, by the end of the quarter, can we take this from 70% to 80%?

And if we do, then great, that’s value. And if we don’t, that’s still value, because then at least we know that, for whatever reason, our DevOps team doesn’t have the [00:34:00] tools or the manpower or whatever they need to successfully achieve the visibility targets that we’re going after. So that kind of quantified approach, that measured approach, is something that you’ll see a data platform really supporting, because that’s that data warehouse, BI background really shining through.

It’s all about measuring things and reporting on them. But now that extends to logs and other data sources as well.
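The inventory-versus-logs comparison described above boils down to a set difference and one percentage. A minimal sketch, assuming the inventory table and the log table can each be reduced to a list of host identifiers; the host names here are hypothetical.

```python
def visibility_coverage(inventory, hosts_seen_in_logs):
    """Return (coverage percentage, hosts in inventory that never appear in logs)."""
    inventory = set(inventory)
    if not inventory:
        return 100.0, set()  # nothing to cover, so nothing is missing
    gaps = inventory - set(hosts_seen_in_logs)
    covered = len(inventory) - len(gaps)
    return 100.0 * covered / len(inventory), gaps


# Hypothetical data: ten servers in inventory, seven of them reporting logs.
inventory = [f"srv-{n:02d}" for n in range(1, 11)]
hosts_seen_in_logs = [f"srv-{n:02d}" for n in range(1, 8)] + ["unknown-host"]

coverage, gaps = visibility_coverage(inventory, hosts_seen_in_logs)
print(f"visibility: {coverage:.0f}%, gaps: {sorted(gaps)}")
```

Hosts that show up in logs but not in inventory (like `unknown-host` here) are ignored by this metric; they would be a separate, inventory-quality question.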

Ashish Rajan: Oh, and what about governance? The final question was: how do you govern a data lake with different classifications of data in one place? ’Cause I imagine everyone has PII and non-PII. How does governance work in that space then?

Omer Singer: Yeah. I think this is one of the challenges with an S3 data lake: governance becomes really, really hard. It’s something that at Snowflake we spend a lot of time on, and there are policies that you can apply. Obviously we’re not going to go into them here, but yeah, there are techniques out there, and there are tools that go in and do classification and then track what analysts are querying for. Are they querying for a bunch of records that we understand have social security numbers, PII, or financial records, et cetera?

And then [00:35:00] have that as an input to the security team: hey, investigate that. I think increasingly we will see sources like that driving what the security operation is doing, rather than just malware on a laptop. Like, actually, some analyst that seems to be really keen on some private data, that’s also something that security should investigate.

Ashish Rajan: Awesome. Thanks for answering those questions, and thanks for sending them through, David Matousek as well as the Cloud Security Forum. And thank you for your patience as well, Andres. So Andres’s question is: AWS, GCP, or Azure to create a security data lake?

Omer Singer: Well, I gotta say that I’m biased, being at Snowflake, and Snowflake runs in all three.

So I would say, if you’re using Snowflake, it really doesn’t matter which of the three. And again, align to what the rest of the company is doing. I think security needs a partnership with the data organization. So Andres, if the data strategy where you work is to be on Azure, build your security data lake on Azure.

But if the company is going all in on AWS as the place where the rest of the data lives, well, join them there. Like, [00:36:00] this is really a partnership now, and you’re going to really love not having to worry so much about all the underlying stuff, just knowing that you have the data there and now doing interesting things with it.

Ashish Rajan: Awesome. If you have a follow-up question, feel free to ask it as well. Andres has asked another question as well: I was reading one of your blogs about the Panther and Snowflake integration. What would be your advice for a company that wants to automate their workload ASAP with all major features of a traditional SIEM to start with?

I think the ask is for advice on what’s required first, I guess. How do you start?

Omer Singer: I mean, reach out to these vendors. They’re experts at this. When I started this journey at Snowflake back in 2018, we didn’t have vendors supporting this. So it was a bunch of open source. I hired a bunch of Python developers.

We were building tooling ourselves, which is hard. And you’re building tooling rather than actually doing the detection and response work, right? So I would say, now take advantage of the fact that you have experts at Panther Labs. You also have experts at the additional solutions, if you require all the features of a traditional SIEM.

We’re starting to see traditional SIEM vendors embracing this [00:37:00] architecture as well, right? This is where you have, for example, Securonix now supporting this architecture, as well as those that are really coming at it with a different approach. If you value detection as code, if you want tests for your detections and CI/CD integration, Panther has built a phenomenal solution that I wish we had when we got started, so we didn’t have to build all that custom tooling ourselves. But I’ll share that we’re actually in the process of fixing that and bringing Panther into the security engineering team at Snowflake. So we’re excited to use their tooling, and they have some great customers out there.

If you check out their website, Figma, for example, is a team that’s using it and recorded a webinar talking about their experience. So I would check that stuff out and reach out to them.
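"Detection as code" with tests, mentioned above, can be illustrated with a generic sketch: the detection is an ordinary function, and its tests run in CI like any other unit test. This is not Panther’s actual rule interface; the event fields (`event_name`, `outcome`, `mfa_used`) and the rule itself are hypothetical.

```python
def rule(event):
    """Return True when a console login succeeded without MFA."""
    return (
        event.get("event_name") == "ConsoleLogin"
        and event.get("outcome") == "Success"
        and not event.get("mfa_used", False)
    )


def test_rule():
    """Unit tests that a CI pipeline could run on every change to the detection."""
    # Should fire: successful login, no MFA.
    assert rule({"event_name": "ConsoleLogin", "outcome": "Success", "mfa_used": False})
    # Should stay quiet: MFA was used.
    assert not rule({"event_name": "ConsoleLogin", "outcome": "Success", "mfa_used": True})
    # Should stay quiet: the login failed.
    assert not rule({"event_name": "ConsoleLogin", "outcome": "Failure", "mfa_used": False})
    # Should stay quiet: unrelated event.
    assert not rule({"event_name": "ListBuckets", "outcome": "Success"})


test_rule()
print("all detection tests passed")
```

Keeping the rule and its tests in version control is what makes review, rollback, and CI/CD integration possible for detections in the same way as for application code.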

Ashish Rajan: Awesome. All right. Thank you. And I think I missed a question before. I think this is a good one.

Are we looking at new roles in the future, like data security engineers or analysts who understand how data lakes work, managing the security of data lakes and the security layer on top of a data lake architecture, monitoring, scanning? So are we opening that up? I think I would say yes, but I’m curious to hear your opinion.

Omer Singer: I think so. I look [00:38:00] at data security as an ecosystem, with so many new solutions out there doing data security. Just like you have application security, I think data security is going to be a thing. Again, I think learning SQL is probably the best investment that anybody in cybersecurity can make today.

And yeah, I never thought about it as its own role. I think as more and more companies are being data-driven and going all in on a data-first strategy, yeah, it’s going to be a great role. And then also in the service of the security program, right?

Because if you understand cybersecurity, you understand CIA, right, confidentiality, integrity, and availability, if you understand attacker TTPs, and then you also understand data engineering, data science, BI, now you combine those two. I mean, wow, you’re going to be a superstar.

Ashish Rajan: Like a unicorn, for sure, at that point.

Omer Singer: Could you hire somebody like that today, with as much budget as you wanted? I don’t think you can hire somebody like that.

Ashish Rajan: Yeah, it would be a really unique skill set as well. But I guess this is probably the final question from my side then. So if [00:39:00] someone wants to start learning about this, where should one start?

Say I’m a security person getting into this. I think I’ve heard of data analytics, I think I know what I want. So how am I upskilling myself to be able to do a security data lake? What’s the to-do?

Omer Singer: I mean, if you want to join now, the nice thing is you’re getting in on the ground floor. You’re joining in really the early days of a movement that I think is going to transform cybersecurity. So definitely do it. You can add me on LinkedIn. I try to post things about this that I’m seeing be successful at different customers that are doing this, using the hashtag security data lake.

So, hashtag security data lake on LinkedIn. If you search for that on LinkedIn, you’re going to find testimonials. You’re going to find customers now starting to talk about this. And just at Black Hat, for example, just this week, right, we had Black Hat, and we had the head of security at Netgear, Pallavi, and she talks about how her team moved from a traditional SIEM to a security data lake, and why and how they did it. [00:40:00] And she has guidance and advice for anybody doing it as well. And it’s so awesome that now, we’re still in very early days in this movement, but we have enough security teams doing it.

You’re going to see leaders talking about it. So watch Pallavi’s video. You also have Julie from Guild Education. She talks about her team’s journey moving from a traditional SIEM to running Panther on Snowflake. So Pallavi used Hunters on Snowflake, and Julie from Guild Education used Panther on Snowflake, but they talk about it, and you’re going to learn a lot hearing what they went through. And then talk to your data engineers, or whoever’s running the data center of excellence in your company. Probably there’s somebody doing that, right? Talk to them and come to them with some of the challenges, like, hey, I was really hoping to do better insider threat detection next year. What do you think about approaching that together? Work with them in partnership, and bring your expertise, and they’ll bring their expertise.

And I think then you’ll find that you’re able to create some early successes and build.

Ashish Rajan: Awesome. All right. So those are the technical questions that I had. I’ve got some fun [00:41:00] questions as well, just three of them. I think this is really interesting and I feel like I could keep going on the data lake pieces, but for the moment, let’s stick with the three fun questions.

First one: what do you spend most time on when you’re not working on security data lake?

Omer Singer: Not working on security data lake? Well, I have a three-year-old and she takes up a lot of my time. So playgrounds, pushing her on the swing, that kind of thing. And then I have a nice little setup in, it’s not really a backyard, more like a porch here, where I have some workout equipment, with being stuck at home, work from home, pandemic closures, who knows what we’re going into.

So I try to also exercise a little bit outside. That’s also good. Yeah.

Ashish Rajan: Yeah, that’s great. That’s definitely a positive use of time as well. Second question: what is something that you’re proud of, but is not on your social media?

Omer Singer: Proud of, but not on social media. I think it’s the security operations center that I ran in my previous role. I was at an MSSP and we needed to do 24 by seven security.

And it’s hard. You do the graveyard shift on that. It’s hard to stay focused and do this [00:42:00] job 24 by seven. And what we did is we took a chance on some people that really were interested in cybersecurity but had no background. Nobody was taking their resumes, and we took a chance on them and we built the training program internally.

And I see now how they’re advancing. By now they’re not doing the graveyard shift anymore, because everybody wants to get out of that, but they’re doing these great roles at great security teams. And I’m really proud of having given those individuals their start. And it’s just so much fun to see where they’re advancing in their security careers.

Ashish Rajan: Wow. Yeah, I think you’ve laid the seed for a great future for a lot of people. That’s pretty awesome. Then the last question: what’s your favorite cuisine or restaurant that you can share?

Omer Singer: Cuisine or a restaurant? I think Italian, man. I think you can’t go wrong with Italian.

Right? You’ve got pizza, or during the week you want to go to a fancy dinner, you need to take the missus out to some fancy Italian restaurant and she’s happy. So yeah, I got to give credit to the Italian cuisine.

Ashish Rajan: That’s awesome, man. So those are the questions that I had, but where can people find you?

I [00:43:00] think you mentioned LinkedIn earlier. Is LinkedIn the best place to hang out and connect with you for people who have more questions?

Omer Singer: It’s LinkedIn. Omer Singer, Omer with an E, Singer. Add me on LinkedIn and reach out. I really feel like this is a movement. So if you’re a part of this movement, if you’re getting started at your company on taking a more data-driven approach to your security program, reach out. Let’s set up a Zoom, let’s catch up and help each other as we try to do better security with better data.

Ashish Rajan: Yeah. Awesome. And I want to quickly give a shout-out for the interesting questions that came in as well, so thank you to everyone who participated. For everyone else, next weekend we’ve got someone from Atlassian coming in, talking about how they’re using an observability platform. So it’s going to be a really interesting conversation, as well as the other part of using cloud native.

But thank you, Omer, for coming in. I really appreciate your time, man. I’ll probably see everyone next time, but thanks so much, man. I think we need to bring you in again, because I still have all the other questions that I had. Awesome, man. Thanks everyone. And I’ll see you all next week. Bye.
