- Thank you, isn't that great?
Was anybody at Fogo De Chao last night?
Right, everybody was at NetWars,
but after NetWars, people were, okay, a couple meat sweats.
I've got the meat sweats right now.
We're going to sweat through this here.
Okay, cool.
So thank you for that wonderful introduction.
Thank you for having me.
It's actually great being back here in the Phoenix area.
It's been several years
and I've never experienced Phoenix this cold.
It's almost like I want to move here.
So, thank you Ismael for covering most of the material here.
You know, I want to share with you today some of the
thoughts I have on my travels and the journey that I'm on
in developing what I characterize as a next generation
analytics platform, next generation SIEM,
for the SOC that I work for at Microsoft,
and nothing here is going to be Microsoft specific.
I'm not going to show you, unfortunately,
how we do it specifically at Microsoft.
What I am going to do however is I'm going to share with you
things I've learned across organizations on how to do this.
I don't claim to have all of the answers
but I have a lot of the questions
and actually that's the inspiration
for what I'm going to bring you today:
some of those design considerations
that we have to think about and take into account
when we're building that next-gen platform.
And I promise you that there's no BS marketing
in any of these slides.
This is an actual sticker on the side of a coffee machine
that I saw in Hyderabad.
The phone numbers have been erased to protect the innocent.
Years ago,
I put together my first production SIEM instance,
15 years ago,
and back then we had many assumptions.
Relational databases were the way you stored lots of data.
15k rpm spinning discs
were fast.
Yup, chuckles.
More than eight cores in one server, man, you were hot shit.
And by the way,
that 7820X there is actually more powerful
than my first production install
for an enterprise of 100,000 users.
But did it have-- (audience laughing)
No it didn't, not to my knowledge, no.
SPARC processors are pretty darn slow.
Sun V880 was my first install.
- Woo! - Yup, woo!
Solaris 8, I think it was, and Oracle, oh my god.
Moving on.
Tens of thousands of, excuse me,
tens of millions of events a day was like a big deal! Yeah!
Look at all those firewall logs!
One monolithic product was sufficient
for your analyst needs.
And the only organization, generally,
that consumed the fruits of all of this stuff was the SOC.
Are any of these assumptions still true?
No!
Yet the design of some
SIEM installs are still based on these assumptions.
There are better ways.
And if any of you have had any experience with big data,
you know what I'm talking about.
But there's more to it than that.
Who are we engineering this capability for?
Is it Tier One, is it Tier Two, is it the hunters?
Is it the responders?
The data scientists, the engineers who are developing it,
the SOC leads or upper management.
Right, yes, Wesley is Tier One.
(audience laughing)
Like not just Wesley like kid Wesley,
right, does it bother anybody that it says "Engineer"
in front of Geordi?
Is he the chief engineer at this point?
No!
Yeah, president of the Star Trek club right here.
I'm using Star Trek to talk about computers,
this doesn't get any geekier than this.
So each of these people have different questions
that they need to ask almost every day.
Where do I start?
What really happened?
Where is the adversary?
What do we do next?
What questions are better at finding the adversary?
What new detections can I create?
How do I keep the system running well?
How is the SOC doing generally,
and how is my security program doing overall?
Engage.
So when we put together this capability,
let alone, by the way, who's missing here?
The biggest user group of them all that needs this data.
Any guesses?
The constituents. Our customers.
Each of these groups have different questions that they ask
and different needs, so when we think about
how do we put together this world-class capability,
we need to think about how are we going to cater
to each of the folks on the bridge, here.
Moreover, it's not just about one data store.
Chances are,
data stores, data lakes, databases, pick your buzzword,
that have security-relevant data in them
are likely proliferating in your enterprise.
You're actually probably no longer the only organization
with a big pile of security-relevant data.
Telemetry from IoT devices, NetFlow.
There's a very long list.
Application logs.
You can't put an agent
on everything you need to defend, necessarily.
Again, we talked about mobile workforce yesterday,
that's one potential example.
And you'll never reach 100%, what is 100% anyways?
It's like me saying,
I'm going to monitor every person in this room.
Somebody just went out, somebody just came back in.
It's a moving target.
The result is that we have to make
careful choices about our resource investments,
as if that wasn't already obvious,
we have to pay just as much attention
to integration as we do aggregation.
In other words, I need to be able to make connections
between these different things,
I'm going to talk about that in a second.
We can't just chase our monitoring percentages
to 100%, right?
You have to make a decision, what percentage of coverage
at any given point in time, is good enough.
One of my very last slides I'm going to talk about
some of the different ways to measure that.
And then finally we have to consider
those different dimensions on how we measure
our monitoring coverage.
It's also not about just one product.
It's potentially about several.
We have our near-real-time alert triage view,
we have ticketing and workflow,
potentially some orchestration if you're real fancy.
Business context, drill down to pcap, file detonation.
The analytic engine that actually drives all of this,
be it atomic detections or otherwise.
The platform you're using to do your data science,
how are you querying and visualizing that data,
in an ad hoc manner or otherwise.
EDR, of all sorts,
particularly super high-fidelity endpoint telemetry,
and threat intelligence.
Every single one of those arrows
is a potential pivot for the analyst,
depending on what your architecture looks like.
We could be talking about three different products here,
in your architecture, we could be talking about 10 or more.
It's an integration job for every engineer
that supports this kind of capability.
There are some products I'm familiar with,
I'm not going to name names here, no BS marketing,
that would cover
four or five of these blocks.
There are other installations, other architectures I've seen
where there's a monolithic capability for each of these.
When we think about drilling down to Tier One for a second,
I had this moment of clarity,
several months ago I was having a conversation with some Tier One folks.
We were talking to them about, what are your requirements
for our next-gen SIEM platform.
And I had these crazy ideas about like,
message buses, which I'll talk about in a second,
and data enrichment,
and we're going to do this for data science,
and blah blah blah.
75% plus of the conversation
revolved around the following points.
They need to be able to triage that data, obviously.
You need to be able to prioritize the alerts,
be able to annotate them, escalate them,
and potentially suppress alerts for a given
username, a given hostname, a given alert or analytic, etc.
They need a clicker that works,
they need to be able to view the details of those analytics,
potentially the syntax,
just like in the olden days, right?
Why do we like a tool like Snort so much?
You can see the actual signatures. Same deal.
Being able to drill down to the actual events
that triggered the outcome of a given analytic.
They need to be able to communicate with other analysts
and their constituents.
And they need to be able to document what they're doing.
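As a concrete illustration of that suppression requirement, here is a minimal Python sketch of a suppression rule keyed on analytic, username, or hostname; the dataclass and field names are illustrative assumptions, not any particular product's API.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SuppressionRule:
    analytic: Optional[str] = None   # suppress a specific analytic or alert name
    username: Optional[str] = None   # ...for a given user
    hostname: Optional[str] = None   # ...or a given host
    reason: str = ""                 # annotation: why this was suppressed

def is_suppressed(alert: dict, rules: list[SuppressionRule]) -> bool:
    """An alert is suppressed if every populated field of some rule matches it."""
    for r in rules:
        checks = [(r.analytic, alert.get("analytic")),
                  (r.username, alert.get("username")),
                  (r.hostname, alert.get("hostname"))]
        if all(want is None for want, _ in checks):
            continue  # empty rule matches nothing
        if all(want is None or want == got for want, got in checks):
            return True
    return False

rules = [SuppressionRule(analytic="brute_force_login", hostname="build-01",
                         reason="known noisy service account")]
print(is_suppressed({"analytic": "brute_force_login", "hostname": "build-01",
                     "username": "svc_build"}, rules))  # True
```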
And my moment of clarity was,
I can build the most amazing architecture
with all the fancy cool stuff
that would make all the big data people be like,
"wow, that's really cool."
And it would make our hunters like,
"wow, that's really cool, I love this."
But if I couldn't nail these pieces right here,
it doesn't matter.
I expect a similar level of fidelity in the requirements
and how we engineer and cater to operators' needs,
for every single one of those folks
on the deck of the bridge.
And when I think about,
how am I bringing all of this data to them,
and all these integrations,
it's actually kind of a spectrum,
and yesterday I heard
most of the items on the spectrum.
The first, the very saddest side of that spectrum, is
we don't know that data exists that we think we would want,
or we don't know where it is.
We have to ask someone for the data, that's slightly better.
We have to manually swivel-chair to it.
We can get to it automatically
from our main triage console through what one vendor calls,
"right-click integrations," there are other names for it,
but the idea is that there's no manual entry necessary.
Oh hey, there's that IP and that alert,
I'm going to move over here to this other console
and go look for, I dunno, pick something, NetFlow data.
We can enrich alerts as they enter our architecture,
as they enter our SIEM.
That, the SIEM, perhaps more ideally,
can actually evaluate those incoming alerts
as they're enriched against that contextual data,
and potentially raise or lower the priority.
Believe it or not, this is actually happening,
starting to happen today in some products.
And then finally, we can take orchestrative actions.
Now, I know that's a lot of buzzwords there,
and in the olden days, right,
I remember this was like 2004 and we were like,
"yeah man, we're going to get a Snort alert,
"and then we're going to fire a Nessa scan,
"it's going to be awesome!"
What is old is new again,
we're still talking about, to a certain extent,
the same ideas, right?
We want to be like, "oh, we got an alert over here,
"we're going to go bring down a memory image,
"or we're going to go pull a reg key
"or something along those lines."
All worthwhile things to do.
We live all day long in items three through seven.
And as the engineers building this platform,
it's our job to bring the capabilities from two and three
down into the happier,
perhaps nirvana part of that spectrum.
And this takes a lot of work.
Admittedly.
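As a rough sketch of what the enrichment and priority-adjustment steps on that spectrum could look like, assuming a made-up asset inventory and a priority scale where 1 is highest; this is illustrative only, not any product's behavior.

```python
# Hedged sketch: enrich an incoming alert with business context and adjust its
# priority. The asset inventory structure and field names are hypothetical.
CRITICAL_ASSETS = {"dc01.corp.example", "payroll-db.corp.example"}

def enrich_and_score(alert: dict, asset_inventory: dict) -> dict:
    host = alert.get("hostname", "")
    context = asset_inventory.get(host, {})
    alert["owner"] = context.get("owner", "unknown")
    alert["business_unit"] = context.get("business_unit", "unknown")
    if host in CRITICAL_ASSETS:
        alert["priority"] = 1                                  # raise: crown-jewel asset
    elif context.get("environment") == "lab":
        alert["priority"] = max(4, alert.get("priority", 3))   # lower: lab/test machine
    return alert
```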
When we think about bringing that data into this system,
remember earlier I said one of
the assumptions we used to make
is that relational databases were
the way we stored all this data.
That's no longer the case.
We have many more options
on how we ingest, persist, and query that data.
So, on the left-hand side here
we have the approach that is a very rigid schema,
kind of old school,
every field that we parse must have a home,
if we have a failure on ingest there's a very good chance
that we'll drop the data, depending on the manner in which
the particular capability was written.
So ingest is generally comparatively expensive,
on the other hand our analyst can expect that
all of the data finds a home in the right field and that,
comparatively, their analytics,
be it near-real-time or otherwise,
are comparatively less expensive.
This is the older way of doing things.
On the opposite side of things,
you have an approach that is more consistent
with modern big data techniques,
where rather than being schema on ingest,
it's schema on query.
Very common technique.
And the implication of this,
is that each data source may end up being
a separate table in your data store.
Or maybe, if it isn't,
a large segment of the data that you're ingesting
is dumped into one field.
Right, this is a very permissive way of collecting data.
So we're doing comparatively less parsing at ingest time,
and consequently our analytics
are comparatively more expensive, right?
Does this scale for a large group of analysts?
No!
There's tremendous inconsistency in what they're looking at,
and particularly the Tier One folks are going to be like,
"what the heck does schema on query mean?"
You can't expect them to have to write
a different query every time they want to query
one data type versus another,
and I'll show you what I mean in just a second.
The approach that
I advocate
is kind of a hybrid approach.
We have options now in these architectures
where maybe we were somewhat permissive at ingest time,
but somewhere in our analytic pipeline
we remap fields, we do semantic field extraction,
normalization, et cetera.
This allows us to be more flexible
on the data we ingest and persist,
but this still gives us a degree of consistency
that our Tier One folks and above have come to expect.
And the thing here is,
you don't need to extract all three zillion fields.
Extract 30, the 30 that matter the most to you.
The ones that matter the most to the near-real-time analytics.
So that when you're writing your queries
and your WHERE clauses specifically,
you're not doing super expensive
field extractions at query time,
rather that has already been done.
And then you can bring your results set
to a much smaller volume,
and then do that more-involved feature extraction.
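A minimal sketch of that hybrid approach, assuming JSON records coming off the pipeline; the promoted fields are illustrative, and the point is only that the handful of fields your near-real-time analytics and WHERE clauses need are extracted up front while everything else stays raw for query time.

```python
import json

PROMOTED_FIELDS = {            # "the 30 that matter" -- a few shown for illustration
    "id.orig_h": "src_ip",
    "id.resp_h": "dst_ip",
    "query":     "dns_query",
    "ts":        "event_time",
}

def normalize(raw_record: str) -> dict:
    """Promote a small set of fields to normalized columns; keep the rest raw."""
    src = json.loads(raw_record)
    out = {norm: src[orig] for orig, norm in PROMOTED_FIELDS.items() if orig in src}
    out["raw"] = raw_record    # the long tail stays available for schema-on-query
    return out
```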
Let me show you one example
of how this will easily manifest itself here.
In one slide, 'cause I've got things reversed.
So we want to collect all of this data,
we want to collect all this data
from all these different places.
And we want to send it to many places.
What's the solution?
Any big data people here?
What's the solution, if I want to collect data from many,
dozens of different things
and actually output that data to many other things,
persistence stores, near-real-time analytics, et cetera.
Any guesses?
Big data, specifically.
It's on the slide, message buses!
I can produce into a message bus
and I can consume from that message bus.
I've got all my sources of data on the left,
I've got the folks that consume that data on the right.
And this gives me a lot of flexibility.
Among other things, I can produce data into different topics.
Those topics have arbitrary names.
I could call them Pomeranian, I could call it Bloodhound,
I could call it Shih Tzu, it doesn't matter.
The point is that I can arrange my data and how I ingest it
in a manner that makes sense to me.
And that message bus is resilient.
It can store my data for a fairly temporary period of time,
24, 48, 72 hours.
And I can consume,
I can have these different recipients of data
consume from those different topics.
Pretty elegant, pretty straight-forward.
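Here is a small sketch of that produce/consume pattern using the open-source kafka-python client; the broker address, topic names, and the handle() function are placeholders, not a reference configuration.

```python
from kafka import KafkaProducer, KafkaConsumer

# Producers on the left-hand side push events into topics of your choosing.
producer = KafkaProducer(bootstrap_servers="broker.example:9092")
producer.send("dns", b'{"query": "evil.example", "src_ip": "10.1.2.3"}')
producer.flush()

# Downstream consumers -- persistence, near-real-time analytics, enrichment --
# each subscribe to the topics they care about, in their own consumer group.
consumer = KafkaConsumer("dns", "http",
                         bootstrap_servers="broker.example:9092",
                         group_id="nrt-analytics",
                         auto_offset_reset="earliest")
for message in consumer:
    handle(message.value)      # hypothetical downstream handler
```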
On the front-end of that capability,
I have my ETL for the kit, for the data sources that need it
and in cases that don't,
for instance if I've got a data source
that can natively produce into that message bus,
so much the better.
'Cause when you've got
all these different stuff lying around,
you've got to have a way of plumbing your data
'cause at the end of the day,
me and a lot of the folks I work with,
we're digital plumbers.
No hairball too small.
I digress.
Well here's the example I was referring to.
Let's consider Bro for a second.
In Bro, we get this fabulous, amazing
layer three through seven, or layer five through seven,
depending on which model you're choosing
and the time of day, day of the week,
metadata.
DNS, HTTP, FTP, et cetera.
And the
features of each different kind of metadata
that I'm extracting from packets off the wire,
vary substantially and this is an eye chart here,
you can see all these different fields
for each of the different kinds of data.
Now, the default behavior that I'm familiar with
is that when you produce this data into Kafka,
which is one very popular message bus technology,
open source, by the way,
each of these different log types
are going to be their own topic.
And then I can subscribe to that.
If I'm an analyst
and I want to query across all these different log types,
that might be very difficult.
On the one hand, you want to be very permissive
about bringing this different data in,
on the other hand, do you want to jam that in to one schema?
To rule them all?
Probably not, 'cause you're losing the value of the data
for the analyst, and, as we talked about yesterday,
there's this long history
of different data standards, different schemas, et cetera,
and I certainly don't want to be yet another person
who says, "I'm going to stand up,
"I'm going to write my own SIEM schema."
Right, not the first person to do that.
The question is, and you have to ask yourselves,
what's the strategy you're going to take.
For example, imagine I have a big data store that allows me
to write queries across multiple tables simultaneously
and I work with such a technology every day.
Imagine you have a big data technology
that allows you to do something called "federated query,"
where you can write one query
against disparate instances of that technology
and gather the results seamlessly.
These are different strategies you can take,
so there's no right or wrong answer here, necessarily.
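One possible sketch of the multi-table strategy, reusing the normalized src_ip and event_time fields from earlier; run_query() is a hypothetical stand-in for whatever client your data store or federated-query layer provides.

```python
SUSPECT_IP = "10.1.2.3"

sql = """
SELECT 'dns'  AS source, event_time, raw FROM bro_dns  WHERE src_ip = %(ip)s
UNION ALL
SELECT 'http' AS source, event_time, raw FROM bro_http WHERE src_ip = %(ip)s
ORDER BY event_time
"""

rows = run_query(sql, {"ip": SUSPECT_IP})   # hypothetical query helper
for source, event_time, raw in rows:
    print(source, event_time, raw[:120])
```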
When we think about collecting all this data into this SOC,
we ask ourselves, does the SOC have an event horizon
like a black hole does?
In the olden days, the answer might've been yes.
Can we afford to do that anymore?
No.
Our constituents, and our upper management
are demanding the data that we collect.
Yet another reason why I need to think about a topology,
message bus or otherwise,
that allows me to gather data from a lot of places
and send it to many places
in a performing and resilient manner.
But let's be careful what we wish for,
or careful about serving the needs of our constituents
without due care.
Let me show you what I mean.
Imagine
that I have stood up as the SOC and I said,
"I am going to host the enterprise log analytics platform."
On a good day, that means I've saved a whole bunch
of folks a whole lot of money.
Great, right?
On a bad day, there's this giant political battle
over who gets to own that data, and who's going to pay for
it. Ask me how I know.
(audience laughing)
No one's ever experienced this before, right?
This giant struggle we get in about
who gets to own the data.
What if we say, we're going to open our platform
with all this wonderful data that's relevant
and not just from a security perspective, to others.
On a good day, this means that I have empowered
my constituents, those who I have given access to
the store to find badness.
Sounds cool, right?
I've democratized security operations!
Boy, that was a lot of buzzwords, I'm sorry.
On a bad day, this means I've got random folks
who aren't necessarily thinking about security,
haven't necessarily been trained on all of these approaches
dealing with incident handling,
and they're going off, they're playing analyst.
And they're like, "oh man, it's sexy today,
"I get to play analyst, I get to find cyber bad guys
"and shoot cyber bullets at them, pew pew."
(audience laughing)
They're going to compromise your investigations,
I've seen this happen, folks.
One sysadmin does something bad, the other sysadmin's like,
"hey man, I saw you do a bad thing.
"You should stop that."
And the SOC never finds out. Not good.
Let's say not only have I given folks access,
but I've empowered them to create their own analytics.
You too can find bad guys.
You too can write your own standing queries,
you too can write your own near-real-time analytics.
Boy, what a great force multiplier this would be.
I could get my service teams, my application owners,
not to have to tell me about
their application necessarily in as much detail,
or explain the whole thing to us
so that we write the analytics, they do it for us, whoa!
This would be so cool, right?
Guess what happens?
I could make the system melt
or I could flood it with false alarms. Not good.
And then finally,
let's say I want to open that access back up to others,
am I going to give all of my constituents
access to all of the data I have collected?
No!
That's suicide.
So I have to find a good platform that gives me some options
in terms of role-based access control.
Where I can say, "hey, this group of folks over here,
"they have access to just the data
"that was collected from their systems."
And perhaps some higher-level overall situational awareness,
something that looks like Threatbutt.com.
Anybody here, Threatbutt.com? Yes!
Anybody put that up in their SOC before?
Yes! I love it.
We have all of the cybers, right.
We're shootin' cyber bullets today, pew pew, it's great.
If you have no idea what I'm talking about,
trust me, go to Threatbutt.com,
it doesn't have malware as far as I'm aware.
(audience laughing)
Yesterday.
What's that? - Turn your speakers on.
- Yeah, right. Yeah, definitely turn your speakers on, sir.
- This kind of stuff also pulls your team into more
non-security events. - Yes!
The comment was,
"this pulls your team into other non-security events,"
Bingo, right? This is a giant rabbit-hole.
To finish off this last point,
I'm going to talk about some mitigations in a second,
to finish this point off,
we've got different users flyin' around,
they're steppin' on each others' toes,
we're askin' the same question
of the same data four different times,
certain queries are super-inefficient, right?
The system is melting, very bad day.
So we don't have the option anymore to say, "go away."
We can't do that anymore politically,
we can't do it financially,
we can't do it in terms of operations,
so we've got to think about
what are the mitigations we're going to put in place
to cope with the fact
that we're opening up access to others.
On the first point,
we can think about shared capital expenses,
we can think about enterprise licensing,
specifically we can think about platforms
that, again, do that thing called federated query,
where it's not just me hoarding all of the data, okay?
What if I allow my analyst
to write federated queries across these disparate stores?
Very cool, very shiny, kind of complicated.
And we establish some kind of clear governance over
who is the custodian of all of this data.
Number two, if we think about opening it to others,
we need to make sure that the SOC owns
the identity and authorization plane for that.
We can think about certain means to lock down access,
I would strongly encourage
that you use something resembling MFA,
Multifactor Authentication.
Maybe you're putting that access
not inside your identity boundary for the SOC
but another identity boundary that's more common,
more familiar to the folks you're serving.
Oh, and (laughs) NDAs.
If you're going to allow people to write a bunch of queries,
or create analytics you need to train them,
just a little bit.
And then finally,
you need to make very intelligent choices up front about
finding a platform that has something resembling
RBAC in it.
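A rough sketch of that scoping idea, with made-up group names and index patterns; in practice you would express this through the platform's own RBAC configuration rather than code like this.

```python
ROLE_SCOPES = {
    "soc-analysts":     {"*"},                        # full access
    "payroll-app-team": {"logs-payroll-*"},           # only data from their systems
    "it-helpdesk":      {"logs-endpoint-summary"},    # higher-level view only
}

def allowed_indices(group: str, requested: set[str]) -> set[str]:
    """Filter a requested set of data stores down to what the group may see."""
    scope = ROLE_SCOPES.get(group, set())
    if "*" in scope:
        return requested
    return {idx for idx in requested
            if any(idx == s or (s.endswith("*") and idx.startswith(s[:-1]))
                   for s in scope)}

print(allowed_indices("payroll-app-team",
                      {"logs-payroll-2018", "logs-domain-controllers"}))
# {'logs-payroll-2018'}
```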
Now, with all this said
there's one other piece of advice I have
that I thought of last night,
when you set this up, make sure you audit the access
from the people who ask for it.
A lot of times people are going to stomp their feet
and clamor and be like, "gimme access, gimme access,"
and then you finally grant them access
after a lot of arm-twisting and teeth-pulling,
and there's going to be a substantial portion of the people
who demanded access who never log in.
Trust me, you want to go and have those logs
for a long period of time,
so when people are like, "you didn't give us access,
"you didn't play well in the sandbox with us,"
that you can say, you know, "sir,
"I'm sorry but we gave your folks access
"and they never bothered to even log in,
"let alone do what they said they demanded to do."
Moving on, so every time I talk about this stuff
someone usually asks at the end,
what about doing this in the cloud.
Cool, let's talk about it.
So I have a few points, the first is, when we consider
a cloud-based logging platform, analytic platform, etc.,
the first question I'm going to ask you is,
is that offering actually platform or software as a service,
or is it dressed-up IaaS?
Are they deploying the capability for you
through some very good automation and optimizations,
but ultimately you're responsible
for maintaining that platform after first deployment.
And for that matter, if it's PaaS or SaaS, is it elastic?
And do you want it to be elastic?
If you start scaling up, your queries start scaling up,
your ingest, do you want that capability
to actually scale out
and then automatically charge you more.
That's a good question.
Are they selling you on a log analytic platform,
are they selling you a full-blown SIEM?
Number two, where are your assets?
There's a lot of companies now, a lot of start-ups,
that are totally cloud-based.
Does it make any sense for an organization
that has gone to the cloud to put just their SOC on prem?
Maybe not, in fact probably not.
Number three, what are the security measures that
that platform offers you,
can they federate with your identity and access,
can they federate with your Active Directory
or whatever else you're using.
What is the forensic quality
of the data as they've collected it.
What safeguards are they putting in place,
yes, of course they say it's encrypted,
encryption at rest, there's more to it than that, of course.
And then finally, what regulatory compliance
and protections are they offering you.
Let's say you have
an enterprise that has PCI in-scope assets somewhere in it.
Or you have data that's relevant
from a SOX or HIPAA perspective, right?
You definitely need to ask those questions
of this provider before you start sending your data,
'cause invariably, what's going to happen?
You're going to get something related to that in there.
You're going to do everything you can to
make sure no customer
credit card numbers show up, but at some point or another,
authentication credentials
are going to fly by on the wire, right?
See it all the time.
Number four, what's your integration and pivoting story?
Let's say you've got your great platform in the sky,
what integrations do they provide you,
just as I said in an earlier slide.
And then finally, what can you do to get the data out?
What options do they give you?
Eventually, you'll probably want to get your data.
Maybe for administrative
or legal proceedings or otherwise.
Can they send you a hard drive with that,
are they going to charge you to download your data?
Boy, that's frustrating.
Can you run a query over several months' worth of data
in an efficient manner?
Et cetera.
No matter where you are, in terms of your age of your SOC,
the maturity of your SOC, et cetera,
I want to offer a couple thoughts
on how to measure the overall health
of this platform you've put together.
The first one is obvious,
we want to instrument that platform to ensure that it's up
and that its utilization is within
the parameters that we expect.
The second is we want to measure our data quality.
Is the data parsed?
Are all of my feeds alive?
Has the volume of each of those feeds changed over time?
Let me give you one really good example.
There's two approaches you could take to part of this,
one of them is you could have a manual definition
of all of the data collectors, connectors, agents,
forwarders, pick your buzzword that you want to use
to describe the thing that sucks in your data.
And you could compare that list
of the collectors that you think should be functioning
to the ones that you actually saw data from
in say the last hour, the last 24 hours, the last week.
Hey, is anybody missing, did anyone go dark?
Right, very straight-forward.
Another approach is to compile that list dynamically
say over the last 30 days of data,
what are all of the collectors that I got data from
and have I gotten data from in the last hour.
Alert if the answer is no.
This approach I have been using for 15 years,
and in every SOC I go to,
it never fails in finding stuff that broke,
every single time.
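A minimal sketch of that dynamic-baseline check; seen_collectors() and alert() are hypothetical helpers standing in for a query against your event store and your paging mechanism.

```python
from datetime import datetime, timedelta

def dark_collectors(now: datetime) -> set[str]:
    """Collectors that reported in the last 30 days but not in the last hour."""
    baseline = seen_collectors(since=now - timedelta(days=30))   # hypothetical helper
    recent   = seen_collectors(since=now - timedelta(hours=1))
    return baseline - recent

missing = dark_collectors(datetime.utcnow())
if missing:
    alert(f"{len(missing)} collectors went dark: {sorted(missing)}")  # hypothetical alert()
```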
When we think of it in another way,
think about operating a large SAN.
In that SAN, each individual hard drive
is relatively reliable, right?
But you take 300, 500, 1,000 hard drives together,
invariably one is breaking at any given time.
You're getting smart errors on any given hard drive
on a given day.
The same is the case of maintaining a large collector fleet.
Hundreds of collectors in some cases.
I digress.
Number three, we want to think about measuring our coverage,
remember earlier I talked about
absolute number versus percentage.
I would suggest, I would actually more than suggest,
I would argue that you need to measure both the percentage
and the absolute number of assets
that you're collecting from.
Why? You don't know what you don't know.
And there's always going to be assets in your enterprise
that you didn't know exist.
So if you're measuring only your percentage,
percentage of what?
The number of desktops that were turned on
today versus yesterday?
Think about dynamic, elastic, virtualized environments
where I've got, VM is spinning up
and spinning down all the time.
Driving our monitoring percentages to 100%,
you can quote me on this, is madness.
You'll never get there and if you do,
you're going to chew up all of your resources
and pull them away from other pursuits.
Consider other ways to measure your coverage,
it's not just about the number of assets
that you're collecting, it's the asset types,
it's the different parts of your enterprise,
and it's the different parts of that computing stack.
So think about
measuring your coverage along multiple different dimensions,
and being kind of judicious
in how you focus your effort there.
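As one sketch of what measuring along multiple dimensions could look like, with an invented inventory structure, reporting absolute counts alongside percentages.

```python
from collections import defaultdict

def coverage_report(inventory: list[dict], monitored: set[str]) -> dict:
    """Coverage per (dimension, value): absolute counts plus a percentage."""
    report = defaultdict(lambda: {"total": 0, "covered": 0})
    for asset in inventory:
        for dim in ("asset_type", "business_unit", "stack_layer"):
            key = (dim, asset.get(dim, "unknown"))
            report[key]["total"] += 1
            if asset["id"] in monitored:
                report[key]["covered"] += 1
    return {k: {**v, "pct": round(100 * v["covered"] / v["total"], 1)}
            for k, v in report.items()}
```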
Moreover, think about your analytic coverage.
One of the talks later today
is going to talk about the ATT&CK framework,
I won't mention it in detail here,
I will say that I find a lot of value,
and I know a lot folks who find a lot of value
in taking my analytics,
mapping them into a consistent framework
against my adversary TTPs,
and then measuring that framework.
What you may find, if you've never done this before
and you take your analytic coverage
and map it into the adversary TTPs,
you may find that you have really, really,
really good coverage on three tiles out of perhaps 50.
That's very troubling,
because after you've done all these other very good things,
if your analytic coverage isn't lighting up all those TTPs,
you're not looking at everything you need to be.
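A small sketch of that mapping exercise; the analytic names and the mapping itself are invented for illustration, though the technique IDs follow ATT&CK's numbering style.

```python
from collections import Counter

analytic_to_techniques = {
    "ps_encoded_command":   ["T1059"],   # command and scripting interpreter
    "new_service_install":  ["T1543"],   # create or modify system process
    "dns_long_txt_queries": ["T1071"],   # application layer protocol
}

coverage = Counter(t for ttps in analytic_to_techniques.values() for t in ttps)
all_techniques = {"T1059", "T1543", "T1071", "T1003", "T1021", "T1566"}  # small subset
uncovered = all_techniques - set(coverage)
print(f"{len(uncovered)} techniques with zero analytics: {sorted(uncovered)}")
```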
What are my false positive ratios?
Obviously you want to measure that.
One of the best SOCs I've ever worked with,
or worked for, or worked at,
every day in their stand-up,
for five to 10 minutes they would review the event volume
from the previous 24 hours,
talk about what's their number one analytic firing,
and what are they doing to tune that down.
And then on the opposite side of that spectrum,
what are the analytics that are driving
most of their investigations.
And then finally, we want to measure this.
We want to red team against the things
that we think we have instrumented.
There's a notion of purple teaming,
where you're doing this as an open-box activity.
We know the red team is coming,
we're going to watch to see what activity should be firing
and see if there's any gaps, and trust me there always are.
Perhaps even more interesting,
one of the most effective techniques I've seen
is, imagine instead of capabilities,
perhaps I'm using some kind of instrumentation
I have across my enterprise
to trigger alarms everywhere across all of my hosts.
Or perhaps across all the places I have network sensors.
And then I bring that telemetry back that says
this is where I should have triggered detections.
And then on the other hand,
I actually look at what detections were fired and join them.
That is an absolutely fabulous way
of ensuring that my entire analytic architecture
is working end-to-end.
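As a sketch of that join, assuming both sides can be reduced to (host, technique) pairs; load_expected_detections() and load_fired_detections() are hypothetical data-loading helpers.

```python
expected = load_expected_detections()   # e.g. {("host-17", "T1059"), ...} -- hypothetical
fired    = load_fired_detections()      # what the analytic platform actually alerted on

gaps       = expected - fired           # instrumented but silent -> coverage gap
unexpected = fired - expected           # alerted where nothing was injected

print(f"{len(gaps)} gaps out of {len(expected)} expected detections")
for host, technique in sorted(gaps):
    print(f"  no alert for {technique} on {host}")
```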
And I guarantee you if you do this,
you will find gaps in places
where you thought your coverage was absolutely fabulous.
And then you'll start doing some digging and be like,
"wow, oh, we missed that.
"That's interesting.
"Oh man, okay, cool."
Right?
The other thing to think about here is,
when you start measuring all of this stuff,
get ready to spend a lot of time
on the things you just started measuring,
I've experienced this firsthand, right?
As soon as you start measuring something, it may get better,
but your operational priorities may shift.
So, to wrap up here.
Number one, when you think about your requirements
and think about who you're engineering to,
I urge you to think about every role
and every set of folks in your SOC,
whether you're a shop of one or a shop of 500.
Believe it or not, there are SOCs that get that big.
Including your constituents, your customers.
Number two, let's be resourceful.
One of the best analysts I ever met was amazing,
not just because he wrote the most amazing queries
and had the most amazing analytic tradecraft,
but he was super awesome at going out
and talking to random folks
and finding all these random data lakes
that seemed kind of pedestrian or had nothing to do with
security, and then doing the most amazing things
with the data that came out of 'em that revealed insights,
both from a detective and forensic standpoint.
Number three, we want to integrate strongly
with all these other data lakes, data stores, et cetera.
We want to tolerate the richness of those data sources
in our architecture.
We want to share out but we want to do so
in a measured and careful way.
Number six, we want to consider, looking forward,
what's our story, is it on prem, is it in-cloud,
is it a hybrid situation,
I suspect in many cases
it's going to be a hybrid story for quite a long time.
And number seven, as I just mentioned,
we will always want to measure the health of our platform
in its totality, end-to-end.
With that, questions.