(humming)
(audience applauding)
- So, my talk is on information anarchy,
and since basically nothing Scott said about me was true,
I'll go ahead and tell you a little bit more about myself.
The most important thing, I am totally not an anarchist.
Which you might not believe if you had seen
my browser history lately and all of the anarchy clip art
I had to look up trying to make this presentation.
It's actually, anarchy clip art is a real thing
and I couldn't use any of it
because it was on the other side of the line
of what I think I could have gotten approved.
I am a SANS instructor.
I wrote a book with Scott called
Intelligence-Driven Incident Response,
and I am a player of random musical instruments.
You all saw my ukulele skills.
I also play the steel drums,
and there was a brief stint in high school
where I was part of a kazoo band.
It ended poorly.
Creative differences, and it turns out kazoos
are terrible instruments.
But the most important thing about my biography
for this talk is that I run threat intelligence for Rapid7.
And if anyone's familiar with Rapid7,
we have a lot of different tools we use.
We have a vulnerability scanner,
incident detection and response,
services and consulting.
We have the Metasploit Framework that we use
for penetration testing.
And so, I kind of coordinate and liaise
with all of these different organizations.
The goal of the threat intelligence I provide to them
is to get the right information to the right people on time,
and that's really, really important to me.
It got really, really difficult in 2017
to achieve that goal, and that's because
people would come to me usually maybe once a month
and they would say hey, I heard this thing
and it sounds really scary and what do I need to do?
And so I would take the time to figure out
what is this thing, where did they hear about it,
and what can they do?
And a lot of times, it turned out that that
wasn't actually based on accurate information.
And once a month turned into once a week,
turned into once a day, and my entire job
was now being spent tracking down
misinformation and bad information
and things that people didn't know
where they heard it from,
but they're really, really scared,
they need to know what to do.
And actually, Chris's presentation earlier
gave me the words to describe what I was going through.
I was using all of my cognitive processing
on tracking down bad information,
and I was mentally exhausted.
It got to the point where somebody would send me a link,
any link, in Slack, and I would be like, ugh, this again,
and I would be angry.
I decided I had to do something about that.
Plan A was I was going to just stop
and I was going to go raise chickens somewhere,
but my HOA doesn't allow us to have poultry,
so I had to come up with a better plan.
All this pain led to the research we started doing
on misinformation: how do I identify it,
and how do I deal with the current state that we are in,
which I deemed information anarchy.
What is information anarchy?
It's a term that I first heard
as part of JDP 2-00, which is the British military
intelligence doctrine, and they said that
information anarchy is a state
where you have increasing amounts of information coming in,
and an increasing lack of control
or information about where it's coming from
or whether it's valid or not,
and that leads to a state where
it actually makes it more difficult for people
to make decisions based on this information,
because you don't know what's important,
you don't know what's relevant,
you don't know what you need to act on.
How did we get here?
It's not a new term and it's not a new concept
even in information security.
There was a paper in 1995 written by a gentleman
named Donn Parker called A New Framework
for Information Security to Combat Information Anarchy.
1995.
I wish I had been paying a little bit better attention
but I was really caught up with my kazoo band at that time,
but 2018, we still have this problem,
we still have this issue.
We came upon it pretty gradually.
For a brief history, my brief history
of the state of information in our world,
we started out with information monarchy.
One person, one organization, often one government
controlled all of the information.
They decided what people heard,
they decided when it went out,
and they weren't really interested
in any sort of feedback.
Unfortunately, there are a lot of countries
and a lot of regimes that still exist
in this type of information state.
After that, we started to see information revolutions.
People started getting more educated.
They realized, wait a minute, this stuff you're telling me
is not necessarily the truth.
It is not the end-all, be-all.
I need more, I want more.
Luckily, in a lot of cases, they were successful,
which brought us to information democracy.
We now have multiple places we can get our information from.
We can seek it out ourselves,
we can watch the news and have people tell it to us,
we can read peer-reviewed journals,
and while I flashed earlier that I'm not an anarchist,
I should be saying, yay, information democracy,
that's the best.
But it had its problems as well,
and one of the problems is that people started,
the news people who were creating this information,
started competing for readers.
You want ratings, where people spend their time
and their money, that is where they wanted
the readers to be, so we started seeing
this fight for ratings, and because of that
we started seeing people doing things like
posting sensationalized stories,
and FUD, fear, uncertainty, and doubt,
in order to get those ratings
and become the best news source.
So that was bad enough.
But then, we had social media.
And social media basically allowed anybody
to be their own news source.
You can post anything you want.
It doesn't have to be real, it doesn't have to be right.
You can literally post anything.
Like I said, you could try it, but please don't,
it hurts my heart. After that,
it becomes so hard to know: what is a real news source?
What is a new news source?
What is just a blog somebody is running
that sounds legitimate?
And that brought us to our state of information anarchy
where we have so much information.
Some of it's good, some of it's bad,
and we just really don't often know
a good way to find out, and that's where I was spending
all of that cognitive processing time,
was trying to sort out what's good from what's bad.
Throughout this research, we realized it isn't just
good information and bad information.
There's lots of different types of misinformation out there.
These are some broad categories
that I identified through our research.
There's lots of subcategories throughout it.
But the first one is innocent mistakes.
Sometimes, some people just get it wrong,
and it's not intentional.
They're following their processes.
The same happens to us in intelligence analysis.
We can do our best job, we can make our best guesses,
we can use all the tools and resources we have,
and sometimes it's still just not right.
When this happens, though,
and this used to happen a lot more often,
it needs to be corrected when you identify that.
If new information comes in or you realize
you made a mistake, it needs to be corrected,
and unfortunately what we've seen
is that even when it is corrected,
it's really hard to know that something you read yesterday
is no longer accurate.
There is a really good resource I found.
It's a website called newsdiffs.org,
N-E-W-S-D-I-F-F-S dot org,
and it's a project where they catalog
a handful of the most common media outlets,
like the New York Times and the Wall Street Journal,
and they will provide updates
on when articles have been changed,
so you can see the changes side by side
so you know when new information has been introduced.
But it is actually really, really hard
even to go in and identify when somebody
did make this kind of innocent mistake in their reporting.
The next one, I see this a lot,
and I've determined that a lot of my heartache in 2017
was because of this type of misinformation.
Hypothesis as a fact.
A lot of these attacks we saw,
WannaCry2 and NotPetya and things like that,
the commentary and the information
that was out there about it was people trying
to do their analysis in real time,
often over social media or in the actual press.
They might have been good ideas.
They were definitely something that somebody
should continue to research and find out
what the outcome was, but when they make it to the news
and they make it to something people are reading,
it is really hard to know whether or not
you should act on it, because they don't actually tell you
that this isn't final.
This is a problem because people rush.
When these new breaking attacks come out
or these new vulnerabilities are announced,
we want to get information out quickly,
but that quick is often compromising our ability
to provide accurate information.
And it's not always intentional.
I know I've had times when something happens
and someone calls me and says hey, what's going on?
And I say, all right, well, we're still investigating.
I don't have all the details,
but it looks like it was X, Y, and Z.
I'm pretty sure, then I read it,
and somebody's published that
and they've cut out all of my qualifiers,
and it just says Rebekah Brown says X, Y, and Z,
and that's really unfortunate because
now I'm contributing to the confusion.
The third category is something I think we see a lot,
and this is the only time in my talk I'm gonna say this,
when people talk about fake news.
It's someone pushing an agenda.
This is essentially everything
that crazy aunt you have posts to Facebook,
where you're like really?
You know that's not true.
But it's something that confirms somebody's existing biases.
A lot of the cognitive biases we talk about
in intelligence analysis come into play here,
and we will find reports full of things like that.
And it comes in multiple forms.
It can be either somebody trying to push an agenda
of I want people to be scared about their security,
I want them to think they're vulnerable
because then they'll call us
and they'll need more security services,
or it can be somebody saying
this is the political agenda I want to push,
and so I'm going to pull everything out
and formulate this news in a way
that it is going to encourage people
to believe my way of thinking.
And the fourth category is intentional disinformation.
We actually see this pretty rarely
in information security news.
A lot of times what we see are the second
and third level effects of a disinformation campaign,
but it's pretty rare that there are formal active measures
behind the things we respond to,
and that's because these are involved operations.
Thomas talked about some active measures this morning.
There's a really good quote I like from a colonel
who was in the East German foreign intelligence service,
in charge of their disinformation,
and he said, our friends in Russia call it dezinformatsiya,
our enemies in America call it active measures,
and as for me, I call it my favorite pastime.
Disinformation is an operation.
This image comes from Operation Infektion,
which was a Russian disinformation campaign
that they ran in the 80s.
They started inserting information and seeding conjecture,
getting people to think
that the AIDS epidemic being experienced in the U.S.
was actually a U.S. biological weapon.
We say that now and it doesn't make a lot of sense,
but if you look at what was going on at the time,
we had just come out of Vietnam
where biological weapons were used.
There were prisoners of war who had been captured in Vietnam
who gave coerced confessions saying that yes,
the U.S. is employing biological weapons,
and during the Cold War, when we're trying
to increase military spending,
it was probably in Russia's best interest
for the U.S. public to not trust their government
and not trust their military.
This campaign was very involved.
It involved things like research and scientific journals
and all these sorts of activities,
so disinformation is not trivial, and we do see it.
Like I said, we'll see second and third level effects,
but in most cases, if somebody's just publishing
a dumb article that I have to respond to,
it's not disinformation, it's misinformation.
All right, so how can we identify it?
It's good to have a better idea of what misinformation is
and what kind of categories things go into,
but I still need to free up my time.
I still need to have shortcuts
that are gonna help me get through
all of this bad information so I can focus
on what's really important to my job.
There's a couple different techniques
that I've come up with.
Are there any Zombieland fans in here?
What's rule number one?
Cardio.
Well, since Alex Pinto has already claimed
cardio as rule number one for machine learning,
I'm going to go ahead and say rule number one
for misinformation is sourcing.
Where does the information come from?
Identifying the source and being able to know
whether or not it's a valid source
is going to cut all that information
you have to respond to by about half.
When we were responding to WannaCry,
and trying to identify what was going on,
as we kind of worked through this little war room
across all of Rapid7, I had one person
who kept saying, I have IOCs, I have IOCs,
and they would send me this list in Excel
of some IPs and domains, and he's like,
we need to push those in for detection,
and I'm like where, what's going on?
One of the guys I work with jokes that
I am the great intel firewall,
'cause nothing goes into our detection systems
unless I approve it, and that often involves
knowing the source and me being comfortable
with the sourcing.
When I finally got this person to explain to me
where he got those IPs from,
his response was, literally, Twitter, word of mouth,
and probably some other places.
(audience laughing)
That made it pretty easy
for me to be like, no.
And turns out, it was a separate campaign,
it was the (mumbles) campaign
that had been running at the same time,
but the unfortunate thing is,
if you look through a lot of threat intelligence tools,
especially aggregators, you will see those IPs
and those domains listed as tied to WannaCry,
because once something's on the internet,
once something's on Twitter, if you're mining Twitter,
you're gonna get the bad information with the good.
Even though we are able to identify
that that was not linked, if you go look it up today,
you will have a really hard time figuring that out.
You would have to go back to the actual source of the source
who later corrected it, and said just kidding,
totally different thing.
It's not that easy to do.
When we're looking at sourcing,
the things you want to know are
where did the information come from?
A lot of times, like I said, that by itself
will be enough for a nope, not acting on this.
The next thing is can you access the source material?
If they say, oh it came from a report from CrowdStrike.
Okay, cool, can I actually see that report?
Can I get that report to validate, to verify,
to answer any follow up questions that I have?
A lot of times, the answer is no,
and that makes it really, really difficult to do your job,
but if you are able to see the original source,
who is being cited, what report, what analysis,
what Twitter post in some cases,
that will give you a lot better idea
of whether or not it's something that you can follow up on.
And then finally, was a structured analytic method used?
When you read through this, whoever they're citing,
wherever this information came from,
did they go through some sort of process
to get from point A to point B?
Carmen spoke about how structured analytic techniques
are not always necessary and that's absolutely true,
but I want to know that there was some sort
of analytic process that happened
to get to the information I'm being asked to respond to.
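To make that sourcing gate concrete, here is a minimal Python sketch of those three questions as a quick triage check. The function name and return labels are illustrative assumptions, not part of the actual Rapid7 workflow described in the talk.

# Minimal sketch of the sourcing triage questions as a quick gate.
# The labels below are illustrative, not from the talk.
def sourcing_triage(source_identified: bool,
                    source_material_available: bool,
                    analytic_process_evident: bool) -> str:
    """Rough gate: the more 'no' answers, the less this is worth acting on."""
    checks = [source_identified, source_material_available, analytic_process_evident]
    if all(checks):
        return "worth following up"
    if source_identified:
        return "follow up with caution and ask for the source material"
    return "do not act on this yet"

# "Twitter, word of mouth, and probably some other places"
print(sourcing_triage(False, False, False))  # -> do not act on this yet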
The next thing we looked at after sourcing
was linguistic methods.
One of the things I wrote up in my bio
is that I'm writing my Master's thesis,
and my degree is in homeland security
with a cyber security focus,
and a graduate certificate in intelligence analysis,
so I take a lot of weird classes,
and one of my favorites was actually called
intelligence profiling of leaders,
which, I read intelligence profiling, and I was like whoa,
pretty sure we're not supposed to do that,
but when you're talking about leaders
it was actually really helpful to understand
how these people who are making decisions operate
and how they think and what motivates them.
And they used things like leadership trait analysis
and sentiment analysis and motive imagery analysis
to look over the texts of what people say
to understand more about what they're saying.
I decided to try and apply some of those methods
and we had to tweak obviously some of the word lists
and things to make it more applicable
to information security research.
But after looking through about 100 articles,
some good, some really, really bad,
and I don't know, man, I wish I hadn't read
some of those blog posts,
but I did it for science, I ended up with four lists
of things to look for in an article.
The first is words of sourcing.
Things like according to and as per,
which tell me that this text that I'm reading
was based on something.
The next thing I looked for was words of uncertainty.
If you've done a lot of intelligence work
and you've written intelligence reporting
you know those words of uncertainty are really important.
Those are things like possibly, and could be, and might,
that show kind of how confident we are
and when something's a fact and when it's an assessment.
The next thing I looked for were explanatory phrases.
Things like because and therefore
that show that they're doing some of that
analytic explaining and talking through their process
and not just stating facts with nothing to back it up.
And then the fourth thing was retractors.
Those are words like but, however, although.
And retractors serve two purposes.
Sometimes, retractors actually show complexity,
which we're going to talk about next,
meaning that somebody can identify
that there's more than one side to a story,
or there might be concerns, and they can address them
as part of their own analysis, and those are good,
and then there's retractors where people
kind of want to give themselves a way out.
They could be like, that was totally China who hacked them,
but what do I know, it could be anybody.
You've made this very blatant statement
and then you've kind of tried to walk yourself back.
People are gonna remember that blatant statement,
and then when it turns out you're wrong,
you can go back and be like,
hey, I said, what do I know?
Retractors are often used as a way
of letting yourself state something sensational
without having to be accountable for it later.
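Put together, the scoring over those four lists looks something like this minimal Python sketch. The word lists here are illustrative placeholders rather than the refined lists from the research, and the counts are normalized by article length the way we describe below.

# Minimal sketch of the four-category word profile. The word lists are
# illustrative placeholders, and the real work also had to handle phrases
# and edge cases (like the month "May" versus the modal verb "may").
import re

WORD_LISTS = {
    "sourcing":    {"according", "per", "cited", "reported", "stated"},
    "uncertainty": {"possibly", "might", "could", "likely", "perhaps"},
    "explanatory": {"because", "therefore", "since", "thus"},
    "retractors":  {"but", "however", "although", "though"},
}

def profile(text: str) -> dict:
    """Return each category as a percentage of total words in the article."""
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words) or 1
    return {
        category: 100.0 * sum(words.count(w) for w in wordlist) / total
        for category, wordlist in WORD_LISTS.items()
    }

print(profile("According to an anonymous source, this could possibly be "
              "the same group, but we might never know."))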
So like I said, we did this across 100 different documents,
refined our word list, had some issues.
"May" was one of our words of uncertainty.
We found huge spikes in reports that came out
in the month of May, so we had to do some tweaking.
I have a great data science team,
so I'm very, very thankful they helped me with those,
but for our case study, we took seven different articles
about the DNC hack.
Five had some pretty blatant misinformation
and bad information.
Two we viewed as objective, just conveying the facts,
and we ran them all through
just to see what it looked like.
What we found was a couple different interesting things.
Our explanatory phrases and our retractors
were basically consistent.
And this was, for anybody who does data science,
this was normalized by words in the articles
so we did a percentage of words in the article
rather than just a raw count,
but what was interesting was that the sourcing
and the uncertainty was all over the place.
Some articles use a lot, some articles use none.
The first thing that jumped out
when we did this first analysis
was the words of uncertainty in this article: zero.
And it was not a short article.
You're telling me there's not a single possibly,
or might, or any room for chance in that,
so that jumped out to me right away
as something to look into.
In addition to just kind of looking
at what they look like in general,
we took a look at each article's profile,
so each of those categories of their words
across the whole article, and again,
a couple of things jumped out.
The ones here that are kind of more balanced,
and they definitely have some but no huge spikes,
these were our good reports.
Right off the bat, that told me that okay, you know what,
that balance and that middle line,
there's something to that.
Again, we have our one with zero words of uncertainty
and tons of sourcing, and our IVN,
which stands for Independent Voter Network,
and hey, they're saying they're independent,
so I'm trusting that they're objective,
but they had basically none.
They had very, very small counts of any of those words.
We generated a heat map because we love heat maps,
and what we saw was, right around here, this is the midline,
kind of that orange to pinkish purple color,
and our two control documents
were more consistent and closer to the midline.
Here's our IVN report there with basically nothing,
and then we had a couple of different reports
with things like high sourcing but not much else,
and we could kind of start to see the fingerprint.
What does a good piece of information look like
versus something I know is bad?
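For anyone who wants to reproduce that kind of view, here is a minimal matplotlib sketch of the heat map. The article names and percentages are made-up illustrations, not the numbers from the DNC-hack case study.

# Minimal sketch of the per-article heat map, using profile() percentages
# like the earlier sketch. All names and values below are illustrative.
import numpy as np
import matplotlib.pyplot as plt

categories = ["sourcing", "uncertainty", "explanatory", "retractors"]
articles = {
    "control_1": [1.2, 0.8, 0.6, 1.0],  # balanced, near the midline
    "control_2": [1.0, 0.7, 0.5, 0.9],
    "article_a": [2.4, 0.0, 0.2, 0.3],  # heavy sourcing, zero uncertainty
    "article_b": [0.1, 0.1, 0.1, 0.2],  # "low everything" profile
}

matrix = np.array(list(articles.values()))
fig, ax = plt.subplots()
im = ax.imshow(matrix, cmap="plasma", aspect="auto")
ax.set_xticks(range(len(categories)))
ax.set_xticklabels(categories)
ax.set_yticks(range(len(articles)))
ax.set_yticklabels(list(articles.keys()))
fig.colorbar(im, ax=ax, label="% of words in article")
fig.tight_layout()
plt.show()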
Again, I'm not looking for perfection,
I'm looking for shortcuts.
I'm looking for something that's going to help me
quickly identify whether or not this is worth my time.
We then found a couple of trends by going back in,
reading more about the articles,
and looking at where each one fit
in those different categories.
The first one, in documents where they had high sourcing
and high words of uncertainty,
those tended to be more of those hypothesis as facts.
There was a source, somebody was willing
to talk to somebody, they could say who was being cited,
but there were also lots of words of uncertainty:
it could be, might be, we'll see, we'll find out.
Looking for that particular pattern
is a good way to identify that type of misinformation.
We also saw high words of sourcing
and then low uncertainty and low explanatory
fit the pattern of pushing an agenda,
and what we found when we went back to look in more detail
was that they had high sourcing
but the sources were things like
according to an anonymous source,
or according to somebody close to the information
and I'm like, wait a minute,
you're saying the right words to make me think
that you've talked to somebody about this,
but you're not giving me any information.
The next stage of our research,
we're going to start applying tags,
so when we look for words of sourcing
we're going to try and identify
whether it's a person's name versus
somebody close to the President,
'cause let's face it, we're all here in Bethesda.
Compared to most of the world,
we are relatively close to the President.
I feel like we can go ahead and start citing things.
The next one we found is low everything,
and this was our IVN report
and a couple of other opinion pieces.
This is the profile of an opinion piece pushing an agenda.
There's not a lot of sourcing
'cause it's somebody's own opinion.
There's not a lot of words of uncertainty
because they are certain of their position.
Not a lot of explanations, not a lot of retractors.
When you see that low pattern,
I would not even really waste my time on that
because that's likely to be opinion, not fact.
And then none of the information in our sample
fit this profile, but we started looking out for
okay, what does disinformation actually look like?
We had to go back and pull different documents
from the CIA archives that had been published
and we found in a lot of cases
that information that was being sent out
as kind of propaganda and disinformation
had high words of uncertainty and high retractors,
and again, they want to be able to back themselves out.
All they want to do is sow those seeds of doubt in your mind,
and there was low sourcing and low explanation.
Again, that's just something to look for.
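Those trends can be written down as a rough fingerprint check, sketched below in Python over the normalized profile from the earlier sketch. The HIGH and LOW cutoffs are arbitrary placeholders; the research compared each article against the midline of the whole set rather than fixed thresholds.

# Minimal sketch of the fingerprint trends as threshold rules over the
# normalized profile() output. The cutoffs are placeholder assumptions.
HIGH, LOW = 1.0, 0.3  # percent of words in the article (illustrative)

def fingerprint(p: dict) -> str:
    """Map a four-category profile to one of the trends from the talk."""
    if p["sourcing"] > HIGH and p["uncertainty"] > HIGH:
        return "hypothesis reported as fact"
    if p["sourcing"] > HIGH and p["uncertainty"] < LOW and p["explanatory"] < LOW:
        return "pushing an agenda (check who the sources actually are)"
    if all(v < LOW for v in p.values()):
        return "low everything: likely an opinion piece"
    if (p["uncertainty"] > HIGH and p["retractors"] > HIGH
            and p["sourcing"] < LOW and p["explanatory"] < LOW):
        return "matches the disinformation pattern from the archive samples"
    return "balanced profile, closer to the control documents"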
Like I said, we're still kind of
tweaking our algorithms.
We've already got more things we want to look for,
but we want to make this mechanism that we use
to do this quick fingerprint available
just in case it does help other people
kind of understand how to handle
a lot of the information that's out there.
The problem we found with information security, though,
is that that model really only works with text,
like news articles and media reporting
and even long form intelligence reports,
where we saw the same profiles.
But it doesn't work really well
for things like malware analysis
or technical blogs or vulnerability disclosure,
so what we started looking for there
is something called integrative complexity,
which is a score from one to seven
that looks at how complex the text is, not the content,
but the way that words are put together
and the way that they are structured.
It looks for two things.
Differentiation, how well can I identify
that there's more than one possible answer
or outcome or consideration, and then integration,
which is how well can I piece together
different information and then draw those connections
between different things.
It is really hard, really, really hard,
to automate integrative complexity.
A lot of people have tried.
There's some documentation out there about how to do it,
but the recommended way to handle it
is still like hand coding,
which is not going to save me any processing power there.
What I started doing with this
is coming up with some rules of thumb
for the very simple versus the very complex documents
with the idea that the more complex it is,
the more likely it has been well researched and thorough.
Some tips for that: when you start to see
words like just, or always, or never,
that's an indication of a more simplistic viewpoint.
Look for synthesis of multiple data sources.
That's when we talk about that integration.
If somebody's taken more things into account,
they've probably done some more thorough research.
Look for counterpoints or arguments
to be preemptively addressed,
especially with a lot of things that we do.
There are going to be counterpoints,
and if there's an analyst who can identify those
and say, well yes, we know that sometimes
people run this against this type of system,
so we would expect to see this sort of thing,
but in this case we saw something different.
When they're doing that type of analysis,
I have a little more faith that it's worth my time.
And then look for complexity across the entire text,
not just their area of expertise.
This is a big problem I found
as kind of a result of doing this analysis,
which is that any time we talk about vulnerabilities,
we find big problems or attacks or people aren't patching,
our analysis of information security,
the domain where we're the experts, is super complex.
We can see all the different components,
we can articulate all the components,
we know how they're all related,
and then when we say, what needs to be done?
Well, just patch.
Jeez, you're dumb, why didn't you already fix that?
We cannot reach that same level of complexity
outside of information security
to tell people how to address the problems.
That's a whole other line of research,
but it's just something to be aware of,
that you want that complexity across the whole spectrum,
not just one particular area.
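Since full integrative complexity scoring still comes down to hand coding, here is a minimal Python sketch of just the rules of thumb above: flag absolutist wording and count differentiation markers. The word lists are illustrative assumptions, and this is nowhere near a real one-to-seven score.

# Minimal sketch of the integrative-complexity rules of thumb: absolutist
# wording suggests a simplistic viewpoint, while differentiation markers
# suggest the author acknowledges other possibilities. Word lists are
# illustrative; this does not produce a real 1-7 integrative complexity score.
import re

ABSOLUTIST = {"just", "always", "never", "impossible", "certainly"}
DIFFERENTIATION = {"however", "although", "alternatively", "whereas", "unless"}

def complexity_flags(text: str) -> dict:
    words = re.findall(r"[a-z']+", text.lower())
    return {
        "absolutist_hits": sum(words.count(w) for w in ABSOLUTIST),
        "differentiation_hits": sum(words.count(w) for w in DIFFERENTIATION),
        "word_count": len(words),
    }

print(complexity_flags("Just patch. It always works and never breaks anything."))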
All right, so how do we survive,
how did I find out how to survive in the misinformation age?
Sourcing, look for where the information came from.
That's going to be your number one way
of weeding out bad information right there.
Number two, content, what are they saying?
Look for some of those tips like
according to an anonymous source close to the President.
Look for high words of uncertainty.
Like I said, you want some, but you don't want
the whole thing to be words of uncertainty,
or that could give you an indication
that it's not actually final research or final information.
And then look at the structure.
Like I said, any time somebody says
always or never or just, I'm immediately suspicious.
Sometimes there are good cases for that,
but it is kind of a good flag to look for.
And then finally, let's not just survive.
Let's change things.
Let's not keep piling on more and more
bad information.
A lot of us in this room, we have the ability
to create content.
We provide input to media, to information reporting,
to infosec reporting, to blogs.
Make sure that you are taking the time
to do good analysis, to identify your sources
and validate them before you start putting information out.
It's great if you go back and fix it afterward when you need to,
but the best way to make sure
people are getting the right information
is to put it out there and take the extra time
to be thorough.
Let's raise the bar.
And if that requires raising a little hell
and turning the current way that we're doing things
on its head, I'm okay with that, too.
Like I said, totally not an anarchist.
If you want to learn more, we started putting
our basic research that we've been doing
on the thematic content analysis into my GitHub repo there.
These are some additional pieces of documentation;
they're not necessarily CTI documentation,
except for the very bottom one.
That's actually a really good current report
on countering Russian disinformation.
But it's some different good things you can look at
to better understand how we can solve problems
by looking outside of our own domain of expertise.
All right, I am out of time,
but thank you so much for being here.
Thank you for staying, I really, really appreciate it,
and I hope to see you all back next year.
(audience applauding)
(tense music)