>> So, it's my pleasure to welcome Mark Briers to
MSR to give a talk today entitled Turing,
Bayes and Cyber Security.
Mark is the Program Director of
Security at the Alan Turing Institute.
Prior to that he worked for 16 years in
the Defense Industry primarily in
the areas of statistical data analysis.
His research interests include
scalable bayesian inference,
sequential inference, and anomaly detection,
particularly in areas of cyber security.
So thank you for doing this talk today, Mark.
>> No, thank you. Thank you for inviting me and thank you
for coming here early. Is it Wednesday?
I've kind of- Wednesday or Thursday,
I've lost count of what day it is.
Jet lag and what have you.
So thank you for hosting me today.
So, what I want to do is
to split this talk essentially into
three parts. I think I changed the talk title as
well, just to keep us all on our toes.
So, what I want to do is to discuss
the Alan Turing Institute and introduce it to you
and your organization, and hopefully motivate you,
inspire you and convince you that
it's a worthwhile opportunity to collaborate with us,
if that's of interest to you guys.
Then, in some sense, segue from
the Turing Institute and Alan Turing the man
into the use of Bayesian statistics in the context of
security related applications and
Turing's work at Bletchley Park.
So there's a couple of slides on
some relatively recent publications of
Alan Turing's that have been declassified
and pushed out, which not many people are aware of,
and which demonstrate Turing's use
of Bayes, which is quite cool in my opinion.
Then finally, try to kind of- there's
so many tenuous links in this talk
and finally link the kind
of Bayesian story to my work in
cybersecurity and give you an overview of what I'm
trying to achieve with the work at
the Institute in the context of cybersecurity.
My background is that I'm a statistician by PhD;
specifically, sequential Monte Carlo was
the subject of my PhD thesis.
So, you'll see lots of use of those kinds of algorithms
and I've been taught
Bayesian statistics from an undergraduate level.
So, all I know is Bayes, so if there's
a question, my solution is a Bayesian solution,
so I will make a slight apology for being
completely kind of shortsighted in my approach to things.
So, the Alan Turing Institute,
I've got a few slides on the Alan Turing institute.
So, we are a charity in the UK,
and we were set up by the UK government,
well, formally announced in March 2014.
And what was happening in the UK and perhaps around
the world but specifically in
the context of the UK landscape,
there's lots of work being done in,
well, I guess we now call it data science,
perhaps we'd even call it artificial intelligence given
the evolution of the hype that surrounds
these data-related activities.
But back in the day it was referred to as big data.
So, there are lots of different initiatives,
research initiatives or application-related initiatives
happening in academia and in industry.
There's lots of fragmentation even within the UK.
So, the UK government was particularly keen to ensure
that UK society and
UK PLC can benefit from these activities,
and so they set up a national institute for data
science and artificial intelligence related research
to provide national-level leadership and ensure
that UK society, in collaboration with
our international partners, can
benefit from research activities
taking place in these areas.
So, the institute- you can think
of it as sitting at the center of two networks.
On one hand we've got a network of
academic related institutions and
I'll talk about that in more detail shortly,
on the other side we have a number
of industry or government related partners through
which we try to kind of get real-world problems and
ensure that our research has real-world applicability.
Our job is essentially to try to
make some sense out of
all the stuff that's going on in academia and industry,
connect things up and
provide strategic leadership at the national level.
Alongside that, we're also expected to
train and educate everybody
from some of my relatives who know nothing
about data science all the way through to
professors who want to know more
about a specific specialized area.
So we're expected to kind of
provide training and education as well.
So we've got quite a remit,
and we get a reasonable amount of
funding from central government
which is actually funnelled through
one of our research councils,
similar to the NSF I guess.
So we're considered to be
a strategic government investment.
That means that we're
not considered to be
predatory in any sense because we are a charity,
we've got some charitable goals around
social good and training and education,
people don't see us as a threat,
a commercial threat or otherwise.
They see us as a way in which they
can utilize our expertise and the expertise
that we bring in to benefit their organizations.
And in doing so we can benefit
or we can align ourselves to our charitable goals.
So, we've got a network of
university partners and I'll
describe how that works in a moment,
but these are the university partners that we
have at the moment so we started off with
five universities when we were
initially created. We were announced by
George Osborne, who was our Chancellor of
the Exchequer, back in 2014,
but we only really started doing any activity in 2016.
So October 2016, so about two years old now.
We started with five universities.
So, we had a competition in the UK to decide
the five best universities in data science and AI and
the top five won, and there'll be
no surprises there in terms of the names.
These are household names in some respects.
Then in the past six months we've increased
our network with eight more universities.
We're now up to 13 universities, if I can count,
and what we do is we
essentially second academics from universities
to work with us and
provide us with the intellectual capital that we
need to be able to
undertake the research activities
that we wish to undertake.
I'll touch on that, like I say, in a moment.
Just to show you that we're not London-centric.
So, the cool thing is that, well, I think it's cool.
The cool thing is that
the Alan Turing Institute's headquarters is
based in the British Library in London.
So, if you ever get across to London, then do
please drop me an email; my email
address is on the final slide.
Come visit us, because the British Library has
some cool artifacts, and
Magna Carta is hosted there.
We actually have an iPad coffee machine
which is another cool thing.
Maybe it's not so cool;
at Microsoft you may have those things.
But in the UK that's kind of unique
so it's one of the attracting things.
So Magna Carta and iPad coffee machine,
if they can't attract you over
to the UK then I don't know what will.
So, we're based in London and we have
two university partners also based in
London UCL and Queen Mary.
Then we have other universities based around the country.
So, we have offices in each one of
these locations and I suspect we'll
have representation in Northern Ireland and in Wales on
relatively short timescales as well, or at least I hope so
from a personal perspective, so that we have
full geographic coverage across the UK.
So, that's the academic side in terms
of the partners that we have.
So, when we first started, again,
we had four partners initially,
so these were launched with us; these are
the types of organizations that we're working with.
So, Lloyd's Register Foundation.
They are a charity themselves and they
own Lloyd's Register which is an insurance company.
That's a commercial entity
but the profits from the commercial entity,
as I understand it, feed back into the charity.
The charity's mission is to essentially make
the world a safer place in which to live.
I guess that benefits the
insurance arm of the organization.
So, Lloyd's Register is particularly interested in
what they call data-centric engineering:
engineering of
critical national infrastructure to ensure,
and maximize, the safety and integrity of
that infrastructure.
Intel, where the interest is in
co-designing chipsets and new algorithms,
I guess generating optimized designs of both.
I lead the interaction
primarily with these organizations,
so GCHQ which is the UK equivalent of NSA,
our Ministry of Defence and
the Ministry of Defence's research arm, which
is the Defence Science and Technology Laboratory, DSTL.
Then finally, one of our partners was HSBC.
I've been told many stories about HSBC, but allegedly,
HSBC has 20% of
the world's financial transaction
trade flow data going through their books,
so that's quite a lot of data and they want
to do lots of things with that data.
So, there's a co-partnership happening with HSBC.
I won't go through all of these
because I realize it's a little boring.
The partnership with Microsoft is one-way at present,
so we're very grateful to Microsoft for providing us with
a million pounds' worth of Azure credits
to do cool stuff,
but I'd like the relationship to be more than
a million pounds' worth of
credits, lots more.
So, part of my motivation for
being here is to try to reach out beyond
those Azure credits and
make some meaningful attempt at collaboration.
Some of you may know Andrew Blake, who was
the director of Microsoft
Research Cambridge, as I understand it,
a few years ago. He was
our initial director at the Institute.
Andrew has now moved on to
bigger and better things, I'm told.
So, there is a link between MSR, or at least MSR
Cambridge, and the Institute in London,
but like I say, I'm really keen
to develop those relationships.
So, as I go through this presentation,
if anything takes your fancy then please do contact me.
I can arrange to link you up with
the specific academics, or at a more general level.
So, in terms of how we're structured,
how many people we have:
we're not at MSR's size, sadly,
but we do have a reasonable number
of people, at least by UK standards.
So, we have 250 or over 250 Turing Fellows.
So, we have to use
the word Turing in front of everything.
That's the brand that we attach ourselves to.
So, what that means is
that we second academics
from our 13 partner universities.
So this would be professors all the way through to
junior academics working with us for one, two,
three days a week on specific research projects
or doing some teaching with us,
or appearing on House of
Lords committees, or trying to
shape public policy; lots of different activities.
We actually get them to do quite a lot
of work for us in lots of different ways,
and it's an amazing academic community.
So, from the 13 best UK universities
we then selected 250
of the best academics in this area to work with us.
So, we have a really great pool of
academics working with us, and some of the names,
I won't list them because
I don't want to reveal
my favorite academics, but some of
the names I hope you will have come across.
We have 19 of
our own research fellows which
really translates into postdocs.
We have about 50 PhD students.
We have an internship program
which is running at the moment,
through to the end of the summer period.
We have quite a lot of visiting researchers
from academia, industry, and government.
The UK government in response to Brexit has
introduced a scheme called the Rutherford Fellows.
What that is trying to do
is to demonstrate that the UK is
an open country which I hope we always will be.
They fund great senior researchers,
not just academics,
from around the world to
come and spend up to
a year in the UK working at
different research organizations or in industry.
So the Turing Institute has been given quite a bit of
money to attract lots of people to the UK.
So, we've got six or seven people
from the US who have actually come across;
statisticians, for instance, from CMU
are working with us and
spending six months or so with us.
So, if there's any interest in that,
I'm not quite sure what
Microsoft's position on that would
be, but if that's of interest
then we have the ability to fund people to come
across and spend a
reasonable amount of time working with us.
We have our own, we call them research engineers,
software engineers that are of more
academically minded than your average software engineer.
We have a reasonable size admin team
because to manage 250 academics,
you almost need 500 size admin team
but we didn't go to that order of magnitude,
we settled at 50.
We have seven program directors.
So, I'm one of the program directors at the institute.
We will have eight eventually.
That aligns to our eight programs and challenge areas.
So, the program directors essentially
control or lead the research of
the institute aligned to
one of these eight thematic areas.
This is a slide that's been
created by the marketing department,
so you can see it
get less and less technical as
the marketing department gets more and more involved.
I should mention that this has
been publicly released as well.
Just in case
the marketing department watches this presentation:
it's a great, fantastic slide,
but it's not the most technical slide I've ever given.
So, we're doing work in health care.
As I mentioned, we're doing work in engineering.
I lead the work that we do in security.
We do work on economics using HSBC data but again,
focusing on our charitable goals
around social good and education.
We have quite a big interest
in ethics, and in making machine
decision-making fair, transparent and ethical.
I touched upon this in terms of
our interactions with Intel,
and designing computers and chipsets and algorithms,
kind of co-designing the two.
We are doing work in science and humanities,
and we are trying to foster government innovation.
I've still not figured out what that actually means.
But we're trying to make government more efficient
in every sense of the word.
So, as you guys, I'm sure,
know, statistics, data science,
artificial intelligence, call it what you like,
computer science I suppose, has general applicability.
And we were set up as this national institute to
derive benefit from all the work that we undertake.
And so what we decided to do was focus on
these eight application areas essentially.
We believe, with the partners that we have,
if we do make positive impacts in
one or more of these eight areas then we'll
be able to meet our charitable objectives.
So we focused on eight areas,
which doesn't feel like that much
focus because actually each one
of these areas is huge in itself.
But it's a level of focus.
It's as much focus as we can give ourselves at present.
We have a little bit of technical focus.
The scientific strategy is still emerging,
but I will kind of propose a scientific strategy,
or rather a focus where I
think the institute is
best placed to contribute scientifically.
And that builds on Alan Turing's legacy, in my opinion,
or one of his many contributions
to the scientific literature.
But I'll come on to that slide in a moment.
In terms of some of the applied research projects, so,
again, I'm just giving you a very high-level overview of
the types of things we're doing, and if
any of these are of interest,
the publications are on our website,
and we open-source all of
the software that we generate,
so essentially everything.
We believe in reproducibility,
we believe in openness, etc.
So we're trying to push
everything out there so that people
can benefit from the work that we do.
So, just to give you
a little bit of insight into the work that we do.
Some of them are self explanatory.
Digital twin: this is related
to the work that
we're doing with the Lloyd's Register Foundation.
So, apparently there's a 3D printed bridge
being placed in Amsterdam.
They've printed sensors all over this bridge, and there's
quite a cool video, which I don't have on my laptop,
of this bridge actually being printed.
But they don't really understand, as I understand it,
they don't really understand
the long-term structural integrity
or structural properties of this bridge,
and how it is going to degrade as
a function of time and so on.
But it is instrumented with lots of sensors, and
so this project's all about trying to make
sense of that data and making sure that
the bridge can hold
people or vehicles or whatever it is trying to hold.
So, it's quite an important project from that perspective
but as we start to 3D print more bridges,
which is expected to happen in time,
we need to be able to understand the types of
data that come in from this type of system.
The national economy dashboard.
So, this is where we're utilizing the HSBC data,
so this is, the UK government,
I am told, has a good understanding of
how much trade flows say between the UK and the US,
because we know how much stuff goes over our borders.
But actually they don't know
how much trade flows between London and
Manchester, because there is
not an easy way of measuring that.
So, the HSBC data gives us a proxy
for measuring trade flows between
different geographical areas in the UK,
and that allows us to essentially
produce local GDP figures, almost,
and optimize national-level economic strategies
associated with
the different flows of trade, for instance,
between different major cities in the UK.
That's given the finance parts of
the government an ability
to optimize their interventions.
I'm sure there is a political dimension to that,
and I would hope that they make
all the decisions based on the
data that's in front of them,
but that's not always the case.
I guess I won't go through
all of these because I realize
it's kind of a bit dull and
I've sat in the audience listening to
these kinds of talks myself,
so I won't bore you too much.
But like I say, if there's anything on that slide
that is of interest then please do contact me,
or visit our website
turing.ac.uk and you can find out more details.
In terms of the work that I lead,
and I ask the academics
or work with the academics to undertake this,
I've got three multi-year research
projects happening at present.
So, GUARD: I think what we did here was actually
get the acronym first and then figure out what the name of
the project was retrospectively.
Anyway, essentially GUARD is about predicting conflicts.
So, it's a combination of
graph theory and incorporating into
those graphical based representations
some stochastic differential equations,
and model things as a function of time.
And integrating lots of different datasets,
to be able to predict
areas that are susceptible to conflict.
And that could be, so we've been
working with the Colombian governments for instance,
on looking at different drugs cartels
and how they interact,
or how they shoot each other,
and then suggest intervention strategies based
on the geographical topology,
and some of the work that we've been
doing, and saying right okay,
if you insert a roadblock in this location and
that's likely to reduce the amount
of shootings or interactions
between these different groups,
and all the way up to the international level, where
these types of areas are likely to
be susceptible to conflicts in the future.
Critically, because I'm a Bayesian,
obviously I apply
prior distributions over everything and we
quantify the uncertainty in everything that we do.
Which is kind of cool.
So, that project is an interaction between
the Turing Institute, Warwick University
and University College London.
AIDA is a bit like DARPA's D3M project,
but is on a completely different financial scale,
as you'd expect because when DARPA
invest they invest big whereas when we
invest we invest proportionately
to how much money we have got in the bank.
So, in this project,
there's an interaction between Cambridge,
Edinburgh and Oxford Universities
and the Turing Institutes.
And basically what we're trying to do is to
semi-automate parts of the data wrangling process.
So, what we're looking at is a lot of Gaussian process stuff.
If you've seen Zoubin Ghahramani talk about
the automated statistician and
model selection in Gaussian processes.
Essentially that's kind of
what we're doing in this space.
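To give a rough feel for that model-selection idea, here is a toy sketch in plain NumPy with made-up data, not anything from the AIDA codebase: candidate Gaussian process kernels are scored by their log marginal likelihood, and the kernel that explains the data best wins.

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(0, 10, 40)[:, None]
y = np.sin(x[:, 0]) + 0.1 * rng.normal(size=40)   # made-up observations

def rbf(a, b, ell=1.0):
    # Squared-exponential kernel.
    return np.exp(-0.5 * (a - b.T) ** 2 / ell ** 2)

def linear(a, b):
    # Linear kernel.
    return a @ b.T

def log_marginal_likelihood(K, y, noise=0.1):
    # Standard GP evidence: -0.5 y' C^{-1} y - 0.5 log|C| - n/2 log(2 pi).
    n = len(y)
    C = K + noise ** 2 * np.eye(n)
    L = np.linalg.cholesky(C)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return -0.5 * y @ alpha - np.log(np.diag(L)).sum() - 0.5 * n * np.log(2 * np.pi)

for name, K in [("RBF", rbf(x, x)), ("linear", linear(x, x))]:
    print(name, round(log_marginal_likelihood(K, y), 2))
```

The automated-statistician idea then searches over compositions of such kernels rather than just two candidates.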
And then finally, on this slide,
computing in untrusted environments:
what we're doing there is utilizing
Intel's SGX technology.
So we've just released
an SGX-compliant Linux kernel library,
SGX-LKL,
or LKL-SGX, I can't remember which way around it is.
Sitting on top of that SGX-compliant
Linux kernel library
we've actually placed Spark,
and you may think that's a trivial operation but
the memory footprint in SGX is quite small,
and so the work that one has to
do to get Spark,
and even just the JVM, sitting inside of SGX,
essentially in encrypted memory,
is quite significant, and
so what that has now given us the ability to do
is to run secure containers essentially,
in cloud-based infrastructure, and
scale out using Spark's scalability properties.
Essentially we've got full end-to-end encryption now,
we've got encryption at rest,
and we have encryption using SGX when
data and processing capability is in memory.
So, that's quite cool.
So, that project is a partnership between
Cambridge, Imperial College
and the Turing Institute, and we have
interactions with Docker, and in fact,
we have interactions with
MSR Cambridge on that particular project too.
So, that's what's in there,
and I promise I'll get to
some kind of technical content at some point soon.
Then, I've kicked off,
well, a lot of these projects have now ended,
but I kicked off a bunch of short projects.
Those previous projects are all multi-year projects;
these were all six-month projects.
So, Adversarial Machine Learning is
of great interest to many people.
What we're doing there is Bayesian deep learning,
and not just because I'm a Bayesian.
Actually, the guy that runs
these projects is also a Bayesian,
so that's why I thought it was a great idea.
So, there's a natural combination of things,
but what we're doing there is
essentially spotting out-of-distribution
data points, and quantifying the probability
that those kinds of data points
are adversarial-related or adversarially generated.
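The talk doesn't spell out the method, so purely as an illustrative sketch, and an assumption on my part rather than what the project actually does, one common Bayesian-flavoured approach is Monte Carlo dropout: keep dropout active at test time, average the sampled predictions, and flag inputs whose predictive entropy is unusually high as possibly out of distribution or adversarially generated.

```python
import torch
import torch.nn as nn

# A small classifier with dropout; keeping dropout active at test time
# ("MC dropout") gives an approximate Bayesian predictive distribution.
model = nn.Sequential(
    nn.Linear(20, 64), nn.ReLU(), nn.Dropout(p=0.5),
    nn.Linear(64, 10),
)

def predictive_entropy(x: torch.Tensor, n_samples: int = 50) -> torch.Tensor:
    """Entropy of the MC-dropout predictive distribution; high values
    suggest the input is out of distribution (possibly adversarial)."""
    model.train()  # keep dropout stochastic
    with torch.no_grad():
        probs = torch.stack(
            [torch.softmax(model(x), dim=-1) for _ in range(n_samples)]
        ).mean(dim=0)
    return -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)

x = torch.randn(5, 20)            # dummy inputs for illustration
scores = predictive_entropy(x)
flagged = scores > 1.5            # threshold chosen arbitrarily here
print(scores, flagged)
```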
We've been doing some work
with the National Cyber Security Centre.
So, GCHQ has an arm,
which is called the National Cyber Security Centre,
and they're tasked with ensuring
the UK is safe from a cybersecurity perspective.
So, we've been doing some work around DMARC,
analyzing DMARC data and analyzing
different interesting data sets
to characterize the UK government's web footprint.
You may think that's easy, but actually when
a fire station in the middle of the countryside,
in the middle of nowhere in the UK,
sets up a website on GoDaddy,
or wherever, and pays them £20,
you don't necessarily always know
that that website's been created,
yet from a central government perspective,
the government is responsible for
ensuring that it's protected to some extent.
So, we've been trying to help them
to understand their network.
The final piece I'll touch upon
is evaluating homomorphic encryption.
We've just open-sourced a software platform,
which we call SHEEP.
I can't remember what the acronym stands for,
but it's one of these recursive acronyms.
What it does is take
homomorphic encryption primitives,
so the addition and multiplication operations,
and the different implementations of those,
and provide a benchmarking platform
through which, when
the clever mathematicians generate
a new FHE-related algorithm,
they can push it onto this platform and compare and
contrast its runtime performance,
as well as some of the mathematical assumptions that are
baked into these different algorithms.
So, we noticed that in the multiparty computation world
there's a benchmarking platform,
but there wasn't one in the FHE world,
so we've now generated this platform.
There's a paper just out at ICML on this platform.
Hopefully, we'll see some adoption of that.
So, the reason I mention that is, obviously,
it's a bit of advertising to see if there's any interest.
If you're an FHE person,
then please do have a look at SHEEP
and see whether it's of interest to you.
So, what I've tried to do is
cover, not so quickly, so I apologize for that,
quite a lot of the work that we do at the Institute.
I didn't really touch upon some of
the ethics side of things
and some of the social sciences side of things,
partly because that's not my technical background.
I would do them
a disservice if I tried to explain
what we're doing in the area of data ethics
and within the law on that side of the spectrum.
Working with lawyers, working
with more of the philosophers,
as it were, they are trying to
help the government and organizations of different kinds
ensure that they use data in
an ethical and legally compliant manner,
and to change UK law such
that we are ethically responsible in the use of data.
That's quite an interesting area
from a cybersecurity perspective.
It is something that I've
recently been thinking about,
so I won't say anything profound at the moment,
but I think it's an area that,
for those of you doing cybersecurity-related research,
we should start to think about a
little bit more than we currently perhaps do.
So, I'll start the segue now into a bit more
technical content.
So, I was interested,
so I joined the Alan Turing Institute for many reasons,
partly because it's a national institute in the UK,
and partly because of the brand,
the brand of the man himself,
and the brand of the university partners
and the institute essentially
trades off all of those brands.
That's what gives us
great convening power and
also allows us to create national level impacts.
I was interested in Alan Turing and what he did,
so I started to read some papers of
his and papers that were written about him.
It turns out, and you may suspect I would say this,
that Alan Turing himself, I'll cast him as
perhaps one of the first British,
certainly one of the first, data scientists
that I've come across,
through my limited reading of history.
He was a statistical data scientist,
specifically at Bletchley Park. Why do I say that?
Well, I'll give you some citations
and some quotes from him
to demonstrate that that's partly true.
But if data science is the combination
of different scientific disciplines
to derive value from data,
then Turing and his work
at Bletchley Park, essentially that's what he did,
so in Hut eight at Bletchley Park, he had linguists.
He had, I guess the first computer scientists.
He had hardware engineers.
Quite crucially, he had Jack Good as a statistician,
who was helping to guide him towards the
Banburismus algorithm and all the work they
did in deciphering Enigma.
So, I propose, or rather,
the literature and I propose,
that Turing was a statistical data scientist,
so the institute has
quite a natural link back to Turing and his work.
Then, let's focus in
on Alan Turing's view of probability.
So, there is a really cool paper.
Back in 2012, Alan Turing appeared on arXiv,
which is not something that happens every day;
it's not often you get a notification that Turing's just
appeared on arXiv. But GCHQ kindly
declassified and openly published
a paper of Turing's in 2012.
You can go on the Internet and download
this either from arXiv, because
somebody's typed it up, or you can see
the original manuscript, which was handwritten.
The paper by Turing,
which was written in 1941
but only published in 2012,
is titled
The Applications of Probability to Cryptography.
My mathematical knowledge, being a statistician,
isn't as great as I'd like it to be.
Well, I actually do
understand the cryptography that goes on in this paper,
but I don't fully understand cryptography
in any meaningful sense,
but I do understand the probability side
of things that he presented, thankfully.
Some of the key quotes that come from this paper.
Actually, there's a great guy,
a great US-based academic called Sandy Zabell,
who has done quite a lot of work on the history of
Turing and provided commentary on this paper,
so if you're interested in this paper,
I suggest you also read his accompanying paper,
which provides a commentary.
It's actually Sandy Zabell
that pulled out some of these quotes.
So, this, to me,
is evidence that Turing was a Bayesian,
is a Bayesian, was a Bayesian, I guess.
The use of Bayesian statistics
through World War II was actually
the key thing that helped us to decipher
Enigma and, allegedly, win the war.
So, if I extrapolate:
Bayesian statistics
and Bayesian methodology helped to win the war.
That's quite extraordinary, given that
Bayesian statistics in the '40s
was seen as something that one should never do.
So, from my heart,
hats off to
Alan Turing for actually persevering with
the Bayesian methodology and utilizing it
in the way that I'll
describe in a moment, at least at a high level.
Turing said in his manuscript that
the probability of an event
on certain evidence is the proportion of
cases in which that event
may be expected to happen given that evidence.
So, that loosely suggests that he's a Bayesian.
He's thinking about conditional probabilities.
That's my interpretation of that statement.
But then more specifically,
he talks about how the evidence
concerning the possibility of an event
occurring usually divides into a part
about which statistics are available,
i.e., a likelihood function,
and a less definite part about which one
can only use one's judgment, i.e., prior knowledge.
We combine those in a mathematically rigorous way,
as you all know, because I'm sure
some of you are Bayesians in this audience.
Well, I hope you are. We get a posterior distribution.
So, that suggests Turing was a Bayesian.
Actually, what really suggests Turing was
a Bayesian is the quote which
directly states
that nearly all applications of probability to
cryptography depend
on the factor principle, or Bayes' theorem.
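In modern notation, the factor principle is just Bayes' theorem written in odds form, and the deciban that comes up in a moment is a logarithmic unit for the resulting Bayes factor. These are the standard definitions, not quotations from the 1941 manuscript:

```latex
% Factor principle: posterior odds = Bayes factor x prior odds
\frac{P(H_1 \mid E)}{P(H_2 \mid E)}
  = \underbrace{\frac{P(E \mid H_1)}{P(E \mid H_2)}}_{\text{Bayes factor } B}
    \times \frac{P(H_1)}{P(H_2)}

% Weight of evidence in decibans: tenths of the base-10 logarithm of B,
% so independent pieces of evidence simply add.
W = 10 \log_{10} B
```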
So, back in the 1940s,
Bayesian statistics was being used.
Essentially, what they were computing
was Bayes factors, and
they introduced some cool computational tools, like the deciban.
The deciban was introduced basically in the same way that
the computer scientists in the audience
use computational tricks
to improve computational tractability;
the deciban was introduced
back in the '40s to do exactly that.
So, some cool things that we now do almost naturally as
a data science community, they were already doing in
Hut eight back in the
1940s, which I find quite inspirational.
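To illustrate why the deciban was such a handy trick, here is a minimal Python sketch with illustrative numbers only, nothing here comes from the Bletchley Park procedures: multiplying Bayes factors becomes adding decibans, so evidence from independent observations accumulates by simple addition.

```python
import math

def decibans(likelihood_h1: float, likelihood_h2: float) -> float:
    """Weight of evidence for H1 over H2 in decibans: 10 * log10(Bayes factor)."""
    return 10.0 * math.log10(likelihood_h1 / likelihood_h2)

# Three independent observations with made-up likelihoods under hypotheses H1 and H2.
observations = [(0.30, 0.10), (0.25, 0.20), (0.60, 0.15)]

# Evidence simply adds across independent observations.
total_db = sum(decibans(p1, p2) for p1, p2 in observations)

# Convert the accumulated evidence back to a posterior probability,
# assuming even prior odds on H1 versus H2.
posterior_odds = 10 ** (total_db / 10.0)
posterior_h1 = posterior_odds / (1.0 + posterior_odds)

print(f"total evidence: {total_db:.1f} decibans, P(H1 | data) ~ {posterior_h1:.3f}")
```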
Again, there's another great paper by Jack Good.
This stuff, the work that they were
doing in Hut eight, really started to, not
leak, but started to appear in
the open literature around the 1980s.
Again, Jack Good: I have
infinite respect for these kinds of academics, who
were big proponents of
these kinds of methodologies that
were completely out of favor
during that period,
during the '40s,
and yet they persevered.
They knew that this changed the course of
the war through the relatively simple ways
in which they utilized these methodologies,
and they were constantly told by
the rest of the academic community that
Bayesian statistics was just not useful in any way,
shape or form, and yet they had to keep quiet,
bite the lip, and not
actually release any of this stuff.
So, it's remarkable.
I'm not sure I would have the self-control to be able
to not leak all the
secrets. Maybe I shouldn't say that actually.
I do have the self-control, just in case anybody's watching.
So, that leads me on to my interest, which is:
I want to be able to build
a Bayesian model for cyber security or,
specifically in
this context, network-based cyber security.
I think there are four key things
that we need to learn about,
and I have not done any of these yet;
I should say I am right at the beginning of this journey.
I would be interested in anybody's
thoughts on any of these things,
and I'll allude to
the problems that sit within them.
So, I think the key ingredients are as follows.
First, Chain Event Graphs: essentially,
a way in which one can undertake causal,
specifically Bayesian but
causal, statistical analysis,
and represent the causality between events using
event trees; Chain Event Graphs are
a generalization of event trees,
for those of you that don't know them.
It means eliciting expertise from people like
Johnson and others about
cyber security systems,
about adversaries and how they work,
about systems of systems,
about supply chains et cetera,
and building all of that into
quite a complicated statistical,
causal, Bayesian model, using
this Chain Event Graph methodological system,
to represent what's going on in the real world.
So, that's the incorporation
of prior knowledge, as it were.
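As a toy illustration of the event-tree side of this, with entirely hypothetical stages and probabilities, written in plain Python rather than any Chain Event Graph library: elicited judgements sit on the edges of a tree, path probabilities follow by multiplication, and a Chain Event Graph then merges tree nodes that are judged to share the same transition distribution.

```python
# Toy event tree for an entirely hypothetical intrusion scenario.
# Each node maps an outcome to (elicited probability, child node or None for a leaf).
event_tree = {
    "root":     {"phish_attempt": (0.3, "foothold"), "no_attempt": (0.7, None)},
    "foothold": {"creds_stolen": (0.4, "lateral"),   "blocked":    (0.6, None)},
    "lateral":  {"exfiltration": (0.5, None),        "contained":  (0.5, None)},
}
# A Chain Event Graph would additionally merge nodes whose outgoing
# distributions are judged identical ("stages"); that elicitation step,
# which is where the expert structure really enters, is omitted here.

def path_probability(path):
    """Probability of a root-to-leaf path, e.g. ['phish_attempt', 'creds_stolen', 'exfiltration']."""
    node, prob = "root", 1.0
    for outcome in path:
        p, node = event_tree[node][outcome]
        prob *= p
    return prob

print(path_probability(["phish_attempt", "creds_stolen", "exfiltration"]))  # 0.3 * 0.4 * 0.5 = 0.06
```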
So, I'm particularly interested in Chain Event Graphs,
so I've just started a project which is actually
not looking at cyber security at all,
but I'm hoping it will translate into that.
So, I've started a project using Chain Event Graphs,
which is looking at how an individual will go
from minor radicalization to full radicalization,
to taking part in some terrorist-related attack in
a Western country, and looking at
the sociological evolution, as it were:
an individual's psychological and
sociological surroundings evolving through
that path, from being somebody like myself all the way
through to somebody being very
radicalized and then performing an attack.
That has analogies in
some respects to the cyber-security kill chain.
So, I'm hoping that
the learnings from that work will translate
across into the cyber world.
So, we've got the ability
to do causal inference assuming that this stuff works.
The next thing I'm interested in
is horizontally scalable inference.
So, I am particularly
interested in scalable statistical methods.
So, things that sit on top of platforms such as Spark.
What I see, a lot of the time in
Spark-related applications is that people
like to count, and that's great.
We do a lot of summarization
through counting a lot of things.
But that's not the most sophisticated thing one
can do with respect to statistics.
So, what I want to do is to bring
more statistical sophistication
to Spark-like environments.
So, we're currently starting a project,
an open-source project that complements Spark ML,
MLlib, sorry, which is looking at placing MCMC,
Markov chain Monte Carlo related,
algorithms on top of Spark,
and so there are some interesting challenges that
exist when you're trying to perform MCMC.
You've got distributed datasets across multiple machines,
you're running local MCMC algorithms
computing local posterior distributions,
or sample-based representations
of those posterior distributions.
How do you combine those samples to produce
a globally consistent and
statistically correct MCMC algorithm,
or rather an
asymptotically correct representation
of the posterior distribution?
That's what this project's all about.
So, we just started quite a big project as
a collaboration between Cambridge,
Warwick, Oxford and Bristol,
and the Turing Institute on just that topic.
The best thing we've come up with so far is quite a cool
rejection sampling algorithm,
which uses perfect sampling and all kinds of things.
We'll be publishing that very shortly.
Computationally, it's horrendous, but it sits on Spark,
so we have the horizontal scalability.
It takes quite a while just to do one MCMC iteration,
so there's still quite a lot of
research to do in this space
but we do have ideas
around how we can speed these things up.
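Our own approach is the rejection-sampling scheme just mentioned, which isn't shown here. As a minimal sketch of the combination problem itself, though, here is the kind of simple precision-weighted averaging of subposterior samples, in the spirit of consensus Monte Carlo, that I would put in the more ad hoc category; the Gaussian subposteriors are made up and stand in for the output of per-machine MCMC runs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Pretend each of 4 workers holds a data shard and has already run a local
# MCMC chain targeting its subposterior (Gaussian draws here as a stand-in).
subposterior_samples = [
    rng.normal(loc=mu, scale=s, size=5000)
    for mu, s in [(1.0, 0.4), (1.2, 0.5), (0.9, 0.3), (1.1, 0.45)]
]

# Consensus-Monte-Carlo-style rule: weight each worker's samples by the
# inverse variance of its subposterior and average draw-by-draw.
weights = np.array([1.0 / np.var(s) for s in subposterior_samples])
stacked = np.stack(subposterior_samples)              # shape (workers, draws)
combined = (weights[:, None] * stacked).sum(axis=0) / weights.sum()

print(combined.mean(), combined.std())
```

The weighted average is exact only when every subposterior is Gaussian, which is precisely the kind of assumption a statistically principled combination scheme tries to avoid.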
Handling time-varying phenomena.
I mentioned earlier on that I'm interested
in things as a function of time,
sequential Monte Carlo et cetera,
and that's going to be the focus
of the last few slides that I've got.
In a moment, there's some point process
work that I've been doing,
using Markov modulated Poisson processes,
which Josh has now seen about three
times in this presentation,
so I apologize, Josh.
Then, finally, I think the final key ingredient of
a good cyber security model, or a
good cyber data science related system,
should be that the techniques are privacy-preserving.
So, I mentioned homomorphic encryption.
We just kicked off another project which is
looking at FHE-based algorithms,
and essentially I'm trying to produce homomorphically,
that's not the correct phrase, HE-compliant
classification algorithms.
So, for instance, there's been a paper that demonstrates
a logistic regression algorithm that is
fully homomorphic encryption compliant,
if that's the correct phrase.
We want to take that a bit further, so,
logistic regression is useful but
it's not the most sophisticated thing we can do.
So, how do we take that and go to
the next class of statistical classification algorithm
using homomorphic encryption?
This project's known as Crypto-ML.
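The published HE logistic regression schemes typically work by replacing the sigmoid with a low-degree polynomial, because encrypted arithmetic only gives you additions and multiplications. Here is a rough plaintext sketch of that idea; no encryption library is involved, and the details are my illustration rather than anything from the Crypto-ML project.

```python
import numpy as np

# FHE schemes evaluate additions and multiplications on ciphertexts, so the
# sigmoid is replaced by a low-degree polynomial fit over a bounded range.
xs = np.linspace(-6, 6, 500)
sigmoid = 1.0 / (1.0 + np.exp(-xs))
coeffs = np.polyfit(xs, sigmoid, deg=3)       # degree-3 polynomial fit

def approx_sigmoid(z):
    return np.polyval(coeffs, z)              # only + and * : HE-friendly

def gradient_step(w, X, y, lr=0.1):
    # One (plaintext) gradient step of logistic regression using the surrogate;
    # under FHE the same arithmetic would be carried out on encrypted X, y, w.
    preds = approx_sigmoid(X @ w)
    return w + lr * X.T @ (y - preds) / len(y)

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))                             # made-up features
y = (X @ np.array([1.0, -2.0, 0.5]) > 0).astype(float)    # made-up labels
w = np.zeros(3)
for _ in range(200):
    w = gradient_step(w, X, y)
print(w)
```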
Again, we're looking for collaboration
opportunities on that project too.
That project is actually a collaboration
between
the Heilbronn Institute for Mathematical Research,
which is based in Bristol,
Warwick University statisticians,
and the Turing Institute.
So, for me, if we can combine all four of these things,
we have the ability to respect people's privacy and
utilize as much of the data on
the different end-user devices as possible,
while respecting that privacy.
We can handle time-varying models,
we can operate at scale across computers,
and we can incorporate prior knowledge.
If I can combine these disparate things,
and they're still disparate as it stands today,
I think we've got quite an ecosystem.
I don't know how far away
in terms of time we are from this,
but this is my vision and this is
the work that I've kicked off and
the work that I'm doing personally.
I hope to integrate all of these and stand here in,
say five years time and give
you a much better presentation.
Remember, when I stand here in five years' time,
having integrated all these systems,
it will all be
open source, so please rip it
apart and tell me where I can improve it.
If you think any of those ideas are wrong,
or you've got any complementary ideas,
or you just want to chat to me,
then again, my email address is at the end and I welcome
the opportunity to chat to you
via Skype or Zoom or whatever.
I guess Skype would be the favorite here.
>> Teams.
>> Teams, okay. So, I
believe some of my colleagues from
Imperial have spoken here before.
So, I won't repeat that presentation.
So, now I'll focus in on a little project that I did with
one of my students at Imperial College.
It's around the use of
NetFlow data, analyzing NetFlow data,
trying to understand
user behavior on a particular device
using a particular class of
point process model known as
a Markov modulated Poisson process,
and I'll show you some results.
So the presentation
really started off quite broad in terms of
the [inaudible] institute, and I'm just narrowing it down to
this particular NetFlow-based analysis.
So, at Imperial College,
we're collecting NetFlow data.
We have about 40,000 computers.
We generate about 12 terabytes of
Flow data per month or around 15 gigabytes per hour.
The kinds of things we're interested in,
so we're interested in people trying
to steal our intellectual property.
Because for Imperial, that's one of the ways in
which we can generate revenue and obviously,
then, fund the academic research
that we also undertake, and the training that we do.
We don't want to impose
lots of constraints on the network.
Academics get grumpy when you tell them they can't go
and visit a particular website, and [inaudible] website is.
Students in college halls really don't like being told
that they can't use the latest illegal file sharing tool.
Not that anybody does that at Imperial College, I'm sure,
but that's a consideration too.
Spear phishing, as is often the case, turns out
to be a major compromise route for the network.
I guess the fourth thing, which is
always the case with cyber security, is
the brand damage that
one could incur if there were an attack,
a successful attack, or if such an attack were reported.
I'm sure there have been several,
several successful attacks.
You just don't know about them. So, I
guess some of you will know this stuff,
but a NetFlow record is
a summary of the traffic between two network devices.
It's collected at
the router level, and there are quite a lot of
interesting statistical issues around missing data,
around duplication, around direction,
around time, around synchronization.
There's lots of change going on,
and there's lots of ways
in which you can analyze the data.
So, NetFlow data is
a really cool dataset from a statistical perspective.
If you're interested in
developing statistical methodology,
and that's kind of my interest really,
then this offers quite a lot of
different statistical problems to
motivate the methodological developments
that one can undertake.
Los Alamos National Labs
released a great dataset
for everybody to access,
but I'm sure you
have access to some great data too.
So, maybe you don't need that open-source data,
but I will advertise the Los Alamos data,
not least because I'm visiting Los Alamos next week.
If anybody's watching this,
they will be pleased I've advertised the dataset.
So, I'm given NetFlow data,
metadata
about the packets flying across the network.
What I want to do is just analyze an individual device,
and I want to [inaudible] user behavior.
So, I want to know, just from
the timing information of the events on the computer,
can I figure out what the user was doing,
were they streaming, were they on,
not necessarily were they on YouTube,
but were they streaming video,
were they writing emails,
were they doing nothing at all.
What kinds of activity were they doing?
So, I want to infer that using a Bayesian process.
The Bayesian process I've chosen to use
is a Markov modulated Poisson process.
What is an MMPP?
Well, really simply, what we've got is,
there is some more formal detail here,
but I won't go through this in the interest of time,
a hidden,
latent, continuous-time Markov chain,
which is denoting, or
rather representing, the user behavior.
So, we've got a finite-state,
continuous-time Markov chain.
So, if the chain is in state zero,
the user is inactive,
and if the chain is in state one,
then perhaps they're on video streaming websites,
and so on and so forth.
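As a toy illustration of the generative model, a simulation sketch with made-up rates rather than the inference code from the paper: a continuous-time Markov chain switches between an inactive and an active state, and NetFlow-like events arrive as a Poisson process whose rate depends on the hidden state.

```python
import numpy as np

rng = np.random.default_rng(2)

# Two hidden states with illustrative event rates: 0 = inactive, 1 = streaming.
rates = np.array([0.2, 5.0])        # events per second in each state
Q = np.array([[-0.01, 0.01],        # generator of the latent CTMC:
              [ 0.05, -0.05]])      # mean holding times of 100 s and 20 s

def simulate_mmpp(t_end: float):
    t, state, events, switches = 0.0, 0, [], [(0.0, 0)]
    while t < t_end:
        hold = rng.exponential(1.0 / -Q[state, state])       # time until next switch
        # Poisson-distributed number of events within this holding interval.
        n = rng.poisson(rates[state] * min(hold, t_end - t))
        events.extend(np.sort(rng.uniform(t, min(t + hold, t_end), size=n)))
        t += hold
        state = 1 - state                                     # flip between the two states
        switches.append((t, state))
    return np.array(events), switches

events, switches = simulate_mmpp(600.0)
print(len(events), "events;", len(switches), "state switches")
```

The inference problem described next is the reverse direction: given only the event times, recover the hidden state path.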
So, we have some interesting challenges,
first in terms of
specifying the number of states:
we've got a model selection problem,
linking it back to Alan Turing's work in
model selection and Bayes factors and all that good stuff.
So once we solve that problem,
then we have an inference problem.
We want to infer the state:
what are they doing on the computer?
Our likelihood function
is essentially a point process model,
a Poisson point process model,
where we get the event timing data.
From the event timing data, we want to infer
the state of this continuous-time Markov chain.
There's some stuff on continuous-time Markov chains
which I will skip over,
because either you're interested in it and you know it,
or you're not interested in it,
and it's maths that nobody really cares
about; interesting maths, nevertheless.
So, the kind of key contribution that we made.
The first one was from an application perspective.
We were able to kind of,
as you'll see on the next slide,
we were able to demonstrate that one can
infer some stuff using these types of methodologies.
The second contribution that we
made was in terms of parameter estimation,
so from a methodological perspective,
in the continuous-time Markov chain.
What you want to do is to
estimate a bunch of parameters, two sets of parameters,
of reasonable dimension depending on
the number of hidden states
that you have in this continuous-time Markov chain.
You can do that using EM or Gibbs sampling,
so it's all relatively straightforward.
But, from an estimation perspective,
what you need to be able to do
is to construct the smoothing distribution.
So, in state space modelling,
you are trying to compute
the posterior distribution as
a function of time as you get more and more data.
What you need to be able to do here is to compute
this distribution conditional on all the data.
I've omitted lots and lots of details;
they are in the accompanying paper.
What we realized is that,
when you're computing the smoothing distribution,
you run a filter forward across the data in time.
You want to filter backwards in time,
and you combine the outputs
of the different points in time.
This backwards filter is actually
not a probability measure;
it needn't necessarily even be a finite measure.
So, you use
sequential Monte Carlo related techniques,
which you have to use in this particular case
to estimate these distributions,
because the problem [inaudible]
doesn't admit an analytically tractable solution.
And estimating this measure,
which is not necessarily a finite measure,
means that for these techniques,
all the convergence theory goes out of the window.
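To make the forward-backward structure concrete in a much simpler, fully discrete setting, here is a generic hidden Markov model smoother, not the continuous-time sequential Monte Carlo algorithm from the paper; notice that the backward quantities are conditional likelihoods rather than probabilities, which is exactly the issue just described.

```python
import numpy as np

# Generic discrete-time HMM smoother (forward-backward), shown only to
# illustrate the structure; the paper's setting is continuous time with SMC.
A = np.array([[0.95, 0.05], [0.10, 0.90]])   # state transition matrix
emit = np.array([[0.8, 0.2], [0.3, 0.7]])    # P(observation | state)
obs = [0, 0, 1, 1, 1, 0]                      # toy observation sequence
pi = np.array([0.5, 0.5])

# Forward filter: alpha_t(i) proportional to P(x_t = i, y_1..t).
alpha = [pi * emit[:, obs[0]]]
for y in obs[1:]:
    alpha.append((alpha[-1] @ A) * emit[:, y])

# Backward pass: beta_t(i) = P(y_{t+1..T} | x_t = i), a likelihood,
# not a probability distribution, and in general not normalised.
beta = [np.ones(2)]
for y in reversed(obs[1:]):
    beta.append(A @ (emit[:, y] * beta[-1]))
beta.reverse()

# Smoothed marginals: combine forward and backward quantities, then normalise.
smooth = np.array([a * b for a, b in zip(alpha, beta)])
smooth /= smooth.sum(axis=1, keepdims=True)
print(np.round(smooth, 3))
```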
So the methodological contribution that we
made in this particular paper was
to guarantee that the backward filter
is a finite measure.
At the very least, by introducing an artificial distribution,
we did some probabilistic manipulations
to remove
the effects of the artificial distribution
that we introduced and make
this whole system computable
using the types of methodologies,
sequential Monte Carlo, we have to use,
which is then embedded in a parameter
estimation algorithm to estimate the parameters.
Parameter estimation is very
computationally expensive;
once you've estimated the parameters,
the actual algorithm to infer this stuff,
to infer the states of the user,
is actually relatively straightforward.
So, just to reiterate:
we've got point process data,
and we want to estimate this hidden state,
what the user is actually doing [inaudible].
We use a relatively simple
algorithm to be able to do that,
the Markov modulated Poisson process algorithm, to
estimate the probability of the states
at any one particular time.
Prior to that, we have to estimate
the parameters of the system;
in our case, we used Gibbs sampling.
But within that Gibbs sampling procedure,
we needed a sequential Monte Carlo algorithm,
and we realized that there was
this technical problem about the backwards filter
not necessarily approximating a finite measure,
and so we solved that particular problem as well.
So, that's the methodological contribution,
which then gets applied in a cyber security context.
In this particular case,
I think we went for
a four-state continuous-time Markov chain.
But I would say,
methodologically it generalizes; practically,
if you go beyond 10 states,
then you're probably going to be waiting
a few weeks for your results
to actually be usable.
So, very good question.
So, one was emailing,
one was doing nothing,
one was streaming,
and the fourth one was on Microsoft Word.
I'm sorry, a little advertisement for your organization
there.
So, I can't remember exactly,
but I think these are the different,
so the different colors represent
the different types of activity being undertaken.
So, what we did was
to get the student to do different activities.
We kind of recorded the ground truth,
and then we accessed his NetFlow data from
his device from the college,
and so, this is the counts of NetFlow data.
I think been over a five minute time period,
or it might be a minute, I can't remember
now the exact details.
So, that's the NetFlow data,
and these are the results.
What we do here
is just focus in on one part of that data,
just to make it visible.
So, these are the different states.
This blue line here is the MAP estimate of
the state that we believe the device to be in,
if you believe me.
Then, the estimated state is often consistent,
not always, but often consistent, with
the actual true states.
Which I didn't believe would be true;
I didn't think this was actually going to work.
I just thought it was a nice methodological problem,
and it was a model
I had in my back pocket at the time,
and I thought, let's try it out.
But actually, we were able to infer
the true states of the user on this particular device.
We didn't have any more ground truth,
so I don't know how well it generalizes.
But this, for me, is a particularly
important piece of the jigsaw in
this larger Bayesian model I was talking about:
being able to infer what the user is doing just from
NetFlow data, with an associated representation of
the uncertainty, is a key ingredient
in building out this system-of-systems
based representation and understanding of
what's going on in the system.
>> Right, do you remember
which process corresponded to which activity in this?
>> Not off the top of my head. I'd have
to go back to the paper;
I don't remember off the top of my head.
>> I was sort of wondering why Microsoft Word
would generate network data.
Yeah, I was curious about that too.
So, you could discriminate
Microsoft Word from no activity in the data.
>> There is an online version of Word.
>> That must be where it is.
>> There's several online versions
of Word in fact I think.
>> Yeah, okay.
>> It could well be OneDrive-related,
the kind of syncing of it.
I don't think we were using the kind of, you
know, where you go to Chrome or
Explorer and use the online Word.
But I think you can get OneDrive sync going on in
the background, these days.
>> If you use a bad network, horrible things
happen if your stuff is in OneDrive.
Therefore, it must be a good network.
>> So, that's all I wanted
to talk about with respect to that.
I'm conscious that I'm running out of time,
but just to reiterate.
So, the Turing Institute is
the UK's national-level institute for data science.
Thankfully, Turing's appreciation of
Bayes helped to change the world for the better.
I said Turing, but actually
Jack Good's contribution was
substantial, and the team's shouldn't be underestimated.
As I'm sure many of you in the audience know,
there are many interesting Bayesian
challenges still to be solved.
I've alluded to some of them in my presentation,
and those are the kinds of
activities we're undertaking,
specifically under my direction.
And as you know, big data, or deep learning,
deep reinforcement learning, deep something,
does not necessarily mean
that statistical or expert knowledge is not needed,
and so I'm still going to fly
the Bayesian flag even in the world of deep learning.
I think the deep learning community,
the people I know in the deep learning community,
have kind of caught on to that and are trying to
integrate expert knowledge into
the systems they are developing.
In this presentation, I introduced
a very simple Bayesian model.
It's not quite what I wanted to present,
it's not quite what I wanted to do,
in the sense that it's not as far along as I'd like it to be.
But I still think it's quite cool,
and it will solve some interesting problems.
I hope this has
demonstrated, to an extent,
the kinds of work that we're doing, and
that the Turing Institute is at
the forefront of research in modeling and inference,
specifically Bayesian modeling and inference;
we have quite a strong community in Bayesian statistics.
For me personally, I think, you know,
I always think it's good that an institute should be
recognized for doing something quite well.
The Vector Institute, to
my mind, is great at deep learning,
primarily because of the people they've
got associated with that institute.
For us, we've got some really great Bayesian people and
I'd like the institute to become
famous for Bayesian-related statistics.
But again, that's partly because
I'm biased in terms of my background,
though I think it segues quite nicely
into many different attributes of Alan Turing's work,
specifically his work in probability theory.
Then finally, as I've mentioned, collaborations,
visits and presentations are all welcome.
From me personally: come and visit me,
or come and visit any of our academic community
and I'll set up those interactions.
We can do that virtually,
or we can do that physically.
I welcome your interactions of any kind.
These are my two email addresses, please do email, or,
if you're so inclined, follow us on Twitter.
But, thank you for your time today and,
I welcome any questions you may have.
>> So, the model selection problem. It's tough, right?
So, NetFlow, it's all over the place.
Some devices just have humans on them and
others are just purely machine-driven,
and they look a lot different, you know?
Some devices switch like
a Markov process, others are very smooth.
So, can you talk a little
about how you approach your model selection there?
>> In that case, no,
I've not approached it, but what I would do
is probably come up with
some kind of hierarchical representation,
so that I have an indicator as to
the type of device that the machine is.
So, when I'm performing the statistical inference,
in the way that I described, I condition on
my inferred state of that machine:
is it a server, is it an access device,
is it some cloud-based
bespoke thing doing something weird?
So, I would introduce an additional level of abstraction,
introduce another latent variable
in the true Bayesian way,
and solve that particular problem there.
>> That sounds,
that's very consistent with what I think as well.
There's probably a question that comes before that,
which is figuring out
all the different categories that might be
involved in the mixture:
is it just servers and laptops, or is it printers,
laptops and servers, or, you know,
what is the resolution at which
you want the mixture to resolve?
You can do this with users and machines and processes;
all of these have sort of
a mixture behavior to them so, yeah.
>> That's where interactions start.
That's where the human dimension
comes in, and expert knowledge:
you need to talk to the guys that own the network,
at least get their view on what's on the network.
They may, obviously, not
always know what's on the network.
But I think, having those interactions, you
can then better model the world
and incorporate the uncertainty
associated with those models.
So again, I keep coming back to the Bayesian methodology;
I feel like I've hammered
that point a little bit too much today.
I am pragmatic most of the time,
as long as it's a Bayesian solution.
>> So, you talked about using
homomorphic encryption, and actually
I'm impressed that someone
got logistic regression to work there.
It seems like it should be really hard,
because exponentiation is not a ring operation.
But say you can evaluate using
homomorphic encryption, if you want to
do this in some privacy-preserving way.
Of course, then you would have the answer to
the thing you evaluated for someone,
but you can only do that if
you get the model from somewhere,
and I was wondering if you had
any thoughts about how that might be
done in a privacy-preserving way.
There is hope that one might be able to build
a model using encrypted data,
but it's always assumed that the data has all
been encrypted with the same key
in all the schemes I've heard of.
>> So, no is the short answer.
Because, I rely on
my colleagues to do that work for me or with me.
So, no I don't have a sensible answer to that question.
So, I'll avoid the question completely.
Hopefully, if any of my colleagues are
watching then they can email me and say,
"This is what the answer to
that question should have been."
>> That'd be great.
>> So, no, I can't give you an answer to that question.
>> One takeaway is we're very
interested in homomorphic encryption.
So, that's probably a real key collaboration opportunity.
>> Yeah, okay.
>> There are some
awesome experts in that in this building.
>> Certainly. We'll have to talk; I don't have the expertise.
>> I'm not one either, actually, I'm a-.
>> [inaudible] for me.
>> So, can you say more about Spark on SGX?
You're running Spark on SGX?
>> Yes, that's right. Spark sitting on top of
an SGX-compliant Linux kernel
library. So, if you Google-.
>> You're running Spark inside an SGX enclave?
>> Components of Spark. We had to rewrite Spark, or
rather we had to SGX-ify some specific parts of Spark.
The project really was
about identifying which parts of Spark needed to
be inside the SGX enclave
and which parts didn't need to be.
So, we had to make some subtle changes
to the Spark code base.
>> So, are you trying to do training, evaluation, or both?
>> So, what we're trying to do is have
a Spark environment so that users can
do whatever they want on top of it,
just writing standard code in Spark SGX,
the Spark SGX-compliant version.
The user is oblivious to the fact that it's
got SGX sitting behind it.
So, as a person that doesn't want to have to
worry about the kernel-ish level operations
that one has to undertake
to get SGX to actually do anything,
I can just sit there, write my Scala code in
Spark as I normally would, and be fairly
confident that the SGX side
of things will be taken care of through
this Linux library and
the Spark modifications that sit on top of the SGX-LKL.
>> Do you have estimates on how the performance
compares to the non-SGX version?
>> From memory, I think there's
a five times performance penalty using SGX,
but I believe Intel are making some changes to
the next formal release which should help us with that.
There's still issues with SGX around such
[inaudible] that will always exist.
But, I guess, Intel is claiming that such [inaudible] don't exist.
So, we're not fully secure in the way that I described,
but nothing ever is.
>> You aren't secure as it is.
>> Yeah, exactly. So, there's a couple of papers, again,
on our website, and
all this software is now open-sourced,
so if you're interested then grab it,
spot the bugs,
and all the other good stuff.
>> I think you mentioned the MCMC on Spark?
>> Yeah.
>> So, what kind of performance [inaudible].
>> At present, we get
a significant reduction in computational performance,
but at least we can then
parallelize across multiple machines.
So, at the moment I
would argue that otherwise one can't do anything
at all on Spark with MCMC.
Well, there's a couple of attempts at
this using kernel-based approximations,
where you can use kernel approximations to the samples
and then combine those,
in what I'd call, from my perspective, a rather ad hoc way.
So, one can do stuff, but
there's no asymptotic relationship,
no guarantees that asymptotically
those approximations mean anything
once you combine all of these different approximations.
What we're trying to do is be
statistically principled when we combine
these different local approximations,
but the penalty for that at the moment is
that it costs you a lot more computationally.
So, the next challenge is actually how do
we improve the computational performance of
these quite horrific rejection sampling
related approaches that we've proposed.
That's what we'll be doing next, but in the meantime,
we're kind of open-sourcing all these implementations,
so that at least, you know, if you have the time
to wait and you care
about being statistically principled,
then you have an ability to do something.
>> Does the Spark open-source community welcome this sort of-?
>> I don't know. We've only
communicated it to the statistical community at present.
I guess I need to go to the Spark conference [inaudible],
the thing that happens every now and again,
and present it to them there and
see what they say;
the statistical community seemed to welcome it.
But as for the Spark community,
they may be just happy with
the out-of-the-box machine learning based algorithms;
a lot of the stuff seems to be
SGD-related, stochastic gradient descent related, and
point estimates may well be
good enough for a lot
of the applications, and so this
stuff's not going to be
relevant to a lot of the community.
But for those that actually want to
perform a full Bayesian statistical analysis,
which tends to be the case in my experience in
bioinformatics, then this stuff will be of use.
So, they're the kind of, you know,
WinBUGS-type community.
That community will hopefully be able to translate and
use these types of technology relatively easily.
The other thing I should say is that we're
hoping, I think I can say this,
we are going to be hosting a PPL,
Probabilistic Programming Language, workshop next year,
so we've got the Stan community coming across.
There's a tool, Pyro, I think it's called,
a PPL which sits on top of the JVM.
But we're getting a number of different communities
across, and hopefully this stuff will be presented.
It's not a PPL but it's heading
in that direction, which is why I mentioned it.
Again, if there's any, well, I don't know what
Microsoft is doing in the PPL
space, if I'm totally honest,
but if there's any interest
from your community in PPLs,
then I'd welcome your involvement in that workshop too.
Thank you very much.