>> Good morning, everyone.
Thanks for joining us today for the NCI CBIIT speaker series.
I'm Tony Kerlavage, the acting director of CBIIT.
I want to remind everybody that today's presentation is being recorded
and will be available at the CBIIT website at CBIIT.cancer.gov.
You can find information about future speakers on that site
and by following us on Twitter.
Our handle is @NCI_NCIP.
Today, I am very happy to welcome Dr. Casey Greene,
Assistant Professor of Pharmacology at the University
of Pennsylvania's Perelman School of Medicine and the title
of Dr. Greene's talk is Deep Learning, What is it Good for?
With that, I'll turn the mic over to Casey.
Welcome.
>> Thank you.
I hope you guys can hear me.
Okay, yes, I see you can.
Perfect. Yeah, so I'm going to chat about some of the work in deep learning and,
you know, if you follow deep learning at all on sort of social media,
and you try to see the conversation around it,
I think there's a lot of constant discussion about, you know, how useful it is,
and so I think most people feel it falls somewhere on the scale,
where things go either from not at all useful to entirely useful.
So, you know, I think I'll tell you little bit about what we've been up to.
I think the answer is that deep learning is probably more useful than war.
So the answer would not be absolutely nothing, but, you know,
it's probably not quite at the stage where it's also sometimes sold to be.
So I think the answer is kind of in the middle ground.
Before we dive too far into, you know, what deep learning is,
I want to talk about, you know, just briefly, like what machine learning is,
and then we can talk a little bit about how,
at least in my perspective, there's the difference.
So with machine learning, we're going to take a computer.
We're going to add in some features.
These could be things that we think are, you know, important.
If you're thinking about buying a house,
you might think about, oh, the number of rooms that it has,
the number of square feet, this type of information.
So it's information about the house.
We have some potential outcomes, so like the price that the house sold for,
and we're going to train a computer to build a model,
which will then let us make a prediction.
So we're going to essentially ask
which of these computed features are useful for predicting an outcome?
That goes in.
We create a model, and then we can say, oh, this house
would cost this amount in the future.
Let's move from houses to cancer.
So this is a study that we recently participated in,
part of, actually, the NCI PanCancer Atlas.
So this is just going to be a machine learning workflow, no deep learning,
and the specific part of the study we worked on was identifying TP53 loss,
and we wanted to identify it from gene expression data, actually.
And the reason we wanted to do that was because we had certain cases
where we knew TP53 was lost,
and then we had other cases where there was no mutation
that suggested TP53 was lost, and we suspected that in these individuals,
there would be a subset of them that might've lost TP53
through some other mechanism that we weren't quite ready to pick up on,
and so we thought that looking at gene expression
as a predictor could help us find this.
Just in terms of, for those of you who really want the technical details,
this is part of the PanCancer Project, in the DNA damage repair pathway paper,
so it's actually in that paper.
We ended up using the genes that were the most variable,
and we used elastic net logistic regression.
We did cross validation and had a held-out test set.
And so, this sort of shows the performance of the predictor.
This is the false-positive rate.
This is the true-positive rate.
A perfect predictor would essentially go straight up and then over.
This would be an area under the receiver operating characteristic curve of one,
and what we see is that we're doing pretty well.
So this is 93%.
So it's quite high, and we see that for the training set,
the independent held-out test set, and also,
when we assess through cross validation.
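For a concrete sense of this style of workflow, here is a minimal sketch in scikit-learn: elastic net logistic regression, cross-validation for tuning, and an AUROC on a held-out test set. The data and hyperparameter grid are placeholders, not the actual PanCancer analysis.

```python
# A minimal sketch of this style of workflow in scikit-learn. X and y are
# placeholder data, not the actual TCGA expression matrix or TP53 labels.
import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X = np.random.randn(500, 8000)            # samples x genes (placeholder)
y = np.random.randint(0, 2, 500)          # 1 = TP53 loss (placeholder)

# Hold out an independent test set before any fitting.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

model = Pipeline([
    ("scale", StandardScaler()),
    # log loss + elasticnet penalty = elastic net logistic regression
    ("clf", SGDClassifier(loss="log_loss", penalty="elasticnet", random_state=0)),
])

# Tune regularization strength and the l1/l2 mix by cross-validation.
grid = GridSearchCV(model,
                    {"clf__alpha": [1e-4, 1e-3, 1e-2],
                     "clf__l1_ratio": [0.1, 0.5, 0.9]},
                    scoring="roc_auc", cv=5)
grid.fit(X_train, y_train)

print("cross-validated AUROC:", grid.best_score_)
print("held-out AUROC:",
      roc_auc_score(y_test, grid.best_estimator_.decision_function(X_test)))
```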
So the other nice thing that we see is we don't observe a lot of overfitting,
and to give credit to Greg Way [assumed spelling],
he's the student in my lab who did this work.
And just to give you an idea of the types
of things you can learn from something like this.
So this is looking at what should be an orthogonal measure.
So this is the CNV burden of the tumor.
So this is never fed into the machine learning classifier.
The classifier is taught using mutation information,
so somatic variants in TP53,
and it's taught to recognize a mutation via gene expression.
So it never sees CNV information directly.
And so, if we look at the CNV burden in wild-type samples,
so this is what we think is actually true wild type,
these are the ones we think do not have TP53 loss.
We see that they have a relatively low, in general, CNV burden,
although there might be some subset of samples that have high CNVs,
potentially, from another region.
For the samples that have a predicted wild type, we see the same thing.
So these two match, which is what you'd expect,
given that the classifier appears to be accurate.
When TP53 is lost, so these have a mutation in TP53 that we would expect
to cause a loss of function,
you can see the CNV burden shifts dramatically,
and in the ones where we predict a loss,
so the classifier predicts a loss, you also see the shift.
And of course, the reason we did this analysis is because we wondered,
are there ones where the actual and predicted values are potentially different?
And we think that they're, in fact, losing TP53.
So this is a 375 G to T mutation in TP53, a silent mutation, and what we see is
our classifier called 18 of the 19 samples in TCGA
that have this mutation as TP53 lost.
When we look at the CNV burden, we, again, see an increased fraction of them;
many of them have this increased CNV burden.
And so, we think these are actually the types of things we were looking for.
The case where somatic sequencing and evaluation
of somatic variants didn't suggest a TP53 loss,
but the gene expression identifies one, and then, in this case, you know,
we can come up with a potential mechanism to back this up.
So this is a silent variant that occurs right near a splice site.
So we looked at splicing, and so,
this is pulling information out of Snaptron from, if I remember correctly,
Jeff Leek's group.
And when you pull the samples out of Snaptron, the ones on the left here,
this entire bar, are all of the samples that had this "silent variant."
This is the probability that our classifier assigns,
that they would've lost TP53.
So our classifier thinks there's a 97 percent chance that this sample,
TCGA4LZAA80, has lost TP53 based on gene expression, and then, on the right,
these are just random samples pulled from TCGA.
You can see the probability that our classifier assigned to them.
So since we just pulled them randomly, it's a pretty wide spread of probabilities,
but one of the things you quickly notice is that the exon four to five junction,
which is this blue one, so this is the canonical splice junction,
is present in all of the samples that did not have this variant, and actually,
we don't see any other junctions, but in the samples that do have this variant,
you can essentially see splicing events sort of spread all
over the locus in most of them.
And so, what we think is actually going
on here is this silent variant is affecting splicing,
perhaps causing at least one of these isoforms to act
in a dominant negative manner.
So we're able to take the gene expression classifier
and use it to identify these potential additional samples that appear
to also have a loss of TP53, and with a little bit of digging
and some light understanding of the biology,
you can actually start to suggest the mechanism behind that.
So that's what, you know, machine learning can get you.
It can help you find [audio cuts out] unexpected cases where everything sort
of looks like what we, you know, hope to find except for the fact
that they didn't have something that was a "high-impact somatic variant."
And actually, this variant,
it's been reported, is associated with [inaudible], as well.
So there's additional support there in the genetics literature.
And so, if you're really interested in kind of just the machine learning side,
you know, I think there's a lot of ways to kind
of do a solid machine-learning analysis that returns value.
So this is the case that is not deep learning, but, you know,
I think we should not sort of forget the fact that sort
of standard machine learning techniques applied
in a rigorous way can be really helpful.
So Greg did analyses for both of the --
so he actually led the Ras Pathway paper for PanCan Atlas,
and he contributed to the P53 analysis,
which went into this one for the DNA damage repair pathway,
and both of these are mentioned a little bit
in the oncogenic signaling pathways paper.
So if you're interested in this kind of machine-learning approach,
I would encourage you to check those out for more detail.
As a quick side note, Greg and a post-doc in the lab, Daniel,
decided this was a kind of fun workflow,
and it would be really awesome if anyone can do it, and around that time,
we were also approached by a Philadelphia organization called Code for Philly,
which is -- they do programming for the social good,
and also an organization called Data Philly,
which is a data science meet up group in Philly.
And together, these guys came up with this project, Cognoma.
So the idea was the same style of machine learning analysis,
but for everyone.
So they started this Cognoma project, and for about a year and a half,
they actually worked on it about every two weeks for a couple of hours.
We ended up having quite a few people show up.
There's actually [audio cuts out] web servers.
So if you go to Cognoma.org, you can actually run the same style of analysis.
You go in.
You select your mutations.
You select your tumor types, and it will create a Jupyter Notebook,
which we template the values into, run the Jupyter Notebook,
and then email you a link to the results.
So if you're interested in it, Cognoma.org.
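For a sense of that notebook-templating step, here is a hedged sketch using papermill; Cognoma's actual stack may differ, and the notebook path and parameter names here are hypothetical.

```python
# One common way to template values into a Jupyter notebook and execute it,
# like the Cognoma workflow just described. Cognoma's actual implementation
# may differ; papermill is used here for illustration, and the paths and
# parameter names are hypothetical.
import papermill as pm

pm.execute_notebook(
    "classifier_template.ipynb",     # hypothetical template notebook
    "results/run_001.ipynb",         # executed copy containing results
    parameters={
        "mutated_genes": ["TP53"],   # which mutations to predict
        "tumor_types": ["LUAD", "BRCA"],
    },
)
# The executed notebook can then be rendered and a link emailed to the user.
```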
The other cool thing about doing the project this way was that a lot
of people engaged in the process who otherwise wouldn't.
So this is a picture.
This is actually Daniel Himmelstein,
a post-doc in the lab, at one of the meet-up events.
So we would host them in a local co-working space.
People would come.
They'd hack away on these for a while, and at the end of the day,
the second goal was that everyone would learn something.
And so, honestly, for about a year and a half,
at least one person from my lab ended
up teaching data scientists two hours' worth of cancer biology while, you know,
a subset of core contributors continued to work on the project,
but we ended up having about 40 people who contributed in a way
that actually made it into the master branch.
So when you go to the website, you know, 40 people made a contribution that sort
of made that possible, and so, this is kind of a fun exercise.
We think that probably on the order of 500 to 1000 people ended
up coming through these workshops.
So just a random side note.
Okay, back to deep learning.
So what makes deep learning different?
So remember with the machine learning setup, we had much of this in place.
So we have this outcome information,
and we had the model, we had the prediction.
So these things didn't change.
What changed was this.
And so, before, you know, when we were talking about buying a house,
right, we give it the square footage.
We give it this room information.
You know, we might give it, you know, a neighborhood.
All of this stuff is information we feed into the model with machine learning.
For deep learning, we think that we'd like the algorithm
to actually learn what the important features are.
So you could imagine instead of telling the algorithm about the number of rooms,
you might just feed it a bunch of pictures of the house.
So you just, like, walk a camera through the house, and at the end,
you want the deep learning algorithm to tell you what it would cost.
So the inputs for the deep learning in the ideal world are much more raw.
So the idea is the computer's responsible not just for taking some features
that might be important and creating a model
and making predictions from it,
but it's actually got to take this very raw input,
merge it into complex features,
and then use those complex features in this model.
So this is sort of the "deep" part, right?
Instead of just having your input going into the model
and out to the prediction, there's this sort of intermediate step
where these complex features are learned.
So if you're familiar with deep neural networks, which are all the rage,
as soon as you get to neural networks with essentially two or more hidden layers,
you end up in this world where you've got these intermediate features,
and the way they're trained,
the features can actually be modified by some of the outcome information.
So it ends up being a really powerful approach if you have certain ingredients.
So this is just a nice little graph that, in fact, contains almost no data
but a lot of truth, which is kind of exciting.
So, yeah, so if you think about where deep learning really works,
it tends to work in cases where you have lots and lots of data,
because you need to build these features.
So for the traditional sort of standard machine learning algorithms,
because you're feeding in useful features,
you can get away with a lot less data, but as you start to get lots
and lots of data, the deep learning algorithms end up being able
to build better features than we can sort of a priori build in.
So, you know, if we expect to see rapidly increasing amounts
of data, I think we're going to see an increased emphasis on methods
that are capable of doing this sort of complex feature engineering.
So what could you do if you were going to use deep learning or these types
of methods in this context?
So one thing you could do is you could try
to understand these features themselves.
So as you're building these models,
they've got this sort of feature engineering step, and the question would be like,
okay, how do those features work?
And so, the first project I want to talk about, it's not in cancer biology.
We're actually going to look at microbial systems,
but it starts to get at this question of kind
of building an understanding of these complex features.
So if you really wanted to know how a living system worked,
just starting from data, not assuming anything
about the system itself other than, perhaps, you know, the complement of genes,
what you could imagine doing is something like this experiment
where you survey thousands of biologists
and then identify what they thought the most important experiments
of the moment were.
You'd then perform those experiments, process the data,
and then analyze that data to see what the regulatory patterns are.
Essentially like what experiments would you do if you could do them,
and then do a lot of them in sort of this standardized framing.
The challenge, of course, for this, well, I'm talking to NCI folks,
so you've probably noticed,
you can imagine this would be a very difficult type of proposal to get funded.
It's not -- it's a fishing expedition, right?
It's not a sort of well-justified hypothesis driven proposal, and essentially,
the only way to do these types of fishing expeditions are things, you know,
I think like TCGA, where they're very large initiatives and large efforts,
and for this case, you really have to do this for many different systems.
So it would be extremely challenging to imagine how it would work.
On the other hand, if you're just a person who happens
to have an Internet connection,
you have another pretty valuable resource at your fingertips.
So right now, if you have an Internet connection, you can actually download --
these numbers are a little bit out of date,
but about 2.2 million publicly available genome-wide gene expression assays,
which if you ballpark that each of those probably cost at least $1000
to generate, when you consider sort of materials,
reagents, supplies, person time.
That's about a $2 billion data resource.
The challenge of working with this data is, of course,
it's vaguely like the experiment that we designed above,
except people don't generally upload high-quality metadata alongside this data.
So, you know, in the ideal world, you'd know, oh,
this is exactly the modification that happened here,
and here's all the other conditions that were used.
So if we are looking at a microbial system, we'd say, oh,
the bacteria were grown to this concentration.
You know, all this sort of information is not always available.
And so, you have this really valuable data resource,
but we don't have any of the metadata to necessarily work with it.
So, you know, for a lot of this, this is exciting but challenging.
You'd imagine, like, you've got this really valuable resource, but essentially,
it also poses this impossible question,
in that there is experimental design behind it,
in terms of like people are actually doing experiments that are often sort
of well-designed, but the challenge is that we don't actually have any
of that experimental design available to us.
So any analysis that we would like to do that relies on it,
we're essentially stuck, right?
We couldn't do it.
And so, what we wanted to do with this work is say can we take a sort
of new class, new approach to machine learning and actually integrate this data
without having to assume anything about the metadata describing this?
So we don't have any sort of sample labels in this case.
So I'll pose a challenging version of this first.
I blacked out some things on this slide.
Normally I ask people what I blacked out,
but I suspect that won't be very effective in this type of setting.
So, you know, I can tell you it's basically impossible to figure
out what I blacked out, because there's essentially no context here, right?
You've got just a blank slide with some gray squares on it.
If, on the other hand, you had a little bit more information.
Like I've blacked out something here, but I'm guessing you can fill it in.
So, in this case, you know, you might recognize this is the Nike logo.
So you say, okay, you blacked out the word it.
I recognize the Nike swoosh in the corner.
I recognize the words just do it as a really famous Nike slogan.
So even though I removed information, you were still able to reconstruct it.
You can do that because you had this context.
So the approach we wanted to use was one that forced computers to do this
in the gene expression space.
Essentially, you take images, or in this case, you take gene expression values.
You block out random values,
and you force the computer to reconstruct what was held out,
and that reconstruction forces the computer to identify the sort
of dependency structure like when there's a Nike Swoosh,
the word it usually appears, and so, we weren't the first to do this.
So there was a really nice paper out of Google
where their group showed 16,000 computers 10 million images from YouTube
and then forced them to do the same type of task.
I won't get into the details of this paper,
but this is one of the most famous neural network nodes that I think now exists.
And so, this is one of the outputs of --
one of the results in their paper is they found that there was a node
that recognized this type of feature, and if you happen to have cats
or are aware of cats, what you can see is this algorithm learns just by looking
at random still images that cats exist, and so that's pretty exciting.
We use the same type of structure for biology.
It looks a little bit different but not too different.
In this case, we're going to look at gene expression.
So we've got input values.
This is one sample.
It has no associated metadata.
So we know nothing about that sample,
except we have gene expression measurements.
So in this case, this was a gene with low expression.
This is another gene with low expression.
This is a set of genes with high expression in here,
and what we're going to do is we're going
to train a neural network to do something with that.
First, we're going to remove some data.
So we're going to drop out some values.
So for the ones here with Xs, I've removed the information
about gene expression for those genes.
And we've got this neural network.
This one's already been trained a little bit, and so,
with this trained neural network,
we're going to feed that as input into the neural network.
Some of the nodes will be off, in this case, blue.
Other nodes will be on, in this case yellow.
And the reason they're on is because all the genes that feed into it
with high weight, shown by the thickness of the line, are red.
So these were highly expressed,
and so the neural network node they feed into is on.
The ones that feed into this have low weight.
So the neural network node is off.
And the neural network has just been trained to reduce the error
between a reconstruction where it takes this,
runs it through the neural network,
and uses it to build a reconstruction, and the input.
And what that means is, in this case, the neural network would have inferred
that this gene was expressed at a relatively low level.
So it should be filled in with this, you know, this sort of green color,
and in this case, the gene was expressed at a moderately high level.
So it should be filled in with this red color.
And actually, you can see that sort of vaguely matches the original input.
So this is sort of the sketch of how it works,
and the way training proceeds is you're essentially teaching the neural network
to modify these weights, such that this is very similar to this,
even though you've added noise at this step, and this neural network can have,
essentially, an arbitrary number of layers.
So you could have one layer.
You could have two layers.
You can have three layers, depending sort
of on how complex the features you want to build are.
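Here is a minimal sketch of that setup: a one-hidden-layer denoising autoencoder in Keras, in the spirit of ADAGE. The shapes, corruption rate, and training settings are illustrative, not the published ones.

```python
# A minimal one-hidden-layer denoising autoencoder in Keras, in the spirit of
# ADAGE. The expression matrix here is a random placeholder scaled to [0, 1].
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

n_genes, n_nodes = 5000, 300
expression = np.random.rand(1000, n_genes).astype("float32")

inputs = keras.Input(shape=(n_genes,))
corrupted = layers.Dropout(0.1)(inputs)       # the Xs: randomly zero some genes
hidden = layers.Dense(n_nodes, activation="sigmoid")(corrupted)  # learned nodes
reconstructed = layers.Dense(n_genes, activation="sigmoid")(hidden)

autoencoder = keras.Model(inputs, reconstructed)
autoencoder.compile(optimizer="adam", loss="mse")
# The target is the *uncorrupted* input: the network must fill in what was
# removed, which forces it to learn the dependency structure of the genes.
autoencoder.fit(expression, expression, epochs=10, batch_size=32)
```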
So for the work I'm actually going to talk about today,
both of these examples are going to use single-layer neural networks.
The reason we do that is because we have this sort
of biological interpretation stage, which turns out to be important for us,
and so, we found those are sort of easier to interpret.
There are some really nice methods for understanding, for deeper models,
what features contribute, but from an interpretability standpoint,
I think all of those methods still leave a little bit to be desired.
So, in this case, these examples will all be a single-layer approach.
So there are complex features but still constructed
from the gene expression values, not from other neural network layers.
And the test case I'll talk about today
uses a Pseudomonas aeruginosa compendium.
So Pseudomonas is an opportunistic pathogen.
It causes infections in people with compromised immune systems,
particularly cystic fibrosis.
So in individuals with cystic fibrosis,
it'll cause lung infections that are difficult or actually impossible to clear.
It tends to form these really difficult to deal with biofilms.
The reason we started with this is that there's about 100 different experiments,
just over a couple thousand assays, and so, ballpark,
you're thinking about a million dollars worth of assays.
From a computational point of view, the reason you might want to start here is
because it poses all the same sorts of problems as the larger data sets.
So you can train the neural network model on your laptop
in a reasonable amount of time.
So if you're sort of doing development, figuring out whether these methods even work
if you do them for gene expression data as opposed to cat detection, you know,
it's a nice place to do that type of experiment.
Just a sort of quick couple results to demonstrate how this thing sort of works.
So one of the things we wanted to do first was just validate that these types
of approaches applied to gene expression data are sensible.
In this case, we did this by looking at,
there's a couple of results in the paper, but the one I'll just mention here,
we looked at a certain known transcription factor, ANR,
for which there was a ChIP-seq experiment,
which turned out to be helpful to us,
and ANR controls Pseudomonas's response to low oxygen.
So you can imagine Pseudomonas might be swimming freely in an environment
of high oxygen, but as it moves to the surface of the lung of an individual
with cystic fibrosis, that ends up being a low-oxygen environment,
and it'll form biofilms in this thick,
sticky mucus that ends up in an anaerobic environment.
That turns on ANR, which does a number of things,
including turning on some things that we see as virulence genes that sort
of start to damage or trigger a response on the surface of the lung.
And so, there's a health reason to be interested in this, but also,
a kind of a methodological reason,
and that's that we have a ChIP-seq experiment for ANR.
So we actually can go into the neural network and identify the node
where the weights of the node correspond most closely to the ChIP-seq experiment.
And when we do that, we end up with a node,
which ended up being node 42.
So we didn't name it node 42, it just happened to be node 42.
So if any of you guys are Hitchhiker's Guide to the Galaxy fans,
you probably know that 42 is the answer to the ultimate question of life,
the universe, and everything.
Apparently, it's also the answer to the ultimate question of, in this case,
our machine-learning methods.
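A hedged sketch of that node-matching step: score each node by how strongly its high-weight genes overlap the ChIP-seq target set, using a hypergeometric test. The weights and the target list below are placeholders, not the real ADAGE model or ANR targets.

```python
# Hedged sketch of matching nodes to a ChIP-seq-defined gene set: call each
# node's extreme-weight genes its gene set, then test the overlap with the
# ANR targets by a hypergeometric test. Weights and targets are placeholders.
import numpy as np
from scipy.stats import hypergeom

weights = np.random.randn(5000, 50)                            # genes x nodes
anr_targets = set(np.random.choice(5000, 40, replace=False))   # placeholder IDs

n_genes = weights.shape[0]
pvals = []
for node in range(weights.shape[1]):
    w = weights[:, node]
    # "High weight" here means more than 2 standard deviations from the mean.
    high = set(np.where(np.abs(w - w.mean()) > 2 * w.std())[0])
    overlap = len(high & anr_targets)
    # Probability of seeing at least this much overlap by chance.
    pvals.append(hypergeom.sf(overlap - 1, n_genes, len(anr_targets), len(high)))

print("best-matching node:", int(np.argmin(pvals)))
```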
And, so, we did an experiment to validate it.
So first, we looked at a bunch of public data.
The results are in the paper, but if you look at the outputs of this method,
as opposed to things like PCA or ICA,
so principal components analysis or independent components analysis,
the sort of denoising auto-encoder method, which we called ADAGE,
analysis using denoising auto-encoders of gene expression,
actually does a really nice job of describing the ANR activity in public data,
which we can infer based on the sort of light metadata that's annotated
around oxygen availability.
What we really wanted to do was then generate a new experiment
that didn't exist anywhere in the compendium,
which is an ANR knockout experiment on CF genotype airway epithelial cells.
So in this case, there's a monolayer of airway epithelial cells grown in a dish,
and then Pseudomonas is put on top of them,
and it actually will form these microcolonies where it forms this biofilm,
and when it does that, it creates an anaerobic environment.
So this is a case where ANR should be on.
And then, we're going to knock out ANR and ask what happens
to node 42, so the ADAGE node.
So we did a microarray experiment.
If you knock out ANR, what you see is that the node that we associated
with ANR activity actually turns off.
So this connects with exactly what you'd predict.
The wild type remains on.
You know, I think one thing that's really potentially of interest is the sort
of robustness of this model.
So if you take this model and apply it to microarray data, which is,
essentially, what the training data largely are.
Actually, in this case, all the training data are microarray data,
and so if you take it and you apply it to microarray data,
the model describes this experiment that had never been done before that wasn't
in the compendium perfectly.
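Applying the trained model to a new experiment is cheap, because a node's activity is just a weighted sum of the scaled expression values passed through a sigmoid. A small sketch, with placeholder weights and data rather than the real node 42:

```python
# Scoring new samples against one trained node. Weights and samples below are
# random placeholders, not the real node 42 or the knockout experiment.
import numpy as np

def node_activity(expression, w, b):
    """expression: samples x genes; w: one node's per-gene weights; b: bias."""
    return 1.0 / (1.0 + np.exp(-(expression @ w + b)))

w42, b42 = 0.01 * np.random.randn(5000), 0.0   # placeholder trained parameters
wild_type = np.random.rand(3, 5000)            # placeholder new samples
anr_knockout = np.random.rand(3, 5000)

print("wild type:", node_activity(wild_type, w42, b42))
print("ANR knockout:", node_activity(anr_knockout, w42, b42))
# In the experiment described, the knockout's activity drops while the
# wild type stays on.
```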
If you then ask how robust is the model?
You know, microarrays are sort of old technology.
Could you apply to something more modern?
The answer turns out to be yes.
So Jack Hammond, who is a grad student in my collaborator Deb Hogan's lab,
also ran two RNA-seq experiments, one using the lab strain.
So this one uses the canonical lab strain PAO1.
This one actually uses a clinical isolate called J215, and in those,
we also see that the gene expression --
that the neural network describes the ANR activity in those experiments,
as well, despite the fact that there's this large kind of platform shift here.
And so, you know, we think these models are actually quite robust.
Some of this training process with removing noise appears
to make the models relatively robust, and also,
kind of the structure of the neural network itself appears to be helpful
over some of the linear methods, like PCA or ICA.
So this was pretty exciting to us.
We ended up publishing this.
This is sort of the first paper in a series
of denoising auto-encoder papers for gene expression data.
Actually, there's one before this that's from PSB in 2015 that we wrote
that focuses on breast cancer, but this is the first one we had sort
of solid experimental validation that the model actually described things
that were from experiments that weren't even in the compendium.
So that was really exciting.
First author is Jie Tan.
Jack Hammond did all the experimental work on it, and my collaborator,
Deb Hogan, has a microbiology lab, which is what Jack worked in,
and we worked together on sort
of designing this project and carrying it forward.
And so, if you like this, the code is also on GitHub,
for those of you who like to find some source code.
So that's cool.
So we've been able to validate this method.
The question then is like, that's great.
You can validate it.
Do you actually learn anything from it?
So I love this quote from Albert Szent-Gyorgyi,
which "Research is to see what everybody else is seeing
and to think what nobody else has thought."
That's what we'd like to be able to do.
We'd like to be able to reanalyze data but identify patterns
that the original investigators did not find when we go back
and do that reanalysis doing this sort of new approach.
So here's an example of that.
This is the phosphate starvation pathway in Pseudomonas.
It's supposed to be a two-component system.
So they're tightly regulated.
When Pseudomonas are in low phosphate, PhoR becomes active.
That activates a transcription factor, PhoB, which turns on a lot of things.
The one that matters here is that it'll turn
on alkaline phosphatase, which will turn media that contain BCIP blue.
You can see the pathway diagrams here.
In wild type, it turns blue.
If you knock out two other genes, PstB or PstA, it also turns blue.
If you knock out PhoR or PhoB, it does not turn blue.
So this is how the pathway works.
Sorry, there's a siren in the background I assume you can actually hear it.
This is what we end up observing.
If we go in and we use our neural network to identify cases of low phosphate,
and we start poking at them, we start to see cases like this,
where in the wild type or in the pst deletions, you know,
it looks just like you'd expect, but in this case,
PhoR doesn't appear to be required.
So this is an experiment run on peptone media, and in fact,
this happens not just once, but actually,
so this is our phosphate starvation signature in this case.
So the things that are high here, we expect to be phosphate starved.
So you can see, there's a big set of them up here
that we think are phosphate starved.
This is PIA.
So this is one media.
You can see it's got the sort of funky shape to it,
where it actually doesn't seem to require PhoR. Peptone also doesn't seem
to require PhoR. This is the only experiment that looks like it was done
to actually assess phosphate starvation.
So this is a public experiment that just happened to be out there
where there's this NGM media with high phosphate,
compared to NGM media with low phosphate, and you can see it induces a shift,
and in fact, in this case, it looks just like the textbook.
So if you do the experiment at the extremes, it looks just like the textbook.
So this is the type of thing we're trained to do, right, as scientists.
You take something.
You change it a lot, and you see what differed.
But what we see in most cases is, actually,
that the pattern actually doesn't look like the textbook, and so,
what this suggests is, you know, yeah, sure,
the textbook describes the extremes, but what happens between the extremes?
So between the very high and the very low phosphate,
that appears to actually be different.
So there's a nonlinearity here, and if there's that nonlinearity,
and we're doing all our experiments, you know, at the extremes,
so we can get the largest effect size, so we can reliably publish a paper,
we're going to actually miss the biology that occurs in the middle,
which is particularly problematic if that biology
in the middle is physiologically relevant,
and the biology of the extremes is potentially less physiologically relevant.
So, you know, the goal of getting a paper can somewhat diverge
from the actual goal of understanding how a system works
at physiologically relevant concentrations.
The question would be can machine learning help us fill in the gaps,
and can the sort of neural network help us fill in the gaps?
So one of the things that we noticed when we did this quick analysis was that, hey,
there's this funky thing going on here where PIA has a really large band,
and in fact, it's got six samples that are really high for phosphate starvation
and six samples that are really low for phosphate starvation.
And these are two completely different experiments done at different times.
So Albert Szent-Gyorgyi would be happy.
We're looking at other people's data and finding things that they didn't notice.
In this case, this is just PIA versus PIA plus an additive.
This one, actually, is an experiment that looks
like it's targeting understanding what RpoN does.
So these people deleted RpoN in the context of a KinB deletion,
which they compared to a KinB deletion.
So neither one of those individual experiments are too helpful to us,
but the difference between them is because they're both run in the same media,
and they show a really different signature.
And one of the things that's really kind of cool is that the ones
that have low phosphate starvation actually have this KinB deletion,
which actually suggested a potential mechanism.
So maybe KinB is actually mediating, to an extent,
this change in phosphate starvation, and sure enough,
so some students in Deb's lab actually did a fair amount of experimental work
to get to the bottom of this.
So this is Pseudomonas grown on a minimal media.
It's called MOPS media, and in MOPS media they're then titrating
in increasing amounts of phosphate, and we're looking at a wild type and PhoB,
PhoR, and KinB deletions, and the thing to really focus
on is the wild type starts to turn off the phosphate starvation signature
on this media, just after 0.5 millimolar,
while the KinB deletion actually does this at 0.4.
And so, what you can start to see here is that actually it looks
like KinB is actually modulating the level
at which phosphate starvation turns on.
So now we know.
We don't have an exact mechanism for how KinB does it,
but we now know that KinB is one potential input to this pathway.
One of the things we worried about kind of with this result is that,
you know, KinB is a kinase.
We're looking at phosphate starvation.
Maybe any kinase would induce the same thing, right?
We're only looking at the public data.
That's how we found KinB.
So it's possible any other kinase could induce the exact same thing.
However, we worked with the Laub Lab at MIT to actually test.
So they have a deletion collection of every histidine kinase
in the Pseudomonas genome, and so,
we did the same sort of assays for every histidine kinase
in the Pseudomonas genome, one by one,
and this effect actually appears to be specific to KinB.
So this is a way where you can use these types of features that are learned
by these neural network approaches to then go potentially fishing in public data
to identify a new player in what is actually a textbook pathway,
and really start to fill in some of these missing pieces, particularly --
especially when the sort of, you know, intermediate concentrations differ
from what happens at the extremes, where we tend to sort of do our experiments.
And so, if you're interested in this, the paper's out now.
It was published in Cell Systems last year.
The first authors are Jie Tan and Georgia Doing, and again,
this was a collaboration with the Hogan lab,
and then the Laub Lab also helped us out with the deletion collection.
And if you prefer to find the source code online or to reproduce these results
or apply it to your own setting, in this case, the work is on BitBucket,
so BitBucket.org/greenelab/eADAGE.
Okay, so this is one way that deep learning can help.
It can sort of power these exploratory data analysis methods,
and what we found is that the sort of neural network framing is often
an efficient-to-train and robust framing for a lot of these efforts.
So in cases where the sort of standard ICA and PCA approaches struggle,
we found that the neural network approaches can be a little bit more robust.
Is there anything else you can do?
So this is one potential use case,
these kind of large-scale exploratory analyses.
This is something we played around with a little bit recently, too.
So is there a way that you can take data that are not shareable
and make them shareable using deep neural networks?
So you can imagine, you know, we've got all these data that are locked up;
in this case, we're looking at clinical trials data.
So they're sort of locked behind walls to protect the individuals
who participated, which is, you know, an admirable thing to do,
but it means that it's very hard to then do secondary data analysis,
because even doing secondary data analysis requires going through many steps
for sort of regulatory approval.
So the question is, sort of, can deep learning help us keep secrets?
You'd imagine it'd be really amazing if you could just take a clinical trial,
feed it into some sort of neural network, and generate data that were synthetic.
So they didn't contain any actual people from the clinical trial,
but they have the statistical patterns
of the clinical trial at the individual level.
So people could go back and actually do secondary data analysis on them.
And so, this was a project that a student
in the lab [inaudible] Jones worked on, and the way we ended up approaching this
with something called a generative adversarial neural network.
I don't know if any of you have read some of the sort
of popular literature, but they're called GANs for short, G-A-N,
and I think the MIT Technology Review named GANs
one of the top technologies to watch of 2018.
So if you feel like getting a little bit more of a back story on GANs,
there's a couple of really nice features from the MIT Technology Review
from pretty early this year, I'd say January or February,
if you're interested in the topic.
The way these methods work is you essentially pull
random values from a coordinate space.
You're going to feed these into one neural network.
This neural network is called a generator.
It's supposed to make new data, and then you're going to feed the output
of that generator into something called a discriminator neural network,
which is trying to decide if the data come from the generator.
So if they're fake, or if they're actually real data.
So they're just actual data from the study,
and the discriminator neural network's job is to figure --
is to say, hey, these data are real, or hey, those data are synthetic.
And the way you end up training this, information
from how the discriminator makes its decision
ends up working its way back to the generator,
and so the generator starts learning how to take these random values
in coordinate space, which have essentially no meaning,
and to produce synthetic data out of them
that look indistinguishable from real data.
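Here is a minimal GAN training loop sketch in PyTorch to make that generator/discriminator dance concrete; the dimensions and the "real" data are placeholders, not the clinical-trial setup from the paper.

```python
# A minimal GAN training loop in PyTorch. Dimensions and the "real" data are
# placeholders, not the clinical-trial setup described in the talk.
import torch
import torch.nn as nn

latent_dim, n_features = 16, 8
G = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, n_features))
D = nn.Sequential(nn.Linear(n_features, 64), nn.ReLU(), nn.Linear(64, 1))

opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

real_data = torch.randn(512, n_features)       # placeholder for real records

for step in range(1000):
    real = real_data[torch.randint(0, 512, (64,))]
    fake = G(torch.randn(64, latent_dim))      # generator proposes records

    # Discriminator: real records labeled 1, generated records labeled 0.
    d_loss = (loss_fn(D(real), torch.ones(64, 1)) +
              loss_fn(D(fake.detach()), torch.zeros(64, 1)))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Generator: try to make the discriminator call its output real.
    g_loss = loss_fn(D(fake), torch.ones(64, 1))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
```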
So if you prefer not to think in neural networks,
but you're a huge board game fan,
if any of you have ever played the game Balderdash, it's sort of the same idea.
So in Balderdash, each person gets a card.
They get a word, and then they have to either write the actual definition
of that word, if they happen to know it, or they have to make up the definition
of the word, and once they make up the definition of the word,
all the cards get pooled,
and then people have to decide if that's actually a real definition
or a fake definition, and if you fool people, you get points.
And so, this is essentially what the neural networks are doing
in Balderdash terms, where one is trying to generate fake data
and then a different one is trying to figure out if the data are real or fake.
So Balderdash, but for neural networks.
A new way to generate synthetic data, right?
And so, if you actually do this, it ends up working surprisingly well.
There's some steps you have to take to --
the off-the-shelf GANs don't actually work quite as well in this scenario.
There's a few steps you have to take to improve them.
So one thing we had to do was, despite the fact that, you know,
these neural networks are trying to make synthetic data,
there's no guarantee that they can't memorize, for instance,
one individual and then just return the values for a memorized individual.
So that way, they could potentially leak data.
So we also added something called differential privacy.
So in this case, if you go to this paper, which I recommend
if you're really interested in this,
the cases where we call it private, that's, again,
with differential privacy layered onto it.
We also evaluate GANs without differential privacy.
So these are ones where they're creating synthetic data,
but we can't put any guarantees.
You can imagine a GAN could theoretically learn
to regenerate actual people from the data set.
There's no protection in this one.
So these do not have that protection, and then this is actually real data.
This is data from the Sprint trial.
So these people are getting standard therapy or intensive therapy,
and what you can end up seeing after a fair amount of methodological work
to figure out how to make this work, this is one of the final results figures.
You end up seeing that the data from the neural network,
especially the blue data, so that's the one to focus on,
The private data end up looking a lot like the real data.
So this is the average systolic blood pressure for people in both the real data
and the GAN-generated data.
So now, essentially, the neural network is making new data for us,
and for the data from the private model, we can say, given our privacy budget,
that the data actually contain none of the individuals from the initial set.
So it gives anyone who participated in the study plausible deniability
and would allow large-scale sharing of the data
with a quantified amount of risk.
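For intuition, here is a hedged sketch of the standard differentially private SGD recipe (clip each example's gradient, then add calibrated Gaussian noise before the update) applied to a discriminator like the one above; the paper's exact mechanism and privacy accounting differ.

```python
# Hedged sketch of differentially private SGD (clip per-example gradients,
# add Gaussian noise) for a discriminator D like the one above. This only
# shows the core idea, not the paper's exact mechanism or accounting.
import torch
import torch.nn as nn

clip_norm, noise_mult, lr = 1.0, 1.1, 1e-3
loss_fn = nn.BCEWithLogitsLoss()

def private_step(D, batch_x, batch_y):
    accum = [torch.zeros_like(p) for p in D.parameters()]
    for x, y in zip(batch_x, batch_y):          # per-example gradients
        D.zero_grad()
        loss_fn(D(x.unsqueeze(0)), y.view(1, 1)).backward()
        norm = torch.sqrt(sum((p.grad ** 2).sum() for p in D.parameters()))
        scale = min(1.0, clip_norm / (norm.item() + 1e-12))  # bound influence
        for a, p in zip(accum, D.parameters()):
            a += scale * p.grad
    with torch.no_grad():
        for a, p in zip(accum, D.parameters()):
            a += noise_mult * clip_norm * torch.randn_like(a)  # calibrated noise
            p -= lr * a / len(batch_x)          # SGD step on the noisy average
```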
And so, we're really excited about this.
So if you're thinking about, you know, how would you share the outcomes
of clinical trials, particularly for targeted variables in a safe way,
we think this is a nice way to go forward,
and that would allow sort of further secondary data analysis.
One of the things we also did with this is we had a panel of three clinicians
who each received 100 records, half real, half synthetic,
and they tried to figure out if the records were real or synthetic,
and in this case, they were not able
to distinguish whether the data were actually from the Sprint trial
or the neural network had made them up.
So I guess you could say that in this case for this very weird,
limited Turing test, the neural network passes it.
So if you're interested in this, the code is on GitHub,
and the paper is up on bioRxiv, so if you're interested.
This is the title, and then the co-authors on this,
so Steven Wu helped with the privacy work.
Chris Williams helped with a lot of initial neural network optimization.
Ran Lee, Sanjeev Bavannani, and Brian Byrd all helped with --
they're actually the physicians who did this blinded evaluation,
and if you're interested, I think this was a fun one to check out.
And then finally, I've given you some examples of our work.
Those are sort of places where, I think, deep learning has been useful to us,
or these types of neural network methods have been useful to us.
But I also want to talk a little bit more generally about what we see.
Initially, I was going to do a bit of a talk
on some things other people have done, as well,
but I was sort of looking at the time,
and I realized it would probably be better to just direct you
to a really nice article by Sarah Webb.
So this was in Nature earlier this year.
Sarah Webb wrote this article, and she highlights a lot of the different ways
that deep learning is starting to make contributions.
And so, I'd recommend, if you're interested in kind of a broad survey,
take a look at this and get some ideas
for other places it's being used, as well.
I just want to give you a little bit of perspective.
So we're pretty heavy users of these types of algorithms.
We do some methods development like adding differential privacy to GANs.
You know, I want to give you some perspective on sort of where we are
and what they're good at and what makes them work and what makes them struggle.
So you could imagine, you know, deep learning's just utterly magic, right?
We can just go out into the world now.
We've got -- I got you all the way to the everything side of this line.
So deep learning's good for everything.
I want to pull it back just a little bit.
So, I hate to tell you, deep learning isn't magic.
There's lots of things that you need to do to make it work, and particularly,
if your problem is not an off-the-shelf problem
that everyone has already solved, life is much more painful.
The downside of working in computational biology is most of our problems
for deep learning are not off-the-shelf problems, because, you know,
we're not primarily working on image data that's sort of very well aligned
with the types of images that the, you know,
large tech companies are working on.
So part four, final part, practical deep learning.
If you want to make your deep-learning algorithm work, there are two steps.
Easiest one, get a lot of stuff.
So just go collect as much data as you can collect.
Label it perfectly.
So take all your millions of data points.
Give them perfect labels.
Imagine you're putting them in a museum.
If you have this, you're probably done.
Deep learning is very likely to work for you.
So if this is the setting you're working in, where you got lots of really,
really, really well-labeled data, I would encourage you to just start --
go out and just try it and start using it.
I think you will be pretty excited with what the results are like.
If, on the other hand, you're like me,
and so you don't have one of these things, wait.
I don't have that.
So in this case, we have lots of data, right?
We have all this public gene expression data.
We got lots of public literature.
What we don't have is the labels.
So for us, you know, we don't live in a world where the deep-learning methods
that everyone else is using always work off the shelf every time.
What we find is we often have to do something different,
and a couple of years ago, we were starting to recognize that, you know,
maybe just the sort of standard discussion
of how deep learning was working wasn't entirely sufficient
for understanding deep learning in biology in the same way.
And so, I guess August 2016 I ended up starting this project where we were going
to write a review paper about deep learning in the open on GitHub,
and anyone can comment and contribute, and actually, we did this.
And so, now the paper's been written.
It's actually published now in the Journal of the Royal Society Interface.
We ended up having about 40 people who ended
up coming along and making a contribution.
So if you read this, it's essentially the perspective of 40 different people,
many of whom are sort of experts in the field and working in this area.
And it's a relatively comprehensive tome.
So I think it's around 36,000 words, but it's nicely divided into sections.
So if you really care about imaging, or you really care about clinical records,
it's basically got the state of the field as of probably about a year ago now.
You know, we've tried to update it some,
but most of the writing happened about a year ago.
Or, at least, it was sort of wrapped up by then.
We submitted it.
We got some revisions.
We made some revisions where we did update some of the literature, but,
you know, there's still some stuff missing.
So I'll just give you a few real quick kind of take-home nuggets from this.
So first thing is, as we reviewed all this literature,
one of the things that we found over and over again across methods
that worked well in the domain is that if there is structuring your data,
and you impose a parallel structure in the neural network,
you're likely going to be much better.
So you're going to be able to succeed with less data
if there is meaningful structure and you impose it on the neural network.
So there's a couple types of neural network shown here.
So one is the multilayer perceptron, where every input ends up connected
to every node in the hidden layer, and every node in the hidden layer
ends up connected to the output.
We can contrast that with its neighbor here.
So this is a convolutional neural network.
So each input node, in this case,
is only involved with its neighborhood.
So it's only connected to the neighboring nodes,
and you can see like for these nodes,
instead of here having three edges here, they have two edges.
And this sort of reduction, as the neural networks get much larger,
is particularly dramatic.
And so, this can really help you if you've got modest amounts
of data, because you don't have to learn that structure.
If you don't impose the structure,
you're forcing the neural network to learn that structure,
and it's already used some of the information in your data.
And so, you're not using it as effectively as you could.
So with these neural networks, if convolution is appropriate,
if the neural network structure can be matched to the data,
I highly recommend doing it.
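A quick way to see what imposing that structure buys you is to count parameters for a dense versus a convolutional layer over the same input; the sizes here are arbitrary.

```python
# Counting parameters for a dense layer versus a convolutional layer over the
# same 1000-value input; sizes are arbitrary.
from tensorflow import keras
from tensorflow.keras import layers

dense = keras.Sequential([keras.Input(shape=(1000,)), layers.Dense(100)])
conv = keras.Sequential([keras.Input(shape=(1000, 1)),
                         layers.Conv1D(filters=4, kernel_size=5)])

dense.summary()  # 100,100 parameters: every input tied to every hidden node
conv.summary()   # 24 parameters: each node sees only a 5-wide neighborhood
```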
One caveat I would say, so I see this sometimes is that people will say, hey,
convolutional neural networks are great.
I'm going to use a convolutional neural network,
but their data actually don't have the relationships that are required
for the convolutional assumption to hold.
So, like, you know, you don't have this ability to assume that an input
and its neighboring input are related.
In that case, it might be hazardous.
So the convolutional neural network is going
to be potentially working against you.
So I would say impose it if meaningful.
If it's not meaningful, you can try imposing it, and if it works,
that's probably going to tell you you had too many parameters.
But in reality, you probably should try to match your structure to the data.
I think you're going to get much better results, and consistently,
what we saw across the biomedical literature was people who did match
that structure got better results.
Another thing to think about, there's something called data augmentation.
We saw a lot of really successful examples where it was used.
Imagine, in this case, you're looking at this slide
and your deep neural network is looking at the slide.
It has an orientation, right?
It has a top and a bottom.
But the top and the bottom are really not that meaningful
for the content of the slide.
So if you, with your neural network, sometimes feed it in one orientation
and sometimes feed it in the other orientation,
you can help the neural network avoid biases in the data
that are sort of maybe scanner based.
You know, if there is sort of a distinct top-to-bottom pattern,
it can help to avoid that and help to focus the evaluation on the parts
of the example that are actually relevant.
So if you can augment your data by making these sort of transformations
that don't change the meaning of the data, but they actually change the sort
of inputs to the neural network, we see that that works well a lot of the time.
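A minimal sketch of that kind of augmentation, flipping images at training time without touching the labels:

```python
# Flip-based augmentation: each training image is randomly mirrored, and the
# labels are untouched because the flips don't change the meaning.
import numpy as np

def augment(images, rng=None):
    """images: array of shape (batch, height, width, channels)."""
    rng = rng or np.random.default_rng()
    out = np.empty_like(images)
    for i in range(len(images)):
        img = images[i]
        if rng.random() < 0.5:
            img = img[::-1, :, :]    # flip top-to-bottom
        if rng.random() < 0.5:
            img = img[:, ::-1, :]    # flip left-to-right
        out[i] = img
    return out
```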
Another thing that we found was pretty effective was multitask learning.
So, you know, we would think
that learning two things is harder than learning one thing.
So imagine like you're going to learn to ski
and you're going to learn to snowboard.
You think, well, I don't want to have to learn to both ski
and snowboard at the same time.
That'd be really hard, but with these neural networks,
because the input features can be shared,
you can actually benefit from doing multiple things at once.
If you imagine, okay, I first have to learn how to get to the mountain, right?
Because your neural network is working on really raw features.
Well, if you got to get to the mountain,
you need to do that to ski or to snowboard.
So if layer 1 is the get to the mountain layer, right, those are shared.
And so, the more that you can do multitask learning when you have shared tasks,
it's actually going to help you, again, use your data more effectively.
It's going to be a continuing theme, I think, in biology.
We tend to be at the very lower limits of the data these types of methods
need if we want peak performance.
And so, things that help you use your data effectively are going to be critical.
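Here is a small sketch of a multitask network in Keras, with one shared hidden layer feeding two task-specific outputs; the task names and shapes are made up for illustration.

```python
# A multitask network in Keras: one shared hidden layer (the "get to the
# mountain" layer) feeding two task-specific outputs. Task names and shapes
# are made up for illustration.
from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(500,))
shared = layers.Dense(64, activation="relu")(inputs)  # features both tasks reuse
ski = layers.Dense(1, activation="sigmoid", name="ski")(shared)
snowboard = layers.Dense(1, activation="sigmoid", name="snowboard")(shared)

model = keras.Model(inputs, [ski, snowboard])
# Both losses backpropagate into the shared layer, so each task's data helps
# train the features the other task uses.
model.compile(optimizer="adam",
              loss={"ski": "binary_crossentropy",
                    "snowboard": "binary_crossentropy"})
```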
Okay, so these are some tips.
Here's some caveats.
So we've walked back.
Now we're going to start walking our way back
from like deep neural networks that are perfect.
So this is a really nice paper from Ian Goodfellow,
where they sort of start to talk about --
well, they talk about adversarial examples.
So what's an adversarial example?
An adversarial example is something that we can add with very low magnitude.
So in this case, 7/1000ths of the magnitude of the original, and yet,
it produces a substantial change in what the neural network calls it.
So in this case, this is an image that I perceive to be a panda.
We're going to add this, and the neural network also perceives it
to be a panda with about 60% confidence.
We're going to add 7/1000ths of this pattern.
So this pattern, to me, doesn't really look like much.
To a computer, it really doesn't look like much either.
In this case, they say it's a nematode, but it's only 8.2% confident.
So it's not really that confident, and yet when you do this, in this case,
the neural network ends up with this image, which to me,
it looks a lot like the input, right, because we didn't add much of this,
but the neural network will say, hey, this is a gibbon, with 99.3% confidence.
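The construction in that paper is the fast gradient sign method: nudge each pixel a small epsilon in the direction that increases the loss. A minimal sketch, with the model and inputs as placeholders:

```python
# The fast gradient sign method: move each pixel a small epsilon in the
# direction that increases the loss. The model and inputs are placeholders.
import torch
import torch.nn as nn

def fgsm(model, images, labels, epsilon=0.007):
    """images: (batch, ...) float tensor; labels: (batch,) long tensor."""
    images = images.clone().requires_grad_(True)
    loss = nn.CrossEntropyLoss()(model(images), labels)
    loss.backward()
    # A perturbation this small is nearly invisible but can flip the call.
    return (images + epsilon * images.grad.sign()).detach()
```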
And so, this type of thing you should really be aware of.
There are ways to attack these neural networks.
So if you're thinking about, hey, I'm going to develop, you know,
a system for the clinic where, you know,
we're going to guide patient therapy based on the outputs
of these neural networks, I would say, you know,
this might be a future we imagine,
but it's probably a future that's got a human somewhere in the loop,
because if you put this into your health system and you get some bad actor
who comes along, they might be able to start throwing
in these adversarial examples that, you know,
produce essentially no perceivable change.
So you'd look at this and say, oh, there's nothing different,
but the neural network's [inaudible].
So it could be done, you know, in targeted cases.
This is, you know, a potential problem.
Cool bit of trivia, this isn't just a thing you can do on a computer.
You can do this in the real world.
So this is a sticker.
There's a few of them in this really nice paper from last year.
So this is a sticker.
If you put the sticker on a table, you can kind of see why it would do this.
It'll make the neural network think that it is looking at a toaster,
but this is the neural network looking at what I perceived
to be a banana on the table.
The neural network correctly classifies this as a banana.
Throw the sticker next to it.
Now the neural network's very confident it's looking at a toaster.
This one is pretty clear; you can see why a neural network
would think this is a toaster, but there are a number of these images
that really just look like noise, and you know,
they can still fool neural networks.
So this is something that can also work in the real world.
So I would say, I think these adversarial examples,
particularly as we think about the kind of clinical impact of deep learning
and the way we're going to introduce it into workflows are things
that we do not want to forget about.
I think they're particularly important.
This is kind of cool.
So normally, if I'm giving this talk, I ask people what they see,
but in this case, you know, I can ask,
but you're not going to be able to answer this.
So this is from February of this year.
This is a nice paper where they -- this is an adversarial example.
It doesn't just fool neural networks.
It also fools humans.
So this is actually -- it looks to many people,
and I actually initially perceived it this way, too, as a German Shepherd.
However, the original input image here is actually a cat,
and it's got this sort of noise pattern added over it, and just, you know,
the noise pattern is designed to fool a neural network.
It also fooled my neural network, right?
So the input's a cat, and I perceived it as a dog
with this sort of like perturbation.
So, you know, you'd imagine we're going to get to a place
where putting a human in the loop will solve all of our problems.
Potentially not.
If people can come up with adversarial examples that make small changes
but that also fool humans, now we've got a world where, you know,
we have to think about not just the security of our data and not just sort
of putting a human in the loop, but actually,
the security of our machine-learning models, as well.
So, you know, in a clinical context where you're going to start treating people,
I think it's important to think about this.
The next one is also, I think, particularly important in the context we work in.
There's a misunderstanding in the field that neural networks,
because you're training them sort of ab initio, have no bias.
This is not correct.
So I won't go into this in a huge amount of detail,
but what these guys did is they trained using ID pictures from people
who had been convicted of a crime and not convicted of a crime
to generate a neural network that would look at an ID picture.
So without any sort of prior information about conviction or anything,
the network would look at the ID photo and tell you
whether this person is likely to commit a crime or not.
And they said, oh, this is perfect because, you know,
this is just criminality, right?
It has no bias whatsoever.
But in fact, because you're training based on the criminal/noncriminal labels,
this actually has every bias
that the judicial system of that country has.
And so it has all of those biases.
We just want to make sure that we don't misunderstand this
and end up laundering our biases.
I think that was the mistake that these authors made.
Another thing that's sort of worth knowing, and this is a fun example.
There's a bunch of these now.
There's actually a puppy-bagel Twitter account,
if you feel like following it, with a number of these samples
where neural networks and even humans can get confused.
So this is a Chihuahua, and this is a blueberry muffin.
You can see why they look a little bit similar in many cases, and, you know,
as a human, if I told you I really want you to tell me if I'm looking
at a blueberry muffin or a Chihuahua,
you can go in and look at these things a little bit more carefully.
The types of neural networks
that we're seeing used right now can't go back
and take that second look.
And so, at the moment, you really need to go in and explore the errors
that your neural network makes at this level of manual inspection.
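That kind of inspection can start with something as simple as collecting
every example the model gets wrong, along with its confidence, for manual
review. A minimal sketch, assuming a PyTorch model and data loader:

```python
import torch

def collect_errors(model, loader):
    """Pull out every misclassified example, sorted so the most
    confident mistakes (the Chihuahua-as-muffin kind) come first."""
    model.eval()
    errors = []
    with torch.no_grad():
        for x, y in loader:
            probs = model(x).softmax(dim=1)
            conf, pred = probs.max(dim=1)
            for i in torch.nonzero(pred != y).flatten():
                errors.append((x[i], y[i].item(), pred[i].item(), conf[i].item()))
    return sorted(errors, key=lambda e: -e[3])
```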
I like this quote.
This is actually from a post-doc in my lab replying
to a blog post from someone else.
I was reading this blog post, and then I was reading the comments section,
and I saw this thing from my post-doc, and I was like,
oh, this is really insightful.
So he notes that, you know, if we start thinking in terms
of statistical uncertainty and P values, the challenge with big data is often
collection bias, that is, how we collected the data.
You know, we can get a significant P value,
but the data collection biases are potentially going
to be a major driver of it.
And so, I agree with him that thinking
about how we collected the data becomes increasingly important.
One more thing to note.
There's a lot of work right now talking about explainability.
The challenge that I see is that very few people use explainability
in the way that, as a biologist, I'd like to see it used.
So you can think about the joke of why did the chicken cross the road?
You know, as soon as a kid learns that there's like eight answers
to this question, it's the riddle you can never solve, right?
Because they're going to keep telling you the answer you didn't give,
once they've memorized more than one of them.
And so, for these neural networks,
I think the explainability challenge is similar.
We can go to a single neural network, and we can now do a pretty good job
of saying why that neural network made that prediction on that example,
but as a biologist, what I'd really like to get at is
not why this neural network made this prediction on this example,
but what is actually generally true about the system the data are generated by,
and that's an area that, you know, I think has received much less focus,
and I think we're much further from.
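For that per-example kind of explanation, gradient-based saliency is one
common approach: ask which inputs most influenced the score of the predicted
class. A minimal sketch, assuming a differentiable PyTorch model; note that
this answers the per-example question, not the general one about the system.

```python
import torch

def saliency(model, x):
    """Gradient of the top class score with respect to the input:
    which features drove this particular prediction?"""
    model.eval()
    x = x.clone().detach().requires_grad_(True)
    model(x).max(dim=1).values.sum().backward()
    # Large-magnitude gradients mark the inputs (pixels, genes, ...)
    # that most influenced this one call on this one example.
    return x.grad.abs()
```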
And finally, you know, I think one of the things I've seen in my own experience
and I'm continuing to see sort of coming out of other labs is that, you know,
we think that deep learning, because it constructs features, can do everything,
and so, you know, we can just have some deep learning experts solve all
our problems.
One of the things that I'm seeing is actually that it looks
like domain expertise matters more and not less, and so,
models that are predicated on, you know, a single deep learning expert
or a team of deep learning experts pushing this expertise out to the community
and these models out to the community where they get employed,
I think are going to be less successful than a model that tries
to bootstrap some of this expertise inside the labs
or groups that have really strong domain expertise, right?
I think that has some implications for how we train students
and I think is going to be important going forward.
I think the fields that do a better job of integrating this type
of technique throughout training and into a broad swath of members
of the field are going to be advantaged by that.
So back to our initial question.
Deep learning, what's it good for?
I'm pretty convinced the answer isn't absolutely nothing.
However, I'm also pretty convinced it's not everything.
I'd say I'm probably a deep learning optimist.
I fall in, you know, somewhere kind of on the upper half of this,
but one of the things I noticed about the article that was
in Nature earlier this year is I think I'm mostly quoted in the caveat section.
So even though I'm, I guess, in the upper half,
I'm probably one of the more pessimistic people up there.
So I guess I'd say I think it's good for many things.
It's not good for everything, and I think you have to be careful
about how you use it, and I think having domain expertise on the team has,
from our experience, always been a huge benefit to the project.
So with that, I want to just thank the people
in my group who made this possible.
The work that I presented today, the exploratory work, was primarily Gia Tan,
a grad student in the lab who has since graduated,
and Beaulieu-Jones [assumed spelling], also a grad student in the lab
who has graduated, who did the work on the privacy-preserving neural networks,
and I don't know if there are questions, or how that works with this setup,
but I'd be happy to hang around for a little bit and chat if people want to.
>> Great, thanks very much, Casey, for a fascinating
and very engaging presentation.
We're almost at the top of the hour.
So we have maybe an opportunity for one or two very quick questions.
If you're on the WebEx, just raise your hand.
While we're waiting for that,
I just want to comment that your work on unlocking clinical trials data
seems to me to have enormous potential, given the reluctance of researchers
to make those data available for numerous reasons,
with protecting patient privacy being the one you're addressing there.
I'm curious.
You showed that example with blood pressure.
Have you looked at any other factors and been able
to demonstrate similar results with the synthetic data?
>> Yeah, so for that, we actually looked at the SPRINT trial
for a particular reason, and we ended up participating in sort
of the New England Journal of Medicine's data sharing challenge,
or we intended to participate, but when we got the data sets,
there actually wasn't a lot there to work with.
We were kind of disappointed by the amount that had been released.
And so, this was actually our attempt to do something interesting with the data,
even with the sort of limitations that they had.
So we haven't pushed this into another domain yet, but, you know,
I will say the way it currently works, it does pretty well
if you have continuous measures, you have a modest number of them,
and you have a relatively large number of participants.
So you could imagine sharing data from tens of measures
if you have tens of thousands of participants.
I think this can be improved dramatically, but there's some methodological work
that needs to be done to help sort of expand the number of variables
that you can reliably generate and also to, alternatively,
reduce the sample size required to generate them.
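As a rough illustration of the generative setup he's describing, here is a
minimal, non-private GAN skeleton over continuous tabular measures; the
dimensions are placeholders, and the actual privacy-preserving approach adds
machinery that this sketch omits.

```python
import torch
import torch.nn as nn

n_vars, latent = 10, 32   # placeholder sizes: tens of continuous measures

G = nn.Sequential(nn.Linear(latent, 64), nn.ReLU(), nn.Linear(64, n_vars))
D = nn.Sequential(nn.Linear(n_vars, 64), nn.ReLU(), nn.Linear(64, 1))

bce = nn.BCEWithLogitsLoss()
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)

def train_step(real):  # real: (batch, n_vars) tensor of participants' measures
    ones = torch.ones(real.size(0), 1)
    zeros = torch.zeros(real.size(0), 1)
    fake = G(torch.randn(real.size(0), latent))
    # Discriminator learns to separate real measures from synthetic ones.
    d_loss = bce(D(real), ones) + bce(D(fake.detach()), zeros)
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # Generator learns to produce measures the discriminator accepts as real.
    g_loss = bce(D(fake), ones)
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

# After training, G(torch.randn(k, latent)) yields k synthetic participants.
```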
>> Great, thanks.
Gerry Lee online has got a question.
Gerry, go ahead.
>> Yeah, so to make the deep neural net more useful, obviously,
the first step is to make correct labeling and predictions.
Once you do that, I think the next step,
once it's outperforming other machine learning techniques,
and even outperforming humans,
I guess the next step is asking why the black box made the correct labeling
or correct prediction?
So rather than just having a black box,
can we do well at building interpretable neural nets?
Do you have any thoughts around those lines?
>> Yeah, so I think there are some things that can help
to make neural networks more interpretable.
So I think the more the structure of the neural network matches your problem,
I think that can help a lot.
You know, we're working on some approaches to use sort
of multiple knowledge bases to try to interpret hidden layers at the same time.
You know, there's some other nice approaches to try to identify
which factors led a neural network to make specific predictions, but again,
you know, I think understanding --
I think we're at the stage now
where we can generally understand why a neural network makes a certain
prediction on a certain sample.
What we struggle with is then extracting that out to generalities.
And so, I know there's people working on it.
I'm excited to see what happens.
I think there's just a lot still to be done.
So I don't have an answer for you on how to best do it,
but I'm excited about where we're going.
>> All right, I'm afraid we've overstayed our welcome in the room
that we're broadcasting from here,
and we're going to need to cut the conversation short.
Casey, if you're fine with it, I would recommend --
I know there are a couple more questions online -- that people just reach
out to you directly to follow up and continue the conversation.
I just want to -- great.
I want to remind people our next presentation will be on July 18th
when our own Dowd Mersonmen [assumed spelling] will be presenting
at the speaker series.
So once again, thanks very much, Casey,
and thanks to everybody who's participated today.
Talk to you next time.