>> Good morning, everyone.
Thanks for joining us today for the NCI CBIIT speaker series.
I'm Tony Kerlavage, the acting director of CBIIT.
I want to remind everybody that today's presentation is being recorded
and will be available at the CBIIT website at CBIIT.cancer.gov.
You can find information about future speakers on that site
and by following us on Twitter.
Our handle is @NCI_NCIP.
Today, I am very happy to welcome Dr. Casey Greene,
Assistant Professor of Pharmacology at the University
of Pennsylvania's Perelman School of Medicine and the title
of Dr. Greene's talk is Deep Learning, What is it Good for?
With that, I'll turn the mic over to Casey.
Welcome.
>> Thank you.
I hope you guys can hear me.
Okay, yes, I see you can.
Perfect. Yeah, so I'm going to chat about some of the work in deep learning and,
you know, if you follow deep learning at all on sort of social media,
and you try to see the conversation around it,
I think there's a lot of constant discussion about, you know, how useful it is,
and so I think most people feel it falls somewhere on the scale,
where things go either from not at all useful to entirely useful.
So, you know, I think I'll tell you little bit about what we've been up to.
I think the answer is that deep learning is probably more useful than war.
So the answer would not be absolutely nothing, but, you know,
it's probably not quite at the stage where it's also sometimes sold to be.
So I think the answer is kind of in the middle ground.
Before we dive too far into, you know, what deep learning is,
I want to talk about, you know, just briefly, like what machine learning is,
and then we can talk a little bit about how,
at least in my perspective, there's the difference.
So with machine learning, we're going to take a computer.
We're going to add in some features.
These could be things that we think are, you know, important.
If you're thinking about buying a house,
you might think about, oh, the number of rooms that it has,
the number of square feet, this type of information.
So it's information about the house.
We have some potential outcomes, so like the price that the house sold for,
and we're going to train a computer to build a model,
which will then let us make a prediction.
So we're going to essentially ask
which of these computed features are useful for predicting an outcome?
That goes in.
We create a model, and then we can say, oh, this house
would cost this amount in the future.
Let's move from houses to cancer.
So this is a study that we recently participated in,
part of, actually, the NCI PanCancer Atlas.
So this is just going to be a machine learning workflow, no deep learning,
and the specific part of the study we worked on was identifying TP53 loss,
and we wanted to identify it from gene expression data, actually.
And the reason we wanted to do that was because we had certain cases
where we knew TP53 was lost,
and then we had other cases where there was no mutation
that suggested TP53 was lost, and we suspected that in these individuals,
there would be a subset of them that might've lost TP53
through some other mechanism that we weren't quite ready to pick up on,
and so we thought that looking at gene expression
as a predictor could help us find this.
Just in terms of, for those of you who really want the technical details,
this is part of the PanCancer Project, in the DNA damage repair pathway paper,
so it's actually in that paper.
We ended up using the genes that were the most variable,
and we used elastic net logistic regression.
We did cross validation and had a held-out test set.
And so, this sort of shows the performance of the predictor.
This is the false-positive rate.
This is the true-positive rate.
A perfect predictor would essentially go straight up and then over.
This would be an area under the receiver operating characteristic curve of one,
and what we see is that we're doing pretty well.
So this is 93%.
So it's quite high, and we see that for the training set,
the independent held-out test set, and also,
when we assess through cross validation.
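For a concrete sense of this style of workflow, here is a minimal sketch in scikit-learn: elastic net logistic regression, cross-validation for tuning, and an AUROC on a held-out test set. The data and hyperparameter grid are placeholders, not the actual PanCancer analysis.

```python
# A minimal sketch of this style of workflow in scikit-learn. X and y are
# placeholder data, not the actual TCGA expression matrix or TP53 labels.
import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X = np.random.randn(500, 8000)            # samples x genes (placeholder)
y = np.random.randint(0, 2, 500)          # 1 = TP53 loss (placeholder)

# Hold out an independent test set before any fitting.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

model = Pipeline([
    ("scale", StandardScaler()),
    # log loss + elasticnet penalty = elastic net logistic regression
    ("clf", SGDClassifier(loss="log_loss", penalty="elasticnet", random_state=0)),
])

# Tune regularization strength and the l1/l2 mix by cross-validation.
grid = GridSearchCV(model,
                    {"clf__alpha": [1e-4, 1e-3, 1e-2],
                     "clf__l1_ratio": [0.1, 0.5, 0.9]},
                    scoring="roc_auc", cv=5)
grid.fit(X_train, y_train)

print("cross-validated AUROC:", grid.best_score_)
print("held-out AUROC:",
      roc_auc_score(y_test, grid.best_estimator_.decision_function(X_test)))
```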
So the other nice thing that we see is we don't observe a lot of overfitting,
and to give credit to Greg Way [assumed spelling],
he's the student in my lab who did this work.
And just to give you an idea of the types
of things you can learn from something like this.
So this is looking at what should be an orthogonal measure.
So this is the CNV burden of the tumor.
So this is never fed into the machine learning classifier.
The classifier is taught using mutation information,
so somatic variants in TP53,
and it's taught to recognize a mutation via gene expression.
So it never sees CNV information directly.
And so, if we look at the CNV burden in wild-type samples,
so this is what we think is actually true wild type,
these are the ones we think do not have TP53 loss.
We see that they have a relatively low, in general, CNV burden,
although there might be some subset of samples that have high CNVs,
potentially, from another region.
For the samples that have a predicted wild type, we see the same thing.
So these two match, which is what you'd expect,
given that the classifier appears to be accurate.
When TP53 is lost, so these have a mutation in TP53 that we would expect
to cause a loss of function,
you can see the CNV burden shifts dramatically,
and in the ones where we predict a loss,
so the classifier predicts a loss, you also see the shift.
And of course, the reason we did this analysis is because we wondered,
are there ones where the actual and predicted values are potentially different?
And we think that they're, in fact, losing TP53.
So this is a 375 G to T mutation in TP53, a silent mutation, and what we see is
our classifier called 18 of the 19 samples in TCGA
that have this mutation as TP53 lost.
When we look at the CNV burden, we, again, see an increased fraction of them;
many of them have this increased CNV burden.
And so, we think these are actually the types of things we were looking for.
The case where somatic sequencing and evaluation
of somatic variants didn't suggest a TP53 loss,
but the gene expression identifies one, and then, in this case, you know,
we can come up with a potential mechanism to back this up.
So this is a silent variant that occurs right near a splice site.
So we looked at splicing, and so,
this is pulling information out of Snaptron from, if I remember correctly,
Jeff Leek's group.
And when you pull the samples out of Snaptron, the ones on the left here,
this entire bar, are all of the samples that had this "silent variant."
This is the probability that our classifier assigns,
that they would've lost TP53.
So our classifier thinks there's a 97 percent chance that this sample,
TCGA4LZAA80, has lost TP53 based on gene expression, and then, on the right,
these are just random samples pulled from TCGA.
You can see the probability that our classifier assigned to them.
So since we just pulled them randomly, it's a pretty wide spread of probabilities,
but one of the things you quickly notice is that the exon four to five junction,
which is this blue one, so this is the canonical splice junction,
is present in all of the samples that did not have this variant, and actually,
we don't see any other junctions, but in the samples that do have this variant,
you can essentially see splicing events sort of spread all
over the locus in most of them.
And so, what we think is actually going
on here is this silent variant is affecting splicing,
perhaps causing at least one of these isoforms to act
in a dominant negative manner.
So we're able to take the gene expression classifier
and use it to identify these potential additional samples that appear
to also have a loss of TP53, and with a little bit of digging
and some light understanding of the biology,
you can actually start to suggest the mechanism behind that.
So that's what, you know, machine learning can get you.
It can help you find [audio cuts out] unexpected cases where everything sort
of looks like what we, you know, hope to find except for the fact
that they didn't have something that was a "high-impact somatic variant."
And actually, this variant,
it's been reported, is associated with [inaudible], as well.
So there's additional support there in the genetics literature.
And so, if you're really interested in kind of just the machine learning side,
you know, I think there's a lot of ways to kind
of do a solid machine-learning analysis that returns value.
So this is the case that is not deep learning, but, you know,
I think we should not sort of forget the fact that sort
of standard machine learning techniques applied
in a rigorous way can be really helpful.
So Greg did analyses for both of the --
so he actually led the Ras Pathway paper for PanCan Atlas,
and he contributed to the P53 analysis,
which went into this one for the DNA damage repair pathway,
and both of these are mentioned a little bit
in the oncogenic signaling pathways paper.
So if you're interested in this kind of machine-learning approach,
I would encourage you to check those out for more detail.
As a quick side note, Greg and a post-doc in the lab, Daniel,
decided this was a kind of fun workflow,
and it would be really awesome if anyone can do it, and around that time,
we were also approached by a Philadelphia organization called Code for Philly,
which is -- they do programming for the social good,
and also an organization called Data Philly,
which is a data science meet up group in Philly.
And together, these guys came up with this project, Cognoma.
So the idea was the same style of machine learning analysis,
but for everyone.
So they started this Cognoma project, and for about a year and a half,
they actually worked on it about every two weeks for a couple of hours.
We ended up having quite a few people show up.
There's actually [audio cuts out] web servers.
So if you go to Cognoma.org, you can actually run the same style of analysis.
You go in.
You select your mutations.
You select your tumor types, and it will create a Jupyter Notebook,
which we template the values into, run the Jupyter Notebook,
and then email you a link to the results.
So if you're interested in it, Cognoma.org.
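For a sense of that notebook-templating step, here is a hedged sketch using papermill; Cognoma's actual stack may differ, and the notebook path and parameter names here are hypothetical.

```python
# One common way to template values into a Jupyter notebook and execute it,
# like the Cognoma workflow just described. Cognoma's actual implementation
# may differ; papermill is used here for illustration, and the paths and
# parameter names are hypothetical.
import papermill as pm

pm.execute_notebook(
    "classifier_template.ipynb",     # hypothetical template notebook
    "results/run_001.ipynb",         # executed copy containing results
    parameters={
        "mutated_genes": ["TP53"],   # which mutations to predict
        "tumor_types": ["LUAD", "BRCA"],
    },
)
# The executed notebook can then be rendered and a link emailed to the user.
```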
The other cool thing about doing the project this way was that a lot
of people engaged in the process who otherwise wouldn't.
So this is a picture.
This is actually Daniel Himmelstein,
a post-doc in the lab, at one of the meet-up events.
So we would host them in a local co-working space.
People would come.
They'd hack away on these for a while, and at the end of the day,
the second goal was that everyone would learn something.
And so, honestly, for about a year and a half,
at least one person from my lab ended
up teaching data scientists two hours' worth of cancer biology while, you know,
a subset of core contributors continued to work on the project,
but we ended up having about 40 people who contributed in a way
that actually made it into the master branch.
So when you go to the website, you know, 40 people made a contribution that sort
of made that possible, and so, this is kind of a fun exercise.
We think that probably on the order of 500 to 1000 people ended
up coming through these workshops.
So just a random side note.
Okay, back to deep learning.
So what makes deep learning different?
So remember with the machine learning setup, we had much of this in place.
So we have this outcome information,
and we had the model, we had the prediction.
So these things didn't change.
What changed was this.
And so, before, you know, when we were talking about buying a house,
right, we give it the square footage.
We give it this room information.
You know, we might give it, you know, a neighborhood.
All of this stuff is information we feed into the model with machine learning.
For deep learning, we think that we'd like the algorithm
to actually learn what the important features are.
So you could imagine instead of telling the algorithm about the number of rooms,
you might just feed it a bunch of pictures of the house.
So you just, like, walk a camera through the house, and at the end,
you want the deep learning algorithm to tell you what it would cost.
So the inputs for the deep learning in the ideal world are much more raw.
So the idea is the computer's responsible not just for taking some features
that might be important and creating a model
and making predictions from it,
but it's actually got to take this very raw input,
merge it into complex features,
and then use those complex features in this model.
So this is sort of the "deep" part, right?
Instead of just having your input going into the model
and out to the prediction, there's this sort of intermediate step
where these complex features are learned.
So if you're familiar with deep neural networks, which are all the rage,
as soon as you get to neural networks with essentially two or more hidden layers,
you end up in this world where you've got these intermediate features,
and the way they're trained,
the features can actually be modified by some of the outcome information.
So it ends up being a really powerful approach if you have certain ingredients.
So this is just a nice little graph that, in fact, contains almost no data
but a lot of truth, which is kind of exciting.
So, yeah, so if you think about where deep learning really works,
it tends to work in cases where you have lots and lots of data,
because you need to build these features.
So for the traditional sort of standard machine learning algorithms,
because you're feeding in useful features,
you can get away with a lot less data, but as you start to get lots
and lots of data, the deep learning algorithms end up being able
to build better features than we can sort of a priori build in.
So, you know, if we expect to see rapidly increasing amounts
of data, I think we're going to see an increased emphasis on methods
that are capable of doing this sort of complex feature engineering.
So what could you do if you were going to use deep learning or these types
of methods in this context?
So one thing you could do is you could try
to understand these features themselves.
So as you're building these models,
they've got this sort of feature engineering step, and the question would be like,
okay, how do those features work?
And so, the first project I want to talk about, it's not in cancer biology.
We're actually going to look at microbial systems,
but it starts to get at this question of kind
of building an understanding of these complex features.
So if you really wanted to know how a living system worked,
just starting from data, not assuming anything
about the system itself other than, perhaps, you know, the complement of genes,
what you could imagine doing is something like this experiment
where you survey thousands of biologists
and then identify what they thought the most important experiments
of the moment were.
You'd then perform those experiments, process the data,
and then analyze that data to see what the regulatory patterns are.
Essentially like what experiments would you do if you could do them,
and then do a lot of them in sort of this standardized framing.
The challenge, of course, for this, well, I'm talking to NCI folks,
so you've probably noticed,
you can imagine this would be a very difficult type of proposal to get funded.
It's not -- it's a fishing expedition, right?
It's not a sort of well-justified hypothesis driven proposal, and essentially,
the only way to do these types of fishing expeditions are things, you know,
I think like TCGA, where they're very large initiatives and large efforts,
and for this case, you really have to do this for many different systems.
So it would be extremely challenging to imagine how it would work.
On the other hand, if you're just a person who happens
to have an Internet connection,
you have another pretty valuable resource at your fingertips.
So right now, if you have an Internet connection, you can actually download --
these numbers are a little bit out of date,
but about 2.2 million publicly available genome-wide gene expression assays,
which if you ballpark that each of those probably cost at least $1000
to generate, when you consider sort of materials,
reagents, supplies, person time.
That's about a $2 billion data resource.
The challenge of working with this data is, of course,
it's vaguely like the experiment that we designed above,
except people don't generally upload high-quality metadata alongside this data.
So, you know, in the ideal world, you'd know, oh,
this is exactly the modification that happened here,
and here's all the other conditions that were used.
So if we are looking at a microbial system, we'd say, oh,
the bacteria were grown to this concentration.
You know, all this sort of information is not always available.
And so, you have this really valuable data resource,
but we don't have any of the metadata to necessarily work with it.
So, you know, for a lot of this, this is exciting but challenging.
You'd imagine, like, you've got this really valuable resource, but essentially,
it also poses this impossible question,
in that there is experimental design behind it,
in terms of like people are actually doing experiments that are often sort
of well-designed, but the challenge is that we don't actually have any
of that experimental design available to us.
So any analysis that we would like to do that relies on it,
we're essentially stuck, right?
We couldn't do it.
And so, what we wanted to do with this work is say can we take a sort
of new class, new approach to machine learning and actually integrate this data
without having to assume anything about the metadata describing this?
So we don't have any sort of sample labels in this case.
So I'll pose a challenging version of this first.
I blacked out some things on this slide.
Normally I ask people what I blacked out,
but I suspect that won't be very effective in this type of setting.
So, you know, I can tell you it's basically impossible to figure
out what I blacked out, because there's essentially no context here, right?
You've got just a blank slide with some gray squares on it.
If, on the other hand, you had a little bit more information.
Like I've blacked out something here, but I'm guessing you can fill it in.
So, in this case, you know, you might recognize this is the Nike logo.
So you say, okay, you blacked out the word it.
I recognize the Nike swoosh in the corner.
I recognize the words just do it as a really famous Nike slogan.
So even though I removed information, you were still able to reconstruct it.
You can do that because you had this context.
So the approach we wanted to use was one that forced computers to do this
in the gene expression space.
Essentially, you take images, or in this case, you take gene expression values.
You block out random values,
and you force the computer to reconstruct what was held out,
and that reconstruction forces the computer to identify the sort
of dependency structure like when there's a Nike Swoosh,
the word it usually appears, and so, we weren't the first to do this.
So there was a really nice paper out of Google
where their group showed 16,000 computers 10 million images from YouTube
and then forced them to do the same type of task.
I won't get into the details of this paper,
but this is one of the most famous neural network nodes that I think now exists.
And so, this is one of the outputs of --
one of the results in their paper is they found that there was a node
that recognized this type of feature, and if you happen to have cats
or are aware of cats, what you can see is this algorithm learns just by looking
at random still images that cats exist, and so that's pretty exciting.
We use the same type of structure for biology.
It looks a little bit different but not too different.
In this case, we're going to look at gene expression.
So we've got input values.
This is one sample.
It has no associated metadata.
So we know nothing about that sample,
except we have gene expression measurements.
So in this case, this was a gene with low expression.
This is another gene with low expression.
This is a set of genes with high expression in here,
and what we're going to do is we're going
to train a neural network to do something with that.
First, we're going to remove some data.
So we're going to drop out some values.
So for the ones here with Xs, I've removed the information
about gene expression for those genes.
And we've got this neural network.
This one's already been trained a little bit, and so,
with this trained neural network,
we're going to feed that as input into the neural network.
Some of the nodes will be off, in this case, blue.
Other nodes will be on, in this case yellow.
And the reason they're on is because all the genes that feed into it
with high weight, shown by the thickness of the line, are red.
So these were highly expressed,
and so the neural network node they feed into is on.
The ones that feed into this have low weight.
So the neural network node is off.
And the neural network has just been trained to reduce the error
between a reconstruction where it takes this,
runs it through the neural network,
and uses it to build a reconstruction, and the input.
And what that means is, in this case, the neural network would have inferred
that this gene was expressed at a relatively low level.
So it should be filled in with this, you know, this sort of green color,
and in this case, the gene was expressed at a moderately high level.
So it should be filled in with this red color.
And actually, you can see that sort of vaguely matches the original input.
So this is sort of the sketch of how it works,
and the way training proceeds is you're essentially teaching the neural network
to modify these weights, such that this is very similar to this,
even though you've added noise at this step, and this neural network can have,
essentially, an arbitrary number of layers.
So you could have one layer.
You could have two layers.
You can have three layers, depending sort
of on how complex the features you want to build are.
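Here is a minimal sketch of that setup: a one-hidden-layer denoising autoencoder in Keras, in the spirit of ADAGE. The shapes, corruption rate, and training settings are illustrative, not the published ones.

```python
# A minimal one-hidden-layer denoising autoencoder in Keras, in the spirit of
# ADAGE. The expression matrix here is a random placeholder scaled to [0, 1].
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

n_genes, n_nodes = 5000, 300
expression = np.random.rand(1000, n_genes).astype("float32")

inputs = keras.Input(shape=(n_genes,))
corrupted = layers.Dropout(0.1)(inputs)       # the Xs: randomly zero some genes
hidden = layers.Dense(n_nodes, activation="sigmoid")(corrupted)  # learned nodes
reconstructed = layers.Dense(n_genes, activation="sigmoid")(hidden)

autoencoder = keras.Model(inputs, reconstructed)
autoencoder.compile(optimizer="adam", loss="mse")
# The target is the *uncorrupted* input: the network must fill in what was
# removed, which forces it to learn the dependency structure of the genes.
autoencoder.fit(expression, expression, epochs=10, batch_size=32)
```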
So for the work I'm actually going to talk about today,
both of these examples are going to use single-layer neural networks.
The reason we do that is because we have this sort
of biological interpretation stage, which turns out to be important for us,
and so, we found those are sort of easier to interpret.
There are some really nice methods for understanding, for deeper models,
what features contribute, but from an interpretability standpoint,
I think all of those methods still leave a little bit to be desired.
So, in this case, these examples will all be a single-layer approach.
So there are complex features but still constructed
from the gene expression values, not from other neural network layers.
And the test case I'll talk about today
uses a Pseudomonas aeruginosa compendium.
So Pseudomonas is an opportunistic pathogen.
It causes infections in people with compromised immune systems,
particularly cystic fibrosis.
So in individuals with cystic fibrosis,
it'll cause lung infections that are difficult or actually impossible to clear.
It tends to form these really difficult to deal with biofilms.
The reason we started with this is that there's about 100 different experiments,
just over a couple thousand assays, and so, ballpark,
you're thinking about a million dollars worth of assays.
From a computational point of view, the reason you might want to start here is
because it poses all the same sorts of problems as the larger data sets.
So you can train the neural network model on your laptop
in a reasonable amount of time.
So if you're sort of doing development, figuring out whether these methods even work
if you do them for gene expression data as opposed to cat detection, you know,
it's a nice place to do that type of experiment.
Just a sort of quick couple results to demonstrate how this thing sort of works.
So one of the things we wanted to do first was just validate that these types
of approaches applied to gene expression data are sensible.
In this case, we did this by looking at,
there's a couple of results in the paper, but the one I'll just mention here,
we looked at a certain known transcription factor, ANR,
for which there was a ChIP-seq experiment,
which turned out to be helpful to us,
and ANR controls Pseudomonas's response to low oxygen.
So you can imagine Pseudomonas might be swimming freely in an environment
of high oxygen, but as it moves to the surface of the lung of an individual
with cystic fibrosis, that ends up being a low-oxygen environment,
and it'll form biofilms in this thick,
sticky mucus that ends up in an anaerobic environment.
That turns on ANR, which does a number of things,
including turning on some things that we see as virulence genes that sort
of start to damage or trigger a response on the surface of the lung.
And so, there's a health reason to be interested in this, but also,
a kind of a methodological reason,
and that's that we have a ChIP-seq experiment for ANR.
So we actually can go into the neural network and identify the node
where the weights of the node correspond most closely to the ChIP-seq experiment.
And when we do that, we end up with a node,
which ended up being node 42.
So we didn't name it node 42, it just happened to be node 42.
So if any of you guys are Hitchhiker's Guide to the Galaxy fans,
you probably know that 42 is the answer to the ultimate question of life,
the universe, and everything.
Apparently, it's also the answer to the ultimate question of, in this case,
our machine-learning methods.
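A hedged sketch of that node-matching step: score each node by how strongly its high-weight genes overlap the ChIP-seq target set, using a hypergeometric test. The weights and the target list below are placeholders, not the real ADAGE model or ANR targets.

```python
# Hedged sketch of matching nodes to a ChIP-seq-defined gene set: call each
# node's extreme-weight genes its gene set, then test the overlap with the
# ANR targets by a hypergeometric test. Weights and targets are placeholders.
import numpy as np
from scipy.stats import hypergeom

weights = np.random.randn(5000, 50)                            # genes x nodes
anr_targets = set(np.random.choice(5000, 40, replace=False))   # placeholder IDs

n_genes = weights.shape[0]
pvals = []
for node in range(weights.shape[1]):
    w = weights[:, node]
    # "High weight" here means more than 2 standard deviations from the mean.
    high = set(np.where(np.abs(w - w.mean()) > 2 * w.std())[0])
    overlap = len(high & anr_targets)
    # Probability of seeing at least this much overlap by chance.
    pvals.append(hypergeom.sf(overlap - 1, n_genes, len(anr_targets), len(high)))

print("best-matching node:", int(np.argmin(pvals)))
```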
And, so, we did an experiment to validate it.
So first, we looked at a bunch of public data.
The results are in the paper, but if you look at the outputs of this method,
as opposed to things like PCA or ICA,
so principal components analysis or independent components analysis,
the sort of denoising auto-encoder method, which we called ADAGE,
analysis using denoising auto-encoders of gene expression,
actually does a really nice job of describing the ANR activity in public data,
which we can infer based on the sort of light metadata that's annotated
around oxygen availability.
What we really wanted to do was then generate a new experiment
that didn't exist anywhere in the compendium,
which is an ANR knockout experiment on CF genotype airway epithelial cells.
So in this case, there's a monolayer of airway epithelial cells grown in a dish,
and then Pseudomonas is put on top of them,
and it actually will form these microcolonies where it forms this biofilm,
and when it does that, it creates an anaerobic environment.
So this is a case where ANR should be on.
And then, we're going to knock out ANR and ask what happens
to node 42, so the ADAGE node.
So we did a microarray experiment.
If you knock out ANR, what you see is that the node that we associated
with ANR activity actually turns off.
So this connects with exactly what you'd predict.
The wild type remains on.
You know, I think one thing that's really potentially of interest is the sort
of robustness of this model.
So if you take this model and apply it to microarray data, which is,
essentially, what the training data largely are.
Actually, in this case, all the training data are microarray data,
and so if you take it and you apply it to microarray data,
the model describes this experiment that had never been done before that wasn't
in the compendium perfectly.
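Applying the trained model to a new experiment is cheap, because a node's activity is just a weighted sum of the scaled expression values passed through a sigmoid. A small sketch, with placeholder weights and data rather than the real node 42:

```python
# Scoring new samples against one trained node. Weights and samples below are
# random placeholders, not the real node 42 or the knockout experiment.
import numpy as np

def node_activity(expression, w, b):
    """expression: samples x genes; w: one node's per-gene weights; b: bias."""
    return 1.0 / (1.0 + np.exp(-(expression @ w + b)))

w42, b42 = 0.01 * np.random.randn(5000), 0.0   # placeholder trained parameters
wild_type = np.random.rand(3, 5000)            # placeholder new samples
anr_knockout = np.random.rand(3, 5000)

print("wild type:", node_activity(wild_type, w42, b42))
print("ANR knockout:", node_activity(anr_knockout, w42, b42))
# In the experiment described, the knockout's activity drops while the
# wild type stays on.
```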
If you then ask how robust is the model?
You know, microarrays are sort of old technology.
Could you apply to something more modern?
The answer turns out to be yes.
So Jack Hammond, who is a grad student in my collaborator Deb Hogan's lab,
also ran two RNA-seq experiments, one using the lab strain.
So this one uses the canonical lab strain PAO1.
This one actually uses a clinical isolate called J215, and in those,
we also see that the gene expression --
that the neural network describes the ANR activity in those experiments,
as well, despite the fact that there's this large kind of platform shift here.
And so, you know, we think these models are actually quite robust.
Some of this training process with removing noise appears
to make the models relatively robust, and also,
kind of the structure of the neural network itself appears to be helpful
over some of the linear methods, like PCA or ICA.
So this was pretty exciting to us.
We ended up publishing this.
This is sort of the first paper in a series
of denoising auto-encoder papers for gene expression data.
Actually, there's one before this that's from PSB in 2015 that we wrote
that focuses on breast cancer, but this is the first one we had sort
of solid experimental validation that the model actually described things
that were from experiments that weren't even in the compendium.
So that was really exciting.
First author is Jie Tan.
Jack Hammond did all the experimental work on it, and my collaborator,
Deb Hogan, has a microbiology lab, which is what Jack worked in,
and we worked together on sort
of designing this project and carrying it forward.
And so, if you like this, the code is also on GitHub,
for those of you who like to find some source code.
So that's cool.
So we've been able to validate this method.
The question then is like, that's great.
You can validate it.
Do you actually learn anything from it?
So I love this quote from Albert Szent-Gyorgyi,
which "Research is to see what everybody else is seeing
and to think what nobody else has thought."
That's what we'd like to be able to do.
We'd like to be able to reanalyze data but identify patterns
that the original investigators did not find when we go back
and do that reanalysis doing this sort of new approach.
So here's an example of that.
This is the phosphate starvation pathway in Pseudomonas.
It's supposed to be a two-component system.
So they're tightly regulated.
When Pseudomonas are in low phosphate, PhoR becomes active.
That activates a transcription factor, PhoB, which turns on a lot of things.
The one that matters here is that it'll turn
on alkaline phosphatase, which will turn media that contain BCIP blue.
You can see the pathway diagrams here.
In wild type, it turns blue.
If you knock out two other genes, PstB or PstA, it also turns blue.
If you knock out PhoR or PhoB, it does not turn blue.
So this is how the pathway works.
Sorry, there's a siren in the background I assume you can actually hear it.
This is what we end up observing.
If we go in and we use our neural network to identify cases of low phosphate,
and we start poking at them, we start to see cases like this,
where in the wild type or in the pst deletions, you know,
it looks just like you'd expect, but in this case,
PhoR doesn't appear to be required.
So this is an experiment run on peptone media, and in fact,
this happens not just once, but actually,
so this is our phosphate starvation signature in this case.
So the things that are high here, we expect to be phosphate starved.
So you can see, there's a big set of them up here
that we think are phosphate starved.
This is PIA.
So this is one media.
You can see it's got the sort of funky shape to it,
where it actually doesn't seem to require PhoR. Peptone also doesn't seem
to require PhoR. This is the only experiment that looks like it was done
to actually assess phosphate starvation.
So this is a public experiment that just happened to be out there
where there's this NGM media with high phosphate,
compared to NGM media with low phosphate, and you can see it induces a shift,
and in fact, in this case, it looks just like the textbook.
So if you do the experiment at the extremes, it looks just like the textbook.
So this is the type of thing we're trained to do, right, as scientists.
You take something.
You change it a lot, and you see what differed.
But what we see in most cases is, actually,
that the pattern actually doesn't look like the textbook, and so,
what this suggests is, you know, yeah, sure,
the textbook describes the extremes, but what happens between the extremes?
So between the very high and the very low phosphate,
that appears to actually be different.
So there's a nonlinearity here, and if there's that nonlinearity,
and we're doing all our experiments, you know, at the extremes,
so we can get the largest effect size, so we can reliably publish a paper,
we're going to actually miss the biology that occurs in the middle,
which is particularly problematic if that biology
in the middle is physiologically relevant,
and the biology of the extremes is potentially less physiologically relevant.
So, you know, the goal of getting a paper can somewhat diverge
from the actual goal of understanding how a system works
at physiologically relevant concentrations.
The question would be can machine learning help us fill in the gaps,
and can the sort of neural network help us fill in the gaps?
So one of the things that we noticed when we did this quick analysis was that, hey,
there's this funky thing going on here where PIA has a really large band,
and in fact, it's got six samples that are really high for phosphate starvation
and six samples that are really low for phosphate starvation.
And these are two completely different experiments done at different times.
So Albert Szent-Gyorgyi would be happy.
We're looking at other people's data and finding things that they didn't notice.
In this case, this is just PIA versus PIA plus an additive.
This one, actually, is an experiment that looks
like it's targeting understanding what RpoN does.
So these people deleted RpoN in the context of a KinB deletion,
which they compared to a KinB deletion.
So neither one of those individual experiments are too helpful to us,
but the difference between them is because they're both run in the same media,
and they show a really different signature.
And one of the things that's really kind of cool is that the ones
that have low phosphate starvation actually have this KinB deletion,
which actually suggested a potential mechanism.
So maybe KinB is actually mediating, to an extent,
this change in phosphate starvation, and sure enough,
so some students in Deb's lab actually did a fair amount of experimental work
to get to the bottom of this.
So this is Pseudomonas grown on a minimal media.
It's called MOPS media, and in MOPS media they're then titrating
in increasing amounts of phosphate, and we're looking at a wild type and PhoB,
PhoR, and KinB deletions, and the thing to really focus
on is the wild type starts to turn off the phosphate starvation signature
on this media, just after 0.5 millimolar,
while the KinB deletion actually does this at 0.4.
And so, what you can start to see here is that actually it looks
like KinB is actually modulating the level
at which phosphate starvation turns on.
So now we know.
We don't have an exact mechanism for how KinB does it,
but we now know that KinB is one potential input to this pathway.
One of the things we worried about kind of with this result is that,
you know, KinB is a kinase.
We're looking at phosphate starvation.
Maybe any kinase would induce the same thing, right?
We're only looking at the public data.
That's how we found KinB.
So it's possible any other kinase could induce the exact same thing.
However, we worked with the Laub Lab at MIT to actually test.
So they have a deletion collection of every histidine kinase
in the Pseudomonas genome, and so,
we did the same sort of assays for every histidine kinase
in the Pseudomonas genome, one by one,
and this effect actually appears to be specific to KinB.
So this is a way where you can use these types of features that are learned
by these neural network approaches to then go potentially fishing in public data
to identify a new player in what is actually a textbook pathway,
and really start to fill in some of these missing pieces, particularly --
especially when the sort of, you know, intermediate concentrations differ
from what happens at the extremes, where we tend to sort of do our experiments.
And so, if you're interested in this, the paper's out now.
It was published in Cell Systems last year.
The first authors are Jie Tan and Georgia Doing, and again,
this was a collaboration with the Hogan lab,
and then the Laub Lab also helped us out with the deletion collection.
And if you prefer to find the source code online or to reproduce these results
or apply it to your own setting, in this case, the work is on BitBucket,
so BitBucket.org/greenelab/eADAGE.
Okay, so this is one way that deep learning can help.
It can sort of power these exploratory data analysis methods,
and what we found is that the sort of neural network framing is often
an efficient-to-train and robust framing for a lot of these efforts.
So in cases where the sort of standard ICA and PCA approaches struggle,
we found that the neural network approaches can be a little bit more robust.
Is there anything else you can do?
So this is one potential use case,
these kind of large-scale exploratory analyses.
This is something we played around with a little bit recently, too.
So is there a way that you can take data that are not shareable
and make them shareable using deep neural networks?
So you can imagine, you know, we've got all these data that are locked up;
in this case, we're looking at clinical trials data.
So they're sort of locked behind walls to protect the individuals
who participated, which is, you know, an admirable thing to do,
but it means that it's very hard to then do secondary data analysis,
because even doing secondary data analysis requires going through many steps
for sort of regulatory approval.
So the question is, sort of, can deep learning help us keep secrets?
You'd imagine it'd be really amazing if you could just take a clinical trial,
feed it into some sort of neural network, and generate data that were synthetic.
So they didn't contain any actual people from the clinical trial,
but they have the statistical patterns
of the clinical trial at the individual level.
So people could go back and actually do secondary data analysis on them.
And so, this was a project that a student
in the lab [inaudible] Jones worked on, and the way we ended up approaching this
with something called a generative adversarial neural network.
I don't know if any of you have read some of the sort
of popular literature, but they're called GANs for short, G-A-N,
and I think the MIT Technology Review named GANs
one of the top technologies to watch of 2018.
So if you feel like getting a little bit more of a back story on GANs,
there's a couple of really nice features from the MIT Technology Review
from pretty early this year, I'd say January or February,
if you're interested in the topic.
The way these methods work is you essentially pull
random values from a coordinate space.
You're going to feed these into one neural network.
This neural network is called a generator.
It's supposed to make new data, and then you're going to feed the output
of that generator into something called a discriminator neural network,
which is trying to decide if the data come from the generator.
So if they're fake, or if they're actually real data.
So they're just actual data from the study,
and the discriminator neural network's job is to figure --
is to say, hey, these data are real, or hey, those data are synthetic.
And the way you end up training this, information
from how the discriminator makes its decision
ends up working its way back to the generator,
and so the generator starts learning how to take these random values
in coordinate space, which have essentially no meaning,
and to produce synthetic data out of them
that look indistinguishable from real data.
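Here is a minimal GAN training loop sketch in PyTorch to make that generator/discriminator dance concrete; the dimensions and the "real" data are placeholders, not the clinical-trial setup from the paper.

```python
# A minimal GAN training loop in PyTorch. Dimensions and the "real" data are
# placeholders, not the clinical-trial setup described in the talk.
import torch
import torch.nn as nn

latent_dim, n_features = 16, 8
G = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, n_features))
D = nn.Sequential(nn.Linear(n_features, 64), nn.ReLU(), nn.Linear(64, 1))

opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

real_data = torch.randn(512, n_features)       # placeholder for real records

for step in range(1000):
    real = real_data[torch.randint(0, 512, (64,))]
    fake = G(torch.randn(64, latent_dim))      # generator proposes records

    # Discriminator: real records labeled 1, generated records labeled 0.
    d_loss = (loss_fn(D(real), torch.ones(64, 1)) +
              loss_fn(D(fake.detach()), torch.zeros(64, 1)))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Generator: try to make the discriminator call its output real.
    g_loss = loss_fn(D(fake), torch.ones(64, 1))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
```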
So if you prefer not to think in neural networks,
but you're a huge board game fan,
if any of you have ever played the game Balderdash, it's sort of the same idea.
So in Balderdash, each person gets a card.
They get a word, and then they have to either write the actual definition
of that word, if they happen to know it, or they have to make up the definition
of the word, and once they make up the definition of the word,
all the cards get pooled,
and then people have to decide if that's actually a real definition
or a fake definition, and if you fool people, you get points.
And so, this is essentially what the neural networks are doing
in Balderdash terms, where one is trying to generate fake data
and then a different one is trying to figure out if the data are real or fake.
So Balderdash, but for neural networks.
A new way to generate synthetic data, right?
And so, if you actually do this, it ends up working surprisingly well.
There's some steps you have to take to --
the off-the-shelf GANs don't actually work quite as well in this scenario.
There's a few steps you have to take to improve them.
So one thing we had to do was, despite the fact that, you know,
these neural networks are trying to make synthetic data,
there's no guarantee that they can't memorize, for instance,
one individual and then just return the values for a memorized individual.
So that way, they could potentially leak data.
So we also added something called differential privacy.
So in this case, if you go to this paper, which I recommend
if you're really interested in this,
the cases where we call it private, that's, again,
with differential privacy layered onto it.
We also evaluate GANs without differential privacy.
So these are ones where they're creating synthetic data,
but we can't put any guarantees.
You can imagine a GAN could theoretically learn
to regenerate actual people from the data set.
There's no protection in this one.
So these do not have that protection, and then this is actually real data.
This is data from the Sprint trial.
So these people are getting standard therapy or intensive therapy,
and what you can end up seeing after a fair amount of methodological work
to figure out how to make this work, this is one of the final results figures.
You end up seeing that the data from the neural network,
especially the blue data, so that's the one to focus on,
The private data end up looking a lot like the real data.
So this is the average systolic blood pressure for people in both the real data
and the GAN-generated data.
So now, essentially, the neural network is making new data for us,
and for the data from the private model, we can say, given our privacy budget,
that the data actually contain none of the individuals from the initial set.
So it gives anyone who participated in the study plausible deniability
and would allow large-scale sharing of the data
with a quantified amount of risk.
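For intuition, here is a hedged sketch of the standard differentially private SGD recipe (clip each example's gradient, then add calibrated Gaussian noise before the update) applied to a discriminator like the one above; the paper's exact mechanism and privacy accounting differ.

```python
# Hedged sketch of differentially private SGD (clip per-example gradients,
# add Gaussian noise) for a discriminator D like the one above. This only
# shows the core idea, not the paper's exact mechanism or accounting.
import torch
import torch.nn as nn

clip_norm, noise_mult, lr = 1.0, 1.1, 1e-3
loss_fn = nn.BCEWithLogitsLoss()

def private_step(D, batch_x, batch_y):
    accum = [torch.zeros_like(p) for p in D.parameters()]
    for x, y in zip(batch_x, batch_y):          # per-example gradients
        D.zero_grad()
        loss_fn(D(x.unsqueeze(0)), y.view(1, 1)).backward()
        norm = torch.sqrt(sum((p.grad ** 2).sum() for p in D.parameters()))
        scale = min(1.0, clip_norm / (norm.item() + 1e-12))  # bound influence
        for a, p in zip(accum, D.parameters()):
            a += scale * p.grad
    with torch.no_grad():
        for a, p in zip(accum, D.parameters()):
            a += noise_mult * clip_norm * torch.randn_like(a)  # calibrated noise
            p -= lr * a / len(batch_x)          # SGD step on the noisy average
```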
And so, we're really excited about this.
So if you're thinking about, you know, how would you share the outcomes
of clinical trials, particularly for targeted variables in a safe way,
we think this is a nice way to go forward,
and that would allow sort of further secondary data analysis.
One of the things we also did with this is we had a panel of three clinicians
who each received 100 records, half real, half synthetic,
and they tried to figure out if the records were real or synthetic,
and in this case, they were not able
to distinguish whether the data were actually from the Sprint trial
or the neural network had made them up.
So I guess you could say that in this case for this very weird,
limited Turing test, the neural network passes it.
So if you're interested in this, the code is on GitHub,
and the paper is up on bioRxiv, so if you're interested.
This is the title, and then the co-authors on this,
so Steven Wu helped with the privacy work.
Chris Williams helped with a lot of initial neural network optimization.
Ran Lee, Sanjeev Bavannani, and Brian Byrd all helped with --
they're actually the physicians who did this blinded evaluation,
and if you're interested, I think this was a fun one to check out.
And then finally, I've given you some examples of our work.
Those are sort of places where, I think, deep learning has been useful to us,
or these types of neural network methods have been useful to us.
But I also want to talk a little bit more generally about what we see.
Initially, I was going to do a bit of a talk
on some things other people have done, as well,
but I was sort of looking at the time,
and I realized it would probably be better to just direct you
to a really nice article by Sarah Webb.
So this was in Nature earlier this year.
Sarah Webb wrote this article, and she highlights a lot of the different ways
that deep learning is starting to make contributions.
And so, I'd recommend, if you're interested in kind of a broad survey,
take a look at this and get some ideas
for other places it's being used, as well.
I just want to give you a little bit of perspective.
So we're pretty heavy users of these types of algorithms.
We do some methods development like adding differential privacy to GANs.
You know, I want to give you some perspective on sort of where we are
and what they're good at and what makes them work and what makes them struggle.
So you could imagine, you know, deep learning's just utterly magic, right?
We can just go out into the world now.
We've got -- I got you all the way to the everything side of this line.
So deep learning's good for everything.
I want to pull it back just a little bit.
So, I hate to tell you, deep learning isn't magic.
There's lots of things that you need to do to make it work, and particularly,
if your problem is not an off-the-shelf problem
that everyone has already solved, life is much more painful.
The downside of working in computational biology is most of our problems
for deep learning are not off-the-shelf problems, because, you know,
we're not primarily working on image data that's sort of very well aligned
with the types of images that the, you know,
large tech companies are working on.
So part four, final part, practical deep learning.
If you want to make your deep-learning algorithm work, there are two steps.
Easiest one, get a lot of stuff.
So just go collect as much data as you can collect.
Label it perfectly.
So take all your millions of data points.
Give them perfect labels.
Imagine you're putting them in a museum.
If you have this, you're probably done.
Deep learning is very likely to work for you.
So if this is the setting you're working in, where you got lots of really,
really, really well-labeled data, I would encourage you to just start --
go out and just try it and start using it.
I think you will be pretty excited with what the results are like.
If, on the other hand, you're like me,
and so you don't have one of these things, wait.
I don't have that.
So in this case, we have lots of data, right?
We have all this public gene expression data.
We got lots of public literature.
What we don't have is the labels.
So for us, you know, we don't live in a world where the deep-learning methods
that everyone else is using always work off the shelf every time.
What we find is we often have to do something different,
and a couple of years ago, we were starting to recognize that, you know,
maybe just the sort of standard discussion
of how deep learning was working wasn't entirely sufficient
for understanding deep learning in biology in the same way.
And so, I guess August 2016 I ended up starting this project where we were going
to write a review paper about deep learning in the open on GitHub,
and anyone can comment and contribute, and actually, we did this.
And so, now the paper's been written.
It's actually published now in the Journal of the Royal Society Interface.
We ended up having about 40 people who ended
up coming along and making a contribution.
So if you read this, it's essentially the perspective of 40 different people,
many of whom are sort of experts in the field and working in this area.
And it's a relatively comprehensive tome.
So I think it's around 36,000 words, but it's nicely divided into sections.
So if you really care about imaging, or you really care about clinical records,
it's basically got the state of the field as of probably about a year ago now.
You know, we've tried to update it some,
but most of the writing happened about a year ago.
Or, at least, it was sort of wrapped up by then.
We submitted it.
We got some revisions.
We made some revisions where we did update some of the literature, but,
you know, there's still some stuff missing.
So I'll just give you a few real quick kind of take-home nuggets from this.
So first thing is, as we reviewed all this literature,
one of the things that we found over and over again across methods
that worked well in the domain is that if there is structuring your data,
and you impose a parallel structure in the neural network,
you're likely going to be much better.
So you're going to be able to succeed with less data
if there is meaningful structure and you impose it on the neural network.
So there's a couple types of neural network shown here.
So one is the multilayer perceptron, where every input ends up connected
to every node in the hidden layer, and every node in the hidden layer
ends up connected to the output.
We can contrast that with its neighbor here.
So this is a convolutional neural network.
So each input node, in this case,
is only involved with its neighborhood.
So it's only connected to the neighboring nodes,
and you can see like for these nodes,
instead of here having three edges here, they have two edges.
And this sort of reduction, as the neural networks get much larger,
is particularly dramatic.
And so, this can really help you if you've got modest amounts
of data, because you don't have to learn that structure.
If you don't impose the structure,
you're forcing the neural network to learn that structure,
and it's already used some of the information in your data.
And so, you're not using it as effectively as you could.
So with these neural networks, if convolution is appropriate,
if the neural network structure can be matched to the data,
I highly recommend doing it.
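A quick way to see what imposing that structure buys you is to count parameters for a dense versus a convolutional layer over the same input; the sizes here are arbitrary.

```python
# Counting parameters for a dense layer versus a convolutional layer over the
# same 1000-value input; sizes are arbitrary.
from tensorflow import keras
from tensorflow.keras import layers

dense = keras.Sequential([keras.Input(shape=(1000,)), layers.Dense(100)])
conv = keras.Sequential([keras.Input(shape=(1000, 1)),
                         layers.Conv1D(filters=4, kernel_size=5)])

dense.summary()  # 100,100 parameters: every input tied to every hidden node
conv.summary()   # 24 parameters: each node sees only a 5-wide neighborhood
```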
One caveat I would say, so I see this sometimes is that people will say, hey,
convolutional neural networks are great.
I'm going to use a convolutional neural network,
but their data actually don't have the relationships that are required
for the convolutional assumption to hold.
So, like, you know, you don't have this ability to assume that an input
and its neighboring input are related.
In that case, it might be hazardous.
So the convolutional neural network is going
to be potentially working against you.
So I would say impose it if meaningful.
If it's not meaningful, you can try imposing it, and if it works,
that's probably going to tell you you had too many parameters.
But in reality, you probably should try to match your structure to the data.
I think you're going to get much better results, and consistently,
what we saw across the biomedical literature was people who did match
that structure got better results.
Another thing to think about, there's something called data augmentation.
We saw a lot of really successful examples where it was used.
Imagine, in this case, you're looking at this slide
and your deep neural network is looking at the slide.
It has an orientation, right?
It has a top and a bottom.
But the top and the bottom are really not that meaningful
for the content of the slide.
So if you, with your neural network, sometimes feed it in one orientation
and sometimes feed it in the other orientation,
you can help the neural network avoid biases in the data
that are sort of maybe scanner based.
You know, if there is sort of a distinct top-to-bottom pattern,
it can help to avoid that and help to focus the evaluation on the parts
of the example that are actually relevant.
So if you can augment your data by making these sort of transformations
that don't change the meaning of the data, but they actually change the sort
of inputs to the neural network, we see that that works well a lot of the time.
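A minimal sketch of that kind of augmentation, flipping images at training time without touching the labels:

```python
# Flip-based augmentation: each training image is randomly mirrored, and the
# labels are untouched because the flips don't change the meaning.
import numpy as np

def augment(images, rng=None):
    """images: array of shape (batch, height, width, channels)."""
    rng = rng or np.random.default_rng()
    out = np.empty_like(images)
    for i in range(len(images)):
        img = images[i]
        if rng.random() < 0.5:
            img = img[::-1, :, :]    # flip top-to-bottom
        if rng.random() < 0.5:
            img = img[:, ::-1, :]    # flip left-to-right
        out[i] = img
    return out
```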
Another thing that we found was pretty effective was multitask learning.
So, you know, we would think
that learning two things is harder than learning one thing.
So imagine like you're going to learn to ski
and you're going to learn to snowboard.
You think, well, I don't want to have to learn to both ski
and snowboard at the same time.
That'd be really hard, but with these neural networks,
because the input features can be shared,
you can actually benefit from doing multiple things at once.
If you imagine, okay, I first have to learn how to get to the mountain, right?
Because your neural network is working on really raw features.
Well, if you got to get to the mountain,
you need to do that to ski or to snowboard.
So if layer 1 is the get to the mountain layer, right, those are shared.
And so, the more that you can do multitask learning when you have shared tasks,
it's actually going to help you, again, use your data more effectively.
It's going to be a continuing theme, I think, in biology.
We tend to be at the very lower limits of the data these types of methods
need if we want peak performance.
And so, things that help you use your data effectively are going to be critical.
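Here is a small sketch of a multitask network in Keras, with one shared hidden layer feeding two task-specific outputs; the task names and shapes are made up for illustration.

```python
# A multitask network in Keras: one shared hidden layer (the "get to the
# mountain" layer) feeding two task-specific outputs. Task names and shapes
# are made up for illustration.
from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(500,))
shared = layers.Dense(64, activation="relu")(inputs)  # features both tasks reuse
ski = layers.Dense(1, activation="sigmoid", name="ski")(shared)
snowboard = layers.Dense(1, activation="sigmoid", name="snowboard")(shared)

model = keras.Model(inputs, [ski, snowboard])
# Both losses backpropagate into the shared layer, so each task's data helps
# train the features the other task uses.
model.compile(optimizer="adam",
              loss={"ski": "binary_crossentropy",
                    "snowboard": "binary_crossentropy"})
```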
Okay, so these are some tips.
Here's some caveats.
So we've walked back.
Now we're going to start walking our way back
from like deep neural networks that are perfect.
So this is a really nice paper from Ian Goodfellow,
where they sort of start to talk about --
well, they talk about adversarial examples.
So what's an adversarial example?
An adversarial example is something that we can add with very low magnitude.
So in this case, 7/1000ths of the magnitude of the original, and yet,
it produces a substantial change in what the neural network calls it.
So in this case, this is an image that I perceive to be a panda.
We're going to add this, and the neural network also perceives it
to be a panda with about 60% confidence.
We're going to add 7/1000ths of this pattern.
So this pattern, to me, doesn't really look like much.
To a computer, it really doesn't look like much either.
In this case, they say it's a nematode, but it's only 8.2% confident.
So it's not really that confident, and yet when you do this, in this case,
the neural network ends up with this image, which to me,
it looks a lot like the input, right, because we didn't add much of this,
but the neural network will say, hey, this is a gibbon, with 99.3% confidence.
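The construction in that paper is the fast gradient sign method: nudge each pixel a small epsilon in the direction that increases the loss. A minimal sketch, with the model and inputs as placeholders:

```python
# The fast gradient sign method: move each pixel a small epsilon in the
# direction that increases the loss. The model and inputs are placeholders.
import torch
import torch.nn as nn

def fgsm(model, images, labels, epsilon=0.007):
    """images: (batch, ...) float tensor; labels: (batch,) long tensor."""
    images = images.clone().requires_grad_(True)
    loss = nn.CrossEntropyLoss()(model(images), labels)
    loss.backward()
    # A perturbation this small is nearly invisible but can flip the call.
    return (images + epsilon * images.grad.sign()).detach()
```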
And so, this type of thing you should really be aware of.
There are ways to attack these neural networks.
So if you're thinking about, hey, I'm going to develop, you know,
a system for the clinic where, you know,
we're going to guide patient therapy based on the outputs
of these neural networks, I would say, you know,
this might be a future we imagine,
but it's probably a future that's got a human somewhere in the loop,
because if you put this into your health system and you get some bad actor
who comes along, they might be able to start throwing
in these adversarial examples that, you know,
produce essentially no perceivable change.
So you'd look at this and say, oh, there's nothing different,
but the neural network's [inaudible].
So it could be done, you know, in targeted cases.
This is, you know, a potential problem.
Cool bit of trivia, this isn't just a thing you can do on a computer.
You can do this in the real world.
So this is a sticker.
There's a few of them in this really nice paper from last year.
So this is a sticker.
If you put the sticker on a table, you can kind of see why it would do this.
It'll make the neural network think that it is looking at a toaster,
but this is the neural network looking at what I perceived
to be a banana on the table.
The neural network correctly classifies this as a banana.
Throw the sticker next to it.
Now the neural network's very confident it's looking at a toaster.
This one is pretty clear; you can see why a neural network
would think this is a toaster, but there are a number of these images
that really just look like noise, and you know,
they can still fool neural networks.
So this is something that can also work in the real world.
So I would say, I think these adversarial examples,
particularly as we think about the kind of clinical impact of deep learning
and the way we're going to introduce it into workflows are things
that we do not want to forget about.
I think they're particularly important.
This is kind of cool.
So normally, if I'm giving this talk, I ask people what they see,
but in this case, you know, I can ask,
but you're not going to be able to answer this.
So this is from February of this year.
This is a nice paper where they -- this is an adversarial example.
It doesn't just fool neural networks.
It also fools humans.
So this is actually -- it looks to many people,
and I actually initially perceived it this way, too, as a German Shepherd.
However, the original input image here is actually a cat,
and it's got this sort of noise pattern added over it, and just, you know,
the noise pattern is designed to fool a neural network.
It also fooled my neural network, right?
So the input's a cat, and I perceived it as a dog
with this sort of like perturbation.
So, you know, you'd imagine we're going to get to a place
where putting a human in the loop will solve all of our problems.
Potentially not.
If people can come up with adversarial examples that make small changes
but that also fool humans, now we've got a world where, you know,
we have to think about not just the security of our data and not just sort
of putting a human in the loop, but actually,
the security of our machine-learning models, as well.
So, you know, in a clinical context where you're going to start treating people,
I think it's important to think about this.
The next one is also, I think, particularly important in the context we work in.
There's a misunderstanding in the field that neural networks,
because you're training them sort of ab initio, have no bias.
This is not correct.
So I won't go into this in a huge amount of detail,
but what these guys did is they trained using ID pictures from people
who had been convicted of a crime and not convicted of a crime
to generate a neural network that would look at an ID picture.
So without any sort of prior information about conviction or anything,
the network would look at the ID photo and tell you
whether this person is likely to commit a crime or not.
And they said, oh, this is perfect because, you know,
this is just criminality, right?
It has no bias whatsoever.
But in fact, because you're training based on the criminal/noncriminal labels,
this actually has every bias
that the judicial system of that country has.
And so it has all of those biases.
We just want to make sure that we don't misunderstand this
and end up laundering our biases.
I think that was the mistake that these authors made.
Another thing that's sort of worth knowing, and this is a fun example.
There's a bunch of these now.
There's actually a puppy-bagel Twitter account,
if you feel like following it, with a number of these samples
where neural networks and even humans can get confused.
So this is a Chihuahua, and this is a blueberry muffin.
You can see why they look a little bit similar in many cases, and, you know,
as a human, if I told you I really want you to tell me if I'm looking
at a blueberry muffin or a Chihuahua,
you can go in and look at these things a little bit more carefully.
The types of neural networks
that we're seeing used right now can't go back
and take that second look.
And so, at the moment, you really need to go in and explore the errors
that your neural network makes at this level of manual inspection.
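That kind of inspection can start with something as simple as collecting
every example the model gets wrong, along with its confidence, for manual
review. A minimal sketch, assuming a PyTorch model and data loader:

```python
import torch

def collect_errors(model, loader):
    """Pull out every misclassified example, sorted so the most
    confident mistakes (the Chihuahua-as-muffin kind) come first."""
    model.eval()
    errors = []
    with torch.no_grad():
        for x, y in loader:
            probs = model(x).softmax(dim=1)
            conf, pred = probs.max(dim=1)
            for i in torch.nonzero(pred != y).flatten():
                errors.append((x[i], y[i].item(), pred[i].item(), conf[i].item()))
    return sorted(errors, key=lambda e: -e[3])
```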
I like this quote.
This is actually from a post-doc in my lab replying
to a blog post from someone else.
I was reading this blog post, and then I was reading the comments section,
and I saw this thing from my post-doc, and I was like,
oh, this is really insightful.
So he notes that, you know, if we start thinking in terms
of statistical uncertainty and P values, the challenge with big data is often
collection bias, that is, how we collected the data.
You know, we can get a significant P value,
but the data collection biases are potentially going
to be a major driver of it.
And so, I agree with him that thinking
about how we collected the data becomes increasingly important.
One more thing to note.
There's a lot of work right now talking about explainability.
The challenge that I see is that very few people use explainability
in the way that, as a biologist, I'd like to see it used.
So you can think about the joke of why did the chicken cross the road?
You know, as soon as a kid learns that there's like eight answers
to this question, it's the riddle you can never solve, right?
Because they're going to keep telling you the answer you didn't give,
once they've memorized more than one of them.
And so, for these neural networks,
I think the explainability challenge is similar.
We can go to a single neural network, and we can now do a pretty good job
of saying why that neural network made that prediction on that example,
but as a biologist, what I'd really like to get at is
not why this neural network made this prediction on this example,
but what is actually generally true about the system the data are generated by,
and that's an area that, you know, I think has received much less focus,
and I think we're much further from.
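For that per-example kind of explanation, gradient-based saliency is one
common approach: ask which inputs most influenced the score of the predicted
class. A minimal sketch, assuming a differentiable PyTorch model; note that
this answers the per-example question, not the general one about the system.

```python
import torch

def saliency(model, x):
    """Gradient of the top class score with respect to the input:
    which features drove this particular prediction?"""
    model.eval()
    x = x.clone().detach().requires_grad_(True)
    model(x).max(dim=1).values.sum().backward()
    # Large-magnitude gradients mark the inputs (pixels, genes, ...)
    # that most influenced this one call on this one example.
    return x.grad.abs()
```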
And finally, you know, I think one of the things I've seen in my own experience
and I'm continuing to see sort of coming out of other labs is that, you know,
we think that deep learning, because it constructs features, can do everything,
and so, you know, we can just have some deep learning experts solve all
our problems.
One of the things that I'm seeing is actually that it looks
like domain expertise matters more and not less, and so,
models that are predicated on, you know, a single deep learning expert
or a team of deep learning experts pushing this expertise out to the community
and these models out to the community where they get employed,
I think are going to be less successful than a model that tries
to bootstrap some of this expertise inside the labs
or groups that have really strong domain expertise, right?
I think that has some implications for how we train students
and I think is going to be important going forward.
I think the fields that do a better job of integrating this type
of technique throughout training and into a broad swath of members
of the field are going to be advantaged by that.
So back to our initial question.
Deep learning, what's it good for?
I'm pretty convinced the answer isn't absolutely nothing.
However, I'm also pretty convinced it's not everything.
I'd say I'm probably a deep learning optimist.
I fall in, you know, somewhere kind of on the upper half of this,
but one of the things I noticed about the article that was
in Nature earlier this year is I think I'm mostly quoted in the caveat section.
So even though I'm, I guess, in the upper half,
I'm probably one of the more pessimistic people up there.
So I guess I'd say I think it's good for many things.
It's not good for everything, and I think you have to be careful
about how you use it, and I think having domain expertise on the team has,
from our experience, always been a huge benefit to the project.
So with that, I want to just thank the people
in my group who made this possible.
The work that I presented today, the exploratory work, was primarily Gia Tan,
a grad student in the lab who has since graduated,
and Beaulieu-Jones [assumed spelling], also a grad student in the lab
who has graduated, who did the work on the privacy-preserving neural networks,
and I don't know if there are questions, or how that works with this setup,
but I'd be happy to hang around for a little bit and chat if people want to.
>> Great, thanks very much, Casey, for a fascinating
and very engaging presentation.
We're almost at the top of the hour.
So we have maybe an opportunity for one or two very quick questions.
If you're on the WebEx, just raise your hand.
While we're waiting for that,
I just want to comment that your work on unlocking clinical trials data
seems to me to have enormous potential, given the reluctance of researchers
to make those data available for numerous reasons,
with protecting patient privacy being the one you're addressing there.
I'm curious.
You showed that example with blood pressure.
Have you looked at any other factors and been able
to demonstrate similar results with the synthetic data?
>> Yeah, so for that, we actually looked at the SPRINT trial
for a particular reason, and we ended up participating in sort
of the New England Journal of Medicine's data sharing challenge,
or we intended to participate, but when we got the data sets,
there actually wasn't a lot there to work with.
We were kind of disappointed by the amount that had been released.
And so, this was actually our attempt to do something interesting with the data,
even with the sort of limitations that they had.
So we haven't pushed this into another domain yet, but, you know,
I will say the way it currently works, it does pretty well
if you have continuous measures, you have a modest number of them,
and you have a relatively large number of participants.
So you could imagine sharing data from tens of measures
if you have tens of thousands of participants.
I think this can be improved dramatically, but there's some methodological work
that needs to be done to help sort of expand the number of variables
that you can reliably generate and also to, alternatively,
reduce the sample size required to generate them.
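As a rough illustration of the generative setup he's describing, here is a
minimal, non-private GAN skeleton over continuous tabular measures; the
dimensions are placeholders, and the actual privacy-preserving approach adds
machinery that this sketch omits.

```python
import torch
import torch.nn as nn

n_vars, latent = 10, 32   # placeholder sizes: tens of continuous measures

G = nn.Sequential(nn.Linear(latent, 64), nn.ReLU(), nn.Linear(64, n_vars))
D = nn.Sequential(nn.Linear(n_vars, 64), nn.ReLU(), nn.Linear(64, 1))

bce = nn.BCEWithLogitsLoss()
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)

def train_step(real):  # real: (batch, n_vars) tensor of participants' measures
    ones = torch.ones(real.size(0), 1)
    zeros = torch.zeros(real.size(0), 1)
    fake = G(torch.randn(real.size(0), latent))
    # Discriminator learns to separate real measures from synthetic ones.
    d_loss = bce(D(real), ones) + bce(D(fake.detach()), zeros)
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # Generator learns to produce measures the discriminator accepts as real.
    g_loss = bce(D(fake), ones)
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

# After training, G(torch.randn(k, latent)) yields k synthetic participants.
```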
>> Great, thanks.
Gerry Lee online has got a question.
Gerry, go ahead.
>> Yeah, so to make the deep neural net more useful, obviously,
the first step is to make correct labeling and predictions.
Once you do that, I think the next step,
once it's outperforming other machine learning techniques,
and even outperforming humans,
I guess the next step is asking why the black box made the correct labeling
or correct prediction?
So rather than just having a black box,
can we do well at building interpretable neural nets?
Do you have any thoughts around those lines?
>> Yeah, so I think there are some things that can help
to make neural networks more interpretable.
So I think the more the structure of the neural network matches your problem,
I think that can help a lot.
You know, we're working on some approaches to use sort
of multiple knowledge bases to try to interpret hidden layers at the same time.
You know, there's some other nice approaches to try to identify
which factors led a neural network to make specific predictions, but again,
you know, I think understanding --
I think we're at the stage now
where we can generally understand why a neural network makes a certain
prediction on a certain sample.
What we struggle with is then extracting that out to generalities.
And so, I know there's people working on it.
I'm excited to see what happens.
I think there's just a lot still to be done.
So I don't have an answer for you on how to best do it,
but I'm excited about where we're going.
>> All right, I'm afraid we've overstayed our welcome in the room
that we're broadcasting from here,
and we're going to need to cut the conversation short.
Casey, if you're fine with it, I would recommend --
I know there are a couple more questions online -- that people just reach
out to you directly to follow up and continue the conversation.
I just want to -- great.
I want to remind people our next presentation will be on July 18th
when our own Dowd Mersonmen [assumed spelling] will be presenting
at the speaker series.
So once again, thanks very much, Casey,
and thanks to everybody who's participated today.
Talk to you next time.