>> All right, let's get started.
So, very happy to have Dhanya Sridhar here today.
She is a fifth year computer science PhD student
at University of California, Santa Cruz.
She's a student of Lise Getoor.
She's interested in statistical relational learning,
causal inference and discovery,
computational social science and biology.
>> Great, thanks for the introduction. So I'm Dhanya.
I'm going to be talking about my research today on
Structured Probabilistic Models for
Computational Social Science.
So with the amount of social media data
and Web logs and logs from applications,
there's this great opportunity for
computational methods to make inferences that
can really help answer some social science questions.
For example, on the Web you might
have logs of users from applications.
You might also have sort of longitudinal data
of user behavior and this can help us answer
questions like characterizing people's moods
or understanding their various behavioral factors.
And you might have dialogue and interaction data from
social media, with people talking and interacting.
And that can help us understand
people's attitudes towards one
another and understand how
new links might form in social media.
But these kinds of inferences are
very different than standard machine learning tasks,
in that these inferences are often interrelated,
across users and across timestamps,
because there's a structure to these kinds of domains.
And then importantly, you
get many signals for any given user
or you get different amounts of information
for different users and you have
heterogeneous data that you
want to combine in a principled way.
And then also,
we often need to go beyond prediction in these kinds of
domains and actually discover
new domain knowledge and make
causal inferences to be able
to help experts understand the domain better.
So my technical contributions in this space has
been to develop probabilistic models
that can address these kinds of challenges.
These methods can exploit the structure in these domains,
fuse signals of different reliabilities,
support causal inference from observational data,
and discover new patterns.
So as a road map for this talk,
first I'm going to dive into a very simple sort of
motivating example that can illustrate
the need for all these sophisticated techniques.
And after that, I'm going to dive in to each of
my contributions in more detail and show how they apply.
So first, a problem that we can all relate to:
recent issues come up in the news all the time,
and an important problem is to
understand attitudes and how
users feel about these topics.
So we want to understand stance and
a recent issue has been net neutrality with
key actors like Ajit Pai and Eric Schneiderman.
And one dataset for this might be social media.
So on these social media sites,
you'll have these top actors themselves who post.
So Ajit and Eric will talk
about their viewpoints and then regular users
will retweet or reply
and support or disagree with these top users.
So you have a supporter of Ajit Pai who's saying,
"No, go with it, or go Mr. Pai".
And then you have people replying to one another as well.
So there's a debate going
on on one of Eric Schneiderman's tweets.
So the standard approaches for
modeling text documents like this
will vary anywhere along the spectrum from
unsupervised techniques all the
way to fully supervised techniques.
So methods like topic modeling,
will try to partition these words
into sort of support and against
topics and understand how documents
and tweets will fit into these two partitions.
You can use sort of pre-trained sentiment analysis types
of dictionaries, or something like Word2vec might give you
information about the semantics of these words.
And then there's manually annotated dictionaries
as well that tell you
how words score against
a whole bunch of different kinds
of categories, like LIWC.
Or you can get manually obtained annotations
and then train a fully supervised model
to understand stances.
And these methods can go pretty far.
So we might understand that
Ajit Pai and his supporters have
a strong probability to be for net neutrality,
whereas Eric Schneiderman is against
and we might, with less confidence
but still correctly, side
the two people who are debating on
Eric Schneiderman's post as
one of them being for net neutrality and the other being against.
But if we take a closer look,
it turns out that one of the people who seemingly
was supporting Ajit Pai
was writing a very sarcastic tweet.
So it's very clear when
we've read the tweet that he's saying,
Thank you for having the bravery
to stand against giant corporations.
So this is very
sarcastic, and this is not something that text
alone is going to be able to side correctly.
So how can we improve upon mistakes and errors like this?
So, one thing is to take a step
back and realize that there's
a lot of dependencies in this network.
So, we can see that this sarcastic user
at some point had liked
and/or retweeted one of Eric Schneiderman's posts.
And so we can use
that support relationship between
the two of them to enforce
that consistency across predictions to say that people
who will support one
another should share the same stance.
The other thing is we might have a lot of
different data sources as well to help
us in this stance classification problem.
So, Eric and Ajit might have written articles
themselves, op-eds for major news sources,
or they could have had articles written about them.
And these are kind of
high signal information sources
that we'd like to use in our problem,
and for regular users we might
have mentions and retweets and hashtags.
So we want to combine these sources of
varying reliability in a principled way.
Then finally, there are long-range sorts of dependencies
in structures like this that won't
be obvious to us as humans.
And so, that people who retweet one another
share the same stance might be
something that's obvious to us.
But we'd like to discover new patterns
and we might end up
discovering a much longer-range pattern that says,
Users who retweet those followed by
top users actually share the same stance.
So you might find this multi-hop sort of
complex path that might not be obvious to us as humans.
So, in this talk,
I'm going to talk about how I've developed
methods that can address
these various needs of computational social science
and it looks like some of
my color is off on these slides.
So, for each of these contributions,
I'm going to focus in on
a specific problem, a case study,
but we'll see that these approaches
can be more broadly and generally
applied to a lot of computational social science problems,
and they work in tandem.
So, the first thing I'm gonna talk about
is online debate and discussion,
and there we'll see
patterns and templates for exploiting structure.
And then, to look at how methods can fuse signals,
I'm going to look at detecting
indicators of alcoholism relapse from social media.
And then lastly, I will
talk about supporting causal inference
and discovery from observational data
and we'll look at a use case in mood modeling for this.
So, like I said, I'll be
focusing on individual problems but
these approaches are more broadly applicable
across a whole variety of
computational social science problems,
and we'll see that throughout the talk.
Before I dive into my work,
I'm going to give a little bit of background
on tools that I'll be building my work on.
So, one way to represent
relationships between users or between entities,
as well as talk about
constraints across predictions is with logic.
So, from our motivating example,
we might encode a constraint that says that people that
retweet one another should share
the same side, or the same stance, on an issue.
And logic is a powerful language for this,
but you might get conflicting observations or
conflicting evidence and this happens often
with data and this is a big problem with logic.
So, on one hand,
we see that we get a correct instantiation of this rule
where Eric Schneiderman does
share the same side as someone that retweets him.
But Ajit Pai may have retweeted
Eric as well, and now we get conflicting pieces of evidence.
So, one of the main problems with logic is that it
leads to these infeasible states
where there's no satisfying assignment,
and this becomes
a combinatorial optimization problem, which doesn't scale.
So, my work uses probabilistic soft logic.
And I'm going to just give a quick overview and,
for more details, you can look at other work.
And here, we first relax these variables to
be between zero and one rather than take
on yes or no values.
And when we do this,
we also have to relax
our understanding of whether a rule is satisfied or not.
So, we apply one particular relaxation in
this language and like regular logic,
we have this property that if the rule is satisfied,
there's no penalty that we
incur for making a particular assignment.
But given some assignments if the rule isn't satisfied,
then we get a penalty that looks like this,
it turns out to be a linear function
of the variables and we get
this from a specific relaxation
of logic called the Łukasiewicz t-norm.
But I won't go into detail here.
So, putting it all together,
given rules and a set of inferences that we want to make,
so in this case, the stances for all the users,
and given some observations,
the goal of inference in
probabilistic soft logic is to come up with
a set of assignments that
minimize all the soft penalties to these rules.
And the form of that inference
turns out to be a convex optimization problem,
and so you can do inference exactly and it's fast.
So this is the tool that I'll be
building off in my work today.
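(For reference, here is the standard PSL formulation in symbols: the Łukasiewicz relaxation of the logical operators, the hinge-loss distance to satisfaction, and the convex MAP objective. This uses generic rule notation rather than the specific rules from the talk.)

```latex
% Lukasiewicz relaxation of the logical operators over truth values in [0, 1]:
I(A \wedge B) = \max\{0,\, I(A) + I(B) - 1\}, \qquad
I(A \vee B)   = \min\{1,\, I(A) + I(B)\},     \qquad
I(\neg A)     = 1 - I(A)

% Distance to satisfaction of a weighted rule  r : B_r \rightarrow H_r
d_r(I) = \max\{0,\, I(B_r) - I(H_r)\}

% MAP inference: a convex program over the relaxed variables
\min_{I \in [0,1]^n} \; \sum_{r} w_r \, d_r(I)^{p}, \qquad p \in \{1, 2\}
```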
And so, I'll go into the first part of the talk
on templates for exploiting
structure in social science problems.
So, we already talked about
the need to understand stances on issues.
So, on social media,
people often debate and
have discourse about various topics that come up.
And in order to understand ideologies and biases,
a key first step is to
understand how people feel about topics.
So, online debate forums:
there are many of them on the internet,
and they're an important dataset
for being able to study this problem.
So if we zoom into a specific thread,
you might get this structure where
there's a topic and one user
will initiate a discussion and other users will reply.
So, we have two people who reply to the person who
has initiated this thread and they're
both against the initial user.
And then, the initial user might
write back at the end of it all.
And so, the text actually gives us two signals.
One, it tells us about how people feel about the topic,
but it also tells us how people feel about one another.
So, it gives us some indication of agreement
or disagreement and basically,
the polarity of the interactions that people have.
So, in the stance classification problem,
we want to understand or infer
a stance for every single user in our network.
And in this particular instance,
we're going to treat it as
a supervised classification problem
where the labels are
either self-reported by users on
these forums or we get them from annotation.
Before we get into modeling,
there are two important questions we have to answer.
So, the first one is
about the right level of
granularity at which to aggregate this data.
So, on one hand,
we can say we want to look at users.
So, we'll call that the user-author level.
And what we'll do is for people who author posts,
we'll aggregate or concatenate
all their text and we'll get feature vectors that way,
and then, if we don't have labels
at the level of the user already,
the way we would get them is by looking at
the majority label of their posts.
The other way of aggregating information
might be by saying that
posts are the units that we care
about identifying stance for.
And there, we'll treat
each post individually when we want to
get features and when we want to get labels.
If we don't have it for the post already,
then we'll just apply the author label to the post.
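(A minimal sketch of the two aggregation schemes just described, author level versus post level. The toy records, labels, and column names are illustrative assumptions, not the actual datasets.)

```python
# Minimal sketch of the two aggregation levels described above.
# The toy records and labels are illustrative only.
from collections import Counter, defaultdict

posts = [  # (author, post_text, post_label) -- post_label may be None
    ("alice", "guns save lives", "pro"),
    ("alice", "the second amendment matters", "pro"),
    ("alice", "background checks are fine though", "anti"),
    ("bob",   "we need stricter gun laws", None),
]

# Author level: concatenate an author's text, take the majority post label.
author_text, author_labels = defaultdict(list), defaultdict(list)
for author, text, label in posts:
    author_text[author].append(text)
    if label is not None:
        author_labels[author].append(label)

authors = {
    a: {
        "text": " ".join(author_text[a]),
        "label": (Counter(author_labels[a]).most_common(1)[0][0]
                  if author_labels[a] else None),
    }
    for a in author_text
}

# Post level: keep each post separate; fall back to the author's label
# when the post itself is unlabeled.
post_level = [
    {"author": a, "text": t, "label": l if l is not None else authors[a]["label"]}
    for a, t, l in posts
]

print(authors["alice"]["label"])   # majority label: 'pro'
print(post_level[3]["label"])      # bob has no labels anywhere, so None here
```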
So these are. Go for it.
>> So, how much of a problem is it
that you're classifying things into plus-minus,
sides with, doesn't side with,
when in the real world
everybody is more nuanced? A lot
of the disagreements you see on
online forums are where somebody says
X, and they say, "Oh, you're conservative."
And you say, "No, actually, I'm a liberal,
but I've drawn the line like this." And this seems like
this approach couldn't handle that.
>> Right.
>> If this was a ground-truth, plus or
minus that you're looking to find.
>> Right. So there is a lot of work
on doing unsupervised or
semi-supervised or weakly supervised
stance classification as well.
So, this idea of applying structure or
exploiting structure in the problem can definitely,
generally, be applied to unsupervised techniques as well.
So, in this case, I'm focusing on, yes,
someone has sided themselves as pro or anti,
and this does actually
come up quite a bit, in that in a lot of
these online debate forums people do
side themselves in a binary sort of way.
So, the supervised methods still apply,
but this idea that I'll show of
exploiting structure will still
apply in the unsupervised settings.
Okay. So, that's the first question, of aggregating
information, and the second question, I think,
kind of starts to get to your point of what are
more nuanced ways we can handle this text.
And I should have mentioned
that in the sort of previous question,
a lot of prior work has
treated posts as the unit
of interest and they've done post stance detection.
Whereas we've asked this question of what's
the appropriate level of modelling in the problem.
So, a lot of
previous work has made this assumption here as well,
that replies should be an indication
of disagreement when assigning stance.
So, of course, that's
probably true in a lot of online debate forums,
but exactly like you said,
people will say things like,
I disagree with this aspect of your argument,
but I do agree with this.
And so that sort of
simpler or more naive way
of treating replies as disagreement would
not be able to capture that.
So, we asked this question of
is it more appropriate to actually
model the polarity of
the replies jointly with the stance?
So, I'm going to build up the models for this.
So, the first kind of
intuition that you can model is to say
that I'm going to build
a local classifier of text using logistic regression,
and I'm going to use
the class probabilities to
predict my global stance variables.
So I can say that my text classifier
gives me my final label.
Now building on this, we can add the. Go ahead.
>> A couple of slides ago,
you were laying out these two [inaudible] the problem
with the user level or the post level.
Did you pick one of those, or?
>> We're going to evaluate
all these different modeling choices.
>> Both ways.
>> That's right. Yeah.
>> This looks like user level here.
>> That's a great, yeah.
So the figures are going to look like
they're at the user level but you
can substitute users with
posts as well and these rules will still hold.
Okay, so there's the naive collective
classification assumption I was talking about, where
you look at a reply and you
assume that it's a disagreement.
And the way that you'd model this is to say
that if users disagree
then they should have opposite stances.
But the more sophisticated thing to do would be
to come up with
a text classifier that identifies disagreement as well.
And then you can include
more sophisticated rules that
propagate these two different inferences.
So you can say that if people agree,
they should have the same stance or that
if two people have
different stances they should disagree.
And you can come up with all combinations
of these rules and we did.
So I'm just going
to show a subset of the rules that we used,
but it's all in this spirit and flavor.
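(To make the flavor of these rules concrete, here is a minimal sketch of how a few of them translate into weighted hinge penalties over relaxed truth values. The predicate names, weights, and toy values are illustrative assumptions, not the model's actual rule set.)

```python
# Minimal sketch: a few joint stance/disagreement rules as PSL-style
# weighted hinge penalties over relaxed truth values in [0, 1].
# Predicates, weights, and the toy assignment are illustrative only.

def land(a, b):
    """Łukasiewicz conjunction of two truth values in [0, 1]."""
    return max(0.0, a + b - 1.0)

def hinge(body, head):
    """Distance to satisfaction of a soft rule body -> head."""
    return max(0.0, body - head)

def pair_penalty(pro_a, pro_b, disagrees_ab, local_pro_a, local_pro_b, w):
    """Weighted rule penalties for one (author a, author b) reply pair."""
    agrees_ab = 1.0 - disagrees_ab
    penalty = 0.0
    # Local text classifier drives stance:  localPro(x) -> pro(x)
    penalty += w["local"] * (hinge(local_pro_a, pro_a) + hinge(local_pro_b, pro_b))
    # Agreement propagates the same stance:  agrees(a,b) & pro(a) -> pro(b)
    penalty += w["agree"] * hinge(land(agrees_ab, pro_a), pro_b)
    # Disagreement flips the stance:  disagrees(a,b) & pro(a) -> !pro(b)
    penalty += w["disagree"] * hinge(land(disagrees_ab, pro_a), 1.0 - pro_b)
    return penalty

weights = {"local": 1.0, "agree": 2.0, "disagree": 2.0}
# a looks strongly pro locally, b is uncertain, and the reply looks like disagreement.
print(pair_penalty(pro_a=0.9, pro_b=0.2, disagrees_ab=0.8,
                   local_pro_a=0.95, local_pro_b=0.5, w=weights))
```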
So we evaluate all combinations of
these modeling choices that I talked about on
two different online debate data-sets.
So 4Forums and CreateDebate
and we got four topics from each
and they have about 300 users in
each topic that author about four to 19 posts.
And so to recap the author level
was where we would aggregate features for
users and apply the majority post label
and for the post level,
we're going to get separate features for
posts and we'll apply the author's label if we
don't have labels at
the post level already. So I'm going.
>> In the second case are you
getting the authors label
from this process or does it come from somewhere else?
>> So in 4Forums, we have annotations where
the Turkers did it at
the author level. In CreateDebate,
people have self-labelled when they write a post,
so it's at the post level.
So we kind of have to do the cross-product,
so for CreateDebate, when we want author labels,
we have to take
the majority post label and
vice versa for 4Forums when we want the post label,
we have to take the author's label.
>> [inaudible] democratic versus republican type of thing
because otherwise what is the semantics of aggregating
over [inaudible] same author.
>> So these are stances on a particular topic,
so aggregating by taking the majority would be to say
that there's some sort of consistency about the person,
so when they write multiple posts on the same topic.
>> So is it all on the same topic?
>> Exactly.
>> I see.
>> Yes. So the more complicated question
is then to aggregate,
to understand something about
ideology and the interactions between topics,
but here we're saying we're going to look at it per topic.
So, I'm going to show
some findings and I'll
focus on a specific topic from 4Forums,
just for ease of exposition,
but these same trends held across
all the different topics that we evaluated on.
And the first finding was that
this granularity of aggregating
information does have ramifications.
So we evaluated on two tasks: one of
the tasks was to predict the stance of
users, and the other was to predict the stance of posts.
And it turns out that the best-performing model for
both tasks is this joint model,
which was jointly modelling the polarity of
the disagreement and agreement links
and was modelling at the author level.
And that's maybe not so surprising for
the author stance task because
you're predicting the stance of people, that's fine.
But it turns out that even when
you're trying to understand the stance of posts,
aggregating at the author level was important.
>> You said that when people reply,
you're treating that as a disagreement, right?
>> Yeah. So that would be corresponding to
the simple collective model in this case.
>> It could also be, and sometimes is the case,
that it's the opposite.
So maybe it's true that mostly
we're more motivated to reply when we disagree,
but sometimes people reply and say,
"I agree with that, and that was a great point."
>> Yeah, exactly.
>> Does that just show up as error in
your models or do you have some way of
identifying when a reply
is an agreement, not a disagreement?
>> So the joint model
is exactly trying to model that from text, right?
As well as inferring stance,
I'm going to infer whether people are
disagreeing or whether they're agreeing.
And then if they're agreeing,
it uses a different set of constraints or
dependencies to enforce consistency.
So, if they agree,
it'll say people should have the same stance.
And we're evaluating here:
this is all accuracy for stance, not disagreement.
>> So, if you're
looking at the structure of the [inaudible] saying,
my prior is that means disagree.
>> The only model that makes that assumption
is the simple collective model, not the joint model.
So, our contribution was to come up
with more sophisticated methods that
can jointly sort of
model the edge and node labels in this graph.
>> Where are you getting that edge, if you're
just looking at text anyway?
>> So, all models use the text
as a local feature for stance.
The simple collective model does
not use the text for disagreement,
but the joint model also
uses the text to understand
something about how people support
or disagree with one another. Does that answer your question?
>> That works for that one.
>> Okay. So, going back to this,
the methods that assume that replies are
an indicator of disagreement:
that can actually be
a harmful assumption to make for certain nuanced topics.
And we had this finding in multiple different topics,
and I'm going to show an example in one specific topic,
gun control where we have this post-reply pair,
I'll give you a second to read it,
and the stances in
this post-reply pair were correctly
predicted by our joint model,
whereas the simple collective model
was not able to capture
the stances correctly because
exactly like you said earlier,
there is nuance here where people might say,
I agree with you on this,
and this, but I disagree on these other things.
And they can still have the same
stance under these different conditions.
So, it is important to model this nuance,
and the joint model is able to
more powerfully capture these patterns.
And so, the takeaway: I looked at
this specific problem where we
have disagreement and agreement sorts of relationships,
but this general property
of being able to use similarity or
dissimilarity to
propagate information across predictions is
a useful and general template that shows
up in many social science problems.
So, the next part of my talk is going to look at fusing
information and
combining multiple signals for prediction.
And so for this, we looked at Twitter
for people that tweeted
about going into their first AA meeting.
And for these users,
we gather tweets before they said that,
and we gather tweets
after for up to 90 days, and actually beyond,
but I'm going to look at
the 90-day recovery mark because that's
typically they use that as a benchmark for AA.
And to understand what happens after 90 days,
we look for very clear indications in the text
for these users that say that
they maintained their sobriety,
or they've continued to stay sober,
or that they've relapsed after 90 days.
So, this is how we acquire
these labels of indications that they relapsed,
and for these users we
also go out and collect tweets of their friends,
where on Twitter, we're defining
friends as people that I follow who follow me back.
And we whittle that set down by looking for people
who co-mention one another
and often retweet one
another's posts, because we think that's
a stronger indication of
ties between people than just following.
So, we have these kinds of egocentric networks
at the end of it, where we have these users
whose relapse or recovery we care about, and then we
get a network of their friends and their tweets.
And here, the main intuition that we
kind of want to capture is how
our friend's behavior kind of correlates
with our own behavior.
So, here we have
contrasting sort of negative and positive interaction.
So, in one case, the person
who's attending AA might say something like,
I know I feel like I want a drink,
and my friend might enable
that behavior by retweeting something very
positive about alcohol versus I
might say something about my sobriety,
and my friend might reaffirm that,
or give me some positive affirmation,
and that might be supportive behavior
that is a good predictor of my ability to recover.
So, with all this text,
there are actually multiple language
signals that we can use.
So, we first came up
with a dictionary of words corresponding to alcohol,
and words corresponding to sobriety,
and here we were able to use some domain knowledge
from our collaborator at UMD,
who came up with these two dictionaries.
Now, we can model this intuition with
these two relationships uses
alcohol word, and uses sober word.
The next thing is that we care about our affect and
sentiment as well, because
someone might be talking about sobriety,
but they might say sobriety sucks.
So, for affect we turn to LIWC,
which like I said before,
is a manually curated dictionary
that has come up with
a whole bunch of semantic categories
and maps words to those different categories.
And we look at SentiWordNet,
which again gives a kind
of positive and negative valence
for a lot of different words.
And we're able to capture
these relationships by using PosAffect and
PosSentiment as the relationships.
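(A minimal sketch of how per-tweet language signals like these could be computed. The word lists are tiny placeholders standing in for the actual alcohol/sober dictionaries, LIWC, and SentiWordNet.)

```python
# Minimal sketch of per-tweet language signals that get fused in the model.
# The word lists are tiny placeholders, not the real dictionaries
# (the alcohol/sober lists from the UMD collaborator, LIWC, SentiWordNet).

ALCOHOL_WORDS = {"beer", "drink", "drunk", "wine"}          # placeholder
SOBER_WORDS   = {"sober", "sobriety", "recovery", "aa"}     # placeholder
POS_AFFECT    = {"proud", "happy", "great", "love"}         # placeholder

def tweet_signals(text):
    """Soft [0, 1] scores for usesAlcoholWord, usesSoberWord, posAffect."""
    tokens = [t.strip(".,!?").lower() for t in text.split()]
    def score(lexicon):
        return min(1.0, sum(t in lexicon for t in tokens) / max(len(tokens), 1) * 5)
    return {
        "usesAlcoholWord": score(ALCOHOL_WORDS),
        "usesSoberWord":   score(SOBER_WORDS),
        "posAffect":       score(POS_AFFECT),
    }

print(tweet_signals("So proud to be 60 days sober today!"))
print(tweet_signals("I really want a beer right now"))
```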
>> Would this do anything with something like,
"I am not drinking" versus "I am drinking",
so reasoning that's even beyond the words, right?
Or, I mean, is that a problem, or?
>> Yeah, I guess I've given very contrived examples
of these things, right?
So, you're absolutely right that there's a lot of things
that will be captured by these simple language signals,
and this is exactly why we
need to take into account the context and
the structure, because language
is only going to get you so far.
But you might have just a handful of
users whose language signals are fairly clear, right?
And structure helps us
propagate that information across other users,
and so that's really
the statistical strength that we want to leverage here.
So, the final thing again, very contrived example,
but there are more nuanced words that might be
associated with alcohol or sobriety
that just coming up
with a dictionary might not be able to capture,
and for this we use seeded LDA.
And seeded LDA is a way to use domain knowledge in
topic modeling where you can seed certain topics
with particular words that you
expect to be associated with those topics.
So here, we seeded the alcohol and
sober topics with the dictionaries that we came up with,
and the benefit of
doing that in addition to being able to use
some domain knowledge is that now you have
two topics that you can very clearly
say that these are the ones I want,
I care about, and I want to kind of use.
And we can use
this information with both tweet and user topics,
again aggregating at these different levels,
kind of comes up again.
So, the kinds of signals that we'll model, again,
they should be familiar
from the previous part of the work,
where we were trying to model something
at the local level.
So, here, we're going to try
and understand if the tweets of
a user tend to
go towards a lot of alcohol-related topics.
And if they do, we might say that
there's a probability that they won't recover,
and vice versa for recovery.
And again like the previous section,
I'm not going to go over every single rule that we use,
but I'm just going to give you a flavor of the kinds of
dependencies that we model just to be able to
fuse the different language signals together.
So, using those alcohol words
and sober words that we defined,
we might... sorry, these are the LDA rules.
So using the alcohol and sober topics that we found,
we might want to encode something that says:
If my friend has
a propensity for going towards
these alcohol topics and they're positive about it,
that might not be a good indication
about my recovery because that might mean
they're sort of engaging in a lot of enabling
behavior and same for sobriety.
So, if my friends tweets
positively about sobriety fairly often,
then that might be a good sign for me.
And those same intuitions we can
capture by using
different combinations of our language signals.
So we can use the affect signal from LIWC and we can use
the alcohol sober words that we defined to capture
the same sort of negative versus positive interactions.
>> Are the topics here per tweet,
or are you just lumping everything together, or looking
at it per user?
>> So in this case, this is on a per-tweet basis,
so they're not lumped together.
But, the thing that I want to point out is,
in this full model,
we actually considered all combinations
of these things to understand
how we can combine different sentiment signals and
different topic signals, both
at the user and tweet level.
So another important takeaway here to really highlight
is that this is a nice kind
of unified framework where if you have a bunch of
different domain knowledge and
different models that you want to evaluate,
you can kind of encode
different bits of intuition to
understand which actually holds up better in the data.
>> Right. So the tweet topic
in the first line refers to U1's tweet.
So like U2 is retweeting U1 and U1 said something
positive about sobriety. Is that right?
>> I think it's actually U2's tweet.
Sorry. So, yeah,
U1, you're right. Yeah.
>> Okay. The second one.
>> In the second one we're looking at what
the friend is saying whereas in U1 we're
looking at what the user is saying. Yes.
>> Are these for separate models or they're both?
>> They're all in the same model, that's right. So.
>> The way I'm imagining this is
that you're kind of trying
all these new constraints, and then you get better fits.
So maybe you go ahead, or you get
worse fits and you drop constraints.
So, how are you not worried about overfitting?
You know, just kind of starting
to create your own patterns.
>> So, one thing that I
have not highlighted a lot in
this particular work is that,
in this probabilistic soft logic framework,
we also associate weights
with the rules that we introduce.
So these are first order rules and we
associate a weight with them.
And the weight gives us some sense of
relative importance of satisfying
that rule versus other rules.
And those are basically the parameters that we can
fit from training data, the same way that we might fit.
>> I'm not sure, so what I'm asking is,
I imagine that you did not come up with the list
of all of these rules at once. Or did
you just sit down, come up with
all the rules,
throw them at the system, and you're done?
Is that how this works, or is it more iterative?
And so my question was, is it that it's intuitive,
and then you just kind of play with it, and
sometimes you get a worse fit and
sometimes you get a better fit?
>> That's a good question. So, in that case,
typically what we do is,
we might come up with a lot of rules, and the weights,
where they come into play is that
you might not sit there and try
one rule at a time, but rather you'd come up with a model
and then do weight learning to
estimate the relative importances.
And the standard kind of philosophy applies here as well.
Where you want to do this kind of looking at
what happens and tweaking your model and so on,
on a validation dataset
or if you have multiple validation datasets.
That's the best. And in this case,
we kept sort of held out
data that we never looked at before.
>> The held-out data, was it from a different time period
or from different users?
>> Users. Just a different set of users.
Yeah. And I mean, there's a whole line of work on how
you come up with the right
held-out dataset and so on, but yeah.
So, then finally, we also want to go back to
that collective idea of how do we enforce
consistency across predictions and here,
we want to encode some notion of homophily that says that
similar people have similar behavior.
And so, here, to get similarity,
we look at cosine similarity between tweets of users,
and that gives us a score that we can then use to say
that similar users should either
both recover or both not recover.
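(A minimal sketch of the similarity computation behind this homophily rule, using TF-IDF and cosine similarity from scikit-learn. The toy tweets are illustrative assumptions.)

```python
# Minimal sketch: cosine similarity between users' aggregated tweets,
# which can feed a homophily rule like
#   similar(u1, u2) & recovers(u1) -> recovers(u2).
# The toy tweets are illustrative only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

user_tweets = {
    "u1": "90 days sober today so proud of my recovery",
    "u2": "celebrating another month of sobriety and recovery",
    "u3": "heading to the bar for beers with friends tonight",
}

users = list(user_tweets)
tfidf = TfidfVectorizer().fit_transform([user_tweets[u] for u in users])
sims = cosine_similarity(tfidf)

for i in range(len(users)):
    for j in range(i + 1, len(users)):
        # The [0, 1] similarity score can be used directly as the soft
        # truth value of similar(u_i, u_j) in the model.
        print(users[i], users[j], round(float(sims[i, j]), 3))
```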
And so like I said,
there are a lot of other rules that I didn't show here,
but this full combined approach was able to
outperform a text only
baseline for predicting relapse after 90 days.
But I think the more interesting thing is that
there are real examples in the tweets
of sort of enabling and supportive behavior,
and that goes to show that this richer model, which fused
different patterns in the language
and encoded different dependencies, was actually
getting at real behaviors
and interactions that were happening in the data.
So, you know, someone who is going to AA
might talk about how they
want to drink and friends
might actually encourage them by saying,
you know, "Yeah, let's have a beer," or "Let's drink."
Whereas we also see
friends who exhibit a lot of
supportive behavior towards recovery as well.
So the takeaway here is that by fusing
signals we can combine sort
of sources of different reliabilities,
we can capture more nuanced
dependencies than we would otherwise.
And importantly, like I said,
you might have different models
in your head of how the world might play out.
And you can kind of encode all of these
together, or you can evaluate
different sets of models and see which bears out
best on the data, in a unified framework.
And so the last part of my talk is going to look at
some efforts toward supporting
causal inference and discovery on observational datasets.
So for this, I'm going to
look at a mood modeling dataset,
this was an application that came out of
UC Santa Cruz, where users
can log on and log across a whole range of time,
a whole bunch of different behavioral factors.
So they might rate their mood,
energy, sleep and so on, on any given day.
And on top of this,
they include a text description of their day.
So this is different and unique in that
not only do you have the
standard observational measurements
of a whole bunch of variables,
you also have free-form text going with every single instance.
And we might want to ask a causal question like,
we want to estimate the causal effect
of exercise on mood.
And we're currently focusing on
this particular link because it's
well-validated in literature and so it's a good sort of
gold standard to try and study.
So with the standard tools for causal analysis,
you might first perform matching:
for all treatment units,
so in this case we can say that the treated
are those that exercise,
we want to find
control units that are most
similar to the treatment units.
And we can get similarity from a whole bunch of
different techniques, including Euclidean distance
and nearest-neighbor matching.
Then there are
a lot of different techniques that support
the causal estimation, including
looking at the average difference
between the control and
treatment groups, and regression;
there are a lot of more
sophisticated techniques beyond this,
but regression is one of the simplest ones.
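(A minimal sketch of the matching-plus-regression pipeline just described: nearest-neighbor matching on covariates, then a regression-adjusted effect estimate. The synthetic data and the "exercise" and "mood" variable names are illustrative assumptions.)

```python
# Minimal sketch: nearest-neighbor matching on covariates, then a
# regression-adjusted estimate of the effect of treatment on outcome.
# Synthetic data; 'exercise' (treatment) and 'mood' (outcome) are illustrative.
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 500
covariates = rng.normal(size=(n, 3))                 # e.g. sleep, energy, stress
treatment = (covariates[:, 0] + rng.normal(size=n) > 0).astype(int)  # exercise
mood = 1.5 * treatment + covariates @ np.array([0.8, -0.5, 0.3]) + rng.normal(size=n)

treated = np.where(treatment == 1)[0]
control = np.where(treatment == 0)[0]

# Match each treated unit to its nearest control unit in covariate space.
nn = NearestNeighbors(n_neighbors=1).fit(covariates[control])
matched_control = control[nn.kneighbors(covariates[treated])[1][:, 0]]

# Estimate the effect on the matched sample with a simple regression
# of the outcome on treatment plus covariates.
idx = np.concatenate([treated, matched_control])
X = np.column_stack([treatment[idx], covariates[idx]])
effect = LinearRegression().fit(X, mood[idx]).coef_[0]
print(f"estimated effect of exercise on mood: {effect:.2f}  (true effect: 1.5)")
```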
But there are sort of
requirements for causal inference to be sound.
And so the first one is that we need to include
all these common causes of both the treatment
and outcome both in our matching and regression.
So, we have these confounding variables
and they should be included in our analysis.
But, we want to avoid spurious associations
and that can come from
including these collider variables,
which both the treatment and outcome cause,
and we want to exclude such variables from our analysis.
The problem is that especially when we have
these observational datasets like the one I showed you,
there are so many unmeasured latent confounders.
There's no way that just from looking
at that observational data alone,
we could have measured every single confounder.
So, the first push, which
is an ongoing effort with my psychology and sociology
collaborators at UC Santa Cruz, is to use
the text as a way to come up with
proxies for all these
potentially hidden confounder variables.
So, here we're proposing to improve
the matching techniques by turning to LIWC categories,
again to come up
with text-based variables
that are potentially correlated
with both outcome and treatment, and treat them as
proxies for the latent confounders in
the observational dataset. Yeah?
>> [inaudible]
>> Can you say that louder.
>> [inaudible]
>> Do you mean to say is this like how do you-
>> [inaudible]
>> Avoid-
>> [inaudible] >> The collider, sorry.
>> [inaudible] distinguish the text.
>> From the text,
we're not going to try and distinguish that.
And in just a few more slides,
I'm going to introduce a method that can
then give us a tool to better ask and
answer these questions of identifiability.
So we'll get there in just a second.
So that kind of starts to bring us
exactly to this question: if
we're just looking at text,
it's not enough to just
look at correlations, right? Because we
might end up conditioning
on potential colliders that
could introduce selection bias.
And so for this,
I'm going to turn away
from standard causal inference
and go towards causal graphs.
And so there's a whole community on
causal graphical models, where
there's a semantics to
the edges that says that a parent
causes, or is a direct cause of,
the child, which means that changes in
the parent's values will change the child's values.
And so one of the important uses of a causal graph
is that it gives us
a language and a tool to
answer questions about identifiability.
So it tells us which causal inferences can
I actually make from this data because I
have the correct set of confounders.
It can tell us about
what variables not to
condition on, so that we avoid selection bias.
But the problem is that,
especially in a big dataset
or in a dataset where you're
mining confounders from text,
you may not know this graph at all,
or you might know only parts of this graph.
And so, in the causal discovery literature,
there's a lot of work on finding the structure or
at least parts of the structure
from observational data alone.
So you want to maximally
orient these causal relationships,
and one way to do this is with
constraints from the observational data.
And these constraints come from
conditional independence tests run
on the observational data.
And because this graphical model
encodes a set of conditional independence statements,
by doing these tests on data we can basically reverse-
engineer parts of the graph, or
it gives us restrictions on what the valid graphs can be.
And I want to keep emphasizing that of course you're
not going to be able to find all the causal edges,
not all of them will be identifiable,
but you basically will
get some undirected edges and some directed edges.
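(A minimal sketch of one common kind of constraint input: a partial-correlation conditional independence test under a Gaussian assumption with a Fisher z-transform. This is just one standard choice of CI test, not necessarily the one used in the work, and the toy chain data is illustrative.)

```python
# Minimal sketch: a partial-correlation conditional independence test
# (Gaussian assumption, Fisher z-transform). Test results like these are
# the kind of constraints fed into the causal-discovery inference problem.
import numpy as np
from scipy import stats

def cond_independent(data, i, j, cond, alpha=0.05):
    """Test X_i independent of X_j given X_cond (columns of `data`)."""
    cols = [i, j] + list(cond)
    prec = np.linalg.pinv(np.cov(data[:, cols], rowvar=False))  # precision matrix
    partial_corr = -prec[0, 1] / np.sqrt(prec[0, 0] * prec[1, 1])
    n, k = data.shape[0], len(cond)
    z = 0.5 * np.log1p(2 * partial_corr / (1 - partial_corr))   # Fisher z
    stat = np.sqrt(n - k - 3) * abs(z)
    p_value = 2 * (1 - stats.norm.cdf(stat))
    return p_value > alpha    # True -> consistent with independence

# Toy chain X0 -> X1 -> X2: X0 and X2 are dependent marginally,
# but independent given X1.
rng = np.random.default_rng(1)
x0 = rng.normal(size=2000)
x1 = x0 + rng.normal(size=2000)
x2 = x1 + rng.normal(size=2000)
data = np.column_stack([x0, x1, x2])
print(cond_independent(data, 0, 2, cond=[]))    # expect False (dependent)
print(cond_independent(data, 0, 2, cond=[1]))   # expect True  (independent)
```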
So we recently had a paper
where we cast this problem
of discovering causal relationships
as an inference problem.
So for all pairs of variables,
we associate a causal prediction variable
and something called an ancestral prediction variable.
And ancestor here refers
to long and indirect causal links,
so a linear path of causal relationships.
And the inputs to this problem will
be these independence statements
and conditional independence statements.
And from these, we can also infer adjacencies,
which are undirected edges.
And I'm not going to go
into all the constraints that we used,
but I just want to
illustrate some of the common ones that come up.
So, the first ones that we can
encode as constraints are for
finding colliders along a path.
And then by fusing
together this causal prediction and ancestral prediction,
we have rules as well that can model common parents.
And then, there's a lot
of work on how you can use constraints to
characterize ancestral edges and
we use that literature as well to come up with
constraints just from statistical tests
that tell us about the ancestral graph.
So, the point here is not to go into
detail on these rules and what
these constraints mean and how they're derived,
but the point is to show that with a framework,
like probabilistic soft logic,
we were able to encode
well-understood constraints and fuse
them together in one system.
Whereas previously, these were done iteratively.
So the constraints would be applied one rule at a time,
and would propagate a lot of
prediction errors, and here
we do this as a joint inference problem.
And the application of this
was in the genomic setting.
So, we were able to successfully
show an application in
predicting gene regulatory networks
and we saw significant improvements
as well as results on being
able to fuse text
in this kind of discovery inference problem.
But I'm not going to go into those results here.
Instead, I want to
go back to this problem of
why this graph is important to our initial problem.
And, the main takeaway here is that,
the graph that we infer can be a tool
that we then use to
answer questions about identifiability,
and analyzing this data can help us to understand
what spurious associations might arise
by conditioning on colliders, and so on.
And understanding the paths can help
us search for confounders as well.
The last thing I also want to just mention,
an open question right now that we're
looking at in this dataset, is
modeling sort of the structure between observations,
these entries, and users.
So again, this is an aggregation problem, and there's been
a lot of work in the literature
where they say that, you know,
for a user if I have all these entries,
I'm going to assign them to treatment if
they exercise even once.
But that is maybe
a very poor assumption that we don't want to make here,
we might want to come up
with better ways of aggregating data.
So that's something we're studying,
and there's a lot
of other work that I've done that I didn't go over today.
So, I've also worked on discovering rules
automatically from these kind of relational data sets as
well as applying
all these different modelling patterns
to biological applications.
So I'd like to just quickly conclude with
a roadmap of my future research,
and I'm going to highlight some work from
MSR to just show how
I think I can be a complementary fit.
So, first, I think that there's a great opportunity to
unify unsupervised and structured methods
for text analysis.
So, there's a lot of work on topic
modelling, and word embeddings,
and factorization methods to
understand interaction between people and attitudes,
and here I think there's an opportunity to include
structured prediction methods like the ones I work on
and learn biases on social and news media,
as well as characterize
evolving ideologies and ties between groups.
Then, I think that there's an opportunity to
continue to combine other kinds of data for causality.
So, here we looked at text,
and I think we
can continue to exploit additional sources of data to
discover hidden variables as well as
identify potential outcomes and
treatments from text data,
or social media data.
Also, there's a lot of work on
detecting causal relationships
from text and understanding why
people do things just based on language.
And so I think,
detecting and characterizing reasons
from text data is a very interesting problem.
And finally, there's a lot of work
on understanding spreads and
diffusions of ideas on networks,
and there it's already very common in practice to look at
structure and the relationships
of interactions between users,
and here I propose to
incorporate a more comprehensive model of the user.
So, you can use
the language and social media data to understand
that if you're looking at the spread of fake news,
understand that some users are actually
maybe just more prone to being gullible,
or are prone to being influenced by fake news and so on.
And, my work on
discovering rules and interactions
I think can also be applied
to discover new patterns of
interactions that were not obvious to
us before, from the data itself.
So, in a nutshell,
we saw that these methods that can exploit structure
are more broadly applicable templates
in many different social science problems,
and the methods that can then fuse
different signals can help
us capture more nuanced dependencies.
And then, it was important to
leverage new modes of
evidence to support
causal inference from observational data.
And also by discovering models from data directly,
we can help support
better causal reasoning and better reasoning in general.
So, this is all my work.
For more details,
I'm happy to take questions. So, thank you.
