Active Search with Complex Actions and Rewards

Active Search with Complex Actions and Rewards
Yifei Ma
www.cs.cmu.edu/~yifeim/active_search.pdf (or *.pptx)
Collaborators: Jeff Schneider, Roman Garnett, Tzu-Kuo Huang, Dougal Sutherland
Active Search
Assume: a set of instances with some smoothness structure.
Action: query the user/environment for a label value.
Goal: find all positive instances as soon as possible. [Azimi 2011]
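The loop above can be sketched as follows; `model.predict_proba` and `oracle` are hypothetical interfaces standing in for the smoothness model and the human labeler (a minimal one-step greedy sketch, not the full algorithm):

```python
import numpy as np

def active_search(model, candidates, oracle, budget):
    """Greedy one-step active search: repeatedly query the instance with
    the highest posterior probability of being positive.
    `model` and `oracle` are hypothetical interfaces for illustration."""
    found, labeled = [], {}
    for _ in range(budget):
        pool = [i for i in candidates if i not in labeled]
        if not pool:
            break
        probs = model.predict_proba(pool, labeled)  # P(y_i = 1 | data so far)
        query = pool[int(np.argmax(probs))]
        y = oracle(query)                           # ask the user/environment
        labeled[query] = y
        if y == 1:
            found.append(query)
    return found
```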
Active Search
Example: find frauds among all users on a social network, with a human inspector. [image ©Tricia Aaderud 2012]
Active Search
Example: find polluted regions in a body of water from measurements. [image ©ying yang 2015]
Active Search
Search for all positive items with active feedback:
• Pollution in open water [Scerri 2014]
• Localization with an attention window [Houghten 2008, Haupt 2009]
• Public-opinion search
• Active personalization [Li 2010, Yue 2011]
• Hyper-parameter optimization [Swersky 2012]
My Research
State of the art: Euclidean space; point actions; point rewards.
Active search, by action and reward type:
Point action:
• Point rewards: Σ-optimality for active learning on graphs [Ma 2013]; active search on graphs [Ma 2014]
• Region rewards: active area search [Ma 2014]; active pointillistic pattern search [Ma* 2015]
Group action:
• Active search for sparse rewards (in progress)
• A theory of everything?
Active Search on Graphs
Assume: known graph; smooth label function.
Task: find all positive nodes interactively.
Applications: find target emails, public-opinion search, etc.
Approach: exploit + explore.
My focus: What is good exploration? [NIPS 2013] How to incorporate it into active search? [UAI 2015]
Good Exploration Is Similar to Experimental Design
Optimal design [Smith 1918]: design experiments to optimize some criterion (e.g. variance, entropy), blind to the actual observations.
Criteria: D-optimality, V-optimality, Σ-optimality.
E.g. regression: y_i = x_iᵀβ; design x_i, observe y_i, learn β.
Also: kernel regression / Gaussian processes.
[image: NOAA (http://www.photolib.noaa.gov/htmls/theb1982.htm), public domain, via Wikimedia Commons]
Gaussian Random Fields
[Zhu 2004]
Prior: p(f) ∝ exp(−½ fᵀ L f)
f_i: node value. A: adjacency matrix. L = D − A: graph Laplacian.
Design s, observe y_s, learn f?
Example: on a 4-regular grid, L has 4 on each diagonal entry and −1 at each pair of adjacent nodes.
(Posterior) covariance matrix: conditioning on the observed nodes s leaves the unlabeled block, with covariance (L_uu)⁻¹.
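A minimal sketch of this conditioning, assuming the standard GRF formulation (prior precision given by a slightly regularized Laplacian; `grf_posterior` and `eps` are illustrative names):

```python
import numpy as np

def grf_posterior(A, s, y_s, eps=1e-6):
    """GRF posterior on a graph: prior f ~ N(0, (L + eps*I)^{-1}).
    Conditioning on exact labels y_s at nodes s gives, for the unlabeled
    nodes u, covariance (L_uu)^{-1} and mean -L_uu^{-1} L_us y_s."""
    n = A.shape[0]
    L = np.diag(A.sum(axis=1)) - A + eps * np.eye(n)  # regularized Laplacian
    u = np.array([i for i in range(n) if i not in set(s)])
    L_uu = L[np.ix_(u, u)]
    L_us = L[np.ix_(u, list(s))]
    cov_u = np.linalg.inv(L_uu)                 # posterior covariance on u
    mean_u = -cov_u @ L_us @ np.asarray(y_s)    # posterior (harmonic) mean
    return u, mean_u, cov_u
```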
D-Optimality (Information Criterion)?
Mutual information is roughly marginal variance:
• Near-optimal sensor placement [Krause 2008]
• GP-Bandit [Srinivas 2010]
• Level-set estimation [Gotovos 2013]
• Bandits on graphs [Valko 2014]
Wastes samples at boundaries.
Visualization: D-Optimality Picks Outliers
High effective dimension: chooses the periphery.
Coauthorship graph: 1711 nodes, 2898 edges.
Labels (author area): machine learning, data mining, information retrieval, databases.
V-Optimality [Ji 2012]
Marginal variance minimization.
The variance may not be bounded.
Σ-Optimality for Active Surveying
Bayesian optimal active search and surveying [Garnett 2012]: model the class proportion; Bayesian risk minimization.
What happens when it is applied to graphs? Cluster centers? [Ma 2013]
Σ-Optimality on Graphs
[Ma 2013]
Cluster centers!
Better active-learning accuracy.
Monotone & submodular: greedy selections.
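The greedy selection can be sketched as follows, assuming the usual rank-1 look-ahead of a Gaussian posterior: querying node v reduces 1ᵀC1 by (Σ_i C_iv)² / (C_vv + noise), so the greedy rule picks the maximizer (`noise` and the exact normalization are illustrative):

```python
import numpy as np

def sigma_optimal_greedy(C, budget, noise=1.0):
    """Greedy Σ-optimality sketch: each pick maximizes the reduction of
    the covariance grand sum 1'C1, then applies the rank-1 downdate."""
    C = C.copy()
    chosen = []
    for _ in range(budget):
        colsum = C.sum(axis=0)
        score = colsum ** 2 / (np.diag(C) + noise)   # reduction of 1'C1
        score[chosen] = -np.inf                      # no repeated queries
        v = int(np.argmax(score))
        chosen.append(v)
        C -= np.outer(C[:, v], C[v, :]) / (C[v, v] + noise)  # rank-1 update
    return chosen
```

On a chain-structured covariance the first pick is the center node, matching the "cluster centers" behavior above.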
[Bar chart: active-learning accuracy (higher is better) of unc / ig / mig / eer / vopt / sopt.]
DBLP Coauthorship
1711 nodes (scholars); 2898 edges (coauthorship).
Labels: machine learning, data mining, information retrieval, databases.
2D embedding: OpenOrd. Used in Ji & Han 2012.
Cora Citation Graph
2485 nodes (papers); 5069 edges (undirected citations).
Labels: case based, genetic algorithms, neural networks, probabilistic methods, reinforcement learning, rule learning, theory.
Created by McCallum et al. 2000.
Citeseer Citation Graph
2109 nodes (papers); 3665 edges (presence of a citation).
Labels: Agents, AI, DB, IR, ML, HCI.
Handwritten Digits
1797 images at 8×8 resolution; 4-nearest-neighbor graph.
Random subsample of 70% of the digits; first query fixed at the largest-degree node.
Isolet 1+2+3+4
6238 spoken-letter recordings; 617-dimensional frequency features.
5-nearest-neighbor graph from the raw input.
Random subsample of 70% of the instances; first query fixed at the largest-degree node.
Active Surveying
Results (higher is better) on DBLP Coauthorship, Cora Citation, and Citeseer Citation.
Insights? Break It Down to the Greedy Update
Write the current covariance matrix C.
The look-ahead covariance is a rank-1 update: C ← C − C·,v Cv,· / (Cvv + σ²).
The idea: ℓ1- vs. ℓ2-style criteria for robustness (Σ- vs. V-optimality).
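A quick numerical check of this rank-1 look-ahead, using the standard Gaussian-conditioning identities (all names here are illustrative):

```python
import numpy as np

# The greedy look-ahead rank-1 covariance update equals exact Bayesian
# conditioning on one noisy observation at node v: by Sherman-Morrison,
# (C^{-1} + e e^T / noise)^{-1} = C - C[:,v] C[v,:] / (C_vv + noise).
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))
C = X @ X.T + np.eye(6)                 # a "current" SPD covariance
v, noise = 2, 0.5
e = np.zeros(6); e[v] = 1.0

# rank-1 look-ahead update used by the greedy criteria
C_look = C - np.outer(C[:, v], C[v, :]) / (C[v, v] + noise)

# exact conditioning: the posterior precision gains e e^T / noise
C_bayes = np.linalg.inv(np.linalg.inv(C) + np.outer(e, e) / noise)

assert np.allclose(C_look, C_bayes)
```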
Active Search on Graphs [Ma 2015]
Select observations by trading off exploitation (posterior mean) against exploration (posterior std).
In practice, variance-style criteria often explore boundaries first:
• Near-optimal sensor placement [Krause et al. 2008]
• Level-set estimation [Gotovos et al. 2013]
• Spectral-UCB [Valko et al. 2014]: graph kernel from the regularized Laplacian; algorithm unchanged and theoretical guarantees improved, but it still explores boundaries first.
This work (GP-SOPT): a new exploration term that favors cluster centers over boundaries, measured by the improvement of Σ-optimality [Ma et al. 2013].
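An exploit+explore score in this spirit might look like the following; the exact GP-SOPT weighting differs, so treat `alpha`, `noise`, and the bonus normalization as assumptions:

```python
import numpy as np

def select_next(mu, C, queried, alpha=0.3, noise=1.0):
    """Sketch of a GP-SOPT-style rule: posterior mean plus a Σ-optimality
    exploration bonus (square root of the reduction of 1'C1), which is
    largest at cluster centers rather than at boundaries."""
    bonus = C.sum(axis=0) / np.sqrt(np.diag(C) + noise)
    score = mu + alpha * bonus
    score[list(queried)] = -np.inf      # never re-query a node
    return int(np.argmax(score))
```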
Active Search on Graphs [Ma 2015]: Experiment
Populated places: 5000 nodes; edges: Wikipedia links.
Search targets: 725 administrative regions among countries, cities, towns and villages.
Wikipedia Pages on Programming Languages
Nodes: 5271 programming languages; targets: 202 object-oriented-programming pages.
Most targets reside in one large hub.
Citation Network
Nodes: 14117 papers; targets: 1844 NIPS papers.
Targets appear in many small components.
Enron Scandal Emails
Nodes: 20112 emails; targets: 803 emails related to the downfall of Enron.
Regret Analysis
High-probability regret bounds for GP-SOPT.TT / GP-SOPT.TOPK: regret is defined against the best non-repeating active strategy, and the bounds scale with the maximum information gain of the observations.
Experiments:
• Populated Places: 725 targets (administrative regions) in 5,000 nodes; targets spread over components of varying sizes.
• Wikipedia Pages on Programming Languages: 202 object-oriented targets.
Further Questions
Modeling covariance in high dimensions:
• Side information [Alon et al. 2014]
• Sparse additive models [Kirthevasan 2015]
Further Questions
Markov decision processes; restless bandits; reinforcement learning.
Further Questions
Dynamics? Partial observation?
Graph Laplacian vs. covariance?
Matching lower bound?
Table of Contents
Graph-based · Region rewards · Group actions
[images: wikipedia; Charginghawk, wikipedia]
Region Rewards
Which regions of the water have been polluted?
Telephone polls to search for the opinions of a group of people?
[images: crw-cmu; wikipedia]
Related Settings
Bayesian optimization (BO) [Mockus 1989; Jones 1998; Brochu 2010]
Simple reward (BO) vs. cumulative rewards (active search) [Azimi 2011]
Active Area Search (AAS) [Ma 2014]
Point actions (upper level): pay to observe; a GP connects point observations to regions.
Region rewards (lower level): reward when a region integral exceeds a threshold.
Input: GP prior, region definitions, threshold.
[image: Ying Yang]
1D Toy Example
Circles: collected observations; blue: GP posterior; gray/green: posteriors of the region integrals.
Method: maximize the 1-step look-ahead expected reward.
[Figure: posterior, candidate actions, and expected reward per region.]
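The region-reward quantity can be sketched as follows, assuming a discretized region whose integral is approximated by an average of grid points, so it is Gaussian under the GP posterior (`region_exceed_prob` is an illustrative name):

```python
import math
import numpy as np

def region_exceed_prob(mu, C, region_idx, tau):
    """Probability that a region's integral (here: average of its grid
    points) exceeds threshold tau, given a GP posterior N(mu, C) on the
    grid. A linear functional of a Gaussian is Gaussian, so the answer
    is a normal CDF evaluated in closed form via erf."""
    w = np.zeros(len(mu))
    w[region_idx] = 1.0 / len(region_idx)   # averaging weights
    m = w @ mu                              # mean of the region average
    s = math.sqrt(w @ C @ w)                # std of the region average
    return 0.5 * (1 + math.erf((m - tau) / (s * math.sqrt(2))))
```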
Properties of Action Selection
Point selection: design x* to minimize the variance of the region integral [Minka]; connects to Σ-optimality.
Region selection: high posterior mean AND a large ratio of variance reduction after the 1-step look-ahead (under simplifying assumptions).
Water Quality (Dissolved Oxygen)
Ground-truth dots: raw measurements; target regions: low values (black).
Measurements re-picked via active search with region rewards.
Water Quality (Dissolved Oxygen)
Recall of target regions for the re-picked measurements.
PA Election (Races vs. Precincts)
Search for positive electoral races with precinct queries.
Dots: precinct centers; same color: same race (the regions).
Kernel built on precincts from demographic information.
[Map data ©2015 Google]
Identify Fluid Flow Vortices via Point Observations [Ma 2015]
Observe point velocity vectors.
Objective: overlapping 11×11 windows that contain a vortex.
Classifier: a 2-layer neural net.
Region Rewards, Further Thoughts
Define region rewards in multi-armed bandits?
• Use a bandit algorithm (UCB) to select regions
• Use mutual information to select points within a region
Can we obtain a better regret rate?
Relation to multi-task Bayesian optimization [Swersky 2013]?
A region integral (AAS) ≈ the average value over multiple tasks (BO).
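A hypothetical sketch of this two-level bandit view; the UCB constant, the variance-reduction proxy, and all names are assumptions for illustration, not the paper's algorithm:

```python
import numpy as np

def pick_region_and_point(means, counts, t, C, regions, noise=1.0):
    """Two-level sketch: UCB over region reward estimates picks a region,
    then the point whose observation most reduces the variance of that
    region's average (a mutual-information-style proxy) is queried."""
    ucb = means + np.sqrt(2 * np.log(t) / np.maximum(counts, 1))
    r = int(np.argmax(ucb))                    # UCB over regions
    idx = np.array(regions[r])
    w = np.ones(len(idx)) / len(idx)           # region-average weights
    gain = (C[np.ix_(idx, idx)] @ w) ** 2 / (np.diag(C)[idx] + noise)
    return r, int(idx[int(np.argmax(gain))])
```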
Table of Contents
Graph-based · Region rewards · Group actions
[images: wikipedia; Charginghawk, wikipedia]
Group Actions
Test mixtures of drugs?
Fly high to obtain noisy observations?
[image: Charginghawk, wikipedia]
Related Work
Adaptive compressive sensing does not help [Davenport 2011], except for log factors [Haupt 2009].
Adaptive constrained sensing:
• Fixed batch-size MAB [Yue 2011]
• Twenty questions [Jedynak 2012]
Other constraints? Non-i.i.d. signals? [Hegde 2015]
Demo of IG Criterion
Actions: any subset of arms; outcome: mean value ± constant noise.
Search objective: find the non-zero entry of a 1-sparse vector.
Method: maximize the information gain (IG) of an action.
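For intuition, here is a toy IG computation under an extra simplifying assumption: a noiseless outcome that only reveals whether the target lies in the queried subset. The gain is then the binary entropy of the subset's posterior mass, so the best action splits the posterior in half (binary search):

```python
import math

def info_gain(posterior, subset):
    """Expected information gain of querying `subset` about a 1-sparse
    support, assuming the outcome only reveals membership (simplified,
    noiseless variant of the demo setting). Equals the binary entropy
    of q = P(target in subset)."""
    q = sum(posterior[i] for i in subset)
    if q in (0.0, 1.0):
        return 0.0                      # outcome is certain: no gain
    return -q * math.log2(q) - (1 - q) * math.log2(1 - q)
```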
Thank you!
Point action:
• Point rewards: Σ-optimality for active learning on graphs [Ma 2013]; active search on graphs [Ma 2014]
• Region rewards: active area search [Ma 2014]; active pointillistic pattern search [Ma* 2015]
Group action:
• Active search for sparse rewards (in progress)
• A theory of everything?