Spatially Smoothed Kernel Density Estimation via Generalized Empirical Likelihood

advertisement
Spatially Smoothed Kernel Density Estimation via
Generalized Empirical Likelihood
Kuangyu Wen & Ximing Wu
Texas A&M University
Info-Metrics Institute Conference: Recent Innovations in Info-Metrics
October 2014
Spatial Kernel Density Estimation
1 / 20
Motivation
Nonparametric density estimation offers a flexible alternative to
parametric method.
To obtain the same level of precision as that of its parametric
counterpart, a larger sample size is desired in nonparametric
estimation due to its slower convergence rate.
Given similarity/dependence among distributions across geographic
units, efficiency gain is possible via information pooling; particularly
so in nonparametric estimations with small sample size for each unit.
Example: Crop yield distributions are typically estimated
(individually) at the county level. Taking into account their spatial
dependence may improve estimation.
Spatial Kernel Density Estimation
2 / 20
Nonparametric estimation of spatially dependent densities
An information-theoretic nonparametric estimation procedure
Weighted Kernel Density Estimator (WKDE) that satisfies moment
conditions implied by spatial dependence among units
Observation weights are the implied Generalized Empirical Likelihood
(GEL) probabilities associated with spatial moment conditions.
GEL offers a flexible yet simple framework to incorporate
out-of-sample information into density estimation.
Computationally inexpensive; joint estimation of a large number of
spatially dependent densities is avoided.
Spatial Kernel Density Estimation
3 / 20
Talk plan
Weighted KDE (WKDE) via the GEL
Spatial smoothing of KDE
Monte Carlo Simulations
Empirical example
Spatial Kernel Density Estimation
4 / 20
Kernel Density Estimation (KDE)
Data: X1 , . . . , XT i.i.d. from a continuously differentiable distribution
F with density f
KDE:
T
1 X
fˆ(x) =
K
Th
t=1
x − Xt
h
≡
T
1 X
Kh (x − Xt )
T
t=1
where K is a kernel function and h the bandwidth.
Weighted KDE (WKDE)
fˆ(x) =
T
X
pt Kh (x − Xt )
t=1
where p = (p1 , . . . , pT ) are non-negative observation weights that
sum to one.
Spatial Kernel Density Estimation
5 / 20
Incorporation of out-of-sample information
Incorporation of out-of-sample information, for instance population
moments or theoretical properties, may enhance the KDE.
Suppose out-of-sample information comes in the form
E[gm (x)] = µm , m = 1, . . . , M,
where gm are known real functions.
Chen (1997) uses WKDE to incorporate out-of-sample information;
the weights is calculated via empirical likelihood (Owen, 1988; 2001)
max
p1 ,...,pT
s.t.
T
X
t=1
T
X
log pt
pt gm (Xt ) = µm , m = 1, . . . , M,
t=1
pt ≥ 0,
T
X
pt = 1.
t=1
Spatial Kernel Density Estimation
6 / 20
Empirical likelihood weighted KDE:
fˆEL (x) =
T
X
p̂t Kh (x − Xt )
t=1
where p̂ = (p̂1 , . . . , p̂T ) are the implied EL probabilities.
Under mild regularity conditions
abias(fˆEL (x)) = abias(fˆ(x)) + o(n−1 )
avar(fˆEL (x)) = avar(fˆ(x)) − r (x)f 2 (x)/n + o(n−1 )
where r (x) : R → R+ depends on the unknown density f and
moment functions g = (g1 , . . . , gM ).
An important implication: fˆEL can use the optimal bandwidth for fˆ;
joint search of p and bandwidth is not required.
Spatial Kernel Density Estimation
7 / 20
Extension to Generalized Empirical Likelihood
Cressie-Read power discrepancy
CRν =
T
X
1
(npt )−ν − 1
ν(ν + 1)
t=1
Generalized Empirical Likelihood (GEL)
T
X
1
min
(npt )−ν − 1
p1 ,...,pT ν(ν + 1)
t=1
s.t.
T
X
pt gm (Xt ) = µm , m = 1, . . . , M,
t=1
pt ≥ 0,
T
X
pt = 1.
t=1
ν = 0 =⇒ EL
ν = 1 =⇒ Exponential Tilting (ET) or maximum entropy
Spatial Kernel Density Estimation
8 / 20
Spatial smoothing of KDE
Consider N spatially dependent densities fi , i = 1, . . . , N; for each i,
we observe Yi = (Yi,1 , . . . , Yi,T ) i.i.d. from fi .
Given selected moment functions g = (g1 , . . . , gM ), define their
sample analog for each unit
ĝ(i)
T
1 X
g (Yi,t ), i = 1, . . . , N.
=
T
t=1
Spatial weights for unit i: wi = {wi,j }N
j=1 ; wi,j ≥ 0 and
PN
j=1 wi,j
= 1.
Spatially smoothed sample moments
g̃(i) =
N
X
j=1
Spatial Kernel Density Estimation
wi,j
!
T
N
X
1 X
g (Yj,t ) =
wi,j ĝ(j)
T
t=1
j=1
9 / 20
Estimate fi using the WKDE fˆEL or fˆET subject to spatial moment
conditions g̃(i) , i = 1, . . . , N.
For implication, we need to specify
spatial weights w = {w1 , . . . , wN }
moment functions g = (g1 , . . . , gM )
Spatial Kernel Density Estimation
10 / 20
Construction of spatial weights
di,j ≥ 0: the spatial distance between unit i and j
Nik : an indicator set including unit i and its k nearest neighbors
Spatial weights for unit i:
2 /γ )1(j ∈ N k )
exp(−di,j
i
i
, j = 1, . . . , N,
wi,j = PN
2
k
j=1 exp(−di,j /γi )1(j ∈ Ni )
where 1 is the indicator function and
1 X
di,j
γi =
k
k
j∈Ni
Spatial Kernel Density Estimation
11 / 20
Specification of moment functions
Consider for now the Exponential Tilting estimator fˆET . It can be
shown to be asymptotically equivalent to a minimum cross entropy
density estimator with the KDE fˆ as the reference density:
Z
f (x)
f˜ET = arg min f (x) log
dx
fˆ(x)
f
Z
s.t. f (x)dx = 1,
Z
gm (x)f (x)dx = µm , m = 1, . . . , M.
The solution takes the form
(
)
M
X
f˜ET (x) = fˆ(x) exp λ̃0 +
λ̃m gm (x) = fˆET (x) + op (n−1 )
m=1
where λ̃0 is a normalization constant.
Spatial Kernel Density Estimation
12 / 20
Thus fˆET is seen to modify fˆ with a multiplicative adjustment, which
consists of a basis expansion in exponent. Therefore the selection of
moment functions for fˆET amounts to the selection of basis functions
for nonparametric jminimum cross entropy density estimation.
Power series, g (x) = (x, x 2 , x 3 , ...), is customarily used, but can be
sensitive to outliers and require the existence of population moments
up to order M.
We use spline basis due to its flexibility and robustness.; e.g.,
g (x) = (x, x 2 , (x − K1 )2+ , (x − K2 )2+ , . . . , (x − KM )2+ ),
where (x)+ = max(0, x) and K1 , . . . , KM are spline knots. Only the
first two moments are required to exist.
The resultant density can be viewed as a hybrid of KDE and log
spline estimator (Kooperberg and Stone, 1992; Stone, et al., 1997)
Spatial Kernel Density Estimation
13 / 20
Monte Carlo Simulations
N = 100 locations in a unit square I
Construct a 20 × 20 grid evenly spaced over I
Randomly draw 100 points from the grid without replacement
0.0
0.4
0.8
Calculate the distance matrix D = [di,j ]
0.0
Spatial Kernel Density Estimation
0.2
0.4
0.6
0.8
1.0
14 / 20
Constructions of spatially dependent distributions
Assume fi ∼ Skewed Normal (µi , σi , αi ) for i = 1, · · · , N.
µ = (µ1 , · · · , µN )> , σ = (σ1 , · · · , σN )> and α = (α1 , · · · , αN )> .
A N-dimensional vector θ follows spatial error model SpE (β, λ, σ ) if
θ =β+η
η = λWη + ,
where ∼ N 0, σ2 IN and the spatial weighting matrix W is
constructed as
1
2
∗ ∗
W∗ = wi,j
, wi,j = 1 if i 6= j and the location i is one of the 5 nearest
∗
neighbors of the location j; otherwise wi,j
= 0;
PN
∗
∗
row standardization, i.e. W = [wi,j ], wi,j = wi,j
/ j=1 wi,j
.
Spatial Kernel Density Estimation
15 / 20
We set µ ∼ SpE (−1, 0.8, 0.03), log (σ) ∼ SpE (0, 0.8, 0.03) and
α ∼ SpE (−0.75, 0.8, 0.03).
Each vector of shape parameter is given by
θ = β + (IN − λW)−1 .
0.0
0.2
0.4
Randomly draw to obtain a set of values for each µ, σ and α
respectively.
−5
Spatial Kernel Density Estimation
−3
−1 0
1
2
16 / 20
Generate data Yi = {Yi,t }T
t=1 from fi , i = 1, . . . , N
Sample size T = 30 and 60
We consider the KDE fˆ and two WKDEs: fˆEL and fˆET .
For each unit, KDE bandwidth is selected according to the rule of
thumb.
Moment functions: quadratic splines with two knots at the 33th and
66th percentiles of the pooled sample:
g (y ) = (y , y 2 , (y − Q̄33 )2+ , (y − Q̄66 )2+ ),
where (y )+ = max(y , 0) and Q̄α is the αth sample percentile.
Number of nearest neighbors in the calculation of spatial weights:
k=5
Number of repetitions: R = 500
Spatial Kernel Density Estimation
17 / 20
Second experiment
Construct fi as a two-component normal mixture, i.e.
fi = ωi φ(µ1,i , σ1,i ) + (1 − ωi )φ(µ2,i , σ2,i ).
0.0
0.2
0.4
Parameters: Φ−1 (ω) ∼ SpE(Φ−1 (0.2), 0.6, 0.03),
µ1 ∼ SpE(−3, 0.6, 0.2), µ2 ∼ SpE(0, 0.6, 0.1),
log (σ 1 ) ∼ SpE(log(1.5), 0.6, 0.03) and log (σ 2 ) ∼ SpE(0, 0.6, 0.03).
−8 −6 −4 −2
Spatial Kernel Density Estimation
0
2
4
18 / 20
Evaluation of performance
(r )
Denote by fˆi an estimate of fi in the r th experiment, i = 1, . . . , N
and r = 1, . . . , R.
Global performance: average mean integrated squared error
N R Z
2
1 X X ˆ(r )
fi − fi
NR
i=1 r =1
Tail performance: let Qi,5 be the 5% quantile of the fi and
R
(r )
(r )
π̂i = x≤Qi,5 fˆi (x)dx, the average mean squared error of tail
probability estimation
N R
2
1 X X (r )
π̂i − 5%
NR
i=1 r =1
Spatial Kernel Density Estimation
19 / 20
Simulation results
MSE of KDE and ratio of WKDE MSE relative to that of KDE
T = 30
global
SN
tail
global
NM
tail
T = 60
global
SN
tail
global
NM
tail
Spatial Kernel Density Estimation
WKDE
ET
KDE
EL
0.0176
0.0017
0.0148
0.0013
63.51%
56.08%
77.01%
34.66%
63.89%
56.50%
78.21%
34.90%
0.0100
0.0009
0.0088
0.0006
62.88%
62.95%
75.49%
37.91%
65.05%
63.53%
76.74%
38.11%
20 / 20
Download