Fast Direction-Aware Proximity for Graph Mining
Speaker: Hanghang Tong
Joint work w/ Yehuda Koren, Christos Faloutsos
2007-8-13 KDD 2007, San Jose
• Undirected graph
– What is the proximity (Prox) between A and B?
– ‘How close is Smith to Johnson?’
[Figure: undirected graph containing A and B; every edge has weight 1]
But many real graphs are directed…
[Figure: directed graph with edge weights 1 and 0.5]
What is Prox from A to B?
What is Prox from B to A?
Motivating Questions (Fast DAP)
• Q1: How to define it?
• Q2: How to compute it efficiently?
• Q3: How to benefit real applications?
• DAP definitions
– Escape Probability
– Issue #1: ‘degree-1 node’ effect
– Issue #2: weakly connected pair
• Computational Issues
– FastAllDAP: all pairs
– FastOneDAP: one pair
• Experimental Results
• Conclusion
Defining DAP: escape probability
• Define a random walk (RW) on the graph
• Esc_Prob(A→B)
– Prob(a walk starting at A reaches B before returning to A)
[Figure: A, B, and the remaining graph]
Esc_Prob = Pr(smile before cry)
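The definition above can be checked by simulation. The sketch below is only an illustration: the 3-node graph, the function name `estimate_esc_prob`, and its parameters are assumptions, not taken from the talk.

```python
import random

def estimate_esc_prob(graph, start, target, trials=20000, max_steps=10000, seed=0):
    """Estimate Pr(walk from `start` reaches `target` before returning to `start`)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        node = start
        for _ in range(max_steps):
            neighbors = graph.get(node, {})
            if not neighbors:            # dead end: the walk cannot continue
                break
            nodes = list(neighbors)
            weights = [neighbors[v] for v in nodes]
            node = rng.choices(nodes, weights=weights)[0]
            if node == target:           # reached the target first
                hits += 1
                break
            if node == start:            # returned to the start first
                break
    return hits / trials

# Hypothetical directed graph (not the one drawn on the slide):
# a -> b (1.0), b -> a (0.5), b -> c (0.5), c -> b (1.0)
graph = {
    "a": {"b": 1.0},
    "b": {"a": 0.5, "c": 0.5},
    "c": {"b": 1.0},
}
esc_ab = estimate_esc_prob(graph, "a", "b")
esc_ba = estimate_esc_prob(graph, "b", "a")
```

On this toy graph the walk from a always steps straight to b, while the walk from b reaches a only when it does not detour through c, so the two directions get different scores.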
[Figure: example directed graph containing A and B; edge weights 1 and 0.5]
Esc_Prob(A→B) = 1 > Esc_Prob(B→A) = 0.5
• Issue #1:
– ‘Degree-1 node’ effect
• Issue #2:
– Weakly connected pair
Need some practical modifications!
Issue #1: ‘degree-1 node’ effect
[Faloutsos+] [Koren+]
[Figure: chain A → D → B; edge weights 1] Esc_Prob(A→B) = 1
[Figure: the same chain with degree-1 nodes E and F attached to D; edge weights 1 and 1/3] Esc_Prob(A→B) = 1
• Degree-1 nodes (E, F) have no influence!
– Known as the ‘pizza delivery guy’ problem in undirected graphs
• Solution: Universal Absorbing Boundary!
Universal Absorbing Boundary
[Figure: every node has an edge with probability 0.1 into the Universal Absorbing Boundary (U-A-B)]
Footnote: fly-out probability = 0.1
U-A-B is a black hole!
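A minimal sketch of how the boundary could be added to a row-normalized transition matrix. The function name `add_absorbing_boundary`, the dense list-of-lists representation, and the sink's self-loop are assumptions made for illustration.

```python
def add_absorbing_boundary(P, fly_out=0.1):
    """Return an (n+1) x (n+1) transition matrix whose last node is the sink."""
    c = 1.0 - fly_out
    n = len(P)
    # Scale every original transition by c and send the rest to the sink.
    augmented = [[c * p for p in row] + [fly_out] for row in P]
    # Assumption: the sink only loops back to itself, so nothing ever escapes.
    augmented.append([0.0] * n + [1.0])
    return augmented

# Hypothetical 3-node row-normalized transition matrix.
P = [
    [0.0, 1.0, 0.0],
    [0.5, 0.0, 0.5],
    [0.0, 1.0, 0.0],
]
P_uab = add_absorbing_boundary(P)
```

Every row of `P_uab` still sums to 1, but each step now loses probability 0.1 to the sink, so longer detours are penalized.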
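Introducing the Universal Absorbing Boundary
Before: [Figure: chain A → D → B; edge weights 1] Esc_Prob(A→B) = 1
After: [Figure: A → D and D → B with weight 0.9; each node → U-A-B with weight 0.1] Prox(A→B) = 0.91
Before: [Figure: chain with degree-1 nodes E and F attached to D; edge weights 1 and 1/3] Esc_Prob(A→B) = 1
After: [Figure: edge weights scaled to 0.9 and 0.3; each node → U-A-B with weight 0.1] Prox(A→B) = 0.74
Footnote: fly-out probability = 0.1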
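Issue #2: weakly connected pair
[Figure: weakly connected pair A, B joined by three edges of weight 1]
Prox(A→B) = Prox(B→A) = 0
Solution: Partial symmetry!
[Figure: an edge i → j with weight w, next to the partially symmetrized pair of edges between i and j]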
Practical Modifications: Partial Symmetry
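Before: [Figure: weakly connected pair A, B joined by three edges of weight 1] Prox(A→B) = Prox(B→A) = 0
After: [Figure: each edge has weight 0.9 in its original direction and 0.1 in the reverse direction] Prox(A→B) = 0.081 > Prox(B→A) = 0.009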
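One way to sketch partial symmetry in code: for every edge i → j, add a weaker reverse edge j → i. The helper name `partial_symmetry` and the parameter `reverse_ratio` are hypothetical; the slide's 0.9 / 0.1 weights suggest a small reverse weight, but the exact rule used in the talk is not recoverable from the slide text.

```python
def partial_symmetry(W, reverse_ratio=0.1):
    """For each edge i->j of weight w, add a reverse edge j->i of weight reverse_ratio * w."""
    n = len(W)
    return [[W[i][j] + reverse_ratio * W[j][i] for j in range(n)] for i in range(n)]

# Hypothetical weakly connected path A -> X <- Y -> B (indices 0, 1, 2, 3).
# Neither A nor B can reach the other along directed edges.
W = [
    [0.0, 1.0, 0.0, 0.0],   # A -> X
    [0.0, 0.0, 0.0, 0.0],   # X has no out-edges
    [0.0, 1.0, 0.0, 1.0],   # Y -> X, Y -> B
    [0.0, 0.0, 0.0, 0.0],   # B has no out-edges
]
W_sym = partial_symmetry(W)
```

After the modification every node has at least one outgoing edge, so a walk can cross the path in either direction, at a lower weight when it moves against an original edge.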
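P: transition matrix (row-normalized); n: # of nodes in the graph
$$\mathrm{Esc\_Prob}(i \to j) = P_{i,\mathcal{I}} \, (I - P_{\mathcal{I},\mathcal{I}})^{-1} \, P_{\mathcal{I},j} + p_{i,j}, \qquad \mathcal{I} = V \setminus \{i, j\}$$
– $P_{i,\mathcal{I}}$: 1 x (n-2), the i-th row of P with the i-th and j-th elements removed
– $P_{\mathcal{I},\mathcal{I}}$: (n-2) x (n-2), P with the i-th and j-th rows and columns removed
– $P_{\mathcal{I},j}$: (n-2) x 1, the j-th column of P with the i-th and j-th elements removed
One matrix inversion, one Esc_Prob!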
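P: transition matrix (row-normalized)
[Figure: example directed graph with 6 nodes; edge weights 0.5 and 1]
Example: Esc_Prob(1→5), with $\mathcal{I} = \{2, 3, 4, 6\}$
$$\mathrm{Esc\_Prob}(1 \to 5) = \begin{pmatrix} p_{1,2} & p_{1,3} & p_{1,4} & p_{1,6} \end{pmatrix} \left( I - \begin{pmatrix} p_{2,2} & p_{2,3} & p_{2,4} & p_{2,6} \\ p_{3,2} & p_{3,3} & p_{3,4} & p_{3,6} \\ p_{4,2} & p_{4,3} & p_{4,4} & p_{4,6} \\ p_{6,2} & p_{6,3} & p_{6,4} & p_{6,6} \end{pmatrix} \right)^{-1} \begin{pmatrix} p_{2,5} \\ p_{3,5} \\ p_{4,5} \\ p_{6,5} \end{pmatrix} + p_{1,5}$$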
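Solving DAP (straightforward way)
1-c: fly-out probability (to the black hole)
$$\mathrm{Prox}(i \to j) = c^2 \, P_{i,\mathcal{I}} \, (I - c\,P_{\mathcal{I},\mathcal{I}})^{-1} \, P_{\mathcal{I},j} + c\,p_{i,j}, \qquad \mathcal{I} = V \setminus \{i, j\}$$
– $P_{i,\mathcal{I}}$: 1 x (n-2); $P_{\mathcal{I},\mathcal{I}}$: (n-2) x (n-2); $P_{\mathcal{I},j}$: (n-2) x 1
One matrix inversion, one proximity!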
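A minimal sketch of the straightforward solver for a single pair, assuming a dense row-normalized matrix stored as nested lists. The helper names `solve` and `prox_direct` and the 4-node example graph are illustrative assumptions. It solves one (n-2) x (n-2) linear system per pair instead of forming the inverse explicitly.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[k]] for k, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [M[k][n] / M[k][k] for k in range(n)]

def prox_direct(P, i, j, c=0.9):
    """Prox(i->j) = c^2 * P[i,I] (I - c P[I,I])^{-1} P[I,j] + c * P[i][j]."""
    rest = [k for k in range(len(P)) if k not in (i, j)]   # the index set I
    A = [[(1.0 if a == b else 0.0) - c * P[a][b] for b in rest] for a in rest]
    x = solve(A, [P[k][j] for k in rest]) if rest else []
    return c * c * sum(P[i][k] * xk for k, xk in zip(rest, x)) + c * P[i][j]

# Hypothetical graph: A=0, D=1, E=2, B=3 with A->D, D->E, D->B, E->D, B->A.
P = [
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5],
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
]
prox_ab = prox_direct(P, 0, 3)
prox_ba = prox_direct(P, 3, 0)
```

Each call builds and solves its own system, which is exactly the cost the two fast methods try to avoid.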
• Case 1: Medium-size graph
– Matrix inversion is feasible, but…
– What if we want many proximities?
– A: FastAllDAP!
• Case 2: Large graph
– Matrix inversion is infeasible
– Q: How to get one proximity efficiently?
– A: FastOneDAP!
• Q1: How to efficiently compute all possible proximities on a medium-size graph?
– a.k.a. how to efficiently solve multiple linear systems simultaneously?
• Goal: reduce the # of matrix inversions!
FastAllDAP: Observation
[Figure: example directed graph with 6 nodes; edge weights 0.5 and 1]
[Figure: two copies of the 6 x 6 transition matrix P, each with a different sub-matrix highlighted for a different (i, j) pair]
Need two different matrix inversions!
[Figure: P with the sub-matrix used for Prox(1→5) shaded in gray (rows and columns 2, 3, 4, 6), next to P with the sub-matrix used for Prox(1→6) shaded in gray (rows and columns 2, 3, 4, 5)]
Overlap between the two gray parts!
Redundancy among different linear systems!
• Theorem: [equation]
• Example: [equation]
• Proof: by SM Lemma
• Algorithm
– Compute Q
– For i, j = 1, …, n, compute Prox(i→j)
• Example
– With 1,000 nodes:
– 1M matrix inversions vs. 1 matrix inversion!
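A sketch of the all-pairs idea under stated assumptions. The theorem's formula is not preserved in the slide text, so the sketch assumes Q = (I - cP)^{-1} and the closed form Prox(i→j) = q_ij / (q_ii q_jj - q_ij q_ji), which follows from the escape-probability definition with a fly-out probability of 1-c. The function names and the example graph are illustrative.

```python
def invert(A):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [1.0 if r == k else 0.0 for k in range(n)] for r, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        d = M[col][col]
        M[col] = [x / d for x in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]

def all_pairs_prox(P, c=0.9):
    """One inversion of the full n x n matrix, then O(1) work per (i, j) pair."""
    n = len(P)
    Q = invert([[(1.0 if a == b else 0.0) - c * P[a][b] for b in range(n)] for a in range(n)])
    prox = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                # Assumed closed form (see the note above the code).
                prox[i][j] = Q[i][j] / (Q[i][i] * Q[j][j] - Q[i][j] * Q[j][i])
    return prox

# Hypothetical graph: A=0, D=1, E=2, B=3 with A->D, D->E, D->B, E->D, B->A.
P = [
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5],
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
]
prox = all_pairs_prox(P)
```

For n = 1,000 this replaces roughly n x n = 1M separate (n-2) x (n-2) inversions with a single n x n inversion.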
• Q1: How to efficiently compute a single proximity on a large graph?
– a.k.a. how to solve one linear system efficiently?
• Goal: avoid matrix inversion!
FastOneDAP: Observation
[Figure: example directed graph with 6 nodes; edge weights 0.5 and 1]
Partial info (4 elements in 2 columns) of Q is enough!
FastOneDAP: Observation
• Q: How to compute one column of Q?
• A: Taylor expansion
[Equation: the wanted column of Q is Q times the indicator vector [0, …, 0, 1, 0, …, 0]^T]
Reminder: [equation]
FastOneDAP: Observation
[Equation: Taylor expansion applied to [0, …, 0, 1, 0, …, 0]^T, one matrix x vector product per term]
Sparse matrix-vector multiplications!
FastOneDAP: Iterative Alg.
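The algorithm itself is not preserved in the slide text. The sketch below is an assumption-based illustration: it takes Q = (I - cP)^{-1}, computes one column of Q with the fixed-point iteration x ← e + cPx (the Taylor series e + cPe + c²P²e + …), and combines two columns with the closed form q_ij / (q_ii q_jj - q_ij q_ji). The names `q_column`, `fast_one_prox`, and the edge-dictionary format are hypothetical.

```python
def q_column(edges, n, j, c=0.9, tol=1e-12, max_iter=10000):
    """Column j of Q = (I - cP)^{-1} using only sparse matrix-vector products."""
    e = [0.0] * n
    e[j] = 1.0
    x = e[:]
    for _ in range(max_iter):
        Px = [0.0] * n
        for (a, b), p in edges.items():       # (P x)[a] += P[a][b] * x[b]
            Px[a] += p * x[b]
        new_x = [e[k] + c * Px[k] for k in range(n)]
        if max(abs(u - v) for u, v in zip(new_x, x)) < tol:
            return new_x
        x = new_x
    return x

def fast_one_prox(edges, n, i, j, c=0.9):
    """Prox(i->j) from two columns of Q, with no matrix inversion."""
    qi = q_column(edges, n, i, c)   # qi[i] = q_ii, qi[j] = q_ji
    qj = q_column(edges, n, j, c)   # qj[i] = q_ij, qj[j] = q_jj
    return qj[i] / (qi[i] * qj[j] - qj[i] * qi[j])

# Hypothetical sparse graph: A=0, D=1, E=2, B=3 stored as {(from, to): probability}.
edges = {(0, 1): 1.0, (1, 2): 0.5, (1, 3): 0.5, (2, 1): 1.0, (3, 0): 1.0}
prox_ab = fast_one_prox(edges, 4, 0, 3)
```

Each iteration costs time proportional to the number of edges, and the iteration is a contraction because c < 1 and P is row-normalized.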
• Convergence guaranteed!
• Computational savings
– Example:
• 100K nodes and 1M edges (50 iterations)
• 10,000,000x faster!
• Footnote: 1 column is enough!
– (details in paper)
Name   Node #   Edge #   Directionality
WL     4k       10k      A-links-to-B
PC     36k      64k      Who-contacts-whom
EP     76k      509k     Who-trusts-whom
CN     28k      353k     A-cites-B
AE     38k      115k     Who-emails-whom
Link Prediction: existence
[Figure: two histograms of density (y-axis) vs. Prox(i→j) + Prox(j→i) (x-axis), one for pairs with a link and one for pairs with no link]
DAP is effective at distinguishing red from blue!
Dataset   Accuracy
WL        65.40%
PC        79.60%
AE        81.51%
CN        86.71%
EP        92.21%
• Q: Given the existence of the link, what is the direction of the link?
• A: Compare Prox(i→j) and Prox(j→i)
[Figure: density of Prox(i→j) - Prox(j→i); annotation: >70%]
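The two link-prediction rules can be sketched as a single decision function. The name `predict_link`, the threshold value, and the choice of the higher-proximity direction as the predicted direction are assumptions for illustration; the talk does not state a threshold.

```python
def predict_link(prox_ij, prox_ji, threshold=0.01):
    """Return None if no link is predicted, else the predicted direction."""
    # Existence: score the pair by Prox(i->j) + Prox(j->i).
    if prox_ij + prox_ji < threshold:
        return None
    # Direction: assume the link points along the larger proximity.
    return "i->j" if prox_ij >= prox_ji else "j->i"

# Values reused from the partial-symmetry example.
decision = predict_link(0.081, 0.009)
```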
[Figure: time (sec) vs. size of graph, Straight-Solver vs. FastAllDAP] 1,000x faster!
[Figure: time (sec) vs. size of graph, Straight-Solver vs. FastOneDAP] 10,000x faster!
• Q1: How to define it?
• A1: Esc_Prob + Practical Modifications
• Q2: How to compute it efficiently?
• A2: FastAllDAP & FastOneDAP
– (100x – 10,000x faster!)
• Q3: How to benefit real applications?
• A3: Link Prediction (existence & direction)
• Generalization to group proximity
– Definitions; Fast solutions
– ‘How close between/from CEOs and/to Accountants?’
• More applications
– Dir-CePS, attributed graphs
[Figure: CePS on query nodes A, B, C, with directed variants: common descendant; common ancestor; descendant of B and common ancestor of A and C]
Cupid uses arrows, so does graph mining!
Thank you!
www.cs.cmu.edu/~htong