FastOneDAP - VideoLectures.NET


Fast Direction-Aware Proximity for Graph Mining

Speaker: Hanghang Tong

Joint work w/ Yehuda Koren, Christos Faloutsos

2007-8-13 KDD 2007, San Jose

Proximity on Graph

• Un-directed graph

– What is Prox between A and B?

– ‘How close is Smith to Johnson?’

[Figure: undirected graph containing A and B, all edge weights 1]

But, many real graphs are directed….

2

Edge Direction w/ Proximity

[Figure: directed graph on A and B, edge weights 1 and 0.5]

What is Prox from A to B?

What is Prox from B to A?

3

Motivating Questions ( Fast DAP )

• Q1: How to define it?

• Q2: How to compute it efficiently ?

• Q3: How to benefit real applications ?

4

Roadmap

• DAP definitions

– Escape Probability

– Issue # 1: ‘degree-1 node’ effect

– Issue # 2: weakly connected pair

• Computational Issues

– FastAllDAP: ALL pairs

– FastOneDAP: One pair

• Experimental Results

• Conclusion

5

Defining DAP: escape probability

• Define Random Walk ( RW ) on the graph

• Esc_Prob(A -> B)

– Prob (starting at A, reaches B before returning to A)

[Figure: A, B, and the remaining graph]

Esc_Prob = Pr (smile before cry)

6
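The definition above can be turned into a small computation. The sketch below is an illustration, not the authors' code: it assumes a row-normalized transition matrix given as a list of lists, and it finds the hitting probabilities by repeated sweeps rather than by a matrix inversion. The function name `escape_probability` and the three-node graph are hypothetical.

```python
def escape_probability(P, i, j, sweeps=10_000, tol=1e-12):
    """Pr(walk started at i reaches j before returning to i).

    P is a row-normalized transition matrix (list of lists).
    """
    n = len(P)
    # v[k] = Pr(walk currently at k hits j before it hits i).
    # Boundary values: v[j] = 1 (success), v[i] = 0 (returned to i).
    v = [0.0] * n
    v[j] = 1.0
    for _ in range(sweeps):
        change = 0.0
        for k in range(n):
            if k == i or k == j:
                continue
            new = sum(P[k][t] * v[t] for t in range(n))
            change = max(change, abs(new - v[k]))
            v[k] = new
        if change < tol:
            break
    # The first step leaves i; after that the walk must hit j before i.
    return sum(P[i][k] * v[k] for k in range(n))


# Hypothetical graph: A=0, B=1, C=2 with edges A->B, B->A, B->C, C->B.
P = [
    [0.0, 1.0, 0.0],
    [0.5, 0.0, 0.5],
    [0.0, 1.0, 0.0],
]
esc_ab = escape_probability(P, 0, 1)
esc_ba = escape_probability(P, 1, 0)
print(esc_ab, esc_ba)
```

On this toy graph the two directions differ: every walk from A steps straight to B, while half of the walks from B step to C first and come back to B before ever seeing A.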

Esc_Prob: Example

[Figure: directed example graph containing A and B, edge weights 1 and 0.5]

Esc_Prob(a->b)=1 > Esc_Prob(b->a)=0.5

7

Esc_Prob is good, but…

• Issue #1:

– ‘Degree-1 node’ effect

• Issue #2:

– Weakly connected pair

Need some practical modifications!

8

Issue #1: ‘degree-1 node’ effect

[Faloutsos+] [Koren+]

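[Figure: path A -> D -> B, edge weights 1] Esc_Prob(a->b)=1

[Figure: the same path with degree-1 nodes E and F attached to D; D's three out-edges have probability 1/3 each] Esc_Prob(a->b)=1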

• no influence for degree-1 nodes (E, F)!

– known as ‘pizza delivery guy’ problem in undirected graph

• Solutions: Universal Absorbing Boundary!

9

Universal Absorbing Boundary

Footnote: fly-out probability = 0.1

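[Figure: each node gets an extra out-edge of probability 0.1 into the U-A-B; the U-A-B keeps the walker with probability 1]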

U-A-B is a black-hole!

10

Introducing Universal-Absorbing-Boundary

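[Figure: path A -> D -> B, edge weights 1] Esc_Prob(a->b)=1

[Figure: the same path with U-A-B added; edges A -> D and D -> B become 0.9, and A, D, B each get a 0.1 edge into U-A-B] Prox(a->b)=0.91

[Figure: path A -> D -> B with degree-1 nodes E and F attached to D; D's out-edges are 1/3 each] Esc_Prob(a->b)=1

[Figure: the same graph with U-A-B added; D's out-edges become 0.3 each, the other edges 0.9, and every node gets a 0.1 edge into U-A-B] Prox(a->b)=0.74

Footnote: fly-out probability = 0.1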
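A minimal sketch of the universal absorbing boundary, under one assumed wiring: every step survives with probability c = 1 - (fly-out probability), so each transition probability is scaled by c and the lost mass goes to a sink that never returns. The function name `proximity`, the node numbering, and the choice to give B no out-edges are assumptions made for illustration; the values this wiring produces are not meant to reproduce the numbers on the slide.

```python
def proximity(P, i, j, c=0.9, sweeps=10_000, tol=1e-12):
    """Escape probability i -> j when each step survives with probability c.

    With probability 1 - c the walker flies out to an absorbing sink,
    which is modeled by scaling every transition probability by c.
    Setting c = 1.0 gives the plain escape probability.
    """
    n = len(P)
    v = [0.0] * n          # v[k] = Pr(from k, hit j before i and before the sink)
    v[j] = 1.0
    for _ in range(sweeps):
        change = 0.0
        for k in range(n):
            if k in (i, j):
                continue
            new = c * sum(P[k][t] * v[t] for t in range(n))
            change = max(change, abs(new - v[k]))
            v[k] = new
        if change < tol:
            break
    return c * sum(P[i][k] * v[k] for k in range(n))


# Hypothetical graphs. Nodes: A=0, D=1, B=2, E=3, F=4.
# B has no out-edges here, so its row is all zeros.
path = [
    [0, 1, 0],   # A -> D
    [0, 0, 1],   # D -> B
    [0, 0, 0],
]
third = 1.0 / 3.0
with_leaves = [
    [0, 1, 0, 0, 0],              # A -> D
    [0, 0, third, third, third],  # D -> B, E, F
    [0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],              # E -> D
    [0, 1, 0, 0, 0],              # F -> D
]

esc_path = proximity(path, 0, 2, c=1.0)
esc_leaves = proximity(with_leaves, 0, 2, c=1.0)
prox_path = proximity(path, 0, 2, c=0.9)
prox_leaves = proximity(with_leaves, 0, 2, c=0.9)
print(esc_path, esc_leaves, prox_path, prox_leaves)
```

Without fly-out (c = 1.0) both graphs give the same escape probability, so the degree-1 nodes have no influence. With fly-out, every detour through E or F costs extra steps and the graph with leaves scores lower than the bare path, which is the qualitative effect the slide describes.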

11

Issue#2: Weakly connected pair

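[Figure: A and B joined by a three-edge path (edge weights 1) with no directed route from A to B or from B to A]

Prox(A -> B) = Prox(B -> A) = 0

Solution: Partial symmetry!

[Figure: a one-way edge of weight w from i to j becomes a two-way edge between i and j]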

12

Practical Modifications: Partial Symmetry

Before: Prox(A -> B) = Prox(B -> A) = 0

[Figure: the same path after partial symmetry, with weights 0.9 and 0.1 on each link]

After: Prox(A -> B) = 0.081 > Prox(B -> A) = 0.009
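A sketch of one way partial symmetry could be implemented, assuming the rule "for every one-way edge u -> v of weight w, add a reverse edge v -> u of weight beta * w". The slide does not spell out the weighting rule, so `beta`, the function names, and the intermediate node names X and Y are hypothetical.

```python
from collections import deque


def add_partial_symmetry(W, beta):
    """Return a copy of W with a weak reverse edge for every one-way edge.

    W maps node -> {neighbor: weight}. For each edge u -> v of weight w
    that has no reverse edge, v -> u is added with weight beta * w.
    """
    out = {u: dict(nbrs) for u, nbrs in W.items()}
    for u, nbrs in W.items():
        for v, w in nbrs.items():
            out.setdefault(v, {})
            if u not in W.get(v, {}):
                out[v][u] = beta * w
    return out


def reachable(W, src, dst):
    """Breadth-first search that follows edge directions."""
    seen, queue = {src}, deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            return True
        for v in W.get(u, {}):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return False


# Hypothetical weakly connected pair: A -> X <- Y -> B.
W = {"A": {"X": 1.0}, "X": {}, "Y": {"X": 1.0, "B": 1.0}, "B": {}}
W_sym = add_partial_symmetry(W, beta=0.1)

print(reachable(W, "A", "B"), reachable(W, "B", "A"))          # before
print(reachable(W_sym, "A", "B"), reachable(W_sym, "B", "A"))  # after
```

Before the change neither node can reach the other along edge directions, so both proximities are zero. Afterwards both directions are connected, and the route from A to B crosses fewer weak reverse edges than the route from B to A, so the asymmetry is kept.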

13

Roadmap

• DAP definitions

– Escape Probability

– Issue # 1: ‘degree-1 node’ effect

– Issue # 2: weakly connected pair

• Computational Issues

– FastAllDAP: ALL pairs

– FastOneDAP: One pair

• Experimental Results

• Conclusion

14

Solving Esc_Prob: [Doyle+]

P: transition matrix (row norm.); n: # of nodes in the graph

– 1 x (n-2) row vector: i^th row of P, removing i^th & j^th elements

– (n-2) x (n-2) matrix: P, removing i^th & j^th rows & cols

– (n-2) x 1 col vector: j^th col of P, removing i^th & j^th elements

One matrix inversion, one Esc_Prob!
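The three pieces listed above combine as Esc_Prob(i -> j) = p_i^T (I - P̂)^{-1} p_j + p_{i,j}, where P̂ is P without rows and columns i and j. The sketch below evaluates that expression in plain Python. It solves the linear system (I - P̂) x = p_j instead of forming the inverse explicitly, which gives the same value; the helper names and the four-node test graph are hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(map(float, A[r])) + [float(b[r])] for r in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def esc_prob_direct(P, i, j):
    """Esc_Prob(i -> j) = p_i^T (I - P_hat)^{-1} p_j + p_{i,j}."""
    rest = [k for k in range(len(P)) if k not in (i, j)]
    # I - P_hat, where P_hat is P without rows/cols i and j.
    A = [[(1.0 if a == b else 0.0) - P[a][b] for b in rest] for a in rest]
    p_j = [P[a][j] for a in rest]   # j-th column without elements i, j
    x = solve(A, p_j)               # x = (I - P_hat)^{-1} p_j
    p_i = [P[i][b] for b in rest]   # i-th row without elements i, j
    return sum(u * v for u, v in zip(p_i, x)) + P[i][j]


# Hypothetical undirected path 0 - 1 - 2 - 3, row-normalized.
P = [
    [0.0, 1.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 1.0, 0.0],
]
esc_03 = esc_prob_direct(P, 0, 3)
print(esc_03)
```

On this path the walk leaves node 0 for node 1 and then plays a fair gambler's-ruin game between 0 and 3, so the result should be one third. Each new pair (i, j) removes different rows and columns, which is why this approach needs a fresh solve per pair.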

15

Example: 6-node graph, P = [p_{k,l}] (k, l = 1, …, 6): transition matrix (row norm.)

[Figure: the 6-node example graph, with transition probabilities 0.5 and 1 on its edges]

$\mathrm{Esc\_Prob}(1 \to 5) = \mathbf{p}_1^{T} (\mathbf{I} - \hat{\mathbf{P}})^{-1} \mathbf{p}_5 + p_{1,5}$

where p_1^T is row 1 of P and p_5 is column 5 of P, each without elements 1 and 5, and P̂ is P without rows and columns 1 and 5.

18

Solving DAP (Straight-forward way)

1-c: fly-out probability (to black-hole)

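$\mathrm{Prox}(i \to j) = c^{2}\, \mathbf{p}_i^{T} (\mathbf{I} - c\,\hat{\mathbf{P}})^{-1} \mathbf{p}_j + c\, p_{i,j}$

Sizes: p_i^T is 1 x (n-2); P̂ is (n-2) x (n-2); p_j is (n-2) x 1

One matrix inversion, one proximity!

19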

Challenges

• Case 1, Medium Size Graph

– Matrix inversion is feasible, but…

– What if we want many proximities?

– A: FastAllDAP !

• Case 2: Large Size Graph

– Matrix inversion is infeasible

– Q: How to get one proximity efficiently?

– A: FastOneDAP !

20

FastAllDAP

• Q1: How to efficiently compute all possible proximities on a medium size graph?

– a.k.a. how to efficiently solve multiple linear systems simultaneously?

• Goal: reduce # of matrix inversions!

21

FastAllDAP: Observation

[Figure: the 6-node example graph and its transition matrix P = [p_{k,l}], shown once for each of two different node pairs]

Need two different matrix inversions!

22

FastAllDAP: Rescue

[Figure: transition matrix P = [p_{k,l}] shown twice, with the part used for Prox(1 -> 5) and the part used for Prox(1 -> 6) shaded gray]

Overlap between two gray parts!

Redundancy among different linear systems!

23

FastAllDAP: Theorem

• Theorem:

• Example:

• Proof: by the Sherman-Morrison (SM) lemma

24

FastAllDAP: Algorithm

• Alg.

– Compute Q

– For i, j = 1, …, n, compute Prox(i -> j) from Q

• Example

– w/ 1000 nodes,

– 1M matrix inversions vs. 1 matrix inversion!
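A sketch of the all-pairs idea under stated assumptions. The theorem itself is not legible in this transcript, so the code assumes Q = (I - cP)^{-1} and uses the identity Prox(i -> j) = Q(i,j) / (Q(i,i) Q(j,j) - Q(i,j) Q(j,i)), which follows from reading Q(i,j) as the expected number of visits to j by a walk started at i. The function names and the three-node graph are hypothetical.

```python
def invert(A):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices)."""
    n = len(A)
    M = [list(map(float, A[r])) + [1.0 if r == k else 0.0 for k in range(n)]
         for r in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        d = M[col][col]
        M[col] = [x / d for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0.0:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]


def all_pairs_proximity(P, c=0.9):
    """All Prox(i -> j) from a single inverse Q = (I - c P)^{-1}."""
    n = len(P)
    Q = invert([[(1.0 if a == b else 0.0) - c * P[a][b] for b in range(n)]
                for a in range(n)])
    prox = [[None] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                # Only four entries of Q are needed per pair.
                denom = Q[i][i] * Q[j][j] - Q[i][j] * Q[j][i]
                prox[i][j] = Q[i][j] / denom
    return prox


# Hypothetical graph: A=0, B=1, C=2 with edges A->B, B->A, B->C, C->B.
P = [
    [0.0, 1.0, 0.0],
    [0.5, 0.0, 0.5],
    [0.0, 1.0, 0.0],
]
prox = all_pairs_proximity(P, c=0.9)
print(prox[0][1], prox[1][0], prox[0][2])
```

The inverse is computed once and every pair is then a constant-time lookup, which is where the saving over one inversion per pair comes from.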

25

FastOneDAP

• Q1: How to efficiently compute one single proximity on a large size graph?

– a.k.a. how to solve one linear system efficiently?

• Goal: avoid matrix inversion!

26

FastOneDAP: Observation

[Figure: the 6-node example graph]

Partial info. (4 elements / 2 cols) of Q is enough!

27

FastOneDAP: Observation

• Q: How to compute one column of Q?

• A: Taylor expansion

[0, …, 0, 1, 0, …, 0]^T

Reminder:

28

FastOneDAP: Observation

[Figure: repeated matrix multiplications applied to the unit vector [0, …, 0, 1, 0, …, 0]^T]

Sparse matrix-vector multiplications!
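A sketch of the one-pair idea, assuming Q = (I - cP)^{-1} = I + cP + (cP)^2 + … so that one column of Q can be built with sparse matrix-vector products only. The update x <- e_j + cP x is one way to sum that series; it is not claimed to be the exact iteration on the slide. The pair formula Q(i,j) / (Q(i,i) Q(j,j) - Q(i,j) Q(j,i)), the function names, and the toy graph are assumptions.

```python
def q_column(P, j, c=0.9, max_iter=1000, tol=1e-12):
    """Column j of Q = (I - cP)^{-1}, via the iteration x <- e_j + cP x.

    P is sparse: P[u] is a dict {v: probability of stepping u -> v}.
    """
    n = len(P)
    x = [0.0] * n
    for _ in range(max_iter):
        # One sparse matrix-vector product: only stored edges are touched.
        new = [c * sum(p * x[v] for v, p in P[u].items()) for u in range(n)]
        new[j] += 1.0
        diff = max(abs(a - b) for a, b in zip(new, x))
        x = new
        if diff < tol:
            break
    return x


def one_pair_proximity(P, i, j, c=0.9):
    """Prox(i -> j) from two columns of Q (four entries in total)."""
    col_i = q_column(P, i, c)   # col_i[k] = Q[k][i]
    col_j = q_column(P, j, c)   # col_j[k] = Q[k][j]
    q_ii, q_ji = col_i[i], col_i[j]
    q_ij, q_jj = col_j[i], col_j[j]
    return q_ij / (q_ii * q_jj - q_ij * q_ji)


# Hypothetical graph: A=0, B=1, C=2 with edges A->B, B->A, B->C, C->B.
P = [{1: 1.0}, {0: 0.5, 2: 0.5}, {1: 1.0}]
prox_ab = one_pair_proximity(P, 0, 1)
prox_ba = one_pair_proximity(P, 1, 0)
print(prox_ab, prox_ba)
```

Each iteration costs time proportional to the number of edges, and because c < 1 the error shrinks geometrically, so no n x n matrix is ever stored or inverted.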

29

FastOneDAP: Iterative Alg.

30

FastOneDAP: Property

• Convergence Guaranteed !

• Computational Savings

– Example:

• 100K nodes and 1M edges (50 iterations)

• 10,000,000x faster!

• Footnote: 1 col is enough!

– (details in paper)

31

Roadmap

• DAP definitions

– Escape Probability

– Issue # 1: ‘degree-1 node’ effect

– Issue # 2: weakly connected pair

• Computational Issues

– FastAllDAP: ALL pairs

– FastOneDAP: One pair

• Experimental Results

• Conclusion

32

Datasets (all real)

Name   Node #   Edge #   Directionality

WL     4k       10k      A-links-to-B

PC     36k      64k      Who-contact-whom

EP     76k      509k     Who-trust-whom

CN     28k      353k     A-cites-B

AE     38k      115k     Who-email-to-whom

33

Link Prediction: existence

[Figure: two density histograms of Prox (i -> j) + Prox (j -> i), one for node pairs with a link and one for pairs with no link]

DAP is effective at distinguishing red from blue!

35

Link Prediction: existence

Dataset   Accuracy

WL        65.40%

PC        79.60%

AE        81.51%

CN        86.71%

EP        92.21%

37

Link Prediction: direction

• Q: Given the existence of the link, what is the direction of the link?

• A: Compare Prox(i -> j) and Prox(j -> i)
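A sketch of how the two link-prediction questions could be combined into one rule. It assumes that existence is decided by thresholding Prox(i -> j) + Prox(j -> i) and direction by comparing the two values. The slides only show histograms of these quantities, so the function `predict_link` and the threshold value are hypothetical.

```python
def predict_link(prox_ij, prox_ji, threshold):
    """Return None if no link is predicted, else 'i->j' or 'j->i'.

    Existence: the summed proximity must reach the threshold.
    Direction: the larger of the two proximities wins (ties go to i->j).
    """
    if prox_ij + prox_ji < threshold:
        return None
    return "i->j" if prox_ij >= prox_ji else "j->i"


# Made-up proximity values, for illustration only.
print(predict_link(0.08, 0.01, threshold=0.05))
print(predict_link(0.01, 0.08, threshold=0.05))
print(predict_link(0.001, 0.002, threshold=0.05))
```

In practice the threshold would have to be chosen per dataset, for example from the point where the two histograms cross.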

[Figure: density of Prox (i -> j) - Prox (j -> i); annotation: >70%]

38

Efficiency: FastAllDAP

[Figure: time (sec) vs. size of graph, Straight-Solver vs. FastAllDAP]

1,000x faster!

41

Efficiency: FastOneDAP

[Figure: time (sec) vs. size of graph, Straight-Solver vs. FastOneDAP]

10,000x faster!

42

Roadmap

• DAP definitions

– Escape Probability

– Issue # 1: ‘degree-1 node’ effect

– Issue # 2: weakly connected pair

• Computational Issues

– FastAllDAP: ALL pairs

– FastOneDAP: One pair

• Experimental Results

• Conclusion

43

Conclusion ( Fast DAP )

• Q1: How to define it?

• A1: Esc_Prob + Practical Modifications

• Q2: How to compute it efficiently?

• A2: FastAllDAP & FastOneDAP

– (100x – 10,000x faster!)

• Q3: How to benefit real applications?

• A3: Link Prediction (existence & direction)

44

More in the paper…

• Generalization to group proximity

– Definitions; Fast solutions

– ‘How close between/from CEOs and/to Accountants?’

• More applications

– Dir-CePS, attributed-graphs

[Figure: Dir-CePS examples on query nodes A, B, C: CePS; common descendant; common ancestor; descendant of B & common ancestor of A and C]

45

Cupid uses arrows, so does graph mining!

Thank you!

www.cs.cmu.edu/~htong

46
