Input-Specific Dynamic Power Optimization for VLSI Circuits

Fei Hu, Intel Corp., Folsom, CA 95630, USA
Vishwani D. Agrawal, Department of ECE, Auburn University, AL 36849, USA
October 5, 2006

Outline
Background
– Dynamic power dissipation
– Glitch reduction
– Previous LP model with fixed gate delay
– Process-variation-resistant LP model
Input-specific optimization
– Without process-variation
– With process-variation
Experimental results
Conclusion

Oct. 5, 2005 Fei Hu, ISLPED 2006, Tegernsee, Germany

Background
Dynamic power dissipation
– $P_{dyn} = P_{switching} + P_{short\text{-}circuit}$
Switching power dissipation
– $P_{switching} = \frac{1}{2} k C_L V_{dd}^2 f_{clk}$
[Figure: CMOS inverter in its two switching states (transistors on/off), with currents $i_{supply}$ and $i_c$ and load capacitance $C_L$ between $V_{dd}$ and Gnd.]

Background
Glitch reduction
– An important dynamic power reduction technique
   Static glitch
   Dynamic glitch
– Glitches consume 30 to 70% of $P_{dyn}$
– Related techniques
   Balanced delay
   Hazard filtering
   Transistor/gate sizing
   Linear programming approach

Glitch reduction
[Figure: example circuit annotated with gate delays, shown as the original circuit and after each technique.]
Balanced path / path balancing
– Equalize the delays of all paths incident on a gate
– Balancing requires insertion of delay buffers
Hazard/glitch filtering
– Utilize the glitch filtering effect of a gate
– Not necessary to insert buffers

Glitch reduction
Transistor/gate sizing
– Find transistor sizes in the circuit to realize the delay
– No need to insert delay buffers
– Suffers from nonlinearity of the delay model
– Large solution space; numerical convergence and global optimization are not guaranteed
Linear programming approach
– Adopts both path balancing and hazard filtering
– Finds the optimal delay assignments for gates
– Uses technology mapping to map the gate delay assignments to transistor/gate dimensions
– Guarantees an optimal solution; a convenient way to solve a large-scale optimization problem
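To make the switching power expression from the Background slide concrete, here is a minimal sketch that evaluates $P_{switching} = \frac{1}{2} k C_L V_{dd}^2 f_{clk}$ for one node. It assumes k is the average number of signal transitions per clock cycle; the function name and the load, supply, and clock values are hypothetical and chosen only for illustration.

```python
def switching_power(activity, c_load, vdd, f_clk):
    """Return P_switching = 1/2 * k * C_L * Vdd^2 * f_clk in watts.

    activity (k) is assumed to be the average number of transitions per
    clock cycle, c_load is in farads, vdd in volts, f_clk in hertz.
    """
    return 0.5 * activity * c_load * vdd ** 2 * f_clk


# Hypothetical node: 10 fF load, 1.2 V supply, 1 GHz clock.
clean = switching_power(activity=1.0, c_load=10e-15, vdd=1.2, f_clk=1e9)
# Same node with one glitch pulse per cycle, i.e. two extra transitions.
glitchy = switching_power(activity=3.0, c_load=10e-15, vdd=1.2, f_clk=1e9)

print(f"without glitches: {clean * 1e6:.2f} uW")
print(f"with glitches:    {glitchy * 1e6:.2f} uW")
```

Because the expression is linear in k, every glitch transition that is removed reduces the switching power of that node proportionally, which is why glitch reduction targets the activity term rather than the capacitance or supply.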
Previous LP approach
[Figure: example circuit with numbered gates; gates 5 and 6 drive gate 7, which has delay $d_7$.]
Timing window (t, T): earliest and latest signal arrival times
Gate constraints:
– $T_7 \ge T_5 + d_7$
– $T_7 \ge T_6 + d_7$
– $t_7 \le t_5 + d_7$
– $t_7 \le t_6 + d_7$
– $d_7 > T_7 - t_7$
Circuit delay constraints:
– $T_{11} \le maxdelay$
– $T_{12} \le maxdelay$
Objective: minimize the sum of buffer delays

Process-variation-resistant optimization
Motivation
– Gate delay is assumed fixed in previous models
– Gate delay varies in real circuits
   Environmental factors: temperature, Vdd
   Physical factors: process variations
– Effect of delay variation
   Glitch filtering conditions are corrupted
   Power dissipation increases from the optimized value
– Our proposal
   Consider delay variations in dynamic power optimization
   Only consider process variations (the major source of delay variation)

LP model based on statistical timing
Statistical timing model with random variables
[Figure: gates 1, ..., j, ..., k drive gate i. Their output timing windows $(ta_1, Ta_1)$, $(ta_j, Ta_j)$, $(ta_k, Ta_k)$ form the input window $(tb_i, Tb_i)$ of gate i, which has delay $d_i$ and output window $(ta_i, Ta_i)$.]

Input-specific optimization
Motivation
– Previous LP models guarantee glitch filtering for ANY input vector sequence
   $T_i - t_i < d_i$ for all gates
– Redundancy in optimization
   Insertion of more buffers
   Increased overhead in power/area
– In reality, gates operate in embedded environments
   Optimize for input vector sequences that are possible for the circuit, e.g., functional vectors
   Same reduction in power dissipation with lower overheads
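The filtering condition $T_i - t_i < d_i$ can be checked for a given delay assignment by propagating timing windows through the netlist. The sketch below is not the LP itself: it assumes the tightest windows (earliest and latest arrivals taken as the min and max over fanins), a netlist given in topological order, and primary inputs that switch at time 0. The function names and the three-gate example are hypothetical.

```python
def timing_windows(fanins, delay):
    """Propagate (t, T) windows through a combinational netlist.

    fanins maps each gate to its list of inputs and must be given in
    topological order; names that are not keys are primary inputs with
    window (0, 0). Returns {gate: (t, T)} at each gate output.
    """
    window = {}
    for gate, inputs in fanins.items():  # dicts preserve insertion order
        t_in = [window.get(x, (0.0, 0.0))[0] for x in inputs]
        T_in = [window.get(x, (0.0, 0.0))[1] for x in inputs]
        window[gate] = (min(t_in) + delay[gate], max(T_in) + delay[gate])
    return window


def unfiltered_gates(fanins, delay):
    """Return the gates that violate the condition T_i - t_i < d_i."""
    window = timing_windows(fanins, delay)
    return [g for g, (t, T) in window.items() if not (T - t < delay[g])]


# Hypothetical circuit: g2 sees g1 (arrives at 1) and primary input c (at 0).
circuit = {"g1": ["a", "b"], "g2": ["g1", "c"]}
print(unfiltered_gates(circuit, {"g1": 1.0, "g2": 1.0}))   # g2 can glitch

# Hazard filtering: raise the delay of g2 above its input window width.
print(unfiltered_gates(circuit, {"g1": 1.0, "g2": 1.5}))

# Path balancing: insert a unit-delay buffer on input c.
balanced = {"g1": ["a", "b"], "buf": ["c"], "g2": ["g1", "buf"]}
print(unfiltered_gates(balanced, {"g1": 1.0, "buf": 1.0, "g2": 1.0}))
```

The two repairs correspond to the two techniques the LP combines: the first changes a gate delay, the second adds a buffer, and only the second contributes to the objective that sums buffer delays.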
Input-specific optimization
Glitch generation pattern
– Input vector pair that can potentially generate a glitch
– AND gate example:
[Figure: AND gate with input transition pairs that can produce a glitch at its output.]
Glitch generation probability $P_g[i] = N_g[i] / N$
– Probability that a glitch-generation pattern occurs at the inputs of gate i
– Steady-state signal values match the pattern

Input-specific optimization
Application to the basic LP model w/ fixed gate delay
– Static optimization
   Only static glitches/hazards considered
– Relaxation of constraints
   Relax glitch filtering constraints where glitches are unlikely
   $T_i - t_i < d_i \Rightarrow (T_i - t_i)\,\beta_i < d_i$
   Selective relaxation: $\beta_i = 0$ if $P_g[i] = 0$; $\beta_i = 1$ if $P_g[i] > 0$
   Generalized relaxation: $\beta_i = 1 - e^{-P_g[i]}$

Input-specific optimization
Application to the process-variation-resistant LP model based on statistical timing
– Static optimization
– Relaxation of constraints
   $\mu_{d_i} > [\mu_{W_i} + 3k(\sigma_{W_i} + r\,\mu_{d_i})]\,\beta_i$
   Selective relaxation
   Generalized relaxation
– Tuning factor
   Original objective: Minimize $\sum_j \mu_{d_j}$ ($j \in$ buffers)
   Current objective: Minimize $\sum_j \mu_{d_j} + TF \cdot \frac{1}{N} \sum_i \mu_{d_i}$ ($j \in$ buffers, $i \in$ other gates)

Input-specific optimization
Why do we need a tuning factor
– Dominating path affects the critical delay distribution
[Figure: primary inputs (PIs) drive a dominating path of delay 41 and other logic whose output is always 0 and whose path delay can be anywhere in [1, 41]; both reach the primary output (PO).]
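A minimal sketch of the glitch generation probability Pg[i] = Ng[i] / N and of selective relaxation follows. Several details are assumptions made for illustration: N is taken to be the number of consecutive vector pairs in the sequence, the AND-gate patterns are the two opposite-direction input changes that leave the steady-state output at 0, and all function names are hypothetical.

```python
def glitch_generation_probability(input_vectors, patterns):
    """Estimate Pg = Ng / N for one gate.

    input_vectors holds the steady-state values at the gate inputs, one
    tuple per applied vector. patterns is the set of (before, after)
    pairs treated as glitch-generation patterns. N is assumed to be the
    number of consecutive vector pairs.
    """
    pairs = list(zip(input_vectors, input_vectors[1:]))
    if not pairs:
        return 0.0
    n_glitch = sum(1 for pair in pairs if pair in patterns)
    return n_glitch / len(pairs)


def selective_factor(pg):
    """Selective relaxation: 0 when the pattern never occurs, else 1."""
    return 0.0 if pg == 0 else 1.0


def relaxed_filter_ok(t, T, d, factor):
    """Relaxed glitch filtering condition (T - t) * factor < d."""
    return (T - t) * factor < d


# Assumed static-hazard patterns for a 2-input AND gate.
AND2_PATTERNS = {((0, 1), (1, 0)), ((1, 0), (0, 1))}

gate_a = [(0, 1), (1, 0), (1, 1), (0, 1)]   # contains (0,1) -> (1,0)
gate_b = [(0, 0), (0, 1), (1, 1), (0, 1)]   # never matches a pattern

for name, seq in (("a", gate_a), ("b", gate_b)):
    pg = glitch_generation_probability(seq, AND2_PATTERNS)
    factor = selective_factor(pg)
    ok = relaxed_filter_ok(t=1.0, T=3.0, d=1.0, factor=factor)
    print(name, round(pg, 3), factor, ok)
```

For gate b the factor is 0, so a window of width 2 no longer forces a larger delay or a buffer; for gate a the original constraint stays in force.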
Experimental results
Experimental procedure
[Figure: flow from the circuit through data extraction (constraint set data), AMPL with the LP models and parameters Dmax and r, gate delays, circuit generation, optimized circuit, and logic simulations to the results.]
– Power estimation
   Event-driven logic simulation
   Fanout-weighted sum of switching activities
   Monte-Carlo simulation with 1,000 samples of delays under process variation
– Results analysis
   Un-Opt., unit-delay circuit
   Opt1, previous basic LP model w/ fixed gate delay
   Opt2, process-variation-resistant LP model
   IS-Opt1, IS-Opt2, input-specific optimizations

Experimental results – input-specific optimization
Application to "Opt1" (basic LP model), IS-Opt1

                     Un-Opt   Opt (w/o proc. var.)      IS-Opt (input-specific, w/o proc. var.)
Circuit   maxdelay   Pwr.     Pwr.   Delay   Buffers    Pwr.   Delay   Buffers
c432      34         1.0      0.74   34      66         0.74   35      66
c432      68         1.0      0.74   68      58         0.74   69      41
c499      22         1.0      0.94   22      48         0.94   22      33
c499      33         1.0      0.94   33      0          0.95   33      0
c880      48         1.0      0.54   51      35         0.54   49      32
c880      120        1.0      0.54   121     30         0.54   122     24
c1355     48         1.0      0.93   48      192        0.93   48      113
c1355     120        1.0      0.93   121     128        0.93   120     25
c1908     80         1.0      0.53   82      62         0.54   86      52
c1908     200        1.0      0.54   203     34         0.53   204     3
c2670     64         1.0      0.74   65      34         0.74   66      30
c2670     160        1.0      0.74   163     9          0.74   162     1
c3540     94         1.0      0.59   95      139        0.59   101     122
c3540     235        1.0      0.59   239     78         0.59   239     73
c5315     98         1.0      0.56   100     167        0.56   104     170
c5315     245        1.0      0.56   249     53         0.56   250     52
c6288     228        1.0      0.13   226     870        0.13   228     870
c6288     620        1.0      0.13   620     857        0.13   620     853
c7552     86         1.0      0.52   89      91         0.52   88      84
c7552     215        1.0      0.52   220     44         0.52   221     38

Experimental results – input-specific optimization
Application to "Opt2" under process-variation, IS-Opt2
Under 15% intra-die and 5% inter-die variation

                 Un-opt.   Opt2 (statistical proc.)                IS-Opt2 (input-specific statistical proc.)
                 Nom.      Nom.    Mean    Max Dev.   No.          Nom.    Mean    Max Dev.   No.
Cir.     DMax    Pwr.      Pwr.    Pwr.    (%)        Buf.         Pwr.    Pwr.    (%)        Buf.
c432     50      1.0       0.74    0.76    11.1       88           0.74    0.76    9.3        81
c432     99      1.0       0.74    0.74    3.7        106          0.74    0.74    3.3        76
c499     32      1.0       0.94    0.95    2.0        88           0.94    0.95    1.9        88
c499     48      1.0       0.94    0.95    1.0        129          0.94    0.95    1.8        58
c880     70      1.0       0.54    0.59    18.2       57           0.54    0.59    20.4       38
c880     174     1.0       0.54    0.55    8.6        62           0.54    0.56    9.0        38
c1355    70      1.0       0.93    0.98    10.2       305          0.93    1.01    13.1       253
c1355    174     1.0       0.93    0.94    3.0        305          0.93    0.95    4.7        160
c1908    116     1.0       0.52    0.64    35.8       135          0.52    0.64    34.7       107
c1908    290     1.0       0.52    0.58    21.4       190          0.52    0.57    18.4       104
c2670    93      1.0       0.74    0.80    13.6       249          0.73    0.79    11.3       186
c2670    232     1.0       0.73    0.76    6.2        211          0.73    0.75    4.3        79
c3540    137     1.0       0.59    0.66    17.8       281          0.59    0.65    15.6       247
c3540    341     1.0       0.59    0.62    10.1       311          0.59    0.61    7.4        188
c5315    143     1.0       0.55    0.63    20.8       399          0.55    0.63    21.0       389
c5315    356     1.0       0.55    0.60    13.4       418          0.55    0.60    13.2       413
c6288    331     1.0       0.13    0.38    223.8      1121         0.13    0.38    225.2      1115
c6288    899     1.0       0.13    0.26    125.3      1473         0.13    0.26    125.5      1243
c7552    125     1.0       0.52    0.59    18.7       481          0.52    0.58    18.1       389
c7552    312     1.0       0.52    0.56    11.8       645          0.52    0.55    10.9       520

Experimental results – input-specific optimization
Critical delay
[Figure: critical delay results, with panels for nominal delay and max. deviation.]
– Similar performance for "Opt2" and "IS-Opt2"

Conclusions
Explored a new aspect of low-power optimization for VLSI circuits
– Input-specific optimization
– Optimizing the circuit for a given input sequence that may be specified for the circuit
Defined the concept of glitch-generation probability
– Used to adaptively relax glitch-filtering constraints
Experimental results
– Better solution with fewer delay buffers
– Maintains similar power reduction and delay performance
– Up to 80% and 63% reductions in delay buffers

Q&A

Backups

Process and delay variations
Process variations
– Variations due to the semiconductor process
   $V_T$, $t_{ox}$, $L_{eff}$, $W_{wire}$, $TH_{wire}$, etc.
– Inter-die variation
   Constant within a die; varies from one die to another die of a wafer or wafer lot
– Intra-die variation
   Variation within a die
   Due to equipment limitations or statistical effects in the fabrication process, e.g., variation in doping concentration
   Spatial correlations and deterministic variation due to CMP and optical proximity effect

Delay model and implications
Random gate delay model
– $D_{total,i} = D_{nom,i} + D_{inter,i} + D_{intra,i}$
– Truncated normal distribution
– Assume independence
– Variation expressed as the ratio $\sigma / D_{nom,i}$
Effect of inter-die variations
– Depends on their effect on switching activities
– Definition of glitch-filtering probability
   $P_{glt} = P\{t_2 - t_1 < d\}$
   Signal arrival times $t_1$, $t_2$
   Gate inertial delay $d$
– Theorem 1 expresses the change of $P_{glt}$ due to inter-die variation in closed form, using erf() terms in k and r
   erf(), the error function
   k, a path- and gate-dependent constant
   r, the $\sigma / D_{nom,i}$ ratio for inter-die variations

Delay model and implications
Process-variation-resistant design
– Can be achieved by path balancing and glitch filtering
– Critical delay may increase
   Theorem 2 states that a solution is guaranteed only if circuit delay is allowed to increase
   Proved by example, assuming 10% variation
[Figure: example circuit with gates A, B, and C and gate delays including 1, 2.1, and 3.9.]
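The glitch-filtering probability Pglt = P{t2 - t1 < d} from the delay model slide can be estimated by Monte Carlo sampling, in the spirit of the 1,000-sample simulations used in the experiments. This is a rough sketch under stated assumptions, not the authors' simulator: the inter-die term is modeled as one shared draw per die scaled by each gate's nominal delay, the intra-die term as an independent draw per gate, both truncated at three standard deviations, and arrival times are sums of gate delays along two hypothetical paths (path2 being the nominally later one).

```python
import random


def truncated_gauss(rng, sigma, limit=3.0):
    """Zero-mean normal sample, truncated to +/- limit*sigma by rejection."""
    if sigma == 0:
        return 0.0
    while True:
        x = rng.gauss(0.0, sigma)
        if abs(x) <= limit * sigma:
            return x


def sample_delays(rng, nominal, r_inter, r_intra):
    """Draw D_total = D_nom + D_inter + D_intra for every gate on one die."""
    die = truncated_gauss(rng, r_inter)  # assumed shared by all gates
    return {
        gate: d + d * die + truncated_gauss(rng, r_intra * d)
        for gate, d in nominal.items()
    }


def glitch_filter_probability(nominal, path1, path2, gate,
                              r_inter=0.05, r_intra=0.15,
                              samples=1000, seed=0):
    """Monte Carlo estimate of Pglt = P{t2 - t1 < d} for one gate."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        delay = sample_delays(rng, nominal, r_inter, r_intra)
        t1 = sum(delay[g] for g in path1)
        t2 = sum(delay[g] for g in path2)
        if t2 - t1 < delay[gate]:
            hits += 1
    return hits / samples


# Hypothetical gate g fed by a one-gate path and a two-gate path.
paths = dict(path1=["a"], path2=["b", "c"], gate="g")
tight = glitch_filter_probability({"a": 1, "b": 1, "c": 1, "g": 1.0}, **paths)
wide = glitch_filter_probability({"a": 1, "b": 1, "c": 1, "g": 1.5}, **paths)
print(f"Pglt with nominal d = 1.0: {tight:.2f}")
print(f"Pglt with nominal d = 1.5: {wide:.2f}")
```

With the nominal delay of g equal to the nominal arrival-time difference, filtering succeeds only about half the time once delays vary; adding margin to d raises the probability, which is the motivation for building a 3-sigma margin into the glitch filtering constraint.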
LP model based on statistical timing
Statistical timing model with random variables
[Figure: gates 1, ..., j, ..., k drive gate i; input window $(tb_i, Tb_i)$, gate delay $d_i$, output window $(ta_i, Ta_i)$.]

LP model based on statistical timing
Minimum-maximum statistics
– Needed for $tb_i$, $Tb_i$
   $tb_i = Min(ta_1, ta_j, ta_k)$; $Tb_i = Max(Ta_1, Ta_j, Ta_k)$
– Previous works
   Min, Max of two normal random variables is not necessarily distributed as normal
   Can be approximated with a normal distribution
   Requires complex operations, e.g., integration, exponentiation, etc.
– Challenges for the LP approach
   Requires a simple approximation w/o nonlinear operations
   Our approximation for C = Max(A, B), where A, B, and C are Gaussian RVs:
   $\mu_C = Max(\mu_A, \mu_B)$
   $\mu_C + 3\sigma_C = Max(\mu_A + 3\sigma_A, \mu_B + 3\sigma_B)$

LP model based on statistical timing
Min-Max statistics approximation error
– Negligible when $|\mu_A - \mu_B| > 3(\sigma_A + \sigma_B)$
– Largest when $\mu_A = \mu_B$
[Figure: probability P versus x, showing $CDF_A$, $CDF_B$, the actual CDF for Max(A,B), and the approximated CDF for Max(A,B), with $\mu_C = Max(\mu_A, \mu_B)$ and $\sigma_C = \frac{1}{3}[Max(\mu_A + 3\sigma_A, \mu_B + 3\sigma_B) - \mu_C]$.]

LP model based on statistical timing
Variables
– Timing and delay variables with mean $\mu$ and std dev $\sigma$
– Auxiliary variables: $TTb_i$, $ttb_i$, $W_i = Tb_i - tb_i$, $\mu_{W_i}$, $\sigma_{W_i}$
Constraints
– Gate constraints
   Timing window at the inputs of a two-input gate i, for each fanin n = 1, 2:
   $\mu_{Tb_i} \ge \mu_{Ta_n}$; $TTb_i \ge \mu_{Ta_n} + 3\sigma_{Ta_n}$
   $\mu_{tb_i} \le \mu_{ta_n}$; $ttb_i \le \mu_{ta_n} - 3\sigma_{ta_n}$
   $\sigma_{Tb_i} = (TTb_i - \mu_{Tb_i}) / 3$; $\sigma_{tb_i} = (\mu_{tb_i} - ttb_i) / 3$
   Timing window at the outputs:
   $\mu_{Ta_i} = \mu_{Tb_i} + \mu_{d_i}$; $\sigma_{Ta_i} = k(\sigma_{Tb_i} + r\,\mu_{d_i})$
   $\mu_{ta_i} = \mu_{tb_i} + \mu_{d_i}$; $\sigma_{ta_i} = k(\sigma_{tb_i} + r\,\mu_{d_i})$

LP model based on statistical timing
Constraints
– Gate constraint
   Linear approximation: $\sigma_{Ta_i} = \sqrt{\sigma_{Tb_i}^2 + (r\,\mu_{d_i})^2} \Rightarrow \sigma_{Ta_i} = k(\sigma_{Tb_i} + r\,\mu_{d_i})$
   $k \in [0.707, 1]$; choose k = 0.85, since $\frac{A + B}{\sqrt{2}} \le \sqrt{A^2 + B^2} \le A + B$
– Glitch filtering constraints
   $\mu_{W_i} = \mu_{Tb_i} - \mu_{tb_i}$
   $\sigma_{W_i} = k(\sigma_{Tb_i} + \sigma_{tb_i})$
   $\mu_{d_i} > \mu_{W_i} + 3k(\sigma_{W_i} + r\,\mu_{d_i})$
   [Figure: probability distribution of $d_i - W_i$ with a $3\sigma$ margin.]
– Circuit delay constraint
   Relates $\mu_{Ta_i}$ at the outputs to $D_{max}$ through the factor $(1 + 3r)$

LP model based on statistical timing
Parameters
– r, the $\sigma / D_{nom,i}$ ratio
– $D_{max}$, circuit delay parameter
– $\eta$, optimism factor
   $\mu_{d_i} > \mu_{W_i} + 3k\eta(\sigma_{W_i} + r\,\mu_{d_i})$
   $\eta = 1$: no relaxation
   $\eta < 1$: optimistic about the actual glitch width
   $\eta = 0$: reduces to the previous model
Objective
– Minimize the number of buffers inserted
– Implemented as the sum of buffer delays
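The two linear stand-ins used in the statistical LP model can be written down directly. The sketch below assumes the Max approximation takes the larger mean and matches the larger 3-sigma upper point, and that k(A + B) with k = 0.85 replaces the root-sum-square of two non-negative quantities; function names and sample values are hypothetical.

```python
import math


def approx_max(mu_a, sigma_a, mu_b, sigma_b):
    """Gaussian stand-in for C = Max(A, B) using only max, + and scaling.

    mu_C is the larger mean; sigma_C is chosen so that mu_C + 3*sigma_C
    equals the larger of the two 3-sigma upper points.
    """
    mu_c = max(mu_a, mu_b)
    upper = max(mu_a + 3 * sigma_a, mu_b + 3 * sigma_b)
    return mu_c, (upper - mu_c) / 3


def linear_rss(a, b, k=0.85):
    """Linear stand-in k*(A + B) for sqrt(A^2 + B^2), with A, B >= 0."""
    return k * (a + b)


# Well-separated means: the later signal dominates.
print(approx_max(10.0, 1.0, 20.0, 1.0))
# Equal means: the case where the approximation error is largest.
print(approx_max(10.0, 1.0, 10.0, 1.0))

# The exact root-sum-square lies between 0.707*(A + B) and (A + B).
a, b = 3.0, 4.0
exact = math.sqrt(a * a + b * b)
print((a + b) / math.sqrt(2) <= exact <= a + b, exact, linear_rss(a, b))
```

Both functions avoid integration and exponentiation, which is what lets the corresponding relations be expressed as linear constraints; the price is an error that peaks when the two means coincide.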