MANAGING DISTRIBUTED UPS ENERGY FOR
EFFECTIVE POWER CAPPING IN DATA CENTERS
Vasileios Kontorinis, L. Zhang, B. Aksanli, J. Sampson, H. Homayoun,
E. Pettis*, D. Tullsen, T. Rosing
*Google
ISCA 2012
UCSD
Datacenter market is growing
2

- World is becoming more IT-dependent:
  - Internet users increased from 16% to 30% of the world population in 5 years [Internet World Stats]
  - Smartphones are projected to jump from 500M in 2011 to 2B in 2015 [International Telecommunication Union]
- The Internet heavily depends on datacenters
- Data center power will double in 5 years
- Expected worldwide datacenter investment in 2012: $35B (equivalent to the GDP of Lithuania) [DataCenterDynamics]
Important to build cost-effective datacenters
Power Oversubscription - Opportunity
3
[Figure: Total Cost of Ownership per server, with no oversubscription vs. with oversubscription. Cost splits into one-time capital expenses (servers, supporting equipment) and recurring costs. With oversubscription, more servers share the same infrastructure.]
Power oversubscription → more cost-effective data centers
Power Oversubscription – Opportunity
4


[Barroso et al. + APC TCO calc]
Assumptions:
- Server cost: $1500
- 28,000 servers (10 MW)
- Energy: 4.7¢/kWh
- Power: $12/kW
- Amortization time: DC 10 years, servers 4 years
- Distributed LA-based UPS
Available at: http://cseweb.ucsd.edu/~tullsen/DCmodeling.html

TCO/server breakdown:
Server Depreciation      40.6%
Rest                     11.9%
Utility Energy           11.7%
DC opex                   9.9%
Power Infrastructure      7.9%
Utility Peak              5.5%
Facility Space            4.5%
Cooling Infrastructure    3.3%
PUE overhead              2.6%
Server Opex               2.0%
UPS LA                    0.2%
Power Oversubscription using Stored Energy
5
[Figure: a diurnal power profile (power vs. time, Mon to Sun) is approximated by a pulse model with one peak-power pulse and one low-power pulse per day. Shaping discharges UPS stored energy during the peak pulse to achieve a peak power reduction and recharges it during the low-power pulse.]
- Leverage diurnal patterns of web services
- Discharge UPS batteries during high activity (once per day)
- Recharge during low activity (once per day)
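Under the pulse model, the battery energy needed each day and the time to put it back follow from a few numbers. The sketch assumes a single rectangular peak per day and lossless batteries; `PulseDay`, `shaving_plan`, and all example values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PulseDay:
    """One day of the pulse model (hypothetical parameter names)."""
    peak_w: float      # power during the daily high-activity pulse (W)
    low_w: float       # power during the rest of the day (W)
    peak_hours: float  # length of the high-activity pulse (h)


def shaving_plan(day: PulseDay, cap_w: float, max_recharge_w: float) -> dict:
    """Energy to discharge so power stays at cap_w during the peak pulse, and the
    time needed to recharge it in the low pulse without exceeding cap_w.
    Battery and conversion losses are ignored."""
    if not day.low_w < cap_w <= day.peak_w:
        raise ValueError("cap must lie between low and peak power")
    energy_wh = (day.peak_w - cap_w) * day.peak_hours  # discharged once per day
    # Recharge power is limited by the charger and by the headroom under the cap.
    recharge_w = min(max_recharge_w, cap_w - day.low_w)
    recharge_hours = energy_wh / recharge_w
    return {
        "energy_wh": energy_wh,
        "recharge_hours": recharge_hours,
        "fits_in_low_pulse": recharge_hours <= 24.0 - day.peak_hours,
    }


plan = shaving_plan(PulseDay(peak_w=320, low_w=200, peak_hours=6), cap_w=280, max_recharge_w=50)
print(plan)
```

Lowering `cap_w` increases the daily discharge and shrinks the recharge headroom at the same time, which is why the cap cannot simply be pushed down to the low-pulse power.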
Centralized UPS
6
Used in most small/medium data centers
- Scales poorly
- High losses in AC-DC-AC conversion (5-10%)
- Centralized single point of failure; requires redundancy
Increasingly cost-inefficient for large data centers
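To put the 5-10% AC-DC-AC conversion loss in perspective, the sketch below prices it using the 10 MW and 4.7¢/kWh figures from the TCO assumptions. It assumes the whole load passes through the centralized UPS and treats the loss as a fraction of the delivered energy, so it is an order-of-magnitude estimate only; `annual_conversion_loss_usd` is a hypothetical helper.

```python
HOURS_PER_YEAR = 24 * 365


def annual_conversion_loss_usd(load_kw: float, loss_fraction: float, usd_per_kwh: float) -> float:
    """Yearly cost of energy lost in AC-DC-AC conversion, approximating the loss
    as a fraction of the delivered load (extra cooling for that heat is ignored)."""
    lost_kwh = load_kw * HOURS_PER_YEAR * loss_fraction
    return lost_kwh * usd_per_kwh


for loss in (0.05, 0.10):
    print(f"{loss:.0%}: ${annual_conversion_loss_usd(10_000, loss, 0.047):,.0f} per year")
```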
Distributed UPS
7
Used in large data centers
- Scales with data center size
- Avoids AC-DC-AC conversion
- Failure points are distributed (no single point of failure)
[Figure: Facebook and Google distributed UPS designs; cheaper UPS solution]
Related work and our proposal
8

[Figure: power delivery hierarchy: utility and diesel generator, UPS (charge/discharge), PDUs, racks]
Related work:
- Centralized UPSs for power capping [Govindan, ISCA 2011]
- Distributed UPSs for rare power emergencies [Govindan, ASPLOS 2012]
Our proposal:
- Provision distributed UPS for peak power capping
- Different battery technology
- Shave power on a daily basis
- Place more servers under the same power infrastructure
- Better amortize capex costs
Outline
9
- Introduction
- Choosing the right battery for power shaving
- Datacenter workload and power modeling
- Policies and results
- Conclusions

Competing Battery Technologies
11

- Lead Acid (LA)
- Lithium Cobalt Oxide (LCO)
- Lithium Iron Phosphate (LFP)
Metrics
12
Backup
- UPS batteries rarely used (3-4 times per year)
- Proper metrics:
  - Cost: Wh / $
  - Size: volumetric density (Wh / liter)
Backup + peak shaving
- UPS batteries used on a daily basis
- Proper metrics:
  - Charge cycles and cost: Wh * cycles / $
  - Size: volumetric density (Wh / liter)
  - Recharge speed: % charge / hour
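The two metric sets can rank batteries differently. The sketch computes both cost metrics for two hypothetical batteries (made-up numbers, not LA or LFP data): the cheaper battery wins on Wh/$, while the one with more charge cycles wins on Wh·cycles/$ once it is cycled daily.

```python
def wh_per_dollar(capacity_wh: float, cost_usd: float) -> float:
    """Backup metric: stored energy per dollar."""
    return capacity_wh / cost_usd


def wh_cycles_per_dollar(capacity_wh: float, cycles: float, cost_usd: float) -> float:
    """Backup + peak shaving metric: energy deliverable over the cycle life per dollar."""
    return capacity_wh * cycles / cost_usd


# HYPOTHETICAL batteries for illustration only.
cheap = {"capacity_wh": 100.0, "cost_usd": 20.0, "cycles": 500}
durable = {"capacity_wh": 100.0, "cost_usd": 50.0, "cycles": 3000}

for name, b in (("cheap", cheap), ("durable", durable)):
    print(name,
          wh_per_dollar(b["capacity_wh"], b["cost_usd"]),
          wh_cycles_per_dollar(b["capacity_wh"], b["cycles"], b["cost_usd"]))
```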
Battery Technology Comparison
13
Backup: Lead Acid (cheaper)
Backup + peak shaving: Lithium Iron Phosphate (cost-effective)
Battery Capacity-Cost Estimation
14
[Figure: power vs. time; the shaved energy E_shaved is set by the peak reduction and the peak duration; LFP and Lead Acid batteries sized to supply it]
Assumptions
15
Number of servers:  28K
Server type:        Custom Sun Fire X4270
                    - Intel Xeon (8-core), 8 GB memory
                    - Idle power: 175 W
                    - Max power: 350 W
PSU efficiency:     80%
Workload:           Pulse model, 50% utilization
Batteries:          LFP ($5/Ah), LA ($2/Ah)
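Assuming (not necessarily as in the paper) that server power scales linearly with utilization between the idle and max values in the table, the power at the pulse model's 50% utilization can be estimated as follows; `server_power_w` is a hypothetical helper.

```python
IDLE_W = 175.0  # idle power from the table
MAX_W = 350.0   # max power from the table


def server_power_w(utilization: float) -> float:
    """ASSUMED linear power model between idle and max power."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be in [0, 1]")
    return IDLE_W + utilization * (MAX_W - IDLE_W)


print(server_power_w(0.5))
```

As a sanity check on scale, 28,000 servers × 350 W = 9.8 MW at max power, in line with the 10 MW figure in the TCO assumptions.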
TCO savings with peak duration
16
[Figure: TCO savings vs. peak duration for LFP and LA, with the LFP and LA size constraints marked]
The more we shave, the more we gain!
LFP is more space- and energy-efficient than LA, so it can shave more!
TCO savings with battery DoD
17

[Figure: TCO savings vs. battery DoD (low to high) when shaving the same energy; panels (a) LA and (b) LFP]
DoD sweet spot for TCO savings (LA: 40%, LFP: 60%)
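The DoD sweet spot comes from two opposing effects: a deeper DoD needs a smaller battery for the same shaved energy, but it also wears the battery out in fewer cycles. The sketch uses the depreciation form from the backup capacity-cost slide (lifetime = min(service life, cycles/30), one cycle per day) with a hypothetical cycle-life curve; its numbers are illustrative and are not the LA or LFP data behind the 40%/60% result.

```python
def monthly_depreciation(dod, energy_wh, cost_per_wh, service_life_months, cycles_at):
    """Battery cost per month when it must deliver energy_wh per daily cycle at depth dod."""
    capacity_wh = energy_wh / dod  # deeper DoD -> smaller battery
    lifetime_months = min(service_life_months, cycles_at(dod) / 30)  # one cycle per day
    return capacity_wh * cost_per_wh / lifetime_months


def hypothetical_cycle_life(dod):
    """HYPOTHETICAL: 2000 cycles at 100% DoD, more cycles at shallower DoD."""
    return 2000 * dod ** -1.5


dods = [i / 20 for i in range(1, 21)]  # 5%, 10%, ..., 100%
best = min(dods, key=lambda d: monthly_depreciation(d, 500.0, 0.5, 120, hypothetical_cycle_life))
print(f"sweet spot under these assumptions: {best:.0%}")
```

Below the sweet spot the battery is limited by its service life and is oversized; above it, cycle wear dominates.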
Key points for battery selection
18
When using batteries for peak power shaving:
- Shave as much power as possible (with a reasonably sized battery)
- There is a DoD sweet spot that maximizes TCO savings
- LFP is the better technology because of:
  - lots of recharges
  - more efficient discharge
  - higher energy density
  - lower expected future cost
What if:
- Servers have unbalanced load?
- Demand varies from day to day?
Workload Modeling
20
Whole-year traffic data from the Google Transparency Report
- Apply weights according to web presence: Search 29.2%, Social Networking 55.8%, MapReduce 15%
- Present results for the 3 worst consecutive days (11/17/2010 to 11/19/2010)
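A minimal sketch of this workload construction, assuming each service trace is normalized to its own peak before the weights are applied, and defining the "worst" days as the three consecutive days with the highest total load (both are assumptions; `combined_load` and `worst_days` are hypothetical names).

```python
# Weights from the slide (web presence of each service).
WEIGHTS = {"search": 0.292, "social_networking": 0.558, "mapreduce": 0.15}


def combined_load(traces: dict[str, list[float]]) -> list[float]:
    """Weighted sum of per-service traces, each normalized to its own peak."""
    n = len(traces["search"])
    load = [0.0] * n
    for service, weight in WEIGHTS.items():
        trace = traces[service]
        peak = max(trace)
        for i, value in enumerate(trace):
            load[i] += weight * value / peak
    return load


def worst_days(load: list[float], samples_per_day: int, days: int = 3) -> int:
    """Index of the first of `days` consecutive days with the highest total load."""
    n_days = len(load) // samples_per_day

    def window_sum(first_day: int) -> float:
        start = first_day * samples_per_day
        return sum(load[start:start + days * samples_per_day])

    return max(range(n_days - days + 1), key=window_sum)
```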
Workload Modeling (cont.)
21


Model a 1000-machine cluster with 5 PDUs, 10 racks per PDU, and 20 servers (2U) per rack.
We simulate load with M/M/8 queues and scale the inter-arrival time according to the workload traffic.
[Figure: jobs arrive with a given inter-arrival time at a scheduler (Round Robin or Load-aware), which dispatches them to servers; each server has 8 cores (consumers) that serve jobs with a given service time]
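The queueing setup can be sketched as a small simulation: Poisson arrivals whose rate follows the traffic trace, a dispatcher using round-robin or a load-aware rule, and FCFS servers with 8 cores and exponential service times. The load-aware metric (cores busy at arrival time), the function signature, and all parameter values are assumptions for illustration, not the authors' simulator.

```python
import heapq
import random
from typing import Callable

CORES_PER_SERVER = 8  # M/M/8: each server has 8 cores (consumers)


def simulate(num_servers: int, base_rate: float, service_rate: float, num_jobs: int,
             policy: str, traffic: Callable[[float], float] = lambda t: 1.0,
             seed: int = 0) -> list[float]:
    """Dispatch Poisson arrivals to FCFS 8-core servers; return per-server utilization.

    traffic(t) scales the arrival rate (shortening the inter-arrival time) to
    follow a workload trace; the rate is re-evaluated at each arrival.
    """
    rng = random.Random(seed)
    # Per server: min-heap of the times at which each core becomes free.
    free_at = [[0.0] * CORES_PER_SERVER for _ in range(num_servers)]
    busy = [0.0] * num_servers
    t, next_rr = 0.0, 0
    for _ in range(num_jobs):
        t += rng.expovariate(base_rate * traffic(t))  # exponential inter-arrival
        if policy == "round_robin":
            s, next_rr = next_rr, (next_rr + 1) % num_servers
        elif policy == "load_aware":
            # Hypothetical load metric: cores still busy at time t.
            s = min(range(num_servers), key=lambda i: sum(f > t for f in free_at[i]))
        else:
            raise ValueError(f"unknown policy {policy!r}")
        service = rng.expovariate(service_rate)  # exponential service time
        core_free = heapq.heappop(free_at[s])     # earliest-available core
        start = max(t, core_free)                 # FCFS: wait if all cores are busy
        heapq.heappush(free_at[s], start + service)
        busy[s] += service
    return [b / (CORES_PER_SERVER * t) for b in busy]


if __name__ == "__main__":
    for policy in ("round_robin", "load_aware"):
        util = simulate(10, base_rate=40.0, service_rate=1.0, num_jobs=10_000, policy=policy)
        print(policy, [round(u, 2) for u in util])
```

Feeding a weighted traffic trace through `traffic` makes the cluster load follow the diurnal pattern; with the default constant traffic, both policies settle near the offered utilization.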
Policy goals
23
Guarantee the power budget at a specific level of the power hierarchy
- Discharge only during high activity; charge only during low activity
- Be effective irrespective of job scheduling
- Use batteries uniformly
Uncoordinated Policy
24
[Figure: per-server battery state machine with states Available, In Use, and Not Available; transition labels: "Power over Threshold", "Reached DoD Goal", "Power below Threshold", "Recharge", "(Power + Bat. Recharge Power) below Threshold", "Recharge Complete"]
Applied at the server level
- Easy to implement
- Runs independently per server
- DoD goal set to 60% of battery capacity (LFP)
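One possible reading of the state diagram, written as a per-server controller. The exact transition rules (leaving In Use when power falls below the threshold or the DoD goal is reached, and recharging only while power plus recharge power stays under the threshold) are assumptions, as are the class and parameter names.

```python
from enum import Enum


class State(Enum):
    AVAILABLE = "Available"
    IN_USE = "In Use"
    NOT_AVAILABLE = "Not Available"  # discharged; waiting to recharge


class UncoordinatedBattery:
    """Hypothetical per-server controller; each server runs it independently."""

    def __init__(self, capacity_wh: float, threshold_w: float,
                 recharge_w: float, dod_goal: float = 0.6):
        self.capacity_wh = capacity_wh
        self.threshold_w = threshold_w
        self.recharge_w = recharge_w
        self.dod_goal = dod_goal
        self.discharged_wh = 0.0
        self.state = State.AVAILABLE

    def step(self, server_w: float, dt_h: float) -> float:
        """Advance dt_h hours at server power server_w; return grid power (W)."""
        if self.state is State.AVAILABLE and server_w > self.threshold_w:
            self.state = State.IN_USE
        if self.state is State.IN_USE:
            budget_wh = self.dod_goal * self.capacity_wh - self.discharged_wh
            if server_w <= self.threshold_w or budget_wh <= 0:
                self.state = State.NOT_AVAILABLE  # power dropped or DoD goal reached
            else:
                shaved_wh = min((server_w - self.threshold_w) * dt_h, budget_wh)
                self.discharged_wh += shaved_wh
                return server_w - shaved_wh / dt_h
        if self.state is State.NOT_AVAILABLE:
            # Recharge only if it keeps this server under its threshold.
            if server_w + self.recharge_w <= self.threshold_w:
                charged_wh = min(self.recharge_w * dt_h, self.discharged_wh)
                self.discharged_wh -= charged_wh
                if self.discharged_wh <= 0.0:
                    self.discharged_wh = 0.0
                    self.state = State.AVAILABLE  # recharge complete
                return server_w + charged_wh / dt_h
        return server_w
```

Because every server decides on its own, nothing prevents many batteries from discharging or recharging at the same moment, which is the behavior the following results slides illustrate.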
Uncoordinated Policy Results
25
Round Robin Scheduling
- Batteries discharge when not required
- Batteries recharge during the peak
- Fails to guarantee the budget
[Figure: power vs. time with budget violations marked]
Uncoordinated Policy Results (cont.)
26
Load-aware Scheduling
- Batteries discharge all together (wasteful)
- Batteries recharge all together (violates the budget)
- Fails to guarantee the budget
Coordination is required!
[Figure: power vs. time with budget violations marked]
Coordinated Control
27
Applied at higher levels (PDU, cluster)
- Requires remote battery enable/disable and the ability to initiate recharge
- Number of batteries enabled proportional to the peak magnitude
- Batteries used are spread spatially
[Figure: overall power over Day1 to Day3, with the excess over the cap expressed in server equivalents (0, 100, 200, 300) and per-rack battery usage (rack1, rack2)]
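A sketch of the coordinated decision at a PDU or cluster controller: enable a number of batteries proportional to the power above the cap and spread them across racks. The per-battery discharge power, the state-of-charge ordering, and the function name are hypothetical; the slides do not specify the selection rule.

```python
import math


def plan_discharge(excess_w: float, per_battery_w: float,
                   batteries: list[tuple[str, str, float]]) -> list[str]:
    """Pick batteries to cover the power above the cap.

    batteries: (battery_id, rack_id, state_of_charge) for batteries currently
    allowed to discharge (e.g., not already at their DoD goal).
    """
    if excess_w <= 0:
        return []
    needed = math.ceil(excess_w / per_battery_w)  # proportional to peak magnitude
    by_rack: dict[str, list[tuple[float, str]]] = {}
    for battery_id, rack_id, soc in batteries:
        by_rack.setdefault(rack_id, []).append((soc, battery_id))
    for candidates in by_rack.values():
        candidates.sort(reverse=True)  # fullest battery first
    chosen: list[str] = []
    racks = sorted(by_rack)
    # Round-robin over racks so discharges are spread spatially.
    while len(chosen) < needed and any(by_rack[r] for r in racks):
        for rack in racks:
            if by_rack[rack] and len(chosen) < needed:
                chosen.append(by_rack[rack].pop(0)[1])
    return chosen
```

A matching recharge step would enable recharges only while the aggregate power plus recharge power stays under the budget; that part is left out of the sketch.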
Coordinated Policies
28
[Figure panels: PDU-level and cluster-level coordination]
Power cap close to the ideal: the average power of 250 W
Peak power reduction of 19% → 23% more servers
→ 6.2% TCO/server reduction
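If the number of servers under a fixed power budget scales inversely with the provisioned peak per server (an assumption), the step from a 19% peak reduction to about 23% more servers can be checked with a one-line formula; `extra_servers` is a hypothetical helper.

```python
def extra_servers(peak_reduction: float) -> float:
    """Fractional increase in server count under a fixed power budget when the
    provisioned peak per server drops by `peak_reduction`."""
    if not 0.0 <= peak_reduction < 1.0:
        raise ValueError("peak_reduction must be in [0, 1)")
    return 1.0 / (1.0 - peak_reduction) - 1.0


print(f"{extra_servers(0.19):.1%}")  # 19% peak reduction
```

Under the same assumption, the 37.5% peak reduction reported for energy-proportional servers would correspond to about 60% more servers.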
Discussion: Energy proportionality
29
Modern servers
- Sharper, thinner peaks
- We can shave more power with the same stored energy
[Figure: overall power of energy-proportional servers over Day1 to Day3]
Peak power reduction of up to 37.5% with the 40Ah LFP battery
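Why sharper peaks help can be checked numerically: for a given stored energy, the lowest reachable cap is the smallest C whose area above the power curve fits in that energy. Bisection finds it because the required energy decreases monotonically as C rises. The two hourly profiles and `min_cap_w` are hypothetical; both profiles share a 250 W average, and losses are ignored.

```python
def min_cap_w(power_w: list[float], dt_h: float, energy_wh: float, iters: int = 60) -> float:
    """Smallest cap C with sum(max(0, P - C)) * dt_h <= energy_wh, by bisection."""
    def energy_needed(cap: float) -> float:
        return sum(max(0.0, p - cap) for p in power_w) * dt_h

    lo, hi = min(power_w), max(power_w)  # energy_needed(hi) == 0
    if energy_needed(lo) <= energy_wh:
        return lo
    for _ in range(iters):
        mid = (lo + hi) / 2
        if energy_needed(mid) <= energy_wh:
            hi = mid  # cap is reachable; try lower
        else:
            lo = mid
    return hi


# HYPOTHETICAL hourly profiles with the same 250 W average.
broad = [200.0] * 12 + [300.0] * 12  # wide peak
thin = [225.0] * 20 + [375.0] * 4    # sharper, thinner peak
for name, profile in (("broad", broad), ("thin", thin)):
    cap = min_cap_w(profile, dt_h=1.0, energy_wh=300.0)
    print(name, round(cap, 1), f"{1 - cap / max(profile):.1%} peak reduction")
```

With the same 300 Wh, the thinner peak is cut by a larger fraction of its height (about 20% vs. 8% here).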
Concluding remarks
30
Battery provisioning of distributed UPS topologies to cap power and oversubscribe the data center is beneficial
- Critical to reconsider battery properties (technology, capacity, DoD)
- Coordination of charges and discharges is required
- We cap peak power by 19%, allow 23% more servers, and better amortize capex costs
- Achieve a 6.2% reduction in TCO/server ($15M for a 28K-server DC)

31
BACKUP SLIDES
TCO savings with battery cost
32


- LA is a stable technology
- LFP advancements are expected, driven by electric vehicles
TCO savings increase over time with LFP!
When things go wrong?
33

Scenario 1: Unexpected daily traffic
- We use the additional 35% capacity in our batteries (DoD optimized for TCO savings at 60%)
Scenario 2: Batteries are not replaced immediately
- With 50% of batteries dead, we can still reduce the peak by 15%
- Grouping battery maintenance/replacement for cost savings is possible
Exploration of Dead Batteries
34
Discussion: DVFS
35
To DVFS or not to DVFS?
Datacenter SLA violations are likely during peak load
- DVFS is bad during high demand
- DVFS is great during low demand
- It creates higher margins for aggressive battery capping
[Figure: overall power over Day1 to Day3; with DVFS: potential SLA violation; no DVFS: SLA violation unlikely]
Battery Capacity-Cost Estimation
36
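[Figure: power vs. time; E_shaved is the energy of the peak above the cap, set by the peak reduction and the peak duration]
$$E_{\text{Datacenter,shaved}} = \text{PeakReduction} \times \text{PeakDuration}$$
$$E_{\text{server,shaved}} = \frac{E_{\text{Datacenter,shaved}} \times \text{PSUEff}}{\#\text{servers}}$$
$$C_{\text{battery}} = \frac{E_{\text{server,shaved}}}{V} \times I^{PE-1} \times \frac{1}{DoD} \times \frac{1}{0.8}$$
$$\text{UPSdepreciation} = \frac{C_{\text{battery}} \times \text{CostPerAh} \times \#\text{servers}}{\min\left(\text{ServiceLife},\ \text{Cycles}(DoD)/30\right)}$$
where V is the battery voltage, I the discharge current, and PE the Peukert exponent.
[Figure: LFP vs. Lead Acid battery size; Lead Acid needs about twice the volume]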
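The capacity-cost equations translate directly into code. This sketch reads `PE` as a Peukert exponent and derives the discharge current from the per-server shaved power, both assumptions; voltage, Peukert exponent, service life, and the cycle-life curve are hypothetical inputs, and the constant 0.8 is kept exactly as it appears on the slide.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Battery:
    cost_per_ah: float                   # $/Ah (slides: LFP $5/Ah, LA $2/Ah)
    voltage_v: float                     # HYPOTHETICAL nominal voltage V
    peukert_exp: float                   # HYPOTHETICAL Peukert exponent PE
    service_life_months: float           # HYPOTHETICAL service life
    cycles_at: Callable[[float], float]  # HYPOTHETICAL cycle life as a function of DoD


def battery_capacity_ah(e_server_wh: float, current_a: float, dod: float, b: Battery) -> float:
    """C_battery = E_server,shaved / V * I^(PE-1) * 1/DoD * 1/0.8 (0.8 as on the slide)."""
    return e_server_wh / b.voltage_v * current_a ** (b.peukert_exp - 1) / dod / 0.8


def ups_monthly_depreciation(peak_reduction_w: float, peak_hours: float, n_servers: int,
                             psu_eff: float, dod: float, b: Battery) -> float:
    """UPS depreciation per month for the whole datacenter."""
    e_dc_wh = peak_reduction_w * peak_hours        # E_Datacenter,shaved
    e_server_wh = e_dc_wh * psu_eff / n_servers    # E_server,shaved
    # ASSUMPTION: discharge current = per-server shaved power / battery voltage.
    current_a = peak_reduction_w * psu_eff / n_servers / b.voltage_v
    c_ah = battery_capacity_ah(e_server_wh, current_a, dod, b)
    lifetime_months = min(b.service_life_months, b.cycles_at(dod) / 30)  # one cycle per day
    return c_ah * b.cost_per_ah * n_servers / lifetime_months


if __name__ == "__main__":
    # HYPOTHETICAL LFP-like parameters (only the $5/Ah price is from the slides).
    lfp = Battery(cost_per_ah=5.0, voltage_v=12.0, peukert_exp=1.05,
                  service_life_months=120, cycles_at=lambda d: 2000 * d ** -1.5)
    monthly = ups_monthly_depreciation(peak_reduction_w=2.0e6, peak_hours=4.0,
                                       n_servers=28_000, psu_eff=0.8, dod=0.6, b=lfp)
    print(f"${monthly:,.0f} per month")
```

Sweeping `dod` with this function reproduces the shape of the DoD trade-off: capacity falls as 1/DoD while the cycle-limited lifetime shrinks.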
Battery Related Assumptions
37
Workload partitioning
38
Distributed Algorithm
39