Hybrid Memory in Multi-Core Architectures

HanBin Yoon
Justin Meza
Rachael Harding
hanbinyoon@cmu.edu
meza@cmu.edu
rharding@andrew.cmu.edu

1 Motivation
Today’s main memory relies entirely on DRAM. Advances in process technology have continued to enable DRAM to scale to smaller design rules (hence larger capacities) and to become more efficient. However, experts predict that DRAM scalability is approaching its limit because DRAM represents information as stored charge [6]. Modern systems continue to demand larger amounts of memory, partly owing to the trend toward chip multiprocessing. Supplying this need with DRAM exclusively will quickly become expensive in terms of both power and cost.

Phase-change memory (PCM) offers a competitive alternative to DRAM. Recent research on PCM has brought its performance and power characteristics closer to those of DRAM [2, 3]. Furthermore, PCM is projected to scale to smaller design rules than DRAM, because PCM represents information as resistance rather than charge [6]. However, PCM still has its challenges, such as lower performance and higher power consumption than DRAM [1, 7]. PCM also requires wear-leveling [4]. PCM technology can be used in conjunction with DRAM to enable the high-performance, energy-efficient main memory that multi-core architectures demand in large capacities.

We propose a method of combining the strengths of DRAM and PCM that improves performance, quality of service, and energy-efficiency over existing proposals. Specifically, we plan to dynamically partition the amount of DRAM and PCM assigned to processes in a multi-core environment based on their workload characteristics.

2 Related Work

Prior work has taken a multi-core-agnostic approach to integrating DRAM and PCM for improved energy-efficiency [1]. Though the study addresses some of the issues of incorporating DRAM and PCM, the approach suffers from several shortcomings:

1. It is not scalable: it introduces a PCM page map stored in SRAM in the memory controller, which is tightly coupled to the amount of PCM being used. It does not scale with varying capacities of PCM without hardware alterations.

2. Its hardware use is wasteful: it requires space to store the full PCM page mapping table in SRAM, even though the table may be sparse (for 4GB of PCM with an 8KB page size, the table has 512K entries and requires 4MB of SRAM).

3. It is unrealistic: the study’s restrictive evaluation only covers a single-core system, while current trends in processor development tend towards multi-core systems that exhibit memory contention between multiple applications.

Furthermore, the study does not propose a clear algorithm for DRAM-PCM page allocation or a clear replacement policy.

Other recent research has focused on improving the quality of service (QoS) in entirely PCM-based main memory systems based on predetermined application priorities [7]. While maintaining QoS in a multi-core environment is important to ensure fairness and prevent starvation, we believe that future systems will likely incorporate a mixture of DRAM and PCM due to performance and reliability concerns. Such systems will require innovative techniques to enable QoS.

In [2, 3], Lee et al. propose a wear-leveling technique and introduce multiple row buffers in order to improve PCM’s performance and reliability. Their changes to PCM architecture also improve its power characteristics to make it a more desirable alternative to DRAM. This work is orthogonal to our proposed research and could be used as an underlying hardware implementation of PCM.

Qureshi et al. developed a wear-leveling algorithm for PCM that exhibits low memory overhead [4]. However, if an attacker trying to wear out PCM were to write repeatedly to the same line, Region Based Start-Gap (RBSG) would distribute the wear over only a single start-gap region. Such an approach does not utilize the full endurance capacity of the PCM. Our own approach may extend the existing operating system page table to perform wear-leveling more evenly over the entire PCM.
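The limitation noted for region-based Start-Gap in Related Work can be illustrated with a small simulation. The following is a simplified sketch under stated assumptions, not the implementation from [4]: each region holds N logical lines in N + 1 physical slots and rotates them with a start register and a gap register, and the address randomization layer is omitted. The class names, the region size, and the gap interval are hypothetical and chosen only for illustration.

```python
class StartGapRegion:
    """One region: n logical lines stored in n + 1 physical slots (assumed model)."""

    def __init__(self, n_lines, gap_interval):
        self.n = n_lines
        self.start = 0                      # rotation offset
        self.gap = n_lines                  # index of the unused (gap) slot
        self.gap_interval = gap_interval    # demand writes between gap movements
        self.writes_since_move = 0
        self.wear = [0] * (n_lines + 1)     # write count per physical slot
        # slot_owner[p] is the logical line held in slot p (None marks the gap).
        self.slot_owner = list(range(n_lines)) + [None]

    def translate(self, logical):
        """Map a logical line to its current physical slot."""
        phys = (logical + self.start) % self.n
        return phys + 1 if phys >= self.gap else phys

    def _move_gap(self):
        """Shift one line into the gap; this costs one extra write."""
        if self.gap == 0:
            # Wrap around: the last slot moves to slot 0 and start advances.
            src, dst = self.n, 0
            self.start = (self.start + 1) % self.n
        else:
            src, dst = self.gap - 1, self.gap
        self.slot_owner[dst] = self.slot_owner[src]
        self.slot_owner[src] = None
        self.wear[dst] += 1
        self.gap = src

    def write(self, logical):
        self.wear[self.translate(logical)] += 1
        self.writes_since_move += 1
        if self.writes_since_move == self.gap_interval:
            self.writes_since_move = 0
            self._move_gap()


class RegionedMemory:
    """Memory split into independent Start-Gap regions (no randomization)."""

    def __init__(self, n_regions, lines_per_region, gap_interval):
        self.lines_per_region = lines_per_region
        self.regions = [StartGapRegion(lines_per_region, gap_interval)
                        for _ in range(n_regions)]

    def write(self, logical):
        region, offset = divmod(logical, self.lines_per_region)
        self.regions[region].write(offset)


# Repeatedly write a single logical line, as a wear-out attack would.
mem = RegionedMemory(n_regions=2, lines_per_region=8, gap_interval=4)
for _ in range(10_000):
    mem.write(3)
for i, region in enumerate(mem.regions):
    print(f"region {i}: wear per slot = {region.wear}")
```

In this sketch, the repeated writes are spread across the slots of the attacked line's own region while the other region's counters stay at zero, so the endurance of the untouched region goes unused.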
3 Approach

This research aims to develop and evaluate algorithms and hardware support for hybrid DRAM-PCM systems that provide an effective main memory solution for multi-core architectures. Our anticipated contributions (and implementation ideas) include:

1. Evaluating applications to determine optimal page placement in a hybrid memory system to effectively provide QoS based on application characteristics. We modify the utility-based partitioning approach in [5] to determine per-thread memory access patterns. We aim to speed up all the threads in a multi-core system by striking a balance amongst the memory requirements of threads that have large and small working sets.

2. Developing a dynamic memory mapping algorithm and page allocation and replacement policies for a hybrid memory system. In addition to using application characteristics to determine data placement, we will also identify frequently-accessed, frequently-written “hot” pages and improve performance and energy-efficiency by locating them in DRAM.

3. Developing a novel wear-leveling method for a hybrid memory system. This may be accomplished by extending the operating system’s virtual memory infrastructure to ensure page writes are performed uniformly across PCM.

4. Proposing software and hardware changes to implement the above design goals. For example, adapting operating system page tables and the TLB to address pages stored in PCM.

5. Evaluating the performance and energy-efficiency of our architecture. This includes a sensitivity study of DRAM-to-PCM ratios to find an optimal provisioning of DRAM and PCM.

4 Methodology and Roadmap

We will test our memory-mapping algorithm and hardware improvements on the BLeSS simulator modified to include PCM. We will evaluate our architecture’s performance, QoS, and energy-efficiency using benchmarks from SPEC CPU2006. We will derive a power model based on the data listed in [1].

Milestone 1 Consider possible page migration policies, wear-leveling algorithms, and quality of service algorithms. Modify the BLeSS simulator, adding support for PCM and page tables.

Milestone 2 Simulate SPEC CPU2006 workload traces under a modified system architecture and analyze the intermediate results.

Milestone 3 Refine our mechanisms based on simulation results. Perform a sensitivity study, varying the ratio of system DRAM-to-PCM and observing the resulting system characteristics. Begin drafting the paper.

References

[1] G. Dhiman, R. Ayoub, and T. Rosing. PDRAM: a hybrid PRAM and DRAM main memory system. In DAC ’09: Proceedings of the 46th Annual Design Automation Conference, pages 664–669, New York, NY, USA, 2009. ACM.

[2] B. C. Lee, E. Ipek, O. Mutlu, and D. Burger. Architecting phase change memory as a scalable DRAM alternative. In ISCA ’09: Proceedings of the 36th Annual International Symposium on Computer Architecture, pages 2–13, New York, NY, USA, 2009. ACM.

[3] B. C. Lee, P. Zhou, J. Yang, Y. Zhang, B. Zhao, E. Ipek, O. Mutlu, and D. Burger. Phase-change technology and the future of main memory. IEEE Micro, 30(1):143–143, 2010.

[4] M. K. Qureshi, J. Karidis, M. Franceschini, V. Srinivasan, L. Lastras, and B. Abali. Enhancing lifetime and security of PCM-based main memory with start-gap wear leveling. In MICRO 42: Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture, pages 14–23, New York, NY, USA, 2009. ACM.

[5] M. K. Qureshi and Y. N. Patt. Utility-based cache partitioning: A low-overhead, high-performance, runtime mechanism to partition shared caches. In MICRO 39: Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture, pages 423–432, Washington, DC, USA, 2006. IEEE Computer Society.

[6] Semiconductor Industry Association. International Technology Roadmap for Semiconductors: Process Integration, Devices, and Structures. 2007.

[7] P. Zhou, Y. Du, Y. Zhang, and J. Yang. Fine-grained QoS scheduling for PCM-based main memory systems. Pages 1–12, Apr. 2010.