CS-447– Computer Architecture
M,W 10-11:20am
Lecture 6
Performance
September 17, 2007
Karem Sakallah
ksakalla@qatar.cmu.edu
www.qatar.cmu.edu/~msakr/15447-f07/
15-447 Computer Architecture
Fall 2007 ©
Today
°Lecture & Discussion
°Next Lecture: Review
Done by now
°Read the chapters & slides.
°Practice the performance examples in the
Patterson book.
15-447 Computer Architecture
Fall 2007 ©
Assessing & Understanding Performance
This chapter discusses how to
measure, report, and summarize
performance of a computer.
15-447 Computer Architecture
Fall 2007 ©
Motivation
It is often helpful to have some yardstick by
which to compare systems
• During development to evaluate different
algorithms or optimizations
• During purchasing to compare between
product offerings
•…
15-447 Computer Architecture
Fall 2007 ©
Performance
° Measure, Report, and Summarize
° Make intelligent choices
° See through the marketing hype
° Key to understanding underlying organizational
motivation
15-447 Computer Architecture
Fall 2007 ©
Performance
Why is some hardware better than others for
different programs?
What factors of system performance are
hardware related?
(e.g., Do we need a new machine, or a new
operating system?)
How does the machine's instruction set affect
performance?
15-447 Computer Architecture
Fall 2007 ©
Which of these airplanes has the best performance?
Airplane
Boeing 737-100
Boeing 747
BAC/Sud Concorde
Douglas DC-8-50
Passengers
Range (mi)
101
470
132
146
630
4150
4000
8720
Speed (mph)
598
610
1350
544
°How much faster is the Concorde compared to the 747?
°How much bigger is the 747 than the Douglas DC-8?
15-447 Computer Architecture
Fall 2007 ©
Computer Performance
° Response Time (latency)
— How long does it take for my job to run?
— How long does it take to execute a job?
— How long must I wait for the database query?
° Throughput
— How many jobs can the machine run at once?
— What is the average execution rate?
— How much work is getting done?
15-447 Computer Architecture
Fall 2007 ©
Execution Time
° Elapsed Time
• counts everything
(disk and memory accesses, I/O , etc.)
• a useful number, but often not good for
comparison purposes
15-447 Computer Architecture
Fall 2007 ©
Execution Time
° CPU time
• doesn't count I/O or time spent running other
programs
• can be broken up into system time, and user time
• Our focus: user CPU time
• time spent executing the lines of code that are
"in" our program
15-447 Computer Architecture
Fall 2007 ©
Definition of Performance
° For some program running on machine X,
PerformanceX = 1 / Execution timeX
"X is n times faster than Y"
PerformanceX / PerformanceY = n
15-447 Computer Architecture
Fall 2007 ©
Definition of Performance
Problem:
• machine A runs a program in 20 seconds
• machine B runs the same program in 25
seconds
15-447 Computer Architecture
Fall 2007 ©
Comparing and Summarizing Performance
Program1(sec)
Program2(sec)
Total time (sec)
Computer A
1
1000
1001
Computer B
10
100
110
How to compare the performance?
Total Execution Time : A Consistent Summary Measure
Performance B Execution TimeA 1001
9.1
Performance A Execution TimeB 110
15-447 Computer Architecture
Fall 2007 ©
Clock Cycles
° Instead of reporting execution time in seconds,
we often use cycles
seconds
cycles
seconds
program program
cycle
° Clock “ticks” indicate when to start activities
(one abstraction):
time
15-447 Computer Architecture
Fall 2007 ©
Clock cycles
° cycle time = time between ticks = seconds per cycle
° clock rate (frequency) = cycles per second
(1 Hz = 1 cycle/sec)
A 4 Ghz clock has a
250ps cycle time
15-447 Computer Architecture
Fall 2007 ©
CPU Execution Time
CPU execution time for a program
(CPU clock cycles for a program) x (clock cycle time)
Seconds
Cycles
Seconds
Program Program Cycle
cycles
cycle
/
Program seconds
cycle / sec onds clock rate
15-447 Computer Architecture
Fall 2007 ©
How to Improve Performance
seconds
cycles
seconds
program
program
cycle
So, to improve performance (everything else being equal) you
can either increase or decrease?
________ the # of required cycles for a program, or
________ the clock cycle time or, said another way,
________ the clock rate.
15-447 Computer Architecture
Fall 2007 ©
How to Improve Performance
seconds
cycles
seconds
program
program
cycle
So, to improve performance (everything else being equal)
you can either increase or decrease?
_decrease_ the # of required cycles for a program, or
_decrease_ the clock cycle time or, said another way,
_increase_ the clock rate.
15-447 Computer Architecture
Fall 2007 ©
How many cycles are required for a program?
...
6th
5th
4th
3rd instruction
2nd instruction
1st instruction
Could assume that # of cycles equals # of instruction
time
This assumption is incorrect, different instructions
take different amounts of time on different machines.
15-447 Computer Architecture
Fall 2007 ©
Different numbers of cycles for different instructions
time
° Multiplication takes more time than addition
° Floating point operations take longer than integer ones
° Accessing memory takes more time than accessing
registers
° Important point: changing the cycle time often
changes the number of cycles required for various
instructions
15-447 Computer Architecture
Fall 2007 ©
Now that we understand cycles
Components of Performance Units of Measure
CPU execution time for a
Seconds for the program
program
Instruction count
Instructions executed for the
program
Clock Cycles per Instruction
Average number of clock
(CPI)
cycles per instruction
Clock cycle time
Seconds per clock cycle
15-447 Computer Architecture
Fall 2007 ©
CPI
CPU clock cycles = Instructions for a program
x Average clock cycles per Instruction (CPI)
CPU time = Instruction count x CPI x clock cycle time
Instruction count CPI
Clock rate
15-447 Computer Architecture
Fall 2007 ©
Performance
° Performance is determined by execution time
° Do any of the other variables equal performance?
• # of cycles to execute program?
• # of instructions in program?
• # of cycles per second?
• average # of cycles per instruction?
• average # of instructions per second?
° Common pitfall: thinking one of the variables is indicative of
performance when it really isn’t.
15-447 Computer Architecture
Fall 2007 ©
CPU Clock Cycles
n
CPU clock cycles (CPIi Ci )
i 1
CPIi : the average number of cycles per instructions for that
instruction class
Ci : the count of the number of instructions of class i executed.
n : the number of instruction classes.
15-447 Computer Architecture
Fall 2007 ©
Example
°Instruction Classes:
• Add
• Multiply
°Average Clock Cycles per Instruction:
• Add 1cc
• Mul 3cc
°Program A executed:
• 10 Add instructions
• 5 Multiply instructions
15-447 Computer Architecture
Fall 2007 ©
Benchmarks
° Performance best determined by running a real application
• Use programs typical of expected workload
• Or, typical of expected class of applications
ex: compilers/editors, scientific applications, graphics
° Small benchmarks
• nice for architects and designers
• easy to standardize
• can be abused
15-447 Computer Architecture
Fall 2007 ©
Benchmarks (2)
° SPEC (Standard Performance Evaluation Corporation)
• companies have agreed on a set of real programs
and inputs
• valuable indicator of performance (and compiler
technology)
• can still be abused
15-447 Computer Architecture
Fall 2007 ©
Standard Performance Evaluation Corporation
• SPEC is supported by a number of computer vendors to create
standard sets of benchmarks for modern computer systems.
• The SPEC benchmark sets include CPU performance, graphics,
High-performance computing, Object-oriented computing, Java
applications, Client-server models, Mail systems, File systems, and
Web servers.
15-447 Computer Architecture
Fall 2007 ©
SPEC ‘89
° Compiler “enhancements” and
performance
800
700
SPEC performance ratio
600
500
400
300
200
100
0
gcc
espresso
spice
doduc
nasa7
li
eqntott
matrix300
fpppp
tomcatv
Benchmark
Compiler
Enhanced compiler
15-447 Computer Architecture
Fall 2007 ©
SPEC CPU Benchmarks
the execution time of Sun Ultra5 [300MHz]
SPEC ratio
the execution time on the measured computer
CINT2000 : the SPEC ratio for the integer benchmark sets
CFP2000 : the SPEC ratio for the floating-point benchmark
sets.
15-447 Computer Architecture
Fall 2007 ©
SPEC 2000
Does doubling the clock rate double the performance?
Can a machine with a slower clock rate have better
performance?
1400
1200
Pentium 4 CFP2000
1000
Pentium 4 CINT2000
800
600
Pentium III CINT2000
400
Pentium III CFP2000
200
0
500
1000
1500
2000
2500
3000
3500
Clock rate in MHz
15-447 Computer Architecture
Fall 2007 ©
SPEC 2000
Does doubling the clock rate double the performance?
Can a machine with a slower clock rate have better
performance?
1.6
Pentium M @ 1.6/0.6 GHz
Pentium 4-M @ 2.4/1.2 GHz
1.4
Pentium III-M @ 1.2/0.8 GHz
1.2
1.0
0.8
0.6
0.4
0.2
0.0
SPECINT2000 SPECFP2000 SPECINT2000 SPECFP2000 SPECINT2000 SPECFP2000
Always on/maximum clock
Laptop mode/adaptive
clock
Minimum power/minimum
clock
Benchmark and power mode
15-447 Computer Architecture
Fall 2007 ©
Amdahl's Law
Execution Time After Improvement =
Execution Time Unaffected +
( Execution Time Affected / Amount of Improvement )
15-447 Computer Architecture
Fall 2007 ©
Example
°Application execution time = 20sec
• 12 seconds are spent performing add
operations
°If we improve the add operation to run
twice as fast, how much faster will the
application run?
15-447 Computer Architecture
Fall 2007 ©
Amdahl’s Law
° Example:
"Suppose a program runs in 100 seconds on a
machine, with multiply responsible for 80 seconds
of this time. How much do we have to improve the
speed of multiplication if we want the program to
run 4 times faster?"
15-447 Computer Architecture
Fall 2007 ©
Amdahl's Law
Execution time after improvement
80 seconds
(100 80 seconds)
n
100
Execution time after improvemen t
25
4
80
20, n 16
n
15-447 Computer Architecture
Fall 2007 ©
MIPS (million instructions per second)
Example
Instructio n count
MIPS
Execution time 106
Instruction Counts (in billions)
for each instruction set
Code from
Compiler 1
A (1 CPI)
B (2 CPI)
C (3 CPI)
5
1
1
Compiler 2
10
Clock rate = 4GHz
1
1
A,B,C : Instruction Classes
• Which code sequence will execute faster according to MIPS?
• According to execution time?
15-447 Computer Architecture
Fall 2007 ©
Execution time & MIPS
CPU clock cycles1 = (5 x 1+1 x 2+1 x 3) x 109 = 10 x 109
CPU clock cycles2 = (10 x 1+1 x 2+1 x 3) x 109 = 15 x 109
10 10 9
2.5seconds
Execution time1
9
4 10
15 109
3 . 75 seconds
Execution time2
9
4 10
15-447 Computer Architecture
Fall 2007 ©
Execution time & MIPS (2)
(5 1 1) 109
MIPS 1
2800
6
2.5 seconsd 10
(10 1 1) 10
MIPS 2
3200
6
3.75 10
9
15-447 Computer Architecture
Fall 2007 ©
Performance Evaluation
° Performance depends on
• Hardware architecture
• Software environment
° Meaning of performance depends on
viewpoint
• User: time
• System Manager: throughput
15-447 Computer Architecture
Fall 2007 ©
Performance Evaluation
°Kinds of Performance
• Graphics
• Network
• Transactional
• Multi-user system
• I/O
• Scientific/Engineering codes
15-447 Computer Architecture
Fall 2007 ©
Example on the MIPS R10K
Prof run at: Tue Apr 28 15:50:26 1998
Command line: prof suboptim.ideal.m28293
109148754:
0.55974s:
77660914:
1.405:
195:
R10000:
cycles(%)
cum %
Total number of cycles
Total execution time
Total number of instructions executed
Ratio of cycles / instruction
Clock rate in MHz
Target processor modelled
secs
61901843(56.71) 56.71
47212563(43.26) 99.97
31767( 0.03) 100.00
1069( 0.00) 100.00
:
:
instrns
0.32
0.24
0.00
0.00
:
calls procedure
45113360
32523280
21523
887
:
15-447 Computer Architecture
1
1
1
3
:
pdot
init
vsum
fflush
:
Fall 2007 ©