ISR - patel group of institutions

PATEL GROUP OF INSTITUTIONS
Embedded System Material Unit Wise
Subject:-Software Development for Embedded system (SD-ES)
Subject Code: - 650012
Unit 1:- Introduction
Q.1 What is an embedded system? List applications where embedded systems are used.
ANS: An embedded system is a single-purpose computer built into a larger system to control and
monitor that system, or to perform a specific task.
– Computing systems embedded within electronic devices
– Hard to define; nearly any computing system other than a desktop computer
– Billions of units produced yearly, versus millions of desktop units
– Perhaps 50 per household and per automobile
Embedded systems are integrated systems, each designed for a specific function. They contain
integrated hardware with software loaded into its memory. Simple examples are cell phones, smart
cards, DVD players, digital cameras, assembly-line robots, automotive control systems, guided
missiles and satellites; the list is unending.
[Figure: a “short list” of embedded systems]
Q.2 Explain the characteristics of an embedded system
ANS: Three main characteristics distinguish embedded systems from other computing systems:
(1) single-functioned, (2) tightly constrained, and (3) reactive and real-time.
(1) Single-functioned: Most embedded systems execute a specific function repeatedly. [Exceptions
exist: some systems update their programs, and some swap several programs in and out due to size
limitations.]
(2) Tightly constrained: Most embedded systems have constraints on design metrics such as cost,
size, performance and power. An embedded system often must cost little, be small enough to fit on
a single chip, perform fast enough to process data in real time, and consume minimal power.
(3) Reactive and real-time: Many embedded systems must continuously react to changes in the
system’s environment and must compute certain results in real time without delay. [Example: a
car’s cruise controller, which is an embedded system, continuously monitors and reacts to speed
and brake sensors. It must compute acceleration and deceleration values repeatedly within a
limited time; a delayed computation could result in a failure to maintain control of the car.]
An embedded system example: a digital camera
• Single-functioned: always a digital camera
• Tightly constrained: low cost, low power, small, fast
• Reactive and real-time: only to a small extent
Q.3 Write a short note on the design challenge of optimizing design metrics
ANS:
• Obvious design goal:
– Construct an implementation with desired functionality
• Key design challenge:
– Simultaneously optimize numerous design metrics
• Design metric
– A measurable feature of a system’s implementation
– Optimizing design metrics is a key challenge
• Common metrics
– Unit cost: the monetary cost of manufacturing each copy of the system, excluding NRE cost
– NRE cost (Non-Recurring Engineering cost): the one-time monetary cost of designing the system
– Size: the physical space required by the system
– Performance: the execution time or throughput of the system
– Power: the amount of power consumed by the system
– Flexibility: the ability to change the functionality of the system without incurring heavy
NRE cost
– Time-to-prototype: the time needed to build a working version of the system
– Time-to-market: the time required to develop a system to the point that it can be released
and sold to customers
– Maintainability: the ability to modify the system after its initial release
– Correctness, safety, many more
Time-to-market: a demanding design metric
Losses due to delayed market entry
NRE and unit cost metrics
[Figure: two plots for technologies A, B and C: total cost (x1000) versus number of units
(volume), and per-product cost versus number of units (volume).]
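The definitions of NRE cost and unit cost given above imply that total cost = NRE cost + unit cost x number of units, and per-product cost = total cost / number of units. The sketch below works this out for three technologies. The NRE and unit cost figures for A, B and C are illustrative assumptions, not values read from the plots.

```python
def total_cost(nre, unit_cost, units):
    """Total cost = one-time NRE cost + unit cost for every copy built."""
    return nre + unit_cost * units


def per_product_cost(nre, unit_cost, units):
    """Per-product cost = total cost spread over all units."""
    return total_cost(nre, unit_cost, units) / units


# Hypothetical (NRE cost, unit cost) pairs in dollars; assumed for illustration.
TECHNOLOGIES = {"A": (2_000, 100), "B": (30_000, 30), "C": (100_000, 2)}


def cheapest(units):
    """Name of the technology with the lowest total cost at this volume."""
    return min(TECHNOLOGIES, key=lambda name: total_cost(*TECHNOLOGIES[name], units))


for volume in (100, 1000, 5000):
    print(volume, cheapest(volume))
```

With these assumed numbers, the low-NRE technology wins at small volumes and the high-NRE, low-unit-cost technology wins at large volumes, which is the general shape such plots show.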
The performance design metric
• Widely used measure of a system, and widely abused
– Clock frequency and instructions per second are not good measures
– Digital camera example: a user cares about how fast it processes images, not clock speed or
instructions per second
• Latency (response time)
– Time between task start and end
– e.g., cameras A and B each process an image in 0.25 seconds
• Throughput
– Tasks per second, e.g., camera A processes 4 images per second
– Throughput can be more than latency seems to imply due to concurrency, e.g., camera B may
process 8 images per second (by capturing a new image while the previous image is being stored)
• Speedup of B over A = B’s performance / A’s performance
– Throughput speedup = 8/4 = 2
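The camera figures above can be checked with a few lines of arithmetic. This is a minimal sketch; the function names are chosen here for illustration.

```python
def throughput_without_overlap(latency_seconds):
    """Tasks per second when each task must finish before the next starts."""
    return 1 / latency_seconds


def speedup(performance_b, performance_a):
    """Speedup of B over A = B's performance / A's performance."""
    return performance_b / performance_a


latency_a = latency_b = 0.25                           # seconds per image
throughput_a = throughput_without_overlap(latency_a)   # 4 images per second
throughput_b = 8.0   # camera B overlaps capture and storage (figure from the text)

print(speedup(throughput_b, throughput_a))             # throughput speedup
print(speedup(1 / latency_b, 1 / latency_a))           # latency-based speedup
```

Both cameras have the same latency, so the latency-based speedup is 1 even though the throughput speedup is 2.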
Q.4 Discuss and explain processor technology in embedded systems
OR
Explain the types of processor technology
ANS: Processor technology is the architecture of the computation engine used to implement a
system’s desired functionality.
– The processor is a part of the whole system, not the complete system
– A processor does not have to be programmable; “processor” does not necessarily mean a
general-purpose programmable processor
– A processor is something that processes data: input, processing, output
General-purpose processors:
• Programmable device used in a variety of applications
– Also known as “microprocessor”
• Features
– Program memory
– General datapath with large register file and general ALU
• User benefits
– Low time-to-market and NRE costs
– High flexibility
• “Pentium” is the most well-known, but there are hundreds of others
Single-purpose processors:
• Digital circuit designed to perform one particular computation task
• Features
– Contains only the components needed for that task
– No program memory
• Benefits
– Fast, small, low power
• Drawbacks
– High NRE cost, longer time-to-market, less flexible
Application-specific processors:
• Programmable processor optimized for a particular class of applications having common
characteristics
– Compromise between general-purpose and single-purpose processors
• Features
– Program memory
– Optimized datapath
– Special functional units
• Benefits
– Some flexibility, good performance, size and power
Q.5 Explain IC technology and its types
ANS: IC technology is the manner in which a digital (gate-level) implementation is mapped onto
an IC.
– IC: integrated circuit, or “chip”
– IC technologies differ in their customization to a design
– ICs consist of numerous layers (perhaps 10 or more)
– IC technologies differ with respect to who builds each layer and when
Three types of IC technologies:
• Full-custom/VLSI
• Semi-custom ASIC (gate array and standard cell)
• PLD (Programmable Logic Device)
Full-custom/VLSI:
• All layers are optimized for an embedded system’s particular digital implementation
– Placing transistors
– Sizing transistors
– Routing wires
• Benefits
– Excellent performance, small size, low power
• Drawbacks
– High NRE cost (e.g., $300k), long time-to-market
Semi-custom ASIC (gate array and standard cell):
• Lower layers are fully or partially built
– Designers are left with routing of wires and maybe placing some blocks
• Benefits
– Good performance, good size, less NRE cost than a full-custom implementation (perhaps
$10k to $100k)
• Drawbacks
– Still requires weeks to months to develop
PLD (Programmable Logic Device):
• All layers already exist
– Designers can purchase an IC
– Connections on the IC are either created or destroyed to implement desired functionality
– Field-Programmable Gate Array (FPGA) very popular
• Benefits
– Low NRE costs, almost instant IC availability
• Drawbacks
– Bigger, expensive (perhaps $30 per unit), power hungry, slower
Q.6 Write a short note on design technology
ANS: Design technology is the manner in which we convert our concept of desired system
functionality into an implementation.
• Compilation/Synthesis
– Automates exploration and insertion of implementation details for the lower level
• Libraries/IP
– Incorporates a pre-designed implementation from a lower abstraction level into a higher level
• Test/Verification
– Ensures correct functionality at each level, thus reducing costly iterations between levels
The co-design ladder :-
Q.7 Discuss various trade-offs for embedded systems
ANS:-
Basic tradeoff and independence of processor and IC technologies:
– General vs. custom
– With respect to processor technology or IC technology
– The two technologies are independent
Design productivity gap:
• While designer productivity has grown at an impressive rate over the past decades, the rate of
improvement has not kept pace with chip capacity
• A 1981 leading-edge chip required 100 designer months
– 10,000 transistors / 100 transistors/month
• A 2002 leading-edge chip requires 30,000 designer months
– 150,000,000 transistors / 5,000 transistors/month
• Designer cost increased from $1M to $300M
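The designer-month figures above follow from dividing chip size by designer productivity. A short worked check is sketched below. The cost of $10,000 per designer month is not stated in the notes; it is an assumption inferred from $1M for 100 designer months.

```python
def designer_months(transistors, transistors_per_month):
    """Effort needed = chip size / productivity of one designer."""
    return transistors / transistors_per_month


# Assumed rate in dollars, inferred from $1M / 100 designer months.
COST_PER_DESIGNER_MONTH = 10_000

months_1981 = designer_months(10_000, 100)
months_2002 = designer_months(150_000_000, 5_000)

print(months_1981, months_1981 * COST_PER_DESIGNER_MONTH)
print(months_2002, months_2002 * COST_PER_DESIGNER_MONTH)
```

Productivity grew 50 times (100 to 5,000 transistors/month) while chip capacity grew 15,000 times, which is why the required effort rose 300 times.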
Unit 2:- Custom Single Purpose Processors: Hardware
Q.1 What is a custom single-purpose processor?
ANS: A processor is a digital circuit that performs a computation task; it consists of a
controller and a datapath.
– Single-purpose: performs one particular computation task
– Custom single-purpose: performs a non-standard task
• A custom single-purpose processor may be
– Fast, small, low power
– But high NRE cost, longer time-to-market, less flexible
Q.2 Explain combinational logic in detail
ANS: Combinational logic is built up from two things: (1) transistors and (2) gates.
(1) Transistors
A transistor is the basic electrical component of digital systems. Combinations of transistors
form more abstract components called logic gates, which designers primarily use when building
digital systems.
A transistor acts as a simple on/off switch. One type of transistor is the CMOS (Complementary
Metal Oxide Semiconductor) transistor.
Circuit of transistors:
– The voltage at the “gate” controls whether current flows from source to drain
(2) Gates
When a high voltage (typically +5 Volts, which we'll refer to as logic 1) is applied to the gate,
the transistor conducts, so current flows. When a low voltage (which we'll refer to as logic 0,
typically ground, which is drawn as several horizontal lines of decreasing width) is applied to
the gate, the transistor does not conduct. We can also build a transistor with the opposite
functionality: when logic 0 is applied to the gate, the transistor conducts, and when logic 1 is
applied, the transistor does not conduct. Given these two basic transistors, we can easily build
a circuit whose output inverts its gate input.
The following gates are implemented with transistors:
Basic logic gates:-
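The two transistor types described above can be modelled as switches to see how an inverter arises. The sketch below is a logical model only (no voltages or timing), and the function names are chosen here for illustration. The NAND arrangement, with two transistors in series pulling the output to 0, is the standard CMOS structure and is included as an extra example.

```python
def nmos_conducts(gate):
    """Transistor that conducts when logic 1 is applied to its gate."""
    return gate == 1


def pmos_conducts(gate):
    """Opposite transistor: conducts when logic 0 is applied to its gate."""
    return gate == 0


def inverter(x):
    """One transistor connects the output to 1, the other connects it to 0."""
    if pmos_conducts(x):      # input 0: output is connected to logic 1
        return 1
    assert nmos_conducts(x)   # input 1: output is connected to logic 0
    return 0


def nand(x, y):
    """Output is pulled to 0 only when both series transistors conduct."""
    pull_down = nmos_conducts(x) and nmos_conducts(y)
    return 0 if pull_down else 1


for x in (0, 1):
    print(x, inverter(x))
```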
Q.3 Write a short note on basic combinational logic design
ANS: A combinational circuit is a digital circuit whose output is purely a function of its current
inputs; such a circuit has no memory of past inputs. We can apply a simple technique to design a
combinational circuit using our basic logic gates.
Q.4 Give the details of RT-level combinational components
ANS:-
A multiplexor, sometimes called a selector, allows only one of its data inputs Im to pass through to the
output O. Thus, a multiplexor acts much like a railroad switch, allowing only one of multiple input tracks
to connect to a single output track. If there are m data inputs, then there are log2(m) select lines S, and
we call this an m-by-1 multiplexor (m data inputs, one data output). The binary value of S determines
which data input passes through; 00...00 means I0 may pass, 00...01 means I1 may pass, 00...10 means
I2 may pass, and so on. For example, an 8x1 multiplexor has 8 data inputs and thus 3 select
lines. If those three select lines have values of 110, then I6 will pass through to the output. So if I6 is 1,
then the output would be 1; if I6 is 0, then the output would be 0. We commonly use a more complex
device called an n-bit multiplexor, in which each data
input, as well as the output, consists of n lines. Suppose the previous example used a 4-bit
8x1 multiplexor. Thus, if I6 is 0110, then the output would be 0110. Note that n does not
affect the number of select lines.
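The multiplexor behaviour described above can be sketched in a few lines. This is a behavioural model, not a gate-level design; data inputs are held in a list and the select value is written as a bit string, both choices made here for illustration.

```python
def select_lines(m):
    """Number of select lines needed for m data inputs, i.e. ceil(log2(m))."""
    return (m - 1).bit_length()


def mux(inputs, select):
    """Pass through the data input chosen by the binary value of select."""
    if len(select) != select_lines(len(inputs)):
        raise ValueError("wrong number of select lines")
    return inputs[int(select, 2)]


# 4-bit 8x1 multiplexor: each data input is an n-bit value (n = 4).
data = ["0000"] * 8
data[6] = "0110"
print(select_lines(8), mux(data, "110"))
```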
A decoder converts its binary input I into a one-hot output O. "One-hot" means that exactly one of the
output lines can be 1 at a given time. Thus, if there are n outputs, then there must be log2(n) inputs. We
call this a log2(n)xn decoder. For example, a 3x8 decoder has 3 inputs and 8 outputs. If the input is 000,
then the output O0 will be 1. If the input is 001, then the output O1 would be 1, and so on. A common
feature on a decoder is an extra input called enable. When enable is 0, all outputs are 0. When enable is
1, the decoder functions as before.
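A behavioural sketch of the decoder with its enable input follows. The input is written as a bit string and the outputs as a list O0, O1, ...; this representation is chosen here for illustration.

```python
def decoder(input_bits, enable=1):
    """Convert a binary input into a one-hot list of outputs."""
    outputs = [0] * (2 ** len(input_bits))   # log2(n) inputs give n outputs
    if enable:
        outputs[int(input_bits, 2)] = 1      # exactly one output line is 1
    return outputs


print(decoder("000"))
print(decoder("001"))
print(decoder("001", enable=0))
```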
An adder adds two n-bit binary inputs A and B, generating an n-bit output sum along with an output
carry. For example, a 4-bit adder would have a 4-bit A input, a 4-bit B input, a 4-bit sum output,
and a 1-bit carry output. If A is 1010 and B is 1001, then sum would be 0011 and carry would be 1.
A comparator compares two n-bit binary inputs A and B, generating outputs that indicate whether A is
less than, equal to, or greater than B. If A is 1010 and B is 1001, then less would be 0, equal would be 0,
and greater would be 1.
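The adder and comparator examples above can be verified with a short behavioural model. Inputs are plain integers and n is the bit width; the function names are chosen here for illustration.

```python
def adder(a, b, n):
    """Add two n-bit values; return (n-bit sum, 1-bit carry)."""
    total = a + b
    return total & ((1 << n) - 1), total >> n


def comparator(a, b):
    """Return the (less, equal, greater) outputs as 0/1 values."""
    return int(a < b), int(a == b), int(a > b)


sum_out, carry = adder(0b1010, 0b1001, 4)
print(format(sum_out, "04b"), carry)
print(comparator(0b1010, 0b1001))
```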
An ALU (arithmetic-logic unit) can perform a variety of arithmetic and logic functions on its n-bit inputs A
and B. The select lines S choose the current function; if there are m possible functions, then there must
be at least log2(m) select lines. Common functions include addition, subtraction, AND, and OR.
Q.5 What is sequential logic?
ANS: A sequential circuit is a digital circuit whose outputs are a function of the current as well
as previous input values. In other words, sequential logic possesses memory. One of the most basic
sequential circuits is the flip-flop. A flip-flop stores a single bit. The simplest type of
flip-flop is the D flip-flop. It has two inputs: D and clock. When clock is 1, the value of D is
stored in the flip-flop, and that value appears at an output Q. When clock is 0, the value of D is
ignored; the output Q maintains its value. Another type of flip-flop is the SR flip-flop, which has
three inputs: S, R and clock. When clock is 0, the previously stored bit is maintained and appears
at output Q. When clock is 1, the inputs S and R are examined. If S is 1, a 1 is stored. If R is 1,
a 0 is stored. If both are 0, there’s no change. If both are 1, behavior is undefined. Thus, S
stands for set and R for reset. Another flip-flop type is the JK flip-flop, which is the same as an
SR flip-flop except that when both J and K are 1, the stored bit toggles from 1 to 0 or 0 to 1. To
prevent unexpected behavior from signal glitches, flip-flops are typically designed to be
edge-triggered, meaning they only pay attention to their non-clock inputs when the clock is rising
from 0 to 1, or alternatively when the clock is falling from 1 to 0.
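A rising-edge-triggered D flip-flop and the JK next-state rule can be sketched as follows. This is a behavioural model with no timing; the class and method names are chosen here for illustration.

```python
class DFlipFlop:
    """Rising-edge-triggered D flip-flop storing a single bit."""

    def __init__(self):
        self.q = 0
        self._previous_clock = 0

    def tick(self, d, clock):
        """Apply inputs; D is stored only when clock rises from 0 to 1."""
        if self._previous_clock == 0 and clock == 1:
            self.q = d
        self._previous_clock = clock
        return self.q


def jk_next(q, j, k):
    """Next stored bit of a JK flip-flop on an active clock edge."""
    if j and k:
        return 1 - q   # both 1: toggle
    if j:
        return 1       # set
    if k:
        return 0       # reset
    return q           # both 0: no change


ff = DFlipFlop()
print([ff.tick(d, clk) for d, clk in [(1, 0), (1, 1), (0, 1), (0, 0), (0, 1)]])
```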
Q.6 Explain RT-level sequential components
ANS: RT-level sequential logic has the following components.
A register stores n bits from its n-bit data input I, with those stored bits appearing at its output O. A
register usually has at least two control inputs, clock and load. For a rising-edge-triggered register, the
inputs I are only stored when load is 1 and clock is rising from 0 to 1. The clock input is usually drawn as
a small triangle, as shown in the figure. Another common register control input is clear, which resets all
bits to 0, regardless of the value of I. Because all n bits of the register can be stored in parallel, we often
refer to this type of register as a parallel-load register, to distinguish it from a shift register, which we
now describe.
A shift register stores n bits, but these bits cannot be stored in parallel. Instead, they must be shifted
into the register serially, meaning one bit per clock edge. A shift register has a one-bit data input I, and
at least two control inputs clock and shift. When clock is rising and shift is 1, the value of I is stored in the
(n)’th bit, while the (n)’th bit is stored in the (n-1)’th bit, and likewise, until the second bit is stored in the
first bit. The first bit is typically shifted out, meaning it appears over an output Q.
A counter is a register that can also increment (add binary 1) to its stored binary value. In its simplest
form, a counter has a clear input, which resets all stored bits to 0, and a count input, which enables
incrementing on the clock edge. A counter often also has a parallel load data input and associated
control signal. A common counter feature is both up and down counting (incrementing and
decrementing), requiring an additional control input to indicate the count direction.
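The shift register and counter described above can be modelled as below, with each method call standing for one active clock edge. The wrap-around of the counter at its maximum value is an assumption made here; the notes do not say what happens on overflow.

```python
class ShiftRegister:
    """n-bit shift register; bits[0] is the first bit, bits[-1] the n'th."""

    def __init__(self, n):
        self.bits = [0] * n

    @property
    def q(self):
        """Output Q shows the first bit."""
        return self.bits[0]

    def clock_edge(self, i, shift=1):
        """On a clock edge with shift=1, I enters the n'th bit and all bits move down."""
        if shift:
            self.bits = self.bits[1:] + [i]
        return self.q


class Counter:
    """n-bit up/down counter with clear and count control inputs."""

    def __init__(self, n):
        self.n = n
        self.value = 0

    def clock_edge(self, count=1, clear=0, up=1):
        if clear:
            self.value = 0
        elif count:
            step = 1 if up else -1
            self.value = (self.value + step) % (1 << self.n)  # assumed wrap-around
        return self.value


sr = ShiftRegister(4)
for bit in (1, 0, 1, 1):
    sr.clock_edge(bit)
print(sr.bits, sr.q)
```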
Q.7 Explain sequential logic design
ANS: Sequential logic design can be achieved using a straightforward technique, whose steps are
illustrated in Figure 4.1. We again start with a problem description. We translate this description
to a state diagram. We describe state diagrams further in a later chapter. Briefly, each state
represents the current "mode" of the circuit, serving as the circuit’s memory of past input values.
The desired output values are listed next to each state. The input conditions that cause a
transition from one state to another are shown next to each arc. Each arc condition is implicitly
AND’ed with a rising (or falling) clock edge. In other words, all inputs are synchronous. State
diagrams can also describe asynchronous systems, but we do not cover such systems in this book,
since they are not common.
We will implement this state diagram using a register to store the current state, and combinational
logic to generate the output values and the next state. We assign each state a unique binary value,
and we then create a truth table for the combinational logic. The inputs for the combinational logic
are the state bits coming from the state register and the external inputs, so we list all
combinations of these inputs on the left side of the table. The outputs for the combinational logic
are the state bits to be loaded into the register on the next clock edge (the next state), and the
external output values, so we list desired values of these outputs for each input combination on the
right side of the table. Because we used a state diagram for which outputs were a function of the
current state only, and not of the inputs, we list an external output value only for each possible
state, ignoring the external input values. Now that we have a truth table, we proceed with
combinational logic design as described earlier, by generating minimized output equations, and then
drawing the combinational logic circuit.
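The structure described above (a state register plus combinational logic given as a truth table) can be sketched in software. The four-state machine below is a hypothetical example invented for illustration, not the one in Figure 4.1: it advances one state whenever input a is 1 and outputs x = 1 only in state 3.

```python
# Truth table for next-state logic: (current state, input a) -> next state.
NEXT_STATE = {
    (0, 0): 0, (0, 1): 1,
    (1, 0): 1, (1, 1): 2,
    (2, 0): 2, (2, 1): 3,
    (3, 0): 3, (3, 1): 0,
}

# Output x depends on the current state only, not on the input.
OUTPUT = {0: 0, 1: 0, 2: 0, 3: 1}


def run(inputs, state=0):
    """Return the output seen before each clock edge for a sequence of inputs."""
    outputs = []
    for a in inputs:
        outputs.append(OUTPUT[state])     # combinational output logic
        state = NEXT_STATE[(state, a)]    # state register loads on the clock edge
    return outputs


print(run([1, 1, 1, 1, 1]))
```

In hardware, the two dictionaries would become minimized logic equations feeding a 2-bit state register.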
Q.8 Explain custom single-purpose processor design
ANS: We can apply the above combinational and sequential logic design techniques to build datapath
components and controllers. Therefore, we have nearly all the knowledge we need to build a custom
single-purpose processor for a given program, since a processor consists of a controller and a
datapath. We now describe a technique for building such a processor.
We begin with a sequential program we must implement. Figure 4.3 provides an example based on
computing a greatest common divisor (GCD).
To begin building our single-purpose processor implementing the GCD program, we first convert our
program into a complex state diagram, in which states and arcs may include arithmetic expressions,
and these expressions may use external inputs and outputs or variables. In contrast, our earlier
state diagrams only included Boolean expressions, and these expressions could only use external
inputs and outputs, not variables.
Example: greatest common divisor
• First create algorithm
• Convert algorithm to “complex” state machine
– Known as FSMD: finite-state machine with datapath
– Can use templates to perform such conversion
Algorithm and FSMD
Templates
We can use templates to convert a program to a state diagram, as illustrated in the figure. First,
we classify each statement as an assignment statement, loop statement, or branch (if-then-else or
case) statement. For an assignment statement, we create a state with that statement as its action.
We add an arc from this state to the state for the next statement, whatever type it may be. For a
loop statement, we create a condition state C and a join state J, both with no actions. We add an
arc with the loop’s condition from the condition state to the first statement in the loop body. We
add a second arc with the complement of the loop’s condition from the condition state to the next
statement after the loop body. We also add an arc from the join state back to the condition state.
For a branch statement, we create a condition state C and a join state J, both with no actions. We
add an arc with the first branch’s condition from the condition state to the branch’s first
statement. We add another arc with the complement of the first branch’s condition AND’ed with the
second branch’s condition from the condition state to the second branch’s first statement. We repeat
this for each branch. Finally, we connect the arc leaving the last statement of each branch to the
join state, and we add an arc from this state to the next statement’s state.
Using this template approach, we convert our GCD program to the complex state diagram of the figure.
We are now well on our way to designing a custom single-purpose processor that executes the GCD
program.
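The template conversion can be tried out in software by simulating an FSMD. The notes do not reproduce the GCD program itself, so the sketch below assumes the common subtraction-based version (while x != y: subtract the smaller from the larger). The state names are invented for illustration, and inputs are assumed to be positive integers.

```python
def gcd_fsmd(x_i, y_i):
    """Simulate an FSMD for subtraction-based GCD; inputs must be positive."""
    if x_i <= 0 or y_i <= 0:
        raise ValueError("inputs must be positive integers")
    x = y = d_o = 0
    state = "LOAD"
    while state != "DONE":
        if state == "LOAD":            # assignment states: x = x_i, y = y_i
            x, y = x_i, y_i
            state = "LOOP_COND"
        elif state == "LOOP_COND":     # loop condition state C (no action)
            state = "IF_COND" if x != y else "OUTPUT"
        elif state == "IF_COND":       # branch condition state C (no action)
            state = "SUB_Y" if x < y else "SUB_X"
        elif state == "SUB_Y":         # assignment state: y = y - x
            y = y - x
            state = "JOIN"
        elif state == "SUB_X":         # assignment state: x = x - y
            x = x - y
            state = "JOIN"
        elif state == "JOIN":          # join state J: arc back to loop condition
            state = "LOOP_COND"
        elif state == "OUTPUT":        # assignment state: d_o = x
            d_o = x
            state = "DONE"
    return d_o


print(gcd_fsmd(12, 8))
```

Each branch of the if/elif chain corresponds to one state of the diagram, and each assignment to state corresponds to an arc.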
State diagram templates:-
Q.9 Explain RT-level custom single-purpose processor design
ANS:
• We often start with a state machine
– Rather than an algorithm
– Cycle timing is often too central to functionality
• Example
– Bus bridge that converts a 4-bit bus to an 8-bit bus
– Start with FSMD
– Known as register-transfer (RT) level
– Exercise: complete the design
Q.10 Explain optimizing single-purpose processors
ANS:-
• Optimization is the task of making design metric values the best possible
• Optimization opportunities
– Original program
– FSMD
– Datapath
– FSM
Optimizing the original program:-
Optimizing the FSMD:
• Areas of possible improvements
– Merge states
• States with constants on transitions can be eliminated, since the transition taken is
already known
• States with independent operations can be merged
– Separate states
• States which require complex operations (a*b*c*d) can be broken into smaller states to
reduce hardware size
– Scheduling
Optimizing the datapath:
• Sharing of functional units
– One-to-one mapping, as done previously, is not necessary
– If the same operation occurs in different states, they can share a single functional unit
• Multi-functional units
– ALUs support a variety of operations, so an ALU can be shared among operations occurring in
different states
Optimizing the FSM:
• State encoding
– Task of assigning a unique bit pattern to each state in an FSM
– Size of state register and combinational logic vary
– Can be treated as an ordering problem
• State minimization
– Task of merging equivalent states into a single state
– Two states are equivalent if, for all possible input combinations, they generate the same
outputs and transition to the same next state
Unit 3:- General Purpose Processors: Software
Q.1 What is a general-purpose processor? Explain the architecture of a general-purpose processor
ANS: A general-purpose processor is a programmable digital system intended to solve computation
tasks in a large variety of applications. Copies of the same processor may solve computation
problems in applications as diverse as communication, automotive, and industrial embedded systems.
An embedded system designer choosing to use a general-purpose processor to implement part of a
system’s functionality may achieve several benefits.
– Low unit cost, in part because the manufacturer spreads NRE over large numbers of units
• Motorola sold half a billion 68HC05 microcontrollers in 1996 alone
– Carefully designed since higher NRE is acceptable
• Can yield good performance, size and power
– Low NRE cost, short time-to-market/prototype, high flexibility
• User just writes software; no processor design
– a.k.a. “microprocessor”: “micro” used when they were implemented on one or a few chips rather
than entire rooms
Architecture of General Purpose Processor:-
Datapath:
The datapath consists of the circuitry for transforming data and for storing temporary data. The
datapath contains an arithmetic-logic unit (ALU) capable of transforming data through operations such
as addition, subtraction, logical AND, logical OR, inverting, and shifting. The ALU also generates status
signals, often stored in a status register (not shown), indicating particular data conditions. Such
conditions include indicating whether data is zero, or whether an addition of two data items generates a
carry. The datapath also contains registers capable of storing temporary data. Temporary data may
include data brought in from memory but not yet sent through the ALU, data coming from the ALU that
will be needed for later ALU operations or will be sent back to memory, and data that must be moved
from one memory location to another. The internal data bus is the bus over which data travels within
the datapath, while the external data bus is the bus over which data is brought to and from the data
memory.
Controller:
The controller consists of circuitry for retrieving program instructions, and for moving data to, from, and
through the datapath according to those instructions. The controller contains a program counter (PC)
that holds the address in memory of the next program instruction to fetch. The controller also contains
an instruction register (IR) to hold the fetched instruction. Based on this instruction, the controller’s
control logic generates the appropriate signals to control the flow of data in the datapath. Such flows
may include inputting two particular registers into the ALU, storing ALU results into a particular register,
or moving data between memory and a register. Finally, the next-state logic determines the next value
of the PC. For a non-branch instruction, this logic increments the PC. For a branch instruction, this logic
looks at the datapath status signals and the IR to determine the appropriate next address.
The PC’s bit-width represents the processor’s address size. The address size is independent of the data
word size; the address size is often larger. The address size determines the number of directly accessible
memory locations, referred to as the address space or memory space. If the address size is M, then the
address space is 2^M. Thus, a processor with a 16-bit PC can directly address 2^16 = 65,536 memory
locations. We would typically refer to this address space as 64K, although if 1K = 1000, this number
would represent 64,000, not the actual 65,536. Thus, in computer-speak, 1K = 1024.
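The address-space arithmetic above is easy to check. A minimal sketch, with function names chosen here for illustration:

```python
def address_space(address_bits):
    """Number of directly accessible memory locations for an M-bit address."""
    return 2 ** address_bits


def in_k(locations):
    """Express a number of locations in K, where 1K = 1024."""
    return locations // 1024


locations = address_space(16)
print(locations, in_k(locations))
```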
Memory:
While registers serve a processor’s short-term storage requirements, memory serves the processor’s medium- and
long-term information-storage requirements. We can classify stored information as either program or data.
Program information consists of the sequence of instructions that cause the processor to carry out the desired
system functionality. Data information represents the values being input, output and transformed by the program.
We can store program and data together or separately. In a Princeton architecture, data and program words share
the same memory space. In a Harvard architecture, the program memory space is distinct from the data memory
space. Figure 2.2 illustrates these two methods. The Princeton architecture may result in a simpler hardware
connection to memory, since only one connection is necessary. A Harvard architecture, while requiring two
connections, can perform instruction and data fetches simultaneously, so may result in improved performance.
Most machines have a Princeton architecture. The Intel 8051 is a well-known Harvard architecture. Memory may
be read-only memory (ROM) or readable and writable memory (RAM). ROM is usually much more compact than
RAM. An embedded system often uses ROM for program memory, since, unlike in desktop systems, an embedded
system’s program does not change. Constant-data may be stored in ROM, but other data of course requires RAM.
Memory may be on-chip or off-chip. On-chip memory resides on the same IC as the processor, while off-chip
memory resides on a separate IC. The processor can usually access on-chip memory much faster than off-chip
memory, perhaps in just one cycle, but finite IC capacity of course implies only a limited amount of on-chip
memory.
To reduce the time needed to access (read or write) memory, a local copy of a portion of memory may
be kept in a small but especially fast memory called cache, as illustrated in the figure. Cache
memory often resides on-chip, and often uses fast but expensive static RAM technology rather than
slower but cheaper dynamic RAM (see Chapter 5). Cache memory is based on the principle that if at a
particular time a processor accesses a particular memory location, then the processor will likely
access that location and immediate neighbors of the location in the near future. Thus, when we first
access a location in memory, we copy that location and some number of its neighbors (called a block)
into cache, and then access the copy of the location in cache. When we access another location, we
first check a cache table to see if a copy of the location resides in cache. If the copy does reside
in cache, we have a cache hit, and we can read or write that location very quickly. If the copy does
not reside in cache, we have a cache miss, so we must copy the location’s block into cache, which
takes a lot of time. Thus, for a cache to be effective in improving performance, the ratio of hits
to misses must be very high, requiring intelligent caching schemes. Caches are used for both program
memory (often called instruction cache, or I-cache) as well as data memory (often called D-cache).
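The hit and miss behaviour described above can be illustrated with a small simulation. The notes do not say how blocks are placed in the cache, so this sketch assumes a direct-mapped scheme (each block can live in exactly one cache line); the class name and sizes are invented for illustration.

```python
class DirectMappedCache:
    """Counts hits and misses; assumes direct mapping of blocks to lines."""

    def __init__(self, num_lines, block_size):
        self.num_lines = num_lines
        self.block_size = block_size
        self.table = [None] * num_lines   # cache table: block held by each line
        self.hits = 0
        self.misses = 0

    def access(self, address):
        """Return True on a cache hit, False on a miss (block is then loaded)."""
        block = address // self.block_size   # location and neighbors share a block
        line = block % self.num_lines
        if self.table[line] == block:
            self.hits += 1
            return True
        self.table[line] = block             # copy the block into cache
        self.misses += 1
        return False


cache = DirectMappedCache(num_lines=4, block_size=4)
results = [cache.access(a) for a in (0, 1, 2, 3, 4, 0, 16, 0)]
print(results, cache.hits, cache.misses)
```

Addresses 1 to 3 hit because they were loaded as neighbors of address 0. Address 16 maps to the same line as address 0 under this assumed scheme, so it evicts that block and the final access to 0 misses again.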
Figure of types of Memory:-
Figure of Cache Memory:-
Q.2 Give a short note on instruction execution and pipelining
ANS:-
Instruction execution
We can think of a microprocessor’s execution of instructions as consisting of several
basic stages:
1. Fetch instruction: the task of reading the next instruction from memory into
the instruction register.
2. Decode instruction: the task of determining what operation the instruction
in the instruction register represents (e.g., add, move, etc.).
3. Fetch operands: the task of moving the instruction’s operand data into
appropriate registers.
4. Execute operation: the task of feeding the appropriate registers through the
ALU and back into an appropriate register.
5. Store results: the task of writing a register into memory.
Pipelining
Pipelining is a common way to increase the instruction throughput of a microprocessor. We first make
a simple analogy of two people approaching the chore of washing and drying 8 dishes. In one
approach, the first person washes all 8 dishes, and then the second person dries all 8 dishes.
Assuming 1 minute per dish per person, this approach requires 16 minutes. The approach is clearly
inefficient since at any time only one person is working and the other is idle. Obviously, a better
approach is for the second person to begin drying the first dish immediately after it has been
washed. This approach requires only 9 minutes: 1 minute for the first dish to be washed, and then 8
more minutes until the last dish is finally dry. We refer to this latter approach as pipelined. Each
dish is like an instruction, and the two tasks of washing and drying are like the five stages listed
above. By using a separate unit (each akin to a person) for each stage, we can pipeline instruction
execution. After the instruction fetch unit fetches the first instruction, the decode unit decodes
it while the instruction fetch unit simultaneously fetches the next instruction.
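The dish-washing numbers generalize to any number of items and stages. The sketch below assumes every stage takes the same time and the pipeline never stalls, which is an idealization; real processors lose some of this gain to branches and memory delays.

```python
def sequential_time(items, stages, stage_time=1):
    """Each stage handles all items before the next stage begins."""
    return items * stages * stage_time


def pipelined_time(items, stages, stage_time=1):
    """First item needs all stages; each later item finishes one stage_time after."""
    return (stages + items - 1) * stage_time


print(sequential_time(8, 2), pipelined_time(8, 2))   # dishes: wash and dry
print(sequential_time(8, 5), pipelined_time(8, 5))   # 8 instructions, 5 stages
```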
Q.3 Explain the Programmer’s View for instruction
ANS:A programmer writes the program instructions that carry out the desired functionality on the generalpurpose processor. The programmer may not actually need to know detailed information about the
processor’s architecture or operation, but instead may deal with an architectural abstraction, which
hides much of that detail. The level of abstraction depends on the level of programming. We can
distinguish between two levels of programming. The first is assembly-language programming, in which
one programs in a language representing processor-specific instructions as mnemonics. The second is
structured-language programming, in which one programs in a language using processor-independent
instructions. A compiler automatically translates those instructions to processor-specific instructions.
Ideally, the structured-language programmer would need no information about the processor
architecture, but in embedded systems, the programmer must usually have at least some awareness, as
we shall discuss. Actually, we can define an even lower-level of programming, machine-language
programming, in which the programmer writes machine instructions in binary. This level of
programming has become extremely rare due to the advent of assemblers. Machine-language
programmed computers often had rows of lights showing the programmer the current binary
instructions being executed. Today’s computers look more like boxes or refrigerators, but these do not
make for interesting movie props, so you may notice that in the movies, computers with rows of blinking
lights live on.
Instruction Set
The assembly-language programmer must know the processor’s instruction set. The instruction set
describes the bit-configurations allowed in the IR, indicating the atomic processor operations that the
programmer may invoke. Each such configuration forms an assembly instruction, and a sequence of such
instructions forms an assembly program. An instruction typically has two parts, an opcode field and
operand fields. An opcode specifies the operation to take place during the instruction. We can classify
instructions into three categories. Data-transfer instructions move data between memory and registers,
between input/output channels and registers, and between registers themselves. Arithmetic/logical
instructions configure the ALU to carry out a particular function, channel data from the registers through
the ALU, and channel data from the ALU back to a particular register. Branch instructions determine the
address of the next program instruction, based possibly on datapath status signals. Branches can be
further categorized as being unconditional jumps, conditional jumps or procedure call and return
instructions. Unconditional jumps always determine the address of the next instruction, while
conditional jumps do so only if some condition evaluates to true, such as a particular register containing
zero. A call instruction, in addition to indicating the address of the next instruction, saves the address of
the current instruction so that a subsequent return instruction can jump back to the instruction immediately
following the most recently invoked call instruction. This pair of instructions facilitates the
implementation of procedure/function call semantics of high-level programming languages. An operand
field specifies the location of the actual data that takes part in an operation. Source operands serve as
input to the operation, while a destination operand stores the output. The number of operands per
instruction varies among processors. Even for a given processor, the number of operands per instruction
may vary depending on the instruction type.
The operand field may indicate the data’s location through one of several addressing modes, illustrated
in the figure. In immediate addressing, the operand field contains the data itself. In register addressing, the
operand field contains the address of a datapath register in which the data resides. In register-indirect
addressing, the operand field contains the address of a register, which in turn contains the address of a
memory location in which the
data resides. In direct addressing, the operand field contains the address of a memory location in which
the data resides. In indirect addressing, the operand field contains the address of a memory location,
which in turn contains the address of a memory location in which the data resides. Those familiar with
structured languages may note that direct addressing implements regular variables, and indirect
addressing implements pointers. In inherent or implicit addressing, the particular register or memory
location of the data is implicit in the opcode; for example, the data may reside in a register called the
"accumulator." In indexed addressing, the direct or indirect operand must be added to a particular
implicit register to obtain the actual operand address. Jump instructions may use relative addressing to
reduce the number of bits needed to indicate the jump address. A relative address indicates how far to
jump from the current address, rather than indicating the complete address – such addressing is very
common since most jumps are to nearby instructions.
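The addressing modes described above can be summarized as a small lookup routine. This is a minimal sketch, not any real processor's behavior: the mode names, the register and memory contents, and the function names are all assumptions chosen for illustration.

```python
def fetch_operand(mode, field, registers, memory, accumulator=0):
    """Return the data selected by an operand field under the given addressing mode."""
    if mode == "immediate":          # field is the data itself
        return field
    if mode == "register":           # field is the address of a register
        return registers[field]
    if mode == "register_indirect":  # the register holds a memory address
        return memory[registers[field]]
    if mode == "direct":             # field is a memory address
        return memory[field]
    if mode == "indirect":           # the memory word holds another memory address
        return memory[memory[field]]
    if mode == "implicit":           # location implied by the opcode (here: the accumulator)
        return accumulator
    raise ValueError(f"unknown addressing mode: {mode}")


def relative_target(current_address, offset):
    """Relative addressing: the field is a signed distance from the current address."""
    return current_address + offset


# Hypothetical machine state used only for illustration.
registers = [0, 5, 2, 0]
memory = [10, 11, 4, 13, 99, 42, 0, 0]
```

With this state, direct addressing with field 2 reads memory[2], while indirect addressing with the same field treats memory[2] as a pointer and reads memory[4], which mirrors the variable versus pointer distinction made in the text.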
Figure of instruction stored in Memory
Figure of Addressing Mode
Figure of A Simple (Trivial) Instruction Set
Sample Programs:
Q.4 Explain the Development Environment for general Software Design
ANS:Several software and hardware tools commonly support the programming of general-purpose
processors. First, we must distinguish between two processors we deal with when developing an
embedded system.
1. Development processor
– The processor on which we write and debug our programs
• Usually a PC
2. Target processor
– The processor that the program will run on in our embedded system
• Often different from the development processor
Software Development Process
Assemblers translate assembly instructions to binary machine instructions. In addition to just replacing
opcode and operand mnemonics by binary equivalents, an assembler may also translate symbolic labels
into actual addresses. For example, a programmer may add a symbolic label END to an instruction A, and
may reference END in a branch instruction. The assembler determines the actual binary address of A,
and replaces references to END by this address.
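The label handling described above is commonly done in two passes. The sketch below assumes a made-up assembly syntax (one instruction per word, labels ending in a colon, mnemonics such as JZ and JMP chosen for illustration) and only resolves labels; it does not produce binary opcodes.

```python
def resolve_labels(lines, base_address=0):
    """Two-pass label resolution: collect label addresses, then substitute them."""
    symbols = {}
    instructions = []
    # Pass 1: record the address of every labeled instruction.
    for line in lines:
        text = line.strip()
        if not text:
            continue
        if ":" in text:
            label, text = text.split(":", 1)
            symbols[label.strip()] = base_address + len(instructions)
            text = text.strip()
        if text:
            instructions.append(text)
    # Pass 2: replace label references in operands by numeric addresses.
    resolved = []
    for text in instructions:
        parts = text.replace(",", " ").split()
        opcode, operands = parts[0], parts[1:]
        operands = [str(symbols.get(op, op)) for op in operands]
        resolved.append(" ".join([opcode] + operands))
    return symbols, resolved


# Hypothetical program: END labels the last instruction and is referenced by a branch.
program = [
    "MOV R0, 5",
    "LOOP: JZ R0, END",
    "SUB R0, R1",
    "JMP LOOP",
    "END: HALT",
]
```

Two passes are needed because a branch may reference a label, such as END here, that is defined later in the file.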
A linker allows a programmer to create a program in separately-assembled files; it combines the
machine instructions of each into a single program, perhaps incorporating instructions from standard
library routines.
Compilers translate structured programs into machine (or assembly) programs. Structured programming
languages possess high-level constructs that greatly simplify programming, such as loop constructs, so
each high-level construct may translate to several or tens of machine instructions. Compiler technology
has advanced tremendously over the past decades, applying numerous program optimizations, often
yielding very size- and performance-efficient code. A cross-compiler executes on one processor (our
development processor), but generates code for a different processor (our target processor). Cross-compilers are extremely common in embedded system development.
Debuggers help programmers evaluate and correct their programs. They run on the development
processor and support stepwise program execution, executing one instruction and then stopping,
proceeding to the next instruction when instructed by the user. They permit execution up to user-specified breakpoints, which are instructions that when encountered cause the program to stop
executing. Whenever the program stops, the user can examine values of various memory and register
locations. A source-level debugger enables step-by-step execution in the source program language,
whether assembly language or a structured language. A good debugging capability is crucial, as
today’s programs can be quite complex and hard to write correctly.
Device programmers download a binary machine program from the development processor’s memory
into the target processor’s memory.
Emulators support debugging of the program while it executes on the target processor. An emulator
typically consists of a debugger coupled with a board connected to the desktop processor via a cable.
The board consists of the target processor plus some support circuitry (often another processor). The
board may have another cable with a device having the same pin configuration as the target processor,
allowing one to plug this device into a real embedded system. Such an in-circuit emulator enables one to
control and monitor the program’s execution in the actual embedded system circuit. In-circuit emulators
are available for nearly any processor intended for embedded use, though they can be quite expensive if
they are to run at real speeds.
Q.5 define Testing and Debugging and Running the Program of General Processor
ANS: Testing and Debugging
Running a Program
• If the development processor is different from the target, how can we run our compiled code? Two options:
  – Download to target processor
  – Simulate
• Simulation
  – One method: Hardware description language
    • But slow, not always available
  – Another method: Instruction set simulator (ISS)
    • Runs on development processor, but executes instructions of target processor
Q.6 Explain the Application-Specific Instruction-Set Processors (ASIPs)
ANS:
• ASIPs – targeted to a particular domain
  – Contain architectural features specific to that domain
    • e.g., embedded control, digital signal processing, video processing, network processing, telecommunications, etc.
  – Still programmable
A Common ASIP is: Microcontroller
• For embedded control applications
  – Reading sensors, setting actuators
  – Mostly dealing with events (bits): data is present, but not in huge amounts
  – e.g., VCR, disk drive, digital camera (assuming SPP for image compression), washing machine, microwave oven
• Microcontroller features
  – On-chip peripherals
    • Timers, analog-digital converters, serial communication, etc.
    • Tightly integrated for programmer, typically part of register space
  – On-chip program and data memory
  – Direct programmer access to many of the chip’s pins
  – Specialized instructions for bit-manipulation and other low-level operations
Digital Signal Processors (DSP)
• For signal processing applications
  – Large amounts of digitized data, often streaming
  – Data transformations must be applied fast
  – e.g., cell-phone voice filter, digital TV, music synthesizer
• DSP features
  – Several instruction execution units
  – Multiply-accumulate single-cycle instruction, other instructions
  – Efficient vector operations – e.g., add two arrays
    • Vector ALUs, loop buffers, etc.
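To see why a single-cycle multiply-accumulate instruction matters, consider a finite impulse response (FIR) filter, a typical signal-processing kernel. The sketch below is an illustration in Python rather than DSP code; the function name and the sample values are assumptions. Its inner loop is nothing but repeated multiply-accumulate steps, which a DSP would execute at one per cycle.

```python
def fir_filter(samples, coefficients):
    """Apply an FIR filter; the inner loop is a chain of multiply-accumulate steps."""
    outputs = []
    for n in range(len(samples)):
        acc = 0  # accumulator register
        for k, c in enumerate(coefficients):
            if n - k >= 0:
                acc += c * samples[n - k]  # one multiply-accumulate (MAC)
        outputs.append(acc)
    return outputs


if __name__ == "__main__":
    # Two-tap filter with both coefficients equal to 1: each output is the sum of two neighboring samples.
    print(fir_filter([1, 2, 3, 4], [1, 1]))
```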
Q.7 explain designing a General Purpose Processor
ANS:
• Not something an embedded system designer normally would do
  – But instructive to see how simply we can build one top down
  – Remember that real processors aren’t usually built this way
    • Much more optimized, much more bottom-up design
Architecture of a Simple Microprocessor:
• Storage devices for each declared variable
  – register file holds each of the variables
• Functional units to carry out the FSMD operations
  – One ALU carries out every required operation
• Connections added among the components’ ports corresponding to the operations required by the FSM
• Unique identifiers created for every control signal
Unit 4:- Standard Single Purpose Processors Peripherals
Q.1 Give the short note on Timers, counters, watchdog timers
ANS:-
Timers: A timer is a device that generates a signal pulse at specified time intervals. A time interval is a "real-time" measure of time, such as 3 milliseconds. These devices are extremely useful in systems in which a
particular action, such as sampling an input signal or generating an output signal, must be performed
every X time units. Internally, a simple timer may consist of a register, counter, and an extremely simple
controller. The register holds a count value representing the number of clock cycles that equals the
desired real-time value. This number can be computed using the simple formula:
Number of clock cycles = Desired real-time value / Clock cycle
For example, to obtain a duration of 3 milliseconds from a clock cycle of 10 nanoseconds (100 MHz), we
must count (3 x 10^-3 s) / (10 x 10^-9 s/cycle) = 300,000 cycles. The counter is initially loaded with the count value,
and then counts down on every clock cycle until 0 is reached, at which point an output signal is
generated, the count value is reloaded, and the process repeats itself.
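The formula and the count-down behavior can be sketched in a few lines. The function names are assumptions made for this example, and times are kept as whole nanoseconds to avoid rounding problems.

```python
def timer_count(interval_ns, clock_period_ns):
    """Number of clock cycles = desired real-time value / clock cycle (both in ns)."""
    if interval_ns % clock_period_ns:
        raise ValueError("interval is not a whole number of clock cycles")
    return interval_ns // clock_period_ns


def run_timer(count_value, clock_ticks):
    """Count down from count_value; return the tick numbers at which a pulse is generated."""
    pulses = []
    counter = count_value
    for tick in range(1, clock_ticks + 1):
        counter -= 1
        if counter == 0:
            pulses.append(tick)
            counter = count_value  # reload the count value and repeat
    return pulses
```

For instance, timer_count(3_000_000, 10) gives the count value for a 3 millisecond interval with a 10 nanosecond clock, and run_timer(3, 10) shows a pulse on every third clock tick.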
• Timer: measures time intervals
  – To generate timed output events
    • e.g., hold traffic light green for 10 s
  – To measure input events
    • e.g., measure a car’s speed
• Based on counting clock pulses
    • E.g., let Clk period be 10 ns
    • And we count 20,000 Clk pulses
    • Then 200 microseconds have passed
    • 16-bit counter would count up to 65,535*10 ns = 655.35 microsec., resolution = 10 ns
    • Top: indicates top count reached, wrap-around
Counters
A counter is nearly identical to a timer, except that instead of counting clock cycles (pulses on the clock
signal), a counter counts pulses on some other input signal.
• Counter: like a timer, but counts pulses on a general input signal rather than clock
  – e.g., count cars passing over a sensor
  – Can often configure device as either a timer or counter
Other Counters
• Interval timer
  – Indicates when desired time interval has passed
  – We set terminal count to desired interval
    • Number of clock cycles = Desired time interval / Clock period
• Cascaded counters
• Prescaler
  – Divides clock
  – Increases range, decreases resolution
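The trade-off made by a prescaler can be put into numbers. The following sketch uses an assumed function name and assumes the prescaler simply divides the clock by an integer factor.

```python
def timer_limits(clock_period_ns, bits, prescale=1):
    """Return (resolution_ns, range_ns) for a counter fed by a prescaled clock."""
    resolution_ns = clock_period_ns * prescale    # smallest measurable interval
    range_ns = (2 ** bits - 1) * resolution_ns    # longest measurable interval
    return resolution_ns, range_ns


if __name__ == "__main__":
    print(timer_limits(10, 16))      # 10 ns clock, 16-bit counter, no prescaler
    print(timer_limits(10, 16, 8))   # same counter with the clock divided by 8
```

Dividing the clock by 8 makes the range eight times longer, but the smallest step also becomes eight times coarser.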
Watchdog timer: A watchdog timer can be thought of as having the inverse functionality of a regular timer. We
configure a watchdog timer with a real-time value, just as with a regular timer. However, instead of the
timer generating a signal for us every X time units, we must generate a signal for the timer every X time
units. If we fail to generate this signal in time, then the timer generates a signal indicating that we failed.
We often connect this signal to the reset or interrupt signal of a general-purpose processor. Thus, a
watchdog timer provides a mechanism of ensuring that our software is working properly; every so often
in the software, we include a statement that generates a signal to the watchdog timer (in particular, that
resets the timer). If something undesired happens in the software (e.g., we enter an undesired infinite
loop, we wait for an input signal that never arrives, a part fails, etc.), the watchdog generates a signal
that we can use to restart or test parts of the system. Using an interrupt service routine, we may record
information as to the number of failures and the causes of each, so that a service technician may later
evaluate this information to determine if a particular part requires replacement. Note that an
embedded system often must recover from failures whenever possible, as the user may not have the
means to reboot the system in the same manner that he/she might reboot a desktop system.
– e.g., ATM machine
– 16-bit timer, 2 millisec. resolution
– timereg value = 2*(2^16 - 1) – X = 131070 – X
– For 2 min., X = 120,000 millisec.
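The watchdog behavior can be modeled in software to make the idea concrete. This is only a sketch: the class and method names are made up for this example, time is counted in abstract ticks, and a real watchdog is a hardware peripheral whose registers depend on the particular device.

```python
class Watchdog:
    """Software model of a watchdog: kick() must be called before the timeout elapses."""

    def __init__(self, timeout_ticks):
        self.timeout_ticks = timeout_ticks
        self.remaining = timeout_ticks
        self.failures = 0  # number of timeouts, e.g. for a service technician to inspect

    def kick(self):
        """Signal from the software that it is still running correctly."""
        self.remaining = self.timeout_ticks

    def tick(self):
        """Advance one time unit; return True if the watchdog fires (reset or interrupt)."""
        self.remaining -= 1
        if self.remaining <= 0:
            self.failures += 1
            self.remaining = self.timeout_ticks
            return True
        return False
```

As long as the main program calls kick() often enough, tick() never returns True. If the program hangs, the calls stop and the watchdog fires after timeout_ticks time units.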
Q.2 Give the details about UART with Example
ANS:-
A UART (Universal Asynchronous Receiver/Transmitter) receives serial data and stores it as parallel data
(usually one byte), and takes parallel data and transmits it as serial data. Such serial communication is
beneficial when we need to communicate bytes of data between devices separated by long distances, or
when we simply have few available I/O pins. Principles of serial communication will be discussed in a later
chapter. For our purpose in this section, we must be aware that we must set the transmission and reception
rate, called the baud rate, which indicates the frequency at which the signal changes. Common rates include
2400, 4800, 9600, and 19.2k. We must also be aware that an extra bit may be added to each data word, called
parity, to detect transmission errors -- the parity bit is set to high or low to indicate if the word has an
even or odd number of 1 bits. Internally, a simple UART may possess a baud-rate configuration register,
and two independently operating processors, one for receiving and the other for transmitting. The
transmitter may possess a register, often called a transmit buffer, that holds data to be sent. This
register is a shift register, so the data can be transmitted one bit at a time by shifting at the appropriate
rate. Likewise, the receiver receives data into a shift register,
and then this data can be read in parallel. Note that in order to shift at the appropriate rate based on
the configuration register, a UART requires a timer. To use a UART, we must configure its baud rate by
writing to the configuration register, and then we must write data to the transmit register and/or read
data from the receive register. Unfortunately, configuring the baud rate is usually not as simple as
writing the desired rate (e.g., 4800) to a register. For example, to configure the UART of an 8051, we
must use the following equation:
Baudrate = (2^smod / 32) * oscfreq / (12 * (256 - TH1))
smod corresponds to a bit in a special-function register, oscfreq is the frequency of the oscillator, and
TH1 is an 8-bit rate register of a built-in timer. Note that we could use a general-purpose processor to
implement a UART completely in software. If we used a dedicated general-purpose processor, the
implementation would be inefficient in terms of size. We could alternatively integrate the transmit and
receive functionality with our main program. This would require creating a routine to send data serially
over an I/O port, making use of a timer to control the rate. It would also require using an interrupt
service routine to capture serial data coming from another I/O port whenever such data begins arriving.
However, as with the timer functionality, adding send and receive functionality can detract from time
for other computations.
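The baud-rate equation, the parity bit and the shift-register transmission can be tried out with a short sketch. The function names are assumptions for this example, as are the 11.0592 MHz oscillator and the TH1 value of 253 used below, and the least-significant-bit-first order. Start and stop bits are left out.

```python
def baud_rate_8051(smod, osc_freq_hz, th1):
    """Baudrate = (2^smod / 32) * oscfreq / (12 * (256 - TH1))."""
    return (2 ** smod / 32) * osc_freq_hz / (12 * (256 - th1))


def even_parity_bit(word):
    """Parity bit that makes the total number of 1 bits (data plus parity) even."""
    return bin(word & 0xFF).count("1") % 2


def serialize(word):
    """Shift an 8-bit word out one bit at a time, least significant bit first
    (the bit order is an assumption), followed by the even-parity bit."""
    shift_register = word & 0xFF
    bits = []
    for _ in range(8):
        bits.append(shift_register & 1)  # transmit the lowest bit
        shift_register >>= 1             # shift the next bit into place
    return bits + [even_parity_bit(word)]


if __name__ == "__main__":
    # Assumed values: smod = 0, 11.0592 MHz oscillator, TH1 = 253
    print(baud_rate_8051(0, 11_059_200, 253))
    print(serialize(0x41))
```

Solving the equation in the other direction, for the TH1 value that yields a desired rate, is what the programmer has to do before writing to the configuration register.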
Q.3 Give the details about Pulse width modulator
ANS:A pulse-width modulator (PWM) generates an output signal that repeatedly switches between high and
low. We control the duration of the high value and of the low value by indicating the desired period, and
the desired duty cycle, which is the percentage of time the signal is high compared to the signal’s period.
A square wave has a duty cycle of 50%. The pulse’s width corresponds to the pulse’s time high. Again,
PWM functionality could be implemented on a dedicated general-purpose processor, or integrated with
another program’s functionality, but the single-purpose processor approach has the benefits of
efficiency and simplicity. One common use of a PWM is to control the average current or voltage input
to a device. For example, a DC motor rotates when power is applied, and this power can be turned on
and off by setting an input high or low. To control the speed, we can adjust the
input voltage, but this requires a conversion of our high/low digital signals to an analog signal.
Fortunately, we can also adjust the speed simply by modifying the duty cycle of the motor’s on/off input,
an approach which adjusts the average voltage. This approach works because a DC motor does not
come to an immediate stop when power is turned off, but rather it coasts, much like a bicycle coasts
when we stop pedaling. Increasing the duty cycle increases the motor speed, and decreasing the duty
cycle decreases the speed. This duty cycle adjustment principle applies to the control of other types of
electric devices, such as dimmer lights. Another use of a PWM is to encode control commands in a single
signal for use by another device. For example, we may control a radio-controlled car by sending pulses
of different widths. Perhaps a 1 ms width corresponds to a turn left command, a 4 ms width to turn
right, and 8 ms to forward.
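The relation between period, duty cycle and average voltage can be sketched as follows. The function names are assumptions for this example, the signal is assumed to switch between 0 V and a single high level, and time is measured in abstract ticks.

```python
def pwm_waveform(period_ticks, duty_percent, cycles=1):
    """Return a list of 1/0 samples: high for the duty-cycle share of each period."""
    high_ticks = period_ticks * duty_percent // 100
    one_period = [1] * high_ticks + [0] * (period_ticks - high_ticks)
    return one_period * cycles


def average_voltage(v_high, duty_percent):
    """Average output voltage when the signal switches between 0 V and v_high."""
    return v_high * duty_percent / 100


if __name__ == "__main__":
    print(pwm_waveform(4, 50))      # a square wave: half the period high, half low
    print(average_voltage(5, 25))   # a 25% duty cycle on a 5 V signal
```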
Example of PWM
Controlling a DC motor with a PWM
Q.4 Give the details about LCD controller
ANS:An LCD (Liquid crystal display) is a low-cost, low-power device capable of displaying text and images.
LCDs are extremely common in embedded systems, since such systems often do not have video
monitors standard for desktop systems. LCDs can be found in numerous common devices like watches,
fax and copy machines, and calculators. The basic principle of one type of LCD (reflective) works as
follows. First, incoming light passes through a polarizing plate. Next, that polarized light encounters
liquid crystal material. If we excite a region of this material, we cause the material’s molecules to align,
which in turn causes the polarized light to pass through the material. Otherwise, the light does not pass
through. Finally, light that has passed through hits a mirror and reflects back, so the excited region
appears to light up. Another type of LCD (absorption) works similarly, but uses a black surface instead of
a mirror. The surface below the excited region absorbs light, thus appearing darker than the other
regions. One of the simplest LCDs is the 7-segment LCD. Each of the 7 segments can be activated to display
any digit character or one of several letters and symbols. Such an LCD may have 7 inputs, each
corresponding to a segment, or it may have only 4 inputs to represent the numbers 0 through 9. An LCD
driver converts these inputs to the electrical signals necessary to excite the appropriate LCD segments. A
dot-matrix LCD consists of a matrix of dots that can display alphanumeric characters (letters and digits)
as well as other symbols. A common dot-matrix LCD has 5 columns and 8 rows of dots for one character.
An LCD driver converts input data into the appropriate electrical signals necessary to excite the
appropriate LCD bits. Each type of LCD may be able to display multiple characters. In addition, each
character may be displayed in normal or inverted fashion. The LCD may permit a character to be blinking
(cycling through normal and inverted display) or may permit display of a cursor (such as a blinking
underscore) indicating the "current" character. This functionality would be difficult for us to implement
using software. Thus, we use an LCD controller to provide us with a simple interface, perhaps 8 data
inputs and one enable input. To send a byte to the LCD, we provide a value to the 8 inputs and pulse the
enable. This byte may be a control word, which instructs the LCD controller to initialize the LCD, clear
the display, select the position of the cursor, brighten the display, and so on. Alternatively, this byte may
be a data word, such as an ASCII character, instructing the LCD to display the character at the currently-selected display position.
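The conversion performed by a driver for a 7-segment LCD with 4 inputs can be written as a lookup table. The sketch below assumes the conventional segment naming a to g, which the text does not specify, and the function name is made up for this example.

```python
# Segment names in the conventional 7-segment layout (an assumption; the text does not name them):
# a = top, b = upper right, c = lower right, d = bottom, e = lower left, f = upper left, g = middle.
SEGMENTS = {
    0: "abcdef", 1: "bc", 2: "abdeg", 3: "abcdg", 4: "bcfg",
    5: "acdfg", 6: "acdefg", 7: "abc", 8: "abcdefg", 9: "abcdfg",
}


def decode_digit(b3, b2, b1, b0):
    """Convert a 4-bit input (b3 is the most significant bit) to 7 segment drive signals."""
    value = (b3 << 3) | (b2 << 2) | (b1 << 1) | b0
    if value not in SEGMENTS:
        raise ValueError("only the digits 0 through 9 are defined")
    lit = SEGMENTS[value]
    return {name: int(name in lit) for name in "abcdefg"}
```

For example, decode_digit(0, 0, 0, 1) turns on only segments b and c, which displays the digit 1.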
Q.5 Give the details about Keypad controller
ANS:-
A keypad consists of a set of buttons that may be pressed to provide input to an embedded system. Again, keypads
are extremely common in embedded systems, since such systems may lack the keyboard that comes standard with
desktop systems. A simple keypad has buttons arranged in an N-column by M-row grid. The device has N outputs,
each output corresponding to a column, and another M outputs, each output corresponding to a row. When we
press a button, one column output and one row output go high, uniquely identifying the pressed button. To read
such a keypad from software, we must scan the column and row outputs. The scanning may instead be performed
by a keypad controller (actually, such a device decodes rather than controls, but we’ll call it a controller for
consistency with the other peripherals discussed). A simple form of such a controller scans the column and row
outputs of the keypad. When the controller detects a button press, it stores a code corresponding to that button
into a register and sets an output high, indicating that a button has been pressed. Our software may poll this
output every 100 milliseconds or so, and read the register when the output is high. Alternatively, this output can
generate an interrupt on our general-purpose processor, eliminating the need for polling.
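The decoding step, whether done in software or by a keypad controller, amounts to finding the one active row and the one active column. The sketch below assumes a hypothetical 4-row by 3-column key layout and a made-up function name.

```python
KEYS = [  # hypothetical 4-row by 3-column layout
    ["1", "2", "3"],
    ["4", "5", "6"],
    ["7", "8", "9"],
    ["*", "0", "#"],
]


def decode_keypad(row_outputs, column_outputs, keys=KEYS):
    """Return the pressed key, or None if no single key is identified."""
    rows = [i for i, level in enumerate(row_outputs) if level]
    cols = [j for j, level in enumerate(column_outputs) if level]
    if len(rows) != 1 or len(cols) != 1:
        return None  # nothing pressed, or an ambiguous multi-key press
    return keys[rows[0]][cols[0]]
```

Software that polls the keypad would call such a routine every 100 milliseconds or so. With a controller, the same result is simply read from the controller's register.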
Q.6 Give the details about Stepper motor controller
ANS: A stepper motor is an electric motor that rotates a fixed number of degrees whenever we apply a "step"
signal. In contrast, a regular electric motor rotates continuously whenever power is applied, coasting to
a stop when power is removed. We specify a stepper motor either by the number of degrees in a single
step, such as 1.8°, or by the number of steps required to move 360°, such as 200 steps. Stepper motors
abound in embedded systems with moving parts, such as disk drives, printers, photocopy and
fax machines, robots, camcorders, VCRs, etc. Internally, a stepper motor typically has four coils. To
rotate the motor one step, we pass current through one or two of the coils; the particular coils depend
on the present orientation of the motor. Thus, rotating the motor 360° requires applying current to the
coils in a specified sequence. Applying the sequence in reverse causes reversed rotation. In some cases,
the stepper motor comes with four inputs corresponding to the four coils, and with documentation that
includes a table indicating the proper input sequence. To control the motor from software, we must
maintain this table in software, and write a step routine that applies high values to the inputs based on
the table values that follow the previously-applied values. In other cases, the stepper motor comes with
a built-in controller (i.e., a special purpose processor) implementing this sequence. Thus, we merely
create a pulse on an input signal of the motor, causing the controller to generate the appropriate high
signals to the coils that will cause the motor to rotate one step.
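When the motor has no built-in controller, the step routine described above keeps a table and an index in software. The sketch below uses a hypothetical four-entry table and made-up names. The actual sequence must come from the motor's documentation.

```python
# Hypothetical full-step sequence for four coils (A, B, C, D); a real motor's
# documentation supplies the actual table.
STEP_TABLE = [
    (1, 0, 0, 0),
    (0, 1, 0, 0),
    (0, 0, 1, 0),
    (0, 0, 0, 1),
]


class StepperDriver:
    """Keeps the table position in software and returns the coil inputs for each step."""

    def __init__(self, steps_per_revolution=200):
        self.index = 0
        self.degrees_per_step = 360 / steps_per_revolution

    def step(self, direction=1):
        """Advance one step forward (+1) or backward (-1); return the coil inputs to apply."""
        self.index = (self.index + direction) % len(STEP_TABLE)
        return STEP_TABLE[self.index]
```

Walking the table backward with step(-1) applies the sequence in reverse, which reverses the rotation.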
Stepper motor with controller (driver)
Q.7 Give the details about Analog-to-digital converters
ANS:-
An analog-to-digital converter (ADC, A/D or A2D) converts an analog signal to a digital signal, and a
digital-to-analog converter (DAC, D/A or D2A) does the opposite. Such conversions are necessary
because, while embedded systems deal with digital values, an embedded system’s surroundings
typically involve many analog signals. "Analog" refers to continuously-valued signals, such as temperature
or speed represented by a voltage between 0 and 100, with infinite possible values in between. "Digital"
refers to discretely-valued signals, such as integers, and in computing systems, these signals are
encoded in binary. By converting between analog and digital signals, we can use digital processors in an
analog environment. For example, consider the analog signal of the figure. The analog input voltage varies
over time from 1 to 4 Volts. We sample the signal at successive time units, and encode the current
voltage into a 4-bit binary number. Conversely, consider the reverse case in the figure. We want to generate an analog output
voltage for the given binary numbers over time. We generate the analog signal shown. We can compute
the digital values from the analog values, and vice-versa, using the following ratio:
e / Vmax = d / (2^n - 1)
Vmax is the maximum voltage that the analog signal can assume, n is the number of bits available for
the digital encoding, d is the present digital encoding, and e is the present analog voltage. This
proportionality of the voltage and digital encoding is shown graphically in the figure. In our example,
suppose Vmax is 7.5V. Then for e = 5V, we have the following ratio: 5/7.5 = d/15, resulting in d =
1010 (ten), as shown in the figure. The resolution of a DAC or ADC is defined as Vmax/(2^n - 1), representing
the number of volts between successive digital encodings. The above discussion assumes a minimum
voltage of 0V. Internally, DACs possess simpler designs than ADCs. A DAC has n inputs for the digital
encoding d, a Vmax analog input, and an analog output e. A fairly straightforward
circuit (involving resistors and an op-amp) can be used to convert d to e.
ADCs, on the other hand, require designs that are more complex, for the following reason. Given a Vmax
analog input and an analog input e, how does the converter know what binary value to assign in order to
satisfy the above ratio? Unlike DACs, there is no simple analog circuit to compute d from e. Instead, an
ADC may itself contain a DAC also connected to Vmax. The ADC "guesses" an encoding d, and then evaluates
its guess by inputting d into the DAC, and comparing the generated analog output e’ with the original
analog input e (using an analog comparator). If the two sufficiently match, then the ADC has found a
proper encoding. So now the question remains: how do we guess the correct encoding?
Analog-to-digital conversion using successive approximation
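Successive approximation answers the guessing question with a binary search over the bits of d, starting from the most significant bit. The sketch below models the internal DAC with the ratio e / Vmax = d / (2^n - 1), which is inferred from the worked example 5/7.5 = d/15. The function names are assumptions, and the comparator is idealized.

```python
def dac(d, v_max, n):
    """Ideal DAC: e = Vmax * d / (2^n - 1)."""
    return v_max * d / (2 ** n - 1)


def successive_approximation(e, v_max, n):
    """Find the largest n-bit encoding d whose DAC output does not exceed e."""
    d = 0
    for bit in reversed(range(n)):       # most significant bit first
        guess = d | (1 << bit)           # tentatively set this bit
        if dac(guess, v_max, n) <= e:    # analog comparator: keep the bit if not too high
            d = guess
    return d


if __name__ == "__main__":
    # The example from the text: Vmax = 7.5 V, e = 5 V, 4 bits
    print(format(successive_approximation(5.0, 7.5, 4), "04b"))
```

Each bit needs one comparison, so an n-bit conversion takes n guesses rather than up to 2^n.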
Q.8 Give the details about Real Time Clocks
ANS:-
Much like a digital wristwatch, a real-time clock (RTC) keeps the time and date in an embedded system.
Real-time clocks are typically composed of a crystal-controlled oscillator, numerous cascaded counters,
and a battery backup. The crystal-controlled oscillator generates a very consistent high-frequency digital
pulse that feeds the cascaded counters. The first counter, typically, counts these pulses up to the
oscillator frequency, which corresponds to exactly one second. At this point, it generates a pulse that
feeds the next counter. This counter counts up to 59, at which point it generates a pulse feeding the
minute counter. The hour, date, month and year counters work in similar fashion. In addition, real-time
clocks adjust for leap years. The rechargeable back-up battery is used to keep the real-time clock
running while the system is powered off.
From the micro-controller’s point of view, the content of these counters can be set to a desired value
(this corresponds to setting the clock) and retrieved. Communication between the micro-controller and a
real-time clock is accomplished through a serial bus, such as I2C. It
should be noted that, given a timer peripheral, it is possible to implement a real-time clock in software
running on a processor. In fact, many systems use this approach to maintain the time. However, the
drawback of such systems is that when the processor is shut down or reset, the time is lost.
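The chain of cascaded counters can be modeled in software. In the sketch below the class and function names are made up, the 32768 Hz oscillator frequency is an assumed default, and only seconds, minutes and hours are modeled (no date or leap-year handling).

```python
class CascadedCounter:
    """Counts 0..modulus-1; on wrap-around it pulses the next counter in the chain."""

    def __init__(self, modulus, next_counter=None):
        self.modulus = modulus
        self.value = 0
        self.next_counter = next_counter

    def pulse(self):
        self.value += 1
        if self.value == self.modulus:
            self.value = 0
            if self.next_counter is not None:
                self.next_counter.pulse()  # carry into the next counter


def make_clock(oscillator_hz=32768):
    """Build oscillator divider -> seconds -> minutes -> hours (32768 Hz is an assumed crystal)."""
    hours = CascadedCounter(24)
    minutes = CascadedCounter(60, hours)
    seconds = CascadedCounter(60, minutes)
    divider = CascadedCounter(oscillator_hz, seconds)
    return divider, seconds, minutes, hours
```

Every oscillator pulse is fed to the divider. All later counters advance only through the carry pulses of the counter before them, just as in the hardware description above.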
Unit 5:- Memory
Q.1 Discuss Memory in details
ANS:-
Any embedded system’s functionality consists of three aspects: processing, storage, and
communication. Processing is the transformation of data, storage is the retention of data for later use,
and communication is the transfer of data. Each of these aspects must be implemented. We use
processors to implement processing, memories to implement storage, and buses to implement
communication. The earlier chapters described common processor types: general-purpose processors,
standard single-purpose processors, and custom single-purpose processors.
A memory stores large numbers of bits. These bits exist as m words of n bits each, for a total of m*n bits.
We refer to a memory as an m x n ("m-by-n") memory. log2(m) address input signals are necessary to
identify a particular word. Stated another way, if a memory has k address inputs, it can have up to 2^k
words. n signals are necessary to output (and possibly input) a selected word. To read a memory means
to retrieve the word of a particular address, while to write a memory means to store a word in a
particular address. Some memories can only be read from (ROM), while others can be both read from
and written to (RAM).
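These relations are easy to compute for a given memory. The function name below is an assumption. When m is not a power of two, the sketch rounds the number of address inputs up.

```python
def memory_geometry(m, n):
    """Return (address_inputs, data_signals, total_bits) for an m x n memory."""
    address_inputs = (m - 1).bit_length()  # smallest k with 2^k >= m
    return address_inputs, n, m * n


if __name__ == "__main__":
    print(memory_geometry(4096, 8))  # a 4096 x 8 memory
```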
Q.2 discuss various Write ability/ storage permanence
ANS:-
• Traditional ROM/RAM distinctions
  – ROM
    • read only, bits stored without power
  – RAM
    • read and write, lose stored bits without power
• Traditional distinctions blurred
  – Advanced ROMs can be written to
    • e.g., EEPROM
  – Advanced RAMs can hold bits without power
    • e.g., NVRAM
• Write ability
  – Manner and speed a memory can be written
• Storage permanence
  – ability of memory to hold stored bits after they are written
• Ranges of write ability
  – High end
    • processor writes to memory simply and quickly
    • e.g., RAM
  – Middle range
    • processor writes to memory, but slower
    • e.g., FLASH, EEPROM
  – Lower range
    • special equipment, “programmer”, must be used to write to memory
    • e.g., EPROM, OTP ROM
  – Low end
    • bits stored only during fabrication
    • e.g., Mask-programmed ROM
• In-system programmable memory
  – Can be written to by a processor in the embedded system using the memory
  – Memories in high end and middle range of write ability
Range of storage permanence
– High end
• essentially never loses bits
• e.g., mask-programmed ROM
– Middle range
• holds bits days, months, or years after memory’s power source turned off
• e.g., NVRAM
– Lower range
• holds bits as long as power supplied to memory
• e.g., SRAM
– Low end
• begins to lose bits almost immediately after written
• e.g., DRAM
Nonvolatile memory
– Holds bits after power is no longer supplied
– High end and middle range of storage permanence
Q.3 Discuss common memory types
ANS: - Two types of Memory
1. ROM Read Only Memory
ROM, or read-only memory, is a memory that can be read from, but not typically written to, during
execution of an embedded system. Of course, there must be a mechanism for setting the bits in the
memory (otherwise, of what use would the read data serve?), but we call this "programming," not
writing. Such programming is usually done off-line, i.e., when the memory is not actively serving as a
memory in an embedded system. We usually program a ROM before inserting it into the embedded
system. Figure provides a block diagram of a ROM.
We can use ROM for various purposes. One use is to store a software program for a general-purpose
processor. We may write each program instruction to one ROM word. For some processors, we write
each instruction to several ROM words. For other processors, we may pack several instructions into a
single ROM word. A related use is to store constant data, like large lookup tables of strings or numbers.
Another common use is to implement a combinational circuit. We can implement any combinational
function of k variables by using a 2^k x 1 ROM, and we can implement n functions of the same k variables
using a 2^k x n ROM. We simply program the ROM to implement the truth table for the functions.
The figure provides a symbolic view of the internal design of an 8x4 ROM. To the right of the 3x8 decoder in
the figure is a grid of lines, with word lines running horizontally and data lines vertically; lines that cross
without a circle in the figure are not connected. Thus, word lines only connect to data lines via the
programmable connection lines shown.
The figure shows all connection lines in place except for two connections in word 2. To see how this
device acts as a read-only memory, consider an input address of "010." The decoder will thus set word
2’s line to 1. Because the lines connecting this word line with data lines 2 and 0 do not exist, the ROM
output will read "1010." Note that if the ROM enable input is 0, then no word is read. Also note that
each data line is shown as a wired-OR, meaning that the wire itself acts to logically OR all the
connections to it.
Any combinational circuit of n functions of the same k variables can be implemented with a 2^k x n ROM.
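As a rough illustration of using a ROM as a combinational circuit, the Python sketch below fills a 2^k x n ROM from the truth tables of n functions. The full adder (k = 3 inputs, n = 2 outputs) is an example chosen here, not one taken from the notes, and all names are hypothetical.

```python
def program_rom(k, functions):
    """Return the contents of a 2^k x n ROM, one word per input combination."""
    rom = []
    for address in range(2 ** k):
        # Address bits act as the k input variables, most significant bit first.
        inputs = [(address >> bit) & 1 for bit in reversed(range(k))]
        word = 0
        for function in functions:
            word = (word << 1) | (function(*inputs) & 1)
        rom.append(word)
    return rom

def carry(a, b, c):
    return (a & b) | (a & c) | (b & c)

def total(a, b, c):
    return a ^ b ^ c

# 8 x 2 ROM: bit 1 of each word is the carry, bit 0 is the sum.
full_adder_rom = program_rom(3, [carry, total])
print(full_adder_rom[0b011])  # reading address 011 gives carry=1, sum=0
```

Reading the ROM at an address then replaces evaluating the logic: the address is the input combination and the stored word is the set of outputs.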
Types of ROM in brief
Mask-programmed ROM:
Connections “programmed” at fabrication
– set of masks
Lowest write ability
– only once
Highest storage permanence
– bits never change unless damaged
Typically used for final design of high-volume systems
– spread out NRE cost for a low unit cost
OTP ROM: One-time programmable ROM:
• Connections “programmed” after manufacture by user
– user provides file of desired contents of ROM
– file input to machine called ROM programmer
– each programmable connection is a fuse
– ROM programmer blows fuses where connections should not exist
• Very low write ability
– typically written only once and requires ROM programmer device
• Very high storage permanence
– bits don’t change unless reconnected to programmer and more fuses blown
• Commonly used in final products
– cheaper, harder to inadvertently modify
EPROM: Erasable programmable ROM:
Programmable component is a MOS transistor
– Transistor has “floating” gate surrounded by an insulator
– (a) Negative charges form a channel between source and drain storing a logic 1
– (b) Large positive voltage at gate causes negative charges to move out of channel and
get trapped in floating gate storing a logic 0
– (c) (Erase) Shining UV rays on surface of floating-gate causes negative charges to return
to channel from floating gate restoring the logic 1
– (d) An EPROM package showing quartz window through which UV light can pass
Better write ability
– can be erased and reprogrammed thousands of times
Reduced storage permanence
– program lasts about 10 years but is susceptible to radiation and electric noise
Typically used during design development
EEPROM: Electrically erasable programmable ROM:
Programmed and erased electronically
– typically by using higher than normal voltage
– can program and erase individual words
Better write ability
– can be in-system programmable with built-in circuit to provide higher than normal
voltage
• built-in memory controller commonly used to hide details from memory user
– writes very slow due to erasing and programming
• “busy” pin indicates to processor EEPROM still writing
– can be erased and programmed tens of thousands of times
Similar storage permanence to EPROM (about 10 years)
Far more convenient than EPROMs, but more expensive
Flash Memory:
Extension of EEPROM
– Same floating gate principle
– Same write ability and storage permanence
Fast erase
– Large blocks of memory erased at once, rather than one word at a time
– Blocks typically several thousand bytes large
Writes to single words may be slower
– Entire block must be read, word updated, then entire block written back
Used with embedded systems storing large data items in nonvolatile memory
– e.g., digital cameras, TV set-top boxes, cell phones
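The block-level behaviour of flash described above can be sketched in Python. This is a simplified model under stated assumptions: the block size is tiny for readability, an erased word reads as 0xFF, and the class and method names are made up for illustration.

```python
BLOCK_SIZE = 4   # words per block (assumed; real blocks are thousands of bytes)
ERASED = 0xFF    # assumed value of an erased word

class FlashModel:
    def __init__(self, num_blocks):
        self.words = [ERASED] * (num_blocks * BLOCK_SIZE)
        self.erase_count = 0

    def erase_block(self, block):
        """Erase one whole block at once."""
        start = block * BLOCK_SIZE
        self.words[start:start + BLOCK_SIZE] = [ERASED] * BLOCK_SIZE
        self.erase_count += 1

    def write_word(self, address, value):
        """Single-word write: read block, update word, erase, write block back."""
        block = address // BLOCK_SIZE
        start = block * BLOCK_SIZE
        copy = self.words[start:start + BLOCK_SIZE]   # read entire block
        copy[address - start] = value                 # update the one word
        self.erase_block(block)                       # erase the block
        self.words[start:start + BLOCK_SIZE] = copy   # write entire block back

flash = FlashModel(num_blocks=2)
flash.write_word(1, 0x12)
flash.write_word(2, 0x34)
```

Each single-word write costs a full block erase in this model, which is why writes to single words may be slower than with EEPROM.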
2. RAM: “Random-access” memory
Typically volatile memory
– bits are not held without power supply
Read and written to easily by embedded system during execution
Internal structure more complex than ROM
– a word consists of several memory cells, each storing 1 bit
– each input and output data line connects to each cell in its column
– rd/wr connected to every cell
– when row is enabled by decoder, each cell has logic that stores input data bit when
rd/wr indicates write or outputs stored bit when rd/wr indicates read
Basic types of RAM:
SRAM: Static RAM
– Memory cell uses flip-flop to store bit
– Requires 6 transistors
– Holds data as long as power supplied
DRAM: Dynamic RAM
– Memory cell uses MOS transistor and capacitor to store bit
– More compact than SRAM
– “Refresh” required due to capacitor leak
• word’s cells refreshed when read
– Typical refresh interval 15.625 microseconds
– Slower to access than SRAM
Other types of RAM (RAM variations):
PSRAM: Pseudo-static RAM
– DRAM with built-in memory refresh controller
– Popular low-cost high-density alternative to SRAM
NVRAM: Nonvolatile RAM
– Holds data after external power removed
– Battery-backed RAM
• SRAM with own permanently connected battery
• writes as fast as reads
• no limit on number of writes unlike nonvolatile ROM-based memory
– SRAM with EEPROM or flash
• stores complete RAM contents on EEPROM or flash before power turned off
Q.4 Give the details of composing memory
ANS:-
Memory size needed often differs from size of readily available memories
When available memory is larger, simply ignore unneeded high-order address bits and higher
data lines
When available memory is smaller, compose several smaller memories into one larger memory
– Connect side-by-side to increase width of words
– Connect top to bottom to increase number of words
• added high-order address line selects smaller memory containing desired word
using a decoder
• Combine techniques to increase number and width of words
An embedded system designer is often faced with the situation of needing a particular-sized memory
(ROM or RAM), but having readily available memories of a different size. For example, the designer may
need a 2k x 8 ROM, but may have 4k x 16 ROMs readily available. Alternatively, the designer may need a
4k x 16 ROM, but may have 2k x 8 ROMs available for use.
The case where the available memory is larger than needed is easy to deal with. We simply use the
needed lower words in the memory, thus ignoring unneeded higher words and their high-order address
bits, and we use the lower data input/output lines, thus ignoring unneeded higher data lines. (Of course,
we could use the higher data lines and ignore the lower lines instead).
The case where the available memory is smaller than needed requires more design effort. In this case,
we must compose several smaller memories to behave as the larger memory we need. Suppose the
available memories have the correct number of words, but each word is not wide enough. In this case,
we can simply connect the available memories side-by-side. For example, Figure illustrates the situation
of needing a ROM three-times wider than that available. We connect three ROMs side-by-side, sharing
the same address and enable lines among them, and concatenating the data lines to form the desired
word width.
Suppose instead that the available memories have the correct word width, but not enough words. In
this case, we can connect the available memories top-to-bottom. For example, Figure illustrates the
situation of needing a ROM with twice as many words, and hence needing one extra address line, than
that available. We connect the ROMs top-to-bottom, OR’ing the corresponding data lines of each. We
use the extra high-order address line to select the higher or lower ROM (using a 1x2 decoder).
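Both composition techniques can be modelled in a few lines of Python. The sketch treats each ROM as a list of words; the function names and the sample contents are assumptions made for illustration.

```python
def compose_wider(roms, word_bits):
    """Side-by-side: every ROM sees the same address, data lines are concatenated."""
    def read(address):
        word = 0
        for rom in roms:                 # first ROM supplies the high-order bits
            word = (word << word_bits) | rom[address]
        return word
    return read

def compose_taller(low_rom, high_rom):
    """Top-to-bottom: the extra high-order address bit enables one ROM."""
    size = len(low_rom)
    def read(address):
        select = address // size         # plays the role of the 1x2 decoder
        rom = high_rom if select else low_rom
        return rom[address % size]
    return read

rom_a, rom_b, rom_c = [0x1, 0x2], [0xA, 0xB], [0x5, 0x6]   # three 2 x 4 ROMs
wide = compose_wider([rom_a, rom_b, rom_c], word_bits=4)   # behaves as 2 x 12
tall = compose_taller(rom_a, rom_b)                        # behaves as 4 x 4
```

Combining the two functions gives a memory that is both wider and taller, as in the last bullet of the summary list.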
Q.5 Explain the Concept of Memory Hierarchy and Cache
ANS:-
Want inexpensive, fast memory
Main memory
– Large, inexpensive, slow memory stores entire program and data
Cache
– Small, expensive, fast memory stores copy of likely accessed parts of larger memory
– Can be multiple levels of cache
When we design a memory to store an embedded system’s program and data, we often face the
following dilemma: we want an inexpensive and fast memory, but inexpensive memories tend to be
slow, whereas fast memories tend to be expensive. The solution to this dilemma is to create a memory
hierarchy, as illustrated in the figure. We use an inexpensive but slow main memory to store all of the
program and data. We use a small amount of fast but expensive cache memory to store copies of
likely-accessed parts of main memory. Using cache is analogous to posting on a wall near a telephone a
short list of important phone numbers rather than posting the entire phonebook.
A cache operates as follows. When we want the processor to access (read or write) a main memory
address, we first check for a copy of that location in cache. If the copy is in the cache, called a cache hit,
then we can access it quickly. If the copy is not there, called a cache miss, then we must first read the
address (and perhaps some of its neighbors) into the cache. This description of cache operation leads to
several cache design choices: cache mapping, cache replacement policy, and cache write techniques.
These design choices can have significant impact on system cost, performance, as well as power, and
thus should be evaluated carefully for a given application.
Cache is usually designed using static RAM rather than dynamic RAM, which is one reason that cache is
more expensive but faster than main memory.
Q.6 Explain the various Cache Mapping Techniques
ANS:Cache mapping
• Far fewer cache addresses are available than main memory addresses
• Is the content of a given address in the cache?
• Cache mapping used to assign main memory address to cache address and determine hit or miss
• Three basic techniques:
– Direct mapping
– Fully associative mapping
– Set-associative mapping
• Caches partitioned into indivisible blocks or lines of adjacent memory addresses
– usually 4 or 8 addresses per line
Direct mapping:
Main memory address divided into 2 fields
– Index
• cache address
• number of bits determined by cache size
– Tag
• compared with tag stored in cache at address indicated by index
• if tags match, check valid bit
Valid bit
– indicates whether data in slot has been loaded from memory
Offset
– used to find particular word in cache line
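A minimal Python sketch of direct mapping follows. The field widths (2 offset bits, 3 index bits) and the class name are assumptions chosen for the example; only tags and valid bits are modelled, not the cached data.

```python
OFFSET_BITS = 2   # assumed: 4 addresses per cache line
INDEX_BITS = 3    # assumed: 8 lines in the cache

def split_address(address):
    """Split a main memory address into (tag, index, offset)."""
    offset = address & ((1 << OFFSET_BITS) - 1)
    index = (address >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = address >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset

class DirectMappedCache:
    def __init__(self):
        self.valid = [False] * (1 << INDEX_BITS)
        self.tags = [0] * (1 << INDEX_BITS)

    def access(self, address):
        """Return True on a hit; on a miss, load the line and return False."""
        tag, index, _ = split_address(address)
        hit = self.valid[index] and self.tags[index] == tag
        if not hit:
            self.valid[index] = True
            self.tags[index] = tag
        return hit

cache = DirectMappedCache()
results = [cache.access(a) for a in (374, 375, 406, 374)]
```

Addresses 374 and 406 share index 5 but have different tags, so they keep evicting each other even though the rest of the cache is empty. Fully associative and set-associative mapping reduce this kind of conflict.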
Fully associative mapping:
Complete main memory address stored in each cache address
All addresses stored in cache simultaneously compared with desired address
Valid bit and offset same as direct mapping
Set-associative mapping:
Compromise between direct mapping and fully associative mapping
Index same as in direct mapping
But, each cache address contains content and tags of 2 or more memory address locations
Tags of that set simultaneously compared as in fully associative mapping
Cache with set size N called N-way set-associative
– 2-way, 4-way, 8-way are common
Q.7 Explain cache replacement policy
ANS:-
Technique for choosing which block to replace
– when fully associative cache is full
– when set-associative cache’s line is full
Direct mapped cache has no choice
Random
– replace block chosen at random
LRU: least-recently used
– replace block not accessed for longest time
FIFO: first-in-first-out
– push block onto queue when accessed
– choose block to replace by popping queue
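The LRU and FIFO policies can be compared with a small Python sketch that counts misses for a fully associative cache. Two assumptions are made here: the cache holds a fixed number of blocks, and the FIFO queue is pushed when a block is first loaded. The function names are illustrative.

```python
from collections import OrderedDict, deque

def count_misses_lru(block_refs, capacity):
    """LRU: on a miss with a full cache, evict the block unused for longest."""
    cache = OrderedDict()
    misses = 0
    for block in block_refs:
        if block in cache:
            cache.move_to_end(block)       # mark as most recently used
        else:
            misses += 1
            if len(cache) == capacity:
                cache.popitem(last=False)  # drop least recently used block
            cache[block] = True
    return misses

def count_misses_fifo(block_refs, capacity):
    """FIFO: on a miss with a full cache, evict the block loaded earliest."""
    queue, resident, misses = deque(), set(), 0
    for block in block_refs:
        if block not in resident:
            misses += 1
            if len(queue) == capacity:
                resident.discard(queue.popleft())
            queue.append(block)
            resident.add(block)
    return misses

refs = [1, 2, 3, 1, 4, 1, 2]
print(count_misses_lru(refs, 3), count_misses_fifo(refs, 3))
```

For this reference string LRU keeps block 1 because it was used recently, while FIFO evicts it because it was loaded first.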
Unit 6:- Interfacing
Q.1 Explain the Basics of Communication
ANS:Communication needs Bus and wires
Wires:
– Uni-directional or bi-directional
– One line may represent multiple wires
Bus
– Set of wires with a single function
• Address bus, data bus
– Or, entire collection of wires
• Address, data and control
• Associated protocol: rules for communication
Example:
A transfer in either direction needs the rd’/wr and enable control signals. It also needs ports, which are described below.
Ports
Conducting device on periphery
Connects bus to processor or memory
Often referred to as a pin
– Actual pins on periphery of IC package that plug into socket on printed-circuit board
– Sometimes metallic balls instead of pins
– Today, metal “pads” connecting processors and memories within single IC
Single wire or set of wires with single function
– E.g., 12-wire address port
Example: timing diagrams for a data read and a data write, also showing the address of the data
Timing Diagrams:
Most common method for describing a communication protocol
Time proceeds to the right on x-axis
Control signal: low or high
– May be active low (e.g., go’, /go, or go_L)
– Use terms assert (active) and deassert
– Asserting go’ means go=0
Data signal: not valid or valid
Protocol may have subprotocols
– Called bus cycle, e.g., read and write
– Each may be several clock cycles
Read example
– rd’/wr set low, address placed on addr for at least tsetup time before enable asserted,
enable triggers memory to place data on data wires by time tread
Q.2 Explain the Microprocessor interfacing: I/O addressing
ANS:-
A microprocessor may have tens or hundreds of pins, many of which are control pins, such as a pin for
clock input and another input pin for resetting the microprocessor. Many of the other pins are used to
communicate data to and from the microprocessor, which we call processor I/O. There are two common
methods for using pins to support I/O: ports, and system buses.
A port is a set of pins that can be read and written just like any register in the microprocessor; in fact,
the port is usually connected to a dedicated register. For example, consider an 8-bit port named P0. A
C-language programmer may write to P0 using an instruction like: P0 = 255, which would set all 8 pins to
1’s. In this case, the C compiler manual would have defined P0 as a special variable that would
automatically be mapped to the register P0 during compilation. Conversely, the programmer might read
the value of a port P1 being written by some other device, by saying something like a=P1. In some
microprocessors, each bit of a port can be configured as input or output by writing to a configuration
register for the port. For example, P0 might have an associated configuration register called CP0. To set
the high-order four bits to input and the low-order four bits to output, we might say: CP0 = 15. This
writes 00001111 to the CP0 register, where a 0 means input and a 1 means output. Ports are often
bit-addressable, meaning that a programmer can read or write specific bits of the port. For example, one
might say: x = P0.2, giving x the value of the number 2 connection of port P0. Port-based I/O is also called
parallel I/O.
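The bit arithmetic behind CP0 = 15 and x = P0.2 can be sketched in Python. This only simulates the port values as integers; real port access depends on the microprocessor and its compiler, and the function names are hypothetical.

```python
def direction_mask(output_bits):
    """Configuration value with 1 = output and 0 = input for each pin."""
    mask = 0
    for bit in output_bits:
        mask |= 1 << bit
    return mask

def read_bit(port_value, bit):
    """Value of one pin of an 8-bit port, like x = P0.2."""
    return (port_value >> bit) & 1

def write_bit(port_value, bit, level):
    """New 8-bit port value after driving one pin high or low."""
    if level:
        return (port_value | (1 << bit)) & 0xFF
    return port_value & ~(1 << bit) & 0xFF

# Low-order four bits as outputs, high-order four as inputs: 00001111 = 15.
cp0 = direction_mask(range(4))
print(cp0)
```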
Port-based I/O (parallel I/O)
• Processor has one or more N-bit ports
• Processor’s software reads and writes a port just like a register
• E.g., P0 = 0xFF; v = P1.2; -- P0 and P1 are 8-bit ports
Bus-based I/O
• Processor has address, data and control ports that form a single bus
• Communication protocol is built into the processor
• A single instruction carries out the read or write protocol on the bus
Types of bus-based I/O:
memory-mapped I/O and standard I/O
Memory-mapped I/O
• Peripheral registers occupy addresses in same address space as memory
• e.g., Bus has 16-bit address
– lower 32K addresses may correspond to memory
– upper 32K addresses may correspond to peripherals
Standard I/O (I/O-mapped I/O)
• Additional pin (M/IO) on bus indicates whether a memory or peripheral access
• e.g., Bus has 16-bit address
– all 64K addresses correspond to memory when M/IO set to 0
– all 64K addresses correspond to peripherals when M/IO set to 1
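The two addressing schemes differ only in how an access is decoded, which the Python sketch below shows for the 16-bit examples above. The function names and the returned tuples are assumptions made for illustration.

```python
def decode_memory_mapped(address):
    """16-bit bus: lower 32K addresses are memory, upper 32K are peripherals."""
    assert 0 <= address <= 0xFFFF
    if address & 0x8000:                      # high-order address bit set
        return ("peripheral", address & 0x7FFF)
    return ("memory", address)

def decode_standard_io(address, m_io):
    """16-bit bus plus M/IO pin: 0 selects memory, 1 selects peripherals."""
    assert 0 <= address <= 0xFFFF
    return ("peripheral" if m_io else "memory", address)

print(decode_memory_mapped(0x8004))
print(decode_standard_io(0x8004, 0))
```

With memory-mapped I/O the address alone decides the target, so half of the address space is given up to peripherals. With standard I/O the full 64K stays available to memory because the M/IO pin carries the distinction.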
Memory-mapped I/O vs. Standard I/O
Memory-mapped I/O
– Requires no special instructions
• Assembly instructions involving memory like MOV and ADD work with
peripherals as well
• Standard I/O requires special instructions (e.g., IN, OUT) to move data between
peripheral registers and memory
Standard I/O
– No loss of memory addresses to peripherals
– Simpler address decoding logic in peripherals possible
• When number of peripherals much smaller than address space then high-order
address bits can be ignored
– smaller and/or faster comparators
ISA bus example:
• ISA supports standard I/O
– /IOR distinct from /MEMR for peripheral read
• /IOW used for writes
– 16-bit address space for I/O vs. 20-bit address space for memory
– Otherwise very similar to memory protocol
Q.3 Explain the Microprocessor interfacing: interrupts
ANS:-
Suppose a peripheral intermittently receives data, which must be serviced by the processor
– The processor can poll the peripheral regularly to see if data has arrived – wasteful
– The peripheral can interrupt the processor when it has data
Requires an extra pin or pins: Int
– If Int is 1, processor suspends current program, jumps to an Interrupt Service Routine, or
ISR
– Known as interrupt-driven I/O
– Essentially, “polling” of the interrupt pin is built into the hardware, so it takes no extra time
What is the address (interrupt address vector) of the ISR?
– Fixed interrupt
• Address built into microprocessor, cannot be changed
• Either ISR stored at address or a jump to actual ISR stored if not enough bytes
available
– Vectored interrupt
• Peripheral must provide the address
• Common when microprocessor has multiple peripherals connected by a system
bus
• Compromise: interrupt address table
Interrupt-driven I/O using fixed ISR location:-
Q.4 short note on Direct memory access (DMA)
ANS:-
Buffering
– Temporarily storing data in memory before processing
– Data accumulated in peripherals commonly buffered
Microprocessor could handle this with ISR
– Storing and restoring microprocessor state inefficient
– Regular program must wait
DMA controller more efficient
– Separate single-purpose processor
– Microprocessor relinquishes control of system bus to DMA controller
– Microprocessor can meanwhile execute its regular program
• No inefficient storing and restoring state due to ISR call
• Regular program need not wait unless it requires the system bus
– Harvard architecture – processor can fetch and execute instructions as
long as they don’t access data memory – if they do, processor stalls
Peripheral to memory transfer without DMA, using vectored interrupt:
Peripheral to memory transfer with DMA:
Q.5 short note on Arbitration
ANS:Types of Arbitration
Priority arbiter:
Consider the situation where multiple peripherals request service from single resource (e.g.,
microprocessor, DMA controller) simultaneously - which gets serviced first?
Priority arbiter
– Single-purpose processor
– Peripherals make requests to arbiter, arbiter makes requests to resource
– Arbiter connected to system bus for configuration only
1. Microprocessor is executing its program.
2. Peripheral1 needs servicing so asserts Ireq1. Peripheral2 also needs servicing so asserts Ireq2.
3. Priority arbiter sees at least one Ireq input asserted, so asserts Int.
4. Microprocessor stops executing its program and stores its state.
5. Microprocessor asserts Inta.
6. Priority arbiter asserts Iack1 to acknowledge Peripheral1.
7. Peripheral1 puts its interrupt address vector on the system bus
8. Microprocessor jumps to the address of ISR read from data bus, ISR executes and returns
(and completes handshake with arbiter).
9. Microprocessor resumes executing its program.
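Steps 2, 3 and 6 above can be sketched as a fixed-priority arbiter in Python. The sketch assumes that Peripheral1 has the highest priority, which matches the example where both peripherals request and Iack1 is asserted; the function name is made up for illustration.

```python
def priority_arbiter(ireq):
    """Return (int_line, iack) for request inputs listed in priority order.

    ireq[0] is Ireq1 (assumed highest priority), ireq[1] is Ireq2, and so on.
    """
    iack = [0] * len(ireq)
    for i, requested in enumerate(ireq):
        if requested:
            iack[i] = 1        # acknowledge only the highest-priority requester
            return 1, iack     # Int is asserted towards the microprocessor
    return 0, iack             # no requests, Int stays deasserted

print(priority_arbiter([1, 1]))   # both request, Peripheral1 is acknowledged
```

Peripheral2 keeps its request asserted and is served on a later pass once Peripheral1 has been handled.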
Daisy-chain arbitration:
Arbitration done by peripherals
– Built into peripheral or external logic added
• req input and ack output added to each peripheral
Peripherals connected to each other in daisy-chain manner
– One peripheral connected to resource, all others connected “upstream”
– Peripheral’s req flows “downstream” to resource, resource’s ack flows “upstream” to
requesting peripheral
– Closest peripheral has highest priority
Pros/cons
– Easy to add/remove peripheral - no system redesign needed
– Does not support rotating priority
– One broken peripheral can cause loss of access to other peripherals
Network-oriented arbitration:
When multiple microprocessors share a bus (sometimes called a network)
– Arbitration typically built into bus protocol
– Separate processors may try to write simultaneously causing collisions
• Data must be resent
• Don’t want to start sending again at same time
– statistical methods can be used to reduce chances
Typically used for connecting multiple distant chips
– Trend – use to connect multiple on-chip processors
Q.6 short note on Multilevel bus architectures
ANS:-
Q.7 Explain the types of communication [when this question is asked, include just the three types of communication]
OR
Advanced communication principles [when this question is asked, also include layering and error
detection and correction]
ANS: - three types of Communication
Parallel communication
– Physical layer capable of transporting multiple bits of data
Serial communication
– Physical layer transports one bit of data at a time
Wireless communication
– No physical connection needed for transport at physical layer
Parallel communication:
Multiple data, control, and possibly power wires
– One bit per wire
High data throughput with short distances
Typically used when connecting devices on same IC or same circuit board
– Bus must be kept short
• long parallel wires result in high capacitance values which requires more time to
charge/discharge
• Data misalignment between wires increases as length increases
Higher cost, bulky
Serial communication:
• Single data wire, possibly also control and power wires
• Words transmitted one bit at a time
• Higher data throughput with long distances
– Less average capacitance, so more bits per unit of time
• Cheaper, less bulky
• More complex interfacing logic and communication protocol
– Sender needs to decompose word into bits
– Receiver needs to recompose bits into word
– Control signals often sent on same wire as data increasing protocol complexity
Wireless communication:
Infrared (IR)
– Electromagnetic wave frequencies just below visible light spectrum
– Diode emits infrared light to generate signal
– Infrared transistor detects signal, conducts when exposed to infrared light
– Cheap to build
– Need line of sight, limited range
Radio frequency (RF)
– Electromagnetic wave frequencies in radio spectrum
– Analog circuitry and antenna needed on both sides of transmission
– Line of sight not needed, transmitter power determines range
Q.8 What are layering and error detection and correction?
ANS:Layering
– Break complexity of communication protocol into pieces easier to design and
understand
– Lower levels provide services to higher level
• Lower level might work with bits while higher level might work with packets of
data
– Physical layer
• Lowest level in hierarchy
• Medium to carry data from one actor (device or node) to another
Error detection and correction
Often part of bus protocol
Error detection: ability of receiver to detect errors during transmission
Error correction: ability of receiver and transmitter to cooperate to correct problem
– Typically done by acknowledgement/retransmission protocol
Bit error: single bit is inverted
Burst error: consecutive bits received incorrectly
Parity: extra bit sent with word used for error detection
– Odd parity: data word plus parity bit contains odd number of 1’s
– Even parity: data word plus parity bit contains even number of 1’s
– Always detects single bit errors, but not all burst bit errors
Checksum: extra word sent with data packet of multiple words
– e.g., extra word contains XOR sum of all data words in packet
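Parity and the XOR checksum can be tried out with a short Python sketch. The function names are illustrative; the word and packet values are arbitrary examples.

```python
from functools import reduce

def parity_bit(word, even=True):
    """Extra bit that makes the number of 1's in word plus bit even (or odd)."""
    bit = bin(word).count("1") % 2
    return bit if even else bit ^ 1

def parity_ok(word, bit, even=True):
    """Receiver check: does word plus parity bit have the expected parity?"""
    total = bin(word).count("1") + bit
    return total % 2 == (0 if even else 1)

def xor_checksum(words):
    """Extra word holding the XOR of all data words in the packet."""
    return reduce(lambda a, b: a ^ b, words, 0)

sent = 0b1011
p = parity_bit(sent)                 # even parity
packet = [0x12, 0x34, 0x56]
check = xor_checksum(packet)
```

A single inverted bit changes the parity and is caught. Two inverted bits cancel out and pass the check, which is why parity does not detect all burst errors. At the receiver, XOR-ing the data words with the checksum word gives 0 when the packet passes the check.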
Q.9 Discuss various serial communication protocols
ANS:Serial protocols: I2C
I2C (Inter-IC)
– Two-wire serial bus protocol developed by Philips Semiconductors nearly 20 years ago
– Enables peripheral ICs to communicate using simple communication hardware
– Data transfer rates up to 100 kbits/s and 7-bit addressing possible in normal mode
– Up to 3.4 Mbits/s in high-speed mode; 10-bit addressing also supported
– Common devices capable of interfacing to I2C bus:
• EPROMS, Flash, and some RAM memory, real-time clocks, watchdog timers, and
microcontrollers
I2C bus structure
Serial protocols: CAN
CAN (Controller area network)
– Protocol for real-time applications
– Developed by Robert Bosch GmbH
– Originally for communication among components of cars
– Applications now using CAN include:
• elevator controllers, copiers, telescopes, production-line control systems, and
medical instruments
– Data transfer rates up to 1 Mbit/s and 11-bit addressing
– Common devices interfacing with CAN:
• 8051-compatible 8592 processor and standalone CAN controllers
– Actual physical design of CAN bus not specified in protocol
• Requires devices to transmit/detect dominant and recessive signals to/from bus
• e.g., ‘1’ = dominant, ‘0’ = recessive if single data wire used
• Bus guarantees dominant signal prevails over recessive signal if asserted
simultaneously
Serial protocols: FireWire
FireWire (a.k.a. I-Link, Lynx, IEEE 1394)
– High-performance serial bus developed by Apple Computer Inc.
– Designed for interfacing independent electronic components
• e.g., Desktop, scanner
– Data transfer rates from 12.5 to 400 Mbits/s, 64-bit addressing
– Plug-and-play capabilities
– Packet-based layered design structure
– Applications using FireWire include:
• disk drives, printers, scanners, cameras
– Capable of supporting a LAN similar to Ethernet
• 64-bit address:
– 10 bits for network ids, 1023 subnetworks
– 6 bits for node ids, each subnetwork can have 63 nodes
– 48 bits for memory address, each node can have 281 terabytes of
distinct locations
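The 64-bit address layout listed above (10 + 6 + 48 bits) can be unpacked with shifts and masks. The Python sketch below assumes the network id occupies the highest-order bits; the function name is hypothetical.

```python
def split_firewire_address(address):
    """Split a 64-bit address into (network id, node id, memory address)."""
    assert 0 <= address < 1 << 64
    network_id = address >> 54                 # top 10 bits
    node_id = (address >> 48) & 0x3F           # next 6 bits
    memory_address = address & ((1 << 48) - 1) # low 48 bits
    return network_id, node_id, memory_address

example = (5 << 54) | (17 << 48) | 0xABCDEF
print(split_firewire_address(example))
print(2 ** 48)   # number of distinct locations per node
```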
Serial protocols: USB
USB (Universal Serial Bus)
– Easier connection between PC and monitors, printers, digital speakers, modems,
scanners, digital cameras, joysticks, multimedia game equipment
– 2 data rates:
• 12 Mbps for increased bandwidth devices
• 1.5 Mbps for lower-speed devices (joysticks, game pads)
– Tiered star topology can be used
• One USB device (hub) connected to PC
– hub can be embedded in devices like monitor, printer, or keyboard or
can be standalone
• Multiple USB devices can be connected to hub
• Up to 127 devices can be connected like this
– USB host controller
• Manages and controls bandwidth and driver software required by each
peripheral
• Dynamically allocates power downstream according to devices
connected/disconnected
Q.10 Discuss various parallel communication protocols
ANS:Parallel protocols: PCI Bus
PCI Bus (Peripheral Component Interconnect)
– High performance bus originated at Intel in the early 1990’s
– Standard adopted by industry and administered by PCISIG (PCI Special Interest Group)
– Interconnects chips, expansion boards, processor memory subsystems
– Data transfer rates of 127.2 to 508.6 Mbits/s and 32-bit addressing
• Later extended to 64-bit while maintaining compatibility with 32-bit schemes
– Synchronous bus architecture
– Multiplexed data/address lines
Parallel protocols: ARM Bus
ARM Bus
– Designed and used internally by ARM Corporation
– Interfaces with ARM line of processors
– Many IC design companies have own bus protocol
– Data transfer rate is a function of clock speed
• If clock speed of bus is X, transfer rate = 16 x X bits/s
– 32-bit addressing
Q.11 Discuss various wireless communication protocols
ANS:-
Wireless protocols: IrDA
– Protocol suite that supports short-range point-to-point infrared data transmission
– Created and promoted by the Infrared Data Association (IrDA)
– Data transfer rates between 9.6 kbps and 4 Mbps
– IrDA hardware deployed in notebook computers, printers, PDAs, digital cameras, public
phones, cell phones
– Lack of suitable drivers has slowed use by applications
– Windows 2000/98 now include support
– Becoming available on popular embedded OS’s
Wireless protocols: Bluetooth
Bluetooth
– New, global standard for wireless connectivity
– Based on low-cost, short-range radio link
– Connection established when within 10 meters of each other
– No line-of-sight required
• e.g., Connect to printer in another room
Wireless Protocols: IEEE 802.11
IEEE 802.11
– Proposed standard for wireless LANs
– Specifies parameters for PHY and MAC layers of network
• PHY layer
– physical layer
– handles transmission of data between nodes
– provisions for data transfer rates of 1 or 2 Mbps
– operates in 2.4 to 2.4835 GHz frequency band (RF)
– or 300 to 428,000 GHz (IR)
• MAC layer
– medium access control layer
– protocol responsible for maintaining order in shared medium
– collision avoidance/detection
Unit 7:- Digital Camera Example
Q.1 Explain the Digital Camera Example with all functionality
ANS:Figure of Camera
The following points are required to discuss the digital camera:
Introduction to a simple digital camera
Designer’s perspective
Requirements specification
Design
– Four implementations
Introduction:-
Putting it all together
– General-purpose processor
– Single-purpose processor
• Custom
• Standard
– Memory
– Interfacing
Knowledge applied to designing a simple digital camera
– General-purpose vs. single-purpose processors
– Partitioning of functionality among different processor types
Captures images
Stores images in digital format
– No film
– Multiple images stored in camera
• Number depends on amount of memory and bits used per image
Downloads images to PC
Only recently possible
– Systems-on-a-chip
• Multiple processors and memories on one IC
– High-capacity flash memory
Very simple description used for example
– Many more features with real digital camera
• Variable size images, image deletion, digital stretching, zooming in and out, etc.
Designer’s perspective:
Two key tasks
– Processing images and storing in memory
• When shutter pressed:
– Image captured
– Converted to digital form by charge-coupled device (CCD)
– Compressed and archived in internal memory
– Uploading images to PC
• Digital camera attached to PC
• Special software commands camera to transmit archived images serially
Charge-coupled device (CCD)
Special sensor that captures an image
Light-sensitive silicon solid-state device composed of many cells
Zero-bias error
Manufacturing errors cause cells to measure slightly above or below actual light intensity
Error typically same across columns, but different across rows
Some of leftmost columns blocked by black paint to detect zero-bias error
– Reading of other than 0 in blocked cells is zero-bias error
– Each row is corrected by subtracting the average error found in blocked cells for that
row
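The row correction described above can be sketched in C. This is a minimal illustration, not the camera's actual code: the function name `fix_row_bias`, the use of `int` pixels, and the assumption that the two covered cells arrive after the 64 visible pixels of each row are all choices made for this example.

```c
#include <assert.h>
#include <stddef.h>

#define COLS    64   /* visible pixels per row (64 x 64 low-res image) */
#define BLOCKED  2   /* covered cells per row; assumed to follow the visible pixels */

/* Corrects one CCD row: averages the covered cells to estimate the
 * zero-bias error, then subtracts it from every visible pixel. */
void fix_row_bias(const int raw[COLS + BLOCKED], int out[COLS])
{
    assert(raw != NULL && out != NULL);

    int sum = 0;
    for (size_t i = 0; i < BLOCKED; i++)
        sum += raw[COLS + i];          /* covered cells should ideally read 0 */
    int bias = sum / BLOCKED;          /* average error for this row */

    for (size_t c = 0; c < COLS; c++)
        out[c] = raw[c] - bias;
}
```

Calling `fix_row_bias` once per row gives each row its own correction, which matches the observation that the error differs from row to row.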
Compression
• Store more images
• Transmit image to PC in less time
• JPEG (Joint Photographic Experts Group)
– Popular standard format for representing digital images in a compressed form
– Provides for a number of different modes of operation
– Mode used in this chapter provides high compression ratios using DCT (discrete cosine transform)
– Image data divided into blocks of 8 x 8 pixels
– 3 steps performed on each block
• DCT
• Quantization
• Huffman encoding
Uploading to PC
• When connected to PC and upload command received
– Read images from memory
– Transmit serially using UART
– While transmitting
• Reset pointers, image-size variables and global memory pointer accordingly
Requirements Specification
• System’s requirements – what system should do
– Nonfunctional requirements
• Constraints on design metrics (e.g., “should use 0.001 watt or less”)
– Functional requirements
• System’s behavior (e.g., “output X should be input Y times 2”)
– Initial specification may be very general and come from marketing dept.
• E.g., short document detailing market need for a low-end digital camera that:
– captures and stores at least 50 low-res images and uploads to PC,
– costs around $100 with single medium-size IC costing less than $25,
– has as long a battery life as possible,
– has expected sales volume of 200,000 if market entry < 6 months,
– 100,000 if between 6 and 12 months,
– insignificant sales beyond 12 months
Nonfunctional requirements:-
• Design metrics of importance based on initial specification
– Performance: time required to process image
– Size: number of elementary logic gates (2-input NAND gate) in IC
– Power: measure of avg. electrical energy consumed per unit time while processing
– Energy: battery lifetime (power x time)
• Constrained metrics
– Values must be below (sometimes above) certain threshold
• Optimization metrics
– Improved as much as possible to improve product
• Metric can be both constrained and optimization
• Performance
– Must process image fast enough to be useful
– 1 sec reasonable constraint
• Slower would be annoying
• Faster not necessary for low-end of market
– Therefore, constrained metric
• Size
– Must use IC that fits in reasonably sized camera
– Constrained and optimization metric
• Constraint may be 200,000 gates, but smaller would be cheaper
• Power
– Must operate below certain temperature (cooling fan not possible)
– Therefore, constrained metric
• Energy
– Reducing power or time reduces energy
– Optimized metric: want battery to last as long as possible
Informal functional specification:-
• Flowchart breaks functionality down into simpler functions
• Each function’s details could then be described in English
– Done earlier in chapter
• Low quality image has resolution of 64 x 64
• Mapping functions to a particular processor type not done at this stage
Refined functional specification
• Refine informal specification into one that can actually be executed
• Can use C/C++ code to describe each function
– Called system-level model, prototype, or simply model
– Also is first implementation
• Can provide insight into operations of system
– Profiling can find computationally intensive functions
• Can obtain sample output used to verify correctness of final implementation
[Figure: Executable model of digital camera]
Design
• Determine system’s architecture
– Processors
• Any combination of single-purpose (custom or standard) or general-purpose processors
– Memories, buses
• Map functionality to that architecture
– Multiple functions on one processor
– One function on one or more processors
• Implementation
– A particular architecture and mapping
– Solution space is set of all implementations
• Starting point
– Low-end general-purpose processor connected to flash memory
• All functionality mapped to software running on processor
• Usually satisfies power, size, and time-to-market constraints
• If timing constraint not satisfied then later implementations could:
– use single-purpose processors for time-critical functions
– rewrite functional specification
Implementation 1: Microcontroller alone:-
• Low-end processor could be Intel 8051 microcontroller
• Total IC cost including NRE about $5
• Well below 200 mW power
• Time-to-market about 3 months
• However, one image per second not possible
– 12 MHz, 12 cycles per instruction
• Executes one million instructions per second
– CcdppCapture has nested loops resulting in 4096 (64 x 64) iterations
• ~100 assembly instructions each iteration
• 409,600 (4096 x 100) instructions per image
• Roughly 40% of the one-second budget spent on reading the image alone
– Would be over budget after adding compute-intensive DCT and Huffman encoding
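The budget arithmetic above can be checked with a few lines of C. This is only a sketch of the estimate: the function names are invented for this example, and the figure of 100 instructions per loop iteration is the approximate value quoted in the notes.

```c
#include <assert.h>

/* Instructions executed per second for a given clock and cycles per instruction. */
unsigned long instructions_per_second(unsigned long clock_hz,
                                      unsigned long cycles_per_instruction)
{
    assert(cycles_per_instruction > 0);
    return clock_hz / cycles_per_instruction;
}

/* Instructions spent in a nested capture loop over rows x cols pixels. */
unsigned long capture_instructions(unsigned long rows, unsigned long cols,
                                   unsigned long instructions_per_iteration)
{
    return rows * cols * instructions_per_iteration;
}

/* Whole-number percentage of the available instruction budget that is used. */
unsigned long budget_percent(unsigned long used, unsigned long available)
{
    assert(available > 0);
    return (used * 100UL) / available;
}
```

With a 12 MHz clock and 12 cycles per instruction, `instructions_per_second` gives 1,000,000. `capture_instructions(64, 64, 100)` gives 409,600, which is about 40% of the one-second budget before any DCT or Huffman work is counted.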
Implementation 2: Microcontroller and CCDPP
• CCDPP function implemented on custom single-purpose processor
– Improves performance – fewer microcontroller cycles
– Increases NRE cost and time-to-market
– Easy to implement
• Simple datapath
• Few states in controller
• Simple UART easy to implement as single-purpose processor also
• EEPROM for program memory and RAM for data memory added as well
Microcontroller
• Synthesizable version of Intel 8051 available
– Written in VHDL
– Captured at register transfer level (RTL)
• Fetches instruction from ROM
• Decodes using Instruction Decoder
• ALU executes arithmetic operations
– Source and destination registers reside in RAM
• Special data movement instructions used to load and store externally
• Special program generates VHDL description of ROM from output of C compiler/linker
UART
• UART in idle mode until invoked
– UART invoked when 8051 executes store instruction with UART’s enable register as target address
• Memory-mapped communication between 8051 and all single-purpose processors
• Lower 8-bits of memory address for RAM
• Upper 8-bits of memory address for memory-mapped I/O devices
• Start state transmits 0 indicating start of byte transmission then transitions to Data state
• Data state sends 8 bits serially then transitions to Stop state
• Stop state transmits 1 indicating transmission done then transitions back to idle mode
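The Idle, Start, Data and Stop states can be modelled in software to see the bit sequence the UART puts on the line. The sketch below is a host-side model only, not the VHDL controller. The type and function names are hypothetical, and sending the least significant bit first is an assumption, since the notes do not state the bit order.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { UART_IDLE, UART_START, UART_DATA, UART_STOP } uart_state_t;

typedef struct {
    uart_state_t state;
    unsigned char byte;      /* byte being transmitted */
    unsigned bit_index;      /* next data bit to send (0..7) */
} uart_tx_t;

void uart_tx_init(uart_tx_t *u)
{
    assert(u != NULL);
    u->state = UART_IDLE;
    u->byte = 0;
    u->bit_index = 0;
}

/* Models the processor's store to the UART: leaves Idle and begins a transfer. */
void uart_tx_write(uart_tx_t *u, unsigned char b)
{
    assert(u != NULL);
    u->byte = b;
    u->bit_index = 0;
    u->state = UART_START;
}

/* Advances one bit time and returns the level driven on the line (idle line = 1). */
int uart_tx_step(uart_tx_t *u)
{
    assert(u != NULL);
    switch (u->state) {
    case UART_START:
        u->state = UART_DATA;
        return 0;                                   /* start bit */
    case UART_DATA: {
        int bit = (u->byte >> u->bit_index) & 1;    /* LSB first (assumed) */
        if (++u->bit_index == 8)
            u->state = UART_STOP;
        return bit;
    }
    case UART_STOP:
        u->state = UART_IDLE;
        return 1;                                   /* stop bit */
    case UART_IDLE:
    default:
        return 1;
    }
}
```

After `uart_tx_write`, ten calls to `uart_tx_step` produce one start bit, eight data bits and one stop bit, and the model is back in the idle state.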
CCDPP
• Hardware implementation of zero-bias operations
• Interacts with external CCD chip
– CCD chip resides external to our SOC mainly because combining CCD with ordinary logic not feasible
• Internal buffer, B, memory-mapped to 8051
• Variables R, C are buffer’s row, column indices
• GetRow state reads in one row from CCD to B
– 66 bytes: 64 pixels + 2 blacked-out pixels
• ComputeBias state computes bias for that row and stores in variable Bias
• FixBias state iterates over same row subtracting Bias from each element
• NextRow transitions to GetRow for repeat of process on next row or to Idle state when all 64 rows completed
Software
• System-level model provides majority of code
– Module hierarchy, procedure names, and main program unchanged
• Code for UART and CCDPP modules must be redesigned
– Simply replace with memory assignments
• xdata used to load/store variables over external memory bus
• _at_ specifies memory address to store these variables
• Byte sent to U_TX_REG by processor will invoke UART
• U_STAT_REG used by UART to indicate it is ready for next byte
– UART may be much slower than processor
– Similar modification for CCDPP code
• All other modules untouched
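The idea of replacing the UART module with plain memory assignments can be sketched as follows. On the 8051 target, `U_TX_REG` and `U_STAT_REG` would be declared with the compiler's `xdata` and `_at_` keywords at fixed addresses. Here ordinary variables stand in for them so that the sketch compiles on a host. The function names and the convention that a status value of 1 means "busy" are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

#define UART_BUSY 1   /* assumed encoding: 1 = UART still sending, 0 = ready */

/* Host stand-ins for the memory-mapped registers. On the target these would be
 * xdata variables placed with _at_ at the UART's addresses (addresses not shown). */
static volatile unsigned char U_TX_REG;
static volatile unsigned char U_STAT_REG;

/* Sends one byte: waits until the UART is ready, then stores to the transmit
 * register. On the target that store is what invokes the UART. */
void UartSend(unsigned char d)
{
    while (U_STAT_REG == UART_BUSY) {
        /* busy wait: the UART may be much slower than the processor */
    }
    U_TX_REG = d;
}

/* Sends a NUL-terminated string one byte at a time. */
void UartSendString(const char *s)
{
    assert(s != NULL);
    for (size_t i = 0; s[i] != '\0'; i++)
        UartSend((unsigned char)s[i]);
}
```

The rest of the program calls `UartSend` exactly as in the system-level model, which is why the other modules can stay untouched.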
Analysis
• Entire SOC tested on VHDL simulator
– Interprets VHDL descriptions and functionally simulates execution of system
• Recall program code translated to VHDL description of ROM
– Tests for correct functionality
– Measures clock cycles to process one image (performance)
• Gate-level description obtained through synthesis
– Synthesis tool like compiler for SPPs
– Simulate gate-level models to obtain data for power analysis
• Number of times gates switch from 1 to 0 or 0 to 1
– Count number of gates for chip area
• Analysis of implementation 2
– Total execution time for processing one image:
• 9.1 seconds
– Power consumption:
• 0.033 watt
– Energy consumption:
• 0.30 joule (9.1 s x 0.033 watt)
– Total chip area:
• 98,000 gates
Implementation 3: Microcontroller and CCDPP/Fixed-Point DCT
• 9.1 seconds still doesn’t meet performance constraint of 1 second
• DCT operation prime candidate for improvement
– Execution of implementation 2 shows microprocessor spends most cycles here
– Could design custom hardware like we did for CCDPP
• More complex so more design effort
– Instead, will speed up DCT functionality by modifying behavior
DCT floating-point cost:-
• Floating-point cost
– DCT uses ~260 floating-point operations per pixel transformation
– 4096 (64 x 64) pixels per image
– About 1 million floating-point operations per image (260 x 4096)
– No floating-point support with Intel 8051
• Compiler must emulate
– Generates procedures for each floating-point operation
• mult, add
– Each procedure uses tens of integer operations
– Thus, > 10 million integer operations per image
– Procedures increase code size
• Fixed-point arithmetic can improve on this
Fixed-point arithmetic:-
• Integer used to represent a real number
– Constant number of integer’s bits represents fractional portion of real number
• More bits, more accurate the representation
– Remaining bits represent portion of real number before decimal point
• Translating a real constant to a fixed-point representation
– Multiply real value by 2 ^ (# of bits used for fractional part)
– Round to nearest integer
– E.g., represent 3.14 as 8-bit integer with 4 bits for fraction
• 2^4 = 16
• 3.14 x 16 = 50.24 ≈ 50 = 00110010
• 16 (2^4) possible values for fraction, each represents 0.0625 (1/16)
• Last 4 bits (0010) = 2
• 2 x 0.0625 = 0.125
• 3(0011) + 0.125 = 3.125 ≈ 3.14 (more bits for fraction would increase accuracy)
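The translation steps above can be written as two small C functions. This is a minimal sketch for the unsigned 8-bit format with 4 fraction bits used in the example. The names `to_fixed` and `to_real` are made up for illustration, and negative values are deliberately not handled.

```c
#include <assert.h>

#define FRAC_BITS 4                      /* bits used for the fractional part */

typedef unsigned char fixed8_t;          /* 4 integer bits, 4 fraction bits, unsigned */

/* Multiplies by 2^FRAC_BITS and rounds to the nearest integer. */
fixed8_t to_fixed(double x)
{
    assert(x >= 0.0 && x < 16.0);        /* representable range of this format */
    double scaled = x * (1 << FRAC_BITS) + 0.5;
    if (scaled > 255.0)
        scaled = 255.0;                  /* clamp values that would round past 8 bits */
    return (fixed8_t)scaled;             /* the cast truncates, completing the rounding */
}

/* Converts back by dividing by 2^FRAC_BITS; each step is worth 1/16 = 0.0625. */
double to_real(fixed8_t f)
{
    return (double)f / (1 << FRAC_BITS);
}
```

`to_fixed(3.14)` yields 50 (binary 00110010), and `to_real(50)` yields 3.125, the same values as in the worked example.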
Fixed-point arithmetic operations:-
• Addition
– Simply add integer representations
– E.g., 3.14 + 2.71 = 5.85
• 3.14 → 50 = 00110010
• 2.71 → 43 = 00101011
• 50 + 43 = 93 = 01011101
• 5(0101) + 13(1101) x 0.0625 = 5.8125 ≈ 5.85
• Multiply
– Multiply integer representations
– Shift result right by # of bits in fractional part
– E.g., 3.14 * 2.71 = 8.5094
• 50 * 43 = 2150 = 100001100110
• >> 4 = 10000110
• 8(1000) + 6(0110) x 0.0625 = 8.375 ≈ 8.5094
• Range of real values used limited by bit widths of possible resulting values
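The two operations can be sketched in C for the same 8-bit format with 4 fraction bits. The function names are hypothetical. The multiplication uses a 16-bit intermediate because the raw product carries 8 fraction bits, and the assertions make the range limitation explicit.

```c
#include <assert.h>
#include <stdint.h>

#define FRAC_BITS 4   /* fraction bits in each 8-bit fixed-point value */

/* Addition: add the integer representations directly. */
uint8_t fx_add(uint8_t a, uint8_t b)
{
    uint16_t sum = (uint16_t)(a + b);
    assert(sum <= UINT8_MAX);                 /* result must still fit in 8 bits */
    return (uint8_t)sum;
}

/* Multiplication: multiply, then shift right by the number of fraction bits. */
uint8_t fx_mul(uint8_t a, uint8_t b)
{
    uint16_t product = (uint16_t)(a * b);     /* has 2 * FRAC_BITS fraction bits */
    uint16_t shifted = (uint16_t)(product >> FRAC_BITS);
    assert(shifted <= UINT8_MAX);             /* range limited by the bit width */
    return (uint8_t)shifted;
}

/* Converts a fixed-point value back to a real number for inspection. */
double fx_to_real(uint8_t f)
{
    return (double)f / (1 << FRAC_BITS);
}
```

With 50 and 43 standing for 3.14 and 2.71, `fx_add` returns 93 (5.8125) and `fx_mul` returns 134 (8.375), matching the worked figures.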
Implementation 4: Microcontroller and CCDPP/DCT
• Analysis of implementation 4
– Total execution time for processing one image:
• 0.099 seconds (well under 1 sec)
– Power consumption:
• 0.040 watt
• Increase over 2 and 3 because SOC has another processor
– Energy consumption:
• 0.0040 joule (0.099 s x 0.040 watt)
• Battery life 12x longer than previous implementation
– Total chip area:
• 128,000 gates
• Significant increase over previous implementations
Unit 8:- Embedded Software Development Tools
Q.1 What are the Host and Target Machines? Also explain the tools of the Host and Target Machines
ANS:-
• Host: The machine on which the embedded software is developed, compiled, tested, debugged, and
optimized before it is translated for the target device. (The host has keyboards, editors,
monitors, printers, more memory, etc. for development, while the target may have none of these
capabilities for developing the software.)
• Target: After development, the code is cross-compiled or cross-assembled, linked (into the
target processor’s instruction set), and located into the target
The following tools are required for the target machine:
• Cross-Compilers
– Native tools are good for the host, but to port/locate embedded code to the target, the host
must have a tool-chain that includes a cross-compiler, one which runs on the host but
produces code for the target processor
– Cross-compiling doesn’t guarantee correct target code, because host and target can differ
(e.g., in word sizes, instruction sizes, variable declarations, library functions)
• Cross-Assemblers and Tool Chain
– Host uses a cross-assembler to assemble code in the target’s instruction syntax for the target
– A tool chain is a collection of compatible translation tools, which are ‘pipelined’ to produce
complete binary/machine code that can be linked and located into the target processor
– (See Fig 9.1)
Q.2 Discuss Linker/Locators for Embedded Software
ANS:-
• Native linkers are different from cross-linkers (or locators), which perform additional tasks to
locate embedded binary code into target processors
Address Resolution
• Native Linker: produces host machine code on the hard drive (in a named file), which
the loader loads into RAM; the OS then schedules the program to run on the CPU.
• In RAM, the application code’s logical addresses for, e.g., variables/operands
and function calls, are ordered or organized by the linker. The loader then maps the
logical addresses into physical addresses – a process called address resolution. The
loader then loads the code accordingly into RAM (see Fig 9.2). In the process the loader
also resolves the addresses for calls to the native OS routines
• Locator: produces target machine code (which the locator glues into the RTOS), and the
combined code (called the map) gets copied into the target ROM. The locator doesn’t stay
in the target environment, hence all addresses are resolved, guided by locating tools
and directives, prior to running the code (See Fig)
Locating Program Components – Segments
• The unchanging embedded program (binary code) and constants must be kept in ROM so that they
are retained even on power-off
• Changing program segments (e.g., variables) must be kept in RAM
• Tool chains separate program parts using the concept of segments
• Tool chains (for embedded systems) also require ‘start-up’ code to be in a separate segment
and ‘located’ at a microprocessor-defined location where the program starts execution
• Some cross-compilers have default segments or allow the programmer to specify segments for
program parts, but cross-assemblers have no default behavior, and the programmer must specify
segments for program parts
See Fig: locating of object-code segments in ROM and RAM
Q.3 Explain different ways of getting Embedded Software into the Target System
ANS:-
1. PROM Programmers
• One way of moving maps into ROM or PROM is to create a ROM using hardware tools or a PROM
programmer (for small and changeable software, during debugging)
• If a PROM programmer is used (for changing or debugging software), place the PROM in a socket
(which makes it erasable – for EPROM – or removable/replaceable) rather than ‘burning’ it into
the circuitry
• PROMs can be pushed into sockets by hand, and pulled using a chip puller
• The PROM programmer must be compatible with the format (syntax/semantics) of the Map
See Fig
2. ROM Emulators
• Another approach is using a ROM emulator (hardware) which emulates the ROM of the target
system, has all the ROM circuitry, and has a serial or network interface to the host system. The
locator loads the Map into the emulator, especially for debugging purposes.
• Software on the host that loads the Map file into the emulator must understand (be compatible
with) the Map’s syntax/semantics
See Fig
3. Using Flash Memory
• For debugging, a flash memory can be loaded with the target Map code using software on
the host over a serial port or network connection (just like using an EPROM)
• Advantages:
– No need to pull the flash (unlike PROM) for debugging different embedded code
– Transferring code into flash (over a network) is faster and hassle-free
– New versions of embedded software (supplied by the vendor) can be loaded into
flash memory by customers over a network. This requires a) protecting the flash
programmer, saving it in RAM and executing it from there, and reloading it into
flash after the new version is written, and b) the ability to complete loading the new
version even if there are crashes, and protecting the startup code as in (a)
– Modifying and/or debugging the flash programming software requires moving it
into RAM, modifying/debugging it, and reloading it into the target flash memory using the above
methods
4. Monitors
• Another option on systems with a communication port is to use a monitor: a program that
resides in the target ROM and knows how to load new programs onto the system
Unit 9:- Debugging Techniques
Q.1 Why test on your Host Machine? Explain the goals of the typical testing process
ANS:-
Goals of the Testing Process
• Store test results (the target may not even have a disk drive to store results)
Q.2 Explain Basic Techniques for Testing on your Host Machine
ANS:-
Testing on Host Machine – Basic Techniques 1
• Target system on the left: (hardware-independent code, hardware-dependent code, hardware)
• Test system (on host) on the right: (hardware-independent code – same, scaffold – rest)
• Scaffold provides (in software) all functionalities and calls to hardware as in the
hardware-dependent and hardware components of the target system – more like a simulator
for them
Fig:-
Testing on Host Machine – Basic Techniques 2
• radio.c – hardware-independent code
• radiohw.c – hardware-dependent code (only interface to hardware: inp() and outp(),
supporting the vTurnOnTransmitter() and vTurnOffTransmitter() functions)
• inp() and outp() must have real hardware code to read/write byte data correctly,
which makes testing harder
• Replace radiohw.c with a scaffold, eliminating the need for inp() and outp() – both
are simulated in software – a program stub
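A scaffold of this kind might look like the following host-side sketch. Only `vTurnOnTransmitter()` and `vTurnOffTransmitter()` come from the notes. The helper `vRadioWriteByte()`, the function `vSendMessage()` standing in for part of radio.c, and the bookkeeping variables are hypothetical names added for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* --- Scaffold: replaces radiohw.c on the host, no inp()/outp() needed --- */
static int fTransmitterOn;        /* simulated hardware state */
static unsigned cTurnOnCalls;     /* times the code under test switched it on */
static unsigned cBytesWhileOn;    /* bytes written while the transmitter was on */

void vTurnOnTransmitter(void)  { fTransmitterOn = 1; ++cTurnOnCalls; }
void vTurnOffTransmitter(void) { fTransmitterOn = 0; }

/* Hypothetical stub for a byte-level hardware routine: it only records the call. */
void vRadioWriteByte(unsigned char by)
{
    (void)by;
    if (fTransmitterOn)
        ++cBytesWhileOn;
}

/* --- Hardware-independent code under test (hypothetical excerpt of radio.c) --- */
void vSendMessage(const unsigned char *pby, size_t cb)
{
    assert(pby != NULL);
    vTurnOnTransmitter();
    for (size_t i = 0; i < cb; i++)
        vRadioWriteByte(pby[i]);
    vTurnOffTransmitter();
}
```

Because the stubs record what happened, a test can check afterwards that the transmitter was switched on once, that every byte was written while it was on, and that it was switched off again.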
Testing on Host Machine – Basic Techniques 3
• Calling Interrupt Routines
– Embedded systems are interrupt-driven, so to test based on interrupts:
– 1) Divide interrupt routines into two components
• A) a component that deals with the hardware
• B) a component of the routine which deals with the rest of the system
– 2) To test, structure the routine such that the hardware-dependent component (A) calls
the hardware-independent part (B).
– 3) Write component B in C-language, so that the test scaffold can call it
• Hw component (A) is vHandleRxHardware(), which reads characters from the hardware
• Sw component (B) is vHandleByte(), called by A to buffer characters, among other things
• The test scaffold, vTestMain(), then calls vHandleByte() to test if the system works
[where vTestMain() pretends to be the hardware sending the chars to vHandleByte()]
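The split into components A and B can be sketched as below. The function names follow the notes. The buffer, its size, and the string parameter of `vTestMain()` are assumptions for this example. Component A is shown only as a comment because it needs the real receiver hardware.

```c
#include <assert.h>
#include <stddef.h>

#define RX_BUF_SIZE 16                       /* assumed buffer size */

static unsigned char abyRxBuf[RX_BUF_SIZE];  /* characters buffered so far */
static size_t cRxBytes;

/* Component B: hardware-independent, written in C so the scaffold can call it. */
void vHandleByte(unsigned char by)
{
    if (cRxBytes < RX_BUF_SIZE)              /* drop bytes once the buffer is full */
        abyRxBuf[cRxBytes++] = by;
}

/* Component A exists only on the target. In outline it would do:
 *     read one character from the receiver hardware;
 *     call vHandleByte() with that character;
 * It is omitted here because it cannot run on the host. */

/* Scaffold: pretends to be the hardware by feeding characters to component B. */
void vTestMain(const char *szInput)
{
    assert(szInput != NULL);
    for (size_t i = 0; szInput[i] != '\0'; i++)
        vHandleByte((unsigned char)szInput[i]);
}
```

After `vTestMain("ABC")`, the buffer holds the three characters in order, which is exactly what component A would have produced on the target.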
Testing on Host Machine – Basic Techniques 4
• Calling the Timer Interrupt Routine
– Design the test scaffold routine to directly call the timer interrupt routine, rather than
letting another part of the host environment call it, to avoid interruptions in the scaffold’s
timing of events
– This way, the scaffold has control over sequences of events in the test which must occur
within intervals of timer interrupts
• Script Files and Output Files
– To let the scaffold test the system in some sequence or repeatedly, write a script
file (of commands and parameters) to control the test
– Parse the script file, test the system based on the commands/parameters, and direct output –
an intermixture of the input-script and output lines – into an output file
– The commands in the script cause the scaffold to call routines in the B (hardware-independent)
component – see Fig 10.5 and Fig 10.6 – for the cordless bar-code scanner
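A script-driven scaffold can be sketched as a function that handles one script line at a time. The two commands used here (`tick` to call the timer interrupt routine, `byte <hex>` to deliver one received byte) are invented for this example and are not the commands of the bar-code scanner script. The routine names `vTimerIsr()` and `vHandleByte()` are simple stand-ins for the code under test.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

static unsigned cTicks;          /* timer interrupts delivered so far */
static unsigned char byLast;     /* last byte handed to the code under test */

/* Stand-ins for routines of the hardware-independent code. */
static void vTimerIsr(void)                { ++cTicks; }
static void vHandleByte(unsigned char by)  { byLast = by; }

/* Executes one script line, echoing the line and its result to the output file.
 * Returns 0 on success and -1 for a line it does not understand. */
int iRunScriptLine(const char *szLine, FILE *pfOut)
{
    unsigned uValue;

    assert(szLine != NULL && pfOut != NULL);
    fprintf(pfOut, "> %s\n", szLine);             /* intermix input with output */

    if (strcmp(szLine, "tick") == 0) {
        vTimerIsr();                              /* scaffold calls the timer routine directly */
        fprintf(pfOut, "ticks=%u\n", cTicks);
        return 0;
    }
    if (sscanf(szLine, "byte %x", &uValue) == 1 && uValue <= 0xFF) {
        vHandleByte((unsigned char)uValue);
        fprintf(pfOut, "byte=0x%02X\n", uValue);
        return 0;
    }
    fprintf(pfOut, "unknown command\n");
    return -1;
}
```

A driver would read the script file line by line, strip the newline, and pass each line to `iRunScriptLine` together with the output file, so the same test can be repeated exactly.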
Testing on Host Machine – Basic Techniques 5
• More Advanced Techniques
– Making the scaffold automatically control the sequence of events – e.g., calling the printer
interrupt many times but in a controlled order to avoid swamping
– Making the scaffold automatically queue up requests-to-send output lines, by
automatically controlling the button interrupt routine, which will cause successive
pressing of a button to let the next output line be received from the hardware (the
printer interrupt routine). In this way, the hardware-independent software is controlled
by the scaffold, where the button interrupts serve as a switch
– The scaffold may contain multiple instances of the hardware-independent code, and the scaffold
serves as a controller of the communication between the instances – where each instance is called
by the scaffold when the hardware interrupt occurs (e.g., the scanner or the cash register). In this
way, the scaffold simulates the hardware (scanner or register) and provides communication services
to the hardware-independent code instances it calls.
Fig
Testing on Host Machine – Basic Techniques 6
• Objections, Limitations, and Shortcomings
– 1) Hard to test parts which are truly hardware dependent, until the target system is operational.
Yet, it is good to test most hardware-independent parts on the host (see Fig 10.8)
– 2) Time and effort in writing the scaffold – even if huge, it is worthwhile
– 3) Having the scaffold run on the host and its RTOS – the scaffold can run as a low-priority task
within the RTOS and give a nicely integrated testing environment
– 4) Problems that can’t be detected in the scaffold and show up only in the actual test on the target:
• Writing to the wrong hardware address – software/hardware interactions
• Realistic interrupt latency due to differences in processor speeds (host v. target)
• Real interrupts that cause shared-data problems, where real enable/disable is the key
• Differences in network addressing, size of data types, data packing schemes – portability
issues
Q.3 Explain Instruction Set Simulators and their useful abilities
ANS:-
Instruction Set Simulators
• Using software to simulate:
– The target microprocessor instruction set
– The target memory (types - RAM)
– The target microprocessor architecture (interconnections and components)
• Simulator – must understand the linker/locator Map format, parse and interpret it
• Simulator – takes the Map as input, reads the instructions from simulated ROM,
reads/writes from/to simulated registers
• Provide a user interface to the simulator for I/O and debugging (using, e.g., a macro language)
Instruction Set Simulators – 1
• Capabilities of Simulators:
– Collect statistics on # instructions executed and bus cycles, for estimating actual times
– Easier to test assembly code (for startup software and interrupt routines) in the simulator
– Easier to test for portability since the simulator takes the same Map as the target
– Other parts, e.g., timers and built-in peripherals, can be tested in the corresponding
simulated versions in the simulated microprocessor architecture
• What simulators can’t help with:
– Simulating and testing ASICs, sensors, actuators, specialized radios (perhaps in future
systems)
– Lacking I/O interfaces in the simulator to support the testing techniques discussed (unless
additional provision is made for I/O to support the scaffold, and scripts to format and
reformat files between the simulator, simulated memory, and the scaffold)
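The core of an instruction set simulator is a fetch-decode-execute loop over simulated ROM and registers. The toy sketch below shows that loop and the instruction-count statistic mentioned above. The three-opcode instruction set, the single accumulator register, and all names are invented for illustration and do not correspond to any real microprocessor or Map format.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical instruction set: LOADI and ADD take one operand byte. */
enum { OP_HALT = 0, OP_LOADI = 1, OP_ADD = 2 };

typedef struct {
    const unsigned char *rom;   /* simulated ROM holding the program */
    size_t rom_size;
    unsigned char acc;          /* simulated accumulator register */
    size_t pc;                  /* simulated program counter */
    unsigned long executed;     /* statistic: opcodes fetched */
} sim_t;

/* Runs until HALT, an unknown opcode, or the end of ROM.
 * Returns the number of opcodes fetched, including the one that stopped the run. */
unsigned long sim_run(sim_t *s)
{
    assert(s != NULL && s->rom != NULL);
    while (s->pc < s->rom_size) {
        unsigned char op = s->rom[s->pc++];        /* fetch */
        s->executed++;
        if (op == OP_HALT)
            break;
        if (s->pc >= s->rom_size)
            break;                                 /* operand missing */
        unsigned char operand = s->rom[s->pc++];
        if (op == OP_LOADI)                        /* decode and execute */
            s->acc = operand;
        else if (op == OP_ADD)
            s->acc = (unsigned char)(s->acc + operand);
        else
            break;                                 /* unknown opcode stops the run */
    }
    return s->executed;
}
```

For the program `LOADI 5; ADD 7; HALT`, `sim_run` reports 3 instructions and leaves 12 in the accumulator. A real simulator would multiply such counts by cycles per instruction to estimate execution time.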
Q.4 Explain The assert Macro
ANS:-
The assert Macro
• The assert macro is used (with a boolean-expression parameter) to check assumptions
• If the expression is TRUE nothing happens; if FALSE, a message is printed and the program
stops
• assert works well in finding bugs early, when testing in the host environment
• On failure, assert causes a return to the host operating system (this can’t be done on the
target, and such a message can’t be printed on the target – it may not have a display unit)
• Assert macros that run on the target are useful for spotting problems. On failure they can:
– 1) disable interrupts and spin in an infinite loop – effectively stopping the system
– 2) turn on some pattern of LEDs or a blinking device
– 3) write a special code to memory for a logic analyzer to read
– 4) write the location of the instruction that caused the problem to a specific memory location
for a logic analyzer to read (the Map can help isolate which source code is the culprit!)
– 5) execute an illegal op or other instruction to stop the system – e.g., when using in-circuit emulators
Example:-
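A target-style assert can be sketched as a macro that calls a failure handler with the source location. In this host-runnable sketch the handler only records the location in variables, which stand in for the fixed memory location a logic analyzer would watch. On a real target it would also disable interrupts, light LEDs, and spin forever. The names `TARGET_ASSERT` and `vAssertFailed` and the recording variables are hypothetical.

```c
#include <assert.h>   /* the standard host-side assert(), kept for comparison */
#include <stddef.h>

/* Stand-ins for a fixed memory location read by a logic analyzer. */
static const char *volatile pszAssertFile;
static volatile unsigned uAssertLine;
static volatile int fAssertFailed;

/* Failure handler. A target version would additionally disable interrupts,
 * turn on an LED pattern, and loop forever; this sketch only records the
 * location so that it can run on a host. */
void vAssertFailed(const char *pszFile, unsigned uLine)
{
    pszAssertFile = pszFile;
    uAssertLine = uLine;
    fAssertFailed = 1;
}

/* Does nothing when the expression is true; reports file and line otherwise. */
#define TARGET_ASSERT(expr) \
    ((expr) ? (void)0 : vAssertFailed(__FILE__, (unsigned)__LINE__))
```

A call such as `TARGET_ASSERT(cBytes <= BUFFER_SIZE);` (with hypothetical names) costs one comparison when the assumption holds and leaves a record of where it failed when it does not.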
Q.5 Explain various Laboratory Tools
ANS:-
1. Volt Meters and Ohm Meters
• Useful if you have any doubts about the correctness or the reliability of the hardware on which
you are testing your software
• A volt meter measures the voltage difference between two points
• An ohm meter measures the resistance between two points
2. Oscilloscopes
• Oscilloscopes (scopes) test events that repeat periodically, monitoring one or two
signals (graph of time v. voltage)
• A triggering mechanism indicates the start of monitoring
• Adjust the vertical to know the ground signal
• Can be used as a voltmeter (flat graph at some vertical relative to the ground signal)
• Test if a device/part is working – is the graph flat?
• Is the digital signal coming through? Expect a quick rising/falling edge (from 0 to VCC or
VCC to 0); if not, the scope will show a slow rising/falling edge, indicating loading, bus
fight, or other hardware problem
3. Logic Analyzer
• Like storage scopes that (first) capture many signals and display them simultaneously
• It knows only of VCC and ground voltage levels (displays are like timing diagrams) – real
scopes display exact voltage (like analog)
• Can be used to trigger on a symptom and track back in the stored signal to isolate the problem
• Many signals can be triggered at their low and/or high points and for how long they stay in that
state
• Used in Timing or State Mode
• Logic Analyzers in Timing Mode
– Find out if an event occurred – did the cordless scanner turn on the radio?
– Measure how long it took software to respond to an interrupt (e.g., between a button
interrupt signal and the activation signal of a responding device – to turn off a bell)
– Is the software putting out the right pattern of signals to control a hardware device –
looking back in the captured signal for elapsed time
4. In-Circuit Emulators (ICE)
• Replaces the target microprocessor in the target circuitry (with some engineering)
• Has all the capabilities of a software debugger
• Maintains a trace, similar to that of an LA
• Has overlay memory to emulate ROM and RAM for a specified range of addresses within
the ICE (rather than the system’s main ROM or RAM) – facilitates debugging
• ICE v. LA
– LAs have better trace and filtering mechanisms, and make it easier to detail and find
problems
– LAs run in timing mode
– LAs work with any microprocessor – an ICE is microprocessor-specific
– LAs let you select which of many signals to attach; an ICE requires connecting ALL
signals
– ICE is more invasive
5. Software-Only Monitors
• Monitors allow running an embedded system in the target environment, while providing
debugging interfaces on both the host and target environments
• A small portion of the monitor resides in the target ROM (debugging kernel or
monitor):
– This code receives programs over a serial port or network, copies them into the target’s
RAM, and runs them with full debugging capabilities to test/debug the programs
• Another portion of the monitor resides on the host – it provides debugging capability and
communicates with the debugging kernel over a serial port or network, without hardware
modifications
• Compiled, linked (and possibly located into a Map) code is downloaded from the host (by the
portion on the host) to the target RAM or flash (received by the kernel)
• Other designs: ROM Emulator interface and JTAG comm. port on the target processor
Note to all students: this material covers 9 of the 10 units. You have to read the 10th unit
from the book; it is the last part of the hard-copy material.
================= End================