ISR - patel group of institutions

PATEL GROUP OF INSTITUTIONS
Embedded System Material (Unit Wise)
Subject: Software Development for Embedded System (SD-ES)
Subject Code: 650012

Unit 1: Introduction

Q.1 What is an Embedded System? List the applications where embedded systems are used.
ANS: An embedded system is a single-purpose computer built into a larger system to control and monitor that system, or to perform a specific task.
• Embedded systems are computing systems embedded within electronic devices.
• They are hard to define precisely: nearly any computing system other than a desktop computer qualifies.
• Billions of units are produced yearly, versus millions of desktop units.
• There are perhaps 50 per household and per automobile.
Embedded systems are integrated systems. Each system is designed for a specific functionality, and contains integrated hardware with software loaded into its memory. Simple examples are cell phones, smart cards, DVD players, digital cameras, robotics on assembly lines, control systems used in automobiles, guided missiles and satellites; the list is unending.
Figure: A "short list" of applications where embedded systems are used.

Q.2 Explain the characteristics of an Embedded System.
ANS: Three main characteristics distinguish embedded systems from other computing systems: (1) single-functioned, (2) tightly constrained, and (3) reactive and real-time.
(1) Single-functioned: Most embedded systems execute a specific program repeatedly. [Exceptions exist: some systems update their programs, and some swap several programs in and out due to size limitations.]
(2) Tightly constrained: Most embedded systems have constraints on design metrics such as cost, size, performance and power. An embedded system must cost little, must be small enough to fit on a single chip, must perform fast enough to process data in real time, and must consume minimal power.
(3) Reactive and real-time: Many embedded systems must continuously react to changes in the system's environment.
They must compute certain results in real time without delay. [Example: a car's cruise controller, which is an embedded system, continuously monitors and reacts to speed and brake sensors. It must compute acceleration and deceleration values repeatedly within a limited time; a delayed computation could result in a failure to maintain control of the car.]
An embedded system example, a digital camera:
• Single-functioned: always a digital camera.
• Tightly constrained: low cost, low power, small, fast.
• Reactive and real-time: only to a small extent.

Q.3 Short note on the design challenge of optimizing design metrics.
ANS:
• Obvious design goal: construct an implementation with the desired functionality.
• Key design challenge: simultaneously optimize numerous design metrics.
A design metric is a measurable feature of a system's implementation. Optimizing design metrics is a key challenge. Common design metrics:
• Unit cost: the monetary cost of manufacturing each copy of the system, excluding NRE cost.
• NRE cost (Non-Recurring Engineering cost): the one-time monetary cost of designing the system.
• Size: the physical space required by the system.
• Performance: the execution time or throughput of the system.
• Power: the amount of power consumed by the system.
• Flexibility: the ability to change the functionality of the system without incurring heavy NRE cost.
• Time-to-prototype: the time needed to build a working version of the system.
• Time-to-market: the time required to develop a system to the point that it can be released and sold to customers.
• Maintainability: the ability to modify the system after its initial release.
• Correctness, safety, and many more.
Time-to-market is a demanding design metric.
Figure: Losses due to delayed market entry.
Figure: NRE and unit cost metrics for technologies A, B and C. Left panel: total cost (x1000) vs. number of units (volume). Right panel: per-product cost vs. number of units (volume).
Total cost = NRE cost + (unit cost × volume), and per-product cost = total cost / volume, so a large NRE cost is amortized over more units as volume grows.
The performance design metric:
• Performance is a widely used measure of a system, and widely abused. Clock frequency and instructions per second are not good measures. Digital camera example: a user cares about how fast it processes images, not its clock speed or instructions per second.
• Latency (response time): the time between task start and end. For example, cameras A and B each process an image in 0.25 seconds.
• Throughput: tasks per second. For example, camera A processes 4 images per second. Throughput can be higher than latency seems to imply, due to concurrency: camera B may process 8 images per second by capturing a new image while the previous image is being stored.
• Speedup of B over A = B's performance / A's performance. Throughput speedup = 8 / 4 = 2.

Q.4 Discuss and explain Processor Technology in Embedded Systems. OR Explain the types of Processor Technology.
ANS: Processor technology is the architecture of the computation engine used to implement a system's desired functionality. A processor may implement part of the system or the complete system. A processor does not have to be programmable: "processor" is not the same as general-purpose programmable processor; it is simply something that processes data (input, processing, output).
General-purpose processors:
• Programmable device used in a variety of applications; also known as a "microprocessor".
• Features: program memory; a general datapath with a large register file and a general ALU.
• User benefits: low time-to-market and NRE costs; high flexibility.
• "Pentium" is the most well known, but there are hundreds of others.
Single-purpose processors:
• Digital circuit designed to execute exactly one program.
• Features: contains only the components needed to execute that single program; no program memory.
• Benefits: fast, low power, small size.
Application-specific processors:
• Programmable processor optimized for a particular class of applications having common characteristics; a compromise between general-purpose and single-purpose processors.
• Features: program memory, optimized datapath, special functional units.
• Benefits:
some flexibility, good performance, size and power.

Q.5 Explain IC Technology and its types.
ANS: IC technology is the manner in which a digital (gate-level) implementation is mapped onto an IC.
• IC: integrated circuit, or "chip".
• IC technologies differ in their customization to a design.
• ICs consist of numerous layers (perhaps 10 or more).
• IC technologies differ with respect to who builds each layer and when.
Three types of IC technology:
• Full-custom/VLSI
• Semi-custom ASIC (gate array and standard cell)
• PLD (Programmable Logic Device)
Full-custom/VLSI:
• All layers are optimized for an embedded system's particular digital implementation: placing transistors, sizing transistors, and routing wires.
• Benefits: excellent performance, small size, low power.
• Drawbacks: high NRE cost (e.g., $300k), long time-to-market.
Semi-custom ASIC (gate array and standard cell):
• Lower layers are fully or partially built in advance; designers are left with routing wires and perhaps placing some blocks.
• Benefits: good performance, good size, less NRE cost than a full-custom implementation (perhaps $10k to $100k).
• Drawbacks: still requires weeks to months to develop.
PLD (Programmable Logic Device):
• All layers already exist, so designers can purchase an IC. Connections on the IC are either created or destroyed to implement the desired functionality. The Field-Programmable Gate Array (FPGA) is very popular.
• Benefits: low NRE cost, almost instant IC availability.
• Drawbacks: bigger, expensive (perhaps $30 per unit), power hungry, slower.

Q.6 Short note on Design Technology.
ANS: Design technology is the manner in which we convert our concept of desired system functionality into an
implementation.
• Compilation/Synthesis: automates exploration and insertion of implementation details for a lower level.
• Libraries/IP: incorporates pre-designed implementations from a lower abstraction level into a higher level.
• Test/Verification: ensures correct functionality at each level, thus reducing costly iterations between levels.
Figure: The co-design ladder.

Q.7 Discuss various trade-offs for Embedded Systems.
ANS: The basic trade-off is general vs. custom, with respect to either processor technology or IC technology. The two technologies are independent.
Design productivity gap:
• While designer productivity has grown at an impressive rate over the past decades, the rate of improvement has not kept pace with chip capacity.
• 1981: a leading-edge chip required 100 designer-months (10,000 transistors / 100 transistors per month).
• 2002: a leading-edge chip requires 30,000 designer-months (150,000,000 transistors / 5,000 transistors per month).
• Designer cost increased from $1M to $300M.

Unit 2: Custom Single-Purpose Processors: Hardware

Q.1 What is a Custom Single-Purpose Processor?
ANS: It is a digital circuit, consisting of a controller and a datapath, that performs a computation task.
• Single-purpose: performs one particular computation task.
• Custom single-purpose: performs a non-standard task.
A custom single-purpose processor may be fast, small and low power, but it has high NRE, a longer time-to-market, and less flexibility.

Q.2 Explain Combinational Logic in detail.
ANS: Combinational logic has two parts: (1) transistors and (2) gates.
(1) Transistors: A transistor is the basic electrical component of digital systems. Combinations of transistors form more abstract components called logic gates, which designers primarily use when building digital systems. A transistor acts as a simple on/off switch.
One type of transistor is CMOS (Complementary Metal Oxide Semiconductor).
Figure: Circuit of transistors. The voltage at the "gate" controls whether current flows from source to drain.
(2) Gates: When a high voltage (typically +5 volts, which we refer to as logic 1) is applied to the gate, the transistor conducts, so current flows. When a low voltage (logic 0, typically ground, drawn as several horizontal lines of decreasing width) is applied to the gate, the transistor does not conduct. We can also build a transistor with the opposite functionality: when logic 0 is applied to the gate, the transistor conducts, and when logic 1 is applied, it does not conduct. Given these two basic transistors, we can easily build a circuit whose output inverts its gate input. The basic logic gates are implemented from transistors in the same way.
Figure: Basic logic gates.

Q.3 Short note on Basic Combinational Logic Design.
ANS: A combinational circuit is a digital circuit whose output is purely a function of its current inputs; such a circuit has no memory of past inputs. We can apply a simple technique to design a combinational circuit using our basic logic gates: express the problem description as a truth table, derive an output equation for each output, minimize those equations, and then draw the circuit.

Q.4 Give the details of RT-Level Combinational Components.
ANS: A multiplexor, sometimes called a selector, allows only one of its data inputs Im to pass through to the output O. Thus, a multiplexor acts much like a railroad switch, allowing only one of multiple input tracks to connect to a single output track. If there are m data inputs, then there are log2(m) select lines S, and we call this an m-by-1 multiplexor (m data inputs, one data output). The binary value of S determines which data input passes through: 00...00 means I0 passes, 00...01 means I1 passes, 00...10 means I2 passes, and so on. For example, an 8x1 multiplexor has 8 data inputs and thus 3 select lines.
If those three select lines have the value 110, then I6 passes through to the output: if I6 is 1, the output is 1; if I6 is 0, the output is 0. We commonly use a more complex device called an n-bit multiplexor, in which each data input, as well as the output, consists of n lines. Suppose the previous example used a 4-bit 8x1 multiplexor; then, if I6 is 0110, the output would be 0110. Note that n does not affect the number of select lines.
A decoder converts its binary input I into a one-hot output O. "One-hot" means that exactly one of the output lines is 1 at a given time. Thus, if there are n outputs, then there must be log2(n) inputs; we call this a log2(n)xn decoder. For example, a 3x8 decoder has 3 inputs and 8 outputs. If the input is 000, then output O0 is 1; if the input is 001, then output O1 is 1, and so on. A common feature on a decoder is an extra input called enable. When enable is 0, all outputs are 0. When enable is 1, the decoder functions as before.
An adder adds two n-bit binary inputs A and B, generating an n-bit output sum along with an output carry. For example, a 4-bit adder has a 4-bit A input, a 4-bit B input, a 4-bit sum output, and a 1-bit carry output. If A is 1010 and B is 1001, then sum is 0011 and carry is 1.
A comparator compares two n-bit binary inputs A and B, generating outputs that indicate whether A is less than, equal to, or greater than B. If A is 1010 and B is 1001, then less is 0, equal is 0, and greater is 1.
An ALU (arithmetic-logic unit) can perform a variety of arithmetic and logic functions on its n-bit inputs A and B. The select lines S choose the current function; if there are m possible functions, then there must be at least log2(m) select lines. Common functions include addition, subtraction, AND, and OR.
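To make the bit-level behavior of these components concrete, here is a minimal Python sketch that models each one on plain integers interpreted as n-bit values. The function names (mux, decoder, adder, comparator) are hypothetical, chosen for illustration; this is a behavioral model, not a gate-level design.

```python
"""Behavioral models of RT-level combinational components (a sketch).

Values are plain ints interpreted as n-bit binary numbers.
Function names are hypothetical, not from any library.
"""


def mux(inputs: list[int], select: int) -> int:
    """m-by-1 multiplexor: the binary value of select picks the input to pass."""
    return inputs[select]


def decoder(value: int, n_inputs: int, enable: int = 1) -> list[int]:
    """log2(n) x n decoder: returns 2**n_inputs outputs, one-hot when enabled."""
    outputs = [0] * (2 ** n_inputs)
    if enable:
        outputs[value] = 1   # only output O<value> is 1
    return outputs           # all zeros when enable is 0


def adder(a: int, b: int, n: int) -> tuple[int, int]:
    """n-bit adder: returns (n-bit sum, carry out)."""
    total = a + b
    mask = (1 << n) - 1
    return total & mask, total >> n


def comparator(a: int, b: int) -> tuple[int, int, int]:
    """Returns the (less, equal, greater) output bits."""
    return int(a < b), int(a == b), int(a > b)


# Usage, mirroring the worked examples in the text:
print(adder(0b1010, 0b1001, 4))      # (3, 1): sum 0011, carry 1
print(comparator(0b1010, 0b1001))    # (0, 0, 1): A > B
```

An n-bit multiplexor needs no separate model here, because each input is already an n-bit integer: selecting I6 = 0110 passes 0110 unchanged.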
Q.5 What is Sequential Logic?
ANS: A sequential circuit is a digital circuit whose outputs are a function of the current as well as previous input values. In other words, sequential logic possesses memory. One of the most basic sequential circuits is the flip-flop, which stores a single bit.
• D flip-flop: the simplest type. It has two inputs, D and clock. When clock is 1, the value of D is stored in the flip-flop, and that value appears at output Q. When clock is 0, the value of D is ignored and output Q maintains its value.
• SR flip-flop: has three inputs, S, R and clock. When clock is 0, the previously stored bit is maintained and appears at output Q. When clock is 1, inputs S and R are examined: if S is 1, a 1 is stored; if R is 1, a 0 is stored; if both are 0, there is no change; if both are 1, the behavior is undefined. Thus, S stands for set and R for reset.
• JK flip-flop: the same as an SR flip-flop, except that when both J and K are 1, the stored bit toggles (from 1 to 0 or from 0 to 1).
To prevent unexpected behavior from signal glitches, flip-flops are typically designed to be edge-triggered, meaning they only pay attention to their non-clock inputs when the clock is rising from 0 to 1, or alternatively when the clock is falling from 1 to 0.

Q.6 Explain RT-Level Sequential Components.
ANS: RT-level sequential components include the following.
Register: A register stores n bits from its n-bit data input I, with those stored bits appearing at its output O. A register usually has at least two control inputs, clock and load. For a rising-edge-triggered register, the inputs I are stored only when load is 1 and clock is rising from 0 to 1. The clock input is usually drawn as a small triangle in the register symbol. Another common register control input is clear, which resets all bits to 0, regardless of the value of I.
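The edge-triggered behavior described above can be sketched in Python as a behavioral model under stated assumptions: every element reacts only to a rising clock edge, signals are plain 0/1 integers, and the register's clear is treated as synchronous (an assumption; real registers may clear asynchronously). The class names are hypothetical. An SR flip-flop would behave like the JK model except that S = R = 1 is undefined rather than a toggle.

```python
"""Behavioral models of rising-edge-triggered storage elements (a sketch)."""


class EdgeTriggered:
    """Base class that detects a rising clock edge (0 -> 1)."""

    def __init__(self) -> None:
        self._last_clock = 0

    def _rising(self, clock: int) -> bool:
        rising = self._last_clock == 0 and clock == 1
        self._last_clock = clock
        return rising


class DFlipFlop(EdgeTriggered):
    """D flip-flop: stores D on each rising clock edge, otherwise holds Q."""

    def __init__(self) -> None:
        super().__init__()
        self.q = 0

    def tick(self, d: int, clock: int) -> int:
        if self._rising(clock):
            self.q = d
        return self.q


class JKFlipFlop(EdgeTriggered):
    """JK flip-flop: J sets, K resets, J=K=1 toggles, J=K=0 holds."""

    def __init__(self) -> None:
        super().__init__()
        self.q = 0

    def tick(self, j: int, k: int, clock: int) -> int:
        if self._rising(clock):
            if j and k:
                self.q ^= 1          # toggle
            elif j:
                self.q = 1           # set
            elif k:
                self.q = 0           # reset
        return self.q


class Register(EdgeTriggered):
    """n-bit parallel-load register with load and (synchronous) clear inputs."""

    def __init__(self, n: int) -> None:
        super().__init__()
        self.mask = (1 << n) - 1
        self.o = 0

    def tick(self, i: int, load: int, clear: int, clock: int) -> int:
        if self._rising(clock):
            if clear:
                self.o = 0               # clear wins regardless of I
            elif load:
                self.o = i & self.mask   # all n bits stored in parallel
        return self.o
```

Because each tick only acts when the clock goes from 0 to 1, holding the clock at 1 or changing inputs between edges leaves the stored value unchanged, which is exactly the glitch protection that edge-triggering provides.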
Because all n bits of the register can be stored in parallel, we often refer to this type of register as a parallel-load register, to distinguish it from a shift register, which we now describe.
Shift register: A shift register stores n bits, but these bits cannot be stored in parallel. Instead, they must be shifted into the register serially, meaning one bit per clock edge. A shift register has a one-bit data input I, and at least two control inputs, clock and shift. When clock is rising and shift is 1, the value of I is stored in the n'th bit, while the n'th bit is stored in the (n-1)'th bit, and so on, until the second bit is stored in the first bit. The first bit is typically shifted out, meaning it appears at an output Q.
Counter: A counter is a register that can also increment (add binary 1 to) its stored binary value. In its simplest form, a counter has a clear input, which resets all stored bits to 0, and a count input, which enables incrementing on the clock edge. A counter often also has a parallel-load data input and an associated control signal. A common counter feature is both up and down counting (incrementing and decrementing), requiring an additional control input to indicate the count direction.

Q.7 Explain Sequential Logic Design.
ANS: Sequential logic design can be achieved using a straightforward technique, whose steps are illustrated in Figure 4.1. We again start with a problem description, and we translate this description to a state diagram. State diagrams are described further in a later chapter. Briefly, each state represents the current "mode" of the circuit, serving as the circuit's memory of past input values. The desired output values are listed next to each state, and the input conditions that cause a transition from one state to another are shown next to each arc. Each arc condition is implicitly AND'ed with a rising (or falling) clock edge; in other words, all inputs are synchronous.
State diagrams can also describe asynchronous systems, but we do not cover such systems here, since they are not common.
We implement this state diagram using a register to store the current state, and combinational logic to generate the output values and the next state. We assign each state a unique binary value, and then create a truth table for the combinational logic. The inputs of the combinational logic are the state bits coming from the state register and the external inputs, so we list all combinations of these inputs on the left side of the table. The outputs of the combinational logic are the state bits to be loaded into the register on the next clock edge (the next state) and the external output values, so we list the desired values of these outputs for each input combination on the right side of the table. Because we used a state diagram whose outputs are a function of the current state only, and not of the inputs, we list an external output value only for each possible state, ignoring the external input values. Now that we have a truth table, we proceed with combinational logic design as described earlier, by generating minimized output equations and then drawing the combinational logic circuit.

Q.8 Explain Custom Single-Purpose Processor Design.
ANS: We can apply the above combinational and sequential logic design techniques to build datapath components and controllers. Therefore, we have nearly all the knowledge we need to build a custom single-purpose processor for a given program, since a processor consists of a controller and a datapath. We now describe a technique for building such a processor. We begin with a sequential program we must implement. Figure 4.3 provides an example based on computing a greatest common divisor (GCD).
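Figure 4.3 itself is not reproduced here. As a stand-in, the following Python sketch shows the kind of sequential program the technique starts from, assuming the subtraction-based GCD algorithm; the names x_i, y_i and d_o for the inputs and output are hypothetical. It uses only assignment, loop and branch statements, which is what the template-based conversion described next relies on.

```python
def gcd_program(x_i: int, y_i: int) -> int:
    """Subtraction-based GCD written as a plain sequential program.

    Assumes x_i and y_i are positive integers (with a zero input the
    loop would never terminate).
    """
    x = x_i              # assignment statement
    y = y_i              # assignment statement
    while x != y:        # loop statement
        if x < y:        # branch statement
            y = y - x
        else:
            x = x - y
    d_o = x              # value driven onto the output
    return d_o


print(gcd_program(12, 8))   # 4
```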
To begin building our single-purpose processor implementing the GCD program, we first convert the program into a complex state diagram, in which states and arcs may include arithmetic expressions, and these expressions may use external inputs and outputs or variables. In contrast, our earlier state diagrams only included Boolean expressions, and these expressions could only use external inputs and outputs, not variables.
Example: greatest common divisor
• First create the algorithm.
• Convert the algorithm to a "complex" state machine, known as an FSMD (finite-state machine with datapath). Templates can be used to perform such a conversion.
Algorithm and FSMD templates: We can use templates to convert a program to a state diagram, as illustrated in the figure. First, we classify each statement as an assignment statement, a loop statement, or a branch (if-then-else or case) statement.
• Assignment statement: create a state with that statement as its action, and add an arc from this state to the state for the next statement, whatever type it may be.
• Loop statement: create a condition state C and a join state J, both with no actions. Add an arc with the loop's condition from the condition state to the first statement in the loop body. Add a second arc with the complement of the loop's condition from the condition state to the next statement after the loop body. Connect the last statement of the loop body to the join state, and add an arc from the join state back to the condition state.
• Branch statement: create a condition state C and a join state J, both with no actions. Add an arc with the first branch's condition from the condition state to the first branch's first statement. Add another arc, with the complement of the first branch's condition AND'ed with the second branch's condition, from the condition state to the second branch's first statement. Repeat this for each branch. Finally, connect the arc leaving the last statement of each branch to the join state, and add an arc from the join state to the next statement's state.
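Applying the three templates to a subtraction-based GCD program gives an FSMD like the one simulated below, one state per clock cycle. This is a hand conversion under assumptions: it omits any start/go handshake, the input and output names are hypothetical, and the state names are hypothetical labels chosen to show where the condition states (C) and join states (J) appear.

```python
def gcd_fsmd(x_i: int, y_i: int) -> tuple[int, int]:
    """Simulate a template-derived GCD FSMD, one state per clock cycle.

    Returns (d_o, cycles). Assumes positive inputs. State names are
    hypothetical; *_C are condition states and *_J are join states.
    """
    x = y = d_o = 0
    state, cycles = "ASSIGN_X", 0
    while state != "DONE":
        cycles += 1
        if state == "ASSIGN_X":            # assignment template
            x, state = x_i, "ASSIGN_Y"
        elif state == "ASSIGN_Y":          # assignment template
            y, state = y_i, "LOOP_C"
        elif state == "LOOP_C":            # loop condition state
            state = "BRANCH_C" if x != y else "OUTPUT"
        elif state == "BRANCH_C":          # branch condition state
            state = "SUB_Y" if x < y else "SUB_X"
        elif state == "SUB_Y":             # first branch: y = y - x
            y, state = y - x, "BRANCH_J"
        elif state == "SUB_X":             # second branch: x = x - y
            x, state = x - y, "BRANCH_J"
        elif state == "BRANCH_J":          # branch join state, no action
            state = "LOOP_J"
        elif state == "LOOP_J":            # loop join state, back to condition
            state = "LOOP_C"
        elif state == "OUTPUT":            # assignment template: d_o = x
            d_o, state = x, "DONE"
    return d_o, cycles


print(gcd_fsmd(12, 8))   # (4, 14)
```

Each loop iteration costs five cycles here, two of which are spent in the empty join states; these are the kind of states the FSMD state-merging optimization in Q.10 can remove.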
Using this template approach, we convert our GCD program to the complex state diagram shown in the figure. We are now well on our way to designing a custom single-purpose processor that executes the GCD program.
Figure: State diagram templates.

Q.9 Explain RT-Level Custom Single-Purpose Processor Design.
ANS:
• We often start with a state machine rather than an algorithm, because cycle timing is often too central to the functionality.
• Example: a bus bridge that converts a 4-bit bus to an 8-bit bus. We start with an FSMD, known as the register-transfer (RT) level.
• Exercise: complete the design.

Q.10 Explain optimizing single-purpose processors.
ANS: Optimization is the task of making design metric values the best possible. Optimization opportunities exist in the original program, the FSMD, the datapath and the FSM.
Optimizing the FSMD: areas of possible improvement:
• Merge states: states with constants on transitions can be eliminated, since the transition taken is already known; states with independent operations can be merged.
• Separate states: states that require complex operations (such as a*b*c*d) can be broken into smaller states to reduce hardware size.
• Scheduling.
Optimizing the datapath:
• Sharing of functional units: a one-to-one mapping, as done previously, is not necessary; if the same operation occurs in different states, those states can share a single functional unit.
• Multi-functional units: an ALU supports a variety of operations, so it can be shared among operations occurring in different states.
Optimizing the FSM:
• State encoding: the task of assigning a unique bit pattern to each state in an FSM. The size of the state register and of the combinational logic varies with the encoding, so it can be treated as an ordering problem.
• State minimization: the task of merging equivalent states into a single state. Two states are equivalent if, for all possible input combinations, they generate the same outputs and transition to the same next state.

Unit 3: General-Purpose Processors: Software

Q.1 What is a General-Purpose Processor?
Explain the architecture of a general-purpose processor.
ANS: A general-purpose processor is a programmable digital system intended to solve computation tasks in a large variety of applications. Copies of the same processor may solve computation problems in applications as diverse as communication, automotive, and industrial embedded systems. An embedded system designer who chooses a general-purpose processor to implement part of a system's functionality may achieve several benefits:
• Low unit cost, in part because the manufacturer spreads NRE over large numbers of units (Motorola sold half a billion 68HC05 microcontrollers in 1996 alone).
• Careful design, since a higher NRE is acceptable; this can yield good performance, size and power.
• Low NRE cost, short time-to-market/prototype, and high flexibility: the user just writes software, with no processor design.
A general-purpose processor is also known as a "microprocessor"; "micro" was used when processors were implemented on one or a few chips rather than filling entire rooms.
Architecture of a General-Purpose Processor:
Datapath: The datapath consists of the circuitry for transforming data and for storing temporary data. The datapath contains an arithmetic-logic unit (ALU) capable of transforming data through operations such as addition, subtraction, logical AND, logical OR, inverting, and shifting. The ALU also generates status signals, often stored in a status register (not shown), indicating particular data conditions, such as whether data is zero, or whether an addition of two data items generates a carry. The datapath also contains registers capable of storing temporary data. Temporary data may include data brought in from memory but not yet sent through the ALU, data coming from the ALU that will be needed for later ALU operations or will be sent back to memory, and data that must be moved from one memory location to another.
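As a rough illustration of how an ALU produces both a result and status signals, here is a minimal Python sketch of an n-bit ALU that returns zero and carry flags. The operation names and the convention that carry = 1 means "no borrow" on subtraction are assumptions (processors differ on that convention), and operands are assumed to already fit in n bits.

```python
def alu(op: str, a: int, b: int, n: int = 8) -> tuple[int, dict[str, int]]:
    """n-bit ALU sketch returning (result, status flags).

    Operation names are hypothetical. For SUB, carry = 1 means no borrow
    occurred (one common convention; some processors invert it).
    """
    mask = (1 << n) - 1
    if op == "ADD":
        raw = a + b
    elif op == "SUB":
        raw = a + ((~b) & mask) + 1      # two's-complement subtraction
    elif op == "AND":
        raw = a & b
    elif op == "OR":
        raw = a | b
    elif op == "NOT":
        raw = (~a) & mask                # invert all n bits of A
    elif op == "SHL":
        raw = a << 1                     # shift left; top bit goes to carry
    else:
        raise ValueError(f"unsupported operation: {op}")
    result = raw & mask                  # keep only n bits
    status = {"zero": int(result == 0), "carry": int(raw > mask)}
    return result, status


print(alu("ADD", 200, 100))   # (44, {'zero': 0, 'carry': 1})
```

In a real datapath the status dictionary corresponds to the bits latched in the status register, which the controller's next-state logic later reads when evaluating a conditional branch.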
The internal data bus is the bus over which data travels within the datapath, while the external data bus is the bus over which data is brought to and from the data memory.
Controller: The controller consists of circuitry for retrieving program instructions, and for moving data to, from, and through the datapath according to those instructions. The controller contains a program counter (PC) that holds the address in memory of the next program instruction to fetch, and an instruction register (IR) to hold the fetched instruction. Based on this instruction, the controller's control logic generates the appropriate signals to control the flow of data in the datapath. Such flows may include inputting two particular registers into the ALU, storing ALU results into a particular register, or moving data between memory and a register. Finally, the next-state logic determines the next value of the PC. For a non-branch instruction, this logic increments the PC. For a branch instruction, this logic looks at the datapath status signals and the IR to determine the appropriate next address.
The PC's bit-width represents the processor's address size. The address size is independent of the data word size; the address size is often larger. The address size determines the number of directly accessible memory locations, referred to as the address space or memory space. If the address size is M bits, then the address space is 2^M locations. Thus, a processor with a 16-bit PC can directly address 2^16 = 65,536 memory locations. We would typically refer to this address space as 64K, although if 1K = 1000, this number would represent 64,000, not the actual 65,536. Thus, in computer-speak, 1K = 1024, and 64K = 64 × 1024 = 65,536.
Memory: While registers serve a processor's short-term storage requirements, memory serves the processor's medium- and long-term information-storage requirements. We can classify stored information as either program or data.
Program information consists of the sequence of instructions that cause the processor to carry out the desired system functionality. Data information represents the values being input, output and transformed by the program. We can store program and data together or separately. In a Princeton architecture, data and program words share the same memory space. In a Harvard architecture, the program memory space is distinct from the data memory space. Figure 2.2 illustrates these two methods. The Princeton architecture may result in a simpler hardware connection to memory, since only one connection is necessary. A Harvard architecture, while requiring two connections, can perform instruction and data fetches simultaneously, so it may result in improved performance. Most machines have a Princeton architecture. The Intel 8051 is a well-known Harvard architecture.
Memory may be read-only memory (ROM) or readable and writable memory (RAM). ROM is usually much more compact than RAM. An embedded system often uses ROM for program memory since, unlike in desktop systems, an embedded system's program does not change. Constant data may be stored in ROM, but other data of course requires RAM.
Memory may be on-chip or off-chip. On-chip memory resides on the same IC as the processor, while off-chip memory resides on a separate IC. The processor can usually access on-chip memory much faster than off-chip memory, perhaps in just one cycle, but finite IC capacity implies only a limited amount of on-chip memory.
To reduce the time needed to access (read or write) memory, a local copy of a portion of memory may be kept in a small but especially fast memory called cache, as illustrated in the figure. Cache memory often resides on-chip, and often uses fast but expensive static RAM technology rather than slower but cheaper dynamic RAM (see Chapter 5).
Cache memory is based on the principle that if at a particular time a processor accesses a particular memory location, then the processor will likely access that location and immediate neighbors of the location in the near future. Thus, when we first access a location in memory, we copy that location and some number of its neighbors (called a block) into cache, and then access the copy of the location in cache. When we access another location, we first check a cache table to see if a copy of the location resides in cache. If the copy does reside in cache, we have a cache hit, and we can read or write that location very quickly. If the copy does not reside in cache, we have a cache miss, so we must copy the location's block into cache, which takes a lot of time. Thus, for a cache to be effective in improving performance, the ratio of hits to misses must be very high, requiring intelligent caching schemes. Caches are used for both program memory (often called instruction cache, or I-cache) and data memory (often called D-cache).
Figure: Types of memory.
Figure: Cache memory.

Q.2 Give a short note on instruction execution with pipelining.
ANS: Instruction execution: We can think of a microprocessor's execution of instructions as consisting of several basic stages:
1. Fetch instruction: the task of reading the next instruction from memory into the instruction register.
2. Decode instruction: the task of determining what operation the instruction in the instruction register represents (e.g., add, move, etc.).
3. Fetch operands: the task of moving the instruction's operand data into appropriate registers.
4. Execute operation: the task of feeding the appropriate registers through the ALU and back into an appropriate register.
5. Store results: the task of writing a register into memory.
Pipelining: Pipelining is a common way to increase the instruction throughput of a microprocessor.
We first make a simple analogy of two people approaching the chore of washing and drying 8 dishes. In one approach, the first person washes all 8 dishes, and then the second person dries all 8 dishes. Assuming 1 minute per dish per person, this approach requires 16 minutes. The approach is clearly inefficient, since at any time only one person is working while the other is idle. A better approach is for the second person to begin drying the first dish immediately after it has been washed. This approach requires only 9 minutes: 1 minute for the first dish to be washed, and then 8 more minutes until the last dish is finally dry. We refer to this latter approach as pipelined. In general, if each of N items passes through k stages of one time unit each, the unpipelined approach takes k × N time units while the pipelined approach takes k + (N - 1); for the dishes, 2 × 8 = 16 versus 2 + 7 = 9.
Each dish is like an instruction, and the two tasks of washing and drying are like the five stages listed above. By using a separate unit (each akin to a person) for each stage, we can pipeline instruction execution. After the instruction fetch unit fetches the first instruction, the decode unit decodes it while the instruction fetch unit simultaneously fetches the next instruction.

Q.3 Explain the programmer's view of instructions.
ANS: A programmer writes the program instructions that carry out the desired functionality on the general-purpose processor. The programmer may not actually need to know detailed information about the processor's architecture or operation, but instead may deal with an architectural abstraction, which hides much of that detail. The level of abstraction depends on the level of programming. We can distinguish between two levels of programming. The first is assembly-language programming, in which one programs in a language representing processor-specific instructions as mnemonics. The second is structured-language programming, in which one programs in a language using processor-independent instructions; a compiler automatically translates those instructions to processor-specific instructions.
Ideally, the structured-language programmer would need no information about the processor architecture, but in embedded systems, the programmer must usually have at least some awareness, as we shall discuss. Actually, we can define an even lower level of programming, machine-language programming, in which the programmer writes machine instructions in binary. This level of programming has become extremely rare due to the advent of assemblers. Machine-language programmed computers often had rows of lights representing to the programmer the current binary instructions being executed. Today's computers look more like boxes or refrigerators, but these do not make for interesting movie props, so you may notice that in the movies, computers with rows of blinking lights live on.

Instruction Set
The assembly-language programmer must know the processor's instruction set. The instruction set describes the bit configurations allowed in the IR (instruction register), indicating the atomic processor operations that the programmer may invoke. Each such configuration forms an assembly instruction, and a sequence of such instructions forms an assembly program. An instruction typically has two parts, an opcode field and operand fields. An opcode specifies the operation to take place during the instruction. We can classify instructions into three categories. Data-transfer instructions move data between memory and registers, between input/output channels and registers, and between registers themselves. Arithmetic/logical instructions configure the ALU to carry out a particular function, channel data from the registers through the ALU, and channel data from the ALU back to a particular register. Branch instructions determine the address of the next program instruction, based possibly on datapath status signals. Branches can be further categorized as unconditional jumps, conditional jumps, or procedure call and return instructions.
Unconditional jumps always determine the address of the next instruction, while conditional jumps do so only if some condition evaluates to true, such as a particular register containing zero. A call instruction, in addition to indicating the address of the next instruction, saves the address of the current instruction so that a subsequent return instruction can jump back to the instruction immediately following the most recently invoked call instruction. This pair of instructions facilitates the implementation of the procedure/function call semantics of high-level programming languages.
An operand field specifies the location of the actual data that takes part in an operation. Source operands serve as input to the operation, while a destination operand stores the output. The number of operands per instruction varies among processors. Even for a given processor, the number of operands per instruction may vary depending on the instruction type. The operand field may indicate the data's location through one of several addressing modes, illustrated in the figure of addressing modes. In immediate addressing, the operand field contains the data itself. In register addressing, the operand field contains the address of a datapath register in which the data resides. In register-indirect addressing, the operand field contains the address of a register, which in turn contains the address of a memory location in which the data resides. In direct addressing, the operand field contains the address of a memory location in which the data resides. In indirect addressing, the operand field contains the address of a memory location, which in turn contains the address of a memory location in which the data resides. Those familiar with structured languages may note that direct addressing implements regular variables, and indirect addressing implements pointers.
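To make the modes concrete, the sketch below mimics five of them in C on a hypothetical machine: the 256-byte data memory `mem[]` and the 8-entry register file `reg[]` are assumptions for illustration, not any real processor's layout. Each helper returns the data that an operand field selects under one addressing mode; `operand_indirect` behaves like dereferencing a pointer that is itself stored in memory.

```c
#include <stdint.h>

/* Hypothetical machine state: a small data memory and a register file. */
static uint8_t mem[256];
static uint8_t reg[8];

/* Immediate: the operand field is the data itself. */
uint8_t operand_immediate(uint8_t field)         { return field; }

/* Register: the field names a register holding the data. */
uint8_t operand_register(uint8_t field)          { return reg[field & 7]; }

/* Register-indirect: the named register holds the memory address of the data. */
uint8_t operand_register_indirect(uint8_t field) { return mem[reg[field & 7]]; }

/* Direct: the field is the memory address of the data (like a plain variable). */
uint8_t operand_direct(uint8_t field)            { return mem[field]; }

/* Indirect: memory at the field holds the address of the data (like a pointer). */
uint8_t operand_indirect(uint8_t field)          { return mem[mem[field]]; }
```

With mem[20] = 10 and mem[10] = 42, for example, operand_direct(10) and operand_indirect(20) both yield 42.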
In inherent or implicit addressing, the particular register or memory location of the data is implicit in the opcode; for example, the data may reside in a register called the "accumulator." In indexed addressing, the direct or indirect operand must be added to a particular implicit register to obtain the actual operand address. Jump instructions may use relative addressing to reduce the number of bits needed to indicate the jump address. A relative address indicates how far to jump from the current address, rather than indicating the complete address; such addressing is very common since most jumps are to nearby instructions.

[Figure: Instructions stored in memory]
[Figure: Addressing modes]
[Figure: A simple (trivial) instruction set]
[Figure: Sample programs]

Q.4 Explain the development environment for general software design
ANS:- Several software and hardware tools commonly support the programming of general-purpose processors. First, we must distinguish between two processors we deal with when developing an embedded system:
1. Development processor: the processor on which we write and debug our programs (usually a PC).
2. Target processor: the processor that the program will run on in our embedded system (often different from the development processor).
Software Development Process
Assemblers translate assembly instructions to binary machine instructions. In addition to just replacing opcode and operand mnemonics by binary equivalents, an assembler may also translate symbolic labels into actual addresses. For example, a programmer may add a symbolic label END to an instruction A, and may reference END in a branch instruction. The assembler determines the actual binary address of A, and replaces references to END by this address. A linker allows a programmer to create a program in separately-assembled files; it combines the machine instructions of each into a single program, perhaps incorporating instructions from standard library routines.
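The label handling described above is usually done in two passes: first assign every instruction an address and record where each label is defined, then replace each symbolic reference by that address. A minimal sketch in C, assuming a hypothetical source representation (`SrcLine` with an optional label, a mnemonic and an optional branch target) and one memory word per instruction; the function names are likewise hypothetical:

```c
#include <string.h>

#define MAX_LABELS 32

/* One hypothetical source line: optional label, mnemonic, optional label operand. */
typedef struct {
    const char *label;   /* e.g. "END", or NULL */
    const char *op;      /* e.g. "JMP" */
    const char *target;  /* label referenced by a branch, or NULL */
} SrcLine;

typedef struct { const char *name; int addr; } Symbol;

/* Pass 1: give each instruction an address (one word each, starting at base)
   and record where every label is defined. Returns the number of symbols. */
static int build_symbols(const SrcLine *src, int n, int base, Symbol *tab) {
    int count = 0;
    for (int i = 0; i < n && count < MAX_LABELS; i++) {
        if (src[i].label) {
            tab[count].name = src[i].label;
            tab[count].addr = base + i;
            count++;
        }
    }
    return count;
}

/* Look up a label's address; -1 if it was never defined. */
static int lookup(const char *name, const Symbol *tab, int count) {
    for (int i = 0; i < count; i++)
        if (strcmp(tab[i].name, name) == 0) return tab[i].addr;
    return -1;
}

/* Pass 2: replace each symbolic target by its address (-1 where there is none). */
void assemble_targets(const SrcLine *src, int n, int base, int *out_addr) {
    Symbol tab[MAX_LABELS];
    int count = build_symbols(src, n, base, tab);
    for (int i = 0; i < n; i++)
        out_addr[i] = src[i].target ? lookup(src[i].target, tab, count) : -1;
}
```

Two passes are needed because a branch may refer to a label, such as END, that is defined later in the program.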
Compilers translate structured programs into machine (or assembly) programs. Structured programming languages possess high-level constructs that greatly simplify programming, such as loop constructs, so each high-level construct may translate to several or tens of machine instructions. Compiler technology has advanced tremendously over the past decades, applying numerous program optimizations, often yielding very size- and performance-efficient code. A cross-compiler executes on one processor (our development processor), but generates code for a different processor (our target processor). Cross-compilers are extremely common in embedded system development.
Debuggers help programmers evaluate and correct their programs. They run on the development processor and support stepwise program execution, executing one instruction and then stopping, proceeding to the next instruction when instructed by the user. They permit execution up to user-specified breakpoints, which are instructions that, when encountered, cause the program to stop executing. Whenever the program stops, the user can examine values of various memory and register locations. A source-level debugger enables step-by-step execution in the source program language, whether assembly language or a structured language. A good debugging capability is crucial, as today's programs can be quite complex and hard to write correctly.
Device programmers download a binary machine program from the development processor's memory into the target processor's memory.
Emulators support debugging of the program while it executes on the target processor. An emulator typically consists of a debugger coupled with a board connected to the desktop processor via a cable. The board consists of the target processor plus some support circuitry (often another processor). The board may have another cable with a device having the same pin configuration as the target processor, allowing one to plug this device into a real embedded system.
Such an in-circuit emulator enables one to control and monitor the program's execution in the actual embedded system circuit. In-circuit emulators are available for nearly any processor intended for embedded use, though they can be quite expensive if they are to run at real speeds.

Q.5 Define testing and debugging, and running the program, for a general-purpose processor
ANS:- Testing and debugging: running a program
• If the development processor is different from the target, how can we run our compiled code? Two options:
– Download to the target processor
– Simulate
• Simulation
– One method: hardware description language (but slow, and not always available)
– Another method: instruction set simulator (ISS), which runs on the development processor but executes instructions of the target processor

Q.6 Explain application-specific instruction-set processors (ASIPs)
ANS:-
• ASIPs are targeted to a particular domain
– Contain architectural features specific to that domain, e.g., embedded control, digital signal processing, video processing, network processing, telecommunications, etc.
– Still programmable
A common ASIP is the microcontroller:
• For embedded control applications
– Reading sensors, setting actuators
– Mostly dealing with events (bits): data is present, but not in huge amounts
– e.g., VCR, disk drive, digital camera (assuming a single-purpose processor for image compression), washing machine, microwave oven
• Microcontroller features
– On-chip peripherals
  • Timers, analog-digital converters, serial communication, etc.
  • Tightly integrated for the programmer, typically part of the register space
– On-chip program and data memory
– Direct programmer access to many of the chip's pins
– Specialized instructions for bit manipulation and other low-level operations
Digital Signal Processors (DSP)
• For signal processing applications
– Large amounts of digitized data, often streaming
– Data transformations must be applied fast
– e.g., cell-phone voice filter, digital TV, music synthesizer
• DSP features
– Several instruction execution units
– Single-cycle multiply-accumulate instruction, and other specialized instructions
– Efficient vector operations, e.g., add two arrays
– Vector ALUs, loop buffers, etc.

Q.7 Explain designing a general-purpose processor
ANS:-
• Not something an embedded system designer normally would do
– But instructive to see how simply we can build one top-down
– Remember that real processors aren't usually built this way: they are much more optimized, with much more bottom-up design
Architecture of a simple microprocessor:
• Storage devices for each declared variable: a register file holds each of the variables
• Functional units to carry out the FSMD operations: one ALU carries out every required operation
• Connections added among the components' ports corresponding to the operations required by the FSM
• Unique identifiers created for every control signal

Unit 4:- Standard Single-Purpose Processors: Peripherals
Q.1 Give a short note on timers, counters, and watchdog timers
ANS:- Timers:
A timer is a device that generates a signal pulse at specified time intervals. A time interval is a "real-time" measure of time, such as 3 milliseconds. These devices are extremely useful in systems in which a particular action, such as sampling an input signal or generating an output signal, must be performed every X time units. Internally, a simple timer may consist of a register, a counter, and an extremely simple controller.
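A simple timer of this kind (reload register, down-counter, tiny controller) can be modelled in a few lines of C. This is a behavioural sketch, not a hardware description; the names `SimpleTimer`, `timer_count`, `timer_init` and `timer_tick` are hypothetical, and times are given in nanoseconds so that the count is desired interval / clock period.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a simple timer: reload register plus down-counter. */
typedef struct {
    uint32_t reload;   /* count value = desired interval / clock period */
    uint32_t counter;  /* decremented on every clock cycle */
} SimpleTimer;

/* Number of clock cycles in the desired interval (both in nanoseconds). */
uint32_t timer_count(uint64_t interval_ns, uint64_t clock_period_ns) {
    return (uint32_t)(interval_ns / clock_period_ns);
}

/* Load the register and the counter; count must be at least 1. */
void timer_init(SimpleTimer *t, uint32_t count) {
    t->reload = count;
    t->counter = count;
}

/* Controller behaviour for one clock cycle: count down, and on reaching 0
   output a pulse (return true) and reload the count for the next interval. */
bool timer_tick(SimpleTimer *t) {
    if (--t->counter == 0) {
        t->counter = t->reload;
        return true;
    }
    return false;
}
```

Calling timer_tick once per simulated clock cycle, a timer initialized with a count of 3 reports a pulse on every third call.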
The register holds a count value representing the number of clock cycles that equals the desired real-time value. This number can be computed using the simple formula:
Number of clock cycles = Desired real-time value / Clock period
For example, to obtain a duration of 3 milliseconds from a clock period of 10 nanoseconds (100 MHz), we must count $(3 \times 10^{-3}\,\text{s}) / (10 \times 10^{-9}\,\text{s/cycle}) = 300{,}000$ cycles. The counter is initially loaded with the count value, and then counts down on every clock cycle until 0 is reached, at which point an output signal is generated, the count value is reloaded, and the process repeats itself.
• Timer: measures time intervals
– To generate timed output events, e.g., hold a traffic light green for 10 s
– To measure input events, e.g., measure a car's speed
• Based on counting clock pulses
– E.g., let the clock period be 10 ns, and we count 20,000 clock pulses; then 200 microseconds have passed
– A 16-bit counter would count up to 65,535 × 10 ns = 655.35 microseconds, with a resolution of 10 ns
– Top: indicates that the top count was reached (wrap-around)
Counters
A counter is nearly identical to a timer, except that instead of counting clock cycles (pulses on the clock signal), a counter counts pulses on some other input signal.
• Counter: like a timer, but counts pulses on a general input signal rather than the clock
– e.g., count cars passing over a sensor
– Can often configure the device as either a timer or a counter
Other timer/counter structures
• Interval timer
– Indicates when the desired time interval has passed
– We set the terminal count to the desired interval: Number of clock cycles = Desired time interval / Clock period
• Cascaded counters
• Prescaler
– Divides the clock
– Increases range, decreases resolution
Watchdog timer:
A watchdog timer can be thought of as having the inverse functionality of a regular timer. We configure a watchdog timer with a real-time value, just as with a regular timer.
However, instead of the timer generating a signal for us every X time units, we must generate a signal for the timer every X time units. If we fail to generate this signal in time, then the timer generates a signal indicating that we failed. We often connect this signal to the reset or interrupt signal of a general-purpose processor. Thus, a watchdog timer provides a mechanism for ensuring that our software is working properly; every so often in the software, we include a statement that generates a signal to the watchdog timer (in particular, that resets the timer). If something undesired happens in the software (e.g., we enter an undesired infinite loop, we wait for an input signal that never arrives, a part fails, etc.), the watchdog generates a signal that we can use to restart or test parts of the system. Using an interrupt service routine, we may record information as to the number of failures and the causes of each, so that a service technician may later evaluate this information to determine if a particular part requires replacement. Note that an embedded system often must recover from failures whenever possible, as the user may not have the means to reboot the system in the same manner that he/she might reboot a desktop system.
• Example: an ATM machine
– 16-bit timer, 2 microsecond resolution
– timereg value = $2 \times (2^{16} - 1) - X = 131070 - X$, where X is the timeout in microseconds
– For X = 120,000 microseconds (120 ms), timereg = 131070 - 120000 = 11070

Q.2 Give the details about the UART, with an example
ANS:- A UART (Universal Asynchronous Receiver/Transmitter) receives serial data and stores it as parallel data (usually one byte), and takes parallel data and transmits it as serial data. Such serial communication is beneficial when we need to communicate bytes of data between devices separated by long distances, or when we simply have few available I/O pins. Principles of serial communication will be discussed in a later chapter.
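To make "parallel in, serial out" concrete, the sketch below builds the bit sequence a transmit shift register would send for one byte. It assumes a common asynchronous frame format that these notes do not spell out: one start bit (0), eight data bits sent least significant bit first, an optional even-parity bit, and one stop bit (1). The function name `uart_frame` is hypothetical.

```c
#include <stdint.h>

/* Build the serial bit sequence for one byte, assuming the frame format:
   start bit (0), 8 data bits LSB first, optional even parity, stop bit (1).
   bits[] must hold at least 11 entries. Returns the number of bits produced. */
int uart_frame(uint8_t data, int use_even_parity, uint8_t bits[11]) {
    int n = 0, ones = 0;
    bits[n++] = 0;                        /* start bit: line pulled low */
    for (int i = 0; i < 8; i++) {         /* shift out data, LSB first */
        uint8_t b = (uint8_t)((data >> i) & 1u);
        ones += b;
        bits[n++] = b;
    }
    if (use_even_parity)
        bits[n++] = (uint8_t)(ones & 1);  /* makes the total number of 1s even */
    bits[n++] = 1;                        /* stop bit: line returns high (idle) */
    return n;
}
```

In hardware, each element of bits[] would be held on the line for one bit time, i.e. 1/baud seconds, which is why the UART needs a timer.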
For our purpose in this section, we must be aware that we must set the transmission and reception rate, called the baud rate, which indicates the frequency at which the signal changes. Common rates include 2400, 4800, 9600, and 19.2k. We must also be aware that an extra bit may be added to each data word, called parity, to detect transmission errors: the parity bit is set high or low to indicate whether the word has an even or odd number of 1 bits. Internally, a simple UART may possess a baud-rate configuration register, and two independently operating processors, one for receiving and the other for transmitting. The transmitter may possess a register, often called a transmit buffer, that holds data to be sent. This register is a shift register, so the data can be transmitted one bit at a time by shifting at the appropriate rate. Likewise, the receiver receives data into a shift register, and then this data can be read in parallel. Note that in order to shift at the appropriate rate based on the configuration register, a UART requires a timer.
To use a UART, we must configure its baud rate by writing to the configuration register, and then we must write data to the transmit register and/or read data from the receive register. Unfortunately, configuring the baud rate is usually not as simple as writing the desired rate (e.g., 4800) to a register. For example, to configure the UART of an 8051, we must use the following equation:
$$\text{Baud rate} = \frac{2^{\mathrm{SMOD}}}{32} \times \frac{\text{oscfreq}}{12 \times (256 - \mathrm{TH1})}$$
where SMOD is a single bit in a special-function register (PCON), oscfreq is the frequency of the oscillator, and TH1 is an 8-bit rate register of a built-in timer.
Note that we could use a general-purpose processor to implement a UART completely in software. If we used a dedicated general-purpose processor, the implementation would be inefficient in terms of size. We could alternatively integrate the transmit and receive functionality with our main program.
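In practice one works the 8051 equation backwards, solving for the TH1 value that gives a desired rate: $256 - \mathrm{TH1} = 2^{\mathrm{SMOD}} \times \text{oscfreq} / (384 \times \text{baud})$, since $32 \times 12 = 384$. The C sketch below does this with integer arithmetic and rejects rates the crystal cannot produce exactly. It assumes timer 1 is the baud-rate generator in its 8-bit auto-reload mode; the function names `baud_8051` and `th1_for_baud` are hypothetical.

```c
#include <stdint.h>

/* The 8051 equation as given: baud = (2^smod / 32) * osc_hz / (12 * (256 - th1)).
   smod is 0 or 1; th1 is 0..255. */
double baud_8051(unsigned smod, double osc_hz, unsigned th1) {
    return ((double)(1u << smod) / 32.0) * osc_hz / (12.0 * (256.0 - th1));
}

/* Solve the same equation for TH1: 256 - th1 = 2^smod * osc_hz / (384 * baud).
   Returns -1 if the division is not exact or TH1 would fall outside 0..255. */
int th1_for_baud(unsigned smod, uint32_t osc_hz, uint32_t baud) {
    uint64_t num = (uint64_t)osc_hz << smod;   /* 2^smod * osc_hz */
    uint64_t den = 384ull * baud;              /* 32 * 12 * baud */
    if (den == 0 || num % den != 0)
        return -1;                             /* rate not exactly reachable */
    uint64_t div = num / den;                  /* equals 256 - TH1 */
    if (div < 1 || div > 256)
        return -1;
    return (int)(256 - div);
}
```

With the widely used 11.0592 MHz crystal, 9600 baud with SMOD = 0 gives TH1 = 253 (0xFD), while a 12 MHz crystal cannot produce 9600 baud exactly, so th1_for_baud returns -1.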
Integrating the transmit and receive functionality with the main program would require creating a routine to send data serially over an I/O port, making use of a timer to control the rate. It would also require using an interrupt service routine to capture serial data coming from another I/O port whenever such data begins arriving. However, as with the timer functionality, adding send and receive functionality can detract from time for other computations.

Q.3 Give the details about the pulse-width modulator
ANS:- A pulse-width modulator (PWM) generates an output signal that repeatedly switches between high and low. We control the duration of the high value and of the low value by indicating the desired period, and the desired duty cycle, which is the percentage of time the signal is high compared to the signal's period. A square wave has a duty cycle of 50%. The pulse's width corresponds to the pulse's time high. Again, PWM functionality could be implemented on a dedicated general-purpose processor, or integrated with another program's functionality, but the single-purpose processor approach has the benefits of efficiency and simplicity.
One common use of a PWM is to control the average current or voltage input to a device. For example, a DC motor rotates when power is applied, and this power can be turned on and off by setting an input high or low. To control the speed, we can adjust the input voltage, but this requires a conversion of our high/low digital signals to an analog signal. Fortunately, we can also adjust the speed simply by modifying the duty cycle of the motor's on/off input, an approach which adjusts the average voltage. This approach works because a DC motor does not come to an immediate stop when power is turned off, but rather it coasts, much like a bicycle coasts when we stop pedaling. Increasing the duty cycle increases the motor speed, and decreasing the duty cycle decreases the speed. This duty-cycle adjustment principle applies to the control of other types of electric devices, such as dimmer lights.
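The duty-cycle arithmetic above can be sketched as a free-running counter compared against a threshold, a common way PWM peripherals are organized; here it is only a software model with hypothetical names (`Pwm`, `pwm_config`, `pwm_tick`, `pwm_average_voltage`). The average-voltage helper assumes an ideal switch, so the average is simply supply voltage × duty cycle.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical PWM model: a counter runs from 0 to period-1, and the output
   is high while the counter is below the pulse width (high time). */
typedef struct {
    uint32_t period_ticks;  /* length of one PWM period in clock ticks (>= 1) */
    uint32_t high_ticks;    /* ticks the output stays high (pulse width) */
    uint32_t counter;
} Pwm;

/* duty_percent is 0..100; the pulse width is that percentage of the period. */
void pwm_config(Pwm *p, uint32_t period_ticks, uint32_t duty_percent) {
    if (duty_percent > 100) duty_percent = 100;
    p->period_ticks = period_ticks;
    p->high_ticks = (uint32_t)((uint64_t)period_ticks * duty_percent / 100);
    p->counter = 0;
}

/* Output level for the current tick, then advance the counter. */
bool pwm_tick(Pwm *p) {
    bool out = p->counter < p->high_ticks;
    p->counter = (p->counter + 1) % p->period_ticks;
    return out;
}

/* Average voltage seen by a load (e.g. a DC motor) switched at this duty cycle. */
double pwm_average_voltage(const Pwm *p, double supply_volts) {
    return supply_volts * p->high_ticks / p->period_ticks;
}
```

For example, a 25% duty cycle on a 12 V supply gives an average of 3 V at the motor; raising the duty cycle raises the average and hence the speed.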
Another use of a PWM is to encode control commands in a single signal for use by another device. For example, we may control a radio-controlled car by sending pulses of different widths. Perhaps a 1 ms width corresponds to a turn-left command, a 4 ms width to turn right, and 8 ms to forward.
[Figure: Example of PWM]
[Figure: Controlling a DC motor with a PWM]

Q.4 Give the details about the LCD controller
ANS:- An LCD (liquid crystal display) is a low-cost, low-power device capable of displaying text and images. LCDs are extremely common in embedded systems, since such systems often do not have the video monitors that are standard for desktop systems. LCDs can be found in numerous common devices like watches, fax and copy machines, and calculators. The basic principle of one type of LCD (reflective) works as follows. First, incoming light passes through a polarizing plate. Next, that polarized light encounters liquid crystal material. If we excite a region of this material, we cause the material's molecules to align, which in turn causes the polarized light to pass through the material. Otherwise, the light does not pass through. Finally, light that has passed through hits a mirror and reflects back, so the excited region appears to light up. Another type of LCD (absorption) works similarly, but uses a black surface instead of a mirror. The surface below the excited region absorbs light, thus appearing darker than the other regions. One of the simplest LCDs is the 7-segment LCD. Each of the 7 segments can be activated to display any digit character or one of several letters and symbols. Such an LCD may have 7 inputs, each corresponding to a segment, or it may have only 4 inputs to represent the numbers 0 through 9. An LCD driver converts these inputs to the electrical signals necessary to excite the appropriate LCD segments. A dot-matrix LCD consists of a matrix of dots that can display alphanumeric characters (letters and digits) as well as other symbols.
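The job of a 4-input 7-segment driver can be pictured as a lookup table from the digit value to seven segment-enable signals. The C sketch below models only that logical mapping (a real LCD driver must also produce the electrical signals that excite the segments); the bit assignment, bit 0 = segment a through bit 6 = segment g, is one common convention and an assumption here, as is the name `bcd_to_7seg`.

```c
#include <stdint.h>

/* Segment patterns for digits 0-9: bit 0 = segment a ... bit 6 = segment g.
   The actual segment-to-bit wiring depends on the display. */
static const uint8_t SEG7[10] = {
    0x3F, 0x06, 0x5B, 0x4F, 0x66, 0x6D, 0x7D, 0x07, 0x7F, 0x6F
};

/* Convert the 4 input lines (0-9) into the 7 segment-enable signals.
   Input codes 10-15 are not digits; this sketch blanks the display for them. */
uint8_t bcd_to_7seg(uint8_t bcd) {
    return (bcd < 10) ? SEG7[bcd] : 0x00;
}
```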
A common dot-matrix LCD has 5 columns and 8 rows of dots for one character. An LCD driver converts input data into the electrical signals necessary to excite the appropriate dots. Each type of LCD may be able to display multiple characters. In addition, each character may be displayed in normal or inverted fashion. The LCD may permit a character to be blinking (cycling through normal and inverted display) or may permit display of a cursor (such as a blinking underscore) indicating the "current" character. This functionality would be difficult for us to implement using software. Thus, we use an LCD controller to provide us with a simple interface, perhaps 8 data inputs and one enable input. To send a byte to the LCD, we provide a value to the 8 inputs and pulse the enable. This byte may be a control word, which instructs the LCD controller to initialize the LCD, clear the display, select the position of the cursor, brighten the display, and so on. Alternatively, this byte may be a data word, such as an ASCII character, instructing the LCD to display the character at the currently selected display position.

Q.5 Give the details about the keypad controller
ANS:- A keypad consists of a set of buttons that may be pressed to provide input to an embedded system. Again, keypads are extremely common in embedded systems, since such systems may lack the keyboard that comes standard with desktop systems. A simple keypad has buttons arranged in an N-column by M-row grid. The device has N outputs, each output corresponding to a column, and another M outputs, each output corresponding to a row. When we press a button, one column output and one row output go high, uniquely identifying the pressed button. To read such a keypad from software, we must scan the column and row outputs.
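Under the keypad model just described (one column output and one row output go high), decoding a press means finding which column bit and which row bit are set. A minimal C sketch, assuming the N column outputs and M row outputs have already been sampled into two bit masks (reading them from I/O pins is hardware specific and not shown). `keypad_decode` is a hypothetical name, and the sketch does no debouncing.

```c
#include <stdint.h>

/* cols: one bit per column output (N <= 32); rows: one bit per row output (M <= 32).
   Returns the key index row * n_cols + col, or -1 if no key, or more than one,
   appears pressed. */
int keypad_decode(uint32_t cols, uint32_t rows, int n_cols) {
    /* Exactly one bit must be set in each mask; x & (x - 1) clears the lowest set bit. */
    if (cols == 0 || rows == 0 || (cols & (cols - 1)) || (rows & (rows - 1)))
        return -1;
    int c = 0, r = 0;
    while (!(cols & 1u)) { cols >>= 1; c++; }   /* position of the high column */
    while (!(rows & 1u)) { rows >>= 1; r++; }   /* position of the high row */
    return r * n_cols + c;
}
```

On a 4-column keypad, column 2 and row 1 high decode to key 1 × 4 + 2 = 6.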
The scanning may instead be performed by a keypad controller (actually, such a device decodes rather than controls, but we'll call it a controller for consistency with the other peripherals discussed). A simple form of such a controller scans the column and row outputs of the keypad. When the controller detects a button press, it stores a code corresponding to that button into a register and sets an output high, indicating that a button has been pressed. Our software may poll this output every 100 milliseconds or so, and read the register when the output is high. Alternatively, this output can generate an interrupt on our general-purpose processor, eliminating the need for polling.

Q.6 Give the details about the stepper motor controller
ANS:- A stepper motor is an electric motor that rotates a fixed number of degrees whenever we apply a "step" signal. In contrast, a regular electric motor rotates continuously whenever power is applied, coasting to a stop when power is removed. We specify a stepper motor either by the number of degrees in a single step, such as 1.8°, or by the number of steps required to move 360°, such as 200 steps. Stepper motors obviously abound in embedded systems with moving parts, such as disk drives, printers, photocopy and fax machines, robots, camcorders, VCRs, etc. Internally, a stepper motor typically has four coils. To rotate the motor one step, we pass current through one or two of the coils; the particular coils depend on the present orientation of the motor. Thus, rotating the motor 360° requires applying current to the coils in a specified sequence. Applying the sequence in reverse causes reversed rotation. In some cases, the stepper motor comes with four inputs corresponding to the four coils, and with documentation that includes a table indicating the proper input sequence.
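Keeping such a table in software and stepping through it can be sketched in C as a small array of coil patterns walked forwards or backwards. The table values below are an illustrative full-step sequence (two adjacent coils energized at a time) assumed for the example; the real sequence must be taken from the motor's documentation. `Stepper`, `stepper_step` and `steps_for_degrees` are hypothetical names.

```c
#include <stdint.h>

/* Illustrative full-step sequence for four coil inputs; bit i drives coil input i.
   Replace with the sequence given in the motor's documentation. */
static const uint8_t STEP_TABLE[4] = { 0x3, 0x6, 0xC, 0x9 };

typedef struct { int index; } Stepper;   /* current position in STEP_TABLE */

/* Move one step: dir >= 0 walks the table forwards, dir < 0 backwards
   (reversed rotation). Returns the coil pattern to apply to the four inputs. */
uint8_t stepper_step(Stepper *s, int dir) {
    s->index = (dir >= 0) ? (s->index + 1) % 4 : (s->index + 3) % 4;
    return STEP_TABLE[s->index];
}

/* Steps needed for a rotation, given steps per revolution
   (e.g. 200 for a 1.8-degree motor). Partial steps are truncated. */
long steps_for_degrees(double degrees, int steps_per_rev) {
    return (long)(degrees * steps_per_rev / 360.0);
}
```

For a 200-step (1.8°) motor, a 90° move is 50 calls to stepper_step.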
To control the motor from software, we must maintain this table in software, and write a step routine that applies high values to the inputs based on the table values that follow the previously applied values. In other cases, the stepper motor comes with a built-in controller (i.e., a special-purpose processor) implementing this sequence. Thus, we merely create a pulse on an input signal of the motor, causing the controller to generate the appropriate high signals to the coils that will cause the motor to rotate one step.
[Figure: Stepper motor with controller (driver)]

Q.7 Give the details about analog-to-digital converters
ANS:- An analog-to-digital converter (ADC, A/D or A2D) converts an analog signal to a digital signal, and a digital-to-analog converter (DAC, D/A or D2A) does the opposite. Such conversions are necessary because, while embedded systems deal with digital values, an embedded system's surroundings typically involve many analog signals. "Analog" refers to continuously-valued signals, such as temperature or speed represented by a voltage between 0 and 100, with infinitely many possible values in between. "Digital" refers to discretely-valued signals, such as integers, and in computing systems, these signals are encoded in binary. By converting between analog and digital signals, we can use digital processors in an analog environment. For example, consider the analog signal in the analog-to-digital conversion figure. The analog input voltage varies over time from 1 to 4 volts. We sample the signal at successive time units, and encode the current voltage into a 4-bit binary number. Conversely, consider the digital-to-analog conversion figure. We want to generate an analog output voltage for the given binary numbers over time. We generate the analog signal shown.
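The 4-bit encoding above can be sketched with an ideal DAC model, e = d × Vmax / (2^n - 1) with a 0 V minimum, plus a successive-approximation loop, one common way an ADC arrives at d: it tries each bit from the most significant down, keeping a bit only if the DAC output for the trial code does not exceed the input voltage. This is a numerical model with hypothetical names (`dac_output`, `sar_adc`), not a circuit.

```c
/* Ideal n-bit DAC: e = d * vmax / (2^n - 1), assuming a 0 V minimum. */
double dac_output(unsigned d, double vmax, unsigned n) {
    return (double)d * vmax / (double)((1u << n) - 1u);
}

/* Successive-approximation ADC: decide the bits from MSB to LSB, keeping each
   trial bit only if the internal DAC's output does not exceed the input e.
   Returns the largest code d whose DAC voltage is <= e (n assumed 1..16). */
unsigned sar_adc(double e, double vmax, unsigned n) {
    unsigned d = 0;
    for (int bit = (int)n - 1; bit >= 0; bit--) {
        unsigned guess = d | (1u << bit);       /* tentatively set this bit */
        if (dac_output(guess, vmax, n) <= e)    /* the analog comparator's decision */
            d = guess;                          /* keep the bit */
    }
    return d;
}
```

With Vmax = 7.5 V and n = 4, an input of 5 V converges to d = 10 (1010) after four comparisons: 8 kept, 12 rejected, 10 kept, 11 rejected.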
We can compute the digital values from the analog values, and vice versa, using the following ratio:
$$\frac{e}{V_{max}} = \frac{d}{2^n - 1}$$
where Vmax is the maximum voltage that the analog signal can assume, n is the number of bits available for the digital encoding, d is the present digital encoding, and e is the present analog voltage. This proportionality of the voltage and digital encoding is shown graphically in the accompanying figure. In the 4-bit example above, suppose Vmax is 7.5 V. Then for e = 5 V, we have the following ratio: 5/7.5 = d/15, resulting in d = 1010 (ten), as shown in the figure. The resolution of a DAC or ADC is defined as $V_{max}/(2^n - 1)$, representing the number of volts between successive digital encodings (0.5 V in this example). The above discussion assumes a minimum voltage of 0 V.
Internally, DACs possess simpler designs than ADCs. A DAC has n inputs for the digital encoding d, a Vmax analog input, and an analog output e. A fairly straightforward circuit (involving resistors and an op-amp) can be used to convert d to e. ADCs, on the other hand, require more complex designs, for the following reason. Given a Vmax analog input and an analog input e, how does the converter know what binary value to assign in order to satisfy the above ratio? Unlike DACs, there is no simple analog circuit to compute d from e. Instead, an ADC may itself contain a DAC also connected to Vmax. The ADC "guesses" an encoding d, and then evaluates its guess by inputting d into the DAC, and comparing the generated analog output e' with the original analog input e (using an analog comparator). If the two sufficiently match, then the ADC has found a proper encoding. So now the question remains: how do we guess the correct encoding?
[Figure: Analog-to-digital conversion using successive approximation]

Q.8 Give the details about real-time clocks
ANS:- Much like a digital wristwatch, a real-time clock (RTC) keeps the time and date in an embedded system.
Real-time clocks are typically composed of a crystal-controlled oscillator, numerous cascaded counters, and a battery backup. The crystal-controlled oscillator generates a very consistent high-frequency digital pulse that feeds the cascaded counters. The first counter typically counts these pulses up to the oscillator frequency, which corresponds to exactly one second. At this point, it generates a pulse that feeds the next counter. This seconds counter counts up to 59, at which point it generates a pulse feeding the minute counter. The hour, date, month and year counters work in similar fashion. In addition, real-time clocks adjust for leap years. The rechargeable backup battery is used to keep the real-time clock running while the system is powered off. From the microcontroller's point of view, the contents of these counters can be set to a desired value (this corresponds to setting the clock) and retrieved. Communication between the microcontroller and a real-time clock is typically accomplished through a serial bus, such as I2C. It should be noted that, given a timer peripheral, it is possible to implement a real-time clock in software running on a processor. In fact, many systems use this approach to maintain the time. However, the drawback of such systems is that when the processor is shut down or reset, the time is lost.

Unit 5:- Memory
Q.1 Discuss memory in detail
ANS:- Any embedded system's functionality consists of three aspects: processing, storage, and communication. Processing is the transformation of data, storage is the retention of data for later use, and communication is the transfer of data. Each of these aspects must be implemented. We use processors to implement processing, memories to implement storage, and buses to implement communication. The earlier chapters described common processor types: general-purpose processors, standard single-purpose processors, and custom single-purpose processors. A memory stores large numbers of bits.
These bits exist as m words of n bits each, for a total of m*n bits. We refer to a memory as an m x n ("m-by-n") memory. Log2(m) address input signals are necessary to identify a particular word. Stated another way, if a memory has k address inputs, it can have up to 2k words. n signals are necessary to output (and possibly input) a selected word. To read a memory means to retrieve the word of a particular address, while to write a memory means to store a word in a particular address. Some memories can only be read from (ROM), while others can be both read from and written to (RAM). Q.2 discuss various Write ability/ storage permanence ANS:- • • • • Traditional ROM/RAM distinctions – ROM • read only, bits stored without power – RAM • read and write, lose stored bits without power Traditional distinctions blurred – Advanced ROMs can be written to • e.g., EEPROM – Advanced RAMs can hold bits without power • e.g., NVRAM Write ability – Manner and speed a memory can be written Storage permanence – ability of memory to hold stored bits after they are written • • Ranges of write ability – High end • processor writes to memory simply and quickly • e.g., RAM – Middle range • processor writes to memory, but slower • e.g., FLASH, EEPROM – Lower range • special equipment, “programmer”, must be used to write to memory • e.g., EPROM, OTP ROM – Low end • bits stored only during fabrication • e.g., Mask-programmed ROM In-system programmable memory – Can be written to by a processor in the embedded system using the memory – Memories in high end and middle range of write ability • • Range of storage permanence – High end • essentially never loses bits • e.g., mask-programmed ROM – Middle range • holds bits days, months, or years after memory’s power source turned off • e.g., NVRAM – Lower range • holds bits as long as power supplied to memory • e.g., SRAM – Low end • begins to lose bits almost immediately after written • e.g., DRAM Nonvolatile memory – Holds bits after power is no 
longer supplied; covers the high end and middle range of storage permanence

Q.3 Discuss common memory types
ANS:- There are two types of memory.
1. ROM (Read-Only Memory)
ROM is a memory that can be read from, but not typically written to, during execution of an embedded system. Of course, there must be a mechanism for setting the bits in the memory (otherwise, of what use would the read data serve?), but we call this "programming," not writing. Such programming is usually done off-line, i.e., when the memory is not actively serving as a memory in an embedded system; we usually program a ROM before inserting it into the embedded system. The figure provides a block diagram of a ROM.
We can use ROM for various purposes. One use is to store a software program for a general-purpose processor. We may write each program instruction to one ROM word; for some processors, we write each instruction to several ROM words, and for others we may pack several instructions into a single ROM word. A related use is to store constant data, like large lookup tables of strings or numbers. Another common use is to implement a combinational circuit. We can implement any combinational function of k variables by using a 2^k x 1 ROM, and we can implement n functions of the same k variables using a 2^k x n ROM. We simply program the ROM to implement the truth table for the functions.
The figure provides a symbolic view of the internal design of an 8x4 ROM. To the right of the 3x8 decoder in the figure is a grid of lines, with word lines running horizontally and data lines vertically; lines that cross without a circle in the figure are not connected. Thus, word lines connect to data lines only via the programmable connection lines shown. The figure shows all connection lines in place except for two connections in word 2. To see how this device acts as a read-only memory, consider an input address of "010." The decoder will thus set word 2's line to 1.
Because the lines connecting this word line with data lines 2 and 0 do not exist, the ROM output will read "1010." Note that if the ROM enable input is 0, no word is read. Also note that each data line is shown as a wired-OR, meaning that the wire itself acts to logically OR all the connections to it.
• Any combinational circuit of n functions of the same k variables can be implemented with a 2^k x n ROM.

Types of ROM in brief
Mask-programmed ROM:
• Connections "programmed" at fabrication, using a set of masks
• Lowest write ability: written only once
• Highest storage permanence: bits never change unless damaged
• Typically used for the final design of high-volume systems, which spreads out the NRE cost for a low unit cost
OTP ROM (One-Time Programmable ROM):
• Connections "programmed" after manufacture by the user
– The user provides a file of the desired contents of the ROM
– The file is input to a machine called a ROM programmer
– Each programmable connection is a fuse
– The ROM programmer blows fuses where connections should not exist
• Very low write ability: typically written only once, and requires a ROM programmer device
• Very high storage permanence: bits don't change unless the ROM is reconnected to the programmer and more fuses are blown
• Commonly used in final products, since it is cheaper and harder to modify inadvertently
EPROM (Erasable Programmable ROM):
• Programmable component is a MOS transistor
– The transistor has a "floating" gate surrounded by an insulator
– (a) Negative charges form a channel between source and drain, storing a logic 1
– (b) A large positive voltage at the gate causes negative charges to move out of the channel and get trapped in the floating gate, storing a logic 0
– (c) (Erase) Shining UV rays on the surface of the floating gate causes negative charges to return to the channel from the floating gate, restoring the logic 1
– (d) An EPROM package showing the quartz window through which UV light can pass
• Better write ability: can be erased and reprogrammed thousands of times
• Reduced storage permanence: program lasts about 10 years but is susceptible to radiation
and electric noise
• Typically used during design development
EEPROM (Electrically Erasable Programmable ROM):
• Programmed and erased electronically
– Typically by using a higher-than-normal voltage
– Can program and erase individual words
• Better write ability
– Can be in-system programmable, with a built-in circuit to provide the higher-than-normal voltage
– A built-in memory controller is commonly used to hide details from the memory user
– Writes are very slow due to erasing and programming; a "busy" pin indicates to the processor that the EEPROM is still writing
– Can be erased and programmed tens of thousands of times
• Similar storage permanence to EPROM (about 10 years)
• Far more convenient than EPROMs, but more expensive
Flash memory:
• Extension of EEPROM
– Same floating-gate principle
– Same write ability and storage permanence
• Fast erase
– Large blocks of memory are erased at once, rather than one word at a time
– Blocks are typically several thousand bytes large
• Writes to single words may be slower
– The entire block must be read, the word updated, then the entire block written back
• Used in embedded systems that store large data items in nonvolatile memory, e.g., digital cameras, TV set-top boxes, cell phones

2. RAM ("Random-Access" Memory)
• Typically volatile memory: bits are not held without a power supply
• Read and written to easily by the embedded system during execution
• Internal structure more complex than ROM
– A word consists of several memory cells, each storing 1 bit
– Each input and output data line connects to each cell in its column
– rd/wr is connected to every cell
– When a row is enabled by the decoder, each cell has logic that stores the input data bit when rd/wr indicates write, or outputs the stored bit when rd/wr indicates read
Basic types of RAM:
• SRAM (Static RAM)
– Memory cell uses a flip-flop to store a bit
– Requires 6 transistors
– Holds data as long as power is supplied
• DRAM (Dynamic RAM)
– Memory cell uses a MOS transistor and a capacitor to store a bit
– More compact than SRAM
– "Refresh" required due to capacitor leakage; a word's cells are refreshed when read
– Typical refresh interval: 15.625 microseconds
– Slower to access than SRAM
Other types of RAM (RAM variations):
• PSRAM (Pseudo-static RAM)
– DRAM with a built-in memory refresh controller
– Popular low-cost, high-density alternative to SRAM
• NVRAM (Nonvolatile RAM)
– Holds data after external power is removed
– Battery-backed RAM: SRAM with its own permanently connected battery; writes as fast as reads; no limit on the number of writes, unlike nonvolatile ROM-based memory
– SRAM with EEPROM or flash: stores the complete RAM contents on EEPROM or flash before power is turned off

Q.4 Give the details of composing memory
ANS:-
• The memory size needed often differs from the size of readily available memories
• When the available memory is larger, simply ignore unneeded high-order address bits and higher data lines
• When the available memory is smaller, compose several smaller memories into one larger memory
– Connect side-by-side to increase the width of words
– Connect top-to-bottom to increase the number of words; an added high-order address line selects, through a decoder, the smaller memory containing the desired word
– Combine both techniques to increase both the number and the width of words
An embedded
system designer is often faced with the situation of needing a particular-sized memory (ROM or RAM), but having readily available memories of a different size. For example, the designer may need a 2K x 8 ROM, but may have 4K x 16 ROMs readily available. Alternatively, the designer may need a 4K x 16 ROM, but may have only 2K x 8 ROMs available.
The case where the available memory is larger than needed is easy to deal with. We simply use the needed lower words in the memory, ignoring the unneeded higher words and their high-order address bits, and we use the lower data input/output lines, ignoring the unneeded higher data lines. (Of course, we could use the higher data lines and ignore the lower lines instead.)
The case where the available memory is smaller than needed requires more design effort: we must compose several smaller memories to behave as the larger memory we need. Suppose the available memories have the correct number of words, but each word is not wide enough. In this case, we can simply connect the available memories side-by-side. For example, the figure illustrates the situation of needing a ROM three times wider than the one available. We connect three ROMs side-by-side, sharing the same address and enable lines among them, and concatenating the data lines to form the desired word width.
Suppose instead that the available memories have the correct word width, but not enough words. In this case, we can connect the available memories top-to-bottom. For example, the figure illustrates the situation of needing a ROM with twice as many words as the one available, and hence one extra address line. We connect the ROMs top-to-bottom, OR’ing the corresponding data lines of each.
We use the extra high-order address line to select the higher or lower ROM (using a 1x2 decoder), and the remaining address lines to select the word within the selected ROM.

Q.5 Explain the concept of memory hierarchy and cache
ANS:-
• Want inexpensive, fast memory
• Main memory: large, inexpensive, slow memory that stores the entire program and data
• Cache: small, expensive, fast memory that stores a copy of the likely-accessed parts of the larger memory; there can be multiple levels of cache
When we design a memory to store an embedded system's program and data, we often face the following dilemma: we want an inexpensive and fast memory, but inexpensive memories tend to be slow, whereas fast memories tend to be expensive. The solution to this dilemma is to create a memory hierarchy, as illustrated in the figure. We use an inexpensive but slow main memory to store all of the program and data, and a small amount of fast but expensive cache memory to store copies of likely-accessed parts of main memory. Using a cache is analogous to posting, on a wall near a telephone, a short list of important phone numbers rather than posting the entire phonebook.
A cache operates as follows. When we want the processor to access (read or write) a main memory address, we first check for a copy of that location in the cache. If the copy is in the cache, called a cache hit, we can access it quickly. If the copy is not there, called a cache miss, we must first read the address (and perhaps some of its neighbors) into the cache. This description of cache operation leads to several cache design choices: cache mapping, cache replacement policy, and cache write techniques. These design choices can have a significant impact on system cost, performance, and power, and thus should be evaluated carefully for a given application. Cache is usually designed using static RAM rather than dynamic RAM, which is one reason that cache is more expensive but faster than main memory.
Q.6 Explain the various cache mapping techniques
ANS: Cache mapping
• There are far fewer cache addresses available than main memory addresses
• So, are a given address's contents in the cache?
• Cache mapping is used to assign a main memory address to a cache address and to determine a hit or miss
• Three basic techniques:
– Direct mapping
– Fully associative mapping
– Set-associative mapping
• Caches are partitioned into indivisible blocks, or lines, of adjacent memory addresses, usually 4 or 8 addresses per line
Direct mapping:
• Main memory address divided into 2 fields (plus an offset)
– Index: the cache address; its number of bits is determined by the cache size
– Tag: compared with the tag stored in the cache at the address indicated by the index; if the tags match, check the valid bit
• Valid bit: indicates whether the data in the slot has been loaded from memory
• Offset: used to find a particular word in the cache line
Fully associative mapping:
• The complete main memory address is stored in each cache address
• All addresses stored in the cache are simultaneously compared with the desired address
• Valid bit and offset are the same as in direct mapping
Set-associative mapping:
• Compromise between direct mapping and fully associative mapping
• Index same as in direct mapping
• But each cache address contains the content and tags of 2 or more memory address locations
• Tags of that set are simultaneously compared, as in fully associative mapping
• A cache with set size N is called N-way set-associative; 2-way, 4-way, and 8-way are common

Q.7 Explain cache replacement policy
ANS:-
• Technique for choosing which block to replace
– When a fully associative cache is full
– When a set-associative cache's set is full
• A direct-mapped cache has no choice
• Random: replace a block chosen at random
• LRU (least-recently used): replace the block not accessed for the longest time
• FIFO (first-in-first-out): push a block onto a queue when it is loaded into the cache; choose the block to replace by popping the queue

Unit 6:- Interfacing
Q.1 Explain the basics of communication
ANS: Communication needs buses and wires.
• Wires:
– Uni-directional or bi-directional
– One line may
represent multiple wires
• Bus: a set of wires with a single function (e.g., address bus, data bus), or the entire collection of wires (address, data, and control), with an associated protocol: the rules for communication
For example, a processor-memory bus needs rd'/wr and enable control signals in addition to the address and data wires. The bus also connects to each device through ports, described below.
Ports:
• Conducting device on the periphery that connects a bus to a processor or memory
• Often referred to as a pin
– Actual pins on the periphery of an IC package that plug into a socket on a printed-circuit board
– Sometimes metallic balls instead of pins
– Today, metal "pads" connecting processors and memories within a single IC
• A single wire or set of wires with a single function, e.g., a 12-wire address port
The timing diagrams (figure) show write and read operations, including the address of the data.
Timing diagrams:
• Most common method for describing a communication protocol
• Time proceeds to the right on the x-axis
• Control signal: low or high
– May be active low (e.g., go', /go, or go_L)
– Use the terms assert (active) and deassert
– Asserting go' means go=0
• Data signal: not valid or valid
• A protocol may have subprotocols, called bus cycles (e.g., read and write), each of which may take several clock cycles
• Read example: rd'/wr is set low, and the address is placed on addr for at least the t_setup time before enable is asserted; enable triggers the memory to place data on the data wires by time t_read

Q.2 Explain microprocessor interfacing: I/O addressing
ANS:- A microprocessor may have tens or hundreds of pins, many of which are control pins, such as a pin for clock input and another input pin for resetting the microprocessor. Many of the other pins are used to communicate data to and from the microprocessor, which we call processor I/O. There are two common methods for using pins to support I/O: ports and system buses. A port is a set of pins that can be read and written just like any register in the microprocessor; in fact, the port is usually connected to a dedicated register. For example, consider an 8-bit port named P0.
A C-language programmer may write to P0 using an instruction like P0 = 255, which would set all 8 pins to 1's. In this case, the C compiler manual would have defined P0 as a special variable that is automatically mapped to the register P0 during compilation. Conversely, the programmer might read the value of a port P1 being written by some other device by saying something like a = P1. In some microprocessors, each bit of a port can be configured as input or output by writing to a configuration register for the port. For example, P0 might have an associated configuration register called CP0. To set the high-order four bits to input and the low-order four bits to output, we might say CP0 = 15. This writes 00001111 to the CP0 register, where a 0 means input and a 1 means output. Ports are often bit-addressable, meaning that a programmer can read or write specific bits of the port. For example, one might say x = P0.2 (a compiler-specific notation), giving x the value of bit 2 of port P0. Port-based I/O is also called parallel I/O.
• Port-based I/O (parallel I/O)
– Processor has one or more N-bit ports
– Processor's software reads and writes a port just like a register
– E.g., P0 = 0xFF; v = P1.2; where P0 and P1 are 8-bit ports
• Bus-based I/O
– Processor has address, data, and control ports that form a single bus
– Communication protocol is built into the processor
– A single instruction carries out the read or write protocol on the bus
Types of bus-based I/O: memory-mapped I/O and standard I/O
• Memory-mapped I/O
– Peripheral registers occupy addresses in the same address space as memory
– E.g., the bus has a 16-bit address: the lower 32K addresses may correspond to memory, and the upper 32K addresses to peripherals
• Standard I/O (I/O-mapped I/O)
– An additional pin (M/IO) on the bus indicates whether the access is to memory or to a peripheral
– E.g., the bus has a 16-bit address: all 64K addresses correspond to memory when M/IO is 0, and all 64K addresses correspond to peripherals when M/IO is 1
Memory-mapped I/O vs. standard I/O
• Memory-mapped I/O
– Requires no special instructions: assembly instructions involving memory, like MOV and ADD, work with peripherals as well
– Standard I/O, in contrast, requires special instructions (e.g., IN, OUT) to move data between peripheral registers and memory
• Standard I/O
– No loss of memory addresses to peripherals
– Simpler address decoding logic in peripherals is possible: when the number of peripherals is much smaller than the address space, high-order address bits can be ignored, giving smaller and/or faster comparators
Figure (ISA bus):
• ISA supports standard I/O
– /IOR is distinct from /MEMR for peripheral reads; /IOW is used for writes
– 16-bit address space for I/O vs.
20-bit address space for memory
– Otherwise very similar to the memory protocol

Q.3 Explain microprocessor interfacing: interrupts
ANS:
• Suppose a peripheral intermittently receives data, which must be serviced by the processor
– The processor can poll the peripheral regularly to see if data has arrived, which is wasteful
– The peripheral can interrupt the processor when it has data
• Requires an extra pin or pins: Int
– If Int is 1, the processor suspends the current program and jumps to an Interrupt Service Routine, or ISR
– Known as interrupt-driven I/O
– Essentially, "polling" of the interrupt pin is built into the hardware, so it costs no extra time
• What is the address (interrupt address vector) of the ISR?
– Fixed interrupt: the address is built into the microprocessor and cannot be changed; either the ISR is stored at that address or, if not enough bytes are available, a jump to the actual ISR is stored there
– Vectored interrupt: the peripheral must provide the address; common when the microprocessor has multiple peripherals connected by a system bus
– Compromise: interrupt address table
(Figure: interrupt-driven I/O using a fixed ISR location)

Q.4 Short note on Direct Memory Access (DMA)
ANS:
• Buffering
– Temporarily storing data in memory before processing
– Data accumulated in peripherals is commonly buffered
• The microprocessor could handle this with an ISR
– Storing and restoring the microprocessor state is inefficient
– The regular program must wait
• A DMA controller is more efficient
– A separate single-purpose processor
– The microprocessor relinquishes control of the system bus to the DMA controller
– The microprocessor can meanwhile execute its regular program: there is no inefficient storing and restoring of state due to an ISR call, and the regular program need not wait unless it requires the system bus
– With a Harvard architecture, the processor can fetch and execute instructions as long as they don't access data memory; if they do, the processor stalls
(Figures: peripheral-to-memory transfer without DMA, using a vectored interrupt; peripheral-to-memory transfer with DMA)

Q.5 Short note on arbitration
ANS: Types
of arbitration
Priority arbiter:
• Consider the situation where multiple peripherals simultaneously request service from a single resource (e.g., a microprocessor or DMA controller): which gets serviced first?
• Priority arbiter
– A single-purpose processor
– Peripherals make requests to the arbiter, and the arbiter makes requests to the resource
– The arbiter is connected to the system bus for configuration only
1. The microprocessor is executing its program.
2. Peripheral1 needs servicing, so it asserts Ireq1. Peripheral2 also needs servicing, so it asserts Ireq2.
3. The priority arbiter sees at least one Ireq input asserted, so it asserts Int.
4. The microprocessor stops executing its program and stores its state.
5. The microprocessor asserts Inta.
6. The priority arbiter asserts Iack1 to acknowledge Peripheral1.
7. Peripheral1 puts its interrupt address vector on the system bus.
8. The microprocessor jumps to the address of the ISR read from the data bus; the ISR executes and returns (and completes the handshake with the arbiter).
9. The microprocessor resumes executing its program.
Daisy-chain arbitration:
• Arbitration done by the peripherals
– Built into the peripheral, or external logic added
– A req input and an ack output are added to each peripheral
• Peripherals connected to each other in a daisy-chain manner
– One peripheral is connected to the resource; all others are connected "upstream"
– A peripheral's req flows "downstream" to the resource; the resource's ack flows "upstream" to the requesting peripheral
– The closest peripheral has the highest priority
• Pros/cons
– Easy to add or remove a peripheral; no system redesign needed
– Does not support rotating priority
– One broken peripheral can cause loss of access to other peripherals
Network-oriented arbitration:
• Used when multiple microprocessors share a bus (sometimes called a network)
– Arbitration is typically built into the bus protocol
– Separate processors may try to write simultaneously, causing collisions; the data must then be resent, and the processors should not start sending again at the same time, so statistical methods can be used to reduce the chances of another collision
• Typically used for connecting multiple distant chips
– Trend: use it to connect multiple on-chip processors

Q.6 Short note on multilevel bus architectures
ANS:-

Q.7 Explain the types of communication [when this question is asked, include just the three types of communication] OR Advanced communication principles [when this question is asked, also include layering and error detection and correction]
ANS:- Three types of communication:
• Parallel communication: the physical layer is capable of transporting multiple bits of data
• Serial communication: the physical layer transports one bit of data at a time
• Wireless communication: no physical connection is needed for transport at the physical layer
Parallel communication:
• Multiple data, control, and possibly power wires, one bit per wire
• High data throughput over short distances
• Typically used when connecting devices on the same IC or the same circuit board
– The bus must be kept short: long parallel wires result in high capacitance values, which require more time to charge/discharge
– Data misalignment
between wires increases as length increases
• Higher cost, bulky
Serial communication:
• Single data wire, possibly also control and power wires
• Words transmitted one bit at a time
• Higher data throughput over long distances
– Less average capacitance, so more bits per unit of time
• Cheaper, less bulky
• More complex interfacing logic and communication protocol
– The sender needs to decompose a word into bits
– The receiver needs to recompose bits into a word
– Control signals are often sent on the same wire as data, increasing protocol complexity
Wireless communication:
• Infrared (IR)
– Electromagnetic wave frequencies just below the visible light spectrum
– A diode emits infrared light to generate the signal
– An infrared transistor detects the signal, conducting when exposed to infrared light
– Cheap to build
– Needs line of sight; limited range
• Radio frequency (RF)
– Electromagnetic wave frequencies in the radio spectrum
– Analog circuitry and an antenna are needed on both sides of the transmission
– Line of sight not needed; transmitter power determines range

Q.8 What are layering and error detection and correction?
ANS:
Layering
• Breaks the complexity of a communication protocol into pieces that are easier to design and understand
• Lower levels provide services to higher levels; a lower level might work with bits while a higher level works with packets of data
• Physical layer
– Lowest level in the hierarchy
– The medium that carries data from one actor (device or node) to another
Error detection and correction
• Often part of the bus protocol
• Error detection: the ability of the receiver to detect errors during transmission
• Error correction: the ability of the receiver and transmitter to cooperate to correct the problem, typically done by an acknowledgement/retransmission protocol
• Bit error: a single bit is inverted
• Burst bit error: consecutive bits received incorrectly
• Parity: an extra bit sent with a word, used for error detection
– Odd parity: the data word plus parity bit contains an odd number of 1's
– Even parity: the data word plus parity bit contains an even number
of 1's
– Always detects single-bit errors, but not all burst bit errors
• Checksum: an extra word sent with a data packet of multiple words, e.g., the extra word contains the XOR sum of all data words in the packet

Q.9 Discuss various serial communication protocols
ANS:
Serial protocols: I2C
• I2C (Inter-IC)
– Two-wire serial bus protocol developed by Philips Semiconductors in the early 1980s
– Enables peripheral ICs to communicate using simple communication hardware
– Data transfer rates up to 100 kbits/s and 7-bit addressing in normal (standard) mode
– Up to 3.4 Mbits/s in high-speed mode; 10-bit addressing is also supported
– Common devices capable of interfacing to the I2C bus: EPROMs, Flash, and some RAM memory, real-time clocks, watchdog timers, and microcontrollers
(Figure: I2C bus structure)
Serial protocols: CAN
• CAN (Controller Area Network)
– Protocol for real-time applications
– Developed by Robert Bosch GmbH
– Originally for communication among components of cars
– Applications now using CAN include elevator controllers, copiers, telescopes, production-line control systems, and medical instruments
– Data transfer rates up to 1 Mbit/s and 11-bit addressing
– Common devices interfacing with CAN: 8051-compatible 8592 processor and standalone CAN controllers
– The actual physical design of the CAN bus is not specified in the protocol; it requires devices to transmit/detect dominant and recessive signals to/from the bus
– E.g., in standard CAN a logic 0 is dominant and a logic 1 is recessive
– The bus guarantees that a dominant signal prevails over a recessive signal if both are asserted simultaneously
Serial protocols: FireWire
• FireWire (a.k.a. i.LINK, Lynx, IEEE 1394)
– High-performance serial bus developed by Apple Computer Inc.
– Designed for interfacing independent electronic components, e.g., desktop, scanner
– Data transfer rates from 12.5 to 400 Mbits/s, 64-bit addressing
– Plug-and-play capabilities
– Packet-based layered design structure
– Applications using FireWire include disk drives, printers, scanners, cameras
– Capable of supporting a LAN similar to Ethernet
– 64-bit address: 10 bits for network ids (1023 subnetworks), 6 bits for node ids (each subnetwork can have 63 nodes), and 48 bits for memory address (each node can have 281 terabytes of distinct locations)
Serial protocols: USB
• USB (Universal Serial Bus)
– Easier connection between a PC and monitors, printers, digital speakers, modems, scanners, digital cameras, joysticks, and multimedia game equipment
– 2 data rates: 12 Mbps for increased-bandwidth devices, and 1.5 Mbps for lower-speed devices (joysticks, game pads)
– Tiered star topology can be used: one USB device (hub) is connected to the PC; the hub can be embedded in devices like a monitor, printer, or keyboard, or can be standalone; multiple USB devices can be connected to the hub; up to 127 devices can be connected this way
– USB host controller: manages and controls the bandwidth and driver software required by each peripheral, and dynamically allocates power downstream according to the devices connected/disconnected

Q.10 Discuss various parallel communication protocols
ANS:
Parallel protocols: PCI bus
• PCI bus (Peripheral Component Interconnect)
– High-performance bus originated at Intel in the early 1990s
– Standard adopted by industry and administered by the PCISIG (PCI Special Interest Group)
– Interconnects chips, expansion boards, and processor memory subsystems
– Data transfer rates of about 127.2 to 508.6 Mbytes/s (1 Mbyte = 2^20 bytes) and 32-bit addressing, later extended to 64-bit while maintaining compatibility with 32-bit schemes
– Synchronous bus architecture
– Multiplexed data/address lines
Parallel protocols: ARM bus
• ARM bus
– Designed and used internally by ARM Corporation
– Interfaces with the ARM line of
processors
– Many IC design companies have their own bus protocol
– Data transfer rate is a function of clock speed: if the clock speed of the bus is X, the transfer rate is 16 x X bits/s
– 32-bit addressing

Q.11 Discuss various wireless communication protocols
ANS:-
Wireless protocols: IrDA
– Protocol suite that supports short-range, point-to-point infrared data transmission
– Created and promoted by the Infrared Data Association (IrDA)
– Data transfer rates from 9.6 kbps up to 4 Mbps
– IrDA hardware deployed in notebook computers, printers, PDAs, digital cameras, public phones, cell phones
– Lack of suitable drivers has slowed use by applications
– Windows 2000/98 include support
– Becoming available on popular embedded OSs
Wireless protocols: Bluetooth
• Bluetooth
– New, global standard for wireless connectivity
– Based on a low-cost, short-range radio link
– Connection established when devices are within 10 meters of each other
– No line of sight required, e.g., connect to a printer in another room
Wireless protocols: IEEE 802.11
• IEEE 802.11
– Standard for wireless LANs
– Specifies parameters for the PHY and MAC layers of the network
– PHY layer (physical layer): handles transmission of data between nodes; provisions for data transfer rates of 1 or 2 Mbps; operates in the 2.4 to 2.4835 GHz frequency band (RF) or 300 to 428,000 GHz (IR)
– MAC layer (medium access control layer): protocol responsible for maintaining order in the shared medium; collision avoidance/detection

Unit 7:- Digital Camera Example
Q.1 Explain the digital camera example with all its functionality
ANS: (Figure: digital camera)
The following points are required to discuss the digital camera:
• Introduction to a simple digital camera
• Designer's perspective
• Requirements specification
• Design: four implementations
Introduction:
• Putting it all together
– General-purpose processor
– Single-purpose processor (custom or standard)
– Memory
– Interfacing
• Knowledge applied to designing a simple digital camera
– General-purpose vs.
single-purpose processors
– Partitioning of functionality among different processor types
• Captures images
• Stores images in digital format
– No film
– Multiple images stored in the camera; the number depends on the amount of memory and the bits used per image
• Downloads images to a PC
• Only recently possible
– Systems-on-a-chip: multiple processors and memories on one IC
– High-capacity flash memory
• Very simple description used for the example
– A real digital camera has many more features: variable-size images, image deletion, digital stretching, zooming in and out, etc.
Designer's perspective:
• Two key tasks
– Processing images and storing them in memory: when the shutter is pressed, the image is captured, converted to digital form by a charge-coupled device (CCD), and compressed and archived in internal memory
– Uploading images to a PC: the digital camera is attached to the PC, and special software commands the camera to transmit the archived images serially
Charge-coupled device (CCD)
• Special sensor that captures an image
• Light-sensitive silicon solid-state device composed of many cells
Zero-bias error
• Manufacturing errors cause cells to measure slightly above or below the actual light intensity
• The error is typically the same across columns, but different across rows
• Some of the leftmost columns are blocked by black paint to detect the zero-bias error
– A reading other than 0 in the blocked cells is the zero-bias error
– Each row is corrected by subtracting the average error found in the blocked cells for that row
Compression
• Store more images
• Transmit images to the PC in less time
• JPEG (Joint Photographic Experts Group)
– Popular standard format for representing digital images in a compressed form
– Provides for a number of different modes of operation
– The mode used in this example provides high compression ratios using the DCT (discrete cosine transform)
– Image data divided into blocks of 8 x 8 pixels
– 3 steps performed on each block: DCT, quantization, Huffman encoding
Uploading to PC
• When connected to the PC and an upload command is received
– Read
images from memory – Transmit serially using UART – While transmitting • Reset pointers, image-size variables and global memory pointer accordingly Requirements Specification • System’s requirements – what system should do – Nonfunctional requirements • Constraints on design metrics (e.g., “should use 0.001 watt or less”) – Functional requirements • System’s behavior (e.g., “output X should be input Y times 2”) – Initial specification may be very general and come from marketing dept. • E.g., short document detailing market need for a low-end digital camera that: – captures and stores at least 50 low-res images and uploads to PC, – costs around $100 with single medium-size IC costing less that $25, – has long as possible battery life, – has expected sales volume of 200,000 if market entry < 6 months, – 100,000 if between 6 and 12 months, – insignificant sales beyond 12 months Nonfunctional requirements:- • Design metrics of importance based on initial specification • – Performance: time required to process image – Size: number of elementary logic gates (2-input NAND gate) in IC – Power: measure of avg. 
electrical energy consumed while processing – Energy: battery lifetime (power x time) Constrained metrics – Values must be below (sometimes above) certain threshold Optimization metrics – Improved as much as possible to improve product Metric can be both constrained and optimization • • • • • • Performance – Must process image fast enough to be useful – 1 sec reasonable constraint • Slower would be annoying • Faster not necessary for low-end of market – Therefore, constrained metric Size – Must use IC that fits in reasonably sized camera – Constrained and optimization metric • Constraint may be 200,000 gates, but smaller would be cheaper Power – Must operate below certain temperature (cooling fan not possible) – Therefore, constrained metric Energy – Reducing power or time reduces energy – Optimized metric: want battery to last as long as possible Informal functional specification:- • • • • • Flowchart breaks functionality down into simpler functions Each function’s details could then be described in English – Done earlier in chapter Low quality image has resolution of 64 x 64 Mapping functions to a particular processor type not done at this stage Refined functional specification • • • • Refine informal specification into one that can actually be executed Can use C/C++ code to describe each function – Called system-level model, prototype, or simply model – Also is first implementation Can provide insight into operations of system – Profiling can find computationally intensive functions Can obtain sample output used to verify correctness of final implementation Executable model of digital camera Design • • • • Determine system’s architecture – Processors • Any combination of single-purpose (custom or standard) or general-purpose processors – Memories, buses Map functionality to that architecture – Multiple functions on one processor – One function on one or more processors Implementation – A particular architecture and mapping – Solution space is set of all 
implementations Starting point – Low-end general-purpose processor connected to flash memory • All functionality mapped to software running on processor • Usually satisfies power, size, and time-to-market constraints • If timing constraint not satisfied then later implementations could: – use single-purpose processors for time-critical functions – rewrite functional specification Implementation 1: Microcontroller alone:- • • • • • Low-end processor could be Intel 8051 microcontroller Total IC cost including NRE about $5 Well below 200 mW power Time-to-market about 3 months However, one image per second not possible – 12 MHz, 12 cycles per instruction • Executes one million instructions per second – CcdppCapture has nested loops resulting in 4096 (64 x 64) iterations • ~100 assembly instructions each iteration • 409,000 (4096 x 100) instructions per image • Half of budget for reading image alone – Would be over budget after adding compute-intensive DCT and Huffman encoding Implementation 2: Microcontroller and CCDPP • • • CCDPP function implemented on custom single-purpose processor – Improves performance – less microcontroller cycles – Increases NRE cost and time-to-market – Easy to implement • Simple datapath • Few states in controller – Simple UART easy to implement as single-purpose processor also EEPROM for program memory and RAM for data memory added as well Microcontroller • • • • • • Synthesizable version of Intel 8051 available – Written in VHDL – Captured at register transfer level (RTL) Fetches instruction from ROM Decodes using Instruction Decoder ALU executes arithmetic operations – Source and destination registers reside in RAM Special data movement instructions used to load and store externally Special program generates VHDL description of ROM from output of C compiler/linker UART • • • • UART in idle mode until invoked – UART invoked when 8051 executes store instruction with UART’s enable register as target address • Memory-mapped communication 
between 8051 and all single-purpose processors • Lower 8-bits of memory address for RAM • Upper 8-bits of memory address for memory-mapped I/O devices Start state transmits 0 indicating start of byte transmission then transitions to Data state Data state sends 8 bits serially then transitions to Stop state Stop state transmits 1 indicating transmission done then transitions back to idle mode CCDPP • • • • • • • • Hardware implementation of zero-bias operations Interacts with external CCD chip – CCD chip resides external to our SOC mainly because combining CCD with ordinary logic not feasible Internal buffer, B, memory-mapped to 8051 Variables R, C are buffer’s row, column indices GetRow state reads in one row from CCD to B – 66 bytes: 64 pixels + 2 blacked-out pixels ComputeBias state computes bias for that row and stores in variable Bias FixBias state iterates over same row subtracting Bias from each element NextRow transitions to GetRow for repeat of process on next row or to Idle state when all 64 rows completed Software • • • System-level model provides majority of code – Module hierarchy, procedure names, and main program unchanged Code for UART and CCDPP modules must be redesigned – Simply replace with memory assignments • xdata used to load/store variables over external memory bus • _at_ specifies memory address to store these variables • Byte sent to U_TX_REG by processor will invoke UART • U_STAT_REG used by UART to indicate its ready for next byte – UART may be much slower than processor – Similar modification for CCDPP code All other modules untouched Analysis • • Entire SOC tested on VHDL simulator – Interprets VHDL descriptions and functionally simulates execution of system • Recall program code translated to VHDL description of ROM – Tests for correct functionality – Measures clock cycles to process one image (performance) Gate-level description obtained through synthesis – Synthesis tool like compiler for SPPs – Simulate gate-level models to obtain 
data for power analysis • Number of times gates switch from 1 to 0 or 0 to 1 – Count number of gates for chip area Implementation 2: Microcontroller and CCDPP • Analysis of implementation 2 – Total execution time for processing one image: • 9.1 seconds – Power consumption: • 0.033 watt – Energy consumption: • 0.30 joule (9.1 s x 0.033 watt) – Total chip area: • 98,000 gates Implementation 3: Microcontroller and CCDPP/Fixed-Point DCT • • 9.1 seconds still doesn’t meet performance constraint of 1 second DCT operation prime candidate for improvement – Execution of implementation 2 shows microprocessor spends most cycles here – Could design custom hardware like we did for CCDPP • More complex so more design effort – Instead, will speed up DCT functionality by modifying behavior DCT floating-point cost:• • Floating-point cost – DCT uses ~260 floating-point operations per pixel transformation – 4096 (64 x 64) pixels per image – 1 million floating-point operations per image – No floating-point support with Intel 8051 • Compiler must emulate – Generates procedures for each floating-point operation • mult, add – Each procedure uses tens of integer operations – Thus, > 10 million integer operations per image – Procedures increase code size Fixed-point arithmetic can improve on this Fixed-point arithmetic:- • • Integer used to represent a real number – Constant number of integer’s bits represents fractional portion of real number • More bits, more accurate the representation – Remaining bits represent portion of real number before decimal point Translating a real constant to a fixed-point representation – Multiply real value by 2 ^ (# of bits used for fractional part) – Round to nearest integer – E.g., represent 3.14 as 8-bit integer with 4 bits for fraction • 2^4 = 16 • 3.14 x 16 = 50.24 ≈ 50 = 00110010 • 16 (2^4) possible values for fraction, each represents 0.0625 (1/16) • Last 4 bits (0010) = 2 • 2 x 0.0625 = 0.125 • 3(0011) + 0.125 = 3.125 ≈ 3.14 (more bits for fraction 
would increase accuracy) Fixed-point arithmetic operations:• • • Addition – Simply add integer representations – E.g., 3.14 + 2.71 = 5.85 • 3.14 → 50 = 00110010 • 2.71 → 43 = 00101011 • 50 + 43 = 93 = 01011101 • 5(0101) + 13(1101) x 0.0625 = 5.8125 ≈ 5.85 Multiply – Multiply integer representations – Shift result right by # of bits in fractional part – E.g., 3.14 * 2.71 = 8.5094 • 50 * 43 = 2150 = 100001100110 • >> 4 = 10000110 • 8(1000) + 6(0110) x 0.0625 = 8.375 ≈ 8.5094 Range of real values used limited by bit widths of possible resulting values Implementation 4: Microcontroller and CCDPP/DCT • Analysis of implementation 4 – Total execution time for processing one image: • 0.099 seconds (well under 1 sec) – Power consumption: • 0.040 watt • Increase over 2 and 3 because SOC has another processor – Energy consumption: • 0.00040 joule (0.099 s x 0.040 watt) • Battery life 12x longer than previous implementation!! – Total chip area: • 128,000 gates • Significant increase over previous implementations Unit 8:- Embedded Software Development Tools Q.1 what is Host and Target Machine Also Explain Tools of Host and Target Machine ANS:-  Host: Where the embedded software is developed, compiled, tested, debugged, optimized, and prior to its translation into target device. (Because the host has keyboards, editors, monitors, printers, more memory, etc. for development, while the target may have not of these capabilities for developing the software.) 
• Target: after development, the code is cross-compiled, translated (cross-assembled), linked (into the target processor's instruction set) and located into the target.

The following tools are required for the target machine:
• Cross-compilers
– Native tools are good for the host, but to port/locate embedded code to the target, the host must have a tool chain that includes a cross-compiler: one which runs on the host but produces code for the target processor
– Cross-compiling doesn't guarantee correct target code because of differences in, e.g., word sizes, instruction sizes, variable declarations and library functions
• Cross-assemblers and tool chain
– The host uses a cross-assembler to assemble code written in the target's instruction syntax for the target
– A tool chain is a collection of compatible translation tools, which are 'pipelined' to produce complete binary/machine code that can be linked and located into the target processor (see Fig 9.1)

Q.2 Discuss linkers/locators for embedded software
ANS:-
• Native linkers are different from cross-linkers (or locators), which perform additional tasks to locate embedded binary code into target processors
Address resolution
• Native linker: produces host machine code on the hard drive (in a named file), which the loader loads into RAM and then schedules (under OS control) to run on the CPU
– In RAM, the application program's logical addresses for, e.g., variables/operands and function calls are ordered or organized by the linker. The loader then maps the logical addresses into physical addresses, a process called address resolution, and loads the code accordingly into RAM (see Fig 9.2). In the process, the loader also resolves the addresses for calls to the native OS routines.
• Locator: produces target machine code (which the locator glues into the RTOS), and the combined code (called the map) gets copied into the target ROM.
– The locator does not stay in the target environment, so all addresses are resolved, guided by locating tools and directives, before the code runs (see Fig)
Locating program components: segments
• The unchanging embedded program (binary code) and constants must be kept in ROM so that they are retained even at power-off
• Changing program segments (e.g., variables) must be kept in RAM
• Tool chains separate program parts using the segment concept
• Tool chains for embedded systems also require 'start-up' code to be in a separate segment and 'located' at a microprocessor-defined location where the program starts execution
• Some cross-compilers have defaults or allow the programmer to specify segments for program parts, but cross-assemblers have no default behavior and the programmer must specify segments for program parts
See Fig: locating of object-code segments in ROM and RAM

Q.3 Explain different ways of getting embedded software into the target system
ANS:-
1. PROM programmers
• One way of moving maps into ROM or PROM is to create a ROM using hardware tools, or to use a PROM programmer (for small and changeable software, during debugging)
• If a PROM programmer is used (for changing or debugging software), place the PROM in a socket (which makes it erasable, for EPROM, or removable/replaceable) rather than 'burning' it into the circuitry
• PROMs can be pushed into sockets by hand, and pulled out using a chip puller
• The PROM programmer must be compatible with the format (syntax/semantics) of the map
See Fig

2. ROM emulators
• Another approach is to use a ROM emulator (hardware), which emulates the target system's ROM: it has all the ROM circuitry and a serial or network interface to the host system. The locator loads the map into the emulator, especially for debugging purposes
• Software on the host that loads the map file into the emulator must understand (be compatible with) the map's syntax/semantics
See Fig

3. Using flash memory
• For debugging, a flash memory can be loaded with target map code by software on the host over a serial port or network connection (just like using an EPROM)
• Advantages:
– No need to pull the flash (unlike a PROM) for debugging different embedded code
– Transferring code into flash (over a network) is faster and hassle-free
– New versions of embedded software (supplied by the vendor) can be loaded into flash memory by customers over a network. This requires (a) protecting the flash programmer, by saving it in RAM, executing it from there, and reloading it into flash after the new version is written, and (b) the ability to complete loading the new version even if there are crashes, protecting the start-up code as in (a)
• Modifying and/or debugging the flash programming software requires moving it into RAM, modifying/debugging it, and reloading it into the target flash memory using the above methods

4. Monitors
• Another option on systems with a communication port is to use a monitor: a program that resides in the target ROM and knows how to load new programs onto the system

Unit 9:- Debugging Techniques
Q.1 Why test on your host machine? Explain the goals of the typical testing process
ANS:- Goals of the testing process
• Store test results (the target may not even have a disk drive to store results)

Q.2 Explain basic techniques for testing on your host machine
ANS:-
Testing on host machine: basic techniques 1
• Target system (on the left): hardware-independent code, hardware-dependent code, hardware
• Test system on the host (on the right): the same hardware-independent code, plus a scaffold in place of the rest
• The scaffold provides, in software, all the functionality of (and calls to) the hardware-dependent code and hardware components of the target system; it acts more like a simulator for them
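The scaffold idea above can be sketched in C. This is a minimal, hypothetical example, not code from the source: vTurnOnTransmitter() and vTurnOffTransmitter() are borrowed from the radio example in this unit, while vSendByte(), vSendMessage(), its length/payload/checksum framing, and iTestMain() (a variant of the vTestMain() scaffold driver) are invented for illustration. The hardware-independent vSendMessage() would compile unchanged for the target; only the three hardware-dependent functions are replaced by a scaffold that records what the "radio" would have sent, so the host can check it.

```c
#include <assert.h>
#include <string.h>

/* Hardware-dependent interface. On the target these would live in radiohw.c
 * and drive real ports; vSendByte() is a hypothetical name. */
void vTurnOnTransmitter(void);
void vTurnOffTransmitter(void);
void vSendByte(unsigned char byte);

/* Hardware-independent code (same source on host and target).
 * Hypothetical framing: length byte, payload bytes, 8-bit additive checksum. */
void vSendMessage(const unsigned char *pData, unsigned char length)
{
    unsigned char checksum = 0;
    unsigned char i;

    vTurnOnTransmitter();
    vSendByte(length);
    for (i = 0; i < length; ++i) {
        vSendByte(pData[i]);
        checksum = (unsigned char)(checksum + pData[i]);
    }
    vSendByte(checksum);
    vTurnOffTransmitter();
}

/* ---------- Test scaffold: host-only replacement for radiohw.c ---------- */
#define LOG_SIZE 64

static unsigned char aLog[LOG_SIZE]; /* bytes the "radio" would have sent */
static int iLogCount;                /* number of bytes recorded */
static int fTransmitterOn;           /* simulated transmitter state */
static int iTurnOnCount;             /* how often the transmitter was enabled */

void vTurnOnTransmitter(void)
{
    fTransmitterOn = 1;
    ++iTurnOnCount;
}

void vTurnOffTransmitter(void)
{
    fTransmitterOn = 0;
}

void vSendByte(unsigned char byte)
{
    assert(fTransmitterOn);       /* sending with the transmitter off is a bug */
    assert(iLogCount < LOG_SIZE); /* protect the scaffold's own buffer */
    aLog[iLogCount++] = byte;
}

/* Scaffold driver (hypothetical): returns 0 if the hardware-independent
 * code drove the simulated hardware as expected, nonzero otherwise. */
int iTestMain(void)
{
    const unsigned char msg[] = { 0x10, 0x20, 0x30 };
    const unsigned char expected[] = { 3, 0x10, 0x20, 0x30, 0x60 };

    iLogCount = 0;
    iTurnOnCount = 0;
    fTransmitterOn = 0;

    vSendMessage(msg, (unsigned char)sizeof msg);

    if (fTransmitterOn) return 1;               /* transmitter left on */
    if (iTurnOnCount != 1) return 2;            /* not enabled exactly once */
    if (iLogCount != (int)sizeof expected) return 3;
    if (memcmp(aLog, expected, sizeof expected) != 0) return 4;
    return 0;
}
```

On the host, a main() can simply call iTestMain() and report any nonzero result; on the target, radiohw.c would supply real versions of the three hardware-dependent functions instead, and the scaffold section would not be linked in. The scaffold also uses assert (see Q.4) to flag a byte sent while the simulated transmitter is off.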
Fig:- Testing on host machine: basic techniques 2
• Radio.c: hardware-independent code
• Radiohw.c: hardware-dependent code (the only interface to the hardware: inp() and outp(), supporting the vTurnOnTransmitter() and vTurnOffTransmitter() functions)
• inp() and outp() must contain real hardware code to read/write byte data correctly, which makes testing harder
• Replace radiohw.c with a scaffold, eliminating the need for inp() and outp(): both are simulated in software as a program stub

Testing on host machine: basic techniques 3
Calling interrupt routines
• Embedded systems are interrupt-driven, so tests must exercise the interrupt routines:
1) Divide interrupt routines into two components:
   A) a component that deals with the hardware
   B) a component that deals with the rest of the system
2) To test, structure the routine so that the hardware-dependent component (A) calls the hardware-independent part (B)
3) Write component B in C, so that the test scaffold can call it
• The hardware component (A) is vHandleRxHardware(), which reads characters from the hardware
• The software component (B) is vHandleByte(), called by A to buffer characters, among other things
• The test scaffold, vTestMain(), then calls vHandleByte() to test whether the system works [vTestMain() pretends to be the hardware sending the characters to vHandleByte()]

Testing on host machine: basic techniques 4
• Calling the timer interrupt routine
– Design the test scaffold to call the timer interrupt routine directly, rather than relying on another part of the host environment, to avoid interruptions in the scaffold's timing of events
– This way, the scaffold controls the sequences of events in the test that must occur within intervals of timer interrupts
• Script files and output files
– To let the scaffold test the system in some sequence or repeatedly, write a script file (of commands and parameters) to control the test
– Parse the script file, test the system based on the commands/parameters, and direct the output (an intermixture of the input-script and output lines) into an output file
– The commands in the script cause the scaffold to call routines in the B (hardware-independent) component (see Fig 10.5 and Fig 10.6 for the cordless bar-code scanner)

Testing on host machine: basic techniques 5
• More advanced techniques
– Making the scaffold automatically control the sequence of events, e.g., calling the printer interrupt many times but in a controlled order to avoid swamping
– Making the scaffold automatically queue up requests to send output lines by automatically controlling the button interrupt routine, so that each successive button press lets the next output line be received from the hardware (the printer interrupt routine). In this way, the hardware-independent software is controlled by the scaffold, with the button interrupts serving as a switch
– The scaffold may contain multiple instances of the hardware-independent code and act as a controller of the communication between the instances: each instance is called by the scaffold when its hardware interrupt occurs (e.g., the scanner or the cash register). In this way, the scaffold simulates the hardware (scanner or register) and provides communication services to the hardware-independent code instances it calls.
Fig

Testing on host machine: basic techniques 6
• Objections, limitations and shortcomings
1) It is hard to test parts that are truly hardware-dependent until the target system is operational.
   Yet it is still good to test most hardware-independent parts on the host (see Fig 10.8)
2) Time and effort in writing the scaffold: even if huge, it is worthwhile
3) Having the scaffold run on the host and its RTOS: the scaffold can run as a low-priority task within the RTOS, giving a nicely integrated testing environment
4) Limitations that are hard to justify: problems that can't be detected in the scaffold until the actual test, such as:
   • Writing to the wrong hardware address (software/hardware interactions)
   • Realistic interrupt latency, due to differences in processor speeds (host vs. target)
   • Real interrupts that cause shared-data problems, where real enable/disable is the key
   • Differences in network addressing, sizes of data types and data-packing schemes (portability issues)

Q.3 Explain instruction set simulators and their useful abilities
ANS:- Instruction set simulators
• Use software to simulate:
– The target microprocessor's instruction set
– The target memory (types, e.g., RAM)
– The target microprocessor architecture (interconnections and components)
• The simulator must understand the linker/locator map format, and parse and interpret it
• The simulator takes the map as input, reads the instructions from simulated ROM, and reads/writes from/to simulated registers
• Provide a user interface to the simulator for I/O and debugging (using, e.g., a macro language)
Instruction set simulators: 1
• Capabilities of simulators:
– Collect statistics on the number of instructions executed and bus cycles, for estimating actual times
– Easier to test assembly code (for start-up software and interrupt routines) in the simulator
– Easier to test for portability, since the simulator takes the same map as the target
– Other parts, e.g., timers and built-in peripherals, can be tested in their corresponding simulated versions in the simulated microprocessor architecture
• What simulators can't help with:
– Simulating and testing ASICs, sensors, actuators and specialized radios (perhaps in future systems)
– Lack of I/O interfaces in the simulator to support the testing techniques discussed (unless additional provision is made for I/O to support the scaffold, and for scripts to format and reformat files between the simulator, simulated memory and the scaffold)

Q.4 Explain the assert macro
ANS: The assert macro
• assert is used (with a Boolean-expression parameter) to check assumptions
• If the expression is TRUE, nothing happens; if it is FALSE, a message is printed and the program aborts
• assert works well for finding bugs early, when testing in the host environment
• On failure, assert causes a return to the host operating system (this can't be done on the target, and the message can't be printed there either, since the target may have no display unit)
• Versions of assert that run on the target are still useful for spotting problems; on failure they can:
1) disable interrupts and spin in an infinite loop, effectively stopping the system
2) turn on some pattern of LEDs or a blinking device
3) write a special code to memory for a logic analyzer to read
4) write the location of the instruction that caused the problem to a specific memory location for a logic analyzer to read (the map can help isolate which source code is the culprit)
5) execute an illegal opcode or something similar that stops the system, e.g., when using in-circuit emulators
Example:-

Q.5 Explain various laboratory tools
ANS:-
1. Voltmeters and ohmmeters
If you have any doubts about the correctness or reliability of the hardware on which you are testing your software, use a voltmeter to measure the voltage difference between two points and an ohmmeter to measure the resistance between two points.
2. Oscilloscopes
• Oscilloscopes (scopes) test events that repeat periodically: they monitor one or two signals (a graph of voltage vs. time), have a triggering mechanism to indicate the start of monitoring, and let you adjust the vertical position to locate the ground signal
• Can be used as a voltmeter (a flat graph at some vertical level relative to the ground signal)
• Test whether a device/part is working: is the graph flat?
• Is the digital signal coming through? Expect a quick rising/falling edge (from 0 to VCC or from VCC to 0); if not, the scope will show a slow rise/fall, indicating loading, a bus fight, or another hardware problem
3. Logic analyzer
• Like a storage scope that first captures many signals and then displays them simultaneously
• It knows only the VCC and ground voltage levels (its displays are like timing diagrams), whereas real scopes display the exact voltage (like analog)
• Can be used to trigger on a symptom and track back in the stored signal to isolate the problem
• Many signals can be triggered on at their low and/or high points and on how long they stay in that state
• Used in timing or state mode
• Logic analyzers in timing mode:
– Find out whether an event occurred: did the cordless scanner turn on the radio?
– Measure how long it took the software to respond to an interrupt (e.g., between a button interrupt signal and the activation signal of a responding device, such as one that turns off a bell)
– Is the software putting out the right pattern of signals to control a hardware device? Look back in the captured signal for elapsed time
4. In-circuit emulators (ICE)
• Replaces the target microprocessor in the target circuitry (with some engineering)
• Has all the capabilities of a software debugger
• Maintains a trace, similar to that of an LA
• Has overlay memory to emulate ROM and RAM for a specified range of addresses within the ICE (rather than the system's main ROM or RAM), which facilitates debugging
• ICE vs. LA
– LAs have better trace and filtering mechanisms, making it easier to detail and find problems
– LAs run in timing mode
– LAs work with any microprocessor; an ICE is microprocessor-specific
– LAs attach to many but selected signals; an ICE requires connecting ALL signals
– An ICE is more invasive
5. Software-only monitors
• Monitors allow running an embedded system in the target environment while providing debugging interfaces in both the host and target environments
• A small portion of the monitor resides in the target ROM (the debugging kernel or monitor):
– This code receives programs from the serial port or network, copies them into the target's RAM, and runs them with full debugging capabilities to test/debug the programs
• Another portion of the monitor resides on the host: it provides debugging capability and communicates with the debugging kernel over a serial port or network, without hardware modifications
• Compiled, linked (possibly located into a map) code is downloaded from the host (by the portion on the host) to the target RAM or flash (received by the kernel)
• Other designs: a ROM emulator interface, and a JTAG communication port on the target processor

Note to all students: this material covers 9 of the 10 units; read the 10th (last) unit from the book provided in the hard-copy material.
================= End================
