Introduction to DISTRIBUTED SYSTEMS
Tran, Van Hoai
Department of Systems & Networking
Faculty of Computer Science & Engineering
HCMC University of Technology
2009-2010

Outline
• Why are distributed systems needed?
• Examples
• Definitions
• Goals in building distributed systems

Why are distributed systems needed? (1)
• Functional distribution: computers have different functional capabilities
  – client/server
  – host/terminal
  – data gathering/data processing
  ⇒ sharing of resources with specific functionalities
• Inherent distribution: stemming from the application domain, e.g.,
  – cash register and inventory systems for supermarket chains
  – computer-supported collaborative work

Why are distributed systems needed? (2)
• Load distribution/balancing: assigning tasks to computers such that overall performance is optimized
• Replication of processing power: independent computers working on the same task
  – a collection of microcomputers may reach an aggregate processing power that no single supercomputer can match

Why are distributed systems needed? (3)
• Physical separation: relying on the fact that computers are physically separated (e.g., to satisfy reliability requirements)
• Economics: collections of microprocessors offer a better price/performance ratio than large mainframes
  – a mainframe may be about 10 times faster, but 1000 times as expensive

Examples (1)
• Network of workstations
  – all files are accessible from all machines in the same way, using the same path name
  – the system looks for the best place to execute a command
  ⇒ distributed system
• Workflow information system: automatic order processing
  – people from several departments at different locations
  – users are unaware of how an order is processed
  ⇒ distributed system

Examples (2)
• World Wide Web: offering a uniform model of distributed documents
  – in theory, users need not know where a document is fetched from
  – in practice, the location is visible to users (e.g., the server name is part of the URL)

Examples (3)
• Internet
  [Figure: intranets and ISPs joined by a backbone and a satellite link; legend: desktop computer, server, network link]
  – an interconnected collection of computer networks of many different types
  – computers interact by passing messages using a common means of communication

Examples (4)
• Intranet
  [Figure: a local area network connecting desktop computers, a Web server, email servers, a file server, print and other servers, linked through a router/firewall to the rest of the Internet]
  – resources are shared among different computers

Definitions (1)
“A system in which hardware or software components located at networked computers communicate and coordinate their actions only by passing messages”.
[Coulouris]
“A system that consists of a collection of two or more independent computers which coordinate their processing through exchange of synchronous or asynchronous message passing”.

Definitions (2)
“A distributed system is a collection of independent computers that appear to the users of the system as a single computer”.
[Tanenbaum]
“A distributed system is a collection of autonomous computers linked by a network with software designed to produce an integrated computing facility”.

Computer networks vs. distributed systems
• Computer network: the autonomous computers are explicitly visible (they have to be addressed explicitly)
• Distributed system: the existence of multiple computers is transparent
• However,
  – the two have many problems in common
  – in some sense, networks (or parts of them, e.g., name services) are also distributed systems
  – normally, every distributed system relies on services provided by a computer network

Which examples are distributed systems?
• Network of workstations ⇒ distributed system
• Workflow information system (automatic order processing) ⇒ distributed system
• World Wide Web ⇒ not fully qualified as a distributed system (Tanenbaum); a distributed system (Coulouris)

Middleware service
[Figure: distributed applications spanning Machines A, B and C, running on a middleware service layer that sits on top of each machine's local OS]
• Middleware is meant to
  – support heterogeneous computers
  – provide a single-system view to users

Goals in building distributed systems (1)
• Connecting users and resources
  – sharing resources
  – making it easier to collaborate and exchange information
  Disadvantages: security (intrusion) and privacy violation (communication tracking)

Goals in building distributed systems (2)
• Transparency

  Transparency   Description
  Access         Hide differences in data representation and how a resource is accessed
  Location       Hide where a resource is located
  Migration      Hide that a resource may move to another location
  Relocation     Hide that a resource may be moved to another location while in use
  Replication    Hide that a resource may have many copies
  Concurrency    Hide that a resource may be shared by several competitive users
  Failure        Hide the failure and recovery of a resource
  Persistence    Hide whether a (software) resource is in memory or on disk

• There is a tradeoff between a high degree of transparency and the performance of the system

Goals in building distributed systems (3)
• Openness
  – offering services according to standard rules that describe the syntax and semantics of those services
    • syntax specification: in an interface definition language (IDL)
    • semantic specification: in natural language
  – interoperability and portability
  – flexibility: combining different components from different developers

Goals in building distributed systems (4)
• Scalability
  – measured in three dimensions
    • size: more users and resources can be added easily
    • geographical: users and resources may lie far apart
    • administrative: still easy to manage, even when spanning many independent administrative organizations
  – some problems must be solved
    • size: centralization
      – centralized service: a single server for all users
      – centralized data: a single online telephone book
      – centralized algorithm: routing based on complete information
    • geographical: synchronous communication and unreliable wide-area links
      – some systems are designed only for LANs (blocking communication depends strongly on quick responses)
    • administrative: conflicting policies with respect to resource usage, management and security

Scaling techniques
• Asynchronous communication (hiding communication latency)
• Distribution (splitting a component into parts spread across the system, e.g., the DNS name space)
• Replication, caching (keeping copies of data close to where they are used)

Some numbers (1)
• Computers in the Internet

  Date         Computers      Web servers
  1979, Dec.   188            0
  1989, July   130,000        0
  1999, July   56,218,000     5,560,866
  2003, Jan.   171,638,297    35,424,956

Some numbers (2)
• Computers vs. Web servers in the Internet

  Date         Computers      Web servers    Web servers (% of computers)
  1993, July   1,776,000      130            0.008
  1995, July   6,642,000      23,500         0.4
  1997, July   19,540,000     1,203,096      6
  1999, July   56,218,000     6,598,697      12
  2001, July   125,888,197    31,299,592     25

Textbooks and materials
• Andrew S. Tanenbaum, Maarten van Steen, Distributed Systems: Principles and Paradigms, Second Edition, Prentice Hall, 2007
• George Coulouris, Jean Dollimore, Tim Kindberg, Distributed Systems: Concepts and Design, Fourth Edition, Addison-Wesley, 2005
• Google

How are you evaluated?
• Homework & quizzes: 30%
• Midterm exam: 30%
• Final exam: 40%

How to reach me
• hoai@cse.hcmut.edu.vn or hoaitv@gmail.com
• http://www.cse.hcmut.edu.vn/~hoai
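The definitions in these notes (Coulouris's in particular) say that the components of a distributed system coordinate their actions only by passing messages. The Python sketch below is a minimal, hypothetical illustration of that idea, not taken from either textbook: a "cash register" and an "inventory node" (invented names echoing the supermarket example) deliberately share no variables and learn about each other only through newline-delimited JSON messages. As assumptions, a connected socket pair stands in for the network link and a thread stands in for the second machine; on real hardware this would be a network connection between two computers.

```python
import json
import socket
import threading


def send(outbox, message: dict) -> None:
    """Send one message: a JSON object terminated by a newline."""
    outbox.write(json.dumps(message) + "\n")
    outbox.flush()


def inventory_node(conn: socket.socket) -> None:
    """Hypothetical inventory server: its stock table is private state that
    other nodes can reach only by sending request messages."""
    stock = {"apple": 12, "milk": 30}
    with conn, conn.makefile("r", encoding="utf-8") as inbox, \
            conn.makefile("w", encoding="utf-8") as outbox:
        for line in inbox:  # one request message per line
            request = json.loads(line)
            if request["op"] == "stop":
                break
            item = request["item"]
            # Unknown items are reported as having zero stock.
            send(outbox, {"item": item, "in_stock": stock.get(item, 0)})


def query(inbox, outbox, item: str) -> dict:
    """Cash-register side: ask for a stock level and block until the reply."""
    send(outbox, {"op": "query", "item": item})
    return json.loads(inbox.readline())


def run_demo() -> list:
    # The socket pair plays the role of the network link between two machines;
    # the thread plays the role of the second machine.
    register_end, inventory_end = socket.socketpair()
    node = threading.Thread(target=inventory_node, args=(inventory_end,))
    node.start()
    with register_end, register_end.makefile("r", encoding="utf-8") as inbox, \
            register_end.makefile("w", encoding="utf-8") as outbox:
        replies = [query(inbox, outbox, item) for item in ("apple", "kiwi")]
        send(outbox, {"op": "stop"})
    node.join()
    return replies


if __name__ == "__main__":
    print(run_demo())
```

Calling run_demo() returns one reply message per query. Note that query() blocks until its reply arrives: this synchronous request/reply style is exactly what the geographical-scalability point warns about, since it works well on a LAN but each call pays the full round-trip delay over long distances.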
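Two of the scaling techniques listed in these notes, asynchronous communication and caching, can be sketched in a few lines of Python with asyncio. This is an illustrative assumption rather than a design prescribed by the course: RemoteDirectory is a hypothetical far-away server whose wide-area latency is simulated with asyncio.sleep, its records are made up, and CachingClient keeps local copies of answers. asyncio.gather issues both lookups without waiting for the first reply, so together they take roughly one simulated round trip instead of two; repeated lookups are served from the cache and never reach the remote server.

```python
import asyncio
from typing import Optional


class RemoteDirectory:
    """Hypothetical far-away server; latency is simulated, records are made up."""

    def __init__(self, latency: float = 0.05) -> None:
        self.latency = latency
        self.calls = 0  # counts requests that actually go "over the WAN"
        self._records = {"www": "10.0.0.1", "mail": "10.0.0.2"}

    async def lookup(self, name: str) -> Optional[str]:
        self.calls += 1
        await asyncio.sleep(self.latency)  # simulated wide-area round trip
        return self._records.get(name)


class CachingClient:
    """Keeps local copies of answers so repeated lookups stay local."""

    def __init__(self, remote: RemoteDirectory) -> None:
        self.remote = remote
        self.cache = {}

    async def lookup(self, name: str) -> Optional[str]:
        if name not in self.cache:  # cache miss: ask the remote server
            self.cache[name] = await self.remote.lookup(name)
        return self.cache[name]  # cache hit: no communication needed


async def main() -> tuple:
    remote = RemoteDirectory()
    client = CachingClient(remote)
    # Asynchronous communication: both requests are in flight at the same
    # time, so the caller waits roughly one latency, not two.
    first = await asyncio.gather(client.lookup("www"), client.lookup("mail"))
    # Caching: the same names are now answered locally.
    second = await asyncio.gather(client.lookup("www"), client.lookup("mail"))
    return first, second, remote.calls


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Running asyncio.run(main()) returns the answers from both rounds together with the number of remote calls, which stays at 2. This naive cache never expires entries, so it trades consistency for speed; keeping copies consistent is the usual price of replication and caching.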