CS 8791 — CLOUD COMPUTING NOVEMBER/DECEMBER 2021

1. Define cloud computing.
ANS: According to NIST, cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.

2. Depict the importance of on-demand provisioning in e-commerce applications.
ANS: On-demand provisioning is an important benefit provided by cloud computing. It refers to the process by which individuals or enterprise IT organizations deploy, integrate, and consume cloud resources or services. The on-demand model gives an enterprise the ability to scale computing resources up or down as demand changes, which suits the bursty traffic of e-commerce applications.

3. Define the role and benefit of virtualization in cloud.
ANS: Virtualization is a computer architecture technology by which multiple virtual machines (VMs) are multiplexed on the same hardware machine. The purpose of a VM is to enhance resource sharing among many users and to improve computer performance in terms of resource utilization and application flexibility. Hardware resources such as CPU, memory, and I/O devices, or software resources such as OS software libraries, can be virtualized.

4. What is disaster recovery?
ANS: Cloud disaster recovery (cloud DR) refers to the strategies and services enterprises apply to back up applications, resources, and data into a cloud environment. Cloud DR helps protect corporate resources and ensure business continuity.

5. What is a hybrid cloud?
ANS: Hybrid cloud platforms connect public and private resources in different ways, and they often incorporate common industry technologies such as Kubernetes to orchestrate container-based services. Examples include AWS Outposts, Azure Stack, Azure Arc, Google Anthos, and VMware Cloud on AWS.
6. Outline the key challenges associated with the process of storing images in the cloud.
ANS: Security issues; cost management and containment; lack of resources/expertise; governance/control; compliance; managing multiple clouds; performance; building a private cloud.

7. What is Inter-cloud?
ANS: Intercloud, or "cloud of clouds", refers to a theoretical model for cloud computing services based on the idea of combining many different individual clouds into one seamless mass in terms of on-demand operations.

8. Name any two security challenges associated with cloud in today's digital scenario.
ANS: Unauthorized access; insecure interfaces/APIs; hijacking of accounts; lack of visibility.

9. What is Hadoop?
ANS: Hadoop is an open-source framework used to efficiently store and process large datasets ranging in size from gigabytes to petabytes. Instead of using one large computer to store and process the data, Hadoop clusters multiple computers to analyze massive datasets in parallel more quickly.

10. Write a note on federated services.
ANS: Federated services is a common identity-management term that means connecting two or more organizations or applications in such a way that authorization from one application transfers to another federated application.

PART-B

11 (a) Formulate the stage-by-stage evolution of cloud with a neat sketch and formulate any three benefits and drawbacks achieved by it in the banking and insurance sectors.
ANS: Cloud computing is all about renting computing services. The idea first appeared in the 1950s. Five technologies played a vital role in making cloud computing what it is today: distributed systems and their peripherals, virtualization, Web 2.0, service orientation, and utility computing.

Distributed systems: A distributed system is a composition of multiple independent systems, all of which are presented as a single entity to the users.
The purpose of distributed systems is to share resources and use them effectively and efficiently. Distributed systems possess characteristics such as scalability, concurrency, continuous availability, heterogeneity, and independence of failures. The main problem with such systems was that all the machines had to be present at the same geographical location. To solve this problem, distributed computing led to three further types of computing: mainframe computing, cluster computing, and grid computing.

Mainframe computing: Mainframes, which first came into existence in 1951, are highly powerful and reliable computing machines. They are responsible for handling large volumes of data, such as massive input-output operations. Even today they are used for bulk processing tasks such as online transactions. These systems have almost no downtime and high fault tolerance. They increased the processing capabilities over distributed computing, but they were very expensive. To reduce this cost, cluster computing emerged as an alternative to mainframe technology.

Cluster computing: In the 1980s, cluster computing appeared as an alternative to mainframe computing. Each machine in the cluster was connected to the others by a high-bandwidth network. Clusters were far cheaper than mainframe systems yet equally capable of high computation, and new nodes could easily be added to the cluster when required. Thus the cost problem was solved to some extent, but the problem of geographical restriction persisted. To solve this, the concept of grid computing was introduced.

Grid computing: In the 1990s, the concept of grid computing was introduced. Different systems were placed at entirely different geographical locations, all connected via the Internet. These systems belonged to different organizations, so the grid consisted of heterogeneous nodes.
Although grid computing solved some problems, new problems emerged as the distance between the nodes increased; the main one was the low availability of high-bandwidth connectivity, along with other network-related issues. Cloud computing is therefore often referred to as the "successor of grid computing".

Virtualization: Virtualization was introduced nearly 40 years ago. It refers to the process of creating a virtual layer over the hardware that allows the user to run multiple instances simultaneously on that hardware. It is a key technology used in cloud computing and is the base on which major cloud services such as Amazon EC2 and VMware vCloud work. Hardware virtualization is still one of the most common types of virtualization.

Web 2.0: Web 2.0 is the interface through which cloud computing services interact with clients. It is because of Web 2.0 that we have interactive and dynamic web pages, and it also increases flexibility among web pages. Popular examples include Google Maps, Facebook, and Twitter; social media is possible only because of this technology. It gained major popularity in 2004.

Service orientation: Service orientation acts as a reference model for cloud computing. It supports low-cost, flexible, and evolvable applications. Two important concepts were introduced in this computing model: Quality of Service (QoS), which also includes the Service Level Agreement (SLA), and Software as a Service (SaaS).

Utility computing: Utility computing is a model that defines service-provisioning techniques for compute services along with other major services such as storage and infrastructure, which are provisioned on a pay-per-use basis.

BENEFITS:
Inexpensive: Cloud computing curtails capital costs and huge upfront infrastructure expenses, so financial corporations and banks can concentrate on important business deals and projects.
In the cloud, banks and financial corporations do not need to buy budget-draining hardware.

Augmented management: Cloud computing helps financial corporations and banks adjust their reserves rapidly in the face of unexpected and dynamic business requests. It also deploys applications faster, thanks to the enhanced, maintenance-free management the cloud provides.

Stability: Cloud computing is very beneficial for banks and financial corporations because it builds the enterprise-wide availability that is essential to sustaining business investment.

11 (b) Discuss the underlying parallel and distributed computing principles adopted by cloud in the IT sector and brief the drawbacks incurred.
ANS: The previous section covered the evolution of cloud computing with respect to its hardware, Internet, protocol, and processing technologies. This section briefly explains the principles of two essential computing mechanisms largely used in cloud computing: parallel and distributed computing.

Computing, in computer technology, can be defined as the execution of single or multiple programs, applications, tasks, or activities, sequentially or in parallel, on one or more computers. The two basic approaches to computing are serial and parallel computing.

Parallel computing: The single-processor system is becoming archaic and inadequate for the fast computation required by real-time applications, so parallel computing is needed to speed up their execution and achieve high performance. Parallel computing makes use of multiple computing resources to solve a complex computational problem: the problem is broken into discrete parts that can be solved concurrently, and each part is further broken down into a series of instructions that execute simultaneously on different processors under an overall control/coordination mechanism.
Here, the different processors share the workload, which produces much higher computing power and performance than could be achieved with a traditional single-processor system. Parallel computing is often correlated with parallel processing and parallel programming: processing multiple tasks and subtasks simultaneously on multiple processors is called parallel processing, while parallel programming refers to programming a multiprocessor system using the divide-and-conquer technique, where a given task is divided into subtasks and each subtask is processed on a different processor.

Distributed computing: As per Tanenbaum, a distributed system is a collection of independent computers that appears to its users as a single coherent system. The term distributed computing encompasses any architecture or system that allows computation to be broken down into units and executed concurrently on different computing elements. It is a computing concept in which multiple computer systems connected in a network work on a single problem: the problem is divided into many parts, and each part is executed by a different computer. As long as the computers are networked, they can communicate with each other to solve the problem, and when done properly they perform like a single entity. The ultimate goal of distributed computing is to maximize performance by connecting users and IT resources in a cost-effective, transparent, and reliable manner. This type of computing is highly scalable.

(Figure: conceptual view of a distributed system.)

Distributed computing networks can be connected as local networks, or through a wide area network if the machines are in different geographic locations. Processors in distributed computing systems typically run in parallel.
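The divide-and-conquer structure described above can be sketched in Python with the multiprocessing module. This is only an illustrative sketch (the function names and the summation task are invented for the example): a large summation is split into chunks, worker processes compute the partial sums concurrently, and the results are combined.

```python
# Minimal divide-and-conquer sketch: split a summation into chunks,
# compute the partial sums in parallel worker processes, then combine.
from multiprocessing import Pool

def partial_sum(bounds):
    """Compute the sum of integers in [lo, hi)."""
    lo, hi = bounds
    return sum(range(lo, hi))

def parallel_sum(n, workers=4):
    # Split [0, n) into roughly equal chunks, one per worker.
    step = n // workers
    chunks = [(i * step, (i + 1) * step) for i in range(workers)]
    chunks[-1] = (chunks[-1][0], n)  # last chunk absorbs the remainder
    with Pool(workers) as pool:
        return sum(pool.map(partial_sum, chunks))  # combine partial results

if __name__ == "__main__":
    print(parallel_sum(1_000_000))  # same result as sum(range(1_000_000))
```

The coordination mechanism here is the process pool: it distributes the subtasks and collects the partial results, mirroring the overall control/coordination described in the text.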
In enterprise settings, distributed computing generally puts the various steps in business processes at the most efficient places in a computer network. For example, a typical distribution has a three-tier model that organizes applications into the presentation tier (or user interface), the application tier, and the data tier. These tiers function as follows:
1. User interface processing occurs on the PC at the user's location.
2. Application processing takes place on a remote computer.
3. Database access and processing algorithms happen on another computer that provides centralized access for many business processes.

In addition to the three-tier model, other types of distributed computing include client-server, n-tier, and peer-to-peer:
Client-server architectures use smart clients that contact a server for data, then format and display that data to the user.
N-tier architectures, typically used in application servers, use web applications to forward requests to other enterprise services.
Peer-to-peer architectures divide all responsibilities among all peer computers, which can serve as clients or servers.

12 (a) Outline the various levels of virtualization with an example for each category.
ANS: Virtualization is implemented at various levels by creating a software abstraction layer between the host OS and the guest OS. The main function of this software layer is to virtualize the physical hardware of the host machine into virtual resources used by VMs through various operational layers. There are five implementation levels of virtualization: Instruction Set Architecture (ISA) level, hardware level, operating system level, library support level, and application level, which are explained as follows.
Instruction Set Architecture level: Virtualization at the instruction set architecture level is implemented by emulating an instruction set architecture completely in software. An emulator executes the instructions issued by the guest machine (the virtual machine being emulated) by translating them to a set of native instructions and then executing them on the available hardware; that is, the emulator works by translating instructions from the guest platform to instructions of the host platform. These instructions include both processor-oriented instructions (add, sub, jump, etc.) and I/O-specific (IN/OUT) instructions for the devices. Although this virtual machine architecture works well in terms of simplicity and robustness, it has its own pros and cons. The advantages of ISA-level virtualization are that it is easy to implement when dealing with multiple platforms, and it provides an infrastructure through which one can create virtual machines for platforms such as x86, SPARC, and Alpha. The disadvantage is that every instruction issued by the emulated computer must first be interpreted in software, which degrades performance. Popular ISA-level emulators include Bochs and QEMU.

Hardware Abstraction Layer virtualization: Virtualization at the Hardware Abstraction Layer (HAL) exploits the similarity between the architectures of the guest and host platforms to cut down interpretation latency: the time spent interpreting guest-platform instructions into host-platform instructions is reduced by taking advantage of the similarities between them. The virtualization technique maps virtual resources to physical resources and uses the native hardware for computations in the virtual machine. This approach generates a virtual hardware environment that virtualizes computer resources such as CPU, memory, and I/O devices.
For HAL virtualization to work, the VM must be able to trap every privileged instruction execution and pass it to the underlying VMM, because multiple VMs running their own operating systems may issue privileged instructions that need the CPU's full attention. If this is not managed properly, a privileged instruction may fail instead of generating a trap that transfers control to the VMM, crashing the guest. Indeed, the most popular platform, x86, is not fully virtualizable, because it has been observed that certain privileged instructions fail silently rather than being trapped when executed with insufficient privileges.

Operating system level virtualization: Operating system level virtualization is an abstraction layer between the OS and user applications. It allows multiple operating environments and applications to run simultaneously without requiring a reboot or dual boot. The degree of isolation of each environment is very high, and it can be implemented at low risk with easy maintenance. Implementing operating system level virtualization involves operating system installation, application suite installation, network setup, and so on; therefore, if the required OS is the same as the one on the physical machine, the user basically ends up duplicating most of the effort already invested in setting up the physical machine. To run applications properly, the operating system keeps the application-specific data structures, user-level libraries, environment settings, and other requisites separate. The key idea behind all OS-level virtualization techniques is that a virtualization layer above the OS produces, on demand, a partition per virtual machine that is a replica of the operating environment on the physical machine. With careful partitioning and multiplexing, each VM can export a full operating environment while remaining fairly isolated from the other VMs and from the underlying physical machine.
Library level virtualization: Most systems use an extensive set of Application Programming Interfaces (APIs) rather than legacy system calls to implement various user-level libraries. Such APIs are designed to hide operating-system details and keep things simpler for ordinary programmers. In this technique, the virtual environment is created above the OS layer and is mostly used to implement a different Application Binary Interface (ABI) and Application Programming Interface (API) using the underlying system. An example of library level virtualization is WINE. Wine is an implementation of the Windows API and can be used as a library to port Windows applications to UNIX: it is a virtualization layer on top of X and UNIX that exports the Windows API/ABI, allowing Windows binaries to run on top of it.

Application level virtualization: In this abstraction technique, operating systems and user-level programs execute like applications for the machine. Specialized instructions are therefore needed for hardware manipulations, such as I/O-mapped operations (manipulating the I/O) and memory-mapped operations (mapping a chunk of memory to the I/O and then manipulating the memory). The group of such special instructions constitutes application level virtualization. The Java Virtual Machine (JVM) is the popular example of application level virtualization: it creates a virtual machine at the application level rather than at the OS level, supporting a self-defined instruction set called Java bytecodes. Such VMs pose little security threat to the system while letting the user work with them like physical machines. Like a physical machine, such a VM has to provide an operating environment to its applications, either by hosting a commercial operating system or by coming up with its own environment.
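To make the JVM analogy concrete, here is a toy sketch of a stack-based virtual machine that executes its own self-defined instruction set entirely at the application level. The instruction names (PUSH/ADD/MUL) are invented for this illustration and are not real JVM bytecodes.

```python
# Toy stack-based virtual machine, in the spirit of the JVM example above.
# The instruction set is invented for illustration; real JVM bytecodes
# are far richer, but the principle is the same: programs run on a
# software-defined machine, not directly on the host CPU.
def run(program):
    stack = []
    for instr in program:
        op, args = instr[0], instr[1:]
        if op == "PUSH":
            stack.append(args[0])
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        else:
            raise ValueError(f"unknown opcode: {op}")
    return stack.pop()  # result is left on top of the stack

# Computes (2 + 3) * 4 entirely inside the virtual machine.
program = [("PUSH", 2), ("PUSH", 3), ("ADD",), ("PUSH", 4), ("MUL",)]
print(run(program))  # 20
```

The point of the sketch is that the host hardware never sees PUSH or MUL; the interpreter mediates everything, which is why such application-level VMs pose little threat to the underlying system.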
12 (b) Outline the problems in virtualizing CPU, I/O, and memory devices, and suggest how they could be overcome for efficient utilization of cloud services.
ANS: To support virtualization, processors such as the x86 employ a special running mode and instructions, known as hardware-assisted virtualization. In this way, the VMM and guest OS run in different modes, and all sensitive instructions of the guest OS and its applications are trapped in the VMM. To save processor state, mode switching is completed by hardware. For the x86 architecture, Intel and AMD have proprietary technologies for hardware-assisted virtualization.

1. Hardware support for virtualization: Modern operating systems and processors permit multiple processes to run simultaneously. If a processor had no protection mechanism, instructions from different processes would access the hardware directly and cause a system crash. Therefore, all processors have at least two modes, user mode and supervisor mode, to ensure controlled access to critical hardware. Instructions running in supervisor mode are called privileged instructions; the others are unprivileged instructions. In a virtualized environment, it is more difficult to make OSes and applications run correctly because there are more layers in the machine stack.

At the time of this writing, many hardware virtualization products were available. VMware Workstation is a VM software suite for x86 and x86-64 computers that allows users to set up multiple x86 and x86-64 virtual computers and use one or more of these VMs simultaneously with the host operating system; it assumes host-based virtualization. Xen is a hypervisor for IA-32, x86-64, Itanium, and PowerPC 970 hosts. Xen modifies Linux to act as the lowest and most privileged layer, or hypervisor, with one or more guest OSes running on top of it.
KVM (Kernel-based Virtual Machine) is a Linux kernel virtualization infrastructure. KVM supports hardware-assisted virtualization and paravirtualization using Intel VT-x or AMD-V and the VirtIO framework, respectively. The VirtIO framework includes a paravirtual Ethernet card, a disk I/O controller, a balloon device for adjusting guest memory usage, and a VGA graphics interface using VMware drivers.

2. CPU virtualization: A VM is a duplicate of an existing computer system in which a majority of the VM instructions are executed on the host processor in native mode. Thus, unprivileged instructions of VMs run directly on the host machine for higher efficiency, while other, critical instructions must be handled carefully for correctness and stability. The critical instructions are divided into three categories: privileged instructions, control-sensitive instructions, and behavior-sensitive instructions. Privileged instructions execute in a privileged mode and are trapped if executed outside this mode. Control-sensitive instructions attempt to change the configuration of resources used. Behavior-sensitive instructions behave differently depending on the configuration of resources, including load and store operations over virtual memory.

A CPU architecture is virtualizable if it supports running the VM's privileged and unprivileged instructions in the CPU's user mode while the VMM runs in supervisor mode. When the privileged instructions, including control- and behavior-sensitive instructions, of a VM are executed, they are trapped into the VMM. In this case, the VMM acts as a unified mediator for hardware access from the different VMs, guaranteeing the correctness and stability of the whole system. However, not all CPU architectures are virtualizable. RISC CPU architectures can be naturally virtualized because all control- and behavior-sensitive instructions are privileged instructions.
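The trap-and-emulate behavior just described can be sketched as a toy simulation. Everything here is invented for illustration (the instruction names, the VMM handler): unprivileged instructions run "directly", while a privileged instruction issued in user mode raises a trap that the simulated VMM intercepts and emulates in supervisor mode.

```python
# Toy trap-and-emulate sketch: unprivileged instructions execute directly,
# while privileged instructions issued in user mode trap to a VMM handler.
# Instruction names and handler behavior are invented for illustration.
PRIVILEGED = {"SET_PAGE_TABLE", "DISABLE_INTERRUPTS"}

class Trap(Exception):
    """Raised when a privileged instruction executes in user mode."""

def execute(instr, mode, log):
    if instr in PRIVILEGED and mode == "user":
        raise Trap(instr)           # hardware would transfer control to the VMM
    log.append(f"cpu: {instr}")     # runs directly on the host CPU

def run_guest(instructions):
    log = []
    for instr in instructions:
        try:
            execute(instr, mode="user", log=log)  # the guest runs in user mode
        except Trap as t:
            # The VMM, running in supervisor mode, emulates the instruction.
            log.append(f"vmm emulated: {t.args[0]}")
    return log

print(run_guest(["ADD", "SET_PAGE_TABLE", "LOAD"]))
```

The x86 problem discussed in the text corresponds to a sensitive instruction that is *not* in the privileged set: it would fall through to the "runs directly" branch and silently misbehave instead of raising the trap.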
In contrast, x86 CPU architectures were not primarily designed to support virtualization: about 10 sensitive instructions, such as SGDT and SMSW, are not privileged instructions, so when they execute under virtualization they cannot be trapped in the VMM.

3. Memory virtualization: Virtual memory virtualization is similar to the virtual memory support provided by modern operating systems. In a traditional execution environment, the operating system maintains mappings of virtual memory to machine memory using page tables, a one-stage mapping from virtual memory to machine memory. All modern x86 CPUs include a memory management unit (MMU) and a translation lookaside buffer (TLB) to optimize virtual memory performance. In a virtual execution environment, however, virtual memory virtualization involves sharing the physical system RAM and dynamically allocating it to the physical memory of the VMs. This means a two-stage mapping process must be maintained by the guest OS and the VMM, respectively: virtual memory to physical memory, and physical memory to machine memory. Furthermore, MMU virtualization must be supported, transparently to the guest OS. The guest OS continues to control the mapping of virtual addresses to the physical memory addresses of its VM, but it cannot directly access the actual machine memory; the VMM is responsible for mapping guest physical memory to actual machine memory.

On a native UNIX-like system, a system call triggers the 80h interrupt and passes control to the OS kernel, whose interrupt handler is then invoked to process the system call. On a paravirtualization system such as Xen, a system call in the guest OS first triggers the 80h interrupt normally, and almost at the same time the 82h interrupt in the hypervisor is triggered, so control is passed to the hypervisor as well.
When the hypervisor completes its task for the guest OS system call, it passes control back to the guest OS kernel. The guest OS kernel may also invoke hypercalls while running. Although paravirtualization of a CPU lets unmodified applications run in the VM, it causes a small performance penalty.

4. I/O virtualization: I/O virtualization involves managing the routing of I/O requests between virtual devices and the shared physical hardware. At the time of this writing, there are three ways to implement I/O virtualization: full device emulation, paravirtualization, and direct I/O. Full device emulation is the first approach for I/O virtualization; generally, this approach emulates well-known, real-world devices.

13 (a) Describe the storage structure of an S3 bucket with a neat sketch and write the rules to make the bucket available for public access.
ANS: AWS S3 terminology:
Bucket: Data in S3 is stored in containers called buckets. Each bucket has its own set of policies and configuration, which gives users more control over their data. Bucket names must be unique. A bucket can be thought of as a parent folder of data. There is a limit of 100 buckets per AWS account, but it can be increased on request to AWS support.
Bucket owner: The person or organization that owns a particular bucket.
Import/export station: A machine that uploads or downloads data to/from S3.
Key: A key is the unique identifier for an object in a bucket. For example, if in a bucket 'ABC' your GFG.java file is stored at javaPrograms/GFG.java, then 'javaPrograms/GFG.java' is the object key for GFG.java. Note that 'bucket name + key' is unique across all objects; this also means there can be only one object for a given key in a bucket. If you upload two files with the same key, the latest upload overwrites the previous file.
Versioning: Versioning means always keeping a record of previously uploaded files in S3.
Points to note: Versioning is not enabled by default. Once enabled, it applies to all objects in a bucket. Versioning keeps all copies of a file, so it adds the cost of storing multiple copies of your data; for example, 10 copies of a 1 GB file will be charged as 10 GB of S3 space. Versioning helps prevent unintended overwrites and deletions. Note that objects with the same key can be stored in a bucket if versioning is enabled, since they have unique version IDs.
Null object: The version ID of objects in a bucket where versioning is suspended is null; such objects may be referred to as null objects. For buckets with versioning enabled, each version of a file has a specific version ID.
Object: The fundamental entity type stored in AWS S3.
Access Control Lists (ACLs): A document for verifying access to S3 buckets from outside your AWS account. Each bucket has its own ACL.
Bucket policies: A document for verifying access to S3 buckets from within your AWS account; it controls which services and users have what kind of access to your S3 bucket. Each bucket has its own bucket policy.
Lifecycle rules: A cost-saving practice that can move your files to AWS Glacier (the AWS data-archive service) or to another S3 storage class for cheaper storage of old data, or completely delete the data after a specified time.

Features of AWS S3:
Durability: AWS claims Amazon S3 to have 99.999999999% durability (11 nines), meaning the chance of losing an object stored on S3 is roughly one in a billion.
Availability: AWS ensures that the uptime of AWS S3 is 99.99% for standard access.

Rules:
Bucket names must be between 3 (min) and 63 (max) characters long.
Bucket names can consist only of lowercase letters, numbers, dots (.), and hyphens (-).
Bucket names must begin and end with a letter or number.
Bucket names must not contain two adjacent periods.
Bucket names must not be formatted as an IP address (for example, 192.168.5.4).
Bucket names must not start with the prefix xn--.
Bucket names must not end with the suffix -s3alias; this suffix is reserved for access point alias names. For more information, see "Using a bucket-style alias for your access point".
Bucket names must be unique across all AWS accounts in all the AWS Regions within a partition. A partition is a grouping of Regions; AWS currently has three partitions: aws (Standard Regions), aws-cn (China Regions), and aws-us-gov (AWS GovCloud (US)). A bucket name cannot be used by another AWS account in the same partition until the bucket is deleted.
Buckets used with Amazon S3 Transfer Acceleration cannot have dots (.) in their names. For more information, see "Configuring fast, secure file transfers using Amazon S3 Transfer Acceleration".

13 (b) Outline the various deployment models of cloud with a neat sketch and identify which among them could be applied to formulate a cloud structure for a small firm.
ANS: Deployment models: The cloud deployment model identifies the specific type of cloud environment based on ownership, scale, and access, as well as the cloud's nature and purpose. A cloud deployment model defines where the servers you are utilizing are located and who controls them. It specifies how your cloud infrastructure will look, what you can change, and whether you will be given services or will have to create everything yourself. It also defines the relationships between the infrastructure and your users. The different types of cloud deployment models are:
1. Public cloud
2. Private cloud
3. Hybrid cloud
4. Community cloud
5. Multi-cloud

1. Public cloud: The public cloud makes it possible for anybody to access systems and services, though it may be less secure because it is open to everyone. In the public cloud, cloud infrastructure services are provided over the Internet to the general public or major industry groups.
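The bucket-naming rules listed under 13 (a) can also be checked programmatically. The following is only a rough sketch (the function name is invented) covering the syntactic rules from that list; global uniqueness across AWS accounts obviously cannot be verified locally.

```python
import re

# Rough validator for the S3 bucket-naming rules listed in 13 (a).
# Covers only the locally checkable, syntactic rules.
def is_valid_bucket_name(name):
    if not (3 <= len(name) <= 63):
        return False                  # length must be 3-63 characters
    if not re.fullmatch(r"[a-z0-9.-]+", name):
        return False                  # lowercase letters, digits, dots, hyphens only
    if not (name[0].isalnum() and name[-1].isalnum()):
        return False                  # must begin and end with a letter or number
    if ".." in name:
        return False                  # no two adjacent periods
    if re.fullmatch(r"(\d{1,3}\.){3}\d{1,3}", name):
        return False                  # must not be formatted as an IP address
    if name.startswith("xn--") or name.endswith("-s3alias"):
        return False                  # reserved prefix/suffix
    return True

print(is_valid_bucket_name("my-logs.2021"))   # True
print(is_valid_bucket_name("192.168.5.4"))    # False
```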
The infrastructure in this cloud model is owned by the entity that delivers the cloud services, not by the consumer. It is a type of cloud hosting that allows customers and users to easily access systems and services: service providers supply services to a variety of customers, and storage, backup, and retrieval services are given for free, as a subscription, or on a per-use basis. Example: Google App Engine.
Advantages of the public cloud model:
• Minimal investment: Because it is a pay-per-use service, there is no substantial upfront fee, making it excellent for enterprises that require immediate access to resources.
• No setup cost: The entire infrastructure is provided by the cloud service providers, so there is no need to set up any hardware.
• No infrastructure management required: Using the public cloud does not necessitate infrastructure management.
• No maintenance: The maintenance work is done by the service provider, not the users.
• Dynamic scalability: On-demand resources are available to fulfill your company's needs.

2. Private cloud: The private cloud deployment model is the exact opposite of the public cloud deployment model. It is a one-to-one environment for a single user (customer): there is no need to share your hardware with anyone else. The distinction between private and public cloud lies in how all the hardware is handled. It is also called the "internal cloud", referring to the ability to access systems and services within a given boundary or organization. The cloud platform is implemented in a cloud-based secure environment that is protected by powerful firewalls and supervised by an organization's IT department. The private cloud gives greater flexibility and control over cloud resources.
Advantages of the private cloud model:
• Better control: You are the sole owner of the infrastructure.
You gain complete command over service integration, IT operations, policies, and user behavior. • Data Security and Privacy: It is suitable for storing corporate information to which only authorized staff have access. By segmenting resources within the same infrastructure, improved access control and security can be achieved. • Supports Legacy Systems: This approach is designed to work with legacy systems that are unable to access the public cloud. • Customization: Unlike a public cloud deployment, a private cloud allows a company to tailor its solution to meet its specific needs. 3. Hybrid cloud By bridging the public and private worlds with a layer of proprietary software, hybrid cloud computing gives the best of both worlds. With a hybrid solution, you can host an application in a safe environment while taking advantage of the public cloud's cost savings. Using a combination of two or more cloud deployment models, organizations can move data and applications between different clouds depending on their needs. Advantages of the hybrid cloud model: • Flexibility and control: Businesses gain more flexibility to design personalized solutions that meet their particular needs. • Cost: Because public clouds provide scalability, you are only responsible for paying for extra capacity when you require it. • Security: Because data is properly separated, the chances of data theft by attackers are considerably reduced. 4. Community cloud The community cloud allows systems and services to be accessible by a group of organizations. It is a distributed system created by integrating the services of different clouds to address the specific needs of a community, industry, or business. The community's infrastructure can be shared between organizations that have shared concerns or tasks. It is generally managed by a third party or by the combination of one or more organizations in the community. 
Advantages of the community cloud model: • Cost Effective: It is cost-effective because the cloud is shared by multiple organizations or communities. • Security: Community cloud provides better security. • Shared resources: It allows you to share resources, infrastructure, etc. with multiple organizations. • Collaboration and data sharing: It is suitable for both collaboration and data sharing. 5. Multi-cloud As the name implies, this model involves employing multiple cloud providers at the same time. It is similar to the hybrid cloud deployment approach, which combines public and private cloud resources; however, instead of merging private and public clouds, multi-cloud uses many public clouds. Although public cloud providers offer numerous tools to improve the reliability of their services, mishaps still occur, and it is quite rare that two distinct clouds would have an incident at the same moment. As a result, multi-cloud deployment further improves the high availability of your services. Advantages of a multi-cloud model: • Mix and match: By choosing different cloud providers, you can combine the best features of each provider's services to suit the demands of your apps, workloads, and business. • Reduced Latency: To reduce latency and improve user experience, you can choose cloud regions and zones that are close to your clients. • High availability of service: It is quite rare that two distinct clouds would have an incident at the same moment, so multi-cloud deployment improves the high availability of your services. For a small firm, the public cloud is the most suitable model: it requires no upfront capital investment or in-house infrastructure management, and it scales on demand as the business grows. 14a) Represent the structure of inter-cloud resource management and explain why two or more clouds need to interact with each other. Provide an example for the same. ANS: Cloud computing is a novel area of research and still faces certain terminological ambiguity. The area of Inter-Clouds is even newer, and many works in the area use several terms interchangeably. 
Inter-Cloud computing has been formally defined as a cloud model that, for the purpose of guaranteeing service quality, such as the performance and availability of each service, allows on-demand reassignment of resources and transfer of workload through an interworking of cloud systems of different cloud providers, based on coordination of each consumer's requirements for service quality with each provider's SLA and the use of standard interfaces. In the rest of this work, we will adhere to this definition. The seminal works on Inter-Clouds by Buyya et al. [5] and Bernstein et al. [6] also implicitly express similar definitions. Buyya et al. emphasise the just-in-time, opportunistic nature of the provisioning within an Inter-Cloud that allows for achieving QoS and quality-of-experience targets in a dynamic environment [5]. The term Cloud Fusion has been used by Fujitsu Laboratories to denote a similar notion [18]. Note that this definition is generic and does not specify who is initiating the Inter-Cloud endeavour – the cloud providers or the clients. Also, it does not specify whether cloud providers collaborate voluntarily to form an Inter-Cloud or not. Two other terms are used throughout the related literature to differentiate between these – Federation and Multi-Cloud. A Federation is achieved when a set of cloud providers voluntarily interconnect their infrastructures to allow sharing of resources among each other [19, 20, 8]. The term Multi-Cloud denotes the usage of multiple, independent clouds by a client or a service. Unlike a Federation, a Multi-Cloud environment does not imply voluntary interconnection and sharing of providers' infrastructures; clients or their representatives are directly responsible for managing resource provisioning and scheduling [19]. The term Sky Computing has been used in several publications with a similar meaning [7, 21]. Both Federations and Multi-Clouds are types of Inter-Clouds. 
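The defining trait of a Multi-Cloud, that the client itself manages provisioning and scheduling across independent providers, can be sketched in a few lines. The provider names, prices, and cheapest-available selection policy below are purely hypothetical; a real client would query live provider APIs rather than static data.

```python
# Minimal sketch of client-side Multi-Cloud scheduling.
# Providers, prices, and policy are hypothetical illustrations.
from dataclasses import dataclass

@dataclass
class Provider:
    name: str
    price_per_vm_hour: float  # USD, hypothetical
    available: bool

def pick_provider(providers):
    """Choose the cheapest provider that is currently available.

    The client, not the providers, makes this decision -- which is
    what distinguishes a Multi-Cloud from a Federation, where the
    providers voluntarily interconnect their infrastructures.
    """
    candidates = [p for p in providers if p.available]
    if not candidates:
        raise RuntimeError("no cloud provider available")
    return min(candidates, key=lambda p: p.price_per_vm_hour)

providers = [
    Provider("cloud-a", 0.12, True),
    Provider("cloud-b", 0.10, False),  # outage: the client routes around it
    Provider("cloud-c", 0.15, True),
]
print(pick_provider(providers).name)  # cloud-a
```

Because it is quite rare that two distinct clouds have an incident at the same moment, a client using such a policy keeps its service available even when one provider fails, which is exactly the high-availability argument made for the multi-cloud model above.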
14b) What is IAM and detail the segregation roles carried out by IAM when services of multiple organizations are maintained within the same geographical location? ANS: IAM Definition Identity and access management (IAM) is a set of processes, policies, and tools for defining and managing the roles and access privileges of individual network entities (users and devices) to a variety of cloud and on-premises applications. Users include customers, partners, and employees; devices include computers, smartphones, routers, servers, controllers, and sensors. The core objective of IAM systems is one digital identity per individual or item. Once that digital identity has been established, it must be maintained, modified, and monitored throughout each user's or device's access lifecycle. IAM tools IAM systems provide administrators with the tools and technologies to change a user's role, track user activities, create reports on those activities, and enforce policies on an ongoing basis. These systems are designed to provide a means of administering user access across an entire enterprise and to ensure compliance with corporate policies and government regulations. According to a survey by Ping Identity, about "70% of global business executives plan to increase spending on IAM for their workforce over the next 12 months, as a continuation of remote work increases demand on IT and security teams." They also found that more than half of the companies surveyed have invested in new IAM products since the pandemic began. 15 a) Detail the structure of OpenStack and explain each of its components. ANS: OpenStack has a modular architecture, with code names for its components. Nova (Compute) Nova is the OpenStack project that provides a way to provision compute instances. Nova supports creating virtual machines and bare-metal servers, and has limited support for system containers. It runs as a set of daemons on top of existing Linux servers to provide that service. 
This component is written in Python. It uses several external Python libraries, such as SQLAlchemy (an SQL toolkit and object-relational mapper), Kombu (an AMQP messaging framework), and Eventlet (a concurrent networking library). Nova is designed to scale horizontally: instead of switching to larger servers, you procure many servers and install identically configured services on them. Because of Nova's deep integration into enterprise-level infrastructure, monitoring its performance, and the performance of OpenStack in general, has become an increasingly important issue at scale. Managing end-to-end performance requires tracking metrics from Swift, Cinder, Neutron, Keystone, Nova, and other services, as well as monitoring RabbitMQ, which the OpenStack services use for message passing. Each of these services produces its own log files, which must also be analyzed, especially in enterprise-level infrastructure. Neutron (Networking) Neutron is the OpenStack project that provides "network connectivity as a service" between interface devices (such as vNICs) managed by other OpenStack services (such as Nova). It implements the OpenStack Networking API. Neutron handles every networking facet of the VNI (Virtual Networking Infrastructure) and the access-layer aspects of the PNI (Physical Networking Infrastructure) in an OpenStack platform. OpenStack Networking allows projects to build advanced virtual network topologies, which can include services such as VPNs (Virtual Private Networks) and firewalls. Neutron allows dedicated static IP addresses or DHCP, and supports floating IP addresses that enable traffic to be dynamically rerouted. Users can apply SDN (Software-Defined Networking) technologies such as OpenFlow to support scale and multi-tenancy. 
OpenStack Networking can also deploy and manage additional network services such as VPNs (Virtual Private Networks), firewalls, load balancing, and IDS (Intrusion Detection Systems). Cinder (Block Storage) Cinder is the OpenStack block storage service, used to provide volumes to Nova VMs, containers, Ironic bare-metal hosts, and more. A few objectives of Cinder are as follows: ◦ Open standard: It is a reference implementation of community-driven APIs. ◦ Recoverable: Failures should be easy to diagnose, debug, and rectify. ◦ Fault-tolerant: Isolated processes avoid cascading failures. ◦ Highly available: It scales to demanding workloads. ◦ Component-based architecture: New behaviors can be added quickly. Cinder volumes provide persistent storage for guest VMs, called instances, which are managed by the OpenStack Compute software. Cinder can also be used independently of other OpenStack services as standalone software-defined storage. This block storage system handles the creation, attaching, detaching, replication, and snapshot management of block devices on the servers. Keystone (Identity) Keystone is the OpenStack service that offers shared multi-tenant authorization, service discovery, and API client authentication by implementing the OpenStack Identity API. In effect, it is the common authentication system for the cloud operating system. Keystone can integrate with directory services such as LDAP. It supports standard username-and-password credentials, AWS-style (Amazon Web Services) logins, and token-based systems. The Keystone service catalog allows API clients to dynamically discover and navigate the various cloud services. Glance (Image) The Glance (image) project offers a service in which users can discover and upload data assets that are meant to be used with other services. Currently, these assets include image and metadata definitions. 
Images: The Glance image services include discovering, registering, and retrieving virtual machine (VM) images. Glance provides a RESTful API that allows querying of VM image metadata as well as retrieval of the actual image. VM images made available through Glance can be stored in a variety of locations, from simple filesystems to object-storage systems such as the OpenStack Swift project. Metadata Definitions: Glance hosts a metadefs catalog, which provides the OpenStack community with a way to define valid metadata key names and values that can be used with OpenStack resources. Swift (Object Storage) Swift is a distributed, eventually consistent blob/object store. Swift, the OpenStack object store project, provides cloud storage software that lets you store and retrieve large amounts of data through a simple API. It is built for scale and optimized for durability, availability, and concurrency across the entire data set. Object storage is ideal for storing unstructured data that can grow without bound. In August 2009, Rackspace began developing the forerunner of OpenStack Object Storage as a complete replacement for its Cloud Files product. The initial development team consisted of nine developers. SwiftStack, an object storage company, is currently the leading developer of Swift, with significant contributions from IBM, HP, NTT, Red Hat, Intel, and many others. Horizon (Dashboard) Horizon is the canonical implementation of the OpenStack Dashboard, which offers a web-based UI to OpenStack services such as Keystone, Swift, and Nova. The Dashboard ships with a few central dashboards, such as a "User Dashboard", a "System Dashboard", and a "Settings Dashboard", which together cover the core OpenStack projects. 
The Horizon application ships with a set of API abstractions for the core OpenStack projects, providing developers with a stable, consistent collection of reusable methods. With these abstractions, developers working on Horizon do not need to be intimately familiar with the APIs of every OpenStack project. Heat (Orchestration) Heat is a service to orchestrate multiple composite cloud applications using templates, through both an OpenStack-native REST API and a CloudFormation-compatible Query API. Mistral (Workflow) Mistral is the OpenStack service that manages workflows. Typically, the user writes a workflow in Mistral's YAML-based workflow language and uploads the workflow definition to Mistral via its REST API. The user can then start the workflow manually through the same API, or configure a trigger to start the workflow on certain events. Ceilometer (Telemetry) OpenStack Ceilometer (Telemetry) provides a single point of contact for billing systems, supplying all the counters they need to establish customer billing across all current and future OpenStack components. Counter delivery is traceable and auditable, counters must be easily extensible to support new projects, and the agents performing data collection must be independent of the overall system. Trove (Database) Trove is a database-as-a-service that provisions relational and non-relational database engines. Sahara (Elastic map-reduce) Sahara is a component for rapidly and easily provisioning Hadoop clusters. Users specify parameters such as the Hadoop version number, cluster topology type, node flavor details (specifying disk space, CPU, and RAM settings), and more. After the user supplies the parameters, Sahara deploys the cluster in a short time. Sahara also provides a means to scale a pre-existing Hadoop cluster by adding and removing worker nodes on demand. 
Ironic (Bare metal) Ironic is the OpenStack project that provisions bare-metal machines rather than virtual machines. It was initially forked from the Nova bare-metal driver and has evolved into a separate project. It is best thought of as a bare-metal hypervisor API and a set of plugins that interact with bare-metal hypervisors. It uses PXE and IPMI in concert to provision machines and to power them on and off, although Ironic also supports, and can be extended with, vendor-specific plugins to implement additional functionality. Zaqar (Messaging) Zaqar is a multi-tenant cloud messaging service for web developers. It offers a fully RESTful API that developers can use to send messages between the various components of their SaaS and mobile applications, using a variety of communication patterns. The API is a robust messaging engine designed with scalability and security in mind. Other OpenStack components can integrate with Zaqar to surface events to end users and to communicate with guest agents that run in the over-cloud layer. Designate (DNS) Designate is a multi-tenant REST API for managing DNS; it provides DNS as a Service. The component is compatible with various backend technologies, such as BIND and PowerDNS. It does not provide a DNS service itself; rather, its goal is to interface with an existing DNS server to manage DNS zones on a per-tenant basis. Manila (Shared file system) OpenStack Manila (Shared file system) provides an open API for managing shares in a vendor-agnostic framework, with standard primitives such as the ability to create and delete shares and to grant or deny access to a share. It can be used standalone or in a variety of network environments. Storage appliances from Hitachi, INFINIDAT, Quobyte, Oracle, IBM, HP, NetApp, and EMC, as well as filesystem technologies such as Ceph and Red Hat GlusterFS, are supported. 
Searchlight (Search) Searchlight offers consistent, advanced search capabilities across various OpenStack cloud services. It accomplishes this by offloading user search queries from the other OpenStack API servers and indexing their data in ElasticSearch. The component is being integrated into Horizon, and it also offers a command-line interface. Magnum (Container orchestration) Magnum is an OpenStack API service developed by the OpenStack containers team that makes container orchestration engines such as Docker Swarm, Kubernetes, and Apache Mesos available as first-class resources in OpenStack. Magnum uses Heat to orchestrate an OS image that contains Docker and Kubernetes, and runs that image in either virtual machines or bare metal in a cluster configuration. Barbican (Key manager) Barbican is a REST API designed for the secure storage, provisioning, and management of secrets. It is aimed at being useful for all environments, including large ephemeral clouds. Vitrage (Root Cause Analysis) Vitrage is the OpenStack Root Cause Analysis (RCA) service for organizing, analyzing, and expanding OpenStack alarms and events, yielding insights into the root cause of problems and deducing their existence before they are directly detected. Aodh (Rule-based alarm actions) This alarming service enables the ability to trigger actions based on defined rules against metric or event data collected by Ceilometer or Gnocchi. 15B) Write detailed steps to set up the Google App Engine environment for executing any program of your choice. ANS: Steps to Deploy an Application in the App Engine With the above information, it is easy to understand the process of deploying an application in the App Engine, as outlined below: 1. Go to the Google Cloud console and, under 'Compute', create an App Engine application with the 'Node.js' runtime and the 'standard' environment. 2. 
Open Cloud Shell and clone the source code from the repo https://github.com/vishnu123sai/App-engine-example.git 3. The configuration file app.yaml is also available in the repo. In this example, other options in the configuration file, such as scaling and resources, are not used for the sake of simplicity. 4. Type "gcloud app deploy" to deploy your application. 5. Once the application is deployed, the output will be visible at [PROJECT_ID].appspot.com. For example, if the project id is vital-framing-245415, then the application URL will be https://vital-framing-245415.appspot.com/ 6. Test the URL in Chrome. Congratulations! Your application is successfully deployed on the App Engine. 16 B) Elaborate the working of MapReduce with an example ANS: Hadoop is highly scalable. You can start with as little as one machine and then expand your cluster to a practically unlimited number of servers. The two major default components of this software library are: • MapReduce • HDFS – Hadoop Distributed File System In this article, we will talk about the first of the two modules. You will learn what MapReduce is, how it works, and the basic Hadoop MapReduce terminology. At a high level, MapReduce breaks input data into fragments and distributes them across different machines. The input fragments consist of key-value pairs. Parallel map tasks process the chunked data on machines in a cluster. The mapping output then serves as input for the reduce stage. The reduce task combines the result into a particular key-value pair output and writes the data to HDFS. The Hadoop Distributed File System usually runs on the same set of machines as the MapReduce software. When the framework executes a job on the nodes that also store the data, the time to complete the tasks is reduced significantly. As mentioned above, MapReduce is a processing layer in a Hadoop environment. MapReduce works on tasks related to a job. 
The idea is to tackle one large request by slicing it into smaller units. JobTracker and TaskTracker In the early days of Hadoop (version 1), the JobTracker and TaskTracker daemons ran operations in MapReduce. At the time, a Hadoop cluster could only support MapReduce applications. A JobTracker controlled the distribution of application requests to the compute resources in a cluster. Since it monitored the execution and the status of MapReduce, it resided on a master node. A TaskTracker processed the requests that came from the JobTracker. All TaskTrackers were distributed across the slave nodes in a Hadoop cluster. YARN Later, in Hadoop version 2 and above, YARN (Yet Another Resource Negotiator) became the main resource and scheduling manager. YARN also works with frameworks other than MapReduce for distributed processing in a Hadoop cluster. MapReduce Job A MapReduce job is the top unit of work in the MapReduce process. It is an assignment that the Map and Reduce processes need to complete. A job is divided into smaller tasks over a cluster of machines for faster execution. The tasks should be big enough to justify the task-handling time. If you divide a job into unusually small segments, the total time to prepare the splits and create tasks may outweigh the time needed to produce the actual job output. MapReduce Task MapReduce jobs have two types of tasks. A Map Task is a single instance of a MapReduce app. These tasks determine which records to process from a data block. The input data is split and analyzed, in parallel, on the assigned compute resources in a Hadoop cluster. This step of a MapReduce job prepares the <key, value> pair output for the reduce step. A Reduce Task processes an output of a map task. Similar to the map stage, all reduce tasks occur at the same time, and they work independently. The data is aggregated and combined to deliver the desired output. The final result is a reduced set of <key, value> pairs which MapReduce, by default, stores in HDFS. 
The Map and Reduce stages have two parts each. The Map part first deals with the splitting of the input data that gets assigned to individual map tasks. Then, the mapping function creates the output in the form of intermediate key-value pairs. The Reduce stage has a shuffle and a reduce step. Shuffling takes the map output and creates a list of related key-value-list pairs. Then, reducing aggregates the results of the shuffling to produce the final output that the MapReduce application requested.
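The split → map → shuffle → reduce pipeline described above can be illustrated with the classic word-count example. The sketch below is a minimal single-process simulation in Python; the function names are illustrative, and a real Hadoop job would instead implement Mapper and Reducer classes in Java (or use Hadoop Streaming), with each split processed by a separate map task on a different node.

```python
from collections import defaultdict

def map_phase(split):
    """Map: emit an intermediate <key, value> pair for each word."""
    return [(word, 1) for word in split.split()]

def shuffle_phase(intermediate):
    """Shuffle: group intermediate pairs into <key, value-list> pairs."""
    grouped = defaultdict(list)
    for key, value in intermediate:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reduce: aggregate each value list into a final <key, value> pair."""
    return {key: sum(values) for key, values in grouped.items()}

# The input is broken into splits; in Hadoop, each split would be
# assigned to its own map task, running in parallel across the cluster.
splits = ["the quick brown fox", "the lazy dog the end"]
intermediate = []
for split in splits:
    intermediate.extend(map_phase(split))

result = reduce_phase(shuffle_phase(intermediate))
print(result["the"])  # 3
```

Note how the word "the" is counted in both splits by independent map tasks, and only the shuffle and reduce steps bring those partial counts together into the final output, which a real job would, by default, write back to HDFS.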