CS 8791 — CLOUD COMPUTING
NOVEMBER/DECEMBER 2021.
1. Define Cloud computing.
ANS: According to NIST, Cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks,
servers, storage, applications, and services) that can be rapidly provisioned and released with
minimal management effort or service provider interaction.
2. Depict the importance of on-demand provisioning in e-commerce applications.
ANS: On-demand provisioning is an important benefit provided by cloud computing. It refers to
the process of deploying, integrating and consuming cloud resources or services by individuals or
enterprise IT organizations. The on-demand model gives an e-commerce enterprise the ability to
scale computing resources up or down as traffic fluctuates, for example during seasonal sales.
3. Define the role and benefit of virtualization in cloud.
ANS: Virtualization is a computer architecture technology by which multiple virtual machines
(VMs) are multiplexed in the same hardware machine.
The purpose of a VM is to enhance resource sharing by many users and improve computer
performance in terms of resource utilization and application flexibility.
Hardware resources such as CPU, memory, and I/O devices, or software resources such as
operating systems and software libraries, can be virtualized.
4. What is disaster recovery?
ANS: The term cloud disaster recovery (cloud DR) refers to the strategies and services enterprises
apply for the purpose of backing up applications, resources, and data into a cloud environment.
Cloud DR helps protect corporate resources and ensure business continuity.
5. What is a Hybrid cloud?
ANS: A hybrid cloud combines public and private cloud environments. Hybrid cloud platforms
connect public and private resources in different ways, but they often
incorporate common industry technologies, such as Kubernetes to orchestrate container-based
services. Examples include AWS Outposts, Azure Stack, Azure Arc, Google Anthos and VMware
Cloud on AWS.
6. Outline the key challenges associated in the process of storing images in cloud.
ANS: Security issues
Cost management and containment
Lack of resources/expertise
Governance/Control
Compliance
Managing multiple clouds
Performance
Building a private cloud.
7. What is Inter-cloud?
ANS: Intercloud or 'cloud of clouds' is a term referring to a theoretical model for cloud computing
services based on the idea of combining many different individual clouds into one seamless mass
in terms of on-demand operations.
8. Name any two security challenges associated with cloud in today’s digital scenario.
ANS: Unauthorized Access.
Insecure Interfaces/APIs.
Hijacking of Accounts.
Lack of Visibility
9. What is Hadoop?
ANS: Hadoop is an open source framework that is used to efficiently store and process large
datasets ranging in size from gigabytes to petabytes of data. Instead of using one large computer
to store and process the data, Hadoop allows clustering multiple computers to analyze massive
datasets in parallel more quickly.
10. Write a note on Federated services.
ANS: Federated Services is a common Identity Management term that refers to the process
of connecting two or more organizations or applications in such a way that authorization from one
application will transfer to another federated application.
PART-B
11(a) Formulate stage-by-stage evolution of cloud with neat sketch and formulate any three
benefits, drawbacks achieved by it in the banking and insurance sectors.
ANS: Cloud computing is all about renting computing services. This idea first came in the 1950s.
In making cloud computing what it is today, five technologies played a vital role. These are
distributed systems and its peripherals, virtualization, web 2.0, service orientation, and utility
computing.
Distributed Systems:
It is a composition of multiple independent systems but all of them are depicted as a single entity
to the users. The purpose of distributed systems is to share resources and also use them
effectively and efficiently. Distributed systems possess characteristics such as scalability,
concurrency, continuous availability, heterogeneity, and independence in failures. But the main
problem with this system was that all the systems were required to be present at the same
geographical location. Thus, to solve this problem, distributed computing led to three more types
of computing: mainframe computing, cluster computing, and grid computing.
Mainframe computing:
Mainframes which first came into existence in 1951 are highly powerful and reliable computing
machines. These are responsible for handling large data such as massive input-output operations.
Even today these are used for bulk processing tasks such as online transactions etc. These
systems have almost no downtime with high fault tolerance. After distributed computing, these
increased the processing capabilities of the system. But these were very expensive. To reduce
this cost, cluster computing came as an alternative to mainframe technology.
Cluster computing:
In the 1980s, cluster computing came as an alternative to mainframe computing. Each machine in
the cluster was connected to each other by a network with high bandwidth. These were way
cheaper than those mainframe systems. These were equally capable of high computations. Also,
new nodes could easily be added to the cluster if it was required. Thus, the problem of the cost
was solved to some extent, but the problem of geographical restrictions still persisted. To
solve this, the concept of grid computing was introduced.
Grid computing:
In the 1990s, the concept of grid computing was introduced. It means that different systems were
placed at entirely different geographical locations and these all were connected via the internet.
These systems belonged to different organizations and thus the grid consisted of heterogeneous
nodes. Although it solved some problems, new problems emerged as the distance between
the nodes increased. The main problem encountered was the low availability of high-bandwidth
connectivity, along with other network-associated issues. Thus, cloud computing is
often referred to as the “Successor of grid computing”.
Virtualization:
It was introduced nearly 40 years back. It refers to the process of creating a virtual layer over the
hardware which allows the user to run multiple instances simultaneously on the hardware. It is
a key technology used in cloud computing. It is the base on which major cloud computing
services such as Amazon EC2 and VMware vCloud work. Hardware virtualization is still
one of the most common types of virtualization.
Web 2.0:
It is the interface through which the cloud computing services interact with the clients. It is
because of Web 2.0 that we have interactive and dynamic web pages. It also increases flexibility
among web pages. Popular examples of web 2.0 include Google Maps, Facebook, Twitter, etc.
Needless to say, social media is possible because of this technology only. It gained major
popularity in 2004.
Service orientation:
It acts as a reference model for cloud computing. It supports low-cost, flexible, and evolvable
applications. Two important concepts were introduced in this computing model. These were
Quality of Service (QoS) which also includes the SLA (Service Level Agreement) and Software
as a Service (SaaS).
Utility computing:
It is a computing model that defines service provisioning techniques for services such as compute
services along with other major services such as storage, infrastructure, etc which are
provisioned on a pay-per-use basis.
BENEFITS:
• Inexpensive: Cloud computing curtails capital costs and huge upfront infrastructure expenses, so
financial corporations and banks can concentrate on important business deals and projects. With
cloud computing, banks and financial corporations do not need to buy costly, budget-shrinking hardware.
• Augmented management: Cloud computing helps financial corporations and banks make rapid
adjustments to their reserves in case of unexpected and dynamic business requests. It also lets
applications be rolled out faster, thanks to the enhanced management that cloud computing
provides without maintenance overhead.
• Stability: The cloud computing system is very beneficial for banks and financial corporations
because it builds wide enterprise availability, which is important for business continuity.
DRAWBACKS:
• Security and privacy concerns: sensitive financial and customer data is hosted outside the
organization's own data centre.
• Regulatory compliance: banks and insurers must ensure that the cloud provider satisfies
data-residency, audit and compliance requirements.
• Vendor dependence: provider outages and vendor lock-in can disrupt critical banking and
insurance services.
11. (b) Discuss the underlying parallel and distributed computing principles adopted
by cloud in the IT sector and brief the drawbacks incurred.
ANS: In the previous section, we have seen the evolution of cloud computing with respect to its
hardware, internet, protocol and processing technologies. This section briefly explains the
principles of two essential computing mechanisms that are largely used in cloud computing, called
parallel and distributed computing. Computing, in computer technology, can be defined as the
execution of single or multiple programs, applications, tasks or activities, sequentially or in
parallel, on one or more computers. The two basic approaches to computing are serial and parallel
computing.
Parallel Computing
A single-processor system has become inadequate for the fast computation required by real-time
applications, so parallel computing is needed to speed up the execution of real-time applications
and achieve high performance. Parallel computing makes use of multiple computing resources to
solve a complex computational problem: the problem is broken into discrete parts that can be
solved concurrently. Each part is further broken down into a series of instructions which execute
simultaneously on different processors under an overall control/coordination mechanism. Here,
the different processors share the workload, which produces much higher computing power and
performance than could be achieved with a traditional single-processor system. Parallel computing
is often correlated with parallel processing and parallel programming. Processing multiple tasks
and subtasks simultaneously on multiple processors is called parallel processing, while parallel
programming refers to programming a multiprocessor system using the divide-and-conquer
technique, where a given task is divided into subtasks and each subtask is processed on a different
processor. A minimal sketch of this divide-and-conquer idea is given below.
Distributed Computing
As per Tanenbaum, a distributed system is a collection of independent computers that appears to
its users as a single coherent system. The term distributed computing
encompasses any architecture or system that allows the computation to be broken down into units
and executed concurrently on different computing elements. It is a computing concept that refers
to multiple computer systems connected in a network working on a single problem. In distributed
computing, a single problem is divided into many parts, and each part is executed by different
computers. As long as the computers are networked, they can communicate with each other to
solve the problem. If it is done properly, the computers perform like a single entity. The ultimate
goal of distributed computing is to maximize performance by connecting users and IT resources
in a cost-effective, transparent and reliable manner. This type of computing is highly scalable.
[Figure: conceptual view of a distributed system]
Distributed computing networks can be connected as local networks or through a wide area
network if the machines are in a different geographic location. Processors in distributed computing
systems typically run in parallel.
In enterprise settings, distributed computing generally puts various steps in business processes at
the most efficient places in a computer network. For example, a typical distribution has a three-tier
model that organizes applications into the presentation tier (or user interface), the application
tier and the data tier. These tiers function as follows:
1. User interface processing occurs on the PC at the user's location
2. Application processing takes place on a remote computer
3. Database access and processing algorithms happen on another computer that provides
centralized access for many business processes
In addition to the three-tier model, other types of distributed computing include client-server,
n-tier and peer-to-peer:

• Client-server architectures. These use smart clients that contact a server for data, then format
and display that data to the user. (A minimal client-server sketch is given after this list.)
• N-tier system architectures. Typically used in application servers, these architectures use
web applications to forward requests to other enterprise services.
• Peer-to-peer architectures. These divide all responsibilities among all peer computers, which
can serve as clients or servers.
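To make the client-server architecture concrete, the following is a hedged Python sketch using the standard xmlrpc modules: a server exposes a small processing function and a thin client calls it over the network. The port number and the word_count function are invented for this illustration.

```python
# Hypothetical client-server sketch (single file for brevity):
# run "python rpc_demo.py server" in one terminal, "python rpc_demo.py client" in another.
import sys
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCServer

def word_count(text):
    # Server-side processing requested by remote clients (the application tier).
    return len(text.split())

def run_server():
    server = SimpleXMLRPCServer(("127.0.0.1", 8000), allow_none=True)
    server.register_function(word_count, "word_count")
    server.serve_forever()

def run_client():
    # The smart client (presentation tier) contacts the server and displays the result.
    proxy = xmlrpc.client.ServerProxy("http://127.0.0.1:8000/")
    print("words:", proxy.word_count("distributed systems appear as one coherent system"))

if __name__ == "__main__":
    run_server() if sys.argv[1:] == ["server"] else run_client()
```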
12. (a) Outline the various levels of virtualization with an example for each category.
ANS: Virtualization is implemented at various levels by creating a software abstraction layer
between the host OS and the guest OS. The main function of this software layer is to virtualize the
physical hardware of the host machine into virtual resources used by the VMs through various
operational layers. There are five implementation levels of virtualization: the Instruction Set
Architecture (ISA) level, the hardware level, the operating system level, the library support level
and the application level, which are explained as follows.
Instruction Set Architecture Level
Virtualization at the instruction set architecture level is implemented by emulating an instruction
set architecture completely in software. An emulator tries to execute instructions issued by the
guest machine (the virtual machine that is being emulated) by translating them to a set of native
instructions and then executing them on the available hardware. That is, the emulator works by
translating instructions of the guest platform into instructions of the host platform. These
instructions include both processor-oriented instructions (add, sub, jump, etc.) and I/O-specific
(IN/OUT) instructions for the devices. Although this virtual machine architecture works fine in
terms of simplicity and robustness, it has its own pros and cons. The advantages of ISA-level
virtualization are that it provides ease of implementation while dealing with multiple platforms,
and it can easily provide the infrastructure through which one can create virtual machines of one
ISA (for example x86) on hosts with a different ISA, such as SPARC or Alpha. The disadvantage
is that every instruction issued by the emulated computer needs to be interpreted in software first,
which degrades performance. A minimal sketch of such an instruction-by-instruction interpreter
is given below.
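As a purely illustrative sketch of the interpretation overhead described above, the following Python snippet emulates a made-up three-instruction guest ISA one instruction at a time; the instruction names, registers and sample program are hypothetical and not tied to any real architecture.

```python
# Toy ISA emulator: every guest instruction is decoded and interpreted in software,
# which is exactly why ISA-level virtualization is flexible but slow.
def emulate(program):
    regs = {"r0": 0, "r1": 0}       # invented guest registers
    pc = 0                          # guest program counter
    while pc < len(program):
        op, *args = program[pc]
        if op == "load":            # load an immediate value into a register
            regs[args[0]] = args[1]
        elif op == "add":           # add two registers into the first
            regs[args[0]] += regs[args[1]]
        elif op == "out":           # I/O-style instruction: print a register
            print(args[0], "=", regs[args[0]])
        else:
            raise ValueError(f"unknown guest instruction: {op}")
        pc += 1
    return regs

# A tiny guest "binary": r0 = 2; r1 = 3; r0 = r0 + r1; print r0  -> 5
emulate([("load", "r0", 2), ("load", "r1", 3), ("add", "r0", "r1"), ("out", "r0")])
```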
Hardware Abstraction Layer Virtualization
Virtualization at the Hardware Abstraction Layer (HAL) exploits the similarity in the architectures
of the guest and host platforms to cut down the interpretation latency. The time spent interpreting
guest-platform instructions into host-platform instructions is reduced by taking advantage of the
similarities that exist between them. The virtualization technique maps the virtual resources to
physical resources and uses the native hardware for computations in the virtual machine. This
approach generates a virtual hardware environment which virtualizes computer resources such as
CPU, memory and I/O devices. For HAL virtualization to work correctly, the VM must be able to
trap every privileged instruction execution and pass it to the underlying VMM, because multiple
VMs running their own OSes may issue privileged instructions that need the full attention of the
CPU. If this is not managed properly, a VM may execute such an instruction without generating
the trap or exception that hands control to the VMM, which can crash the system. However, the
most popular platform, x86, is not fully virtualizable, because it has been observed that certain
sensitive instructions fail silently rather than being trapped when executed with insufficient
privileges.
Operating System Level Virtualization
Operating system level virtualization is an abstraction layer between the OS and user applications.
It allows multiple operating environments and applications to run simultaneously without requiring
a reboot or dual boot. The degree of isolation of each environment is very high, and it can be
implemented at low risk with easy maintenance. Implementing operating system level
virtualization includes operating system installation, application suite installation, network setup,
and so on. Therefore, if the required OS is the same as the one on the physical machine, the user
basically ends up duplicating most of the effort he/she has already invested in setting up the
physical machine. To run applications properly, the operating system keeps the application-specific
data structures, user-level libraries, environmental settings and other requisites separately. The key
idea behind all OS-level virtualization techniques is that the virtualization layer above the OS
produces a partition per virtual machine on demand that is a replica of the operating environment
on the physical machine. With a careful partitioning and multiplexing technique, each VM can
export a full operating environment and remain fairly isolated from the other VMs and from the
underlying physical machine.
Library Level Virtualization
Most systems use an extensive set of Application Programming Interfaces (APIs), rather than
legacy system calls, to implement various libraries at the user level. Such APIs are designed to hide
operating-system-related details and keep programming simpler for normal programmers. In this
technique, the virtual environment is created above the OS layer and is mostly used to implement a
different Application Binary Interface (ABI) and Application Programming Interface (API) using
the underlying system. An example of library-level virtualization is WINE. Wine is an
implementation of the Windows API, and can be used as a library to port Windows applications
to UNIX. It is a virtualization layer on top of X and UNIX that exports the Windows API/ABI,
which allows Windows binaries to run on top of it.
Application-Level Virtualization
In this abstraction technique, the operating systems and user-level programs execute like
applications for the machine. Therefore, specialized instructions are needed for hardware
manipulations, such as I/O-mapped (manipulating the I/O) and memory-mapped (mapping a
chunk of memory to the I/O and then manipulating the memory) operations. The group of such
special instructions constitutes what is called application-level virtualization. The Java Virtual
Machine (JVM) is the popular example of application-level virtualization; it allows creating a
virtual machine at the application level rather than the OS level. It supports a new self-defined set
of instructions called Java byte codes for the JVM. Such VMs pose little security threat to the
system while letting the user work with them like physical machines. Like a physical machine, it
has to provide an operating environment to its applications, either by hosting a commercial
operating system or by coming up with its own environment.
12 b)Outline the problems in virtualizing in CPU, I/O and memory devices and suggest how
it could be overridden for efficient utilization of cloud services.
ANS: To support virtualization, processors such as the x86 employ a special running mode and
instructions, known as hardware-assisted virtualization. In this way, the VMM and guest OS run
in different modes and all sensitive instructions of the guest OS and its applications are trapped in
the VMM. To save processor states, mode switching is completed by hardware. For the x86
architecture, Intel and AMD have proprietary technologies for hardware-assisted virtualization.
1. Hardware Support for Virtualization
Modern operating systems and processors permit multiple processes to run simultaneously. If
there is no protection mechanism in a processor, all instructions from different processes will
access the hardware directly and cause a system crash. Therefore, all processors have at least two
modes, user mode and supervisor mode, to ensure controlled access of critical hardware.
Instructions running in supervisor mode are called privileged instructions. Other instructions are
unprivileged instructions. In a virtualized environment, it is more difficult to make OSes and
applications run correctly because there are more layers in the machine stack.
At the time of this writing, many hardware virtualization products were available. The VMware
Workstation is a VM software suite for x86 and x86-64 computers. This software suite allows
users to set up multiple x86 and x86-64 virtual computers and to use one or more of these VMs
simultaneously with the host operating system. The VMware Workstation assumes the host-based
virtualization. Xen is a hypervisor for use in IA-32, x86-64, Itanium, and PowerPC 970 hosts.
Actually, Xen modifies Linux as the lowest and most privileged layer, or a hypervisor.
One or more guest OS can run on top of the hypervisor. KVM (Kernel-based Virtual Machine) is
a Linux kernel virtualization infrastructure. KVM can support hardware-assisted virtualization and
paravirtualization by using the Intel VT-x or AMD-v and VirtIO framework, respectively. The
VirtIO framework includes a paravirtual Ethernet card, a disk I/O controller, a balloon device for
adjusting guest memory usage, and a VGA graphics interface using VMware drivers.
2. CPU Virtualization
A VM is a duplicate of an existing computer system in which a majority of the VM instructions
are executed on the host processor in native mode. Thus, unprivileged instructions of VMs run
directly on the host machine for higher efficiency. Other critical instructions should be handled
carefully for correctness and stability. The critical instructions are divided into three categories:
privileged instructions, control-sensitive instructions, and behavior-sensitive instructions.
Privileged instructions execute in a privileged mode and will be trapped if executed outside this
mode. Control-sensitive instructions attempt to change the configuration of resources used.
Behavior-sensitive instructions have different behaviors depending on the configuration of
resources, including the load and store operations over the virtual memory.
A CPU architecture is virtualizable if it supports the ability to run the VM’s privileged and
unprivileged instructions in the CPU’s user mode while the VMM runs in supervisor mode. When
the privileged instructions including control- and behavior-sensitive instructions of a VM are executed, they are trapped in the VMM. In this case, the VMM acts as a unified mediator for hardware
access from different VMs to guarantee the correctness and stability of the whole system.
However, not all CPU architectures are virtualizable. RISC CPU architectures can be naturally
virtualized because all control- and behavior-sensitive instructions are privileged instructions. On
the contrary, x86 CPU architectures are not primarily designed to support virtualization. This is
because about 10 sensitive instructions, such as SGDT and SMSW, are not privileged instructions.
When these instructions execute in virtualization, they cannot be trapped in the VMM.
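The classification above can be illustrated with a hedged Python sketch that models, in a toy way, how a VMM relies on trapping: privileged instructions trap to the VMM when executed in user mode, while x86-style sensitive-but-unprivileged instructions (such as SGDT and SMSW) fail silently instead of trapping. The instruction sets and handler below are invented for illustration only.

```python
# Toy model of trap-and-emulate; not real CPU behaviour, just the control flow.
PRIVILEGED = {"LGDT", "HLT"}                    # trap when executed in user mode
SENSITIVE_UNPRIVILEGED = {"SGDT", "SMSW"}       # x86 oddities: no trap, silent misbehaviour

def vmm_emulate(instr):
    # The VMM safely emulates the effect of the trapped instruction for the guest.
    print(f"VMM: emulated {instr} on behalf of the guest")

def run_in_user_mode(instr):
    if instr in PRIVILEGED:
        vmm_emulate(instr)                      # trap -> control transfers to the VMM
    elif instr in SENSITIVE_UNPRIVILEGED:
        print(f"{instr}: executed silently with wrong results (no trap!)")
    else:
        print(f"{instr}: unprivileged, runs directly on the host CPU")

for instr in ["ADD", "LGDT", "SGDT"]:
    run_in_user_mode(instr)
```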
3. Memory Virtualization
Virtual memory virtualization is similar to the virtual memory support provided by modern
operating systems. In a traditional execution environment, the operating system maintains
mappings of virtual memory to machine memory using page tables, which is a one-stage mapping
from virtual memory to machine memory. All modern x86 CPUs include a memory management
unit (MMU) and a translation lookaside buffer (TLB) to optimize virtual memory performance.
However, in a virtual execution environment, virtual memory virtualization involves sharing the
physical system memory in RAM and dynamically allocating it to the physical memory of the
VMs.
That means a two-stage mapping process should be maintained by the guest OS and the VMM,
respectively: virtual memory to physical memory and physical memory to machine memory.
Furthermore, MMU virtualization should be supported, which is transparent to the guest OS. The
guest OS continues to control the mapping of virtual addresses to the physical memory addresses
of VMs. But the guest OS cannot directly access the actual machine memory. The VMM is
responsible for mapping the guest physical memory to the actual machine memory.
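The two-stage mapping can be pictured as two table lookups: the guest OS page table maps guest-virtual pages to guest-physical pages, and the VMM's table maps guest-physical pages to machine pages. The following hedged Python sketch uses made-up page numbers purely for illustration.

```python
# Hedged illustration of two-stage address translation in a virtualized system.
guest_page_table = {0x1: 0xA, 0x2: 0xB}    # guest-virtual page -> guest-"physical" page
vmm_p2m_table    = {0xA: 0x7, 0xB: 0x3}    # guest-physical page -> real machine page

def translate(guest_virtual_page):
    guest_physical = guest_page_table[guest_virtual_page]   # stage 1: guest OS mapping
    machine_page = vmm_p2m_table[guest_physical]            # stage 2: VMM mapping
    return machine_page

# Guest virtual page 0x2 ends up on machine page 0x3.
print(hex(translate(0x2)))
```

In practice, VMMs avoid performing both lookups on every memory access by maintaining shadow page tables or by using hardware-assisted nested paging (for example, Intel EPT).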
On a native UNIX-like system, a system call triggers the 80h interrupt and passes control to the
OS kernel. The interrupt handler in the kernel is then invoked to process the system call. On a
para-virtualization system such as Xen, a system call in the guest OS first triggers the 80h interrupt
normally. Almost at the same time, the 82h interrupt in the hypervisor is triggered. Incidentally,
control is passed on to the hypervisor as well. When the hypervisor completes its task for the guest
OS system call, it passes control back to the guest OS kernel. Certainly, the guest OS kernel may
also invoke the hypercall while it’s running. Although paravirtualization of a CPU lets unmodified
applications run in the VM, it causes a small performance penalty.
4. I/O Virtualization
I/O virtualization involves managing the routing of I/O requests between virtual devices and the
shared physical hardware. At the time of this writing, there are three ways to implement I/O
virtualization: full device emulation, para-virtualization, and direct I/O. Full device emulation is
the first approach for I/O virtualization. Generally, this approach emulates well-known, real-world
devices.
13a) Describe the storage structure of S3 bucket with neat sketch and write rules in order to
make the bucket available for public access.
ANS: AWS S3 Terminology:
Bucket: Data, in S3, is stored in containers called buckets.
Each bucket will have its own set of policies and configuration. This enables users to have more
control over their data.
Bucket Names must be unique.
Can be thought of as a parent folder of data.
There is a limit of 100 buckets per AWS account, but it can be increased by request to AWS
support.
Bucket Owner: The person or organization that owns a particular bucket is its bucket owner.
Import/Export Station: A machine that uploads or downloads data to/from S3.
Key: A key, in S3, is a unique identifier for an object in a bucket. For example, if in a bucket 'ABC'
your GFG.java file is stored at javaPrograms/GFG.java, then 'javaPrograms/GFG.java' is your
object key for GFG.java.
It is important to note that 'bucketName + key' is unique for all objects.
This also means that there can be only one object for a key in a bucket. If you upload 2 files with
the same key, the file uploaded later will overwrite the previously stored file.
Versioning: Versioning means to always keep a record of previously uploaded files in S3. Points
to note:
Versioning is not enabled by default. Once enabled, it is enabled for all objects in a bucket.
Versioning keeps all the copies of your file, so, it adds cost for storing multiple copies of your
data. For example, 10 copies of a file of size 1GB will have you charged for using 10GBs for S3
space.
Versioning is helpful to prevent unintended overwrites and deletions.
Note that objects with the same key can be stored in a bucket if versioning is enabled (since they
have a unique version ID).
null Object: Version ID for objects in a bucket where versioning is suspended is null. Such objects
may be referred to as null objects.
For buckets with versioning enabled, each version of a file has a specific version ID.
Object: Fundamental entity type stored in AWS S3.
Access Control Lists (ACL): A document for verifying the access to S3 buckets from outside
your AWS account. Each bucket has its own ACL.
Bucket Policies: A document for verifying the access to S3 buckets from within your AWS
account, this controls which services and users have what kind of access to your S3 bucket. Each
bucket has its own Bucket Policies.
Lifecycle Rules: This is a cost-saving practice that can move your files to AWS Glacier (The
AWS Data Archive Service) or to some other S3 storage class for cheaper storage of old data or
completely delete the data after the specified time.
Features of AWS S3:
Durability: AWS claims Amazon S3 to have 99.999999999% durability (11 9's). This means
the expected chance of losing an object stored on S3 in a given year is roughly one in 100 billion.
Availability: AWS ensures that the up-time of AWS S3 is 99.99% for standard access.
Rules:
• Bucket names must be between 3 (min) and 63 (max) characters long.
• Bucket names can consist only of lowercase letters, numbers, dots (.), and hyphens (-).
• Bucket names must begin and end with a letter or number.
• Bucket names must not contain two adjacent periods.
• Bucket names must not be formatted as an IP address (for example, 192.168.5.4).
• Bucket names must not start with the prefix xn--.
• Bucket names must not end with the suffix -s3alias. This suffix is reserved for access point
alias names. For more information, see Using a bucket-style alias for your access point.
• Bucket names must be unique across all AWS accounts in all the AWS Regions within a
partition. A partition is a grouping of Regions. AWS currently has three partitions: aws
(Standard Regions), aws-cn (China Regions), and aws-us-gov (AWS GovCloud (US)).
• A bucket name cannot be used by another AWS account in the same partition until the bucket
is deleted.
• Buckets used with Amazon S3 Transfer Acceleration can't have dots (.) in their names. For
more information about Transfer Acceleration, see Configuring fast, secure file transfers using
Amazon S3 Transfer Acceleration.
To make a bucket publicly readable, the bucket's Block Public Access settings must be relaxed and
a bucket policy granting s3:GetObject to all principals must be attached, as sketched below.
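As a hedged sketch of those public-access steps (the bucket name and region are placeholders, and the snippet assumes the boto3 SDK with valid AWS credentials), the following Python code relaxes the bucket's Block Public Access settings and attaches a bucket policy that allows anonymous s3:GetObject on its objects.

```python
# Hypothetical example: make an existing bucket publicly readable with boto3.
import json
import boto3

BUCKET = "my-example-bucket"          # placeholder bucket name
s3 = boto3.client("s3", region_name="us-east-1")

# 1. Relax the bucket-level Block Public Access settings.
s3.put_public_access_block(
    Bucket=BUCKET,
    PublicAccessBlockConfiguration={
        "BlockPublicAcls": False,
        "IgnorePublicAcls": False,
        "BlockPublicPolicy": False,
        "RestrictPublicBuckets": False,
    },
)

# 2. Attach a bucket policy that grants everyone read access to the objects.
public_read_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "PublicReadGetObject",
        "Effect": "Allow",
        "Principal": "*",
        "Action": "s3:GetObject",
        "Resource": f"arn:aws:s3:::{BUCKET}/*",
    }],
}
s3.put_bucket_policy(Bucket=BUCKET, Policy=json.dumps(public_read_policy))
```

Public read access of this kind should be granted only to genuinely public content, such as static website assets or images meant for open distribution.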
13 b) Outline the various deployment models of cloud with neat sketch and identify which
among them could be applied to formulate cloud structure for a small firm.
ANS: Deployment Models
The cloud deployment model identifies the specific type of cloud environment based on
ownership, scale, and access, as well as the cloud’s nature and purpose. The location of the servers
you’re utilizing and who controls them are defined by a cloud deployment model. It specifies how
your cloud infrastructure will look, what you can change, and whether you will be given services
or will have to create everything yourself. Relationships between the infrastructure and your users
are also defined by cloud deployment types.
Different types of cloud computing deployment models are:
1. Public cloud
2. Private cloud
3. Hybrid cloud
4. Community cloud
5. Multi-cloud
Let us discuss them one by one:
1. Public Cloud
The public cloud makes it possible for anybody to access systems and services. The public cloud
may be less secure as it is open for everyone. The public cloud is one in which cloud infrastructure
services are provided over the internet to the general people or major industry groups. The
infrastructure in this cloud model is owned by the entity that delivers the cloud services, not by the
consumer. It is a type of cloud hosting that allows customers and users to easily access systems
and services. This form of cloud computing is an excellent example of cloud hosting, in which
service providers supply services to a variety of customers. In this arrangement, storage backup
and retrieval services are given for free, as a subscription, or on a per-use basis. Example: Google
App Engine etc.
Advantages of the public cloud model:
• Minimal Investment: Because it is a pay-per-use service, there is no substantial upfront
fee, making it excellent for enterprises that require immediate access to resources.
• No setup cost: The entire infrastructure is fully subsidized by the cloud service providers,
thus there is no need to set up any hardware.
• Infrastructure Management is not required: Using the public cloud does not necessitate
infrastructure management.
• No maintenance: The maintenance work is done by the service provider (not users).
• Dynamic Scalability: To fulfill your company's needs, on-demand resources are
accessible.
2. Private Cloud
The private cloud deployment model is the exact opposite of the public cloud deployment model.
It’s a one-on-one environment for a single user (customer). There is no need to share your hardware
with anyone else. The distinction between private and public cloud is in how you handle all of the
hardware. It is also called the “internal cloud” & it refers to the ability to access systems and
services within a given border or organization. The cloud platform is implemented in a cloud-based
secure environment that is protected by powerful firewalls and is under the supervision of the
organization's IT department.
The private cloud gives greater flexibility of control over cloud resources.
Advantages of the private cloud model:
• Better Control: You are the sole owner of the property. You gain complete command over
service integration, IT operations, policies, and user behavior.
• Data Security and Privacy: It's suitable for storing corporate information to which only
authorized staff have access. By segmenting resources within the same infrastructure,
improved access and security can be achieved.
• Supports Legacy Systems: This approach is designed to work with legacy systems that
are unable to access the public cloud.
• Customization: Unlike a public cloud deployment, a private cloud allows a company to
tailor its solution to meet its specific needs.
3. Hybrid cloud
By bridging the public and private worlds with a layer of proprietary software, hybrid cloud
computing gives the best of both worlds. With a hybrid solution, you may host the app in a safe
environment while taking advantage of the public cloud’s cost savings. Organizations can move
data and applications between different clouds using a combination of two or more cloud
deployment methods, depending on their needs.
Advantages of the hybrid cloud model:
• Flexibility and control: Businesses with more flexibility can design personalized
solutions that meet their particular needs.
• Cost: Because public clouds provide for scalability, you'll only be responsible for paying
for the extra capacity if you require it.
• Security: Because data is properly separated, the chances of data theft by attackers are
considerably reduced.
4. Community cloud
It allows systems and services to be accessible by a group of organizations. It is a distributed
system that is created by integrating the services of different clouds to address the specific needs
of a community, industry, or business. The infrastructure of the community could be shared
between the organization which has shared concerns or tasks. It is generally managed by a third
party or by the combination of one or more organizations in the community.
Advantages of the community cloud model:
• Cost Effective: It is cost-effective because the cloud is shared by multiple organizations
or communities.
• Security: Community cloud provides better security.
• Shared resources: It allows you to share resources, infrastructure, etc. with multiple
organizations.
• Collaboration and data sharing: It is suitable for both collaboration and data sharing.
5. Multi-cloud
We’re talking about employing multiple cloud providers at the same time under this paradigm, as
the name implies. It’s similar to the hybrid cloud deployment approach, which combines public
and private cloud resources. Instead of merging private and public clouds, multi-cloud uses many
public clouds. Although public cloud providers provide numerous tools to improve the reliability
of their services, mishaps still occur. It’s quite rare that two distinct clouds would have an incident
at the same moment. As a result, multi-cloud deployment improves the high availability of your
services even more.
Advantages of a multi-cloud model:
• You can mix and match the best features of each cloud provider's services to suit the
demands of your apps, workloads, and business by choosing different cloud providers.
• Reduced Latency: To reduce latency and improve user experience, you can choose cloud
regions and zones that are close to your clients.
• High availability of service: It's quite rare that two distinct clouds would have an incident
at the same moment. So, the multi-cloud deployment improves the high availability of your
services.
For a small firm, the public cloud deployment model is generally the most suitable choice, because
it requires no upfront hardware investment or infrastructure management and scales on a
pay-per-use basis.
14a) Represent the structure of inter-cloud resource management and explain why two or
more clouds need to interact with each other. Provide an example for the same.
ANS: Cloud computing is a novel area of research and still faces certain terminological ambiguity.
The area of Inter-Clouds is even newer, and many works in the area use several terms
interchangeably.
Inter-Cloud computing has been formally defined as a cloud model that, for the purpose of
guaranteeing service quality, such as the performance and availability of each service, allows
on-demand reassignment of resources and transfer of workload through a [sic] interworking of cloud
systems of different cloud providers based on coordination of each consumer's requirements for
service quality with each provider's SLA and use of standard interfaces.
In the rest of this work, we will adhere to this definition. The seminal works on Inter-Clouds by
Buyya et al. [5] and Bernstein et al. [6] also implicitly express similar definitions. Buyya et al.
emphasise the just-in-time, opportunistic nature of the provisioning within an Inter-Cloud that
allows for achieving QoS and quality of experience targets in a dynamic environment [5]. The term
Cloud Fusion has been used by Fujitsu Laboratories to denote a similar notion [18].
Note that this definition is generic and does not specify who is initiating the Inter-Cloud endeavour
– the cloud providers or the clients. Also, it does not specify whether cloud providers collaborate
voluntarily to form an Inter-Cloud or not. Two other terms are used throughout the related literature
to differentiate between these – Federation and Multi-Cloud. A Federation is achieved when a set
of cloud providers voluntarily interconnect their infrastructures to allow sharing of resources
among each other [19, 20, 8]. The term Multi-Cloud denotes the usage of multiple, independent
clouds by a client or a service. Unlike a Federation, a Multi-Cloud environment does not imply
volunteer interconnection and sharing of providers' infrastructures. Clients or their representatives
are directly responsible for managing resource provisioning and scheduling [19]. The term Sky
Computing has been used in several publications with a similar meaning [7, 21]. Both Federations
and Multi-Clouds are types of Inter-Clouds.
Clouds need to interact, for example, when one provider runs out of capacity during a demand
spike: in a federation it can transparently provision additional virtual machines from a partner
provider so that its consumers' SLAs are still met (cloud bursting).
14b) What is IAM and detail the segregation roles carried out by IAM when services of
multiple organizations are maintained within the same geographical location?
ANS: IAM Definition
Identity and access management (IAM) is a set of processes, policies, and tools for defining and
managing the roles and access privileges of individual network entities (users and devices) to a
variety of cloud and on-premises applications.
Users include customers, partners, and employees; devices include computers, smartphones,
routers, servers, controllers and sensors. The core objective of IAM systems is one digital identity
per individual or item. Once that digital identity has been established, it must be maintained,
modified, and monitored throughout each user’s or device’s access lifecycle.
IAM tools
IAM systems provide administrators with the tools and technologies to change a user’s role, track
user activities, create reports on those activities, and enforce policies on an ongoing basis. These
systems are designed to provide a means of administering user access across an entire enterprise
and to ensure compliance with corporate policies and government regulations. According to a
survey by Ping Identity, about "70% of global business executives plan to increase spending on
IAM for their workforce over the next 12 months, as a continuation of remote work increases
demand on IT and security teams." They also found that more than half of the companies surveyed
have invested in new IAM products since the pandemic began.
When the services of multiple organizations (tenants) are maintained within the same geographical
location or data centre, IAM segregates them by keeping separate tenant directories and enforcing
role-based, least-privilege access policies, so that an identity from one organization cannot assume
roles or reach resources belonging to another. A hedged example of such a per-tenant policy is
sketched below.
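As a hedged illustration of such role segregation (the tenant name, bucket and role are placeholders, and the snippet assumes the boto3 SDK with administrator credentials), the sketch below creates a least-privilege IAM policy that lets one tenant's role read only that tenant's own prefix in a shared S3 bucket, so co-located tenants stay isolated.

```python
# Hypothetical example of per-tenant role segregation with AWS IAM (boto3).
import json
import boto3

iam = boto3.client("iam")
TENANT = "org-a"                                   # placeholder tenant name

# Least-privilege policy: org-a's role may read only the org-a/ prefix of the shared bucket.
policy_doc = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["s3:GetObject"],
        "Resource": f"arn:aws:s3:::shared-tenant-bucket/{TENANT}/*",
    }],
}

policy = iam.create_policy(
    PolicyName=f"{TENANT}-readonly",
    PolicyDocument=json.dumps(policy_doc),
)

# Attach the policy to the tenant's role (assumed to exist already).
iam.attach_role_policy(
    RoleName=f"{TENANT}-app-role",
    PolicyArn=policy["Policy"]["Arn"],
)
```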
15 a) Detail the structure of Openstack and explain each of its components.
ANS: OpenStack contains a modular architecture along with several code names for the
components.
Nova (Compute)
Nova is the OpenStack project that provides a way to provision compute instances. Nova supports
creating virtual machines and bare-metal servers, and has limited support for system containers. It
runs as a set of daemons on top of existing Linux servers to provide that service. This component
is written in Python. It uses several external Python libraries such as SQLAlchemy (SQL toolkit
and object-relational mapper), Kombu (AMQP messaging framework) and Eventlet (concurrent
networking library). Nova is designed to scale horizontally: instead of switching to larger servers,
you procure many servers and install identically configured services on them. Because of its
widespread integration into organization-level infrastructure, monitoring OpenStack performance
in general, and Nova performance in particular, at scale has become a progressively important
issue.
Managing end-to-end performance requires tracking metrics from Swift, Cinder, Neutron,
Keystone, Nova, and other services, as well as analyzing RabbitMQ, which is used by the
OpenStack services for message passing. Each of these services produces its own log files, which
must be analyzed, especially in organization-level infrastructure. (A minimal sketch of launching
an instance through the OpenStack SDK is given below.)
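As a hedged sketch of how a client drives Nova (together with Glance for the image and Neutron for the network) through the official openstacksdk library, the snippet below boots a small instance; the cloud name, image, flavor and network names are placeholders that would come from your own deployment.

```python
# Hypothetical example: boot a Nova instance with the openstacksdk client library.
import openstack

# Credentials are read from clouds.yaml (the "mycloud" entry is a placeholder).
conn = openstack.connect(cloud="mycloud")

image = conn.compute.find_image("cirros-0.6.2")       # image served by Glance
flavor = conn.compute.find_flavor("m1.small")         # CPU/RAM/disk template
network = conn.network.find_network("private")        # network managed by Neutron

server = conn.compute.create_server(
    name="demo-vm",
    image_id=image.id,
    flavor_id=flavor.id,
    networks=[{"uuid": network.id}],
)
# Block until Nova reports the instance as ACTIVE.
server = conn.compute.wait_for_server(server)
print(server.name, server.status)
```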
Neutron (Networking)
Neutron can be defined as a project of OpenStack. It gives "network connectivity as a service"
facility between various interface devices (such as vNICs) that are handled by some other types of
OpenStack services (such as Nova). It operates the Networking API of OpenStack.
It handles every networking facet for VNI (Virtual Networking Infrastructure) and various
authorization layer factors of PNI (Physical Networking Infrastructure) in an OpenStack platform.
OpenStack networking allows projects to build advanced topologies of the virtual network. It can
include some of the services like VPN (Virtual Private Network) and a firewall.
Neutron permits dedicated static DHCP or IP addresses. It permits Floating IP addresses to enable
the traffic to be rerouted.
Users can apply SDN (Software-Defined Networking) technologies such as OpenFlow for
supporting scale and multi-tenancy. OpenStack networking could manage and deploy additional
services of a network such as VPN (Virtual Private Network), firewalls, load balancing, and
IDS (Intrusion Detection System).
Cinder (Block Storage)
Cinder is a service of OpenStack block storage that is used to provide volumes to Nova VMs,
containers, Ironic bare-metal hosts, and more. A few objectives of Cinder are as follows:
◦ Open-standard: be a reference implementation for community-driven APIs.
◦ Recoverable: failures must be easy to rectify, debug, and diagnose.
◦ Fault-tolerant: separated processes avoid cascading failures.
◦ Highly available: scales to serious workloads.
◦ Component-based architecture: new behaviors can be added quickly.
Cinder volumes facilitate persistent storage for guest VMs which are called instances. These are
handled by OpenStack compute software. Also, cinder can be used separately from other services
of OpenStack as software-defined stand-alone storage. This block storage system handles
detaching, attaching, replication, creation, and snapshot management of many block devices
to the servers.
Keystone (Identity)
Keystone is a service of OpenStack that offers shared multi-tenant authorization, service
discovery, and API client authentication by implementing Identity API of OpenStack. Commonly,
it is an authentication system around the cloud OS. Keystone could integrate with various directory
services such as LDAP. It also supports standard password and username credentials, Amazon
Web Services (AWS) style, and token-based systems logins. The catalog of keystone service
permits API clients for navigating and discovering various cloud services dynamically.
Glance (Image)
The glance service (image) project offers a service in which users can discover and upload data
assets. These assets are defined to be applied to many other services. Currently, it includes
metadata and image definitions.
Images
Image glance services include retrieving, registering, and discovering VM (virtual machine)
images. Glance contains the RESTful API which permits querying of virtual machine metadata
and retrieval of the actual image as well. Virtual machine images made available through Glance
can be stored in a variety of locations, from simple filesystems to object-storage
systems such as the OpenStack Swift project.
Metadata Definitions
Image hosts a metadefs catalog. It facilitates an OpenStack community along with a path to
determine several metadata valid values and key names that could be used for OpenStack
resources.
Swift (Object Storage)
Swift is an eventually consistent and distributed blob/object-store. The object store project of
OpenStack is called Swift and it provides software for cloud storage so that we can retrieve and
store a large amount of data through a simple API. It is built for scale and optimized for
durability, availability, and concurrency across the entire data set. Object storage is ideal for
storing unstructured data that can grow without limit.
In August 2009, Rackspace started development of the forerunner to OpenStack Object
Storage, as a complete replacement for its Cloud Files product. The initial development
team consisted of nine developers. Currently, the object storage company SwiftStack is the
prominent developer of OpenStack Swift, with significant contributions from IBM, HP, NTT, Red
Hat, Intel, and many more.
Horizon (Dashboard)
Horizon is the canonical implementation of the OpenStack Dashboard, which offers a web-based
UI to various OpenStack services such as Keystone, Swift, Nova, etc. The Dashboard ships with a
few central dashboards such as a "Settings Dashboard", a "System Dashboard", and a "User
Dashboard", which cover the core OpenStack applications. Horizon ships with a set of API
abstractions for the core OpenStack projects in order to provide a consistent, stable collection of
reusable methods for developers. With these abstractions, developers working on Horizon do not
need to be intimately familiar with the APIs of each OpenStack project.
Heat (Orchestration)
Heat is a service for orchestrating multiple composite cloud applications using templates, through
both a CloudFormation-compatible Query API and an OpenStack-native REST API.
Mistral (Workflow)
Mistral is the OpenStack service that manages workflows. Typically, the user writes a workflow
in Mistral's YAML-based workflow language and uploads the workflow definition to Mistral via
its REST API. After that, the user can start the workflow manually through the same API, or
configure a trigger to start the workflow on some event.
Ceilometer (Telemetry)
OpenStack Ceilometer (Telemetry) offers a single point of contact for billing systems,
providing all the counters they require to establish customer billing across every current and future
OpenStack component. The delivery of counters is auditable and traceable, the counters must be
easily extensible to support new projects, and the agents performing data collection should be
independent of the overall system.
Trove (Database)
Trove is a database-as-a-service that provisions both relational and non-relational database
engines.
Sahara (Elastic map-reduce)
Sahara is a component for rapidly and easily provisioning Hadoop clusters. Users
define parameters such as the Hadoop version number, node flavor details
(RAM and CPU settings, disk space), cluster topology type, and more. After the user
provides these parameters, Sahara deploys the cluster in minutes. Sahara also offers a means of
scaling a pre-existing Hadoop cluster by adding and removing worker nodes on demand.
Ironic (Bare metal)
Ironic is the OpenStack project that provisions bare-metal machines rather than virtual machines.
Initially, Ironic was forked from the Nova bare-metal driver and has evolved into a separate
project. It is best thought of as a bare-metal hypervisor API and a set of plugins that interact with
various bare-metal hypervisors. By default, it uses PXE and IPMI in concert to provision machines
and to power them on and off, although Ironic supports and can be extended with vendor-specific
plugins to implement additional functionality.
Zaqar (Messaging)
Zaqar is a multi-tenant cloud messaging service for web developers. It offers a fully RESTful API
that developers can use to send messages between the various components of their SaaS and
mobile applications, using a variety of communication patterns. The API is backed by a messaging
engine built with scalability and security in mind. Other OpenStack components can integrate with
Zaqar to surface events to end users and to communicate with guest agents that run in the
over-cloud layer.
Designate (DNS)
Designate is a multi-tenant REST API for managing DNS; it provides DNS as a
Service. This component is compatible with various backend technologies such as BIND and
PowerDNS. It does not itself run a DNS server; its goal is to interface with existing DNS servers
to manage DNS zones on a per-tenant basis.
Manila (Shared file system)
OpenStack Manila (Shared file system) provides an open, vendor-agnostic API for managing
file shares. Standard primitives include the ability to create and delete shares and to give or deny
access to a share. It can be used standalone or in a range of different network environments.
Commercial storage appliances from Hitachi, INFINIDAT, Quobyte, Oracle, IBM, HP, NetApp,
and EMC are supported, as well as filesystem technologies such as Ceph and Red
Hat GlusterFS.
Searchlight (Search)
Searchlight offers consistent and advanced search capabilities across many OpenStack cloud
services. It accomplishes this by offloading user search queries from the other OpenStack API
servers, indexing their data into ElasticSearch. This component is being integrated into
Horizon, and it also offers a command-line interface.
Magnum (Container orchestration)
Magnum is an OpenStack API service developed by the OpenStack Containers Team that makes
container orchestration engines such as Apache Mesos, Kubernetes, and Docker Swarm
available as first-class resources within OpenStack. Magnum uses Heat to orchestrate
an operating system image that contains Docker and Kubernetes, and runs that image
on either virtual machines or bare metal in a cluster configuration.
Barbican (Key manager)
Barbican is the REST API developed for the management, provisioning, and secure storage of
secrets. Barbican is aimed at being useful for all environments, including large ephemeral
clouds.
Vitrage (Root Cause Analysis)
Vitrage is the OpenStack Root Cause Analysis (RCA) service for organizing, analyzing and
expanding OpenStack alarms and events, yielding insights regarding the root cause of problems
and deducing their existence before they are directly detected.
Aodh (Rule-based alarm actions)
This alarming service enables the ability to trigger actions based on defined rules against metric
or event data collected by Ceilometer or Gnocchi.
15B) Write detailed steps to set the google app engine environment for executing any
program of your choice.
ANS:
Steps to Deploy an Application in the App Engine
With the above information, it is easy to understand the process of deploying an application in the
App Engine, as mentioned below:
1. Go to the Google Cloud console and create an App Engine application with runtime 'Node.js'
and environment 'standard', present under 'Compute'.
2. Open Cloud Shell and clone the source code from the repo
https://github.com/vishnu123sai/App-engine-example.git
3. The configuration file app.yaml is also available in the repo. In this example, other options like
scaling, resources, etc. in the configuration file are not used for the sake of simplicity.
4. Type "gcloud app deploy" to deploy your application.
5. Once the application is deployed, the output will be visible at the link [PROJECT_ID].appspot.com.
For example, if the project id is vital-framing-245415, then the application URL will be
https://vital-framing-245415.appspot.com/
6. Test the URL in Chrome.
Congratulations! Your application is successfully deployed on the App Engine. (A hedged sketch
of a minimal app that can be deployed with the same flow is given below.)
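As a hedged variant of the flow above (the original repo uses the Node.js runtime; this sketch assumes the Python 3 standard environment instead, with Flask listed in requirements.txt and an app.yaml that declares a Python runtime), a minimal deployable app could look like this:

```python
# main.py -- minimal Flask app for the App Engine Python 3 standard environment.
# Assumed, not from the original repo: app.yaml contains "runtime: python39" and
# requirements.txt lists Flask. Deploy with:  gcloud app deploy
from flask import Flask

app = Flask(__name__)

@app.route("/")
def hello():
    # App Engine routes incoming HTTP requests to this WSGI application.
    return "Hello from App Engine!"

if __name__ == "__main__":
    # Local testing only; in production App Engine runs the app with its own server.
    app.run(host="127.0.0.1", port=8080, debug=True)
```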
16 B) Elaborate the working of MapReduce with an example
ANS:
Hadoop is highly scalable. You can start with as low as one machine, and then expand your cluster
to an almost unlimited number of servers. The two major default components of this software
library are:
• MapReduce
• HDFS – Hadoop Distributed File System
Here we focus on the first of the two modules: what MapReduce is, how it works, and the basic
Hadoop MapReduce terminology.
At a high level, MapReduce breaks input data into fragments and distributes them across different
machines.
The input fragments consist of key-value pairs. Parallel map tasks process the chunked data on
machines in a cluster. The mapping output then serves as input for the reduce stage. The reduce
task combines the result into a particular key-value pair output and writes the data to HDFS.
The Hadoop Distributed File System usually runs on the same set of machines as the MapReduce
software. When the framework executes a job on the nodes that also store the data, the time to
complete the tasks is reduced significantly.
As we mentioned above, MapReduce is a processing layer in a Hadoop environment. MapReduce
works on tasks related to a job. The idea is to tackle one large request by slicing it into smaller
units.
JobTracker and TaskTracker
In the early days of Hadoop (version 1), JobTracker and TaskTracker daemons ran operations
in MapReduce. At the time, a Hadoop cluster could only support MapReduce applications.
A JobTracker controlled the distribution of application requests to the compute resources in a
cluster. Since it monitored the execution and the status of MapReduce, it resided on a master node.
A TaskTracker processed the requests that came from the JobTracker. All task trackers were
distributed across the slave nodes in a Hadoop cluster.
YARN
Later, in Hadoop version 2 and above, YARN became the main resource and scheduling
manager, hence the name Yet Another Resource Negotiator. YARN also works with other
frameworks for distributed processing in a Hadoop cluster.
MapReduce Job
A MapReduce job is the top unit of work in the MapReduce process. It is an assignment that Map
and Reduce processes need to complete. A job is divided into smaller tasks over a cluster of
machines for faster execution.
The tasks should be big enough to justify the task handling time. If you divide a job into unusually
small segments, the total time to prepare the splits and create tasks may outweigh the time needed
to produce the actual job output.
MapReduce Task
MapReduce jobs have two types of tasks.
A Map Task is a single instance of a MapReduce app. These tasks determine which records to
process from a data block. The input data is split and analyzed, in parallel, on the assigned compute
resources in a Hadoop cluster. This step of a MapReduce job prepares the <key, value> pair output
for the reduce step.
A Reduce Task processes an output of a map task. Similar to the map stage, all reduce tasks occur
at the same time, and they work independently. The data is aggregated and combined to deliver
the desired output. The final result is a reduced set of <key, value> pairs which MapReduce, by
default, stores in HDFS.
The Map and Reduce stages have two parts each.
The Map part first deals with the splitting of the input data that gets assigned to individual map
tasks. Then, the mapping function creates the output in the form of intermediate key-value pairs.
The Reduce stage has a shuffle and a reduce step. Shuffling takes the map output and creates a
list of related key-value-list pairs. Then, reducing aggregates the results of the shuffling to produce
the final output that the MapReduce application requested.
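As a worked example of the map and reduce stages described above, the following hedged Python sketch implements the classic word count for Hadoop Streaming; the input/output paths and the streaming-jar location in the usage comment are placeholders.

```python
#!/usr/bin/env python3
"""Word count for Hadoop Streaming: run the same file as mapper or reducer.
Illustrative usage (paths and jar name are placeholders):
  hadoop jar hadoop-streaming.jar \
      -input /data/in -output /data/out \
      -mapper "wordcount.py map" -reducer "wordcount.py reduce" \
      -file wordcount.py
"""
import sys

def mapper():
    # Map stage: emit a <word, 1> pair for every word read from stdin.
    for line in sys.stdin:
        for word in line.split():
            print(f"{word.lower()}\t1")

def reducer():
    # Hadoop sorts map output by key, so all counts for a word arrive contiguously.
    current, total = None, 0
    for line in sys.stdin:
        word, count = line.rstrip("\n").split("\t")
        if word != current:
            if current is not None:
                print(f"{current}\t{total}")
            current, total = word, 0
        total += int(count)
    if current is not None:
        print(f"{current}\t{total}")

if __name__ == "__main__":
    mapper() if sys.argv[1:] == ["map"] else reducer()
```

The same logic can be tested locally without a cluster by piping a text file through the mapper, `sort`, and the reducer, which mimics Hadoop's shuffle step.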