Uploaded by Farhanah Afendi

BIG DATA

What is Big Data?
❑ "Extremely large collections of data (data sets) that
may be analysed to reveal patterns, trends, and
associations, especially relating to human behaviour
and interactions."
❑ The data sets are so large that conventional methods
of storing and processing the data will not work.
Characteristics of Big Data (5Vs)
❑ Volume - refers to the huge amounts of data that are collected and generated
every second in large organizations. This data comes from different sources such as IoT
devices, social media, videos, financial transactions, and customer logs.
❑ Variety - refers to the different sources of data and their nature. Data can be both
structured and unstructured.
Structured data: this data is stored within defined fields (numerical, text, date, etc.), often with
defined lengths, within a defined record, in a file of similar records. An example of structured
data is found in banking systems, which record the receipts and payments from your current
account: date, amount, receipt/payment, and a short explanation such as the payee or source of the
money.
Unstructured data: refers to information that does not have a pre-defined data-model. It
comes in all shapes and sizes and it is this variety and irregularity which makes it difficult
to store in a way that will allow it to be analyzed, searched or otherwise used. Examples
are photos, audio files, videos, text files, and PDFs.
❑ Velocity - refers to the speed at which the data is created or generated. The
speed at which data is produced also determines how fast it needs to be
processed, because the data can meet the demands of the clients/users only
after it has been analysed and processed.
❑ Veracity - means accuracy and truthfulness and relates to the quality of the data.
It defines the degree of trustworthiness of the data. As most of the data you encounter
is unstructured, it is important to filter out the unnecessary information and use the rest
for processing.
❑ Value - refers to data that is reliable and useful and therefore adds value to the company.
An example is how the British supermarket group Tesco used data analysis to add
value (refer to the Big Data 1 Article).
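The difference between structured and unstructured data described under Variety can be sketched in a few lines of Python. This is only an illustration: the `Transaction` record, its field names, and the sample values are hypothetical, loosely modelled on the banking example above.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Transaction:
    """Structured record: every field has a defined name and type (hypothetical layout)."""
    when: date
    amount: float
    kind: str          # "receipt" or "payment"
    explanation: str   # payee or source of the money


# Structured data: all records share the same fields, so they can be queried directly.
transactions = [
    Transaction(date(2024, 1, 5), 1200.00, "receipt", "Salary"),
    Transaction(date(2024, 1, 7), 45.50, "payment", "Grocery store"),
    Transaction(date(2024, 1, 9), 30.00, "payment", "Phone bill"),
]
total_payments = sum(t.amount for t in transactions if t.kind == "payment")

# Unstructured data: free text has no pre-defined data model (invented sample).
note = "Paid the phone bill today, about thirty pounds I think."
# There is no 'amount' field to add up; extracting one would need text analytics.
has_amount_field = hasattr(note, "amount")

print(total_payments)
print(has_amount_field)
```

The structured records can be totalled with a single expression, whereas the free-text note has to be interpreted before any figure can be taken from it.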
Processing and Analysing big data
(known as Big Data Analytics)
❑ Data mining: analysing data to identify patterns and establish relationships such as
associations (where several events are connected), sequences (where one event
leads to another) and correlations.
❑ Predictive analytics: a type of data mining which aims to predict future events.
For example, the chance of someone being persuaded to upgrade a flight.
❑ Text analytics: scanning text such as emails and word processing documents to
extract useful information. It could simply be looking for key-words that indicate an
interest in a product or place.
❑ Voice analytics: as above but with audio.
❑ Statistical analytics: used to identify trends, correlations and changes in behaviour.
These analytical findings can lead to:
▪ Better marketing
▪ Better customer service and relationship management
▪ Increased customer loyalty
▪ Increased competitive strength
▪ Increased operational efficiency
▪ The discovery of new sources of revenue.
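As a minimal sketch of the simplest kind of text analytics mentioned above (looking for keywords that indicate an interest in a product or place), the Python below counts keyword occurrences in a message. The keyword list, the function name `keyword_hits`, and the sample email are all assumptions made for illustration; real text analytics tools go well beyond this.

```python
import re
from collections import Counter

# Hypothetical keywords signalling interest in a product or place.
KEYWORDS = {"upgrade", "holiday", "paris"}


def keyword_hits(text, keywords=KEYWORDS):
    """Count how often each keyword appears in the text (case-insensitive, whole words)."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(word for word in words if word in keywords)


# Invented sample message.
email = "Thinking about a holiday in Paris. Is an upgrade possible? Paris looks lovely."
hits = keyword_hits(email)
print(hits)
```

A marketing system could use counts like these as a crude signal of interest, for example to decide which customers receive an offer.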
The Big Data (DIKW) Pyramid
❑ Also known as the knowledge pyramid, it became well known in 1989 from the work
of Ackoff.
❑ With the emergence of big data, the pyramid has also become known as the big
data pyramid.
Jennifer Rowley in 2007 explained the relationships between data, information,
knowledge and wisdom.
❑ Data: a range of data can be collected from various sources – this is raw data and
not particularly useful in this form.
❑ Information: The raw data can be analysed to look for trends or patterns. For
example, it may appear that there is a link between the purchase of a particular
product and a particular group of customers. This is information.
❑ Knowledge: The information can be analysed further to establish how the
identified links are connected. Knowing the details of exactly what types of
customers buy a particular product or favour particular product features is
knowledge.
❑ Wisdom: The knowledge gathered can be used to make informed business
decisions.
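The four levels can be traced through a small worked example. The sketch below is illustrative only: the purchase records, customer groups, and the final decision rule are invented, and in practice each step would involve far more data and judgement.

```python
from collections import Counter

# Data: raw purchase records as (customer_group, product); invented for illustration.
purchases = [
    ("students", "headphones"),
    ("students", "headphones"),
    ("retirees", "headphones"),
    ("students", "laptop"),
    ("retirees", "radio"),
    ("students", "headphones"),
]

# Information: summarise the raw data to expose a pattern between product and group.
headphone_buyers = Counter(
    group for group, product in purchases if product == "headphones"
)

# Knowledge: establish exactly which group is most strongly linked to the product.
top_group, top_count = headphone_buyers.most_common(1)[0]

# Wisdom: use that knowledge to make a business decision (a hypothetical rule).
decision = f"Target headphone promotions at {top_group}"

print(headphone_buyers)
print(decision)
```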
Big data is relevant to performance
management in the following ways:
❑ Gaining insights (e.g. about customers’ preferences) which can then
be used to improve marketing and sales, thus increasing profits and
shareholders’ wealth.
❑ Forecasting better (e.g. customers’ future spending patterns, when
machines will need replacing) so that more appropriate decisions can
be made.
❑ Automating high-level business processes (e.g. lawyers scanning
documents), which can lead to organisations becoming more efficient.
❑ Providing more detailed and up-to-date performance measurement.
Some potential dangers and drawbacks
of Big Data:
❑ Cost: It is expensive to establish the hardware and analytical software needed,
though these costs are continually falling.
❑ Regulation: Some countries and cultures worry about the amount of information
that is being collected and have passed laws governing its collection, storage and
use. Breaking a law can have serious reputational and punitive consequences.
❑ Loss and theft of data: Apart from the consequences arising from regulatory
breaches as mentioned above, companies might find themselves open to civil legal
action if data were stolen and individuals suffered as a consequence.
❑ Incorrect data: If the data held is incorrect or out of date, incorrect conclusions are
likely. Even if the data is correct, some correlations might be spurious, leading to false
positive results.
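The risk of spurious correlation noted under incorrect data can be shown with a short calculation. The sketch below computes the Pearson correlation coefficient for two invented series that have no causal connection but both happen to grow over time; the figures and variable names are assumptions chosen purely for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Invented yearly figures for two unrelated quantities that both trend upwards.
gym_memberships = [10, 12, 15, 18, 22, 25]
cinema_openings = [3, 4, 4, 6, 7, 8]

r = pearson(gym_memberships, cinema_openings)
print(round(r, 2))
```

The coefficient comes out close to 1 even though neither series drives the other; the shared upward trend alone produces the apparent relationship, which is why a strong correlation in a large data set should not be treated as proof of a real link.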
Examples of companies using big data
(Big Data Article 2)
▪ Amazon - The world’s leading e-retailer collects huge amounts of
information about customers’ preferences and habits which allow it to
market very accurately to each customer. It routinely makes
recommendations to customers based on products previously purchased.
▪ Airlines
▪ Target
▪ Walmart’s Polaris search engine
▪ Beyerdynamic - manufacturer of high quality audio products such as
microphones and headphones
▪ Morton’s Steak House
New A1e:
Controls over confidential information
• Encrypt sensitive files.
Encryption is a process that renders data unreadable to anyone except those
who have the appropriate password or key. By encrypting sensitive files (by
using file passwords, for example), you can protect them from being read or
used by those who are not entitled to do either.
• Manage data access.
Controlling confidentiality is, in large part, about controlling who has access to
data. Ensuring that access is only authorized and granted to those who have a
"need to know" goes a long way in limiting unnecessary exposure. Users should
also authenticate their access with strong passwords and, where practical, two-factor authentication. Periodically review access lists and promptly revoke
access when it is no longer necessary.
• Physically secure devices and paper documents.
Controlling access to data includes controlling access of all kinds, both digital
and physical. Protect devices and paper documents from misuse or theft by
storing them in locked areas. Never leave devices or sensitive documents
unattended in public locations.
• Securely dispose of data, devices, and paper records.
When data is no longer necessary for University-related purposes, it must
be disposed of appropriately.
• Sensitive data, such as Social Security numbers, must be securely erased
to ensure that it cannot be recovered and misused.
• Devices that were used for University-related purposes or that were
otherwise used to store sensitive information should be destroyed or
securely erased to ensure that their previous contents cannot be
recovered and misused.
• Paper documents containing sensitive information should be shredded
rather than dumped into trash or recycling bins.
• Manage data acquisition.
When collecting sensitive data, be conscious of how much data is actually needed and
carefully consider privacy and confidentiality in the acquisition process. Avoid acquiring
sensitive data unless absolutely necessary; one of the best ways to reduce confidentiality risk
is to reduce the amount of sensitive data being collected in the first place.
• Manage data utilization.
Confidentiality risk can be further reduced by using sensitive data only as approved and as
necessary. Misusing sensitive data violates the privacy and confidentiality of that data and of
the individuals or groups the data represents.
• Manage devices.
Computer management is a broad topic that includes many essential security practices. By
protecting devices, you can also protect the data they contain. Follow basic cybersecurity
hygiene by using anti-virus software, routinely patching software, whitelisting applications,
using device passcodes, suspending inactive sessions, enabling firewalls, and using whole-disk encryption.
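The advice under "Manage data acquisition" (collect only the sensitive data that is actually needed) can be sketched as a simple filter applied before anything is stored. The field names, the `REQUIRED_FIELDS` set, and the `minimise` function are hypothetical, and the record is placeholder data; this is one possible approach, not a prescribed control.

```python
# Hypothetical set of fields the analysis actually needs.
REQUIRED_FIELDS = {"customer_id", "postcode_area", "basket_total"}


def minimise(record, required=REQUIRED_FIELDS):
    """Keep only the required fields; everything else is dropped before storage."""
    return {key: value for key, value in record.items() if key in required}


# Placeholder record as it might arrive from a collection form (not real data).
raw = {
    "customer_id": 1017,
    "postcode_area": "M1",
    "basket_total": 42.10,
    "social_security_number": "000-00-0000",
    "date_of_birth": "1990-01-01",
}

stored = minimise(raw)
print(sorted(stored))
```

Because the sensitive fields are never written to storage, they cannot later be lost, stolen, or misused.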