
How to Become a Data Scientist in 180 days

Dedicated to our thriving community – which learns, loves, shares, competes, provides
feedback and owns what we have built as much as we do.
A Big Thanks to the entire Analytics Vidhya team, who built this fabulous community and
continue to do so. The hard work you all put in enables me to think bigger and give shape
to our journey.
Special Thanks to all the authors on Analytics Vidhya who contributed to the book! This
would not be in this form without your contributions.
About the book
Are you struggling to land a data science role? Have you taken courses, certifications and
degrees, participated in competitions, spent countless hours going through videos and
articles, and yet can’t quite make the breakthrough?
We’ve been there. Landing a role in data science is one of the most daunting prospects.
And yet thousands of freshers and transitioners are still trying to figure out how to do it.
That’s because this is one of the most rewarding fields to work in right now. You’ve chosen the
right space, but you might be agonizing over how in the world you can land a role.
Also, you might be wondering - is this book right for me?
Well, if you answered yes to any of the above questions, or found yourself nodding to the
points after that, then yes, this book is absolutely for you!
We understand the pain of fruitless effort and we want to help you overcome that. Analytics
Vidhya has helped thousands of aspiring data scientists make the leap and we want to help
you do the same.
We’ve seen countless aspirants abandon their journeys midway because of unstructured
resources and the lack of a concrete plan. We want to help you avoid that.
So yes, this book is for you.
About the Author
Analytics Vidhya is the World's Leading Data Science Community & Knowledge Portal. The
mission is to create the next-gen data science ecosystem! This platform allows people to
learn & advance their skills through various training programs, know more about data
science from its articles, Q&A forum, and learning paths. Also, we help professionals &
amateurs to sharpen their skill sets by providing a platform to participate in Hackathons.
Our readers stay updated with the latest happenings in the world of analytics through
our monthly newsletters. Stay in touch with us to become a well-informed data
practitioner. www.analyticsvidhya.com.
Our Other Platforms
Courses: https://courses.analyticsvidhya.com/
Blog: https://www.analyticsvidhya.com/blog/
DataHack: https://datahack.analyticsvidhya.com/contest/all/
Jobs: https://jobsnew.analyticsvidhya.com/jobs/all
Bootcamp: https://www.analyticsvidhya.com/data-science-immersive-bootcamp/
Initiate AI: https://initiateai.analyticsvidhya.com/
Discuss: https://discuss.analyticsvidhya.com/
Introduction
From Google, Microsoft, Facebook to Swiggy, Zomato, everybody wants to get on just one
bandwagon – Data Science and Machine Learning.
There is no denying that data science is one of the fastest-growing fields, and its job
opportunities are growing along with it.
The global machine learning market is expected to reach $20.83 Billion by the year 2024. That’s
massive!
According to Glassdoor, the average pay scale of a data scientist is Rs. 100k per year in India
whereas the average salary of a computer programmer is Rs. 400k per year. That is the kind of
scale we are talking about.
It is an exciting time to be in the field of data science!
However, becoming a data scientist isn't a linear path. There is an endless number of
unstructured resources out there, opportunities are limited, and mentors are scarce. It is also
hard to manage your time if you are a working professional or a student.
We understand the importance of structured resources and time management in order to fulfil
your dream of becoming a successful data scientist and that’s why we bring you this book!
So, if you are ready to take up the uphill challenge of becoming a data scientist in 180 days then
we will start with talking about the most essential things you need to follow during this time. We
will then cover the skills you need to master and finally the roadmap to become a data scientist
in 180 days.
“Did you know? The Data Science Immersive Bootcamp is a program that aims to make you a
data scientist in 180 days. During the 180 days, you also get to be part of a paid internship.”
What do you need to follow to become a data scientist in 180
days?
So you have finally decided to pursue a career as a data scientist! It's exciting, but let us warn
you: it requires sheer grit and determination to finish what you started. Let us discuss some of
the points you must take to heart to turn your dream into reality!
Devote 5-6 hours daily
YES! You heard it right! If you are planning on taking up the uphill task of becoming a data
scientist within 180 days, then you need to devote 5-6 hours daily for the next 180 days (don't
worry, you can rest on the weekends). This will help you maintain continuous
concentration and full attention. During this time you will not only be learning about data
science but also working on hands-on projects.
Follow a Plan
Now that you have planned to give 5-6 hours a day to your data science career, you need a solid
master plan to follow. We suggest you note down all the tools and techniques you
plan to cover and start collecting the relevant resources you will refer to.
You can also refer to the roadmap section below to get your roadmap to become a data scientist
in 180 days.
No Disturbance
Remember the last time you achieved something big, or the last time you performed really
well in an exam? It must have required a lot of hard work, focus and determination.
You will need the same effort here. So, put your phone on silent mode and start working!
Learning isn't enough: Practical Applications are important
During the 180 days of data science, you won't just be learning through videos or blogs. It is
imperative that you build hands-on skills. There are several ways to do this, such as
working on projects, taking up internships, writing blogs, and participating in hackathons.
We will discuss this more in the later sections.
Now that you are well equipped with everything you will need for the next 180 days, let us move
deeper into the skills you will acquire during this time.
We know it’s very difficult to follow this advice if you don’t have the required resources and
plans. Data Science Immersive Bootcamp is an online, instructor-led, full-time program that
solves these problems.
Skills required to become a successful data scientist
Data science is a multi-faceted role. There is no one-size-fits-all approach to learning data
science. Having said that, there are a few core skills you will need to pick up to make a
successful career transition to data science.
Here are the key skills you would need:
● Programming Language
● Statistics
● Machine learning concepts
● Structured thinking
● Ability to work with Databases
● Communication skills
Apart from these core skills, there are other skills you should be aware of, such as:
1. Deep Learning concepts
2. Big Data
3. Software Engineering
4. Model Deployment
Let's go in-depth and talk more about the skills that you will need to become a successful data
scientist.
Programming Language
Machine Learning has seen a great jump only because of the boost in computing power.
Programming provides us a way to communicate with machines. Do you need to become the
best in programming? Not at all. But you will definitely need to be comfortable with it.
First of all, pick a programming language. Python, R, and Julia are a few of the options,
and each has its own set of pros and cons. Python is a general-purpose programming
language with many data science libraries and support for rapid prototyping, whereas R is a
language for statistical analysis and visualization. Julia aims to offer the best of both worlds
and is designed for speed. If you are confused about which language to choose, we have compiled
a resourceful article for you:
● 5 Popular Data Science Languages – Which One Should you Choose for your Career?
Python is the market leader right now and continues to be widely used in the
industry. It's a lot easier to perform machine learning tasks using Python, due to the
availability of libraries and high support for deep learning.
Statistics
Statistics is the grammar of data science.
When you start learning to write sentences, you must be familiar with grammar to build the right
sentences. Similarly, statistics is an essential foundation before you can produce high-quality
models. Machine Learning starts out as statistics and then advances. Even linear
regression is an age-old statistical analysis concept. 🙂
Knowledge of descriptive statistics such as mean, median, mode, variance, and
standard deviation is a must. Then come the various probability distributions, sample and
population, the Central Limit Theorem (CLT), skewness and kurtosis, and inferential statistics:
hypothesis testing, confidence intervals, and so on.
Statistics is a MUST for becoming a data scientist. You can dive deep into some of these
concepts with these clear articles and their examples:
● Comprehensive and Practical Guide to Learn Inferential Statistics
● Statistics for Data Science: What is Normal Distribution?
● Statistics for Analytics and Data Science: Hypothesis Testing and Z-Test vs. T-Test
● Statistics for Data Science: What is Skewness and Why is it Important?
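To make the descriptive statistics above concrete, here is a minimal sketch using Python's
built-in statistics module. The order counts are made-up illustration data, not figures from
this book. Notice how a single large value pulls the mean above the median, which is a quick
hint that the data is skewed.

```python
import statistics

# Hypothetical sample: daily order counts for ten days (made-up data).
orders = [12, 15, 11, 15, 18, 22, 15, 13, 40, 19]

mean = statistics.mean(orders)          # arithmetic average
median = statistics.median(orders)      # middle value of the sorted data
mode = statistics.mode(orders)          # most frequent value
variance = statistics.variance(orders)  # sample variance (divides by n - 1)
std_dev = statistics.stdev(orders)      # sample standard deviation

print(f"mean={mean}, median={median}, mode={mode}")
print(f"variance={variance:.2f}, std_dev={std_dev:.2f}")

# The outlier (40) drags the mean above the median: a sign of right skew.
print("right-skewed hint:", mean > median)
```

Use statistics.pvariance and statistics.pstdev instead if your data is the whole population
rather than a sample.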
Machine Learning Concepts
For a data scientist, machine learning is the core skill to have. Machine learning is used to build
predictive models. For example, if you want to predict the number of customers you will have in
the next month by looking at the past months’ data, you will need to use machine learning
algorithms.
You can start with simple linear and logistic regression models and then move on to
advanced ensemble models like Random Forest, XGBoost, CatBoost, and so on. It’s good
to know the code for these algorithms (which takes just 2-3 lines), but what’s most important is
to know how they work. This will help you with hyperparameter tuning and ultimately with
building a model that gives a low error rate.
If you are looking for specialization, Natural Language Processing (NLP) and Computer Vision
are two fields that are absolutely thriving right now. Each requires you to dive deep into those
specific fields so make sure you're aware of what you're getting into.
This is as good a place to start as any:
● Commonly Used Machine Learning Algorithms
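To show what "knowing how they work" can look like, here is a small sketch of simple linear
regression written from scratch with the closed-form least-squares formulas, applied to the
customer-forecast example mentioned earlier. The function name fit_line and the monthly
numbers are assumptions made up for illustration; in practice you would normally call a
library instead.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x); the 1/n factors cancel out.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: month number vs. number of customers (made-up values).
months = [1, 2, 3, 4, 5, 6]
customers = [110, 125, 138, 155, 165, 182]

slope, intercept = fit_line(months, customers)
forecast = slope * 7 + intercept  # prediction for month 7

print(f"customers ~ {slope:.1f} * month + {intercept:.1f}")
print(f"forecast for month 7: {forecast:.0f}")
```

Once the mechanics are clear, it is easier to reason about what a library's regression model
is doing and why a particular fit looks good or bad.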
Structured Thinking
Structured thinking is a process of putting a framework to an unstructured problem. Having a
structure not only helps an analyst understand the problem at a macro level, but it also helps by
identifying areas that require deeper understanding.
Without structure, an analyst is like a tourist without a map. He might understand where he
wants to go (or what he wants to solve), but he doesn’t know how to get there. He would not be
able to judge which tools and vehicles he would need to reach the desired place.
How many times have you come across a situation when the entire work had to be re-done
because a particular segment was not excluded from data? Or a segment was not included? Or
just when you were about to finish the analysis, you come across a factor you did not think of
before? All these are results of poorly structured thinking.
Here are a few resources to help you get started with structured thinking:
● The Art of Structured Thinking
● Tools for Improving Structured Thinking for Data Scientists
Ability to work with Databases
As a hands-on data science professional, you'll be working a LOT with databases. You will need
them to extract your data, extract subsets, and extract samples.
Hence, having hands-on knowledge of databases is essential. The most common
database language you should pick up is SQL.
SQL is a must-have skill for every data science professional. You should start from the basics of
databases and Structured Query Language (SQL) and learn everything you would need in
any data science role, including writing and executing efficient queries, joining multiple
tables, and appending and manipulating tables.
Here are a few resources to help you get started with Databases:
● 24 Commonly used SQL Functions for Data Analysis tasks
● 8 SQL Techniques to Perform Data Analysis for Analytics and Data Science
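As a small illustration of joining tables and aggregating the result, here is a sketch that
runs SQL through Python's built-in sqlite3 module against an in-memory database. The
customers and orders tables and their contents are hypothetical, created only for this
example.

```python
import sqlite3

# In-memory database with two small, made-up tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Asha', 'Pune'), (2, 'Ravi', 'Delhi'), (3, 'Meera', 'Pune');
    INSERT INTO orders VALUES (1, 1, 250.0), (2, 1, 100.0), (3, 2, 400.0);
""")

# LEFT JOIN keeps customers with no orders; COALESCE turns their NULL total into 0.
rows = conn.execute("""
    SELECT c.name, COUNT(o.id) AS n_orders, COALESCE(SUM(o.amount), 0) AS total
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.id, c.name
    ORDER BY total DESC
""").fetchall()
conn.close()

for name, n_orders, total in rows:
    print(name, n_orders, total)
```

Swapping LEFT JOIN for an inner JOIN would silently drop the customer with no orders, which
is exactly the kind of excluded segment that can force an analysis to be redone.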
Communication skills
“Good communication is just as stimulating as black coffee, and just as hard to
sleep after.” – Anne Morrow Lindbergh
Data science projects are a lot like treasure hunting, the treasure being the insights you
extract from the data. The question is: what is the price of the treasure? Well, that is decided by
your stakeholders. The only way to get a good price is to be able to communicate how insightful
the results are and how this treasure can help them improve profits and the organization.
Furthermore, a great data scientist is able to formulate the problem statement. At the
start of the project, the stakeholders tell their requirements to the data scientist, and then the
latter formulates a problem statement. For example, the stakeholder needs to improve the
content recommendation of their OTT platform so that the retention time increases. This is a
very vague description; it’s the job of the data scientist to communicate the right problem
statement.
Whatever we have covered so far has a lot to do with understanding different data science
concepts. We've covered both the technical side (programming, machine learning, statistics,
etc.) and the soft skills aspect (structured thinking and communication).
Do you need a structured list of topics to cover during these 6 months? You can
refer to the Data Science Immersive Bootcamp’s 6-month curriculum.
Focus on Gaining Hands-On and Practical Experience in Data
Science to Become a Job-Ready Data Scientist
Do you want to know the secret sauce for a guaranteed data science job? It's applying
your knowledge in a practical scenario! Yes, you need to marry your theoretical knowledge with
hands-on practical experience to truly stand out as a data scientist. There are broadly three
ways you can do this:
1. Participate in hackathons: This is perhaps the most popular option to gain practical
knowledge. Data science competitions and hackathons are awesome! You'll love the
variety of business problems you get to solve, and when you add in the pressure of finding
a solution under a tight deadline, it’s a great learning experience. Data science
hackathons are a great way to:
○ Test your data science knowledge
○ Compete against top data science experts from around the world and gauge
where you stand
○ Get hands-on practice with a data science problem while working under a
deadline
○ Improve your existing data science skillset
○ Enhance your existing data science resume
○ Get started with hackathons here
2. Pick up open source data science projects: One key thing that has helped transitioners
immensely is picking an open-source data science project and running with it. This not
only helps you understand the key areas you need to improve on but also shows you the
way forward. And these projects aren’t your run-of-the-mill data science projects. These
are specific projects that tackle a certain data science sub-field, such as computer
vision, web analytics, and so on. The project could be a dataset, a state-of-the-art library
that has brought the data science field forward, or even an open-source analytics tool.
So, pick a project that intrigues you and start working on it today! Check out more open
source projects here!
3. Apply for data science internships: This is the most popular path to breaking into the
data science industry. Even for experienced people – internships are a very effective way
to break into data science. We have now seen so many successful transitions enabled by
internships. Not only do you gain hands-on experience in data science, but you also get
to learn how the industry works and how a typical data science project functions. It's an
invaluable experience!
It becomes very tricky and hard to manage your time balancing theoretical
knowledge and practical experience while also applying for internships and data
science jobs at the same time.
Do you want to know how you can get exposure to all 3 of these practical learning
experiences? You can be a part of the Data Science Bootcamp, in which you get to
participate in hackathons, work on real-life projects, write data science blogs, take weekly
mock interviews and, of course, work on your paid internship alongside
all this.
Roadmap to become a data scientist in 180 days
Well, now that we have covered all the skills you need to become a data scientist, it is high time
that we discuss how you are going to acquire these skills within a limited time frame. We will be
referring to the roadmap for the Data Science Immersive Bootcamp.
The Data Science Immersive Bootcamp program (with Job Guarantee*) is an instructor-led online
program which comes with a paid internship and covers data science, cloud computing
and data engineering in 180 days.
Here's the roadmap -
You don't need to get overwhelmed by the number of tools and techniques you will cover during
this phase. We will break them down for you. You can customize your learning plan according to
your future goal as well.
Deep Dive into the world of Analytics with Excel and SQL
Start your journey with basic analytics tools such as Excel and SQL. During this time it is critical
that you master the basics.
Microsoft Excel is the gold standard in data analysis tools. There’s no question about it –
industry experts, professionals and veterans still lean heavily on Excel’s prowess and Swiss
Army Knife nature to slice and dice their data.
Structured Query Language (SQL) has been around for decades. It is a programming language
used for managing the data held in relational databases. SQL is used all around the world by a
majority of big companies. A data analyst can use SQL to access, read, manipulate, and analyze
the data stored in a database and generate useful insights to drive an informed decision-making
process.
Make sure that you get ample practice with Excel and SQL functions before moving forward.
SQL is one of the topics from which interviewers ask the most questions. Your next
interview might start from a SQL query question!
Mastering your Storytelling Skills
Imagine looking at the stats of a cricket match where you are shown the runs scored on each
ball in the form of a table. Do you think you will get any important information from this? What
if you are shown a bar chart of runs scored in each over? Seems better, right? It is not in human
nature to make sense of blocks of numbers unless you make them visual and interactive.
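The cricket example can be sketched in a few lines of Python: ball-by-ball runs are rolled up
into per-over totals and drawn as a simple text bar chart. The ball-by-ball numbers are
invented for illustration, and the sketch assumes every over has exactly six balls (no
extras).

```python
# Hypothetical ball-by-ball runs for three overs (made-up data, 6 balls per over).
runs_per_ball = [0, 1, 4, 0, 0, 2,
                 1, 1, 0, 6, 0, 1,
                 0, 0, 0, 1, 4, 4]

BALLS_PER_OVER = 6

# Roll the raw table up into one number per over.
runs_per_over = [
    sum(runs_per_ball[i:i + BALLS_PER_OVER])
    for i in range(0, len(runs_per_ball), BALLS_PER_OVER)
]

# A text bar chart: one '#' per run scored in the over.
for over, runs in enumerate(runs_per_over, start=1):
    print(f"Over {over}: {'#' * runs} ({runs})")
```

The eighteen raw numbers and the three bars carry the same information, but only the bars
let you compare overs at a glance.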
Storytelling is one of the most important skills a data scientist can acquire, and Power BI is one
of the tools you can use to tell your story with data. Power BI is Microsoft’s proprietary product
for performing business intelligence tasks. It is a cloud-based business analytics solution suite
that provides the necessary tools to turn vast volumes of data across silos into accessible
information. It has been consistently ranked in Gartner's Magic Quadrant for analytics and BI
platforms.
Polish your Python Coding Skills
Python is one of the most popular languages to get started in machine learning. It's time to
improve your coding skills.
Python is a general-purpose, high-level interpreted language that has been growing rapidly in
data science, web development, and rapid application development. Its ease of use
and learning has made it very easy for beginners to adopt.
Python has efficient high-level data structures and a simple but effective approach to
object-oriented programming. It has a comprehensive standard library along with a large number
of libraries for data science, making it one of the strongest contenders.
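Here is a brief sketch of what those high-level data structures and the object-oriented style
look like in practice, using only the standard library. The Applicant class and the skill
lists are hypothetical names and data chosen for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Applicant:
    """A tiny class: the dataclass decorator generates __init__ and __repr__."""
    name: str
    skills: list

    def knows(self, skill):
        return skill in self.skills


# Made-up applicants for illustration.
applicants = [
    Applicant("A", ["python", "sql"]),
    Applicant("B", ["python", "excel"]),
    Applicant("C", ["r", "sql", "python"]),
]

# Counter (a dict subclass) tallies how often each skill appears.
skill_counts = Counter(skill for a in applicants for skill in a.skills)

# A list comprehension filters objects using a method call.
python_users = [a.name for a in applicants if a.knows("python")]

print(skill_counts.most_common(2))
print(python_users)
```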
Master SQL and NoSQL Databases
You can’t get away from learning about databases in data science. In fact, we need to become
quite familiar with how to handle databases, how to quickly execute queries, etc. as data
science professionals. There’s just no way around it!
SQL stands for Structured Query Language, which is used to query relational databases. Hence,
these databases are also often referred to as SQL databases. NoSQL, or Not only SQL, came into
the picture in the late 2000s. These are flexible, scalable, cost-efficient, and schema-less
databases. Unlike SQL databases, they come in multiple types: document-based, key-value
based, wide column-based, and graph-based. Each has its own pros and cons.
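To make the difference in data models tangible, here is a rough sketch that stores a user
record in a relational table and then as schema-less documents. A plain Python dict with JSON
serialization stands in for a document or key-value store here; it is not a real NoSQL
database, and the table, keys, and field names are all assumptions made up for this example.

```python
import json
import sqlite3

# Relational model: the columns are declared up front and every row follows them.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'Asha', 'Pune')")
row = conn.execute("SELECT name, city FROM users WHERE id = 1").fetchone()
conn.close()

# Document model (stand-in): each record is self-describing and fields may differ
# from one document to the next, including nested values and lists.
documents = {
    "user:1": {"name": "Asha", "city": "Pune"},
    "user:2": {
        "name": "Ravi",
        "languages": ["python", "sql"],
        "address": {"city": "Delhi"},
    },
}

# Documents are typically stored and exchanged in a JSON-like format.
serialized = json.dumps(documents["user:2"])
restored = json.loads(serialized)

print(row)
print(restored["address"]["city"])
```

Adding the languages field to the relational version would require altering the table or
adding a second table, whereas the document version simply stores it.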
Although we have mentioned SQL in the first step, it is definitely advisable to go deeper into
this topic if you are exploring data engineering as a career field.
Explore the world of data with statistics and EDA
Statistics is the building block of machine learning techniques. Before diving into machine
learning concepts, it is essential to understand the data and get a feel for it.
Exploratory Data Analysis is a process of examining or understanding the data and extracting
insights or main characteristics of the data. EDA is generally classified into two methods, i.e.
graphical analysis and non-graphical analysis.
EDA is essential because it is good practice to first understand the problem statement
and the various relationships between the data features before getting your hands dirty.
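As a minimal sketch of non-graphical EDA, the snippet below prints a quick summary of each
feature and then measures the relationship between two of them with the Pearson correlation
coefficient, computed by hand. The dataset and column names are hypothetical and exist only
for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical dataset: one list of values per feature (made-up data).
data = {
    "hours_studied": [2, 4, 5, 7, 9],
    "score": [50, 58, 65, 72, 88],
    "commute_min": [30, 45, 25, 40, 35],
}

# Non-graphical summary: range and average of every feature.
for name, values in data.items():
    print(f"{name}: min={min(values)}, max={max(values)}, mean={sum(values) / len(values):.1f}")

# Relationship between two features: values near +1 or -1 mean a strong linear link.
r = pearson(data["hours_studied"], data["score"])
print(f"correlation(hours_studied, score) = {r:.2f}")
```

A graphical pass over the same data would plot histograms for each feature and a scatter plot
for each pair.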
Machine Learning: Beginner to Advanced
So far we have covered tools and techniques that will help you understand and analyze
the data. Now let's talk about predictive modelling.
During this phase, you will be learning about machine learning from basics to advanced, starting
from linear and logistic regression, KNN, and SVM all the way up to ensemble and boosting
algorithms. It is advisable not just to implement the model-building code but to understand each
topic in depth, including its mechanism, pros and cons.
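To illustrate what understanding the mechanism means, here is a from-scratch sketch of KNN
(k-nearest neighbours) classification: find the k closest training points and take a majority
vote. The function name knn_predict, the points, and the labels are assumptions made up for
this example, and it relies on math.dist, which needs Python 3.8 or later.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Pair each training point's Euclidean distance to the query with its label.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical 2-D training data with two well-separated groups.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 6.5), (7.0, 6.0), (6.5, 7.0)]
labels = ["low", "low", "low", "high", "high", "high"]

print(knn_predict(points, labels, (2.0, 2.0)))
print(knn_predict(points, labels, (6.0, 6.0)))
```

Seeing the algorithm laid out this way makes its trade-offs visible: there is no training
step, but every prediction scans the whole training set, and the choice of k and of the
distance measure directly shapes the result.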
As part of the data science Bootcamp, we cover projects for each of these topics.
We highly recommend that you work on real-life problems after
completing each of these tools and techniques.
Build Data pipelines using Spark
The world is creating data at an unprecedented rate. Here are some mind-boggling numbers for
your reference: more than 500 million tweets, 90 billion emails, and 65 million WhatsApp
messages are sent, all in a single day! 4 Petabytes of data are generated on Facebook alone in
24 hours. That’s incredible!
This, of course, comes with challenges of its own. How does a data science team capture this
amount of data? This is where data pipelines come into play.
Apache Spark is an open-source, distributed cluster computing framework that is
used for fast processing, querying and analyzing Big Data.
It is one of the most widely used data processing frameworks in enterprises today. It’s true that
the cost of Spark is high, as it requires a lot of RAM for in-memory computation, but it is still a
hot favorite among Data Scientists and Big Data Engineers.
Understand Cloud ecosystem with AWS
Cloud computing has seen tremendous growth in the past few years. Almost every organization
nowadays uses cloud computing for its wide range of services. Therefore, it is crucial that you
learn about the cloud ecosystem.
AWS is a cloud computing platform by Amazon that provides services such as Infrastructure as
a Service (IaaS), platform as a service (PaaS), and packaged software as a service (SaaS) on a
pay-as-you-go basis. It was launched in 2006, although its infrastructure was originally used to
handle Amazon’s online retail operations.
If you have worked properly on the above skills, you are ready for most of
the roles out there.
Work with Deep Learning models on CV and NLP applications
Do you want to acquire a super power? How about learning neural networks? Neural networks
are at the heart of the deep learning revolution that’s happening around us right now.
Neural networks are the present and the future. The different neural network architectures like
convolutional neural networks (CNN), recurrent neural networks (RNN), and others have altered
the deep learning landscape.
Once you master the theoretical aspect, it's imperative to work on deep learning projects.
Deploy your machine learning models
It is time to learn how to deploy models from different domains ranging from Machine Learning
to Deep Learning, Natural Language Processing (NLP) to Computer Vision (CV).
In a typical machine learning and deep learning project, we usually start by defining the problem
statement followed by data collection and preparation, understanding of the data, and model
building, right?
But, in the end, we want our model to be available for the end-users so that they can make use
of it. Model Deployment is one of the last stages of any machine learning project and can be a
little tricky. How do you get your machine learning model to your client/stakeholder? What are
the different things you need to take care of when putting your model into production? This is
where Model Deployment comes in.
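One small piece of deployment is separating training from serving: the trained model is saved
to a file, and a different piece of code loads it later to make predictions. The sketch below
shows that round trip for a toy linear model stored as JSON. The parameter values, file name,
and function names are hypothetical, and a real deployment would add an API layer, input
validation, monitoring, and versioning on top of this.

```python
import json
import os
import tempfile

# "Training" side: pretend these parameters came out of model training (made-up values).
trained_params = {"slope": 14.2, "intercept": 96.1, "version": 1}


def save_model(params, path):
    """Persist model parameters so another process can load them later."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(params, f)


def load_model(path):
    """Load previously saved model parameters."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def predict(params, x):
    """Serving side: apply the loaded linear model to a new input."""
    return params["slope"] * x + params["intercept"]


with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "model.json")
    save_model(trained_params, model_path)
    served_params = load_model(model_path)

prediction = predict(served_params, 7)
print(f"model v{served_params['version']} predicts {prediction:.1f}")
```

Keeping a version field alongside the parameters is one simple way to know which model
produced a given prediction once several versions are in circulation.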
Final Thoughts
So, there we have it! The roadmap to become a data scientist in 180 days!
And as we said, you will need to follow a well-structured plan and dedicate around 5-6
hours per day to learning data science.
This book should help you get started on your learning journey and show you what you need to
cover in order to become a data scientist.
The Data Science Immersive Bootcamp covers all these skills, tools and techniques in its
curriculum. It comes with a paid internship as well as a job guarantee. You can check out the
program here.
And always remember, practice is key! The more you practice, the better your understanding
of data science will become. So make sure you add discipline to your journey, follow a
structured learning path, and there isn’t any obstacle you won’t be able to overcome. All the
best!