Data Science: An Introduction

Let’s get to know data science, from what it is to how to master it

What is Data Science?

Many articles define data science differently. Cassie Kozyrkov has written an article about the journey of finding a simple yet meaningful definition. Wikipedia defines it clearly:

Data science is a multi-disciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from structured and unstructured data

There are many diagrams that visualize the skills needed to master data science. The battle of the Venn diagrams on KDnuggets shows some of the well-known ones, if not all. Among those diagrams, I find the last one (not quite a Venn diagram) the most suitable. It tells us to ignore the boundaries between disciplines and treat data science as a whole new discipline, then place the supporting skills outside the data science circle, with an arrow pointing from each of those skills to the circle.

Data science Venn diagram of the future from KDnuggets

It’s more about science and collaboration

I just want to revise the phrase “multi-disciplinary” and replace it with “collaborative”. Yes, data science is a collaborative field. A data scientist may, even most of the time, require help from other people (beyond those 3 domains, assuming those domains are what an aspiring data scientist needs to master).

Why does Data Science exist?

Everyone knows that data is ubiquitous. Every day we generate data, on purpose or by accident. This data has grown so massive that conventional strategies and research methods can’t handle it. That is how data science was born. We can argue about this background, but data science surely exists because of the need for modern processes and methods to handle such data. Hence, we need computers to help us do the computation. So it is not an overstatement to say that a data scientist is a statistician who can code. In fact, such people existed even before the term “data science” did.

Data science exists because of needs. This means that if you (or your company) don’t need data science yet, then don’t hire a data scientist yet.

Things you need to know

I consider myself an aspiring data scientist, still aspiring. I have less real-world data science experience than other people who call themselves data scientists. So, I want to share the things or skills you (we) need to become a (real) data scientist.

  1. Statistics: Probability, Hypothesis Testing, Distribution, etc.
  2. Programming: Python, R, MATLAB, Octave, C++ (choose one or learn them all)
  3. Machine Learning: Linear Regression, SVM, Decision Tree, Random Forest, etc.
  4. Deep Learning (optional): CNN, RNN, LSTM, GAN, etc.
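
To give a feel for how the first three items fit together, here is a minimal sketch of simple linear regression (one feature, ordinary least squares) written in plain Python. The function name `fit_line` and the numbers are made up for illustration; they are assumptions, not a real dataset or a library API. In practice you would probably reach for a library, but writing it by hand shows the statistics underneath.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for a single feature.

    Returns (slope, intercept) of the line y = slope * x + intercept.
    Hypothetical helper written for this example only.
    """
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of co-deviations of x and y around their means
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Sum of squared deviations of x around its mean
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Made-up data for illustration only: hours studied vs. exam score
hours = [1, 2, 3, 4, 5]
scores = [52, 54, 56, 58, 60]

slope, intercept = fit_line(hours, scores)
print(f"score = {slope:.1f} * hours + {intercept:.1f}")
```

The statistics part is the means and deviations, the programming part is turning them into a function, and the machine learning part is using the fitted line to predict a score for a new number of hours.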

Learning Resources

There are so many resources on the internet that you can browse by yourself. You can search for “data science learning resources” or similar queries, and you will find what you need. Some publications on Medium also have their own lists of top resources. But here are the resources I have learned from.

  1. Udacity
  2. Udemy
  3. Medium
  4. KDnuggets
  5. Elite Data Science

Breathe data-driven | Data Science Instructor & Engineer @ Bitlabs