Data Science: An Introduction

Let’s get to know data science, from what it is to how to master it

Syahrul Hamdani
4 min read · May 11, 2019

So, you are curious about data science? Medium is a platform recommended by Thor and Loki. Let’s start the discussion.

What is Data Science?

Many articles define data science differently. Cassie Kozyrkov has written an article about the journey of finding a simple yet meaningful definition. Wikipedia defines it clearly:

Data science is a multi-disciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from structured and unstructured data

There are a lot of diagrams that visualize the skills needed to master data science. The battle of Venn diagrams on KDnuggets will show you some of the well-known ones, if not all. Among those diagrams, I find the last one (not really a Venn diagram) the most fitting. It tells us to ignore the boundaries between disciplines and treat data science as a discipline of its own, then place the supporting skills outside the data science circle, with an arrow pointing from each of those skills into the circle.

Data science Venn diagram of the future from KDnuggets

Actually, defining data science is not the real problem. When I was taking the Fundamentals of Mathematics II class, my teacher told us that we fully understand the definition of a concept if and only if we are able to point out what is and what is not an example of that concept. So, in my opinion, it’s up to you to define what data science actually means, as long as you are capable of pointing out what is and what is not data science.

It’s more about science and collaboration

I just want to revise the phrase “multi-disciplinary” and replace it with “collaborative”. Yes, data science is a collaborative field. A data scientist will often, maybe even most of the time, need help from other people (beyond those 3 domains, assuming those domains are what an aspiring data scientist needs to master).

There is “science” in data science. Data is the object and science is the “how”: how to make the data useful, how to get insights from the data, how to make inferences using the data. Most of these are about science. Data science is also closely tied to machine learning, which uses the data to make inferences about the future.

Yufeng G wrote an article about what machine learning is and you can dive in directly with him and Google Cloud.

Why does Data Science exist?

Everyone knows that data is ubiquitous. Every day we generate data, on purpose or by accident. This data has grown so massive that conventional strategies and research methods can’t handle it. That is where data science was born. We can argue about this background, but surely data science exists because we need modern processes and methods to handle such data. Hence, we need computers to help us with the computation. So it is not an overstatement to say that a data scientist is a statistician who can code. In fact, such people already existed before the term “data science” did.

Also, machine learning has become a major part of data science. Most data science job postings require candidates to have machine learning knowledge, from traditional machine learning such as linear regression or SVM to deep learning such as CNN or LSTM.

Data science exists because of needs. That means if you (or your company) don’t need data science yet, then don’t hire a data scientist yet.

Things you need to know

I consider myself an aspiring data scientist, still aspiring. My real-world data science experience is still less than that of other people who call themselves data scientists. So, I want to share the things or skills you (we) need to become a (real) data scientist.

  1. Mathematics: Calculus, Linear Algebra, Multivariable Calculus, Optimization, etc.
  2. Statistics: Probability, Hypothesis Testing, Distribution, etc.
  3. Programming: Python, R, Matlab, Octave, C++ (choose one or take all)
  4. Machine Learning: Linear Regression, SVM, Decision Tree, Random Forest, etc.
  5. Deep Learning (optional): CNN, RNN, LSTM, GAN, etc.

It seems like too much to learn, but once you have learned the basics and the intuition behind them, it gets so much easier. We can even pick up recent developments in data science research faster and apply them ourselves.
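To make the list a bit more concrete, here is a minimal sketch of the first machine learning item, linear regression, written in Python with only the standard library. The data (hours studied vs. exam score) and the function names `fit_line` and `predict` are made up for illustration. In practice you would probably reach for a library, but the math underneath is just the statistics and calculus from items 1 and 2.

```python
# Simple linear regression (ordinary least squares), standard library only.
# The data and function names are hypothetical, chosen for illustration.
from statistics import mean


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    x_bar, y_bar = mean(xs), mean(ys)
    # slope = sum of cross-deviations / sum of squared deviations of x
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def predict(x, slope, intercept):
    """Predict y for a new x using the fitted line."""
    return slope * x + intercept


# Made-up data that lies exactly on the line y = 5x + 50
hours = [1, 2, 3, 4, 5]
scores = [55, 60, 65, 70, 75]

slope, intercept = fit_line(hours, scores)
print(slope, intercept)               # 5.0 50.0
print(predict(6, slope, intercept))   # 80.0
```

Real data will never sit exactly on a line like this, which is where the statistics (how good is the fit?) comes in.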

Learning Resources

There are so many resources on the internet that you can browse by yourself. You can type “data science learning resources” or a similar query and you will get what you want. Some publications on Medium also have their own lists of top resources. But here are the resources I have learned from.

  1. Kaggle
  2. Udacity
  3. Udemy
  4. Medium
  5. KDnuggets
  6. Elite Data Science

This is my first post about data science. I hope I can keep writing articles about data science, and more frequently. See you in the next article!
