First steps in Data Science (discussion)

This post is based on a discussion about first steps in Data Science in the #data_science channel of the Python Developers Slack group.

I made only a few small edits. The discussion contains valuable information that might help beginners in Data Science, as well as advice on learning in general.

I hope it helps you as you start your journey in Data Science!

Question 1 - First steps in data science?

Hello, I am a 2nd-year IT student. Can you help me with my first steps in data science? Please tell me what resources to use.

Advice 1 - In the beginning

If you're at the very start, I would probably start by getting a book and working through it, and maybe googling for blog posts as well.

Example Search: https://www.google.com/search?channel=fs&client=ubuntu&q=best+datascience+books

Example Data Science book (from me): Python Data Science Handbook

Pro Tip 1
If you're at the very start, I would probably start by getting a book.

Question 2 - Where to get practice in Data Science

Where can I get practice? Once the theory is clear, can you recommend a project?

Advice 2 - Choose an area

You can try some Kaggle competitions or something similar to start.

Are you more interested in learning ML, or in an analytics focus?

If you are not sure about the above, read the intro chapter of the book Data Science in Context.

Pro Tip 2
If you are not sure how to start, read the intro chapter of Data Science in Context.

Advice 3 - The process from end to end

To start in data science, you'd need to study linear algebra and statistics for the basic mathematical foundation.

Then, for the technical skills, you have to learn the basic tools: Python, R, and the various toolkits used with them (such as Spark). Then you need to learn to put these tools to good use.

Thinking is more important than copy-pasting what someone else did on their Medium blog.

It's a long programme of study if you are just starting.

Pro Tip 3
Thinking is more important than copy-pasting what someone else did on their Medium blog.

There is a vague line that divides data science from data analysis, and I am pretty sure that the line is defined by your mathematical abilities.

Having said that, most places will not have a business need beyond what is more properly called data analysis, so it is worth realizing that data science is a poorly defined term.

In a nutshell - get good at Python & R, and start applying to jobs with proof that you know both. Just let the cookie crumble at that point.

Oh, I nearly forgot: SQL is a must.
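You can get a first feel for SQL without installing a database server. The minimal sketch below uses Python's built-in `sqlite3` module with an in-memory database. The `products` table, its columns, the sample rows and the function name `average_price_by_category` are hypothetical and exist only for illustration. The part worth noticing is the `GROUP BY` aggregation, because you will write queries of that kind all the time.

```python
import sqlite3


def average_price_by_category(rows):
    """Load (category, price) rows into an in-memory SQLite table and
    return the average price per category using a GROUP BY query."""
    conn = sqlite3.connect(":memory:")  # throwaway database that lives in RAM
    try:
        # Hypothetical table used only for this example
        conn.execute("CREATE TABLE products (category TEXT, price REAL)")
        # Parameterized insert: the driver handles quoting, no string formatting
        conn.executemany("INSERT INTO products VALUES (?, ?)", rows)
        cursor = conn.execute(
            "SELECT category, AVG(price) FROM products "
            "GROUP BY category ORDER BY category"
        )
        return cursor.fetchall()  # list of (category, average) tuples
    finally:
        conn.close()


if __name__ == "__main__":
    # Made-up sample data, not from any real dataset
    sample = [("books", 12.0), ("books", 18.0), ("games", 40.0)]
    for category, avg_price in average_price_by_category(sample):
        print(f"{category}: {avg_price:.2f}")
```

The same SQL runs almost unchanged against PostgreSQL or MySQL. Only the connection code differs.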

Advice 4 - Interest as motivation

If I am interested in a subject, going straight to the core material is insightful.

If not, I learn better by treating a subject as a black box that solves the problem I actually care about, and then familiarizing myself gradually and organically with the topic as I use a tool more and more, without ever having studied the "basics".

Because of that, I studied a lot of linear algebra and combinatorics, as I find them very interesting topics, but I have used most software (Keycloak, Airflow, RabbitMQ, ...) as tools that get the job done, without ever looking into the underlying CS topics.

My point is that if linear algebra doesn't appeal to you directly, I might start with tutorials that show concrete use cases, such as:

  • Word vectors (Word2Vec) in NLP
  • Image labeling with computer vision

This keeps your motivation up, and you can work back to the fundamentals from there.

It comes down to what you find more interesting: the problem being solved, or the technologies used to solve it.

Advice 5 - Top-down vs bottom-up learning

There are 2 ways to learn:

  • top-down
  • bottom-up

Top-down is preferable if you're a professional who lacks the time to internalize basic concepts in a typical academic environment.

My advice to people in this situation is to monkey-see, monkey-do until you magically "get it". In other words, top-down.

In the military they would call this "results-oriented training." So I suggest learning by example and taking the time to understand the fundamentals as you go through each case. It still takes a long time, but it is much more effective, IMO.

Pro Tip 4
A single example, if well chosen, can produce an incredibly productive lesson.

You could, for instance, go through a well-chosen deep learning example, of which there are myriad, and use it to understand weights, optimization algorithms (stochastic gradient descent, etc.), the meaning of under- and overtraining, training set selection strategies, neural net architectures, and so on. A single example, if well chosen, can produce an incredibly productive lesson.
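To make "weights" and "stochastic gradient descent" concrete, here is a deliberately tiny sketch. It is not a deep learning example. It assumes a single linear unit (plain linear regression, y ≈ w·x + b) trained with SGD on squared error. The function name `sgd_fit_line`, the learning rate and the epoch count are illustrative choices, not recommendations. The sketch covers only weights and SGD. Spotting under- or overtraining would also need a held-out validation set, which it leaves out.

```python
import random


def sgd_fit_line(points, learning_rate=0.1, epochs=500, seed=0):
    """Fit y ~ w * x + b with plain stochastic gradient descent on squared error.

    Illustrative sketch: one linear unit, no activation, no regularization.
    """
    rng = random.Random(seed)  # seeded so runs are reproducible
    w, b = 0.0, 0.0            # the "weights" start at zero
    data = list(points)
    for _ in range(epochs):
        rng.shuffle(data)      # visit samples in a new random order each epoch
        for x, y in data:
            error = (w * x + b) - y  # prediction minus target
            # Gradient of 0.5 * error**2 is error * x for w and error for b
            w -= learning_rate * error * x
            b -= learning_rate * error
    return w, b


if __name__ == "__main__":
    # Noise-free points on the line y = 2x + 1, with x in [0, 1]
    points = [(i / 10, 2 * (i / 10) + 1) for i in range(11)]
    w, b = sgd_fit_line(points)
    print(f"w={w:.3f}, b={b:.3f}")  # should end up close to w=2, b=1
```

Try changing `learning_rate` to 1.5 or 0.001 and watch how the updates diverge or crawl. A small experiment like that teaches more about optimizers than reading the formula alone.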

You can consider yourself graduated if you can cogently interpret a publication in the field, and implement it in practice.

Advice 6 - Understand the data. Master one thing

When I am learning, I usually do both; for me, that makes it easier to get to know the fundamentals. (There is plenty of time if you manage it well: cut down on TV and YouTube, disable all notifications on your phone, and delete games if they distract you.)

Try to reproduce what others have done and figure out what they did at each step.

Get some books, articles, and videos about it and learn from them.

Check the documentation of code libraries. At the end of the day you need to use other people's code, and it is very rare that you will write everything yourself, so knowing what each library can give you is powerful.

Looking into other people's code helps you figure out its structure and see whether it fits what you want, or whether you can express it more clearly with better design patterns (this is my CS degree talking).

Pro Tip 5
Understand what the actual data means and how it was captured.

If you are really willing to get into the field, concentrate on one thing and learn it very well. Knowing a single thing very well helps you figure out other concepts, because specific areas usually have something in common.

I think that in data science it is important to really understand what the actual data means and how it was captured.

Question 3 - Preparing for a Data Science degree

I am pursuing a degree in Data Science at college. Any tips on what I should do from the start of college to become good at DS? Also, I don't have any previous programming knowledge.

Advice 7 - Start with the basics

  • Learn the basics of Python and Python notebooks, plus the matplotlib library (https://matplotlib.org/).
  • Learn about different data stores and databases (SQL and NoSQL queries), merging data, and general statistics.
  • Take a public dataset and try to understand what the data is showing.
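A first exercise for the last two points could look like the minimal sketch below. It uses only the Python standard library: it reads two CSV tables, merges them on a shared key, and computes basic statistics per group. The CSV contents, the column names (`station_id`, `name`, `temp_c`) and the helper functions are hypothetical stand-ins for files from a real public dataset. In a notebook you would load real files instead and could then plot the result with matplotlib.

```python
import csv
import io
import statistics

# Hypothetical CSV contents standing in for two files of a public dataset
STATIONS_CSV = """station_id,name
1,North
2,South
"""

READINGS_CSV = """station_id,temp_c
1,21.5
1,23.0
2,18.0
2,19.5
2,20.0
3,25.0
"""


def read_csv(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def merge_on_station_id(stations, readings):
    """Inner join: attach each reading to its station name.

    Readings whose station_id has no matching station are dropped.
    """
    names = {row["station_id"]: row["name"] for row in stations}
    return [
        {"name": names[row["station_id"]], "temp_c": float(row["temp_c"])}
        for row in readings
        if row["station_id"] in names
    ]


def summarize(merged):
    """Return mean, median and sample standard deviation of temp_c per station."""
    by_station = {}
    for row in merged:
        by_station.setdefault(row["name"], []).append(row["temp_c"])
    return {
        name: {
            "mean": statistics.mean(values),
            "median": statistics.median(values),
            # stdev needs at least two values; report 0.0 for a single reading
            "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
        }
        for name, values in by_station.items()
    }


if __name__ == "__main__":
    merged = merge_on_station_id(read_csv(STATIONS_CSV), read_csv(READINGS_CSV))
    for name, stats in summarize(merged).items():
        print(name, {key: round(value, 2) for key, value in stats.items()})
```

Notice that the reading for station 3 silently disappears in the inner join. Checking what a merge dropped is part of understanding what the data is showing.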