Data Science with Python and Pandas

Why the Python Ecosystem and Pandas for Data Science?

One of Python's main goals has always been to ease the learning curve while remaining intuitive and powerful. The language is open source, which has helped scientific packages such as these thrive:

  • TensorFlow
  • Scikit-Learn
  • NumPy
  • Keras
  • SciPy
  • Pandas

The ecosystem has strong support from large companies and individual contributors. Its gentle learning curve makes it easy for scientists from many different fields to enter the Data Science world.

Pandas is built on top of Python and NumPy and simplifies data manipulation. Pandas offers a wide range of features, including:

  • import and export of various formats
  • data wrangling
  • data cleaning
  • text processing
  • time series and much more
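To make the feature list above more concrete, here is a minimal sketch that touches import/export, text processing, data cleaning, and time series in a few lines. The data, column names ("date", "city", "temp"), and the choice to fill gaps with the column mean are all hypothetical and chosen only for illustration; the input is read from an in-memory string instead of a real file.

```python
import io

import pandas as pd

# Hypothetical sample data; in practice this would come from a file or database.
raw = io.StringIO(
    "date,city,temp\n"
    "2024-01-01, Berlin ,3.5\n"
    "2024-01-02,paris,\n"
    "2024-01-03, Berlin ,1.0\n"
    "2024-01-04,paris,7.0\n"
)

# Import: parse the CSV text and convert the "date" column to datetimes.
df = pd.read_csv(raw, parse_dates=["date"])

# Text processing: strip stray whitespace and normalize capitalization.
df["city"] = df["city"].str.strip().str.title()

# Data cleaning: fill the missing temperature with the column mean (an arbitrary choice).
df["temp"] = df["temp"].fillna(df["temp"].mean())

# Time series: index by date and average the values over 2-day bins.
two_day = df.set_index("date")["temp"].resample("2D").mean()

# Export: write the cleaned frame back out as CSV text.
csv_text = df.to_csv(index=False)

print(two_day)
```

Each step is a single method call on a DataFrame or Series, which is much of the appeal: the same chain would work unchanged on a file path passed to read_csv.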

All this makes Pandas/Python a natural choice for learning and mastering Data Science.

History of Pandas and Python

Python was created in the late 1980s by Guido van Rossum. Guido's initial idea was to create a language that is close to plain English, powerful, open to everyone, and suitable for everyday tasks.

Here is the classic "Hello, world!" example in Python:

print('Hello, world!')

Decades later, the language ranks among the most used, loved, and wanted programming languages.

Pandas was started in 2008 and became open source in 2009. The main idea behind Pandas was to:

be the fundamental high-level building block for doing practical, real world data analysis in Python. Additionally, it has the broader goal of becoming the most powerful and flexible open source data analysis / manipulation tool available in any language.

Source: About Pandas

One of the most common code samples, the "Hello, world!" of Pandas, is as simple as:

import pandas as pd

pd.read_csv("foo.csv")

This single line (plus a few more) gives you a shortcut to:

  • reshaping and pivoting
  • slicing, fancy indexing, and subsetting
  • data alignment
  • data cleaning
  • data analysis
  • data mining
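The following is a minimal sketch of what those "few more lines" might look like for several items in the list above. Since "foo.csv" is only a placeholder, the sketch reads comparable data from an in-memory string; the columns ("year", "region", "sales"), the values, and the decision to treat a missing value as zero are hypothetical.

```python
import io

import pandas as pd

# Stand-in for "foo.csv": hypothetical data read from an in-memory string.
csv_data = io.StringIO(
    "year,region,sales\n"
    "2022,north,100\n"
    "2022,south,80\n"
    "2023,north,120\n"
    "2023,south,\n"
)
df = pd.read_csv(csv_data)

# Data cleaning: treat the missing sales value as zero (an assumption for this example).
df["sales"] = df["sales"].fillna(0)

# Reshaping and pivoting: one row per year, one column per region.
wide = df.pivot(index="year", columns="region", values="sales")

# Slicing and boolean ("fancy") indexing: northern rows with sales above 100.
north_big = df.loc[(df["region"] == "north") & (df["sales"] > 100), ["year", "sales"]]

# Data alignment: arithmetic matches index labels, not positions;
# labels present in only one Series produce NaN.
a = pd.Series({"north": 1, "south": 2})
b = pd.Series({"south": 10, "west": 20})
aligned = a + b

# Data analysis: total sales per region.
totals = df.groupby("region")["sales"].sum()

print(wide)
print(aligned)
```

Note the alignment step: only "south" appears in both Series, so it is the only label with a numeric result, while "north" and "west" become NaN.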