To check the dtypes of single or multiple columns in Pandas you can use:
df.dtypes
Let's see other useful ways to check the dtypes in Pandas.
Step 1: Create sample DataFrame
To start, let's say that you have the following earthquake data:
Date | Time | Depth | Magnitude Type | Type | Magnitude | Depth_int |
---|---|---|---|---|---|---|
1965-01-02 00:00:00+00:00 | 13:44:18 | 131.6 | MW | Earthquake | 6.0 | 131 |
1965-01-04 00:00:00+00:00 | 11:29:49 | 80.0 | MW | Earthquake | 5.8 | 80 |
1965-01-05 00:00:00+00:00 | 18:05:58 | 20.0 | MW | Earthquake | 6.2 | 20 |
1965-01-08 00:00:00+00:00 | 18:49:43 | 15.0 | MW | Earthquake | 5.8 | 15 |
1965-01-09 00:00:00+00:00 | 13:32:50 | 15.0 | MW | Earthquake | 5.8 | 15 |
Data is available from Kaggle: Significant Earthquakes, 1965-2016.
How to read and convert Kaggle data to Pandas DataFrame: How to Search and Download Kaggle Dataset to Pandas DataFrame
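You don't need the Kaggle CSV to follow along. Here is a minimal sketch that rebuilds the five rows from the table above by hand. One assumption: Depth_int is Depth truncated to an integer, which is what the values shown suggest.

```python
import pandas as pd

# Rebuild the five sample rows from the table above
df = pd.DataFrame({
    # utc=True gives a timezone-aware column, like the '+00:00' values shown
    'Date': pd.to_datetime(['1965-01-02', '1965-01-04', '1965-01-05',
                            '1965-01-08', '1965-01-09'], utc=True),
    'Time': ['13:44:18', '11:29:49', '18:05:58', '18:49:43', '13:32:50'],
    'Depth': [131.6, 80.0, 20.0, 15.0, 15.0],
    'Magnitude Type': ['MW'] * 5,
    'Type': ['Earthquake'] * 5,
    'Magnitude': [6.0, 5.8, 6.2, 5.8, 5.8],
})

# Assumed derivation: Depth_int is Depth truncated to an integer (131.6 -> 131)
df['Depth_int'] = df['Depth'].astype('int64')

print(df)
```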
Step 2: Get dtypes for all columns in DataFrame
To get the dtypes of all columns in the DataFrame, use the attribute dtypes:
df.dtypes
The result is:
Date datetime64[ns, UTC]
Time object
Depth float64
Magnitude Type object
Type object
Magnitude float64
Depth_int int64
dtype: object
We can see several different dtypes:
- datetime64[ns, UTC] - used for dates (here timezone-aware, in UTC); explicit conversion may be needed in some cases
- float64 / int64 - numeric data
- object - strings and other values
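To check one column or a few columns instead of the whole DataFrame, select them first. A single column (a Series) has one .dtype. A list of columns returns a smaller DataFrame, and its .dtypes is again a Series indexed by column name. The small DataFrame below is a hypothetical stand-in for the earthquake data.

```python
import pandas as pd

# Hypothetical stand-in with three of the earthquake columns
df = pd.DataFrame({
    'Depth': [131.6, 80.0],
    'Depth_int': [131, 80],
    'Type': ['Earthquake', 'Earthquake'],
})

# One column: a Series has a single .dtype
print(df['Depth'].dtype)  # float64

# Several columns: .dtypes returns a Series indexed by column name
print(df[['Depth', 'Depth_int']].dtypes)
```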
Step 3: Short explanation of dtypes in Pandas
Let's briefly cover the most commonly used dtypes in Pandas, with a simple example of each:
Pandas dtype | Data Type | Description | Example | Creation |
---|---|---|---|---|
bool | bool | Boolean values: True or False | True | df.astype('bool') |
category | N/A | Limited list of values (can be fixed) | ['red', 'blue'] | pd.Categorical([1, 2, 3, 1, 2, 3]) |
datetime64 | datetime | Dates and times (conversion is often needed) | 2020-11-16 22:50:18.092888+0000 | pd.to_datetime(df['date']) |
float64 | float | Floating point numbers | 80.5 | df.astype('float64') |
int64 | int | Integer numbers | 8 | df.astype('int64') |
object | str | Strings, text and mixed values | Red Pandas | df.astype('object') |
timedelta64 | timedelta | Duration between two dates or times | 0 days 00:00:00.000000042 | pd.Timedelta(42, unit='ns') |
More information about them can be found in the Pandas User Guide: dtypes.
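To see these dtypes side by side, here is a sketch that builds one hypothetical column for each row of the table and prints the resulting dtypes.

```python
import pandas as pd

# One hypothetical column per dtype from the table
df = pd.DataFrame({
    'flag': [True, False],                                # bool
    'color': pd.Categorical(['red', 'blue']),             # category
    'when': pd.to_datetime(['2020-11-16 22:50:18',
                            '2020-11-17 08:00:00']),      # datetime64
    'price': [80.5, 12.0],                                # float64
    'count': pd.Series([8, 3], dtype='int64'),            # int64
    'name': ['Red Pandas', 'Giant Panda'],                # strings
    'gap': pd.to_timedelta([42, 1], unit='ns'),           # timedelta64
})

print(df.dtypes)
```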
Pandas offers a wide range of functions and methods to read, parse and convert between dtypes. The most popular conversion methods are:
pd.to_datetime(df['date'])
pd.to_timedelta(df['timedelta'])
pd.to_numeric(df['amount'])
df['amount'].astype('int32')
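Here is a quick sketch of these four conversions applied to a hypothetical DataFrame where every value arrives as text. The column names date, timedelta and amount come from the snippets above, and the data is made up. The to_numeric option errors='coerce' turns values that cannot be parsed into NaN instead of raising an error.

```python
import pandas as pd

# Hypothetical raw data: every column starts out as text
df = pd.DataFrame({
    'date': ['2020-11-16', '2020-11-17'],
    'timedelta': ['1 days', '2 hours'],
    'amount': ['10', '25'],
})

df['date'] = pd.to_datetime(df['date'])               # -> datetime64
df['timedelta'] = pd.to_timedelta(df['timedelta'])    # -> timedelta64
df['amount'] = pd.to_numeric(df['amount'])            # -> int64
df['amount_32'] = df['amount'].astype('int32')        # -> int32

print(df.dtypes)

# errors='coerce' turns values that cannot be parsed into NaN
print(pd.to_numeric(pd.Series(['1', 'x']), errors='coerce'))
```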
Step 4: Check if a column is numeric, datetime, categorical, etc.
In this step we check whether a given column is numeric, datetime or categorical.
For this purpose, pandas.api.types offers a set of helper functions, such as:
is_string_dtype
is_dict_like
is_list_like
is_numeric_dtype
is_datetime64_dtype
The full list is in the official Pandas docs, for example: pandas.api.types.is_datetime64_any_dtype
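Here is a short sketch of the helpers listed above, run on hypothetical Series. The step title also mentions categorical columns, but the list has no categorical check. This sketch uses isinstance with pd.CategoricalDtype, the replacement pandas recommends now that is_categorical_dtype is deprecated (since pandas 2.1).

```python
import pandas as pd
from pandas.api.types import (
    is_datetime64_dtype,
    is_dict_like,
    is_list_like,
    is_numeric_dtype,
    is_string_dtype,
)

text = pd.Series(['MW', 'MW'])
numbers = pd.Series([6.0, 5.8])
dates = pd.Series(pd.to_datetime(['1965-01-02', '1965-01-04']))  # tz-naive
kinds = pd.Series(['Earthquake', 'Explosion'], dtype='category')

print(is_string_dtype(text))        # True
print(is_numeric_dtype(numbers))    # True
print(is_datetime64_dtype(dates))   # True
print(is_list_like([1, 2]))         # True
print(is_list_like('abc'))          # False: strings are not treated as list-like
print(is_dict_like({'a': 1}))       # True

# Categorical check via the dtype object
print(isinstance(kinds.dtype, pd.CategoricalDtype))  # True
```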
To check whether a column has a numeric dtype, use is_numeric_dtype:
from pandas.api.types import is_numeric_dtype
is_numeric_dtype(df['Depth_int'])
result:
True
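Two caveats apply to is_numeric_dtype. It also returns True for boolean columns. It returns False for numbers stored as text, which usually need pd.to_numeric first. Here is a minimal sketch with hypothetical Series:

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

flags = pd.Series([True, False])       # bool dtype
as_text = pd.Series(['131', '80'])     # numbers stored as strings

print(is_numeric_dtype(flags))                    # True: bool counts as numeric
print(is_numeric_dtype(as_text))                  # False
print(is_numeric_dtype(pd.to_numeric(as_text)))   # True after conversion
```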
For datetime columns there are several options, such as is_datetime64_ns_dtype or is_datetime64_any_dtype:
from pandas.api.types import is_datetime64_any_dtype
is_datetime64_any_dtype(df['Date'])
result:
True
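The _any_ variant matters here because the sample Date column is timezone-aware (datetime64[ns, UTC]). is_datetime64_dtype only matches plain, timezone-naive datetime64 columns, while is_datetime64_any_dtype matches both. Here is a sketch with hypothetical Series:

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype, is_datetime64_dtype

naive = pd.Series(pd.to_datetime(['1965-01-02', '1965-01-04']))
aware = pd.Series(pd.to_datetime(['1965-01-02', '1965-01-04'], utc=True))

print(is_datetime64_dtype(naive))       # True
print(is_datetime64_dtype(aware))       # False: tz-aware is not plain datetime64
print(is_datetime64_any_dtype(naive))   # True
print(is_datetime64_any_dtype(aware))   # True
```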
Step 5: List all numeric/datetime columns in Pandas DataFrame
To list only the numeric, datetime or other types of columns in a DataFrame, use the method select_dtypes.
Including columns by dtype:
df.select_dtypes(include=['float64']).columns
result of the operation:
Index(['Depth', 'Magnitude'], dtype='object')
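select_dtypes also accepts generic names. For example, include='number' picks up integer and float columns in one go, so Depth_int would be listed together with Depth and Magnitude. Here is a sketch on a hypothetical stand-in DataFrame:

```python
import pandas as pd

# Hypothetical stand-in for the earthquake columns
df = pd.DataFrame({
    'Depth': [131.6, 80.0],
    'Magnitude': [6.0, 5.8],
    'Depth_int': pd.Series([131, 80], dtype='int64'),
    'Type': ['Earthquake', 'Earthquake'],
})

# 'number' matches every numeric dtype (ints and floats alike)
print(df.select_dtypes(include='number').columns.tolist())
# ['Depth', 'Magnitude', 'Depth_int']
```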
Excluding columns by dtype:
df.select_dtypes(exclude=['float64','datetime']).columns
result:
Index(['Date', 'Time', 'Magnitude Type', 'Type', 'Depth_int'], dtype='object')
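Date still appears in this result even though 'datetime' was excluded. In select_dtypes, 'datetime' matches only timezone-naive columns. Timezone-aware columns such as datetime64[ns, UTC] are matched with 'datetimetz'. Here is a sketch on hypothetical data:

```python
import pandas as pd

df = pd.DataFrame({
    'Date': pd.to_datetime(['1965-01-02', '1965-01-04'], utc=True),  # tz-aware
    'Naive': pd.to_datetime(['1965-01-02', '1965-01-04']),          # tz-naive
    'Depth': [131.6, 80.0],
})

# 'datetime' removes only the tz-naive column
print(df.select_dtypes(exclude=['float64', 'datetime']).columns.tolist())
# ['Date']

# adding 'datetimetz' removes the tz-aware column as well
print(df.select_dtypes(exclude=['float64', 'datetime', 'datetimetz']).columns.tolist())
# []
```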
Step 6: Filter columns by dtype and name in Pandas DataFrame
Alternatively, you can loop over all columns and check both the dtype and the name of each column.
Below we list all numeric columns whose name contains the word 'Depth':
from pandas.api.types import is_numeric_dtype

for col in df.columns:
    # keep only numeric columns with 'Depth' in the name
    if is_numeric_dtype(df[col]) and 'Depth' in col:
        print(col)
As a result, you get the names of all numeric columns containing 'Depth':
Depth
Depth_int
Instead of printing the names, you can perform any other operation on these columns.
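For example, you can collect the matching names into a list and use it to select those columns. Chaining select_dtypes with DataFrame.filter(like=...) gives the same columns without an explicit loop. Here is a sketch on hypothetical stand-in data:

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

df = pd.DataFrame({
    'Depth': [131.6, 80.0],
    'Depth_int': [131, 80],
    'Magnitude': [6.0, 5.8],
    'Type': ['Earthquake', 'Earthquake'],
})

# Loop version: collect the names instead of printing them
depth_cols = [col for col in df.columns
              if is_numeric_dtype(df[col]) and 'Depth' in col]
print(depth_cols)  # ['Depth', 'Depth_int']

# Loop-free version: numeric columns first, then keep names containing 'Depth'
print(df.select_dtypes(include='number').filter(like='Depth').columns.tolist())

# Use the selection, e.g. column means
print(df[depth_cols].mean())
```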
Step 7: Apply function on numeric columns only
To apply a function to numeric or datetime columns only, you can use the method select_dtypes in combination with apply.
The function below is applied to every float64 column and doubles its values:
def double_n(x):
    # x is a whole column (Series); multiply every value by 2
    return x * 2

df.select_dtypes(include=['float64']).apply(double_n)
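apply returns a new DataFrame and leaves df unchanged. The sketch below uses hypothetical stand-in data. It writes the doubled values back into df and shows that plain arithmetic on the selected columns gives the same result without apply.

```python
import pandas as pd

df = pd.DataFrame({
    'Depth': [131.6, 80.0],
    'Magnitude': [6.0, 5.8],
    'Type': ['Earthquake', 'Earthquake'],
})

def double_n(x):
    # apply passes one column (a Series) at a time
    return x * 2

num_cols = df.select_dtypes(include='number').columns

# Vectorized equivalent: arithmetic on the selected columns directly
doubled = df[num_cols] * 2

# Write the result of apply back into the original DataFrame
df[num_cols] = df[num_cols].apply(double_n)

print(df)
```

For simple element-wise arithmetic, the vectorized form is usually the more idiomatic choice. apply is more useful when the function needs the whole column, for example to normalize by its mean.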