Convert string, K and M to number, Thousand and Million in Pandas/Python

In this short tutorial, we'll cover how to convert natural language numerics like M and K into numbers with Pandas and Python.

We will show two different ways for conversion of K and M to thousand and million. We will also cover the reverse case converting thousand and million to K and M.

So we will cover:

  • 0.1M to 100000
  • 1 K to 1000
  • 10000 to 10K
  • forty two to 42

We will use two Python libraries:

  • humanize - turning a number into a fuzzy human-readable expression
  • numerizer - convert natural language numerics into ints and floats

The image below shows the examples:

Setup

Let's use the following DataFrame for the conversion from natural language numerals to numbers:

import pandas as pd

data={'day': [1, 2, 3, 4, 5],
     'numeric': [22, 222, '22K', '2M', '0.01 B'],
     'numbers': [110, 11000, 1000000, 33300000, 456873],
     'lang': ['one', 'five', 'twelve', 'forty two', 'one hundred and five']}

df = pd.DataFrame(data,
                  columns=['day', 'numeric', 'numbers', 'lang'])

Our data looks like:

   day numeric   numbers                  lang
0    1      22       110                   one
1    2     222     11000                  five
2    3     22K   1000000                twelve
3    4      2M  33300000             forty two
4    5  0.01 B    456873  one hundred and five

Step 1: Convert K/M to Thousand/Million

First, we convert large number abbreviations to numbers. We map each abbreviation to a math expression and then evaluate the resulting string.

So we will convert:

22K -> 22 * 10**3
0.1 M -> 0.1 * 10**6

# map each abbreviation to the expression that replaces it (Q is 10**18, see the table at the end)
mp = {'K':' * 10**3', 'M':' * 10**6', 'B':' * 10**9', 't':' * 10**12', 'q':' * 10**15', 'Q':' * 10**18'}
pd.eval(df['numeric'].replace(mp.keys(), mp.values(), regex=True))

This will give us:

array([22.0, 222.0, 22000.0, 2000000.0, 10000000.0], dtype=object)

As we can see, it works fine for columns with mixed data. It also works for values with a space before the abbreviation, like 0.01 B.

pd.eval limit: 100 rows

pd.eval seems to have a limit: for input longer than 100 items, only the first 100 results are returned, followed by an Ellipsis:

len(pd.eval([1 * 10**6] * 105))

result:

101

The last five results are:

 [1000000,
 1000000,
 1000000,
 1000000,
 Ellipsis]

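So the above code can be rewritten to evaluate each row separately, which makes it work for more than 100 rows:

# build the expression strings, e.g. '22K' becomes '22 * 10**3'
df['numeric_expr'] = df['numeric'].replace(mp.keys(), mp.values(), regex=True)
# evaluate row by row; str() is needed because some values are plain ints
df['numeric_num'] = df['numeric_expr'].apply(lambda x: eval(str(x)))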
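Both variants above rely on eval, which executes whatever expression the string contains, so they are best kept to trusted data. As an alternative, here is a minimal sketch that parses the number and the abbreviation with a regular expression and multiplies them directly. The names `MULTIPLIERS`, `PATTERN` and `abbrev_to_number` are hypothetical helpers introduced for this example, and only K, M and B are handled.

```python
import re

# Multipliers for the abbreviations handled in this sketch (K, M and B only).
MULTIPLIERS = {"K": 10**3, "M": 10**6, "B": 10**9}

# A number, optional whitespace, then an optional one-letter abbreviation.
PATTERN = re.compile(r"^\s*([-+]?\d*\.?\d+)\s*([KMB])?\s*$")


def abbrev_to_number(value):
    """Convert values such as 22, '22K' or '0.01 B' to a float."""
    match = PATTERN.match(str(value))
    if match is None:
        raise ValueError(f"cannot parse {value!r}")
    number, suffix = match.groups()
    # suffix is None when no abbreviation is present, so the multiplier is 1.
    return float(number) * MULTIPLIERS.get(suffix, 1)


values = [22, 222, "22K", "2M", "0.01 B"]
converted = [abbrev_to_number(v) for v in values]
print(converted)
```

With the DataFrame from the Setup section, the same function could be applied with df['numeric'].apply(abbrev_to_number). It works row by row, so the 100 row limit of pd.eval does not apply.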

Step 2: Convert Thousand/Million to K/M

For this step we will use the Python library humanize. It will help us convert large numbers to human readable abbreviations easily and reliably:

import humanize
df['numbers'].apply(humanize.intword)

The result contains the converted values:

0               110
1     11.0 thousand
2       1.0 million
3      33.3 million
4    456.9 thousand
Name: numbers, dtype: object

The library supports multiple languages, about 25, including:

  • spanish
  • russian
  • french
  • portuguese

The library can be installed with:

pip install humanize
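As the output above shows, humanize.intword spells out the magnitude ("11.0 thousand") rather than producing the compact form from the introduction (10000 to 10K). If the short K/M/B form is needed, a small helper can do it. The sketch below uses only the standard library; `to_abbrev` and its `digits` parameter are hypothetical names chosen for this example, not part of humanize.

```python
def to_abbrev(number, digits=1):
    """Format a number compactly, e.g. 10000 as '10K' (hypothetical helper)."""
    suffixes = ["", "K", "M", "B"]
    value = float(number)
    index = 0
    # Divide by 1000 until the value drops below 1000 or the suffixes run out.
    while abs(value) >= 1000 and index < len(suffixes) - 1:
        value /= 1000
        index += 1
    # Round to the requested digits, then drop trailing zeros so 10.0 becomes 10.
    text = f"{value:.{digits}f}"
    if "." in text:
        text = text.rstrip("0").rstrip(".")
    return text + suffixes[index]


numbers = [110, 11000, 1000000, 33300000, 456873]
print([to_abbrev(n) for n in numbers])
```

With the DataFrame from the Setup section this could be applied as df['numbers'].apply(to_abbrev). Rounding at the boundaries between magnitudes is not handled specially in this sketch.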

Step 3: Convert words to number - one to 1

Finally let's cover the case when we need to convert language numerics into numbers:

  • forty two -> 42
  • twelve hundred -> 1200
  • four hundred and sixty two -> 462

This time we will use the Python library numerizer, which can be installed with:

pip install numerizer

So the Pandas code to convert numbers is:

from numerizer import numerize
df['lang'].apply(numerize)

The output is:

0      1
1      5
2     12
3     42
4    105
Name: lang, dtype: object
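To illustrate the idea behind this kind of conversion, here is a minimal standard library sketch. It is not how numerizer is implemented, it covers only simple English number phrases, and the names `UNITS`, `SCALES` and `words_to_number` are hypothetical.

```python
UNITS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
    "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
    "nineteen": 19, "twenty": 20, "thirty": 30, "forty": 40,
    "fifty": 50, "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}
SCALES = {"thousand": 10**3, "million": 10**6, "billion": 10**9}


def words_to_number(text):
    """Convert a simple phrase such as 'forty two' to an int."""
    total = 0    # groups already closed by a scale word, e.g. "two thousand"
    current = 0  # the group that is still being built
    for word in text.lower().replace("-", " ").split():
        if word == "and":
            continue  # "one hundred and five": "and" carries no value
        if word in UNITS:
            current += UNITS[word]
        elif word == "hundred":
            # a bare "hundred" is treated as "one hundred"
            current = max(current, 1) * 100
        elif word in SCALES:
            total += current * SCALES[word]
            current = 0
        else:
            raise ValueError(f"unknown number word: {word!r}")
    return total + current


phrases = ["one", "five", "twelve", "forty two", "one hundred and five"]
print([words_to_number(p) for p in phrases])
```

Note that the numerizer output above has dtype object. If a numeric dtype is needed, pd.to_numeric(df['lang'].apply(numerize)) is one option, assuming every result is a plain numeric string.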

Large Number Abbreviations

Below we can find a table of large number abbreviations which can be used to improve the mapping.

Abbreviation  Name             Value  Equivalent
K             Thousand (Kilo)  10^3   1000
M             Million          10^6   1000K
B             Billion          10^9   1000M
t             Trillion         10^12  1000B
q             Quadrillion      10^15  1000t
Q             Quintillion      10^18  1000q
s             Sextillion       10^21  1000Q
S             Septillion       10^24  1000s
o             Octillion        10^27  1000S
n             Nonillion        10^30  1000o
Conclusion

In this post, we saw how to convert human readable and natural language expressions to numbers. We also covered the reverse case: converting large numbers to abbreviations.

We showed how to map values to Python mathematical expressions and how to evaluate them. Two useful Python libraries, humanize and numerizer, were used with Pandas for numeric conversion.