Lambda Function with Pandas

The previous post introduced the basics of lambda functions. This post turns to a practical use case: lambda functions × Pandas.

What is Pandas?

Pandas is a library that provides data structures for data analysis, and it is regarded as one of the essential Python libraries in this field, alongside NumPy, SciPy, and Scikit-learn. Pandas is designed to handle table data, like the sheets you would work with in Excel, so that we can manipulate it flexibly.

Pandas can read and write files in various formats, such as CSV and Excel, and it offers a rich set of methods. These features make it possible to analyze table data efficiently. If you look at a data science competition (e.g., Kaggle), you will see that Pandas is an essential tool for data scientists.
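As a small illustration of loading table data, the sketch below reads CSV text into a DataFrame. The CSV content and its column names are made up for this example, and io.StringIO stands in for a real file on disk.

```python
import io

import pandas as pd

# Hypothetical CSV content; in practice you would pass a file path to read_csv.
csv_text = """Name,Age
Alice,18
Bob,50
"""

df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)          # (rows, columns) -> (2, 2)
print(df["Age"].mean())  # average of the Age column -> 34.0
```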

Lambda Function × Pandas

Pandas is used for analyzing table data. So, you will often want to apply the same operation to each element of sequence data (e.g., one column of a table).

That’s exactly where the combination of Pandas and lambda functions comes into play.
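Before the main example, here is a minimal sketch of the idea: one operation, written as a lambda function, applied to every element of a Series. The values and the doubling rule are placeholders chosen only for illustration.

```python
import pandas as pd

# Hypothetical values standing in for one column of a table.
values = pd.Series([100, 250, 400])

# The lambda receives one element at a time and returns the transformed element.
doubled = values.apply(lambda x: x * 2)

print(doubled.tolist())  # [200, 500, 800]
```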

Ex. Categorize the Age Group

We first prepare a list of ages: 18, 50, 28, 78, and 33. Second, we convert the list “age_list” into a Pandas DataFrame with the column name “Age”. Note that the code reuses the name “age_list” for the resulting DataFrame.

import pandas as pd
age_list = [18, 50, 28, 78, 33]
age_list = pd.DataFrame(age_list, columns=["Age"])
print(age_list)

>>    Age
>> 0   18
>> 1   50
>> 2   28
>> 3   78
>> 4   33

Next, we categorize each element of the column age_list["Age"]. Note that the classification function must be defined beforehand.

Here, we prepare a function that categorizes ages into the groups “unknown”, “Under 20”, “20-40”, “41-60”, and “Over 60”. Note that “unknown” is for invalid inputs such as negative ages.

def categorize_age(x):
  # Return the age-group label for a single age value.
  x = int(x)
  if x < 0:
    return "unknown"  # invalid input such as a negative age
  elif x < 20:
    return "Under 20"
  elif x <= 40:
    return "20-40"
  elif x <= 60:
    return "41-60"
  else:
    return "Over 60"

Then, let’s apply the function categorize_age() to each element of the column age_list["Age"]. The result is assigned to a newly created column, “Generation”.

To do this, we use the apply() method together with a lambda function.

Syntax: DataFrame[column].apply( lambda x: function(x) )

age_list["Generation"] = age_list["Age"].apply( lambda x: categorize_age(x) )
print(age_list)

>>    Age Generation
>> 0   18   Under 20
>> 1   50      41-60
>> 2   28      20-40
>> 3   78    Over 60
>> 4   33      20-40
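Two variations on the pattern above, given as a sketch rather than as the only way to write it. First, because the lambda here only forwards its argument, passing categorize_age to apply() directly gives the same result. Second, for a simple rule the logic can live inside the lambda itself, with no predefined function. The variable names and the "over 60" flag are assumptions made for this illustration.

```python
import pandas as pd

def categorize_age(x):
    # Same classification rule as in the example above.
    x = int(x)
    if x < 0:
        return "unknown"
    elif x < 20:
        return "Under 20"
    elif x <= 40:
        return "20-40"
    elif x <= 60:
        return "41-60"
    else:
        return "Over 60"

ages = pd.DataFrame([18, 50, 28, 78, 33], columns=["Age"])

# Wrapping the function in a lambda and passing it directly are equivalent here.
via_lambda = ages["Age"].apply(lambda x: categorize_age(x))
direct = ages["Age"].apply(categorize_age)
print(via_lambda.equals(direct))  # True

# For a one-line rule, the lambda can hold the logic itself (hypothetical flag).
is_over_60 = ages["Age"].apply(lambda x: x > 60)
print(is_over_60.tolist())  # [False, False, False, True, False]
```

A lambda is most useful in the second form, or when the function needs extra arguments filled in at the call site.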

Summary

In this article, we have seen that a lambda function becomes a powerful tool when used with Pandas. When analyzing table data, you will often need to apply arbitrary processing to each element of a column or a row of a Pandas DataFrame.

That is exactly the time to use a lambda function!
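The article's example works on a single column. For the row case mentioned in the summary, a minimal sketch is to call apply() on the DataFrame with axis=1, so that the lambda receives one row at a time. The table, its column names, and the "Total" column are hypothetical.

```python
import pandas as pd

# Hypothetical table of test scores.
scores = pd.DataFrame({"Math": [70, 90], "English": [80, 80]})

# With axis=1, the lambda is called once per row; each row is a Series
# whose values can be accessed by column name.
scores["Total"] = scores.apply(lambda row: row["Math"] + row["English"], axis=1)

print(scores["Total"].tolist())  # [150, 170]
```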