Interpolation – Power of Interpolation in Python to fill Missing Values
This article was published as a part of the Data Science Blogathon
Introduction
Interpolation is a technique for estimating unknown data points that lie between known data points. In Python, it is mostly used during preprocessing to impute missing values in a pandas DataFrame or Series.
Interpolation is also used in image processing: when an image is enlarged, the value of each new pixel can be estimated from its neighboring pixels.
Table of Contents
 When to use Interpolation
 Interpolation to fill missing values in series data
 Linear Interpolation
 Polynomial Interpolation
 Interpolation through Padding
 Interpolation to fill missing values in DataFrame
 Linear Method
 Backward Direction
 Interpolation through Padding
 Filling Missing Values in TimeSeries Data
 EndNote
When to use Interpolation
We can use interpolation to estimate a missing value from its neighbors. When imputing missing values with the average is not a good fit, we have to move to a different technique, and interpolation is the one most people reach for.
Interpolation is mostly used with time-series data, because there we prefer to fill a missing value from the one or two values closest to it. Take temperature as an example: we would always prefer to fill today's temperature from the last day or two, not with the mean of the month. Interpolation can also help when calculating moving averages.
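As a rough illustration of the temperature example, the sketch below compares mean imputation with interpolation on a made-up week of readings. The values and the variable names are invented for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical daily temperatures (°C); the third reading is missing.
temps = pd.Series([20.0, 21.0, np.nan, 23.0, 30.0, 31.0, 32.0])

# Mean imputation uses the average of all known readings in the week.
mean_filled = temps.fillna(temps.mean())

# Linear interpolation uses only the neighbouring days.
interp_filled = temps.interpolate()

print(round(mean_filled.iloc[2], 2))  # about 26.17, pulled up by the warm days later on
print(interp_filled.iloc[2])          # 22.0, midway between 21 and 23
```

The interpolated value stays close to the readings around the gap, while the mean is influenced by days that are far from it.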
Using Interpolation to fill Missing Values in Series Data
A pandas Series is a one-dimensional labeled array that, like a list, can hold elements of various data types. We can easily create a Series from a list, tuple, or dictionary. To try out the interpolation methods, we will create a pandas Series containing a NaN value and fill it using the different methods.
import pandas as pd
import numpy as np

a = pd.Series([0, 1, np.nan, 3, 4, 5, 7])
1) Linear Interpolation
Linear interpolation estimates a missing value by joining the known points on either side of it with a straight line. In short, the unknown value is placed on the line between the previous and the next known value. Linear is the default method of interpolate, so we do not need to specify it.
a.interpolate()
In the output, the NaN at index 2 is replaced by 2.0, the midpoint of its neighbors 1 and 3.

Hence, linear interpolation follows the trend of the surrounding values. Remember that it does not interpolate using the index; it treats the values as equally spaced and connects them with a straight line.
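The point about the index can be checked with a short sketch. It reuses the series a from above; the relabelled copy b and its uneven index are assumptions made only for this illustration.

```python
import numpy as np
import pandas as pd

a = pd.Series([0, 1, np.nan, 3, 4, 5, 7])

filled = a.interpolate()          # method="linear" is the default
print(filled.tolist())            # [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 7.0]

# The index labels play no role with method="linear": relabelling the
# rows with unevenly spaced labels gives the same filled values.
b = a.copy()
b.index = [0, 1, 2, 10, 11, 12, 13]
print(b.interpolate().tolist())   # same values as above

# method="index" does take the labels into account, so the estimate
# for label 2 moves toward the value at label 1.
print(b.interpolate(method="index").loc[2])
```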
2) Polynomial Interpolation
In polynomial interpolation you need to specify an order. The missing values are then read off a polynomial curve of that order fitted through the available data points, so the estimate can follow a curved trend, such as a parabola for order 2, instead of a straight line. Note that pandas relies on SciPy for this method, so SciPy must be installed.
a.interpolate(method="polynomial", order=2)
If you pass an order of 1, the output will be similar to the linear method, because a polynomial of order 1 is a straight line.
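To see why a higher order can matter, here is a small sketch on a made-up series whose known points lie on the curve y = x². It fits a quadratic with numpy.polyfit rather than calling pandas, so it only illustrates the idea and should not be read as how method="polynomial" works internally. The series s is an assumption for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical series sampled from y = x**2, with the value at x = 2 missing.
s = pd.Series([0.0, 1.0, np.nan, 9.0, 16.0])

known = s.dropna()
# Fit a degree-2 polynomial through the known (position, value) pairs.
coeffs = np.polyfit(known.index.to_numpy(dtype=float), known.to_numpy(), deg=2)
quadratic_guess = np.polyval(coeffs, 2.0)

# Straight-line estimate from the two neighbours, for comparison.
linear_guess = s.interpolate().iloc[2]

print(linear_guess)               # 5.0, the midpoint of 1 and 9
print(round(quadratic_guess, 6))  # 4.0, matching the underlying curve
```

When the data bends, the straight-line estimate overshoots, while the quadratic recovers the value on the curve.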
3) Interpolation through Padding
Interpolation through padding simply means filling a missing value with the last valid value that appears above it in the dataset. If the missing value is in the first row, there is nothing above it to copy, so it stays NaN. You can also specify a limit, which is the maximum number of consecutive NaN values to fill.
So, if you are working on a real-world project and want every missing value filled with the previous value, leave the limit out (the default fills all of them) or set it to at least the longest run of NaN values. Newer pandas releases deprecate passing "pad" to interpolate and recommend a.ffill(limit=2) for the same result.
a.interpolate(method="pad", limit=2)
In the output, the missing value at index 2 is replaced by 1.0, the value just before it.
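The same forward fill is also available as Series.ffill. The sketch below uses a made-up series, with a leading NaN and a run of three NaNs, to show what happens to a missing first row and how limit caps the fill.

```python
import numpy as np
import pandas as pd

# Hypothetical series: a leading NaN and three consecutive NaNs.
s = pd.Series([np.nan, 1.0, np.nan, np.nan, np.nan, 5.0])

# Fill at most two consecutive NaNs with the last valid value.
limited = s.ffill(limit=2)
print(limited.tolist())     # [nan, 1.0, 1.0, 1.0, nan, 5.0]

# Without a limit, every NaN that has a valid value above it is filled.
unlimited = s.ffill()
print(unlimited.tolist())   # [nan, 1.0, 1.0, 1.0, 1.0, 5.0]
```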
Using Interpolation to fill Missing Values in Pandas DataFrame
A DataFrame is a widely used pandas data structure that stores data in rows and columns. When performing data analysis we usually load the data into such a table. A DataFrame can contain many missing values across many columns, so let us see how interpolation can fill them.
import pandas as pd

# Creating the dataframe
df = pd.DataFrame({"A": [12, 4, 7, None, 2],
                   "B": [None, 3, 57, 3, None],
                   "C": [20, 16, None, 3, 8],
                   "D": [14, 3, None, None, 6]})
1) Linear Interpolation in Forward Direction
The linear method ignores the index, treats the values as equally spaced, and places each missing value on the straight line between the known points around it. In the forward direction, a missing value in the first row has no previous point to start from, so it is left as NaN. Let's apply it to our dataframe.
df.interpolate(method ='linear', limit_direction ='forward')
In the output, the gaps in columns A, C, and D are filled, and so is the last row of B, while the NaN in the first row of B remains.
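For reference, the forward result can be inspected column by column. This is a small sketch that rebuilds the same dataframe; the variable name forward is chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": [12, 4, 7, None, 2],
                   "B": [None, 3, 57, 3, None],
                   "C": [20, 16, None, 3, 8],
                   "D": [14, 3, None, None, 6]})

forward = df.interpolate(method="linear", limit_direction="forward")

print(forward.loc[3, "A"])     # 4.5, midway between 7 and 2
print(forward.loc[2, "C"])     # 9.5, midway between 16 and 3
print(forward["D"].tolist())   # [14.0, 3.0, 4.0, 5.0, 6.0]
print(forward["B"].tolist())   # [nan, 3.0, 57.0, 3.0, 3.0]
```

Column D shows two consecutive gaps being spread evenly between 3 and 6, and column B shows the leading NaN left in place while the trailing one takes the last valid value.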
If you only want to interpolate a single column, that is also simple, as the code below shows.
df['C'].interpolate(method="linear")
2) Linear Interpolation in Backward Direction
Now the method is the same; only the direction of the fill changes. The method works from the end of the dataframe toward the start, a bottom-to-top approach, so a NaN in the first row can be filled while a NaN in the last row is left as it is.
df.interpolate(method ='linear', limit_direction ='backward')
In the output, the first row of B is now filled with 3.0 and the last row of B stays NaN; the other columns are filled as in the forward case.
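The effect of the direction is easiest to see on column B, which has a NaN at both ends. The sketch below copies that column into a standalone series (named col_b here for illustration) and also tries limit_direction="both", which the text above does not cover.

```python
import numpy as np
import pandas as pd

# Column B from the dataframe above: NaN in the first and last row.
col_b = pd.Series([np.nan, 3.0, 57.0, 3.0, np.nan])

fwd = col_b.interpolate(method="linear", limit_direction="forward")
bwd = col_b.interpolate(method="linear", limit_direction="backward")
both = col_b.interpolate(method="linear", limit_direction="both")

print(fwd.tolist())    # [nan, 3.0, 57.0, 3.0, 3.0]
print(bwd.tolist())    # [3.0, 3.0, 57.0, 3.0, nan]
print(both.tolist())   # [3.0, 3.0, 57.0, 3.0, 3.0]
```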
3) Interpolation with Padding
As we have already seen, padding accepts a limit on the number of consecutive NaN values to fill. The longest run of NaN values in our dataframe is 2 (in column D), so we set the limit to 2.
df.interpolate(method="pad", limit=2)
Running the above code fills each missing value with the last valid value above it; the NaN in the first row of B has no previous value and is left unchanged.
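To see what the limit does, the sketch below repeats the padding with DataFrame.ffill, the forward-fill method that newer pandas versions recommend in place of method="pad", and compares a limit of 2 with a limit of 1. The variable names are chosen for this illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": [12, 4, 7, None, 2],
                   "B": [None, 3, 57, 3, None],
                   "C": [20, 16, None, 3, 8],
                   "D": [14, 3, None, None, 6]})

padded = df.ffill(limit=2)
print(padded["D"].tolist())    # [14.0, 3.0, 3.0, 3.0, 6.0]
print(padded["B"].tolist())    # [nan, 3.0, 57.0, 3.0, 3.0]

# With limit=1 only the first NaN of the two-row gap in D is filled.
padded_one = df.ffill(limit=1)
print(padded_one["D"].tolist())   # [14.0, 3.0, 3.0, nan, 6.0]
```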
Filling Missing Values in TimeSeries Data
Time-series data is a sequence of observations recorded over time, and it often shows a trend or seasonality. Analyzing time-series data is a little different from working with ordinary dataframes. Because each value depends on when it was observed, mean imputation is usually a poor way to deal with missing values here. Interpolation is a powerful method to fill missing values in time-series data.
df = pd.DataFrame({
    'Date': pd.date_range(start='20210701', periods=10, freq='H'),
    'Value': np.arange(10, dtype=float),  # float column so it can hold NaN
})
df.loc[2:3, 'Value'] = np.nan  # rows 2 and 3 become missing
Filling missing values in the forward and backward direction
The simplest way to fill the values with interpolate is the same one we applied to a single dataframe column.
df['Value'].interpolate(method="linear")
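This ignores the date column, though. With time-series data it makes more sense to set the date as the index, so that each filled value stays tied to its timestamp. With a date index, method="time" would additionally weight the estimate by the actual time gaps between observations.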
df.set_index('Date')['Value'].interpolate(method="linear")
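When the timestamps are not evenly spaced, the choice of method changes the result. The sketch below uses a made-up series with readings at 00:00, 01:00, and 04:00 to compare method="linear" with method="time"; the series ts and its values are assumptions for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical readings at 00:00, 01:00 and 04:00; the middle one is missing.
idx = pd.to_datetime(["2021-07-01 00:00", "2021-07-01 01:00", "2021-07-01 04:00"])
ts = pd.Series([0.0, np.nan, 4.0], index=idx)

by_position = ts.interpolate(method="linear")  # treats rows as equally spaced
by_time = ts.interpolate(method="time")        # uses the gaps between timestamps

print(round(by_position.iloc[1], 6))  # 2.0, halfway between 0 and 4
print(round(by_time.iloc[1], 6))      # 1.0, one hour into a four-hour gap
```

For evenly spaced data such as the hourly dataframe above, the two methods agree.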
With a small modification, the same code can back-fill instead, replacing each missing value with the next valid one.
df.set_index('Date')['Value'].bfill()
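To compare the options side by side, the sketch below rebuilds an hourly series like the one above and fills the same two gaps by forward fill, backward fill, and linear interpolation. The series name values is chosen for this illustration, and the hourly step is passed as a Timedelta rather than a frequency string.

```python
import numpy as np
import pandas as pd

# Hourly series 0..9 with the readings at positions 2 and 3 missing.
idx = pd.date_range(start="2021-07-01", periods=10, freq=pd.Timedelta(hours=1))
values = pd.Series(np.arange(10, dtype=float), index=idx)
values.iloc[2:4] = np.nan

print(values.ffill().iloc[2:4].tolist())        # [1.0, 1.0], previous value carried forward
print(values.bfill().iloc[2:4].tolist())        # [4.0, 4.0], next value carried backward
print(values.interpolate().iloc[2:4].tolist())  # [2.0, 3.0], straight line between 1 and 4
```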
EndNote
We have learned various ways to use the interpolate function in Python to fill missing values in a Series as well as in a DataFrame. In many cases, interpolation is a better way to fill missing values than a simple average. I hope you got to know the power of interpolation and understand how to use it. If you have any query about the interpolate function, please put it in the comment section; I will be happy to help you out.
The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.