Data Cleaning in Python with steps - Part 2



So, in the last part, we saw how to find the null values and how to fill them.

In this blog, we will look at more ways to fill the null values, look at duplicates, and, more importantly, see how to deal with outliers!


The next step: how to fill the null values.

In the last part, we saw that we can fill them with the mean, median, or mode.


1. Fill with the value from the row above or the row below

If you want to fill each null value with the value from the row above it, you can use the ffill() method.



 

Here, ffill stands for forward fill: the last valid value is carried forward into the empty rows below it.
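Since the original screenshot is not available, here is a minimal sketch of forward fill, assuming a small hypothetical DataFrame with a `temperature` column (the data is made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with missing readings
df = pd.DataFrame({"temperature": [21.0, np.nan, np.nan, 24.0, np.nan]})

# Forward fill: each NaN takes the last valid value above it
filled = df.ffill()

print(filled["temperature"].tolist())  # [21.0, 21.0, 21.0, 24.0, 24.0]
```

A NaN in the very first row would stay NaN, because there is no row above it to copy from. Older code often writes df.fillna(method="ffill"); recent pandas versions deprecate that form in favour of df.ffill().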

We can also fill each null value with the value from the row below it.

For that, we can use the bfill() method, which stands for backward fill.
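A minimal sketch of backward fill on hypothetical data. It also shows the mirror-image limitation of ffill: a NaN in the last row has nothing below it, so one common pattern is to follow bfill() with ffill().

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; note the NaN in the last row
df = pd.DataFrame({"temperature": [np.nan, 21.0, np.nan, 24.0, np.nan]})

# Backward fill: each NaN takes the next valid value below it
filled = df.bfill()
print(filled["temperature"].tolist())  # [21.0, 21.0, 24.0, 24.0, nan]

# The last row has no value below it, so a forward fill finishes the job
both = df.bfill().ffill()
print(both["temperature"].tolist())  # [21.0, 21.0, 24.0, 24.0, 24.0]
```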



 


2. Fill using the interpolate() method

Interpolate method



The interpolate() method fills null values in a pandas DataFrame or Series by estimating each missing value from the values around it. By default, it uses linear interpolation.

For each gap, linear interpolation draws a straight line between the nearest valid values before and after the gap and reads the missing values off that line. Note that the default method="linear" ignores the index and treats the values as equally spaced. The method parameter can also be set to "quadratic", "cubic", and others (these are passed to SciPy, so it must be installed), depending on how complex a curve you want to fit.
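A minimal sketch of linear interpolation on hypothetical data. With 10 and 40 on either side of two gaps, the straight line between them gives 20 and 30; on a DataFrame the same thing happens column by column.

```python
import numpy as np
import pandas as pd

# Hypothetical series with two consecutive gaps
s = pd.Series([10.0, np.nan, np.nan, 40.0])

# Default method="linear": the gaps are placed evenly on the line from 10 to 40
linear = s.interpolate()
print(linear.tolist())  # [10.0, 20.0, 30.0, 40.0]

# On a DataFrame, each column is interpolated independently
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [2.0, np.nan, 6.0]})
print(df.interpolate().loc[1].tolist())  # [2.0, 4.0]
```

Switching to a curved fit is just a different argument, for example s.interpolate(method="cubic"), but those methods need SciPy and enough valid points to fit the curve.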
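
The next step is to treat the outliers.

There are many ways to detect outliers. One of the simplest is a boxplot: values that fall outside the whiskers, which by default extend 1.5 times the interquartile range (IQR) beyond the box, are drawn as separate points and are likely outliers.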
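A boxplot only lets you see the outliers. The sketch below computes the same 1.5 × IQR limits by hand on a hypothetical series so you can list them. Plotting libraries may compute quartiles slightly differently, so the exact limits can differ a little from what a plot shows.

```python
import pandas as pd

# Hypothetical values; 95 is far from the rest
s = pd.Series([10, 12, 11, 13, 12, 11, 95])

# Quartiles and interquartile range
q1 = s.quantile(0.25)
q3 = s.quantile(0.75)
iqr = q3 - q1

# Boxplot-style whisker limits: 1.5 * IQR beyond the quartiles
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

outliers = s[(s < lower) | (s > upper)]
print(lower, upper)  # 8.75 14.75
print(outliers.tolist())  # [95]
```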

Likewise, there are many ways to handle outliers; here, I am using the z-score method.





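With the z-score method, each value is scored by how many standard deviations it lies from its column's mean: z = (x - mean) / std. Values whose absolute z-score exceeds a chosen threshold (3 is a common choice) are treated as outliers and removed.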
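The original code screenshot is missing, so this is only a sketch of the z-score approach. It assumes a hypothetical `reading` column and a threshold of 3, and it drops the flagged rows. Note that pandas' `std()` uses the sample standard deviation (ddof=1) by default, while `scipy.stats.zscore` defaults to the population version (ddof=0).

```python
import pandas as pd

# Hypothetical column: values near 50 plus one extreme reading (120)
values = [48, 49, 50, 51, 52] * 4 + [120]
df = pd.DataFrame({"reading": values})

# z-score: how many standard deviations each value lies from the column mean
col = df["reading"]
z = (col - col.mean()) / col.std()

# Keep rows within 3 standard deviations (a common, but arbitrary, threshold)
threshold = 3
cleaned = df[z.abs() <= threshold]

print(len(df), len(cleaned))  # 21 20
print(df.loc[z.abs() > threshold, "reading"].tolist())  # [120]
```

One design caveat: the outlier itself pulls up the mean and inflates the standard deviation. On very small samples, a single extreme value may therefore never reach |z| > 3, and the IQR rule above can be the more reliable choice.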

Thanks for reading💜

If you have any doubts, feel free to contact me.

I will be back with new content soon, so stay tuned!
Read the blogs, 
Comment💭
Like 💪
And 
Subscribe💓
