Data Cleaning in Python with Steps - Part 1



What is Data?

Data is the heart of every company. Once data has been converted into information, it is much easier to process, so it is important to keep it clean and tidy.


However, Data Cleaning is a very tedious task.



Data cleaning is an essential step in the data-science process, because real-world data is messy and requires preparation before analysis.

This process involves several tasks, such as finding missing values, filling them, removing duplicates, and finding outliers.

In this blog, we will learn how to perform the data cleaning process in Python.

Python provides several libraries that make data cleaning easier and faster.

Step 1: Load the data.

Here, we are using the Netflix dataset. I will provide the link in the description below👇

We use the Pandas library to load the dataset. This is done with the read_csv function in Pandas, which reads a CSV file into a DataFrame.
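As a minimal sketch of this step, the example below loads a few CSV rows with read_csv. The sample rows and column names are hypothetical stand-ins for the Netflix dataset, and the file name in the comment is an assumption; point read_csv at your own copy of the file instead.

```python
import io

import pandas as pd

# Hypothetical sample rows standing in for the Netflix CSV file;
# the real dataset's columns may differ.
csv_text = """title,director,release_year,rating
Show A,Jane Doe,2019,TV-MA
Show B,,2020,PG-13
Show C,John Roe,,TV-14
"""

# With a real file you would pass its path instead,
# e.g. pd.read_csv("netflix_titles.csv") (file name assumed).
df = pd.read_csv(io.StringIO(csv_text))

print(df.head())   # first rows of the DataFrame
print(df.shape)    # (number of rows, number of columns)
```

Empty fields in the CSV are read as NaN, which is what the next step looks for.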

 





Step 2: Missing Values

One of the most common problems in real-world data is missing values.

Python libraries such as Pandas can detect missing data and fill it using the previous row, the next row, or a default value.

Finding the null values of this dataset
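One common way to do this is to chain isnull() and sum(), which counts the missing entries in each column. The small DataFrame below is a hypothetical stand-in for the Netflix data, so the column names are assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the Netflix dataset.
df = pd.DataFrame({
    "title": ["Show A", "Show B", "Show C", "Show D"],
    "director": ["Jane Doe", None, "John Roe", None],
    "release_year": [2019, 2020, np.nan, 2021],
})

# isnull() marks each missing cell as True; sum() counts them per column.
null_counts = df.isnull().sum()
print(null_counts)

# Total number of missing cells in the whole DataFrame.
total_missing = int(null_counts.sum())
print(total_missing)
```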


 




Drop the missing values
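A short sketch of dropping missing values with dropna(), again on hypothetical sample data rather than the real Netflix file. By default it removes every row that contains at least one missing value; the subset and axis arguments narrow or change that behaviour.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the Netflix dataset.
df = pd.DataFrame({
    "title": ["Show A", "Show B", "Show C", "Show D"],
    "director": ["Jane Doe", None, "John Roe", None],
    "release_year": [2019, 2020, np.nan, 2021],
})

# Drop every row that has at least one missing value.
rows_complete = df.dropna()

# Drop rows only when a specific column is missing.
rows_with_director = df.dropna(subset=["director"])

# Drop every column that has at least one missing value.
cols_complete = df.dropna(axis=1)

print(len(df), len(rows_complete), len(rows_with_director))
print(list(cols_complete.columns))
```

dropna() returns a new DataFrame and leaves df unchanged, so assign the result if you want to keep it.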

 

Filling the missing values

 
You can fill missing values from the row above or the row below, or with the column's mean, median, or mode.
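The filling options can be sketched as follows. The data and column names are hypothetical, and "Unknown" is just an illustrative default value, not something taken from the Netflix dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the Netflix dataset.
df = pd.DataFrame({
    "rating": ["TV-MA", None, "PG-13", "TV-MA"],
    "release_year": [2018.0, np.nan, 2020.0, 2022.0],
})

# Forward fill: copy the value from the row above.
ffilled = df.ffill()

# Backward fill: copy the value from the row below.
bfilled = df.bfill()

# Numeric column: fill with the mean or the median.
year_mean = df["release_year"].fillna(df["release_year"].mean())
year_median = df["release_year"].fillna(df["release_year"].median())

# Categorical column: fill with the mode (most frequent value).
rating_mode = df["rating"].fillna(df["rating"].mode()[0])

# Or fill with a fixed default value ("Unknown" is an arbitrary choice).
rating_default = df["rating"].fillna("Unknown")

print(ffilled)
print(bfilled)
```

Mean and median only make sense for numeric columns; for text columns such as a rating, the mode or a default value is usually the practical choice.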

Stay tuned for the next part, where we will handle duplicates, learn more methods for filling data, and see how to treat outliers.

Thank you for reading!!!!!

