Cleaning
Some features, such as the passenger's name and cabin, cannot be used for analysis directly.
Next, the dataframe was preprocessed.
The data was first cleaned and preprocessed to handle missing values, categorical features, outliers, class imbalance and redundant features.
Now that we have cleaned the data into an organized format, we can proceed with preprocessing, i.e., imputing missing values, encoding categorical features and scaling the data if required.
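The three steps named above (imputation, categorical encoding, scaling) can be sketched in plain Python to show what each one does to the data. This is only an illustrative sketch, not the method used here: the toy records, the column names (`age`, `fare`, `embarked`) and the helper functions `impute`, `one_hot`, `standardize` and `preprocess` are all hypothetical, and the choice of median/mode imputation and z-score scaling is an assumption. In practice a dataframe library would normally handle these steps.

```python
from statistics import mean, median, mode, pstdev

# Hypothetical toy records (not real data); None marks a missing value.
rows = [
    {"age": 22.0, "fare": 7.25, "embarked": "S"},
    {"age": None, "fare": 71.28, "embarked": "C"},
    {"age": 26.0, "fare": 7.92, "embarked": None},
    {"age": 35.0, "fare": 53.10, "embarked": "S"},
]


def impute(rows, column, fill):
    """Return new rows with missing values in `column` replaced by `fill`."""
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]


def one_hot(rows, column):
    """Replace a categorical column with one 0/1 indicator column per category."""
    categories = sorted({r[column] for r in rows})
    encoded = []
    for r in rows:
        new = {k: v for k, v in r.items() if k != column}
        for c in categories:
            new[f"{column}_{c}"] = 1 if r[column] == c else 0
        encoded.append(new)
    return encoded


def standardize(rows, column):
    """Scale a numeric column to zero mean and unit (population) variance."""
    values = [r[column] for r in rows]
    mu, sigma = mean(values), pstdev(values)
    return [{**r, column: (r[column] - mu) / sigma if sigma else 0.0} for r in rows]


def preprocess(rows):
    """Impute, then encode, then scale (order assumed for this sketch)."""
    # Numeric column: fill with the median of the observed values.
    observed_ages = [r["age"] for r in rows if r["age"] is not None]
    rows = impute(rows, "age", median(observed_ages))
    # Categorical column: fill with the most frequent observed value.
    observed_ports = [r["embarked"] for r in rows if r["embarked"] is not None]
    rows = impute(rows, "embarked", mode(observed_ports))
    rows = one_hot(rows, "embarked")
    for col in ("age", "fare"):
        rows = standardize(rows, col)
    return rows


clean = preprocess(rows)
```

Imputation runs first so that the encoding and scaling steps never see a missing value. Whether scaling is needed at all depends on the model that consumes the data, which is why the text marks it as optional.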
The first step is to extract all the text from a document and preprocess it into a format suitable for analysis.
Preprocessing the data involves several steps.