Project-5: Spam Detection
Aim of the Project
To predict, analyze and provide countermeasures against spamming. Build a model that predicts whether a message is spam, using Data Science, NLP and Machine Learning.
Life Cycle of the Project
Sourced the dataset from Kaggle. Explored the data to understand its descriptive statistics and to decide how to prepare it for a robust ML model. Used Pandas, NumPy and Matplotlib for data pre-processing, removing duplicate records to reduce redundancy. The data had no missing values. Performed EDA and visualized the data with Seaborn and Matplotlib. Used masking to find the messages with the maximum and minimum word counts, and box plots to detect and handle outliers.

Performed text pre-processing with the NLTK library: tokenization and a bag-of-words representation via count vectorization, to make the text ready for ML modelling. Trained the model with scikit-learn using a Multinomial Naive Bayes classifier.
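The exploration steps described above (duplicate removal, missing-value check, word-count masking and box-plot outlier detection) could look roughly like the sketch below. The column names `label` and `message`, the sample rows and the 1.5 × IQR whisker rule are assumptions for illustration, not the project's actual data or code.

```python
import pandas as pd

# Tiny hypothetical sample; the real project used a Kaggle dataset.
df = pd.DataFrame({
    "label": ["ham", "spam", "ham", "ham", "spam", "ham"],
    "message": [
        "Are we still on for lunch today",
        "WINNER claim your free prize now",
        "Are we still on for lunch today",  # exact duplicate of row 0
        "Ok",
        "Free entry call now to win cash " * 10,
        "See you at the gym after work",
    ],
})

# Drop exact duplicate rows to reduce redundancy.
df = df.drop_duplicates().reset_index(drop=True)

# Count missing values per column.
missing = df.isna().sum()

# Number of whitespace-separated words in each message.
df["n_words"] = df["message"].str.split().str.len()

# Boolean masks select the longest and shortest messages.
longest = df[df["n_words"] == df["n_words"].max()]
shortest = df[df["n_words"] == df["n_words"].min()]

# Box-plot rule (assumed here): points beyond 1.5 * IQR from the
# quartiles are flagged as outliers.
q1, q3 = df["n_words"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["n_words"] < q1 - 1.5 * iqr) | (df["n_words"] > q3 + 1.5 * iqr)]
```

Calling `df.boxplot(column="n_words")` or `seaborn.boxplot(x=df["n_words"])` would draw the same whiskers visually. How the flagged rows are handled afterwards (dropped, capped or kept) is a separate modelling decision.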
- Created a data pipeline to automate the ML workflow.
- Created a model with 97% accuracy.
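A pipeline of this kind could be sketched with scikit-learn as shown below, chaining count vectorization and a Multinomial Naive Bayes classifier. The training messages, the step names `bow` and `clf`, and the use of `CountVectorizer`'s default tokenizer in place of NLTK are assumptions for illustration. This toy data says nothing about the accuracy reported above.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical training data, not taken from the project.
messages = [
    "Win a free prize now",
    "Free cash prize claim now",
    "Claim your free entry to win cash",
    "Urgent call now to claim your prize",
    "Are we meeting for lunch today",
    "See you at the gym after work",
    "Can you call me when you get home",
    "Lunch at noon works for me today",
]
labels = ["spam", "spam", "spam", "spam", "ham", "ham", "ham", "ham"]

# The pipeline tokenizes the text, builds bag-of-words counts and
# fits the classifier in a single fit() call.
spam_pipeline = Pipeline([
    ("bow", CountVectorizer()),   # tokenization + word counts
    ("clf", MultinomialNB()),     # Multinomial Naive Bayes
])
spam_pipeline.fit(messages, labels)

# Raw strings go straight into predict(); the vectorizer is applied
# automatically with the vocabulary learned during fit().
new_messages = ["Claim your free prize now", "Are we meeting for lunch today"]
predictions = spam_pipeline.predict(new_messages).tolist()
```

Bundling the vectorizer and the classifier keeps the vocabulary learned at training time attached to the model, so a single fitted object can be saved and later loaded by a web app.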
Results from the Project
Heroku App: https://spam-check.herokuapp.com/
Check out the detailed project overview in the GitHub repository.
Technologies Used | Python | Seaborn | NumPy | Pandas | Scikit-Learn | Flask | Matplotlib |