Sunday, July 26, 2020

Solve any Machine Learning Problem with this Approach!

Problem Solving the Easy Way with Machine Learning: An Approach









The following steps outline an approach you can apply to almost any machine learning, deep learning, or data science problem.

 ALERT: Solve any machine learning problem with this approach!



 1. Set up your Python environment



You can use any language to work on machine learning and data science problems. But if you are using Python, set up a Jupyter notebook environment such as Google Colab before touching your dataset.

For this purpose, use the conda or pip package installer.


2. Install the dependencies

Install the required packages and libraries, such as NumPy, SciPy, pandas, TensorFlow, PyTorch, and Keras. Which ones you need depends on your goal, that is, what you actually want to do with the dataset.

For instance, if you want to study the dataset using graphs such as line plots, 3D plots, histograms, or pie charts, include the Matplotlib library. In the same way, every library and package is designed for a specific purpose.


Pandas - to load and analyze tabular datasets.

NumPy - to perform numerical computation on arrays.

Beautiful Soup - for web scraping and HTML parsing.

Scrapy - for data mining, web scraping, and web crawling. We can even build complete web crawlers with Scrapy.
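Before moving on, it can help to confirm which of these libraries are actually installed in your environment. The following is a minimal sketch using only the Python standard library; the `REQUIRED` list and the `missing_packages` helper are hypothetical names chosen for this illustration, and the list should be adjusted to your own project.

```python
import importlib.util

# Import names (not always the same as the pip names) of the libraries above.
# Note: Beautiful Soup is imported as "bs4".
REQUIRED = ["numpy", "scipy", "pandas", "matplotlib", "bs4", "scrapy"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All listed packages are available.")
```

Anything reported as missing can then be installed with pip or conda.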


3. Explore the data set

ALERT: Make sure that your data set is in .csv format.



This step involves checking the number of rows and columns, printing the first 5 rows, identifying the features (variables) and labels of the dataset, and finding which data types are present. Do this analysis with the help of the pandas library. This is the most underrated task and arguably the most important step in solving data science problems.

ALERT: Quality data beats fancier algorithms.
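A first look at a dataset could go roughly like the sketch below. The small in-memory CSV and its column names are made up for illustration; with a real file you would pass its path to `pd.read_csv` instead.

```python
import io

import pandas as pd

# Hypothetical stand-in for a real .csv file.
csv_text = """age,income,city,healthy
34,52000,Pune,yes
45,61000,Delhi,no
29,,Mumbai,yes
52,48000,Delhi,no
41,75000,Pune,yes
38,58000,Mumbai,yes
"""
df = pd.read_csv(io.StringIO(csv_text))

n_rows, n_cols = df.shape
print("Rows:", n_rows, "Columns:", n_cols)  # size of the dataset
print(df.head())                            # first 5 rows
print(list(df.columns))                     # feature and label names
print(df.dtypes)                            # data type of each column
print(df.isna().sum())                      # missing values per column
```

The missing-value count at the end is a useful preview of how much cleaning the dataset will need.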

4. Data Cleaning and Processing
Ensuring that your data is correct, usable, readable, and consistent is the most tedious task.


ALERT: It is often reported that data scientists spend about 80% of their time on data cleaning and data processing.


Follow these steps for data cleaning:

  • Deal with missing values.
  • Remove duplicate entries.
  • Remove unwanted entries that will not contribute to the output. For example, in an analysis of a person's health, their mobile number plays no role.
  • Fill missing numerical data with the median value.
  • Correct invalid values and data types. For example, an age column can't have a negative value, and the number of days in a month can't be more than 31.
  • Remove whitespace and correct typos.
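The cleaning steps above can be sketched in pandas as follows. The records, the column names, and the 0 to 120 age range are all assumptions made for this example, not rules from any particular dataset.

```python
import pandas as pd

# Hypothetical records for a health analysis.
raw = pd.DataFrame({
    "name": [" Asha", "Ravi ", "Ravi ", "Meena", "John"],
    "age": [34, 45, 45, -3, 29],
    "income": [52000.0, None, None, 48000.0, 61000.0],
    "mobile": ["111", "222", "222", "333", "444"],
})

# Remove duplicate entries, then drop a column that cannot affect the output.
clean = raw.drop_duplicates().drop(columns=["mobile"])

# Remove stray whitespace from text values.
clean["name"] = clean["name"].str.strip()

# Fill missing numerical data with the column median.
clean["income"] = clean["income"].fillna(clean["income"].median())

# Drop rows with impossible values (assumed valid age range: 0 to 120).
clean = clean[clean["age"].between(0, 120)].reset_index(drop=True)

print(clean)
```

Whether to drop or repair an invalid row depends on the dataset; dropping is simply the shortest option to show here.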


5. Build a Machine Learning Model
Now that preprocessing is done, build a machine learning model or a neural network. You can either use an existing, pre-built model or develop your own ML model.


ALERT: We can even collect data using surveys. Fresh surveys can help you get up-to-date datasets. But always remember that you will compromise the output if you use poor datasets or wrong information.


6. Feature Selection


Choose the most important features to get better results. For example, to determine the health of a person, factors like income, eating habits, and social environment play an important role. Selecting these features will help in building better models.
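One simple way to shortlist features is to rank them by how strongly they correlate with the target. This is only one of many selection techniques, shown here as a rough sketch; the data, the column names, and the 0.5 cutoff are invented for illustration.

```python
import pandas as pd

# Hypothetical data: "health_score" is the value we want to predict.
data = pd.DataFrame({
    "income": [30, 45, 50, 62, 75, 80],
    "veg_meals_per_week": [2, 4, 5, 6, 8, 9],
    "mobile_last_digit": [7, 1, 9, 3, 3, 8],
    "health_score": [40, 52, 58, 65, 78, 85],
})
target = "health_score"

# Absolute Pearson correlation of every feature with the target, strongest first.
corr = data.corr()[target].drop(target).abs().sort_values(ascending=False)
print(corr)

# Keep features above an arbitrary cutoff (0.5 is an assumption, not a rule).
selected = corr[corr >= 0.5].index.tolist()
print("Selected features:", selected)
```

Correlation only captures linear relationships between numeric columns, so treat the ranking as a starting point rather than a final answer.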

7. Split the data


Split the dataset. A convenient default is to allot 75% of the data for training and 25% of the data for testing.
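To make the 75/25 split concrete, here is a minimal pure-Python sketch. The helper name `train_test_split_rows` and the fixed seed are assumptions for this example; the rows are shuffled first so that the test set is not just the tail of the file.

```python
import random


def train_test_split_rows(rows, test_fraction=0.25, seed=42):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


rows = list(range(100))  # stand-in for 100 records
train_rows, test_rows = train_test_split_rows(rows)
print(len(train_rows), len(test_rows))
```

In practice, scikit-learn's `train_test_split` function does the same job and also handles arrays and data frames.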


8. Train and Test the data


Feed the training set (75% of the data) to the model. After that, evaluate the trained model on the testing set.
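Putting training and testing together, a rough end-to-end sketch might look like this. It assumes scikit-learn is installed, uses a synthetic dataset from `make_classification` in place of real data, and picks logistic regression purely as an example of a pre-built model.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a cleaned dataset: 200 rows, 5 features, 2 classes.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# 75% of the rows for training, 25% held back for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Train on the training set only.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Evaluate on data the model has never seen.
accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.2f}")
```

The test set is only used for the final score, which is what makes that score a fair estimate of how the model behaves on new data.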

ALERT: The purpose is not to get 100% accuracy but to produce results that can be used for productive purposes.


JOIN THE GUIDING POINT COMMUNITY: https://t.me/internshipsforall


1 Comments:

At July 26, 2020 at 10:26 AM , Blogger Critic said...

This blog is written by a beginner! I’ll be very surprised if this was written by a person who has made more than 1 model. Cheap!

 
