Monday, July 27, 2020

MACHINE LEARNING INTERN HIRING CHALLENGE by DOCKSHIP.IO

ALERT:  Sales Forecasting and EDA Challenge by Dockship.io

This challenge involves the task of Weekly Sales Forecasting and Exploratory Data Analytics (EDA).


About the Challenge 


1. The first task is to provide an Exploratory Data Analysis (EDA) of the given data (as a .ipynb notebook and its corresponding .pdf).

  • A sample Jupyter notebook (Chicago Crime Dataset Sample EDA.ipynb and its '.pdf' form) has been given to provide a basic understanding of EDA; the Chicago Crime Dataset is used for the example. It has nothing to do with the actual training data and only serves as an example to help participants develop a good understanding of the task. The actual training data is given in a separate CSV file, "train.csv".
2. The second task is to design a model to predict sales for the next week based on previous data observations.
  • The 7 dates making up the prediction week will be chosen based on the latest 'Order Date' (see the sketch after this list).
  • Example: If the latest "Order Date" is 2018-06-20, the prediction dates start from 2018-06-21.
  • sample-output.csv has been given as an example of the final output file.
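
As a quick illustration of deriving the prediction dates, here is a minimal Pandas sketch (it assumes the 'Order Date' column of train.csv parses cleanly as dates; adjust the parsing options to the actual file):

    import pandas as pd

    # Load the training data supplied with the challenge.
    train = pd.read_csv("train.csv", parse_dates=["Order Date"])

    # The prediction week starts the day after the latest observed Order Date.
    last_date = train["Order Date"].max()
    prediction_dates = pd.date_range(start=last_date + pd.Timedelta(days=1),
                                     periods=7, freq="D")

    print(prediction_dates)  # e.g. 2018-06-21 through 2018-06-27 in the example above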

ALERT: DATASET DETAILS 

  • Superstore Sales Dataset is used for the challenge.
  • It consists of 18 attributes with "Sales" being the target attribute for prediction.

ALERT: JUDGEMENT RULES 

  1. Accuracy, measured using root mean squared error (RMSE); a reference computation is sketched below.
  2. Exploratory Data Analysis (EDA) Report.
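
For reference, RMSE can be computed in a couple of lines (a minimal sketch; the values of y_true and y_pred below are placeholders, not challenge data):

    import numpy as np
    from sklearn.metrics import mean_squared_error

    y_true = np.array([120.0, 95.5, 130.2])   # actual weekly sales (placeholders)
    y_pred = np.array([115.0, 100.0, 128.0])  # model predictions (placeholders)

    # RMSE is the square root of the mean squared error.
    rmse = np.sqrt(mean_squared_error(y_true, y_pred))
    print(rmse)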

RULES:
  1. Submission must not include copyrighted code. If a violation is found, the submission will be rejected.
  2. The submission should be in a proper format as described by "Submission Guidelines".
  3. Late submissions will not be accepted beyond the provided deadline (Indian Standard Time).



The final submission should include:
  1. Exploratory Data Analysis (EDA) notebook (.ipynb and corresponding .pdf) providing detailed analysis of the "train.csv" (Note: EDA is only required for training files and not the test files)
  2. Model Files (Python-based)
  3. requirements.txt (providing details of the modules required to run your submission)
  4. Forecasting Model Training Files (.ipynb)
  5. Code Execution script for prediction of weekly sales (only .py; run.py)
ALERT: PRIZES 


  • The candidates will be invited for an Internship Interview based on their performance.
  • Certificates from dockship.io will be provided after successful submission of a solution to this challenge.
For FAQs and any other queries, go to https://dockship.io/challenges
JOIN THE GUIDING POINT COMMUNITY: https://t.me/internshipsforall



Sunday, July 26, 2020

Solve any Machine Learning Problem with this Approach!

Problem Solving the Easy Way with Machine Learning: An Approach

The following are the steps to approach any machine learning, deep learning, or data science problem.

 ALERT: Solve any machine learning problem with this approach!



 1. Set up your Python environment



Well, you can use any language to work on machine learning and data science problems. But if you are using Python, then set up Google Colab or a Jupyter notebook before touching your data set.

For this purpose, use the conda or pip package installer.
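
For example, in a Colab or Jupyter cell (the package list below is illustrative; install whatever your project actually needs):

    # In a notebook cell, shell commands are prefixed with "!"
    !pip install numpy pandas matplotlib scikit-learn

    # Or, in a local conda environment:
    # conda install numpy pandas matplotlib scikit-learn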


2. Install the dependencies

Install the required packages and libraries like NumPy, SciPy, Pandas, TensorFlow, PyTorch, Keras, etc. The choice completely depends on your goal, that is, on what you actually want to do with the dataset.

For instance, if you want to study the data set with the use of graphs like line plots, 3D plots, histograms, or pie charts, then include the Matplotlib library. In the same way, each and every library and package is designed for a specific purpose.


Pandas - this library is used to analyze large data sets.

NumPy - to perform rudimentary numerical analysis.

Beautiful Soup - for web scraping and HTML parsing.

Scrapy - for data mining, web scraping, and web crawling; you can even build full web crawlers with Scrapy.
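
A typical first notebook cell simply imports whichever of these libraries the task needs, for example:

    import numpy as np               # rudimentary numerical analysis
    import pandas as pd              # analyzing large data sets
    import matplotlib.pyplot as plt  # plotting

    # Only needed for scraping tasks:
    # from bs4 import BeautifulSoup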


3. Explore the data set

ALERT: Make sure that your data set is in .csv format.



This step involves checking the number of rows and columns, printing the first 5 rows, finding the features, labels, or variables of the dataset, and finding which data types are present in your data set. Do this analysis with the help of the Pandas library (a minimal sketch follows). This is the most underrated task and yet undoubtedly the most important step in solving data science problems.
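
A minimal Pandas sketch of these checks (it assumes your data set lives in a file named data.csv):

    import pandas as pd

    df = pd.read_csv("data.csv")

    print(df.shape)       # number of rows and columns
    print(df.head())      # first 5 rows
    print(df.columns)     # features, labels, or variables
    print(df.dtypes)      # data type of each column
    print(df.describe())  # summary statistics for numeric columns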

ALERT: Quality data beats a fancier algorithm.

4. Data Cleaning and Processing
It is the most tedious task: ensuring that your data is correct, usable, readable, and consistent.


ALERT: Research has shown that data scientists spend 80% of their time on data cleaning and data processing.


Follow these steps for data cleaning (a minimal Pandas sketch follows the list):

  • Deal with missing values.
  • Remove duplicate entries.
  • Remove unwanted entries that will not contribute to the output; for example, in an analysis of a person's health, their mobile number plays no role.
  • Fill missing numerical data with median values.
  • Correct data types and invalid values; for instance, an age column can't have a negative value, and the number of days in a month can't exceed 31.
  • Remove whitespace and correct typos.
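
Here is that sketch; the 'phone' and 'age' columns are hypothetical examples used only for illustration:

    import pandas as pd

    df = pd.read_csv("data.csv")

    # Remove duplicate entries.
    df = df.drop_duplicates()

    # Remove unwanted columns that cannot contribute to the output.
    df = df.drop(columns=["phone"], errors="ignore")

    # Fill missing numerical data with the column median.
    num_cols = df.select_dtypes(include="number").columns
    df[num_cols] = df[num_cols].fillna(df[num_cols].median())

    # Drop rows with invalid values, e.g. negative ages.
    if "age" in df.columns:
        df = df[df["age"] >= 0]

    # Remove surrounding whitespace from text columns.
    text_cols = df.select_dtypes(include="object").columns
    df[text_cols] = df[text_cols].apply(lambda s: s.str.strip())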


5. Build a Machine Learning Model
Now that we are done with preprocessing, build a machine learning model or neural network. You can either use already-built models or develop your own ML model; a minimal sketch follows.
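
As a sketch, an already-built scikit-learn model is often a sensible starting point (the choice of a random forest here is illustrative, not prescribed):

    from sklearn.ensemble import RandomForestRegressor

    # An off-the-shelf model; start with sensible defaults and tune later.
    model = RandomForestRegressor(n_estimators=100, random_state=42)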


ALERT: We can even collect data using surveys. Fresh surveys can help you get the latest, updated data sets. But always remember that you will compromise the output if you use poor datasets or wrong information.


6. Feature Selection


Choose important labels or features to get better results. For example, to determine the health of a person, factors like income, eating habits, and social environment play an important role; selecting these features will help in building better models. One common approach is sketched below.
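
One common way to do this is scikit-learn's SelectKBest (a self-contained sketch on toy data; with a real data set, X and y would be your feature matrix and target):

    from sklearn.datasets import make_regression
    from sklearn.feature_selection import SelectKBest, f_regression

    # Toy data: 100 samples, 10 features, only 3 of them informative.
    X, y = make_regression(n_samples=100, n_features=10,
                           n_informative=3, random_state=0)

    # Keep the k features most strongly related to the target (k is your choice).
    selector = SelectKBest(score_func=f_regression, k=3)
    X_selected = selector.fit_transform(X, y)

    print(selector.get_support())  # boolean mask of the selected features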

7. Split the data


Split the data set; for convenience, allot 75% of the data for training and 25% of the data for testing (see the sketch below).
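
With scikit-learn this is a single call (continuing the toy X and y from the previous step):

    from sklearn.model_selection import train_test_split

    # 75% of the rows for training, 25% for testing.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=42
    )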


8. Train and Test the data


Feed the 75% training split to the model. After that, test it on the testing dataset, as sketched below.
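
Continuing the sketch from the previous steps (model, X_train, X_test, y_train, and y_test were defined above), training and evaluation look like this:

    import numpy as np
    from sklearn.metrics import mean_squared_error

    # Train on the 75% split.
    model.fit(X_train, y_train)

    # Evaluate on the unseen 25% split.
    y_pred = model.predict(X_test)
    rmse = np.sqrt(mean_squared_error(y_test, y_pred))
    print(f"Test RMSE: {rmse:.3f}")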

ALERT: The purpose is not to get 100% accuracy but to produce results that can be used for productive purposes.


JOIN THE GUIDING POINT COMMUNITY: https://t.me/internshipsforall
