Getting Started with Machine Learning

Hey!
This article walks through how you can build your first machine learning model. We will cover the 7 steps involved, from preparing the data to training the model.

Here we will train a machine learning model with libraries like pandas, seaborn, sklearn, and numpy to predict how many medals a country can win based on its past performance. Sounds exciting?
Let's get started.

So what are those 7 steps? Here is an overview ->

  1. Form a Hypothesis

    Our hypothesis is that we can predict how many medals a country is going to win at the Olympics based on its past performance.

  2. Find the data

    To do so, we need data for two things: first, training our model, and second, testing how accurately our model predicts the medals.

    Our data is in CSV format and contains columns like Country, Year, Athletes, Previous Medals, and Current Medals.

  3. Reshape the data

    Now that we have a data set, we need to reshape it so the ML model can make predictions. We are going to predict the final column, the current medals column, and we are using the athletes and previous medals columns to do that.

    But why only these two columns?

    Well, in our data these two columns show the strongest relationship with the number of medals a country wins.

    Factors like the age of the athletes do not play a major role, as we can see from our data.
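
One way to check which columns relate most strongly to the medal count is to look at their correlations with it. The snippet below is only a minimal sketch on a small made-up table: the column names (athletes, age, prev_medals, medals) and every value in it are assumptions for illustration, not the dataset used in this article.

```python
import pandas as pd

# Made-up rows for illustration only; not the real Olympic data.
teams = pd.DataFrame({
    "athletes":    [10, 50, 120, 300, 500, 20],
    "age":         [24.0, 26.5, 25.1, 24.3, 26.0, 25.2],
    "prev_medals": [0, 3, 12, 40, 95, 1],
    "medals":      [0, 4, 10, 45, 100, 1],
})

# Pearson correlation of every column with the target column "medals".
correlations = teams.corr()["medals"].drop("medals")
print(correlations.sort_values(ascending=False))

# Keep only the two predictors and the target.
selected = teams[["athletes", "prev_medals", "medals"]]
```

A correlation close to 1 or -1 suggests a strong linear relationship, while a value near 0 suggests the column is unlikely to help a linear model much.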

  4. Clean the Data

    Now that the data is reshaped, we have to clean it. Cleaning means making sure the data is ready to be used by our ML model, which here means making sure there are no null (missing) values, since most machine learning algorithms can't work with missing data.
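
Here is a small sketch of what that cleaning could look like with pandas. The table is made up for illustration (column names and values are assumptions), and dropping rows is just one simple way to handle missing values.

```python
import numpy as np
import pandas as pd

# Made-up rows for illustration only; one row has a missing value.
teams = pd.DataFrame({
    "athletes":    [10, 50, 120, 300],
    "prev_medals": [0.0, np.nan, 12.0, 40.0],
    "medals":      [0, 4, 10, 45],
})

# Count the missing values in each column before cleaning.
missing_per_column = teams.isnull().sum()
print(missing_per_column)

# Drop every row that contains at least one missing value.
clean = teams.dropna()
print(len(teams), "rows before,", len(clean), "rows after")
```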

  5. Error Metric

    We have to evaluate the performance of our ML model, so we need some kind of measure that tells us how close to or far from the correct answer our predictions are. The error metric we are using is called mean absolute error (trust me, it's simpler than it sounds).

    Error = abs(actual medals - predicted medals)

    We then add up all the error values and divide the sum by the total number of predictions we made.
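
To make the formula concrete, here is a minimal sketch that computes the mean absolute error by hand and then with scikit-learn's mean_absolute_error. The actual and predicted numbers are made up for illustration.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

# Made-up actual and predicted medal counts, for illustration only.
actual = np.array([0, 4, 10, 45])
predicted = np.array([1, 2, 13, 40])

# Absolute error of each prediction, then the average of those errors.
errors = np.abs(actual - predicted)
mae_by_hand = errors.sum() / len(errors)

# scikit-learn's built-in version of the same metric.
mae_sklearn = mean_absolute_error(actual, predicted)

print(mae_by_hand, mae_sklearn)
```

Both values should agree: the individual errors are 1, 2, 3, and 5, so the mean absolute error is 11 / 4 = 2.75 medals.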

  6. Split the data

    We want to train the model on one part of the data (around 80%) and test it on the remaining part (around 20%), so that the model is evaluated on data it has not seen during training.
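
One common way to get such a split is scikit-learn's train_test_split. This is a sketch on made-up rows (column names and values are assumptions), and a random split is only one option.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Made-up rows for illustration only.
teams = pd.DataFrame({
    "athletes":    [10, 50, 120, 300, 500, 20, 80, 200, 35, 150],
    "prev_medals": [0, 3, 12, 40, 95, 1, 5, 20, 2, 15],
    "medals":      [0, 4, 10, 45, 100, 1, 6, 22, 1, 17],
})

# Hold out 20% of the rows for testing; random_state makes the split repeatable.
train, test = train_test_split(teams, test_size=0.2, random_state=42)
print(train.shape, test.shape)
```

With 10 rows, this leaves 8 rows for training and 2 for testing.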

  7. Train the model

    Here we are using linear regression. The univariate (single-variable) form is y = ax + b, where y is the number of medals and x is the number of medals won at the last Olympics.

    A linear regression model fits a straight line through the data points on the graph, and we can use that line to make predictions.

    For our model we are actually going to use two predictors, y = a1x1 + a2x2 + b, as we are considering both the number of athletes and the previous medals won by the country.
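
The two-predictor model can be sketched with scikit-learn's LinearRegression as below. The data is made up for illustration and the column names (athletes, prev_medals, medals) are assumptions, so treat it as a rough outline rather than the exact code from the repository.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Made-up rows for illustration only.
teams = pd.DataFrame({
    "athletes":    [10, 50, 120, 300, 500, 20, 80, 200, 35, 150],
    "prev_medals": [0, 3, 12, 40, 95, 1, 5, 20, 2, 15],
    "medals":      [0, 4, 10, 45, 100, 1, 6, 22, 1, 17],
})
predictors = ["athletes", "prev_medals"]
target = "medals"

train, test = train_test_split(teams, test_size=0.2, random_state=42)

# Fit y = a1*x1 + a2*x2 + b on the training rows.
model = LinearRegression()
model.fit(train[predictors], train[target])
a1, a2 = model.coef_
b = model.intercept_

# Predict on the held-out rows and measure the mean absolute error.
predictions = model.predict(test[predictors])
mae = mean_absolute_error(test[target], predictions)

# The same predictions written out with the formula from the text.
manual = a1 * test["athletes"] + a2 * test["prev_medals"] + b
print("a1:", a1, "a2:", a2, "b:", b, "MAE:", mae)
```

Here a1 and a2 are the learned weights for the number of athletes and the previous medals, and b is the intercept.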

From our final output, we can see that our model is more accurate at predicting medals for countries that win many medals. For countries with a low medal count, however, our model is not very accurate.

I will also provide a link to the whole code file and the dataset used in this project.
Here is the GitHub link for the repository -> link

Hope it adds some value : )