Basic Statistics & Regression for Machine Learning in Python Coupon


In the coming sessions, we will have a brief introduction about polynomial linear regression and the visualization of the modified dataset with x and y values. Using python we will then find the polynomial regression co-efficient value, the r2 value and also we will do future value prediction using python numpy library.

Then we will repeat the same using the plain old manual calculation method. At first we have to manually find the Standard Deviation components. Then later we will substitute these SD components in the equation to find a, b and c values. using these a,b,c values we will then find the final polynomial regression equation. This equation will enable us to do a manual prediction for future values.

And after that, here comes the Multiple regression. Here in this regression we can consider multiple number of independent x variables and one independent y variable. We will have an introduction about this type of regression. We will make necessary changes to our dataset to match the multiple regression requirement.

Since our dataset is getting more complex by the introduction of multiple independent variable columns, it may not be able to be managed by using a plain array for the dataset. We will use a csv or comma separated values file to save the dataset. We will have an exercise to read data from a csv file and save the data in corresponding data-frames. Once we have the data imported to our python program, we will do a visualization using a new library called seaborn which is a derivative of the scikitlearn library.

Using the python numpy and scikitlearn library, multiple regression can be done very easily. Just use the method and pass in the required parameters. Rest will be done by the python library itself. We will find the regression object and then using that we can do prediction for future values.

But with manual calculation, things will start getting complex. Its a lengthy calculation which needs to be done in multiple steps. In the first step we will have an introduction about the equations that we are going to use in the manual method and also we will find the mean values. Then in the second step we will find the components that are required to find the a,b and c values. Then in the third step, we will find the a,b and c values. And in the final step, using a,b,c values we will find the multiple regression equation and using this equation we will do future value prediction of our dataset. We will also try to get the value of the co-efficient of regression.

That’s all about the popular regression methods that are included in the course. Now we can go ahead with a very important topic in data preparation for machine learning. Many machine learning algorithms love to have input values which are scaled to a standard range. We will learn a technique called data normalization or standardization in which all the different ranges of data values will be scaled down to fit within a range of 0 to 1. This will improve the performance of the algorithms very much compared to a non scaled dataset.

Recommended: Power BI course