Getting Started With Machine Learning: The 7-Step Process

As teenagers, most of us were itching to get behind the wheel and drive a car. Imagine being told that you first had to spend hours learning about the internal combustion engine, how a catalytic converter works, the difference between octane ratings for gas, or even how to fix a flat tyre. If that were the case, many people would never have taken up driving.

Some current machine learning courses are like that. Our belief at Machine Learning Academy is different: we first want the enthusiast to see the joy of machine learning at work. In our hands-on machine learning classes, we encourage students to use tools like scikit-learn to solve machine learning problems.

The completion rate for the first class is almost 100%. We then take the students through a seven-step process in which they immerse themselves in the art of machine learning.

#1 Preparing yourself

The toughest step, of course, is the initial push. To avoid nasty surprises or burnout midway, prepare yourself before diving into machine learning: know what you’re getting into, and understand what the learning process will expect from you and what you can expect from it:

  • This will involve charting out the steps and estimating the time and effort required from you. Be mentally prepared to put in focused time if you’re seeking worthwhile results.

  • Understand that machine learning is not as hard as it sounds, but tangible results will not be visible immediately. A logical, bottom-up approach is the key to dealing with the complexity of the subject.

#2 The Basics

Machine learning rests on two important pillars: programming and statistics. Concepts from linear algebra, numerical analysis and probability are used in most machine learning problems, so it is essential to have a solid understanding of these topics.

While most programming languages can be used for machine learning, two are widely used on account of their flexibility, Python and R:

  • Python has dedicated libraries for machine learning that come in handy for statistical and computation-heavy work. Its syntax and its compatibility with machine learning algorithms make it an easy choice of programming language for many.

#3 Exploring Python

With Python being the most popular choice of language for machine learning, you will need to understand it in depth to make progress. Its extensive library support and readable code make it a favourite among many coders. There are a handful of libraries you may want to play around with to understand machine learning better.

  • NumPy is Python’s open source library for numerical computing, often compared to Matlab. NumPy supports matrix manipulation, sorting, numerical differences and gradients, basic linear algebra, statistical operations and a lot more.

  • Pandas: Working with time series and tabular statistical data became far easier after Pandas came into the picture.

  • Matplotlib assists in plotting graphs in two dimensions, with basic support for three. It is built on top of NumPy.

  • Scikit-learn is Python’s general-purpose machine learning library. It provides a wide range of clustering, regression and classification algorithms, so a model can often be trained in a few lines of code.

  • NLTK, the Natural Language Toolkit, handles text (non-numerical) data and can parse, tokenize, tag and carry out many other operations on it.

  • TensorFlow: Created by Google, this machine learning framework is used to create deep learning models rapidly.
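To give a feel for the first two libraries in the list, here is a minimal sketch. The matrix, the vector and the small table of city temperatures are made up purely for illustration and do not come from any real dataset.

```python
import numpy as np
import pandas as pd

# A small 2x2 matrix and a vector, invented for this example.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])

# Basic linear algebra with NumPy: solve A @ x = b for x.
x = np.linalg.solve(A, b)

# Sorting and a simple statistic on an array.
values = np.array([4, 1, 3, 2])
sorted_values = np.sort(values)
mean_value = values.mean()

# A pandas table with hypothetical columns: group the rows by city
# and average the temperature within each group.
df = pd.DataFrame({
    "city": ["A", "A", "B", "B"],
    "temperature": [20.0, 22.0, 30.0, 28.0],
})
mean_by_city = df.groupby("city")["temperature"].mean()

print(x)
print(sorted_values, mean_value)
print(mean_by_city)
```

Running each statement in its own notebook cell is a good way to see what every intermediate value looks like.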

Your learning becomes a lot easier if you use the right environment. The Jupyter notebook, an interactive Python notebook, lets you interact with your code at every step. It also makes it easy to spot errors and inspect results.

Essential python packages. Image source: Stanford Research

#4 Understanding ML algorithms and models

While many algorithms are put into practice in machine learning, they are typically based on, or derived from, familiar concepts. Classification and regression are two concepts used regularly:

  • Assigning an observation to one of a set of categories is termed Classification.

  • Predicting a continuous numerical value for an observation is termed Regression.
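The difference between classification and regression can be sketched with scikit-learn. The toy data here (hours studied, pass/fail labels and exam scores) is invented for illustration, and the choice of `LogisticRegression` and `LinearRegression` is just one reasonable pairing, not the only one.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Made-up toy data: one feature (hours studied) for six students.
hours = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])

# Classification target: a category (0 = fail, 1 = pass).
passed = np.array([0, 0, 0, 1, 1, 1])

# Regression target: a continuous number (exam score),
# constructed here as exactly 10 * hours + 5.
score = 10.0 * hours.ravel() + 5.0

classifier = LogisticRegression().fit(hours, passed)
regressor = LinearRegression().fit(hours, score)

# Ask both models about a new, unseen student.
new_student = np.array([[5.5]])
predicted_class = classifier.predict(new_student)[0]   # a category
predicted_score = regressor.predict(new_student)[0]    # a number

print(predicted_class, predicted_score)
```

The classifier answers with one of the known categories, while the regressor answers with a number on a continuous scale.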

Another pair of important concepts is supervised and unsupervised learning:

  • Supervised learning trains a model on labelled data, where the correct answer for each example is known. Typically one part of the data is used for training and the remainder is held back for testing.

  • Unsupervised learning uses data with no labels. It is primarily used for pattern identification, such as grouping similar observations.
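A minimal sketch of both styles, assuming generated data rather than a real dataset: `make_blobs` produces two well-separated groups of points, a supervised model learns from the labels on a training split, and K-Means (an unsupervised method) looks for groups without seeing any labels. The cluster centres and split size are arbitrary choices for the example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic data: 200 points around two chosen centres (generated, not real).
X, y = make_blobs(
    n_samples=200,
    centers=[[-5.0, -5.0], [5.0, 5.0]],
    cluster_std=1.0,
    random_state=0,
)

# Supervised: fit on labelled training data, evaluate on held-out data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
classifier = LogisticRegression().fit(X_train, y_train)
test_accuracy = classifier.score(X_test, y_test)

# Unsupervised: the labels y are never shown to K-Means.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(X)

# Cluster numbers are arbitrary, so compare with y up to a swap of 0 and 1.
match = np.mean(cluster_ids == y)
agreement = max(match, 1.0 - match)

print(test_accuracy, agreement)
```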

Some widely used machine learning algorithms to start with are Random Forests, Decision Trees, Support Vector Machines, Logistic Regression and clustering algorithms such as K-Means.
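As a rough sketch of how several of these algorithms can be tried side by side, the snippet below runs four scikit-learn classifiers through 5-fold cross-validation. The Iris dataset bundled with scikit-learn is an assumption made for convenience, and the hyperparameters are mostly defaults rather than tuned values.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Small bundled dataset: 150 flowers, 4 measurements each, 3 species.
X, y = load_iris(return_X_y=True)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Support Vector Machine": SVC(),
}

# Mean accuracy over 5 cross-validation folds for each model.
mean_accuracy = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    mean_accuracy[name] = scores.mean()
    print(f"{name}: {mean_accuracy[name]:.3f}")
```

Because every scikit-learn estimator shares the same `fit` and `predict` interface, swapping one algorithm for another usually means changing a single line.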

#5 Picking datasets for practice

The theory is essential, of course, but you will not go far without some real hands-on experience. There are a lot of datasets openly available for practising machine learning problems. The University of California, Irvine, has a Machine Learning Repository. It maintains more than 400 datasets that are openly available to the public. For absolute beginners, there are a few datasets that are well suited to first-time users:

  • Wine quality dataset: In this dataset, the properties of red and white wine samples are provided. The objective is to model the wine quality based on the given tests.

  • Titanic dataset: This dataset contains information about passengers who were on board, including whether or not they survived. The objective is to find out what sort of people were more likely to survive.

  • Credit card default: The dataset consists of demographics, payment history, credit and default data. The objective is to predict whether a customer will default on their credit card payment.
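A first look at a practice dataset might go as follows. This sketch uses the small wine classification dataset that ships with scikit-learn (`load_wine`), which is a different dataset from the wine quality data described above; it is used here only because it loads without a download. Passing `as_frame=True` assumes pandas is installed.

```python
from sklearn.datasets import load_wine

# Bundled wine classification data (NOT the wine quality dataset).
wine = load_wine(as_frame=True)
features = wine.data     # pandas DataFrame of chemical measurements
target = wine.target     # pandas Series with the class of each sample

# Basic checks worth running on any new dataset.
n_rows, n_columns = features.shape
n_missing = int(features.isna().sum().sum())
class_counts = target.value_counts()

print(n_rows, n_columns, n_missing)
print(features.describe().loc[["mean", "min", "max"]])
print(class_counts)
```

The same checks (size, missing values, summary statistics, class balance) apply to any of the datasets listed above once they are loaded into a DataFrame.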

#6 Working on problem sets

Once you are familiar with the basics and theoretical aspects of machine learning, move on to working on real-life problems. Kaggle is a platform that brings together many aspiring and practising data scientists. It offers a wide variety of problem sets depending on your domain of interest, and lets you interact with experienced folks who can broaden your view of a problem or suggest a different approach.

#7 Building a career plan

Regularly solving problems will build your profile and confidence. With time, you can start contributing to open source machine learning projects such as TensorFlow and GoLearn. To build a career in machine learning, you’ll need to have some idea about the sectors you plan to forge ahead in. Machine learning is penetrating nearly all industries, so some specificity should prove useful in identifying your direction.