Supervised Machine Learning

Supervised learning


In this document, I will not go deep into the concepts and algorithms used in supervised learning. Instead, I will try to explain the approach in terms a novice can follow.

Supervised learning is one of the approaches one can use in machine learning. Some may say it is the easier approach compared to counterparts such as unsupervised learning. Supervised learning works on the principle of having training data where each instance has an input (a set of attributes) and a desired output (a target class). We then use this data to train a model that will predict the target class for new, unseen instances. In short, supervised learning occurs when the learning data contains the “right answers”.

There is a wide range of supervised learning algorithms, from the simple Naïve Bayes and K-nearest neighbors to more advanced classifiers such as Support Vector Machines (SVM). Some methods, such as decision trees, allow us to see how important each feature is for discriminating between target classes and give a human-readable interpretation of the decision process. Most, if not all, of them follow the same basic principle described above.
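To make the point about decision trees concrete, here is a minimal sketch, assuming scikit-learn is installed. The iris dataset and the `max_depth=3` setting are my own choices for illustration and are not part of the original discussion.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# Load a small, well-known dataset that ships with scikit-learn.
iris = load_iris()

# Fit a shallow tree so the decision process stays easy to inspect.
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(iris.data, iris.target)

# feature_importances_ holds one score per feature; the scores sum to 1.
importances = dict(zip(iris.feature_names, tree.feature_importances_))
for name, importance in sorted(importances.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {importance:.3f}")
```

A higher score means the tree relied more on that feature when separating the target classes.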

Regression is another family of methods, used to predict real-valued data. We could loosely think of regression as classification with an infinite number of target classes. For example, predicting a blood sugar level is a regression task, while predicting whether somebody has diabetes or not is a classification task.
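The difference can be sketched with two K-nearest neighbors models from scikit-learn, one for each task. The numbers below are made up purely for illustration (they are not real medical data), and the single input feature is a hypothetical stand-in for whatever attributes one would actually collect.

```python
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor

# Made-up toy data: one hypothetical input feature per person.
X = [[1.0], [2.0], [3.0], [7.0], [8.0], [9.0]]
blood_sugar = [85.0, 90.0, 95.0, 150.0, 160.0, 170.0]  # real-valued target
has_diabetes = [0, 0, 0, 1, 1, 1]                      # class target (0 = no, 1 = yes)

# Same inputs, two different kinds of target.
regressor = KNeighborsRegressor(n_neighbors=3).fit(X, blood_sugar)
classifier = KNeighborsClassifier(n_neighbors=3).fit(X, has_diabetes)

new_person = [[8.5]]
predicted_level = regressor.predict(new_person)[0]   # a number on a continuous scale
predicted_class = classifier.predict(new_person)[0]  # one of the known classes

print(predicted_level, predicted_class)
```

The regressor averages the targets of the three nearest neighbors, while the classifier takes a majority vote among their classes.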

Most of the methods mentioned above are included in the Python scikit-learn package, which makes them easy to implement.

Picking one of the algorithms mentioned above, I will try to explain as simply as possible the programming steps that take us from training data to predictions on new, unseen instances, or as many would call it, testing data.

Below are the steps we go through to carry out predictions:

  1. Import important packages
    Python has very good support for machine learning, and with its rich libraries one can harness its computational power. NumPy, pandas and scikit-learn are just a few, but they are the most important packages for a successful machine learning project.
  2. Read the training data file
    The pandas package is first put to work at this stage. We use it to read our training data, which is most often in a comma-separated values (.csv) file or spreadsheet format. The loaded data is organised in a form that is easy for the machine to read, and further modifications are done from here on.
  3. Replace all missing fields
    Missing values in fields can lead to poor performance by the algorithm. So these values are replaced with average values or, in extreme cases, the affected variables are dropped.
    This is called data munging and is done in pandas.
  4. Drop non-numeric fields
    Statistical methods work with numeric values and not string values, so removing the fields that hold non-numeric values avoids errors. I personally have never seen a mathematical formula that took a non-numeric value and produced a numeric result. These fields are excluded from the statistical analysis.
    Note: Even the expected classes are represented with numeric values.
  5. Define features (X) and labels (Y)
    The features or attributes are all the numeric fields except the class field. Each instance can be viewed as a point in a multi-dimensional space, which the algorithm uses to come up with the right classifications. This part of the data is the backbone of the analytics. Without it we would have no basis for classification.
    The labels or classes on the other hand are the desired output or target.
  6. Define train and test data
    Train data is a subset of the original sample, represented by the attributes selected and their respective target values. We always want our training data to be a representative sample of the population it represents. The test data is the held-out subset used to show how the algorithm behaves on instances it has not seen.
    To get train and test data from the dataset, we shuffle the data and split it. A single split can give an unusually good or bad score by chance. Cross-validation, which repeats the split over several folds, allows us to avoid this particular case, reducing result variance and producing a more realistic score for our models.
  7. Define classifier
    This is the stage where the machine learning algorithm is called. The scikit-learn package has a number of these algorithms that carry out both regression and classification. As mentioned above, these algorithms include K-nearest neighbors and SVM.
    With the classifier defined, we can fit the model to the train data to do the learning job. Score is to test data as fit is to train data: it measures how well the fitted model does on the test set. When evaluating with scikit-learn's cross-validation helpers, we can also select the measurement by passing a scorer as an argument.
  8. Make predictions
    This is the point where we bring real-world data to the algorithm to make predictions.
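
The eight steps above can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: the CSV content, the column names (`name`, `feature_a`, `feature_b`, `target`) and the choice of K-nearest neighbors with `n_neighbors=3` are all made up for the example, and a real project would read from an actual file instead of an in-memory string.

```python
# Step 1: import the packages.
import io

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Step 2: read the training data.
# A small made-up CSV stands in for a real file on disk.
CSV_TEXT = """name,feature_a,feature_b,target
a,1.0,1.2,0
b,1.1,,0
c,0.9,1.0,0
d,1.2,0.8,0
e,5.0,5.1,1
f,5.2,4.9,1
g,,5.0,1
h,4.8,5.3,1
"""
data = pd.read_csv(io.StringIO(CSV_TEXT))

# Step 3: replace missing values with the mean of their column.
data = data.fillna(data.mean(numeric_only=True))

# Step 4: drop non-numeric fields (here, the "name" column).
data = data.select_dtypes(include="number")

# Step 5: define features (X) and labels (y).
X = data.drop(columns="target")
y = data["target"]

# Step 6: shuffle and split into train and test data.
# stratify=y keeps both classes present in each subset.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, shuffle=True, stratify=y, random_state=0
)

# Step 7: define the classifier, fit it on train data, score it on test data.
classifier = KNeighborsClassifier(n_neighbors=3)
classifier.fit(X_train, y_train)
test_accuracy = classifier.score(X_test, y_test)

# Step 8: make predictions for new, unseen instances.
new_instances = pd.DataFrame({"feature_a": [1.0, 5.1], "feature_b": [1.1, 5.0]})
predictions = classifier.predict(new_instances)

print("test accuracy:", test_accuracy)
print("predictions:", list(predictions))
```

With only eight rows the test score means very little. The point of the sketch is the order of the steps, which stays the same on a real dataset.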

Evaluating our results

The final step in every supervised learning task should be to evaluate our best classifier on previously unseen data, to get an idea of its prediction performance. How right are the results of our classifier, one would ask? We calculate the accuracy simply as the proportion of times our method correctly predicted the class of a left-out instance.
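
The accuracy calculation itself is short enough to write by hand. The sketch below uses plain Python, and the function name `accuracy` and the example labels are my own. scikit-learn provides the same measure as `accuracy_score` in `sklearn.metrics`.

```python
def accuracy(true_labels, predicted_labels):
    """Return the proportion of predictions that match the true labels."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("both label lists must have the same length")
    if len(true_labels) == 0:
        raise ValueError("cannot compute accuracy on an empty list")
    # Count the positions where the prediction equals the true class.
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


# Made-up example: three of the four predictions are correct.
example_accuracy = accuracy([1, 0, 1, 1], [1, 0, 0, 1])
print(example_accuracy)
```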


Machine Learning at Rainbow

With the brief preamble above on supervised machine learning, identifying anomalies in our system and components that might be subject to failure is made easier.

The opportunities for machine learning in payments are almost limitless. Some examples are listed below.

  • Transaction risk management. Use supervised machine learning algorithms to identify the risk of payment transactions.
  • Merchant risk analysis. Use machine learning to assess acquirer risk in signing a new merchant or managing the risk of ongoing merchant relationships.
  • Optimizing the user’s web experience. Recommending sites for customer check-out.
  • Customer classification. Use supervised machine learning to group customers based on a set of customer characteristics.

