Research Snappy

An explanation of machine learning models even you could understand

researchsnappy by researchsnappy
May 13, 2020
in Healthcare Research

Supervised machine learning models

Logistic Regression

Logistic regression is used when you have a classification problem. This means that your target variable (a.k.a. the variable you are interested in predicting) is made up of categories. These categories could be yes/no, or something like a customer satisfaction rating from 1 to 10.

The logistic regression model fits an S-shaped curve to your data. The curve gives the estimated probability of each outcome, and the model uses it to predict the outcome of a new observation.

Illustration of Logistic Regression

In the graphic above, the new observation would get a prediction of 0 because it falls on the left side of the curve. If you look at the data this curve is based on, it makes sense because, in the “predict a value of 0” region of the graph, the majority of the data points have a y-value of 0.
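The prediction step can be sketched in a few lines of Python. This is a minimal sketch, assuming a one-variable model whose coefficients have already been fitted. The `intercept` and `slope` values are made up for illustration, and the 0.5 cutoff is a common default rather than something the article specifies.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_probability(x, intercept, slope):
    """Estimated probability that the observation belongs to class 1."""
    return sigmoid(intercept + slope * x)

def predict_class(x, intercept, slope, threshold=0.5):
    """Turn the probability into a 0/1 prediction using a cutoff."""
    return 1 if predict_probability(x, intercept, slope) >= threshold else 0

# Hypothetical coefficients: the curve crosses 0.5 at x = 5.
intercept, slope = -5.0, 1.0

left = predict_class(2.0, intercept, slope)   # observation left of the crossover
right = predict_class(8.0, intercept, slope)  # observation right of the crossover
print(left, right)
```

With these assumed coefficients, an observation to the left of the crossover point gets a prediction of 0 and one to the right gets a prediction of 1.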

Linear Regression

Linear regression is often one of the first machine learning models that people learn. This is because its algorithm (i.e. the equation behind the scenes) is relatively easy to understand when using just one x-variable: it is just fitting a best-fit line, a concept usually introduced in school math classes. This best-fit line is then used to make predictions about new data points (see illustration).

Illustration of Linear Regression

Linear regression is similar to logistic regression, but it is used when your target variable is continuous, which means it can take on essentially any numerical value. In fact, any model with a continuous target variable can be categorized as “regression.” An example of a continuous variable would be the selling price of a house.

Linear regression is also very interpretable. The model equation contains a coefficient for each variable, and each coefficient indicates how much the target variable changes for a one-unit increase in that independent variable (the x-variable), with the other variables held fixed. With the house prices example, this means that you could look at your regression equation and say something like “oh, this tells me that for every 1 ft² increase in house size (the x-variable), the selling price (the target variable) increases by $25.”
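Here is a minimal sketch of fitting a best-fit line with one x-variable, using the ordinary least squares formulas in plain Python. The house sizes and prices are made-up numbers chosen to reproduce the $25 per ft² figure from the example, and the function name `fit_line` is only illustrative.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one x-variable: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The best-fit line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up data: house size in ft² and selling price in dollars.
sizes = [1000, 1500, 2000, 2500]
prices = [125000, 137500, 150000, 162500]

slope, intercept = fit_line(sizes, prices)

# Use the fitted line to predict the price of a new 1800 ft² house.
predicted = intercept + slope * 1800
print(slope, intercept, predicted)
```

The fitted `slope` is the coefficient you would read off the regression equation: the change in predicted price for each extra square foot.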

K Nearest Neighbors (KNN)

This model can be used for either classification or regression! The name “K Nearest Neighbors” is not intended to be confusing. The model first plots out all of the data. The “K” part of the title refers to the number of closest neighboring data points that the model looks at to determine what the prediction value should be (see illustration below). You, as the future data scientist, get to choose K and you can play around with the values to see which one gives the best predictions.

Illustration of K Nearest Neighbors

All of the data points that are in the K=__ circle get a “vote” on what the target variable value should be for this new data point. Whichever value receives the most votes is the value that KNN predicts for the new data point. In the illustration above, 2 of the nearest neighbors are class 1, while 1 of the neighbors is class 2. Thus, the model would predict class 1 for this data point. If the model is trying to predict a numerical value instead of a category, then all of the “votes” are numerical values that are averaged to get a prediction.
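The voting and averaging steps can be sketched as follows. This is a minimal sketch, assuming straight-line (Euclidean) distance and tiny made-up datasets. The function names are illustrative, and ties between classes are not handled in any special way.

```python
import math
from collections import Counter

def nearest_neighbors(points, new_point, k):
    """Return the k (features, target) pairs closest to new_point."""
    return sorted(points, key=lambda p: math.dist(p[0], new_point))[:k]

def knn_classify(points, new_point, k):
    """Each of the k nearest neighbors votes for its own class."""
    neighbors = nearest_neighbors(points, new_point, k)
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

def knn_regress(points, new_point, k):
    """Average the numerical targets of the k nearest neighbors."""
    neighbors = nearest_neighbors(points, new_point, k)
    return sum(value for _, value in neighbors) / k

# Made-up classification data: (features, class label).
labeled = [((1.0, 1.0), 1), ((1.5, 2.0), 1), ((3.0, 2.5), 2),
           ((6.0, 6.0), 2), ((7.0, 5.0), 2)]
predicted_class = knn_classify(labeled, (2.0, 2.0), k=3)

# Made-up regression data: (features, numerical target).
numeric = [((1.0, 1.0), 10.0), ((1.5, 2.0), 20.0),
           ((3.0, 2.5), 30.0), ((6.0, 6.0), 100.0)]
predicted_value = knn_regress(numeric, (2.0, 2.0), k=3)
print(predicted_class, predicted_value)
```

In the classification data, the three nearest neighbors of the new point are two points of class 1 and one point of class 2, so class 1 wins the vote, as in the illustration.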

Support Vector Machines (SVMs)

Support Vector Machines work by establishing a boundary between data points, where the majority of one class falls on one side of the boundary (a line in the 2D case) and the majority of the other class falls on the other side.

Illustration of Support Vector Machines

The machine looks for the boundary with the largest margin. The margin is defined as the distance between the boundary and the nearest point of each class (see illustration). New data points are then plotted and assigned to a class depending on which side of the boundary they fall on.
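To make the margin idea concrete, here is a small sketch in Python. It does not train an SVM. It only compares two hand-picked candidate lines (both hypothetical) on made-up 2D points, measures each line's margin, and classifies new points by which side of the line they fall on. A real SVM would search over all possible boundaries rather than two.

```python
import math

def signed_distance(point, w, b):
    """Signed distance from a 2D point to the line w[0]*x + w[1]*y + b = 0."""
    norm = math.hypot(*w)
    return (w[0] * point[0] + w[1] * point[1] + b) / norm

def margin(points, w, b):
    """Distance from the boundary to the closest data point."""
    return min(abs(signed_distance(p, w, b)) for p in points)

def classify(point, w, b):
    """Assign class +1 or -1 depending on the side of the boundary."""
    return 1 if signed_distance(point, w, b) >= 0 else -1

# Made-up data: two points per class, well separated.
points = [(1.0, 1.0), (2.0, 1.0), (4.0, 4.0), (5.0, 4.0)]

# Two hypothetical boundaries that both separate the classes.
candidate_a = ((1.0, 1.0), -5.5)  # the line x + y = 5.5
candidate_b = ((1.0, 1.0), -4.0)  # the line x + y = 4

margin_a = margin(points, *candidate_a)
margin_b = margin(points, *candidate_b)

# Prefer the candidate with the larger margin, then classify new points.
best_w, best_b = candidate_a if margin_a > margin_b else candidate_b
print(margin_a, margin_b, classify((4.5, 3.0), best_w, best_b))
```

Of the two candidates, the line x + y = 5.5 sits midway between the closest points of each class, so it has the larger margin and is the one this sketch keeps.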

My explanation of this model is for the classification case, but you can also use SVMs for regression!

Decision trees & random forests

I already explained these in a previous article — check it out here (decision trees and random forests are near the end).

Unsupervised machine learning models


© 2025 researchsnappy.com
