Research Snappy

Anscombe’s Quartet And Importance of Data Visualization

by researchsnappy
January 10, 2021
in Healthcare Research

Sparsh Gupta (@imsparsh)

Technologist. Programmer. Musician. Explorer.

Anscombe’s quartet comprises four data sets that have nearly identical simple descriptive statistics, yet have very different distributions and appear very different when graphed.
— Wikipedia

Anscombe’s Quartet is a group of four datasets that are nearly identical in their simple descriptive statistics but contain peculiarities that fool a regression model fitted to them. They have very different distributions and look very different when drawn as scatter plots.

It was constructed in 1973 by the statistician Francis Anscombe to illustrate the importance of plotting data before analyzing it and building models, and the effect of outliers and other influential observations on statistical properties.

All four datasets share nearly the same summary statistics, including the mean and variance of the x and y values.

This shows the importance of visualising data before applying algorithms to build models from it. Plotting the features reveals the distribution of the samples and helps identify anomalies and structure in the data, such as outliers, the diversity of the data, and its linear separability.

Also, simple linear regression is only appropriate for data with a roughly linear relationship and cannot capture other kinds of structure, such as curvature.
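One way to see this limitation is to look at the residuals of a straight-line fit. The sketch below is only an illustration: it uses Anscombe's published values for the second dataset (the article itself does not list them), and `fit_line` is a helper name chosen here, not part of any library.

```python
from statistics import mean

# Anscombe's second dataset (published values), which follows a curve.
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope


intercept, slope = fit_line(x, y)

# Residual = observed y minus the line's prediction, listed in order of x.
residuals = {a: b - (intercept + slope * a) for a, b in sorted(zip(x, y))}
for a, r in residuals.items():
    print(f"x={a:>2}  residual={r:+.2f}")
```

The residuals are negative at both ends of the x range and positive in the middle. That systematic arch, rather than random scatter around zero, is a sign that a straight line is the wrong shape for this data.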

Each of the four datasets contains eleven (x, y) points, and the first three share the same x values.

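The summary statistics of the four datasets are nearly identical: in each, x has mean 9 and sample variance 11, y has mean about 7.50 and sample variance about 4.12, the correlation between x and y is about 0.816, and the least-squares line is approximately y = 3.00 + 0.500x.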
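A minimal sketch of how these statistics can be computed with Python's standard library. The data are Anscombe's published values, which the article does not list. The names `QUARTET`, `pearson_r` and `summarize` are chosen here for illustration.

```python
from statistics import mean, variance

# Anscombe's published values; datasets I-III share the same x.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson_r(x, y):
    """Pearson correlation coefficient between x and y."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def summarize(x, y):
    """Return (mean x, sample variance x, mean y, sample variance y, r)."""
    return mean(x), variance(x), mean(y), variance(y), pearson_r(x, y)


summary = {name: summarize(x, y) for name, (x, y) in QUARTET.items()}
for name, (mx, vx, my, vy, r) in summary.items():
    print(f"{name:>3}: mean_x={mx:.2f} var_x={vx:.2f} "
          f"mean_y={my:.2f} var_y={vy:.2f} r={r:.3f}")
```

The four printed rows agree to about two decimal places, even though the underlying points are arranged very differently.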

When the datasets are drawn as scatter plots, however, each one produces a visibly different shape. A regression algorithm sees none of this: it is fooled by these peculiarities and fits essentially the same line to all four.
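The differences can be reproduced even without a plotting library. The sketch below draws a rough text-mode scatter plot of each dataset. It is only a dependency-free stand-in for a proper figure (a library such as matplotlib would normally be used), `text_scatter` is a hypothetical helper written for this example, and the data are Anscombe's published values.

```python
# Anscombe's published values; datasets I-III share the same x.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def text_scatter(x, y, width=32, height=10, x_range=(3, 20), y_range=(2, 14)):
    """Render (x, y) points as rows of text, with '*' marking each point."""
    grid = [[" "] * width for _ in range(height)]
    (x_lo, x_hi), (y_lo, y_hi) = x_range, y_range
    for a, b in zip(x, y):
        col = round((a - x_lo) / (x_hi - x_lo) * (width - 1))
        row = round((b - y_lo) / (y_hi - y_lo) * (height - 1))
        grid[height - 1 - row][col] = "*"  # row 0 is the top of the plot
    return ["|" + "".join(r) for r in grid] + ["+" + "-" * width]


for name, (x, y) in QUARTET.items():
    print(f"Dataset {name}")
    print("\n".join(text_scatter(x, y)))
    print()
```

All four plots use the same axis ranges, so the shapes can be compared directly. The default ranges were picked by hand to contain every point in the quartet and would need adjusting for other data.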

The four datasets can be described as:

  • Dataset 1: a roughly linear relationship with some scatter, which the linear regression model fits well.
  • Dataset 2: a clear but non-linear (curved) relationship, which the linear regression model fits poorly.
  • Dataset 3: an almost perfectly linear relationship with a single outlier that pulls the fitted line away from the remaining points.
  • Dataset 4: every point but one has the same x value, so that single high-leverage point alone determines the slope of the fitted line.
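The influence of a single point in datasets 3 and 4 can be checked by refitting the line with that point left out. The sketch below is a rough illustration using Anscombe's published values. The helpers `fit_line` and `without` are names introduced here, and the way each unusual point is picked (largest y in dataset 3, largest x in dataset 4) is an assumption that happens to suit this data.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares; returns (intercept, slope, r_squared)."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    syy = sum((b - my) ** 2 for b in y)
    slope = sxy / sxx
    return my - slope * mx, slope, sxy ** 2 / (sxx * syy)


def without(x, y, index):
    """Copy of the data with the point at `index` dropped."""
    return x[:index] + x[index + 1:], y[:index] + y[index + 1:]


# Anscombe's published values for datasets 3 and 4.
x3 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y3 = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
y4 = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]

full3 = fit_line(x3, y3)
trimmed3 = fit_line(*without(x3, y3, y3.index(max(y3))))  # drop the outlier
print("dataset 3, all points:      slope=%.3f  r2=%.3f" % full3[1:])
print("dataset 3, outlier removed: slope=%.3f  r2=%.3f" % trimmed3[1:])

full4 = fit_line(x4, y4)
print("dataset 4, all points:      slope=%.3f  r2=%.3f" % full4[1:])
try:
    fit_line(*without(x4, y4, x4.index(max(x4))))  # drop the point at x = 19
    trimmed4 = "fit succeeded"
except ValueError as err:
    trimmed4 = str(err)
print("dataset 4, point removed:  ", trimmed4)
```

With the outlier removed, dataset 3 becomes an almost exact straight line with a noticeably shallower slope. Dataset 4 has no slope at all once its lone distant point is gone, because every remaining x value is the same.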

Conclusion

We have described four datasets that were deliberately constructed to show the importance of data visualisation and how easily a regression algorithm can be fooled by datasets that merely share the same summary statistics. Hence, the important features of a dataset should be visualised before any machine learning algorithm is applied to them, which helps in building a well-fitting model.

Thanks for reading. You can find my other Machine Learning related posts here.

I hope this post has been useful. I appreciate feedback and constructive criticism. If you want to talk about this article or other related topics, you can drop me a text here or at LinkedIn.

Previously published under a paywall.

© 2025 researchsnappy.com
