Types of data:
There are two types of data.
- Qualitative data
- Quantitative data
Qualitative data (categorical data):
- Nominal data: Nominal attributes are qualitative attributes whose values have no natural order.
- Ordinal data: Ordinal attributes are qualitative attributes whose values have a natural order.
Quantitative data:
- Discrete data: Discrete attributes are quantitative attributes that can take only separate, countable numerical values (typically integers).
- Continuous data: Continuous attributes are quantitative attributes that can take any value in a range, including fractional values (real numbers).
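The four kinds of attribute can be illustrated with a small sketch. The shirt records, field names, and the size ordering below are all made up for illustration; they are assumptions, not data from these notes.

```python
# Hypothetical shirt records; every field name and value here is made up.
shirts = [
    {"color": "red", "size": "M", "buttons": 7, "price": 499.50},
    {"color": "blue", "size": "S", "buttons": 6, "price": 349.00},
    {"color": "green", "size": "XL", "buttons": 7, "price": 525.75},
]

# Nominal: colours can be listed, but no ordering of them is meaningful.
colors = {shirt["color"] for shirt in shirts}

# Ordinal: sizes have a natural order, which we state explicitly.
SIZE_ORDER = ["S", "M", "L", "XL"]
sizes_in_order = sorted((shirt["size"] for shirt in shirts), key=SIZE_ORDER.index)

# Discrete: button counts are whole numbers.
all_discrete = all(isinstance(shirt["buttons"], int) for shirt in shirts)

# Continuous: prices may take fractional values.
all_continuous = all(isinstance(shirt["price"], float) for shirt in shirts)

print(sizes_in_order)
```

Sorting is only meaningful for the ordinal attribute (and the two quantitative ones); sorting the colours alphabetically would say nothing about the data itself.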
How to describe Qualitative data?
- How many red shirts are in the catalogue?
- How many times does LWB appear?
- How many kharif crops are there in the data?
Here, in the frequency plot, the horizontal axis shows the categorical attribute and its frequency is mapped along the Y axis. The height of each bar is proportional to the count.
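Questions like these come down to counting how often each category occurs. A minimal sketch using the standard library follows; the list of shirt colours is made up for illustration.

```python
from collections import Counter

# Made-up catalogue colours, for illustration only.
shirt_colors = ["red", "blue", "red", "green", "blue", "red"]

# Frequency of each category.
color_counts = Counter(shirt_colors)
print(color_counts["red"])  # how many red shirts are in the catalogue

# A text version of the frequency plot: one bar per category,
# with bar length proportional to the count.
for color, count in color_counts.most_common():
    print(f"{color:<6} {'#' * count}")
```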
Frequency plots - Long tailed distribution:
As shown in the above graph, a long tailed distribution has:
- A small number of tall bars at the beginning.
- A large number of short bars at the end.
- It is very common in real world scenarios.
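One way to check for a long tail numerically is to sort the counts and compare the share held by the first few categories with the number of rare categories. The sketch below uses a made-up word list; the choice of "top two" and "appears once" as cut-offs is an arbitrary assumption.

```python
from collections import Counter

# Made-up data: a few very common items followed by many rare ones.
words = (["the"] * 50 + ["of"] * 30 + ["data"] * 5
         + ["plot", "bin", "stem", "leaf", "axis", "tail", "bar"])

counts = Counter(words)

# Counts sorted from most to least frequent (the order of the bars).
sorted_counts = [count for _, count in counts.most_common()]
total = sum(sorted_counts)

# Share of all observations covered by the two most frequent categories.
head_share = sum(sorted_counts[:2]) / total

# Number of categories that occur exactly once (the tail).
rare_categories = sum(1 for count in sorted_counts if count == 1)

print(f"top 2 of {len(sorted_counts)} categories cover {head_share:.0%} of the data")
print(f"{rare_categories} categories appear only once")
```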
Frequency plots - Uniform distribution:
Relative frequency tables:
Grouped frequency plots:
Until now, we had one data set and were plotting the frequencies for that set. Suppose I want to compare two data sets. If I have farming data for two or three consecutive years and I want to find out whether the farming pattern has changed over the years, I can use grouped frequency plots, as shown below.
How to describe Quantitative data?
- What would the histogram of Sachin's scores look like?
- Where would the tallest bar be?
- Would there be some regions along the x-axis with a bar height of 0?
Surprisingly, the tall bars are at the beginning, indicating that he has made many low scores. There are blank regions along the x-axis.
- Too many unique values along the x-axis.
- How many times was he dismissed in his 90s or in single digits? It is difficult to answer such questions from this plot, and we are rarely interested in how many times he scored exactly 40.
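Grouping the scores into bins addresses both problems: there are far fewer bars, and range questions map directly to a bin. The following is a minimal sketch; the scores are made up (not real career data) and the bin width of 10 is an assumption chosen so that "single digits" and "90s" each fall into one bin.

```python
from collections import Counter

# Made-up innings scores, for illustration only.
scores = [0, 4, 7, 12, 18, 23, 35, 41, 48, 52, 67, 91, 94, 99, 104, 138]

BIN_WIDTH = 10  # assumed bin size


def bin_start(score, width=BIN_WIDTH):
    """Return the lower edge of the bin that contains the score."""
    return (score // width) * width


# Count how many scores fall into each bin.
bin_counts = Counter(bin_start(score) for score in scores)

# Text histogram; bins with no scores show up as empty rows.
for start in range(0, max(scores) + BIN_WIDTH, BIN_WIDTH):
    end = start + BIN_WIDTH - 1
    print(f"{start:>3}-{end:<3} {'#' * bin_counts[start]}")

print("single-digit scores:", bin_counts[0])
print("scores in the 90s:", bin_counts[90])
```

A wider bin gives a smoother but coarser picture, and a narrower bin brings back the problem of too many bars, so the width is a trade-off rather than a fixed rule.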
How to choose the right bin size?
Relative frequency histograms:
Frequency Polygons:
- We have to check a separate chart for each dataset, which is cumbersome.
- We can overlap the charts, but it is hard to distinguish between the individual histograms.
- We can draw grouped bar charts, but we lose the overall trend.
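A frequency polygon replaces each bar with a point at the midpoint of its bin and joins the points with a line, so several data sets can share one chart. The sketch below only computes the points to be joined; the two score lists, the bin width of 20, and the function name `polygon_points` are assumptions made for illustration.

```python
from collections import Counter


def polygon_points(values, width, upper):
    """Return (bin midpoint, count) pairs for bins covering [0, upper)."""
    counts = Counter((value // width) * width for value in values)
    return [(start + width / 2, counts[start]) for start in range(0, upper, width)]


# Made-up scores for two players, for illustration only.
player_a = [5, 12, 18, 25, 33, 47, 52]
player_b = [2, 8, 9, 14, 41, 44, 58]

# Both data sets use the same bins so the lines are comparable.
points_a = polygon_points(player_a, width=20, upper=60)
points_b = polygon_points(player_b, width=20, upper=60)

for (midpoint, count_a), (_, count_b) in zip(points_a, points_b):
    print(f"midpoint {midpoint:>4}: A={count_a} B={count_b}")
```

Plotting each list of points as a line on the same axes gives one polygon per data set, which is easier to tell apart than overlapping bars.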
Frequency polygons for continuous data:
Stem and leaf plots:
- These are similar to histograms but retain more detail in the bars: the first digit of each value is the stem and the remaining digits are the leaves.
- As the numbers get larger, we take two or three digits as the stem. This is much like choosing the bin size.
- They are efficient for small data sets, where the patterns are easy to see, but are not useful if the data set is large.
- They are good for comparing two datasets side by side.
- They can be used for continuous data by rounding the numbers.
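The points above can be sketched in a few lines. This version assumes non-negative values, rounds continuous values to whole numbers, and uses the last digit as the leaf and everything before it as the stem; the function name `stem_and_leaf` and the sample values are made up for illustration.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Group rounded, non-negative values by stem (all digits but the last)."""
    plot = defaultdict(list)
    for value in sorted(round(v) for v in values):
        plot[value // 10].append(value % 10)  # stem -> list of leaves
    return dict(plot)


# Made-up values, including two continuous ones that get rounded.
values = [12, 15.2, 21, 27, 27, 34.6, 8]
plot = stem_and_leaf(values)

# One row per stem; stems with no values appear as empty rows.
for stem in range(min(plot), max(plot) + 1):
    leaves = " ".join(str(leaf) for leaf in plot.get(stem, []))
    print(f"{stem:>2} | {leaves}")
```

Because every leaf is kept, the individual values can still be read off each row, which a histogram bar does not allow.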














