This interactive simulation demonstrates why we use the midpoint of each group when estimating the mean from grouped data.
When data is grouped, we don't know the exact values within each group. Using the midpoint assumes that data points are evenly distributed within each group.
This is a reasonable assumption because:
Try different distributions and group sizes to see how the accuracy changes!
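For readers without access to the simulation, the same experiment can be approximated in a few lines of Python. This is only a sketch under stated assumptions: the helper name `grouped_mean`, the two sample distributions, and the group widths are illustrative choices, not taken from the simulation itself. It bins each data set into equal-width groups, estimates the mean from midpoints and frequencies, and prints how far the estimate is from the true mean.

```python
import random
from collections import Counter

def grouped_mean(values, width):
    """Estimate the mean after binning values into groups of equal width."""
    # Group index k covers the interval [k*width, (k+1)*width).
    counts = Counter(int(v // width) for v in values)
    total = sum(counts.values())
    # Each group contributes frequency * midpoint to the estimated total.
    return sum(f * (k + 0.5) * width for k, f in counts.items()) / total

random.seed(0)  # fixed seed so repeated runs give the same data

# Hypothetical data sets: one evenly spread, one skewed towards small values.
samples = {
    "uniform": [random.uniform(0, 100) for _ in range(1000)],
    "skewed": [random.expovariate(1 / 20) for _ in range(1000)],
}

for name, values in samples.items():
    true_mean = sum(values) / len(values)
    for width in (5, 10, 25):
        estimate = grouped_mean(values, width)
        error = abs(estimate - true_mean)
        print(f"{name:8s} width={width:2d} true={true_mean:.2f} "
              f"estimate={estimate:.2f} error={error:.2f}")
```

Every value lies within half a group width of its group's midpoint, so the error of the estimate can never exceed half the group width, whatever the distribution.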
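When we have grouped data, we estimate the mean using:
$$\bar{x} \approx \frac{\sum f \times m}{\sum f}$$
Where:
$f$ is the frequency of a group (the number of data points in it)
$m$ is the midpoint of that group, i.e. (lower boundary + upper boundary) / 2
$\sum$ means "add up over all groups"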
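This works because, when the values in a group are spread evenly (or symmetrically) across it, their average is exactly the group's midpoint, so midpoint × frequency is a good estimate of the group's total. The estimate becomes less accurate when values cluster towards one end of a group, but no value can be further than half a group width from its midpoint, which keeps the overall estimation error small for narrow groups.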
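As a worked illustration, the calculation can be sketched in Python: multiply each group's midpoint by its frequency, add the results, and divide by the total frequency. The frequency table and the function name `estimate_mean` below are made up for the example.

```python
# Hypothetical frequency table: ((lower boundary, upper boundary), frequency)
groups = [((0, 10), 3), ((10, 20), 5), ((20, 30), 2)]

def estimate_mean(groups):
    """Estimate the mean of grouped data from midpoints and frequencies."""
    total_frequency = sum(f for _, f in groups)
    # The midpoint of each group is (lower + upper) / 2.
    weighted_sum = sum(f * (lo + hi) / 2 for (lo, hi), f in groups)
    return weighted_sum / total_frequency

print(estimate_mean(groups))  # 14.0
```

Here the midpoints are 5, 15 and 25, so the estimated total is 3 × 5 + 5 × 15 + 2 × 25 = 140, and dividing by the 10 data points gives 14.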