A small sample can reveal a surprising amount about the population it was drawn from. The article explains how to use a sample to estimate the total size of a population. Its example is estimating the total number of lottery tickets, which in turn gives the probability of winning.
The author describes attending a state fair and buying some lottery tickets. The numbers printed on the tickets are assumed to be consecutive, so a randomly bought ticket follows a discrete uniform distribution. The numbers on the purchased tickets are then used to infer the total number of tickets available. The highest number among the purchased tickets gives a lower bound on that total, but it does not account for tickets with higher numbers that may exist.
The article introduces a second method of estimation: calculating the intervals, or “gaps”, between the purchased ticket numbers. The average gap is expected to be approximately equal to the gap between the highest purchased number and the true maximum. The article gives the formula that results from this method and compares the outcomes of the different estimation methods in a simulation.
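The summary does not reproduce the article's formula, so the following is only a sketch of one common form of the gap-based estimate. It assumes tickets are numbered from 1 and that no number is bought twice. With k tickets whose highest number is m, there are m - k unsold numbers below m, so the average gap is (m - k) / k, and adding that to m gives an estimate of m + m/k - 1. The function name `gap_estimate` and the ticket numbers are hypothetical.

```python
def gap_estimate(tickets):
    """Estimate the highest ticket number from a sample of ticket numbers.

    Assumes numbering starts at 1 and the sample has no duplicates.
    """
    k = len(tickets)   # number of tickets bought
    m = max(tickets)   # highest number seen (the lower bound on the total)
    # Average gap below the maximum is (m - k) / k; add it on top of m.
    return m + m / k - 1


# Hypothetical purchase of four tickets.
print(gap_estimate([19, 40, 42, 60]))  # 74.0
```

Here the lower bound is 60, and the average gap of 14 pushes the estimate up to 74.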
In the simulation, both estimation approaches averaged out to the true highest number (2000 in this case). The method based on average gaps showed less variance, however, and therefore gave a more precise estimate. The estimated total can then be used to estimate the likelihood of winning the lottery.
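A simulation along these lines can be sketched as follows. The summary does not say exactly which estimators the article simulated, so this sketch assumes two that are both unbiased: the gap-based estimate m + m/k - 1, and twice the sample mean minus one. The sample size of 10, the trial count, and all function names are assumptions chosen for illustration; only the true maximum of 2000 comes from the text.

```python
import random
import statistics


def gap_estimate(tickets):
    # Highest number seen plus the average gap below it (numbering from 1).
    return max(tickets) + max(tickets) / len(tickets) - 1


def mean_estimate(tickets):
    # For numbers uniform on 1..N the mean is (N + 1) / 2, so N = 2 * mean - 1.
    return 2 * statistics.mean(tickets) - 1


def simulate(n_total=2000, k=10, trials=5000, seed=0):
    """Return {name: (average estimate, standard deviation)} over many draws."""
    rng = random.Random(seed)
    results = {"gap": [], "mean": []}
    for _ in range(trials):
        # Buy k distinct tickets at random from 1..n_total.
        sample = rng.sample(range(1, n_total + 1), k)
        results["gap"].append(gap_estimate(sample))
        results["mean"].append(mean_estimate(sample))
    return {
        name: (statistics.mean(values), statistics.stdev(values))
        for name, values in results.items()
    }


if __name__ == "__main__":
    for name, (avg, sd) in simulate().items():
        print(f"{name}: average {avg:.0f}, standard deviation {sd:.0f}")
```

Under these assumptions both averages should land close to 2000, with the gap-based estimate showing the smaller spread. If a single winning ticket is drawn from the whole pool (also an assumption), the chance of winning with k tickets is roughly k divided by the estimated total.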
This statistical approach has real-world applications beyond lottery odds. The best known is the German tank problem from World War II, in which the Allies estimated German tank production from serial numbers. Other examples include estimating the number of products made (from serial numbers), the number of customers (from customer IDs), or the size of a university's student population (from student ID numbers). However, the article cautions that these estimates rely on the samples being random and independent.
In conclusion, while both estimation methods have the same long-run expected value, the one based on average gaps between ticket numbers has lower variance and is therefore more precise. For aspiring lottery winners, this could provide a somewhat better understanding of their chances, although the author jokingly suggests investing in cotton candy for a guaranteed win. The article ends by encouraging readers to consider other potential use cases for these statistical estimation methods.