Talk:Soldier Name Stats

Even one data set having no duplicates would be strong evidence of deduplication, but twenty?

For a batch of 100, the theoretical probability that no name is duplicated, when the names are chosen purely at random, is the product

$$\prod_{k=1100}^{1199} \frac{k}{1200}$$

Numerically (putting this into a spreadsheet), this comes out to ~0.013154. That is good to five significant digits even if the spreadsheet used typical single-precision C floats: those carry about 7 significant digits, and we lose roughly two of them to the 199 multiplications and divisions performed.
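For anyone who wants to check the spreadsheet figure, here is a minimal Python sketch. It evaluates the product exactly as written above (k = 1100..1199 over 1200) in double precision. The function and variable names are my own, not from the game or the article. Two assumptions are labeled in the code: that the pool holds 1200 names (read off the denominator), in which case the usual birthday-problem product for 100 draws would run over k = 1101..1200 and be larger by a factor of 1200/1100; and that the twenty data sets are independent, so the chance of all twenty being duplicate-free is the single-batch figure raised to the 20th power.

```python
import math

POOL = 1200  # denominator of the product above; assumed to be the name pool size


def product_over_pool(first_k, last_k, pool=POOL):
    """Return the product of k/pool for k = first_k..last_k inclusive."""
    return math.prod(k / pool for k in range(first_k, last_k + 1))


# The product exactly as written in the comment above.
as_written = product_over_pool(1100, 1199)

# Assumption: with a pool of 1200 names, the usual birthday-problem
# product for 100 draws runs over k = 1101..1200 instead.
textbook = product_over_pool(1101, 1200)

# Assumption: the twenty data sets are independent batches of 100.
twenty = as_written ** 20

print(f"as written (k=1100..1199): {as_written:.6f}")
print(f"textbook   (k=1101..1200): {textbook:.6f}")
print(f"twenty clean batches:      {twenty:.3e}")
```

Either way the single-batch probability is between 1% and 2%, so twenty clean batches by chance is out of the question.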

This does complicate testing whether the six name categories are being selected in a way compatible with a good random number generator. The deduplication means the Χ² test is not directly applicable.

--- Zaimoni 6:59 July 22, 2006 (CDT)