Talk:Soldier Name Stats
Even one data set having no duplicates would be strong evidence of deduplication, but twenty?
For a batch of 100 names chosen purely at random, the theoretical probability that no name is duplicated within the batch is the product
<math>\prod_{k=1100}^{1199} \frac{k}{1200}</math>
Numerically (evaluated in a spreadsheet), this comes out to ~0.013154. That figure is good to five significant digits even if the spreadsheet used typical C floats: those carry about 7 significant digits, and the 199 multiplications and divisions involved should cost no more than about two of them.
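For anyone who wants to check the figure without a spreadsheet, here is a minimal Python sketch. It takes the product bounds (1100 to 1199) and the denominator (1200) exactly as written above; the function names and the single-precision emulation via `struct` are my own illustrative choices, not anything the game or the original spreadsheet uses. It evaluates the product exactly with rationals, repeats it with every intermediate rounded to a 32-bit float, and then raises the single-batch probability to the 20th power for twenty independent batches.

```python
from fractions import Fraction
import struct


def to_float32(x):
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def no_duplicate_probability(first=1100, last=1199, denom=1200):
    """Exact product of k/denom for k = first..last, as a Fraction."""
    p = Fraction(1)
    for k in range(first, last + 1):
        p *= Fraction(k, denom)
    return p


def no_duplicate_probability_float32(first=1100, last=1199, denom=1200):
    """Same product, rounding to single precision after every operation."""
    p = to_float32(1.0)
    for k in range(first, last + 1):
        term = to_float32(k / denom)   # one division per term
        p = to_float32(p * term)       # one multiplication per term
    return p


p = float(no_duplicate_probability())
p32 = no_duplicate_probability_float32()
rel_err = abs(p32 - p) / p             # accuracy lost to single precision
p_twenty = p ** 20                     # twenty independent duplicate-free batches

print(f"exact product:        {p:.6f}")
print(f"single precision:     {p32:.6f} (relative error {rel_err:.1e})")
print(f"twenty clean batches: {p_twenty:.1e}")
```

The exact value agrees with the spreadsheet figure to the digits quoted, and the probability for twenty duplicate-free batches in a row is on the order of 10^-38, which is why chance alone is not a credible explanation.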
This does complicate testing whether the six name categories are being selected in a way compatible with a good random number generator. The deduplication means the Χ² test is not directly applicable.
--- Zaimoni 6:59 July 22, 2006 (CDT)