Talk:Soldier Name Stats

From UFOpaedia
Revision as of 12:00, 22 July 2006 by Zaimoni (talk | contribs)

Even one data set having no duplicates would be strong evidence of deduplication, but twenty?

For a batch of 100, the theoretical probability that no name is duplicated within the batch, if the names are chosen purely at random from 1200 possibilities, is the product

$$\prod_{k=1101}^{1200} \frac{k}{1200}$$

Numerically (putting this into a spreadsheet), this comes out to ~0.014350. We have five significant digits even if the spreadsheet used typical C floats. [We lose two compared to the base type because of the 199 multiplications and divisions done, but typical C floats have 7 significant digits.]
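
For anyone who wants to check the product without a spreadsheet, here is a minimal sketch in exact rational arithmetic. It assumes a pool of 1200 equally likely names (read off the denominator) and batches of 100; `prob_no_duplicates`, `POOL` and `BATCH` are names made up for this illustration. The index range matters: with k running over 1101..1200 the product is about 0.014350, while shifting the range to 1100..1199 multiplies it by 1100/1200 and gives about 0.013154. The last line assumes, purely for illustration, twenty independent batches of 100 each.

```python
from fractions import Fraction


def prob_no_duplicates(pool_size: int, batch_size: int) -> Fraction:
    """Probability that batch_size uniform draws from pool_size names are all distinct."""
    p = Fraction(1)
    for i in range(batch_size):
        # The (i+1)-th draw must avoid the i names already taken.
        p *= Fraction(pool_size - i, pool_size)
    return p


POOL = 1200   # assumed pool size, taken from the denominator of the product
BATCH = 100   # batch size stated in the comment

p = prob_no_duplicates(POOL, BATCH)
# Same product with the index range shifted down by one (k = 1100..1199).
shifted = p * Fraction(POOL - BATCH, POOL)

print(f"k = 1101..1200: {float(p):.6f}")
print(f"k = 1100..1199: {float(shifted):.6f}")
# Assumption for illustration: twenty independent batches of 100 each.
print(f"twenty duplicate-free batches in a row: {float(p) ** 20:.3e}")
```

Either way the per-batch probability is between 1% and 2%, so twenty duplicate-free batches in a row would be astronomically unlikely without deduplication.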

This does complicate testing whether the six name categories are being selected in a way compatible with a good random number generator: with deduplication the draws are no longer independent, so the χ² test is not directly applicable.
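
One way around this would be to simulate the null distribution instead of relying on the textbook χ² table. The sketch below is only an illustration under made-up assumptions: six categories of 200 names each (the real category sizes are not given here), batches of 100, and deduplication modelled as plain sampling without replacement. `simulate_mean_stat` and `chi_square_stat` are hypothetical helper names. It compares the average Pearson statistic with and without deduplication.

```python
import random


def chi_square_stat(counts, expected):
    """Pearson statistic for observed counts against a common expected count."""
    return sum((c - expected) ** 2 / expected for c in counts)


def simulate_mean_stat(trials, dedup, rng):
    """Average Pearson statistic over many simulated batches."""
    categories, per_category, batch = 6, 200, 100  # hypothetical sizes
    pool = categories * per_category
    expected = batch / categories
    total = 0.0
    for _ in range(trials):
        if dedup:
            # Deduplicated batch: no name can appear twice.
            draws = rng.sample(range(pool), batch)
        else:
            # Independent draws: repeats are allowed.
            draws = [rng.randrange(pool) for _ in range(batch)]
        counts = [0] * categories
        for name in draws:
            counts[name // per_category] += 1
        total += chi_square_stat(counts, expected)
    return total / trials


rng = random.Random(2006)  # fixed seed so the run is repeatable
mean_dedup = simulate_mean_stat(4000, True, rng)
mean_plain = simulate_mean_stat(4000, False, rng)
print(f"mean statistic with deduplication:    {mean_dedup:.2f}")
print(f"mean statistic without deduplication: {mean_plain:.2f}")
```

Under these assumptions the deduplicated mean should sit near 5 × 1100/1199 ≈ 4.59 rather than the 5 expected for five degrees of freedom, so the shift is modest but systematic. Comparing an observed statistic against the simulated distribution sidesteps the problem.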

--- Zaimoni 6:59 July 22, 2006 (CDT)