From: Devorah
Subject: The uniqueness requirement
Date: 5 Sivan 5783
Subject: The clustering problem
I have run some simulations.
I generated synthetic blocks based on demographic data: denominational affiliations, geographic distributions, age cohorts. I assumed people would describe themselves in ways that correlate with their actual communities.
If blocks cluster around common patterns, the birthday paradox applies more aggressively than R. Nachmani's calculation suggests.
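A small sketch of why clustering makes the birthday effect bite harder. When draws come from a non-uniform distribution with probabilities p_i, the collision probability is approximately 1 - exp(-C(n,2) · Σp_i²). The "effective space" is therefore 1/Σp_i², not the raw number of possible blocks. The toy distribution below (10,000 possible blocks, 80% of the mass on 100 of them) is hypothetical and is not the data from my simulations. The function names are illustrative only.

```python
import math

def collision_probability(n: int, probs: list[float]) -> float:
    """Approximate P(at least one collision) among n independent draws.

    Uses 1 - exp(-C(n, 2) * sum(p_i^2)), which reduces to the usual
    birthday formula when every p_i equals 1/N.
    """
    pairs = n * (n - 1) / 2.0
    return -math.expm1(-pairs * sum(p * p for p in probs))

def effective_space(probs: list[float]) -> float:
    """Number of equally likely values that would give the same collision rate."""
    return 1.0 / sum(p * p for p in probs)

# Hypothetical toy space of 10,000 possible blocks, all equally likely.
uniform = [1 / 10_000] * 10_000
# Same space, but 80% of the mass falls on 100 "common patterns".
skewed = [0.8 / 100] * 100 + [0.2 / 9_900] * 9_900

print(round(effective_space(uniform)))  # 10000
print(round(effective_space(skewed)))   # 156
print(f"{collision_probability(50, uniform):.2f}")  # 0.12
print(f"{collision_probability(50, skewed):.2f}")   # 1.00
```

The raw size of the space is identical in both cases. Only the concentration of mass changes, and that alone shrinks the effective space by a factor of about 64.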
If blocks were spread uniformly over a theoretical space of 10^20, collisions among 17 million Jews would be very unlikely. But my simulation shows that 80% of submissions cluster into approximately 1000 common patterns. The "Orthodox Ashkenazi who keeps kosher and observes Shabbat" is one pattern. The "secular Israeli with cultural identification" is another. The effective space collapses to these clusters plus the sparse tails.
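For reference, here is the uniform case worked out with the standard birthday approximation 1 - exp(-n(n-1)/2N). It assumes every one of the 10^20 blocks is equally likely, which is exactly the assumption the clustering breaks.

```python
import math

def collision_probability(n: int, space: float) -> float:
    """Birthday approximation for n uniform draws from `space` equally likely values."""
    # expm1 keeps precision when the exponent is tiny.
    return -math.expm1(-n * (n - 1) / (2.0 * space))

# 17 million submissions spread uniformly over 10^20 possible blocks.
print(f"{collision_probability(17_000_000, 1e20):.2e}")  # 1.44e-06
```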
After approximately 40,000 submissions to a common cluster, a collision within that cluster becomes more likely than not. After 100,000, one is very likely. The first million Jews to submit blocks will exhaust the most common patterns.
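I have not stated how many distinct blocks fit inside one cluster. If the 40,000 figure is read as the point where a collision within a cluster reaches roughly a 50% chance (an assumption), the birthday formula can be inverted to give the implied per-cluster space and then used to check the 100,000 figure. `implied_space` is a hypothetical helper for this back-of-envelope check, not part of any system.

```python
import math

def collision_probability(n: int, space: float) -> float:
    """Birthday approximation for n uniform draws from `space` equally likely values."""
    return -math.expm1(-n * (n - 1) / (2.0 * space))

def implied_space(n: int, p: float) -> float:
    """Space size at which n uniform draws collide with probability p.

    Solves 1 - exp(-n(n-1) / (2N)) = p for N.
    """
    return n * (n - 1) / (2.0 * -math.log1p(-p))

# Assumption: 40,000 submissions corresponds to a 50% collision chance.
cluster_space = implied_space(40_000, 0.5)
print(f"{cluster_space:.3g}")  # 1.15e+09
print(f"{collision_probability(100_000, cluster_space):.2f}")  # 0.99
```

Under that reading, each common cluster behaves like roughly a billion equally likely blocks, and 100,000 submissions push the collision probability close to 1.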
This means: for typical Jewish profiles, the system will fill up faster than the raw numbers suggest. The combinatorial space is large in theory. In practice, we will run out of common patterns within a few years of launch.