From: Devorah
Subject: The Geek Code discovery
Date: 23 Cheshvan 5781
Subject: Parsing concerns
I have spent several days with the Geek Code specification. I downloaded archived copies, found examples in old Usenet posts, and compared what people actually wrote against what was specified. I want to be clear about what I found.
The format is not parseable in any standard sense. Here is why:
1. Codes have variable length. d is one character. PS is two. PGP is three. When you see PS+ in a block, is that P followed by S+, or is it PS followed by +? The answer depends on knowing the vocabulary in advance.
2. Modifiers can appear before or after codes. Sometimes both. !d means something different from d!. The meaning of @ depends on position.
3. Whitespace is significant in some places, insignificant in others. A space usually separates codes, but not always. Line breaks are sometimes meaningful.
4. The grammar is not specified formally. There is no BNF, no regex, no automaton. I tried to write one. I could not construct a grammar that accepts all valid blocks and rejects all invalid ones, because there is no clear definition of valid.
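To make point 1 concrete, here is a minimal Python sketch of a longest-match tokenizer. It is an illustration under stated assumptions, not a parser for the real format: the function name tokenize, the two small vocabularies, and the restriction to trailing + and - modifiers are all made up for this example. It shows only that the same characters split differently depending on which vocabulary the reader brings.

```python
def tokenize(chunk, vocabulary):
    """Split one whitespace-free chunk into (code, modifiers) pairs.

    Hypothetical helper: it uses greedy longest-match against the given
    vocabulary and treats only trailing '+' and '-' as modifiers.
    Prefix modifiers such as '!' or '@' are deliberately not handled.
    """
    # Try longer codes first so that "PS" wins over "P" when both are known.
    codes = sorted(vocabulary, key=len, reverse=True)
    tokens = []
    i = 0
    while i < len(chunk):
        match = next((c for c in codes if chunk.startswith(c, i)), None)
        if match is None:
            raise ValueError(f"unknown code at position {i}: {chunk[i:]!r}")
        i += len(match)
        # Collect any run of '+' / '-' that follows the code.
        j = i
        while j < len(chunk) and chunk[j] in "+-":
            j += 1
        tokens.append((match, chunk[i:j]))
        i = j
    return tokens


# Illustrative vocabularies only; neither is the full Geek Code vocabulary.
print(tokenize("PS+", {"d", "P", "PS", "PGP"}))  # [('PS', '+')]
print(tokenize("PS+", {"d", "P", "S"}))          # [('P', ''), ('S', '+')]
```

Both results are internally consistent. Nothing in the input says which one was intended; only the vocabulary decides.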
A strict parser following the specification would reject most real-world Geek Code blocks. The specification describes an ideal. Actual usage diverged.
If we adopt this model, we are adopting the divergence too.