Suggestion for Attributes: Shannon & Brute Force Thresholds Revisited #9
Ah, I did this work a few years ago, so this is a bit of a test for me. Very cool to have someone interested in this stuff!
Agree. I think you're suggesting the attributes table include a value called "minimum average word length". Let's assume 4.7 bits of entropy per character for a moment (putting Shannon aside for now). I think the meaning of this value would be saying to the user, "Hey, whenever you generate a passphrase using this list, make sure the average length of the words in your passphrase is above X". Do I have that right? If so, I object to that metric for a few reasons.

First of all, for the EFF long list, this new "minimum average word length" would be 12.925/4.7, or 2.75. Since the shortest word on the list is 3 characters, there's no possible way to generate a passphrase below that minimum average length.

But secondly, I have intended Tidy to be a tool for making word lists that are good and safe, with wide room for user error. By this I mean I do not want to pass caveats on to the user. "Ensure your passphrase's average word length is above X" would be one of these caveats. Basically, I want to encourage list-makers to make lists that are as "fool-proof" as possible. (Making it easy to remove prefix or suffix words is another example of this: that way, the passphrase generator need not require separators between words.) My method for preventing these caveats is to take conservative approaches.

Let's look at the metrics that Tidy currently displays. Here they are for the EFF long list:
The key lines for our discussion are the last three. I'll use W as word list length (7,776 in the case of the EFF long list). "Assumed entropy per char(acter)" is log2(W) divided by the length of the shortest word on the list (here's the code). For the EFF list, it's log2(7776)/3, which is 4.308. I use the shortest word on the list to be conservative: users could get a passphrase completely made up of words that are of the shortest length.

Once we've calculated "assumed entropy per character", we compare it to two constants to make the metric more human-understandable to users. If "assumed entropy per character" <= 4.7, Tidy reports that the list is "above" the "brute force line". For the EFF list, 4.308 is indeed less than 4.7, so we're safely "above" the brute force line (reported in the attribute table as "Above...": "true"). If it's less than 2.62, the list also clears the "Shannon line". Since 4.308 is greater than 2.62, the EFF list does not clear this line.

In general, I want Tidy to encourage the creation of lists that are as safe as possible, with as few caveats as possible. I hope that makes sense in connection to your comment.
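To make that concrete, here's a rough sketch of the logic in Rust (my own illustration, not Tidy's actual source; the EFF-long-list numbers are the ones quoted above):

```rust
// Sketch of the comparison described above (not Tidy's actual source).
// Example values are for the EFF long list: 7,776 words, shortest word 3 letters.
fn main() {
    let list_length: f64 = 7776.0;
    let shortest_word_length: f64 = 3.0;

    let entropy_per_word = list_length.log2(); // ≈ 12.925 bits
    let assumed_entropy_per_character = entropy_per_word / shortest_word_length; // ≈ 4.308 bits

    let above_brute_force_line = assumed_entropy_per_character <= 4.7; // true for the EFF list
    let clears_shannon_line = assumed_entropy_per_character <= 2.62; // false for the EFF list

    println!("assumed entropy per character: {:.3}", assumed_entropy_per_character);
    println!("above brute force line: {}", above_brute_force_line);
    println!("clears Shannon line: {}", clears_shannon_line);
}
```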
Thank you for your thoughtful response. Your reasoning makes sense, and it's easy enough to compute the "minimum required average word length" with just a calculator, if one is interested in this metric. There is a benefit to delving into a more in-depth analysis when one wants to optimize the trade-off between passphrase entropy and length (for example, for a 7776-word list to clear the "Shannon line" using your conservative approach, the minimum word length would have to be 5, so a user will have to type a large number of characters to enter their passphrase; the more nuanced analysis would allow shorter words on the list, as long as the words in a generated passphrase have an average length of at least 4.9).
Yep, I gotcha. The "lines" are admittedly strict/unforgiving in the current set-up... maybe too strict to be useful? Haha. I also haven't gotten much feedback from tidy users who are not me, so this is a good discussion. I guess we could do something like "Chance that a 6-word passphrase is below the 'Shannon line'", expressed as a percentage? For the EFF list, we'd count the number of words with fewer than 5 characters and go from there?
Yes, something like this is what I meant when I wrote "It may be possible to compute or estimate the overall entropy reduction caused by rejecting passphrases that are too short", although I was thinking of some kind of metric in terms of entropy (e.g., perhaps an effective entropy per word, taking into account passphrases that would be rejected for being too short). However, I believe that whether this is expressed as a probability (percentage) or effective entropy, the result will be dependent on the number of words assumed for the passphrase. Thus, it may not be possible to calculate a universal metric for this, unless you are open to using a 6-word passphrase as a representative benchmark.
I have a feeling that the calculation will not be trivial. My initial thought is that it may involve a 6-dimensional joint probability distribution...
Ah, maybe you're right. If you want to work out some kind of formula, I might be able to implement it for Tidy in Rust. Maybe we could create a sort of general 0 to 100 "score"?
I'll think about how to do this. But the bottom line is that the "score" (whether expressed as a 0-100 score, or in terms of some entropy metric) will be different depending on the assumed number of words in the passphrase. So unless you are OK with presenting the results in a tabular form (e.g., for word counts in the range 4-8), we would have to see whether a conservative (worst-case) metric can provide meaningful/actionable information, or else, abandon the idea.
Not exactly on-topic, but I did add two formulas and two tables to the bottom of the readme that might be of interest here (and would welcome a double-check of my math).

Maximum word list lengths to clear the Brute Force Line

Formula: Where S is the length of the shortest word on the list, 26 is the number of letters in the English alphabet, and M is max list length: 2^(S * log2(26)) = M (or in Python:
Maximum word list lengths to clear the Shannon Line

Formula: Where S is the length of the shortest word on the list and M is max list length: 2^(S * 2.62) = M (or in Python:
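As a rough Rust sketch of both formulas (my own illustration, evaluated for a few values of S):

```rust
// Sketch of the two formulas above, evaluated for a few shortest-word lengths S.
fn main() {
    for s in 2..=5u32 {
        let brute_force_max = 2_f64.powf(s as f64 * 26_f64.log2()); // equivalent to 26^S
        let shannon_max = 2_f64.powf(s as f64 * 2.62);
        println!(
            "S = {}: brute force max ≈ {:.0}, Shannon max ≈ {:.0}",
            s, brute_force_max, shannon_max
        );
    }
}
```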
Before creating this table, I hadn't realized just how "strict" the Shannon line is. Makes me wonder if it's so strict as to not be helpful.
For the first table, your max list length values are only accurate to 2-3 significant digits, as a result of rounding errors in taking the logarithm. You can make the max list length values more accurate by taking advantage of the fact that S·log2(26) = log2(26^S) and that for any base B, by definition of the logarithm, we get B^(log_B(X)) = X. As a result, we get M = 26^S, which can be exactly evaluated without rounding errors; thus, your table entries should be 676, 17576, 456976, and 11881376.

The Shannon entropy was apparently computed based on an empirically estimated "entropy per word" of 11.82 bits, and dividing that by 4.5 letters/word, which Shannon calls "the average word length in English" without documenting any source for this claim. Now, 11.82/4.5 = 2.6266666..., so rounded to 3 significant digits, it would actually be more appropriate to call it 2.63 than 2.62. However, since the claimed average word length is only given to 2 significant digits, the ratio can also be known only to 2 significant digits — i.e., 2.6 bits/letter. A more detailed uncertainty propagation analysis based on assumed Type B uncertainties of ±0.005 in the 11.82 value and ±0.05 in the 4.5 value indicates that the confidence interval for the ratio corresponds to the range 2.60-2.66 (i.e., 2.63±0.03 bits/letter). Thus, the corresponding values of the maximum list length should not be given to a precision greater than 2 or 3 digits.

Footnote:
AMAZING! I've updated the brute force table in the readme. Fixing the actual brute force computation in the code itself is a bit trickier, thanks to some rounding tomfoolery in Rust:

```rust
2.0_f64.powf(26_f64.log2()) // evaluates to 25.999999999999993, rather than 26
26_f64.log2() // evaluates to 4.700439718141092; I think it should be 4.700439718141093
```

(Code playground, for those interested.) So, unless you object, I'm just going to hard-code the decimal in the test:
Fascinating work! Going to change the Shannon test to
Again, super impressed you went this far here. Hope it's just a scanning issue? Let me know if I can help. The PDF I linked to was just a first search result. I should read more of those papers myself at some point.
Ha! Thank you for the kind words, but I was really just trying to figure out if I could get a little better precision on the "2.62" value. Either there's something screwy with Shannon's numbers (not impossible -- this is what an electronic calculator looked like in 1951!), or I have completely misunderstood the methods used in his paper.
I think we may have to either look for a different analysis in the literature, or come up with our own. It seems to me that Shannon's value is derived entirely from data on frequency of words in grammatically correct written text (sentences, paragraphs, etc.). What we need instead is some kind of analysis based on the frequency of letters appearing in those words. Perhaps the act of dividing the "entropy per word" by the average word length does give an appropriate value, but that seems like mathemagic to me, so I would like to understand the reasoning behind the analysis before I trust it. In addition, it is not yet clear to me that Shannon's use of the word entropy is identical to our own use of this term. Thus, I would personally take the Shannon entropy value with a grain of salt until I have better understood the underpinnings of his methods.
Can you explain why the limited precision of Rust's floating point calculation causes a problem in the code? Not sure I understand why there is a need to calculate 2^(log2(26)) in the code, or why an error on the order of 10^-15 will cause issues.
I was confused as well until just now -- have resolved the rounding issue. Think the culprit was this function:

```rust
pub fn calc_entropy_per_word(list_size: usize) -> f64 {
    (list_size as f64).ln() / (2_f64.ln() as f64)
}
```

If I change it to

```rust
pub fn calc_entropy_per_word(list_size: usize) -> f64 {
    (list_size as f64).log2()
}
```

the rounding issue I found resolves itself. Here's a Rust code playground that shows the original issue:
I think you can simplify the evaluation of
by first calculating a value G that represents the number of letter guesses required, given an assumed entropy per letter. For the brute force case, G = 26 (randomly drawn letters from a 26-letter alphabet). If W is the length of the word list, and M is the number of letters in the shortest word, then your boolean test above becomes equivalent to checking whether W ≤ G^M. For the brute force case, this should require no floating-point arithmetic, and therefore would be expected to yield a more precise result.
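As a sketch of the idea (not a patch against Tidy's actual code):

```rust
// Integer-only check described above: W is the word list length,
// M the length of the shortest word, G the guesses per letter.
fn above_brute_force_line(w: u128, m: u32) -> bool {
    let g: u128 = 26; // naive character-level attack over a 26-letter alphabet
    w <= g.pow(m)
}

fn main() {
    // EFF long list: 7,776 words, shortest word 3 letters; 26^3 = 17,576
    println!("{}", above_brute_force_line(7776, 3)); // true
}
```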
Very smart. Got it working and haven't gotten any surprises yet.
Just want to confirm you like

NOTE: Using your new method of calculating the Shannon line disrupts the table in the readme, as it's currently calculated. Now, a list with a shortest-word length of 3 can be 226 words and still be above the Shannon line. Does that sound right?
For Shannon line max length:
Yes. And, yes, because of imprecision due to rounding, you will get slightly different results if you use 2.6 vs. the equivalent of log2(6.1) (≈2.6088). Since Shannon's reported value is not precise, if you want a max list length cut-off to the nearest word, you'll have to decide whether you consider 2.6 or log2(6.1) (or even something else, like 2.63, or 11.82/4.5) to be the canonical representation of Shannon's limit.
Glad we finally got to "We're stuck cuz we can't tag Claude Shannon"! I guess I'll leave the Shannon calculations using

Related: I just ordered two information theory textbooks: Elements of Information Theory 2nd Edition by Cover and Thomas; and Information and Coding Theory by Jones and Jones. The former gets better reviews, it seems. Maybe they'll have some clues for us, both on these entropy calculations and on uniquely decodable stuff.
I played around a little with the entropy definition. If a corpus contains N words that appear with equal probability, then the Shannon entropy per word is log2(N), which kind of makes sense. But Shannon's calculation of entropy per letter takes into account only the average word length in the corpus, and completely ignores the relative frequency of individual letters in the corpus — to me, this suggests that Shannon's notion of "entropy per letter" is not suited to characterizing the vulnerability of word lists to letter-based brute-force attacks. For example, consider the following corpus:
It contains 4 words of frequency 2 (probability 1/7), and 6 words of frequency 1 (probability 1/14). From this, the Shannon entropy per word is -4·(1/7)·log2(1/7) - 6·(1/14)·log2(1/14) = 3.236 bits/word, and the average word length in the corpus is 3.571 letters/word. Thus, we would get an entropy per letter of 3.236/3.571 = 0.9062 bits/letter. Note that this calculation makes no use of the knowledge that the corpus contains only 17 distinct letters, and that the relative probabilities of these letters range from 41% (

Look forward to hearing about what you learn in your reading.
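P.S. In case it's useful, here's the entropy-per-word arithmetic above as a small Rust sketch (my own check; only the word frequencies are needed, not the corpus itself):

```rust
// Shannon entropy per word, -sum(p * log2(p)), for a corpus where 4 words
// occur twice and 6 words occur once (14 word tokens total).
fn shannon_entropy_per_word(frequencies: &[u32]) -> f64 {
    let total: u32 = frequencies.iter().sum();
    frequencies
        .iter()
        .map(|&f| {
            let p = f as f64 / total as f64;
            -p * p.log2()
        })
        .sum()
}

fn main() {
    let freqs = [2, 2, 2, 2, 1, 1, 1, 1, 1, 1];
    println!("{:.3} bits/word", shannon_entropy_per_word(&freqs)); // ≈ 3.236
}
```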
I think you're asking and answering the right question here. TBH when I picked that 2.62 number out of that Shannon paper (without reading far beyond that section), I wasn't attuned to all these ideas and Shannon's limitations. My goal was to provide a response to a (hypothetical) question of: "OK, this list is above the brute-force line that assumes all letters are equally likely in random order, but what about the different, slightly more enlightened attack of brute-force-guessing words rather than letters?" (And I think I figured it'd be cool to cite a 70-year-old paper from the father of information theory.)

I've now got enough of a handle on this whole entropy thing that it makes sense to me that the goals and methods Shannon used are distinct from what I was/am looking for when setting up this second, "tougher" line calculation. (Not to mention changes in the English language in the last 70 years.) Thus, I'm open to removing the Shannon line concept from Tidy entirely, at least for the time being.
I think that the concept of an attacker requiring (on average) fewer than 26 guesses per character if they know your passphrase is in English is a very important one. The value of 2.62 from Shannon's 1951 paper may not be directly relevant, but I would not be surprised if it is close to the "correct" value. So, you may elect to remove the references to Shannon's 2.62 value if you don't feel comfortable that this value is rigorously correct, but I think it is important to not abandon this line of analysis completely. Until we can find a better number to use, perhaps one of the reported attributes can be the number of required guesses per letter (in a character-level brute force attack) that would make the character-level attack break even with the word-level attack? It will probably be a formula like G^M = W, then solve for G: G = W^(1/M). But it's late, and I should re-check this math, and also determine if the result represents an upper bound or lower bound.
So my math from last night seems to check out. Thus, if you (temporarily?) decide to 86 the "Shannon Line", I would recommend that you report the following value in your attributes table instead: G = W^(1/M), where W is the word list size, and M the length of the shortest word on the list. The best descriptor I can come up with for G right now is the minimum guesses per character. Thus, if an attacker must generate more than G guesses for each character in the passphrase in order to crack it using a character-level attack, then the passphrase could be considered safe from this type of attack (because it would be more efficient to do a word-based brute force attack). Conversely, if the number of guesses required to generate each character in the passphrase is lower than G, then the passphrase is susceptible to a character-level attack (at least if the passphrase contains any words of length M). Some examples for W=7776:
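As a rough illustration (my own sketch, not output from Tidy), G can be computed like this; note that 6^5 = 7776, so M = 5 gives G = 6 exactly:

```rust
// Quick sketch: G = W^(1/M) for W = 7776 and a few shortest-word lengths M.
fn main() {
    let w: f64 = 7776.0;
    for m in [3u32, 4, 5] {
        let g = w.powf(1.0 / m as f64);
        println!("M = {}: G ≈ {:.2} guesses per character", m, g);
    }
}
```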
The proposed measure is less intuitive than the "Shannon line", but if you remove the Shannon entropy analysis from
I think I follow you, but I want to learn/know more about G, with an eye toward hopefully (a) improving its name or (b) mathematically converting it into something more intuitive without losing any of its descriptive power or accuracy. Two questions for now:
Notes on assumed entropy per character

Tidy currently calculates a variable named assumed_entropy_per_character = log2(list_length) / shortest_word_length. As an example, for the EFF long list (7,776 words, shortest word is 3 characters), "assumed entropy per character" comes out to 4.308 bits. This is the number Tidy formerly compared against 4.7001 to see if the list is above or below the brute force line. Thanks to your help, Tidy no longer uses this method, due to rounding inconsistencies, but in principle, "assumed entropy per character" is still tightly associated with the brute force line, hence my wondering if it's useful in our discussion of G and the creation of a replacement for the Shannon Line.

A separate, wild thought

In the assumptions behind our construction of the brute force line, are we wrong to assume a brute-force attack that must go through all 26 characters? More generally: Won't, on average, an attacker guess correctly about halfway through the given space? As a contrived example, if I tell you I'm thinking of a randomly chosen (lowercase) letter from a to z, won't you, on average, guess it on or around your 13th guess? If so, does this mean we need to re-think the brute force line?
Yes:

G = 2^(assumed_entropy_per_character)
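(Numerically this is the same thing as W^(1/M); a quick check for the EFF long list, as a sketch:)

```rust
// Check that 2^(assumed entropy per character) equals W^(1/M),
// for W = 7776 and M = 3 (EFF long list).
fn main() {
    let w: f64 = 7776.0;
    let m: f64 = 3.0;
    let g_from_entropy = 2_f64.powf(w.log2() / m);
    let g_from_root = w.powf(1.0 / m);
    println!("{:.6} vs {:.6}", g_from_entropy, g_from_root); // both ≈ 19.81
}
```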
Not sure if I would say this is any more or less "accurate", but if you prefer this descriptor because it captures the relationship to the assumed entropy per character

The bottom line is that because we (currently) do not have a reliable entropy threshold (for comparing against
I had the same thought when I was typing up my post about G this morning. There's some validity to this concern, but at the same time there are pitfalls that one has to be careful about avoiding. For example, if we assume a lower case letter can be guessed (on average) in 13 tries, it is not true that we can guess a three-letter word (on average) in 13^3 = 2197 tries, since this will only cover one-eighth of the search space. Instead, the average number of tries for a 3-letter word is 26^3/2 = 8788. I think this amounts to basically subtracting 1 bit of entropy, for the case of uniform frequency distributions (every letter equally probable). Unfortunately, the larger problem we are tackling here involves decidedly non-uniform frequency distributions, so it gets more complicated.

A lot of it comes down to how you define your variables. For example, does G represent an average number of guesses, or a maximum or minimum? Only after you define the variables can you derive equations to evaluate those variables. I think it's best to defer further ruminations on this until after we have consulted some of the literature. For example, Shannon's entropy calculation -Σ p_i·log2(p_i) could be interpreted as some sort of weighted average of the entropies associated with each term i. So it would be worthwhile to check what the premises are for Shannon's definition of entropy.
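For reference, a quick check of the 3-letter arithmetic above (my own sketch):

```rust
// Quick check of the 3-letter example above.
fn main() {
    let full_space = 26_u32.pow(3); // 17,576 possible 3-letter strings
    let half_per_letter = 13_u32.pow(3); // 2,197 -- only 1/8 of the space
    println!("13^3 / 26^3 = {}", half_per_letter as f64 / full_space as f64); // 0.125
    println!("average tries over the full space: {}", full_space / 2); // 8,788
}
```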
Haven't delved any further into the primary literature yet, but I found a nice synopsis of Shannon entropy in a post by Abhimanyu Dubey on Quora. Some excerpts:
In this context, "event" refers to
...or the random selection of a word from a larger corpus. Also, in the above definition of entropy, "information" has a specific interpretation:
So the Shannon entropy for a language corpus seems to be the average amount of space (in bits) required to store the information that a specific word from the corpus came out in a random draw. So for a corpus with 2^n unique words that appear with equal probability, we'll need n bits to store information about which word was picked (e.g., for 8 words, we can represent each word using an integer in the range 0-7, which requires 3 bits —
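A tiny numerical illustration of that "bits needed" idea (my own, just for concreteness):

```rust
// For a corpus of N equally probable words, the entropy per word is log2(N) bits,
// i.e., the number of bits needed to record which word came out of a random draw.
fn main() {
    for n_words in [8_u32, 16, 7776] {
        println!("{} words -> {:.3} bits per word", n_words, (n_words as f64).log2());
    }
}
```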
I've seen some references to entropy definitions other than Shannon's, e.g. Tsallis Entropy and Renyi Entropy. While I have no idea if either of those two concepts is helpful to our problem(s), I'm encouraged that there are other conceptions of entropy out there. Wonder if something fits our needs just right.
My philosophy is to think from the perspective of the adversary. I think the most likely approach that would be used to reduce the number of guesses required per character is a Markov chain. Thus, we need an analysis of the computational complexity or the performance of such algorithms. Or even an entropy...

Here's a quick Google Scholar search that yields 188 hits: https://scholar.google.com/scholar?q=markov+chain+entropy+%22passphrase%22

There's even more published work if you change the keyword "passphrase" to "password".
The more I think about these issues, the more confused I make myself. To make things more clear, I think it is best if we, in turn, consider three different assumptions regarding the attacker:
In addition, we may occasionally choose to assume that the attacker knows the number of words in our passphrase, and whether or not a separator character is used (although it should not be too difficult to relax these two assumptions when needed). I am starting to believe that under Assumption 1, a character-level brute force attack will never be more efficient than a word-level brute force attack, if one wants a 100% guarantee of success. I'll have to think more about whether or not this is also true if the attacker is satisfied with a lower probability of success.

To help me think about some of these issues, I created a toy word list, which contains 41 words all made from 4 letters of the alphabet (

Under Assumption 2 would be an attacker who knows that the list words can only contain 4 possible letters. Let's assume that they also know the number of words, and the fact that no separator is used (the word list is uniquely decodable). If only a single word is used in the passphrase, then naïvely, a character-level brute-force attack seems like it would be worthwhile if the number of characters in the passphrase (n) satisfies 4^n < 41. This requires n < 3, and because there is only a single word in the phrase, n is also equal to the word length. Thus, we conclude that "unlucky" passphrases are those that consist of a 2-letter word. Based on our list of 41 words, only a single passphrase (2%) is vulnerable:
If two words are used in the passphrase, then the potentially vulnerable passphrases would be those consisting of n ≤ 5 letters (because 4^5 < 41^2 < 4^6), of which there are 9 (0.5%)
Under Assumption 3, the number of randomly generated passphrases to check would be 26^5 = 11,881,376 for a 5-letter passphrase (or 12,356,630 if considering all passphrase lengths in the range 1-5); clearly, this approach confers no advantage to guessing which of the 41^2 = 1681 possible 2-word passphrases is correct.

Under Assumption 2, if the attacker knows only that all letters except four (

The question (or at least one question) is how much more efficient can the attack get by leveraging Markov chain statistics (probabilities about what letters follow previous letters or letter sequences)?

*A second important question is whether we should compare the number of required brute-force guesses generated at the character-level to the full set of possible passphrases (1681), or only to the size of the vulnerable subset of the keyspace (e.g., the 9 two-word passphrases with no more than 5 characters).
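For reference, here are the bounds used in the toy-list example above, as a quick sketch (the toy list itself isn't reproduced; only its size and alphabet size matter here):

```rust
// Bounds for the toy list: 41 words built from a 4-letter alphabet, 2-word passphrases.
fn main() {
    let list_size: u64 = 41;
    let alphabet_size: u64 = 4;
    let two_word_space = list_size.pow(2); // 1,681 possible 2-word passphrases
    println!(
        "4^5 = {}, 41^2 = {}, 4^6 = {}",
        alphabet_size.pow(5),
        two_word_space,
        alphabet_size.pow(6)
    );
    // 1,024 < 1,681 < 4,096, so a character-level attack can only "win" for
    // 2-word passphrases of 5 or fewer letters.
}
```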
Your treatment of entropy per letter in the attributes (and as explained in your blogs) is interesting, but I think that to ensure a passphrase will not be too short, one must consider the number of words in the passphrase (as Reinhold did when he came up with a 19-letter minimum for a 6-word passphrase). Basically, the character-based entropy must be at least as large as the word-based entropy. This leads to a required lower bound for the minimum average word length in a given passphrase, from which one can determine the minimum number of letters required in a passphrase that has a specified number of words (by multiplying the minimum average word length by the number of words). So my suggestion is that the attributes table should report this lower bound for the required average word length.
If the actual entropy per letter is E (e.g., 2.62 bits for Shannon, 4.70 bits for brute-force), and the number of entries in the word list is W (e.g., 7776 for a diceware list using 5 dice), then it can be shown that the required lower bound for the average word length in a generated passphrase is given by the expression log2(W)/E.
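For example, for a 7,776-word list, this works out to roughly 2.75 letters for the brute-force value of E and about 4.9 letters for the Shannon value. A quick sketch:

```rust
// Lower bound for the average word length in a generated passphrase: log2(W) / E.
fn main() {
    let w: f64 = 7776.0;
    for (label, e) in [("brute force, E = 4.70", 4.70_f64), ("Shannon, E = 2.62", 2.62)] {
        println!("{}: minimum average word length ≈ {:.2}", label, w.log2() / e);
    }
}
```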
The above expression is for the case of no separators, but if separators are included, the resulting increase in the character-based entropy relative to the word-based entropy will be minimal. The calculation becomes much more complex, so I think it would make most sense to just report the conservative value obtained for the assumption of no separators.
One could compare the average word length for the entire word list to the computed minimum required average word length, but to guarantee that none of the generated passphrases have an insufficient number of characters, the shortest word length in the word list would have to be greater than or equal to the lower bound for the average word length.
It may be possible to compute or estimate the overall entropy reduction caused by rejecting passphrases that are too short, but I would have to think more about how this could be done.