Kickstarter: We Were Hacked, User Information Exposed

Discussion in 'Planetary Annihilation General Discussion' started by kalherine, April 18, 2014.

  1. FSN1977

    FSN1977 Active Member

    Messages:
    657
    Likes Received:
    232
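    Try putting the text you wrote, "Don't tell me you think Uber and Kickstarter are 2 different companies?", into Google Translate. Too Portuguese: "Não me diga que você acha Uber e Kickstarter são dois diferentes da empresa." (literally, "Don't tell me you think Uber and Kickstarter are two different of the company.")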
  2. stormingkiwi

    stormingkiwi Post Master General

    Messages:
    3,266
    Likes Received:
    1,355
    Thanks. I am not sure how that happened. I was surprised it was bigger.
    I'm not even going to bother rebutting this with my own words.

    You are completely misusing vocabulary. If you say "your vocabulary is 10000 - 12000 words", that is all the words that you can come up with on the spot, because those are the words in your vocabulary.


    https://xato.net/passwords/analyzing-the-xkcd-comic/#.U1JRX_mSzRU

    Except that isn't how you do it.

    The hacker doesn't know the password of the person they're hacking. They don't know if the password is troub4d0ur<3 or correcthorsebatterystaple, to begin with.

    They also don't know the vocabulary of the person they are hacking. Do you know the word "Manuka"? It's a tree in NZ. It would be naive to try to hack my password without that word in your list of 5000. Actually, it's naive to use a list of common words at all, because if any component of my password isn't on that list, it will never be cracked, and you'll have to brute-force it anyway, having wasted computing time on a naive approach.
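    A minimal sketch of the word-list attack being described, assuming the attacker concatenates a fixed number of words from a list and can test each guess with a hypothetical `is_correct` check (a real attack would compare against a stolen hash instead). The word list and passwords are made up for illustration. If any component, such as "manuka", is missing from the list, the attack exhausts every combination and finds nothing.

```python
from itertools import product
from typing import Callable, Optional


def dictionary_attack(
    words: list[str], n_words: int, is_correct: Callable[[str], bool]
) -> Optional[str]:
    """Try every concatenation of n_words entries from `words`, in list order.

    Returns the first candidate accepted by `is_correct`, or None when the
    password contains a word that is missing from the list.
    """
    for combo in product(words, repeat=n_words):
        candidate = "".join(combo)
        if is_correct(candidate):
            return candidate
    return None


# Hypothetical word list and target passwords, for illustration only.
common = ["correct", "horse", "battery", "staple", "tree"]
secret = "correcthorsebatterystaple"

print(dictionary_attack(common, 4, lambda guess: guess == secret))  # correcthorsebatterystaple
print(dictionary_attack(common, 4, lambda guess: guess == "correctmanukabatterystaple"))  # None
```

    The second search tries all 5^4 = 625 combinations before giving up, which is the wasted computing time mentioned above.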
  3. corwin1

    corwin1 Member

    Messages:
    50
    Likes Received:
    31
    There are such concepts as 'active' and 'passive' vocabulary. There are a lot of words we understand if we see or hear them, but that wouldn't come to mind without a reason, nor be used in normal speech. People are simply more likely to come up with common words. Even the comic talks about 'common words', not the most esoteric ones you know.

    But it's true that names and such add a lot more. I didn't think about that at the time. And of course all the languages in the world, but I'm just talking in the context of cracking the password of some random American guy.

    And I guarantee you that if people were trained to use combinations of common words as passwords, that is exactly how the cracking would be done. Common word lists are certainly used even now, because people are still foolish enough to use such passwords if the service doesn't enforce anything better. It's not usually about cracking 'your' password, or any good password. It's about cracking the easiest ones. OK, of course there are other cases as well, but that's the context I was thinking about.
  4. stormingkiwi

    stormingkiwi Post Master General

    Messages:
    3,266
    Likes Received:
    1,355
    Yeah I understand that :)
    The problem is this.

    Your common words list passwords have no entropy. It's actually on the explainxkcd page - because "correcthorsebatterystaple" is a "strong" password, it's one that is going to be guessed.

    It's a catch-22. If you're going to build a password from a list of 5000 four-letter words, those four-letter words have to remain secret. But if your password can be easily guessed, it's not a strong password, now is it?
  5. aevs

    aevs Post Master General

    Messages:
    1,051
    Likes Received:
    1,150
    Pretty much this, except it's even easier when the combinations tested go from using the most to least common words, and I would wager a good portion of people following this advice would mostly use the few hundred most common words without considering the consequences.
    Consider that "Mississippi" is far easier to guess than "Mississip", even though the second one is shorter. When you judge the information content of a word, you don't judge it as if every character is random. That's why I don't like this idea.
    Also, when you force people to use a number in an 8 character password, sure you decrease the max information content, but you increase the average information content by a lot. Most people wouldn't have used a number, so prioritizing those combinations means passwords are easier to guess on average.
    Pretending that a password can be modelled as a perfectly random combination can be very misleading.
  6. cola_colin

    cola_colin Moderator Alumni

    Messages:
    12,074
    Likes Received:
    16,221
    A few months ago I tried, for the first time, to use a password made up of the first letters of a sentence. It was the first password I forgot so badly that I had to ask the admin to reset it...

    So I am back to passwords made up of short gibberish words that are put together using randomly made-up rules. That way I only have to remember a small nonexistent word and a simple rule or two.
    Like qweaAqweaBqweaZ (no that is not an actual password, neither the rule nor the base word is used by me in anywhere, but you get the idea)
  7. stormingkiwi

    stormingkiwi Post Master General

    Messages:
    3,266
    Likes Received:
    1,355
    Mississip is a permutation of Mississipi though, where an i character has been replaced by the null character. A CPU will figure that out; it's just one of the permutations of Mississipi that it will try.

    I don't understand your disagreement with the comic? Randall assumes that the "Hacker" knows the list his words came from. (which is why CHBS only has 44 bits of entropy) I.e. he doesn't make the assumption that every letter is random.

    And irrespective of that, the 25-letter CHBS (4 words drawn from a list of 2048 words) mathematically has more entropy than the 10-letter troubadour&3.

    That's the point Randall is making. It doesn't matter whether you agree with it or not: mathematically you're wrong. You can argue with words all you like, but the maths does not support your argument.

    Randall isn't actually saying "length > complexity". The best passwords will use all 94 characters. Randall is saying that length is more important than complexity, so your complex password should at least be long.

    Windows 8 allows passwords of at most 16 characters. You're better off filling the password field completely than doing half a job.
    Ya you're supposed to pick a passphrase you'll remember :)
    BulletMagnet likes this.
  8. cola_colin

    cola_colin Moderator Alumni

    Messages:
    12,074
    Likes Received:
    16,221
    A CPU alone is not able to think at all, and surely not to figure anything out. It's the human mind using the CPU as a tool that can figure it out :p

    But it was something like "Planetary Annihilation is quite an awesome game that Uber made" :(
    Issue is I remembered the meaning, but not the exact wording.
    aevs and stormingkiwi like this.
  9. kalherine

    kalherine Active Member

    Messages:
    558
    Likes Received:
    76

    1. I'm not Portuguese, I'm Spanish, but it's true that I live in Portugal.
    2. Then Google Translate is wrong in Portuguese; that's not right, write to the Google admins.
    And who the hell cares who the company belongs to, or why... I'm just worried about the hack and trying to see where the attack happened!

    Did you buy the game from Uber or from Kickstarter?
    So who is who?
    And what belongs to whom?
    Last edited: April 19, 2014
  10. aevs

    aevs Post Master General

    Messages:
    1,051
    Likes Received:
    1,150
    Mississip is missing 2 characters. A permutation like that without knowledge of the location or type of permutation adds more information to the password than using a word like "Mississippi" carries on its own.
    Here's a better example; the two characters "th" without context contain 9.4 bits of information. I'd bet money that the word "the" usually contains less than 2 bits of information. I'd wager less than a single bit, honestly. In a password, it might be slightly more, but it will still be far less than 9.4

    I believe Randall is overestimating the information content of his password. His words may only be around the first 1500 most common, but for passwords they certainly would not be. Most passwords like that will primarily use nouns (as was the case with Randall's), and people will also pick words that are even more common than his on average. That means even lower information content for the average password.

    My point is that real words without permutations actually carry very little information, especially when you want people to come up with those words on the spot. People are horrible random word generators. If you used a program to come up with truly random words, I might be a bit less skeptical, but I do not believe the advice is good when 3% of passwords start containing the word "cheese".

    EDIT: And don't tell me "mathematically you're wrong" when we're talking about the information content of arbitrary English words in specific context. There's no way you can argue math here unless you set up a survey to determine the actual entropy of the average user's password using this method.

    EDIT 2: Looking over the comic again, Randall assumes 11 bits of entropy for "horse" and 16 bits for "troubador". I'm extremely skeptical of his value for troubador in particular; it's a misspelling of "troubadour", which itself contains over 22 bits of information if we go by Wolfram Alpha's estimate of word frequency. He assumes 3 bits of information for common substitutions, which is also an underestimate given that the location and type of those substitutions isn't known. In passwords, I can guarantee "horse" contains even less information than 11 bits, and I would predict the same for the other words in his password, with the possible exception of "correct".
    Last edited: April 19, 2014
  11. cola_colin

    cola_colin Moderator Alumni

    Messages:
    12,074
    Likes Received:
    16,221
    Can you explain to me how you end up at 9.4 bits for "th"? What kind of encoding knows partial bits?
  12. aevs

    aevs Post Master General

    Messages:
    1,051
    Likes Received:
    1,150
    http://en.wikipedia.org/wiki/Nat_(unit)
    http://en.wikipedia.org/wiki/Ban_(information)
    If I'm 75% certain that a weighted coin is going to land on tails, then the information I get from the result of a coin toss is less than a bit on average.
    There's nothing fundamental about the bit as a unit of measurement.

    EDIT: And for "th", I'm assuming a random combination of 2 letters. log of base 2 of (26^2) = 9.400879...

    EDIT2: As an example for 'what kind of encoding knows partial bits', think of the average information content of single characters in a compressed text file. You probably won't get an integer value.
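    The two numbers above can be checked with a short sketch. It assumes the standard binary entropy formula for the weighted coin and, as stated in the post, a uniformly random pair of letters from a 26-letter alphabet for "th".

```python
from math import log2


def binary_entropy(p: float) -> float:
    """Average information, in bits, from one toss of a coin with P(tails) = p."""
    return -p * log2(p) - (1 - p) * log2(1 - p)


print(round(binary_entropy(0.5), 3))   # 1.0 (a fair coin gives exactly one bit)
print(round(binary_entropy(0.75), 3))  # 0.811 (the 75% weighted coin gives less)
# Two letters drawn uniformly at random from a 26-letter alphabet.
print(round(log2(26 ** 2), 4))         # 9.4009
```

    Neither result is a whole number, which is the point about partial bits.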
    Last edited: April 19, 2014
    stormingkiwi and cola_colin like this.
  13. stormingkiwi

    stormingkiwi Post Master General

    Messages:
    3,266
    Likes Received:
    1,355
    (Does anyone object to this derail? Just yell out and we can move to unrelated or terminate)


    Thanks, I never could spell Mississippi, but it doesn't change the fact that there are only 11 permutations of Mississippi that involve the null character.

    The null character - order doesn't matter. Missi\0ssippi is exactly the same as Missi\0, because of what the null character is. The null character terminates the string.
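    A small sketch of the truncation argument, assuming C-style NUL-terminated strings (Python's `ctypes` character buffers expose this behaviour through `.value`). It also enumerates the 11 truncations of "Mississippi" that a NUL at each position would produce.

```python
import ctypes

# A C string is read up to the first NUL byte, so everything after it is ignored.
print(ctypes.create_string_buffer(b"Missi\x00ssippi").value)  # b'Missi'

# A NUL at each of the 11 positions of "Mississippi" truncates it to one of
# 11 distinct prefixes, from "" up to "Mississipp".
word = "Mississippi"
truncations = [word[:i] for i in range(len(word))]
print(len(set(truncations)))  # 11
```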

    While your disagreement is well founded, there are university studies on this, and you're over-analysing the comic.

    You'll notice that o is a common substitution for ou. There are many words in English with that spelling difference, because of Latin-derived spellings.

    If you go to Wikipedia's page on American and British English spelling differences, you'll notice that troubadour is a word that came into English from French. It's not a clever substitution you've made. It's one that will be picked up by an algorithm run by a CPU, because the algorithm isn't going to say "is the word armour? Guess armor!"; it's going to say "does the word end in 'our'? Maybe it ends in 'or'!"

    You don't understand the comic.

    Randall randomly selects 4 words from a list of 2048. 2^11 = 2048. That is why horse has 11 bits of entropy, because it was randomly selected from that list.

    Shall we put the comic another way?

    Having a short password (4) with more symbols (2048) is more important than having a long password (11) with fewer symbols (94).

    Mathematically, you are wrong. Because the comic is backed up by university studies and by studies of people in industry.

    16 bits of entropy for Troubadour - Aevs you need to look at the comic once more and stop countering your own argument. If "the" has less than 9.4 bits of information, then you can expect that troubadour has less than 22.

    Troubadour is a dictionary word. Which means it has less than 22 bits of information, irrespective of how long it is. It has approximately 16 bits of information.

    Note that that is the point of the comic.

    1) Randall does use a program to come up with "truly random symbols". (With the caveat that it's a web comic, so he is allowed some artistic licence when it comes to what sounds funny)
    2) People are horrible random word generators. Their "random" "tr0ubador&3" password is following a set pattern.


    Yes, ideally you would have a long password that also has permutations, so it can't be dictionary attacked. The meaning of the comic is: "if you have a password field where you can enter up to 25 characters, don't just enter 8 with substitutions, enter 25".

    With the obvious implication that your passphrase isn't using 26 characters, but all 94, so the attack has to brute force every character.


    I've done that with passwords too..... I know my facebook password, but I never remember the exact spelling ...
  14. aevs

    aevs Post Master General

    Messages:
    1,051
    Likes Received:
    1,150
    You misunderstand a few of my arguments here, and I think our interpretation of the comic may differ.

    Maybe I should have said "Misisipi" or "M1ssiss1ppi" instead, because my point is that you can't just test for one kind of permutation and ignore others. You don't know which kind of permutations you need to test for. Information content is drastically increased because of this.

    I'm analyzing the comic as though it were meant as good advice to a user on how to come up with a password, or as the argument that, were this the norm, passwords would be more secure.

    It's a common permutation to test for, yes. It doesn't add much information, although it may add a bit in this case. My point was that troubadour is already as infrequent as ~200 appearances per billion words (which is where I got my rather low estimate for its information content), and adding that permutation probably isn't negligible. 16 bits is definitely a low estimate.

    I assumed his word choice was arbitrary, since he labels them as "four random common words". Not sure where you got the "list of 2048 words" part from (you mention it below as well, you must have gotten it from somewhere, I just can't seem to figure out where).
    Either way, I would be testing words based on their use frequency in passwords. Horse is in the first 1600 most used words in the English language. In English text, it appears around 5 times per hundred thousand words, which is around 14 bits. But for passwords, it's probably far more common since words like "of", "to", and "the" are certainly not nearly as common as in text (I'm not aware of any studies on word frequency in passwords, but they probably exist).
    If horse contains over 11 bits of entropy in the context of a password, I would be a little surprised, although it is reasonable.

    94^11 = 5.06*10^21
    2048^4 = 1.759*10^13

    Uhhh...
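    A quick check of the two totals above, assuming (as the comic does) 94 printable characters for the random-character password and a 2048-word list for the passphrase. Expressed in bits, the gap is easier to see.

```python
from math import log2

random_chars = 94 ** 11   # 11 characters, each from 94 printable symbols
random_words = 2048 ** 4  # 4 words, each from a 2048-word list

print(f"{random_chars:.3e}")          # 5.063e+21
print(f"{random_words:.3e}")          # 1.759e+13
print(round(log2(random_chars), 1))   # 72.1 bits
print(log2(random_words))             # 44.0 bits
```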

    I'll admit that I haven't read those studies, have you?

    Nope. "the" is the most common word in the English language. My last sentence features the word "the" in context twice. If I removed the word "the" from all of my posts in this thread, very little information would be lost, because the word is very frequent and carries little information. If you can accurately guess where the word "the" is missing 50% of the time, then within the context of the message the word "the" contains one bit of information. Troubadour, in the context of common text? Yeah, good luck with that.

    Sorry, that's not how information theory works. Not all words contain the same amount of information.
    Let's go back to my coin toss example. We'll change it a little bit, say, there's a 95% chance the coin will land on "heads" and a 5% chance the coin will land on "troubadour".
    The entropy of the coin toss is:
    -0.05*log_2(0.05) - 0.95*log_2(0.95) = 0.2864 bits
    The self-information of predicting a "heads" outcome:
    -log_2(0.95) = 0.074 bits
    The self-information of predicting a "troubadour" outcome:
    -log_2(0.05) = 4.322 bits
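    The same three numbers as a sketch, using the 95%/5% coin from the example. The entropy is the probability-weighted average of each outcome's self-information.

```python
from math import log2

p = {"heads": 0.95, "troubadour": 0.05}

# Entropy: the average surprise over all outcomes.
entropy = -sum(q * log2(q) for q in p.values())
print(round(entropy, 4))  # 0.2864

for outcome, q in p.items():
    # Self-information: how surprising this particular outcome is.
    print(outcome, round(-log2(q), 3))
# heads 0.074
# troubadour 4.322
```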

    Do you see where I'm going with this?
    Not all words carry the same information content, because not all words are as frequent. When one has more, the other has less. If all passwords were comprised of single words with no permutations, sure, you could argue that you only need to test each word once and so the information content is limited by that. That's not the case here, and word frequency in text is probably a good indicator for word frequency in passwords.
    -log_2(200*10^-9) = 22.25 bits
    The self-information of selecting a random word in a text and it being "troubadour" is over 22 bits. (In a password, I doubt it loses as much information as a word like "horse" would, but that's just conjecture since passwords are given by everyone while the sources of my statistics come from digital books which probably feature a more extensive vocabulary on average).

    If we use the same technique with "the", we get a value of 4.47 bits.
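    A sketch of the frequency-to-bits conversion used in this post, assuming (as the post does) that a word's frequency in ordinary text is a usable stand-in for its frequency in passwords. The frequencies are the ones quoted in the thread: about 200 per billion words for "troubadour" and about 5 per hundred thousand for "horse".

```python
from math import log2


def self_information(frequency: float) -> float:
    """Bits of surprise for a word that occurs with the given relative frequency."""
    return -log2(frequency)


print(round(self_information(200e-9), 2))  # troubadour: 22.25
print(round(self_information(5e-5), 2))    # horse: 14.29
```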


    Of course, more information = better. My point is that he's misrepresenting the differences between the two passwords (significantly underestimating the information content of the former, probably overestimating that of the latter), and that if a user looks at the comic and takes it at face value, the password they come up with using his technique probably won't be better. The troubadour password definitely follows some common password styles, but so will passwords generated using his method. His advice isn't going to solve that problem, because neither one is ideal.

    You know what's probably a better password than both? "punypunsteroids"
    ...Well, maybe. Hard to say, I don't know how common the pun permutation is.

    EDIT: Also, just looked at the 'tooltip' for the comic. Randall knew what he was going to cause. :p

    EDIT 2: Accidentally screwed up some math; that's 4.47 bits for "the". Context will of course lower the information content of any word compared to a value calculated this way, which is why that value is quite high. It's probably a good estimate for passwords though, since context can't give you clues in that case.
    Last edited: April 20, 2014
    stormingkiwi likes this.
  15. stormingkiwi

    stormingkiwi Post Master General

    Messages:
    3,266
    Likes Received:
    1,355
    No, you can't, I agree with you. There are (95^11+11) permutations of Mississippi. And by checking against truncation first, you've eliminated 11. Because that's how the CPU is going to bruteforce your password. It is going to check every single permutation. Including i = 1, ss = (s, $, b, 5 or B etc.). That's the comic. It is very easy for a CPU to check every single permutation. It is quite difficult for a person to check every single permutation, and it is quite difficult for a person to generate truly random permutations.

    I'm going to put your quotes in blue. I'll try and make this as good a post as I can.

    Sorry I grow unclear, inconsistent and stupid when I try to explain stuff.


    You don't know which kind of permutations you need to test for.

    That's another point that the comic is making. Because by encouraging people to make common substitutions, you do know which kind of permutations you need to test for.

    Three random substitutions in mississippi give something like ^issossilpi

    That's done by a random number generator selecting the position for the substitution, and a random number generator selecting the character to be substituted in.
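    A minimal sketch of that procedure, assuming the 94 printable non-space ASCII characters as the replacement alphabet and Python's `secrets` module as the random number generator. The function name is hypothetical; the output differs on every run.

```python
import secrets
import string

# Assumption: the 94 printable, non-space ASCII characters.
ALPHABET = string.ascii_letters + string.digits + string.punctuation


def random_substitutions(word: str, count: int) -> str:
    """Replace `count` distinct, randomly chosen positions with random characters."""
    chars = list(word)
    positions = list(range(len(chars)))
    for _ in range(count):
        # One random draw picks the position...
        pos = positions.pop(secrets.randbelow(len(positions)))
        # ...and another picks a replacement that differs from the original.
        choices = [c for c in ALPHABET if c != chars[pos]]
        chars[pos] = secrets.choice(choices)
    return "".join(chars)


print(random_substitutions("mississippi", 3))  # random each run
```

    Unlike the "common substitutions" people make by hand, nothing here favours particular positions or characters, so an attacker cannot prioritise.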

    The reason the comic has so few bits allocated to the substitutions for 'ou' is because of common substitutions. The comic isn't considering 94 characters in 11 possible positions as an option. The algorithm the hacker is using is prioritising the common substitutions over the non-common substitutions, because it is more likely that the "random substitutions" aren't actually that random at all, because a person selected them.

    Randall assumes you are not randomly substituting, but making common substitutions. It is a human generated password, not a randomised string.

    That's the way you are encouraged to make a password. So it isn't 94^11, because not every substitution is possible, because the human is making their "common substitutions". They aren't actually generating a completely random password at all, they wouldn't generate all 94^11 permutations.

    Which is why &3 doesn't have much entropy. It could be at the beginning, it could be at the end, but it isn't in a random position turning the word into gibberish.

    Another assumption that Randall is making is that only the 16 most common punctuation characters are being used.

    Why do you think it's 4 bits of entropy for 16 characters? It should be 5, as there are 32 punctuation characters.
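    The count is easy to confirm, assuming "punctuation characters" means the printable ASCII set that Python exposes as `string.punctuation`:

```python
import string
from math import log2

print(len(string.punctuation))  # 32
print(log2(16), log2(32))       # 4.0 5.0
```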

    Information content is drastically increased because of this.
    Information content, in the sense that you are applying it, has little if anything to do with it.

    troubadour is already as infrequent ~200 appearances per billion words (which where I got my rather low estimate for its information content), and adding that permutation probably not negligible. 16 bits definitely a.

    Words in bold, in the above paragraph, are repeated. Duplicates are removed. Think why I did that.

    Sorry, I forgot to post the source. Just as well, I can't resist this pun

    Straight from the horses mouth XD

    Either way, I would be testing words based on their use frequency in passwords. Horse is in the first 1600 most used words in the English language. In English text, it appears around 5 times per hundred thousand words, which is around 14 bits.

    But for passwords, it's probably far more common since words like "of", "to", and "the" are certainly not nearly as common as in text (I'm not aware of any studies on word frequency in passwords, but they probably exist).
    If horse contains over 11 bits of entropy in the context of a password, I would be a little surprised, although it is reasonable.

    Notice what I've strikethroughed.

    Horse is in the 1600 most used words. So on your list of the 1600 most used words, it is one word out of 1600. Therefore it has about 10.6 bits of entropy (log2 1600 ≈ 10.64). "the" is the most used word in English. It also has about 10.6 bits of entropy, because it came off your list of 1600 words, where it appears once. It doesn't matter how frequently it appears in text, because it's already on your non-repeated list. If you randomly choose a word from that list, there is a 1/1600 chance that it will be "the".

    Do you see where the 11 bits for a 2048-word list come from now? Horse doesn't contain more than 11 bits of entropy. It contains exactly log(2048)/log(2) = 11 bits.

    Randall has a list of 2048 words, that he randomly generated 4 words from, to make a sentence. The hacker found that list of words. It's 11 bits * 4. That's why all the words in his list only have 11 bits of entropy. Because they are four words, each word chosen randomly from 2048 words.
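    A sketch of the list-size model used here, assuming each word is drawn uniformly and independently from a list with no repeats (Randall's model, not aevs's frequency model). The list sizes are the ones mentioned in this thread.

```python
from math import log2


def bits_for_uniform_choice(list_size: int) -> float:
    """Entropy of one item picked uniformly at random from a list of this size."""
    return log2(list_size)


print(bits_for_uniform_choice(2048))                 # 11.0
print(4 * bits_for_uniform_choice(2048))             # 44.0 for four independent words
print(round(bits_for_uniform_choice(1600), 2))       # 10.64
print(round(bits_for_uniform_choice(1_000_000), 2))  # 19.93
print(round(bits_for_uniform_choice(20), 2))         # 4.32
```

    Under this model every entry on the list, "the" or "troubadour", carries the same number of bits; word frequency only matters once the attacker stops treating the list as uniform.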

    Likewise, the hacker has perfect information for troubadour password, because he knows the template it was created from.

    The entropy of both passwords is therefore the lowest it can possibly be.

    This isn't about testing for information. This is about a dictionary attack. There are several institutions (Oxford, Merriam-Webster) that have already done the hard work, because they have compiled a 'dictionary'. So the hacker doesn't have to consider the entropies that you're suggesting. "the" may have 4.47 bits of entropy, based on its usage, but the list has already been compiled. It appears on the list exactly once. As does troubadour. And there is one dictionary, or in this case, one list of words. It doesn't matter how frequently those words appear in a block of text; they appear once on our list.

    That's why troubadour has 16 bits of entropy - Randall assumes the list is 2^16 words long. Maybe that estimate is too low. But remember the key point - the hacker knows the list of common words, he must know the list of common and uncommon words too, for fairness of comparison.

    How big is the English language? Let's assume 1,000,000 words. Troubadour appears once. That's 20 bits of entropy, in a 1,000,000 word dictionary. "the" has 20 bits of entropy also.

    If troubadour was 5% of all words, there would be 20 words in your dictionary, and it would have an entropy of 4.32. I think there are substantially more than 20 non-repeated words in this post.

    So troubadour - 20 bits of entropy for how many different words it could be. You need 24 bits of entropy to catch up to CHBS. And you are not making random substitutions, you're making common substitutions.

    Randall has only assigned 12 bits of entropy for those common substitutions.

    It's not the 39.3 bits of entropy that you would expect if those substitutions had been entirely random. (Randall made 4 "substitutions" and two additions to the original 10 character string, ('troubadour'))

    I too disagree with the levels of entropy that Randall allocates to the word, after considering this further.

    '&3' should have 3 bits of entropy: 1 bit for order, 1 bit for 'punctuation' at the start or end of the word, and 1 bit for the numeral at the start or end of the word. (Note the comic's parenthesis: it can't be in the middle, or it turns the word into gibberish.)

    5 bits for punctuation

    1 bit for caps.

    3 bits for numeric.

    32 bits of entropy. I do not believe that there are 4096 "common substitutions" that can now be made, without turning the word into gibberish. So I think that Randall's assessment is pretty fair.
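    The tally above as a sketch. The component values are the ones proposed in this post (not Randall's), with the base word taken from an assumed 1,000,000-word dictionary; the last line is the 39.3-bit figure for six fully random characters mentioned earlier.

```python
from math import log2

# Revised estimate for the troubadour-style password, in bits (from this post).
estimate = {
    "base word (1,000,000-word dictionary)": 20,
    "'&3' placement and order": 3,
    "punctuation character": 5,
    "capitalisation": 1,
    "numeric substitution": 3,
}
total = sum(estimate.values())
print(total)              # 32
print(44 - total)         # 12 bits still needed to match CHBS
print(2 ** (44 - total))  # 4096 equally likely substitution variants needed
# For comparison: 6 truly random characters from 94 symbols.
print(round(6 * log2(94), 1))  # 39.3
```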

    Concluding, it is unlikely that the most common substitutions are going to contain the 12 bits of information they need to equal CHBS. You also know as well as me that 1,000,000 words is a very generous allocation for a dictionary list.

    The most important part of the comic is the idea of the substitutions that are made.

    In addition, Randall is calculating the lower bound of the complexity to crack the password, not the upper bound.

    He gives both attackers perfect information.

    Both attackers know the template the password was made from. If you suddenly say "no, troubadour&3 hacker doesn't know the template or there are random substitutions, so it's 72.3 bits of entropy or 59", then "CHBS hacker doesn't know the template either, or doesn't have to concatenate his words, and it's 120 bits of entropy, or 94".

    Consider: correcthobatteryrsestaple


    Your first edit is what I gave you the like for :)

    Your mention of pun is what I gave you the pun for.

    Your second edit, I'm sorry about, but I've been doing it to you all day, so no worries.

    While the underestimate of the former is up for debate, it is clear that he is not overestimating the entropy of the latter, because of the way the passwords are selected.

    Either way, even the security experts who disagree with the comic agree with his fundamental point. That length is greater than complexity, and you add complexity to improve an already long password, not instead of length.

    F*** it. I did it again.

    What I keep doing is this - going into google. 2048^4, in the search bar.

    Calculator opens up in the results. So my next calculation is 94 EXP 11 in the calculator. (i.e. 94*10^11, because I keep clicking EXP, not X^y)

    D'oh!
    aevs likes this.
  16. stormingkiwi

    stormingkiwi Post Master General

    Messages:
    3,266
    Likes Received:
    1,355
    Addendum (over limit)

    Another interpretation:

    It should be acknowledged that the experts who reject the comic don't actually recognise its key points: that in both cases the hacker has perfect information, and that a formula is being applied to generate both passwords, so you can't add bits of entropy for things that would be possible if the password were randomly generated but aren't possible because they don't fit the formula.

    It should also be acknowledged that even experts who reject the comic agree with its principle: complexity is nothing without length.
  17. aevs

    aevs Post Master General

    Messages:
    1,051
    Likes Received:
    1,150
    Length is definitely the more important factor. 94^11 >> 2048^4. Length is the exponent here, the base is related to the entropy of a symbol.
    No one will argue against that, what they will argue is that a "long" password of a few words can be modeled as a short password with more information per symbol (as you said, 2048^4).

    This is pretty much all I can say to clarify myself further than I already have:

    I'm assuming a real world scenario, and not that the hacker has perfect information (Although I can't give an accurate description of the information content of a given permutation, because I would need to know the frequency of a permutation in passwords to come up with that estimate :(). As you've said yourself, his math works out because he has modeled the problem with simplified constraints that I believe are misleading when applied to the real world. Randall is correct given the assumptions he makes! If you're arguing for that, you're right! But that's not my point.

    A reductio ad absurdum like this should have made you realize the error of that assumption :( I'm kind of disappointed that you crossed out the important part of my post regarding this. Yes, a random word from a dictionary, if we know it is truly random, will only carry 18 bits of information at most (as far as I can tell from googling). But testing permutations of 'troubadour' just as often as you test permutations of 'cheese' is not how you would crack passwords in practice.

    In reality, the information content of a word is far more closely related to its use frequency [in passwords in this case] than to the number of potential words [even when that list is limited based on commonality]. Here's why:

    If I were to come up with the most efficient possible program to guess passwords, I would spend my time guessing at permutations of a word relative to its use frequency [in passwords]. This is why your 'duplicate symbol' argument is flawed in practice. I'll test 'horse', 'h0rse', 'hor5e' and 'h0r5e' before I ever test 'troubadour', because that is how I maximize my chance at success. The chance that 'hor5e' will be present is larger than the chance that 'troubadour' will be present in a password, because a permutation of 'horse' probably still has less entropy than 'troubadour' [again, can't know the real information content of a permutation, but a difference of +10 bits is definitely too high to make up for]. The same can be said of word combinations when the words' combined entropy does not exceed that of troubadour. Unless all passwords use single words without permutations, use frequency [again, in passwords, I'm just assuming that it's approximately proportional to frequency in text] is the best way to estimate the entropy of a word, symbol or permutation. The only other information you can account for is from context, which is really hard to account for, and probably not that significant for passwords.

    If I tell you "this password is of a length from 1 to 6, contains the symbols 0, 1, 2 & 3, and there's a 1% chance that any given symbol is 2", and you want to find the password as quickly as possible, you'll be testing "0310" before you test "2". That's how information theory works. In that forum post, Randall actually does go over the relationship between how often people add "3" to the end of a password and the entropy it adds, for example. I don't necessarily agree with his estimates for the information content of the other permutations, though (he seems to assume only a few number-replacement permutations need to be accounted for, which I doubt is the case), but there's no way to really prove it one way or the other without statistics.
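    A quick check of the "0310" versus "2" example. The example doesn't fully specify the distribution, so this sketch assumes that each length from 1 to 6 is equally likely, that symbols are independent, and that the remaining 99% is split evenly over 0, 1 and 3.

```python
import math

# Assumptions (not stated in the example): each length 1..6 is equally likely,
# symbols are independent, and the 99% not spent on "2" is split evenly over 0, 1, 3.
SYMBOL_P = {"0": 0.33, "1": 0.33, "2": 0.01, "3": 0.33}
LENGTH_P = 1 / 6

def password_probability(pw):
    """Probability of this exact password under the assumed model."""
    p = LENGTH_P
    for ch in pw:
        p *= SYMBOL_P[ch]
    return p

for pw in ("0310", "2"):
    p = password_probability(pw)
    print(pw, f"p={p:.5f}", f"{-math.log2(p):.2f} bits")
```

    Under these assumptions "0310" is slightly more probable than "2" (about 8.98 bits of self-information versus about 9.23), so a guesser working in order of likelihood tries the longer string first.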

    So, in the end, all I can say is that I am sure he is significantly underestimating the information content of the first password, and I suspect he is overestimating the information content of the second password.
    It may be that the second password still has some more information than the first (I wouldn't be willing to call it either way definitively), but a lot of that has to do with the fact that the first is already the norm. If you reversed the commonality of each method, who's to say the opposite wouldn't be true?

    If his objective was to point out that some permutations lose information content because of commonality and should be avoided because of it, I just think he picked a bad way to demonstrate it. I'd love to see him make a more in-depth post regarding information theory though. :D

    Anyway, that's all I have to say. Frequency's important, etc. I need to get some sleep now, I've got 3 exams this week and I should really spend more time studying for signal processing than debating on the blagotubes :oops:. I won't be responding again this week, for my own good. ;)
  18. stormingkiwi

    stormingkiwi Post Master General

    Messages:
    3,266
    Likes Received:
    1,355
    Real world scenario

    The hacker knows one password is an 11-character string drawn from 94 characters.

    The other password is a 25-character string of only lowercase letters.

    26^25 is roughly 117.5 bits of entropy. Randall wins, and is still correct. Deal with it.
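    The brute-force numbers for this scenario can be checked directly. This is a minimal sketch assuming the attacker knows only the length and the character set, so every position is treated as a uniform pick from the alphabet.

```python
import math

def brute_force_bits(alphabet_size, length):
    """Bits of entropy for a string whose every character is drawn uniformly from the alphabet."""
    return length * math.log2(alphabet_size)

print(f"{brute_force_bits(94, 11):.1f}")   # 11 characters from 94 printable symbols
print(f"{brute_force_bits(26, 25):.1f}")   # 25 lowercase letters
```

    This prints 72.1 and 117.5, so under this attacker model the long lowercase string has far more search space.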

    For someone so intense about statistics, you are missing the key point in interpreting the comic. That is why I struck through or removed everything that wasn't relevant.

    Randall gave both hackers as much information as they could possibly have. For both passwords, he calculated a lower bound on how complex they are. The CHBS method randomly selects 4 words from a dictionary. The troubadour method randomly selects 1 word from a dictionary and then applies common user substitutions. Randall actually gives troubadour a lot of extra entropy there, as he highlights in his own forum post: if users were selecting their own base word, you would expect troubadour's starting entropy to be below 16 bits to begin with. Just like you are trying to argue for horse. Le Chatelier's principle applies to this system of equilibrium.

    You can't have an upper bound lower than a lower bound.

    It is literally impossible to argue that CHBS is an overestimate (at the time the comic was written), because we know that 4 words were randomly selected from a list of 2048 and then concatenated. If anything, that's a serious underestimate; a longer word list would be used in the real world.
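    The CHBS method described above can be sketched as follows. The word list is a hypothetical stand-in (a real generator would load an actual 2048-word list), and the sketch assumes words are picked uniformly and with replacement. The point it illustrates: the entropy comes from the selection method, not from which words happen to come out, so it holds even if the attacker has the list.

```python
import math
import secrets

def make_passphrase(wordlist, n_words=4):
    """Pick n_words uniformly at random (with replacement) and concatenate them."""
    return "".join(secrets.choice(wordlist) for _ in range(n_words))

def passphrase_bits(list_size, n_words=4):
    """Entropy fixed by the method: n_words * log2(list_size), even if the list is public."""
    return n_words * math.log2(list_size)

# Hypothetical stand-in for a real 2048-word list.
wordlist = [f"word{i}" for i in range(2048)]
print(make_passphrase(wordlist))
print(passphrase_bits(len(wordlist)))   # 44.0
```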

    And that's fine, well, and dandy.

    Except we know our method ensures random selection. So everything else you write is, no offence, complete bollocks.

    You realise you are now arguing against Diceware passwords? Using the same method, a Diceware password has an entropy of 12.9 bits per word (the list is 7776 words long), and that's the lower bound on the strength of the password.
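    Where the 12.9 bits comes from: Diceware picks each word by rolling five six-sided dice, and 6^5 = 7776 matches the list length. This sketch maps five rolls to an index from 0 to 7775, simulating the dice with Python's `secrets` module; the lookup into an actual Diceware list is left out.

```python
import math
import secrets

def roll_diceware_index():
    """Roll five fair six-sided dice and read them as a base-6 number: 0 .. 7775."""
    index = 0
    for _ in range(5):
        index = index * 6 + secrets.randbelow(6)  # one die face, shifted to 0..5
    return index

bits_per_word = math.log2(6 ** 5)   # 6**5 == 7776 equally likely words
print(f"{bits_per_word:.2f} bits per word")
print(f"{4 * bits_per_word:.1f} bits for four words")
```

    This prints 12.92 bits per word and 51.7 bits for a four-word passphrase, assuming fair dice.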

    I'm sorry if you don't agree because of how you perceive the strength of troubadour. But if you don't agree because you think correct horse battery staple is an overestimate, you really have no idea what you are discussing.
