Red Hat Bugzilla – Bug 970222
Does not accurately evaluate use of multiple dictionary words
Last modified: 2013-08-31 14:18:22 EDT
Single dictionary words are flagged as 'based on a dictionary word', but concatenations of multiple dictionary words are not, e.g. einsteinwashington.
Current password-cracking software will find passwords like this very quickly, so giving them a high score leads users to believe they are more secure than they really are.
I don't think that default settings should reject passwords just because they are made of two dictionary words.
Even if we assume only 10 bits of guessing entropy per word, it's still a password with 20 bits of entropy -- enough to satisfy the NIST SP 800-63-1 requirement for a Level 2 system (higher levels don't allow single-factor authentication with just a user-selected password).
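As a rough check of that arithmetic (the ~1000-word effective dictionary is an assumption on my part, chosen to yield roughly 10 bits per word):

```python
import math

# Assumption: each word is drawn roughly uniformly from ~1000 common words,
# giving about 10 bits of guessing entropy per word.
WORDS_IN_EFFECTIVE_DICT = 1000
bits_per_word = math.log2(WORDS_IN_EFFECTIVE_DICT)  # ~9.97 bits
total_bits = 2 * bits_per_word                      # two concatenated words
print(round(total_bits, 1))                         # 19.9 -- close to 20 bits
```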
It should, though, detect passwords that are made of a dictionary word plus a common password ("password", "iloveyou", etc.):
[root@localhost ~]# echo 'passwordWashington' | pwscore
[root@localhost ~]# echo 'iloveyouWashington' | pwscore
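A minimal sketch of the kind of check being requested; the word list here is purely illustrative, not libpwquality's actual dictionary:

```python
# Illustrative only: a real implementation would consult the cracklib
# dictionary used by libpwquality, not this tiny hard-coded set.
COMMON_PASSWORDS = {"password", "iloveyou", "123456", "qwerty"}

def contains_common_password(candidate: str) -> bool:
    """Return True if any very common password appears inside the candidate."""
    lowered = candidate.lower()
    return any(common in lowered for common in COMMON_PASSWORDS)

print(contains_common_password("passwordWashington"))  # True
print(contains_common_password("einsteinwashington"))  # False
```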
(In reply to Hubert Kario from comment #1)
> It should, though, detect passwords that are made of a dictionary word plus
> a common password ("password", "iloveyou", etc.):
> [root@localhost ~]# echo 'passwordWashington' | pwscore
> [root@localhost ~]# echo 'iloveyouWashington' | pwscore
OK, that makes sense. Can you provide a (short) list of such common passwords?
How short do you want?
10, 100, 1000 entries? I should be able to provide meaningful lists of up to 1000 entries...
That depends on how much entropy we want to assign for such common-password usage. Of course, a list sorted by commonality would allow assigning lower entropy to the most common passwords and higher entropy to the less common (but still common) ones. A 1000-entry list already covers nearly 10 bits of entropy, though.
I'd say that the top 10 words should be treated as an empty string = 0 bits -- a user who selected one of them is very likely not to use complex rules to select the other words in the password, so penalizing them for that might be a good idea.
I'll try to see how often they pop up (frequency) and then come up with entropy estimates for the top n (where n > 10 and depends on how easy it is to calculate :)
Given real-world data from the RockYou list (32603388 accounts):
If we assume that all of the top 10 passwords are equally likely to be chosen, we get ~3.32 bits of entropy per entry in the top-10 list.
If we calculate the Shannon entropy of the passwords using their specific likelihoods, we would get:
And for larger sets we would get even higher estimates for the first passwords (2.35, 4.23, ... for a top 100 list).
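The per-entry figures above follow from Shannon's formula; a sketch of the calculation (the skewed counts below are placeholders, not the actual RockYou frequencies):

```python
import math

def shannon_bits(counts):
    """Shannon entropy in bits of a distribution given raw occurrence counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts)

# With equal counts the formula reduces to log2(n): ~3.32 bits for a top-10 list.
print(round(shannon_bits([1] * 10), 2))  # 3.32
# Skewed counts (placeholder values) pull the entropy below log2(n).
print(round(shannon_bits([5000, 2000, 1000, 500, 250]), 2))
```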
But we don't want to compress a list of passwords using Huffman coding...
We want to know the guessing entropy of a specific password. If the attacker tries candidates from the list of most common passwords in order of likelihood (always trying the most likely remaining password first), then the guessing entropy will be equal to log2 of the password's position in the list:
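In code, that model is just the following (rank numbering taken as 1-based, which is an assumption on my part):

```python
import math

def guessing_entropy_bits(rank: int) -> float:
    """Bits of guessing entropy for a password at 1-based position `rank`
    in an attacker's frequency-ordered candidate list."""
    return math.log2(rank)

print(guessing_entropy_bits(1))     # 0.0  -- the single most common password
print(guessing_entropy_bits(1024))  # 10.0 -- matches the ~10-bit word estimate
```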
So I'd say that the list of bad passwords should have at least 2^10 entries (our estimate for the entropy of words chosen by users).
But we should probably penalize them more: while RockYou is the biggest sample available publicly, the most common passwords in the Gawker leak (188281 accounts) are different:
661 lifehack (the name of the site)
300 monkey (position 14 in the RockYou list)
273 consumer (the name of another site in the network)
247 letmein (position 512 in the RockYou list)
241 trustno1 (position 1001 in the RockYou list)
Other passwords high up in Gawker (in the top 100) but low (not in the top 200) in RockYou:
Pos. in RockYou : password
149437 : gizmodog (gizmodo is the name of a site in the network)
173057 : consumer (so is consumer)
391382 : sample123 (I have no idea)
632836 : kotaku (again, the name of a site in the network)