Single dictionary words are flagged as 'based on a dictionary word', but passwords made of multiple dictionary words are not, e.g.:

einsteinwashington

Current password cracking software will find passwords like this very quickly, so giving them a high score leads users to think they are more secure than they really are.
I don't think that the default settings should reject passwords just because they are made of two dictionary words. Even if we assume 10 bits of guessing entropy per word, it's still a password with 20 bits of entropy -- enough to satisfy the NIST SP 800-63-1 requirement for a Level 2 system (higher levels don't allow single-factor auth with just a user-selected password).

It should detect passwords that are made of a dictionary word and a common password ("password", "iloveyou", etc.) though:

[root@localhost ~]# echo 'passwordWashington' | pwscore
100
[root@localhost ~]# echo 'iloveyouWashington' | pwscore
100
(In reply to Hubert Kario from comment #1)
> It should detect passwords that are made of dictionary word and a common
> password ("password", "iloveyou", etc.) though:
>
> [root@localhost ~]# echo 'passwordWashington' | pwscore
> 100
> [root@localhost ~]# echo 'iloveyouWashington' | pwscore
> 100

OK, that makes sense, can you provide a list (short) of such common passwords?
How short do you want? 10, 100, 1000 entries? I should be able to provide meaningful lists up to 1000 entries...
That depends on how much entropy we want to add for such common password usage. Of course, a list sorted by commonality would allow assigning lower entropy to the most common passwords and higher entropy to the less common (but still common) ones. 1000 entries already gives nearly 10 bits of entropy, though.
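As a quick check of the "nearly 10 bits" figure: if every entry in a list is assumed to be equally likely, the entropy per entry is log2 of the list size. The sketch below only illustrates that arithmetic for the list sizes mentioned in this thread; the function name is made up for the example and is not part of libpwquality.

```python
import math


def uniform_list_entropy_bits(list_size: int) -> float:
    """Entropy in bits of one entry picked uniformly from a list.

    Assumes every entry is equally likely, which is a simplification.
    """
    return math.log2(list_size)


for size in (10, 100, 1000):
    print(f"{size:5d} entries -> {uniform_list_entropy_bits(size):.2f} bits")
```

For 1000 entries this comes out just under 10 bits, consistent with the estimate above.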
I'd say that the top 10 words should be treated as an empty string = 0 bits -- a user who selected them is very likely not to use complex rules to select the other rules or words in the password, so penalizing them for that might be a good idea. I'll try to see how often they pop up (frequency) and then come up with entropy estimates for the top n (where n > 10 and depends on how easy it is to calculate :)
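One way to read "treated as an empty string" is to strip the most common passwords out of the candidate before scoring whatever is left. The following is only a rough sketch of that idea, not how libpwquality works: the function name is hypothetical, the ten entries are the RockYou top 10 quoted in this thread, and matching is assumed to be case-insensitive.

```python
# Hypothetical zero-entropy list: the RockYou top 10 quoted in this thread.
COMMON_PASSWORDS = (
    "123456", "12345", "123456789", "password", "iloveyou",
    "princess", "1234567", "12345678", "abc123", "nicole",
)


def strip_common_passwords(candidate: str, common=COMMON_PASSWORDS) -> str:
    """Return the candidate with every common password removed.

    The result is lowercased. Longer entries are removed first so that
    "123456789" is not left as "6789" after removing "12345".
    """
    remainder = candidate.lower()
    for word in sorted(common, key=len, reverse=True):
        remainder = remainder.replace(word, "")
    return remainder


# Only the remainder would then be passed on to the normal checks.
print(strip_common_passwords("passwordWashington"))
print(strip_common_passwords("iloveyouWashington"))
```

With this approach both examples from comment #1 reduce to a single dictionary word, which the existing dictionary check would presumably flag. A single pass like this can miss a common password that only appears after another one has been removed from the middle of the string.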
Given real-world data from the RockYou list (32603388 accounts) (count/password):

290729 123456
79076 12345
76789 123456789
59462 password
49952 iloveyou
33291 princess
21725 1234567
20553 12345678
16648 abc123
16227 nicole

If we assume that all of the top 10 passwords are equally likely to be chosen, we get ~3.32 bits of entropy per entry in the top 10 most used passwords. If we instead calculate the Shannon information content of each password from its specific likelihood within the top 10, we get:

1.19
3.07
3.11
3.48
3.73
4.32
4.93
5.01
5.32
5.36

And for larger sets we would get even higher estimates for the first passwords (2.35, 4.23, ... for a top 100 list).

But we don't want to compress a list of passwords using Huffman coding... We want to know the guessing entropy of a specific password. If the attacker tries the most common passwords in order of likelihood (always choosing the most likely remaining password), then the guessing entropy is equal to log2 of the password's position in the list:

0.00
1.00
1.58
2.00
2.32
2.58
2.81
3.00
3.17
3.32

So I'd say that the list of bad passwords should contain at least 2^10 entries (our estimate for the entropy of words chosen by users). But we should probably penalize them more: while RockYou is the biggest sample publicly available, the most common passwords in the Gawker leak (188281 accounts) are different (count/password):

3057 123456
1955 password
1119 12345678
661 lifehack (the name of the site)
418 qwerty
333 abc123
311 111111
300 monkey (position 14 on RockYou)
273 consumer (the name of another site in the network)
253 12345
247 letmein (position 512 in the RockYou list)
241 trustno1 (position 1001 in the RockYou list)

Other passwords that are high up in Gawker (in the top 100) but low (not in the top 200) in RockYou, with their RockYou position:

Pos. : password
207 : bailey
245 : banana
256 : freedom
259 : master
266 : spiderman
292 : mustang
311 : fuckoff
323 : internet
334 : asdfgh
345 : midnight
374 : aaaaaa
377 : welcome
378 : metallica
396 : jackson
455 : scooter
512 : letmein
557 : phoenix
610 : matrix
669 : starwars
677 : biteme
706 : nothing
1001 : trustno1
1263 : passw0rd
1487 : nintendo
1598 : swordfish
1601 : blahblah
1856 : Password
2172 : qwerty123
3329 : asdf1234
3869 : asdfasdf
19374 : thx1138
149437 : gizmodog (Gizmodo is the name of a site in the network)
173057 : consumer (so is consumer)
391382 : sample123 (I have no idea)
632836 : kotaku (again, the name of a site in the network)
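The two sets of numbers for the RockYou top 10 can be reproduced with a few lines of Python. This is just a sketch of the arithmetic described above, under the assumption that the per-password figures are -log2(p) with p taken relative to the top 10 only (that assumption matches the quoted values); the function names are invented for the example.

```python
import math

# (count, password) pairs for the RockYou top 10, as quoted above.
ROCKYOU_TOP10 = [
    (290729, "123456"), (79076, "12345"), (76789, "123456789"),
    (59462, "password"), (49952, "iloveyou"), (33291, "princess"),
    (21725, "1234567"), (20553, "12345678"), (16648, "abc123"),
    (16227, "nicole"),
]


def surprisal_bits(counts):
    """-log2(p) per entry, with p = count / sum of all listed counts."""
    total = sum(counts)
    return [-math.log2(count / total) for count in counts]


def guessing_entropy_bits(rank):
    """log2 of the 1-based position in a list sorted by likelihood."""
    return math.log2(rank)


counts = [count for count, _ in ROCKYOU_TOP10]
rows = zip(ROCKYOU_TOP10, surprisal_bits(counts))
for rank, ((_, password), bits) in enumerate(rows, start=1):
    guess = guessing_entropy_bits(rank)
    print(f"{rank:2d} {password:<10} {bits:5.2f} {guess:5.2f}")
```

The second column grows much faster than the third for the first few entries, which is the gap between the two estimates discussed above: position-based guessing entropy gives the most common password 0 bits, while the likelihood-based figure still credits it with more than 1 bit.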
We are not going to pursue this RFE at this point. If anyone wishes to work on this, I'd suggest creating a pull request on libpwquality upstream: https://github.com/libpwquality/libpwquality