Bug 970222 - Does not accurately evaluate use of multiple dictionary words
Status: NEW
Product: Fedora
Classification: Fedora
Component: libpwquality
Version: rawhide
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Assigned To: Tomas Mraz
QA Contact: Fedora Extras Quality Assurance
Keywords: FutureFeature
Reported: 2013-06-03 14:16 EDT by Brian Lane
Modified: 2013-08-31 14:18 EDT
Doc Type: Enhancement
Type: Bug
Description Brian Lane 2013-06-03 14:16:41 EDT
Single dictionary words are flagged as 'based on a dictionary word', but passwords built from multiple dictionary words are not, e.g. einsteinwashington.

Current password cracking software will find passwords like this very quickly, so giving them a high score leads users to think they are more secure than they really are.
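
To illustrate the kind of check being asked for, here is a minimal sketch that tests whether a password can be split entirely into dictionary words. It is not libpwquality's or cracklib's actual logic; the function name split_into_words and the tiny WORDS set are hypothetical, and a real check would load a full dictionary.

```python
def split_into_words(password, words):
    """Return a split of password into dictionary words, or None if none exists.

    Matching is case-insensitive; among possible splits, one with the fewest
    words is returned.
    """
    text = password.lower()
    # best[i] is a list of words that exactly covers text[:i], or None.
    best = [None] * (len(text) + 1)
    best[0] = []
    for end in range(1, len(text) + 1):
        for start in range(end):
            piece = text[start:end]
            if best[start] is not None and piece in words:
                candidate = best[start] + [piece]
                if best[end] is None or len(candidate) < len(best[end]):
                    best[end] = candidate
    return best[len(text)]


# Hypothetical stand-in for a real dictionary.
WORDS = {"einstein", "washington"}

print(split_into_words("einsteinwashington", WORDS))
print(split_into_words("EinsteinWashington", WORDS))
print(split_into_words("x7Qp2vLk", WORDS))
```

A password for which this returns a short list of words could then be scored by the number of words rather than by its length and character classes.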
Comment 1 Hubert Kario 2013-07-17 08:15:41 EDT
I don't think that default settings should reject passwords just because they are made of two dictionary words.

Even if we assume 10 bits of guessing entropy per word, that is still a password with 20 bits of entropy, enough to satisfy the NIST SP 800-63-1 requirement for a Level 2 system (higher levels don't allow single-factor auth with just a user-selected password).

It should detect passwords that are made of a dictionary word and a common password ("password", "iloveyou", etc.) though:

[root@localhost ~]# echo 'passwordWashington' | pwscore
100
[root@localhost ~]# echo 'iloveyouWashington' | pwscore
100
Comment 2 Tomas Mraz 2013-07-17 10:50:06 EDT
(In reply to Hubert Kario from comment #1)
> It should detect passwords that are made of a dictionary word and a common
> password ("password", "iloveyou", etc.) though:
> 
> [root@localhost ~]# echo 'passwordWashington' | pwscore
> 100
> [root@localhost ~]# echo 'iloveyouWashington' | pwscore
> 100

OK, that makes sense. Can you provide a short list of such common passwords?
Comment 3 Hubert Kario 2013-07-17 11:17:51 EDT
How short do you want it?

10, 100, 1000 entries? I should be able to provide meaningful lists up to 1000 entries...
Comment 4 Tomas Mraz 2013-07-17 11:48:07 EDT
That depends on how much entropy we want to add for such common password usage. Of course, a list sorted by commonality would allow us to assign lower entropy to the most common passwords and higher entropy to the less common ones. A list of 1000 entries already gives nearly 10 bits of entropy, though.
Comment 5 Hubert Kario 2013-07-17 12:03:39 EDT
I'd say that the top 10 words should be treated as an empty string = 0 bits. A user who selected one of them is very unlikely to use complex rules to choose the other parts or words of the password, so penalizing them for that might be a good idea.

I'll try to see how often they pop up (frequency) and then come up with entropy estimates for the top n (where n > 10 and depends on how easy it is to calculate :)
Comment 6 Hubert Kario 2013-07-18 11:34:47 EDT
Given real-world data from the RockYou list (32603388 accounts)
(count/password):
 290729 123456
  79076 12345
  76789 123456789
  59462 password
  49952 iloveyou
  33291 princess
  21725 1234567
  20553 12345678
  16648 abc123
  16227 nicole

If we assume that all of the top 10 passwords are equally likely to be chosen, we get log2(10) = ~3.32 bits of entropy per entry in the top 10 most used passwords.

If we instead calculate the Shannon information content (-log2 p) of each password from its specific likelihood within the top 10, we get:
1.19
3.07
3.11
3.48
3.73
4.32
4.93
5.01
5.32
5.36
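
As a cross-check, the figures above can be reproduced with a minimal sketch like the one below, assuming each probability is the password's count divided by the sum of the ten counts (not by the full 32603388 accounts). The names ROCKYOU_TOP10 and surprisal_bits are made up for this illustration.

```python
import math

# Counts copied from the RockYou top 10 listed in this comment.
ROCKYOU_TOP10 = [
    ("123456", 290729),
    ("12345", 79076),
    ("123456789", 76789),
    ("password", 59462),
    ("iloveyou", 49952),
    ("princess", 33291),
    ("1234567", 21725),
    ("12345678", 20553),
    ("abc123", 16648),
    ("nicole", 16227),
]


def surprisal_bits(counts):
    """Return (password, -log2(p)) pairs, with p normalised over `counts` only."""
    total = sum(n for _, n in counts)
    return [(pw, -math.log2(n / total)) for pw, n in counts]


for pw, bits in surprisal_bits(ROCKYOU_TOP10):
    print(f"{pw:10s} {bits:.2f}")

# Uniform assumption over the same ten entries, for comparison.
print(f"uniform    {math.log2(len(ROCKYOU_TOP10)):.2f}")
```

Normalising over a longer list makes every probability smaller, which is why the estimates for the first passwords grow with the size of the list.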

And for larger sets we would get even higher estimates for the first passwords (2.35, 4.23, ... for a top 100 list).

But we don't want to compress a list of passwords using Huffman coding...

We want to know the guessing entropy of a specific password. If the attacker tries the most common passwords in order of likelihood (always choosing the most likely remaining password), then the guessing entropy is equal to log2 of the password's position in the list:
0.00
1.00
1.58
2.00
2.32
2.58
2.81
3.00
3.17
3.32
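
The same numbers fall out of a one-line calculation. A minimal sketch, assuming a 1-based position in a frequency-sorted list; guessing_entropy_bits is a name made up for this illustration, not a libpwquality function.

```python
import math


def guessing_entropy_bits(position):
    """log2 of the 1-based position in the attacker's frequency-sorted list."""
    if position < 1:
        raise ValueError("position is 1-based")
    return math.log2(position)


for position in range(1, 11):
    print(f"{position:2d} {guessing_entropy_bits(position):.2f}")
```

Under this measure a password only reaches 10 bits at position 1024 of the list.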

So I'd say that the list of bad passwords should contain at least 2^10 entries (matching our estimate of 10 bits of entropy for words chosen by users).

But we should probably penalize them more: while RockYou is the biggest sample publicly available, the most common passwords in the Gawker leak (188281 accounts) are different:

(count/password)
3057    123456
1955    password
1119    12345678
661     lifehack (the name of the site)
418     qwerty
333     abc123
311     111111
300     monkey  (position 14 on RockYou)
273     consumer (the name of another site in the network)
253     12345
247     letmein (position 512 in RockYou list)
241     trustno1 (position 1001 in RockYou list)

Other passwords that rank high in Gawker (in the top 100) but low in RockYou (not in the top 200), with their RockYou positions:

RockYou pos. : password
 207   :bailey
 245   :banana
 256   :freedom
 259   :master
 266   :spiderman
 292   :mustang
 311   :fuckoff
 323   :internet
 334   :asdfgh
 345   :midnight
 374   :aaaaaa
 377   :welcome
 378   :metallica
 396   :jackson
 455   :scooter
 512   :letmein
 557   :phoenix
 610   :matrix
 669   :starwars
 677   :biteme
 706   :nothing
1001   :trustno1
1263   :passw0rd
1487   :nintendo
1598   :swordfish
1601   :blahblah
1856   :Password
2172   :qwerty123
3329   :asdf1234
3869   :asdfasdf
19374  :thx1138
149437 :gizmodog (gizmodo is the name of a site in the network)
173057 :consumer (so is consumer)
391382 :sample123 (I have no idea)
632836 :kotaku (again, the name of a site in the network)
