Bug 960230 - nss forgets about CAs intermittently (frequently after suspend)
Product: Fedora
Classification: Fedora
Component: p11-kit
Assigned To: Stef Walter
QA Contact: Fedora Extras Quality Assurance
Whiteboard: RejectedBlocker AcceptedFreezeException
Duplicates: 969463
Blocks: 466626 F19-accepted/F19FinalFreezeException
Reported: 2013-05-06 14:39 EDT by T.C. Hollingsworth
Modified: 2013-06-21 10:30 EDT
CC List: 13 users
Fixed In Version: p11-kit-0.18.3-1.fc19
Doc Type: Bug Fix
Last Closed: 2013-06-07 00:36:27 EDT
Type: Bug

Attachments
tail of firefox output with P11_KIT_DEBUG=all (171.34 KB, text/plain)
2013-06-04 20:22 EDT, T.C. Hollingsworth
Test case (1.51 KB, text/plain)
2013-06-05 04:39 EDT, Stef Walter

External Trackers
FreeDesktop.org: 65401
Description T.C. Hollingsworth 2013-05-06 14:39:50 EDT
Description of problem:
If I suspend my system while Firefox or Chrome is running, then resume and try to visit an HTTPS-encrypted site, I get a warning that the CA isn't trusted.  Closing and restarting the browser fixes the issue.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Open Firefox or Chrome
2. Suspend the system
3. Resume, then attempt to visit an HTTPS-encrypted site.
Actual results:
Get a warning from the browser that the CA isn't trusted.

Expected results:
Site loads normally.

Additional info:
I experience this with both Mozilla Firefox from the Fedora repo and Google Chrome from Google's repo.
Comment 1 Andrew Hutchings 2013-05-07 13:18:57 EDT
If it helps I get this randomly happening every couple of days without suspend.  It also affects Thunderbird.
Comment 3 T.C. Hollingsworth 2013-05-30 23:37:39 EDT
It happens to me randomly sometimes too, but suspending reproduces it every time.

I'm proposing as a Fedora 19 Final Blocker somewhat based on criteria 17:

"All applications listed under the Applications menu or category must withstand a basic functionality test and not crash after a few minutes of normal use. They must also have working Help and Help -> About menu items"

It's not exactly crashing, but having Firefox start spewing security warnings on legitimate sites randomly on our live images isn't very nice.  This bug keeps growing CCs so I'd like it to get a little more attention.

Comment 4 Adam Williamson 2013-06-03 14:21:03 EDT
Discussed at 2013-06-03 blocker review meeting: http://meetbot.fedoraproject.org/fedora-blocker-review/2013-06-03/f19final-blocker-review-2.2013-06-03-16.00.log.txt .

This doesn't appear to be affecting all cases - fr'instance, I have two F19 systems which I suspend/resume all the time and I've only seen something which _may_ have been this bug one time on one of the machines (never on the other).

It's also unclear that suspending/resuming live images is a case we care about a lot (suspending/resuming lives is kinda dicey in general) and so the 'blocker' justification for this is unclear.

But we decided to punt on the decision until we get some feedback from the nss maintainer on what they think may be causing this and how big of a problem it is.
Comment 5 Kai Engert (:kaie) 2013-06-03 14:36:35 EDT
> But we decided to punt on the decision until we get some feedback from the
> nss maintainer on what they think may be causing this and how big of a
> problem it is.

I still haven't been able to reproduce.

Your hint that, for you, it happens with a live image, only, is helpful. I'll test that.

However, we also have a similar bug report, 969463, which doesn't mention suspending - so the issue might be more general.
Comment 6 T.C. Hollingsworth 2013-06-03 16:30:31 EDT
Okay, so suspending *used* to reproduce it every time but it seems with more recent updates it doesn't anymore.  (That being said, I only had to put my computer to sleep 3 times to get it to trigger.  ;-)  I've also noticed it happening slightly less often randomly.

Modifying the summary since multiple users have reported that this affects them randomly, not just on suspend.  bug 969463 might well be a dupe, though it seems to happen to Dan constantly instead of just randomly like with us.

This problem does *not* solely affect live images, I only mentioned that this would be a nasty bug to have permanently affect them for an entire release, given that Firefox is a thing a lot of people use in them quite frequently.
Comment 7 T.C. Hollingsworth 2013-06-03 16:58:06 EDT
Oh, and it was really bad (and happened to me on suspend all the time) with the Alpha package set, so you might want to start there if you're having trouble reproducing with more up-to-date F19.  (It seems it has gotten better but hasn't gone away...)
Comment 8 Adam Williamson 2013-06-03 17:03:50 EDT
I've been running F19 since well before Alpha release and, as I said, have seen something that could _possibly_ have been this (it saw the certificate on my own domain, happyassassin.net, as invalid until I restarted firefox) only once in that whole span.
Comment 9 T.C. Hollingsworth 2013-06-03 17:10:30 EDT
Hmmm...do you have Chrome installed?  What about you Andrew/other CCers?

I wonder if that screwed up nss somehow; it's the only nonstandard piece of the puzzle on my system.

Wish I could get rid of it, I use Firefox mostly but need it around for webdevvy testing.  Guess I could get along with just using it in a Windows VM, but ugh.  :-(
Comment 10 Adam Williamson 2013-06-03 17:28:47 EDT
Nope, no Chrome here, I'm a strict Mozilla fanboy :)
Comment 11 Andrew Hutchings 2013-06-03 18:06:18 EDT
Yep, I have Chrome and it happens in that, but it happened before I had Chrome installed.

I did have Firefox/Thunderbird and it affected both.  I now primarily use Chrome and Evolution and this problem affects both of them too.
Comment 12 Kai Engert (:kaie) 2013-06-03 20:05:37 EDT
Do you use a network filesystem?

Is /tmp mounted from an unusual place?
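
One way to answer both questions from a terminal is sketched below. It is only an illustration: mount_for_path is a hypothetical helper name, and it assumes the usual Linux /proc/mounts layout (device, mount point, file system type, ...).

```shell
# Print "device fstype" for the mount that contains the given path.
# The second argument defaults to /proc/mounts; it exists only so the
# function can be tried against a sample file.
mount_for_path() {
    path=$1
    mounts=${2:-/proc/mounts}
    awk -v p="$path" '
        {
            mp = $2
            # A mount point matches if it is the root file system,
            # the path itself, or a parent directory of the path.
            if (mp == "/" || p == mp || index(p, mp "/") == 1) {
                # Prefer the longest mount point; on a tie the later
                # entry wins, since it was mounted on top.
                if (length(mp) >= length(best)) {
                    best = mp
                    line = $1 " " $3
                }
            }
        }
        END { if (line != "") print line }
    ' "$mounts"
}
```

For example, `mount_for_path /tmp` and `mount_for_path "$HOME"` print the device and file system type backing each path; a type such as nfs or cifs would point at a network file system.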
Comment 13 Kai Engert (:kaie) 2013-06-03 20:17:28 EDT
I really wish I could reproduce, but I can't, so I'll have to ask more questions.

Given that you can fix your issue by restarting the application, I suspect a runtime error in the p11-kit-trust.so library. This is just a wild guess at this time.

As a first step, let's attempt to verify that an application in a broken state has indeed lost its trust list.

Let's use firefox first. While it works, please go to
  edit/preferences/advanced/encryption, click "security devices"
On the left hand side, you should see two entries "System Trust" and "Default Trust".
Close the dialog, click "view certificates", select the "Authorities" tab.
You should see many entries that say "Default Trust" in the second column.
One after the other, click a few of them, and click the "edit trust" button on each. DON'T HIT OK. Just verify that most of them have checkboxes enabled, and hit cancel.
Close the cert manager.

The above was for your education, and to see the expected state.

Once you get into the broken state, please reopen the "security devices" dialog.
Do you still have the default trust and system trust entries?
If you click those entries, does the right hand side still show some version information?
Now again open the certificate manager, authorities tab. Do you still have a lot of "default trust" entries? If you do, and you click edit trust, do they still have the checkboxes checked? (don't click ok, click cancel).

I'm asking these questions in the hope that you'll tell me that things look different once firefox has forgotten about CAs.
Comment 14 Kai Engert (:kaie) 2013-06-03 20:25:33 EDT
Could you start the affected applications from a terminal, set an environment variable, and redirect output to a file? Stef, who created the new p11-kit-trust module, suggested setting P11_KIT_DEBUG=all - however, that produces a large amount of output.

export P11_KIT_DEBUG=all
firefox -no-remote > log 2>&1

Once it fails (and after you have run the tests I asked for above), quit firefox and inspect the file. Hopefully the sections at the end will give us a clue.
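
If you would rather not export the variable for the whole shell session (every p11-kit-using program started from that shell afterwards would also log), a small wrapper can set it for one command only. This is a sketch; run_with_p11_debug is a hypothetical helper name, not part of p11-kit.

```shell
# Run a command with P11_KIT_DEBUG=all set only for that command,
# sending both stdout and stderr to the given log file.
run_with_p11_debug() {
    logfile=$1
    shift
    # The assignment prefix applies to this one command's environment;
    # the calling shell's environment is left unchanged.
    P11_KIT_DEBUG=all "$@" > "$logfile" 2>&1
}
```

For example: `run_with_p11_debug log firefox -no-remote`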
Comment 15 T.C. Hollingsworth 2013-06-03 20:29:30 EDT
(In reply to Kai Engert (:kaie) from comment #12)
> Do you use a network filesystem?

> Is /tmp mounted from an unusual place?

The only "unusual" thing I have done is `systemctl mask tmp.mount` so /tmp is on disk instead of on tmpfs.

I'll try out the other stuff you mention later, unfortunately a fire just landed on my lap so I can't do it right now.  :-(
Comment 16 Andrew Hutchings 2013-06-04 02:15:35 EDT
No network file system, tmpfs on /tmp type tmpfs (rw).  Main disk is an SSD using ext4.

I'll run the debugging today and hopefully it will retrigger in the browser.
Comment 17 Francis Kong 2013-06-04 12:21:52 EDT
I think I have encountered a similar situation, where reinstalling ca-certificates and restarting Firefox worked around the issue for Firefox.  I'm not sure how to work around the same issue with Chromium, however.
Comment 18 T.C. Hollingsworth 2013-06-04 16:51:07 EDT
I just hit this again.  Unfortunately, I didn't have the debug output enabled, but I did check this:

(In reply to Kai Engert (:kaie) from comment #13)
> Now again open the certificate manager, authorities tab. Do you still have a
> lot of "default trust" entries? If you do, and you click edit trust, do they
> still have the checkboxes checked? (don't click ok, click cancel).

These trust entries did *not* change after I encountered the issue.

I've got that debug option enabled now so hopefully I'll be able to report back soon with more details.
Comment 19 Adam Williamson 2013-06-04 16:55:28 EDT
Funnily enough something that looked like this happened to me (second time ever) this morning - FF was claiming my bank and my own domain were 'untrusted' (probably would've affected any https I opened, I guess). Re-starting FF fixed it.
Comment 20 T.C. Hollingsworth 2013-06-04 17:09:28 EDT
Yeah, this is a heisenbug if I've ever seen one.  I guess that's what we get for using physics jokes for release names.  ;-)

Before, I used to be able to reproduce it every time just by sleeping (or at least twice in a row immediately prior to filing the bug ;-), and yesterday I managed to do it within three tries, but just now I tried sleeping a dozen times and hibernating twice but no dice.  :-(

I'll keep using Firefox normally with that debug option enabled and hopefully I'll hit it eventually.  Otherwise, later tonight when I have more time I'll try booting from the F19 Alpha Live I installed from and see if I can repro it faster that way with debug output enabled.
Comment 21 T.C. Hollingsworth 2013-06-04 20:22:54 EDT
Created attachment 757014 [details]
tail of firefox output with P11_KIT_DEBUG=all

Here you go; I just hit this with https://twitter.com/.
Comment 22 Stef Walter 2013-06-05 01:33:14 EDT
(In reply to T.C. Hollingsworth from comment #21)
> Created attachment 757014 [details]
> tail of firefox output with P11_KIT_DEBUG=all
> Here you go; I just hit this with https://twitter.com/.

Unfortunately we'll need more than just the tail. I can see an error occurring, but it's occurring from the first line on. I'd like to see what's causing the issue. Would you be able to redirect the debug output to a file as Kai was suggesting:

export P11_KIT_DEBUG=all
firefox -no-remote > log 2>&1
Comment 23 T.C. Hollingsworth 2013-06-05 02:03:49 EDT
Sorry, Kai made it seem like you wanted it cut off, because it really is gigantic. (Bugzilla even choked on it.)

Here's the full log:
Comment 24 Stef Walter 2013-06-05 04:10:14 EDT
Thanks, that is very helpful information.
Comment 25 Stef Walter 2013-06-05 04:39:18 EDT
Created attachment 757077 [details]
Test case

This is a test case which should print the following in the presence of this bug:

  reinitialization bug present
Comment 26 Stef Walter 2013-06-05 04:51:56 EDT
Patch posted upstream.
Comment 27 Fedora Update System 2013-06-05 08:46:17 EDT
p11-kit-0.18.3-1.fc19 has been submitted as an update for Fedora 19.
Comment 28 Stef Walter 2013-06-05 10:32:43 EDT
Could you test this update, and post positive karma if it fixes the issue?


It fixes other (minor) things as well, so no need to leave negative karma unless it breaks.

Thank you.
Comment 29 Fedora Update System 2013-06-05 12:52:01 EDT
Package p11-kit-0.18.3-1.fc19:
* should fix your issue,
* was pushed to the Fedora 19 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing p11-kit-0.18.3-1.fc19'
as soon as you are able to.
Please go to the following url:
then log in and leave karma (feedback).
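
After updating, a quick check that the installed p11-kit is at least the fixed version can be sketched as follows. version_at_least is a hypothetical helper name, and it assumes a sort that supports -V (GNU coreutils does). It compares only the upstream version, not the package release.

```shell
# Succeed if version $1 is greater than or equal to version $2.
version_at_least() {
    # Sort the two versions in version order; if the smaller one is $2
    # (or both are equal), then $1 is new enough.
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}
```

For example: `version_at_least "$(rpm -q --qf '%{VERSION}\n' p11-kit | head -n 1)" 0.18.3 && echo "fixed version installed"`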
Comment 30 Adam Williamson 2013-06-05 13:29:32 EDT
Discussed at 2013-06-05 blocker review meeting: http://meetbot.fedoraproject.org/fedora-blocker-review/2013-06-05/f19final-blocker-review-3.2013-06-05-16.05.log.txt . As things stand we thought this was uncommon enough - and possibly related to suspending - that we don't need to block the release for it (plus, since there's an update in already, it's probably kind of a moot point). So for now it's rejected as a blocker but accepted as a freeze exception issue.

If this is somehow still a problem closer to release and we start worrying about the impact, we can re-propose it.
Comment 31 Stef Walter 2013-06-05 15:00:19 EDT
For those interested in testing:

 * Please run the above test case. Build command is at top of file.
   Requires gcc and p11-kit-devel packages be installed.

 * Run firefox in logging mode as above. Look for the line below:
   (p11-kit:2843) p11_kit_initialize_registered: out: 0

   The line won't show up immediately, or all the time. That line appearing
   should have similar reproducibility characteristics as this bug has had.

   With the bug, all CA lookups past that point would fail. Without the bug
   they should continue as expected.

   I would be interested in such logs.
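
To locate that line in a saved log, something like the following should do. It is a sketch: find_reinit_lines is a hypothetical helper name, and the search leaves out the "(p11-kit:2843)" prefix on the assumption that the number is a process ID that differs from run to run.

```shell
# Print every occurrence of the reinitialization line in a
# P11_KIT_DEBUG log, each prefixed with its line number.
find_reinit_lines() {
    # -F matches the text literally; -n adds the line number.
    grep -n -F 'p11_kit_initialize_registered: out: 0' "$1"
}
```

For example, `find_reinit_lines log` shows where in the log the call happened, so the CA lookups before and after it can be compared.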

If anyone wants to dig even deeper, it would be really interesting to see a firefox backtrace for calls into the function p11_kit_initialize_registered(). In any case, this call should have worked, and it was my oversight that it caused this problem.
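
A possible way to collect such backtraces with gdb is sketched below. Nothing here comes from the p11-kit sources: write_reinit_gdb_script and the file name reinit.gdb are made up for illustration, and useful backtraces will likely need the matching debuginfo packages installed.

```shell
# Write a gdb command file that prints a backtrace each time
# p11_kit_initialize_registered() is called, then lets the process go on.
write_reinit_gdb_script() {
    cat > "$1" <<'EOF'
set pagination off
set breakpoint pending on
break p11_kit_initialize_registered
commands
  backtrace
  continue
end
continue
EOF
}
```

After `write_reinit_gdb_script reinit.gdb`, attaching to an already running firefox would look roughly like `gdb -batch -x reinit.gdb -p PID > backtrace.log 2>&1`, where PID is the process ID of firefox.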
Comment 32 Stef Walter 2013-06-06 10:09:26 EDT
Adam, the update is now in updates-pending and shortly in Fedora updates. If this update needs to make it into the RC builds, then please let me know what action to take.
Comment 33 Adam Williamson 2013-06-06 15:15:36 EDT
stef: as noted via IRC, no-one needs to take any special action at this point as we are not in freeze, you can manage the update just like usual. For the record, if you have a blocker or FE bug once we're past freeze, you as the developer still don't really need to do anything special: just fix it, and submit the fix as an update. QA and releng will take things from there and take care of making sure it gets through the freeze.
Comment 34 Stef Walter 2013-06-06 15:30:48 EDT
(In reply to Adam Williamson from comment #33)
> stef: as noted via IRC, no-one needs to take any special action at this
> point 
> ...

Ok. Good to know. Thanks.
Comment 35 Fedora Update System 2013-06-07 00:36:27 EDT
p11-kit-0.18.3-1.fc19 has been pushed to the Fedora 19 stable repository.  If problems still persist, please make note of it in this bug report.
Comment 36 Stef Walter 2013-06-21 10:30:44 EDT
*** Bug 969463 has been marked as a duplicate of this bug. ***
