Red Hat Bugzilla – Bug 960230
nss forgets about CAs intermittently (frequently after suspend)
Last modified: 2013-06-21 10:30:44 EDT
Description of problem:
If I suspend my system while Firefox or Chrome is running, then resume and try to visit an HTTPS-encrypted site, I get a warning that the CA isn't trusted. Closing and restarting the browser fixes the issue.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Open Firefox or Chrome
2. Suspend the system
3. Attempt to visit an HTTPS-encrypted site.
Actual results:
Get a warning from the browser that the CA isn't trusted.
Expected results:
Site loads normally.
I experience this with both Mozilla Firefox from the Fedora repo and Google Chrome from Google's repo.
If it helps, this also happens to me randomly every couple of days without suspending. It also affects Thunderbird.
It happens to me randomly sometimes too, but suspending reproduces it every time.
I'm proposing this as a Fedora 19 Final Blocker, based loosely on criterion 17:
"All applications listed under the Applications menu or category must withstand a basic functionality test and not crash after a few minutes of normal use. They must also have working Help and Help -> About menu items"
It's not exactly crashing, but having Firefox start spewing security warnings on legitimate sites randomly on our live images isn't very nice. This bug keeps growing CCs so I'd like it to get a little more attention.
Discussed at 2013-06-03 blocker review meeting: http://meetbot.fedoraproject.org/fedora-blocker-review/2013-06-03/f19final-blocker-review-2.2013-06-03-16.00.log.txt .
This doesn't appear to be affecting all cases - fr'instance, I have two F19 systems which I suspend/resume all the time and I've only seen something which _may_ have been this bug one time on one of the machines (never on the other).
It's also unclear that suspending/resuming live images is a case we care about a lot (suspending/resuming lives is kinda dicey in general) and so the 'blocker' justification for this is unclear.
But we decided to punt on the decision until we get some feedback from the nss maintainer on what they think may be causing this and how big of a problem it is.
> But we decided to punt on the decision until we get some feedback from the
> nss maintainer on what they think may be causing this and how big of a
> problem it is.
I still haven't been able to reproduce.
Your hint that, for you, it happens only with a live image is helpful. I'll test that.
However, we also have a similar bug report, 969463, which doesn't mention suspending, so the issue might be more general.
Okay, so suspending *used* to reproduce it every time but it seems with more recent updates it doesn't anymore. (That being said, I only had to put my computer to sleep 3 times to get it to trigger. ;-) I've also noticed it happening slightly less often randomly.
Modifying the summary since multiple users have reported that this affects them randomly, not just on suspend. bug 969463 might well be a dupe, though it seems to happen to Dan constantly instead of just randomly like with us.
This problem does *not* solely affect live images, I only mentioned that this would be a nasty bug to have permanently affect them for an entire release, given that Firefox is a thing a lot of people use in them quite frequently.
Oh, and it was really bad (and happened to me on suspend all the time) with the Alpha package set, so you might want to start there if you're having trouble reproducing with more up-to-date F19. (It seems it has gotten better but hasn't gone away...)
I've been running F19 since well before Alpha release and, as I said, have seen something that could _possibly_ have been this (it saw the certificate on my own domain, happyassassin.net, as invalid until I restarted firefox) only once in that whole span.
Hmmm...do you have Chrome installed? What about you Andrew/other CCers?
I wonder if that screwed up nss somehow; it's the only nonstandard piece of the puzzle on my system.
Wish I could get rid of it, I use Firefox mostly but need it around for webdevvy testing. Guess I could get along with just using it in a Windows VM, but ugh. :-(
Nope, no Chrome here, I'm a strict Mozilla fanboy :)
Yep, I have Chrome and it happens in that, but it happened before I had Chrome installed.
I did have Firefox/Thunderbird and it affected both. I now primarily use Chrome and Evolution and this problem affects both of them too.
Do you use a network filesystem?
Is /tmp mounted from an unusual place?
I really wish I could reproduce, but I can't, so I'll have to ask more questions.
Given that you can fix your issue by restarting the application, I suspect a runtime error in the p11-kit-trust.so library. This is just a wild guess at this time.
As a first step, let's attempt to verify that an application in a broken state has indeed lost its trust list.
Let's use Firefox first. While it still works, please go to edit/preferences/advanced/encryption and click "security devices".
On the left hand side, you should see two entries "System Trust" and "Default Trust".
Close the dialog, click "view certificates", select the "Authorities" tab.
You should see many entries that say "Default Trust" in the second column.
One after the other, click a few of them, and click the "edit trust" button on each. DON'T HIT OK. Just verify that most of them have checkboxes enabled, and hit cancel.
Close the cert manager.
The above was for your education, and to see the expected state.
Once you get into the broken state, please reopen the "security devices" dialog.
Do you still have the default trust and system trust entries?
If you click those entries, does the right hand side still show some version information?
Now again open the certificate manager, authorities tab. Do you still have a lot of "default trust" entries? If you do, and you click edit trust, do they still have the checkboxes checked? (don't click ok, click cancel).
I'm asking these questions in the hope that you'll tell me that things look different once Firefox has forgotten about the CAs.
Could you start the affected applications from a terminal, set an environment variable, and redirect output to a file? Stef, who created the new p11-kit-trust module, suggested setting P11_KIT_DEBUG=all; however, that produces a large amount of output.
firefox -no-remote > log 2>&1
Once it fails (and after you have done the tests I asked for above), quit Firefox and look through the file. Hopefully the sections at the end will give us a clue.
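One way to combine the two steps in this comment (setting the variable and redirecting output) is sketched below. The helper name `run_with_p11_debug` is made up for illustration; only `P11_KIT_DEBUG=all` and the `firefox -no-remote > log 2>&1` redirection come from the comment itself.

```shell
# Hypothetical helper: run a command with p11-kit debug output captured to a file.
# Usage: run_with_p11_debug LOG_FILE COMMAND [ARGS...]
run_with_p11_debug() {
    log_file=$1
    shift
    # P11_KIT_DEBUG=all is set only for this one command.
    # Both stdout and stderr are written to the log file.
    P11_KIT_DEBUG=all "$@" > "$log_file" 2>&1
}

# Example invocation (not executed here):
#   run_with_p11_debug log firefox -no-remote
```

Because the variable is set only for the launched command, it does not stay set in the terminal session afterwards.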
(In reply to Kai Engert (:kaie) from comment #12)
> Do you use a network filesystem?
> Is /tmp mounted from an unusual place?
The only "unusual" thing I have done is `systemctl mask tmp.mount` so /tmp is on disk instead of on tmpfs.
I'll try out the other stuff you mention later, unfortunately a fire just landed on my lap so I can't do it right now. :-(
No network file system, tmpfs on /tmp type tmpfs (rw). Main disk is an SSD using ext4.
I'll run the debugging today and hopefully it will retrigger in the browser.
I think I have encountered a similar situation; reinstalling ca-certificates and restarting Firefox worked around the issue for Firefox. I'm not sure how to work around the same issue with Chromium, however.
I just hit this again. Unfortunately, I didn't have the debug output enabled, but I did check this:
(In reply to Kai Engert (:kaie) from comment #13)
> Now again open the certificate manager, authorities tab. Do you still have a
> lot of "default trust" entries? If you do, and you click edit trust, do they
> still have the checkboxes checked? (don't click ok, click cancel).
These trust entries did *not* change after I encountered the issue.
I've got that debug option enabled now so hopefully I'll be able to report back soon with more details.
Funnily enough something that looked like this happened to me (second time ever) this morning - FF was claiming my bank and my own domain were 'untrusted' (probably would've affected any https I opened, I guess). Re-starting FF fixed it.
Yeah, this is a heisenbug if I've ever seen one. I guess that's what we get for using physics jokes for release names. ;-)
Before, I used to be able to reproduce it every time just by sleeping (or at least twice in a row immediately prior to filing the bug ;-), and yesterday I managed to do it within three tries, but just now I tried sleeping a dozen times and hibernating twice but no dice. :-(
I'll keep using Firefox normally with that debug option enabled and hopefully I'll hit it eventually. Otherwise, later tonight when I have more time I'll try booting from the F19 Alpha Live I installed from and see if I can repro it faster that way with debug output enabled.
Created attachment 757014 [details]
tail of firefox output with P11_KIT_DEBUG=all
Here you go; I just hit this with https://twitter.com/.
(In reply to T.C. Hollingsworth from comment #21)
> Created attachment 757014 [details]
> tail of firefox output with P11_KIT_DEBUG=all
> Here you go; I just hit this with https://twitter.com/.
Unfortunately we'll need more than just the tail. I can see an error occurring, but it's occurring from the first line on. I'd like to see what's causing the issue. Would you be able to redirect the debug output to a file as Kai suggested:
firefox -no-remote > log 2>&1
Sorry, Kai made it seem like you wanted it cut off, because it really is gigantic. (Bugzilla even choked on it.)
Here's the full log:
Thanks, that is very helpful information.
Created attachment 757077 [details]
This is a test case which should print the following in the presence of this bug:
reinitialization bug present
Patch posted upstream.
p11-kit-0.18.3-1.fc19 has been submitted as an update for Fedora 19.
Could you test this update, and post positive karma if it fixes the issue?
It fixes other (minor) things as well, so no need to leave negative karma unless it breaks.
Package p11-kit-0.18.3-1.fc19:
* should fix your issue,
* was pushed to the Fedora 19 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing p11-kit-0.18.3-1.fc19'
as soon as you are able to.
Please go to the following url:
then log in and leave karma (feedback).
Discussed at 2013-06-05 blocker review meeting: http://meetbot.fedoraproject.org/fedora-blocker-review/2013-06-05/f19final-blocker-review-3.2013-06-05-16.05.log.txt . As things stand we thought this was uncommon enough - and possibly related to suspending - that we don't need to block the release for it (plus, since there's an update in already, it's probably kind of a moot point). So for now it's rejected as a blocker but accepted as a freeze exception issue.
If this is somehow still a problem closer to release and we start worrying about the impact, we can re-propose it.
For those interested in testing:
* Please run the above test case. Build command is at top of file.
Requires the gcc and p11-kit-devel packages to be installed.
* Run firefox in logging mode as above. Look for the line below:
(p11-kit:2843) p11_kit_initialize_registered: out: 0
The line won't show up immediately, or all the time. That line appearing
should have reproducibility characteristics similar to those of this bug.
With the bug, all CA lookups past that point would fail. Without the bug
they should continue as expected.
I would be interested in such logs.
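A minimal sketch for checking a captured log for the line mentioned above. It assumes the debug output was redirected to a file (named `log` in the earlier command). The helper name `find_reinit` is made up for illustration.

```shell
# Hypothetical helper: list completed p11_kit_initialize_registered calls in a log.
# Usage: find_reinit LOG_FILE
find_reinit() {
    # The PID in "(p11-kit:2843)" varies per run, so match only the function name.
    # -n prefixes each match with its line number in the log.
    grep -n 'p11_kit_initialize_registered: out:' "$1"
}

# Example invocation:
#   find_reinit log
```

grep exits with status 1 when nothing matches, so the helper can also be used in a conditional to tell whether the line appeared at all.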
If anyone wants to dig even deeper, it would be really interesting to see a Firefox backtrace for calls into the function p11_kit_initialize_registered(). In any case, this call should have worked, and it was my oversight that it caused this problem.
Adam, the update is now in updates-pending and shortly in Fedora updates. If this update needs to make it into the RC builds, then please let me know what action to take.
stef: as noted via IRC, no-one needs to take any special action at this point as we are not in freeze, you can manage the update just like usual. For the record, if you have a blocker or FE bug once we're past freeze, you as the developer still don't really need to do anything special: just fix it, and submit the fix as an update. QA and releng will take things from there and take care of making sure it gets through the freeze.
(In reply to Adam Williamson from comment #33)
> stef: as noted via IRC, no-one needs to take any special action at this
Ok. Good to know. Thanks.
p11-kit-0.18.3-1.fc19 has been pushed to the Fedora 19 stable repository. If problems still persist, please make note of it in this bug report.
*** Bug 969463 has been marked as a duplicate of this bug. ***