Bug 1317691 - pycurl fails with CKR_DEVICE_ERROR after fork() when NSS was initialized by someone else
Summary: pycurl fails with CKR_DEVICE_ERROR after fork() when NSS was initialized by s...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: nss
Version: 7.2
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: rc
: ---
Assignee: Kai Engert (:kaie) (inactive account)
QA Contact: Hubert Kario
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2016-03-14 21:30 UTC by Martin Milata
Modified: 2020-01-02 16:50 UTC (History)
10 users (show)

Fixed In Version: nss-3.21.0-16.el7
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-11-04 03:56:59 UTC
Target Upstream Version:


Attachments (Terms of Use)
Patch to force restart modules whenever NSS_InitContext is called. (1.25 KB, patch)
2016-04-07 21:19 UTC, Bob Relyea
kdudka: review+
Details | Diff


Links
System ID Priority Status Summary Last Updated
Mozilla Foundation 1263017 -- RESOLVED pycurl fails with CKR_DEVICE_ERROR after fork() when NSS was initialized by someone else 2020-01-02 16:49:48 UTC
Red Hat Bugzilla 800304 None CLOSED NSS module reload doesn't succeed in apache prefork process 2019-10-28 08:55:37 UTC
Red Hat Product Errata RHBA-2016:2335 normal SHIPPED_LIVE nss bug fix update 2016-11-03 13:45:11 UTC

Internal Links: 800304

Description Martin Milata 2016-03-14 21:30:41 UTC
Description of problem:

It appears to be impossible to use pycurl in a child process when NSS was previously initialized by something else than (py)curl. Following error is raised:

  A PKCS #11 module returned CKR_DEVICE_ERROR, indicating that a problem has occurred with the token or slot.

We ran into this with koji and koji-containerbuild plugin, provided example executes the same sequence of events. Inserting call to pycurl.global_cleanup() or pycurl.global_init() doesn't seem to help. In this case NSS_InitContext is called when the rpm module is loaded, I haven't tried initializing it any other way so it is possible that this is in fact bug in RPM. Note that this problem is also reproducible on Fedora 23.


Version-Release number of selected component (if applicable):

libcurl-7.29.0-25.el7.x86_64
nss-3.19.1-19.el7_2.x86_64
nss-softokn-3.16.2.3-13.el7_1.x86_64
python-pycurl-7.19.0-17.el7.x86_64
rpm-4.11.3-17.el7.x86_64
rpm-python-4.11.3-17.el7.x86_64


How reproducible: Always.

Steps to Reproduce:

$ cat > reproducer.py
#!/usr/bin/python

import rpm
import pycurl

import os
import sys

def curl_get():
    c = pycurl.Curl()
    c.setopt(c.URL, 'https://google.com/')
    c.setopt(c.WRITEFUNCTION, sys.stdout.write)
    c.perform()
    c.close()

if os.fork():
    print 'parent: waiting'
    os.wait()
    print 'parent: done'
else:
    print 'child: start'
    curl_get()
    print 'child: done'
^D
$ python reproducer.py


Actual results:

parent: waiting
child: start
Traceback (most recent call last):
  File "reproducer.py", line 22, in <module>
    curl_get()
  File "reproducer.py", line 13, in curl_get
    c.perform()
pycurl.error: (35, 'A PKCS #11 module returned CKR_DEVICE_ERROR, indicating that a problem has occurred with the token or slot.')
parent: done


Expected results:

parent: waiting
child: start
(page content)
child: done
parent: done

Comment 2 Kamil Dudka 2016-03-15 12:13:01 UTC
This seems be a limitation of nss-softokn.

You can use the following command to work around the issue:

$ export NSS_STRICT_NOFORK=DISABLED

Upstream documentation of the environment variable is available here:

https://developer.mozilla.org/en-US/docs/Mozilla/Projects/NSS/Reference/NSS_environment_variables

Comment 3 Bob Relyea 2016-03-15 20:21:36 UTC
There's a function in NSS:

SECMOD_RestartModules()

If you call it after a fork() it will reinitialize all the PKCS #11 modules.

Softoken is following the current PKCS #11 standard (as other PKCS #11 modules will follow). If you are tripping over this in softoken, you will really have issues in an HSM PKCS #11 module (or a smart card).

Calling SECMOD_RestartModules(PR_FALSE) is safe even if you haven't forked. It only restart modules that need a restart because of a fork().

bob

Comment 4 Adam Miller 2016-03-21 18:36:37 UTC
FWIW, the export workout did not work as mentioned in Comment 2.

Comment 5 Kamil Dudka 2016-03-29 11:00:50 UTC
(In reply to Bob Relyea from comment #3)
> Calling SECMOD_RestartModules(PR_FALSE) is safe even if you haven't forked.
> It only restart modules that need a restart because of a fork().

Could it be called during nss (re)initialization then?

The fact that nss loads softokn during its initialization is an implementation detail of nss as I understand it.

Comment 6 Bob Relyea 2016-03-29 17:07:10 UTC
The fork can happen after NSS_Initialization (in fact usually does).

Comment 7 Kamil Dudka 2016-03-30 10:50:33 UTC
(In reply to Bob Relyea from comment #6)
> The fork can happen after NSS_Initialization (in fact usually does).

Sure but the application then usually calls NSS_InitContext() again before it continues to use nss.  I have looked at libcurl source code and the most  suitable place to call SECMOD_RestartModules() seems to be after the call to NSS_InitContext().  OpenLDAP calls SECMOD_RestartModules() even before nss initialization.

If it is safe to call SECMOD_RestartModules() in any case, why would it be a problem to call it from NSS_InitContext() internally?

Comment 8 Bob Relyea 2016-03-30 16:18:04 UTC
> Sure but the application then usually calls NSS_InitContext() again before it > continues to use nss.

Libraries might, but applications don't do that typically. Applications typically call NSS_Initialize once and then just use NSS. If the application had actually finalaized between forks(), then the SECMOD_RestartModules() would be unnecessary. It's only needed because someone called NSS_Initialize then forked() without calling Finalize(). In that scenario they need to call SECMOD_RestartModules() after the fork().

bob

Comment 9 Kamil Dudka 2016-03-30 17:04:32 UTC
The application is a python interpreter in our minimal example (comment #0).  Importing the rpm module triggers the first initialization of nss.  pycurl does not trigger the initialization of nss on import.  The initialization is delayed until a protocol requiring TLS is actually requested.  I believe it is a common scenario that a fork() is called between the import and the actual transfer.

I have verified that calling SECMOD_RestartModules() after NSS_InitContext() in libcurl makes the minimal example work as expected.  Unless it has some nasty side effects (like invalidating handles of other libraries using nss in the same process), I can install the fix in libcurl.

The problem is that the patch for libcurl will not fix other applications or libraries that suffer from the same issue.

Comment 10 Bob Relyea 2016-04-07 21:19:16 UTC
Created attachment 1144900 [details]
Patch to force restart modules whenever NSS_InitContext is called.

Comment 11 Bob Relyea 2016-04-07 21:29:26 UTC
> Unless it has some nasty side effects (like invalidating handles of other
> libraries using nss in the same process),

It will invalidate any handles that are open, but only in modules that need to be restarted after the fork(), so those handles aren't accessible anymore anyway. If the module does not need to be restarted, it won't be (thus no handles are invalidated).

I've added it now to NSS_ContextInit() (this the supplied patch).

You can still run into issues if you call any form of NSS_XXXInitXXXX() fork() and then use NSS. I've only added it to the NSS_ContextInit() case because that's what libraries normally call, and they could be in the case where the application initialized NSS then forked (so pretty much every library would have to call SECMOD_RestartModules anyway). Applications call the other forms of NSS_Initialize, which are call 'once' functions, so by definition you can't run into a case where SECMOD_RestartModules() will help. Those application still need to call SECMOD_RestartModules after they fork to get regular NSS calls working again.

bob

Comment 12 Bob Relyea 2016-04-07 21:30:09 UTC
Elio, we should send this upstream as well.

Comment 13 Kamil Dudka 2016-04-12 10:23:54 UTC
Comment on attachment 1144900 [details]
Patch to force restart modules whenever NSS_InitContext is called.

I can confirm the patch fixes the problem described in comment #0.

Comment 15 Adam Miller 2016-04-19 18:01:06 UTC
Has the patch been submitted in an update to Fedora? If not, I'd like to request it. This is an issue in Fedora 23 also.

Thank you,
-AdamM

Comment 16 Adam Miller 2016-04-25 15:59:38 UTC
I have tested this patch in Fedora 23 and it is still failing.

Fault: <Fault 1: 'Traceback (most recent call last):\n  File "/usr/lib/python2.7/site-packages/koji/daemon.py", line 1161, in runTask\n    response = (handler.run(),)\n  File "/usr/lib/python2.7/site-packages/koji/tasks.py", line 158, in run\n    return koji.util.call_with_argcheck(self.handler, self.params, self.opts)\n  File "/usr/lib/python2.7/site-packages/koji/util.py", line 154, in call_with_argcheck\n    return func(*args, **kwargs)\n  File "/usr/lib/koji-builder-plugins/builder_containerbuild.py", line 373, in handler\n    **create_build_args\n  File "/usr/lib/python2.7/site-packages/osbs/api.py", line 37, in catch_exceptions\n    return func(*args, **kwargs)\n  File "/usr/lib/python2.7/site-packages/osbs/api.py", line 358, in create_build\n    return self.create_prod_build(**kwargs)\n  File "/usr/lib/python2.7/site-packages/osbs/api.py", line 37, in catch_exceptions\n    return func(*args, **kwargs)\n  File "/usr/lib/python2.7/site-packages/osbs/api.py", line 305, in create_prod_build\n    response = self._create_build_config_and_build(build_request)\n  File "/usr/lib/python2.7/site-packages/osbs/api.py", line 202, in _create_build_config_and_build\n    running_builds = self._get_running_builds_for_build_config(build_config_name)\n  File "/usr/lib/python2.7/site-packages/osbs/api.py", line 174, in _get_running_builds_for_build_config\n    all_builds_for_bc = self.os.list_builds(build_config_id=build_config_id).json()[\'items\']\n  File "/usr/lib/python2.7/site-packages/osbs/core.py", line 363, in list_builds\n    return self._get(url)\n  File "/usr/lib/python2.7/site-packages/osbs/core.py", line 167, in _get\n    headers, kwargs = self._request_args(with_auth, **kwargs)\n  File "/usr/lib/python2.7/site-packages/osbs/core.py", line 138, in _request_args\n    self.get_oauth_token()\n  File "/usr/lib/python2.7/site-packages/osbs/core.py", line 185, in get_oauth_token\n    username=self.username, password=self.password)\n  File "/usr/lib/python2.7/site-packages/osbs/core.py", line 168, in _get\n    return self._con.get(url, headers=headers, verify_ssl=self.verify_ssl, **kwargs)\n  File "/usr/lib/python2.7/site-packages/osbs/http.py", line 105, in get\n    return self.request(url, "get", **kwargs)\n  File "/usr/lib/python2.7/site-packages/osbs/http.py", line 118, in request\n    stream = HttpStream(url, *args, verbose=self.verbose, **kwargs)\n  File "/usr/lib/python2.7/site-packages/osbs/http.py", line 242, in __init__\n    self._perform()\n  File "/usr/lib/python2.7/site-packages/osbs/http.py", line 264, in _perform\n    raise OsbsNetworkException(self.url, err_obj[2], err_obj[1])\nOsbsNetworkException: (35) A PKCS #11 module returned CKR_DEVICE_ERROR, indicating that a problem has occurred with the token or slot.\n'>

Comment 17 Kamil Dudka 2016-04-25 18:25:35 UTC
Adam, are you saying that the patch fixes the minimal example from comment #0 but not your own use case?

Sounds like we need another minimal example that covers your use case, too...

Comment 18 Adam Miller 2016-05-04 18:59:17 UTC
No, I didn't test the simple use case. I only tested the one I reported to Martin initially that lead to the filing of the BZ.

Comment 19 Kamil Dudka 2016-05-04 20:46:27 UTC
(In reply to Adam Miller from comment #18)
> No, I didn't test the simple use case.

I did.  The patch fixes the minimal example from comment #0.

> I only tested the one I reported to
> Martin initially that lead to the filing of the BZ.

Then the minimal example does not model the actual scenario precisely enough.

Comment 20 Hubert Kario 2016-06-22 15:08:09 UTC
(In reply to Adam Miller from comment #18)
> No, I didn't test the simple use case. I only tested the one I reported to
> Martin initially that lead to the filing of the BZ.

can you provide a different reproducer that does hit the issue when using nss-softokn-3.16.2.3-14.2.el7_2.x86_64?

Comment 23 Adam Miller 2016-09-08 14:02:05 UTC
I'm still seeing this issue on Fedora 24, was the patch not pushed upstream?

Comment 24 Kamil Dudka 2016-09-08 14:08:23 UTC
(In reply to Adam Miller from comment #23)
> I'm still seeing this issue on Fedora 24, was the patch not pushed upstream?

According to your comment #16, the patch was insufficient to resolve the issue you were facing.  We asked for a better reproducer in comment #17 and comment #20, unsuccessfully so far.

Comment 25 Adam Miller 2016-09-08 15:28:52 UTC
I don't know how to supply a reproducer other than the koji-containerbuild plugin code, I don't have an intimate enough knowledge of pycurl to provide that. I just know the issue I'm seeing in a system I'm administering.

Comment 26 Kamil Dudka 2016-09-08 15:48:37 UTC
We are trying to understand the high-level scenario which triggers this bug.

According to the info we have, there is some code that forks, some code that uses pycurl to speak over TLS, and probably other code that uses NSS (either for TLS, or for low-level crypto).

We need someone who roughly understands what the high-level code actually does.  The reproducer in comment #0 does not describe it precisely enough and no other information has been provided.

Comment 27 Kamil Dudka 2016-09-08 15:59:56 UTC
(In reply to Adam Miller from comment #23)
> I'm still seeing this issue on Fedora 24, was the patch not pushed upstream?

Unfortunately, it was not.  See https://bugzilla.mozilla.org/1263017

Comment 28 Adam Miller 2016-09-16 19:24:50 UTC
I pulled the patch from this BZ and applied it to Fedora 24's nss and it doesn't resolve the issue. I don't know what the implications of that are but I thought I'd report it.

My build is here: https://copr.fedorainfracloud.org/coprs/maxamillion/atomic-reactor/build/454017/

Comment 29 Leonardo Sandoval 2016-09-20 16:10:48 UTC
We also hit the issue [https://bugzilla.yoctoproject.org/show_bug.cgi?id=10226] and verified that the patch does not solved the problem in our case. The scenario in our case can be easily simplified with this while loop run on a terminal:

while true
do
   git -c core.fsyncobjectfiles=0 ls-remote https://git.yoctoproject.org/git/dbus-wait | grep -v 6cc6
done

Comment 30 Kamil Dudka 2016-09-20 16:44:53 UTC
(In reply to Leonardo Sandoval from comment #29)
> while true
> do
>    git -c core.fsyncobjectfiles=0 ls-remote
> https://git.yoctoproject.org/git/dbus-wait | grep -v 6cc6
> done

How is the above loop written in a shell related to pycurl?  Should it somehow trigger a bug on the remote server?  What runs on the server actually?  Are you able to trigger the bug on the server locally?

Comment 31 Leonardo Sandoval 2016-09-20 17:37:31 UTC
(In reply to Kamil Dudka from comment #30)
> (In reply to Leonardo Sandoval from comment #29)
> > while true
> > do
> >    git -c core.fsyncobjectfiles=0 ls-remote
> > https://git.yoctoproject.org/git/dbus-wait | grep -v 6cc6
> > done
> 
> How is the above loop written in a shell related to pycurl?

There is no relation, I believe. The relation I see is between git & curl, but I cannot confirm this.


>  Should it somehow trigger a bug on the remote server?  

No. The error is seen on host, not on server. You can give a try on your host and after some seconds (sometimes it takes a few minutes), you should see the error message reported originally.

>What runs on the server  actually? 

Nothing. This script is executed on host, not on server.

Comment 32 Adam Miller 2016-09-21 13:47:16 UTC
Our issue has been resolved, one of the devs on our team with more familiarity around nss realized that both the rpm python and pycurl modules were trying to mess with nss initialization which was causing the issue that we were seeing.

Thank you,
-AdamM

Comment 34 errata-xmlrpc 2016-11-04 03:56:59 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-2335.html


Note You need to log in before you can comment on or make changes to this bug.