Bug 1317691
Summary: pycurl fails with CKR_DEVICE_ERROR after fork() when NSS was initialized by someone else
Product: Red Hat Enterprise Linux 7
Reporter: Martin Milata <mmilata>
Component: nss
Assignee: Kai Engert (:kaie) (inactive account) <kengert>
Status: CLOSED ERRATA
QA Contact: Hubert Kario <hkario>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 7.2
CC: acarter, admiller, asegundo, gerald.prock, hkario, kdudka, kengert, leonardo.sandoval.gonzalez, rrelyea, ydu
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: nss-3.21.0-16.el7
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-11-04 03:56:59 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:

Description
Martin Milata
2016-03-14 21:30:41 UTC
This seems to be a limitation of nss-softokn. You can use the following command to work around the issue:

$ export NSS_STRICT_NOFORK=DISABLED

Upstream documentation of the environment variable is available here: https://developer.mozilla.org/en-US/docs/Mozilla/Projects/NSS/Reference/NSS_environment_variables

There's a function in NSS: SECMOD_RestartModules(). If you call it after a fork() it will reinitialize all the PKCS #11 modules.

Softoken is following the current PKCS #11 standard (as other PKCS #11 modules will follow). If you are tripping over this in softoken, you will really have issues in an HSM PKCS #11 module (or a smart card).

Calling SECMOD_RestartModules(PR_FALSE) is safe even if you haven't forked. It only restarts modules that need a restart because of a fork().
bob

FWIW, the export workaround did not work as mentioned in Comment 2.

(In reply to Bob Relyea from comment #3)
> Calling SECMOD_RestartModules(PR_FALSE) is safe even if you haven't forked.
> It only restarts modules that need a restart because of a fork().
Could it be called during nss (re)initialization then? The fact that nss loads softokn during its initialization is an implementation detail of nss as I understand it.

The fork can happen after NSS_Initialization (in fact usually does).

(In reply to Bob Relyea from comment #6)
> The fork can happen after NSS_Initialization (in fact usually does).
Sure but the application then usually calls NSS_InitContext() again before it continues to use nss. I have looked at libcurl source code and the most suitable place to call SECMOD_RestartModules() seems to be after the call to NSS_InitContext(). OpenLDAP calls SECMOD_RestartModules() even before nss initialization. If it is safe to call SECMOD_RestartModules() in any case, why would it be a problem to call it from NSS_InitContext() internally?

> Sure but the application then usually calls NSS_InitContext() again before it
> continues to use nss.
Libraries might, but applications don't do that typically. Applications typically call NSS_Initialize once and then just use NSS. If the application had actually finalized NSS between fork() calls, then SECMOD_RestartModules() would be unnecessary. It's only needed because someone called NSS_Initialize and then forked without calling Finalize(). In that scenario they need to call SECMOD_RestartModules() after the fork().
bob
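
As a rough illustration of the behaviour described above, here is a toy Python model. It is not NSS code: ToyModule, DeviceError and restart_modules are hypothetical names, and the PID comparison is only an assumed stand-in for however a real PKCS #11 module detects that it is running in a forked child.

```python
import os


class DeviceError(Exception):
    """Stands in for CKR_DEVICE_ERROR in this toy model."""


class ToyModule:
    """Toy stand-in for a PKCS #11 module that refuses to work after fork()."""

    def __init__(self, name, fork_safe=False):
        self.name = name
        self.fork_safe = fork_safe      # assumption: some modules tolerate fork()
        self.init_pid = os.getpid()     # process that initialized the module
        self.restarts = 0

    def needs_restart(self):
        # The module is stale if it was initialized in a different process.
        return not self.fork_safe and os.getpid() != self.init_pid

    def use(self):
        if self.needs_restart():
            raise DeviceError(f"{self.name}: CKR_DEVICE_ERROR")
        return "ok"

    def restart(self):
        # Re-initialize in the current process; old handles would be gone.
        self.init_pid = os.getpid()
        self.restarts += 1


def restart_modules(modules):
    """Restart only the modules that need it; return how many were restarted."""
    count = 0
    for module in modules:
        if module.needs_restart():
            module.restart()
            count += 1
    return count
```

In this model, calling restart_modules() in a process that never forked touches nothing, which mirrors the claim that the call is safe in that case. After a fork, only the stale module is re-initialized, and any handles it held were unusable already.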
The application is a Python interpreter in our minimal example (comment #0). Importing the rpm module triggers the first initialization of nss. pycurl does not trigger the initialization of nss on import. The initialization is delayed until a protocol requiring TLS is actually requested. I believe it is a common scenario that a fork() is called between the import and the actual transfer.

I have verified that calling SECMOD_RestartModules() after NSS_InitContext() in libcurl makes the minimal example work as expected. Unless it has some nasty side effects (like invalidating handles of other libraries using nss in the same process), I can install the fix in libcurl. The problem is that the patch for libcurl will not fix other applications or libraries that suffer from the same issue.

Created attachment 1144900 [details]
Patch to force restart modules whenever NSS_InitContext is called.
> Unless it has some nasty side effects (like invalidating handles of other
> libraries using nss in the same process),
It will invalidate any handles that are open, but only in modules that need to be restarted after the fork(), so those handles aren't accessible anymore anyway. If the module does not need to be restarted, it won't be (thus no handles are invalidated).
I've added it now to NSS_InitContext() (this is the supplied patch).
You can still run into issues if you call any form of NSS_XXXInitXXXX(), fork(), and then use NSS. I've only added it to the NSS_InitContext() case because that's what libraries normally call, and they could be in the case where the application initialized NSS then forked (so pretty much every library would have to call SECMOD_RestartModules anyway). Applications call the other forms of NSS_Initialize, which are call-once functions, so by definition you can't run into a case where a SECMOD_RestartModules() call inside them would help. Those applications still need to call SECMOD_RestartModules after they fork to get regular NSS calls working again.
bob
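
For an application written in Python, the post-fork call described above could be wired up roughly as follows. This is a sketch under stated assumptions: Python 3.7 or later (for os.register_at_fork), a libnss3 that ctypes.util.find_library can locate, and NSS users in the process sharing that same loaded library. load_restart_function and install_post_fork_restart are hypothetical helper names; SECMOD_RestartModules is the NSS function discussed in this bug.

```python
import ctypes
import ctypes.util
import os

PR_FALSE = 0  # NSS PRBool "false": restart only modules that need it


def load_restart_function():
    """Return a callable wrapping SECMOD_RestartModules(PR_FALSE).

    Returns None when libnss3 or the symbol cannot be found.
    """
    name = ctypes.util.find_library("nss3")
    if name is None:
        return None
    try:
        lib = ctypes.CDLL(name)
        func = lib.SECMOD_RestartModules
    except (OSError, AttributeError):
        return None
    func.argtypes = [ctypes.c_int]  # PRBool force
    func.restype = ctypes.c_int     # SECStatus (0 is SECSuccess)
    return lambda: func(PR_FALSE)


def install_post_fork_restart(restart):
    """Run `restart` in every child created by os.fork() from now on."""
    os.register_at_fork(after_in_child=restart)
```

A caller would do something like `restart = load_restart_function()` and, if it is not None, `install_post_fork_restart(restart)`. Note that the hook only fires for forks made through Python's os.fork(); a fork() performed inside a C library would not trigger it.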
Elio, we should send this upstream as well.

Comment on attachment 1144900 [details]
Patch to force restart modules whenever NSS_InitContext is called.
I can confirm the patch fixes the problem described in comment #0.

Has the patch been submitted in an update to Fedora? If not, I'd like to request it. This is an issue in Fedora 23 also.
Thank you,
-AdamM

I have tested this patch in Fedora 23 and it is still failing.

Fault: <Fault 1: 'Traceback (most recent call last):\n File "/usr/lib/python2.7/site-packages/koji/daemon.py", line 1161, in runTask\n response = (handler.run(),)\n File "/usr/lib/python2.7/site-packages/koji/tasks.py", line 158, in run\n return koji.util.call_with_argcheck(self.handler, self.params, self.opts)\n File "/usr/lib/python2.7/site-packages/koji/util.py", line 154, in call_with_argcheck\n return func(*args, **kwargs)\n File "/usr/lib/koji-builder-plugins/builder_containerbuild.py", line 373, in handler\n **create_build_args\n File "/usr/lib/python2.7/site-packages/osbs/api.py", line 37, in catch_exceptions\n return func(*args, **kwargs)\n File "/usr/lib/python2.7/site-packages/osbs/api.py", line 358, in create_build\n return self.create_prod_build(**kwargs)\n File "/usr/lib/python2.7/site-packages/osbs/api.py", line 37, in catch_exceptions\n return func(*args, **kwargs)\n File "/usr/lib/python2.7/site-packages/osbs/api.py", line 305, in create_prod_build\n response = self._create_build_config_and_build(build_request)\n File "/usr/lib/python2.7/site-packages/osbs/api.py", line 202, in _create_build_config_and_build\n running_builds = self._get_running_builds_for_build_config(build_config_name)\n File "/usr/lib/python2.7/site-packages/osbs/api.py", line 174, in _get_running_builds_for_build_config\n all_builds_for_bc = self.os.list_builds(build_config_id=build_config_id).json()[\'items\']\n File "/usr/lib/python2.7/site-packages/osbs/core.py", line 363, in list_builds\n return self._get(url)\n File "/usr/lib/python2.7/site-packages/osbs/core.py", line 167, in _get\n headers, kwargs = self._request_args(with_auth, **kwargs)\n File "/usr/lib/python2.7/site-packages/osbs/core.py", line 138, in _request_args\n self.get_oauth_token()\n File "/usr/lib/python2.7/site-packages/osbs/core.py", line 185, in get_oauth_token\n username=self.username, password=self.password)\n File "/usr/lib/python2.7/site-packages/osbs/core.py", line 168, in _get\n return self._con.get(url, headers=headers, verify_ssl=self.verify_ssl, **kwargs)\n File "/usr/lib/python2.7/site-packages/osbs/http.py", line 105, in get\n return self.request(url, "get", **kwargs)\n File "/usr/lib/python2.7/site-packages/osbs/http.py", line 118, in request\n stream = HttpStream(url, *args, verbose=self.verbose, **kwargs)\n File "/usr/lib/python2.7/site-packages/osbs/http.py", line 242, in __init__\n self._perform()\n File "/usr/lib/python2.7/site-packages/osbs/http.py", line 264, in _perform\n raise OsbsNetworkException(self.url, err_obj[2], err_obj[1])\nOsbsNetworkException: (35) A PKCS #11 module returned CKR_DEVICE_ERROR, indicating that a problem has occurred with the token or slot.\n'>

Adam, are you saying that the patch fixes the minimal example from comment #0 but not your own use case? Sounds like we need another minimal example that covers your use case, too...

No, I didn't test the simple use case. I only tested the one I reported to Martin initially that led to the filing of the BZ.

(In reply to Adam Miller from comment #18)
> No, I didn't test the simple use case.
I did. The patch fixes the minimal example from comment #0.
> I only tested the one I reported to
> Martin initially that led to the filing of the BZ.
Then the minimal example does not model the actual scenario precisely enough.

(In reply to Adam Miller from comment #18)
> No, I didn't test the simple use case. I only tested the one I reported to
> Martin initially that led to the filing of the BZ.
Can you provide a different reproducer that does hit the issue when using nss-softokn-3.16.2.3-14.2.el7_2.x86_64?

I'm still seeing this issue on Fedora 24, was the patch not pushed upstream?

(In reply to Adam Miller from comment #23)
> I'm still seeing this issue on Fedora 24, was the patch not pushed upstream?
According to your comment #16, the patch was insufficient to resolve the issue you were facing. We asked for a better reproducer in comment #17 and comment #20, unsuccessfully so far.

I don't know how to supply a reproducer other than the koji-containerbuild plugin code, I don't have an intimate enough knowledge of pycurl to provide that. I just know the issue I'm seeing in a system I'm administering.

We are trying to understand the high-level scenario which triggers this bug. According to the info we have, there is some code that forks, some code that uses pycurl to speak over TLS, and probably other code that uses NSS (either for TLS, or for low-level crypto). We need someone who roughly understands what the high-level code actually does. The reproducer in comment #0 does not describe it precisely enough and no other information has been provided.

(In reply to Adam Miller from comment #23)
> I'm still seeing this issue on Fedora 24, was the patch not pushed upstream?
Unfortunately, it was not. See https://bugzilla.mozilla.org/1263017

I pulled the patch from this BZ and applied it to Fedora 24's nss and it doesn't resolve the issue. I don't know what the implications of that are but I thought I'd report it. My build is here: https://copr.fedorainfracloud.org/coprs/maxamillion/atomic-reactor/build/454017/

We also hit the issue [https://bugzilla.yoctoproject.org/show_bug.cgi?id=10226] and verified that the patch does not solve the problem in our case. The scenario in our case can be easily simplified with this while loop run on a terminal:

while true
do
    git -c core.fsyncobjectfiles=0 ls-remote https://git.yoctoproject.org/git/dbus-wait | grep -v 6cc6
done

(In reply to Leonardo Sandoval from comment #29)
> while true
> do
> git -c core.fsyncobjectfiles=0 ls-remote
> https://git.yoctoproject.org/git/dbus-wait | grep -v 6cc6
> done
How is the above loop written in a shell related to pycurl? Should it somehow trigger a bug on the remote server? What runs on the server actually? Are you able to trigger the bug on the server locally?

(In reply to Kamil Dudka from comment #30)
> (In reply to Leonardo Sandoval from comment #29)
> > while true
> > do
> > git -c core.fsyncobjectfiles=0 ls-remote
> > https://git.yoctoproject.org/git/dbus-wait | grep -v 6cc6
> > done
>
> How is the above loop written in a shell related to pycurl?
There is no relation, I believe. The relation I see is between git and curl, but I cannot confirm this.
> Should it somehow trigger a bug on the remote server?
No. The error is seen on the host, not on the server. You can give it a try on your host and after some seconds (sometimes it takes a few minutes), you should see the error message reported originally.
> What runs on the server actually?
Nothing. This script is executed on the host, not on the server.

Our issue has been resolved. One of the devs on our team with more familiarity around nss realized that both the rpm python and pycurl modules were trying to mess with nss initialization, which was causing the issue that we were seeing.
Thank you,
-AdamM

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://rhn.redhat.com/errata/RHBA-2016-2335.html