Bug 1717330

Summary: leapp failed with KeyError in /usr/lib64/python2.7/multiprocessing/managers.py when NetworkManager is not installed on RHEL7
Product: Red Hat Enterprise Linux 7
Reporter: Masahiro Matsuya <mmatsuya>
Component: leapp
Assignee: Vojtech Sokol <vsokol>
Status: CLOSED CURRENTRELEASE
QA Contact: Alois Mahdal <amahdal>
Severity: medium
Docs Contact:
Priority: medium
Version: 7.6
CC: aromito, devin, jaeshin, mbocek, pstodulk
Target Milestone: rc
Keywords: Extras, Patch, Upgrades
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: leapp-0.8.0-1.el7_6
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-08-06 17:03:15 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description Flags
proposed patch to exit properly of the child process with OSError from os.execvpe() none

Description Masahiro Matsuya 2019-06-05 08:41:29 UTC
Description of problem:
On a RHEL 7 server without NetworkManager installed, the leapp upgrade command failed with:

Traceback (most recent call last):
  File "/bin/leapp", line 9, in <module>
    load_entry_point('leapp==0.7.0', 'console_scripts', 'leapp')()
  File "/usr/lib/python2.7/site-packages/leapp/cli/__init__.py", line 30, in main
    cli.command.execute('leapp version {}'.format(VERSION))
  File "/usr/lib/python2.7/site-packages/leapp/utils/clicmd.py", line 90, in execute
    args.func(args)
  File "/usr/lib/python2.7/site-packages/leapp/utils/clicmd.py", line 112, in called
    self.target(args)
  File "/usr/lib/python2.7/site-packages/leapp/cli/upgrade/__init__.py", line 170, in upgrade
    workflow.run(context=context, skip_phases_until=skip_phases_until)
  File "/usr/lib/python2.7/site-packages/leapp/workflows/__init__.py", line 210, in run
    if messaging.errors():
  File "/usr/lib/python2.7/site-packages/leapp/messaging/__init__.py", line 55, in errors
    return list(self._errors)
  File "<string>", line 2, in __len__
  File "/usr/lib64/python2.7/multiprocessing/managers.py", line 773, in _callmethod
    raise convert_to_error(kind, result)
multiprocessing.managers.RemoteError:
---------------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib64/python2.7/multiprocessing/managers.py", line 242, in serve_client
    obj, exposed, gettypeid = id_to_obj[ident]
KeyError: '7f7dcb372e18'


Also, the following was logged to /var/log/leapp/leapp-upgrade.log:

2019-05-30 13:15:44.330 DEBUG    PID: 20165 leapp.workflow.FactsCollection.network_manager_read_config: External command is started: [NetworkManager --print-config]
2019-05-30 13:15:44.356 DEBUG    PID: 20169 leapp.workflow.FactsCollection.network_manager_read_config: External command is finished: [NetworkManager --print-config]
2019-05-30 13:15:44.363 WARNING  PID: 20169 leapp.workflow.FactsCollection.network_manager_read_config: Error reading NetworkManager configuration: [Errno 2] No such file or directory
2019-05-30 13:15:44.375 DEBUG    PID: 20165 leapp.workflow.FactsCollection.network_manager_read_config: External command is finished: [NetworkManager --print-config]

I wondered why the "External command is finished" message was logged twice, with different PIDs. It looks like both the child process, which was supposed to run the NetworkManager command, and the parent process logged the same message.

On this RHEL 7 server the NetworkManager binary is not available, because the NetworkManager package is not installed.
In that case the os.execvpe() call shown below fails with OSError. Instead of exiting immediately, the child process returns to the calling function and keeps running. As a result, the child process also logs the "External command is finished" message, which is not expected.

In /usr/lib/python2.7/site-packages/leapp/libraries/stdlib/call.py:

def _call(command, callback_raw=lambda fd, value: None, callback_linebuffered=lambda fd, value: None,
          encoding='utf-8', poll_timeout=1, read_buffer_size=80, stdin=None, env=None):
...
    pid = os.fork()
    if pid > 0:
...
    else:
...
        os.execvpe(command[0], command, env=environ)

I created a patch for that and confirmed that it resolves the KeyError problem.
I will attach the proposed patch to this Bugzilla.
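
For illustration only, here is a minimal sketch of the kind of guard described above. It is not the attached patch and not leapp's actual code. The function name run_in_child and the exit code 127 (borrowed from the shell's "command not found" convention) are assumptions made for this example. The point it shows is that the child must leave via os._exit() when os.execvpe() raises OSError, so that it never returns into the caller's code.

```python
import os


def run_in_child(command, env=None):
    """Fork, exec `command` in the child, and return the child's exit code.

    Hypothetical helper for illustration; leapp's real _call() also handles
    pipes, polling and callbacks, which are omitted here.
    """
    environ = dict(os.environ)
    environ.update(env or {})

    pid = os.fork()
    if pid > 0:
        # Parent: wait for the child and decode its exit status.
        _, status = os.waitpid(pid, 0)
        if os.WIFEXITED(status):
            return os.WEXITSTATUS(status)
        # Child was killed by a signal: report it as a negative number.
        return -os.WTERMSIG(status)

    # Child: on success os.execvpe() replaces the process and never returns.
    try:
        os.execvpe(command[0], command, environ)
    except OSError:
        # exec failed, e.g. [Errno 2] because the binary is not installed.
        # Exit right away so the child does not continue as a second copy
        # of the parent. 127 is an assumed "command not found" code.
        os._exit(127)
```

With a guard like this, a missing binary surfaces in the parent as a non-zero exit code, for example run_in_child(["NetworkManager", "--print-config"]) would return 127 on a host where NetworkManager is not installed, and only the parent continues past the call.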

Version-Release number of selected component (if applicable):
Red Hat Enterprise Linux 7
Extras

How reproducible:
Always

Steps to Reproduce:
1. Remove the NetworkManager package from RHEL 7.
2. Follow https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html-single/upgrading_to_rhel_8/index
   and run leapp upgrade.


Actual results:
leapp upgrade command fails with KeyError in /usr/lib64/python2.7/multiprocessing/managers.py.

Expected results:
leapp upgrade command doesn't fail with KeyError in /usr/lib64/python2.7/multiprocessing/managers.py.

Comment 1 Masahiro Matsuya 2019-06-05 08:52:56 UTC
Created attachment 1577467 [details]
proposed patch to exit properly of the child process with OSError from os.execvpe()

Comment 2 Michal Bocek 2019-06-05 09:46:07 UTC
Thanks for reporting it. Vojtech Sokol is working on a fix: https://github.com/oamg/leapp/pull/480.

Comment 4 Vojtech Sokol 2019-07-03 13:53:55 UTC
The problem was not in leapp itself but in the multiprocessing library of the python2.7 package shipped in el7_6. The bug has since been fixed in newer releases of python2.7 (https://github.com/python/cpython/commit/e8a57b98ec8f2b161d4ad68ecc1433c9e3caad57), but the fix was not backported to the el7_6 package. A workaround was implemented in leapp: https://github.com/oamg/leapp/pull/533. We will further investigate whether the multiprocessing fix can be backported into python2.7 in el7_6.