Bug 1206661
| Summary: | networking plugin fails if NetworkManager is disabled | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Bryn M. Reeves <bmr> | ||||
| Component: | sos | Assignee: | Shane Bradley <sbradley> | ||||
| Status: | CLOSED ERRATA | QA Contact: | David Kutálek <dkutalek> | ||||
| Severity: | medium | Docs Contact: | |||||
| Priority: | unspecified | ||||||
| Version: | 6.6 | CC: | agk, bmr, cww, dkutalek, fholec, gavin, jherrman, plambri, pmoravec, pportant, qe-baseos-apps, sbradley | ||||
| Target Milestone: | rc | Keywords: | Regression | ||||
| Target Release: | --- | ||||||
| Hardware: | x86_64 | ||||||
| OS: | Linux | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | sos-3.2-27.el6 | Doc Type: | Bug Fix | ||||
| Doc Text: | The networking plug-in for the sos utility previously reported an "unhandled exception" error when the NetworkManager tool was disabled. With this update, the status of the nmcli utility is properly checked before the networking plug-in processes its output, which prevents the plug-in from generating the error. | ||||||
| Story Points: | --- | ||||
| Clone Of: | 1206633 | Environment: | |||||
| Last Closed: | 2015-07-22 06:33:52 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Bug Depends On: | 1206633 | ||||||
| Bug Blocks: | |||||||
Description
Bryn M. Reeves
2015-03-27 16:37:28 UTC
On RHEL6, the failing nmcli command behaves differently.

With NetworkManager running:

    # nmcli --terse --fields NAME con show
    Usage: nmcli con { COMMAND | help }
      COMMAND := { list | status | up | down }
      list [id <id> | uuid <id> | system | user]
      status
      up id <id> | uuid <id> [iface <iface>] [ap <hwaddr>] [--nowait] [--timeout <timeout>]
      down id <id> | uuid <id>
    Error: 'con' command 'show' is not valid.
    #

With NetworkManager stopped:

    # nmcli --terse --fields NAME con show
    ** (process:18769): WARNING **: get_all_cb: couldn't retrieve system settings properties: (2) The name org.freedesktop.NetworkManagerSystemSettings was not provided by any .service files.
    ** (process:18769): WARNING **: fetch_connections_done: error fetching system connections: (2) The name org.freedesktop.NetworkManagerSystemSettings was not provided by any .service files.

(The command then hangs; IMHO this is a bug in nmcli.)
With NetworkManager disabled, sosreport simply times out while gathering this nmcli output. The upstream patch relevant to this BZ is in fact irrelevant here. So from the end user's point of view, sos (which now calls nmcli commands) shows a regression, as it starts to time out. No exception, just a timeout.
Clearing the original needinfo but raising a new one to Shane, to let him decide whether it makes sense to make the nmcli calls conditional on a positive "nmcli nm" result (i.e. the NetworkManager status). It is hard to say whether we want to work around a RHEL6-specific nmcli bug in sos, or leave the timeout regression in place (until the nmcli hang is fixed?).
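The "nmcli nm" gating idea could look roughly like the Python sketch below. It is only an illustration: it uses plain `subprocess` rather than sos's own plugin helpers, the function names are hypothetical, and it assumes that the RHEL6-era `nm` object prints `running` or `not running` for `--terse --fields RUNNING`. Whether that status query itself returns promptly on affected RHEL6 builds with NetworkManager stopped has not been verified.

```python
import subprocess
from typing import Callable, List, Tuple

# A "runner" takes an argv list and returns (exit status, combined output).
# Injecting it keeps the gating logic testable without nmcli installed.
Runner = Callable[[List[str]], Tuple[int, str]]


def run_command(argv: List[str]) -> Tuple[int, str]:
    """Run argv and return (exit status, stdout and stderr as text)."""
    proc = subprocess.run(argv, stdout=subprocess.PIPE,
                          stderr=subprocess.STDOUT, universal_newlines=True)
    return proc.returncode, proc.stdout


def networkmanager_running(run: Runner = run_command) -> bool:
    """Return True only if `nmcli nm` reports NetworkManager as running.

    Assumes the NM 0.8/0.9 syntax (`nm` object, RUNNING field); newer
    nmcli releases use the `general` object instead of `nm`.
    """
    try:
        status, output = run(["nmcli", "--terse", "--fields", "RUNNING", "nm"])
    except OSError:
        # nmcli is not installed or could not be executed.
        return False
    return status == 0 and output.strip() == "running"


def networking_commands(run: Runner = run_command) -> List[List[str]]:
    """Hypothetical plugin step: schedule nmcli queries only when NM is up."""
    if not networkmanager_running(run):
        return []
    # RHEL6-era verbs, taken from the nmcli usage text quoted above.
    return [["nmcli", "con", "status"], ["nmcli", "con", "list"]]
```

With NetworkManager stopped, `networking_commands()` returns an empty list, so none of the connection queries that can hang are ever started.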
Completely different problems. This bugzilla is *only* about the fact that the networking plugin does not test the exit status before attempting to use the results of calling nmcli; this causes sos to paste an nmcli error message into a subsequent nmcli command string, which results in the backtrace in comment #0. That problem is fixed in commit d19bc04. If nmcli behaves differently on RHEL6 and that has consequences for sos users (silent timeouts), that requires a new bug.

There are two problems here (other than the original nmcli backtrace):

(1) An apparent bug in NM on RHEL6 (at least up to 0.8.1-75.el6.x86_64) that causes nmcli DBus operations to hang forever.

(2) As a consequence of (1), users running the sos networking plugin on affected RHEL6 systems will experience a long (300s) timeout for every affected nmcli command.

RHEL7's NM is unaffected and immediately returns with an error:

    # systemctl stop NetworkManager.service
    # nmcli con show
    Error: NetworkManager is not running.

I would suggest we open a new bug against RHEL6 to reduce the timeout for nmcli commands to something short (10-30s IMHO). We can make this change upstream as well, and it should give a reasonable compromise for users of affected NM versions (there's no need to rush it into RHEL7 at this stage, since RHEL7 does not ship the NM bug).
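A minimal Python sketch combining the two points in the comment above: check nmcli's exit status before using its output (the fix this bug is about), and run nmcli under a short timeout so that a hung RHEL6 nmcli cannot stall collection for 300s. It uses plain `subprocess` rather than sos's plugin API; the function names and the 30s value are hypothetical choices for illustration.

```python
import subprocess
import sys
from typing import Callable, List, Optional, Tuple

# Hypothetical short timeout in seconds, within the 10-30s range suggested
# above (the sos default discussed in this bug is 300s).
NMCLI_TIMEOUT = 30

Result = Tuple[Optional[int], str]


def run_with_timeout(argv: List[str], timeout: float = NMCLI_TIMEOUT) -> Result:
    """Run argv; return (exit status, output), or (None, "") if it could not
    be started or did not finish within `timeout` seconds."""
    try:
        proc = subprocess.run(argv, stdout=subprocess.PIPE,
                              stderr=subprocess.STDOUT,
                              universal_newlines=True, timeout=timeout)
    except (subprocess.TimeoutExpired, OSError):
        # On timeout, subprocess.run() has already killed and reaped the
        # hung child; OSError covers a missing nmcli binary.
        return None, ""
    return proc.returncode, proc.stdout


def connection_names(run: Callable[[List[str]], Result] = run_with_timeout) -> List[str]:
    """List connection names, or [] if nmcli failed or hung."""
    status, output = run(["nmcli", "--terse", "--fields", "NAME", "con", "show"])
    if status != 0:
        # Check the exit status *before* using the output: on failure the
        # output is an error message, not connection names, and must never
        # be pasted into follow-up nmcli command lines.
        return []
    return [line for line in output.splitlines() if line]


def per_connection_commands(run: Callable[[List[str]], Result] = run_with_timeout) -> List[List[str]]:
    """Build one follow-up query per validated connection name."""
    return [["nmcli", "con", "show", name] for name in connection_names(run)]


if __name__ == "__main__":
    # Demonstrate the timeout path with a child that sleeps too long.
    print(run_with_timeout([sys.executable, "-c", "import time; time.sleep(5)"], timeout=1))
    # prints (None, '')
```

Passing a fake `run` lets the parsing logic be exercised without nmcli; a failing or hung query simply yields no follow-up commands instead of an error string spliced into a command line.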
Hmm, what about "Error: 'con' command 'show' is not valid."? Should we patch sos to use the relevant nmcli command on RHEL6?

I'd prefer a solution that we can use upstream, i.e. detecting the NM version (either via the package manager or the tool's output) and using the appropriate command verbs. A simpler approach would be to always collect both variants, but that's pretty ugly.

Created attachment 1024617: nmcli commands from the latest rebased sosreport don't work at all (tried with NM not running)
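The version-detection approach suggested earlier in this thread could be sketched roughly as follows, assuming the version string comes from the package manager (e.g. `0.8.1-75.el6`) or from nmcli's own version output. The `(0, 9, 9)` cutoff for the `show` verb is an assumption that should be checked against NetworkManager's release notes, as is the assumption that `--fields NAME` is accepted by `con list` on NM 0.8. The helper names are hypothetical.

```python
import re
from typing import List, Tuple

# Assumption: the "show" verb for "nmcli con" first appears around the
# NetworkManager 0.9.9 series; older releases (such as RHEL6's 0.8.1, whose
# usage text is quoted above) only offer "list". Verify before relying on it.
SHOW_VERB_MIN_VERSION = (0, 9, 9)


def parse_nm_version(text: str) -> Tuple[int, ...]:
    """Extract the first dotted version number, e.g. '0.8.1-75.el6' -> (0, 8, 1)."""
    match = re.search(r"\d+(?:\.\d+)+", text)
    if match is None:
        raise ValueError("no version number found in %r" % text)
    return tuple(int(part) for part in match.group(0).split("."))


def connection_list_command(version: Tuple[int, ...]) -> List[str]:
    """Return the nmcli argv that lists connection names for this NM version."""
    verb = "show" if version >= SHOW_VERB_MIN_VERSION else "list"
    return ["nmcli", "--terse", "--fields", "NAME", "con", verb]
```

For example, `connection_list_command(parse_nm_version("0.8.1-75.el6"))` selects the `list` verb, which avoids the "'con' command 'show' is not valid" error seen on RHEL6, while still using `show` on newer NetworkManager releases.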
Clearing exception & requesting blocker+ for this, as the current regression breaks /all/ NetworkManager use cases for RHEL6 (both with the service disabled and with it enabled and running). This needs to be fixed & tested ASAP.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-1323.html