Bug 1206661 - networking plugin fails if NetworkManager is disabled
Summary: networking plugin fails if NetworkManager is disabled
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: sos
Version: 6.6
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Shane Bradley
QA Contact: David Kutálek
URL:
Whiteboard:
Depends On: 1206633
Blocks:
Reported: 2015-03-27 16:37 UTC by Bryn M. Reeves
Modified: 2015-07-22 06:33 UTC

Fixed In Version: sos-3.2-27.el6
Doc Type: Bug Fix
Doc Text:
The networking plug-in for the sos utility previously reported an "unhandled exception" error when the NetworkManager tool was disabled. With this update, the status of the nmcli utility is properly checked before the networking plug-in processes its output, which prevents the plug-in from generating the error.
Clone Of: 1206633
Environment:
Last Closed: 2015-07-22 06:33:52 UTC
Target Upstream Version:


Attachments
nmcli commands from latest rebased sosreport doesn't work at all (tried with NM not running) (2.28 KB, text/plain)
2015-05-12 14:30 UTC, David Kutálek


Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2015:1323 normal SHIPPED_LIVE sos bug fix and enhancement update 2015-07-20 17:53:12 UTC
Red Hat Bugzilla 1213327 None None None Never

Internal Links: 1213327

Description Bryn M. Reeves 2015-03-27 16:37:28 UTC
+++ This bug was initially created as a clone of Bug #1206633 +++

See https://github.com/sosreport/sos/issues/432

On RHEL 7.1 we see:

   networking
   Traceback (most recent call last):
     File "/usr/lib/python2.7/site-packages/sos/sosreport.py", line 1163, in collect
       plug.collect()
     File "/usr/lib/python2.7/site-packages/sos/plugins/__init__.py", line 629, in collect
       self._collect_cmd_output()
     File "/usr/lib/python2.7/site-packages/sos/plugins/__init__.py", line 609, in _collect_cmd_output
       timeout=timeout, runat=runat)
     File "/usr/lib/python2.7/site-packages/sos/plugins/__init__.py", line 554, in get_cmd_output_now
       result = self.get_command_output(exe, timeout=timeout, runat=runat)
     File "/usr/lib/python2.7/site-packages/sos/plugins/__init__.py", line 464, in get_command_output
       result = sos_get_command_output(prog, timeout=timeout, runat=runat)
     File "/usr/lib/python2.7/site-packages/sos/utilities.py", line 144, in sos_get_command_output
       args = shlex.split(command)
     File "/usr/lib64/python2.7/shlex.py", line 279, in split
       return list(lex)
     File "/usr/lib64/python2.7/shlex.py", line 269, in next
       token = self.get_token()
     File "/usr/lib64/python2.7/shlex.py", line 96, in get_token
       raw = self.read_token()
     File "/usr/lib64/python2.7/shlex.py", line 172, in read_token
       raise ValueError, "No closing quotation"
   ValueError: No closing quotation

Additional info:
Trivial backport of https://github.com/sosreport/sos/commit/d19bc046d549aaf634314a257dd22623df731648

Comment 5 Pavel Moravec 2015-04-20 10:42:46 UTC
On RHEL6, the failing nmcli command behaves differently:

NetworkManager running:
# nmcli --terse --fields NAME con show
Usage: nmcli con { COMMAND | help }
  COMMAND := { list | status | up | down }

  list [id <id> | uuid <id> | system | user]
  status
  up id <id> | uuid <id> [iface <iface>] [ap <hwaddr>] [--nowait] [--timeout <timeout>]
  down id <id> | uuid <id>
Error: 'con' command 'show' is not valid.
#

NetworkManager stopped:
# nmcli --terse --fields NAME con show

** (process:18769): WARNING **: get_all_cb: couldn't retrieve system settings properties: (2) The name org.freedesktop.NetworkManagerSystemSettings was not provided by any .service files.

** (process:18769): WARNING **: fetch_connections_done: error fetching system connections: (2) The name org.freedesktop.NetworkManagerSystemSettings was not provided by any .service files.


(and the command is stuck - IMHO a bug in nmcli)


When sosreport runs with NetworkManager disabled, collecting this nmcli output simply times out, so the upstream patch referenced in this BZ is in fact irrelevant here. From the end user's perspective, sos (which newly calls nmcli commands) shows a regression because it starts to time out. No exception, just a timeout.

Clearing the original needinfo but raising a new one, also to Shane, to let him decide whether it makes sense to make the nmcli calls conditional on positive "nmcli nm" output (the status of NetworkManager).

It is hard to say whether we want to work around a RHEL6-specific nmcli bug in sos, or leave the timeout regression there (until the nmcli hang is fixed?).

Comment 7 Bryn M. Reeves 2015-04-20 15:29:05 UTC
Completely different problems. This bugzilla is *only* about the fact that the networking plugin does not test exit status before attempting to use the results of calling nmcli; this causes sos to paste an nmcli error message into a subsequent nmcli call string which results in the backtrace in comment #0. That problem is fixed in commit d19bc04.

If nmcli behaves differently on RHEL6 and that causes consequences for sos users (silent timeouts) that requires a new bug.
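
For illustration only, here is a minimal Python sketch of the pattern described above: check the exit status of the listing call before turning its output into further command strings. It is not the code from commit d19bc04. The function names are hypothetical, and `status` and `output` stand in for the result of an earlier nmcli call.

```python
import shlex


def build_per_connection_commands(status, output):
    """Return follow-up command strings, or [] if the listing call failed.

    `status` and `output` are assumed to be the exit status and stdout of
    an earlier connection-listing nmcli call (hypothetical inputs here).
    """
    if status != 0:
        # The output is an error message, not a list of connection names,
        # so it must not be pasted into further command strings.
        return []
    commands = []
    for line in output.splitlines():
        name = line.strip()
        if name:
            # Quote each name so spaces or quote characters survive a
            # later shlex.split() of the command string.
            commands.append("nmcli con show id %s" % shlex.quote(name))
    return commands


def splits_cleanly(command):
    """Report whether shlex.split() accepts the command string."""
    try:
        shlex.split(command)
        return True
    except ValueError:
        # Raised for unbalanced quotes ("No closing quotation").
        return False
```

With a non-zero status the sketch returns an empty list, so no error text reaches a later command string. The second helper shows the failure mode from the traceback: a command string containing an unbalanced quote makes shlex.split() raise ValueError.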

Comment 8 Bryn M. Reeves 2015-04-20 16:03:13 UTC
There are two problems here (other than the original nmcli backtrace):

(1). an apparent bug in NM on RHEL6 (at least up to 0.8.1-75.el6.x86_64)
     this bug causes nmcli DBus operations to hang forever.

(2). as a consequence of (1) users running the networking plugin of sos on
     affected RHEL6 systems will experience a long (300s) timeout for any
     affected nmcli commands.

RHEL7's NM is unaffected and immediately returns with an error:

# systemctl stop NetworkManager.service
# nmcli con show
Error: NetworkManager is not running.

I would suggest we open a new bug against RHEL6 to reduce the timeout for nmcli commands to something short (10-30s imho) - we can make this change upstream as well and this should give a reasonable compromise for users of affected NM versions (there's no need to rush it into RHEL7 at this stage since it does not ship the NM bug).
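
A rough sketch of the short-timeout idea, using only the Python standard library rather than the sos plugin API. The function name and the 30 second default are assumptions for illustration; the real change would go through whatever timeout parameter sos passes to its command runner.

```python
import subprocess


def run_with_timeout(argv, timeout=30):
    """Run argv and return (status, output), or (None, "") on timeout.

    The 30 second default is only an example value in the 10-30s range
    suggested above; it is not taken from sos itself.
    """
    try:
        proc = subprocess.run(
            argv,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child before raising, so a hung
        # command does not linger after the timeout expires.
        return None, ""
    return proc.returncode, proc.stdout
```

A caller could treat a `None` status the same way as a non-zero exit status and skip any follow-up commands that depend on the output.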

Comment 10 David Kutálek 2015-04-21 13:38:57 UTC
Hmm, what about "Error: 'con' command 'show' is not valid."?
Should we patch sos to use the relevant nmcli command on RHEL6?

Comment 11 Bryn M. Reeves 2015-04-22 13:38:51 UTC
I'd prefer a solution that we can use upstream - i.e. detecting the NM version (either via package manager or tool output) and using the appropriate command verbs.

A simpler approach would be to always collect both variants but that's pretty ugly.
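
As a sketch of the version-detection approach (not an actual sos patch), the verb could be picked from a parsed NM version string. The helper names are hypothetical, and the 0.9.10 threshold is an assumption that would need checking against the NetworkManager release history. The only facts taken from this bug are that 0.8.1 on RHEL6 offers "con list" and rejects "con show", while RHEL7 accepts "con show".

```python
import re

# Assumed first version whose nmcli accepts "con show"; verify before use.
NEW_SYNTAX_FROM = (0, 9, 10)


def parse_version(text):
    """Extract the first (major, minor, micro) tuple found in text."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def connection_list_command(version_text):
    """Return the connection-listing command for the given NM version.

    Returns None when no version can be parsed, so the caller can skip
    the nmcli collection instead of guessing a verb.
    """
    version = parse_version(version_text)
    if version is None:
        return None
    verb = "show" if version >= NEW_SYNTAX_FROM else "list"
    return "nmcli con %s" % verb
```

The version text could come from the package manager or from tool output, as suggested above; the sketch deliberately leaves that lookup out.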

Comment 13 David Kutálek 2015-05-12 14:30:46 UTC
Created attachment 1024617
nmcli commands from latest rebased sosreport doesn't work at all (tried with NM not running)

Comment 27 Bryn M. Reeves 2015-06-18 13:30:10 UTC
Clearing exception & requesting blocker+ for this as the current regression breaks /all/ NetworkManager use cases for RHEL6 (both with the service disabled, and enabled and running).

This needs to be fixed & tested asap.

Comment 30 errata-xmlrpc 2015-07-22 06:33:52 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-1323.html

