Bug 1651599 - timeout in networking plugin when large number of namespaces in nodes
Summary: timeout in networking plugin when large number of namespaces in nodes
Keywords:
Status: CLOSED DUPLICATE of bug 1635214
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: sos
Version: 7.6
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: pre-dev-freeze
Target Release: ---
Assignee: Pavel Moravec
QA Contact: BaseOS QE - Apps
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-11-20 12:19 UTC by anil venkata
Modified: 2019-01-20 09:50 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-01-20 09:50:16 UTC
Target Upstream Version:


Attachments
log from networking plugin (2.23 MB, text/plain)
2018-11-20 12:19 UTC, anil venkata

Description anil venkata 2018-11-20 12:19:36 UTC
Created attachment 1507401
log from networking plugin

Description of problem:
sosreport times out in the networking plugin because it iterates through all namespaces (there are 1125 namespaces) and executes the commands below for each one:

2018-11-20 12:06:39,521 INFO: [plugin:networking] collecting output of 'ip netns exec qrouter-c151e6a8-d635-40d9-8e41-25cf78a4ed79 ip address show'
2018-11-20 12:06:39,606 INFO: [plugin:networking] collecting output of 'ip netns exec qrouter-c151e6a8-d635-40d9-8e41-25cf78a4ed79 ip route show table all'
2018-11-20 12:06:39,704 INFO: [plugin:networking] collecting output of 'ip netns exec qrouter-c151e6a8-d635-40d9-8e41-25cf78a4ed79 iptables-save'
2018-11-20 12:06:39,807 INFO: [plugin:networking] collecting output of 'ip netns exec qrouter-c151e6a8-d635-40d9-8e41-25cf78a4ed79 ss -peaonmi'
2018-11-20 12:06:41,583 INFO: [plugin:networking] collecting output of 'ip netns exec qrouter-c151e6a8-d635-40d9-8e41-25cf78a4ed79 netstat -W -neopa'
2018-11-20 12:06:43,258 INFO: [plugin:networking] collecting output of 'ip netns exec qrouter-c151e6a8-d635-40d9-8e41-25cf78a4ed79 netstat -s'
2018-11-20 12:06:43,341 INFO: [plugin:networking] collecting output of 'ip netns exec qrouter-c151e6a8-d635-40d9-8e41-25cf78a4ed79 netstat -W -agn'
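
The per-namespace pattern in the log excerpt above can be sketched as follows. This is a rough estimate, not sos code: the command list is copied from the log, while the function names are hypothetical and the calculation assumes every namespace takes about as long as the one shown.

```python
from datetime import datetime

# Per-namespace commands, copied from the log excerpt above.
NETNS_COMMANDS = [
    "ip address show",
    "ip route show table all",
    "iptables-save",
    "ss -peaonmi",
    "netstat -W -neopa",
    "netstat -s",
    "netstat -W -agn",
]

NAMESPACE_COUNT = 1125  # figure given in the description


def netns_command_lines(namespace):
    """Build the command lines run for a single namespace (hypothetical helper)."""
    return ["ip netns exec %s %s" % (namespace, cmd) for cmd in NETNS_COMMANDS]


def elapsed_seconds(first_stamp, last_stamp):
    """Seconds between two timestamps in the log's 'YYYY-MM-DD HH:MM:SS,mmm' format."""
    fmt = "%Y-%m-%d %H:%M:%S,%f"
    delta = datetime.strptime(last_stamp, fmt) - datetime.strptime(first_stamp, fmt)
    return delta.total_seconds()


# Time from the start of the first command to the start of the last command
# for the namespace in the log. This excludes the last command's own run time,
# so it slightly underestimates the cost of one namespace.
per_namespace = elapsed_seconds("2018-11-20 12:06:39,521", "2018-11-20 12:06:43,341")

total_commands = NAMESPACE_COUNT * len(NETNS_COMMANDS)
# Assumption: every namespace costs about the same as the one sampled.
estimated_seconds = NAMESPACE_COUNT * per_namespace

print("commands to run:", total_commands)
print("estimated sequential run time (s):", round(estimated_seconds, 1))
```

Under that assumption, 1125 namespaces at roughly 3.8 seconds each come to well over an hour of sequential collection, so the plugin would hit any timeout measured in minutes.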

We can optimise this plugin to spawn workers to finish processing namespaces faster so that we can avoid timeouts.

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 3 Bryn M. Reeves 2018-11-20 12:38:19 UTC
> We can optimise this plugin to spawn workers to finish processing namespaces 
> faster so that we can avoid timeouts.

There is no mechanism to do that. Plugins are bound to a thread from a threadpool and the plugin timeout mechanism operates at that level: there is no way for a plugin to spawn additional deferred work (it would be a bit pointless - we would need another mechanism to check on and time out that work in any case).
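
A rough illustration of the model described above, not the actual sos implementation: each plugin runs its commands one after another on a single pool thread, and the timeout is applied to the plugin as a whole. All names here (`collect`, `run_plugins`, the `stop` event) are hypothetical, and the `time.sleep` call merely stands in for running one command.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


def collect(steps, step_seconds, collected, stop):
    """Hypothetical plugin body: run each step sequentially in this one thread."""
    for step in range(steps):
        if stop.is_set():
            return  # the plugin was timed out; keep what was collected so far
        time.sleep(step_seconds)  # stands in for running one command
        collected.append(step)


def run_plugins(plugins, timeout):
    """Run each plugin on a pool thread and apply one timeout per plugin.

    plugins maps a plugin name to (steps, step_seconds).
    Returns a dict of name -> (status, collected_steps).
    """
    results = {}
    with ThreadPoolExecutor(max_workers=4) as pool:
        jobs = {}
        for name, (steps, step_seconds) in plugins.items():
            collected, stop = [], threading.Event()
            future = pool.submit(collect, steps, step_seconds, collected, stop)
            jobs[name] = (future, collected, stop)
        for name, (future, collected, stop) in jobs.items():
            try:
                # Simplification: the wait starts when result() is called,
                # not when the plugin itself started running.
                future.result(timeout=timeout)
                status = "complete"
            except FutureTimeout:
                # Python cannot kill a thread, so this sketch asks it to stop.
                stop.set()
                status = "timed out"
            results[name] = (status, collected)
    # Leaving the with-block waits for the workers, so the lists are final here.
    return results


if __name__ == "__main__":
    report = run_plugins({"fast": (2, 0.01), "slow": (50, 0.05)}, timeout=0.3)
    for name, (status, collected) in report.items():
        print(name, status, len(collected), "steps collected")
```

In this sketch a timed-out plugin leaves partial data behind while the other plugins still complete, and the plugin has no way to hand work to additional threads.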

In the short term we can increase the plugin timeout (and also provide a mechanism for users to increase it when needed), but making this in some way adaptive to the volume of data we find is a lot more complicated.

Comment 4 Pavel Moravec 2018-11-20 12:48:48 UTC
Spawning threads for the "ip netns .." (or other) commands would be a bigger feature request (e.g. how should the commands be batched within an individual plugin? when to spawn threads and when to stick to the current behaviour?)

Is that required?

Or would it be sufficient to either:
- have a configurable plugin timeout (see bz1635214), optionally increased by default for networking (or rather for the OpenStack preset)?

- have a plugin option to skip collecting per-namespace commands, i.e. to skip:
https://github.com/sosreport/sos/blob/master/sos/plugins/networking.py#L234-L261
?

- or limit those commands to, say, the first 100 entries (I think this is a bad idea, who knows which entries are important to collect)


Is either of those options sufficient?
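
The first two options above could look roughly like this. It is a sketch only: the option names (`user_timeout`, `plugin_defaults`, `skip_namespaces`), the helper functions, and the timeout values are all hypothetical and are not taken from sos.

```python
# Illustrative default only; not a value confirmed anywhere in this bug.
DEFAULT_PLUGIN_TIMEOUT = 300  # seconds


def effective_timeout(plugin, user_timeout=None, plugin_defaults=None):
    """Pick the timeout for a plugin.

    A user-supplied value wins; otherwise a per-plugin default (for example a
    larger one for networking) applies; otherwise the global default is used.
    """
    if user_timeout is not None:
        return user_timeout
    return (plugin_defaults or {}).get(plugin, DEFAULT_PLUGIN_TIMEOUT)


def namespace_commands(namespaces, commands, skip_namespaces=False):
    """Build the per-namespace command lines, or none if the user opts out."""
    if skip_namespaces:
        return []
    return ["ip netns exec %s %s" % (ns, cmd) for ns in namespaces for cmd in commands]


if __name__ == "__main__":
    defaults = {"networking": 1800}  # hypothetical raised default
    print(effective_timeout("networking", plugin_defaults=defaults))
    print(effective_timeout("networking", user_timeout=7200, plugin_defaults=defaults))
    namespaces = ["ns%d" % i for i in range(3)]
    print(len(namespace_commands(namespaces, ["ip address show", "netstat -s"])))
    print(len(namespace_commands(namespaces, ["ip address show"], skip_namespaces=True)))
```

A configurable timeout keeps all data collection intact at the cost of a longer run, whereas the skip option trades the per-namespace data for speed.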

Comment 5 anil venkata 2018-11-21 09:55:48 UTC
Looks like configurable plugin timeout is the feasible option to avoid sosreports hanging because of a plugin timeout.

Comment 6 Bryn M. Reeves 2018-11-21 10:35:36 UTC
> to avoid sosreports hanging because of a plugin timeout.

The sosreport command should never "hang" in this situation - it should continue but with partial data for the timed-out plugin(s). If you are seeing something different then that is a separate bug and should be reported.

Comment 7 Pavel Moravec 2019-01-20 09:50:16 UTC
(In reply to anil venkata from comment #5)
> Looks like configurable plugin timeout is the feasible option to avoid
> sosreports hanging because of a plugin timeout.

Thanks. Closing as dup of 1635214 (planned to 7.7) then.

*** This bug has been marked as a duplicate of bug 1635214 ***

