Bug 1651599

Summary: timeout in networking plugin when large number of namespaces in nodes
Product: Red Hat Enterprise Linux 7 Reporter: anil venkata <vkommadi>
Component: sos Assignee: Pavel Moravec <pmoravec>
Status: CLOSED DUPLICATE QA Contact: BaseOS QE - Apps <qe-baseos-apps>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 7.6 CC: agk, apevec, bmr, gavin, lhh, plambri, sbradley, vkommadi
Target Milestone: pre-dev-freeze   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-01-20 09:50:16 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
log from networking plugin none

Description anil venkata 2018-11-20 12:19:36 UTC
Created attachment 1507401 [details]
log from networking plugin

Description of problem:
sosreport times out in the networking plugin because it iterates through all namespaces (there are 1125 of them) and executes the commands below in each one:

2018-11-20 12:06:39,521 INFO: [plugin:networking] collecting output of 'ip netns exec qrouter-c151e6a8-d635-40d9-8e41-25cf78a4ed79 ip address show'
2018-11-20 12:06:39,606 INFO: [plugin:networking] collecting output of 'ip netns exec qrouter-c151e6a8-d635-40d9-8e41-25cf78a4ed79 ip route show table all'
2018-11-20 12:06:39,704 INFO: [plugin:networking] collecting output of 'ip netns exec qrouter-c151e6a8-d635-40d9-8e41-25cf78a4ed79 iptables-save'
2018-11-20 12:06:39,807 INFO: [plugin:networking] collecting output of 'ip netns exec qrouter-c151e6a8-d635-40d9-8e41-25cf78a4ed79 ss -peaonmi'
2018-11-20 12:06:41,583 INFO: [plugin:networking] collecting output of 'ip netns exec qrouter-c151e6a8-d635-40d9-8e41-25cf78a4ed79 netstat -W -neopa'
2018-11-20 12:06:43,258 INFO: [plugin:networking] collecting output of 'ip netns exec qrouter-c151e6a8-d635-40d9-8e41-25cf78a4ed79 netstat -s'
2018-11-20 12:06:43,341 INFO: [plugin:networking] collecting output of 'ip netns exec qrouter-c151e6a8-d635-40d9-8e41-25cf78a4ed79 netstat -W -agn'

We can optimise this plugin to spawn workers to finish processing namespaces faster so that we can avoid timeouts.
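
A rough estimate of the total runtime can be derived from the log excerpt above. The sketch below assumes the sampled qrouter namespace is representative of all 1125 and treats the gap between the first and last quoted timestamps as a lower bound for one namespace, since the runtime of the final command is not visible in the excerpt.

```python
from datetime import datetime

# Timestamps of the first and last log lines quoted above for one namespace.
FMT = "%Y-%m-%d %H:%M:%S,%f"
first = datetime.strptime("2018-11-20 12:06:39,521", FMT)
last = datetime.strptime("2018-11-20 12:06:43,341", FMT)

# Lower bound per namespace: the last command's own runtime is not included.
per_namespace = (last - first).total_seconds()

# Assumption: every namespace costs about as much as the sampled one.
namespaces = 1125
total = per_namespace * namespaces

print(f"{per_namespace:.2f} s per namespace, roughly {total / 60:.0f} min in total")
```

Under these assumptions, sequential collection takes more than an hour, so a plugin timeout measured in minutes would be reached long before the loop finishes.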

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 3 Bryn M. Reeves 2018-11-20 12:38:19 UTC
> We can optimise this plugin to spawn workers to finish processing namespaces 
> faster so that we can avoid timeouts.

There is no mechanism to do that. Plugins are bound to a thread from a threadpool and the plugin timeout mechanism operates at that level: there is no way for a plugin to spawn additional deferred work (it would be a bit pointless - we would need another mechanism to check on and time out that work in any case).

In the short term we can increase the plugin timeout (and also provide a mechanism for users to increase it when needed), but making this in some way adaptive to the volume of data we find is a lot more complicated.
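
As a generic illustration of a timeout applied at the plugin/thread level, here is a minimal sketch. It is not the sos implementation: `run_plugin`, `collect`, the stop event and the sleep-based "commands" are all hypothetical stand-ins. It shows a plugin that exceeds its timeout being cut short while the data gathered so far is kept and the other plugins complete normally.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError


def run_plugin(n_items, per_item, output, stop):
    """Hypothetical plugin: collects n_items pieces of data one after another."""
    for i in range(n_items):
        if stop.is_set():        # cooperative cancellation after a timeout
            return
        time.sleep(per_item)     # stands in for running one command
        output.append(i)         # partial output is kept as it arrives


def collect(plugins, timeout):
    """Run each plugin in its own pool thread; apply one timeout per plugin."""
    results = {name: [] for name, _, _ in plugins}
    stops = {name: threading.Event() for name, _, _ in plugins}
    timed_out = []
    with ThreadPoolExecutor(max_workers=len(plugins)) as pool:
        start = time.monotonic()
        futures = {
            name: pool.submit(run_plugin, n, delay, results[name], stops[name])
            for name, n, delay in plugins
        }
        for name, future in futures.items():
            # All plugins started together, so measure from the common start.
            remaining = max(0.0, timeout - (time.monotonic() - start))
            try:
                future.result(timeout=remaining)
            except TimeoutError:
                timed_out.append(name)
                stops[name].set()    # ask the worker to stop; keep partial data
    return results, timed_out


# "networking" would need about 10 s here but only gets 0.2 s.
results, timed_out = collect(
    [("fast", 3, 0.01), ("networking", 1000, 0.01)], timeout=0.2
)
print(timed_out, len(results["fast"]), len(results["networking"]))
```

In this sketch the timeout is enforced by the collector that owns the pool, not by the plugin. Any extra workers a plugin spawned on its own would fall outside that check, which is the concern raised above.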

Comment 4 Pavel Moravec 2018-11-20 12:48:48 UTC
Spawning threads for the "ip netns .." (or other) commands would be a bigger feature request (e.g. how should the commands be batched within an individual plugin? When should we spawn threads and when should we stick to the current behaviour?)

Is that required?

Or would it be sufficient to either:
- have a configurable plugin timeout (see bz1635214), optionally increased by default for networking (or rather for the OpenStack preset)?

- have plugin option to skip collecting per-namespace commands, to skip:
https://github.com/sosreport/sos/blob/master/sos/plugins/networking.py#L234-L261
?

- or limit those commands to, say, the first 100 entries (I think this is a bad idea; who knows which entries are important to collect)


Is either of those options sufficient?
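
To make the second and third options concrete, here is a minimal sketch of how they would change the amount of work. It is not code from the sos networking plugin: the `namespace_commands` helper and its `collect_namespaces` and `limit` parameters are hypothetical, and the command list is taken from the log in the description.

```python
# Per-namespace commands as they appear in the log in the description.
NS_COMMANDS = [
    "ip address show",
    "ip route show table all",
    "iptables-save",
    "ss -peaonmi",
    "netstat -W -neopa",
    "netstat -s",
    "netstat -W -agn",
]


def namespace_commands(namespaces, collect_namespaces=True, limit=None):
    """Build the per-namespace command list (hypothetical helper).

    collect_namespaces=False skips per-namespace collection entirely;
    limit=N keeps only the first N namespaces.
    """
    if not collect_namespaces:
        return []
    selected = namespaces if limit is None else namespaces[:limit]
    return [f"ip netns exec {ns} {cmd}" for ns in selected for cmd in NS_COMMANDS]


# Placeholder namespace names; the real ones look like qrouter-<uuid>.
names = [f"qrouter-{i}" for i in range(1125)]

everything = namespace_commands(names)
first_100 = namespace_commands(names, limit=100)
skipped = namespace_commands(names, collect_namespaces=False)
print(len(everything), len(first_100), len(skipped))
```

With 1125 namespaces and the seven commands from the log, the unrestricted case means 7875 command invocations. A limit of 100 cuts that to 700 but silently drops the remaining namespaces, which is the drawback noted for that option.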

Comment 5 anil venkata 2018-11-21 09:55:48 UTC
Looks like configurable plugin timeout is the feasible option to avoid sosreports hanging because of a plugin timeout.

Comment 6 Bryn M. Reeves 2018-11-21 10:35:36 UTC
> to avoid sosreports hanging because of a plugin timeout.

The sosreport command should never "hang" in this situation - it should continue but with partial data for the timed-out plugin(s). If you are seeing something different then that is a separate bug and should be reported.

Comment 7 Pavel Moravec 2019-01-20 09:50:16 UTC
(In reply to anil venkata from comment #5)
> Looks like configurable plugin timeout is the feasible option to avoid
> sosreports hanging because of a plugin timeout.

Thanks. Closing as a duplicate of 1635214 (planned for 7.7) then.

*** This bug has been marked as a duplicate of bug 1635214 ***