Bug 1683904
| Summary: | sosreport executes almost 5000 commands and timed out in networking plugin | |||
|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Masahiro Matsuya <mmatsuya> | |
| Component: | sos | Assignee: | Pavel Moravec <pmoravec> | |
| Status: | CLOSED ERRATA | QA Contact: | Miroslav Hradílek <mhradile> | |
| Severity: | medium | Docs Contact: | ||
| Priority: | urgent | |||
| Version: | 7.6 | CC: | agk, astupnik, bmr, cww, fkrska, jjansky, jmaxwell, jraju, mhradile, pamadio, plambri, pmoravec, sbradley | |
| Target Milestone: | rc | Keywords: | ZStream | |
| Target Release: | --- | |||
| Hardware: | x86_64 | |||
| OS: | Linux | |||
| Whiteboard: | ||||
| Fixed In Version: | sos-3.9-2.el7 | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | ||
| Clone Of: | ||||
| : | 1805079 1805080 (view as bug list) | Environment: | ||
| Last Closed: | 2020-09-29 20:55:10 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 1784466, 1805079, 1805080 | |||
Comment 3
Pavel Moravec
2019-03-10 11:12:51 UTC
Scope of 7.7 closed, rescheduling for potential inclusion in 7.8.

Any idea how in particular to improve sosreport behaviour is welcomed - several contradicting requirements are here.

(In reply to Pavel Moravec from comment #5)
> Scope of 7.7 closed, rescheduling for potential inclusion in 7.8.
>
> Any idea how in particular to improve sosreport behaviour is welcomed -
> several contradicting requirements are here.

It could be the "-p" option that is adding to the delay. It means that the command needs to query the process information of that socket and that can increase the execution time. I am not suggesting that we omit the "-p" option because it provides valuable insight.

Matsuya,

If you change:

    ns_cmd_prefix + "ss -peaonmi",
    ns_cmd_prefix + "netstat %s -neopa" % self.ns_wide,

to:

    ns_cmd_prefix + "ss -eaonmi",
    ns_cmd_prefix + "netstat %s -neoa" % self.ns_wide,

Does that decrease the time that it takes to complete the networking plugins?

Regards
Jon

I don't think that playing with a command option helps sufficiently here.
Since in the original sosreport, I see thousands of commands like:

    2019-02-15 13:36:36,353 INFO: [plugin:networking] collecting output of 'ip netns exec qrouter-55acb798-3698-419e-a7b2-00509771c2dc ethtool qg-00431182-a9'
    2019-02-15 13:36:36,412 INFO: [plugin:networking] collecting output of 'ip netns exec qrouter-55acb798-3698-419e-a7b2-00509771c2dc ethtool -i qg-00431182-a9'
    2019-02-15 13:36:36,469 INFO: [plugin:networking] collecting output of 'ip netns exec qrouter-55acb798-3698-419e-a7b2-00509771c2dc ethtool -k qg-00431182-a9'
    2019-02-15 13:36:36,531 INFO: [plugin:networking] collecting output of 'ip netns exec qrouter-55acb798-3698-419e-a7b2-00509771c2dc ethtool -S qg-00431182-a9'

or

    2019-02-15 13:37:59,315 INFO: [plugin:networking] collecting output of 'ip netns exec qdhcp-03b8ebf5-6b9e-4520-a61e-4f033d9ffc0c ethtool tap0177f4b3-a1'
    2019-02-15 13:37:59,381 INFO: [plugin:networking] collecting output of 'ip netns exec qdhcp-03b8ebf5-6b9e-4520-a61e-4f033d9ffc0c ethtool -i tap0177f4b3-a1'
    2019-02-15 13:37:59,443 INFO: [plugin:networking] collecting output of 'ip netns exec qdhcp-03b8ebf5-6b9e-4520-a61e-4f033d9ffc0c ethtool -k tap0177f4b3-a1'
    2019-02-15 13:37:59,510 INFO: [plugin:networking] collecting output of 'ip netns exec qdhcp-03b8ebf5-6b9e-4520-a61e-4f033d9ffc0c ethtool -S tap0177f4b3-a1'

being collected.
Some stats from those commands in the sosreport attached to the support case:

| command type | avg | max | sum | count |
|---|---|---|---|---|
| _count_durations.ethtool_HQ.txt | 63 | 103 | 27754 | 438 |
| _count_durations.ethtool_-i.txt | 63 | 98 | 39476 | 625 |
| _count_durations.ethtool_-k.txt | 63 | 99 | 39531 | 625 |
| _count_durations.ethtool_-S.txt | 63 | 86 | 39561 | 625 |
| _count_durations.ethtool_tap.txt | 63 | 83 | 11811 | 187 |
| _count_durations.ip_addr.txt | 71 | 266 | 24080 | 335 |
| _count_durations.ip_rout.txt | 70 | 282 | 23844 | 336 |
| _count_durations.iptables-save'_.txt | 71 | 346 | 24063 | 336 |
| _count_durations.netstat_-s'.txt | 486 | 719 | 163499 | 336 |
| _count_durations.netstat_-W.-.txt | 261 | 601 | 175809 | 672 |
| _count_durations.ss_-pea.txt | 74 | 1258 | 25068 | 336 |

avg = average time in ms per one command execution
max = maximal time in ms for one command execution
sum = total time in ms across all executions of that command
count = number of such commands executed by the sosreport

So if we are to decrease the sosreport execution time, we should focus on commands like:

    2019-02-15 13:28:08,207 INFO: [plugin:networking] collecting output of 'ip netns exec qrouter-d1c111b0-80d0-46b0-97b4-74f72e9f4220 netstat -W -neopa'
    2019-02-15 13:28:08,691 INFO: [plugin:networking] collecting output of 'ip netns exec qrouter-d1c111b0-80d0-46b0-97b4-74f72e9f4220 netstat -s'
    2019-02-15 13:28:08,765 INFO: [plugin:networking] collecting output of 'ip netns exec qrouter-d1c111b0-80d0-46b0-97b4-74f72e9f4220 netstat -W -agn'
    2019-02-15 13:28:09,458 INFO: [plugin:networking] collecting output of 'ip netns exec qdhcp-1f64034d-8be8-4cc4-b6a2-8ef86473efa7 netstat -W -neopa'
    ..

which took 339 seconds alone (163499 ms + 175809 ms from the two netstat rows). The networking plugin was stopped after 10 minutes, so more than half of the time was spent in those netstat commands.

Anyway, how reasonable is it to iterate over all of the few hundred networking namespaces and collect the output of several commands for each of them? Is all that data useful?

Rescheduling from 7.8 scope for potential inclusion in 7.9. Needinfo (still) pending.
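For anyone wanting to reproduce figures like the avg/max/sum/count above from their own sos log, here is a minimal Python sketch. It is not the script used in this bug. It assumes that a command's duration can be approximated by the gap between its "collecting output of" line and the next one, and the grouping rule (command name plus its first option, after stripping the `ip netns exec <namespace>` prefix) and all function names are hypothetical choices made for illustration.

```python
import re
from collections import defaultdict
from datetime import datetime, timedelta

# Matches the networking plugin log lines quoted in this bug.
LOG_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) INFO: "
    r"\[plugin:networking\] collecting output of '(.+)'$"
)


def parse_events(lines):
    """Return (timestamp, command) pairs for matching log lines."""
    events = []
    for line in lines:
        m = LOG_RE.match(line.strip())
        if m:
            ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S,%f")
            events.append((ts, m.group(2)))
    return events


def group_key(cmd):
    """Hypothetical grouping: drop 'ip netns exec <ns>', keep cmd + first option."""
    tokens = cmd.split()
    if tokens[:3] == ["ip", "netns", "exec"] and len(tokens) > 4:
        tokens = tokens[4:]
    key = tokens[:1]
    if len(tokens) > 1 and tokens[1].startswith("-"):
        key.append(tokens[1])
    return " ".join(key)


def summarize(events):
    """Aggregate per-group durations in ms (assumption: gap to the next line)."""
    buckets = defaultdict(list)
    for (t0, cmd), (t1, _next_cmd) in zip(events, events[1:]):
        ms = (t1 - t0) // timedelta(milliseconds=1)
        buckets[group_key(cmd)].append(ms)
    return {
        key: {
            "avg": sum(values) / len(values),
            "max": max(values),
            "sum": sum(values),
            "count": len(values),
        }
        for key, values in buckets.items()
    }
```

Calling `summarize(parse_events(open(path)))` on a log file yields one entry per command group. The last logged command has no following line, so it gets no duration under this approximation. Any pause between commands is attributed to the preceding command, so the numbers are upper bounds rather than exact run times.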
Upstream discussion kicked off in https://github.com/sosreport/sos/issues/1916

My proposal is described there as well.

Upstream PR merged via [1], codefix will appear in 7.9 and 8.3.

[1] https://github.com/sosreport/sos/commit/c20bd8d489c45401db55cf89bf7d4d0f7623a4fe

This will appear in 7.9 for sure (due to rebase to sos 3.9 that contains the fix).

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (sos bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:4034