Bug 1869724
| Summary: | sosreport running 'ethtool -e' is causing bnx2x NICs to pause | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | suresh kumar <surkumar> |
| Component: | sos | Assignee: | Pavel Moravec <pmoravec> |
| Status: | CLOSED ERRATA | QA Contact: | Miroslav Hradílek <mhradile> |
| Severity: | urgent | Docs Contact: | |
| Priority: | urgent | ||
| Version: | 8.3 | CC: | agk, bmr, mhradile, plambri, ptalbert, sbradley |
| Target Milestone: | rc | Keywords: | OtherQA |
| Target Release: | 8.0 | Flags: | pm-rhel: mirror+ |
| Hardware: | All | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | sos-3.9.1-6.el8 | Doc Type: | If docs needed, set a value |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2020-11-04 01:58:15 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
I have submitted an upstream patch to remove ethtool -e for bnx2x NICs which is accepted.

https://github.com/sosreport/sos/commit/34c77d6902ee1df403dc3836b4092d413fb95350

+++
$ git show 34c77d69
commit 34c77d6902ee1df403dc3836b4092d413fb95350
Author: suresh2514 <suresh2514>
Date:   Fri Aug 14 22:59:34 2020 +0530

    [networking] remove 'ethtool -e' option for bnx2x NICs

    Running EEPROM dump (ethtool -e) can result in bnx2x driver NICs to
    pause for few seconds and is not recommended in production environment.

    Resolves: #2188
    Resolves: #2200

    Signed-off-by: suresh2514 <suresh2514>
    Signed-off-by: Jake Hunsaker <jhunsake>

diff --git a/sos/report/plugins/networking.py b/sos/report/plugins/networking.py
index ba9c0fb1..397549a5 100644
--- a/sos/report/plugins/networking.py
+++ b/sos/report/plugins/networking.py
@@ -198,7 +198,6 @@ class Networking(Plugin):
                 "ethtool -a " + eth,
                 "ethtool -c " + eth,
                 "ethtool -g " + eth,
-                "ethtool -e " + eth,
                 "ethtool -P " + eth,
                 "ethtool -l " + eth,
                 "ethtool --phy-statistics " + eth,
@@ -206,6 +205,17 @@ class Networking(Plugin):
                 "ethtool --show-eee " + eth
             ], tags=eth)
 
+            # skip EEPROM collection for 'bnx2x' NICs as this command
+            # can pause the NIC and is not production safe.
+            bnx_output = {
+                "cmd": "ethtool -i %s" % eth,
+                "output": "bnx2x"
+            }
+            bnx_pred = SoSPredicate(self,
+                                    cmd_outputs=bnx_output,
+                                    required={'cmd_outputs': 'none'})
+            self.add_cmd_output("ethtool -e %s" % eth, pred=bnx_pred)
+
             # Collect information about bridges (some data already collected via
             # "ip .." commands)
             self.add_cmd_output([
+++

Test result for above patch:

+++
Setting up archive ...
Setting up plugins ...
...
[plugin:networking] skipped command 'ethtool -e em2':  <--------------------- bnx2x NIC
[plugin:networking] skipped command 'ethtool -e em1':  <--------------------- bnx2x NICs
Running plugins. Please wait ...

  Starting 1/1   networking      [Running: networking]
  Finished running plugins
Creating compressed archive...

Your sosreport has been generated and saved in:
  /var/tmp/sosreport-dell-pem630-01-2020-08-14-ixpdmsw.tar.xz

 Size   1.24MiB
 Owner  root
 md5    a2c236193997733cc383ebdf2bac478f

Please send this file to your support representative.

real    0m3.718s  <------- Without this patch, it was taking 12s to complete sosreport.
user    0m2.028s
sys     0m0.896s
+++

I can add it to RHEL 8.3.0 but we are limited on QE capacity. If/Once a candidate package is available, could you verify it, please?

(In reply to Pavel Moravec from comment #2)
> I can add it to RHEL 8.3.0 but we are limited on QE capacity. If/Once a
> candidate package is available, could you verify it, please?

sure

regards

Hello,
could you please verify the fix against below build? Thanks in advance.

A yum repository for the build of sos-3.9.1-6.el8 (task 30820540) is available at:

http://brew-task-repos.usersys.redhat.com/repos/official/sos/3.9.1/6.el8/

You can install the rpms locally by putting this .repo file in your /etc/yum.repos.d/ directory:

http://brew-task-repos.usersys.redhat.com/repos/official/sos/3.9.1/6.el8/sos-3.9.1-6.el8.repo

RPMs and build logs can be found in the following locations:

http://brew-task-repos.usersys.redhat.com/repos/official/sos/3.9.1/6.el8/noarch/

The full list of available rpms is:

http://brew-task-repos.usersys.redhat.com/repos/official/sos/3.9.1/6.el8/noarch/sos-3.9.1-6.el8.src.rpm
http://brew-task-repos.usersys.redhat.com/repos/official/sos/3.9.1/6.el8/noarch/sos-3.9.1-6.el8.noarch.rpm
http://brew-task-repos.usersys.redhat.com/repos/official/sos/3.9.1/6.el8/noarch/sos-audit-3.9.1-6.el8.noarch.rpm

The repository will be available for the next 60 days. Scratch build output will be deleted earlier, based on the Brew scratch build retention policy.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (sos bug fix and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:4534
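The skip rule introduced by the upstream patch can be illustrated outside of sos. The following is only a minimal sketch, not the sos implementation: the function names `driver_from_ethtool_info` and `should_collect_eeprom` are hypothetical, and unlike the patch's predicate (which looks for the string "bnx2x" in the `ethtool -i` output) this sketch compares the `driver:` field exactly. It works on already-captured `ethtool -i` text, so it does not touch any NIC.

```python
def driver_from_ethtool_info(info_output):
    """Return the value of the 'driver:' field from `ethtool -i` text, or None."""
    for line in info_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "driver":
            return value.strip()
    return None


def should_collect_eeprom(info_output, skip_drivers=("bnx2x",)):
    """Return False when the NIC driver is one for which `ethtool -e` is unsafe.

    Assumption of this sketch: if no 'driver:' line is found, collection
    is allowed, because the driver cannot be identified as a risky one.
    """
    return driver_from_ethtool_info(info_output) not in skip_drivers


if __name__ == "__main__":
    # Abbreviated `ethtool -i em1` output taken from this bug report.
    sample = (
        "driver: bnx2x\n"
        "version: 1.713.36-0 storm 7.13.1.0\n"
        "firmware-version: FFV7.12.19 bc 7.12.5\n"
        "bus-info: 0000:01:00.0\n"
    )
    print(should_collect_eeprom(sample))  # False: EEPROM dump is skipped
```

Matching on the `driver:` field rather than on the whole output avoids a false match if "bnx2x" ever appears in another field, at the cost of being stricter than the predicate used in the patch.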
Description of problem:

[1] Customer observed that their application is stuck for ~4 seconds while sosreport is executing. The issue was further tracked down to the 'ethtool -e' command. Checking the strace, we could see that the ioctl for reading the EEPROM returned after 3.444488 seconds.

+++
26621 1595999211.585459 socket(AF_INET, SOCK_DGRAM, IPPROTO_IP) = 3<UDP:[21070872]> <0.000016>
26621 1595999211.585748 ioctl(3<UDP:[21070872]>, SIOCETHTOOL, 0x7fffffffe530) = 0 <0.000026>
26621 1595999211.586002 mmap(NULL, 2101248, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7ffff750a000 <0.000012>
26621 1595999211.587348 ioctl(3<UDP:[21070872]>, SIOCETHTOOL, 0x7fffffffe530 <unfinished ...>
26621 1595999215.032016 <... ioctl resumed> ) = 0 <3.444488>     <<<<<
+++

NIC version:

+++
driver: bnx2x
version: 1.713.36-0 storm 7.13.1.0
firmware-version: mbi 7.15.64 bc 7.14.62
+++

Version-Release number of selected component (if applicable):

sos version >= 3.7

sosreport has supported 'ethtool -e' from version 3.7 onwards.

+++
$ git show 8b989aeb
commit 8b989aebc9c152430fc57f918a8e90210a792a9f
Author: Patrick Talbert <ptalbert>
Date:   Thu Dec 6 13:14:38 2018 +0100

    [networking] Extend ethtool command set

    Update the list of ethtool commands to include:

    ethtool -e (EEPROM dump)
    ethtool -P (permanent MAC address)
    ethtool -l (channel/queue settings)
    ethtool --phy-statistics
    ethtool --show-priv-flags
    ethtool --show-eee

    All of the above are helpful in understanding the state of modern NICs.
    And -P is nice to have as otherwise there is no reliable way to see the
    permanent MAC of team ports.
...
+++

How reproducible:

Always. Run sosreport on a system with a bnx2x NIC. Below is the test result from system dell-pem630-01.

+++
System Information
        Manufacturer: Dell Inc.
        Product Name: PowerEdge M630
        Version: Not Specified
        Serial Number: 1V8QT52
        UUID: 4c4c4544-0056-3810-8051-b1c04f543532
+++
# ethtool -i em1
driver: bnx2x    <------------------
version: 1.713.36-0 storm 7.13.1.0
firmware-version: FFV7.12.19 bc 7.12.5
expansion-rom-version:
bus-info: 0000:01:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes
+++

The EEPROM dump takes a long time:

+++
# time ethtool -e em1
...
0x1fff80:       00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x1fff90:       00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x1fffa0:       00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x1fffb0:       00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x1fffc0:       00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x1fffd0:       00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x1fffe0:       00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x1ffff0:       00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

real    0m12.674s   <-----------------
user    0m0.090s
sys     0m1.294s
+++

Actual results:

sosreport takes a long time to complete and bnx2x NICs pause while it runs.

Expected results:

sosreport should not affect NIC operation.

Additional info:

[1] In another instance (https://bugzilla.redhat.com/show_bug.cgi?id=1846708), we observed that sosreport breaks iDRAC connectivity. The issue was again tracked to the "ethtool -e" command.

[2] Dell has an advisory regarding this:

+++
https://www.dell.com/support/manuals/in/en/inbsdt1/red-hat-entps-lx-v7.0/rhel_7.7_rn/reading-eeprom-from-a-broadcom-device-via-ethtool-results-in-soft-lockup?guid=guid-986ca2a9-c9f9-4345-8762-02d286cc0d1f&lang=en-us
+++

[3] We also have a case where abrtd triggers sosreport and CPUs get into soft lockup very often.

So, it is better to avoid having sosreport run 'ethtool -e' on bnx2x NICs, as it is not production safe.
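The slow ioctl in the strace excerpt of this description was spotted by its per-call duration, the value in angle brackets at the end of each completed call. A minimal sketch of automating that check follows. It assumes the trace was captured with a timing option such as `strace -T`, so that durations appear as `<seconds>`; the function name `slow_syscalls` and the 1-second threshold are hypothetical choices for illustration.

```python
import re

# Matches a duration such as "<3.444488>"; markers like "<unfinished ...>",
# "<... ioctl resumed>" or "<UDP:[21070872]>" do not match this pattern.
_DURATION = re.compile(r"<(\d+\.\d+)>")


def slow_syscalls(strace_lines, threshold=1.0):
    """Return (seconds, line) pairs for calls that took at least `threshold` seconds."""
    slow = []
    for line in strace_lines:
        durations = _DURATION.findall(line)
        if not durations:
            continue  # unfinished call or a line without timing information
        seconds = float(durations[-1])  # the duration is the last such field
        if seconds >= threshold:
            slow.append((seconds, line.strip()))
    return slow


if __name__ == "__main__":
    # Lines copied from the strace excerpt in this bug report.
    trace = [
        "26621 1595999211.585748 ioctl(3<UDP:[21070872]>, SIOCETHTOOL, 0x7fffffffe530) = 0 <0.000026>",
        "26621 1595999211.587348 ioctl(3<UDP:[21070872]>, SIOCETHTOOL, 0x7fffffffe530 <unfinished ...>",
        "26621 1595999215.032016 <... ioctl resumed> ) = 0 <3.444488>",
    ]
    for seconds, line in slow_syscalls(trace):
        print(f"{seconds:.6f}s  {line}")
```

For a resumed call, the reported line does not name the ioctl arguments; the matching `<unfinished ...>` line for the same PID has to be looked up separately, which this sketch does not attempt.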