Bug 1309422
Summary: | sosreport collides with 3rd party kernel drivers for Dialogic Diva BRI-2 PCIe v2 (ISDN card) | |
---|---|---|---
Product: | Red Hat Enterprise Linux 7 | Reporter: | Robert Scheck <redhat-bugzilla>
Component: | sos | Assignee: | Pavel Moravec <pmoravec>
Status: | CLOSED ERRATA | QA Contact: | Miroslav Hradílek <mhradile>
Severity: | high | Docs Contact: |
Priority: | unspecified | |
Version: | 7.2 | CC: | agk, bmr, gavin, isenfeld, mhradile, plambri, redhat-bugzilla, robert.scheck, sbradley, srandhaw, ssekidde
Target Milestone: | rc | Keywords: | Reopened
Target Release: | --- | |
Hardware: | All | |
OS: | Linux | |
Whiteboard: | | |
Fixed In Version: | sos-3.4-1.el7 | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2017-08-01 23:08:12 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Attachments: | | |
Description
Robert Scheck
2016-02-17 18:36:21 UTC
Created attachment 1127987 [details]
sosreport-3.2-eicon.patch
Cross-filed case 01585233 on the Red Hat customer portal. Attachment #1127987 [details] was needed to get case 01585244 on the Red Hat customer
portal filed. If you need more information about the system, please consult
case 01585244 as well (or let me know what you need).
Thanks for raising the improvement BZ. Two comments on an otherwise fine patch:

1) What types of nodes are the problematic ones? Block/character devices, or sockets (under /proc/net?)? Sosreport calls shutil.copystat and behaves like cp for the files, so this sounds like what is being hit.

2) Isn't it still worth collecting the files mentioned in https://www.dialogic.com/webhelp/Diva/8.5lin/206-324-08/6336.htm ? (Just asking, not knowing the kernel driver.)

I've no clue about the details of the driver (I'm just a user/admin). What I have on this specific system is:

$ find /proc/net/eicon/
/proc/net/eicon/
/proc/net/eicon/diva_idi
/proc/net/eicon/adapter1
/proc/net/eicon/adapter1/dynamic_l1_down
/proc/net/eicon/adapter1/group_optimization
/proc/net/eicon/adapter1/info
/proc/net/eicon/divas
/proc/net/eicon/divadidd
$

The problematic ones were at least /proc/net/eicon/diva_idi and /proc/net/eicon/adapter1/dynamic_l1_down, which seem to fill the disk completely. I aborted sosreport shortly before the disk was full (~25 GB for each of the files).

How can I figure out how large such a file might get while copying, but without calling "cp"? Maybe the excludes should be more precise.

> What types of nodes are the problematic ones? Block/character devices, or
> sockets (under /proc/net?)? Sosreport calls shutil.copystat and behaves like
> cp for the files, so this sounds like what is being hit.

It is a zero-sized inode that returns large volumes of data on read(2). They are commonplace in /proc (e.g. /proc/kcore, /proc/$PID/mem etc.).

> 2) Isn't it still worth collecting the files mentioned in

Possibly, but it's somewhat orthogonal to this bug: this bz is about not messing up when we trip on these files from other plugins that traverse /proc. Adding a plugin to collect data from these Dialogic devices would need to be a separate request.

> How can I figure out how large such a file might get while copying, but
> without calling "cp"? Maybe the excludes should be more precise.

You can't (directly). If it's e.g. a process address space (/proc/$PID/mem), or the kernel address space (/proc/kcore from a kdump kernel), then you can calculate it indirectly, but most of these pseudofiles report a 0-byte size.

I would like to avoid getting the disk 100% filled just by running sosreport while a Dialogic Diva BRI-2 PCIe v2 and its drivers are in use. I'm not sure if there is any partnership between Red Hat and Dialogic (where it could make sense to write a Dialogic plugin), but for me that is out of scope.

This will be addressed in a future Red Hat Enterprise Linux update.

Unfortunately this is a problem with any driver that places files exposing large or unlimited quantities of data in the /proc/net tree. None of the in-tree modules do this, and due to the semantics of /proc (all inodes report 0-size) it is not possible to detect that this is occurring (without resorting to hacks and heuristics). This means that each instance needs to be manually blacklisted to prevent these problems.

Now that we are aware of the problem with the Dialogic cards, there is an issue open upstream to blacklist these devices:

https://github.com/sosreport/sos/issues/777

This will then filter into available package updates for supported releases.

Unfortunately the only immediately applicable workarounds are to either disable the driver prior to running sos (disruptive) or to disable the networking plugin that runs into the Dialogic driver files:

# sosreport -n networking

For problems not relating to network configuration or state this should be acceptable.

Bryn, I filed the support case above to get this addressed within RHEL 7.x, so please add a fix to sosreport for RHEL 7.x. If you need justification or similar in the support case, please let me know.

This has been committed to upstream as:

https://github.com/sosreport/sos/commit/03cfbe57966090d041c4689f8cc3fd291789fb5a

That commit is included in RHEL 7.3 via the sos errata [1], due to the sos rebase [2]. I am closing the bugzilla - please test it, and if some problem with the fix is found, reopen the BZ.

[1] https://rhn.redhat.com/errata/RHBA-2016-2380.html
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1293044

I am sorry, but the issue is not fixed. Running sosreport from RHEL 7.3 on
the affected system, it looks like this:
[root@tux ~]# ls -lh /var/tmp/sos.1DFrYW/sosreport-tux.example.net-20161116191926/proc/net/eicon/adapter1/
total 4.5G
-rw-r--r--. 1 root root 4.5G Nov 16 19:20 info
[root@tux ~]#
[root@tux ~]# ls -lh /var/tmp/sos.1DFrYW/sosreport-tux.example.net-20161116191926/proc/net/eicon/adapter1/
total 4.7G
-rw-r--r--. 1 root root 4.7G Nov 16 19:20 info
[root@tux ~]#
[root@tux ~]# ls -lh /var/tmp/sos.1DFrYW/sosreport-tux.example.net-20161116191926/proc/net/eicon/adapter1/
total 4.9G
-rw-r--r--. 1 root root 4.9G Nov 16 19:20 info
[root@tux ~]#
[root@tux ~]# ls -lh /var/tmp/sos.1DFrYW/sosreport-tux.example.net-20161116191926/proc/net/eicon/adapter1/
total 5.0G
-rw-r--r--. 1 root root 5.0G Nov 16 19:20 info
[root@tux ~]#
Feels like /proc/net/eicon/adapter*/info should be excluded as well now?
So far so good, did that using:
self.add_forbidden_path("/proc/net/eicon/adapter*/info")
Re-running sosreport now leads to this:
[root@tux ~]# ls -lh /var/tmp/sos.i2wvQI/sosreport-tux.example.net-20161116192753/proc/net/eicon/adapter1/
total 199M
-rw-r--r--. 1 root root 199M Nov 16 19:29 group_optimization
[root@tux ~]#
[root@tux ~]# ls -lh /var/tmp/sos.i2wvQI/sosreport-tux.example.net-20161116192753/proc/net/eicon/adapter1/
total 201M
-rw-r--r--. 1 root root 201M Nov 16 19:29 group_optimization
[root@tux ~]#
[root@tux ~]# ls -lh /var/tmp/sos.i2wvQI/sosreport-tux.example.net-20161116192753/proc/net/eicon/adapter1/
total 203M
-rw-r--r--. 1 root root 203M Nov 16 19:29 group_optimization
[root@tux ~]#
[root@tux ~]# ls -lh /var/tmp/sos.i2wvQI/sosreport-tux.example.net-20161116192753/proc/net/eicon/adapter1/
total 205M
-rw-r--r--. 1 root root 205M Nov 16 19:29 group_optimization
[root@tux ~]#
[root@tux ~]# ls -lh /var/tmp/sos.i2wvQI/sosreport-tux.example.net-20161116192753/proc/net/eicon/adapter1/
total 206M
-rw-r--r--. 1 root root 206M Nov 16 19:29 group_optimization
[root@tux ~]#
[root@tux ~]# ls -lh /var/tmp/sos.i2wvQI/sosreport-tux.example.net-20161116192753/proc/net/eicon/adapter1/
total 212M
-rw-r--r--. 1 root root 212M Nov 16 19:29 group_optimization
[root@tux ~]#
Ouch! Okay, let's exclude this path as well:
self.add_forbidden_path("/proc/net/eicon/adapter*/group_optimization")
Result: sosreport succeeds; however, nothing from /proc/net/eicon ends up
in the sosreport tarball anymore. This leads back to the initial proposal, as
per attachment #1127987 [details], to simply exclude the whole directory.
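
To illustrate why the per-file exclusions above turned into a game of whack-a-mole while a single directory exclusion does not, here is a minimal sketch of glob-style path exclusion. The helper `is_forbidden` is hypothetical and written only for this illustration; it is not the matching logic sos itself uses, which may differ.

```python
import fnmatch


def is_forbidden(path, patterns):
    """Return True if path, or any of its parent directories, matches a pattern.

    Hypothetical helper for illustration only, not sos's implementation.
    """
    parts = path.rstrip("/").split("/")
    # Test the full path first, then each ancestor directory in turn
    for end in range(len(parts), 1, -1):
        candidate = "/".join(parts[:end])
        if any(fnmatch.fnmatchcase(candidate, pat) for pat in patterns):
            return True
    return False


# Per-file exclusions, as tried one by one in the comment above
per_file = [
    "/proc/net/eicon/adapter*/info",
    "/proc/net/eicon/adapter*/group_optimization",
]
# Directory exclusion, as in the originally proposed patch
whole_dir = ["/proc/net/eicon"]

for p in ("/proc/net/eicon/adapter1/info", "/proc/net/eicon/diva_idi"):
    print(p, is_forbidden(p, per_file), is_forbidden(p, whole_dir))
```

With the per-file list, any pseudofile the driver adds later stays collectable until someone notices it; with the directory pattern, everything below /proc/net/eicon is skipped regardless of what the driver exposes.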
As per https://github.com/sosreport/sos/pull/892 I am also suggesting the exclusion to upstream. Updated ticket in the Red Hat customer portal, too.

(In reply to Robert Scheck from comment #15)
> As per https://github.com/sosreport/sos/pull/892 I am also suggesting the
> exclusion to upstream. Updated ticket in the Red Hat customer portal, too.

Thanks, I have commented on the upstream PR (it seems OK, it just needs a bit of simplification).

We can add these additional paths (the ones previously identified and tested are already present in 7.3 - unfortunately it seems something either in the driver or in the environment has changed, exposing more pseudofiles with this behaviour). If you have a support relationship with the vendor you may wish to open a case with them, to either remove the problematic files, or to coordinate with other vendors so that we can avoid these ping-pong problems.

In the meantime, we will probably blacklist everything under '/proc/net/eicon' - it's unfortunate since there could feasibly be useful information here, but since these drivers are not upstream and are not shipped in the Red Hat kernel, there is no way for us to test them, or to get notifications when something changes.

POSTed to upstream as:

https://github.com/sosreport/sos/commit/e9458ae0e263ea9997465a46873e9fe7be6ae3c8

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2203
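
For readers wondering what the "hacks and heuristics" mentioned earlier in the thread might look like: since /proc pseudofiles report a 0-byte size, the amount of data cannot be known before reading, but a copy can be stopped at a byte cap. The following is only a sketch of that idea. The function `copy_capped` and its parameters are assumptions made for this example; this is not how sos handles the problem (the fix in this bug is the path blacklist).

```python
import io


def copy_capped(src, dst, limit, chunk=64 * 1024):
    """Copy at most `limit` bytes from src to dst.

    Returns (bytes_copied, truncated). Hypothetical helper, not part of sos.
    """
    copied = 0
    while copied < limit:
        data = src.read(min(chunk, limit - copied))
        if not data:
            # Source ended before the cap was reached
            return copied, False
        dst.write(data)
        copied += len(data)
    # Cap reached: probe one more byte to learn whether data remained
    return copied, bool(src.read(1))


# In-memory stand-in for a pseudofile that returns far more data than wanted
source = io.BytesIO(b"x" * 1000)
target = io.BytesIO()
print(copy_capped(source, target, limit=100, chunk=30))
```

On a real system the source would be a file opened in binary mode, for example `open("/proc/net/eicon/diva_idi", "rb")`. The cap bounds disk usage, but it also means the collected file may be silently incomplete, which is one reason a heuristic like this is less attractive than an explicit exclusion.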