Bug 1309422

Summary: sosreport collides with 3rd party kernel drivers for Dialogic Diva BRI-2 PCIe v2 (ISDN card)
Product: Red Hat Enterprise Linux 7
Reporter: Robert Scheck <redhat-bugzilla>
Component: sos
Assignee: Pavel Moravec <pmoravec>
Status: CLOSED ERRATA
QA Contact: Miroslav Hradílek <mhradile>
Severity: high
Docs Contact:
Priority: unspecified
Version: 7.2
CC: agk, bmr, gavin, isenfeld, mhradile, plambri, redhat-bugzilla, robert.scheck, sbradley, srandhaw, ssekidde
Target Milestone: rc
Keywords: Reopened
Target Release: ---
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version: sos-3.4-1.el7
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-08-01 23:08:12 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
  sosreport-3.2-eicon.patch (flags: none)

Description Robert Scheck 2016-02-17 18:36:21 UTC
Description of problem:
sosreport collides with 3rd party kernel drivers for Dialogic Diva BRI-2
PCIe v2 (ISDN card): We are unable to run sosreport, because the step

  Running 49/85: networking...

never finishes; instead, the disk runs out of space. Tracking this down while
sosreport is running leads to a file like

/var/tmp/sos.G8Bna1/sosreport-tux.example.net-20160217190519/proc/net/eicon/diva_idi

which grows to 25 GB or more.

$ ls -l /proc/net/eicon/diva_idi 
-rw-r--r--. 1 root root 0 Feb 17 19:27 /proc/net/eicon/diva_idi
$ 

This "file" is created by the 3rd party kernel drivers for Dialogic Diva
BRI-2 PCIe v2 (ISDN card), which are provided by the hardware vendor.

Version-Release number of selected component (if applicable):
sos-3.2-35.el7_2.3.noarch

How reproducible:
Every time; see above and below.

Actual results:
sosreport collides with 3rd party kernel drivers for Dialogic Diva BRI-2
PCIe v2 (ISDN card).

Expected results:
No collision between sosreport and 3rd party kernel drivers for Dialogic
Diva BRI-2 PCIe v2 (ISDN card).

Additional info:
Once I excluded the /proc/net/eicon/diva_idi file with a manual hack in
the sosreport plugin, the next file, /proc/net/eicon/dynamic_l1_down,
grew to about the same size.

Comment 1 Robert Scheck 2016-02-17 18:38:37 UTC
Created attachment 1127987
sosreport-3.2-eicon.patch

Comment 3 Robert Scheck 2016-02-17 18:49:57 UTC
Cross-filed case 01585233 on the Red Hat customer portal.

Comment 4 Robert Scheck 2016-02-17 19:02:13 UTC
Attachment #1127987 was needed to file case 01585244 on the Red Hat customer
portal. If you need more information about the system, please consult
case 01585244 as well (or let me know what you need).

Comment 5 Pavel Moravec 2016-02-20 17:17:14 UTC
Thanks for raising the improvement BZ.

Two comments on the otherwise fine patch:

1) What types of nodes are the problematic ones? Block/character devices, or sockets (under /proc/net?)? Sosreport calls shutil.copystat and otherwise behaves like cp for these files, so this sounds like it could be hit.


2) Isn't it still worth collecting the files mentioned on the following page? (Just asking; I don't know the kernel driver.)

https://www.dialogic.com/webhelp/Diva/8.5lin/206-324-08/6336.htm

Comment 6 Robert Scheck 2016-02-20 19:20:14 UTC
I have no clue about the details of the driver (I'm just a user/admin).
This is what I have on this specific system:

$ find /proc/net/eicon/
/proc/net/eicon/
/proc/net/eicon/diva_idi
/proc/net/eicon/adapter1
/proc/net/eicon/adapter1/dynamic_l1_down
/proc/net/eicon/adapter1/group_optimization
/proc/net/eicon/adapter1/info
/proc/net/eicon/divas
/proc/net/eicon/divadidd
$ 

The problematic ones were at least /proc/net/eicon/diva_idi and
/proc/net/eicon/dynamic_l1_down, which seem to fill the disk completely.
I aborted sosreport shortly before the disk was full (~25 GB for each of
the files).

How can I figure out how large such a file might get while copying, but
without calling "cp"? Maybe the exclusions should be more precise.

Comment 7 Bryn M. Reeves 2016-02-22 09:33:05 UTC
> What types of nodes are the problematic? Some block/character device or 
> sockets (under /proc/net?)? Sosreport calls shutil.copystat / behaves like cp 
> for the files, this sounds be hit.

It is a zero-sized inode that returns large volumes of data on read(2). These are commonplace in /proc (e.g. /proc/kcore, /proc/$PID/mem, etc.).
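To illustrate the zero-sized-inode behaviour, here is a minimal Python sketch (not part of sos) that compares the size os.stat() reports for a file with the number of bytes a bounded read actually returns. The helper name stat_vs_read and the 64 KiB default cap are hypothetical choices for illustration only.

```python
import os


def stat_vs_read(path, limit=64 * 1024):
    """Return (reported_size, bytes_read), reading at most `limit` bytes.

    On procfs, st_size is usually 0 even though read(2) returns data,
    so the reported size says nothing about how much a copy would produce.
    The cap keeps this helper safe on pseudofiles that never hit EOF.
    """
    reported = os.stat(path).st_size
    total = 0
    with open(path, "rb") as f:
        while total < limit:
            chunk = f.read(min(4096, limit - total))
            if not chunk:  # EOF reached before the cap
                break
            total += len(chunk)
    return reported, total
```

On a Linux host, stat_vs_read("/proc/self/status") should report a size of 0 while still returning a nonzero number of bytes; a regular file reports matching numbers (up to the cap).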

> 2) Isn't it still worth collecting files mentioned in

Possibly, but it's somewhat orthogonal to this bug: this BZ is about not failing when other plugins that traverse /proc trip over these files. Adding a plugin to collect data from these Dialogic devices would need to be a separate request.

> How can I figure out how large such a file might get while copying, but
> without calling "cp"? Maybe the excludes should happen more precise.

You can't (directly). If it's e.g. a process address space (/proc/$PID/mem) or the kernel address space (/proc/kcore from a kdump kernel), then you can calculate it indirectly, but most of these pseudofiles report a 0-byte size.
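For the process-address-space case, one indirect calculation is sketched below, assuming the standard /proc/$PID/maps format in which each line begins with a hexadecimal "start-end" range. Summing end - start over all lines gives the total mapped size, an upper bound on what /proc/$PID/mem can expose, since only mapped regions are readable. The helper name mapped_bytes is hypothetical and not part of sos.

```python
def mapped_bytes(maps_text):
    """Sum the sizes of all regions in /proc/<pid>/maps-style text."""
    total = 0
    for line in maps_text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        # First field is the address range, e.g. "00400000-0040b000"
        start_s, end_s = line.split(None, 1)[0].split("-")
        total += int(end_s, 16) - int(start_s, 16)
    return total
```

Usage on Linux: open /proc/self/maps, read it, and pass the text to mapped_bytes(). Nothing comparable exists for opaque driver pseudofiles such as the Dialogic ones.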

Comment 8 Robert Scheck 2016-02-22 21:21:40 UTC
I would like to avoid getting the disk 100% filled just by running sosreport
while the Dialogic Diva BRI-2 PCIe v2 and its drivers are in use. I'm not sure
if there is any partnership between Red Hat and Dialogic (in which case it
could make sense to write a Dialogic plugin), but to me that is out of scope
here.

Comment 9 Bryn M. Reeves 2016-02-23 09:55:25 UTC
This will be addressed in a future Red Hat Enterprise Linux update. 

Unfortunately, this is a problem with any driver that places files exposing large or unlimited quantities of data in the /proc/net tree. None of the in-tree modules do this, and due to the semantics of /proc (all inodes report a size of 0) it is not possible to detect that this is occurring without resorting to hacks and heuristics. This means that each instance needs to be blacklisted manually to prevent these problems.
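As an example of the kind of heuristic mentioned above, a size-capped copy stops after a fixed number of bytes and flags the result as truncated if the source still has data. This is a hypothetical sketch, not how sos copies files; the function name capped_copy and its parameters are illustrative assumptions.

```python
def capped_copy(src, dst, max_bytes, chunk_size=64 * 1024):
    """Copy at most max_bytes from src to dst.

    Returns (bytes_written, truncated). `truncated` is True when the
    source still had data after max_bytes were written; for a 0-sized
    /proc entry that hints at a runaway pseudofile.
    """
    written = 0
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while written < max_bytes:
            chunk = fin.read(min(chunk_size, max_bytes - written))
            if not chunk:  # genuine EOF before the cap
                return written, False
            fout.write(chunk)
            written += len(chunk)
        # Cap reached: probe one more byte to see if the source continues
        truncated = bool(fin.read(1))
    return written, truncated
```

The trade-off is that any fixed cap is a guess: set it too low and legitimate large files get truncated; set it too high and a runaway file can still consume a lot of disk space. This is why blacklisting known offenders remains necessary.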

Now that we are aware of the problem with the Dialogic cards there is an issue open upstream to blacklist these devices:

https://github.com/sosreport/sos/issues/777

This will then filter into available package updates for supported releases.

Comment 10 Bryn M. Reeves 2016-02-23 09:56:36 UTC
Unfortunately, the only immediately applicable workarounds are either to disable the driver prior to running sos (disruptive) or to disable the networking plugin, which is the one that runs into the Dialogic driver files:

# sosreport -n networking

For problems not relating to network configuration or state this should be acceptable.

Comment 11 Robert Scheck 2016-02-23 10:22:43 UTC
Bryn, I filed the support case above to get this addressed within RHEL 7.x,
so please add a fix to sosreport for RHEL 7.x. If you need justification or
similar in the support case, please let me know.

Comment 13 Pavel Moravec 2016-11-08 21:33:08 UTC
This has been committed to upstream as:

https://github.com/sosreport/sos/commit/03cfbe57966090d041c4689f8cc3fd291789fb5a

That commit is included in the RHEL 7.3 sos erratum [1] as part of the sos rebase [2]. I am closing this bugzilla; please test it, and if you find a problem with the fix, reopen the BZ.

[1] https://rhn.redhat.com/errata/RHBA-2016-2380.html
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1293044

Comment 14 Robert Scheck 2016-11-16 18:35:37 UTC
I am sorry, but the issue is not fixed. Running sosreport from RHEL 7.3 on
the affected system, it looks like this:

[root@tux ~]# ls -lh /var/tmp/sos.1DFrYW/sosreport-tux.example.net-20161116191926/proc/net/eicon/adapter1/
total 4,5G
-rw-r--r--. 1 root root 4,5G 16. Nov 19:20 info
[root@tux ~]# 

[root@tux ~]# ls -lh /var/tmp/sos.1DFrYW/sosreport-tux.example.net-20161116191926/proc/net/eicon/adapter1/
total 4.7G
-rw-r--r--. 1 root root 4.7G Nov 16 19:20 info
[root@tux ~]# 

[root@tux ~]# ls -lh /var/tmp/sos.1DFrYW/sosreport-tux.example.net-20161116191926/proc/net/eicon/adapter1/
total 4.9G
-rw-r--r--. 1 root root 4.9G Nov 16 19:20 info
[root@tux ~]# 

[root@tux ~]# ls -lh /var/tmp/sos.1DFrYW/sosreport-tux.example.net-20161116191926/proc/net/eicon/adapter1/
total 5.0G
-rw-r--r--. 1 root root 5.0G Nov 16 19:20 info
[root@tux ~]# 

It seems /proc/net/eicon/adapter*/info should now be excluded as well.
So far so good; I did that using:

  self.add_forbidden_path("/proc/net/eicon/adapter*/info")

Re-running sosreport now leads to this:

[root@tux ~]# ls -lh /var/tmp/sos.i2wvQI/sosreport-tux.example.net-20161116192753/proc/net/eicon/adapter1/
total 199M
-rw-r--r--. 1 root root 199M Nov 16 19:29 group_optimization
[root@tux ~]# 

[root@tux ~]# ls -lh /var/tmp/sos.i2wvQI/sosreport-tux.example.net-20161116192753/proc/net/eicon/adapter1/
total 201M
-rw-r--r--. 1 root root 201M Nov 16 19:29 group_optimization
[root@tux ~]# 

[root@tux ~]# ls -lh /var/tmp/sos.i2wvQI/sosreport-tux.example.net-20161116192753/proc/net/eicon/adapter1/
total 203M
-rw-r--r--. 1 root root 203M Nov 16 19:29 group_optimization
[root@tux ~]# 

[root@tux ~]# ls -lh /var/tmp/sos.i2wvQI/sosreport-tux.example.net-20161116192753/proc/net/eicon/adapter1/
total 205M
-rw-r--r--. 1 root root 205M Nov 16 19:29 group_optimization
[root@tux ~]# 

[root@tux ~]# ls -lh /var/tmp/sos.i2wvQI/sosreport-tux.example.net-20161116192753/proc/net/eicon/adapter1/
total 206M
-rw-r--r--. 1 root root 206M Nov 16 19:29 group_optimization
[root@tux ~]# 

[root@tux ~]# ls -lh /var/tmp/sos.i2wvQI/sosreport-tux.example.net-20161116192753/proc/net/eicon/adapter1/
total 212M
-rw-r--r--. 1 root root 212M Nov 16 19:29 group_optimization
[root@tux ~]# 

Ouch! Okay, let's exclude this path as well:

  self.add_forbidden_path("/proc/net/eicon/adapter*/group_optimization")

Result: sosreport succeeds; however, nothing from /proc/net/eicon ends up
in the sosreport tarball anymore. This leads back to the initial proposal in
attachment #1127987: simply exclude the whole directory.
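The following hypothetical sketch shows why per-file exclusions kept surfacing new offenders while excluding the directory does not. It checks paths (taken from the find output in comment 6) against shell-style patterns using Python's fnmatch plus a directory-prefix rule. It only illustrates the idea: is_forbidden is an invented helper, and sos's own add_forbidden_path handling may match paths differently.

```python
from fnmatch import fnmatch


def is_forbidden(path, patterns):
    """True if path matches a pattern or lies under a forbidden directory."""
    for pat in patterns:
        pat = pat.rstrip("/")
        if fnmatch(path, pat) or path.startswith(pat + "/"):
            return True
    return False


# Entries seen under /proc/net/eicon in comment 6
FILES = [
    "/proc/net/eicon/diva_idi",
    "/proc/net/eicon/adapter1/dynamic_l1_down",
    "/proc/net/eicon/adapter1/group_optimization",
    "/proc/net/eicon/adapter1/info",
    "/proc/net/eicon/divas",
    "/proc/net/eicon/divadidd",
]

# Per-file exclusions, roughly as they were added one at a time
per_file = [
    "/proc/net/eicon/diva_idi",
    "/proc/net/eicon/adapter*/dynamic_l1_down",
    "/proc/net/eicon/adapter*/info",
]
# Whole-directory exclusion, as in the original patch proposal
whole_dir = ["/proc/net/eicon"]

print([p for p in FILES if not is_forbidden(p, per_file)])
print([p for p in FILES if not is_forbidden(p, whole_dir)])
```

With the per-file list, group_optimization (and any entry a future driver version adds) still gets copied. The directory rule covers everything under /proc/net/eicon, at the cost of collecting none of it.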

Comment 15 Robert Scheck 2016-11-16 18:39:42 UTC
I am also proposing the exclusion upstream in https://github.com/sosreport/sos/pull/892.
I updated the ticket in the Red Hat customer portal, too.

Comment 16 Pavel Moravec 2016-11-17 08:17:11 UTC
(In reply to Robert Scheck from comment #15)
> As per https://github.com/sosreport/sos/pull/892 I am also suggesting the
> exclusion to upstream. Updated ticket in the Red Hat customer portal, too.

Thanks, I have commented on the upstream PR (it seems OK, it just needs a bit of simplification).

Comment 17 Bryn M. Reeves 2016-11-17 09:58:29 UTC
We can add these additional paths (the ones previously identified and tested are already present in 7.3; unfortunately, it seems something in either the driver or the environment has changed, exposing more pseudofiles with this behaviour).

If you have a support relationship with the vendor you may wish to open a case with them, to either remove the problematic files, or to coordinate with other vendors so that we can avoid these ping-pong problems.

In the meantime, we will probably blacklist everything under '/proc/net/eicon'. This is unfortunate, since there could feasibly be useful information there, but because these drivers are not upstream and are not shipped in the Red Hat kernel, there is no way for us to test them or to get notified when something changes.

Comment 18 Pavel Moravec 2016-12-15 20:43:25 UTC
POSTed to upstream as:

https://github.com/sosreport/sos/commit/e9458ae0e263ea9997465a46873e9fe7be6ae3c8

Comment 24 errata-xmlrpc 2017-08-01 23:08:12 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2203