Bug 1792908
| Summary: | nstat core dumps continuously if its state file /tmp/.nstat.u<userid> is corrupted | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Renaud Métrich <rmetrich> |
| Component: | iproute | Assignee: | Andrea Claudi <aclaudi> |
| Status: | CLOSED ERRATA | QA Contact: | BaseOS QE Security Team <qe-baseos-security> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 7.7 | CC: | atragler, jmaxwell, ptalbert |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | iproute-4.11.0-27.el7 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| Clones: | 1824896 | Environment: | |
| Last Closed: | 2020-09-29 20:28:24 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1824896 | | |
Hi Renaud,

The data stored in the nstat temporary file is provided by the kernel via /proc files. If something breaks in its syntax (e.g. because of a kernel bug), the nstat temporary file will be constantly broken, so there is no chance to provide "fresh" nstat data as you expect. In this situation, the only thing we can do is avoid the crash and make nstat fail gracefully (i.e. print a meaningful message and exit with an error). Do you agree with this?

Hi Andrea,

I understand, it's sufficient to bail out with an error. Still, I'm not sure whether nstat should delete the corrupted state file, or keep it and tell the admin to delete it (I would prefer nstat to delete the file, since it's corrupted anyway).

Renaud.

I don't see any compelling reason to hide the cause of the error by deleting the file. Instead, having the file readily available can expedite a fix in case of issues. Please take into account that if the error comes from the kernel, every new file will contain it.

Well, the issue happened on a customer system; of course he didn't push "FOO" into the state file. This means the kernel may report bad data from time to time, so when that happens the file should just be discarded, and an error should be printed stating that the next execution will show the whole stats rather than the differences. We could indeed keep the file, but then the nstat return code should be very specific so that scripts can detect this condition and do the cleanup themselves. That seems complicated to me, because the caller would then need to know how the state file is named.

Yes, that was clear to me. The problem is that if the kernel is printing garbage into the temporary file, there is no chance to get "clean" data, so it is useless to delete the file: we will end up with another corrupted file anyway.

Patch sent upstream: https://patchwork.ozlabs.org/patch/1234524/

Patch merged upstream.

Solved upstream with:
commit 2c7056ac26412fe99443a283f0c1261cb81ccea2
Author: Andrea Claudi <aclaudi>
Date: Mon Feb 17 14:46:18 2020 +0100
nstat: print useful error messages in abort() cases
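The commit title only says that nstat now prints useful error messages where it used to call abort(). The upstream code itself is not reproduced here.

As a hedged sketch of the graceful-failure behaviour agreed in the discussion, the C fragment below does three things:
- it names the corrupted file in the error message,
- it leaves the file in place,
- it returns a failure status instead of crashing.

The helper names (`nstat_state_path`, `report_bad_state`), the message wording and the use of EXIT_FAILURE are assumptions for illustration. The path only follows the /tmp/.nstat.u<userid> pattern given in this report.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical helper: writes the default state-file name, following the
 * /tmp/.nstat.u<userid> pattern from this report, into buf.
 * Returns 0 on success, -1 if buf is too small for the full name. */
static int nstat_state_path(char *buf, size_t len, uid_t uid)
{
    int n = snprintf(buf, len, "/tmp/.nstat.u%u", (unsigned int)uid);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Hypothetical replacement for abort(): says which file and line are
 * malformed, leaves the file untouched for inspection, and returns an
 * exit status the caller can propagate so scripts can detect the error. */
static int report_bad_state(FILE *err, const char *path, unsigned int lineno)
{
    fprintf(err, "nstat: %s:%u: malformed history entry; "
                 "remove the file to start a new history\n", path, lineno);
    return EXIT_FAILURE;
}
```

A caller could fill a buffer with `nstat_state_path(path, sizeof path, getuid())`; getuid() is declared in <unistd.h>. On a malformed line it would propagate the value of `report_bad_state(stderr, path, lineno)` back to main() instead of calling abort(). Printing the full path also answers the concern that a script would otherwise need to know how the state file is named.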
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (iproute bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3999
Description of problem:
If for some reason the state file /tmp/.nstat.u<userid> is corrupted and doesn't contain data in the format nstat expects, nstat will abort on every run until the state file is manually deleted.

Version-Release number of selected component (if applicable):
iproute-4.11.0-25.el7_7.2.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Execute nstat once: `# nstat`
2. Corrupt the state file: `# echo FOO > /tmp/.nstat.u0`
3. Execute nstat again: `# nstat`

Actual results:
Aborted (core dumped)

Expected results:
"Fresh" nstat data (not the differences, since the state file is corrupted)

Additional info:
The following backtrace is seen:

    -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
    (gdb) bt
    #0  0x00007fb3a26c0377 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:55
    #1  0x00007fb3a26c1a68 in __GI_abort () at abort.c:90
    #2  0x0000000000402602 in load_good_table (fp=fp@entry=0x169c0a0) at nstat.c:147
    #3  0x00000000004021f7 in main (argc=<optimized out>, argv=<optimized out>) at nstat.c:696
    (gdb) up 2
    #2  0x0000000000402602 in load_good_table (fp=fp@entry=0x169c0a0) at nstat.c:147
    147                 abort();
    (gdb) list
    142                 continue;
    143             }
    144             /* idbuf is as big as buf, so this is safe */
    145             nr = sscanf(buf, "%s%llu%lg", idbuf, &val, &rate);
    146             if (nr < 2)
    147                 abort();
    148             if (nr < 3)
    149                 rate = 0;
    150             if (useless_number(idbuf))
    151                 continue;
    (gdb) p buf
    $1 = "FOO\n"
    -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

The line contains only a string. It has no following unsigned long long counter and no double rate, so sscanf() assigns only idbuf and returns 1 ("nr == 1"). The "nr < 2" check then calls abort(), and a core dump is created.
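The return value can be checked with a small C sketch that applies the same format string as nstat.c:145 to a few sample lines. Some details are assumptions made so the example is self-contained and cannot overflow:
- the function name `hist_line_fields`,
- the 64-byte buffer,
- the `%63s` width limit.

nstat itself avoids overflow by sizing idbuf to match buf, as the comment on line 144 notes.

```c
#include <stdio.h>

/* Hypothetical helper: returns how many fields sscanf() assigns for one
 * state-file line, using the format from nstat.c:145 plus a width limit
 * because this sketch uses a fixed 64-byte buffer. */
static int hist_line_fields(const char *buf)
{
    char idbuf[64];
    unsigned long long val;
    double rate;

    /* "%63s" reads at most 63 characters, so idbuf cannot overflow.
     * Return value: EOF if input ends before the first conversion,
     * otherwise the number of fields successfully assigned. */
    return sscanf(buf, "%63s%llu%lg", idbuf, &val, &rate);
}
```

In this sketch, "FOO\n" yields 1, because %s matches "FOO" and %llu then reaches the end of the string. A line holding only whitespace yields EOF (-1). Any result below 2 is what the nr < 2 check at nstat.c:146 treats as fatal.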