1792908 – nstat core dumps continuously if its state file /tmp/.nstat.u<userid> is corrupted


Bug 1792908 - nstat core dumps continuously if its state file /tmp/.nstat.u<userid> is corrupted


Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 7
Classification:	Red Hat
Component:	iproute
Sub Component:
Version:	7.7
Hardware:	All
OS:	Linux
Priority:	high
Severity:	high
Target Milestone:	rc
Target Release:	---
Assignee:	Andrea Claudi
QA Contact:	BaseOS QE Security Team
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1824896

Reported:	2020-01-20 11:23 UTC by Renaud Métrich
Modified:	2023-09-07 21:31 UTC
CC List:	3 users
Fixed In Version:	iproute-4.11.0-27.el7
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Clones:	1824896
Environment:
Last Closed:	2020-09-29 20:28:24 UTC
Target Upstream Version:
Embargoed:


Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Knowledge Base (Solution)	4757141	0	None	None	None	2020-01-20 13:23:18 UTC
Red Hat Product Errata	RHBA-2020:3999	0	None	None	None	2020-09-29 20:28:34 UTC

Description Renaud Métrich 2020-01-20 11:23:10 UTC

Description of problem:

If for some reason the state file /tmp/.nstat.u<userid> is corrupted and does not contain data in the format nstat expects, "nstat" aborts on every run until the state file is manually deleted.


Version-Release number of selected component (if applicable):

iproute-4.11.0-25.el7_7.2.x86_64


How reproducible:

Always


Steps to Reproduce:
1. Execute nstat once

  # nstat

2. Corrupt the state file

  # echo FOO > /tmp/.nstat.u0

3. Execute nstat again

  # nstat

Actual results:

Aborted (core dumped)


Expected results:

"Fresh" nstat data (absolute values rather than differences, since the state file is corrupted)


Additional info:

The following backtrace is seen:
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
(gdb) bt
#0  0x00007fb3a26c0377 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:55
#1  0x00007fb3a26c1a68 in __GI_abort () at abort.c:90
#2  0x0000000000402602 in load_good_table (fp=fp@entry=0x169c0a0) at nstat.c:147
#3  0x00000000004021f7 in main (argc=<optimized out>, argv=<optimized out>) at nstat.c:696

(gdb) up 2
#2  0x0000000000402602 in load_good_table (fp=fp@entry=0x169c0a0) at nstat.c:147
147				abort();
(gdb) list
142				continue;
143			}
144			/* idbuf is as big as buf, so this is safe */
145			nr = sscanf(buf, "%s%llu%lg", idbuf, &val, &rate);
146			if (nr < 2)
147				abort();
148			if (nr < 3)
149				rate = 0;
150			if (useless_number(idbuf))
151				continue;

(gdb) p buf
$1 = "FOO\n"
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

Because "nr < 2" holds ("nr == 1" here: sscanf() parses only the identifier "FOO", and no "unsigned long long" value or "double" rate follows it), abort() is called and a core dump is created.
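
To illustrate the failure mode above, here is a minimal sketch of how a malformed line could be reported without calling abort(). It is not the actual nstat code or the upstream patch: the helper name parse_history_line() and the wording of the message are assumptions made for this example. Only the sscanf() format string and the "nr < 2" / "nr < 3" checks are taken from the listing above.

```c
#include <stdio.h>

/*
 * Hypothetical helper (not part of nstat.c): parse one history line of the
 * form "<id> <value> [<rate>]".
 * Returns 0 on success, -1 if the line is malformed.
 * idbuf must be at least as large as buf, as in the original code.
 */
static int parse_history_line(const char *buf, char *idbuf,
			      unsigned long long *val, double *rate)
{
	int nr = sscanf(buf, "%s%llu%lg", idbuf, val, rate);

	if (nr < 2) {
		/* Report the problem instead of dumping core. */
		fprintf(stderr, "nstat: error parsing history file\n");
		return -1;
	}
	if (nr < 3)
		*rate = 0;	/* the rate field is optional */
	return 0;
}
```

With such a helper, the caller could print the message and exit with a non-zero status when -1 is returned, so a line like "FOO" no longer produces a core dump.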

Comment 2 Andrea Claudi 2020-02-04 13:17:22 UTC

Hi Renaud,
Data stored in the nstat temporary file are provided by the kernel via /proc files; if something breaks in their syntax (e.g. because of a bug), we have an nstat temp file that is constantly broken, so there is no chance to provide "fresh" nstat data as you expect. In this situation, the only thing we can do is to avoid the crash and make nstat fail gracefully (i.e. printing a meaningful message and exiting with an error). Do you agree with this?

Comment 3 Renaud Métrich 2020-02-04 13:32:33 UTC

Hi Andrea,

I understand; bailing out with an error is sufficient. Still, I'm not sure whether nstat should delete the corrupted state file or keep it and tell the admin to delete it (I would prefer nstat to delete the file, since it's corrupted anyway).

Renaud.

Comment 4 Andrea Claudi 2020-02-04 16:08:51 UTC

I don't see any compelling reason to hide the cause of the error by deleting the file. On the contrary, having the file readily available can expedite a fix in case of issues.
Please take into account that if the error comes from the kernel, every new file will contain it.

Comment 5 Renaud Métrich 2020-02-04 20:03:09 UTC

Well, the issue happened on a customer system; of course the customer didn't push "FOO" into the state file.
This means it's possible that the kernel reports bad data from time to time. When that happens, the file should just be discarded and an error printed stating that the next execution will show the whole stats rather than the differences.
We may indeed choose not to delete the file, but then the nstat return code should be very specific, so that scripts can detect this and do the cleanup themselves (which seems complicated to me: the caller would then need to know how the state file is named).
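
The "discard the file and show the whole stats" idea from this comment could be sketched as a validation pass over the history file before it is used. This is only an illustration of the proposal, not what nstat or the upstream fix does: the function name history_is_valid() is hypothetical, and skipping lines that start with '#' is an assumption about the file layout.

```c
#include <stdio.h>

/*
 * Hypothetical check (not part of nstat.c): returns 1 if every line of the
 * history file looks like "<id> <value> [<rate>]", 0 otherwise.
 * Assumption: lines starting with '#' are header lines and are skipped.
 */
static int history_is_valid(FILE *fp)
{
	char buf[4096];
	char idbuf[sizeof(buf)];	/* as big as buf, so "%s" cannot overflow */
	unsigned long long val;
	double rate;

	while (fgets(buf, sizeof(buf), fp)) {
		if (buf[0] == '#')
			continue;
		if (sscanf(buf, "%s%llu%lg", idbuf, &val, &rate) < 2)
			return 0;	/* malformed line: treat the file as corrupted */
	}
	return 1;
}
```

A caller could rewind the file and load it only when the check returns 1, and otherwise print a warning and report absolute counters for that run.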

Comment 6 Andrea Claudi 2020-02-06 17:56:14 UTC

Yes, that was clear to me.

The problem is that if the kernel is printing garbage into the temp file, there is no chance of getting "clean" data; hence it is useless to delete the file, since we will end up with another corrupted file anyway.

Comment 7 Andrea Claudi 2020-02-06 18:12:22 UTC

Patch sent upstream: https://patchwork.ozlabs.org/patch/1234524/

Comment 9 Andrea Claudi 2020-02-24 17:01:57 UTC

Patch merged upstream.

Comment 11 Andrea Claudi 2020-04-16 15:55:38 UTC

Solved upstream with:

commit 2c7056ac26412fe99443a283f0c1261cb81ccea2
Author: Andrea Claudi <aclaudi>
Date:   Mon Feb 17 14:46:18 2020 +0100

    nstat: print useful error messages in abort() cases

Comment 18 errata-xmlrpc 2020-09-29 20:28:24 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (iproute bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3999
