1774590 – sadc segfaults when collecting network data

Summary: sadc segfaults when collecting network data

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 7
Classification:	Red Hat
Component:	sysstat
Sub Component:
Version:	7.7
Hardware:	All
OS:	All
Priority:	urgent
Severity:	high
Target Milestone:	rc
Target Release:	---
Assignee:	Michal Sekletar
QA Contact:	Radka Brychtova
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1793580

Reported:	2019-11-20 14:15 UTC by Renaud Métrich
Modified:	2023-03-24 16:06 UTC
CC List:	7 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Clones:	1793580
Environment:
Last Closed:	2020-03-31 20:12:10 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2020:1200	0	None	None	None	2020-03-31 20:12:12 UTC

Description Renaud Métrich 2019-11-20 14:15:03 UTC

Description of problem:

A customer sees sadc segfault repeatedly (the /etc/cron.d/sysstat cron job runs the command every 10 minutes).
The core dump shows that /proc/net/snmp data is read and written to unallocated memory.
This happens when the data previously collected in the daily file belong to a different activity.


Version-Release number of selected component (if applicable):

sysstat-10.1.5-18.el7.x86_64


How reproducible:

Always


Steps to Reproduce:
1. Delete all "saXX" log files

  # \rm /var/log/sa/sa*

2. Collect data for SNMP activity

  # /usr/lib64/sa/sadc -F -L -S SNMP 1 1 -

3. Collect data for DISK activity

  # /usr/lib64/sa/sadc -F -L -S DISK 1 1 -


Actual results:

Segmentation fault


Expected results:

sadc completes without crashing.

Comment 2 Renaud Métrich 2019-11-20 14:17:09 UTC

-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
# gdb /usr/lib64/sa/sadc ./coredump
[...]
(gdb) bt
#0  0x00007f096c131d1a in _IO_vfscanf_internal (s=s@entry=0x7fff7f082600, 
    format=format@entry=0x40ea70 "%*u %*u %llu %*u %*u %llu %*u %*u %llu %llu %*u %*u %*u %llu %llu %*u %llu %*u %llu", argptr=argptr@entry=0x7fff7f082728, errp=errp@entry=0x0) at vfscanf.c:1821
#1  0x00007f096c1436dc in __GI___isoc99_vsscanf (
    string=0x7fff7f082823 " 2 64 4927190 0 0 0 0 0 4910671 3903669 0 0 0 0 0 0 0 0 0\n", 
    format=0x40ea70 "%*u %*u %llu %*u %*u %llu %*u %*u %llu %llu %*u %*u %*u %llu %llu %*u %llu %*u %llu", args=args@entry=0x7fff7f082728) at isoc99_vsscanf.c:43
#2  0x00007f096c143667 in __isoc99_sscanf (
    s=s@entry=0x7fff7f082823 " 2 64 4927190 0 0 0 0 0 4910671 3903669 0 0 0 0 0 0 0 0 0\n", 
    format=format@entry=0x40ea70 "%*u %*u %llu %*u %*u %llu %*u %*u %llu %llu %*u %*u %*u %llu %llu %*u %llu %*u %llu") at isoc99_sscanf.c:32
#3  0x0000000000408aea in read_net_ip (st_net_ip=0x0) at rd_stats.c:1007
#4  0x0000000000403360 in read_stats () at sadc.c:888
#5  0x00000000004034df in rw_sa_stat_loop (count=count@entry=1, rectime=rectime@entry=0x7fff7f082e00, 
    stdfd=-1, ofd=3, ofile=ofile@entry=0x7fff7f082e40 "/var/log/sa/sa04") at sadc.c:952
#6  0x0000000000401ff1 in main (argc=8, argv=0x7fff7f083068) at sadc.c:1222
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

The statistics being parsed are " 2 64 4927190 0 0 0 0 0 4910671 3903669 0 0 0 0 0 0 0 0 0".
Some of the values are stored in members of the "st_net_ip" structure:

-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
(gdb) f 3
#3  0x0000000000408aea in read_net_ip (st_net_ip=0x0) at rd_stats.c:1007
1007					sscanf(line + 3, "%*u %*u %llu %*u %*u %llu %*u %*u "
(gdb) list
1002		
1003		while (fgets(line, 1024, fp) != NULL) {
1004	
1005			if (!strncmp(line, "Ip:", 3)) {
1006				if (sw) {
1007					sscanf(line + 3, "%*u %*u %llu %*u %*u %llu %*u %*u "
1008					       "%llu %llu %*u %*u %*u %llu %llu %*u %llu %*u %llu",
1009					       &st_net_ip->InReceives,
1010					       &st_net_ip->ForwDatagrams,
1011					       &st_net_ip->InDelivers,
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

For some reason, "st_net_ip" is a NULL pointer (see the read_net_ip() argument in frame #3), so the segmentation fault occurs when sscanf() attempts to write the first parsed value, "4927190", through the NULL pointer.
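A minimal standalone sketch of the failing parse, using only the C standard library: it feeds the line captured in frame #2 to the same sscanf() format used at rd_stats.c:1007. The struct ip_stats type and parse_ip_line() are hypothetical stand-ins for struct stats_net_ip and read_net_ip(); only InReceives, ForwDatagrams and InDelivers appear in the listing above, and the remaining member names are assumed from the field order of the "Ip:" line in /proc/net/snmp. Unlike the real code, the sketch rejects a NULL destination instead of writing through it.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for struct stats_net_ip (member names partly assumed). */
struct ip_stats {
	unsigned long long InReceives;
	unsigned long long ForwDatagrams;
	unsigned long long InDelivers;
	unsigned long long OutRequests;
	unsigned long long ReasmReqds;
	unsigned long long ReasmOKs;
	unsigned long long FragOKs;
	unsigned long long FragCreates;
};

/*
 * Parse the values following "Ip:" (line + 3 in read_net_ip()).
 * Returns the number of values stored, or -1 when st is NULL: in that
 * case sscanf() would write through a NULL-based pointer, as in frame #3.
 */
int parse_ip_line(const char *values, struct ip_stats *st)
{
	if (st == NULL)
		return -1;

	/* Same format string as rd_stats.c:1007; %*u fields are skipped. */
	return sscanf(values, "%*u %*u %llu %*u %*u %llu %*u %*u "
		      "%llu %llu %*u %*u %*u %llu %llu %*u %llu %*u %llu",
		      &st->InReceives, &st->ForwDatagrams, &st->InDelivers,
		      &st->OutRequests, &st->ReasmReqds, &st->ReasmOKs,
		      &st->FragOKs, &st->FragCreates);
}
```

With a valid destination, sscanf() stores eight values from the captured line (InReceives = 4927190, InDelivers = 4910671 and, in this hypothetical struct, OutRequests = 3903669). In sadc the destination was NULL, so the very first store faulted.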

Frame 4 (read_stats()):
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
(gdb) list
883		}
884	
885		for (i = 0; i < NR_ACT; i++) {
886			if (IS_COLLECTED(act[i]->options)) {
887				/* Read statistics for current activity */
888				(*act[i]->f_read)(act[i]);
889			}
890		}
891	
892		if (cpu_nr == 1) {
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

The "for" loop on line 885 has been optimized by the compiler ($rbx tracks the current position in the "act" array):
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
   0x0000000000403344 <+68>:	cmp    $0x612428,%rbx
   0x000000000040334b <+75>:	je     0x403369 <read_stats+105>

   --> test to bail out ($0x612300 is the "act" array, 37 items (NR_ACT), so end reached at $0x612428)

   $rbx: current "act" bucket, content copied to $rax

   0x000000000040334d <+77>:	mov    (%rbx),%rax
   0x0000000000403350 <+80>:	testb  $0x1,0x4(%rax)
   0x0000000000403354 <+84>:	je     0x403340 <read_stats+64>
   0x0000000000403356 <+86>:	add    $0x8,%rbx

   $rbx: next "act" bucket after call

   0x000000000040335a <+90>:	mov    %rax,%rdi

   $rdi: 1st param of the function being called; here it shows NULL instead of the value of $rax!

   0x000000000040335d <+93>:	callq  *0x20(%rax)
=> 0x0000000000403360 <+96>:	cmp    $0x612428,%rbx

   $rbx: do we reach end of array?

   0x0000000000403367 <+103>:	jne    0x40334d <read_stats+77>

(gdb) p/x $rbx
$1 = 0x612390
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

Here, $rbx points to the next element, so at the moment of the crash i == 17 ($rbx has already been incremented, so subtract one).
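The same arithmetic as a small C sketch. The three constants are copied from the disassembly and the "p/x $rbx" output, so they only hold for this particular sadc binary; the helper names are hypothetical.

```c
#include <stdint.h>

/* Values taken from the gdb session above (valid for this build only). */
#define ACT_BASE 0x612300u /* address of the act[] array */
#define ACT_END  0x612428u /* loop bound compared against $rbx */
#define PTR_SIZE 8u        /* sizeof(struct activity *) on x86_64 */

/* Number of act[] entries implied by the loop bound (NR_ACT). */
unsigned act_count(void)
{
	return (ACT_END - ACT_BASE) / PTR_SIZE;
}

/*
 * $rbx is advanced past the current entry before the indirect call,
 * so the activity being read is the one just before it.
 */
unsigned current_act_index(uintptr_t rbx)
{
	return (unsigned)((rbx - ACT_BASE) / PTR_SIZE) - 1;
}
```

For $rbx = 0x612390 this gives (0x90 / 8) - 1 = 17, and the bound 0x612428 corresponds to 37 entries, matching NR_ACT.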

-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
(gdb) p act[17]
$79 = (struct activity *) 0x6130c0 <net_ip_act>
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

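It appears that, because the saXX file already contains "SNMP" activity data, sadc turns on collection of the SNMP activity.
This is done in open_ofile(), where the list of activities stored in the file replaces the one requested by the user (all activities are first unselected on line 820):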
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
 716 void open_ofile(int *ofd, char ofile[])
  :
 814                         /*
 815                          * OK: All tests successfully passed.
 816                          * List of activities from the file prevails over that of the user.
 817                          * So unselect all of them. And reset activity sequence.
 818                          */
 819                         for (i = 0; i < NR_ACT; i++) {
 820                                 act[i]->options &= ~AO_COLLECTED;
 821                                 id_seq[i] = 0;
 822                         }
  :
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
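A simplified, hypothetical model of that selection logic follows; it is not the sysstat code. Only the unselect loop appears in the excerpt above, so the re-selection step is inferred from the comment on lines 815-817. The struct, the numeric ids and AO_COLLECTED = 0x01 (chosen to match the "testb $0x1" in the disassembly) are assumptions.

```c
#include <stddef.h>

#define AO_COLLECTED 0x01u /* assumed flag value, see "testb $0x1" above */

/* Hypothetical, reduced view of an activity. */
struct act_model {
	unsigned id;
	unsigned options;
};

/*
 * When an existing daily file is reused, the user's selection is
 * discarded and the activities recorded in the file are selected instead.
 */
void select_from_file(struct act_model *act, size_t nr_act,
		      const unsigned *file_ids, size_t nr_file)
{
	size_t i, j;

	/* Unselect everything (cf. line 820). */
	for (i = 0; i < nr_act; i++)
		act[i].options &= ~AO_COLLECTED;

	/* Re-select only the activities stored in the file. */
	for (j = 0; j < nr_file; j++)
		for (i = 0; i < nr_act; i++)
			if (act[i].id == file_ids[j])
				act[i].options |= AO_COLLECTED;
}
```

In the reproducer, the second run asks for DISK, but the file written by the first run lists SNMP activities, so the SNMP readers (including net_ip_act) end up selected.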

-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
(gdb) p act[17]->f_read 
$2 = (void (*)(struct activity *)) 0x403700 <wrap_read_net_ip>
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

This leads to wrap_read_net_ip() being executed. Because its call to read_net_ip() is a tail call, the compiler optimization makes wrap_read_net_ip() "disappear" from the backtrace.

wrap_read_net_ip() is shown below:
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
298 #define _buf0   buf[0]

 392 __read_funct_t wrap_read_net_ip(struct activity *a)
 393 {
 394         struct stats_net_ip *st_net_ip
 395                 = (struct stats_net_ip *) a->_buf0;
 396 
 397         /* Read IP stats */
 398         read_net_ip(st_net_ip);
 399 
 400         return;
 401 }
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

It simply calls read_net_ip() on the activity's first buffer (_buf0 is an alias for buf[0]).
Here, the issue is that this buffer is not allocated, causing the segmentation fault when sscanf() writes to it:
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
(gdb) p act[17]->buf
$3 = {0x0, 0x0, 0x0}
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------


The root cause is therefore that data collection is turned on for the activity but no buffer has been allocated to hold the data.
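The failing state can be modelled in a few lines. This is a hypothetical, simplified model, not sysstat code: struct activity_model keeps only options, buf[] and f_read, AO_COLLECTED is assumed to be 0x01 (matching the "testb $0x1" above), and read_counter() is an illustrative reader. The loop mirrors read_stats() lines 885-890 but adds a NULL check that the original lacks, so the broken state is reported instead of crashing. The guard is only a diagnostic aid, not the fix proposed in this bug.

```c
#include <stddef.h>

#define AO_COLLECTED 0x01u /* assumed flag value */
#define NR_BUF 3           /* act[17]->buf printed three entries */

struct activity_model {
	unsigned options;
	void *buf[NR_BUF];
	void (*f_read)(struct activity_model *);
};

/* Hypothetical reader: stores one counter in the first buffer. */
void read_counter(struct activity_model *a)
{
	*(unsigned long long *) a->buf[0] = 4927190ULL;
}

/*
 * read_stats()-like loop. An activity that is flagged for collection
 * but has no buffer is skipped instead of being passed to its reader.
 * Returns the number of activities skipped for that reason.
 */
int read_stats_checked(struct activity_model **act, size_t nr_act)
{
	size_t i;
	int skipped = 0;

	for (i = 0; i < nr_act; i++) {
		if (!(act[i]->options & AO_COLLECTED))
			continue;
		if (act[i]->buf[0] == NULL) {
			/* The state that crashed sadc: collected, no buffer. */
			skipped++;
			continue;
		}
		(*act[i]->f_read)(act[i]);
	}
	return skipped;
}
```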

Comment 3 Paulo Andrade 2019-11-20 15:00:26 UTC

I believe this is a side effect of the patch added in rhbz#1670060
0001-avoiding-triggering-automounts-bug-1670060.patch

It should be slightly modified so that, like upstream, it always runs:

		if (act[i]->nr > 0) {
			/* Allocate structures for current activity */
			SREALLOC(act[i]->_buf0, void, act[i]->msize * act[i]->nr * act[i]->nr2);
		}

as the crash is due to act[17]->buf[0] being NULL.
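A sketch of the allocation suggested above, under stated assumptions: struct activity_model and allocate_buffers() are hypothetical, plain realloc() stands in for the SREALLOC macro (whose failure handling is not reproduced; the sketch returns -1 instead), and only buf[0] is allocated, as in the snippet. The key point is that the allocation depends only on nr, not on whether the user selected the activity, so an activity later switched on from the file's list always has a buffer.

```c
#include <stdlib.h>

/* Hypothetical, reduced view of an activity's sizing fields. */
struct activity_model {
	int nr;        /* number of items */
	int nr2;       /* second dimension */
	size_t msize;  /* size of one item */
	void *buf[3];
};

/*
 * Allocate buf[0] for every activity with nr > 0, regardless of
 * whether it is currently selected for collection.
 * Returns 0 on success, -1 if an allocation fails.
 */
int allocate_buffers(struct activity_model **act, size_t nr_act)
{
	size_t i;

	for (i = 0; i < nr_act; i++) {
		if (act[i]->nr > 0) {
			size_t len = act[i]->msize * (size_t) act[i]->nr
				     * (size_t) act[i]->nr2;
			void *p = realloc(act[i]->buf[0], len);

			if (p == NULL)
				return -1;
			act[i]->buf[0] = p;
		}
	}
	return 0;
}
```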

Comment 13 errata-xmlrpc 2020-03-31 20:12:10 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:1200
