Bug 1774590
| Summary: | sadc segfaults when collecting network data | |||
|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Renaud Métrich <rmetrich> | |
| Component: | sysstat | Assignee: | Michal Sekletar <msekleta> | |
| Status: | CLOSED ERRATA | QA Contact: | Radka Brychtova <rskvaril> | |
| Severity: | high | Docs Contact: | ||
| Priority: | urgent | |||
| Version: | 7.7 | CC: | bubrown, fkrska, fsumsal, pandrade, qguo, rskvaril, tborcin | |
| Target Milestone: | rc | Keywords: | EasyFix, Patch, Regression, Reproducer, ZStream | |
| Target Release: | --- | |||
| Hardware: | All | |||
| OS: | All | |||
| Whiteboard: | ||||
| Fixed In Version: | Doc Type: | If docs needed, set a value | ||
| Doc Text: | Story Points: | --- | ||
| Clone Of: | ||||
| : | 1793580 (view as bug list) | Environment: | ||
| Last Closed: | 2020-03-31 20:12:10 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 1793580 | |||
Description
Renaud Métrich
2019-11-20 14:15:03 UTC
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
# gdb /usr/lib64/sa/sadc ./coredump
[...]
(gdb) bt
#0 0x00007f096c131d1a in _IO_vfscanf_internal (s=s@entry=0x7fff7f082600,
format=format@entry=0x40ea70 "%*u %*u %llu %*u %*u %llu %*u %*u %llu %llu %*u %*u %*u %llu %llu %*u %llu %*u %llu", argptr=argptr@entry=0x7fff7f082728, errp=errp@entry=0x0) at vfscanf.c:1821
#1 0x00007f096c1436dc in __GI___isoc99_vsscanf (
string=0x7fff7f082823 " 2 64 4927190 0 0 0 0 0 4910671 3903669 0 0 0 0 0 0 0 0 0\n",
format=0x40ea70 "%*u %*u %llu %*u %*u %llu %*u %*u %llu %llu %*u %*u %*u %llu %llu %*u %llu %*u %llu", args=args@entry=0x7fff7f082728) at isoc99_vsscanf.c:43
#2 0x00007f096c143667 in __isoc99_sscanf (
s=s@entry=0x7fff7f082823 " 2 64 4927190 0 0 0 0 0 4910671 3903669 0 0 0 0 0 0 0 0 0\n",
format=format@entry=0x40ea70 "%*u %*u %llu %*u %*u %llu %*u %*u %llu %llu %*u %*u %*u %llu %llu %*u %llu %*u %llu") at isoc99_sscanf.c:32
#3 0x0000000000408aea in read_net_ip (st_net_ip=0x0) at rd_stats.c:1007
#4 0x0000000000403360 in read_stats () at sadc.c:888
#5 0x00000000004034df in rw_sa_stat_loop (count=count@entry=1, rectime=rectime@entry=0x7fff7f082e00,
stdfd=-1, ofd=3, ofile=ofile@entry=0x7fff7f082e40 "/var/log/sa/sa04") at sadc.c:952
#6 0x0000000000401ff1 in main (argc=8, argv=0x7fff7f083068) at sadc.c:1222
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
The statistics being parsed are " 2 64 4927190 0 0 0 0 0 4910671 3903669 0 0 0 0 0 0 0 0 0".
Some fields are assigned to fields of structure "st_net_ip":
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
(gdb) f 3
#3 0x0000000000408aea in read_net_ip (st_net_ip=0x0) at rd_stats.c:1007
1007 sscanf(line + 3, "%*u %*u %llu %*u %*u %llu %*u %*u "
(gdb) list
1002
1003 while (fgets(line, 1024, fp) != NULL) {
1004
1005 if (!strncmp(line, "Ip:", 3)) {
1006 if (sw) {
1007 sscanf(line + 3, "%*u %*u %llu %*u %*u %llu %*u %*u "
1008 "%llu %llu %*u %*u %*u %llu %llu %*u %llu %*u %llu",
1009 &st_net_ip->InReceives,
1010 &st_net_ip->ForwDatagrams,
1011 &st_net_ip->InDelivers,
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
For some reason, "st_net_ip" is a NULL pointer (see the read_net_ip() argument in frame #3), so the segmentation fault occurs when sscanf() attempts to store the first parsed value, "4927190", through that NULL pointer.
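To make the field mapping concrete, here is a minimal sketch that applies the format string from the backtrace to the captured line. It stores the eight converted values in a local array rather than in the sysstat structure. Only the first three destination names (InReceives, ForwDatagrams, InDelivers) are visible in the listing above, so the remaining slots are left unnamed. The helper name parse_ip_line() is hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Format string copied from frame #2; "%*u" parses a field and discards it. */
static const char IP_FMT[] =
    "%*u %*u %llu %*u %*u %llu %*u %*u "
    "%llu %llu %*u %*u %*u %llu %llu %*u %llu %*u %llu";

/*
 * Hypothetical helper: parse the text that follows "Ip:" into out[0..7].
 * Returns the number of values sscanf() stored.
 */
static int parse_ip_line(const char *s, unsigned long long out[8])
{
    assert(s != NULL && out != NULL);
    return sscanf(s, IP_FMT,
                  &out[0],  /* InReceives in read_net_ip() */
                  &out[1],  /* ForwDatagrams */
                  &out[2],  /* InDelivers */
                  &out[3], &out[4], &out[5], &out[6], &out[7]);
}
```

With the captured line, the first value stored is 4927190 (the third field), which is the write that faults when the destination is derived from a NULL st_net_ip.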
Frame 4 (read_stats()):
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
(gdb) list
883 }
884
885 for (i = 0; i < NR_ACT; i++) {
886 if (IS_COLLECTED(act[i]->options)) {
887 /* Read statistics for current activity */
888 (*act[i]->f_read)(act[i]);
889 }
890 }
891
892 if (cpu_nr == 1) {
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
The "for" loop on line 885 is optimized ($rbx tracks the current position in the "act" array):
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
0x0000000000403344 <+68>: cmp $0x612428,%rbx
0x000000000040334b <+75>: je 0x403369 <read_stats+105>
--> test to bail out ($0x612300 is the "act" array, 37 items (NR_ACT), so end reached at $0x612428)
$rbx: current "act" bucket, content copied to $rax
0x000000000040334d <+77>: mov (%rbx),%rax
0x0000000000403350 <+80>: testb $0x1,0x4(%rax)
0x0000000000403354 <+84>: je 0x403340 <read_stats+64>
0x0000000000403356 <+86>: add $0x8,%rbx
$rbx: next "act" bucket after call
0x000000000040335a <+90>: mov %rax,%rdi
$rdi: 1st param of function being called, here NULL! instead of $rax
0x000000000040335d <+93>: callq *0x20(%rax)
=> 0x0000000000403360 <+96>: cmp $0x612428,%rbx
$rbx: do we reach end of array?
0x0000000000403367 <+103>: jne 0x40334d <read_stats+77>
(gdb) p/x $rbx
$1 = 0x612390
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
Here $rbx already points to the next element, so at the moment of the crash i == 17: (0x612390 - 0x612300) / 8 = 18, then subtract one because $rbx was incremented before the call.
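The index arithmetic can be double-checked with a small sketch. The helper name act_index_from_cursor() is hypothetical; the addresses are the ones from this core, and the 8-byte slot size assumes x86_64 pointers, as implied by "add $0x8,%rbx".

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: recover the loop index from the saved cursor.
 * base       address of act[0] (0x612300 in this core)
 * cursor     value of $rbx, already advanced past the current element
 * slot_size  size of one array element (8-byte pointers on x86_64)
 */
static long act_index_from_cursor(uintptr_t base, uintptr_t cursor,
                                  uintptr_t slot_size)
{
    assert(slot_size != 0 && cursor > base);
    return (long)((cursor - base) / slot_size) - 1;
}
```

Calling act_index_from_cursor(0x612300, 0x612390, 8) yields 17, and 0x612300 + 37 * 8 gives the end-of-array address 0x612428 used in the comparison.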
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
(gdb) p act[17]
$79 = (struct activity *) 0x6130c0 <net_ip_act>
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
It appears that, because the saXX file already contains some "SNMP" activity data, sadc turns on collection of the SNMP activity.
This is done in open_ofile(), around line 820, where the list of activities from the file overrides the user's selection:
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
716 void open_ofile(int *ofd, char ofile[])
:
814 /*
815 * OK: All tests successfully passed.
816 * List of activities from the file prevails over that of the user.
817 * So unselect all of them. And reset activity sequence.
818 */
819 for (i = 0; i < NR_ACT; i++) {
820 act[i]->options &= ~AO_COLLECTED;
821 id_seq[i] = 0;
822 }
:
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
(gdb) p act[17]->f_read
$2 = (void (*)(struct activity *)) 0x403700 <wrap_read_net_ip>
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
This leads to wrap_read_net_ip() being executed. Its call to read_net_ip() is a tail call, so the wrapper "disappears" from the backtrace due to compiler optimization.
The wrap_read_net_ip() is shown below:
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
298 #define _buf0 buf[0]
392 __read_funct_t wrap_read_net_ip(struct activity *a)
393 {
394 struct stats_net_ip *st_net_ip
395 = (struct stats_net_ip *) a->_buf0;
396
397 /* Read IP stats */
398 read_net_ip(st_net_ip);
399
400 return;
401 }
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
It simply calls read_net_ip() on the first element of the "buf" array (_buf0 is an alias for buf[0]).
The issue is that the buffers in the "buf" array are not allocated, which causes the segmentation fault when sscanf() writes through the NULL pointer:
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
(gdb) p act[17]->buf
$3 = {0x0, 0x0, 0x0}
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
The root cause is therefore that data collection is turned on but no buffer is allocated to hold the data.
I believe this is a side effect of the patch added in rhbz#1670060 (0001-avoiding-triggering-automounts-bug-1670060.patch).
It should be slightly modified so that, like upstream, it always runs:
    if (act[i]->nr > 0) {
        /* Allocate structures for current activity */
        SREALLOC(act[i]->_buf0, void, act[i]->msize * act[i]->nr * act[i]->nr2);
    }
as the crash is due to act[17]->buf[0] being NULL.
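As an illustration of the intended behavior only, not the actual patch: the sketch below allocates a buffer for every activity that has items, independent of whether it is currently selected. mini_activity and alloc_all_buffers() are hypothetical, and calloc() stands in for sysstat's SREALLOC macro.

```c
#include <assert.h>
#include <stdlib.h>

struct mini_activity {        /* hypothetical stand-in for struct activity */
    int nr;                   /* number of items */
    int nr2;                  /* number of sub-items */
    size_t msize;             /* size of one statistics record */
    void *buf[3];             /* buf[0] plays the role of _buf0 */
};

/*
 * Allocate buf[0] for every activity that has items, whether or not it is
 * currently selected. Assumes nr2 and msize are positive when nr > 0.
 * Returns 0 on success, -1 if an allocation fails.
 */
static int alloc_all_buffers(struct mini_activity *acts, size_t n_acts)
{
    assert(acts != NULL);
    for (size_t i = 0; i < n_acts; i++) {
        if (acts[i].nr > 0 && acts[i].buf[0] == NULL) {
            size_t count = (size_t)acts[i].nr * (size_t)acts[i].nr2;
            acts[i].buf[0] = calloc(count, acts[i].msize);
            if (acts[i].buf[0] == NULL)
                return -1;
        }
    }
    return 0;
}
```

With the buffers allocated up front, a later change to the selection (for example from the activity list in an existing file) cannot leave a selected activity with a NULL buf[0].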
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2020:1200