groupadd segfaults in libaudit code, see below for details from F-24 mock chroot.

<mock-chroot>sh-4.3# gdb groupadd
GNU gdb (GDB) Fedora 7.10.50.20160121-46.fc24
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "s390x-redhat-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from groupadd...Reading symbols from /usr/lib/debug/usr/sbin/groupadd.debug...done.
done.
(gdb) set args foo1
(gdb) run
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /usr/sbin/groupadd foo1

Program received signal SIGSEGV, Segmentation fault.
check_ack (seq=<optimized out>, fd=<optimized out>) at netlink.c:287
287             int error = rep.error->error;
(gdb) where
#0  check_ack (seq=<optimized out>, fd=<optimized out>) at netlink.c:287
#1  audit_send_internal (fd=fd@entry=3, type=type@entry=1116, data=data@entry=0x3ffffffcfc6, size=<optimized out>) at netlink.c:244
#2  0x000003fffdfac592 in audit_send_user_message_internal (fd=fd@entry=3, type=type@entry=1116, hide_error=hide_error@entry=REAL_ERR, message=message@entry=0x3ffffffcfc6 "op=add-group id=1001 exe=\"/usr/sbin/groupadd\" hostname=? addr=? terminal=? res=success") at deprecated.c:47
#3  0x000003fffdfab874 in audit_log_acct_message (audit_fd=<optimized out>, type=<optimized out>, pgname=<optimized out>, op=0x2aa0000c3ec "add-group", name=0x3fffffff8ef "foo1", id=1001, host=<optimized out>, addr=0x0, tty=<optimized out>, result=1) at audit_logging.c:457
#4  0x000002aa00004d44 in audit_logger (type=<optimized out>, pgname=<optimized out>, op=<optimized out>, name=0x3fffffff8ef "foo1", id=<optimized out>, result=SHADOW_AUDIT_SUCCESS) at audit_help.c:86
#5  0x000002aa0000417c in close_files () at groupadd.c:275
#6  main (argc=<optimized out>, argv=<optimized out>) at groupadd.c:621

Version-Release number of selected component (if applicable):
audit-2.5-2.fc24

Additional info:
This audit build is from the GCC6 rebuild, when audit is downgraded to audit-2.5-1.fc24 (pre-mass rebuild), everything works.
rebuild with -fno-delete-null-pointer-checks didn't help
Using -fno-strict-aliasing doesn't help either, but switching to -O1 does help, so maybe it's a compiler error then ...
I can't see how the error is occurring in the code. The code that sets things up is this:

        switch (rep->type) {
                case NLMSG_ERROR:
                        rep->error = NLMSG_DATA(rep->nlh);
                        break;

And the dereference is:

        else if (rc > 0 && rep.type == NLMSG_ERROR) {
                int error = rep.error->error;

This part of the code has not changed in probably 8 years.
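For readers who want to poke at the two steps quoted above in isolation, here is a minimal sketch, assuming a Linux build host with <linux/netlink.h>. The names mini_reply, mini_adjust, mini_ack_error and demo_error_roundtrip are hypothetical and are not part of libaudit; the real struct audit_reply has more members, and the NULL check in the dereference step is a defensive addition that the quoted code does not have.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <linux/netlink.h>

/* Hypothetical, trimmed-down reply holder (NOT the real struct audit_reply). */
struct mini_reply {
    int type;
    struct nlmsghdr *nlh;
    struct nlmsgerr *error;
    union {                      /* aligned storage for header + error payload */
        struct nlmsghdr hdr;
        char raw[NLMSG_SPACE(sizeof(struct nlmsgerr))];
    } buf;
};

/* Set-up step: for NLMSG_ERROR, point 'error' at the data after the header. */
void mini_adjust(struct mini_reply *rep)
{
    rep->nlh = &rep->buf.hdr;
    rep->type = rep->nlh->nlmsg_type;
    rep->error = NULL;
    switch (rep->type) {
    case NLMSG_ERROR:
        rep->error = NLMSG_DATA(rep->nlh);
        break;
    }
}

/* Dereference step: returns the error code carried by the reply,
 * or 0 when the reply is not an error message. */
int mini_ack_error(const struct mini_reply *rep)
{
    if (rep->type == NLMSG_ERROR && rep->error != NULL)
        return rep->error->error;
    return 0;
}

/* Builds a fake NLMSG_ERROR reply carrying 'err' and runs both steps on it. */
int demo_error_roundtrip(int err)
{
    struct mini_reply rep;
    struct nlmsgerr payload;

    memset(&rep, 0, sizeof(rep));
    memset(&payload, 0, sizeof(payload));
    payload.error = err;

    rep.buf.hdr.nlmsg_type = NLMSG_ERROR;
    rep.buf.hdr.nlmsg_len = NLMSG_LENGTH(sizeof(payload));
    memcpy(NLMSG_DATA(&rep.buf.hdr), &payload, sizeof(payload));

    mini_adjust(&rep);
    return mini_ack_error(&rep);
}
```

With a correctly working compiler, demo_error_roundtrip(-EPERM) should hand back -EPERM. If the pointer store in the set-up step lands in the wrong place, the dereference step reads through a stale pointer, which is consistent with the crash at netlink.c:287.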
Yeah, I couldn't see anything obvious either, let's ask the gcc team for opinions.
The compiler used was gcc-6.0.0-0.9.fc24.s390x; my test builds were with gcc-6.0.0-0.11.fc24.s390x. From what I can see, both s390 and s390x are affected.

Full logs are available at http://s390.koji.fedoraproject.org/koji/buildinfo?buildID=378112
This bug appears to have been reported against 'rawhide' during the Fedora 24 development cycle. Changing version to '24'. More information and reason for this action is here: https://fedoraproject.org/wiki/Fedora_Program_Management/HouseKeeping/Fedora24#Rawhide_Rebase
Created attachment 1130824
preprocessed source file

so it's the static int adjust_reply(struct audit_reply *rep, int len) function in lib/netlink.c that requires switching to -O1 to avoid the segfaults

gcc -DHAVE_CONFIG_H -I. -I.. -I. -I.. -I../auparse -fPIC -DPIC -D_GNU_SOURCE -O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -m64 -march=z9-109 -mtune=z10 -c netlink.c -o netlink.o
when __attribute__((noinline, noclone)) is applied to adjust_reply(), the segfault doesn't occur with -O2
This is a dup of http://gcc.gnu.org/PR70025. r227382 broke it too:

--- netlink.s.227381	2016-03-01 13:00:48.321371161 +0100
+++ netlink.s.227382	2016-03-01 13:00:50.793337109 +0100
@@ -374,8 +374,8 @@ audit_get_reply_internal:
 .L42:
 	.loc 1 193 0
 	lg	%r1,176(%r15)
-	la	%r2,32(%r1)
-	stg	%r2,9008(%r1)
+	la	%r1,32(%r1)
+	stg	%r1,9008(%r1)
 .LVL43:
 .L35:
 .LBE28:

and the bad

	lg	%r1,176(%r15)
	la	%r1,32(%r1)
	stg	%r1,9008(%r1)

still appears even in r233777.
The code also has the same pattern:

  rep->nlh = &rep->msg.nlh;
  ...
  rep->signal_info = ((void*)(((char*) rep->nlh) + ((0) + ((int) ( ((sizeof(struct nlmsghdr))+4U -1) & ~(4U -1) )))));

thus it is again

  *(ptr + off1) = ptr + off2;

where off1 and off2 are constants and off1 is large enough that it is not a valid address offset for s390x.
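The *(ptr + off1) = ptr + off2 shape can be reproduced outside libaudit with a small stand-alone file. The sketch below is only an illustration under stated assumptions: struct hdr, struct reply, set_payload and store_pattern_ok are hypothetical names, and the buffer size is picked so that the pointer member sits at a large offset (9008 on a typical LP64 target, matching the displacement in the assembly quoted earlier). As far as I can tell from that listing, the bad sequence overwrites %r1 with ptr + 32 before using it as the base of the store, so the pointer would end up at the wrong address instead of in the intended member.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins, NOT the real libaudit definitions. */
struct hdr {                     /* 16 bytes, playing the role of struct nlmsghdr */
    unsigned int   len;
    unsigned short type;
    unsigned short flags;
    unsigned int   seq;
    unsigned int   pid;
};

struct reply {
    int         type;
    int         len;
    struct hdr *nlh;             /* points at msg_hdr below */
    struct hdr  msg_hdr;
    char        msg_data[8970];  /* large buffer pushes 'payload' to a big offset */
    void       *payload;         /* plays the role of rep->error / rep->signal_info */
};

/* The store under discussion: *(ptr + off1) = ptr + off2,
 * with off1 = offsetof(struct reply, payload) and off2 = the start of msg_data. */
void set_payload(struct reply *rep)
{
    assert(rep != NULL);
    rep->nlh = &rep->msg_hdr;
    rep->payload = (char *)rep->nlh + sizeof(struct hdr);
}

/* Returns 1 when the pointer landed in the right member with the right value. */
int store_pattern_ok(void)
{
    static struct reply rep;     /* zero-initialised */
    set_payload(&rep);
    return rep.nlh == &rep.msg_hdr && rep.payload == (void *)rep.msg_data;
}
```

Compiling this with the same -O2 -march=z9-109 -mtune=z10 flags and comparing the assembly for set_payload against the sequence above is one way to check whether a given compiler build is affected; on a correct build store_pattern_ok() returns 1.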
I can confirm that when audit is built with gcc-6.0.0-0.14.fc24, there is no longer a segfault in libaudit when running e.g. groupadd.
Fixed then.