Bug 2147595

Summary: crash in hostname resolution by NIS when address sanitizer is in use
Product: [Fedora] Fedora
Reporter: Jochen <jochen447>
Component: libnsl2
Assignee: Ondřej Sloup <osloup>
Status: NEW
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Priority: unspecified
Version: 37
CC: abokovoy, fjanus, mmuzila, odubaj
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Type: Bug
Attachments:
  small test program calling gethostbyname (flags: none)
  small test program calling getpwuid_r (flags: none)

Description Jochen 2022-11-24 11:00:05 UTC
Created attachment 1926963
small test program calling gethostbyname

Description of problem:
My minimal test program crashes in the `gethostbyname` function with a DEADLYSIGNAL (SIGSEGV) when built with clang's address sanitizer (`-fsanitize=address`), but _only_ if the hostname must be resolved by libnsl2 (the machine is configured as a NIS (YP) client).
I'm not 100% sure whether this is really related to the libnsl2 module or whether there is an issue with clang or the underlying asan libraries.

Version-Release number of selected component (if applicable):
2.0.0-4

How reproducible:
# /usr/bin/clang -g -fsanitize=address -fno-omit-frame-pointer libnsl2crash.c
# ./a.out example.com
gethostbyname("example.com")...AddressSanitizer:DEADLYSIGNAL
=================================================================
==248166==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x000000000000 bp 0x7ffc28ba64c0 sp 0x7ffc28ba5c78 T0)
==248166==Hint: pc points to the zero page.
==248166==The signal is caused by a READ memory access.
==248166==Hint: address points to the zero page.
    #0 0x0  (<unknown module>)
    #1 0x7f5d20e05e42  (/lib64/libnsl.so.3+0x3e42) (BuildId: 9486128142acf0b2aab30643ec361f2d7836d19c)
    #2 0x7f5d20e0624d  (/lib64/libnsl.so.3+0x424d) (BuildId: 9486128142acf0b2aab30643ec361f2d7836d19c)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV (<unknown module>) 
==248166==ABORTING


Steps to Reproduce:
1. compile the example code with clang as outlined above
2. configure the machine as NIS client
3. invoke the compiled program with a non-local hostname to resolve
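
For readers without access to the attachment, here is a minimal sketch of what such a test could look like. It is not the attached libnsl2crash.c: the helper name `resolve_host` and the output format are assumptions modeled on the expected results in this report.

```c
#define _DEFAULT_SOURCE   /* for hstrerror() and h_addr_list under strict modes */
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>

/* Hypothetical helper (not taken from the attachment).
   Resolves `name` with gethostbyname() and prints the result.
   Returns 0 on success, 1 if the lookup fails. */
int resolve_host(const char *name)
{
    printf("gethostbyname(\"%s\") ... ", name);
    fflush(stdout);   /* keep the prefix visible even if the call crashes */

    struct hostent *he = gethostbyname(name);
    if (he == NULL) {
        printf("failed: %s\n", hstrerror(h_errno));
        return 1;
    }

    printf("official name=%s\n", he->h_name);
    for (int i = 0; he->h_addr_list[i] != NULL; i++) {
        char buf[INET6_ADDRSTRLEN];
        if (inet_ntop(he->h_addrtype, he->h_addr_list[i], buf, sizeof buf) != NULL)
            printf("  h_addr_list[%d]=%s\n", i, buf);
    }
    puts("done");
    return 0;
}
```

A `main` that passes `argv[1]` to `resolve_host`, built with the compile line shown above, would complete the reproducer.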

Actual results:
gethostbyname("example.com")...AddressSanitizer:DEADLYSIGNAL
...crash (SEGV)


Expected results:
gethostbyname("example.com") ... official name=example.com
  h_addr_list[0]=93.184.216.34
done


Additional info:
# gdb -ex r --args ./a.out example.com
gethostbyname("example.com")...
Program received signal SIGSEGV, Segmentation fault.
0x0000000000000000 in ?? ()
(gdb) bt
#0  0x0000000000000000 in ?? ()
#1  0x0000000000457e6d in __interceptor_xdrstdio_create.part.0 ()
#2  0x00007ffff75dae43 in __yp_bind.part.0 () from /lib64/libnsl.so.3
#3  0x00007ffff75db24e in do_ypcall () from /lib64/libnsl.so.3
#4  0x00007ffff75dbae7 in yp_match () from /lib64/libnsl.so.3
#5  0x00007ffff79091a0 in internal_gethostbyname2_r () from /lib64/libnss_nis.so.2
#6  0x00007ffff790b360 in _nss_nis_gethostbyname_r () from /lib64/libnss_nis.so.2
#7  0x00007ffff7dceada in gethostbyname_r@@GLIBC_2.2.5 () from /lib64/libc.so.6
#8  0x00007ffff7dce1e9 in gethostbyname () from /lib64/libc.so.6
#9  0x0000000000463683 in gethostbyname ()
#10 0x00000000005158f1 in main (argc=2, argv=0x7fffffffd948) at libnsl2crash.c:88

# cat /etc/fedora-release
Fedora release 37 (Thirty Seven)

# uname -a
Linux fedora 6.0.9-300.fc37.x86_64 #1 SMP PREEMPT_DYNAMIC Wed Nov 16 17:36:22 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

# clang --version
clang version 15.0.4 (Fedora 15.0.4-1.fc37)
Target: x86_64-redhat-linux-gnu
Thread model: posix
InstalledDir: /usr/bin

Comment 1 Jochen 2022-11-25 10:37:09 UTC
The issue is somewhat worse than first reported: it is triggered not only through `gethostbyname`, but also through `getpwuid_r` and similar calls. Again, it only crashes if NIS is consulted.

I've created another example program to demonstrate that: libnsl2crash2.c
After installing the debug symbols for libnsl2-2.0.0-4.fc37, I get a slightly better stack trace:

#0  0x0000000000000000 in ?? ()
#1  0x0000000000457e3d in __interceptor_xdrstdio_create.part.0 ()
#2  0x00007ffff75d9e43 in yp_bind_file (ysd=0x612000000340, domain=0x7ffff75de020 <ypdomainname> "XXXXXXXXXXXXXXXXXXX")
    at /usr/src/debug/libnsl2-2.0.0-4.fc37.x86_64/src/do_ypcall.c:109
#3  __yp_bind (domain=domain@entry=0x7ffff75de020 <ypdomainname> "XXXXXXXXXXXXXXXXXXX", ypdb=ypdb@entry=0x7fffffffccd0)
    at /usr/src/debug/libnsl2-2.0.0-4.fc37.x86_64/src/do_ypcall.c:275
#4  0x00007ffff75da24e in __yp_bind (ypdb=0x7fffffffccd0, domain=0x7ffff75de020 <ypdomainname> "XXXXXXXXXXXXXXXXXXX")
    at /usr/src/debug/libnsl2-2.0.0-4.fc37.x86_64/src/do_ypcall.c:254
#5  do_ypcall (domain=0x7ffff75de020 <ypdomainname> "XXXXXXXXXXXXXXXXXXX", prog=prog@entry=3, xargs=xargs@entry=0x7ffff75d8ce0 <xdr_ypreq_key>, 
    req=req@entry=0x7fffffffcd40 " \340]\367\377\177", xres=xres@entry=0x7ffff75d8d50 <xdr_ypresp_val>, resp=resp@entry=0x7fffffffcd20 "")
    at /usr/src/debug/libnsl2-2.0.0-4.fc37.x86_64/src/do_ypcall.c:442
#6  0x00007ffff75daae7 in do_ypcall_tr (resp=0x7fffffffcd20, xres=0x7ffff75d8d50 <xdr_ypresp_val>, req=0x7fffffffcd40 " \340]\367\377\177", 
    xargs=0x7ffff75d8ce0 <xdr_ypreq_key>, prog=3, domain=<optimized out>) at /usr/src/debug/libnsl2-2.0.0-4.fc37.x86_64/src/do_ypcall.c:475
#7  yp_match (indomain=<optimized out>, inmap=<optimized out>, inkey=<optimized out>, inkeylen=<optimized out>, outval=0x7fffffffcdd0, outvallen=0x7fffffffcdc4)
    at /usr/src/debug/libnsl2-2.0.0-4.fc37.x86_64/src/yp_match.c:48
#8  0x00007ffff790ae4b in _nss_nis_getpwuid_r () from /lib64/libnss_nis.so.2
#9  0x00007ffff7d85061 in getpwuid_r@@GLIBC_2.2.5 () from /lib64/libc.so.6
#10 0x00000000004735ec in getpwuid_r ()
#11 0x0000000000515a3a in main (argc=2, argv=0x7fffffffd958) at libnsl2crash2.c:79
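
For reference, here is a minimal sketch of a `getpwuid_r` test of this kind. It is not the attached libnsl2crash2.c: the helper name `lookup_uid`, the buffer size, and the output format are assumptions.

```c
#include <pwd.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper (not taken from the attachment).
   Looks up `uid` with getpwuid_r() and prints the entry.
   Returns 0 if an entry was found, 1 otherwise. */
int lookup_uid(uid_t uid)
{
    struct passwd pw;
    struct passwd *result = NULL;
    char buf[16384];   /* assumed to be large enough for one passwd entry */

    printf("getpwuid_r(%ld) ... ", (long)uid);
    fflush(stdout);    /* keep the prefix visible even if the call crashes */

    int rc = getpwuid_r(uid, &pw, buf, sizeof buf, &result);
    if (rc != 0) {
        printf("error: %s\n", strerror(rc));
        return 1;
    }
    if (result == NULL) {
        puts("no such user");
        return 1;
    }

    printf("name=%s home=%s\n", result->pw_name, result->pw_dir);
    return 0;
}
```

To exercise the NIS path, the UID passed in would have to belong to a user that exists only in NIS, not in the local /etc/passwd.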

Comment 2 Jochen 2022-11-25 10:38:33 UTC
Created attachment 1927381
small test program calling getpwuid_r

Comment 3 Alexander Bokovoy 2022-11-28 12:42:43 UTC
I think this needs an upstream report. Fedora is driving NIS(+) removal in Fedora 38, but this looks like nothing specific to Fedora.

A few more questions before that.

1. Is this reproducible with clang only, or does a gcc-built example fail as well?

2. The crash happens in the xdrstdio_create() implementation. Fedora uses tirpc, and libnsl2 links against tirpc. Maybe it is actually a bug in tirpc?

The code in question in libnsl2 is this:

  FILE *in = fopen (path, "rce");
  if (in != NULL)
    {
....
      XDR xdrs;
      xdrstdio_create (&xdrs, in, XDR_DECODE);
....

I.e., it passes a file object to initialize the XDR stream, and the code crashes there in the TIRPC code.
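
One way to narrow this down, offered as an assumption rather than a confirmed diagnosis: a pc of 0 directly beneath `__interceptor_xdrstdio_create` would be consistent with a call through a function pointer that was never resolved, for instance if the library providing `xdrstdio_create` is not loaded at program start. The sketch below uses `dlsym` to probe whether a symbol is visible in the global scope of the running process. The helper name `symbol_is_loaded` is hypothetical.

```c
#define _GNU_SOURCE   /* for RTLD_DEFAULT */
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical diagnostic helper.
   Returns 1 if `symbol` can be found in the global symbol scope of the
   running process (executable plus libraries loaded at startup), else 0. */
int symbol_is_loaded(const char *symbol)
{
    dlerror();   /* clear any stale error state */
    void *addr = dlsym(RTLD_DEFAULT, symbol);
    return addr != NULL;
}
```

Calling `symbol_is_loaded("xdrstdio_create")` at the start of `main`, in a build without `-fsanitize=address`, would show whether the function is available before any NSS module has been loaded. In a sanitized build the result can be misleading, because the sanitizer runtime may itself export a symbol of that name.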

Comment 4 Jochen 2022-11-28 13:06:25 UTC
Thanks, Alexander, for looking into this!

(In reply to Alexander Bokovoy from comment #3)
> I think this needs an upstream report. Fedora is driving NIS(+) removal in
> Fedora 38, but this looks like nothing specific to Fedora.
> 
> A few more questions before that.
> 
> 1. Is this reproducible with clang only, or does a gcc-built example fail as well?

It also crashes with gcc (I had to install package libasan first in order to try it out):

# sudo dnf info libasan
Installed Packages
Name         : libasan
Version      : 12.2.1
Release      : 4.fc37
Architecture : x86_64
Size         : 1.3 M
Source       : gcc-12.2.1-4.fc37.src.rpm
[..]

This is the ASAN stack trace from running the gcc-sanitized example "libnsl2crash2.c" (2nd example):

# gcc --version
gcc (GCC) 12.2.1 20221121 (Red Hat 12.2.1-4)
[..]

# gcc -g -fno-optimize-sibling-calls -fsanitize=address -fno-omit-frame-pointer libnsl2crash2.c 
# ./a.out 1076
getpwuid_r(1076) ... AddressSanitizer:DEADLYSIGNAL
=================================================================
==1413854==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x000000000000 bp 0x7ffd597ebe70 sp 0x7ffd597eb618 T0)
==1413854==Hint: pc points to the zero page.
==1413854==The signal is caused by a READ memory access.
==1413854==Hint: address points to the zero page.
    #0 0x0  (<unknown module>)
    #1 0x7fc2eb8cfe42 in yp_bind_file /usr/src/debug/libnsl2-2.0.0-4.fc37.x86_64/src/do_ypcall.c:109
    #2 0x7fc2eb8cfe42 in __yp_bind /usr/src/debug/libnsl2-2.0.0-4.fc37.x86_64/src/do_ypcall.c:275
    #3 0x7fc2eb8d024d in __yp_bind /usr/src/debug/libnsl2-2.0.0-4.fc37.x86_64/src/do_ypcall.c:254
    #4 0x7fc2eb8d024d in do_ypcall /usr/src/debug/libnsl2-2.0.0-4.fc37.x86_64/src/do_ypcall.c:442

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV (<unknown module>) 
==1413854==ABORTING

It seems like the stack trace is identical. The last frame is suspicious, though. Perhaps the stack is smashed/garbled.


> 2. The crash happens in the xdrstdio_create() implementation. Fedora uses tirpc,
> and libnsl2 links against tirpc. Maybe it is actually a bug in tirpc?

That could very well be, because the last stack frame is unknown.
The libtirpc version in use is:
# sudo dnf info libtirpc
Installed Packages
Name         : libtirpc
Version      : 1.3.3
Release      : 0.fc37
Architecture : i686
Size         : 218 k
Source       : libtirpc-1.3.3-0.fc37.src.rpm
[...]


It would be great if someone other than me could reproduce any of these issues.

Comment 5 Fedora Admin user for bugzilla script actions 2023-06-27 12:10:30 UTC
This package has changed maintainer in Fedora. Reassigning to the new maintainer of this component.