ypserv-2.8-21 (with debugging as per bug #192920)

Core was generated by `/var/tmp/ypserv/usr/sbin/ypserv --dns'.
Program terminated with signal 11, Segmentation fault.
<snip>
#0  _int_malloc (av=0xf65a1180, bytes=1) at malloc.c:3926
3926            fwd->bk = victim;
(gdb) bt
#0  _int_malloc (av=0xf65a1180, bytes=1) at malloc.c:3926
#1  0xf64dce9d in __libc_malloc (bytes=352) at malloc.c:3295
#2  0xf64cbfa3 in __fopen_internal (filename=0x1 <Address 0x1 out of bounds>, mode=0x1 <Address 0x1 out of bounds>, is32=1) at iofopen.c:76
#3  0xf64cc06e in _IO_new_fopen (filename=0x1 <Address 0x1 out of bounds>, mode=0x1 <Address 0x1 out of bounds>) at iofopen.c:107
#4  0xf655700e in __res_vinit (statp=0xf65a2840, preinit=0) at res_init.c:236
#5  0xf6556e07 in __res_ninit (statp=0x1) at res_init.c:138
#6  0xf65bc2fd in res_gethostbyaddr (addr=0xfeffdd64, len=4, af=2) at gethnamaddr.c:679
#7  0x0804b518 in ypproc_match_2_svc (argp=0xfeffddc8, result=0xfeffdda8, rqstp=0x8050097) at server.c:290
#8  0x08049dfb in ypprog_2 (rqstp=0xfeffde28, transp=0x8054008) at ypserv.c:215
#9  0xf656c9f8 in svc_getreq_common (fd=1) at svc.c:465
#10 0xf656c818 in svc_getreq_poll (pfdp=0x84cd000, pollretval=1) at svc.c:398
#11 0x0804a14f in ypserv_svc_run () at ypserv.c:266
#12 0x0804ac3d in main (argc=134566760, argv=0xffffffff) at ypserv.c:707
#13 0xf648179d in __libc_start_main (main=0x804a650 <main>, argc=2, ubp_av=0xfeffe504, init=0x804fa58 <__libc_csu_init>, fini=0x2, rtld_fini=0xfeffe504, stack_end=0xfeffe4fc) at ../sysdeps/generic/libc-start.c:205
#14 0x08049cd1 in _start ()

When running "ypserv --dns" under valgrind and requesting the value for a key that is not in the database, we get errors like:

==11715== Invalid write of size 1
==11715==    at 0x804B16B: ypproc_match_2_svc (server.c:263)
==11715==    by 0x8049DFA: ypprog_2 (ypserv.c:215)
==11715==    by 0x41569F7: svc_getreq_common (svc.c:465)
==11715==    by 0x4156817: svc_getreq_poll (svc.c:398)
==11715==    by 0x804A14E: ypserv_svc_run (ypserv.c:266)
==11715==    by 0x804AC3C: main (ypserv.c:707)
==11715==  Address 0x41970FF is 0 bytes after a block of size 7 alloc'd
==11715==    at 0x401A6C2: malloc (vg_replace_malloc.c:149)
==11715==    by 0x41591AB: xdr_bytes (xdr.c:564)
==11715==    by 0x4044E64: xdr_keydat (yp_xdr.c:76)
==11715==    by 0x4044FC4: xdr_ypreq_key (yp_xdr.c:112)
==11715==    by 0x415870B: svcudp_getargs (svc_udp.c:374)
==11715==    by 0x8049DD2: ypprog_2 (ypserv.c:209)
==11715==    by 0x41569F7: svc_getreq_common (svc.c:465)
==11715==    by 0x4156817: svc_getreq_poll (svc.c:398)
==11715==    by 0x804A14E: ypserv_svc_run (ypserv.c:266)
==11715==    by 0x804AC3C: main (ypserv.c:707)

Note that both the invalid write and the crash itself occur in calls made from ypprog_2 (ypserv.c:215).

Steps to reproduce:

Server setup (test.redhat.com):
0. Default ypserv and ypserv.conf install
1. domainname redhat.com
2. Add an entry to /etc/hosts to test with
3. cd /var/yp && make
4. Change /proc/sys/kernel/core_pattern
5. Launch ypserv --dns

Client side:
1. domainname redhat.com
2. In /etc/yp.conf:
   domain redhat.com server test.redhat.com
3. ypbind
4. Test with ypcat:
   ypcat hosts
5. Run ypmatch on a host that isn't in the database:
   ypmatch -d redhat.com -k isnotthere hosts

The crash is not reproducible every time, but the invalid writes happen on every run.
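The valgrind report above looks like a classic off-by-one: xdr_bytes() allocates exactly keylen bytes for the request key (the 7-byte block), and the code at server.c:263 then writes one byte just past that block, most likely a NUL terminator. A one-byte heap overflow like this can corrupt malloc's bookkeeping, which would be consistent with the later crash inside _int_malloc. The C sketch below is hypothetical: struct key_sketch and key_to_cstring() are made-up names, and the real server.c code is not shown in this report. It shows the unsafe pattern in a comment and a safe copy-with-terminator alternative.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the decoded YP key. Like xdr_bytes(), the
 * decoder is assumed to allocate exactly keylen bytes for keyval, with
 * no room for a trailing NUL. */
struct key_sketch {
    unsigned int keylen;
    char *keyval;
};

/*
 * Unsafe pattern the valgrind report is consistent with:
 *
 *     key->keyval[key->keylen] = '\0';   (writes 1 byte past the block)
 *
 * Safer pattern: copy the key into a buffer one byte longer and
 * terminate the copy. Returns a malloc'd C string the caller must
 * free(), or NULL if allocation fails.
 */
char *key_to_cstring(const struct key_sketch *key)
{
    assert(key != NULL);

    char *s = malloc((size_t)key->keylen + 1);  /* +1 for the NUL */
    if (s == NULL)
        return NULL;

    if (key->keylen > 0)
        memcpy(s, key->keyval, key->keylen);
    s[key->keylen] = '\0';   /* in bounds: last byte of keylen + 1 */
    return s;
}
```

With keylen = 7, the copy is an 8-byte buffer whose last byte holds the terminator, so the original 7-byte allocation is only read, never written past its end.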
Just to verify, it looks like that package you're using is for RHEL3U8, is that correct? (The bug is assigned to RHEL4.)
It actually shows up in both releases (and yes, they've tested with the U8 beta, IIRC).
I'm working on replicating this issue, but I'm not seeing any errors on my machines. What versions of ypbind and yp-tools do you have on the client machine? Also, what kind of errors do you see on the clients, and are there any errors on the server (besides the core dump)? Also, approximately how many times do you have to run ypmatch to get ypserv to segfault?
Also, do you know which arch you saw this failing on? (I'm just trying to replicate this bug exactly.)
<feedback>
OK, no coredumps yet, but it looks like ypserv has become wedged:

# /usr/lib/yp/makedbm -c
failed to send 'clear' to local ypserv: RPC: Timed out
</feedback>
Is there anything that he should be looking for when this happens?
Just to verify, this is RHEL3, correct?
I've been able to verify the hangs on RHEL3; they appear to happen only when I'm doing a combination of DNS and normal YP requests. If I only do DNS requests, or only normal YP requests, ypserv seems to operate perfectly. I'm still investigating what's going on.
Gotcha, thanks for the work here. To confirm, the environment that they're able to test in is RHEL3.
<feedback> No coredumps, but still gets hung as below. </feedback>
I haven't been able to replicate the core dumps on my side, even after running under heavy load for 16 hours. Can you have them run 'killall ypserv', verify that every ypserv process has been killed, run 'sha1sum /usr/sbin/ypserv' and send us the output, and then start ypserv up again and rerun their tests? I just want to be 100% sure that they're running the latest version. If they are, I'll rebuild with full debugging and have them trigger segfaults to figure out where things are getting hung up.
<feedback>
sha1sum /var/tmp/ypserv/usr/sbin/ypserv
24595c985866014dc985221502b6ad5dac7677af /var/tmp/ypserv/usr/sbin/ypserv

We have to install outside /usr, so that's where it's being run from.

We aren't getting coredumps. We're getting ypserv wedged, like this:

# /usr/lib/yp/makedbm -c
failed to send 'clear' to local ypserv: RPC: Timed out

And this is happening less frequently now than with the ypserv-2.8-21.mstest.0.i386.rpm, but is still happening in our environment.
</feedback>
The core is attached, and here's the backtrace from an unresponsive process:
<snip>
(gdb) bt
#0  0xf65493ac in accept () from /lib/tls/libc.so.6
#1  0xf656daff in rendezvous_request () from /lib/tls/libc.so.6
#2  0xf656c892 in svc_getreq_common_internal () from /lib/tls/libc.so.6
#3  0xf656c818 in svc_getreq_poll_internal () from /lib/tls/libc.so.6
#4  0x0804a14f in ypserv_svc_run () at ypserv.c:266
#5  0x0804ac3d in main (argc=134566760, argv=0xffffffff) at ypserv.c:707
#6  0xf648179d in __libc_start_main () from /lib/tls/libc.so.6
#7  0x08049cd1 in _start ()
(gdb)
</snip>
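In the backtrace above, the wedged process is blocked in accept(), called from rendezvous_request(), the Sun RPC routine that accepts new TCP connections on a listening socket. In a single-threaded, poll()-driven server this is a known hazard: if poll() reports the listening socket readable but no connection is left to accept by the time accept() runs (the exact causes vary by platform), accept() on a blocking socket waits indefinitely and no other requests are served. Whether that is what happened here is not established in this report. A common general mitigation is to mark the listening socket O_NONBLOCK so that accept() fails with EAGAIN instead of blocking. The C sketch below shows that technique; make_nonblocking_listener(), wait_readable() and try_accept() are hypothetical helpers, and this is not necessarily the change that was made to ypserv.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Set O_NONBLOCK on fd, keeping its other file status flags.
 * Returns 0 on success, -1 on error (errno set by fcntl). */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Hypothetical helper: TCP listener on 127.0.0.1 with a kernel-chosen
 * port, already in non-blocking mode. Stores the port (host byte order)
 * in *port. Returns the listening fd, or -1 on error. */
int make_nonblocking_listener(unsigned short *port)
{
    assert(port != NULL);

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd == -1)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                       /* let the kernel pick */

    socklen_t len = sizeof addr;
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) == -1 ||
        listen(fd, 16) == -1 ||
        getsockname(fd, (struct sockaddr *)&addr, &len) == -1 ||
        set_nonblocking(fd) == -1) {
        close(fd);
        return -1;
    }
    *port = ntohs(addr.sin_port);
    return fd;
}

/* Wait up to timeout_ms for an event on fd, like the poll() in a
 * svc_run-style loop. Returns 1 if poll() reports an event, 0 on
 * timeout, -1 on error. */
int wait_readable(int fd, int timeout_ms)
{
    struct pollfd p = { .fd = fd, .events = POLLIN, .revents = 0 };
    return poll(&p, 1, timeout_ms);
}

/* Accept one connection. Because the listener is non-blocking, this
 * returns -1 with errno EAGAIN/EWOULDBLOCK when nothing is pending,
 * instead of blocking; the caller simply goes back to poll(). */
int try_accept(int listen_fd)
{
    return accept(listen_fd, NULL, NULL);
}
```

Note that on Linux the socket returned by accept() does not inherit O_NONBLOCK from the listener, so per-connection I/O keeps its usual blocking behavior unless it is set separately.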
Created attachment 140039: ypserv core from unresponsive process
Was there only one process when it hung? or multiple?
Can you also send me the rpm versions of glibc that they have installed?
paivm1 /var/tmp/cores 18# rpm -q glibc
glibc-2.3.2-95.30
In response to "Was there only one process when it hung? or multiple?": Usually just one process. In one case it was two, but both coredumps had the same backtrace.
ypserv has been added to the list of planned components for 4.5 as an exception, so I'm proposing this request as well.
I've had a chance to look through the rpm again, and made a slight change to the way we poll in our svc_run function. Can you try this rpm:
http://people.redhat.com/cfeist/ypserv/ypserv-2.8-21.mstest.6.i386.rpm

If it dumps core, can you send me the core file as well as the backtrace from your machine? Can you also send me the output of the following command:

rpm -q --queryformat '%{name}-%{version}-%{release}.%{arch}\n' glibc glibc-common glibc-debug glibc-devel glibc-headers nptl nptl-devel nscd
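The report doesn't describe what the polling change in mstest.6 actually was. For context, the backtraces show ypserv_svc_run() handing poll() results to glibc's svc_getreq_poll(); a loop of that kind typically copies the registered descriptor set, calls poll(), treats EINTR as non-fatal, and dispatches only the descriptors that reported events. The C sketch below shows one iteration of such a loop in generic form; poll_once(), ready_fn and drain_and_count() are hypothetical names, not ypserv or glibc APIs.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical callback type: handle one descriptor poll() reported. */
typedef void (*ready_fn)(int fd, short revents, void *ctx);

/*
 * One iteration of a svc_run()-style loop (hypothetical). The registered
 * set (fds, n) is copied before polling, because handlers may add or
 * remove descriptors while we dispatch. Returns the number of descriptors
 * dispatched, 0 on timeout or EINTR (caller just loops again), or -1 on a
 * real error.
 */
int poll_once(const struct pollfd *fds, nfds_t n, int timeout_ms,
              ready_fn on_ready, void *ctx)
{
    assert(on_ready != NULL);
    if (n == 0)
        return 0;

    struct pollfd *work = malloc(n * sizeof *work);
    if (work == NULL)
        return -1;
    for (nfds_t i = 0; i < n; i++) {
        work[i].fd = fds[i].fd;
        work[i].events = fds[i].events;
        work[i].revents = 0;
    }

    int ready = poll(work, n, timeout_ms);
    if (ready == -1) {
        int saved = errno;
        free(work);
        if (saved == EINTR)
            return 0;              /* interrupted by a signal: not fatal */
        errno = saved;
        return -1;
    }

    int dispatched = 0;
    for (nfds_t i = 0; i < n && dispatched < ready; i++) {
        if (work[i].revents != 0) {
            on_ready(work[i].fd, work[i].revents, ctx);
            dispatched++;
        }
    }
    free(work);
    return dispatched;
}

/* Example callback (hypothetical): consume one byte and count the event. */
void drain_and_count(int fd, short revents, void *ctx)
{
    char c;
    if (revents & POLLIN) {
        ssize_t got = read(fd, &c, 1);
        (void)got;                 /* a real handler would check this */
    }
    ++*(int *)ctx;
}
```

Copying the set before polling matters in this kind of loop because a handler can register or unregister descriptors mid-dispatch (for example, when a TCP connection is accepted or closed), which could otherwise change the array being iterated.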
QE ack for RHEL4.5.
Devel ACK
This bugzilla had previously been approved for engineering consideration but Red Hat Product Management is currently reevaluating this issue for inclusion in RHEL4.6.
Created attachment 155041: ypserv mstest patch ported to 2.13

This is a fairly straightforward port of the 2.8 mstest patch to ypserv 2.13 in RHEL 4.5. Please take a look to certify that it's sane before using it for anything.
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
Product Management has reviewed and declined this request. You may appeal this decision by reopening this request.
Reopening this bugzilla
This fix is in ypserv-2.13-19. Frank, is it possible for you to verify that it works for your customer? (This fix should also make it out in 4.7.)
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2008-0747.html