Bug 195662 - [RHEL 4.5] Crashes when looking up hosts with --dns
Summary: [RHEL 4.5] Crashes when looking up hosts with --dns
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: ypserv
Version: 4.0
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Chris Feist
QA Contact: Jay Turner
URL:
Whiteboard:
Depends On:
Blocks: 198694 227512 246627
Reported: 2006-06-16 13:37 UTC by Bastien Nocera
Modified: 2018-10-19 20:39 UTC (History)

Fixed In Version: RHBA-2008-0747
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2008-07-24 20:00:59 UTC
Target Upstream Version:
Embargoed:


Attachments
ypserv core from unresponsive process (2.23 MB, application/x-bzip2)
2006-11-01 21:14 UTC, Frank Hirtz
ypserv mstest patch ported to 2.13 (3.49 KB, patch)
2007-05-19 04:50 UTC, Frank Hirtz


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2008:0747 0 normal SHIPPED_LIVE ypserv bug fix update 2008-07-23 16:47:41 UTC

Description Bastien Nocera 2006-06-16 13:37:13 UTC
ypserv-2.8-21 (with debugging as per bug #192920)

Core was generated by `/var/tmp/ypserv/usr/sbin/ypserv --dns'.
Program terminated with signal 11, Segmentation fault.
<snip>
#0  _int_malloc (av=0xf65a1180, bytes=1) at malloc.c:3926
3926          fwd->bk = victim;
(gdb) bt
#0  _int_malloc (av=0xf65a1180, bytes=1) at malloc.c:3926
#1  0xf64dce9d in __libc_malloc (bytes=352) at malloc.c:3295
#2  0xf64cbfa3 in __fopen_internal (filename=0x1 <Address 0x1 out of bounds>,
   mode=0x1 <Address 0x1 out of bounds>, is32=1) at iofopen.c:76
#3  0xf64cc06e in _IO_new_fopen (filename=0x1 <Address 0x1 out of bounds>,
   mode=0x1 <Address 0x1 out of bounds>) at iofopen.c:107
#4  0xf655700e in __res_vinit (statp=0xf65a2840, preinit=0) at res_init.c:236
#5  0xf6556e07 in __res_ninit (statp=0x1) at res_init.c:138
#6  0xf65bc2fd in res_gethostbyaddr (addr=0xfeffdd64, len=4, af=2)
   at gethnamaddr.c:679
#7  0x0804b518 in ypproc_match_2_svc (argp=0xfeffddc8, result=0xfeffdda8,
   rqstp=0x8050097) at server.c:290
#8  0x08049dfb in ypprog_2 (rqstp=0xfeffde28, transp=0x8054008) at ypserv.c:215
#9  0xf656c9f8 in svc_getreq_common (fd=1) at svc.c:465
#10 0xf656c818 in svc_getreq_poll (pfdp=0x84cd000, pollretval=1) at svc.c:398
#11 0x0804a14f in ypserv_svc_run () at ypserv.c:266
#12 0x0804ac3d in main (argc=134566760, argv=0xffffffff) at ypserv.c:707
#13 0xf648179d in __libc_start_main (main=0x804a650 <main>, argc=2,
   ubp_av=0xfeffe504, init=0x804fa58 <__libc_csu_init>, fini=0x2,
   rtld_fini=0xfeffe504, stack_end=0xfeffe4fc)
   at ../sysdeps/generic/libc-start.c:205
#14 0x08049cd1 in _start ()

When running "ypserv --dns" under valgrind, and requesting the value for a key
that's not in the database, we get errors like:
==11715== Invalid write of size 1
==11715==    at 0x804B16B: ypproc_match_2_svc (server.c:263)
==11715==    by 0x8049DFA: ypprog_2 (ypserv.c:215)
==11715==    by 0x41569F7: svc_getreq_common (svc.c:465)
==11715==    by 0x4156817: svc_getreq_poll (svc.c:398)
==11715==    by 0x804A14E: ypserv_svc_run (ypserv.c:266)
==11715==    by 0x804AC3C: main (ypserv.c:707)
==11715==  Address 0x41970FF is 0 bytes after a block of size 7 alloc'd
==11715==    at 0x401A6C2: malloc (vg_replace_malloc.c:149)
==11715==    by 0x41591AB: xdr_bytes (xdr.c:564)
==11715==    by 0x4044E64: xdr_keydat (yp_xdr.c:76)
==11715==    by 0x4044FC4: xdr_ypreq_key (yp_xdr.c:112)
==11715==    by 0x415870B: svcudp_getargs (svc_udp.c:374)
==11715==    by 0x8049DD2: ypprog_2 (ypserv.c:209)
==11715==    by 0x41569F7: svc_getreq_common (svc.c:465)
==11715==    by 0x4156817: svc_getreq_poll (svc.c:398)
==11715==    by 0x804A14E: ypserv_svc_run (ypserv.c:266)
==11715==    by 0x804AC3C: main (ypserv.c:707)

Note that both the invalid accesses and the crash itself occur underneath the
same call from ypprog_2 (ypserv.c:215) into ypproc_match_2_svc.
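
The valgrind report above shows a one-byte write landing immediately after a
7-byte block that xdr_bytes allocated for the request key. One common pattern
that produces this signature is NUL-terminating a length-counted XDR buffer in
place. The sketch below illustrates that class of bug and a safer alternative.
It is an assumption about the kind of defect involved, not a copy of ypserv's
source; `struct counted_key` and `key_to_cstring` are hypothetical names
introduced only for illustration.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an XDR counted byte string: `val` holds exactly
 * `len` bytes and is NOT NUL-terminated. */
struct counted_key {
    size_t len;
    char *val;
};

/* Unsafe pattern (shown only as a comment, do not use):
 *
 *     key->val[key->len] = '\0';
 *
 * When `val` was allocated with exactly `len` bytes, this writes one byte
 * past the end of the block, which is what valgrind flags as an
 * "Invalid write of size 1 ... 0 bytes after a block". */

/* Safer alternative: copy the key into a buffer that has room for the
 * terminator. Returns a malloc'd C string that the caller must free,
 * or NULL on invalid input or allocation failure. */
char *key_to_cstring(const struct counted_key *key)
{
    char *s;

    if (key == NULL || (key->val == NULL && key->len != 0))
        return NULL;
    if (key->len == SIZE_MAX)          /* len + 1 would overflow */
        return NULL;

    s = malloc(key->len + 1);
    if (s == NULL)
        return NULL;

    if (key->len != 0)
        memcpy(s, key->val, key->len);
    s[key->len] = '\0';
    return s;
}
```

A caller would pass the returned string to whatever resolver routine expects a
C string and free it afterwards, leaving the decoded request buffer untouched.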

Steps to reproduce:

Server setup (test.redhat.com):
0. (default ypserv and ypserv.conf install)
1. domainname redhat.com
2. add an entry to /etc/hosts to test with
3. cd /var/yp && make
4. change /proc/sys/kernel/core_pattern
5. launch ypserv --dns

Client side:
1. domainname redhat.com
2. In /etc/yp.conf:
domain redhat.com server test.redhat.com
3. ypbind
4. test with ypcat:
ypcat hosts
5. Launch ypmatch on a host that isn't in the database:
ypmatch -d redhat.com -k isnotthere hosts

The crash is not reproducible every time, but valgrind reports the invalid accesses every time.

Comment 3 Chris Feist 2006-07-17 18:12:16 UTC
Just to verify, it looks like that package you're using is for RHEL3U8, is that
correct?  (The bug is assigned to RHEL4)

Comment 4 Frank Hirtz 2006-07-17 18:21:18 UTC
It actually exhibits itself in both releases (and yes, they've tested with U8
beta IIRC)

Comment 5 Chris Feist 2006-07-17 18:29:45 UTC
I'm working on replicating this issue, but I'm not seeing any errors with my
machines.  What version of ypbind & yp-tools do you have on the client machine?
Also, what kind of errors do you see on the clients, and are there any errors
on the server (besides the core dump)?

Also, approximately how many times do you have to run ypmatch to get ypserv to
segfault?

Comment 6 Chris Feist 2006-07-17 18:36:40 UTC
Also, do you know which arch you saw this failing on?  (I'm just trying to
exactly replicate this bug).

Comment 34 Frank Hirtz 2006-10-20 13:49:05 UTC
<feedback>
OK, no coredumps yet, but it looks like ypserv has become wedged:

# /usr/lib/yp/makedbm -c
failed to send 'clear' to local ypserv: RPC: Timed out
</feedback>

Is there anything that he should be looking for when this happens?

Comment 35 Chris Feist 2006-10-20 15:45:58 UTC
Just to verify, this is RHEL3, correct?

Comment 36 Chris Feist 2006-10-20 16:22:19 UTC
I've been able to verify the hangs on RHEL3. They appear to happen only when I'm
doing a combination of DNS and normal yp requests.  If I do only DNS requests or
only normal yp requests, ypserv seems to operate perfectly.  I'm still
investigating what's going on.

Comment 37 Frank Hirtz 2006-10-20 16:44:43 UTC
Gotcha, thanks for the work here. To confirm, the environment that they're able
to test in is RHEL3.

Comment 39 Frank Hirtz 2006-10-24 21:16:41 UTC
<feedback>
No coredumps, but still gets hung as below.
</feedback>

Comment 40 Chris Feist 2006-10-25 17:05:02 UTC
I haven't been able to replicate the core dumps on my side after running with
heavy load for 16 hours.  Can you have them 'killall ypserv', and verify that
they've all been killed, and then do a 'sha1sum /usr/sbin/ypserv' and send us
the output, and then start ypserv up again and run their tests.  I just want to
be 100% sure that they're running the latest version.

If they do have the latest version, I'll rebuild with full debugging, and have
them trigger segfaults to figure out where things are getting hung up.

Comment 41 Frank Hirtz 2006-10-26 18:15:12 UTC
<feedback>
sha1sum /var/tmp/ypserv/usr/sbin/ypserv
24595c985866014dc985221502b6ad5dac7677af  /var/tmp/ypserv/usr/sbin/ypserv

We have to install outside /usr, so that's where it's being run from.

We aren't getting coredumps. We're getting ypserv wedged, like this:

  # /usr/lib/yp/makedbm -c
  failed to send 'clear' to local ypserv: RPC: Timed out

And this is happening less frequently now than with the
ypserv-2.8-21.mstest.0.i386.rpm, but is still happening in our
environment.
</feedback>

Comment 43 Frank Hirtz 2006-11-01 20:59:53 UTC
The core is attached, and here's the backtrace from an unresponsive process:

<snip>
(gdb) bt
#0  0xf65493ac in accept () from /lib/tls/libc.so.6
#1  0xf656daff in rendezvous_request () from /lib/tls/libc.so.6
#2  0xf656c892 in svc_getreq_common_internal () from /lib/tls/libc.so.6
#3  0xf656c818 in svc_getreq_poll_internal () from /lib/tls/libc.so.6
#4  0x0804a14f in ypserv_svc_run () at ypserv.c:266
#5  0x0804ac3d in main (argc=134566760, argv=0xffffffff) at ypserv.c:707
#6  0xf648179d in __libc_start_main () from /lib/tls/libc.so.6
#7  0x08049cd1 in _start ()
(gdb)
</snip>
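
The backtrace above shows the ypserv process parked in accept() underneath
svc_getreq_poll. The Linux accept(2) manual page notes that a connection
reported as ready by poll() may be gone by the time accept() is called, in
which case a blocking listening socket makes accept() wait for the next
connection. The sketch below shows the usual defensive measure, marking the
listening socket O_NONBLOCK so accept() returns EAGAIN instead of blocking. It
is an illustration under that assumption, not the change that went into
ypserv; `make_nonblocking_listener` and `try_accept` are hypothetical names.

```c
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Create a TCP listening socket on 127.0.0.1 with a kernel-chosen port and
 * mark it non-blocking. Returns the descriptor, or -1 on failure. */
int make_nonblocking_listener(void)
{
    struct sockaddr_in addr;
    int flags;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                 /* let the kernel pick a free port */

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, 8) < 0) {
        close(fd);
        return -1;
    }

    flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Try to accept one connection without blocking. Returns the new descriptor,
 * or -1 on failure. *would_block is set to 1 when there was simply nothing
 * to accept, so the caller can go back to poll() instead of hanging. */
int try_accept(int listen_fd, int *would_block)
{
    int fd = accept(listen_fd, NULL, NULL);

    *would_block = (fd < 0 && (errno == EAGAIN || errno == EWOULDBLOCK));
    return fd;
}
```

With a listener set up this way, a service loop that sees would_block set can
return to poll() and keep serving other descriptors rather than becoming
unresponsive to requests such as `makedbm -c`.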

Comment 44 Frank Hirtz 2006-11-01 21:14:19 UTC
Created attachment 140039
ypserv core from unresponsive process

Comment 45 Chris Feist 2006-11-02 17:11:31 UTC
Was there only one process when it hung? or multiple?

Comment 46 Chris Feist 2006-11-02 17:23:17 UTC
Can you also send me the rpm versions of glibc that they have installed?

Comment 47 Frank Hirtz 2006-11-02 17:37:21 UTC
paivm1 /var/tmp/cores 18# rpm -q glibc
glibc-2.3.2-95.30

Comment 48 Frank Hirtz 2006-11-02 19:52:19 UTC
In response to "Was there only one process when it hung? or multiple?":

Usually just one process. In one case it was two, but both coredumps had
the same backtrace. 

Comment 51 Daniel Riek 2006-11-22 20:50:52 UTC
Ypserv has been added to the list of planned components for 4.5 as an exception.
So proposing this request as well.

Comment 53 Chris Feist 2007-01-24 22:51:48 UTC
I've had a chance to look through the rpm again, and made a slight change to the
way we poll in our svc_run function.  Can you try this rpm:

http://people.redhat.com/cfeist/ypserv/ypserv-2.8-21.mstest.6.i386.rpm

And if it core dumps can you send me the core file, as well as the back trace
from your machine.

Can you also send me the output of the following command:
rpm -q --queryformat '%{name}-%{version}-%{release}.%{arch}\n' glibc \
    glibc-common glibc-debug glibc-devel glibc-headers nptl nptl-devel nscd




Comment 57 Jay Turner 2007-01-30 15:53:35 UTC
QE ack for RHEL4.5.

Comment 58 Kiersten (Kerri) Anderson 2007-01-30 15:54:43 UTC
Devel ACK

Comment 63 RHEL Program Management 2007-03-10 01:03:16 UTC
This bugzilla had previously been approved for engineering
consideration but Red Hat Product Management is currently reevaluating
this issue for inclusion in RHEL4.6.

Comment 64 Frank Hirtz 2007-05-19 04:50:18 UTC
Created attachment 155041
ypserv mstest patch ported to 2.13

This is a fairly straightforward port of the 2.8 mstest patch to the ypserv 2.13
in RHEL 4.5. Please take a look to confirm that it's sane before using it for
anything.

Comment 77 RHEL Program Management 2007-11-29 04:25:33 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 78 RHEL Program Management 2007-12-13 06:32:57 UTC
Product Management has reviewed and declined this request.  You may appeal this
decision by reopening this request. 

Comment 79 Marco Bill-Peter 2007-12-13 12:40:07 UTC
Reopening this bugzilla

Comment 82 Chris Feist 2008-04-15 19:28:30 UTC
This fix is in ypserv-2.13-19.  Frank, is it possible for you to verify that it
works with your customer?  (This fix should also make it out for 4.7.)

Comment 86 errata-xmlrpc 2008-07-24 20:00:59 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2008-0747.html

