One of our machines complained about a broken NIS configuration, so I
had a look. ypbind was not running, and 'ypwhich' reported
'ypwhich: Can't communicate with ypbind'.
Checking /var/log/messages, I found this message:
Nov 22 14:25:53 ankaa kernel: ypbind: segfault at 0000000000000008
rip 000000000040627f rsp 0000007fbffff390 error 4
I have ypbind version 1.17.2-3 installed on the machine, and uname reports
"Linux ankaa.uio.no 2.6.9-22.ELsmp #1 SMP Mon Sep 19 18:00:54 EDT 2005 x86_64
x86_64 x86_64 GNU/Linux".
The machine has SELinux enabled with type set to 'targeted'. I'm not sure
if it is relevant.
I do not know how to reproduce this problem, nor how often it occurs, but
found it best to report it, as ypbind is a network server.
I'll keep my eyes open for further occurrences.
I can provide syslog messages from around that period, if relevant.
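For anyone triaging similar log lines, here is a minimal sketch of how the kernel segfault message quoted above can be decoded. It assumes the standard x86 page-fault error code layout (bit 0: protection violation rather than page not present, bit 1: write rather than read, bit 2: user mode). The function name decode_segfault, the regular expression, and the 4096-byte "near NULL" threshold are illustrative choices, not part of ypbind or the kernel. Reading the error field as hexadecimal is also an assumption; for the value 4 it makes no difference.

```python
import re

# The log line from the report, joined onto a single line.
LINE = ("Nov 22 14:25:53 ankaa kernel: ypbind: segfault at 0000000000000008 "
        "rip 000000000040627f rsp 0000007fbffff390 error 4")

# Hypothetical pattern matching the message format seen in this report only.
PATTERN = re.compile(
    r"(?P<prog>[^\s:]+): segfault at (?P<addr>[0-9a-f]+) "
    r"rip (?P<rip>[0-9a-f]+) rsp (?P<rsp>[0-9a-f]+) error (?P<err>[0-9a-f]+)"
)


def decode_segfault(line):
    """Return the decoded fields of a segfault line, or None if it does not match."""
    m = PATTERN.search(line)
    if m is None:
        return None
    err = int(m.group("err"), 16)   # assumption: error code is printed in hex
    addr = int(m.group("addr"), 16)
    return {
        "program": m.group("prog"),
        "address": addr,
        "rip": int(m.group("rip"), 16),
        "rsp": int(m.group("rsp"), 16),
        "page_present": bool(err & 1),  # 0 means the page was not mapped
        "write": bool(err & 2),         # 0 means the access was a read
        "user_mode": bool(err & 4),     # 1 means the fault came from user space
        # Heuristic only: a faulting address inside the first page often
        # indicates a field access through a NULL pointer.
        "near_null": addr < 4096,
    }


if __name__ == "__main__":
    for key, value in decode_segfault(LINE).items():
        print(key, value)
```

Read this way, "error 4" at address 8 is a user-mode read of an unmapped page very close to address zero, which would be consistent with (but does not prove) a NULL pointer dereference at a small structure offset inside ypbind.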
We have a 300-node computing cluster, and starting 5 days ago, for no apparent
reason, ypbind began dying on the nodes at random, at a rate of a couple of
nodes an hour. We have a mix of 64-bit and 32-bit nodes. The segfault message
only ends up in the log on the 64-bit nodes. I imagine this is some kind
of debug option present only in the x86_64 kernel, because the failures
occur equally randomly on both 64-bit and 32-bit nodes. It seems that something
recently added to our maps is tickling a bug, pushing some memory allocation
over the edge.
Can you post the output of 'rpm -q ypbind' on your x86 & x86_64 nodes?
Would it also be possible to provide your maps so I can more easily replicate
the issue on our systems?
Closing due to inactivity.