Bug 404451 - minor updates/fixes for RHEL5.2
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: cman
Version: 5.1
Hardware: All
OS: Linux
Priority: low
Severity: low
Assigned To: David Teigland
QA Contact: GFS Bugs
Reported: 2007-11-29 09:40 EST by David Teigland
Modified: 2009-04-16 19:03 EDT
CC: 2 users

Fixed In Version: RHBA-2008-0347
Doc Type: Bug Fix
Last Closed: 2008-05-21 11:58:30 EDT


Attachments
Proposed fix (444 bytes, patch)
2008-03-27 17:09 EDT, Lon Hohberger
Proposed fix (886 bytes, patch)
2008-03-27 17:23 EDT, Lon Hohberger
Add checks for POLLNVAL wherever we check for POLLHUP (1.66 KB, patch)
2008-03-27 17:25 EDT, Lon Hohberger

Description David Teigland 2007-11-29 09:40:44 EST
Description of problem:

This bug tracks minor updates and fixes to the cman package for RHEL 5.2.

Comment 1 David Teigland 2007-11-29 09:48:26 EST
Module name:    cluster
Branch:         RHEL5
Changes by:     teigland@sourceware.org 2007-11-29 14:46:41

Modified files:
        fence/fence_tool: fence_tool.c

Log message:
        [sync from HEAD]

        clean out some options that were only relevant to rhel4
        remove the monitor option which didn't do anything
        add the dump option to dump the fenced debug buffer
        (group_tool can still do this, but oddly enough fence_tool couldn't)
Comment 2 RHEL Product and Program Management 2007-11-29 10:14:38 EST
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.
Comment 3 Christine Caulfield 2008-01-04 04:11:08 EST
Module name:	cluster
Branch: 	RHEL5
Changes by:	pcaulfield@sourceware.org	2008-01-02 10:02:44

Modified files:
	cman/daemon    : commands.c 

Log message:
	totempg_ifaces_get() always copies INTERFACE_MAX addresses
	so make sure we allocate enough space for them all.

--- cluster/cman/daemon/commands.c	2007/11/26 17:02:59	1.55.2.15
+++ cluster/cman/daemon/commands.c	2008/01/02 10:02:44	1.55.2.16
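The log message describes a common buffer-sizing bug: the callee always writes a fixed number of entries, so the caller must size its buffer for that maximum, not for the number of interfaces actually in use. The exact signature of the openais totempg_ifaces_get() call is not shown here, so this minimal sketch uses a hypothetical stand-in (sketch_ifaces_get(), sketch_ip_addr and the INTERFACE_MAX value of 2 are all assumptions) to illustrate the allocation rule.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins: the real INTERFACE_MAX and address type come
 * from the openais totem headers and may have other values and layout. */
#define INTERFACE_MAX 2

struct sketch_ip_addr {
    unsigned char addr[16];
};

/* Hypothetical stand-in for a totempg_ifaces_get()-style call. Like the
 * function described in the log message, it always writes INTERFACE_MAX
 * entries and only reports through *count how many are really in use. */
static void sketch_ifaces_get(struct sketch_ip_addr *out, unsigned int *count)
{
    for (int i = 0; i < INTERFACE_MAX; i++)
        memset(out[i].addr, i + 1, sizeof(out[i].addr));
    *count = 1;
}

/* Allocate room for every entry the callee may copy. Sizing the buffer
 * by the number of interfaces in use (one here) would let
 * sketch_ifaces_get() write past the end of the allocation. */
struct sketch_ip_addr *get_iface_addrs(unsigned int *count)
{
    struct sketch_ip_addr *addrs = calloc(INTERFACE_MAX, sizeof(*addrs));

    if (addrs == NULL)
        return NULL;
    sketch_ifaces_get(addrs, count);
    assert(*count <= INTERFACE_MAX);
    return addrs;
}
```

The caller frees the returned buffer and uses only the first *count entries; the remaining slots exist purely so the callee's fixed-size copy stays in bounds.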
Comment 4 Christine Caulfield 2008-01-04 04:12:14 EST
Module name:	cluster
Branch: 	RHEL5
Changes by:	pcaulfield@sourceware.org	2008-01-03 16:36:52

Modified files:
	cman/daemon    : commands.c 

Log message:
	Get rid of redundant totemip_parse() call. This was in a bad place and 
        could cause aisexec stalls and disallowed nodes, particularly at startup.

--- cluster/cman/daemon/commands.c	2008/01/02 10:02:44	1.55.2.16
+++ cluster/cman/daemon/commands.c	2008/01/03 16:36:51	1.55.2.17
Comment 6 David Teigland 2008-01-14 15:55:51 EST
Module name:    cluster
Branch:         RHEL5
Changes by:     teigland@sourceware.org 2008-01-14 20:54:30

Modified files:
        group/daemon   : app.c cpg.c joinleave.c

Log message:
        fix %llx printf warnings using (unsigned long long)
        bz 404451
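A short illustration of the kind of warning this commit fixes, assuming the printed values were 64-bit integers such as uint64_t (the actual groupd types are not shown in this bug). uint64_t is `unsigned long` on 64-bit Linux but `unsigned long long` on 32-bit, so passing it directly to %llx triggers -Wformat warnings on one of the two; the cast makes the argument match %llx everywhere. format_id() is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: print a 64-bit id with %llx. The explicit cast
 * makes the argument type match the format on both 32-bit and 64-bit
 * builds, which is what silences the -Wformat warning. */
int format_id(char *buf, size_t len, uint64_t id)
{
    int n = snprintf(buf, len, "id %llx", (unsigned long long)id);

    assert(n >= 0);
    return n;
}
```

The other standard option is the PRIx64 macro from <inttypes.h> (for example `"id %" PRIx64`), which avoids the cast; the commit took the cast approach.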
Comment 8 Lon Hohberger 2008-03-27 16:31:55 EDT
group_tool dump (as noted in comment #1) works, but it causes groupd to enter a
tight loop.  Since groupd is at RT priority, this causes the machine to become
unusable:

top - 16:31:44 up 6 min,  2 users,  load average: 5.61, 2.97, 1.21
Tasks:  61 total,   6 running,  55 sleeping,   0 stopped,   0 zombie
Cpu(s): 21.1%us, 78.3%sy,  0.0%ni,  0.3%id,  0.3%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:    262144k total,   235344k used,    26800k free,    18396k buffers
Swap:   557048k total,        0k used,   557048k free,    84724k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND            
 1625 root      RT   0  7512  788  492 R 99.7  0.3   3:18.10 groupd   

This was observed with cman-2.0.81 on a 2-node+qdisk cluster (with no other
nodes online except qdiskd).


Comment 9 Lon Hohberger 2008-03-27 17:09:21 EDT
(gdb) bt
#0  0x00000035e94c585f in poll () from /lib64/libc.so.6
#1  0x000000000040dfb3 in loop () at main.c:760
#2  0x000000000040e862 in main (argc=1, argv=0x7ffff56844a8) at main.c:988
#3  0x00000035e941d8a4 in __libc_start_main () from /lib64/libc.so.6
#4  0x0000000000401879 in _start ()

* client_maxi = 7
* timeout = -1 (wait forever)
* rv (the return value of poll()) is always 1
* the poll fd list looks like this:
  (gdb) p pollfd[0]
  $38 = {fd = 1, events = 1, revents = 0}
  (gdb) p pollfd[1]
  $39 = {fd = 2, events = 1, revents = 0}
  (gdb) p pollfd[2]
  $40 = {fd = 7, events = 1, revents = 0}
  (gdb) p pollfd[3]
  $41 = {fd = 8, events = 1, revents = 0}
  (gdb) p pollfd[4]
  $42 = {fd = 9, events = 1, revents = 0}
  (gdb) p pollfd[5]
  $43 = {fd = 10, events = 1, revents = 0}
  (gdb) p pollfd[6]
  $44 = {fd = 12, events = 1, revents = 0}
  (gdb) p pollfd[7]
  $45 = {fd = 13, events = 1, revents = 32}
  (gdb) p/x pollfd[7] 
  $46 = {fd = 0xd, events = 0x1, revents = 0x20}

  revents is 0 for every entry except fd 13, where it is 0x20 (POLLNVAL).
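The debugger state above can be recreated in isolation. This minimal sketch, assuming Linux poll() semantics, polls a pollfd entry whose descriptor has already been closed: poll() does not wait even with timeout -1, it returns 1 immediately with POLLNVAL set, although only POLLIN was requested. A loop that never clears such an entry spins on it, which matches the 99% CPU seen in top. poll_closed_fd() is a hypothetical helper, not groupd code.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

/* Poll a stale entry (descriptor already closed) and return its revents.
 * *rv_out receives poll()'s return value. Returns -1 if pipe() fails. */
short poll_closed_fd(int *rv_out)
{
    int p[2];
    struct pollfd pfd;

    assert(rv_out != NULL);
    if (pipe(p) < 0)
        return -1;
    close(p[0]);
    close(p[1]);

    pfd.fd = p[0];          /* stale entry: this descriptor is now closed */
    pfd.events = POLLIN;    /* only readability is requested */
    pfd.revents = 0;

    *rv_out = poll(&pfd, 1, -1);  /* timeout -1, yet it returns at once */
    return pfd.revents;           /* POLLNVAL (0x20 on Linux) */
}
```

Because POLLNVAL is reported whether or not it is in `events`, a loop that only tests POLLIN and POLLHUP sees poll() return 1 on every iteration and never removes the entry.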

[root@frederick ~]# ls -l /proc/2053/fd
total 0
l-wx------ 1 root root 64 Mar 27 17:04 0 -> /var/run/groupd.pid
lrwx------ 1 root root 64 Mar 27 17:04 1 -> socket:[10956]
lrwx------ 1 root root 64 Mar 27 17:04 10 -> socket:[11297]
lrwx------ 1 root root 64 Mar 27 17:04 11 -> socket:[11327]
lrwx------ 1 root root 64 Mar 27 17:04 12 -> socket:[11329]
lrwx------ 1 root root 64 Mar 27 17:04 2 -> socket:[10957]
lr-x------ 1 root root 64 Mar 27 17:04 3 -> /dev/zero
lrwx------ 1 root root 64 Mar 27 17:04 4 -> socket:[10958]
lr-x------ 1 root root 64 Mar 27 17:04 5 -> /dev/zero
lrwx------ 1 root root 64 Mar 27 17:04 6 -> socket:[11026]
lrwx------ 1 root root 64 Mar 27 17:04 7 -> socket:[11030]
lrwx------ 1 root root 64 Mar 27 17:04 8 -> socket:[11035]
lrwx------ 1 root root 64 Mar 27 17:04 9 -> socket:[11274]

File descriptor 13 is invalid: it is not open in the process (it is missing
from the /proc/<pid>/fd listing). It looks like this entry should have been
cleaned up by the client_dead() function.

Comment 10 Lon Hohberger 2008-03-27 17:09:42 EDT
Created attachment 299390 [details]
Proposed fix
Comment 11 Lon Hohberger 2008-03-27 17:13:40 EDT
The fix in comment #10 stopped the infinite loop on my cluster.
Comment 12 Lon Hohberger 2008-03-27 17:13:59 EDT
(I tested the patch against 2.0.81)
Comment 13 Lon Hohberger 2008-03-27 17:17:37 EDT
Hmm, I just realized this bug was about fence_tool dump; sorry about that.
Comment 14 Lon Hohberger 2008-03-27 17:22:37 EDT
No matter; the same thing happens with fence_tool dump:

top - 17:20:42 up 55 min,  2 users,  load average: 19.87, 10.06, 6.41
Tasks:  59 total,  19 running,  37 sleeping,   0 stopped,   3 zombie
Cpu(s): 22.7%us, 77.3%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:    262144k total,   245064k used,    17080k free,    35576k buffers
Swap:   557048k total,        0k used,   557048k free,    72956k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND            
 3485 root      RT   0  4900  624  448 R 99.9  0.2   2:45.70 fenced   

The same patch (applied to fenced) fixes it; attaching.
Comment 15 Lon Hohberger 2008-03-27 17:23:10 EDT
Created attachment 299391 [details]
Proposed fix
Comment 16 Lon Hohberger 2008-03-27 17:25:49 EDT
Created attachment 299392 [details]
Add checks for POLLNVAL wherever we check for POLLHUP

Note that gnbd_monitor also checks for POLLERR; I'm not sure whether we need to
do that in groupd, fenced, dlm_controld, or gfs_controld.
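The attached patches are not reproduced here, so the following is only a hypothetical sketch of the approach named in the attachment title: treat POLLNVAL exactly like POLLHUP, and have client_dead() retire the slot by setting its fd to -1, which poll() ignores. MAX_CLIENTS, slots_init() and poll_once() are invented for illustration; pollfd and client_dead() merely borrow names that appear in this bug, and the real functions may differ. POLLERR is deliberately left out, since comment 16 leaves that question open.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

#define MAX_CLIENTS 8   /* hypothetical; the daemon tracks client_maxi */

static struct pollfd pollfd[MAX_CLIENTS];

/* Mark every slot unused; poll() skips entries whose fd is negative. */
void slots_init(void)
{
    for (int i = 0; i < MAX_CLIENTS; i++) {
        pollfd[i].fd = -1;
        pollfd[i].events = 0;
        pollfd[i].revents = 0;
    }
}

/* Hypothetical client_dead(): retire a slot so it can no longer wake
 * the loop. With POLLNVAL the descriptor is already closed, so only
 * close it in the other (POLLHUP) case. */
static void client_dead(int ci)
{
    if (!(pollfd[ci].revents & POLLNVAL))
        close(pollfd[ci].fd);
    pollfd[ci].fd = -1;
    pollfd[ci].events = 0;
    pollfd[ci].revents = 0;
}

/* One pass of a poll loop over slots 0..maxi, handling POLLNVAL
 * wherever POLLHUP is handled. Returns the number of slots retired. */
int poll_once(int maxi, int timeout)
{
    int dead = 0;
    int rv;

    assert(maxi >= 0 && maxi < MAX_CLIENTS);
    rv = poll(pollfd, maxi + 1, timeout);
    if (rv <= 0)
        return 0;   /* timeout or error (e.g. EINTR): nothing to do */

    for (int i = 0; i <= maxi; i++) {
        if (pollfd[i].fd < 0)
            continue;
        if (pollfd[i].revents & (POLLHUP | POLLNVAL)) {
            client_dead(i);
            dead++;
        } else if (pollfd[i].revents & POLLIN) {
            /* a real daemon would read and dispatch the client message here */
        }
    }
    return dead;
}
```

With a stale entry in slot 0, the first poll_once(0, 0) retires it and returns 1; the next call returns 0 instead of reporting the same invalid descriptor forever.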
Comment 19 errata-xmlrpc 2008-05-21 11:58:30 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2008-0347.html
