Bug 480401
Summary: gfs_controld segfault during multiple mount attempt

| Field | Value | Field | Value |
|---|---|---|---|
| Product | Red Hat Enterprise Linux 5 | Reporter | Corey Marthaler <cmarthal> |
| Component | cman | Assignee | David Teigland <teigland> |
| Status | CLOSED ERRATA | QA Contact | Cluster QE <mspqa-list> |
| Severity | medium | Priority | low |
| Version | 5.3 | Target Milestone | rc |
| Hardware | All | OS | Linux |
| Fixed In Version | cman-2.0.100-1.el5 | Doc Type | Bug Fix |
| Last Closed | 2009-09-02 11:10:54 UTC | CC | cfeist, cluster-maint, edamato, swhiteho, teigland |

Doc Text:

- Cause: mounting around 100 GFS file systems in parallel.
- Consequence: some mount processes never complete.
- Fix: initialize a variable in gfs_controld so that the connection with mount.gfs is not broken.
- Result: mount.gfs receives correct data from gfs_controld, which prevents the mount error and hang.
Description
Corey Marthaler
2009-01-16 20:48:45 UTC
Created attachment 329254 [details]
gfs_controld corefile from grant-02
Adding Dave Teigland to the cc list. I don't have time to look at this one until next week. I'm not sure how to make that core file useful; perhaps install debuginfo on grant-02 and run gdb on it there? It looks like there are other bad things happening before the segfault, too; mount_client_dead indicates mount.gfs is going away unexpectedly. See also comment #1 of bz #475312, which might be connected.

```
Core was generated by `/sbin/gfs_controld'.
Program terminated with signal 11, Segmentation fault.
[New process 9289]
#0  0x000000000041038b in got_mount_result (mg=0xab08da0, result=-1, ci=207,
    another=0) at recover.c:2172
2172            log_group(mg, "got_mount_result: ci %d result %d another %d "
(gdb) bt
#0  0x000000000041038b in got_mount_result (mg=0xab08da0, result=-1, ci=207,
    another=0) at recover.c:2172
#1  0x0000000000403d0c in loop () at main.c:412
#2  0x0000000000404811 in main (argc=<value optimized out>,
    argv=<value optimized out>) at main.c:983
(gdb) p *mg
$1 = {list = {next = 0xab065b0, prev = 0x81ffe0}, id = 0, associated_ls_id = 0,
  members = {next = 0xab08db8, prev = 0xab08db8},
  members_gone = {next = 0xab08dc8, prev = 0xab08dc8}, memb_count = 0,
  resources = {next = 0xab08de0, prev = 0xab08de0},
  mountpoints = {next = 0xab0a570, prev = 0xab0a570},
  name = "21", '\0' <repeats 253 times>,
  table = "GRANT:21", '\0' <repeats 247 times>, type = "gfs\000",
  options = "rw", '\0' <repeats 1022 times>,
  dev = "/dev/mapper/vg2-lv1", '\0' <repeats 4077 times>,
  last_stop = 0, last_start = 0, last_finish = 0, last_callback = 0,
  start_event_nr = 0, start_type = 0, error_msg = '\0' <repeats 127 times>,
  mount_client = 207, mount_client_fd = 213, mount_client_notified = 0,
  mount_client_delay = 0, group_leave_on_finish = 0, remount_client = 0,
  init = 1, got_our_options = 0, got_our_journals = 0, delay_send_journals = 0,
  kernel_mount_error = 0, kernel_mount_done = 0, got_kernel_mount = 0,
  first_mount_pending_stop = 0, first_mounter = 0, first_mounter_done = 0,
  global_first_recover_done = 0, emulate_first_mounter = 0,
  wait_first_done = 0, low_nodeid = -1, master_nodeid = -1, save_plocks = 0,
  reject_mounts = 0, cp_handle = 0, last_checkpoint_time = 0,
  last_plock_time = 0, drop_resources_last = {tv_sec = 0, tv_usec = 0},
  needs_recovery = 0, our_jid = 0, spectator = 0, readonly = 0, rw = 1,
  withdraw = 0, dmsetup_wait = 0, dmsetup_pid = 0,
  saved_messages = {next = 0xab0a548, prev = 0xab0a548}, start2_fn = 0x0}
(gdb) p memb
$2 = <value optimized out>
(gdb) p *memb
Cannot access memory at address 0x0
```

<dct_home> corey, so when you say mounting a "bunch of filesystems" you mean how many hundred at once?
<corey> dct_home: "how many hundred at once" you know me too well ;)
<corey> dct_home: 560
<dct_home> ok, thanks, just wanted that to put your bug into perspective for anyone who was actually worried about it ;-)

Created attachment 329366 [details]
gfs_controld debug log
extracted from the core file
FWIW, I was able to reproduce this bz with "just" 100 GFS file systems. I tried fewer than that but ended up hitting cluster deadlocks each time; I'll file a BZ for that now...

The immediate cause of the segfault isn't hard to fix, but fixing it alone isn't terribly helpful because other problems have already occurred by the time it happens. I'm going to approach this from the other direction: find a number that works and increase from there. The hope is that we'll hit the various bugs incrementally instead of trying to diagnose and fix a whole bunch all jumbled together.

My current testing is on a two-node xen cluster, where one node does:

```
for i in `seq 1 32`; do mount -v /dev/dctxen/x$i /gfs$i & done
```

This will often work, but sometimes triggers an error where one mount process gets stuck:

```
2519 pts/0    S      0:00 mount -v /dev/dctxen/x13 /gfs13
2551 pts/0    S      0:00 /sbin/mount.gfs /dev/dctxen/x13 /gfs13 -v -o rw
```

mount.gfs is getting an empty response back from gfs_controld (prefixing all the mount.gfs output with the fs name or mount point would help in deciphering the mixed-together output).
```
/sbin/mount.gfs: message to gfs_controld: asking to join mountgroup:
/sbin/mount.gfs: write "join /gfs13 gfs lock_dlm dctxen:13 rw /dev/mapper/dctxen-x13"
/sbin/mount.gfs: message from gfs_controld: response to join request:
/sbin/mount.gfs: lock_dlm_join: read "0"
/sbin/mount.gfs: message from gfs_controld: mount options:
/sbin/mount.gfs: lock_dlm_join: read ""
/sbin/mount.gfs: lock_dlm_join: hostdata: ""
/sbin/mount.gfs: lock_dlm_join: extra_plus: ""
/sbin/mount.gfs: parse_opts: opts = "rw"
/sbin/mount.gfs: clear flag 1 for "rw", flags = 0
/sbin/mount.gfs: parse_opts: flags = 0
/sbin/mount.gfs: parse_opts: extra = ""
/sbin/mount.gfs: parse_opts: hostdata = ""
/sbin/mount.gfs: parse_opts: lockproto = ""
/sbin/mount.gfs: parse_opts: locktable = ""
```

Instead of "", we *should* have gotten:

```
lock_dlm_join: read "hostdata=jid=0:id=917505:first=1"
```

The gfs-kernel error confirms that no proper option string was passed:

```
Jan 21 16:29:12 dct-xen-01 kernel: GFS: fsid=dctxen:13.4294967295: can't mount journal #4294967295
Jan 21 16:29:12 dct-xen-01 kernel: GFS: fsid=dctxen:13.4294967295: there are only 2 journals (0 - 1)
```

The gfs_controld debug log shows that the connection between mount.gfs and gfs_controld was closed prematurely:
```
1232576938 client 18: join /gfs13 gfs lock_dlm dctxen:13 rw /dev/mapper/dctxen-x13
1232576938 mount: /gfs13 gfs lock_dlm dctxen:13 rw /dev/mapper/dctxen-x13
1232576938 13 cluster name matches: dctxen
1232576938 13 do_mount: rv 0
1232576938 client 18 fd 25 dead
1232576938 13 get open /sys/fs/gfs/dctxen:13/lock_module/id error -1 2
1232576938 mount_client_dead ci 18 no sysfs entry for fs
1232576938 groupd cb: set_id 13 e0001
1232576938 groupd cb: start 13 type 2 count 1 members 1
1232576938 13 start 14 init 1 type 2 member_count 1
1232576938 13 add member 1
1232576938 13 total members 1 master_nodeid -1 prev -1
1232576938 13 start_first_mounter
1232576938 13 start_done 14
1232576938 notify_mount_client: nodir not found for lockspace 13
1232576938 notify_mount_client: ccs_disconnect
1232576938 notify_mount_client: hostdata=jid=0:id=917505:first=1
1232576938 write errno 9
1232576938 notify_mount_client: send failed -1
1232576938 groupd cb: finish 13
```

gfs_controld is calling client_dead() on the connection from mount.gfs even though mount.gfs hasn't exited. There are two places where that could be happening; finding which one is the next step.

I'm having a lot of difficulty reproducing this consistently. In a way that's consistent with the uninitialized memory I suspect is the bug, but I haven't been able to confirm it. For a while, I was seeing poll revents of 0xFFFF cause the connections to be closed, which makes me think the following may be the bug. I guess it's a good patch regardless, so I'll probably commit it to the RHEL5 branch so we can get a new build for you to try.
```diff
diff --git a/group/gfs_controld/main.c b/group/gfs_controld/main.c
index b2edc32..7293938 100644
--- a/group/gfs_controld/main.c
+++ b/group/gfs_controld/main.c
@@ -208,6 +208,7 @@ static int client_add(int fd)
 		for (i = client_size - MAX_CLIENTS; i < client_size; i++) {
 			client[i].fd = -1;
 			pollfd[i].fd = -1;
+			pollfd[i].revents = 0;
 		}
 	}
 }
```

Committed in the RHEL5 branch as e4c7af4521fdacd4663ccf44d21208973f891ede. Let's assume for now that that patch fixed it.

Release note added. If any revisions are required, please set the "requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:

- Cause: mounting around 100 GFS file systems in parallel.
- Consequence: some mount processes never complete.
- Fix: initialize a variable in gfs_controld so that the connection with mount.gfs is not broken.
- Result: mount.gfs receives correct data from gfs_controld, which prevents the mount error and hang.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1341.html