Bug 501561
Summary: | gfs_controld segfault during simultaneous gfs mounts | ||||||||
---|---|---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux 5 | Reporter: | Corey Marthaler <cmarthal> | ||||||
Component: | openais | Assignee: | Steven Dake <sdake> | ||||||
Status: | CLOSED ERRATA | QA Contact: | Cluster QE <mspqa-list> | ||||||
Severity: | high | Docs Contact: | |||||||
Priority: | urgent | ||||||||
Version: | 5.4 | CC: | cluster-maint, edamato, jplans, rpeterso, sdake, swhiteho | ||||||
Target Milestone: | rc | Keywords: | ZStream | ||||||
Target Release: | --- | ||||||||
Hardware: | All | ||||||||
OS: | Linux | ||||||||
Whiteboard: | |||||||||
Fixed In Version: | openais-0.80.6-2.e5_4 | Doc Type: | Bug Fix | ||||||
Doc Text: | Story Points: | --- | |||||||
Clone Of: | |||||||||
: | 502044 | Environment: | |||||||
Last Closed: | 2009-09-02 11:30:06 UTC | Type: | --- | ||||||
Bug Depends On: | |||||||||
Bug Blocks: | 502044, 502940 | ||||||||
Description
Corey Marthaler
2009-05-19 18:28:56 UTC
Reassigning to Dave Teigland; gfs_controld is his area of expertise.

Reproduced and got a core.

Created attachment 344712
core file from hayes-01

From hayes-01, a mount that works:

    1242770170 client 6: join /mnt/hayes0 gfs lock_dlm HAYES:0 rw /dev/mapper/HAYES-HAYES0
    1242770170 mount: /mnt/hayes0 gfs lock_dlm HAYES:0 rw /dev/mapper/HAYES-HAYES0
    1242770170 0 cluster name matches: HAYES
    1242770170 0 do_mount: rv 0
    1242770170 groupd cb: set_id 0 30001
    1242770170 groupd cb: start 0 type 2 count 1 members 1
    1242770170 0 start 11 init 1 type 2 member_count 1
    1242770170 0 add member 1
    1242770170 0 total members 1 master_nodeid -1 prev -1
    1242770170 0 start_first_mounter
    1242770170 0 start_done 11
    1242770170 notify_mount_client: nodir not found for lockspace 0
    1242770170 notify_mount_client: ccs_disconnect
    1242770170 notify_mount_client: hostdata=jid=0:id=196609:first=1
    1242770170 groupd cb: finish 0
    1242770170 0 finish 11 needs_recovery 0
    1242770170 0 set /sys/fs/gfs/HAYES:0/lock_module/block to 0

Then a mount (in parallel with other nodes) that doesn't:

    1242770261 client 6: join /mnt/hayes0 gfs lock_dlm HAYES:0 rw /dev/mapper/HAYES-HAYES0
    1242770261 mount: /mnt/hayes0 gfs lock_dlm HAYES:0 rw /dev/mapper/HAYES-HAYES0
    1242770261 0 cluster name matches: HAYES
    1242770261 0 do_mount: rv 0
    1242770261 groupd cb: stop 0
    1242770261 0 set /sys/fs/gfs/HAYES:0/lock_module/block to 1
    1242770261 0 set open /sys/fs/gfs/HAYES:0/lock_module/block error -1 2
    1242770261 0 do_stop causes mount_client_delay
    1242770261 groupd cb: set_id 0 0
    1242770261 replace zero id for 0 with 4108050209
    1242770261 groupd cb: start 0 type 2 count 1 members 2
    1242770261 0 start 23 init 1 type 2 member_count 1
    1242770261 0 add member 2
    1242770261 0 total members 1 master_nodeid -1 prev -1
    1242770261 0 start_first_mounter
    1242770261 Assertion failed on line 682 of file recover.c
    Assertion: "memb"

Seems to be bad callbacks/data from groupd. Maybe related to bug 480709?

openais regression.

Created attachment 344852
debug info
Here are logs from groupd, dlm_controld, gfs_controld, group_tool, /var/log/messages, along with some analysis. groupd seems to be getting some strange cpg confchg data, but I can't say yet if there's one consistent problem with them.
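When going through gfs_controld debug output like the log quoted earlier in this report, it can help to pull out the `groupd cb: start` callbacks and check whether the local node appears in each member list. The following is only a rough Python sketch of that check. The line pattern is inferred from the quoted log lines, the function names are hypothetical, and the local node ID of 1 for hayes-01 is an assumption (the report does not state it). It also rests on the unconfirmed reading that the `memb` assertion fires when the local node is missing from the start callback.

```python
import re

# Pattern inferred from the quoted log lines (an assumption, not a documented format):
#   "<timestamp> groupd cb: start <group> type <t> count <n> members <id> [<id> ...]"
START_RE = re.compile(
    r"^(\d+) groupd cb: start (\S+) type (\d+) count (\d+) members ([\d ]+)$"
)


def parse_start_callbacks(lines):
    """Yield (timestamp, group, count, member_ids) for each start callback line."""
    for line in lines:
        m = START_RE.match(line.strip())
        if m:
            ts, group, _type, count, members = m.groups()
            yield int(ts), group, int(count), [int(x) for x in members.split()]


def starts_missing_local_node(lines, local_nodeid):
    """Return the start callbacks whose member list does not contain local_nodeid."""
    return [cb for cb in parse_start_callbacks(lines) if local_nodeid not in cb[3]]


if __name__ == "__main__":
    sample = [
        "1242770170 groupd cb: start 0 type 2 count 1 members 1",
        "1242770261 groupd cb: start 0 type 2 count 1 members 2",
    ]
    # local_nodeid=1 is assumed for hayes-01; adjust to the real node ID.
    for cb in starts_missing_local_node(sample, local_nodeid=1):
        print(cb)
```

Under those assumptions, only the second start callback (timestamp 1242770261, members 2) is flagged, which is the one that precedes the assertion in the failing mount.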
Fix verified in openais-0.80.6-2.el5 / cman-2.0.103-1.el5.

*** Bug 499734 has been marked as a duplicate of this bug. ***

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-1366.html