Bug 444909 - aisexec died when another node left cluster
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: openais
Version: 5.2
Hardware: All
OS: Linux
Priority: urgent
Severity: low
Target Milestone: rc
Target Release: ---
Assigned To: Steven Dake
Keywords: ZStream
Depends On:
Blocks: 509889
Reported: 2008-05-01 14:26 EDT by Corey Marthaler
Modified: 2016-04-26 11:38 EDT
Doc Type: Bug Fix
Last Closed: 2009-01-20 15:46:52 EST


Attachments
core file (1.58 MB, application/x-gzip)
2008-05-01 14:48 EDT, Corey Marthaler
Description Corey Marthaler 2008-05-01 14:26:24 EDT
Description of problem:
I had 3 nodes doing mount/umount operations on GFS filesystems when I rebooted
one of the nodes (hayes-03). Instead of recovery taking place as I had
expected, the remaining 2 nodes each just left the cluster, leaving the umount
commands hung.

I'll attach the core left behind on hayes-01.

May  1 11:16:41 hayes-01 openais[4171]: [TOTEM] Did not need to originate any
messages in recovery.
May  1 11:16:41 hayes-01 openais[4171]: [TOTEM] Sending initial ORF token
May  1 11:16:41 hayes-01 openais[4171]: [CLM  ] CLM CONFIGURATION CHANGE
May  1 11:16:41 hayes-01 openais[4171]: [CLM  ] New Configuration:
May  1 11:16:41 hayes-01 openais[4171]: [CLM  ]         r(0) ip(10.15.89.135)
May  1 11:16:41 hayes-01 openais[4171]: [CLM  ]         r(0) ip(10.15.89.136)
May  1 11:16:41 hayes-01 openais[4171]: [CLM  ] Members Left:
May  1 11:16:41 hayes-01 openais[4171]: [CLM  ]         r(0) ip(10.15.89.137)
May  1 11:16:41 hayes-01 kernel: dlm: closing connection to node 3
May  1 11:16:41 hayes-01 openais[4171]: [CLM  ] Members Joined:
May  1 11:16:41 hayes-01 openais[4171]: [CLM  ] CLM CONFIGURATION CHANGE
May  1 11:16:41 hayes-01 openais[4171]: [CLM  ] New Configuration:
May  1 11:16:41 hayes-01 openais[4171]: [CLM  ]         r(0) ip(10.15.89.135)
May  1 11:16:41 hayes-01 openais[4171]: [CLM  ]         r(0) ip(10.15.89.136)
May  1 11:16:41 hayes-01 openais[4171]: [CLM  ] Members Left:
May  1 11:16:41 hayes-01 openais[4171]: [CLM  ] Members Joined:
May  1 11:16:41 hayes-01 openais[4171]: [SYNC ] This node is within the primary
component and will provide service.
May  1 11:16:41 hayes-01 openais[4171]: [TOTEM] entering OPERATIONAL state.
May  1 11:16:41 hayes-01 openais[4171]: [CLM  ] got nodejoin message 10.15.89.135
May  1 11:16:41 hayes-01 openais[4171]: [CLM  ] got nodejoin message 10.15.89.136
May  1 11:16:42 hayes-01 dlm_controld[4194]: cluster is down, exiting
May  1 11:16:42 hayes-01 groupd[4180]: cpg_mcast_joined error 2 handle
6b8b456700000000
May  1 11:16:42 hayes-01 gfs_controld[4200]: groupd_dispatch error -1 errno 104
May  1 11:16:42 hayes-01 gfs_controld[4200]: groupd connection died
May  1 11:16:42 hayes-01 gfs_controld[4200]: cluster is down, exiting
May  1 11:16:42 hayes-01 fenced[4188]: cluster is down, exiting
May  1 11:16:42 hayes-01 kernel: dlm: closing connection to node 2
May  1 11:16:42 hayes-01 kernel: dlm: closing connection to node 1
May  1 11:17:10 hayes-01 ccsd[4164]: Unable to connect to cluster infrastructure
after 30 seconds.


May  1 11:16:37 hayes-02 openais[4553]: [TOTEM] aru 87563 high delivered 87563
received flag 1
May  1 11:16:37 hayes-02 openais[4553]: [TOTEM] Did not need to originate any
messages in recovery.
May  1 11:16:37 hayes-02 openais[4553]: [CLM  ] CLM CONFIGURATION CHANGE
May  1 11:16:37 hayes-02 openais[4553]: [CLM  ] New Configuration:
May  1 11:16:37 hayes-02 openais[4553]: [CLM  ]         r(0) ip(10.15.89.135)
May  1 11:16:37 hayes-02 openais[4553]: [CLM  ]         r(0) ip(10.15.89.136)
May  1 11:16:37 hayes-02 openais[4553]: [CLM  ] Members Left:
May  1 11:16:37 hayes-02 openais[4553]: [CLM  ]         r(0) ip(10.15.89.137)
May  1 11:16:37 hayes-02 openais[4553]: [CLM  ] Members Joined:
May  1 11:16:37 hayes-02 kernel: dlm: closing connection to node 3
May  1 11:16:37 hayes-02 openais[4553]: [CLM  ] CLM CONFIGURATION CHANGE
May  1 11:16:37 hayes-02 openais[4553]: [CLM  ] New Configuration:
May  1 11:16:37 hayes-02 openais[4553]: [CLM  ]         r(0) ip(10.15.89.135)
May  1 11:16:37 hayes-02 openais[4553]: [CLM  ]         r(0) ip(10.15.89.136)
May  1 11:16:37 hayes-02 openais[4553]: [CLM  ] Members Left:
May  1 11:16:37 hayes-02 openais[4553]: [CLM  ] Members Joined:
May  1 11:16:37 hayes-02 openais[4553]: [SYNC ] This node is within the primary
component and will provide service.
May  1 11:16:37 hayes-02 openais[4553]: [TOTEM] entering OPERATIONAL state.
May  1 11:16:37 hayes-02 openais[4553]: [CLM  ] got nodejoin message 10.15.89.135
May  1 11:16:37 hayes-02 openais[4553]: [CLM  ] got nodejoin message 10.15.89.136
May  1 11:16:37 hayes-02 gfs_controld[4581]: cluster is down, exiting
May  1 11:16:37 hayes-02 groupd[4564]: cpg_mcast_joined error 2 handle
6b8b456700000000
May  1 11:16:37 hayes-02 dlm_controld[4576]: cluster is down, exiting
May  1 11:16:37 hayes-02 fenced[4571]: cluster is down, exiting
May  1 11:16:37 hayes-02 kernel: dlm: closing connection to node 1
May  1 11:16:37 hayes-02 kernel: dlm: closing connection to node 2
May  1 11:17:02 hayes-02 ccsd[4095]: Unable to connect to cluster infrastructure
after 30 seconds.
May  1 11:17:33 hayes-02 ccsd[4095]: Unable to connect to cluster infrastructure
after 60 seconds.
May  1 11:18:03 hayes-02 ccsd[4095]: Unable to connect to cluster infrastructure
after 90 seconds.
May  1 11:18:33 hayes-02 ccsd[4095]: Unable to connect to cluster infrastructure
after 120 seconds.
May  1 11:19:03 hayes-02 ccsd[4095]: Unable to connect to cluster infrastructure
after 150 seconds.
May  1 11:19:33 hayes-02 ccsd[4095]: Unable to connect to cluster infrastructure
after 180 seconds.
May  1 11:19:57 hayes-02 kernel: dlm: 24: remove fr 0 ID 2
May  1 11:19:57 hayes-02 kernel: dlm: 16: remove fr 0 ID 2
May  1 11:19:57 hayes-02 kernel: dlm: 16: remove fr 0 ID 2
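
To see at a glance which cluster daemons gave up on each surviving node, the excerpts above can be filtered for the "cluster is down, exiting" message. The following is a minimal sketch, assuming the syslog line layout shown in this report; the function name `daemons_that_exited` and the `SAMPLE` text (a few lines copied from the logs above) are illustrative only and not part of openais or cman.

```python
import re
from collections import defaultdict

# Matches the syslog prefix seen in this report, for example:
# "May  1 11:16:42 hayes-01 fenced[4188]: cluster is down, exiting"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+"
    r"(?P<host>\S+)\s+"
    r"(?P<proc>[\w.-]+)(?:\[(?P<pid>\d+)\])?:\s+"
    r"(?P<msg>.*)$"
)


def daemons_that_exited(log_text):
    """Return {host: [(timestamp, process), ...]} for 'cluster is down' lines."""
    exited = defaultdict(list)
    for line in log_text.splitlines():
        m = LINE_RE.match(line)
        # Wrapped continuation lines do not match the prefix and are skipped.
        if m and "cluster is down, exiting" in m.group("msg"):
            exited[m.group("host")].append((m.group("ts"), m.group("proc")))
    return dict(exited)


# A few lines copied from the excerpts above (hypothetical test input).
SAMPLE = """\
May  1 11:16:42 hayes-01 dlm_controld[4194]: cluster is down, exiting
May  1 11:16:42 hayes-01 gfs_controld[4200]: groupd connection died
May  1 11:16:42 hayes-01 gfs_controld[4200]: cluster is down, exiting
May  1 11:16:42 hayes-01 fenced[4188]: cluster is down, exiting
May  1 11:16:37 hayes-02 gfs_controld[4581]: cluster is down, exiting
"""

if __name__ == "__main__":
    for host, events in daemons_that_exited(SAMPLE).items():
        print(host, [proc for _, proc in events])
```

On the full excerpts this would list dlm_controld, gfs_controld and fenced for both hayes-01 and hayes-02, which fits openais itself having gone away on each node rather than a single daemon failing.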



Version-Release number of selected component (if applicable):
2.6.18-90.el5
openais-0.80.3-15.el5
cman-2.0.84-2.el5
Comment 1 Steven Dake 2008-05-01 14:32:22 EDT
Corey,
was there a core file in /var/lib/openais?

If so, what was its backtrace?

Thanks
Comment 2 Corey Marthaler 2008-05-01 14:38:44 EDT
(gdb) bt
#0  0x0000003f3e830155 in raise () from /lib64/libc.so.6
#1  0x0000003f3e831bf0 in abort () from /lib64/libc.so.6
#2  0x0000003f3e8295d6 in __assert_fail () from /lib64/libc.so.6
#3  0x00002aaaabcf72a3 in ckpt_checkpoint_close () from
/usr/libexec/lcrso/service_ckpt.lcrso
#4  0x0000000000414885 in totempg_groups_initialize ()
#5  0x0000000000414b98 in totempg_groups_initialize ()
#6  0x000000000040fd28 in totem_callback_token_type ()
#7  0x000000000041194c in totem_callback_token_type ()
#8  0x0000000000409a03 in rrp_deliver_fn ()
#9  0x0000000000407e96 in totemnet_net_mtu_adjust ()
#10 0x0000000000405a12 in poll_run ()
#11 0x0000000000418860 in main ()
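
For triage, the frame of interest is the one directly above `__assert_fail`, since that is the function whose assertion fired. A minimal sketch, assuming the plain `bt` output format shown above (including lines wrapped after "from"); `parse_frames` and `asserting_frame` are hypothetical helper names, not gdb features. Without debug symbols, the function names gdb prints for the aisexec frames may only be the nearest exported symbol, so treat them as approximate.

```python
import re

# One gdb frame, e.g.:
# "#3  0x00002aaaabcf72a3 in ckpt_checkpoint_close () from /path/to/lib"
FRAME_RE = re.compile(
    r"^#(?P<num>\d+)\s+(?:0x[0-9a-f]+\s+in\s+)?(?P<func>\S+)\s*\(.*?\)"
    r"(?:\s+from\s+(?P<lib>\S+))?"
)


def parse_frames(bt_text):
    """Return [(frame_number, function, library_or_None), ...]."""
    joined = []
    for line in bt_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#") or not joined:
            joined.append(line)
        else:
            # Continuation of a wrapped frame line: glue it back on.
            joined[-1] += " " + line
    frames = []
    for line in joined:
        m = FRAME_RE.match(line)
        if m:
            frames.append((int(m.group("num")), m.group("func"), m.group("lib")))
    return frames


def asserting_frame(frames):
    """Return the frame that called __assert_fail, or None if absent."""
    for i, (_, func, _) in enumerate(frames):
        if func == "__assert_fail" and i + 1 < len(frames):
            return frames[i + 1]
    return None


# The first frames of the backtrace above, wrapped as posted.
SAMPLE_BT = """\
#0  0x0000003f3e830155 in raise () from /lib64/libc.so.6
#1  0x0000003f3e831bf0 in abort () from /lib64/libc.so.6
#2  0x0000003f3e8295d6 in __assert_fail () from /lib64/libc.so.6
#3  0x00002aaaabcf72a3 in ckpt_checkpoint_close () from
/usr/libexec/lcrso/service_ckpt.lcrso
#4  0x0000000000414885 in totempg_groups_initialize ()
"""

if __name__ == "__main__":
    print(asserting_frame(parse_frames(SAMPLE_BT)))
```

Applied to this backtrace, the asserting frame is `ckpt_checkpoint_close` in service_ckpt.lcrso, i.e. the assertion was raised from the checkpoint service plugin.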
Comment 3 Corey Marthaler 2008-05-01 14:48:19 EDT
Created attachment 304333
core file
Comment 8 errata-xmlrpc 2009-01-20 15:46:52 EST
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-0074.html