Description of problem:
After a kernel panic or reboot of one node, all nodes stop activity.

After restarting the node, I see this in the log file of the rebooted node:

Jul 17 00:29:12 master kernel: CMAN: sending membership request
Jul 17 00:29:12 master kernel: CMAN: got node c1.310.ru
Jul 17 00:29:12 master kernel: CMAN: got node c5.310.ru
Jul 17 00:29:12 master kernel: CMAN: got node c3.310.ru
Jul 17 00:29:12 master kernel: CMAN: got node c4.310.ru
Jul 17 00:29:12 master kernel: CMAN: got node c2.310.ru
Jul 17 00:29:12 master kernel: CMAN: quorum regained, resuming activity

On the other nodes:

Jul 17 00:30:33 cX kernel: CMAN: node master.310.ru rejoining

On master.310.ru:

/proc/cluster/dlm_debug - empty
/proc/cluster/dlm_locks - empty
/proc/cluster/dlm_rcom - empty
/proc/cluster/lock_dlm_debug - empty

/proc/cluster/nodes:
Node  Votes Exp Sts  Name
   1    1    7   M   c3.310.ru
   2    1    7   M   c4.310.ru
   3    1    7   M   c5.310.ru
   4    1    7   M   c2.310.ru
   5    1    7   M   c1.310.ru
   6    1    7   M   master.310.ru

/proc/cluster/services:
Service          Name       GID LID State  Code
Fence Domain:    "default"    0   2 join   S-1,80,0 []
DLM Lock Space:  "gfs01"      0   3 join   S-1,80,0 []

/proc/cluster/sm_debug:
sevent state 3
00000000 sevent state 1
00000000 sevent state 3
00000000 sevent state 1
00000000 sevent state 3
00000000 sevent state 1
00000000 sevent state 3
00000000 sevent state 1
00000000 sevent state 3
00000000 sevent state 1
00000000 sevent state 3

/proc/cluster/status:
Version: 2.0.1
Config version: 1
Cluster name: 310farm
Cluster ID: 7141
Membership state: Cluster-Member
Nodes: 6
Expected_votes: 7
Total_votes: 6
Quorum: 4
Active subsystems: 4
Node addresses: 83.97.108.3

On c3.310.ru:

dlm_debug:
1 send cv 30081 to 4 gfs01 cv 3 60080 " 3 76161a9" gfs01 send cv 60080 to 4 gfs01 cv 3 500b1 " 3 7626193" gfs01 send cv 500b1 to 4 gfs01 cv 3 30250 " 3 763617d" gfs01 send cv 30250 to 4 gfs01 cv 3 402b4 " 3 7646167" gfs01 send cv 402b4 to 4 gfs01 cv 3 701bc " 3 7656151" gfs01 send cv 701bc to 4 gfs01 cv 3 802e6 " 3 766613b" gfs01 send cv 802e6 to 4 gfs01 cv 3 801a6 " 3 7676125" gfs01
send cv 801a6 to 4 gfs01 cv 3 602fd " 3 768610f" gfs01 send cv 602fd to 4 gfs01 cv 3 503d7 " 3 76960f9" gfs01 send cv 503d7 to 4 gfs01 cv 3 a0165 " 3 76e608b" gfs01 send cv a0165 to 4 gfs01 cv 3 70392 " 3 76f6075" gfs01 send cv 70392 to 4 gfs01 cv 3 60022 " 3 770605f" gfs01 send cv 60022 to 4 gfs01 cv 3 70061 " 3 7716049" gfs01 send cv 70061 to 4 gfs01 cv 3 2008f " 3 7726033" gfs01 send cv 2008f to 4 gfs01 move flags 1,0,0 ids 27,27,27 lock_dlm_debug: k 3,74663fb id 603a9 0,3 1c lk 3,74763e5 id 70132 0,3 1c lk 3,74863cf id c0043 0,3 1c lk 3,74963b9 id 903bc 0,3 1c lk 3,74a63a3 id 80111 0,3 1c lk 3,74b638d id a00ac 0,3 1c lk 3,74c6377 id 3012d 0,3 1c lk 3,74d6361 id 30032 0,3 1c lk 3,74e634b id 5039d 0,3 1c lk 3,74f6335 id 40071 0,3 1c lk 3,750631f id 4020c 0,3 1c lk 3,7516309 id 70169 0,3 1c lk 3,75262f3 id 50011 0,3 1c lk 3,75362dd id 60101 0,3 1c qc 3,71867ef 0,3 id 301d7 sts 0 qc 3,71967d9 0,3 id 703bb sts 0 qc 3,71a67c3 0,3 id 6026a sts 0 qc 3,71b67ad 0,3 id 80042 sts 0 qc 3,71c6797 0,3 id 801a5 sts 0 qc 3,71d6781 0,3 id 10254 sts 0 qc 3,71e676b 0,3 id 202a9 sts 0 qc 3,71f6755 0,3 id 3033c sts 0 qc 3,720673f 0,3 id 6015d sts 0 lk 3,75462c7 id 501fb 0,3 1c qc 3,7216729 0,3 id 403d6 sts 0 qc 3,7226713 0,3 id 703e4 sts 0 qc 3,72366fd 0,3 id 70300 sts 0 qc 3,72466e7 0,3 id 503a9 sts 0 lk 3,75562b1 id 9032e 0,3 1c qc 3,72566d1 0,3 id 500b7 sts 0 qc 3,72666bb 0,3 id 602bb sts 0 qc 3,72766a5 0,3 id 6025f sts 0 qc 3,728668f 0,3 id 402d6 sts 0 lk 3,756629b id 402b3 0,3 1c qc 3,7296679 0,3 id 6033e sts 0 qc 3,72a6663 0,3 id 902f4 sts 0 qc 3,72b664d 0,3 id 50193 sts 0 lk 3,7576285 id 5011f 0,3 1c lk 3,758626f id 80361 0,3 1c lk 3,7596259 id 50175 0,3 1c lk 3,75a6243 id 7037b 0,3 1c lk 3,75b622d id 50066 0,3 1c lk 3,75c6217 id 60211 0,3 1c lk 3,75d6201 id 40335 0,3 1c lk 3,75e61eb id 6039c 0,3 1c lk 3,75f61d5 id 803f5 0,3 1c lk 3,76061bf id 30081 0,3 1c lk 3,76161a9 id 60080 0,3 1c lk 3,7626193 id 500b1 0,3 1c lk 3,763617d id 30250 0,3 1c lk 3,7646167 id 402b4 0,3 1c qc 
3,72c6637 0,3 id 10129 sts 0 qc 3,72d6621 0,3 id 90094 sts 0 lk 3,7656151 id 701bc 0,3 1c qc 3,72e660b 0,3 id 50257 sts 0 qc 3,72f65f5 0,3 id 40323 sts 0 qc 3,73065df 0,3 id 503c3 sts 0 qc 3,73165c9 0,3 id b0038 sts 0 lk 3,766613b id 802e6 0,3 1c qc 3,73265b3 0,3 id 6014f sts 0 qc 3,733659d 0,3 id 5024a sts 0 qc 3,7346587 0,3 id 501b9 sts 0 lk 3,7676125 id 801a6 0,3 1c qc 3,7356571 0,3 id 702c1 sts 0 qc 3,736655b 0,3 id 700e0 sts 0 lk 3,768610f id 602fd 0,3 1c qc 3,7376545 0,3 id 501e4 sts 0 qc 3,738652f 0,3 id 5038c sts 0 qc 3,7396519 0,3 id 60387 sts 0 qc 3,73a6503 0,3 id 8031e sts 0 lk 3,76960f9 id 503d7 0,3 1c qc 3,73b64ed 0,3 id 502f7 sts 0 qc 3,73c64d7 0,3 id a03ae sts 0 qc 3,73d64c1 0,3 id 50116 sts 0 qc 3,73e64ab 0,3 id 4021a sts 0 lk 3,76e608b id a0165 0,3 1c lk 3,76f6075 id 70392 0,3 1c lk 3,770605f id 60022 0,3 1c lk 3,7716049 id 70061 0,3 1c lk 3,7726033 id 2008f 0,3 1c qc 3,73f6495 0,3 id 60245 sts 0 qc 3,740647f 0,3 id 603a8 sts 0 qc 3,7416469 0,3 id 40106 sts 0 qc 3,7426453 0,3 id 501db sts 0 qc 3,743643d 0,3 id 30049 sts 0 qc 3,7446427 0,3 id 6036d sts 0 qc 3,7456411 0,3 id a03cf sts 0 qc 3,74663fb 0,3 id 603a9 sts 0 qc 3,74763e5 0,3 id 70132 sts 0 qc 3,74863cf 0,3 id c0043 sts 0 qc 3,74963b9 0,3 id 903bc sts 0 qc 3,74a63a3 0,3 id 80111 sts 0 qc 3,74b638d 0,3 id a00ac sts 0 qc 3,74c6377 0,3 id 3012d sts 0 qc 3,74d6361 0,3 id 30032 sts 0 qc 3,74e634b 0,3 id 5039d sts 0 qc 3,74f6335 0,3 id 40071 sts 0 qc 3,750631f 0,3 id 4020c sts 0 qc 3,7516309 0,3 id 70169 sts 0 qc 3,75262f3 0,3 id 50011 sts 0 qc 3,75362dd 0,3 id 60101 sts 0 qc 3,75462c7 0,3 id 501fb sts 0 qc 3,75562b1 0,3 id 9032e sts 0 qc 3,756629b 0,3 id 402b3 sts 0 qc 3,7576285 0,3 id 5011f sts 0 qc 3,758626f 0,3 id 80361 sts 0 qc 3,7596259 0,3 id 50175 sts 0 qc 3,75a6243 0,3 id 7037b sts 0 qc 3,75b622d 0,3 id 50066 sts 0 qc 3,75c6217 0,3 id 60211 sts 0 qc 3,75d6201 0,3 id 40335 sts 0 qc 3,75e61eb 0,3 id 6039c sts 0 qc 3,75f61d5 0,3 id 803f5 sts 0 ..... 
sm_debug:
uevent state 3 node 6
02000003 add node 6 count 6
02000003 uevent state 5 node 6
02000003 uevent state 7 node 6
00000001 remove node 6 count 5
01000002 remove node 6 count 5
02000003 remove node 6 count 5
00000001 recover state 0
00000001 recover state 1

I tried running fence_ack_manual on the nodes, but it did not help.

Version-Release number of selected component (if applicable):
kernel 2.6.7, lock dlm from CVS
If you can reproduce this problem, please provide the output of /proc/cluster/services from all the nodes.
For example:

[root@c1 root]# cat /proc/cluster/services
Service          Name       GID LID State    Code
Fence Domain:    "default"    1   2 recover  4 - [1 3 4 2 6 7]
DLM Lock Space:  "gfs01"      4   3 recover  0 - [1 3]
GFS Mount Group: "gfs01"      5   4 recover  0 - [1 3]

[root@c2 root]# cat /proc/cluster/services
Service          Name       GID LID State    Code
Fence Domain:    "default"    0   2 join     S-1,80,7 []
DLM Lock Space:  "gfs01"      0   3 join     S-1,80,7 []

[root@c3 root]# cat /proc/cluster/services
Service          Name       GID LID State    Code
Fence Domain:    "default"    1   2 recover  4 - [3 1 2 4 6 7]
DLM Lock Space:  "gfs01"      0   3 join     S-1,280,7 []

Steps:
c1: mount -t gfs /dev/sda12 /pub
c2: mount -t gfs /dev/sda12 /pub
c2: rebooted
c3: mount -t gfs /dev/sda12 /pub

Nodes c1, c2 and c3 stop activity after node c2 is rebooted.

[root@c1 root]# cat /proc/cluster/nodes
Node  Votes Exp Sts  Name
   1    1    2   M   master.310.ru
   2    1    2   M   c4.310.ru
   3    1    2   M   c1.310.ru
   4    1    2   M   c5.310.ru
   5    1    2   M   c2.310.ru
   6    1    2   M   c3.310.ru
   7    1    2   M   c0.310.ru
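When collecting /proc/cluster/services from many nodes, it may help to pick out the stuck services automatically. Below is a rough Python sketch, not part of the cluster tools. It assumes each service is printed on one line in the layout seen in the output above (service, quoted name, GID, LID, state, code, bracketed member list). The function names and the choice of "join" and "recover" as the states to flag are my own assumptions, taken only from what appears in this report.

```python
import re

# Assumed line layout, inferred from the output pasted in this report:
#   <Service>: "<Name>" <GID> <LID> <State> <Code> [<member node ids>]
SERVICE_RE = re.compile(
    r'^(?P<service>[^:"]+):\s+"(?P<name>[^"]*)"\s+'
    r'(?P<gid>\d+)\s+(?P<lid>\d+)\s+(?P<state>\S+)\s+'
    r'(?P<code>.*?)\s*\[(?P<members>[^\]]*)\]\s*$'
)

# States observed in this report while the cluster was hung (an assumption,
# not an exhaustive list of service manager states).
STUCK_STATES = {"join", "recover"}


def parse_services(text):
    """Return one dict per service line; header and prompt lines are skipped."""
    services = []
    for line in text.splitlines():
        match = SERVICE_RE.match(line.strip())
        if match is None:
            continue
        services.append({
            "service": match.group("service").strip(),
            "name": match.group("name"),
            "gid": int(match.group("gid")),
            "lid": int(match.group("lid")),
            "state": match.group("state"),
            "code": match.group("code"),
            "members": [int(n) for n in match.group("members").split()],
        })
    return services


def stuck_services(text):
    """Return only the services whose state is in STUCK_STATES."""
    return [s for s in parse_services(text) if s["state"] in STUCK_STATES]


if __name__ == "__main__":
    import sys

    # Usage: cat /proc/cluster/services | python check_services.py
    for svc in stuck_services(sys.stdin.read()):
        print('%s "%s": state=%s code=%s members=%s' % (
            svc["service"], svc["name"], svc["state"],
            svc["code"], svc["members"]))
```

Run against the c1 output above, this would list all three services as being in "recover"; on c2 it would list both as being in "join". If the member list is printed on a separate line on your system, the pattern needs adjusting.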
I recently fixed some things that may be related to this. Could you try this again with the latest code in cvs and let us know if it's still a problem?
Updates with the proper version and component name.
Could this bug and bz133420 be related?
Yes, it could be.