Bug 128059 - lock dlm problem after reboot node
Summary: lock dlm problem after reboot node
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Cluster Suite
Classification: Retired
Component: dlm
Version: 4
Hardware: i386
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: David Teigland
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2004-07-16 20:56 UTC by Anton Nekhoroshikh
Modified: 2009-04-16 20:29 UTC
CC List: 2 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2006-02-02 14:49:06 UTC
Embargoed:



Description Anton Nekhoroshikh 2004-07-16 20:56:41 UTC
Description of problem:

After a kernel panic or reboot, all nodes stop activity.
After restarting the node, I see the following in the rebooted node's log file:

Jul 17 00:29:12 master kernel: CMAN: sending membership request
Jul 17 00:29:12 master kernel: CMAN: got node c1.310.ru
Jul 17 00:29:12 master kernel: CMAN: got node c5.310.ru
Jul 17 00:29:12 master kernel: CMAN: got node c3.310.ru
Jul 17 00:29:12 master kernel: CMAN: got node c4.310.ru
Jul 17 00:29:12 master kernel: CMAN: got node c2.310.ru
Jul 17 00:29:12 master kernel: CMAN: quorum regained, resuming activity

On the other nodes:
Jul 17 00:30:33 cX kernel: CMAN: node master.310.ru rejoining

On master.310.ru:
/proc/cluster/dlm_debug - empty
/proc/cluster/dlm_locks - empty
/proc/cluster/dlm_rcom - empty
/proc/cluster/lock_dlm_debug - empty
/proc/cluster/nodes:
Node  Votes Exp Sts  Name
   1    1    7   M   c3.310.ru
   2    1    7   M   c4.310.ru
   3    1    7   M   c5.310.ru
   4    1    7   M   c2.310.ru
   5    1    7   M   c1.310.ru
   6    1    7   M   master.310.ru
/proc/cluster/services:
Service          Name                              GID LID State     Code
Fence Domain:    "default"                           0   2 join     
S-1,80,0
[]

DLM Lock Space:  "gfs01"                             0   3 join     
S-1,80,0
[]
/proc/cluster/sm_debug:
00000000 sevent state 3
00000000 sevent state 1
00000000 sevent state 3
00000000 sevent state 1
00000000 sevent state 3
00000000 sevent state 1
00000000 sevent state 3
00000000 sevent state 1
00000000 sevent state 3
00000000 sevent state 1
00000000 sevent state 3
/proc/cluster/status:
Version: 2.0.1
Config version: 1
Cluster name: 310farm
Cluster ID: 7141
Membership state: Cluster-Member
Nodes: 6
Expected_votes: 7
Total_votes: 6
Quorum: 4
Active subsystems: 4
Node addresses: 83.97.108.3
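
The status above reports Quorum: 4 with Expected_votes: 7 and Total_votes: 6, so the cluster is quorate. As a sanity check, this is consistent with quorum being a simple majority of the expected votes (an assumption about CMAN's arithmetic, but it matches the numbers shown):

```python
# Hedged sketch: quorum as a simple majority of expected votes.
# This is an assumption about CMAN's arithmetic; it matches the
# /proc/cluster/status values quoted above.
def quorum(expected_votes: int) -> int:
    return expected_votes // 2 + 1

def is_quorate(total_votes: int, expected_votes: int) -> bool:
    return total_votes >= quorum(expected_votes)

# Expected_votes: 7 -> Quorum: 4; Total_votes: 6 >= 4 -> quorate
```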

On c3.310.ru:
dlm_debug:
gfs01 send cv 30081 to 4
gfs01 cv 3 60080 "       3         76161a9"
gfs01 send cv 60080 to 4
gfs01 cv 3 500b1 "       3         7626193"
gfs01 send cv 500b1 to 4
gfs01 cv 3 30250 "       3         763617d"
gfs01 send cv 30250 to 4
gfs01 cv 3 402b4 "       3         7646167"
gfs01 send cv 402b4 to 4
gfs01 cv 3 701bc "       3         7656151"
gfs01 send cv 701bc to 4
gfs01 cv 3 802e6 "       3         766613b"
gfs01 send cv 802e6 to 4
gfs01 cv 3 801a6 "       3         7676125"
gfs01 send cv 801a6 to 4
gfs01 cv 3 602fd "       3         768610f"
gfs01 send cv 602fd to 4
gfs01 cv 3 503d7 "       3         76960f9"
gfs01 send cv 503d7 to 4
gfs01 cv 3 a0165 "       3         76e608b"
gfs01 send cv a0165 to 4
gfs01 cv 3 70392 "       3         76f6075"
gfs01 send cv 70392 to 4
gfs01 cv 3 60022 "       3         770605f"
gfs01 send cv 60022 to 4
gfs01 cv 3 70061 "       3         7716049"
gfs01 send cv 70061 to 4
gfs01 cv 3 2008f "       3         7726033"
gfs01 send cv 2008f to 4
gfs01 move flags 1,0,0 ids 27,27,27

lock_dlm_debug:
lk 3,74663fb id 603a9 0,3 1c
lk 3,74763e5 id 70132 0,3 1c
lk 3,74863cf id c0043 0,3 1c
lk 3,74963b9 id 903bc 0,3 1c
lk 3,74a63a3 id 80111 0,3 1c
lk 3,74b638d id a00ac 0,3 1c
lk 3,74c6377 id 3012d 0,3 1c
lk 3,74d6361 id 30032 0,3 1c
lk 3,74e634b id 5039d 0,3 1c
lk 3,74f6335 id 40071 0,3 1c
lk 3,750631f id 4020c 0,3 1c
lk 3,7516309 id 70169 0,3 1c
lk 3,75262f3 id 50011 0,3 1c
lk 3,75362dd id 60101 0,3 1c
qc 3,71867ef 0,3 id 301d7 sts 0
qc 3,71967d9 0,3 id 703bb sts 0
qc 3,71a67c3 0,3 id 6026a sts 0
qc 3,71b67ad 0,3 id 80042 sts 0
qc 3,71c6797 0,3 id 801a5 sts 0
qc 3,71d6781 0,3 id 10254 sts 0
qc 3,71e676b 0,3 id 202a9 sts 0
qc 3,71f6755 0,3 id 3033c sts 0
qc 3,720673f 0,3 id 6015d sts 0
lk 3,75462c7 id 501fb 0,3 1c
qc 3,7216729 0,3 id 403d6 sts 0
qc 3,7226713 0,3 id 703e4 sts 0
qc 3,72366fd 0,3 id 70300 sts 0
qc 3,72466e7 0,3 id 503a9 sts 0
lk 3,75562b1 id 9032e 0,3 1c
qc 3,72566d1 0,3 id 500b7 sts 0
qc 3,72666bb 0,3 id 602bb sts 0
qc 3,72766a5 0,3 id 6025f sts 0
qc 3,728668f 0,3 id 402d6 sts 0
lk 3,756629b id 402b3 0,3 1c
qc 3,7296679 0,3 id 6033e sts 0
qc 3,72a6663 0,3 id 902f4 sts 0
qc 3,72b664d 0,3 id 50193 sts 0
lk 3,7576285 id 5011f 0,3 1c
lk 3,758626f id 80361 0,3 1c
lk 3,7596259 id 50175 0,3 1c
lk 3,75a6243 id 7037b 0,3 1c
lk 3,75b622d id 50066 0,3 1c
lk 3,75c6217 id 60211 0,3 1c
lk 3,75d6201 id 40335 0,3 1c
lk 3,75e61eb id 6039c 0,3 1c
lk 3,75f61d5 id 803f5 0,3 1c
lk 3,76061bf id 30081 0,3 1c
lk 3,76161a9 id 60080 0,3 1c
lk 3,7626193 id 500b1 0,3 1c
lk 3,763617d id 30250 0,3 1c
lk 3,7646167 id 402b4 0,3 1c
qc 3,72c6637 0,3 id 10129 sts 0
qc 3,72d6621 0,3 id 90094 sts 0
lk 3,7656151 id 701bc 0,3 1c
qc 3,72e660b 0,3 id 50257 sts 0
qc 3,72f65f5 0,3 id 40323 sts 0
qc 3,73065df 0,3 id 503c3 sts 0
qc 3,73165c9 0,3 id b0038 sts 0
lk 3,766613b id 802e6 0,3 1c
qc 3,73265b3 0,3 id 6014f sts 0
qc 3,733659d 0,3 id 5024a sts 0
qc 3,7346587 0,3 id 501b9 sts 0
lk 3,7676125 id 801a6 0,3 1c
qc 3,7356571 0,3 id 702c1 sts 0
qc 3,736655b 0,3 id 700e0 sts 0
lk 3,768610f id 602fd 0,3 1c
qc 3,7376545 0,3 id 501e4 sts 0
qc 3,738652f 0,3 id 5038c sts 0
qc 3,7396519 0,3 id 60387 sts 0
qc 3,73a6503 0,3 id 8031e sts 0
lk 3,76960f9 id 503d7 0,3 1c
qc 3,73b64ed 0,3 id 502f7 sts 0
qc 3,73c64d7 0,3 id a03ae sts 0
qc 3,73d64c1 0,3 id 50116 sts 0
qc 3,73e64ab 0,3 id 4021a sts 0
lk 3,76e608b id a0165 0,3 1c
lk 3,76f6075 id 70392 0,3 1c
lk 3,770605f id 60022 0,3 1c
lk 3,7716049 id 70061 0,3 1c
lk 3,7726033 id 2008f 0,3 1c
qc 3,73f6495 0,3 id 60245 sts 0
qc 3,740647f 0,3 id 603a8 sts 0
qc 3,7416469 0,3 id 40106 sts 0
qc 3,7426453 0,3 id 501db sts 0
qc 3,743643d 0,3 id 30049 sts 0
qc 3,7446427 0,3 id 6036d sts 0
qc 3,7456411 0,3 id a03cf sts 0
qc 3,74663fb 0,3 id 603a9 sts 0
qc 3,74763e5 0,3 id 70132 sts 0
qc 3,74863cf 0,3 id c0043 sts 0
qc 3,74963b9 0,3 id 903bc sts 0
qc 3,74a63a3 0,3 id 80111 sts 0
qc 3,74b638d 0,3 id a00ac sts 0
qc 3,74c6377 0,3 id 3012d sts 0
qc 3,74d6361 0,3 id 30032 sts 0
qc 3,74e634b 0,3 id 5039d sts 0
qc 3,74f6335 0,3 id 40071 sts 0
qc 3,750631f 0,3 id 4020c sts 0
qc 3,7516309 0,3 id 70169 sts 0
qc 3,75262f3 0,3 id 50011 sts 0
qc 3,75362dd 0,3 id 60101 sts 0
qc 3,75462c7 0,3 id 501fb sts 0
qc 3,75562b1 0,3 id 9032e sts 0
qc 3,756629b 0,3 id 402b3 sts 0
qc 3,7576285 0,3 id 5011f sts 0
qc 3,758626f 0,3 id 80361 sts 0
qc 3,7596259 0,3 id 50175 sts 0
qc 3,75a6243 0,3 id 7037b sts 0
qc 3,75b622d 0,3 id 50066 sts 0
qc 3,75c6217 0,3 id 60211 sts 0
qc 3,75d6201 0,3 id 40335 sts 0
qc 3,75e61eb 0,3 id 6039c sts 0
qc 3,75f61d5 0,3 id 803f5 sts 0
.....

sm_debug:
02000003 uevent state 3 node 6
02000003 add node 6 count 6
02000003 uevent state 5 node 6
02000003 uevent state 7 node 6
00000001 remove node 6 count 5
01000002 remove node 6 count 5
02000003 remove node 6 count 5
00000001 recover state 0
00000001 recover state 1

I tried running fence_ack_manual on each of the nodes; it did not help.
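
The services output above shows both the fence domain and the gfs01 lockspace stuck in the "join" state. As a hypothetical aid (not an official tool), a saved /proc/cluster/services dump can be scanned for groups that never reached the "run" state; the line layout below is inferred from the dumps quoted in this report:

```python
# Hypothetical helper: list service groups from a /proc/cluster/services
# dump that are not in the "run" state. The line layout is inferred from
# the dumps quoted in this bug report; this is not an official parser.
def stuck_services(dump: str):
    stuck = []
    for line in dump.splitlines():
        if '"' not in line:
            continue  # skip the header, member lists, and state codes
        name = line.split('"')[1]            # service name between quotes
        fields = line.split('"')[2].split()  # GID LID State [Code...]
        if len(fields) >= 3 and fields[2] != "run":
            stuck.append((name, fields[2]))
    return stuck
```

Fed the dump from this report, it would flag ("default", "join") and ("gfs01", "join").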

Version-Release number of selected component (if applicable):

kernel 2.6.7
lock_dlm from CVS

Comment 1 David Teigland 2004-07-19 14:32:59 UTC
If you can reproduce this problem, please provide the output of
/proc/cluster/services from all the nodes.

Comment 2 Anton Nekhoroshikh 2004-07-19 21:05:41 UTC
For example:
[root@c1 root]# cat /proc/cluster/services 

Service          Name                              GID LID State     Code
Fence Domain:    "default"                           1   2 recover 4 -
[1 3 4 2 6 7]

DLM Lock Space:  "gfs01"                             4   3 recover 0 -
[1 3]

GFS Mount Group: "gfs01"                             5   4 recover 0 -
[1 3]

[root@c2 root]# cat /proc/cluster/services 

Service          Name                              GID LID State     Code
Fence Domain:    "default"                           0   2 join     
S-1,80,7
[]

DLM Lock Space:  "gfs01"                             0   3 join     
S-1,80,7
[]

[root@c3 root]# cat /proc/cluster/services 

Service          Name                              GID LID State     Code
Fence Domain:    "default"                           1   2 recover 4 -
[3 1 2 4 6 7]

DLM Lock Space:  "gfs01"                             0   3 join     
S-1,280,7
[]

c1: mount -t gfs /dev/sda12 /pub
c2: mount -t gfs /dev/sda12 /pub
c2: rebooted
c3: mount -t gfs /dev/sda12 /pub

Nodes c1, c2, and c3 stop activity after node c2 reboots.

[root@c1 root]# cat /proc/cluster/nodes    
Node  Votes Exp Sts  Name
   1    1    2   M   master.310.ru
   2    1    2   M   c4.310.ru
   3    1    2   M   c1.310.ru
   4    1    2   M   c5.310.ru
   5    1    2   M   c2.310.ru
   6    1    2   M   c3.310.ru
   7    1    2   M   c0.310.ru

Comment 3 David Teigland 2004-08-19 04:11:51 UTC
I recently fixed some things that may be related to this.  Could
you try this again with the latest code in CVS and let us know if it's
still a problem?

Comment 4 Kiersten (Kerri) Anderson 2004-11-04 15:15:31 UTC
Updated with the proper version and component name.

Comment 5 Corey Marthaler 2005-01-10 22:09:11 UTC
Could this bug and bz133420 be related?  

Comment 6 David Teigland 2005-01-11 02:16:40 UTC
Yes, it could be.

