Bug 382621 - gfs umount deadlock cman:kcl_leave_service
Status: CLOSED NOTABUG
Product: Red Hat Cluster Suite
Classification: Red Hat
Component: gfs
Version: 4
Hardware: All
OS: Linux
Priority: low
Severity: low
Assigned To: David Teigland
QA Contact: GFS Bugs
Reported: 2007-11-14 10:19 EST by Corey Marthaler
Modified: 2010-01-11 22:16 EST

Doc Type: Bug Fix
Last Closed: 2007-11-14 12:38:24 EST

Attachments
stack traces from grant-01 (99.16 KB, text/plain)
2007-11-14 10:31 EST, Corey Marthaler

Description Corey Marthaler 2007-11-14 10:19:57 EST
Description of problem:
Hit this while running mount_stress with 10 filesystems during 4.6 regression
testing. The mount tests actually finished, but one of the final "clean-up"
umounts ended up hanging.

[root@grant-01 ~]# ps -ef | grep mount
root     21400 21399  0 Nov13 ?        00:00:00 umount -f /mnt/LINK_1286
root     22154  4716  0 08:56 pts/0    00:00:00 grep mount

umount   D 0000090b435b5e79     0 21400  21399                     (NOTLB)
           00000101f1955d08 0000000000000006 ffffffff804c6b20 000000698035fc0c
           00000101ed851190 000000000000469b 00000101ebe133d0 00000101ed851478
           00000101fa0c41b0 00000101ed851478
Call Trace:
<ffffffff8035ffec>{wait_for_completion+312}
<ffffffff801355dd>{default_wake_function+0}
<ffffffff801355dd>{default_wake_function+0}
<ffffffffa027fc7b>{:cman:kcl_leave_service+243}
<ffffffffa029d1e0>{:dlm:release_lockspace+157}
<ffffffffa034a325>{:lock_dlm:release_gdlm+15}
<ffffffffa034ab3f>{:lock_dlm:lm_dlm_unmount+54}
<ffffffffa02bb383>{:lock_harness:lm_unmount+61}
<ffffffffa02dd7c2>{:gfs:gfs_lm_unmount+32}
<ffffffffa02ee543>{:gfs:gfs_put_super+787}
<ffffffff801965c7>{generic_shutdown_super+334}
<ffffffffa02eba74>{:gfs:gfs_kill_sb+41}
<ffffffff80196460>{deactivate_super+220}
<ffffffff801b6c09>{sys_umount+1822}
<ffffffff8019b9b7>{sys_newstat+17}
<ffffffff80111415>{error_exit+0}
<ffffffff80110a92>{system_call+126}

Version-Release number of selected component (if applicable):
This was on a UP kernel:
2.6.9-67.EL
GFS-kernel-2.6.9-75.9
Comment 1 Corey Marthaler 2007-11-14 10:27:18 EST
I wonder if this is somehow related to bz 290971?
Comment 2 Corey Marthaler 2007-11-14 10:31:10 EST
Created attachment 258181 [details]
stack traces from grant-01
Comment 3 Corey Marthaler 2007-11-14 10:32:02 EST
[root@grant-01 ~]# cman_tool nodes
Node  Votes Exp Sts  Name
   1    1    6   M   link-02
   2    1    6   M   grant-03
   3    1    6   M   grant-01
   4    1    6   M   grant-02
   5    1    6   M   link-07
   6    1    6   M   link-08
[root@grant-01 ~]# cman_tool services
Service          Name                              GID LID State     Code
Fence Domain:    "default"                           2   2 run       -
[1 3 5 6 4 2]

DLM Lock Space:  "clvmd"                             3   3 run       -
[1 3 5 6 4 2]

DLM Lock Space:  "LINK_1286"                       385 150 run       S-15,200,2
[3 2]

DLM Lock Space:  "LINK_1288"                       377 152 run       -
[3 2]

DLM Lock Space:  "LINK_1283"                       422 154 run       -
[3 2]

DLM Lock Space:  "LINK_1284"                       369 156 run       -
[3]

DLM Lock Space:  "LINK_1287"                       412 158 run       -
[3 4 2]

DLM Lock Space:  "LINK_1289"                       361 160 run       -
[3 2]

DLM Lock Space:  "LINK_1282"                       353 162 run       -
[3 4 2]

DLM Lock Space:  "LINK_1285"                       432 164 run       -
[3 2]

GFS Mount Group: "LINK_1288"                       381 153 run       -
[3 2]

GFS Mount Group: "LINK_1283"                       427 155 run       -
[3 2]

GFS Mount Group: "LINK_1284"                       373 157 run       -
[3]

GFS Mount Group: "LINK_1287"                       417 159 run       -
[3 4 2]

GFS Mount Group: "LINK_1289"                       365 161 run       -
[3 2]

GFS Mount Group: "LINK_1282"                       357 163 run       -
[3 4 2]

GFS Mount Group: "LINK_1285"                       437 165 run       -
[3 2]
Comment 4 Christine Caulfield 2007-11-14 10:36:24 EST
There's a possibility it might be related to bz 373671, I suppose. The sooner
we get that one acked and included, the happier I'll be about some of these odd hangs.
Comment 5 David Teigland 2007-11-14 10:52:51 EST
grant-03 would be just as interesting to inspect; can we still get data from
that node? In addition to the bug Patrick mentioned, there are a number of
other bugs that we found and fixed while doing mount/unmount stress tests for Nokia.
Comment 6 David Teigland 2007-11-14 11:27:10 EST
[root@grant-03 ~]# cman_tool services
Service          Name                              GID LID State     Code
Fence Domain:    "default"                           2   2 run       -
[1 2 3 4 5 6]

DLM Lock Space:  "clvmd"                             3   3 run       -
[1 2 3 4 5 6]

DLM Lock Space:  "LINK_1289"                       361  68 run       -
[2 3]

DLM Lock Space:  "LINK_1281"                       402  70 run       -
[2]

DLM Lock Space:  "LINK_1282"                       353  72 run       -
[2 3 4]

DLM Lock Space:  "LINK_1288"                       377  74 run       -
[2 3]

DLM Lock Space:  "LINK_1283"                       422  76 run       -
[2 3]

DLM Lock Space:  "LINK_1285"                       432  78 run       -
[2 3]

DLM Lock Space:  "LINK_1280"                       439  80 run       -
[2]

DLM Lock Space:  "LINK_1287"                       412  82 run       -
[2 3 4]

DLM Lock Space:  "LINK_1286"                       385  84 run       -
[2]

GFS Mount Group: "LINK_1289"                       365  69 run       -
[2 3]

GFS Mount Group: "LINK_1281"                       407  71 run       -
[2]

GFS Mount Group: "LINK_1282"                       357  73 run       -
[2 3 4]

GFS Mount Group: "LINK_1288"                       381  75 run       -
[2 3]

GFS Mount Group: "LINK_1283"                       427  77 run       -
[2 3]

GFS Mount Group: "LINK_1285"                       437  79 run       -
[2 3]

GFS Mount Group: "LINK_1280"                       441  81 run       -
[2]

GFS Mount Group: "LINK_1287"                       417  83 run       -
[2 3 4]

GFS Mount Group: "LINK_1286"                       389  85 run       -
[2]

Comment 7 David Teigland 2007-11-14 11:44:54 EST
Nov 13 19:53:04 grant-03 NET: /sbin/dhclient-script : updated /etc/resolv.conf
Nov 13 19:53:04 grant-03 kernel: ADDRCONF(NETDEV_UP): eth0: link is not ready
Nov 13 19:53:04 grant-03 dhclient: DHCPDISCOVER on eth0 to 255.255.255.255 port
67 interval 4
Nov 13 19:53:04 grant-03 dhclient: receive_packet failed on eth0: Network is down
Nov 13 19:53:06 grant-03 kernel: CMAN: sendmsg failed: -22
Nov 13 19:53:06 grant-03 kernel: SM: send_nodeid_message error -22 to 3
Nov 13 19:53:07 grant-03 kernel: CMAN: resend failed: -22
Nov 13 19:53:07 grant-03 kernel: tg3: eth0: Link is up at 1000 Mbps, full duplex.
Nov 13 19:53:07 grant-03 kernel: tg3: eth0: Flow control is off for TX and off
for RX.
Nov 13 19:53:07 grant-03 kernel: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
Comment 8 Corey Marthaler 2007-11-14 12:38:24 EST
Looks like the network was down during that umount attempt.
