Bug 127008
Summary: | assertion failure in dlm/lock.c "!error" | |
---|---|---|---
Product: | [Retired] Red Hat Cluster Suite | Reporter: | Corey Marthaler <cmarthal>
Component: | gfs | Assignee: | David Teigland <teigland>
Status: | CLOSED NOTABUG | QA Contact: | Cluster QE <mspqa-list>
Severity: | medium | Priority: | medium
Version: | 4 | CC: | ccaulfie, kagiso, kanderso
Hardware: | i686 | OS: | Linux
Doc Type: | Bug Fix | Last Closed: | 2005-11-15 16:48:58 UTC

Description
Corey Marthaler
2004-06-30 16:19:42 UTC
A ton of testing and fixes have gone in since this was reported and we've not seen it again. We should retry to be sure, but it's probably gone.

This is a duplicate of 127839, which I've just reproduced.

*** This bug has been marked as a duplicate of 127839 ***

Updating version to the right level in the defects. Sorry for the storm.

Created attachment 114177
log dump from cypher-01

I've just seen something that looks like this bug. Check out the attachment for details.

Running with 10 filesystems, I was getting this bug reliably after one or two rounds of revolver. After knocking the number down to 5, it seems to have gone away.

It appears that cman has shut down on this node, as evidenced by all the ENOTCONN and ENOBUFS errors the threads start getting in the dlm. When cman shuts down, it tells the dlm to shut down, which means all the dlm locks go away. So when lock_dlm tries to convert one of its locks, the lock isn't there, an error is returned, and lock_dlm panics. It's not always clear when cman shuts down, but you can use kdb to look for the normal cman threads: see if they exist and, if they do, check what they're doing. You can also look for cman log messages on the different nodes.

I am currently experiencing the same problem. I have a 6-node GFS cluster that exports NFS. One of the nodes died, and this is what I found in /var/log/messages:

Jul 28 09:34:35 jabbah kernel: lock_dlm: Assertion failed on line 428 of file /usr/src/build/762247-x86_64/BUILD/gfs-kernel-2.6.9-49/smp/src/dlm/lock.c
Jul 28 09:34:35 jabbah kernel: lock_dlm: assertion: "!error"
Jul 28 09:34:35 jabbah kernel: lock_dlm: time = 4574546183
Jul 28 09:34:35 jabbah kernel: gfs_mail: num=2,1f26f220 err=-22 cur=3 req=5 lkf=44
Jul 28 09:34:35 jabbah kernel:
Jul 28 09:34:35 jabbah kernel: ----------- [cut here ] --------- [please bite here ] ---------
Jul 28 09:34:35 jabbah kernel: Kernel BUG at lock:428
Jul 28 09:34:35 jabbah kernel: invalid operand: 0000 [1] SMP

I have read through the postings and cannot figure out what I should do to solve this. How can I prevent this from happening again?
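
When triaging a report like the one above, it can help to pull the fields out of the debug line that lock_dlm prints next to the assertion and to turn the `err` value into a symbolic errno name. The following is a minimal sketch, not part of any cluster tooling: the function name `parse_lock_dlm_line` and the regular expression are hypothetical, and the field layout is assumed from the single sample line in this report. The prefix before `num=` is treated as an opaque name, and `num` and `lkf` are kept as the hexadecimal-looking strings they appear to be, without further decoding.

```python
import errno
import re

# Sample line copied verbatim from the log excerpt in this report.
SAMPLE = ("Jul 28 09:34:35 jabbah kernel: gfs_mail: num=2,1f26f220 "
          "err=-22 cur=3 req=5 lkf=44")

# Assumed layout, inferred from the sample only:
#   <name>: num=<hex>,<hex> err=<int> cur=<int> req=<int> lkf=<hex>
PATTERN = re.compile(
    r"(?P<name>\S+): num=(?P<type>[0-9a-f]+),(?P<number>[0-9a-f]+) "
    r"err=(?P<err>-?\d+) cur=(?P<cur>-?\d+) req=(?P<req>-?\d+) "
    r"lkf=(?P<lkf>[0-9a-f]+)"
)


def parse_lock_dlm_line(line):
    """Return the fields of a lock_dlm debug line as a dict, or None."""
    match = PATTERN.search(line)
    if match is None:
        return None
    err = int(match.group("err"))
    # Kernel code reports errors as negative errno values, so negate
    # before looking up the symbolic name.
    if err < 0:
        err_name = errno.errorcode.get(-err, "unknown")
    else:
        err_name = "none"
    return {
        "name": match.group("name"),
        "num": (match.group("type"), match.group("number")),
        "err": err,
        "err_name": err_name,
        "cur": int(match.group("cur")),  # current lock mode (numeric)
        "req": int(match.group("req")),  # requested lock mode (numeric)
        "lkf": match.group("lkf"),       # lock flags, left undecoded
    }


if __name__ == "__main__":
    fields = parse_lock_dlm_line(SAMPLE)
    print(fields["name"], fields["err"], fields["err_name"])
```

For the sample line, `err=-22` maps to `EINVAL`, which fits the explanation given earlier that the conversion request fails because the lock no longer exists once the dlm has been shut down. The sketch only identifies the error; it does not show why cman shut down, which still has to be worked out from the cman log messages on the other nodes.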