Bug 221729
| Summary: | Deadlock still in copy process with gfs2 volume | ||
|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | Gary Lindstrom <gplindstrom> |
| Component: | kernel | Assignee: | Steve Whitehouse <swhiteho> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Brian Brock <bbrock> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | ||
| Version: | 6 | CC: | swhiteho, wtogami |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | All | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | 2.6.20-1.2943 | Doc Type: | Bug Fix |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2007-04-11 20:23:28 UTC | Type: | --- |
|
Description
Gary Lindstrom
2007-01-06 23:40:55 UTC
Created attachment 144989 [details]
backtrace of copy process dead, but should not be as it has not completed...
The FC6 kernel is still rather behind, I'm afraid. The upstream -git tree and the RHEL5 (beta) kernels are the most up to date with regard to bug fixes. Russell is still looking into the deadlock and I'll try to have another look at it too as soon as I can.

Thanks Steve... I had a couple of FC6 kernel updates come through and thought that maybe they were supposed to be in there. I also wasn't sure if these were in-progress/fixed/on-hold because I had (wrongly) combined a couple of problems in the same BZ.... Probably the ACL problem is in the same category... As you can see I took some of the EL5 Beta2 updates and used them on a fresh fc6 install, so maybe I'll try the kernel too... Thanks Again!

It is possible this is the same problem, hard to say at this time. https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=217356 Since running anything beyond a simple dd will result in a gfs2 deadlock it might all be related? Or it might not be?

I haven't made any progress on 217356.

Created attachment 145119 [details]
Another backtrace
Same problem with the EL5 Beta2 kernel... Attaching another backtrace in case
it is useful...
Created attachment 145223 [details]
Backtrace with development kernel
Got this idea to try the latest development kernel... same problem, but the
backtrace shows the locks being held... maybe it will help lead to the
problem... so basically I have a clean fc6, with the previously referenced el5
updates for gfs and clustering, and the development kernel... The following is
the sequence used to reproduce the problem, and I am attaching the corresponding
output from a backtrace:
[root@spool7 ~]# time mkfs.gfs2 -r 2048 -j 16 -p lock_dlm -t fpcl01:vg00lv00 /dev/mapper/fpcl01vg00-fpcl01vg00lv00
This will destroy any data on /dev/mapper/fpcl01vg00-fpcl01vg00lv00.
It appears to contain a gfs2 filesystem.
Are you sure you want to proceed? [y/n] y
Device: /dev/mapper/fpcl01vg00-fpcl01vg00lv00
Blocksize: 4096
Device Size 3019.94 GB (791658496 blocks)
Filesystem Size: 3019.94 GB (791658496 blocks)
Journals: 16
Resource Groups: 1510
Locking Protocol: "lock_dlm"
Lock Table: "fpcl01:vg00lv00"
real 2m1.198s
user 1m23.715s
sys 0m4.904s
[root@spool7 ~]#
[root@spool7 ~]# mount -t ext3 -r -o defaults /dev/mapper/fpcl01vg01-fpcl01vg01lv00 /mnt/fpcl01vg01lv00
[root@spool7 ~]# mount -t gfs2 -o defaults /dev/mapper/fpcl01vg00-fpcl01vg00lv00 /mnt/fpcl01vg00lv00
[root@spool7 ~]# df
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
15109112 3271208 11058028 23% /
/dev/cciss/c0d0p1 101086 25993 69874 28% /boot
tmpfs 1031512 0 1031512 0% /dev/shm
/dev/mapper/fpcl01vg01-fpcl01vg01lv00
420080288 61923572 336817788 16% /mnt/fpcl01vg01lv00
/dev/mapper/fpcl01vg00-fpcl01vg00lv00
3166434592 542192 3165892400 1% /mnt/fpcl01vg00lv00
[root@spool7 ~]# cd //mnt/fpcl01vg00lv00
[root@spool7 fpcl01vg00lv00]# cp -ax /mnt/fpcl01vg01lv00 .
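As a sanity check on the mkfs.gfs2 output in the session above, the reported sizes can be reproduced from the block count. This is only a sketch: the block size, block count and `-r 2048` value are copied from the session, while the assumption that the resource group count is roughly the device size divided by the resource group size (rounded up) is ours and may not match how mkfs.gfs2 actually computes it.

```python
# Values copied from the mkfs.gfs2 session in this bug report.
BLOCK_SIZE = 4096        # bytes ("Blocksize")
BLOCKS = 791658496       # blocks ("Device Size")
RG_SIZE_MB = 2048        # the "-r 2048" argument

GIB = 1024 ** 3
MIB = 1024 ** 2


def device_size_gib(blocks: int, block_size: int) -> float:
    """Device size in binary gigabytes (what the tool labels "GB")."""
    return blocks * block_size / GIB


def approx_resource_groups(blocks: int, block_size: int, rg_size_mb: int) -> int:
    """Assumption: one resource group per rg_size_mb of device, rounded up."""
    total_bytes = blocks * block_size
    rg_bytes = rg_size_mb * MIB
    return -(-total_bytes // rg_bytes)  # ceiling division


if __name__ == "__main__":
    print(f"Device size: {device_size_gib(BLOCKS, BLOCK_SIZE):.2f} GB")
    print(f"Approx. resource groups: "
          f"{approx_resource_groups(BLOCKS, BLOCK_SIZE, RG_SIZE_MB)}")
```

Under these assumptions the numbers agree with the "3019.94 GB" and "Resource Groups: 1510" lines in the transcript, so the filesystem was created with the geometry the command line asked for.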
I'm reassigning this to Patrick on the basis of the latest dmesg in comment #7, which appears to implicate the DLM. Patrick, if you are able to rule out the DLM, then please reassign back to one of us.

I'm not sure whether the messages relating to the DLM are accurate or not. The sock_sem that is locked in the middle of accept_from_sock will always be a different one from the one that is locked at the start of that function, though the message implies it's the same. I have emailed Ingo to see if lockdep could simply be getting confused and leading us up a blind alley.

Created attachment 145576 [details]
patch to fix lockdep annotations
I think the lockdep warnings are spurious. To confirm this, could you please
apply the attached patch? If this gets rid of the warnings, then it seems
likely that the DLM lowcomms is not the culprit.
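To illustrate why such a warning can be spurious: lockdep tracks locks by class rather than by instance, so taking two different sock_sem instances in a nested fashion can look like recursive locking on one lock. The toy model below is a sketch of that idea only. `ToyLockdep` and `simulate_accept` are hypothetical names, the warning text is made up, and the "subclass" parameter merely mimics the kernel's `*_nested` locking variants; whether the attached patch uses that mechanism is an assumption.

```python
class ToyLockdep:
    """Toy validator that tracks held locks by (class, subclass), not instance."""

    def __init__(self):
        self.held = []       # stack of (lock_class, subclass) currently held
        self.warnings = []   # messages emitted so far

    def acquire(self, lock_class, subclass=0):
        key = (lock_class, subclass)
        if key in self.held:
            # Same class and subclass already held: looks recursive to the
            # validator even when the two lock instances are different.
            self.warnings.append(f"possible recursive locking: {lock_class}")
        self.held.append(key)

    def release(self, lock_class, subclass=0):
        self.held.remove((lock_class, subclass))


def simulate_accept(annotated):
    """Model an accept path that nests two different sock_sem instances."""
    dep = ToyLockdep()
    dep.acquire("sock_sem")                  # listening connection's lock
    sub = 1 if annotated else 0
    dep.acquire("sock_sem", subclass=sub)    # new connection's lock
    dep.release("sock_sem", subclass=sub)
    dep.release("sock_sem")
    return dep.warnings


if __name__ == "__main__":
    print("unannotated:", simulate_accept(annotated=False))
    print("annotated:  ", simulate_accept(annotated=True))
```

In this model the unannotated run reports one warning and the annotated run reports none, even though the locking behaviour is identical in both cases. That is the sense in which an annotation-only patch can separate a false positive from a real DLM problem.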
Created attachment 146066 [details]
New backtrace showing different things
Upgraded to kernel-2.6.19-1.2895.fc6 - same results, copy process deadlocks -
maybe some additional info/clues in this dmesg with backtrace? There is a
fatal assertion right before the backtrace...
GFS2 has several lock ordering problems right now that result in ABBA-type deadlock situations. The trace you have attached does not look familiar, so it is hard to say if it is the same problem as the one I am chasing. It's unclear how long it is going to take to find all the lock ordering issues, but it could potentially be quite a while.

This should be fixed in the latest upstream kernel and also the latest FC-6 kernel. Please let me know if this is still a problem.

Created attachment 150197 [details]
New messages file with backtrace.... sorry
You guys are gonna hate me... :( I can still make it deadlock... sorry. All
three machines in the cluster are fc6 with all the latest updates, specifically
kernel-2.6.20-1.2925.fc6. Attached is the messages file from the latest boot
with a backtrace that I generated after the deadlock. Let me know if you need
more info.
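For readers unfamiliar with the ABBA-type deadlocks mentioned earlier in this bug, here is a minimal user-space sketch of the pattern. It is not GFS2 code: the two locks, the `worker` helper and the timeout are all illustrative assumptions. One thread takes lock A then B while the other takes B then A; a timeout is used so the demo reports the deadlock instead of hanging the way the kernel copy process does.

```python
import threading


def worker(first, second, sync, results, name):
    """Take `first`, then try `second`; record whether the second was acquired."""
    with first:
        if sync is not None:
            sync.wait()                     # both threads now hold their first lock
        got = second.acquire(timeout=0.2)   # a real deadlock would block forever
        results[name] = got
        if got:
            second.release()
        if sync is not None:
            sync.wait()                     # keep `first` held until both attempts end


def run(order_t1, order_t2, sync):
    results = {}
    threads = [
        threading.Thread(target=worker, args=(*order_t1, sync, results, "t1")),
        threading.Thread(target=worker, args=(*order_t2, sync, results, "t2")),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


def abba_demo():
    """t1 locks A then B, t2 locks B then A: neither gets its second lock."""
    a, b = threading.Lock(), threading.Lock()
    return run((a, b), (b, a), threading.Barrier(2))


def ordered_demo():
    """Both threads lock A then B: the consistent order cannot deadlock."""
    a, b = threading.Lock(), threading.Lock()
    return run((a, b), (a, b), None)


if __name__ == "__main__":
    print("ABBA order:      ", abba_demo())
    print("consistent order:", ordered_demo())
```

The usual cure, and presumably what fixing the "lock ordering issues" amounts to, is making every code path take the locks in the same order, as `ordered_demo` does.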
That's fixed upstream, but it hasn't made it into FC yet. I really didn't expect you to run into that as well :( I'll let you know when that has made it into FC-6 too.

It should be in FC-6 by now: kernel-2.6.20-1.2943 or later. |