Well, I'll open another bug report since 221729 was closed. Maybe this is a
different problem than the last deadlocks...???
Steve, like I said in another report, you're gonna hate me. Been busy and hadn't
tried recently, but I got some time this weekend to try this again, and I can
still deadlock gfs2. Upgraded all 3 machines in cluster to latest kernel and
updates. Kernel is: 2.6.20-1.2944.fc6.
I am attaching 3 backtraces, one from each machine in the cluster. I had a copy
from an ext3 to gfs2 partition running on spool7, a copy from an ocfs to the
same gfs2 partition (to a different directory structure), and ran a 'df' command
on virtual1b. All 3 machines were deadlocked after a few minutes. Not positive
but I think it deadlocked on spool8 first...
Sorry.... :( If you need more info, let me know.
Created attachment 152632
messages file with backtrace from spool7
Created attachment 152633
messages file with backtrace from spool8
Created attachment 152634
messages file with backtrace from virtual1b
Ummm... let me amend the first comment... I did a directory list on virtual1b
that hung, not a df command...
This looks just like bz #231910, which has a fix. However, 231910 is a RHEL bug.
I'm not sure how Steve is handling bugs with respect to the differences between
RHEL and Fedora. If he needs a Fedora version of that bug for tracking
purposes, then this one will do fine. But at any rate, there is a solution to
this problem which will make it upstream shortly.
Looks like this is a Fedora build issue then. Reassigning to Chris Feist.
Re-assigning to Steve Whitehouse as he provides kernel patches for the fedora
I'll try and sort this out now that the latest upstream patches have been
accepted by Linus.
The patches have now been sent for both FC5/6 and FC7 so I'm just waiting to
find out which version of the kernel RPM they'll appear in.
Still waiting on FC5/6, but it's in FC7 (pre-release) now and also in the current
rawhide devel kernel. Also fixed upstream.
For FC5/6 that will be kernel 2952, which is committed and will be built shortly.