Description of problem:
This new bug was opened at the request of swhiteho. See bz 210493. Fsck errors on a gfs2 volume, and a possible lock problem when copying files to the gfs2 volume.

Version-Release number of selected component (if applicable):
kernel-2.6.18-1.2798.fc6
gfs2-utils-0.1.7-1.fc6
cman-2.0.18-2.fc6

How reproducible:
Regular

Actual results:
fsck errors

Expected results:
clean filesystem

From post under 210493:
OK... Here is what I have done... Installed kernel-2.6.18-1.2798.fc6. Mounted a clean (newly formatted) volume on a cluster of three machines. Fsck says it is OK. Started a copy of 40GB of data to the new volume. Two times the copy process stopped (1st and 4th time, presumably due to some sort of lock). Unable to terminate the copy process. Tried dismounting the volume on another machine and the dismount would hang until the computer (the one doing the copy) was rebooted. The other two times the copy completed, but a fsck would generate errors. Some of the errors were:

Starting pass2
Block # referenced by directory entry .. is out of range Clearing ..
Block # referenced by directory entry .. is out of range Clearing ..
Block # referenced by directory entry .. is out of range Clearing ..
Block # referenced by directory entry .. is out of range Clearing ..
Block # referenced by directory entry . is out of range Clearing .
Block # referenced by directory entry . is out of range Clearing .

and

Starting pass1
Inode 1872239 (0x1c916f): Ondisk block count (1050643) does not match what fsck found (2067)
Inode 3902143 (0x3b8abf): Ondisk block count (525258) does not match what fsck found (1034)
Inode 4427608 (0x438f58): Ondisk block count (525258) does not match what fsck found (1034)
<--more delete-->

and lots of messages similar to:

Ondisk and fsck bitmaps differ at block 10415231 (0x9eec7f) Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free) Metadata type is 0 (free) Succeeded.
<--Lots more deleted-->

and also:

RG #10362148 (0x9e1d24) free count inconsistent: is 18 should be 52991
Can you specify the parameters used when creating the filesystem? Specifically, which locking protocol did you use?
OK...

mkfs.gfs2 -O -t fpcl01:vg00lv00 -p lock_dlm -j 8 /dev/fpcl01vg00/fpcl01vg00lv00

First test... Nothing should be writing to the volume other than the copy. Process was: mkfs new gfs2 volume... mount on all three nodes... copy 40GB to volume... dismount on all nodes... run gfs2_fsck... getting:

[root@spool5 /]# time gfs2_fsck -y /dev/fpcl01vg00/fpcl01vg00lv00
Initializing fsck
Clearing journals (this may take a while).... Journals cleared.
Starting pass1
Inode 199724591 (0xbe78e2f): Ondisk block count (1050643) does not match what fsck found (2067)
Inode 201754495 (0xc06877f): Ondisk block count (525258) does not match what fsck found (1034)
Inode 202279960 (0xc0e8c18): Ondisk block count (525258) does not match what fsck found (1034)
Inode 202805411 (0xc1690a3): Ondisk block count (525258) does not match what fsck found (1034)
Inode 203330885 (0xc1e9545): Ondisk block count (525258) does not match what fsck found (1034)
Inode 203856593 (0xc269ad1): Ondisk block count (929588) does not match what fsck found (1828)
Inode 204889469 (0xc365d7d): Ondisk block count (580952) does not match what fsck found (1144)
Inode 205470658 (0xc3f3bc2): Ondisk block count (1050643) does not match what fsck found (2067)
Inode 206635092 (0xc510054): Ondisk block count (580952) does not match what fsck found (1144)
Inode 207216557 (0xc59dfad): Ondisk block count (1050643) does not match what fsck found (2067)
Pass1 complete
Starting pass1b
Pass1b complete
Starting pass1c
Pass1c complete
Starting pass2
Block # referenced by directory entry .. is out of range Clearing ..
Block # referenced by directory entry .. is out of range Clearing ..
Block # referenced by directory entry .. is out of range Clearing ..
Block # referenced by directory entry .. is out of range Clearing ..
Block # referenced by directory entry . is out of range Clearing .
Block # referenced by directory entry . is out of range Clearing .
Block # referenced by directory entry . is out of range Clearing .
Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . 
is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . 
Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . 
is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . 
Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . 
is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . 
Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . 
is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . 
Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . is out of range Clearing . Block # referenced by directory entry . 
is out of range Clearing .
Block # referenced by directory entry . is out of range Clearing .
Block # referenced by directory entry . is out of range Clearing .
Block # referenced by directory entry . is out of range Clearing .
Block # referenced by directory entry . is out of range Clearing .
Pass2 complete
Starting pass3
Pass3 complete
Starting pass4
Pass4 complete
Starting pass5

Way too much to bother sending you, but it got to about 30% or so and then I am getting lots of:

Ondisk and fsck bitmaps differ at block 199746004 (0xbe7e1d4) Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free) Metadata type is 0 (free) Succeeded.
Ondisk and fsck bitmaps differ at block 199746005 (0xbe7e1d5) Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free) Metadata type is 0 (free) Succeeded.
Ondisk and fsck bitmaps differ at block 199746006 (0xbe7e1d6) Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free) Metadata type is 0 (free) Succeeded.
Ondisk and fsck bitmaps differ at block 199746007 (0xbe7e1d7) Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free) Metadata type is 0 (free) Succeeded.
Ondisk and fsck bitmaps differ at block 199746008 (0xbe7e1d8) Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free) Metadata type is 0 (free) Succeeded.
Ondisk and fsck bitmaps differ at block 199746009 (0xbe7e1d9) Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free) Metadata type is 0 (free) Succeeded.
Ondisk and fsck bitmaps differ at block 199746010 (0xbe7e1da) Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free) Metadata type is 0 (free) Succeeded.
Ondisk and fsck bitmaps differ at block 199746011 (0xbe7e1db) Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free) Metadata type is 0 (free) Succeeded.
Ondisk and fsck bitmaps differ at block 199746013 (0xbe7e1dd) Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free) Metadata type is 0 (free) Succeeded.
Ondisk and fsck bitmaps differ at block 199746014 (0xbe7e1de) Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free) Metadata type is 0 (free) Succeeded.
Ondisk and fsck bitmaps differ at block 199746015 (0xbe7e1df) Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free) Metadata type is 0 (free) Succeeded.
Ondisk and fsck bitmaps differ at block 199746016 (0xbe7e1e0) Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free) Metadata type is 0 (free) Succeeded.
Ondisk and fsck bitmaps differ at block 199746017 (0xbe7e1e1) Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free) Metadata type is 0 (free) Succeeded.
Ondisk and fsck bitmaps differ at block 199746018 (0xbe7e1e2) Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free) Metadata type is 0 (free) Succeeded.
Ondisk and fsck bitmaps differ at block 199746019 (0xbe7e1e3) Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free) Metadata type is 0 (free) Succeeded.
Ondisk and fsck bitmaps differ at block 199746020 (0xbe7e1e4) Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free) Metadata type is 0 (free) Succeeded.

If history holds, as with another one of these runs, it will take 4+ hours to finish now rather than the standard 30-45 minutes...
In case you are interested or makes a difference:

[root@spool5 ~]# time mkfs.gfs2 -O -t fpcl01:vg00lv00 -p lock_dlm -j 8 /dev/fpcl01vg00/fpcl01vg00lv00
Device: /dev/fpcl01vg00/fpcl01vg00lv00
Blocksize: 4096
Device Size 3019.94 GB (791658496 blocks)
Filesystem Size: 3019.94 GB (791658495 blocks)
Journals: 8
Resource Groups: 12080
Locking Protocol: "lock_dlm"
Lock Table: "fpcl01:vg00lv00"

real 0m36.959s
user 0m12.761s
sys 0m1.924s
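As a quick sanity check on the mkfs.gfs2 figures above, the reported size can be recomputed from the block count and block size. This is only a sketch: the constant and function names are made up for illustration, and the per-resource-group figure is a plain average, not a statement about how mkfs.gfs2 actually lays out resource groups.

```python
# Values copied from the mkfs.gfs2 output quoted above.
BLOCK_SIZE = 4096            # "Blocksize"
DEVICE_BLOCKS = 791658496    # block count shown next to "Device Size"
RESOURCE_GROUPS = 12080      # "Resource Groups"


def blocks_to_gib(blocks, block_size=BLOCK_SIZE):
    """Convert a block count to binary gigabytes (2**30 bytes)."""
    return blocks * block_size / 2**30


device_gib = blocks_to_gib(DEVICE_BLOCKS)

# Average only; the real resource groups need not all be the same size.
avg_blocks_per_rg = DEVICE_BLOCKS / RESOURCE_GROUPS
avg_rg_mib = avg_blocks_per_rg * BLOCK_SIZE / 2**20

print(f"device size: {device_gib:.2f} GiB")
print(f"average resource group: {avg_blocks_per_rg:.0f} blocks, {avg_rg_mib:.0f} MiB")
```

The result matches the "3019.94 GB" that mkfs.gfs2 prints, which suggests the tool is reporting binary gigabytes. The average resource group works out to roughly 65,535 blocks, so the "should be" free counts in the fsck output (52991, 58187) are at least in a plausible range for a single resource group.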
It finished ahead of expected time... here is the finish:

Ondisk and fsck bitmaps differ at block 208267582 (0xc69e93e) Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free) Metadata type is 0 (free) Succeeded.
Ondisk and fsck bitmaps differ at block 208267583 (0xc69e93f) Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free) Metadata type is 0 (free) Succeeded.
RG #208209294 (0xc69058e) free count inconsistent: is 18 should be 58187
Resource group counts updated
Pass5 complete
Writing changes to disk
gfs2_fsck complete

real 96m20.340s
user 52m46.562s
sys 9m26.135s
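Since the full gfs2_fsck output is too long to post, a small script can condense it into counts. The following is a rough sketch, not part of gfs2-utils: the regular expressions are written against the message shapes quoted in this report only, and `summarize_fsck` and `SAMPLE` are hypothetical names. Other gfs2_fsck versions may word these messages differently.

```python
import re
from collections import Counter

# Patterns modelled on the messages quoted in this report (an assumption:
# other gfs2_fsck versions may format them differently).
INODE_RE = re.compile(
    r"Inode (\d+) \(0x([0-9a-f]+)\): Ondisk block count \((\d+)\) "
    r"does not match what fsck found \((\d+)\)"
)
BITMAP_RE = re.compile(
    r"Ondisk and fsck bitmaps differ at block (\d+) \(0x([0-9a-f]+)\)"
)
DIRENT_RE = re.compile(
    r"Block # referenced by directory entry (\.\.?) is out of range"
)
RG_RE = re.compile(
    r"RG #(\d+) \(0x([0-9a-f]+)\) free count inconsistent: "
    r"is (\d+) should be (\d+)"
)


def summarize_fsck(text):
    """Condense gfs2_fsck output into counts and a few key values."""
    # Collapse all whitespace so messages wrapped across lines still match.
    text = " ".join(text.split())
    inodes = INODE_RE.findall(text)
    bitmaps = BITMAP_RE.findall(text)
    rgs = RG_RE.findall(text)
    # Cross-check that each decimal number agrees with its hex form.
    pairs = [(m[0], m[1]) for m in inodes + bitmaps + rgs]
    hex_mismatches = sum(1 for dec, hx in pairs if int(dec) != int(hx, 16))
    return {
        "inode_mismatches": [(int(n), int(a), int(b)) for n, _h, a, b in inodes],
        "dirent_out_of_range": dict(Counter(DIRENT_RE.findall(text))),
        "bitmap_differences": len(bitmaps),
        "rg_free_counts": [(int(n), int(a), int(b)) for n, _h, a, b in rgs],
        "hex_mismatches": hex_mismatches,
    }


# A few lines copied from the output in this report.
SAMPLE = """
Inode 199724591 (0xbe78e2f): Ondisk block count (1050643) does not match what fsck found (2067)
Block # referenced by directory entry .. is out of range Clearing ..
Block # referenced by directory entry . is out of range Clearing .
Ondisk and fsck bitmaps differ at block 208267582 (0xc69e93e) Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free) Metadata type is 0 (free) Succeeded.
Ondisk and fsck bitmaps differ at block 208267583 (0xc69e93f) Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free) Metadata type is 0 (free) Succeeded.
RG #208209294 (0xc69058e) free count inconsistent: is 18 should be 58187
"""

if __name__ == "__main__":
    for key, value in summarize_fsck(SAMPLE).items():
        print(key, value)
```

To use it on a real run, the output of gfs2_fsck could be saved to a file and passed to `summarize_fsck` as a string in place of `SAMPLE`.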
And... just did a new mkfs on the volume, mounted the volume on only the one node, started a copy, and watched I/O with vmstat... All I/O has ceased for the copy, and the copy command seems to be hung... I can leave it in this state for a little while if someone gets back to me and lets me know if there is anything they want me to try or do... It was about 3.5GB into the 40GB of data to copy...

ps ax |grep copy shows:

8558 pts/0 D+ 2:08 cp -ar /mnt/fpcl01vg01lv00/copy2 /mnt/fpcl01vg01lv00/etc /mnt/fpcl01vg01lv00/home /mnt/fpcl01vg01lv00....<too wide>
If it's still possible, it would be very useful to know which of the gfs2 daemons have hung and, if possible (it needs console access), to get backtraces with Alt-SysRq-t (you might have to set the right value in /proc/sys/kernel/sysrq to get that to work). That should give us some pointers as to what is causing the hang.
I didn't really see any indication of which daemon was hung (not obvious to me anyway). The computer has since been rebooted, but it does not appear to be hard to replicate (it just may take a couple of tries), so I will try to replicate it again. I'm not really a developer, so I haven't used the SysRq interface. I assume I put 1 into /proc/sys/kernel/sysrq. Do I just press Alt-SysRq-t and it dumps stuff to a file, or do I need to do/type something else? If I have an IP, I would also be willing to open a hole in the firewall for you to gain access to the machine...
You can either do Alt-SysRq-t or echo 't' >/proc/sysrq-trigger, which does the same thing. To see whether a daemon has hung, just run "ps aux" a few times; if it's stuck in 'D' all the time, then it's probably stuck. You should see 'R' or 'S' otherwise. If you want to see more about what you can get sysrq to do, there is a summary in the Documentation/sysrq.txt file in any recent set of kernel sources.
Got it...

[root@spool5 ~]# ps ax |grep D
PID TTY STAT TIME COMMAND
195 ? D 0:05 [pdflush]
4791 ? D< 0:16 [lock_dlm1]
4792 ? D< 0:18 [lock_dlm2]
4818 ? D< 0:00 [gfs2_logd]
4819 ? D< 0:00 [gfs2_quotad]
4821 pts/0 D+ 4:17 cp -ar /mnt/fpcl01vg01lv00/copy2 /mnt/fpcl01vg01lv00/etc /mnt/fpcl01vg01lv00/home /mnt/fpcl01vg01lv00/lost+found /mnt/fpcl01vg01lv00/save /mnt/fpcl01vg01lv00/tmp /mnt/fpcl01vg01lv00/usr /mnt/fpcl01vg01lv00/vmware .
5592 pts/2 S+ 0:00 grep D

I'll try getting back traces next...
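The "run ps a few times and see what stays in D" check can be sketched as a small script that avoids the false matches a plain grep for D gives. This is an illustration only: `d_state_pids` and `persistently_blocked` are hypothetical helper names, the first snapshot reuses rows from the output above (with the cp arguments shortened here for width), and the second snapshot is invented purely to show the comparison.

```python
def d_state_pids(ps_output):
    """Return {pid: command} for `ps ax` rows whose STAT starts with 'D'."""
    blocked = {}
    for line in ps_output.splitlines():
        # Columns: PID TTY STAT TIME COMMAND (the command may contain spaces).
        fields = line.split(None, 4)
        if len(fields) < 5 or not fields[0].isdigit():
            continue  # skip the header row and blank lines
        pid, _tty, stat, _time, command = fields
        if stat.startswith("D"):
            blocked[int(pid)] = command
    return blocked


def persistently_blocked(snapshots):
    """PIDs that are in D state in every one of the given ps snapshots."""
    pid_sets = [set(d_state_pids(s)) for s in snapshots]
    return set.intersection(*pid_sets) if pid_sets else set()


# Rows taken from the ps output above; cp arguments shortened for width.
SNAPSHOT_1 = """\
  PID TTY      STAT   TIME COMMAND
  195 ?        D      0:05 [pdflush]
 4791 ?        D<     0:16 [lock_dlm1]
 4792 ?        D<     0:18 [lock_dlm2]
 4818 ?        D<     0:00 [gfs2_logd]
 4819 ?        D<     0:00 [gfs2_quotad]
 4821 pts/0    D+     4:17 cp -ar /mnt/fpcl01vg01lv00/copy2 .
 5592 pts/2    S+     0:00 grep D
"""

# Hypothetical later snapshot (not from the report): pdflush has moved on.
SNAPSHOT_2 = """\
  PID TTY      STAT   TIME COMMAND
  195 ?        S      0:05 [pdflush]
 4791 ?        D<     0:16 [lock_dlm1]
 4792 ?        D<     0:18 [lock_dlm2]
 4818 ?        D<     0:00 [gfs2_logd]
 4819 ?        D<     0:00 [gfs2_quotad]
 4821 pts/0    D+     4:17 cp -ar /mnt/fpcl01vg01lv00/copy2 .
"""

if __name__ == "__main__":
    print(sorted(persistently_blocked([SNAPSHOT_1, SNAPSHOT_2])))
```

Filtering on the STAT column rather than grepping for the letter D keeps rows such as the grep process itself out of the result, and intersecting several snapshots separates processes that are briefly waiting on I/O from ones that never leave D.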
Created attachment 138994: backtrace after hang of copy on gfs2

Here you go... hope it helps.... let me know if it is useful...
It does look very useful, thanks for sending us the trace. So far as the hang goes, it looks like it's caused by a deadlock when a "droplocks" callback has been received. When the lock subsystem thinks it's running out of memory due to having a large number of locks cached, it sends one of these callbacks to the nodes. The nodes are supposed to respond by writing out any cached data and dropping the glocks on their least recently used inodes. It looks like this has in turn caused a writeout of dirty data (as it should), but the transaction code has deadlocked with it by asking for a glock.

I have a suspicion that if you mount with data=writeback (rather than the default data=journal) you will not see this deadlock. I'll have a think and see if I can figure out why this glock should get stuck.

It doesn't explain the messages from fsck though. I think they must have a different cause.
Hmmm... I don't have the volume mounted on any other nodes at the time, just the one (spool5 in this case), so I would think there really should not be any reason for the other nodes to write data or hold locks, but I'll leave that to you to decide before I make some stupid statement... The other nodes do have clvm enabled though, just no gfs2 volume mounted. I have a few things to do, but I might try it with clvm not loaded on the others, and also with data=writeback, a little later today. Do we need to split this bz into 2 separate bz cases, or should we leave it combined for now?
Created attachment 139038: gfs2 lock with data=writeback

Tried it with data=writeback if I did it right and it locked again... At the time I was doing a df to see how much had been written into certain directories but this may or may not have had anything to do with it...

ps ax |grep D:
PID TTY STAT TIME COMMAND
3024 ? D< 0:00 [lock_dlm1]
3064 pts/0 D+ 0:36 cp -ar /mnt/fpcl01vg01lv00/copy2 /mnt/fpcl01vg01lv00/etc /mnt/fpcl01vg01lv00/home /mnt/fpcl01vg01lv00/lost+found /mnt/fpcl01vg01lv00/save /mnt/fpcl01vg01lv00/tmp /mnt/fpcl01vg01lv00/usr /mnt/fpcl01vg01lv00/vmware .
3129 pts/2 D+ 0:06 du -s copy2 etc home lost+found save
3176 pts/3 S+ 0:00 grep D

mount:
/dev/fpcl01vg00/fpcl01vg00lv00 on /mnt/fpcl01vg00lv00 type gfs2 (rw,hostdata=jid=0:id=196609:first=1,data=writeback)
Created attachment 139040: Newer/better backtrace for last problem

Not exactly sure what happened on last attachment, but this one is more complete...
We think we know what the deadlock is here: it's a conflict between truncating an inode's pages and readpage. It's not related to the fsck errors, and we should have a fix fairly shortly now. I'll post a patch as soon as I have one.
I committed a fix to gfs2_fsck in the HEAD and RHEL5 branches of CVS so that it will handle this file system condition correctly. Therefore, I'm changing the status of this bugzilla to modified. The hang mentioned in previous comments is a separate issue and if it still needs attention, another bugzilla should be opened to track its progress.
I've just pushed the kernel change for the fsck/dirent problem into my -nmw git tree.
Fedora Core 5 and Fedora Core 6 are, as we're sure you've noticed, no longer test releases. We're cleaning up the bug database and making sure important bug reports filed against these test releases don't get lost. It would be helpful if you could test this issue with a released version of Fedora or with the latest development / test release. Thanks for your help and for your patience. [This is a bulk message for all open FC5/FC6 test release bugs. I'm adding myself to the CC list for each bug, so I'll see any comments you make after this and do my best to make sure every issue gets proper attention.]
I verified that this is fixed on the "Gold" version of FC6 and the initial release of RHEL5. Closing as CurrentRelease.