Description of problem: I started up a load of I/O to run on my cluster over the weekend and after about 5 hours morph-04 tripped this assertion. The following tests were running: growfiles, doio/iogen, genesis, and accordion.

dlm: gfs2: dir entry exists ba009b fr 2 r 0 7 5413c0e
dlm: gfs2: dir entry exists d10241 fr 2 r 0 7 5424557
dlm: gfs2: dir entry exists c50160 fr 2 r 0 7 5413c0e
dlm: gfs2: dir entry exists bc00b6 fr 6 r 0 7 54b3fd3
dlm: gfs2: dir entry exists aa0068 fr 2 r 0 7 54e3f7a
dlm: gfs2: dir entry exists c9027a fr 2 r 0 7 54c3fa3
dlm: gfs2: dir entry exists b801a5 fr 2 r 0 7 54f3f66
dlm: gfs2: dir entry exists c00383 fr 5 r 0 7 5413c10
dlm: gfs2: dir entry exists d1003d fr 2 r 0 7 54a3fd5
dlm: gfs2: dir entry exists bf03b6 fr 2 r 0 7 5424557
dlm: gfs2: dir entry exists c10247 fr 2 r 0 7 5424557
dlm: gfs2: dir entry exists be01d1 fr 2 r 0 7 5493ff1
dlm: gfs2: dir entry exists cb025d fr 2 r 0 7 54d3fa3
dlm: gfs0: dir entry exists 9c00a7 fr 2 r 0 7 5413c0e
dlm: gfs1: dir entry exists b002ac fr 6 r 0 7 5413c0d
dlm: gfs1: dir entry exists b401e4 fr 6 r 0 7 5424085
dlm: gfs2: dir entry exists be008e fr 2 r 0 7 5413c0e
dlm: gfs1: dir entry exists b001c8 fr 6 r 0 7 5413c0d
dlm: gfs0: dir entry exists 9503c6 fr 2 r 0 7 5413c0e

GFS: Assertion failed on line 177 of file /usr/src/cluster/gfs-kernel/src/gfs/ops_address.c
GFS: assertion: "gfs_glock_is_held_excl(ip->i_gl) && !gfs_is_stuffed(ip)"
GFS: time = 1091844982
GFS: fsid=morph-cluster:gfs2.3: inode = 88621095/88621095
Kernel panic: GFS: Record message above and reboot.

How reproducible: Didn't try
This exact assertion had also tripped on morph-03 about 3 hours prior to the assertion tripping on morph-04.

Aug 6 18:09:59 morph-03 kernel: dlm: gfs1: dir entry exists 3a013e fr 6 r 0 7 37fa9a2
Aug 6 18:12:08 morph-03 kernel: dlm: gfs0: dir entry exists 3b0017 fr 4 r 0 7 37fa9a7
Aug 6 18:13:56 morph-03 kernel: dlm: gfs2: dir entry exists 3f0168 fr 5 r 0 7 382a956
Aug 6 18:14:17 morph-03 kernel: dlm: gfs0: dir entry exists 390231 fr 4 r 0 7 37fa9a7
Aug 6 18:16:45 morph-03 kernel: dlm: gfs0: dir entry exists 3501d7 fr 2 r 0 7 38bfd59
Aug 6 18:17:40 morph-03 kernel:
Aug 6 18:17:40 morph-03 kernel: GFS: Assertion failed on line 177 of file /usr/src/cluster/gfs-kernel/src/gfs/ops_address.c
Aug 6 18:17:40 morph-03 kernel: GFS: assertion: "gfs_glock_is_held_excl(ip->i_gl) && !gfs_is_stuffed(ip)"
Aug 6 18:17:40 morph-03 kernel: GFS: time = 1091834260
Aug 6 18:17:40 morph-03 kernel: GFS: fsid=morph-cluster:gfs2.2: inode = 59615474/59615474
Aug 6 18:17:40 morph-03 kernel:
Aug 6 18:17:40 morph-03 kernel: Kernel panic: GFS: Record message above and reboot.
I was able to reproduce this after about 1 hour and 20 minutes while running the exact same I/O load
do you know what options were used with the test programs? I have examples to copy for all but growfiles.
Here are the cmdlines I was using (sorry about not posting them earlier):

genesis:    genesis -n 500 -d 50 -p 3
accordion:  accordion -p 3 accrdfile1 accrdfile2 accrdfile3 accrdfile4 accrdfile5
growfiles:  growfiles -i 0 -N 500 -n 3 -b
iogen:      iogen -f buffered -m sequential -s read,write,readv,writev -t 1b -T 100000b 100000b:rwbuflarge | doio -avk
iogen:      iogen -f sync -m sequential -s read,write,readv,writev -t 1b -T 100000b 100000b:rwsynclarge | doio -avk
I managed to run all of these on 8 nodes for about 2.5 hours until growfiles died due to a filled-up fs and several genesis processes died of a bus error. I plan to try this again, adding some output from the tests so I can see if/how each of the tests is progressing.
FYI, I hit this on Sunday evening (9/12) after having started the load on Friday (9/10) evening.
I reproduced this running "genesis -n 400 -d 50 -p2" on three bench (UP) nodes. Eight nodes were actually mounted. Only one genesis process was still running on each node by the time the assert was hit (it seems all but one genesis process will usually die with a "too many open files" error.) Trying to reproduce again to see if the kdb backtraces are the same.
Thanks for the genesis bug report, Dave -- I missed a close() in unlink_file() in the case where another process unlinked the file before we could.
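For reference, a minimal sketch of the kind of descriptor leak being described (function name and control flow are hypothetical -- this is not the actual genesis source). The point is that the fd has to be closed even on the path where another process wins the unlink race; otherwise each lost race leaks a descriptor until open() starts failing with "too many open files":

#include <fcntl.h>
#include <unistd.h>

/* Hypothetical sketch of unlink_file(), not the real genesis code. */
static int unlink_file(const char *path)
{
        int fd = open(path, O_RDWR);
        if (fd < 0)
                return -1;              /* file already gone, nothing to do */

        if (unlink(path) < 0) {
                /* Another process unlinked the file before we could.  The bug
                 * was returning here without this close(), leaking fd. */
                close(fd);
                return 0;
        }

        close(fd);
        return 0;
}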
Updating with the proper version and component name.
hit this again over the weekend:

Kernel panic - not syncing: GFS: Assertion failed on line 177 of file /usr/src/cluster/gfs-kernel/src/gfs/ops_address.c
GFS: assertion: "!gfs_is_stuffed(ip)"
GFS: time = 1100314782
GFS: fsid=morph-cluster:corey3.2: inode = 54246995/54246995
Seems there's a fair chance this is a gfs problem based on the modified assertion. Wonder if Ken has any idea.
appeared to have this again, this time on morph-01, while doing recovery after having fenced another node:

from morph-04:
Nov 25 07:32:03 morph-04 kernel: CMAN: no HELLO from morph-02, removing from the cluster
Nov 25 07:32:04 morph-04 fenced[3639]: fencing deferred to 1

from morph-01:
Nov 25 07:32:11 morph-01 fenced[3941]: fencing node "morph-02"
Nov 25 07:32:15 morph-01 fenced[3941]: fence "morph-02" success
. . .
Nov 25 07:32:26:
GFS: fsid=morph-cluster:corey1.0: jid=1: Acquiring the transaction lock...
GFS: fsid=morph-cluster:corey1.0: jid=1: Replaying journal...
GFS: fsid=morph-cluster:corey1.0: jid=1: Replayed 40 of 108 blocks
GFS: fsid=morph-cluster:corey1.0: jid=1: replays = 40, skips = 15, sames = 53
GFS: fsid=morph-cluster:corey1.0: jid=1: Journal replayed in 6s
GFS: fsid=morph-cluster:corey1.0: jid=1: Done

Kernel panic - not syncing: GFS: Assertion failed on line 180 of file /usr/src/cluster/gfs-kernel/src/gfs/ops_address.c
GFS: assertion: "!gfs_is_stuffed(ip)"
GFS: time = 1101402396
GFS: fsid=morph-cluster:corey1.0: inode = 13172669/13172669
Update: hit this again last night on one node in the morph cluster after about 20 - 40 minutes of running the I/O load outlined above on all nodes. I re-initialized my RAID with the "bad" drives pulled out, so it should be "good" hardware now. After the I/O starts, about every 5 - 10 minutes there are qla2300 SCSI messages, "kernel: qla2300 0000:02:01.0: scsi(0:0:0:1): Queue depth adjusted", which shouldn't really affect anything. Ten minutes after the last of those messages, I saw this assert. I/O starts around 18:25.

morph-03:
Jan 17 18:36:49 morph-03 kernel: qla2300 0000:02:01.0: scsi(0:0:0:1): Queue depth adjusted to 30.
Jan 17 18:42:31 morph-03 kernel: qla2300 0000:02:01.0: scsi(0:0:0:1): Queue depth adjusted to 28.
Jan 17 18:55:10 morph-03 kernel: qla2300 0000:02:01.0: scsi(0:0:0:4): Queue depth adjusted to 30.
Jan 17 19:05:21 morph-03 kernel: GFS: fsid=morph-cluster:gfs1.2: fatal: assertion "!gfs_is_stuffed(ip)" failed
Jan 17 19:05:21 morph-03 kernel: GFS: fsid=morph-cluster:gfs1.2: function = gfs_writepage
Jan 17 19:05:21 morph-03 kernel: GFS: fsid=morph-cluster:gfs1.2: file = /usr/src/build/505649-i686/BUILD/smp/src/gfs/ops_address.c, line = 183
Jan 17 19:05:21 morph-03 kernel: GFS: fsid=morph-cluster:gfs1.2: time = 1106010321
Jan 17 19:05:21 morph-03 kernel: GFS: fsid=morph-cluster:gfs1.2: about to withdraw from the cluster
Jan 17 19:05:21 morph-03 kernel: GFS: fsid=morph-cluster:gfs1.2: waiting for outstanding I/O
Jan 17 19:05:22 morph-03 kernel: GFS: fsid=morph-cluster:gfs1.2: warning: assertion "!buffer_busy(bh)" failed
Jan 17 19:05:22 morph-03 kernel: GFS: fsid=morph-cluster:gfs1.2: function = gfs_logbh_uninit
Jan 17 19:05:22 morph-03 kernel: GFS: fsid=morph-cluster:gfs1.2: file = /usr/src/build/505649-i686/BUILD/smp/src/gfs/dio.c, line = 882
Jan 17 19:05:22 morph-03 kernel: GFS: fsid=morph-cluster:gfs1.2: time = 1106010321

From then on, there are Buffer I/O errors until morph-03 gets shot, which either caused this problem or was a result of it. If the Buffer I/O errors are what caused this, and the assertion messages just happened to get printed out before the I/O errors, then this (along with a few other bugs) is probably a result of having a cluster with both qla2300 and lpfc drivers. Again, the hardware should not be "flakey" anymore, and the rest of the nodes ran their I/O load all night long without problems. I'll run more tests with and without a mixed FC driver cluster to attempt to figure this out.
I hit this again last night without any SCSI or Buffer I/O errors, and according to this bug report Dave had also hit it earlier. The result can be a filesystem which appears corrupted:

[root@morph-05 root]# ls /mnt/gfs3
ls: /mnt/gfs3: Input/output error

However, a reboot and remount does allow I/O once again.

Jan 27 20:27:46 morph-01 kernel: GFS: fsid=morph-cluster:gfs3.0: fatal: assertion "!gfs_is_stuffed(ip)" failed
Jan 27 20:27:46 morph-01 kernel: GFS: fsid=morph-cluster:gfs3.0: function = gfs_writepage
Jan 27 20:27:46 morph-01 kernel: GFS: fsid=morph-cluster:gfs3.0: file = /usr/src/build/511355-i686/BUILD/smp/src/gfs/ops_address.c, line = 183
Jan 27 20:27:46 morph-01 kernel: GFS: fsid=morph-cluster:gfs3.0: time = 1106879266
Jan 27 20:27:46 morph-01 kernel: GFS: fsid=morph-cluster:gfs3.0: about to withdraw from the cluster
Jan 27 20:27:46 morph-01 kernel: GFS: fsid=morph-cluster:gfs3.0: waiting for outstanding I/O
hit this again last night on multiple nodes:

Jan 31 18:03:10 morph-04 kernel: GFS: Trying to join cluster "lock_dlm", "morph-cluster:gfs4"
Jan 31 18:03:12 morph-04 kernel: GFS: fsid=morph-cluster:gfs4.3: Joined cluster. Now mounting FS...
Jan 31 18:03:12 morph-04 kernel: GFS: fsid=morph-cluster:gfs4.3: jid=3: Trying to acquire journal lock...
Jan 31 18:03:12 morph-04 kernel: GFS: fsid=morph-cluster:gfs4.3: jid=3: Looking at journal...
Jan 31 18:03:12 morph-04 kernel: GFS: fsid=morph-cluster:gfs4.3: jid=3: Done
Jan 31 18:50:45 morph-04 kernel: GFS: fsid=morph-cluster:gfs4.3: jid=2: Trying to acquire journal lock...
Jan 31 18:50:45 morph-04 kernel: GFS: fsid=morph-cluster:gfs4.3: jid=2: Busy
Jan 31 19:34:48 morph-04 kernel: GFS: fsid=morph-cluster:gfs4.3: jid=0: Trying to acquire journal lock...
Jan 31 19:34:48 morph-04 kernel: GFS: fsid=morph-cluster:gfs4.3: jid=0: Busy
Jan 31 21:13:04 morph-04 kernel: GFS: fsid=morph-cluster:gfs2.3: jid=2: Trying to acquire journal lock...
Jan 31 21:13:05 morph-04 kernel: GFS: fsid=morph-cluster:gfs2.3: jid=2: Busy
Jan 31 23:50:19 morph-04 kernel: GFS: fsid=morph-cluster:gfs2.3: jid=4: Trying to acquire journal lock...
Jan 31 23:50:19 morph-04 kernel: GFS: fsid=morph-cluster:gfs2.3: jid=4: Busy
Feb 1 06:09:32 morph-04 kernel: GFS: fsid=morph-cluster:gfs3.3: fatal: assertion "!gfs_is_stuffed(ip)" failed
Feb 1 06:09:32 morph-04 kernel: GFS: fsid=morph-cluster:gfs3.3: function = gfs_writepage
Feb 1 06:09:32 morph-04 kernel: GFS: fsid=morph-cluster:gfs3.3: file = /usr/src/build/512195-i686/BUILD/smp/src/gfs/ops_address.c, line = 183
Feb 1 06:09:32 morph-04 kernel: GFS: fsid=morph-cluster:gfs3.3: time = 1107259772
Feb 1 06:09:32 morph-04 kernel: GFS: fsid=morph-cluster:gfs3.3: about to withdraw from the cluster
Feb 1 06:09:32 morph-04 kernel: GFS: fsid=morph-cluster:gfs3.3: waiting for outstanding I/O
Feb 1 06:09:32 morph-04 kernel: GFS: fsid=morph-cluster:gfs3.3: telling LM to withdraw
If this is so easy for *you* to reproduce, how about KDB backtraces?
adding to blocker list.
need kdb bta
This may be causing corruption on the filesystems. gfs_fsck output after this assertion was seen:

. . .
Found directory entry 'pniwgvssqlsfkiwtglcuouak' in 65178873 to something not a file or directory!
Directory entry 'pniwgvssqlsfkiwtglcuouak' cleared
Found directory entry 'tfhjlmemsnuitmhotererwhs' in 65178873 to something not a file or directory!
Directory entry 'tfhjlmemsnuitmhotererwhs' cleared
Entries is 11 - should be 5 for 65178873
. . .
ondisk and fsck bitmaps differ at block 86806053
Succeeded.
ondisk and fsck bitmaps differ at block 86806054
Succeeded.
ondisk and fsck bitmaps differ at block 86806055
Succeeded.
. . .
Ondisk block count does not match what fsck found for inode 21690814
Bad block count fixed
Ondisk block count does not match what fsck found for inode 21690828
Bad block count fixed
Checking metadata in Resource Group 332
Ondisk block count does not match what fsck found for inode 21756383
Bad block count fixed
. . .
moving back to NEEDINFO.
Created attachment 111858 [details]
kdb output (ps, bt, bta)

This is from the va nodes.
One of genesis's actions is to shrink a file; it does this with SW (shared-writable) mmaps. More importantly, it does an ftruncate() on a file that is SW-mmapped. I think this is where the problem arises. Still digging, though.
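For context, a minimal user-space sketch of that pattern (mount point and sizes made up, error checking omitted). Touching a mapped page that lies past the new end of file after the shrink is exactly the kind of access the kernel answers with SIGBUS:

#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
        long pg = sysconf(_SC_PAGESIZE);
        int fd = open("/mnt/gfs1/shrinkme", O_RDWR | O_CREAT, 0644);

        ftruncate(fd, 2 * pg);                            /* two pages of file data */
        char *map = mmap(NULL, 2 * pg, PROT_READ | PROT_WRITE,
                         MAP_SHARED, fd, 0);              /* shared-writable map    */

        ftruncate(fd, pg);                                /* shrink under the map   */
        map[pg] = 'x';                                    /* past new EOF -> SIGBUS */

        munmap(map, 2 * pg);
        close(fd);
        return 0;
}

So if genesis touches the mapped region beyond the new size after its ftruncate(), that alone would produce a SIGBUS.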
For whatever it is worth, for me, genesis won't run for more than a few minutes before dying from a SIGBUS.
Fiddled with the genesis code and removed the ftruncate(). Still got SIGBUS.
Observation: when you try to memcpy() into an mmapped file on a full filesystem, you get SIGBUS...
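A minimal sketch of that observation (mount point made up, error checking omitted). With a MAP_SHARED mapping, a failure to allocate the backing blocks when a page is dirtied is reported to the process as SIGBUS rather than as an errno from write(), which would match how genesis is dying on the full filesystem:

#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
        /* Extend the file without writing, so no blocks are allocated yet. */
        int fd = open("/mnt/gfs-full/sparsefile", O_RDWR | O_CREAT, 0644);
        ftruncate(fd, 1 << 20);

        char *map = mmap(NULL, 1 << 20, PROT_READ | PROT_WRITE,
                         MAP_SHARED, fd, 0);

        char buf[4096];
        memset(buf, 'a', sizeof(buf));

        /* On a full filesystem the copy dirties pages whose backing blocks
         * cannot be allocated, and the kernel delivers SIGBUS to the process. */
        memcpy(map, buf, sizeof(buf));

        munmap(map, 1 << 20);
        close(fd);
        return 0;
}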
Must take the "tester's touch." :)  Hit it last night.

Apr 1 02:25:42 tank-05 kernel: GFS: fsid=tank-cluster:gfs1.4: fatal: assertion "!gfs_is_stuffed(ip)" failed
Apr 1 02:25:42 tank-05 kernel: GFS: fsid=tank-cluster:gfs1.4: function = gfs_writepage
Apr 1 02:25:42 tank-05 kernel: GFS: fsid=tank-cluster:gfs1.4: file = /usr/src/build/548215-i686/BUILD/hugemem/src/gfs/ops_address.c, line = 183
Apr 1 02:25:42 tank-05 kernel: GFS: fsid=tank-cluster:gfs1.4: time = 1112340342
Apr 1 02:25:42 tank-05 kernel: GFS: fsid=tank-cluster:gfs1.4: about to withdraw from the cluster
Apr 1 02:25:42 tank-05 kernel: GFS: fsid=tank-cluster:gfs1.4: waiting for outstanding I/O
Apr 1 02:25:42 tank-05 kernel: ------------[ cut here ]------------
Apr 1 02:25:42 tank-05 kernel: kernel BUG at /usr/src/build/548215-i686/BUILD/hugemem/src/gfs/lm.c:190!
Apr 1 02:25:42 tank-05 kernel: invalid operand: 0000 [#1]
Apr 1 02:25:42 tank-05 kernel: SMP
Apr 1 02:25:42 tank-05 kernel: Modules linked in: gnbd(U) lock_nolock(U) gfs(U) lock_gulm(U) lock_harness(U) md5 ipv6 parport_pc lp parport autofs4 sunrpc button battery ac uhci_hcd hw_random e1000 floppy dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod qla2300 qla2xxx scsi_transport_fc sd_mod scsi_mod
Apr 1 02:25:42 tank-05 kernel: CPU: 0
Apr 1 02:25:42 tank-05 kernel: EIP: 0060:[<82b57ba3>] Not tainted VLI
Apr 1 02:25:42 tank-05 kernel: EFLAGS: 00010202 (2.6.9-6.37.ELhugemem)
Apr 1 02:25:42 tank-05 kernel: EIP is at gfs_lm_withdraw+0x51/0xc0 [gfs]
Apr 1 02:25:42 tank-05 kernel: eax: 0000003b ebx: 82b36728 ecx: 81c19e00 edx: 82b7339f
Apr 1 02:25:42 tank-05 kernel: esi: 82b12000 edi: 0377f900 ebp: 81c19f64 esp: 81c19e14
Apr 1 02:25:42 tank-05 kernel: ds: 007b es: 007b ss: 0068
Apr 1 02:25:42 tank-05 kernel: Process pdflush (pid: 52, threadinfo=81c19000 task=045f4130)
Apr 1 02:25:42 tank-05 kernel: Stack: 82b36728 688af344 82b6fde9 82b12000 82b7661a 82b36728 82b748fc 82b36728
Apr 1 02:25:42 tank-05 kernel:        82b70e65 82b36728 82b7489c 000000b7 82b36728 424cf776 82b12000 82b604b8
Apr 1 02:25:42 tank-05 kernel:        82b7489c 000000b7 0377f900 81c19f64 7415ba48 00000001 021720a2 00000000
Apr 1 02:25:42 tank-05 kernel: Call Trace:
Apr 1 02:25:42 tank-05 kernel:  [<82b6fde9>] gfs_assert_withdraw_i+0x26/0x31 [gfs]
Apr 1 02:25:42 tank-05 kernel:  [<82b604b8>] gfs_writepage+0x72/0x9d [gfs]
Apr 1 02:25:42 tank-05 kernel:  [<021720a2>] mpage_writepages+0x1c2/0x314
Apr 1 02:25:42 tank-05 kernel:  [<82b60446>] gfs_writepage+0x0/0x9d [gfs]
Apr 1 02:25:42 tank-05 kernel:  [<0213fe09>] mapping_tagged+0x2b/0x32
Apr 1 02:25:42 tank-05 kernel:  [<02170a1c>] __sync_single_inode+0x5f/0x1c1
Apr 1 02:25:42 tank-05 kernel:  [<02170db0>] sync_sb_inodes+0x1a7/0x274
Apr 1 02:25:42 tank-05 kernel:  [<02152c3d>] shmem_mknod+0x5a/0x8e
Apr 1 02:25:42 tank-05 kernel:  [<0213ff90>] pdflush+0x0/0x1e
Apr 1 02:25:42 tank-05 kernel:  [<02170f0e>] writeback_inodes+0x91/0xde
Apr 1 02:25:42 tank-05 kernel:  [<0213f641>] background_writeout+0x65/0x97
Apr 1 02:25:42 tank-05 kernel:  [<0213fefc>] __pdflush+0xec/0x180
Apr 1 02:25:42 tank-05 kernel:  [<0213ffaa>] pdflush+0x1a/0x1e
Apr 1 02:25:42 tank-05 kernel:  [<0213f5dc>] background_writeout+0x0/0x97
Apr 1 02:25:42 tank-05 kernel:  [<021527a4>] shmem_file_write+0x155/0x2a8
Apr 1 02:25:42 tank-05 kernel:  [<0213ff90>] pdflush+0x0/0x1e
Apr 1 02:25:42 tank-05 kernel:  [<02131cd9>] kthread+0x73/0x9b
Apr 1 02:25:42 tank-05 kernel:  [<02131c66>] kthread+0x0/0x9b
Apr 1 02:25:42 tank-05 kernel:  [<021041f1>] kernel_thread_helper+0x5/0xb
Apr 1 02:25:42 tank-05 kernel: Code: ff 74 24 14 e8 8d 9e 5c 7f 53 68 6d 33 b7 82 e8 71 9e 5c 7f 53 68 9f 33 b7 82 e8 66 9e 5c 7f 83 c4 18 83 be 34 02 00 00 00 74 08 <0f> 0b be 00 b2 32 b7 82 8b 86 08 47 02 00 85 c0 74 1c ba 02 00
Apr 1 02:25:42 tank-05 kernel: <0>Fatal exception: panic in 5 seconds
Looked through the code. Followed the trace and such. It is pretty clear that one of the other processes did something to bugger the one that BUGed. So we're back to needing that bta output.
does that mean that comment #26 is a different, new bug?
No, same bug. Comment #26 is a bt, not a bta; bta == Back Trace All. I need to see what every process is doing. You need to get a kernel with kdb installed.
Created attachment 113646 [details]
bta from tank-02

Briefly:
- most pids waiting on sema: [gfs].text.lock.log+0x70
- a couple waiting on sema: [gfs].text.lock.dio+0x5f
- a couple waiting on buffers to be read
- a couple waiting on buffers to be written
- pdflush is oopsed in gfs_writepage
The last sighting of this bug, with the kdb kernels, was with a 5 node dlm/cman cluster with 4 gfs filesystems all running the "HEAVY" revolver I/O load.
Interesting: pid 3894 (genesis) is in gfs_sharewrite_nopage, trying to unstuff a dinode. It hasn't completed this (it looks like it hasn't even completed the journal entry yet). And pid 52 (pdflush) is asserting in gfs_writepage. Could this be the clue we've been looking for? Perhaps; must study the code some more.
Created attachment 113651 [details]
kdb bta from tank-01, a little bit after tank-02's
For whatever it is worth, the oops seems to always be in pdflush.
Created attachment 113737 [details]
possible fix from Ken

Ken thought this might fix things, but it doesn't appear to have.
Created attachment 113738 [details]
bta of the first node to oops

With the patch from Ken, we still oops in pdflush. This node oopsed first, with other processes in gfs_sharewrite_nopage.
Created attachment 113739 [details]
bta of the second node to oops

Then this node oopsed a few minutes later. pdflush is still the oopsing process, but it looks like there are no other processes in the gfs paging code. Possibly a sign that whatever happened on the first node got to disk.
Probable fix committed. This includes Ken's suggestion that the assert is testing something that may not need to be true. It also includes the change to avoid a possible race between writepage and sharewrite_nopage.
Created attachment 114071 [details]
Script to trip bug

This trips the bug in a couple of minutes. The bug doesn't actually have anything to do with memory mapping. The problem is that the unstuff code allocates a new block, builds a new buffer with the correct contents, and dirties it before munging the pointers in the dinode to point to the block. When a lot of data is being flushed out, the buffer can be flushed out before the metadata is set up correctly. This wasn't an issue in 2.4 because dirty buffers were flushed directly (writepage() was only called for memory maps). In 2.6, all flushing happens through the writepage() function. Oops.
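To make the ordering concrete, a compilable toy sketch of the hazard described above (types and helper names are stand-ins, not the real GFS code, and not necessarily the exact shape of the committed fix):

/* Simplified sketch of the unstuff ordering race, not actual GFS source. */
typedef unsigned long block_t;
struct inode_stub { int stuffed; block_t data_block; };
struct buf        { block_t blk; int dirty; };

static block_t allocate_block(struct inode_stub *ip) { (void)ip; return 42; }
static struct buf *build_data_buffer(struct inode_stub *ip, block_t b)
{
        static struct buf bh;
        (void)ip;
        bh.blk = b;
        bh.dirty = 0;
        return &bh;
}
static void mark_buffer_dirty(struct buf *bh) { bh->dirty = 1; }
static void point_dinode_at_block(struct inode_stub *ip, block_t b)
{
        ip->data_block = b;
        ip->stuffed = 0;
}

/* Buggy ordering: the data buffer is dirtied while the dinode still says
 * "stuffed", so writeback (pdflush -> writepage()) can run in the window
 * below and trip the !gfs_is_stuffed(ip) assertion. */
static void unstuff_buggy(struct inode_stub *ip)
{
        block_t blk = allocate_block(ip);
        struct buf *bh = build_data_buffer(ip, blk);
        mark_buffer_dirty(bh);
        /* <-- race window: page is flushable, metadata not yet updated */
        point_dinode_at_block(ip, blk);
}

/* Consistent ordering: rewire the dinode first, then expose the dirty data. */
static void unstuff_fixed(struct inode_stub *ip)
{
        block_t blk = allocate_block(ip);
        struct buf *bh = build_data_buffer(ip, blk);
        point_dinode_at_block(ip, blk);
        mark_buffer_dirty(bh);
}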
Committed a fix to prevent writepage() from seeing an inconsistent file tree.
fix verified.