Created attachment 930329 [details]
corosync.conf for test cluster

This is my first Bugzilla report; please pardon me, and let me know, if I'm not being specific enough.

Description of problem:
I have a GFS2/DRBD/DLM storage cluster which hosts local yum repos and a Subversion repository (via httpd). Somewhere after kernel-3.13.8, routine operations such as createrepo, or a Subversion commit via https, began failing with Input/output errors reported in the temp directories used by these applications. A follow-up fsck.gfs2 check reports errors such as:

Starting pass2
Entries is 6 - should be 5 for inode block 12124184 (0xb90018)
...
Starting pass4
Found unlinked inode at 12124206 (0xb9002e)
Added inode #12124206 (0xb9002e) to lost+found
...

I managed to narrow the problem down somewhat, and take DRBD out of the equation, by creating a single-node corosync cluster. I found that the problem appeared somewhere between kernels 3.13.8 and 3.14.2 (and as late as 3.15.10 it still generates the same corrupted inodes). The problem disappears with a manual install of the older kernel. Please advise.

Version-Release number of selected component (if applicable):
kernel-3.14.2

How reproducible:
See next section.

Steps to Reproduce:
1. Create a single-node cluster in /etc/corosync/corosync.conf (see attached).
2. systemctl start corosync; systemctl start dlm
3. mkfs.gfs2 -p lock_dlm -j 2 -t mycluster:my_test /dev/sda
4. mount /dev/sda /mnt/temp
5. mkdir -p /mnt/temp/test; cd /mnt/temp/test; wget https://kojipkgs.fedoraproject.org//packages/kernel/3.13.8/200.fc20/x86_64/kernel-headers-3.13.8-200.fc20.x86_64.rpm (any rpm will do).
6. createrepo .
7. Output error:
...
yum.Errors.MiscError: Error opening file for checksum: /mnt/temp/test/./.repodata/other.sqlite
8. ls .repodata
9. Error: ls: reading directory .repodata/: Input/output error
10. cd /mnt; umount temp; fsck.gfs2 -afy /dev/sda
11. See errors:
...
Starting pass2
Entries is 6 - should be 5 for inode block 12124184 (0xb90018)
...
Starting pass4
Found unlinked inode at 12124216 (0xb90038)
Added inode #12124216 (0xb90038) to lost+found
pass4 complete
...

Actual results:
See steps 6-11.

Expected results:
Should not see any of the errors from steps 6-11 (works normally with 3.13.8-200).

Additional info:
Do you have dmesg output taken right after the failure? There were a lot of changes to GFS2 between those two kernels, so I need more to go on. Unfortunately, all my lab machines are running RHEL, not Fedora, so I don't have a good way to recreate the problem here without a lot of extra work. (As you know, my paying Red Hat customers need to come before my Fedora work.)
(In reply to Robert Peterson from comment #1)
> Do you have dmesg output taken right after the failure?

Thank you very much for the response, Robert. Please see below for the dmesg output before and after the failure. There doesn't seem to be any additional logging in dmesg in between. I can try installing a kernel version between 3.13.8 and 3.14.2 to be more precise about which version introduced the problem.

[root@amy-vm ~]# systemctl start corosync; systemctl start dlm
[root@amy-vm ~]# mount /dev/sda /mnt/temp/
[root@amy-vm ~]# dmesg |tail -n 20
[ 92.793820] dlm: pm_store: dlm_recover 1
[ 92.793858] dlm: pm_store: add member 3
[ 92.793861] dlm: pm_store: dlm_recover_members 1 nodes
[ 92.793865] dlm: pm_store: generation 1 slots 1 1:3
[ 92.793867] dlm: pm_store: dlm_recover_directory
[ 92.793869] dlm: pm_store: dlm_recover_directory 0 in 0 new
[ 92.793872] dlm: pm_store: dlm_recover_directory 0 out 0 messages
[ 92.793890] dlm: pm_store: dlm_recover 1 generation 1 done: 0 ms
[ 92.793906] dlm: pm_store: group event done 0 0
[ 92.793920] dlm: pm_store: join complete
[ 93.294407] gfs2: fsid=polymath:pm_store: first mounter control generation 0
[ 93.294417] gfs2: fsid=polymath:pm_store: Joined cluster. Now mounting FS...
[ 93.370693] gfs2: fsid=polymath:pm_store.0: journal 0 mapped with 1 extents
[ 93.370714] gfs2: fsid=polymath:pm_store.0: jid=0, already locked for use
[ 93.370719] gfs2: fsid=polymath:pm_store.0: jid=0: Looking at journal...
[ 93.406516] gfs2: fsid=polymath:pm_store.0: jid=0: Done
[ 93.408151] gfs2: fsid=polymath:pm_store.0: jid=1: Trying to acquire journal lock...
[ 93.410011] gfs2: fsid=polymath:pm_store.0: jid=1: Looking at journal...
[ 93.512277] gfs2: fsid=polymath:pm_store.0: jid=1: Done
[ 93.513549] gfs2: fsid=polymath:pm_store.0: first mount done, others may mount
[root@amy-vm ~]# cd /mnt/temp/test/
[root@amy-vm test]# createrepo .
Spawning worker 0 with 1 pkgs
Workers Finished
Saving Primary metadata
Saving file lists metadata
Saving other metadata
Generating sqlite DBs
Traceback (most recent call last):
  File "/usr/share/createrepo/genpkgmetadata.py", line 308, in <module>
    main(sys.argv[1:])
  File "/usr/share/createrepo/genpkgmetadata.py", line 280, in main
    mdgen.doRepoMetadata()
  File "/usr/lib/python2.7/site-packages/createrepo/__init__.py", line 1022, in doRepoMetadata
    db_csums[ftype] = misc.checksum(sumtype, resultpath)
  File "/usr/lib/python2.7/site-packages/yum/misc.py", line 371, in checksum
    raise MiscError, 'Error opening file for checksum: %s' % file
yum.Errors.MiscError: Error opening file for checksum: /mnt/temp/test/./.repodata/other.sqlite
[root@amy-vm test]# dmesg |tail -n 20
[ 92.793820] dlm: pm_store: dlm_recover 1
[ 92.793858] dlm: pm_store: add member 3
[ 92.793861] dlm: pm_store: dlm_recover_members 1 nodes
[ 92.793865] dlm: pm_store: generation 1 slots 1 1:3
[ 92.793867] dlm: pm_store: dlm_recover_directory
[ 92.793869] dlm: pm_store: dlm_recover_directory 0 in 0 new
[ 92.793872] dlm: pm_store: dlm_recover_directory 0 out 0 messages
[ 92.793890] dlm: pm_store: dlm_recover 1 generation 1 done: 0 ms
[ 92.793906] dlm: pm_store: group event done 0 0
[ 92.793920] dlm: pm_store: join complete
[ 93.294407] gfs2: fsid=polymath:pm_store: first mounter control generation 0
[ 93.294417] gfs2: fsid=polymath:pm_store: Joined cluster. Now mounting FS...
[ 93.370693] gfs2: fsid=polymath:pm_store.0: journal 0 mapped with 1 extents
[ 93.370714] gfs2: fsid=polymath:pm_store.0: jid=0, already locked for use
[ 93.370719] gfs2: fsid=polymath:pm_store.0: jid=0: Looking at journal...
[ 93.406516] gfs2: fsid=polymath:pm_store.0: jid=0: Done
[ 93.408151] gfs2: fsid=polymath:pm_store.0: jid=1: Trying to acquire journal lock...
[ 93.410011] gfs2: fsid=polymath:pm_store.0: jid=1: Looking at journal...
[ 93.512277] gfs2: fsid=polymath:pm_store.0: jid=1: Done
[ 93.513549] gfs2: fsid=polymath:pm_store.0: first mount done, others may mount
[root@amy-vm test]#
Thanks, Albert. That tells me GFS2 is not crashing, which I thought might be happening. If you can narrow it down to a specific kernel, that would be very helpful.
Hi Robert,

Per your suggestion, I installed both kernel-3.13.9 and kernel-3.14.1 (fc20), the two adjacent versions on http://koji.fedoraproject.org. The steps above worked fine on 3.13.9 but started generating errors on 3.14.1; in other words, the last good version was actually 3.13.9. Interestingly, when I tried to umount the GFS2 device after the error on 3.14.1, the umount command hung for >30 seconds every time and I ended up having to restart. Postmortem examination of /var/log/messages showed no error messages. Please let me know if there's anything else I can do to help.
This is probably a duplicate of bug #1093191, which was discovered in-house, so it may be private. Still working on it.
I've spent all day chasing this problem. Apparently GFS2 has two problems. The first one has something to do with this upstream patch:

http://git.kernel.org/cgit/linux/kernel/git/steve/gfs2-3.0-nmw.git/commit/fs/gfs2?id=70d4ee94b370c5ef54d0870600f16bd92d18013c

If I revert that patch, I can get intermediate versions to work properly with no failures. However, if I revert that patch ONLY from today's Fedora kernel, I still have a recreatable problem, so I need to track down a second regression. The good news is that the cited patch is not in the RHEL7 kernel, so no problems there (although I have not tried to recreate the failure there; that's on my agenda).
The second problem is caused by this upstream patch:

http://git.kernel.org/cgit/linux/kernel/git/steve/gfs2-3.0-nmw.git/commit/fs/gfs2?id=2b47dad866d04f14c328f888ba5406057b8c7d33

Now that I've identified two upstream patches that are causing problems, I need to debug what's wrong with them and possibly fix or revert them.
I figured out what's wrong with the patch entitled "Remember directory insert point". The problem is with the rename path. In the failing case, this is what happens:

1. Rename is called to rename a file in a directory. For example, let's say it wants to rename "a" to "b" in some directory.
2. Function gfs2_rename calls gfs2_diradd_alloc_required to determine if an allocation is required to do the rename.
3. Function gfs2_diradd_alloc_required calls gfs2_dirent_search to see if there's enough room for a new dirent called "b". There is, so it remembers that location as the dirent that immediately follows "a", correctly determines that no allocations are necessary, and returns success.
4. Function gfs2_rename then calls gfs2_dir_del to remove the old directory entry that says "a".
5. Function gfs2_rename then calls gfs2_dir_add to add the new directory entry "b". The problem is that the remembered dirent location, "dent", still points to the entry AFTER "a". Since "a" is now removed, it needs to put the dirent for "b" in its place. Instead, it puts it AFTER the deleted "a", and therefore it never gets linked into the other dirents properly.

There are a number of ways we can fix this particular problem and still retain the speed-up from the patch.
Created attachment 941703 [details]
Early prototype #1

This early prototype patch circumvents the problem by reverting to the previous behavior: if the call to gfs2_diradd_alloc_required saves a buffer, it releases the buffer and sets it to NULL. This forces the dirent search to be done a second time, thus finding the proper place for the entry to be added. As an alternative, I could add a new parameter to function gfs2_diradd_alloc_required to specify whether the buffer_head should be released or the dent location kept. This patch is kind of "hacky", but it reverts to the previous behavior only for rename, which should be less common. The other way is cleaner, but it costs a few CPU cycles to check the parameter.
Created attachment 941715 [details]
Early prototype #2

Here's an alternative patch. It's a bit bigger, but perhaps more elegant. I'll let Steve Whitehouse decide which he prefers. Then, of course, there's still the question of what's wrong with that second patch. I need to debug that as well.
I've determined that the first problematic patch, mentioned in comment #7, has already been fixed by this upstream patch:

http://git.kernel.org/cgit/linux/kernel/git/steve/gfs2-3.0-nmw.git/commit/fs/gfs2?id=1b2ad41214c9bf6e8befa000f0522629194bf540

I've sent my patch from comment #11 upstream. Hopefully it will be accepted and make its way into a Fedora kernel soon, along with the other upstream patch that fixes the first problem.
Both patches have been submitted to the relevant -stable trees.
I think the patches to fix this should now be in the relevant kernels. Please reopen if you are still experiencing problems with the updated kernels.
*** Bug 1093191 has been marked as a duplicate of this bug. ***