Bug 1133395 - GFS2: corrupt/inconsistent inodes caused by generic operations approximately after kernel-3.13.8
Summary: GFS2: corrupt/inconsistent inodes caused by generic operations approximately ...
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: GFS
Version: 20
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Robert Peterson
QA Contact:
URL:
Whiteboard:
Duplicates: 1093191
Depends On:
Blocks:
 
Reported: 2014-08-25 05:51 UTC by Albert Wang
Modified: 2014-11-17 13:24 UTC (History)
2 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2014-11-17 11:54:38 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)
corosync.conf for test cluster. (713 bytes, text/plain)
2014-08-25 05:51 UTC, Albert Wang
Early prototype #1 (421 bytes, patch)
2014-09-26 20:23 UTC, Robert Peterson
Early prototype #2 (1.76 KB, patch)
2014-09-26 20:37 UTC, Robert Peterson

Description Albert Wang 2014-08-25 05:51:14 UTC
Created attachment 930329 [details]
corosync.conf for test cluster.

This is my first Bugzilla report, so please pardon me and let me know if I'm not being specific enough.

Description of problem:
I have a GFS2/DRBD/dlm storage cluster which hosts local yum repositories and a Subversion repository (via httpd). Somewhere after kernel-3.13.8, generic operations such as createrepo, or a Subversion commit via https, began failing with Input/output errors reported in the temporary directories used by these applications.

A follow-up fsck.gfs2 check reports errors such as:
Starting pass2
Entries is 6 - should be 5 for inode block 12124184 (0xb90018)
...

Starting pass4
Found unlinked inode at 12124206 (0xb9002e)
Added inode #12124206 (0xb9002e) to lost+found
...

I managed to narrow the problem down somewhat, and take DRBD out of the equation, by creating a single-node Corosync cluster. I found that the problem appeared somewhere between kernels 3.13.8 and 3.14.2 (and as late as 3.15.10 still generates the same corrupted inodes).

The problem disappears after manually installing an older kernel. Please advise.

Version-Release number of selected component (if applicable):
kernel-3.14.2

How reproducible:
See next section.

Steps to Reproduce:
1. Create a single-node cluster in /etc/corosync/corosync.conf, see attached.
2. systemctl start corosync; systemctl start dlm
3. mkfs.gfs2 -p lock_dlm -j 2 -t mycluster:my_test /dev/sda
4. mount /dev/sda /mnt/temp
5. mkdir -p /mnt/temp/test; cd /mnt/temp/test; wget https://kojipkgs.fedoraproject.org//packages/kernel/3.13.8/200.fc20/x86_64/kernel-headers-3.13.8-200.fc20.x86_64.rpm
(any rpm would do).
6. createrepo .
7. Output error:
...
yum.Errors.MiscError: Error opening file for checksum: /mnt/temp/test/./.repodata/other.sqlite
8. ls .repodata
9. Error:
ls: reading directory .repodata/: Input/output error
10. cd /mnt; umount temp; fsck.gfs2 -afy /dev/sda
11. See errors:
...
Starting pass2
Entries is 6 - should be 5 for inode block 12124184 (0xb90018)
...
Starting pass4
Found unlinked inode at 12124216 (0xb90038)
Added inode #12124216 (0xb90038) to lost+found
pass4 complete      
...

Actual results:
See steps 6-11

Expected results:
Should not see any of the errors from steps 6-11 (works normally with 3.13.8-200).

Additional info:

Comment 1 Robert Peterson 2014-08-25 14:15:08 UTC
Do you have dmesg output taken right after the failure?
There were a lot of changes done to GFS2 between those two
kernels, so I need more to go on. Unfortunately, all my lab
machines are running RHEL, not Fedora, so I don't have a good
way to recreate the problem here without a lot of extra work.
(As you know, my paying Red Hat customers need to come first
before I work on Fedora).

Comment 2 Albert Wang 2014-08-25 15:02:31 UTC
(In reply to Robert Peterson from comment #1)
> Do you have dmesg output taken right after the failure?
Thank you very much for the response, Robert. Please see below for the dmesg output from before and after the failure. There doesn't seem to be any additional logging in dmesg in between. I can try installing a kernel version between 3.13.8 and 3.14.2 and be more precise about the version that caused the problem.


[root@amy-vm ~]# systemctl start corosync; systemctl start dlm
[root@amy-vm ~]# mount /dev/sda /mnt/temp/
[root@amy-vm ~]# dmesg |tail -n 20
[   92.793820] dlm: pm_store: dlm_recover 1
[   92.793858] dlm: pm_store: add member 3
[   92.793861] dlm: pm_store: dlm_recover_members 1 nodes
[   92.793865] dlm: pm_store: generation 1 slots 1 1:3
[   92.793867] dlm: pm_store: dlm_recover_directory
[   92.793869] dlm: pm_store: dlm_recover_directory 0 in 0 new
[   92.793872] dlm: pm_store: dlm_recover_directory 0 out 0 messages
[   92.793890] dlm: pm_store: dlm_recover 1 generation 1 done: 0 ms
[   92.793906] dlm: pm_store: group event done 0 0
[   92.793920] dlm: pm_store: join complete
[   93.294407] gfs2: fsid=polymath:pm_store: first mounter control generation 0
[   93.294417] gfs2: fsid=polymath:pm_store: Joined cluster. Now mounting FS...
[   93.370693] gfs2: fsid=polymath:pm_store.0: journal 0 mapped with 1 extents
[   93.370714] gfs2: fsid=polymath:pm_store.0: jid=0, already locked for use
[   93.370719] gfs2: fsid=polymath:pm_store.0: jid=0: Looking at journal...
[   93.406516] gfs2: fsid=polymath:pm_store.0: jid=0: Done
[   93.408151] gfs2: fsid=polymath:pm_store.0: jid=1: Trying to acquire journal lock...
[   93.410011] gfs2: fsid=polymath:pm_store.0: jid=1: Looking at journal...
[   93.512277] gfs2: fsid=polymath:pm_store.0: jid=1: Done
[   93.513549] gfs2: fsid=polymath:pm_store.0: first mount done, others may mount
[root@amy-vm ~]# cd /mnt/temp/test/
[root@amy-vm test]# createrepo .
Spawning worker 0 with 1 pkgs
Workers Finished
Saving Primary metadata
Saving file lists metadata
Saving other metadata
Generating sqlite DBs
Traceback (most recent call last):
  File "/usr/share/createrepo/genpkgmetadata.py", line 308, in <module>
    main(sys.argv[1:])
  File "/usr/share/createrepo/genpkgmetadata.py", line 280, in main
    mdgen.doRepoMetadata()
  File "/usr/lib/python2.7/site-packages/createrepo/__init__.py", line 1022, in doRepoMetadata
    db_csums[ftype] = misc.checksum(sumtype, resultpath)
  File "/usr/lib/python2.7/site-packages/yum/misc.py", line 371, in checksum
    raise MiscError, 'Error opening file for checksum: %s' % file
yum.Errors.MiscError: Error opening file for checksum: /mnt/temp/test/./.repodata/other.sqlite
[root@amy-vm test]# dmesg |tail -n 20
[   92.793820] dlm: pm_store: dlm_recover 1
[   92.793858] dlm: pm_store: add member 3
[   92.793861] dlm: pm_store: dlm_recover_members 1 nodes
[   92.793865] dlm: pm_store: generation 1 slots 1 1:3
[   92.793867] dlm: pm_store: dlm_recover_directory
[   92.793869] dlm: pm_store: dlm_recover_directory 0 in 0 new
[   92.793872] dlm: pm_store: dlm_recover_directory 0 out 0 messages
[   92.793890] dlm: pm_store: dlm_recover 1 generation 1 done: 0 ms
[   92.793906] dlm: pm_store: group event done 0 0
[   92.793920] dlm: pm_store: join complete
[   93.294407] gfs2: fsid=polymath:pm_store: first mounter control generation 0
[   93.294417] gfs2: fsid=polymath:pm_store: Joined cluster. Now mounting FS...
[   93.370693] gfs2: fsid=polymath:pm_store.0: journal 0 mapped with 1 extents
[   93.370714] gfs2: fsid=polymath:pm_store.0: jid=0, already locked for use
[   93.370719] gfs2: fsid=polymath:pm_store.0: jid=0: Looking at journal...
[   93.406516] gfs2: fsid=polymath:pm_store.0: jid=0: Done
[   93.408151] gfs2: fsid=polymath:pm_store.0: jid=1: Trying to acquire journal lock...
[   93.410011] gfs2: fsid=polymath:pm_store.0: jid=1: Looking at journal...
[   93.512277] gfs2: fsid=polymath:pm_store.0: jid=1: Done
[   93.513549] gfs2: fsid=polymath:pm_store.0: first mount done, others may mount
[root@amy-vm test]#

Comment 3 Robert Peterson 2014-08-25 15:12:02 UTC
Thanks, Albert. That tells me GFS2 is not crashing, which I
thought might be happening. If you can narrow it down to a
specific kernel, that would be very helpful.

Comment 4 Albert Wang 2014-08-25 15:57:25 UTC
Hi Robert, 
Per your suggestion, I installed both kernel-3.13.9 and kernel-3.14.1 (fc20), the two adjacent versions on http://koji.fedoraproject.org.

What I saw was that the steps above worked fine for 3.13.9, but started generating errors in 3.14.1. In other words, the last good version was actually 3.13.9.

Interestingly, when I tried to unmount the GFS2 device after the error on 3.14.1, the umount command hung for more than 30 seconds every time, and I ended up having to restart. A postmortem examination of /var/log/messages showed no error messages.

Please let me know if there's anything else I can do to help.

Comment 5 Robert Peterson 2014-09-16 15:38:20 UTC
This is probably a duplicate of bug #1093191, which was discovered
in-house, so it may be private. Still working on it.

Comment 7 Robert Peterson 2014-09-25 19:58:46 UTC
I've spent all day chasing this problem. Apparently GFS2 has
two problems. The first one has something to do with this
upstream patch:

http://git.kernel.org/cgit/linux/kernel/git/steve/gfs2-3.0-nmw.git/commit/fs/gfs2?id=70d4ee94b370c5ef54d0870600f16bd92d18013c

If I revert that patch, I can get intermediate versions to work
properly with no failures. However, if I revert that patch ONLY
from today's Fedora kernel, I still have a recreatable problem,
so I need to track down a second regression.

The good news is that the cited patch is not in the RHEL7
kernel, so no problems there (although I have not tried to
recreate the failure there. That's on my agenda).

Comment 8 Robert Peterson 2014-09-26 14:25:48 UTC
The second problem is caused by this upstream patch:

http://git.kernel.org/cgit/linux/kernel/git/steve/gfs2-3.0-nmw.git/commit/fs/gfs2?id=2b47dad866d04f14c328f888ba5406057b8c7d33

So now that I've identified two upstream patches that are
causing problems, I need to debug what's wrong with them
and possibly fix or revert them.

Comment 9 Robert Peterson 2014-09-26 20:19:05 UTC
I figured out what's wrong with the patch entitled "Remember
directory insert point". The problem is with the rename path.
In the failing case, this is what happens:

1. Rename is called to rename a file in a directory. For example,
   let's say it wants to rename "a" to "b" in some directory.
2. Function gfs2_rename calls gfs2_diradd_alloc_required to
   determine if an allocation is required to do the rename.
3. Function gfs2_diradd_alloc_required calls gfs2_dirent_search
   to see if there's enough room for a new dirent called "b".
   There is. So it remembers that location as the dirent that
   immediately follows "a". It correctly determines that no
   allocations are necessary, and returns success.
4. Function gfs2_rename then calls gfs2_dir_del to remove the
   old directory entry that says "a".
5. Function gfs2_rename then calls gfs2_dir_add to add the new
   directory entry "b", but the problem is, the remembered
   dirent location, "dent", still points to the entry AFTER "a".
   Since "a" is now removed, it needs to put the dirent for "b"
   in its place. Instead, it puts it AFTER the deleted "a". And
   therefore it never gets linked into the other dirents properly.

There are a number of ways we can fix this particular problem and
still retain the speed-up from the patch.

Comment 10 Robert Peterson 2014-09-26 20:23:25 UTC
Created attachment 941703 [details]
Early prototype #1

This early prototype patch circumvents the problem by reverting
to previous behavior: If the call to gfs2_diradd_alloc_required
saves a buffer, it releases the buffer and sets it to NULL.
This forces the dirent search to be done a second time, thus
finding the proper entry to be added.

As an alternative, I could add a new parameter to function
gfs2_diradd_alloc_required to specify whether the buffer_head
should be released or the dent location kept. This patch is
kinda "hacky", but it reverts to the previous behavior only for
rename, which should be less common. The other way is cleaner,
but it costs a few CPU cycles to check the parameter.
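Schematically, and with hypothetical names (da.bh and save_loc here are illustrative, not necessarily the actual field and parameter names; the real code is in the attached patches), the two approaches differ like this:

```
/* Approach 1 (prototype #1): the rename path simply discards the saved
 * location, forcing gfs2_dir_add to repeat the dirent search after the
 * delete has been done. */
error = gfs2_diradd_alloc_required(ndir, &nname, &da);
if (da.bh) {
        brelse(da.bh);          /* drop the remembered insert point */
        da.bh = NULL;           /* => gfs2_dir_add searches again */
}

/* Approach 2 (prototype #2): callers state up front whether the saved
 * location may be kept; rename says no, create-style paths say yes. */
error = gfs2_diradd_alloc_required(ndir, &nname, &da, save_loc);
```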

Comment 11 Robert Peterson 2014-09-26 20:37:16 UTC
Created attachment 941715 [details]
Early prototype #2

Here's an alternative patch. A bit bigger, but perhaps more
elegant.

I'll let Steve Whitehouse decide which he prefers.

Then, of course, there's still the question of what's wrong with
that second patch. I need to debug that as well.

Comment 12 Robert Peterson 2014-09-29 14:31:17 UTC
I've determined that the first problematic patch, mentioned in
comment #7, has already been fixed by this upstream patch:
http://git.kernel.org/cgit/linux/kernel/git/steve/gfs2-3.0-nmw.git/commit/fs/gfs2?id=1b2ad41214c9bf6e8befa000f0522629194bf540

I've sent my patch from comment #11 upstream. Hopefully it will
be accepted and make its way into a Fedora kernel soon, along
with the other upstream patch that fixes the first problem.

Comment 13 Steve Whitehouse 2014-10-13 11:50:21 UTC
Both patches have been submitted to the relevant -stable trees.

Comment 14 Steve Whitehouse 2014-11-17 11:54:38 UTC
I think that the patches to fix this should now be in the relevant kernels. Please reopen if you are still experiencing problems with the updated kernels.

Comment 15 Andrew Price 2014-11-17 13:24:00 UTC
*** Bug 1093191 has been marked as a duplicate of this bug. ***

