Bug 549397
Summary: | I/O errors while accessing loop devices or file-based Xen images from a GFS volume after update from RHEL 5.3 to 5.4 | |
---|---|---|---|
Product: | Red Hat Enterprise Linux 5 | Reporter: | Michal Markowski <markowski> |
Component: | kernel | Assignee: | Josef Bacik <jbacik> |
Status: | CLOSED ERRATA | QA Contact: | Red Hat Kernel QE team <kernel-qe> |
Severity: | high | Docs Contact: | |
Priority: | high | ||
Version: | 5.5 | CC: | adas, bmarzins, edamato, grimme, jbacik, rpeterso, swhiteho, tao |
Target Milestone: | rc | Keywords: | Regression |
Target Release: | --- | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | | Doc Type: | Bug Fix
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2010-03-30 07:16:54 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 526947 | ||
Attachments: |
Description
Michal Markowski
2009-12-21 15:16:45 UTC
Created attachment 379634 [details]
Xen console output after starting a VM with disk on a gfs volume
Created attachment 379635 [details]
dmesg output after trying to mount a loop device from gfs volume
Created attachment 379636 [details]
console output after trying to mount a loop device from a gfs volume
Created attachment 379637 [details]
xen log after starting a VM with disk on a gfs volume
Created attachment 379638 [details]
List of rpms installed on the system
I can see what's going on there. The loop device is trying to call (not unreasonably) gfs_prepare_write as gfs is trying to write to a block. The issue is that this is not allowed unless the caller has already locked the glock. Unfortunately, this is a consequence of the level at which gfs does its locking; GFS2 locks at the page cache level, so it will not have this issue. Also, it only affects writes, so read-only mounts via loop should be unaffected.

The problem seems to be due to this bit of the new aops patches:

```diff
@@ -791,7 +770,7 @@ static int loop_set_fd(struct loop_device *lo, struct file * lo_file,
 	 */
 	if (!file->f_op->sendfile)
 		goto out_putf;
-	if (aops->prepare_write && aops->commit_write)
+	if (aops->prepare_write || aops->write_begin)
 		lo_flags |= LO_FLAGS_USE_AOPS;
 	if (!(lo_flags & LO_FLAGS_USE_AOPS) && !file->f_op->write)
 		lo_flags |= LO_FLAGS_READ_ONLY;
```

We were relying on the lack of a commit_write aop to not set the LO_FLAGS_USE_AOPS flag.

Before I forget where it is, here is the gfs end of this issue:

http://git.fedoraproject.org/git/cluster.git?p=cluster.git;a=commitdiff;h=386fe588f9e9bd70568df7795d36d88534b67a7d

That patch was added to fix this issue the first time it cropped up.

Created attachment 380060 [details]
possible fix
Please try this patch and verify it fixes your problem. I've built it and tested it to make sure it doesn't blow up, but I didn't test it with gfs.
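Attachment 380060 itself is not quoted in this report, so the following is only a hedged sketch of the shape such a fix could take, based on Steve's observation above: restore the requirement for a complete begin/commit pair before the loop driver selects the address_space-operations path. With that gating, a filesystem like gfs, which intentionally exports prepare_write without commit_write, would be routed to its regular write path (which takes its own locks) instead of having prepare_write called with no glock held.

```c
/*
 * Hypothetical sketch only; the actual fix is attachment 380060, which
 * is not quoted in this report. The idea: only use the backing file's
 * address_space operations for loop writes when a complete pair of
 * begin/commit hooks exists. gfs provides prepare_write but no
 * commit_write, so this test leaves LO_FLAGS_USE_AOPS unset for it,
 * and loop falls back to file->f_op->write instead.
 */
if ((aops->prepare_write && aops->commit_write) ||
    (aops->write_begin && aops->write_end))
	lo_flags |= LO_FLAGS_USE_AOPS;
if (!(lo_flags & LO_FLAGS_USE_AOPS) && !file->f_op->write)
	lo_flags |= LO_FLAGS_READ_ONLY;
```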
Changing bug from RHEL 4.x to 5.x. In my opinion, this needs to get fixed in 5.5. Setting and/or requesting flags appropriately.

I currently have some problems building a kernel RPM from the SRPM. As soon as that works, I will check out the patch and give you feedback on this matter. Happy New Year!

I've built the new kernel rpm, installed it, and tested it. It seems to work fine now. When can I expect an official bugfix, so I can go into production with it on a supported kernel? dmesg output:

```
[mount GFS]
Trying to join cluster "lock_nolock", "clurhel5:root"
Joined cluster. Now mounting FS...
GFS: fsid=clurhel5:root.0: jid=0: Trying to acquire journal lock...
GFS: fsid=clurhel5:root.0: jid=0: Looking at journal...
[...]
GFS: fsid=clurhel5:root.0: jid=3: Done
GFS: fsid=clurhel5:root.0: Scanning for log elements...
GFS: fsid=clurhel5:root.0: Found 0 unlinked inodes
GFS: fsid=clurhel5:root.0: Found quota changes for 0 IDs
GFS: fsid=clurhel5:root.0: Done

[create loop device from a file on the gfs volume]
loop: loaded (max 8 devices)

[create ext3 filesystem on loop device, mount it]
kjournald starting. Commit interval 5 seconds
EXT3 FS on dm-0, internal journal
EXT3-fs: mounted filesystem with ordered data mode.

[no gfs errors! good job!]
```

I'm reassigning this to Josef, since he wrote the patch. We're investigating whether we can still push this into 5.5 or not. It's very late in the process, but we might be able to go through the exception process. I'll bump the priority toward that end, although we might need to bump it higher.

This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.

Waiting for RHEL 5.5 is not an option for this problem. I have a production system with a failed RHEL 5.4 update; starting the production virtual machines may lead to data corruption. I need a supported kernel patch ASAP.

This bug has to go into 5.5 first. If it is also required in 5.4.z, then this bug needs to be cloned in order for that to happen. That is normally done via GSS, so if you have a contact there, please ask them to do that. If not, let us know and I'll try to kick off the process directly.

*** Bug 566184 has been marked as a duplicate of this bug. ***

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0178.html
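For reference, the bracketed reproduction step in the verification dmesg above ("create loop device from a file on the gfs volume") is what losetup performs via the LOOP_SET_FD ioctl, which lands in exactly the loop_set_fd() entry point whose flag logic this bug's patch adjusts. A minimal userspace sketch follows; the paths (/mnt/gfs/disk.img, /dev/loop0) are placeholders assumed for this report's setup, not taken from it.

```c
/*
 * Minimal sketch of the reproduction step "[create loop device from a
 * file on the gfs volume]". This is what losetup does under the hood;
 * the LOOP_SET_FD ioctl invokes drivers/block/loop.c:loop_set_fd(),
 * where the kernel decides whether loop writes go through the backing
 * filesystem's address_space operations (LO_FLAGS_USE_AOPS). Run as
 * root with the loop module loaded. Paths are placeholders.
 */
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/loop.h>

int main(void)
{
	int backing = open("/mnt/gfs/disk.img", O_RDWR); /* file on the GFS volume */
	int loopdev = open("/dev/loop0", O_RDWR);

	if (backing < 0 || loopdev < 0) {
		perror("open");
		return 1;
	}
	/* Attach the backing file; afterwards, mkfs.ext3 and mount can be
	 * run against /dev/loop0 as in the dmesg transcript above. */
	if (ioctl(loopdev, LOOP_SET_FD, backing) < 0) {
		perror("LOOP_SET_FD");
		return 1;
	}
	close(backing);
	close(loopdev);
	return 0;
}
```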