Bug 520720
Summary: | Kernel panic throughout file transfer to gfs2 filesystem partition
---|---
Product: | Red Hat Enterprise Linux 5
Reporter: | Cleber Paiva de Souza <cleberps>
Component: | kernel
Assignee: | Steve Whitehouse <swhiteho>
Status: | CLOSED INSUFFICIENT_DATA
QA Contact: | Red Hat Kernel QE team <kernel-qe>
Severity: | medium
Priority: | low
Version: | 5.3
CC: | adas, bmarzins, cleberps, djansa, rpeterso, swhiteho
Target Milestone: | rc
Hardware: | x86_64
OS: | Linux
Doc Type: | Bug Fix
Last Closed: | 2010-03-12 10:04:26 UTC
Bug Blocks: | 526947, 533192
Description
Cleber Paiva de Souza
2009-09-01 23:29:56 UTC
You appear to be using journaled data mode. Please confirm the mount arguments which you used.

The issue is that somehow we have tried to invalidate a page whilst not in a transaction. Everything else that you've seen has just followed on from that one issue.

It looks like there is a page which has been left dirty and then truncated such that when writepage has been passed the page, its only option is to remove it from the journal as it is no longer required for writing back to disk. During this process it has tried to remove the page from the journal and hit this bug.

I suspect that if you turn off journaled data mode then that will work around the issue in the short term while we try and come up with a fix for the bug.

Created attachment 359507 [details]
Proposed fix (upstream)
I'll try and get a RHEL version sorted out ready for testing.
Created attachment 359518 [details]
Proposed fix (RHEL)
Needs testing, but I suspect that this will do the trick.
(In reply to comment #1)
> You appear to be using journaled data mode. Please confirm the mount arguments
> which you used.

I used the acl and quota=account mount options.

Did you do a chattr +j on any files/directories?

(In reply to comment #6)
> Did you do a chattr +j on any files/directories?

No, I only mounted the filesystem and transferred the files. I made no changes with chattr or setfacl. On the original (ext3) filesystem, from which the files were transferred, I used ACLs and have directories which were setfacl'ed.

I'm now testing with GFS version 1, and there have been no problems so far with the same files: almost 200 GB of data transferred. With gfs2 the system breaks after at most 10 GB of data transfer, sometimes sooner. The next test will be with data journaling disabled for gfs2.

(In reply to comment #2)
> The issue is that somehow we have tried to invalidate a page whilst not in a
> transaction. Everything else that you've seen has just followed on from that
> one issue.
>
> It looks like there is a page which has been left dirty and then truncated such
> that when writepage has been passed the page, its only option is to remove it
> from the journal as it is no longer required for writing back to disk. During
> this process it has tried to remove the page from the journal and hit this bug.
>
> I suspect that if you turn off journaled data mode then that will work around
> the issue in the short term while we try and come up with a fix for the bug.

The partition was already mounted with data=ordered, since this is the default and I did not specify anything for data= when mounting.

This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
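The mount-option question in this thread (acl, quota=account, and the default data=ordered) can be checked mechanically. The following is a minimal sketch, assuming the usual /proc/mounts line format (device, mount point, filesystem type, comma-separated options, dump, pass). Following the reporter's statement, it treats a gfs2 mount with no data= option as data=ordered. It only inspects mount options; a per-file journaled-data flag set with chattr +j would not show up here, which is why that question was asked separately. The function name gfs2_data_modes and the sample devices/paths are hypothetical.

```python
from __future__ import annotations


def gfs2_data_modes(mounts_text: str) -> dict[str, str]:
    """Map each gfs2 mount point to its data= mode.

    Assumes /proc/mounts format: device, mount point, fs type,
    comma-separated options, dump, pass (whitespace-separated).
    Falls back to "ordered" when no data= option is listed, the
    default mentioned in this bug report.
    """
    modes: dict[str, str] = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[2] != "gfs2":
            continue  # skip blank/short lines and non-gfs2 filesystems
        mount_point, options = fields[1], fields[3].split(",")
        mode = "ordered"  # assumed default when data= is absent
        for opt in options:
            if opt.startswith("data="):
                mode = opt[len("data="):]
        modes[mount_point] = mode
    return modes


if __name__ == "__main__":
    # Hypothetical sample input; device names and paths are made up.
    sample = (
        "/dev/sda1 / ext3 rw 0 0\n"
        "/dev/mapper/vg-gfs /mnt/gfs2 gfs2 rw,acl,quota=account 0 0\n"
        "/dev/mapper/vg-wb /mnt/wb gfs2 rw,data=writeback 0 0\n"
    )
    print(gfs2_data_modes(sample))
    # prints {'/mnt/gfs2': 'ordered', '/mnt/wb': 'writeback'}
    # On a live system: gfs2_data_modes(open("/proc/mounts").read())
```

Parsing text rather than opening /proc/mounts directly keeps the function testable against captured output from the affected machine.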
I haven't been able to figure out what is going on here yet. If you upgrade to the latest 5.4 kernel, does the issue go away? Also, if you upgraded from 5.2, please check that the gfs2 kmod isn't still installed: it is known to be broken, and unfortunately the upgrade process doesn't remove it, so it will load in preference to the 5.3 gfs2 module unless it is removed by hand.

This issue has been in needinfo for several months now. I strongly suspect that it was caused by a left-over gfs2 kmod. We've had one report via the mailing lists of a very similar result which appeared to have been caused by exactly the same thing (left-over kmod). Since we've heard nothing more from the reporters of either issue since the suggestion to check for the kmod, I assume that must have been the cause. We are therefore closing this issue; if that is incorrect, please reopen the bug.
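The check for a left-over gfs2 kmod can be sketched as a directory scan. This is a minimal sketch under an explicit assumption: the module shipped with each kernel is taken to live at /lib/modules/<kernel-version>/kernel/fs/gfs2/gfs2.ko, so any other gfs2.ko found under /lib/modules (for example under extra/ or weak-updates/) is flagged as a candidate left-over that deserves a manual look. The function name find_stray_gfs2_modules is hypothetical, and a hit is only a hint, not proof that the file is the broken kmod.

```python
from __future__ import annotations

import os
from pathlib import Path


def find_stray_gfs2_modules(modules_root: str = "/lib/modules") -> list[Path]:
    """Return gfs2.ko files other than the in-tree kernel/fs/gfs2 module.

    Assumption: the in-tree module for each kernel is at
    <modules_root>/<kernel-version>/kernel/fs/gfs2/gfs2.ko; every other
    gfs2.ko is reported as a possible left-over kmod.
    """
    stray: list[Path] = []
    root = Path(modules_root)
    if not root.is_dir():
        return stray
    for dirpath, _dirnames, filenames in os.walk(root):
        # Symlinks to files (e.g. in weak-updates/) also appear in filenames.
        if "gfs2.ko" not in filenames:
            continue
        candidate = Path(dirpath) / "gfs2.ko"
        rel = candidate.relative_to(root).parts
        # An in-tree module looks like (<kver>, "kernel", "fs", "gfs2", "gfs2.ko").
        if rel[1:] != ("kernel", "fs", "gfs2", "gfs2.ko"):
            stray.append(candidate)
    return sorted(stray)


if __name__ == "__main__":
    for path in find_stray_gfs2_modules():
        print("possible left-over gfs2 module:", path)
```

Anything this reports should be checked against the installed packages before it is removed by hand.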