Bug 570639
Summary: fsck.ext4 uses massive amounts of memory to check a 4.5TB file system
Product: Red Hat Enterprise Linux 5
Component: e4fsprogs
Version: 5.4
Status: CLOSED CURRENTRELEASE
Severity: urgent
Priority: low
Hardware: x86_64
OS: Linux
Target Milestone: rc
Reporter: James A. Peltier <james_a_peltier>
Assignee: Eric Sandeen <esandeen>
QA Contact: BaseOS QE - Apps <qe-baseos-apps>
CC: james_a_peltier, kurt, ralph
Doc Type: Bug Fix
Last Closed: 2010-09-23 16:42:21 UTC
Attachments: attachment 397941 (fsck.ext4 only known errors)
Description (James A. Peltier, 2010-03-04 22:04:27 UTC):
Are there any tweaks I can make to fsck.ext4 or e4fsck to get this to complete?

James A. Peltier:
Created attachment 397941 [details]: fsck.ext4 only known errors. These are the only errors I know of on the file system, since the fsck won't complete; they may provide some additional useful information. I also have a complete dumpe4fs output file (98M) available at http://www2.fas.sfu.ca/ftp/pub/fas/jpeltier/dumpe4fs.gz

Eric Sandeen:
Is the dumpe2fs a raw image that we can point fsck at again?

thanks, -Eric

James A. Peltier:
Sorry, but I don't understand the question. It wasn't created with e2image, if that is what you are asking. We are currently under a tight deadline which closes on March 19th, 2010. We've mounted the file system read-only so that users can continue to do some work. If there is any information I can provide without taking the file system offline, I'm willing to do that; otherwise we will need to wait until the deadline passes.

Eric Sandeen:
e2image -r (with the -r option) means that we can run e2fsck directly on the image to investigate the behavior. e2image without -r can be examined by debugfs, but can't be directly fsck'd.

-Eric

Eric Sandeen:
FWIW, adding tons more swap on another filesystem may get you through the fsck, eventually.

-Eric

Eric Sandeen:
> Mar 4 12:55:17 katamari kernel: EXT4-fs warning: mounting fs with errors, running e2fsck is recommended

Any idea what the original ext4 error was? It should be in the logs somewhere, as an error message directly from ext4...

Eric Sandeen:
(In reply to comment #4)
> Sorry but I don't understand the question? It wasn't created with e2image if
> that is what you are asking?

I'm sorry, I scanned too quickly; you provided dumpe2fs, not an e2image, gotcha. An e2image may let us look at this particular filesystem's fsck memory usage, but it may also be pretty huge.

Thanks, -Eric

James A. Peltier:
Is there anything I can do while the file system is online in read-only mode to help the process along? Is e2image -r safe to run on this file system to provide you for testing?
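Eric's swap suggestion works because e2fsck's memory footprint is dominated by per-block and per-inode bookkeeping (bitmaps and link-count tables) that must all be held in memory at once. The sketch below is a back-of-envelope estimate only; the structure counts and per-inode byte figures are illustrative assumptions, not e2fsck's exact internals.

```python
# Back-of-envelope estimate of e2fsck's in-core bookkeeping for a
# filesystem of a given size. The per-structure figures below are
# illustrative assumptions, not e2fsck's exact internals.

def fsck_memory_estimate(fs_bytes, block_size=4096, bytes_per_inode=16384):
    """Return an estimated byte count for fsck's bitmaps and inode counts."""
    blocks = fs_bytes // block_size
    inodes = fs_bytes // bytes_per_inode  # default mkfs inode ratio
    block_bitmaps = 3 * blocks // 8       # assume ~3 block bitmaps, 1 bit per block
    inode_bitmaps = 3 * inodes // 8       # assume ~3 inode bitmaps, 1 bit per inode
    icount = 6 * inodes                   # assume ~6 bytes per inode for link counts
    return block_bitmaps + inode_bitmaps + icount

est = fsck_memory_estimate(int(4.5 * 1024**4))  # a 4.5 TB filesystem
print(f"~{est / 1024**2:.0f} MiB of fsck bookkeeping")  # ~2268 MiB
```

Even under these rough assumptions the bookkeeping alone runs to a couple of GiB before any working memory, which on a 2010-era node can exhaust RAM; extra swap lets the fsck limp through, slowly.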
Eric Sandeen:
(In reply to comment #10)
> Is there anything I can do while the file system is online in read-only mode to
> help the process along? Is e2image -r safe to run on this file system to
> provide you for testing?

It should be, yes. It'll be a fair bit of read IO on the system while it runs. You'll want to pipe it through bzip2 as shown in the man page. The result may still be fairly large, so consider whether you'll be able to provide it in some way...

Thanks, -Eric

James A. Peltier:
I just began running

    e4image -r /dev/mapper/exports-vml - | bzip2 > /tmp/exports-vml.e2i.bz2

and it looks like I'll have the same problem as when I run fsck.ext4. The program is running and chewing up a lot of memory, but nothing is being written to /tmp/exports-vml.e2i.bz2. I don't think this is going to work either, but I'll leave it running for a bit in the hope that it might.

Eric Sandeen:
Ok, thanks. Unfortunately it'll burn a bit of CPU zipping zeros... if it interferes with your use of the fs and you need to stop it, that's fine. We have done successful fscks of ext4 with many more inodes than this, but of course it was populated in a different way, so if something is diabolical here it would be good to know...

-Eric

James A. Peltier:
Can you provide any insight into those file systems? You say the other ones were populated in a different way; in what way? I'm trying to determine whether there is something I could have avoided when I created this file system. It holds mainly tens of thousands of small files, less than 32k, with some larger files sprinkled in. The performance of ext3 and even ext4 has been abysmal in this environment and has been a problem since it was deployed. The system has often become severely oversubscribed, sometimes by 2-3 times, due to kjournald being a bottleneck, but that is likely for another incident.
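The e2image -r workflow discussed above can be tried end-to-end on a small file-backed filesystem without touching a real volume. A sketch, assuming e2fsprogs and bzip2 are installed; the paths and the 64MB size are illustrative:

```shell
# Sketch: the e2image -r workflow on a small file-backed filesystem
# image (no root needed). Paths and sizes are illustrative; requires
# e2fsprogs and bzip2.
PATH="$PATH:/sbin:/usr/sbin"
IMG=/tmp/demo-fs.img

# Create and format a throwaway 64MB image file
dd if=/dev/zero of="$IMG" bs=1M count=64 2>/dev/null
mkfs.ext4 -q -F "$IMG"

# Raw metadata-only image; pipe through bzip2 as the man page suggests,
# since the raw output is mostly zeros
e2image -r "$IMG" - | bzip2 > /tmp/demo-fs.e2i.bz2

# Unlike a plain e2image, the -r image can be fsck'd directly
bunzip2 -c /tmp/demo-fs.e2i.bz2 > /tmp/demo-fs.e2i
e2fsck -f -n /tmp/demo-fs.e2i
```

The zero-heavy raw image compresses very well, which is also why the real run appeared to write nothing at first: bzip2 buffers while e2image walks the metadata of every group.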
James A. Peltier:
:) BTW: the e4image process is currently at 6.3GB RES in memory and still hasn't written a single bit to the .e2i file.

Eric Sandeen:
The filesystem we tested was populated by running the fs_mark benchmark, with generally small files, yes. Your perf issues may be related to file & directory layout, but that's likely another question/bug/incident. :)

As for the e2image not writing, we may not be able to go this route... e2image may need a revamp to handle these larger filesystems efficiently...

James A. Peltier:
e4fsck failed entirely last night. Same issue: it ran out of memory and crashed the node.

James A. Peltier:
Alright, I can get back to troubleshooting this issue again. Is there anything more you would like me to do to try and get this going?

Eric Sandeen:
James, apologies for letting this one slide for a while. Is the problematic fs still around? This is going to be tough if we can't somehow see what's going on... What happened with the e2image attempt?

Thanks, -Eric

p.s. I saw on the CentOS bugzilla that your mke2fs.conf was ignored; use mke4fs.conf for the ext4 utilities on RHEL 5 (we had to put ext4 in a parallel universe to avoid perturbing the production ext3 utils).

James A. Peltier:
FYI: I'm not sure why this is still open. It was corrected in a recent 5.4 release of the e4utils package. I was able to successfully repair the file system, so unless someone else is having difficulty you can close this.

Eric Sandeen:
James, it's still open because I hadn't heard that you had had success with an updated version. :) Thanks for the update; I'll close it.
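Eric's postscript about mke4fs.conf refers to the parallel config file read by RHEL 5's e4fsprogs tools instead of the mke2fs.conf used by the stock ext3 utilities. A sketch of what such a file might contain; the values are illustrative, mirroring the stock mke2fs.conf format rather than any specific shipped default:

```
# /etc/mke4fs.conf: read by RHEL 5's e4fsprogs mkfs tools, in place of
# the mke2fs.conf used by the production ext3 utilities.
# Values are illustrative, in the standard mke2fs.conf format.
[defaults]
	base_features = sparse_super,filetype,resize_inode,dir_index,ext_attr
	blocksize = 4096
	inode_size = 256
	inode_ratio = 16384

[fs_types]
	ext4 = {
		features = has_journal,extent,huge_file,flex_bg,uninit_bg,dir_nlink,extra_isize
	}
```

Settings like inode_ratio matter here: it fixes the inode count at mkfs time, and as discussed above, fsck memory scales with that count.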