Bug 445858
| Summary: | GFS: gfs_fsck cannot allocate enough memory to run on large file systems | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 5 | Reporter: | Ben Yarwood <ben.yarwood> |
| Component: | gfs-utils | Assignee: | Robert Peterson <rpeterso> |
| Status: | CLOSED WONTFIX | QA Contact: | GFS Bugs <gfs-bugs> |
| Severity: | low | Docs Contact: | |
| Priority: | low | | |
| Version: | 5.0 | CC: | edamato, rwheeler, slevine, swhiteho |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | i386 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2010-03-15 17:14:54 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Ben Yarwood 2008-05-09 13:53:51 UTC
Reassigning to myself: I've been working with Ben on this. According to http://www.redhat.com/rhel/compare/ the RHEL5 release does not ship with a HUGEMEM kernel, so on x86 (32-bit) platforms the system is limited to 3GB of address space. So regardless of how much swap and/or RAM you have, gfs_fsck cannot run properly today on a 16TB file system as documented, even if gfs itself can run.

It turns out that for a 16TB file system, gfs_fsck allocates a 2GB chunk of RAM, then three more 1GB chunks of RAM, for its internal bitmaps. These bitmaps are needed to keep track of every block in the file system and to determine what kind of block it is (inode, directory, data, duplicate, etc.).

One way I've thought of to fix it is to process the bitmaps one resource group at a time, rather than keeping all bitmaps in memory at once. The problem with that approach is that a block in one resource group can reference blocks in another resource group, so it could get complex trying to keep the cross-RG references straight. Plus, each pass the code makes references the bits left behind by the previous passes. Another thought is to pare down the memory usage by combining certain bits in the bitmaps and adding some code to resolve the type. Perhaps we can get it down to one 2GB bitmap that way. Right now, the only circumvention is to do the gfs_fsck on a 64-bit arch.

Correction to comment #1: It allocates 2GB for bitmaps, then three smaller bitmaps of 512MB each (not 1GB as previously stated).

I've been working on the issue of gfs_fsck memory usage indirectly because gfs2_fsck has the same problem. For bug #404611, I wanted to test gfs2_fsck against a 2TB file system that has had millions of files and directories created through a benchmark program called benchp. The test system, kool, has 2GB of memory. The test didn't go well because of gfs2_fsck's memory problems, which are directly inherited from gfs_fsck. I created a patch that saves a lot of memory.
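The bitmap sizes quoted above (2GB plus three 512MB arrays, per the correction) follow directly from the block count. A quick sketch of the arithmetic (the sizes come from the comments here, not from the gfs_fsck source):

```python
# Reproduce the gfs_fsck bitmap sizing for a 16TB / 4K-block file system.
FS_SIZE = 16 * 2**40      # 16TB file system
BLOCK_SIZE = 4096         # 4K blocks

blocks = FS_SIZE // BLOCK_SIZE        # 4G blocks to track
nibble_map = blocks // 2              # 4 bits (a nibble) per block -> 2GB
one_bit_map = blocks // 8             # 1 bit per block -> 512MB each
total = nibble_map + 3 * one_bit_map  # 3.5GB of bitmaps alone

# A 32-bit process limited to 3GB of address space cannot hold this,
# which is why a 64-bit machine is currently the only workaround.
```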
Basically, I eliminated three of the four bitmaps associated with the file system and made some data structures smaller. See the attachment associated with this comment: https://bugzilla.redhat.com/show_bug.cgi?id=404611#c8

Despite the patch, the program ran all weekend and only made it 3% of the way through pass1. In doing all this, I've researched where gfs_fsck and gfs2_fsck are using their memory. A big part of the problem is that during pass1, it creates elements in memory for every inode and every directory. Their purpose is to keep counts, primarily link counts, to be used later in pass4. So the patch is not enough memory savings. In fact, the CPU is badly underutilized and the system spends most of its time swapping to disk. So I'm looking for more good ways to improve the memory usage, and anything I find in gfs2_fsck should directly apply to gfs_fsck.

The fix isn't ready to ship yet; retargeting to 5.6.

I don't think we should fix this bug, for several reasons: (1) It would be a lot of work and redesign for gfs's fsck. (2) gfs is being phased out in favor of gfs2, where we should consider fixing it. (3) To the best of my knowledge, no customers have expressed an interest in the fix. (4) There are no customer issues attached to the bug. I think the best course of action is to close this as WONTFIX unless customers start complaining and demanding a fix. Until that time, I'll open a DOC bug to document the approximate memory requirements for gfs_fsck.

In gfs_fsck, the majority of the memory is consumed by the block maps, which are all kept in memory. There is one big array that needs a nibble (half-byte) for each block, and three smaller arrays that need one bit per block. Add to that the additional memory needed for the buffers, the dinode hash table, the directory hash table, and the duplicates linked list. So every block needs at least 7 bits plus slop; call it 8 bits, or one byte, per block.
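The "nibble per block" array described above can be sketched as a packed byte array with two blocks per byte. This is an illustration of the idea only, not gfs_fsck's actual data structure, and the type codes are hypothetical:

```python
class BlockMap:
    """Packed block-type map: one nibble (4 bits) per block, so two
    blocks share each byte. A sketch of the concept; gfs_fsck's real
    block map and its type encoding may differ."""

    def __init__(self, nblocks):
        # Round up so an odd block count still gets storage.
        self.bits = bytearray((nblocks + 1) // 2)

    def set_type(self, block, btype):
        assert 0 <= btype < 16          # a nibble holds 16 type codes
        byte, half = divmod(block, 2)
        shift = half * 4
        # Clear the old nibble, then OR in the new type code.
        self.bits[byte] = (self.bits[byte] & ~(0xF << shift)) | (btype << shift)

    def get_type(self, block):
        byte, half = divmod(block, 2)
        return (self.bits[byte] >> (half * 4)) & 0xF
```

Eight blocks fit in four bytes, so the 4G blocks of a 16TB/4K file system need 2GB for this map alone, matching the figure in the comments.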
So a good estimate is the file system size (in bytes) divided by the block size; that is approximately how much memory you will need to run gfs_fsck. This particular file system is 16TB with a 4K block size, so 16TB / 4K blocks: 17592186044416 / 4096 = 4294967296. This file system therefore requires approximately 4GB of free memory to run gfs_fsck, above and beyond all the memory used by the operating system and kernel. Note that if the block size were 1K, it would require four times the memory.
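The rule of thumb above (about one byte of fsck memory per file-system block) can be wrapped in a small helper. A sketch only; the function name is mine and not part of any shipped tool:

```python
def gfs_fsck_mem_estimate(fs_bytes, block_size):
    """Approximate memory gfs_fsck needs: roughly one byte per
    file-system block (7 bits of block maps plus slop)."""
    return fs_bytes // block_size

TB = 2**40
# 16TB file system with 4K blocks -> ~4GB of memory
print(gfs_fsck_mem_estimate(16 * TB, 4096))   # 4294967296
# The same file system with 1K blocks needs four times as much
print(gfs_fsck_mem_estimate(16 * TB, 1024))   # 17179869184
```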