Description of problem:

When running fsck.ext4 on a 4.5 TB file system, a massive amount of memory is used. So far the check has gotten to over 34 GB VIRT and 13 GB RES.

Version-Release number of selected component (if applicable):

e4fsprogs-1.41.5-3.el5

How reproducible:

Fails every time

Steps to Reproduce:
1.
2.
3.

Actual results:

I am unable to correct the errors because I am unable to complete an fsck on the file system. Below is a top capture taken while the fsck had been running for 6 hours:

12411 root      18   0 36.2g  15g  584 R 39.5 98.6   6:43.94 fsck.ext4

Expected results:

Be able to fsck this file system.

Additional info:

Background information
-----------------------

The original problem stemmed from the following error:

Mar 1 10:58:48 katamari kernel: EDAC MC: Ver: 2.0.1 Jan 21 2009
Mar 1 10:58:48 katamari kernel: i5000_edac: waiting for edac_mc to get alive
Mar 1 10:58:48 katamari kernel: i5000_edac: waiting for edac_mc to get alive
Mar 1 10:58:48 katamari kernel: EDAC MC0: Giving out device to i5000_edac.c I5000: DEV 0000:00:10.0
Mar 1 10:58:48 katamari kernel: kobject_add failed for i5000_edac with -EEXIST, don't try to register things with the same name in the same directory.
Mar 1 10:58:48 katamari kernel:
Mar 1 10:58:48 katamari kernel: Call Trace:
Mar 1 10:58:48 katamari kernel: [<ffffffff801497ae>] kobject_add+0x16e/0x199
Mar 1 10:58:48 katamari kernel: [<ffffffff80148ce0>] cmp_ex+0x0/0x10
Mar 1 10:58:48 katamari kernel: [<ffffffff801498e2>] kobject_register+0x20/0x39
Mar 1 10:58:48 katamari kernel: [<ffffffff80040cf1>] load_module+0x16b9/0x1a19
Mar 1 10:58:48 katamari kernel: [<ffffffff80150743>] pci_bus_read_config_byte+0x0/0x72
Mar 1 10:58:48 katamari kernel: [<ffffffff801246ab>] task_has_capability+0x54/0x60
Mar 1 10:58:48 katamari kernel: [<ffffffff8009db21>] autoremove_wake_function+0x0/0x2e
Mar 1 10:58:48 katamari kernel: [<ffffffff800a3dfb>] sys_init_module+0x4d/0x1e8
Mar 1 10:58:48 katamari kernel: [<ffffffff8005d116>] system_call+0x7e/0x83
Mar 1 10:58:48 katamari kernel:

This was apparently a bug and caused the machine to panic. I rebooted the box and performed a "yum -y upgrade" of all of the packages, hoping to correct the issue.
The machine crashed again with the following error:

Mar 1 22:51:54 katamari kernel: nfsd invoked oom-killer: gfp_mask=0xd0, order=0, oomkilladj=0
Mar 1 22:51:54 katamari kernel:
Mar 1 22:51:54 katamari kernel: Call Trace:
Mar 1 22:51:54 katamari kernel: [<ffffffff800c65e9>] out_of_memory+0x8e/0x2f3
Mar 1 22:51:54 katamari kernel: [<ffffffff8002e2ce>] __wake_up+0x38/0x4f
Mar 1 22:51:54 katamari kernel: [<ffffffff8000f487>] __alloc_pages+0x245/0x2ce
Mar 1 22:51:54 katamari kernel: [<ffffffff885a1bd1>] :sunrpc:svc_recv+0xfa/0x495
Mar 1 22:51:54 katamari kernel: [<ffffffff8008c86c>] default_wake_function+0x0/0xe
Mar 1 22:51:54 katamari kernel: [<ffffffff80064644>] __down_read+0x12/0x92
Mar 1 22:51:54 katamari kernel: [<ffffffff886dd5a1>] :nfsd:nfsd+0x0/0x2cb
Mar 1 22:51:54 katamari kernel: [<ffffffff886dd694>] :nfsd:nfsd+0xf3/0x2cb
Mar 1 22:51:54 katamari kernel: [<ffffffff8005dfb1>] child_rip+0xa/0x11
Mar 1 22:51:54 katamari kernel: [<ffffffff886dd5a1>] :nfsd:nfsd+0x0/0x2cb
Mar 1 22:51:54 katamari kernel: [<ffffffff886dd5a1>] :nfsd:nfsd+0x0/0x2cb
Mar 1 22:51:54 katamari kernel: [<ffffffff8005dfa7>] child_rip+0x0/0x11
Mar 1 22:51:54 katamari kernel:
Mar 1 22:51:54 katamari kernel: Mem-info:
Mar 1 22:51:54 katamari kernel: Node 0 DMA per-cpu:
Mar 1 22:51:54 katamari kernel: cpu 0 hot: high 0, batch 1 used:0
Mar 1 22:51:54 katamari kernel: cpu 0 cold: high 0, batch 1 used:0
Mar 1 22:51:54 katamari kernel: cpu 1 hot: high 0, batch 1 used:0
Mar 1 22:51:54 katamari kernel: cpu 1 cold: high 0, batch 1 used:0
Mar 1 22:51:54 katamari kernel: cpu 2 hot: high 0, batch 1 used:0
Mar 1 22:51:54 katamari kernel: cpu 2 cold: high 0, batch 1 used:0
Mar 1 22:51:54 katamari kernel: cpu 3 hot: high 0, batch 1 used:0
Mar 1 22:51:54 katamari kernel: cpu 3 cold: high 0, batch 1 used:0
Mar 1 22:51:54 katamari kernel: Node 0 DMA32 per-cpu:
Mar 1 22:51:54 katamari kernel: cpu 0 hot: high 186, batch 31 used:179
Mar 1 22:51:54 katamari kernel: cpu 0 cold: high 62, batch 15 used:35
Mar 1 22:51:54 katamari kernel: cpu 1 hot: high 186, batch 31 used:29
Mar 1 22:51:54 katamari kernel: cpu 1 cold: high 62, batch 15 used:58
Mar 1 22:51:54 katamari kernel: cpu 2 hot: high 186, batch 31 used:29
Mar 1 22:51:54 katamari kernel: cpu 2 cold: high 62, batch 15 used:17
Mar 1 22:51:54 katamari kernel: cpu 3 hot: high 186, batch 31 used:30
Mar 1 22:51:54 katamari kernel: cpu 3 cold: high 62, batch 15 used:61
Mar 1 22:51:54 katamari kernel: Node 0 Normal per-cpu:
Mar 1 22:51:54 katamari kernel: cpu 0 hot: high 186, batch 31 used:65
Mar 1 22:51:54 katamari kernel: cpu 0 cold: high 62, batch 15 used:58
Mar 1 22:51:54 katamari kernel: cpu 1 hot: high 186, batch 31 used:150
Mar 1 22:51:54 katamari kernel: cpu 1 cold: high 62, batch 15 used:58
Mar 1 22:51:54 katamari kernel: cpu 2 hot: high 186, batch 31 used:18
Mar 1 22:51:54 katamari kernel: cpu 2 cold: high 62, batch 15 used:57
Mar 1 22:51:54 katamari kernel: cpu 3 hot: high 186, batch 31 used:42
Mar 1 22:51:54 katamari kernel: cpu 3 cold: high 62, batch 15 used:60
Mar 1 22:51:54 katamari kernel: Node 0 HighMem per-cpu: empty
Mar 1 22:51:54 katamari kernel: Free pages: 41476kB (0kB HighMem)
ee:10369 slab:5813 mapped-file:1070 mapped-anon:2008526 pagetables:9934
nactive:0kB present:10532kB pages_scanned:0 all_unreclaimable? yes
Mar 1 22:51:54 katamari kernel: lowmem_reserve[]: 0 3251 8049 8049
ve:1691796kB inactive:1572772kB present:3329568kB pages_scanned:8770701 all_unreclaimable? yes
Mar 1 22:51:54 katamari kernel: lowmem_reserve[]: 0 0 4797 4797
ive:2596168kB inactive:2175712kB present:4912640kB pages_scanned:11465080 all_unreclaimable? yes
Mar 1 22:51:54 katamari kernel: lowmem_reserve[]: 0 0 0 0
B inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
Mar 1 22:51:54 katamari kernel: lowmem_reserve[]: 0 0 0 0
12kB 2*1024kB 0*2048kB 2*4096kB = 10908kB
*512kB 1*1024kB 1*2048kB 5*4096kB = 23740kB
1*512kB 0*1024kB 1*2048kB 1*4096kB = 6828kB
Mar 1 22:51:54 katamari kernel: Node 0 HighMem: empty
Mar 1 22:51:54 katamari kernel: 1553 pagecache pages 0
Mar 1 22:51:54 katamari kernel: Free swap = 0kB
Mar 1 22:51:54 katamari kernel: Total swap = 10217332kB
Mar 1 22:51:54 katamari kernel: Free swap: 0kB
Mar 1 22:51:54 katamari kernel: 2293760 pages of RAM
Mar 1 22:51:54 katamari kernel: 250256 reserved pages
Mar 1 22:51:54 katamari kernel: 8133 pages shared
Mar 1 22:51:54 katamari kernel: 443 pages swap cached
Mar 1 22:51:54 katamari kernel: Out of memory: Killed process 5340 (fsck.ext4).
Mar 1 22:51:54 katamari sshd[5178]: fatal: Write failed: Broken pipe
ts (/exports) (/exports)
Mar 1 22:54:30 katamari avahi-daemon[3433]: Invalid query packet.
Mar 1 22:54:31 katamari last message repeated 5 times
ts (/exports) (/exports) (/exports)

At that point we rebooted the machine again, and I was greeted with:

Mar 2 23:34:54 katamari kernel: EXT4-fs: Update your userspace programs to mount using ext4
Mar 2 23:34:54 katamari kernel: EXT4-fs: ext4dev backwards compatibility will go away by 2.6.31
Mar 2 23:34:54 katamari kernel: EXT4-fs: Unrecognized mount option "extents" or missing value
Mar 2 23:35:18 katamari kernel: EXT4-fs: Update your userspace programs to mount using ext4
Mar 2 23:35:18 katamari kernel: EXT4-fs: ext4dev backwards compatibility will go away by 2.6.31
Mar 2 23:35:18 katamari kernel: EXT4-fs: barriers enabled
Mar 2 23:35:18 katamari kernel: kjournald2 starting: pid 11080, dev dm-0:8, commit interval 5 seconds
Mar 2 23:35:18 katamari kernel: EXT4-fs: delayed allocation enabled
Mar 2 23:35:18 katamari kernel: EXT4-fs: file extents enabled
Mar 2 23:35:18 katamari kernel: EXT4-fs: mballoc enabled
Mar 2 23:35:18 katamari kernel: EXT4-fs: mounted filesystem dm-0 with ordered data mode
Mar 2 23:35:24 katamari kernel: EXT4-fs: mballoc: 0 blocks 0 reqs (0 success)
Mar 2 23:35:24 katamari kernel: EXT4-fs: mballoc: 0 extents scanned, 0 goal hits, 0 2^N hits, 0 breaks, 0 lost
Mar 2 23:35:24 katamari kernel: EXT4-fs: mballoc: 0 generated and it took 0
Mar 2 23:35:24 katamari kernel: EXT4-fs: mballoc: 0 preallocated, 0 discarded
Mar 2 23:35:35 katamari kernel: EXT4-fs: Update your userspace programs to mount using ext4
Mar 2 23:35:35 katamari kernel: EXT4-fs: ext4dev backwards compatibility will go away by 2.6.31
Mar 2 23:35:35 katamari kernel: EXT4-fs: barriers enabled
Mar 2 23:35:35 katamari kernel: kjournald2 starting: pid 11107, dev dm-0:8, commit interval 5 seconds
Mar 2 23:35:35 katamari kernel: EXT4-fs: delayed allocation enabled
Mar 2 23:35:35 katamari kernel: EXT4-fs: file extents enabled
Mar 2 23:35:35 katamari kernel: EXT4-fs: mballoc enabled
Mar 2 23:35:35 katamari kernel: EXT4-fs: mounted filesystem dm-0 with ordered data mode
Mar 2 23:35:46 katamari kernel: EXT4-fs: mballoc: 0 blocks 0 reqs (0 success)
Mar 2 23:35:46 katamari kernel: EXT4-fs: mballoc: 0 extents scanned, 0 goal hits, 0 2^N hits, 0 breaks, 0 lost
Mar 2 23:35:46 katamari kernel: EXT4-fs: mballoc: 0 generated and it took 0
Mar 2 23:35:46 katamari kernel: EXT4-fs: mballoc: 0 preallocated, 0 discarded
Mar 2 23:35:56 katamari kernel: EXT4-fs: Update your userspace programs to mount using ext4
Mar 2 23:35:56 katamari kernel: EXT4-fs: ext4dev backwards compatibility will go away by 2.6.31
Mar 2 23:35:56 katamari kernel: EXT4-fs: barriers enabled
Mar 2 23:35:56 katamari kernel: kjournald2 starting: pid 11122, dev dm-0:8, commit interval 5 seconds
Mar 2 23:35:56 katamari kernel: EXT4-fs: delayed allocation enabled
Mar 2 23:35:56 katamari kernel: EXT4-fs: file extents enabled
Mar 2 23:35:56 katamari kernel: EXT4-fs: mballoc enabled
Mar 2 23:35:56 katamari kernel: EXT4-fs: mounted filesystem dm-0 with ordered data mode

The machine then ran for about 24 hours, at which point the following messages appeared:

Mar 3 22:14:47 katamari kernel: EXT4-fs: Unrecognized mount option "extents" or missing value
Mar 3 22:15:05 katamari kernel: EXT4-fs: Update your userspace programs to mount using ext4
Mar 3 22:15:05 katamari kernel: EXT4-fs: ext4dev backwards compatibility will go away by 2.6.31
Mar 3 22:15:05 katamari kernel: EXT4-fs: Unrecognized mount option "extents" or missing value
Mar 3 22:17:08 katamari kernel: EXT4-fs: Update your userspace programs to mount using ext4
Mar 3 22:17:08 katamari kernel: EXT4-fs: ext4dev backwards compatibility will go away by 2.6.31
Mar 3 22:17:08 katamari kernel: EXT4-fs: Unrecognized mount option "extents" or missing value

Since I had previously upgraded the entire environment, I ignored these errors. Perhaps I should not have, because the file system did not mount. I removed the ext4dev fstype from /etc/fstab, and also removed the extents,barrier=0 options, to appease the mount command, and got the following in /var/log/messages:

Mar 4 12:55:17 katamari kernel: EXT4-fs: barriers disabled
Mar 4 12:55:17 katamari kernel: kjournald2 starting: pid 31060, dev dm-0:8, commit interval 5 seconds
Mar 4 12:55:17 katamari kernel: EXT4-fs warning: mounting fs with errors, running e2fsck is recommended
Mar 4 12:55:17 katamari kernel: EXT4 FS on dm-0, internal journal on dm-0:8
Mar 4 12:55:17 katamari kernel: EXT4-fs: delayed allocation enabled
Mar 4 12:55:17 katamari kernel: EXT4-fs: file extents enabled
Mar 4 12:55:17 katamari kernel: EXT4-fs: mballoc enabled
Mar 4 12:55:17 katamari kernel: EXT4-fs: mounted filesystem dm-0 with ordered data mode
Mar 4 12:55:28 katamari kernel: EXT4-fs: mballoc: 0 blocks 0 reqs (0 success)
Mar 4 12:55:28 katamari kernel: EXT4-fs: mballoc: 0 extents scanned, 0 goal hits, 0 2^N hits, 0 breaks, 0 lost
Mar 4 12:55:28 katamari kernel: EXT4-fs: mballoc: 0 generated and it took 0
Mar 4 12:55:28 katamari kernel: EXT4-fs: mballoc: 0 preallocated, 0 discarded

Here is the output of the top 50 lines of dumpe4fs:

[root@katamari ~]# dumpe4fs /dev/mapper/exports-vml |head -50
dumpe4fs 1.41.5 (23-Apr-2009)
Filesystem volume name:   vml
Last mounted on:          <not available>
Filesystem UUID:          2753cef4-c76a-4502-b6a8-0ada45017473
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
Filesystem flags:         signed_directory_hash test_filesystem
Default mount options:    (none)
Filesystem state:         clean with errors
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              303759360
Block count:              1215036416
Reserved block count:     0
Free blocks:              118573241
Free inodes:              269707009
First block:              0
Block size:               4096
Fragment size:            4096
Reserved GDT blocks:      734
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         8192
Inode blocks per group:   512
RAID stride:              8
RAID stripe width:        40
Flex block group size:    16
Filesystem created:       Mon Aug 10 10:12:20 2009
Last mount time:          Thu Mar 4 13:33:24 2010
Last write time:          Thu Mar 4 13:33:27 2010
Mount count:              7
Maximum mount count:      27
Last checked:             Mon Aug 10 10:12:20 2009
Check interval:           15552000 (6 months)
Next check after:         Sat Feb 6 09:12:20 2010
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               256
Required extra isize:     28
Desired extra isize:      28
Journal inode:            8
Default directory hash:   half_md4
Directory Hash Seed:      7b19f7b8-beb8-433e-b5c8-10e011f64e9f
Journal backup:           inode blocks
Journal size:             128M

Group 0: (Blocks 0-32767) [ITABLE_ZEROED]
  Checksum 0xe67f, unused inodes 0
Are there any tweaks I can do to fsck.ext4 or e4fsck to get this to complete?
Created attachment 397941 [details]
fsck.ext4 only known errors

These are the only errors I know of on the file system, since the fsck won't complete; this might provide some additional useful information. I also have a complete dumpe4fs file, which is 98M, available at http://www2.fas.sfu.ca/ftp/pub/fas/jpeltier/dumpe4fs.gz
Is the dumpe2fs a raw image that we can point fsck at again?

thanks,
-Eric
Sorry, but I don't understand the question. It wasn't created with e2image, if that is what you are asking.
We are currently under a tight deadline, which closes on March 19th, 2010. We've mounted the file system read-only so that users can continue to do some work. If there is any information I can provide without having to take the file system offline, I'm willing to do that; otherwise we will need to wait until the deadline passes.
e2image -r (with the -r option) means that we can run e2fsck directly on the image to investigate the behavior. An e2image created without -r can be examined with debugfs, but can't be fsck'd directly.

-Eric
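To make the difference concrete, here is a minimal sketch using the e4-prefixed utility names shipped with e4fsprogs on RHEL5, as seen in this report; the device path and output file names are only placeholders:

# metadata-only image: can be examined with debugfs, but not fsck'd directly
e4image /dev/mapper/exports-vml /tmp/exports-vml.e2i

# raw image (-r): e4fsck can be run against the resulting (sparse) file
e4image -r /dev/mapper/exports-vml /tmp/exports-vml.raw
e4fsck -fn /tmp/exports-vml.raw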
FWIW, adding tons more swap on another filesystem may get you through the fsck, eventually.

-Eric
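A hypothetical way to do that, assuming a filesystem with free space mounted at /spare (both the path and the 32 GB size are just placeholders):

# create and enable a 32 GB swap file (illustrative path and size)
dd if=/dev/zero of=/spare/extra-swap bs=1M count=32768
mkswap /spare/extra-swap
swapon /spare/extra-swap

The extra swap can be disabled with swapoff and the file removed once the fsck has finished.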
> Mar 4 12:55:17 katamari kernel: EXT4-fs warning: mounting fs with errors, running e2fsck is recommended

Any idea what the original ext4 error was? It should be in the logs somewhere, an error message directly from ext4...
(In reply to comment #4)
> Sorry, but I don't understand the question. It wasn't created with e2image,
> if that is what you are asking.

I'm sorry, I scanned too quickly; you provided dumpe2fs output, not an e2image, gotcha. An e2image may let us look at this particular filesystem's fsck memory usage, but it may also be pretty huge.

Thanks,
-Eric
Is there anything I can do while the file system is online in read-only mode to help the process along? Is e2image -r safe to run on this file system, to provide you with an image for testing?
(In reply to comment #10)
> Is there anything I can do while the file system is online in read-only mode
> to help the process along? Is e2image -r safe to run on this file system, to
> provide you with an image for testing?

It should be, yes. It'll be a fair bit of read I/O on the system while it runs. You'll want to pipe it through bzip2, as shown in the man page. The result may still be fairly large, so consider whether you'll be able to provide it in some way...

Thanks,
-Eric
I just began running

e4image -r /dev/mapper/exports-vml - | bzip2 > /tmp/exports-vml.e2i.bz2

and it looks like I'll have the same problem as when I run fsck.ext4. The program is running and chewing up a lot of memory, but nothing is being written to the /tmp/exports-vml.e2i.bz2 file. I don't think this is going to work either, but I'll leave it running for a bit in the hope that it might.
OK, thanks. Unfortunately it'll burn a bit of CPU zipping zeros... If it interferes with your use of the fs and you need to stop it, that's fine. We have done successful fscks of ext4 filesystems with many more inodes than this, but of course they were populated in a different way, so if something diabolical is going on here it would be good to know...

-Eric
Can you provide any insight into these file systems? You say that the other ones were populated in a different way; in what way? I'm trying to determine whether there is something I could have avoided when I created the file system. This file system holds mainly tens of thousands of small files, less than 32k, with some larger files sprinkled in. The performance of EXT3, and even EXT4, has been abysmal in this environment and has been a problem since it was deployed. Often the system has become severely oversubscribed, sometimes by 2-3x, due to kjournald being a bottleneck, but that is likely for another incident. :)
BTW: the e4image process is currently at 6.3 GB RES and still hasn't written a single bit to the e2i file.
The filesystem we tested was populated by running the fs_mark benchmark, with generally small files, yes. Your perf issues may be related to file & directory layout, but that's likely another question/bug/incident. :)

As for the e2image not writing, we may not be able to go this route... e2image may need a revamp to handle these larger filesystems efficiently...
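For reference, an fs_mark run of that general shape looks roughly like the following; the directory, file count, file size, and thread count are purely illustrative and are not the parameters of the test mentioned above:

# illustrative fs_mark invocation creating many small files across subdirectories
fs_mark -d /mnt/test -D 64 -n 100000 -s 4096 -t 4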
e4fsck failed entirely last night. Same issue: it ran out of memory and crashed the node.
Alright, I can get back to troubleshooting this issue again. Is there anything more you would like me to do to try and get this going?
James, apologies for letting this one slide for a while. Is the problematic fs still around? This is going to be tough if we can't somehow see what's going on... What happened with the e2image attempt?

Thanks,
-Eric

P.S. I saw on the CentOS bugzilla that your mke2fs.conf was ignored; use mke4fs.conf for the ext4 utils on RHEL5 (we had to put ext4 in a parallel universe so as not to perturb the production ext3 utils).
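The e4 config file uses the same format as mke2fs.conf, so a hypothetical /etc/mke4fs.conf fragment might look like the sketch below; the path, feature list, and values are only illustrative, not a recommendation:

# hypothetical /etc/mke4fs.conf sketch (same syntax as mke2fs.conf)
[defaults]
    blocksize = 4096
    inode_size = 256

[fs_types]
    ext4 = {
        features = has_journal,extent,flex_bg,huge_file,uninit_bg,dir_nlink,extra_isize
    }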
FYI: I'm not sure why this is still open. It was corrected in a recent 5.4 release of the e4utils package. I was able to successfully repair the file system, so unless someone else is having difficulty, you can close this.
James, it's still open because I hadn't heard that you had had success with an updated version. :) Thanks for the update. I'll close it.