Bug 786454 - OOPs when online resizing ext3 fs
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: rawhide
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Lukáš Czerner
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2012-02-01 13:50 UTC by Milan Broz
Modified: 2013-03-01 04:11 UTC (History)

Fixed In Version: kernel-3.3.0-0.rc4.git1.4.fc17
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-02-28 10:56:10 UTC
Type: ---



Description Milan Broz 2012-02-01 13:50:32 UTC
Description of problem:
Online resizing of a mounted ext3 filesystem triggers a kernel BUG on:
Linux saloonio 3.3.0-0.rc1.git6.1.fc17.x86_64 #1 SMP Mon Jan 30 21:47:23 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

[ 5596.791034] kernel BUG at fs/ext4/resize.c:218!
[ 5596.791357] invalid opcode: 0000 [#19] SMP 
[ 5596.791761] CPU 2 
[ 5596.791907] Modules linked in: binfmt_misc raid0 dm_thin_pool dm_persistent_data dm_bufio libcrc32c dm_raid raid1 raid456 async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx lockd sunrpc tpm_bios snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_timer snd i2c_i801 iTCO_wdt microcode iTCO_vendor_support e1000e soundcore snd_page_alloc shpchp virtio_net kvm_intel kvm firewire_ohci firewire_core crc_itu_t qla2xxx scsi_transport_fc scsi_tgt i915 drm_kms_helper drm i2c_algo_bit i2c_core video [last unloaded: scsi_debug]
[ 5596.797624] 
[ 5596.797787] Pid: 14231, comm: resize2fs Tainted: G      D W    3.3.0-0.rc1.git6.1.fc17.x86_64 #1 Intel To be filled by O.E.M./To be filled by O.E.M.
[ 5596.798764] RIP: 0010:[<ffffffff8126c892>]  [<ffffffff8126c892>] ext4_resize_fs+0x932/0xa70
[ 5596.799383] RSP: 0018:ffff8801080c1ce8  EFLAGS: 00010246
[ 5596.799749] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff88011c6be400
[ 5596.800214] RDX: ffff88011fef5cd0 RSI: 00000000000033ff RDI: ffff88011fef5cd0
[ 5596.800676] RBP: ffff8801080c1db8 R08: 0000000000008000 R09: ffff8801080c1d7c
[ 5596.801145] R10: ffff8801080c1d84 R11: ffff880131408480 R12: ffff880112c3da08
[ 5596.801613] R13: ffff880113107c78 R14: ffff88011fef37b0 R15: 0000000000000000
[ 5596.802079] FS:  00007f0081781740(0000) GS:ffff880131c00000(0000) knlGS:0000000000000000
[ 5596.802632] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 5596.803023] CR2: 00007f0080bc3ee0 CR3: 000000010dfa7000 CR4: 00000000000006e0
[ 5596.803487] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 5596.803950] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 5596.804413] Process resize2fs (pid: 14231, threadinfo ffff8801080c0000, task ffff880113e28000)
[ 5596.804988] Stack:
[ 5596.805178]  0000000000000000 0000000000000001 0000000000000002 0000000000000000
[ 5596.805885]  00000000000033ff ffff880094191220 0000000000003400 0000000000008000
[ 5596.806591]  ffff88011c6be400 ffff880100000000 ffff880112c3da08 ffff88011c6be400
[ 5596.807297] Call Trace:
[ 5596.807512]  [<ffffffff812bc468>] ? security_capable+0x18/0x20
[ 5596.807908]  [<ffffffff8124c10a>] ext4_ioctl+0x28a/0xb20
[ 5596.808277]  [<ffffffff810ccf9d>] ? trace_hardirqs_on+0xd/0x10
[ 5596.808673]  [<ffffffff816987cc>] ? __slab_free+0x211/0x265
[ 5596.809056]  [<ffffffff811cf819>] do_vfs_ioctl+0x99/0x5a0
[ 5596.809429]  [<ffffffff811c7323>] ? putname+0x33/0x50
[ 5596.809822]  [<ffffffff811a06c4>] ? kmem_cache_free+0x234/0x250
[ 5596.810226]  [<ffffffff811bcc04>] ? fget_light+0x244/0x4a0
[ 5596.810603]  [<ffffffff811cfdb9>] sys_ioctl+0x99/0xa0
[ 5596.810959]  [<ffffffff816aa369>] system_call_fastpath+0x16/0x1b
[ 5596.811364] Code: ff ff ff e9 3d fa ff ff 0f 0b 48 8b 95 60 ff ff ff 48 8b b5 68 ff ff ff 48 c7 c7 48 65 9f 81 31 c0 e8 3b 88 42 00 e9 3a fa ff ff <0f> 0b 8b 4d a4 31 d8 f7 d9 85 c1 0f 84 e1 fa ff ff 0f 0b 0f 0b 
[ 5596.815705] RIP  [<ffffffff8126c892>] ext4_resize_fs+0x932/0xa70
[ 5596.816159]  RSP <ffff8801080c1ce8>
[ 5596.816529] ---[ end trace 0a8f978ed43d1aaa ]---

Version-Release number of selected component (if applicable):
kernel-3.3.0-0.rc1.git6.1.fc17.x86_64

How reproducible:

Run this script; resize2fs crashes:

#!/bin/bash -x

# WARNING: this destroys all data on $DEV.
DEV=/dev/sdc

# Create a 30M LV with a 4k-block ext3 filesystem and mount it.
mkdir -p /mnt/tst
pvcreate "$DEV"
vgcreate vg_test "$DEV"
lvcreate -n LV -L 30M vg_test
mkfs.ext3 -b4096 -j /dev/vg_test/LV
mount /dev/vg_test/LV /mnt/tst

# lvresize -L+20M -r -n vg_test/LV
# split into 2 steps:
lvresize -L+20M vg_test/LV
resize2fs /dev/vg_test/LV    # online resize; this is where the BUG triggers

# Clean up.
umount /mnt/tst
vgchange -an vg_test
pvremove -y -ff "$DEV"

Comment 1 Lukáš Czerner 2012-02-12 11:06:07 UTC
The patch has been waiting for upstream merge for quite some time now:

http://www.spinics.net/lists/linux-ext4/msg30293.html

Hopefully it will be merged soon :).

Description:

When resizing a file system such that the new size still falls within
the same group (no new groups are added), we can hit a BUG_ON in
ext4_alloc_group_tables()

BUG_ON(flex_gd->count == 0 || group_data == NULL);

because flex_gd->count is zero. The cause is a missing check for this
case: the code always extends the last group fully and then attempts to
add more groups, but by that time n_blocks_count is actually smaller
than o_blocks_count.

It can be easily reproduced like this:

mkfs.ext4 -b 4096 /dev/sda 30M
mount /dev/sda /mnt/test
resize2fs /dev/sda 50M
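
To see why this reproducer stays inside one group, the arithmetic can be sketched as below. This is only an illustration, not kernel code: it assumes the usual mke2fs default of 8 * block size blocks per group and a first data block of 0 for 4 KiB blocks, and the helper names blocks_for() and group_of() are made up for the example.

```python
# Illustrative arithmetic only; the constants are assumed defaults, not read from a real fs.
BLOCK_SIZE = 4096                    # from "mkfs.ext4 -b 4096"
BLOCKS_PER_GROUP = 8 * BLOCK_SIZE    # assumed default: one bitmap block covers 32768 blocks
FIRST_DATA_BLOCK = 0                 # assumed value for 4 KiB blocks


def blocks_for(size_mib):
    """Number of file system blocks in size_mib MiB (hypothetical helper)."""
    return size_mib * 1024 * 1024 // BLOCK_SIZE


def group_of(block):
    """Block group that contains the given block number (hypothetical helper)."""
    return (block - FIRST_DATA_BLOCK) // BLOCKS_PER_GROUP


old_blocks = blocks_for(30)          # 7680
new_blocks = blocks_for(50)          # 12800

# The last block of a file system with N blocks is block N - 1.
old_group = group_of(old_blocks - 1)
new_group = group_of(new_blocks - 1)

print(old_blocks, new_blocks, old_group, new_group)   # 7680 12800 0 0
```

Both sizes end in group 0, so under these assumptions the resize adds no new groups at all.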

Fix this by checking whether the resize happens within a single group
and, if so, adding only as many blocks to the last group as are needed
to satisfy the user's request. Then o_blocks_count == n_blocks_count
and the resize exits successfully without an attempt to add more groups
to the fs.

Also fix the mixing of block numbers and block counts, which is
confusing and can easily lead to off-by-one errors (although that is
not the case here, since the two occurrences of this mix-up cancel each
other out).
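
The effect of the check can be modelled roughly as follows. This is a simplified sketch and not the actual patch: o_blocks_count and n_blocks_count are the names used in the description, while split_block(), last_group_extension() and the check_same_group switch are hypothetical. The numbers assume a 30M to 50M grow with 4 KiB blocks and 32768 blocks per group, and the model only covers growing (n_blocks_count > o_blocks_count).

```python
def split_block(block, blocks_per_group, first_data_block=0):
    """Return (group number, offset within the group) for a block number."""
    return divmod(block - first_data_block, blocks_per_group)


def last_group_extension(o_blocks_count, n_blocks_count, blocks_per_group,
                         check_same_group=True):
    """How many blocks to add to the current last group (hypothetical model).

    Note the "- 1": a file system with N blocks ends at block number N - 1,
    which is the block number vs. block count distinction mentioned above.
    """
    o_group, offset = split_block(o_blocks_count - 1, blocks_per_group)
    n_group, _ = split_block(n_blocks_count - 1, blocks_per_group)
    if check_same_group and n_group == o_group:
        # Resize within a single group: add only what was requested.
        return n_blocks_count - o_blocks_count
    # Otherwise fill the last group completely.
    return blocks_per_group - (offset + 1)


O, N, BPG = 7680, 12800, 32768   # assumed: 30M -> 50M, 4 KiB blocks

# Without the check the last group is always filled, overshooting the target.
without_check = O + last_group_extension(O, N, BPG, check_same_group=False)
# With the check the file system ends up at exactly the requested size.
with_check = O + last_group_extension(O, N, BPG)

print(without_check, with_check)   # 32768 12800
```

In the first case the block count after extending the last group (32768) is already larger than the requested 12800, which matches the situation described above where no groups are left to add. In the second case the two counts are equal and the resize can stop.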

Comment 2 Josh Boyer 2012-02-20 23:48:47 UTC
(In reply to comment #1)
> The patch has been waiting for upstream merge for quite some time now:
> 
> http://www.spinics.net/lists/linux-ext4/msg30293.html

Thanks Lukáš.  I don't see this in linux-next or in the 'dev' branch of the ext4 git tree.  Maybe you want to poke on the thread and follow up?

Comment 3 Lukáš Czerner 2012-02-21 07:10:55 UTC
Oh, thanks for reminding me :).

-Lukas

Comment 4 Josh Boyer 2012-02-21 19:42:46 UTC
Ted finally applied this.  Yay.

I threw the patch in f17/rawhide.

Comment 5 Fedora Update System 2012-02-22 19:58:26 UTC
kernel-3.3.0-0.rc4.git1.4.fc17 has been submitted as an update for Fedora 17.
https://admin.fedoraproject.org/updates/kernel-3.3.0-0.rc4.git1.4.fc17

Comment 6 Fedora Update System 2012-02-23 22:31:22 UTC
Package kernel-3.3.0-0.rc4.git1.4.fc17:
* should fix your issue,
* was pushed to the Fedora 17 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing kernel-3.3.0-0.rc4.git1.4.fc17'
as soon as you are able to, then reboot.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2012-2304/kernel-3.3.0-0.rc4.git1.4.fc17
then log in and leave karma (feedback).

Comment 7 Fedora Update System 2012-02-28 10:56:10 UTC
kernel-3.3.0-0.rc4.git1.4.fc17 has been pushed to the Fedora 17 stable repository.  If problems still persist, please make note of it in this bug report.

