Bug 857618 - [abrt]: [120634.182802] kernel BUG at fs/ext4/extents.c:1969!
Summary: [abrt]: [120634.182802] kernel BUG at fs/ext4/extents.c:1969!
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 16
Hardware: x86_64
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Eric Sandeen
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard: abrt_hash:577850f769d432ab80c963473db...
Depends On:
Blocks:
Reported: 2012-09-15 08:35 UTC by Mikhail Veltishchev
Modified: 2012-11-14 17:04 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-11-14 17:04:53 UTC
Type: ---
Embargoed:


Links
System        ID     Private  Priority  Status  Summary  Last Updated
Linux Kernel  47611  0        None      None    None     2012-09-19 16:34:35 UTC

Description Mikhail Veltishchev 2012-09-15 08:35:28 UTC
libreport version: 2.0.10
cmdline:        BOOT_IMAGE=/vmlinuz-3.4.9-2.fc16.x86_64 root=/dev/mapper/vg_lambda-lv_root ro rd.md=0 rd.dm=0 rd.lvm.lv=vg_lambda/lv_swap quiet SYSFONT=latarcyrheb-sun16 rhgb rd.lvm.lv=vg_lambda/lv_root rd.luks=0 KEYTABLE=us-acentos LANG=en_US.UTF-8

backtrace:      Text file, 5113 bytes

backtrace:
:[120634.182802] kernel BUG at fs/ext4/extents.c:1969!
:[120634.182860] invalid opcode: 0000 [#1] SMP 
:[120634.182918] CPU 0 
:[120634.182944] Modules linked in: vfat fat usb_storage tun tcp_lp lp fuse be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 8021q garp stp mdio llc fcoe ib_iser libfcoe libfc scsi_transport_fc scsi_tgt tpm_bios rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi rfcomm bnep ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack snd_hda_codec_hdmi snd_hda_codec_idt coretemp arc4 binfmt_misc dell_wmi sparse_keymap ppdev dell_laptop dcdbas iwlwifi microcode mac80211 snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device btusb bluetooth joydev snd_pcm cfg80211 i2c_i801 iTCO_wdt iTCO_vendor_support snd_timer snd rfkill soundcore snd_page_alloc e1000e parport_pc parport uinput crc32c_intel ghash_clmulni_intel sdhci_pci sdhci mmc_core wmi i915 drm_kms_helper drm i2c_algo_bit i2c_core video [last unloaded: scsi_wait_scan]
:[120634.184321] 
:[120634.184346] Pid: 3298, comm: gnome-terminal Not tainted 3.4.9-2.fc16.x86_64 #1 Dell Inc. Latitude E6320/0HFWHN
:[120634.184469] RIP: 0010:[<ffffffff81229977>]  [<ffffffff81229977>] ext4_ext_put_in_cache+0xd7/0xe0
:[120634.184584] RSP: 0018:ffff880124b1b6b8  EFLAGS: 00010246
:[120634.184646] RAX: 0000000000000000 RBX: ffff880102000e70 RCX: 0000000000000000
:[120634.184727] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff880102000e70
:[120634.184808] RBP: ffff880124b1b6f8 R08: 0000000000000000 R09: 0000000000000000
:[120634.184888] R10: ffff88001596d208 R11: ffffea0001031bc0 R12: ffff880102000e70
:[120634.184967] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
:[120634.185049] FS:  00007f1809a5f980(0000) GS:ffff88012dc00000(0000) knlGS:0000000000000000
:[120634.185140] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
:[120634.185206] CR2: 0000003db3c60608 CR3: 0000000109456000 CR4: 00000000000407f0
:[120634.185287] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
:[120634.185367] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
:[120634.185449] Process gnome-terminal (pid: 3298, threadinfo ffff880124b1a000, task ffff880036aa2e40)
:[120634.185547] Stack:
:[120634.185574]  ffff880102000e70 0000000000000001 ffff880124b1b758 ffff880102000e70
:[120634.185675]  ffff880124b1b8a0 ffff8800751e26c0 0000000000000000 ffff880049330e24
:[120634.185774]  ffff880124b1b828 ffffffff8122ca75 ffff88001596d1a0 000002c2aaaaaaab
:[120634.185875] Call Trace:
:[120634.185914]  [<ffffffff8122ca75>] ext4_ext_map_blocks+0x195/0x1c70
:[120634.185990]  [<ffffffff8116e91b>] ? kfree+0x3b/0x150
:[120634.186051]  [<ffffffff811251b5>] ? mempool_alloc_slab+0x15/0x20
:[120634.186124]  [<ffffffff8122c9a5>] ? ext4_ext_map_blocks+0xc5/0x1c70
:[120634.186202]  [<ffffffff81201ef9>] ext4_map_blocks+0x69/0x270
:[120634.186271]  [<ffffffff81204166>] _ext4_get_block+0xa6/0x160
:[120634.186341]  [<ffffffff81204286>] ext4_get_block+0x16/0x20
:[120634.186408]  [<ffffffff811b5934>] block_read_full_page+0x144/0x390
:[120634.186482]  [<ffffffff81201f04>] ? ext4_map_blocks+0x74/0x270
:[120634.186553]  [<ffffffff81204270>] ? noalloc_get_block_write+0x30/0x30
:[120634.186630]  [<ffffffff81204166>] ? _ext4_get_block+0xa6/0x160
:[120634.186701]  [<ffffffff811bd92f>] do_mpage_readpage+0x35f/0x5f0
:[120634.186776]  [<ffffffff81140843>] ? __inc_zone_page_state+0x33/0x40
:[120634.186853]  [<ffffffff81122d9d>] ? add_to_page_cache_locked+0xed/0x160
:[120634.186932]  [<ffffffff811bdcff>] mpage_readpages+0xcf/0x120
:[120634.187001]  [<ffffffff81204270>] ? noalloc_get_block_write+0x30/0x30
:[120634.190241]  [<ffffffff81204270>] ? noalloc_get_block_write+0x30/0x30
:[120634.193413]  [<ffffffff81163a66>] ? alloc_pages_current+0xb6/0x120
:[120634.196543]  [<ffffffff812005dd>] ext4_readpages+0x1d/0x20
:[120634.199669]  [<ffffffff8112e5a7>] __do_page_cache_readahead+0x1c7/0x270
:[120634.202820]  [<ffffffff8112e981>] ra_submit+0x21/0x30
:[120634.205962]  [<ffffffff8112eaa5>] ondemand_readahead+0x115/0x230
:[120634.209061]  [<ffffffff8112ec93>] page_cache_sync_readahead+0x33/0x50
:[120634.212039]  [<ffffffff811245b0>] generic_file_aio_read+0x4f0/0x780
:[120634.214890]  [<ffffffff810747bb>] ? flush_work+0x1b/0x40
:[120634.217624]  [<ffffffff812d6fa1>] ? list_del+0x11/0x40
:[120634.220246]  [<ffffffff811832ca>] do_sync_read+0xda/0x120
:[120634.222756]  [<ffffffff8126d453>] ? security_file_permission+0x93/0xb0
:[120634.225211]  [<ffffffff81183741>] ? rw_verify_area+0x61/0xf0
:[120634.227612]  [<ffffffff81183c20>] vfs_read+0xb0/0x180
:[120634.229952]  [<ffffffff81183d3a>] sys_read+0x4a/0x90
:[120634.232236]  [<ffffffff81605d69>] system_call_fastpath+0x16/0x1b
:[120634.234504] Code: 8b 55 cc 48 83 c3 10 4d 89 f8 44 89 e9 4c 89 e6 ff d0 48 8b 55 c0 48 89 d8 4c 29 f0 48 8b 44 02 f0 48 85 c0 75 d6 e9 6f ff ff ff <0f> 0b 0f 1f 80 00 00 00 00 55 48 89 e5 41 57 41 56 41 55 41 54 
:[120634.239832] RIP  [<ffffffff81229977>] ext4_ext_put_in_cache+0xd7/0xe0
:[120634.242404]  RSP <ffff880124b1b6b8>

Comment 1 Josh Boyer 2012-09-16 13:15:08 UTC
Eric, have you seen this one before?

Comment 2 Eric Sandeen 2012-09-17 16:26:58 UTC
bug #796714 and bug #829234 are similar, on 3.2.6 and 3.3.7 kernels.

They're all this BUG_ON():

1965 ext4_ext_put_in_cache(struct inode *inode, ext4_lblk_t block,
1966                         __u32 len, ext4_fsblk_t start)
1967 {
1968         struct ext4_ext_cache *cex;
1969         BUG_ON(len == 0);

All very different paths to it, though.

The call seems to get its len == 0 state from ee_len:

ee_len = ext4_ext_get_actual_len(ex); (if I have the right call to put_in_cache)

so I guess somehow we wound up creating a 0-length extent that we later ran into.

Not sure what is going on here, TBH.
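
For reference, a minimal user-space sketch of how the actual length is derived from ee_len, to show that a stored value of 0 passes straight through as len == 0. The struct and function names here are hypothetical stand-ins, not the kernel's definitions, and the sketch assumes the on-disk value has already been converted to host byte order:

```c
#include <stdint.h>

/* Lengths above this threshold mark an uninitialized (preallocated)
 * extent; this mirrors the kernel's EXT_INIT_MAX_LEN. */
#define SKETCH_INIT_MAX_LEN (1U << 15)

/* Hypothetical, simplified stand-in for the on-disk extent.
 * Only the length field matters for this illustration. */
struct sketch_extent {
    uint16_t ee_len;
};

/* Returns the number of blocks covered by the extent, stripping the
 * "uninitialized" encoding when present. A stored 0 yields 0. */
unsigned int sketch_get_actual_len(const struct sketch_extent *ex)
{
    return ex->ee_len <= SKETCH_INIT_MAX_LEN
        ? ex->ee_len
        : ex->ee_len - SKETCH_INIT_MAX_LEN;
}
```

Nothing in this computation can turn a stored 0 into a non-zero length, so a zero-length extent on disk would reach the BUG_ON() unchanged.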

Comment 5 Eric Sandeen 2012-09-19 16:34:35 UTC
commit 31d4f3a2f3c73f279ff96a7135d7202ef6833f12
Author: Theodore Ts'o <tytso>
Date:   Sun Mar 11 23:30:16 2012 -0400

    ext4: check for zero length extent
    
    Explicitly test for an extent whose length is zero, and flag that as a
    corrupted extent.
    
    This avoids a kernel BUG_ON assertion failure.
    
    Tested: Without this patch, the file system image found in
    tests/f_ext_zero_len/image.gz in the latest e2fsprogs sources causes a
    kernel panic.  With this patch, an ext4 file system error is noted
    instead, and the file system is marked as being corrupted.
    
    https://bugzilla.kernel.org/show_bug.cgi?id=42859
    
    Signed-off-by: "Theodore Ts'o" <tytso>
    Cc: stable


Backporting this would at least avoid a BUG(), yeah.  Thanks Lukas.
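
As a rough illustration of the idea in that commit message (not the actual patch), the sketch below rejects a zero-length extent up front and reports it as corruption, instead of letting it reach an assertion. All names, the status codes, and the message text are hypothetical and exist only for this user-space example:

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical result codes for this sketch; the kernel reports
 * corruption through its own error paths instead. */
enum demo_status { DEMO_OK = 0, DEMO_CORRUPT = -1 };

/* Returns false for an extent length the caller must not trust. */
bool demo_extent_len_is_valid(unsigned int len)
{
    return len != 0;
}

/* Hypothetical caller: validate before caching, so that a zero length
 * is flagged as corruption rather than tripping an assertion later. */
enum demo_status demo_cache_extent(unsigned long inode_no, unsigned int len)
{
    if (!demo_extent_len_is_valid(len)) {
        fprintf(stderr, "inode %lu: zero-length extent, treating as corrupt\n",
                inode_no);
        return DEMO_CORRUPT;
    }
    /* A real implementation would record the mapping here. */
    return DEMO_OK;
}
```

The point of checking early is that the caller gets an error it can handle, and the inode number is available for the report, rather than the whole machine stopping.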

Comment 6 Theodore Tso 2012-09-19 16:49:02 UTC
The problem is that by the time we hit the BUG in that case, the file system has already been corrupted.  So the only way to really fix this is to catch it after the file system has been corrupted, which xfstests will catch, since we run e2fsck after each test, and if the extent tree has gotten corrupted, we would catch that case (e2fsck problem PR_1_EXTENT_LENGTH_ZERO).  The fact that xfstests isn't triggering this means that it must happen under pretty unique and/or unusual circumstances.

If someone does see this corruption, either because the ext4_error_inode() fires in the kernel after backporting the above patch, or because e2fsck complains that an inode has a zero-length extent, it would be highly useful to get the pathname of the inode in question, which hopefully will give some hint about how the file was being used before it got corrupted.

Comment 7 Dave Jones 2012-10-23 15:41:11 UTC
# Mass update to all open bugs.

Kernel 3.6.2-1.fc16 has just been pushed to updates.
This update is a significant rebase from the previous version.

Please retest with this kernel, and let us know if your problem has been fixed.

In the event that you have upgraded to a newer release and the bug you reported
is still present, please change the version field to the newest release you have
encountered the issue with.  Before doing so, please ensure you are testing the
latest kernel update in that release and attach any new and relevant information
you may have gathered.

If you are not the original bug reporter and you still experience this bug,
please file a new report, as it is possible that you may be seeing a
different problem. 
(Please don't clone this bug; a fresh bug referencing this bug in the comment is sufficient.)

Comment 8 Justin M. Forbes 2012-11-14 17:04:53 UTC
With no response, we are closing this bug under the assumption that it is no longer an issue. If you still experience this bug, please feel free to reopen the bug report.

