Bug 852724 - Kernel oops on ext4 when moving very large files between filesystems
Summary: Kernel oops on ext4 when moving very large files between filesystems
Keywords:
Status: CLOSED DUPLICATE of bug 853875
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 17
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2012-08-29 12:18 UTC by Zoltan Boszormenyi
Modified: 2012-09-06 18:07 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-09-06 18:07:29 UTC
Type: Bug
Embargoed:



Description Zoltan Boszormenyi 2012-08-29 12:18:12 UTC
Description of problem:

I got these two Oopses in half an hour, which makes it reproducible I guess:

Aug 29 13:10:02 localhost kernel: [18756.675237] BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
Aug 29 13:10:02 localhost kernel: [18756.675361] IP: [<ffffffff81233194>] ext4_ext_remove_space+0xa34/0xdf0
Aug 29 13:10:02 localhost kernel: [18756.675456] PGD 2b296f067 PUD 426472067 PMD 0 
Aug 29 13:10:02 localhost kernel: [18756.675526] Oops: 0000 [#1] SMP 
Aug 29 13:10:02 localhost kernel: [18756.675577] CPU 7 
Aug 29 13:10:02 localhost kernel: [18756.675605] Modules linked in: cdc_acm vfat fat fuse bnep bluetooth ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 xt_CHECKSUM cxgb3i iptable_mangle cxgb3 mdio libcxgbi ib_iser rdma_cm ib_addr iw_cm ib_cm ib_sa ib_mad ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi it87 hwmon_vid bridge stp llc tpm_bios snd_hda_codec_hdmi r8169 usblp edac_core snd_hda_codec_realtek eeepc_wmi asus_wmi snd_hda_intel snd_hda_codec sparse_keymap rfkill snd_hwdep microcode edac_mce_amd snd_pcm snd_page_alloc snd_timer snd mii sp5100_tco serio_raw i2c_piix4 fam15h_power k10temp soundcore nfsd nfs_acl auth_rpcgss lockd sunrpc vhost_net tun macvtap macvlan kvm_amd kvm uinput binfmt_misc usb_storage crc32c_intel ghash_clmulni_intel firewire_ohci firewire_core crc_itu_t 3w_9xxx mxm_wmi wmi radeon i2c_algo_bit drm_kms_helper ttm drm i2c_core [last unlo
Aug 29 13:10:02 localhost kernel: aded: scsi_wait_scan]
Aug 29 13:10:02 localhost kernel: [18756.677121] 
Aug 29 13:10:02 localhost kernel: [18756.677128] Pid: 20381, comm: mv Not tainted 3.5.2-3.fc17.x86_64 #1 To be filled by O.E.M. To be filled by O.E.M./M5A99X EVO
Aug 29 13:10:02 localhost kernel: [18756.677285] RIP: 0010:[<ffffffff81233194>]  [<ffffffff81233194>] ext4_ext_remove_space+0xa34/0xdf0
Aug 29 13:10:02 localhost kernel: [18756.677407] RSP: 0018:ffff88037feafc98  EFLAGS: 00010246
Aug 29 13:10:02 localhost kernel: [18756.677481] RAX: 0000000000000000 RBX: ffff88020f8a6fb0 RCX: 0000000037dcfd00
Aug 29 13:10:02 localhost kernel: [18756.677574] RDX: 0000000000000001 RSI: 0000000000000002 RDI: ffff88040d9c8400
Aug 29 13:10:02 localhost kernel: [18756.677669] RBP: ffff88037feafd88 R08: 0000000037dcfd00 R09: ffff8803db411900
Aug 29 13:10:02 localhost kernel: [18756.677761] R10: 0000000014efb701 R11: 0000000000000000 R12: 0000000000000001
Aug 29 13:10:02 localhost kernel: [18756.677856] R13: ffff8803db411930 R14: 0000000000000000 R15: ffff88020f8a6fb0
Aug 29 13:10:02 localhost kernel: [18756.677950] FS:  00007f5b2b031800(0000) GS:ffff88043edc0000(0000) knlGS:00000000f77c47c0
Aug 29 13:10:02 localhost kernel: [18756.678057] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Aug 29 13:10:02 localhost kernel: [18756.678132] CR2: 0000000000000028 CR3: 000000042213f000 CR4: 00000000000407e0
Aug 29 13:10:02 localhost kernel: [18756.678227] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Aug 29 13:10:02 localhost kernel: [18756.678321] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Aug 29 13:10:02 localhost kernel: [18756.678414] Process mv (pid: 20381, threadinfo ffff88037feae000, task ffff880076c51710)
Aug 29 13:10:02 localhost kernel: [18756.678519] Stack:
Aug 29 13:10:02 localhost kernel: [18756.678548]  ffff88037feafcd8 ffffffff812364b3 ffff88037feafce8 ffff88020f8a6fb0
Aug 29 13:10:02 localhost kernel: [18756.678662]  ffff8803f7878400 ffff880300000002 ffff8803fd4d51a0 ffff88020f8a6f00
Aug 29 13:10:02 localhost kernel: [18756.678771]  ffff880427092c00 ffff8803db411960 00000000ffffffff ffff880314efb448
Aug 29 13:10:02 localhost kernel: [18756.678884] Call Trace:
Aug 29 13:10:02 localhost kernel: [18756.678924]  [<ffffffff812364b3>] ? __ext4_handle_dirty_metadata+0x83/0x110
Aug 29 13:10:02 localhost kernel: [18756.682733]  [<ffffffff81235403>] ext4_ext_truncate+0x193/0x1d0
Aug 29 13:10:02 localhost kernel: [18756.686481]  [<ffffffff8120a8ff>] ? ext4_mark_inode_dirty+0x7f/0x1f0
Aug 29 13:10:02 localhost kernel: [18756.690259]  [<ffffffff81207e35>] ext4_truncate+0xf5/0x100
Aug 29 13:10:02 localhost kernel: [18756.694024]  [<ffffffff8120cd81>] ext4_evict_inode+0x461/0x490
Aug 29 13:10:02 localhost kernel: [18756.697785]  [<ffffffff811a1342>] evict+0xa2/0x1a0
Aug 29 13:10:02 localhost kernel: [18756.701539]  [<ffffffff811a1543>] iput+0x103/0x1f0
Aug 29 13:10:02 localhost kernel: [18756.705290]  [<ffffffff81196db4>] do_unlinkat+0x154/0x1c0
Aug 29 13:10:02 localhost kernel: [18756.709067]  [<ffffffff81185d56>] ? filp_close+0x66/0xa0
Aug 29 13:10:02 localhost kernel: [18756.712768]  [<ffffffff81197b3b>] sys_unlinkat+0x1b/0x50
Aug 29 13:10:02 localhost kernel: [18756.716451]  [<ffffffff81614969>] system_call_fastpath+0x16/0x1b
Aug 29 13:10:02 localhost kernel: [18756.720072] Code: 8b 4d 20 0f b7 41 02 48 8d 04 40 48 8d 04 81 49 89 45 18 0f b7 49 02 48 83 c1 01 49 89 4d 00 e9 ae f8 ff ff 0f 1f 00 49 8b 45 28 <48> 8b 40 28 49 89 45 20 e9 85 f8 ff ff 0f 1f 80 00 00 00 00 0f 
Aug 29 13:10:02 localhost kernel: [18756.727929] RIP  [<ffffffff81233194>] ext4_ext_remove_space+0xa34/0xdf0
Aug 29 13:10:02 localhost kernel: [18756.731756]  RSP <ffff88037feafc98>
Aug 29 13:10:02 localhost kernel: [18756.735460] CR2: 0000000000000028
Aug 29 13:10:02 localhost kernel: [18756.777432] ---[ end trace 1fdafb58b77a4db9 ]---


Aug 29 13:39:52 localhost kernel: [ 1410.318534] BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
Aug 29 13:39:52 localhost kernel: [ 1410.323553] IP: [<ffffffff81233194>] ext4_ext_remove_space+0xa34/0xdf0
Aug 29 13:39:52 localhost kernel: [ 1410.326180] PGD 1e70e7067 PUD 2ba59d067 PMD 0 
Aug 29 13:39:52 localhost kernel: [ 1410.328817] Oops: 0000 [#1] SMP 
Aug 29 13:39:52 localhost kernel: [ 1410.331424] CPU 3 
Aug 29 13:39:52 localhost kernel: [ 1410.331452] Modules linked in: fuse bnep bluetooth ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i cxgb3 mdio libcxgbi ib_iser rdma_cm ib_addr iw_cm ib_cm xt_CHECKSUM ib_sa ib_mad ib_core iptable_mangle iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi it87 hwmon_vid bridge stp llc tpm_bios eeepc_wmi asus_wmi snd_hda_codec_hdmi snd_hda_codec_realtek sparse_keymap rfkill snd_hda_intel r8169 usblp snd_hda_codec snd_hwdep edac_core edac_mce_amd microcode snd_pcm snd_page_alloc snd_timer snd soundcore sp5100_tco mii i2c_piix4 fam15h_power serio_raw k10temp nfsd nfs_acl auth_rpcgss lockd sunrpc vhost_net tun macvtap macvlan kvm_amd kvm uinput binfmt_misc usb_storage crc32c_intel ghash_clmulni_intel firewire_ohci 3w_9xxx firewire_core crc_itu_t mxm_wmi wmi radeon i2c_algo_bit drm_kms_helper ttm drm i2c_core [last unloaded: scsi_wait_s
Aug 29 13:39:52 localhost kernel: can]
Aug 29 13:39:52 localhost kernel: [ 1410.353330] 
Aug 29 13:39:52 localhost kernel: [ 1410.356638] Pid: 2625, comm: mv Not tainted 3.5.2-3.fc17.x86_64 #1 To be filled by O.E.M. To be filled by O.E.M./M5A99X EVO
Aug 29 13:39:52 localhost kernel: [ 1410.360142] RIP: 0010:[<ffffffff81233194>]  [<ffffffff81233194>] ext4_ext_remove_space+0xa34/0xdf0
Aug 29 13:39:52 localhost kernel: [ 1410.363669] RSP: 0018:ffff8801e7179c98  EFLAGS: 00010246
Aug 29 13:39:52 localhost kernel: [ 1410.367175] RAX: 0000000000000000 RBX: ffff880425156c38 RCX: 0000000037dcfd00
Aug 29 13:39:52 localhost kernel: [ 1410.370709] RDX: 0000000000000001 RSI: 0000000000000002 RDI: ffff8804198ad400
Aug 29 13:39:52 localhost kernel: [ 1410.374229] RBP: ffff8801e7179d88 R08: 0000000037dcfd00 R09: ffff88028790f180
Aug 29 13:39:52 localhost kernel: [ 1410.377764] R10: 00000000edce6301 R11: 0000000000000000 R12: 0000000000000001
Aug 29 13:39:52 localhost kernel: [ 1410.381328] R13: ffff88028790f1b0 R14: 0000000000000000 R15: ffff880425156c38
Aug 29 13:39:52 localhost kernel: [ 1410.384905] FS:  00007f1eaa77b800(0000) GS:ffff88043ecc0000(0000) knlGS:00000000f76db7c0
Aug 29 13:39:52 localhost kernel: [ 1410.388578] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Aug 29 13:39:52 localhost kernel: [ 1410.391140] CR2: 0000000000000028 CR3: 00000001e703d000 CR4: 00000000000407e0
Aug 29 13:39:52 localhost kernel: [ 1410.392799] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Aug 29 13:39:52 localhost kernel: [ 1410.394438] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Aug 29 13:39:52 localhost kernel: [ 1410.396099] Process mv (pid: 2625, threadinfo ffff8801e7178000, task ffff8802ba5f8000)
Aug 29 13:39:52 localhost kernel: [ 1410.397790] Stack:
Aug 29 13:39:52 localhost kernel: [ 1410.399464]  ffff8801e7179cd8 ffffffff812364b3 ffff8801e7179ce8 ffff880425156c38
Aug 29 13:39:52 localhost kernel: [ 1410.401167]  ffff88032c3fd100 ffff880100000002 ffff8803d6f05e38 ffff880425156b88
Aug 29 13:39:52 localhost kernel: [ 1410.402859]  ffff880422f53800 ffff88028790f1e0 00000000ffffffff ffff8803edce6808
Aug 29 13:39:52 localhost kernel: [ 1410.404554] Call Trace:
Aug 29 13:39:52 localhost kernel: [ 1410.406212]  [<ffffffff812364b3>] ? __ext4_handle_dirty_metadata+0x83/0x110
Aug 29 13:39:52 localhost kernel: [ 1410.407806]  [<ffffffff81235403>] ext4_ext_truncate+0x193/0x1d0
Aug 29 13:39:52 localhost kernel: [ 1410.409369]  [<ffffffff8120a8ff>] ? ext4_mark_inode_dirty+0x7f/0x1f0
Aug 29 13:39:52 localhost kernel: [ 1410.410961]  [<ffffffff81207e35>] ext4_truncate+0xf5/0x100
Aug 29 13:39:52 localhost kernel: [ 1410.412509]  [<ffffffff8120cd81>] ext4_evict_inode+0x461/0x490
Aug 29 13:39:52 localhost kernel: [ 1410.414193]  [<ffffffff811a1342>] evict+0xa2/0x1a0
Aug 29 13:39:52 localhost kernel: [ 1410.415798]  [<ffffffff811a1543>] iput+0x103/0x1f0
Aug 29 13:39:52 localhost kernel: [ 1410.417334]  [<ffffffff81196db4>] do_unlinkat+0x154/0x1c0
Aug 29 13:39:52 localhost kernel: [ 1410.418901]  [<ffffffff81185d56>] ? filp_close+0x66/0xa0
Aug 29 13:39:52 localhost kernel: [ 1410.420434]  [<ffffffff81197b3b>] sys_unlinkat+0x1b/0x50
Aug 29 13:39:52 localhost kernel: [ 1410.423526]  [<ffffffff81614969>] system_call_fastpath+0x16/0x1b
Aug 29 13:39:52 localhost kernel: [ 1410.427708] Code: 8b 4d 20 0f b7 41 02 48 8d 04 40 48 8d 04 81 49 89 45 18 0f b7 49 02 48 83 c1 01 49 89 4d 00 e9 ae f8 ff ff 0f 1f 00 49 8b 45 28 <48> 8b 40 28 49 89 45 20 e9 85 f8 ff ff 0f 1f 80 00 00 00 00 0f 
Aug 29 13:39:52 localhost kernel: [ 1410.436657] RIP  [<ffffffff81233194>] ext4_ext_remove_space+0xa34/0xdf0
Aug 29 13:39:52 localhost kernel: [ 1410.440543]  RSP <ffff8801e7179c98>
Aug 29 13:39:52 localhost kernel: [ 1410.442200] CR2: 0000000000000028
Aug 29 13:39:52 localhost kernel: [ 1410.509322] ---[ end trace 102bd426d324d73e ]---

Both happened when I ran "mv dir1 dir2", where the two directories were on different filesystems and the source directory contained very large files, in the range of 4-16GB. After each Oops the machine had to be rebooted.

I didn't try to simply rm the files as they contain important data for me.

Version-Release number of selected component (if applicable):

# uname -r
3.5.2-3.fc17.x86_64

How reproducible:

Two out of two tries, so 100%.

Steps to Reproduce:
1. Have very large files (4-16GB) on an ext4 filesystem
2. Try to mv them to a different filesystem (plain rm is untested)
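
The steps above can be sketched in Python. This is only a rough approximation of what "mv" does across filesystems (copy the data, then unlink the source), not a confirmed reproducer. The helper names make_large_file and move_across are made up for illustration, and the data is written for real rather than as a sparse file, on the assumption that allocated extents matter here.

```python
import os
import shutil

CHUNK = 1024 * 1024  # write in 1 MiB chunks


def make_large_file(path, size_bytes):
    """Write size_bytes of real (non-sparse) data to path."""
    block = os.urandom(CHUNK)
    remaining = size_bytes
    with open(path, "wb") as f:
        while remaining > 0:
            n = min(CHUNK, remaining)
            f.write(block[:n])
            remaining -= n
        f.flush()
        os.fsync(f.fileno())


def move_across(src_dir, dst_dir):
    """Move every entry of src_dir into dst_dir and return the new paths.

    When src_dir and dst_dir are on different filesystems, shutil.move
    copies the file and then removes the source, which is where the
    unlink path shown in the call trace would be exercised.
    """
    moved = []
    for name in sorted(os.listdir(src_dir)):
        src = os.path.join(src_dir, name)
        dst = os.path.join(dst_dir, name)
        moved.append(shutil.move(src, dst))
    return moved
```

A run matching the report would call make_large_file with a size between 4 and 16GB inside a directory on the ext4 source partition, then move_across to a directory on the other disk. Only do this with scratch data.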
  
Actual results:

NULL pointer dereference.
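
For triage, the faulting instruction can be read off the "Code:" line of the Oops, where the byte in angle brackets marks the faulting RIP. The Python sketch below decodes only the one instruction form that appears there (REX.W MOV r64, [base + disp8]). The function names are made up for illustration, and this is not a general disassembler.

```python
# The "Code:" line from both Oopses above (identical in the two traces).
CODE_LINE = (
    "Code: 8b 4d 20 0f b7 41 02 48 8d 04 40 48 8d 04 81 49 89 45 18 "
    "0f b7 49 02 48 83 c1 01 49 89 4d 00 e9 ae f8 ff ff 0f 1f 00 "
    "49 8b 45 28 <48> 8b 40 28 49 89 45 20 e9 85 f8 ff ff "
    "0f 1f 80 00 00 00 00 0f"
)

# Register numbering for ModRM fields when REX.R and REX.B are clear.
REGS = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def faulting_instruction(code_line, length=4):
    """Return `length` bytes starting at the byte marked <xx>."""
    tokens = code_line.split(":", 1)[1].split()
    start = next(i for i, t in enumerate(tokens) if t.startswith("<"))
    return [int(t.strip("<>"), 16) for t in tokens[start:start + length]]


def decode_mov_disp8(insn):
    """Decode 'REX.W 8B /r' with an 8-bit displacement and no SIB byte.

    Returns (destination register, base register, signed displacement).
    """
    rex, opcode, modrm, disp = insn
    if rex != 0x48 or opcode != 0x8B:
        raise ValueError("only REX.W MOV r64, r/m64 is handled")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 1 or rm == 4:
        raise ValueError("only [base + disp8] without SIB is handled")
    if disp >= 0x80:  # disp8 is sign-extended
        disp -= 0x100
    return REGS[reg], REGS[rm], disp


dest, base, disp = decode_mov_disp8(faulting_instruction(CODE_LINE))
# Register values taken from the dump above: RAX is 0 in both Oopses.
registers = {"rax": 0x0}
fault_address = registers[base] + disp
```

The decoded instruction loads RAX from [RAX + 0x28]. With RAX equal to 0 in the register dump, the computed address is 0x28, which agrees with the CR2 value in both traces.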

Expected results:

No Oops; copying or moving files should succeed regardless of their size.

Additional info:

I have 16GB of memory in the machine (2x 8GB DDR3). I have run memtest86+ several times since the modules were installed; it showed no memory errors, and the machine is otherwise stable.

Comment 1 Zoltan Boszormenyi 2012-08-29 12:55:25 UTC
I have to add that the files being moved the first time were moved successfully after rebooting. The second Oops was on a different set of files.

The source partition is my /home which is on a 3Ware 9650SE 8LPML RAID, the partition is about 3.5TB. The target partition is on a 3TB external eSATA disk.

Comment 2 Dave Jones 2012-09-06 18:07:29 UTC

*** This bug has been marked as a duplicate of bug 853875 ***

