Red Hat Bugzilla – Bug 848000
Crash in ext4 while deleting a directory and running virtual machine(KVM)
Last modified: 2012-10-10 11:14:34 EDT
Created attachment 604238 [details]
/var/log/messages from boot and oops
Description of problem:
I received an oops in ext4 while removing some files.
1. I was running a Windows XP VM with qemu from a qcow2 image file.
At the moment of the crash I was removing VS2010 Express.
2. I started rtorrent and removed a few torrents.
No torrents were active (upload/download/hashing = 0).
I deleted the torrent files manually from a terminal.
I had deleted 3-4 files (>1GB), then tried to delete a directory with 3 files (2x19GB, 14GB) in it.
I received an ext4 oops.
3. After a restart the 51G dir was still there with its files unaffected.
4. I reproduced the bug by following the same steps (1, 2).
5. Restarted again. With no VM running I was able to remove the directory.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Run a VM. Start some IO inside.
2. Delete a directory with large content on host.
Kernel crash with ext4 oops message
Both the VM image file and the torrent directories are on one ext4 partition, /data, of size 390 GB.
#/etc/fstab for /data
UUID=9a840539-7e09-44d6-ae9e-ee079b090fad /data ext4 defaults 1 2
Linux i3.kpp 3.5.1-1.fc17.x86_64 #1 SMP Thu Aug 9 17:50:43 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
qemu-kvm -cpu qemu32,+x2apic -localtime -name WinXPx32 -m 1G -vga std -soundhw ac97 -usb -no-quit
-net nic,model=virtio,macaddr=52:54:00:00:05:02 -net tap,ifname=tap-xp32,script=no,downscript=no
Here's the oops itself:
[ 7860.850590] BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
[ 7860.850679] IP: [<ffffffff81233164>] ext4_ext_remove_space+0xa34/0xdf0
[ 7860.850713] PGD 3692b067 PUD 368b7067 PMD 0
[ 7860.850738] Oops: 0000 [#1] SMP
[ 7860.850758] CPU 0
[ 7860.851138] Pid: 3689, comm: rm Not tainted 3.5.1-1.fc17.x86_64 #1 LENOVO 4239CTO/4239CTO
[ 7860.851177] RIP: 0010:[<ffffffff81233164>] [<ffffffff81233164>] ext4_ext_remove_space+0xa34/0xdf0
[ 7860.851215] RSP: 0018:ffff8800368b5c98 EFLAGS: 00010246
[ 7860.851238] RAX: 0000000000000000 RBX: ffff880132162360 RCX: 00000000062e4200
[ 7860.851267] RDX: 0000000000000001 RSI: 0000000000000002 RDI: ffff8801fe9c7400
[ 7860.851295] RBP: ffff8800368b5d88 R08: 00000000062e4200 R09: 0000000000000000
[ 7860.851324] R10: ffff8801386ee600 R11: 0000000000000000 R12: 0000000000000001
[ 7860.851352] R13: ffff8801386ee630 R14: 0000000000000000 R15: ffff880132162360
[ 7860.851382] FS: 00007f5000f1b740(0000) GS:ffff88021e200000(0000) knlGS:0000000000000000
[ 7860.851414] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 7860.851438] CR2: 0000000000000028 CR3: 0000000036830000 CR4: 00000000000427e0
[ 7860.851467] DR0: 00000000000000a0 DR1: 0000000000000000 DR2: 0000000000000003
[ 7860.851496] DR3: 00000000000000b0 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 7860.851525] Process rm (pid: 3689, threadinfo ffff8800368b4000, task ffff8801e333ae20)
[ 7860.851557] Stack:
[ 7860.851566] ffff8800368b5cd8 ffffffff81236483 ffff8800368b5ce8 ffff880132162360
[ 7860.851603] ffff880161ee4100 ffff880000000002 ffff8801f4747c98 ffff8801321622b0
[ 7860.851639] ffff88020de6fc00 ffff8801386ee660 00000000ffffffff ffff8801386ee688
[ 7860.851674] Call Trace:
[ 7860.851718] [<ffffffff812353d3>] ext4_ext_truncate+0x193/0x1d0
[ 7860.851772] [<ffffffff81207e05>] ext4_truncate+0xf5/0x100
[ 7860.851796] [<ffffffff8120cd51>] ext4_evict_inode+0x461/0x490
[ 7860.851823] [<ffffffff811a1312>] evict+0xa2/0x1a0
[ 7860.851844] [<ffffffff811a1513>] iput+0x103/0x1f0
[ 7860.851866] [<ffffffff81196d84>] do_unlinkat+0x154/0x1c0
[ 7860.851947] [<ffffffff81197b0b>] sys_unlinkat+0x1b/0x50
[ 7860.851973] [<ffffffff816135e9>] system_call_fastpath+0x16/0x1b
[ 7860.851997] Code: 8b 4d 20 0f b7 41 02 48 8d 04 40 48 8d 04 81 49 89 45 18 0f b7 49 02 48 83 c1 01 49 89 4d 00 e9 ae f8 ff ff 0f 1f 00 49 8b 45 28 <48> 8b 40 28 49 89 45 20 e9 85 f8 ff ff 0f 1f 80 00 00 00 00 0f
[ 7860.852193] RIP [<ffffffff81233164>] ext4_ext_remove_space+0xa34/0xdf0
[ 7860.852222] RSP <ffff8800368b5c98>
[ 7860.852237] CR2: 0000000000000028
[ 7860.875640] ---[ end trace 9451c5c282064bc5 ]---
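For anyone comparing their own log against this report, here is a minimal Python sketch that pulls the key fields out of an oops like the one above: the faulting address, the function and offset at the IP, the RAX value, and the call chain. The function name `summarize_oops` and the regular expressions are illustrative assumptions written for this log format only, not part of any kernel tooling. As a side note, the marked bytes `<48> 8b 40 28` in the Code line decode on x86-64 to a load from offset 0x28 off RAX, which together with RAX = 0 is consistent with CR2 = 0x28.

```python
import re

# Lines copied verbatim from the oops above (only the ones the parser needs).
OOPS_EXCERPT = """\
[ 7860.850590] BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
[ 7860.850679] IP: [<ffffffff81233164>] ext4_ext_remove_space+0xa34/0xdf0
[ 7860.851238] RAX: 0000000000000000 RBX: ffff880132162360 RCX: 00000000062e4200
[ 7860.851674] Call Trace:
[ 7860.851718] [<ffffffff812353d3>] ext4_ext_truncate+0x193/0x1d0
[ 7860.851772] [<ffffffff81207e05>] ext4_truncate+0xf5/0x100
[ 7860.851796] [<ffffffff8120cd51>] ext4_evict_inode+0x461/0x490
[ 7860.851823] [<ffffffff811a1312>] evict+0xa2/0x1a0
[ 7860.851844] [<ffffffff811a1513>] iput+0x103/0x1f0
[ 7860.851866] [<ffffffff81196d84>] do_unlinkat+0x154/0x1c0
[ 7860.851947] [<ffffffff81197b0b>] sys_unlinkat+0x1b/0x50
[ 7860.851973] [<ffffffff816135e9>] system_call_fastpath+0x16/0x1b
"""

# Matches "[<address>] symbol+0x..." and captures only the symbol name.
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\] (\w+)\+0x")


def summarize_oops(text):
    """Extract fault address, faulting symbol, RAX and call trace (hypothetical helper)."""
    fault = re.search(r"NULL pointer dereference at ([0-9a-f]+)", text)
    ip = re.search(r"IP: \[<[0-9a-f]+>\] (\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)", text)
    rax = re.search(r"RAX: ([0-9a-f]+)", text)
    if not (fault and ip):
        raise ValueError("text does not look like a NULL-dereference oops")

    # Only collect frames between "Call Trace:" and the "Code:" line, if present.
    _, marker, after_marker = text.partition("Call Trace:")
    frames = FRAME_RE.findall(after_marker.partition("Code:")[0]) if marker else []

    return {
        "fault_address": int(fault.group(1), 16),
        "symbol": ip.group(1),
        "offset": int(ip.group(2), 16),
        "function_size": int(ip.group(3), 16),
        "rax": int(rax.group(1), 16) if rax else None,
        "call_trace": frames,
    }


summary = summarize_oops(OOPS_EXCERPT)
```

Reading `summary["call_trace"]` from bottom to top gives the path from the `unlinkat` system call issued by `rm` down to the ext4 truncate code where the fault occurred.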
I've seen this with 3.5 kernels without any VMs running, although I do have the kvm module loaded.
Basically any attempt to delete a large directory was causing it to panic - going back to the 3.4.3 kernel resolved the problem.
I can confirm this bug using the kernel 3.5.2.
Although I had no VM running, the kvm module was loaded.
I've hit this bug four times in 2 days: three times while deleting some torrents from deluge and once while deleting a video file from a USB thumb drive using the Nautilus file manager.
I think this thread is strictly related: https://lkml.org/lkml/2012/8/15/372
Yes, it's being discussed upstream in that thread. Ted's looking for a fix, I think.
*** Bug 849318 has been marked as a duplicate of this bug. ***
The fix is in now - commit 89a4e48f84.
(In reply to comment #6)
> The fix is in now - commit 89a4e48f84.
Queued for 3.5.3. We'll pick it up from there.
I received the same message at least ten times from Thursday to Monday in Fedora 17. I am not running a VM, but I have the kvm module loaded. I was not deleting but moving big mkv files from one hard disk to another in the same computer, or to external USB hard disk devices, using the Nautilus file manager.
This should have been fixed with the 3.5.3 update. We're at 3.5.6 now and moving to the 3.6.1 kernel this week, so closing this out.
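To check whether a given machine is still on a kernel in the range discussed here, a rough Python sketch follows. It assumes, based only on the comments above, that 3.5.0 through 3.5.2 are affected (3.4.3 was reported fine, and the fix was queued for 3.5.3). The names `parse_kernel_version`, `likely_affected`, `FIRST_AFFECTED` and `FIRST_FIXED` are hypothetical, and distribution kernels with backported patches may not follow this range.

```python
import re

# Assumed range, taken from the comments in this bug: reports mention
# "3.5 kernels", 3.4.3 worked, and the fix was queued for 3.5.3.
FIRST_AFFECTED = (3, 5, 0)
FIRST_FIXED = (3, 5, 3)


def parse_kernel_version(release):
    """Turn a release string such as '3.5.1-1.fc17.x86_64' into (3, 5, 1)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    # A missing patch level (e.g. '3.5') is treated as 0.
    return tuple(int(part) if part else 0 for part in match.groups())


def likely_affected(release):
    """Return True if the release falls in the assumed affected range."""
    return FIRST_AFFECTED <= parse_kernel_version(release) < FIRST_FIXED
```

On a running system the release string can be obtained with `platform.release()` from the Python standard library, for example `likely_affected(platform.release())`.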