Bug 848000

Summary: Crash in ext4 while deleting a directory and running a virtual machine (KVM)
Product: [Fedora] Fedora
Reporter: Kaloyan Petrov <kaloyan_petrov>
Component: kernel
Assignee: Eric Sandeen <esandeen>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high
Priority: unspecified
Version: 17
CC: aab, clopezsandez, gansalmon, itamar, jonathan, kernel-maint, k.s.matheussen, lczerner, madhu.chinakonda, piotrdrag, public.oss, tom, webreg
Hardware: x86_64   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2012-10-10 15:14:34 UTC
Type: Bug
Attachments:
  /var/log/messages from boot and oops (flags: none)

Description Kaloyan Petrov 2012-08-14 09:49:32 UTC
Created attachment 604238
/var/log/messages from boot and oops

Description of problem:
I received an oops in ext4 while removing some files.
1. I was running a Windows XP VM under qemu from a qcow2 image file.
   At the moment of the crash, I was uninstalling VS2010 Express in the VM.

2. I started rtorrent and removed a few torrents.
   No torrents were active (upload/download/hashing = 0).
   I deleted the torrent files manually from a terminal.
   I deleted 3-4 files (>1 GB), then tried to delete a directory containing 3 files (2x 19 GB, 14 GB).
   I received an ext4 oops.
3. After a restart, the 51G directory was still there, with its files unaffected.
4. I reproduced the bug by following the same steps (1 and 2).
5. I restarted again. With no VM running, I was able to remove the directory.

Version-Release number of selected component (if applicable):
kernel-3.5.1-1.fc17.x86_64

How reproducible:

Steps to Reproduce:
1. Run a VM and start some I/O inside it.
2. Delete a directory containing large files on the host.
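Step 2 can be scripted. This is a minimal Python sketch, assuming a scratch ext4 filesystem you can afford to crash; it does not cover step 1 (the VM I/O), and `make_large_dir` and `remove_dir` are hypothetical helper names. Blocks are preallocated with `os.posix_fallocate` rather than written, which is fast but leaves unwritten extents on ext4, so the on-disk layout will differ from files downloaded by rtorrent.

```python
import os
import shutil
import tempfile


def make_large_dir(parent, sizes_bytes):
    """Create a new directory under `parent` with one file per (positive) size."""
    target = tempfile.mkdtemp(prefix="rm-repro-", dir=parent)
    for i, size in enumerate(sizes_bytes):
        path = os.path.join(target, f"file{i}.bin")
        with open(path, "wb") as f:
            if hasattr(os, "posix_fallocate"):
                # Allocate real blocks without writing data (unwritten extents on ext4).
                os.posix_fallocate(f.fileno(), 0, size)
            else:
                # Fallback where posix_fallocate is unavailable: sparse file, no blocks.
                f.truncate(size)
    return target


def remove_dir(target):
    """Unlink every file, then remove the directory, much as `rm -r` does."""
    shutil.rmtree(target)
```

For instance, `remove_dir(make_large_dir("/mnt/scratch", [19 << 30, 19 << 30, 14 << 30]))` would roughly mirror the reporter's 19 GB, 19 GB and 14 GB files on a hypothetical `/mnt/scratch` mount.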
  
Actual results:
Kernel crash with an ext4 oops message.

Expected results:
The directory is deleted.

Additional info:
Both the VM image file and the torrent directories are on a single 390 GB ext4 partition, /data.
#/etc/fstab for /data
UUID=9a840539-7e09-44d6-ae9e-ee079b090fad /data                   ext4    defaults        1 2
#uname -a
Linux i3.kpp 3.5.1-1.fc17.x86_64 #1 SMP Thu Aug 9 17:50:43 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

#qemu command
qemu-kvm -cpu qemu32,+x2apic -localtime -name WinXPx32 -m 1G -vga std -soundhw ac97 -usb -no-quit \
  -net nic,model=virtio,macaddr=52:54:00:00:05:02 -net tap,ifname=tap-xp32,script=no,downscript=no \
  /data/vm/winxp.qcow2 &

Comment 1 Eric Sandeen 2012-08-14 15:55:04 UTC
Here's the oops itself:

[ 7860.850590] BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
[ 7860.850679] IP: [<ffffffff81233164>] ext4_ext_remove_space+0xa34/0xdf0
[ 7860.850713] PGD 3692b067 PUD 368b7067 PMD 0 
[ 7860.850738] Oops: 0000 [#1] SMP 
[ 7860.850758] CPU 0 
[ 7860.851134] 
[ 7860.851138] Pid: 3689, comm: rm Not tainted 3.5.1-1.fc17.x86_64 #1 LENOVO 4239CTO/4239CTO
[ 7860.851177] RIP: 0010:[<ffffffff81233164>]  [<ffffffff81233164>] ext4_ext_remove_space+0xa34/0xdf0
[ 7860.851215] RSP: 0018:ffff8800368b5c98  EFLAGS: 00010246
[ 7860.851238] RAX: 0000000000000000 RBX: ffff880132162360 RCX: 00000000062e4200
[ 7860.851267] RDX: 0000000000000001 RSI: 0000000000000002 RDI: ffff8801fe9c7400
[ 7860.851295] RBP: ffff8800368b5d88 R08: 00000000062e4200 R09: 0000000000000000
[ 7860.851324] R10: ffff8801386ee600 R11: 0000000000000000 R12: 0000000000000001
[ 7860.851352] R13: ffff8801386ee630 R14: 0000000000000000 R15: ffff880132162360
[ 7860.851382] FS:  00007f5000f1b740(0000) GS:ffff88021e200000(0000) knlGS:0000000000000000
[ 7860.851414] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 7860.851438] CR2: 0000000000000028 CR3: 0000000036830000 CR4: 00000000000427e0
[ 7860.851467] DR0: 00000000000000a0 DR1: 0000000000000000 DR2: 0000000000000003
[ 7860.851496] DR3: 00000000000000b0 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 7860.851525] Process rm (pid: 3689, threadinfo ffff8800368b4000, task ffff8801e333ae20)
[ 7860.851557] Stack:
[ 7860.851566]  ffff8800368b5cd8 ffffffff81236483 ffff8800368b5ce8 ffff880132162360
[ 7860.851603]  ffff880161ee4100 ffff880000000002 ffff8801f4747c98 ffff8801321622b0
[ 7860.851639]  ffff88020de6fc00 ffff8801386ee660 00000000ffffffff ffff8801386ee688
[ 7860.851674] Call Trace:
[ 7860.851718]  [<ffffffff812353d3>] ext4_ext_truncate+0x193/0x1d0
[ 7860.851772]  [<ffffffff81207e05>] ext4_truncate+0xf5/0x100
[ 7860.851796]  [<ffffffff8120cd51>] ext4_evict_inode+0x461/0x490
[ 7860.851823]  [<ffffffff811a1312>] evict+0xa2/0x1a0
[ 7860.851844]  [<ffffffff811a1513>] iput+0x103/0x1f0
[ 7860.851866]  [<ffffffff81196d84>] do_unlinkat+0x154/0x1c0
[ 7860.851947]  [<ffffffff81197b0b>] sys_unlinkat+0x1b/0x50
[ 7860.851973]  [<ffffffff816135e9>] system_call_fastpath+0x16/0x1b
[ 7860.851997] Code: 8b 4d 20 0f b7 41 02 48 8d 04 40 48 8d 04 81 49 89 45 18 0f b7 49 02 48 83 c1 01 49 89 4d 00 e9 ae f8 ff ff 0f 1f 00 49 8b 45 28 <48> 8b 40 28 49 89 45 20 e9 85 f8 ff ff 0f 1f 80 00 00 00 00 0f 
[ 7860.852193] RIP  [<ffffffff81233164>] ext4_ext_remove_space+0xa34/0xdf0
[ 7860.852222]  RSP <ffff8800368b5c98>
[ 7860.852237] CR2: 0000000000000028
[ 7860.875640] ---[ end trace 9451c5c282064bc5 ]---
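The oops above can be cross-checked by hand. In the `Code:` line, the byte in angle brackets marks the start of the faulting instruction, and CR2 holds the address that faulted. This Python sketch decodes only the one encoding that appears around the marker (`mov r64, [base + disp8]`); it is not a general x86 decoder, and the helper names are illustrative.

```python
# Decodes only "mov r64, [base + disp8]" (REX.W 8B /r with ModRM.mod = 01).
REGS = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
        "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]

# Values copied from the oops in Comment 1.
CODE_LINE = ("8b 4d 20 0f b7 41 02 48 8d 04 40 48 8d 04 81 49 89 45 18 0f b7 49 02 "
             "48 83 c1 01 49 89 4d 00 e9 ae f8 ff ff 0f 1f 00 49 8b 45 28 <48> 8b 40 28 "
             "49 89 45 20 e9 85 f8 ff ff 0f 1f 80 00 00 00 00 0f")
REGISTERS = {"rax": 0x0000000000000000, "r13": 0xFFFF8801386EE630}
CR2 = 0x0000000000000028


def split_code(line):
    """Return (bytes before the <..> marker, bytes from the marked byte onward)."""
    tokens = line.split()
    start = next(i for i, t in enumerate(tokens) if t.startswith("<"))
    values = [int(t.strip("<>"), 16) for t in tokens]
    return values[:start], values[start:]


def decode_load(code):
    """Decode a 4-byte 'mov r64, [base + disp8]' and return (dest, base, disp)."""
    rex, opcode, modrm, disp = code[:4]
    if rex & 0xF8 != 0x48 or opcode != 0x8B or modrm >> 6 != 0b01 or modrm & 7 == 4:
        raise ValueError("unsupported encoding")
    dest = REGS[((modrm >> 3) & 7) | ((rex & 0x4) << 1)]  # REX.R extends ModRM.reg
    base = REGS[(modrm & 7) | ((rex & 0x1) << 3)]        # REX.B extends ModRM.rm
    if disp >= 0x80:
        disp -= 0x100                                    # disp8 is sign-extended
    return dest, base, disp


def fault_address(code, registers):
    """Effective address of the load, using register values from the oops dump."""
    _, base, disp = decode_load(code)
    return (registers[base] + disp) & 0xFFFFFFFFFFFFFFFF
```

Decoding the marked bytes gives `("rax", "rax", 0x28)`: a load from RAX + 0x28 with RAX = 0, which matches CR2 = 0x28 and the "NULL pointer dereference at 0000000000000028" line. The instruction just before it (`49 8b 45 28`) decodes to `("rax", "r13", 0x28)`, so the NULL value was itself read from offset 0x28 of whatever R13 points to. Identifying which ext4 structure that is would need the kernel's debug info; this sketch does not answer that.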

Comment 2 Tom Hughes 2012-08-15 23:34:52 UTC
I've seen this with 3.5 kernels without any VMs running, although I do have the kvm module loaded.

Basically every attempt to delete a large directory caused a panic; going back to the 3.4.3 kernel resolved the problem.

Comment 3 Muflone 2012-08-16 21:21:02 UTC
I can confirm this bug with kernel 3.5.2.
I had no VM running, but the kvm module was loaded.

I've hit this bug four times in two days: three times while deleting torrents from Deluge and once while deleting a video file from a USB thumb drive using the Nautilus file manager.

I think this thread is closely related: https://lkml.org/lkml/2012/8/15/372

Comment 4 Eric Sandeen 2012-08-16 21:47:00 UTC
Yes, it's being discussed upstream in that thread.  Ted's looking for a fix I think.

Comment 5 Kjetil Matheussen 2012-08-20 12:59:47 UTC
*** Bug 849318 has been marked as a duplicate of this bug. ***

Comment 6 Tom Hughes 2012-08-20 13:03:14 UTC
The fix is in now - commit 89a4e48f84.

Comment 7 Josh Boyer 2012-08-20 14:10:05 UTC
(In reply to comment #6)
> The fix is in now - commit 89a4e48f84.

Queued for 3.5.3.  We'll pick it up from there.

Comment 8 Ceferino M. Lopez-Sandez 2012-09-04 08:57:52 UTC
I received the same message at least ten times from Thursday to Monday on Fedora 17. I am not running a VM, but I have the kvm module loaded. I was not deleting files but moving large MKV files from one hard disk to another in the same computer, or to external USB hard disks, using the Nautilus file manager.

Comment 9 Josh Boyer 2012-10-10 15:14:34 UTC
This should have been fixed with 3.5.3 update.  We're at 3.5.6 now and moving to the 3.6.1 kernel this week, so closing this out.
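Comment 9 places the fix in the 3.5.3 update. A minimal Python sketch for checking a `uname -r` release string against that, assuming (from the reports in this bug only: 3.5.1 and 3.5.2 crashed, 3.4.3 did not) that 3.5.x releases before 3.5.3 are the ones at risk; other release series are deliberately not considered, and the function names are hypothetical.

```python
import re

# Assumption from this bug's comments: the fix shipped in 3.5.3.
FIXED_IN = (3, 5, 3)


def parse_release(release):
    """Turn a uname release such as '3.5.1-1.fc17.x86_64' into (3, 5, 1)."""
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if m is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    major, minor, patch = m.groups()
    return int(major), int(minor), int(patch or 0)


def possibly_affected(release):
    """True only for 3.5.x releases older than 3.5.3, under the assumption above."""
    version = parse_release(release)
    return version[:2] == FIXED_IN[:2] and version < FIXED_IN
```

On the reporter's machine, `possibly_affected(platform.release())` would have returned `True` for `3.5.1-1.fc17.x86_64`.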