Bug 848000 - Crash in ext4 while deleting a directory and running virtual machine(KVM)
Status: CLOSED ERRATA
Product: Fedora
Classification: Fedora
Component: kernel
Version: 17
Hardware: x86_64 Linux
Priority: unspecified
Severity: high
Assigned To: Eric Sandeen
QA Contact: Fedora Extras Quality Assurance
Duplicates: 849318
Depends On:
Blocks:
Reported: 2012-08-14 05:49 EDT by Kaloyan Petrov
Modified: 2012-10-10 11:14 EDT

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2012-10-10 11:14:34 EDT
Type: Bug


Attachments
/var/log/messages from boot and oops (229.14 KB, text/plain)
2012-08-14 05:49 EDT, Kaloyan Petrov

Description Kaloyan Petrov 2012-08-14 05:49:32 EDT
Created attachment 604238
/var/log/messages from boot and oops

Description of problem:
I received an oops in ext4 while removing some files.
1. I was running a Windows XP VM with qemu from a qcow2 image file.
   At the moment of the crash I was removing VS2010 Express.
2. I started rtorrent and removed a few torrents.
   No torrents were active (upload/download/hashing = 0).
   I deleted the torrent files manually from a terminal.
   I deleted 3-4 files (>1GB), then tried to delete a directory with 3 files (2x19GB, 14GB) in it.
   I received an ext4 oops.
3. After a restart the 51G directory was still there, with its files unaffected.
4. I reproduced the bug by following the same steps (1, 2).
5. I restarted again. With no VM running I was able to remove the directory.

Version-Release number of selected component (if applicable):
kernel-3.5.1-1.fc17.x86_64

How reproducible:

Steps to Reproduce:
1. Run a VM. Start some IO inside.
2. Delete a directory with large content on host.
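
Step 2 above can be approximated with a small script. This is only a rough sketch, not something taken from the report: the helper names make_large_dir and delete_dir, the scratch location, and the file sizes are all hypothetical, and there is no guarantee that it triggers the oops. On an affected kernel it may crash the host, so it should only be run on a test machine.

```python
import os
import shutil
import tempfile


def make_large_dir(parent, file_sizes, chunk=1 << 20):
    """Create a directory under `parent` holding one file per entry in
    `file_sizes` (sizes in bytes). Data is really written, not left sparse,
    so the filesystem has to allocate blocks for it."""
    target = tempfile.mkdtemp(prefix="bigdir-", dir=parent)
    block = b"\0" * chunk
    for index, size in enumerate(file_sizes):
        path = os.path.join(target, "file%d.bin" % index)
        with open(path, "wb") as handle:
            remaining = size
            while remaining > 0:
                count = min(chunk, remaining)
                handle.write(block[:count])
                remaining -= count
            handle.flush()
            os.fsync(handle.fileno())  # make sure the data reaches the disk
    return target


def delete_dir(path):
    """Remove the directory and everything in it, like `rm -r`."""
    shutil.rmtree(path)


# Hypothetical usage on a test machine, with a VM doing I/O in the background.
# "/data/scratch" and the sizes (2 x 19 GB, 14 GB) mirror the report but are
# assumptions, not a verified reproducer:
#
#   GB = 1 << 30
#   big = make_large_dir("/data/scratch", [19 * GB, 19 * GB, 14 * GB])
#   delete_dir(big)
```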
  
Actual results:
Kernel crash with ext4 oops message

Expected results:
The directory is deleted.

Additional info:
Both the VM image file and the torrent directories are on a single ext4 partition, /data, of size 390 GB.
#/etc/fstab for /data
UUID=9a840539-7e09-44d6-ae9e-ee079b090fad /data                   ext4    defaults        1 2
#uname -a
Linux i3.kpp 3.5.1-1.fc17.x86_64 #1 SMP Thu Aug 9 17:50:43 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

#qemu command
qemu-kvm -cpu qemu32,+x2apic -localtime -name WinXPx32 -m 1G -vga std -soundhw ac97 -usb -no-quit
-net nic,model=virtio,macaddr=52:54:00:00:05:02 -net tap,ifname=tap-xp32,script=no,downscript=no
/data/vm/winxp.qcow2 &
Comment 1 Eric Sandeen 2012-08-14 11:55:04 EDT
Here's the oops itself:

[ 7860.850590] BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
[ 7860.850679] IP: [<ffffffff81233164>] ext4_ext_remove_space+0xa34/0xdf0
[ 7860.850713] PGD 3692b067 PUD 368b7067 PMD 0 
[ 7860.850738] Oops: 0000 [#1] SMP 
[ 7860.850758] CPU 0 
[ 7860.851134] 
[ 7860.851138] Pid: 3689, comm: rm Not tainted 3.5.1-1.fc17.x86_64 #1 LENOVO 4239CTO/4239CTO
[ 7860.851177] RIP: 0010:[<ffffffff81233164>]  [<ffffffff81233164>] ext4_ext_remove_space+0xa34/0xdf0
[ 7860.851215] RSP: 0018:ffff8800368b5c98  EFLAGS: 00010246
[ 7860.851238] RAX: 0000000000000000 RBX: ffff880132162360 RCX: 00000000062e4200
[ 7860.851267] RDX: 0000000000000001 RSI: 0000000000000002 RDI: ffff8801fe9c7400
[ 7860.851295] RBP: ffff8800368b5d88 R08: 00000000062e4200 R09: 0000000000000000
[ 7860.851324] R10: ffff8801386ee600 R11: 0000000000000000 R12: 0000000000000001
[ 7860.851352] R13: ffff8801386ee630 R14: 0000000000000000 R15: ffff880132162360
[ 7860.851382] FS:  00007f5000f1b740(0000) GS:ffff88021e200000(0000) knlGS:0000000000000000
[ 7860.851414] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 7860.851438] CR2: 0000000000000028 CR3: 0000000036830000 CR4: 00000000000427e0
[ 7860.851467] DR0: 00000000000000a0 DR1: 0000000000000000 DR2: 0000000000000003
[ 7860.851496] DR3: 00000000000000b0 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 7860.851525] Process rm (pid: 3689, threadinfo ffff8800368b4000, task ffff8801e333ae20)
[ 7860.851557] Stack:
[ 7860.851566]  ffff8800368b5cd8 ffffffff81236483 ffff8800368b5ce8 ffff880132162360
[ 7860.851603]  ffff880161ee4100 ffff880000000002 ffff8801f4747c98 ffff8801321622b0
[ 7860.851639]  ffff88020de6fc00 ffff8801386ee660 00000000ffffffff ffff8801386ee688
[ 7860.851674] Call Trace:
[ 7860.851718]  [<ffffffff812353d3>] ext4_ext_truncate+0x193/0x1d0
[ 7860.851772]  [<ffffffff81207e05>] ext4_truncate+0xf5/0x100
[ 7860.851796]  [<ffffffff8120cd51>] ext4_evict_inode+0x461/0x490
[ 7860.851823]  [<ffffffff811a1312>] evict+0xa2/0x1a0
[ 7860.851844]  [<ffffffff811a1513>] iput+0x103/0x1f0
[ 7860.851866]  [<ffffffff81196d84>] do_unlinkat+0x154/0x1c0
[ 7860.851947]  [<ffffffff81197b0b>] sys_unlinkat+0x1b/0x50
[ 7860.851973]  [<ffffffff816135e9>] system_call_fastpath+0x16/0x1b
[ 7860.851997] Code: 8b 4d 20 0f b7 41 02 48 8d 04 40 48 8d 04 81 49 89 45 18 0f b7 49 02 48 83 c1 01 49 89 4d 00 e9 ae f8 ff ff 0f 1f 00 49 8b 45 28 <48> 8b 40 28 49 89 45 20 e9 85 f8 ff ff 0f 1f 80 00 00 00 00 0f 
[ 7860.852193] RIP  [<ffffffff81233164>] ext4_ext_remove_space+0xa34/0xdf0
[ 7860.852222]  RSP <ffff8800368b5c98>
[ 7860.852237] CR2: 0000000000000028
[ 7860.875640] ---[ end trace 9451c5c282064bc5 ]---
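
When comparing reports like this one, the fault address and the symbol+offset frames of an oops can be pulled out mechanically. The following is a minimal Python sketch, not part of the original comment: the function name summarize_oops and the regular expressions are illustrative assumptions, and only a few lines of the log above are embedded as sample input.

```python
import re

# A few lines copied from the oops above, used as sample input.
SAMPLE_OOPS = """\
[ 7860.850590] BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
[ 7860.850679] IP: [<ffffffff81233164>] ext4_ext_remove_space+0xa34/0xdf0
[ 7860.851718]  [<ffffffff812353d3>] ext4_ext_truncate+0x193/0x1d0
[ 7860.851772]  [<ffffffff81207e05>] ext4_truncate+0xf5/0x100
"""

# Matches "[<address>] symbol+0xoffset/0xsize" as printed by the kernel.
FRAME = re.compile(r"\[<([0-9a-f]+)>\]\s+([\w.]+)\+0x([0-9a-f]+)/0x([0-9a-f]+)")
FAULT = re.compile(r"NULL pointer dereference at ([0-9a-f]+)")


def summarize_oops(text):
    """Return the faulting address (or None) and a list of
    (symbol, offset) tuples in the order they appear in the log."""
    fault_address = None
    frames = []
    for line in text.splitlines():
        fault = FAULT.search(line)
        if fault:
            fault_address = int(fault.group(1), 16)
        frame = FRAME.search(line)
        if frame:
            frames.append((frame.group(2), int(frame.group(3), 16)))
    return {"fault_address": fault_address, "frames": frames}


summary = summarize_oops(SAMPLE_OOPS)
print(hex(summary["fault_address"]))
for symbol, offset in summary["frames"]:
    print("%s+%#x" % (symbol, offset))
```

Two reports whose first frame is ext4_ext_remove_space with the same offset, on the same kernel build, are likely to be the same crash; offsets differ between builds, so the symbol name is the more portable key.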
Comment 2 Tom Hughes 2012-08-15 19:34:52 EDT
I've seen this with 3.5 kernels without any VMs running, although I do have the kvm module loaded.

Basically any attempt to delete a large directory was causing it to panic - going back to the 3.4.3 kernel resolved the problem.
Comment 3 Muflone 2012-08-16 17:21:02 EDT
I can confirm this bug with kernel 3.5.2.
I had no VM running, but the kvm module was loaded.

I've hit this bug four times in 2 days: three times while deleting some torrents from deluge and once while deleting a video file from a USB thumb drive using the Nautilus file manager.

I think this thread is closely related: https://lkml.org/lkml/2012/8/15/372
Comment 4 Eric Sandeen 2012-08-16 17:47:00 EDT
Yes, it's being discussed upstream in that thread.  Ted's looking for a fix I think.
Comment 5 Kjetil Matheussen 2012-08-20 08:59:47 EDT
*** Bug 849318 has been marked as a duplicate of this bug. ***
Comment 6 Tom Hughes 2012-08-20 09:03:14 EDT
The fix is in now - commit 89a4e48f84.
Comment 7 Josh Boyer 2012-08-20 10:10:05 EDT
(In reply to comment #6)
> The fix is in now - commit 89a4e48f84.

Queued for 3.5.3.  We'll pick it up from there.
Comment 8 Ceferino M. Lopez-Sandez 2012-09-04 04:57:52 EDT
I received the same message at least ten times from Thursday to Monday in Fedora 17. I am not running a VM, but I have the kvm module loaded. I was not deleting but moving big mkv files from one hard disk to another in the same computer, or to external USB hard disk devices, using the Nautilus file manager.
Comment 9 Josh Boyer 2012-10-10 11:14:34 EDT
This should have been fixed with the 3.5.3 update.  We're at 3.5.6 now and moving to the 3.6.1 kernel this week, so closing this out.
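
To check whether a given machine is still on one of the kernels reported as affected in this bug, the release string can be compared against the versions mentioned above. This is a hedged sketch: the helper names kernel_version and reported_affected are made up for illustration, and the range (3.5.0 up to, but not including, 3.5.3) is inferred only from the comments in this report, not from an authoritative list of affected releases.

```python
import platform
import re


def kernel_version(release):
    """Parse the leading 'major.minor[.patch]' of a release string such as
    '3.5.1-1.fc17.x86_64' into a tuple of integers."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError("unrecognized kernel release: %r" % release)
    return tuple(int(part) if part else 0 for part in match.groups())


def reported_affected(release):
    """True if the release falls in the range reported as crashing in this
    bug: 3.5 kernels before the 3.5.3 update. This is an assumption drawn
    from the comments here, not an exhaustive statement."""
    return (3, 5, 0) <= kernel_version(release) < (3, 5, 3)


# platform.release() returns the same string as `uname -r`.
print(platform.release(), reported_affected(platform.release()))
```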
