453635 – kernel BUG at fs/ext4/mballoc.c:1648!

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	kernel
Sub Component:
Version:	9
Hardware:	All
OS:	Linux
Priority:	low
Severity:	low
Target Milestone:	---
Assignee:	Eric Sandeen
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2008-07-01 16:45 UTC by Jeff Moyer
Modified:	2008-11-10 14:50 UTC (History)
CC List:	0 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2008-11-10 14:50:50 UTC
Type:	---
Embargoed:
Dependent Products:

Description Jeff Moyer 2008-07-01 16:45:25 UTC

Description of problem:
------------[ cut here ]------------
kernel BUG at fs/ext4/mballoc.c:1648!
invalid opcode: 0000 [1] SMP 
CPU 0 
Modules linked in: nfsd auth_rpcgss exportfs nls_utf8 nfs lockd nfs_acl
usb_storage bridge bnep rfcomm l2cap bluetooth autofs4 fuse sunrpc ip6t_REJECT
xt_tcpudp nf_conntrack_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables
x_tables cpufreq_ondemand acpi_cpufreq freq_table ext4dev jbd2 crc16 ext2
dm_mirror dm_multipath dm_mod ipv6 sr_mod cdrom ata_generic ppdev snd_hda_intel
floppy dcdbas parport_pc parport snd_seq_dummy snd_seq_oss i2c_i801 pcspkr
i2c_core firewire_ohci snd_seq_midi_event ata_piix sg iTCO_wdt firewire_core
snd_seq pata_acpi iTCO_vendor_support snd_seq_device crc_itu_t snd_pcm_oss
snd_mixer_oss snd_pcm snd_timer snd_page_alloc snd_hwdep snd tg3 joydev button
i82975x_edac soundcore edac_core ahci libata sd_mod scsi_mod ext3 jbd mbcache
uhci_hcd ohci_hcd ehci_hcd [last unloaded: microcode]
Pid: 4357, comm: fio Not tainted 2.6.25.6-55.fc9.x86_64 #1
RIP: 0010:[<ffffffff8832ae8d>]  [<ffffffff8832ae8d>] :ext4dev:ext4_mb_new_blocks+0x1043/0x2175
RSP: 0018:ffff81006cca3a98  EFLAGS: 00010246
RAX: 0000000000008000 RBX: 0000000000008000 RCX: 0000000000008000
RDX: 0000000000008000 RSI: 0000000000008000 RDI: 000000000000000c
RBP: ffff81006cca3c58 R08: 000000000000000d R09: ffff81007b044fff
R10: 000000000000000d R11: 0000000000000001 R12: 0000000000000000
R13: ffff81001e9f31f8 R14: 0000000000000fce R15: ffff81001e9f3238
FS:  00007f5a30f966f0(0000) GS:ffffffff813f2000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000032c30dad60 CR3: 000000006ac42000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process fio (pid: 4357, threadinfo ffff81006cca2000, task ffff810016c7a000)
Stack:  ffff810027481f00 ffff810044c02420 0000000000000000 ffff81006cca3bf8
 0000000300000000 0000000000000000 0000000000000000 ffff8100587dc000
 0000000000000292 ffff81006cca3d64 ffff81006cca3cf8 ffff81007f6e1300
Call Trace:
 [<ffffffff88302e22>] ? :jbd2:find_revoke_record+0x5a/0x89
 [<ffffffff883032cf>] ? :jbd2:jbd2_journal_cancel_revoke+0x11c/0x163
 [<ffffffff8832699b>] :ext4dev:ext4_ext_get_blocks+0x83a/0xa2f
 [<ffffffff8128e800>] ? __down_read+0x1a/0x98
 [<ffffffff88317f41>] :ext4dev:ext4_get_blocks_wrap+0xd3/0x110
 [<ffffffff88323de3>] :ext4dev:ext4_fallocate+0x194/0x350
 [<ffffffff810b7e55>] ? notify_change+0x2fb/0x30e
 [<ffffffff8106c5bf>] ? audit_syscall_entry+0x126/0x15a
 [<ffffffff8106c290>] ? audit_syscall_exit+0x331/0x353
 [<ffffffff810a30ef>] sys_fallocate+0xfb/0x11f
 [<ffffffff8100c052>] tracesys+0xd5/0xda


Code: 88 48 c7 c6 70 1e 33 88 31 c0 e8 53 4d ff ff e9 da 00 00 00 49 8b 45 08 48
8b 80 58 02 00 00 48 8b 48 10 48 63 c2 48 39 c8 72 04 <0f> 0b eb fe 48 63 45 a4
48 39 c8 72 04 0f 0b eb fe 41 80 bd 82 
RIP  [<ffffffff8832ae8d>] :ext4dev:ext4_mb_new_blocks+0x1043/0x2175
 RSP <ffff81006cca3a98>
---[ end trace e1aedad6ea231792 ]---


Version-Release number of selected component (if applicable):
2.6.25.6-55.fc9.x86_64

How reproducible:
Not sure.

Steps to Reproduce:

Use the following fio work file:

[global]
ioengine=libaio
iodepth=64
bs=4k
; job files should be pre-allocated, and each file should be created
; in turn so as not to interleave disk blocks.
direct=1
size=1024m
overwrite=1
create_serialize=1
unlink=0
;thread

[aio-test1]
rw=write

[aio-test2]
rw=read

[aio-test3]
rw=randwrite

[aio-test4]
rw=randread
  
Actual results:
The backtrace reported above is logged, and the file system stops servicing I/O after this.

Comment 1 Jeff Moyer 2008-07-01 17:20:24 UTC

OK, this is reproducible.  Time for another reboot.

Comment 2 Jeff Moyer 2008-07-01 17:45:39 UTC

1640 static void ext4_mb_measure_extent(struct ext4_allocation_context *ac,
1641                                         struct ext4_free_extent *ex,
1642                                         struct ext4_buddy *e4b)
1643 {
1644         struct ext4_free_extent *bex = &ac->ac_b_ex;
1645         struct ext4_free_extent *gex = &ac->ac_g_ex;
1646 
1647         BUG_ON(ex->fe_len <= 0);
1648         BUG_ON(ex->fe_len >= EXT4_BLOCKS_PER_GROUP(ac->ac_sb));
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1649         BUG_ON(ex->fe_start >= EXT4_BLOCKS_PER_GROUP(ac->ac_sb));
1650         BUG_ON(ac->ac_status != AC_STATUS_CONTINUE);
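
A note for anyone reading the oops against these checks: in the register dump both RDX and RCX hold 0x8000 (32768), and the Code: bytes just before the trapping <0f> 0b look like a compare followed by a conditional jump. That would fit an extent length equal to the number of blocks in a group, which the >= at line 1648 rejects. Below is a minimal sketch of that arithmetic, assuming 32768 blocks per group (what a 4 KiB block size would give). The names extent_len_ok and BLOCKS_PER_GROUP are made up for illustration and are not kernel symbols.

```c
#include <stdbool.h>

/*
 * Assumption: 32768 blocks per group, i.e. one 4 KiB bitmap block
 * describing 8 * 4096 blocks. The real value comes from the superblock.
 */
enum { BLOCKS_PER_GROUP = 32768 };

/*
 * Hypothetical stand-in for the checks at lines 1647 and 1648:
 * returns true when neither BUG_ON() condition would fire.
 */
bool extent_len_ok(int fe_len, int blocks_per_group)
{
    if (fe_len <= 0)                  /* BUG_ON(ex->fe_len <= 0) */
        return false;
    if (fe_len >= blocks_per_group)   /* BUG_ON(ex->fe_len >= ...) */
        return false;
    return true;
}
```

Under that assumption, extent_len_ok(0x8000, BLOCKS_PER_GROUP) is false while extent_len_ok(0x7fff, BLOCKS_PER_GROUP) is true, so a free extent covering exactly one whole group would be enough to trip the assertion.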

Here's what fio is doing:

open("aio-test1.1.0", O_RDWR|O_CREAT|O_DIRECT, 0600) = 8
fstat(8, {st_mode=S_IFREG|0600, st_size=0, ...}) = 0
close(8)                                = 0
write(1, "aio-test1: Laying out IO file(s)"..., 55aio-test1: Laying out IO
file(s) (1 file(s) / 1024MiB)
) = 55
open("aio-test1.1.0", O_WRONLY|O_CREAT|O_TRUNC, 0644) = 8
ftruncate(8, 1073741824)                = 0
syscall_285(0x8, 0, 0, 0x40000000, 0x40000000, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
Message from syslogd@segfault at Jul  1 13:42:01 ...
 kernel: ------------[ cut here ]------------

Comment 3 Jeff Moyer 2008-07-01 17:54:48 UTC

(sorry for so many updates, but this is crashing my desktop machine, so I can
only get so far before I need to reboot!)

And, of course, syscall 285 is fallocate (but we all knew that, given the stack
trace):

#define __NR_fallocate                          285
__SYSCALL(__NR_fallocate, sys_fallocate)
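
The open/ftruncate/fallocate sequence from the strace in comment 2 could probably be replayed without fio. Here is a rough sketch, assuming glibc's fallocate() wrapper is available. The helper name preallocate_file is hypothetical, and whether this alone triggers the BUG on an affected kernel has not been verified.

```c
#define _GNU_SOURCE          /* exposes fallocate() in <fcntl.h> on glibc */
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: create the file, extend it with ftruncate(), then
 * ask the file system to preallocate the same range.
 * Returns 0 on success or a negative errno value on failure.
 */
int preallocate_file(const char *path, off_t size)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -errno;

    if (ftruncate(fd, size) < 0) {
        int err = errno;
        close(fd);
        return -err;
    }

    /* mode 0, offset 0, len = size: the arguments seen in syscall_285 */
    if (fallocate(fd, 0, 0, size) < 0) {
        int err = errno;
        close(fd);
        return -err;
    }

    if (close(fd) < 0)
        return -errno;
    return 0;
}
```

Calling preallocate_file("aio-test1.1.0", 0x40000000) from a directory on the ext4dev mount would mirror the trace. On file systems without preallocation support the call is expected to return -EOPNOTSUPP instead.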

Comment 4 Jeff Moyer 2008-07-01 19:05:50 UTC

FYI, I booted 2.6.26-rc8 and could not reproduce the problem with that kernel.

Comment 5 Chuck Ebbert 2008-11-10 14:50:50 UTC

Pretty sure this is fixed now.
