Description of problem:
On mainline 2.6.22 kernel, oops at bio_get_nr_vecs+0x0/0x30. Machine is hanging.
http://marc.info/?l=linux-kernel&m=118551339032528

BUG: unable to handle kernel paging request at virtual address 23c070bf
 printing eip:
c04a07fd
*pdpt = 000000001ff88001
*pde = 0000000000000000
Oops: 0000 [#1]
SMP
Modules linked in: netconsole autofs4 hidp nfs lockd nfs_acl rfcomm l2cap bluetooth sunrpc ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi scsi_transport_iscsi dm_mirror dm_multipath dm_mod video sbs button battery ac ipv6 parport_pc lp parport i2c_piix4 i2c_core cfi_probe gen_probe floppy scb2_flash sg mtdcore chipreg tg3 e1000 serio_raw ide_cd cdrom aic7xxx scsi_transport_spi sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd
CPU:    0
EIP:    0060:[<c04a07fd>]    Not tainted VLI
EFLAGS: 00010293   (2.6.22 #2)
EIP is at bio_get_nr_vecs+0x0/0x30
eax: 23c07063   ebx: 00000003   ecx: ffffffff   edx: 00000000
esi: de5cef74   edi: f54a9600   ebp: 00000000   esp: de5ceca8
ds: 007b   es: 007b   fs: 00d8  gs: 0033  ss: 0068
Process fio (pid: 17820, ti=de5ce000 task=de6570e0 task.ti=de5ce000)
Stack: c04a1c9d ffffffff ffffffff 00000009 f54a9600 de5cef74 00000000 f54a9600
       c04a1f43 00000000 c04a2b46 c0460466 c2c5baa0 c0812500 c0462c0a 00000001
       00000001 df4b90d4 de5ceee4 00000011 00000001 00000009 00000009 00000000
Call Trace:
 [<c04a1c9d>] dio_new_bio+0x82/0xfe
 [<c04a1f43>] dio_send_cur_page+0x4a/0x92
 [<c04a2b46>] __blockdev_direct_IO+0xa09/0xc83
 [<c0460466>] __pagevec_free+0x14/0x1a
 [<c0462c0a>] release_pages+0x137/0x13f
 [<f8856f30>] journal_start+0xaf/0xdd [jbd]
 [<f8890fec>] ext3_direct_IO+0xfd/0x190 [ext3]
 [<f888f6af>] ext3_get_block+0x0/0xd0 [ext3]
 [<c045d803>] generic_file_direct_IO+0xe5/0x116
 [<c045d890>] generic_file_direct_write+0x5c/0x137
 [<c045e285>] __generic_file_aio_write_nolock+0x37b/0x4df
 [<c045e43e>] generic_file_aio_write+0x55/0xb3
 [<f888cfdc>] ext3_file_write+0x24/0x8f [ext3]
 [<c0481af9>] do_sync_write+0xc7/0x10a
 [<c04347d2>] check_kill_permission+0xec/0xf5
 [<c043c557>] autoremove_wake_function+0x0/0x35
 [<c0481a32>] do_sync_write+0x0/0x10a
 [<c048233e>] vfs_write+0xa8/0x154
 [<c0482a1a>] sys_pwrite64+0x48/0x5f
 [<c0404e12>] syscall_call+0x7/0xb
 [<c0620000>] xfrm_replay_timer_handler+0x3e/0x44
 =======================
Code: 89 c5 c7 44 24 14 f4 ff ff ff 74 d2 e9 b3 fe ff ff 83 7c 24 34 00 0f 84 0b ff ff ff e9 51 ff ff ff 83 c4 20 89 e8 5b 5e 5f 5d c3 <8b> 40 5c 8b 48 38 8b 81 20 01 00 00 0f b7 91 2a 01 00 00 0f b7
EIP: [<c04a07fd>] bio_get_nr_vecs+0x0/0x30 SS:ESP 0068:de5ceca8
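A reading aid for the oops above, not part of the original report: the bytes marked in the Code: line, `<8b> 40 5c`, decode on i386 as `mov 0x5c(%eax),%eax`, and eax plus that displacement equals the faulting virtual address. Assuming the first argument is passed in eax (the regparm convention, which is an assumption about this build), this suggests bio_get_nr_vecs() was entered with an invalid pointer. A minimal sketch of the arithmetic follows; the variable names are illustrative and the values are copied from the oops.

```python
# Values copied from the oops text; names are illustrative only.
eax = 0x23C07063            # "eax: 23c07063"
fault_address = 0x23C070BF  # "virtual address 23c070bf"

# Faulting instruction bytes from the Code: line: <8b> 40 5c
opcode, modrm, disp8 = 0x8B, 0x40, 0x5C

# 0x8B is MOV r32, r/m32. Split the ModRM byte into its three fields.
mod = modrm >> 6          # 1 -> memory operand with an 8-bit displacement
reg = (modrm >> 3) & 0x7  # 0 -> destination register is eax
rm = modrm & 0x7          # 0 -> base register is eax

# Effective address of the load: base register plus displacement.
computed = eax + disp8
print(f"{computed:08x}")  # prints 23c070bf
```

The computed address matches the one the kernel reported, so the load through eax at the very first instruction of the function is what faulted.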
Created attachment 299789
patch to fix dio error path
The upstream patch is
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=848c4dd5153c7a0de55470ce99a8e13a63b4703f

Has the bug in question actually been seen on RHEL5?

Thanks,
-Eric
Greg, ping, have you actually seen this bug on RHEL5?

Thanks,
-Eric
Looking at the RHEL 5 sources, I'd say we're vulnerable. I'll take a closer look and see if I can reproduce the problem.
Yeah, I can crash my box with a slightly modified version of the fio job file posted in the upstream bug report.
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
in kernel-2.6.18-103.el5

You can download this test kernel from http://people.redhat.com/dzickus/el5
Oracle, a fix for this bug should be available for testing in the RHEL 5.3 Beta release. You can download these bits from RHN. Please take a moment to verify that the fix is present and functioning as expected and report back your test results as soon as possible. Thanks! Please ping your Red Hat Partner Manager with any additional questions.
Removing the CVE name from the synopsis; this is because we have already fixed this issue for Red Hat Enterprise Linux 5 users via an asynchronous security advisory. This bug serves as a placeholder to ensure that the bug was also fixed and tested in the upcoming 5.3 kernel.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-0225.html