Bug 471639
Summary: max_phys_segments violation with dm-linear + md raid1 + cciss

| Field | Value |
|---|---|
| Product | Red Hat Enterprise Linux 5 |
| Component | kernel |
| Version | 5.2 |
| Hardware | All |
| OS | Linux |
| Status | CLOSED ERRATA |
| Severity | high |
| Priority | urgent |
| Reporter | Bryn M. Reeves <bmr> |
| Assignee | Milan Broz <mbroz> |
| QA Contact | Martin Jenner <mjenner> |
| CC | agk, charlieb-fedora-bugzilla, dledford, dmair, dzickus, mbroz, mgahagan, peterm, pvrabec, syeghiay, tao, thenzl |
| Target Milestone | rc |
| Doc Type | Bug Fix |
|  | 512387 (view as bug list) |
| Last Closed | 2009-01-20 19:39:21 UTC |
Description
Bryn M. Reeves
2008-11-14 19:27:16 UTC
Created attachment 323633 [details]
log from BUG in comment #0

Log from above BUG.

Simple analysis:

The OOPS happens during a read request initiated from the page readahead code. The page vectors are correctly allocated there through the bio_add_page() call, and the device stack does not modify these vectors: every layer uses bio_clone(), which clones only the bio header, not the page vectors themselves. (One exception is page bouncing, but that surely does not apply here, because according to the sos report there is no high memory on the affected system.)

So the problem is at the beginning: the request is allocated with a different number of physical segments than the final request queue sees.

Simulating a similar bio request on the device stack, I found that request queue parameters are *not* correctly propagated through the stack. From a similar configuration on an x86_64 system (LVM over MD over SCSI):

| request_queue | max_segment_size | seg_boundary_mask |
|---|---|---|
| SCSI | 65536 | 0xffffffff |
| MD RAID1 | 0 | 0 |
| LVM | 65536 | -1 (64-bit) |

Unfortunately, bio_add_page() (resp. bio_phys_segments()) calculates the number of physical segments according to these parameters. During generic_make_request() the segment count is recalculated and *can* *increase* bio->bi_phys_segments, which later correlates with the request queue's nr_phys_segments.

BUGS:
- The generic block layer code doesn't initialize some values if a stacked driver uses the simplified request queue constructor blk_alloc_queue() and sets its own request function through blk_queue_make_request().
- The parameters above remain wrong even if MD calls blk_queue_stack_limits() later. (DM uses its own function for that, which sets the limits "correctly", unfortunately using the uninitialized queue below...)
Reproducer:

After long fiddling with creating bio requests, I found an extremely simple way to reproduce it:
- create a raid1 array over a CCISS device
- use this array for a VG
- create an LV there

and simply repeat this several times until it oopses :-)

    dd iflag=direct if=/dev/<vg>/<lv> of=/dev/null bs=128000 count=10

This command generates a bio with 250 sectors, allocated in 32 pages (the last page uses only 1024 bytes). (I also have in-kernel reproducer code that does this more directly.)

For the LVM layer, it allocates a bio with 31 segments (still OK for CCISS); unfortunately, on the lower layer it is recalculated to 32 segments, which violates the CCISS requirements and triggers the BUG_ON() problem.

Unfortunately the problem is in upstream too. I'll attach a patch to fix it, but it needs to be discussed with the block layer maintainer first. (The setting of the boundary mask is full of magic constants anyway...)

Created attachment 325139 [details]
Proposed patch (NOT FINAL yet!)
This is a backport of the patch for RHEL5 testing; the original patch needs to be sent upstream first.
in kernel-2.6.18-126.el5

You can download this test kernel from http://people.redhat.com/dzickus/el5

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-0225.html

> After long fiddling with creating bio requests I found extremely
> simple way how to reproduce it:
> - create raid1 array over CCISS device
> - use this array for VG
> - create LV there
> and simply several time repeat this until it oopses:-)
>
> dd iflag=direct if=/dev/<vg>/<lv> of=/dev/null bs=128000 count=10

The same procedure still creates a kernel oops with 5.3 and 5.4beta kernels.

```
------------[ cut here ]------------
kernel BUG at drivers/block/cciss.c:2862!
invalid opcode: 0000 [#1]
SMP
last sysfs file: /devices/pci0000:00/0000:00:08.0/0000:02:00.1/irq
Modules linked in: autofs4 hidp rfcomm l2cap bluetooth lockd sunrpc ip_conntrack_netbios_ns ipt_REJECT xt_state ip_conntrack nfnetlink iptable_filter ip_tables ip6t_REJECT xt_tcpudp ip6table_filter ip6_tables x_tables ipv6 xfrm_nalgo crypto_api dm_multipath scsi_dh video hwmon backlight sbs i2c_ec i2c_core button battery asus_acpi ac parport_pc lp parport sr_mod cdrom sg pcspkr hpilo bnx2 serio_raw dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod ata_piix libata cciss sd_mod scsi_mod raid1 ext3 jbd uhci_hcd ohci_hcd ehci_hcd
CPU:    10
EIP:    0060:[<f88c4f31>]    Not tainted VLI
EFLAGS: 00010012   (2.6.18-155.el5PAE #1)
EIP is at do_cciss_request+0x46/0x3a3 [cciss]
eax: f7c2a9e0   ebx: f7bbe7e4   ecx: 00000000   edx: 00000000
esi: f7c1da00   edi: 0000000c   ebp: 00000001   esp: f65e1abc
ds: 007b   es: 007b   ss: 0068
Process dd (pid: 4374, ti=f65e1000 task=f7579550 task.ti=f65e1000)
Stack: f7913800 f7bbe7e4 f7bd0000 00001116 f7c2a9e0 c04e6425 f7bbe7e4 f69a877c
       c04e4b3b f7c2a9e0 f69edb38 00000000 f69a877c c04e5974 f7913800 f7c2a9e0
       f7c2a9e0 f7bbe7e4 f7c2a9e0 c04e5a7a 00000000 006629b0 f7913800 f69edb38
Call Trace:
 [<c04e6425>] cfq_set_request+0x0/0x31f
 [<c04e4b3b>] cfq_resort_rr_list+0x23/0x8b
 [<c04e5974>] cfq_add_crq_rb+0xba/0xc3
 [<c04e5a7a>] cfq_insert_request+0x42/0x498
 [<c04db3a0>] elv_insert+0xc7/0x160
 [<c04df21d>] __make_request+0x2fb/0x344
 [<c04dd1ca>] generic_make_request+0x255/0x265
 [<c0478a60>] __bio_clone+0x6f/0x8a
 [<f884800e>] make_request+0x174/0x543 [raid1]
 [<c04dd1ca>] generic_make_request+0x255/0x265
 [<c0478a60>] __bio_clone+0x6f/0x8a
 [<f88e7442>] __map_bio+0x44/0x103 [dm_mod]
 [<f88e80d8>] __split_bio+0x428/0x438 [dm_mod]
 [<c0461a9e>] __handle_mm_fault+0x79c/0xcf8
 [<f88e87a2>] dm_request+0xe2/0xe8 [dm_mod]
 [<c04dd1ca>] generic_make_request+0x255/0x265
 [<c042c529>] lock_timer_base+0x15/0x2f
 [<c042c9e4>] del_timer+0x41/0x47
 [<c04ddf45>] __generic_unplug_device+0x1d/0x1f
 [<c04deeba>] generic_unplug_device+0x1f/0x2c
 [<f88472b4>] unplug_slaves+0x4f/0x83 [raid1]
 [<f8847300>] raid1_unplug+0xe/0x1a [raid1]
 [<f88e98fd>] dm_table_unplug_all+0x2d/0x60 [dm_mod]
 [<c0478e37>] bio_add_page+0x25/0x2e
 [<f88e7c7b>] dm_unplug_all+0x17/0x21 [dm_mod]
 [<c04df515>] blk_backing_dev_unplug+0x2f/0x32
 [<c04945a8>] __blockdev_direct_IO+0x9a9/0xba9
 [<c047adf5>] blkdev_direct_IO+0x30/0x35
 [<c047ad10>] blkdev_get_blocks+0x0/0xb5
 [<c0455d9a>] generic_file_direct_IO+0xd0/0x118
 [<c0455ffc>] __generic_file_aio_read+0xd2/0x198
 [<c0617192>] lock_kernel+0x16/0x25
 [<c04570ae>] generic_file_read+0x0/0xab
 [<c0457145>] generic_file_read+0x97/0xab
 [<c0434cef>] autoremove_wake_function+0x0/0x2d
 [<c04738a5>] vfs_read+0x9f/0x141
 [<c0473cf3>] sys_read+0x3c/0x63
 [<c0404f17>] syscall_call+0x7/0xb
=======================
Code: 08 8b 82 f8 00 00 00 84 c0 0f 88 65 03 00 00 8b 44 24 04 e8 87 62 c1 c7 85 c0 89 44 24 10 0f 84 50 03 00 00 66 83 78 54 1f 76 08 <0f> 0b 2e 0b 20 93 8c f8 8b 44 24 08 ba 01 00 00 00 e8 ff d3 ff
EIP: [<f88c4f31>] do_cciss_request+0x46/0x3a3 [cciss] SS:ESP 0068:f65e1abc
```

(In reply to comment #23)
> The same procedure still creates a kernel oops with 5.3 and 5.4beta kernels.

"still" or "again" - I didn't go back to 2.6.18-92.1.13.el5 to check. FWIW I first saw this problem when installing DB2.