Bug 86222
Summary: | [PATCH] AS 2.1 gives oops in md_update_sb | |
---|---|---|---
Product: | Red Hat Enterprise Linux 2.1 | Reporter: | Larry Troan <ltroan>
Component: | kernel | Assignee: | Doug Ledford <dledford>
Status: | CLOSED ERRATA | QA Contact: | Brian Brock <bbrock>
Severity: | medium | Priority: | medium
Version: | 2.1 | CC: | fhirtz, ichute, jparadis, tao, tburke
Hardware: | All | OS: | Linux
Doc Type: | Bug Fix | Last Closed: | 2005-04-28 15:05:07 UTC
Bug Blocks: | 132992 | |
Description
Larry Troan
2003-03-17 15:00:08 UTC
Created attachment 90622
raid_oops-as21.patch
FROM ISSUE TRACKER
Event posted 10-20-2003 12:17pm by andrewrcress with duration of 0.00

After 7 iterations with e.24, here are the results:
1) panic'd somewhere doing an scsi io during the md_recovery (about 9% through). I couldn't save the output because I didn't have a serial console hooked up. At this point I hooked up a serial console.
2) some delay during remove of sda, but insert & rebuild worked fine.
3 - 6) ditto
7) All disk IOs hung while doing raidhotremove of sda partitions.

Note that this test scenario works fine on RedHat kernels built on 2.4.19 or greater. There is apparently still something wrong with this test scenario (hotplugging sda in a software root mirror) using 2.4.9-e.24.

Status set to: Waiting on Tech

FROM ISSUE TRACKER
Event posted 10-24-2003 03:08pm by andrewrcress with duration of 0.00

After a few more iterations, I did reproduce the panic. It is in scsi_reset. Serial console output is below. In this test, I hot-removed sdb, then did "cat /proc/mdstat". The I/O that is on the stack was an sg command in this case (it does inquiry, test-unit-ready, and get capacity commands). I understand that later kernels have a good bit of rework in the scsi_reset area.
Unable to handle kernel NULL pointer dereference at virtual address 00000204
*pde = 00000000
Oops: 0000
Kernel 2.4.9-e.24smp
CPU: 0
EIP: 0010:[<c883843c>] Tainted: P
EFLAGS: 00010006
Process sgraidmon (pid: 10250, stackpage=c4171000)
Stack: 00000009 00000200 00000000 00000000 c7ce2000 c4171c48 c8807638 c4171c48
       00000006 c4171c48 00000286 c8806a60 00000046 c8806b0a c4171c48 00000006
       c8812920 00000000 00000000 00000000 c038e1c0 c4171c48 c01249d1 c4171c48
Call Trace: [<c8807638>] scsi_reset
[<c883890e>] aic7xxx_reset [aic7xxx] 0x59e
[<c8807638>] scsi_reset [scsi_mod] 0xe8
set md1 8
[<c88078f1>] scsi_old_reset [scsi_mod] 0x41
[<c013da56>] _wrapped_alloc_pages [kernel] 0x76
[<c012d14d>] do_anonymous_page [kernel] 0x18d
[<c8802ca0>] scsi_reset_provider_done_command [scsi_mod] 0x0
[<c8803a25>] scsi_ioctl_Rsmp_914b0d65 [scsi_mod] 0x25
[<c88cdbaf>] sg_ioctl [sg] 0xb2f
[<c8836bea>] aic7xxx_queue [aic7xxx] 0x15a
[<c88006a1>] scsi_dispatch_cmd [scsi_mod] 0x161
[<c880878e>] scsi_request_fn [scsi_mod] 0x31e
[<c8807a34>] __scsi_insert_special [scsi_mod] 0x74
[<c8807a9a>] scsi_insert_special_req [scsi_mod] 0x1a
[<c880097b>] scsi_do_req_Rsmp_bdc72156 [scsi_mod] 0x14b
[<c88cc53c>] sg_read [sg] 0xfc
[<c88cde40>] sg_cmd_done_bh [sg] 0x0
[<c88cca4a>] sg_write [sg] 0x12a
[<c0145d26>] sys_write [kernel] 0x96
[<c0145d9e>] sys_write [kernel] 0x10e
[<c0155887>] sys_ioctl [kernel] 0x257
[<c01073c3>] system_call [kernel] 0x33
Code: 39 78 04 74 53 f6 05 3d 29 84 c8 10 74 29 8b 47 4c 83 e0 07
<0>Kernel panic: not continuing
In interrupt handler - not syncing

The patch as attached to this bugzilla is broken. It has two specific problems. First, it doesn't add an sb_page item to the md_k.h header file, so it won't even compile. I fixed that. Second, it doesn't set the BH_Lock bit on the buffer head when it's being constructed, whereas the md.c file in RHEL3 does. I added that lock to the buffer head in my updated version of the patch.
I'll attach the new patch to this bugzilla and also submit it for review upon successful testing.

Created attachment 111189
Redone version of the raid-oops-as21.patch file
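
The second problem described above, a buffer head built without its lock bit set, can be illustrated with a small userspace sketch. This is not the attached patch and not kernel code: `mock_buffer_head`, `MOCK_BH_LOCK`, and every function below are hypothetical stand-ins, and the bit number is illustrative. It assumes only that a waiter uses the lock bit to decide whether I/O is still in flight, which is one plausible reason the omission matters.

```c
#include <string.h>

/* Hypothetical userspace stand-ins. The real 2.4 definitions
 * (struct buffer_head, BH_Lock) live in the kernel headers; the bit
 * number used here is illustrative only. */
#define MOCK_BH_LOCK 2

struct mock_buffer_head {
    unsigned long b_state;  /* bitmask of state flags */
    char *b_data;           /* memory the I/O reads from or writes to */
    unsigned int b_size;    /* length of that memory in bytes */
};

/* Build a buffer head around a superblock page. Passing set_lock == 0
 * mimics the omission described in the comment above. */
static void mock_init_sb_bh(struct mock_buffer_head *bh, char *page,
                            unsigned int size, int set_lock)
{
    memset(bh, 0, sizeof(*bh));
    bh->b_data = page;
    bh->b_size = size;
    if (set_lock)
        bh->b_state |= 1UL << MOCK_BH_LOCK;
}

/* Assumption: a waiter treats the lock bit as "I/O still in flight". */
static int mock_io_in_flight(const struct mock_buffer_head *bh)
{
    return (int)((bh->b_state >> MOCK_BH_LOCK) & 1UL);
}

/* Completion clears the lock bit, releasing any waiter. */
static void mock_end_io(struct mock_buffer_head *bh)
{
    bh->b_state &= ~(1UL << MOCK_BH_LOCK);
}
```

In this model, a buffer head initialised with `set_lock == 0` reports no I/O in flight from the start, so a caller that waits on it would proceed before the write has completed. With `set_lock == 1` the waiter holds off until `mock_end_io` runs.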
Patch tested and submitted for review.

A fix for this problem has just been committed to the RHEL2.1 U7 patch pool this evening (in kernel version 2.4.9-61).

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2005-283.html