From Bugzilla Helper: User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.6) Gecko/20050302 Firefox/1.0.1 Fedora/1.0.1-1.3.2 Description of problem: When a diskdump is triggered, system is hanging due to ipr adapter reset. Version-Release number of selected component (if applicable): kernel-2.6.9-6.37 How reproducible: Always Steps to Reproduce: 1. Add dump partition to /etc/sysconfig/diskdump 2. service diskdump initialformat 3. chkconfig diskdump on 4. service diskdump start 5. echo 1 >/proc/sys/kernel/sysrq 6. echo c >/proc/sysrq-trigger Actual Results: When the dump is triggered, the system drops into xmon. When I exit xmon, the following is printed: 0:mon> X Oops: Kernel access of bad area, sig: 11 [#1] SMP NR_CPUS=64 NUMA PSERIES LPAR NIP: C0000000001B1BFC XER: 0000000000000010 LR: C0000000001B1FA4 REGS: c0000000e0a33920 TRAP: 0300 Not tainted (2.6.9-6.37.EL) MSR: 8000000000009032 EE: 1 PR: 0 FP: 0 ME: 1 IR/DR: 11 DAR: 0000000000000000, DSISR: 0000000042000000 TASK: c0000001b83844c0[6095] 'bash' THREAD: c0000000e0a30000 CPU: 0 GPR00: C0000000001B1BF8 C0000000E0A33BA0 C0000000004D0610 0000000000000063 GPR04: 0000000000000000 0000000000000000 0000000000000080 0000000000000001 GPR08: 0000000000000018 0000000000000000 C0000001BD247BD8 C0000000001B1C04 GPR12: 0000000044242428 C0000000003DA000 00000000100CCBB0 0000000000000000 GPR16: 00000000FFFFFFFF 0000000010060000 0000000000000000 0000000000000000 GPR20: 0000000000000000 0000000000000000 00000000100C0000 0000000000000000 GPR24: 0000000000000000 0000000000000006 0000000000000000 0000000000000000 GPR28: 0000000000000063 C000000000418D88 C000000000453D38 C00000000043F2F0 NIP [c0000000001b1bfc] .sysrq_handle_crash+0x4/0xc LR [c0000000001b1fa4] .__handle_sysrq+0xac/0x170 Call Trace: [c0000000e0a33ba0] [c0000000001b1f70] .__handle_sysrq+0x78/0x170 (unreliable) [c0000000e0a33c50] [c000000000108ee0] .write_sysrq_trigger+0x84/0xb4 [c0000000e0a33cf0] [c0000000000b69b8] .vfs_write+0x148/0x1ac [c0000000e0a33d90] [c0000000000b6af4] .sys_write+0x4c/0x8c [c0000000e0a33e30] [c000000000011180] syscall_exit+0x0/0x18 CPU frozen: #1#2#3#4#5#6#7 CPU#0 is executing diskdump. start dumping <3>ipr 0002:48:01.0: Adapter being reset as a result of error recovery. Then nothing happens. The dump is not created and the system does not reboot. I have to manually reboot the system. Expected Results: System should dump memory to the dump partition then automatically reboot. Additional info: Attaching patch to fix problem.
Created attachment 115233 [details] Patch to fix problem
Could you please explain how the patch fixes the problem? Thank you.
The patch keeps the adapter from being reset before the dump is created. Customer believes that reenabling interrupts when resetting the adapter is leaving a window for other interrupts to come in, causing the system to hang.
With this patch, ipr_eh_host_rest() returns without calling ipr_reset_reload() when crashdump_mode() is on. So I think the first part of patch is not required. Is it correct?
Correct, I think they were cleaning up ipr_reset_reload(), removing the check for crashdump_mode() since this check was moved to ipr_eh_host_reset(), which as you stated, does not call ipr_reset_reload() when crashdump_mode() is on.
Created attachment 116128 [details] patch2.diff I tried to modify the patch. The original patch does not wait for completion of reset and I think waiting for completion is better if it is possible to do. With this patch, ipr_eh_host_reset() calls ipr_reset_reload() and ipr_reset_reload() calls spin_unlock() instead of spin_unlock_irq() if crashdump_mode() is on. So the new code can wait for completion fo reset without enabling interrupt. If the customer agree with this, could you please request the customer to test the patch?
> The patch does not work. The ipr driver requires interrupts to be > functioning in order for its host reset processing to work. The patch would > result in a hang as well. I'm afraid allowing interrupts during diskdump process makes the possibility of failure increase because it also allows unexpected other interrupts. Is there any way to detect the completion without allowing interrupts? Please confirm your next comment is written in this BZ in public comment. If the comment is private, I cannot read it.
Nobuhiro, here is the customer's response to your question: I agree that enabling interrupts during diskdump is not the right thing to do. It would be possible to make ipr's adapter reset routine work without needing interrupts, but to do so would be complex and, I think, unnecessary. The problem is not so much just providing a way to detect the completion of the adapter reset with interrupts disabled. The problem is that the process of resetting an ipr adapter such that it can accept new commands is a multi-stage process. At a high level, what ipr needs to do is to run BIST on the card, then wait for 2 minutes. During this time, if ipr were to attempt to talk to the adapter, it would result in an EEH error and the reset would fail. It must then write to a register and wait for an interrupt to occur. This interrupt could take as long as 5 minutes to occur (usually it takes around 30 seconds or less). Then it needs to send several commands to the adapter, all of which are timed and require interrupts in the current implementation. To implement what you are proposing would require somone calling ipr_poll repeatedly to make forward progress on the reset job and also would require a way for ipr to time events which could take as long as 5 minutes. There is no reason for diskdump to reset the ipr adapter before executing a dump. There should be no need to quiesce the host before the dump as other commands that may be completing should have no effect on the dump. If there is a desire to prevent other scsi ops that are in progress at the time the dump is started from having their done function from being called, it would be much simpler to accomplish this. To make ipr's reset job work without interrupts and without timer interrupts is complex and adds lots of potential for deadlock if not properly implemented, which is why the initial patch was suggested, which essentially no-ops the host reset when in diskdump. In my tests, diskdump through ipr worked with the patch, but did not work consistently without the patch. Brian says modifying the driver to do a reset with interrupts disabled is not trivial. Besides, he also says it's unnecessary to reset the adapter before the dump. Since he is the ipr driver maintainer, I give his opinion added weight.
> There is no reason for diskdump to reset the ipr adapter before executing a > dump. There should be no need to quiesce the host before the dump as other > commands that may be completing should have no effect on the dump. If there is a > desire to prevent other scsi ops that are in progress at the time the dump is > started from having their done function from being called, it would be much > simpler to accomplish this. I'm not sure the ipr adapter has no need to reset the host. But given that it is correct, ipr adapter can just return SUCCESS as the result of its host reset handler when crashdump_mode() is set. And ipr adapter can do anything in its quiesce handler to dispose of its scsi commands in progress. IMHO this is more robust than allowing interrupts.
> Are you expecting a new patch from IBM that incorporates the suggestions in comment #22? Yes, I'm expecting a new patch. If host reset before diskdump is useless for ipr as you said, I recommend skipping to do host reset. I think that makes diskdump on ipr more robust because it does not open a window for interrupts.
The revised patch which was based on the patch of comment#1 was posted. Got one ACK.
committed in -22.25
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2006-0132.html