Description of problem:
Some SmartArray (hpsa/cciss) adapters are known to be non-resettable in kdump, so we should not allow the user to dump on such devices. In kexec-tools, we detect such devices and emit an error. s-c-kdump should do this too.

Additional info:
See also Bug 674893.
This request was evaluated by Red Hat Product Management for inclusion in the current release of Red Hat Enterprise Linux. Because the affected component is not scheduled to be updated in the current release, Red Hat is unfortunately unable to address this request at this time. Red Hat invites you to ask your support representative to propose this request, if appropriate and relevant, in the next release of Red Hat Enterprise Linux. If you would like it considered as an exception in the current release, please ask your support representative.
Is there any system I can test this on? If so, which one?
Could we know which HBAs are deemed 'not resettable', please?
(In reply to comment #7)
> Could we know what HBA's are deemed 'not resettable', please?

There has been good progress in this area: almost all cards are now resettable, via either a soft or a hard reset. Right now the non-resettable cards are '6400', '6400 EM' and '5i'. Some cards needed firmware changes, so please update your firmware to the latest version.
HP has a bulletin saying that they have fixed this in THEIR custom kernel module (drivers).

I DO NOT WANT to use any custom kernel modules and pollute my build... can we try and see what fixes were made in their modules and backport them to the RHEL distro kernel modules?

From HP DOC:
http://h20000.www2.hp.com/bizsupport/TechSupport/Document.jsp?objectID=c02758009&lang=en&cc=us&taskId=101&prodSeriesId=4268682&prodTypeId=3709945

----------------------
SUPPORT COMMUNICATION - CUSTOMER ADVISORY

Document ID: c02758009
Version: 2

Advisory: (Revision) HP Smart Array Controllers - CUSTOMER ACTION REQUIRED for Certain HP Smart Array Controllers to Ensure Pending Writes to Storage Devices Complete Properly if Attempting to Use the Linux Kdump Facility on ProLiant Servers

Pending writes to storage devices may not complete properly as detailed below if the kdump facility is used. Using the kdump facility could leave the server in an unstable condition, which could potentially result in an inconsistent filesystem state. By disregarding this notification, the customer accepts the risk of incurring potential related errors.

The Linux kdump facility fails to execute properly when used on HP ProLiant servers running Linux and configured with certain Smart Array controllers using certain versions of the cciss device drivers.

Kdump is a facility for capturing a system memory image when a kernel panic occurs. These memory images are useful for debugging. Kdump loads a special kernel into a reserved memory area. When a panic occurs in the main kernel, control is transferred to the kdump kernel. As the memory image dump process begins, storage device drivers reset their associated controllers to clear all pending I/O activity before starting the memory dump activity, including pending writes to storage devices.

When kdump is executing on the affected Smart Array controllers, the reset process does not work as expected, and commands that were issued to the controller prior to the kdump kernel beginning execution may complete during the kdump process, potentially disrupting the kdump kernel's I/O, and potentially leading to a corrupt kdump image or inconsistent filesystem state.

This problem occurs on Smart Array controllers that do not support the PCI power management reset method. In these cases, the needed reset never occurs. In some cases, if no I/O was pending at the time of the system panic, the missed reset may go unnoticed, and the kdump process may appear to work normally.

Messages similar to the following may be displayed on some but not all controllers having this problem:

    <4>cciss 0000:19:08.0: Unable to successfully reset controller. . .

SCOPE

Any HP ProLiant server configured with any of the following HP Smart Array controllers:

    HP Smart Array P400 controller
    HP Smart Array P400i controller
    HP Smart Array P800 controller
    HP Smart Array E500 controller
    HP Smart Array P700m controller
    HP Smart Array E200 controller
    HP Smart Array E200i controller

And running any of the following versions of the HP Smart Array driver (cciss):

    For Red Hat Enterprise Linux 5, Version 3.6.28-7 (or earlier)
    For SUSE Linux Enterprise Server 10, Version 3.6.28-6 (or earlier)
    For Red Hat Enterprise Linux 6, Version 4.6.28-6 (or earlier)
    For SUSE Linux Enterprise Server 11, Version 4.6.28-6 (or earlier)

RESOLUTION

To ensure this issue does not occur, do not use the kdump facility to dump the memory image on the affected Smart Array controllers when using the cciss drivers listed in the Scope section.

To eliminate the potential for lost data if the kdump process is used on the affected Smart Array controllers, use the following versions of the HP Smart Array driver (cciss):

    For Red Hat Enterprise Linux 5, Version 3.6.28-12 (or later)
    For Red Hat Enterprise Linux 6, Version 4.6.28-12 (or later)
----------------------
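For anyone who wants to check a driver version against the advisory quoted above, here is a rough Python sketch. The version thresholds are copied from the advisory text. The distro keys (`RHEL5`, `SLES10`, ...) and the function names are made up for illustration. The advisory says nothing about versions between the "affected" and "fixed" thresholds, or about fixed versions for SLES, so those cases are reported as "unknown" rather than guessed.

```python
import re

# Thresholds copied from the HP advisory quoted above (cciss driver versions).
# The dictionary keys are hypothetical labels, not anything HP or Red Hat defines.
AFFECTED_UP_TO = {
    "RHEL5": "3.6.28-7",
    "SLES10": "3.6.28-6",
    "RHEL6": "4.6.28-6",
    "SLES11": "4.6.28-6",
}
FIXED_FROM = {
    "RHEL5": "3.6.28-12",
    "RHEL6": "4.6.28-12",
}


def parse_version(text):
    """Turn '3.6.28-12' into (3, 6, 28, 12) so versions compare as tuples.

    Raises ValueError for version strings with non-numeric components.
    """
    return tuple(int(part) for part in re.split(r"[.-]", text.strip()))


def classify(distro, version):
    """Return 'affected', 'fixed' or 'unknown' for a cciss driver version."""
    parsed = parse_version(version)
    if parsed <= parse_version(AFFECTED_UP_TO[distro]):
        return "affected"
    fixed = FIXED_FROM.get(distro)
    if fixed is not None and parsed >= parse_version(fixed):
        return "fixed"
    # The advisory does not cover this version, so do not guess.
    return "unknown"
```

For example, `classify("RHEL6", "4.6.28-6")` falls in the affected range, while a version such as `4.6.28-9` is not covered by the advisory either way.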
(In reply to comment #9)
> HP has a bulletin saying that they have fixed this in THEIR custom kernel
> module(drivers).
>
> I DO NOT WANT to use any custom kernel modules and pollute my build...can we
> try and see what fixes were made in their modules and backport them to the RHEL
> distro kernel modules?

It is backported to RHEL6.1; please check the resettable parameter on your system (updated firmware is also needed). Only a very small set of cards aren't resettable, please see comment #8. Those cards aren't mentioned in the HP document...

> http://h20000.www2.hp.com/bizsupport/TechSupport/Document.jsp?objectID=c02758009&lang=en&cc=us&taskId=101&prodSeriesId=4268682&prodTypeId=3709945
...
> HP Smart Array P400 controller
> HP Smart Array P400i controller
> HP Smart Array P800 controller
> HP Smart Array E500 controller
> HP Smart Array P700m controller
> HP Smart Array E200 controller
> HP Smart Array E200i controller
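A minimal sketch of how the resettable parameter could be checked from Python, which is roughly the kind of test s-c-kdump would need. The sysfs locations in `DEFAULT_PATTERNS` are assumptions and should be verified against the running kernel. The sketch also assumes the attribute reads `0` for a non-resettable controller and `1` otherwise. The function name and the `root` parameter (there only so the check can be pointed at a test directory) are hypothetical.

```python
import glob
import os

# Assumed locations of the 'resettable' attribute; verify on the target kernel.
DEFAULT_PATTERNS = (
    "sys/class/scsi_host/host*/resettable",      # hpsa (assumed path)
    "sys/bus/pci/devices/*/cciss*/resettable",   # cciss (assumed path)
)


def find_non_resettable(root="/", patterns=DEFAULT_PATTERNS):
    """Return sorted paths of 'resettable' attributes whose value is '0'."""
    bad = []
    for pattern in patterns:
        for path in glob.glob(os.path.join(root, pattern)):
            try:
                with open(path) as attr:
                    value = attr.read().strip()
            except OSError:
                continue  # attribute vanished or is unreadable; skip it
            if value == "0":
                bad.append(path)
    return sorted(bad)


if __name__ == "__main__":
    for path in find_non_resettable():
        print("controller is not resettable:", path)
```

An empty result only means that no attribute reported `0`. On a kernel that does not export the attribute at all, nothing is found either, so a real check would need to treat that case separately.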
So this bug can be closed? I'm not sure - is it fixed in firmware (or elsewhere)?
(In reply to comment #11)
> So this bug can be closed? I'm not sure - is it fixed in firmware (or
> elsewhere)?

It is fixed in both the firmware and the driver. There has been no response for several months, so I think you can close it.