Created attachment 320867 [details]
panic picture

I've tried to upgrade the kernel on an IBM xSeries to 78.0.5, but it doesn't work. It panics every time. I've attached a picture of the panic. I suspect something in the ibmphp or ibmasm module.

The last kernel that works fine is: 2.6.9-67.0.7.ELsmp

Loaded modules in 2.6.9-67.0.7.ELsmp:

hangcheck_timer         7897  0
iptable_filter          6977  0
ip_tables              22721  1 iptable_filter
dm_mirror              31557  0
dm_round_robin          7361  1
dm_multipath           22984  2 dm_round_robin
dm_mod                 67177  29 dm_mirror,dm_multipath
button                 10705  0
battery                12997  0
ac                      8901  0
joydev                 14465  0
ohci_hcd               24273  0
ibmasm                 28493  0
ibmphp                 70573  4294967295
e1000                 122705  0
e100                   36677  0
mii                     9281  1 e100
floppy                 58193  0
sg                     38369  0
ext3                  119497  6
jbd                    59865  1 ext3
raid1                  19777  3
qla2300               129857  0
aic7xxx               146425  8
qla2xxx               171877  34 qla2300
scsi_transport_fc      12353  1 qla2xxx
sd_mod                 20545  25
scsi_mod              120269  5 sg,aic7xxx,qla2xxx,scsi_transport_fc,sd_mod

# lspci
00:00.0 Host bridge: IBM Winnipeg PCI-X Host Bridge (rev 03)
00:01.0 VGA compatible controller: S3 Inc. Savage 4 (rev 06)
00:02.0 Bridge: IBM Remote Supervisor Adapter (RSA)
00:03.0 Ethernet controller: Intel Corporation 82557/8/9/0/1 Ethernet Pro 100 (rev 08)
00:04.0 SCSI storage controller: Adaptec AIC-7892P U160/m (rev 02)
00:06.0 Class 0808: IBM: Unknown device 0246
00:0f.0 ISA bridge: Broadcom OSB4 South Bridge (rev 50)
00:0f.1 IDE interface: Broadcom OSB4 IDE Controller
00:0f.2 USB Controller: Broadcom OSB4/CSB5 OHCI USB Controller (rev 04)
01:00.0 Host bridge: IBM Winnipeg PCI-X Host Bridge (rev 03)
01:01.0 Ethernet controller: Intel Corporation 82544EI Gigabit Ethernet Controller (Copper) (rev 02)
01:03.0 Ethernet controller: Intel Corporation 82557/8/9/0/1 Ethernet Pro 100 (rev 08)
0a:00.0 Host bridge: IBM Winnipeg PCI-X Host Bridge (rev 03)
0a:01.0 Fibre Channel: QLogic Corp. ISP2312-based 2Gb Fibre Channel to PCI-X HBA (rev 02)

Regards
Guys, any news about this problem? I can't upgrade my server until this is fixed. Thanks
ping
Encountered the same error message on an xSeries system. Adding the following to /etc/modprobe.conf allowed the system to boot:

alias ibmphp off
The error occurs on this line of sys_init_module():

        /* Drop initial reference. */
        module_put(mod);

The "initial reference" is created in the module_unload_init() routine while the module is being loaded and mapped, and it should be dropped here. The BUG is hit because there is no reference to drop. From the module_put() in-line function:

        BUG_ON(module_refcount(module) == 0)

The module being loaded is ibmphp.

I cannot think of a way this situation could occur. Perhaps IBM has seen this before?

Product changed from 'Red Hat Enterprise Linux 4.6' to 'Red Hat Enterprise Linux'
Category set to: Kernel::Modules
Internal Status set to 'Waiting on Support'
Version set to: '4.6'

This event sent from IssueTracker by streeter
 issue 232278
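To make the failing condition concrete, here is a minimal userspace sketch of the load sequence described above. It is not kernel code: struct toy_module, toy_load(), toy_module_put() and the bug_hit flag are names invented for illustration, the counter is simplified to a single int, and the real BUG_ON() halts the machine rather than setting a flag. The two init routines are hypothetical; the second only shows one way the count could already be zero by the time the initial reference is dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct module. */
struct toy_module {
    int refcount;                       /* simplified: one signed counter */
    bool bug_hit;                       /* set where the kernel would BUG() */
    void (*init)(struct toy_module *);  /* the module's own init routine */
};

/* Models module_put() with the added check: dropping a reference that
 * does not exist is flagged instead of silently underflowing. */
static void toy_module_put(struct toy_module *mod)
{
    assert(mod != NULL);
    if (mod->refcount == 0) {   /* BUG_ON(module_refcount(module) == 0) */
        mod->bug_hit = true;
        return;
    }
    mod->refcount--;
}

/* Models the relevant part of sys_init_module(). */
static void toy_load(struct toy_module *mod)
{
    assert(mod != NULL);
    mod->refcount = 1;          /* initial reference taken during load */
    mod->bug_hit = false;
    if (mod->init)
        mod->init(mod);         /* module init runs while the reference is held */
    toy_module_put(mod);        /* "Drop initial reference." */
}

/* A well-behaved init routine leaves the count alone. */
static void init_ok(struct toy_module *mod)
{
    (void)mod;
}

/* An init routine that drops a reference it never took. */
static void init_extra_put(struct toy_module *mod)
{
    toy_module_put(mod);
}
```

Loading a module with init_ok ends with a count of zero and no flag set. Loading one with init_extra_put reaches the final toy_module_put() with the count already at zero, which is the state the BUG_ON() test rejects.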
Well, the reason why it is happening now is that the BUG_ON() test was added in the 68.28.EL kernel for BZ 280431. It has probably been wrong all along, but was never caught before. This event sent from IssueTracker by streeter issue 232278
I think this qualifies as a regression in 4.7. Even though the real problem probably existed in 4.6, it was benign there, but it now causes a system crash.
This bugzilla has Keywords: Regression. Since no regressions are allowed between releases, it is also being proposed as a blocker for this release. Please resolve ASAP.
added bugproxy.com to the cc list for reverse mirroring
Created attachment 326535 [details]
patch to mark ibmphp as unsafe to remove

The BUG_ON that I added in my previous patch doesn't need to be removed. In fact it worked perfectly here, uncovering a very poor use of module_put in the ibmphp driver. From ibmphp_init:

        /* lock ourselves into memory with a module
         * count of -1 so that no one can unload us. */
        module_put(THIS_MODULE);

The driver is purposely underflowing the module refcount of this driver to prevent it from being unloaded. That is both poor practice, and an incorrect solution, as a subsequent module_get would return the count to zero, allowing for a possible unload. If the module is unsafe to unload, there is a call to inform the kernel of exactly that.

I've attached a patch to correct the problem. Please test it out and confirm that the issue is resolved. Thanks!
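The difference between the two ways of pinning a module can be sketched in userspace C. This is an illustration under stated assumptions, not the attached patch: struct sketch_module, its unsafe field and every sketch_* function are hypothetical names, and the unload rule here is a simplification of what the kernel checks. The unsafe flag stands in for whatever call the patch actually uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a module's unload-related state. */
struct sketch_module {
    int refcount;   /* simplified: one signed counter */
    bool unsafe;    /* stand-in for an explicit "never unload" marker */
};

/* Simplified unload rule: no references and not marked unsafe. */
static bool sketch_can_unload(const struct sketch_module *mod)
{
    assert(mod != NULL);
    return !mod->unsafe && mod->refcount == 0;
}

/* Taking a reference, as a later module_get would. */
static void sketch_get(struct sketch_module *mod)
{
    assert(mod != NULL);
    mod->refcount++;
}

/* The old, unchecked put: nothing stops the count going below zero. */
static void sketch_put_unchecked(struct sketch_module *mod)
{
    assert(mod != NULL);
    mod->refcount--;
}

/* How a negative count appears when shown as an unsigned value
 * (assuming a 32-bit int, -1 becomes 4294967295). */
static unsigned int sketch_displayed_count(const struct sketch_module *mod)
{
    assert(mod != NULL);
    return (unsigned int)mod->refcount;
}
```

Starting from a count of zero, one sketch_put_unchecked() leaves the count at -1, which displays as 4294967295, the same value shown for ibmphp in the module list in the original report. A single sketch_get() after that brings the count back to zero and sketch_can_unload() becomes true again, which is the weakness described above. With unsafe set, sketch_can_unload() stays false however the count moves.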
ping, what's the word here? It would be nice to get this handled by 4.8 close, given that it's marked as a high priority bug.
I don't understand why this is set to needinfo for the QA contact.
I will build a test kernel.
Is this happening on a specific IBM server model or is this happening on all IBM xSeries boxes? Please provide the hardware information for IBM to repro.
(In reply to comment #14)
> Is this happening on a specific IBM server model or is this happening on all
> IBM xSeries boxes? Please provide the hardware information for IBM to repro.

It's an IBM xSeries 360 (4x XEON 1.4GHz / 4Gb MEM)

Regards
John,

The problem is not specific to a model. It involves loading the ibmphp module. We believe Neil has identified the problem and we are testing his fix.
The more recent IBM models don't use this module, but there are likely a fair number of systems in the field that do need this module.
Release note added. If any revisions are required, please set the "requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
The ibmphp module is unsafe to unload. The mechanism by which this module is prevented from unloading was, in previous releases, insufficient, and eventually triggered a bug halt. The new, more correct method of preventing this module from unloading prevents the aforementioned bug halt, but produces a warning message that was previously unrecorded in the message log, indicating that the module is marked as being unsafe to unload. This warning message can be safely ignored.
Committed in 78.30.EL . RPMS are available at http://people.redhat.com/vgoyal/rhel4/
------- Comment From lnx1138.ibm.com 2009-03-27 11:34 EDT-------
Hello, anyone verified this is fixed in latest 4.8 snapshot so we can close please? Thanks.
Release note updated. If any revisions are required, please set the "requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -1 +1 @@
-The ibmphp module is unsafe to unload. The mechanism by which this module is prevented from unloading was, in previous releases, insufficient, and eventually triggered a bug halt. The new, more correct method of preventing this module from unloading prevents the aforementioned bug halt, but produces a warning message that was previously unrecorded in the message log, indicating that the module is marked as being unsafe to unload. This warning message can be safely ignored.
+The ibmphp module is not safe to unload. Previously, the mechanism that prevented the ibmphp module from unloading was insufficient, and eventually triggered a bug halt. With this update, the method to prevent this module from unloading has been improved, preventing the bug halt. However, attempting to unload the module may produce a warning in the message log, indicating that the module is not safe to unload. This warning can be safely ignored.
------- Comment From mbeeraka.com 2009-03-30 08:09 EDT-------
(In reply to comment #17)
> (In reply to comment #16)
> > Hello, anyone verified this is fixed in latest 4.8 snapshot so we can close
> > please? Thanks.
> >
>
> Alright, we will verify this bug & update the bug report soon.
>

Verified this bug by upgrading the kernel from RHEL4.6 (2.6.9-67.ELsmp) to RHEL4.8-snap1 (2.6.9-84.ELsmp) and could not reproduce.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1024.html