Description of problem:

The updated SATA support in 2.6.9-44 has a reference counting bug in ata_scsi_scan_host. ata_scsi_scan_host iterates over all entries in ap->device[], calling __scsi_add_device for each device found. In 2.6.18 the call chain looks like this:

    ata_scsi_scan_host()
     -> __scsi_add_device()
      -> scsi_probe_and_add_lun()
       -> scsi_device_lookup_by_target()
        -> scsi_device_get()

So, on return from __scsi_add_device we are holding a reference on the returned scsi_device and need a scsi_device_put(sdev) following the return from __scsi_add_device:

    sdev = __scsi_add_device(ap->scsi_host, 0, i, 0, NULL);
    if (!IS_ERR(sdev)) {
            dev->sdev = sdev;
            scsi_device_put(sdev);
    }

In the RHEL4 2.6.9 kernels the sequence looks instead like this:

    ata_scsi_scan_host()
     -> __scsi_add_device()
      -> scsi_probe_and_add_lun()
       -> scsi_device_lookup()

Here we are NOT holding a reference on the scsi_device when we return into ata_scsi_scan_host. The backport from 2.6.18 nevertheless included the upstream scsi_device_put():

     void ata_scsi_scan_host(struct ata_port *ap)
     {
    -       struct ata_device *dev;
            unsigned int i;

    -       if (ap->flags & ATA_FLAG_PORT_DISABLED)
    +       if (ap->flags & ATA_FLAG_DISABLED)
                    return;

            for (i = 0; i < ATA_MAX_DEVICES; i++) {
    +               struct ata_device *dev = &ap->device[i];
    +               struct scsi_device *sdev;
    +
    +               if (!ata_dev_enabled(dev) || dev->sdev)
    +                       continue;
    +
    +               sdev = __scsi_add_device(ap->scsi_host, 0, i, 0, NULL);
    +               if (!IS_ERR(sdev)) {
    +                       dev->sdev = sdev;
    +                       scsi_device_put(sdev);
    +               }
    +       }
    +}

Since we aren't holding a reference at that point, the extra put breaks the reference counting for the SATA module, e.g. ata_piix. If the reference count is incorrectly decremented to 0, the module may be unloaded while still in use, triggering a panic on the next access to the SATA device.

Version-Release number of selected component (if applicable):
2.6.9-44.EL onward (reproduced on 55.EL)

How reproducible:
100%, but see notes below.
Steps to Reproduce:

How easy it is to see the problem and trigger a panic depends on the configuration of the machine. For example, if device-mapper is used, the reference count is incremented above zero when dm claims the devices (it's still wrong, but it's harder to see). The examples here use ata_piix as that's what I had for testing, but any of the libata based drivers should be similarly affected.

Reproducing with rescue mode
1. Boot a machine with a single ata_piix device using rhel4.5 install media in rescue mode
2. Select "skip" when asked about fs detection
3. Examine the reference count on the ata_piix module; it is 4294967295 (-1).
4. Mount a partition from the SATA disk (I used /boot)
5. Examine the reference count on the ata_piix module; it is 0.
6. rmmod ata_piix
7. Poke the device (e.g. ls -R /path/to/mount)

Reproducing on an installed system
1. Install the machine onto a SATA disk with only a root file system (no LVM, no /boot)
2. Check the reference count on the ata module (it will be 0)
3. rmmod ata_piix
4. Poke the device (e.g. ls -R /)

Actual results:
rmmod succeeds, machine panics at step 7 (rescue mode) or step 4 (installed system).

Expected results:
rmmod fails, machine does not panic.

Additional info:
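The counter values seen in the rescue-mode steps can be replayed with a small userspace sketch. This is only a toy model, not kernel code: struct toy_module and the toy_* helpers are hypothetical names made up for illustration, and it assumes (as the report implies) that each scsi_device_get()/scsi_device_put() pair ultimately raises and lowers the driver module's use count by one.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical stand-in for a module use count; not the kernel's struct module. */
struct toy_module {
    unsigned int refcnt;
};

static void toy_get(struct toy_module *m) { m->refcnt++; }

/* Unsigned arithmetic wraps, so a put at 0 yields UINT_MAX (shown as "-1"). */
static void toy_put(struct toy_module *m) { m->refcnt--; }

/* rmmod-like check: unloading is refused only while the count is non-zero. */
static bool toy_can_unload(const struct toy_module *m) { return m->refcnt == 0; }

/*
 * Models one iteration of the scan loop. lookup_takes_ref is true for the
 * 2.6.18-style call chain (ends in scsi_device_get) and false for the
 * RHEL4 2.6.9 chain. The caller-side put is unconditional, as in the backport.
 */
static void toy_scan_device(struct toy_module *m, bool lookup_takes_ref)
{
    if (lookup_takes_ref)
        toy_get(m);
    toy_put(m);
}

static void toy_demo(void)
{
    struct toy_module backport = { 0 };
    struct toy_module upstream = { 0 };

    /* Backport: put without a matching get underflows the counter. */
    toy_scan_device(&backport, false);
    assert(backport.refcnt == UINT_MAX);

    /* Mounting takes one reference, which only brings the count back to 0. */
    toy_get(&backport);
    assert(backport.refcnt == 0);
    assert(toy_can_unload(&backport));   /* rmmod wrongly succeeds */

    /* Balanced get/put: the mount reference keeps the module pinned. */
    toy_scan_device(&upstream, true);
    toy_get(&upstream);
    assert(upstream.refcnt == 1);
    assert(!toy_can_unload(&upstream));  /* rmmod correctly refused */
}
```

Calling toy_demo() from a main() runs both scenarios; under this model the unbalanced put leaves the count one lower than the number of real users, which is why the mounted disk does not prevent the unload.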
Created attachment 155343: remove bogus scsi_device_put in ata_scsi_scan_host
*** Bug 240016 has been marked as a duplicate of this bug. ***
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
This bugzilla has Keywords: Regression. Since no regressions are allowed between releases, it is also being proposed as a blocker for this release. Please resolve ASAP.
Posted to rhkl
This request was evaluated by Red Hat Kernel Team for inclusion in a Red Hat Enterprise Linux maintenance release, and has moved to bugzilla status POST.
committed in stream U6 build 55.14. A test kernel with this patch is available from http://people.redhat.com/~jbaron/rhel4/
A fix for this issue should have been included in the packages contained in the RHEL4.6 Beta released on RHN (also available at partners.redhat.com). Requested action: Please verify that your issue is fixed to ensure that it is included in this update release. After you (Red Hat Partner) have verified that this issue has been addressed, please perform the following: 1) Change the *status* of this bug to VERIFIED. 2) Add *keyword* of PartnerVerified (leaving the existing keywords unmodified) If this issue is not fixed, please add a comment describing the most recent symptoms of the problem you are having and change the status of the bug to FAILS_QA. If you cannot access bugzilla, please reply with a message to Issue Tracker and I will change the status for you. If you need assistance accessing ftp://partners.redhat.com, please contact your Partner Manager.
A fix for this issue should have been included in the packages contained in the RHEL4.6-Snapshot1 on partners.redhat.com. Requested action: Please verify that your issue is fixed to ensure that it is included in this update release. After you (Red Hat Partner) have verified that this issue has been addressed, please perform the following: 1) Change the *status* of this bug to VERIFIED. 2) Add *keyword* of PartnerVerified (leaving the existing keywords unmodified) If this issue is not fixed, please add a comment describing the most recent symptoms of the problem you are having and change the status of the bug to FAILS_QA. If you cannot access bugzilla, please reply with a message about your test results to Issue Tracker. If you need assistance accessing ftp://partners.redhat.com, please contact your Partner Manager.
A fix for this issue should be included in RHEL4.6-Snapshot2--available soon on partners.redhat.com. Please verify that your issue is fixed to ensure that it is included in this update release. After you (Red Hat Partner) have verified that this issue has been addressed, please perform the following: 1) Change the *status* of this bug to VERIFIED. 2) Add *keyword* of PartnerVerified (leaving the existing keywords unmodified) If this issue is not fixed, please add a comment describing the most recent symptoms of the problem you are having and change the status of the bug to FAILS_QA. If you cannot access bugzilla, please reply with a message about your test results to Issue Tracker. If you need assistance accessing ftp://partners.redhat.com, please contact your Partner Manager.
A fix for this issue should have been included in the packages contained in the RHEL4.6-Snapshot3 on partners.redhat.com. Please verify that your issue is fixed to ensure that it is included in this update release. After you (Red Hat Partner) have verified that this issue has been addressed, please perform the following: 1) Change the *status* of this bug to VERIFIED. 2) Add *keyword* of PartnerVerified (leaving the existing keywords unmodified) If this issue is not fixed, please add a comment describing the most recent symptoms of the problem you are having and change the status of the bug to FAILS_QA. If you cannot access bugzilla, please reply with a message about your test results to Issue Tracker. If you need assistance accessing ftp://partners.redhat.com, please contact your Partner Manager.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2007-0791.html