Bug 618097
Summary: kernel thread ata/1 consume too much cpu time
Product: Red Hat Enterprise Linux 5
Component: kernel
Version: 5.5
Hardware: x86_64
OS: Linux
Status: CLOSED NOTABUG
Severity: medium
Priority: medium
Target Milestone: rc
Reporter: Mark Wu <dwu>
Assignee: David Milburn <dmilburn>
QA Contact: Red Hat Kernel QE team <kernel-qe>
CC: cww, gsgatlin, jjneely, jwilson, tao
Doc Type: Bug Fix
Last Closed: 2012-01-18 17:35:21 UTC

crash> bt 443
PID: 443    TASK: ffff81007f5fe860  CPU: 1   COMMAND: "ata/1"
 #0 [ffff81000a6faf20] crash_nmi_callback at ffffffff8007bf44
 #1 [ffff81000a6faf40] do_nmi at ffffffff8006688a
 #2 [ffff81000a6faf50] nmi at ffffffff80065eef
    [exception RIP: __delay+6]
    RIP: ffffffff8000c9f2  RSP: ffff81007ef5be18  RFLAGS: 00000287
    RAX: 00000000000619a4  RBX: ffff81007eed8000  RCX: 00000000083fe723
    RDX: 000000000000010d  RSI: 0000000000000282  RDI: 00000000029094f0
    RBP: 0000000000000005   R8: ffff81007ef5a000   R9: ffff81007eed8000
    R10: ffff81007ed721d8  R11: ffffffff8807aeb3  R12: ffff81007eed80e0
    R13: 0000000000000282  R14: ffff81007eed8000  R15: ffffffff880c6511
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
--- <NMI exception stack> ---
 #3 [ffff81007ef5be18] __delay at ffffffff8000c9f2
 #4 [ffff81007ef5be18] ata_pio_task at ffffffff880c6565
 #5 [ffff81007ef5be38] run_workqueue at ffffffff8004dc37
 #6 [ffff81007ef5be78] worker_thread at ffffffff8004a562
 #7 [ffff81007ef5bee8] kthread at ffffffff80032bdc
 #8 [ffff81007ef5bf48] kernel_thread at ffffffff8005efb1

void ata_pio_task(void *_data)
{
        struct ata_port *ap = _data;
        struct ata_queued_cmd *qc = ap->port_task_data;
        u8 status;
        int poll_next;

fsm_start:
        WARN_ON(ap->hsm_task_state == HSM_ST_IDLE);

        /*
         * This is purely heuristic.  This is a fast path.
         * Sometimes when we enter, BSY will be cleared in
         * a chk-status or two.  If not, the drive is probably seeking
         * or something.  Snooze for a couple msecs, then
         * chk-status again.  If still busy, queue delayed work.
         */
        status = ata_sff_busy_wait(ap, ATA_BUSY, 5);
        if (status & ATA_BUSY) {
                msleep(2);
                status = ata_sff_busy_wait(ap, ATA_BUSY, 10);
                if (status & ATA_BUSY) {
                        ata_pio_queue_task(ap, qc, ATA_SHORT_PAUSE);
                        return;
                }
        }
        ...
}

It seems that kernel thread ata/1 is waiting for the ATA_BUSY bit to be cleared. top showed ata/1 consuming too much CPU, so it may be spending a lot of time in the busy wait.

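The polling heuristic in the quoted ata_pio_task can be sketched in user space as follows. This is only an illustration under stated assumptions: struct fake_port, read_status, busy_wait and pio_poll_once are hypothetical names, the device is simulated by a counter, the 2 ms sleep is counted rather than performed, and any per-read delay done by the kernel helper is left out.

```c
#include <stdint.h>

#define ATA_BUSY 0x80  /* BSY bit of the ATA status register */

/* Hypothetical simulated port: reports BSY for the first
 * `busy_reads` status reads, then reports not busy. */
struct fake_port {
    unsigned busy_reads;  /* status reads that will still return BSY */
    unsigned reads_done;  /* total status reads performed so far */
    unsigned sleeps;      /* how many times the caller "slept" */
};

enum poll_result { POLL_READY, POLL_REQUEUE };

/* Read the simulated status register once. */
uint8_t read_status(struct fake_port *p)
{
    p->reads_done++;
    if (p->busy_reads > 0) {
        p->busy_reads--;
        return ATA_BUSY;
    }
    return 0x50;  /* DRDY | DSC, BSY clear */
}

/* Poll at most max_polls times; return the last status seen. */
uint8_t busy_wait(struct fake_port *p, unsigned max_polls)
{
    uint8_t status = ATA_BUSY;
    for (unsigned i = 0; i < max_polls; i++) {
        status = read_status(p);
        if (!(status & ATA_BUSY))
            break;
    }
    return status;
}

/* Fast path: 5 polls, then a (counted) sleep, then 10 more polls.
 * If the device is still busy, ask the caller to requeue the work. */
enum poll_result pio_poll_once(struct fake_port *p)
{
    uint8_t status = busy_wait(p, 5);
    if (status & ATA_BUSY) {
        p->sleeps++;  /* stands in for msleep(2) */
        status = busy_wait(p, 10);
        if (status & ATA_BUSY)
            return POLL_REQUEUE;
    }
    return POLL_READY;
}
```

A simulated port that clears BSY within 5 reads never sleeps. One that stays busy for all 15 reads returns POLL_REQUEUE, which corresponds to the ata_pio_queue_task call in the quoted code.
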
vmcore is available at megatron.gsslab.rdu.redhat.com:/cores/20100518081635/work

Specifying the option "noacpi=1" for the libata module still gives a slow response. A similar issue was reported in https://bugzilla.redhat.com/show_bug.cgi?id=468027#c49

Would you please attach your dmesg output after a successful -164.el5 boot using the kernel parameter "hda=ide-scsi"?

Created attachment 436442 [details]
dmesg on -128 kernel which also works fine.
Currently we only have sosreports for the -194 and -128 kernels. The system in question also works fine with the -128 kernel. I am going to collect dmesg on the -164 kernel from the customer.
Created attachment 436452 [details]
dmesg on -194 kernel which has bad performance
With respect to the OptiPlex 740 workstation: updating the BIOS to the latest version (version 2.2.5) seems to fix the issue with ata/1 consuming too much CPU. I hit this bug when upgrading from 32-bit to 64-bit RHEL 5. It's easy to confuse this with bug 586532, so make sure you have the enable_msi=0 workaround in /etc/modprobe.conf and the latest BIOS, and this problem goes away.

It's been almost a year and a half since this BZ was updated. Customer issue is closed. Closing the BZ NOTABUG. If this is still an issue please open a case with Red Hat Support via the Customer Portal.

Description of problem:
Performance degraded on a Dell OptiPlex 740 after booting the 2.6.18-194 kernel. System response is very slow. The ata/1 process consumes too much CPU and is in R state all the time. The hald-addon-storage and scsi_eh_1 processes are also in D or R state most of the time.

The system works well with any of the following workarounds:
1. Reverting to the 2.6.18-164 kernel.
2. Stopping hald's polling of the CD-ROM.
3. Booting with the acpi=off option.

They found that both CD-ROM models, GSA-H73N and DH-16A6S, have this problem.

Version-Release number of selected component (if applicable):
kernel - 2.6.18-194

How reproducible:

Steps to Reproduce:
1. Boot with the -194 kernel (with kernel parameter "hda=ide-scsi" and without "acpi=off")

Actual results:

Expected results:

Additional info:
A recursive "diff" between the ata drivers of -164 and -194 found no change related to this issue.

# diff -r ata-194/ ata-164/
diff -r ata-194/ahci.c ata-164/ahci.c
481d480
<       { PCI_VDEVICE(INTEL, 0x3a22), board_ahci }, /* ICH10 */
505,510d503
<       /* AMD */
<       { PCI_VDEVICE(AMD, 0x7800), board_ahci }, /* AMD Hudson-2 */
<       /* AMD is using RAID class only for ahci controllers */
<       { PCI_VENDOR_ID_AMD, PCI_ANY_ID, PCI_ANY_ID, PCI_ANY_ID,
<         PCI_CLASS_STORAGE_RAID << 8, 0xffffff, board_ahci },
<
2295,2297c2288,2289
<       if ((pdev->vendor == PCI_VENDOR_ID_AMD && pdev->device == 0x7800) ||
<           (pdev->vendor == PCI_VENDOR_ID_ATI &&
<           (pdev->device == 0x4380 || pdev->device == 0x4390))) {
---
>       if (pdev->vendor == PCI_VENDOR_ID_ATI &&
>           (pdev->device == 0x4380 || pdev->device == 0x4390)) {
diff -r ata-194/pata_atiixp.c ata-164/pata_atiixp.c
255d254
<       { PCI_VDEVICE(AMD, PCI_DEVICE_ID_AMD_HUDSON2_IDE), },
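
To check what the ahci.c hunk at line 2295 actually changes, the two conditions can be compared directly. A minimal sketch, assuming the standard PCI vendor IDs 0x1002 (ATI) and 0x1022 (AMD); cond_164, cond_194 and count_differences are hypothetical names, and what the guarded block does is not shown in the diff.

```c
#include <stdbool.h>
#include <stdint.h>

#define PCI_VENDOR_ID_ATI 0x1002
#define PCI_VENDOR_ID_AMD 0x1022

/* Condition as it appears on the -164 side of the diff. */
bool cond_164(uint16_t vendor, uint16_t device)
{
    return vendor == PCI_VENDOR_ID_ATI &&
           (device == 0x4380 || device == 0x4390);
}

/* Condition as it appears on the -194 side of the diff. */
bool cond_194(uint16_t vendor, uint16_t device)
{
    return (vendor == PCI_VENDOR_ID_AMD && device == 0x7800) ||
           (vendor == PCI_VENDOR_ID_ATI &&
            (device == 0x4380 || device == 0x4390));
}

/* Count the device IDs of one vendor for which the two
 * conditions give different answers. */
unsigned count_differences(uint16_t vendor)
{
    unsigned n = 0;
    for (uint32_t d = 0; d <= 0xffff; d++) {
        if (cond_164(vendor, (uint16_t)d) != cond_194(vendor, (uint16_t)d))
            n++;
    }
    return n;
}
```

For ATI devices the two conditions agree on every device ID; for AMD they differ only for device 0x7800, the ID the diff labels Hudson-2. This is consistent with the observation that the hunk does not alter behavior for previously matched controllers.
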