Bug 609156
| Field | Value |
|---|---|
| Summary | CPU usage of kipmi thread is too high..(95 to 98%) |
| Product | Red Hat Enterprise Linux 6 |
| Component | kernel |
| Version | 6.0 |
| Hardware | All |
| OS | Linux |
| Status | CLOSED CURRENTRELEASE |
| Severity | urgent |
| Priority | low |
| Reporter | Shyam Iyer <shiyer> |
| Assignee | Peter Martuccelli <peterm> |
| QA Contact | WANG Chao <chaowang> |
| CC | arozansk, charles_rose, czhang, jfeeney, martinez, peterm, qcai, raghavendra_biligiri, ruyang, tao, wwlinuxengineering |
| Target Milestone | rc |
| Target Release | 6.0 |
| Doc Type | Bug Fix |
| Last Closed | 2010-11-11 16:15:11 UTC |
| Bug Blocks | 594856 |

Description
Shyam Iyer
2010-06-29 14:34:58 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux major release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux major release. This request is not yet committed for inclusion.

Does this manifest itself in real use cases? The polling is used because the hardware doesn't provide interrupts. Reducing the polling interval will result in increased latency in handling IPMI commands. kipmi will only be running if commands are in flight, so I think things are working as designed here.

In the real-world scenario, the higher CPU usage is observed when Dell's systems management software, OpenManage, is running. Opening up to other Dell folks to provide the specific use cases.

(In reply to comment #3)
> Does this manifest itself in real use cases? The polling is used because the
> hardware doesn't provide interrupts. Reducing the polling interval will result
> in increased latency in handling IPMI commands. kipmi will only be running if
> commands are in flight, so I think things are working as designed here.

Sure, kipmi runs only when commands are in flight, but the OpenManage systems management software checks the sensors periodically, and that spikes the CPU usage considerably and affects normal operations. In the past we have asked folks to see if they can shut off the OpenManage software for their performance-critical tasks, but sysadmins are wary of losing sensor logs during that time.

See the thread below:
https://patchwork.kernel.org/patch/13068/

The net effect of this patch is that if kipmid runs for more than the configured number of nanoseconds it'll then sleep for a millisecond. This still requires manual configuration and may potentially significantly slow bulk IPMI transactions such as firmware updates. How frequently is the OpenManage code actually triggering IPMI queries, and how long does each of those queries take?

From the OpenManage team:
"OpenManage queries the IPMI driver every 20 seconds, and the queries last for about a second. Firmware update to iDRAC doesn't happen over KCS for DRAC5 and above. The iDRAC key is mounted as a partition on the host OS, the image is transferred there, and then the upgrade is initiated."

Setting Sev to Urgent since Bug 584106 results in unacceptable performance on Dell PowerEdge servers.

This isn't an elegant solution and still requires manual configuration (and may result in problems for some other use cases), but upstream carries this so I guess there's no real harm. In the long term I'd recommend that Dell ship hardware that supports interrupts.

(In reply to comment #9)
> This isn't an elegant solution and still requires manual configuration (and may
> result in problems for some other use cases), but upstream carries this so I
> guess there's no real harm. In the long term I'd recommend that Dell ship
> hardware that supports interrupts.

This is definitely the plan and we are working on the implementation, but we will not have it implemented by RHEL6 GA.

Created attachment 431335
Patch to add the kipmid_max_busy_us module parameter
This patch factors in the recent upstream fix for the regression caused by the module parameter patch.
Please test.
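
For readers trying to picture the trade-off discussed above, here is a minimal sketch of a "spin for a bounded time, then sleep" poller, written as a plain Python simulation rather than kernel code. It assumes the budget is in microseconds (as the `_us` suffix of kipmid_max_busy_us suggests), that a value of 0 disables the budget, that the sleep lasts one millisecond as described in the earlier comment, and that the hardware never completes during the simulated window. The function name `busy_fraction` and its parameters are hypothetical and are not taken from the patch.

```python
def busy_fraction(total_us, max_busy_us, poll_cost_us=1, sleep_us=1000):
    """Return the share of elapsed time a simulated poller spends spinning.

    All names and defaults are illustrative assumptions, not driver code.
    """
    now = 0            # simulated clock, in microseconds
    busy = 0           # time spent busy-polling
    busy_until = None  # end of the current spin budget, if one is open

    while now < total_us:
        if max_busy_us and busy_until is None:
            # Open a new spin budget at the start of a busy stretch.
            busy_until = now + max_busy_us
        if max_busy_us and now >= busy_until:
            # Budget used up: yield the CPU, then start over.
            now += sleep_us
            busy_until = None
            continue
        # Budget disabled (0) or not yet exhausted: poll the hardware once.
        now += poll_cost_us
        busy += poll_cost_us

    return busy / now if now else 0.0


if __name__ == "__main__":
    for budget in (0, 100, 200, 300, 400, 500):
        print(budget, round(busy_fraction(110_000, budget), 3))
```

This toy model only shows why a budget of 0 pins a CPU while any non-zero budget caps the spinning share. It makes no attempt to predict real figures, which depend on how quickly the BMC answers and on the kernel's timer tick, so the value to use on a given system is something to measure.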
Updates from testing by Srini at Dell with the patch:

Reboot 1:

| kipmid_max_busy_us | CPU utilization range | Real time |
|---|---|---|
| 0 | 94% to 96% | 0.145s |
| 100 | 8% to 10% | 0.288s |
| 200 | 2% to 5% | 0.093s |
| 300 | 0.3% to 1% | 0.154s |
| 400 | 3% to 6% | 0.205s |
| 500 | 5% to 12% | 0.077s |

Reboot 2:

| kipmid_max_busy_us | CPU utilization range | Real time |
|---|---|---|
| 0 | 95% to 97% | 0.162s |
| 100 | 0.3% to 1.8% | 0.248s |
| 200 | 0.3% to 0.5% | 0.260s |
| 300 | 0.3% to 0.7% | 0.211s |
| 400 | 2% to 4% | 0.291s |
| 500 | 5% to 8% | 0.113s |

Reboot 3:

| kipmid_max_busy_us | CPU utilization range | Real time |
|---|---|---|
| 0 | 95% to 98% | 0.311s |
| 100 | 0.3% to 1.8% | 0.183s |
| 200 | 0.3% to 0.5% | 0.240s |
| 300 | 0.2% to 0.4% | 0.201s |
| 400 | 2% to 4% | 0.492s |
| 500 | 5% to 8% | 0.121s |

We did not find this fix in 2.6.32-44.1. What is the target kernel for this fix?

Patch(es) available on kernel-2.6.32-52.el6

Verified that the patch is present in RHEL6-Snapshot8 (kernel 2.6.32-52). Below is the CPU utilization we observed in RHEL6-Snapshot8:

| kipmid_max_busy_us | CPU utilization range | Real time |
|---|---|---|
| 0 | 80% to 95% | 0m37.45s |
| 100 | 0.6% to 1.0% | 0m39.104s |
| 200 | 1% | 0m29.78s |
| 300 | 1.0% to 3.0% | 0m28.974s |
| 400 | 2.0% to 10.0% | 0m36.65s |
| 500 | 3% to 22% | 0m30.93s |

Red Hat Enterprise Linux 6.0 is now available and should resolve the problem described in this bug report. This report is therefore being closed with a resolution of CURRENTRELEASE. You may reopen this bug report if the solution does not work for you.