Bug 1074097
Summary: With RHEV-H 6.5, the main vdsm process is using excessive CPU time on an idle system.

| Field | Value | Field | Value |
|---|---|---|---|
| Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Gordon Watson <gwatson> |
| Component: | vdsm | Assignee: | Nir Soffer <nsoffer> |
| Status: | CLOSED DUPLICATE | QA Contact: | Aharon Canan <acanan> |
| Severity: | urgent | Docs Contact: | |
| Priority: | high | | |
| Version: | 3.3.0 | CC: | amureini, bazulay, cpelland, cshao, gouyang, gwatson, hadong, huiwa, iheim, leiwang, loberman, lpeer, mkalinin, nsoffer, scohen, s.kieske, tnisan, yaniwang, ybronhei, ycui, yeylon |
| Target Milestone: | --- | | |
| Target Release: | 3.3.3 | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | storage | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2014-04-24 15:04:06 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | Storage | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |
Description
Gordon Watson
2014-03-07 23:03:59 UTC
Created attachment 876092 [details]
vgs output on a hypervisor with high CPU
Created attachment 876093 [details]
vgs output on 6.4 hypervisor - normal CPU
Marina, can you please also attach the actual vdsm logs? Thanks.

Hello,

Based on the straces taken, we see most of the time in vdsm being spent in futex waits; see below. Are you maybe looking for an ltrace to profile here?

    [root@do-rhevh3 vdsm]# strace -p 25815 -f -v
    [pid 25876] <... futex resumed> ) = -1 EAGAIN (Resource temporarily unavailable)
    [pid 25861] <... futex resumed> ) = 0
    [pid 13289] futex(0x1ef9420, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
    [pid  4176] futex(0x1ef9420, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
    [pid  4174] <... futex resumed> ) = 0
    [pid  4166] futex(0x1ef9420, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
    [pid  4163] futex(0x1ef9420, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
    [pid  4157] futex(0x1ef9420, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
    [pid  4153] futex(0x1ef9420, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
    [pid  4150] futex(0x1ef9420, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
    [pid  4147] futex(0x1ef9420, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
    [pid  4138] futex(0x1ef9420, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
    [pid  4137] futex(0x1ef9420, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
    [pid  4132] futex(0x1ef9420, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
    [pid  4129] futex(0x1ef9420, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
    [pid  4122] futex(0x1ef9420, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
    [pid  4120] futex(0x1ef9420, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
    [pid  4114] futex(0x1ef9420, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
    [pid  4073] futex(0x1ef9420, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
    [pid  4068] futex(0x1ef9420, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
    [pid  4067] futex(0x1ef9420, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
    [pid  4064] futex(0x1ef9420, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
    [pid  4063] futex(0x1ef9420, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
    [pid  4061] futex(0x1ef9420, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
    [pid  4058] futex(0x1ef9420, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
    [pid  4056] futex(0x1ef9420, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
    [pid 25876] futex(0x1ef9420, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
    [pid 25861] futex(0x1ef9420, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
    [pid 13289] <... futex resumed> ) = 0
    [pid  4176] <... futex resumed> ) = 0
    [pid  4174] futex(0x1ef9420, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
    [pid  4166] <... futex resumed> ) = 0
    [pid  4163] <... futex resumed> ) = 0
    [pid  4157] <... futex resumed> ) = 0
    [pid  4153] <... futex resumed> ) = 0
    [pid  4150] <... futex resumed> ) = 0
    [pid  4147] <... futex resumed> ) = -1 EAGAIN (Resource temporarily unavailable)
    [pid  4138] <... futex resumed> ) = 0
    [pid  4137] <... futex resumed> ) = 0
    [pid  4132] <... futex resumed> ) = 0
    [pid  4129] <... futex resumed> ) = 0
    [pid  4122] <... futex resumed> ) = -1 EAGAIN (Resource temporarily unavailable)
    [pid  4120] <... futex resumed> ) = 0
    [pid  4114] <... futex resumed> ) = 0
    [pid  4073] <... futex resumed> ) = 0
    [pid  4068] <... futex resumed> ) = 0

Hello,

I called the customer in Dubai early this morning. Theo (the Red Hat consultant) has provisioned and installed two new RHEV-H hosts using the 3PAR storage, and the problem is no longer apparent; the high-CPU vdsm issue on RHEV-H 6.5 does not occur on these new hosts. He has one more old configuration left, on which he was attempting to install RHEL so that he could stage a "fat" RHEV host for further debugging, but he has run into some issue there. We may never get root cause here, and we know we cannot reproduce the problem in-house. If he is able to get RHEL and RHEV-H installed, I have asked him to capture a vmcore and try the older 6.4 kernel, but that may not happen. So for now I would reduce the severity of this issue and accept that the original installation had some anomaly for which we may never find root cause.

Thank you,
Laurence

This seems to be a duplicate of bug 1090664. Although this bug was opened first, we still have insufficient data here, while in the other bug we have profiling results.

I suggest closing this bug as a duplicate and continuing to track the issue in bug 1090664.
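The futex churn in the strace output above is what heavy lock and event traffic from many threads looks like at the syscall level: CPython locks, events, and condition variables are built on futexes on Linux, so threads that poll with short timeouts instead of blocking indefinitely generate a steady stream of FUTEX_WAIT/FUTEX_WAKE calls even when the process is idle. A minimal, hypothetical Python sketch of that pattern (not vdsm code; the function name and intervals are invented for illustration):

```python
import threading
import time

def count_wakeups(poll_interval, duration):
    """Count how often an otherwise idle worker thread wakes up when it
    polls an Event with a short timeout instead of blocking forever."""
    stop = threading.Event()
    wakeups = [0]

    def worker():
        while not stop.is_set():
            # Each timeout expiry is a futex wait/wake cycle under the hood.
            stop.wait(poll_interval)
            wakeups[0] += 1

    t = threading.Thread(target=worker)
    t.start()
    time.sleep(duration)
    stop.set()
    t.join()
    return wakeups[0]
```

With a 5 ms poll interval a single idle thread wakes roughly 200 times per second, while a thread that blocks until signalled wakes only once; a process running dozens of short-interval pollers can therefore burn noticeable CPU while doing no work, which is consistent with the idle-system symptom reported in this bug.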