Bug 1591385

Summary: Seeing cpu affinity is not supported messages on compute node.
Product: Red Hat OpenStack Reporter: Siggy Sigwald <ssigwald>
Component: openstack-nova Assignee: Stephen Finucane <stephenfin>
Status: CLOSED NOTABUG QA Contact: OSP DFG:Compute <osp-dfg-compute>
Severity: high Docs Contact:
Priority: medium    
Version: 10.0 (Newton) CC: akaris, berrange, ccollett, chhudson, dasmith, dhill, eglynn, jhakimra, jmelvin, kchamart, knoel, lyarwood, mburns, mrezanin, nova-maint, panbalag, rbalakri, rlondhe, saime, sbauza, sferdjao, sgordon, srevivo, ssigwald, stephenfin, vasili.namatov, vromanso
Target Milestone: zstream Keywords: ZStream
Target Release: 10.0 (Newton) Flags: ssigwald: needinfo-
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1420903 Environment:
Last Closed: 2018-07-06 14:34:08 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1420903    
Bug Blocks: 1515165, 1563067    

Comment 1 Siggy Sigwald 2018-06-14 15:13:52 UTC
A reboot of one of our orchestration VMs was issued through notifications, but the VM took more than 5 minutes to start.
In the host's nova-compute logs for that VM we observed the following entries.

[root@lbucs002-osd-compute-1 nova]# zgrep -i bef70156-e73e-4223-9e43-e97cfaa47873 nova-compute.log-20180609.gz
2018-06-05 18:21:32.548 306607 INFO nova.virt.libvirt.driver [req-438ef125-dc1a-4a0a-bc50-de1a80f621b3 411aa1af9d0248f4843207ee6736c4d7 24a12c2c3c414615ae23fa75fad11683 - - -] [instance: bef70156-e73e-4223-9e43-e97cfaa47873] Instance failed to shutdown in 60 seconds.
2018-06-05 18:22:07.134 306607 INFO nova.virt.libvirt.driver [-] [instance: bef70156-e73e-4223-9e43-e97cfaa47873] Instance destroyed successfully.
2018-06-05 18:22:07.225 306607 INFO nova.compute.manager [-] [instance: bef70156-e73e-4223-9e43-e97cfaa47873] During sync_power_state the instance has a pending task (powering-off). Skip.
2018-06-05 18:22:13.759 306607 INFO nova.virt.libvirt.driver [-] [instance: bef70156-e73e-4223-9e43-e97cfaa47873] Instance destroyed successfully.
2018-06-05 18:22:22.134 306607 INFO nova.compute.manager [-] [instance: bef70156-e73e-4223-9e43-e97cfaa47873] VM Stopped (Lifecycle Event)
2018-06-05 18:22:39.029 306607 WARNING nova.virt.libvirt.driver [req-bc110325-58d3-46d5-bf43-6961773bae1f - - - - -] couldn't obtain the vcpu count from domain id: bef70156-e73e-4223-9e43-e97cfaa47873, exception: Requested operation is not valid: cpu affinity is not supported
2018-06-05 18:23:38.293 306607 WARNING nova.virt.libvirt.driver [req-bc110325-58d3-46d5-bf43-6961773bae1f - - - - -] couldn't obtain the vcpu count from domain id: bef70156-e73e-4223-9e43-e97cfaa47873, exception: Requested operation is not valid: cpu affinity is not supported
2018-06-05 18:24:38.369 306607 WARNING nova.virt.libvirt.driver [req-bc110325-58d3-46d5-bf43-6961773bae1f - - - - -] couldn't obtain the vcpu count from domain id: bef70156-e73e-4223-9e43-e97cfaa47873, exception: Requested operation is not valid: cpu affinity is not supported
2018-06-05 18:25:38.420 306607 WARNING nova.virt.libvirt.driver [req-bc110325-58d3-46d5-bf43-6961773bae1f - - - - -] couldn't obtain the vcpu count from domain id: bef70156-e73e-4223-9e43-e97cfaa47873, exception: Requested operation is not valid: cpu affinity is not supported
2018-06-05 18:26:42.440 306607 WARNING nova.virt.libvirt.driver [req-bc110325-58d3-46d5-bf43-6961773bae1f - - - - -] couldn't obtain the vcpu count from domain id: bef70156-e73e-4223-9e43-e97cfaa47873, exception: Requested operation is not valid: cpu affinity is not supported
2018-06-05 18:27:35.825 306607 INFO nova.compute.manager [req-6eddf26b-e4f2-44da-8db4-f639ad465ac1 - - - - -] [instance: bef70156-e73e-4223-9e43-e97cfaa47873] VM Resumed (Lifecycle Event)
2018-06-05 18:27:36.030 306607 INFO nova.virt.libvirt.driver [-] [instance: bef70156-e73e-4223-9e43-e97cfaa47873] Instance rebooted successfully.
2018-06-05 18:27:36.060 306607 INFO nova.compute.manager [req-6eddf26b-e4f2-44da-8db4-f639ad465ac1 - - - - -] [instance: bef70156-e73e-4223-9e43-e97cfaa47873] During sync_power_state the instance has a pending task (powering-on). Skip.
2018-06-05 18:27:36.060 306607 INFO nova.compute.manager [req-6eddf26b-e4f2-44da-8db4-f639ad465ac1 - - - - -] [instance: bef70156-e73e-4223-9e43-e97cfaa47873] VM Started (Lifecycle Event)

The next time we issued such an event, it went through immediately.
 
2018-06-05 19:12:16.861 306607 INFO nova.compute.manager [req-a0f088b1-eda7-4faa-b2eb-bbe7cb2f6761 411aa1af9d0248f4843207ee6736c4d7 24a12c2c3c414615ae23fa75fad11683 - - -] [instance: bef70156-e73e-4223-9e43-e97cfaa47873] Rebooting instance
2018-06-05 19:12:58.370 306607 INFO nova.compute.manager [req-6eddf26b-e4f2-44da-8db4-f639ad465ac1 - - - - -] [instance: bef70156-e73e-4223-9e43-e97cfaa47873] VM Resumed (Lifecycle Event)
2018-06-05 19:12:58.580 306607 INFO nova.virt.libvirt.driver [-] [instance: bef70156-e73e-4223-9e43-e97cfaa47873] Instance rebooted successfully.
2018-06-05 19:12:58.714 306607 INFO nova.compute.manager [req-6eddf26b-e4f2-44da-8db4-f639ad465ac1 - - - - -] [instance: bef70156-e73e-4223-9e43-e97cfaa47873] VM Started (Lifecycle Event)
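The size of the delay in the first reboot can be read off the log timestamps. Below is a minimal Python sketch, not part of the original report, that measures the gap between the "VM Stopped" and "VM Resumed" lifecycle events. The two sample lines are copied from the first excerpt above; the helper names `first_timestamp` and `seconds_between` are hypothetical, and the timestamp pattern assumes the log line format shown in this report.

```python
import re
from datetime import datetime

# Two lines copied from the first nova-compute excerpt in this comment.
LOG = """\
2018-06-05 18:22:22.134 306607 INFO nova.compute.manager [-] [instance: bef70156-e73e-4223-9e43-e97cfaa47873] VM Stopped (Lifecycle Event)
2018-06-05 18:27:35.825 306607 INFO nova.compute.manager [req-6eddf26b-e4f2-44da-8db4-f639ad465ac1 - - - - -] [instance: bef70156-e73e-4223-9e43-e97cfaa47873] VM Resumed (Lifecycle Event)
"""

# Assumption: every line starts with "YYYY-MM-DD HH:MM:SS.mmm ".
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) ")


def first_timestamp(log_text, marker):
    """Return the timestamp of the first line containing marker, or None."""
    for line in log_text.splitlines():
        if marker in line:
            match = TIMESTAMP.match(line)
            if match:
                return datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f")
    return None


def seconds_between(log_text, start_marker, end_marker):
    """Seconds elapsed between the first start_marker and first end_marker lines."""
    start = first_timestamp(log_text, start_marker)
    end = first_timestamp(log_text, end_marker)
    if start is None or end is None:
        raise ValueError("marker not found in log text")
    return (end - start).total_seconds()


gap = seconds_between(LOG, "VM Stopped", "VM Resumed")
print(f"VM Stopped -> VM Resumed: {gap:.3f} s")
```

For these two lines the gap is a little over five minutes, which matches the delay described at the top of this comment. The same helper can be pointed at the second excerpt by changing the markers, for example "Rebooting instance" and "VM Resumed".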

Comment 15 Stephen Finucane 2018-06-25 17:49:52 UTC
As requested on our recent call, I've checked which package first included the fix for the cloned bug. It was resolved in OSP 10 with the openstack-nova-14.0.9-2.el7ost package. However, as noted above, I don't think this is the same issue, and the error message itself is harmless. If we could tweak the '[DEFAULT] instance_delete_interval' value to something like 60 seconds and, ideally, get DEBUG-level logging, we should have a better handle on what's going on.
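For reference, the two settings requested above would look roughly like the following nova.conf fragment. This is a sketch under assumptions rather than a verified configuration for this deployment: the file path is the usual default and may differ here, and `debug` is the standard logging switch in the `[DEFAULT]` section. The nova-compute service normally needs a restart to pick up the change.

```ini
# /etc/nova/nova.conf on the affected compute node (path assumed)
[DEFAULT]
# Interval, in seconds, suggested in this comment for troubleshooting.
instance_delete_interval = 60
# Enable DEBUG-level logging for the duration of the investigation.
debug = True
```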

Comment 23 Stephen Finucane 2018-07-06 14:34:08 UTC
The original issue, "cpu affinity is not supported" warning messages in the logs, has been identified as a non-issue, and fixes are available in the package version indicated in comment 15. While there does appear to be an issue with delayed startup for restarted instances, this is tangential to the original issue, and it is not yet clear whether the problem lies with nova or with the EPC application.

Given the above, I'm going to close this. Discussion of the timeout issue can continue in the customer case, and a new bug should be opened if it is identified as a nova issue.