Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1584077

Summary: Fail to deploy vm on rt-compute + ovs+dpdk
Product: Red Hat OpenStack
Component: openstack-neutron
Version: 13.0 (Queens)
Hardware: x86_64
OS: Linux
Status: CLOSED NOTABUG
Severity: high
Priority: unspecified
Target Milestone: rc
Target Release: ---
Reporter: Yariv <yrachman>
Assignee: Assaf Muller <amuller>
QA Contact: Toni Freger <tfreger>
Docs Contact:
CC: amuller, atelang, berrange, chrisw, dasmith, eglynn, jhakimra, kchamart, nyechiel, sbauza, sferdjao, sgordon, skramaja, srevivo, vromanso
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-05-31 15:50:14 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions: ---
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments (flags: none):
  RT-KVM Custom Role files
  sos-compute
  nova log
  sos-compute
  sos-compute
  controller-sos-1
  controller-sos-2
  controller-sos-3
  sos-compute-2

Description Yariv 2018-05-30 08:36:27 UTC
Created attachment 1445725 [details]
RT-KVM Custom Role files

Description of problem:

Cannot launch a VM on a TripleO custom role for RHEL RT+KVM + OVS-DPDK.


Version-Release number of selected component (if applicable):


How reproducible:

Permanent

Steps to Reproduce:

1) TripleO Custom Role for RHEL RT+KVM + OVS DPDK
see the attached THT (tripleo-heat-templates) files

2) Compute OverCloud image prepared based on 
the following googledoc
https://docs.google.com/document/d/1x3E-Dpn6IGIsARc11mTvmrKI-j0QII9sp4zMKS9eZog/edit#heading=h.guam5tds2kz9

3) Prepare a flavor with the following extra specs (shown under "Actual results" below):


Actual results:
+----------------------------+------------------------------------------------------------------------------------------------------------------------------------------+
| Field                      | Value                                                                                                                                    |
+----------------------------+------------------------------------------------------------------------------------------------------------------------------------------+
| OS-FLV-DISABLED:disabled   | False                                                                                                                                    |
| OS-FLV-EXT-DATA:ephemeral  | 0                                                                                                                                        |
| access_project_ids         | None                                                                                                                                     |
| disk                       | 20                                                                                                                                       |
| id                         | b005d00e-1483-41cc-8d2e-8172cefcc26b                                                                                                     |
| name                       | m1.large.huge_pages_cpu_pinning_numa_node-0                                                                                              |
| os-flavor-access:is_public | True                                                                                                                                     |
| properties                 | hw:cpu_emulator_threads='isolate', hw:cpu_policy='dedicated', hw:cpu_realtime='yes', hw:cpu_realtime_mask='^0-1', hw:mem_page_size='1GB' |
| ram                        | 2048                                                                                                                                     |
| rxtx_factor                | 1.0                                                                                                                                      |
| swap                       |                                                                                                                                          |
| vcpus                      | 4                                                                                                                                        |
+----------------------------+------------------------------------------------------------------------------------------------------------------------------------------+
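The hw:cpu_realtime_mask='^0-1' property above uses Nova's exclusion syntax: the leading '^' carves vCPUs 0-1 out of the realtime set, so on this 4-vCPU flavor vCPUs 2-3 get realtime scheduling. A rough sketch of how such a mask expands (an illustration only, not Nova's actual parser):

```python
def parse_realtime_mask(mask: str, vcpus: int) -> set:
    """Expand a Nova-style hw:cpu_realtime_mask into the set of
    realtime vCPU ids. Starts from all vCPUs being realtime; a
    '^'-prefixed range excludes vCPUs from the realtime set."""
    rt = set(range(vcpus))
    for part in mask.split(','):
        part = part.strip()
        exclude = part.startswith('^')
        if exclude:
            part = part[1:]
        if '-' in part:
            lo, hi = part.split('-')
            ids = set(range(int(lo), int(hi) + 1))
        else:
            ids = {int(part)}
        rt = rt - ids if exclude else rt | ids
    return rt

# For the flavor in this bug: mask '^0-1' on 4 vCPUs
print(parse_realtime_mask('^0-1', 4))  # vCPUs 2 and 3 are realtime
```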


Expected results:
The VM deploys successfully.

Additional info:
The nova log returns the following ERROR:
Requested instance NUMA topology together with requested PCI devices cannot fit the given host NUMA topology; Claim pci failed. _do_build_and_run_instance /usr/lib/python2.7/site-packages/nova/compute/manager.py:1861

Comment 1 Yariv 2018-05-30 08:41:58 UTC
Created attachment 1445726 [details]
sos-compute

Comment 2 Yariv 2018-05-30 08:42:53 UTC
Created attachment 1445727 [details]
nova log

Comment 4 Sahid Ferdjaoui 2018-05-30 09:04:38 UTC
What was the request you made? It seems that you tried to boot an instance with SR-IOV ports attached to it, and according to the logs you shared no PCI devices have been whitelisted:

Final resource view: name=computeovsdpdk-0.localdomain phys_ram=130946MB used_ram=4096MB phys_disk=419GB used_disk=0GB total_vcpus=24 used_vcpus=0 pci_stats=[]

So you probably have to update the nova.conf of the compute nodes to whitelist the devices you want to use.

  see: pci_passthrough_whitelist
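
For reference, a whitelist entry in nova.conf looks roughly like this (the vendor/product ids and physical_network below are placeholders, not values taken from this environment):

```ini
[pci]
# Placeholder ids -- substitute the actual NIC vendor_id/product_id
# and the physical_network mapped in neutron on the compute node.
passthrough_whitelist = {"vendor_id": "8086", "product_id": "154d", "physical_network": "physnet0"}
```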

Comment 5 Yariv 2018-05-30 15:25:34 UTC
openstack flavor show m1.large.huge_pages_cpu_pinning_numa_node-0
+----------------------------+------------------------------------------------------------------------------------------------------------------------------------------+
| disk                       | 20                                                                                                                                       |
| name                       | m1.large.huge_pages_cpu_pinning_numa_node-0                                                                                              |
| properties                 | hw:cpu_emulator_threads='isolate', hw:cpu_policy='dedicated', hw:cpu_realtime='yes', hw:cpu_realtime_mask='^0-1', hw:mem_page_size='1GB' |
| ram                        | 2048                                                                                                                                     |
| vcpus                      | 4                                                                                                                                        |
+----------------------------+------------------------------------------------------------------------------------------------------------------------------------------+

Still have scheduling issues; see the log from the scheduler:
2018-05-30 15:24:26.871 1 INFO nova.filters [req-27ae8f3e-39ad-4f56-b1e1-1d85821041c5 4a9700758c0d45c3bd7c06254b9b09e5 d9dd13f924fe4794be832be190e7026c - default default] Filtering removed all hosts for the request with instance ID '63be8017-af99-4713-bd78-5aa9baaf4073'. Filter results: ['RamFilter: (start: 1, end: 1)', 'ComputeFilter: (start: 1, end: 0)']

Is there a missing flavor spec?

Comment 6 Yariv 2018-05-30 17:16:19 UTC
Created attachment 1445960 [details]
sos-compute

Comment 7 Yariv 2018-05-30 17:28:39 UTC
Created attachment 1445962 [details]
sos-compute

Comment 8 Yariv 2018-05-30 17:31:01 UTC
Created attachment 1445963 [details]
controller-sos-1

Comment 9 Yariv 2018-05-30 17:32:02 UTC
Created attachment 1445964 [details]
controller-sos-2

Comment 10 Yariv 2018-05-30 17:39:36 UTC
Created attachment 1445968 [details]
controller-sos-3

Comment 12 Sahid Ferdjaoui 2018-05-31 08:05:50 UTC
It seems that the service was disabled:

(overcloud) [stack@undercloud-0 ~]$ openstack compute service list
+----+------------------+------------------------------+----------+----------+-------+----------------------------+
| ID | Binary           | Host                         | Zone     | Status   | State | Updated At                 |
+----+------------------+------------------------------+----------+----------+-------+----------------------------+
|  1 | nova-scheduler   | controller-0.localdomain     | internal | enabled  | up    | 2018-05-31T08:01:09.000000 |
|  2 | nova-consoleauth | controller-0.localdomain     | internal | enabled  | up    | 2018-05-31T08:01:03.000000 |
|  3 | nova-conductor   | controller-0.localdomain     | internal | enabled  | up    | 2018-05-31T08:01:04.000000 |
|  7 | nova-compute     | computeovsdpdk-0.localdomain | nova     | disabled | up    | 2018-05-31T08:01:04.000000 |
+----+------------------+------------------------------+----------+----------+-------+----------------------------+
(overcloud) [stack@undercloud-0 ~]$ openstack compute service set --enable nova-compute
(overcloud) [stack@undercloud-0 ~]$ openstack compute service list
+----+------------------+------------------------------+----------+---------+-------+----------------------------+
| ID | Binary           | Host                         | Zone     | Status  | State | Updated At                 |
+----+------------------+------------------------------+----------+---------+-------+----------------------------+
|  1 | nova-scheduler   | controller-0.localdomain     | internal | enabled | up    | 2018-05-31T08:02:09.000000 |
|  2 | nova-consoleauth | controller-0.localdomain     | internal | enabled | up    | 2018-05-31T08:02:03.000000 |
|  3 | nova-conductor   | controller-0.localdomain     | internal | enabled | up    | 2018-05-31T08:02:04.000000 |
|  7 | nova-compute     | computeovsdpdk-0.localdomain | nova     | enabled | up    | 2018-05-31T08:02:06.000000 |
+----+------------------+------------------------------+----------+---------+-------+----------------------------+

Comment 13 Yariv 2018-05-31 08:24:58 UTC
Created attachment 1446159 [details]
sos-compute-2

OK, that problem is solved.

Question 1:
* How come nova-compute was down? Is it related to my THT?

Question 2:
* Failing to upload the sosreport for the compute node.

Comment 14 Sahid Ferdjaoui 2018-05-31 08:56:18 UTC
I did not find any specific action from the os-services API that could have caused the service to be disabled.

So I assume that you restarted the service during the health check, and the controller disabled the service by itself.

Comment 15 Sahid Ferdjaoui 2018-05-31 09:09:00 UTC
The new issue looks to be related to neutron. I checked the neutron server but did not find much information. It would be worth double-checking the network configuration, enabling debug in neutron (server and agent), and trying again.

2018-05-31 08:11:44.056 1 ERROR nova.compute.manager [instance: 3df2a813-c216-44d2-a522-fa1ea0de1084] Traceback (most recent call last):
2018-05-31 08:11:44.056 1 ERROR nova.compute.manager [instance: 3df2a813-c216-44d2-a522-fa1ea0de1084]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2031, in _build_and_run_instance
2018-05-31 08:11:44.056 1 ERROR nova.compute.manager [instance: 3df2a813-c216-44d2-a522-fa1ea0de1084]     block_device_info=block_device_info)
2018-05-31 08:11:44.056 1 ERROR nova.compute.manager [instance: 3df2a813-c216-44d2-a522-fa1ea0de1084]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 3092, in spawn
2018-05-31 08:11:44.056 1 ERROR nova.compute.manager [instance: 3df2a813-c216-44d2-a522-fa1ea0de1084]     destroy_disks_on_failure=True)
2018-05-31 08:11:44.056 1 ERROR nova.compute.manager [instance: 3df2a813-c216-44d2-a522-fa1ea0de1084]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 5577, in _create_domain_and_network
2018-05-31 08:11:44.056 1 ERROR nova.compute.manager [instance: 3df2a813-c216-44d2-a522-fa1ea0de1084]     destroy_disks_on_failure)
2018-05-31 08:11:44.056 1 ERROR nova.compute.manager [instance: 3df2a813-c216-44d2-a522-fa1ea0de1084]   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2018-05-31 08:11:44.056 1 ERROR nova.compute.manager [instance: 3df2a813-c216-44d2-a522-fa1ea0de1084]     self.force_reraise()
2018-05-31 08:11:44.056 1 ERROR nova.compute.manager [instance: 3df2a813-c216-44d2-a522-fa1ea0de1084]   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2018-05-31 08:11:44.056 1 ERROR nova.compute.manager [instance: 3df2a813-c216-44d2-a522-fa1ea0de1084]     six.reraise(self.type_, self.value, self.tb)
2018-05-31 08:11:44.056 1 ERROR nova.compute.manager [instance: 3df2a813-c216-44d2-a522-fa1ea0de1084]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 5556, in _create_domain_and_network
2018-05-31 08:11:44.056 1 ERROR nova.compute.manager [instance: 3df2a813-c216-44d2-a522-fa1ea0de1084]     error_callback=self._neutron_failed_callback):
2018-05-31 08:11:44.056 1 ERROR nova.compute.manager [instance: 3df2a813-c216-44d2-a522-fa1ea0de1084]   File "/usr/lib64/python2.7/contextlib.py", line 17, in __enter__
2018-05-31 08:11:44.056 1 ERROR nova.compute.manager [instance: 3df2a813-c216-44d2-a522-fa1ea0de1084]     return self.gen.next()
2018-05-31 08:11:44.056 1 ERROR nova.compute.manager [instance: 3df2a813-c216-44d2-a522-fa1ea0de1084]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 469, in wait_for_instance_event
2018-05-31 08:11:44.056 1 ERROR nova.compute.manager [instance: 3df2a813-c216-44d2-a522-fa1ea0de1084]     error_callback(event_name, instance)
2018-05-31 08:11:44.056 1 ERROR nova.compute.manager [instance: 3df2a813-c216-44d2-a522-fa1ea0de1084]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 5499, in _neutron_failed_callback
2018-05-31 08:11:44.056 1 ERROR nova.compute.manager [instance: 3df2a813-c216-44d2-a522-fa1ea0de1084]     raise exception.VirtualInterfaceCreateException()
2018-05-31 08:11:44.056 1 ERROR nova.compute.manager [instance: 3df2a813-c216-44d2-a522-fa1ea0de1084] VirtualInterfaceCreateException: Virtual Interface creation failed
2018-05-31 08:11:44.056 1 ERROR nova.compute.manager [instance: 3df2a813-c216-44d2-a522-fa1ea0de1084]

Comment 16 Yariv 2018-05-31 15:50:14 UTC
There was a missing part in network_environment.yaml related to OVS+DPDK.

The flavor extra specs used:
hw:cpu_emulator_threads='isolate', hw:cpu_policy='dedicated', hw:cpu_realtime='yes', hw:cpu_realtime_mask='^0-1', hw:mem_page_size='1GB'

VM is Active
3adf7778-9b41-4145-a03a-cfbda3b40d36 | dpdk1 | ACTIVE | private=10.10.150.41 | rhel-guest-image-7.3-36.x86_64.qcow2 | m1.large.huge_pages_cpu_pinning_numa_node-0 

With OVS+DPDK:

ComputeOvsDpdkRTParameters:
  VhostuserSocketGroup: "hugetlbfs"
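
For context, this parameter sits under parameter_defaults in the THT environment file; a minimal sketch (the parameter_defaults wrapper is the standard TripleO layout, assumed here rather than taken from the attached files):

```yaml
parameter_defaults:
  ComputeOvsDpdkRTParameters:
    # Group owning the vhost-user sockets, so OVS-DPDK and qemu can share them
    VhostuserSocketGroup: "hugetlbfs"
```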

Comment 17 Yariv 2018-05-31 16:00:42 UTC
(In reply to Yariv from comment #16)
 There was a missing in part in network_environment.yaml
 related to OVS+DPDK
 ComputeOvsDpdkRTParameters:
 VhostuserSocketGroup: "hugetlbfs" 

> 
> 
> hw:cpu_emulator_threads='isolate', hw:cpu_policy='dedicated',
> hw:cpu_realtime='yes', hw:cpu_realtime_mask='^0-1', hw:mem_page_size='1GB'
> 
> VM is Active
> 3adf7778-9b41-4145-a03a-cfbda3b40d36 | dpdk1 | ACTIVE | private=10.10.150.41
> | rhel-guest-image-7.3-36.x86_64.qcow2 |
> m1.large.huge_pages_cpu_pinning_numa_node-0 
> 
>