Bug 1584077
| Summary: | Fail to deploy VM on rt-compute + OVS+DPDK | | |
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Yariv <yrachman> |
| Component: | openstack-neutron | Assignee: | Assaf Muller <amuller> |
| Status: | CLOSED NOTABUG | QA Contact: | Toni Freger <tfreger> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 13.0 (Queens) | CC: | amuller, atelang, berrange, chrisw, dasmith, eglynn, jhakimra, kchamart, nyechiel, sbauza, sferdjao, sgordon, skramaja, srevivo, vromanso |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2018-05-31 15:50:14 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |
Description (Yariv, 2018-05-30 08:36:27 UTC)
Created attachment 1445726 [details]
sos-compute
Created attachment 1445727 [details]
nova log
What request did you make? It seems that you tried to boot an instance with SR-IOV ports attached to it, and according to the logs you shared, no PCI devices have been whitelisted:

```
Final resource view: name=computeovsdpdk-0.localdomain phys_ram=130946MB used_ram=4096MB phys_disk=419GB used_disk=0GB total_vcpus=24 used_vcpus=0 pci_stats=[]
```

So you probably have to update nova.conf on the compute nodes to whitelist the devices that you want to use; see `pci_passthrough_whitelist`.

```
$ openstack flavor show m1.large.huge_pages_cpu_pinning_numa_node-0
| disk       | 20                                          |
| name       | m1.large.huge_pages_cpu_pinning_numa_node-0 |
| properties | hw:cpu_emulator_threads='isolate', hw:cpu_policy='dedicated', hw:cpu_realtime='yes', hw:cpu_realtime_mask='^0-1', hw:mem_page_size='1GB' |
| ram        | 2048                                        |
| vcpus      | 4                                           |
```

Still having scheduling issues; see the logs from the scheduler:

```
2018-05-30 15:24:26.871 1 INFO nova.filters [req-27ae8f3e-39ad-4f56-b1e1-1d85821041c5 4a9700758c0d45c3bd7c06254b9b09e5 d9dd13f924fe4794be832be190e7026c - default default] Filtering removed all hosts for the request with instance ID '63be8017-af99-4713-bd78-5aa9baaf4073'. Filter results: ['RamFilter: (start: 1, end: 1)', 'ComputeFilter: (start: 1, end: 0)']
```

Is there a missing flavor spec?

Created attachment 1445960 [details]
sos-compute
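Regarding the `pci_passthrough_whitelist` suggestion above, a minimal sketch of what such a nova.conf entry could look like on the compute node. The vendor ID, product ID, and physical network name below are placeholders for illustration, not values taken from this report:

```ini
# /etc/nova/nova.conf on the compute node -- illustrative sketch only.
# In Queens the option lives in the [pci] section as passthrough_whitelist
# (the legacy [DEFAULT] pci_passthrough_whitelist name is still accepted).
# vendor_id, product_id, and physical_network here are placeholders.
[pci]
passthrough_whitelist = {"vendor_id": "8086", "product_id": "154d", "physical_network": "physnet_sriov"}
```

After changing the whitelist, nova-compute must be restarted for the PCI devices to show up in `pci_stats`.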
Created attachment 1445962 [details]
sos-compute
Created attachment 1445963 [details]
controller-sos-1
Created attachment 1445964 [details]
controller-sos-2
Created attachment 1445968 [details]
controller-sos-3
Seems that the service was disabled:

```
(overcloud) [stack@undercloud-0 ~]$ openstack compute service list
+----+------------------+------------------------------+----------+----------+-------+----------------------------+
| ID | Binary           | Host                         | Zone     | Status   | State | Updated At                 |
+----+------------------+------------------------------+----------+----------+-------+----------------------------+
|  1 | nova-scheduler   | controller-0.localdomain     | internal | enabled  | up    | 2018-05-31T08:01:09.000000 |
|  2 | nova-consoleauth | controller-0.localdomain     | internal | enabled  | up    | 2018-05-31T08:01:03.000000 |
|  3 | nova-conductor   | controller-0.localdomain     | internal | enabled  | up    | 2018-05-31T08:01:04.000000 |
|  7 | nova-compute     | computeovsdpdk-0.localdomain | nova     | disabled | up    | 2018-05-31T08:01:04.000000 |
+----+------------------+------------------------------+----------+----------+-------+----------------------------+
(overcloud) [stack@undercloud-0 ~]$ openstack compute service set --enable nova-compute
(overcloud) [stack@undercloud-0 ~]$ openstack compute service list
+----+------------------+------------------------------+----------+---------+-------+----------------------------+
| ID | Binary           | Host                         | Zone     | Status  | State | Updated At                 |
+----+------------------+------------------------------+----------+---------+-------+----------------------------+
|  1 | nova-scheduler   | controller-0.localdomain     | internal | enabled | up    | 2018-05-31T08:02:09.000000 |
|  2 | nova-consoleauth | controller-0.localdomain     | internal | enabled | up    | 2018-05-31T08:02:03.000000 |
|  3 | nova-conductor   | controller-0.localdomain     | internal | enabled | up    | 2018-05-31T08:02:04.000000 |
|  7 | nova-compute     | computeovsdpdk-0.localdomain | nova     | enabled | up    | 2018-05-31T08:02:06.000000 |
+----+------------------+------------------------------+----------+---------+-------+----------------------------+
```

Created attachment 1446159 [details]
sos-compute-2
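A note on the enable step above: `openstack compute service set` normally takes the host and the service binary as positional arguments, so the fully spelled-out form of the command (host name taken from the service listing in this report) would look like:

```
# Re-enable nova-compute on the DPDK compute node, then verify.
# The host name comes from the `openstack compute service list` output above.
openstack compute service set --enable computeovsdpdk-0.localdomain nova-compute
openstack compute service list
```

This is a sketch against a live overcloud; the transcript above may simply have the host argument truncated.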
OK, that problem is solved.

Question 1:
* How come nova-compute was down? Is it related to my THT (tripleo-heat-templates)?

Question 2:
* Failing to upload the sosreport for the compute node.
I did not find any specific action from the os-services API that could have caused the service to be set to disabled, so I assume that you restarted the service during the health check and the controller disabled the service by itself.

About the new issue, that looks to be related to neutron. I checked neutron-server but did not find much information. Perhaps it would be worth double-checking the network configuration, enabling debug in neutron (server and agent), and trying again.

```
2018-05-31 08:11:44.056 1 ERROR nova.compute.manager [instance: 3df2a813-c216-44d2-a522-fa1ea0de1084] Traceback (most recent call last):
2018-05-31 08:11:44.056 1 ERROR nova.compute.manager [instance: 3df2a813-c216-44d2-a522-fa1ea0de1084]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2031, in _build_and_run_instance
2018-05-31 08:11:44.056 1 ERROR nova.compute.manager [instance: 3df2a813-c216-44d2-a522-fa1ea0de1084]     block_device_info=block_device_info)
2018-05-31 08:11:44.056 1 ERROR nova.compute.manager [instance: 3df2a813-c216-44d2-a522-fa1ea0de1084]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 3092, in spawn
2018-05-31 08:11:44.056 1 ERROR nova.compute.manager [instance: 3df2a813-c216-44d2-a522-fa1ea0de1084]     destroy_disks_on_failure=True)
2018-05-31 08:11:44.056 1 ERROR nova.compute.manager [instance: 3df2a813-c216-44d2-a522-fa1ea0de1084]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 5577, in _create_domain_and_network
2018-05-31 08:11:44.056 1 ERROR nova.compute.manager [instance: 3df2a813-c216-44d2-a522-fa1ea0de1084]     destroy_disks_on_failure)
2018-05-31 08:11:44.056 1 ERROR nova.compute.manager [instance: 3df2a813-c216-44d2-a522-fa1ea0de1084]   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2018-05-31 08:11:44.056 1 ERROR nova.compute.manager [instance: 3df2a813-c216-44d2-a522-fa1ea0de1084]     self.force_reraise()
2018-05-31 08:11:44.056 1 ERROR nova.compute.manager [instance: 3df2a813-c216-44d2-a522-fa1ea0de1084]   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2018-05-31 08:11:44.056 1 ERROR nova.compute.manager [instance: 3df2a813-c216-44d2-a522-fa1ea0de1084]     six.reraise(self.type_, self.value, self.tb)
2018-05-31 08:11:44.056 1 ERROR nova.compute.manager [instance: 3df2a813-c216-44d2-a522-fa1ea0de1084]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 5556, in _create_domain_and_network
2018-05-31 08:11:44.056 1 ERROR nova.compute.manager [instance: 3df2a813-c216-44d2-a522-fa1ea0de1084]     error_callback=self._neutron_failed_callback):
2018-05-31 08:11:44.056 1 ERROR nova.compute.manager [instance: 3df2a813-c216-44d2-a522-fa1ea0de1084]   File "/usr/lib64/python2.7/contextlib.py", line 17, in __enter__
2018-05-31 08:11:44.056 1 ERROR nova.compute.manager [instance: 3df2a813-c216-44d2-a522-fa1ea0de1084]     return self.gen.next()
2018-05-31 08:11:44.056 1 ERROR nova.compute.manager [instance: 3df2a813-c216-44d2-a522-fa1ea0de1084]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 469, in wait_for_instance_event
2018-05-31 08:11:44.056 1 ERROR nova.compute.manager [instance: 3df2a813-c216-44d2-a522-fa1ea0de1084]     error_callback(event_name, instance)
2018-05-31 08:11:44.056 1 ERROR nova.compute.manager [instance: 3df2a813-c216-44d2-a522-fa1ea0de1084]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 5499, in _neutron_failed_callback
2018-05-31 08:11:44.056 1 ERROR nova.compute.manager [instance: 3df2a813-c216-44d2-a522-fa1ea0de1084]     raise exception.VirtualInterfaceCreateException()
2018-05-31 08:11:44.056 1 ERROR nova.compute.manager [instance: 3df2a813-c216-44d2-a522-fa1ea0de1084] VirtualInterfaceCreateException: Virtual Interface creation failed
```

There was a missing part. With the flavor properties

```
hw:cpu_emulator_threads='isolate', hw:cpu_policy='dedicated', hw:cpu_realtime='yes', hw:cpu_realtime_mask='^0-1', hw:mem_page_size='1GB'
```

the VM is Active:

```
3adf7778-9b41-4145-a03a-cfbda3b40d36 | dpdk1 | ACTIVE | private=10.10.150.41 | rhel-guest-image-7.3-36.x86_64.qcow2 | m1.large.huge_pages_cpu_pinning_numa_node-0
```

With OVS+DPDK:

```
ComputeOvsDpdkRTParameters:
  VhostuserSocketGroup: "hugetlbfs"
```

(In reply to Yariv from comment #16) There was a missing part in network_environment.yaml related to OVS+DPDK:

```
ComputeOvsDpdkRTParameters:
  VhostuserSocketGroup: "hugetlbfs"
```

> hw:cpu_emulator_threads='isolate', hw:cpu_policy='dedicated',
> hw:cpu_realtime='yes', hw:cpu_realtime_mask='^0-1', hw:mem_page_size='1GB'
>
> VM is Active
> 3adf7778-9b41-4145-a03a-cfbda3b40d36 | dpdk1 | ACTIVE | private=10.10.150.41
> | rhel-guest-image-7.3-36.x86_64.qcow2 |
> m1.large.huge_pages_cpu_pinning_numa_node-0
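For reference, a minimal sketch of where the missing VhostuserSocketGroup setting lives in a TripleO environment file. Only the role parameter name and its value come from this report; the `parameter_defaults` wrapper is an assumption based on standard tripleo-heat-templates layout:

```
# network_environment.yaml (sketch). ComputeOvsDpdkRTParameters and
# VhostuserSocketGroup are from the report; the parameter_defaults wrapper
# is the usual THT structure, assumed here.
parameter_defaults:
  ComputeOvsDpdkRTParameters:
    VhostuserSocketGroup: "hugetlbfs"
```

Setting the vhost-user socket group to "hugetlbfs" lets the qemu process (a member of that group) open the vhost-user sockets that OVS-DPDK creates, which is what the VirtualInterfaceCreateException above was failing on.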