Bug 1600373
| Summary: | after a minor update it's not possible to boot instances | | |
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Eran Kuris <ekuris> |
| Component: | python-networking-ovn | Assignee: | Assaf Muller <amuller> |
| Status: | CLOSED DUPLICATE | QA Contact: | Eran Kuris <ekuris> |
| Severity: | urgent | Docs Contact: | |
| Priority: | urgent | | |
| Version: | 13.0 (Queens) | CC: | apevec, jfrancoa, lhh, majopela, nyechiel |
| Target Milestone: | z1 | Keywords: | ZStream |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2018-07-12 13:08:58 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
**Description** (Eran Kuris, 2018-07-12 06:19:06 UTC)
It looks like the nova-compute service on compute-0 is down:

```
(overcloud) [stack@undercloud-0 ~]$ openstack compute service list
+----+------------------+--------------------------+----------+---------+-------+----------------------------+
| ID | Binary | Host | Zone | Status | State | Updated At |
+----+------------------+--------------------------+----------+---------+-------+----------------------------+
| 2 | nova-scheduler | controller-1.localdomain | internal | enabled | up | 2018-07-12T10:20:46.000000 |
| 5 | nova-consoleauth | controller-1.localdomain | internal | enabled | up | 2018-07-12T10:20:41.000000 |
| 8 | nova-scheduler | controller-2.localdomain | internal | enabled | up | 2018-07-12T10:20:46.000000 |
| 11 | nova-conductor | controller-1.localdomain | internal | enabled | up | 2018-07-12T10:20:40.000000 |
| 14 | nova-consoleauth | controller-2.localdomain | internal | enabled | up | 2018-07-12T10:20:47.000000 |
| 17 | nova-compute | compute-1.localdomain | nova | enabled | up | 2018-07-12T10:20:44.000000 |
| 20 | nova-conductor | controller-2.localdomain | internal | enabled | up | 2018-07-12T10:20:43.000000 |
| 26 | nova-compute | compute-0.localdomain | nova | enabled | down | 2018-07-12T08:20:12.000000 |
| 41 | nova-scheduler | controller-0.localdomain | internal | enabled | up | 2018-07-12T10:20:41.000000 |
| 50 | nova-consoleauth | controller-0.localdomain | internal | enabled | up | 2018-07-12T10:20:41.000000 |
| 65 | nova-conductor | controller-0.localdomain | internal | enabled | up | 2018-07-12T10:20:44.000000 |
+----+------------------+--------------------------+----------+---------+-------+----------------------------+
```

I went to check the logs of the nova_compute container on compute-0 and found the following traceback:

```
Running command: '/usr/bin/nova-compute '
/usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/enginefacade.py:332: NotSupportedWarning: Configuration option(s) ['use_tpool'] not supported
  exception.NotSupportedWarning
+ sudo -E kolla_set_configs
INFO:__main__:Loading config file at /var/lib/kolla/config_files/config.json
INFO:__main__:Validating config file
INFO:__main__:Kolla config strategy set to: COPY_ALWAYS
INFO:__main__:Copying service configuration files
INFO:__main__:Deleting /etc/libvirt/libvirtd.conf
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/libvirt/libvirtd.conf to /etc/libvirt/libvirtd.conf
INFO:__main__:Deleting /etc/libvirt/passwd.db
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/libvirt/passwd.db to /etc/libvirt/passwd.db
INFO:__main__:Deleting /etc/libvirt/qemu.conf
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/libvirt/qemu.conf to /etc/libvirt/qemu.conf
INFO:__main__:Deleting /etc/my.cnf.d/tripleo.cnf
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/my.cnf.d/tripleo.cnf to /etc/my.cnf.d/tripleo.cnf
INFO:__main__:Deleting /etc/nova/migration/authorized_keys
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/nova/migration/authorized_keys to /etc/nova/migration/authorized_keys
INFO:__main__:Deleting /etc/nova/migration/identity
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/nova/migration/identity to /etc/nova/migration/identity
INFO:__main__:Deleting /etc/nova/nova.conf
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/nova/nova.conf to /etc/nova/nova.conf
INFO:__main__:Deleting /etc/sasl2/libvirt.conf
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/sasl2/libvirt.conf to /etc/sasl2/libvirt.conf
INFO:__main__:Deleting /etc/ssh/sshd_config
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/ssh/sshd_config to /etc/ssh/sshd_config
INFO:__main__:Deleting /var/lib/nova/.ssh/config
ERROR:__main__:Unexpected error:
Traceback (most recent call last):
  File "/usr/local/bin/kolla_set_configs", line 411, in main
    execute_config_strategy(config)
  File "/usr/local/bin/kolla_set_configs", line 377, in execute_config_strategy
    copy_config(config)
  File "/usr/local/bin/kolla_set_configs", line 306, in copy_config
    config_file.copy()
  File "/usr/local/bin/kolla_set_configs", line 150, in copy
    self._merge_directories(source, dest)
  File "/usr/local/bin/kolla_set_configs", line 97, in _merge_directories
    os.path.join(dest, to_copy))
  File "/usr/local/bin/kolla_set_configs", line 97, in _merge_directories
    os.path.join(dest, to_copy))
  File "/usr/local/bin/kolla_set_configs", line 97, in _merge_directories
    os.path.join(dest, to_copy))
  File "/usr/local/bin/kolla_set_configs", line 97, in _merge_directories
    os.path.join(dest, to_copy))
  File "/usr/local/bin/kolla_set_configs", line 99, in _merge_directories
    self._copy_file(source, dest)
  File "/usr/local/bin/kolla_set_configs", line 75, in _copy_file
    self._delete_path(dest)
  File "/usr/local/bin/kolla_set_configs", line 108, in _delete_path
    os.remove(path)
OSError: [Errno 2] No such file or directory: '/var/lib/nova/.ssh/config'
+ sudo -E kolla_set_configs
```

However, the tempest test cases managed to create and delete some other instances, from what I can see in the nova-compute logs:

```
2018-07-11 23:47:50.839 1 INFO nova.compute.manager [req-b0b534ed-48f8-413e-a3c2-2d9bafd579fa 84d1c3f3bb2840cebd562c9518a3a31d a88b57c748ab477a90052a31ea1bc48e - default default] [instance: 3bc365c4-c8fd-4cdc-be69-2431b1944f1e] VM Started (Lifecycle Event)
2018-07-11 23:48:05.947 1 INFO nova.compute.manager [req-88904359-c26a-479c-bd86-7cd5699a5603 84d1c3f3bb2840cebd562c9518a3a31d a88b57c748ab477a90052a31ea1bc48e - default default] [instance: 3bc365c4-c8fd-4cdc-be69-2431b1944f1e] VM Resumed (Lifecycle Event)
2018-07-11 23:49:07.760 1 INFO nova.scheduler.client.report [req-70ec49ce-1342-4527-b13b-9fe0f82e8c10 84d1c3f3bb2840cebd562c9518a3a31d a88b57c748ab477a90052a31ea1bc48e - default default] Deleted allocation for instance 3bc365c4-c8fd-4cdc-be69-2431b1944f1e
```

For that reason, I wonder whether this is a side effect of the running tempest tests. In any case, I will continue debugging the issue.
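The OSError at the bottom of that traceback is the actual failure: kolla_set_configs logs "Deleting /var/lib/nova/.ssh/config" and then calls os.remove() on a path that is already gone. A minimal sketch of the failure mode and an ENOENT-tolerant variant (the helper name `delete_path` is mine, not the one in kolla):

```python
import errno
import os
import tempfile

def delete_path(path):
    """Remove a file, tolerating a missing target.

    kolla_set_configs' _delete_path() calls os.remove() directly, so a file
    that vanished between the directory listing and the removal aborts
    container startup with OSError(ENOENT), as in the traceback above.
    This hypothetical hardened variant swallows only that specific errno.
    """
    try:
        os.remove(path)
    except OSError as exc:
        if exc.errno != errno.ENOENT:  # re-raise anything except "no such file"
            raise

# Demonstration: removing the same file twice no longer raises.
tmp = tempfile.NamedTemporaryFile(delete=False)
tmp.close()
delete_path(tmp.name)  # removes the file
delete_path(tmp.name)  # second call is now a no-op instead of an OSError
print(os.path.exists(tmp.name))  # False
```

Whether this is the right fix (as opposed to not listing the file for deletion in the first place) would be a question for the kolla change that resolves the duplicate bug.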
According to the Neutron CI update jobs, the cloud is not functional after a minor update even in ML2/OVS environments, so the issue is not related to OVN.

ML2/OVS job: https://rhos-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/DFG/view/network/view/neutron/job/DFG-network-neutron-update-13_director-rhel-virthost-3cont_2comp_2net-ipv4-vlan-composable/

ML2/OVN job: https://rhos-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/DFG/view/network/view/networking-ovn/job/DFG-network-networking-ovn-update-13_director-rhel-virthost-3cont_2comp_2net-ipv4-geneve-composable/24/testReport/

There are a lot of DuplicateMessageError messages on the controllers in /var/log/containers/nova/nova-scheduler.log. That is one of the main symptoms of bug 1592528, so I wonder if we are hitting that one. After restarting the rabbitmq-bundle the issue disappeared, so this looks like a duplicate of bug 1592528.

*** This bug has been marked as a duplicate of bug 1592528 ***
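For reference, the symptom check described above (many DuplicateMessageError hits in the scheduler log) is a plain grep count. The sketch below runs against a throwaway sample file because the real log lives on the controllers; the sample log lines are illustrative placeholders, not copied from the affected environment:

```shell
# On a controller the check would be:
#   grep -c 'DuplicateMessageError' /var/log/containers/nova/nova-scheduler.log
# Here a temporary sample log stands in for the real file.
log=$(mktemp)
printf '%s\n' \
  '2018-07-12 10:20:00.000 1 ERROR oslo.messaging DuplicateMessageError: Found duplicate message' \
  '2018-07-12 10:20:01.000 1 INFO nova.scheduler.host_manager Filter passed' \
  '2018-07-12 10:20:02.000 1 ERROR oslo.messaging DuplicateMessageError: Found duplicate message' \
  > "$log"
count=$(grep -c 'DuplicateMessageError' "$log")
echo "duplicate messages: $count"
rm -f "$log"
```

A count well above zero on all three controllers would point at the messaging layer rather than at networking-ovn, matching the rabbitmq-bundle restart workaround noted above.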