openstack-nova: After rebooting a compute node, the nova-compute service is down and the virtlogd service is down.

Environment:
openstack-nova-common-16.0.0-0.20170707160319.dcea3ff.el7ost.noarch
openstack-nova-migration-16.0.0-0.20170707160319.dcea3ff.el7ost.noarch
openstack-nova-scheduler-16.0.0-0.20170707160319.dcea3ff.el7ost.noarch
python-novaclient-9.0.1-0.20170621200422.ddb386b.el7ost.noarch
openstack-nova-novncproxy-16.0.0-0.20170707160319.dcea3ff.el7ost.noarch
openstack-nova-api-16.0.0-0.20170707160319.dcea3ff.el7ost.noarch
openstack-nova-compute-16.0.0-0.20170707160319.dcea3ff.el7ost.noarch
openstack-nova-conductor-16.0.0-0.20170707160319.dcea3ff.el7ost.noarch
openstack-nova-placement-api-16.0.0-0.20170707160319.dcea3ff.el7ost.noarch
openstack-nova-console-16.0.0-0.20170707160319.dcea3ff.el7ost.noarch
puppet-nova-11.2.0-0.20170708032209.c47cf92.el7ost.noarch
python-nova-16.0.0-0.20170707160319.dcea3ff.el7ost.noarch
openstack-tripleo-heat-templates-7.0.0-0.20170710191337.el7ost.noarch
instack-undercloud-7.1.1-0.20170710151630.el7ost.noarch
openstack-puppet-modules-10.0.0-0.20170315222135.0333c73.el7.1.noarch
libvirt-daemon-driver-storage-rbd-3.2.0-14.el7.x86_64
libvirt-daemon-driver-network-3.2.0-14.el7.x86_64
libvirt-daemon-config-nwfilter-3.2.0-14.el7.x86_64
libvirt-daemon-driver-qemu-3.2.0-14.el7.x86_64
libvirt-daemon-driver-storage-gluster-3.2.0-14.el7.x86_64
libvirt-daemon-driver-nwfilter-3.2.0-14.el7.x86_64
libvirt-daemon-driver-lxc-3.2.0-14.el7.x86_64
libvirt-libs-3.2.0-14.el7.x86_64
libvirt-daemon-driver-interface-3.2.0-14.el7.x86_64
libvirt-daemon-config-network-3.2.0-14.el7.x86_64
libvirt-client-3.2.0-14.el7.x86_64
libvirt-daemon-driver-storage-core-3.2.0-14.el7.x86_64
libvirt-daemon-driver-storage-logical-3.2.0-14.el7.x86_64
libvirt-daemon-driver-storage-iscsi-3.2.0-14.el7.x86_64
libvirt-daemon-kvm-3.2.0-14.el7.x86_64
libvirt-daemon-driver-nodedev-3.2.0-14.el7.x86_64
libvirt-daemon-driver-storage-disk-3.2.0-14.el7.x86_64
libvirt-daemon-driver-storage-3.2.0-14.el7.x86_64
libvirt-3.2.0-14.el7.x86_64
libvirt-daemon-driver-storage-scsi-3.2.0-14.el7.x86_64
libvirt-python-3.2.0-3.el7.x86_64
libvirt-daemon-driver-storage-mpath-3.2.0-14.el7.x86_64
libvirt-daemon-3.2.0-14.el7.x86_64
libvirt-daemon-driver-secret-3.2.0-14.el7.x86_64

Steps to reproduce:
1) On a deployed OSP12 setup, reboot a compute node (or several compute nodes).
2) Check the status of the virtlogd service after the reboot.

Result:
[root@overcloud-compute-1 ~]# systemctl status virtlogd
● virtlogd.service - Virtual machine log manager
   Loaded: loaded (/usr/lib/systemd/system/virtlogd.service; indirect; vendor preset: disabled)
   Active: inactive (dead)
     Docs: man:virtlogd(8)
           http://libvirt.org

3) Check the output of nova service-list (note overcloud-compute-0.redhat.local; I had manually enabled the service on overcloud-compute-1.redhat.local):

(overcloud) [stack@undercloud-0 ~]$ nova service-list
| Id | Binary           | Host                                | Zone     | Status   | State | Updated_at                 | Disabled Reason |
| 2  | nova-conductor   | overcloud-controller-0.redhat.local | internal | enabled  | up    | 2017-07-20T03:06:09.000000 | - |
| 5  | nova-compute     | overcloud-compute-1.redhat.local    | nova     | enabled  | down  | 2017-07-20T02:52:31.000000 | - |
| 8  | nova-conductor   | overcloud-controller-2.redhat.local | internal | enabled  | up    | 2017-07-20T03:06:03.000000 | - |
| 11 | nova-consoleauth | overcloud-controller-0.redhat.local | internal | enabled  | up    | 2017-07-20T03:06:02.000000 | - |
| 14 | nova-compute     | overcloud-compute-0.redhat.local    | nova     | disabled | down  | 2017-07-20T03:03:27.000000 | AUTO: Failed to connect to libvirt: Failed to connect socket to '/var/run/libvirt/libvirt-sock': No such file or directory |
| 17 | nova-consoleauth | overcloud-controller-2.redhat.local | internal | enabled  | up    | 2017-07-20T03:06:00.000000 | - |
| 20 | nova-conductor   | overcloud-controller-1.redhat.local | internal | enabled  | up    | 2017-07-20T03:06:06.000000 | - |
| 23 | nova-scheduler   | overcloud-controller-0.redhat.local | internal | enabled  | up    | 2017-07-20T03:06:04.000000 | - |
| 26 | nova-consoleauth | overcloud-controller-1.redhat.local | internal | enabled  | up    | 2017-07-20T03:06:03.000000 | - |
| 29 | nova-scheduler   | overcloud-controller-2.redhat.local | internal | enabled  | up    | 2017-07-20T03:06:04.000000 | - |
| 47 | nova-scheduler   | overcloud-controller-1.redhat.local | internal | enabled  | up    | 2017-07-20T03:06:08.000000 | - |

(overcloud) [stack@undercloud-0 ~]$ openstack compute service set overcloud-compute-1.redhat.local nova-compute --enable --up
Failed to set service state to up
Compute service nova-compute of host overcloud-compute-1.redhat.local failed to set.

Expected result:
The services should be up.

Note: the services were up prior to the reboot, and I was able to launch instances.
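A quick way to spot which services are down after a reboot is to filter the State column of the service-list output. This is only an illustrative sketch: the sample rows and the /tmp/service-list.txt path are assumptions, not taken from a live cloud; on a real undercloud you would capture `nova service-list` output into the file first.

```shell
# Sketch: list (binary, host) pairs whose State column reads "down".
# The two sample rows below are invented for illustration.
cat > /tmp/service-list.txt <<'EOF'
| 2 | nova-conductor | overcloud-controller-0.redhat.local | internal | enabled | up | 2017-07-20T03:06:09.000000 | - |
| 5 | nova-compute | overcloud-compute-1.redhat.local | nova | enabled | down | 2017-07-20T02:52:31.000000 | - |
EOF

# Field 7 (pipe-separated) is the State column; strip padding spaces
# from the Binary and Host fields before printing them.
awk -F'|' '$7 ~ /down/ {gsub(/ /,"",$3); gsub(/ /,"",$4); print $3, $4}' /tmp/service-list.txt
```

Only the nova-compute row on overcloud-compute-1 is printed, since its State is "down".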
First issue is the network. /var/log/containers/nova/nova-compute.log:

2017-07-20 09:53:00.117 1 ERROR oslo.messaging._drivers.impl_rabbit [req-5c96f16d-fc18-464f-a1c0-0a0c3c8d472f - - - - -] [d9ac8582-017c-4b76-b407-5d2f85417c71] AMQP server on overcloud-controller-1.internalapi.localdomain:5672 is unreachable: [Errno 113] EHOSTUNREACH. Trying again in 32 seconds. Client port: None: error: [Errno 113] EHOSTUNREACH

[root@overcloud-compute-0 heat-admin]# ping overcloud-controller-1.internalapi.localdomain
PING overcloud-controller-1.internalapi.localdomain (172.17.1.16) 56(84) bytes of data.
From overcloud-compute-0.localdomain (172.17.1.22) icmp_seq=1 Destination Host Unreachable

I've rebooted both computes via virsh and the network looks OK now. However, the nova_libvirt container did not start on reboot:

[root@overcloud-compute-0 heat-admin]# docker ps -a
CONTAINER ID  IMAGE                                                                              COMMAND        CREATED       STATUS                        NAMES
dec5c68d2f01  192.168.24.1:8787/rhosp12/openstack-nova-libvirt-docker:2017-07-13.2               "kolla_start"  23 hours ago  Exited (0) 17 minutes ago     nova_libvirt
f65b6b6cab99  192.168.24.1:8787/rhosp12/openstack-neutron-openvswitch-agent-docker:2017-07-13.2  "kolla_start"  24 hours ago  Up 16 minutes                 neutron_ovs_agent
c2ab7d9c1ecd  192.168.24.1:8787/rhosp12/openstack-ceilometer-compute-docker:2017-07-13.2         "kolla_start"  24 hours ago  Up 16 minutes                 ceilometer_agent_compute
77a74284e06d  192.168.24.1:8787/rhosp12/openstack-nova-compute-docker:2017-07-13.2               "kolla_start"  24 hours ago  Restarting (0) 2 minutes ago  nova_compute

INFO:__main__:Loading config file at /var/lib/kolla/config_files/config.json
INFO:__main__:Validating config file
INFO:__main__:Kolla config strategy set to: COPY_ALWAYS
INFO:__main__:Copying service configuration files
INFO:__main__:Deleting /etc/iscsi/.initiator_reset
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/iscsi/.initiator_reset to /etc/iscsi/.initiator_reset
INFO:__main__:Deleting /etc/libvirt/qemu.conf
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/libvirt/qemu.conf to /etc/libvirt/qemu.conf
INFO:__main__:Deleting /etc/nova/nova.conf
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/nova/nova.conf to /etc/nova/nova.conf
INFO:__main__:Writing out command to execute
INFO:__main__:Setting permission for /var/log/nova
INFO:__main__:Setting permission for /var/log/nova/nova-compute.log
INFO:__main__:Setting permission for /var/log/nova/privsep-helper.log
Running command: '/usr/sbin/libvirtd --config /etc/libvirt/libvirtd.conf'
2017-07-20 03:05:56.879+0000: 3459: info : libvirt version: 3.5.0, package: 1.el7 (Unknown, 2017-06-20-18:23:45, rhel7-next)
2017-07-20 03:05:56.879+0000: 3459: info : hostname: overcloud-compute-0.redhat.local
2017-07-20 03:05:56.879+0000: 3459: error : virDriverLoadModuleFile:59 : failed to load module /usr/lib64/libvirt/storage-backend/libvirt_storage_backend_rbd.so /usr/lib64/libvirt/storage-backend/libvirt_storage_backend_rbd.so: undefined symbol: rbd_diff_iterate2

And on the other compute:

[root@overcloud-compute-1 heat-admin]# docker ps -a
CONTAINER ID  IMAGE                                                                              COMMAND        CREATED       STATUS                       NAMES
d808239aad63  192.168.24.1:8787/rhosp12/openstack-nova-libvirt-docker:2017-07-13.2               "kolla_start"  11 hours ago  Exited (0) 20 minutes ago    nova_libvirt
a901e3b48ed1  192.168.24.1:8787/rhosp12/openstack-nova-compute-docker:2017-07-13.2               "kolla_start"  11 hours ago  Exited (137) 19 minutes ago  nova_compute
fd379bd9dd69  192.168.24.1:8787/rhosp12/openstack-neutron-openvswitch-agent-docker:2017-07-13.2  "kolla_start"  24 hours ago  Up 19 minutes                neutron_ovs_agent
b48d4c83a007  192.168.24.1:8787/rhosp12/openstack-ceilometer-compute-docker:2017-07-13.2         "kolla_start"  24 hours ago  Up 19 minutes                ceilometer_agent_compute

Forcing the containers to start resolved this:

[root@overcloud-compute-1 heat-admin]# docker restart nova_libvirt
nova_libvirt
[root@overcloud-compute-1 heat-admin]# docker restart nova_compute
nova_compute
[root@overcloud-compute-1 heat-admin]# docker ps -a
CONTAINER ID  IMAGE                                                                              COMMAND        CREATED       STATUS         NAMES
d808239aad63  192.168.24.1:8787/rhosp12/openstack-nova-libvirt-docker:2017-07-13.2               "kolla_start"  11 hours ago  Up 21 seconds  nova_libvirt
a901e3b48ed1  192.168.24.1:8787/rhosp12/openstack-nova-compute-docker:2017-07-13.2               "kolla_start"  11 hours ago  Up 7 seconds   nova_compute
fd379bd9dd69  192.168.24.1:8787/rhosp12/openstack-neutron-openvswitch-agent-docker:2017-07-13.2  "kolla_start"  24 hours ago  Up 20 minutes  neutron_ovs_agent
b48d4c83a007  192.168.24.1:8787/rhosp12/openstack-ceilometer-compute-docker:2017-07-13.2         "kolla_start"  24 hours ago  Up 20 minutes  ceilometer_agent_compute

(overcloud) [stack@undercloud-0 ~]$ nova service-list
| Id | Binary           | Host                                | Zone     | Status  | State | Updated_at                 | Disabled Reason |
| 2  | nova-conductor   | overcloud-controller-0.redhat.local | internal | enabled | up    | 2017-07-20T14:53:29.000000 | - |
| 5  | nova-compute     | overcloud-compute-1.redhat.local    | nova     | enabled | up    | 2017-07-20T14:53:24.000000 | - |
| 8  | nova-conductor   | overcloud-controller-2.redhat.local | internal | enabled | up    | 2017-07-20T14:53:30.000000 | - |
| 11 | nova-consoleauth | overcloud-controller-0.redhat.local | internal | enabled | up    | 2017-07-20T14:53:22.000000 | - |
| 14 | nova-compute     | overcloud-compute-0.redhat.local    | nova     | enabled | up    | 2017-07-20T14:53:27.000000 | - |
| 17 | nova-consoleauth | overcloud-controller-2.redhat.local | internal | enabled | up    | 2017-07-20T14:53:28.000000 | - |
| 20 | nova-conductor   | overcloud-controller-1.redhat.local | internal | enabled | up    | 2017-07-20T14:53:26.000000 | - |
| 23 | nova-scheduler   | overcloud-controller-0.redhat.local | internal | enabled | up    | 2017-07-20T14:53:30.000000 | - |
| 26 | nova-consoleauth | overcloud-controller-1.redhat.local | internal | enabled | up    | 2017-07-20T14:53:24.000000 | - |
| 29 | nova-scheduler   | overcloud-controller-2.redhat.local | internal | enabled | up    | 2017-07-20T14:53:31.000000 | - |
| 47 | nova-scheduler   | overcloud-controller-1.redhat.local | internal | enabled | up    | 2017-07-20T14:53:24.000000 | - |

Dan - don't we override the default docker restart policy?
I can occasionally reproduce the networking issue occurring after reboot, mentioned in comment #2. The workaround for the networking issue is to run the following command on the compute node:

sudo ovs-ofctl add-flow br-isolated priority=0,actions=normal

172.17.1.13 overcloud-controller-2.internalapi.localdomain overcloud-controller-2.internalapi

[root@overcloud-compute-1 ~]# ping 172.17.1.13
PING 172.17.1.13 (172.17.1.13) 56(84) bytes of data.
From 172.17.1.20 icmp_seq=1 Destination Host Unreachable
From 172.17.1.20 icmp_seq=2 Destination Host Unreachable
^C
--- 172.17.1.13 ping statistics ---
3 packets transmitted, 0 received, +2 errors, 100% packet loss, time 1999ms
pipe 2
[root@overcloud-compute-1 ~]# sudo ovs-ofctl add-flow br-isolated priority=0,actions=normal
[root@overcloud-compute-1 ~]# ping 172.17.1.13
PING 172.17.1.13 (172.17.1.13) 56(84) bytes of data.
64 bytes from 172.17.1.13: icmp_seq=1 ttl=64 time=0.486 ms
^C
--- 172.17.1.13 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.486/0.486/0.486/0.000 ms
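The workaround can be gated on whether the bridge already has a default NORMAL flow, so it is only applied when actually missing. This is a sketch against a captured flow dump: the FLOWS sample below is invented for illustration; on a real compute node you would feed it from `sudo ovs-ofctl dump-flows br-isolated` instead.

```shell
# Sketch: decide whether the br-isolated workaround flow is needed,
# based on a saved flow dump. FLOWS is a made-up single-flow capture.
FLOWS=' cookie=0x0, duration=42.1s, table=0, n_packets=0, priority=10,in_port=1 actions=drop'

if printf '%s\n' "$FLOWS" | grep -q 'actions=NORMAL'; then
  echo "default NORMAL flow present"
else
  echo "missing; apply workaround: sudo ovs-ofctl add-flow br-isolated priority=0,actions=normal"
fi
```

With the sample capture above, the check reports the flow as missing, since no entry has actions=NORMAL.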
The networking issue mentioned in comment #2 and in comment #4 is reported here: https://bugzilla.redhat.com/show_bug.cgi?id=1473763
Created attachment 1302481 [details] neutron-openvswitch-agent log
Confirmed that the restart policy is not applied.

/var/lib/docker-container-startup-configs.json:

"nova_libvirt": {
  "environment": ["KOLLA_CONFIG_STRATEGY=COPY_ALWAYS"],
  "image": "<snipped/>",
  "net": "host",
  "pid": "host",
  "privileged": true,
  "restart": "always",
  "volumes": [
    "/etc/hosts:/etc/hosts:ro",
    "/etc/localtime:/etc/localtime:ro",
    "/etc/puppet:/etc/puppet:ro",
    "/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro",
    "/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro",
    "/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro",
    "/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro",
    "/dev/log:/dev/log",
    "/etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro",
    "/var/lib/kolla/config_files/nova_libvirt.json:/var/lib/kolla/config_files/config.json:ro",
    "/var/lib/config-data/puppet-generated/nova_libvirt/:/var/lib/kolla/config_files/src:ro",
    "/lib/modules:/lib/modules:ro",
    "/dev:/dev",
    "/run:/run",
    "/sys/fs/cgroup:/sys/fs/cgroup",
    "/var/lib/nova:/var/lib/nova",
    "/var/run/libvirt:/var/run/libvirt",
    "/var/lib/libvirt:/var/lib/libvirt",
    "/etc/libvirt/qemu:/etc/libvirt/qemu",
    "/var/log/libvirt/qemu:/var/log/libvirt/qemu:ro",
    "/var/log/containers/nova:/var/log/nova"
  ]
}

# docker inspect nova_libvirt | grep -i -A 3 restartpolicy
"RestartPolicy": {
    "Name": "no",
    "MaximumRetryCount": 0
},

python-paunch-1.1.1-0.20170602025340.c8e22e5.el7ost.noarch
docker-1.12.6-40.1.gitf55a118.el7.x86_64
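The mismatch can be checked mechanically by comparing the "restart" key in the startup-configs JSON with the RestartPolicy.Name that docker inspect reports. The snippets below are trimmed samples modeled on the output above, parsed with sed purely for illustration; on a live node, `docker inspect -f '{{.HostConfig.RestartPolicy.Name}}' nova_libvirt` gives the applied value directly.

```shell
# Sketch: configured vs applied restart policy for nova_libvirt.
# Both inputs are trimmed sample lines, not live docker queries.
CONFIGURED=$(echo '"restart": "always",' | sed -n 's/.*"restart": "\([a-z-]*\)".*/\1/p')
APPLIED=$(printf '%s\n' '"RestartPolicy": {' '    "Name": "no",' | sed -n 's/.*"Name": "\([a-z-]*\)".*/\1/p')

if [ "$CONFIGURED" != "$APPLIED" ]; then
  echo "restart policy mismatch: configured=$CONFIGURED applied=$APPLIED"
fi
```

With the sample lines this prints a mismatch (configured=always, applied=no), which is exactly the symptom confirmed above.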
owalsh: I don't think we are modifying the restart policy with paunch at this time. Could do it, but we aren't yet.
owalsh: sorry, we do support and rely on docker restart:always for many services at this time. It seems like it should work for libvirt too.
Here's what I got trying to reproduce:

- Created instance on compute node.
- Rebooted compute node.
- Tried to restart the instance.

Error:

2017-08-16 18:49:40.895 5 ERROR oslo_messaging.rpc.server [req-2a744345-57a1-43a0-8072-2d339c5b93db 426d4632dec848148e0b13fde0344abf 253062ffd8ee41598ae1950eccdf925c - default default] Exception during message handling: libvirtError: Failed to connect socket to '/var/run/libvirt/virtlogd-sock': No such file or directory

[heat-admin@overcloud-novacompute-0 nova]$ sudo docker ps -a
CONTAINER ID  IMAGE                                                                             COMMAND        CREATED       STATUS         NAMES
02ec0f04329d  192.168.24.1:8787/tripleoupstream/centos-binary-neutron-openvswitch-agent:latest  "kolla_start"  23 hours ago  Up 22 minutes  neutron_ovs_agent
34d4a1d6168d  192.168.24.1:8787/tripleoupstream/centos-binary-nova-compute:latest               "kolla_start"  23 hours ago  Up 22 minutes  nova_migration_target
e3febb808428  192.168.24.1:8787/tripleoupstream/centos-binary-nova-compute:latest               "kolla_start"  23 hours ago  Up 22 minutes  nova_compute
1e8ec92afa50  192.168.24.1:8787/tripleoupstream/centos-binary-iscsid:latest                     "kolla_start"  23 hours ago  Up 22 minutes  iscsid
81ef28b5a7f2  192.168.24.1:8787/tripleoupstream/centos-binary-nova-libvirt:latest               "kolla_start"  23 hours ago  Up 22 minutes  nova_libvirt

(so everything appears to be up)
(In reply to Ian Main from comment #10)
> Here's what I got trying to reproduce:
> [...]
> (so everything appears to be up)

I think this should be resolved by https://review.openstack.org/469116. The 'start virtlogd socket' host prep task would not be applied on a reboot. Now that virtlogd is containerized we no longer need to do this. However I expect the restart policy not being applied is still an issue.
owalsh: Correct, this is upstream code. You are also correct that it appears to be fixed now. I attempted to reproduce with OVB/RDO cloud and it looks like everything works now (e.g. all containers restart properly). I'll do a final test starting a container after reboot soon.
And the restart policy is not being applied because infra-red is starting the nova-libvirt container, not paunch. https://review.gerrithub.io/373878 should resolve this.
Yeah I'm seeing the restart policy working fine upstream. All seems well actually.
Environment:
openstack-tripleo-heat-templates-7.0.0-0.20170805163048.el7ost.noarch

I still had to start virtlogd on the compute node after reboot before I was able to launch an instance.
Verified: Environment: openstack-tripleo-heat-templates-7.0.3-3.el7ost.noarch The reported issue doesn't reproduce.
Verified based on comment #21
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2017:3462