Created attachment 1558201
overcloud node after reboot console

Description of problem:
Network doesn't come up at boot time after reboot on overcloud nodes.

Version-Release number of selected component (if applicable):
15 -p RHOS_TRUNK-15.0-RHEL-8-20190423.n.1

How reproducible:
100%

Steps to Reproduce:
1. Deploy OSP15 overcloud
2. SSH to one of the overcloud nodes and run reboot

Actual results:
The node isn't accessible via SSH after reboot because the network service is down.

Expected results:
The node is accessible via SSH after reboot.

Additional info:
Attaching console screenshot.
This is an issue with the network interfaces not being restarted. See also https://bugzilla.redhat.com/show_bug.cgi?id=1667265, which was opened against Fedora 29 but shows the same network service status after reboot as the screenshot:

$ systemctl status network.service
● network.service - LSB: Bring up/down networking
   Loaded: loaded (/etc/rc.d/init.d/network; generated)
   Active: inactive (dead)
     Docs: man:systemd-sysv-generator(8)

I'm not sure what we can do about this in OSP; it's a RHEL 8 issue.
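To tell "stopped right now" apart from "will not start at boot", it can help to look at both states side by side. The following is a minimal sketch, not part of any fix discussed here: the function name unit_boot_state and the SYSTEMCTL override variable are hypothetical, while is-active and is-enabled are standard systemctl subcommands.

```shell
# Sketch: summarize whether a unit is running now and whether it starts at boot.
# SYSTEMCTL is a hypothetical override hook, handy for dry runs.
unit_boot_state() {
    local unit="$1" ctl="${SYSTEMCTL:-systemctl}"
    local active enabled
    # is-active / is-enabled print the state and return non-zero when the
    # unit is not active / not enabled, so the exit status is ignored here.
    active="$("$ctl" is-active "$unit" 2>/dev/null || true)"
    enabled="$("$ctl" is-enabled "$unit" 2>/dev/null || true)"
    echo "$unit: active=${active:-unknown} enabled=${enabled:-unknown}"
}
```

Running "unit_boot_state network" on an affected node would presumably report the unit as inactive and not enabled, which matches the status output above.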
*** Bug 1701866 has been marked as a duplicate of this bug. ***
(In reply to Bob Fournier from comment #1)
> This is a an issue with the network interfaces not being restarted, see also
> https://bugzilla.redhat.com/show_bug.cgi?id=1667265, which was opened
> against Fedora 29 but exhibits the same status of network service after
> reboot as the screen shot:
>
> $ systemctl status network.service
> ● network.service - LSB: Bring up/down networking
> Loaded: loaded (/etc/rc.d/init.d/network; generated)
> Active: inactive (dead)
> Docs: man:systemd-sysv-generator(8)
>
> I'm not sure what we can do about this in OSP, its a RHEL 8 issue.

Can we perhaps enable the network service from OSP side?
I thought I fixed that with https://review.opendev.org/#/q/topic:bug/1823353+(status:open+OR+status:merged). I wonder if the image change was taken into account when building the new images.
Also, a note to myself: I missed fixing the undercloud as well. I'll send a patch.
I wasn't able to reproduce this on either the undercloud or the overcloud. However, I'm hitting https://bugzilla.redhat.com/show_bug.cgi?id=1701866. Marius, can you try again and show me a reproducer?
(In reply to Emilien Macchi from comment #6)
> I wasn't able to reproduce on both the undercloud & overcloud. However I'm
> hitting https://bugzilla.redhat.com/show_bug.cgi?id=1701866.
>
> Marius, can you try again and show me a reproducer ?

I've got a reproducer:

[root@controller-0 heat-admin]# systemctl status network
● network.service - LSB: Bring up/down networking
   Loaded: loaded (/etc/rc.d/init.d/network; generated)
   Active: inactive (dead)
     Docs: man:systemd-sysv-generator(8)

openstack-tripleo-puppet-elements-10.3.1-0.20190420090433.9ba1438.el8ost.noarch

The patch is present:

[root@undercloud-0 stack]# cat /usr/share/tripleo-puppet-elements/overcloud-base/post-install.d/51-enable-network-service
#!/bin/bash
set -eux
set -o pipefail

# https://launchpad.net/bugs/1823353
systemctl enable network
systemctl start network

## Images version
rhosp-director-images-x86_64-15.0-20190423.1.el8ost.noarch
rhosp-director-images-15.0-20190423.1.el8ost.noarch
rhosp-director-images-ipa-x86_64-15.0-20190423.1.el8ost.noarch
I could reboot my overcloud node today without any workaround... I'm a bit confused why it fails for me. Can you confirm that the reboot doesn't work? If so, can you try rebooting after running "systemctl enable network" and report back? Thanks
(In reply to Emilien Macchi from comment #8)
> I could reboot my overcloud node today without any workaround... I'm a bit
> confused why it fails for me. You confirm the reboot doesn't work right? If
> yes, can you try to reboot after running a "systemctl enable network" and
> report back.
> Thanks

Yes, after rebooting one of the controller nodes, it's not reachable over the network. I can confirm that after manually running "systemctl enable network" and rebooting the node, it is reachable at boot time.
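The manual workaround confirmed above can be sketched as an idempotent helper that enables the unit only when it is not already enabled. This is an illustration under assumptions, not the fix that was merged: the function name ensure_enabled and the SYSTEMCTL override variable are hypothetical, and the function would need to run as root on the node.

```shell
# Sketch: enable a unit for boot only if it is not enabled yet.
# SYSTEMCTL is a hypothetical override hook, handy for dry runs.
ensure_enabled() {
    local unit="$1" ctl="${SYSTEMCTL:-systemctl}"
    # is-enabled returns 0 when the unit is enabled; --quiet suppresses output.
    if "$ctl" is-enabled --quiet "$unit"; then
        echo "$unit already enabled"
    else
        # Report success only if the enable call itself succeeded.
        "$ctl" enable "$unit" && echo "$unit enabled"
    fi
}
```

Calling "ensure_enabled network" before the reboot corresponds to the manual step in this comment; running it a second time should change nothing.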
https://review.opendev.org/#/c/655758/ will fix the issue
(In reply to Emilien Macchi from comment #10)
> https://review.opendev.org/#/c/655758/ will fix the issue

How do I test it? Do I need it on undercloud only or in mistral executor container as well?
Fix is in FIV but bug didn't get updated so updating now.
(undercloud) [stack@undercloud-0 ~]$ dnf list installed openstack-tripleo-common
Installed Packages
openstack-tripleo-common.noarch 10.7.1-0.20190525000410.71c099f.el8ost @rhelosp-15.0-trunk

(undercloud) [stack@undercloud-0 ~]$ . ./stackrc
(undercloud) [stack@undercloud-0 ~]$ openstack server list
+--------------------------------------+--------------+--------+------------------------+----------------+------------+
| ID                                   | Name         | Status | Networks               | Image          | Flavor     |
+--------------------------------------+--------------+--------+------------------------+----------------+------------+
| 307f181e-842b-4328-b95e-4e64ef5f43de | ceph-2       | ACTIVE | ctlplane=192.168.24.8  | overcloud-full | ceph       |
| a6cfdcea-c98a-429c-aa2b-59eb969b7164 | compute-1    | ACTIVE | ctlplane=192.168.24.16 | overcloud-full | compute    |
| 016b729d-6ed7-4363-aa8c-2b3965ff7a91 | ceph-0       | ACTIVE | ctlplane=192.168.24.6  | overcloud-full | ceph       |
| 2f3b9fa6-6049-4fb5-a05d-14d3ee6965ca | controller-2 | ACTIVE | ctlplane=192.168.24.12 | overcloud-full | controller |
| 551ce9c2-e255-4e24-ad2e-e181873acaaa | controller-0 | ACTIVE | ctlplane=192.168.24.20 | overcloud-full | controller |
| aca677ee-a9a3-42fc-9715-a975ab74d447 | controller-1 | ACTIVE | ctlplane=192.168.24.15 | overcloud-full | controller |
| e9575990-0c91-430a-9b24-7380375361a4 | compute-0    | ACTIVE | ctlplane=192.168.24.10 | overcloud-full | compute    |
| 05c29a3e-2691-4bc7-8211-d872a80e3aee | ceph-1       | ACTIVE | ctlplane=192.168.24.23 | overcloud-full | ceph       |
+--------------------------------------+--------------+--------+------------------------+----------------+------------+

(undercloud) [stack@undercloud-0 ~]$ openstack baremetal node list
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| UUID                                 | Name         | Instance UUID                        | Power State | Provisioning State | Maintenance |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| 6d5436e8-c1ce-44e8-bbef-3f50217bd7ea | ceph-0       | 016b729d-6ed7-4363-aa8c-2b3965ff7a91 | power on    | active             | False       |
| 00984634-7eb8-4dd9-a524-6d6430be0d5a | ceph-1       | 307f181e-842b-4328-b95e-4e64ef5f43de | power on    | active             | False       |
| e3235602-a400-4e36-bd01-3f3e4d31acf3 | ceph-2       | 05c29a3e-2691-4bc7-8211-d872a80e3aee | power on    | active             | False       |
| 62291b36-fa7b-4367-8af1-e338588549cf | compute-0    | e9575990-0c91-430a-9b24-7380375361a4 | power on    | active             | False       |
| 633aa813-6e0f-488b-b92c-258137771434 | compute-1    | a6cfdcea-c98a-429c-aa2b-59eb969b7164 | power on    | active             | False       |
| 4ca41f32-2c0f-4c0c-aee7-87b3d2ddd7b3 | controller-0 | 2f3b9fa6-6049-4fb5-a05d-14d3ee6965ca | power on    | active             | False       |
| b3d2c723-a9b3-478d-8dae-04efa42ce5da | controller-1 | 551ce9c2-e255-4e24-ad2e-e181873acaaa | power on    | active             | False       |
| d8e93647-66b9-4c61-acc8-82078fcd587f | controller-2 | aca677ee-a9a3-42fc-9715-a975ab74d447 | power on    | active             | False       |
| 7a658110-a136-4a34-a99e-9e2cc45b54cf | ironic-0     | None                                 | power off   | available          | False       |
| 6d1b2a74-cc7a-45c5-a011-26fc5486a2c1 | ironic-1     | None                                 | power off   | available          | False       |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+

(undercloud) [stack@undercloud-0 ~]$ openstack baremetal node reboot d8e93647-66b9-4c61-acc8-82078fcd587f
(undercloud) [stack@undercloud-0 ~]$ ping 192.168.24.15
PING 192.168.24.15 (192.168.24.15) 56(84) bytes of data.
From 192.168.24.1 icmp_seq=9 Destination Host Unreachable
From 192.168.24.1 icmp_seq=10 Destination Host Unreachable
From 192.168.24.1 icmp_seq=11 Destination Host Unreachable
From 192.168.24.1 icmp_seq=12 Destination Host Unreachable
From 192.168.24.1 icmp_seq=13 Destination Host Unreachable
From 192.168.24.1 icmp_seq=14 Destination Host Unreachable

openstack baremetal node list
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| UUID                                 | Name         | Instance UUID                        | Power State | Provisioning State | Maintenance |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| 6d5436e8-c1ce-44e8-bbef-3f50217bd7ea | ceph-0       | 016b729d-6ed7-4363-aa8c-2b3965ff7a91 | power on    | active             | False       |
| 00984634-7eb8-4dd9-a524-6d6430be0d5a | ceph-1       | 307f181e-842b-4328-b95e-4e64ef5f43de | power on    | active             | False       |
| e3235602-a400-4e36-bd01-3f3e4d31acf3 | ceph-2       | 05c29a3e-2691-4bc7-8211-d872a80e3aee | power on    | active             | False       |
| 62291b36-fa7b-4367-8af1-e338588549cf | compute-0    | e9575990-0c91-430a-9b24-7380375361a4 | power on    | active             | False       |
| 633aa813-6e0f-488b-b92c-258137771434 | compute-1    | a6cfdcea-c98a-429c-aa2b-59eb969b7164 | power on    | active             | False       |
| 4ca41f32-2c0f-4c0c-aee7-87b3d2ddd7b3 | controller-0 | 2f3b9fa6-6049-4fb5-a05d-14d3ee6965ca | power on    | active             | False       |
| b3d2c723-a9b3-478d-8dae-04efa42ce5da | controller-1 | 551ce9c2-e255-4e24-ad2e-e181873acaaa | power on    | active             | False       |
| d8e93647-66b9-4c61-acc8-82078fcd587f | controller-2 | aca677ee-a9a3-42fc-9715-a975ab74d447 | power on    | active             | False       |
| 7a658110-a136-4a34-a99e-9e2cc45b54cf | ironic-0     | None                                 | power off   | available          | False       |
| 6d1b2a74-cc7a-45c5-a011-26fc5486a2c1 | ironic-1     | None                                 | power off   | available          | False       |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+

(undercloud) [stack@undercloud-0 ~]$ ping 192.168.24.15
PING 192.168.24.15 (192.168.24.15) 56(84) bytes of data.
64 bytes from 192.168.24.15: icmp_seq=1 ttl=64 time=1.21 ms
64 bytes from 192.168.24.15: icmp_seq=2 ttl=64 time=0.358 ms
64 bytes from 192.168.24.15: icmp_seq=3 ttl=64 time=0.308 ms
^C
--- 192.168.24.15 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 18ms
rtt min/avg/max/mdev = 0.308/0.626/1.213/0.415 ms

(undercloud) [stack@undercloud-0 ~]$ ssh heat-admin@192.168.24.15
Warning: Permanently added '192.168.24.15' (ECDSA) to the list of known hosts.
[heat-admin@controller-1 ~]$ uptime
 12:38:56 up 1 min, 1 user, load average: 25.95, 6.80, 2.30
[heat-admin@controller-1 ~]$ exit

Successfully rebooted an overcloud controller and found it accessible after the reboot.
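The verification in this comment (ping until the node answers again) could be scripted as a polling loop. The following is a rough sketch under assumptions: wait_for_host, ping_once, and the PROBE override variable are hypothetical names, and the ping options shown are those of the Linux iputils ping.

```shell
# Sketch: poll a host until it answers, or give up after a number of attempts.
# PROBE is a hypothetical hook naming the check to run; it defaults to ping_once.
ping_once() {
    # -c 1: send one echo request; -W 2: wait at most 2 seconds for a reply.
    ping -c 1 -W 2 "$1" >/dev/null 2>&1
}

wait_for_host() {
    local host="$1" attempts="${2:-60}" delay="${3:-5}"
    local probe="${PROBE:-ping_once}"
    local i
    for ((i = 1; i <= attempts; i++)); do
        if "$probe" "$host"; then
            echo "$host reachable after $i attempt(s)"
            return 0
        fi
        sleep "$delay"
    done
    echo "$host still unreachable after $attempts attempt(s)" >&2
    return 1
}
```

For the controller rebooted here, a call such as "wait_for_host 192.168.24.15 60 5" would retry for up to about five minutes before reporting failure.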
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2019:2811