Bug 1702685

Summary: Network doesn't come up at boot time after reboot on overcloud nodes
Product: Red Hat OpenStack Reporter: Marius Cornea <mcornea>
Component: openstack-tripleo-common Assignee: Adriano Petrich <apetrich>
Status: CLOSED ERRATA QA Contact: Alexander Chuzhoy <sasha>
Severity: urgent Docs Contact:
Priority: high    
Version: 15.0 (Stein) CC: apetrich, atonner, bfournie, dbecker, dsneddon, emacchi, hjensas, mburns, morazi, racedoro, sasha, sclewis, slinaber, ssmolyak
Target Milestone: beta Keywords: AutomationBlocker, Regression, Triaged
Target Release: 15.0 (Stein)   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: openstack-tripleo-common-10.7.1-0.20190522180807.438b9fb.el8ost Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-09-21 11:21:34 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
overcloud node after reboot console none

Description Marius Cornea 2019-04-24 13:23:39 UTC
Created attachment 1558201
overcloud node after reboot console

Description of problem:

Network doesn't come up at boot time after reboot on overcloud nodes.

Version-Release number of selected component (if applicable):
15  -p RHOS_TRUNK-15.0-RHEL-8-20190423.n.1


How reproducible:
100%

Steps to Reproduce:
1. Deploy OSP15 overcloud
2. SSH to one of the overcloud nodes and run reboot

Actual results:
The node isn't accessible via SSH after reboot because the network service is down.

Expected results:
The node is accessible via SSH after reboot.

Additional info:
Attaching console screenshot.

Comment 1 Bob Fournier 2019-04-24 23:32:56 UTC
This is an issue with the network interfaces not being restarted, see also https://bugzilla.redhat.com/show_bug.cgi?id=1667265, which was opened against Fedora 29 but exhibits the same status of network service after reboot as the screenshot:

$ systemctl status network.service
● network.service - LSB: Bring up/down networking
   Loaded: loaded (/etc/rc.d/init.d/network; generated)
   Active: inactive (dead)
     Docs: man:systemd-sysv-generator(8)

I'm not sure what we can do about this in OSP, it's a RHEL 8 issue.
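For scripted checks, the state reported above can be pulled out of the status text. This is only a minimal sketch, assuming the `Active:` line has the format shown in the output; the function name `active_state` is hypothetical and not part of any tool mentioned in this bug.

```shell
#!/bin/bash
# Hypothetical helper: reads `systemctl status` text on stdin and prints
# the first word after "Active:" (for example "inactive" or "active").
active_state() {
    awk '$1 == "Active:" { print $2; exit }'
}
```

It could be used as `systemctl status network.service | active_state`. Note that `systemctl status` itself exits non-zero for an inactive unit, which matters if the calling script sets `pipefail`.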

Comment 2 Bob Fournier 2019-04-24 23:35:29 UTC
*** Bug 1701866 has been marked as a duplicate of this bug. ***

Comment 3 Marius Cornea 2019-04-24 23:54:05 UTC
(In reply to Bob Fournier from comment #1)
> This is an issue with the network interfaces not being restarted, see also
> https://bugzilla.redhat.com/show_bug.cgi?id=1667265, which was opened
> against Fedora 29 but exhibits the same status of network service after
> reboot as the screenshot:
> 
> $ systemctl status network.service
> ● network.service - LSB: Bring up/down networking
>    Loaded: loaded (/etc/rc.d/init.d/network; generated)
>    Active: inactive (dead)
>      Docs: man:systemd-sysv-generator(8)
> 
> I'm not sure what we can do about this in OSP, it's a RHEL 8 issue.

Can we perhaps enable the network service from OSP side?

Comment 4 Emilien Macchi 2019-04-25 00:34:09 UTC
I thought I fixed that with https://review.opendev.org/#/q/topic:bug/1823353+(status:open+OR+status:merged). I wonder whether the image change was taken into account when the new images were built.

Comment 5 Emilien Macchi 2019-04-25 01:03:58 UTC
Also, a note for myself: I forgot to fix the undercloud as well. I'll send a patch.

Comment 6 Emilien Macchi 2019-04-25 17:46:53 UTC
I wasn't able to reproduce this on either the undercloud or the overcloud. However, I'm hitting https://bugzilla.redhat.com/show_bug.cgi?id=1701866.

Marius, can you try again and show me a reproducer?

Comment 7 Marius Cornea 2019-04-25 23:18:33 UTC
(In reply to Emilien Macchi from comment #6)
> I wasn't able to reproduce this on either the undercloud or the overcloud.
> However, I'm hitting https://bugzilla.redhat.com/show_bug.cgi?id=1701866.
> 
> Marius, can you try again and show me a reproducer?

I've got a reproducer:

[root@controller-0 heat-admin]# systemctl status network
● network.service - LSB: Bring up/down networking
   Loaded: loaded (/etc/rc.d/init.d/network; generated)
   Active: inactive (dead)
     Docs: man:systemd-sysv-generator(8)

openstack-tripleo-puppet-elements-10.3.1-0.20190420090433.9ba1438.el8ost.noarch

The patch is present:
[root@undercloud-0 stack]# cat /usr/share/tripleo-puppet-elements/overcloud-base/post-install.d/51-enable-network-service
#!/bin/bash

set -eux
set -o pipefail

# https://launchpad.net/bugs/1823353
systemctl enable network
systemctl start network


## Images version

rhosp-director-images-x86_64-15.0-20190423.1.el8ost.noarch
rhosp-director-images-15.0-20190423.1.el8ost.noarch
rhosp-director-images-ipa-x86_64-15.0-20190423.1.el8ost.noarch

Comment 8 Emilien Macchi 2019-04-25 23:54:58 UTC
I could reboot my overcloud node today without any workaround... I'm a bit confused why it fails for you. Can you confirm the reboot doesn't work? If yes, can you try to reboot after running "systemctl enable network" and report back?
Thanks

Comment 9 Marius Cornea 2019-04-26 00:14:41 UTC
(In reply to Emilien Macchi from comment #8)
> I could reboot my overcloud node today without any workaround... I'm a bit
> confused why it fails for you. Can you confirm the reboot doesn't work? If
> yes, can you try to reboot after running "systemctl enable network" and
> report back?
> Thanks

Yes, after rebooting one of the controller nodes it's not reachable over the network.

I can confirm that after manually running "systemctl enable network" and rebooting, the node is reachable at boot time.
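The manual workaround confirmed above could be scripted roughly as follows. This is a sketch only, not the fix that was merged: the function name `ensure_network_enabled` and the `SYSTEMCTL` override variable are hypothetical, and it assumes `systemctl is-enabled` reports the state of the SysV-generated network service through its exit status.

```shell
#!/bin/bash
# Hypothetical override so the function can be dry-run with a stand-in
# command; defaults to the real systemctl.
SYSTEMCTL="${SYSTEMCTL:-systemctl}"

ensure_network_enabled() {
    # `is-enabled` exits non-zero when the unit is not enabled at boot.
    if "$SYSTEMCTL" is-enabled network >/dev/null 2>&1; then
        echo "network already enabled"
    else
        # Enable the service so it is started on the next boot.
        "$SYSTEMCTL" enable network && echo "network enabled"
    fi
}
```

On an overcloud node this would be run as root, for example `ensure_network_enabled && reboot`.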

Comment 10 Emilien Macchi 2019-04-26 00:22:15 UTC
https://review.opendev.org/#/c/655758/ will fix the issue

Comment 11 Marius Cornea 2019-04-26 00:28:15 UTC
(In reply to Emilien Macchi from comment #10)
> https://review.opendev.org/#/c/655758/ will fix the issue

How do I test it? Do I need it on the undercloud only, or in the mistral executor container as well?

Comment 21 Bob Fournier 2019-05-28 14:58:01 UTC
Fix is in FIV but bug didn't get updated so updating now.

Comment 22 Alistair Tonner 2019-05-29 12:44:03 UTC
(undercloud) [stack@undercloud-0 ~]$ dnf list installed openstack-tripleo-common
Installed Packages
openstack-tripleo-common.noarch   10.7.1-0.20190525000410.71c099f.el8ost        @rhelosp-15.0-trunk

(undercloud) [stack@undercloud-0 ~]$ . ./stackrc
(undercloud) [stack@undercloud-0 ~]$ openstack server list
+--------------------------------------+--------------+--------+------------------------+----------------+------------+
| ID                                   | Name         | Status | Networks               | Image          | Flavor     |
+--------------------------------------+--------------+--------+------------------------+----------------+------------+
| 307f181e-842b-4328-b95e-4e64ef5f43de | ceph-2       | ACTIVE | ctlplane=192.168.24.8  | overcloud-full | ceph       |
| a6cfdcea-c98a-429c-aa2b-59eb969b7164 | compute-1    | ACTIVE | ctlplane=192.168.24.16 | overcloud-full | compute    |
| 016b729d-6ed7-4363-aa8c-2b3965ff7a91 | ceph-0       | ACTIVE | ctlplane=192.168.24.6  | overcloud-full | ceph       |
| 2f3b9fa6-6049-4fb5-a05d-14d3ee6965ca | controller-2 | ACTIVE | ctlplane=192.168.24.12 | overcloud-full | controller |
| 551ce9c2-e255-4e24-ad2e-e181873acaaa | controller-0 | ACTIVE | ctlplane=192.168.24.20 | overcloud-full | controller |
| aca677ee-a9a3-42fc-9715-a975ab74d447 | controller-1 | ACTIVE | ctlplane=192.168.24.15 | overcloud-full | controller |
| e9575990-0c91-430a-9b24-7380375361a4 | compute-0    | ACTIVE | ctlplane=192.168.24.10 | overcloud-full | compute    |
| 05c29a3e-2691-4bc7-8211-d872a80e3aee | ceph-1       | ACTIVE | ctlplane=192.168.24.23 | overcloud-full | ceph       |
+--------------------------------------+--------------+--------+------------------------+----------------+------------+
(undercloud) [stack@undercloud-0 ~]$ openstack baremetal node list
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| UUID                                 | Name         | Instance UUID                        | Power State | Provisioning State | Maintenance |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| 6d5436e8-c1ce-44e8-bbef-3f50217bd7ea | ceph-0       | 016b729d-6ed7-4363-aa8c-2b3965ff7a91 | power on    | active             | False       |
| 00984634-7eb8-4dd9-a524-6d6430be0d5a | ceph-1       | 307f181e-842b-4328-b95e-4e64ef5f43de | power on    | active             | False       |
| e3235602-a400-4e36-bd01-3f3e4d31acf3 | ceph-2       | 05c29a3e-2691-4bc7-8211-d872a80e3aee | power on    | active             | False       |
| 62291b36-fa7b-4367-8af1-e338588549cf | compute-0    | e9575990-0c91-430a-9b24-7380375361a4 | power on    | active             | False       |
| 633aa813-6e0f-488b-b92c-258137771434 | compute-1    | a6cfdcea-c98a-429c-aa2b-59eb969b7164 | power on    | active             | False       |
| 4ca41f32-2c0f-4c0c-aee7-87b3d2ddd7b3 | controller-0 | 2f3b9fa6-6049-4fb5-a05d-14d3ee6965ca | power on    | active             | False       |
| b3d2c723-a9b3-478d-8dae-04efa42ce5da | controller-1 | 551ce9c2-e255-4e24-ad2e-e181873acaaa | power on    | active             | False       |
| d8e93647-66b9-4c61-acc8-82078fcd587f | controller-2 | aca677ee-a9a3-42fc-9715-a975ab74d447 | power on    | active             | False       |
| 7a658110-a136-4a34-a99e-9e2cc45b54cf | ironic-0     | None                                 | power off   | available          | False       |
| 6d1b2a74-cc7a-45c5-a011-26fc5486a2c1 | ironic-1     | None                                 | power off   | available          | False       |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+

(undercloud) [stack@undercloud-0 ~]$ openstack baremetal node reboot d8e93647-66b9-4c61-acc8-82078fcd587f
(undercloud) [stack@undercloud-0 ~]$ ping  192.168.24.15
PING 192.168.24.15 (192.168.24.15) 56(84) bytes of data.
From 192.168.24.1 icmp_seq=9 Destination Host Unreachable
From 192.168.24.1 icmp_seq=10 Destination Host Unreachable
From 192.168.24.1 icmp_seq=11 Destination Host Unreachable
From 192.168.24.1 icmp_seq=12 Destination Host Unreachable
From 192.168.24.1 icmp_seq=13 Destination Host Unreachable
From 192.168.24.1 icmp_seq=14 Destination Host Unreachable

(undercloud) [stack@undercloud-0 ~]$ openstack baremetal node list
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| UUID                                 | Name         | Instance UUID                        | Power State | Provisioning State | Maintenance |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| 6d5436e8-c1ce-44e8-bbef-3f50217bd7ea | ceph-0       | 016b729d-6ed7-4363-aa8c-2b3965ff7a91 | power on    | active             | False       |
| 00984634-7eb8-4dd9-a524-6d6430be0d5a | ceph-1       | 307f181e-842b-4328-b95e-4e64ef5f43de | power on    | active             | False       |
| e3235602-a400-4e36-bd01-3f3e4d31acf3 | ceph-2       | 05c29a3e-2691-4bc7-8211-d872a80e3aee | power on    | active             | False       |
| 62291b36-fa7b-4367-8af1-e338588549cf | compute-0    | e9575990-0c91-430a-9b24-7380375361a4 | power on    | active             | False       |
| 633aa813-6e0f-488b-b92c-258137771434 | compute-1    | a6cfdcea-c98a-429c-aa2b-59eb969b7164 | power on    | active             | False       |
| 4ca41f32-2c0f-4c0c-aee7-87b3d2ddd7b3 | controller-0 | 2f3b9fa6-6049-4fb5-a05d-14d3ee6965ca | power on    | active             | False       |
| b3d2c723-a9b3-478d-8dae-04efa42ce5da | controller-1 | 551ce9c2-e255-4e24-ad2e-e181873acaaa | power on    | active             | False       |
| d8e93647-66b9-4c61-acc8-82078fcd587f | controller-2 | aca677ee-a9a3-42fc-9715-a975ab74d447 | power on    | active             | False       |
| 7a658110-a136-4a34-a99e-9e2cc45b54cf | ironic-0     | None                                 | power off   | available          | False       |
| 6d1b2a74-cc7a-45c5-a011-26fc5486a2c1 | ironic-1     | None                                 | power off   | available          | False       |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
(undercloud) [stack@undercloud-0 ~]$ ping  192.168.24.15
PING 192.168.24.15 (192.168.24.15) 56(84) bytes of data.
64 bytes from 192.168.24.15: icmp_seq=1 ttl=64 time=1.21 ms
64 bytes from 192.168.24.15: icmp_seq=2 ttl=64 time=0.358 ms
64 bytes from 192.168.24.15: icmp_seq=3 ttl=64 time=0.308 ms
^C
--- 192.168.24.15 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 18ms
rtt min/avg/max/mdev = 0.308/0.626/1.213/0.415 ms
(undercloud) [stack@undercloud-0 ~]$ ssh heat-admin@192.168.24.15
Warning: Permanently added '192.168.24.15' (ECDSA) to the list of known hosts.
[heat-admin@controller-1 ~]$ uptime
 12:38:56 up 1 min,  1 user,  load average: 25.95, 6.80, 2.30
[heat-admin@controller-1 ~]$ exit

Successfully rebooted an overcloud controller and found it accessible after the reboot.
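The manual ping check used in this verification could be automated along these lines. It is a minimal sketch under stated assumptions: `wait_for_host` and the `PING` override are hypothetical names, and the attempt count and delay defaults are arbitrary illustrative values, not numbers taken from this bug.

```shell
#!/bin/bash
# Hypothetical override so a stand-in command can replace ping in dry runs.
PING="${PING:-ping}"

# wait_for_host HOST [ATTEMPTS] [DELAY_SECONDS]
# Sends one ping per attempt until the host answers or attempts run out.
wait_for_host() {
    local host="$1"
    local attempts="${2:-60}"
    local delay="${3:-5}"
    local i=1
    while [ "$i" -le "$attempts" ]; do
        # -c 1: a single echo request; -W 1: wait at most one second for a reply
        if "$PING" -c 1 -W 1 "$host" >/dev/null 2>&1; then
            echo "$host reachable after $i attempt(s)"
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    echo "$host still unreachable after $attempts attempt(s)" >&2
    return 1
}
```

After `openstack baremetal node reboot <UUID>`, something like `wait_for_host 192.168.24.15` followed by an SSH login would reproduce the check shown in this comment.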

Comment 26 errata-xmlrpc 2019-09-21 11:21:34 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:2811