Bug 1701866 - After reboot of undercloud node network doesn't come back automatically, need to restart it
Summary: After reboot of undercloud node network doesn't come back automatically, need...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-common
Version: 15.0 (Stein)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: beta
Target Release: 15.0 (Stein)
Assignee: Emilien Macchi
QA Contact: Sasha Smolyak
URL:
Whiteboard:
Depends On:
Blocks: 1779069
Reported: 2019-04-22 10:14 UTC by Sasha Smolyak
Modified: 2023-02-22 23:02 UTC (History)
CC List: 7 users

Fixed In Version: openstack-tripleo-common-10.7.1-0.20190426083235.1988c18.el8ost
Doc Type: No Doc Update
Doc Text:
Clone Of:
: 1779069
Environment:
Last Closed: 2019-09-21 11:21:34 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
OpenStack gerrit | 655758 | 0 | 'None' | MERGED | tripleo-bootstrap: ensure network service is enabled & started | 2020-11-10 20:16:54 UTC
OpenStack gerrit | 656183 | 0 | 'None' | MERGED | tripleo-bootstrap: only enable network, not starting. | 2020-11-10 20:16:35 UTC
Red Hat Product Errata | RHEA-2019:2811 | 0 | None | None | None | 2019-09-21 11:21:48 UTC

Description Sasha Smolyak 2019-04-22 10:14:11 UTC
Description of problem:
After the undercloud is rebooted, eth0, ovs-system and br-ctlplane remain DOWN until the network service is restarted manually.

Version-Release number of selected component (if applicable):
openstack-tripleo-heat-templates-10.4.1-0.20190412000410.b934fdd.el8ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. Deploy
2. Reboot undercloud node
3. Run "ip a" to check the state of the network interfaces

Actual results:
[stack@undercloud-0 ~]$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop master ovs-system state DOWN group default qlen 1000
    link/ether 52:54:00:f5:a4:35 brd ff:ff:ff:ff:ff:ff
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:a1:b4:37 brd ff:ff:ff:ff:ff:ff
    inet 172.16.0.65/24 brd 172.16.0.255 scope global dynamic noprefixroute eth1
       valid_lft 3383sec preferred_lft 3383sec
    inet6 fe80::5054:ff:fea1:b437/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever
4: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:fa:ea:6e brd ff:ff:ff:ff:ff:ff
    inet 10.0.0.3/24 brd 10.0.0.255 scope global dynamic noprefixroute eth2
       valid_lft 3383sec preferred_lft 3383sec
    inet6 2620:52:0:13b8::fe:29/128 scope global dynamic noprefixroute 
       valid_lft 3384sec preferred_lft 3384sec
    inet6 fe80::5054:ff:fefa:ea6e/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever
5: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 66:8a:98:23:ad:1d brd ff:ff:ff:ff:ff:ff
6: br-int: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether fa:c1:6e:31:20:44 brd ff:ff:ff:ff:ff:ff
8: br-ctlplane: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 52:54:00:f5:a4:35 brd ff:ff:ff:ff:ff:ff



Expected results:
eth0 and br-ctlplane are UP
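
The check in step 3 can be scripted. The following is only a rough sketch, not part of the reported fix: it parses text in the format printed by "ip a" and lists the interfaces that lack the UP flag. The function names and the default list of required interfaces are assumptions made for illustration. It looks at the flags rather than the "state" field because a healthy br-ctlplane can report state UNKNOWN while still carrying UP,LOWER_UP.

```python
import re

# Header line of each interface block, e.g.
# "2: eth0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop master ovs-system state DOWN ..."
HEADER = re.compile(r"^\d+:\s+([^:@\s]+)(?:@[^:\s]+)?:\s+<([^>]*)>")


def interface_flags(ip_a_output):
    """Map each interface name to the set of flags shown between < and >."""
    flags = {}
    for line in ip_a_output.splitlines():
        match = HEADER.match(line)
        if match:
            flags[match.group(1)] = set(match.group(2).split(","))
    return flags


def interfaces_not_up(ip_a_output, required=("eth0", "br-ctlplane")):
    """Return the required interfaces that are missing or lack the UP flag.

    The default interface names are taken from this report and are an
    assumption; other deployments may use different names.
    """
    flags = interface_flags(ip_a_output)
    return [name for name in required if "UP" not in flags.get(name, set())]


# Excerpt of the "Actual results" output above.
SAMPLE = """\
2: eth0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop master ovs-system state DOWN group default qlen 1000
    link/ether 52:54:00:f5:a4:35 brd ff:ff:ff:ff:ff:ff
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
8: br-ctlplane: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
"""

print(interfaces_not_up(SAMPLE))  # ['eth0', 'br-ctlplane']
```

An empty list would correspond to the expected result.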

Additional info:
Workaround: restart the network service manually

roles_data_undercloud.yaml:
###############################################################################
# File generated by TripleO
###############################################################################
###############################################################################
# Role: Undercloud                                                            #
###############################################################################
- name: Undercloud
  description: |
    A role to deploy the undercloud via heat using the 'openstack undercloud
    deploy' command.
  CountDefault: 1
  disable_constraints: True
  tags:
    - primary
    - controller
  networks:
    External:
      subnet: external_subnet
  ServicesDefault:
    - OS::TripleO::Services::Aide
    - OS::TripleO::Services::AodhApi
    - OS::TripleO::Services::AodhEvaluator
    - OS::TripleO::Services::AodhListener
    - OS::TripleO::Services::AodhNotifier
    - OS::TripleO::Services::Apache
    - OS::TripleO::Services::BarbicanApi
    - OS::TripleO::Services::BarbicanBackendDogtag
    - OS::TripleO::Services::BarbicanBackendKmip
    - OS::TripleO::Services::BarbicanBackendPkcs11Crypto
    - OS::TripleO::Services::BarbicanBackendSimpleCrypto
    - OS::TripleO::Services::CACerts
    - OS::TripleO::Services::CeilometerAgentCentral
    - OS::TripleO::Services::CeilometerAgentIpmi
    - OS::TripleO::Services::CeilometerAgentNotification
    - OS::TripleO::Services::CertmongerUser
    - OS::TripleO::Services::CinderApi
    - OS::TripleO::Services::CinderScheduler
    - OS::TripleO::Services::CinderVolume
    - OS::TripleO::Services::ContainerImagePrepare
    - OS::TripleO::Services::ContainersLogrotateCrond
    - OS::TripleO::Services::DockerRegistry
    - OS::TripleO::Services::GlanceApi
    - OS::TripleO::Services::GnocchiApi
    - OS::TripleO::Services::GnocchiMetricd
    - OS::TripleO::Services::GnocchiStatsd
    - OS::TripleO::Services::HAproxy
    - OS::TripleO::Services::HeatApi
    - OS::TripleO::Services::HeatApiCfn
    - OS::TripleO::Services::HeatEngine
    - OS::TripleO::Services::IronicApi
    - OS::TripleO::Services::IronicConductor
    - OS::TripleO::Services::IronicInspector
    - OS::TripleO::Services::IronicNeutronAgent
    - OS::TripleO::Services::IronicPxe
    - OS::TripleO::Services::Iscsid
    - OS::TripleO::Services::Keepalived
    - OS::TripleO::Services::Kernel
    - OS::TripleO::Services::Keystone
    - OS::TripleO::Services::LoginDefs
    - OS::TripleO::Services::MasqueradeNetworks
    - OS::TripleO::Services::Memcached
    - OS::TripleO::Services::MistralApi
    - OS::TripleO::Services::MistralEngine
    - OS::TripleO::Services::MistralEventEngine
    - OS::TripleO::Services::MistralExecutor
    - OS::TripleO::Services::MySQL
    - OS::TripleO::Services::MySQLClient
    - OS::TripleO::Services::NeutronApi
    - OS::TripleO::Services::NeutronCorePlugin
    - OS::TripleO::Services::NeutronDhcpAgent
    - OS::TripleO::Services::NeutronL3Agent
    - OS::TripleO::Services::NeutronOvsAgent
    - OS::TripleO::Services::NovaApi
    - OS::TripleO::Services::NovaConductor
    - OS::TripleO::Services::NovaIronic
    - OS::TripleO::Services::NovaMetadata
    - OS::TripleO::Services::NovaPlacement
    - OS::TripleO::Services::NovaScheduler
    - OS::TripleO::Services::Novajoin
    - OS::TripleO::Services::OpenStackClients
    - OS::TripleO::Services::OsloMessagingNotify
    - OS::TripleO::Services::OsloMessagingRpc
    - OS::TripleO::Services::PankoApi
    - OS::TripleO::Services::Podman
    - OS::TripleO::Services::Redis
    - OS::TripleO::Services::Rhsm
    - OS::TripleO::Services::SELinux
    - OS::TripleO::Services::Sshd
    - OS::TripleO::Services::SwiftProxy
    - OS::TripleO::Services::SwiftRingBuilder
    - OS::TripleO::Services::SwiftStorage
    - OS::TripleO::Services::Tempest
    - OS::TripleO::Services::Timesync
    - OS::TripleO::Services::Timezone
    - OS::TripleO::Services::Tmpwatch
    - OS::TripleO::Services::TripleoFirewall
    - OS::TripleO::Services::TripleoUI
    - OS::TripleO::Services::Tuned
    - OS::TripleO::Services::UndercloudUpgrade
    - OS::TripleO::Services::TripleoValidations
    - OS::TripleO::Services::Zaqar

Comment 1 Bob Fournier 2019-04-24 23:35:29 UTC
This is the same issue being tracked in https://bugzilla.redhat.com/show_bug.cgi?id=1702685 - the network service is down on RHEL 8 after a reboot.  Making this a duplicate to have one place to track it.

*** This bug has been marked as a duplicate of bug 1702685 ***

Comment 2 Bob Fournier 2019-04-25 00:40:07 UTC
We'll probably have a separate fix for this, so removing it as a duplicate.

Comment 4 Emilien Macchi 2019-04-25 17:50:24 UTC
It seems I managed to work around it by restarting the network service. I suspect that openvswitch started after the network service.

Comment 5 Emilien Macchi 2019-04-25 18:05:51 UTC
cat /usr/lib/systemd/system/openvswitch.service

[Unit]
Description=Open vSwitch
Before=network.target network.service
After=network-pre.target ovsdb-server.service ovs-vswitchd.service
PartOf=network.target
Requires=ovsdb-server.service
Requires=ovs-vswitchd.service

[Service]
Type=oneshot
ExecStart=/bin/true
ExecReload=/usr/share/openvswitch/scripts/ovs-systemd-reload
ExecStop=/bin/true
RemainAfterExit=yes

[Install]
WantedBy=multi-user.target

OVS is correctly configured to start *before* the network service.
But the network service isn't configured to start at boot.
The bug was fixed on the overcloud with https://review.opendev.org/#/q/topic:bug/1823353+(status:open+OR+status:merged but I have not done the same for the undercloud yet.
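
For illustration only, here is a small sketch of how the ordering claim can be read off the unit file shown in this comment. It is a hand-rolled parser, not systemd's own, and it ignores details such as empty assignments resetting a list; the function names are made up for this example.

```python
def unit_ordering(unit_text):
    """Collect the Before= and After= entries from the [Unit] section."""
    ordering = {"Before": [], "After": []}
    section = None
    for raw in unit_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            continue
        if section != "Unit" or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key = key.strip()
        if key in ordering:
            ordering[key].extend(value.split())
    return ordering


def ordered_before(unit_text, other_unit):
    """True if the unit declares Before=other_unit."""
    return other_unit in unit_ordering(unit_text)["Before"]


# The [Unit] section of openvswitch.service as pasted in this comment.
OPENVSWITCH_UNIT = """\
[Unit]
Description=Open vSwitch
Before=network.target network.service
After=network-pre.target ovsdb-server.service ovs-vswitchd.service
PartOf=network.target
Requires=ovsdb-server.service
Requires=ovs-vswitchd.service
"""

print(ordered_before(OPENVSWITCH_UNIT, "network.service"))  # True
```

Note that Before= only constrains ordering when both units are started; it does not cause network.service to be started, which is consistent with the observation that the service also has to be enabled.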

So I went ahead and configured it:
# systemctl enable network
# reboot

Then the node became completely unreachable:
http://paste.openstack.org/show/749771/

Still digging...

Comment 6 Emilien Macchi 2019-04-25 18:09:22 UTC
OK, so maybe it was my environment: after a virsh reset of the undercloud I can SSH in and ping 192.168.24.1, and some bridges are up:

[stack@undercloud ~]$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 24:42:53:21:52:15 brd ff:ff:ff:ff:ff:ff
    inet 192.168.122.38/24 brd 192.168.122.255 scope global dynamic noprefixroute eth0
       valid_lft 3550sec preferred_lft 3550sec
    inet6 fe80::2642:53ff:fe21:5215/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master ovs-system state UP group default qlen 1000
    link/ether 24:42:53:21:52:16 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::2642:53ff:fe21:5216/64 scope link 
       valid_lft forever preferred_lft forever
4: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 42:b7:d7:e2:72:a3 brd ff:ff:ff:ff:ff:ff
5: br-int: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 3e:61:66:4a:ff:47 brd ff:ff:ff:ff:ff:ff
6: br-ctlplane: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether 24:42:53:21:52:16 brd ff:ff:ff:ff:ff:ff
    inet 192.168.24.1/24 brd 192.168.24.255 scope global br-ctlplane
       valid_lft forever preferred_lft forever
    inet 192.168.24.3/32 scope global br-ctlplane
       valid_lft forever preferred_lft forever
    inet 192.168.24.2/32 scope global br-ctlplane
       valid_lft forever preferred_lft forever
    inet6 fe80::2642:53ff:fe21:5216/64 scope link 
       valid_lft forever preferred_lft forever


Now trying to deploy an overcloud with it... Will report back.

Comment 7 Emilien Macchi 2019-04-25 18:23:29 UTC
I came to the conclusion that the network service needs to be enabled everywhere until we get os-net-config using NetworkManager; otherwise openvswitch-managed interfaces won't be started after a reboot.
My overcloud is still deploying now, let's see if it finishes correctly. I'll report back.

Comment 8 Emilien Macchi 2019-04-25 18:36:02 UTC
My overcloud is deploying fine, so the bottom line for me is that the network service needs to be enabled on the undercloud too, and that's all.
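
As a hedged sketch of how that conclusion could be checked on a node, the snippet below wraps "systemctl is-enabled". The helper names are hypothetical, the unit name "network" is taken from the commands earlier in this bug, and only the literal answer "enabled" is counted as "will start at boot", which is a deliberate simplification.

```python
import subprocess


def reports_enabled(stdout):
    """Interpret the text printed by `systemctl is-enabled <unit>`.

    Only the literal state "enabled" is treated as enabled at boot; other
    states (disabled, static, masked, ...) are treated as not enabled.
    """
    lines = stdout.strip().splitlines()
    return bool(lines) and lines[0].strip() == "enabled"


def unit_is_enabled(unit="network", run=subprocess.run):
    """Ask systemd whether the unit is enabled.

    `run` is injectable so the logic can be exercised without systemd.
    """
    result = run(
        ["systemctl", "is-enabled", unit],
        capture_output=True,
        text=True,
        check=False,  # a non-zero exit status just means "not enabled"
    )
    return reports_enabled(result.stdout)
```

Calling unit_is_enabled() on the undercloud would be expected to return False before the fix and True after it. This reports the boot-time configuration only, not whether the service is currently running.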

Comment 9 Bob Fournier 2019-05-07 18:36:34 UTC
I think the FixedInVersion is incorrect as openstack-tripleo-common-10.7.1-0.20190426083235.1988c18.el8ost was built on 4/26 but the fix we really need for this issue (https://review.opendev.org/#/c/656183/) merged on 4/29.
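
The date comparison in this comment can be sketched as follows. This assumes that the 14-digit field in the release string is a YYYYMMDDHHMMSS snapshot timestamp, which matches the "built on 4/26" statement but is not confirmed by the bug itself. The merge date is taken from the comment (4/29, assumed to be in 2019), and the function name is made up for this example.

```python
import re
from datetime import date, datetime


def snapshot_time(nvr):
    """Extract the 14-digit timestamp (assumed YYYYMMDDHHMMSS) from a release string."""
    match = re.search(r"\.(\d{14})\.", nvr)
    if match is None:
        raise ValueError(f"no snapshot timestamp found in {nvr!r}")
    return datetime.strptime(match.group(1), "%Y%m%d%H%M%S")


NVR = "openstack-tripleo-common-10.7.1-0.20190426083235.1988c18.el8ost"
FIX_MERGED = date(2019, 4, 29)  # "merged on 4/29" per the comment; year assumed

snapshot = snapshot_time(NVR)
print(snapshot.isoformat())            # 2019-04-26T08:32:35
print(snapshot.date() < FIX_MERGED)    # True: the package predates the merge
```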

Comment 11 Sasha Smolyak 2019-05-12 12:51:07 UTC
Tested and verified. 

Before reboot:
>ip a

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master ovs-system state UP group default qlen 1000
    link/ether 52:54:00:1c:65:e5 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::5054:ff:fe1c:65e5/64 scope link 
       valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:e4:71:4d brd ff:ff:ff:ff:ff:ff
    inet 172.16.0.14/24 brd 172.16.0.255 scope global dynamic noprefixroute eth1
       valid_lft 2504sec preferred_lft 2504sec
    inet6 fe80::5054:ff:fee4:714d/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever
4: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:33:39:70 brd ff:ff:ff:ff:ff:ff
    inet 10.0.0.44/24 brd 10.0.0.255 scope global dynamic noprefixroute eth2
       valid_lft 2592sec preferred_lft 2592sec
    inet6 2620:52:0:13b8::fe:29/128 scope global dynamic noprefixroute 
       valid_lft 3029sec preferred_lft 3029sec
    inet6 fe80::5054:ff:fe33:3970/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever
5: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 1a:fa:f0:c0:c1:f9 brd ff:ff:ff:ff:ff:ff
6: br-ctlplane: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether 52:54:00:1c:65:e5 brd ff:ff:ff:ff:ff:ff
    inet 192.168.24.1/24 brd 192.168.24.255 scope global br-ctlplane
       valid_lft forever preferred_lft forever
    inet 192.168.24.3/32 scope global br-ctlplane
       valid_lft forever preferred_lft forever
    inet 192.168.24.2/32 scope global br-ctlplane
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:fe1c:65e5/64 scope link 
       valid_lft forever preferred_lft forever
7: br-int: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 0a:2d:2b:13:83:49 brd ff:ff:ff:ff:ff:ff

After reboot:
>ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master ovs-system state UP group default qlen 1000
    link/ether 52:54:00:1c:65:e5 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::5054:ff:fe1c:65e5/64 scope link 
       valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:e4:71:4d brd ff:ff:ff:ff:ff:ff
    inet 172.16.0.14/24 brd 172.16.0.255 scope global dynamic noprefixroute eth1
       valid_lft 3104sec preferred_lft 3104sec
    inet6 fe80::5054:ff:fee4:714d/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever
4: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:33:39:70 brd ff:ff:ff:ff:ff:ff
    inet 10.0.0.44/24 brd 10.0.0.255 scope global dynamic noprefixroute eth2
       valid_lft 3104sec preferred_lft 3104sec
    inet6 2620:52:0:13b8::fe:29/128 scope global dynamic noprefixroute 
       valid_lft 3106sec preferred_lft 3106sec
    inet6 fe80::5054:ff:fe33:3970/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever
5: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether b6:6b:a6:a4:e6:8f brd ff:ff:ff:ff:ff:ff
6: br-int: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 0a:2d:2b:13:83:49 brd ff:ff:ff:ff:ff:ff
8: br-ctlplane: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether 52:54:00:1c:65:e5 brd ff:ff:ff:ff:ff:ff
    inet 192.168.24.1/24 brd 192.168.24.255 scope global br-ctlplane
       valid_lft forever preferred_lft forever
    inet 192.168.24.3/32 scope global br-ctlplane
       valid_lft forever preferred_lft forever
    inet 192.168.24.2/32 scope global br-ctlplane
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:fe1c:65e5/64 scope link 
       valid_lft forever preferred_lft forever

Comment 18 errata-xmlrpc 2019-09-21 11:21:34 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:2811

