Bug 1773642

Summary: OpenStack Director should actively prevent cloud-init from modifying network config on overcloud nodes after first boot.
Product: Red Hat OpenStack
Reporter: Matt Flusche <mflusche>
Component: tripleo-ansible
Assignee: Harald Jensås <hjensas>
Status: CLOSED NEXTRELEASE
QA Contact: David Rosenfeld <drosenfe>
Severity: high
Priority: high
Version: 13.0 (Queens)
CC: afariasa, akaris, alisci, apetrich, augol, bfournie, dhill, hbrock, hjensas, jparker, jslagle, kecarter, mburns, pweeks, sbaker
Target Milestone: zstream
Keywords: Triaged
Target Release: 17.0
Hardware: x86_64
OS: Linux
Fixed In Version: tripleo-ansible-3.3.1-0.20220204021043.c195ba1.el9ost
Last Closed: 2022-03-11 14:59:51 UTC
Type: Bug
Bug Blocks: 2063235

Description Matt Flusche 2019-11-18 15:42:10 UTC
Description of problem:
Because of the issues described in https://bugzilla.redhat.com/show_bug.cgi?id=1760806, the following setting should be applied during the overcloud deployment to prevent cloud-init from rewriting network config data during any subsequent reboots of overcloud nodes:

  echo "network: {config: disabled}" > /etc/cloud/cloud.cfg.d/99-disable-network-config.cfg

Version-Release number of selected component (if applicable):
Current versions; consider backporting to all supported versions.

How reproducible:
Unknown

Steps to Reproduce:
1. See the following for additional discussion:
https://bugzilla.redhat.com/show_bug.cgi?id=1760806
https://bugzilla.redhat.com/show_bug.cgi?id=1761363#c2

Actual results:
In some situations, cloud-init may rewrite the network interface config on active clouds during a reboot.

Expected results:
cloud-init should not operate on the network config after initial boot.


Additional info:

Comment 1 Bob Fournier 2020-07-29 20:31:30 UTC
*** Bug 1760806 has been marked as a duplicate of this bug. ***

Comment 2 pweeks 2020-12-04 18:35:25 UTC
DF RFE scrub, moving over for consideration.

Comment 6 Kevin Carter 2021-07-14 12:25:29 UTC
I ran into this issue with the current head of TripleO. To correct it, I had to add the following nested options to the baremetal configuration file: `config_drive` > `cloud_config` > `network` > `config: disabled`.


Example entry

- name: Compute
  count: 1
  defaults:
    profile: compute
    networks:
    - network: tenant
      subnet: tenant_subnet
    - network: storage
      subnet: storage_subnet
    - network: storage_mgmt
      subnet: storage_mgmt_subnet
    - network: internal_api
      subnet: internal_api_subnet
    - network: external
      subnet: external_subnet
    - network: management
      subnet: management_subnet
    config_drive:
      cloud_config:
        network:
          config: disabled
    network_config:
      template: /home/centos/dual-nic-multi-vlan.yaml.j2
      default_route_network:
      - external

Without the `config_drive` entries, cloud-init restarts network interfaces after OVS is up, which causes the OVS integration device to go into a down state after cloud-init fails to rename the device. I think it's safe to say that we will require this entry for baremetal provisioning via metalsmith, which may make this a documentation bug, as I'm not sure we should enforce config drive behind the scenes.

Comment 14 Steve Baker 2022-01-18 21:32:50 UTC
We're thinking this should really be a bug for OSP-17, and the fix will either be to disable cloud-config network config during the overcloud image build, or in the provision command only when network_config is specified.

Comment 15 Harald Jensås 2022-01-19 01:34:46 UTC
So, I tried the workaround:

    config_drive:
      cloud_config:
        network:
          config: disabled

Interestingly, on boot cloud-init did configure networking:

[root@controller-0 ~]# cat /etc/sysconfig/network-scripts/ifcfg-ens3
# Created by cloud-init on instance boot automatically, do not edit.
#
BOOTPROTO=dhcp
DEVICE=ens3
DHCPV6C=yes
HWADDR=fa:16:3e:a6:86:55
IPV6INIT=yes
IPV6_AUTOCONF=no
IPV6_FORCE_ACCEPT_RA=yes
MTU=1442
ONBOOT=yes
TYPE=Ethernet
USERCTL=no

Looking at the user_data, `"network": {"config": "disabled"}` is in there.

[root@controller-0 ~]# mount /dev/vda1 /mnt/
mount: /mnt: WARNING: device write-protected, mounted read-only.
[root@controller-0 ~]# cd /mnt/
[root@controller-0 mnt]# ls
openstack
[root@controller-0 mnt]# cd openstack/
[root@controller-0 openstack]# ls
2012-08-10  latest
[root@controller-0 openstack]# cd latest/
[root@controller-0 latest]# ls
meta_data.json  network_data.json  user_data
[root@controller-0 latest]# cat user_data 
#cloud-config
{"network": {"config": "disabled"}, "users": [{"name": "heat-admin", "groups": ["wheel"], "sudo": "ALL=(ALL) NOPASSWD:ALL", "ssh_authorized_keys": "ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAAAgQDRnacvBJZ4QVAt9Y3+VOmM/K8/z4v+OJ6xOHcWOkVfDgnaXPTNBB3i9PyiBFvp3WPJuvAtvwhcA7eHsuDyqUHf6hrIm6SpLxBQHvYVQAJ18mb8E4uJcpd++DKuPDx/HlGCo8dRRDcYtFWF0//QPEgE7m7zKHvbWY+b1vSqO+LP7Q== root "}]}

Comment 16 Harald Jensås 2022-01-19 01:50:00 UTC
Following up on my previous comment: adding `"network": {"config": "disabled"}` to user_data is not listed in the cloud-init documentation [1] as a valid method to disable network config.

[1] https://cloudinit.readthedocs.io/en/latest/topics/network-config.html#disabling-network-configuration
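
As a rough way to check a node, assuming the documented methods are a `network: {config: disabled}` entry in a file under /etc/cloud/cloud.cfg.d/ or `network-config=disabled` on the kernel command line, something like the sketch below could be used. The function name and its two path arguments are hypothetical, and the file check is a plain text match rather than a YAML parse, so a positive result is only a hint.

```shell
#!/bin/sh
# Sketch only: report whether cloud-init network config appears to be disabled.
# cloud_init_network_disabled is a hypothetical helper, not part of cloud-init.
# Both paths are arguments so the check can be pointed at a test directory.
cloud_init_network_disabled() {
    cfg_dir="${1:-/etc/cloud/cloud.cfg.d}"
    cmdline_file="${2:-/proc/cmdline}"

    # Kernel command line method.
    if grep -qs -- 'network-config=disabled' "$cmdline_file"; then
        return 0
    fi

    # System config method. This is a text match, not a YAML parse: it accepts
    # both the inline form and the two-line block form, but it can also match
    # an unrelated "config: disabled" key or a commented-out line.
    if grep -Eqs -- 'config:[[:space:]]*disabled' "$cfg_dir"/*.cfg; then
        return 0
    fi

    return 1
}
```

Usage: `cloud_init_network_disabled && echo "cloud-init network config looks disabled"`. It deliberately does not look at user_data on the config drive.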

Comment 17 Harald Jensås 2022-01-19 04:12:12 UTC
Created attachment 1851781
Screenshot - Node deployed with 99-disable-network-config.cfg

File `99-disable-network-config.cfg` added in /etc/cloud/cloud.cfg.d on the overcloud-full image with content:

  network:
    config: disabled

This did disable network configuration in cloud-init.

But, as can be seen in the screenshot, it also resulted in no network connectivity to the provisioned node.

Conclusion: we need cloud-init for the initial configuration.
We should disable it after successfully applying the network configuration.
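
A minimal sketch of that ordering: write the drop-in file from the bug description only once the node's real network configuration has been applied. The function name and the directory argument are hypothetical, and this is not necessarily how the eventual tripleo-ansible fix works.

```shell
#!/bin/sh
# Sketch only: disable cloud-init network config AFTER the node's own network
# configuration has been applied successfully, never in the image itself.
# disable_cloud_init_network is a hypothetical helper name; the directory is
# an argument so the function can be exercised against a scratch path.
disable_cloud_init_network() {
    cfg_dir="${1:-/etc/cloud/cloud.cfg.d}"
    cfg_file="$cfg_dir/99-disable-network-config.cfg"

    mkdir -p "$cfg_dir" || return 1
    # Same content as the echo command in the bug description.
    printf '%s\n' 'network: {config: disabled}' > "$cfg_file"
}
```

Calling `disable_cloud_init_network` as root at the end of a successful network configuration step would leave first-boot behaviour unchanged and only affect later reboots.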

Comment 18 Steve Baker 2022-01-25 20:45:20 UTC
*** Bug 2044544 has been marked as a duplicate of this bug. ***

Comment 21 Red Hat Bugzilla 2023-09-18 00:18:30 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days