Bug 1773642 - Openstack Director should actively prevent cloud-init from modifying network config on overcloud nodes after first boot.
Keywords:
Status: CLOSED NEXTRELEASE
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: tripleo-ansible
Version: 13.0 (Queens)
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: zstream
Target Release: 17.0
Assignee: Harald Jensås
QA Contact: David Rosenfeld
URL:
Whiteboard:
Duplicates: 1760806 2044544
Depends On:
Blocks: 2063235
Reported: 2019-11-18 15:42 UTC by Matt Flusche
Modified: 2024-10-01 16:23 UTC (History)

Fixed In Version: tripleo-ansible-3.3.1-0.20220204021043.c195ba1.el9ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 2063235
Environment:
Last Closed: 2022-03-11 14:59:51 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Launchpad | 1958332 | 0 | None | None | None | 2022-01-19 05:04:24 UTC
OpenStack gerrit | 825278 | 0 | None | NEW | Disable cloud-init netconfig post os-net-config | 2022-01-19 09:28:38 UTC
Red Hat Issue Tracker | OSP-2781 | 0 | None | None | None | 2022-01-18 21:34:34 UTC
Red Hat Knowledge Base (Solution) | 4871261 | 0 | None | None | None | 2021-06-24 13:04:50 UTC

Description Matt Flusche 2019-11-18 15:42:10 UTC
Description of problem:
This request follows from the issues described here: https://bugzilla.redhat.com/show_bug.cgi?id=1760806

During the overcloud deployment, the following setting should be applied to prevent cloud-init from re-writing network config data during any subsequent reboots of overcloud nodes.

  echo "network: {config: disabled}" > /etc/cloud/cloud.cfg.d/99-disable-network-config.cfg
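
A minimal sketch of the same write, made safe to run more than once. The function name and the optional directory argument are assumptions added for illustration (the argument lets the function be tried against a scratch directory); the file name and content are the ones given above.

```shell
#!/bin/sh
# Sketch only: write the cloud-init drop-in that disables network config.
# disable_cloud_init_network and its directory argument are hypothetical
# names; the default path is the one from this report.
disable_cloud_init_network() {
    cfg_dir="${1:-/etc/cloud/cloud.cfg.d}"
    cfg_file="$cfg_dir/99-disable-network-config.cfg"
    content='network: {config: disabled}'

    # Nothing to do if the file already holds the expected content.
    if [ -f "$cfg_file" ] && [ "$(cat "$cfg_file")" = "$content" ]; then
        return 0
    fi

    mkdir -p "$cfg_dir" &&
        printf '%s\n' "$content" > "$cfg_file"
}
```

Calling `disable_cloud_init_network` with no argument targets /etc/cloud/cloud.cfg.d and needs root; passing a temporary directory allows a dry run.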

Version-Release number of selected component (if applicable):
Current versions; please consider backporting to all supported versions.

How reproducible:
Unknown

Steps to Reproduce:
1. See the following for additional discussion:
https://bugzilla.redhat.com/show_bug.cgi?id=1760806
https://bugzilla.redhat.com/show_bug.cgi?id=1761363#c2

Actual results:
In some situations, cloud-init may rewrite the network interface config on active clouds during a reboot.

Expected results:
cloud-init should not operate on the network config after initial boot.


Additional info:

Comment 1 Bob Fournier 2020-07-29 20:31:30 UTC
*** Bug 1760806 has been marked as a duplicate of this bug. ***

Comment 2 pweeks 2020-12-04 18:35:25 UTC
DF RFE scrub, moving over for consideration.

Comment 6 Kevin Carter 2021-07-14 12:25:29 UTC
I ran into this issue with the current head of TripleO. To correct it, I had to add the following option to the baremetal configuration file: `config_drive: > cloud_config: > network: > config: disabled`.


Example entry

- name: Compute
  count: 1
  defaults:
    profile: compute
    networks:
    - network: tenant
      subnet: tenant_subnet
    - network: storage
      subnet: storage_subnet
    - network: storage_mgmt
      subnet: storage_mgmt_subnet
    - network: internal_api
      subnet: internal_api_subnet
    - network: external
      subnet: external_subnet
    - network: management
      subnet: management_subnet
    config_drive:
      cloud_config:
        network:
          config: disabled
    network_config:
      template: /home/centos/dual-nic-multi-vlan.yaml.j2
      default_route_network:
      - external

Without the `config_drive` entries, cloud-init restarts the network interfaces after OVS is up. cloud-init then fails to rename the device, which leaves the OVS integration device in a down state. I think it's safe to say that we will require this entry for baremetal provisioning via metalsmith, which may make this a documentation bug, as I'm not sure we should enforce config drive behind the scenes.

Comment 14 Steve Baker 2022-01-18 21:32:50 UTC
We're thinking this should really be a bug for OSP-17. The fix will be to disable cloud-config network config either during the overcloud image build or in the provision command, only when network_config is specified.

Comment 15 Harald Jensås 2022-01-19 01:34:46 UTC
So, I tried the workaround:

    config_drive:
      cloud_config:
        network:
          config: disabled

Interestingly, on boot cloud-init did configure networking:

[root@controller-0 ~]# cat /etc/sysconfig/network-scripts/ifcfg-ens3
# Created by cloud-init on instance boot automatically, do not edit.
#
BOOTPROTO=dhcp
DEVICE=ens3
DHCPV6C=yes
HWADDR=fa:16:3e:a6:86:55
IPV6INIT=yes
IPV6_AUTOCONF=no
IPV6_FORCE_ACCEPT_RA=yes
MTU=1442
ONBOOT=yes
TYPE=Ethernet
USERCTL=no

Looking at the user_data, "network": {"config": "disabled"} is in there:

[root@controller-0 ~]# mount /dev/vda1 /mnt/
mount: /mnt: WARNING: device write-protected, mounted read-only.
[root@controller-0 ~]# cd /mnt/
[root@controller-0 mnt]# ls
openstack
[root@controller-0 mnt]# cd openstack/
[root@controller-0 openstack]# ls
2012-08-10  latest
[root@controller-0 openstack]# cd latest/
[root@controller-0 latest]# ls
meta_data.json  network_data.json  user_data
[root@controller-0 latest]# cat user_data 
#cloud-config
{"network": {"config": "disabled"}, "users": [{"name": "heat-admin", "groups": ["wheel"], "sudo": "ALL=(ALL) NOPASSWD:ALL", "ssh_authorized_keys": "ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAAAgQDRnacvBJZ4QVAt9Y3+VOmM/K8/z4v+OJ6xOHcWOkVfDgnaXPTNBB3i9PyiBFvp3WPJuvAtvwhcA7eHsuDyqUHf6hrIm6SpLxBQHvYVQAJ18mb8E4uJcpd++DKuPDx/HlGCo8dRRDcYtFWF0//QPEgE7m7zKHvbWY+b1vSqO+LP7Q== root "}]}

Comment 16 Harald Jensås 2022-01-19 01:50:00 UTC
Following up on my previous comment: adding "network": {"config": "disabled"} to user_data is not listed in the cloud-init documentation [1] as a valid method of disabling network config.

[1] https://cloudinit.readthedocs.io/en/latest/topics/network-config.html#disabling-network-configuration
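
For reference, that page describes two places where the setting is honored: the kernel command line parameter `network-config=disabled`, and `network: {config: disabled}` in the system configuration under /etc/cloud. A rough way to check a node for either is sketched below. The function name and both arguments are assumptions for illustration, and the file check is a text heuristic rather than a YAML parser: it looks only at the cloud.cfg.d drop-ins, not /etc/cloud/cloud.cfg, and ignores cloud-init's merge order.

```shell
#!/bin/sh
# Heuristic sketch: return 0 if one of the documented disable mechanisms
# appears to be in place. cloud_init_network_disabled is a hypothetical name;
# the arguments default to the live system paths.
cloud_init_network_disabled() {
    cmdline_file="${1:-/proc/cmdline}"
    cfg_dir="${2:-/etc/cloud/cloud.cfg.d}"

    # Method 1: exact kernel command line parameter.
    if [ -r "$cmdline_file" ] &&
        tr -s ' ' '\n' < "$cmdline_file" | grep -qx 'network-config=disabled'; then
        return 0
    fi

    # Method 2: a drop-in with a top-level "network" key whose "config" is
    # "disabled", written in either flow style or block style.
    for f in "$cfg_dir"/*.cfg; do
        [ -f "$f" ] || continue
        if awk '
            /^network:[ \t]*[{][ \t]*config:[ \t]*disabled[ \t]*[}]/ { found = 1 }
            /^network:[ \t]*$/ { in_net = 1; next }
            in_net && /^[ \t]+config:[ \t]*disabled[ \t]*$/ { found = 1 }
            /^[^ \t]/ { in_net = 0 }
            END { exit !found }
        ' "$f"; then
            return 0
        fi
    done
    return 1
}
```

Note that the config drive user_data is not one of the locations checked, which matches the behavior seen in the previous comment.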

Comment 17 Harald Jensås 2022-01-19 04:12:12 UTC
Created attachment 1851781 [details]
Screenshot - Node deployed with 99-disable-network-config.cfg

Added the file `99-disable-network-config.cfg` to /etc/cloud/cloud.cfg.d on the overcloud-full image, with this content:

  network:
    config: disabled

This did disable network configuration in cloud-init.

But as can be seen in the screenshot, it also resulted in no network connectivity to the provisioned node.

Conclusion: we need cloud-init for the initial configuration.
We should disable it after successfully applying the network configuration.
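
A minimal sketch of that ordering, assuming the network is applied by some command passed in by the caller. `apply_then_disable` and its arguments are hypothetical names, and this is not the tripleo-ansible change itself; it only illustrates writing the drop-in after, and only after, the apply step succeeds.

```shell
#!/bin/sh
# Sketch only: run a network-apply command, then disable cloud-init network
# config if (and only if) that command exits successfully.
# Usage: apply_then_disable <cloud.cfg.d directory> <command> [args...]
apply_then_disable() {
    cfg_dir="$1"
    shift

    # Leave cloud-init untouched when the apply step fails, so the node keeps
    # whatever connectivity cloud-init gave it on first boot.
    if ! "$@"; then
        echo "network apply failed; cloud-init network config left enabled" >&2
        return 1
    fi

    mkdir -p "$cfg_dir" &&
        printf '%s\n' 'network: {config: disabled}' \
            > "$cfg_dir/99-disable-network-config.cfg"
}
```

On a real node the command would be whatever os-net-config invocation the deployment already runs, and the directory would be /etc/cloud/cloud.cfg.d.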

Comment 18 Steve Baker 2022-01-25 20:45:20 UTC
*** Bug 2044544 has been marked as a duplicate of this bug. ***

Comment 21 Red Hat Bugzilla 2023-09-18 00:18:30 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days

