Bug 1952448

Summary:	Switch from Managed to Disabled mode: no IP removed from configuration and no container metal3-static-ip-manager stopped
Product:	OpenShift Container Platform	Reporter:	Oleg Sher <osher>
Component:	Bare Metal Hardware Provisioning	Assignee:	sdasu
Bare Metal Hardware Provisioning sub component:	cluster-baremetal-operator	QA Contact:	Aleksandra Malykhin <amalykhi>
Status:	CLOSED ERRATA	Docs Contact:
Severity:	medium
Priority:	medium	CC:	amalykhi, aos-bugs, rbartal
Version:	4.8	Keywords:	Triaged
Target Milestone:	---
Target Release:	4.8.0
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	No Doc Update
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2021-07-27 23:02:52 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Oleg Sher 2021-04-22 09:45:41 UTC

Version:

$ openshift-install version
12:25:44 workspace > oc get clusterversion
NAME      VERSION      AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.0-fc.0   True        False         42h     Cluster version is 4.8.0-fc.0

Platform:
libvirt

Please specify:
* IPI

What happened (steps to reproduce)?
1. Deploy a disconnected cluster with the provisioning network in Managed mode (in this case: 3 workers provisioned, 0 deployed).
2. Switch the provisioning network from Managed to Disabled mode:
2.0 Get the current configuration: oc get provisioning -o yaml > original.yaml
2.1 Create set_disabled_mode.yaml from original.yaml with the following values:
    name: provisioning-configuration
    resourceVersion: "392077"
    uid: 373cb2fd-af6c-4cd1-ba07-c72a0258a5a7
  spec:
    provisioningNetwork: Disabled
    provisioningOSDownloadURL: http://registry.ocp-edge-cluster-0.qe.lab.redhat.com:8080/images/rhcos-48.83.202103221318-0-openstack.x86_64.qcow2.gz?sha256=10f55ea6f71d4dc382183597f9360aad6c6551fcc94aa100bbdadaecfe888452
Note the removed lines:
    provisioningDHCPRange: fd00:1101:0:1::a,fd00:1101:0:1:ffff:ffff:ffff:fffe
    provisioningIP: fd00:1101:0:1::3
    provisioningInterface: enp4s0
    provisioningNetwork: Managed
    provisioningNetworkCIDR: fd00:1101:0:1::/64

2.2 Apply it: oc apply -f set_disabled_mode.yaml
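The manual edit in step 2.1 can also be scripted. Below is a minimal Python sketch, assuming the CR is exported as JSON (for example `oc get provisioning provisioning-configuration -o json > original.json`) instead of YAML, so that only the standard library is needed; the helper name `to_disabled` is hypothetical. It drops the four provisioning-network fields listed above, sets provisioningNetwork to Disabled, and leaves every other field (including provisioningOSDownloadURL and the metadata) untouched.

```python
import copy

# Fields removed from the spec in step 2.1 when switching to Disabled mode.
NETWORK_FIELDS = (
    "provisioningDHCPRange",
    "provisioningIP",
    "provisioningInterface",
    "provisioningNetworkCIDR",
)


def to_disabled(obj: dict) -> dict:
    """Return a copy of a Provisioning object, or a v1 List of them, in Disabled mode.

    Hypothetical helper: only the spec is modified; metadata is copied as-is.
    """
    result = copy.deepcopy(obj)  # never mutate the caller's object
    # `oc get provisioning -o json` without a resource name returns a List wrapper.
    items = result["items"] if result.get("kind") == "List" else [result]
    for item in items:
        spec = item.setdefault("spec", {})
        for field in NETWORK_FIELDS:
            spec.pop(field, None)  # ignore fields that are already absent
        spec["provisioningNetwork"] = "Disabled"
    return result
```

Assumed workflow: load original.json with `json.load`, pass it to `to_disabled`, write the result to set_disabled_mode.json with `json.dump`, then run `oc apply -f set_disabled_mode.json`.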

What did you expect to happen?
1. The system should stop two containers: metal3-static-ip-manager and metal3-dnsmasq.
2. The provisioning network configuration should be removed.

Actual Result:

metal3-static-ip-manager was only restarted and is still running, and none of the provisioning network configuration was removed.
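The expected result can be checked mechanically against the pod spec. A minimal Python sketch, assuming the metal3 pod has been exported with `oc get pod <metal3-pod-name> -n openshift-machine-api -o json`; the helper name `still_present` is hypothetical, and the two container names are the ones given in the expected result above.

```python
import json

# Containers that the report expects to disappear in Disabled mode.
EXPECTED_STOPPED = ("metal3-static-ip-manager", "metal3-dnsmasq")


def still_present(pod_json: str) -> list[str]:
    """Return the expected-to-stop containers that are still defined in the pod."""
    pod = json.loads(pod_json)
    names = {c["name"] for c in pod.get("spec", {}).get("containers", [])}
    return [name for name in EXPECTED_STOPPED if name in names]
```

An empty list corresponds to the expected result; any name returned is a container that was not removed.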

Comment 1 sdasu 2021-05-03 19:15:34 UTC

When the Provisioning CR contains only the Provisioning Network and Provisioning OS Download URL fields, the metal3 pod ends up with 8 containers, which is the expected behavior.

Spec:
  Provisioning Network:          Disabled
  Provisioning OS Download URL:  http://192.168.111.1/images/rhcos-48.83.202103221318-0-openstack.x86_64.qcow2.gz?sha256=323e7ba4ba3448e340946543c963823136e1367ed0b229d2a05e1cf537642bb8

[stack@localhost dev-scripts]$ oc get pods -n openshift-machine-api
NAME                                           READY   STATUS    RESTARTS   AGE
cluster-autoscaler-operator-68ff977bd5-q5k6l   2/2     Running   0          60m
cluster-baremetal-operator-846d767c44-lph69    2/2     Running   0          60m
machine-api-controllers-c6fb94c57-8lnlp        7/7     Running   1          54m
machine-api-operator-868d49f997-llzhc          2/2     Running   0          60m
metal3-5f476b595b-tj872                        8/8     Running   0          3m34s
metal3-image-cache-5xzjw                       1/1     Running   0          52m
metal3-image-cache-hlx8s                       1/1     Running   0          52m
metal3-image-cache-kslhq                       1/1     Running   0          52m

However, when the Provisioning CR is edited to change only the Provisioning Network from Managed to Disabled (all other fields left intact), 9 containers are active after the metal3 pod terminates and restarts. So the description does not accurately state the conditions under which this error is seen.
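Because the two cases in this comment differ in the metal3 pod's READY column (8/8 versus 9 containers), a small parser for `oc get pods -n openshift-machine-api` output makes the comparison repeatable. A minimal Python sketch; the helper name `metal3_ready` is hypothetical, and the pod-name pattern (metal3-<hash>-<suffix>, which excludes the metal3-image-cache-* pods) is an assumption based on the listings in this report.

```python
import re

# Matches the metal3 Deployment pod (e.g. metal3-5f476b595b-tj872) but not
# metal3-image-cache-* pods, whose names contain extra dash-separated parts.
METAL3_POD = re.compile(r"^metal3-[a-z0-9]+-[a-z0-9]+$")


def metal3_ready(pods_output: str) -> tuple[int, int]:
    """Return (ready, total) containers for the metal3 pod from `oc get pods` text."""
    for line in pods_output.splitlines():
        cols = line.split()
        if len(cols) >= 2 and METAL3_POD.match(cols[0]):
            ready, total = cols[1].split("/")
            return int(ready), int(total)
    raise ValueError("no metal3 pod found in output")
```

If a rollout is still in progress, both the old and the new metal3 pod may be listed; this sketch returns the first match, so it is best run after the old pod has terminated.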

Comment 6 Aleksandra Malykhin 2021-06-29 06:32:34 UTC

Verified on OCP 4.8.0-rc.1.

1. Verify that the metal3 pod has 10/10 containers running
[kni@provisionhost-0-0 ~]$ oc get pods -n openshift-machine-api
...
metal3-64fdf54f4d-26tkn 10/10 Running 0 50m

2. Save the config file
[kni@provisionhost-0-0 ~]$ oc get provisioning -o yaml > new_disabled_mode.yaml

3. Remove the provisioningDHCPRange, provisioningIP, provisioningInterface, and provisioningNetworkCIDR lines from the config file, and change provisioningNetwork to Disabled.
The spec should look like this:
spec:
  provisioningNetwork: Disabled
  provisioningOSDownloadURL: http://registry.ocp-edge-cluster-0.qe.lab.redhat.com:8080/images/rhcos-48.84.202106091622-0-openstack.x86_64.qcow2.gz?sha256=2efc7539f200ffea150272523a9526ba393a9a0b8312b40031b13bfdeda36fde

4. Apply the new config file
[kni@provisionhost-0-0 ~]$ oc apply -f set_disabled_mode.yaml
provisioning.metal3.io/provisioning-configuration configured

5. Check the pod status (the metal3 pod now has only 8/8 containers running)
[kni@provisionhost-0-0 ~]$ oc get pods -n openshift-machine-api
NAME                      READY   STATUS    RESTARTS   AGE
...
metal3-76c6758645-5l5zc   8/8     Running   0          81s

6. Verify the config file
[kni@provisionhost-0-0 ~]$ oc get provisioning -o yaml
...
spec:
  provisioningNetwork: Disabled
  provisioningOSDownloadURL: http://registry.ocp-edge-cluster-0.qe.lab.redhat.com:8080/images/rhcos-48.84.202106091622-0-openstack.x86_64.qcow2.gz?sha256=2efc7539f200ffea150272523a9526ba393a9a0b8312b40031b13bfdeda36fde
...
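Step 6 can be automated as well. A minimal Python sketch, assuming the output of `oc get provisioning -o json` (a v1 List) is passed in as a string; the helper name `check_disabled` is hypothetical. It reports any item whose provisioningNetwork is not Disabled or that still carries one of the fields removed in step 3.

```python
import json

# Fields that step 3 removes from the spec.
NETWORK_FIELDS = (
    "provisioningDHCPRange",
    "provisioningIP",
    "provisioningInterface",
    "provisioningNetworkCIDR",
)


def check_disabled(raw_json: str) -> list[str]:
    """Return a list of problems; an empty list means the spec matches step 3."""
    obj = json.loads(raw_json)
    # Accept either a List wrapper or a single Provisioning object.
    items = obj["items"] if obj.get("kind") == "List" else [obj]
    problems = []
    for item in items:
        name = item.get("metadata", {}).get("name", "?")
        spec = item.get("spec", {})
        mode = spec.get("provisioningNetwork")
        if mode != "Disabled":
            problems.append(f"{name}: provisioningNetwork is {mode!r}")
        for field in NETWORK_FIELDS:
            if field in spec:
                problems.append(f"{name}: {field} still set")
    return problems
```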

Comment 8 errata-xmlrpc 2021-07-27 23:02:52 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438