Bug 2053309
| Summary: | Unicast mode change upgrade check not working | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Ben Nemec <bnemec> |
| Component: | Networking | Assignee: | Douglas Schilling Landgraf <dougsland> |
| Networking sub component: | runtime-cfg | QA Contact: | Victor Voronkov <vvoronko> |
| Status: | CLOSED WONTFIX | Docs Contact: | |
| Severity: | medium | | |
| Priority: | medium | CC: | adpawar, dougsland, jima, vvoronko |
| Version: | 4.10 | | |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2024-04-30 18:04:53 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Ben Nemec
2022-02-10 22:25:45 UTC
For testing, make sure the test machine runs RHEL 8.5 or higher and has Go installed. Here are the steps to create the environment.
1) Check the system
$ cat /etc/redhat-release
Red Hat Enterprise Linux release 8.5 (Ootpa)
2) Check the Go version
$ go version
go version go1.17.1 linux/amd64
3) Download the code for creating the test scenario
$ mkdir -p ~/go/src/github.com/openshift
$ cd ~/go/src/github.com/openshift
$ git clone https://github.com/openshift/baremetal-runtimecfg.git
Create a symbolic link in the home directory:
$ cd ~
$ ln -s ~/go/src/github.com/openshift/baremetal-runtimecfg baremetal-runtimecfg
4) Download machine-config-operator and make a small change to it
$ cd ~/go/src/github.com/openshift
$ git clone https://github.com/openshift/machine-config-operator
$ pushd .
$ cd machine-config-operator
Make the following changes to disable unicast by default:
$ git diff
diff --git a/manifests/on-prem/keepalived.yaml b/manifests/on-prem/keepalived.yaml
index 4d9dab8a..6c3871bc 100644
--- a/manifests/on-prem/keepalived.yaml
+++ b/manifests/on-prem/keepalived.yaml
@@ -107,7 +107,7 @@ spec:
     image: {{ .Images.BaremetalRuntimeCfgBootstrap }}
     env:
       - name: ENABLE_UNICAST
-        value: "yes"
+        value: "no"
       - name: IS_BOOTSTRAP
         value: "yes"
     command:
diff --git a/templates/common/on-prem/files/keepalived.yaml b/templates/common/on-prem/files/keepalived.yaml
index c7166388..61ff0b75 100644
--- a/templates/common/on-prem/files/keepalived.yaml
+++ b/templates/common/on-prem/files/keepalived.yaml
@@ -153,7 +153,7 @@ contents:
     image: {{ .Images.baremetalRuntimeCfgImage }}
     env:
       - name: ENABLE_UNICAST
-        value: "yes"
+        value: "no"
       - name: IS_BOOTSTRAP
         value: "no"
$ popd
Create a symbolic link in the home directory:
$ cd ~
$ ln -s ~/go/src/github.com/openshift/machine-config-operator machine-config-operator
5) Download dev-scripts to create the OpenShift bare-metal environment
NOTE: On the machine I am working on, I had to move dev-scripts to /home/ because I needed more space:
$ cd /home
$ ls git/
drwxr-xr-x. 5 douglas douglas 75 Apr 14 15:07 git
$ cd git/
$ git clone https://github.com/openshift-metal3/dev-scripts.git
Create a symbolic link in the home directory:
$ cd ~
$ ln -s /home/git/dev-scripts/ devscript
$ cd devscript
Set environment:
=======================
export WORKING_DIR=/home/git/wrk-dir-devscripts/
export IP_STACK=v4
export KUBECONFIG=/home/git/dev-scripts/ocp/ostest/auth/kubeconfig
export EXTRA_NETWORK_NAMES="nmstate1 nmstate2"
export NMSTATE1_NETWORK_SUBNET_V4='192.168.221.0/24'
export NMSTATE1_NETWORK_SUBNET_V6='fd2e:6f44:5dd8:ca56::/120'
export NMSTATE2_NETWORK_SUBNET_V4='192.168.222.0/24'
export NMSTATE2_NETWORK_SUBNET_V6='fd2e:6f44:5dd8:cc56::/120'
# Use the symbolic links created in the home directory
export MACHINE_CONFIG_OPERATOR_LOCAL_IMAGE=://machine-config-operator
export BAREMETAL_RUNTIMECFG_LOCAL_IMAGE=://baremetal-runtimecfg
Build (takes approx. 1 hour or more)
==================================
devscript> make all
After the installation is done, create a manifest.yaml to apply the change:
manifest.yaml
===================================================
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: 10-keepalived-override
spec:
  config:
    ignition:
      version: 3.2.0
    storage:
      files:
      - contents:
          source: data:text/plain;charset=utf-8;base64,[base64 version of keepalived.yaml modified to set ENABLE_UNICAST to "yes"]
        mode: 0644
        overwrite: true
        path: /etc/kubernetes/manifests/keepalived.yaml
      - contents:
          source: data:text/plain;charset=utf-8;base64,bW9kZTogdW5pY2FzdA==
        mode: 0644
        overwrite: true
        path: /etc/keepalived/monitor.conf
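The two `source` values in the manifest are base64 data URLs. The following is a minimal sketch of how such a value could be produced from a local file and how an existing one could be decoded for inspection. It assumes a POSIX shell with the `base64` utility available; the helper names `to_data_url` and `from_data_url` are hypothetical and not part of any OpenShift tooling.

```shell
# Hypothetical helper: build a data URL from the contents of the file in $1.
# tr removes the line wrapping that base64 applies to long output.
to_data_url() {
    printf 'data:text/plain;charset=utf-8;base64,%s\n' "$(base64 < "$1" | tr -d '\n')"
}

# Hypothetical helper: decode the payload of the data URL given in $1.
from_data_url() {
    printf '%s' "${1#*base64,}" | base64 -d
}

# Decode the monitor.conf payload used in the manifest above.
from_data_url 'data:text/plain;charset=utf-8;base64,bW9kZTogdW5pY2FzdA=='
echo
# prints: mode: unicast
```

Running `to_data_url keepalived.yaml` on a local copy of keepalived.yaml with ENABLE_UNICAST set to "yes" would give the value for the first `source` field.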
1) Apply the manifest to enable Unicast.
$ oc apply -f manifest.yaml
2) Watch the keepalived-monitor logs on a worker and a master:
$ oc logs -f -n openshift-kni-infra keepalived-worker-1 keepalived-monitor
$ oc logs -f -n openshift-kni-infra keepalived-master-1 keepalived-monitor
Look for the "Update Mode" message, for example:
time="2022-03-30T18:54:37Z" level=info msg="Update Mode from newConfig.EnableUnicast to desiredModeInfo.Mode" desiredModeInfo.Mode=unicast desiredModeInfo.Time="2022-03-30 18:55:00 +0000 UTC" newConfig.EnableUnicast=false
time="2022-03-30T18:54:37Z" level=info msg="Mode Update config change" curConfig="{{ostest test.metalkube.org 192.168.111.5 14 A AAAA 192.168.111.4 93 A AAAA 32 0 []} {123 123 123 [{master-0 192.168.111.20 123} {master-1 192.168.111.21 123} {master-2 192.168.111.22 123}] } 192.168.111.20 master-0 enp2s0 [192.168.111.1 fe80::5054:ff:fe70:45c9%enp3s0] {[192.168.111.20 192.168.111.21 192.168.111.22]} true}"
time="2022-03-30T18:54:37Z" level=info msg="global_defs {"
If no "Failing" or "Error" message appears, we are all set and the bug is verified.
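The log check described above could be scripted roughly as follows. This is only a sketch under stated assumptions: `check_monitor_log` is a hypothetical helper, and matching `level=error` in addition to the "Failing" and "Error" strings is an assumption based on the log format shown above.

```shell
# Hypothetical helper: reads keepalived-monitor log text on stdin.
# Succeeds only if an "Update Mode" line is present and no line
# contains "Failing", "Error", or level=error (the last is an assumption).
check_monitor_log() {
    log=$(cat)
    if printf '%s\n' "$log" | grep -Eq 'Failing|Error|level=error'; then
        echo "FAIL: failure message found"
        return 1
    fi
    if ! printf '%s\n' "$log" | grep -q 'Update Mode'; then
        echo "FAIL: no Update Mode message found"
        return 1
    fi
    echo "OK"
}

# Example use against a cluster (not run here):
# oc logs -n openshift-kni-infra keepalived-master-1 -c keepalived-monitor | check_monitor_log
```

The check reads a finished log dump rather than a followed stream, so it is meant to be run once the mode change has had time to happen.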
Feel free to reach out with any questions.

Hi Victor,
To verify the bug, please use the steps above as soon as https://github.com/openshift/baremetal-runtimecfg/pull/173 lands upstream. Feel free to reach out to me.
Thanks,
Douglas

Hello Team,
I have a customer who faced an issue during a 4.10 to 4.11 upgrade. The ingress VIP was active on more than one node at the same time, causing the upgrade to fail. After some digging we found that some nodes were configured to use unicast for keepalived and some were not, which effectively produced a split-brain situation with two keepalived masters for the ingress VIP. keepalived shouldn't switch to unicast until after the cluster upgrade is complete, but we found a period of around 2 hours during the upgrade in which keepalived was in a split-brain state. Not all nodes had been upgraded to 4.11.25 before the switch to unicast was made. I was wondering whether this is related to an existing bug or whether I need to file a new bug for this issue.
Aditya Pawar

OCP is no longer using Bugzilla and this bug appears to have been left in an orphaned state. If the bug is still relevant, please open a new issue in the OCPBUGS Jira project: https://issues.redhat.com/projects/OCPBUGS/summary