+++ This bug was initially created as a clone of Bug #2004451 +++

Description of problem:
The filename displayed in the RecentBackup message is incorrect. The RecentBackup condition reports:

  "message": "UpgradeBackup pre 4.9 located at path /etc/kubernetes/cluster-backup/upgrade-backup-2021-09-15_100541 on node \"yangyang0915-2-88b9l-master-0.c.openshift-qe.internal\""

but that file does not exist:

# oc debug node/yangyang0915-2-88b9l-master-0.c.openshift-qe.internal
Starting pod/yangyang0915-2-88b9l-master-0copenshift-qeinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.0.2
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-4.4# ls /etc/kubernetes/cluster-backup/upgrade-backup-2021-09-15_100541
ls: cannot access '/etc/kubernetes/cluster-backup/upgrade-backup-2021-09-15_100541': No such file or directory
sh-4.4# ls /etc/kubernetes/cluster-backup/
upgrade-backup-2021-09-15_100535

The actual filename is upgrade-backup-2021-09-15_100535 rather than upgrade-backup-2021-09-15_100541.

Version-Release number of selected component (if applicable):

How reproducible:
2/2

Steps to Reproduce:
1. Install a 4.8 cluster
2. Upgrade to a 4.9 signed release
3. Check the RecentBackup condition

Actual results:
The backup filename displayed in the RecentBackup condition does not exist.

Expected results:
The condition message displays the exact backup filename.

Additional info:
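For reference, the check above can be reproduced in one pass with something like the following sketch (assuming `jq` is available and the message keeps the "located at path ... on node" wording shown in this report):

```bash
# Pull the RecentBackup message from the etcd ClusterOperator, then list the
# backup directory on the named node to compare the two timestamps.
oc get -o json clusteroperator etcd \
  | jq -r '.status.conditions[] | select(.type == "RecentBackup") | .message'

# On the node named in the message (example node taken from this report):
oc debug node/yangyang0915-2-88b9l-master-0.c.openshift-qe.internal -- \
  chroot /host ls /etc/kubernetes/cluster-backup/
```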
The issue continues to exist on an upgrade from 4.8 to 4.9. Please find the steps below.

The upgrade was triggered from 4.8 to 4.9:

[skundu@skundu admin]$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.5     True        False         138m    Cluster version is 4.9.5

[skundu@skundu admin]$ oc get -o json clusteroperator etcd | jq -r '.status.conditions[] | .lastTransitionTime + " " + .type + "=" + .status + " " + .reason + ": " + .message'
2021-10-28T08:57:38Z Degraded=False AsExpected: NodeControllerDegraded: All master nodes are ready
EtcdMembersDegraded: No unhealthy members found
2021-10-28T07:29:10Z Progressing=False AsExpected: NodeInstallerProgressing: 3 nodes are at revision 4
EtcdMembersProgressing: No unstarted etcd members found
2021-10-28T05:59:41Z Available=True AsExpected: StaticPodsAvailable: 3 nodes are active; 3 nodes are at revision 4
EtcdMembersAvailable: 3 members are available
2021-10-28T05:57:58Z Upgradeable=True AsExpected: All is well
2021-10-28T07:23:11Z RecentBackup=True UpgradeBackupSuccessful: UpgradeBackup pre 4.9 located at path /etc/kubernetes/cluster-backup/upgrade-backup-2021-10-28_072311 on node "yanpzhan28134355-gvsss-master-1.c.openshift-qe.internal"

[skundu@skundu admin]$ oc debug node/yanpzhan28134355-gvsss-master-1.c.openshift-qe.internal
Starting pod/yanpzhan28134355-gvsss-master-1copenshift-qeinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.0.3
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-4.4# cd /etc/kubernetes/cluster-backup
sh-4.4# ls -lrt upgrade-backup-2021-10-28_072311
ls: cannot access 'upgrade-backup-2021-10-28_072311': No such file or directory
sh-4.4# ls
upgrade-backup-2021-10-28_072306
sh-4.4#
_____________________________________________________________________________________________________________________
Per the RecentBackup message, the filename is upgrade-backup-2021-10-28_072311, but the actual filename at that path is upgrade-backup-2021-10-28_072306. The timestamp in the condition message does not match the timestamp of the directory that was actually created.
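For what it's worth, the symptom is consistent with the reported name and the on-disk name being stamped by two separate clock reads. A minimal bash sketch of that failure mode (purely illustrative, not the operator's actual code):

```bash
# Illustrative only: if the backup and the condition message each call `date`
# at different moments, the on-disk name and the reported name drift apart by
# however long the intervening work takes.
actual="upgrade-backup-$(date +%Y-%m-%d_%H%M%S)"     # name the backup directory actually gets
sleep 5                                              # stand-in for the delay before the condition is written
reported="upgrade-backup-$(date +%Y-%m-%d_%H%M%S)"   # name written into the condition message
echo "directory on disk: ${actual}"
echo "condition message: ${reported}"
```

In the run above the gap was about five seconds (072306 on disk vs. 072311 in the message).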
4.9 is GA, so this bug should target 4.9.z.
As discussed offline, verification of the 4.9.z target should be done with an upgrade from 4.9.z (with the fix [1]) to 4.10, since the fix is for the backup controller, which runs pre-upgrade.

[1]: https://github.com/openshift/cluster-etcd-operator/pull/683
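If it helps during that verification run, a simple loop (sketch only, reusing the jq filter already used elsewhere in this bug) can keep an eye on just the RecentBackup condition while the upgrade progresses:

```bash
# Poll the etcd ClusterOperator every 30s and print only the RecentBackup
# condition during the 4.9.z -> 4.10 upgrade.
while true; do
  oc get -o json clusteroperator etcd \
    | jq -r '.status.conditions[] | select(.type == "RecentBackup")
             | .lastTransitionTime + " " + .type + "=" + .status + " " + .reason + ": " + .message'
  sleep 30
done
```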
@skundu Can you please reverify with an upgrade from 4.9 -> 4.10? I was able to confirm by upgrading from 4.9.6 to 4.10.0-0.nightly-2021-11-04-001635.

1. After initiating the upgrade, we confirm the trigger from the CVO in the ClusterVersion status, i.e., the EtcdRecentBackup condition:
```
$ oc get -o json clusterversion version | jq -r '.status.conditions[] | .lastTransitionTime + " " + .type + "=" + .status + " " + .reason + ": " + .message'
2021-11-05T01:00:32Z Available=True : Done applying 4.9.6
2021-11-05T04:09:02Z Failing=True UpgradePreconditionCheckFailed: Precondition "EtcdRecentBackup" failed because of "ControllerStarted": ...
```

2. The etcd ClusterOperator status condition shows the backup dir name:
```
$ oc get -o json clusteroperator etcd | jq -r '.status.conditions[] | .lastTransitionTime + " " + .type + "=" + .status + " " + .reason + ": " + .message'
2021-11-05T04:09:05Z RecentBackup=True UpgradeBackupSuccessful: UpgradeBackup pre 4.9 located at path /etc/kubernetes/cluster-backup/upgrade-backup-2021-11-05_040902 on node "ip-10-0-171-129.us-west-1.compute.internal"
2021-11-05T00:40:38Z Degraded=False AsExpected: NodeControllerDegraded: All master nodes are ready
EtcdMembersDegraded: No unhealthy members found
...
2021-11-05T00:40:38Z Upgradeable=True AsExpected: All is well
```

3. Confirm that the dir `upgrade-backup-2021-11-05_040902` is the same on disk:
```
$ oc debug node/ip-10-0-171-129.us-west-1.compute.internal
Starting pod/ip-10-0-171-129us-west-1computeinternal-debug ...
To use host binaries, run `chroot /host`
chroot /host
Pod IP: 10.0.171.129
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-4.4# cd /etc/kubernetes/cluster-backup
sh-4.4# ls
upgrade-backup-2021-11-05_040902
```
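To make step 3 repeatable, here is a small sketch that extracts the path and node from the condition message and checks the directory on that node (the `sed` patterns assume the exact message wording shown above; empty variables mean the wording changed):

```bash
# Parse the RecentBackup message, then verify the reported directory exists on
# the reported node.
msg=$(oc get -o json clusteroperator etcd \
  | jq -r '.status.conditions[] | select(.type == "RecentBackup") | .message')
path=$(printf '%s\n' "$msg" | sed -n 's/.*located at path \([^ ]*\) on node.*/\1/p')
node=$(printf '%s\n' "$msg" | sed -n 's/.*on node "\([^"]*\)".*/\1/p')
echo "reported path: $path (node: $node)"
oc debug "node/$node" -- chroot /host ls -ld "$path"
```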
Hi @htariq, I performed the upgrade from 4.9.6 to 4.10. Steps followed after initiating the upgrade:

1. $ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.6     True        True          111s    Working towards 4.10.0-0.nightly-2021-11-15-034648: 94 of 756 done (12% complete)

2. Unable to confirm the trigger from the CVO in the ClusterVersion status, i.e., the EtcdRecentBackup condition:
$ oc get -o json clusterversion version | jq -r '.status.conditions[] | .lastTransitionTime + " " + .type + "=" + .status + " " + .reason + ": " + .message'
2021-11-16T11:11:36Z Available=True : Done applying 4.9.6
2021-11-16T11:11:36Z Failing=False :
2021-11-16T11:15:03Z Progressing=True : Working towards 4.10.0-0.nightly-2021-11-15-034648: 94 of 756 done (12% complete)

3. After progressing for some time:
$ oc get -o json clusterversion version | jq -r '.status.conditions[] | .lastTransitionTime + " " + .type + "=" + .status + " " + .reason + ": " + .message'
2021-11-16T11:11:36Z Available=True : Done applying 4.9.6
2021-11-16T11:44:19Z Failing=False :
2021-11-16T11:15:03Z Progressing=True : Working towards 4.10.0-0.nightly-2021-11-15-034648: 634 of 756 done (83% complete)
2021-11-16T10:43:53Z RetrievedUpdates=False VersionNotFound: Unable to retrieve available updates: currently reconciling cluster version 4.10.0-0.nightly-2021-11-15-034648 not found in the "stable-4.9" channel

4. Unable to find the RecentBackup message:
$ oc get -o json clusteroperator etcd | jq -r '.status.conditions[] | .lastTransitionTime + " " + .type + "=" + .status + " " + .reason + ": " + .message'
2021-11-16T10:49:09Z Degraded=False AsExpected: NodeControllerDegraded: All master nodes are ready
EtcdMembersDegraded: No unhealthy members found
2021-11-16T11:20:16Z Progressing=False AsExpected: NodeInstallerProgressing: 3 nodes are at revision 4
EtcdMembersProgressing: No unstarted etcd members found
2021-11-16T10:50:53Z Available=True AsExpected: StaticPodsAvailable: 3 nodes are active; 3 nodes are at revision 4
EtcdMembersAvailable: 3 members are available
2021-11-16T10:49:10Z Upgradeable=True AsExpected: All is well
2021-11-16T10:49:10Z RecentBackup=Unknown ControllerStarted:

5. The upgrade completed successfully:
$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.0-0.nightly-2021-11-15-034648   True        False         5m11s   Cluster version is 4.10.0-0.nightly-2021-11-15-034648

Not sure if I'm missing something.
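When the condition stays at RecentBackup=Unknown ControllerStarted, it may help to confirm whether a backup was ever requested at all. A rough sketch (the namespace and pod-name filter here are my assumption, not something confirmed in this bug):

```bash
# Look for backup-related pods and events in the etcd namespace
# (assumption: the pre-upgrade backup runs as a pod in openshift-etcd).
oc -n openshift-etcd get pods --sort-by=.metadata.creationTimestamp | grep -i backup
oc -n openshift-etcd get events --sort-by=.lastTimestamp | grep -i backup

# Also check whether the CVO ever surfaced the EtcdRecentBackup precondition.
oc get -o json clusterversion version \
  | jq -r '.status.conditions[] | .type + "=" + .status + " " + .reason + ": " + .message' \
  | grep -i backup
```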
@skundu Apologies for the delayed follow-up. It's hard to tell what the difference might be between our test runs. I think it would be best if I could access your 4.9 cluster pre-upgrade so I can observe the preconditions for the backup trigger and figure out what might be missing.
*** Bug 2018306 has been marked as a duplicate of this bug. ***