+++ This bug was initially created as a clone of Bug #2004451 +++

Description of problem:
The filename displayed in the RecentBackup message is incorrect. The RecentBackup condition reports:

  "message": "UpgradeBackup pre 4.9 located at path /etc/kubernetes/cluster-backup/upgrade-backup-2021-09-15_100541 on node \"yangyang0915-2-88b9l-master-0.c.openshift-qe.internal\""

but that file does not exist:

# oc debug node/yangyang0915-2-88b9l-master-0.c.openshift-qe.internal
Starting pod/yangyang0915-2-88b9l-master-0copenshift-qeinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.0.2
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-4.4# ls /etc/kubernetes/cluster-backup/upgrade-backup-2021-09-15_100541
ls: cannot access '/etc/kubernetes/cluster-backup/upgrade-backup-2021-09-15_100541': No such file or directory
sh-4.4# ls /etc/kubernetes/cluster-backup/
upgrade-backup-2021-09-15_100535

The actual filename is upgrade-backup-2021-09-15_100535 rather than upgrade-backup-2021-09-15_100541.

Version-Release number of selected component (if applicable):

How reproducible:
2/2

Steps to Reproduce:
1. Install a 4.8 cluster
2. Upgrade to a 4.9 signed release
3. Check the RecentBackup condition

Actual results:
The backup filename displayed in the RecentBackup condition does not exist.

Expected results:
The condition message displays the exact backup filename.

Additional info:
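For reference, the check above can be reproduced in one pass with something like the following sketch (assuming `jq` is available and the message keeps the "located at path ... on node" wording shown in this report):

```bash
# Pull the RecentBackup message from the etcd ClusterOperator, then list the
# backup directory on the named node to compare the two timestamps.
oc get -o json clusteroperator etcd \
  | jq -r '.status.conditions[] | select(.type == "RecentBackup") | .message'

# On the node named in the message (example node taken from this report):
oc debug node/yangyang0915-2-88b9l-master-0.c.openshift-qe.internal -- \
  chroot /host ls /etc/kubernetes/cluster-backup/
```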
The issue continues to exist on an upgrade from 4.8 to 4.9. Please find the steps below.

The upgrade was triggered from 4.8 to 4.9:

[skundu@skundu admin]$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.5     True        False         138m    Cluster version is 4.9.5

[skundu@skundu admin]$ oc get -o json clusteroperator etcd | jq -r '.status.conditions[] | .lastTransitionTime + " " + .type + "=" + .status + " " + .reason + ": " + .message'
2021-10-28T08:57:38Z Degraded=False AsExpected: NodeControllerDegraded: All master nodes are ready
EtcdMembersDegraded: No unhealthy members found
2021-10-28T07:29:10Z Progressing=False AsExpected: NodeInstallerProgressing: 3 nodes are at revision 4
EtcdMembersProgressing: No unstarted etcd members found
2021-10-28T05:59:41Z Available=True AsExpected: StaticPodsAvailable: 3 nodes are active; 3 nodes are at revision 4
EtcdMembersAvailable: 3 members are available
2021-10-28T05:57:58Z Upgradeable=True AsExpected: All is well
2021-10-28T07:23:11Z RecentBackup=True UpgradeBackupSuccessful: UpgradeBackup pre 4.9 located at path /etc/kubernetes/cluster-backup/upgrade-backup-2021-10-28_072311 on node "yanpzhan28134355-gvsss-master-1.c.openshift-qe.internal"

[skundu@skundu admin]$ oc debug node/yanpzhan28134355-gvsss-master-1.c.openshift-qe.internal
Starting pod/yanpzhan28134355-gvsss-master-1copenshift-qeinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.0.3
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-4.4# cd /etc/kubernetes/cluster-backup
sh-4.4# ls -lrt upgrade-backup-2021-10-28_072311
ls: cannot access 'upgrade-backup-2021-10-28_072311': No such file or directory
sh-4.4# ls
upgrade-backup-2021-10-28_072306
sh-4.4#
_____________________________________________________________________________________________________________________
Per the RecentBackup message, the filename is upgrade-backup-2021-10-28_072311, but the actual filename at that path is upgrade-backup-2021-10-28_072306. The timestamp in the condition message does not match the timestamp of the directory that was actually created.
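For what it's worth, the symptom is consistent with the reported name and the on-disk name being stamped by two separate clock reads. A minimal bash sketch of that failure mode (purely illustrative, not the operator's actual code):

```bash
# Illustrative only: if the backup and the condition message each call `date`
# at different moments, the on-disk name and the reported name drift apart by
# however long the intervening work takes.
actual="upgrade-backup-$(date +%Y-%m-%d_%H%M%S)"     # name the backup directory actually gets
sleep 5                                              # stand-in for the delay before the condition is written
reported="upgrade-backup-$(date +%Y-%m-%d_%H%M%S)"   # name written into the condition message
echo "directory on disk: ${actual}"
echo "condition message: ${reported}"
```

In the run above the gap was about five seconds (072306 on disk vs. 072311 in the message).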
4.9 is GA, so this bug should target 4.9.z.
As discussed offline, verification of the 4.9.z target should be done with an upgrade from 4.9.z (with the fix [1]) to 4.10, since the fix is for the backup controller, which runs pre-upgrade.

[1]: https://github.com/openshift/cluster-etcd-operator/pull/683
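If it helps during that verification run, a simple loop (sketch only, reusing the jq filter already used elsewhere in this bug) can keep an eye on just the RecentBackup condition while the upgrade progresses:

```bash
# Poll the etcd ClusterOperator every 30s and print only the RecentBackup
# condition during the 4.9.z -> 4.10 upgrade.
while true; do
  oc get -o json clusteroperator etcd \
    | jq -r '.status.conditions[] | select(.type == "RecentBackup")
             | .lastTransitionTime + " " + .type + "=" + .status + " " + .reason + ": " + .message'
  sleep 30
done
```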
@skundu Can you please reverify with an upgrade from 4.9 -> 4.10? I was able to confirm by upgrading from 4.9.6 to 4.10.0-0.nightly-2021-11-04-001635.

1. After initiating the upgrade, we confirm the trigger from the CVO in the ClusterVersion status, i.e., the EtcdRecentBackup condition:
```
$ oc get -o json clusterversion version | jq -r '.status.conditions[] | .lastTransitionTime + " " + .type + "=" + .status + " " + .reason + ": " + .message'
2021-11-05T01:00:32Z Available=True : Done applying 4.9.6
2021-11-05T04:09:02Z Failing=True UpgradePreconditionCheckFailed: Precondition "EtcdRecentBackup" failed because of "ControllerStarted": ...
```

2. The etcd ClusterOperator status condition shows the backup dir name:
```
$ oc get -o json clusteroperator etcd | jq -r '.status.conditions[] | .lastTransitionTime + " " + .type + "=" + .status + " " + .reason + ": " + .message'
2021-11-05T04:09:05Z RecentBackup=True UpgradeBackupSuccessful: UpgradeBackup pre 4.9 located at path /etc/kubernetes/cluster-backup/upgrade-backup-2021-11-05_040902 on node "ip-10-0-171-129.us-west-1.compute.internal"
2021-11-05T00:40:38Z Degraded=False AsExpected: NodeControllerDegraded: All master nodes are ready
EtcdMembersDegraded: No unhealthy members found
...
2021-11-05T00:40:38Z Upgradeable=True AsExpected: All is well
```

3. Confirm that the dir `upgrade-backup-2021-11-05_040902` is the same on disk:
```
$ oc debug node/ip-10-0-171-129.us-west-1.compute.internal
Starting pod/ip-10-0-171-129us-west-1computeinternal-debug ...
To use host binaries, run `chroot /host`
chroot /host
Pod IP: 10.0.171.129
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-4.4# cd /etc/kubernetes/cluster-backup
sh-4.4# ls
upgrade-backup-2021-11-05_040902
```
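To make step 3 repeatable, here is a small sketch that extracts the path and node from the condition message and checks the directory on that node (the `sed` patterns assume the exact message wording shown above; empty variables mean the wording changed):

```bash
# Parse the RecentBackup message, then verify the reported directory exists on
# the reported node.
msg=$(oc get -o json clusteroperator etcd \
  | jq -r '.status.conditions[] | select(.type == "RecentBackup") | .message')
path=$(printf '%s\n' "$msg" | sed -n 's/.*located at path \([^ ]*\) on node.*/\1/p')
node=$(printf '%s\n' "$msg" | sed -n 's/.*on node "\([^"]*\)".*/\1/p')
echo "reported path: $path (node: $node)"
oc debug "node/$node" -- chroot /host ls -ld "$path"
```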
Hi @htariq, I performed the upgrade from 4.9.6 to 4.10. Steps followed after initiating the upgrade:

1. $ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.6     True        True          111s    Working towards 4.10.0-0.nightly-2021-11-15-034648: 94 of 756 done (12% complete)

2. Unable to confirm the trigger from the CVO in the ClusterVersion status, i.e., the EtcdRecentBackup condition:
$ oc get -o json clusterversion version | jq -r '.status.conditions[] | .lastTransitionTime + " " + .type + "=" + .status + " " + .reason + ": " + .message'
2021-11-16T11:11:36Z Available=True : Done applying 4.9.6
2021-11-16T11:11:36Z Failing=False :
2021-11-16T11:15:03Z Progressing=True : Working towards 4.10.0-0.nightly-2021-11-15-034648: 94 of 756 done (12% complete)

3. After progressing for some time:
$ oc get -o json clusterversion version | jq -r '.status.conditions[] | .lastTransitionTime + " " + .type + "=" + .status + " " + .reason + ": " + .message'
2021-11-16T11:11:36Z Available=True : Done applying 4.9.6
2021-11-16T11:44:19Z Failing=False :
2021-11-16T11:15:03Z Progressing=True : Working towards 4.10.0-0.nightly-2021-11-15-034648: 634 of 756 done (83% complete)
2021-11-16T10:43:53Z RetrievedUpdates=False VersionNotFound: Unable to retrieve available updates: currently reconciling cluster version 4.10.0-0.nightly-2021-11-15-034648 not found in the "stable-4.9" channel

4. Unable to find the RecentBackup message:
$ oc get -o json clusteroperator etcd | jq -r '.status.conditions[] | .lastTransitionTime + " " + .type + "=" + .status + " " + .reason + ": " + .message'
2021-11-16T10:49:09Z Degraded=False AsExpected: NodeControllerDegraded: All master nodes are ready
EtcdMembersDegraded: No unhealthy members found
2021-11-16T11:20:16Z Progressing=False AsExpected: NodeInstallerProgressing: 3 nodes are at revision 4
EtcdMembersProgressing: No unstarted etcd members found
2021-11-16T10:50:53Z Available=True AsExpected: StaticPodsAvailable: 3 nodes are active; 3 nodes are at revision 4
EtcdMembersAvailable: 3 members are available
2021-11-16T10:49:10Z Upgradeable=True AsExpected: All is well
2021-11-16T10:49:10Z RecentBackup=Unknown ControllerStarted:

5. The upgrade completed successfully:
$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.0-0.nightly-2021-11-15-034648   True        False         5m11s   Cluster version is 4.10.0-0.nightly-2021-11-15-034648

Not sure if I'm missing something.
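When the condition stays at RecentBackup=Unknown ControllerStarted, it may help to confirm whether a backup was ever requested at all. A rough sketch (the namespace and pod-name filter here are my assumption, not something confirmed in this bug):

```bash
# Look for backup-related pods and events in the etcd namespace
# (assumption: the pre-upgrade backup runs as a pod in openshift-etcd).
oc -n openshift-etcd get pods --sort-by=.metadata.creationTimestamp | grep -i backup
oc -n openshift-etcd get events --sort-by=.lastTimestamp | grep -i backup

# Also check whether the CVO ever surfaced the EtcdRecentBackup precondition.
oc get -o json clusterversion version \
  | jq -r '.status.conditions[] | .type + "=" + .status + " " + .reason + ": " + .message' \
  | grep -i backup
```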
@skundu Apologies for the delayed follow-up. It's hard to tell what the difference might be between our test runs. I think it would be best if I could access your 4.9 cluster pre-upgrade so I can observe the preconditions for the backup trigger and figure out what might be missing.
*** Bug 2018306 has been marked as a duplicate of this bug. ***