Description of problem:

Using channel candidate-4.10 and upgrading without force, the CVO reports that the upgrade is unsafe for about 4 minutes before proceeding:

[root@ip-172-31-18-207 tmp]# oc adm upgrade
Cluster version is 4.9.23

Upstream is unset, so the cluster will use an appropriate default.
Channel: candidate-4.10 (available channels: candidate-4.10, candidate-4.9, fast-4.9)

Recommended updates:

  VERSION       IMAGE
  4.10.2        quay.io/openshift-release-dev/ocp-release@sha256:b807908afe1b7b25a8727f092cc11a104bf3996908a80f035d079b014b4b96b6
  4.10.1        quay.io/openshift-release-dev/ocp-release@sha256:95a31c7227c105311eb9f461972d5db90fa22270f017ac6eef4d1f95d3e1675c
  4.10.0        quay.io/openshift-release-dev/ocp-release@sha256:a63c470411c087c0568729fa5faf32cd8d5fa1db3c73e28f3989f17f5c458351
  4.10.0-rc.8   quay.io/openshift-release-dev/ocp-release@sha256:65e8dbc576556e0296e29ac1ae7496743e3494ad65a111e134c6e7202a498d11
  4.10.0-rc.7   quay.io/openshift-release-dev/ocp-release@sha256:a1e6c5ad9f3e7a6faa93508d7c1458f3fcd13869fdb1ac6e2dab7cc0f9269847
  4.10.0-rc.6   quay.io/openshift-release-dev/ocp-release@sha256:aed2a6eb5c99da354dfd4182f0fcdf7b8bc1c0c7db356644cf1fc669b1052012
  4.10.0-rc.5   quay.io/openshift-release-dev/ocp-release@sha256:5ebccbbcbeeb3987f9ee3fa7f593c5e24bdd89efd9c6b2e6d525346e4a235ced

[root@ip-172-31-18-207 tmp]# oc adm upgrade --to 4.10.2
Updating to 4.10.2

[root@ip-172-31-18-207 tmp]# oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.23    True        True          8s      Working towards 4.10.2: downloading update

[root@ip-172-31-18-207 tmp]# oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.23    True        True          15s     Working towards 4.10.2: downloading update

[root@ip-172-31-18-207 tmp]# oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.23    True        True          25s     Unable to apply 4.10.2: it may not be safe to apply this update

[root@ip-172-31-18-207 tmp]# oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.23    True        True          72s     Unable to apply 4.10.2: it may not be safe to apply this update

[root@ip-172-31-18-207 tmp]# oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.23    True        True          2m41s   Unable to apply 4.10.2: it may not be safe to apply this update

[root@ip-172-31-18-207 tmp]# oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.23    True        True          4m22s   Working towards 4.10.2: 94 of 770 done (12% complete)

While reporting that the upgrade is not safe, oc get -o yaml clusterversion shows this status:

  conditions:
  - lastTransitionTime: "2022-03-07T14:38:31Z"
    message: Done applying 4.9.23
    status: "True"
    type: Available
  - lastTransitionTime: "2022-03-07T15:20:31Z"
    message: 'Precondition "EtcdRecentBackup" failed because of "ControllerStarted": '
    reason: UpgradePreconditionCheckFailed
    status: "True"
    type: Failing
  - lastTransitionTime: "2022-03-07T15:20:11Z"
    message: 'Unable to apply 4.10.2: it may not be safe to apply this update'
    reason: UpgradePreconditionCheckFailed
    status: "True"
    type: Progressing
  - lastTransitionTime: "2022-03-07T15:19:37Z"
    status: "True"
    type: RetrievedUpdates

There is no end-user-facing indication of why the upgrade is not proceeding.

Version-Release number of the following components: 4.10.2

How reproducible: Always

Steps to Reproduce:
1. Install 4.9.23
2. Set the CVO channel to candidate-4.10
3. oc adm upgrade --to 4.10.2
4. oc get clusterversion several times
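Until the CLI surfaces this, the detailed reason is recoverable from the Failing condition in ClusterVersion status. A minimal client-side sketch (the dict below just mirrors the YAML conditions shown above; this is not an oc feature, only illustrative filtering):

```python
import json

# Sample ClusterVersion status, mirroring the conditions reported above.
status = json.loads("""
{
  "conditions": [
    {"type": "Available", "status": "True", "message": "Done applying 4.9.23"},
    {"type": "Failing", "status": "True",
     "reason": "UpgradePreconditionCheckFailed",
     "message": "Precondition \\"EtcdRecentBackup\\" failed because of \\"ControllerStarted\\": "},
    {"type": "Progressing", "status": "True",
     "reason": "UpgradePreconditionCheckFailed",
     "message": "Unable to apply 4.10.2: it may not be safe to apply this update"}
  ]
}
""")

def failing_detail(status):
    """Return the Failing=True condition's message, which carries the real reason."""
    for cond in status["conditions"]:
        if cond["type"] == "Failing" and cond["status"] == "True":
            return cond["message"]
    return None

print(failing_detail(status))
```

Against a live cluster the equivalent one-liner would be something like `oc get clusterversion version -o jsonpath='{.status.conditions[?(@.type=="Failing")].message}'`.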
(In reply to Mike Fiedler from comment #0)
> [root@ip-172-31-18-207 tmp]# oc get clusterversion
> NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
> version   4.9.23    True        True          25s     Unable to apply 4.10.2: it may not be safe to apply this update
> ...
> message: 'Precondition "EtcdRecentBackup" failed because of "ControllerStarted": '
> reason: UpgradePreconditionCheckFailed
> status: "True"
> ...
> There is no end-user facing indication why the upgrade is not proceeding.

This is the CVO trying to "simplify" error messages by making assumptions based on the reason. In this case, precondition failures [1], which were once mostly about verifying signatures on the target release, but which now include things like "has etcd completed a pre-minor-bump snapshot?" [2]. I think we should drop that attempt at simplification and just pass through the underlying message. But either way, this is cosmetic, and certainly not a blocker.

[1]: https://github.com/openshift/cluster-version-operator/blob/77a24f77b8f6e9975b5f41a787d6368ff609e391/pkg/payload/task.go#L242-L243
[2]: https://github.com/openshift/cluster-version-operator/pull/637
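For illustration only (this is not the CVO's actual Go code at [1], just a sketch of the behavior it describes), the current "simplify" path and the proposed pass-through boil down to:

```python
GENERIC = "it may not be safe to apply this update"

def summarize(version, reason, detail, passthrough=False):
    """Build the user-facing message for an update error.

    passthrough=False mimics the current behavior: precondition failures
    collapse into a generic warning.  passthrough=True is the proposed fix:
    keep the underlying detail visible to the end user.
    """
    if reason == "UpgradePreconditionCheckFailed" and not passthrough:
        return f"Unable to apply {version}: {GENERIC}"
    return f"Unable to apply {version}: {detail}"

detail = 'Precondition "EtcdRecentBackup" failed because of "ControllerStarted"'
print(summarize("4.10.2", "UpgradePreconditionCheckFailed", detail))
print(summarize("4.10.2", "UpgradePreconditionCheckFailed", detail, passthrough=True))
```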
Confirming the same UX on 4.8 -> 4.9 with a cluster-bot cluster:

$ oc get -o jsonpath='{.status.desired.version}{"\n"}' clusterversion version
4.8.33
$ oc adm upgrade channel fast-4.9
$ oc -n openshift-config patch cm admin-acks --patch '{"data":{"ack-4.8-kube-1.22-api-removals-in-4.9":"true"}}' --type=merge

In one terminal, poll state:

$ while sleep 5; do date --iso=s --utc; echo "ClusterVersion:"; oc get -o json clusterversion version | jq -r '.status.conditions[] | .lastTransitionTime + " " + .type + "=" + .status + " " + .reason + ": " + .message'; echo; echo "etcd ClusterOperator"; oc get -o json clusteroperator etcd | jq -r '.status.conditions[] | .lastTransitionTime + " " + .type + "=" + .status + " " + .reason + ": " + .message'; echo; done

In another, trigger the update:

$ oc adm upgrade --to 4.9.23

In the polling terminal, before the update, things looked like:

2022-03-07T18:53:50+00:00
ClusterVersion:
2022-03-07T18:48:14Z Available=True : Done applying 4.8.33
2022-03-07T18:47:44Z Failing=False :
2022-03-07T18:53:49Z Progressing=True DownloadingUpdate: Working towards 4.9.23: downloading update
2022-03-07T18:52:56Z RetrievedUpdates=True :

etcd ClusterOperator
2022-03-07T18:30:10Z RecentBackup=Unknown ControllerStarted:
2022-03-07T18:32:47Z Degraded=False AsExpected: NodeControllerDegraded: All master nodes are ready
EtcdMembersDegraded: No unhealthy members found
2022-03-07T18:40:16Z Progressing=False AsExpected: NodeInstallerProgressing: 3 nodes are at revision 3
EtcdMembersProgressing: No unstarted etcd members found
2022-03-07T18:31:57Z Available=True AsExpected: StaticPodsAvailable: 3 nodes are active; 3 nodes are at revision 3
EtcdMembersAvailable: 3 members are available
2022-03-07T18:30:11Z Upgradeable=True AsExpected: All is well

As we got into the update:

2022-03-07T18:53:56+00:00
ClusterVersion:
2022-03-07T18:48:14Z Available=True : Done applying 4.8.33
2022-03-07T18:47:44Z Failing=False :
2022-03-07T18:53:49Z Progressing=True DownloadingUpdate: Working towards 4.9.23: downloading update
2022-03-07T18:52:56Z RetrievedUpdates=True :

etcd ClusterOperator
2022-03-07T18:30:10Z RecentBackup=Unknown ControllerStarted:
2022-03-07T18:32:47Z Degraded=False AsExpected: NodeControllerDegraded: All master nodes are ready
EtcdMembersDegraded: No unhealthy members found
2022-03-07T18:40:16Z Progressing=False AsExpected: NodeInstallerProgressing: 3 nodes are at revision 3
EtcdMembersProgressing: No unstarted etcd members found
2022-03-07T18:31:57Z Available=True AsExpected: StaticPodsAvailable: 3 nodes are active; 3 nodes are at revision 3
EtcdMembersAvailable: 3 members are available
2022-03-07T18:30:11Z Upgradeable=True AsExpected: All is well

The CVO notices the etcd state, and complains about it the same way it does in 4.9 -> 4.10:

2022-03-07T18:54:01+00:00
ClusterVersion:
2022-03-07T18:48:14Z Available=True : Done applying 4.8.33
2022-03-07T18:53:59Z Failing=True UpgradePreconditionCheckFailed: Precondition "EtcdRecentBackup" failed because of "ControllerStarted":
2022-03-07T18:53:49Z Progressing=True UpgradePreconditionCheckFailed: Unable to apply 4.9.23: it may not be safe to apply this update
2022-03-07T18:52:56Z RetrievedUpdates=True :

etcd ClusterOperator
2022-03-07T18:30:10Z RecentBackup=Unknown UpgradeBackupInProgress: Backup pod phase: "Running"
2022-03-07T18:32:47Z Degraded=False AsExpected: NodeControllerDegraded: All master nodes are ready
EtcdMembersDegraded: No unhealthy members found
2022-03-07T18:40:16Z Progressing=False AsExpected: NodeInstallerProgressing: 3 nodes are at revision 3
EtcdMembersProgressing: No unstarted etcd members found
2022-03-07T18:31:57Z Available=True AsExpected: StaticPodsAvailable: 3 nodes are active; 3 nodes are at revision 3
EtcdMembersAvailable: 3 members are available
2022-03-07T18:30:11Z Upgradeable=True AsExpected: All is well

The etcd operator completes the backup at 18:54:05Z, and 14s later the CVO still hasn't noticed and is still talking about ControllerStarted:

2022-03-07T18:54:19+00:00
ClusterVersion:
2022-03-07T18:48:14Z Available=True : Done applying 4.8.33
2022-03-07T18:53:59Z Failing=True UpgradePreconditionCheckFailed: Precondition "EtcdRecentBackup" failed because of "ControllerStarted":
2022-03-07T18:53:49Z Progressing=True UpgradePreconditionCheckFailed: Unable to apply 4.9.23: it may not be safe to apply this update
2022-03-07T18:52:56Z RetrievedUpdates=True :

etcd ClusterOperator
2022-03-07T18:54:05Z RecentBackup=True UpgradeBackupSuccessful: UpgradeBackup pre 4.9 located at path /etc/kubernetes/cluster-backup/upgrade-backup-2022-03-07_185405 on node "ci-ln-y1rj17k-72292-4d9v8-master-1"
2022-03-07T18:32:47Z Degraded=False AsExpected: NodeControllerDegraded: All master nodes are ready
EtcdMembersDegraded: No unhealthy members found
2022-03-07T18:40:16Z Progressing=False AsExpected: NodeInstallerProgressing: 3 nodes are at revision 3
EtcdMembersProgressing: No unstarted etcd members found
2022-03-07T18:31:57Z Available=True AsExpected: StaticPodsAvailable: 3 nodes are active; 3 nodes are at revision 3
EtcdMembersAvailable: 3 members are available
2022-03-07T18:30:11Z Upgradeable=True AsExpected: All is well

But 19s after etcd has completed the backup, the CVO has accepted the update, and is getting on with reconciling the incoming manifests:

2022-03-07T18:54:24+00:00
ClusterVersion:
2022-03-07T18:48:14Z Available=True : Done applying 4.8.33
2022-03-07T18:54:23Z Failing=False :
2022-03-07T18:53:49Z Progressing=True : Working towards 4.9.23: 9 of 738 done (1% complete)
2022-03-07T18:52:56Z RetrievedUpdates=True :

etcd ClusterOperator
2022-03-07T18:54:05Z RecentBackup=True UpgradeBackupSuccessful: UpgradeBackup pre 4.9 located at path /etc/kubernetes/cluster-backup/upgrade-backup-2022-03-07_185405 on node "ci-ln-y1rj17k-72292-4d9v8-master-1"
2022-03-07T18:32:47Z Degraded=False AsExpected: NodeControllerDegraded: All master nodes are ready
EtcdMembersDegraded: No unhealthy members found
2022-03-07T18:40:16Z Progressing=False AsExpected: NodeInstallerProgressing: 3 nodes are at revision 3
EtcdMembersProgressing: No unstarted etcd members found
2022-03-07T18:31:57Z Available=True AsExpected: StaticPodsAvailable: 3 nodes are active; 3 nodes are at revision 3
EtcdMembersAvailable: 3 members are available
2022-03-07T18:30:11Z Upgradeable=True AsExpected: All is well

So if folks feel like a bit of time where the CVO is saying "it may not be safe to apply this update" while it waits for the etcd backup is worth release notes, we may want to add those release notes to 4.9 as well. Personally, I think the risk of confusion here is small, and I'd rather fix the bug, instead of trying to explain this transient, mis-reported string in release notes.
Created attachment 1864435 [details]
logs from the outgoing CVO pod

And checking 4.9 -> 4.10 again:

$ oc get -o jsonpath='{.status.desired.version}{"\n"}' clusterversion version
4.9.23
$ oc adm upgrade channel candidate-4.10

In one terminal:

$ while sleep 5; do date --iso=s --utc; echo "ClusterVersion:"; oc get -o json clusterversion version | jq -r '.status.conditions[] | .lastTransitionTime + " " + .type + "=" + .status + " " + .reason + ": " + .message'; echo; echo "etcd ClusterOperator"; oc get -o json clusteroperator etcd | jq -r '.status.conditions[] | .lastTransitionTime + " " + .type + "=" + .status + " " + .reason + ": " + .message'; echo; done

In another:

$ oc adm upgrade --to 4.10.2
Updating to 4.10.2

Excerpts from the monitor. Before the update:

2022-03-07T19:58:19+00:00
ClusterVersion:
2022-03-07T19:57:23Z RetrievedUpdates=True :
2022-03-07T19:36:32Z Available=True : Done applying 4.9.23
2022-03-07T19:36:32Z Failing=False :
2022-03-07T19:37:02Z Progressing=False : Cluster version is 4.9.23

etcd ClusterOperator
2022-03-07T19:19:46Z RecentBackup=Unknown ControllerStarted:
2022-03-07T19:19:46Z Degraded=False AsExpected: NodeControllerDegraded: All master nodes are ready
EtcdMembersDegraded: No unhealthy members found
2022-03-07T19:29:39Z Progressing=False AsExpected: NodeInstallerProgressing: 3 nodes are at revision 3
EtcdMembersProgressing: No unstarted etcd members found
2022-03-07T19:21:42Z Available=True AsExpected: StaticPodsAvailable: 3 nodes are active; 3 nodes are at revision 3
EtcdMembersAvailable: 3 members are available
2022-03-07T19:20:58Z Upgradeable=True AsExpected: All is well

CVO downloading the new release image:

2022-03-07T19:58:25+00:00
ClusterVersion:
2022-03-07T19:57:23Z RetrievedUpdates=True :
2022-03-07T19:36:32Z Available=True : Done applying 4.9.23
2022-03-07T19:36:32Z Failing=False :
2022-03-07T19:58:26Z Progressing=True DownloadingUpdate: Working towards 4.10.2: downloading update

etcd ClusterOperator
2022-03-07T19:19:46Z RecentBackup=Unknown ControllerStarted:
2022-03-07T19:19:46Z Degraded=False AsExpected: NodeControllerDegraded: All master nodes are ready
EtcdMembersDegraded: No unhealthy members found
2022-03-07T19:29:39Z Progressing=False AsExpected: NodeInstallerProgressing: 3 nodes are at revision 3
EtcdMembersProgressing: No unstarted etcd members found
2022-03-07T19:21:42Z Available=True AsExpected: StaticPodsAvailable: 3 nodes are active; 3 nodes are at revision 3
EtcdMembersAvailable: 3 members are available
2022-03-07T19:20:58Z Upgradeable=True AsExpected: All is well

CVO is waiting on etcd, and etcd is waiting on the backup:

2022-03-07T19:58:48+00:00
ClusterVersion:
2022-03-07T19:57:23Z RetrievedUpdates=True :
2022-03-07T19:36:32Z Available=True : Done applying 4.9.23
2022-03-07T19:58:47Z Failing=True UpgradePreconditionCheckFailed: Precondition "EtcdRecentBackup" failed because of "ControllerStarted":
2022-03-07T19:58:26Z Progressing=True UpgradePreconditionCheckFailed: Unable to apply 4.10.2: it may not be safe to apply this update

etcd ClusterOperator
2022-03-07T19:19:46Z RecentBackup=Unknown UpgradeBackupInProgress: Backup pod phase: "Pending"
2022-03-07T19:19:46Z Degraded=False AsExpected: NodeControllerDegraded: All master nodes are ready
EtcdMembersDegraded: No unhealthy members found
2022-03-07T19:29:39Z Progressing=False AsExpected: NodeInstallerProgressing: 3 nodes are at revision 3
EtcdMembersProgressing: No unstarted etcd members found
2022-03-07T19:21:42Z Available=True AsExpected: StaticPodsAvailable: 3 nodes are active; 3 nodes are at revision 3
EtcdMembersAvailable: 3 members are available
2022-03-07T19:20:58Z Upgradeable=True AsExpected: All is well

etcd completes the backup, CVO hasn't noticed yet:

2022-03-07T19:58:53+00:00
ClusterVersion:
2022-03-07T19:57:23Z RetrievedUpdates=True :
2022-03-07T19:36:32Z Available=True : Done applying 4.9.23
2022-03-07T19:58:47Z Failing=True UpgradePreconditionCheckFailed: Precondition "EtcdRecentBackup" failed because of "ControllerStarted":
2022-03-07T19:58:26Z Progressing=True UpgradePreconditionCheckFailed: Unable to apply 4.10.2: it may not be safe to apply this update

etcd ClusterOperator
2022-03-07T19:58:51Z RecentBackup=True UpgradeBackupSuccessful: UpgradeBackup pre 4.9 located at path /etc/kubernetes/cluster-backup/upgrade-backup-2022-03-07_195848 on node "ci-ln-5cjbzj2-72292-9gkd5-master-0"
2022-03-07T19:19:46Z Degraded=False AsExpected: NodeControllerDegraded: All master nodes are ready
EtcdMembersDegraded: No unhealthy members found
2022-03-07T19:29:39Z Progressing=False AsExpected: NodeInstallerProgressing: 3 nodes are at revision 3
EtcdMembersProgressing: No unstarted etcd members found
2022-03-07T19:21:42Z Available=True AsExpected: StaticPodsAvailable: 3 nodes are active; 3 nodes are at revision 3
EtcdMembersAvailable: 3 members are available
2022-03-07T19:20:58Z Upgradeable=True AsExpected: All is well

And two and a half minutes after etcd completes the backup, the CVO still hasn't noticed:

2022-03-07T20:01:29+00:00
ClusterVersion:
2022-03-07T19:57:23Z RetrievedUpdates=True :
2022-03-07T19:36:32Z Available=True : Done applying 4.9.23
2022-03-07T19:58:47Z Failing=True UpgradePreconditionCheckFailed: Precondition "EtcdRecentBackup" failed because of "ControllerStarted":
2022-03-07T19:58:26Z Progressing=True UpgradePreconditionCheckFailed: Unable to apply 4.10.2: it may not be safe to apply this update

etcd ClusterOperator
2022-03-07T19:58:51Z RecentBackup=True UpgradeBackupSuccessful: UpgradeBackup pre 4.9 located at path /etc/kubernetes/cluster-backup/upgrade-backup-2022-03-07_195848 on node "ci-ln-5cjbzj2-72292-9gkd5-master-0"
2022-03-07T19:19:46Z Degraded=False AsExpected: NodeControllerDegraded: All master nodes are ready
EtcdMembersDegraded: No unhealthy members found
2022-03-07T19:29:39Z Progressing=False AsExpected: NodeInstallerProgressing: 3 nodes are at revision 3
EtcdMembersProgressing: No unstarted etcd members found
2022-03-07T19:21:42Z Available=True AsExpected: StaticPodsAvailable: 3 nodes are active; 3 nodes are at revision 3
EtcdMembersAvailable: 3 members are available
2022-03-07T19:20:58Z Upgradeable=True AsExpected: All is well

But once it does notice, it starts rolling out the incoming manifests:

2022-03-07T20:01:34+00:00
ClusterVersion:
2022-03-07T19:57:23Z RetrievedUpdates=True :
2022-03-07T19:36:32Z Available=True : Done applying 4.9.23
2022-03-07T20:01:31Z Failing=False :
2022-03-07T19:58:26Z Progressing=True : Working towards 4.10.2: 5 of 770 done (0% complete)

etcd ClusterOperator
2022-03-07T19:58:51Z RecentBackup=True UpgradeBackupSuccessful: UpgradeBackup pre 4.9 located at path /etc/kubernetes/cluster-backup/upgrade-backup-2022-03-07_195848 on node "ci-ln-5cjbzj2-72292-9gkd5-master-0"
2022-03-07T19:19:46Z Degraded=False AsExpected: NodeControllerDegraded: All master nodes are ready
EtcdMembersDegraded: No unhealthy members found
2022-03-07T19:29:39Z Progressing=False AsExpected: NodeInstallerProgressing: 3 nodes are at revision 3
EtcdMembersProgressing: No unstarted etcd members found
2022-03-07T19:21:42Z Available=True AsExpected: StaticPodsAvailable: 3 nodes are active; 3 nodes are at revision 3
EtcdMembersAvailable: 3 members are available
2022-03-07T19:20:58Z Upgradeable=True AsExpected: All is well

Events included more detail:

$ oc -n openshift-cluster-version get events
...
91s   Warning   PreconditionsFailed   clusterversion/version   preconditions failed for payload loaded version="4.10.2" image="quay.io/openshift-release-dev/ocp-release@sha256:b807908afe1b7b25a8727f092cc11a104bf3996908a80f035d079b014b4b96b6" failures=Precondition "EtcdRecentBackup" failed because of "ControllerStarted":

I'm attaching CVO logs that I streamed down from the outgoing CVO pod:

$ oc -n openshift-cluster-version logs -f cluster-version-operator-78d979db4-gsgvw >cvo.log
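Since the events carry the full precondition detail, filtering on reason=PreconditionsFailed recovers the real blocker even while the conditions show only the generic string. A hypothetical client-side sketch over a sample event shaped like the output above:

```python
# Sample event list mirroring the PreconditionsFailed warning shown above.
events = [
    {"type": "Warning",
     "reason": "PreconditionsFailed",
     "message": ('preconditions failed for payload loaded version="4.10.2" '
                 'failures=Precondition "EtcdRecentBackup" failed because of '
                 '"ControllerStarted": ')},
    {"type": "Normal",
     "reason": "RetrievePayload",
     "message": "retrieving payload"},
]

def precondition_failures(events):
    """Pull the detailed failure text out of PreconditionsFailed warning events."""
    return [e["message"] for e in events
            if e["type"] == "Warning" and e["reason"] == "PreconditionsFailed"]

for msg in precondition_failures(events):
    print(msg)
```

The second, Normal event here is an invented placeholder just to show the filter discarding non-matching entries.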
Moving the Version field back, although I'll pick 4.2.0. The CVO has rolled messages up under "it may not be safe to apply this update" since 2019 [1].

[1]: https://github.com/openshift/cluster-version-operator/blame/77a24f77b8f6e9975b5f41a787d6368ff609e391/pkg/payload/task.go#L243
Listing the situations we have observed that cause "it may not be safe to apply this update":

1. Resource deletion is in progress at the time of the upgrade (this sometimes blocks upgrades from 4.9 to 4.10 because [1] has not been reintroduced)
2. A machine config pool is updating at the time of the upgrade
3. Etcd is backing up at the time of the upgrade
4. The target image is unsigned

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1822752
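If the underlying messages were passed through, a client could additionally map precondition names to plain-language hints about why the update is paused. A purely illustrative sketch: only "EtcdRecentBackup" is confirmed by this bug; the second key is a hypothetical name, not a real CVO precondition:

```python
# Illustrative lookup, not part of any OpenShift API.
EXPLANATIONS = {
    # Observed in this bug report:
    "EtcdRecentBackup": ("etcd is still taking its automatic pre-update backup; "
                         "the update proceeds once the backup completes"),
    # Hypothetical entry for the unsigned-image case (name is invented):
    "ReleaseImageSignature": "the target release image signature could not be verified",
}

def explain(precondition):
    """Translate a precondition name into a plain-language hint."""
    return EXPLANATIONS.get(precondition, "unrecognized precondition: " + precondition)

print(explain("EtcdRecentBackup"))
```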
Although "EtcdRecentBackup" is different from our other preconditions - it's an ongoing automated process that's expected to complete and resolve the precondition failure - the message "it may not be safe to apply this update" still applies until the etcd backup has completed.

Closing this as a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=2065507, which enhances `oc adm upgrade` to provide amplifying information about upgrade status.

*** This bug has been marked as a duplicate of bug 2065507 ***