Bug 1815468 - Upgrade fail due to mcp master meet with "unexpected on-disk state validating against rendered-master-563ffe5acd3f458dc0bce06714073cc0"
Status: CLOSED DUPLICATE of bug 1815203
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Machine Config Operator
Version: 4.3.0
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: ---
Assignee: Antonio Murdaca
QA Contact: Michael Nguyen
Reported: 2020-03-20 11:13 UTC by MinLi
Modified: 2020-03-20 11:20 UTC (History)

Last Closed: 2020-03-20 11:20:01 UTC

Description MinLi 2020-03-20 11:13:31 UTC
Description of problem:
The upgrade fails because the nodes in the master MachineConfigPool (mcp) report "unexpected on-disk state validating against rendered-master-563ffe5acd3f458dc0bce06714073cc0".

Version-Release number of selected component (if applicable):
Upgrade from 4.3.5 to 4.4.0-0.nightly-2020-03-20-041725

How reproducible:
Always

Steps to Reproduce:
1. Set up an OCP cluster at version 4.3.5 via a Flexy job.
2. Update the cluster to 4.4.0-0.nightly-2020-03-20-041725:
$ oc adm upgrade --to-image='registry.svc.ci.openshift.org/ocp/release:4.4.0-0.nightly-2020-03-20-041725' --allow-explicit-upgrade --force

Actual results:
2. The upgrade fails.

Expected results:
2. The upgrade succeeds.

Additional info:
$ oc get clusterversion 
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.3.5     True        True          149m    Unable to apply 4.4.0-0.nightly-2020-03-20-041725: the cluster operator machine-config has not yet successfully rolled out


$ oc get node 
NAME                             STATUS   ROLES    AGE     VERSION
upg-0320-qjwq9-compute-0         Ready    worker   6h27m   v1.17.1
upg-0320-qjwq9-compute-1         Ready    worker   6h26m   v1.17.1
upg-0320-qjwq9-compute-2         Ready    worker   6h27m   v1.17.1
upg-0320-qjwq9-compute-3         Ready    worker   6h27m   v1.17.1
upg-0320-qjwq9-control-plane-0   Ready    master   6h46m   v1.16.2
upg-0320-qjwq9-control-plane-1   Ready    master   6h46m   v1.16.2
upg-0320-qjwq9-control-plane-2   Ready    master   6h46m   v1.16.2
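
The node listing above shows a kubelet version skew between roles: the workers report v1.17.1 while the masters still report v1.16.2. As a rough aid for spotting this in larger clusters, here is a minimal sketch that groups kubelet versions by role. It assumes the tabular `oc get node` output has been saved as plain text and that no column value contains spaces; the helper name `versions_by_role` is hypothetical and not part of any OpenShift tooling.

```python
from collections import defaultdict
from typing import Dict, Set


def versions_by_role(table: str) -> Dict[str, Set[str]]:
    """Group kubelet versions by node role from `oc get node` text output."""
    lines = [ln for ln in table.strip().splitlines() if ln.strip()]
    header = lines[0].split()
    # Locate columns by header name instead of hard-coding positions.
    role_idx = header.index("ROLES")
    ver_idx = header.index("VERSION")
    grouped: Dict[str, Set[str]] = defaultdict(set)
    for ln in lines[1:]:
        cols = ln.split()
        grouped[cols[role_idx]].add(cols[ver_idx])
    return dict(grouped)


# Sample copied from the output in this report.
sample = """
NAME                             STATUS   ROLES    AGE     VERSION
upg-0320-qjwq9-compute-0         Ready    worker   6h27m   v1.17.1
upg-0320-qjwq9-compute-1         Ready    worker   6h26m   v1.17.1
upg-0320-qjwq9-compute-2         Ready    worker   6h27m   v1.17.1
upg-0320-qjwq9-compute-3         Ready    worker   6h27m   v1.17.1
upg-0320-qjwq9-control-plane-0   Ready    master   6h46m   v1.16.2
upg-0320-qjwq9-control-plane-1   Ready    master   6h46m   v1.16.2
upg-0320-qjwq9-control-plane-2   Ready    master   6h46m   v1.16.2
"""

skew = versions_by_role(sample)
for role, versions in sorted(skew.items()):
    print(role, sorted(versions))
```

A role whose version differs from the others has likely not finished rolling out, which here matches the master pool reporting `updatedMachineCount: 0`.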


$ oc get co machine-config -o yaml 
apiVersion: config.openshift.io/v1
kind: ClusterOperator
metadata:
  creationTimestamp: "2020-03-20T02:58:23Z"
  generation: 1
  name: machine-config
  resourceVersion: "235190"
  selfLink: /apis/config.openshift.io/v1/clusteroperators/machine-config
  uid: de617b8f-3654-4221-9e91-680073f1330a
spec: {}
status:
  conditions:
  - lastTransitionTime: "2020-03-20T08:04:22Z"
    message: Cluster not available for 4.4.0-0.nightly-2020-03-20-041725
    status: "False"
    type: Available
  - lastTransitionTime: "2020-03-20T07:50:19Z"
    message: Working towards 4.4.0-0.nightly-2020-03-20-041725
    status: "True"
    type: Progressing
  - lastTransitionTime: "2020-03-20T08:04:22Z"
    message: 'Unable to apply 4.4.0-0.nightly-2020-03-20-041725: timed out waiting
      for the condition during syncRequiredMachineConfigPools: pool master has not
      progressed to latest configuration: controller version mismatch for rendered-master-563ffe5acd3f458dc0bce06714073cc0
      expected d5d9a488c1e0e19e1d3044bd0fac90096b0224d6 has d5599de7a6b86ec385e0f9c849e93977fcb4eeb8,
      retrying'
    reason: RequiredPoolsFailed
    status: "True"
    type: Degraded
  - lastTransitionTime: "2020-03-20T03:05:40Z"
    reason: AsExpected
    status: "True"
    type: Upgradeable
  extension: {}
  relatedObjects:
  - group: ""
    name: openshift-machine-config-operator
    resource: namespaces
  - group: machineconfiguration.openshift.io
    name: master
    resource: machineconfigpools
  - group: machineconfiguration.openshift.io
    name: worker
    resource: machineconfigpools
  - group: machineconfiguration.openshift.io
    name: machine-config-controller
    resource: controllerconfigs
  versions:
  - name: operator
    version: 4.3.5
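
The Degraded condition above embeds three identifiers in free text: the rendered config name, the expected controller version, and the version the config actually carries. A minimal sketch for pulling them out follows. It assumes the message has already been extracted as a single string (a YAML parser joins the folded lines with spaces), and the regular expression is derived only from the message format seen in this report, so it may not match other releases. The names `Mismatch` and `parse_mismatch` are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class Mismatch(NamedTuple):
    rendered_config: str
    expected: str
    actual: str


# Pattern based solely on the Degraded message shown in this report.
_PATTERN = re.compile(
    r"controller version mismatch for (\S+)\s+"
    r"expected\s+([0-9a-f]+)\s+has\s+([0-9a-f]+)"
)


def parse_mismatch(message: str) -> Optional[Mismatch]:
    """Return the config name and both controller versions, or None."""
    m = _PATTERN.search(message)
    return Mismatch(*m.groups()) if m else None


# Message copied from the ClusterOperator status above, unfolded to one line.
message = (
    "Unable to apply 4.4.0-0.nightly-2020-03-20-041725: timed out waiting "
    "for the condition during syncRequiredMachineConfigPools: pool master has not "
    "progressed to latest configuration: controller version mismatch for "
    "rendered-master-563ffe5acd3f458dc0bce06714073cc0 "
    "expected d5d9a488c1e0e19e1d3044bd0fac90096b0224d6 "
    "has d5599de7a6b86ec385e0f9c849e93977fcb4eeb8, retrying"
)

result = parse_mismatch(message)
if result is not None:
    print("config:  ", result.rendered_config)
    print("expected:", result.expected)
    print("actual:  ", result.actual)
```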

$ oc get mcp master -o yaml 
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  creationTimestamp: "2020-03-20T02:58:28Z"
  generation: 3
  labels:
    machineconfiguration.openshift.io/mco-built-in: ""
    operator.machineconfiguration.openshift.io/required-for-upgrade: ""
  name: master
  resourceVersion: "156053"
  selfLink: /apis/machineconfiguration.openshift.io/v1/machineconfigpools/master
  uid: b75f29e4-1de6-42f5-9553-d885901003a8
spec:
  configuration:
    name: rendered-master-baa94c7c521e2124f7fc245290984329
    source:
    - apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      name: 00-master
    - apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      name: 01-master-container-runtime
    - apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      name: 01-master-kubelet
    - apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      name: 99-master-b75f29e4-1de6-42f5-9553-d885901003a8-registries
    - apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      name: 99-master-fips
    - apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      name: 99-master-ssh
  machineConfigSelector:
    matchLabels:
      machineconfiguration.openshift.io/role: master
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/master: ""
  paused: false
status:
  conditions:
  - lastTransitionTime: "2020-03-20T02:59:20Z"
    message: ""
    reason: ""
    status: "False"
    type: RenderDegraded
  - lastTransitionTime: "2020-03-20T07:51:05Z"
    message: ""
    reason: ""
    status: "False"
    type: Updated
  - lastTransitionTime: "2020-03-20T07:51:05Z"
    message: All nodes are updating to rendered-master-563ffe5acd3f458dc0bce06714073cc0
    reason: ""
    status: "True"
    type: Updating
  - lastTransitionTime: "2020-03-20T07:51:05Z"
    message: ""
    reason: ""
    status: "True"
    type: Degraded
  - lastTransitionTime: "2020-03-20T07:51:05Z"
    message: 'Node upg-0320-qjwq9-control-plane-0 is reporting: "unexpected on-disk
      state validating against rendered-master-563ffe5acd3f458dc0bce06714073cc0",
      Node upg-0320-qjwq9-control-plane-1 is reporting: "unexpected on-disk state
      validating against rendered-master-563ffe5acd3f458dc0bce06714073cc0", Node upg-0320-qjwq9-control-plane-2
      is reporting: "unexpected on-disk state validating against rendered-master-563ffe5acd3f458dc0bce06714073cc0"'
    reason: 3 nodes are reporting degraded status on sync
    status: "True"
    type: NodeDegraded
  configuration:
    name: rendered-master-563ffe5acd3f458dc0bce06714073cc0
    source:
    - apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      name: 00-master
    - apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      name: 01-master-container-runtime
    - apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      name: 01-master-kubelet
    - apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      name: 99-master-b75f29e4-1de6-42f5-9553-d885901003a8-registries
    - apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      name: 99-master-fips
    - apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      name: 99-master-ssh
  degradedMachineCount: 3
  machineCount: 3
  observedGeneration: 3
  readyMachineCount: 0
  unavailableMachineCount: 3
  updatedMachineCount: 0
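
The NodeDegraded condition above packs one entry per node into a single message string. A minimal sketch for splitting it into (node, reason) pairs follows. It assumes the message has been unfolded to one line, and the pattern is inferred only from the wording in this report; `degraded_nodes` is a hypothetical helper. The sample message is reconstructed from the output above, not fetched from a cluster.

```python
import re
from typing import List, Tuple

# Matches entries of the form: Node <name> is reporting: "<reason>"
_NODE_RE = re.compile(r'Node (\S+)\s+is reporting: "([^"]*)"')


def degraded_nodes(message: str) -> List[Tuple[str, str]]:
    """Return (node name, reported reason) pairs from a NodeDegraded message."""
    return _NODE_RE.findall(message)


config = "rendered-master-563ffe5acd3f458dc0bce06714073cc0"
message = ", ".join(
    f"Node upg-0320-qjwq9-control-plane-{i} is reporting: "
    f'"unexpected on-disk state validating against {config}"'
    for i in range(3)
)

degraded = degraded_nodes(message)
for node, reason in degraded:
    print(node, "->", reason)
```

In this report all three masters name the same rendered config, `rendered-master-563ffe5acd3f458dc0bce06714073cc0`, which is the config listed under `status.configuration`, while `spec.configuration` already points at `rendered-master-baa94c7c521e2124f7fc245290984329`.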

Comment 1 MinLi 2020-03-20 11:20:01 UTC

*** This bug has been marked as a duplicate of bug 1815203 ***

