Bug 1903820

Summary: Performance Profile does not update status when MCP goes into degraded state
Product: OpenShift Container Platform
Reporter: Denys Shchedrivyi <dshchedr>
Component: Performance Addon Operator
Assignee: Martin Sivák <msivak>
Status: CLOSED CURRENTRELEASE
QA Contact: Gowrishankar Rajaiyan <grajaiya>
Severity: medium
Priority: unspecified
Version: 4.6
CC: aos-bugs, grajaiya, mniranja, vlaad
Target Release: 4.7.0
Fixed In Version: performance-addon-operator-container-v4.7.0-18
Doc Type: Bug Fix
Doc Text:
Cause: The operator watched only the machine config pool it owned, but no machine config pool is owned by the performance profile. Consequence: The performance profile status was not updated to reflect the machine config pool state. Fix: Watch the machine config pools referred to by the performance profile's node selector or machine config pool selector. Result: The state of the selected machine config pool is now reflected under the performance profile status.
Last Closed: 2022-08-26 13:56:59 UTC
Type: Bug
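The fix described in the Doc Text (watch the machine config pools matched by the profile's node selector or machine config pool selector, instead of a pool the profile "owns") can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the operator's actual code: `labelsMatch` and `poolSelected` are hypothetical names, and real Kubernetes selectors also support match expressions, which this plain subset match ignores.

```go
package main

import "fmt"

// labelsMatch reports whether every key/value pair in selector also appears in
// labels -- a simplified subset match, similar to Kubernetes matchLabels.
func labelsMatch(selector, labels map[string]string) bool {
	for k, v := range selector {
		if labels[k] != v {
			return false
		}
	}
	return true
}

// poolSelected sketches the fixed behaviour: a machine config pool is watched
// when the profile's machineConfigPoolSelector matches the pool's labels, or
// the profile's nodeSelector matches the pool's node selector. Empty selectors
// select nothing in this sketch.
func poolSelected(nodeSel, mcpSel, poolLabels, poolNodeSel map[string]string) bool {
	if len(mcpSel) > 0 && labelsMatch(mcpSel, poolLabels) {
		return true
	}
	return len(nodeSel) > 0 && labelsMatch(nodeSel, poolNodeSel)
}

func main() {
	poolLabels := map[string]string{"machineconfiguration.openshift.io/role": "worker-cnf"}
	mcpSel := map[string]string{"machineconfiguration.openshift.io/role": "worker-cnf"}
	fmt.Println(poolSelected(nil, mcpSel, poolLabels, nil)) // prints: true
}
```

With the pre-fix behaviour, which watched only an owned pool that never exists, no such check ever ran for worker-cnf, so the profile's Degraded condition stayed False as shown in the description below.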

Description Denys Shchedrivyi 2020-12-02 21:31:50 UTC
Description of problem:
 When the MCP is degraded, the Performance Profile does not catch it and does not show the error message:

# oc get mcp
NAME         CONFIG                                                 UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
worker-cnf   rendered-worker-cnf-f3ee0715b65b7bdddb637e3bbf640ce3   False     True       True       1              0                   0                     1                      174m

# oc describe mcp worker-cnf
.
    Last Transition Time:  2020-12-02T21:13:37Z
    Message:               Node cnf-den-nngpc-worker-0-4qv7w is reporting: "can't reconcile config rendered-worker-cnf-f3ee0715b65b7bdddb637e3bbf640ce3 with rendered-worker-cnf-eea4ca47f8b99a156057f933321e2e25: ignition disks section contains changes: unreconcilable"
    Reason:                1 nodes are reporting degraded status on sync
    Status:                True
    Type:                  NodeDegraded
    Last Transition Time:  2020-12-02T21:13:37Z
    Message:               
    Reason:                
    Status:                True
    Type:                  Degraded


# oc describe node cnf-den-nngpc-worker-0-4qv7w
.
Annotations:
       .
       machineconfiguration.openshift.io/reason:
          can't reconcile config rendered-worker-cnf-f3ee0715b65b7bdddb637e3bbf640ce3 with rendered-worker-cnf-eea4ca47f8b99a156057f933321e2e25: ign...
       machineconfiguration.openshift.io/state: Unreconcilable



# oc describe performanceprofile manual
Status:
  Conditions:
.
    Status:                False
    Type:                  Degraded



Version-Release number of selected component (if applicable):
4.6, 4.7


Steps to Reproduce:
1. Set the MCP into a degraded state (for example, by creating an invalid MachineConfig)
2. Check the MCP, the node, and the performance profile


Actual results:
 The MCP is in a degraded state and the node has the reason in its annotations, but the Performance Profile shows Degraded=False


Expected results:
 The Performance Profile reflects the correct MCP status

Comment 1 Denys Shchedrivyi 2020-12-02 22:42:53 UTC
After manually updating the profile (removing some unnecessary lines), the degraded message appeared:

Initially I have Degraded=False:
># oc get performanceprofile manual -o jsonpath={.status.conditions[3]}
>{"lastHeartbeatTime":"2020-12-02T22:10:19Z","lastTransitionTime":"2020-12-02T22:10:19Z","status":"False","type":"Degraded"}

Editing the profile and removing some unnecessary stuff:
># oc edit performanceprofile
>performanceprofile.performance.openshift.io/manual edited

The degraded message appeared in the profile:
># oc get performanceprofile manual -o jsonpath={.status.conditions[3]}
>{"lastHeartbeatTime":"2020-12-02T22:29:52Z","lastTransitionTime":"2020-12-02T22:29:52Z","message":"Machine config pool worker-cnf Degraded Reason: 1 nodes are reporting degraded status on sync.\nMachine config pool worker-cnf Degraded Message: Node cnf-den-nngpc-worker-0-4qv7w is reporting: \"can't reconcile config rendered-worker-cnf-f3ee0715b65b7bdddb637e3bbf640ce3 with rendered-worker-cnf-eea4ca47f8b99a156057f933321e2e25: ignition disks section contains changes: unreconcilable\".\n","reason":"MCPDegraded","status":"True","type":"Degraded"}

Comment 3 Denys Shchedrivyi 2021-01-07 20:49:46 UTC
 The status in the profile is updated, but for some reason the message is duplicated:

> # oc describe performanceprofile
>.
>    Message:               Machine config pool worker-cnf Degraded Reason: 1 nodes are reporting degraded status on sync.
>Machine config pool worker-cnf Degraded Message: Node ocp47rtfix-worker-0.demo.lab.den is reporting: "can't reconcile config rendered-worker-cnf-66805b1fc1b445e249355a7955da4e9b with rendered-worker-cnf-8102e3c868d464cb299e26b9e45c307a: ignition disks section contains changes: unreconcilable".
>Machine config pool worker-cnf Degraded Reason: 1 nodes are reporting degraded status on sync.
>Machine config pool worker-cnf Degraded Message: Node ocp47rtfix-worker-0.demo.lab.den is reporting: "can't reconcile config rendered-worker-cnf-66805b1fc1b445e249355a7955da4e9b with rendered-worker-cnf-8102e3c868d464cb299e26b9e45c307a: ignition disks section contains changes: unreconcilable".
>    Reason:       MCPDegraded
>    Status:       True
>    Type:         Degraded

Comment 4 Artyom 2021-01-10 11:21:48 UTC
Hm, now I remember why we had the duplicate-removal method for MCPs; I will create an additional PR.
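The duplicate removal mentioned in Comment 4 could look something like the sketch below: when the same degraded pool is collected twice (for instance, matched once via the node selector and once via the machine config pool selector), its condition message would otherwise be appended twice, producing exactly the duplicated output in Comment 3. `dedupeMessages` is a hypothetical name, not the operator's actual function.

```go
package main

import "fmt"

// dedupeMessages drops repeated condition messages while preserving order --
// the kind of cleanup needed when the same degraded pool is collected twice.
func dedupeMessages(msgs []string) []string {
	seen := make(map[string]bool)
	var out []string
	for _, m := range msgs {
		if !seen[m] {
			seen[m] = true
			out = append(out, m)
		}
	}
	return out
}

func main() {
	msgs := []string{
		"Machine config pool worker-cnf Degraded Reason: 1 nodes are reporting degraded status on sync.",
		"Machine config pool worker-cnf Degraded Reason: 1 nodes are reporting degraded status on sync.",
	}
	fmt.Println(len(dedupeMessages(msgs))) // prints: 1
}
```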

Comment 5 Denys Shchedrivyi 2021-01-22 19:42:44 UTC
Verified on performance-addon-operator-container-v4.7.0-24