Bug 1903820 - Performance Profile does not update status when MCP goes into degraded state
Summary: Performance Profile does not update status when MCP goes into degraded state
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Performance Addon Operator
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 4.7.0
Assignee: Martin Sivák
QA Contact: Gowrishankar Rajaiyan
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-12-02 21:31 UTC by Denys Shchedrivyi
Modified: 2022-08-26 13:56 UTC
CC List: 4 users

Fixed In Version: performance-addon-operator-container-v4.7.0-18
Doc Type: Bug Fix
Doc Text:
Cause: The operator watched only the machine config pool owned by the performance profile, but no machine config pool is owned by the performance profile. Consequence: The performance profile status was not updated with the machine config pool state. Fix: Watch the machine config pools referred to by the performance profile's node selector or machine config pool selector. Result: The performance profile status now reflects the state of the selected machine config pools.
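The fix described above amounts to matching pools by selector instead of by ownership. A minimal sketch in Go of that selection logic, using hypothetical simplified types (the real operator uses the Kubernetes API types and controller-runtime watches, not these structs):

```go
package main

import "fmt"

// Hypothetical, simplified stand-ins for the real PerformanceProfile
// and MachineConfigPool API objects.
type PerformanceProfile struct {
	Name                      string
	MachineConfigPoolSelector map[string]string // spec.machineConfigPoolSelector
	NodeSelector              map[string]string // spec.nodeSelector
}

type MachineConfigPool struct {
	Name         string
	Labels       map[string]string // labels on the MCP object
	NodeSelector map[string]string // the MCP's own node selector
}

// matches reports whether every key/value pair in selector is present in
// labels, i.e. a plain equality-based label selector. The comma-ok lookup
// matters because node-role labels typically have empty string values.
func matches(selector, labels map[string]string) bool {
	for k, v := range selector {
		got, ok := labels[k]
		if !ok || got != v {
			return false
		}
	}
	return true
}

// poolsForProfile returns the MCPs the profile should watch: any pool whose
// labels match the profile's machineConfigPoolSelector, or whose node
// selector matches the profile's nodeSelector. This mirrors the fix
// (watch selected pools, not only owned ones) but is a sketch, not the
// operator's actual code.
func poolsForProfile(p PerformanceProfile, pools []MachineConfigPool) []MachineConfigPool {
	var out []MachineConfigPool
	for _, mcp := range pools {
		if (len(p.MachineConfigPoolSelector) > 0 && matches(p.MachineConfigPoolSelector, mcp.Labels)) ||
			(len(p.NodeSelector) > 0 && matches(p.NodeSelector, mcp.NodeSelector)) {
			out = append(out, mcp)
		}
	}
	return out
}

func main() {
	profile := PerformanceProfile{
		Name:         "manual",
		NodeSelector: map[string]string{"node-role.kubernetes.io/worker-cnf": ""},
	}
	pools := []MachineConfigPool{
		{Name: "worker", NodeSelector: map[string]string{"node-role.kubernetes.io/worker": ""}},
		{Name: "worker-cnf", NodeSelector: map[string]string{"node-role.kubernetes.io/worker-cnf": ""}},
	}
	for _, mcp := range poolsForProfile(profile, pools) {
		fmt.Println(mcp.Name) // only worker-cnf matches
	}
}
```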
Clone Of:
Environment:
Last Closed: 2022-08-26 13:56:59 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github openshift-kni/performance-addon-operators/blob/master/functests/3_performance_status/status.go#L94 0 None None None 2021-07-02 10:40:13 UTC
Github openshift-kni performance-addon-operators pull 479 0 None closed Bug 1903820: watch machine config pools not owned by the performance profile 2021-02-19 21:15:11 UTC

Description Denys Shchedrivyi 2020-12-02 21:31:50 UTC
Description of problem:
 When the MCP is degraded, the performance profile does not catch it and does not show the error message:

# oc get mcp
NAME         CONFIG                                                 UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
worker-cnf   rendered-worker-cnf-f3ee0715b65b7bdddb637e3bbf640ce3   False     True       True       1              0                   0                     1                      174m

# oc describe mcp worker-cnf
.
    Last Transition Time:  2020-12-02T21:13:37Z
    Message:               Node cnf-den-nngpc-worker-0-4qv7w is reporting: "can't reconcile config rendered-worker-cnf-f3ee0715b65b7bdddb637e3bbf640ce3 with rendered-worker-cnf-eea4ca47f8b99a156057f933321e2e25: ignition disks section contains changes: unreconcilable"
    Reason:                1 nodes are reporting degraded status on sync
    Status:                True
    Type:                  NodeDegraded
    Last Transition Time:  2020-12-02T21:13:37Z
    Message:               
    Reason:                
    Status:                True
    Type:                  Degraded


# oc describe node cnf-den-nngpc-worker-0-4qv7w
.
Annotations:
       .
       machineconfiguration.openshift.io/reason:
          can't reconcile config rendered-worker-cnf-f3ee0715b65b7bdddb637e3bbf640ce3 with rendered-worker-cnf-eea4ca47f8b99a156057f933321e2e25: ign...
       machineconfiguration.openshift.io/state: Unreconcilable



# oc describe performanceprofile manual
Status:
  Conditions:
.
    Status:                False
    Type:                  Degraded



Version-Release number of selected component (if applicable):
4.6, 4.7


Steps to Reproduce:
1. Put the MCP into a degraded state (for example, by creating an invalid MC)
2. Check MCP, node and performance profile
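A hypothetical MachineConfig for step 1: the machine-config operator cannot reconcile changes to the ignition disks section in place, so the node reports "unreconcilable" and the pool goes Degraded, matching the error quoted above (the name, role label, and device here are illustrative):

```yaml
# Illustrative only: applying a disks change to an existing node is
# unreconcilable, which degrades the worker-cnf pool.
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 99-worker-cnf-bad-disk
  labels:
    machineconfiguration.openshift.io/role: worker-cnf
spec:
  config:
    ignition:
      version: 3.1.0
    storage:
      disks:
        - device: /dev/vdb
          wipeTable: true
```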


Actual results:
 The MCP is in a degraded state and the node has the reason in its annotations, but the performance profile shows Degraded=False


Expected results:
 The performance profile reflects the correct MCP status

Comment 1 Denys Shchedrivyi 2020-12-02 22:42:53 UTC
After manually updating the profile (removing some unnecessary lines), the degraded message appeared:

Initially I have Degraded=False:
># oc get performanceprofile manual -o jsonpath={.status.conditions[3]}
>{"lastHeartbeatTime":"2020-12-02T22:10:19Z","lastTransitionTime":"2020-12-02T22:10:19Z","status":"False","type":"Degraded"}

editing the profile and removing some unnecessary stuff:
># oc edit performanceprofile
>performanceprofile.performance.openshift.io/manual edited

the Degraded message appeared in the profile:
># oc get performanceprofile manual -o jsonpath={.status.conditions[3]}
>{"lastHeartbeatTime":"2020-12-02T22:29:52Z","lastTransitionTime":"2020-12-02T22:29:52Z","message":"Machine config pool worker-cnf Degraded Reason: 1 nodes are reporting degraded status on sync.\nMachine config pool worker-cnf Degraded Message: Node cnf-den-nngpc-worker-0-4qv7w is reporting: \"can't reconcile config rendered-worker-cnf-f3ee0715b65b7bdddb637e3bbf640ce3 with rendered-worker-cnf-eea4ca47f8b99a156057f933321e2e25: ignition disks section contains changes: unreconcilable\".\n","reason":"MCPDegraded","status":"True","type":"Degraded"}

Comment 3 Denys Shchedrivyi 2021-01-07 20:49:46 UTC
 The status in the profile is updated, but for some reason the message is duplicated:

> # oc describe performanceprofile
>.
>    Message:               Machine config pool worker-cnf Degraded Reason: 1 nodes are reporting degraded status on sync.
>Machine config pool worker-cnf Degraded Message: Node ocp47rtfix-worker-0.demo.lab.den is reporting: "can't reconcile config rendered-worker-cnf-66805b1fc1b445e249355a7955da4e9b with rendered-worker-cnf-8102e3c868d464cb299e26b9e45c307a: ignition disks section contains changes: unreconcilable".
>Machine config pool worker-cnf Degraded Reason: 1 nodes are reporting degraded status on sync.
>Machine config pool worker-cnf Degraded Message: Node ocp47rtfix-worker-0.demo.lab.den is reporting: "can't reconcile config rendered-worker-cnf-66805b1fc1b445e249355a7955da4e9b with rendered-worker-cnf-8102e3c868d464cb299e26b9e45c307a: ignition disks section contains changes: unreconcilable".
>    Reason:       MCPDegraded
>    Status:       True
>    Type:         Degraded

Comment 4 Artyom 2021-01-10 11:21:48 UTC
Hm, now I remember why we had the duplicate-removal method for MCPs; I will create an additional PR.
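The duplication in comment 3 suggests the same pool's conditions are collected more than once (e.g. matched by both selectors). A minimal Go sketch of the kind of duplicate-removal mentioned here, with illustrative names rather than the operator's actual code:

```go
package main

import (
	"fmt"
	"strings"
)

// dedupMessages drops repeated condition messages while preserving the
// order of first appearance, so a pool matched by both the node selector
// and the machine config pool selector is reported only once.
func dedupMessages(msgs []string) []string {
	seen := make(map[string]bool)
	var out []string
	for _, m := range msgs {
		if !seen[m] {
			seen[m] = true
			out = append(out, m)
		}
	}
	return out
}

func main() {
	// The same Degraded reason collected twice, as in comment 3.
	msgs := []string{
		"Machine config pool worker-cnf Degraded Reason: 1 nodes are reporting degraded status on sync.",
		"Machine config pool worker-cnf Degraded Reason: 1 nodes are reporting degraded status on sync.",
	}
	fmt.Println(strings.Join(dedupMessages(msgs), "\n")) // printed once
}
```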

Comment 5 Denys Shchedrivyi 2021-01-22 19:42:44 UTC
Verified on performance-addon-operator-container-v4.7.0-24

