Description of problem:
During the installation of a new cluster, the kube-scheduler revision-pruner pods fail several times during control plane creation.

Version-Release number of selected component (if applicable):
4.9.4

How reproducible:
Reproducibility seems to be low.

Steps to Reproduce:
N/A

Actual results:
revision-pruner pods fail

Expected results:
revision-pruner pods complete

Additional info:
~~~
$ omg get pods -o wide
NAME   READY   STATUS   RESTARTS   AGE   IP   NODE
installer-5-ip-10-0-15-64.us-west-2.compute.internal   0/1   Succeeded   0   9h34m   10.129.0.16   ip-10-0-15-64.us-west-2.compute.internal
installer-6-ip-10-0-15-64.us-west-2.compute.internal   0/1   Succeeded   0   9h32m   10.129.0.24   ip-10-0-15-64.us-west-2.compute.internal
installer-6-ip-10-0-16-77.us-west-2.compute.internal   0/1   Succeeded   0   9h31m   10.128.0.42   ip-10-0-16-77.us-west-2.compute.internal
installer-7-ip-10-0-16-77.us-west-2.compute.internal   0/1   Succeeded   0   9h31m   10.128.0.43   ip-10-0-16-77.us-west-2.compute.internal
installer-7-ip-10-0-17-153.us-west-2.compute.internal   0/1   Succeeded   0   9h30m   10.130.0.27   ip-10-0-17-153.us-west-2.compute.internal
installer-8-ip-10-0-15-64.us-west-2.compute.internal   0/1   Succeeded   0   9h28m   10.129.0.32   ip-10-0-15-64.us-west-2.compute.internal
installer-8-ip-10-0-16-77.us-west-2.compute.internal   0/1   Succeeded   0   9h27m   10.128.0.52   ip-10-0-16-77.us-west-2.compute.internal
installer-8-ip-10-0-17-153.us-west-2.compute.internal   0/1   Succeeded   0   9h30m   10.130.0.28   ip-10-0-17-153.us-west-2.compute.internal
openshift-kube-scheduler-ip-10-0-15-64.us-west-2.compute.internal   3/3   Running   0   9h28m   10.0.15.64   ip-10-0-15-64.us-west-2.compute.internal
openshift-kube-scheduler-ip-10-0-16-77.us-west-2.compute.internal   3/3   Running   0   9h27m   10.0.16.77   ip-10-0-16-77.us-west-2.compute.internal
openshift-kube-scheduler-ip-10-0-17-153.us-west-2.compute.internal   3/3   Running   0   9h29m   10.0.17.153   ip-10-0-17-153.us-west-2.compute.internal
revision-pruner-6-ip-10-0-15-64.us-west-2.compute.internal   0/1   Succeeded   0   9h32m   10.129.0.23   ip-10-0-15-64.us-west-2.compute.internal
revision-pruner-6-ip-10-0-16-77.us-west-2.compute.internal   0/1   Failed   0   9h32m   10.128.0.39   ip-10-0-16-77.us-west-2.compute.internal
revision-pruner-6-ip-10-0-17-153.us-west-2.compute.internal   0/1   Failed   0   9h32m   10.130.0.21   ip-10-0-17-153.us-west-2.compute.internal
revision-pruner-7-ip-10-0-15-64.us-west-2.compute.internal   0/1   Succeeded   0   9h31m   10.129.0.25   ip-10-0-15-64.us-west-2.compute.internal
revision-pruner-7-ip-10-0-16-77.us-west-2.compute.internal   0/1   Succeeded   0   9h31m   10.128.0.44   ip-10-0-16-77.us-west-2.compute.internal
revision-pruner-7-ip-10-0-17-153.us-west-2.compute.internal   0/1   Failed   0   9h31m   10.130.0.24   ip-10-0-17-153.us-west-2.compute.internal
revision-pruner-8-ip-10-0-15-64.us-west-2.compute.internal   0/1   Succeeded   0   9h30m   10.129.0.28   ip-10-0-15-64.us-west-2.compute.internal
revision-pruner-8-ip-10-0-16-77.us-west-2.compute.internal   0/1   Succeeded   0   9h30m   10.128.0.46   ip-10-0-16-77.us-west-2.compute.internal
revision-pruner-8-ip-10-0-17-153.us-west-2.compute.internal   0/1   Succeeded   0   9h30m   10.130.0.29   ip-10-0-17-153.us-west-2.compute.internal
~~~

Logs:
~~~
$ omg logs revision-pruner-7-ip-10-0-17-153.us-west-2.compute.internal
/cases/03072554/0020-must-gather.tar.gz/must-gather.local.3799198186116137965/quay-io-openshift-release-dev-ocp-v4-0-art-dev-sha256-ef4d0df32283c1aae39942b149010b23c98659d54e8845ea0fcfffc36ea99f4e/namespaces/openshift-kube-scheduler/pods/revision-pruner-7-ip-10-0-17-153.us-west-2.compute.internal/pruner/pruner/logs/current.log
2021-11-03T04:25:43.102103535Z I1103 04:25:43.101919 1 cmd.go:41] &{<nil> true {false} prune true map[cert-dir:0xc0005fe960 max-eligible-revision:0xc0005fe6e0 protected-revisions:0xc0005fe780 resource-dir:0xc0005fe820 static-pod-name:0xc0005fe8c0 v:0xc00032cc80] [0xc00032cc80 0xc0005fe6e0 0xc0005fe780 0xc0005fe820 0xc0005fe960 0xc0005fe8c0] [] map[add-dir-header:0xc00032c5a0 alsologtostderr:0xc00032c640 cert-dir:0xc0005fe960 help:0xc0005fefa0 log-backtrace-at:0xc00032c6e0 log-dir:0xc00032c780 log-file:0xc00032c820 log-file-max-size:0xc00032c8c0 log-flush-frequency:0xc000719680 logtostderr:0xc00032c960 max-eligible-revision:0xc0005fe6e0 one-output:0xc00032ca00 protected-revisions:0xc0005fe780 resource-dir:0xc0005fe820 skip-headers:0xc00032caa0 skip-log-headers:0xc00032cb40 static-pod-name:0xc0005fe8c0 stderrthreshold:0xc00032cbe0 v:0xc00032cc80 vmodule:0xc00032cd20] [0xc0005fe6e0 0xc0005fe780 0xc0005fe820 0xc0005fe8c0 0xc0005fe960 0xc00032c5a0 0xc00032c640 0xc00032c6e0 0xc00032c780 0xc00032c820 0xc00032c8c0 0xc000719680 0xc00032c960 0xc00032ca00 0xc00032caa0 0xc00032cb40 0xc00032cbe0 0xc00032cc80 0xc00032cd20 0xc0005fefa0] [0xc00032c5a0 0xc00032c640 0xc0005fe960 0xc0005fefa0 0xc00032c6e0 0xc00032c780 0xc00032c820 0xc00032c8c0 0xc000719680 0xc00032c960 0xc0005fe6e0 0xc00032ca00 0xc0005fe780 0xc0005fe820 0xc00032caa0 0xc00032cb40 0xc0005fe8c0 0xc00032cbe0 0xc00032cc80 0xc00032cd20] map[104:0xc0005fefa0 118:0xc00032cc80] [] -1 0 0xc0005caff0 true <nil> []}
2021-11-03T04:25:43.102190905Z I1103 04:25:43.102101 1 cmd.go:42] (*prune.PruneOptions)(0xc0005e4550)({
2021-11-03T04:25:43.102190905Z MaxEligibleRevision: (int) 7,
2021-11-03T04:25:43.102190905Z ProtectedRevisions: ([]int) (len=6 cap=6) {
2021-11-03T04:25:43.102190905Z (int) 2,
2021-11-03T04:25:43.102190905Z (int) 3,
2021-11-03T04:25:43.102190905Z (int) 4,
2021-11-03T04:25:43.102190905Z (int) 5,
2021-11-03T04:25:43.102190905Z (int) 6,
2021-11-03T04:25:43.102190905Z (int) 7
2021-11-03T04:25:43.102190905Z },
2021-11-03T04:25:43.102190905Z ResourceDir: (string) (len=36) "/etc/kubernetes/static-pod-resources",
2021-11-03T04:25:43.102190905Z CertDir: (string) (len=20) "kube-scheduler-certs",
2021-11-03T04:25:43.102190905Z StaticPodName: (string) (len=18) "kube-scheduler-pod"
2021-11-03T04:25:43.102190905Z })
2021-11-03T04:25:43.102203390Z F1103 04:25:43.102194 1 cmd.go:48] lstat /etc/kubernetes/static-pod-resources/kube-scheduler-certs: no such file or directory
2021-11-03T04:25:43.194947275Z goroutine 1 [running]:
2021-11-03T04:25:43.194947275Z k8s.io/klog/v2.stacks(0xc000012001, 0xc0001c81c0, 0x84, 0xda)
2021-11-03T04:25:43.194947275Z k8s.io/klog/v2.0/klog.go:1026 +0xb9
2021-11-03T04:25:43.194947275Z k8s.io/klog/v2.(*loggingT).output(0x3a8bd60, 0xc000000003, 0x0, 0x0, 0xc0003c81c0, 0x1, 0x2f628ac, 0x6, 0x30, 0x414600)
2021-11-03T04:25:43.194947275Z k8s.io/klog/v2.0/klog.go:975 +0x1e5
2021-11-03T04:25:43.194947275Z k8s.io/klog/v2.(*loggingT).printDepth(0x3a8bd60, 0xc000000003, 0x0, 0x0, 0x0, 0x0, 0x1, 0xc000496910, 0x1, 0x1)
2021-11-03T04:25:43.194947275Z k8s.io/klog/v2.0/klog.go:735 +0x185
2021-11-03T04:25:43.194947275Z k8s.io/klog/v2.(*loggingT).print(...)
2021-11-03T04:25:43.194947275Z k8s.io/klog/v2.0/klog.go:717
2021-11-03T04:25:43.194947275Z k8s.io/klog/v2.Fatal(...)
2021-11-03T04:25:43.194947275Z k8s.io/klog/v2.0/klog.go:1494
2021-11-03T04:25:43.194947275Z github.com/openshift/library-go/pkg/operator/staticpod/prune.NewPrune.func1(0xc000267680, 0xc0003cb680, 0x0, 0x6)
2021-11-03T04:25:43.194947275Z github.com/openshift/library-go.0-20210915142033-188c3c82f817/pkg/operator/staticpod/prune/cmd.go:48 +0x3aa
2021-11-03T04:25:43.194947275Z github.com/spf13/cobra.(*Command).execute(0xc000267680, 0xc0003cb620, 0x6, 0x6, 0xc000267680, 0xc0003cb620)
2021-11-03T04:25:43.194947275Z github.com/spf13/cobra.3/command.go:856 +0x2c2
2021-11-03T04:25:43.194947275Z github.com/spf13/cobra.(*Command).ExecuteC(0xc000266c80, 0xc000056080, 0xc000266c80, 0xc000000180)
2021-11-03T04:25:43.194947275Z github.com/spf13/cobra.3/command.go:960 +0x375
2021-11-03T04:25:43.194947275Z github.com/spf13/cobra.(*Command).Execute(...)
2021-11-03T04:25:43.194947275Z github.com/spf13/cobra.3/command.go:897
2021-11-03T04:25:43.194947275Z main.main()
2021-11-03T04:25:43.194947275Z github.com/openshift/cluster-kube-scheduler-operator/cmd/cluster-kube-scheduler-operator/main.go:34 +0x176
2021-11-03T04:25:43.194947275Z
2021-11-03T04:25:43.194947275Z goroutine 6 [chan receive]:
2021-11-03T04:25:43.194947275Z k8s.io/klog/v2.(*loggingT).flushDaemon(0x3a8bd60)
2021-11-03T04:25:43.194947275Z k8s.io/klog/v2.0/klog.go:1169 +0x8b
2021-11-03T04:25:43.194947275Z created by k8s.io/klog/v2.init.0
2021-11-03T04:25:43.194947275Z k8s.io/klog/v2.0/klog.go:420 +0xdf
2021-11-03T04:25:43.194947275Z
2021-11-03T04:25:43.194947275Z goroutine 64 [runnable]:
2021-11-03T04:25:43.194947275Z k8s.io/apimachinery/pkg/util/wait.Forever(0x27490f0, 0x12a05f200)
2021-11-03T04:25:43.194947275Z k8s.io/apimachinery.1/pkg/util/wait/wait.go:80
2021-11-03T04:25:43.194947275Z created by k8s.io/component-base/logs.InitLogs
2021-11-03T04:25:43.194947275Z k8s.io/component-base.1/logs/logs.go:58 +0x8a
~~~

The linked case has must-gather available for further investigation.
From openshift-kube-scheduler/pods/installer-7-ip-10-0-17-153.us-west-2.compute.internal/installer/installer/logs/current.log:
```
2021-11-03T04:26:41.149182411Z I1103 04:26:41.148843 1 cmd.go:186] Creating target resource directory "/etc/kubernetes/static-pod-resources/kube-scheduler-certs" ...
2021-11-03T04:26:41.149182411Z I1103 04:26:41.148914 1 cmd.go:194] Getting secrets ...
2021-11-03T04:26:41.275975557Z I1103 04:26:41.275933 1 copy.go:32] Got secret openshift-kube-scheduler/kube-scheduler-client-cert-key
2021-11-03T04:26:41.276021276Z I1103 04:26:41.275976 1 cmd.go:207] Getting config maps ...
2021-11-03T04:26:41.276021276Z I1103 04:26:41.275985 1 cmd.go:226] Creating directory "/etc/kubernetes/static-pod-resources/kube-scheduler-certs/secrets/kube-scheduler-client-cert-key" ...
2021-11-03T04:26:41.276127775Z I1103 04:26:41.276101 1 cmd.go:449] Writing secret manifest "/etc/kubernetes/static-pod-resources/kube-scheduler-certs/secrets/kube-scheduler-client-cert-key/tls.crt" ...
2021-11-03T04:26:41.276210474Z I1103 04:26:41.276192 1 cmd.go:449] Writing secret manifest "/etc/kubernetes/static-pod-resources/kube-scheduler-certs/secrets/kube-scheduler-client-cert-key/tls.key" ...
```

revision-pruner-7-ip-10-0-17-153.us-west-2.compute.internal panics at 04:25:43.102194:
```
2021-11-03T04:25:43.102203390Z F1103 04:25:43.102194 1 cmd.go:48] lstat /etc/kubernetes/static-pod-resources/kube-scheduler-certs: no such file or directory
```

So the missing /etc/kubernetes/static-pod-resources/kube-scheduler-certs directory does get created eventually, just not quickly enough.

From the openshift-kube-scheduler-operator:
```
2021-11-03T04:25:39.184994392Z I1103 04:25:39.184944 1 request.go:665] Waited for 1.192417299s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/api/v1/namespaces/openshift-kube-scheduler/pods/revision-pruner-7-ip-10-0-17-153.us-west-2.compute.internal
2021-11-03T04:25:40.383204649Z I1103 04:25:40.383152 1 request.go:665] Waited for 1.190625443s due to client-side throttling, not priority and fairness, request: POST:https://172.30.0.1:443/api/v1/namespaces/openshift-kube-scheduler/pods
2021-11-03T04:25:40.404609063Z I1103 04:25:40.404559 1 event.go:282] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-kube-scheduler-operator", Name:"openshift-kube-scheduler-operator", UID:"6bb5dcf7-ffe6-4ace-a208-a476379fb082", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'PodCreated' Created Pod/revision-pruner-7-ip-10-0-17-153.us-west-2.compute.internal -n openshift-kube-scheduler because it was missing
2021-11-03T04:26:35.508527549Z I1103 04:26:35.508455 1 request.go:665] Waited for 1.179951135s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/api/v1/namespaces/openshift-kube-scheduler/pods/installer-7-ip-10-0-17-153.us-west-2.compute.internal
2021-11-03T04:26:36.557796191Z I1103 04:26:36.556367 1 event.go:282] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-kube-scheduler-operator", Name:"openshift-kube-scheduler-operator", UID:"6bb5dcf7-ffe6-4ace-a208-a476379fb082", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'PodCreated' Created Pod/installer-7-ip-10-0-17-153.us-west-2.compute.internal -n openshift-kube-scheduler because it was missing
2021-11-03T04:26:36.722600664Z I1103 04:26:36.722546 1 request.go:665] Waited for 1.010649519s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/api/v1/namespaces/openshift-kube-scheduler/pods/revision-pruner-7-ip-10-0-16-77.us-west-2.compute.internal
2021-11-03T04:26:37.592274527Z I1103 04:26:37.592233 1 installer_controller.go:512] "ip-10-0-17-153.us-west-2.compute.internal" is in transition to 7, but has not made progress because installer is not finished, but in Pending phase
2021-11-03T04:26:37.904840556Z I1103 04:26:37.904787 1 request.go:665] Waited for 1.124217196s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/api/v1/namespaces/openshift-kube-scheduler/pods/revision-pruner-7-ip-10-0-17-153.us-west-2.compute.internal
2021-11-03T04:26:38.906680674Z I1103 04:26:38.906637 1 request.go:665] Waited for 1.310842986s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/api/v1/namespaces/openshift-kube-scheduler/pods/installer-7-ip-10-0-17-153.us-west-2.compute.internal
2021-11-03T04:26:40.111645240Z I1103 04:26:40.111607 1 request.go:665] Waited for 1.164360997s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/api/v1/namespaces/openshift-kube-scheduler/pods/installer-7-ip-10-0-17-153.us-west-2.compute.internal
```

The operator created the pruner pod almost a minute before the corresponding installer pod:
```
2021-11-03T04:25:40.404609063Z I1103 04:25:40.404559 1 event.go:282] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-kube-scheduler-operator", Name:"openshift-kube-scheduler-operator", UID:"6bb5dcf7-ffe6-4ace-a208-a476379fb082", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'PodCreated' Created Pod/revision-pruner-7-ip-10-0-17-153.us-west-2.compute.internal -n openshift-kube-scheduler because it was missing
2021-11-03T04:26:36.557796191Z I1103 04:26:36.556367 1 event.go:282] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-kube-scheduler-operator", Name:"openshift-kube-scheduler-operator", UID:"6bb5dcf7-ffe6-4ace-a208-a476379fb082", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'PodCreated' Created Pod/installer-7-ip-10-0-17-153.us-west-2.compute.internal -n openshift-kube-scheduler because it was missing
```

The static pod builder creates both the pruner and the installer as two independently running controllers. The pruner has no check saying "wait until the installer pod of my revision finishes", so there is no safeguard preventing this race.

We have two options (a rough sketch of the first one follows this list):
- have the pruner pod wait for the installer pod to finish (or for the presence of the required directories), e.g. for up to a few minutes before it fails
- update the pruner controller so that it does not create the pruner pod before the corresponding installer pod has finished
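For illustration only, here is a minimal Go sketch of the first option, assuming a hypothetical waitForDir helper that the pruner command could call before walking the cert directory; the timeout, interval, and helper name are placeholders, and the actual change in library-go may look quite different:

```go
// Hypothetical sketch (not the actual library-go change): before pruning,
// wait for the installer pod to create the cert directory instead of
// failing fatally on the first lstat.
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"time"
)

// waitForDir polls until dir exists, an unexpected error occurs,
// or the timeout expires.
func waitForDir(dir string, timeout, interval time.Duration) error {
	deadline := time.Now().Add(timeout)
	for {
		_, err := os.Lstat(dir)
		if err == nil {
			return nil
		}
		if !os.IsNotExist(err) {
			return err // some other filesystem error; give up immediately
		}
		if time.Now().After(deadline) {
			return fmt.Errorf("timed out waiting for %s to be created", dir)
		}
		time.Sleep(interval)
	}
}

func main() {
	resourceDir := "/etc/kubernetes/static-pod-resources"
	certDir := filepath.Join(resourceDir, "kube-scheduler-certs")

	// Give the installer pod up to a few minutes to create the directory
	// before the pruner starts walking it.
	if err := waitForDir(certDir, 5*time.Minute, 10*time.Second); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}

	fmt.Println("cert directory present, safe to prune old revisions")
	// ... continue with the existing pruning logic under resourceDir ...
}
```

The important part of this sketch is that a missing directory is retried for a while instead of being treated as a fatal error.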
The issue was reported yesterday. We will need another sprint to properly implement the changes.
Based on the provided oc get pods output for ip-10-0-17-153.us-west-2.compute.internal:
~~~
$ omg get pods -o wide
NAME   READY   STATUS   RESTARTS   AGE   IP   NODE
installer-7-ip-10-0-17-153.us-west-2.compute.internal   0/1   Succeeded   0   9h30m   10.130.0.27   ip-10-0-17-153.us-west-2.compute.internal
installer-8-ip-10-0-17-153.us-west-2.compute.internal   0/1   Succeeded   0   9h30m   10.130.0.28   ip-10-0-17-153.us-west-2.compute.internal
openshift-kube-scheduler-ip-10-0-17-153.us-west-2.compute.internal   3/3   Running   0   9h29m   10.0.17.153   ip-10-0-17-153.us-west-2.compute.internal
revision-pruner-6-ip-10-0-17-153.us-west-2.compute.internal   0/1   Failed   0   9h32m   10.130.0.21   ip-10-0-17-153.us-west-2.compute.internal
revision-pruner-7-ip-10-0-17-153.us-west-2.compute.internal   0/1   Failed   0   9h31m   10.130.0.24   ip-10-0-17-153.us-west-2.compute.internal
revision-pruner-8-ip-10-0-17-153.us-west-2.compute.internal   0/1   Succeeded   0   9h30m   10.130.0.29   ip-10-0-17-153.us-west-2.compute.internal
~~~

The pruner does eventually run to completion.
In this case the failing pruner does not cause any issues. The pruner has nothing to prune in the non-existent directory, so it just panics. Once the PR is merged, the panics will disappear.
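As a purely illustrative aside, here is a minimal Go sketch in which the pruner treats a missing cert directory as "nothing to prune" and exits cleanly; the function name and structure are hypothetical and may not match what the merged PR actually does:

```go
// Hypothetical sketch only: avoid the panic by treating a missing cert
// directory as "nothing to prune" and returning successfully.
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

func pruneCerts(resourceDir, certDirName string) error {
	dir := filepath.Join(resourceDir, certDirName)
	if _, err := os.Lstat(dir); os.IsNotExist(err) {
		// The installer pod has not created the directory yet,
		// so there is simply nothing to prune.
		fmt.Printf("%s does not exist yet, nothing to prune\n", dir)
		return nil
	} else if err != nil {
		return err
	}
	// ... walk dir and delete revisions that are not protected ...
	return nil
}

func main() {
	if err := pruneCerts("/etc/kubernetes/static-pod-resources", "kube-scheduler-certs"); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```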
Neil, as it seems, this issue has no impact on the functionality of the pruner, so we will not be fixing it in 4.9. The cert directory that is reported as missing will eventually get created by one of the installer pods; once that happens, the pruner stops failing with the reported stack trace. Would it be sufficient for the customer to have this fixed in 4.10, with the explanation I provided?
Hello Jan, that is acceptable. I'll let the customer know. Thanks for looking into it.
I test the Assisted Service and ran into a similar problem when running version 4.9.15. See https://issues.redhat.com/browse/MGMT-9036 for more information. There is no problem with OCP 4.10.0-fc.0.
Moving the bug to verified state, as I did not see any revision-pruner pod in a failed state during a new cluster install. I will reopen the bug if I hit it during any of the installs.

[knarra@knarra ~]$ oc get clusterversion
NAME   VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.0-0.nightly-2022-01-24-070025   True   False   4h52m   Cluster version is 4.10.0-0.nightly-2022-01-24-070025

[knarra@knarra ~]$ oc get pods -n openshift-kube-scheduler
NAME   READY   STATUS   RESTARTS   AGE
installer-5-ip-10-0-189-93.ap-northeast-1.compute.internal   0/1   Completed   0   5h11m
installer-5-ip-10-0-204-104.ap-northeast-1.compute.internal   0/1   Completed   0   5h10m
installer-6-ip-10-0-156-150.ap-northeast-1.compute.internal   0/1   Completed   0   5h4m
installer-6-ip-10-0-204-104.ap-northeast-1.compute.internal   0/1   Completed   0   5h9m
installer-7-ip-10-0-156-150.ap-northeast-1.compute.internal   0/1   Completed   0   5h4m
installer-7-ip-10-0-189-93.ap-northeast-1.compute.internal   0/1   Completed   0   5h3m
installer-8-ip-10-0-156-150.ap-northeast-1.compute.internal   0/1   Completed   0   4h59m
installer-8-ip-10-0-189-93.ap-northeast-1.compute.internal   0/1   Completed   0   5h2m
installer-8-ip-10-0-204-104.ap-northeast-1.compute.internal   0/1   Completed   0   5h1m
installer-9-ip-10-0-156-150.ap-northeast-1.compute.internal   0/1   Completed   0   4h36m
installer-9-ip-10-0-189-93.ap-northeast-1.compute.internal   0/1   Completed   0   4h35m
installer-9-ip-10-0-204-104.ap-northeast-1.compute.internal   0/1   Completed   0   4h34m
openshift-kube-scheduler-guard-ip-10-0-156-150.ap-northeast-1.compute.internal   1/1   Running   0   5h13m
openshift-kube-scheduler-guard-ip-10-0-189-93.ap-northeast-1.compute.internal   1/1   Running   0   5h12m
openshift-kube-scheduler-guard-ip-10-0-204-104.ap-northeast-1.compute.internal   1/1   Running   0   5h10m
openshift-kube-scheduler-ip-10-0-156-150.ap-northeast-1.compute.internal   3/3   Running   0   4h36m
openshift-kube-scheduler-ip-10-0-189-93.ap-northeast-1.compute.internal   3/3   Running   0   4h35m
openshift-kube-scheduler-ip-10-0-204-104.ap-northeast-1.compute.internal   3/3   Running   0   4h33m
revision-pruner-8-ip-10-0-156-150.ap-northeast-1.compute.internal   0/1   Completed   0   5h1m
revision-pruner-8-ip-10-0-189-93.ap-northeast-1.compute.internal   0/1   Completed   0   5h1m
revision-pruner-8-ip-10-0-204-104.ap-northeast-1.compute.internal   0/1   Completed   0   5h1m
revision-pruner-9-ip-10-0-156-150.ap-northeast-1.compute.internal   0/1   Completed   0   4h36m
revision-pruner-9-ip-10-0-189-93.ap-northeast-1.compute.internal   0/1   Completed   0   4h36m
revision-pruner-9-ip-10-0-204-104.ap-northeast-1.compute.internal   0/1   Completed   0   4h36m
Hi @knarra,
As I reported earlier, the issue is reproducible on OCP 4.9.15.
Are you going to backport the fix to a 4.9.* version?
Thank you
(In reply to Yuri Obshansky from comment #12)
> Hi @knarra,
> As I reported earlier, the issue is reproducible on OCP 4.9.15.
> Are you going to backport the fix to a 4.9.* version?
> Thank you

Hello Yuri,

Yes, I already see that the 4.9.z bug is in POST state; please see https://bugzilla.redhat.com/show_bug.cgi?id=2044622

Thanks
kasturi
Thank you for the update.
Yuri

(In reply to RamaKasturi from comment #13)
> (In reply to Yuri Obshansky from comment #12)
> > Hi @knarra,
> > As I reported earlier, the issue is reproducible on OCP 4.9.15.
> > Are you going to backport the fix to a 4.9.* version?
> > Thank you
>
> Hello Yuri,
>
> Yes, I already see that the 4.9.z bug is in POST state; please see
> https://bugzilla.redhat.com/show_bug.cgi?id=2044622
>
> Thanks
> kasturi
Hello Yuri,

I am trying to verify the 4.9.z bug and wanted to understand which cloud provider you hit the issue on, so that I can try to reproduce it.

Thanks
kasturi
(In reply to RamaKasturi from comment #16)
> Hello Yuri,
>
> I am trying to verify the 4.9.z bug and wanted to understand which cloud
> provider you hit the issue on, so that I can try to reproduce it.
>
> Thanks
> kasturi

Hi kasturi,

We test the Assisted Service cloud approach to installing OpenShift:
https://qaprodauth.cloud.redhat.com/openshift/assisted-installer/clusters

Here is the repo: https://github.com/openshift/assisted-service
https://cloud.redhat.com/blog/using-the-openshift-assisted-installer-service-to-deploy-an-openshift-cluster-on-metal-and-vsphere

Let me know if you need more info.

Thanks
Yuri
(In reply to Yuri Obshansky from comment #17)
> (In reply to RamaKasturi from comment #16)
> > Hello Yuri,
> >
> > I am trying to verify the 4.9.z bug and wanted to understand which cloud
> > provider you hit the issue on, so that I can try to reproduce it.
> >
> > Thanks
> > kasturi
>
> Hi kasturi,
>
> We test the Assisted Service cloud approach to installing OpenShift:
> https://qaprodauth.cloud.redhat.com/openshift/assisted-installer/clusters
>
> Here is the repo: https://github.com/openshift/assisted-service
> https://cloud.redhat.com/blog/using-the-openshift-assisted-installer-service-to-deploy-an-openshift-cluster-on-metal-and-vsphere
>
> Let me know if you need more info.
>
> Thanks
> Yuri

Okay, thank you.

Could you please check whether your issue is resolved in the new 4.9 build?
(In reply to RamaKasturi from comment #18)
> Okay, thank you.
>
> Could you please check whether your issue is resolved in the new 4.9 build?

We can only test the version that is deployed on the cloud; I cannot update the version there.
Right now it is 4.9.17.
Where is the issue fixed? In what version?
(In reply to Yuri Obshansky from comment #19)
> (In reply to RamaKasturi from comment #18)
> > Okay, thank you.
> >
> > Could you please check whether your issue is resolved in the new 4.9 build?
>
> We can only test the version that is deployed on the cloud; I cannot update the version there.
> Right now it is 4.9.17.
> Where is the issue fixed? In what version?

The issue is fixed in the 4.9.19 build. Here is the complete change log:
https://openshift-release.apps.ci.l2s4.p1.openshiftapps.com/releasestream/4-stable/release/4.9.19
and the 4.9 bug which I have moved to verified state:
https://bugzilla.redhat.com/show_bug.cgi?id=2044622
(In reply to RamaKasturi from comment #20)
> The issue is fixed in the 4.9.19 build. Here is the complete change log:
> https://openshift-release.apps.ci.l2s4.p1.openshiftapps.com/releasestream/4-stable/release/4.9.19
> and the 4.9 bug which I have moved to verified state:
> https://bugzilla.redhat.com/show_bug.cgi?id=2044622

Great. I'll verify the issue when we get the 4.9.19 image on Staging and will update the Bugzilla with the results.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:0056
Hello Yuri,

Could you please help set the right test coverage flag here?

Thanks
kasturi
Hello Yuri,

Is there any reason you cleared the needinfo without setting the test coverage flag? I did not see any comments mentioning why; could you please help me understand?

Thanks
kasturi
Hello Rama,

We do not have the 4.9.19 image on our Staging setup, so I cannot verify this bug. Please find attached a screenshot with the list of images on Staging. Let me know which version would be good for bug verification.

Thank you
Yuri
Hi Yuri,

Please help verify with 4.9.37, as I understand the fix should be present there.

Thanks
kasturi
Hi,

Just verified with 4.9.37.

# oc get clusterversion
NAME   VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.37   True   False   93m   Cluster version is 4.9.37

# oc get pods -n openshift-kube-scheduler
NAME   READY   STATUS   RESTARTS   AGE
installer-3-master-0-0   0/1   Completed   0   112m
installer-4-master-0-0   0/1   Completed   0   111m
installer-5-master-0-0   0/1   Completed   0   110m
installer-5-master-0-2   0/1   Completed   0   108m
installer-6-master-0-0   0/1   Completed   0   105m
installer-6-master-0-1   0/1   Completed   0   101m
installer-6-master-0-2   0/1   Completed   0   104m
openshift-kube-scheduler-master-0-0   3/3   Running   0   105m
openshift-kube-scheduler-master-0-1   3/3   Running   0   99m
openshift-kube-scheduler-master-0-2   3/3   Running   0   104m
revision-pruner-6-master-0-0   0/1   Completed   0   103m
revision-pruner-6-master-0-1   0/1   Completed   0   101m
revision-pruner-6-master-0-2   0/1   Completed   0   103m

Issue resolved.
Hello Yuri,

Thanks for verifying. Could you please help set the right test coverage flag here?

Thanks
kasturi
Hi,

Honestly, I do not know what the qe_test_coverage flag should be set to. Please discuss with your QE managers. Sorry about that.

Thank you
Hello Yuri,

Since the issue happens during installation with the assisted installer, I was wondering if you have a test case added to check that. If you have a test case added, please set '+' in the qe_test_coverage flag; otherwise set '-' and explain why you think we do not need to add a case for this.

Thanks
kasturi
Hi,

This issue is not specific to Assisted Service deployments, so we do not need to add a special test case for it. It should probably be tested in the regular cluster deployment flow.

Thank you
Yuri