Bug 2120314
| Summary: | [4.11.z clone] Provider cannot deduce API server reachability of API server | ||
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat OpenShift Data Foundation | Reporter: | Mudit Agarwal <muagarwa> |
| Component: | ocs-operator | Assignee: | Dhruv Bindra <dbindra> |
| Status: | CLOSED ERRATA | QA Contact: | Jilju Joy <jijoy> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 4.10 | CC: | dbindra, kramdoss, madam, mbukatov, mmuench, muagarwa, nberry, ocs-bugs, odf-bz-bot, omitrani, sostapov |
| Target Milestone: | --- | ||
| Target Release: | ODF 4.11.1 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | No Doc Update | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | 2112852 | Environment: | |
| Last Closed: | 2022-09-14 15:15:05 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | 2136765, 2112852, 2123697 | ||
| Bug Blocks: | |||
Verified in version:
ODF 4.11.1-8
OCP 4.10.30
ocs-osd-deployer.v2.0.5
Followed steps 1, 2 and 3 given below.
1. All the report-status-to-provider pods in consumer clusters should be in Running or Completed state, as they are created from a CronJob.
The pods "report-status-to-provider" are in Completed state.
$ oc get pods -o wide | grep report-status-to-provider
report-status-to-provider-27716284-gtw7x 0/1 Completed 0 3m7s 10.131.0.52 ip-10-0-157-29.ec2.internal <none> <none>
report-status-to-provider-27716285-562t9 0/1 Completed 0 2m7s 10.131.0.53 ip-10-0-157-29.ec2.internal <none> <none>
report-status-to-provider-27716286-js66g 0/1 Completed 0 67s 10.131.0.54 ip-10-0-157-29.ec2.internal <none> <none>
report-status-to-provider-27716287-c8f26 0/1 Completed 0 7s 10.131.0.55 ip-10-0-157-29.ec2.internal <none> <none>
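The check in step 1 can be scripted roughly as follows. This is only a sketch: the function name `unhealthy_report_pods` is made up for illustration, and it assumes the default `oc get pods` column layout (pod name in column 1, STATUS in column 3).

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of any product tooling.
# Reads `oc get pods` rows on stdin and prints the names of
# report-status-to-provider pods whose STATUS is neither Completed
# nor Running. Exits 0 only when no such pod is found.
unhealthy_report_pods() {
  awk '
    $1 ~ /^report-status-to-provider-/ && $3 != "Completed" && $3 != "Running" {
      print $1
      bad = 1
    }
    END { exit (bad ? 1 : 0) }
  '
}
```

It would be used as `oc get pods -o wide | unhealthy_report_pods`, which should print nothing for the listing above.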
2. Check the status of the storageConsumer CR. It should have a lastHeartbeat field with a timestamp, and this timestamp should be updated every minute.
The "lastHeartbeat" value changes every minute.
$ oc get storageconsumer -o yaml | grep lastHeartbeat
lastHeartbeat: "2022-09-12T10:08:06Z"
$ oc get storageconsumer -o yaml | grep lastHeartbeat
lastHeartbeat: "2022-09-12T10:09:08Z"
$ oc get storageconsumer -o yaml
apiVersion: v1
items:
- apiVersion: ocs.openshift.io/v1alpha1
  kind: StorageConsumer
  metadata:
    annotations:
      ocs.openshift.io/provider-onboarding-ticket: |
eyJpZCI6ImNkNjU0MDlhLTA2NDgtNGJlOS1hMjViLTk3ODJhMDlmMDZkNCIsImV4cGlyYXRpb25EYXRlIjoiMTY2MzEzNjc3MiJ9.iI/NdR4ZwI9wLU0U1TvzaMDqKgYQ0rMqV2AX78Z0wD66JUAwGatFLJ8gTkFD2ey5N0B/5pmNqdXVCsvmOseYnPER60C9SmfGXnm+Y9GgOlyxWjbQkNggTMAG59yWYj5rE0jvRZxyyZCu/O/0iiqOIp11pExRfHOHoYZRPPAjnzRUsNgon5U5Qs27LBzuy8qSQsTpY+I46Q0Mpwh5b4xxOEGq8tMwpjPXUT4p90MWKMVzfuELKPPjCf3eXPok4qO1hXLOtsa8y4zYg5MSmEqBr63rVcJd3+jQjrtSa+rb5VtfnBx254k+FJGR6j+MQGqeDTWVbh8zdCRgQyj9+VrC4bBuAZF+1wA/OKzaAzcT8oDkhssxhhNkVYNCqgFdL3KZlN2VSGcFDjJ+Ww+8Z9ObRbGZeI2gy3IqIAVCEgJtOqR7bGs/e+/uSTxJJ115XxOPl4FUFPnPCkbKnJaa/jJvqkx7p956Dg+DA8TJBLFPktnNnaTJJR2qNBx+TSNzJOkq5pPnVS/NT3CxicV/2nSHpdfjHEfKMf1cY+LEtbKUAt55wrT0b2uf5fhNS0teDtT+Y189OeBrM2HTs4NmH3Tk+Fa+BVKkanoX0wRL/NsVtMy8vYruNSbsfUCEZJ62QGPDt7WKshzqz+BvfdbxcsqPDpV8sLfEAmDxin70Rb9nQGI=
    creationTimestamp: "2022-09-12T06:27:59Z"
    finalizers:
    - storagesconsumer.ocs.openshift.io
    generation: 2
    name: storageconsumer-18595625-7c68-49c0-a411-88bf84b23b60
    namespace: openshift-storage
    resourceVersion: "613508"
    uid: ea0978f7-03cf-480f-a2ca-cbd74d24f8fe
  spec:
    capacity: 1Pi
    enable: true
  status:
    cephResources:
    - kind: CephClient
      name: 370e476009884d204effbc012fb6b36d
      status: Ready
    grantedCapacity: 1Pi
    lastHeartbeat: "2022-09-12T10:09:08Z"
    state: Ready
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
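To put a number on "every minute" in step 2, the two sampled lastHeartbeat values can be compared. The helpers below are an illustrative sketch (both function names are hypothetical) and assume GNU `date` with its `-d` option; BSD/macOS `date` takes different flags.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for comparing heartbeat samples.

# Pulls the first quoted lastHeartbeat value out of YAML text on stdin.
extract_heartbeat() {
  sed -n 's/^[[:space:]]*lastHeartbeat:[[:space:]]*"\(.*\)"[[:space:]]*$/\1/p' | head -n 1
}

# Prints the number of seconds from timestamp $1 to timestamp $2.
# Assumes GNU date, which parses RFC 3339 strings such as 2022-09-12T10:08:06Z.
seconds_between() {
  local earlier later
  earlier=$(date -u -d "$1" +%s) || return 1
  later=$(date -u -d "$2" +%s) || return 1
  echo $(( later - earlier ))
}
```

For the two samples in step 2, `seconds_between "2022-09-12T10:08:06Z" "2022-09-12T10:09:08Z"` gives 62, which matches a heartbeat roughly once per minute.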
3. Update the storageProviderEndpoint to some random value and check the lastHeartbeat field. It should no longer be updated.
$ rosa edit addon ocs-consumer-qe -c jijoy-s12-c1
? Storage Provider API Endpoint: 10.0.102.2:31222
(Set the endpoint to a random value, keeping every other field at its current value. The address 10.0.102.2 does not exist.)
The value changed in the consumer cluster:
$ oc get storagecluster -o yaml | grep storageProviderEndpoint
storageProviderEndpoint: 10.0.102.2:31222
The value of lastHeartbeat field in the storageconsumer CR is not changing.
$ date && oc get storageconsumer -o yaml | grep lastHeartbeat && sleep 300 && oc get storageconsumer -o yaml | grep lastHeartbeat
Mon Sep 12 04:11:58 PM IST 2022
lastHeartbeat: "2022-09-12T10:29:07Z"
lastHeartbeat: "2022-09-12T10:29:07Z"
$ oc get storageconsumer -o yaml
apiVersion: v1
items:
- apiVersion: ocs.openshift.io/v1alpha1
  kind: StorageConsumer
  metadata:
    annotations:
      ocs.openshift.io/provider-onboarding-ticket: |
eyJpZCI6ImNkNjU0MDlhLTA2NDgtNGJlOS1hMjViLTk3ODJhMDlmMDZkNCIsImV4cGlyYXRpb25EYXRlIjoiMTY2MzEzNjc3MiJ9.iI/NdR4ZwI9wLU0U1TvzaMDqKgYQ0rMqV2AX78Z0wD66JUAwGatFLJ8gTkFD2ey5N0B/5pmNqdXVCsvmOseYnPER60C9SmfGXnm+Y9GgOlyxWjbQkNggTMAG59yWYj5rE0jvRZxyyZCu/O/0iiqOIp11pExRfHOHoYZRPPAjnzRUsNgon5U5Qs27LBzuy8qSQsTpY+I46Q0Mpwh5b4xxOEGq8tMwpjPXUT4p90MWKMVzfuELKPPjCf3eXPok4qO1hXLOtsa8y4zYg5MSmEqBr63rVcJd3+jQjrtSa+rb5VtfnBx254k+FJGR6j+MQGqeDTWVbh8zdCRgQyj9+VrC4bBuAZF+1wA/OKzaAzcT8oDkhssxhhNkVYNCqgFdL3KZlN2VSGcFDjJ+Ww+8Z9ObRbGZeI2gy3IqIAVCEgJtOqR7bGs/e+/uSTxJJ115XxOPl4FUFPnPCkbKnJaa/jJvqkx7p956Dg+DA8TJBLFPktnNnaTJJR2qNBx+TSNzJOkq5pPnVS/NT3CxicV/2nSHpdfjHEfKMf1cY+LEtbKUAt55wrT0b2uf5fhNS0teDtT+Y189OeBrM2HTs4NmH3Tk+Fa+BVKkanoX0wRL/NsVtMy8vYruNSbsfUCEZJ62QGPDt7WKshzqz+BvfdbxcsqPDpV8sLfEAmDxin70Rb9nQGI=
    creationTimestamp: "2022-09-12T06:27:59Z"
    finalizers:
    - storagesconsumer.ocs.openshift.io
    generation: 2
    name: storageconsumer-18595625-7c68-49c0-a411-88bf84b23b60
    namespace: openshift-storage
    resourceVersion: "651745"
    uid: ea0978f7-03cf-480f-a2ca-cbd74d24f8fe
  spec:
    capacity: 1Pi
    enable: true
  status:
    cephResources:
    - kind: CephClient
      name: 370e476009884d204effbc012fb6b36d
      status: Ready
    grantedCapacity: 1Pi
    lastHeartbeat: "2022-09-12T10:29:07Z"
    state: Ready
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
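The staleness in step 3 can also be expressed as an age check. The `date` output 04:11:58 PM IST corresponds to 10:41:58 UTC (IST is UTC+5:30), so the heartbeat of 10:29:07Z was already 771 seconds old when the first sample was taken. The sketch below assumes GNU `date`; the function name and the 180-second default threshold are illustrative choices, not values taken from ocs-operator.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds (exit 0) when the heartbeat timestamp $1
# is more than max_age seconds older than the reference time $2.
# The default of 180 seconds is an arbitrary illustrative threshold.
# Returns 2 if either timestamp cannot be parsed (GNU date assumed).
heartbeat_is_stale() {
  local heartbeat=$1 now=$2 max_age=${3:-180}
  local hb_epoch now_epoch
  hb_epoch=$(date -u -d "$heartbeat" +%s) || return 2
  now_epoch=$(date -u -d "$now" +%s) || return 2
  (( now_epoch - hb_epoch > max_age ))
}
```

With the values from this step, `heartbeat_is_stale "2022-09-12T10:29:07Z" "2022-09-12T10:41:58Z"` succeeds, while the same check on the step 2 samples does not.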
The "report-status-to-provider" pods on the consumer cluster are not in the correct state. This is expected with a wrong value of storageProviderEndpoint.
$ oc get pods -o wide | grep report-status-to-provider
report-status-to-provider-27716307-g2w45 0/1 Completed 0 17m 10.129.2.68 ip-10-0-173-22.ec2.internal <none> <none>
report-status-to-provider-27716308-v99ft 0/1 Completed 0 16m 10.131.0.66 ip-10-0-157-29.ec2.internal <none> <none>
report-status-to-provider-27716309-66wf4 0/1 Completed 0 15m 10.131.0.67 ip-10-0-157-29.ec2.internal <none> <none>
report-status-to-provider-27716318-gv6fg 0/1 CrashLoopBackOff 5 (109s ago) 6m7s 10.129.2.74 ip-10-0-173-22.ec2.internal <none> <none>
report-status-to-provider-27716319-6tgtc 0/1 CrashLoopBackOff 5 (43s ago) 5m7s 10.131.0.74 ip-10-0-157-29.ec2.internal <none> <none>
report-status-to-provider-27716320-fz9jg 1/1 Running 5 (90s ago) 4m7s 10.129.2.75 ip-10-0-173-22.ec2.internal <none> <none>
report-status-to-provider-27716321-2jnrt 0/1 CrashLoopBackOff 4 (26s ago) 3m7s 10.131.0.75 ip-10-0-157-29.ec2.internal <none> <none>
report-status-to-provider-27716322-xndcm 0/1 CrashLoopBackOff 3 (23s ago) 2m7s 10.131.0.76 ip-10-0-157-29.ec2.internal <none> <none>
report-status-to-provider-27716323-rgdz4 0/1 Error 2 (37s ago) 67s 10.129.2.76 ip-10-0-173-22.ec2.internal <none> <none>
report-status-to-provider-27716324-xtrlx 1/1 Running 0 7s 10.129.2.77 ip-10-0-173-22.ec2.internal <none> <none>
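A per-status summary makes a listing like the one above easier to read: it contains 3 Completed, 4 CrashLoopBackOff, 2 Running and 1 Error pods. The following is a minimal sketch with a hypothetical function name, again assuming STATUS is the third column of `oc get pods` output.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `oc get pods` rows on stdin and prints one
# "STATUS count" line per distinct status, considering only
# report-status-to-provider pods.
tally_report_pod_status() {
  awk '$1 ~ /^report-status-to-provider-/ { print $3 }' \
    | sort | uniq -c | awk '{ print $2, $1 }'
}
```

It would be run as `oc get pods -o wide | tally_report_pod_status`.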
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat OpenShift Data Foundation 4.11.1 Bug Fix Update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2022:6525
>> we verify the product bug via regression runs and then clone it in MS and verify the actual working of the fix once we get ODF 4.11 in our Managed Services clusters
Yes, that should be the approach for core product fixes; the fix should not cause any regression. For actual testing you can open an MS BZ.