Bug 2120314 - [4.11.z clone] Provider cannot deduce API server reachability of consumer clusters
Summary: [4.11.z clone] Provider cannot deduce API server reachability of consumer clusters
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: ocs-operator
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: ODF 4.11.1
Assignee: Dhruv Bindra
QA Contact: Jilju Joy
URL:
Whiteboard:
Depends On: 2112852 2123697 2136765
Blocks:
 
Reported: 2022-08-22 14:09 UTC by Mudit Agarwal
Modified: 2023-08-09 17:00 UTC
CC: 11 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of: 2112852
Environment:
Last Closed: 2022-09-14 15:15:05 UTC
Embargoed:




Links
- GitHub red-hat-storage/ocs-operator PR 1735 (Merged): odf-to-odf: Introduce heartbeat mechanism to check connectivity (last updated 2022-08-26 12:24:02 UTC)
- GitHub red-hat-storage/ocs-operator PR 1776 (Merged): Bug 2120314: [release-4.11] odf-to-odf: Introduce heartbeat mechanism to check connectivity (last updated 2022-08-30 04:48:56 UTC)
- Red Hat Product Errata RHBA-2022:6525 (last updated 2022-09-14 15:15:25 UTC)

Comment 3 Mudit Agarwal 2022-08-23 07:20:22 UTC
>> we verify the product bug via regression runs and then clone it to MS and verify the actual working of the fix once we get ODF 4.11 in our Managed Services clusters
Yes, that should be the approach for core product fixes; it should not cause any regression. For actual testing you can open an MS BZ.

Comment 16 Jilju Joy 2022-09-12 11:03:16 UTC
Verified in version:
ODF 4.11.1-8
OCP 4.10.30
ocs-osd-deployer.v2.0.5


Followed steps 1, 2, and 3 given below.


1. All the report-status-to-provider pods in the consumer clusters should be in Running or Completed state, as they are created from a CronJob. (A quick schedule check is sketched after the pod listing below.)

The pods "report-status-to-provider" are in Completed state.
$ oc get pods -o wide | grep report-status-to-provider
report-status-to-provider-27716284-gtw7x           0/1     Completed   0             3m7s    10.131.0.52    ip-10-0-157-29.ec2.internal    <none>           <none>
report-status-to-provider-27716285-562t9           0/1     Completed   0             2m7s    10.131.0.53    ip-10-0-157-29.ec2.internal    <none>           <none>
report-status-to-provider-27716286-js66g           0/1     Completed   0             67s     10.131.0.54    ip-10-0-157-29.ec2.internal    <none>           <none>
report-status-to-provider-27716287-c8f26           0/1     Completed   0             7s      10.131.0.55    ip-10-0-157-29.ec2.internal    <none>           <none>
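
The numeric suffix in the pod names (27716284, 27716285, ...) is the CronJob's scheduled time in minutes since the epoch, so consecutive values already indicate a per-minute schedule. As an extra cross-check, a minimal sketch, assuming the CronJob carries the same name as its pods:

$ oc -n openshift-storage get cronjob report-status-to-provider -o jsonpath='{.spec.schedule}{"\n"}'
# A per-minute heartbeat would print "* * * * *".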



2. Check the status of the storageConsumer CR; it should have a lastHeartbeat field with a timestamp, and this timestamp should get updated every minute. (A jsonpath-based poll is sketched after the YAML output below.)

The "lastHeartbeat" value changes every minute.

$ oc get storageconsumer -o yaml | grep lastHeartbeat
    lastHeartbeat: "2022-09-12T10:08:06Z"

$ oc get storageconsumer -o yaml | grep lastHeartbeat
    lastHeartbeat: "2022-09-12T10:09:08Z"
    


$ oc get storageconsumer -o yaml 
apiVersion: v1
items:
- apiVersion: ocs.openshift.io/v1alpha1
  kind: StorageConsumer
  metadata:
    annotations:
      ocs.openshift.io/provider-onboarding-ticket: |
        eyJpZCI6ImNkNjU0MDlhLTA2NDgtNGJlOS1hMjViLTk3ODJhMDlmMDZkNCIsImV4cGlyYXRpb25EYXRlIjoiMTY2MzEzNjc3MiJ9.iI/NdR4ZwI9wLU0U1TvzaMDqKgYQ0rMqV2AX78Z0wD66JUAwGatFLJ8gTkFD2ey5N0B/5pmNqdXVCsvmOseYnPER60C9SmfGXnm+Y9GgOlyxWjbQkNggTMAG59yWYj5rE0jvRZxyyZCu/O/0iiqOIp11pExRfHOHoYZRPPAjnzRUsNgon5U5Qs27LBzuy8qSQsTpY+I46Q0Mpwh5b4xxOEGq8tMwpjPXUT4p90MWKMVzfuELKPPjCf3eXPok4qO1hXLOtsa8y4zYg5MSmEqBr63rVcJd3+jQjrtSa+rb5VtfnBx254k+FJGR6j+MQGqeDTWVbh8zdCRgQyj9+VrC4bBuAZF+1wA/OKzaAzcT8oDkhssxhhNkVYNCqgFdL3KZlN2VSGcFDjJ+Ww+8Z9ObRbGZeI2gy3IqIAVCEgJtOqR7bGs/e+/uSTxJJ115XxOPl4FUFPnPCkbKnJaa/jJvqkx7p956Dg+DA8TJBLFPktnNnaTJJR2qNBx+TSNzJOkq5pPnVS/NT3CxicV/2nSHpdfjHEfKMf1cY+LEtbKUAt55wrT0b2uf5fhNS0teDtT+Y189OeBrM2HTs4NmH3Tk+Fa+BVKkanoX0wRL/NsVtMy8vYruNSbsfUCEZJ62QGPDt7WKshzqz+BvfdbxcsqPDpV8sLfEAmDxin70Rb9nQGI=
    creationTimestamp: "2022-09-12T06:27:59Z"
    finalizers:
    - storagesconsumer.ocs.openshift.io
    generation: 2
    name: storageconsumer-18595625-7c68-49c0-a411-88bf84b23b60
    namespace: openshift-storage
    resourceVersion: "613508"
    uid: ea0978f7-03cf-480f-a2ca-cbd74d24f8fe
  spec:
    capacity: 1Pi
    enable: true
  status:
    cephResources:
    - kind: CephClient
      name: 370e476009884d204effbc012fb6b36d
      status: Ready
    grantedCapacity: 1Pi
    lastHeartbeat: "2022-09-12T10:09:08Z"
    state: Ready
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
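
The grep above works; a slightly tighter variant (a sketch, using the status path visible in the YAML above) polls the field directly, so a changed timestamp is unambiguous. Sleeping a bit over a minute guarantees each iteration spans at least one heartbeat:

$ for i in 1 2 3; do oc get storageconsumer -o jsonpath='{.items[0].status.lastHeartbeat}{"\n"}'; sleep 65; done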


3. Update the storageProviderEndpoint to some random value and check the lastHeartbeat field; it should not update anymore.

$ rosa edit addon ocs-consumer-qe -c jijoy-s12-c1
? Storage Provider API Endpoint: 10.0.102.2:31222


(Set a random endpoint value, keeping everything else at its current value; the address 10.0.102.2 does not exist.)


The value changed in the consumer cluster:

$ oc get storagecluster -o yaml | grep storageProviderEndpoint
      storageProviderEndpoint: 10.0.102.2:31222
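
For reference, the same value can be read without grep. A sketch; the field path spec.externalStorage.storageProviderEndpoint is an assumption based on the ocs-operator StorageCluster layout and is not shown in full in this bug:

$ oc get storagecluster -o jsonpath='{.items[0].spec.externalStorage.storageProviderEndpoint}{"\n"}'
10.0.102.2:31222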


The value of the lastHeartbeat field in the storageconsumer CR is no longer changing. (A heartbeat-age check is sketched after the YAML output below.)

$ date && oc get storageconsumer -o yaml | grep lastHeartbeat && sleep 300 && oc get storageconsumer -o yaml | grep lastHeartbeat
Mon Sep 12 04:11:58 PM IST 2022
    lastHeartbeat: "2022-09-12T10:29:07Z"
    lastHeartbeat: "2022-09-12T10:29:07Z"
    

$ oc get storageconsumer -o yaml
apiVersion: v1
items:
- apiVersion: ocs.openshift.io/v1alpha1
  kind: StorageConsumer
  metadata:
    annotations:
      ocs.openshift.io/provider-onboarding-ticket: |
        eyJpZCI6ImNkNjU0MDlhLTA2NDgtNGJlOS1hMjViLTk3ODJhMDlmMDZkNCIsImV4cGlyYXRpb25EYXRlIjoiMTY2MzEzNjc3MiJ9.iI/NdR4ZwI9wLU0U1TvzaMDqKgYQ0rMqV2AX78Z0wD66JUAwGatFLJ8gTkFD2ey5N0B/5pmNqdXVCsvmOseYnPER60C9SmfGXnm+Y9GgOlyxWjbQkNggTMAG59yWYj5rE0jvRZxyyZCu/O/0iiqOIp11pExRfHOHoYZRPPAjnzRUsNgon5U5Qs27LBzuy8qSQsTpY+I46Q0Mpwh5b4xxOEGq8tMwpjPXUT4p90MWKMVzfuELKPPjCf3eXPok4qO1hXLOtsa8y4zYg5MSmEqBr63rVcJd3+jQjrtSa+rb5VtfnBx254k+FJGR6j+MQGqeDTWVbh8zdCRgQyj9+VrC4bBuAZF+1wA/OKzaAzcT8oDkhssxhhNkVYNCqgFdL3KZlN2VSGcFDjJ+Ww+8Z9ObRbGZeI2gy3IqIAVCEgJtOqR7bGs/e+/uSTxJJ115XxOPl4FUFPnPCkbKnJaa/jJvqkx7p956Dg+DA8TJBLFPktnNnaTJJR2qNBx+TSNzJOkq5pPnVS/NT3CxicV/2nSHpdfjHEfKMf1cY+LEtbKUAt55wrT0b2uf5fhNS0teDtT+Y189OeBrM2HTs4NmH3Tk+Fa+BVKkanoX0wRL/NsVtMy8vYruNSbsfUCEZJ62QGPDt7WKshzqz+BvfdbxcsqPDpV8sLfEAmDxin70Rb9nQGI=
    creationTimestamp: "2022-09-12T06:27:59Z"
    finalizers:
    - storagesconsumer.ocs.openshift.io
    generation: 2
    name: storageconsumer-18595625-7c68-49c0-a411-88bf84b23b60
    namespace: openshift-storage
    resourceVersion: "651745"
    uid: ea0978f7-03cf-480f-a2ca-cbd74d24f8fe
  spec:
    capacity: 1Pi
    enable: true
  status:
    cephResources:
    - kind: CephClient
      name: 370e476009884d204effbc012fb6b36d
      status: Ready
    grantedCapacity: 1Pi
    lastHeartbeat: "2022-09-12T10:29:07Z"
    state: Ready
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
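
This stale timestamp is exactly the signal that lets the provider deduce that a consumer is unreachable: the heartbeat age keeps growing. A minimal sketch for computing that age in seconds (assumes GNU date for the -d flag):

$ hb=$(oc get storageconsumer -o jsonpath='{.items[0].status.lastHeartbeat}')
$ echo "heartbeat age: $(( $(date +%s) - $(date -d "$hb" +%s) ))s"
# Anything well beyond 60s means the consumer has stopped reporting.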



The pods "report-status-to-provider" on the consumer cluster are not in the correct state. This is expected with a wrong value of storageproviderEndpoint.
$ oc get pods -o wide | grep report-status-to-provider
report-status-to-provider-27716307-g2w45           0/1     Completed          0              17m    10.129.2.68    ip-10-0-173-22.ec2.internal    <none>           <none>
report-status-to-provider-27716308-v99ft           0/1     Completed          0              16m    10.131.0.66    ip-10-0-157-29.ec2.internal    <none>           <none>
report-status-to-provider-27716309-66wf4           0/1     Completed          0              15m    10.131.0.67    ip-10-0-157-29.ec2.internal    <none>           <none>
report-status-to-provider-27716318-gv6fg           0/1     CrashLoopBackOff   5 (109s ago)   6m7s   10.129.2.74    ip-10-0-173-22.ec2.internal    <none>           <none>
report-status-to-provider-27716319-6tgtc           0/1     CrashLoopBackOff   5 (43s ago)    5m7s   10.131.0.74    ip-10-0-157-29.ec2.internal    <none>           <none>
report-status-to-provider-27716320-fz9jg           1/1     Running            5 (90s ago)    4m7s   10.129.2.75    ip-10-0-173-22.ec2.internal    <none>           <none>
report-status-to-provider-27716321-2jnrt           0/1     CrashLoopBackOff   4 (26s ago)    3m7s   10.131.0.75    ip-10-0-157-29.ec2.internal    <none>           <none>
report-status-to-provider-27716322-xndcm           0/1     CrashLoopBackOff   3 (23s ago)    2m7s   10.131.0.76    ip-10-0-157-29.ec2.internal    <none>           <none>
report-status-to-provider-27716323-rgdz4           0/1     Error              2 (37s ago)    67s    10.129.2.76    ip-10-0-173-22.ec2.internal    <none>           <none>
report-status-to-provider-27716324-xtrlx           1/1     Running            0              7s     10.129.2.77    ip-10-0-173-22.ec2.internal    <none>           <none>
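
To confirm the failures are connectivity errors against the bogus endpoint, the newest reporter pod's log can be pulled. A sketch; the exact error text is not captured in this bug:

$ oc logs "$(oc get pods -o name --sort-by=.metadata.creationTimestamp | grep report-status-to-provider | tail -n 1)"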

Comment 20 errata-xmlrpc 2022-09-14 15:15:05 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenShift Data Foundation 4.11.1 Bug Fix Update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:6525

