Bug 2120314

Summary: [4.11.z clone] Provider cannot deduce API server reachability
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation
Reporter: Mudit Agarwal <muagarwa>
Component: ocs-operator
Assignee: Dhruv Bindra <dbindra>
Status: CLOSED ERRATA
QA Contact: Jilju Joy <jijoy>
Severity: medium
Priority: unspecified
Version: 4.10
CC: dbindra, kramdoss, madam, mbukatov, mmuench, muagarwa, nberry, ocs-bugs, odf-bz-bot, omitrani, sostapov
Target Milestone: ---
Target Release: ODF 4.11.1
Hardware: Unspecified
OS: Unspecified
Doc Type: No Doc Update
Clone Of: 2112852
Last Closed: 2022-09-14 15:15:05 UTC
Bug Depends On: 2136765, 2112852, 2123697

Comment 3 Mudit Agarwal 2022-08-23 07:20:22 UTC
>> We verify the product bug via regression runs, then clone it in MS and verify the actual working of the fix once we get ODF 4.11 in our Managed Services clusters.
Yes, that should be the approach for core product fixes; it should not cause any regression. For actual testing you can open an MS BZ.

Comment 16 Jilju Joy 2022-09-12 11:03:16 UTC
Verified in version:
ODF 4.11.1-8
OCP 4.10.30
ocs-osd-deployer.v2.0.5


Followed steps 1, 2, and 3 given below.


1. All the report-status-to-provider pods in the consumer cluster should be in Running or Completed state, as they are created by a CronJob.

The pods "report-status-to-provider" are in Completed state.
$ oc get pods -o wide | grep report-status-to-provider
report-status-to-provider-27716284-gtw7x           0/1     Completed   0             3m7s    10.131.0.52    ip-10-0-157-29.ec2.internal    <none>           <none>
report-status-to-provider-27716285-562t9           0/1     Completed   0             2m7s    10.131.0.53    ip-10-0-157-29.ec2.internal    <none>           <none>
report-status-to-provider-27716286-js66g           0/1     Completed   0             67s     10.131.0.54    ip-10-0-157-29.ec2.internal    <none>           <none>
report-status-to-provider-27716287-c8f26           0/1     Completed   0             7s      10.131.0.55    ip-10-0-157-29.ec2.internal    <none>           <none>
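The pod-state check above can also be scripted. The sketch below is purely illustrative and not part of the verification run; the `unhealthy_pods` helper is a hypothetical name, and the sample input is a trimmed copy of the `oc get pods` output above.

```python
# Hypothetical sketch: flag any report-status-to-provider pod whose STATUS
# column is not Completed or Running. Input is raw `oc get pods` text.
ACCEPTABLE = {"Completed", "Running"}

def unhealthy_pods(oc_output: str) -> list:
    """Return names of report-status-to-provider pods in a bad state."""
    bad = []
    for line in oc_output.strip().splitlines():
        fields = line.split()
        name, status = fields[0], fields[2]
        if name.startswith("report-status-to-provider") and status not in ACCEPTABLE:
            bad.append(name)
    return bad

# Sample lines copied (trimmed) from the output above.
sample = """\
report-status-to-provider-27716284-gtw7x 0/1 Completed 0 3m7s
report-status-to-provider-27716285-562t9 0/1 Completed 0 2m7s
"""
print(unhealthy_pods(sample))  # an empty list means all pods are healthy
```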



2. Check the status of the storageConsumer CR. It should have a lastHeartbeat field with a timestamp, and this timestamp should be updated every minute.

The "lastHeartbeat" value is changing every minute.

$ oc get storageconsumer -o yaml | grep lastHeartbeat
    lastHeartbeat: "2022-09-12T10:08:06Z"

$ oc get storageconsumer -o yaml | grep lastHeartbeat
    lastHeartbeat: "2022-09-12T10:09:08Z"
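The one-minute heartbeat cadence can be checked mechanically by comparing two consecutive lastHeartbeat values. This is a sketch under assumptions, not part of the verification run; `heartbeat_fresh` and the 120-second tolerance are invented for illustration, and the timestamps are the two grep results above.

```python
from datetime import datetime, timedelta

def heartbeat_fresh(t1: str, t2: str, max_gap_s: int = 120) -> bool:
    """True if the second lastHeartbeat advanced past the first within max_gap_s.

    max_gap_s = 120 is an assumed tolerance: roughly double the expected
    one-minute cadence, to absorb scheduling jitter.
    """
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    a, b = datetime.strptime(t1, fmt), datetime.strptime(t2, fmt)
    return a < b <= a + timedelta(seconds=max_gap_s)

# Timestamps taken from the two greps above, roughly one minute apart.
print(heartbeat_fresh("2022-09-12T10:08:06Z", "2022-09-12T10:09:08Z"))  # True
```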
    


$ oc get storageconsumer -o yaml 
apiVersion: v1
items:
- apiVersion: ocs.openshift.io/v1alpha1
  kind: StorageConsumer
  metadata:
    annotations:
      ocs.openshift.io/provider-onboarding-ticket: |
        eyJpZCI6ImNkNjU0MDlhLTA2NDgtNGJlOS1hMjViLTk3ODJhMDlmMDZkNCIsImV4cGlyYXRpb25EYXRlIjoiMTY2MzEzNjc3MiJ9.iI/NdR4ZwI9wLU0U1TvzaMDqKgYQ0rMqV2AX78Z0wD66JUAwGatFLJ8gTkFD2ey5N0B/5pmNqdXVCsvmOseYnPER60C9SmfGXnm+Y9GgOlyxWjbQkNggTMAG59yWYj5rE0jvRZxyyZCu/O/0iiqOIp11pExRfHOHoYZRPPAjnzRUsNgon5U5Qs27LBzuy8qSQsTpY+I46Q0Mpwh5b4xxOEGq8tMwpjPXUT4p90MWKMVzfuELKPPjCf3eXPok4qO1hXLOtsa8y4zYg5MSmEqBr63rVcJd3+jQjrtSa+rb5VtfnBx254k+FJGR6j+MQGqeDTWVbh8zdCRgQyj9+VrC4bBuAZF+1wA/OKzaAzcT8oDkhssxhhNkVYNCqgFdL3KZlN2VSGcFDjJ+Ww+8Z9ObRbGZeI2gy3IqIAVCEgJtOqR7bGs/e+/uSTxJJ115XxOPl4FUFPnPCkbKnJaa/jJvqkx7p956Dg+DA8TJBLFPktnNnaTJJR2qNBx+TSNzJOkq5pPnVS/NT3CxicV/2nSHpdfjHEfKMf1cY+LEtbKUAt55wrT0b2uf5fhNS0teDtT+Y189OeBrM2HTs4NmH3Tk+Fa+BVKkanoX0wRL/NsVtMy8vYruNSbsfUCEZJ62QGPDt7WKshzqz+BvfdbxcsqPDpV8sLfEAmDxin70Rb9nQGI=
    creationTimestamp: "2022-09-12T06:27:59Z"
    finalizers:
    - storagesconsumer.ocs.openshift.io
    generation: 2
    name: storageconsumer-18595625-7c68-49c0-a411-88bf84b23b60
    namespace: openshift-storage
    resourceVersion: "613508"
    uid: ea0978f7-03cf-480f-a2ca-cbd74d24f8fe
  spec:
    capacity: 1Pi
    enable: true
  status:
    cephResources:
    - kind: CephClient
      name: 370e476009884d204effbc012fb6b36d
      status: Ready
    grantedCapacity: 1Pi
    lastHeartbeat: "2022-09-12T10:09:08Z"
    state: Ready
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""


3. Update the storageProviderEndpoint to some random value and check the lastHeartbeat field; it should no longer update.

$ rosa edit addon ocs-consumer-qe -c jijoy-s12-c1
? Storage Provider API Endpoint: 10.0.102.2:31222


(Set the endpoint to a random value while keeping everything else at its current value. 10.0.102.2 does not exist.)


The value changed in the consumer cluster:

$ oc get storagecluster -o yaml | grep storageProviderEndpoint
      storageProviderEndpoint: 10.0.102.2:31222


The value of lastHeartbeat field in the storageconsumer CR is not changing.

$ date && oc get storageconsumer -o yaml | grep lastHeartbeat && sleep 300 && oc get storageconsumer -o yaml | grep lastHeartbeat
Mon Sep 12 04:11:58 PM IST 2022
    lastHeartbeat: "2022-09-12T10:29:07Z"
    lastHeartbeat: "2022-09-12T10:29:07Z"
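The staleness check above (same timestamp before and after a 5-minute sleep) can be expressed as a small predicate. This is an illustrative sketch only; `heartbeat_stale` is a hypothetical helper, and the 60-second floor reflects the expected one-minute heartbeat cadence.

```python
from datetime import datetime

def heartbeat_stale(before: str, after: str, waited_s: int = 300) -> bool:
    """Heartbeat is stale if it failed to advance across a wait longer
    than the expected one-minute cadence."""
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    return waited_s > 60 and datetime.strptime(after, fmt) <= datetime.strptime(before, fmt)

# Values from the check above: unchanged across a 5-minute sleep.
print(heartbeat_stale("2022-09-12T10:29:07Z", "2022-09-12T10:29:07Z"))  # True
```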
    

$ oc get storageconsumer -o yaml
apiVersion: v1
items:
- apiVersion: ocs.openshift.io/v1alpha1
  kind: StorageConsumer
  metadata:
    annotations:
      ocs.openshift.io/provider-onboarding-ticket: |
        eyJpZCI6ImNkNjU0MDlhLTA2NDgtNGJlOS1hMjViLTk3ODJhMDlmMDZkNCIsImV4cGlyYXRpb25EYXRlIjoiMTY2MzEzNjc3MiJ9.iI/NdR4ZwI9wLU0U1TvzaMDqKgYQ0rMqV2AX78Z0wD66JUAwGatFLJ8gTkFD2ey5N0B/5pmNqdXVCsvmOseYnPER60C9SmfGXnm+Y9GgOlyxWjbQkNggTMAG59yWYj5rE0jvRZxyyZCu/O/0iiqOIp11pExRfHOHoYZRPPAjnzRUsNgon5U5Qs27LBzuy8qSQsTpY+I46Q0Mpwh5b4xxOEGq8tMwpjPXUT4p90MWKMVzfuELKPPjCf3eXPok4qO1hXLOtsa8y4zYg5MSmEqBr63rVcJd3+jQjrtSa+rb5VtfnBx254k+FJGR6j+MQGqeDTWVbh8zdCRgQyj9+VrC4bBuAZF+1wA/OKzaAzcT8oDkhssxhhNkVYNCqgFdL3KZlN2VSGcFDjJ+Ww+8Z9ObRbGZeI2gy3IqIAVCEgJtOqR7bGs/e+/uSTxJJ115XxOPl4FUFPnPCkbKnJaa/jJvqkx7p956Dg+DA8TJBLFPktnNnaTJJR2qNBx+TSNzJOkq5pPnVS/NT3CxicV/2nSHpdfjHEfKMf1cY+LEtbKUAt55wrT0b2uf5fhNS0teDtT+Y189OeBrM2HTs4NmH3Tk+Fa+BVKkanoX0wRL/NsVtMy8vYruNSbsfUCEZJ62QGPDt7WKshzqz+BvfdbxcsqPDpV8sLfEAmDxin70Rb9nQGI=
    creationTimestamp: "2022-09-12T06:27:59Z"
    finalizers:
    - storagesconsumer.ocs.openshift.io
    generation: 2
    name: storageconsumer-18595625-7c68-49c0-a411-88bf84b23b60
    namespace: openshift-storage
    resourceVersion: "651745"
    uid: ea0978f7-03cf-480f-a2ca-cbd74d24f8fe
  spec:
    capacity: 1Pi
    enable: true
  status:
    cephResources:
    - kind: CephClient
      name: 370e476009884d204effbc012fb6b36d
      status: Ready
    grantedCapacity: 1Pi
    lastHeartbeat: "2022-09-12T10:29:07Z"
    state: Ready
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""



The "report-status-to-provider" pods on the consumer cluster are no longer in a healthy state. This is expected with an incorrect storageProviderEndpoint value.
$ oc get pods -o wide | grep report-status-to-provider
report-status-to-provider-27716307-g2w45           0/1     Completed          0              17m    10.129.2.68    ip-10-0-173-22.ec2.internal    <none>           <none>
report-status-to-provider-27716308-v99ft           0/1     Completed          0              16m    10.131.0.66    ip-10-0-157-29.ec2.internal    <none>           <none>
report-status-to-provider-27716309-66wf4           0/1     Completed          0              15m    10.131.0.67    ip-10-0-157-29.ec2.internal    <none>           <none>
report-status-to-provider-27716318-gv6fg           0/1     CrashLoopBackOff   5 (109s ago)   6m7s   10.129.2.74    ip-10-0-173-22.ec2.internal    <none>           <none>
report-status-to-provider-27716319-6tgtc           0/1     CrashLoopBackOff   5 (43s ago)    5m7s   10.131.0.74    ip-10-0-157-29.ec2.internal    <none>           <none>
report-status-to-provider-27716320-fz9jg           1/1     Running            5 (90s ago)    4m7s   10.129.2.75    ip-10-0-173-22.ec2.internal    <none>           <none>
report-status-to-provider-27716321-2jnrt           0/1     CrashLoopBackOff   4 (26s ago)    3m7s   10.131.0.75    ip-10-0-157-29.ec2.internal    <none>           <none>
report-status-to-provider-27716322-xndcm           0/1     CrashLoopBackOff   3 (23s ago)    2m7s   10.131.0.76    ip-10-0-157-29.ec2.internal    <none>           <none>
report-status-to-provider-27716323-rgdz4           0/1     Error              2 (37s ago)    67s    10.129.2.76    ip-10-0-173-22.ec2.internal    <none>           <none>
report-status-to-provider-27716324-xtrlx           1/1     Running            0              7s     10.129.2.77    ip-10-0-173-22.ec2.internal    <none>           <none>
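The expected failure mode above (pods entering CrashLoopBackOff or Error once the endpoint is unreachable) can likewise be detected from the `oc get pods` text. A minimal sketch, assuming the standard STATUS column layout; `failing_pods` is a hypothetical helper and the sample is a trimmed copy of the output above.

```python
# Hypothetical sketch: return pods whose STATUS column shows a failure state.
FAILING = {"CrashLoopBackOff", "Error"}

def failing_pods(oc_output: str) -> list:
    """Names of pods in CrashLoopBackOff or Error state."""
    names = []
    for line in oc_output.strip().splitlines():
        fields = line.split()
        if fields[2] in FAILING:
            names.append(fields[0])
    return names

# Sample lines copied (trimmed) from the output above.
sample = """\
report-status-to-provider-27716307-g2w45 0/1 Completed 0 17m
report-status-to-provider-27716318-gv6fg 0/1 CrashLoopBackOff 5 6m7s
report-status-to-provider-27716323-rgdz4 0/1 Error 2 67s
"""
print(failing_pods(sample))  # the two failing pods are flagged
```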

Comment 20 errata-xmlrpc 2022-09-14 15:15:05 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenShift Data Foundation 4.11.1 Bug Fix Update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:6525