Bug 2015305 - SR-IOV and PTP operators don't get past pending.
Summary: SR-IOV and PTP operators don't get past pending.
Keywords:
Status: CLOSED DUPLICATE of bug 2021151
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Telco Edge
Version: 4.8
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Ian Miller
QA Contact: Jeff Uphoff
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-10-18 20:28 UTC by Jeff Uphoff
Modified: 2021-12-08 21:09 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-12-08 21:09:36 UTC
Target Upstream Version:
Embargoed:


Comment 2 yliu1 2021-10-18 23:33:45 UTC
It seems the expected clusterrolebindings are missing for both the sriov and ptp operators.

For sriov, the clusterrolebinding for the sriov-network-operator service account is missing; only the one for sriov-network-config-daemon exists:
[kni@ran-vcl01-installer ~]$ oc get pods -n openshift-sriov-network-operator 
No resources found in openshift-sriov-network-operator namespace.

[kni@ran-vcl01-installer ~]$ oc get clusterrolebindings -A | grep sriov
sriov-network-operator.4.8.0-202110121407-65d74b9b84                        ClusterRole/sriov-network-operator.4.8.0-202110121407-65d74b9b84                        5h55m

[kni@ran-vcl01-installer ~]$ oc describe clusterrolebindings sriov-network-operator.4.8.0-202110121407-65d74b9b84 | grep Service
              olm.owner.kind=ClusterServiceVersion
  ServiceAccount  sriov-network-config-daemon  openshift-sriov-network-operator


For ptp, the clusterrolebinding for the linuxptp-daemon service account is missing, while the operator's clusterrolebinding exists:

[kni@ran-vcl01-installer ~]$ oc get clusterrolebindings -A | grep ptp
ptp-operator.4.8.0-202110121407-6d46bbc9f5                                  ClusterRole/ptp-operator.4.8.0-202110121407-6d46bbc9f5                                  6h7m
[kni@ran-vcl01-installer ~]$ oc describe clusterrolebindings ptp-operator.4.8.0-202110121407-6d46bbc9f5 | grep Service
              olm.owner.kind=ClusterServiceVersion
  ServiceAccount  ptp-operator  openshift-ptp


For comparison, the following is from a working cluster, which has two clusterrolebindings each for ptp and sriov (one for the operator and one for the daemon), as needed for the appropriate service accounts to be created.
Healthy SRIOV:
[yliu1@yliu1 ~]$ oc get clusterrolebindings -A | grep sriov
sriov-network-operator.4.9.0-202110121402-77656d84dd                        ClusterRole/sriov-network-operator.4.9.0-202110121402-77656d84dd                        5d4h
sriov-network-operator.4.9.0-202110121402-7b777bd698                        ClusterRole/sriov-network-operator.4.9.0-202110121402-7b777bd698                        5d4h

[yliu1@yliu1 ~]$ oc describe clusterrolebindings sriov-network-operator.4.9.0-202110121402-77656d84dd | grep ServiceAcc
  ServiceAccount  sriov-network-operator  openshift-sriov-network-operator
[yliu1@yliu1 ~]$ oc describe clusterrolebindings sriov-network-operator.4.9.0-202110121402-7b777bd698  |grep ServiceAcc
  ServiceAccount  sriov-network-config-daemon  openshift-sriov-network-operator

Healthy PTP:
[yliu1@yliu1 ~]$ oc get clusterrolebindings -A | grep ptp
ptp-operator.4.8.0-202110011559-8bb895549                                   ClusterRole/ptp-operator.4.8.0-202110011559-8bb895549                                   5d7h
ptp-operator.4.8.0-202110011559-b875d6955                                   ClusterRole/ptp-operator.4.8.0-202110011559-b875d6955                                   5d7h

[yliu1@yliu1 ~]$ oc describe clusterrolebindings ptp-operator.4.8.0-202110011559-8bb895549 | grep ServiceA
  ServiceAccount  linuxptp-daemon  openshift-ptp
[yliu1@yliu1 ~]$ oc describe clusterrolebindings ptp-operator.4.8.0-202110011559-b875d6955 | grep ServiceA
  ServiceAccount  ptp-operator  openshift-ptp



The operator version used is 4.8.0-202110121407, along with OCP 4.8.15:
[kni@ran-vcl01-installer ~]$ oc get csv -A |grep -E "sriov|ptp"
openshift-ptp                                      performance-addon-operator.v4.8.2           Performance Addon Operator   4.8.2                           Succeeded
openshift-ptp                                      ptp-operator.4.8.0-202110121407             PTP Operator                 4.8.0-202110121407              Pending
openshift-sriov-network-operator                   performance-addon-operator.v4.8.2           Performance Addon Operator   4.8.2                           Succeeded
openshift-sriov-network-operator                   sriov-network-operator.4.8.0-202110121407   SR-IOV Network Operator      4.8.0-202110121407              Pending
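
The manual comparison above (grep the clusterrolebindings, then describe each one) could be scripted roughly as follows. This is only a sketch: the expected service accounts are the four seen on the working cluster in this comment, the function name `missing_service_accounts` is made up for illustration, and the input is assumed to be the JSON printed by `oc get clusterrolebindings -o json`. It reports an account as present if any clusterrolebinding references it, not only OLM-created ones, so treat the result as a hint rather than proof.

```python
import json

# (namespace, service account) pairs bound on the working cluster in this report.
EXPECTED = {
    ("openshift-sriov-network-operator", "sriov-network-operator"),
    ("openshift-sriov-network-operator", "sriov-network-config-daemon"),
    ("openshift-ptp", "ptp-operator"),
    ("openshift-ptp", "linuxptp-daemon"),
}


def missing_service_accounts(crb_list_json, expected=EXPECTED):
    """Return the expected (namespace, name) pairs that no ClusterRoleBinding references.

    crb_list_json is assumed to be the text of `oc get clusterrolebindings -o json`.
    """
    data = json.loads(crb_list_json)
    bound = set()
    for item in data.get("items", []):
        # `subjects` may be absent or null on a binding, so fall back to an empty list.
        for subject in item.get("subjects") or []:
            if subject.get("kind") == "ServiceAccount":
                bound.add((subject.get("namespace"), subject.get("name")))
    return sorted(set(expected) - bound)
```

Fed the two bindings from the failing cluster above, this should report the linuxptp-daemon and sriov-network-operator service accounts as unbound, matching what the describe output shows.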

Comment 4 yliu1 2021-10-18 23:50:47 UTC
Workaround: Delete the affected pending CSVs, subscriptions, and installplans, so that the proper clusterrolebindings get created by a new installplan.
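
As an illustration of the workaround, the sketch below only builds the `oc delete` command lines for one operator and does not run them. The helper name `cleanup_commands` is hypothetical, and so are any subscription and installplan names passed to it: this report does not list them, so look them up first (for example with `oc get subscription,installplan -n <namespace>`). How the subscription gets recreated afterwards is not covered here.

```python
import shlex


def cleanup_commands(namespace, csv, subscription, installplan):
    """Build (but do not run) the oc commands that delete one operator's
    pending CSV, its subscription, and its installplan in a namespace."""
    resources = [
        ("csv", csv),
        ("subscription", subscription),
        ("installplan", installplan),
    ]
    return [["oc", "delete", kind, name, "-n", namespace] for kind, name in resources]


def render(commands):
    """Render the commands as shell-quoted lines for review before running them."""
    return "\n".join(shlex.join(cmd) for cmd in commands)
```

For the pending PTP operator in this bug, the CSV argument would be ptp-operator.4.8.0-202110121407 in namespace openshift-ptp; printing `render(cleanup_commands(...))` gives three lines that can be reviewed and then pasted into a shell.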

Comment 5 Ken Young 2021-11-01 17:14:24 UTC
Jeff,

What is the frequency of this occurring?

/KenY

Comment 6 Jeff Uphoff 2021-11-02 12:23:21 UTC
(In reply to Ken Young from comment #5)
> Jeff,
> 
> What is the frequency of this occurring?
> 
> /KenY

When I was actively trying to run tests on the system in question, it was on the order of half the time.

Yang, does that sound about right? It was frequent enough to be a real roadblock. It was also observed on at least one other system.

Comment 7 yliu1 2021-11-02 13:05:11 UTC
Yes, we deployed cnfocto2 with 4.8.13 and 4.8.15 a few times (~4-5) and encountered this at least twice, as far as I can remember.
I also encountered this issue on a QE cluster on 4.9, but it is much rarer there than on cnfocto2. A couple of differences between cnfocto2 and the QE cluster (although I am not sure whether they contribute to the frequency of this issue): 1) cnfocto2 is co-located in the same lab as the hub cluster; 2) cnfocto2 has proper sriov and ptp configs.

Comment 8 Ken Young 2021-12-08 17:32:33 UTC
Ian,

I am assuming this is a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=2021151.  Correct?

/KenY

Comment 9 Ian Miller 2021-12-08 19:53:01 UTC
Yes. This BZ has the same root cause as BZ 2021151 but presents with different symptoms (operators fail to install) and a different workaround for those symptoms.

Comment 10 Ken Young 2021-12-08 21:09:36 UTC

*** This bug has been marked as a duplicate of bug 2021151 ***

