Bug 1994434 - service account sriov-network-config-daemon disappeared when sriov operator upgrade from 4.8 to 4.9 version
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 4.9.0
Assignee: Peng Liu
QA Contact: zhaozhanqi
URL:
Whiteboard:
Duplicates: 1991493
Depends On:
Blocks:
Reported: 2021-08-17 09:20 UTC by zhaozhanqi
Modified: 2023-09-15 01:13 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-10-18 17:46:55 UTC
Target Upstream Version:
Embargoed:


Links
System                  ID                                         Private  Priority  Status  Summary  Last Updated
Github                  openshift sriov-network-operator pull 556  0        None      None    None     2021-08-18 02:29:15 UTC
Red Hat Product Errata  RHSA-2021:3759                             0        None      None    None     2021-10-18 17:47:06 UTC

Description zhaozhanqi 2021-08-17 09:20:32 UTC
Description of problem:

When upgrading the sriov operator from 4.8 to 4.9, the service account 'sriov-network-config-daemon' was not created in the operator namespace.

When I describe the install plan, I can see the ServiceAccount listed:


    Resolving:           sriov-network-operator.4.9.0-202108130204
    Resource:
      Group:             
      Kind:              ServiceAccount
      Manifest:          {"kind":"ConfigMap","name":"4d4332d303fc98b8e66a4cbc963b223d79db25315b4aa67148b4e4bc8039aed","namespace":"openshift-marketplace","catalogSourceName":"qe-app-registry","catalogSourceNamespace":"openshift-marketplace","replaces":"sriov-network-operator.4.8.0-202108130208","properties":"{\"properties\":[{\"type\":\"olm.gvk\",\"value\":{\"group\":\"sriovnetwork.openshift.io\",\"kind\":\"SriovNetworkPoolConfig\",\"version\":\"v1\"}},{\"type\":\"olm.gvk\",\"value\":{\"group\":\"sriovnetwork.openshift.io\",\"kind\":\"SriovNetwork\",\"version\":\"v1\"}},{\"type\":\"olm.gvk\",\"value\":{\"group\":\"sriovnetwork.openshift.io\",\"kind\":\"SriovOperatorConfig\",\"version\":\"v1\"}},{\"type\":\"olm.gvk\",\"value\":{\"group\":\"sriovnetwork.openshift.io\",\"kind\":\"SriovIBNetwork\",\"version\":\"v1\"}},{\"type\":\"olm.gvk\",\"value\":{\"group\":\"sriovnetwork.openshift.io\",\"kind\":\"SriovNetworkNodePolicy\",\"version\":\"v1\"}},{\"type\":\"olm.gvk\",\"value\":{\"group\":\"sriovnetwork.openshift.io\",\"kind\":\"SriovNetworkNodeState\",\"version\":\"v1\"}},{\"type\":\"olm.package\",\"value\":{\"packageName\":\"sriov-network-operator\",\"version\":\"4.9.0-202108130204\"}}]}"}
      Name:              sriov-network-config-daemon
      Source Name:       qe-app-registry
      Source Namespace:  openshift-marketplace
      Version:           v1
    Status:              Present


However, the service account 'sriov-network-config-daemon' does not actually exist:

# oc get sa -n openshift-sriov-network-operator
NAME                            SECRETS   AGE
builder                         2         79m
default                         2         79m
deployer                        2         79m
network-resources-injector-sa   2         69m
operator-webhook-sa             2         69m
sriov-network-operator          2         77m
# 


# oc get csv
NAME                                        DISPLAY                   VERSION              REPLACES                                    PHASE
sriov-network-operator.4.9.0-202108130204   SR-IOV Network Operator   4.9.0-202108130204   sriov-network-operator.4.8.0-202108130208   Succeeded


Version-Release number of selected component (if applicable):


How reproducible:
always

Steps to Reproduce:
1. Set up a 4.9 cluster.
2. Install the sriov operator at version 4.8.
3. Update the subscription channel to 4.9 to trigger the upgrade.
4. Check for the serviceaccount 'sriov-network-config-daemon'.
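
Step 4 can be checked mechanically. The following is a minimal sketch, assuming the text of `oc get sa -n openshift-sriov-network-operator` has already been captured as a string; the helper name `missing_service_accounts` is hypothetical and not part of any tool mentioned in this report. The sample output is the one pasted in the description above.

```python
def missing_service_accounts(oc_output: str, expected: set) -> list:
    """Return the expected service account names absent from `oc get sa` output."""
    present = set()
    for line in oc_output.splitlines():
        fields = line.split()
        # Skip blank lines, the NAME/SECRETS/AGE header, and bare shell prompts.
        if not fields or fields[0] in ("NAME", "#"):
            continue
        present.add(fields[0])
    return sorted(expected - present)


# Output copied from the description of this bug (state after the upgrade).
REPORTED_OUTPUT = """\
NAME                            SECRETS   AGE
builder                         2         79m
default                         2         79m
deployer                        2         79m
network-resources-injector-sa   2         69m
operator-webhook-sa             2         69m
sriov-network-operator          2         77m
"""

if __name__ == "__main__":
    wanted = {"sriov-network-operator", "sriov-network-config-daemon"}
    print(missing_service_accounts(REPORTED_OUTPUT, wanted))
    # prints ['sriov-network-config-daemon']
```

Run against the output from a fixed build, the same check should return an empty list.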

Actual results:

serviceaccount 'sriov-network-config-daemon' was not created after the upgrade.

Expected results:



Additional info:

Not sure if this is an OLM issue, so assigning it to the OLM team to check first. Please let me know if you need more logs, thanks.

Comment 1 Kevin Rizza 2021-08-17 13:33:39 UTC
Is the 4.9 version of the operator bundle available anywhere for us to test this? The index image for 4.9 doesn't exist yet. Also, can you provide a must gather and an olm must gather `oc adm inspect --dest-dir=must-gather-olm -A olm`?

Comment 2 Kevin Rizza 2021-08-17 13:51:49 UTC
I poked around in the sriov manifest data. I think this is just because this fix didn't remove enough of the service accounts:

https://github.com/openshift/sriov-network-operator/pull/550/files

OLM generates all the service accounts defined in the CSV at bundle install time. It looks like the sriov operator was already attempting, in the above pull request, to fix the pivoting problem you can run into on upgrade with the sriov-network-operator service account, but the manifest for the sriov-network-config-daemon service account wasn't removed:

https://github.com/openshift/sriov-network-operator/blob/master/bundle/manifests/sriov-network-config-daemon_v1_serviceaccount.yaml#L5

The operator manifests need to be updated to remove that service account as well, otherwise OLM is going to have upgrade problems. We recently added validation logic to the operator-framework to prevent adding service accounts like this to your operator bundle: https://github.com/operator-framework/api/pull/144/files
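
To illustrate the kind of check described here, below is a rough sketch only. It is not the operator-framework validation code; the function name, the simplified dictionaries, and the `example-*` names are all assumptions made for illustration. It flags ServiceAccount manifests shipped in a bundle whose names are also referenced by the CSV, since OLM would generate those itself.

```python
def conflicting_service_accounts(csv_sa_names: set, bundle_manifests: list) -> list:
    """Names of bundled ServiceAccount manifests that the CSV already references."""
    conflicts = []
    for manifest in bundle_manifests:
        # Only standalone ServiceAccount objects are of interest.
        if manifest.get("kind") != "ServiceAccount":
            continue
        name = manifest.get("metadata", {}).get("name")
        if name in csv_sa_names:
            conflicts.append(name)
    return sorted(conflicts)


# Hypothetical bundle contents, not taken from the real sriov bundle.
EXAMPLE_CSV_SA_NAMES = {"example-operator"}
EXAMPLE_BUNDLE = [
    {"kind": "ServiceAccount", "metadata": {"name": "example-operator"}},
    {"kind": "ServiceAccount", "metadata": {"name": "example-daemon"}},
    {"kind": "ConfigMap", "metadata": {"name": "example-config"}},
]

if __name__ == "__main__":
    print(conflicting_service_accounts(EXAMPLE_CSV_SA_NAMES, EXAMPLE_BUNDLE))
    # prints ['example-operator']
```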

Reassigning this to the sr-iov component

Comment 3 Peng Liu 2021-08-17 14:07:51 UTC
@krizza the service account 'sriov-network-config-daemon' is not defined in the CSV file, as it is not used by the operator deployment. If we remove the sriov-network-config-daemon_v1_serviceaccount.yaml from the operator bundle, how can this service account be created by OLM?

Comment 4 Peng Liu 2021-08-17 15:24:49 UTC
BTW, in my environment, I can see the SA sriov-network-config-daemon was first created, then removed, when the operator upgrade starts.

Comment 6 zhaozhanqi 2021-08-19 09:02:45 UTC
*** Bug 1991493 has been marked as a duplicate of this bug. ***

Comment 8 Peng Liu 2021-08-19 12:25:04 UTC
There is a limitation in OLM: if a service account was defined in the CSV file in the previous release, it cannot be moved out of the CSV file in the next one. Otherwise, OLM removes the SA when the bundle is upgraded.

Comment 9 zhaozhanqi 2021-08-20 10:17:38 UTC
Verified this bug on 4.9.0-202108191042

# oc get csv
NAME                                        DISPLAY                   VERSION              REPLACES                                    PHASE
sriov-network-operator.4.9.0-202108191042   SR-IOV Network Operator   4.9.0-202108191042   sriov-network-operator.4.8.0-202108181331   Succeeded

# oc get sa
NAME                            SECRETS   AGE
builder                         2         8h
default                         2         8h
deployer                        2         8h
network-resources-injector-sa   2         8h
operator-webhook-sa             2         8h
sriov-device-plugin             2         7h58m
sriov-network-config-daemon     2         8h
sriov-network-operator          2         8h

Comment 12 errata-xmlrpc 2021-10-18 17:46:55 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759

Comment 13 Red Hat Bugzilla 2023-09-15 01:13:43 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days

