
Bug 2017680

Summary: [gcp] Couldn’t enable support for instances with GPUs on GCP
Product: OpenShift Container Platform
Component: Cloud Compute
Cloud Compute sub component: Other Providers
Version: 4.10
Target Release: 4.10.0
Status: CLOSED ERRATA
Severity: medium
Priority: unspecified
Hardware: Unspecified
OS: Unspecified
Type: Bug
Reporter: sunzhaohua <zhsun>
Assignee: Samuel Stuchly <sstuchly>
QA Contact: sunzhaohua <zhsun>
Last Closed: 2022-03-10 16:22:12 UTC

Description sunzhaohua 2021-10-27 08:25:29 UTC

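Description of problem:
GPU support cannot be enabled for instances on GCP. The guestAccelerators settings in a MachineSet's providerSpec are not carried over to the Machines it creates.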


Version-Release number of selected component (if applicable):
4.10.0-0.nightly-2021-10-25-190146

How reproducible:
always

Steps to Reproduce:
1. Create a new MachineSet with guestAccelerators set in its providerSpec:
      providerSpec:
        value:
          apiVersion: gcpprovider.openshift.io/v1beta1
...
          guestAccelerators:
          - acceleratorCount: 1
            acceleratorType: nvidia-tesla-p100
          kind: GCPMachineProviderSpec
          machineType: n1-standard-1

2. Check the newly created machines.

Actual results:
The guestAccelerators field is ignored; it does not appear in the Machine YAML:

  providerSpec:
    value:
      apiVersion: gcpprovider.openshift.io/v1beta1
      canIPForward: false
      credentialsSecret:
        name: gcp-cloud-credentials
      deletionProtection: false
      disks:
      - autoDelete: true
        boot: true
        image: projects/rhcos-cloud/global/images/rhcos-410-84-202110140201-0-gcp-x86-64
        labels: null
        sizeGb: 128
        type: pd-ssd
      kind: GCPMachineProviderSpec
      machineType: n1-standard-1
      metadata:
        creationTimestamp: null
      networkInterfaces:
      - network: wewang-gcp10-r5h4b-network
        subnetwork: wewang-gcp10-r5h4b-worker-subnet
      projectID: openshift-qe
      region: us-central1
      serviceAccounts:
      - email: wewang-gcp10-r5h4b-w.gserviceaccount.com
        scopes:
        - https://www.googleapis.com/auth/cloud-platform
      tags:
      - wewang-gcp10-r5h4b-worker
      userDataSecret:
        name: worker-user-data
      zone: us-central1-c
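
The mismatch between the MachineSet template and the resulting Machine can be checked mechanically. The following is a minimal sketch, not part of `oc` or the Machine API: `dropped_fields` is a hypothetical helper, and the two dictionaries are trimmed-down copies of the providerSpec values shown in this report (in practice they could be loaded from `oc get ... -o json` output).

```python
# Hypothetical helper for illustration only; it is not part of oc or the Machine API.
def dropped_fields(requested: dict, actual: dict) -> list:
    """Return top-level providerSpec keys that were requested in the
    MachineSet template but are absent from the created Machine."""
    return sorted(key for key in requested if key not in actual)


# Trimmed-down providerSpec values based on the YAML in this report.
machineset_spec = {
    "apiVersion": "gcpprovider.openshift.io/v1beta1",
    "kind": "GCPMachineProviderSpec",
    "machineType": "n1-standard-1",
    "guestAccelerators": [
        {"acceleratorCount": 1, "acceleratorType": "nvidia-tesla-p100"}
    ],
}
machine_spec = {
    "apiVersion": "gcpprovider.openshift.io/v1beta1",
    "kind": "GCPMachineProviderSpec",
    "machineType": "n1-standard-1",
}

# Report which requested fields never reached the Machine.
print(dropped_fields(machineset_spec, machine_spec))
```

With the data from this report, the only dropped key is `guestAccelerators`. The sketch compares top-level keys only; it does not explain why the field was dropped.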


Expected results:
Instances with GPUs are created successfully.

Additional info:
https://issues.redhat.com/browse/OCPCLOUD-812

Comment 1 Joel Speed 2021-10-27 09:25:14 UTC

Comment 2 Joel Speed 2021-10-27 09:26:31 UTC

@Sam, please make sure that the accelerated network fields have been copied over to the openshift/api repo as part of the migration and that the MAO (machine-api-operator) repo has the latest copy of the api dependency. If you have issues, please speak to Alex, who has been working on this migration.

Comment 7 sunzhaohua 2021-12-20 05:59:40 UTC

Tested with nightly build 4.10.0-0.nightly-2021-12-18-034942. Everything works as expected, so moving this bug to VERIFIED.

$ oc get machine
NAME                               PHASE      TYPE            REGION        ZONE            AGE
zhsungcp201-r79l8-master-0         Running    n1-standard-4   us-central1   us-central1-a   169m
zhsungcp201-r79l8-master-1         Running    n1-standard-4   us-central1   us-central1-b   169m
zhsungcp201-r79l8-master-2         Running    n1-standard-4   us-central1   us-central1-c   169m
zhsungcp201-r79l8-worker-a-9knlf   Running    n1-standard-4   us-central1   us-central1-a   165m
zhsungcp201-r79l8-worker-b-xsflz   Running    n1-standard-4   us-central1   us-central1-b   165m
zhsungcp201-r79l8-worker-c-vcw54   Deleting   n1-standard-1   us-central1   us-central1-c   124m

$ oc edit machineset zhsungcp201-r79l8-worker-c
          gpus:
          - count: 1
            type: nvidia-tesla-p100
          kind: GCPMachineProviderSpec
          machineType: n1-standard-1
          onHostMaintenance: Terminate
          restartPolicy: Always
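
The verified MachineSet uses a `gpus` list with `count` and `type` entries, together with `onHostMaintenance: Terminate`. A rough sanity check for such a providerSpec might look like the sketch below. `check_gpu_provider_spec` is a hypothetical helper, and the rule that GPU instances need `onHostMaintenance: Terminate` is an assumption based on GCP not live-migrating instances with attached GPUs; the validation the provider itself performs is not shown in this report.

```python
# Hypothetical helper for illustration only; the real provider's validation may differ.
def check_gpu_provider_spec(spec: dict) -> list:
    """Return a list of human-readable problems found in the GPU-related
    fields of a GCP providerSpec; an empty list means no problems found."""
    problems = []
    gpus = spec.get("gpus") or []
    for i, gpu in enumerate(gpus):
        count = gpu.get("count")
        if not isinstance(count, int) or count < 1:
            problems.append(f"gpus[{i}].count must be a positive integer")
        if not gpu.get("type"):
            problems.append(f"gpus[{i}].type must be set")
    # Assumption: GPU instances cannot live-migrate, so they must terminate
    # on host maintenance.
    if gpus and spec.get("onHostMaintenance") != "Terminate":
        problems.append("onHostMaintenance should be Terminate when GPUs are attached")
    return problems


# providerSpec fields copied from the verified MachineSet in this comment.
verified_spec = {
    "gpus": [{"count": 1, "type": "nvidia-tesla-p100"}],
    "kind": "GCPMachineProviderSpec",
    "machineType": "n1-standard-1",
    "onHostMaintenance": "Terminate",
    "restartPolicy": "Always",
}

print(check_gpu_provider_spec(verified_spec))
```

For the spec shown in this comment the helper reports no problems. Removing `onHostMaintenance` from the dictionary makes it report one.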

Comment 10 errata-xmlrpc 2022-03-10 16:22:12 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056