Red Hat Bugzilla – Attachment 1628082 Details for Bug 1656292: [backport][OSP 13] NVIDIA vGPU support for Guests in RHOSP (AI/ML use case)
vgpu-deployment+cuda-tests.txt
vgpu deployment steps.txt (text/plain), 37.82 KB, created by Archit Modi on 2019-10-22 16:06:09 UTC
(undercloud) [stack@undercloud-0 images]$ cp overcloud-full.qcow2 overcloud-full-gpu.qcow2
(undercloud) [stack@undercloud-0 images]$ ls
ironic-python-agent.initramfs overcloud-full-gpu.qcow2 overcloud-full.qcow2 overcloud-full-signature.manifest
ironic-python-agent.kernel overcloud-full.initrd overcloud-full-rpm.manifest overcloud-full.vmlinuz
(undercloud) [stack@undercloud-0 images]$ ll
total 3437652
-rw-r--r--. 1 stack stack 466025643 Aug 27 15:23 ironic-python-agent.initramfs
-rwxr-xr-x. 1 stack stack 6730032 Aug 27 15:23 ironic-python-agent.kernel
-rw-r--r--. 1 stack stack 1486749696 Oct 7 16:57 overcloud-full-gpu.qcow2
-rw-r--r--. 1 stack stack 66953629 Aug 27 15:36 overcloud-full.initrd
-rw-r--r--. 1 stack stack 1486749696 Aug 27 15:48 overcloud-full.qcow2
-rw-r--r--. 1 stack stack 54877 Aug 27 15:44 overcloud-full-rpm.manifest
-rw-r--r--. 1 stack stack 144557 Aug 27 15:44 overcloud-full-signature.manifest
-rwxr-xr-x. 1 stack stack 6730032 Aug 27 15:36 overcloud-full.vmlinuz
(undercloud) [stack@undercloud-0 images]$ sudo yum install genisoimage -y
Package genisoimage-1.1.11-25.el7.x86_64 already installed and latest version
Nothing to do
(undercloud) [stack@undercloud-0 images]$ wget http://10.39.168.132/GRID-9.1.tar.gz
--2019-10-07 17:00:53-- http://10.39.168.132/GRID-9.1.tar.gz
Connecting to 10.39.168.132:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 200618724 (191M) [application/x-gzip]
Saving to: ‘GRID-9.1.tar.gz’

100%[====================================================================================================================>] 200,618,724 4.14MB/s in 34s

2019-10-07 17:01:27 (5.64 MB/s) - ‘GRID-9.1.tar.gz’ saved [200618724/200618724]

(undercloud) [stack@undercloud-0 images]$ tar -xvf GRID-9.1.tar.gz
GRID-9.1/
GRID-9.1/GRID9.1-GA-430.46-RHEL-Host-Drivers.zip
GRID-9.1/NVIDIA-Linux-x86_64-430.46-grid.run
GRID-9.1/NVIDIA-Linux-x86_64-430.46-vgpu-kvm.run
(undercloud) [stack@undercloud-0 GRID-9.1]$ unzip GRID9.1-GA-430.46-RHEL-Host-Drivers.zip
Archive: GRID9.1-GA-430.46-RHEL-Host-Drivers.zip
  inflating: NVIDIA-vGPU-rhel-7.5-430.46.x86_64.rpm
  inflating: NVIDIA-vGPU-rhel-7.6-430.46.x86_64.rpm
  inflating: NVIDIA-vGPU-rhel-7.7-430.46.x86_64.rpm
  inflating: NVIDIA-vGPU-rhel-8.0-430.46.x86_64.rpm
(undercloud) [stack@undercloud-0 GRID-9.1]$ ls
GRID9.1-GA-430.46-RHEL-Host-Drivers.zip NVIDIA-vGPU-rhel-7.5-430.46.x86_64.rpm NVIDIA-vGPU-rhel-8.0-430.46.x86_64.rpm
NVIDIA-Linux-x86_64-430.46-grid.run NVIDIA-vGPU-rhel-7.6-430.46.x86_64.rpm
NVIDIA-Linux-x86_64-430.46-vgpu-kvm.run NVIDIA-vGPU-rhel-7.7-430.46.x86_64.rpm
(undercloud) [stack@undercloud-0 images]$ genisoimage -o nvidia-guest.iso -R -J -V NVIDIA GRID-9.1/
I: -input-charset not specified, using utf-8 (detected in locale settings)
...
...
commandrvf: stdout=n stderr=y flags=0x0
commandrvf: udevadm --debug settle -E /dev/sdb
calling: settle
fsync /dev/sdb
commandrvf: stdout=n stderr=y flags=0x0
commandrvf: udevadm --debug settle -E /dev/sdc
calling: settle
libguestfs: calling virDomainDestroy flags=VIR_DOMAIN_DESTROY_GRACEFUL
libguestfs: closing guestfs handle 0x2178830 (state 0)
libguestfs: command: run: rm
libguestfs: command: run: \ -rf /tmp/libguestfsvc8MFw
libguestfs: command: run: rm
libguestfs: command: run: \ -rf /tmp/libguestfs7npbBv
(undercloud) [stack@undercloud-0 images]$ virt-customize -a overcloud-full-gpu.qcow2 --selinux-relabel
[ 0.0] Examining the guest ...
[ 22.5] Setting a random seed
[ 22.7] SELinux relabelling

[ 662.8] Finishing off
(undercloud) [stack@undercloud-0 images]$ mkdir image
(undercloud) [stack@undercloud-0 images]$ guestmount -a overcloud-full-gpu.qcow2 -i --ro image
(undercloud) [stack@undercloud-0 images]$ cd image/
(undercloud) [stack@undercloud-0 image]$ ls
bin boot dev etc home lib lib64 media mnt opt proc root run sbin srv sys tmp usr var
(undercloud) [stack@undercloud-0 image]$ ls boot/
config-3.10.0-1062.el7.x86_64 initramfs-0-rescue-5ef221507c47465bb6159b71c407c40e.img vmlinuz-0-rescue-5ef221507c47465bb6159b71c407c40e
efi initramfs-3.10.0-1062.el7.x86_64.img vmlinuz-3.10.0-1062.el7.x86_64
extlinux symvers-3.10.0-1062.el7.x86_64.gz
grub2 System.map-3.10.0-1062.el7.x86_64
(undercloud) [stack@undercloud-0 image]$ cd ..
(undercloud) [stack@undercloud-0 images]$ mkdir vgpu
(undercloud) [stack@undercloud-0 images]$ cd vgpu/
(undercloud) [stack@undercloud-0 vgpu]$ cp ../image/boot/vmlinuz-3.10.0-1062.el7.x86_64 ./overcloud-full-gpu.vmlinuz
(undercloud) [stack@undercloud-0 vgpu]$ ls
overcloud-full-gpu.vmlinuz
(undercloud) [stack@undercloud-0 vgpu]$ cp ../image/boot/initramfs-3.10.0-1062.el7.x86_64.img ./overcloud-full-gpu.initrd
(undercloud) [stack@undercloud-0 vgpu]$ mv ../overcloud-full-gpu.qcow2 .
(undercloud) [stack@undercloud-0 vgpu]$ ls
overcloud-full-gpu.initrd overcloud-full-gpu.qcow2 overcloud-full-gpu.vmlinuz
(undercloud) [stack@undercloud-0 vgpu]$ cd ..
(undercloud) [stack@undercloud-0 images]$ cp overcloud-full-rpm.manifest overcloud-full-signature.manifest ironic-python-agent.kernel ironic-python-agent.initramfs vgpu/
(undercloud) [stack@undercloud-0 images]$ cd vgpu/
(undercloud) [stack@undercloud-0 vgpu]$ ls
ironic-python-agent.initramfs overcloud-full-gpu.initrd overcloud-full-gpu.vmlinuz overcloud-full-signature.manifest
ironic-python-agent.kernel overcloud-full-gpu.qcow2 overcloud-full-rpm.manifest
(undercloud) [stack@undercloud-0 vgpu]$ openstack overcloud image upload --update-existing --os-image-name overcloud-full-gpu.qcow2
Image "overcloud-full-gpu-vmlinuz" was uploaded.
+--------------------------------------+----------------------------+-------------+---------+--------+
| ID | Name | Disk Format | Size | Status |
+--------------------------------------+----------------------------+-------------+---------+--------+
| 0ac1e0df-d84f-4347-8fa7-db8e439c8cf5 | overcloud-full-gpu-vmlinuz | aki | 6730032 | active |
+--------------------------------------+----------------------------+-------------+---------+--------+
Image "overcloud-full-gpu-initrd" was uploaded.
+--------------------------------------+---------------------------+-------------+----------+--------+
| ID | Name | Disk Format | Size | Status |
+--------------------------------------+---------------------------+-------------+----------+--------+
| 899701d8-ec13-47f3-85c5-c2764dd58f06 | overcloud-full-gpu-initrd | ari | 66953629 | active |
+--------------------------------------+---------------------------+-------------+----------+--------+
Image "overcloud-full-gpu" was uploaded.
+--------------------------------------+--------------------+-------------+------------+--------+
| ID | Name | Disk Format | Size | Status |
+--------------------------------------+--------------------+-------------+------------+--------+
| 92b60d97-9c67-4118-bd9f-790c92e2bffd | overcloud-full-gpu | qcow2 | 1615331328 | active |
+--------------------------------------+--------------------+-------------+------------+--------+
Image "bm-deploy-kernel" is up-to-date, skipping.
Image "bm-deploy-ramdisk" is up-to-date, skipping.
Image file "/httpboot/agent.kernel" is up-to-date, skipping.
Image file "/httpboot/agent.ramdisk" is up-to-date, skipping.
(undercloud) [stack@undercloud-0 ~]$ cat gpu.yaml
parameter_defaults:
  ComputeGpuExtraConfig:
    nova::compute::vgpu::enabled_vgpu_types:
      - nvidia-105
(undercloud) [stack@undercloud-0 ~]$ cat virt/nodes_data.yaml
parameter_defaults:
  ControllerCount: 1
  OvercloudControlFlavor: control
  ComputeCount: 0
  OvercloudComputeFlavor: compute
  OvercloudComputeGpuFlavor: compute-vgpu-nvidia
  ComputeGpuCount: 1

(undercloud) [stack@undercloud-0 ~]$ openstack overcloud roles generate -o /home/stack/templates/gpu_roles_data.yaml Controller Compute
(undercloud) [stack@undercloud-0 ~]$ vi /home/stack/templates/gpu_roles_data.yaml
(undercloud) [stack@undercloud-0 ~]$ cat /home/stack/templates/gpu_roles_data.yaml
~~~
###############################################################################
# Role: ComputeGpu                                                            #
###############################################################################
- name: ComputeGpu
  description: |
    GPU Compute Node role
  CountDefault: 1
  ImageDefault: overcloud-full-gpu
  networks:
    - InternalApi
    - Tenant
    - Storage
  HostnameFormatDefault: '%stackname%-compute-%index%'
  RoleParametersDefault:
    TunedProfileName: "virtual-host"
  # Deprecated & backward-compatible values (FIXME: Make parameters consistent)
  # Set uses_deprecated_params to True if any deprecated params are used.
  uses_deprecated_params: True
  deprecated_param_image: 'NovaImage'
  deprecated_param_extraconfig: 'NovaComputeExtraConfig'
  deprecated_param_metadata: 'NovaComputeServerMetadata'
  deprecated_param_scheduler_hints: 'NovaComputeSchedulerHints'
  deprecated_param_ips: 'NovaComputeIPs'
  deprecated_server_resource_name: 'NovaCompute'
  deprecated_nic_config_name: 'compute.yaml'
  disable_upgrade_deployment: True
  deprecated_nic_config_name: 'compute-gpu.yaml'
  update_serial: 25
  ServicesDefault:
    - OS::TripleO::Services::Aide
    - OS::TripleO::Services::AuditD
    - OS::TripleO::Services::CACerts
    - OS::TripleO::Services::CephClient
    - OS::TripleO::Services::CephExternal
    - OS::TripleO::Services::CertmongerUser
    - OS::TripleO::Services::Collectd
    - OS::TripleO::Services::ComputeCeilometerAgent
    - OS::TripleO::Services::ComputeNeutronCorePlugin
    - OS::TripleO::Services::ComputeNeutronL3Agent
    - OS::TripleO::Services::ComputeNeutronMetadataAgent
    - OS::TripleO::Services::ComputeNeutronOvsAgent
    - OS::TripleO::Services::Docker
    - OS::TripleO::Services::Fluentd
    - OS::TripleO::Services::IpaClient
    - OS::TripleO::Services::Ipsec
    - OS::TripleO::Services::Iscsid
    - OS::TripleO::Services::Kernel
    - OS::TripleO::Services::LoginDefs
    - OS::TripleO::Services::MetricsQdr
    - OS::TripleO::Services::MySQLClient
    - OS::TripleO::Services::NeutronBgpVpnBagpipe
    - OS::TripleO::Services::NeutronLinuxbridgeAgent
    - OS::TripleO::Services::NeutronVppAgent
    - OS::TripleO::Services::NovaCompute
    - OS::TripleO::Services::NovaLibvirt
    - OS::TripleO::Services::NovaLibvirtGuests
    - OS::TripleO::Services::NovaMigrationTarget
    - OS::TripleO::Services::Ntp
    - OS::TripleO::Services::ContainersLogrotateCrond
    - OS::TripleO::Services::OpenDaylightOvs
    - OS::TripleO::Services::Rhsm
    - OS::TripleO::Services::RsyslogSidecar
    - OS::TripleO::Services::Securetty
    - OS::TripleO::Services::SensuClient
    - OS::TripleO::Services::SkydiveAgent
    - OS::TripleO::Services::Snmp
    - OS::TripleO::Services::Sshd
    - OS::TripleO::Services::Timezone
    - OS::TripleO::Services::TripleoFirewall
    - OS::TripleO::Services::TripleoPackages
    - OS::TripleO::Services::Tuned
    - OS::TripleO::Services::Vpp
    - OS::TripleO::Services::OVNController
    - OS::TripleO::Services::OVNMetadataAgent
    - OS::TripleO::Services::Ptp

(undercloud) [stack@undercloud-0 ~]$ cat overcloud_deploy.sh
#!/bin/bash

openstack overcloud deploy \
--timeout 100 \
--templates /usr/share/openstack-tripleo-heat-templates \
--stack overcloud \
--libvirt-type kvm \
--ntp-server 192.168.24.1 \
-r /home/stack/templates/gpu_roles_data.yaml \
-e /home/stack/virt/config_lvm.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
-e /home/stack/virt/network/network-environment.yaml \
-e /home/stack/virt/hostnames.yml \
-e /home/stack/virt/debug.yaml \
-e /home/stack/virt/nodes_data.yaml \
-e /home/stack/templates/overcloud_images.yaml \
-e /home/stack/gpu.yaml \
--log-file overcloud_deployment_33.log

2019-10-09 21:17:37Z [overcloud]: CREATE_COMPLETE Stack CREATE completed successfully

 Stack overcloud CREATE_COMPLETE

Host 10.0.0.108 not found in /home/stack/.ssh/known_hosts
Started Mistral Workflow tripleo.deployment.v1.get_horizon_url. Execution ID: 60b7f967-bd58-4733-96da-9453ca666467
Overcloud Endpoint: http://10.0.0.108:5000/
Overcloud Horizon Dashboard URL: http://10.0.0.108:80/dashboard
Overcloud rc file: /home/stack/overcloudrc
Overcloud Deployed

(undercloud) [stack@undercloud-0 images]$ cat nvidia-prepare-guest.sh
#!/bin/bash

# Add build tooling
sudo yum install -y wget
sudo wget -O /tmp/rhos-release.rpm http://rhos-release.virt.bos.redhat.com/repos/rhos-release/rhos-release-latest.noarch.rpm
sudo rpm -ivh /tmp/rhos-release.rpm
sudo rhos-release 13

sudo yum upgrade -y
sudo yum install -y gcc make kernel-devel cpp glibc-devel glibc-headers kernel-headers libmpc mpfr

# NVIDIA GRID guest script
mkdir /tmp/mount
mount LABEL=NVIDIA /tmp/mount
/bin/sh /tmp/mount/NVIDIA-Linux-x86_64-430.46-vgpu-kvm.run

mkdir -p /etc/nvidia
cp /tmp/mount/gridd.conf /etc/nvidia
(overcloud) [stack@undercloud-0 images]$ ## skipping this as not needed by OSP engineering #virt-customize -a rhel-server-7.7-update-1-x86_64-kvm-gpu.qcow2 -v --run nvidia-prepare-guest.sh
(overcloud) [stack@undercloud-0 images]$ openstack image create rhelgpu --file rhel-server-7.7-update-1-x86_64-kvm-gpu.qcow2 --disk-format qcow2 --container-format bare --public
+------------------+------------------------------------------------------------------------------+
| Field | Value |
+------------------+------------------------------------------------------------------------------+
| checksum | b1273843189321ca849b63ab41e40686 |
| container_format | bare |
| created_at | 2019-10-10T12:40:48Z |
| disk_format | qcow2 |
| file | /v2/images/833dbd05-7af8-496f-bcc7-457d63190e46/file |
| id | 833dbd05-7af8-496f-bcc7-457d63190e46 |
| min_disk | 0 |
| min_ram | 0 |
| name | rhelgpu |
| owner | e1bf3db70216442d849059689fa7b9a2 |
| properties | direct_url='swift+config://ref1/glance/833dbd05-7af8-496f-bcc7-457d63190e46' |
| protected | False |
| schema | /v2/schemas/image |
| size | 823984128 |
| status | active |
| tags | |
| updated_at | 2019-10-10T12:40:53Z |
| virtual_size | None |
| visibility | public |
+------------------+------------------------------------------------------------------------------+
(overcloud) [stack@undercloud-0 images]$ openstack image list
+--------------------------------------+---------+--------+
| ID | Name | Status |
+--------------------------------------+---------+--------+
| 833dbd05-7af8-496f-bcc7-457d63190e46 | rhelgpu | active |
+--------------------------------------+---------+--------+

(overcloud) [stack@undercloud-0 images]$ openstack flavor create --vcpus 6 --ram 8192 --disk 100 m1.small-gpu
+----------------------------+--------------------------------------+
| Field | Value |
+----------------------------+--------------------------------------+
| OS-FLV-DISABLED:disabled | False |
| OS-FLV-EXT-DATA:ephemeral | 0 |
| disk | 100 |
| id | e185c0d7-5023-4b66-ace9-2279b77c42cf |
| name | m1.small-gpu |
| os-flavor-access:is_public | True |
| properties | |
| ram | 8192 |
| rxtx_factor | 1.0 |
| swap | |
| vcpus | 6 |
+----------------------------+--------------------------------------+
(overcloud) [stack@undercloud-0 images]$ openstack flavor set m1.small-gpu --property "resources:VGPU=1"

[root@overcloud-compute-0 boot]# lsmod |grep nouveau
nouveau 1898794 0
mxm_wmi 13021 1 nouveau
video 24538 1 nouveau
i2c_algo_bit 13413 2 mgag200,nouveau
drm_kms_helper 186531 2 mgag200,nouveau
ttm 96673 2 mgag200,nouveau
drm 456166 5 ttm,drm_kms_helper,mgag200,nouveau
wmi 21636 4 dell_smbios,dell_wmi_descriptor,mxm_wmi,nouveau
[root@overcloud-compute-0 boot]# cat /etc/default/grub
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL_OUTPUT="console"
GRUB_CMDLINE_LINUX="console=tty0 console=ttyS0,115200n8 crashkernel=auto rhgb quiet"
GRUB_DISABLE_RECOVERY="true"
[root@overcloud-compute-0 boot]# vi /etc/default/grub
[root@overcloud-compute-0 ~]# cat /etc/default/grub
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL_OUTPUT="console"
GRUB_CMDLINE_LINUX="console=tty0 console=ttyS0,115200n8 crashkernel=auto rhgb quiet modprobe.blacklist=nouveau rd.driver.blacklist=nouveau"
GRUB_DISABLE_RECOVERY="true"
[root@overcloud-compute-0 boot]# grub2-mkconfig -o /boot/grub2/grub.cfg
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-3.10.0-1062.el7.x86_64
Found initrd image: /boot/initramfs-3.10.0-1062.el7.x86_64.img
Found linux image: /boot/vmlinuz-0-rescue-5ef221507c47465bb6159b71c407c40e
Found initrd image: /boot/initramfs-0-rescue-5ef221507c47465bb6159b71c407c40e.img
done
[root@overcloud-compute-0 boot]# vi /boot/grub2/grub.cfg
[root@overcloud-compute-0 boot]# reboot


[root@overcloud-compute-0 ~]# ls /sys/class/mdev_bus/*/mdev_supported_types
nvidia-105 nvidia-107 nvidia-109 nvidia-111 nvidia-113 nvidia-115 nvidia-217 nvidia-299 nvidia-301
nvidia-106 nvidia-108 nvidia-110 nvidia-112 nvidia-114 nvidia-163 nvidia-247 nvidia-300
[root@overcloud-compute-0 ~]# nvidia-smi
Fri Oct 11 18:35:58 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.46 Driver Version: 430.46 CUDA Version: N/A |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla V100-PCIE... On | 00000000:3B:00.0 Off | Off |
| N/A 32C P0 27W / 250W | 1066MiB / 16383MiB | 0% Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 6295 C+G vgpu 1016MiB |
+-----------------------------------------------------------------------------+


(overcloud) [stack@undercloud-0 images]$ openstack resource provider inventory list fee832cc-5e4b-48bb-bbea-adc7d502ac07
+----------------+------------------+----------+----------+-----------+----------+--------+
| resource_class | allocation_ratio | max_unit | reserved | step_size | min_unit | total |
+----------------+------------------+----------+----------+-----------+----------+--------+
| VCPU | 16.0 | 40 | 0 | 1 | 1 | 40 |
| MEMORY_MB | 1.0 | 261617 | 4096 | 1 | 1 | 261617 |
| VGPU | 1.0 | 16 | 0 | 1 | 1 | 16 |
| DISK_GB | 1.0 | 558 | 0 | 1 | 1 | 558 |
+----------------+------------------+----------+----------+-----------+----------+--------+
(overcloud) [stack@undercloud-0 ~]$ openstack server create --image rhelgpu --key-name mykey --flavor m1.small-gpu testvm --nic net-id=$SID --wait
+-------------------------------------+----------------------------------------------------------+
| Field | Value |
+-------------------------------------+----------------------------------------------------------+
| OS-DCF:diskConfig | MANUAL |
| OS-EXT-AZ:availability_zone | nova |
| OS-EXT-SRV-ATTR:host | overcloud-compute-0.redhat.local |
| OS-EXT-SRV-ATTR:hypervisor_hostname | overcloud-compute-0.redhat.local |
| OS-EXT-SRV-ATTR:instance_name | instance-00000003 |
| OS-EXT-STS:power_state | Running |
| OS-EXT-STS:task_state | None |
| OS-EXT-STS:vm_state | active |
| OS-SRV-USG:launched_at | 2019-10-11T19:35:59.000000 |
| OS-SRV-USG:terminated_at | None |
| accessIPv4 | |
| accessIPv6 | |
| addresses | private=192.168.100.7 |
| adminPass | Mfh6GW48fk2U |
| config_drive | |
| created | 2019-10-11T19:35:49Z |
| flavor | m1.small-gpu (e185c0d7-5023-4b66-ace9-2279b77c42cf) |
| hostId | e31b16400cf1594d56f95053d0cfbd508c9c17d6135fac4fc504e7ed |
| id | aaba7faa-982a-4034-8101-36fa6d560511 |
| image | rhelgpu (46e33b52-104f-4950-842b-848938b38fae) |
| key_name | mykey |
| name | testvm |
| progress | 0 |
| project_id | e1bf3db70216442d849059689fa7b9a2 |
| properties | |
| security_groups | name='default' |
| status | ACTIVE |
| updated | 2019-10-11T19:35:59Z |
| user_id | f6b6862da74d49fb93ff95dd772c1474 |
| volumes_attached | |
+-------------------------------------+----------------------------------------------------------+
(overcloud) [stack@undercloud-0 ~]$ IP=$(neutron floatingip-create public -f value -c floating_ip_address)
neutron CLI is deprecated and will be removed in the future. Use openstack CLI instead.
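Note on the deprecation warning above: a minimal sketch of the same floating IP allocation with the unified openstack client, assuming the external network is named public as in this log. The helper name allocate_floating_ip is hypothetical and not part of the original steps.

```shell
#!/bin/bash
# Hypothetical helper (not from the original log): allocate a floating IP on
# the given external network and print only its address, using the unified
# openstack client instead of the deprecated neutron CLI.
allocate_floating_ip() {
    local network="$1"
    openstack floating ip create "$network" -f value -c floating_ip_address
}

# Usage (assumes overcloudrc is sourced and a server named testvm exists):
#   IP=$(allocate_floating_ip public)
#   openstack server add floating ip testvm "$IP"
```

The -f value -c floating_ip_address options are the same output filters the neutron call used, so the result can be assigned to IP in the same way.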
(overcloud) [stack@undercloud-0 ~]$ openstack server add floating ip testvm $IP
(overcloud) [stack@undercloud-0 ~]$ openstack server list --long
+--------------------------------------+--------+--------+------------+-------------+-----------------------------------+------------+--------------------------------------+--------------+--------------------------------------+-------------------+----------------------------------+------------+
| ID | Name | Status | Task State | Power State | Networks | Image Name | Image ID | Flavor Name | Flavor ID | Availability Zone | Host | Properties |
+--------------------------------------+--------+--------+------------+-------------+-----------------------------------+------------+--------------------------------------+--------------+--------------------------------------+-------------------+----------------------------------+------------+
| aaba7faa-982a-4034-8101-36fa6d560511 | testvm | ACTIVE | None | Running | private=192.168.100.7, 10.0.0.205 | rhelgpu | 46e33b52-104f-4950-842b-848938b38fae | m1.small-gpu | e185c0d7-5023-4b66-ace9-2279b77c42cf | nova | overcloud-compute-0.redhat.local | |
+--------------------------------------+--------+--------+------------+-------------+-----------------------------------+------------+--------------------------------------+--------------+--------------------------------------+-------------------+----------------------------------+------------+

[cloud-user@testvm ~]$ cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.7 (Maipo)

#remove nouveau drivers
[root@testvm cloud-user]# vi /etc/default/grub
[root@testvm cloud-user]# cat /etc/default/grub
GRUB_TIMEOUT=1
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL_OUTPUT="console"
GRUB_CMDLINE_LINUX="console=tty0 console=ttyS0,115200n8 crashkernel=auto rhgb quiet modprobe.blacklist=nouveau rd.driver.blacklist=nouveau"
GRUB_DISABLE_RECOVERY="true"
[root@testvm cloud-user]# grub2-mkconfig -o /boot/grub2/grub.cfg
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-3.10.0-1062.el7.x86_64
Found initrd image: /boot/initramfs-3.10.0-1062.el7.x86_64.img
Found linux image: /boot/vmlinuz-0-rescue-bad7977df6364822b9965066e39ff2fb
Found initrd image: /boot/initramfs-0-rescue-bad7977df6364822b9965066e39ff2fb.img
done
[root@testvm cloud-user]# vi /boot/grub2/grub.cfg
[root@testvm cloud-user]# reboot
[root@testvm cloud-user]# lsmod |grep nouveau
[root@testvm cloud-user]# lsmod | grep nvidia
nvidia_vgpu_vfio 49783 0
nvidia 19046028 10 nvidia_vgpu_vfio
mdev 20336 2 vfio_mdev,nvidia_vgpu_vfio
vfio 32657 3 vfio_mdev,nvidia_vgpu_vfio,vfio_iommu_type1
ipmi_msghandler 56728 2 ipmi_devintf,nvidia

[cloud-user@testvm]$ bash NVIDIA-Linux-x86_64-430.46-grid.run --kernel-source-path /usr/src/kernels/3.10.0-1062.4.1.el7.x86_64/

[root@testvm cloud-user]# nvidia-smi
Thu Oct 17 11:36:32 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.46 Driver Version: 430.46 CUDA Version: N/A |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GRID V100-1Q On | 00000000:00:05.0 Off | N/A |
| N/A N/A P0 N/A / N/A | 80MiB / 1016MiB | 0% Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
[root@testvm cloud-user]# nvidia-smi --query-gpu=gpu_name --format=csv,noheader --id=0
GRID V100-1Q

 56 sudo sshuttle --dns -r root@10.0.0.1 0.0.0.0/0
 62 sudo yum install wget -y
 63 wget http://10.39.168.132/GRID-9.1.tar.gz
 64 sudo yum install -y gcc make kernel-devel cpp glibc-devel glibc-headers kernel-headers libmpc mpfr pciutils

tmux session:
[cloud-user@test-egallen ~]$ sudo sshuttle --dns -r root@10.0.0.1 10.19.158.15/32
root@10.0.0.1's password:
Connected.


[root@test-egallen ~]# sudo systemctl status nvidia-gridd.service
● nvidia-gridd.service - NVIDIA Grid Daemon
   Loaded: loaded (/usr/lib/systemd/system/nvidia-gridd.service; enabled; vendor preset: disabled)
   Active: active (running) since Fri 2019-10-18 12:27:21 EDT; 1min 10s ago
  Process: 6763 ExecStart=/usr/bin/nvidia-gridd (code=exited, status=0/SUCCESS)
 Main PID: 6764 (nvidia-gridd)
   CGroup: /system.slice/nvidia-gridd.service
           └─6764 /usr/bin/nvidia-gridd

Oct 18 12:27:21 test-egallen systemd[1]: Starting NVIDIA Grid Daemon...
Oct 18 12:27:21 test-egallen nvidia-gridd[6764]: Started (6764)
Oct 18 12:27:21 test-egallen systemd[1]: Started NVIDIA Grid Daemon.
Oct 18 12:27:22 test-egallen nvidia-gridd[6764]: Ignore service provider licensing
Oct 18 12:27:23 test-egallen nvidia-gridd[6764]: Service provider detection complete.
Oct 18 12:27:23 test-egallen nvidia-gridd[6764]: Calling load_byte_array(tra)
Oct 18 12:27:24 test-egallen nvidia-gridd[6764]: Acquiring license for GRID vGPU Edition.
Oct 18 12:27:24 test-egallen nvidia-gridd[6764]: Calling load_byte_array(tra)
Oct 18 12:27:32 test-egallen nvidia-gridd[6764]: License acquired successfully. (Info: http://dhcp158-15.virt.lab.eng.bos.redhat.com:7070/request; GRID-Virtual-WS,2.0)

[root@test-egallen cloud-user]# cat /etc/yum.conf
[main]
proxy=socks5://localhost:8080

[root@test-egallen cloud-user]# yum -y install nvidia-container-runtime-hook
[root@test-egallen cloud-user]# systemctl start docker
[root@test-egallen cloud-user]# sudo docker run docker.io/mirrorgooglecontainers/cuda-vector-add:v0.1
[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
Done

[root@overcloud-compute-0 ~]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.7 (Maipo)
[root@overcloud-compute-0 ~]# cat /etc/rhosp-release
Red Hat OpenStack Platform release 13.0.8 (Queens)
[root@overcloud-compute-0 ~]# lspci -nn | grep NVIDIA
3b:00.0 3D controller [0302]: NVIDIA Corporation GV100GL [Tesla V100 PCIe 16GB] [10de:1db4] (rev a1)
[root@overcloud-compute-0 ~]# sudo cat /var/lib/config-data/puppet-generated/nova_libvirt/etc/nova/nova.conf | grep -i nvidia
# Some pGPUs (e.g. NVIDIA GRID K1) support different vGPU types. User can use
# enabled_vgpu_types = GRID K100,Intel GVT-g,MxGPU.2,nvidia-11
enabled_vgpu_types=nvidia-105
[root@overcloud-compute-0 ~]# nvidia-smi
Tue Oct 22 15:37:24 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.46 Driver Version: 430.46 CUDA Version: N/A |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla V100-PCIE... On | 00000000:3B:00.0 Off | Off |
| N/A 32C P0 27W / 250W | 2078MiB / 16383MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 36719 C+G vgpu 1016MiB |
| 0 94989 C+G vgpu 1012MiB |
+-----------------------------------------------------------------------------+
[root@overcloud-compute-0 ~]# systemctl status nvidia-vgpu-mgr
● nvidia-vgpu-mgr.service - NVIDIA vGPU Manager Daemon
   Loaded: loaded (/usr/lib/systemd/system/nvidia-vgpu-mgr.service; enabled; vendor preset: disabled)
   Active: active (running) since Fri 2019-10-11 14:02:13 UTC; 1 weeks 4 days ago
  Process: 1771 ExecStart=/usr/bin/nvidia-vgpu-mgr (code=exited, status=0/SUCCESS)
 Main PID: 1779 (nvidia-vgpu-mgr)
    Tasks: 9
   Memory: 34.8M
   CGroup: /system.slice/nvidia-vgpu-mgr.service
           ├─ 1779 /usr/bin/nvidia-vgpu-mgr
           ├─36719 vgpu
           └─94989 vgpu

Oct 22 14:52:58 overcloud-compute-0 nvidia-vgpu-mgr[36719]: notice: vmiop_log: (0x0): Init frame copy engine: syncing...
Oct 22 14:52:58 overcloud-compute-0 nvidia-vgpu-mgr[36719]: notice: vmiop_log: (0x0): vGPU migration disabled
Oct 22 14:52:58 overcloud-compute-0 nvidia-vgpu-mgr[36719]: notice: vmiop_log: display_init inst: 0 successful
Oct 22 14:53:07 overcloud-compute-0 nvidia-vgpu-mgr[36719]: notice: vmiop_log: ######## Guest NVIDIA Driver Information: ########
Oct 22 14:53:07 overcloud-compute-0 nvidia-vgpu-mgr[36719]: notice: vmiop_log: Driver Version: 430.46
Oct 22 14:53:07 overcloud-compute-0 nvidia-vgpu-mgr[36719]: notice: vmiop_log: vGPU version: 0x30002
Oct 22 14:53:07 overcloud-compute-0 nvidia-vgpu-mgr[36719]: notice: vmiop_log: (0x0): Current max guest pfn = 0x230e5f!
Oct 22 14:54:44 overcloud-compute-0 nvidia-vgpu-mgr[36719]: notice: vmiop_log: (0x0): vGPU license state: (0x00000001)

[root@overcloud-compute-0 ~]# systemctl status nvidia-vgpud
● nvidia-vgpud.service - NVIDIA vGPU Daemon
   Loaded: loaded (/usr/lib/systemd/system/nvidia-vgpud.service; enabled; vendor preset: disabled)
   Active: inactive (dead) since Fri 2019-10-11 14:02:14 UTC; 1 weeks 4 days ago
  Process: 1909 ExecStopPost=/bin/rm -rf /var/run/nvidia-vgpud (code=exited, status=0/SUCCESS)
  Process: 1773 ExecStart=/usr/bin/nvidia-vgpud (code=exited, status=0/SUCCESS)
 Main PID: 1784 (code=exited, status=0/SUCCESS)

Oct 11 14:02:14 overcloud-compute-0 nvidia-vgpud[1784]: ECC supported: 0x1
Oct 11 14:02:14 overcloud-compute-0 nvidia-vgpud[1784]: Multi vGPU supported: 0x1
Oct 11 14:02:14 overcloud-compute-0 nvidia-vgpud[1784]: Encoder Capacity: 0x64
Oct 11 14:02:14 overcloud-compute-0 nvidia-vgpud[1784]: BAR1 Length: 0x4000
Oct 11 14:02:14 overcloud-compute-0 nvidia-vgpud[1784]: Frame Rate Limiter enabled: 0x1
Oct 11 14:02:14 overcloud-compute-0 nvidia-vgpud[1784]: Number of Displays: 1
Oct 11 14:02:14 overcloud-compute-0 nvidia-vgpud[1784]: Display: width 4096, height 2160
Oct 11 14:02:14 overcloud-compute-0 nvidia-vgpud[1784]: License: NVIDIA-vComputeServer,9.0;Quadro-Virtual-DWS,5.0
Oct 11 14:02:14 overcloud-compute-0 nvidia-vgpud[1784]: PID file unlocked.
Oct 11 14:02:14 overcloud-compute-0 nvidia-vgpud[1784]: PID file closed.
[root@overcloud-compute-0 ~]# virsh list --all
 Id    Name                           State
----------------------------------------------------
 2     instance-00000003              running
 5     instance-00000004              running

[root@overcloud-compute-0 ~]# sudo virsh dumpxml instance-00000003 | grep mdev
    <hostdev mode='subsystem' type='mdev' managed='no' model='vfio-pci' display='off'>
[root@overcloud-compute-0 ~]#
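Note on the nouveau blacklisting: in the log, /etc/default/grub was edited by hand with vi on both the compute node and the guest. A minimal sketch of a scripted, repeatable equivalent follows. The helper name add_grub_args is hypothetical, the sketch assumes GNU sed (for -i) and a single double-quoted GRUB_CMDLINE_LINUX line, and it was not part of the original steps.

```shell
#!/bin/bash
# Hypothetical helper (not from the original log): append kernel arguments to
# the GRUB_CMDLINE_LINUX line of a grub defaults file. An argument that is
# already present on that line is skipped, so re-running is harmless.
add_grub_args() {
    local file="$1"; shift
    local arg
    for arg in "$@"; do
        # Present means: preceded by a quote or space and followed by one.
        if ! grep -q "^GRUB_CMDLINE_LINUX=.*[\" ]${arg}[\" ]" "$file"; then
            # Insert the argument just before the closing double quote.
            sed -i "s|^\(GRUB_CMDLINE_LINUX=\".*\)\"|\1 ${arg}\"|" "$file"
        fi
    done
}

# Usage, mirroring the manual edit in the log (run as root):
#   add_grub_args /etc/default/grub modprobe.blacklist=nouveau rd.driver.blacklist=nouveau
#   grub2-mkconfig -o /boot/grub2/grub.cfg
#   reboot
```

As in the log, the change only takes effect after grub2-mkconfig regenerates /boot/grub2/grub.cfg and the machine is rebooted; lsmod | grep nouveau should then print nothing.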
Attachments on bug 1656292: 1628082 | 1630626 | 1630627