Bug 1811537
| Summary: | pv attach failure on the openstack environment for ppc64le platform | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | mkumatag |
| Component: | Multi-Arch | Assignee: | Prashanth Sundararaman <psundara> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Barry Donahue <bdonahue> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 4.3.z | CC: | aos-bugs, bbennett, bbreard, danili, dustymabe, imcleod, jligon, jsafrane, lmcfadde, manokuma, miabbott, nstielau, psundara, smilner |
| Target Milestone: | --- | | |
| Target Release: | 4.7.0 | | |
| Hardware: | ppc64le | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-01-04 15:13:34 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description (mkumatag, 2020-03-09 07:26:27 UTC)
logs - http://file.rdu.redhat.com/~mkumatag/logs/BZ1811537/must-gather.local.7959666193381648547.tar.gz [got from the reporter on Slack]

$ lsblk on the node:
NAME                         MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
vda                          252:0    0   30G  0 disk
|-vda1                       252:1    0  384M  0 part /boot
|-vda2                       252:2    0    4M  0 part
`-vda4                       252:4    0 29.6G  0 part
  `-coreos-luks-root-nocrypt 253:0    0 29.6G  0 dm   /sysroot

$ ls -la /dev/disk/by-id
total 0
drwxr-xr-x. 2 root root  60 Mar  6 22:28 .
drwxr-xr-x. 8 root root 160 Mar  6 22:28 ..
lrwxrwxrwx. 1 root root  10 Mar  9 06:58 dm-name-coreos-luks-root-nocrypt -> ../../dm-0

There is no Cinder volume attached to the node, or at least the node does not know about it: its device is missing. OpenStack itself thinks the volume is attached:

(overcloud) [root@director-ci07 ~]# openstack volume list
+--------------------------------------+-------------------------------------------------------------+--------+------+-------------------------------------------------+
| ID                                   | Name                                                        | Status | Size | Attached to                                     |
+--------------------------------------+-------------------------------------------------------------+--------+------+-------------------------------------------------+
| 86003647-ced9-4874-9211-e812c2ef3384 | test-znnmj-dynamic-pvc-6d354e6a-4795-47fe-896d-d72a595e87a3 | in-use |    1 | Attached to test-znnmj-worker-wtktq on /dev/vdb |
+--------------------------------------+-------------------------------------------------------------+--------+------+-------------------------------------------------+

But there is no /dev/vdb on the node.

One less headache: we have seen that the volume is getting attached to the KVM VM (worker) properly, but it is not discovered inside the worker VM.

(overcloud) [root@director-ci07 ~]# openstack volume list
+--------------------------------------+-------------------------------------------------------------+--------+------+-------------------------------------------------+
| ID                                   | Name                                                        | Status | Size | Attached to                                     |
+--------------------------------------+-------------------------------------------------------------+--------+------+-------------------------------------------------+
| 2e8d1e4f-3277-4038-ad70-bcfcf934574b | test-znnmj-dynamic-pvc-91efefcf-e02b-4205-b850-6155c42ed5bc | in-use |    1 | Attached to test-znnmj-worker-wtktq on /dev/vdb |
+--------------------------------------+-------------------------------------------------------------+--------+------+-------------------------------------------------+
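As an aside, a quick way to cross-check what Cinder reports against what the guest actually sees is to look for the volume ID among the virtio serial links under /dev/disk/by-id. The sketch below is hedged: the volume ID is taken from this bug, and the `virtio-<serial>` link naming (with the serial truncated to 20 characters) is the usual udev convention for virtio-blk devices, but it may differ depending on the udev rules shipped in the image. The domain XML captured from the compute host below confirms libvirt did attach the disk on its side.

```bash
# Cross-check: Cinder's view vs. the guest's view of the same volume.
VOLUME_ID=86003647-ced9-4874-9211-e812c2ef3384   # volume from this bug; substitute your own

# What OpenStack thinks (status and attachment record):
openstack volume show "$VOLUME_ID" -c status -c attachments

# What the guest actually sees (run on the worker node).
# Nova sets the virtio-blk serial to the volume UUID, so a matching
# /dev/disk/by-id/virtio-<first 20 chars of the UUID> link should exist.
ls -l /dev/disk/by-id/ | grep "${VOLUME_ID:0:20}" || echo "volume not visible in the guest"
lsblk
```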
[root@overcloud-novacomputeppc64le-2 ~]# virsh dumpxml 9
<domain type='kvm' id='9'>
  <name>instance-00000031</name>
  <uuid>ac9ebeda-e4bd-4af7-8a89-7fabcd400389</uuid>
  <metadata>
    <nova:instance xmlns:nova="http://openstack.org/xmlns/libvirt/nova/1.0">
      <nova:package version="19.0.3-0.20190814170534.a8e19af.el8ost"/>
      <nova:name>test-znnmj-worker-wtktq</nova:name>
      <nova:creationTime>2020-03-06 22:22:50</nova:creationTime>
      <nova:flavor name="m1.medium">
        <nova:memory>16384</nova:memory>
        <nova:disk>30</nova:disk>
        <nova:swap>0</nova:swap>
        <nova:ephemeral>0</nova:ephemeral>
        <nova:vcpus>4</nova:vcpus>
      </nova:flavor>
      <nova:owner>
        <nova:user uuid="a2f24e0d09d443c7b17152ab9f724a25">admin</nova:user>
        <nova:project uuid="b94e5d3223794b4e946e2f910a656930">admin</nova:project>
      </nova:owner>
      <nova:root type="image" uuid="b1f6daf6-3fce-47a3-9208-7f5f3553b3bf"/>
    </nova:instance>
  </metadata>
  <memory unit='KiB'>16777216</memory>
  <currentMemory unit='KiB'>16777216</currentMemory>
  <vcpu placement='static'>4</vcpu>
  <cputune>
    <shares>4096</shares>
  </cputune>
  <resource>
    <partition>/machine</partition>
  </resource>
  <os>
    <type arch='ppc64le' machine='pseries-rhel7.6.0'>hvm</type>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
    <apic/>
  </features>
  <cpu mode='host-model' check='partial'>
    <model fallback='forbid'/>
    <topology sockets='4' cores='1' threads='1'/>
  </cpu>
  <clock offset='utc'>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='rtc' tickpolicy='catchup'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <devices>
    <emulator>/usr/libexec/qemu-kvm</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2' cache='none'/>
      <source file='/var/lib/nova/instances/ac9ebeda-e4bd-4af7-8a89-7fabcd400389/disk'/>
      <backingStore type='file' index='1'>
        <format type='raw'/>
        <source file='/var/lib/nova/instances/_base/9af59a19bbce080fb61dd6ae35b6d124e309fb59'/>
        <backingStore/>
      </backingStore>
      <target dev='vda' bus='virtio'/>
      <alias name='virtio-disk0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </disk>
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='none' io='native'/>
      <source dev='/dev/sdb'/>
      <backingStore/>
      <target dev='vdb' bus='virtio'/>
      <serial>2e8d1e4f-3277-4038-ad70-bcfcf934574b</serial>
      <alias name='virtio-disk1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </disk>
    <controller type='usb' index='0' model='qemu-xhci'>
      <alias name='usb'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </controller>
    <controller type='pci' index='0' model='pci-root'>
      <model name='spapr-pci-host-bridge'/>
      <target index='0'/>
      <alias name='pci.0'/>
    </controller>
    <interface type='bridge'>
      <mac address='fa:16:3e:49:3d:20'/>
      <source bridge='qbr6daf1aa4-11'/>
      <target dev='tap6daf1aa4-11'/>
      <model type='virtio'/>
      <driver name='vhost' rx_queue_size='512'/>
      <mtu size='1450'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0'/>
    </interface>
    <serial type='pty'>
      <source path='/dev/pts/1'/>
      <log file='/var/lib/nova/instances/ac9ebeda-e4bd-4af7-8a89-7fabcd400389/console.log' append='off'/>
      <target type='spapr-vio-serial' port='0'>
        <model name='spapr-vty'/>
      </target>
      <alias name='serial0'/>
      <address type='spapr-vio' reg='0x30000000'/>
    </serial>
    <console type='pty' tty='/dev/pts/1'>
      <source path='/dev/pts/1'/>
      <log file='/var/lib/nova/instances/ac9ebeda-e4bd-4af7-8a89-7fabcd400389/console.log' append='off'/>
      <target type='serial' port='0'/>
      <alias name='serial0'/>
      <address type='spapr-vio' reg='0x30000000'/>
    </console>
    <input type='tablet' bus='usb'>
      <alias name='input0'/>
      <address type='usb' bus='0' port='1'/>
    </input>
    <input type='keyboard' bus='usb'>
      <alias name='input1'/>
      <address type='usb' bus='0' port='2'/>
    </input>
    <input type='mouse' bus='usb'>
      <alias name='input2'/>
      <address type='usb' bus='0' port='3'/>
    </input>
    <graphics type='vnc' port='5901' autoport='yes' listen='192.168.85.53'>
      <listen type='address' address='192.168.85.53'/>
    </graphics>
    <video>
      <model type='vga' vram='16384' heads='1' primary='yes'/>
      <alias name='video0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </video>
    <memballoon model='virtio'>
      <stats period='10'/>
      <alias name='balloon0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </memballoon>
    <panic model='pseries'/>
  </devices>
  <seclabel type='dynamic' model='dac' relabel='yes'>
    <label>+107:+107</label>
    <imagelabel>+107:+107</imagelabel>
  </seclabel>
</domain>

[root@overcloud-novacomputeppc64le-2 ~]# lsblk
NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda      8:0    0  1.8T  0 disk
├─sda1   8:1    0    8M  0 part
├─sda2   8:2    0  1.8T  0 part /
└─sda3   8:3    0   64M  0 part
sdb      8:16   0    1G  0 disk

We need to debug why the disks are not getting detected in the guest dynamically!
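A minimal triage sequence for this situation might look like the following sketch. The host-side check is grounded in the domain XML above (domain instance-00000031, target vdb backed by /dev/sdb); the in-guest checks reflect the fact that PCI hotplug on pseries guests is driven by RTAS events handled by rtas_errd/drmgr, which turned out to be missing from the image.

```bash
# 1) On the compute host: confirm libvirt really attached the disk to the domain.
virsh domblklist instance-00000031      # should list vdb -> /dev/sdb

# 2) In the guest: check whether the RTAS hotplug machinery is present at all.
rpm -q librtas powerpc-utils ppc64-diag # missing on the affected RHCOS image
systemctl status rtas_errd              # daemon that reacts to RTAS hotplug events
dmesg | grep -iE 'rtas|virtio-pci'      # no hotplug messages appear when it is missing
```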
Got the information from the libvirt expert that we need the following packages and service to be running to detect disk hotplugs:

Packages: librtas, powerpc-utils, ppc64-diag
Service: rtas_errd

Based on that request, @psundara built a new CoreOS image which includes the above packages and service. Unfortunately, I don't have a complete OCP setup to test with, so instead I mimicked the environment by booting the CoreOS image in one of the OpenStack environments, and I was able to see the disks getting attached at runtime.

Kernel messages for the hotplug:

[Wed Mar 11 07:07:59 2020] RTAS: event: 1, Type: Unknown, Severity: 1
[Wed Mar 11 07:08:00 2020] pci 0000:00:06.0: [1af4:1001] type 00 class 0x010000
[Wed Mar 11 07:08:00 2020] pci 0000:00:06.0: reg 0x10: [io 0x10000-0x1003f]
[Wed Mar 11 07:08:00 2020] pci 0000:00:06.0: reg 0x14: [mem 0x00000000-0x00000fff]
[Wed Mar 11 07:08:00 2020] pci 0000:00:06.0: reg 0x20: [mem 0x00000000-0x00003fff 64bit pref]
[Wed Mar 11 07:08:00 2020] pci 0000:00:06.0: No hypervisor support for SR-IOV on this device, IOV BARs disabled.
[Wed Mar 11 07:08:00 2020] iommu: Adding device 0000:00:06.0 to group 0
[Wed Mar 11 07:08:00 2020] pci 0000:00:06.0: BAR 4: assigned [mem 0x210000010000-0x210000013fff 64bit pref]
[Wed Mar 11 07:08:00 2020] pci 0000:00:06.0: BAR 1: assigned [mem 0x200081000000-0x200081000fff]
[Wed Mar 11 07:08:00 2020] pci 0000:00:06.0: BAR 0: assigned [io 0x100c0-0x100ff]
[Wed Mar 11 07:08:00 2020] virtio-pci 0000:00:06.0: enabling device (0000 -> 0003)
[Wed Mar 11 07:08:00 2020] virtio-pci 0000:00:06.0: Using 64-bit direct DMA at offset 800000000000000
[Wed Mar 11 07:08:00 2020] virtio_blk virtio3: [vdb] 2097152 512-byte logical blocks (1.07 GB/1.00 GiB)
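As an aside, when repeating this attach test one can watch the hotplug land in real time from inside the guest with the sketch below; the server and volume names are placeholders and the attach is triggered from the OpenStack side.

```bash
# Run the monitors in the guest, then attach the volume from OpenStack.
journalctl -kf &                                    # kernel log: RTAS / virtio-pci messages
udevadm monitor --kernel --subsystem-match=block &  # block device add/remove events

# From a host with OpenStack CLI credentials (placeholder names):
openstack server add volume <server-name> <volume-name-or-id>
```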
lspci:

[root@mkumatag-new-coreos-test core]# lspci
00:01.0 Ethernet controller: Red Hat, Inc. Virtio network device
00:02.0 USB controller: Red Hat, Inc. QEMU XHCI Host Controller (rev 01)
00:03.0 SCSI storage controller: Red Hat, Inc. Virtio block device
00:04.0 Unclassified device [00ff]: Red Hat, Inc. Virtio memory balloon
00:05.0 VGA compatible controller: Device 1234:1111 (rev 02)
00:06.0 SCSI storage controller: Red Hat, Inc. Virtio block device   <================ New interface

lsblk:

[root@mkumatag-new-coreos-test core]# lsblk
NAME                         MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
vda                          252:0    0   160G  0 disk
|-vda1                       252:1    0     4M  0 part
|-vda2                       252:2    0   384M  0 part /boot
`-vda4                       252:4    0 159.6G  0 part
  `-coreos-luks-root-nocrypt 253:0    0 159.6G  0 dm   /sysroot
vdb                          252:16   0     1G  0 disk   <================ New disk attached via cinder

[root@mkumatag-new-coreos-test core]# rpm -qa | grep diag
ppc64-diag-2.7.5-2.el8.ppc64le
[root@mkumatag-new-coreos-test core]# rpm -qa | grep powerpc-utils
powerpc-utils-1.3.6-4.el8.ppc64le
powerpc-utils-core-1.3.6-4.el8.ppc64le
[root@mkumatag-new-coreos-test core]# systemctl status rtas_errd
● rtas_errd.service - ppc64-diag rtas_errd (platform error handling) Service
   Loaded: loaded (/usr/lib/systemd/system/rtas_errd.service; enabled; vendor preset: disabled)
   Active: active (running) since Wed 2020-03-11 07:05:50 UTC; 21min ago
  Process: 6760 ExecStart=/usr/sbin/rtas_errd (code=exited, status=0/SUCCESS)
 Main PID: 6768 (rtas_errd)
    Tasks: 1 (limit: 26213)
   Memory: 15.3M
   CGroup: /system.slice/rtas_errd.service
           └─6768 /usr/sbin/rtas_errd

Mar 11 07:05:50 mkumatag-new-coreos-test.openstacklocal systemd[1]: Starting ppc64-diag rtas_errd (platform error h>
Mar 11 07:05:50 mkumatag-new-coreos-test.openstacklocal systemd[1]: Started ppc64-diag rtas_errd (platform error ha>
Mar 11 07:08:00 mkumatag-new-coreos-test drmgr[6998]: drmgr: -c pci -a -s 0x40000030 -n -d4 -V

I'm moving this issue to CoreOS to get these packages and this service included in the CoreOS 4.3 builds.

Chatted with Dennis on this bug. I don't believe that OSP on OCP is supported at the moment; is this bug something that an IBM engineer could investigate? If not, I would like to close this bug and track it via BZ 1659152.

(In reply to Dan Li from comment #13)
> Chatted with Dennis on this bug, I don't believe that OSP on OCP is
> supported at the moment, is this bug something that an IBM engineer could
> investigate into? If not, I would like to close this bug and track it via BZ
> 1659152.

@manokuma @lmcfadde ^^^

Adding "UpcomingSprint" as this bug is unlikely to be fixed before the end of the sprint on August 1st. The (in/out of) scope status is still being discussed.

So the history is that we did have a fix for this: https://github.com/coreos/fedora-coreos-config/pull/296. The CoreOS team expressed concerns over adding these packages, as they pull in undesired dependencies like Perl, so https://bugzilla.redhat.com/show_bug.cgi?id=1814335 was worked on to exclude Perl from these packages. Unfortunately, the packages are only available in 8.3, so we will have to wait for RHCOS to move to 8.3 for this test to work in CI. Meanwhile, I think Manju has already disabled these tests in CI.

Hi Prashanth, will this bug be looked at before the end of this sprint this week (I assume probably not, based on your Comment 16)? If not, I would like to add the "UpcomingSprint" tag.

Hi @Prashanth and @Manju, since this bug was reported on 4.3.z and 4.3 will go end of maintenance support after 4.6 releases next week, do you still see this bug as needed? If not, we should close this bug. If needed, we should re-target it to a later release.

Adding UpcomingSprint as this bug is unlikely to be resolved before the end of the current sprint.

The latest 4.7 builds have the powerpc-utils-core and ppc64-diag packages, and rtas_errd is enabled. Manju, could you test it and let me know if it works now?
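A quick way to confirm the prerequisites actually landed on a running 4.7 cluster before re-testing the attach is sketched below; the package names and service come from the verification output in this bug, while the node label selector and the use of `oc debug` are assumptions about the test cluster.

```bash
# Check every ppc64le node for the hotplug prerequisites shipped in the 4.7 RHCOS builds.
for node in $(oc get nodes -l kubernetes.io/arch=ppc64le -o name); do
  echo "== $node =="
  oc debug "$node" -- chroot /host sh -c \
    'rpm -q powerpc-utils-core ppc64-diag-rtas; systemctl is-active rtas_errd'
done
```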
Thanks Prashanth.

I didn't have an entire OCP-with-OpenStack setup to verify the bug, but I was able to verify it with a minimal single-node VM in an OpenStack environment. Here are the details.

Now I'm able to see the new packages installed and the service running, as below:

[root@mkumatag-rhcos-47 ~]# rpm -qa | grep power
powerpc-utils-core-1.3.6-11.el8.ppc64le
[root@mkumatag-rhcos-47 ~]# rpm -qa | grep diag
ppc64-diag-rtas-2.7.6-2.el8.ppc64le
[root@mkumatag-rhcos-47 ~]# systemctl status rtas_errd
● rtas_errd.service - ppc64-diag rtas_errd (platform error handling) Service
   Loaded: loaded (/usr/lib/systemd/system/rtas_errd.service; enabled; vendor preset: enabled)
   Active: active (running) since Thu 2020-12-24 00:52:20 UTC; 1min 13s ago
  Process: 1359 ExecStart=/usr/sbin/rtas_errd (code=exited, status=0/SUCCESS)
 Main PID: 1374 (rtas_errd)
    Tasks: 1 (limit: 103532)
   Memory: 9.1M
   CGroup: /system.slice/rtas_errd.service
           └─1374 /usr/sbin/rtas_errd

Dec 24 00:52:20 mkumatag-rhcos-47 systemd[1]: Starting ppc64-diag rtas_errd (platform error handling) Service...
Dec 24 00:52:20 mkumatag-rhcos-47 systemd[1]: Started ppc64-diag rtas_errd (platform error handling) Service.

Before volume attach:

[root@mkumatag-rhcos-47 ~]# lspci
00:01.0 Ethernet controller: Red Hat, Inc. Virtio network device
00:02.0 USB controller: Red Hat, Inc. QEMU XHCI Host Controller (rev 01)
00:03.0 SCSI storage controller: Red Hat, Inc. Virtio block device
00:04.0 Unclassified device [00ff]: Red Hat, Inc. Virtio memory balloon
00:05.0 VGA compatible controller: Device 1234:1111 (rev 02)
[root@mkumatag-rhcos-47 ~]# lsblk
NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
vda    252:0    0   160G  0 disk
|-vda1 252:1    0     4M  0 part
|-vda3 252:3    0   384M  0 part /boot
`-vda4 252:4    0 159.6G  0 part /sysroot

After volume attach:

[root@mkumatag-rhcos-47 ~]# lsblk
NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
vda    252:0    0   160G  0 disk
|-vda1 252:1    0     4M  0 part
|-vda3 252:3    0   384M  0 part /boot
`-vda4 252:4    0 159.6G  0 part /sysroot
vdb    252:16   0     3G  0 disk   <==================== NEW volume got attached here
[root@mkumatag-rhcos-47 ~]# ls /dev/vdb
/dev/vdb

RHCOS version:

[root@mkumatag-rhcos-47 ~]# uname -a
Linux mkumatag-rhcos-47 4.18.0-240.8.1.el8_3.ppc64le #1 SMP Fri Dec 4 12:02:23 EST 2020 ppc64le ppc64le ppc64le GNU/Linux
[root@mkumatag-rhcos-47 ~]# cat /etc/os-release
NAME="Red Hat Enterprise Linux CoreOS"
VERSION="47.83.202012231312-0"
VERSION_ID="4.7"
OPENSHIFT_VERSION="4.7"
RHEL_VERSION="8.3"
PRETTY_NAME="Red Hat Enterprise Linux CoreOS 47.83.202012231312-0 (Ootpa)"
ID="rhcos"
ID_LIKE="rhel fedora"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:redhat:enterprise_linux:8::coreos"
HOME_URL="https://www.redhat.com/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="OpenShift Container Platform"
REDHAT_BUGZILLA_PRODUCT_VERSION="4.7"
REDHAT_SUPPORT_PRODUCT="OpenShift Container Platform"
REDHAT_SUPPORT_PRODUCT_VERSION="4.7"
OSTREE_VERSION='47.83.202012231312-0'

So this issue has been fixed and everything works properly without any issues.

Per Manju's Comment 30, I'm closing this bug as "Currentrelease". Please re-open if needed.
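For anyone re-verifying this later on a full OCP-on-OpenStack cluster rather than a standalone VM, a minimal PVC attach check might look like the sketch below. The storage class name, container image, and object names are assumptions; the idea is simply to provision a Cinder-backed PVC, mount it in a pod, and confirm the new block device shows up on the ppc64le worker.

```bash
# Create a 1Gi PVC and a pod that mounts it (placeholder storage class "standard").
cat <<'EOF' | oc apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cinder-attach-test
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 1Gi
  storageClassName: standard
---
apiVersion: v1
kind: Pod
metadata:
  name: cinder-attach-test
spec:
  containers:
  - name: test
    image: registry.access.redhat.com/ubi8/ubi-minimal
    command: ["sleep", "infinity"]
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: cinder-attach-test
EOF

# Wait for the pod, confirm the claim bound, then check the node sees the new vdX device.
oc wait pod/cinder-attach-test --for=condition=Ready --timeout=5m
oc get pvc cinder-attach-test
oc debug node/$(oc get pod cinder-attach-test -o jsonpath='{.spec.nodeName}') \
  -- chroot /host lsblk
```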