Bug 1654196

Summary: [RHEL8.0][USB] guest failed to boot from emulated usb-storage
Product: Red Hat Enterprise Linux 8
Reporter: Minjia Cai <micai>
Component: SLOF
Assignee: Laurent Vivier <lvivier>
Status: CLOSED CURRENTRELEASE
QA Contact: Minjia Cai <micai>
Severity: medium
Priority: high
Version: 8.0
CC: ddepaula, dgibson, knoel, kraxel, lvivier, micai, qzhang, rbalakri, thuth, virt-maint, wchadwic
Target Milestone: rc
Keywords: Regression
Target Release: 8.0
Hardware: ppc64le
OS: Unspecified
Fixed In Version: SLOF-20171214-5.gitfa98132.module+el8+2616+396d822d
Clone Of:
: 1661201
Last Closed: 2019-06-14 01:50:33 UTC
Type: Bug
Bug Blocks: 1661201
Attachments:
  boot up with error (flags: none)
  Implement usb/storage write operation (flags: none)

Description Minjia Cai 2018-11-28 08:59:49 UTC
Created attachment 1509416 [details]
boot up with error

Description of problem:
The guest fails to boot from an emulated usb-storage device.

Version-Release number of selected component (if applicable):
Host:
Compose: RHEL-8.0-20181105.1
kernel-4.18.0-37.el8
qemu-kvm-2.12.0-42.module+el8+2173+537e5cb5
SLOF-20171214-4.gitfa98132.module+el8+2179+85112f94

Guest:
RHEL-8.0-20181105.1
kernel-4.18.0-37.el8


How reproducible:
5/5

Steps to Reproduce:
1. Boot the guest with the following command:

[root@ibm-p9wr-09 home]# cat boot.sh
/usr/libexec/qemu-kvm \
-M rhel6.2.0 \
-enable-kvm \
-m 2048 \
-smp 2 \
-uuid b5d1b5f3-6372-4c30-a080-ebb96fb23c49 \
-rtc base=utc,clock=host,driftfix=slew \
-boot menu=on \
-machine  pseries-rhel7.6.0 \
-blockdev node-name=disk2,file.driver=file,driver=qcow2,file.driver=file,file.filename=/home/data1.qcow2 \
-device virtio-blk-pci,drive=disk2,id=virt0-0-0,bootindex=2 \
-netdev tap,id=hostnet1 \
-device virtio-net-pci,netdev=hostnet1,id=net1,mac=64:31:50:41:e1:44 \
-device nec-usb-xhci,id=controller,bus=pci.0,addr=07 \
-device usb-hub,id=usbhub,bus=controller.0,port=1 \
-device usb-mouse,id=usbmouse,port=1.1 \
-device usb-kbd,id=usbkbd,port=1.2 \
-device usb-tablet,id=usbtablet,port=1.3 \
-device virtio-scsi-pci,id=scsi0,bus=pci.0 \
-blockdev node-name=disk3,file.driver=file,driver=raw,file.driver=file,file.filename=/home/RHEL-8.0-20181105.1-ppc64le-dvd1.iso \
-device scsi-cd,id=cd1,drive=disk3,bus=scsi0.0,bootindex=1 \
-device qemu-xhci,id=ehci \
-blockdev node-name=disk1,file.driver=file,driver=qcow2,file.driver=file,file.filename=/home/rhel81.qcow2 \
-device usb-storage,drive=disk1,id=virt0-0-1,bus=ehci.0,bootindex=0 \
-monitor stdio \
-vnc :2
2. The guest tries to boot from the usb-storage disk.

Actual results:
The guest fails to boot from the usb-storage disk. For the error, see the attachment.

Expected results:
The guest boots normally from the usb-storage disk.
Additional info:
1. On the POWER9 host, the guest cannot boot with any of the nec-usb-xhci, qemu-xhci, or ehci controllers. Also, after installing the guest onto the usb-storage disk and clicking reboot in the GUI, the guest fails to restart with the same error. Please help confirm it.

2. This bug is reported for tracking, following https://bugzilla.redhat.com/show_bug.cgi?id=1633060#c3.

Comment 1 Gerd Hoffmann 2018-11-28 09:41:34 UTC
fails with a SLOF error msg, so setting arch to powerpc.

Comment 2 Laurent Vivier 2018-11-28 10:16:27 UTC
BZ 1491988 describes a similar problem which prevents booting the RHEL8 install CD. The error is different because the SLOF driver is different, but the cause may be the same.

Could you try the same disk with another SCSI interface, like virtio-scsi, rather than the XHCI?

Comment 4 Minjia Cai 2018-11-29 01:57:24 UTC
(In reply to Laurent Vivier from comment #2)
> BZ 1491988 describes a similar problem which prevents to boot RHEL8 install
> CD. The error is different because the SLOF driver is different but the
> cause can be the same.
> 
> Could you try the same disk with another SCSI interface, like virtio-scsi,
> rather than the XHCI?

With the same disk attached through the virtio-scsi or virtio-blk-pci driver, the guest boots.

Comment 5 Laurent Vivier 2018-12-04 13:34:21 UTC
Reproduced on P8, with latest qemu (118caff) and latest SLOF (0198ba7)

Comment 6 Laurent Vivier 2018-12-04 14:35:00 UTC
Bisected SLOF to a commit enabling the write operation to a disk:

commit a0b96fe66fcd991b407c1d67ca842921e477a6fd
Author: Thomas Huth <thuth>
Date:   Tue Nov 15 14:02:52 2016 +0100

    Provide "write" function in the disk-label package
    
    As with the "read" function, the disk-label package should
    forward the "write" function to its parent.

Comment 7 Laurent Vivier 2018-12-04 15:41:56 UTC
I've added some traces. It seems the disk receives the command and executes it without problems:

usb-msd: Command on LUN 0
usb-msd: Command tag 0x115 flags 00000000 len 10 data 1024
 0x00 0x00 0x18 0x28 0x78 0x00 0x00 0x02 0x00
scsi-disk: Command: lun=0 tag=0x115 data=0x2a
[virt0-0-1.0 id=0] WRITE_10 0x00 0x00 0x18 0x28 0x78 0x00 0x00 0x02 0x00 - to-dev len=1024
scsi-disk: Write (sector 1583224, count 2)
scsi-disk: Write complete tag=0x115 more=1024

But then we have a transfer error:

1328:usb_xhci_xfer_success 0x3fff90004a10: len 31
1328:usb_xhci_queue_event v 0, idx 111, ER_TRANSFER, CC_SUCCESS, p 0x000000000001c040, s 0x01000000, c 0x01048000
1328:usb_xhci_runtime_write off 0x0038, val 0x00003700
1328:usb_xhci_runtime_write off 0x003c, val 0x00000000
1328:usb_xhci_doorbell_write off 0x0004, val 0x00000003
1328:usb_xhci_ep_kick slotid 1, epid 3, streamid 0
1328:usb_xhci_fetch_trb addr 0x000000000001b070, TR_NORMAL, p 0x0000000000009000, s 0x00000400, c 0x00000421
1328:usb_xhci_xfer_start 0x3fff90004a10: slotid 1, epid 3, streamid 0
1328:usb_xhci_xfer_error 0x3fff90004a10: ret -3
1328:usb_xhci_queue_event v 0, idx 112, ER_TRANSFER, CC_STALL_ERROR, p 0x000000000001b070, s 0x06000400, c 0x01038000
1328:usb_xhci_ep_state slotid 1, epid 3, running -> halted
1328:usb_xhci_runtime_write off 0x0038, val 0x00003710
1328:usb_xhci_runtime_write off 0x003c, val 0x00000000
1328:usb_xhci_doorbell_write off 0x0004, val 0x00000003
1328:usb_xhci_ep_kick slotid 1, epid 3, streamid 0

This is reported by SLOF:

USB-DISK: Bulk commad failed!
SCSI-DISK: /pci@800000020000000/usb@0/storage@1/disk@101000000000000:0,write-blocks failed
SCSI-DISK: Status -1 [UNKNOWN]

and then GRUB:

error: ../../grub-core/disk/ieee1275/ofdisk.c:609:failure writing sector
0x182878 to `ieee1275/disk'.
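
The traces above can be cross-checked by decoding the SCSI command by hand. The following is a minimal Python sketch, not part of SLOF or QEMU: it assumes the standard 10-byte WRITE(10) layout, takes the opcode 0x2a from the "data=0x2a" field of the scsi-disk trace and the remaining nine bytes from the usb-msd trace, and assumes 512-byte blocks. The function name decode_write10 is hypothetical.

```python
import struct

# Opcode 0x2a comes from "data=0x2a" in the scsi-disk trace; the other
# nine bytes are the ones printed by usb-msd.
cdb = bytes([0x2A, 0x00, 0x00, 0x18, 0x28, 0x78, 0x00, 0x00, 0x02, 0x00])

BLOCK_SIZE = 512  # assumption: 512-byte logical blocks


def decode_write10(cdb: bytes):
    """Return (lba, block_count) from a 10-byte WRITE(10) CDB."""
    if len(cdb) != 10 or cdb[0] != 0x2A:
        raise ValueError("not a WRITE(10) CDB")
    # Layout (big-endian): opcode, flags, 4-byte LBA, group number,
    # 2-byte transfer length in blocks, control byte.
    _, _, lba, _, count, _ = struct.unpack(">BBIBHB", cdb)
    return lba, count


lba, count = decode_write10(cdb)
print(hex(lba), lba, count, count * BLOCK_SIZE)
```

Under these assumptions the decoded LBA is the same sector that GRUB reports as failing, and the byte count matches the "data 1024" field of the usb-msd trace.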

Comment 8 Laurent Vivier 2018-12-04 17:44:29 UTC
Thomas,

do you know if slof/fs/usb/dev-storage.fs has write support?

It's not clear, but it seems execute-scsi-command and do-bulk-command never send the data to write to the storage, only the command.
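
For context, a write over the USB mass-storage Bulk-Only Transport normally has three phases: a 31-byte command wrapper (CBW) sent to the device, a data-out phase carrying the payload, and a 13-byte status wrapper (CSW) read back. The Python sketch below only illustrates that sequence; it is not the SLOF code, and the helper names build_cbw and transfer_plan are hypothetical. The tag, flags, and lengths are taken from the usb-msd trace in comment 7.

```python
import struct

CBW_SIGNATURE = 0x43425355  # reads "USBC" when stored little-endian
CBW_LENGTH = 31             # size of the command block wrapper
CSW_LENGTH = 13             # size of the command status wrapper


def build_cbw(tag, data_len, direction_in, lun, cdb):
    """Pack a Bulk-Only Transport command block wrapper."""
    if not 1 <= len(cdb) <= 16:
        raise ValueError("CDB must be 1..16 bytes")
    flags = 0x80 if direction_in else 0x00  # bit 7 set means device-to-host
    # The 16-byte command field is zero-padded by struct for short CDBs.
    return struct.pack("<IIIBBB16s", CBW_SIGNATURE, tag, data_len,
                       flags, lun, len(cdb), cdb)


def transfer_plan(data_len, direction_in):
    """List the bulk transfers (direction, length) expected for one command."""
    plan = [("OUT", CBW_LENGTH)]
    if data_len:
        plan.append(("IN" if direction_in else "OUT", data_len))
    plan.append(("IN", CSW_LENGTH))
    return plan


# Values from the trace: tag 0x115, flags 0, command length 10, data 1024.
write10 = bytes([0x2A, 0x00, 0x00, 0x18, 0x28, 0x78, 0x00, 0x00, 0x02, 0x00])
cbw = build_cbw(tag=0x115, data_len=1024, direction_in=False, lun=0, cdb=write10)
print(len(cbw), transfer_plan(1024, direction_in=False))
```

The 31-byte length of the packed wrapper appears consistent with the "len 31" transfer in the xHCI trace. If the data-out phase is skipped, the device is still waiting for the payload when the host moves on, which would fit the stall seen afterwards; this is an interpretation, not something the traces prove on their own.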

Comment 9 Thomas Huth 2018-12-05 05:24:33 UTC
As far as I know, we only enabled/checked write-support for virtio-scsi and virtio-block devices, so I assume that there is no proper write support for usb storage yet.

Comment 10 David Gibson 2018-12-05 05:56:33 UTC
Thomas, sorry, it's not obvious to me why write support is necessary for this case.

Comment 11 Laurent Vivier 2018-12-05 08:19:26 UTC
(In reply to David Gibson from comment #10)
> Thomas, sorry, it's not obvious to me why write support is necessary for
> this case.

It seems grub tries to write something to the disk.

Comment 12 Thomas Huth 2018-12-05 08:27:41 UTC
Yes, recent versions of Grub want to write to the disk in certain cases. See:

https://lists.ozlabs.org/pipermail/slof/2016-November/001374.html

Comment 13 Laurent Vivier 2018-12-05 12:51:26 UTC
I have a simple patch that avoids the boot failure with a USB disk:

diff --git a/slof/fs/usb/dev-storage.fs b/slof/fs/usb/dev-storage.fs
index 94f8421..61c7917 100644
--- a/slof/fs/usb/dev-storage.fs
+++ b/slof/fs/usb/dev-storage.fs
@@ -174,6 +174,11 @@ CONSTANT cbw-length
     \ Cleanup virtio request and response
     to usb-cmd-len to usb-cmd-addr to usb-dir to usb-buf-len to usb-buf-addr
 
+    usb-dir not usb-buf-len and IF
+        ." USB-DISK: Write command not supported " cr
+        0 0 -1 EXIT
+    THEN
+
     dma-buf usb>cmd 40 0 fill
     dma-buf usb>csw 20 0 fill

Comment 14 Laurent Vivier 2018-12-05 14:31:33 UTC
(In reply to Laurent Vivier from comment #13)
> I've simple patch that avoids to have a boot failure with USB disk

A test package can be found at
http://people.redhat.com/~lvivier/BZ1654196/SLOF-20171214-4.gitfa98132.el8.test.noarch.rpm

Comment 15 Laurent Vivier 2018-12-05 19:35:02 UTC
Created attachment 1511891 [details]
Implement usb/storage write operation

I have tried to implement the missing write operation

Comment 16 Laurent Vivier 2018-12-05 19:38:33 UTC
(In reply to Laurent Vivier from comment #15)
> Created attachment 1511891 [details]
> Implement usb/storage write operation
> 
> I have tried to implement the missing write operation

A test package can be found at
http://people.redhat.com/~lvivier/BZ1654196/SLOF-20171214-4.gitfa98132.el8BZ1654196.noarch.rpm

Comment 17 Minjia Cai 2018-12-06 06:08:18 UTC
I tested the new SLOF package from comment 16, and the guest booted without any error messages:

Host information:
[root@ibm-p8-garrison-06 home]# rpm -qa|grep qemu
qemu-kvm-debuginfo-2.12.0-44.module+el8+2259+6d80f0a6.ppc64le
qemu-img-2.12.0-44.module+el8+2259+6d80f0a6.ppc64le
qemu-kvm-block-curl-2.12.0-44.module+el8+2259+6d80f0a6.ppc64le
qemu-kvm-2.12.0-44.module+el8+2259+6d80f0a6.ppc64le
[root@ibm-p8-garrison-06 home]# rpm -qa|grep SLOF
SLOF-20171214-4.gitfa98132.module+el8+2246+78080371.noarch
[root@ibm-p8-garrison-06 home]# uname -r
4.18.0-48.el8.ppc64le
1. Boot the guest.
2. Connect to the console, which shows error messages:
[root@ibm-p8-garrison-06 vmt]# nc -U /tmp/console0 
USB-DISK: Bulk commad failed!
USB-DISK: Bulk commad failed!
error: ../../grub-core/net/net.c:1548:disk `ieee1275/disk1,msdos2' not found.
USB-DISK: Bulk commad failed!
3. Update to the package available in comment 16:
[root@ibm-p8-garrison-06 home]# rpm -qa|grep SLOF
SLOF-20171214-4.gitfa98132.el8BZ1654196.noarch
4. Boot the guest with the same qemu command line:
[root@ibm-p8-garrison-06 home]# cat boot.sh
/usr/libexec/qemu-kvm \
-enable-kvm \
-m 2048 \
-smp 2 \
-uuid b5d1b5f3-6372-4c30-a080-ebb96fb23c49 \
-rtc base=utc,clock=host,driftfix=slew \
-boot menu=on \
-machine  pseries-rhel7.6.0 \
-blockdev node-name=disk2,file.driver=file,driver=qcow2,file.driver=file,file.filename=/home/data1.qcow2 \
-device virtio-blk-pci,drive=disk2,id=virt0-0-0,bootindex=2 \
-netdev tap,id=hostnet1 \
-device virtio-net-pci,netdev=hostnet1,id=net1,mac=64:31:50:41:e1:44 \
-device nec-usb-xhci,id=controller,bus=pci.0,addr=07 \
-device usb-hub,id=usbhub,bus=controller.0,port=1 \
-device usb-mouse,id=usbmouse,port=1.1 \
-device usb-kbd,id=usbkbd,port=1.2 \
-device usb-tablet,id=usbtablet,port=1.3 \
-device virtio-scsi-pci,id=scsi0,bus=pci.0 \
-blockdev node-name=disk3,file.driver=file,driver=raw,file.driver=file,file.filename=/home/RHEL-8.0-20181204.0-ppc64le-dvd1.iso \
-device scsi-cd,id=cd1,drive=disk3,bus=scsi0.0,bootindex=1 \
-device qemu-xhci,id=ehci \
-blockdev node-name=disk1,file.driver=file,driver=qcow2,file.driver=file,file.filename=/root/kar/vt_test_images/rhel80-ppc64le-virtio.qcow2 \
-device usb-storage,drive=disk1,id=virt0-0-1,bus=ehci.0,bootindex=0 \
-monitor stdio \
-chardev socket,id=serial_id_serial0,path=/tmp/console0,server,nowait \
-device spapr-vty,reg=0x30000000,chardev=serial_id_serial0 \
-device virtio-serial-pci,id=virtio_serial_pci0,bus=pci.0,addr=04 \
-chardev socket,path=/tmp/serial0,nowait,id=idQdLRHP,server \
-device virtserialport,id=idBu8FQH,name=vs,bus=virtio_serial_pci0.0,chardev=idQdLRHP \
-object rng-random,filename=/dev/random,id=passthrough-rOXjKxaC \
-device virtio-rng-pci,id=virtio-rng-pci-GVn8yzUA,rng=passthrough-rOXjKxaC,bus=pci.0,addr=05 \
-nodefaults 
5. The guest is up and running.

Comment 18 Laurent Vivier 2018-12-10 14:04:18 UTC
Patch sent upstream: https://patchwork.ozlabs.org/patch/1008807/

Comment 20 Laurent Vivier 2018-12-14 13:27:25 UTC
This is a regression relative to RHEL-7.3.
All other SLOF releases (7.4, 7.5, 7.6 ...) have the same bug, introduced by the series pointed out by Thomas in comment 12.
This happens only if we try to boot with grub2 from a USB storage device.

Comment 22 Laurent Vivier 2018-12-19 10:38:38 UTC
Justification for the exception request:
- System is unable to boot from a USB storage device
- This is a regression since RHEL 7.3

Comment 23 Laurent Vivier 2018-12-20 10:14:45 UTC
Patches are now merged upstream:

usb/storage: Invert the logic of the IF-statements
  https://github.com/aik/SLOF/commit/d10500a4e0378b7f02f63f78a97e3440805f1374

usb/storage: Implement block write support
  https://github.com/aik/SLOF/commit/7d72d327e231d8ae9f1e8bce9f20faeaa2278b24

Comment 25 Laurent Vivier 2018-12-20 12:14:31 UTC
(In reply to Laurent Vivier from comment #22)
> Justification for the exception request:
> - System is unable to boot from an USB storage
> - This is a regression since RHEL 7.3

It's a regression since RHEL 7.3 because in 7.3 the write operation was ignored and that case was handled correctly.

It's a regression since RHEL 7.6 because in 7.6 GRUB2 was not trying to write to the disk at boot and so was able to boot the system.

Comment 28 Danilo de Paula 2019-01-04 11:24:59 UTC
Fix included in SLOF-20171214-5.gitfa98132.module+el8+2616+396d822d

Comment 30 Minjia Cai 2019-01-07 06:27:23 UTC
Host information:
[root@ibm-p9b-04 home]# rpm -qa|grep SLOF
SLOF-20171214-4.gitfa98132.module+el8+2529+a9686a4d.noarch
[root@ibm-p9b-04 home]# rpm -qa|grep qemu
qemu-kvm-2.12.0-51.module+el8+2608+a17c4bfe.ppc64le
[root@ibm-p9b-04 home]# uname -a
Linux ibm-p9b-04.pnr.lab.eng.bos.redhat.com 4.18.0-57.el8.ppc64le #1 SMP Tue Dec 18 13:41:41 UTC 2018 ppc64le ppc64le ppc64le GNU/Linux

1. Boot the guest.
2. Connect to the console, which shows error messages:
Trying to load:  from: /pci@800000020000000/usb@3/storage@1/disk@101000000000000 ...   Successfully loaded
USB-DISK: Bulk commad failed!
SCSI-DISK: /pci@800000020000000/usb@3/storage@1/disk@101000000000000:0,write-blocks failed
SCSI-DISK: Status -1 [UNKNOWN]error: ../../grub-core/disk/ieee1275/ofdisk.c:609:failure writing sector
0x182a00 to `ieee1275/disk1'.
USB-DISK: Bulk commad failed!
USB-DISK: Bulk commad failed!
error: ../../grub-core/net/net.c:1548:disk `ieee1275/disk1,msdos2' not found.
USB-DISK: Bulk commad failed!
error: ../../grub-core/net/net.c:1548:disk `ieee1275/disk1,msdos2' not found.

3. Update the SLOF package:
[root@ibm-p9b-04 ~]# rpm -qa|grep SLOF
SLOF-20171214-5.gitfa98132.module+el8+2616+396d822d.noarch

4. Boot the guest with the same qemu command line:
[root@ibm-p9b-04 home]# cat usb.sh 
/usr/libexec/qemu-kvm \
-enable-kvm \
-m 2048 \
-smp 2 \
-uuid b5d1b5f3-6372-4c30-a080-ebb96fb23c49 \
-rtc base=utc,clock=host,driftfix=slew \
-boot menu=on \
-machine  pseries-rhel7.6.0 \
-blockdev node-name=disk2,file.driver=file,driver=qcow2,file.driver=file,file.filename=/home/data1.qcow2 \
-device virtio-blk-pci,drive=disk2,id=virt0-0-0,bootindex=2 \
-netdev tap,id=hostnet1 \
-device virtio-net-pci,netdev=hostnet1,id=net1,mac=64:31:50:41:e1:44 \
-device nec-usb-xhci,id=controller,bus=pci.0,addr=07 \
-device usb-hub,id=usbhub,bus=controller.0,port=1 \
-device usb-mouse,id=usbmouse,port=1.1 \
-device usb-kbd,id=usbkbd,port=1.2 \
-device usb-tablet,id=usbtablet,port=1.3 \
-device virtio-scsi-pci,id=scsi0,bus=pci.0 \
-blockdev node-name=disk3,file.driver=file,driver=raw,file.driver=file,file.filename=/home/RHEL-8.0-20181120.0-ppc64le-dvd1.iso   \
-device scsi-cd,id=cd1,drive=disk3,bus=scsi0.0,bootindex=1 \
-device qemu-xhci,id=ehci \
-blockdev node-name=disk1,file.driver=file,driver=qcow2,file.driver=file,file.filename=/home/rhel80-ppc64le-virtio-scsi.qcow2  \
-device usb-storage,drive=disk1,id=virt0-0-1,bus=ehci.0,bootindex=0 \
-monitor stdio \
-chardev socket,id=serial_id_serial0,path=/tmp/console0,server,nowait \
-device spapr-vty,reg=0x30000000,chardev=serial_id_serial0 \
-device virtio-serial-pci,id=virtio_serial_pci0,bus=pci.0,addr=04 \
-chardev socket,path=/tmp/serial0,nowait,id=idQdLRHP,server \
-device virtserialport,id=idBu8FQH,name=vs,bus=virtio_serial_pci0.0,chardev=idQdLRHP \
-object rng-random,filename=/dev/random,id=passthrough-rOXjKxaC \
-device virtio-rng-pci,id=virtio-rng-pci-GVn8yzUA,rng=passthrough-rOXjKxaC,bus=pci.0,addr=05 \
-nodefaults

5. The guest is up and running.

In summary, I think the status of this bug can be changed to verified.