Bug 1632833

Summary: scsi host device passthrough limits IO writes
Product: Red Hat Enterprise Linux 7 Reporter: joherr
Component: libvirt Assignee: Michal Privoznik <mprivozn>
Status: CLOSED ERRATA QA Contact: gaojianan <jgao>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 7.5 CC: coli, dyuan, eblake, hhan, jdenemar, jiyan, joherr, jsuchane, knoel, kwolf, lmen, lsurette, mtessun, mzhan, nsoffer, pbonzini, rbarry, salmy, srevivo, toneata, xuwei, xuzhang, ycui
Target Milestone: pre-dev-freeze Keywords: Upstream
Target Release: 7.6   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: libvirt-4.5.0-12.el7 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Clones: 1639670 Environment:
Last Closed: 2019-08-06 13:14:02 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Storage RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description        Flags
prepare_iscsi.sh   none
cleanup_iscsi.sh   none
code coverage      none

Description joherr 2018-09-25 15:55:07 UTC
Description of problem:
When a local SCSI device is passed through as a host device, writes to the SCSI disk are restricted.




Version-Release number of selected component (if applicable):
OS Version:        RHEL - 7.5 - 11.el7rhgs
Kernel Version:    3.10.0 - 862.11.6.el7.x86_64
KVM Version:       2.10.0 - 21.el7_5.4
LIBVIRT Version:   libvirt-3.9.0-14.el7_5.7
VDSM Version:      vdsm-4.20.39.1-1.el7ev







How reproducible:
Consistent


Steps to Reproduce:
1. Create a virtual machine.
2. Select Host Devices on the VM.
3. Select Add Device.
4. Select scsi for the capability.
5. Select the local SCSI disk to be passed through.

Actual results:
The VM cannot write to the passed-through SCSI disk.
The VM log shows this message:
    WARNING: Image format was not specified for '/dev/sg2' and probing guessed raw.
         Automatically detecting the format is dangerous for raw images, write operations on block 0 will be restricted.
         Specify the 'raw' format explicitly to remove the restrictions.




Expected results:
Able to write to the disk.
When a physical SCSI disk is passed through, I would expect the raw format to be specified for it automatically.


Additional info:

Comment 3 Paolo Bonzini 2018-10-03 12:02:50 UTC
This is a libvirt bug.  libvirt should add format=raw when building the "-drive" option corresponding to "-device scsi-generic".

Something like:

diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c
index b8437463b..44124e2c5 100644
--- a/src/qemu/qemu_command.c
+++ b/src/qemu/qemu_command.c
@@ -4770,7 +4770,7 @@ qemuBuildSCSIHostdevDrvStr(virDomainHostdevDefPtr dev,
     } else {
         if (!(source = qemuBuildSCSIHostHostdevDrvStr(dev)))
             goto error;
-        virBufferAsprintf(&buf, "file=/dev/%s,if=none", source);
+        virBufferAsprintf(&buf, "file=/dev/%s,if=none,format=raw", source);
     }
     VIR_FREE(source);
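For illustration, the -drive value that the patch above produces can be sketched as a standalone C helper. This is not libvirt code: libvirt builds the string with its own virBuffer API, and the function name build_scsi_hostdev_drive() and its parameters are hypothetical. The sketch assumes the sg node name (e.g. "sg2") and the drive alias are already known, and appends id= only to mirror the command lines quoted later in this bug.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the libvirt code in the diff above: write the
 * -drive value for a SCSI hostdev whose sg node is sg_name (e.g. "sg2").
 * Returns the length written, or -1 on bad input or a too-small buffer. */
static int build_scsi_hostdev_drive(char *buf, size_t size,
                                    const char *sg_name, const char *drive_id)
{
    int n;

    assert(buf != NULL && sg_name != NULL && drive_id != NULL);
    /* A bare comma would start a new option; reject rather than escape it. */
    if (strchr(sg_name, ',') != NULL || strchr(drive_id, ',') != NULL)
        return -1;
    /* format=raw tells QEMU not to probe, so no warning is printed. */
    n = snprintf(buf, size, "file=/dev/%s,if=none,format=raw,id=%s",
                 sg_name, drive_id);
    if (n < 0 || (size_t)n >= size)
        return -1;
    return n;
}
```

Called with "sg2" and an alias, it yields a string of the form file=/dev/sg2,if=none,format=raw,id=..., i.e. the same shape as the working command line in Comment 35.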

Comment 9 Nir Soffer 2018-10-04 20:07:36 UTC
I agree that this is a libvirt bug, but the way qemu fails is not helpful.

The flow is:
1. Starting and provisioning a VM from an ISO "works", with no error.
2. After reboot, the VM starts again from the ISO.
3. User pulls hair from head.
4. User finds the hidden warning in /var/log/libvirt/qemu/vm-name.log.

Can we change this to a hard failure that is easy to debug?

Comment 10 Kevin Wolf 2018-10-05 08:27:31 UTC
Nir, this behaviour is not what has been reported and I also don't think it would happen because of the missing format=raw. If you did see something like it, can you please share more details?

To expand on this, I wouldn't expect the restriction to take any effect in practice because I have never seen an actual boot sector that contains the magic of an image format at the "right" offset. This is the only case where QEMU would even try to restrict writes. And if QEMU refuses to write the magic of an image format to the disk, it doesn't fail silently, but makes this an I/O error that is visible to the guest.

On top of all of this, we have Paolo's observation that SCSI passthrough anyway bypasses the normal I/O path in QEMU, so despite the warning, QEMU won't actually refuse to write anything (because the block layer doesn't look at the SCSI commands it passes through).

So if there are actual problems writing to the disk rather than just the warning (what do they look like?), these are probably unrelated bugs and we need more details.
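Kevin's description of when QEMU restricts writes can be illustrated with a simplified C sketch. This is not QEMU's implementation: the function names are hypothetical, only the qcow2 magic ('Q' 'F' 'I' 0xfb) is checked, only writes that start at offset 0 are inspected, and returning -EPERM for a refused write is an assumption of the sketch. QEMU itself probes the data against all of its format drivers. Note also that, as Paolo observed, SCSI commands passed through via scsi-generic do not go through this path at all.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumption: the guarded region is the first 512-byte sector. */
#define PROBE_SECTOR_SIZE 512

/* qcow2 images begin with the four bytes 'Q' 'F' 'I' 0xfb. */
static const unsigned char QCOW2_MAGIC[4] = { 'Q', 'F', 'I', 0xfb };

/* Hypothetical helper: would this data, written at offset 0, make the
 * disk probe as qcow2 instead of raw next time? */
static bool looks_like_qcow2(const unsigned char *data, size_t len)
{
    return len >= sizeof QCOW2_MAGIC &&
           memcmp(data, QCOW2_MAGIC, sizeof QCOW2_MAGIC) == 0;
}

/* Returns 0 if the write may proceed, -EPERM if it is refused.
 * format_probed is false when format=raw was given explicitly. */
static int guard_probed_raw_write(bool format_probed, size_t offset,
                                  const unsigned char *data, size_t len)
{
    assert(data != NULL || len == 0);
    if (!format_probed)
        return 0;              /* explicit format=raw: no restriction */
    if (offset >= PROBE_SECTOR_SIZE || len == 0)
        return 0;              /* write does not touch block 0 */
    if (offset == 0 && looks_like_qcow2(data, len))
        return -EPERM;         /* guest would see an I/O error */
    return 0;                  /* e.g. an ordinary boot sector */
}
```

An ordinary boot sector (zeros ending in 0x55 0xAA) passes the check, which matches Kevin's expectation that the restriction rarely, if ever, triggers in practice.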

Comment 13 Jiri Denemark 2018-10-08 11:02:04 UTC
According to the comments in this BZ, there seems to be no evidence that the
guest could not write to the disk. Can you confirm this, or provide evidence that
QEMU really refused to write something?

This certainly is a libvirt error, but if it's just about the warning with no
functional effect, the priority would be significantly different.

Comment 15 joherr 2018-10-10 04:05:47 UTC
What evidence would you like? A log file?

Paolo provided me with a patch file that prevents the messages in the libvirt log for the vm. But I still see the same issue.


Here is what I did. I also attached the messages file from the guest. I did not see any mention in the hypervisor logs of issues with the device.

From the guest OS I partition the disk. This seems to work. The disk had no partition table on it prior to this point.

But when I try to place a filesystem on the new partition, I get the following.

[root@mlbocs3 ~]# mkfs.ext2 /dev/sdb1
mke2fs 1.42.9 (28-Dec-2013)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=0 blocks
61014016 inodes, 244055552 blocks
12202777 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=4294967296
7448 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks: 
	32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208, 
	4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968, 
	102400000, 214990848

Allocating group tables: done                            
Writing inode tables: done                            
Writing superblocks and filesystem accounting information:          
Warning, had trouble writing out superblocks.


As expected from the warning, I cannot mount the filesystem.

[root@mlbocs3 ~]# ls /mnt
[root@mlbocs3 ~]# mount /dev/sdb1 /mnt
mount: /dev/sdb1 is write-protected, mounting read-only
mount: unknown filesystem type '(null)'



From the hypervisor, I can see the partition.

[root@mlb3 vdsm]# parted /dev/sdc print
Model: DELL PERC H700 (scsi)
Disk /dev/sdc: 1000GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags: 

Number  Start   End     Size    File system  Name     Flags
 1      1049kB  1000GB  1000GB               primary

Comment 16 yisun 2018-10-10 07:04:50 UTC
(In reply to joherr from comment #15)
> What evidence would you like? A log file?

In your VM log, there are a lot of errors:
"Buffer I/O error on dev sdb1" and "blk_update_request: I/O error, dev sdb".

This reminds me of bz1492559, which shows quite similar behaviour. Not sure if it's a qemu-kvm issue.

But I am still not able to reproduce it in my env. I'm preparing a clean 7.5.z env to reproduce it. In the meantime, could you please help confirm the following:
1. Double-check that it's not a broken device (can the SCSI device be used on the host machine?)
2. What is the SCSI LUN's backend device: an iSCSI LUN, an FC LUN, or something else?
3. What does "multipath -ll" return on the host?
Thanks a lot.

Comment 17 Martin Tessun 2018-10-10 07:39:52 UTC
Hi John,

Just one question: could it be that the passed-through device is already read-only, so that qemu simply can't write to it because the host kernel already refuses to do so?

Looking at the following comments:
C#10:
To expand on this, I wouldn't expect the restriction to take any effect in practice because I have never seen an actual boot sector that contains the magic of an image format at the "right" offset. This is the only case where QEMU would even try to restrict writes. And if QEMU refuses to write the magic of an image format to the disk, it doesn't fail silently, but makes this an I/O error that is visible to the guest.

On top of all of this, we have Paolo's observation that SCSI passthrough anyway bypasses the normal I/O path in QEMU, so despite the warning, QEMU won't actually refuse to write anything (because the block layer doesn't look at the SCSI commands it passes through).

C#12
<Test with the same reported error, but still the writes do work>

Both of these comments suggest an issue with the storage itself: for some reason the host kernel may already be unable to write to the device (or qemu may only have read-only access to the device file).

So could you elaborate on the following:
1. What type of storage is the device you are using (iSCSI, FC, etc.)
2. What are the permissions of the device file (/dev/sg...)
3. Can you write to the device file from the host (dd if=/dev/zero of=/dev/sg... bs=8M count=128 or something like this).
4. What does multipath -ll show? (E.g. is /dev/sg... or one of its corresponding scsi devices part of a multipath map).

Thanks!
Martin

Comment 18 joherr 2018-10-10 14:45:54 UTC
So could you elaborate on the following:
1. What type of storage is the device you are using (iSCSI, FC, etc.)
     This is a local hardware RAID1 disk.


2. What are the permissions of the device file (/dev/sg...)
     [root@mlb3 vdsm]# ls -l /dev/sg2
     crw-rw----. 1 root disk 21, 2 Oct 10 02:54 /dev/sg2


3. Can you write to the device file from the host (dd if=/dev/zero of=/dev/sg... bs=8M count=128 or something like this).
     I had tried putting a partition on the device, sdc, and creating a filesystem and that worked fine.
     However, I just tried your command using /dev/sg2, which maps to sdc, and it failed.
         [root@mlb3 vdsm]# dd if=/dev/zero of=/dev/sg2 bs=8M count=128
         dd: error writing ‘/dev/sg2’: Cannot allocate memory
         1+0 records in
         0+0 records out
         0 bytes (0 B) copied, 0.00390516 s, 0.0 kB/s

     But the same command using sdc works.
         [root@mlb3 vdsm]# dd if=/dev/zero of=/dev/sdc bs=8M count=128
         128+0 records in
         128+0 records out
         1073741824 bytes (1.1 GB) copied, 3.87749 s, 277 MB/s

     If I change the block size or remove it altogether, I get a different failure:
         [root@mlb3 vdsm]# dd if=/dev/zero of=/dev/sg2 bs=512 count=128
         dd: error writing ‘/dev/sg2’: Numerical argument out of domain
         2+0 records in
         1+0 records out
         512 bytes (512 B) copied, 0.000352753 s, 1.5 MB/s


4. What does multipath -ll show? (E.g. is /dev/sg... or one of its corresponding scsi devices part of a multipath map).
     the command returns nothing

Comment 21 Martin Tessun 2018-10-11 12:33:55 UTC
(In reply to joherr from comment #18)
> So could you elaborate on the following:
> 1. What type of storage is the device you are using (iSCSI, FC, etc.)
>      This is a local hardware RAID1 disk.
> 
> 
> 2. What are the permissions of the device file (/dev/sg...)
>      [root@mlb3 vdsm]# ls -l /dev/sg2
>      crw-rw----. 1 root disk 21, 2 Oct 10 02:54 /dev/sg2
> 
> 
> 3. Can you write to the device file from the host (dd if=/dev/zero
> of=/dev/sg... bs=8M count=128 or something like this).
>      I had tried putting a partition on the device, sdc, and creating a
> filesystem and that worked fine.
>      However, I just tried your command using /dev/sg2, which maps to sdc,
> and it failed.
>          [root@mlb3 vdsm]# dd if=/dev/zero of=/dev/sg2 bs=8M count=128
>          dd: error writing ‘/dev/sg2’: Cannot allocate memory
>          1+0 records in
>          0+0 records out
>          0 bytes (0 B) copied, 0.00390516 s, 0.0 kB/s
> 
>      But the same command using sdc works.
>          [root@mlb3 vdsm]# dd if=/dev/zero of=/dev/sdc bs=8M count=128
>          128+0 records in
>          128+0 records out
>          1073741824 bytes (1.1 GB) copied, 3.87749 s, 277 MB/s
> 
>      If I change the block size or remove it altogether, I get a different
> failure:
>          [root@mlb3 vdsm]# dd if=/dev/zero of=/dev/sg2 bs=512 count=128
>          dd: error writing ‘/dev/sg2’: Numerical argument out of domain
>          2+0 records in
>          1+0 records out
>          512 bytes (512 B) copied, 0.000352753 s, 1.5 MB/s

Checking this: /dev/sg* devices are character devices, not block devices, so I would expect this to fail.

The qemu command line shows that it uses /dev/sg2:
-drive file=/dev/sg2,if=none,id=drive-ua-e286ef86-f1f8-41c4-857a-06f80ee8eecd -device scsi-generic,bus=ua-a1b49165-4f24-4d17-bd00-5fdb3471cfe0.0,channel=0,scsi-id=0,lun=1,drive=drive-ua-e286ef86-f1f8-41c4-857a-06f80ee8eecd,id=ua-e286ef86-f1f8-41c4-857a-06f80ee8eecd
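The -drive value above contains no format= key, which is what makes QEMU probe the image and print the warning. A rough C sketch for checking a -drive value for that key follows; drive_has_format() is hypothetical, and it deliberately ignores QEMU's ",," escape for literal commas inside option values.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: does a QEMU -drive value such as
 * "file=/dev/sg2,if=none,id=..." contain a top-level "format=" key?
 * Simplification: QEMU's ",," escape for a literal comma is ignored. */
static bool drive_has_format(const char *drive_opts)
{
    const char *p = drive_opts;

    assert(drive_opts != NULL);
    for (;;) {
        if (strncmp(p, "format=", 7) == 0)
            return true;
        const char *comma = strchr(p, ',');
        if (comma == NULL)
            return false;
        p = comma + 1;   /* move on to the next key=value pair */
    }
}
```

Checking only at the start of each comma-separated pair avoids false positives from a path that happens to contain the text "format=".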

When I do the same with my config (the following libvirt XML and an iSCSI disk):

    <hostdev mode='subsystem' type='scsi' managed='no' rawio='yes'>
      <source>
        <adapter name='scsi_host2'/>
        <address bus='0' target='0' unit='10'/>
      </source>
      <alias name='ua-eb183513-1f29-4f00-9664-840d6e8371f1'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </hostdev>

The qemu command line looks similar to yours:
-drive file=/dev/sg2,if=none,id=drive-ua-eb183513-1f29-4f00-9664-840d6e8371f1 -device scsi-generic,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-ua-eb183513-1f29-4f00-9664-840d6e8371f1,id=ua-eb183513-1f29-4f00-9664-840d6e8371f1


Within the VM, I can see that iSCSI target:
[root@rhel7 ~]# lsscsi 
[0:0:0:0]    disk    LIO-ORG  shared01         4.0   /dev/sda 
[0:0:0:1]    cd/dvd  QEMU     QEMU CD-ROM      2.5+  /dev/sr0 
[0:0:0:2]    disk    QEMU     QEMU HARDDISK    2.5+  /dev/sdb 
[root@rhel7 ~]# 

Doing the dd on /dev/sda works as well:
[root@rhel7 ~]# dd if=/dev/zero of=/dev/sda bs=8M count=1024
1024+0 records in
1024+0 records out
8589934592 bytes (8.6 GB) copied, 53.4567 s, 161 MB/s
[root@rhel7 ~]# 

The libvirt and qemu logs show:
2018-10-11T12:17:32.989530Z qemu-kvm: -chardev pty,id=charserial0: char device redirected to /dev/pts/3 (label charserial0)
WARNING: Image format was not specified for '/dev/sg2' and probing guessed raw.
         Automatically detecting the format is dangerous for raw images, write operations on block 0 will be restricted.
         Specify the 'raw' format explicitly to remove the restrictions.

Running dd on /dev/sg2 fails with similar errors because it is a character device, so I would expect dd to fail there (sorry for the confusion).

So overall, I would say this is the same setup as yours, except that mine works, as liyun's does.

Maybe you could add a sosreport from the machine (or a complete log-collector) for further analysis. At least the device passthrough itself does not seem to be causing the issue.

Comment 22 Martin Tessun 2018-10-11 15:32:13 UTC
So I did the same test on my local RHEV environment.

Passthrough SCSI device works.
vdsm snippet:
2018-10-11 17:14:18,641+0200 INFO  (vm/923e4013) [virt.vm] (vmId='923e4013-de22-4f25-b92f-188d0a99b2b6') <?xml version="1.0" encoding="utf-8"?><domain type="kvm" xmlns:ns0="http://ovirt.org/vm/tune/1.0" xmlns:ovirt-vm="http://ovirt.org/vm/1.0">
[...]
        <hostdev managed="no" mode="subsystem" rawio="yes" type="scsi">
            <source>
                <adapter name="scsi_host22"/>
                <address bus="0" target="0" unit="1"/>
            </source>
            <alias name="ua-2d638242-ea41-4b93-8869-4f8559af1223"/>
        </hostdev>
[...]

qemu-commandline (excerpt of the disk):
-drive file=/dev/sg11,if=none,id=drive-ua-2d638242-ea41-4b93-8869-4f8559af1223 -device scsi-generic,bus=ua-372c33aa-7159-42ae-8072-154940ddab63.0,channel=0,scsi-id=0,lun=0,drive=drive-ua-2d638242-ea41-4b93-8869-4f8559af1223,id=ua-2d638242-ea41-4b93-8869-4f8559af1223

So the setup should be similar to the one from John.

Looking into the VM:
[root@mtessun-rhel73 ~]# lsscsi 
[1:0:0:0]    cd/dvd  QEMU     QEMU DVD-ROM     2.5+  /dev/sr0 
[2:0:0:0]    disk    NETAPP   LUN C-Mode       9000  /dev/sda 
[root@mtessun-rhel73 ~]# 

So the disk is there (/dev/sda)

Trying to write:
[root@mtessun-rhel73 ~]# dd if=/dev/zero of=/dev/sda bs=8M count=19
19+0 records in
19+0 records out
159383552 bytes (159 MB) copied, 1.41043 s, 113 MB/s
[root@mtessun-rhel73 ~]# 

Also works.
Trying to put a label on it, create a primary partition and format it with ext4:
[root@mtessun-rhel73 ~]# mkfs.ext4 /dev/sda1 
mke2fs 1.42.9 (28-Dec-2013)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=16 blocks
6553600 inodes, 26214144 blocks
1310707 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=2174746624
800 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks: 
	32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208, 
	4096000, 7962624, 11239424, 20480000, 23887872

Allocating group tables: done                            
Writing inode tables: done                            
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done   

[root@mtessun-rhel73 ~]# 

So this works as well.

Following Nir's suggestion, I also rebooted the VM and tried the same again. This works as well:

[root@mtessun-rhel73 ~]# lsscsi 
[1:0:0:0]    cd/dvd  QEMU     QEMU DVD-ROM     2.5+  /dev/sr0 
[2:0:0:0]    disk    NETAPP   LUN C-Mode       9000  /dev/sda 
[root@mtessun-rhel73 ~]# mount /dev/sda1 /mnt
[root@mtessun-rhel73 ~]# cd /mnt
[root@mtessun-rhel73 mnt]# dd if=/dev/urandom of=file.txt bs=8M count=10
10+0 records in
10+0 records out
83886080 bytes (84 MB) copied, 0.46445 s, 181 MB/s
[root@mtessun-rhel73 mnt]# cd /
[root@mtessun-rhel73 /]# umount /mnt
[root@mtessun-rhel73 /]# mount /dev/sda1 /mnt
[root@mtessun-rhel73 /]# cd /mnt
[root@mtessun-rhel73 mnt]# od file.txt | head -1
0000000 041346 076745 013046 073344 160706 142252 072777 003563
[root@mtessun-rhel73 mnt]# 


Finally checking the RHV Hypervisor kernel/vdsm/qemu versions:
[root@inf4 ~]# rpm -q -a vdsm* qemu* kernel* libvirt*
[...shortened to the relevant rpms...]
vdsm-4.20.39.1-1.el7ev.x86_64
libvirt-3.9.0-14.el7_5.8.x86_64
qemu-kvm-rhev-2.10.0-21.el7_5.4.x86_64
kernel-3.10.0-862.14.4.el7.x86_64
[root@inf4 ~]# 

As already mentioned in C#21, could you provide a sosreport/log-collector for further analysis? This is not reproducible, neither on plain qemu-kvm nor on RHV.

Comment 23 yisun 2018-10-12 02:55:27 UTC
Created attachment 1493108 [details]
prepare_iscsi.sh

Comment 24 yisun 2018-10-12 02:56:09 UTC
Created attachment 1493109 [details]
cleanup_iscsi.sh

Comment 25 yisun 2018-10-12 03:02:36 UTC
@John,
I attached two scripts to prepare and clean up a local iSCSI LUN. Maybe you can try them to see whether an iSCSI LUN works in your environment. Thanks.

1. # sh prepare_iscsi.sh
2. # lsscsi -g
...
[25:0:0:0]   disk    LIO-ORG  device.logical-  4.0   /dev/sdj   /dev/sg6 
[26:0:0:0]   disk    LIO-ORG  device.logical-  4.0   /dev/sdi   /dev/sg5 
<==== now the host has logged in to the iSCSI LUNs; you can pass them to the VM, e.g. 25:0:0:0
3. Test if the passed through device can be used in vm
4. # sh cleanup_iscsi.sh
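The [H:C:T:L] address printed by lsscsi maps onto the <source> of a SCSI <hostdev>: host N becomes adapter scsi_hostN, and channel, target and LUN become bus, target and unit, as in Comment 49 where [5:0:0:0] appears as scsi_host5 with bus='0' target='0' unit='0'. The C sketch below produces that fragment; hostdev_source_from_lsscsi() is a hypothetical helper, and it emits only the <source> element, not the surrounding <hostdev mode='subsystem' type='scsi'> wrapper.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: turn an lsscsi address "[H:C:T:L]" into the
 * <source> element of a libvirt SCSI <hostdev>. Returns the length
 * written, or -1 on a parse error or if out is too small. */
static int hostdev_source_from_lsscsi(const char *addr, char *out, size_t size)
{
    unsigned host, channel, target, lun;
    int n;

    assert(addr != NULL && out != NULL);
    /* sscanf does not verify the closing bracket, so check it here. */
    if (strchr(addr, ']') == NULL)
        return -1;
    if (sscanf(addr, "[%u:%u:%u:%u]", &host, &channel, &target, &lun) != 4)
        return -1;
    n = snprintf(out, size,
                 "<source>\n"
                 "  <adapter name='scsi_host%u'/>\n"
                 "  <address bus='%u' target='%u' unit='%u'/>\n"
                 "</source>\n",
                 host, channel, target, lun);
    if (n < 0 || (size_t)n >= size)
        return -1;
    return n;
}
```

For the LUN suggested above, "[25:0:0:0]" yields adapter scsi_host25 with bus, target and unit all 0.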

Comment 26 Michal Privoznik 2018-10-12 11:15:40 UTC
I too am unable to reproduce. I can see the warning, but the disk from inside the VM is perfectly writable.

Anyway, I've posted a patch that Paolo suggests:

https://www.redhat.com/archives/libvir-list/2018-October/msg00740.html

Comment 27 joherr 2018-10-12 15:05:18 UTC
I tested a patched set of libvirt executables Paolo provided me. The message in the libvirt logs went away and I can see the format=raw added to the command.

But I am still encountering the issue.

Comment 28 Michal Privoznik 2018-10-12 15:08:43 UTC
(In reply to joherr from comment #27)

> 
> But I am still encountering the issue.

Therefore I'd like to revisit the decision that this is a libvirt bug. Now that libvirt is generating the correct command line, what would suggest that this is still a libvirt bug?

Comment 29 Martin Tessun 2018-10-12 15:27:42 UTC
Hi Jon,

(In reply to joherr from comment #27)
> I tested a patched set of libvirt executables Paolo provided me. The message
> in the libvirt logs went away and I can see the format=raw added to the
> command.
> 
> But I am still encountering the issue.

could you please provide the logcollector, including all sosreports from the hypervisor?

Thanks!
Martin

Comment 31 Paolo Bonzini 2018-10-13 09:58:53 UTC
What is the version of QEMU?

Comment 32 Martin Tessun 2018-10-15 08:51:49 UTC
(In reply to Paolo Bonzini from comment #31)
> What is the version of QEMU?

See Comment #0 (as KVM version):
KVM Version:       2.10.0 - 21.el7_5.4
So it is qemu-kvm-rhev-2.10.0-21.el7_5.4

Comment 35 Xueqiang Wei 2018-10-16 06:08:55 UTC
Tested on RHEL 7.5; did not hit this issue.

Details:

Host:
Kernel-3.10.0-862.14.3.el7.x86_64
qemu-kvm-rhev-2.10.0-21.el7_5.7
libvirt-3.9.0-14.el7.x86_64

Guest:
Kernel-3.10.0-862.el7.x86_64


1. On the host, log in to an iSCSI server to get "/dev/sdd"

# lsscsi -g
[0:0:0:0]    disk    ATA      ST1000NX0423     LE43  /dev/sda   /dev/sg0 
[9:0:0:0]    disk    LIO-ORG  stor0            4.0   /dev/sdb   /dev/sg1 
[10:0:0:0]   disk    LIO-ORG  stor0            4.0   /dev/sdc   /dev/sg2 
[11:0:0:0]   disk    LIO-ORG  stor1            4.0   /dev/sdd   /dev/sg3

2. pass-through "/dev/sg3"

# cat test.sh

/usr/libexec/qemu-kvm \
    -S  \
    -name 'avocado-vt-vm1'  \
    -sandbox off  \
    -machine pc  \
    -nodefaults \
    -device VGA,bus=pci.0,addr=0x2  \
    -chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/avocado_nCcZP2/monitor-qmpmonitor1-20180912-022419-cHHd8c59,server,nowait \
    -mon chardev=qmp_id_qmpmonitor1,mode=control  \
    -chardev socket,id=qmp_id_catch_monitor,path=/var/tmp/avocado_nCcZP2/monitor-catch_monitor-20180912-022419-cHHd8c59,server,nowait \
    -mon chardev=qmp_id_catch_monitor,mode=control \
    -device pvpanic,ioport=0x505,id=idQpDEaT  \
    -chardev socket,id=serial_id_serial0,path=/var/tmp/avocado_nCcZP2/serial-serial0-20180912-022419-cHHd8c59,server,nowait \
    -device isa-serial,chardev=serial_id_serial0  \
    -chardev socket,id=seabioslog_id_20180912-022419-cHHd8c59,path=/var/tmp/avocado_nCcZP2/seabios-20180912-022419-cHHd8c59,server,nowait \
    -device isa-debugcon,chardev=seabioslog_id_20180912-022419-cHHd8c59,iobase=0x402 \
    -device nec-usb-xhci,id=usb1,bus=pci.0,addr=0x3 \
    -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=0x4 \
    -drive id=drive_image1,if=none,cache=none,format=qcow2,file=/home/kvm_autotest_root/images/rhel75-64-virtio-scsi.qcow2 \
    -device scsi-hd,id=image1,drive=drive_image1,bootindex=0 \
    -device virtio-net-pci,mac=9a:ad:ae:af:b0:b1,id=id942Wof,vectors=4,netdev=idirzdj4,bus=pci.0,addr=0x5  \
    -netdev tap,id=idirzdj4,vhost=on \
    -m 2G  \
    -smp 4,maxcpus=4,cores=2,threads=1,sockets=2  \
    -cpu 'SandyBridge',+kvm_pv_unhalt \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1  \
    -vnc :0  \
    -rtc base=utc,clock=host,driftfix=slew  \
    -boot menu=off,strict=off,order=cdn,once=d  \
    -enable-kvm \
    -monitor stdio \
    -qmp tcp:0:4444,server,nowait \
    -device virtio-scsi-pci,id=scsi1,bus=pci.0,addr=0x6 \
    -drive file=/dev/sg3,if=none,format=raw,snapshot=off,aio=threads,cache=writethrough,id=drive-ua-2d638242-ea41-4b93-8869-4f8559af1223 \
    -device scsi-generic,bus=scsi1.0,channel=0,scsi-id=0,lun=0,drive=drive-ua-2d638242-ea41-4b93-8869-4f8559af1223,id=ua-2d638242-ea41-4b93-8869-4f8559af1223 \

3. List disks in the guest

# lsblk
NAME                             MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda                                8:0    0  20G  0 disk 
├─sda1                             8:1    0   1G  0 part /boot
└─sda2                             8:2    0  19G  0 part 
  ├─rhel_bootp--73--225--42-root 253:0    0  17G  0 lvm  /
  └─rhel_bootp--73--225--42-swap 253:1    0   2G  0 lvm  [SWAP]
sdb                                8:16   0  20G  0 disk

4. dd test on sdb

# dd if=/dev/zero of=/dev/sdb bs=1M count=1000
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 10.9221 s, 96.0 MB/s

# dmesg |grep error
# dmesg |grep warning

5. Format the disk and mount it.

# parted /dev/sdb 
GNU Parted 3.1
Using /dev/sdb
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) mklabel gpt          
(parted) mkpart primary 0 10G
Warning: The resulting partition is not properly aligned for best performance.
Ignore/Cancel? Ignore                                                     
(parted) print                                                            
Model: LIO-ORG stor1 (scsi)
Disk /dev/sdb: 21.5GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags: 

Number  Start   End     Size     File system  Name     Flags
 1      17.4kB  10.0GB  10000MB               primary

(parted) quit                                                             
Information: You may need to update /etc/fstab.

# partprobe /dev/sdb
# mkfs.ext2 /dev/sdb1 
mke2fs 1.42.9 (28-Dec-2013)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=1024 blocks
610800 inodes, 2441402 blocks
122070 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=2503999488
75 block groups
32768 blocks per group, 32768 fragments per group
8144 inodes per group
Superblock backups stored on blocks: 
	32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632

Allocating group tables: done                            
Writing inode tables: done                            
Writing superblocks and filesystem accounting information: done 

# ls /mnt/
# mount /dev/sdb1  /mnt/
# dmesg |grep error

# dd if=/dev/zero of=/mnt/test bs=1M count=1000
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 8.71298 s, 120 MB/s

# dmesg |grep error



Without “format=raw”, QEMU only prints the warning below, and the disk also works well.

# sh test.sh 
WARNING: Image format was not specified for '/dev/sg3' and probing guessed raw.
         Automatically detecting the format is dangerous for raw images, write operations on block 0 will be restricted.
         Specify the 'raw' format explicitly to remove the restrictions.
QEMU 2.10.0 monitor - type 'help' for more information

Comment 36 CongLi 2018-10-16 06:42:56 UTC
Hi John,

QE would like to confirm more details of the disk. Could you please provide the output of 'smartctl -a /dev/sg2'?

Thanks.

Comment 37 joherr 2018-10-16 07:42:30 UTC
(In reply to CongLi from comment #36)
> Hi John,
> 
> QE would like to confirm more details of the disk, could you please help
> provide  the output of 'smartctl -a /dev/sg2'?
> 

This is what I get when running the command.

[root@mlb1 ~]# smartctl -a /dev/sg2
smartctl 6.5 2016-05-07 r4318 [x86_64-linux-3.10.0-862.14.4.el7.x86_64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

Smartctl open device: /dev/sg2 failed: DELL or MegaRaid controller, please try adding '-d megaraid,N'


> Thanks.

This is a RAID 1 array consisting of 2 disks. 


Here is the same command run against an individual drive. I think this drive makes up half of the RAID.

[root@mlb1 ~]# smartctl -a -d megaraid,5 /dev/sg2
smartctl 6.5 2016-05-07 r4318 [x86_64-linux-3.10.0-862.14.4.el7.x86_64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               SEAGATE
Product:              ST1000NM0023
Revision:             GS0D
Compliance:           SPC-4
User Capacity:        1,000,204,886,016 bytes [1.00 TB]
Logical block size:   512 bytes
LU is fully provisioned
Rotation Rate:        7200 rpm
Form Factor:          3.5 inches
Logical Unit id:      0x5000c50058afa8eb
Serial number:        Z1W24PXT
Device type:          disk
Transport protocol:   SAS (SPL-3)
Local Time is:        Tue Oct 16 07:41:50 2018 UTC
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Disabled or Not Supported

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK

Current Drive Temperature:     32 C
Drive Trip Temperature:        50 C

Manufactured in week 19 of year 2014
Specified cycle count over device lifetime:  10000
Accumulated start-stop cycles:  105
Specified load-unload count over device lifetime:  300000
Accumulated load-unload cycles:  1454
Elements in grown defect list: 0

Vendor (Seagate) cache information
  Blocks sent to initiator = 2288115026
  Blocks received from initiator = 3432937493
  Blocks read from cache and sent to initiator = 3747053890
  Number of read and write commands whose size <= segment size = 9156646
  Number of read and write commands whose size > segment size = 948

Vendor (Seagate/Hitachi) factory information
  number of hours powered up = 7735.83
  number of minutes until next internal SMART test = 2

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:   2965547375        0         0  2965547375          0       2601.263           0
write:         0        0         0         0          0       1781.714           0
verify: 1157123013        0         0  1157123013          0      25516.877           0

Non-medium error count:        6

No self-tests have been logged

Comment 38 joherr 2018-10-16 07:44:24 UTC
I was able to configure another set of servers of a different vendor.

I created a VM and performed a hostdev passthrough, and it appears to work on the second set of servers (at least on one of them).

Comment 40 Paolo Bonzini 2018-10-16 10:05:30 UTC
It's 2.10.  Reproduced and preparing a fix; I'll first clone the bug for qemu-kvm-rhev.

Comment 41 Michal Privoznik 2018-10-17 07:42:35 UTC
Okay, so this bug stays on libvirt to put format=raw onto the command line. I've just pushed a patch for that:

commit 641a95c9b64e74dccb55ebae8f00da3f10c1feae
Author:     Michal Privoznik <mprivozn>
AuthorDate: Fri Oct 12 11:09:56 2018 +0200
Commit:     Michal Privoznik <mprivozn>
CommitDate: Wed Oct 17 09:16:20 2018 +0200

    qemu: Put format=raw onto cmd line for SCSI passthrough
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1632833
    
    When doing a SCSI passthrough we don't put format= onto the
    command line. This causes qemu to probe the format automatically
    which ends up in a warning in the domain log and possible qemu
    disabling writes to the first block (according to the warning
    message).
    
    Based-on-work-of: Paolo Bonzini <pbonzini>
    Signed-off-by: Michal Privoznik <mprivozn>
    Reviewed-by: Daniel P. Berrangé <berrange>

v4.8.0-111-g641a95c9b6

Comment 49 gaojianan 2019-04-19 03:05:46 UTC
Tried with the <hostdev> xml:
qemu-kvm-rhev-2.12.0-25.el7.x86_64
libvirt-4.5.0-12.virtcov.el7.x86_64

1. # lsscsi -g
[0:0:0:0]    disk    SEAGATE  ST9146852SS      HT62  /dev/sda   /dev/sg1 
[1:0:0:0]    cd/dvd  TEAC     DVD-ROM DV-28SW  R.2A  /dev/sr0   /dev/sg0 
[5:0:0:0]    disk    LIO-ORG  device.logical-  4.0   /dev/sdb   /dev/sg2 

2. # virsh dumpxml demo | grep "<hostdev" -A7
    <hostdev mode='subsystem' type='scsi' managed='no' rawio='yes'>
      <source>
        <adapter name='scsi_host5'/>
        <address bus='0' target='0' unit='0'/>
      </source>
      <alias name='ua-eb183513-1f29-4f00-9664-840d6e8371f1'/>
      <address type='drive' controller='0' bus='0' target='0' unit='1'/>
    </hostdev>

3. # virsh start demo 
Domain demo started

4. check the qemu log:
# cat /var/log/libvirt/qemu/avocado-vt-vm1.log | grep WARNING -A10

5. login vm and do disk write:
 # virsh console demo
Connected to domain demo
Escape character is ^]
Red Hat Enterprise Linux Server 7.7 Beta (Maipo)
Kernel 3.10.0-837.el7.x86_64 on an x86_64
localhost login: root
Password: 

[root@localhost ~]# lsblk
NAME          MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda             8:0    0   10G  0 disk 
├─sda1          8:1    0    1G  0 part /boot
└─sda2          8:2    0    9G  0 part 
  ├─rhel-root 253:0    0    8G  0 lvm  /
  └─rhel-swap 253:1    0    1G  0 lvm  [SWAP]
sdb             8:16   0 1000M  0 disk 


[root@localhost ~]# mkfs.ext4 /dev/sdb 
mke2fs 1.42.9 (28-Dec-2013)
/dev/sdb is entire device, not just one partition!
Proceed anyway? (y,n) y
...
Allocating group tables: done                            
Writing inode tables: done                            
Creating journal (4096 blocks): done
Writing superblocks and filesystem accounting information: done

[root@localhost ~]# mount /dev/sdb /mnt
[root@localhost ~]# dd if=/dev/urandom of=/mnt/1G.file count=1000 bs=1M
...
950+0 records in
949+0 records out
995962880 bytes (996 MB) copied, 7.16083 s, 139 MB/s


[root@localhost ~]# sync

[root@localhost ~]# ll -h /mnt/
total 950M
-rw-r--r--. 1 root root 950M Apr 19 10:03 1G.file
drwx------. 2 root root  16K Apr 19 10:02 lost+found


No errors found; works as expected.

Comment 50 gaojianan 2019-04-22 03:36:26 UTC
Created attachment 1557083 [details]
code coverage

Comment 52 errata-xmlrpc 2019-08-06 13:14:02 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:2294