Bug 1413927 - Guest SCSI block LUN device exhibits spurious but apparently bogus disk I/O errors
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm
Version: 7.3
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Ademar Reis
QA Contact: Xueqiang Wei
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-01-17 10:52 UTC by Didier
Modified: 2017-04-11 19:46 UTC
CC List: 12 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-04-11 18:14:53 UTC
Target Upstream Version:


Attachments
I/O errors (3.30 KB, text/plain)
2017-01-17 10:54 UTC, Didier
libvirt XML definition (5.67 KB, text/html)
2017-01-17 10:59 UTC, Didier
dom0 inventory (71.18 KB, application/x-gzip)
2017-01-17 11:01 UTC, Didier
domU inventory (50.31 KB, application/x-gzip)
2017-01-17 11:03 UTC, Didier

Description Didier 2017-01-17 10:52:04 UTC
Description of problem:

In a domU, accessing a physical SCSI disk of the dom0 that is attached as a SCSI LUN produces spurious disk I/O errors.
Accessing the same disk directly in the dom0 reveals no errors.

Workaround:

Attaching the SCSI device as a "SCSI Disk":
    <disk type='block' device='disk'>
instead of a "SCSI LUN":
    <disk type='block' device='lun'>
fixes the problem.
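For reference, a sketch of a complete disk element after applying the workaround. It assumes the host disk is referenced through its /dev/disk/by-id WWN link, using the WWN quoted in comment 2 (the exact path in the attached definition may differ), a hypothetical guest target name (sdb), and a driver line copied from the test definition in comment 11. Relative to the failing definition, only the device attribute changes.

```xml
<!-- Workaround: attach the host disk as an emulated SCSI disk.
     The failing definition differed only in device='lun'.
     Source path, target name and driver line are illustrative assumptions. -->
<disk type='block' device='disk'>
  <driver name='qemu' type='raw'/>
  <source dev='/dev/disk/by-id/wwn-0x5001b4d01a16f500'/>
  <target dev='sdb' bus='scsi'/>
</disk>
```

As far as I understand libvirt's mapping, device='lun' on a SCSI bus makes QEMU pass the guest's SCSI commands through to the host device (scsi-block), whereas device='disk' uses QEMU's emulated SCSI disk (scsi-hd), so the two attachments exercise different code paths even when they point at the same host disk.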


Version-Release number of selected component (if applicable):

kernel-3.10.0-514.2.2.el7.x86_64 (both dom0 and domU)
qemu-kvm-1.5.3-126.el7.x86_64


This is highly reproducible in the sense that the I/O errors always occur, but at different disk locations: repeated reboots result in, e.g., different mount points failing.

Note: both dom0 and domU are fully updated CentOS 7.3 installations. If absolutely necessary for support or bug-report purposes, I could consider switching both of them to RHEL licenses.
However, as I have a workaround (device="lun" -> device="disk"), I'd rather refrain from that.

Please find attached the relevant logs, libvirt definitions and inventory data
(cp0094 = dom0, cp0196 = domU).

Comment 1 Didier 2017-01-17 10:54:56 UTC
Created attachment 1241666 [details]
I/O errors

I/O errors in dmesg, triggered here by xfs_repair but easily reproduced with any disk-accessing tool, e.g. "dd", "mount".

Comment 2 Didier 2017-01-17 10:59:15 UTC
Created attachment 1241671 [details]
libvirt XML definition

Accessing any of the three SCSI LUN devices results in I/O errors;
accessing the SCSI Disk device works fine.

For comparison purposes, the working SCSI Disk points to the very same WWN (0x5001b4d01a16f500) as the failing SCSI LUN.

Comment 3 Didier 2017-01-17 11:01:36 UTC
Created attachment 1241673 [details]
dom0 inventory

FYI, "ps-aux.out" includes the qemu-kvm command line parameters.

Comment 4 Didier 2017-01-17 11:03:09 UTC
Created attachment 1241676 [details]
domU inventory

Comment 6 Amnon Ilan 2017-01-23 17:37:08 UTC
We do not support Xen Dom0 on RHEL7, so the usage of CentOS7.3 as Dom0 is not supported. 
Please discuss such issues upstream.

For information on how to contact the Red Hat support team, please visit: https://www.redhat.com/support/process/production/#howto

Comment 7 Didier 2017-01-23 21:29:49 UTC
(In reply to Amnon Ilan from comment #6)
> We do not support Xen Dom0 on RHEL7, so the usage of CentOS7.3 as Dom0 is
> not supported.

What makes you think Xen is involved?
The dom0 is running a KVM hypervisor; please reopen.

Comment 8 Miya Chen 2017-01-24 03:28:05 UTC
Xueqiang, could you please try to reproduce this with the latest RHEL 7.4 qemu-kvm and qemu-kvm-rhev? Thanks.

Comment 9 Amnon Ilan 2017-01-26 16:17:11 UTC
(In reply to Didier from comment #7)
> (In reply to Amnon Ilan from comment #6)
> > We do not support Xen Dom0 on RHEL7, so the usage of CentOS7.3 as Dom0 is
> > not supported.
> 
> What makes you think Xen is involved?
> The dom0 is running a KVM hypervisor; please reopen.

OK, with KVM the term is "Host" instead of "Dom0".

Comment 10 Didier 2017-01-26 18:39:08 UTC
(In reply to Amnon Ilan from comment #9)

> OK, with KVM the term is "Host" instead of "Dom0".

Apologies, old habits ... :)

Comment 11 Xueqiang Wei 2017-02-10 10:16:06 UTC
Tested with both <disk type='block' device='disk'> and <disk type='block' device='lun'>; both passed and the issue did not reproduce.

Version-Release number of selected component (if applicable):
kernel-3.10.0-514.2.2.el7.x86_64 (both host and guest)
qemu-kvm-1.5.3-126.el7.x86_64


Steps to Reproduce:
1. Modify the attached XML and start the guest with the SCSI disks:
# virsh create Server-base_lun-and-scsi.xml
<domain type='kvm'>
  <name>Server-base</name>
  <uuid>b1c4a96f-b020-4756-b578-37085907c7b4</uuid>
  <memory unit='KiB'>1048576</memory>
  <currentMemory unit='KiB'>1048576</currentMemory>
  <vcpu placement='static'>1</vcpu>
  <os>
    <type machine='pc-i440fx-rhel7.0.0'>hvm</type>
  </os>
  <features>
    <acpi/>
    <apic/>
  </features>
  <clock offset='utc'>
    <timer name='rtc' tickpolicy='catchup'/>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='hpet' present='no'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <pm>
    <suspend-to-mem enabled='no'/>
    <suspend-to-disk enabled='no'/>
  </pm>
  <devices>
    <emulator>/usr/libexec/qemu-kvm</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw'/>
      <source file='/home/bug/rhel73-64-virtio-scsi.raw'/>
      <target dev='sda' bus='scsi'/>
      <boot order='1'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw'/>
      <source file='/home/bug/test.raw'/>
      <target dev='sdb' bus='scsi'/>
      <address type='drive' controller='0' bus='0' target='0' unit='1'/>
    </disk>
    <disk type='block' device='lun'>
      <driver name='qemu' type='raw'/>
      <source dev='/dev/disk/by-id/wwn-0x50014ee20b03ec30'/>
      <target dev='sdc' bus='scsi'/>
      <address type='drive' controller='0' bus='0' target='0' unit='3'/>
    </disk>
    <controller type='usb' index='0' model='ich9-ehci1'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x7'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci1'>
      <master startport='0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0' multifunction='on'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci2'>
      <master startport='2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x1'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci3'>
      <master startport='4'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x2'/>
    </controller>
    <controller type='scsi' index='0' model='virtio-scsi'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </controller>
    <controller type='scsi' index='1'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x09' function='0x0'/>
    </controller>
    <controller type='scsi' index='2'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x0a' function='0x0'/>
    </controller>
    <controller type='pci' index='0' model='pci-root'/>
    <controller type='virtio-serial' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </controller>
    <controller type='scsi' index='3'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x0b' function='0x0'/>
    </controller>
    <interface type='network'>
      <mac address='52:54:00:b2:69:55'/>
      <source network='default'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
    <serial type='pty'>
      <target port='0'/>
    </serial>
    <console type='pty'>
      <target type='serial' port='0'/>
    </console>
    <channel type='unix'>
      <target type='virtio' name='org.qemu.guest_agent.0'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>
    <channel type='spicevmc'>
      <target type='virtio' name='com.redhat.spice.0'/>
      <address type='virtio-serial' controller='0' bus='0' port='2'/>
    </channel>
    <input type='tablet' bus='usb'>
      <address type='usb' bus='0' port='1'/>
    </input>
    <input type='mouse' bus='ps2'/>
    <input type='keyboard' bus='ps2'/>
    <graphics type='vnc' port='-1' autoport='yes' listen='0.0.0.0'>
      <listen type='address' address='0.0.0.0'/>
    </graphics>
    <sound model='ich6'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </sound>
    <video>
      <model type='qxl' ram='65536' vram='65536' vgamem='16384' heads='1' primary='yes'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </video>
    <redirdev bus='usb' type='spicevmc'>
      <address type='usb' bus='0' port='2'/>
    </redirdev>
    <redirdev bus='usb' type='spicevmc'>
      <address type='usb' bus='0' port='3'/>
    </redirdev>
    <memballoon model='virtio'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
    </memballoon>
  </devices>
</domain>
    
2. Access /dev/sdb (the LUN, per the lsscsi output below) with disk-accessing tools, e.g. "dd", "mount".
3. Run xfs_repair and check dmesg for I/O errors.

Actual results:
After step 2: no errors.
After step 3: no errors; xfs_repair output:

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
No modify flag set, skipping filesystem flush and exiting.


Some details:
# lsscsi
[2:0:0:0]    disk    QEMU     QEMU HARDDISK    1.5.  /dev/sda 
[2:0:0:1]    disk    QEMU     QEMU HARDDISK    1.5.  /dev/sdc 
[2:0:0:3]    disk    ATA      WDC WD5000AAKX-0 1H19  /dev/sdb
    
# ll /dev/disk/by-id/ | grep sd
lrwxrwxrwx. 1 root root 10 Feb 10 16:50 lvm-pv-uuid-a30goi-WsFM-8siZ-leQB-MGsu-Agqt-IvvdLT -> ../../sdb2
lrwxrwxrwx. 1 root root 10 Feb 10 16:50 lvm-pv-uuid-IfAKH8-zHJU-6vOt-n6SY-s8C1-Sizo-0qme3T -> ../../sda2
lrwxrwxrwx. 1 root root  9 Feb 10 16:50 scsi-0QEMU_QEMU_HARDDISK_drive-scsi0-0-0-0 -> ../../sda
lrwxrwxrwx. 1 root root 10 Feb 10 16:50 scsi-0QEMU_QEMU_HARDDISK_drive-scsi0-0-0-0-part1 -> ../../sda1
lrwxrwxrwx. 1 root root 10 Feb 10 16:50 scsi-0QEMU_QEMU_HARDDISK_drive-scsi0-0-0-0-part2 -> ../../sda2
lrwxrwxrwx. 1 root root  9 Feb 10 16:50 scsi-0QEMU_QEMU_HARDDISK_drive-scsi0-0-0-1 -> ../../sdc
lrwxrwxrwx. 1 root root  9 Feb 10 16:50 scsi-350014ee20b03ec30 -> ../../sdb
lrwxrwxrwx. 1 root root 10 Feb 10 16:50 scsi-350014ee20b03ec30-part1 -> ../../sdb1
lrwxrwxrwx. 1 root root 10 Feb 10 16:50 scsi-350014ee20b03ec30-part2 -> ../../sdb2
lrwxrwxrwx. 1 root root  9 Feb 10 16:50 wwn-0x50014ee20b03ec30 -> ../../sdb
lrwxrwxrwx. 1 root root 10 Feb 10 16:50 wwn-0x50014ee20b03ec30-part1 -> ../../sdb1
lrwxrwxrwx. 1 root root 10 Feb 10 16:50 wwn-0x50014ee20b03ec30-part2 -> ../../sdb2



Hi Didier,

I also tested with an iSCSI LUN and did not hit this issue.

The SCSI disk is connected differently in our tests. Could you please try it on "ATA  WDC WD5000AAKX-0" rather than on "JetStor  Raid1133b_bkp2"? Thank you very much.

[2:0:0:3]    disk    ATA      WDC WD5000AAKX-0 1H19  /dev/sdb
[2:0:0:4]    disk    JetStor  Raid1133b_bkp2   R001  /dev/sdb

Comment 12 Ademar Reis 2017-04-11 18:14:53 UTC
We could not reproduce this issue in our testing (see comment #11), so I'm closing this BZ.

Didier, if this issue is critical or in any way time sensitive, please raise a ticket through your regular Red Hat support channels to make certain it receives the proper attention and prioritization that will result in a timely resolution.

For information on how to contact the Red Hat production support team, please visit: https://www.redhat.com/support/process/production/#howto

Comment 13 Didier 2017-04-11 19:46:17 UTC
Dear Ademar,

Being very busy (aren't we all?), I have not found the time to explore the issue further, especially as I have a valid workaround ("SCSI Disk" instead of "SCSI LUN").

I therefore have no objection to this ticket being closed; I'll reopen it if and when I can reproduce the problem.

Thanks,
Didier

