Bug 1264719 - Virsh vol-create-as sometimes failed to create a volume in disk pool
Virsh vol-create-as sometimes failed to create a volume in disk pool
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: systemd (Show other bugs)
x86_64 Linux
medium Severity medium
: rc
: ---
Assigned To: systemd-maint
Depends On:
  Show dependency treegraph
Reported: 2015-09-20 21:41 EDT by yangyang
Modified: 2018-02-14 18:05 EST (History)
15 users (show)

See Also:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1496469 (view as bug list)
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Attachments (Terms of Use)
libvirtd.log (3.28 MB, text/plain)
2015-09-20 21:41 EDT, yangyang
no flags Details

  None (edit)
Description yangyang 2015-09-20 21:41:53 EDT
Created attachment 1075376 [details]

Description of problem:
Virsh vol-create-as sometimes failed to create a volume in disk pool. Following error out
error: Storage volume not found: no storage vol with matching path '/dev/sde1'

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Prepare an active disk pool
# virsh pool-define disk1.pool
# virsh pool-build disk1 --overwrite
# virsh pool-start disk1
# virsh pool-list --type disk
 Name                 State      Autostart
 disk1                active     no    

# virsh pool-dumpxml disk1
<pool type='disk'>
  <capacity unit='bytes'>42944186880</capacity>
  <allocation unit='bytes'>0</allocation>
  <available unit='bytes'>42944154624</available>
    <device path='/dev/sde'>
      <freeExtent start='32256' end='42944186880'/>
    <format type='dos'/>

2. create a volume in disk pool
# virsh vol-create-as disk1 sde1 1G
error: Failed to create vol sde1
error: Storage volume not found: no storage vol with matching path '/dev/sde1'

Actual results:
Failed to create volume in disk pool

Expected results:
Volume in disk pool can be created without error

Additional info:
Comment 1 John Ferlan 2015-09-22 18:15:36 EDT
This just worked for me using upstream libvirt and parted.x86_64 3.2-9.fc21...

Digging through the attached libvirtd.log one will find :

virCommandRunAsync:2428 : About to run /usr/sbin/parted /dev/sde mkpart --script 'primary ext2' 32256B 1074790399B
virCommandRunAsync:2431 : Command result 0, with PID 8336
virCommandRunAsync:2428 : About to run /sbin/udevadm settle
udevEventHandleCallback:1543 : udev action: 'add'
udevGetDeviceProperty:124 : udev reports device 'sde1' does not have property 'DRIVER'
udevGetDeviceType:1245 : Could not determine device type for device with sysfs name 'sde1'
udevAddOneDevice:1413 : Discarding device -1 0x7ff1e883c0f0 /sys/devices/platform/host11/session5/target11:0:0/11:0:0:1/block/sde/sde1
virCommandRunAsync:2428 : About to run /usr/libexec/libvirt_parthelper /dev/sde
virCommandRunAsync:2431 : Command result 0, with PID 8342
virStorageBackendVolOpen:1413 : Storage volume not found: no storage vol with matching path '/dev/sde1'
virCommandRunAsync:2428 : About to run /usr/sbin/parted /dev/sde rm --script 1

So it seems for some reason parted in through udev is having a problem creating the partition.

Perhaps it's best to show the results for running the commands without libvirt being involved...  e.g.

parted /dev/sde mklabel --script msdos
parted /dev/sde print
parted /dev/sde mkpart --script 'primary ext2' 32256B 1074790399B
parted /dev/sde print

NB: If things work, the first print should show nothing (clean), the second one should show a partition 1, such as perhaps:

Model: IET VIRTUAL-DISK (scsi)
Disk /dev/sde: 10.7GB
Sector size (logical/physical): 512B/4096B
Partition Table: msdos
Disk Flags: 

Number  Start   End     Size    Type     File system  Flags
 1      32.3kB  1075MB  1075MB  primary

Although I have a feeling something is failing in the mkpart command.
Comment 3 John Ferlan 2015-10-01 12:15:05 EDT
Perhaps this is a udev problem then at least with respect /dev/sdb1 creation.

Does /dev/sdb1 ever show up?  How do you know it's a udev problem? Is there some /var/log/messages type error?

Perhaps one thing libvirt could do in the create path is to perform the settle command.  Beyond that I'm not quite sure.  Running 1000's of times in a loop certainly thrashes things well though
Comment 6 Hao Liu 2015-12-24 02:28:37 EST
I'm experiencing this issue with a very high reproducibility ~50%.


Steps to Reproduce:
1. Prepare an iscsi empty disk say sdb.

2. Change DISK to your disk name and run following bash script:
#!/usr/bin/env bash



lsblk | grep $DISK

echo "==============================="
echo "Create a partition on $DISKPATH"
parted -s -a optimal $DISKPATH mklabel msdos -- mkpart primary ext4 0 50M
lsblk | grep $DISK
sleep 1

echo "==============================="
if [ -b $PARTPATH ]; then
    echo "$PARTPATH is a block device"
    echo "$PARTPATH is not a block device"

echo "==============================="
echo "Create a pool on $DISKPATH"
virsh pool-create-as --name disk-pool --type disk --target /dev  --source-dev $DISKPATH
sleep 1

echo "==============================="
echo "Destroy pool on $DISKPATH"
virsh pool-destroy disk-pool
sleep 1

echo "==============================="
echo "Remove partition on $DISKPATH"
parted -s -a optimal $DISKPATH rm 1
lsblk | grep $DISK

3. Occasionally we'll get following error when creating the pool. From the checking before
the creation. It's clear that /dev/sdb1 already exists as a block device.
sdb             8:16   0   100M  0 disk 
Create a partition on /dev/sdb
Warning: The resulting partition is not properly aligned for best performance.
sdb             8:16   0   100M  0 disk 
└─sdb1          8:17   0  47.7M  0 part 
/dev/sdb1 is a block device
Create a pool on /dev/sdb
error: Failed to create pool disk-pool
error: Storage volume not found: no storage vol with matching path '/dev/sdb1'
Comment 7 Hao Liu 2016-02-28 20:06:17 EST
Hi John,

Since the bug is blocking our automatic testing job, I digged more into it.

I found that the error is caused by an instantaneous disappear of the partition
device. It happens when libvirt calls '/usr/libexec/libvirt_parthelper /dev/sdb'
and **ONLY** if /dev/sdb is a **SCSI** device(at least for iscsi and scsi_debug)
and not observed on a normal block device.

The exact line causing this blackout is fsync and close in function linux_close
when parthelper trying to get the partition table using ped_disk_new:

(gdb) bt                                                                        
#0  linux_close (dev=0x5555557a4f60) at arch/linux.c:1619                       
#1  0x00007ffff7ba20fc in ped_disk_new (dev=dev@entry=0x5555557a4f60)           
    at disk.c:202                                                               
#2  0x00005555555622de in main (argc=<optimized out>, argv=0x7fffffffdc58)      
    at storage/parthelper.c:115   

I'm not sure which component this issue should be fixed on (libvirt, libparted,
gnulib or kernel). Could you please help check this out?

Thank you.

Steps to show the blackout:

1. Prepare an SCSI debug device (/dev/sdb here):
$ modprobe scsi_debug

2. Make a partition /dev/sdb1 on /dev/sdb using fdisk

3. Run following monitor script to monitor partition device
$ python monitor.py /dev/sdb
WARNING:root:Get device...
WARNING:root:New disk...
WARNING:root:/dev/sdc1 disappear at 0.00704002380371
WARNING:root:/dev/sdc1 reshow at 0.0120160579681

#---- Start of monitor.py ----

import os
import sys
import parted
import logging
import threading
import time

def run():
    logging.warn('Get device...')
    dev = parted.getDevice(device)
    logging.warn('New disk...')

if __name__ == '__main__':
    device = sys.argv[1]
    volume = device + '1'

    timeout = 300.0

    if not os.path.exists(volume):
        exit('Need to create a partition first')

    t_run = threading.Thread(target=run)

    exist = True
    start = time.time()
    while time.time() - start < timeout:
        if exist != os.path.exists(volume):
            exist = not exist
            if exist:
                logging.warn('%s reshow at %s', volume, time.time() - start)
                logging.warn('%s disappear at %s', volume, time.time() - start)
#---- End of monitor.py ----

4. The script demonstrate a blackout of /dev/sdb1 by directly calling libparted
python binding.

5. Run libvirt_parthelper in another terminal and check out the same blackout of
$ /usr/libexec/libvirt_parthelper /dev/sdb
Comment 8 John Ferlan 2016-02-29 15:13:49 EST
I wouldn't say 50% is very high reproducibility rate. It still seems like a timing issue to me, but right now I'm off in the middle of other things which I cannot just drop to chase this. I see this more as a testing/scripting/usage issue still...

You're involving iscsid/tgtd (and networking), libvirt, parted, systemd, udev, kernel, file system, etc. If you wanted to remove libvirt from the equation - I described the equivalent libvirt commands above (mklabel and mkpart) to use instead of pool-build and pool-create-as.  That is, can you run your same script removing libvirt and have it run "forever"?

IIUC, the primary purpose of libvirt_parthelper is to print the partition table in a way libvirt expects it without involving some other tool which "could" change output format over time and mess with what libvirt may expect.  There's also a licensing issue over using parted and linking with libvirt.so (it's described in the parthelper.c sources).

There was a slight change to the code recently to handle the device mapper issue (see bz 1265694), but it seems that isn't part of what you're seeing.

Anyway, I'm not fully clear on what you mean by blackout and I have no idea if this is a bug or one of those "how it works" situations.  If by blackout you mean the device disappears, then reappears well OK, but it does come back right? and then it's available?  You are using multiple subsystems that cannot "lock each other" out and you have an external python script you're using to determine when something is available.

So parted doesn't know anything about libvirt_parthelper (and mostly vice versa). But they do call functions in some library. The parthelper code uses "ped_device_get", "ped_disk_new", and "ped_disk_next_partition" (in a loop).  that seems to be a fairly normal path for a quick google search on using the next_partition function. So nothing abnormal there.

Having a python script checking for the existence of the path to the device just says to me that something is causing the "os.path.exists" to fail (not sure what, but it could be locking related). Perhaps some sort of better error or exception handling may tell you that os.path.exists cannot open the device for read because it's not available for I/O at the moment. Not sure what you can get from python.

I would think the script needs some extra knowledge built in...  prior to creating if the partition you are looking for exists, then fail... Go through the motions of creating.  Don't declare success until you see the device available.  For the deletion path, it's the opposite.  Does the device exist?  Yes, then continue... call the deletion API's... then do the same search for whether the file exists - if it does, wait until it no longer exists.  Then your script can go back to the beginning...  Thus

    return boolean of whether passed path exists

    if !find device then error

    delete device

    while find device
        sleep, retry to look

    if find device then error

    add device

    while !find device
        sleep, retry to look
Comment 9 Hao Liu 2016-03-02 02:29:10 EST
Hi John,

Sorry about the unclear statements and reasoning. Your guess is right, The
'blackout' I mentioned means the temporary disappearance of the testing
partition. And yes, the lousy script is not quite persuasive. Sorry about that

Here is a more detailed reasoning about this bug:

1. When and how often this bug happens?

This bug happens on all (RHEL7 + iscsi/scsi_debug) machines I've tested.
But on RHEL6, this bug don't exists and the 'blackout' doesn't happen too.

The ~50% I mentioned in the previous comments is an estimated value based on my
observation of automation test failures. But the real failure rates varies by
machine the test is running on. All 3 of my tested machines shows a failure rate
higher than 95%. And I believe even ~50% is not acceptable if customers rely on
this use case.

So for libvirt automation testing where we use iscsi disks for all disk backend
storage testing, it's not rare case.

2. How this bug happens?

This bug happens when 'virStorageBackendDiskReadPartitions' is called on the
testing disk. 'virCommandRunNul' calls 'libvirt_parthelper' asynchronously and
then monitor the output in a loop, run 'virStorageBackendDiskMakeVol' if a new
volume is found. The calling stack of 'virStorageBackendDiskMakeVol ->
virStorageBackendDiskMakeDataVol -> virStorageBackendUpdateVolInfo ->
virStorageBackendUpdateVolTargetInfo -> virStorageBackendVolOpen -> lstat' will
fail if the specified partition can not be accessed no matter for what reason.

The detailed process of this bug is illustrated below. If lstat happens before
or after the disappearance of the partition(Case 2 and Case 3). The call will
succeed. But if lstat happens during it like Case 1. The call will fail.

  |virStorageBackendDiskReadPartitions call virCommandRunNul|
               |                    |
               V                    V
  --------------------------- ----------------
  |call 'libvirt_parthelper'| |monitor output|
  --------------------------- ----------------
               |                        |
               V                        V
  -----------------------   -------------------------
  |output partition info|-->|  read partition info  |
  -----------------------   -------------------------
               |                | Case 1 | Case 2 | Case 3
               V                |        |        V
  ----------------------------  |        |--------------------
  |fsync & close partition fd|  |        ||lstat on partition|----->Success
  ----------------------------  |        |--------------------
               |                |        |
               V                |        |
  ---------------------------   |        |
  |  partition disappears   |   |        |
  ---------------------------   |        |
               |                V        |
               |    -------------------- |
              ...   |lstat on partition| |
               |    -------------------- |
               V                 |       |
  ---------------------------    V       |
  |  partition reappears    |  Fail      V
  ---------------------------      --------------------
                                   |lstat on partition|--->Success

So, the different fail rates on different machines could be related with the
performance difference, which will change the timing metrics of the partition
disappearance and reappearance.

Also this is not a script/timing/usage issue since the failing process doesn't
involve participation of script and the script don't have any control to it.

3. Script to verify this bug

The following script continuously run 'pool-create-as' on a automatically
created debugging SCSI disk and give a statistics on the result.

You might need to install numpy package before running this script:
$ yum install -y numpy

Then just save it and call as the following is OK.
$ python verify.py

# ----------------------- Start of verify.py ---------------------------------
#!/usr/bin/env python
Verify bug https://bugzilla.redhat.com/show_bug.cgi?id=1264719
import glob
import os
import subprocess
import threading
import time
import numpy

def prepare_disk():
    Prepare a SCSI disk for debug
    subprocess.call(['rmmod', 'scsi_debug'])
    pre_disks = glob.glob('/dev/sd?')
    subprocess.call(['modprobe', 'scsi_debug'])
    new_disks = set(glob.glob('/dev/sd?')) - set(pre_disks)
    if len(new_disks) > 1:
        raise Exception('Multiple new SCSI disks found: %s' % new_disks)
    elif len(new_disks) == 0:
        raise Exception("No new SCSI disks found after 'modprobe scsi_debug'")
    new_disk = list(new_disks)[0]

    subprocess.call(['parted', '-s', new_disk, 'mklabel', 'msdos'])
    subprocess.call(['parted', '-s', new_disk, 'mkpart', 'primary', 'ext4',
                     '0', '8M'], stderr=subprocess.STDOUT)
    print "Testing on %s" % new_disk

    return new_disk

def test_pool_create(disk, wait1=0, wait2=0):
    Run pool-create-as and get result on the given disk. Destroy the pool
    if creation succeed.
    proc = subprocess.Popen(
        ['virsh', 'pool-create-as', '--name', 'disk-pool', '--type',
         'disk', '--target', 'dev', '--source-dev', disk],
        stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    stdout, stderr = proc.communicate()
    return_code = proc.wait()
    if return_code == 0:
        proc = subprocess.Popen(['virsh', 'pool-destroy', 'disk-pool'],
                                stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    return return_code, stdout, stderr

def monitor_partition(disk, status):
    Start a process to monitor change of partitions on given disk.
    part = disk + '1'
    exists = True
    while status['running']:
        if os.path.exists(part) != exists:
            exists = not exists
            status['changes'] += 1
            now = time.time()
            if exists:
                status['durations'].append(now - status['lastchange'])
            status['lastchange'] = now

def run():
    Main run function
    disk = prepare_disk()
    count = 0
    results = {}
    status = {'running': True, 'changes': 0, 'durations': [], 'lastchange': 0}
    t_mon = threading.Thread(target=monitor_partition, args=(disk, status))
        while True:
            count += 1
            result = test_pool_create(disk, wait1=0.0, wait2=0.0)
            if result in results:
                results[result] += 1
                results[result] = 1

            if count % 10 == 0:
                print "Iteration %8d: part changed %d times" % (
                    count, status['changes'])
                print "\tStats of blackout:"
                print "\t\tMean:%s" % numpy.mean(status['durations'])
                print "\t\tDeviation:%s" % numpy.std(status['durations'])
                for result, result_count in results.items():
                    result_str = '\n'.join(result[1:]).strip()
                    if result_str:
                        result_str = result_str.splitlines()[-1]
                    print "\t%5.2f%%: %s" % (
                        float(result_count) * 100 / count, result_str)
        status['running'] = False

if __name__ == '__main__':
# ----------------------- End of verify.py -----------------------------------

4. Some results of this script

1) On RHEL6, every thing is OK:

    Iteration    61890: part changed 0 times
        Stats of blackout:
        100.00%: Pool disk-pool created

2) On RHEL7, there are three types of result:

    Iteration   123550: part changed 247097 times
        Stats of blackout:
         0.17%: error: cannot open volume '/dev/sdb1': No such file or directory
         0.71%: Pool disk-pool created
        99.12%: error: Storage volume not found: no storage vol with matching path '/dev/sdb1'

It can be seen that on this machine, almost all tests failed when 'lstat' on
the partition.

There's another kind of failure "cannot open volume" is raised also in
'virStorageBackendVolOpen' when 'lstat' passed but following action of
'virFileOpenAs' failed. This indicates that a device disappearance happens
between 'lstat' and 'virtFileOpenAs'.

5. Possible solutions

This bug happens on RHEL7 only because on RHEL6 the partition device won't
disappear when sync&close its fd.

Solution 1 (fix partition disappearance):
    Find the root cause of disappearance of the partition (most probably in
    kernel) and fix it there. Maybe it's a bug and easy to be fixed. Or maybe
    it's a feature and we have to choose solution 2. I'm currently working on

Solution 2 (fix libvirt to wait partition reappear before continuing):
    There's also some difficulties like we don't know whether it's
    disappearance is temporary or eternal.

Since this bug is blocking all disk backend storage automation testing, I will
have to risk this whole feature before this bug fixed.

I know you are busy with something but I hope to hear your opinion on this and
try my best to solve it.
Comment 10 John Ferlan 2016-03-05 08:20:45 EST
OK - well thanks for the details. Let me clarify the 50% high reproducibility comment... In order to fix a bug I need a way to reliably reproduce. I am in agreement that failing 50% of the time isn't good, but in order for me to generate a fix that resolves 100% of the 50% cases without a lot of back and forth, I have to be able to reproduce or at least have a chance based on code traces/tracebacks. Whether a customer uses a script to create/destroy pools in a loop 10s, 100s, 1000s, etc. of time in order to validate that the create/destroy pool works properly - well I don't know about that.

Beyond that - there are far too many differences between RHEL6 and RHEL7 in order to pinpoint any one thing in this case. The libvirt_parthelper hasn't changed much in years... The only recent change was related to bz 1265694. The code path that handles reading the partition data also hasn't undergone much change under the concept of it it's not broken, don't fix it.

I'm still not sure what could be causing a partition to disappear - hard to believe libvirt is causing that. I would need to chance the RunCommandNul code again in order to find that lstat, fsync, close logic - I'm not finding it now in a quick cscope search.

In any case, I took your script and ran it on my system... I had to modify the name of the pool only because I use 'disk-pool' already!  Here's what I get:

# python verify.py
Warning: The resulting partition is not properly aligned for best performance.
Testing on /dev/sdd
Iteration       10: part changed 0 times
	Stats of blackout:
/usr/lib64/python2.7/site-packages/numpy/core/_methods.py:59: RuntimeWarning: Mean of empty slice.
  warnings.warn("Mean of empty slice.", RuntimeWarning)
/usr/lib64/python2.7/site-packages/numpy/core/_methods.py:71: RuntimeWarning: invalid value encountered in double_scalars
  ret = ret.dtype.type(ret / rcount)
/usr/lib64/python2.7/site-packages/numpy/core/_methods.py:83: RuntimeWarning: Degrees of freedom <= 0 for slice
  warnings.warn("Degrees of freedom <= 0 for slice", RuntimeWarning)
/usr/lib64/python2.7/site-packages/numpy/core/_methods.py:95: RuntimeWarning: invalid value encountered in true_divide
  arrmean, rcount, out=arrmean, casting='unsafe', subok=False)
/usr/lib64/python2.7/site-packages/numpy/core/_methods.py:117: RuntimeWarning: invalid value encountered in double_scalars
  ret = ret.dtype.type(ret / rcount)
	100.00%: Pool bz1264719-pool created
Iteration       20: part changed 0 times
	Stats of blackout:
	100.00%: Pool bz1264719-pool created
Iteration     1000: part changed 0 times
	Stats of blackout:
	100.00%: Pool bz1264719-pool created

IOW: Other than those bizarre RuntimeWarnings on the first pass - the Iterations have no failures/blackouts.
Comment 11 yanbing du 2016-07-25 02:41:20 EDT
By using emulated iSCSI target, there are lots of libvirt auto cases failed as the similar problem, I will post them in the following comments.
Comment 12 yanbing du 2016-07-25 02:44:54 EDT
1. blockcopy may fail as:

internal error: unable to execute QEMU command 'drive-mirror': Could not open '/dev/sdb': No such device or address

case debug log slice:

2016-07-22 15:38:04,508 libvirt          L0515 DEBUG| iscsi device: /dev/sdb
2016-07-22 15:40:40,704 virsh_blockcopy  L0432 DEBUG| The snapshot xml is: <?xml version='1.0' encoding='UTF-8'?><domainsnapshot><name>blockcopy_snap</name><description>blockcopy snapshot</description><disks><disk name="vda" snapshot="external" type="file">      <driver name="qemu" type="qcow2" />      <target bus="virtio" dev="vda" />      <source file="/var/tmp/avocado_lsi7pz/blockcopy_shallow.snap" /></disk></disks></domainsnapshot>
2016-07-22 15:40:40,704 virsh            L0645 DEBUG| Running virsh command: snapshot-create avocado-vt-vm1 --disk-only --xmlfile /var/tmp/xml_utils_temp_RdU1R_.xml 
2016-07-22 15:40:40,705 process          L0333 INFO | Running '/bin/virsh snapshot-create avocado-vt-vm1 --disk-only --xmlfile /var/tmp/xml_utils_temp_RdU1R_.xml '
2016-07-22 15:40:41,155 process          L0420 DEBUG| [stdout] Domain snapshot blockcopy_snap created from '/var/tmp/xml_utils_temp_RdU1R_.xml'
2016-07-22 15:40:41,156 virsh            L0691 DEBUG| status: 0
2016-07-22 15:40:41,156 virsh            L0692 DEBUG| stdout: Domain snapshot blockcopy_snap created from '/var/tmp/xml_utils_temp_RdU1R_.xml'
2016-07-22 15:40:41,156 virsh            L0693 DEBUG| stderr: 
2016-07-22 15:40:41,174 virsh            L1332 DEBUG| Undefine VM avocado-vt-vm1
2016-07-22 15:40:41,188 virsh            L0645 DEBUG| Running virsh command: blockcopy avocado-vt-vm1 vda /dev/sdb  --blockdev --shallow
2016-07-22 15:40:41,189 process          L0333 INFO | Running '/bin/virsh blockcopy avocado-vt-vm1 vda /dev/sdb  --blockdev --shallow'
2016-07-22 15:42:59,592 process          L0420 DEBUG| [stderr] error: internal error: unable to execute QEMU command 'drive-mirror': Could not open '/dev/sdb': No such device or address
2016-07-22 15:42:59,593 process          L0420 DEBUG| [stdout] 
2016-07-22 15:42:59,593 virsh            L0691 DEBUG| status: 1
2016-07-22 15:42:59,593 virsh            L0692 DEBUG| stdout: 
2016-07-22 15:42:59,593 virsh            L0693 DEBUG| stderr: error: internal error: unable to execute QEMU command 'drive-mirror': Could not open '/dev/sdb': No
Comment 13 yanbing du 2016-07-25 03:12:42 EDT
2. create an iSCSI type pool, the volume may not appear in time( wait for 5 sec), then the following setps(such as vol-create-from) will fail

case debug log:

2016-07-12 05:37:03,896 libvirt          L0515 DEBUG| iscsi device: /dev/sdb
2016-07-12 05:37:03,896 process          L0333 INFO | Running 'iscsiadm --mode session'
2016-07-12 05:37:03,901 process          L0420 DEBUG| [stdout] tcp: [277],1 iqn.2016-07.com.virttest:iscsi-pool.target (non-flash)
2016-07-12 05:37:03,902 process          L0333 INFO | Running 'iscsiadm --mode node --logout -T iqn.2016-07.com.virttest:iscsi-pool.target'
2016-07-12 05:37:04,631 process          L0420 DEBUG| [stdout] Logging out of session [sid: 277, target: iqn.2016-07.com.virttest:iscsi-pool.target, portal:,3260]
2016-07-12 05:37:04,632 process          L0420 DEBUG| [stdout] Logout of [sid: 277, target: iqn.2016-07.com.virttest:iscsi-pool.target, portal:,3260] successful.
2016-07-12 05:37:04,632 virsh            L1909 INFO | Create iscsi type pool virt-iscsi-pool
2016-07-12 05:37:04,632 virsh            L0645 DEBUG| Running virsh command: pool-create-as --name virt-iscsi-pool --type iscsi --target /dev/disk/by-path  --source-host  --source-dev iqn.2016-07.com.virttest:iscsi-pool.target
2016-07-12 05:37:04,632 process          L0333 INFO | Running '/bin/virsh pool-create-as --name virt-iscsi-pool --type iscsi --target /dev/disk/by-path  --source-host  --source-dev iqn.2016-07.com.virttest:iscsi-pool.target'
2016-07-12 05:37:04,914 process          L0420 DEBUG| [stdout] Pool virt-iscsi-pool created
2016-07-12 05:37:04,915 process          L0420 DEBUG| [stdout] 
2016-07-12 05:37:04,915 virsh            L0691 DEBUG| status: 0
2016-07-12 05:37:04,915 virsh            L0692 DEBUG| stdout: Pool virt-iscsi-pool created
2016-07-12 05:37:04,915 virsh            L0693 DEBUG| stderr: 
2016-07-12 05:37:04,973 utils_misc       L0607 DEBUG| Waitting for volume show up (0.000007 secs)
2016-07-12 05:37:05,987 utils_misc       L0607 DEBUG| Waitting for volume show up (1.014249 secs)
2016-07-12 05:37:07,005 utils_misc       L0607 DEBUG| Waitting for volume show up (2.032042 secs)
2016-07-12 05:37:08,021 utils_misc       L0607 DEBUG| Waitting for volume show up (3.047514 secs)
2016-07-12 05:37:09,035 utils_misc       L0607 DEBUG| Waitting for volume show up (4.062413 secs)
2016-07-12 05:37:11,450 stacktrace       L0042 ERROR| TestError: No volume in pool virt-iscsi-pool
Comment 14 yanbing du 2016-07-25 03:16:39 EDT
3. start disk pool may failed as:
error: Failed to start pool virsh_pool_test
error: Storage volume not found: no storage vol with matching path '/dev/sdb1'

case debug log:
2016-07-12 05:19:22,009 libvirt          L0515 DEBUG| iscsi device: /dev/sdb
2016-07-12 05:19:22,009 process          L0333 INFO | Running 'parted -s -a optimal /dev/sdb mklabel msdos -- mkpart primary ext4 0 1M'
2016-07-12 05:19:39,531 process          L0420 DEBUG| [stderr] Warning: The resulting partition is not properly aligned for best performance.
2016-07-12 05:19:39,540 virsh            L2064 DEBUG| Try to define disk type pool virsh_pool_test
2016-07-12 05:19:39,540 virsh            L0645 DEBUG| Running virsh command: pool-define-as --name virsh_pool_test --type disk  --source-dev /dev/sdb --source-format bsd --target /dev
2016-07-12 05:19:39,541 process          L0333 INFO | Running '/bin/virsh pool-define-as --name virsh_pool_test --type disk  --source-dev /dev/sdb --source-format bsd --target /dev'
2016-07-12 05:19:39,699 process          L0420 DEBUG| [stdout] Pool virsh_pool_test defined
2016-07-12 05:19:39,699 process          L0420 DEBUG| [stdout] 
2016-07-12 05:19:39,699 virsh            L0691 DEBUG| status: 0
2016-07-12 05:19:39,699 virsh            L0692 DEBUG| stdout: Pool virsh_pool_test defined
2016-07-12 05:19:39,699 virsh            L0693 DEBUG| stderr: 
2016-07-12 05:19:39,713 libvirt          L1074 DEBUG| New prepared pool XML: <pool type='disk'>  <name>virsh_pool_test</name>  <uuid>2ca685d2-494e-4218-a858-19275321e2f0</uuid>  <capacity unit='bytes'>0</capacity>  <allocation unit='bytes'>0</allocation>  <available unit='bytes'>0</available>  <source>    <device path='/dev/sdb'/>    <format type='bsd'/>  </source>  <target>    <path>/dev</path>  </target></pool>
Comment 15 John Ferlan 2016-08-25 00:16:27 EDT
Hao -

As promised from our pre KVM Forum conversation - I looked at this a bit and thought more about it and of course did some hacking... If the partition is disappearing during/between fsync and close in linux_close called during ped_disk_new in the libparthelper.so (code snippet from recently found code):

static int      
linux_close (PedDevice* dev)
        LinuxSpecific*          arch_specific = LINUX_SPECIFIC (dev);

        if (dev->dirty)
                _flush_cache (dev);
        if (fsync (arch_specific->fd) < 0 || close (arch_specific->fd) < 0)

then there's not libvirt can do. To me that's a problem deeper in the stack. Could be an iSCSI issue or it could be a kernel issue - hard to say for sure as it's not exactly my area of expertise...

As noted in possible solutions #2 from comment 9, having libvirt attempt to have some recovery loop while possible may just be "hiding" some root cause.  If I add a retry loop to virStorageBackendDiskMakeDataVol:

-        if (virStorageBackendUpdateVolInfo(vol, false,
-                                           VIR_STORAGE_VOL_OPEN_DEFAULT, 0) < 0)
+        int retry = 0;
+        int rc;
+        do {
+            rc = virStorageBackendUpdateVolInfo(vol, false,
+                                                VIR_STORAGE_VOL_OPEN_DEFAULT,
+                                                0);

Then verify.py hasn't failed through 5000 iterations so far *and* I know I hit the condition many times since I added some debug code to tell me so:

2016-08-25 04:03:19.385+0000: 29970: warning : virStorageBackendVolOpen:1685 : lstat errno=2

It's not 50% of the time, but it does happen and I can work around the issue, but it just doesn't feel like the "right" or "proper" solution.
Comment 16 Hao Liu 2016-08-29 15:08:03 EDT

Thanks. I agree with you on the improperness of "solution" which hiding the problem. How about transferring this bug to kernel and ask them to find the root cause?

BTW, Luyao has found a workaround to this problem by stopping systemd-udevd* service/sockets. This of course is way far from a proper solution. At least this bug won't block our automated testing anymore before properly resolved.
Comment 17 John Ferlan 2016-08-29 15:38:46 EDT
Transferring is fine by me - although I'm not quite sure who would be the recipient... libparthelper would seemingly be the obvious choice, but if the window for the link to disappear is between sync and close in linux_close, then that could be elsewhere. Of course, if by stopping/restarting systemd/udev service/sockets fixes the issue, then perhaps that's more related to the iSCSI code handling of sync/close and the device file generation. 

I'm really not all that familiar with those details, but since it happens in the iSCSI environment, perhaps it's best to start there.
Comment 18 Luyao Huang 2016-08-30 05:46:11 EDT
It looks like systemd-udevd will call ioctl to re-read the partition table and that cause iscsi device disappear in a few time. Here is the code in src/udev/udevd.c in systemd source code:

static int synthesize_change(struct udev_device *dev) {
                 * Try to re-read the partition table. This only succeeds if
                 * none of the devices is busy. The kernel returns 0 if no
                 * partition table is found, and we will not get an event for
                 * the disk.
                fd = open(udev_device_get_devnode(dev), O_RDONLY|O_CLOEXEC|O_NOFOLLOW|O_NONBLOCK);
                if (fd >= 0) {
                        r = flock(fd, LOCK_EX|LOCK_NB);
                        if (r >= 0)
                                r = ioctl(fd, BLKRRPART, 0);

And i used gdb to verify my thought:

(gdb) p *dev
$4 = {udev = 0x7f4282d58030, parent_device = 0x0, syspath = 0x7f4282dac200 "/sys/devices/pseudo_0/adapter0/host24/target24:0:0/24:0:0:0/block/sdb", 
  devpath = 0x7f4282dac204 "/devices/pseudo_0/adapter0/host24/target24:0:0/24:0:0:0/block/sdb", sysname = 0x7f4282dabb60 "sdb", sysnum = 0x0, devnode = 0x7f4282dabd00 "/dev/sdb", devnode_mode = 0, 
  devnode_uid = 0, devnode_gid = 0, subsystem = 0x0, devtype = 0x7f4282dabc60 "disk", driver = 0x0, action = 0x0, devpath_old = 0x0, id_filename = 0x0, envp = 0x0, monitor_buf = 0x0, monitor_buf_len = 0, 
  devlinks_list = {udev = 0x7f4282d58030, node = {next = 0x7f4282dac2e8, prev = 0x7f4282dac2e8}, entries = 0x0, entries_cur = 0, entries_max = 0, unique = true}, properties_list = {udev = 0x7f4282d58030, 
    node = {next = 0x7f4282dabda0, prev = 0x7f4282dabea0}, entries = 0x7f4282dacec0, entries_cur = 5, entries_max = 64, unique = true}, sysattr_value_list = {udev = 0x7f4282d58030, node = {
      next = 0x7f4282dac348, prev = 0x7f4282dac348}, entries = 0x0, entries_cur = 0, entries_max = 0, unique = true}, sysattr_list = {udev = 0x7f4282d58030, node = {next = 0x7f4282dac378, 
      prev = 0x7f4282dac378}, entries = 0x0, entries_cur = 0, entries_max = 0, unique = false}, tags_list = {udev = 0x7f4282d58030, node = {next = 0x7f4282dac3a8, prev = 0x7f4282dac3a8}, entries = 0x0, 
    entries_cur = 0, entries_max = 0, unique = true}, seqnum = 0, usec_initialized = 0, devlink_priority = 0, refcount = 1, devnum = 2064, ifindex = 0, watch_handle = -1, maj = 0, min = 0, parent_set = false, 
  subsystem_set = false, devtype_set = true, devlinks_uptodate = false, envp_uptodate = false, tags_uptodate = false, driver_set = false, info_loaded = false, db_loaded = false, uevent_loaded = true, 
  is_initialized = false, sysattr_list_read = false, db_persist = false}
(gdb) n
721	            streq_ptr("disk", udev_device_get_devtype(dev)) &&
(gdb) n
720	        if (streq_ptr("block", udev_device_get_subsystem(dev)) &&
(gdb) n
722	            !startswith(udev_device_get_sysname(dev), "dm-")) {
(gdb) n
721	            streq_ptr("disk", udev_device_get_devtype(dev)) &&
(gdb) n
726	                struct udev *udev = udev_device_get_udev(dev);
(gdb) n
736	                fd = open(udev_device_get_devnode(dev), O_RDONLY|O_CLOEXEC|O_NOFOLLOW|O_NONBLOCK);
(gdb) n
726	                struct udev *udev = udev_device_get_udev(dev);
(gdb) n
736	                fd = open(udev_device_get_devnode(dev), O_RDONLY|O_CLOEXEC|O_NOFOLLOW|O_NONBLOCK);
(gdb) n
737	                if (fd >= 0) {
(gdb) n
736	                fd = open(udev_device_get_devnode(dev), O_RDONLY|O_CLOEXEC|O_NOFOLLOW|O_NONBLOCK);
(gdb) n
737	                if (fd >= 0) {
(gdb) n
738	                        r = flock(fd, LOCK_EX|LOCK_NB);
(gdb) n
739	                        if (r >= 0)
(gdb) n
738	                        r = flock(fd, LOCK_EX|LOCK_NB);
(gdb) n
739	                        if (r >= 0)
(gdb) n
740	                                r = ioctl(fd, BLKRRPART, 0);
(gdb) n

And you can just use partprobe to reproduce this strange device disappear. But i don't know why kernel will make iscsi device disappear during re-read partition table :(
Comment 19 Luyao Huang 2016-08-31 04:20:50 EDT
I have checked why this happened (not quite sure):

Libvirt call libparted api ped_disk_new() in partedhelper, and ped_disk_new() will open() and close() that scsi device (/dev/sdb). Then kernel will send a inotify event to systemd-udevd since systemd-udevd have call inotify_add_watch() to watch file close event to that device. After that, systemd-udevd will use ioctl to re-read the partition table (comment 18). Finally, Kernel will call rescan_partitions() and at this time the scsi partition will disappear and reappear (i am not sure which function will make partition disappear, maybe delete_partition() ?).

This problem cannot be reproduced on RHEL6 since there is no systemd in RHEL6.
Comment 20 John Ferlan 2016-09-08 10:16:35 EDT
I'm moving to 7.4 and reassigning to systemd for additional triage based upon Comment 18 & 19.  If this becomes "this is how it works", then perhaps we can add retry logic in libvirt, but let's get that answer first.
Comment 21 Tyson Charbonnier 2017-04-03 12:10:54 EDT
I am seeing this same issue with 100% reproducibility. 

in /var/log/messages I can see for example the disk go sdb, then sdb1, then back to sdb. This does not happen in RHEL6. Are there any work arounds? I see that people are reporting 50% repo, but I cannot get this to work at all.

Note You need to log in before you can comment on or make changes to this bug.