Description of problem:

Unexpected libguestfs behaviour has been detected when trying to inspect a thin provisioned (qcow2) disk image on a RHEV Hypervisor (part of a RHEV 3.1 Highly Available cluster). Libguestfs handles preallocated disk images fine, but cannot launch with a thin provisioned disk. Richard W. M. Jones suggests abandoning the use of relative paths for backing disks (https://www.redhat.com/archives/libguestfs/2014-January/msg00100.html).

Version-Release number of selected component (if applicable):
ovirt-node-2.5.0-17.el6_4.3.noarch
libguestfs-1.16.34-2.el6.x86_64
qemu-img-rhev-0.12.1.2-2.355.el6_4.2.x86_64

How reproducible:

Steps to Reproduce:
1. Deploy a RHEV HA cluster with two RHEV-M nodes, two RHEV-H hypervisors and one iSCSI storage domain, and create a VM with a thin provisioned disk image from which the system will boot;
2. Get the disk's image_id (from the Python oVirt SDK API, for instance);
3. Go to the hypervisor the VM is running on and make sure that a symbolic link with the previously obtained image_id exists in the /dev/mapper directory;
4. Try to execute the following code:

#!/usr/bin/env python
import guestfs
g = guestfs.GuestFS()
g.add_drive_opts("/dev/mapper/<some_constant>-<disk_image_id>", readonly=1)
g.launch()

Actual results:

Libguestfs failure (full debugging info - libguestfs-test-tool.log - is attached to this message):

could not open disk image /dev/mapper/1a9aa971--f81f--4ad8--932f--607034c924fc-666faa62--da73--4465--aed2--912119fcf67f: No such file or directory
libguestfs: child_cleanup: 0x23dc5d0: child process died

Check the disk image with `qemu-img info` and you will see that the backing file field contains a relative path which does not exist on the hypervisor's file system. This probably breaks the libguestfs session.

Expected results:

Correct handling of the disk image by libguestfs.

Additional info:

Full research on this topic is available at https://www.redhat.com/archives/libguestfs/2014-January/msg00094.html
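For completeness, a minimal sketch of the `qemu-img info` check mentioned under "Actual results" - it just prints the backing file line so the dangling relative path can be seen before handing the image to libguestfs (the qemu-img path and the positional image argument are as used elsewhere in this report; the script itself is only an illustration):

#!/usr/bin/env python
# Minimal sketch: print the "backing file" line from `qemu-img info` so the
# relative backing path can be inspected before calling g.launch().
# Assumes qemu-img lives in /usr/bin and the image path is the first argument.
import subprocess
import sys

p = subprocess.Popen(["/usr/bin/qemu-img", "info", sys.argv[1]],
                     stdout=subprocess.PIPE)
out = p.communicate()[0]
for line in out.splitlines():
    if line.startswith("backing file"):
        print line    # e.g. "backing file: ../<image_group_id>/<parent_volume_id>"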
Created attachment 850579 [details]
Full debugging log of the failed libguestfs launch
Created attachment 850581 [details]
libguestfs-test-tool's output (OK - it's working correctly)
The basic problem is you shouldn't open the /dev/mapper/...
file directly.  You should open some other path so that the
relative path to the backing file can be resolved.

Try:

  find / 666faa62-da73-4465-aed2-912119fcf67f

Open the one which *isn't* /dev/mapper.
(In reply to Richard W.M. Jones from comment #3)
> The basic problem is you shouldn't open the /dev/mapper/...
> file directly.  You should open some other path so that the
> relative path to the backing file can be resolved.
>
> Try:
>
>   find / 666faa62-da73-4465-aed2-912119fcf67f

Grr, I mean:

  find / -name 666faa62-da73-4465-aed2-912119fcf67f

> Open the one which *isn't* /dev/mapper.
(In reply to Richard W.M. Jones from comment #3)
> The basic problem is you shouldn't open the /dev/mapper/...
> file directly.  You should open some other path so that the
> relative path to the backing file can be resolved.
>
> Try:
>
>   find / 666faa62-da73-4465-aed2-912119fcf67f
>
> Open the one which *isn't* /dev/mapper.

Hello, Richard. First I checked the starting directory of the qemu processes (as you suggested in https://www.redhat.com/archives/libguestfs/2014-January/msg00100.html, but assuming that you meant RHEV-H):

[root@rhevh1 /]# pids=`ps aux | grep qemu | gawk '{print $2}'`
[root@rhevh1 /]# for i in ${pids}; do ll /proc/$i/cwd; done
lrwxrwxrwx. 1 qemu qemu 0 2014-01-13 09:04 /proc/5486/cwd -> /
ls: cannot read symbolic link /proc/6175/cwd: No such file or directory
lrwxrwxrwx. 1 root root 0 2014-01-13 09:04 /proc/6175/cwd
lrwxrwxrwx. 1 qemu qemu 0 2014-01-13 09:04 /proc/9253/cwd -> /
lrwxrwxrwx. 1 qemu qemu 0 2013-11-22 10:16 /proc/13747/cwd -> /
lrwxrwxrwx. 1 qemu qemu 0 2014-01-17 07:50 /proc/15217/cwd -> /
lrwxrwxrwx. 1 qemu qemu 0 2014-01-13 09:04 /proc/17936/cwd -> /
lrwxrwxrwx. 1 qemu qemu 0 2014-01-13 09:04 /proc/19676/cwd -> /
lrwxrwxrwx. 1 qemu qemu 0 2013-12-03 12:19 /proc/24174/cwd -> /
ls: cannot access /proc/25508/cwd: No such file or directory
lrwxrwxrwx. 1 qemu qemu 0 2014-01-14 11:53 /proc/32161/cwd -> /

So the following test script will start from the "/" directory:

#!/usr/bin/env python
import guestfs
import sys

g = guestfs.GuestFS()
g.add_drive_opts(sys.argv[1], readonly = 1)
g.launch()
g.shutdown()
g.close()

Now I am searching for files with the specified disk image id over the RHEV-H filesystem:

[root@rhevh1 integrity]# find / -name 666faa62-da73-4465-aed2-912119fcf67f
/dev/1a9aa971-f81f-4ad8-932f-607034c924fc/666faa62-da73-4465-aed2-912119fcf67f
/var/lib/stateless/writable/rhev/data-center/mnt/blockSD/1a9aa971-f81f-4ad8-932f-607034c924fc/images/6439863f-2d4e-48ae-a150-f9054650789c/666faa62-da73-4465-aed2-912119fcf67f
/rhev/data-center/mnt/blockSD/1a9aa971-f81f-4ad8-932f-607034c924fc/images/6439863f-2d4e-48ae-a150-f9054650789c/666faa62-da73-4465-aed2-912119fcf67f

Unfortunately, none of these files could be handled correctly by libguestfs (full debug output is attached to this thread as the "failed_launches_with_other_files" file).
Created attachment 851487 [details]
Failed libguestfs launches with different filenames of the inspected disk image
Vitaly, I'm assuming you have an iSCSI storage domain? That would mean that all the LVs in the qcow chain need to be activated first for this to work (regardless of relative paths). Once that is done, you would need to run from /dev/vg_name/.
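A rough sketch of what activating the chain might look like - the VG name and volume UUID below are placeholders taken from this report, the full chain would normally be obtained from the engine/SDK, and lvchange is part of lvm2:

#!/usr/bin/env python
# Sketch only: activate every LV of a qcow chain so that /dev/<vg>/<lv>
# exists on the hypervisor before libguestfs is pointed at the leaf volume.
# The VG name and volume list are placeholders from this report.
import subprocess

vg_name = "1a9aa971-f81f-4ad8-932f-607034c924fc"
chain = ["666faa62-da73-4465-aed2-912119fcf67f"]   # leaf volume + its backing volumes

for lv in chain:
    subprocess.check_call(["lvchange", "-ay", "%s/%s" % (vg_name, lv)])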
Hello, Ayal, yes, I do have an iSCSI storage domain. I assume that activating the qcow chain can be achieved by powering all the machines up (see the all_machines_up.png attachment - 11 of 17 VMs are running on the rhevh1.ksa1.test host). Here is the directory you specified (I placed my script test.py directly into this folder):

[root@rhevh1 1a9aa971-f81f-4ad8-932f-607034c924fc]# pwd
/dev/1a9aa971-f81f-4ad8-932f-607034c924fc
[root@rhevh1 1a9aa971-f81f-4ad8-932f-607034c924fc]# ll
total 4
lrwxrwxrwx. 1 root root 8 2014-01-14 11:32 03e533ab-2ddb-4079-83cb-4dc432b27ec2 -> ../dm-26
lrwxrwxrwx. 1 root root 8 2013-09-30 09:39 0e8f0bbe-b834-4560-9f38-7ab513d71ae2 -> ../dm-27
lrwxrwxrwx. 1 root root 8 2013-10-28 09:22 139e30ea-be93-4e74-9fa3-7f01fd072ae0 -> ../dm-33
lrwxrwxrwx. 1 root root 8 2013-11-13 08:25 200a570c-69a3-48aa-a420-db5c19490413 -> ../dm-38
lrwxrwxrwx. 1 root root 8 2014-01-15 15:05 2374b68a-1ca0-434f-b696-cf5ac9c1e001 -> ../dm-42
lrwxrwxrwx. 1 root root 8 2013-11-20 11:01 28c54d35-c858-4597-bc61-72f3a0653f97 -> ../dm-43
lrwxrwxrwx. 1 root root 8 2014-01-14 11:29 2d2127c6-ef68-467a-a8ca-029813831dc1 -> ../dm-24
lrwxrwxrwx. 1 root root 8 2013-09-30 10:09 3843db5d-7061-4a8b-82e1-b4c5a81f1da4 -> ../dm-29
lrwxrwxrwx. 1 root root 8 2013-09-19 11:36 428bf4ae-054a-4b91-aaf0-6d3bc571c742 -> ../dm-25
lrwxrwxrwx. 1 root root 8 2013-11-07 07:39 584cfe53-7d40-4bcf-98ce-08a976cf0434 -> ../dm-37
lrwxrwxrwx. 1 root root 8 2013-11-20 10:59 59bd91e8-1055-4f55-84d6-d6e712ff0771 -> ../dm-41
lrwxrwxrwx. 1 root root 8 2014-01-15 14:56 62bfb2d1-4494-4b09-ae39-31abfdd8afd4 -> ../dm-32
lrwxrwxrwx. 1 root root 8 2013-12-18 08:28 666faa62-da73-4465-aed2-912119fcf67f -> ../dm-30
lrwxrwxrwx. 1 root root 8 2014-01-15 14:56 691f3ddb-89bb-4a57-9a82-d47d75c3a44b -> ../dm-39
lrwxrwxrwx. 1 root root 8 2013-09-18 15:57 6f607159-ebf8-40db-bb53-04960f64c0be -> ../dm-21
lrwxrwxrwx. 1 root root 8 2013-09-18 15:57 6f95a933-887a-42f0-88e0-9f103e2f0036 -> ../dm-20
lrwxrwxrwx. 1 root root 8 2014-01-15 15:05 6ff643dc-f726-4281-ba59-5bc5932e14c9 -> ../dm-36
lrwxrwxrwx. 1 root root 8 2013-09-18 15:56 82ff231a-2d80-4352-88f9-4340ce0d9f27 -> ../dm-17
lrwxrwxrwx. 1 root root 8 2013-10-28 10:51 8650fb24-e434-4f4a-a2b6-cd48c0bc76b5 -> ../dm-34
lrwxrwxrwx. 1 root root 8 2013-12-06 11:52 88cb053d-7fbb-446d-803e-86f493da8e62 -> ../dm-23
lrwxrwxrwx. 1 root root 8 2013-11-29 06:06 9c93e8e9-d0de-42a4-8689-c0c9194ae5e6 -> ../dm-47
lrwxrwxrwx. 1 root root 8 2013-11-07 07:39 9d7af9ea-fe2f-48cf-8399-4153d48ba164 -> ../dm-35
lrwxrwxrwx. 1 root root 8 2014-01-15 14:41 b54aa662-8b38-4683-9bad-0da806f4e661 -> ../dm-31
lrwxrwxrwx. 1 root root 8 2013-09-30 10:04 b81b5527-4553-4a98-9704-855a454d3e0e -> ../dm-28
lrwxrwxrwx. 1 root root 8 2014-01-14 11:40 c4ccabaf-9315-4e25-a1da-779921060cbf -> ../dm-18
lrwxrwxrwx. 1 root root 8 2013-09-18 15:56 cbe36298-6397-4ffa-ba8c-5f64e90023e5 -> ../dm-19
lrwxrwxrwx. 1 root root 8 2013-11-20 10:59 cc6e4400-7c98-4170-9075-5f5790dfcff3 -> ../dm-40
lrwxrwxrwx. 1 root root 8 2013-11-20 11:03 ce545c26-a278-4fe7-8913-9b138c1bcc60 -> ../dm-46
lrwxrwxrwx. 1 root root 8 2013-11-20 11:03 cff558e2-9b68-4020-b6b5-b8dc07872785 -> ../dm-45
lrwxrwxrwx. 1 root root 8 2013-09-27 10:43 ecba6bc7-9fd0-438b-8803-fcd0db94e036 -> ../dm-22
lrwxrwxrwx. 1 root root 8 2013-09-18 15:51 ids -> ../dm-12
lrwxrwxrwx. 1 root root 8 2013-09-18 15:51 inbox -> ../dm-14
lrwxrwxrwx. 1 root root 8 2013-09-18 15:51 leases -> ../dm-13
lrwxrwxrwx. 1 root root 8 2013-09-18 15:51 master -> ../dm-16
lrwxrwxrwx. 1 root root 8 2013-09-18 15:51 metadata -> ../dm-11
lrwxrwxrwx. 1 root root 8 2013-09-18 15:51 outbox -> ../dm-15
-rwxr-xr-x. 1 root root 302 2014-01-20 09:25 test.py

Now I am trying to start this rough-and-ready script:

#!/usr/bin/env python
import subprocess
import guestfs
import sys

# qemu-img check
cmd = ["/usr/bin/qemu-img", "info"]
cmd.append(sys.argv[1])
p = subprocess.Popen(cmd, stdout = subprocess.PIPE)
out = p.stdout.read()
if "qcow2" in out:
    type = "qcow2"
elif "raw" in out:
    type = "raw"
else:
    type = "another"

# guestfs routines
g = guestfs.GuestFS()
g.add_drive_opts(sys.argv[1], readonly = 1)
try:
    g.launch()
except Exception, e:
    print "id: ", sys.argv[1], " ", type, " launch failed"
else:
    print "id: ", sys.argv[1], " ", type, " os: ", g.inspect_os()
finally:
    g.shutdown()
    g.close()

And this is the output of the script launched in a loop over /dev/<vg-name>:

[root@rhevh1 1a9aa971-f81f-4ad8-932f-607034c924fc]# for f in ./*; do ./test.py $f; done
id: ./03e533ab-2ddb-4079-83cb-4dc432b27ec2 raw os: ['/dev/vda1']
id: ./0e8f0bbe-b834-4560-9f38-7ab513d71ae2 raw os: []
id: ./139e30ea-be93-4e74-9fa3-7f01fd072ae0 raw os: ['/dev/vg_koji/lv_root']
id: ./200a570c-69a3-48aa-a420-db5c19490413 raw os: ['/dev/vg_koji/lv_root']
id: ./2374b68a-1ca0-434f-b696-cf5ac9c1e001 qcow2 launch failed
id: ./28c54d35-c858-4597-bc61-72f3a0653f97 raw os: []
id: ./2d2127c6-ef68-467a-a8ca-029813831dc1 raw os: []
id: ./3843db5d-7061-4a8b-82e1-b4c5a81f1da4 raw os: []
id: ./428bf4ae-054a-4b91-aaf0-6d3bc571c742 raw os: ['/dev/vda1']
id: ./584cfe53-7d40-4bcf-98ce-08a976cf0434 raw os: []
id: ./59bd91e8-1055-4f55-84d6-d6e712ff0771 raw os: []
id: ./62bfb2d1-4494-4b09-ae39-31abfdd8afd4 raw os: ['/dev/vda1']
id: ./666faa62-da73-4465-aed2-912119fcf67f qcow2 launch failed
id: ./691f3ddb-89bb-4a57-9a82-d47d75c3a44b raw os: []
id: ./6f607159-ebf8-40db-bb53-04960f64c0be qcow2 launch failed
id: ./6f95a933-887a-42f0-88e0-9f103e2f0036 raw os: ['/dev/vda1']
id: ./6ff643dc-f726-4281-ba59-5bc5932e14c9 qcow2 launch failed
id: ./82ff231a-2d80-4352-88f9-4340ce0d9f27 raw os: ['/dev/vda1']
id: ./8650fb24-e434-4f4a-a2b6-cd48c0bc76b5 raw os: []
id: ./88cb053d-7fbb-446d-803e-86f493da8e62 raw os: ['/dev/vda1']
id: ./9c93e8e9-d0de-42a4-8689-c0c9194ae5e6 qcow2 os: ['/dev/vg_test1/lv_root']
id: ./9d7af9ea-fe2f-48cf-8399-4153d48ba164 raw os: ['/dev/vg_kojit/lv_root']
id: ./b54aa662-8b38-4683-9bad-0da806f4e661 raw os: ['/dev/vda2']
id: ./b81b5527-4553-4a98-9704-855a454d3e0e qcow2 launch failed
id: ./c4ccabaf-9315-4e25-a1da-779921060cbf qcow2 launch failed
id: ./cbe36298-6397-4ffa-ba8c-5f64e90023e5 raw os: ['/dev/vda1']
id: ./cc6e4400-7c98-4170-9075-5f5790dfcff3 raw os: []
id: ./ce545c26-a278-4fe7-8913-9b138c1bcc60 raw os: []
id: ./cff558e2-9b68-4020-b6b5-b8dc07872785 raw os: ['/dev/vda1']
id: ./ecba6bc7-9fd0-438b-8803-fcd0db94e036 qcow2 launch failed

You can see that launching this script from that folder does not affect the qcow2 issue. However, Richard Jones (https://www.redhat.com/archives/libguestfs/2014-January/msg00151.html) and Federico Simoncelli (https://www.redhat.com/archives/libguestfs/2014-January/msg00153.html) have suggested other locations to run this script from; I'm going to test them tonight.

Please correct me if I misunderstood you, and thank you in advance.

The attachments:
1. all_machines_up.png - to confirm that all VMs are up and running;
2. VM_bootable_disks_list - to compare the /dev/<vg-name> filenames with the disk image IDs received from the Python oVirt SDK.
Created attachment 852710 [details]
to confirm that all VMs are up and running
Created attachment 852712 [details]
to compare /dev/<vg-name> filenames with disk image IDs received from the Python oVirt SDK
(In reply to Vitaly Isaev from comment #10)
> Created attachment 852712 [details]
> to compare /dev/<vg-name> filenames with disk image IDs received from the
> Python oVirt SDK

Please note that running qemu-img on a disk image while the VM is running is *not* supported.

Can you temporarily switch SELinux to permissive mode? (Or just check whether you have any SELinux AVC alerts.)
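For the AVC check, something along these lines should do (assuming auditd writes to the default /var/log/audit/audit.log on the hypervisor; `ausearch -m avc` would give the same information):

#!/usr/bin/env python
# Sketch: list SELinux AVC denials from the audit log.
# Assumes the default audit log location on RHEV-H.
with open("/var/log/audit/audit.log") as f:
    for line in f:
        if "avc: " in line and "denied" in line:
            print line.strip()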
(In reply to Ayal Baron from comment #11)
>
> Can you temporarily switch SELinux to permissive mode? (Or just check
> whether you have any SELinux AVC alerts.)

Ayal, unfortunately this did not help (the `qemu-img` check is removed from the test):

[root@rhevh1 1a9aa971-f81f-4ad8-932f-607034c924fc]# setenforce Permissive
[root@rhevh1 1a9aa971-f81f-4ad8-932f-607034c924fc]# getenforce
Permissive
[root@rhevh1 1a9aa971-f81f-4ad8-932f-607034c924fc]# for f in ./*; do ./test.py $f; done
id: ./03e533ab-2ddb-4079-83cb-4dc432b27ec2 os: ['/dev/vda1']
id: ./0e8f0bbe-b834-4560-9f38-7ab513d71ae2 os: []
id: ./139e30ea-be93-4e74-9fa3-7f01fd072ae0 os: ['/dev/vg_koji/lv_root']
id: ./200a570c-69a3-48aa-a420-db5c19490413 os: ['/dev/vg_koji/lv_root']
id: ./2374b68a-1ca0-434f-b696-cf5ac9c1e001 launch failed
id: ./28c54d35-c858-4597-bc61-72f3a0653f97 os: []
id: ./2d2127c6-ef68-467a-a8ca-029813831dc1 os: []
id: ./3843db5d-7061-4a8b-82e1-b4c5a81f1da4 os: []
id: ./428bf4ae-054a-4b91-aaf0-6d3bc571c742 os: ['/dev/vda1']
id: ./584cfe53-7d40-4bcf-98ce-08a976cf0434 os: []
id: ./59bd91e8-1055-4f55-84d6-d6e712ff0771 os: []
id: ./62bfb2d1-4494-4b09-ae39-31abfdd8afd4 os: ['/dev/vda1']
id: ./666faa62-da73-4465-aed2-912119fcf67f launch failed
id: ./691f3ddb-89bb-4a57-9a82-d47d75c3a44b os: []
id: ./6f607159-ebf8-40db-bb53-04960f64c0be launch failed
id: ./6f95a933-887a-42f0-88e0-9f103e2f0036 os: ['/dev/vda1']
id: ./6ff643dc-f726-4281-ba59-5bc5932e14c9 launch failed
id: ./82ff231a-2d80-4352-88f9-4340ce0d9f27 os: ['/dev/vda1']
id: ./8650fb24-e434-4f4a-a2b6-cd48c0bc76b5 os: []
id: ./88cb053d-7fbb-446d-803e-86f493da8e62 os: ['/dev/vda1']
id: ./9c93e8e9-d0de-42a4-8689-c0c9194ae5e6 os: ['/dev/vg_test1/lv_root']
id: ./9d7af9ea-fe2f-48cf-8399-4153d48ba164 os: ['/dev/vg_kojit/lv_root']
id: ./b54aa662-8b38-4683-9bad-0da806f4e661 os: ['/dev/vda2']
id: ./b81b5527-4553-4a98-9704-855a454d3e0e launch failed
id: ./c4ccabaf-9315-4e25-a1da-779921060cbf launch failed
id: ./cbe36298-6397-4ffa-ba8c-5f64e90023e5 os: ['/dev/vda1']
id: ./cc6e4400-7c98-4170-9075-5f5790dfcff3 os: []
id: ./ce545c26-a278-4fe7-8913-9b138c1bcc60 os: []
id: ./cff558e2-9b68-4020-b6b5-b8dc07872785 os: ['/dev/vda1']
id: ./ecba6bc7-9fd0-438b-8803-fcd0db94e036 launch failed

I noticed (see comment 8) that one of the qcow2 disks (unlike the others) could still be launched with libguestfs. The difference is the absence of the backing file field in its `qemu-img info` output (all the other qcow2 drives have some relative path in this field):

[root@rhevh1 1a9aa971-f81f-4ad8-932f-607034c924fc]# qemu-img info 9c93e8e9-d0de-42a4-8689-c0c9194ae5e6
image: 9c93e8e9-d0de-42a4-8689-c0c9194ae5e6
file format: qcow2
virtual size: 20G (21474836480 bytes)
disk size: 0
cluster_size: 65536
Running against the /dev directory is still not going to work.

Please see Federico's comment:
https://www.redhat.com/archives/libguestfs/2014-January/msg00153.html

I'm afraid there is no simple answer here.
Hello, I would like to thank you all for your replies. I found a not-too-elegant workaround several days ago, and since there is no clear solution to my problem, I am posting it here. I doubt whether it will be useful to anyone, but here it is nevertheless.

The aim is to resolve every qcow2 disk image to a raw disk image recursively, following the chain of symbolic links named in the backing file field. The resolution is carried out all the way down to the /dev/dm-XX device. Richard Jones warned me about this approach (https://www.redhat.com/archives/libguestfs/2014-January/msg00151.html), but for me - surely I might be wrong - it produced better results with libguestfs:

<...>
    def check_qemu_output(self, filename):
        # Going to parse qemu-img output
        cmd = ["/usr/bin/qemu-img", "info"]
        cmd.append(filename)
        p = subprocess.Popen(cmd, stdout = subprocess.PIPE)
        out = p.stdout.read()
        if ("raw" in out):
            print("\t\tPreallocated disk is found")
            return self.check_os_system(os.path.realpath(filename))
        elif ("qcow2" in out and "backing file" in out):
            print("\t\tThin provisioned disk is found")
            line = out.split("\n")[5].split(":")[1]
            backing_file_broken_path = line.strip().split("/")[-1] if not line.endswith("(actual path") else line.rstrip("(actual path").strip().split("/")[-1]
            backing_file_filename = backing_file_broken_path.split("/")[-1]
            found = []
            b = re.compile(backing_file_filename)
            print("\t\tLooking for a backing file {0}".format(backing_file_filename))
            for r, d, f in os.walk(self.searchdir):
                for file in f:
                    if b.match(file):
                        found.append(os.path.join(r, file))
            devices = map(os.readlink, found)
            print(found, devices)
            unique_devices = set(devices)
            if len(unique_devices) != 1:
                print("\t\t\tWrong number of symbolic links")
                return None
            else:
                # recursive call
                unique_device = unique_devices.pop()
                print("\t\t\tResolved into {0}".format(unique_device))
                return self.check_qemu_output(unique_device)
        else:
            return None
<...>

This function finally returns the /dev/dm-XX block device which corresponds to the raw disk image underlying each qcow2 image. In most cases libguestfs launches successfully with the discovered block devices.
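For context, a hypothetical skeleton showing how the method above might be wired up. check_os_system() and searchdir belong to my own class and are not shown here, so treat the names and the stub below as placeholders only:

#!/usr/bin/env python
# Hypothetical skeleton around check_qemu_output(); names are placeholders.
import os
import re
import subprocess
import sys

class ImageResolver(object):
    def __init__(self, searchdir):
        # directory that holds the storage-domain VG symlinks to /dev/dm-XX
        self.searchdir = searchdir

    def check_os_system(self, device):
        # placeholder: the original method performs the libguestfs
        # inspection of the resolved block device
        return device

    def check_qemu_output(self, filename):
        # the method shown in the comment above would be pasted here;
        # this stub just forwards to check_os_system() for illustration
        return self.check_os_system(os.path.realpath(filename))

if __name__ == "__main__":
    resolver = ImageResolver("/dev/1a9aa971-f81f-4ad8-932f-607034c924fc")
    print resolver.check_qemu_output(sys.argv[1])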
This should be solved by the prepare/teardownImage API we introduced with bug 1092166.

The volumes/images must be accessed through the symlinks in /rhev/data-center/...
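To illustrate, only as a sketch - the vdsClient verb names and the UUID order below are an assumption based on bug 1092166 and should be checked against your vdsm version with `vdsClient -s 0 --help`; the UUIDs are placeholders:

#!/usr/bin/env python
# Sketch only: prepare the image through vdsm before inspection and tear it
# down afterwards.  Verb names and argument order are assumptions; verify
# them with `vdsClient -s 0 --help` on your vdsm version.
import subprocess

sp_uuid  = "<storage_pool_uuid>"
sd_uuid  = "<storage_domain_uuid>"
img_uuid = "<image_group_uuid>"
vol_uuid = "<volume_uuid>"

subprocess.check_call(["vdsClient", "-s", "0", "prepareImage",
                       sp_uuid, sd_uuid, img_uuid, vol_uuid])
try:
    # inspect the disk via its /rhev/data-center/... symlink here
    pass
finally:
    subprocess.check_call(["vdsClient", "-s", "0", "teardownImage",
                           sp_uuid, sd_uuid, img_uuid, vol_uuid])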
(In reply to Federico Simoncelli from comment #19)
> This should be solved by the prepare/teardownImage API we introduced with
> bug 1092166.
>
> The volumes/images must be accessed through the symlinks in
> /rhev/data-center/...

Aren't we deprecating the /rhev/data-center symlinks going forward?
(In reply to Itamar Heim from comment #20)
> (In reply to Federico Simoncelli from comment #19)
> > This should be solved by the prepare/teardownImage API we introduced with
> > bug 1092166.
> >
> > The volumes/images must be accessed through the symlinks in
> > /rhev/data-center/...
>
> Aren't we deprecating the /rhev/data-center symlinks going forward?

Fede, since bug 1092166 is already ON_QA - what's left to do here?
(In reply to Federico Simoncelli from comment #19)
> This should be solved by the prepare/teardownImage API we introduced with
> bug 1092166.
>
> The volumes/images must be accessed through the symlinks in
> /rhev/data-center/...

Moving to ON_QA based on this comment, to be verified by QA.
Fede, can you provide some doctext for this one please?
Verified on oVirt 3.6.0.3

Steps taken:
- executed the script from comment #1;
- verified that the script executed successfully;
- verified the creation of a new qemu process.
oVirt 3.6.0 was released on November 4th, 2015 and should fix this issue. If problems still persist, please open a new BZ and reference this one.