Created attachment 1700604 [details]
failed export log

Description of problem:

Every day at 01:00 AM we perform a daily backup of many VMs hosted on multiple hosts (all with oVirt 4.3) using a custom script written in Python 3 (using the SDK), and everything works "almost" fine. There is one single VM (Windows Server 2016) with a 600 GB disk (the real disk usage is about 150 GB), hosted on a single node, that shows a strange behaviour:

- The export starts a few seconds after the execution of the script; we use the "vm_service.export_to_path_on_host" method for exporting the VM as an OVA file.
- After a few minutes I see in the engine the command "START, DumpXmlsVDSCommand", which completes instantly both on the host side and the engine side, and that's fine.
- After several hours (more than 10) I see the logs about the failed export task (see failed_export.log).
- This task seems to block any other export tasks, leaving them pending. There were about 5 export tasks and 1 import template task, and as soon as we killed the following process, the other 4 export tasks and the import one completed within a few minutes.

The process that we killed on the oVirt host was:

vdsm 25338 25332 99 04:14 pts/0 07:40:09 qemu-img measure -O qcow2 /rhev/data-center/mnt/_data/6775c41c-7d67-451b-8beb-4fd086eade2e/images/a084fa36-0f93-45c2-a323-ea9ca2d16677/55b3eac5-05b2-4bae-be50-37cde705069

A strace -p of the PID 25338 showed the following lines:

lseek(11, 3056795648, SEEK_DATA) = 3056795648
lseek(11, 3056795648, SEEK_HOLE) = 13407092736
lseek(14, 128637468672, SEEK_DATA) = 128637468672
lseek(14, 128637468672, SEEK_HOLE) = 317708828672
lseek(14, 128646250496, SEEK_DATA) = 128646250496
lseek(14, 128646250496, SEEK_HOLE) = 317708828672
lseek(14, 128637730816, SEEK_DATA) = 128637730816
lseek(14, 128637730816, SEEK_HOLE) = 317708828672
lseek(14, 128646774784, SEEK_DATA) = 128646774784
lseek(14, 128646774784, SEEK_HOLE) = 317708828672
lseek(14, 128646709248, SEEK_DATA) = 128646709248

By the way, this operation takes A LOT of time, more than ten hours before it fails... Is that normal?

The data domain is mounted on a local LVM device and the backup is written locally, to a second device (XFS).

Can you help us figure out what is going on? If any other information is needed I'll provide it ASAP.

Thank you for your patience and your time,
Regards
Francesco

ovirt-engine 4.3.10

How reproducible:
Perform an export of a big VM not using the UI but using the Python SDK, and try to perform other tasks on other VMs (even in a different DC), like add/delete snapshot, import template, deploy new VMs.

Steps to Reproduce:
1.
2.
3.

Actual results:
All the tasks hang, waiting for the end of the export task.

Expected results:
All the tasks should run fine and not get stuck waiting for the end of another, unrelated task.

Additional info:
I can provide any information you need to sort out this anomaly.
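For context, a minimal sketch of the kind of SDK call involved (assuming the ovirtsdk4 package; the engine URL, credentials, VM and host names below are placeholders, not taken from the actual backup script):

# Hypothetical minimal sketch, not the real backup script: export a VM as an
# OVA file to a directory on a host using the oVirt Python SDK.
import ovirtsdk4 as sdk
import ovirtsdk4.types as types

connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',   # placeholder
    username='admin@internal',                           # placeholder
    password='secret',                                    # placeholder
    ca_file='ca.pem',
)
try:
    system_service = connection.system_service()
    vms_service = system_service.vms_service()
    hosts_service = system_service.hosts_service()

    vm = vms_service.list(search='name=myvm')[0]          # placeholder VM name
    host = hosts_service.list(search='name=myhost')[0]    # placeholder host name

    # Ask the engine to export the VM as an OVA on the given host.
    vms_service.vm_service(vm.id).export_to_path_on_host(
        host=types.Host(id=host.id),
        directory='/backup/ova',                           # placeholder path
        filename='myvm.ova',
    )
finally:
    connection.close()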
Hey Francesco,

Can you please add the full engine log? Also the log from the host that performs the operation (vdsm.log)?
The documentation text flag should only be set after 'doc text' field is provided. Please provide the documentation text and set the flag to '?' again.
(In reply to Eyal Shenitzky from comment #1)
> Hey Francesco,
>
> Can you please add the full engine log? Also the log from the host that
> performs the operation (vdsm.log)?

Hi Eyal, and sorry for the delay. The system tries to perform the backup on a daily basis, so I'll attach today's engine log and vdsm.log, even though today at 2020/07/15 08:28:34 we launched it manually and stopped it after an hour.

The backup is performed via a Python script on the oVirt host named pvt-d0a05-001.serverprovider.tld. The VM is named shpcvm-1fae8.serverprovider.tld. The data domain is named data-pvt-d0a05.

Yesterday we also tried to perform an "export as OVA" via the oVirt engine GUI AND perform an operation on a snapshot on a different VM/DC/cluster, and still the task hung until we killed the backup of the VM shpcvm-1fae8.serverprovider.tld. We tried it today as well; if you search the engine.log for the ID 1462a9cc-9bde-47f5-a198-45308ef3d1b2 you'll find all the occurrences of the snapshot task. You will notice a considerable time gap of more than half an hour (this VM is really really small, with no I/O at all, in a DC that is pretty unloaded as well):

2020-07-15 08:46:39,480+02 INFO [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (EE-ManagedThreadFactory-engine-Thread-79) [1462a9cc-9bde-47f5-a198-45308ef3d1b2] CommandAsyncTask::HandleEndActionResult [within thread]: Removing CommandMultiAsyncTasks object for entity '57603db3-11fb-4dcc-959d-bc07a6a9b81d'

2020-07-15 09:29:20,145+02 INFO [org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback] (EE-ManagedThreadFactory-engineScheduled-Thread-57) [1462a9cc-9bde-47f5-a198-45308ef3d1b2] Command 'CreateSnapshotForVm' (id: 'cd19d43b-4d1e-4c45-af25-8d351366ac6b') waiting on child command id: 'df5f3a60-394d-4643-b8b4-4e855b152324' type:'CreateSnapshotDisk' to complete

I attached the engine log and vdsm.log.
Created attachment 1701163 [details] full_engine.log_2020-15-07
Created attachment 1701165 [details] full_vdsm.log_15-07-2020
It's most probably a storage issue.
I think this is a duplicate of bug 1850660.

Looks like you should be able to defragment the file using xfs_fsr, and after that you may be able to set an extent size hint, to prevent future fragmentation, using:

xfs_io -c "extsize 1m" <imgfile>

I think a future qemu-img will set an extent size hint itself to avoid this issue.
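A small sketch of how these two steps could be scripted, under the assumptions that xfsprogs (xfs_fsr, xfs_io) is installed, that the volume is not being written to while it is defragmented, and that the image path below is a placeholder:

# Sketch only: wraps the xfs_fsr / xfs_io commands suggested above.
# The path is a placeholder; test on a copy first and run while the
# volume is idle (e.g. VM down).
import subprocess

image = "/rhev/data-center/mnt/_data/<sd-uuid>/images/<img-uuid>/<vol-uuid>"

# Defragment the single file (can take a long time on a large,
# heavily fragmented image).
subprocess.run(["xfs_fsr", "-v", image], check=True)

# Set a 1 MiB extent size hint so future allocations are less fragmented.
subprocess.run(["xfs_io", "-c", "extsize 1m", image], check=True)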
(In reply to Nir Soffer from comment #7)
> I think this is a duplicate of bug 1850660.
>
> Looks like you should be able to defragment the file using xfs_fsr,
> and after that you may be able to set an extent size hint, to prevent
> future fragmentation, using:
>
> xfs_io -c "extsize 1m" <imgfile>
>
> I think a future qemu-img will set an extent size hint itself to avoid
> this issue.

Hi Nir,

thank you for your advice, I'll try to defragment the file ASAP. But nonetheless, is it normal that every other task (not related to the actual storage/DC/cluster) hangs, waiting for the end of the export task?
(In reply to francesco from comment #8)
...
> But nonetheless, is it normal that every other task (not related to the
> actual storage/DC/cluster) hangs, waiting for the end of the export task?

When running a storage operation on a VM, it may need to lock some or all of the VM disks, and prevent changes in the VM state. The details depend on the specific operation.

On the vdsm side there is a limited number of tasks (10) for storage operations, so if the thread pool is busy, new tasks will be queued. In this case the export is not done in vdsm, so I don't see how it can affect this. I don't think operations on other VMs should be affected otherwise.

My guess is that this is related to the slow "qemu-img measure" command, and that there is a locking issue in the kernel.

Kevin, do you have a suggestion for how to debug this on the file system side?
I'm not sure if it's even worth spending much time looking into this part of the question, because it's clear that we have two other conditions that need fixing: The file fragmentation, which makes lseek() slow, and the unnecessary repeated lseek() calls that always return the same result. qemu-img measure should behave better since commit 69f47505ee6 (which is in QEMU 4.1.0). As for the fragmentation, this seems to increasingly become an issue. While the extent size hint will certainly bring some improvement for images created in QEMU 5.1 and later, this isn't applicable to existing images. Maybe you could automatically defragment images on local filesystems occasionally?
We will consider defragmentation or detecting fragmented files, but local file systems are not our main use case. Don't we have the same issue with a remote XFS file system on NFS or GlusterFS? In that case we don't have any way to fix the issue.
But local filesystems are the cases that get reported recently, so probably the most relevant ones in this context. The strictly correct answer to your question about NFS/GlusterFS is probably "depends on the server". As far as I know, NFS doesn't pass through O_DIRECT, so I would assume that most (if not all) NFS servers will actually use a writeback cache on their end, which greatly reduces the fragmentation. I'm not sure what GlusterFS does by default, but I wouldn't be surprised if it's configurable. If you want to give it a try, the fragmentation on a local XFS filesystem was actually very easy to reproduce with 'qemu-img bench' in write mode, with O_DIRECT enabled and writing every other block (not sure if the last condition is even necessary). You could run the same on some NFS or GlusterFS mounts and then check the fragmentation on the server.
Right, NFS is not an issue, but GlusterFS does pass O_DIRECT through, and the recommended configuration for oVirt is to use O_DIRECT on the server side. So this should be handled by GlusterFS.
Maybe an RFE to GlusterFS to implement the FS_IOC_FSGET/SETXATTR ioctls and pass them through to the server? This would be for the FUSE mount, of course. The native gluster driver would need to access the same with some libgfapi interface.
Nir, what do you think about Kevin's suggestion, are we OK with opening a bug on Gluster?
(In reply to Tal Nisan from comment #15) It is not related to this bug, but yes, Gluster needs to handle this issue.
(In reply to Nir Soffer from comment #7)
> I think this is a duplicate of bug 1850660.
>
> Looks like you should be able to defragment the file using xfs_fsr,
> and after that you may be able to set an extent size hint, to prevent
> future fragmentation, using:
>
> xfs_io -c "extsize 1m" <imgfile>
>
> I think a future qemu-img will set an extent size hint itself to avoid
> this issue.

Hi Nir,

during this week I'll try to perform the defrag and see if the qemu-img command becomes any faster. Before doing this, I'd like to ask:

- I guess the defrag should take a lot of time, maybe as long as the qemu-img measure takes (so, several hours)?
- Can I safely interrupt the defrag operation and continue at a later time?
- Is the extent size operation pretty much instant?

I ask because this is a production VM: I cannot shut it down whenever I want or keep it offline for too long, and I'm not able to reproduce the case in a test environment.

Regards,
Francesco
(In reply to francesco from comment #17)
> - I guess the defrag should take a lot of time, maybe as long as the
> qemu-img measure takes (so, several hours)?

I don't know, I never tried it. You should test first on another file to understand the performance.

Please check bug 1850660 for more info on how to reproduce this. This comment should help:
https://bugzilla.redhat.com/show_bug.cgi?id=1850660#c47

> - Can I safely interrupt the defrag operation and continue at a later time?

I don't know, check the manual.

> - Is the extent size operation pretty much instant?

It should be a configuration change, so it should be fast.
Francesco, based on this: > vdsm 25338 25332 99 04:14 pts/0 07:40:09 qemu-img measure -O qcow2 /rhev/data-center/mnt/_data/6775c41c-7d67-451b-8beb-4fd086eade2e/images/a084fa36-0f93-45c2-a323-ea9ca2d16677/55b3eac5-05b2-4bae-be50-37cde705069 This issue is about local XFS storage, right?
Hi Nir, I'm a colleague of Francesco.
Yes, we have this issue on a local XFS storage.

In a later post you say:
> On vdsm side there is a limited number of tasks (10) for storage operations
> so if the thread pool is busy, new tasks will be queued. In this case the
> export is not done in vdsm so I don't see how it can affect this.
> I don't think operations on other VMs should be affected otherwise.

All the stuck exports are in different datacenters: how can this issue be related to vdsm?
At the moment we have a lot of datacenters with a single-node host and a single local storage domain.
(In reply to Tommaso from comment #20)
> Hi Nir, I'm a colleague of Francesco.
> Yes, we have this issue on a local XFS storage.

Thanks for confirming.

> In a later post you say:
> > On vdsm side there is a limited number of tasks (10) for storage operations
> > so if the thread pool is busy, new tasks will be queued. In this case the
> > export is not done in vdsm so I don't see how it can affect this.
> > I don't think operations on other VMs should be affected otherwise.
>
> All the stuck exports are in different datacenters: how can this issue be
> related to vdsm?
> At the moment we have a lot of datacenters with a single-node host and a
> single local storage domain.

If you use one datacenter per host (local storage), a command stuck on one host should not affect commands running on other hosts. So this is likely an engine issue.

Also, it looks like you are using OVA export; this feature is not using vdsm at all.
Tal, see comment 20, this issue is about local XFS storage, not related to gluster. Since this issue is about ova export, I think the virt team should handle this.
(In reply to Nir Soffer from comment #21)
>
> If you use one datacenter per host (local storage), a command stuck on
> one host should not affect commands running on other hosts. So this is
> likely an engine issue.
>
> Also, it looks like you are using OVA export; this feature is not using
> vdsm at all.

Yes. In another test we launched 2 exports in 2 different datacenters. When one of them was performing an operation with the "qemu-img" command, the other task was waiting for the end of that operation to move on. If the tasks were A and B we saw:

A: qemu-img measure
B: qemu-img measure
A: qemu-img convert
B: qemu-img convert

and so on. No parallelization of these operations...
Based on Comment#22, looks like it's a virt issue. Please confirm.
(In reply to Nir Soffer from comment #22) > Tal, see comment 20, this issue is about local XFS storage, not related > to gluster. Since this issue is about ova export, I think the virt team > should handle this. Would the "measure" command be quicker when triggered by VDSM on an image that resides in that XFS storage?
(In reply to Gobinda Das from comment #24)
> Based on Comment#22, looks like it's a virt issue. Please confirm.

From our point of view it's a virt issue. The really big problem is the lack of parallelization of the exports. The slow measure operation is a minor issue if we can export more than one VM at the same time.
Based on Comment#26, moving this to Virt team.
Hi, we are now testing the upgrade from 4.3 to 4.4. We have the same issue on 4.4.2.
any update?
I would like to get an update on this too.

If you run export_vm_as_ova.py for vm1 and then for vm2 after one minute, they both initially start as jobs, but the "qemu-img convert" process for the second VM doesn't actually start until the first one has completed.

E.g. in engine events you see:

Starting to export Vm c8server as a Virtual Appliance  10/2/20 2:50:23 PM
Starting to export Vm c8client as a Virtual Appliance  10/2/20 2:51:41 PM

but only when the first one completes do you see on the host that the qemu-img process for the second one starts.

I see that the derived ansible job on the engine, named "/usr/share/ovirt-engine/playbooks/ovirt-ova-export.yml" and executed as this command:

ovirt 9534 1642 6 14:50 ? 00:00:42 /usr/bin/python2 /usr/bin/ansible-playbook --ssh-common-args=-F /var/lib/ovirt-engine/.ssh/config -v --private-key=/etc/pki/ovirt-engine/keys/engine_id_rsa --inventory=/tmp/ansible-inventory1162818200209914533 --extra-vars=target_directory="/save_ova/base/dump" --extra-vars=entity_type="vm" --extra-vars=ova_name="c8server.ova" --extra-vars=ovirt_ova_pack_ovf=" . . . ovf definition . . ." /usr/share/ovirt-engine/playbooks/ovirt-ova-export.yml

is only present for the first VM, so could it be that the lock is perhaps at the DB level? E.g. I see:

engine=# select * from job order by start_time desc;

 job_id | action_type | description | status | owner_id | visible | start_time | end_time | last_update_time | correlation_id | is_external | is_auto_cleared | engine_session_seq_id
--------------------------------------+---------------+-------------------------------------------------------------------------------------+---------+--------------------------------------+---------+----------------------------+----------+----------------------------+--------------------------------------+-------------+-----------------+-----------------------
 1f304799-5f86-422d-af60-46fd52047858 | ExportVmToOva | Exporting VM c8client as an OVA to /save_ova/base/dump/c8client.ova on Host ov301 | STARTED | d1429f1f-2bea-4f60-bd2e-5bed997716ed | t | 2020-10-02 14:51:37.423+02 | | 2020-10-02 14:51:45.031+02 | e4276504-c2ca-4454-996e-86a61cc265db | f | t | 1247
 ea76fe36-7274-4603-870e-3f142e6e268b | ExportVmToOva | Exporting VM c8server as an OVA to /save_ova/base/dump/c8server.ova on Host ov301 | STARTED | d1429f1f-2bea-4f60-bd2e-5bed997716ed | t | 2020-10-02 14:50:19.911+02 | | 2020-10-02 14:50:43.191+02 | 333df0a9-ccfe-4034-9f67-e1cec49a4468 | f

But perhaps there is a check that if another task of this kind is already in STARTED status it doesn't fire up the ansible playbook? Just a guess..
I also tried to use the "async_" option when calling the export_to_path_on_host method in /usr/lib64/python2.7/site-packages/ovirtsdk4/services.py, but it seems it doesn't change the behaviour...

The other option, "storage_domain", is also not clear to me; it is described as:

`storage_domain`:: The (export) storage domain to export the virtual machine to.

Does this mean that I can export as OVA to an export domain in some way? Or is it meant to be a (file-based) data domain as an alternative?
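For reference, a sketch of the attempted call (placeholders throughout; the VM and host names, directory and OVA file name are taken from the example above, and async_ is passed as described, which in practice did not change the serialization):

# Sketch only, assuming ovirtsdk4; as noted above, passing async_=True did not
# change the observed serialization of the exports on the engine side.
import ovirtsdk4 as sdk
import ovirtsdk4.types as types

connection = sdk.Connection(url='https://engine.example.com/ovirt-engine/api',
                            username='admin@internal', password='secret',
                            ca_file='ca.pem')
vms_service = connection.system_service().vms_service()
hosts_service = connection.system_service().hosts_service()
vm = vms_service.list(search='name=c8server')[0]
host = hosts_service.list(search='name=ov301')[0]

vms_service.vm_service(vm.id).export_to_path_on_host(
    host=types.Host(id=host.id),
    directory='/save_ova/base/dump',
    filename='c8server.ova',
    async_=True,   # the option tried above
)
connection.close()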
*** Bug 1885286 has been marked as a duplicate of this bug. ***
Here is the result of the investigation, as sent in our development email thread [1]:

The problem is felt when we have a synchronized operation (such as running the ansible-runner service) that takes some time to execute. The problem is that when running multiple commands, and one has a long synchronized operation within a child command, it hangs all the commands running on the engine using callbacks. The export OVA command is a good example, since it uses ansible (synchronously) and the pack_ova script takes time to finish. This is not really noticeable with short synchronized commands, but execution is serial for them as well instead of parallel.

Running the export to OVA command and then other commands with callbacks will get them hanging once the export command callback has started and reached performNextOperation.

Some technical details:

We have one thread in CommandCallbacksPoller that runs, collects the current commands on the engine and processes them. Once we have the above scenario (let's say 2 commands), the first one will go into callback.doPolling in invokeCallbackMethodsImpl. In that case it will go to ChildCommandsCallbackBase::doPolling, and eventually to childCommandsExecutingEnded. While there are more actions to perform we will do performNextOperation, which calls executeNextOperation. When the next operation is long and synchronized, this blocks the CommandCallbacksPoller thread, and only when it finishes is the thread released so that the callbacks continue working.

The reasonable thing to do is to make the ansible calls asynchronous; that will resolve it, since the assumption is that child commands shouldn't have long synchronized operations.

[1] - https://lists.ovirt.org/archives/list/devel@ovirt.org/thread/777RMYRQWXRJ6AOFLO6BPG7HKVPLYHRI/#W45G7FUW736ODBJAELCBU65KLMS2DHB4
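A toy illustration (Python, not actual engine code) of the blocking pattern described above: a single poller thread services all callbacks, so one long synchronous child operation starves every other callback queued behind it.

import time

# Toy model only: stands in for CommandCallbacksPoller's single thread that
# calls doPolling() -> performNextOperation() for every running command.

def export_ova_callback():
    # A long *synchronous* child operation (e.g. waiting for ansible/pack_ova).
    time.sleep(5)          # in the real case this can be minutes or hours
    print("export finished")

def snapshot_callback():
    time.sleep(0.1)        # a short operation on an unrelated VM/DC
    print("snapshot finished")

callbacks = [export_ova_callback, snapshot_callback, snapshot_callback]

def command_callbacks_poller():
    # One thread, one loop: a blocking call in any callback delays all the
    # callbacks after it, which is why unrelated tasks appear to hang.
    for callback in callbacks:
        callback()

command_callbacks_poller()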
This bug has a low overall severity and passed the automated regression suite; it is not going to be further verified by QE. If you believe special care is required, feel free to re-open to ON_QA status.
Verified:
ovirt-engine-4.4.4.5-0.10.el8ev
vdsm-4.40.40-1.el8ev.x86_64
qemu-kvm-5.1.0-14.module+el8.3.0+8790+80f9c6d8.1.x86_64
libvirt-daemon-6.6.0-7.1.module+el8.3.0+8852+b44fca9f.x86_64

Verification scenario:
1. Export 5 VMs in parallel to the same host.
2. Observe the host processes and verify that the host runs pack_ova.py for several exports at the same time. For example:

[root@hostname ova_test_delete_me]# ps -aux | grep -i ansible
root 1699271 0.0 0.0 10404 3064 pts/2 Ss+ 11:01 0:00 /bin/sh -c /usr/bin/python3 /root/.ansible/tmp/ansible-tmp-1609750861.9362593-485170-43011801543562/pack_ova.py "vm" "/home/nisim/ova_test_delete_me/1_RHEL7_Q35_UEFI.ova.tmp
root 1699287 0.1 0.0 66196 12276 pts/2 S+ 11:01 0:00 /usr/bin/python3 /root/.ansible/tmp/ansible-tmp-1609750861.9362593-485170-43011801543562/pack_ova.py vm /home/nisim/ova_test_delete_me/1_RHEL7_Q35_UEFI.ova.tmp <?xml version=
root 1699508 0.0 0.0 10400 3208 pts/3 Ss+ 11:01 0:00 /bin/sh -c /usr/bin/python3 /root/.ansible/tmp/ansible-tmp-1609750864.4523098-485219-158194925913684/pack_ova.py "vm" "/home/nisim/ova_test_delete_me/1_RHEL8.2_SecureBoot.ov
root 1699524 0.1 0.0 66192 12212 pts/3 S+ 11:01 0:00 /usr/bin/python3 /root/.ansible/tmp/ansible-tmp-1609750864.4523098-485219-158194925913684/pack_ova.py vm /home/nisim/ova_test_delete_me/1_RHEL8.2_SecureBoot.ova.tmp <?xml ver
This bugzilla is included in oVirt 4.4.4 release, published on December 21st 2020. Since the problem described in this bug report should be resolved in oVirt 4.4.4 release, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.