Created attachment 1340015 [details]
engine, vdsm, libvirt logs

Description of problem:
Resuming a VM whose disk is on iSCSI storage takes ~14 min instead of the ~30 sec it takes for a VM whose disk is on an NFS storage domain. Errors appear in the VDSM log during the resume.

Version-Release number of selected component (if applicable):
Engine: ovirt-engine-4.2.0-0.0.master.20171013142622.git15e767c.el7.centos.noarch
VDSM: 4.20.3-95

How reproducible:
Every time I have tried it so far.

Steps to Reproduce:
1. Create a template from an imported bootable Glance image (rhel7.4_ovirt4.2_guest_disk).
2. Create a VM from the template using an iSCSI storage domain.
3. Start the VM (named "golden_env_mixed_virtio_2_0" in this test).
4. Suspend the VM.
5. Restart/resume the VM (Oct 18, 2017, 9:14:19 AM).

Actual results:
The iSCSI-based VM takes ~14 min (!) to start. Timestamp when the VM finally started: Oct 18, 2017, 9:30:53 AM ("VM golden_env_mixed_virtio_2_0 started on Host host_mixed_1").

Expected results:
The VM should start in less than 1 min, as with NFS (takes ~30 sec).

Additional info:
When starting the VM normally (NOT from suspend) the issue does not occur. I only see this issue on iSCSI.
VDSM log:

First occurrence:
2017-10-18 09:14:12,695+0300 ERROR (migsrc/e31054a1) [virt.vm] (vmId='e31054a1-996d-42c2-9839-12c30aace839') Domain not found: no domain with matching uuid 'e31054a1-996d-42c2-9839-12c30aace839' (golden_env_mixed_virtio_2_0) (migration:288)
2017-10-18 09:14:12,698+0300 ERROR (migsrc/e31054a1) [root] FINISH thread <Thread(migsrc/e31054a1, started daemon 140558066267904)> failed (concurrent:198)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/common/concurrent.py", line 191, in run
    ret = func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/migration.py", line 394, in run
    self._regular_run()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/migration.py", line 447, in _regular_run
    self._recover(str(e))
  File "/usr/lib/python2.7/site-packages/vdsm/virt/migration.py", line 297, in _recover
    self._vm.cont(ignoreStatus=True)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 1404, in cont
    self._underlyingCont()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 3832, in _underlyingCont
    hooks.before_vm_cont(self._dom.XMLDesc(0), self._custom)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/virdomain.py", line 98, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 125, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 586, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 494, in XMLDesc
    if ret is None: raise libvirtError ('virDomainGetXMLDesc() failed', dom=self)
libvirtError: Domain not found: no domain with matching uuid 'e31054a1-996d-42c2-9839-12c30aace839' (golden_env_mixed_virtio_2_0)

Latest occurrence:
2017-10-18 09:30:29,304+0300 ERROR (jsonrpc/3) [jsonrpc.JsonRpcServer] Internal server error (__init__:668)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 663, in _handle_request
    res = method(**params)
  File "/usr/lib/python2.7/site-packages/vdsm/rpc/Bridge.py", line 201, in _dynamicMethod
    result = fn(*methodArgs)
  File "<string>", line 2, in getAllVmIoTunePolicies
  File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 48, in method
    ret = func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/API.py", line 1355, in getAllVmIoTunePolicies
    io_tune_policies_dict = self._cif.getAllVmIoTunePolicies()
  File "/usr/lib/python2.7/site-packages/vdsm/clientIF.py", line 510, in getAllVmIoTunePolicies
    'current_values': v.getIoTune()}
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 3332, in getIoTune
    result = self.getIoTuneResponse()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 3351, in getIoTuneResponse
    res = self._dom.blockIoTune(
  File "/usr/lib/python2.7/site-packages/vdsm/virt/virdomain.py", line 47, in __getattr__
    % self.vmid)

libvirt log (the timestamp is 3h behind the other logs, so 06:14:28 is really 09:14:28):
2017-10-18 06:14:28.891+0000: 20395: error : virProcessRunInMountNamespace:1159 : internal error: child reported: unable to set security context 'system_u:object_r:virt_content_t:s0' on '/rhev/data-center/mnt/blockSD/c870fa3c-6890-44d7-a6c8-13f94006ff60/images/0f55c20e-be5e-4b28-b545-2552e8517317/6ceae47e-4041-488a-b4b3-1e83bb5d7595': No such file or directory

This is not an environment/storage issue: I checked the iSCSI storage (XtremIO) vs. the NFS storage (yellow.vdsb) and both have similar throughput (about ~200 MB/s R/W IO).
Francesco, does this make any sense to you? Why would retrieving the IoTune attribute block for so long?
(In reply to Allon Mureinik from comment #1)
> Francesco, does this make any sense to you? Why would retrieving the
> IoTune attribute block for so long?

Disclaimer: I haven't seen the logs yet.

IOTune attributes are QEMU properties, so we need to reach QEMU to fetch them; it *could* be a regression/bug. I will check the logs to learn more ASAP.
Any news, Francesco?
(In reply to Tal Nisan from comment #3)
> Any news Francesco?

Yes, sorry for the delay.

From the Vdsm perspective, there is nothing suspicious. We *do* see some stack traces, but those are only noise caused by the slow start.

From the libvirt perspective I haven't found anything obvious yet. It "just" seems slow to start.

Can we reproduce the slow start with virsh? If so, I think it is a qemu issue [1] and we should move the bug accordingly.

+++

[1] I'm not aware of existing bugs related to the slow start.
(In reply to Francesco Romani from comment #4)
> From libvirt perspective I haven't found anything obvious yet. It "just"
> seems it is slow to start.
>
> Can we reproduce the slow start with virsh? If so, I think it is qemu
> issue[1] and we should move the bug accordingly.
>
> +++
>
> [1] I'm not aware of existing bugs related to the slow start

Avihai, is this reproducible without RHV around it?
(In reply to Allon Mureinik from comment #5)
> (In reply to Francesco Romani from comment #4)
> > From libvirt perspective I haven't found anything obvious yet. It "just"
> > seems it is slow to start.
> >
> > Can we reproduce the slow start with virsh? If so, I think it is qemu
> > issue[1] and we should move the bug accordingly.
> >
> > +++
> >
> > [1] I'm not aware of existing bugs related to the slow start
>
> Avihai, is this reproducible without RHV around it?

I tried with virsh and it's very fast, maybe a couple of seconds. So no, this is not reproducible without RHV around it.
Avihai, when it's taking so long, what is going on? Can you check, for example, whether a process is taking 100% CPU? In my case, it was the lzop process expanding the memory, which is apparently not very efficient. I don't know why it would make a difference between iSCSI and NFS, though, but it is worth trying to understand what is slowing us down here.
(In reply to Yaniv Kaul from comment #7)
> Avihai, when it's taking so long, what is going on? Can you check, for
> example, if a process is taking 100% CPU? In my case, it was the lzop
> process expanding the memory that (is apparently) not very efficient. I
> don't know why it'd make a difference between iSCSI and NFS though, but
> worth trying to understand what is slowing us down here.

I reran it on 6 VMs (2 iSCSI, 2 NFS, 2 Gluster) on the latest 4.2 upstream and saw that on both of the iSCSI VMs' hosts (each VM was on a different host) lzop is taking almost 100% CPU.

For some reason, only on an iSCSI SD is lzop taking almost 100% CPU, reading 1 byte at a time (see the strace below).

VDSM machine top:
top - 14:38:27 up 8 days, 2:28, 1 user, load average: 1.43, 0.48, 0.26
Tasks: 207 total, 2 running, 205 sleeping, 0 stopped, 0 zombie
%Cpu(s): 14.3 us, 85.3 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.3 si, 0.0 st
KiB Mem : 3879800 total, 992624 free, 860684 used, 2026492 buff/cache
KiB Swap: 2097148 total, 2097148 free, 0 used. 2587008 avail Mem

 PID  USER  PR  NI    VIRT   RES   SHR S  %CPU %MEM    TIME+ COMMAND
1463  root  20   0    6700   628   456 R  94.4  0.0  0:13.83 lzop
 949  root  20   0       0     0     0 S   1.3  0.0  0:00.76 kworker/0:2
3916  vdsm   0 -20 1364524 96092 13572 S   1.3  2.5 50:13.75 vdsmd

- Checking strace, I got many of the following lines, which means it reads/expands 1 byte at a time:
> strace -p 1463
read(0, "\0", 1) = 1
read(0, "\0", 1) = 1
read(0, "\0", 1) = 1
read(0, "\0", 1) = 1
read(0, "\0", 1) = 1
read(0, "\0", 1) = 1
............................................

- Checking on an NFS SD, the lzop process did not appear at all.
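The strace above shows one read(2) syscall per byte, which is pathological on its own. A rough stand-alone sketch (plain Python, not vdsm/libvirt code; the file name and sizes are invented for illustration) of how much the read chunk size matters for this kind of loop:

```python
import os
import tempfile
import time

# 1 MiB scratch file standing in for the decompression input stream.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"\x00" * (1 << 20))
    path = f.name

def count_bytes(path, chunk):
    """Read the whole file, issuing one read(2) syscall per `chunk` bytes."""
    total = 0
    with open(path, "rb", buffering=0) as f:  # unbuffered: read() == read(2)
        while data := f.read(chunk):
            total += len(data)
    return total

results = {}
for chunk in (1, 64 * 1024):  # 1 byte per syscall (as in the strace) vs 64 KiB
    t0 = time.perf_counter()
    results[chunk] = count_bytes(path, chunk)
    print(f"chunk={chunk}: {results[chunk]} bytes in "
          f"{time.perf_counter() - t0:.3f}s")

os.unlink(path)
```

The 1-byte loop pays the full syscall cost for every single byte, which is consistent with lzop pegging a core with ~85% system time in the top output above.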
Also, again I see the same VDSM error:

VDSM log:
2017-11-27 15:33:48,429+0200 ERROR (jsonrpc/0) [jsonrpc.JsonRpcServer] Internal server error (__init__:611)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 606, in _handle_request
    res = method(**params)
  File "/usr/lib/python2.7/site-packages/vdsm/rpc/Bridge.py", line 201, in _dynamicMethod
    result = fn(*methodArgs)
  File "<string>", line 2, in getAllVmIoTunePolicies
  File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 48, in method
    ret = func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/API.py", line 1354, in getAllVmIoTunePolicies
    io_tune_policies_dict = self._cif.getAllVmIoTunePolicies()
  File "/usr/lib/python2.7/site-packages/vdsm/clientIF.py", line 514, in getAllVmIoTunePolicies
    'current_values': v.getIoTune()}
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 3415, in getIoTune
    result = self.getIoTuneResponse()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 3434, in getIoTuneResponse
    res = self._dom.blockIoTune(
  File "/usr/lib/python2.7/site-packages/vdsm/virt/virdomain.py", line 47, in __getattr__
    % self.vmid)
NotConnectedError: VM '3d9a1724-0805-46a5-bc07-971823eafe51' was not defined yet or was undefined

Also in /var/log/messages:
Nov 27 15:33:48 storage-ge4-vdsm2 vdsm[26353]: ERROR Internal server error#012Traceback (most recent call last):#012 File "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 606, in _handle_request#012 res = method(**params)#012 File "/usr/lib/python2.7/site-packages/vdsm/rpc/Bridge.py", line 201, in _dynamicMethod#012 result = fn(*methodArgs)#012 File "<string>", line 2, in getAllVmIoTunePolicies#012 File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 48, in method#012 ret = func(*args, **kwargs)#012 File "/usr/lib/python2.7/site-packages/vdsm/API.py", line 1354, in getAllVmIoTunePolicies#012 io_tune_policies_dict = self._cif.getAllVmIoTunePolicies()#012 File "/usr/lib/python2.7/site-packages/vdsm/clientIF.py", line 514, in getAllVmIoTunePolicies#012 'current_values': v.getIoTune()}#012 File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 3415, in getIoTune#012 result = self.getIoTuneResponse()#012 File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 3434, in getIoTuneResponse#012 res = self._dom.blockIoTune(#012 File "/usr/lib/python2.7/site-packages/vdsm/virt/virdomain.py", line 47, in __getattr__#012 % self.vmid)#012NotConnectedError: VM '3d9a1724-0805-46a5-bc07-971823eafe51' was not defined yet or was undefined
(In reply to Avihai from comment #9)
> Also , again I see the same VDSM error:
> VDSM log:
> 2017-11-27 15:33:48,429+0200 ERROR (jsonrpc/0) [jsonrpc.JsonRpcServer]
> Internal server error (__init__:611)
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 606,
> in _handle_request
>     res = method(**params)
>   File "/usr/lib/python2.7/site-packages/vdsm/rpc/Bridge.py", line 201, in
> _dynamicMethod
>     result = fn(*methodArgs)
>   File "<string>", line 2, in getAllVmIoTunePolicies
>   File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 48, in
> method
>     ret = func(*args, **kwargs)
>   File "/usr/lib/python2.7/site-packages/vdsm/API.py", line 1354, in
> getAllVmIoTunePolicies
>     io_tune_policies_dict = self._cif.getAllVmIoTunePolicies()
>   File "/usr/lib/python2.7/site-packages/vdsm/clientIF.py", line 514, in
> getAllVmIoTunePolicies
>     'current_values': v.getIoTune()}
>   File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 3415, in
> getIoTune
>     result = self.getIoTuneResponse()
>   File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 3434, in
> getIoTuneResponse
>     res = self._dom.blockIoTune(
>   File "/usr/lib/python2.7/site-packages/vdsm/virt/virdomain.py", line 47,
> in __getattr__
>     % self.vmid)
> NotConnectedError: VM '3d9a1724-0805-46a5-bc07-971823eafe51' was not defined
> yet or was undefined

I agree this is annoying, but it is a whole different class of errors. It is a race on shutdown, mostly (=99%) just noise. Will post a patch to fix it, though.
Posted https://gerrit.ovirt.org/#/c/84733/ only to remove the noise; no real fix yet (not sure a fix on our side is needed at all...). Moving back to NEW.
(In reply to Avihai from comment #8)
> (In reply to Yaniv Kaul from comment #7)
> > Avihai, when it's taking so long, what is going on? Can you check, for
> > example, if a process is taking 100% CPU? In my case, it was the lzop
> > process expanding the memory that (is apparently) not very efficient. I
> > don't know why it'd make a difference between iSCSI and NFS though, but
> > worth trying to understand what is slowing us down here.
>
> I rerun it on 6 VM's (2 ISCSI, 2 NFS, 2 gluster) on latest 4.2 upstream &
> saw also that on both of the ISCSI VM's hosts (each VM was on a different
> host) the lzop is taking almost 100% CPU.
>
> For some reason only on ISCSI sd the lzop is taking almost 100% CPU reading
> 1 byte at a time (see strace below)
>
> VDSM machine top:
> top - 14:38:27 up 8 days, 2:28, 1 user, load average: 1.43, 0.48, 0.26
> Tasks: 207 total, 2 running, 205 sleeping, 0 stopped, 0 zombie
> %Cpu(s): 14.3 us, 85.3 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.3 si, 0.0 st
> KiB Mem : 3879800 total, 992624 free, 860684 used, 2026492 buff/cache
> KiB Swap: 2097148 total, 2097148 free, 0 used. 2587008 avail Mem
>
>  PID  USER  PR  NI    VIRT   RES   SHR S  %CPU %MEM    TIME+ COMMAND
> 1463  root  20   0    6700   628   456 R  94.4  0.0  0:13.83 lzop
>  949  root  20   0       0     0     0 S   1.3  0.0  0:00.76 kworker/0:2
> 3916  vdsm   0 -20 1364524 96092 13572 S   1.3  2.5 50:13.75 vdsmd
>
> - Checking strace I got many of the following lines, which means it
> reads/expands:
> > strace -p 1463
> read(0, "\0", 1) = 1
> read(0, "\0", 1) = 1
> read(0, "\0", 1) = 1
> read(0, "\0", 1) = 1
> read(0, "\0", 1) = 1
> read(0, "\0", 1) = 1
> ............................................
>
> - Checking on NFS SD the lzop process did not appear at all.

Very interesting find!

1. Please open a qemu-kvm bug on this.
2. Does that happen on nested hosts by any chance? Have you seen it happening on physical hosts as well?
(In reply to Francesco Romani from comment #11)
> Posted https://gerrit.ovirt.org/#/c/84733/ only to remove the noise, no real
> fix yet (not sure a fix on our side is needed at all...). Moving back to NEW.

Same story with today's upload.
Martin, comment 12 above kind of hints of a qemu-kvm issue?
(In reply to Yaniv Kaul from comment #14)
> Martin, comment 12 above kind of hints of a qemu-kvm issue?

I cannot see how right now. If I did not misunderstand the finding, the lzop that is causing the slowdown is running on the host, not in the VM, so this is a bare-metal/iSCSI issue?

Anything I missed in that update?

Thanks!
Martin
(In reply to Martin Tessun from comment #15)
> (In reply to Yaniv Kaul from comment #14)
> > Martin, comment 12 above kind of hints of a qemu-kvm issue?
>
> I cannot see how right now. If I did not misunderstand the finding the lzop
> that is causing the slowdown is running on the host, not in the VM, so this
> is a bare metal issue/iSCSI issue?
>
> Anything I missed in that update?
>
> Thanks!
> Martin

lzop is being executed by qemu to compress/decompress the memory. (I doubt it is libvirt.)
(In reply to Yaniv Kaul from comment #16)
> (In reply to Martin Tessun from comment #15)
> > (In reply to Yaniv Kaul from comment #14)
> > > Martin, comment 12 above kind of hints of a qemu-kvm issue?
> >
> > I cannot see how right now. If I did not misunderstand the finding the lzop
> > that is causing the slowdown is running on the host, not in the VM, so this
> > is a bare metal issue/iSCSI issue?
> >
> > Anything I missed in that update?
> >
> > Thanks!
> > Martin
>
> lzop is being executed by qemu to compress/decompress the memory. (I doubt
> it is libvirt).

Can we move the bug to qemu-kvm?
The vdsm patch doesn't fix (nor does it try to fix) the qemu issue.
(In reply to Yaniv Kaul from comment #17)
> (In reply to Yaniv Kaul from comment #16)
> > (In reply to Martin Tessun from comment #15)
> > > (In reply to Yaniv Kaul from comment #14)
> > > > Martin, comment 12 above kind of hints of a qemu-kvm issue?
> > >
> > > I cannot see how right now. If I did not misunderstand the finding the lzop
> > > that is causing the slowdown is running on the host, not in the VM, so this
> > > is a bare metal issue/iSCSI issue?
> > >
> > > Anything I missed in that update?
> > >
> > > Thanks!
> > > Martin
> >
> > lzop is being executed by qemu to compress/decompress the memory. (I doubt
> > it is libvirt).
>
> Can we move the bug to qemu-kvm?

I don't think this makes sense, as lzop (even if called by qemu) is a separate binary, which is maintained by ???

The changelog does not reveal much (has it really not been touched since 2014??):

[root@inf21 ~]# rpm -q --changelog lzop
* Fri Jan 24 2014 Daniel Mach <dmach> - 1.03-10
- Mass rebuild 2014-01-24

Also, comment #6 states that it is only reproducible in a RHV environment, so I am not sure what qemu should investigate here. Maybe switching to a different compression tool (at the cost of speed) could be the outcome, but as it only happens on iSCSI, I am not sure that makes sense either.

I believe we need to identify the issue on iSCSI first. If help from the qemu team is needed, I can organize it. But from the current status of the BZ, I don't see how this could be a qemu issue.

Also note that comment #0 states that libvirt could not find the VM and the disk:

libvirtError: Domain not found: no domain with matching uuid 'e31054a1-996d-42c2-9839-12c30aace839' (golden_env_mixed_virtio_2_0)
[...]
libvirt log (the timestamp is 3h behind the other logs, so 06:14:28 is really 09:14:28):
2017-10-18 06:14:28.891+0000: 20395: error : virProcessRunInMountNamespace:1159 : internal error: child reported: unable to set security context 'system_u:object_r:virt_content_t:s0' on '/rhev/data-center/mnt/blockSD/c870fa3c-6890-44d7-a6c8-13f94006ff60/images/0f55c20e-be5e-4b28-b545-2552e8517317/6ceae47e-4041-488a-b4b3-1e83bb5d7595': No such file or directory

So to me this looks like the requested file cannot be found in the first place. Maybe it is just a race condition or something.

I am trying the same on my iSCSI RHV environment (4.1.8 with RHEL 7.4 and qemu 2.9; the iSCSI target is a NetApp). I also get the lzop process there (and it seems to read just one byte at a time), but besides the IOTune errors I see no further errors in vdsm.log.

[root@inf21 ~]# ps -ef | grep lzop
root     18328  3975 96 15:39 ?        00:04:20 lzop -dc --ignore-warn

And I agree that the resume takes quite a significant amount of time (2 minutes for a 4 GB VM). I am not sure if that is "ok" or too much.

So to summarize:
1. It looks like the lzop process is the problematic one here, as it only reads 1 byte at a time.
2. The vdsm error messages are unrelated to the issue?
3. NFS seems to work faster. (Is lzop called there as well?)

Questions: How is suspend/resume realized in RHV? Where is the memory file written to? On NFS I assume it is simply the NFS share, but what about iSCSI? Is a new/separate volume created?

Looping in Ademar for his take on this.
1. Changing the compression engine is a great idea (to a higher-performing one: faster, better compression, threaded), but it's a qemu change, AFAIK.
2. Notice how your lzop process is consuming 100% of a single core: certainly a bottleneck! lzop doesn't seem to have made any progress in recent years (https://www.lzop.org/lzop_news.php).
3. The VDSM errors are unrelated; we have seen the slowdown without them.
4. NFS seems faster; I haven't looked at the details.

Suspend/resume simply writes the memory to a disk we create for it, just like any other disk (an LV on iSCSI/FC, a file on file-based storage).
(In reply to Martin Tessun from comment #19)
> (In reply to Yaniv Kaul from comment #17)
> > (In reply to Yaniv Kaul from comment #16)
> > > (In reply to Martin Tessun from comment #15)
> > > > (In reply to Yaniv Kaul from comment #14)
> > > > > Martin, comment 12 above kind of hints of a qemu-kvm issue?
> > > >
> > > > I cannot see how right now. If I did not misunderstand the finding the lzop
> > > > that is causing the slowdown is running on the host, not in the VM, so this
> > > > is a bare metal issue/iSCSI issue?
> > > >
> > > > Anything I missed in that update?
> > > >
> > > > Thanks!
> > > > Martin
> > >
> > > lzop is being executed by qemu to compress/decompress the memory. (I doubt
> > > it is libvirt).
> >
> > Can we move the bug to qemu-kvm?
>
> I don't think this makes sense, as lzop (even if called by qemu) is a
> separate binary, which is maintained by ???

lzop is called by libvirt. Support for it was introduced in libvirt-0.7.1 (Sep/2009). It should be possible to change the compression method to xz, gzip or bzip2 via qemu.conf.
(In reply to Ademar Reis from comment #22)
> (In reply to Martin Tessun from comment #19)
> > (In reply to Yaniv Kaul from comment #17)
> > > (In reply to Yaniv Kaul from comment #16)
> > > > (In reply to Martin Tessun from comment #15)
> > > > > (In reply to Yaniv Kaul from comment #14)
> > > > > > Martin, comment 12 above kind of hints of a qemu-kvm issue?
> > > > >
> > > > > I cannot see how right now. If I did not misunderstand the finding the lzop
> > > > > that is causing the slowdown is running on the host, not in the VM, so this
> > > > > is a bare metal issue/iSCSI issue?
> > > > >
> > > > > Anything I missed in that update?
> > > > >
> > > > > Thanks!
> > > > > Martin
> > > >
> > > > lzop is being executed by qemu to compress/decompress the memory. (I doubt
> > > > it is libvirt).
> > >
> > > Can we move the bug to qemu-kvm?
> >
> > I don't think this makes sense, as lzop (even if called by qemu) is a
> > separate binary, which is maintained by ???
>
> lzop is called by libvirt. Support for it was introduced in libvirt-0.7.1
> (Sep/2009). It should be possible to change the compression method to xz,
> gzip or bzip2 via qemu.conf.

And will libvirt magically recognize which format the suspended disk is in? So, for backward compatibility, it'll know to decompress lzop while we switch to gzip, for example?
(In reply to Yaniv Kaul from comment #23)
> (In reply to Ademar Reis from comment #22)
> > lzop is called by libvirt. Support for it was introduced in libvirt-0.7.1
> > (Sep/2009). It should be possible to change the compression method to xz,
> > gzip or bzip2 via qemu.conf.
>
> And will libvirt magically recognize in which format the suspended disk is?
> So for backward compat. it'll know to decompess lzop and we'll switch to
> gzip, for example?

I have no idea :-) (sorry, can't test right now).

Jarda should know, so changing needinfo.
Avihai, does this reproduce with the recent RHEL 7.5 versions of qemu and libvirt?
Restoring needinfos
(In reply to Allon Mureinik from comment #27)
> Avihai, does this reproduce with the recent RHEL 7.5 versions of qemu and
> libvirt?

The same issue also occurs in RHEL 7.5 with the following builds (latest downstream 4.2.2):
ovirt-engine-4.2.2.2-0.1.el7.noarch
vdsm-4.20.20-1.el7ev.x86_64
libvirt-3.9.0-13.el7.x86_64
qemu-kvm-rhev-2.10.0-21.el7.x86_64
Restoring needinfo for comment 24
(In reply to Ademar Reis from comment #24)
> (In reply to Yaniv Kaul from comment #23)
> > (In reply to Ademar Reis from comment #22)
> > > lzop is called by libvirt. Support for it was introduced in libvirt-0.7.1
> > > (Sep/2009). It should be possible to change the compression method to xz,
> > > gzip or bzip2 via qemu.conf.
> >
> > And will libvirt magically recognize in which format the suspended disk is?
> > So for backward compat. it'll know to decompess lzop and we'll switch to
> > gzip, for example?
>
> I have no idea :-) (sorry, can't test right now).
>
> Jarda should know, so changing needinfo.

Libvirt stores the compression method used in the image header and uses it later for decompression.
Avihai, based on comment 31, can you try changing the method via qemu.conf and see if it makes a difference? Start with gzip, please.
(In reply to Yaniv Kaul from comment #32)
> Avihai, based on comment 31 , can you try changing the method via qemu.conf,
> and see if makes a difference? Start with gzip, please.

In the conf file I see the following comment: "# The default format for QEMU/KVM guest save images is raw", i.e. not lzop (just saying).

Anyway, I changed the following qemu.conf entries as requested:

save_image_format = "X"
dump_image_format = "X"
snapshot_image_format = "X"

and restarted the libvirt service.

1) "X" = "gzip": time to resume the VM - 38 sec
2) "X" = "xz": time to resume the VM - 58 sec
3) "X" = "bzip": time to resume the VM - 57 sec
4) "X" = "lzop": time to resume the VM - 1200 sec (20 min!)

The winner of slowest VM resume by FAR is ......... lzop :)
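As a rough cross-check of the relative codec speeds entirely outside RHV/libvirt, the same codec families can be timed from Python's standard library (a local micro-benchmark on synthetic data only, not a substitute for the end-to-end test above; lzop has no stdlib binding, so it is omitted):

```python
import bz2
import gzip
import lzma
import time

# ~1 MiB of repetitive data standing in for compressible guest memory.
payload = bytes(range(256)) * 4096

timings = {}
for name, codec in (("gzip", gzip), ("bzip2", bz2), ("xz", lzma)):
    t0 = time.perf_counter()
    blob = codec.compress(payload)
    t1 = time.perf_counter()
    restored = codec.decompress(blob)
    t2 = time.perf_counter()
    assert restored == payload  # round-trip sanity check
    timings[name] = (len(blob), t1 - t0, t2 - t1)
    print(f"{name:5}: {len(blob):8} bytes, "
          f"compress {t1 - t0:.3f}s, decompress {t2 - t1:.3f}s")
```

This only measures the algorithms themselves; the pathology in this bug is in how the data is fed to the decompressor (1 byte per syscall from a block device), which a micro-benchmark like this will not show.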
(In reply to Avihai from comment #33)
> (In reply to Yaniv Kaul from comment #32)
> > Avihai, based on comment 31 , can you try changing the method via qemu.conf,
> > and see if makes a difference? Start with gzip, please.
>
> In the conf file I see the following comment:
> "# The default format for QEMU/KVM guest save images is raw" -> not lzop
> (just saying)
>
> Anyway, I changed qemu.conf following entries as requested :
> save_image_format = "X"
> dump_image_format = "X"
> snapshot_image_format = "X"
> +
> restarted libvirt service
>
> 1) "X" = "gzip" time to resume VM - 38 sec.
> 2) "X" = "xz" time to resume VM - 58 sec.
> 3) "X" = "bzip" time to resume VM - 57 sec.
> 4) "X" = "lzop" time to resume VM - 1200 sec! (20min).
>
> The winner of slowest resume VM by FAR is ......... lzop :)

Interesting... It should be the fastest compression. I think a libvirt bug may be needed here, but we should also change our default. I'd change to gzip: we don't need super compression, and speed is important.

Sent a naive patch to VDSM, let's see.
Created attachment 1426981 [details]
Domain xml

Please see the domain XML in the attachment.
Tested all the supported compression methods: lzop takes the longest to resume the VM (~12 min), and xz takes the longest to suspend (~11 min), after which the VM cannot be resumed at all. Detailed results below.

(1) lzop:
# time virsh save vm2 /dev/test1/lv05
Domain vm2 saved to /dev/test1/lv05

real 0m29.216s
user 0m0.011s
sys 0m0.010s

# time virsh restore /dev/test1/lv05
Domain restored from /dev/test1/lv05

real 11m59.348s
user 0m0.025s
sys 0m0.029s

(2) bzip2:
# time virsh save vm2 /dev/test1/lv05
Domain vm2 saved to /dev/test1/lv05

real 4m7.634s
user 0m0.015s
sys 0m0.020s

# time virsh restore /dev/test1/lv05
Domain restored from /dev/test1/lv05

real 1m38.586s
user 0m0.011s
sys 0m0.013s

(3) gzip:
# time virsh save vm2 /dev/test1/lv07
Domain vm2 saved to /dev/test1/lv07

real 1m15.947s
user 0m0.010s
sys 0m0.020s

# time virsh restore /dev/test1/lv07
Domain restored from /dev/test1/lv07

real 0m26.810s
user 0m0.010s
sys 0m0.009s

(4) xz:
# time virsh save vm2 /dev/test1/lv07
Domain vm2 saved to /dev/test1/lv07

real 11m14.321s
user 0m0.026s
sys 0m0.039s

# time virsh restore /dev/test1/lv07
error: Failed to restore domain from /dev/test1/lv07
error: internal error: Child process (xz -dc) unexpected exit status 1:
2018-04-27 05:40:15.330+0000: 64773: debug : virFileClose:110 : Closed fd 24
2018-04-27 05:40:15.330+0000: 64773: debug : virFileClose:110 : Closed fd 26
2018-04-27 05:40:15.330+0000: 64773: debug : virFileClose:110 : Closed fd 29
xz: (stdin): Compressed data is corrupt

real 0m32.459s
user 0m0.012s
sys 0m0.008s
(In reply to yafu from comment #39)
> (4) xz
> # time virsh save vm2 /dev/test1/lv07
> Domain vm2 saved to /dev/test1/lv07
> real 11m14.321s
> user 0m0.026s
> sys 0m0.039s
>
> # time virsh restore /dev/test1/lv07
> error: Failed to restore domain from /dev/test1/lv07
> error: internal error: Child process (xz -dc) unexpected exit status 1:
> 2018-04-27 05:40:15.330+0000: 64773: debug : virFileClose:110 : Closed fd 24
> 2018-04-27 05:40:15.330+0000: 64773: debug : virFileClose:110 : Closed fd 26
> 2018-04-27 05:40:15.330+0000: 64773: debug : virFileClose:110 : Closed fd 29
> xz: (stdin): Compressed data is corrupt
> real 0m32.459s
> user 0m0.012s
> sys 0m0.008s

Well, there is something fishy in the iscsi/xz setup.

1) Is it 100% reproducible?
2) If you run 'xz -cd /dev/test1/lv07 > /dev/null' manually, does it finish successfully?
(In reply to Jaroslav Suchanek from comment #40)
> Well, there is something fishy in the iscsi/xz setup.
>
> 1) Is it 100 % reproducible?

Yes, it is 100% reproducible.

> 2) If you run the 'xz -cd /dev/test1/lv07 > /dev/null' manually, does it
> finish successfully?

No, it fails and reports the error "xz: /dev/test1/lv01: File format not recognize". But the other compression methods also report similar errors when using the compression tools to extract '/dev/test1/lv07' directly.
That's expected: libvirt's saved files contain an uncompressed header followed by the compressed migration data from QEMU. In other words, you can't just directly decompress the saved file.
(In reply to Jiri Denemark from comment #42)
> That's expected, libvirt's saved files contain uncompressed header followed
> by the compressed migration data from QEMU. In other words, you can't just
> directly decompress the saved file.

Thanks for your explanation.

Testing xz with iSCSI:

1. # virsh save vm1 /tmp/save
   Domain 4 saved to /tmp/save
2. # xz -zc /tmp/save > /dev/test1/lv01
3. # xz -cd /dev/test1/lv01 > /dev/null
   xz: /dev/test1/lv01: Compressed data is corrupt

This result is consistent with comment 40.
Hmm, could it be that /dev/test1/lv* is actually larger than the data stored inside (which it surely is) and thus xz doesn't know when to finish reading it and goes way beyond the end of the saved file? Was storing saved data directly to volumes a new invention or did it even work in some cases?
Anyway, even if it's not a new functionality and it happened to work in the past (probably because no compression was used), it can't work in general. When saving domain state to a disk, libvirt opens a file, stores its header (with compression algorithm, domain XML, etc.) and dumps data from QEMU (either raw or compressed) right after the header. There's no indication of how many bytes are stored in the file since the file system will handle this and the reader will just get an EOF at the end. However, when storing the data directly to a block device which is bigger, the reader gets no indication it read all data. That is, anything can happen. It might work with uncompressed data since QEMU might know it read all data and just stop reading even though it didn't get EOF. But a generic compression command will just continue reading to the end of the device. So the resume process will either take ages if the device is big enough or the decompression will fail in case the compression algorithm is not willing to decompress random garbage.

That said, this is not a libvirt bug. It's a usage bug since the virDomainSave API is expecting a file name rather than a block device.

Of course, we might be able to improve the data file format so that storing the data directly in a block device will work (e.g., by storing data length in the header and piping the data through dd to make sure the compression command doesn't try to read more), but that's actually a feature request rather than a bug fix.
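The length-in-header idea suggested above can be sketched like this (purely hypothetical framing: the fixed 20-digit length field is an invention for illustration, and /tmp/fake_lv with zero padding stands in for an oversized block device):

```shell
# Writer: record the compressed payload length, then the payload, then
# simulate the unused tail of an oversized LV with zero padding.
lv=/tmp/fake_lv
printf 'guest memory pages' | xz -zc > /tmp/state.xz
len=$(stat -c %s /tmp/state.xz)
{ printf '%020d' "$len"; cat /tmp/state.xz; head -c 4096 /dev/zero; } > "$lv"

# Reader: take the length from the header and feed xz exactly that many
# bytes, so the decompressor never sees the garbage past the end of the
# saved data (this is the role dd would play in the proposal above).
len=$((10#$(head -c 20 "$lv")))
tail -c +21 "$lv" | head -c "$len" | xz -dc   # prints "guest memory pages"
```

Without the byte-exact framing on the reader side, xz keeps reading to the end of the device, which is exactly the slow-resume/corrupt-data behavior described in this bug.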
Tal - any ideas? We can't simply drop support for this right now.
For starters we can just use raw instead of memory compression; the drawbacks are clear, but at least it's a working solution. The other options would require changes in libvirt, like storing the compressed data length in the file metadata or adding a unique EOF signature at the end of the file, right after the compressed data.
(In reply to Jiri Denemark from comment #45)
> Anyway, even if it's not a new functionality and it happened to work in the
> past (probably because no compression was used), it can't work in general.
> When saving domain state to a disk, libvirt opens a file, stores its header
> (with compression algorithm, domain XML, etc.) and dumps data from QEMU
> (either raw or compressed) right after the header. There's no indication of
> how many bytes are stored in the file since the file system will handle this
> and the reader will just get an EOF at the end. However, when storing the
> data directly to a block device which is bigger, the reader gets no
> indication it read all data. That is, anything can happen. It might work
> with uncompressed data since QEMU might know it read all data and just stop
> reading even though it didn't get EOF. But a generic compression command
> will just continue reading to the end of the device. So the resume process
> will either take ages if the device is big enough or the decompression will
> fail in case the compression algorithm is not willing to decompress random
> garbage.

End of the device looks like the correct action to me - we store it in a separate LV - it won't cross the boundaries of the LV, right?
That being said, we need to look at how we determine the size of that LV. Tal - let's check. As it worked reliably for years, I'm kinda surprised by this (and it all started because SUDDENLY lzop became slow).

> That said, this is not a libvirt bug. It's a usage bug since the
> virDomainSave API is expecting a file name rather than a block device.
>
> Of course, we might be able to improve the data file format so that storing
> the data directly in a block device will work (e.g., by storing data length
> in the header and piping the data through dd to make sure the compression
> command doesn't try to read more), but that's actually a feature request
> rather than a bug fix.
(In reply to Yaniv Kaul from comment #49)
> (In reply to Jiri Denemark from comment #45)
> > Anyway, even if it's not a new functionality and it happened to work in the
> > past (probably because no compression was used), it can't work in general.
> > When saving domain state to a disk, libvirt opens a file, stores its header
> > (with compression algorithm, domain XML, etc.) and dumps data from QEMU
> > (either raw or compressed) right after the header. There's no indication of
> > how many bytes are stored in the file since the file system will handle
> > this and the reader will just get an EOF at the end. However, when storing
> > the data directly to a block device which is bigger, the reader gets no
> > indication it read all data. That is, anything can happen. It might work
> > with uncompressed data since QEMU might know it read all data and just stop
> > reading even though it didn't get EOF. But a generic compression command
> > will just continue reading to the end of the device. So the resume process
> > will either take ages if the device is big enough or the decompression will
> > fail in case the compression algorithm is not willing to decompress random
> > garbage.
>
> End of the device looks like the correct action to me - we store it in a
> separate LV - it won't cross the boundaries of the LV, right?
> That being said, we need to look at how we determine the size of that LV.
> Tal - let's check. As it worked reliably for years, I'm kinda surprised by
> this (and it all started because SUDDENLY lzop became slow).

Seems like the configuration with lzop has been there for years, so indeed I find it strange why it just started to fail.
I wonder though if the compression matters, as we most likely create an LV in the fixed size of the VM's memory since we don't know in advance what the compressed size will be.
Francesco, any insights on those two questions?

> > That said, this is not a libvirt bug. It's a usage bug since the
> > virDomainSave API is expecting a file name rather than a block device.
> >
> > Of course, we might be able to improve the data file format so that storing
> > the data directly in a block device will work (e.g., by storing data length
> > in the header and piping the data through dd to make sure the compression
> > command doesn't try to read more), but that's actually a feature request
> > rather than a bug fix.
(In reply to Tal Nisan from comment #50)
> > End of the device looks like the correct action to me - we store it in a
> > separate LV - it won't cross the boundaries of the LV, right?
> > That being said, we need to look at how we determine the size of that LV.
> > Tal - let's check. As it worked reliably for years, I'm kinda surprised by
> > this (and it all started because SUDDENLY lzop became slow).
> Seems like the configuration with lzop is there for years so indeed I find
> it strange why it just started to fail.
> I wonder though if the compression matters as we most likely create an LV in
> the fixed size of the VM's memory as we don't know in advance what the
> compressed size will be.
> Francesco any insights on those two questions?

Sorry for the late answer, somehow I missed the bugzilla reminders. I have no clue about why lzop started to be slow; there are no obvious suspects in the lzo changelog. About compression: the fact that we used lzop, which privileges speed over compression ratio, suggests we don't care that much about it.
Hi, guys

Is there anything else I should pay attention to after changing lzop to gzip in /etc/libvirt/qemu.conf? I'm hitting an exception in vdsm.log that reports a "gzip -dc" problem. If I change back to save_image_format="lzop", the problem disappears.

---------
2019-04-17 19:33:58,835+0800 ERROR (vm/baaabf11) [virt.vm] (vmId='baaabf11-172d-4fac-9862-86ff05feb085') The vm start process failed (vm:937)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 866, in _startUnderlyingVm
    self._run()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2819, in _run
    self._connection.restore(fname)
  File "/usr/lib/python2.7/site-packages/vdsm/common/libvirtconnection.py", line 131, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/common/function.py", line 94, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 4513, in restore
    if ret == -1: raise libvirtError ('virDomainRestore() failed', conn=self)
libvirtError: internal error: Child process (gzip -dc) unexpected exit status 2:
gzip: stdin: decompression OK, trailing garbage ignored

2019-04-17 19:33:58,835+0800 INFO (vm/baaabf11) [virt.vm] (vmId='baaabf11-172d-4fac-9862-86ff05feb085') Changed state to Down: internal error: Child process (gzip -dc) unexpected exit status 2:
gzip: stdin: decompression OK, trailing garbage ignored
(code=1) (vm:1675)
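The exit status 2 above is gzip signalling a warning, not a decompression failure, and it is easy to reproduce on any host by appending non-zero bytes after a valid gzip stream (a stand-in here for whatever stale data follows the saved state on an oversized LV):

```shell
# Valid gzip stream followed by non-gzip bytes, as a reader sees on a
# block device larger than the saved data.
f=/tmp/demo.gz
printf 'hello' | gzip -c > "$f"
printf 'GARBAGE' >> "$f"
gzip -dc "$f" > /dev/null
echo "exit status: $?"   # warns "trailing garbage ignored" and exits 2
```

Note that gzip silently ignores trailing zero bytes (a tape-drive legacy documented in its manual), so whether this warning actually fires depends on what happens to be in the LV after the end of the stream.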
BTW, why is this a feature request and not a bug?
(In reply to Marina Kalinin from comment #55)
> BTW, why is this a feature request and not a bug?
See comment 45. Libvirt stores the data in a file and relies on the filesystem to report EOF once all the data stored in the file has been read. It may be a bug in RHV since it passes a block device rather than a regular file as the file to store the data in. But from libvirt's point of view this is a request to support a new use case.
Hi folks,
How should I avoid the "gzip -dc" error? Should I submit a new bug for vdsm?
*** Bug 1708031 has been marked as a duplicate of this bug. ***
Why don't we just change it to gzip until it's fixed? https://bugzilla.redhat.com/show_bug.cgi?id=1503468#c33
Easy enough to do
(In reply to Ryan Barry from comment #61)
> Easy enough to do
Ryan, I see our KCS says that it is resolved on the RHV side: https://access.redhat.com/solutions/4195071, but I have no proof other than that. Is there a BZ for that? If true, is this bug still relevant for RHV?
Same bug, Marina. Patch is here: https://gerrit.ovirt.org/#/c/90234/

But it's not a complete fix due to https://bugzilla.redhat.com/show_bug.cgi?id=1708031

We'd like to change back to lzo if it can be made reasonably fast again.
Is this still reproducible?
Tried to reproduce on rhel-av8.1.1 but encountered another issue:

pkgs:
libvirt-5.6.0-9.module+el8.1.1+4955+f0b25565.x86_64
qemu-kvm-4.1.0-17.module+el8.1.1+5019+2d64ad78.x86_64

steps:
1. Log in to an iscsi disk
2. # pvcreate /dev/sdc
# vgcreate d8a85c79-4150-47f0-a590-29804675292a /dev/sdc
# vgs
# lvcreate -L 15G -n lv01 d8a85c79-4150-47f0-a590-29804675292a
# lvcreate -L 2350m -n lv02 d8a85c79-4150-47f0-a590-29804675292a
3. # qemu-img convert /var/lib/libvirt/images/yqz-RHEL-8.1.0-20191015.0-x86_64.qcow2 /dev/d8a85c79-4150-47f0-a590-29804675292a/lv01
# tail -1 /etc/libvirt/qemu.conf
save_image_format="lzop"
# systemctl restart libvirtd
# virsh start yqz
# virsh dumpxml yqz | grep 'disk t' -A8
    <disk type='block' device='disk' snapshot='no'>
      <driver name='qemu' type='raw' cache='none' error_policy='stop' io='native'/>
      <source dev='/dev/d8a85c79-4150-47f0-a590-29804675292a/lv01'/>
      <backingStore/>
      <target dev='sda' bus='scsi'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>
4. # time virsh save yqz /dev/d8a85c79-4150-47f0-a590-29804675292a/lv02
Domain yqz saved to /dev/d8a85c79-4150-47f0-a590-29804675292a/lv02
5. # time virsh restore /dev/d8a85c79-4150-47f0-a590-29804675292a/lv02
error: Failed to restore domain from /dev/d8a85c79-4150-47f0-a590-29804675292a/lv02
error: internal error: child reported (status=125): unable to get SELinux context of /dev/d8a85c79-4150-47f0-a590-29804675292a/lv02: No such file or directory

Check error in libvirtd.log:
# cat /var/log/libvirt/libvirtd.log | grep 'error :'
2019-12-10 09:07:03.252+0000: 8151: error : virProcessRunInFork:1170 : internal error: child reported (status=125): unable to get SELinux context of /dev/d8a85c79-4150-47f0-a590-29804675292a/lv02: No such file or directory
2019-12-10 09:07:03.465+0000: 8151: error : virCommandWait:2796 : internal error: Child process (lzop -dc --ignore-warn) unexpected fatal signal 13: 2019-12-10 09:07:03.164+0000: 10028: debug : virFileClose:114 : Closed fd 32
2019-12-10 09:07:03.465+0000: 8151: error : qemuDomainObjBeginNestedJob:8048 : internal error: unexpected async job 6 type expected 0

Additional info:
1. Setting SELinux to permissive makes it succeed:
# setenforce 0
# time virsh restore /dev/d8a85c79-4150-47f0-a590-29804675292a/lv02
Domain restored from /dev/d8a85c79-4150-47f0-a590-29804675292a/lv02
real 0m2.512s
user 0m0.013s
sys 0m0.007s
(In reply to Ryan Barry from comment #64)
> Same bug, Marina. Patch is here: https://gerrit.ovirt.org/#/c/90234/
>
> But it's not a complete fix due to
> https://bugzilla.redhat.com/show_bug.cgi?id=1708031
>
> We'd like to change back to lzo if it can be made reasonably fast again

Hi Arik, Avihai,

It is an old bug, and might have been resolved. Can you please advise? Who can tell if it is not reproducible in 4.4 anymore?
(In reply to Marina Kalinin from comment #69)
> Who can tell if it is not reproducible in 4.4 anymore?
I think RHV storage QE can do this by running the test that was described in comment 33 on RHV 4.4.3 (RHEL 8.3).
(In reply to Marina Kalinin from comment #69)
> (In reply to Ryan Barry from comment #64)
> > Same bug, Marina. Patch is here: https://gerrit.ovirt.org/#/c/90234/
> >
> > But it's not a complete fix due to
> > https://bugzilla.redhat.com/show_bug.cgi?id=1708031
> >
> > We'd like to change back to lzo if it can be made reasonably fast again
>
> Hi Arik, Avihai,
>
> It is an old bug, and might have been resolved.
> Can you please advise?
> Who can tell if it is not reproducible in 4.4 anymore?
Shir, can someone on the team please check this out and see if it reproduces?
This bug doesn't reproduce in 4.4. I checked it with the following steps on the latest build of 4.4.7-2:
vdsm-4.40.70.2-1.el8ev.x86_64
libvirt-7.0.0-14.1.module+el8.4.0+11095+d46acebf.x86_64
ovirt-engine-4.4.7.1-0.9.el8ev.noarch

1. Create a template from an imported glance bootable image -> the latest RHEL
2. Create a VM from the template using an ISCSI storage domain (also checked on another VM with an NFS SD for comparison)
3. Start the VM
4. Suspend the VM
5. Restart/resume the VM
From our check it seems that the issue with lzop is fixed on rhel 8. Shir, can you please verify it?
(In reply to Liran Rotenberg from comment #77)
> From our check it seems that the issue with lzop is fixed on rhel 8.
> Shir, can you please verify it?
The context for this is bz 1978672 - we'd like to switch back to lzop to overcome an issue with saving the memory volume on block devices.
(In reply to Liran Rotenberg from comment #77)
> From our check it seems that the issue with lzop is fixed on rhel 8.
> Shir, can you please verify it?
Comment 76 was checked with save_image_format="gzip". I added new details and logs to bug 1978672 when I checked it with save_image_format="lzop" (thin/pre disk), and it was very slow (12-17 min).