Description of problem:
Writing to thin isci disk with dd pauses VM for 20-40 seconds during lvextend

Version-Release number of selected component (if applicable):
rhevm-3.6.0.3-0.1.el6.noarch
vdsm-4.17.12-0.el7ev.noarch

How reproducible:
100%

Steps to Reproduce:
1. Create a VM with a thin iSCSI disk and install a RHEL OS
2. Add a second thin iSCSI disk of 10g
3. Install a file system on the second disk
4. Using dd write 5G of zero's to the disk
>>>>> During the write the VM is paused up to 40 minutes during the lvextend operation

Actual results:
The VM is paused during the write operation

Expected results:
The VM should not be paused

Additional info:

From engine log:
---------------------------------------------------------
.tlv.redhat.com'
2016-01-03 16:12:30,057 INFO [org.ovirt.engine.core.vdsbroker.VmAnalyzer] (DefaultQuartzScheduler_Worker-97) [671c66c1] VM '4c041a95-81e2-4114-bbb3-dae0f9188f49(kwolf_vm) is running in db and not running in VDS 'leopard01.qa.lab.tlv.redhat.com'
2016-01-03 16:12:30,310 INFO [org.ovirt.engine.core.bll.ProcessDownVmCommand] (org.ovirt.thread.pool-7-thread-32) [725625d5] Running command: ProcessDownVmCommand internal: true.
2016-01-03 16:17:39,007 INFO [org.ovirt.engine.core.vdsbroker.VmAnalyzer] (DefaultQuartzScheduler_Worker-62) [49f4e0c4] VM '5a795aea-0ea4-48b4-8e9b-bee0be82e399'(vm1) moved from 'Up' --> 'Paused'
2016-01-03 16:17:39,500 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-62) [49f4e0c4] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM vm1 has been paused.
2016-01-03 16:17:39,575 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-62) [49f4e0c4] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM vm1 has been paused due to no Storage space error.
2016-01-03 16:17:44,792 ERROR [org.ovirt.engine.core.bll.host.provider.foreman.SystemProviderFinder] (ajp-/127.0.0.1:8702-10) [] Failed to find host on any provider by host name 'kevin-rhevm.scl.lab.tlv.redhat.com'
2016-01-03 16:18:18,391 INFO [org.ovirt.engine.core.vdsbroker.VmAnalyzer] (DefaultQuartzScheduler_Worker-97) [671c66c1] VM '5a795aea-0ea4-48b4-8e9b-bee0be82e399'(vm1) moved from 'Paused' --> 'Up'
2016-01-03 16:18:18,571 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-97) [671c66c1] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM vm1 has recovered from paused back to up.
2016-01-03 16:18:19,617 ERROR [org.ovirt.engine.core.bll.host.provider.foreman.SystemProviderFinder] (ajp-/127.0.0.1:8702-19) [] Failed to find host on any provider by host name 'kevin-rhevm.scl.lab.tlv.redhat.com'

From vdsm log:
-----------------------------
1b01b075-637e-43de-92d5-844db6e371ec::DEBUG::2016-01-03 16:17:46,531::storage_mailbox::146::Storage.SPM.Messages.Extend::(processRequest) processRequest, payload:'1xtnd2\x92\xd91\xfb,\xf1\xb7\x8dJ%\x0e\xb3\x8ap\x7f\xaf\xef\xb2\xa3\xf9T\x7f\x91\x02J\xec\xff\xe4\x97\xd8j000000000000140000000000000'
1b01b075-637e-43de-92d5-844db6e371ec::INFO::2016-01-03 16:17:46,531::storage_mailbox::161::Storage.SPM.Messages.Extend::(processRequest) processRequest: extending volume 6ad897e4-ffec-4a02-917f-54f9a3b2efaf in domain 7f708ab3-0e25-4a8d-b7f1-2cfb31d99232 (pool cce31f29-940a-487b-8e15-bffbf7870bc0) to size 5120
1b01b075-637e-43de-92d5-844db6e371ec::DEBUG::2016-01-03 16:17:46,533::lvm::290::Storage.Misc.excCmd::(cmd) /usr/bin/sudo -n /usr/sbin/lvm lvextend --config ' devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter = [ '\''a|/dev/mapper/3514f0c5a516000c9|/dev/mapper/3514f0c5a516000ca|'\'', '\''r|.*|'\'' ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { retain_min = 50 retain_days = 0 } ' --autobackup n --size 5120m 7f708ab3-0e25-4a8d-b7f1-2cfb31d99232/6ad897e4-ffec-4a02-917f-54f9a3b2efaf (cwd None)
1b01b075-637e-43de-92d5-844db6e371ec::DEBUG::2016-01-03 16:17:46,638::lvm::290::Storage.Misc.excCmd::(cmd) SUCCESS: <err> = ' WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!\n WARNING: This metadata update is NOT backed up\n'; <rc> = 0
.
.
.
9f11fb41-1103-43ce-8312-9848fbcfc776::DEBUG::2016-01-03 16:17:50,660::storage_mailbox::146::Storage.SPM.Messages.Extend::(processRequest) processRequest, payload:'1xtnd2\x92\xd91\xfb,\xf1\xb7\x8dJ%\x0e\xb3\x8ap\x7f\xaf\xef\xb2\xa3\xf9T\x7f\x91\x02J\xec\xff\xe4\x97\xd8j000000000000180000000000000'
9f11fb41-1103-43ce-8312-9848fbcfc776::INFO::2016-01-03 16:17:50,660::storage_mailbox::161::Storage.SPM.Messages.Extend::(processRequest) processRequest: extending volume 6ad897e4-ffec-4a02-917f-54f9a3b2efaf in domain 7f708ab3-0e25-4a8d-b7f1-2cfb31d99232 (pool cce31f29-940a-487b-8e15-bffbf7870bc0) to size 6144
9f11fb41-1103-43ce-8312-9848fbcfc776::DEBUG::2016-01-03 16:17:50,661::lvm::290::Storage.Misc.excCmd::(cmd) /usr/bin/sudo -n /usr/sbin/lvm lvextend --config ' devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter = [ '\''a|/dev/mapper/3514f0c5a516000c8|/dev/mapper/3514f0c5a516000c9|/dev/mapper/3514f0c5a516000ca|/dev/mapper/3514f0c5a516000cb|/dev/mapper/3514f0c5a516000cc|'\'', '\''r|.*|'\'' ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { retain_min = 50 retain_days = 0 } ' --autobackup n --size 6144m 7f708ab3-0e25-4a8d-b7f1-2cfb31d99232/6ad897e4-ffec-4a02-917f-54f9a3b2efaf (cwd None)
9f11fb41-1103-43ce-8312-9848fbcfc776::DEBUG::2016-01-03 16:17:50,900::lvm::290::Storage.Misc.excCmd::(cmd) SUCCESS: <err> = ' WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!\n WARNING: This metadata update is NOT backed up\n'; <rc> = 0
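Note that from the timestamps in the vdsm log above, each lvextend command completes in about 100 ms; the observed pause is therefore dominated by everything around it, not by LVM itself. A small sketch of that arithmetic (the helper name is mine; it only assumes vdsm's `%Y-%m-%d %H:%M:%S,%f` timestamp format):

```python
from datetime import datetime

def vdsm_ts_delta(start: str, end: str) -> float:
    """Return seconds elapsed between two vdsm log timestamps."""
    fmt = "%Y-%m-%d %H:%M:%S,%f"
    return (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds()

# lvextend issued vs. SUCCESS logged, taken from the log lines above
elapsed = vdsm_ts_delta("2016-01-03 16:17:46,533", "2016-01-03 16:17:46,638")
print(round(elapsed, 3))  # 0.105 -> the LVM command itself takes ~100 ms
```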
Created attachment 1111182 [details]
engine, vdsm, server logs

Adding logs
Kevin, is the VM paused for several seconds or minutes? The description is not clear:

"Writing to thin isci disk with dd pauses VM for 20-40 seconds during lvextend"

vs.

"4. Using dd write 5G of zero's to the disk >>>>> During the write the VM is paused up to 40 minutes during the lvextend operation"

Also, you have not provided information on the storage specs, etc.
Created attachment 1112396 [details]
logs-7.1.15

Checked on my setup with:
rhevm-3.6.2-0.1.el6.noarch
vdsm-4.17.15-0.el7ev.noarch
libvirt-daemon-1.2.17-13.el7_2.2.x86_64
qemu-kvm-rhev-2.3.0-31.el7_2.5.x86_64
kernel-3.10.0-327.4.4.el7.x86_64
lvm2-2.02.130-5.el7_2.1.x86_64
sanlock-3.2.4-2.el7_2.x86_64
device-mapper-multipath-0.4.9-85.el7.x86_64
iscsi-initiator-utils-6.2.0.873-32.el7.x86_64

Tested over iSCSI using XtremIO as the storage backend. Writing 5G of zeros to the second sparse disk causes the VM to get paused for a short period of ~1 minute. Attached full logs.

=============================================================================
2016-01-07 08:07:07,949 INFO [org.ovirt.engine.core.vdsbroker.VmAnalyzer] (ForkJoinPool-1-worker-12) [] VM '57df2c81-e457-49ce-9e6c-d43fa9f30aa5'(1) moved from 'Up' --> 'Paused'
2016-01-07 08:07:08,285 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-12) [] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM 1 has been paused.
2016-01-07 08:07:08,551 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-12) [] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM 1 has been paused due to no Storage space error.
2016-01-07 08:08:00,099 INFO [org.ovirt.engine.core.vdsbroker.VmAnalyzer] (DefaultQuartzScheduler_Worker-81) [7e609e66] VM '57df2c81-e457-49ce-9e6c-d43fa9f30aa5'(1) moved from 'Paused' --> 'Up'
2016-01-07 08:08:00,201 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-81) [7e609e66] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM 1 has recovered from paused back to up.
=============================================================================
lvextend:
c7ed6b56-b2e3-4f2b-b875-64afac789dba::DEBUG::2016-01-07 10:07:07,522::lvm::290::Storage.Misc.excCmd::(cmd) /usr/bin/taskset --cpu-list 0-7 /usr/bin/sudo -n /usr/sbin/lvm lvextend --config ' devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter = [ '\''a|/dev/mapper/3514f0c5a516008b0|'\'', '\''r|.*|'\'' ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { retain_min = 50 retain_days = 0 } ' --autobackup n --size 2048m 7b4e9053-46a3-4bcf-aa6f-93913c573341/00000104-86f0-48b4-834f-ca79cb0131d5 (cwd None)
61c5086c-26ac-4224-9d3a-d04be27b7680::DEBUG::2016-01-07 10:08:44,927::lvm::290::Storage.Misc.excCmd::(cmd) /usr/bin/taskset --cpu-list 0-7 /usr/bin/sudo -n /usr/sbin/lvm lvextend --config ' devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter = [ '\''a|/dev/mapper/3514f0c5a51600096|/dev/mapper/3514f0c5a51600097|/dev/mapper/3514f0c5a51600098|/dev/mapper/3514f0c5a51600099|/dev/mapper/3514f0c5a5160009a|/dev/mapper/3514f0c5a516008b0|/dev/mapper/3514f0c5a516008b1|/dev/mapper/3514f0c5a516008b2|'\'', '\''r|.*|'\'' ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { retain_min = 50 retain_days = 0 } ' --autobackup n --size 9216m 7b4e9053-46a3-4bcf-aa6f-93913c573341/00000104-86f0-48b4-834f-ca79cb0131d5 (cwd None)
Nir,

It looks like the lvextend operation took only about 5 seconds, but VDSM reported that the VM was stopped for much longer (probably due to other related operations).

Is there anything we can do here?
Vdsm reports about the VM being paused or resumed are in Francesco's territory, and not related to storage.

We need to analyze the extend events, to check that we received the extend events on time, and that we handled them as expected.

Generally, I think this test is not a valid use case; when writing zeros, the storage server can optimize the write, making the write operation much faster than we can extend. We should test the extend flow with a real-world use case, such as provisioning a VM, or copying data from one storage to another.

Setting the severity to medium since this is not a valid way to test extend. Please update the severity if you can reproduce this issue during real flows.
Elad & Kevin,
Can you please try to test this with a real file (maybe dd of an ISO file)? Does the VM also get paused?
Writing zeros using dd doesn't create a file? Are you asking to copy an existing file using dd instead of creating a new one? I am not sure what the difference is for the system.
Yes, using a dd operation with a real source file, something that a user might actually do. XtremIO might have special optimizations for zero blocks, which make the operation finish really fast, and that could be the reason why the lvextend operation completed only after we got the no-space error from qemu.
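As a side note on why zeros are a special case: an array that detects all-zero (or duplicate) blocks can acknowledge the "write" after a metadata-only update, so the guest fills its disk far faster than any real I/O path would allow. A minimal simulation of that idea (this is a hypothetical model, not XtremIO's actual logic):

```python
import hashlib

def blocks_actually_written(blocks):
    """Simulate a dedup/zero-detecting array: count blocks that hit the backing store."""
    seen = set()
    written = 0
    for block in blocks:
        if block.count(0) == len(block):  # all-zero block: metadata-only, no write
            continue
        digest = hashlib.sha256(block).digest()
        if digest in seen:                # duplicate block: dedup'd, no write
            continue
        seen.add(digest)
        written += 1
    return written

zero_blocks = [bytes(4096)] * 1000  # what dd if=/dev/zero produces
unique_blocks = [(i + 1).to_bytes(4, "big") + bytes(4092) for i in range(1000)]
print(blocks_actually_written(zero_blocks))    # 0 -> the guest "writes" faster than we can extend
print(blocks_actually_written(unique_blocks))  # 1000 -> a realistic load for testing the extend flow
```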
(In reply to Maor from comment #6)
> Elad & Kevin,
> Can you please try to test this with a real file (maybe dd of ISO file),
> does the VM also gets paused?

Kevin, can you please try?
(In reply to Aharon Canan from comment #9)
> (In reply to Maor from comment #6)
> > Elad & Kevin,
> > Can you please try to test this with a real file (maybe dd of ISO file),
> > does the VM also gets paused?
>
> Kevin,
>
> can you please try?

I copied an ISO file to one of the disks using 'wget':
wget https://download.fedoraproject.org/pub/fedora/linux/releases/23/Workstation/x86_64/iso/Fedora-Live-Workstation-x86_64-23-10.iso

Then I copied this file to the 2nd disk using 'cp' as follows:
cp data/Fedora-Live-Workstation-x86_64-23-10.iso data/1
cp data/Fedora-Live-Workstation-x86_64-23-10.iso data/2
cp data/Fedora-Live-Workstation-x86_64-23-10.iso data/3

The VM was paused again for 50 seconds.
Kevin, can you please attach the VDSM and engine logs?
Also, just out of curiosity, didn't you want to do that with 'dd' instead of 'cp'?
(In reply to Maor from comment #11)
> Kevin, can you please attach the VDSM and engine logs

vdsm.log:
------------
rts_iso/4b7cf7b1-98c0-4e60-b515-409a7daaf072/dom_md/metadata iflag=direct of=/dev/null bs=4096 count=1 (cwd None)
Thread-5188::DEBUG::2016-02-02 16:38:27,728::fileSD::173::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> = '0+1 records in\n0+1 records out\n377 bytes (377 B) copied, 0.000850107 s, 443 kB/s\n'; <rc> = 0
libvirtEventLoop::INFO::2016-02-02 16:38:27,860::vm::3728::virt.vm::(onIOError) vmId=`0ee7709a-c4db-443d-ad5e-8c4cb941c62d`::abnormal vm stop device virtio-disk2 error enospc
libvirtEventLoop::INFO::2016-02-02 16:38:27,860::vm::5084::virt.vm::(_logGuestCpuStatus) vmId=`0ee7709a-c4db-443d-ad5e-8c4cb941c62d`::CPU stopped: onIOError
libvirtEventLoop::INFO::2016-02-02 16:38:27,895::vm::1026::virt.vm::(extendDrivesIfNeeded) vmId=`0ee7709a-c4db-443d-ad5e-8c4cb941c62d`::Requesting extension for volume d9e9897b-fdb1-4c73-9ff9-a5d9b0ec2be6 on domain 2deae191-8162-4b46-8135-0f390c159ded (apparent: 4294967296, capacity: 11811160064, allocated: 4302503424, physical: 4294967296)
libvirtEventLoop::DEBUG::2016-02-02 16:38:27,895::vm::1091::virt.vm::(__extendDriveVolume) vmId=`0ee7709a-c4db-443d-ad5e-8c4cb941c62d`::Requesting an extension for the volume: {'newSize': 5368709120, 'domainID': u'2deae191-8162-4b46-8135-0f390c159ded', 'name': u'vdc', 'volumeID': u'd9e9897b-fdb1-4c73-9ff9-a5d9b0ec2be6', 'imageID': u'9ffc9eac-56d0-4583-9d73-5b67a75b0f6f', 'internal': False, 'poolID': u'c96b5ce2-9b6b-467f-85f2-ef212eb3ebc2'}
libvirtEventLoop::DEBUG::2016-02-02 16:38:27,896::task::595::Storage.TaskManager.Task::(_updateState) Task=`57fa165a-27dd-4ed0-8807-48598958d36e`::moving from state init -> state preparing
libvirtEventLoop::INFO::2016-02-02 16:38:27,896::logUtils::48::dispatcher::(wrapper) Run and protect: sendExtendMsg(spUUID=u'c96b5ce2-9b6b-467f-85f2-ef212eb3ebc2', volDict={'newSize': 5368709120, 'domainID': u'2deae191-8162-4b46-8135-0f390c159ded', 'name': u'vdc', 'volumeID': u'd9e9897b-fdb1-4c73-9ff9-a5d9b0ec2be6', 'imageID': u'9ffc9eac-56d0-4583-9d73-5b67a75b0f6f', 'internal': False, 'poolID': u'c96b5ce2-9b6b-467f-85f2-ef212eb3ebc2'}, newSize=5368709120, callbackFunc=<bound method Vm.__afterVolumeExtension of <virt.vm.Vm object at 0x7f96ec029590>>)
libvirtEventLoop::DEBUG::2016-02-02 16:38:27,896::storage_mailbox::123::Storage.SPM.Messages.Extend::(__init__) new extend msg created: domain: 2deae191-8162-4b46-8135-0f390c159ded, volume: d9e9897b-fdb1-4c73-9ff9-a5d9b0ec2be6
libvirtEventLoop::INFO::2016-02-02 16:38:27,897::logUtils::51::dispatcher::(wrapper) Run and protect: sendExtendMsg, Return response: None
libvirtEventLoop::DEBUG::2016-02-02 16:38:27,897::task::1191::Storage.TaskManager.Task::(prepare) Task=`57fa165a-27dd-4ed0-8807-48598958d36e`::finished: None
libvirtEventLoop::DEBUG::2016-02-02 16:38:27,897::task::595::Storage.TaskManager.Task::(_updateState) Task=`57fa165a-27dd-4ed0-8807-48598958d36e`::moving from state preparing -> state finished
libvirtEventLoop::DEBUG::2016-02-02 16:38:27,897::resourceManager::943::Storage.ResourceManager.Owner::(releaseAll) Owner.releaseAll requests {} resources {}
libvirtEventLoop::DEBUG::2016-02-02 16:38:27,897::resourceManager::980::Storage.ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}
libvirtEventLoop::DEBUG::2016-02-02 16:38:27,897::task::993::Storage.TaskManager.Task::(_decref) Task=`57fa165a-27dd-4ed0-8807-48598958d36e`::ref 0 aborting False
libvirtEventLoop::DEBUG::2016-02-02 16:38:27,898::__init__::206::jsonrpc.Notification::(emit) Sending event {"params": {"0ee7709a-c4db-443d-ad5e-8c4cb941c62d": {"status": "Paused", "ioerror": {"alias": "virtio-disk2", "name": "vdc", "path": "/rhev/data-center/c96b5ce2-9b6b-467f-85f2-ef212eb3ebc2/2deae191-8162-4b46-8135-0f390c159ded/images/9ffc9eac-56d0-4583-9d73-5b67a75b0f6f/d9e9897b-fdb1-4c73-9ff9-a5d9b0ec2be6"}, "pauseCode": "ENOSPC"}, "notify_time": 4464231980}, "jsonrpc": "2.0", "method": "|virt|VM_status|0ee7709a-c4db-443d-ad5e-8c4cb941c62d"}
libvirtEventLoop::DEBUG::2016-02-02 16:38:27,899::vm::4563::virt.vm::(onLibvirtLifecycleEvent) vmId=`0ee7709a-c4db-443d-ad5e-8c4cb941c62d`::event Suspended detail 2 opaque None
libvirtEventLoop::INFO::2016-02-02 16:38:27,899::vm::5084::virt.vm::(_logGuestCpuStatus) vmId=`0ee7709a-c4db-443d-ad5e-8c4cb941c62d`::CPU stopped: onSuspend
libvirtEventLoop::INFO::2016-02-02 16:38:27,899::vm::3728::virt.vm::(onIOError) vmId=`0ee7709a-c4db-443d-ad5e-8c4cb941c62d`::abnormal vm stop device virtio-disk2 error enospc
libvirtEventLoop::INFO::2016-02-02 16:38:27,899::vm::5084::virt.vm::(_logGuestCpuStatus) vmId=`0ee7709a-c4db-443d-ad5e-8c4cb941c62d`::CPU stopped: onIOError
libvirtEventLoop::INFO::2016-02-02 16:38:27,910::vm::1026::virt.vm::(extendDrivesIfNeeded) vmId=`0ee7709a-c4db-443d-ad5e-8c4cb941c62d`::Requesting extension for volume d9e9897b-fdb1-4c73-9ff9-a5d9b0ec2be6 on domain 2deae191-8162-4b46-8135-0f390c159ded (apparent: 4294967296, capacity: 11811160064, allocated: 4302503424, physical: 4294967296)
libvirtEventLoop::DEBUG::2016-02-02 16:38:27,911::vm::1091::virt.vm::(__extendDriveVolume) vmId=`0ee7709a-c4db-443d-ad5e-8c4cb941c62d`::Requesting an extension for the volume: {'newSize': 5368709120, 'domainID': u'2deae191-8162-4b46-8135-0f390c159ded', 'name': u'vdc', 'volumeID': u'd9e9897b-fdb1-4c73-9ff9-a5d9b0ec2be6', 'imageID': u'9ffc9eac-56d0-4583-9d73-5b67a75b0f6f', 'internal': False, 'poolID': u'c96b5ce2-9b6b-467f-85f2-ef212eb3ebc2'}
libvirtEventLoop::DEBUG::2016-02-02 16:38:27,911::task::595::Storage.TaskManager.Task::(_updateState) Task=`dacd6406-591b-46fd-9255-36bac2859335`::moving from state init -> state preparing
libvirtEventLoop::INFO::2016-02-02 16:38:27,911::logUtils::48::dispatcher::(wrapper) Run and protect: sendExtendMsg(spUUID=u'c96b5ce2-9b6b-467f-85f2-ef212eb3ebc2', volDict={'newSize': 5368709120, 'domainID': u'2deae191-8162-4b46-8135-0f390c159ded', 'name': u'vdc', 'volumeID': u'd9e9897b-fdb1-4c73-9ff9-a5d9b0ec2be6', 'imageID': u'9ffc9eac-56d0-4583-9d73-5b67a75b0f6f', 'internal': False, 'poolID': u'c96b5ce2-9b6b-467f-85f2-ef212eb3ebc2'}, newSize=5368709120, callbackFunc=<bound method Vm.__afterVolumeExtension of <virt.vm.Vm object at 0x7f96ec029590>>)
libvirtEventLoop::DEBUG::2016-02-02 16:38:27,912::storage_mailbox::123::Storage.SPM.Messages.Extend::(__init__) new extend msg created: domain: 2deae191-8162-4b46-8135-0f390c159ded, volume: d9e9897b-fdb1-4c73-9ff9-a5d9b0ec2be6

engine.log:
---------------
2016-02-02 16:31:43,666 INFO [org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase] (VdsDeploy) [] Stage: Pre-termination
2016-02-02 16:31:43,692 INFO [org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase] (VdsDeploy) [] Retrieving installation logs to: '/var/log/ovirt-engine/host-deploy/ovirt-host-mgmt-20160202163143-10.35.102.15-null.log'
2016-02-02 16:31:43,863 INFO [org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase] (VdsDeploy) [] Stage: Termination
2016-02-02 16:38:26,658 INFO [org.ovirt.engine.core.vdsbroker.VmAnalyzer] (ForkJoinPool-1-worker-4) [] VM '0ee7709a-c4db-443d-ad5e-8c4cb941c62d'(vm1) moved from 'Up' --> 'Paused'
2016-02-02 16:38:26,735 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-4) [] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM vm1 has been paused.
2016-02-02 16:38:26,793 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-4) [] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM vm1 has been paused due to no Storage space error.
2016-02-02 16:39:11,971 INFO [org.ovirt.engine.core.vdsbroker.VmAnalyzer] (DefaultQuartzScheduler_Worker-22) [] VM '0ee7709a-c4db-443d-ad5e-8c4cb941c62d'(vm1) moved from 'Paused' --> 'Up'
2016-02-02 16:39:12,060 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-22) [] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM vm1 has recovered from paused back to up.
2016-02-02 17:01:09,052 INFO [org.ovirt.engine.core.bll.AddVmCommand] (ajp-/127.0.0.1:8702-4) [2ae17833] Lock Acquired to object 'EngineLock:{exclusiveLocks='[vm3=<VM_NAME, ACTION_TYPE_FAILED_OBJECT_LOCKED>]', sharedLocks='[00000000-0000-0000-0000-000000000000=<TEMPLATE, ACTION_TYPE_FAILED_TEMPLATE_IS_USED_FOR_CREATE_VM$VmName vm3>]'}'
2016-02-02 17:01:09,299 INFO [org.ovirt.engine.core.bll.AddVmCommand] (ajp-/127.0.0.1:8702-4) [] Running command: AddVmCommand internal: false. Entities affected : ID: dace548b-40bf-4bfa-9a76-a5fcaa058d77 Type: VdsGroupsAction group CREATE_VM with role type USER, ID: 00000000-0000-0000-0000-000000000000 Type: VmTemplateAction group CREATE_VM with role type USER
2016-02-02 17:01:09,827 INFO [org.ovirt.engine.core.bll.AddGraphicsDeviceCommand] (ajp-/127.0.0.1:8702-4) [55c5cd43] Running command: AddGraphicsDeviceCommand internal: true. Entities affected : ID: 0e283ef3-1fd6-43ac-84da-bbc1815baff2 Type: VMAction group EDIT_VM_PROPERTIES with role type USER
2016-02-02 17:01:09,964 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ajp-/127.0.0.1:8702-4) [55c5cd43] Correlation ID: 2ae17833, Job ID: baab7406-d79d-4166-b2ea-702d37771164, Call Stack: null, Custom Event ID: -1, Message: VM vm3 was created by admin@internal.
2016-02-02 17:01:10,025 INFO [org.ovirt.engine.core.bll.AddVmCommand] (ajp-/127.0.0.1:8702-4) [55c5cd43] Lock freed to object 'EngineLock:{exclusiveLocks='[vm3=<VM_NAME, ACTION_TYPE_FAILED_OBJECT_LOCKED>]', sharedLocks='[00000000-0000-0000-0000-000000000000=<TEMPLATE, ACTION_TYPE_FAILED_TEMPLATE_IS_USED_FOR_CREATE_VM$VmName vm3>]'}'
2016-02-02 17:01:11,130 INFO [org.ovirt.engine.core.bll.network.vm.AddVmInterfaceCommand] (ajp-/127.0.0.1:8702-7) [20721d59] Running command: AddVmInterfaceCommand internal: false. Entities affected : ID: 0e283ef3-1fd6-43ac-84da-bbc1815baff2 Type: VMAction group CONFIGURE_VM_NETWORK with role type USER, ID: 14e712d8-f4e4-4812-8094-bd1ec406e214 Type: VnicProfileAction group CONFIGURE_VM_NETWORK with role type USER
2016-02-02 17:01:11,517 INFO [org.ovirt.engine.core.bll.network.vm.ActivateDeactivateVmNicCommand] (ajp-/127.0.0.1:8702-7) [6c5fbdab] Running command: ActivateDeactivateVmNicCommand internal: true. Entities affected : ID: 0e283ef3-1fd6-43ac-84da-bbc1815baff2 Type: VM
2016-02-02 17:01:11,701 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ajp-/127.0.0.1:8702-7) [6c5fbdab] Correlation ID: 6c5fbdab, Call Stack: null, Custom Event ID: -1, Message: Network Interface nic1 (VirtIO) was plugged to VM vm3. (User: admin@internal)
2016-02-02 17:01:11,782 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ajp-/127.0.0.1:8702-7) [6c5fbdab] Correlation ID: 20721d59, Call Stack: null, Custom Event ID: -1, Message: Interface nic1 (VirtIO) was added to VM vm3. (User: admin@internal)
2016-02-02 17:01:11,966 INFO [org.ovirt.engine.core.bll.network.vm.ReorderVmNicsCommand] (ajp-/127.0.0.1:8702-15) [16dafa38] Running command: ReorderVmNicsCommand internal: false. Entities affected : ID: 0e283ef3-1fd6-43ac-84da-bbc1815baff2 Type: VMAction group CREATE_VM with role type USER
2016-02-02 17:01:16,237 INFO [org.ovirt.engine.core.bll.AddDiskCommand] (ajp-/127.0.0.1:8702-7) [1c83a37a] Lock Acquired to object 'EngineLock:{exclusiveLocks='[0e283ef3-1fd6-43ac-84da-bbc1815baff2=<VM_DISK_BOOT, ACTION_TYPE_FAILED_OBJECT_LOCKED>]', sharedLocks='[0e283ef3-1fd6-43ac-84da-bbc1815baff2=<VM, ACTION_TYPE_FAILED_OBJECT_LOCKED>]'}'
2016-02-02 17:01:16,364 INFO [org.ovirt.engine.core.bll.AddDiskCommand] (ajp-/127.0.0.1:8702-7) [1c83a37a] Running command: AddDiskCommand internal: false. Entities affected : ID: 0e283ef3-1fd6-43ac-84da-bbc1815baff2 Type: VMAction group CONFIGURE_VM_STORAGE with role type USER, ID: 2deae191-8162-4b46-8135-0f390c159ded Type: StorageAction group CREATE_DISK with role type USER
2016-02-02 17:01:16,399 INFO [org.ovirt.engine.core.bll.AddImageFromScratchCommand] (ajp-/127.0.0.1:8702-7) [47b383] Running command: AddImageFromScratchCommand internal: true. Entities affected : ID: 2deae191-8162-4b46-8135-0f390c159ded Type: Storage
2016-02-02 17:01:16,550 INFO [org.ovirt.engine.core.bll.AddImageFromScratchCommand] (ajp-/127.0.0.1:8702-7) [47b383] Lock freed to object 'EngineLock:{exclusiveLocks='[0e283ef3-1fd6-43ac-84da-bbc1815baff2=<VM_DISK_BOOT, ACTION_TYPE_FAILED_OBJECT_LOCKED>]', sharedLocks='[0e283ef3-1fd6-43ac-84da-bbc1815baff2=<VM, ACTION_TYPE_FAILED_OBJECT_LOCKED>]'}'
2016-02-02 17:01:17,602 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.CreateImageVDSCommand] (ajp-/127.0.0.1:8702-7) [47b383] START, CreateImageVDSCommand( CreateImageVDSCommandParameters:{runAsync='true', storagePoolId='c96b5ce2-9b6b-467f-85f2-ef212eb3ebc2', ignoreFailoverLimit='false', storageDomainId='2deae191-8162-4b46-8135-0f390c159ded', imageGroupId='bcab9f13-d6e0-490a-99d5-4142a177287a', imageSizeInBytes='5368709120', volumeFormat='COW', newImageId='1d14a138-21eb-4376-b12b-f6d8b6eb2c54', newImageDescription='{"DiskAlias":"vm3_Disk1","DiskDescription":""}', imageInitialSizeInBytes='0'}), log id: 2e2be3b3
2016-02-02 17:01:17,690 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.CreateImageVDSCommand] (ajp-/127.0.0.1:8702-7) [47b383] -- executeIrsBrokerCommand: calling 'createVolume' with two new parameters: description and UUID
2016-02-02 17:01:18,790 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.CreateImageVDSCommand] (ajp-/127.0.0.1:8702-7) [47b383] FINISH, CreateImageVDSCommand, return: 1d14a138-21eb-4376-b12b-f6d8b6eb2c54, log id: 2e2be3b3
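For reference, the 'Requesting extension' lines in the vdsm log above ask to grow the volume from an apparent size of 4 GiB to a newSize of 5 GiB, i.e. exactly one chunk, triggered because qcow2 allocation already exceeds the LV size. A sketch of that next-size computation (assuming a 1 GiB extension chunk; the helper name is mine, not vdsm's):

```python
GiB = 1024 ** 3
CHUNK = 1 * GiB  # assumed default extension chunk size

def next_extend_size(apparent: int, chunk: int = CHUNK) -> int:
    """Next requested LV size: current apparent size plus one extension chunk."""
    return apparent + chunk

# values taken from the extendDrivesIfNeeded log line above
apparent, capacity, allocated = 4294967296, 11811160064, 4302503424
assert allocated > apparent  # qcow2 already needs more than the LV provides -> ENOSPC
print(next_extend_size(apparent))  # 5368709120, matching 'newSize' in the request
```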
Created attachment 1120791 [details]
engine, vdsm, server logs

Added requested logs
(In reply to Nir Soffer from comment #5)
> Vdsm reports about vm being paused or resume are in Francesco territory, and
> not related to storage.
>
> We need to analyze the extend events, to check that we did receive extend
> events on time, and that we did handle them on as expected.
>
> Generally, I think this test is not valid use case; when writing zeros,
> storage server can optimize the write, making the write operation much
> faster then we can extend. We should test extend flow with a real world use
> case, such as provisioning a vm, or copying data from one storage to
> another storage.

Specifically, XtremIO indeed optimizes that and does not write anything. In fact, even metadata is not allocated (reads from unallocated blocks return zero, so it's the same thing).

> Setting the severity to medium since this is not a valid way to test extend.
> Please update the severity if you can reproduce this issue during real
> flows.

Agreed. Setting NEEDINFO on reporter.
Created attachment 1130862 [details]
Repoplot of vdsm.log
Looking at the repoplot of the vdsm log (attachment 1130862 [details]) we can see slow mailbox check-for-mail commands (1 second) during the entire log (16:00 - 19:00).

At 18:36, we see a very slow mailbox check-for-mail command (5 seconds). At the same time, there is also a high read delay (6 seconds). A slow SPM mailbox check leads to a slow extend, which may lead to pausing of a VM.

So a possible theory for this failure is:
1. Doing a big copy, saturating the storage bandwidth
2. The mailbox check-for-mail commands become slow
3. The extend operation is delayed
4. The VM gets paused

Todo:
1. Understand why we see a slow check for mail (1 second) - this should be much faster. On my poor storage (targetcli, 1Gib nic), this takes about 50 milliseconds.
2. Check the logs around 18:36, to understand if the VM paused at this point.

Aharon, can you describe the environment where this was tested?
- iscsi/fc
- nics used for storage
- storage server
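The theory above can also be checked mechanically, without repoplot: pull consecutive mailbox timestamps out of vdsm.log and compute the gap for each check. A rough sketch (the sample lines below are fabricated for illustration; real entries use the same `::`-separated vdsm timestamp format):

```python
import re
from datetime import datetime

# match the timestamp of any storage_mailbox log entry
TS = re.compile(r"::(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})::storage_mailbox::")

def mailbox_gaps(lines):
    """Seconds between consecutive storage_mailbox log entries."""
    stamps = [datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S,%f")
              for line in lines if (m := TS.search(line))]
    return [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]

# fabricated sample entries: checks 1 second apart, then a 5-second stall
sample = [
    "mailbox::DEBUG::2016-01-03 18:36:00,000::storage_mailbox::731::checkForMail",
    "mailbox::DEBUG::2016-01-03 18:36:01,000::storage_mailbox::731::checkForMail",
    "mailbox::DEBUG::2016-01-03 18:36:06,000::storage_mailbox::731::checkForMail",
]
print(mailbox_gaps(sample))  # [1.0, 5.0] -> the 5-second stall would show up here
```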
Created attachment 1130868 [details]
Repoplot of vdsm.log (fixed)

This version fixes the display of values around 1 second.
Regarding the high mailbox check-for-mail dd commands, this was the result of truncating timestamps to seconds resolution. Attachment 1130862 [details] shows that the values are pretty low and should not introduce a significant delay in the mailbox mechanism.
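Truncating timestamps to whole seconds can make a sub-second operation look like a full second, which is what made the check-for-mail times appear high. A small illustration (the timestamps here are made up):

```python
from datetime import datetime

fmt_full = "%Y-%m-%d %H:%M:%S,%f"
fmt_trunc = "%Y-%m-%d %H:%M:%S"

start = "2016-01-03 16:17:46,980"
end = "2016-01-03 16:17:47,030"  # real duration: 50 ms, crossing a second boundary

real = (datetime.strptime(end, fmt_full) - datetime.strptime(start, fmt_full)).total_seconds()
trunc = (datetime.strptime(end[:19], fmt_trunc) - datetime.strptime(start[:19], fmt_trunc)).total_seconds()

print(real)   # 0.05 -> the actual check time
print(trunc)  # 1.0  -> the apparent "1 second" after truncation
```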
(In reply to Nir Soffer from comment #19)
> of truncating timestamps to seconds resolution. Attachment 1130862 [details]

Correction: attachment 1130868 [details]
Info needed from Kevin
Based on the discussion above, there doesn't seem to be anything actionable here - this is a "by design" limitation. Pushing out to 4.0 to reconsider.
(In reply to Yaniv Kaul from comment #2)
> Kevin, is the VM paused for several seconds or minutes? The description is
> not clear:
>
> "Writing to thin isci disk with dd pauses VM for 20-40 seconds during
> lvextend"
>
> vs.
>
> "4. Using dd write 5G of zero's to the disk >>>>> During the write the VM is
> paused up to 40 minutes during the lvextend operation"
>
> Also, you have not provided information on the storage specs, etc.

The VM is paused for 20-40 seconds, not minutes. Typo. Sorry.
(In reply to Nir Soffer from comment #17)
> Looking in repoplot of vdsm log (attachment 1130862 [details]) we can see
> slow mailbox check for mail commands (1 second) during the entire log
> (16:00 - 19:00).
>
> At 18:36, we see very slow mailbox check for mail command (5 seconds). At
> the same time, there is also high read delay (6 seconds). Slow spm mailbox
> check leads to slow extend, which may lead to pausing of a vm.
>
> So a possible theory for this failure is:
>
> 1. Doing big copy, saturating the storage bandwidth
> 2. Mailbox check for mail commands become slow
> 3. Extend operation delayed
> 4. Vm get paused
>
> Todo:
>
> 1. Understand why we see slow check for mail (1 second) - this should be
>    much faster. On my poor storage (targetcli, 1Gib nic), this takes about
>    50 milliseconds.
>
> 2. Check the logs around 18:36, to understand if the vm paused at this
>    point.
>
> Aharon, can you describe the environment where this was tested?
> - iscsi/fc
> - nics used for storage
> - storage server

Environment is:
iscsi
1 Nic
ExtremeIO
(In reply to Aharon Canan from comment #21)
> Info needed from Kevin

In response to Comment 17, the environment is as follows:
iscsi
1 Nic
ExtremeIO
(In reply to Kevin Alon Goldblatt from comment #25)
> (In reply to Aharon Canan from comment #21)
> > Info needed from Kevin
>
> In response to Comment 17,
> The environment is as follows:
> iscsi
> 1 Nic

10g or 1g? In any case, quite an unusual configuration for iSCSI - usually, multipathing is employed.

> ExtremeIO

XtremIO.

Did you check the latency on the XtremIO of that initiator? Volume?

Regardless, it should not take so long. However, you are writing zeros; XtremIO does not write zeros, it's just a metadata operation. Therefore, it is a very fast (and unrealistic) scenario. Did it ever reproduce with any other IO pattern? I assume a 1G or 10G link - you can probably get close to saturating the link?
Created attachment 1133561 [details]
Repoplot of vdsm.log (improved)

Created with the latest version of repoplot, providing more information.
(In reply to Yaniv Kaul from comment #26)
> (In reply to Kevin Alon Goldblatt from comment #25)
> > (In reply to Aharon Canan from comment #21)
> > > Info needed from Kevin
> >
> > In response to Comment 17,
> > The environment is as follows:
> > iscsi
> > 1 Nic
>
> 10g or 1g? In any case, quite an unusual configuration for iSCSI - usually,
> multipathing is employed.

We are using multipathing of 4 paths.
We are using a 1G ethernet connection from the hosts.

> > ExtremeIO
>
> XtremIO.
>
> Did you check the latency on the XtremIO of that initiator? Volume?
>
> Regardless, it should not take so long.
> However, you are writing zero's. XtremIO does not write zeros, it's just
> metadata operation. Therefore, it is very fast (and unrealistic) scenario.
> Did it ever reproduce with any other IO pattern? I assume 1G or 10G link,
> you can probably close to saturating the link?

I encountered the same behaviour when using 'dd' from /dev/zero AND, as mentioned in comment 10, when copying a regular file with 'cp'. So that removes the 'dd' from /dev/zero possibility from the equation.
(In reply to Kevin Alon Goldblatt from comment #28)
> (In reply to Yaniv Kaul from comment #26)
> > 10g or 1g? In any case, quite an unusual configuration for iSCSI - usually,
> > multipathing is employed.
>
> We are using multipathing of 4 paths
> We are using a 1G ethernet connection from the hosts

But essentially a single NIC? That's very low. It's good that you have multipathing, but it is meaningless on a 1Gb link: it's easy to saturate a single 1Gb link with a single TCP connection, whereas on 10Gb it's better to have multiple TCP connections, which is exactly what multipathing gives you.

> > Did you check the latency on the XtremIO of that initiator? Volume?
> >
> > Regardless, it should not take so long.
> > However, you are writing zero's. XtremIO does not write zeros, it's just
> > metadata operation. Therefore, it is very fast (and unrealistic) scenario.
> > Did it ever reproduce with any other IO pattern?
>
> I encountered the same behaviour when using 'dd' from /dev/zero AND as
> mentioned in comment 10, when copying a regular file with 'cp'. So that
> removes the 'dd' from /dev/zero possibility from the equation

Where are you copying it from? From XtremIO to XtremIO? Then it's deduplicating - again, no real write.

What I keep trying to understand here: could it be that the guest is consuming space faster than we are extending? For this, I need to know the effective speed at which you are writing. It may well be that you are writing faster than the extension can keep up.
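The race being asked about here can be sketched with back-of-the-envelope numbers: the VM pauses with ENOSPC if the guest exhausts the thin LV's remaining headroom before the SPM's lvextend completes. All figures below are illustrative assumptions, not measurements from this environment:

```python
def pause_risk(write_mb_s, free_mb_at_trigger, extend_latency_s):
    """True if the guest can consume the remaining free space of the thin
    LV before the extend request completes (i.e., the VM will pause)."""
    seconds_to_exhaust = free_mb_at_trigger / write_mb_s
    return seconds_to_exhaust < extend_latency_s

# Example numbers: if the extend fires with ~512 MB of headroom left, a
# guest writing 400 MB/s eats that in ~1.3 s, so any extend round trip
# (mailbox poll + lvextend) slower than that pauses the VM; at 100 MB/s
# the same round trip is comfortably fast enough.
fast_writer = pause_risk(write_mb_s=400, free_mb_at_trigger=512, extend_latency_s=2)
slow_writer = pause_risk(write_mb_s=100, free_mb_at_trigger=512, extend_latency_s=2)
```

This is why measuring the effective guest write speed matters: it determines whether any realistic extend latency can keep ahead of the writer.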
I ran the same operation again, as follows:

Using the following versions:
----------------------------------------
rhevm-4.0.0-0.7.master.el7ev.noarch
vdsm-4.18.0-16.git51df339.el7.centos.x86_64

Ran the following scenario:
---------------------------------------
1. Created a VM and installed RHEL 7.2
2. Added 1 iscsi thin disk and 1 nfs thin disk
3. Created a file system on both disks
4. Mounted both disks
5. Copied an iso file of 1.4GB from the fedora site to the nfs disk
6. Using 'cp', copied the file from the nfs disk to the iscsi disk
>>>>> After about a minute of copying, the VM was paused, as in all my other attempts.

So I ruled out the 'dd from /dev/zero' issue by using cp of a file,
and I ruled out the de-duplication possibility on the XtremIO.
I also ruled out the 1G NIC issue by using a host with a 10G NIC.

I am attaching the new logs
Created attachment 1160629 [details]
vdsm, server and engine logs

Added logs
Engine.log:
---------------------
2016-05-22 18:40:00,022 INFO [org.ovirt.engine.core.bll.AutoRecoveryManager] (DefaultQuartzScheduler_Worker-60) [] Autorecovering 1 storage domains
2016-05-22 18:40:00,022 INFO [org.ovirt.engine.core.bll.AutoRecoveryManager] (DefaultQuartzScheduler_Worker-60) [] Autorecovering storage domains id: e3cf4b5c-ba13-40aa-83e5-5496f36fb036
2016-05-22 18:40:00,026 INFO [org.ovirt.engine.core.bll.storage.connection.ConnectDomainToStorageCommand] (DefaultQuartzScheduler_Worker-60) [39c2c7fa] Running command: ConnectDomainToStorageCommand internal: true. Entities affected : ID: e3cf4b5c-ba13-40aa-83e5-5496f36fb036 Type: Storage
2016-05-22 18:40:00,026 INFO [org.ovirt.engine.core.bll.storage.connection.ConnectDomainToStorageCommand] (DefaultQuartzScheduler_Worker-60) [39c2c7fa] ConnectDomainToStorage. Before Connect all hosts to pool. Time: Sun May 22 18:40:00 IDT 2016
2016-05-22 18:40:00,056 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand] (org.ovirt.thread.pool-6-thread-23) [39c2c7fa] START, ConnectStorageServerVDSCommand(HostName = puma22.scl.lab.tlv.redhat.com, StorageServerConnectionManagementVDSParameters:{runAsync='true', hostId='9a9847eb-8132-4215-8b93-90cb8bfb6440', storagePoolId='a7484455-20aa-4dde-87b3-f60956fc7e9d', storageType='ISCSI', connectionList='[StorageServerConnections:{id='b449f09b-0613-4af4-8f9d-80951f4741bd', connection='10.35.146.225', iqn='iqn.2008-05.com.xtremio:xio00153500071-514f0c50023f6c05', vfsType='null', mountOptions='null', nfsVersion='null', nfsRetrans='null', nfsTimeo='null', iface='null', netIfaceName='null'}, StorageServerConnections:{id='2c148526-496c-4a9c-86cd-a068aa0b2846', connection='10.35.146.193', iqn='iqn.2008-05.com.xtremio:xio00153500071-514f0c50023f6c04', vfsType='null', mountOptions='null', nfsVersion='null', nfsRetrans='null', nfsTimeo='null', iface='null', netIfaceName='null'}, StorageServerConnections:{id='9eecb527-422e-4360-af37-190800c6b502', connection='10.35.146.129', iqn='iqn.2008-05.com.xtremio:xio00153500071-514f0c50023f6c00', vfsType='null', mountOptions='null', nfsVersion='null', nfsRetrans='null', nfsTimeo='null', iface='null', netIfaceName='null'}, StorageServerConnections:{id='e1d0e6c0-5088-46ad-872e-acda578f7568', connection='10.35.146.161', iqn='iqn.2008-05.com.xtremio:xio00153500071-514f0c50023f6c01', vfsType='null', mountOptions='null', nfsVersion='null', nfsRetrans='null', nfsTimeo='null', iface='null', netIfaceName='null'}]'}), log id: 3c00a0d1
2016-05-22 18:40:03,197 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand] (org.ovirt.thread.pool-6-thread-23) [39c2c7fa] FINISH, ConnectStorageServerVDSCommand, return: {b449f09b-0613-4af4-8f9d-80951f4741bd=0, 2c148526-496c-4a9c-86cd-a068aa0b2846=0, e1d0e6c0-5088-46ad-872e-acda578f7568=0, 9eecb527-422e-4360-af37-190800c6b502=0}, log id: 3c00a0d1
2016-05-22 18:40:03,198 INFO [org.ovirt.engine.core.bll.storage.connection.ConnectDomainToStorageCommand] (DefaultQuartzScheduler_Worker-60) [39c2c7fa] ConnectDomainToStorage. After Connect all hosts to pool. Time: Sun May 22 18:40:03 IDT 2016
2016-05-22 18:41:28,090 WARN [org.ovirt.engine.core.vdsbroker.monitoring.VmsMonitoring] (ForkJoinPool-1-worker-6) [] skipping VM '521e0a5d-737d-4382-ad47-e2d884b24786' from this monitoring cycle - the VM data has changed since fetching the data
2016-05-22 18:42:11,141 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (DefaultQuartzScheduler_Worker-23) [] VM '521e0a5d-737d-4382-ad47-e2d884b24786'(vm1) moved from 'Up' --> 'Paused'
2016-05-22 18:42:11,266 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-23) [] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM vm1 has been paused.
2016-05-22 18:42:11,394 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-23) [] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM vm1 has been paused due to no Storage space error.

VDSM.log:
--------------------
2016-05-22 18:35:00,041 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand] (org.ovirt.thread.pool-6-thread-28) [2d5fdbbe] START, ConnectStorageServerVDSCommand(HostName = puma22.scl.lab.tlv.redhat.com, StorageServerConnectionManagementVDSParameters:{runAsync='true', hostId='9a9847eb-8132-4215-8b93-90cb8bfb6440', storagePoolId='a7484455-20aa-4dde-87b3-f60956fc7e9d', storageType='ISCSI', connectionList='[StorageServerConnections:{id='b449f09b-0613-4af4-8f9d-80951f4741bd', connection='10.35.146.225', iqn='iqn.2008-05.com.xtremio:xio00153500071-514f0c50023f6c05', vfsType='null', mountOptions='null', nfsVersion='null', nfsRetrans='null', nfsTimeo='null', iface='null', netIfaceName='null'}, StorageServerConnections:{id='2c148526-496c-4a9c-86cd-a068aa0b2846', connection='10.35.146.193', iqn='iqn.2008-05.com.xtremio:xio00153500071-514f0c50023f6c04', vfsType='null', mountOptions='null', nfsVersion='null', nfsRetrans='null', nfsTimeo='null', iface='null', netIfaceName='null'}, StorageServerConnections:{id='9eecb527-422e-4360-af37-190800c6b502', connection='10.35.146.129', iqn='iqn.2008-05.com.xtremio:xio00153500071-514f0c50023f6c00', vfsType='null', mountOptions='null', nfsVersion='null', nfsRetrans='null', nfsTimeo='null', iface='null', netIfaceName='null'}, StorageServerConnections:{id='e1d0e6c0-5088-46ad-872e-acda578f7568', connection='10.35.146.161', iqn='iqn.2008-05.com.xtremio:xio00153500071-514f0c50023f6c01', vfsType='null', mountOptions='null', nfsVersion='null', nfsRetrans='null', nfsTimeo='null', iface='null', netIfaceName='null'}]'}), log id: 1cd3cb8d
2016-05-22 18:35:02,432 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand] (org.ovirt.thread.pool-6-thread-28) [2d5fdbbe] FINISH, ConnectStorageServerVDSCommand, return: {b449f09b-0613-4af4-8f9d-80951f4741bd=0, 2c148526-496c-4a9c-86cd-a068aa0b2846=0, e1d0e6c0-5088-46ad-872e-acda578f7568=0, 9eecb527-422e-4360-af37-190800c6b502=0}, log id: 1cd3cb8d
2016-05-22 18:35:02,433 INFO [org.ovirt.engine.core.bll.storage.connection.ConnectDomainToStorageCommand] (DefaultQuartzScheduler_Worker-49) [2d5fdbbe] ConnectDomainToStorage. After Connect all hosts to pool. Time: Sun May 22 18:35:02 IDT 2016
2016-05-22 18:40:00,022 INFO [org.ovirt.engine.core.bll.AutoRecoveryManager] (DefaultQuartzScheduler_Worker-60) [] Autorecovering 1 storage domains
2016-05-22 18:40:00,022 INFO [org.ovirt.engine.core.bll.AutoRecoveryManager] (DefaultQuartzScheduler_Worker-60) [] Autorecovering storage domains id: e3cf4b5c-ba13-40aa-83e5-5496f36fb036
2016-05-22 18:40:00,026 INFO [org.ovirt.engine.core.bll.storage.connection.ConnectDomainToStorageCommand] (DefaultQuartzScheduler_Worker-60) [39c2c7fa] Running command: ConnectDomainToStorageCommand internal: true. Entities affected : ID: e3cf4b5c-ba13-40aa-83e5-5496f36fb036 Type: Storage
2016-05-22 18:40:00,026 INFO [org.ovirt.engine.core.bll.storage.connection.ConnectDomainToStorageCommand] (DefaultQuartzScheduler_Worker-60) [39c2c7fa] ConnectDomainToStorage. Before Connect all hosts to pool. Time: Sun May 22 18:40:00 IDT 2016
2016-05-22 18:40:00,056 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand] (org.ovirt.thread.pool-6-thread-23) [39c2c7fa] START, ConnectStorageServerVDSCommand(HostName = puma22.scl.lab.tlv.redhat.com, StorageServerConnectionManagementVDSParameters:{runAsync='true', hostId='9a9847eb-8132-4215-8b93-90cb8bfb6440', storagePoolId='a7484455-20aa-4dde-87b3-f60956fc7e9d', storageType='ISCSI', connectionList='[StorageServerConnections:{id='b449f09b-0613-4af4-8f9d-80951f4741bd', connection='10.35.146.225', iqn='iqn.2008-05.com.xtremio:xio00153500071-514f0c50023f6c05', vfsType='null', mountOptions='null', nfsVersion='null', nfsRetrans='null', nfsTimeo='null', iface='null', netIfaceName='null'}, StorageServerConnections:{id='2c148526-496c-4a9c-86cd-a068aa0b2846', connection='10.35.146.193', iqn='iqn.2008-05.com.xtremio:xio00153500071-514f0c50023f6c04', vfsType='null', mountOptions='null', nfsVersion='null', nfsRetrans='null', nfsTimeo='null', iface='null', netIfaceName='null'}, StorageServerConnections:{id='9eecb527-422e-4360-af37-190800c6b502', connection='10.35.146.129', iqn='iqn.2008-05.com.xtremio:xio00153500071-514f0c50023f6c00', vfsType='null', mountOptions='null', nfsVersion='null', nfsRetrans='null', nfsTimeo='null', iface='null', netIfaceName='null'}, StorageServerConnections:{id='e1d0e6c0-5088-46ad-872e-acda578f7568', connection='10.35.146.161', iqn='iqn.2008-05.com.xtremio:xio00153500071-514f0c50023f6c01', vfsType='null', mountOptions='null', nfsVersion='null', nfsRetrans='null', nfsTimeo='null', iface='null', netIfaceName='null'}]'}), log id: 3c00a0d1
2016-05-22 18:40:03,197 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand] (org.ovirt.thread.pool-6-thread-23) [39c2c7fa] FINISH, ConnectStorageServerVDSCommand, return: {b449f09b-0613-4af4-8f9d-80951f4741bd=0, 2c148526-496c-4a9c-86cd-a068aa0b2846=0, e1d0e6c0-5088-46ad-872e-acda578f7568=0, 9eecb527-422e-4360-af37-190800c6b502=0}, log id: 3c00a0d1
2016-05-22 18:40:03,198 INFO [org.ovirt.engine.core.bll.storage.connection.ConnectDomainToStorageCommand] (DefaultQuartzScheduler_Worker-60) [39c2c7fa] ConnectDomainToStorage. After Connect all hosts to pool. Time: Sun May 22 18:40:03 IDT 2016
2016-05-22 18:41:28,090 WARN [org.ovirt.engine.core.vdsbroker.monitoring.VmsMonitoring] (ForkJoinPool-1-worker-6) [] skipping VM '521e0a5d-737d-4382-ad47-e2d884b24786' from this monitoring cycle - the VM data has changed since fetching the data
2016-05-22 18:42:11,141 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (DefaultQuartzScheduler_Worker-23) [] VM '521e0a5d-737d-4382-ad47-e2d884b24786'(vm1) moved from 'Up' --> 'Paused'
2016-05-22 18:42:11,266 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-23) [] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM vm1 has been paused.
2016-05-22 18:42:11,394 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-23) [] Correlation ID: null, Call Stack: null, Custom Ev:
2016-05-22 18:45:00,023 INFO [org.ovirt.engine.core.bll.AutoRecoveryManager] (DefaultQuartzScheduler_Worker-40) [] Autorecovering 1 storage domains
2016-05-22 18:45:00,024 INFO [org.ovirt.engine.core.bll.AutoRecoveryManager] (DefaultQuartzScheduler_Worker-40) [] Autorecovering storage domains id: e3cf4b5c-ba13-40aa-83e5-5496f36fb036
2016-05-22 18:45:00,027 INFO [org.ovirt.engine.core.bll.storage.connection.ConnectDomainToStorageCommand] (DefaultQuartzScheduler_Worker-40) [2966a79b] Running command: ConnectDomainToStorageCommand internal: true. Entities affected : ID: e3cf4b5c-ba13-40aa-83e5-5496f36fb036 Type: Storage
2016-05-22 18:45:00,027 INFO [org.ovirt.engine.core.bll.storage.connection.ConnectDomainToStorageCommand] (DefaultQuartzScheduler_Worker-40) [2966a79b] ConnectDomainToStorage. Before Connect all hosts to pool. Time: Sun May 22 18:45:00 IDT 2016
Eyal - with the workaround suggestion, what is the latest on this?
What is the workaround?
(In reply to Yaniv Dary from comment #34)
> What is the workaround?

Under the [irs] section, set something like:

volume_utilization_percent=25
volume_utilization_chunk_mb=10240

This will start extending the disk after 25% is filled and will extend it by 10G each time.
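A sketch of how the two settings above interact, per my reading of vdsm's chunked-extension policy (treat the exact threshold formula as an approximation and verify against the vdsm source for your version):

```python
# Workaround values from the comment above.
VOLUME_UTILIZATION_PERCENT = 25
VOLUME_UTILIZATION_CHUNK_MB = 10240

def should_extend(physical_mb, allocated_mb,
                  percent=VOLUME_UTILIZATION_PERCENT,
                  chunk_mb=VOLUME_UTILIZATION_CHUNK_MB):
    """Request an extend when free space in the thin LV drops below
    percent% of the chunk size (my reading of the vdsm watermark check)."""
    free_mb = physical_mb - allocated_mb
    return free_mb < chunk_mb * percent / 100

def extended_size(physical_mb, chunk_mb=VOLUME_UTILIZATION_CHUNK_MB):
    """Each extension grows the LV by one chunk (10 GiB here)."""
    return physical_mb + chunk_mb
```

With a 10 GiB chunk the extend triggers while roughly 2.5 GiB of headroom remains, and each lvextend buys 10 GiB of runway, so the guest has to sustain a much higher write rate to outrun the SPM than with the smaller default chunk.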
(In reply to Yaniv Kaul from comment #33) > Eyal - with the workaround suggestion, what is the latest on this? Returning the needinfo that was removed by mistake.
Pushing out. If we get actionable data, we can consider bringing this back into 4.2.
Closing. If QE gets to reproduce this, please feel free to re-open.
Clearing needinfo