Description of problem:
Unknown drive for vm - ignored block threshold event

Version-Release number of selected component (if applicable):
vdsm-4.40.40-1.el8ev.x86_64
qemu-kvm-5.1.0-20.module+el8.3.1+9918+230f5c26.x86_64
libvirt-6.6.0-13.module+el8.3.1+9548+0a8fede5.x86_64

How reproducible:
reported once

Steps to Reproduce:
1. Create a snapshot on a raw drive using virtio-scsi.
2. Make changes in the active layer so that the active layer needs to be extended.

Actual results:
The drive becomes unknown and the block threshold event is ignored.

Expected results:
The drive does not become unknown to libvirt.

Additional info:
Seems related to the extend operation, thus moving to the storage team for further investigation
Marian, Can you please attach the engine.log to get the complete picture?
Nir, can you please have a look? Maybe the latest changes in the extension mechanism should solve this too.
Marian, did we do live storage migration on this vm before this error? If yes, this is a duplicate of bug 1951507.

If you don't think this is a duplicate, please provide a libvirt debug log showing the time this error started. The libvirt debug log should use these settings:

# Libvirt logging for debugging qemu vms
# https://www.libvirt.org/kbase/debuglogs.html#targeted-logging-for-debugging-qemu-vms
# NOTE: filter order matters, util must be last to avoid noisy object logs.
log_filters="1:libvirt 1:qemu 1:conf 1:security 3:event 3:json 3:file 3:object 1:util"
log_outputs="1:file:/var/log/libvirt/libvirtd.log"
Marian, ignore comment 9, this seems to be another issue.
Nir, what data is needed here in order to determine that this bug is different from bug 1951507? It is not clear from comment #10.
Combining vdsm.log.1 from attachment 1770950 [details] and vdsm.log.2 from attachment 1770951 [details] we see:

$ xzegrep "for drive 'sdi'|exceeded on 'sdi'" vdsm-combined.log.xz
2021-04-10 07:11:55,829+0800 INFO (periodic/2) [virt.vm] (vmId='7af5361c-012b-41f9-9ea9-33917df23f21') setting block threshold to 86436216832 bytes for drive 'sdi' (apparentsize 88046829568) (drivemonitor:116)
2021-04-10 08:12:10,550+0800 INFO (libvirt/events) [virt.vm] (vmId='7af5361c-012b-41f9-9ea9-33917df23f21') block threshold 536870912 exceeded on 'sdi' (/rhev/data-center/mnt/blockSD/e8fc6e55-b4e5-4977-8aaf-cca0a4cf6ec6/images/aaf6adc8-eb42-4c5f-9279-1e1ca20fab24/c3592154-f34d-4f86-8fa5-d1ea832bedd9) (drivemonitor:171)
2021-04-10 08:12:17,345+0800 INFO (mailbox-hsm/3) [virt.vm] (vmId='7af5361c-012b-41f9-9ea9-33917df23f21') setting block threshold to 88583700480 bytes for drive 'sdi' (apparentsize 90194313216) (drivemonitor:116)
2021-04-10 08:12:22,412+0800 INFO (mailbox-hsm/0) [virt.vm] (vmId='7af5361c-012b-41f9-9ea9-33917df23f21') setting block threshold to 88583700480 bytes for drive 'sdi' (apparentsize 90194313216) (drivemonitor:116)
2021-04-10 09:56:15,172+0800 INFO (periodic/0) [virt.vm] (vmId='7af5361c-012b-41f9-9ea9-33917df23f21') setting block threshold to 97173635072 bytes for drive 'sdi' (apparentsize 98784247808) (drivemonitor:116)
2021-04-10 10:03:52,366+0800 INFO (libvirt/events) [virt.vm] (vmId='7af5361c-012b-41f9-9ea9-33917df23f21') block threshold -1610612736 exceeded on 'sdi' (/rhev/data-center/mnt/blockSD/e8fc6e55-b4e5-4977-8aaf-cca0a4cf6ec6/images/aaf6adc8-eb42-4c5f-9279-1e1ca20fab24/c3592154-f34d-4f86-8fa5-d1ea832bedd9) (drivemonitor:171)

This looks like a bug in libvirt or qemu. We set the threshold to 97173635072 seven minutes earlier. How can the threshold be -1610612736?
Libvirt documentation says:
https://libvirt.org/html/libvirt-libvirt-domain.html#virConnectDomainEventBlockThresholdCallback

    excess: number of bytes written beyond the threshold

So -1610612736 bytes (-1.5 GiB) would mean that the threshold was not exceeded.

2021-04-10 10:04:04,590+0800 INFO (mailbox-hsm/3) [virt.vm] (vmId='7af5361c-012b-41f9-9ea9-33917df23f21') setting block threshold to 99321118720 bytes for drive 'sdi' (apparentsize 100931731456) (drivemonitor:116)
2021-04-10 10:09:54,741+0800 INFO (libvirt/events) [virt.vm] (vmId='7af5361c-012b-41f9-9ea9-33917df23f21') block threshold 536870912 exceeded on 'sdi' (/rhev/data-center/mnt/blockSD/e8fc6e55-b4e5-4977-8aaf-cca0a4cf6ec6/images/aaf6adc8-eb42-4c5f-9279-1e1ca20fab24/c3592154-f34d-4f86-8fa5-d1ea832bedd9) (drivemonitor:171)
2021-04-10 10:10:03,604+0800 INFO (mailbox-hsm/0) [virt.vm] (vmId='7af5361c-012b-41f9-9ea9-33917df23f21') setting block threshold to 101468602368 bytes for drive 'sdi' (apparentsize 103079215104) (drivemonitor:116)
2021-04-10 10:10:07,967+0800 INFO (libvirt/events) [virt.vm] (vmId='7af5361c-012b-41f9-9ea9-33917df23f21') setting block threshold to 101468602368 bytes for drive 'sdi' (apparentsize 103079215104) (drivemonitor:116)
2021-04-10 10:10:08,011+0800 INFO (libvirt/events) [virt.vm] (vmId='7af5361c-012b-41f9-9ea9-33917df23f21') setting block threshold to 101468602368 bytes for drive 'sdi' (apparentsize 103079215104) (drivemonitor:116)
2021-04-10 10:10:08,041+0800 INFO (libvirt/events) [virt.vm] (vmId='7af5361c-012b-41f9-9ea9-33917df23f21') setting block threshold to 101468602368 bytes for drive 'sdi' (apparentsize 103079215104) (drivemonitor:116)
2021-04-10 10:10:08,054+0800 INFO (libvirt/events) [virt.vm] (vmId='7af5361c-012b-41f9-9ea9-33917df23f21') setting block threshold to 101468602368 bytes for drive 'sdi' (apparentsize 103079215104) (drivemonitor:116)
2021-04-10 10:10:08,067+0800 INFO (libvirt/events) [virt.vm] (vmId='7af5361c-012b-41f9-9ea9-33917df23f21') setting block threshold to 101468602368 bytes for drive 'sdi' (apparentsize 103079215104) (drivemonitor:116)
2021-04-10 10:10:08,089+0800 INFO (libvirt/events) [virt.vm] (vmId='7af5361c-012b-41f9-9ea9-33917df23f21') setting block threshold to 101468602368 bytes for drive 'sdi' (apparentsize 103079215104) (drivemonitor:116)
2021-04-10 10:10:08,102+0800 INFO (libvirt/events) [virt.vm] (vmId='7af5361c-012b-41f9-9ea9-33917df23f21') setting block threshold to 101468602368 bytes for drive 'sdi' (apparentsize 103079215104) (drivemonitor:116)
2021-04-10 10:10:08,121+0800 INFO (libvirt/events) [virt.vm] (vmId='7af5361c-012b-41f9-9ea9-33917df23f21') setting block threshold to 101468602368 bytes for drive 'sdi' (apparentsize 103079215104) (drivemonitor:116)
2021-04-10 10:10:08,158+0800 INFO (libvirt/events) [virt.vm] (vmId='7af5361c-012b-41f9-9ea9-33917df23f21') setting block threshold to 101468602368 bytes for drive 'sdi' (apparentsize 103079215104) (drivemonitor:116)
2021-04-10 10:10:08,178+0800 INFO (libvirt/events) [virt.vm] (vmId='7af5361c-012b-41f9-9ea9-33917df23f21') setting block threshold to 101468602368 bytes for drive 'sdi' (apparentsize 103079215104) (drivemonitor:116)
2021-04-10 10:10:08,204+0800 INFO (libvirt/events) [virt.vm] (vmId='7af5361c-012b-41f9-9ea9-33917df23f21') setting block threshold to 101468602368 bytes for drive 'sdi' (apparentsize 103079215104) (drivemonitor:116)
2021-04-10 10:10:08,231+0800 INFO (libvirt/events) [virt.vm] (vmId='7af5361c-012b-41f9-9ea9-33917df23f21') setting block threshold to 101468602368 bytes for drive 'sdi' (apparentsize 103079215104) (drivemonitor:116)
2021-04-10 10:12:15,584+0800 INFO (periodic/2) [virt.vm] (vmId='7af5361c-012b-41f9-9ea9-33917df23f21') setting block threshold to 101468602368 bytes for drive 'sdi' (apparentsize 103079215104) (drivemonitor:116)
2021-04-10 10:21:29,772+0800 INFO (libvirt/events) [virt.vm] (vmId='7af5361c-012b-41f9-9ea9-33917df23f21') block threshold -1610612736 exceeded on 'sdi' (/rhev/data-center/mnt/blockSD/e8fc6e55-b4e5-4977-8aaf-cca0a4cf6ec6/images/aaf6adc8-eb42-4c5f-9279-1e1ca20fab24/c3592154-f34d-4f86-8fa5-d1ea832bedd9) (drivemonitor:171)

Again, the same impossible threshold value (-1.5 GiB).

2021-04-10 10:21:39,542+0800 INFO (mailbox-hsm/3) [virt.vm] (vmId='7af5361c-012b-41f9-9ea9-33917df23f21') setting block threshold to 103616086016 bytes for drive 'sdi' (apparentsize 105226698752) (drivemonitor:116)
2021-04-10 10:31:52,393+0800 INFO (periodic/3) [virt.vm] (vmId='7af5361c-012b-41f9-9ea9-33917df23f21') setting block threshold to 101361228186 bytes for drive 'sdi' (apparentsize 105226698752) (drivemonitor:116)
2021-04-10 10:32:23,459+0800 INFO (mailbox-hsm/2) [virt.vm] (vmId='7af5361c-012b-41f9-9ea9-33917df23f21') setting block threshold to 105656195482 bytes for drive 'sdi' (apparentsize 109521666048) (drivemonitor:116)
2021-04-10 10:36:52,001+0800 INFO (periodic/2) [virt.vm] (vmId='7af5361c-012b-41f9-9ea9-33917df23f21') setting block threshold to 105870943847 bytes for drive 'sdi' (apparentsize 109521666048) (drivemonitor:116)
2021-04-10 10:38:38,353+0800 INFO (periodic/2) [virt.vm] (vmId='7af5361c-012b-41f9-9ea9-33917df23f21') setting block threshold to 107911053312 bytes for drive 'sdi' (apparentsize 109521666048) (drivemonitor:116)

Regarding messages about drive "sdi[7]":

$ xzegrep "sdi\[7\]" vdsm-combined.log.xz
2021-04-10 08:12:10,550+0800 INFO (libvirt/events) [virt.vm] (vmId='7af5361c-012b-41f9-9ea9-33917df23f21') block threshold 536870912 exceeded on 'sdi[7]' (/rhev/data-center/mnt/blockSD/e8fc6e55-b4e5-4977-8aaf-cca0a4cf6ec6/images/aaf6adc8-eb42-4c5f-9279-1e1ca20fab24/c3592154-f34d-4f86-8fa5-d1ea832bedd9) (drivemonitor:171)
2021-04-10 08:12:10,550+0800 WARN (libvirt/events) [virt.vm] (vmId='7af5361c-012b-41f9-9ea9-33917df23f21') Unknown drive 'sdi[7]' for vm 7af5361c-012b-41f9-9ea9-33917df23f21 - ignored block threshold event (drivemonitor:177)
2021-04-10 10:03:52,366+0800 INFO (libvirt/events) [virt.vm] (vmId='7af5361c-012b-41f9-9ea9-33917df23f21') block threshold -1610612736 exceeded on 'sdi[7]' (/rhev/data-center/mnt/blockSD/e8fc6e55-b4e5-4977-8aaf-cca0a4cf6ec6/images/aaf6adc8-eb42-4c5f-9279-1e1ca20fab24/c3592154-f34d-4f86-8fa5-d1ea832bedd9) (drivemonitor:171)
2021-04-10 10:03:52,366+0800 WARN (libvirt/events) [virt.vm] (vmId='7af5361c-012b-41f9-9ea9-33917df23f21') Unknown drive 'sdi[7]' for vm 7af5361c-012b-41f9-9ea9-33917df23f21 - ignored block threshold event (drivemonitor:177)
2021-04-10 10:09:54,741+0800 INFO (libvirt/events) [virt.vm] (vmId='7af5361c-012b-41f9-9ea9-33917df23f21') block threshold 536870912 exceeded on 'sdi[7]' (/rhev/data-center/mnt/blockSD/e8fc6e55-b4e5-4977-8aaf-cca0a4cf6ec6/images/aaf6adc8-eb42-4c5f-9279-1e1ca20fab24/c3592154-f34d-4f86-8fa5-d1ea832bedd9) (drivemonitor:171)
2021-04-10 10:09:54,741+0800 WARN (libvirt/events) [virt.vm] (vmId='7af5361c-012b-41f9-9ea9-33917df23f21') Unknown drive 'sdi[7]' for vm 7af5361c-012b-41f9-9ea9-33917df23f21 - ignored block threshold event (drivemonitor:177)
2021-04-10 10:21:29,772+0800 INFO (libvirt/events) [virt.vm] (vmId='7af5361c-012b-41f9-9ea9-33917df23f21') block threshold -1610612736 exceeded on 'sdi[7]' (/rhev/data-center/mnt/blockSD/e8fc6e55-b4e5-4977-8aaf-cca0a4cf6ec6/images/aaf6adc8-eb42-4c5f-9279-1e1ca20fab24/c3592154-f34d-4f86-8fa5-d1ea832bedd9) (drivemonitor:171)
2021-04-10 10:21:29,772+0800 WARN (libvirt/events) [virt.vm] (vmId='7af5361c-012b-41f9-9ea9-33917df23f21') Unknown drive 'sdi[7]' for vm 7af5361c-012b-41f9-9ea9-33917df23f21 - ignored block threshold event (drivemonitor:177)

Vdsm never set a block threshold for 'sdi[7]'; we don't support it yet. So this looks like a libvirt bug: reporting a threshold event for a drive that was never registered for the event.
Looking at the path reported for 'sdi[7]':

    /rhev/data-center/mnt/blockSD/e8fc6e55-b4e5-4977-8aaf-cca0a4cf6ec6/images/aaf6adc8-eb42-4c5f-9279-1e1ca20fab24/c3592154-f34d-4f86-8fa5-d1ea832bedd9

This is the same path used by 'sdi':

    /rhev/data-center/mnt/blockSD/e8fc6e55-b4e5-4977-8aaf-cca0a4cf6ec6/images/aaf6adc8-eb42-4c5f-9279-1e1ca20fab24/c3592154-f34d-4f86-8fa5-d1ea832bedd9

So it looks like libvirt is confused about events from qemu, sending events with the wrong argument.

Looking at what happened to the drive 'sdi' volume c3592154-f34d-4f86-8fa5-d1ea832bedd9:

First event around 2021-04-10 08:12:10:

1. Block threshold exceeded: 536870912 bytes (0.5 GiB)

2021-04-10 08:12:10,550+0800 INFO (libvirt/events) [virt.vm] (vmId='7af5361c-012b-41f9-9ea9-33917df23f21') block threshold 536870912 exceeded on 'sdi[7]' (/rhev/data-center/mnt/blockSD/e8fc6e55-b4e5-4977-8aaf-cca0a4cf6ec6/images/aaf6adc8-eb42-4c5f-9279-1e1ca20fab24/c3592154-f34d-4f86-8fa5-d1ea832bedd9) (drivemonitor:171)

2. Vdsm tries to extend the drive to 90194313216 bytes (from 82 GiB to 84 GiB)

2021-04-10 08:12:12,580+0800 INFO (periodic/2) [virt.vm] (vmId='7af5361c-012b-41f9-9ea9-33917df23f21') Requesting extension for volume c3592154-f34d-4f86-8fa5-d1ea832bedd9 on domain e8fc6e55-b4e5-4977-8aaf-cca0a4cf6ec6 (apparent: 88046829568, capacity: 429496729600, allocated: 87225532416, physical: 88046829568 threshold_state: exceeded) (vm:1328)
2021-04-10 08:12:12,581+0800 INFO (periodic/2) [vdsm.api] START sendExtendMsg(spUUID='33f59a82-e273-433f-a59f-4ae661527d04', volDict={'domainID': 'e8fc6e55-b4e5-4977-8aaf-cca0a4cf6ec6', 'imageID': 'aaf6adc8-eb42-4c5f-9279-1e1ca20fab24', 'internal': False, 'name': 'sdi', 'newSize': 90194313216, 'poolID': '33f59a82-e273-433f-a59f-4ae661527d04', 'volumeID': 'c3592154-f34d-4f86-8fa5-d1ea832bedd9', 'clock': <Clock(total=0.00*, extend-volume=0.00*)>}, newSize=90194313216, callbackFunc=<bound method Vm.__afterVolumeExtension of <vdsm.virt.vm.Vm object at 0x7f224c5a3ef0>>) from=internal, task_id=28fa4948-9f4b-476a-8d6c-6282a6e164cb (api:48)

3. Extend completed

2021-04-10 08:12:17,345+0800 INFO (mailbox-hsm/3) [virt.vm] (vmId='7af5361c-012b-41f9-9ea9-33917df23f21') Extend volume c3592154-f34d-4f86-8fa5-d1ea832bedd9 completed <Clock(total=4.77, extend-volume=2.10, refresh-volume=2.67)> (vm:1451)

4. Setting new threshold (82.5 GiB)

2021-04-10 08:12:17,345+0800 INFO (mailbox-hsm/3) [virt.vm] (vmId='7af5361c-012b-41f9-9ea9-33917df23f21') setting block threshold to 88583700480 bytes for drive 'sdi' (apparentsize 90194313216) (drivemonitor:116)

No interesting event for a while...

5. VM paused

2021-04-10 08:38:23,683+0800 INFO (libvirt/events) [virt.vm] (vmId='7af5361c-012b-41f9-9ea9-33917df23f21') CPU stopped: onSuspend (vm:5842)

I don't see any abnormal behavior on the RHV side.

Looking at the vm crash at 2021-04-10 10:10:06,098:

1. Threshold exceeded on drive sdi

2021-04-10 10:09:54,741+0800 INFO (libvirt/events) [virt.vm] (vmId='7af5361c-012b-41f9-9ea9-33917df23f21') block threshold 536870912 exceeded on 'sdi' (/rhev/data-center/mnt/blockSD/e8fc6e55-b4e5-4977-8aaf-cca0a4cf6ec6/images/aaf6adc8-eb42-4c5f-9279-1e1ca20fab24/c3592154-f34d-4f86-8fa5-d1ea832bedd9) (drivemonitor:171)

2. Vdsm tries to extend the volume

2021-04-10 10:09:55,220+0800 INFO (periodic/3) [virt.vm] (vmId='7af5361c-012b-41f9-9ea9-33917df23f21') Requesting extension for volume c3592154-f34d-4f86-8fa5-d1ea832bedd9 on domain e8fc6e55-b4e5-4977-8aaf-cca0a4cf6ec6 (apparent: 100931731456, capacity: 429496729600, allocated: 99522248704, physical: 100931731456 threshold_state: exceeded) (vm:1328)
2021-04-10 10:09:55,221+0800 INFO (periodic/3) [vdsm.api] START sendExtendMsg(spUUID='33f59a82-e273-433f-a59f-4ae661527d04', volDict={'domainID': 'e8fc6e55-b4e5-4977-8aaf-cca0a4cf6ec6', 'imageID': 'aaf6adc8-eb42-4c5f-9279-1e1ca20fab24', 'internal': False, 'name': 'sdi', 'newSize': 103079215104, 'poolID': '33f59a82-e273-433f-a59f-4ae661527d04', 'volumeID': 'c3592154-f34d-4f86-8fa5-d1ea832bedd9', 'clock': <Clock(total=0.00*, extend-volume=0.00*)>}, newSize=103079215104, callbackFunc=<bound method Vm.__afterVolumeExtension of <vdsm.virt.vm.Vm object at 0x7f224c2c02b0>>) from=internal, task_id=1ea96659-ca90-407a-ab36-c16fdf8a7c0c (api:48)

3. VM paused after 2 seconds

2021-04-10 10:09:56,966+0800 INFO (libvirt/events) [virt.vm] (vmId='7af5361c-012b-41f9-9ea9-33917df23f21') abnormal vm stop device ua-aaf6adc8-eb42-4c5f-9279-1e1ca20fab24 error enospc (vm:4732)
2021-04-10 10:09:56,966+0800 INFO (libvirt/events) [virt.vm] (vmId='7af5361c-012b-41f9-9ea9-33917df23f21') CPU stopped: onIOError (vm:5842)

This usually happens when the VM writes faster than vdsm is able to extend. The VM is resumed when vdsm finishes the extend.

4. Extend completed after 8 seconds

2021-04-10 10:10:03,603+0800 INFO (mailbox-hsm/0) [virt.vm] (vmId='7af5361c-012b-41f9-9ea9-33917df23f21') Extend volume c3592154-f34d-4f86-8fa5-d1ea832bedd9 completed <Clock(total=8.38, extend-volume=4.19, refresh-volume=4.19)> (vm:1451)

5. Setting block threshold:

2021-04-10 10:10:03,604+0800 INFO (mailbox-hsm/0) [virt.vm] (vmId='7af5361c-012b-41f9-9ea9-33917df23f21') setting block threshold to 101468602368 bytes for drive 'sdi' (apparentsize 103079215104) (drivemonitor:116)

6. Qemu crash?

2021-04-10 10:10:06,098+0800 ERROR (libvirt/events) [virt.vm] (vmId='7af5361c-012b-41f9-9ea9-33917df23f21') Unable to get watermarks for drive sdi: Unable to read from monitor: Connection reset by peer (vm:1305)

This looks similar to other reports we have about qemu crashing after extending drives. This issue should be fixed in RHEL 8.4.

I see 2 issues:
- qemu crash, may be fixed in 8.4
- unexpected events from libvirt look like a libvirt issue

Marian:
- Is this reproducible with libvirt 7.0.0 and qemu 5.2.0 in RHEL 8.4?
- Do you have more detailed instructions on how to reproduce this?
- Do you have a libvirt debug log from this incident?
Peter, do you have any idea why:
- libvirt reports a block threshold event for drive "sdi[7]" when vdsm registered only "sdi"?
- libvirt reports a negative excess value in block threshold events?

(See comment 14)
Correction - the log:

2021-04-10 10:03:52,366+0800 INFO (libvirt/events) [virt.vm] (vmId='7af5361c-012b-41f9-9ea9-33917df23f21') block threshold -1610612736 exceeded on 'sdi' (/rhev/data-center/mnt/blockSD/e8fc6e55-b4e5-4977-8aaf-cca0a4cf6ec6/images/aaf6adc8-eb42-4c5f-9279-1e1ca20fab24/c3592154-f34d-4f86-8fa5-d1ea832bedd9) (drivemonitor:171)

is logged by:

    def on_block_threshold(self, dev, path, threshold, excess):
        """
        Callback to be executed in the libvirt event handler when
        a BLOCK_THRESHOLD event is delivered.

        Args:
            dev: device name (e.g. vda, sdb)
            path: device path
            threshold: the threshold (in bytes) that was exceeded
                causing the event to trigger
            excess: amount (in bytes) written past the threshold
        """
        self._log.info('block threshold %d exceeded on %r (%s)',
                       threshold, dev, path)

So -1610612736 is the threshold value, not the excess value. A negative value is still not a valid value.
The attached patch improves the related logs in vdsm, so we will have more info about block threshold events, and makes it easier to grep the related logs.
(In reply to Nir Soffer from comment #15)
> Peter, do you have any idea why
> - libvirt reports block threshold event for drive "sdi[7]"
>   when vdsm registered only "sdi"?

Nowadays all images have an indexed name. Libvirt reports the event for the image that it was registered for, which means that if it's a top level image, you get both 'sdi' and 'sdi[7]'.

> - libvirt reports negative excess value in block threshold events?

The callback itself can't report negative values:

    void virConnectDomainEventBlockThresholdCallback(virConnectPtr conn,
                                                     virDomainPtr dom,
                                                     const char *dev,
                                                     const char *path,
                                                     unsigned long long threshold,
                                                     unsigned long long excess,
                                                     void *opaque)

I think the problem is in the dispatching code in the python bindings, where we use:

    /* Call the Callback Dispatcher */
    pyobj_ret = PyObject_CallMethod(pyobj_conn,
                                    (char*)"_dispatchDomainEventBlockThresholdCallback",
                                    (char*)"OssiiO",
                                    pyobj_dom, dev, path, threshold, excess,
                                    pyobj_cbData);

The 'i' conversion expects an int:

    i (int) [int]
        Convert a plain C int to a Python integer object.

While the correct conversion would be:

    K (int) [unsigned long long]
        Convert a C unsigned long long to a Python integer object.

I'll file a bug to fix the bindings.
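For illustration, the sign truncation described above can be reproduced with plain Python arithmetic. This is a sketch of the effect of the wrong 'i' format code, not the bindings code itself: keeping only the low 32 bits of the 64-bit thresholds from the logs, interpreted as a signed int, yields exactly the bogus -1610612736 value.

```python
def to_signed_int32(value):
    """Keep the low 32 bits of value, interpreted as a signed int,
    mimicking what the wrong 'i' format code does to an
    unsigned long long threshold."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value >= 0x80000000 else value

# Thresholds vdsm actually set before the bogus events in the logs above:
print(to_signed_int32(97173635072))    # -> -1610612736
print(to_signed_int32(101468602368))   # -> -1610612736

# Small thresholds fit in 32 bits, which is why events like
# "block threshold 536870912 exceeded" looked correct:
print(to_signed_int32(536870912))      # -> 536870912
```

Both bogus events reduce to the same value because 97173635072 and 101468602368 are congruent modulo 2^32 (both leave remainder 2684354560, which is above 2^31 and thus wraps negative).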
(In reply to Peter Krempa from comment #19)
> (In reply to Nir Soffer from comment #15)
> > Peter, do you have any idea why
> > - libvirt reports block threshold event for drive "sdi[7]"
> >   when vdsm registered only "sdi"?
>
> Nowadays all images have an indexed name. Libvirt reports the event for the
> image that it was registered for, which means that if it's a top level
> image, you get both 'sdi' and 'sdi[7]'.

To explain the rationale a bit more, both events are reported because the top level image may have a threshold registered, but if it's moved into the backing chain by creating a snapshot and then e.g. hits the threshold during a block commit or other manipulations, it would no longer be the top of the chain. Thus reporting both for top level images saves us from the possible ambiguity.

> > - libvirt reports negative excess value in block threshold events?
[...]
> I'll file a bug to fix the bindings.

https://bugzilla.redhat.com/show_bug.cgi?id=1976109
https://gitlab.com/libvirt/libvirt-python/-/merge_requests/56
(In reply to Peter Krempa from comment #20)
> (In reply to Peter Krempa from comment #19)
> > (In reply to Nir Soffer from comment #15)
> > > Peter, do you have any idea why
> > > - libvirt reports block threshold event for drive "sdi[7]"
> > >   when vdsm registered only "sdi"?
> >
> > Nowadays all images have an indexed name. Libvirt reports the event for
> > the image that it was registered for, which means that if it's a top
> > level image, you get both 'sdi' and 'sdi[7]'.
>
> To explain the rationale a bit more, both events are reported as the top
> level image may have a threshold registered, but if it's moved into the
> backing chain by creating a snapshot and then e.g. hits the threshold
> during a block commit or other manipulations it would no longer be the top
> of the chain. Thus reporting both for top level images saves us from the
> possible ambiguity.

Reporting the event twice is not great, and not documented, but I don't want to make the libvirt code more complicated than needed.

I think on the libvirt side, using the bare drive name ("vda") should be deprecated, and the fact that you will get 2 events should be documented.

This bug should be solved in vdsm by always registering "{name}[{index}]". I think we need it anyway in vdsm to monitor scratch disks and the block copy destination disk.
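The vdsm-side fix suggested here (register the indexed name, map events back to the drive) can be sketched as follows. This is an illustration only: the function name and regex are hypothetical, not the actual vdsm implementation.

```python
import re

# Hypothetical helper: accepts both a bare drive name ("sdi") and an
# indexed name ("sdi[7]") as delivered in libvirt block threshold events.
_INDEXED_NAME = re.compile(r"^(?P<name>[a-z]+)(?:\[(?P<index>\d+)\])?$")

def parse_drive_name(dev):
    """Split a libvirt device name into (drive_name, index or None)."""
    m = _INDEXED_NAME.match(dev)
    if m is None:
        raise ValueError(f"Unexpected device name: {dev!r}")
    index = m.group("index")
    return m.group("name"), None if index is None else int(index)

print(parse_drive_name("sdi[7]"))  # -> ('sdi', 7)
print(parse_drive_name("vda"))     # -> ('vda', None)
```

With something like this, vdsm can register "sdi[7]" with libvirt but keep looking up its drive objects by "sdi", so both forms of the event resolve to the same drive instead of triggering the "Unknown drive" warning.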
Eyal, this bug should be fixed as a side effect of the changes needed for bug 1913387, so I suggest moving this to 4.4.8.
(In reply to Nir Soffer from comment #22)
> Eyal, this bug should be fixed as a side effect of the changes needed for
> bug 1913387, so I suggest moving this to 4.4.8.

Do we have a clear way to reproduce it?
(In reply to Eyal Shenitzky from comment #23)
> Do we have a clear way to reproduce it?

Yes, we always get 2 events from libvirt; one will cause a warning.
(In reply to Nir Soffer from comment #21)
> (In reply to Peter Krempa from comment #20)
> > (In reply to Peter Krempa from comment #19)
> > > (In reply to Nir Soffer from comment #15)
> > > > Peter, do you have any idea why
> > > > - libvirt reports block threshold event for drive "sdi[7]"
> > > >   when vdsm registered only "sdi"?
> > >
> > > Nowadays all images have an indexed name. Libvirt reports the event for
> > > the image that it was registered for, which means that if it's a top
> > > level image, you get both 'sdi' and 'sdi[7]'.
> >
> > To explain the rationale a bit more, both events are reported as the top
> > level image may have a threshold registered, but if it's moved into the
> > backing chain by creating a snapshot and then e.g. hits the threshold
> > during a block commit or other manipulations it would no longer be the
> > top of the chain. Thus reporting both for top level images saves us from
> > the possible ambiguity.
>
> Reporting event twice is not great, and not documented, but I don't want to
> make libvirt code more complicated than needed.
>
> I think on libvirt side, using bare drive name ("vda") should be deprecated,
> and the fact that you will get 2 events should be documented.

Hmm, yeah. I'll do it upstream.

> This bug should be solved in vdsm by always registering "{name}[{index}]".
> I think we need it anyway in vdsm to monitor scratch disks and block copy
> destination disk.

Note that even if you register name[index] and the drive is the top one, you also get the event without the index. I guess I can change that if it will make your life simpler.
(In reply to Peter Krempa from comment #25)
...
> Note that even if you register name[index] and the drive is the top one you
> also get the event without the index. I guess I can change that if it will
> make your life simpler.

Yes, I found this during testing; we still get 2 events. We can handle it now, since we can now handle both the drive name (vda) and the indexed name (vda[7]) when we get the event, so we will not log a warning about this, but this is still not great.

Should I file a libvirt bug to submit only one event when registering the indexed name?
(In reply to Nir Soffer from comment #26)
> (In reply to Peter Krempa from comment #25)
> ...
> > Note that even if you register name[index] and the drive is the top one
> > you also get the event without the index. I guess I can change that if it
> > will make your life simpler.
>
> Yes, I found this during testing, we still get 2 events. We can handle it
> now since we can handle now both drive name (vda) and indexed name (vda[7])
> when we get the event, so we will not log a warning about this, but this is
> still not great.
>
> Should I file libvirt bug to submit only one event when registering indexed
> name?

Yes please, in the meanwhile I've already posted patches:

https://listman.redhat.com/archives/libvir-list/2021-July/msg00013.html
(In reply to Peter Krempa from comment #27)
> Yes please, in the meanwhile I've already posted patches:
>
> https://listman.redhat.com/archives/libvir-list/2021-July/msg00013.html

Thanks, filed bug 1983429.
## How to reproduce:

1. Install vdsm < 4.40.80.2
2. Add a 2g disk to a running vm
3. In the guest, write 512 MiB to the new disk

Expected: Vdsm will warn about missing drive "vda[7]". The number may differ, depending on the number of disk snapshots.

## What was changed in vdsm

Under the hood, vdsm changed the way block threshold events are set and how events are handled. Previously it always used the drive name (vda). Now it always uses the indexed name (vda[7]). This allows dropping double events from libvirt for "vda". When receiving an event from libvirt, vdsm parses the indexed name (vda[7]) and uses the drive name (vda) to find the drive and update the drive threshold state.

## Testing with the fix:

1. Install vdsm >= 4.40.80.2
2. Add a 2g disk to a running vm
3. In the guest, write 512 MiB to the new disk

Expected: Vdsm will not warn about a missing drive. Otherwise everything should work in the same way, so running all automated tests should be enough.

## How indexed names work

To check that vdsm set the block threshold on the right volume, you can check the VM XML. This is a VM without any snapshots, but the disk "sda" is using a template.

# virsh -r dumpxml vm1
...
  <disk type='block' device='disk' snapshot='no'>
    <driver name='qemu' type='qcow2' cache='none' error_policy='stop' io='native' discard='unmap'/>
    <source dev='/rhev/data-center/mnt/blockSD/feab3738-c158-4d48-8a41-b5a95c057a50/images/3649d84b-6f35-4314-900a-5e8024e3905c/ab41ae09-0c8d-4072-89fe-632c31fe8553' index='1'>
      <seclabel model='dac' relabel='no'/>
    </source>
    <backingStore type='block' index='3'>
      <format type='qcow2'/>
      <source dev='/rhev/data-center/mnt/blockSD/feab3738-c158-4d48-8a41-b5a95c057a50/images/3649d84b-6f35-4314-900a-5e8024e3905c/9d63f782-7467-4243-af1e-5c1f8b49c111'>
        <seclabel model='dac' relabel='no'/>
      </source>
      <backingStore/>
    </backingStore>
    <target dev='sda' bus='scsi'/>
    <serial>3649d84b-6f35-4314-900a-5e8024e3905c</serial>
    <boot order='1'/>
    <alias name='ua-3649d84b-6f35-4314-900a-5e8024e3905c'/>
    <address type='drive' controller='0' bus='0' target='0' unit='0'/>
  </disk>

The disk name is 'sda':

    <target dev='sda' bus='scsi'/>

The top volume index is '1':

    <source dev='/rhev/data-center/.../ab41ae09-0c8d-4072-89fe-632c31fe8553' index='1'>

The indexed name of the top volume is "sda[1]".

The template volume is:

    <backingStore type='block' index='3'>

The indexed name of the template volume is "sda[3]". To make your life more interesting, the index is set on the backingStore element instead of the source element.
When this VM starts, we see that it set the block threshold on the top volume:

2021-07-19 21:31:09,038+0300 INFO (periodic/3) [virt.vm] (vmId='9c4050b1-8070-429f-ae36-9eae0935ccab') Setting block threshold to 536870912 bytes for drive 'sda[1]' apparentsize 1073741824 (drivemonitor:121)

When we create a snapshot, a new volume is created at the top:

  <disk type='block' device='disk' snapshot='no'>
    <driver name='qemu' type='qcow2' cache='none' error_policy='stop' io='native' discard='unmap'/>
    <source dev='/rhev/data-center/mnt/blockSD/feab3738-c158-4d48-8a41-b5a95c057a50/images/3649d84b-6f35-4314-900a-5e8024e3905c/0a4adba8-6cd4-46ec-ad02-e0b0b786256b' index='4'>
      <seclabel model='dac' relabel='no'/>
    </source>
    <backingStore type='block' index='1'>
      <format type='qcow2'/>
      <source dev='/rhev/data-center/mnt/blockSD/feab3738-c158-4d48-8a41-b5a95c057a50/images/3649d84b-6f35-4314-900a-5e8024e3905c/ab41ae09-0c8d-4072-89fe-632c31fe8553'>
        <seclabel model='dac' relabel='no'/>
      </source>
      <backingStore type='block' index='3'>
        <format type='qcow2'/>
        <source dev='/rhev/data-center/mnt/blockSD/feab3738-c158-4d48-8a41-b5a95c057a50/images/3649d84b-6f35-4314-900a-5e8024e3905c/9d63f782-7467-4243-af1e-5c1f8b49c111'>
          <seclabel model='dac' relabel='no'/>
        </source>
        <backingStore/>
      </backingStore>
    </backingStore>
    <target dev='sda' bus='scsi'/>
    <serial>3649d84b-6f35-4314-900a-5e8024e3905c</serial>
    <boot order='1'/>
    <alias name='ua-3649d84b-6f35-4314-900a-5e8024e3905c'/>
    <address type='drive' controller='0' bus='0' target='0' unit='0'/>
  </disk>

Vdsm clears the threshold on the previous top volume:

2021-07-19 21:40:23,543+0300 INFO (virt/e8003037) [virt.vm] (vmId='9c4050b1-8070-429f-ae36-9eae0935ccab') Clearing block threshold for drive 'sda[1]' (drivemonitor:153)

And sets the threshold on the new top volume:

2021-07-19 21:40:25,155+0300 INFO (periodic/2) [virt.vm] (vmId='9c4050b1-8070-429f-ae36-9eae0935ccab') Setting block threshold to 536870912 bytes for drive 'sda[4]' apparentsize 1073741824 (drivemonitor:121)

From this point, vdsm continues to monitor volume "sda[4]".

When deleting a snapshot, the VM switches to use "sda[1]":

  <disk type='block' device='disk' snapshot='no'>
    <driver name='qemu' type='qcow2' cache='none' error_policy='stop' io='native' discard='unmap'/>
    <source dev='/rhev/data-center/mnt/blockSD/feab3738-c158-4d48-8a41-b5a95c057a50/images/3649d84b-6f35-4314-900a-5e8024e3905c/ab41ae09-0c8d-4072-89fe-632c31fe8553' index='1'>
      <seclabel model='dac' relabel='no'/>
    </source>
    <backingStore type='block' index='3'>
      <format type='qcow2'/>
      <source dev='/rhev/data-center/mnt/blockSD/feab3738-c158-4d48-8a41-b5a95c057a50/images/3649d84b-6f35-4314-900a-5e8024e3905c/9d63f782-7467-4243-af1e-5c1f8b49c111'>
        <seclabel model='dac' relabel='no'/>
      </source>
      <backingStore/>
    </backingStore>
    <target dev='sda' bus='scsi'/>
    <serial>3649d84b-6f35-4314-900a-5e8024e3905c</serial>
    <boot order='1'/>
    <alias name='ua-3649d84b-6f35-4314-900a-5e8024e3905c'/>
    <address type='drive' controller='0' bus='0' target='0' unit='0'/>
  </disk>

So vdsm starts monitoring "sda[1]":

2021-07-19 21:44:11,200+0300 INFO (periodic/1) [virt.vm] (vmId='9c4050b1-8070-429f-ae36-9eae0935ccab') Setting block threshold to 2684354560 bytes for drive 'sda[1]' apparentsize 3221225472 (drivemonitor:121)

Note that the threshold changes as the volume is extended. In this example the volume's actual size is 3.0 GiB. With the default chunk size, the threshold is set to 2.5 GiB.
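In the logs above the threshold consistently sits 512 MiB below the apparent size (536870912 below 1073741824, 2684354560 below 3221225472). A minimal sketch of that relationship, assuming the 512 MiB margin observed here (the real margin is derived from vdsm's configured chunk size, and the helper name is hypothetical):

```python
GiB = 1024**3
MARGIN = 512 * 1024**2  # 512 MiB, as observed in these logs

def block_threshold(apparent_size, margin=MARGIN):
    # Hypothetical helper: arm the threshold one margin below the
    # current apparent size, so the event fires before the volume
    # runs out of allocated space.
    return apparent_size - margin

print(block_threshold(3 * GiB))  # -> 2684354560 (2.5 GiB, as in the log)
```

This is why the threshold moves every time the volume is extended: it is recomputed from the new apparent size.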
If we add a new thin disk (sdb) to this vm:

    <disk type='block' device='disk' snapshot='no'>
      <driver name='qemu' type='qcow2' cache='none' error_policy='stop' io='native' discard='unmap'/>
      <source dev='/rhev/data-center/mnt/blockSD/feab3738-c158-4d48-8a41-b5a95c057a50/images/3649d84b-6f35-4314-900a-5e8024e3905c/ab41ae09-0c8d-4072-89fe-632c31fe8553' index='1'>
        <seclabel model='dac' relabel='no'/>
      </source>
      <backingStore type='block' index='3'>
        <format type='qcow2'/>
        <source dev='/rhev/data-center/mnt/blockSD/feab3738-c158-4d48-8a41-b5a95c057a50/images/3649d84b-6f35-4314-900a-5e8024e3905c/9d63f782-7467-4243-af1e-5c1f8b49c111'>
          <seclabel model='dac' relabel='no'/>
        </source>
        <backingStore/>
      </backingStore>
      <target dev='sda' bus='scsi'/>
      <serial>3649d84b-6f35-4314-900a-5e8024e3905c</serial>
      <boot order='1'/>
      <alias name='ua-3649d84b-6f35-4314-900a-5e8024e3905c'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>
    <disk type='block' device='disk' snapshot='no'>
      <driver name='qemu' type='qcow2' cache='none' error_policy='stop' io='native'/>
      <source dev='/rhev/data-center/mnt/blockSD/feab3738-c158-4d48-8a41-b5a95c057a50/images/7ab94149-5409-41bb-8189-641a14d95afb/055b903f-2aff-4cf7-9e85-e1c586d5561c' index='5'>
        <seclabel model='dac' relabel='no'/>
      </source>
      <backingStore/>
      <target dev='sdb' bus='scsi'/>
      <serial>7ab94149-5409-41bb-8189-641a14d95afb</serial>
      <alias name='ua-7ab94149-5409-41bb-8189-641a14d95afb'/>
      <address type='drive' controller='0' bus='0' target='0' unit='1'/>
    </disk>

vdsm will start to monitor the new disk (sdb[5]):

2021-07-19 21:55:29,335+0300 INFO (periodic/0) [virt.vm] (vmId='9c4050b1-8070-429f-ae36-9eae0935ccab') Setting block threshold to 536870912 bytes for drive 'sdb[5]' apparentsize 1073741824 (drivemonitor:121)

## How automatic disk extension works

If we write 512M to the new disk in the guest:

# lsblk
NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda      8:0    0    6G  0 disk
├─sda1   8:1    0    1M  0 part
├─sda2   8:2    0    1G  0 part /boot
├─sda3   8:3    0  615M  0 part [SWAP]
└─sda4   8:4    0  4.4G  0 part /
sdb      8:16   0    3G  0 disk
sr0     11:0    1 1024M  0 rom

# dd if=/dev/zero bs=1M count=512 of=/dev/sdb oflag=direct conv=fsync
512+0 records in
512+0 records out
536870912 bytes (537 MB, 512 MiB) copied, 2.26763 s, 237 MB/s

Vdsm will receive a block threshold event for disk sdb[5]:

2021-07-19 21:59:48,413+0300 INFO (libvirt/events) [virt.vm] (vmId='9c4050b1-8070-429f-ae36-9eae0935ccab') Block threshold 536870912 exceeded by 327680 for drive 'sdb[5]' (/rhev/data-center/.../055b903f-2aff-4cf7-9e85-e1c586d5561c) (drivemonitor:176)

Then it will extend the volume to 2 GiB:

2021-07-19 21:59:49,370+0300 INFO (periodic/0) [virt.vm] (vmId='9c4050b1-8070-429f-ae36-9eae0935ccab') Requesting extension for volume 055b903f-2aff-4cf7-9e85-e1c586d5561c on domain feab3738-c158-4d48-8a41-b5a95c057a50 (apparent: 1073741824, capacity: 3221225472, allocated: 537198592, physical: 1073741824 threshold_state: exceeded) (vm:1349)
...
2021-07-19 21:59:49,370+0300 INFO (periodic/0) [vdsm.api] START sendExtendMsg(spUUID='3f8e08e6-5485-429e-a686-63754906b27b', volDict={'domainID': 'feab3738-c158-4d48-8a41-b5a95c057a50', 'imageID': '7ab94149-5409-41bb-8189-641a14d95afb', 'internal': False, 'name': 'sdb', 'newSize': 2147483648, 'poolID': '3f8e08e6-5485-429e-a686-63754906b27b', 'volumeID': '055b903f-2aff-4cf7-9e85-e1c586d5561c', 'clock': <Clock(total=0.00*, extend-volume=0.00*)>, 'callback': None}, newSize=2147483648, callbackFunc=<bound method Vm.after_volume_extension of <vdsm.virt.vm.Vm object at 0x7f72e1d64cf8>>) from=internal, task_id=ae11e083-4893-4546-a16a-0388532d9332 (api:48)
...
2021-07-19 21:59:51,548+0300 INFO (mailbox-hsm/1) [virt.vm] (vmId='9c4050b1-8070-429f-ae36-9eae0935ccab') Extend volume 055b903f-2aff-4cf7-9e85-e1c586d5561c completed <Clock(total=2.18, extend-volume=2.03, refresh-volume=0.15)> (vm:1584)

When the extension is completed, it will set a new block threshold:

2021-07-19 21:59:51,555+0300 INFO (mailbox-hsm/1) [virt.vm] (vmId='9c4050b1-8070-429f-ae36-9eae0935ccab') Setting block threshold to 1610612736 bytes for drive 'sdb[5]' apparentsize 2147483648 (drivemonitor:121)
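For reference, the arithmetic in the walkthrough above can be reproduced with a short sketch. The constant and function names are illustrative, not vdsm's actual identifiers; the values come from the logs: the new threshold is registered 512 MiB below the volume's apparent size, and each extension request grows the volume by a 1 GiB chunk, capped at the virtual capacity.

```python
MiB = 1024 ** 2
GiB = 1024 ** 3

# Illustrative names; the real vdsm code derives these from configuration.
FREE_SPACE_WATERMARK = 512 * MiB   # threshold sits this far below the top
EXTENSION_CHUNK = 1 * GiB          # volume grows by this much per extension


def next_block_threshold(apparentsize):
    """Threshold to register with libvirt after (re)measuring the volume."""
    return apparentsize - FREE_SPACE_WATERMARK


def next_volume_size(physical, capacity):
    """New physical size to request, capped at the virtual capacity."""
    return min(physical + EXTENSION_CHUNK, capacity)


# Values from the logs above:
print(next_block_threshold(1073741824))          # 536870912
print(next_volume_size(1073741824, 3221225472))  # 2147483648
print(next_block_threshold(2147483648))          # 1610612736
```

These match the logged transitions: threshold 536870912 on a 1 GiB volume, an extension to 2147483648, then a new threshold of 1610612736.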
Verified.

Version:
vdsm-4.40.80.2-1.el8ev.x86_64
QEMU emulator version 5.2.0 (qemu-kvm-5.2.0-16.module+el8.4.0+11923+e8b883e4.4)
Ovirt-Engine-4.4.8.1-0.9.el8ev

Compared with vdsm versions < 4.40.80.2, where the problem is visible:

2021-07-25 15:10:08,025+0300 WARN (libvirt/events) [virt.vm] (vmId='c20bf29b-4be2-4391-b75c-688fefc13094') Unknown drive 'sdb[6]' for vm c20bf29b-4be2-4391-b75c-688fefc13094 - ignored block threshold event (drivemonitor:179)

On the fixed version (vdsm-4.40.80.2):

2021-07-25 15:30:03,514+0300 INFO (libvirt/events) [virt.vm] (vmId='778d1edb-8ff7-4d9b-8b5f-7394992c2b80') Block threshold 536870912 exceeded by 131072 for drive 'sda[1]' (/rhev/data-center/mnt/blockSD/4c3e9f6e-048e-4398-b161-e5168a5c0bbb/images/c70c3c85-82d2-4311-9943-6b3960823be7/b0c3e5d1-14c9-4eaa-bfd3-70cc97bb2df0) (drivemonitor:163)
2021-07-25 15:30:04,295+0300 INFO (periodic/0) [virt.vm] (vmId='778d1edb-8ff7-4d9b-8b5f-7394992c2b80') Requesting extension for volume b0c3e5d1-14c9-4eaa-bfd3-70cc97bb2df0 on domain 4c3e9f6e-048e-4398-b161-e5168a5c0bbb (apparent: 1073741824, capacity: 3221225472, allocated: 539885568, physical: 1073741824 threshold_state: exceeded) (vm:1349)
2021-07-25 15:30:04,296+0300 INFO (periodic/0) [vdsm.api] START sendExtendMsg(spUUID='67491ffb-097f-4858-8c8f-f815f6b362b3', volDict={'domainID': '4c3e9f6e-048e-4398-b161-e5168a5c0bbb', 'imageID': 'c70c3c85-82d2-4311-9943-6b3960823be7', 'internal': False, 'name': 'sda', 'newSize': 2147483648, 'poolID': '67491ffb-097f-4858-8c8f-f815f6b362b3', 'volumeID': 'b0c3e5d1-14c9-4eaa-bfd3-70cc97bb2df0', 'clock': <Clock(total=0.00*, extend-volume=0.00*)>, 'callback': None}, newSize=2147483648, callbackFunc=<bound method Vm.after_volume_extension of <vdsm.virt.vm.Vm object at 0x7fd5fc563978>>) from=internal, task_id=5e162567-023d-438f-8d34-ca6181e4a9d8 (api:48)
2021-07-25 15:30:04,296+0300 DEBUG (periodic/0) [storage.SPM.Messages.Extend] new extend msg created: domain: 4c3e9f6e-048e-4398-b161-e5168a5c0bbb, volume: b0c3e5d1-14c9-4eaa-bfd3-70cc97bb2df0 (mailbox:130)
2021-07-25 15:30:04,296+0300 INFO (periodic/0) [vdsm.api] FINISH sendExtendMsg return=None from=internal, task_id=5e162567-023d-438f-8d34-ca6181e4a9d8 (api:54)
2021-07-25 15:30:04,296+0300 DEBUG (periodic/0) [storage.TaskManager.Task] (Task='5e162567-023d-438f-8d34-ca6181e4a9d8') finished: None (task:1202)
2021-07-25 15:30:04,297+0300 DEBUG (periodic/0) [storage.TaskManager.Task] (Task='5e162567-023d-438f-8d34-ca6181e4a9d8') moving from state finished -> state finished (task:624)
2021-07-25 15:30:06,795+0300 INFO (mailbox-hsm/3) [virt.vm] (vmId='778d1edb-8ff7-4d9b-8b5f-7394992c2b80') Setting block threshold to 1610612736 bytes for drive 'sda[1]' apparentsize 2147483648 (drivemonitor:111)
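As background for the WARN line above: the 'sdb[6]' form is the libvirt device name plus the index of the image in the drive's backing chain, and the event is dropped when vdsm cannot find a monitored drive under that spec. A minimal sketch of parsing such a spec follows; `parse_drive_spec` is a hypothetical helper for illustration, not vdsm's actual code.

```python
import re


def parse_drive_spec(spec):
    """Split a block-threshold event spec like 'sdb[6]' into
    (device name, backing-chain index). Plain 'sdb' yields index None."""
    m = re.fullmatch(r"(\w+)(?:\[(\d+)\])?", spec)
    if m is None:
        raise ValueError("unexpected device spec: %r" % spec)
    name, index = m.group(1), m.group(2)
    return name, (int(index) if index is not None else None)


print(parse_drive_spec("sdb[6]"))  # ('sdb', 6)
print(parse_drive_spec("sda"))     # ('sda', None)
```

If the monitor registered the threshold under one index but the event arrives with another (for example after the chain changed), a lookup keyed on the full spec fails, which is consistent with the "Unknown drive ... ignored block threshold event" warning.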
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: Red Hat Virtualization Host security and bug fix update [ovirt-4.4.8]), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:3459
Due to QE capacity constraints, we are not going to cover this issue in our automation.