Bug 1648659 - VM crash with error "Lost connection with qemu process" after disk resize + fast IO - Core dump to iothread
Summary: VM crash with error "Lost connection with qemu process" after disk resize + fast IO - Core dump to iothread
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: BLL.Storage
Version: 4.2.7
Hardware: x86_64
OS: Unspecified
high
medium
Target Milestone: ---
Assignee: Nobody
QA Contact: Avihai
URL:
Whiteboard:
Depends On: 1671173 1716349 1809434
Blocks:
 
Reported: 2018-11-11 13:04 UTC by Yosi Ben Shimon
Modified: 2022-01-18 11:32 UTC
CC: 3 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2021-09-29 11:33:06 UTC
oVirt Team: Storage
Embargoed:


Attachments
logs (1.75 MB, application/x-gzip)
2018-11-11 13:04 UTC, Yosi Ben Shimon
engine and vdsm logs of the issue on ovirt-engine 4.3.2.1-0.1.el7 (776.40 KB, application/zip)
2019-03-21 16:15 UTC, Avihai
4.3.3.1 logs (1.05 MB, application/gzip)
2019-03-30 18:26 UTC, Avihai


Links:
Red Hat Issue Tracker RHV-37857 (last updated 2021-08-20 08:29:18 UTC)

Description Yosi Ben Shimon 2018-11-11 13:04:19 UTC
Created attachment 1504383 [details]
logs

Description of problem:
The automation test case failed while trying to perform a dd command on the VM.

From the art log:
2018-10-31 00:40:59,411 - MainThread - rhevmtests.storage.helpers - INFO - Performing command '/bin/dd bs=1M count=1358 if=/dev/vda of=/mount-point_3100405716/written_test_storage status=none'
2018-10-31 00:40:59,542 - MainThread - RemoteExecutor - DEBUG - [root.17.129/qum5net] Executing: /bin/dd bs=1M count=1358 if=/dev/vda of=/mount-point_3100405716/written_test_storage status=none
2018-10-31 00:46:59,854 - MainThread - RemoteExecutor - DEBUG - [root.17.129/qum5net] Results of command: /bin/dd bs=1M count=1358 if=/dev/vda of=/mount-point_3100405716/written_test_storage status=none
2018-10-31 00:46:59,858 - MainThread - RemoteExecutor - DEBUG - [root.17.129/qum5net]   OUT: None
2018-10-31 00:46:59,858 - MainThread - RemoteExecutor - DEBUG - [root.17.129/qum5net]   ERR: None
2018-10-31 00:46:59,858 - MainThread - RemoteExecutor - DEBUG - [root.17.129/qum5net]   RC: None
2018-10-31 00:47:00,198 - MainThread - art.logging - ERROR - Status: failed

*** Note that OUT, ERR and RC are all None.
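
For what it's worth, a minimal sketch (hypothetical, not the art framework's RemoteExecutor API) of the failing step: run the dd over SSH and collect its exit status. It makes explicit what an RC of None indicates here; the exit status was never collected at all, which points at the guest/qemu process dying under the writer rather than dd itself returning non-zero.

# Hypothetical stand-in for the automation step above (assumption: plain ssh
# via subprocess, not the real art RemoteExecutor), to illustrate what an
# RC of None means.
import subprocess

DD_CMD = ("/bin/dd bs=1M count=1358 if=/dev/vda "
          "of=/mount-point_3100405716/written_test_storage status=none")

def run_dd(host):
    try:
        # ~6 minute timeout, matching the gap between "Executing" (00:40:59)
        # and "Results" (00:46:59) in the art log above (assumed, not confirmed).
        result = subprocess.run(["ssh", "root@%s" % host, DD_CMD],
                                capture_output=True, text=True, timeout=360)
        return result.returncode  # 0 on success, non-zero if dd itself failed
    except subprocess.TimeoutExpired:
        return None  # no exit status at all -- the OUT/ERR/RC == None case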

From engine log:
2018-10-31 00:41:03,980+02 INFO  [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-9) [] VM '37749548-6574-40ac-986b-ec89b85877f8'(vm_TestCase5063_3100372732) moved from 'Up' --> 'Paused'
2018-10-31 00:41:04,063+02 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-9) [] EVENT_ID: VM_PAUSED(1,025), VM vm_TestCase5063_3100372732 has been paused.
2018-10-31 00:41:04,079+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-9) [] EVENT_ID: VM_PAUSED_ENOSPC(138), VM vm_TestCase5063_3100372732 has been paused due to no Storage space error.
2018-10-31 00:41:18,655+02 INFO  [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (EE-ManagedThreadFactory-engineScheduled-Thread-44) [] VM '37749548-6574-40ac-986b-ec89b85877f8' was reported as Down on VDS '34ef902a-a22f-4aa3-83ec-82b2ebe534d4'(host_mixed_2)
2018-10-31 00:41:18,657+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-44) [] START, DestroyVDSCommand(HostName = host_mixed_2, DestroyVmVDSCommandParameters:{hostId='34ef902a-a22f-4aa3-83ec-82b2ebe534d4', vmId='37749548-6574-40ac-986b-ec89b85877f8', secondsToWait='0', gracefully='false', reason='', ignoreNoVm='true'}), log id: 34b65d10
2018-10-31 00:41:18,663+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-44) [] FINISH, DestroyVDSCommand, log id: 34b65d10
2018-10-31 00:41:18,663+02 INFO  [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (EE-ManagedThreadFactory-engineScheduled-Thread-44) [] VM '37749548-6574-40ac-986b-ec89b85877f8'(vm_TestCase5063_3100372732) moved from 'Paused' --> 'Down'
2018-10-31 00:41:18,745+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-44) [] EVENT_ID: VM_DOWN_ERROR(119), VM vm_TestCase5063_3100372732 is down with error. Exit message: Lost connection with qemu process.


From qemu log:
2018-10-30 22:41:06.745+0000: shutting down, reason=crashed


Version-Release number of selected component (if applicable):
ovirt-engine-4.2.7.4-0.1.el7ev.noarch
vdsm-4.20.43-1.el7ev.x86_64
libvirt-4.5.0-10.el7_6.2.x86_64
qemu-img-rhev-2.12.0-18.el7_6.1.x86_64


How reproducible:
Unknown. Did not manage to reproduce it; it happened once during the analysis.

Steps to Reproduce:
1.
2.
3.

Actual results:
The dd command failed and the VM went down with "Lost connection with qemu process".

Expected results:
The operation should succeed.

Additional info:
Attached art, engine, vdsm and relevant qemu logs.

Comment 1 Sandro Bonazzola 2019-01-28 09:34:32 UTC
This bug has not been marked as blocker for oVirt 4.3.0.
Since we are releasing it tomorrow, January 29th, this bug has been re-targeted to 4.3.1.

Comment 2 Avihai 2019-03-21 16:14:44 UTC
The issue happened again on ovirt-engine 4.3.2.1-0.1.el7 in automation Tier1 runs on the same TestCase5063.

Trying to reproduce it locally did not succeed.
Logs are attached.

Comment 3 Avihai 2019-03-21 16:15:44 UTC
Created attachment 1546616 [details]
engine and vdsm logs of the issue on ovirt-engine 4.3.2.1-0.1.el7

Comment 4 Avihai 2019-03-21 16:25:52 UTC
From the VDSM log it looks like the extend did not really sink in: the allocated size is 1074659328 while the physical size is still 1073741824.
Can you (Dev) please check it out?

Libvirt logs from that time are missing.

2019-03-20 17:57:08,780+0200 INFO  (libvirt/events) [virt.vm] (vmId='d8b804a1-3337-4e10-b480-5a73f4171b6c') Requesting extension for volume ca783c33-0ce1-4ee4-b6b3-8a462ed21fef on domain d3fc8698-a66f-4f00-a869-88557d5b1555 (apparent: 1073741824, capacity: 2147483648, allocated: 1074659328, physical: 1073741824 threshold_state: exceeded) (vm:1262)

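A quick sanity check of the numbers in that line (plain Python, just re-doing the arithmetic from the log line above): the allocation reported by libvirt is already past the 1 GiB LV, the threshold is marked exceeded, and the requested newSize is the full 2 GiB capacity, yet physical stays at 1073741824 across all the repeated requests below.

# Values copied from the "Requesting extension" log lines above/below (bytes).
GiB = 1024 ** 3

apparent  = 1073741824   # qcow2 apparent size                (1 GiB)
physical  = 1073741824   # current LV size                    (1 GiB)
allocated = 1074659328   # highest allocation reported by libvirt
capacity  = 2147483648   # virtual size after the disk resize (2 GiB)
new_size  = 2147483648   # newSize passed to sendExtendMsg

print(allocated - physical)             # 917504 bytes past the current LV size
print(allocated > physical)             # True -> threshold_state: exceeded
print(new_size == capacity == 2 * GiB)  # True -> extend targets the full 2 GiB
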

VDSM log:
2019-03-20 17:57:07,638+0200 INFO  (jsonrpc/4) [jsonrpc.JsonRpcServer] RPC call Host.getAllVmStats succeeded in 0.00 seconds (__init__:312)
2019-03-20 17:57:07,807+0200 DEBUG (mpathlistener) [storage.udev] Received udev event (action=change, device=Device(u'/sys/devices/virtual/block/dm-13')) (udev:218)
2019-03-20 17:57:08,729+0200 INFO  (libvirt/events) [virt.vm] (vmId='d8b804a1-3337-4e10-b480-5a73f4171b6c') abnormal vm stop device ua-4f688f32-db98-4862-9c12-392d3dca88a9 error enospc (vm:5042)
2019-03-20 17:57:08,729+0200 INFO  (libvirt/events) [virt.vm] (vmId='d8b804a1-3337-4e10-b480-5a73f4171b6c') CPU stopped: onIOError (vm:6020)
2019-03-20 17:57:08,747+0200 INFO  (libvirt/events) [virt.vm] (vmId='d8b804a1-3337-4e10-b480-5a73f4171b6c') Requesting extension for volume ca783c33-0ce1-4ee4-b6b3-8a462ed21fef on domain d3fc8698-a66f-4f00-a869-88557d5b1555 (apparent: 1073741824, capacity: 2147483648, allocated: 1074659328, physical: 1073741824 threshold_state: exceeded) (vm:1262)
2019-03-20 17:57:08,748+0200 DEBUG (libvirt/events) [storage.TaskManager.Task] (Task='2ab6b5e8-dfde-41d3-8be3-bac017a3f0b5') moving from state finished -> state preparing (task:602)
2019-03-20 17:57:08,748+0200 INFO  (libvirt/events) [vdsm.api] START sendExtendMsg(spUUID='f7507337-e82f-464a-be2e-dfadf68d1349', volDict={'newSize': 2147483648, 'domainID': 'd3fc8698-a66f-4f00-a869-88557d5b1555', 'name': 'vdb', 'poolID': 'f7507337-e82f-464a-be2e-dfadf68d1349', 'clock': <Clock(total=0.01*, extend-volume=0.01*)>, 'internal': False, 'volumeID': 'ca783c33-0ce1-4ee4-b6b3-8a462ed21fef', 'imageID': '4f688f32-db98-4862-9c12-392d3dca88a9'}, newSize=2147483648, callbackFunc=<bound method Vm.__afterVolumeExtension of <vdsm.virt.vm.Vm object at 0x7fabd035aa10>>) from=internal, task_id=2ab6b5e8-dfde-41d3-8be3-bac017a3f0b5 (api:48)
2019-03-20 17:57:08,749+0200 DEBUG (libvirt/events) [storage.SPM.Messages.Extend] new extend msg created: domain: d3fc8698-a66f-4f00-a869-88557d5b1555, volume: ca783c33-0ce1-4ee4-b6b3-8a462ed21fef (mailbox:128)
2019-03-20 17:57:08,749+0200 INFO  (libvirt/events) [vdsm.api] FINISH sendExtendMsg return=None from=internal, task_id=2ab6b5e8-dfde-41d3-8be3-bac017a3f0b5 (api:54)
2019-03-20 17:57:08,749+0200 DEBUG (libvirt/events) [storage.TaskManager.Task] (Task='2ab6b5e8-dfde-41d3-8be3-bac017a3f0b5') finished: None (task:1201)
2019-03-20 17:57:08,749+0200 DEBUG (libvirt/events) [storage.TaskManager.Task] (Task='2ab6b5e8-dfde-41d3-8be3-bac017a3f0b5') moving from state finished -> state finished (task:602)
2019-03-20 17:57:08,749+0200 DEBUG (libvirt/events) [storage.ResourceManager.Owner] Owner.releaseAll requests {} resources {} (resourceManager:911)
2019-03-20 17:57:08,749+0200 DEBUG (libvirt/events) [storage.ResourceManager.Owner] Owner.cancelAll requests {} (resourceManager:948)
2019-03-20 17:57:08,749+0200 DEBUG (libvirt/events) [storage.TaskManager.Task] (Task='2ab6b5e8-dfde-41d3-8be3-bac017a3f0b5') ref 0 aborting False (task:1002)
2019-03-20 17:57:08,758+0200 INFO  (libvirt/events) [virt.vm] (vmId='d8b804a1-3337-4e10-b480-5a73f4171b6c') CPU stopped: onSuspend (vm:6020)
2019-03-20 17:57:08,761+0200 WARN  (libvirt/events) [virt.vm] (vmId='d8b804a1-3337-4e10-b480-5a73f4171b6c') device vdb out of space (vm:3961)
2019-03-20 17:57:08,762+0200 INFO  (libvirt/events) [virt.vm] (vmId='d8b804a1-3337-4e10-b480-5a73f4171b6c') abnormal vm stop device ua-4f688f32-db98-4862-9c12-392d3dca88a9 error enospc (vm:5042)
2019-03-20 17:57:08,763+0200 INFO  (libvirt/events) [virt.vm] (vmId='d8b804a1-3337-4e10-b480-5a73f4171b6c') CPU stopped: onIOError (vm:6020)
2019-03-20 17:57:08,767+0200 INFO  (libvirt/events) [virt.vm] (vmId='d8b804a1-3337-4e10-b480-5a73f4171b6c') Requesting extension for volume ca783c33-0ce1-4ee4-b6b3-8a462ed21fef on domain d3fc8698-a66f-4f00-a869-88557d5b1555 (apparent: 1073741824, capacity: 2147483648, allocated: 1074659328, physical: 1073741824 threshold_state: exceeded) (vm:1262)
2019-03-20 17:57:08,767+0200 DEBUG (libvirt/events) [storage.TaskManager.Task] (Task='6e08c1d8-f1a4-46ae-a0e3-a026b8f8e171') moving from state finished -> state preparing (task:602)
2019-03-20 17:57:08,767+0200 INFO  (libvirt/events) [vdsm.api] START sendExtendMsg(spUUID='f7507337-e82f-464a-be2e-dfadf68d1349', volDict={'newSize': 2147483648, 'domainID': 'd3fc8698-a66f-4f00-a869-88557d5b1555', 'name': 'vdb', 'poolID': 'f7507337-e82f-464a-be2e-dfadf68d1349', 'clock': <Clock(total=0.00*, extend-volume=0.00*)>, 'internal': False, 'volumeID': 'ca783c33-0ce1-4ee4-b6b3-8a462ed21fef', 'imageID': '4f688f32-db98-4862-9c12-392d3dca88a9'}, newSize=2147483648, callbackFunc=<bound method Vm.__afterVolumeExtension of <vdsm.virt.vm.Vm object at 0x7fabd035aa10>>) from=internal, task_id=6e08c1d8-f1a4-46ae-a0e3-a026b8f8e171 (api:48)
2019-03-20 17:57:08,768+0200 DEBUG (libvirt/events) [storage.SPM.Messages.Extend] new extend msg created: domain: d3fc8698-a66f-4f00-a869-88557d5b1555, volume: ca783c33-0ce1-4ee4-b6b3-8a462ed21fef (mailbox:128)
2019-03-20 17:57:08,768+0200 INFO  (libvirt/events) [vdsm.api] FINISH sendExtendMsg return=None from=internal, task_id=6e08c1d8-f1a4-46ae-a0e3-a026b8f8e171 (api:54)
2019-03-20 17:57:08,768+0200 DEBUG (libvirt/events) [storage.TaskManager.Task] (Task='6e08c1d8-f1a4-46ae-a0e3-a026b8f8e171') finished: None (task:1201)
2019-03-20 17:57:08,768+0200 DEBUG (libvirt/events) [storage.TaskManager.Task] (Task='6e08c1d8-f1a4-46ae-a0e3-a026b8f8e171') moving from state finished -> state finished (task:602)
2019-03-20 17:57:08,768+0200 DEBUG (libvirt/events) [storage.ResourceManager.Owner] Owner.releaseAll requests {} resources {} (resourceManager:911)
2019-03-20 17:57:08,768+0200 DEBUG (libvirt/events) [storage.ResourceManager.Owner] Owner.cancelAll requests {} (resourceManager:948)
2019-03-20 17:57:08,768+0200 DEBUG (libvirt/events) [storage.TaskManager.Task] (Task='6e08c1d8-f1a4-46ae-a0e3-a026b8f8e171') ref 0 aborting False (task:1002)
2019-03-20 17:57:08,776+0200 INFO  (libvirt/events) [virt.vm] (vmId='d8b804a1-3337-4e10-b480-5a73f4171b6c') abnormal vm stop device ua-4f688f32-db98-4862-9c12-392d3dca88a9 error enospc (vm:5042)
2019-03-20 17:57:08,776+0200 INFO  (libvirt/events) [virt.vm] (vmId='d8b804a1-3337-4e10-b480-5a73f4171b6c') CPU stopped: onIOError (vm:6020)
2019-03-20 17:57:08,780+0200 INFO  (libvirt/events) [virt.vm] (vmId='d8b804a1-3337-4e10-b480-5a73f4171b6c') Requesting extension for volume ca783c33-0ce1-4ee4-b6b3-8a462ed21fef on domain d3fc8698-a66f-4f00-a869-88557d5b1555 (apparent: 1073741824, capacity: 2147483648, allocated: 1074659328, physical: 1073741824 threshold_state: exceeded) (vm:1262)
2019-03-20 17:57:08,780+0200 DEBUG (libvirt/events) [storage.TaskManager.Task] (Task='02c8d42d-ca83-4b82-8c0d-fa5bd4d7c8a0') moving from state finished -> state preparing (task:602)
2019-03-20 17:57:08,781+0200 INFO  (libvirt/events) [vdsm.api] START sendExtendMsg(spUUID='f7507337-e82f-464a-be2e-dfadf68d1349', volDict={'newSize': 2147483648, 'domainID': 'd3fc8698-a66f-4f00-a869-88557d5b1555', 'name': 'vdb', 'poolID': 'f7507337-e82f-464a-be2e-dfadf68d1349', 'clock': <Clock(total=0.00*, extend-volume=0.00*)>, 'internal': False, 'volumeID': 'ca783c33-0ce1-4ee4-b6b3-8a462ed21fef', 'imageID': '4f688f32-db98-4862-9c12-392d3dca88a9'}, newSize=2147483648, callbackFunc=<bound method Vm.__afterVolumeExtension of <vdsm.virt.vm.Vm object at 0x7fabd035aa10>>) from=internal, task_id=02c8d42d-ca83-4b82-8c0d-fa5bd4d7c8a0 (api:48)
2019-03-20 17:57:08,781+0200 DEBUG (libvirt/events) [storage.SPM.Messages.Extend] new extend msg created: domain: d3fc8698-a66f-4f00-a869-88557d5b1555, volume: ca783c33-0ce1-4ee4-b6b3-8a462ed21fef (mailbox:128)
2019-03-20 17:57:08,781+0200 INFO  (libvirt/events) [vdsm.api] FINISH sendExtendMsg return=None from=internal, task_id=02c8d42d-ca83-4b82-8c0d-fa5bd4d7c8a0 (api:54)

Comment 5 Avihai 2019-03-30 18:25:50 UTC
Once again the same issue was encountered in automation TC5063 on a specific environment (an infra env which is probably much faster than our (storage) envs).
A core dump to |/usr/libexec/abrt-hook-ccpp 6 18446744073709551615 114797 107 107 1553965254 IO iothread1 is also seen in the journalctl log.

All logs (including libvirt and qemu) are attached.
See details below.

The issue has occurred multiple times in the latest 4.3 releases; please assign and check it out.

Build info:
ovirt-engine	ovirt-engine-4.3.3.1-0.1.el7.noarch
ovirt-imageio-proxy	ovirt-imageio-proxy-1.5.1-0.el7ev.noarch
redhat_release	Red Hat Enterprise Linux Server 7.6 (Maipo)
build_info_product	rhv
build_info_version	4.3
build_info_name	rhv-4.3.3-1
uptime	12021
kernel_release	3.10.0-957.el7.x86_64
nics_number	2
Host Name	lynx18.lab.eng.tlv2.redhat.com
vdsm	vdsm-4.30.12-1.el7ev.x86_64
vdsm-cli	null
libvirt	libvirt-4.5.0-10.el7_6.6.x86_64
qemu-img-rhev	qemu-img-rhev-2.12.0-21.el7.x86_64
glusterfs	glusterfs-3.12.2-47.el7.x86_64
vdsm-python	vdsm-python-4.30.12-1.el7ev.noarch
sanlock	sanlock-3.6.0-1.el7.x86_64
ovirt-imageio-daemon	ovirt-imageio-daemon-1.5.1-0.el7ev.noarch
redhat_release	Red Hat Enterprise Linux Server 7.6 (Maipo)
uuid	4c4c4544-005a-3010-8056-b7c04f563232
build_info_product	rhv
build_info_version	4.2
build_info_name	rhv-4.2.8-9
uptime	1642
kernel_release	3.10.0-957.10.1.el7.x86_64
nics_number	9



Details:

Scenario (a rough SDK sketch of these setup steps is included after the step list below):

16:54:43 2019-03-30 19:54:42,968 INFO       Test Setup   1: Creating VM vm_TestCase5063_3019544296
16:54:43 2019-03-30 19:54:43,151 INFO       Test Setup   2: [class] Description: clone vm from a pre-defined template golden_template_default_rhel8.0 with {'vol_sparse': True, 'name': 'vm_TestCase5063_3019544296', 'clone': False, 'cluster': 'golden_env_mixed_1', 'storagedomain': 'iscsi_1', 'display_type': 'spice', 'timeout': 900, 'virtio_scsi': True, 'vol_format': 'cow', 'os_type': 'RHEL6x64', 'type': 'desktop', 'wait': True}
16:55:05 2019-03-30 19:55:05,816 INFO       Test Setup   3: Starting VM vm_TestCase5063_3019544296
16:55:05 2019-03-30 19:55:05,819 INFO       Test Setup   4: [class] Start VM vm_TestCase5063_3019544296 with {'wait_for_ip': False, 'pause': False, 'use_cloud_init': False, 'timeout': 600, 'wait_for_status': 'up'}
16:56:17 2019-03-30 19:56:17,613 INFO       Test Setup   5: Creating disk disk_TestCase5063_3019561761
16:56:17 2019-03-30 19:56:17,616 INFO       Test Setup   6: [class] Add a disk with {'description': '', 'provisioned_size': 1073741824, 'format': 'cow', 'bootable': False, 'wipe_after_delete': False, 'alias': 'disk_TestCase5063_3019561761', 'storagedomain': 'iscsi_1', 'shareable': False, 'sparse': True, 'active': True, 'interface': 'virtio'}
16:56:38 2019-03-30 19:56:38,353 INFO       Test Setup   7: Attach disk disk_TestCase5063_3019561761 to VM vm_TestCase5063_3019544296
16:56:38 2019-03-30 19:56:38,356 INFO       Test Setup   8: [class] Attach disk to VM with {'read_only': False, 'vm_name': 'vm_TestCase5063_3019544296', 'alias': 'disk_TestCase5063_3019561761', 'bootable': False, 'interface': 'virtio', 'active': True}
16:56:39 2019-03-30 19:56:39,577 INFO 001: storage/rhevmtests.storage.storage_virtual_disk_resize.test_virtual_disk_resize.TestCase5063.test_thin_block_resize[iscsi]
16:56:39 2019-03-30 19:56:39,577 INFO     - VM with thin disk and OS
16:56:39 2019-03-30 19:56:39,578 INFO     - Resize the VM disk to 2G total
16:56:39 2019-03-30 19:56:39,578 INFO     - Send IOs to disk
16:56:39 2019-03-30 19:56:39,578 INFO     - Check LV size on VDSM and disk size on guest
16:56:39 2019-03-30 19:56:39,578 INFO STORAGE: ISCSI
16:56:39 2019-03-30 19:56:39,578 INFO       Test Step   9: Resizing disk disk_TestCase5063_3019561761
16:56:39 2019-03-30 19:56:39,579 INFO       Test Step  10: Resizing disk disk_TestCase5063_3019561761
16:56:39 2019-03-30 19:56:39,581 INFO       Test Step  11: Extend an already existing vm vm_TestCase5063_3019544296 disk disk_TestCase5063_3019561761 with {'provisioned_size': 2147483648}
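
For reference, a rough sketch of setup/test steps 1-11 with the oVirt Python SDK (ovirt-engine-sdk4). This is an approximation of what the automation does, not the actual art test code; the engine URL and credentials are placeholders and status polling between steps is omitted.

# Rough reproduction sketch of the TC5063 setup/resize steps (ovirt-engine-sdk4).
# Assumptions: placeholder URL/credentials; names taken from the step log above.
import ovirtsdk4 as sdk
import ovirtsdk4.types as types

GiB = 1024 ** 3

conn = sdk.Connection(
    url="https://engine.example.com/ovirt-engine/api",  # placeholder
    username="admin@internal", password="secret", insecure=True,
)
vms = conn.system_service().vms_service()
disks = conn.system_service().disks_service()

# Steps 1-4: clone a VM from the template and start it.
vm = vms.add(types.Vm(
    name="vm_TestCase5063",
    cluster=types.Cluster(name="golden_env_mixed_1"),
    template=types.Template(name="golden_template_default_rhel8.0"),
))
vm_service = vms.vm_service(vm.id)
vm_service.start()

# Steps 5-8: add a 1 GiB thin (sparse, cow) disk on iscsi_1 and attach it.
disk = disks.add(types.Disk(
    name="disk_TestCase5063",
    format=types.DiskFormat.COW,
    sparse=True,
    provisioned_size=1 * GiB,
    storage_domains=[types.StorageDomain(name="iscsi_1")],
))
vm_service.disk_attachments_service().add(types.DiskAttachment(
    disk=types.Disk(id=disk.id),
    interface=types.DiskInterface.VIRTIO,
    bootable=False,
    active=True,
))

# Steps 9-11: resize the attached disk to 2 GiB; the dd in step 12 then has to
# cross the 1 GiB threshold and trigger an online LV extension.
disks.disk_service(disk.id).update(types.Disk(provisioned_size=2 * GiB))

conn.close()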

=> Step  12 (Send IOs to disk):  
On the resized disk we create a partition, a file system and a mount point, and at about 20:00:46 (17:00:46 local time) we start a dd (Send IOs to disk) of 1358 MB to the disk to make it extend:
2019-03-30 20:00:46,034 - MainThread - RemoteExecutor - DEBUG - [root.17.139/qum5net] Executing: /bin/dd bs=1M count=1358 if=/dev/vda of=/mount-point_3020004311/written_test_storage status=none

-> RIGHT AFTER THAT THE ISSUE (iothread1 pipe failed + VM pause due to no space) OCCURS:

journalctl log:
Mar 30 20:00:54 lynx18.lab.eng.tlv2.redhat.com kernel: Core dump to |/usr/libexec/abrt-hook-ccpp 6 18446744073709551615 114797 107 107 1553965254 IO iothread1 114804 pipe failed
Mar 30 20:00:54 lynx18.lab.eng.tlv2.redhat.com kvm[116765]: 0 guests now active
Mar 30 20:00:54 lynx18.lab.eng.tlv2.redhat.com kernel: ovirtmgmt: port 2(vnet0) entered disabled state
Mar 30 20:00:54 lynx18.lab.eng.tlv2.redhat.com kernel: device vnet0 left promiscuous mode
Mar 30 20:00:54 lynx18.lab.eng.tlv2.redhat.com kernel: ovirtmgmt: port 2(vnet0) entered disabled state
Mar 30 20:00:54 lynx18.lab.eng.tlv2.redhat.com NetworkManager[6547]: <info>  [1553965254.4229] device (vnet0): state change: disconnected -> unmanaged (reason 'unmanaged', sys-iface-state: 'removed')
Mar 30 20:00:54 lynx18.lab.eng.tlv2.redhat.com NetworkManager[6547]: <info>  [1553965254.4233] device (vnet0): released from master device ovirtmgmt
Mar 30 20:00:54 lynx18.lab.eng.tlv2.redhat.com lldpad[85234]: recvfrom(Event interface): No buffer space available
Mar 30 20:00:54 lynx18.lab.eng.tlv2.redhat.com systemd-machined[114798]: Machine qemu-1-vmTestCase50633019544296 terminated.
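
Side note on the first journalctl line: the '|/usr/libexec/abrt-hook-ccpp ...' target is the kernel's core_pattern pipe handler, and 'pipe failed' is the kernel reporting that it could not pipe the iothread1 core to that handler, so a core file was most likely not collected. A trivial (hypothetical) check on the hypervisor:

# Hypothetical quick check on the host: print the core handler the kernel
# tried to pipe the qemu iothread core to ("|..." is the pipe syntax).
with open("/proc/sys/kernel/core_pattern") as f:
    print(f.read().strip())
# Expected: a line starting with |/usr/libexec/abrt-hook-ccpp followed by
# format specifiers (pid, uid, timestamp, ...), matching the journalctl entry.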


Engine:
2019-03-30 20:00:52,403+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-7) [] EVENT_ID: VM_PAUSED_ENOSPC(138), VM vm_TestCase5063_3019544296 has been paused due to no Storage space error.
2019-03-30 20:00:56,432+03 INFO  [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (EE-ManagedThreadFactory-engineScheduled-Thread-95) [] VM '2e0eed87-5b81-4c2f-a439-c9fdd1b036c0'(vm_TestCase5063_3019544296) moved from 'Paused' --> 'Up'
2019-03-30 20:00:56,472+03 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-95) [] EVENT_ID: VM_RECOVERED_FROM_PAUSE_ERROR(196), VM vm_TestCase5063_3019544296 has recovered from paused back to up.

VDSM:
2019-03-30 20:00:51,826+0300 DEBUG (mpathlistener) [storage.udev] Received udev event (action=change, device=Device(u'/sys/devices/virtual/block/dm-13')) (udev:218)
2019-03-30 20:00:52,287+0300 INFO  (libvirt/events) [virt.vm] (vmId='2e0eed87-5b81-4c2f-a439-c9fdd1b036c0') abnormal vm stop device ua-12d0b157-1409-4c8b-987b-06cc22c964b6 error enospc (vm:5051)
2019-03-30 20:00:52,288+0300 INFO  (libvirt/events) [virt.vm] (vmId='2e0eed87-5b81-4c2f-a439-c9fdd1b036c0') CPU stopped: onIOError (vm:6029)
2019-03-30 20:00:52,304+0300 INFO  (libvirt/events) [virt.vm] (vmId='2e0eed87-5b81-4c2f-a439-c9fdd1b036c0') Requesting extension for volume c8741dc2-72a9-4941-a4b5-3be664b05519 on domain 0d8d55a2-0e4e-43a2-b179-6530b1571ad0 (apparent: 1073741824, capacity: 2147483648, allocated: 1075118080, physical: 1073741824 threshold_state: exceeded) (vm:1262)

2019-03-30 20:00:52,320+0300 INFO  (libvirt/events) [virt.vm] (vmId='2e0eed87-5b81-4c2f-a439-c9fdd1b036c0') CPU stopped: onSuspend (vm:6029)
2019-03-30 20:00:52,323+0300 WARN  (libvirt/events) [virt.vm] (vmId='2e0eed87-5b81-4c2f-a439-c9fdd1b036c0') device vdb out of space (vm:3970)
2019-03-30 20:00:52,324+0300 INFO  (libvirt/events) [virt.vm] (vmId='2e0eed87-5b81-4c2f-a439-c9fdd1b036c0') abnormal vm stop device ua-12d0b157-1409-4c8b-987b-06cc22c964b6 error enospc (vm:5051)
2019-03-30 20:00:52,324+0300 INFO  (libvirt/events) [virt.vm] (vmId='2e0eed87-5b81-4c2f-a439-c9fdd1b036c0') CPU stopped: onIOError (vm:6029)
2019-03-30 20:00:52,328+0300 INFO  (libvirt/events) [virt.vm] (vmId='2e0eed87-5b81-4c2f-a439-c9fdd1b036c0') Requesting extension for volume c8741dc2-72a9-4941-a4b5-3be664b05519 on domain 0d8d55a2-0e4e-43a2-b179-6530b1571ad0 (apparent: 1073741824, capacity: 2147483648, allocated: 1075118080, physical: 1073741824 threshold_state: exceeded) (vm:1262)
2019-03-30 20:00:52,329+0300 DEBUG (libvirt/events) [storage.TaskManager.Task] (Task='26d09cc3-c9b3-419c-a48a-54bb44d3f1c4') moving from state finished -> state preparing (task:602)
2019-03-30 20:00:52,329+0300 INFO  (libvirt/events) [vdsm.api] START sendExtendMsg(spUUID='85ee87fb-fc8c-4e1e-8c53-9b26857c8e03', volDict={'newSize': 2147483648, 'domainID': '0d8d55a2-0e4e-43a2-b179-6530b1571ad0', 'name': 'vdb', 'poolID': '85ee87fb-fc8c-4e1e-8c53-9b26857c8e03', 'clock': <Clock(total=0.00*, extend-volume=0.00*)>, 'internal': False, 'volumeID': 'c8741dc2-72a9-4941-a4b5-3be664b05519', 'imageID': '12d0b157-1409-4c8b-987b-06cc22c964b6'}, newSize=2147483648, callbackFunc=<bound method Vm.__afterVolumeExtension of <vdsm.virt.vm.Vm object at 0x7f67d40cd550>>) from=internal, task_id=26d09cc3-c9b3-419c-a48a-54bb44d3f1c4 (api:48)

Comment 6 Avihai 2019-03-30 18:26:27 UTC
Created attachment 1549893 [details]
4.3.3.1 logs

Comment 7 Michal Skrivanek 2021-08-20 08:27:36 UTC
This bug/RFE is more than 2 years old, has not received enough attention so far, and is now flagged as pending close.
Please review whether it is still relevant and provide additional details/justification/patches if you believe it should get more attention for the next oVirt release.

Comment 8 Michal Skrivanek 2021-09-29 11:33:06 UTC
This bug didn't get any attention in a long time, and it is not planned for the foreseeable future. The oVirt development team has no plans to work on it.
Please feel free to reopen if you have a plan for how to contribute this feature/bug fix.

