Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1158140

Summary: [BLOCKED] Couldn't resume guest from EIO: Error in parsing vm pause status. Setting value to NONE
Product: Red Hat Enterprise Virtualization Manager Reporter: Elad <ebenahar>
Component: ovirt-engine Assignee: Nir Soffer <nsoffer>
Status: CLOSED CURRENTRELEASE QA Contact: Elad <ebenahar>
Severity: high Docs Contact:
Priority: unspecified    
Version: 3.4.3 CC: acanan, amureini, danken, ebenahar, ecohen, eedri, gklein, iheim, kwolf, lpeer, lsurette, lsvaty, michal.skrivanek, mprivozn, nsoffer, rbalakri, Rhev-m-bugs, scohen, tnisan, yeylon
Target Milestone: ---   
Target Release: 3.4.4   
Hardware: ppc64   
OS: Unspecified   
Whiteboard: storage
Fixed In Version: vdsm-4.14.17-2 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2014-12-07 12:25:04 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Storage RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1122979    
Attachments:
Description Flags
vdsm, libvirt, qemu and engine logs
none
resume none

Description Elad 2014-10-28 17:04:08 UTC
Created attachment 951460 [details]
vdsm, libvirt, qemu and engine logs

Description of problem:
Tried to resume a VM from paused state due to EIO. The operation failed on the engine.
I am using an IBM PPC host, but I'm not sure that is related, since the error occurred in the engine.


Version-Release number of selected component (if applicable):
rhevm-3.4.3-1.2.el6ev.noarch
RHEV for IBM POWER release 3.4 build 37 service (pkvm2_1)
vdsm-4.14.17-1.pkvm2_1.ppc64
libvirt-1.1.3-1.pkvm2_1.17.10.ppc64
qemu-1.6.0-2.pkvm2_1.17.9.ppc64
qemu-kvm-1.6.0-2.pkvm2_1.17.9.ppc64

How reproducible:
Unknown

Steps to Reproduce:
On a shared DC with 4 FC domains:
1. Get to a situation in which a VM enters the paused state (can be reached by unmapping the LUN from the host via LUN masking on the storage server).
2. Once the VM is paused, try to resume it (start it)


Actual results:
Operation fails on engine:

2014-10-28 17:49:41,360 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerObjectsBuilder] (DefaultQuartzScheduler_Worker-96) [29e060bd] Error in parsing vm pause status. Setting value to NONE
2014-10-28 17:49:41,360 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerObjectsBuilder] (DefaultQuartzScheduler_Worker-96) [29e060bd] Error in parsing vm pause status. Setting value to NONE
2014-10-28 17:49:58,144 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerObjectsBuilder] (DefaultQuartzScheduler_Worker-36) [75e29cc7] Error in parsing vm pause status. Setting value to NONE
2014-10-28 17:49:58,144 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerObjectsBuilder] (DefaultQuartzScheduler_Worker-36) [75e29cc7] Error in parsing vm pause status. Setting value to NONE


Expected results:
Manually resume from paused should succeed.

Additional info: vdsm, libvirt, qemu and engine logs

Comment 1 Allon Mureinik 2014-10-28 21:05:28 UTC
Nir, Tal, don't we have another BZ with the same symptoms?

Comment 2 Allon Mureinik 2014-10-28 21:07:00 UTC
(In reply to Allon Mureinik from comment #1)
> Nir, Tal, don't we have another BZ with the same symptoms?

Michal, actually, didn't one of your guys encounter something similar?

Comment 3 Michal Skrivanek 2014-10-28 21:18:28 UTC
Was the selinux workaround applied?

Allon, well, there are plenty of errors in the log: ssl, selinux on qemu start... but this is resume, so I wonder...

Comment 4 Michal Skrivanek 2014-10-28 21:19:43 UTC
Also, any selinux access denials?

Comment 5 Nir Soffer 2014-10-28 21:35:22 UTC
Elad, please check both when the host is using jsonrpc and when it is using xmlrpc - does the error appear in both cases?

Comment 6 Nir Soffer 2014-10-28 21:37:05 UTC
Elad, my question is not relevant to 3.4.3, please ignore it.

Comment 7 Nir Soffer 2014-10-28 21:49:21 UTC
Elad, when the vm is paused, what is the selinux label on the disk backing the lv?

1. Get the lv name from vdsm log   
2. Share the output of
   ls -Z `readlink /dev/vgname/lvname`

Comment 8 Dan Kenigsberg 2014-10-28 22:15:14 UTC
Vdsm handles a libvirt.VIR_DOMAIN_EVENT_ID_IO_ERROR_REASON, but the "reason" arg that is propagated from libvirt is empty (it should be eio/enospc/eother):

libvirtEventLoop::INFO::2014-10-28 16:48:18,682::vm::4602::vm.Vm::(_onIOError) vmId=`23582146-6d8b-4856-ac6f-dc09250ffdb8`::abnormal vm stop device virtio-disk0 error 

2014-10-28 16:27:21.530+0000: 36899: debug : qemuMonitorIOProcess:393 : QEMU_MONITOR_IO_PROCESS: mon=0x3fff80017140 buf={"timestamp": {"seconds": 1414513641, "microseconds": 530527}, "event": "BLOCK_IO_ERROR", "data": {"device": "drive-virtio-disk0", "operation": "write", "action": "stop"}}
 len=173
2014-10-28 16:27:21.530+0000: 36899: debug : qemuMonitorEmitIOError:1242 : mon=0x3fff80017140
2014-10-28 16:27:21.530+0000: 36899: debug : qemuProcessHandleIOError:938 : Transitioned guest vm_fc_04 to paused state due to IO error
2014-10-28 16:27:21.531+0000: 36899: debug : qemuProcessHandleIOError:948 : Preserving lock state '(null)'
2014-10-28 16:27:21.531+0000: 36899: debug : virDomainFree:2413 : dom=0x1002010d390, (VM: name=vm_fc_04, uuid=23582146-6d8b-4856-ac6f-dc09250ffdb8)
2014-10-28 16:27:21.531+0000: 36899: debug : virDomainFree:2413 : dom=0x1002010d390, (VM: name=vm_fc_04, uuid=23582146-6d8b-4856-ac6f-dc09250ffdb8)
2014-10-28 16:27:21.559+0000: 36899: debug : qemuMonitorIOProcess:393 : QEMU_MONITOR_IO_PROCESS: mon=0x3fff80017140 buf={"timestamp": {"seconds": 1414513641, "microseconds": 559809}, "event": "STOP"}
 len=81
2014-10-28 16:27:21.559+0000: 36899: debug : qemuMonitorEmitStop:1190 : mon=0x3fff80017140

Michal, is this a libvirt, or a qemu problem?

Comment 9 Michal Privoznik 2014-10-28 22:32:11 UTC
(In reply to Dan Kenigsberg from comment #8)
> Vdsm handles a libvirt.VIR_DOMAIN_EVENT_ID_IO_ERROR_REASON, but the "reason"
> arg that is propagated from libvirt is empty (it should be
> eio/enospc/eother):
> 
> libvirtEventLoop::INFO::2014-10-28
> 16:48:18,682::vm::4602::vm.Vm::(_onIOError)
> vmId=`23582146-6d8b-4856-ac6f-dc09250ffdb8`::abnormal vm stop device
> virtio-disk0 error 
> 
> 2014-10-28 16:27:21.530+0000: 36899: debug : qemuMonitorIOProcess:393 :
> QEMU_MONITOR_IO_PROCESS: mon=0x3fff80017140 buf={"timestamp": {"seconds":
> 1414513641, "microseconds": 530527}, "event": "BLOCK_IO_ERROR", "data":
> {"device": "drive-virtio-disk0", "operation": "write", "action": "stop"}}

I can't see any reason here...

>  len=173
> 2014-10-28 16:27:21.530+0000: 36899: debug : qemuMonitorEmitIOError:1242 :
> mon=0x3fff80017140
> 2014-10-28 16:27:21.530+0000: 36899: debug : qemuProcessHandleIOError:938 :
> Transitioned guest vm_fc_04 to paused state due to IO error
> 2014-10-28 16:27:21.531+0000: 36899: debug : qemuProcessHandleIOError:948 :
> Preserving lock state '(null)'
> 2014-10-28 16:27:21.531+0000: 36899: debug : virDomainFree:2413 :
> dom=0x1002010d390, (VM: name=vm_fc_04,
> uuid=23582146-6d8b-4856-ac6f-dc09250ffdb8)
> 2014-10-28 16:27:21.531+0000: 36899: debug : virDomainFree:2413 :
> dom=0x1002010d390, (VM: name=vm_fc_04,
> uuid=23582146-6d8b-4856-ac6f-dc09250ffdb8)
> 2014-10-28 16:27:21.559+0000: 36899: debug : qemuMonitorIOProcess:393 :
> QEMU_MONITOR_IO_PROCESS: mon=0x3fff80017140 buf={"timestamp": {"seconds":
> 1414513641, "microseconds": 559809}, "event": "STOP"}
>  len=81
> 2014-10-28 16:27:21.559+0000: 36899: debug : qemuMonitorEmitStop:1190 :
> mon=0x3fff80017140
> 
> Michal, is this a libvirt, or a qemu problem?

... that's why I think this is a qemu problem. It should have reported why the write() failed.

Comment 10 Nir Soffer 2014-10-28 22:34:29 UTC
Returning the need info for Elad - please see comment 7.

Comment 11 Dan Kenigsberg 2014-10-29 08:44:14 UTC
Kevin, any idea why EIO/EPERM/whatever was not reported by qemu (see comment 9)? Could it be an old known bug in qemu?

Comment 12 Elad 2014-10-29 08:44:39 UTC
(In reply to Nir Soffer from comment #10)
> Returning the need info for Elad - please see comment 7.

Did it with another paused VM:

[root@ibm-p8-rhevm-hv-02 c227cd1f-cddd-42b2-bad0-81290e86348c]# ls -Z readlink /dev/e63d5654-5766-4ff1-ad0b-019de990b476/c20855f9-87ca-475c-abfa-d563c3c88dbc                                                                                                                                                                                                                       
lrwxrwxrwx. root root system_u:object_r:device_t:s0    /dev/e63d5654-5766-4ff1-ad0b-019de990b476/c20855f9-87ca-475c-abfa-d563c3c88dbc -> ../dm-24

Comment 13 Michal Skrivanek 2014-10-29 09:10:02 UTC
Elad, I suppose this has nothing to do with selinux, i.e. that it doesn't work in permissive or disabled mode either.

Comment 14 Kevin Wolf 2014-10-29 09:17:21 UTC
(In reply to Dan Kenigsberg from comment #11)
> Kevin, any idea why EIO/EPERM/whatever was not reported by qemu (see comment
> 9)? Could it be an old known bug in qemu?

(In reply to Elad from comment #0)
> qemu-1.6.0-2.pkvm2_1.17.9.ppc64
> qemu-kvm-1.6.0-2.pkvm2_1.17.9.ppc64

Not our qemu-kvm. Upstream doesn't have the reason, it's a RHEL-specific field.
Upstream qemu 2.2 will introduce a field like this, but with a different name,
so that will require a newer libvirt version, too.

In short: This is not expected to work without a port of the RHEL-specific patch.

Comment 15 Elad 2014-10-29 09:29:02 UTC
[root@ibm-p8-rhevm-04 ~]# getenforce 
Enforcing

Comment 16 Nir Soffer 2014-10-29 10:55:09 UTC
(In reply to Elad from comment #12)
> (In reply to Nir Soffer from comment #10)
> > Returning the need info for Elad - please see comment 7.
> 
> Done it with another paused VM:
> 
> [root@ibm-p8-rhevm-hv-02 c227cd1f-cddd-42b2-bad0-81290e86348c]# ls -Z
> readlink
> /dev/e63d5654-5766-4ff1-ad0b-019de990b476/c20855f9-87ca-475c-abfa-
> d563c3c88dbc                                                                
> 
> lrwxrwxrwx. root root system_u:object_r:device_t:s0   
> /dev/e63d5654-5766-4ff1-ad0b-019de990b476/c20855f9-87ca-475c-abfa-
> d563c3c88dbc -> ../dm-24

Elad, here are the corrected instructions:

1. Find the vg/lv names in vdsm log, or the /rhev/data-center link
2. Find the device
   ls -l /dev/vgname/lvname
3. Check the device labels
   ls -Z /dev/dm-24
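
Taken together, the three steps above amount to the following small sketch. The real /dev/vgname/lvname path only exists on the hypervisor, so this version uses a throwaway symlink as a stand-in; the names are placeholders, not taken from this bug's environment.

```shell
#!/bin/sh
# Sketch of the three steps above. A throwaway symlink stands in for
# the real /dev/vgname/lvname path, which only exists on the host.
set -e
tmp=$(mktemp -d)
touch "$tmp/dm-24"                # stand-in for the real /dev/dm-24
ln -s "$tmp/dm-24" "$tmp/lvname"  # stand-in for /dev/vgname/lvname

# Step 2: resolve the symlink to the underlying device node
dev=$(readlink -f "$tmp/lvname")
echo "device: $dev"

# Step 3: on the real host you would now check the label of the
# device itself (not of the symlink):
#   ls -Z "$dev"
rm -rf "$tmp"
```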

Comment 17 Michal Skrivanek 2014-10-29 10:59:50 UTC
(In reply to Elad from comment #15)
> [root@ibm-p8-rhevm-04 ~]# getenforce 
> Enforcing

Yes, and I'm asking whether you get the same result in disabled/permissive mode. You should.
This bug should not depend on selinux at all.

Comment 18 Elad 2014-10-29 11:13:40 UTC
Created attachment 951752 [details]
resume

(In reply to Michal Skrivanek from comment #17)
> (In reply to Elad from comment #15)
> > [root@ibm-p8-rhevm-04 ~]# getenforce 
> > Enforcing
> 
> yes, and I'm asking if you get the same result in disabled/permissive. You
> should.
> This bug should not depend on any selinux

Actually, after moving to permissive, resume VM from paused succeeded.


2014-10-29 12:09:21,704 INFO  [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-65) [7b26ca0f] VM vm_fc_01 eb74bf73-8e92-46d9-9586-09f88f9aa323 moved from Paused --> Up


Attaching the logs

Comment 20 Michal Skrivanek 2014-10-29 11:33:09 UTC
Elad, this is on FC. If you do the same with NFS, you should see the same errors in the logs about the pause reason.
But the resume should work just fine.

Can you confirm this?

Comment 23 Elad 2014-10-29 12:03:57 UTC
Anyway, as I wrote in comment #18, it seems that in permissive mode resuming the guest works fine.

Comment 24 Elad 2014-10-29 13:20:06 UTC
(In reply to Michal Skrivanek from comment #20)
> Elad, this is on FC. If you do the same with NFS, then you should see the
> same errors in logs about paused reason.
> But the resume should work just fine.
> 
> Can you confirm this?

I can't get to a situation in which the VM enters the paused state due to EIO on NFS. The VM enters 'not-responding' but not paused.

Comment 25 Michal Skrivanek 2014-10-29 13:25:27 UTC
It doesn't have to be EIO; any paused state is OK. Just pause it with virsh.

Comment 27 Nir Soffer 2014-10-29 13:49:52 UTC
This looks like a duplicate of bug 1150015 which is fixed in rhev 3.4.3-1 (build av12.4).

However without the information requested in comment 16 we cannot tell.

Comment 28 Aharon Canan 2014-10-29 13:53:49 UTC
You can use the setup to get the info (ask Elad via IRC for it). Sorry.

Comment 29 Michal Skrivanek 2014-10-29 14:03:11 UTC
Nir, yes; however, contrary to the comment there, we don't have the patch on PPC.

----
the reported reason is not relevant anymore, as we ship qemu-kvm-rhev for all platforms.
There is no report of this issue in any known environment
- hence decreasing the urgency
----

Comment 30 Michal Skrivanek 2014-10-29 14:13:20 UTC
the difference between qemu-kvm upstream and qemu-kvm-rhev:

The reason field was added in commit 771a3a33 ('error reason in BLOCK_IO_ERROR / BLOCK_JOB_ERROR events (RHEL 6->7 fwd)'), and there is the related commit bfea65d6 ('improve debuggability of BLOCK_IO_ERROR / BLOCK_JOB_ERROR (RHEL 6->7 fwd)'). Each commit corresponds to a patch in the SRPM named after the subject line of the commit.

Comment 31 Elad 2014-10-29 14:18:25 UTC
(In reply to Nir Soffer from comment #16)
> (In reply to Elad from comment #12)
> > (In reply to Nir Soffer from comment #10)
> > > Returning the need info for Elad - please see comment 7.
> > 
> > Done it with another paused VM:
> > 
> > [root@ibm-p8-rhevm-hv-02 c227cd1f-cddd-42b2-bad0-81290e86348c]# ls -Z
> > readlink
> > /dev/e63d5654-5766-4ff1-ad0b-019de990b476/c20855f9-87ca-475c-abfa-
> > d563c3c88dbc                                                                
> > 
> > lrwxrwxrwx. root root system_u:object_r:device_t:s0   
> > /dev/e63d5654-5766-4ff1-ad0b-019de990b476/c20855f9-87ca-475c-abfa-
> > d563c3c88dbc -> ../dm-24
> 
> Elad here are corrected instructions:
> 
> 1. Find the vg/lv names in vdsm log, or the /rhev/data-center link
> 2. Find the device
>    ls -l /dev/vgname/lvname
> 3. Check the device labels
>    ls -Z /dev/dm-24

[root@ibm-p8-rhevm-04 3f38e4dc-a330-4615-8d1c-87889a2a47ba]# ls -l 
total 0
lrwxrwxrwx. 1 vdsm kvm 78 Oct 28 17:05 05d0359f-1cea-41d9-b559-c3b88dd3a0ad -> /dev/e4556645-98aa-44e5-834b-bfa49279cbc2/05d0359f-1cea-41d9-b559-c3b88dd3a0ad
lrwxrwxrwx. 1 vdsm kvm 78 Oct 28 13:57 41e54cb2-72b4-4f43-8591-98c5b22842ea -> /dev/e4556645-98aa-44e5-834b-bfa49279cbc2/41e54cb2-72b4-4f43-8591-98c5b22842ea
lrwxrwxrwx. 1 vdsm kvm 78 Oct 28 17:08 8866cf8a-4ba9-46b4-8347-1b81ce99a086 -> /dev/e4556645-98aa-44e5-834b-bfa49279cbc2/8866cf8a-4ba9-46b4-8347-1b81ce99a086
lrwxrwxrwx. 1 vdsm kvm 78 Oct 29 11:08 9f5f94df-e165-4661-ab4c-1cf75ab337e4 -> /dev/e4556645-98aa-44e5-834b-bfa49279cbc2/9f5f94df-e165-4661-ab4c-1cf75ab337e4
lrwxrwxrwx. 1 vdsm kvm 78 Oct 28 15:26 d0633214-b613-4015-893b-b13c79d26ead -> /dev/e4556645-98aa-44e5-834b-bfa49279cbc2/d0633214-b613-4015-893b-b13c79d26ead
[root@ibm-p8-rhevm-04 3f38e4dc-a330-4615-8d1c-87889a2a47ba]# ls -Z /dev/e4556645-98aa-44e5-834b-bfa49279cbc2/
lrwxrwxrwx. root root system_u:object_r:device_t:s0    05d0359f-1cea-41d9-b559-c3b88dd3a0ad -> ../dm-35
lrwxrwxrwx. root root system_u:object_r:device_t:s0    41e54cb2-72b4-4f43-8591-98c5b22842ea -> ../dm-21
lrwxrwxrwx. root root system_u:object_r:device_t:s0    8866cf8a-4ba9-46b4-8347-1b81ce99a086 -> ../dm-36
lrwxrwxrwx. root root system_u:object_r:device_t:s0    9f5f94df-e165-4661-ab4c-1cf75ab337e4 -> ../dm-41
lrwxrwxrwx. root root system_u:object_r:device_t:s0    d0633214-b613-4015-893b-b13c79d26ead -> ../dm-34
lrwxrwxrwx. root root system_u:object_r:device_t:s0    ids -> ../dm-12
lrwxrwxrwx. root root system_u:object_r:device_t:s0    inbox -> ../dm-13
lrwxrwxrwx. root root system_u:object_r:device_t:s0    leases -> ../dm-11
lrwxrwxrwx. root root system_u:object_r:device_t:s0    master -> ../dm-14
lrwxrwxrwx. root root system_u:object_r:device_t:s0    metadata -> ../dm-9
lrwxrwxrwx. root root system_u:object_r:device_t:s0    outbox -> ../dm-10

Comment 32 Nir Soffer 2014-10-29 14:40:36 UTC
Elad, you keep repeating the same thing, which is *not* what I asked - we need to run:

    ls -Z /dev/dm-xx

Not:

    ls -Z /rhev/data-center/...

which gives the selinux label of the symbolic link. We need the selinux label of the device.
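
The distinction can be seen directly with ls: -Z on a symlink reports the link's own label, while adding -L (or resolving the link first) reports the target's label. A minimal demo using a throwaway symlink, so it runs off the hypervisor (on a non-SELinux box the context column just shows "?"):

```shell
#!/bin/sh
# -Z alone labels the symlink; -LZ dereferences it and labels the
# target - which is what is actually needed for the device node.
set -e
tmp=$(mktemp -d)
touch "$tmp/dm-24"        # stand-in for the device node
ln -s dm-24 "$tmp/link"   # stand-in for the /dev/vgname/lvname link

link_label=$(ls -Z "$tmp/link")    # label of the symlink itself
dev_label=$(ls -LZ "$tmp/link")    # label of the target
echo "link: $link_label"
echo "dev : $dev_label"
rm -rf "$tmp"
```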

Comment 34 Nir Soffer 2014-10-29 16:31:26 UTC
Checking the host with the paused vm.

1. In engine, we see one disk with uuid 3f38e4dc-a330-4615-8d1c-87889a2a47ba

2. Using image uuid, we can get the selinux label on the devices backing the image volumes:

# for vol in /rhev/data-center/*/*/images/3f38e4dc-a330-4615-8d1c-87889a2a47ba/*; do ls -Z $(realpath $vol); done

Note: this command will not work on RHEL 6 because realpath is not available there.

brw-rw----. vdsm qemu system_u:object_r:virt_content_t:s0 /dev/dm-35
brw-rw----. vdsm qemu system_u:object_r:virt_content_t:s0 /dev/dm-21
brw-rw----. vdsm qemu system_u:object_r:fixed_disk_device_t:s0 /dev/dm-36
brw-rw----. vdsm qemu system_u:object_r:svirt_image_t:s0:c131,c471 /dev/dm-41
brw-rw----. vdsm qemu system_u:object_r:virt_content_t:s0 /dev/dm-34

- dm-35, dm-21 and dm-34 are internal volumes, virt_content_t:s0 is expected
- dm-41 is the active layer, svirt_image_t:s0:c131,c471 is expected
- dm-36 is an internal volume and should have virt_content_t, but it has
  fixed_disk_device_t:s0 - with this label the machine cannot read from
  this volume, which causes it to pause.

3. I manually changed the label on this device:
# chcon -t virt_content_t /dev/dm-36

4. And resumed the VM in the engine

So what we have here is probably a change event on this device, which
triggers 12-vdsm-lvm.rules, causing the device to lose the selinux
label assigned by libvirt and get the default selinux label. In other
words, a duplicate of bug 1150015.
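
On RHEL 6, where realpath is missing (see the note above), readlink -f from GNU coreutils can stand in for it. A hedged sketch of the label-checking loop from step 2, wrapped in a function; the image-directory argument shown in the comment is a placeholder:

```shell
#!/bin/sh
# RHEL-6-friendly variant of the label-checking loop above:
# readlink -f (GNU coreutils) replaces realpath, which RHEL 6
# does not ship.
label_backing_devices() {
    for vol in "$1"/*; do
        [ -e "$vol" ] || continue
        # ls -Z on the resolved path labels the device, not the link
        ls -Z "$(readlink -f "$vol")"
    done
}

# On the host you would call it with the image directory, e.g.:
#   label_backing_devices /rhev/data-center/*/*/images/<image-uuid>
```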

Comment 35 Allon Mureinik 2014-10-30 07:56:53 UTC
(In reply to Nir Soffer from comment #34)
> So what we have here is probably a change event on this device, which
> trigger 12-vdsm-lvm.rules, which causes the device to loose the selinux
> label assigned by libvirt, and get the default selinux label. In other
> words, a duplicate of bug 1150015.
Moving to ON_QA to be verified with the same patch.

Comment 36 Lukas Svaty 2014-10-30 12:14:32 UTC
Verified with the latest build_39.

Comment 39 Elad 2014-11-13 14:54:51 UTC
Will be tested again with the next build by getting to a scenario in which a VM is paused due to a no-space-left problem caused by extension issues, as reported here: https://bugzilla.redhat.com/show_bug.cgi?id=1160568. Then try to resume it.

Comment 40 Elad 2014-11-16 16:13:47 UTC
Checked it on RHEV 3.4.4 av13.1 (not ppc). The bug seems to be fixed.
Steps:
1) Created a VM with a thin-provisioned disk on an FC domain, whose size is larger than the free space on the domain.
2) Installed an OS and wrote to the disk with dd; after a few minutes the VM moved to paused due to a 'no space left' error. The VM resumed automatically after a few seconds:

2014-11-16 17:15:23,359 INFO  [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-58) VM vm-1 7cc95d3e-8dd0-42c3-a13c-32c21cc35fd0 moved from Up --> Paused
2014-11-16 17:15:23,379 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-58) Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM vm-1 has paused due to no Storage space error.
2014-11-16 17:15:26,591 INFO  [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-57) VM vm-1 7cc95d3e-8dd0-42c3-a13c-32c21cc35fd0 moved from Paused --> Up
2014-11-16 17:15:38,482 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-86) Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: Critical, Low disk space. fc1 domain has 4 GB of free space


Since this bug is ON_QA for 3.4.4 and it was reported for PPC, I'm not sure whether to move it to VERIFIED or to wait until a new PPC build is released.

Gil, what do you think?

Comment 41 Aharon Canan 2014-11-17 08:37:21 UTC
The PPC setup we used was upgraded; please give it a try and close this one.

Comment 42 Michal Skrivanek 2014-11-18 07:46:12 UTC
(In reply to Elad from comment #40)
> Checked it on RHEV 3.4.4 av13.1 (not ppc). The bug seems to be fixed.

this was never an issue on x86 due to qemu-kvm-rhev

Comment 43 Elad 2014-11-18 10:37:11 UTC
Michal, is there already a fix for the issue? If not, this bug should not be ON_QA.

Comment 45 Michal Skrivanek 2014-11-18 11:56:33 UTC
IBM_PowerKVM release 2.1.0 build 39 service (pkvm2_1) is fixed
IBM_PowerKVM release 2.1.1 build 25 service (pkvm2_1_1) is not yet fixed

Comment 46 Allon Mureinik 2014-11-18 18:45:09 UTC
(In reply to Michal Skrivanek from comment #45)
> IBM_PowerKVM release 2.1.0 build 39 service (pkvm2_1) is fixed
> IBM_PowerKVM release 2.1.1 build 25 service (pkvm2_1_1) is not yet fixed
So, just to be sure, we're waiting for a newer PowerKVM 2.1.* build?

Comment 47 Michal Skrivanek 2014-11-19 11:11:27 UTC
Yes. Nothing to fix here either way; moving to POST. (Well, the BZ release is wrong anyway, so refer to comment #45 for the actual bug status.)

Comment 49 Elad 2014-11-25 14:19:59 UTC
It seems to be fixed; the VM resumes automatically from 'no Storage space error':

2014-11-25 15:15:38,718 INFO  [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-62) [69b35ffc] VM elad-fc 9555f68b-2637-4dcb-9b9f-26aa3aa6eed2 moved from Up --> Paused
2014-11-25 15:15:38,793 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-62) [69b35ffc] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM elad-fc has paused due to no Storage space error.
2014-11-25 15:15:42,808 INFO  [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-8) [1c3bce8b] VM elad-fc 9555f68b-2637-4dcb-9b9f-26aa3aa6eed2 moved from Paused --> Up
2014-11-25 15:15:49,714 INFO  [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-76) [4a2a3bc4] VM elad-fc 9555f68b-2637-4dcb-9b9f-26aa3aa6eed2 moved from Up --> Paused
2014-11-25 15:15:49,747 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-76) [4a2a3bc4] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM elad-fc has paused due to no Storage space error.
2014-11-25 15:15:53,313 INFO  [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-72) [3ba0ab20] VM elad-fc 9555f68b-2637-4dcb-9b9f-26aa3aa6eed2 moved from Paused --> Up

Checked with FC.
Installed an OS in a VM with a thin-provisioned disk attached, wrote to the disk, and got a scenario in which the VM entered the paused state on a no-space-left error.

Moving to VERIFIED.
Verified using:
qemu-kvm-2.0.0-2.1.pkvm2_1_1.20.40.ppc64