Bug 1263383

Summary:

[ppc64le] VM is unresponsive after extending a secondary disk and filling it with dump data

Product:

[oVirt] ovirt-engine

Reporter:

Carlos Mestre González <cmestreg>

Component:

General

Assignee:

Martin Polednik <mpoledni>

Status:

CLOSED WORKSFORME

QA Contact:

Aharon Canan <acanan>

Severity:

urgent

Docs Contact:

Priority:

high

Version:

---

CC:

bugs, cmestreg, ecohen, gklein, hannsj_uhl, lsurette, michal.skrivanek, ofrenkel, rbalakri, yeylon

Target Milestone:

ovirt-3.6.0-ga

Flags:

ofrenkel: ovirt-3.6.0?
rule-engine: planning_ack?
rule-engine: devel_ack?
rule-engine: testing_ack?

Target Release:

3.6.0

Hardware:

ppc64le

OS:

Unspecified

Whiteboard:

virt

Fixed In Version:

Doc Type:

Bug Fix

Doc Text:

Story Points:

---

Clone Of:

Environment:

Last Closed:

2015-10-13 15:14:41 UTC

Type:

Bug

Regression:

---

Mount Type:

---

Documentation:

---

CRM:

Verified Versions:

Category:

---

oVirt Team:

Virt

RHEL 7.3 requirements from Atomic Host:

Cloudforms Team:

---

Target Upstream Version:

Embargoed:

Bug Depends On:

Bug Blocks:

1201513, 1277183, 1277184

Attachments:

Description	Flags
engine.log	none
host_mixed_1 vdsm log	none
hos_mixed_2 vdsm log (spm)	none
qemu log ono host_mixed_1	none
qemu log on host_mixed_2	none

Description Carlos Mestre González 2015-09-15 17:18:01 UTC

Description of problem:
I don't know how to properly assess this issue, so any help investigating it further would be appreciated.

I'm testing (live) resizing of disks for oVirt hosts on PPC, but I think the issue starts earlier (after creating and editing the disks). After the steps to reproduce below, the VM becomes completely unresponsive: I cannot ssh to it, cannot connect via VNC, and cannot shut it down.

I'm adding this bz to rhev for further investigation.

Version-Release number of selected component (if applicable):
rhevm-3.6.0-0.15.master.el6.noarch

RHEL PPC hosts (machines are IBM POWER 8):
vdsm-4.17.6-1.el7ev.noarch
qemu-kvm-tools-rhev-2.3.0-22.el7.ppc64le
qemu-kvm-common-rhev-2.3.0-22.el7.ppc64le
ipxe-roms-qemu-20130517-7.gitc4bce43.el7.noarch
libvirt-daemon-driver-qemu-1.2.17-8.el7.ppc64le
qemu-img-rhev-2.3.0-22.el7.ppc64le
qemu-kvm-rhev-2.3.0-22.el7.ppc64le
libvirt-client-1.2.17-8.el7.ppc64le

How reproducible:
50%

Steps to Reproduce: (all this via REST API)
1. Create a VM from a template, with type server, VNC display and os_type rhel7ppc64 (these are requirements I have because of workarounds on PPC for the REST API). The disk should be cow/sparse/virtio on iSCSI.
2. Edit the boot device to change its name.
3. Start the VM, then stop it.
4. Add a new 1 GB disk to the VM (can be any combination, for example RAW/virtio on iSCSI).
5. Start the VM.
6. Edit the disk and extend its size by another 1 GB.
7. Access the VM via ssh and fill the new disk with data (I normally dd from urandom).
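
For anyone without dd at hand, step 7 can be approximated with a short Python sketch. This is only an illustration of the fill, not the script used in the test suite: the function name `fill_with_random` and the chunk size are made up here, and the target path is whatever the new disk shows up as inside the guest (a hypothetical example would be /dev/vdb). The demo at the bottom writes to a temporary file rather than a real device.

```python
import os
import tempfile


def fill_with_random(path, total_bytes, chunk_size=1024 * 1024):
    """Write total_bytes of random data to path, chunk_size bytes at a time.

    Roughly what `dd if=/dev/urandom of=<path>` does. The function name and
    the default chunk size are illustrative choices, not taken from the bug.
    """
    written = 0
    with open(path, "wb") as target:
        while written < total_bytes:
            # Never write past the requested size on the last chunk.
            chunk = os.urandom(min(chunk_size, total_bytes - written))
            target.write(chunk)
            written += len(chunk)
        # Push the data out of the Python buffer and the page cache.
        target.flush()
        os.fsync(target.fileno())
    return written


if __name__ == "__main__":
    # Demo against a temporary file so nothing important is overwritten.
    with tempfile.NamedTemporaryFile(delete=False) as handle:
        demo_path = handle.name
    print(fill_with_random(demo_path, 4 * 1024 * 1024))
    os.remove(demo_path)
```

To fill the extended disk from the steps above, the size passed in would be the full 2 GB (the initial 1 GB plus the 1 GB extension) and the path would be the guest device, which requires root inside the VM.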


Actual results:
The dd succeeds, but soon afterwards the VM becomes unresponsive: I cannot ssh to it again, connect via VNC, or shut it down (I had to power it off). If I start the VM again, the same thing happens.

I've also seen this issue after only extending the disk, but that is rarer.

The only error I could find anywhere is this one in the engine.log:

2015-09-15 19:18:40,756 INFO  [org.ovirt.engine.core.vdsbroker.VmsMonitoring] (DefaultQuartzScheduler_Worker-66) [] Received a vnc Device without an address when processing VM 0d80bd83-79f8-4c80-a6c3-ef49897603e4 devices, skipping device: {specParams={displayIp=0}, deviceType=graphics, deviceId=96c352bf-e29f-4739-b3cf-1b016bf66f5e, device=vnc, type=graphics, port=5900}
2015-09-15 19:18:40,756 ERROR [org.ovirt.engine.core.vdsbroker.VmsMonitoring] (DefaultQuartzScheduler_Worker-66) [] VM '0d80bd83-79f8-4c80-a6c3-ef49897603e4' managed non pluggable device was removed unexpectedly from libvirt: 'VmDevice:{id='VmDeviceId:{deviceId='96c352bf-e29f-4739-b3cf-1b016bf66f5e', vmId='0d80bd83-79f8-4c80-a6c3-ef49897603e4'}', device='vnc', type='GRAPHICS', bootOrder='0', specParams='[]', address='', managed='true', plugged='false', readOnly='false', deviceAlias='', customProperties='[]', snapshotId='null', logicalName='null', usingScsiReservation='false'}'
2015-09-15 19:18:55,983 INFO  [org.ovirt.engine.core.vdsbroker.VmAnalyzer] (DefaultQuartzScheduler_Worker-58) [] VM '0d80bd83-79f8-4c80-a6c3-ef49897603e4'(virtual_disk_resize_iscsi) moved from 'WaitForLaunch' --> 'PoweringUp'

I don't know how to debug this further. If you could take a look at the logs and provide some insight, that would be great.
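
When scanning a large engine.log for a single VM, a small filter can help narrow things down. The following is a rough sketch that assumes every line has the same layout as the three lines quoted above (timestamp, level, logger in brackets, thread in parentheses, a bracketed field that is empty here, then the message). The function names and the `context` field name are made up for illustration; the two sample lines are copied from this report.

```python
import re
from datetime import datetime

# Layout assumed from the engine.log lines quoted in this report.
LINE_RE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<level>[A-Z]+)\s+"
    r"\[(?P<logger>[^\]]+)\] "
    r"\((?P<thread>[^)]+)\) "
    r"\[(?P<context>[^\]]*)\] "
    r"(?P<message>.*)$"
)

VM_ID = "0d80bd83-79f8-4c80-a6c3-ef49897603e4"

# Two lines from the report; the long one is split only for readability.
SAMPLE_LINES = [
    "2015-09-15 19:18:40,756 ERROR [org.ovirt.engine.core.vdsbroker.VmsMonitoring] "
    "(DefaultQuartzScheduler_Worker-66) [] VM '0d80bd83-79f8-4c80-a6c3-ef49897603e4' "
    "managed non pluggable device was removed unexpectedly from libvirt: "
    "'VmDevice:{id='VmDeviceId:{deviceId='96c352bf-e29f-4739-b3cf-1b016bf66f5e', "
    "vmId='0d80bd83-79f8-4c80-a6c3-ef49897603e4'}', device='vnc', type='GRAPHICS', "
    "bootOrder='0', specParams='[]', address='', managed='true', plugged='false', "
    "readOnly='false', deviceAlias='', customProperties='[]', snapshotId='null', "
    "logicalName='null', usingScsiReservation='false'}'",
    "2015-09-15 19:18:55,983 INFO  [org.ovirt.engine.core.vdsbroker.VmAnalyzer] "
    "(DefaultQuartzScheduler_Worker-58) [] VM '0d80bd83-79f8-4c80-a6c3-ef49897603e4'"
    "(virtual_disk_resize_iscsi) moved from 'WaitForLaunch' --> 'PoweringUp'",
]


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LINE_RE.match(line.rstrip("\n"))
    if match is None:
        return None
    entry = match.groupdict()
    entry["timestamp"] = datetime.strptime(entry["timestamp"], "%Y-%m-%d %H:%M:%S,%f")
    return entry


def problems_for_vm(lines, vm_id, levels=("WARN", "ERROR")):
    """Collect parsed entries at the given levels whose message mentions vm_id."""
    found = []
    for line in lines:
        entry = parse_line(line)
        if entry and entry["level"] in levels and vm_id in entry["message"]:
            found.append(entry)
    return found


if __name__ == "__main__":
    for hit in problems_for_vm(SAMPLE_LINES, VM_ID):
        print(hit["timestamp"], hit["level"], hit["logger"])
```

Lines that do not match the assumed layout (stack traces, wrapped messages) are simply skipped, so this is a first-pass filter rather than a complete parser.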


Additional info:

Comment 1 Carlos Mestre González 2015-09-15 17:19:04 UTC

Created attachment 1073738 [details]
engine.log

Comment 2 Omer Frenkel 2015-09-17 11:53:27 UTC

Can you please attach the relevant vdsm log?
Also, does this work well on an x86 setup?

Comment 3 Carlos Mestre González 2015-09-18 15:42:47 UTC

Yes, it works well on x86.

Comment 4 Carlos Mestre González 2015-09-18 15:44:31 UTC

Created attachment 1074980 [details]
host_mixed_1 vdsm log

This is the host where the VMs were being started.

Comment 5 Carlos Mestre González 2015-09-18 15:46:11 UTC

Created attachment 1074982 [details]
hos_mixed_2 vdsm log (spm)

In the end the VM ran on the SPM, which is this host.

Comment 6 Michal Skrivanek 2015-09-24 13:24:48 UTC

you say "I'm testing the (live) resizing of disks for ovirt hosts in PPC, but I think the issue starts before (after creating and editing the disks)"
so you can reproduce this by:
create a VM
start, stop
add a disk
start
?

Does filling the disk play any role? Could you reproduce it without adding any extra disk, just by filling the existing one with dd?
The qemu log of that VM may help.

Comment 7 Carlos Mestre González 2015-09-25 08:11:36 UTC

Created attachment 1076942 [details]
qemu log ono host_mixed_1

Comment 8 Carlos Mestre González 2015-09-25 08:12:16 UTC

Created attachment 1076943 [details]
qemu log on host_mixed_2

Comment 9 Carlos Mestre González 2015-09-25 08:15:01 UTC

Michal,

I said that because the error I posted earlier appeared after starting the VM with the disks attached, but I'm not sure that is where the issue actually starts.

Regarding the reproduction: our test suite has plenty of tests like that and this is the only one failing, so no. It probably has to do with the disk type, or with resizing the disk and filling it with data.

I'll try to reproduce when I have access to the ppc64 setup again and update you with more info.

Comment 10 Martin Polednik 2015-10-08 14:00:12 UTC

Any reproduction news?

There doesn't seem to be anything in the logs pointing to an issue, and I'm not sure what to focus on when reproducing. Does it still occur in light of Michal's comments?

Comment 11 Carlos Mestre González 2015-10-13 15:14:41 UTC

I can no longer reproduce this with the latest packages for PPC (I ran it multiple times with different interfaces and provisioning types). Packages used:

qemu-img-rhev-2.3.0-29.el7.ppc64le
qemu-kvm-rhev-2.3.0-29.el7.ppc64le
libvirt-client-1.2.17-12.el7.ppc64le
vdsm-4.17.8-1.el7ev.noarch

Closing it.
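
For reference, the difference between the package set where the problem was seen and the set where it no longer reproduces can be listed with a short sketch like the one below. It assumes the usual name-version-release.arch layout of the package strings in this report; the function names are made up for illustration, and it only reports what changed without attempting real RPM version ordering.

```python
# Packages from the original description (problem seen).
FAILING = [
    "vdsm-4.17.6-1.el7ev.noarch",
    "qemu-kvm-tools-rhev-2.3.0-22.el7.ppc64le",
    "qemu-kvm-common-rhev-2.3.0-22.el7.ppc64le",
    "ipxe-roms-qemu-20130517-7.gitc4bce43.el7.noarch",
    "libvirt-daemon-driver-qemu-1.2.17-8.el7.ppc64le",
    "qemu-img-rhev-2.3.0-22.el7.ppc64le",
    "qemu-kvm-rhev-2.3.0-22.el7.ppc64le",
    "libvirt-client-1.2.17-8.el7.ppc64le",
]

# Packages from comment 11 (problem not reproduced).
PASSING = [
    "qemu-img-rhev-2.3.0-29.el7.ppc64le",
    "qemu-kvm-rhev-2.3.0-29.el7.ppc64le",
    "libvirt-client-1.2.17-12.el7.ppc64le",
    "vdsm-4.17.8-1.el7ev.noarch",
]


def parse_nvra(package):
    """Split 'name-version-release.arch' into its four parts."""
    base, arch = package.rsplit(".", 1)
    name, version, release = base.rsplit("-", 2)
    return name, version, release, arch


def changed_packages(before, after):
    """Map package name to (old, new) 'version-release' for packages in both lists."""
    old = {}
    for package in before:
        name, version, release, _arch = parse_nvra(package)
        old[name] = (version, release)
    changes = {}
    for package in after:
        name, version, release, _arch = parse_nvra(package)
        if name in old and old[name] != (version, release):
            changes[name] = ("-".join(old[name]), f"{version}-{release}")
    return changes


if __name__ == "__main__":
    for name, (was, now) in sorted(changed_packages(FAILING, PASSING).items()):
        print(f"{name}: {was} -> {now}")
```

Packages that appear in only one of the two lists are left out, since comment 11 does not say which versions of them were installed.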