Bug 1263383 - [ppc64le] Vm is unresponsive after extending a secondary disk and fill it with dump data
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: General
Version: ---
Hardware: ppc64le
OS: Unspecified
Priority: high
Severity: urgent
Target Milestone: ovirt-3.6.0-ga
Target Release: 3.6.0
Assignee: Martin Polednik
QA Contact: Aharon Canan
URL:
Whiteboard: virt
Depends On:
Blocks: RHEV3.6PPC 1277183 1277184
 
Reported: 2015-09-15 17:18 UTC by Carlos Mestre González
Modified: 2016-02-21 13:30 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-10-13 15:14:41 UTC
oVirt Team: Virt
Embargoed:
ofrenkel: ovirt-3.6.0?
rule-engine: planning_ack?
rule-engine: devel_ack?
rule-engine: testing_ack?


Attachments
engine.log (243.26 KB, text/plain)
2015-09-15 17:19 UTC, Carlos Mestre González
host_mixed_1 vdsm log (693.48 KB, text/plain)
2015-09-18 15:44 UTC, Carlos Mestre González
host_mixed_2 vdsm log (spm) (2.23 MB, text/plain)
2015-09-18 15:46 UTC, Carlos Mestre González
qemu log on host_mixed_1 (13.09 KB, text/plain)
2015-09-25 08:11 UTC, Carlos Mestre González
qemu log on host_mixed_2 (2.86 KB, text/plain)
2015-09-25 08:12 UTC, Carlos Mestre González

Description Carlos Mestre González 2015-09-15 17:18:01 UTC
Description of problem:
I don't know how to properly assess this issue, so any help investigating it further would be great.

I'm testing the (live) resizing of disks for ovirt hosts in PPC, but I think the issue starts before (after creating and editing the disks). After the steps to reproduce listed here, the VM becomes completely unresponsive (I cannot ssh to it, cannot connect via VNC, and am not able to shut it down).

I'm adding this bz to rhev for further investigation.

Version-Release number of selected component (if applicable):
rhevm-3.6.0-0.15.master.el6.noarch

RHEL PPC hosts (machines are IBM POWER 8):
vdsm-4.17.6-1.el7ev.noarch
qemu-kvm-tools-rhev-2.3.0-22.el7.ppc64le
qemu-kvm-common-rhev-2.3.0-22.el7.ppc64le
ipxe-roms-qemu-20130517-7.gitc4bce43.el7.noarch
libvirt-daemon-driver-qemu-1.2.17-8.el7.ppc64le
qemu-img-rhev-2.3.0-22.el7.ppc64le
qemu-kvm-rhev-2.3.0-22.el7.ppc64le
libvirt-client-1.2.17-8.el7.ppc64le

How reproducible:
50%

Steps to Reproduce: (all this via REST API)
1. Create a VM from a template, with type server, VNC display and os_type rhel7ppc64 (these are requirements I've had because of a workaround on PPC for the REST API). The disk should be cow/sparse/virtio on iSCSI.
2. Edit the boot device to change its name.
3. Start the VM, then stop it.
4. Add a new 1 GB disk to the VM (can be any combination, for example RAW/virtio on iSCSI).
5. Start the VM.
6. Edit the disk and extend its size by another 1 GB.
7. Access the VM via ssh and fill the new disk with data (I normally dd from urandom).
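
For anyone repeating step 7 without dd, here is a minimal Python sketch of the same idea: write a fixed amount of random data to a target, block by block. It is an illustration only, not the test suite's actual code. The function name `fill_with_random` and its parameters are hypothetical, and the target path (for example a guest block device such as /dev/vdb) is an assumption that has to be adapted to the guest.

```python
import os


def fill_with_random(path, total_bytes, block_size=1024 * 1024):
    """Write total_bytes of random data to path; return the bytes written.

    Hypothetical helper, roughly what `dd if=/dev/urandom of=<path>` does.
    """
    written = 0
    # "r+b" overwrites in place without truncating (needed for block devices);
    # "wb" is only used when the target does not exist yet (a regular file).
    mode = "r+b" if os.path.exists(path) else "wb"
    with open(path, mode) as target:
        while written < total_bytes:
            chunk = os.urandom(min(block_size, total_bytes - written))
            target.write(chunk)
            written += len(chunk)
        # Push the data out of the page cache so the disk really receives it.
        target.flush()
        os.fsync(target.fileno())
    return written
```

Calling `fill_with_random("/dev/vdb", 2 * 1024**3)` inside the guest would fill 2 GB of the extended disk, assuming the new disk shows up as /dev/vdb; that device name is a guess and is not taken from this report.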


Actual results:
The dd succeeds, but soon after the VM becomes unresponsive: I cannot ssh in again, connect via VNC, or shut it down (I had to power it off). If I start the VM again, the same thing happens.

I've also seen this issue after only extending the disk, but that is a rarer occurrence.

The only error I could find anywhere is this in the engine.log:

2015-09-15 19:18:40,756 INFO  [org.ovirt.engine.core.vdsbroker.VmsMonitoring] (DefaultQuartzScheduler_Worker-66) [] Received a vnc Device without an address when processing VM 0d80bd83-79f8-4c80-a6c3-ef49897603e4 devices, skipping device: {specParams={displayIp=0}, deviceType=graphics, deviceId=96c352bf-e29f-4739-b3cf-1b016bf66f5e, device=vnc, type=graphics, port=5900}
2015-09-15 19:18:40,756 ERROR [org.ovirt.engine.core.vdsbroker.VmsMonitoring] (DefaultQuartzScheduler_Worker-66) [] VM '0d80bd83-79f8-4c80-a6c3-ef49897603e4' managed non pluggable device was removed unexpectedly from libvirt: 'VmDevice:{id='VmDeviceId:{deviceId='96c352bf-e29f-4739-b3cf-1b016bf66f5e', vmId='0d80bd83-79f8-4c80-a6c3-ef49897603e4'}', device='vnc', type='GRAPHICS', bootOrder='0', specParams='[]', address='', managed='true', plugged='false', readOnly='false', deviceAlias='', customProperties='[]', snapshotId='null', logicalName='null', usingScsiReservation='false'}'
2015-09-15 19:18:55,983 INFO  [org.ovirt.engine.core.vdsbroker.VmAnalyzer] (DefaultQuartzScheduler_Worker-58) [] VM '0d80bd83-79f8-4c80-a6c3-ef49897603e4'(virtual_disk_resize_iscsi) moved from 'WaitForLaunch' --> 'PoweringUp'

I don't know how to debug this further. If you could take a look at the log and provide some insight, that would be great.
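
To narrow a large engine.log down to the entries for one VM, something like the following Python sketch may help. The line layout (timestamp, level, [logger], (thread), [correlation id], message) is inferred only from the three lines quoted above, so other engine.log lines may not match it. The names `parse_line` and `entries_for_vm` are hypothetical helpers, not part of any oVirt tooling.

```python
import re

# Layout assumed from the engine.log lines quoted in this report:
# timestamp, level, [logger], (thread), [correlation id], message
LINE_RE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"\[(?P<logger>[^\]]+)\]\s+"
    r"\((?P<thread>[^)]+)\)\s+"
    r"\[(?P<correlation>[^\]]*)\]\s+"
    r"(?P<message>.*)$"
)


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LINE_RE.match(line.rstrip("\n"))
    return match.groupdict() if match else None


def entries_for_vm(lines, vm_id, levels=("WARN", "ERROR")):
    """Keep parsed entries that mention vm_id and have one of the given levels."""
    selected = []
    for line in lines:
        entry = parse_line(line)
        if entry and entry["level"] in levels and vm_id in entry["message"]:
            selected.append(entry)
    return selected
```

For example, `entries_for_vm(open("engine.log"), "0d80bd83-79f8-4c80-a6c3-ef49897603e4")` would list the WARN and ERROR entries for the VM in this report; continuation lines such as stack traces are skipped because they do not start with a timestamp.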


Additional info:

Comment 1 Carlos Mestre González 2015-09-15 17:19:04 UTC
Created attachment 1073738 [details]
engine.log

Comment 2 Omer Frenkel 2015-09-17 11:53:27 UTC
Can you please attach the relevant vdsm log?
Also, does this work well on an x86 setup?

Comment 3 Carlos Mestre González 2015-09-18 15:42:47 UTC
Yes, it works well on x86.

Comment 4 Carlos Mestre González 2015-09-18 15:44:31 UTC
Created attachment 1074980 [details]
host_mixed_1 vdsm log

This is the host where the VMs were being started.

Comment 5 Carlos Mestre González 2015-09-18 15:46:11 UTC
Created attachment 1074982 [details]
host_mixed_2 vdsm log (spm)

In the end the VM ran on the SPM, which is this host.

Comment 6 Michal Skrivanek 2015-09-24 13:24:48 UTC
you say "I'm testing the (live) resizing of disks for ovirt hosts in PPC, but I think the issue starts before (after creating and editing the disks)"
so you can reproduce this by:
create a VM
start, stop
add a disk
start
?

Does filling the disk play any role? Could you reproduce this without any extra disk added, just by filling a disk with dd?
The qemu log of that VM may help.

Comment 7 Carlos Mestre González 2015-09-25 08:11:36 UTC
Created attachment 1076942 [details]
qemu log on host_mixed_1

Comment 8 Carlos Mestre González 2015-09-25 08:12:16 UTC
Created attachment 1076943 [details]
qemu log on host_mixed_2

Comment 9 Carlos Mestre González 2015-09-25 08:15:01 UTC
Michal,

That's because the error I posted earlier was seen after starting the VM with the attached disks, but I'm not sure if that's the case.

Regarding reproduction: our test suite has plenty of tests like that and this is the only one failing, so no. It may have to do with the disk type or with the resizing/filling of data.

I'll try to reproduce when I have access to the ppc64 setup again and update you with more info.

Comment 10 Martin Polednik 2015-10-08 14:00:12 UTC
Any reproduction news?

There doesn't seem to be anything pointing to an issue in the logs, and I'm not sure what to focus on in reproduction - does it still occur with regard to Michal's comments?

Comment 11 Carlos Mestre González 2015-10-13 15:14:41 UTC
I can't seem to reproduce this with the latest packages for PPC (I ran it multiple times with different interfaces/provisioning types). Packages used:

qemu-img-rhev-2.3.0-29.el7.ppc64le
qemu-kvm-rhev-2.3.0-29.el7.ppc64le
libvirt-client-1.2.17-12.el7.ppc64le
vdsm-4.17.8-1.el7ev.noarch

Closing it.
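
For reference, the package difference between the original report and this last run can be listed with a small Python sketch like the one below. It only compares the package strings quoted in this bug and does not say which update, if any, made the problem go away. `split_nvra` and `changed_packages` are hypothetical helper names, and the split assumes the usual name-version-release.arch form of RPM package strings.

```python
# Package strings copied from the description (BEFORE) and from comment 11 (AFTER).
BEFORE = [
    "vdsm-4.17.6-1.el7ev.noarch",
    "qemu-kvm-tools-rhev-2.3.0-22.el7.ppc64le",
    "qemu-kvm-common-rhev-2.3.0-22.el7.ppc64le",
    "ipxe-roms-qemu-20130517-7.gitc4bce43.el7.noarch",
    "libvirt-daemon-driver-qemu-1.2.17-8.el7.ppc64le",
    "qemu-img-rhev-2.3.0-22.el7.ppc64le",
    "qemu-kvm-rhev-2.3.0-22.el7.ppc64le",
    "libvirt-client-1.2.17-8.el7.ppc64le",
]
AFTER = [
    "qemu-img-rhev-2.3.0-29.el7.ppc64le",
    "qemu-kvm-rhev-2.3.0-29.el7.ppc64le",
    "libvirt-client-1.2.17-12.el7.ppc64le",
    "vdsm-4.17.8-1.el7ev.noarch",
]


def split_nvra(package):
    """Split 'name-version-release.arch' into (name, 'version-release.arch')."""
    name, version, release_arch = package.rsplit("-", 2)
    return name, version + "-" + release_arch


def changed_packages(before, after):
    """Map each package name present in both lists to (old, new) if it changed."""
    old = dict(split_nvra(package) for package in before)
    changes = {}
    for package in after:
        name, new_version = split_nvra(package)
        if name in old and old[name] != new_version:
            changes[name] = (old[name], new_version)
    return changes


for name, (old_version, new_version) in sorted(changed_packages(BEFORE, AFTER).items()):
    print(name, old_version, "->", new_version)
```

All four packages listed in comment 11 differ from the versions in the original report, so the comparison alone cannot narrow the change down to one component.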

