Description of problem:
I don't know how to properly assess this issue, so any help investigating it further would be great. I'm testing the (live) resizing of disks for ovirt hosts on PPC, but I think the issue starts earlier (after creating and editing the disks). After the steps to reproduce below, the vm becomes completely unresponsive (I cannot ssh to it, cannot connect via vnc, and I'm not able to shut it down). I'm adding this bz to rhev for further investigation.

Version-Release number of selected component (if applicable):
rhevm-3.6.0-0.15.master.el6.noarch

RHEL PPC hosts (machines are IBM POWER 8):
vdsm-4.17.6-1.el7ev.noarch
qemu-kvm-tools-rhev-2.3.0-22.el7.ppc64le
qemu-kvm-common-rhev-2.3.0-22.el7.ppc64le
ipxe-roms-qemu-20130517-7.gitc4bce43.el7.noarch
libvirt-daemon-driver-qemu-1.2.17-8.el7.ppc64le
qemu-img-rhev-2.3.0-22.el7.ppc64le
qemu-kvm-rhev-2.3.0-22.el7.ppc64le
libvirt-client-1.2.17-8.el7.ppc64le

How reproducible:
50%

Steps to Reproduce: (all of this via REST API)
1. Create a vm from a template, with type server, vnc display and os_type rhel7ppc64 (these are requirements I've had because of a WA on PPC for the REST API). The disk should be cow/sparse/virtio on iscsi.
2. Edit the boot device to change its name.
3. Start the vm and stop the vm.
4. Add a new 1 GB disk to the vm (can be any combination, for example RAW/virtio on iSCSI).
5. Start the vm.
6. Edit the disk and extend its size by another 1 GB.
7. Access the vm via ssh and fill the new disk with data (I normally dd from urandom).

Actual results:
The dd succeeds, but soon afterwards the vm stops responding: I cannot ssh in again, connect via vnc, or shut it down (I had to power it off). If I start the vm again, the same thing happens. I've also seen this issue after only extending the disk, but that is a rarer occurrence.
The only error I could find anywhere is this in the engine.log:

2015-09-15 19:18:40,756 INFO [org.ovirt.engine.core.vdsbroker.VmsMonitoring] (DefaultQuartzScheduler_Worker-66) [] Received a vnc Device without an address when processing VM 0d80bd83-79f8-4c80-a6c3-ef49897603e4 devices, skipping device: {specParams={displayIp=0}, deviceType=graphics, deviceId=96c352bf-e29f-4739-b3cf-1b016bf66f5e, device=vnc, type=graphics, port=5900}
2015-09-15 19:18:40,756 ERROR [org.ovirt.engine.core.vdsbroker.VmsMonitoring] (DefaultQuartzScheduler_Worker-66) [] VM '0d80bd83-79f8-4c80-a6c3-ef49897603e4' managed non pluggable device was removed unexpectedly from libvirt: 'VmDevice:{id='VmDeviceId:{deviceId='96c352bf-e29f-4739-b3cf-1b016bf66f5e', vmId='0d80bd83-79f8-4c80-a6c3-ef49897603e4'}', device='vnc', type='GRAPHICS', bootOrder='0', specParams='[]', address='', managed='true', plugged='false', readOnly='false', deviceAlias='', customProperties='[]', snapshotId='null', logicalName='null', usingScsiReservation='false'}'
2015-09-15 19:18:55,983 INFO [org.ovirt.engine.core.vdsbroker.VmAnalyzer] (DefaultQuartzScheduler_Worker-58) [] VM '0d80bd83-79f8-4c80-a6c3-ef49897603e4'(virtual_disk_resize_iscsi) moved from 'WaitForLaunch' --> 'PoweringUp'

I don't know how to debug this further. If you could take a look at the logs and provide some insight, that would be great.

Additional info:
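For anyone triaging the attached engine.log, a small filter like the following may help pull out the ERROR/WARN entries that mention one VM. This is only a sketch: the regular expression is inferred from the three log lines quoted in this report, the function name `vm_events` is made up for illustration, and the sample lines are abbreviated copies of the ones above.

```python
import re

# Line prefix as seen in this report, e.g.
# "2015-09-15 19:18:40,756 ERROR [logger] (thread) [] message"
# (assumption: other engine.log entries follow the same layout)
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+\[(?P<logger>[^\]]+)\]\s+(?P<rest>.*)$"
)


def vm_events(lines, vm_id, levels=("ERROR", "WARN")):
    """Return (timestamp, level, logger, message) for entries mentioning vm_id."""
    events = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip lines that do not start a log entry
        if match["level"] in levels and vm_id in match["rest"]:
            events.append(
                (match["ts"], match["level"], match["logger"], match["rest"])
            )
    return events


# Abbreviated copies of the entries quoted in this report.
SAMPLE = [
    "2015-09-15 19:18:40,756 INFO [org.ovirt.engine.core.vdsbroker.VmsMonitoring] "
    "(DefaultQuartzScheduler_Worker-66) [] Received a vnc Device without an address "
    "when processing VM 0d80bd83-79f8-4c80-a6c3-ef49897603e4 devices",
    "2015-09-15 19:18:40,756 ERROR [org.ovirt.engine.core.vdsbroker.VmsMonitoring] "
    "(DefaultQuartzScheduler_Worker-66) [] VM '0d80bd83-79f8-4c80-a6c3-ef49897603e4' "
    "managed non pluggable device was removed unexpectedly from libvirt",
]

if __name__ == "__main__":
    for event in vm_events(SAMPLE, "0d80bd83-79f8-4c80-a6c3-ef49897603e4"):
        print(event[0], event[1], event[2])
```

Passing `levels=("INFO", "ERROR", "WARN")` would also include the state transitions (such as 'WaitForLaunch' --> 'PoweringUp'), which can help to line up the error with the VM start.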
Created attachment 1073738 [details] engine.log
Can you please attach the relevant vdsm log? Also, does this work well on an x86 setup?
Yes, it works well on x86.
Created attachment 1074980 [details] host_mixed_1 vdsm log
This is the host where the vms were being started.
Created attachment 1074982 [details] hos_mixed_2 vdsm log (spm)
In the end the vm ran on the SPM, which is this host.
You say "I'm testing the (live) resizing of disks for ovirt hosts in PPC, but I think the issue starts before (after creating and editing the disks)", so can you reproduce this with just the following?
create a VM
start, stop
add a disk
start

Does the filling of the disk play any role? Could you reproduce that without any extra disk added, just filling it with dd? The qemu log of that VM may help.
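To isolate the fill step in a scripted reproduction, something along these lines could stand in for the dd from urandom. It is a minimal sketch, not what the test suite actually runs: the function name `fill_with_random` and the device path in the usage comment are hypothetical, and running it against a real device overwrites its contents.

```python
import os


def fill_with_random(path, total_bytes, chunk_size=1024 * 1024):
    """Write total_bytes of random data to path in chunks, then fsync.

    Returns the number of bytes written.
    """
    written = 0
    with open(path, "wb") as target:
        while written < total_bytes:
            step = min(chunk_size, total_bytes - written)
            target.write(os.urandom(step))
            written += step
        # Force the data out of the guest page cache so the writes
        # actually reach the virtual disk before the function returns.
        target.flush()
        os.fsync(target.fileno())
    return written


# Hypothetical usage inside the guest (device name is an assumption):
# fill_with_random("/dev/vdb", 2 * 1024 ** 3)
```

Running the same call against the boot disk only (a scratch file on its filesystem) and then against the added disk would show whether the extra disk is needed to trigger the hang.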
Created attachment 1076942 [details] qemu log on host_mixed_1
Created attachment 1076943 [details] qemu log on host_mixed_2
Michal, that's because the error I posted before was seen after starting the vm with the attached disks, but I'm not sure if that's the case. Regarding the reproduction: our test suite has plenty of tests like that and this is the only one failing, so no, it probably has to do with the disk type or with the resize/filling of data. I'll try to reproduce it when I have access to the ppc64 setup again and update you with more info.
Any reproduction news? There doesn't seem to be anything in the logs pointing to an issue, and I'm not sure what to focus on in the reproduction. Does it still occur with regard to Michal's comments?
I don't seem to be able to reproduce this with the latest packages for PPC (I ran it multiple times with different interfaces/provisioning types). Packages:
qemu-img-rhev-2.3.0-29.el7.ppc64le
qemu-kvm-rhev-2.3.0-29.el7.ppc64le
libvirt-client-1.2.17-12.el7.ppc64le
vdsm-4.17.8-1.el7ev.noarch

Closing it.