Created attachment 1329184 [details]
engine log during the time the disk was resized

Description of problem:
We updated to 4.1.6.2 on Tuesday and today tried to resize the root disk of a VM while it was powered on; this worked on 4.0.6.3. The disk that was resized held an LVM root partition.

Version-Release number of selected component (if applicable):
oVirt - 4.1.6.2
Ceph - 11.2.1
Cinder - 9.1.4
CentOS - 7.2
Scientific Linux - 6.9

How reproducible:
100%

Steps to Reproduce:
1. Increase the size of a Cinder-based disk

Actual results:
The VM pauses shortly after the command is run and the data is irrecoverable.

Expected results:
The disk is resized and I'm able to grow the filesystem in the VM with common filesystem tools.

Additional info:
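For context on the "common filesystem tools" step in the expected results, here is a sketch of the usual in-guest follow-up after a Cinder volume is extended, assuming an xfs filesystem on an LVM root (all device and VG names below are hypothetical, not taken from this report). The runnable part only simulates the block device growing by extending a sparse file; the actual guest-side commands are shown as comments because they are destructive:

```shell
# Simulate the Cinder volume being extended: grow a sparse backing file.
img=$(mktemp)
truncate -s 10G "$img"   # original virtual size
truncate -s 20G "$img"   # after the extend; existing data is untouched
size=$(stat -c %s "$img")
echo "$size"             # new size in bytes
rm -f "$img"

# Inside the guest, once the hypervisor exposes the larger disk, the usual
# LVM-root sequence would be (hypothetical device names, run as root):
#   growpart /dev/vda 2               # grow the partition holding the PV
#   pvresize /dev/vda2                # let LVM see the extra space
#   lvextend -l +100%FREE /dev/centos/root
#   xfs_growfs /                      # or resize2fs for ext4
```

In this bug the VM pauses before any of the in-guest steps can run, which is what makes the failure mode destructive rather than a simple "filesystem not grown yet" state.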
CentOS 7.3 and 7.4 guest OSes are also affected. They had the ovirt-guest-agent installed as well.
Does the 4.2 flag mean that Red Hat was able to confirm the bug?
(In reply to Logan Kuhn from comment #2)
> Does the 4.2 flag mean that RedHat was able to confirm the bug?

I flagged the BZ with "ovirt-4.2?"; I'm not quite sure why the bot changed it to a +. Fred, the assigned developer, will definitely look into it in this timeline (probably quite shortly), but since we don't have a root cause analysis yet, we cannot commit to a fix date.
It works fine on 4.2. I will try to reproduce on 4.1.x.

In the log provided, the command starts, but I cannot see any logs about it completing:

2017-09-21 11:44:18,981 INFO [org.ovirt.engine.core.bll.storage.disk.cinder.ExtendCinderDiskCommand] (pool-5-thread-6) [6de51a79] Lock Acquired to object 'EngineLock:{exclusiveLocks='[]', sharedLocks='[]'}'
2017-09-21 11:44:18,989 INFO [org.ovirt.engine.core.bll.storage.disk.cinder.ExtendCinderDiskCommand] (pool-5-thread-6) [6de51a79] Running command: ExtendCinderDiskCommand internal: true. Entities affected : ID: 0d132c0b-58c0-4166-8ae4-2c6d14f6027a Type: DiskAction group EDIT_DISK_PROPERTIES with role type USER

Logan, can you please provide the Cinder log from the OpenStack server? It is located at /var/log/cinder/volume.log.
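For anyone else tracing this flow: every engine.log line belonging to this command carries the same correlation ID, [6de51a79] in the excerpt above, so grepping for it shows whether the command ever progressed past "Running command". A minimal sketch against the two quoted lines (on a real install the file would typically be /var/log/ovirt-engine/engine.log; the scratch file here just reproduces the excerpt):

```shell
# Reproduce the two engine.log lines quoted above in a scratch file.
log=$(mktemp)
cat > "$log" <<'EOF'
2017-09-21 11:44:18,981 INFO [org.ovirt.engine.core.bll.storage.disk.cinder.ExtendCinderDiskCommand] (pool-5-thread-6) [6de51a79] Lock Acquired to object 'EngineLock:{exclusiveLocks='[]', sharedLocks='[]'}'
2017-09-21 11:44:18,989 INFO [org.ovirt.engine.core.bll.storage.disk.cinder.ExtendCinderDiskCommand] (pool-5-thread-6) [6de51a79] Running command: ExtendCinderDiskCommand internal: true.
EOF

# Count the lines for this flow by its correlation ID. A healthy resize
# would add further lines with the same ID (e.g. lock release); here the
# trail stops after "Running command".
count=$(grep -c '\[6de51a79\]' "$log")
echo "$count"
rm -f "$log"
```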
Created attachment 1331976 [details]
volume.log
tl;dr: a bad DIMM in one of the Cinder servers caused it to be unstable and intermittently unable to communicate with the Ceph cluster.

The longer version is that we have several servers managing Cinder services, and almost all of the traffic goes through one server, which is where I pulled logs from. I've attached the volume log as well; there doesn't appear to be anything from this event in it. When I went to check the failed server's log, it kernel panicked and is now sitting in a bad state, yet it would seem I can now resize any disk I want. I've resized a disk with an LVM root, one with a non-LVM root, and one with no LVM at all, all CentOS 7.

My theory is that the disks that were corrupted (which was all of the ones tested) queried the <good server> API but were served by the <bad server> volume service, and since that server couldn't consistently talk to Ceph, the operation never returned, causing what appeared to be a hang and then a timeout, which left the disks inconsistent.