Description of problem:
This seems related to how RHEL 7.2 on PowerPC handles the devices, but I need your input before moving it to other teams in case I'm missing something. Basically we have a scenario of copying all the iSCSI disks from a VM to a new one, then booting it and verifying it works. On PowerPC, after booting the copied VM, the system drops into emergency mode and prompts for the root password (the system is not fully booted).

Version-Release number of selected component (if applicable):
rhevm-3.6.6-0.1
qemu-kvm-rhev-2.3.0-31.el7_2.12.ppc64le
qemu-img-rhev-2.3.0-31.el7_2.12.ppc64le
vdsm-4.17.27-0.el7ev.noarch

VM: RHEL 7.2
3.10.0-327.13.1.el7.ppc64le #1 SMP Mon Feb 29 13:22:06 EST 2016 ppc64le ppc64le ppc64le GNU/Linux

Host: RHEL 7.2
3.10.0-327.18.2.el7.ppc64le #1 SMP Fri Apr 8 05:10:45 EDT 2016 ppc64le ppc64le ppc64le GNU/Linux

How reproducible:
100%

Steps to Reproduce:
1. Clone a VM with a RHEL 7.2 3.10.0-327.18.2.el7 thin-provisioned disk.
2. Create a permutation of 1 GB disks for virtio, virtio-scsi and spapr-vscsi, thin-provisioned and preallocated (6 disks in total).
3. Attach those disks to the VM and activate them.
4. Start the VM, create a partition and filesystem (ext4 in this case) on each disk, and create a small file on each.
5. Shut down the VM.
6. Copy the VM's disks to a new iSCSI domain.
7. Create a new VM and attach all the copied disks to it.
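For reference, step 4 above can be sketched as a shell loop. This is a dry-run sketch, not taken from the actual test code: the device names and mount-point naming are assumptions, and the commands are only printed so they can be reviewed before running as root on the VM.

```shell
#!/bin/sh
# Dry-run sketch of step 4: partition each 1 GB data disk, create an ext4
# filesystem, mount it, and drop a small marker file on it.
# NOTE: device names below are illustrative; the real names depend on the
# interface (virtio -> vdX, virtio-scsi/spapr-vscsi -> sdX).
cmds=""
for dev in /dev/vdb /dev/vdc /dev/sdb; do
    part="${dev}1"
    mnt="/mount-point-$(basename "$dev")"   # illustrative mount point name
    cmds="$cmds
parted -s $dev mklabel gpt mkpart primary ext4 1MiB 100%
mkfs.ext4 $part
mkdir -p $mnt && mount $part $mnt
echo test > $mnt/marker"
done
# Printed instead of executed so the commands can be checked first.
echo "$cmds"
```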
Actual results:
The VM boots into emergency mode (I cannot clearly tell why; the logs suggest it is related to the devices and that the system is unable to mount vdc).

Expected results:
The VM boots normally with all the filesystems present, as in the original VM.

Additional info:
After entering emergency mode, the system recommends checking journalctl -xb. I checked, but the only highlighted errors are ones that also appear on the original VM (which boots fine), just with different devices:

copied vm journalctl log:
May 09 10:56:09 dhcp167-130.klab.eng.bos.redhat.com systemd[1]: Device dev-disk-by\x2dpartlabel-primary.device appeared twice with different sysfs paths /sys/devices/pci0000:00/0000:00:05.0/virtio3/block/vdc/vdc1 and /sys/devices/pci0000:00/0000:00:07.0/virtio1/block/vda/vda1
May 09 10:56:16 localhost.localdomain systemd[1]: Device dev-disk-by\x2dpartlabel-primary.device appeared twice with different sysfs paths /sys/devices/pci0000:00/0000:00:05.0/virtio3/block/vdc/vdc1 and /sys/devices/vio/2000/host0/target0:0:0/0:0:0:2/block/sdb/sdb1
May 09 10:56:16 localhost.localdomain systemd[1]: Device dev-disk-by\x2dpartlabel-primary.device appeared twice with different sysfs paths /sys/devices/pci0000:00/0000:00:05.0/virtio3/block/vdc/vdc1 and /sys/devices/vio/2000/host0/target0:0:0/0:0:0:3/block/sda/sda1

The partitions are all mounted fine except for the vdc device:

Filesystem                          Size  Used Avail Use% Mounted on
/dev/mapper/rhel_dhcp167--130-root  8.5G  2.3G  6.2G  28% /
devtmpfs                            458M     0  458M   0% /dev
tmpfs                               503M     0  503M   0% /dev/shm
tmpfs                               503M   12M  491M   3% /run
tmpfs                               503M     0  503M   0% /sys/fs/cgroup
/dev/sdc1                           992M  2.6M  923M   1% /mount-point8083c9c2bf430399d6343552e55ee67aa9fbd8a9
/dev/sdd1                           992M  2.6M  923M   1% /mount-pointbebfd787df8bfa9eed33b5b1ea3ac5bf4059e754
/dev/vda1                           992M  2.6M  923M   1% /mount-pointac3bff80b8c4147949ffadaaf1af86295d7b512f
/dev/sdb1                           992M  2.6M  923M   1% /mount-pointaf322a92a6ae89bf0b62560771c3d3066390bd1b
/dev/sda1                           992M  2.6M  923M   1% /mount-pointe62eb700e5f83257db9c2bd1a1848075418275cc
/dev/vdb2                           497M  279M  218M  57% /boot

I'll attach logs.
Created attachment 1155729 [details]
Collection of messages, journalctl output, engine and vdsm.

This contains the output of /var/log/messages and journalctl for both the original VM and the new one with the copied disks (copied_vm). It also includes the df output for the copied VM and the GET /disks response for the VM with all the disks attached. I also included the engine.log of the run and the vdsm.log, just in case.
Nir, have a look please
- Is it a regression?
- Does it happen on x86?
- Is it reproducible?
- Is it related to the disk types (there are different types here)?
(In reply to Yaniv Kaul from comment #3)
> - Is it a regression?
I don't know; this was a new test that ran for the first time in this build.

> - Does it happen on X86?
No. I marked this specifically for the PPC architecture (if it also happened on x86, I would set the architecture to x86 and mention PPC in the description).

> - Is it reproduced?
100%, as noted in the description.

> - Is it related to the disk types (there are different types here)?
I'll try other scenarios with different types and update.
I simplified the test and checked the different types. It is the same scenario as in my first comment, but instead of attaching all 6 disks for all permutations, I tested with 1 boot disk + 2 attached 1 GB disks (thin and preallocated) per interface:

Boot disk     2 attached disks with fs
VIRTIO        VIRTIO        => FAILS to boot
VIRTIO        VIRTIO-SCSI   => PASS
VIRTIO        SPAPR-VSCSI   => PASS
VIRTIO_SCSI   VIRTIO-SCSI   => PASS
VIRTIO_SCSI   VIRTIO        => PASS

The emergency mode only happens with the three VIRTIO disks (boot plus two attached).

Remember, this happens when copying disks to another domain and attaching them to a new VM; the original VM works fine with any combination of disks.
Putting needinfo back.
(In reply to Carlos Mestre González from comment #5)
> I simplified the test and checked for the different types, is the same of my
> first comment, but instead of attaching all the (6) disks for all
> permutations I've tested for 1 boot disk + 2 attached 1 GB (thin and
> prealloc) for different interfaces:
>
> Boot disk     2 Attached disks with fs
> VIRTIO        VIRTIO       => FAILS to boot
> VIRTIO        VIRTIO SCSI  => PASS
> VIRTIO        sPARP VSCSI  => PASS
> VIRTIO_SCSI   VIRTIO SCSI  => PASS
> VIRTIO_SCSI   VIRTIO       => PASS
>
> The emergency mode only happens with those 3 VIRTIO disks.
>
> Remember this happens when copying disks to another domain and attach them
> to a new vm, original vm with different combination of disks work fine.

Excellent information. Can you compare the libvirt XMLs to see what the difference is, if any? I wonder if the disk order changed.
Created attachment 1160608 [details]
libvirtd.log and other logs for the passing and failing scenarios

I attached the libvirtd logs and others for two scenarios that are identical except for the interfaces, per my previous comment:

Boot disk   2 attached disks with fs
VIRTIO      VIRTIO        => FAILS to boot
VIRTIO      VIRTIO-SCSI   => PASS

Regarding the tests: copy_disk_test_vm is the VM that fails to boot, copy_disk_vm_iscsi is the original one. The timestamps in the qemu logs show the start/shutdown times.

Failed run (copy_disk_test_vm):
2016-05-23 11:30:36.709+0000
2016-05-23 11:41:12.287+0000: shutting down

Passing run:
2016-05-23 12:14:02.631+0000
2016-05-23 12:15:35.476+0000: shutting down
Ran into this BZ by chance while searching for another one. Found this in messages:

> May 9 10:42:36 dhcp167-130 systemd: Device dev-disk-by\x2dpartlabel-primary.device appeared twice with different sysfs paths /sys/devices/pci0000:00/0000:00:07.0/virtio1/block/vdb/vdb1 and /sys/devices/vio/2000/host0/target0:0:0/0:0:0:3/block/sda/sda1

Any chance LVM (and lvmetad) sees duplicate PVs? If so, you should either remove the duplicate disk or filter out devices by setting global_filter in lvm.conf.

Also found the following in the journal:

May 09 10:56:16 localhost.localdomain kernel: EXT4-fs (vdb1): VFS: Can't find ext4 filesystem
May 09 10:56:16 localhost.localdomain systemd[1]: mount\x2dpoint6fb75ac675058ef30267bb71e17db05e5d622560.mount mount process exited, code=exited status=32
May 09 10:56:16 localhost.localdomain systemd[1]: Failed to mount /mount-point6fb75ac675058ef30267bb71e17db05e5d622560.
May 09 10:56:16 localhost.localdomain systemd[1]: Dependency failed for Local File Systems.

Are there any `/dev/vdXN` or `/dev/sdXN` entries in /etc/fstab? I have seen /dev/vdX names change after cloning a VM. Replace them with `LABEL=` or `UUID=` lines.
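The UUID rewrite suggested above can be scripted. A minimal sketch follows (not from the bug report): it operates on a sample fstab with placeholder UUIDs; on a real system the UUID for each partition would come from `blkid -s UUID -o value <device>`.

```shell
#!/bin/sh
# Sketch: rewrite /dev/vdXN and /dev/sdXN fstab entries to UUID= form.
# Sample fstab content; on a real system this would be /etc/fstab.
fstab='/dev/vdb1 /mount-a ext4 defaults 0 0
/dev/sdc1 /mount-b ext4 defaults 0 0
tmpfs /dev/shm tmpfs defaults 0 0'

rewritten=$(echo "$fstab" | while read -r dev rest; do
    case "$dev" in
        /dev/vd*|/dev/sd*)
            # Placeholder lookup; on a real system use:
            #   uuid=$(blkid -s UUID -o value "$dev")
            uuid="0000-$(basename "$dev")"
            echo "UUID=$uuid $rest"
            ;;
        *) echo "$dev $rest" ;;   # leave non-device entries untouched
    esac
done)
echo "$rewritten"
```

Entries that already use a stable identifier (tmpfs, LABEL=, UUID=) pass through unchanged.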
No clear RCA ATM, pushing out to 3.6.9.
3.6 has gone EOL; please re-target this bug to a 4.0 release.
Also affects 4.0 (4.0.2.3-0.1.el7ev).
Hi Carlos,

Can you reply to comment #9 regarding /etc/fstab?

Thanks
Created attachment 1222805 [details]
screenshot of fstab and kernel version after failure to start

Hi,

The fstab is also in the description of the bug.

I made a new run and took a screenshot of /etc/fstab; as you can see, there are multiple vdXN and sdXN entries in the file.

This was tested with the 3.10.0-327 kernel.
(In reply to Carlos Mestre González from comment #14)
> Created attachment 1222805 [details]
> screenshot fstab - kernel version after failure to start
>
> Hi,
>
> The fstab is also in the description of the bug.
>
> I made a new run and took a screenshot of the /etc/fstab, as you can see
> there are multiple vdXN and sdXN on the file.
>
> this was tested with kernel 3.10.0-327 kernel.

I think the best practice is to use UUIDs in /etc/fstab. Can you try to reproduce while mounting with UUIDs?
Yes, with UUID the OS boots properly.
Hi Derek,

We cannot currently fix this issue, but we definitely need the workaround/best practice documented somewhere. Any suggestions?

Thanks,
Freddy
From [1]:

Issue
After rebooting, one of my /dev/sdX partitions did not mount automatically.

Resolution
Device names like /dev/sdX can change across reboots. To prevent this from happening, set /etc/fstab to use UUIDs or labels.

[1] https://access.redhat.com/solutions/424513
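As a concrete illustration of the resolution in [1] (the UUID below is a placeholder; the real value for a partition comes from `blkid`), a kernel-device entry in /etc/fstab would be replaced like this:

```
# Fragile: /dev/vdc1 can point to a different disk after cloning or reboot
/dev/vdc1  /mnt/data  ext4  defaults  0 0

# Stable: the UUID is stored in the filesystem itself, so it survives
# device renaming
UUID=3e6be9de-8139-4f2f-9106-a43f08d82366  /mnt/data  ext4  defaults  0 0
```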
*** Bug 1439683 has been marked as a duplicate of this bug. ***