Bug 1334726
| Summary: | [PPC][rhevm-3.6.6-0.1] RHEL 7.2 vm with copied disks enters emergency mode when booted. | | |
|---|---|---|---|
| Product: | [oVirt] ovirt-engine | Reporter: | Carlos Mestre González <cmestreg> |
| Component: | BLL.Storage | Assignee: | Nir Soffer <nsoffer> |
| Status: | CLOSED CANTFIX | QA Contact: | Carlos Mestre González <cmestreg> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 3.6.5.1 | CC: | amureini, bugs, cmestreg, dcadzow, frolland, gklein, hannsj_uhl, istein, mcsontos, nsoffer, ratamir, sbonazzo, tnisan, ylavi |
| Target Milestone: | ovirt-4.0.7 | Flags: | amureini: ovirt-4.0.z? rule-engine: planning_ack? rule-engine: devel_ack? rule-engine: testing_ack? |
| Target Release: | --- | | |
| Hardware: | ppc64le | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| : | 1361549 (view as bug list) | Environment: | |
| Last Closed: | 2016-11-28 16:10:12 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | Storage | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1359843, 1361549 | | |
| Attachments: | | | |
Description
Carlos Mestre González
2016-05-10 12:20:21 UTC
Created attachment 1155729 [details]
Collection of messages, journalctl output, engine and vdsm.
This contains the output of /var/log/messages and journalctl for both the original VM and the new one with the copied disks (copied_vm).
It also includes the df output for the copied VM and the GET /disks response for the VM with all the disks attached.
I also included the engine.log of the run and the vdsm.log, just in case.
Nir, have a look please:
- Is it a regression?
- Does it happen on x86?
- Is it reproduced?
- Is it related to the disk types (there are different types here)?

(In reply to Yaniv Kaul from comment #3)
> - Is it a regression?
I don't know; this was a new test that ran for the first time in this build.
> - Does it happen on X86?
No, I marked this specifically for the PPC architecture (if it also happened on x86 I would set the hardware to x86 and mention PPC in the description).
> - Is it reproduced?
100%, as written in the description.
> - Is it related to the disk types (there are different types here)?
I'll try other scenarios with different types and update.

I simplified the test and checked the different types; the result is the same as in my first comment. Instead of attaching all six disks in every permutation, I tested 1 boot disk + 2 attached 1 GB disks (thin and preallocated) with different interfaces:

Boot disk     2 attached disks with fs
VIRTIO        VIRTIO         => FAILS to boot
VIRTIO        VIRTIO_SCSI    => PASS
VIRTIO        sPAPR VSCSI    => PASS
VIRTIO_SCSI   VIRTIO_SCSI    => PASS
VIRTIO_SCSI   VIRTIO         => PASS

The emergency mode only happens with the three VIRTIO disks. Remember this happens when copying the disks to another domain and attaching them to a new VM; the original VM works fine with every combination of disks.

Putting needinfo back.

(In reply to Carlos Mestre González from comment #5)
> I simplified the test and checked the different types [...]
> The emergency mode only happens with the three VIRTIO disks.
Excellent information. Can you compare the libvirt XMLs to see what the difference is, if any? I wonder if the disk order changed.

Created attachment 1160608 [details]
libvirtd.log and other logs for passing and failing scenario
I attached the libvirtd logs and other logs for two scenarios that are identical except for the disk interfaces, per my previous comment:

Boot disk   2 attached disks with fs
VIRTIO      VIRTIO        => FAILS to boot
VIRTIO      VIRTIO_SCSI   => PASS
Regarding the tests: copy_disk_test_vm is the one that fails to boot and copy_disk_vm_iscsi is the original one; the timestamps in the qemu logs show the start/shutdown times.

Failed run (copy_disk_test_vm):
2016-05-23 11:30:36.709+0000
2016-05-23 11:41:12.287+0000: shutting down

Passing run:
2016-05-23 12:14:02.631+0000
2016-05-23 12:15:35.476+0000: shutting down
Ran into this BZ by chance while searching for another one. Found this in messages:
> May 9 10:42:36 dhcp167-130 systemd: Device dev-disk-by\x2dpartlabel-primary.device appeared twice with different sysfs paths /sys/devices/pci0000:00/0000:00:07.0/virtio1/block/vdb/vdb1 and /sys/devices/vio/2000/host0/target0:0:0/0:0:0:3/block/sda/sda1
Any chance LVM (and lvmetad) sees duplicate PVs?
If so, you should either remove the duplicate disk or filter out the devices by setting global_filter in lvm.conf.
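For the filtering option, a sketch of what the lvm.conf change could look like (the rejected device path below is hypothetical; the actual path to exclude depends on which duplicate the guest sees):

```
# /etc/lvm/lvm.conf -- devices section (sketch)
devices {
    # Reject the duplicate path (here the copy visible as /dev/sda),
    # accept everything else.
    global_filter = [ "r|^/dev/sda$|", "a|.*|" ]
}
```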
Also found following in the journal:
May 09 10:56:16 localhost.localdomain kernel: EXT4-fs (vdb1): VFS: Can't find ext4 filesystem
May 09 10:56:16 localhost.localdomain systemd[1]: mount\x2dpoint6fb75ac675058ef30267bb71e17db05e5d622560.mount mount process exited, code=exited status=32
May 09 10:56:16 localhost.localdomain systemd[1]: Failed to mount /mount-point6fb75ac675058ef30267bb71e17db05e5d622560.
May 09 10:56:16 localhost.localdomain systemd[1]: Dependency failed for Local File Systems.
Are there any `/dev/vdXN` or `/dev/sdXN` entries in /etc/fstab?
I have seen /dev/vdX names change after cloning a VM.
Replace them with `LABEL=` or `UUID=` lines.
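As a quick way to spot the problematic entries, one could grep the fstab for kernel device names (a sketch; `check_fstab_devnames` is a hypothetical helper, not part of any oVirt or RHEL tooling):

```shell
# Hypothetical helper: print fstab entries that mount by kernel device name
# (/dev/sdXN or /dev/vdXN) instead of a stable UUID= or LABEL= identifier.
# These names can change across reboots or after cloning/copying disks.
check_fstab_devnames() {
    grep -E '^[[:space:]]*/dev/[sv]d[a-z]+[0-9]*([[:space:]]|$)' "${1:-/etc/fstab}"
}
```

Any line it prints is a candidate for conversion to a `UUID=` or `LABEL=` entry.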
No clear RCA at the moment; pushing out to 3.6.9.

3.6 has gone EOL; please re-target this bug to a 4.0 release.

This also affects 4.0 (4.0.2.3-0.1.el7ev).

Hi Carlos, can you reply to comment #9 regarding /etc/fstab? Thanks.

Created attachment 1222805 [details]
screenshot fstab - kernel version after failure to start
Hi,
The fstab is also in the description of the bug.
I made a new run and took a screenshot of /etc/fstab; as you can see, there are multiple vdXN and sdXN entries in the file.
This was tested with the 3.10.0-327 kernel.
(In reply to Carlos Mestre González from comment #14)
> I made a new run and took a screenshot of the /etc/fstab, as you can see
> there are multiple vdXN and sdXN on the file.
I think the best practice is to use UUIDs in /etc/fstab. Can you try to reproduce while mounting with UUIDs?

Yes, with UUIDs the OS boots properly.

Derek, hi,
We cannot currently fix this issue, but we definitely need to have the workaround/best practice documented somewhere. Any suggestions?
Thanks,
Freddy

From [1]:
Issue
After rebooting, one of my /dev/sdX partitions did not mount automatically.
Resolution
Device names like /dev/sdX can change across reboots.
To prevent this from happening, set /etc/fstab to use UUIDs or labels.
[1] https://access.redhat.com/solutions/424513
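As a concrete before/after sketch of that resolution (the device name and UUID below are made up for illustration), the filesystem UUID can be read with `blkid /dev/vdb1` and the entry rewritten:

```
# /etc/fstab -- before (breaks when the device name changes):
/dev/vdb1  /mount-point1  ext4  defaults  0 0

# after (stable across reboots and disk copies):
UUID=c1f8b7a0-5b0e-4f0a-9c3d-2e6d1a4b8f90  /mount-point1  ext4  defaults  0 0
```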
*** Bug 1439683 has been marked as a duplicate of this bug. ***