Created attachment 1600488
test scripts, test output, traces, vm xml, qemu log
Description of problem:
Using <blockio logical_block_size="4096" physical_block_size="4096"> in the libvirt
XML seems to work around the issue described in bug 1737256, and provisioning
a VM is successful. However, the disk is not bootable on the next boot.
Reproduced with the Fedora 29 and Alpine 3.10.1 ISOs.
Version-Release number of selected component (if applicable):
See bug 1737256.
How reproducible:
Always.
Steps to Reproduce:
1. Add a <blockio> element to the libvirt XML, or add
logical_block_size,physical_block_size to the qemu command line.
2. Start the VM from the ISO
3. Install
4. Boot from the installed disk
Actual results:
Boot fails with: No bootable device
Expected results:
Boot successful.
Additional info:
For the storage and host details, see bug 1737256.
Here is an example flow using the Alpine ISO, tested by running qemu directly,
based on the qemu command line created by libvirt when running the VM using oVirt.
name=qemu-fuse
images=/rhev/data-center/mnt/glusterSD/voodoo4.tlv.redhat.com\:_gv0/de566475-5b67-4987-abf3-3dc98083b44c/images
disk=4163cc03-c5ef-4956-ac0a-6e04699ce2e5/51de3a28-5c0a-4db1-a51c-b09c437b6f39
cdrom=3119fe7e-f576-46c3-880b-9590cb4619da/46efd27f-3bbe-4183-91f7-912c163b8eac
truncate -s 0 $images/$disk
truncate -s 1g $images/$disk
# Provision vm from iso.
strace -f -o $name-provision.trace qemu-kvm \
-object iothread,id=iothread1 \
-device virtio-scsi-pci,iothread=iothread1,id=bus1,bus=pci.0,addr=0x5 \
-drive file=$images/$disk,format=raw,cache=none,if=none,id=drive1 \
-device scsi-hd,bus=bus1.0,drive=drive1,id=disk1,logical_block_size=4096,physical_block_size=4096,write-cache=on \
-cdrom $images/$cdrom \
-m 1024 \
-nographic
# Start the vm.
strace -f -o $name-run.trace qemu-kvm \
-object iothread,id=iothread1 \
-device virtio-scsi-pci,iothread=iothread1,id=bus1,bus=pci.0,addr=0x5 \
-drive file=$images/$disk,format=raw,cache=none,if=none,id=drive1 \
-device scsi-hd,bus=bus1.0,drive=drive1,id=disk1,logical_block_size=4096,physical_block_size=4096,write-cache=on \
-m 1024 \
-nographic
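The two invocations above differ only in the trace file name and the -cdrom option. To repeat the test with other block sizes, the -device argument can be built from parameters. This is a minimal sketch: the helper name scsi_hd_arg is hypothetical, and the option string is copied from the commands above.

```shell
# Hypothetical helper: build the scsi-hd -device argument used above
# for a given logical ($1) and physical ($2) block size in bytes.
scsi_hd_arg() {
    echo "scsi-hd,bus=bus1.0,drive=drive1,id=disk1,logical_block_size=$1,physical_block_size=$2,write-cache=on"
}

# 4k native, as used in this report.
scsi_hd_arg 4096 4096
# 512-byte logical sectors on 4k physical sectors, for comparison.
scsi_hd_arg 512 4096
```

It would be used as -device "$(scsi_hd_arg 4096 4096)". Whether the 512-byte variant works on this storage is the subject of bug 1737256, so it is shown only for comparison.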
Here is output from the guest:
1. Checking devices
We can see that the guest sees the expected block size.
localhost:~# grep -s "" /sys/block/sda/queue/*
/sys/block/sda/queue/add_random:1
/sys/block/sda/queue/chunk_sectors:0
/sys/block/sda/queue/dax:0
/sys/block/sda/queue/discard_granularity:4096
/sys/block/sda/queue/discard_max_bytes:1073741824
/sys/block/sda/queue/discard_max_hw_bytes:1073741824
/sys/block/sda/queue/discard_zeroes_data:0
/sys/block/sda/queue/fua:0
/sys/block/sda/queue/hw_sector_size:4096
/sys/block/sda/queue/io_poll:1
/sys/block/sda/queue/io_poll_delay:-1
/sys/block/sda/queue/iostats:1
/sys/block/sda/queue/logical_block_size:4096
/sys/block/sda/queue/max_discard_segments:1
/sys/block/sda/queue/max_hw_sectors_kb:32767
/sys/block/sda/queue/max_integrity_segments:0
/sys/block/sda/queue/max_sectors_kb:1280
/sys/block/sda/queue/max_segment_size:65536
/sys/block/sda/queue/max_segments:126
/sys/block/sda/queue/minimum_io_size:4096
/sys/block/sda/queue/nomerges:0
/sys/block/sda/queue/nr_requests:256
/sys/block/sda/queue/optimal_io_size:0
/sys/block/sda/queue/physical_block_size:4096
/sys/block/sda/queue/read_ahead_kb:128
/sys/block/sda/queue/rotational:1
/sys/block/sda/queue/rq_affinity:1
/sys/block/sda/queue/scheduler:[mq-deadline] kyber none
/sys/block/sda/queue/write_cache:write back
/sys/block/sda/queue/write_same_max_bytes:2147479552
/sys/block/sda/queue/write_zeroes_max_bytes:2147479552
/sys/block/sda/queue/zoned:none
2. Installing
localhost:~# setup-alpine
...
localhost:~# apk add sfdisk syslinux
...
localhost:~# setup-disk
Available disks are:
sda (1.1 GB QEMU QEMU HARDDISK )
Which disk(s) would you like to use? (or '?' for help or 'none') [sda]
The following disk is selected:
sda (1.1 GB QEMU QEMU HARDDISK )
How would you like to use it? ('sys', 'data', 'lvm' or '?' for help) [?] sys
WARNING: The following disk(s) will be erased:
sda (1.1 GB QEMU QEMU HARDDISK )
WARNING: Erase the above disk(s) and continue? [y/N]: y
Creating file systems...
Installing system on /dev/sda3:
/mnt/boot is device /dev/sda1
100% ████████████████████████████████████████████
==> initramfs: creating /boot/initramfs-virt
/boot is device /dev/sda1
Installation is complete. Please reboot.
localhost:~# sfdisk -l /dev/sda
Disk /dev/sda: 1 GiB, 1073741824 bytes, 262144 sectors
Disk model: QEMU HARDDISK
Units: sectors of 1 * 4096 = 4096 bytes
Sector size (logical/physical): 4096 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: dos
Disk identifier: 0x9773962b
Device     Boot Start    End Sectors  Size Id Type
/dev/sda1  *      256  25855   25600  100M 83 Linux
/dev/sda2       25856  91391   65536  256M 82 Linux swap / Solaris
/dev/sda3       91392 262143  170752  667M 83 Linux
localhost:~# poweroff
...
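The sfdisk output above counts in 4096-byte sectors, so the sector numbers are 8 times smaller than they would be on a 512-byte disk. A small sketch of the conversion, using the numbers from the listing (the function name sectors_to_bytes is made up for illustration):

```shell
# Convert a sector count ($1) to bytes for a given sector size ($2).
sectors_to_bytes() {
    echo $(( $1 * $2 ))
}

sectors_to_bytes 262144 4096   # whole disk: 1073741824 bytes (1 GiB)
sectors_to_bytes 256 4096      # start of /dev/sda1: 1048576 bytes (1 MiB)
sectors_to_bytes 25600 4096    # size of /dev/sda1: 104857600 bytes (100 MiB)
```

So the installer aligned the first partition at 1 MiB and the partition table is consistent with a 4k disk.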
3. Next boot fails
SeaBIOS (version ?-20190712_051036-bde29493c9324747a3ac6dd355c75f9e-2.fc29)
iPXE (http://ipxe.org) 00:03.0 C980 PCI2.10 PnP PMM+3FF91220+3FED1220 C980
Booting from Hard Disk...
Boot failed: could not read the boot disk
Booting from Floppy...
Boot failed: could not read the boot disk
Booting from ROM...
iPXE (PCI 00:03.0) starting execution...ok
iPXE initialising devices...ok
iPXE 1.0.0+ -- Open Source Network Boot Firmware -- http://ipxe.org
Features: DNS HTTP iSCSI TFTP AoE ELF MBOOT PXE bzImage Menu PXEXT
net0: 52:54:00:12:34:56 using 82540em on 0000:00:03.0 (open)
[Link:up, TX:0 TXE:0 RX:0 RXE:0]
Configuring (net0 52:54:00:12:34:56).............. ok
net0: 10.0.2.15/255.255.255.0 gw 10.0.2.2
net0: fec0::5054:ff:fe12:3456/64 gw fe80::2
net0: fe80::5054:ff:fe12:3456/64
Nothing to boot: No such file or directory (http://ipxe.org/2d03e13b)
No more network devices
No bootable device.
The same flow was reproduced using the Fedora 29 ISO in oVirt.
See the attached vm.xml and qemu.log for the oVirt VM details.
I tested both fuse and libgfapi, with the same results in both cases. I included only
output from the fuse tests.
The BIOS interfaces simply don't support 4k native disks, so there is no booting from native 4k disks with BIOS, neither in VMs nor on real hardware. There's nothing we can do about this.
If you need BIOS to access a disk, you need to keep it a disk with 512-byte logical blocks. (I understand that bug 1737256 may mean we have a problem there, but unfortunately, switching to virtual 4k native disks isn't the easy solution/workaround.)
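The rule in the comment above can be written as a simple check on the logical block size, for example on the value the guest reports in /sys/block/sda/queue/logical_block_size. This is only a sketch of that rule: the function names are hypothetical, and it encodes nothing beyond "BIOS needs 512-byte logical blocks".

```shell
# Succeed only for a 512-byte logical block size ($1), the only size
# the comment above says BIOS can read.
is_bios_bootable() {
    [ "$1" -eq 512 ]
}

# Print a one-line verdict for a logical block size in bytes ($1).
report_block_size() {
    if is_bios_bootable "$1"; then
        echo "logical block size $1: BIOS boot possible"
    else
        echo "logical block size $1: not bootable with BIOS"
    fi
}

report_block_size 512
report_block_size 4096
```

In the guest from this report it could be called as report_block_size "$(cat /sys/block/sda/queue/logical_block_size)", which would report the disk as not bootable with BIOS.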