Bug 1737268

Summary: Disk on gluster 4k storage not bootable after successful installation
Product: [Fedora] Fedora
Reporter: Nir Soffer <nsoffer>
Component: qemu
Assignee: Fedora Virtualization Maintainers <virt-maint>
Status: CLOSED NOTABUG
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Priority: unspecified
Version: rawhide
CC: amit, berrange, cfergeau, dwmw2, itamar, kwolf, pbonzini, rjones, virt-maint, vjuranek
Last Closed: 2019-08-05 16:15:47 UTC
Type: Bug
Bug Blocks: 1592916
Attachments:
Description Flags
test scripts, test output, traces, vm xml, qemu log none

Description Nir Soffer 2019-08-04 21:49:37 UTC
Created attachment 1600488
test scripts, test output, traces, vm xml, qemu log

Description of problem:

Using <blockio logical_block_size="4096" physical_block_size="4096"> in the
libvirt XML seems to work around the issue described in bug 1737256, and
provisioning a VM succeeds. However, the disk is not bootable on the next boot.

Reproduced with the Fedora 29 and Alpine 3.10.1 ISOs.

Version-Release number of selected component (if applicable):
See bug 1737256.

How reproducible:
Always.

Steps to Reproduce:
1. Add a <blockio> element to the libvirt XML, or add
   logical_block_size and physical_block_size to the qemu command line.
2. Start the VM from the ISO.
3. Install.
4. Boot from the installed disk.

Actual results:
Boot fails with: No bootable device

Expected results:
Boot succeeds.

Additional info:

For the storage and host details, see bug 1737256.

Here is an example flow using the Alpine ISO, tested by running qemu directly,
based on the qemu command line created by libvirt when running the VM using oVirt.

name=qemu-fuse
images=/rhev/data-center/mnt/glusterSD/voodoo4.tlv.redhat.com\:_gv0/de566475-5b67-4987-abf3-3dc98083b44c/images
disk=4163cc03-c5ef-4956-ac0a-6e04699ce2e5/51de3a28-5c0a-4db1-a51c-b09c437b6f39
cdrom=3119fe7e-f576-46c3-880b-9590cb4619da/46efd27f-3bbe-4183-91f7-912c163b8eac

truncate -s 0 $images/$disk
truncate -s 1g $images/$disk

# Provision vm from iso.
strace -f -o $name-provision.trace qemu-kvm \
        -object iothread,id=iothread1 \
        -device virtio-scsi-pci,iothread=iothread1,id=bus1,bus=pci.0,addr=0x5 \
        -drive file=$images/$disk,format=raw,cache=none,if=none,id=drive1 \
        -device scsi-hd,bus=bus1.0,drive=drive1,id=disk1,logical_block_size=4096,physical_block_size=4096,write-cache=on \
        -cdrom $images/$cdrom \
        -m 1024 \
        -nographic

# Start the vm.
strace -f -o $name-run.trace qemu-kvm \
        -object iothread,id=iothread1 \
        -device virtio-scsi-pci,iothread=iothread1,id=bus1,bus=pci.0,addr=0x5 \
        -drive file=$images/$disk,format=raw,cache=none,if=none,id=drive1 \
        -device scsi-hd,bus=bus1.0,drive=drive1,id=disk1,logical_block_size=4096,physical_block_size=4096,write-cache=on \
        -m 1024 \
        -nographic

Here is output from the guest:

1. Checking devices

We can see that the guest sees the expected block size.

localhost:~# grep -s "" /sys/block/sda/queue/*
/sys/block/sda/queue/add_random:1
/sys/block/sda/queue/chunk_sectors:0
/sys/block/sda/queue/dax:0
/sys/block/sda/queue/discard_granularity:4096
/sys/block/sda/queue/discard_max_bytes:1073741824
/sys/block/sda/queue/discard_max_hw_bytes:1073741824
/sys/block/sda/queue/discard_zeroes_data:0
/sys/block/sda/queue/fua:0
/sys/block/sda/queue/hw_sector_size:4096
/sys/block/sda/queue/io_poll:1
/sys/block/sda/queue/io_poll_delay:-1
/sys/block/sda/queue/iostats:1
/sys/block/sda/queue/logical_block_size:4096
/sys/block/sda/queue/max_discard_segments:1
/sys/block/sda/queue/max_hw_sectors_kb:32767
/sys/block/sda/queue/max_integrity_segments:0
/sys/block/sda/queue/max_sectors_kb:1280
/sys/block/sda/queue/max_segment_size:65536
/sys/block/sda/queue/max_segments:126
/sys/block/sda/queue/minimum_io_size:4096
/sys/block/sda/queue/nomerges:0
/sys/block/sda/queue/nr_requests:256
/sys/block/sda/queue/optimal_io_size:0
/sys/block/sda/queue/physical_block_size:4096
/sys/block/sda/queue/read_ahead_kb:128
/sys/block/sda/queue/rotational:1
/sys/block/sda/queue/rq_affinity:1
/sys/block/sda/queue/scheduler:[mq-deadline] kyber none
/sys/block/sda/queue/write_cache:write back
/sys/block/sda/queue/write_same_max_bytes:2147479552
/sys/block/sda/queue/write_zeroes_max_bytes:2147479552
/sys/block/sda/queue/zoned:none
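
Only two of the attributes listed above matter for this report: logical_block_size and physical_block_size. A minimal sketch that prints just those two, assuming the standard sysfs queue layout shown above. The helper name show_block_sizes is hypothetical, and the queue directory is passed as an argument so the function is not tied to sda:

```shell
# Hypothetical helper: print the logical and physical block size
# found in a sysfs queue directory such as /sys/block/sda/queue.
show_block_sizes() {
    queue_dir=$1
    for attr in logical_block_size physical_block_size; do
        printf '%s=%s\n' "$attr" "$(cat "$queue_dir/$attr")"
    done
}

# Example (inside the guest):
#   show_block_sizes /sys/block/sda/queue
```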


2. Installing


localhost:~# setup-alpine
...

localhost:~# apk add sfdisk syslinux
...

localhost:~# setup-disk
Available disks are:
  sda   (1.1 GB QEMU     QEMU HARDDISK   )
Which disk(s) would you like to use? (or '?' for help or 'none') [sda]
The following disk is selected:
  sda   (1.1 GB QEMU     QEMU HARDDISK   )
How would you like to use it? ('sys', 'data', 'lvm' or '?' for help) [?] sys
WARNING: The following disk(s) will be erased:
  sda   (1.1 GB QEMU     QEMU HARDDISK   )
WARNING: Erase the above disk(s) and continue? [y/N]: y
Creating file systems...
Installing system on /dev/sda3:
/mnt/boot is device /dev/sda1
100% ████████████████████████████████████████████
==> initramfs: creating /boot/initramfs-virt
/boot is device /dev/sda1

Installation is complete. Please reboot.

localhost:~# sfdisk -l /dev/sda
Disk /dev/sda: 1 GiB, 1073741824 bytes, 262144 sectors
Disk model: QEMU HARDDISK   
Units: sectors of 1 * 4096 = 4096 bytes
Sector size (logical/physical): 4096 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: dos
Disk identifier: 0x9773962b

Device     Boot Start    End Sectors  Size Id Type
/dev/sda1  *      256  25855   25600  100M 83 Linux
/dev/sda2       25856  91391   65536  256M 82 Linux swap / Solaris
/dev/sda3       91392 262143  170752  667M 83 Linux
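
The partition table above counts in 4096-byte sectors, so the sector numbers are eight times smaller than they would be on a 512-byte disk. A small sketch that converts the sector counts reported by sfdisk into MiB as a sanity check; the function name sectors_to_mib is hypothetical, and the counts are copied from the output above:

```shell
# Logical sector size reported by sfdisk for this disk.
sector_size=4096

# Hypothetical helper: convert a sector count to MiB (integer division).
sectors_to_mib() {
    echo $(( $1 * sector_size / 1048576 ))
}

# Sector counts copied from the sfdisk output.
for part in sda1:25600 sda2:65536 sda3:170752; do
    printf '%s %s MiB\n' "${part%%:*}" "$(sectors_to_mib "${part##*:}")"
done
```

The results match the Size column (100M, 256M, 667M), and the first partition's start sector 256 corresponds to a 1 MiB offset.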

localhost:~# poweroff
...


3. Next boot fails

SeaBIOS (version ?-20190712_051036-bde29493c9324747a3ac6dd355c75f9e-2.fc29)


iPXE (http://ipxe.org) 00:03.0 C980 PCI2.10 PnP PMM+3FF91220+3FED1220 C980



Booting from Hard Disk...
Boot failed: could not read the boot disk

Booting from Floppy...
Boot failed: could not read the boot disk

Booting from ROM...
iPXE (PCI 00:03.0) starting execution...ok
iPXE initialising devices...ok



iPXE 1.0.0+ -- Open Source Network Boot Firmware -- http://ipxe.org
Features: DNS HTTP iSCSI TFTP AoE ELF MBOOT PXE bzImage Menu PXEXT

net0: 52:54:00:12:34:56 using 82540em on 0000:00:03.0 (open)
  [Link:up, TX:0 TXE:0 RX:0 RXE:0]
Configuring (net0 52:54:00:12:34:56).............. ok
net0: 10.0.2.15/255.255.255.0 gw 10.0.2.2
net0: fec0::5054:ff:fe12:3456/64 gw fe80::2
net0: fe80::5054:ff:fe12:3456/64
Nothing to boot: No such file or directory (http://ipxe.org/2d03e13b)
No more network devices

No bootable device.
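
To tell "the installer did not write a boot sector" apart from "the firmware cannot read the disk", the raw image can be inspected from the host. A sketch, assuming the DOS disklabel reported by sfdisk, which stores the boot signature bytes 0x55 0xAA at byte offsets 510 and 511 of the first sector. The helper name has_mbr_signature is hypothetical, and this check was not part of the original test run:

```shell
# Hypothetical helper: succeed if the image carries the MBR boot
# signature (0x55 0xAA) at byte offsets 510-511.
has_mbr_signature() {
    sig=$(od -An -tx1 -j510 -N2 "$1" | tr -d ' \n')
    [ "$sig" = "55aa" ]
}

# Example (on the host, with the variables defined earlier):
#   has_mbr_signature "$images/$disk" && echo "boot signature present"
```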


The same flow was reproduced using the Fedora 29 ISO in oVirt.
See the attached vm.xml and qemu.log for the oVirt VM details.

I tested both fuse and libgfapi, with the same results in both cases. I included
only the output from the fuse tests.

Comment 1 Kevin Wolf 2019-08-05 16:15:47 UTC
The BIOS interfaces simply don't support 4k native disks, so no booting from native 4k disks with BIOS, neither in VMs nor on real hardware. There's nothing we can do about this.

If you need BIOS to access a disk, you need to keep it a logical 512 bytes disk. (I understand that bug 1737256 may mean we have a problem there, but unfortunately, switching to a virtual 4k native disk isn't the easy solution/workaround.)
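
A minimal sketch of what "keeping it a logical 512 bytes disk" could look like, reusing the scsi-hd device line from the description but advertising 512-byte logical sectors with a 4096-byte physical size. This is an illustration only: whether the host storage accepts the resulting I/O is the subject of bug 1737256 and is not verified here. The helper name scsi_hd_device is hypothetical.

```shell
# Hypothetical helper: build the scsi-hd -device argument for the
# given logical and physical block sizes.
scsi_hd_device() {
    logical=$1
    physical=$2
    printf 'scsi-hd,bus=bus1.0,drive=drive1,id=disk1,logical_block_size=%s,physical_block_size=%s,write-cache=on\n' \
        "$logical" "$physical"
}

# Print the argument for 512-byte logical sectors. It would be passed
# to qemu-kvm as:  -device "$(scsi_hd_device 512 4096)"
scsi_hd_device 512 4096
```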