Bug 1704584

Summary: Failed to start guest via OVMF after system_reset_during_boot - slow train
Product: Red Hat Enterprise Linux 8 Reporter: Xueqiang Wei <xuwei>
Component: edk2Assignee: Virtualization Maintenance <virt-maint>
Status: CLOSED DUPLICATE QA Contact: Michael <choma>
Severity: high Docs Contact:
Priority: high    
Version: 8.1CC: berrange, coli, jinzhao, junzhang, kraxel, lersek, michen, pbonzini, philmd
Target Milestone: rc   
Target Release: 8.1   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-04-30 13:59:22 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
bios log
none
serial log none

Description Xueqiang Wei 2019-04-30 07:01:32 UTC
Description of problem:

after system_reset_during_boot, guest failed to start.

found error in bios log:
Error: Image at 0007EA6F000 start failed: Unsupported



Version-Release number of selected component (if applicable):

host: 
kernel-4.18.0-80.19.el8.x86_64
qemu-kvm-2.12.0-67.module+el8.1.0+3088+c3b61d6f
edk2-ovmf-20190308git89910a39dcfd-2.el8.noarch

guest: 
kernel-4.18.0-80.21.el8.x86_64



How reproducible:
8/10


Steps to Reproduce:
1. boot guest with q35/ovmf
/usr/libexec/qemu-kvm \
    -S  \
    -name 'avocado-vt-vm1' \
    -machine q35  \
    -nodefaults \
    -device VGA,bus=pcie.0,addr=0x1  \
    -chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/avocado_2gyxi5mu/monitor-qmpmonitor1-20190429-230122-lUVdivnr,server,nowait \
    -mon chardev=qmp_id_qmpmonitor1,mode=control  \
    -chardev socket,id=qmp_id_catch_monitor,path=/var/tmp/avocado_2gyxi5mu/monitor-catch_monitor-20190429-230122-lUVdivnr,server,nowait \
    -mon chardev=qmp_id_catch_monitor,mode=control \
    -device pvpanic,ioport=0x505,id=idjUXxuR  \
    -chardev socket,id=serial_id_serial0,path=/var/tmp/avocado_2gyxi5mu/serial-serial0-20190429-230122-lUVdivnr,server,nowait \
    -device isa-serial,chardev=serial_id_serial0  \
    -chardev socket,id=seabioslog_id_20190429-230122-lUVdivnr,path=/var/tmp/avocado_2gyxi5mu/seabios-20190429-230122-lUVdivnr,server,nowait \
    -device isa-debugcon,chardev=seabioslog_id_20190429-230122-lUVdivnr,iobase=0x402 \
    -device pcie-root-port,id=pcie.0-root-port-2,slot=2,chassis=2,addr=0x2,bus=pcie.0 \
    -device qemu-xhci,id=usb1,bus=pcie.0-root-port-2,addr=0x0 \
    -device pcie-root-port,id=pcie.0-root-port-3,slot=3,chassis=3,addr=0x3,bus=pcie.0 \
    -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pcie.0-root-port-3,addr=0x0 \
    -drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=none,format=qcow2,file=/home/kvm_autotest_root/images/rhel810-64-virtio-scsi.qcow2 \
    -device scsi-hd,id=image1,drive=drive_image1 \
    -device pcie-root-port,id=pcie.0-root-port-4,slot=4,chassis=4,addr=0x4,bus=pcie.0 \
    -device virtio-net-pci,mac=9a:a9:aa:ab:ac:ad,id=idPoD7qS,vectors=4,netdev=idX4UPO0,bus=pcie.0-root-port-4,addr=0x0  \
    -netdev tap,id=idX4UPO0,vhost=on,vhostfd=24,fd=21 \
    -m 10240  \
    -smp 12,maxcpus=12,cores=6,threads=1,sockets=2  \
    -cpu 'SandyBridge',+kvm_pv_unhalt \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1  \
    -vnc :0  \
    -rtc base=utc,clock=host,driftfix=slew  \
    -boot order=cdn,once=c,menu=off,strict=off \
    -drive if=pflash,format=raw,readonly=on,file=/usr/share/OVMF/OVMF_CODE.secboot.fd \
    -drive if=pflash,format=raw,file=/home/kvm_autotest_root/images/rhel810-64-virtio-scsi.qcow2.fd \
    -enable-kvm \
    -device pcie-root-port,id=pcie_extra_root_port_0,slot=5,chassis=5,addr=0x5,bus=pcie.0

2. Reset guest system for 20 times during boot randomly.
qmp cmd: {'execute': 'system_reset', 'id': '2W0fmTmo'}

(1) Reset the system by monitor cmd after 39.628secs
(2) Reset the system by monitor cmd after 36.356secs
(3) Reset the system by monitor cmd after 41.094secs
(4) Reset the system by monitor cmd after 9.626secs
(5) Reset the system by monitor cmd after 11.8secs
(6) Reset the system by monitor cmd after 50.529secs
(7) Reset the system by monitor cmd after 38.049secs
(8) Reset the system by monitor cmd after 31.379secs
(9) Reset the system by monitor cmd after 16.472secs
(10) Reset the system by monitor cmd after 23.282secs
(11) Reset the system by monitor cmd after 34.164secs
(12) Reset the system by monitor cmd after 45.996secs
(13) Reset the system by monitor cmd after 3.062secs
(14) Reset the system by monitor cmd after 9.631secs
(15) Reset the system by monitor cmd after 23.259secs
(16) Reset the system by monitor cmd after 32.473secs
(17) Reset the system by monitor cmd after 16.638secs
(18) Reset the system by monitor cmd after 38.076secs
(19) Reset the system by monitor cmd after 21.106secs
(20) Reset the system by monitor cmd after 33.661secs

3. Try to login guest after reset


Actual results:
guest can not boot up after reset. please refer to attached log file for details.


Expected results:
guest works well, boot up normally.


Additional info:

Not hit this issue via seabios.


Hit this issue by automation case "system_reset_during_boot" in avocado.

cmd lines:
# python3 ConfigTest.py --testcase=system_reset_during_boot --imageformat=qcow2 --guestname=RHEL.8.1.0 --driveformat=virtio_scsi --nicmodel=virtio_net --machines=q35 --clone=no --firmware=ovmf

Comment 1 Xueqiang Wei 2019-04-30 07:13:23 UTC
Created attachment 1560194 [details]
bios log

Comment 2 Xueqiang Wei 2019-04-30 07:18:43 UTC
Created attachment 1560195 [details]
serial log

Comment 3 Philippe Mathieu-Daudé 2019-04-30 07:38:20 UTC
> 23:10:48: QEMU Flash: Attempting flash detection at FFC00010
> 23:10:48: QemuFlashDetected => FD behaves as FLASH
> 23:10:48: QemuFlashDetected => Yes
> 23:10:48: Installing QEMU flash SMM FVB
> 23:10:48: SmmInstallProtocolInterface: D326D041-BD31-4C01-B5A8-628BE87F0653 7FFA4730
> 23:10:48: SmmInstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 7FFA4698
> 23:10:48: InstallProtocolInterface: 5B1B31A1-9562-11D2-8E3F-00A0C969723B 7E09DB18
> 23:10:48: SmmInstallProtocolInterface: 5B1B31A1-9562-11D2-8E3F-00A0C969723B 7FFEB2C0
> 23:10:48: Loading SMM driver at 0x0007FEA9000 EntryPoint=0x0007FEB8361 VariableSmm.efi
> 23:10:48: mSmmMemLibInternalMaximumSupportAddress = 0xFFFFFFFFF
> 23:10:48: VarCheckLibRegisterSetVariableCheckHandler - 0x7FEB66FC Success
> 23:10:48: ASSERT /builddir/build/BUILD/edk2-89910a39dcfd/MdeModulePkg/Universal/Variable/RuntimeDxe/Variable.c(3870): VariableStore->Size == VariableStoreLength
> 23:24:24: (Process terminated with status 0)

This looks to be the same issue than bug 1678713, QEMU doesn't reset the pflash model, the flash ends corrupted.

Comment 4 Laszlo Ersek 2019-04-30 11:01:27 UTC
Phil is exactly right in comment 3.

(

Regarding the message

> Error: Image at 0007EA6F000 start failed: Unsupported

that is expected behavior. Here's the wider context:

> 2019-04-29 23:01:28: Loading driver at 0x0007EA6F000 EntryPoint=0x0007EA70303 CpuS3DataDxe.efi
> [...]
> 2019-04-29 23:01:28: Error: Image at 0007EA6F000 start failed: Unsupported

The "UefiCpuPkg/CpuS3DataDxe" driver exits with EFI_UNSUPPORTED because PcdAcpiS3Enable is FALSE.

And PcdAcpiS3Enable is FALSE because in RHEL[78]'s qemu-kvm[-rhev], we disable S3 support by default. (Otherwise, "OvmfPkg/PlatformPei" would set the PCD to TRUE.)

)

Phil, would you please go ahead and close this BZ as a duplicate of bug 1678713? (I could do that myself of course, but you analyzed this BZ first, so you should get to close it too :))

Thanks!

Comment 5 Philippe Mathieu-Daudé 2019-04-30 13:59:22 UTC
There is already a BZ for this issue, closing as duplicate.

*** This bug has been marked as a duplicate of bug 1678713 ***

Comment 6 Xueqiang Wei 2019-05-05 04:14:09 UTC
for slow train and fast train, just one bug (bug 1678713) track is OK? bug 1678713 is for fast train, this bug is for slow train.

Comment 7 Laszlo Ersek 2019-05-06 11:56:00 UTC
Xueqiang Wei,

good point. I suggest cloning 1678713 for the slow train. (I think it's better to create an explicit clone than to reopen & re-classify this bug; it's better to have a link between the two instances of the same problem.)

Thanks.

Comment 9 Philippe Mathieu-Daudé 2019-07-31 17:51:35 UTC
(In reply to Xueqiang Wei from comment #6)
> for slow train and fast train, just one bug (bug 1678713) track is OK? bug
> 1678713 is for fast train, this bug is for slow train.

Xueqiang Wei cloned the equivalent for slow train as bug 1707192.