Bug 1658208

Summary: [RHEL.8 Slow Train] QEMU quits after many guest reboots / system_reset commands
Product: Red Hat Enterprise Linux 8
Reporter: CongLi <coli>
Component: qemu-kvm
Assignee: John Ferlan <jferlan>
Status: CLOSED WORKSFORME
QA Contact: CongLi <coli>
Severity: high
Docs Contact:
Priority: high
Version: 8.0
CC: areis, chayang, juzhang, michen, ngu, qzhang, rbalakri, ribarry, virt-maint
Target Milestone: rc
Target Release: 8.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-04-08 02:50:04 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description                              Flags
avocado log of reboot failed case        none
avocado log of system_reset failed case  none

Description CongLi 2018-12-11 14:06:20 UTC
Description of problem:
QEMU quits after the guest has been rebooted, or reset with system_reset, many times.

Version-Release number of selected component (if applicable):
qemu-kvm-core-2.12.0-45.module+el8+2313+d65431a0.x86_64
host and guest compose: RHEL-8.0-20181204.0

How reproducible:
2 / 100 (there is no stable reproducer yet)
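With a failure rate this low, it helps to know how many runs are needed before "did not reproduce" means much. A rough sketch, assuming each run fails independently with the same probability (an assumption, not something this report establishes); runs_for_confidence is a hypothetical helper name:

```python
import math


def runs_for_confidence(failure_rate, confidence=0.95):
    """Smallest number of runs n such that at least one failure is seen
    with probability >= confidence, i.e. 1 - (1 - failure_rate)**n >= confidence.

    Assumes independent runs with a constant failure rate.
    """
    if not 0 < failure_rate < 1:
        raise ValueError("failure_rate must be between 0 and 1")
    if not 0 < confidence < 1:
        raise ValueError("confidence must be between 0 and 1")
    return math.ceil(math.log(1 - confidence) / math.log(1 - failure_rate))


# 2 failures in 100 runs, taken as a 2% per-run failure rate
print(runs_for_confidence(0.02))
```

Under that assumption, roughly 150 clean runs would be needed to be 95% sure a 2% failure is gone.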

Steps to Reproduce:
1. Boot a RHEL.8 guest with the following devices:
    -device pcie-root-port,id=pcie.0-root-port-6,slot=6,chassis=6,addr=0x6,bus=pcie.0 \
    -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pcie.0-root-port-6,addr=0x0 \
    -drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=none,format=qcow2,file=/home/kvm_autotest_root/images/rhel80-64-virtio-scsi.qcow2 \
    -device scsi-hd,id=image1,drive=drive_image1 \

2. Log in to the guest and run 'reboot' from a guest shell.

3. After the guest has booted, log in again and run 'reboot'.

4. Repeat steps 2 and 3 many times; qemu eventually quits.

Or:

2. While the guest is booting, send the 'system_reset' command.

3. After 10 seconds, send 'system_reset' again.

4. Repeat step 3 many times; qemu eventually quits.
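The system_reset variant can also be driven from a script over a QMP monitor socket. The following is a minimal Python sketch, not the avocado test that hit the bug: qmp_execute and reset_loop are hypothetical helper names, while qmp_capabilities and system_reset are standard QMP commands. The default 10-second interval follows step 3.

```python
import json
import socket
import time


def qmp_execute(stream, command):
    """Send one QMP command and return its reply, skipping asynchronous events."""
    stream.write(json.dumps({"execute": command}).encode() + b"\n")
    stream.flush()
    while True:
        line = stream.readline()
        if not line:
            raise ConnectionError("QMP socket closed; qemu may have quit")
        reply = json.loads(line)
        if "return" in reply or "error" in reply:
            return reply


def reset_loop(sock, count, interval=10.0):
    """Send system_reset up to `count` times on a connected QMP socket.

    Returns the number of resets qemu acknowledged before the loop ended
    or the connection dropped.
    """
    stream = sock.makefile("rwb")
    if not stream.readline():  # QMP greeting banner sent on connect
        raise ConnectionError("no QMP greeting received")
    qmp_execute(stream, "qmp_capabilities")  # leave negotiation mode
    acknowledged = 0
    try:
        for _ in range(count):
            reply = qmp_execute(stream, "system_reset")
            if "error" in reply:
                break
            acknowledged += 1
            time.sleep(interval)
    except (ConnectionError, OSError):
        pass  # qemu went away; report how far we got
    return acknowledged
```

To use it, create a socket with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM), connect it to the monitor socket path of the running guest, and call reset_loop(sock, 100). A return value below the requested count means the monitor connection dropped, or qemu returned an error, before all resets were sent.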


Actual results:
qemu quits after the guest has been rebooted or reset many times.

Expected results:
qemu should keep running and the guest should continue to work normally.

Additional info:
1. QE hit this bug twice in avocado runs. According to the guest serial logs, qemu quit both times around 'Started Forward Password Requests to Plymouth Directory Watch'. It is unclear whether this is plymouth-related or a coincidence.

serial log 1:
2018-12-07 03:30:54: [  OK  ] Started Show Plymouth Boot Screen.
2018-12-07 03:30:54: [  OK  ] Started Forward Password Requests to Plymouth Directory Watch.
2018-12-07 03:30:54: [  OK  ] Reached target Paths.
2018-12-07 03:31:53: Ncat: Broken pipe.
2018-12-07 03:31:53: (Process terminated with status 1)

serial log 2:
2018-12-07 03:12:13: [  OK  ] Started Show Plymouth Boot Screen.
2018-12-07 03:12:13: [  OK  ] Reached target Paths.
2018-12-07 03:12:13: [  OK  ] Started Forward Password Requests to Plymouth Directory Watch.
2018-12-07 03:18:38: Ncat: Broken pipe.
2018-12-07 03:18:38: (Process terminated with status 1)
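To compare serial logs like the two above across runs, a small script can pull out the last guest line before the disconnect and the time that passed until 'Ncat: Broken pipe'. This is a sketch under assumptions: it expects the 'YYYY-MM-DD HH:MM:SS: text' line layout shown here, last_output_before_disconnect is a hypothetical helper name, and colour-code residue in the status prefix is stripped if present.

```python
import re
from datetime import datetime

# Matches terminal colour codes, with or without the leading ESC byte.
ANSI_RESIDUE = re.compile(r"\x1b?\[[0-9;]*m")
LOG_LINE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}): (.*)$")


def last_output_before_disconnect(log_text, marker="Ncat: Broken pipe"):
    """Return (last guest line, seconds until the marker line).

    Returns None if the marker never appears or nothing precedes it.
    """
    last = None
    for raw in log_text.splitlines():
        match = LOG_LINE.match(raw.strip())
        if not match:
            continue
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        text = ANSI_RESIDUE.sub("", match.group(2)).strip()
        if marker in text:
            if last is None:
                return None
            return last[1], (stamp - last[0]).total_seconds()
        last = (stamp, text)
    return None
```

Applied to the two logs above, the gap between the last guest output and the broken pipe is 59 seconds in serial log 1 and 385 seconds in serial log 2. The final line also differs: 'Reached target Paths.' in log 1, the Plymouth Directory Watch line in log 2.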

2. host info:
processor	: 23
vendor_id	: AuthenticAMD
cpu family	: 21
model		: 2
model name	: AMD Opteron(tm) Processor 6344
stepping	: 0
microcode	: 0x6000852
cpu MHz		: 1396.317
cache size	: 2048 KB
physical id	: 1
siblings	: 12
core id		: 5
cpu cores	: 6
apicid		: 75
initial apicid	: 43
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid amd_dcm aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 popcnt aes xsave avx f16c lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt lwp fma4 tce nodeid_msr tbm topoext perfctr_core perfctr_nb cpb hw_pstate ssbd ibpb vmmcall bmi1 arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold
bugs		: fxsave_leak sysret_ss_attrs null_seg spectre_v1 spectre_v2 spec_store_bypass
bogomips	: 5186.40
TLB size	: 1536 4K pages
clflush size	: 64
cache_alignment	: 64
address sizes	: 48 bits physical, 48 bits virtual
power management: ts ttp tm 100mhzsteps hwpstate cpb eff_freq_ro

3. Full QEMU command line:
MALLOC_PERTURB_=1  /usr/libexec/qemu-kvm \
    -S  \
    -name 'avocado-vt-vm1'  \
    -sandbox off  \
    -machine q35  \
    -nodefaults \
    -device VGA,bus=pcie.0,addr=0x1 \
    -device pcie-root-port,id=pcie_root_port_0,slot=2,chassis=2,addr=0x2,bus=pcie.0 \
    -device pcie-root-port,id=pcie_root_port_1,slot=3,chassis=3,addr=0x3,bus=pcie.0 \
    -device pcie-root-port,id=pcie_root_port_2,slot=4,chassis=4,addr=0x4,bus=pcie.0  \
    -chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/avocado_xb5k6wz2/monitor-qmpmonitor1-20181207-030133-mElL9NQH,server,nowait \
    -mon chardev=qmp_id_qmpmonitor1,mode=control  \
    -chardev socket,id=qmp_id_catch_monitor,path=/var/tmp/avocado_xb5k6wz2/monitor-catch_monitor-20181207-030133-mElL9NQH,server,nowait \
    -mon chardev=qmp_id_catch_monitor,mode=control \
    -device pvpanic,ioport=0x505,id=idYbSupX  \
    -chardev socket,id=serial_id_serial0,path=/var/tmp/avocado_xb5k6wz2/serial-serial0-20181207-030133-mElL9NQH,server,nowait \
    -device isa-serial,chardev=serial_id_serial0  \
    -chardev socket,id=seabioslog_id_20181207-030133-mElL9NQH,path=/var/tmp/avocado_xb5k6wz2/seabios-20181207-030133-mElL9NQH,server,nowait \
    -device isa-debugcon,chardev=seabioslog_id_20181207-030133-mElL9NQH,iobase=0x402 \
    -device pcie-root-port,id=pcie.0-root-port-5,slot=5,chassis=5,addr=0x5,bus=pcie.0 \
    -device qemu-xhci,id=usb1,bus=pcie.0-root-port-5,addr=0x0 \
    -device pcie-root-port,id=pcie.0-root-port-6,slot=6,chassis=6,addr=0x6,bus=pcie.0 \
    -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pcie.0-root-port-6,addr=0x0 \
    -drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=none,format=qcow2,file=/home/kvm_autotest_root/images/rhel80-64-virtio-scsi.qcow2 \
    -device scsi-hd,id=image1,drive=drive_image1 \
    -device pcie-root-port,id=pcie.0-root-port-7,slot=7,chassis=7,addr=0x7,bus=pcie.0 \
    -device virtio-net-pci,mac=9a:96:97:98:99:9a,id=idox6lW5,vectors=4,netdev=idQPChO3,bus=pcie.0-root-port-7,addr=0x0  \
    -netdev tap,id=idQPChO3,vhost=on,vhostfd=21,fd=9 \
    -m 15360  \
    -smp 12,maxcpus=12,cores=6,threads=1,sockets=2  \
    -cpu 'Opteron_G5',+kvm_pv_unhalt \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1  \
    -vnc :0  \
    -rtc base=utc,clock=host,driftfix=slew  \
    -boot order=cdn,once=c,menu=off,strict=off \
    -enable-kvm

Comment 2 CongLi 2018-12-11 14:23:35 UTC
Created attachment 1513400 [details]
avocado log of reboot failed case

Comment 3 CongLi 2018-12-11 14:24:31 UTC
Created attachment 1513401 [details]
avocado log of system_reset failed case

Comment 5 Ademar Reis 2019-04-05 18:48:10 UTC
Was this one tested on rhel8-av? Given it's so hard to reproduce, have you seen any recent occurrences?

Comment 6 CongLi 2019-04-08 02:50:04 UTC
I have not hit this issue in recent testing, so I will close this bug for now and reopen it if it happens again.

Thanks.