Bug 1514352

Summary: [RHEL-ALT][s390x] qemu process terminated after rebooting the guest
Product: Red Hat Enterprise Linux 7
Reporter: Zhengtong <zhengtli>
Component: qemu-kvm-ma
Assignee: Thomas Huth <thuth>
Status: CLOSED ERRATA
QA Contact: Virtualization Bugs <virt-bugs>
Severity: urgent
Docs Contact:
Priority: urgent
Version: 7.5-Alt
CC: cohuck, david, michen, mtessun, qzhang, thuth
Target Milestone: rc
Target Release: ---
Hardware: s390x
OS: Linux
Whiteboard:
Fixed In Version: qemu-kvm-ma-2.10.0-8.el7
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-04-10 14:55:48 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Zhengtong 2017-11-17 08:20:28 UTC
Description of problem:
Guest reboot failed; instead, the qemu process terminated.

Version-Release number of selected component (if applicable):
Host kernel: 4.14.0-0.rc8.1.el7a.s390x
Guest kernel: 4.14.0-1.el7a.s390x
qemu-kvm-ma-2.10.0-6.el7

How reproducible:
2/2

Steps to Reproduce:
1. Boot a normal s390x guest
2. In the guest console, run the "reboot" command:
[root@localhost ~]# reboot

3. Check the qemu process on the host.
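
Step 3 can be scripted. The following is only a minimal sketch: it assumes a Linux host with /proc mounted and that the qemu PID was recorded when the guest was started. The function name qemu_state is made up for illustration and is not part of any tool mentioned in this report.

```shell
# Print "running" if a process with the given PID exists, else "terminated".
# Assumption: Linux host, /proc is mounted.
qemu_state() {
    if [ -n "$1" ] && [ -d "/proc/$1" ]; then
        echo "running"
    else
        echo "terminated"
    fi
}

# Assumption: on the real host the PID would be captured beforehand, e.g.
#   pid=$(pgrep -f 'avocado-vt-vm1')
# Here the current shell's PID is used purely as a stand-in.
qemu_state "$$"
```

Run it once before and once after issuing "reboot" in the guest; with this bug present, the second call would be expected to report "terminated".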

Actual results:
Guest reboot failed; the qemu process terminated.

Expected results:
Guest could be successfully rebooted.

Additional info:
Guest boot command:
/usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1'  \
    -sandbox off  \
    -machine s390-ccw-virtio  \
    -nodefaults  \
    -vga none  \
    -chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/avocado_ut7GP4/monitor-qmpmonitor1-20171115-020901-7Fni7fsG,server,nowait \
    -mon chardev=qmp_id_qmpmonitor1,mode=control  \
    -chardev socket,id=qmp_id_catch_monitor,path=/var/tmp/avocado_ut7GP4/monitor-catch_monitor-20171115-020901-7Fni7fsG,server,nowait \
    -mon chardev=qmp_id_catch_monitor,mode=control  \
    -chardev socket,id=serial_id_serial0,path=/var/tmp/avocado_ut7GP4/serial-serial0-20171115-020901-7Fni7fsG,server,nowait \
    -device sclpconsole,chardev=serial_id_serial0 \
    -device virtio-scsi-ccw,id=virtio_scsi_ccw0 \
    -drive file=/home/zhengtli/root_dir/ALT-Server-7.4-s390x-virtio-scsi.qcow2,id=drive0,if=none,format=qcow2 \
    -device scsi-hd,bus=virtio_scsi_ccw0.0,drive=drive0,id=hd1 \
    -netdev tap,id=hostnet0 \
    -device virtio-net-ccw,netdev=hostnet0,id=net0 \
    -m 1024  \
    -smp 1,maxcpus=1,cores=1,threads=1,sockets=1  \
    -cpu 'host'  \
    -vnc :0  \
    -rtc base=utc,clock=host,driftfix=slew \
    -boot strict=on \
    -monitor stdio

Comment 2 Thomas Huth 2017-11-17 08:45:20 UTC
Hi Zhengtong, could you please provide the console output after running "reboot" in the guest? Thanks!

Comment 3 Zhengtong 2017-11-17 09:01:42 UTC
Sure

--------------------------------------------------------------------
[root@localhost ~]# reboot
reboot
[   17.027338] audit: type=1131 audit(1510906447.128:161): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=auditd comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[   17.027345] audit: type=1131 audit(1510906447.128:162): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=rhel-readonly comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[   17.041129] audit: type=1131 audit(1510906447.148:163): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=rsyslog comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[   17.041134] audit: type=1131 audit(1510906447.148:164): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=accounts-daemon comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[   17.043546] audit: type=1131 audit(1510906447.148:165): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-logind comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[   17.045854] audit: type=1400 audit(1510906447.148:166): avc:  denied  { getattr } for  pid=1628 comm="plymouthd" path="/dev/ttyS0" dev="devtmpfs" ino=19353 scontext=system_u:system_r:plymouthd_t:s0 tcontext=system_u:object_r:device_t:s0 tclass=file permissive=0
[   17.045858] audit: type=1300 audit(1510906447.148:166): arch=80000016 syscall=106 success=no exit=-13 a0=12f291c1e a1=3ffd16fdff0 a2=3ffd16fdff0 a3=12f286702 items=0 ppid=1611 pid=1628 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="plymouthd" exe="/usr/sbin/plymouthd" subj=system_u:system_r:plymouthd_t:s0 key=(null)
[   17.045890] audit: type=1327 audit(1510906447.148:166): proctitle=2F7573722F7362696E2F706C796D6F75746864002D2D6D6F64653D73687574646F776E002D2D6174746163682D746F2D73657373696F6E
[   17.055639] audit: type=1131 audit(1510906447.158:167): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=lvm2-pvscan@8:2 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[   17.061423] audit: type=1400 audit(1510906447.168:168): avc:  denied  { read write } for  pid=1628 comm="plymouthd" name="ttyS0" dev="devtmpfs" ino=19353 scontext=system_u:system_r:plymouthd_t:s0 tcontext=system_u:object_r:device_t:s0 tclass=file permissive=0
[  OK  ] Started Show Plymouth Reboot Screen.
[  OK  ] Stopped LSB: Configure s390 dump feature.
[  OK  ] Stopped LSB: Starts the Spacewalk Daemon.
[  OK  ] Stopped target Remote File Systems.
[  OK  ] Stopped target Remote File Systems (Pre).
[  OK  ] Stopped target Network is Online.
[  OK  ] Stopped Postfix Mail Transport Agent.
[  OK  ] Stopped Availability of block devices.
         Stopping Logout off all iSCSI sessions on shutdown...
         Stopping LVM2 metadata daemon...
[  OK  ] Stopped LVM2 metadata daemon.
[  OK  ] Stopped Logout off all iSCSI sessions on shutdown.
[  OK  ] Stopped Remount Root and Kernel File Systems.
         Stopping Remount Root and Kernel File Systems...
[  OK  ] Stopped Dynamic System Tuning Daemon.
[  OK  ] Stopped Apply Kernel Variables.
         Stopping Apply Kernel Variables...
[  OK  ] Stopped target Network.
         Stopping LSB: Bring up/down networking...
[  OK  ] Started Restore /run/initramfs.
[  OK  ] Stopped LSB: Bring up/down networking.
[  OK  ] Stopped Network Manager Wait Online.
         Stopping Network Manager Wait Online...
         Stopping Network Manager...
[  OK  ] Stopped Network Manager.
[  OK  ] Stopped target Network (Pre).
         Stopping firewalld - dynamic firewall daemon...
[   19.109200] Ebtables v2.0 unregistered
[  OK  ] Stopped firewalld - dynamic firewall daemon.
         Stopping D-Bus System Message Bus...
         Stopping Authorization Manager...
[  OK  ] Stopped D-Bus System Message Bus.
[  OK  ] Closed D-Bus System Message Bus Socket.
[  OK  ] Stopped Authorization Manager.
[  OK  ] Reached target Shutdown.
[   19.390972] systemd-shutdown[1]: Sending SIGTERM to remaining processes...
[   19.393921] systemd-journald[687]: Received SIGTERM from PID 1 (systemd-shutdow).
[   19.407710] systemd-shutdown[1]: Sending SIGKILL to remaining processes...
[   19.409248] systemd-shutdown[1]: Unmounting file systems.
[   19.411965] systemd-shutdown[1]: Unmounting /boot.
[   19.415606] XFS (sda1): Unmounting Filesystem
[   19.442410] systemd-shutdown[1]: All filesystems unmounted.
[   19.442415] systemd-shutdown[1]: Deactivating swaps.
[   19.442465] systemd-shutdown[1]: All swaps deactivated.
[   19.442467] systemd-shutdown[1]: Detaching loop devices.
[   19.442785] systemd-shutdown[1]: All loop devices detached.
[   19.442790] systemd-shutdown[1]: Detaching DM devices.
[   19.481926] /shutdown: 7 output lines suppressed due to ratelimiting
[   19.504389] dracut: Taking over mdmon processes.
[   19.504466] dracut Warning: Killing all remaining processes
dracut Warning: Killing all remaining processes
[   19.541520] XFS (dm-0): Unmounting Filesystem
[   19.543427] dracut Warning: Unmounted /oldroot.
[   19.561068] dracut: Disassembling device-mapper devices
[   19.605776] dracut: Waiting for mdraid devices to be clean.
[   19.606829] dracut: Disassembling mdraid devices.
Rebooting.
LOADPARM=[........]
Using virtio-scsi.
target: 0x0000000000000000

! SCSI cannot report LUNs: STATUS=02 RSPN=70 KEY=05 CODE=25 QLFR=00, sure !

Comment 4 Thomas Huth 2017-11-17 09:21:54 UTC
Thanks! After upgrading my qemu-kvm-ma from version 2.10.0-5.el7 to 2.10.0-6.el7, I was finally able to reproduce this issue (it did not happen with -5!), even with this slightly simplified command line:

sudo /usr/libexec/qemu-kvm -nographic -device virtio-scsi-ccw,id=virtio_scsi_ccw0 -drive file=/var/lib/libvirt/images/thuth.qcow2,id=drive0,if=none,format=qcow2 -device scsi-hd,bus=virtio_scsi_ccw0.0,drive=drive0,id=hd1 -m 1024

In the console log, I also saw this suspicious message:

! SCSI cannot report LUNs: STATUS=02 RSPN=70 KEY=05 CODE=25 QLFR=00, sure !

... I'll try to have a closer look ...
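
For readers unfamiliar with the hex fields in that message, here is a small decoding sketch. It assumes that STATUS, RSPN, KEY, CODE and QLFR are the SCSI status byte, the sense response code, the sense key, the additional sense code (ASC) and its qualifier (ASCQ); the report itself does not confirm this mapping. The function name decode_scsi_error is hypothetical, and the lookup tables cover only a few standard values.

```shell
# Assumption: the five hex fields are SCSI status, sense response code,
# sense key, additional sense code (ASC) and its qualifier (ASCQ).
# Only a handful of values are listed; the tables are deliberately incomplete.
decode_scsi_error() (
    status=$1 rspn=$2 key=$3 asc=$4 ascq=$5
    case "$status" in
        00) s="GOOD" ;;
        02) s="CHECK CONDITION" ;;
        08) s="BUSY" ;;
        *)  s="status 0x$status" ;;
    esac
    case "$rspn" in
        70) r="current error, fixed format" ;;
        72) r="current error, descriptor format" ;;
        *)  r="response code 0x$rspn" ;;
    esac
    case "$key" in
        02) k="NOT READY" ;;
        05) k="ILLEGAL REQUEST" ;;
        06) k="UNIT ATTENTION" ;;
        *)  k="sense key 0x$key" ;;
    esac
    case "$asc/$ascq" in
        25/00) a="LOGICAL UNIT NOT SUPPORTED" ;;
        *)     a="ASC/ASCQ 0x$asc/0x$ascq" ;;
    esac
    echo "$s | $r | $k | $a"
)

# Values taken from the console message quoted above.
decode_scsi_error 02 70 05 25 00
```

Under that assumed mapping, the message reads as a CHECK CONDITION with sense key ILLEGAL REQUEST and "logical unit not supported" in reply to the LUN report request.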

Comment 5 Thomas Huth 2017-11-17 09:24:51 UTC
It also does not seem to reproduce 100% of the time ... sometimes I have to reboot three times here until it triggers.
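
Since the failure is intermittent, a reproduction attempt has to repeat the reboot cycle. The loop below is a rough sketch of that idea only. Both first_failure and fake_reboot_cycle are hypothetical names; the latter is a stub that stands in for "reboot the guest, then check that qemu is still running" and is rigged to fail on its third call.

```shell
# Run a command up to MAX times. Print the attempt number of the first
# failure and return 1, or report that no attempt failed and return 0.
first_failure() {
    max=$1
    shift
    i=1
    while [ "$i" -le "$max" ]; do
        if ! "$@"; then
            echo "failed on attempt $i"
            return 1
        fi
        i=$((i + 1))
    done
    echo "no failure in $max attempts"
}

# Hypothetical stub: succeeds twice, then fails, to mimic an intermittent bug.
# A real check would reboot the guest and test whether qemu survived.
calls=0
fake_reboot_cycle() {
    calls=$((calls + 1))
    [ "$calls" -lt 3 ]
}

first_failure 10 fake_reboot_cycle || echo "bug reproduced"
```

The same loop, with a real reboot-and-check command and a fixed attempt count, can also serve as a verification run: a fixed build should get through every attempt.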

Comment 6 Thomas Huth 2017-11-17 10:38:53 UTC
FWIW, the bug seems to reproduce with "-m 1024", but not with "-m 512" ... so we likely have some weird kind of memory corruption in the s390x-ccw-bios here...

Comment 7 Thomas Huth 2017-11-18 08:05:32 UTC
I've now suggested a patch upstream:
http://marc.info/?i=1510942228-22822-1-git-send-email-thuth@redhat.com

Comment 8 Thomas Huth 2017-11-21 20:40:43 UTC
Patch has been accepted upstream:
https://git.qemu.org/?p=qemu.git;a=commitdiff;h=8775d91a0f42d016833

Comment 10 Miroslav Rezanina 2017-11-27 14:47:05 UTC
Fix included in qemu-kvm-ma-2.10.0-8.el7
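
To check whether a given build already contains the fix, its version-release string can be compared against 2.10.0-8.el7. The sketch below uses sort -V as an approximation of RPM version ordering (the two can differ in edge cases), and has_fix is a hypothetical helper name. On a real host the string to test could come from `rpm -q --qf '%{VERSION}-%{RELEASE}\n' qemu-kvm-ma`.

```shell
# First qemu-kvm-ma build that contains the fix, per this report.
FIXED="2.10.0-8.el7"

# Succeed if version-release string $1 sorts at or after $FIXED.
# Assumption: sort -V ordering is close enough to RPM ordering for these strings.
has_fix() {
    [ "$(printf '%s\n%s\n' "$FIXED" "$1" | sort -V | head -n 1)" = "$FIXED" ]
}

for v in 2.10.0-6.el7 2.10.0-8.el7 2.10.0-9.el7; do
    if has_fix "$v"; then
        echo "$v: contains the fix"
    else
        echo "$v: predates the fix"
    fi
done
```

A build that predates the fix is not necessarily affected: according to comment 4, 2.10.0-5.el7 did not show the problem.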

Comment 12 Zhengtong 2017-11-30 02:49:03 UTC
Tested with build qemu-kvm-ma-2.10.0-9.el7.

Using the same boot command line as in comment #0, I rebooted the guest 10 times. The qemu process no longer terminates, and the guest works well.

Verified.

Comment 16 errata-xmlrpc 2018-04-10 14:55:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0831