| Summary: | migration/postcopy: Serial hang after migration | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | Dr. David Alan Gilbert <dgilbert> |
| Component: | qemu-kvm | Assignee: | Dr. David Alan Gilbert <dgilbert> |
| qemu-kvm sub component: | Live Migration | QA Contact: | Li Xiaohui <xiaohli> |
| Status: | CLOSED CURRENTRELEASE | Docs Contact: | |
| Severity: | unspecified | ||
| Priority: | unspecified | CC: | chayang, dgilbert, jinzhao, juzhang, qizhu, qzhang, rbalakri, virt-maint, xianwang, xiaohli |
| Version: | --- | Flags: | xiaohli: needinfo?(dgilbert) |
| Target Milestone: | pre-dev-freeze | ||
| Target Release: | 8.2 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-12-01 07:27:39 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
QEMU has recently been split into sub-components, and as a one-time operation to avoid breaking tools, we are setting the QEMU sub-component of this BZ to "General". Please review and change the sub-component if necessary the next time you review this BZ. Thanks

After evaluating this issue, there are no plans to address it further or fix it in an upcoming release. Therefore, it is being closed. If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.

Didn't reproduce this bz on rhel-8.4.0-av hosts (kernel-4.18.0-277.el8.x86_64 & qemu-kvm-5.2.0-3.module+el8.4.0+9499+42e58f08.x86_64).

Test steps:
1. Boot a guest with a serial console:
/usr/libexec/qemu-kvm \
-machine q35 \
-cpu EPYC \
-nodefaults \
-device pcie-root-port,port=0x10,chassis=1,id=root0,bus=pcie.0,multifunction=on,addr=0x2 \
-device pcie-root-port,port=0x11,chassis=2,id=root1,bus=pcie.0,addr=0x2.0x1 \
-device pcie-root-port,port=0x12,chassis=3,id=root2,bus=pcie.0,addr=0x2.0x2 \
-device pcie-root-port,port=0x13,chassis=4,id=root3,bus=pcie.0,addr=0x2.0x3 \
-device pcie-root-port,port=0x14,chassis=5,id=root4,bus=pcie.0,addr=0x2.0x4 \
-device pcie-root-port,port=0x15,chassis=6,id=root5,bus=pcie.0,addr=0x2.0x5 \
-device pcie-root-port,port=0x16,chassis=7,id=root6,bus=pcie.0,addr=0x2.0x6 \
-device pcie-root-port,port=0x17,chassis=8,id=root7,bus=pcie.0,addr=0x2.0x7 \
-device virtio-scsi-pci,id=virtio_scsi_pci0,bus=root1 \
-device scsi-hd,id=image1,drive=drive_image1,bus=virtio_scsi_pci0.0,channel=0,scsi-id=0,lun=0,bootindex=0 \
-device virtio-net-pci,mac=9a:8a:8b:8c:8d:8e,id=net0,vectors=4,netdev=tap0,bus=root2 \
-blockdev driver=file,cache.direct=on,cache.no-flush=off,filename=/mnt/nfs/rhel840-64-virtio-scsi.qcow2,node-name=drive_sys1 \
-blockdev driver=qcow2,node-name=drive_image1,file=drive_sys1 \
-netdev tap,id=tap0,vhost=on \
-m 4096,slots=1,maxmem=6G \
-smp 4,maxcpus=4,cores=2,threads=1,sockets=2 \
-enable-kvm \
-device VGA \
-vnc :10 \
-rtc base=localtime,clock=host \
-boot menu=off,strict=off,order=cdn,once=c \
-qmp tcp:0:3333,server,nowait \
-serial pipe:/home/test/guest-serial-35728 \ ----> the serial option; /home/test is shared with the dst host
-monitor stdio \
2. Connect to the serial console on the src host: # cat /home/test/guest-serial-35728
3. Boot a guest on the dst host with the same command line, but append "-incoming defer".
4. Connect to the serial console on the dst host: # cat /home/test/guest-serial-35728
5. After the guest starts, run stressapptest in the guest (the guest is connected via remote-viewer): # stressapptest -M 80 -s 1000000
6. Enable the postcopy capability on src and dst, then start the postcopy migration.

Actual result:
The postcopy migration succeeded. The serial console on the src host before migration and the serial console on the dst host after migration both worked well; no serial hang was hit.

I ran the above test twice, and both runs passed.
Dave, could you help check the above tests? If there is no problem, could we close this bz as CURRENTRELEASE, since I can't reproduce it now?

(In reply to Li Xiaohui from comment #10)
> Didn't reproduce this bz on rhel-8.4.0-av
> hosts(kernel-4.18.0-277.el8.x86_64&qemu-kvm-5.2.0-3.module+el8.4.0+9499+42e58f08.x86_64):
> I have tried twice above test, all pass.
> Dave, could you help check above tests. if no problem, could we close this
> bz as currentrelease since I couldn't reproduce it now?

Try dropping the video connection, so that you actually log in and have the console on serial. Thank you

(In reply to Dr. David Alan Gilbert from comment #11)
> (In reply to Li Xiaohui from comment #10)
> > Didn't reproduce this bz on rhel-8.4.0-av
> > hosts(kernel-4.18.0-277.el8.x86_64&qemu-kvm-5.2.0-3.module+el8.4.0+9499+42e58f08.x86_64):
> > I have tried twice above test, all pass.
> > Dave, could you help check above tests. if no problem, could we close this
> > bz as currentrelease since I couldn't reproduce it now?
>
> Try dropping the video connection, so you actually login/have the console on
> serial.

Thank you, Dave.
Tried again with the right command line and steps:
1. Boot a guest with '-nographic' and a serial console:
[root@hp-dl385g10-09 home]# sh 1.sh
/usr/libexec/qemu-kvm \
-machine q35,accel=kvm \
-cpu EPYC \
-nographic \
-device pcie-root-port,port=0x10,chassis=1,id=root0,bus=pcie.0,multifunction=on,addr=0x2 \
-device pcie-root-port,port=0x11,chassis=2,id=root1,bus=pcie.0,addr=0x2.0x1 \
-device pcie-root-port,port=0x12,chassis=3,id=root2,bus=pcie.0,addr=0x2.0x2 \
-device pcie-root-port,port=0x13,chassis=4,id=root3,bus=pcie.0,addr=0x2.0x3 \
-device pcie-root-port,port=0x14,chassis=5,id=root4,bus=pcie.0,addr=0x2.0x4 \
-device pcie-root-port,port=0x15,chassis=6,id=root5,bus=pcie.0,addr=0x2.0x5 \
-device pcie-root-port,port=0x16,chassis=7,id=root6,bus=pcie.0,addr=0x2.0x6 \
-device pcie-root-port,port=0x17,chassis=8,id=root7,bus=pcie.0,addr=0x2.0x7 \
-device virtio-scsi-pci,id=virtio_scsi_pci0,bus=root1 \
-device scsi-hd,id=image1,drive=drive_image1,bus=virtio_scsi_pci0.0,channel=0,scsi-id=0,lun=0,bootindex=0 \
-device virtio-net-pci,mac=9a:8a:8b:8c:8d:8e,id=net0,vectors=4,netdev=tap0,bus=root2 \
-blockdev driver=file,cache.direct=on,cache.no-flush=off,filename=/mnt/nfs/rhel840-64-virtio-scsi.qcow2,node-name=drive_sys1 \
-blockdev driver=qcow2,node-name=drive_image1,file=drive_sys1 \
-netdev tap,id=tap0,vhost=on \
-m 4096,slots=1,maxmem=6G \
-smp 4,maxcpus=4,cores=2,threads=1,sockets=2 \
-qmp tcp:0:3333,server,nowait \
-device virtio-serial-pci,bus=root3 \
-chardev pipe,id=ch0,path=/home/test/guest-serial-35728 \
-device virtserialport,chardev=ch0,name=serial1 \
2. After step 1, the session from step 1 is the console session; I could log in to the guest and run stressapptest:
Red Hat Enterprise Linux 8.4 Beta (Ootpa)
Kernel 4.18.0-259.el8.x86_64 on an x86_64

Activate the web console with: systemctl enable --now cockpit.socket

localhost login: root
Password:
Last login: Sat Jan 30 21:43:27 from 10.72.13.227
[root@localhost ~]# stressapptest -M 200 -s 1000000
2021/01/30-21:43:40(CST) Log: Commandline - stressapptest -M 200 -s 1000000
2021/01/30-21:43:40(CST) Stats: SAT revision 1.0.9_autoconf, 64 bit binary
2021/01/30-21:43:40(CST) Log: root @ localhost.localdomain on Sat Jan 30 21:35:47 CST 20e
2021/01/30-21:43:40(CST) Log: 1 nodes, 4 cpus.
2021/01/30-21:43:40(CST) Log: Defaulting to 4 copy threads
2021/01/30-21:43:40(CST) Log: Prefer plain malloc memory allocation.
2021/01/30-21:43:40(CST) Log: Using mmap() allocation at 0x7f1faf800000.
2021/01/30-21:43:40(CST) Stats: Starting SAT, 200M, 1000000 seconds
2021/01/30-21:43:40(CST) Log: region number 8 exceeds region count 1
2021/01/30-21:43:40(CST) Log: Region mask: 0x1
2021/01/30-21:43:50(CST) Log: Seconds remaining: 999990
2021/01/30-21:44:00(CST) Log: Seconds remaining: 999980
2021/01/30-21:44:10(CST) Log: Seconds remaining: 999970
3. Boot a guest on the dst host with '-incoming defer':
[root@hp-dl385g10-10 home]# sh 1.sh
4. Send QMP commands (qmp_capabilities) on src and dst, then start the postcopy migration.

Actual result:
The console on the dst host works well, and the stressapptest log is printed correctly:
[root@hp-dl385g10-10 home]# sh 1.sh
2021/01/30-21:45:40(CST) Log: Seconds remaining: 999880
2021/01/30-21:45:50(CST) Log: Seconds remaining: 999870
2021/01/30-21:46:00(CST) Log: Seconds remaining: 999860
2021/01/30-21:46:10(CST) Log: Seconds remaining: 999850
...

I didn't use the same qemu option "-serial pipe:/home/test/guest-serial-35728" (I used the other serial pipe setup shown in step 1 instead), because the guest usually failed to start when I used it. I will ask the corresponding QE to help check that problem. I don't think it affects reproducing this bz, so I would still like to close this bz as CURRENTRELEASE if you have no other advice. Thank you.
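Step 4 above (like step 6 in comment #10) does not spell out the QMP exchange. The following is a minimal sketch of one such sequence, assuming the destination was started with `-incoming defer` and that both hosts expose QMP on `tcp:0:3333` as in the command lines above. The helper function names, the migration port 4444 and the 30-second precopy wait in the example are hypothetical choices; the QMP commands themselves (`qmp_capabilities`, `migrate-set-capabilities` with `postcopy-ram`, `migrate-incoming`, `migrate`, `migrate-start-postcopy`) are standard QEMU ones. The helpers only print JSON, one command per line, so the sequence can be inspected before it is piped into a QMP connection.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: print QMP commands for a postcopy migration,
# one JSON object per line. They do not talk to QEMU themselves.

# Leave capability-negotiation mode and enable postcopy-ram.
# postcopy-ram must be enabled on BOTH the source and the destination.
qmp_enable_postcopy() {
  printf '%s\n' \
    '{"execute": "qmp_capabilities"}' \
    '{"execute": "migrate-set-capabilities", "arguments": {"capabilities": [{"capability": "postcopy-ram", "state": true}]}}'
}

# Destination (started with -incoming defer): start listening on URI $1.
qmp_dst_listen() {
  qmp_enable_postcopy
  printf '{"execute": "migrate-incoming", "arguments": {"uri": "%s"}}\n' "$1"
}

# Source: start migrating to URI $1 (the migration begins in precopy mode).
qmp_src_migrate() {
  qmp_enable_postcopy
  printf '{"execute": "migrate", "arguments": {"uri": "%s"}}\n' "$1"
}

# Source: switch the already running migration over to postcopy.
# Send it on the same QMP session as qmp_src_migrate.
qmp_src_start_postcopy() {
  printf '%s\n' '{"execute": "migrate-start-postcopy"}'
}
```

For example, `qmp_dst_listen tcp:0:4444 | nc localhost 3333` on the dst host, then `{ qmp_src_migrate tcp:<dst-host>:4444; sleep 30; qmp_src_start_postcopy; } | nc localhost 3333` on the src host, which keeps a single QMP session so that capability negotiation happens only once. Netcat variants differ in how they handle end of input, so `socat` or a QMP client library may be more dependable.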
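In the logs quoted in comment #12, stressapptest reports every 10 seconds, so consecutive "Seconds remaining" values drop by exactly 10 (999880, 999870, 999860, 999850). That gives a cheap after-the-fact check that no report lines were lost from the captured console output. A minimal sketch, assuming that 10-second interval and the log format shown above; `check_countdown` is a hypothetical helper name. It reads log lines on stdin, ignores everything except "Seconds remaining" lines, and exits non-zero if any step differs from the expected interval.

```bash
#!/usr/bin/env bash
# Hypothetical helper: verify that consecutive stressapptest
# "Seconds remaining: N" lines on stdin differ by exactly $1 seconds
# (default 10, the interval seen in the logs above).
# Prints each mismatch and exits 1 if there is any, 0 otherwise.
check_countdown() {
  local step=${1:-10}
  awk -v step="$step" '
    /Seconds remaining:/ {
      r = $NF + 0                      # last field is the countdown value
      if (seen && prev - r != step + 0) {
        printf "unexpected step after %d: countdown fell by %d s\n", prev, prev - r
        bad = 1
      }
      prev = r
      seen = 1
    }
    END { exit (bad ? 1 : 0) }
  '
}
```

For example, `check_countdown < console.log` on a saved copy of the dst console output. A missing report line shows up as a drop of 20 or more.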
Description of problem:
Processes writing to the serial console in the guest are stuck after a postcopy migration. I have stressapptest running in the (f20) guest, and it's not producing any output. I can ssh into the guest and look at its stack, and it's stuck in tty output.

Version-Release number of selected component (if applicable):
2.6.0.27 (I can see it happening on upstream 2.5.1 and 2.6.2, but it seems to be fixed in 2.7.0)

How reproducible:
100%

Steps to Reproduce:
1. Start an f20 guest with a qemu command line like:
-machine pc,accel=kvm -m 8192 -smp 6 -nographic -drive id=image,file=/home/vms/f20.qcow2,cache=none -serial pipe:/tmp/guest-serial-35728 -monitor pipe:/tmp/guest-hmp-35728
2. In the guest, run something that writes to the console, e.g. stressapptest.
3. Postcopy migrate.

Actual results:
The VM is alive, but anything writing to serial is dead.

Expected results:
Serial output comes out.

Additional info:
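The description mentions inspecting the stuck process's stack from an ssh session inside the guest. On Linux, one way to do that is to read `/proc/<pid>/task/<tid>/stack`, which holds the kernel stack of each thread and normally requires root. Below is a minimal sketch that iterates over every thread. That choice is made here because the stressapptest log above reports several copy threads, so the blocked writer need not be the main thread. `dump_kernel_stacks` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the kernel stack of every thread of every
# process whose name is exactly $1. Reading .../stack normally needs root.
dump_kernel_stacks() {
  local name=$1 pid task
  for pid in $(pgrep -x -- "$name"); do
    for task in /proc/"$pid"/task/*; do
      [[ -d $task ]] || continue        # the process may have exited meanwhile
      echo "== $name pid $pid tid ${task##*/}"
      cat "$task/stack" 2>/dev/null || echo "   (cannot read $task/stack; try as root)"
    done
  done
}
```

For example, run `dump_kernel_stacks stressapptest` as root inside the guest. A thread blocked on serial output would be expected to show the tty write path (for example `n_tty_write`) near the top of its stack.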
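The reproducer's `-serial pipe:/tmp/guest-serial-35728` and `-monitor pipe:/tmp/guest-hmp-35728` options use QEMU's pipe character device. The QEMU documentation describes it, on non-Windows hosts, as a pair of FIFOs named `<path>.in` (data to the guest) and `<path>.out` (data from the guest) that QEMU does not create itself. Below is a minimal sketch of preparing those FIFOs, plus a crude liveness check for "anything writing to serial is dead". The function names, the 30-second default timeout and the migration URI in the comments are hypothetical choices, not part of the original report.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for QEMU pipe chardevs (-serial pipe:BASE, -monitor pipe:BASE).

# Create BASE.in (host -> guest) and BASE.out (guest -> host) as FIFOs
# if they do not already exist. Run this before starting QEMU.
make_pipe_fifos() {
  local base=$1 suffix
  for suffix in in out; do
    [[ -p "${base}.${suffix}" ]] || mkfifo -m 0600 "${base}.${suffix}"
  done
}

# Succeed if at least one byte arrives on the guest-output FIFO $1 within
# $2 seconds (default 30); fail if nothing arrives in time.
# Note: the byte read here is consumed, so another reader will not see it.
serial_alive() {
  local out=$1 secs=${2:-30} n
  n=$(timeout "$secs" head -c 1 "$out" | wc -c)
  (( n > 0 ))
}

# Example on the source host (URI is a placeholder):
#   make_pipe_fifos /tmp/guest-serial-35728
#   make_pipe_fifos /tmp/guest-hmp-35728
#   cat /tmp/guest-serial-35728.out &          # watch guest serial output
#   cat /tmp/guest-hmp-35728.out &             # watch monitor replies
#   echo 'migrate_set_capability postcopy-ram on' > /tmp/guest-hmp-35728.in
#   echo 'migrate -d tcp:<dst-host>:4444'      > /tmp/guest-hmp-35728.in
#   echo 'migrate_start_postcopy'              > /tmp/guest-hmp-35728.in
# After the migration, on the destination host (with the background cat stopped):
#   serial_alive /tmp/guest-serial-35728.out || echo "serial output looks stuck"
```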