Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.
RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets there.

Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against the components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED".

If you cannot log in to RH Jira, please consult article #7032570. Failing that, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry; the e-mail creates a ServiceNow ticket with Red Hat.

Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and have "MigratedToJIRA" set in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of the form "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). The same link will be available in a blue banner at the top of the page informing you that the bug has been migrated.

Bug 684076

Summary: Segfault occurred during migration
Product: Red Hat Enterprise Linux 6
Reporter: Amos Kong <akong>
Component: qemu-kvm
Assignee: Michael S. Tsirkin <mst>
Status: CLOSED ERRATA
QA Contact: Virtualization Bugs <virt-bugs>
Severity: high
Docs Contact:
Priority: high
Version: 6.1
CC: ailan, ehabkost, jasowang, lcapitulino, mkenneth, tburke, virt-maint
Target Milestone: rc
Keywords: TestBlocker
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: qemu-kvm-0.12.1.2-2.151.el6
Doc Type: Bug Fix
Doc Text:
Cause: a bug in the address conversion in the vhost migration dirty-page handling code. Consequence: qemu-kvm crashed with a segmentation fault during live migration of KVM VMs. Fix: the address conversion in the vhost migration dirty-page handling code was corrected. Result: qemu-kvm no longer crashes during live migration.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2011-05-19 11:27:18 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 580951    
Attachments:
Description                                            Flags
debugging patch                                        none
Debug msg after loading mst's patch                    none
extra debugging patch: apply on top                    none
debug msg after loading second patch                   none
another debug, pls apply on top of the previous ones   none
fix build                                              none
debug-msg-after-loading-3rd-patch.txt                  none
first proposed patch                                   none

Description Amos Kong 2011-03-11 02:42:55 UTC
Description of problem:
I tried to do internal migration; the migration could not complete within 600 seconds, and sometimes a segfault occurred.
vhost is enabled.


Program terminated with signal 11, Segmentation fault.
#0  cpu_physical_memory_set_dirty (dev=<value optimized out>, mfirst=<value optimized out>, mlast=<value optimized out>, rfirst=<value optimized out>, rlast=<value optimized out>)
    at /usr/src/debug/qemu-kvm-0.12.1.2/cpu-all.h:909
909         if (!cpu_physical_memory_get_dirty(addr, MIGRATION_DIRTY_FLAG))
...
(gdb) bt
#0  cpu_physical_memory_set_dirty (dev=<value optimized out>, mfirst=<value optimized out>, mlast=<value optimized out>, rfirst=<value optimized out>, rlast=<value optimized out>)
    at /usr/src/debug/qemu-kvm-0.12.1.2/cpu-all.h:909
#1  vhost_dev_sync_region (dev=<value optimized out>, mfirst=<value optimized out>, mlast=<value optimized out>, rfirst=<value optimized out>, rlast=<value optimized out>)
    at /usr/src/debug/qemu-kvm-0.12.1.2/hw/vhost.c:53
#2  0x0000000000425681 in vhost_client_sync_dirty_bitmap (client=0x10e0df0, start_addr=0, end_addr=18446744073709551615) at /usr/src/debug/qemu-kvm-0.12.1.2/hw/vhost.c:71
#3  0x00000000004eb62d in cpu_notify_sync_dirty_bitmap (start_addr=0, end_addr=18446744073709551615) at /usr/src/debug/qemu-kvm-0.12.1.2/exec.c:1651
#4  cpu_physical_sync_dirty_bitmap (start_addr=0, end_addr=18446744073709551615) at /usr/src/debug/qemu-kvm-0.12.1.2/exec.c:2020
#5  0x000000000040ac1a in ram_save_live (mon=<value optimized out>, f=0x3ac6070, stage=2, opaque=<value optimized out>) at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:3387
#6  0x00000000004c0949 in qemu_savevm_state_iterate (mon=0x0, f=0x3ac6070) at savevm.c:1513
#7  0x00000000004b9afd in migrate_fd_put_ready (opaque=0x118af60) at migration.c:392
#8  0x00000000004b9722 in buffered_put_buffer (opaque=0x138df50, buf=0x0, pos=0, size=0) at buffered_file.c:163
#9  0x000000000040bb56 in main_loop_wait (timeout=1000) at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:4433
#10 0x000000000042b44a in kvm_main_loop () at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-kvm.c:2164
#11 0x000000000040eeb5 in main_loop (argc=<value optimized out>, argv=<value optimized out>, envp=<value optimized out>) at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:4638
#12 main (argc=<value optimized out>, argv=<value optimized out>, envp=<value optimized out>) at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:6852


Version-Release number of selected component (if applicable):
qemu-kvm-0.12.1.2-2.150.el6.x86_64
host kernel: 2.6.32-120.el6.x86_64

How reproducible:
not always

Steps to Reproduce:
1. launch src vm
2. launch dst vm
3. execute the migrate command in the monitor of the src vm
   (tried with both tcp and unix transports)
  
Actual results:
1. Migration could not complete within 600 seconds.
2. Sometimes, a segfault occurred.

Expected results:
Migration completes successfully.

Additional info:

1. src VM cmdline:
# qemu-kvm -name 'vm1' -chardev socket,id=human_monitor_A0Ll,path=/tmp/monitor-humanmonitor1-20110310-212223-498e,server,nowait -mon chardev=human_monitor_A0Ll,mode=readline -chardev socket,id=serial_2GGr,path=/tmp/serial-20110310-212223-498e,server,nowait -device isa-serial,chardev=serial_2GGr -drive file='/usr/local/staf/test/RHEV/kvm-new/autotest/client/tests/kvm/images/RHEL-Server-6.0-64-virtio.qcow2',index=0,if=none,id=drive-virtio-disk1,media=disk,cache=none,format=qcow2,aio=native -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk1,id=virtio-disk1 -device virtio-net-pci,netdev=id4hU3V4,mac=9a:24:36:f4:7f:d4,netdev=id4hU3V4,id=ndev00id4hU3V4,bus=pci.0,addr=0x3 -netdev tap,id=id4hU3V4,vhost=on,ifname='t0-212223-498e',script='/usr/local/staf/test/RHEV/kvm-new/autotest/client/tests/kvm/scripts/qemu-ifup-switch',downscript='no' -m 8192 -smp 4,cores=1,threads=1,sockets=4 -cpu cpu64-rhel6,+sse2,+x2apic -vnc :0 -rtc base=utc,clock=host,driftfix=none  -boot order=cdn,once=c,menu=off   -usbdevice tablet -no-kvm-pit-reinjection -enable-kvm 


2. host info:
cpu-core: 12
memory: 31G 

processor	: 11
vendor_id	: AuthenticAMD
cpu family	: 16
model		: 8
model name	: Six-Core AMD Opteron(tm) Processor 2427
stepping	: 0
cpu MHz		: 800.000
cache size	: 512 KB
physical id	: 1
siblings	: 6
core id		: 5
cpu cores	: 6
apicid		: 13
initial apicid	: 13
fpu		: yes
fpu_exception	: yes
cpuid level	: 5
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nonstop_tsc extd_apicid pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt npt lbrv svm_lock nrip_save
bogomips	: 4399.62
TLB size	: 1024 4K pages
clflush size	: 64
cache_alignment	: 64
address sizes	: 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate

Comment 5 Michael S. Tsirkin 2011-03-14 15:35:50 UTC
Created attachment 484225 [details]
debugging patch

Comment 10 Amos Kong 2011-03-15 05:05:17 UTC
Created attachment 484351 [details]
Debug msg after loading mst's patch

Comment 12 Michael S. Tsirkin 2011-03-15 07:38:50 UTC
Created attachment 484385 [details]
extra debugging patch: apply on top

Comment 13 Amos Kong 2011-03-15 07:53:29 UTC
Created attachment 484387 [details]
debug msg after loading second patch

Comment 14 Michael S. Tsirkin 2011-03-15 12:40:10 UTC
Created attachment 484453 [details]
another debug, pls apply on top of the previous ones

Comment 15 Michael S. Tsirkin 2011-03-15 12:45:46 UTC
Created attachment 484455 [details]
fix build

Comment 16 Amos Kong 2011-03-15 12:51:54 UTC
Created attachment 484457 [details]
debug-msg-after-loading-3rd-patch.txt

Comment 18 Michael S. Tsirkin 2011-03-16 10:35:44 UTC
Created attachment 485702 [details]
first proposed patch

Comment 19 Amos Kong 2011-03-16 10:56:53 UTC
(In reply to comment #18)
> Created attachment 485702 [details]
> first proposed patch

Tested 8 times with this patch; all runs passed!

Comment 23 Amos Kong 2011-03-21 09:02:22 UTC
All migration tests passed with qemu-kvm-0.12.1.2-2.151.el6.
https://virtlab.englab.nay.redhat.com/job/29738/details/

Moving to VERIFIED.

Comment 25 Eduardo Habkost 2011-05-05 12:31:52 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Cause: a bug in the address conversion in the vhost migration dirty-page handling code.

Consequence: qemu-kvm crashed with a segmentation fault during live migration of KVM VMs.

Fix: the address conversion in the vhost migration dirty-page handling code was corrected.

Result: qemu-kvm no longer crashes during live migration.

Comment 26 errata-xmlrpc 2011-05-19 11:27:18 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0534.html

Comment 27 errata-xmlrpc 2011-05-19 13:02:03 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0534.html