Bug 1447258

Summary: Fail to create internal snapshot with data plane enabled

Product: Red Hat Enterprise Linux 7
Reporter: aihua liang <aliang>
Component: qemu-kvm-rhev
Assignee: Stefan Hajnoczi <stefanha>
Status: CLOSED ERRATA
QA Contact: aihua liang <aliang>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 7.4
CC: chayang, coli, hhuang, juzhang, michen, mrezanin, qzhang, stefanha, virt-maint
Target Milestone: rc
Keywords: Regression
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: qemu-kvm-rhev-2.10.0-1.el7
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-04-11 00:16:25 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1473046

Description aihua liang 2017-05-02 09:01:49 UTC
Description of problem:
 Creating an internal snapshot fails when data plane is enabled.

Version-Release number of selected component (if applicable):
 kernel version: 3.10.0-655.el7.x86_64
 qemu-kvm-rhev version: qemu-kvm-rhev-2.9.0-1.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Start a guest with the following qemu command:
   /usr/libexec/qemu-kvm \
-name 'avocado-vt-vm1'  \
-sandbox off  \
-machine pc \
-nodefaults  \
-vga std  \
-chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/monitor-qmpmonitor1-20170124-161452-WcepYpO8,server,nowait \
-mon chardev=qmp_id_qmpmonitor1,mode=control  \
-chardev socket,id=qmp_id_catch_monitor,path=/var/tmp/monitor-catch_monitor-20170124-161452-WcepYpO8,server,nowait \
-mon chardev=qmp_id_catch_monitor,mode=control \
-device pvpanic,ioport=0x505,id=idqap0h5  \
-chardev socket,id=serial_id_serial0,path=/var/tmp/serial-serial0-20170124-161452-WcepYpO8,server,nowait \
-device isa-serial,chardev=serial_id_serial0  \
-chardev socket,id=seabioslog_id_20170124-161452-WcepYpO8,path=/var/tmp/seabios-20170124-161452-WcepYpO8,server,nowait \
-device isa-debugcon,chardev=seabioslog_id_20170124-161452-WcepYpO8,iobase=0x402 \
-device virtio-net-pci,mac=9a:b2:b3:b4:b5:b6,id=iduCv1Ln,vectors=4,netdev=idKgexFk,bus=pci.0,addr=05  \
-netdev tap,id=idKgexFk,vhost=on \
-m 4096  \
-smp 4,maxcpus=4,cores=2,threads=1,sockets=2  \
-cpu host \
-vnc :1  \
-rtc base=localtime,clock=host,driftfix=slew  \
-boot order=cdn,menu=on,strict=off \
-enable-kvm \
-monitor stdio \
-object iothread,id=iothread0 \
-drive id=drive_image1,if=none,snapshot=off,aio=native,cache=none,format=qcow2,file=/home/rhel74-64-virtio.qcow2 \
-device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=03,iothread=iothread0 \
-device scsi-hd,id=image1,drive=drive_image1,bus=scsi0.0,lun=0 \
-object iothread,id=iothread1 \
-device virtio-scsi-pci,id=scsi1,bus=pci.0,addr=04,iothread=iothread1 \
-drive id=data_image1,if=none,snapshot=off,aio=native,cache=none,format=qcow2,file=/home/t1.qcow2,werror=stop,rerror=stop \
-device scsi-hd,id=data1,drive=data_image1,bus=scsi1.0,lun=0 \

2. Create an internal snapshot:
  (qemu) savevm     ---- waited 1 hour with no completion
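For scripted reproduction, the same HMP command can be issued over the QMP monitor via `human-monitor-command`. The sketch below is not part of the original report; the socket path is taken from the `-chardev` option above and the snapshot tag `snap1` is illustrative.

```python
import json
import os
import socket

# Socket path from the -chardev qmp_id_qmpmonitor1 option above (illustrative).
QMP_SOCK = "/var/tmp/monitor-qmpmonitor1-20170124-161452-WcepYpO8"

def qmp_savevm_commands(tag="snap1"):
    """Build the QMP command sequence that mirrors HMP 'savevm <tag>'."""
    return [
        {"execute": "qmp_capabilities"},
        {"execute": "human-monitor-command",
         "arguments": {"command-line": "savevm %s" % tag}},
    ]

def send_qmp(path, commands):
    """Send each command over the QMP UNIX socket and collect raw replies."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(path)
        sock.recv(4096)  # discard the QMP greeting banner
        replies = []
        for cmd in commands:
            sock.sendall(json.dumps(cmd).encode() + b"\n")
            replies.append(sock.recv(4096))
        return replies

if __name__ == "__main__":
    cmds = qmp_savevm_commands()
    if os.path.exists(QMP_SOCK):
        print(send_qmp(QMP_SOCK, cmds))
    else:
        # No VM running; just show the wire format that would be sent.
        print(json.dumps(cmds, indent=2))
```

On a hung guest this call would block the same way the HMP `savevm` does, which makes it usable in an automated reproducer with a timeout.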

Actual results:
  After step 2, the VM becomes unresponsive.

Expected results:
  The internal snapshot is created successfully.

Additional info:
  Internal snapshot creation succeeds without data plane.
  This is a regression; it is the same issue as bug 1273689.

Comment 4 Stefan Hajnoczi 2017-05-17 15:54:08 UTC
Here is the backtrace for reference:

$ qemu-system-x86_64 -M accel=kvm -m 1024 -cpu host -drive if=none,id=drive0,file=test.qcow2,aio=native,cache.direct=on,format=qcow2 -object iothread,id=iothread0 -device virtio-scsi-pci,id=scsi0,iothread=iothread0 -device scsi-hd,drive=drive0

Thread 7 (Thread 0x7f73fffff700 (LWP 22269)):
#0  0x00007f7429692460 in pthread_cond_wait@@GLIBC_2.3.2 () at /lib64/libpthread.so.0
#1  0x0000561f98458f35 in qemu_cond_wait (cond=0x561f99a40ac0, mutex=mutex@entry=0x561f98ab9460 <qemu_global_mutex>) at util/qemu-thread-posix.c:148
#2  0x0000561f980d5eaf in qemu_kvm_wait_io_event (cpu=<optimized out>) at /home/stefanha/qemu/cpus.c:1086
#3  0x0000561f980d5eaf in qemu_kvm_cpu_thread_fn (arg=0x561f99a28dc0) at /home/stefanha/qemu/cpus.c:1124
#4  0x00007f742968c6ca in start_thread () at /lib64/libpthread.so.0
#5  0x00007f74243ecf7f in clone () at /lib64/libc.so.6

Thread 6 (Thread 0x7f7415ee4700 (LWP 22267)):
#0  0x00007f742969538d in __lll_lock_wait () at /lib64/libpthread.so.0
#1  0x00007f742968eeca in pthread_mutex_lock () at /lib64/libpthread.so.0
#2  0x0000561f98458be0 in qemu_mutex_lock (mutex=0x561f999e4a30) at util/qemu-thread-posix.c:61
#3  0x0000561f98453f59 in aio_context_acquire (ctx=<optimized out>) at util/async.c:489
#4  0x0000561f983d8eb1 in qemu_laio_process_completions_and_submit (s=0x7f7400000d50) at block/linux-aio.c:238
#5  0x0000561f98455ebc in aio_dispatch_handlers (ctx=ctx@entry=0x561f999e49d0) at util/aio-posix.c:399
#6  0x0000561f98456916 in aio_poll (ctx=0x561f999e49d0, blocking=blocking@entry=true) at util/aio-posix.c:685
#7  0x0000561f981c419e in iothread_run (opaque=0x561f999e4410) at iothread.c:59
#8  0x00007f742968c6ca in start_thread () at /lib64/libpthread.so.0
#9  0x00007f74243ecf7f in clone () at /lib64/libc.so.6

Thread 5 (Thread 0x7f7416805700 (LWP 22266)):
#0  0x00007f74243e101d in poll () at /lib64/libc.so.6
#1  0x00007f742747e166 in g_main_context_iterate.isra () at /lib64/libglib-2.0.so.0
#2  0x00007f742747e4f2 in g_main_loop_run () at /lib64/libglib-2.0.so.0
#3  0x00007f7427a62a76 in gdbus_shared_thread_func () at /lib64/libgio-2.0.so.0
#4  0x00007f74274a5b93 in g_thread_proxy () at /lib64/libglib-2.0.so.0
#5  0x00007f742968c6ca in start_thread () at /lib64/libpthread.so.0
#6  0x00007f74243ecf7f in clone () at /lib64/libc.so.6

Thread 4 (Thread 0x7f7417006700 (LWP 22265)):
#0  0x00007f74243e101d in poll () at /lib64/libc.so.6
#1  0x00007f742747e166 in g_main_context_iterate.isra () at /lib64/libglib-2.0.so.0
#2  0x00007f742747e27c in g_main_context_iteration () at /lib64/libglib-2.0.so.0
#3  0x00007f742747e2c1 in glib_worker_main () at /lib64/libglib-2.0.so.0
#4  0x00007f74274a5b93 in g_thread_proxy () at /lib64/libglib-2.0.so.0
#5  0x00007f742968c6ca in start_thread () at /lib64/libpthread.so.0
#6  0x00007f74243ecf7f in clone () at /lib64/libc.so.6

Thread 3 (Thread 0x7f7417807700 (LWP 22264)):
#0  0x00007f74243e101d in poll () at /lib64/libc.so.6
#1  0x00007f742747e166 in g_main_context_iterate.isra () at /lib64/libglib-2.0.so.0
#2  0x00007f742747e27c in g_main_context_iteration () at /lib64/libglib-2.0.so.0
#3  0x00007f741780efad in dconf_gdbus_worker_thread () at /usr/lib64/gio/modules/libdconfsettings.so
#4  0x00007f74274a5b93 in g_thread_proxy () at /lib64/libglib-2.0.so.0
#5  0x00007f742968c6ca in start_thread () at /lib64/libpthread.so.0
#6  0x00007f74243ecf7f in clone () at /lib64/libc.so.6

Thread 2 (Thread 0x7f7418215700 (LWP 22263)):
#0  0x00007f74243e6bf9 in syscall () at /lib64/libc.so.6
#1  0x0000561f98459328 in qemu_futex_wait (val=<optimized out>, f=<optimized out>) at /home/stefanha/qemu/include/qemu/futex.h:26
#2  0x0000561f98459328 in qemu_event_wait (ev=ev@entry=0x561f98ef1840 <rcu_gp_event>) at util/qemu-thread-posix.c:415
#3  0x0000561f98468b0b in wait_for_readers () at util/rcu.c:131
#4  0x0000561f98468b0b in synchronize_rcu () at util/rcu.c:162
#5  0x0000561f98468dc2 in call_rcu_thread (opaque=<optimized out>) at util/rcu.c:256
#6  0x00007f742968c6ca in start_thread () at /lib64/libpthread.so.0
#7  0x00007f74243ecf7f in clone () at /lib64/libc.so.6

Thread 1 (Thread 0x7f7440c3ff80 (LWP 22262)):
#0  0x00007f74243e10e1 in ppoll () at /lib64/libc.so.6
#1  0x0000561f98454c5b in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
#2  0x0000561f98454c5b in qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=timeout@entry=-1) at util/qemu-timer.c:322
#3  0x0000561f98456891 in aio_poll (ctx=0x561f999e49d0, blocking=blocking@entry=true) at util/aio-posix.c:622
#4  0x0000561f983dee35 in bdrv_rw_vmstate (is_read=false, pos=0, qiov=0x7fffcdd8ad30, bs=0x561f999f7020) at block/io.c:2028
#5  0x0000561f983dee35 in bdrv_writev_vmstate (bs=bs@entry=0x561f999f7020, qiov=qiov@entry=0x7fffcdd8ad30, pos=pos@entry=0) at block/io.c:2056
#6  0x0000561f98100494 in block_writev_buffer (opaque=0x561f999f7020, iov=<optimized out>, iovcnt=<optimized out>, pos=0) at /home/stefanha/qemu/migration/savevm.c:152
#7  0x0000561f98334e23 in qemu_fflush (f=0x561f9a883610) at migration/qemu-file.c:189
#8  0x0000561f9833504e in add_to_iovec (f=<optimized out>, buf=<optimized out>, size=size@entry=4096, may_free=<optimized out>) at migration/qemu-file.c:363
#9  0x0000561f9833540c in qemu_put_buffer_async (f=<optimized out>, buf=buf@entry=0x7f73b7e77000 "\306\313\356g\341k\300\031`\325_\005>\a\230\214\324Nh\301oީ{\027WN#\314\031\bt\300\235\250\202\371\313()Fl݅B\257K\277\346\331\316\337\351R\177\066\273\243˙%\340=Y\344\n\033\231\276Kל\325\313}F\256H\262\070\034ֽ>b\034\332\aňc\bҀ(\306\026\002\206\071\237t\267\v3\257\060l\f7\370\032\300\270\344\252.w\022[ぞ\244ZU\333&\351\005\217\005x\373\n\001\261Di\222\262\351(\223B-I\357\377y\311\r\305\025XSc\332\366Dr\245\210+\354\204\354\023\067\216A\016\006\343>\330\346n}\200x\340\247]\272\253/\a\335\017\006\064ũ\317@V\314o\221"..., size=size@entry=4096, may_free=<optimized out>) at migration/qemu-file.c:375
#10 0x0000561f980fdff4 in ram_save_page (last_stage=<optimized out>, pss=<synthetic pointer>, rs=0x561f98abf220 <ram_state>) at /home/stefanha/qemu/migration/ram.c:840
#11 0x0000561f980fdff4 in ram_save_target_page (last_stage=<optimized out>, pss=<synthetic pointer>, rs=0x561f98abf220 <ram_state>) at /home/stefanha/qemu/migration/ram.c:1286
#12 0x0000561f980fdff4 in ram_save_host_page (last_stage=<optimized out>, pss=<synthetic pointer>, rs=0x561f98abf220 <ram_state>) at /home/stefanha/qemu/migration/ram.c:1324
#13 0x0000561f980fdff4 in ram_find_and_save_block (rs=rs@entry=0x561f98abf220 <ram_state>, last_stage=last_stage@entry=false) at /home/stefanha/qemu/migration/ram.c:1381
#14 0x0000561f980fea13 in ram_find_and_save_block (last_stage=false, rs=0x561f98abf220 <ram_state>) at /home/stefanha/qemu/migration/ram.c:1359
#15 0x0000561f980fea13 in ram_save_iterate (f=0x561f9a883610, opaque=0x561f98abf220 <ram_state>) at /home/stefanha/qemu/migration/ram.c:2014
#16 0x0000561f9810352c in qemu_savevm_state_iterate (f=f@entry=0x561f9a883610, postcopy=postcopy@entry=false) at /home/stefanha/qemu/migration/savevm.c:1040
#17 0x0000561f98104532 in qemu_savevm_state (errp=0x7fffcdd8afc8, f=0x561f9a883610) at /home/stefanha/qemu/migration/savevm.c:1254
#18 0x0000561f98104532 in save_vmstate (name=<optimized out>) at /home/stefanha/qemu/migration/savevm.c:2148
#19 0x0000561f980dd55e in handle_hmp_command (mon=mon@entry=0x561f99a02d40, cmdline=<optimized out>) at /home/stefanha/qemu/monitor.c:3101
#20 0x0000561f980dd7c7 in monitor_command_cb (opaque=0x561f99a02d40, cmdline=<optimized out>, readline_opaque=<optimized out>) at /home/stefanha/qemu/monitor.c:3898
#21 0x0000561f984681e5 in readline_handle_byte (rs=0x561f99a26250, ch=<optimized out>) at util/readline.c:393
#22 0x0000561f980dd757 in monitor_read (opaque=<optimized out>, buf=<optimized out>, size=<optimized out>) at /home/stefanha/qemu/monitor.c:3881
#23 0x0000561f98388b6a in gd_vc_in (terminal=<optimized out>, text=0x561f999b2aa0 "\r", size=1, user_data=0x561f9a882a90) at ui/gtk.c:1801
#24 0x00007f74277563e5 in g_closure_invoke () at /lib64/libgobject-2.0.so.0
#25 0x00007f7427768432 in signal_emit_unlocked_R () at /lib64/libgobject-2.0.so.0
#26 0x00007f742777105f in g_signal_emit_valist () at /lib64/libgobject-2.0.so.0
#27 0x00007f74277718eb in g_signal_emit_by_name () at /lib64/libgobject-2.0.so.0
#28 0x00007f74293fbca7 in vte_terminal_emit_commit () at /lib64/libvte2_90.so.9
#29 0x00007f7429407224 in vte_terminal_send.constprop () at /lib64/libvte2_90.so.9
#30 0x00007f742940cac2 in vte_terminal_key_press () at /lib64/libvte2_90.so.9
#31 0x00007f7428cfaf71 in _gtk_marshal_BOOLEAN__BOXEDv () at /lib64/libgtk-3.so.0
#32 0x00007f7427756562 in _g_closure_invoke_va () at /lib64/libgobject-2.0.so.0
#33 0x00007f74277708b3 in g_signal_emit_valist () at /lib64/libgobject-2.0.so.0
#34 0x00007f742777143f in g_signal_emit () at /lib64/libgobject-2.0.so.0
#35 0x00007f7428e4820c in gtk_widget_event_internal () at /lib64/libgtk-3.so.0
#36 0x00007f7428e68b5b in gtk_window_propagate_key_event () at /lib64/libgtk-3.so.0
#37 0x00007f7428e6c6ab in gtk_window_key_press_event () at /lib64/libgtk-3.so.0
#38 0x00007f7428cfaf71 in _gtk_marshal_BOOLEAN__BOXEDv () at /lib64/libgtk-3.so.0
#39 0x00007f7427756614 in _g_closure_invoke_va () at /lib64/libgobject-2.0.so.0
#40 0x00007f74277708b3 in g_signal_emit_valist () at /lib64/libgobject-2.0.so.0
#41 0x00007f742777143f in g_signal_emit () at /lib64/libgobject-2.0.so.0
#42 0x00007f7428e4820c in gtk_widget_event_internal () at /lib64/libgtk-3.so.0
#43 0x00007f7428cf8039 in propagate_event () at /lib64/libgtk-3.so.0
#44 0x00007f7428cf9fbe in gtk_main_do_event () at /lib64/libgtk-3.so.0
#45 0x00007f74288105c5 in _gdk_event_emit () at /lib64/libgdk-3.so.0
#46 0x00007f742886c172 in gdk_event_source_dispatch () at /lib64/libgdk-3.so.0
#47 0x00007f742747de52 in g_main_context_dispatch () at /lib64/libglib-2.0.so.0
#48 0x0000561f984559e3 in glib_pollfds_poll () at util/main-loop.c:213
#49 0x0000561f984559e3 in os_host_main_loop_wait (timeout=<optimized out>) at util/main-loop.c:261
#50 0x0000561f984559e3 in main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:517
#51 0x0000561f98095c77 in main_loop () at vl.c:1899
#52 0x0000561f98095c77 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4729
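The deadlock pattern is visible in the trace: Thread 6 (the iothread) is blocked in aio_context_acquire() waiting for the AioContext lock, while Thread 1 (the main loop) holds it and spins in aio_poll() inside bdrv_rw_vmstate(), so neither side can make progress. When triaging hangs like this, it helps to mechanically list which threads are parked on a synchronization primitive. A small sketch (not part of the report's tooling) that scans a gdb all-threads backtrace:

```python
import re

# Frames that indicate a thread is parked on a synchronization primitive.
BLOCKING_FRAMES = ("__lll_lock_wait", "pthread_mutex_lock",
                   "pthread_cond_wait", "qemu_futex_wait")

def blocked_threads(backtrace):
    """Map thread number -> first blocking frame found in its stack."""
    result = {}
    current = None
    for line in backtrace.splitlines():
        m = re.match(r"Thread (\d+) ", line)
        if m:
            current = m.group(1)
            continue
        if current and current not in result:
            for frame in BLOCKING_FRAMES:
                if frame in line:
                    result[current] = frame
                    break
    return result

sample = """\
Thread 6 (Thread 0x7f7415ee4700 (LWP 22267)):
#0  0x00007f742969538d in __lll_lock_wait () at /lib64/libpthread.so.0
#1  0x00007f742968eeca in pthread_mutex_lock () at /lib64/libpthread.so.0
Thread 1 (Thread 0x7f7440c3ff80 (LWP 22262)):
#0  0x00007f74243e10e1 in ppoll () at /lib64/libc.so.6
"""

# Thread 6 is lock-bound; thread 1 is polling, so it is not flagged.
print(blocked_threads(sample))
```

Running this over the full backtrace above flags threads 6 and 7 as lock- or condvar-bound, narrowing the investigation to who holds the AioContext lock.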

Comment 5 Stefan Hajnoczi 2017-05-17 17:10:04 UTC
Patches posted upstream "[PATCH 0/3] block: fix 'savevm' hang with -object iothread".

Comment 8 Stefan Hajnoczi 2017-06-28 12:03:42 UTC
The following upstream commits have been merged in qemu.git/master and will be in RHEL 7.5 via rebase:
1575829 migration: hold AioContext lock for loadvm qemu_fclose()
8649f2f migration: use bdrv_drain_all_begin/end() instead bdrv_drain_all()
17e2a4a migration: avoid recursive AioContext locking in save_vmstate()
ea17c9d block: use BDRV_POLL_WHILE() in bdrv_rw_vmstate()
dc88a46 block: count bdrv_co_rw_vmstate() requests
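Since the fix shipped in qemu-kvm-rhev-2.10.0-1.el7, any build at or above 2.10.0 should contain these commits. A hedged shell sketch for comparing an installed version against that floor (the `installed` value is hard-coded for illustration; in practice it would come from `rpm -q` or `qemu-kvm --version`):

```shell
required=2.10.0
installed=2.9.0   # illustrative; in practice: rpm -q --qf '%{VERSION}' qemu-kvm-rhev

# sort -V orders version strings numerically; if $required sorts first
# (or the two are equal), the installed build is new enough.
if [ "$(printf '%s\n%s\n' "$required" "$installed" | sort -V | head -n1)" = "$required" ]; then
    echo "qemu $installed includes the savevm/iothread fixes"
else
    echo "qemu $installed predates the fixes (need >= $required)"
fi
```

With the illustrative 2.9.0 value this reports that the build predates the fixes, matching the affected version in this report.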

Comment 10 aihua liang 2017-10-09 12:30:48 UTC
Verified: the problem has been resolved, so the status is set to "Verified".

Kernel version: 3.10.0-718.el7.x86_64
qemu-kvm-rhev version: qemu-kvm-rhev-2.10.0-1.el7.x86_64

Test Steps:
 1. Start a guest with the following qemu command:
/usr/libexec/qemu-kvm \
-name 'avocado-vt-vm1'  \
-sandbox off  \
-machine pc \
-nodefaults  \
-vga std  \
-chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/monitor-qmpmonitor1-20170124-161452-WcepYpO8,server,nowait \
-mon chardev=qmp_id_qmpmonitor1,mode=control  \
-object iothread,id=iothread0 \
-device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=03,iothread=iothread0 \
-drive id=drive_image1,if=none,snapshot=off,aio=native,cache=none,format=qcow2,file=/home/kvm_autotest_root/images/rhel74-64-virtio.qcow2 \
-device scsi-hd,id=image1,drive=drive_image1,bus=scsi0.0,lun=0 \
-device virtio-net-pci,mac=9a:b2:b3:b4:b5:b6,id=iduCv1Ln,vectors=4,netdev=idKgexFk,bus=pci.0,addr=05  \
-netdev tap,id=idKgexFk,vhost=on \
-m 4096  \
-smp 4,maxcpus=4,cores=2,threads=1,sockets=2  \
-cpu host \
-vnc :1  \
-rtc base=localtime,clock=host,driftfix=slew  \
-enable-kvm \
-monitor stdio \
-object iothread,id=iothread1 \
-device virtio-scsi-pci,id=scsi1,bus=pci.0,addr=04,iothread=iothread1 \
-drive id=data_image1,if=none,snapshot=off,aio=native,cache=none,format=qcow2,file=/home/t1.qcow2,serial=TARGET_DISK0 \
-device scsi-hd,id=data1,drive=data_image1,bus=scsi1.0,lun=0 \

 2. Create an internal snapshot:
   (qemu) stop
   (qemu) savevm test.save

 3. Check the snapshot:
   (qemu) info snapshots

 4. Quit the VM and check the image info:
   # qemu-img info /home/kvm_autotest_root/rhel74-64-virtio.qcow2

 5. Start the VM and reload the snapshot:
   (qemu) loadvm test.save

Test result:
  The snapshot was saved and loaded successfully.
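The same internal-snapshot lifecycle can also be exercised offline with qemu-img as a quick sanity check that is independent of a running VM. This is a sketch, not part of the verification above; the image path and snapshot tag are illustrative, and the commands are skipped if qemu-img is not installed.

```shell
img=/tmp/snap-test.qcow2
if ! command -v qemu-img >/dev/null 2>&1; then
    echo "qemu-img not installed; skipping"
    exit 0
fi
qemu-img create -f qcow2 "$img" 1G       # scratch image
qemu-img snapshot -c test.save "$img"    # create internal snapshot
qemu-img snapshot -l "$img"              # list snapshots; should show test.save
qemu-img snapshot -a test.save "$img"    # apply (revert to) the snapshot
qemu-img snapshot -d test.save "$img"    # delete it again
rm -f "$img"
```

Note this only covers the qcow2 snapshot-table operations; the bug itself is specific to the live `savevm` path with an iothread attached, so the online steps above remain the actual verification.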

Comment 11 aihua liang 2018-02-06 03:30:49 UTC
Verified with kernel 3.10.0-844.el7.x86_64 and qemu-kvm-rhev-2.10.0-19.el7.x86_64; the issue is no longer hit.

Comment 13 errata-xmlrpc 2018-04-11 00:16:25 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:1104