Bug 1274575
Summary: | vm qemu process crashed with spice-server assertion failure | |
---|---|---|---
Product: | Red Hat Enterprise Linux 6 | Reporter: | shangxu <sllone>
Component: | spice-server | Assignee: | Victor Toso <victortoso>
Status: | CLOSED ERRATA | QA Contact: | SPICE QE bug list <spice-qe-bugs>
Severity: | low | Docs Contact: |
Priority: | low | |
Version: | 6.5 | CC: | cfergeau, dblechte, djasa, fziglio, mkenneth, qguo, qingyu.yang, rbalakri, rduda, rh-spice-bugs, sllone, tpelka, victortoso
Target Milestone: | rc | |
Target Release: | --- | |
Hardware: | x86_64 | |
OS: | Linux | |
Whiteboard: | | |
Fixed In Version: | spice-server-0.12.4-15.el6 | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2017-03-21 09:20:17 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | | |
Bug Blocks: | 1269194 | |
Attachments: | | |

Description
shangxu
2015-10-23 03:31:27 UTC
Created attachment 1085694: qemu log

Hi, not the first time seeing this issue: https://bugzilla.redhat.com/show_bug.cgi?id=1172036

How much time is needed for the VM to shut off? Can you reproduce this using VNC?

As you mentioned 'sound', does this not occur when sound is off?

I have the same problem as 1172036.
I can reproduce it every time.
This time I disabled the sound card; the problem still occurs.
Which log do you need?

(In reply to shangxu from comment #4)
> I have the same problem as 1172036.
> I can reproduce it every time.
> This time I disabled the sound card; the problem still occurs.
> Which log do you need?

I'm interested in how to reproduce this problem. rhbz#1172036 did not mention anything besides playing video with potplayer. I did that in different VMs for several days, without issue.

Do you mean that it is important to change the potplayer transparency and also keep moving it around the desktop while playing the video in order for this bug to happen?

How long does it usually take to crash? Minutes? Hours? Days?

*** Bug 1172036 has been marked as a duplicate of this bug. ***

The way to reproduce is as I wrote: repeatedly change the potplayer transparency, then drag the window, then open a few other applications. The most important part is repeatedly changing the potplayer transparency.

Sometimes it takes two minutes, sometimes 10 minutes; it seems to be related to the performance of the client machine. When I connect with virt-viewer from a redhat6.5 VMware virtual machine, the crash appears soon. When I connect from my win7 PC, it takes longer to appear.

I also tested redhat7.1 (spice-server-12.4-9), but the crash does not appear there, and I noticed that when testing on redhat7, /var/log/libvirt/qemu/win7.log has no 'Application transferred too many scanlines' output. Perhaps the bug appears only when that message occurs. I searched for '...scanlines' and it is said to be jpeg related. The jpeg version in my environment is libjpeg-turbo-1.2.1-3.el6_5.x86_64; on redhat7 it is libjpeg-turbo-1.2.90-5.el7.x86_64.

(In reply to Victor Toso from comment #5)
> (In reply to shangxu from comment #4)
> > I have the same problem as 1172036.
> > I can reproduce it every time.
> > This time I disabled the sound card; the problem still occurs.
> > Which log do you need?
>
> I'm interested in how to reproduce this problem. rhbz#1172036 did not
> mention anything besides playing video with potplayer. I did that in
> different VMs for several days, without issue.
>
> Do you mean that it is important to change the potplayer transparency and
> also keep moving it around the desktop while playing the video in order
> for this bug to happen?
>
> How long does it usually take to crash? Minutes? Hours? Days?

The following method can also reproduce this problem:
1. Connect to the virtual machine through the Windows virt-viewer.
2. Use potplayer to play a video, then select another video in the playlist; switch between videos repeatedly.
3. Occasionally change the transparency of a video.

I changed the versions of spice-server and libjpeg; the problem still exists.

Created attachment 1096766: backtrace with steps

The catch here is that spice server makes qemu exit, so you have to connect gdb early and set a breakpoint. For spice-server-0.12.4-12.el6_7.1.x86_64, the function in question is this one:

    static void red_channel_remove_client(RedChannelClient *rcc)
    {
        if (!pthread_equal(pthread_self(), rcc->channel->thread_id)) {
            spice_warning("channel type %d id %d - "
                          "channel->thread_id (0x%lx) != pthread_self (0x%lx)."
                          "If one of the threads is != io-thread && != vcpu-thread, "
                          "this might be a BUG",
                          rcc->channel->type, rcc->channel->id,
                          rcc->channel->thread_id, pthread_self());
        }
        ring_remove(&rcc->channel_link);
        spice_assert(rcc->channel->clients_num > 0);
        rcc->channel->clients_num--;
        // TODO: should we set rcc->channel to NULL???
    }

I managed to hit the condition several times; it takes a varying amount of time to reproduce. I couldn't see any reliable trigger, but transparency adjustment does indeed seem to make the bug happen.

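To illustrate why a failed assertion inside spice-server takes the whole qemu process down (and why gdb has to be attached before it fires), here is a minimal sketch. It is not spice-server code: `ModelChannel`, `MODEL_ASSERT`, `model_remove_client` and `model_child_aborts` are hypothetical names, and the sketch assumes that a failed assertion ends in `abort()`. It does not claim to reproduce the root cause of this bug; it only shows that the process hosting the library is killed by SIGABRT once the counter check fails.

```c
#include <assert.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the channel bookkeeping; not the spice-server type. */
typedef struct ModelChannel {
    int clients_num;
} ModelChannel;

/* Stand-in for an aborting assertion. Assumption: a failed assertion calls abort(). */
#define MODEL_ASSERT(cond) do { if (!(cond)) abort(); } while (0)

/* Mirrors only the counter check and decrement of the quoted function. */
static void model_remove_client(ModelChannel *channel)
{
    MODEL_ASSERT(channel->clients_num > 0);
    channel->clients_num--;
}

/*
 * Performs `removals` removals on a channel holding `clients` clients inside a
 * child process, so the caller survives the abort.
 * Returns 1 if the child was killed by SIGABRT, 0 if it was not, -1 on error.
 */
static int model_child_aborts(int clients, int removals)
{
    pid_t pid = fork();
    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        ModelChannel channel = { clients };
        for (int i = 0; i < removals; i++) {
            model_remove_client(&channel);
        }
        _exit(0);
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) {
        return -1;
    }
    return WIFSIGNALED(status) && WTERMSIG(status) == SIGABRT;
}
```

Calling `model_child_aborts(1, 2)` removes one client more than the channel holds, so the child is terminated by SIGABRT; with qemu there is no separate child, so the VM itself goes away.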
Attached is gdb output with a full backtrace of all threads followed by the "step 10000" command. Note that this assertion is not present in the normal qemu log:
> ((null):19928): Spice-ERROR **: snd_worker.c:1088:spice_server_playback_get_buffer: assertion `playback_channel->base.active' failed

Addendum: To catch the log, I run:

    gdb --pid $(pgrep -f $VM_NAME) -x gdb_potplayer_commands

with gdb_potplayer_commands file contents:

    set logging file /var/log/libvirt/qemu/${VM_NAME}.log
    b red_channel.c:1800
    commands
    set logging on
    t a a bt full
    step 10000
    end
    continue

(aka teeing gdb output to the qemu log)

Is the backtrace helpful, or do we need some more heavyweight tools for getting information?

So, I think I've found a step-by-step way to reproduce this:
1-) connect to the w7 machine (not fullscreen)
2-) start potplayer and set transparency (with the slider in the top-right)
3-) start the video
4-) increase the size of remote-viewer (the widget itself) and wait for the guest to autoresize
5-) increase the size of potplayer in the guest
6-) decrease potplayer transparency (with the slider in the top-right)
7-) crash seems to happen after step 6 due to too little data in the jpeg.

I have a patch that avoids the crash but might leave glitches in the stream, so I'm waiting for feedback on it.

PS: It seems that this does not happen with the upstream qxl driver. I was only able to reproduce with rhevm-tools 3.5.9 so far.

So far, making spice-server not crash seems the best option for now. The stream code in spice-server needs improvements and for this reason it is disabled in RHEL7 [0], which is the probable reason for this crash not being reproducible there.

[0] https://bugzilla.redhat.com/show_bug.cgi?id=1294564#c3

With this fix applied I noticed that when the crash should happen the stream gets a bit slower, and if you move the player in the guest some glitches can be noticed.

Some context regarding the error messages:

1-) From libjpeg_turbo, "Application transferred too many scanlines" happens when the stream is bigger than what we set for the libjpeg encoder, so the encoder ignores part of the stream;

2-) From libjpeg_turbo, "Application transferred too few scanlines" happens when the stream is smaller than what we set in the libjpeg encoder, so the encoder does not have enough data, which causes the crash;

Case (2) is being handled, so the error will be avoided.

(In reply to Victor Toso from comment #14)
> So far, making spice-server not crash seems the best option for now. The
> stream code in spice-server needs improvements and for this reason it is
> disabled in RHEL7 [0], which is the probable reason for this crash not
> being reproducible there.
>
> [0] https://bugzilla.redhat.com/show_bug.cgi?id=1294564#c3
>
> With this fix applied I noticed that when the crash should happen the stream
> gets a bit slower, and if you move the player in the guest some glitches
> can be noticed.
>
> Some context regarding the error messages:
>
> 1-) From libjpeg_turbo, "Application transferred too many scanlines" happens
> when the stream is bigger than what we set for the libjpeg encoder, so the
> encoder ignores part of the stream;
>
> 2-) From libjpeg_turbo, "Application transferred too few scanlines" happens
> when the stream is smaller than what we set in the libjpeg encoder, so the
> encoder does not have enough data, which causes the crash;
>
> Case (2) is being handled, so the error will be avoided.

Does el7 avoid this problem by disabling mjpeg?
How should el6.5 users solve this problem? With a patch?

(In reply to shangxu from comment #16)
> Does el7 avoid this problem by disabling mjpeg?

On el7 the stream detection is disabled (in order to enable it, you must change the qemu command line).

> How should el6.5 users solve this problem? With a patch?

The patches are still under review and being tested, so I would recommend that customers wait for the release.
In any case I'll attach the proposed patch here.

Created attachment 1128510: proposal patch to avoid spice-server crash

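The two scanline messages explained in comment #14 can be modeled with a few lines of bookkeeping. The following is a simplified sketch, not libjpeg or spice-server code: `ModelEncoder`, `ModelResult` and the `model_*` functions are hypothetical names, and the rule that surplus rows are dropped with a warning while missing rows are a hard error at finish time is taken from the description in that comment.

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
    MODEL_OK = 0,
    MODEL_WARN_TOO_MANY, /* surplus rows were offered and ignored */
    MODEL_ERR_TOO_FEW    /* frame finished before all configured rows arrived */
} ModelResult;

/* Hypothetical encoder state: only the row accounting is modeled. */
typedef struct ModelEncoder {
    unsigned image_height;  /* rows the encoder was configured for */
    unsigned next_scanline; /* rows accepted so far */
} ModelEncoder;

static void model_start(ModelEncoder *enc, unsigned configured_height)
{
    assert(enc != NULL);
    enc->image_height = configured_height;
    enc->next_scanline = 0;
}

/* Offers `rows` rows; accepts only as many as still fit in the configured height. */
static ModelResult model_write_rows(ModelEncoder *enc, unsigned rows)
{
    unsigned remaining = enc->image_height - enc->next_scanline;
    unsigned taken = rows < remaining ? rows : remaining;
    enc->next_scanline += taken;
    return taken < rows ? MODEL_WARN_TOO_MANY : MODEL_OK;
}

/* Finishing with fewer rows than configured is the fatal case. */
static ModelResult model_finish(const ModelEncoder *enc)
{
    return enc->next_scanline < enc->image_height ? MODEL_ERR_TOO_FEW : MODEL_OK;
}

/* Encodes one frame whose real height may differ from the configured one. */
static ModelResult model_encode_frame(unsigned configured_height, unsigned stream_height)
{
    ModelEncoder enc;
    model_start(&enc, configured_height);
    ModelResult written = model_write_rows(&enc, stream_height);
    ModelResult finished = model_finish(&enc);
    return finished != MODEL_OK ? finished : written;
}
```

In this model a frame that grows after the encoder was configured only loses rows, while a frame that shrinks cannot be completed. That asymmetry matches the thread: the "too many scanlines" message shows up in the log, and the "too few scanlines" case is the one tied to the crash.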
(In reply to Frediano Ziglio from comment #19)
> See
> https://lists.freedesktop.org/archives/spice-devel/2016-February/026852.html

Indeed, it seems that it could be a better way to avoid the crash. Tested, and performance seems better, as the stream is not as slow as with the previous patch.
I guess that glitches could still happen, but as I said in comment #14, I would prefer to avoid the crash of spice-server now and fix the sized-stream upstream.

This bug was accidentally moved from POST to MODIFIED via an error in automation, please see mmccune with any questions

Moving to 6.9. Patch from comment #20 is upstream [0]. Moving to ASSIGNED to double check it early in the next development phase.

[0] https://cgit.freedesktop.org/spice/spice/commit/?id=1b69198c4ec73110251e0ebf969275e98950808e

Backported the following patches and tested following comment #13; no more crashes.

28f2e425c4e9d86570970d49a7a3eee43e24134e
Francois Gouget (1):
    streaming: Rework red_marshall_stream_data a bit

42a5794845d0ee4b34ac523b8ad5a6c453d2203c
Francois Gouget (1):
    streaming: Remove the Drawable.sized_stream field

032cb0ce85b44da3ee5d0308909164452e25bff5
Francois Gouget (1):
    mjpeg: Use src_area as the authoritative source for the frame dimensions

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2017-0588.html
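
For readers following the last backported patch title, "mjpeg: Use src_area as the authoritative source for the frame dimensions", here is a minimal sketch of the idea as the title describes it. It is an illustration only, not the patch: `ModelRect`, `ModelFrameSize`, `model_frame_size` and `model_dimensions_consistent` are hypothetical names, and the assumption is that the encoder should be configured from the same rectangle that the rows are read from, rather than from a size tracked separately.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical rectangle in pixel coordinates; right and bottom are exclusive. */
typedef struct ModelRect {
    int32_t left;
    int32_t top;
    int32_t right;
    int32_t bottom;
} ModelRect;

typedef struct ModelFrameSize {
    int32_t width;
    int32_t height;
} ModelFrameSize;

/* Derives the frame size from the source area, the single source of truth here. */
static ModelFrameSize model_frame_size(const ModelRect *src_area)
{
    ModelFrameSize size = {
        src_area->right - src_area->left,
        src_area->bottom - src_area->top
    };
    return size;
}

/*
 * Returns 1 when a separately tracked size agrees with the source area.
 * A mismatch is the situation in which the encoder would receive more or
 * fewer rows than it was configured for.
 */
static int model_dimensions_consistent(const ModelRect *src_area,
                                       int32_t tracked_width,
                                       int32_t tracked_height)
{
    ModelFrameSize size = model_frame_size(src_area);
    return size.width == tracked_width && size.height == tracked_height;
}
```

If the encoder is always configured with the result of `model_frame_size`, the configured height and the number of rows fed to it cannot drift apart, which is the condition behind both scanline messages discussed in this bug.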