Bug 1455451
Summary: VM with pty based serial console hangs if PTS buffer is full

Product: Red Hat Enterprise Linux 7
Component: qemu-kvm
Version: 7.3
Status: CLOSED ERRATA
Severity: urgent
Priority: medium
Reporter: Sergio Lopez <slopezpa>
Assignee: pagupta
QA Contact: Sitong Liu <siliu>
Docs Contact: Yehuda Zimmerman <yzimmerm>
CC: agurenko, chayang, cww, dvacek, ehabkost, hhuang, juzhang, mdeng, michen, miturria, pagupta, pbonzini, qzhang, rbalakri, slopezpa, virt-bugs, virt-maint, xfu
Target Milestone: rc
Target Release: ---
Keywords: TestOnly
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: qemu-kvm-1.5.3-141.el7
Doc Type: Bug Fix
Doc Text:
Guests accessed using a serial console no longer become unresponsive
Previously, if a client opened a host-side pseudoterminal device (pty) of a KVM guest's pty serial console and did not read from it, the guest in some cases became unresponsive because of blocking read/write calls. With this update, the host-side pty open mode was set to non-blocking. As a result, the guest machine does not become unresponsive in the described scenario.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-04-10 14:32:19 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1452067
Bug Blocks: 1420851, 1469590
Attachments:
Description
Sergio Lopez
2017-05-25 08:53:48 UTC
Created attachment 1282164 [details]
PoC for making the serial device lossy
PoC patch which changes two things:
1) Adds the O_NONBLOCK flag to the PTS file descriptor.
2) Changes the retry logic in serial_xmit to retry immediately when qemu_chr_fe_write fails and, once MAX_XMIT_RETRY is reached, to set the UART_LSR_THRE flag in lsr and stop trying.
With this, the Guest no longer hangs regardless of the state of the PTS on the Host, at the cost of potentially losing some characters if the FIFO is full and cannot be written to the PTS.
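For context on item 1, the host-side behaviour the PoC works around can be reproduced outside QEMU. The sketch below is not part of the attached patch; it only illustrates that write() on a pty master blocks once the kernel pty buffer fills while nobody reads the slave side, and fails with EAGAIN instead once O_NONBLOCK is set:

#!/usr/bin/env python3
# Illustration only (not the attached PoC): writes to a pty master block once
# the kernel pty buffer is full and nobody reads the slave, unless the fd is
# switched to non-blocking mode, in which case they fail with EAGAIN.
import errno
import fcntl
import os
import pty

master, slave = pty.openpty()   # slave stays open but is never read

# Switch the master to non-blocking so the loop below terminates instead of
# hanging the way a blocking write from QEMU's chardev would.
flags = fcntl.fcntl(master, fcntl.F_GETFL)
fcntl.fcntl(master, fcntl.F_SETFL, flags | os.O_NONBLOCK)

written = 0
try:
    while True:
        written += os.write(master, b"x" * 1024)
except OSError as e:
    if e.errno != errno.EAGAIN:
        raise
    print("pty buffer filled after ~%d bytes; a blocking write would now hang" % written)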
Created attachment 1282166 [details]
Python script to keep the PTS open without reading from it
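The attached script itself is not reproduced here; a minimal stand-in with the same effect might look like the following (the /dev/pts/2 path is only an example; use the pty reported for the domain, e.g. by `virsh ttyconsole <domain>`):

#!/usr/bin/env python3
# Hypothetical stand-in for attachment 1282166: open the guest's host-side
# pty and keep it open without ever reading from it, so the pty buffer
# eventually fills and QEMU's serial output can no longer be drained.
import os
import sys
import time

# Example path only; take the real one from the domain XML or
# `virsh ttyconsole <domain>`.
pts_path = sys.argv[1] if len(sys.argv) > 1 else "/dev/pts/2"

fd = os.open(pts_path, os.O_RDWR | os.O_NOCTTY)
print("Holding %s open without reading; Ctrl+C to release." % pts_path)
try:
    while True:
        time.sleep(60)          # deliberately never read(fd, ...)
except KeyboardInterrupt:
    pass
finally:
    os.close(fd)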
Hello Sergio,

Just looking into this. I want to ask you a question based on the 'bt' you provided in #comment 0. It looks like it's a blocking call?

#0 0x00007f5edf4ac43d in write () at -------------> here
#1 0x00007f5ee0757efc in g_io_unix_write () from /lib64/libglib-2.0.so.0
#2 0x00007f5ee070a1ed in g_io_channel_write_chars () from /lib64/libglib-2.0.so.0
#3 0x00007f5ee22ea44e in io_channel_send (fd=0x7f5ee4a7c3f0, buf=0x7f5ee4f2c894, len=1) at qemu-char.c:787
#4 0x00007f5ee224c153 in serial_xmit (chan=chan@entry=0x0, cond=cond@entry=G_IO_OUT,

So, there could be two reasons for that:
1] It's a blocking call.
2] As spotted by you, the retry logic could be wrong.

For 2] I went through the code. If we compare with the 'bt' above, the 'qemu_chr_fe_write' function is called only once by 'serial_xmit' to check if we can write len '1'.

    } else if (qemu_chr_fe_write(&s->chr, &s->tsr, 1) != 1 &&

If it fails (could not write, or no one is reading data from the serial device) we add a callback. This single character write might be stuck because of the blocking call?

For 1] I went through your patch as well. I can see you are returning without setting a callback; I am not sure whether this is the correct thing to do here. Also, you have changed the blocking fd to non-blocking. I want to know: have you only tried with just changing from blocking to non-blocking fd? In BZ 1451470 I could see Fam has mentioned the below commit is already upstream.

fac6688a18574b6f2caa8c699a936e729ed53ece

Thanks,
Pankaj
(In reply to pagupta from comment #5)
> I want to know: have you only tried with just changing from blocking to
> non-blocking fd? In BZ 1451470 I could see Fam has mentioned the below
> commit is already upstream.

Hi Pankaj,

Yes, I've first tried only with adding the O_NONBLOCK flag. With it, the write() does indeed return, but the VM still manifested similar symptoms (no network response to SSH nor ping), until the console is flushed using "virsh console".

Looking a bit into this, this is the stack trace generated by the guest kernel:

<snip>
[ 2120.755147] INFO: rcu_sched self-detected stall on CPU { 6} (t=74571 jiffies g=157201 c=157200 q=2275)
[ 2120.755148] Task dump for CPU 6:
[ 2120.755150] swapper/6 R running task 0 0 1 0x00000008
[ 2120.755153] ffff88007bc8edd0 7d015a79700ada28 ffff88007d383738 ffffffff810c47e8
[ 2120.755154] 0000000000000006 ffffffff81a1e6c0 ffff88007d383750 ffffffff810c80c9
[ 2120.755155] 0000000000000007 ffff88007d383780 ffffffff81137960 ffff88007d3901c0
[ 2120.755156] Call Trace:
[ 2120.755165] <IRQ> [<ffffffff810c47e8>] sched_show_task+0xa8/0x110
[ 2120.755167] [<ffffffff810c80c9>] dump_cpu_task+0x39/0x70
[ 2120.755171] [<ffffffff81137960>] rcu_dump_cpu_stacks+0x90/0xd0
[ 2120.755173] [<ffffffff8113b0b2>] rcu_check_callbacks+0x442/0x720
[ 2120.755178] [<ffffffff810eb9ac>] ? update_wall_time+0x26c/0x6c0
[ 2120.755180] [<ffffffff810f3610>] ? tick_sched_handle.isra.13+0x60/0x60
[ 2120.755184] [<ffffffff81099697>] update_process_times+0x47/0x80
[ 2120.755185] [<ffffffff810f35d5>] tick_sched_handle.isra.13+0x25/0x60
[ 2120.755186] [<ffffffff810f3651>] tick_sched_timer+0x41/0x70
[ 2120.755189] [<ffffffff810b4d72>] __hrtimer_run_queues+0xd2/0x260
[ 2120.755191] [<ffffffff810b5310>] hrtimer_interrupt+0xb0/0x1e0
[ 2120.755195] [<ffffffff81050ff7>] local_apic_timer_interrupt+0x37/0x60
[ 2120.755200] [<ffffffff81699bcf>] smp_apic_timer_interrupt+0x3f/0x60
[ 2120.755201] [<ffffffff8169811d>] apic_timer_interrupt+0x6d/0x80
[ 2120.755203] [<ffffffff810889ee>] ? console_unlock+0x46e/0x4b0
[ 2120.755205] [<ffffffff81088df4>] vprintk_emit+0x3c4/0x510
[ 2120.755206] [<ffffffff81089169>] vprintk_default+0x29/0x40
[ 2120.755208] [<ffffffff816803bb>] printk+0x5e/0x75
[ 2120.755211] [<ffffffff8131ee43>] ? ___ratelimit+0x93/0x100
[ 2120.755214] [<ffffffff815ac1bf>] ip_handle_martian_source.isra.35+0x6f/0xd0
[ 2120.755216] [<ffffffff815f3c44>] ? fib_validate_source+0x64/0xe0
[ 2120.755217] [<ffffffff815ae1ba>] ip_route_input_slow+0x75a/0xca0
[ 2120.755240] [<ffffffff815ae746>] ip_route_input_noref+0x46/0x320
[ 2120.755243] [<ffffffff815b102c>] ip_rcv_finish+0xbc/0x350
[ 2120.755244] [<ffffffff815b1986>] ip_rcv+0x2b6/0x410
[ 2120.755245] [<ffffffff815b0f70>] ? inet_del_offload+0x40/0x40
[ 2120.755249] [<ffffffff8156fe52>] __netif_receive_skb_core+0x582/0x800
[ 2120.755253] [<ffffffff815dc974>] ? tcp4_gro_receive+0x134/0x1b0
[ 2120.755255] [<ffffffff815700e8>] __netif_receive_skb+0x18/0x60
[ 2120.755256] [<ffffffff81570170>] netif_receive_skb_internal+0x40/0xc0
[ 2120.755258] [<ffffffff81571288>] napi_gro_receive+0xd8/0x130
[ 2120.755276] [<ffffffffa00c79d5>] virtnet_poll+0x265/0x750 [virtio_net]
[ 2120.755277] [<ffffffff81570910>] net_rx_action+0x170/0x380
[ 2120.755281] [<ffffffff8108f63f>] __do_softirq+0xef/0x280
[ 2120.755282] [<ffffffff81698f5c>] call_softirq+0x1c/0x30
[ 2120.755285] [<ffffffff8102d365>] do_softirq+0x65/0xa0
[ 2120.755286] [<ffffffff8108f9d5>] irq_exit+0x115/0x120
[ 2120.755287] [<ffffffff81699bd5>] smp_apic_timer_interrupt+0x45/0x60
[ 2120.755288] [<ffffffff8169811d>] apic_timer_interrupt+0x6d/0x80
[ 2120.755293] <EOI> [<ffffffff81060fe6>] ? native_safe_halt+0x6/0x10
[ 2120.755297] [<ffffffff810347bf>] default_idle+0x1f/0xc0
[ 2120.755298] [<ffffffff81035106>] arch_cpu_idle+0x26/0x30
[ 2120.755300] [<ffffffff810e82f5>] cpu_startup_entry+0x245/0x290
[ 2120.755302] [<ffffffff8104f09a>] start_secondary+0x1ba/0x230
</snip>

This stack trace is definitely different from the one it generates running on a QEMU instance without the O_NONBLOCK patch, but as I said above, from a final user perspective the issue still persists.

I think the problem comes from the retry logic:

<snip from hw/char/serial.c:234>
    do {
        assert(!(s->lsr & UART_LSR_TEMT));
        if (s->tsr_retry == 0) {
            assert(!(s->lsr & UART_LSR_THRE));

            if (s->fcr & UART_FCR_FE) {
                assert(!fifo8_is_empty(&s->xmit_fifo));
                s->tsr = fifo8_pop(&s->xmit_fifo);
                if (!s->xmit_fifo.num) {
                    s->lsr |= UART_LSR_THRE;
                }
            } else {
                s->tsr = s->thr;
                s->lsr |= UART_LSR_THRE;
            }
            if ((s->lsr & UART_LSR_THRE) && !s->thr_ipending) {
                s->thr_ipending = 1;
                serial_update_irq(s);
            }
        }

        if (s->mcr & UART_MCR_LOOP) {
            /* in loopback mode, say that we just received a char */
            serial_receive1(s, &s->tsr, 1);
        } else if (qemu_chr_fe_write(s->chr, &s->tsr, 1) != 1 &&
                   s->tsr_retry < MAX_XMIT_RETRY) {
            assert(s->watch_tag == 0);
            s->watch_tag = qemu_chr_fe_add_watch(s->chr, G_IO_OUT|G_IO_HUP,
                                                 serial_watch_cb, s);
            if (s->watch_tag > 0) {
                s->tsr_retry++;
                return;
            }
        }
        s->tsr_retry = 0;

        /* Transmit another byte if it is already available. It is only
           possible when FIFO is enabled and not empty. */
    } while (!(s->lsr & UART_LSR_THRE));

    s->last_xmit_ts = qemu_get_clock_ns(vm_clock);
    s->lsr |= UART_LSR_TEMT;
}
</snip>

Combined with this behavior from the VM kernel:

<snip from drivers/tty/serial/8250/8250_core.c>
static void serial8250_console_putchar(struct uart_port *port, int ch)
{
    struct uart_8250_port *up =
        container_of(port, struct uart_8250_port, port);

    wait_for_xmitr(up, UART_LSR_THRE);
    serial_port_out(port, UART_TX, ch);
}
</snip>

My hypothesis for what's happening is:

- From QEMU's perspective: "qemu_chr_fe_add_watch" starts failing when the PTS is full from the Host's perspective. Each time it fails, tsr_retry is incremented, installing the callback and returning. When MAX_XMIT_RETRY is reached, tsr_retry is set to 0, returning to the beginning of the loop, which will discard the current character and take a new one from xmit_fifo. But, and this is especially important, UART_LSR_THRE will not be set unless xmit_fifo is empty.

- From the VM kernel's perspective: it writes to the I/O port, and waits a bit for UART_LSR_THRE. As it doesn't arrive in time, the wait times out and it writes another character to the I/O port, increasing xmit_fifo.

I think the combination of both perspectives means that writes to the serial device succeed, although _very_ slowly. So slowly that from outside the VM it looks exactly the same as when it was blocked on write().

The PoC attached to the case just makes writes to the serial device fail or succeed quickly, to avoid triggering the above situation.

Sergio.
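To make the hypothesis above easier to follow, here is a small toy model (plain Python, not QEMU code) of the quoted serial_xmit()/serial_ioport_write() interplay, under the stated assumption that qemu_chr_fe_write() never accepts a byte (host pty full) and the installed G_IO_OUT watch never fires (nobody drains the pty). It shows the transmit FIFO growing while THRE stays clear, which is exactly the condition the guest's wait_for_xmitr() keeps timing out on:

# Toy model, illustrative only.
MAX_XMIT_RETRY = 4

class ToySerial:
    def __init__(self):
        self.xmit_fifo = []      # stands in for s->xmit_fifo
        self.tsr = None          # stands in for s->tsr
        self.tsr_retry = 0
        self.thre = False        # the UART_LSR_THRE bit the guest polls

    def backend_write(self, ch):
        return 0                 # pty buffer full: the 1-byte write never succeeds

    def guest_writes_char(self, ch):
        # mirrors the THR write path in serial_ioport_write()
        self.xmit_fifo.append(ch)
        self.thre = False
        if self.tsr_retry == 0:
            self.serial_xmit()

    def serial_xmit(self):
        while True:
            if self.tsr_retry == 0:
                self.tsr = self.xmit_fifo.pop(0)
                if not self.xmit_fifo:
                    self.thre = True
            if self.backend_write(self.tsr) != 1 and self.tsr_retry < MAX_XMIT_RETRY:
                self.tsr_retry += 1      # watch installed; give up until it fires
                return
            self.tsr_retry = 0           # MAX_XMIT_RETRY reached: current char is dropped
            if self.thre:
                break

s = ToySerial()
for ch in "a kernel log line written to the serial console":
    s.guest_writes_char(ch)
print("queued: %d chars, THRE: %s" % (len(s.xmit_fifo), s.thre))
# The FIFO keeps growing and THRE stays False, so in the real system the
# guest's wait_for_xmitr() times out on every subsequent character.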
(In reply to Sergio Lopez from comment #6)
> [...]
> My hypothesis for what's happening is:
>
> - From QEMU's perspective: "qemu_chr_fe_add_watch" starts failing when the
> PTS is full from the Host's perspective. Each time it fails, tsr_retry is
> incremented, installing the callback and returning. When MAX_XMIT_RETRY is
> reached, tsr_retry is set to 0, returning to the beginning of the loop,
> which will discard the current character and take a new one from xmit_fifo.
> But, and this is especially important, UART_LSR_THRE will not be set unless
> xmit_fifo is empty.

Yes. As we are not flushing, we just add a new callback if one is not already set up. Suppose the *first* char write from the guest fails: we add a watch and callback, and we also increment 'tsr_retry'. As there is no activity (nobody reading/writing) on the file handle after this, the callback won't be called?

serial_xmit()
    ...
    else if (qemu_chr_fe_write(&s->chr, &s->tsr, 1) != 1 &&
             s->tsr_retry < MAX_XMIT_RETRY) {
        assert(s->watch_tag == 0);
        s->watch_tag = qemu_chr_fe_add_watch(&s->chr, G_IO_OUT | G_IO_HUP,
                                             serial_watch_cb, s); -------->add callback
        if (s->watch_tag > 0) {
            s->tsr_retry++;   // increment & return
            return;
        }
    ....
    }

In function 'serial_ioport_write', we are checking for tsr_retry==0:

    if (s->tsr_retry == 0) {
        serial_xmit(s);
    }

So, if there is no activity on the file FD we are not flushing the data at all.

> - From the VM kernel's perspective: it writes to the I/O port, and waits a
> bit for UART_LSR_THRE. As it doesn't arrive in time, the wait times out and
> it writes another character to the I/O port, increasing xmit_fifo.
>
> I think the combination of both perspectives means that writes to the
> serial device succeed, although _very_ slowly. So slowly that from outside
> the VM it looks exactly the same as when it was blocked on write().

...which results in RCU stalls and other bad behaviour.

> The PoC attached to the case just makes writes to the serial device fail or
> succeed quickly, to avoid triggering the above situation.

Your fix is working because you have removed the 'callback' and are calling 'serial_xmit' every time, which might not be the correct thing to do, as callbacks are handled by the main thread when there is activity on the file FD.

> Sergio.

Thanks,
Pankaj
Is the issue reproducible using "virsh console" or the virDomainOpenConsole() libvirt API? I'm not sure it is valid usage to open the pty device directly and not read data as soon as it's available.

>> - From QEMU's perspective: "qemu_chr_fe_add_watch" starts failing when the
>> PTS is full from the Host's perspective. Each time it fails, tsr_retry is
>> incremented, installing the callback and returning. When MAX_XMIT_RETRY is
>> reached, tsr_retry is set to 0, returning to the beginning of the loop,
>> which will discard the current character and take a new one from xmit_fifo.

I don't think this is correct. Why would qemu_chr_fe_add_watch fail?

>> But, and this is especially important, UART_LSR_THRE will not be set unless
>> xmit_fifo is empty.

Right, THRE is "transmitter hold register empty".

Commit fac6688a should definitely be backported (z-stream too).

To some extent this could be considered also a kernel issue; printk inside rcu_read_lock doesn't seem to be a good idea. It is probably very widespread, especially for pr_debug and pr_info messages, but martian source does it at pr_warn.

But I agree with Eduardo: if you open the pty device, you should be reading it.

(In reply to Paolo Bonzini from comment #9)
> To some extent this could be considered also a kernel issue; printk inside
> rcu_read_lock doesn't seem to be a good idea. It is probably very
> widespread, especially for pr_debug and pr_info messages, but martian source
> does it at pr_warn.

Not at all. printk is perfectly valid in any context, even in NMI or interrupts-disabled sections. And if you check the code, the martian source printk is rate-limited to twice a second, IIRC.

(In reply to Eduardo Habkost from comment #8)
> Is the issue reproducible using "virsh console" or the
> virDomainOpenConsole() libvirt API? I'm not sure it is valid usage to open
> the pty device directly and not read data as soon as it's available.

Yes, it is valid to not read data immediately. For example, when using virDomainOpenConsole, libvirtd will open the PTY, read data from it, and transmit that to the client. If the outgoing TCP buffer for the client is full, libvirtd can stop reading further data from the PTY until there is more space to transmit, rather than queuing an arbitrary amount of data in memory waiting for transmission.
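For reference, the flow described above (a client consuming console data at its own pace) corresponds to reading the console through libvirt rather than opening the raw pty. A minimal sketch using the libvirt Python bindings; the domain name "guest" and the 1024-byte read size are placeholders:

#!/usr/bin/env python3
# Sketch: read the serial console through virDomainOpenConsole(), letting
# libvirtd manage the host-side pty, instead of opening /dev/pts/N directly.
import libvirt

conn = libvirt.open("qemu:///system")
dom = conn.lookupByName("guest")          # placeholder domain name

stream = conn.newStream(0)
dom.openConsole(None, stream, 0)          # None = first console device

try:
    while True:
        data = stream.recv(1024)          # read at the client's own pace
        if not data:
            break
        print(data.decode(errors="replace"), end="")
finally:
    stream.abort()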
Note I'm going to pick commit 5c2b22f5bf ("Do not hang on full PTY", 2017-06-06) for bug 1452067, because it is necessary for QE to test that scenario easily.

Wrong commit id in comment 14, the right one is fac6688a185.

QE tried to verify this bug with libvirt.

- Packages in test on host:
qemu-kvm-1.5.3-149.el7.x86_64
kernel 3.10.0-799.el7.x86_64
libvirt-3.9.0-4.el7.x86_64

- Snippet of libvirt xml:
...
<console type='pty' tty='/dev/pts/2'>
  <source path='/dev/pts/2'/>
  <target type='serial' port='0'/>
  <alias name='serial0'/>
</console>
...

- Test steps according to bug 1451470 comment 26 (without steps 3) and 6)):

1. Create and launch a VM with a pty serial console.

2. On the Host, log into the guest with 'virsh console'.

3. On the Guest, add the 'quiet' option to the kernel command line and reboot the guest.

[root@guest ~]# cat /proc/sys/kernel/printk
4 4 1 7

4. On the Guest, enable logging of martian packets.

[root@localhost ~]# sysctl net.ipv4.conf.all.log_martians=1
net.ipv4.conf.all.log_martians = 1

5. On the Host, open an SSH connection to the Guest, and run date in a loop. Leave this running in another terminal:

[root@localhost ~]# while sleep 1; do date ; done

6. On the Host, use rawsock.py to generate martian packets.

[root@dhcp-8-163 home]# python rawsock.py 192.168.122.155 192.168.122.255

- Actual Result:

1. No guest hang was hit after around 10 minutes.

2. If the 'quiet' option was not set,

[root@guest ~]# cat /proc/sys/kernel/printk
7 4 1 7

no hang was hit either after around 10 minutes.

3. If 'virsh console' is quit while it is printing the log, no hang was hit after around 10 minutes.

-------

Hi Pankaj, Sergio Lopez,
Do you think this bug can be verified like above? If I missed anything or did something wrong, please let me know. Thanks.

Best regards,
Sitong
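The rawsock.py script used in step 6 above is not attached to this bug. A stand-in that provokes the same "martian source" log messages could spoof an address the guest considers impossible as the source of broadcast packets; the sketch below uses Scapy and assumes the two command-line arguments are the source to spoof (the guest's own address) and the broadcast destination, which is only an interpretation of how the original script is invoked:

#!/usr/bin/env python3
# Hypothetical stand-in for rawsock.py: flood the guest with ICMP packets
# whose source address is the guest's own IP, which the guest logs as
# martian sources once net.ipv4.conf.all.log_martians=1 is set.
# Requires scapy and root privileges.
from scapy.all import IP, ICMP, send

guest_ip = "192.168.122.155"      # source to spoof (assumed: the guest's own address)
broadcast = "192.168.122.255"     # destination used in the test

# loop=1 keeps sending until interrupted; inter throttles the rate slightly.
send(IP(src=guest_ip, dst=broadcast) / ICMP(), loop=1, inter=0.01, verbose=False)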
Hi Sitong,

(In reply to Sitong Liu from comment #25)
> QE tried to verify this bug with libvirt.
> [...]
> - Actual Result:
>
> 1. No guest hang was hit after around 10 minutes.
>
> 2. If the 'quiet' option was not set, no hang was hit either after around
> 10 minutes.
>
> 3. If 'virsh console' is quit while it is printing the log, no hang was hit
> after around 10 minutes.

If you are not seeing a hang when you close virsh console, or when you don't open it at all, it confirms the bug is fixed and that libvirt takes care of the client socket handling.

> Hi Pankaj, Sergio Lopez,
> Do you think this bug can be verified like above? If I missed anything or
> did something wrong, please let me know. Thanks.
>
> Best regards,
> Sitong

Thanks Pankaj.

Verified this bug with the following packages:
qemu-kvm-1.5.3-149.el7.x86_64
kernel 3.10.0-799.el7.x86_64
libvirt-3.9.0-4.el7.x86_64

Test scenarios:
1. open virsh console -- no hang
2. open virsh console + 'quiet' option -- no hang
3. don't open virsh console -- no hang
4. don't open virsh console + 'quiet' option -- no hang
5. close virsh console when printing log -- no hang
6. close virsh console when printing + 'quiet' option -- no hang

According to these results, this issue has been fixed; setting this bug as verified.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:0816