Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1432297

Summary: virtio-serial port cannot transfer data when booting >= 33 ports on 1 pci
Product: Red Hat Enterprise Linux 8
Reporter: Yu Wang <wyu>
Component: qemu-kvm
Assignee: Virtualization Maintenance <virt-maint>
qemu-kvm sub component: Devices
QA Contact: liunana <nanliu>
Status: CLOSED WONTFIX
Docs Contact:
Severity: low
Priority: low
CC: chayang, juzhang, knoel, lcapitulino, lijin, mdeng, nanliu, pagupta, phou, qzhang, rbalakri, virt-maint, wyu, xfu, xiagao
Version: 8.0
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-08-02 11:05:14 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1473046, 1481076, 1558351
Attachments: virtio-serial-37-ports

Description Yu Wang 2017-03-15 02:45:16 UTC
Created attachment 1263169 [details]
virtio-serial-37-ports

Description of problem:
If more than 32 ports are added to one virtio-serial-pci device, data can only be transferred between guest and host via the first 32 ports; transferring data via port 33 and beyond fails.

Version-Release number of selected component (if applicable):
kernel-3.10.0-594.el7.x86_64
qemu-kvm-rhev-2.8.0-6.el7.x86_64
seabios-1.10.1-2.el7.x86_64
virtio-win-prewhql-133

How reproducible:
100%

Steps to Reproduce:
1. Boot the guest with 33 or more ports on one virtio-serial-pci device (for the full boot command, see the attachment).

2. On the host:
# nc -U /tmp/helloworld36

3. On the guest, copy a txt file via port36:
# copy 1.txt \\.\Global.\com.redhat.rhevm.vdsm36


Actual results:
After step 3, the guest cmd returns: "The system cannot find the file specified. 0 file(s) copied."


Expected results:
The file should be transferred successfully.

Additional info:
1. Transferring data via ports <= 32 passes.

Comment 2 Ladi Prosek 2017-03-15 11:03:54 UTC
See https://bugzilla.redhat.com/show_bug.cgi?id=1370351#c14 for a brief analysis of this bug.

This would be easy to fix if it weren't for migration: queuing control messages host-side means extra state to migrate. Migrating from a newer to an older QEMU and dropping that state should be fine, though, because it is no different from the current behavior.

Comment 3 pagupta 2017-03-23 09:10:14 UTC
Hello,

It initially looked like this is an effect of change b829c2a9, which raised the default max queue limit from 64 to 1024. I think 'virtio-net' supports up to 1024 queues for newer guests since 1c0fbf. But 'virtio-serial' still supports a maximum of 64 queues, i.e. 32 for host<==>guest communication.

I am not sure whether we need to support a larger number of queues for 'virtio-serial'. It depends on the scalability of the data load; just limiting it to 64 queues, i.e. 32 ports, should be enough at this point in time.

VIRTIO_QUEUE_MAX 1024           
max_supported_ports = VIRTIO_QUEUE_MAX / 2 - 1;   

/* Each port takes 2 queues, and one pair is for the control queue */

// So, (1024/2 - 1) = 511 max ports is what we see in the logs, but we are actually supporting only 128*2 queues
...
    vser->bus.max_nr_ports = vser->serial.max_virtserial_ports;
    vser->ivqs = g_malloc(vser->serial.max_virtserial_ports
                          * sizeof(VirtQueue *));
    vser->ovqs = g_malloc(vser->serial.max_virtserial_ports
                          * sizeof(VirtQueue *));
...
// Though we are allocating queues equal to the number of supported ports the user gives (max 1024),
// we are initializing only 128*2 queues each for guest-to-host & host-to-guest communication, plus a pair of 32*2 control queues

vser->ports_map = g_malloc0(((vser->serial.max_virtserial_ports + 31) / 32)
    * sizeof(vser->ports_map[0]));

Are we allocating ((32 + 31) / 32) = 1 uint32 (32 bits)?

I think here we need to do this:

* Check the maximum number of queues we want to support for virtio-serial and
  limit it accordingly.

* Allocate sender, receiver and control queues accordingly

* Correct 'ports_map' allocation for monitoring active ports.

Thanks,
Pankaj

Comment 4 Ladi Prosek 2017-03-23 09:39:55 UTC
(In reply to pagupta from comment #3)
> //Though we are allocating queues equal to number of supported ports which
> user gives (max 1024).
> // But initializing only 128*2 queues each for guest to host & host to guest
> communication and pair of 32*2 control queues

Aren't you confusing the number of queues with their size here? The 128 for data queues and 32 for control queues, as in

  vser->ivqs[0] = virtio_add_queue(vdev, 128, handle_input);

is the max number of buffers on the queue. It does not mean that we're limited to 128 ports.
 
> vser->ports_map = g_malloc0(((vser->serial.max_virtserial_ports + 31) / 32)
>     * sizeof(vser->ports_map[0]));
> 
> Are we allocating ((32 + 31) /32) = 1 uint32 = (32 bits) 

Yes, but only with the default max_ports. serial.max_virtserial_ports is initialized based on the "max_ports" property. Check the command line in bug 1370351:

  -device virtio-serial-pci,id=virtio-serial0,disable-legacy=off,disable-modern=off,max_ports=511

> I think here we need to do this:
> 
> * Check how many queues at the max we want to support for virtio-serial and  
>   accordingly limit it.

We already support 1024 queues which is the global virtio limit.
 
> * Allocate sender, receiver and control queues accordingly

Already done. The size of data queues is 128. This is to a large extent arbitrary and may be adjusted but it is completely unrelated to this bug.

The size of the control queues is 32 and it is related to this bug: if you increase it to 512, you will be able to pass all 512 VIRTIO_CONSOLE_PORT_ADD control messages on startup. But it's not a good solution, for multiple reasons:

* larger queues mean more guest memory required
* it will need to be adjusted every time the limit on the number of queues is increased
* it still does not guarantee that the queue will not overflow, i.e. we may still be dropping control messages if the guest is busy and can't process them fast enough

The right solution is to queue control messages in QEMU. The starting point here is send_control_msg and send_control_event in virtio-serial-bus.c, whose return value is currently ignored. And it shouldn't be.
 
> * Correct 'ports_map' allocation for monitoring active ports.

It appears to be already correct.

Thanks!
Ladi

Comment 5 pagupta 2017-03-23 12:15:10 UTC
(In reply to Ladi Prosek from comment #4)
> (In reply to pagupta from comment #3)
> > //Though we are allocating queues equal to number of supported ports which
> > user gives (max 1024).
> > // But initializing only 128*2 queues each for guest to host & host to guest
> > communication and pair of 32*2 control queues
> 
> Aren't you mistaking the number of queues with their size here? The 128 for
> data queues and 32 for control queues as in
> 
>   vser->ivqs[0] = virtio_add_queue(vdev, 128, handle_input);
> 
> is the max number of buffers on the queue. It does not mean that we're
> limited to 128 ports.

right.

>  
> > vser->ports_map = g_malloc0(((vser->serial.max_virtserial_ports + 31) / 32)
> >     * sizeof(vser->ports_map[0]));
> > 
> > Are we allocating ((32 + 31) /32) = 1 uint32 = (32 bits) 
> 
> Yes, but only with the default max_ports. serial.max_virtserial_ports is
> initialized based on the "max_ports" property. Check the command line in bug
> 1370351:
> 
>   -device
> virtio-serial-pci,id=virtio-serial0,disable-legacy=off,disable-modern=off,
> max_ports=511

yes.

> 
> > I think here we need to do this:
> > 
> > * Check how many queues at the max we want to support for virtio-serial and  
> >   accordingly limit it.
> 
> We already support 1024 queues which is the global virtio limit.
>  
> > * Allocate sender, receiver and control queues accordingly
> 
> Already done. The size of data queues is 128. This is to a large extent
> arbitrary and may be adjusted but it is completely unrelated to this bug.
> 
> The size of control queues is 32 and it is related to this bug. I.e. if you
> increase this to 512, you will be able to pass all 512
> VIRTIO_CONSOLE_PORT_ADD control messages on startup. But it's not a good
> solution, for multiple reasons:
> 
> * larger queues means more guest memory required
> * it will need to be adjusted every time the limit on number of queues is
> increased
> * still does not guarantee that the queue will not overflow - i.e. we may
> still be dropping control messages if the guest is busy and can't process
> them fast enough
> 
> The right solution is to queue control messages in QEMU. The starting point
> here is send_control_msg and send_control_event in virtio-serial-bus.c whose
> return value is currently ignored. And it shouldn't be.

right. Thanks for explaining.

this is what we do with virtio-net as well, and of course we have to adjust when we increase the number of queues.
    n->ctrl_vq = virtio_add_queue(vdev, 64, virtio_net_handle_ctrl);

I agree queuing on the QEMU side seems better to avoid loss of messages; we can then place min(available free slots in the virtqueue, available messages) onto the virtqueue, but this brings in another layer.

I am just thinking: if virtio-net uses a control queue size of 64 without any issue, 64 should also be sufficient for virtio-serial if we are also consuming the control messages.

Another thing I noticed is in the functions control_in() (empty) and control_out().

control_out() --> guest-to-host notifier
----------
We pop buffers from the virtqueue, and only after popping/processing/resetting all the buffers do we 'notify'. In 'virtio_net' we notify after each buffer pop & reset.

But in virtio_serial, we pop all the entries and notify the guest only after the loop ends. It means that at one point in time the guest can only put in notifications equal to the max queue size, which is 32.

Just thinking that moving 'virtio_notify' inside the loop would fix the issue.
I have not tested this, just sharing thoughts here.

@@ -483,10 +483,10 @@ static void control_out(VirtIODevice *vdev, VirtQueue *vq)
 
         handle_control_message(vser, buf, cur_len);
         virtqueue_push(vq, elem, 0);
+        virtio_notify(vdev, vq);
         g_free(elem);
     }
     g_free(buf);
-    virtio_notify(vdev, vq);
 }

Thanks,
Pankaj

>  
> > * Correct 'ports_map' allocation for monitoring active ports.
> 
> It appears to be already correct.
> 
> Thanks!
> Ladi

Comment 6 Ladi Prosek 2017-03-23 13:14:14 UTC
(In reply to pagupta from comment #5)
> this is what we do with virtio-net as well and of-course we have to adjust
> when we increase number of queues. 
>     n->ctrl_vq = virtio_add_queue(vdev, 64, virtio_net_handle_ctrl);
>  
> I agree queuing at Qemu side seems better to avoid loss of messages and then
> we can place chunk(min if (available free slots in virtqueue), available
> messages) to virtqueue but this will bring another layer.
> 
> I am just thinking if virtio-net is using control queue size '64' without
> any issue, for virtio-serial also it(64) should be sufficient if we are also
> consuming control messages. 

The control queue used by virtio-net is equivalent to the "out" queue (vser->c_ovq) here in virtio-serial. It is used for sending messages from guest to host, so the burden of queueing / throttling / flow control is on the guest driver. So it is not a good analogy.

The problematic queue in virtio-serial is control "in" (vser->c_ivq), where the host relies on the guest driver keeping the queue prepopulated with buffers. Note that virtqueue data transfer is always guest initiated. There is no way for the device to send data to the driver if the driver hasn't prepared a buffer and pushed it into a queue.

> Another thing I noticed is in functions: control_in(*empty*) & control_out()
> 
> control_out()---> guest to host notifier
> ----------
> we pop buffers from virtqueue and after poping/processing/reseting all the
> buffers and we do 'notify'. In 'virtio_net' we notify after each buffer pop
> & reset.
> 
> But for virtio_serial, we pop all the entries and after loop ends we notify
> guest. It means at one point in time guest can only put notification
> equal to max queue size which is 32. 
> 
> Just thinking moving 'virtio_notify' inside the loop will fix the issue.
> I have not tested this just sharing thoughts here.

As described above, the "out" queue is not what's causing the issue here.

The fix would be in control_in, plus a bunch of other places. Instead of just returning, send_control_msg would push the message onto an internal queue if virtqueue_pop doesn't return a valid buffer. Then in control_in, the internal queue would be checked and if it's not empty, the queued messages would be sent.

Things to look out for:
* Unbounded allocations / should the internal queue have a limit?
* Migrations / what happens to the internal queue when the VM live-migrates somewhere else?
* Can we get away with something VIRTIO_CONSOLE_PORT_ADD specific? A generic solution would be great but in practice we rarely run into this outside of this loop:

  /*
   * The device is up, we can now tell the device about all the
   * ports we have here.
   */
  QTAILQ_FOREACH(port, &vser->ports, next) {
      send_control_event(vser, port->id, VIRTIO_CONSOLE_PORT_ADD, 1);
  }

Comment 7 lijin 2018-07-19 06:19:27 UTC
still hit this issue with qemu-kvm-rhev-2.12.0-7.el7.x86_64

Comment 9 Chao Yang 2019-03-22 08:12:22 UTC
Hi Pankaj,

Per https://bugzilla.redhat.com/show_bug.cgi?id=1436534#c10, we should test 'max_ports' <= 31 on rhel7. Could you please clarify whether we are going to fix this bug (>= 33 ports) on rhel8?

Comment 10 pagupta 2019-03-22 09:59:34 UTC
Hi Chao Yang,

Support for max_ports > 31 is not present upstream, and there is no business requirement for supporting more than 31 ports.
So the priority of supporting this is low. Right now we are focusing on other bugs within the <= 31 port support.

Thanks,
Pankaj