Description of problem:
Many vif interfaces have packet losses in TX. Physical and bonding interfaces are clean.

Version-Release number of selected component (if applicable):
RHEL 5.5, 2.6.18-194.3.1.el5xen

How reproducible:

Steps to Reproduce:
1. Install a xen host server with RHEL 5.5, 2.6.18-194.3.1.el5xen
2. Install a dozen xen VMs
3. Generate network traffic

Actual results:
Vif interfaces for VMs showed packet loss in TX

Expected results:
No packet losses in TX or RX.

Additional info:
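For reference, the per-vif TX drop counters in question can be read on the
dom0 host from /sys/class/net/<vif>/statistics/tx_dropped (or seen in the
"dropped" field of ifconfig). A minimal sketch; the interface name below is
just an example:

#include <stdio.h>

int main(void)
{
        /* example path only -- substitute the vif you want to inspect */
        const char *path = "/sys/class/net/vif1.0/statistics/tx_dropped";
        unsigned long long dropped = 0;
        FILE *f = fopen(path, "r");

        if (!f) {
                perror(path);
                return 1;
        }
        if (fscanf(f, "%llu", &dropped) == 1)
                printf("vif1.0 tx_dropped = %llu\n", dropped);
        fclose(f);
        return 0;
}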
Hi Simon,

is this a regression? Have you tried RHEL 5.6?

Thanks,
Drew
Hi Simon,

I can see two drop causes in netfront / network_start_xmit():

 932         frags += (offset + len + PAGE_SIZE - 1) / PAGE_SIZE;
 933         if (unlikely(frags > MAX_SKB_FRAGS + 1)) {
 934                 printk(KERN_ALERT "xennet: skb rides the rocket: %d frags\n",
 935                        frags);
 936                 dump_stack();
 937                 goto drop;
 938         }
 939
 940         spin_lock_irq(&np->tx_lock);
 941
 942         if (unlikely(!netif_carrier_ok(dev) ||
 943                      (frags > 1 && !xennet_can_sg(dev)) ||
 944                      netif_needs_gso(dev, skb))) {
 945                 spin_unlock_irq(&np->tx_lock);
 946                 goto drop;
 947         }

1019  drop:
1020         np->stats.tx_dropped++;
1021         dev_kfree_skb(skb);
1022         return 0;

Do you see messages implying the first kind? Thanks.
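For illustration, the rounding arithmetic in the first check can exceed
MAX_SKB_FRAGS + 1 for a large, badly aligned skb. A standalone example with
made-up numbers (the MAX_SKB_FRAGS value of 18 is an assumption, typical for
4 KiB pages):

#include <stdio.h>

#define PAGE_SIZE     4096UL
#define MAX_SKB_FRAGS 18    /* assumed value for this illustration */

int main(void)
{
        unsigned long nr_frags = 3;     /* paged fragments already attached */
        unsigned long offset   = 3800;  /* offset of skb->data within its page */
        unsigned long len      = 65000; /* linear length of a large GSO skb */

        /* same rounding-up arithmetic as line 932 above */
        unsigned long frags = nr_frags +
                (offset + len + PAGE_SIZE - 1) / PAGE_SIZE;

        printf("estimated frags = %lu, limit = %d\n", frags, MAX_SKB_FRAGS + 1);
        /* prints: estimated frags = 20, limit = 19 -> such an skb is dropped */
        return 0;
}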
(In reply to comment #1)
> Hi Simon,
>
> is this a regression? Have you tried RHEL 5.6?
>
> Thanks,
> Drew

Vif packet loss is also seen on another host server running RHEL 5.6.

Simon
(In reply to comment #3)
> Hi Simon,
>
> I can see two drop causes in netfront / network_start_xmit():
>
>  932         frags += (offset + len + PAGE_SIZE - 1) / PAGE_SIZE;
>  933         if (unlikely(frags > MAX_SKB_FRAGS + 1)) {
>  934                 printk(KERN_ALERT "xennet: skb rides the rocket: %d frags\n",
>  935                        frags);
>  936                 dump_stack();
>  937                 goto drop;
>  938         }
>  939
>  940         spin_lock_irq(&np->tx_lock);
>  941
>  942         if (unlikely(!netif_carrier_ok(dev) ||
>  943                      (frags > 1 && !xennet_can_sg(dev)) ||
>  944                      netif_needs_gso(dev, skb))) {
>  945                 spin_unlock_irq(&np->tx_lock);
>  946                 goto drop;
>  947         }
>
> 1019  drop:
> 1020         np->stats.tx_dropped++;
> 1021         dev_kfree_skb(skb);
> 1022         return 0;
>
> Do you see messages implying the first kind? Thanks.

"xm log" and "xm dmesg" do not show any packet drops in the logs. Do I need to
turn up the log level?
(In reply to comment #5)
> (In reply to comment #3)
> > I can see two drop causes in netfront / network_start_xmit():
> >  934                 printk(KERN_ALERT "xennet: skb rides the rocket: %d frags\n",
> >  935                        frags);
> > Do you see messages implying the first kind? Thanks.
>
> "xm log" and "xm dmesg" do not show any packet drops in the logs. Do I need to
> turn up the log level?

No, this message is logged (if it is logged) by the guest kernel, and on the
second highest criticality level. You probably don't have this message and we
take the other branch; I'd just like to be sure. Thanks.
(In reply to comment #6)
> (In reply to comment #5)
> > (In reply to comment #3)
> > > I can see two drop causes in netfront / network_start_xmit():
> > >  934                 printk(KERN_ALERT "xennet: skb rides the rocket: %d frags\n",
> > >  935                        frags);
> > > Do you see messages implying the first kind? Thanks.
> >
> > "xm log" and "xm dmesg" do not show any packet drops in the logs. Do I need
> > to turn up the log level?
>
> No, this message is logged (if it is logged) by the guest kernel, and on the
> second highest criticality level. You probably don't have this message and we
> take the other branch; I'd just like to be sure. Thanks.

There are no packet drops on the guest interfaces, so the guests are not aware
of any packet loss on the vif interfaces.
(In reply to comment #7)
> There are no packet drops on the guest interfaces, so the guests are not aware
> of any packet loss on the vif interfaces.

It wasn't clear to me from comment 0 ("Vif interfaces for VMs showed packet
loss in TX") that it's about netback.

Netback also seems to have two drop cases:

 239 int netif_be_start_xmit(struct sk_buff *skb, struct net_device *dev)
 240 {
 241         netif_t *netif = netdev_priv(dev);
 242
 243         BUG_ON(skb->dev != dev);
 244
 245         /* Drop the packet if the target domain has no receive buffers. */
 246         if (unlikely(!netif_schedulable(dev) || netbk_queue_full(netif)))
 247                 goto drop;
 248
 249         /*
 250          * Copy the packet here if it's destined for a flipping interface
 251          * but isn't flippable (e.g. extra references to data).
 252          */
 253         if (!netif->copying_receiver) {
 254                 struct sk_buff *nskb = netbk_copy_skb(skb);
 255                 if ( unlikely(nskb == NULL) )
 256                         goto drop;

 273         if (netbk_queue_full(netif)) {
 274                 netif_stop_queue(dev);
 275                 /*
 276                  * Schedule 500ms timeout to restart the queue, thus
 277                  * ensuring that an inactive queue will be drained.
 278                  * Packets will be immediately be dropped until more
 279                  * receive buffers become available (see
 280                  * netbk_queue_full() check above).
 281                  */

 293  drop:
 294         netif->stats.tx_dropped++;
 295         dev_kfree_skb(skb);
 296         return 0;

I think the "netbk_queue_full(netif)" case is likely; that IMHO can be a
consequence of CPU starvation of the guest. (Note that bug 681962 / bug 456171
took care of any evtchn notification imbalance, so I'm fairly sure the packet
loss is evenly distributed among the guests, proportionally to the generated
network/CPU load.)

If the host machine is massively over-subscribed (from comment 0: "2. Install
a dozen xen VMs"), then a domU may not get a chance to drain its queue / rx
shared ring -- it may remain unscheduled for a longer time. In this case
netback drops the packets on the floor, and the transport level (TCP) or the
application level (UDP) is supposed to retransmit.

Does the packet loss worsen as the number of guests increases, assuming the
same relative load on each guest? If so, I think there won't be much we can do
and what you see is designed-in "graceful" degradation.
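To make the "queue full" condition more concrete -- this is only a conceptual
sketch, not the actual netbk_queue_full() implementation; the struct and names
below are illustrative:

/* Conceptual sketch: netback can place a packet on the guest's rx shared
 * ring only if the guest has posted enough free request slots.  If a starved
 * guest stops replenishing slots, the "full" condition holds and packets are
 * dropped (or the queue is stopped), as in the code quoted above. */
struct rx_ring_view {
        unsigned int req_prod;       /* slots the guest has made available */
        unsigned int req_cons_peek;  /* slots netback has already earmarked */
};

static int rx_ring_full(const struct rx_ring_view *ring,
                        unsigned int slots_needed /* worst case for one skb */)
{
        return (ring->req_prod - ring->req_cons_peek) < slots_needed;
}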
Simon, is this the case described by Laszlo?

(In reply to comment #8)
> If the host machine is massively over-subscribed (from comment 0: "2. Install
> a dozen xen VMs"), then a domU may not get a chance to drain its queue / rx
> shared ring -- it may remain unscheduled for a longer time. In this case
> netback drops the packets on the floor, and the transport level (TCP) or the
> application level (UDP) is supposed to retransmit.
>
> Does the packet loss worsen as the number of guests increases, assuming the
> same relative load on each guest? If so, I think there won't be much we can do
> and what you see is designed-in "graceful" degradation.
There are 8 VMs running on the host server. Out of 14 vif interfaces, TX
packet drops are found on 6 vif interfaces (on two vifs, very minimal). The
host server has 2 quad-core CPUs, so CPU-wise it's not over-subscribed.

Name            ID  Mem(MiB)  VCPUs  State      Time(s)
Domain-0         0       894      8  r-----   1637600.6
xen_vm1         48       775      1  ------     19360.5
xen_vm2         72       512      1  -b----       188.7
xen_vm3         30      1023      1  -b----    235466.4
xen_vm4         63      2048      1  -b----     12394.1
xen_vm5         71      2047      2  -b----       514.5
xen_vm6         55      2055      1  -b----    192625.5
xen_vm7         69      2047      1  -b----       876.3
xen_vm8         64      4095      2  -b----     74964.2

Simon
Can you post the output of xentop while packet drops are happening?
> the dropped packets will not change after guest booted.

Laszlo, could this be fixed by your change to move the Connected status?
Created attachment 517519 [details]
dmesg.714283
Created attachment 517544 [details]
add some info messages to netback/netif_be_start_xmit()

This is the patch that generated the new output in dmesg.

(In reply to comment #18)
> 4 out of 8 vifs have TX packet drops

Actually, five vifs show up with "netif_be_start_xmit(): ...": vif1.0, vif2.0,
vif6.0, vif7.0, vif8.0.

I grepped the dmesg separately for each of them, and the output always looks
like this:

  device vif<X>.0 entered promiscuous mode
  ADDRCONF(NETDEV_UP): vif<X>.0: link is not ready
  ADDRCONF(NETDEV_CHANGE): vif<X>.0: link becomes ready
  xenbr0: port <Z>(vif<X>.0) entering forwarding state
  netif_be_start_xmit(): vif<X>.0: can_queue && full
  netif_be_start_xmit(): vif<X>.0: stopping queue

This is where the vif1.0 grep output starts, but the other four continue like
this:

  netif_be_start_xmit(): vif<X>.0: schedulable=1 full=1 txdrop=0
  netif_be_start_xmit(): vif<X>.0: schedulable=1 full=1 txdrop=1
  ...
  netif_be_start_xmit(): vif<X>.0: schedulable=1 full=1 txdrop=<Y>

vif8.0 stops here, but vif2.0, vif6.0, and vif7.0 end with the additional
entries

  netif_be_start_xmit(): vif<X>.0: can_queue && full
  netif_be_start_xmit(): vif<X>.0: stopping queue

Note that these entries are all interleaved in the dmesg. Unfortunately, I
didn't ask for the /var/log/messages file instead of the dmesg, but I suspect
these were all very close to each other in time. I figure some guests were
started at the same time (based on the adjacent NETDEV_UP messages as well),
and during that short time frame, the processors *were* over-subscribed, or at
least the guests were not advancing for some reason.

I think what netback does is (was...) intended, since these logs perfectly
match the comment near where I added the "stopping queue" message:

        /*
         * Schedule 500ms timeout to restart the queue, thus
         * ensuring that an inactive queue will be drained.
         * Packets will be immediately be dropped until more
         * receive buffers become available (see
         * netbk_queue_full() check above).
         */

Instead of stopping the queue, we could try to crash the guest immediately
from netback (and configure the VM to have xend dump its core on crash). This
way we could look at the guest state (post-mortem).

Hm, instead of that, perhaps netback is wrong to sleep, see

  http://lists.xensource.com/archives/html/xen-devel/2010-12/msg00921.html
  http://lists.xensource.com/archives/html/xen-devel/2010-12/msg01035.html
  http://lists.xensource.com/archives/html/xen-devel/2010-12/msg01080.html

and

  http://git.kernel.org/?p=linux/kernel/git/jeremy/xen.git;a=commitdiff;h=1d453c86

It wasn't backported to linux-2.6.18-xen.hg though.
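For completeness, the added messages are essentially of the following shape;
this is a sketch inferred from the log format above, not the verbatim hunks of
the attachment:

/* Sketch only -- instrumentation inferred from the quoted dmesg lines.
 * Fragments of netif_be_start_xmit(). */

/* (1) on the drop path: */
        if (unlikely(!netif_schedulable(dev) || netbk_queue_full(netif))) {
                printk(KERN_INFO "netif_be_start_xmit(): %s: schedulable=%d "
                       "full=%d txdrop=%lu\n", dev->name,
                       netif_schedulable(dev), netbk_queue_full(netif),
                       netif->stats.tx_dropped);
                goto drop;
        }

/* (2) where the queue is stopped: */
        if (netbk_queue_full(netif)) {
                printk(KERN_INFO "netif_be_start_xmit(): %s: can_queue && full\n",
                       dev->name);
                netif_stop_queue(dev);
                printk(KERN_INFO "netif_be_start_xmit(): %s: stopping queue\n",
                       dev->name);
                /* ... 500ms restart timer scheduled here in the unpatched
                 * driver ... */
        }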
If I understand the upstream patch in comment 20, the problem is not the sleep
-- the problem is that after 500ms of sleep, we prematurely re-enable the
queue. The guest could be unresponsive for a bit longer than that, and so
netback throws away packets destined for the guest.

The upstream patch leaves the queue disabled as long as necessary, and the
packets destined for the guest sit tight wherever they are. Once the guest
returns to life, the net_rx_action() tasklet function should wake the queue
again:

        if (netif_queue_stopped(netif->dev) &&
            netif_schedulable(netif->dev) &&
            !netbk_queue_full(netif))
                netif_wake_queue(netif->dev);
Created attachment 517553 [details]
Remove the 500ms timeout to restart the netif queue (backport 1d453c86 from Jeremy's tree)

"It is generally unhelpful as it results in a massive tail-drop should a guest
become unresponsive for a relatively short period of time and no back-pressure
(other than that caused by a higher layer protocol) is applied to the sender."

"The code to restart the queue when space becomes available in the shared ring
is already there [in net_rx_action()]. Removal of the timeout simply means
that the queue will not be restarted until that code runs."
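In terms of code shape, the change amounts to something like the following in
netif_be_start_xmit(); this is a paraphrased sketch, not the verbatim
backport:

        /* Sketch: when the shared ring is full, just stop the queue and rely
         * on net_rx_action() to wake it once the guest posts more rx
         * buffers. */
        if (netbk_can_queue(dev) && netbk_queue_full(netif)) {
                netif_stop_queue(dev);
                /* No 500ms timer is re-armed here any more; the queue stays
                 * stopped until netif_wake_queue() is called from the
                 * net_rx_action() path quoted earlier in this bug. */
        }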
I agree. Since the code is under netbk_can_queue(dev), it's surely better to let netfront handle flow control, rather than re-enabling after an arbitrary amount of time. It looks like the code was added to band-aid a separate bug.
Simon, can you please describe what kind of actual problems the TX drop caused? Thank you.
Hello Simon,

(In reply to comment #30)
> Simon, can you please describe what kind of actual problems the TX drop
> caused? Thank you.

again, can you please elaborate on this?

Our internal testing and the candidate upstream patch's commit log seem to
confirm that the TX droppage is caused by short periods of guest
unresponsiveness. The testing done on this end seems to imply that such
unresponsiveness is mostly present at guest bootup, thus the harm caused by
the TX drop appears to have low impact.

OTOH, applying the patch (attachment 517553 [details]) caused
hard-to-reproduce & hard-to-debug, but recurring, network hangs in our
automated test environment. The hangs are most probably inherent in the
*removal* of the automatic queue wakeup -- the same automatic wakeup that is
also the root cause of the TX droppage.

So, unless you state this problem does serious damage to your operations, I'm
going to close this with WONTFIX resolution tomorrow. Thank you.
Packet loss on the vifs caused performance degradation and intermittent
service interruptions on the VMs. These are serious problems, since there
would be no guarantee of stable and reliable services provided by the affected
VMs.

As a matter of fact, the more heavily a vif is used by a VM, the more packet
loss could be seen on that vif interface.

Simon
Hello Simon,

(In reply to comment #35)
> As a matter of fact, the more heavily a vif is used by a VM, the more packet
> loss could be seen on that vif interface.

At least this confirms comment 8. I.e. the more heavily a VM is used, the less
time it can spend on draining the queue of its netfront driver, and so netback
drops packets. I'll try to look into netbk_queue_full() with the help of bug
697991 comment 18.

Can you tell us more about the VM configs and the kind of load on the VMs, so
that we may be able to reproduce it?

Does it help if you increase the VCPU number in the guest (so that the host is
still not overloaded)? Or if you decrease the thread count in the
application(s) that the VM runs?

When the VM is heavily loaded, what are the sys/user percentages? What is the
biggest sys time consumer? (Perhaps we'll have to look at this with oprofile.)

Thanks.
If this issue can be avoided by adding more slack to the system, i.e. by
reducing the number of running VMs by one or by some other load reduction,
then I'm inclined to leave it in WONTFIX. It's already too late for any fix of
this nature to go into the next minor release, and in general fixes that have
the potential to introduce regressions (as we've seen with the initial trial
patch for this bug) are way too risky for RHEL Xen at this stage in its life.

Simon, please find the percentage of load at which all your VMs are able to
perform well. Naturally, if this percentage is way too low, then we still need
to investigate this issue. However, if you can keep all the VMs happy with a
high, but perhaps not full, load on the host, then I'd say we should take that
as the solution.
Created attachment 529525 [details]
Xen top
There is no CPU overcommit in this xentop output.
More details about the host server: PowerEdge R710, RHEL 5.6
(2.6.18-238.9.1.el5xen), 2 quad-core CPUs, 24G memory. Networking: bonding two
network interfaces, 5 vlans (one default).

The VM config is like:

name = "xenvm7"
memory = "4096"
disk = [ 'tap:aio:/nfs/vms/xenvm7.img,xvda,w',
         'tap:aio:/nfs/vms/xenvm7_2.img,xvdb,w',
         'tap:aio:/nfs/vms/xenvm7_3.img,xvdc,w',
         'tap:aio:/nfs/vms/xenvm7_4.img,xvdd,w' ]
vif = [ 'mac=00:16:e3:15:50:4f, bridge=xenbr120', ]
bootloader = "/usr/bin/pygrub"
vcpus = 2
on_reboot = 'restart'
on_crash = 'restart'
> Does it help if you increase the VCPU number in the guest (so that the host is
> still not overloaded)? Or if you decrease the thread count in the
> application(s) that the VM runs?

I can't test them easily since those VMs are production machines.

> When the VM is heavily loaded, what are the sys/user percentages? What is the
> biggest sys time consumer? (Perhaps we'll have to look at this with oprofile.)

I'd be happy to gather additional data if you let me know how.
Hello Simon,

unfortunately we can't dedicate any further resources to this BZ. If you have
a RHEL subscription, you might want to open a Customer Portal case, and then
Global Support Services could become involved. Thank you for your
understanding.
(In reply to comment #43)
> Hello Simon,
>
> unfortunately we can't dedicate any further resources to this BZ. If you have
> a RHEL subscription, you might want to open a Customer Portal case, and then
> Global Support Services could become involved. Thank you for your
> understanding.

This issue looks more like a software bug than a system configuration problem.
That's the reason why I filed the bug here. For a configuration-related issue,
I would contact Global Support Services.

I would really appreciate it if you could re-open this bug and fix it.

Simon
Simon,

I still haven't seen a response to my inquiry in comment 37. Without knowing
exactly how much load is on the system when experiencing packet loss, we're
inclined to believe it is a config issue, as this is the only report of this
kind.

Also, even if a software change could allow the load to increase without
packet loss, if the patch is too risky for RHEL5 Xen then we won't be putting
it in, and a config change would still be necessary as a workaround. This is
why we've recommended contacting GSS in order to get help with your config
based on some best practices they may be aware of.

I'm closing this bug again. Please don't reopen it. After working with GSS, if
all parties believe a bug should be opened, then a new bug can be opened at
that time.

Drew
I believe this bug was just recently fixed by the following patch: https://lkml.org/lkml/2014/7/30/315 by Zoltan Kiss (xen-netfront: Fix handling packets on compound pages with skb_segment). Could we get a fix in RHEL kernels too please?