Bug 1280040
Summary: | Difficulty consistently processing more than 1 packet per burst with vhostuser | |
---|---|---|---
Product: | Red Hat Enterprise Linux 7 | Reporter: | Andrew Theurer <atheurer>
Component: | openvswitch-dpdk | Assignee: | Flavio Leitner <fleitner>
Status: | CLOSED WORKSFORME | QA Contact: | Jean-Tsung Hsiao <jhsiao>
Severity: | urgent | Docs Contact: |
Priority: | urgent | |
Version: | 7.3 | CC: | aloughla, atragler, david.marchand, fbaudin, jean-mickael.guerin, kzhang, mleitner, rkhan, sukulkar, thibaut.collet, vincent.jardin
Target Milestone: | rc | |
Target Release: | 7.3 | |
Hardware: | x86_64 | |
OS: | Linux | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | Bug Fix
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2016-09-16 17:43:06 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | | |
Bug Blocks: | 1301628, 1313485 | |
Description

Andrew Theurer 2015-11-10 19:40:15 UTC
Hi, I could reproduce this with DPDK 16.04. It is interesting because the stats are now a bit better, spread from 1 to 7 packets per batch, but still far from the upper limit of 64. If I add a simple log to the virtio RX burst path that prints the burst size every 10k packets received, then the logs and the stats are 100% at the upper limit of 64. So I believe the guest is polling too fast and finding only a few packets available on each fetch. Adding the debug output slows it down enough for packets to accumulate in the virtio queue, allowing full batch sizes. The question remains, though: if the host can't push more because the ring is full, why is the guest finding only a few packets? To answer this I disabled mergeable buffers in testpmd. The result is full batch sizes all the time:

Rx-bursts: 1002289 [99% of 64 pkts + 1% of others]

So it seems mergeable buffers are causing the issue, but I don't yet know why.

Hi, I've looked more into this using our current versions (OVS 2.5 + DPDK 2.2 on the host, and DPDK 16.04 in the guest). I could not reproduce the issue except in two situations: a low traffic rate, or the qemu vcpu thread running on another socket. In any case, I changed the OVS code to record the last 64 batch sizes sent by the PMD, and changed the testpmd code to show the batch size and the number of used entries in the ring after a number of packets. I also enabled the RX burst stats in testpmd. What I see is that with a low throughput rate the batches are obviously smaller, so the NIC sends small batches each time and that is what the guest gets as a consequence. This is normal and expected. With a higher throughput, close to the zero-drop rate, the number of used buffers in the guest bounces between 1/3 and 2/3 of the ring's total entries (255), so the guest can get batches with sizes varying from 1 to 64 (depending on testpmd configuration). testpmd can always read all available buffers when they are present in the ring.
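The polling-rate effect described above (a fast poller sees tiny batches, while a poller slowed down by debug output sees full ones) can be illustrated with a small simulation. This is not DPDK code, just a hypothetical sketch of the producer/consumer timing; the ring size, rates, and delays are made-up numbers.

```python
from collections import Counter

def simulate(produce_per_tick, poll_every_ticks, max_burst=64,
             ring_size=256, total_ticks=100_000):
    """Simulate a host filling a virtio ring and a guest polling it.

    produce_per_tick: packets the host enqueues each tick.
    poll_every_ticks: ticks between guest polls (1 = poll every tick).
    Returns a Counter of observed RX batch sizes.
    """
    ring = 0
    batches = Counter()
    for tick in range(total_ticks):
        # Host side: enqueue, bounded by ring capacity (excess is "dropped").
        ring = min(ring + produce_per_tick, ring_size)
        # Guest side: poll the ring every poll_every_ticks ticks.
        if tick % poll_every_ticks == 0:
            burst = min(ring, max_burst)
            if burst:
                batches[burst] += 1
                ring -= burst
    return batches

# A guest polling every tick drains the ring as fast as it fills:
# batches stay small (here, 2 packets each).
fast = simulate(produce_per_tick=2, poll_every_ticks=1)

# A guest slowed down (e.g. by a debug print) lets packets accumulate:
# batches hit the 64-packet upper limit.
slow = simulate(produce_per_tick=2, poll_every_ticks=40)

print(fast.most_common(1))  # dominated by 2-packet bursts
print(slow.most_common(1))  # dominated by 64-packet bursts
```

The simulation only captures the timing argument, not the mergeable-buffers behavior; it shows why adding a debug log alone can shift the burst stats to the upper limit.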
For instance, with 222 busy entries, testpmd will get at least 3 batches of 64 in sequence.

In the third case, the qemu vcpu thread is running on another socket, so memory operations carry the known additional cost. Here testpmd gets mostly one or very few packets in each batch. Looking at the host, it is pushing batches of 32, but at a very low rate, because the time spent copying causes the ethernet device's queue to overflow. So vhost-user is actually forwarding everything (no drops) but the NIC is dropping most of the packets. As a result, the guest gets mostly one packet per batch. The throughput is about 1.6~1.8 Mpps, so this doesn't look like the case reported here.

I can revisit this, but based on the above and on experience with other related tests, I'd say the current versions are okay and batches are working as expected. Having said that, I will close this bug. If you disagree, please re-open with more details on how to reproduce the issue so I can take a second, more focused look. Thanks!

fbl
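For reference, the two knobs discussed in this thread (mergeable RX buffers and vcpu placement relative to the NIC's socket) are typically controlled at the qemu/host level. The PCI address, socket path, CPU list, and thread id below are placeholders for illustration, not values taken from this bug:

```shell
# Disable virtio mergeable RX buffers on the guest NIC
# (fragment of a qemu-kvm command line; other options elided):
#   -chardev socket,id=char0,path=/tmp/vhost-user0 \
#   -netdev vhost-user,id=net0,chardev=char0 \
#   -device virtio-net-pci,netdev=net0,mrg_rxbuf=off

# Avoid the cross-socket copy cost by keeping the vcpu threads on the
# NUMA node local to the NIC (PCI address and CPU list are placeholders):
cat /sys/bus/pci/devices/0000:05:00.0/numa_node
taskset -pc 2-9 <qemu-vcpu-tid>
```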