Description of problem:
When testing for network latency, we observe that the latency of individual packets is sometimes much higher than the average latency. For example, the average latency can be 7 usec while the maximum latency is 99 usec.

Version-Release number of selected component (if applicable):
openvswitch-dpdk-2.4.0-0.10346.git97bab959.1

How reproducible:
Requires a 12-hour test.

Steps to Reproduce:
1. Configure a KVM host for realtime, reserving 4 CPUs for the openvswitch PMD threads and 2 CPUs for the KVM realtime vCPUs (rt-vcpus).
2. On the same KVM host, set up openvswitch-dpdk with two bridges, each with one 10Gb DPDK port and one vhostuserdpdk port. Create 4 DPDK PMD threads and set them to SCHED_FIFO priority 95 (fifo:95).
3. Create a VM with 2 extra virtio-net interfaces, each attached to one of the OVS bridges.
4. On an external system, run a packet generator that can measure packet latency, and record the minimum, average, and maximum latency over a 12-hour period.

Actual results:
Maximum latency is many times the average latency (about 14x in the example above: 99 usec vs. 7 usec).

Expected results:
Maximum latency no higher than 2x the average latency.

Additional info:
To maintain low latency, the poll-mode-driver (PMD) DPDK threads must run without significant interruption, for example from preemption by a higher-priority thread or from waiting on a mutex. We observed via /proc/sched_debug that these threads do get blocked. This has been traced to calls to ovsrcu_quiesce() in the PMD threads, which run pmd_thread_main(). However, other threads also call ovsrcu_quiesce(). That function calls req_seq, which calls ovs_mutex_lock(&seq_mutex), so a PMD thread can block waiting for seq_mutex while another thread holds it. These blocking periods can last close to the maximum latency that the packet generator reports.
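To illustrate the mechanism described under "Additional info", here is a minimal, self-contained sketch (assuming Linux, POSIX threads and C11 atomics; it does not use any OVS code). A busy-polling loop, loosely modelled on a PMD thread, briefly takes a mutex that another thread also takes and holds for a while. `shared_mutex`, `holder_thread` and `run_contention_demo` are hypothetical names, `shared_mutex` only stands in for seq_mutex, and the 2 ms hold time is arbitrary. Note that a thread blocked in pthread_mutex_lock() sleeps until the lock is released, whatever its SCHED_FIFO priority.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for a global lock such as seq_mutex. */
static pthread_mutex_t shared_mutex = PTHREAD_MUTEX_INITIALIZER;
static atomic_bool holder_done;

struct wait_stats {
    double avg_us;    /* average time spent acquiring shared_mutex */
    double max_us;    /* worst-case time spent acquiring shared_mutex */
    long iterations;  /* number of poll-loop iterations */
};

static double now_us(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec * 1e6 + (double)ts.tv_nsec / 1e3;
}

/* Hypothetical non-PMD thread: repeatedly holds the shared lock for ~2 ms. */
static void *holder_thread(void *arg)
{
    (void)arg;
    const struct timespec hold = { 0, 2000000 };  /* 2 ms with the lock held */
    const struct timespec gap = { 0, 3000000 };   /* 3 ms without it */
    for (int i = 0; i < 10; i++) {
        pthread_mutex_lock(&shared_mutex);
        nanosleep(&hold, NULL);
        pthread_mutex_unlock(&shared_mutex);
        nanosleep(&gap, NULL);
    }
    atomic_store(&holder_done, true);
    return NULL;
}

/* Busy-poll loop: ~1 us of simulated packet work, then a lock/unlock of
 * the shared mutex standing in for the quiesce step. Records how long
 * each lock acquisition takes until the holder thread finishes. */
static struct wait_stats run_contention_demo(void)
{
    pthread_t holder;
    struct wait_stats st = { 0.0, 0.0, 0 };
    double total = 0.0;

    atomic_store(&holder_done, false);
    if (pthread_create(&holder, NULL, holder_thread, NULL) != 0) {
        return st;  /* thread creation failed; report empty stats */
    }

    while (!atomic_load(&holder_done)) {
        double spin_until = now_us() + 1.0;  /* simulated packet work */
        while (now_us() < spin_until) {
        }
        double start = now_us();
        pthread_mutex_lock(&shared_mutex);   /* may block behind holder */
        double waited = now_us() - start;
        pthread_mutex_unlock(&shared_mutex);

        total += waited;
        if (waited > st.max_us) {
            st.max_us = waited;
        }
        st.iterations++;
    }
    pthread_join(holder, NULL);

    assert(st.iterations > 0);
    st.avg_us = total / (double)st.iterations;
    return st;
}
```

Calling run_contention_demo() should report a worst-case acquisition time on the order of the 2 ms hold time against a much smaller average: the same "rare but long stall" shape as the 7 usec average / 99 usec maximum in this report, although the absolute numbers in the sketch are arbitrary.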
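The expected result above is a simple pass/fail criterion. Below is a minimal sketch of checking it against recorded per-packet latencies. It assumes the samples are available as an array of microsecond values; `struct latency_summary`, `summarize` and `meets_expectation` are hypothetical names, not part of any packet generator's API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Summary of a set of per-packet latency samples, in microseconds. */
struct latency_summary {
    double min_us;
    double avg_us;
    double max_us;
};

/* Compute min/avg/max over n samples (n must be non-zero). */
static struct latency_summary summarize(const double *samples_us, size_t n)
{
    assert(n > 0);
    struct latency_summary s = { samples_us[0], 0.0, samples_us[0] };
    double total = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (samples_us[i] < s.min_us) {
            s.min_us = samples_us[i];
        }
        if (samples_us[i] > s.max_us) {
            s.max_us = samples_us[i];
        }
        total += samples_us[i];
    }
    s.avg_us = total / (double)n;
    return s;
}

/* Expected result from this report: max latency no higher than 2x average. */
static bool meets_expectation(struct latency_summary s)
{
    return s.max_us <= 2.0 * s.avg_us;
}
```

With the figures from this report (7 usec average, 99 usec maximum), the check fails, since 99 usec is well above 2 x 7 = 14 usec.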