Description of problem:
When running 8 concurrent netperf (TX) tests with packet sizes of 256, 1024, 4096, 16484 and 65535 respectively, an "Interrupted system call" error occurs with high probability (about 90%) for packet sizes 256, 1024 and 4096. Error message:
send_tcp_maerts: data recv error: Interrupted system call len was -1

Version-Release number of selected component (if applicable):
virtio-win-prewhql-0.1.20

How reproducible:
9/10

Steps to Reproduce:
1. Run netserver on the guest.
2. Run netperf on the external host. Because the guest acts as the server, the -t TCP_MAERTS test measures the TX capacity of the guest:
for i in `seq 8`; do netperf -H 192.168.100.32 -t TCP_MAERTS & done
3.

Actual results:
"Interrupted system call" when running 8 concurrent netperf (TX) tests with small packets.

Expected results:

Additional info:
This is not a regression; it also reproduces with virtio-win-1.4.0-1.
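The reproduction loop backgrounds eight netperf clients but neither waits for them nor records which session failed. A minimal bash sketch that does both, assuming a failing netperf session exits with a non-zero status; the helper name `run_parallel` is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: launch N copies of a command in the background,
# wait for each one, and print how many exited with a non-zero status.
run_parallel() {
  local n=$1; shift
  local pids=() failed=0 pid i
  for ((i = 0; i < n; i++)); do
    "$@" &                                # same effect as the "& done" loop above
    pids+=("$!")
  done
  for pid in "${pids[@]}"; do
    # wait returns that session's exit status; count the failures
    wait "$pid" || failed=$((failed + 1))
  done
  echo "$failed"
}

# Usage (assumes netperf is in PATH and netserver listens on the guest):
#   run_parallel 8 netperf -H 192.168.100.32 -t TCP_MAERTS
```

The last line printed is the number of failed sessions, so a run in which 7 of 8 sessions report throughput would end with 1.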
Hello,

Can you describe the problem in more detail? What do you mean by "interrupted system call"? Where does it occur (host or guest)?

Thanks,
Yan.
(In reply to comment #2)
> Hello,
>
> Can you describe the problem in more details - what do you mean by "interrupted
> system call" ?
> Where (host, guest ) ?
>
> Thanks,
> Yan.

Hi Yan,

netperf reports "interrupted system call"; I have attached the netperf error log. netperf with the -t TCP_MAERTS parameter (which tests the TX capacity of the guest) was running on an external host, and netserver was running on a Windows 2008 R2 guest.
Created attachment 551495: netperf reports interrupted system call
*** This bug has been marked as a duplicate of bug 684127 ***
Hi Ronen,

The netperf error message in comment #4 is "send_tcp_maerts: data recv error: Interrupted system call len was -1", which is not the same as the netperf error message in bug #684127 (netperf: remote error 4Tcp:1).

Regards,
Wenli
Wenli, Thanks for catching this. I still tend to believe that this is Netperf's sensitivity to high load in both cases, but it is yet to be proven. Ronen.
Which version of netperf is used (can it even be determined from the binary)? In the latest sources (ftp://ftp.netperf.org/netperf/netperf-2.5.0.zip) there is no string that resembles the reported error message.
(In reply to comment #8)
> What version of netperf is used (is it even possible to understand from the
> binary)?
>
> In latest sources: ftp://ftp.netperf.org/netperf/netperf-2.5.0.zip there is no
> string that resembles the received error message.

I used netperf/netserver version 2.4.5 on both the client and server side.

I just tried netperf/netserver 2.5.0 with the command "for i in `seq 8`; do netperf -H 192.168.100.15 (IP of netserver) -- -m 1024 & done" in the guest. No interrupted system call occurs, but one session fails with an error like "send_data:data send error: errno 104 netperf: send_omni:send_data failed : Connection reset by peer", reproducible 100% of the time. In other words, 7 sessions report throughput and one fails with that error.

Note: because the usage of the -m parameter changed in netperf 2.5.0 (man page: "-m bytes Set the size of the buffer passed-in to the “send” calls of a _STREAM test."), I could not use "-t TCP_MAERTS -- -m" on the external host to test TX capability (with netserver running on the guest).
Another question: did you try running the test with LSO disabled on the guest? To do so, open Device Manager: Run -> devmgmt.msc -> go to the virtio net adapter -> double-click it -> go to the "Advanced" tab -> set "Offload.TxLSO" to "Disable" -> click "OK".
(In reply to comment #10)
> Another questions.
>
> Did you try to disable LSO and try to run the test with LSO disabled on the
> guest?

I just tried with LSO off; the same error occurs ("send_data:data send error: errno 104 netperf: send_omni:send_data failed :Connection reset by peer"), reproduced 2/10 times.
Hello,

Following the request from MST, here is an experimental driver with the indirect buffers feature enabled for Windows guests:
\\smamit.eng.lab.tlv.redhat.com\win-team\Public\Yan\2Test

Please try to run the test with it.

By the way, which Windows OS version are you using?

You might need to install a test certificate to use it on a 64-bit OS. To do so, double-click NetKVMTemporaryCert.cer and follow the installation wizard. Then enable test signing on the system from the command line with "bcdedit /set TESTSIGNING ON" and reboot.

Best regards,
Yan.
(In reply to comment #12)
> Hello,
>
> Following the request from MST,
> Here is an experimental driver with enabled indirect buffers feature for
> Windows guest:
> \\smamit.eng.lab.tlv.redhat.com\win-team\Public\Yan\2Test
>
> Please try to run the test with it.

I tried each of the following three scenarios 50 times on the experimental driver. Results:
1. LSO off, each netperf session running for 10 s: reproduced 12/50
2. LSO on, each netperf session running for 10 s: reproduced 8/50
3. LSO on, each netperf session running for 30 s: reproduced 12/50

> By the way, which Windows OS version are you using?
>
> You might need to install test certificate in order to use it on 64bit OS. To
> do it: double click on NetKVMTemporaryCert.cer and follow installation wizard.
> In command line enable test signing on the system: bcdedit /set TESTSIGNING ON
> . And reboot.
>
> Best regards,
> Yan.
Cannot reproduce with the latest build. Please retest with build 27.
(In reply to comment #14)
> Cannot reproduce with latest build. please retest with build 27.

The issue still occurs on a W2k8 R2 64-bit guest with virtio-win-1.5.1; the error is "send_tcp_maerts: data recv error: Interrupted system call len was -1".

Steps (run on the external host):
for i in `seq 8`; do netperf -H 192.168.100.32 -t TCP_MAERTS -- -m 1024 & done
Hello All,

Investigation results:
1. I cannot reproduce this issue on my setup with the new drivers.
2. According to the bug description, the “interrupted system call” printout occurs on the client side (Linux, physical machine), so it is not clear whether it is related to the virtio drivers at all.
3. “Interrupted system call” is not a bug but rather normal system behavior under some load scenarios; applications have to know how to deal with it.

Requests:
1. Please provide the exact version of QEMU that was used to reproduce the bug.
2. Please repeat the same test, adding another 8 netperf clients running on the same host and sending traffic to another netperf server (a physical machine without virtio drivers). See whether interrupted system call printouts occur for the newly added clients.
(In reply to comment #17)
> Hello All,
>
> Investigation results:
> 1. I cannot reproduce this issue on my setup with new drivers.
> 2. According to the bug description “interrupted system call” printout
> occurs on client size (Linux, physical machine) so it is not clear whether
> it is related to virtio drivers at all.
> 3. “Interrupted system call” is not a bug rather normal system behavior
> under some load scenarios, applications have to know how to deal with it.
>
> Requests:
> 1. Please provide exact version of QEMU that was used to reproduce the bug.

I think I used qemu-kvm-0.12.1.2-2.294.el6.x86_64 to reproduce this bug. The machine is currently running other tests; I will double-check the QEMU version.

> 2. Please repeat the same test adding another 8 netperf clients running on
> the same host and routing traffic to other netperf server (physical machine
> without virtio drivers). See whether interrupted system call printouts occur
> for newly added clients.

Do you mean running 8 netperf client tests from the host to another host?
> Do you mean try 8 netperf client tests from host to another host?

Yes.
(In reply to comment #19)
> > Do you mean try 8 netperf client tests from host to another host?
>
> Yes.

Simultaneously with the original test activity.
Created attachment 604876: Results of 50 repeats; each repeat includes 8 TX netperf tests with 1024-byte packets. The netserver version is 2.4.5, compiled on Cygwin.
Created attachment 604878: Results of 50 repeats; each repeat includes 8 TX netperf tests with 1024-byte packets. The netserver for Windows was copied from https://bugzilla.redhat.com/show_bug.cgi?id=826596#c37
(In reply to comment #17)
> Requests:
> 1. Please provide exact version of QEMU that was used to reproduce the bug.

Retested on a RHEL 6.3 release host (2.6.32-279.el6 and qemu-kvm-0.12.1.2-2.295), with a Windows 2008 R2 64-bit guest again acting as netserver. In both groups of results (comment #21 and comment #22), the "Interrupted system call" error no longer appears; instead there is "establish control: are you sure there is a netserver listening on 192.168.100.21 at port 12865?". That is, when 8 TX netperf sessions run with 1024-byte packets, only 7 return TX results and one fails with that error.

> 2. Please repeat the same test adding another 8 netperf clients running on
> the same host and routing traffic to other netperf server (physical machine
> without virtio drivers). See whether interrupted system call printouts occur
> for newly added clients.

In this scenario, one TX netperf client to the VM still hits the "establish control: are you sure there is a netserver listening on 192.168.100.21 at port 12865?" error.
Steps:
[root@hp-z800-04 src]# for i in `seq 8`; do ./netperf -H 192.168.100.21 -t TCP_MAERTS -l 60 -- -m 1024 & done    # 192.168.100.21 is the Windows 2008 R2 guest

Simultaneously, run another 8 netperf client tests in another terminal:
[root@hp-z800-04 src]# for i in `seq 8`; do ./netperf -H 10.66.72.17 -t TCP_MAERTS -l 60 -- -m 1024 & done    # 10.66.72.17 is another physical machine

[root@hp-z800-04 src]# ps -ef |grep netperf
root 19321 14179 13 05:48 pts/0 00:00:08 ./netperf -H 192.168.100.21 -t TCP_MAERTS -l 60 -- -m 1024
root 19322 14179 13 05:48 pts/0 00:00:08 ./netperf -H 192.168.100.21 -t TCP_MAERTS -l 60 -- -m 1024
root 19323 14179 13 05:48 pts/0 00:00:08 ./netperf -H 192.168.100.21 -t TCP_MAERTS -l 60 -- -m 1024
root 19324 14179 13 05:48 pts/0 00:00:08 ./netperf -H 192.168.100.21 -t TCP_MAERTS -l 60 -- -m 1024
root 19325 14179 13 05:48 pts/0 00:00:07 ./netperf -H 192.168.100.21 -t TCP_MAERTS -l 60 -- -m 1024
root 19326 14179 13 05:48 pts/0 00:00:07 ./netperf -H 192.168.100.21 -t TCP_MAERTS -l 60 -- -m 1024
root 19327 14179 13 05:48 pts/0 00:00:08 ./netperf -H 192.168.100.21 -t TCP_MAERTS -l 60 -- -m 1024
root 19330 19278 5 05:48 pts/3 00:00:03 ./netperf -H 10.66.72.17 -t TCP_MAERTS -l 60 -- -m 1024
root 19331 19278 5 05:48 pts/3 00:00:03 ./netperf -H 10.66.72.17 -t TCP_MAERTS -l 60 -- -m 1024
root 19332 19278 5 05:48 pts/3 00:00:03 ./netperf -H 10.66.72.17 -t TCP_MAERTS -l 60 -- -m 1024
root 19333 19278 5 05:48 pts/3 00:00:03 ./netperf -H 10.66.72.17 -t TCP_MAERTS -l 60 -- -m 1024
root 19334 19278 5 05:48 pts/3 00:00:03 ./netperf -H 10.66.72.17 -t TCP_MAERTS -l 60 -- -m 1024
root 19335 19278 5 05:48 pts/3 00:00:03 ./netperf -H 10.66.72.17 -t TCP_MAERTS -l 60 -- -m 1024
root 19336 19278 5 05:48 pts/3 00:00:03 ./netperf -H 10.66.72.17 -t TCP_MAERTS -l 60 -- -m 1024
root 19337 19278 5 05:48 pts/3 00:00:03 ./netperf -H 10.66.72.17 -t TCP_MAERTS -l 60 -- -m 1024
(In reply to comment #23)
> (In reply to comment #17)
>
> > Requests:
> > 1. Please provide exact version of QEMU that was used to reproduce the bug.
> Retest on rhel6.3 release host (2.6.32-279.el6 & qemu-kvm-0.12.1.2-2.295),
> Guest windows 2008 r2 64 bits also acts as netserver.From two groups of
> results in comment #21 and comment #22,there is no "Interrupted system call"
> string error anymore but with "establish control: are you sure there is a
> netserver listening on 192.168.100.21 at port 12865?" instead. That is also
> mean while 8 TX netperf running with 1024 packet, only get 7 tx results,one
> fails with that error.

The netperf client version is 2.4.5.
The virtio-win version is virtio-win-1.5.3-1.el6_3.noarch.rpm.
What is happening here: the netperf client has a side channel for passing commands to the server and getting stats from it.

If that side-channel socket fails to connect initially, you get the "establish control" error. If it fails to return stats, you get "interrupted system call".

Could it be that we are losing too many packets? I suggest tracing with tcpdump to verify.
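The two messages correspond to different points of failure, and the thread also reports a third ("Connection reset by peer", with netperf 2.5.0). To count how often each occurs across repeated runs, each session's output can be saved and classified. A minimal bash sketch; the function name, the one-log-per-session layout and the loop in the usage comment are assumptions, while the message fragments are copied from this bug:

```shell
#!/usr/bin/env bash
# Hypothetical helper: map one saved netperf output file to the error
# signatures reported in this bug (message fragments copied from the comments).
classify_netperf_log() {
  local log=$1
  if grep -qF 'establish control' "$log"; then
    echo establish_control      # side-channel socket failed to connect initially
  elif grep -qF 'Interrupted system call' "$log"; then
    echo interrupted_syscall    # a recv() returned -1 with errno EINTR
  elif grep -qF 'Connection reset by peer' "$log"; then
    echo connection_reset       # errno 104, seen with netperf 2.5.0
  else
    echo ok
  fi
}

# Usage sketch: one log per session (stderr included), then count each outcome.
#   for i in $(seq 8); do netperf -H 192.168.100.21 -t TCP_MAERTS > "run.$i.log" 2>&1 & done; wait
#   for f in run.*.log; do classify_netperf_log "$f"; done | sort | uniq -c
```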
Created attachment 605127: Attempted to establish 8 TX channels; only 7 were established successfully.
(In reply to comment #26)
> Created attachment 605127
> Attempts to establish 8 tx channel, only 7 tx channel established
> successfully.

Test result captured with "tcpdump -n -i eth1 'tcp[13] & 2 == 2' -vv".
Created attachment 605129: Attempted to establish 8 TX channels; all 8 were established successfully. Test result captured with "tcpdump -n -i eth1 'tcp[13] & 2 == 2' -vv".
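The capture filter `tcp[13] & 2 == 2` tests bit 0x02 (SYN) of the TCP flags byte, which sits at offset 13 of the TCP header, so it matches both the client's SYN and the server's SYN/ACK. A small bash sketch that decodes that byte the same way (function names are hypothetical):

```shell
#!/usr/bin/env bash
# Decode the TCP flags byte (offset 13 of the TCP header), the byte that the
# filter 'tcp[13] & 2 == 2' inspects. Function names are hypothetical.
tcp_flag_names() {
  local flags=$(( $1 )) names=() bit
  local -a labels=(FIN SYN RST PSH ACK URG)   # bit 0 .. bit 5
  for bit in 0 1 2 3 4 5; do
    if (( flags & (1 << bit) )); then
      names+=("${labels[bit]}")
    fi
  done
  echo "${names[*]}"
}

# Exit status 0 when the filter would keep the packet: (flags & 0x02) == 0x02.
matches_syn_filter() {
  (( ($1 & 2) == 2 ))
}
```

For example, `tcp_flag_names 0x12` prints `SYN ACK`, and `matches_syn_filter 0x12` succeeds, so each completed handshake contributes two captured packets. If every netperf test opens a control connection plus a data connection (an assumption based on the side-channel description above), 8 tests would give 16 handshakes, or 32 matching packets, absent retransmitted SYNs.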
(In reply to comment #25)
> what is happening here:
> netperf client has a side channel to passing commands to server
> and getting stats from it.
>
> If that side channel socket fails to connect initially,
> you get establish control error.
> If it fails to return stats, you get interrupted
> system call.
>
> Could it be we are loosing too many packets?
> Suggest tracing with tcpdump to verify.

I ran the following tests with tcpdump tracing while the 8 TX netperf clients were running (Windows 2K8 R2 still acts as netserver).

- Test 1
[root@hp-z800-04 src]# tcpdump -n -i eth1
229773 packets captured
9358220 packets received by filter
9128447 packets dropped by kernel

There are a lot of dropped packets. After some research, it appears the drops may be caused by tcpdump itself:
"Dropped packets
At the end of its run, TCPdump will inform you if any packets were dropped in the kernel. If this becomes a problem, it's likely that your host can't keep up with the network traffic and decode it at the same time. Try using TCPdump's -w option to bypass the decoding and write the raw packets to a file, then come back later and decode the file with the -r switch. You can also try using -s to reduce the capture snapshot size.
from http://www.freesoft.org/CIE/Topics/55.htm"

So I ran further tests, first adding -w and then adding both -w and -s to tcpdump.
- Test 2
[root@hp-z800-04 ~]# tcpdump -n -i eth1 -w aaa.pcap
tcpdump: listening on eth1, link-type EN10MB (Ethernet), capture size 65535 bytes
^C9023950 packets captured
9210525 packets received by filter
186575 packets dropped by kernel

- Test 3
[root@hp-z800-04 ~]# tcpdump -n -i eth1 -w bbb.pcap -s 3000
tcpdump: listening on eth1, link-type EN10MB (Ethernet), capture size 3000 bytes
^C8955895 packets captured
9046557 packets received by filter
90662 packets dropped by kernel

Tests 2 and 3 show far fewer dropped packets, but I am still not sure whether the drops are caused by tcpdump itself; note that "ifconfig |grep eth1" shows "dropped:0".

- Test 4: capture only packets with the SYN flag set.
4.1 Attempted to establish 8 TX channels; only 7 were established successfully. Detailed results in comment #26.
[root@hp-z800-04 network-scripts]# tcpdump -n -i eth1 'tcp[13] & 2 == 2' -vv
29 packets captured
29 packets received by filter
0 packets dropped by kernel

4.2 Attempted to establish 8 TX channels; all 8 were established successfully. Detailed results in comment #28.
[root@hp-z800-04 network-scripts]# tcpdump -n -i eth1 'tcp[13] & 2 == 2' -vv
32 packets captured
32 packets received by filter
0 packets dropped by kernel
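tcpdump prints its capture statistics to stderr when it exits; "packets dropped by kernel" counts packets the capture buffer could not hold, which is separate from the interface's own "dropped" counter shown by ifconfig. A minimal bash/awk sketch (the function name is hypothetical) that turns the summary into a drop percentage so the runs can be compared directly:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a tcpdump exit summary on stdin and print the
# percentage of packets "received by filter" that were "dropped by kernel".
drop_pct() {
  awk '/packets received by filter/ { received = $1 }
       /packets dropped by kernel/  { dropped  = $1 }
       END { if (received > 0) printf "%.2f\n", 100 * dropped / received }'
}

# Usage: the statistics go to stderr, so capture that stream.
#   tcpdump -n -i eth1 -w aaa.pcap 2> stats.txt    # stop with Ctrl-C
#   drop_pct < stats.txt
```

Applied to the three unfiltered runs, this gives about 97.54%, 2.03% and 1.00% of received packets dropped.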
We have met several similar issues in the past. netperf reports "Interrupted system call" when recv() is interrupted by SIGALRM. I didn't go through all the comments, but we need to consider whether this is a bug at all, in light of Rick's comment in https://bugzilla.redhat.com/show_bug.cgi?id=717549#c1 and my experiment in https://bugzilla.redhat.com/show_bug.cgi?id=713063#c6. Following Rick's comment, you can try -l -1G to check whether the network still works after your stress test. If you are doing a performance test, it is better to enable demo mode and check the timestamps to calculate the aggregate throughput correctly. In conclusion, it is very questionable to fail the test cases when we see "interrupted system call".
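For context on the mechanism: "Interrupted system call" is the standard message text for errno EINTR, and "len was -1" is recv()'s return value, i.e. the blocking call returned early because a signal arrived while it waited. bash exposes the same behavior at the shell level: while a trap is set for a signal, a blocking `wait` returns early with status 128 + signal number. A minimal sketch of an interrupted wait and of the usual retry-on-interrupt loop; the function names are hypothetical, and this illustrates EINTR semantics, not netperf's own code:

```shell
#!/usr/bin/env bash
# Illustration only (hypothetical names): a trapped signal interrupts a
# blocking `wait`, the shell-level analogue of SIGALRM interrupting recv().

# Print the status of a wait that gets interrupted by SIGALRM after ~1 s.
interrupted_wait_status() {
  local me=$BASHPID child rc
  trap ':' ALRM                       # a handler must be installed
  sleep 3 & child=$!                  # the "blocking operation"
  ( sleep 1; kill -ALRM "$me" ) &     # the "alarm" fires while we wait
  rc=0; wait "$child" || rc=$?        # returns early with 128 + SIGALRM
  kill "$child" 2>/dev/null || true   # clean up the still-running child
  wait "$child" 2>/dev/null || true
  trap - ALRM
  echo "$rc"
}

# Same setup, but retry after each interruption, like an EINTR retry loop.
retrying_wait_status() {
  local me=$BASHPID child rc
  trap ':' ALRM
  sleep 2 & child=$!
  ( sleep 1; kill -ALRM "$me" ) &
  while true; do
    rc=0; wait "$child" || rc=$?
    # Status > 128 while the child is still alive means "interrupted": wait again.
    if (( rc > 128 )) && kill -0 "$child" 2>/dev/null; then continue; fi
    break
  done
  trap - ALRM
  echo "$rc"
}
```

On Linux, where SIGALRM is signal 14, the first function prints 142 and the second prints 0, because the loop resumes waiting until the child actually exits. Whether to retry or to stop on EINTR is an application-level choice, which is the point the comment above makes about netperf.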
Let me be sure that I understand (I did not examine the log files). Does the "problematic" timer expire just because the entire netperf test did not finish? Does it happen even if there was some recent traffic in that test?
- If this is the case, then we definitely need to run netperf differently.
- If there was no traffic at all in this test for 5 minutes (or so), then we need to understand why. Probably this is not the case.
Ronen.
Please retest with build 39.
Created attachment 624007: netperf result with virtio-win-prewhql-39
Still hitting the error "send_tcp_maerts: data recv error: Interrupted system call len was -1" with virtio-win-prewhql-39.
=================================
Driver Date: 10/1/2012
Driver Version: 61.64.104.3900
=================================
Never hit "interrupted system call" on a Windows 2003 64-bit guest with virtio-win-prewhql-41, but it still reproduces at a high rate (about 80%) on a Windows 2008 R2 guest with virtio-win-prewhql-41.