Bug 771866 - [virtio-win][performance]interrupted system call when running 8 concurrent Netperf (TX) Tests with small packet
Summary: [virtio-win][performance]interrupted system call when running 8 concurrent Ne...
Keywords:
Status: CLOSED DUPLICATE of bug 684127
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: virtio-win
Version: 6.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Yvugenfi@redhat.com
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2012-01-05 09:42 UTC by Quan Wenli
Modified: 2013-05-17 06:40 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-05-17 06:40:13 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
netperf reports interrupted system call (3.03 KB, text/plain), 2012-01-09 06:18 UTC, Quan Wenli
50 repeats results, each repeat includes 8 TX netperf test with 1024 packet (128.14 KB, application/octet-stream), 2012-08-16 10:24 UTC, Quan Wenli
50 repeats results, each repeat includes 8 TX netperf test with 1024 packet (128.22 KB, application/octet-stream), 2012-08-16 10:27 UTC, Quan Wenli
Attempts to establish 8 tx channel, only 7 tx channel established successfully. (11.84 KB, application/octet-stream), 2012-08-17 08:54 UTC, Quan Wenli
Attempts to establish 8 tx channel, 8 tx channel were established successfully. (12.77 KB, application/octet-stream), 2012-08-17 08:56 UTC, Quan Wenli
netperf result with virtio-win-prewhql-39 (3.18 KB, application/octet-stream), 2012-10-09 09:21 UTC, Quan Wenli

Description Quan Wenli 2012-01-05 09:42:06 UTC
Description of problem:

Ran 8 concurrent netperf (TX) tests with packet sizes 256, 1024, 4096, 16484, and 65535 in turn. With packet sizes 256, 1024, and 4096 there is a high probability (about 90%) that netperf reports an interrupted system call, with the error message:
send_tcp_maerts: data recv error: Interrupted system call len was -1

Version-Release number of selected component (if applicable):

virtio-win-prewhql-0.1.20

How reproducible:

9/10

Steps to Reproduce:
1. Run netserver on the guest.
2. Run netperf on the external host.
Because the guest acts as the server, -t TCP_MAERTS tests the TX capacity of the guest.
for i in `seq 8`; do  netperf -H 192.168.100.32 -t TCP_MAERTS  & done
3. 
  
Actual results:

interrupted system call when running 8 concurrent Netperf (TX) Tests with small packet 

Expected results:


Additional info:
This is not a regression; it also reproduces with virtio-win-1.4.0-1.
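The reproduction step can be made easier to score by giving each session its own log and counting how many logs contain the error string. The following is a minimal sketch, not the reporter's actual procedure: the function names, the log directory, and the log file naming are assumptions; only the netperf command line and the error string come from this report.

```shell
#!/bin/sh
# Hypothetical helpers; only the netperf invocation is taken from the report.

# run_sessions HOST COUNT LOGDIR
# Start COUNT concurrent TCP_MAERTS sessions, one log file per session.
run_sessions() {
    host=$1
    count=$2
    logdir=$3
    i=1
    while [ "$i" -le "$count" ]; do
        netperf -H "$host" -t TCP_MAERTS > "$logdir/session-$i.log" 2>&1 &
        i=$((i + 1))
    done
    wait    # block until every background session has exited
}

# count_interrupted LOGDIR
# Print the number of session logs that contain the reported error string.
count_interrupted() {
    logdir=$1
    grep -l 'Interrupted system call' "$logdir"/session-*.log 2>/dev/null |
        wc -l | tr -d ' '
}
```

A run would then look like `run_sessions 192.168.100.32 8 /tmp/logs; count_interrupted /tmp/logs`, where `/tmp/logs` is an existing directory chosen for illustration.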

Comment 2 Yvugenfi@redhat.com 2012-01-08 12:45:50 UTC
Hello,

Can you describe the problem in more detail? What do you mean by "interrupted system call"?
Where does it occur (host or guest)?

Thanks,
Yan.

Comment 3 Quan Wenli 2012-01-09 06:17:21 UTC
(In reply to comment #2)
> Hello,
> 
> Can you describe the problem in more details - what do you mean by "interrupted
> system call" ?
> Where (host, guest ) ? 
> 
> Thanks,
> Yan.

Hi Yan,

netperf reports "Interrupted system call"; the netperf error log is attached.
netperf with the -t TCP_MAERTS parameter (which tests the TX capacity of the guest) was running on the external host, and netserver was running on a Windows 2008 R2 guest.

Comment 4 Quan Wenli 2012-01-09 06:18:26 UTC
Created attachment 551495 [details]
netperf reports interrupted system call

Comment 5 Ronen Hod 2012-01-09 16:59:46 UTC

*** This bug has been marked as a duplicate of bug 684127 ***

Comment 6 Quan Wenli 2012-01-10 10:28:17 UTC
Hi Ronen,

The netperf error message is "send_tcp_maerts: data recv error: Interrupted system call len was -1" (see comment #4), which is not the same as the netperf error message in bug #684127 (netperf: remote error 4Tcp:1).

Regards
wenli

Comment 7 Ronen Hod 2012-01-12 15:49:32 UTC
Wenli,

Thanks for catching this.
I still tend to believe that this is Netperf's sensitivity to high load in both cases, but it is yet to be proven.

Ronen.

Comment 8 Yvugenfi@redhat.com 2012-02-13 23:04:56 UTC
What version of netperf is used (is it even possible to understand from the
binary)? 

In latest sources: ftp://ftp.netperf.org/netperf/netperf-2.5.0.zip there is no
string that resembles the received error message.

Comment 9 Quan Wenli 2012-02-14 07:46:07 UTC
(In reply to comment #8)
> What version of netperf is used (is it even possible to understand from the
> binary)? 
> 
> In latest sources: ftp://ftp.netperf.org/netperf/netperf-2.5.0.zip there is no
> string that resembles the received error message.


I used netperf/netserver version 2.4.5 on both the client and the server side.

I just tried version 2.5.0 of netperf/netserver with the command "for i in `seq 8`; do netperf -H 192.168.100.15 (ip of netserver)  -- -m 1024 & done" in the guest. No interrupted system call occurs, but one session fails with an error like "send_data:data send error: errno 104  netperf: send_omni:send_data failed : Connection reset by peer", reproducible 100% of the time. In other words, 7 sessions report throughput and one fails with that error.

Note: the usage of the "-m" parameter changed in netperf-2.5.0
(man page: "-m bytes  Set the size of the buffer passed-in to the "send" calls of a _STREAM test."),
so I could not use "-t TCP_MAERTS -- -m" on the external host to test TX capability (netserver was running on the guest).

Comment 10 Yvugenfi@redhat.com 2012-02-14 22:35:24 UTC
Another question.

Did you try running the test with LSO disabled on the guest?

To do so, open Device Manager: Run -> devmgmt.msc -> go to the virtio net adapter -> double click -> go to the "Advanced" tab -> set "Offload.TxLSO" to "Disable" -> click the "OK" button.

Comment 11 Quan Wenli 2012-02-15 02:41:23 UTC
(In reply to comment #10)
> Another questions.
> 
> Did you try to disable LSO and try to run the test with LSO disabled on the
> guest?
> 
I just tried with LSO off; the same error occurs ("send_data:data send error: errno 104  netperf: send_omni:send_data failed :Connection reset by peer"), reproduced in 2 of 10 runs.

Comment 12 Yvugenfi@redhat.com 2012-02-15 09:48:02 UTC
Hello,

Following the request from MST,
Here is an experimental driver with enabled indirect buffers feature for Windows guest: 
\\smamit.eng.lab.tlv.redhat.com\win-team\Public\Yan\2Test

Please try to run the test with it.

By the way, which Windows OS version are you using?

You might need to install test certificate in order to use it on 64bit OS. To do it: double click on NetKVMTemporaryCert.cer and follow installation wizard. In command line enable test signing on the system: bcdedit /set TESTSIGNING ON . And reboot.

Best regards,
Yan.

Comment 13 Quan Wenli 2012-02-21 06:14:06 UTC
(In reply to comment #12)
> Hello,
> 
> Following the request from MST,
> Here is an experimental driver with enabled indirect buffers feature for
> Windows guest: 
> \\smamit.eng.lab.tlv.redhat.com\win-team\Public\Yan\2Test
> 
> Please try to run the test with it.

I tried the following three scenarios on the experimental driver, 50 times each.

Results:
1 > LSO off + each netperf session running for 10s ----> 12/50 reproduction
2 > LSO on  + each netperf session running for 10s ---->  8/50 reproduction
3 > LSO on  + each netperf session running for 30s ----> 12/50 reproduction
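For comparison with the "9/10" and "about 90%" figures quoted elsewhere in this report, the three counts above can be expressed as percentages. This is a small sketch; the helper name `rate` is hypothetical.

```shell
#!/bin/sh
# rate HITS RUNS -> reproduction rate as a percentage with one decimal place
rate() {
    awk -v h="$1" -v n="$2" 'BEGIN { printf "%.1f\n", 100 * h / n }'
}

rate 12 50   # LSO off, 10 s sessions -> 24.0
rate 8 50    # LSO on,  10 s sessions -> 16.0
rate 12 50   # LSO on,  30 s sessions -> 24.0
```

With only 50 runs per scenario, a gap between 16% and 24% may well be noise rather than an effect of the LSO setting or the session length.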

> By the way, which Windows OS version are you using?
> 
> You might need to install test certificate in order to use it on 64bit OS. To
> do it: double click on NetKVMTemporaryCert.cer and follow installation wizard.
> In command line enable test signing on the system: bcdedit /set TESTSIGNING ON
> . And reboot.
> 
> Best regards,
> Yan.

Comment 14 Yvugenfi@redhat.com 2012-06-05 11:59:31 UTC
Cannot reproduce with the latest build. Please retest with build 27.

Comment 15 Quan Wenli 2012-06-06 14:53:51 UTC
(In reply to comment #14)
> Cannot reproduce with latest build. please retest with build 27.
I can still hit the issue on a Windows 2008 R2 64-bit guest with virtio-win-1.5.1; the error is "send_tcp_maerts: data recv error: Interrupted system call len was -1".

Steps:

for i in `seq 8`; do  netperf -H 192.168.100.32 -t TCP_MAERTS -- -m 1024 & done (run it on external host)

Comment 17 Yvugenfi@redhat.com 2012-08-05 11:46:12 UTC
Hello All,

Investigation results:
1. I cannot reproduce this issue on my setup with new drivers.
2. According to the bug description, the “interrupted system call” printout occurs on the client side (Linux, physical machine), so it is not clear whether it is related to the virtio drivers at all.
3. “Interrupted system call” is not a bug but normal system behavior under some load scenarios; applications have to know how to deal with it.



Requests:
1. Please provide exact version of QEMU that was used to reproduce the bug.

2. Please repeat the same test while adding another 8 netperf clients running on the same host and routing traffic to another netperf server (a physical machine without virtio drivers). See whether interrupted system call printouts occur for the newly added clients.

Comment 18 Quan Wenli 2012-08-07 08:05:23 UTC
(In reply to comment #17)
> Hello All,
> 
> Investigation results:
> 1. I cannot reproduce this issue on my setup with new drivers.
> 2. According to the bug description “interrupted system call” printout
> occurs on client size (Linux, physical machine) so it is not clear whether
> it is related to virtio drivers at all.
> 3. “Interrupted system call” is not a bug rather normal system behavior
> under some load scenarios, applications have to know how to deal with it.
> 
> 
> 
> Requests:
> 1. Please provide exact version of QEMU that was used to reproduce the bug.

I believe I used qemu-kvm-0.12.1.2-2.294.el6.x86_64 to reproduce this bug.
The machine is currently running other tests. I will double-check the QEMU version.

> 2. Please repeat the same test adding another 8 netperf clients running on
> the same host and routing traffic to other netperf server (physical machine
> without virtio drivers). See whether interrupted system call printouts occur
> for newly added clients.

Do you mean try 8 netperf client tests from host to another host?

Comment 19 Yvugenfi@redhat.com 2012-08-07 11:07:13 UTC
> Do you mean try 8 netperf client tests from host to another host?

Yes.

Comment 20 Dmitry Fleytman 2012-08-07 11:13:06 UTC
(In reply to comment #19)
> > Do you mean try 8 netperf client tests from host to another host?
> 
> Yes.

Simultaneously with the original test activity.

Comment 21 Quan Wenli 2012-08-16 10:24:55 UTC
Created attachment 604876 [details]
50 repeats results, each repeat includes 8 TX netperf test with 1024 packet

The netserver version is 2.4.5, compiled on Cygwin.

Comment 22 Quan Wenli 2012-08-16 10:27:00 UTC
Created attachment 604878 [details]
50 repeats results, each repeat includes 8 TX netperf test with 1024 packet

The netserver for Windows was copied from https://bugzilla.redhat.com/show_bug.cgi?id=826596#c37

Comment 23 Quan Wenli 2012-08-16 10:31:06 UTC
(In reply to comment #17)

> Requests:
> 1. Please provide exact version of QEMU that was used to reproduce the bug.
Retested on a RHEL 6.3 release host (2.6.32-279.el6 & qemu-kvm-0.12.1.2-2.295); the Windows 2008 R2 64-bit guest again acts as netserver. In the two groups of results in comment #21 and comment #22, the "Interrupted system call" error no longer appears, but "establish control: are you sure there is a netserver listening on 192.168.100.21 at port 12865?" appears instead. This means that when 8 TX netperf sessions run with 1024 packet size, only 7 return results; one fails with that error.
 
> 2. Please repeat the same test adding another 8 netperf clients running on
> the same host and routing traffic to other netperf server (physical machine
> without virtio drivers). See whether interrupted system call printouts occur
> for newly added clients.

In this scenario, one of the TX netperf clients to the VM still hits the error "establish control: are you sure there is a netserver listening on 192.168.100.21 at port 12865?"

Steps:
[root@hp-z800-04 src]# for i in `seq 8`; do  ./netperf -H 192.168.100.21 (Windows2k8 R2) -t TCP_MAERTS -l 60 -- -m 1024 & done
Simultaneously run another 8 netperf client tests in another terminal 
[root@hp-z800-04 src]# for i in `seq 8`; do  ./netperf -H 10.66.72.17 (another physical machine) -t TCP_MAERTS -l 60 -- -m 1024 & done
[root@hp-z800-04 src]# ps -ef |grep netperf
root     19321 14179 13 05:48 pts/0    00:00:08 ./netperf -H 192.168.100.21 -t TCP_MAERTS -l 60 -- -m 1024
root     19322 14179 13 05:48 pts/0    00:00:08 ./netperf -H 192.168.100.21 -t TCP_MAERTS -l 60 -- -m 1024
root     19323 14179 13 05:48 pts/0    00:00:08 ./netperf -H 192.168.100.21 -t TCP_MAERTS -l 60 -- -m 1024
root     19324 14179 13 05:48 pts/0    00:00:08 ./netperf -H 192.168.100.21 -t TCP_MAERTS -l 60 -- -m 1024
root     19325 14179 13 05:48 pts/0    00:00:07 ./netperf -H 192.168.100.21 -t TCP_MAERTS -l 60 -- -m 1024
root     19326 14179 13 05:48 pts/0    00:00:07 ./netperf -H 192.168.100.21 -t TCP_MAERTS -l 60 -- -m 1024
root     19327 14179 13 05:48 pts/0    00:00:08 ./netperf -H 192.168.100.21 -t TCP_MAERTS -l 60 -- -m 1024
root     19330 19278  5 05:48 pts/3    00:00:03 ./netperf -H 10.66.72.17 -t TCP_MAERTS -l 60 -- -m 1024
root     19331 19278  5 05:48 pts/3    00:00:03 ./netperf -H 10.66.72.17 -t TCP_MAERTS -l 60 -- -m 1024
root     19332 19278  5 05:48 pts/3    00:00:03 ./netperf -H 10.66.72.17 -t TCP_MAERTS -l 60 -- -m 1024
root     19333 19278  5 05:48 pts/3    00:00:03 ./netperf -H 10.66.72.17 -t TCP_MAERTS -l 60 -- -m 1024
root     19334 19278  5 05:48 pts/3    00:00:03 ./netperf -H 10.66.72.17 -t TCP_MAERTS -l 60 -- -m 1024
root     19335 19278  5 05:48 pts/3    00:00:03 ./netperf -H 10.66.72.17 -t TCP_MAERTS -l 60 -- -m 1024
root     19336 19278  5 05:48 pts/3    00:00:03 ./netperf -H 10.66.72.17 -t TCP_MAERTS -l 60 -- -m 1024
root     19337 19278  5 05:48 pts/3    00:00:03 ./netperf -H 10.66.72.17 -t TCP_MAERTS -l 60 -- -m 1024
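The listing above can be summarized per target to see at a glance how many sessions are still alive for each server. A minimal sketch, assuming `ps -ef` style lines on standard input; the function name `count_by_host` is hypothetical.

```shell
#!/bin/sh
# count_by_host: read `ps -ef` lines on stdin and print "<host> <count>"
# for every address that follows a -H option on a command line.
count_by_host() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "-H") n[$(i + 1)]++ }
         END { for (h in n) print h, n[h] }' | sort
}
```

Used as `ps -ef | grep netperf | count_by_host`. Applied to the listing above it reports 8 processes for 10.66.72.17 and 7 for 192.168.100.21, which matches the one failed session toward the VM.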

Comment 24 Quan Wenli 2012-08-16 10:34:03 UTC
(In reply to comment #23)
> (In reply to comment #17)
> 
> > Requests:
> > 1. Please provide exact version of QEMU that was used to reproduce the bug.
> Retest on rhel6.3 release host (2.6.32-279.el6 & qemu-kvm-0.12.1.2-2.295),
> Guest windows 2008 r2 64 bits also acts as netserver.From two groups of
> results in comment #21 and comment #22,there is no "Interrupted system call"
> string error anymore but with "establish control: are you sure there is a
> netserver listening on 192.168.100.21 at port 12865?" instead. That is also
> mean while 8 TX netperf running with 1024 packet, only get 7 tx results,one
> fails with that error.

netperf client version is 2.4.5.
Virtio-win is virtio-win-1.5.3-1.el6_3.noarch.rpm

Comment 25 Michael S. Tsirkin 2012-08-16 13:38:59 UTC
What is happening here:
the netperf client has a side channel for passing commands to the server
and getting stats from it.

If that side-channel socket fails to connect initially,
you get the "establish control" error.
If it fails to return stats, you get "Interrupted
system call".

Could it be that we are losing too many packets?
I suggest tracing with tcpdump to verify.
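When scanning many netperf logs, the distinction drawn in this comment can be applied mechanically. The sketch below maps an error line to a short label; the function name and the labels are hypothetical, and the "Connection reset by peer" case is simply the other message seen in this report (comment #9), which this comment does not explain.

```shell
#!/bin/sh
# classify_netperf_error LINE -> short label describing the failing phase
classify_netperf_error() {
    case $1 in
        *"establish control"*)        echo "control-connect-failed" ;;
        *"Interrupted system call"*)  echo "interrupted-recv" ;;
        *"Connection reset by peer"*) echo "connection-reset" ;;
        *)                            echo "unknown" ;;
    esac
}
```

For example, `classify_netperf_error "send_tcp_maerts: data recv error: Interrupted system call len was -1"` prints `interrupted-recv`.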

Comment 26 Quan Wenli 2012-08-17 08:54:08 UTC
Created attachment 605127 [details]
Attempts to establish 8 tx channel, only 7 tx channel established successfully.

Comment 27 Quan Wenli 2012-08-17 08:55:46 UTC
(In reply to comment #26)
> Created attachment 605127 [details]
> Attempts to establish 8 tx channel, only 7 tx channel established
> successfully.

Test result captured with "tcpdump  -n -i eth1 'tcp[13] & 2 == 2' -vv"

Comment 28 Quan Wenli 2012-08-17 08:56:44 UTC
Created attachment 605129 [details]
Attempts to establish 8 tx channel, 8 tx channel were established successfully.

Test result captured with "tcpdump  -n -i eth1 'tcp[13] & 2 == 2' -vv"

Comment 29 Quan Wenli 2012-08-17 09:43:11 UTC
(In reply to comment #25)
> what is happening here:
> netperf client has a side channel to passing commands to server
> and getting stats from it.
> 
> If that side channel socket fails to connect initially,
> you get establish control error.
> If it fails to return stats, you get interrupted
> system call.
> 
> Could it be we are loosing too many packets?
> Suggest tracing with tcpdump to verify.

I ran the following tests, tracing with tcpdump while 8 TX netperf clients were running (Windows 2008 R2 still acts as netserver).

- 1. test
[root@hp-z800-04 src]# tcpdump -n -i eth1
229773 packets captured
9358220 packets received by filter
9128447 packets dropped by kernel

This shows a lot of dropped packets. After some research, it appears that the dropped packets may be caused by tcpdump itself:

"
Dropped packets
    At the end of its run, TCPdump will inform you if any packets were dropped in the kernel. If this becomes a problem, it's likely that your host can't keep up with the network traffic and decode it at the same time. Try using TCPdump's -w option to bypass the decoding and write the raw packets to a file, then come back later and decode the file with the -r switch. You can also try using -s to reduce the capture snapshot size. 
from http://www.freesoft.org/CIE/Topics/55.htm
"

So I ran further tests, adding the -w and then the -s parameter to tcpdump.
- 2.test
[root@hp-z800-04 ~]# tcpdump -n -i eth1  -w aaa.pcap
tcpdump: listening on eth1, link-type EN10MB (Ethernet), capture size 65535 bytes
^C9023950 packets captured
9210525 packets received by filter
186575 packets dropped by kernel

- 3.test
[root@hp-z800-04 ~]# tcpdump -n -i eth1 -w bbb.pcap -s 3000
tcpdump: listening on eth1, link-type EN10MB (Ethernet), capture size 3000 bytes
^C8955895 packets captured
9046557 packets received by filter
90662 packets dropped by kernel

In tests 2 and 3 the number of dropped packets decreased substantially, but I am still not sure whether the drops are due to tcpdump itself, since "ifconfig |grep eth1" shows " dropped:0".
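The three tcpdump summaries are easier to compare as drop percentages. A minimal sketch that reads a tcpdump summary on standard input and divides "dropped by kernel" by "received by filter"; the function name `drop_pct` is hypothetical.

```shell
#!/bin/sh
# drop_pct: read a tcpdump end-of-run summary on stdin and print the share
# of filter-matched packets that the kernel dropped, in percent.
drop_pct() {
    awk '/packets received by filter/ { recv = $1 }
         /packets dropped by kernel/  { drop = $1 }
         END { if (recv > 0) printf "%.2f\n", 100 * drop / recv }'
}

# Test 1: decoding to the terminal
printf '%s\n' '9358220 packets received by filter' '9128447 packets dropped by kernel' | drop_pct   # 97.54
# Test 2: -w aaa.pcap
printf '%s\n' '9210525 packets received by filter' '186575 packets dropped by kernel' | drop_pct    # 2.03
# Test 3: -w bbb.pcap -s 3000
printf '%s\n' '9046557 packets received by filter' '90662 packets dropped by kernel' | drop_pct     # 1.00
```

In all three runs "captured" plus "dropped by kernel" equals "received by filter", so the ratio above is the fraction of matching packets that never reached tcpdump.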

- 4.test - only capture SYN in tcpdump.
4.1 Attempts to establish 8 tx channel, only 7 tx channel established successfully. Detailed results are in comment #26.
[root@hp-z800-04 network-scripts]# tcpdump  -n -i eth1 'tcp[13] & 2 == 2' -vv 
29 packets captured
29 packets received by filter
0 packets dropped by kernel

4.2 Attempts to establish 8 tx channel, 8 tx channel established successfully. Detailed results are in comment #28.
[root@hp-z800-04 network-scripts]# tcpdump  -n -i eth1 'tcp[13] & 2 == 2' -vv
32 packets captured
32 packets received by filter
0 packets dropped by kernel
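The filter 'tcp[13] & 2 == 2' used in test 4 selects packets whose TCP flags byte (offset 13 in the TCP header) has the SYN bit (0x02) set, so it matches both the initial SYN and the SYN-ACK reply. In pcap filter syntax the arithmetic on each side of "==" is evaluated first; in shell arithmetic the same test needs parentheses. The sketch below decodes a flags byte to make that visible; both function names are hypothetical.

```shell
#!/bin/sh
# tcp_flags BYTE -> names of the flag bits set in TCP header byte 13
tcp_flags() {
    v=$(( $1 ))
    out=""
    for pair in 1:FIN 2:SYN 4:RST 8:PSH 16:ACK 32:URG; do
        bit=${pair%%:*}
        name=${pair#*:}
        if [ $(( v & bit )) -ne 0 ]; then
            out="$out $name"
        fi
    done
    echo "${out# }"
}

# matches_syn_filter BYTE -> succeeds when (BYTE & 2) == 2, like the filter
matches_syn_filter() {
    [ $(( ($1 & 2) == 2 )) -eq 1 ]
}
```

For example, `tcp_flags 0x12` prints `SYN ACK`, and `matches_syn_filter 0x12` succeeds while `matches_syn_filter 0x10` (a plain ACK) fails. Since each netperf session opens a control connection and a data connection, the 32 packets in 4.2 are consistent with 8 sessions x 2 connections x 2 packets (SYN and SYN-ACK).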

Comment 30 jason wang 2012-08-17 09:48:36 UTC
We have met several similar issues in the past. The interrupted system call was reported by netperf when recv() was interrupted by SIGALRM.

I didn't go through all the comments, but we need to consider whether it's a bug at all, according to Rick's comment in https://bugzilla.redhat.com/show_bug.cgi?id=717549#c1 and my experimentation in https://bugzilla.redhat.com/show_bug.cgi?id=713063#c6.

According to Rick's comment, you can try to use -l -1G to check whether the network still works after your stress test. If you are doing a performance test, it is better to enable the demo mode and check the time stamps to calculate the aggregated throughput correctly.

So in conclusion, it's very questionable that we fail the test cases when we see an interrupted system call.

Comment 31 Ronen Hod 2012-08-20 07:37:47 UTC
Let me be sure that I understand. (I did not examine the log files).
Does the "problematic" timer expire just because the entire Netperf test did not finish? Does it happen even if there was some recent traffic in that test?
- If this is the case, then we definitely need to run netperf differently.
- If there was no traffic at all in this test for 5 minutes (or so), then we need to understand why. Probably this is not the case.

Ronen.

Comment 32 Yvugenfi@redhat.com 2012-10-02 11:45:09 UTC
Please retest with build 39.

Comment 33 Quan Wenli 2012-10-09 09:21:32 UTC
Created attachment 624007 [details]
netperf result with virtio-win-prewhql-39

Still hit error "send_tcp_maerts: data recv error: Interrupted system call len was -1" with virtio-win-prewhql-39.
=================================
Driver Date:10/1/2012
Driver Version: 61.64.104.3900
=================================

Comment 35 Quan Wenli 2012-11-07 08:51:42 UTC
I never hit the interrupted system call on a Windows 2003 64-bit guest with virtio-win-prewhql-41, but it still reproduces at a high rate (about 80%) on a Windows 2008 R2 guest with virtio-win-prewhql-41.

Comment 37 Dmitry Fleytman 2013-05-17 06:40:13 UTC

*** This bug has been marked as a duplicate of bug 684127 ***

