Bug 1370411

Summary: [Performance] The number of netperf client sessions is smaller than on the x86 platform during netperf.host_guest testing.
Product: Red Hat Enterprise Linux 7 Reporter: Min Deng <mdeng>
Component: qemu-kvm-rhev Assignee: Laurent Vivier <lvivier>
Status: CLOSED NOTABUG QA Contact: Virtualization Bugs <virt-bugs>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 7.3CC: knoel, lvivier, mdeng, michen, qzhang, virt-maint, wquan, yama
Target Milestone: rc   
Target Release: ---   
Hardware: ppc64le   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-02-21 13:25:26 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Attachments:
Description Flags
PY file
none
LOG
none
Referencefile none

Comment 3 Min Deng 2016-08-29 02:29:45 UTC
Successfully completed tests with the following parameters on the Power platform:
    sessions = "1 2 4 6" -* 
    sessions_rr = "1 20" -*  
    sizes = "64 256 1024" -*
    sizes_rr = "64 256 4096"
Currently on the x86 platform:
    sessions = "1 2 4 8" 
    sessions_rr = "1 25 50" 
    sizes = "64 256 1024 4096 16384 65535"
    sizes_rr = "64 256 4096"

Comment 4 Laurent Vivier 2016-09-06 09:52:33 UTC
It seems your test is:

numactl --cpunodebind=17 --membind=17 /tmp/netperf_agent.py 50 /tmp/netperf-2.6.0/src/netperf -D 1 -H 192.168.58.76 -l 90.0 -t TCP_CRR -v 1 -- -r 64,64

I need to know what "netperf_agent.py" is, and what the real error is (something more verbose than "Error, not all netperf clients at work").

Comment 5 Min Deng 2016-09-07 09:47:26 UTC
Created attachment 1198625 [details]
PY file

Hi Laurent,

I hope it will help you. If you need more info, please feel free to tell me! Thanks a lot.

Best regards,
Min Deng

Comment 6 Laurent Vivier 2016-09-07 12:14:08 UTC
Thank you Dengmin.

I was able to run the test.

In the guest:

    # netperf-2.7.0/src/netserver -D

In the host:

    # numactl --cpunodebind=0 --membind=0 ./netperf_agent.py 50 \
      netperf-2.7.0/src/netperf -D 1 -H 192.168.122.117 -l 90.0 \
                                -t TCP_CRR -v 1 -- -r 64,64

Some notes:

- qemu cmdline, "-m 131072" means 128GB.

  -> Are you sure you have enough memory on the host?

- "numactl --cpunodebind=17 --membind=17" means to start processes on node 17.

  -> Do you have a node 17?

     On my system I've only 2 nodes:
     # lscpu
     ...
     NUMA node0 CPU(s):     0,8,16,24,32,40
     NUMA node1 CPU(s):     48,56,64,72,80,88

     So I run the test with "numactl --cpunodebind=0 --membind=0"

- The firewall must be stopped in the guest:

  # systemctl stop firewalld
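The node-existence check above can be automated. The sketch below (my own helper, not part of the test framework) parses the node list out of `numactl --hardware` output, including the non-contiguous numbering seen on some ppc64le hosts, so a script can verify a node before binding to it:

```python
import re

def available_nodes(numactl_output):
    """Parse the node list from `numactl --hardware` output.

    Handles non-contiguous numbering such as
    'available: 4 nodes (0-1,16-17)' seen on some ppc64le hosts.
    """
    m = re.search(r"available: \d+ nodes \(([\d,-]+)\)", numactl_output)
    if not m:
        return []
    nodes = []
    for part in m.group(1).split(","):
        if "-" in part:
            lo, hi = part.split("-")
            nodes.extend(range(int(lo), int(hi) + 1))
        else:
            nodes.append(int(part))
    return nodes

# Before running `numactl --cpunodebind=17 --membind=17 ...`,
# check that node 17 actually exists on the host.
sample = "available: 4 nodes (0-1,16-17)"
print(17 in available_nodes(sample))  # → True
```

On a host with only nodes 0 and 1, the same check for node 17 would return False, which is exactly the failure mode suspected here.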

If these changes don't fix your problem, could you give the exact error message you have, please?

Thanks

Comment 7 Laurent Vivier 2016-09-08 15:24:57 UTC
As the problem is not clear, defer to 7.4.

Comment 8 Min Deng 2016-09-09 06:27:43 UTC
Hi Laurent,  
  QE needs to run the performance tests on two special hosts in Beaker. They are not available to us at all times, so QE needs time to reserve them again in order to produce new test results. As soon as testing completes, QE will update the results. Thanks!

Best Regards,
Min

Comment 9 Laurent Vivier 2016-11-21 17:29:15 UTC
Could you try to reproduce this BZ with the latest versions of the kernel and qemu-kvm-rhev?

Comment 10 Min Deng 2016-11-22 06:18:13 UTC
The hosts have been reserved by me, and it may take more than one week to produce the test results. Please keep the bug marked as needinfo to me.

Comment 11 Min Deng 2016-11-29 10:09:14 UTC
Yes, QE can still reproduce the issue with the x86 parameters; please refer to comment 3.
(In reply to Laurent Vivier from comment #6)
> Thank you Dengmin.
> 
> I was able to run the test.
> 
> In the guest:
> 
>     # netperf-2.7.0/src/netserver -D
> 
> In the host:
> 
>     # numactl --cpunodebind=0 --membind=0 ./netperf_agent.py 50 \
>       netperf-2.7.0/src/netperf -D 1 -H 192.168.122.117 -l 90.0 \
>                                 -t TCP_CRR -v 1 -- -r 64,64
> 
> Some notes:
> 
> - qemu cmdline, "-m 131072" means 128GB.
> 
>   -> Are you sure you have enough memory on the host?

    Actually, my hosts have 260 GB of memory, but I reduced both the memory and CPU for the guest this time: "--vcpu=8 --mem=4096".
   
> 
> - "numactl --cpunodebind=17 --membind=17" means to start processes on node
> 17.
> 
>   -> Do you have a node 17?
Yes, the host has it, and I also tried node 0 and node 1 as well.
numactl --hardware
available: 4 nodes (0-1,16-17)
node 0 cpus: 0 8 16 24 32
node 0 size: 65536 MB
node 0 free: 60249 MB
node 1 cpus: 120 128 136 144 152
node 1 size: 65536 MB
node 1 free: 64698 MB
node 16 cpus: 40 48 56 64 72
node 16 size: 65536 MB
node 16 free: 64914 MB
node 17 cpus: 80 88 96 104 112
node 17 size: 65536 MB
node 17 free: 64706 MB
node distances:
node   0   1  16  17 
  0:  10  20  40  40 
  1:  20  10  40  40 
 16:  40  40  10  20 
 17:  40  40  20  10 

> 
>      On my system I've only 2 nodes:
>      # lscpu
>      ...
>      NUMA node0 CPU(s):     0,8,16,24,32,40
>      NUMA node1 CPU(s):     48,56,64,72,80,88
> 
>      So I run the test with "numactl --cpunodebind=0 --membind=0"
> 
> - The firewall must be stopped in the guest:
> 
>   # systemctl stop firewalld
> 
> If these changes don't fix your problem, could you give the exact error
> message you have, please?
  OK, I will attach the log to this bug. I can still reproduce similar issues. Please reply as soon as you can, because I have to return the hosts by the end of this week. Thanks a lot.

Comment 12 Min Deng 2016-11-29 10:10:04 UTC
Created attachment 1225769 [details]
LOG

Comment 13 Laurent Vivier 2016-11-29 12:45:12 UTC
(In reply to dengmin from comment #12)
> Created attachment 1225769 [details]

It seems it waits 30 seconds for the clients to start before exiting with an error:

Wait until all netperf clients start to work (29.588929 secs)
Sending command: cat /tmp/netperf.43237.nf
Sending command: echo $?
Wait until all netperf clients start to work (29.994572 secs)
Sending command: cat /tmp/netperf.43237.nf
Sending command: echo $?
(monitor catch_monitor) Sending command 'info registers' (via Human Monitor)
Test failed: TestNAError: Error, not all netperf clients at work    [context: Start netperf testing --> Testing TCP_STREAM protocol --> Start netperf client threads]
Destroying VM virt-tests-vm1 (PID 43417)

Perhaps that is not enough on ppc64le for 50 clients: is it possible to increase this timeout?

Thanks,
Laurent
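For reference, the wait-and-timeout behaviour described above amounts to a poll-until-timeout loop. The sketch below is my own illustration of that pattern (names, the retry interval, and the timeout knob are assumptions; the real framework's implementation may differ):

```python
import time

def wait_for(condition, timeout=30.0, interval=0.5):
    """Poll `condition` until it returns True or `timeout` seconds pass.

    Returns True if the condition was met in time, False on timeout --
    the False case corresponds to the "not all netperf clients at work"
    error seen in the log.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(interval)
    return False
```

Raising the timeout argument (e.g. from 30 to 100 seconds) gives slower platforms more time before the test is declared failed.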

Comment 14 Min Deng 2016-11-30 05:20:06 UTC
  With the timeout increased to 100 s, all the clients come up so far. Compared to the x86 platform it is a somewhat larger value. Any issues, please let me know. Thanks a lot.
protocols = "TCP_STREAM TCP_MAERTS TCP_RR TCP_CRR"
sessions = "1 2 4 8"
sessions_rr = "1 25 50"
sizes = "64 256 1024 4096 16384 65535"
sizes_rr = "64 256 4096"

Thanks 
Min

Comment 15 Laurent Vivier 2016-11-30 09:39:52 UTC
OK, so the problem doesn't seem to be with the guest but with the host, in the NUMA part.

According to the log, command is:

numactl --cpunodebind=1 --membind=1 /tmp/netperf_agent.py <CPU> \
 /tmp/netperf-2.6.0/src/netperf -D 1 -H 192.168.58.44 -l 90.0 -C -c \
 -t TCP_STREAM -- -m <MB>

The time to start according to <CPU> and <MB> is (in secs):

+------+-------+-------+-------+-------+
|MB\CPU|   1   |   2   |   4   |   8   |
+------+-------+-------+-------+-------+
| 64   |  1.62 |  1.62 |  1.62 |  2.83 |
+------+-------+-------+-------+-------+
| 256  |  1.62 |  2.02 |  2.02 |  7.70 |
+------+-------+-------+-------+-------+
| 1024 |  1.62 |  1.83 |  6.48 | 17.47 |
+------+-------+-------+-------+-------+
| 4096 |  2.02 |  4.05 |  9.32 | > 30  |
+------+-------+-------+-------+-------+

We can see the time to start really increases with 8 cores.

I think it can be explained by the fact that you bind your netperf clients to one node, and there are only 5 cores in that node.

Moreover, you start your guest with 16 cores and 128 GB, and as your host has only 20 cores, some cores are shared between the netperf clients and the guest.

Are you sure you run this test in the same condition on x86?

Comment 16 Min Deng 2016-12-01 02:40:02 UTC
(In reply to Laurent Vivier from comment #15)
> Ok, so the problem doesn't seem to be with the guest but with the host, in
> the numa part.
> 
> According to the log, command is:
> 
> numactl --cpunodebind=1 --membind=1 /tmp/netperf_agent.py <CPU> \
>  /tmp/netperf-2.6.0/src/netperf -D 1 -H 192.168.58.44 -l 90.0 -C -c \
>  -t TCP_STREAM -- -m <MB>
> 
> The time to start according to <CPU> and <MB> is (in secs):
> 
> +------+-------+-------+-------+-------+
> |MB\CPU|   1   |   2   |   4   |   8   |
> +------+-------+-------+-------+-------+
> | 64   |  1.62 |  1.62 |  1.62 |  2.83 |
> +------+-------+-------+-------+-------+
> | 256  |  1.62 |  2.02 |  2.02 |  7.70 |
> +------+-------+-------+-------+-------+
> | 1024 |  1.62 |  1.83 |  6.48 | 17.47 |
> +------+-------+-------+-------+-------+
> | 4096 |  2.02 |  4.05 |  9.32 | > 30  |
> +------+-------+-------+-------+-------+
> 
> We can see the time to start really increases with 8 cores.
> 
> I think it can be explained because you bind your netperf clients to 1 node,
> and there are only 5 cores in the node.
  Firstly, thanks for your quick reply. I have to say it worked as our scripts were designed to. What do you think of our hosts for the performance tests? So far, they are the most appropriate ones that QE can reserve.
> 
> Moreover you start your guest with 16 cores and  128 GB, and as your host
> has only 20 cores, some are shared between netperf clients and the guest.
  I started the test with --vcpu=8 --mem=4096 this time, not with 16 cores and 128 GB.
> Are you sure you run this test in the same condition on x86?
  I am not in charge of x86 performance testing, so we had better confirm the details you want with that team before QE gives you an exact answer. We share almost the same scripts for our performance tests, but some configs are different, I guess.
  Finally, QE will update this as soon as possible, or ask them to reply to this bug directly.

Thanks 
Min

Comment 17 Min Deng 2016-12-01 02:48:17 UTC
Hi yama,
   Could you please give me a hand and have a look at comment 15? Does the same situation happen on the x86 platform? Many thanks. Any issues, please let me know. Thanks in advance.
Best regards
Min

Comment 18 Quan Wenli 2016-12-02 05:23:01 UTC
(In reply to dengmin from comment #17)
> Hi yama,
>    Could you please give me a hand to have a look on comment 15,is it the
> same situations happened on x86 platform,many thanks.Any issues please let
> me know.Thanks in advance.
> Best regards
> Min

We never tested CRR on the x86 platform before. 50-session RR tests (latency) should pass even on a small machine (4 cores in one node, two NUMA nodes in total).

Comment 19 Laurent Vivier 2016-12-02 10:25:10 UTC
(In reply to Quan Wenli from comment #18)
> (In reply to dengmin from comment #17)
> > Hi yama,
> >    Could you please give me a hand to have a look on comment 15,is it the
> > same situations happened on x86 platform,many thanks.Any issues please let
> > me know.Thanks in advance.
> > Best regards
> > Min
> 
> We never test CRR before on x86 platform, 50 sessions RR tests (latency)
> should be passed in even small machine (4 cores in one node, two numa node
> totally)

But the question is: how long does it take to start the clients?

On ppc64le we reach the timeout (more than 30 s to start 8 netperf clients of 4 GB each on a node where there are only 5 cores).

Comment 21 Laurent Vivier 2016-12-06 08:24:05 UTC
Thank you for these details, Quan.
I'm going to try to understand what happens on ppc64le.

Comment 22 Laurent Vivier 2016-12-06 13:29:19 UTC
I'm really not able to reproduce the problem; the 50-client test starts in less than 2 seconds on my system.

Could you try to start the netperf_client.py command line manually on a host (and netserver in a guest) to see what happens?
Could you try on another host?

Comment 23 Min Deng 2016-12-08 09:20:17 UTC
(In reply to Laurent Vivier from comment #22)
> I'm really not able to reproduce the problem, the 50 clients test starts in
> less than 2 seconds on my system.
> 

Hi Laurent,
   I'd like to confirm the issue below with you:
   1. About the TCP_CRR parameter: as we know, it isn't tested on x86 now. Should we still test it, or abandon it in the future? What is your opinion?
> Could you try to start manually the netperf_client.py command line on a host
> (and netserver in a guest) to see what happens?
> Could you try on another host?
   2. My host and test environment match your request exactly. Frankly, it looks like the clients can start in a very short time.

#numactl --cpunodebind=0 --membind=0 /tmp/netperf_agent.py 50 /tmp/netperf-2.6.0/src/netperf -D 1 -H 10.19.112.186 -l 90.0 -t TCP_CRR -v 1 -- -r 65535,65535
MIGRATED TCP Connect/Request/Response TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.19.112.186 () port 0 AF_INET : demo
(the "MIGRATED TCP Connect/Request/Response TEST" line above is printed once per client, 50 lines in total)
Interim result:  107.44 Trans/s over 1.005 seconds ending at 1481186250.282
Interim result:  106.22 Trans/s over 1.007 seconds ending at 1481186250.298
...

  Any issues, please let me know. Thanks a lot.
Thanks
Min

Comment 24 Laurent Vivier 2016-12-08 09:39:34 UTC
(In reply to dengmin from comment #23)
> (In reply to Laurent Vivier from comment #22)
> > I'm really not able to reproduce the problem, the 50 clients test starts in
> > less than 2 seconds on my system.
> > 
> 
> Hi Laurent,
>    I'd like to confirm with you about the below issue 
>    1.about TC_CRR parameter,as we all known it isn't tested by x86
> now,should we still test or abandon it in the future ? What's your opinions ?

My opinion is: the more tests we have, the better...

And the problem is not especially related to TCP_CRR, as it also happens with TCP_STREAM (see the logs in comment #12).

> > Could you try to start manually the netperf_client.py command line on a host
> > (and netserver in a guest) to see what happens?
> > Could you try on another host?
>    2.My host and test ENV match your request absolutely.Frankly it looks
> like the client can start in a very short time.
> 
> #numactl --cpunodebind=0 --membind=0 /tmp/netperf_agent.py 50
> /tmp/netperf-2.6.0/src/netperf -D 1 -H 10.19.112.186 -l 90.0 -t TCP_CRR -v 1
> -- -r 65535,65535

IMHO, the problem is not with the netperf test but with the test environment or the test host, as it can happen with different kinds of tests (TCP_RR and TCP_STREAM) and depends on the ratio between the number of cores and the number of clients we start.

How can I reproduce your test environment on my system?

Comment 25 Min Deng 2016-12-08 10:00:43 UTC
(In reply to Laurent Vivier from comment #24)
> (In reply to dengmin from comment #23)
> > (In reply to Laurent Vivier from comment #22)
> > > I'm really not able to reproduce the problem, the 50 clients test starts in
> > > less than 2 seconds on my system.
> > > 
> > 
> > Hi Laurent,
> >    I'd like to confirm with you about the below issue 
> >    1.about TC_CRR parameter,as we all known it isn't tested by x86
> > now,should we still test or abandon it in the future ? What's your opinions ?
> 
> My opinion is: more we have tests, better it is...
> 
> And the problem is not related especially to TCP_CRR as it happens also with
> TCP_STREAM (see logs in comment #12).
> 
> > > Could you try to start manually the netperf_client.py command line on a host
> > > (and netserver in a guest) to see what happens?
> > > Could you try on another host?
> >    2.My host and test ENV match your request absolutely.Frankly it looks
> > like the client can start in a very short time.
> > 
> > #numactl --cpunodebind=0 --membind=0 /tmp/netperf_agent.py 50
> > /tmp/netperf-2.6.0/src/netperf -D 1 -H 10.19.112.186 -l 90.0 -t TCP_CRR -v 1
> > -- -r 65535,65535
> 
> IMHO, I think the problem is not with the netperf test but with the test
> environment or with the test host, as it can happen with different kind of
> test (TCP_RR and TCP_STREAM), and depends on the ratio between the number of
> core and the number of clients we start.
> How can I reproduce your test environment on my system?

  Do you mean my automation test environment? If so, you need to reserve at least two hosts with two network adapters connected directly. Any issues, feel free to let me know.
Thanks
Min

Comment 26 Laurent Vivier 2016-12-08 10:39:48 UTC
(In reply to dengmin from comment #25)
>   Do you mean my automation tests ENV ? if so you need to reserve two hosts
> with two network adapters connected directly at least.Any issues feel free
> to let me know.


As the IP addresses in the log are in 192.168.X.X, I guess the guest and the netperf clients are on the same host, aren't they? So why two hosts?

Comment 28 Laurent Vivier 2017-02-01 16:19:54 UTC
How can I install your autotest tool on my system?

Comment 29 Min Deng 2017-02-06 07:23:06 UTC
(In reply to Laurent Vivier from comment #28)
> How can I install you tool autotest, on my system?

Yes, you can, and I've sent you a mail about this request. Thanks!

Comment 31 Min Deng 2017-02-09 05:51:32 UTC
The command line is available here.
python ConfigTest.py --testcase=netperf.host_guest --guestname=RHEL.7.3 --platform=ppc64le --imageformat=qcow2 --driveformat=virtio_scsi --nicmodel=virtio_net --display=vnc --verbose=no --clone=yes --nrepeat=5
Tips:
if --clone=yes, it will install a new OS every time.
   --nrepeat=5: you can reduce the repeat count if necessary.
Any issues, please let me know. Thanks a lot!

Comment 32 Laurent Vivier 2017-02-09 11:44:24 UTC
I have this error:

BRAddIfError: Can't add interface t1-V6LtFM to bridge switch: There is no bridge in system.

The README mentions a "--bridge" option, but it is not accepted by the script. How can I fix that?

Comment 33 Min Deng 2017-02-10 08:33:07 UTC
Created attachment 1248991 [details]
Referencefile

It looks like there was no bridge on your system. I'm uploading some useful information for your reference.

Comment 34 Laurent Vivier 2017-02-21 13:25:26 UTC
The problem is with the test script.

The script reads the content of the netperf client output to count the number of "MIGRATE" words in the file:

    def all_clients_up():
        try:
            content = ssh_cmd(clients[-1], "cat %s" % fname)
        except:
            content = ""
            return False
        if int(sessions) == len(re.findall("MIGRATE", content)):
            return True
        return False

As we should have one "MIGRATE" word per netperf client, this function loops while the number of words is different from the number of sessions.

I've manually checked the number of "MIGRATE" occurrences in the file in a case where the test script thinks it failed to start the netperf clients: the count is correct. So I think there is a problem with the function trying to detect the number of clients (it seems "content" is empty in this case, perhaps a bug in the ssh_cmd() function).
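A more defensive variant of that check would make an ssh failure visible instead of folding it into the "clients not up" case. The sketch below is hypothetical: the ssh_cmd() signature and the print-based reporting are my assumptions, not the framework's:

```python
import re

def all_clients_up(ssh_cmd, client, fname, sessions):
    """Check that every netperf client wrote its "MIGRATE" banner.

    Unlike the original, an ssh failure is reported rather than being
    silently indistinguishable from clients that never started.
    """
    try:
        content = ssh_cmd(client, "cat %s" % fname)
    except Exception as err:
        # A flaky ssh_cmd() now shows up as such in the log.
        print("ssh_cmd failed: %s" % err)
        return False
    found = len(re.findall("MIGRATE", content))
    if found != int(sessions):
        print("expected %s clients, found %d" % (sessions, found))
        return False
    return True
```

With this shape, a log line saying "ssh_cmd failed" would have pointed straight at the real culprit instead of the misleading "not all netperf clients at work" message.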

I'm closing this BZ, but I think you should re-assign it to KVM_Autotest.