Tests complete successfully with the following parameters on the Power platform:

sessions = "1 2 4 6"
sessions_rr = "1 20"
sizes = "64 256 1024"
sizes_rr = "64 256 4096"

Currently on the x86 platform:

sessions = "1 2 4 8"
sessions_rr = "1 25 50"
sizes = "64 256 1024 4096 16384 65535"
sizes_rr = "64 256 4096"
It seems your test is:

numactl --cpunodebind=17 --membind=17 /tmp/netperf_agent.py 50 /tmp/netperf-2.6.0/src/netperf -D 1 -H 192.168.58.76 -l 90.0 -t TCP_CRR -v 1 -- -r 64,64

I need to know what "netperf_agent.py" is, and what the real error is (something more verbose than "Error, not all netperf clients at work").
Created attachment 1198625 [details]
PY file

Hi Laurent,
I hope this helps; if you need more information, please feel free to tell me. Thanks a lot.
Best regards,
Min Deng
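For readers without access to the attachment: a wrapper like netperf_agent.py plausibly takes a session count plus a netperf command line, spawns that many parallel clients, and waits for them. The following is only an illustrative sketch under that assumption, not the attached script itself:

import subprocess
import sys

def main():
    # argv[1] is the number of parallel netperf sessions to start;
    # the rest of argv is the netperf command line to run for each one.
    sessions = int(sys.argv[1])
    cmd = sys.argv[2:]
    procs = [subprocess.Popen(cmd) for _ in range(sessions)]
    for proc in procs:
        proc.wait()

if __name__ == "__main__":
    main()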
Thank you, Dengmin. I was able to run the test.

In the guest:

# netperf-2.7.0/src/netserver -D

On the host:

# numactl --cpunodebind=0 --membind=0 ./netperf_agent.py 50 \
    netperf-2.7.0/src/netperf -D 1 -H 192.168.122.117 -l 90.0 \
    -t TCP_CRR -v 1 -- -r 64,64

Some notes:

- In the qemu command line, "-m 131072" means 128 GB (131072 MB / 1024 = 128 GB).
  -> Are you sure you have enough memory on the host?

- "numactl --cpunodebind=17 --membind=17" means to start processes on node 17.
  -> Do you have a node 17? On my system I have only 2 nodes:
     # lscpu
     ...
     NUMA node0 CPU(s): 0,8,16,24,32,40
     NUMA node1 CPU(s): 48,56,64,72,80,88
  So I run the test with "numactl --cpunodebind=0 --membind=0".

- The firewall must be stopped in the guest:
  # systemctl stop firewalld

If these changes don't fix your problem, could you give the exact error message you get, please?
Thanks
As the problem is not clear, defer to 7.4.
Hi Laurent,
QE needs to run the performance tests on two special hosts in Beaker. They are not in our hands all the time, so QE needs time to reserve them again in order to produce new test results. As soon as testing is complete, QE will update the results. Thanks!
Best Regards,
Min
Could you try to reproduce this BZ with the latest versions of the kernel and qemu-kvm-rhev?
The hosts have been reserved by me, and it may take more than one week to produce the test results. Still marking the bug as needinfo to me.
Yes, QE can still reproduce the issue with the x86 parameters; please refer to comment 3.

(In reply to Laurent Vivier from comment #6)
> Thank you, Dengmin. I was able to run the test.
>
> In the guest:
> # netperf-2.7.0/src/netserver -D
>
> On the host:
> # numactl --cpunodebind=0 --membind=0 ./netperf_agent.py 50 \
>     netperf-2.7.0/src/netperf -D 1 -H 192.168.122.117 -l 90.0 \
>     -t TCP_CRR -v 1 -- -r 64,64
>
> Some notes:
>
> - In the qemu command line, "-m 131072" means 128 GB.
>   -> Are you sure you have enough memory on the host?

Actually, my hosts have 260 GB of memory, but I reduced both memory and CPU for the guest this time: "--vcpu=8 --mem=4096".

> - "numactl --cpunodebind=17 --membind=17" means to start processes on node 17.
>   -> Do you have a node 17?

Yes, the host has it, and I also tried node 0 and node 1 as well.

numactl --hardware
available: 4 nodes (0-1,16-17)
node 0 cpus: 0 8 16 24 32
node 0 size: 65536 MB
node 0 free: 60249 MB
node 1 cpus: 120 128 136 144 152
node 1 size: 65536 MB
node 1 free: 64698 MB
node 16 cpus: 40 48 56 64 72
node 16 size: 65536 MB
node 16 free: 64914 MB
node 17 cpus: 80 88 96 104 112
node 17 size: 65536 MB
node 17 free: 64706 MB
node distances:
node   0   1  16  17
  0:  10  20  40  40
  1:  20  10  40  40
 16:  40  40  10  20
 17:  40  40  20  10

> On my system I have only 2 nodes:
> # lscpu
> ...
> NUMA node0 CPU(s): 0,8,16,24,32,40
> NUMA node1 CPU(s): 48,56,64,72,80,88
>
> So I run the test with "numactl --cpunodebind=0 --membind=0".
>
> - The firewall must be stopped in the guest:
>   # systemctl stop firewalld
>
> If these changes don't fix your problem, could you give the exact error
> message you get, please?

OK, I will attach the log to this bug. I can still reproduce similar issues; please reply as soon as you can, because I have to return the hosts by the end of this week. Thanks a lot.
Created attachment 1225769 [details]
LOG
(In reply to dengmin from comment #12)
> Created attachment 1225769 [details]

It seems it waits 30 seconds for the netperf clients to start before exiting with an error:

Wait until all netperf clients start to work (29.588929 secs)
Sending command: cat /tmp/netperf.43237.nf
Sending command: echo $?
Wait until all netperf clients start to work (29.994572 secs)
Sending command: cat /tmp/netperf.43237.nf
Sending command: echo $?
(monitor catch_monitor) Sending command 'info registers' (via Human Monitor)
Test failed: TestNAError: Error, not all netperf clients at work [context: Start netperf testing --> Testing TCP_STREAM protocol --> Start netperf client threads]
Destroying VM virt-tests-vm1 (PID 43417)

Perhaps 30 seconds is not enough on ppc64le for 50 clients: is it possible to increase this timeout to something bigger?

Thanks,
Laurent
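For reference, the bounded wait the log suggests could expose its timeout as a parameter so slower platforms can raise it. A minimal sketch only; the actual knob in the test harness may be named differently:

import time

def wait_until(predicate, timeout=30.0, step=0.5):
    # Poll predicate() until it returns True or `timeout` seconds pass.
    deadline = time.time() + timeout
    while time.time() < deadline:
        if predicate():
            return True
        time.sleep(step)
    return False

# e.g. wait_until(all_clients_up, timeout=100.0) instead of the default ~30 s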
After increasing the timeout to 100 s, all the clients have come up so far. Compared to the x86 platform, it is a somewhat larger value. Any issues, please let me know. Thanks a lot.

protocols = "TCP_STREAM TCP_MAERTS TCP_RR TCP_CRR"
sessions = "1 2 4 8"
sessions_rr = "1 25 50"
sizes = "64 256 1024 4096 16384 65535"
sizes_rr = "64 256 4096"

Thanks
Min
OK, so the problem doesn't seem to be with the guest but with the host, in the NUMA part.

According to the log, the command is:

numactl --cpunodebind=1 --membind=1 /tmp/netperf_agent.py <CPU> \
    /tmp/netperf-2.6.0/src/netperf -D 1 -H 192.168.58.44 -l 90.0 -C -c \
    -t TCP_STREAM -- -m <MB>

The time to start, according to <CPU> and <MB>, is (in secs):

+------+-------+-------+-------+-------+
|MB\CPU|   1   |   2   |   4   |   8   |
+------+-------+-------+-------+-------+
|   64 |  1.62 |  1.62 |  1.62 |  2.83 |
+------+-------+-------+-------+-------+
|  256 |  1.62 |  2.02 |  2.02 |  7.70 |
+------+-------+-------+-------+-------+
| 1024 |  1.62 |  1.83 |  6.48 | 17.47 |
+------+-------+-------+-------+-------+
| 4096 |  2.02 |  4.05 |  9.32 |  > 30 |
+------+-------+-------+-------+-------+

We can see that the startup time really increases with 8 clients.

I think this can be explained because you bind your netperf clients to one node, and there are only 5 cores in that node.

Moreover, you start your guest with 16 cores and 128 GB, and as your host has only 20 cores, some are shared between the netperf clients and the guest.

Are you sure you run this test under the same conditions on x86?
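To check this startup scaling independently of the harness, one could time how long N parallel netperf clients take to produce their first line of output under the same numactl binding. A hedged sketch; the path and server IP are taken from the log above and may need adjusting:

import subprocess
import time

NETPERF = "/tmp/netperf-2.6.0/src/netperf"
SERVER = "192.168.58.44"

def time_client_startup(n_clients, msg_size):
    # Launch n_clients netperf processes bound to node 1 and measure how
    # long it takes until each has printed its first line of output.
    start = time.time()
    procs = []
    for _ in range(n_clients):
        cmd = ["numactl", "--cpunodebind=1", "--membind=1",
               NETPERF, "-D", "1", "-H", SERVER, "-l", "90.0",
               "-C", "-c", "-t", "TCP_STREAM", "--", "-m", str(msg_size)]
        procs.append(subprocess.Popen(cmd, stdout=subprocess.PIPE,
                                      stderr=subprocess.STDOUT))
    for proc in procs:
        proc.stdout.readline()  # blocks until this client says something
    elapsed = time.time() - start
    for proc in procs:
        proc.terminate()
    return elapsed

for n in (1, 2, 4, 8):
    print("%d clients: %.2f s" % (n, time_client_startup(n, 4096)))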
(In reply to Laurent Vivier from comment #15)
> OK, so the problem doesn't seem to be with the guest but with the host, in
> the NUMA part.
> ...
> I think this can be explained because you bind your netperf clients to one
> node, and there are only 5 cores in that node.

Firstly, thanks for your quick reply. I have to say it worked as our scripts were designed; what do you think of our hosts for the performance tests? So far, they are the most appropriate ones QE can reserve now.

> Moreover, you start your guest with 16 cores and 128 GB, and as your host
> has only 20 cores, some are shared between the netperf clients and the
> guest.

I started the test with "--vcpu=8 --mem=4096" this time, not 16 cores and 128 GB.

> Are you sure you run this test under the same conditions on x86?

I'm not in charge of the x86 performance tests, so we had better confirm the details you want with that team before QE gives you an exact answer. We share almost the same scripts to execute our performance tests, but some cfgs are different, I guess. QE will update as soon as possible, or ask them to reply to this bug directly.

Thanks
Min
Hi yama,
Could you please give me a hand and have a look at comment 15? Is the same situation happening on the x86 platform? Many thanks. Any issues, please let me know. Thanks in advance.
Best regards
Min
(In reply to dengmin from comment #17)
> Hi yama,
> Could you please give me a hand and have a look at comment 15? Is the
> same situation happening on the x86 platform?

We have never tested CRR on the x86 platform before. 50-session RR (latency) tests pass even on a small machine (4 cores in one node, two NUMA nodes in total).
(In reply to Quan Wenli from comment #18)
> We have never tested CRR on the x86 platform before. 50-session RR
> (latency) tests pass even on a small machine (4 cores in one node, two
> NUMA nodes in total).

But the question is: how long does it take to start the clients?

On ppc64le we hit the timeout (more than 30 secs to start 8 netperf clients with 4 GB each on a node that has only 5 cores).
Thank you for these details, Quan. I'm going to try to understand what happens on ppc64le.
I'm really not able to reproduce the problem; the 50-client test starts in less than 2 seconds on my system.

Could you try to start the netperf_agent.py command line manually on a host (and netserver in a guest) to see what happens?
Could you try on another host?
(In reply to Laurent Vivier from comment #22)
> I'm really not able to reproduce the problem; the 50-client test starts in
> less than 2 seconds on my system.

Hi Laurent,
I'd like to confirm the issues below with you:

1. About the TCP_CRR parameter: as we all know, it isn't tested on x86 now. Should we still test it or abandon it in the future? What is your opinion?

> Could you try to start the netperf_agent.py command line manually on a host
> (and netserver in a guest) to see what happens?
> Could you try on another host?

2. My host and test environment match your request exactly. Frankly, it looks like the clients can start in a very short time:

# numactl --cpunodebind=0 --membind=0 /tmp/netperf_agent.py 50 /tmp/netperf-2.6.0/src/netperf -D 1 -H 10.19.112.186 -l 90.0 -t TCP_CRR -v 1 -- -r 65535,65535
MIGRATED TCP Connect/Request/Response TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.19.112.186 () port 0 AF_INET : demo
[the same line repeated once per client, 50 times in total]
Interim result: 107.44 Trans/s over 1.005 seconds ending at 1481186250.282
Interim result: 106.22 Trans/s over 1.007 seconds ending at 1481186250.298
...

Any issues, please let me know. Thanks a lot.

Thanks
Min
(In reply to dengmin from comment #23)
> 1. About the TCP_CRR parameter: as we all know, it isn't tested on x86
> now. Should we still test it or abandon it in the future? What is your
> opinion?

My opinion is: the more tests we have, the better...

And the problem is not related specifically to TCP_CRR, as it also happens with TCP_STREAM (see logs in comment #12).

> 2. My host and test environment match your request exactly. Frankly, it
> looks like the clients can start in a very short time:
>
> # numactl --cpunodebind=0 --membind=0 /tmp/netperf_agent.py 50
> /tmp/netperf-2.6.0/src/netperf -D 1 -H 10.19.112.186 -l 90.0 -t TCP_CRR -v 1
> -- -r 65535,65535

IMHO, the problem is not with the netperf test but with the test environment or the test host, as it can happen with different kinds of tests (TCP_RR and TCP_STREAM) and depends on the ratio between the number of cores and the number of clients we start.

How can I reproduce your test environment on my system?
(In reply to Laurent Vivier from comment #24)
> IMHO, the problem is not with the netperf test but with the test
> environment or the test host, as it can happen with different kinds of
> tests (TCP_RR and TCP_STREAM) and depends on the ratio between the number
> of cores and the number of clients we start.
>
> How can I reproduce your test environment on my system?

Do you mean my automation test environment? If so, you need to reserve two hosts with at least two network adapters connected directly. Any issues, feel free to let me know.

Thanks
Min
(In reply to dengmin from comment #25)
> Do you mean my automation test environment? If so, you need to reserve two
> hosts with at least two network adapters connected directly. Any issues,
> feel free to let me know.

As the IP addresses in the log are in 192.168.X.X, I guess the guest and the netperf clients are on the same host, aren't they? So, why two hosts?
How can I install your tool, autotest, on my system?
(In reply to Laurent Vivier from comment #28)
> How can I install your tool, autotest, on my system?

Yes, you can, and I've sent you a mail about this request. Thanks!
The command line is available here:

python ConfigTest.py --testcase=netperf.host_guest --guestname=RHEL.7.3 --platform=ppc64le --imageformat=qcow2 --driveformat=virtio_scsi --nicmodel=virtio_net --display=vnc --verbose=no --clone=yes --nrepeat=5

Tips:
- With --clone=yes, it will install a new OS every time.
- With --nrepeat=5, you can reduce the number of repetitions if necessary.

Any issues, please let me know. Thanks a lot!
I get this error:

BRAddIfError: Can't add interface t1-V6LtFM to bridge switch: There is no bridge in system.

The README mentions a "--bridge" option, but it is not accepted by the script. How can I fix that?
Created attachment 1248991 [details]
Reference file

It looks like there is no bridge on your system, so I've uploaded some useful information for your reference.
The problem is with the test script.

The script reads the netperf client output file and counts the occurrences of the word "MIGRATE":

def all_clients_up():
    try:
        content = ssh_cmd(clients[-1], "cat %s" % fname)
    except:
        content = ""
        return False
    if int(sessions) == len(re.findall("MIGRATE", content)):
        return True
    return False

As we should have one "MIGRATE" word per netperf client, the caller loops while the number of occurrences differs from the number of sessions.

I've manually checked the number of "MIGRATE" occurrences in the file in a case where the test script thinks it failed to start the netperf clients: the count is correct. So I think there is a problem with the function trying to detect the number of clients (it seems "content" is empty in this case, perhaps a bug in the ssh_cmd() function).

I'm closing this BZ, but I think you should re-assign it to KVM_Autotest.
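Given that analysis, a more defensive variant of all_clients_up() would distinguish an ssh_cmd() failure from genuinely missing output and log the observed count, which would make this failure mode visible in the test log. A sketch only; ssh_cmd, clients, fname, and sessions are the harness's own names:

import logging
import re

def all_clients_up():
    try:
        content = ssh_cmd(clients[-1], "cat %s" % fname)
    except Exception as err:
        # An ssh failure is not proof the clients are down; log it instead
        # of silently counting it as "zero clients started".
        logging.warning("ssh_cmd failed while polling %s: %s", fname, err)
        return False
    found = len(re.findall("MIGRATE", content))
    logging.debug("netperf clients up: %d/%s", found, sessions)
    return found == int(sessions)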