Bug 1084284

Summary: [WHQL][netkvm][macvtap]Many jobs [2 machine] failed as Gathering support device failed
Product: Red Hat Enterprise Linux 7 Reporter: Min Deng <mdeng>
Component: virtio-win Assignee: Meirav Dean <mdean>
virtio-win sub component: others QA Contact: Virtualization Bugs <virt-bugs>
Status: CLOSED NOTABUG Docs Contact:
Severity: medium    
Priority: medium CC: acathrow, bcao, dfleytma, hhuang, juzhang, lijin, mdeng, michen, virt-maint, yvugenfi
Version: 7.0   
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2014-06-22 13:35:53 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Min Deng 2014-04-04 05:17:55 UTC
Description of problem:
Many jobs failed because gathering the support device information failed.
Most of the jobs named "*[2 Machine]*" failed, and the cause was the "Gathering support device" failure.

Version-Release number of selected component (if applicable):
qemu-kvm-rhev-1.5.3-53.el7.x86_64
kernel-3.10.0-113.el7.x86_64
build 76

How reproducible:
3/3

Steps to Reproduce:
1. Boot up the two guests, for example win7-32:
N_REPEAT=1
while true;
do date;
echo "test round: $N_REPEAT" ;
N_REPEAT=$(($N_REPEAT+1)) &&
/usr/libexec/qemu-kvm \
-M pc -m 2G -smp 2 \
-cpu Nehalem,+kvm_pv_unhalt,hv_spinlocks=0x1fff,hv_relaxed,hv_vapic \
-usb -device usb-tablet \
-drive file=win7-32-nic1.raw,format=raw,if=none,id=drive-ide0-0-0,werror=stop,rerror=stop,cache=none \
-device ide-drive,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 \
-uuid 4c4ef91c-1b82-49af-a6de-cb06f10f74de  \
-rtc-td-hack -no-kvm-pit-reinjection \
-chardev socket,id=a111,path=/tmp/monitor-w732-nic1,server,nowait -mon chardev=a111,mode=readline \
-name win732-nic1 \
-vnc :1 -vga cirrus \
-monitor stdio \
-netdev tap,id=hostnet0,script=/etc/qemu-ifup -device e1000,netdev=hostnet0,mac=00:32:14:26:10:01,id=net0 \
-device virtio-net-pci,netdev=hostnet1,mac=aa:fc:ff:18:f6:e7,id=vnic0 138<>/dev/tap138 \
-netdev tap,id=hostnet1,vhost=on,fd=138  \
-global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=0
done
N_REPEAT=1
while true;
do date;
echo "test round: $N_REPEAT" ;
N_REPEAT=$(($N_REPEAT+1)) &&
/usr/libexec/qemu-kvm \
-M pc -m 2G -smp 2 \
-cpu Nehalem,+x2apic,hv_spinlocks=0x1fff,hv_relaxed,hv_vapic \
-usb -device usb-tablet \
-drive file=win7-32-nic2.raw,format=raw,if=none,id=drive-ide0-0-0,werror=stop,rerror=stop,cache=none \
-device ide-drive,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 \
-uuid 095b38d6-f5ad-46f7-aaa8-e06be17727dc  \
-rtc-td-hack -no-kvm-pit-reinjection \
-chardev socket,id=a111,path=/tmp/monitor-w732-nic2,server,nowait -mon chardev=a111,mode=readline \
-name win732-nic2 \
-vnc :2 -vga cirrus \
-monitor stdio \
-netdev tap,id=hostnet0,script=/etc/qemu-ifup -device e1000,netdev=hostnet0,mac=00:12:54:36:17:32,id=net0 \
-device virtio-net-pci,netdev=hostnet1,mac=5e:ea:47:6a:d4:bc,id=vnic0 139<>/dev/tap139 \
-netdev tap,id=hostnet1,vhost=on,fd=139 \
-global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=0
done
2. Submit the jobs:
  NDISTest 6.5 - [2 Machine] - AddressChange
  NDISTest 6.5 - [2 Machine] - CheckConnectivity
  NDISTest 6.5 - [2 Machine] - ConfigCheck
  NDISTest 6.5 - [2 Machine] - GlitchFreeDevice
  NDISTest 6.5 - [2 Machine] - HeaderPayloadSplit
  NDISTest 6.5 - [2 Machine] - InterruptModeration
  NDISTest 6.5 - [2 Machine] - InvalidPackets
  NDISTest 6.5 - [2 Machine] - LinkCheck
  NDISTest 6.5 - [2 Machine] - MPE_Ethernet.xml
  NDISTest 6.5 - [2 Machine] - MultiCast Address
  NDISTest 6.5 - [2 Machine] - OffloadChecksum
  NDISTest 6.5 - [2 Machine] - OffloadLSO
  NDISTest 6.5 - [2 Machine] - PacketFilters
  NDISTest 6.5 - [2 Machine] - Reset
  NDISTest 6.5 - [2 Machine] - ShortPackets
  NDISTest 6.5 - [2 Machine] - SingleEtherType
  NDISTest 6.5 - [2 Machine] - Stats
  NDISTest 6.5 - [2 Machine] - TxFlowControl
  NDISTest 6.5 - [2 Machine] - VlanSendRecv
  ...

Actual results:
The above jobs failed on win2k8/win7/win8/win2012/win2012R2
Test Log Report - Failure Report 

Report Summary 
Test Results 
Description                      Total  Pass  Fail  Warning  Blocked  Skipped  Pass Rate
Direct count of EndTest results  20     18    2     0        0        0        90.00%
 TimeStamp Total Pass Fail Warning Blocked Skipped Pass Rate 
 
Machine, Process, and OS Information 
Machine Name OS Version Build VBL BuildDate Platform Language ServicePack Config 
 (No Machine Information Trace Available) 
Base Time                 Process ID  Thread ID  Process Name
4/2/2014 11:20:30.640 PM  3632        2848       C:\WLK\JobsWorkingDir\Tasks\WTTJobRunB808926B-29D7-428A-ACCF-1BC2554EBC7B\ndistest.net\ndistest.exe
4/2/2014 11:20:34.390 PM  3632        3188       C:\WLK\JobsWorkingDir\Tasks\WTTJobRunB808926B-29D7-428A-ACCF-1BC2554EBC7B\ndistest.net\ndistest.exe
 
Report Details 
Test Cases 
Title Result 
 Failed 
 Start Test 4/2/2014 11:20:39.390 PM Gathering support device #1 information 
Error 4/2/2014 11:22:39.390 PM Test terminated abnormally with an Exception NDISTest.NDISTestCore.TestServices.NDISTestException 
File:    Line: 0 
Error Type:   WIN32 
Error Code:   0x88888 
Error Text:   Error 0x00088888 
End Test 4/2/2014 11:22:40.390 PM Gathering support device #1 information 
Result:   Fail 
Repro:   C:\WLK\JobsWorkingDir\Tasks\WTTJobRunB808926B-29D7-428A-ACCF-1BC2554EBC7B\ndistest.net\ndistest.exe /logo /auto /client /target:Miniport /tc:PCI\VEN_1AF4&DEV_1000&SUBSYS_00011AF4&REV_00\3&13C0B0C5&0&20 /support:{601A687E-C3B1-459B-BD4B-BAC2E7BFCF17} /msg:{4532FD2C-AC70-4814-8A4A-19A377FB0D35} /jobs:lan\SingleEtherType.cpp 
 
 Failed 
 Start Test 4/2/2014 11:22:44.390 PM Create first binding on Support Device 
Error 4/2/2014 11:24:44.390 PM Test terminated abnormally with an Exception NDISTest.NDISTestCore.TestServices.NDISTestException 
File:    Line: 0 
Error Type:   WIN32 
Error Code:   0x88888 
Error Text:   Error 0x00088888 
End Test 4/2/2014 11:24:45.390 PM Create first binding on Support Device 
Result:   Fail 
Repro:   C:\WLK\JobsWorkingDir\Tasks\WTTJobRunB808926B-29D7-428A-ACCF-1BC2554EBC7B\ndistest.net\ndistest.exe /logo /auto /client /target:Miniport /tc:PCI\VEN_1AF4&DEV_1000&SUBSYS_00011AF4&REV_00\3&13C0B0C5&0&20 /support:{601A687E-C3B1-459B-BD4B-BAC2E7BFCF17} /msg:{4532FD2C-AC70-4814-8A4A-19A377FB0D35} /jobs:lan\SingleEtherType.cpp 
 

Expected results:
Those jobs should pass successfully.
Additional info:
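The command lines in step 1 hand QEMU file descriptors 138 and 139, opened on /dev/tap138 and /dev/tap139. A macvtap interface exposes a character device named /dev/tapN, where N is the interface's ifindex. The sketch below shows one way such a device could be created and its device node located. The parent interface (eth0), the interface name (macvtap0), and bridge mode are assumptions, since the report does not state them, and the helper names are hypothetical. Giving the macvtap interface the same MAC as the guest NIC is the usual practice but is likewise assumed here.

```shell
#!/bin/sh
# Sketch only: parent interface, device name and macvtap mode are assumptions.

# Map an interface index to the macvtap character device node.
tap_node_for_ifindex() {
    printf '/dev/tap%s\n' "$1"
}

# Look up the device node for a named macvtap interface via sysfs.
# SYSFS_NET can be overridden so the lookup can be tried without a real device.
tap_node_for_iface() {
    idx_file="${SYSFS_NET:-/sys/class/net}/$1/ifindex"
    [ -r "$idx_file" ] || { echo "no such interface: $1" >&2; return 1; }
    tap_node_for_ifindex "$(cat "$idx_file")"
}

# Print (do not run) the commands that would create a macvtap device.
# Arguments: parent interface, new device name, MAC address of the guest NIC.
print_macvtap_setup() {
    echo "ip link add link $1 name $2 type macvtap mode bridge"
    echo "ip link set dev $2 address $3"
    echo "ip link set dev $2 up"
}
```

For example, `print_macvtap_setup eth0 macvtap0 aa:fc:ff:18:f6:e7` prints the commands for the first guest's virtio NIC without running them, and `tap_node_for_iface macvtap0` prints the node to open for the `fd=` option.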

Comment 1 Min Deng 2014-04-04 05:25:21 UTC
Created attachment 882518
win7-32

Comment 3 Min Deng 2014-04-04 09:52:25 UTC
The following jobs were executed on win2k8-64/32 guests by DTM, and a similar issue could be reproduced there as well. The cpk file will be uploaded too.
 Ethernet-NDISTest6.5
 Ethernet-NDISTest6.0
 Ethernet-NDISTest6.5(manual)
 Ethernet-NDISTest6.5(Wol and PM)

Comment 4 Min Deng 2014-04-04 09:56:07 UTC
Created attachment 882639
win2k8-64 cpk

Comment 5 Yvugenfi@redhat.com 2014-04-06 12:36:31 UTC
Looks like some basic problem in communication between server and client of NDISTest.

Can you ping from one test VM to another?

I would use CheckConnectivity test as the indication that the setup in general is OK.

Best regards,
Yan.

Comment 6 Min Deng 2014-04-08 02:12:14 UTC
(In reply to Yan Vugenfirer from comment #5)
> Looks like some basic problem in communication between server and client of
> NDISTest.
> 
> Can you ping from one test VM to another?
> 
> I would use CheckConnectivity test as the indication that the setup in
> general is OK.
> 
> Best regards,
> Yan.

Yes, they could ping each other successfully.

Comment 7 Mike Cao 2014-04-08 03:02:36 UTC
(In reply to dengmin from comment #6)
>   Yes,they could ping each other successfully.

They can ping each other because of the NIC used as the Message Device. Please disable the Message NIC (it should be the e1000/rtl8139 one) in both the SUT and support guests, then recheck.


Mike
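
As a host-side alternative to disabling the adapter inside Windows, the message NIC's link can be forced down through the QEMU human monitor with set_link. This is a different technique from the one requested above and is only a sketch: it assumes socat is installed on the host, it uses the monitor socket path and the e1000 device id (net0) from the command lines in the description, and the helper names are hypothetical.

```shell
#!/bin/sh
# Sketch only: forces the link down from the host instead of disabling
# the adapter inside the guest.

# Build the HMP command that sets a guest NIC's link state.
# Arguments: device id, "on" or "off".
hmp_set_link_cmd() {
    case "$2" in
        on|off) printf 'set_link %s %s\n' "$1" "$2" ;;
        *) echo "state must be on or off" >&2; return 1 ;;
    esac
}

# Send one HMP command to a monitor UNIX socket (needs socat on the host).
# Arguments: socket path, then the command text.
hmp_send() {
    sock="$1"
    shift
    printf '%s\n' "$*" | socat - "UNIX-CONNECT:$sock"
}
```

For the first guest this would be `hmp_send /tmp/monitor-w732-nic1 "$(hmp_set_link_cmd net0 off)"`, followed by the ping check over the virtio NIC, and the same command with `on` to restore the link afterwards.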

Comment 8 Min Deng 2014-04-08 03:31:04 UTC
(In reply to Mike Cao from comment #7)
> I can ping each other due to the the Nic used as Messgage Device .Pls
> disable Message Nic in the guest (should be e1000/rtl8139) on sut and
> support guests and recheck 
> 
> 
> Mike

Still, they could ping each other successfully.

Comment 9 Yvugenfi@redhat.com 2014-04-16 09:17:33 UTC
Did you try this command: ip link set dev macvtap0 allmulticast on ?

It looks like NDISTest server cannot find its clients.
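
A minimal sketch of applying and verifying the setting suggested above on the host. The device name is passed in by the caller (macvtap0 is only the name used in the command above), the function names are hypothetical, and the commands need root. With allmulticast on, the interface accepts all multicast frames rather than only the groups it has subscribed to, which may be why the NDISTest server could then find its clients; the report does not confirm the mechanism.

```shell
#!/bin/sh
# Sketch only: run as root on the host; device names come from the caller.

# Succeed if the flags field printed by "ip link show" contains ALLMULTI.
has_allmulti() {
    case "$1" in
        *"<"*ALLMULTI*">"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Enable allmulticast on one device, then confirm the flag is reported.
enable_allmulti() {
    ip link set dev "$1" allmulticast on || return 1
    has_allmulti "$(ip -o link show dev "$1")"
}

# Apply the setting to every device named on the command line.
enable_allmulti_each() {
    for d in "$@"; do
        if enable_allmulti "$d"; then
            echo "$d: allmulticast on"
        else
            echo "$d: failed to enable allmulticast" >&2
        fi
    done
}
```

With one macvtap device per guest, `enable_allmulti_each macvtap0 macvtap1` would cover both test machines; the second device name is an assumption.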

Comment 10 Min Deng 2014-04-23 06:36:15 UTC
Hi Yan,
   I reran the related tests after applying the command from comment 9. The original issue is gone, but several jobs still fail with different error messages. Could you please take a look at them before I open new bugs, in case of any misunderstanding? Thanks in advance.
Best Regards,
Min

Comment 11 Min Deng 2014-04-23 06:39:32 UTC
Created attachment 888756
2012HCKLOG

Comment 13 Yvugenfi@redhat.com 2014-05-04 15:13:47 UTC
(In reply to dengmin from comment #10)
> Hi Yan,
>    I did the related testing after finishing comment9,the original issue has
> gone but several jobs still failed with different error message,could you
> please help me have a look on them before I open new bugs in case of any
> misunderstanding.Thanks in advance.
> Best Regards,
> Min

Just to be sure: are you testing on top of macvtap now?
I ask because I see issues we have not encountered on a bridge.

In any case, I see the following test failures:

1. Test "2c_mini6rsssendrecv" - out of order buffers and dropped indications.

2. Test "Packet Filters" - too many packets are filtered out.

3. Test "stats" - 24 total breakpoints were hit in the protocol driver while this test was executing. Could be related to previous issues. 

4. Test "address change" - after the MAC address change the device does not receive the packets addressed to it.

5. Test "reset" - can be a duplicate of the out of order issue and dropped packets.

6. Test "glitch free" - same as above: out of order and dropped packets.

I think there are two main issues here:

1. Out of order packets (btw: how are the VMs connected to each other with macvtap?).

2. Dropped packets 


Please close this bug and open appropriate bugs.

Thanks,
Yan.
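
One way to check for dropped packets on the host side is to compare the interface drop counters in sysfs before and after a test run. This is a sketch under assumptions: the interface name is supplied by the caller, the helper names are hypothetical, and the counters only cover drops seen by the host interface, not drops inside the guest driver and not packet reordering.

```shell
#!/bin/sh
# Sketch only: reads host-side counters; it cannot see drops inside the guest.

# Print the drop counters for one interface from sysfs.
# SYSFS_NET can be overridden so the function can be tried without a real device.
drop_counters() {
    stats="${SYSFS_NET:-/sys/class/net}/$1/statistics"
    [ -d "$stats" ] || { echo "no statistics for $1" >&2; return 1; }
    printf '%s rx_dropped=%s tx_dropped=%s\n' "$1" \
        "$(cat "$stats/rx_dropped")" "$(cat "$stats/tx_dropped")"
}

# Difference between two counter readings: before, then after.
counter_delta() {
    echo $(( $2 - $1 ))
}
```

Calling `drop_counters macvtap0` before and after a failing job and feeding the two rx_dropped values to `counter_delta` shows whether the host interface dropped frames during that run.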

Comment 14 Yvugenfi@redhat.com 2014-06-22 09:29:24 UTC
Please check comment #13

Comment 15 Mike Cao 2014-06-22 13:35:53 UTC
(In reply to Yan Vugenfirer from comment #14)
> Please check comment #13

I agree this is a configuration issue rather than a bug.