Bug 1313241

Summary: [RHEL-6.8] Determining IP information for ocrdma_roce.43... failed
Product: Red Hat Enterprise Linux 6
Component: libocrdma
Version: 6.8
Status: CLOSED WONTFIX
Severity: medium
Priority: unspecified
Reporter: zguo <zguo>
Assignee: Jarod Wilson <jarod>
QA Contact: Zhang Yi <yizhan>
CC: ddutile, honli, jshortt, mschmidt, mstowell
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Type: Bug
Last Closed: 2017-12-06 12:05:47 UTC
Attachments: Log files

Description zguo 2016-03-01 08:51:48 UTC
Created attachment 1131821 [details]
Log files

Description of problem:


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. https://beaker.engineering.redhat.com/jobs/1243679
2. http://lab-02.rhts.eng.bos.redhat.com/beaker/logs/recipes/2525+/2525607/console.log

Found below FAILED info in both server and client console.log.

Bringing up interface ocrdma_roce.43:  802.1Q VLAN Support v1.8 Ben Greear <greearb>
All bugs added by David S. Miller <davem>
8021q: adding VLAN 0 to HW filter on device ocrdma_roce

Determining IP information for ocrdma_roce.43... failed.
[FAILED]
Bringing up interface ocrdma_roce.45:
Determining IP information for ocrdma_roce.45... failed.
[FAILED]

[root@rdma-qe-04 ~]$ ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: ocrdma_roce: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP qlen 1000
    link/ether 00:90:fa:29:48:6a brd ff:ff:ff:ff:ff:ff
    inet 172.31.40.4/24 brd 172.31.40.255 scope global ocrdma_roce
    inet6 fe80::290:faff:fe29:486a/64 scope link 
       valid_lft forever preferred_lft forever
3: ocrdma_10g_2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether 00:90:fa:29:48:72 brd ff:ff:ff:ff:ff:ff
4: tg3_1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 9c:b6:54:bb:48:84 brd ff:ff:ff:ff:ff:ff
    inet 10.16.41.226/21 brd 10.16.47.255 scope global tg3_1
    inet6 2620:52:0:102f:9eb6:54ff:febb:4884/64 scope global dynamic 
       valid_lft 2591890sec preferred_lft 604690sec
    inet6 fe80::9eb6:54ff:febb:4884/64 scope link 
       valid_lft forever preferred_lft forever
5: tg3_2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether 9c:b6:54:bb:48:85 brd ff:ff:ff:ff:ff:ff
6: ocrdma_roce.43@ocrdma_roce: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UP 
    link/ether 00:90:fa:29:48:6a brd ff:ff:ff:ff:ff:ff
    inet6 fe80::290:faff:fe29:486a/64 scope link 
       valid_lft forever preferred_lft forever
7: ocrdma_roce.45@ocrdma_roce: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UP 
    link/ether 00:90:fa:29:48:6a brd ff:ff:ff:ff:ff:ff
    inet6 fe80::290:faff:fe29:486a/64 scope link 
       valid_lft forever preferred_lft forever

Actual results:


Expected results:


Additional info:

Comment 1 zguo 2016-03-01 08:54:12 UTC
Hi developers,

Would you like to help take a look?

Thanks.

Comment 2 Don Dutile (Red Hat) 2016-03-01 13:55:54 UTC
You attached an rds-tools-qib.log file.
I checked the beaker log file for relevant errors -- I see the ocrdma_roce failures, but I also see rds stack traces in the log as well.

Could you (re-)test with the -616 kernel? Four patches were added to -617 for ocrdma, and I just want to rule out those four patches as the source of the errors.

Also, did this test pass in 6.7?

Comment 3 Honggang LI 2016-03-01 17:29:13 UTC
I had already told all of you about this issue.

The DHCP/VLAN/REORDER_HDR combination is the culprit.

1) comment VLAN_ID=43: it takes more than 80 seconds for roce.43
to get a DHCP IP, and only 4 out of 5 attempts succeed.

2) comment REORDER_HDR=0: the first round of "ifup ocrdma_roce.43" got an
IP in 5 seconds. However, the second round hung and never exited; Ctrl+C
killed it after 13 minutes. If you ifdown/ifup the VLAN interface in a
for loop, DHCP will not work at all, with various error messages.

So, the temporary workaround is to comment REORDER_HDR=0, and never
abuse the VLAN interface with repeated ifup/ifdown.
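
The settings Honggang is experimenting with live in the per-interface ifcfg file. A minimal sketch of what such a file might contain (the device name and VLAN ID come from this report; the other fields and exact contents are assumptions, since the real file from the test system is not attached):

```shell
# Hypothetical /etc/sysconfig/network-scripts/ifcfg-ocrdma_roce.43
# (sketch only, not the actual file from rdma-qe-04).
DEVICE=ocrdma_roce.43
ONBOOT=yes
BOOTPROTO=dhcp      # DHCP is what times out in the console log above
VLAN=yes
VLAN_ID=43          # line experimented with in item 1 of comment 3
REORDER_HDR=0       # line at the center of the DHCP hang in item 2
```

ifcfg files are shell-style variable assignments sourced by the network scripts, which is why commenting or uncommenting a single line changes the interface's behavior on the next ifup.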

Comment 7 zguo 2016-03-24 07:21:37 UTC
(In reply to Honggang LI from comment #3)
> I had already told all of you about this issue.
> 
> The DHCP/VLAN/REORDER_HDR combination is the culprit.
> 
> 1) comment VLAN_ID=43: it takes more than 80 seconds for roce.43
> to get a DHCP IP, and only 4 out of 5 attempts succeed.
> 
> 2) comment REORDER_HDR=0: the first round of "ifup ocrdma_roce.43" got an
> IP in 5 seconds. However, the second round hung and never exited; Ctrl+C
> killed it after 13 minutes. If you ifdown/ifup the VLAN interface in a
> for loop, DHCP will not work at all, with various error messages.
> 
> So, the temporary workaround is to comment REORDER_HDR=0, and never
> abuse the VLAN interface with repeated ifup/ifdown.

I dug out the email Honggang sent before. We need to "uncomment REORDER_HDR=0" and "uncomment VLAN_ID=43" in /etc/sysconfig/network-scripts/ifcfg-ocrdma_roce.XX. I just tried this workaround and it works.
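
A quick way to confirm the workaround is in place is to check that both lines are present and uncommented in the ifcfg file. A hedged sketch (the helper name and the sample file are illustrative, not part of the original report):

```shell
# Check that REORDER_HDR=0 and a VLAN_ID line are active (uncommented)
# in a given ifcfg file. Hypothetical helper, for illustration only.
check_vlan_workaround() {
    local cfg="$1"
    if grep -q '^REORDER_HDR=0' "$cfg" && grep -q '^VLAN_ID=' "$cfg"; then
        echo "workaround active"
    else
        echo "workaround missing"
    fi
}

# Demo against a temporary file; the real file would live under
# /etc/sysconfig/network-scripts/.
tmp=$(mktemp)
printf 'DEVICE=ocrdma_roce.43\nVLAN=yes\nVLAN_ID=43\nREORDER_HDR=0\n' > "$tmp"
check_vlan_workaround "$tmp"   # prints "workaround active"
rm -f "$tmp"
```

A commented-out line (e.g. `#REORDER_HDR=0`) fails the anchored `^` match, so the check distinguishes the broken state from the working one.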

Comment 9 Jan Kurik 2017-12-06 12:05:47 UTC
Red Hat Enterprise Linux 6 is in the Production 3 Phase. During the Production 3 Phase, Critical impact Security Advisories (RHSAs) and selected Urgent Priority Bug Fix Advisories (RHBAs) may be released as they become available.

The official life cycle policy can be reviewed here:

http://redhat.com/rhel/lifecycle

This issue does not meet the inclusion criteria for the Production 3 Phase and will be marked as CLOSED/WONTFIX. If this remains a critical requirement, please contact Red Hat Customer Support to request a re-evaluation of the issue, citing a clear business justification. Note that a strong business justification will be required for re-evaluation. Red Hat Customer Support can be contacted via the Red Hat Customer Portal at the following URL:

https://access.redhat.com/