Bug 857294 - Infiniband cards cause host activate/add to fail
Summary: Infiniband cards cause host activate/add to fail
Keywords:
Status: CLOSED DUPLICATE of bug 823397
Alias: None
Product: oVirt
Classification: Retired
Component: ovirt-engine-core
Version: unspecified
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Nobody's working on this, feel free to take it
QA Contact:
URL:
Whiteboard: network
Depends On:
Blocks:
Reported: 2012-09-14 03:52 UTC by DHC
Modified: 2012-10-19 21:14 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-09-23 19:34:43 UTC
oVirt Team: ---
Embargoed:


Attachments
ovirt-engine logfile (15.55 KB, application/octet-stream)
2012-09-14 03:52 UTC, DHC
vdsm log file (20.08 KB, application/octet-stream)
2012-09-14 03:52 UTC, DHC
vdsClient 0 getVdsStats output (4.31 KB, text/plain)
2012-10-09 17:27 UTC, DHC
ib0 rx_bytes stats 5 second samples (286 bytes, text/plain)
2012-10-12 14:47 UTC, DHC
vdsClient 0 getVdsStats output 5 second samples (97.91 KB, text/plain)
2012-10-12 14:48 UTC, DHC

Description DHC 2012-09-14 03:52:39 UTC
Created attachment 612713 [details]
ovirt-engine logfile

Description of problem:
Attempting to activate/add a host with Infiniband cards present fails.
I am filing this bug against ovirt-engine-core, but it may involve VDSM as well.
The "hwaddr" returned for the Infiniband card appears to be too long for the engine DB column, which causes the SQL insert statement to fail (ERROR: value too long for type character varying(20)).
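
For context, here is a minimal sketch of why the address would overflow the column. It assumes the hwaddr is reported as colon-separated hex pairs and that an IPoIB link-layer address is 20 octets (per RFC 4391), versus 6 for Ethernet. The function and constant names are illustrative only and are not taken from the engine or VDSM code.

```python
ETHERNET_ADDR_OCTETS = 6   # standard Ethernet MAC address
IPOIB_ADDR_OCTETS = 20     # IPoIB link-layer address (assumption: RFC 4391 format)
COLUMN_LIMIT = 20          # varchar(20), taken from the error message in this report


def formatted_length(octets):
    """Length of an address printed as colon-separated hex pairs (aa:bb:...)."""
    # Two hex digits per octet plus one colon between each pair of octets.
    return octets * 2 + (octets - 1)


def fits_column(octets, limit=COLUMN_LIMIT):
    """True if the formatted address fits in a column of the given width."""
    return formatted_length(octets) <= limit


print(formatted_length(ETHERNET_ADDR_OCTETS), fits_column(ETHERNET_ADDR_OCTETS))
print(formatted_length(IPOIB_ADDR_OCTETS), fits_column(IPOIB_ADDR_OCTETS))
```

Under these assumptions an Ethernet address needs 17 characters and fits, while an IPoIB address needs 59 characters and does not.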

Version-Release number of selected component (if applicable):
oVirt Engine 3.1+
VDSM 4.10+

How reproducible: Always


Steps to Reproduce:
1. Install infiniband card in EL6 or Fedora based ovirt VM host server
2. Install and start rdma service
3. Attempt to add host via vdsm-bootstrap
OR
3.a Add host with vdsm-reg then attempt approve/activate host
  
Actual results:
Host activation/add fails.

Expected results:
Host activation/add succeeds.

Additional info: Disabling the rdma service (i.e., devices ib0, ib1, etc. go away) on the affected system restores normal function. This bug prevents the use of Infiniband networking (IPoIB) or NFS-RDMA for storage communications.

Comment 1 DHC 2012-09-14 03:52:59 UTC
Created attachment 612714 [details]
vdsm log file

Comment 2 DHC 2012-09-20 21:18:07 UTC
To fix/work around this:
I ended up stopping the engine service, dumping the database, and altering
the table vds_interface --> column "mac_addr", increasing the character
varying length from 20 to 60.
I then restored the altered database and went about business as usual.

The only side effect I have noted from this is that VDSM does not seem to be able to read/report TX/RX stats properly.
Also, the MAC Address fields in the admin/user portals seem to be of fixed size, and the IB card HW address overflows the field.

- DHC

Comment 3 itzikb 2012-09-21 14:08:39 UTC
Hi,
There is an open bug:
https://bugzilla.redhat.com/show_bug.cgi?id=823397

Itzik

Comment 4 Moti Asayag 2012-09-23 19:34:43 UTC

*** This bug has been marked as a duplicate of bug 823397 ***

Comment 5 DHC 2012-10-06 02:49:07 UTC
Just built and tested ovirt-engine from master commit: http://gerrit.ovirt.org/gitweb?p=ovirt-engine.git;a=commit;h=8c3f5e5ba95ca46009b70143daa5aae4513943a5

I can confirm that this resolves this issue.

The only remaining things to note, which are minor at best, are that the MAC field in the UI does not expand to show the full HCA IB HW address, and that interface statistics for the IB boards do not seem to be displayed.

- DHC

Comment 6 Dan Kenigsberg 2012-10-09 08:15:00 UTC
DHC, does the kernel update /sys/class/net/ib0/statistics/rx_bytes ?

Does vdsm report your IB board in

 vdsClient 0 getVdsStats

? What are the values reported there?

Comment 7 DHC 2012-10-09 17:27:04 UTC
Created attachment 624202 [details]
vdsClient 0 getVdsStats output

Attached output for vdsClient 0 getVdsStats

Comment 8 DHC 2012-10-09 17:30:59 UTC
I have now tested with newer versions of Firefox on Windows and Fedora/EL, and the hover-over on the IB address field does display the whole address. IE9 also seems to work, but IE8 does not always work (it fails seemingly at random).

Comment 9 Dan Kenigsberg 2012-10-09 22:02:50 UTC
Thanks.

'ib0': {'macAddr': '', 'name': 'ib0', 'txDropped': '12', 'rxErrors': '0', 'txRate': '0.1', 'rxRate': '0.1', 'txErrors': '0', 'state': 'up', 'speed': '1000', 'rxDropped': '0'}

Could it be that 0.1 * 1000 mbps is the true value? Could you push some more traffic via ib0? How does /sys/class/net/ib0/statistics/rx_bytes grow?

Comment 10 DHC 2012-10-10 22:11:02 UTC
The max data rate per port on the IB board in that server is 10GB, so 1000 MB/s, since the board is actually an older DDR 4x board.

On that system the datastores are mounted via NFS over ib0.
/sys/class/net/ib0/statistics/rx_bytes looks like:

[root@kezan ~]# cat /sys/class/net/ib0/statistics/rx_bytes
212110936952

Comment 11 Dan Kenigsberg 2012-10-11 08:38:30 UTC
Would you stress your link and run
# cat /sys/class/net/ib0/statistics/rx_bytes; sleep 10; cat /sys/class/net/ib0/statistics/rx_bytes; vdsClient 0 getVdsStats

This would give us a better comparison of what Vdsm reads and what it reports.
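
The same two-sample comparison could be scripted. The following is a rough sketch, not VDSM's own sampling code. The helper names are made up for illustration, and it assumes the counter does not wrap or reset between the two reads.

```python
import time


def read_counter(path):
    """Read an integer counter such as .../statistics/rx_bytes."""
    with open(path) as f:
        return int(f.read().strip())


def rate_mbps(bytes_before, bytes_after, interval_s):
    """Average rate in megabits per second between two counter samples."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    # bytes -> bits, then per second, then per million bits
    return (bytes_after - bytes_before) * 8 / interval_s / 1_000_000


def sample_rate(path, interval_s=10):
    """Sample the counter twice, interval_s seconds apart, and return Mbps."""
    before = read_counter(path)
    time.sleep(interval_s)
    after = read_counter(path)
    return rate_mbps(before, after, interval_s)
```

For example, sample_rate("/sys/class/net/ib0/statistics/rx_bytes") run while the link is under load gives a figure that can be set against the rxRate and speed values from getVdsStats.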

Comment 12 DHC 2012-10-12 14:47:06 UTC
Created attachment 626036 [details]
ib0 rx_bytes stats 5 second samples

ib0 rx_bytes stats sampled every 5 seconds

Comment 13 DHC 2012-10-12 14:48:07 UTC
Created attachment 626037 [details]
vdsClient 0 getVdsStats output 5 second samples

vdsClient 0 getVdsStats output sampled every 5 seconds

Comment 14 Dan Kenigsberg 2012-10-13 22:18:45 UTC
DHC, are you sure there is a problem? I see that in 110 seconds, 226690 bytes have been received on ib0. That's 1.8 mbs, which is in the neighborhood of what vdsm reports. Am I missing something?

Comment 15 DHC 2012-10-19 21:14:07 UTC
Dan,
I too note that vdsm is reporting correctly; however, what I see in the UI is:
Name  Address      MAC                Speed (Mbps)  Rx (Mbps)  Tx (Mbps)  Drops (Pkts)
ib0   192.168.1.1  80:00:04:04:fe...  0             < 1        < 1        21

Neither RX nor TX ever seems to report anything but "< 1", even when I am pushing a large amount of data through the link.

- DHC

