Bug 857294 - Infiniband cards cause host activate/add to fail
Status: CLOSED DUPLICATE of bug 823397
Product: oVirt
Classification: Community
Component: ovirt-engine-core
Hardware: x86_64 Linux
Priority: unspecified  Severity: high
Assigned To: Nobody's working on this, feel free to take it
Depends On:
Reported: 2012-09-13 23:52 EDT by DHC
Modified: 2012-10-19 17:14 EDT
CC List: 7 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Last Closed: 2012-09-23 15:34:43 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Attachments
ovirt-engine logfile (15.55 KB, application/octet-stream)
2012-09-13 23:52 EDT, DHC
no flags
vdsm log file (20.08 KB, application/octet-stream)
2012-09-13 23:52 EDT, DHC
no flags
vdsClient 0 getVdsStats output (4.31 KB, text/plain)
2012-10-09 13:27 EDT, DHC
no flags
ib0 rx_bytes stats 5 second samples (286 bytes, text/plain)
2012-10-12 10:47 EDT, DHC
no flags
vdsClient 0 getVdsStats output 5 second samples (97.91 KB, text/plain)
2012-10-12 10:48 EDT, DHC
no flags

Description DHC 2012-09-13 23:52:39 EDT
Created attachment 612713
ovirt-engine logfile

Description of problem:
Attempting to activate/add a host with Infiniband cards present fails.
I am filing this bug against ovirt-engine-core, but VDSM may be involved as well.
The length of the "hwaddr" returned for the Infiniband card seems to cause the engine DB SQL insert statement to fail (ERROR: value too long for type character
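For illustration, the arithmetic below is a minimal sketch assuming the standard Linux address sizes (6 bytes for an Ethernet MAC, 20 bytes for an IPoIB hardware address) and the 20-character mac_addr column mentioned in Comment 2. The constant and function names are hypothetical; they only show why a colon-separated IPoIB address cannot fit where an Ethernet MAC does.

```python
# Minimal sketch (assumptions): 6-byte Ethernet MAC, 20-byte IPoIB hardware
# address, and a 20-character mac_addr column as reported in Comment 2.
ETHERNET_ADDR_BYTES = 6    # hypothetical name; standard Ethernet MAC size
IPOIB_ADDR_BYTES = 20      # hypothetical name; IPoIB hardware address size on Linux
MAC_ADDR_COLUMN_LEN = 20   # varchar length before the Comment 2 workaround


def formatted_length(num_bytes: int) -> int:
    """Characters needed for 'xx:xx:...': two hex digits per byte plus separators."""
    return 2 * num_bytes + (num_bytes - 1)


def format_address(raw: bytes) -> str:
    """Render raw address bytes as colon-separated lowercase hex."""
    return ":".join(f"{b:02x}" for b in raw)


for label, size in (("Ethernet", ETHERNET_ADDR_BYTES), ("IPoIB", IPOIB_ADDR_BYTES)):
    text = format_address(bytes(size))  # all-zero placeholder address of that size
    fits = len(text) <= MAC_ADDR_COLUMN_LEN
    print(f"{label}: {len(text)} chars, fits varchar({MAC_ADDR_COLUMN_LEN}): {fits}")
```

Under these assumptions an Ethernet MAC needs 17 characters and fits, while an IPoIB address needs 59 and does not, which is also consistent with 60 being enough in the workaround from Comment 2.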

Version-Release number of selected component (if applicable):
oVirt Engine 3.1+
VDSM 4.10+

How reproducible: Always

Steps to Reproduce:
1. Install an Infiniband card in an EL6- or Fedora-based oVirt VM host server.
2. Install and start the rdma service.
3. Attempt to add the host via vdsm-bootstrap.
3a. Alternatively, add the host with vdsm-reg, then attempt to approve/activate it.

Actual results:
Host activation/add fails.

Expected results:
Host activation/add succeeds.

Additional info: Disabling the rdma service on the affected system (so that devices ib0, ib1, etc. go away) restores normal function. This bug prevents the use of Infiniband networking (IPoIB) or NFS-RDMA for storage communications.
Comment 1 DHC 2012-09-13 23:52:59 EDT
Created attachment 612714
vdsm log file
Comment 2 DHC 2012-09-20 17:18:07 EDT
To fix/work around this:
I stopped the engine service, dumped the database, and altered the table
vds_interface, increasing the character varying length of column "mac_addr"
from 20 to 60.
I then restored the altered database and went about business as usual.
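Before applying a schema workaround like this, it may help to confirm which interfaces on a host actually report an address longer than the column allows. The sketch below assumes addresses are read from /sys/class/net/<iface>/address (the standard Linux sysfs location); the function names are hypothetical and the 20/60 limits are simply the values from this comment, not taken from oVirt code.

```python
import os

OLD_LIMIT = 20  # mac_addr length before the workaround (this comment)
NEW_LIMIT = 60  # mac_addr length after the workaround (this comment)


def read_addresses(sysfs_root: str = "/sys/class/net") -> dict:
    """Map interface name -> hardware address string as exposed by sysfs."""
    addresses = {}
    for iface in sorted(os.listdir(sysfs_root)):
        try:
            with open(os.path.join(sysfs_root, iface, "address")) as f:
                addresses[iface] = f.read().strip()
        except OSError:
            continue  # skip entries without a readable address file
    return addresses


def find_overlong(addresses: dict, limit: int) -> dict:
    """Return only the interfaces whose address is longer than `limit` characters."""
    return {name: addr for name, addr in addresses.items() if len(addr) > limit}
```

On an affected host, find_overlong(read_addresses(), OLD_LIMIT) would be expected to list the ib* interfaces, while the same call with NEW_LIMIT should come back empty.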

The only side effects I notice are that VDSM does not seem to be able to read/report TX/RX stats properly,
and that the MAC Address fields in the admin/user portals seem to be of fixed size, so the IB card's hardware address overflows the field.

Comment 3 itzikb 2012-09-21 10:08:39 EDT
There is an open bug:

Comment 4 Moti Asayag 2012-09-23 15:34:43 EDT

*** This bug has been marked as a duplicate of bug 823397 ***
Comment 5 DHC 2012-10-05 22:49:07 EDT
Just built and tested ovirt-engine from master commit: http://gerrit.ovirt.org/gitweb?p=ovirt-engine.git;a=commit;h=8c3f5e5ba95ca46009b70143daa5aae4513943a5

I can confirm that this resolves this issue.

The only remaining things to note, both minor, are that the MAC field in the UI does not expand to show the full HCA IB hardware address, and that interface statistics for the IB boards do not seem to be displayed.

Comment 6 Dan Kenigsberg 2012-10-09 04:15:00 EDT
DHC, does the kernel update /sys/class/net/ib0/statistics/rx_bytes ?

Does vdsm report your IB board in

 vdsClient 0 getVdsStats

? What are the values reported there?
Comment 7 DHC 2012-10-09 13:27:04 EDT
Created attachment 624202
vdsClient 0 getVdsStats output

Attached output for vdsClient 0 getVdsStats
Comment 8 DHC 2012-10-09 13:30:59 EDT
I have now tested with newer versions of Firefox on Windows and Fedora/EL, and hovering over the IB address field does display the whole address. IE9 also seems to work, but IE8 for whatever reason works only intermittently (very random).
Comment 9 Dan Kenigsberg 2012-10-09 18:02:50 EDT

'ib0': {'macAddr': '', 'name': 'ib0', 'txDropped': '12', 'rxErrors': '0', 'txRate': '0.1', 'rxRate': '0.1', 'txErrors': '0', 'state': 'up', 'speed': '1000', 'rxDropped': '0'}

Could it be that 0.1 * 1000 Mbps is the true value? Could you push some more traffic via ib0? How does /sys/class/net/ib0/statistics/rx_bytes grow?
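To make the question concrete, here is a minimal sketch that parses the ib0 entry quoted above and converts its rate fields to Mbps. It assumes, without confirmation from the vdsm sources, that rxRate/txRate are percentages of the speed field (in Mbps); if they were fractions instead, the result would be 100 times larger. The helper name is hypothetical.

```python
import ast

# ib0 entry from the getVdsStats output quoted in this comment.
SAMPLE = ("{'macAddr': '', 'name': 'ib0', 'txDropped': '12', 'rxErrors': '0', "
          "'txRate': '0.1', 'rxRate': '0.1', 'txErrors': '0', 'state': 'up', "
          "'speed': '1000', 'rxDropped': '0'}")


def implied_mbps(entry: dict, key: str) -> float:
    """ASSUMPTION: entry[key] is a percentage of entry['speed'], which is in Mbps."""
    return float(entry[key]) / 100.0 * float(entry["speed"])


ib0 = ast.literal_eval(SAMPLE)  # safely parse the Python-literal dict
print(f"rx ~ {implied_mbps(ib0, 'rxRate'):.2f} Mbps, tx ~ {implied_mbps(ib0, 'txRate'):.2f} Mbps")
```

Under the percentage assumption, 0.1% of a 1000 Mbps link is roughly 1 Mbps in each direction.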
Comment 10 DHC 2012-10-10 18:11:02 EDT
The max data rate per port on the IB board in that server is 10 Gb/s, so about 1000 MB/s, since the board is actually an older DDR 4x board.

On that system, the datastores are mounted over NFS via ib0.
/sys/class/net/ib0/statistics/rx_bytes looks like:

[root@kezan ~]# cat /sys/class/net/ib0/statistics/rx_bytes 
Comment 11 Dan Kenigsberg 2012-10-11 04:38:30 EDT
Could you stress your link and run:
# cat /sys/class/net/ib0/statistics/rx_bytes; sleep 10; cat /sys/class/net/ib0/statistics/rx_bytes; vdsClient 0 getVdsStats

This would give us a better comparison of what Vdsm reads and what it reports.
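The same comparison can be scripted. The sketch below samples the counter twice and converts the difference to megabits per second (decimal, 1 Mbps = 10^6 bit/s). It assumes the standard sysfs counter path used above; the function names are hypothetical, and counter wraparound or interface resets between samples are ignored for simplicity.

```python
import time


def rate_mbps(bytes_before: int, bytes_after: int, seconds: float) -> float:
    """Average rate in megabits per second between two byte-counter samples."""
    if seconds <= 0:
        raise ValueError("interval must be positive")
    return (bytes_after - bytes_before) * 8 / seconds / 1_000_000


def read_counter(path: str) -> int:
    """Read a single integer counter such as statistics/rx_bytes."""
    with open(path) as f:
        return int(f.read().strip())


def sample_rx_mbps(iface: str = "ib0", interval: float = 10.0) -> float:
    """Sample rx_bytes twice, `interval` seconds apart, and return the average Mbps."""
    path = f"/sys/class/net/{iface}/statistics/rx_bytes"
    before = read_counter(path)
    time.sleep(interval)
    after = read_counter(path)
    return rate_mbps(before, after, interval)
```

Running sample_rx_mbps("ib0", 10) while the link is under load, then checking vdsClient 0 getVdsStats right away, gives two numbers that can be compared directly.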
Comment 12 DHC 2012-10-12 10:47:06 EDT
Created attachment 626036
ib0 rx_bytes stats 5 second samples

ib0 rx_bytes stats sampled every 5 seconds
Comment 13 DHC 2012-10-12 10:48:07 EDT
Created attachment 626037
vdsClient 0 getVdsStats output 5 second samples

vdsClient 0 getVdsStats output sampled every 5 seconds
Comment 14 Dan Kenigsberg 2012-10-13 18:18:45 EDT
DHC, are you sure there is a problem? I see that in 110 seconds, 226690 bytes have been received on ib0. That's 1.8 mbs, which is in the neighborhood of what vdsm reports. Am I missing something?
Comment 15 DHC 2012-10-19 17:14:07 EDT
I agree that vdsm is reporting correctly; however, what I see in the UI is:
Name  Address  MAC                  Speed (Mbps)  Rx (Mbps)  Tx (Mbps)  Drops (Pkts)
ib0            80:00:04:04:fe...    0             < 1        < 1        21

Neither Rx nor Tx ever seems to report anything but "< 1", even when I am pushing a large amount of data through the link.

