Bug 1856292
| Summary: | [RHEL8.3] udpong does not run successfully on MLX4 ROCE, as well as MLX5 ROCE | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | Brian Chae <bchae> |
| Component: | rdma-core | Assignee: | Honggang LI <honli> |
| Status: | CLOSED NOTABUG | QA Contact: | Infiniband QE <infiniband-qe> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 8.3 | CC: | ahleihel, hwkernel-mgr, rdma-dev-team |
| Target Milestone: | rc | ||
| Target Release: | 8.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2020-08-18 13:57:49 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description
Brian Chae
2020-07-13 10:34:39 UTC
Mellanox hardware specific issue; adding Alaa to the CC list.

Hi,

Do you know if this ever worked on either mlx4 or mlx5?
Does it work on other vendors?

Regards,
Alaa

> + [20-07-01 23:01:48] timeout 5m udpong -s 172.31.45.120 -S 1024 -C 10240000
> rconnect: Protocol not supporte

By default the test runs with rsocket, which seems to work only on an InfiniBand link layer.
With the option "-T s" it uses standard tcp/ip sockets instead, and then it can work on an Ethernet link layer as well.

# udpong -h
...
[-T test_option]
    s|sockets - use standard tcp/ip sockets
    a|async - asynchronous operation (use poll)
    b|blocking - use blocking calls
    n|nonblocking - use nonblocking calls
    e|echo - server echoes all messages

Regarding the second issue:

>
> [root@rdma-dev-20 ~]$ timeout 5m udpong -s 172.31.45.119 -S 1024 -C 10240000 -T socket
> name      bytes   xfers   total   time     Gb/sec   usec/xfer
> custom    1k      10m     9.7g    16.56s   5.07     1.62
> [root@rdma-dev-20 ~]$
>
> The server side hangs...

Looking at the code, the test seems to be designed this way: the server remains up and waits for other clients to connect.
The same thing also happens when running over an IB link.

(In reply to Alaa Hleihel (Mellanox) from comment #2)
> Do you know if this ever worked on either of mlx4 or mlx5?
> does it work on other vendors?

We had this issue going back to RHEL7, but that bug was closed. See bz 1635810. So it never worked from RHEL7 to 8.3.

We do see the same issue on both MLX4 and MLX5 ROCE devices.

Brian

(In reply to Brian Chae from comment #4)
> Going back to RHEL7, we had this issue... But, closed. See bz 1635810. So,
> it never worked from RHEL7 to 8.3.

I cannot access BZ 1635810, can you CC me on it?

> We do see the same issue on both MLX4 and MLX5 ROCE devices.

I meant, does it work on other vendors (other than Mellanox mlx5/mlx4)?

(In reply to Alaa Hleihel (Mellanox) from comment #5)
> I cannot access BZ 1635810, can you CC me on it?

Done.

Thanks, Honggang!

Based on the comments in that BZ, especially https://bugzilla.redhat.com/show_bug.cgi?id=1635810#c8 , I see that this is simply not supported in rsocket, and it is not an mlx5/mlx4 specific issue.
I think that this BZ can be closed as well.

Regards,
Alaa

(In reply to Alaa Hleihel (Mellanox) from comment #5)
> I meant, does it work on other vendors (other than Mellanox mlx5/mlx4)?

Alaa, the same issue is observed on a BXNT ROCE device as well.

(In reply to Brian Chae from comment #8)
> Alaa, the same issue is observed on BXNT ROCE device, as well.

However, "udpong" succeeds on HFI1 OPA0.

(In reply to Brian Chae from comment #9)
> However, "udpong" succeeds on HFI1 OPA0

That's OK. OPA is InfiniBand hardware, not ROCE (Ethernet) hardware.

After discussing with Honggang, we decided to close this bug.

ack, thanks
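
For test automation, the behaviour described in this thread (rsocket mode seems to work only on an InfiniBand link layer, while "-T s" also works on Ethernet) could be captured with something like the sketch below. This is only an illustration, not part of rdma-core or of the QE test suite. The helper names `read_link_layer` and `udpong_client_args` are hypothetical. It assumes the port's link layer is readable from the standard Linux sysfs file `/sys/class/infiniband/<device>/ports/<port>/link_layer`, and that anything other than "InfiniBand" should fall back to standard sockets.

```python
from pathlib import Path

# Standard Linux sysfs root for RDMA devices (assumed present on the test host).
SYSFS_IB = Path("/sys/class/infiniband")


def read_link_layer(device, port=1, root=SYSFS_IB):
    """Return the link layer string for a device port, e.g. 'InfiniBand' or 'Ethernet'."""
    path = Path(root) / device / "ports" / str(port) / "link_layer"
    return path.read_text().strip()


def udpong_client_args(server, link_layer, size=1024, count=10240000):
    """Build a udpong client command line (hypothetical helper).

    Per the discussion in this bug, the default rsocket mode is only
    expected to work on an InfiniBand link layer, so any other link
    layer gets '-T s' (standard tcp/ip sockets) appended.
    """
    args = ["udpong", "-s", server, "-S", str(size), "-C", str(count)]
    if link_layer != "InfiniBand":
        args += ["-T", "s"]
    return args
```

A caller would pass the result to `subprocess.run`, for example `udpong_client_args("172.31.45.120", read_link_layer("mlx5_0"))`, where the device name `mlx5_0` is only an example. Wrapping the server side in `timeout`, as the original test does, remains necessary because the server stays up waiting for further clients.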