Bug 1814682 - Review Request: rshim - rshim driver for Mellanox BlueField SoC
Summary: Review Request: rshim - rshim driver for Mellanox BlueField SoC
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: Package Review
Version: rawhide
Hardware: All
OS: Linux
unspecified
medium
Target Milestone: ---
Assignee: Michal Schmidt
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: 1655882 1744737
TreeView+ depends on / blocked
 
Reported: 2020-03-18 14:13 UTC by limings
Modified: 2020-05-07 01:31 UTC (History)
5 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-05-06 20:30:50 UTC
Type: ---
Embargoed:
mschmidt: fedora-review+


Attachments (Terms of Use)

Description limings 2020-03-18 14:13:36 UTC
Spec URL: https://github.com/Mellanox/rshim-user-space/releases/download/rshim-2.0/rshim.spec
SRPM URL: https://github.com/Mellanox/rshim-user-space/releases/download/rshim-2.0/rshim-2.0-1.fc31.src.rpm
Description: This is the user-space driver to access Mellanox BlueField SoC via the rshim interface. It provides ways to push boot stream, debug the target or login via virtual console or network interface.
Fedora Account System Username: lsun

This package is part of the effort to add support into Redhat for the Mellanox BlueField SoC. Please also see details in https://bugzilla.redhat.com/show_bug.cgi?id=1744737

I am the upstream maintainer of this package. This is my first package in Fedora and I would need a sponsor.

koji build URL: https://koji.fedoraproject.org/koji/taskinfo?taskID=42589875

Comment 1 Honggang LI 2020-03-19 23:42:06 UTC
http://people.redhat.com/honli/bz1814682/

      1	BUFFER_SIZE_WARNING
      4	CLANG_WARNING
      2	LOCK
      5	MISSING_LOCK
      1	OVERRUN
      1	RESOURCE_LEAK
      1	SIZEOF_MISMATCH
      5	SLEEP
      7	TAINTED_SCALAR

Hi, please fix the issues found by Coverity. If you think an issue is false positive, please explain it.

Comment 2 Michal Schmidt 2020-03-20 16:12:45 UTC
Commenting on a few lines in rshim.spec:

> URL: https://github.com/mellanox/rshim-user-space

Seeing the name "rshim-user-space" made me curious if there's a kernel part and I found:
https://github.com/mellanox/rshim
"BlueField RSHIM driver"

What is the relation between the kernel and the userspace drivers?
Was the kernel driver the original approach which later got obsoleted by the userspace implementation?

> Obsoletes: %{name} < 2.0

This line says: rshim-2.0 obsoletes rshim < 2.0. That seems redundant.

> %global with_systemd %(if ( test -d "%{_unitdir}" > /dev/null); then echo -n '1'; else echo -n '0'; fi)

Why not assume systemd always?

> %prep
> rm -fr %{name}-%{version}
> mkdir %{name}-%{version}
> tar -axf %{SOURCE0} -C %{name}-%{version} --strip-components 1
> %setup -q -D -T

Would this simpler way work too?:
%prep
%setup -n rshim-user-space-%{name}-%{version}

Or maybe, since you are the upstream maintainer, you could change the upstream tarball generation to avoid the odd directory name?

> %install
> %makeinstall -C src INSTALL_DIR="%{buildroot}%{_bindir}"
> %if "%{with_systemd}" == "1"
> %{__install} -d %{buildroot}%{_unitdir}
> %{__install} -m 0644 bfrshim.service %{buildroot}%{_unitdir}
> %endif
> %{__install} -d %{buildroot}%{_mandir}/man1
> %{__install} -m 0644 man/bfrshim.1 %{buildroot}%{_mandir}/man1

Why don't you let %makeinstall install these?

> %__spec_install_post

This looks like duplicating rpm-build's work.

> %post
> %if "%{with_systemd}" == "1"
>   systemctl daemon-reload
>   systemctl enable bfrshim.service
> %endif

No, you need to use the macros from the packaging guidelines:
https://docs.fedoraproject.org/en-US/packaging-guidelines/Scriptlets/#_systemd

> %files
> %license LICENSE
> %defattr(-,root,root,-)

This looks like the default, so the %defattr line is redundant. 

> %%doc README.md

Double % sign, typo?

> %changelog
> * Fri Mar 13 2020 Liming Sun <lsun> - 2.0-1
>- Update the spec file according to fedora packaging-guidelines
> * Mon Dec 16 2019 Liming Sun <lsun>
>- Initial packaging

Please add an empty line between changelog entries.

Comment 3 Michal Schmidt 2020-03-20 16:27:07 UTC
The github project name is "rshim-user-space", the package name is "rshim", the daemon binary and the systemd service are called "bfrshim".
Would it be too late to make them the same? Just a proposal. Feel free to disagree.

Comment 4 limings 2020-03-20 17:50:05 UTC
(In reply to Honggang LI from comment #1)
> http://people.redhat.com/honli/bz1814682/
> 
>       1	BUFFER_SIZE_WARNING
>       4	CLANG_WARNING
>       2	LOCK
>       5	MISSING_LOCK
>       1	OVERRUN
>       1	RESOURCE_LEAK
>       1	SIZEOF_MISMATCH
>       5	SLEEP
>       7	TAINTED_SCALAR
> 
> Hi, please fix the issues found by Coverity. If you think an issue is false
> positive, please explain it.

Thanks Honggang for the comments. I am working on it and will post new version after fixing these warnings.

Comment 5 limings 2020-03-20 18:45:40 UTC
(In reply to Michal Schmidt from comment #2)
> Commenting on a few lines in rshim.spec:
> 
> > URL: https://github.com/mellanox/rshim-user-space
> 
> Seeing the name "rshim-user-space" made me curious if there's a kernel part
> and I found:
> https://github.com/mellanox/rshim
> "BlueField RSHIM driver"
> 
> What is the relation between the kernel and the userspace drivers?
> Was the kernel driver the original approach which later got obsoleted by the
> userspace implementation?

Yes, the kernel driver is the original approach, which has upstreaming difficulties due to mixed functionalities. This 'rshim-user-space' driver intends to replace the kernel driver with exactly the same user interface. 

> 
> > Obsoletes: %{name} < 2.0
> 
> This line says: rshim-2.0 obsoletes rshim < 2.0. That seems redundant.

Will remove it.

> 
> > %global with_systemd %(if ( test -d "%{_unitdir}" > /dev/null); then echo -n '1'; else echo -n '0'; fi)
> 
> Why not assume systemd always?

The driver might be used on some older OS version (like centos 6) which doesn't have systemd. The check here is to prevent source RPM build issues for such case. This package will also be included in Mellanox OFED package which needs to support different build environments.

> 
> > %prep
> > rm -fr %{name}-%{version}
> > mkdir %{name}-%{version}
> > tar -axf %{SOURCE0} -C %{name}-%{version} --strip-components 1
> > %setup -q -D -T
> 
> Would this simpler way work too?:
> %prep
> %setup -n rshim-user-space-%{name}-%{version}
> 
> Or maybe, since you are the upstream maintainer, you could change the
> upstream tarball generation to avoid the odd directory name?

I had a little trouble with the name. The current tarball with such top-level odd name was generated by GitHub automatically when a release/tag is created. But you're right. I could actually upload the tarball and use it instead.
Will make the change in next post.

> 
> > %install
> > %makeinstall -C src INSTALL_DIR="%{buildroot}%{_bindir}"
> > %if "%{with_systemd}" == "1"
> > %{__install} -d %{buildroot}%{_unitdir}
> > %{__install} -m 0644 bfrshim.service %{buildroot}%{_unitdir}
> > %endif
> > %{__install} -d %{buildroot}%{_mandir}/man1
> > %{__install} -m 0644 man/bfrshim.1 %{buildroot}%{_mandir}/man1
> 
> Why don't you let %makeinstall install these?

Will update it in next post.

> 
> > %__spec_install_post
> 
> This looks like duplicating rpm-build's work.

Will update it in next post.

> 
> > %post
> > %if "%{with_systemd}" == "1"
> >   systemctl daemon-reload
> >   systemctl enable bfrshim.service
> > %endif
> 
> No, you need to use the macros from the packaging guidelines:
> https://docs.fedoraproject.org/en-US/packaging-guidelines/Scriptlets/
> #_systemd

Will update it in next post.

> 
> > %files
> > %license LICENSE
> > %defattr(-,root,root,-)
> 
> This looks like the default, so the %defattr line is redundant. 

Will remove it in next post.

> 
> > %%doc README.md
> 
> Double % sign, typo?

Will fix it in next post.

> 
> > %changelog
> > * Fri Mar 13 2020 Liming Sun <lsun> - 2.0-1
> >- Update the spec file according to fedora packaging-guidelines
> > * Mon Dec 16 2019 Liming Sun <lsun>
> >- Initial packaging
> 
> Please add an empty line between changelog entries.

Will fix it in next post.

Comment 6 limings 2020-03-20 19:35:37 UTC
(In reply to Michal Schmidt from comment #3)
> The github project name is "rshim-user-space", the package name is "rshim",
> the daemon binary and the systemd service are called "bfrshim".
> Would it be too late to make them the same? Just a proposal. Feel free to
> disagree.

It's good suggestion. A little history that this package is user-space driver and is intended to replace the kernel module implementation which has name 'rshim' on GitHub. Both are called 'rshim' for easy upgrade since they have exactly the same functionality and user interface. RPM version < 2.0 is for kernel modules implementation, >= 2.0 is for user space implementation.

I'll rename bfrshim to 'rshim' directly to avoid confusion on this one. Thanks!

Comment 7 Honggang LI 2020-03-23 02:28:30 UTC
1814682-rshim]$ rpm -qpl results/rshim-2.0-1.fc33.x86_64.rpm  | grep man 
/usr/share/man/man1/bfrshim.1.gz
               ^^^^         ^

 1814682-rshim]$ rpm -qpl results/rshim-2.0-1.fc33.x86_64.rpm  | grep bin
/usr/bin/bfrshim
    ^^^^

./src/rshim.c:2092:  rc = system("modprobe cuse");
./src/rshim_net.c:56:  rc = system("modprobe tun");

The binary 'bfrshim' may execute 'modprobe' command, which requires administrator permission.
I suggest to move 'bfrshim' into the '/usr/sbin' directory, and install the manpage into man-8 section instead of man-1.

Comment 8 Honggang LI 2020-03-23 02:34:47 UTC
(In reply to lsun from comment #4)
 
> Thanks Honggang for the comments. I am working on it and will post new
> version after fixing these warnings.

Please fix those issue for this fedora package review. When we import this package for RHEL-8,
we will have to fix those issue. Because it is a mandatory task for RHEL package import.

Fix those issues earlier will save time for us in the future.

Thanks

Comment 9 limings 2020-03-23 16:03:38 UTC
(In reply to Honggang LI from comment #7)
> 1814682-rshim]$ rpm -qpl results/rshim-2.0-1.fc33.x86_64.rpm  | grep man 
> /usr/share/man/man1/bfrshim.1.gz
>                ^^^^         ^
> 
>  1814682-rshim]$ rpm -qpl results/rshim-2.0-1.fc33.x86_64.rpm  | grep bin
> /usr/bin/bfrshim
>     ^^^^
> 
> ./src/rshim.c:2092:  rc = system("modprobe cuse");
> ./src/rshim_net.c:56:  rc = system("modprobe tun");
> 
> The binary 'bfrshim' may execute 'modprobe' command, which requires
> administrator permission.
> I suggest to move 'bfrshim' into the '/usr/sbin' directory, and install the
> manpage into man-8 section instead of man-1.

This binary does need root permission to run. I'll update it as suggested.

Comment 10 limings 2020-03-23 16:06:57 UTC
(In reply to Honggang LI from comment #8)
> (In reply to lsun from comment #4)
>  
> > Thanks Honggang for the comments. I am working on it and will post new
> > version after fixing these warnings.
> 
> Please fix those issue for this fedora package review. When we import this
> package for RHEL-8,
> we will have to fix those issue. Because it is a mandatory task for RHEL
> package import.
> 
> Fix those issues earlier will save time for us in the future.
> 
> Thanks

Yes, definitely. I have been working on it and planned to submit a new version today.

One question about the coverity warnings:

Is there a way I could run koji to get the same output? Or is it possible you could give me some details about this warning: "4	CLANG_WARNING"?  I ran coverity locally but couldn't get this one. Thanks!

Comment 11 Michal Schmidt 2020-03-23 16:58:18 UTC
I reproduced it on a Fedora system.
You can install csmock:
  dnf install csmock
And apply the "clang" checker on your source RPM:
  csmock -t clang -r fedora-rawhide-x86_64 -o output/ ./rshim-2.0-1.fc31.src.rpm
You'll find "scan-results.*" in the "output/" directory.

Comment 12 Michal Schmidt 2020-03-23 17:01:59 UTC
Honggang provided his test results in:
https://people.redhat.com/honli/bz1814682/rshim-2.0-1.fc31.tar.xz

Comment 13 limings 2020-03-23 20:11:35 UTC
Thanks all for the comments! 
Below are the updated info (2.0.1) trying to solve these comments.

Spec URL: https://github.com/Mellanox/rshim-user-space/releases/download/rshim-2.0.1/rshim.spec
SRPM URL: https://github.com/Mellanox/rshim-user-space/releases/download/rshim-2.0.1/rshim-2.0.1-1.fc31.src.rpm
koji build URL: https://koji.fedoraproject.org/koji/taskinfo?taskID=42727191

Below are the unsolved coverity warnings and explanations (based on Comment 12):

Error: LOCK (CWE-667): [#def2]
[lsun] Function rshim_write_delayed() has lock held outside by the caller. No need to unlock it on return.

Error: LOCK (CWE-667): [#def3]
[lsun] same.

Error: MISSING_LOCK (CWE-667): [#def4]
[lsun] This is the RSH_DEV_TYPE_TMFIFO case and is called from rshim_fifo_input()and rshim_fifo_output(). In both cases the ringlock are already held.

Error: MISSING_LOCK (CWE-667): [#def5]
[lsun] This one is called from  the 'RSH_EVENT_ATTACH' handling, which already has the lock held when calling rshim_notify(bd, RSH_EVENT_ATTACH, 0).

Error: MISSING_LOCK (CWE-667): [#def6]
[lsun] same

Error: SLEEP (CWE-367): [#def26]
rshim_register(bd);
[lsun] This one is called during device probe at early stage. The sleep purpose is to detect if any other driver has already attached to the same rshim device (since it could be attached from USB or PCIe via different host machine). The probing takes some time. The sleep here is ok.

Error: MISSING_LOCK (CWE-667): [#def24]
Error: MISSING_LOCK (CWE-667): [#def25]
[lsun] same as [#def4 above.

Comment 14 Honggang LI 2020-03-24 01:29:15 UTC
(In reply to lsun from comment #13)
> Thanks all for the comments! 
> Below are the updated info (2.0.1) trying to solve these comments.
> 
> Spec URL:
> https://github.com/Mellanox/rshim-user-space/releases/download/rshim-2.0.1/
> rshim.spec
> SRPM URL:
> https://github.com/Mellanox/rshim-user-space/releases/download/rshim-2.0.1/
> rshim-2.0.1-1.fc31.src.rpm


*************************************************************
Task method: VersionDiffBuild

Task URL: https://cov01.lab.eng.brq.redhat.com/covscanhub/task/164815/
Comment:


Added (+), Fixed (-)

SLEEP                     +2
TAINTED_SCALAR            +1


BUFFER_SIZE_WARNING       -1
CLANG_WARNING             -4
LOCK                      -2
OVERRUN                   -1
PW.BAD_PRINTF_FORMAT_STRING -3
RESOURCE_LEAK             -1
SIZEOF_MISMATCH           -1
SLEEP                     -4
TAINTED_SCALAR            -7
*************************************************************


Newly introduced defects
List of Defects

Error: TAINTED_SCALAR (CWE-20): [#def1]
rshim-2.0.1/src/rshim.c:2209: tainted_argument: Calling function "rshim_fd_full_read" taints argument "index".
rshim-2.0.1/src/rshim.c:2211: tainted_data: Using tainted variable "index" as an index into an array "rshim_devs".
rshim-2.0.1/src/rshim.c:2211: remediation: Ensure that tainted values are properly sanitized, by checking that their values are within a permissible range.
# 2209|           rc = rshim_fd_full_read(rshim_work_fd[0], &index, sizeof(index));
# 2210|           if (rc == sizeof(index)) {
# 2211|->           bd = rshim_devs[index];
# 2212|             if (bd)
# 2213|               rshim_work_handler(bd);

Error: SLEEP (CWE-367): [#def2]
rshim-2.0.1/src/rshim_pcie.c:392: lock_acquire: Calling function "pthread_mutex_lock" acquires lock "bd->mutex".
rshim-2.0.1/src/rshim_pcie.c:400: sleep: Call to "rshim_register" might sleep while holding lock "bd->mutex".
#  398|      */
#  399|     rshim_lock();
#  400|->   ret = rshim_register(bd);
#  401|     if (ret) {
#  402|       rshim_unlock();

Error: SLEEP (CWE-367): [#def3]
rshim-2.0.1/src/rshim_pcie_lf.c:524: lock_acquire: Calling function "pthread_mutex_lock" acquires lock "bd->mutex".
rshim-2.0.1/src/rshim_pcie_lf.c:532: sleep: Call to "rshim_register" might sleep while holding lock "bd->mutex".
#  530|      */
#  531|     rshim_lock();
#  532|->   ret = rshim_register(bd);
#  533|     if (ret) {
#  534|       rshim_unlock();




> koji build URL: https://koji.fedoraproject.org/koji/taskinfo?taskID=42727191
> 
> Below are the unsolved coverity warnings and explanations (based on Comment
> 12):
> 
> Error: LOCK (CWE-667): [#def2]
> [lsun] Function rshim_write_delayed() has lock held outside by the caller.
> No need to unlock it on return.
> 
> Error: LOCK (CWE-667): [#def3]
> [lsun] same.
> 
> Error: MISSING_LOCK (CWE-667): [#def4]
> [lsun] This is the RSH_DEV_TYPE_TMFIFO case and is called from
> rshim_fifo_input()and rshim_fifo_output(). In both cases the ringlock are
> already held.
> 
> Error: MISSING_LOCK (CWE-667): [#def5]
> [lsun] This one is called from  the 'RSH_EVENT_ATTACH' handling, which
> already has the lock held when calling rshim_notify(bd, RSH_EVENT_ATTACH, 0).
> 
> Error: MISSING_LOCK (CWE-667): [#def6]
> [lsun] same
> 
> Error: SLEEP (CWE-367): [#def26]
> rshim_register(bd);
> [lsun] This one is called during device probe at early stage. The sleep
> purpose is to detect if any other driver has already attached to the same
> rshim device (since it could be attached from USB or PCIe via different host
> machine). The probing takes some time. The sleep here is ok.
> 
> Error: MISSING_LOCK (CWE-667): [#def24]
> Error: MISSING_LOCK (CWE-667): [#def25]
> [lsun] same as [#def4 above.

Thanks for the comments.

Comment 15 Honggang LI 2020-03-24 09:15:20 UTC
Michal
 Assign this bug to you, as I don't have the permission to add lsun into the fedora packager group. thanks

Comment 16 limings 2020-03-24 15:45:12 UTC
Thanks Honggang! The 'TAINTED_SCALAR +1' is an easy fix. I'll make it in next version.
The "SLEEP +2" is the same as "Error: SLEEP (CWE-367): [#def26]" in last build, which is ok since this function happens during driver probe which could take while.

(In reply to Honggang LI from comment #14)
...
> Task URL: https://cov01.lab.eng.brq.redhat.com/covscanhub/task/164815/
> Comment:
> 
> 
> Added (+), Fixed (-)
> 
> SLEEP                     +2
> TAINTED_SCALAR            +1
...
> *************************************************************
> 
> 
> Newly introduced defects
> List of Defects
> 
> Error: TAINTED_SCALAR (CWE-20): [#def1]
...
> 
> Error: SLEEP (CWE-367): [#def2]
...
> 
> Error: SLEEP (CWE-367): [#def3]
...

Comment 17 Honggang LI 2020-03-25 11:14:08 UTC
I tried to test rshim with an aarch64 machine. But I never got the /dev/rsh*/* files. How can I test rshim? thank


[root@mellanox-bluefield-01 p]# rshim -f -l 4
modprobe: FATAL: Module cuse not found in directory /lib/modules/4.18.0-188.2.el8.bz1655714.v1.aarch64
Probing pcie-03:00.2
create rshim pcie-03:00.2
BAR[0] unassigned, run 'lspci -v'
^Cepoll_wait failed; Interrupted system call
^Cepoll_wait failed; Interrupted system call
^Cepoll_wait failed; Interrupted system call
^Cepoll_wait failed; Interrupted system call
^Cepoll_wait failed; Interrupted system call
^Cepoll_wait failed; Interrupted system call
^Cepoll_wait failed; Interrupted system call
^C^Cepoll_wait failed; Interrupted system call
^Z
[1]+  Stopped                 rshim -f -l 4
[root@mellanox-bluefield-01 p]# ps -ef  | grep rshim
root       11213    4877  2 07:10 pts/0    00:00:00 rshim -f -l 4
root       11239    4877  0 07:11 pts/0    00:00:00 grep --color=auto rshim
[root@mellanox-bluefield-01 p]# kill -9 11213
[root@mellanox-bluefield-01 p]# 
[1]+  Killed                  rshim -f -l 4
[root@mellanox-bluefield-01 p]# 
[root@mellanox-bluefield-01 p]# lspci -v
00:00.0 PCI bridge: Mellanox Technologies MT416842 BlueField SoC Crypto enabled (prog-if 00 [Normal decode])
	Flags: bus master, fast devsel, latency 0, IRQ 75
	Bus: primary=00, secondary=01, subordinate=03, sec-latency=0
	I/O behind bridge: 00000000-00000fff [size=4K]
	Memory behind bridge: 00000000-001fffff [size=2M]
	Prefetchable memory behind bridge: 000000e200000000-000000e204ffffff [size=80M]
	Capabilities: [60] Express Root Port (Slot-), MSI 00
	Capabilities: [e0] MSI: Enable+ Count=4/4 Maskable+ 64bit+
	Capabilities: [40] Power Management version 3
	Capabilities: [100] Advanced Error Reporting
	Capabilities: [1c0] #19
	Capabilities: [230] Access Control Services
	Capabilities: [320] #27
	Capabilities: [370] #26
	Capabilities: [430] Downstream Port Containment
	Kernel driver in use: pcieport

01:00.0 PCI bridge: Mellanox Technologies MT416842 Family [BlueField SoC PCIe Bridge] (prog-if 00 [Normal decode])
	Flags: bus master, fast devsel, latency 0, IRQ 78
	Bus: primary=01, secondary=02, subordinate=03, sec-latency=0
	I/O behind bridge: 00000000-00000fff [size=4K]
	Memory behind bridge: 00000000-001fffff [size=2M]
	Prefetchable memory behind bridge: 000000e200000000-000000e204ffffff [size=80M]
	Capabilities: [60] Express Upstream Port, MSI 00
	Capabilities: [e0] MSI: Enable+ Count=1/4 Maskable+ 64bit+
	Capabilities: [40] Power Management version 3
	Capabilities: [100] Advanced Error Reporting
	Capabilities: [1c0] #19
	Capabilities: [320] #27
	Capabilities: [370] #26
	Kernel driver in use: pcieport

02:02.0 PCI bridge: Mellanox Technologies MT416842 Family [BlueField SoC PCIe Bridge] (prog-if 00 [Normal decode])
	Flags: bus master, fast devsel, latency 0, IRQ 79
	Bus: primary=02, secondary=03, subordinate=03, sec-latency=0
	I/O behind bridge: 00000000-00000fff [size=4K]
	Memory behind bridge: 00000000-001fffff [size=2M]
	Prefetchable memory behind bridge: 000000e200000000-000000e204ffffff [size=80M]
	Capabilities: [60] Express Downstream Port (Slot-), MSI 00
	Capabilities: [e0] MSI: Enable+ Count=2/4 Maskable+ 64bit+
	Capabilities: [40] Power Management version 3
	Capabilities: [100] Advanced Error Reporting
	Capabilities: [1c0] #19
	Capabilities: [230] Access Control Services
	Capabilities: [320] #27
	Capabilities: [370] #26
	Capabilities: [430] Downstream Port Containment
	Kernel driver in use: pcieport

03:00.0 Ethernet controller: Mellanox Technologies MT416842 BlueField integrated ConnectX-5 network controller
	Subsystem: Mellanox Technologies Device 0029
	Flags: bus master, fast devsel, latency 0, IRQ 81
	Memory at e200000000 (64-bit, prefetchable) [size=32M]
	Memory at e204000000 (64-bit, prefetchable) [size=2M]
	Expansion ROM at e000000000 [disabled] [size=1M]
	Capabilities: [60] Express Endpoint, MSI 00
	Capabilities: [48] Vital Product Data
	Capabilities: [9c] MSI-X: Enable+ Count=64 Masked-
	Capabilities: [c0] Vendor Specific Information: Len=18 <?>
	Capabilities: [40] Power Management version 3
	Capabilities: [100] Advanced Error Reporting
	Capabilities: [150] Alternative Routing-ID Interpretation (ARI)
	Capabilities: [1c0] #19
	Capabilities: [230] Access Control Services
	Capabilities: [320] #27
	Capabilities: [370] #26
	Capabilities: [420] #25
	Kernel driver in use: mlx5_core
	Kernel modules: mlx5_core

03:00.1 Ethernet controller: Mellanox Technologies MT416842 BlueField integrated ConnectX-5 network controller
	Subsystem: Mellanox Technologies Device 0029
	Flags: bus master, fast devsel, latency 0, IRQ 99
	Memory at e202000000 (64-bit, prefetchable) [size=32M]
	Memory at e204200000 (64-bit, prefetchable) [size=2M]
	Expansion ROM at e000100000 [disabled] [size=1M]
	Capabilities: [60] Express Endpoint, MSI 00
	Capabilities: [48] Vital Product Data
	Capabilities: [9c] MSI-X: Enable+ Count=64 Masked-
	Capabilities: [c0] Vendor Specific Information: Len=18 <?>
	Capabilities: [40] Power Management version 3
	Capabilities: [100] Advanced Error Reporting
	Capabilities: [150] Alternative Routing-ID Interpretation (ARI)
	Capabilities: [230] Access Control Services
	Capabilities: [420] #25
	Kernel driver in use: mlx5_core
	Kernel modules: mlx5_core

03:00.2 DMA controller: Mellanox Technologies MT416842 BlueField SoC management interfac (prog-if 00 [8237])
	Subsystem: Mellanox Technologies Device 0029
	Flags: fast devsel
	Memory at <unassigned> (64-bit, prefetchable) [disabled]
	Capabilities: [60] Express Endpoint, MSI 00
	Capabilities: [48] Vital Product Data
	Capabilities: [9c] MSI-X: Enable- Count=64 Masked-
	Capabilities: [c0] Vendor Specific Information: Len=18 <?>
	Capabilities: [40] Power Management version 3
	Capabilities: [100] Advanced Error Reporting
	Capabilities: [150] Alternative Routing-ID Interpretation (ARI)
	Capabilities: [230] Access Control Services
	Capabilities: [420] #25

[root@mellanox-bluefield-01 p]# uname -r
4.18.0-188.2.el8.bz1655714.v1.aarch64

Comment 18 Alaa Hleihel (NVIDIA Mellanox) 2020-03-25 11:22:08 UTC
(In reply to Honggang LI from comment #17)
> I tried to test rshim with an aarch64 machine. But I never got the
> /dev/rsh*/* files. How can I test rshim? thank
> 
> 
> [root@mellanox-bluefield-01 p]# rshim -f -l 4
> 

I see that you are trying to load the rshim driver on the BlueField system, which is wrong.

- rshim user-space driver is used on the external host that hosts the BlueField card.
- on the BlueField, we have a module named mlxbf_tmfifo that creates the corresponding interface that links to the rshim interface on the external host.
See also https://bugzilla.redhat.com/show_bug.cgi?id=1655736#c7

> [root@mellanox-bluefield-01 p]# uname -r
> 4.18.0-188.2.el8.bz1655714.v1.aarch64

This kernel build has the mlxbf_tmfifo driver.

Comment 19 Honggang LI 2020-03-25 12:32:56 UTC
(In reply to Alaa Hleihel (Mellanox) from comment #18)

> See also https://bugzilla.redhat.com/show_bug.cgi?id=1655736#c7

 on SmartNIC:
  # ip add add 192.168.100.2/24 dev tmfifo_net0
  # ip -6 addr add 2001::192:168:100:2/112 dev tmfifo_net0

What does that mean "on SmartNIC"? How can I "login" the SmartNIC?

Comment 20 Alaa Hleihel (NVIDIA Mellanox) 2020-03-25 12:39:29 UTC
(In reply to Honggang LI from comment #19)
> (In reply to Alaa Hleihel (Mellanox) from comment #18)
> 
> > See also https://bugzilla.redhat.com/show_bug.cgi?id=1655736#c7
> 
>  on SmartNIC:
>   # ip add add 192.168.100.2/24 dev tmfifo_net0
>   # ip -6 addr add 2001::192:168:100:2/112 dev tmfifo_net0
> 
> What does that mean "on SmartNIC"? How can I "login" the SmartNIC?

I'm confused, weren't you trying to run rshim tool on the SmartNIC before?
> [root@mellanox-bluefield-01 p]# uname -r
What is "mellanox-bluefield-01" host? is this the SmartNIC (BlueField) or the external host that is hosting the BlueField card?

> How can I "login" the SmartNIC?

Connecting to the SmartNIC is possible via couple of interfaces:
1. minicom to the UART/USB interface (if cables are connected)
2. using rshim driver on the external host, we get the /dev/rshimX/console that we can connected to with screen or minicom tools.


Can I have access to this system?

Comment 21 Honggang LI 2020-03-25 12:49:43 UTC
(In reply to Alaa Hleihel (Mellanox) from comment #20)
> (In reply to Honggang LI from comment #19)
> > (In reply to Alaa Hleihel (Mellanox) from comment #18)
> > 
> > > See also https://bugzilla.redhat.com/show_bug.cgi?id=1655736#c7
> > 
> >  on SmartNIC:
> >   # ip add add 192.168.100.2/24 dev tmfifo_net0
> >   # ip -6 addr add 2001::192:168:100:2/112 dev tmfifo_net0
> > 
> > What does that mean "on SmartNIC"? How can I "login" the SmartNIC?
> 
> I'm confused, weren't you trying to run rshim tool on the SmartNIC before?
> > [root@mellanox-bluefield-01 p]# uname -r
> What is "mellanox-bluefield-01" host? is this the SmartNIC (BlueField) or
> the external host that is hosting the BlueField card?

mellanox-bluefield-01.khw4.lab.eng.bos.redhat.com

> 
> > How can I "login" the SmartNIC?
> 
> Connecting to the SmartNIC is possible via couple of interfaces:
> 1. minicom to the UART/USB interface (if cables are connected)
> 2. using rshim driver on the external host, we get the /dev/rshimX/console
> that we can connected to with screen or minicom tools.
> 
> 
> Can I have access to this system?

I sent the accout/pw via email.

Comment 22 Honggang LI 2020-03-25 12:59:52 UTC
(In reply to Alaa Hleihel (Mellanox) from comment #20)

> Connecting to the SmartNIC is possible via couple of interfaces:
> 1. minicom to the UART/USB interface (if cables are connected)
> 2. using rshim driver on the external host, we get the /dev/rshimX/console
           ^^^^^^^^^^^^

Do you mean kernel space driver? or user space driver?

Comment 23 Alaa Hleihel (NVIDIA Mellanox) 2020-03-25 13:05:25 UTC
Thanks!

I see is pbunyan's setup :)

* mellanox-bluefield-01.khw4.lab.eng.bos.redhat.com
This is indeed the BlueFeild card, that we also refer to as the SmartNIC.

The 'external' host that host this card is also an ARM64 system in this case (which doesn't matter much for us), it is: hpe-mantis-01.khw4.lab.eng.bos.redhat.com
I used to connect to the SmartNIC console from the mantis host using this command:
# minicom --baudrate 115200 --device /dev/ttyUSB0
Anyway, you already got a direct SSH connection to this SmartNIC (ssh to mellanox-bluefield-01.khw4.lab.eng.bos.redhat.com host).

So to test out the new rshim user-space package, you need to login to the mantis host and install it there.
But first, note that you need to remove the rshim kernel module that is currently installed and running there (RPM package rshim-1.16-0.ga7ad4e6_4.18.0_80.el8.aarch64.aarch64)
[root@hpe-mantis-01 ~]# lsmod | grep rshim
rshim_net             262144  0
rshim_pcie            262144  0
rshim                 262144  2 rshim_pcie,rshim_net
[root@hpe-mantis-01 ~]# 


(In reply to Honggang LI from comment #22)
> (In reply to Alaa Hleihel (Mellanox) from comment #20)
> 
> > Connecting to the SmartNIC is possible via couple of interfaces:
> > 1. minicom to the UART/USB interface (if cables are connected)
> > 2. using rshim driver on the external host, we get the /dev/rshimX/console
>            ^^^^^^^^^^^^
> 
> Do you mean kernel space driver? or user space driver?

Either one will provide this console. But we are moving away from the kernel implementation to the user-space one,
so let's focus on the new user-space driver.

Comment 24 limings 2020-03-25 13:15:36 UTC
>>> modprobe: FATAL: Module cuse not found in directory /lib/modules/4.18.0-188.2.el8. bz1655714.v1.aarch64

Honggang,

I was a little concerned about the "modprobe: FATAL: Module cuse not found" error message you mentioned earlier. Once you try it on hpe-mantis-01 (the external host machine) and see the same error messages,  you could check the Linux configuration CONFIG_CUSE (see below) to see whether it's enabled or not. The user-space driver relies on it to create the /dev/rshimX/. It's usually enabled by default in centos, ubuntu, and default arm64 config. But I am not sure the kernel configuration you're using.

CONFIG_FUSE_FS=m
CONFIG_CUSE=m

Thanks!

Comment 25 Honggang LI 2020-03-27 08:37:47 UTC
I'm waiting for a pair of external host and SmartNIC to test rshim. The one I used is unavailable now.

Comment 26 Alaa Hleihel (NVIDIA Mellanox) 2020-03-29 12:39:16 UTC
(In reply to Honggang LI from comment #25)
> I'm waiting for a pair of external host and SmartNIC to test rshim. The one
> I used is unavailable now.

Hi Honggang,

Did you check if you can use the other machines in Beaker?

Comment 27 Honggang LI 2020-03-30 02:35:34 UTC
(In reply to Alaa Hleihel (Mellanox) from comment #26)

> Did you check if you can use the other machines in Beaker?

Yes, I sent email to machine owner to ask for access of rdma-qe-04 and rdma-snic-01 .

Comment 28 limings 2020-04-06 12:38:16 UTC
Michal (and Honggang),

Any advices what the next step would be? Do I need to provide new version to fix anything? Michal, Honggang mentioned that I need to be added into fedora packager group (comment 15 above). Is it something you could help or any suggestions? Or are we waiting for the verification be complete first? This is my first fedora package. I'll need some guidance and a 'sponsor' according to the wiki https://fedoraproject.org/wiki/How_to_get_sponsored_into_the_packager_group.

Thanks!
Liming

Comment 29 Honggang LI 2020-04-08 07:57:12 UTC
(In reply to lsun from comment #28)
> Michal (and Honggang),
> 
> Any advices what the next step would be? Do I need to provide new version to

The package is good. We were waiting for hardware to test the package.
Unfortunately, we are unlikely get the hardware after a week waiting.

We will skip the test of this package for Fedora Rawhide. Redhat QE will
test it for RHEL, when this package get into RHEL.

> fix anything? Michal, Honggang mentioned that I need to be added into fedora
> packager group (comment 15 above).

mschmidt will add you into the packager group.

Comment 30 Honggang LI 2020-04-08 08:00:16 UTC
The srpm package PASSED review. Set the 'fedora-review+' flag for it.

Package Review
==============

Legend:
[x] = Pass, [!] = Fail, [-] = Not applicable, [?] = Not evaluated
[ ] = Manual review needed



===== MUST items =====

C/C++:
[ ]: Package does not contain kernel modules.
honli: PASS

[ ]: Package contains no static executables.
honli: PASS

[x]: If your application is a C or C++ application you must list a
     BuildRequires against gcc, gcc-c++ or clang.
[x]: Header files in -devel subpackage, if present.
[x]: Package does not contain any libtool archives (.la)
[x]: Rpath absent or only used for internal libs.

Generic:
[ ]: Package is licensed with an open-source compatible license and meets
     other legal requirements as defined in the legal section of Packaging
     Guidelines.
honli: PASS. See next comment.

[ ]: License field in the package spec file matches the actual license.
     Note: Checking patched sources after %prep for licenses. Licenses
     found: "*No copyright* GPL (v2)", "Unknown or generated". 19 files
     have unknown license. Detailed output of licensecheck in
     /home/honli/1814682-rshim/licensecheck.txt
honli: PASS. I had checked all files. Only three files do not have license
tag. rshim-2.0.1/README.md, rshim-2.0.1/man/rshim.8, and rshim-2.0.1/bootstrap.sh
does not have license tag. It is fine for the first two files as no one
add license tag for such kind of files. rshim-2.0.1/bootstrap.sh is a dummy
bash script. It is better to add a SPDX-License-Identifier in the head of it.
But it is also acceptable to ignore it.

[ ]: License file installed when any subpackage combination is installed.
honli: PASS.

[ ]: Package must own all directories that it creates.
     Note: Directories without known owners: /usr/lib/systemd/system,
     /usr/lib/systemd
honli: PASS. Both directories are co-own by many packages.

[ ]: %build honors applicable compiler flags or justifies otherwise.
honli: PASS. Default compiler flags used.

[ ]: Package contains no bundled libraries without FPC exception.
honli: PASS. No bundled library.

[ ]: Changelog in prescribed format.
honli: PASS

[ ]: Sources contain only permissible code or content.
honli: PASS

[ ]: Package contains desktop file if it is a GUI application.
honli: PASS. No desktop file.

[ ]: Development files must be in a -devel package
honli: PASS. No development files.

[ ]: Package uses nothing in %doc for runtime.
honli: PASS.

[ ]: Package consistently uses macros (instead of hard-coded directory
     names).
honli: PASS

[ ]: Package is named according to the Package Naming Guidelines.
honli: PASS

[ ]: Package does not generate any conflict.
honli: PASS

[ ]: Package obeys FHS, except libexecdir and /usr/target.
honli: PASS

[ ]: If the package is a rename of another package, proper Obsoletes and
     Provides are present.
honli: PASS

[ ]: Requires correct, justified where necessary.
honli: PASS

[ ]: Spec file is legible and written in American English.
honli: PASS

[ ]: Package contains systemd file(s) if in need.
honli: PASS

[ ]: Useful -debuginfo package or justification otherwise.
honli: PASS

[ ]: Package is not known to require an ExcludeArch tag.
honli: PASS

[ ]: Large documentation must go in a -doc subpackage. Large could be size
     (~1MB) or number of files.
     Note: Documentation size is 10240 bytes in 1 files.
honli: PASS

[ ]: Package complies to the Packaging Guidelines
honli: PASS

[x]: Package successfully compiles and builds into binary rpms on at least
     one supported primary architecture.
[x]: Package installs properly.
[x]: Rpmlint is run on all rpms the build produces.
     Note: No rpmlint messages.
[x]: If (and only if) the source package includes the text of the
     license(s) in its own file, then that file, containing the text of the
     license(s) for the package is included in %license.
[x]: Package requires other packages for directories it uses.
[x]: Package does not own files or directories owned by other packages.
[x]: Package uses either %{buildroot} or $RPM_BUILD_ROOT
[x]: Package does not run rm -rf %{buildroot} (or $RPM_BUILD_ROOT) at the
     beginning of %install.
[x]: Macros in Summary, %description expandable at SRPM build time.
[x]: Dist tag is present.
[x]: Package does not contain duplicates in %files.
[x]: Permissions on files are set properly.
[x]: Package must not depend on deprecated() packages.
[x]: Package use %makeinstall only when make install DESTDIR=... doesn't
     work.
[x]: Package is named using only allowed ASCII characters.
[x]: Package does not use a name that already exists.
[x]: Package is not relocatable.
[x]: Sources used to build the package match the upstream source, as
     provided in the spec URL.
[x]: Spec file name must match the spec package %{name}, in the format
     %{name}.spec.
[x]: systemd_post is invoked in %post, systemd_preun in %preun, and
     systemd_postun in %postun for Systemd service files.
     Note: Systemd service file(s) in rshim
[x]: File names are valid UTF-8.
[x]: Packages must not store files under /srv, /opt or /usr/local

===== SHOULD items =====

Generic:
[!]: Uses parallel make %{?_smp_mflags} macro.
honli: PASS. Because the source tarball just has a few C files. That means
it can be compiled in about one minute. It is OK for this package to ignore
parallel compilation.

[ ]: If the source package does not include license text(s) as a separate
     file from upstream, the packager SHOULD query upstream to include it.
honli: PASS. It is has a separate license file.

[ ]: Final provides and requires are sane (see attachments).
honli: PASS.

[ ]: Package functions as described.
honli: PASS. We could not get the hardware in time, so we will ask Mellanox to
test this. After this package get into RHEL, Redhat QE will test it for RHEL.

[ ]: Latest version is packaged.
honli: PASS

[ ]: Package does not include license text files separate from upstream.
honli: PASS

[ ]: Sources are verified with gpgverify first in %prep if upstream
     publishes signatures.
     Note: gpgverify is not used.
honli: PASS

[ ]: Description and summary sections in the package spec file contains
     translations for supported Non-English languages, if available.
honli: PASS

[ ]: Package should compile and build into binary rpms on all supported
     architectures.
honli: PASS

[ ]: %check is present and all tests pass.
honli: PASS

[ ]: Packages should try to preserve timestamps of original installed
     files.
honli: PASS

[x]: Reviewer should test that the package builds in mock.
[x]: Buildroot is not present
[x]: Package has no %clean section with rm -rf %{buildroot} (or
     $RPM_BUILD_ROOT)
[x]: No file requires outside of /etc, /bin, /sbin, /usr/bin, /usr/sbin.
[x]: Fully versioned dependency in subpackages if applicable.
[x]: Packager, Vendor, PreReq, Copyright tags should not be in spec file
[x]: Sources can be downloaded from URI in Source: tag
[x]: SourceX is a working URL.
[x]: Spec use %global instead of %define unless justified.

===== EXTRA items =====

Generic:
[x]: Rpmlint is run on debuginfo package(s).
     Note: No rpmlint messages.
[x]: Rpmlint is run on all installed packages.
     Note: There are rpmlint messages (see attachment).
[x]: Large data in /usr/share should live in a noarch subpackage if package
     is arched.
[x]: Package should not use obsolete m4 macros
[x]: Spec file according to URL is the same as in SRPM.


Rpmlint
-------
Checking: rshim-2.0.1-1.fc33.x86_64.rpm
          rshim-debuginfo-2.0.1-1.fc33.x86_64.rpm
          rshim-debugsource-2.0.1-1.fc33.x86_64.rpm
          rshim-2.0.1-1.fc33.src.rpm
4 packages and 0 specfiles checked; 0 errors, 0 warnings.




Rpmlint (debuginfo)
-------------------
Checking: rshim-debuginfo-2.0.1-1.fc33.x86_64.rpm
1 packages and 0 specfiles checked; 0 errors, 0 warnings.





Rpmlint (installed packages)
----------------------------
rshim-debugsource.x86_64: W: invalid-url URL: https://github.com/mellanox/rshim-user-space <urlopen error [Errno -2] Name or service not known>
rshim-debuginfo.x86_64: W: invalid-url URL: https://github.com/mellanox/rshim-user-space <urlopen error [Errno -2] Name or service not known>
rshim.x86_64: W: invalid-url URL: https://github.com/mellanox/rshim-user-space <urlopen error [Errno -2] Name or service not known>
3 packages and 0 specfiles checked; 0 errors, 3 warnings.
honli: PASS. I'm pretty sure those URL are valid. It is a fedora-review
tool issue for those false positive.


Source checksums
----------------
https://github.com/Mellanox/rshim-user-space/releases/download/rshim-2.0.1/rshim-2.0.1.tar.gz :
  CHECKSUM(SHA256) this package     : 57c9055df902d55ad990d3209e0c190f9c4d9087726bad31afd457bd95bcba80
  CHECKSUM(SHA256) upstream package : 57c9055df902d55ad990d3209e0c190f9c4d9087726bad31afd457bd95bcba80


Requires
--------
rshim (rpmlib, GLIBC filtered):
    /bin/sh
    libc.so.6()(64bit)
    libfuse.so.2()(64bit)
    libfuse.so.2(FUSE_2.4)(64bit)
    libfuse.so.2(FUSE_2.5)(64bit)
    libfuse.so.2(FUSE_2.8)(64bit)
    libpci.so.3()(64bit)
    libpci.so.3(LIBPCI_3.0)(64bit)
    libpci.so.3(LIBPCI_3.5)(64bit)
    libpthread.so.0()(64bit)
    libusb-1.0.so.0()(64bit)
    rtld(GNU_HASH)

rshim-debuginfo (rpmlib, GLIBC filtered):

rshim-debugsource (rpmlib, GLIBC filtered):



Provides
--------
rshim:
    rshim
    rshim(x86-64)

rshim-debuginfo:
    debuginfo(build-id)
    rshim-debuginfo
    rshim-debuginfo(x86-64)

rshim-debugsource:
    rshim-debugsource
    rshim-debugsource(x86-64)



Generated by fedora-review 0.7.5 (5fa5b7e) last change: 2020-02-16
Command line :/usr/bin/fedora-review -b 1814682
Buildroot used: fedora-rawhide-x86_64
Active plugins: Generic, Shell-api, C/C++
Disabled plugins: Haskell, Ruby, PHP, Ocaml, Java, R, SugarActivity, Perl, fonts, Python
Disabled flags: EPEL6, EPEL7, DISTTAG, BATCH, EXARCH

Comment 31 Michal Schmidt 2020-04-08 22:34:02 UTC
Thank you Honggang for the detailed review results.
I disagree with the overall PASSED result, though the package is almost there.


Liming,

please fix this item flagged by fedora-review:
> [!]: Uses parallel make %{?_smp_mflags} macro.
The recommended fix is to use the "%make_build" macro.

You still have this in the %install section:
> %__spec_install_post
Please remove it.

About these BRs:
> BuildRequires: systemd-units
> BuildRequires: systemd-rpm-macros
The first one seems unnecessary.
systemd-rpm-macros should be sufficient.

Comment 32 limings 2020-04-09 13:04:10 UTC
(In reply to Michal Schmidt from comment #31)
> Thank you Honggang for the detailed review results.
> I disagree with the overall PASSED result, though the package is almost
> there.
> 
> 
> Liming,
> 
> please fix this item flagged by fedora-review:
> > [!]: Uses parallel make %{?_smp_mflags} macro.
> The recommended fix is to use the "%make_build" macro.
> 
> You still have this in the %install section:
> > %__spec_install_post
> Please remove it.
> 
> About these BRs:
> > BuildRequires: systemd-units
> > BuildRequires: systemd-rpm-macros
> The first one seems unnecessary.
> systemd-rpm-macros should be sufficient.

Thanks Michal! I'll use %make_build and remove the %__spec_install_post in next post.

As for the "systemd-units", it appears to be required by koji. I got some failures like below once removed it. So I'll leave it for now (please correct me if I am incorrect).
https://koji.fedoraproject.org/koji/taskinfo?taskID=43163313

Comment 33 limings 2020-04-09 13:20:33 UTC
Thanks all for the comments! 
Below are the updated info (2.0.2) trying to solve the comments. I kept the "BuildRequires: systemd-units" in order to pass koji build.

Spec URL: https://github.com/Mellanox/rshim-user-space/releases/download/rshim-2.0.2/rshim.spec
SRPM URL: https://github.com/Mellanox/rshim-user-space/releases/download/rshim-2.0.2/rshim-2.0.2-1.fc31.src.rpm
koji build URL: https://koji.fedoraproject.org/koji/taskinfo?taskID=43163450

Comment 34 Honggang LI 2020-04-10 12:19:48 UTC
Hi, Alaa

 I tried to test rshim user space driver with machine qualcomm-amberwing-rep2-01 .
But I can't get the /dev/rshim* file. Could you please have a look?

thanks

Comment 35 Alaa Hleihel (NVIDIA Mellanox) 2020-04-12 06:49:47 UTC
(In reply to Honggang LI from comment #34)
> Hi, Alaa
> 
>  I tried to test rshim user space driver with machine
> qualcomm-amberwing-rep2-01 .
> But I can't get the /dev/rshim* file. Could you please have a look?
> 
> thanks

Hi Honggang LI,

Sure, I will log in to the system and check why it's not working.

Comment 36 Alaa Hleihel (NVIDIA Mellanox) 2020-04-12 10:58:36 UTC
Hi,

I logged in to the system and found the following issues:

################################################################

1. rshim service start fails:

Apr 12 02:39:06 qualcomm-amberwing-rep2-01.khw4.lab.eng.bos.redhat.com rshim[4799]: Probing pcie-01:00.2
Apr 12 02:39:06 qualcomm-amberwing-rep2-01.khw4.lab.eng.bos.redhat.com rshim[4799]: create rshim pcie-01:00.2
Apr 12 02:39:06 qualcomm-amberwing-rep2-01.khw4.lab.eng.bos.redhat.com rshim[4799]: Failed to map RShim registers

[root@qualcomm-amberwing-rep2-01 ~]# rshim -f
modprobe: FATAL: Module cuse not found in directory /lib/modules/4.18.0-147.el8.aarch64
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Probing pcie-01:00.2
create rshim pcie-01:00.2
Failed to map RShim registers


The reason that a required module is not installed on the system:
[root@qualcomm-amberwing-rep2-01 ~]# modinfo cuse
modinfo: ERROR: Module cuse not found.


The fix is: 
# dnf install -y kernel-modules-extra

Then the module will be available:
[root@qualcomm-amberwing-rep2-01 ~]# modinfo cuse
filename:       /lib/modules/4.18.0-147.el8.aarch64/kernel/fs/fuse/cuse.ko.xz


################################################################

2. rshim service stop fails:
Apr 12 02:35:57 qualcomm-amberwing-rep2-01.khw4.lab.eng.bos.redhat.com systemd[1]: Stopping rshim driver for BlueField SoC...
Apr 12 02:35:57 qualcomm-amberwing-rep2-01.khw4.lab.eng.bos.redhat.com systemd[4383]: rshim.service: Failed to execute command: No such file or directory
Apr 12 02:35:57 qualcomm-amberwing-rep2-01.khw4.lab.eng.bos.redhat.com systemd[4383]: rshim.service: Failed at step EXEC spawning /usr/bin/killall: No such file or directory
                                                                                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Apr 12 02:35:57 qualcomm-amberwing-rep2-01.khw4.lab.eng.bos.redhat.com systemd[1]: rshim.service: Control process exited, code=exited status=203
Apr 12 02:36:55 qualcomm-amberwing-rep2-01.khw4.lab.eng.bos.redhat.com sshd[4384]: Connection closed by 10.35.206.44 port 60160 [preauth]
Apr 12 02:36:59 qualcomm-amberwing-rep2-01.khw4.lab.eng.bos.redhat.com sshd[4386]: Accepted password for root from 10.35.206.44 port 60162 ssh2
Apr 12 02:36:59 qualcomm-amberwing-rep2-01.khw4.lab.eng.bos.redhat.com systemd-logind[1469]: New session 5 of user root.
Apr 12 02:36:59 qualcomm-amberwing-rep2-01.khw4.lab.eng.bos.redhat.com systemd[1]: Started Session 5 of user root.
Apr 12 02:36:59 qualcomm-amberwing-rep2-01.khw4.lab.eng.bos.redhat.com sshd[4386]: pam_unix(sshd:session): session opened for user root by (uid=0)
Apr 12 02:37:27 qualcomm-amberwing-rep2-01.khw4.lab.eng.bos.redhat.com systemd[1]: rshim.service: State 'stop-sigterm' timed out. Killing.
Apr 12 02:37:27 qualcomm-amberwing-rep2-01.khw4.lab.eng.bos.redhat.com systemd[1]: rshim.service: Killing process 4363 (rshim) with signal SIGKILL.
Apr 12 02:37:27 qualcomm-amberwing-rep2-01.khw4.lab.eng.bos.redhat.com systemd[1]: rshim.service: Failed with result 'exit-code'.
Apr 12 02:37:27 qualcomm-amberwing-rep2-01.khw4.lab.eng.bos.redhat.com systemd[1]: Stopped rshim driver for BlueField SoC.


The fix is: 
# dnf install -y psmisc

################################################################

3. Even after fixing the above, we still fail to load everything:

[root@qualcomm-amberwing-rep2-01 ~]# rshim  -f
Probing pcie-01:00.2
create rshim pcie-01:00.2
Failed to map RShim registers


From strace on "rshim -f":

write(1, "Probing pcie-01:00.2\n", 21Probing pcie-01:00.2
)  = 21
write(1, "create rshim pcie-01:00.2\n", 26create rshim pcie-01:00.2
) = 26
openat(AT_FDCWD, "/dev/mem", O_RDWR|O_SYNC) = -1 ENOENT (No such file or directory)
                 ^^^^^^^^^^                   ^^^^^^^^^^^^^^
mmap(NULL, 1048576, PROT_READ|PROT_WRITE, MAP_SHARED, -1, 0x80100300000) = -1 EBADF (Bad file descriptor)
write(1, "Failed to map RShim registers\n", 30Failed to map RShim registers
) = 30

That's because CONFIG_DEVMEM is not enabled in the kernel:

[root@qualcomm-amberwing-rep2-01 ~]# grep CONFIG_DEVMEM /boot/config-4.18.0-147.el8.aarch64 
# CONFIG_DEVMEM is not set

--> Note; I see that this config is disabled only on aarch64 in RHEL-8.
I created a kernel with this config enabled, and then it worked.

[root@qualcomm-amberwing-rep2-01 ~]# ls -l /dev/mem
crw-r-----. 1 root kmem 1, 1 Apr 12  2020 /dev/mem
[root@qualcomm-amberwing-rep2-01 ~]# systemctl start rshim
[root@qualcomm-amberwing-rep2-01 ~]# systemctl status rshim
● rshim.service - rshim driver for BlueField SoC
   Loaded: loaded (/usr/lib/systemd/system/rshim.service; disabled; vendor preset: disabled)
   Active: active (running) since Sun 2020-04-12 05:36:57 EDT; 4s ago
     Docs: man:rshim(8)
  Process: 5783 ExecStart=/usr/sbin/rshim $OPTIONS (code=exited, status=0/SUCCESS)
 Main PID: 5784 (rshim)
    Tasks: 6 (limit: 37682)
   Memory: 32.5M
   CGroup: /system.slice/rshim.service
           └─5784 /usr/sbin/rshim

Apr 12 05:36:57 qualcomm-amberwing-rep2-01.khw4.lab.eng.bos.redhat.com systemd[1]: Starting rshim driver for BlueField SoC...
Apr 12 05:36:57 qualcomm-amberwing-rep2-01.khw4.lab.eng.bos.redhat.com systemd[1]: Started rshim driver for BlueField SoC.
Apr 12 05:36:57 qualcomm-amberwing-rep2-01.khw4.lab.eng.bos.redhat.com rshim[5784]: Probing pcie-01:00.2
Apr 12 05:36:57 qualcomm-amberwing-rep2-01.khw4.lab.eng.bos.redhat.com rshim[5784]: create rshim pcie-01:00.2
Apr 12 05:36:58 qualcomm-amberwing-rep2-01.khw4.lab.eng.bos.redhat.com rshim[5784]: rshim0 attached
[root@qualcomm-amberwing-rep2-01 ~]# ls -l /dev/rshim*
total 0
crw-------. 1 root root 241, 0 Apr 12 05:36 boot
crw-------. 1 root root 240, 0 Apr 12 05:36 console
crw-------. 1 root root 239, 0 Apr 12 05:36 misc
crw-------. 1 root root 242, 0 Apr 12 05:36 rshim
[root@qualcomm-amberwing-rep2-01 ~]# 


################################################################

4. Even after fixing all previous issues, accessing the device always hangs.
E.g either of these will hang:
# cat  /dev/rshim0/misc
# sudo minicom --color on --baudrate 115200 --device /dev/rshim0/console

And dmesg will show something like this:

Apr 12 05:46:41 qualcomm-amberwing-rep2-01 kernel: INFO: task cat:6591 blocked for more than 60 seconds.
Apr 12 05:46:41 qualcomm-amberwing-rep2-01 kernel:      Not tainted 4.18.0 #1
Apr 12 05:46:41 qualcomm-amberwing-rep2-01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 12 05:46:41 qualcomm-amberwing-rep2-01 kernel: cat             D    0  6591   6316 0x00000201
Apr 12 05:46:41 qualcomm-amberwing-rep2-01 kernel: Call trace:
Apr 12 05:46:41 qualcomm-amberwing-rep2-01 kernel: __switch_to+0x6c/0x90
Apr 12 05:46:41 qualcomm-amberwing-rep2-01 kernel: __schedule+0x270/0x8a8
Apr 12 05:46:41 qualcomm-amberwing-rep2-01 kernel: schedule+0x30/0x78
Apr 12 05:46:41 qualcomm-amberwing-rep2-01 kernel: request_wait_answer+0x144/0x260 [fuse]
Apr 12 05:46:41 qualcomm-amberwing-rep2-01 kernel: __fuse_request_send+0xac/0xd0 [fuse]
Apr 12 05:46:41 qualcomm-amberwing-rep2-01 kernel: fuse_request_send+0x58/0x68 [fuse]
Apr 12 05:46:41 qualcomm-amberwing-rep2-01 kernel: fuse_direct_io+0x358/0x5a0 [fuse]
Apr 12 05:46:41 qualcomm-amberwing-rep2-01 kernel: cuse_read_iter+0x78/0xa0 [cuse]
Apr 12 05:46:41 qualcomm-amberwing-rep2-01 kernel: new_sync_read+0x108/0x158
Apr 12 05:46:41 qualcomm-amberwing-rep2-01 kernel: __vfs_read+0x74/0x90
Apr 12 05:46:41 qualcomm-amberwing-rep2-01 kernel: vfs_read+0x98/0x150
Apr 12 05:46:41 qualcomm-amberwing-rep2-01 kernel: ksys_read+0x6c/0xd0
Apr 12 05:46:41 qualcomm-amberwing-rep2-01 kernel: __arm64_sys_read+0x24/0x30
Apr 12 05:46:41 qualcomm-amberwing-rep2-01 kernel: el0_svc_handler+0xa0/0x128
Apr 12 05:46:41 qualcomm-amberwing-rep2-01 kernel: el0_svc+0x8/0xc



Current BlueField version used was:

Mellanox BlueField A0 BL1 V1.0
NOTICE:  BL2: v1.5(release):BL2.2
NOTICE:  BL2: Built : 15:58:07, Jul 25 2019
NOTICE:  BL2 built for hw (ver 0)
NOTICE:  Running as MBF1M332A-AS system
NOTICE:  Initializing DDR at mss[0]=0x18000000
NOTICE:  No SPD detected on MSS0 DIMM0
NOTICE:  No SPD detected on MSS0 DIMM1
NOTICE:  No memory present on MSS 0
NOTICE:  Doing MSS idle operations on MSS 0
NOTICE:  Initializing DDR at mss[1]=0x20000000
NOTICE:  No SPD detected on MSS1 DIMM0
NOTICE:  No SPD detected on MSS1 DIMM1
NOTICE:  Finished initializing DDR on MSS 1!
NOTICE:  DDR POST passed.
NOTICE:  BL1: Booting BL31
NOTICE:  BL31: v1.5(release):BL2.2
NOTICE:  BL31: Built : 15:58:07, Jul 25 2019
NOTICE:  BL31 built for hw (ver 0)
UEFI firmware (version BlueField:2.0-f399628 built at 15:59:48 on Jul 25 2019)


I've updated it to the latest BlueField-2.5.1.11213 (using kernel module rshim version rshim-1.18-0.gb99e894_4.18.0.aarch64 from BlueField-2.5.1.11213 bundle):

Mellanox BlueField A0 BL1 V1.0
NOTICE:  Enabled watchdog (120 sec delay)
NOTICE:  Next boot will be in swap_emmc mode
NOTICE:  BL2: v1.5(release):2.5.1-0-gbe0dd6b
NOTICE:  BL2: Built : 23:42:29, Apr  2 2020
NOTICE:  BL2 built for hw (ver 0)
NOTICE:  Running as MBF1M332A-AS system
NOTICE:  Initializing DDR at mss[0]=0x18000000
NOTICE:  No SPD detected on MSS0 DIMM0
NOTICE:  No SPD detected on MSS0 DIMM1
NOTICE:  No memory present on MSS 0
NOTICE:  Doing MSS idle operations on MSS 0
NOTICE:  Initializing DDR at mss[1]=0x20000000
NOTICE:  No SPD detected on MSS1 DIMM0
NOTICE:  No SPD detected on MSS1 DIMM1
NOTICE:  Finished initializing DDR on MSS 1!
NOTICE:  DDR POST passed.
NOTICE:  BL1: Booting BL31
NOTICE:  BL31: v1.5(release):2.5.1-0-gbe0dd6b
NOTICE:  BL31: Built : 23:42:29, Apr  2 2020
NOTICE:  BL31 built for hw (ver 0)
UEFI firmware (version BlueField:2.5.1-0-ga9be8ec built at 23:43:44 on Apr  2 2020)


But it still hangs, will continue checking...

Comment 37 Alaa Hleihel (NVIDIA Mellanox) 2020-04-12 11:00:41 UTC
So to recap my previous long comment:

1. I think the rshim RPM package should Require the package which will provide the 'cuse' kernel module, for RHEL-8 that's "kernel-modules-extra".
What about Fedora?

2. The rshim RPM package should Require "psmisc" package since it uses 'killall' utility.

3. Red Hat needs to enable CONFIG_DEVMEM on aarch64 builds as well (it's the only arch that has this config disabled).

4. Liming, are you familiar with such an issue where accessing the device hangs (this does not happen when using the kernel module rshim)?

Comment 38 Alaa Hleihel (NVIDIA Mellanox) 2020-04-12 11:21:09 UTC
For issue #4, I see that there are upstream fixes in the "fuse" driver that fixes a deadlock issue.
Those fixes were already backported to RHEL-8.2, will update my kernel and retest.

Comment 39 Honggang LI 2020-04-12 11:43:56 UTC
(In reply to Alaa Hleihel (Mellanox) from comment #37)
> So to recap my previous long comment:
> 
> 1. I think the rshim RPM package should Require the package which will
> provide the 'cuse' kernel module, for RHEL-8 that's "kernel-modules-extra".
> What about Fedora?

Same as RHEL-8.

localhost ~]$ find  /lib/modules/ -name '*cuse*'
/lib/modules/5.7.0-0.rc0.git8.1.fc33.x86_64/extra/fs/fuse/cuse.ko.xz

localhost ~]$ rpm -qf /lib/modules/5.7.0-0.rc0.git8.1.fc33.x86_64/extra/fs/fuse/cuse.ko.xz
kernel-modules-extra-5.7.0-0.rc0.git8.1.fc33.x86_64

Comment 40 Alaa Hleihel (NVIDIA Mellanox) 2020-04-12 12:09:03 UTC
(In reply to Alaa Hleihel (Mellanox) from comment #38)
> For issue #4, I see that there are upstream fixes in the "fuse" driver that
> fixes a deadlock issue.
> Those fixes were already backported to RHEL-8.2, will update my kernel and
> retest.

didn't help..
with kernel-debug, from the  first terminal I ran "# cat /dev/rshim0/misc", and at the same time from another terminal I ran "# sudo minicom --color on --baudrate 115200 --device /dev/rshim0/console",
then got:
(I lowered hung_task_timeout_secs threshold to 10 seconds):

[  812.368606] INFO: task cat:5758 blocked for more than 10 seconds.
[  812.373943]       Not tainted 4.18.0-193.el8.aarch64+debug #1
[  812.379721] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  812.387436] cat             D26640  5758   5687 0x00000209
[  812.392996] Call trace:
[  812.395347]  __switch_to+0x1a0/0x258
[  812.398983]  __schedule+0x918/0x2120
[  812.402455]  schedule+0xf4/0x3a8
[  812.405722]  request_wait_answer+0x29c/0x530 [fuse]
[  812.410669]  fuse_simple_request+0x408/0x970 [fuse]
[  812.415440]  fuse_direct_io+0xd88/0x18c8 [fuse]
[  812.419997]  cuse_read_iter+0xdc/0x110 [cuse]
[  812.424249]  new_sync_read+0x358/0x4b0
[  812.427976]  __vfs_read+0xc4/0xf8
[  812.431358]  vfs_read+0xe0/0x290
[  812.434486]  ksys_read+0xcc/0x178
[  812.437786]  __arm64_sys_read+0x70/0xa0
[  812.441704]  el0_svc_handler+0x160/0x388
[  812.445511]  el0_svc+0x8/0xc
[  812.448459] 
               Showing all locks held in the system:
[  812.454560] 1 lock held by khungtaskd/292:
[  812.458709]  #0: ffff2000139abda0 (rcu_read_lock){....}, at: debug_show_all_locks+0xd8/0x358
[  812.467068] 2 locks held by agetty/1645:
[  812.471036]  #0: ffff80143f62aa90 (&tty->ldisc_sem){++++}, at: ldsem_down_read+0x48/0x58
[  812.479096]  #1: ffff20002e6722e0 (&ldata->atomic_read_lock){+.+.}, at: n_tty_read+0x1ac/0x13f8

[  812.489254] =============================================

Comment 41 limings 2020-04-12 13:22:43 UTC
(In reply to Alaa Hleihel (Mellanox) from comment #37)
> So to recap my previous long comment:
> 
> 1. I think the rshim RPM package should Require the package which will
> provide the 'cuse' kernel module, for RHEL-8 that's "kernel-modules-extra".
> What about Fedora?
> 
> 2. The rshim RPM package should Require "psmisc" package since it uses
> 'killall' utility.
> 
> 3. Red Hat needs to enable CONFIG_DEVMEM on aarch64 builds as well (it's the
> only arch that has this config disabled).
> 
> 4. Liming, are you familiar with such an issue where accessing the device
> hangs (this does not happen when using the kernel module rshim)?

Thanks Alaa! So far we have verified on CentOS-7, RedHat-7 and ubuntu-18 on x86 machine.
For #1 & #2. Yes, I'll add the dependency. This module is part of the kernel package in CentOS/RedHat 7.
Looks like it has separate package in Fedora 31 and Redhat-8.

Fro #4, the hung issue is new to me. I'll install centos 8 today and try to reproduce it. Thanks!

Comment 42 limings 2020-04-12 16:39:01 UTC
Alaa, 

Looks like this server is running kernel-modules-extra-5.7.0. What OS is it? Is it Fedora-33? Is there a way I could download it and verify it?
I just tried this driver on Fedora 31 and CentOS 8.1. All worked well for me.

Comment 43 limings 2020-04-12 17:49:49 UTC
Alaa, I downloaded Fedora-33 (Rawhide) x86_64 version and tried it like below. It worked well for me. Below is what I captured. The difference is that it's the x86 version. (I don't have setup to try f33 aarch64 version yet).

On the server where the problem exists, have we tried whether the kernel module based driver works or not? If not working, the SmartNIC FW might be stuck. We might need power-cycle to recover. Thanks!

========================================

# uname -r
5.7.0-0.rc0.git8.1.fc33.x86_64

# cat /etc/redhat-release
Fedora release 33 (Rawhide)

# cat /dev/rshim0/misc
DISPLAY_LEVEL   0 (0:basic, 1:advanced, 2:log)
BOOT_MODE       1 (0:rshim, 1:emmc, 2:emmc-boot-swap)
BOOT_TIMEOUT    100 (seconds)
SW_RESET        0 (1: reset)
DEV_NAME        pcie-00:08.0

========================================

Comment 44 Alaa Hleihel (NVIDIA Mellanox) 2020-04-13 06:17:57 UTC
Hi Liming,

No, I am not using Fedora, I am using RHEL-8.1 (kernel-modules-extra-4.18.0-147.el8.aarch64).

> kernel-modules-extra-5.7.0-0.rc0.git8.1.fc33.x86_64

This output was from Honggang, he just was checking the location of the cuse driver.



> On the server where the problem exists, have we tried whether the kernel module based driver works or not? If not working, the SmartNIC FW might be stuck. We might need power-cycle to recover. Thanks!

Yes, I did, and the kernel rshim driver worked fine on the same system.
Reboot and power-cycle didn't help when using the user-space rshim.

Note that I have an x86_64 system at Mellanox lab with RHEL-8.1 and everything worked well.
For now, I see that the issue happens only on this aarch64 system (I don't have another arm system to try).

Also, I've tried building upstream kernel v5.6 using the RHEL-8.1 .config file (+ enabling CONFIG_DEVMEM) and I got the same issue.

Comment 45 Alaa Hleihel (NVIDIA Mellanox) 2020-04-14 15:24:05 UTC
Update: We found the issue, the device was not enabled, so we couldn't read from it...

[root@qualcomm-amberwing-rep2-01 pcie_read_test]# lspci -vvvv -d :c2d2
0000:01:00.2 DMA controller: Mellanox Technologies MT416842 BlueField SoC management interfac (prog-if 00 [8237])
Subsystem: Mellanox Technologies Device 0082
Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- 
Region 0: Memory at 80100300000 (64-bit, non-prefetchable) [disabled] [size=1M]
                                                          ^^^^^^^^^^^^
Capabilities: [60] Express (v2) Endpoint, MSI 00
DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
...
...



Liming will provide a new version soon with the needed fixes.

Comment 47 Michal Schmidt 2020-04-20 11:35:20 UTC
(In reply to lsun from comment #32)
> As for the "systemd-units", it appears to be required by koji. I got some
> failures like below once removed it. So I'll leave it for now (please
> correct me if I am incorrect).
> https://koji.fedoraproject.org/koji/taskinfo?taskID=43163313

OK, in that case leaving a BuildRequires there is acceptable.
Note that the "systemd-units" package was merged into the main "systemd" package in 2012 (before Fedora 18). Fedora packaging guidelines removed the last reference to "systemd-units" in 2016, keeping "BuildRequires: systemd" as the preferred way.

There is another option. You can remove the BR and instead tell your configure script to not autodetect the systemd units directory, using:
  %configure --with-systemdsystemunitdir=%{_unitdir}
The minor advantage of this would be a smaller buildroot.

Use whatever option you prefer there.

(In reply to lsun from comment #46)
> Spec URL: https://github.com/Mellanox/rshim-user-space/releases/download/rshim-2.0.3/rshim.spec

> Requires: psmisc

Why does the rshim.service use killall in the first place? There is:
> KillMode=process

Why this mode? Do you need child processes to be left running in the cgroup after the service is stopped?
If yes, commenting on it in the unit file would be nice.

> ExecStop=/usr/bin/killall -SIGKILL rshim

Referencing processes to kill by name is not good. It would kill unrelated processes with the same name.
Is none of systemd's kill modes suitable for stopping the service without additional help?
And why SIGKILL? Does it not stop on SIGTERM?

> Requires: kernel-modules-extra

This is always going to be imperfect, because nothing guarantees that the installed package corresponds to the actually running kernel (different versions, variants like -debug, custom unpackaged kernels, ...).
There is precedent for depending on kernel-modules-extra in Fedora packages (usbip, xl2tpd, ...) though.
I would just make it more explicit which module the package needs by instead using this:

Requires: kmod(cuse.ko)
# Hint for dnf to prefer kernel-modules-extra over kernel-debug-modules-extra:
Suggests: kernel-modules-extra


BTW, is it necessary to call 'system("modprobe cuse");' in src/rshim.c? I would expect the module to get autoloaded during the call to cuse_lowlevel_setup.

Comment 48 Alaa Hleihel (NVIDIA Mellanox) 2020-04-20 12:07:58 UTC
(In reply to Alaa Hleihel (Mellanox) from comment #37)

> 3. Red Hat needs to enable CONFIG_DEVMEM on aarch64 builds as well (it's the
> only arch that has this config disabled).
> 

Update: enabling CONFIG_DEVMEM is no longer required.
rshim-2.0.3 uses the device resources directly instead of /dev/mem

Comment 49 limings 2020-04-22 12:05:23 UTC
Thanks Michal for the comments. Please see my response below.

(In reply to Michal Schmidt from comment #47)
> (In reply to lsun from comment #32)
> > As for the "systemd-units", it appears to be required by koji. I got some
> > failures like below once removed it. So I'll leave it for now (please
> > correct me if I am incorrect).
> > https://koji.fedoraproject.org/koji/taskinfo?taskID=43163313
> 
> OK, in that case leaving a BuildRequires there is acceptable.
> Note that the "systemd-units" package was merged into the main "systemd"
> package in 2012 (before Fedora 18). Fedora packaging guidelines removed the
> last reference to "systemd-units" in 2016, keeping "BuildRequires: systemd"
> as the preferred way.
> 
> There is another option. You can remove the BR and instead tell your
> configure script to not autodetect the systemd units directory, using:
>   %configure --with-systemdsystemunitdir=%{_unitdir}
> The minor advantage of this would be a smaller buildroot.
> 
> Use whatever option you prefer there.

Will update in next post and use "BuildRequires: systemd" as suggested.


> 
> (In reply to lsun from comment #46)
> > Spec URL: https://github.com/Mellanox/rshim-user-space/releases/download/rshim-2.0.3/rshim.spec
> 
> > Requires: psmisc
> 
> Why does the rshim.service use killall in the first place? There is:
> > KillMode=process
> 
> Why this mode? Do you need child processes to be left running in the cgroup
> after the service is stopped?
> If yes, commenting on it in the unit file would be nice.

It's a copy/paste error. I would like to stop all processes when service stops.

> 
> > ExecStop=/usr/bin/killall -SIGKILL rshim
> 
> Referencing processes to kill by name is not good. It would kill unrelated
> processes with the same name.
> Is none of systemd's kill modes suitable for stopping the service without
> additional help?
> And why SIGKILL? Does it not stop on SIGTERM?

Will update in next posted version with the following:
- Remove the "Requires: psmisc" 
- Remove "ExecStop".
- Use "KillMode=control-group"
- Support SIGTERM

[Service]
Restart=always
Type=forking
ExecStart=-/usr/sbin/rshim $OPTIONS
KillMode=control-group


> 
> > Requires: kernel-modules-extra
> 
> This is always going to be imperfect, because nothing guarantees that the
> installed package corresponds to the actually running kernel (different
> versions, variants like -debug, custom unpackaged kernels, ...).
> There is precedent for depending on kernel-modules-extra in Fedora packages
> (usbip, xl2tpd, ...) though.
> I would just make it more explicit which module the package needs by instead
> using this:
> 
> Requires: kmod(cuse.ko)
> # Hint for dnf to prefer kernel-modules-extra over
> kernel-debug-modules-extra:
> Suggests: kernel-modules-extra

Will update in next posted version as suggested:

Requires: kmod(cuse.ko)
Suggests: kernel-modules-extra


> 
> 
> BTW, is it necessary to call 'system("modprobe cuse");' in src/rshim.c? I
> would expect the module to get autoloaded during the call to
> cuse_lowlevel_setup.

It seems not loading 'cuse' automatically. 
Below is what I got on CentOS-7 when calling cuse_lowlevel_setup() with cuse not loaded.
"cuse: device not found, try 'modprobe cuse' first".

Comment 51 Alaa Hleihel (NVIDIA Mellanox) 2020-04-23 06:05:27 UTC
Thanks a lot, Liming.
I tested rshim-2.0.4-1 and I see that everything is functioning well now using the default RHEL-8.1 kernel 4.18.0-147.el8.aarch64.

Hi Honggang,
You can take the setup back and continue your testing/review.


Regards,
Alaa


# ls  /dev/rshim0/
boot  console  misc  rshim

# cat /dev/rshim0/misc 
DISPLAY_LEVEL   0 (0:basic, 1:advanced, 2:log)
BOOT_MODE       1 (0:rshim, 1:emmc, 2:emmc-boot-swap)
BOOT_TIMEOUT    100 (seconds)
SW_RESET        0 (1: reset)
DEV_NAME        pcie-01:00.2


Access to the SoC console works well, I used this command:
# sudo minicom --color on --baudrate 115200 --device /dev/rshim0/console


File transfer using the tmfifo interface is also working well:
# ip -s addr show tmfifo_net0
123: tmfifo_net0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UNKNOWN group default qlen 1000
    link/ether 00:1a:ca:ff:ff:02 brd ff:ff:ff:ff:ff:ff
    inet 192.168.100.1/24 brd 192.168.100.255 scope global tmfifo_net0
       valid_lft forever preferred_lft forever
    inet6 fe80::21a:caff:feff:ff02/64 scope link 
       valid_lft forever preferred_lft forever
    RX: bytes  packets  errors  dropped overrun mcast   
    2362587    32877    0       0       0       0       
    TX: bytes  packets  errors  dropped carrier collsns 
    615240797  406494   0       56      0       0

Comment 52 Honggang LI 2020-04-29 02:58:38 UTC
(In reply to Alaa Hleihel (Mellanox) from comment #51)

> Hi Honggang,
> You can take the setup back and continue your testing/review.

I confirmed the network traffic between the aarch64 machine and SmartNIC works via ssh.
The rshim user space driver works for me.

Thanks

Comment 53 Honggang LI 2020-04-29 06:35:30 UTC
(In reply to lsun from comment #49)

> > Why does the rshim.service use killall in the first place? There is:
> > > KillMode=process
> > 
> > Why this mode? Do you need child processes to be left running in the cgroup
> > after the service is stopped?
> > If yes, commenting on it in the unit file would be nice.
> 
> It's a copy/paste error. I would like to stop all processes when service
> stops.
> 
> > 
> > > ExecStop=/usr/bin/killall -SIGKILL rshim
> > 
> > Referencing processes to kill by name is not good. It would kill unrelated
> > processes with the same name.
> > Is none of systemd's kill modes suitable for stopping the service without
> > additional help?
> > And why SIGKILL? Does it not stop on SIGTERM?
> 
> Will update in next posted version with the following:
> - Remove the "Requires: psmisc" 
> - Remove "ExecStop".
> - Use "KillMode=control-group"
> - Support SIGTERM
> 
> [Service]
> Restart=always
> Type=forking
> ExecStart=-/usr/sbin/rshim $OPTIONS
> KillMode=control-group

Confirmed rshim.service start/stop works with the new systemd service file.

Comment 54 Honggang LI 2020-04-29 06:47:04 UTC
I had run basic functional tests for rshim. Confirmed it works as expected and the
output of fedora-review tool looks good for me.

Set the 'fedora-review+' flag. Michal, please add lsun into packager group.

=====================================================================================

Package Review
==============

Legend:
[x] = Pass, [!] = Fail, [-] = Not applicable, [?] = Not evaluated
[ ] = Manual review needed



===== MUST items =====

C/C++:
[ ]: Package does not contain kernel modules.
PASS

[ ]: Package contains no static executables.
PASS

[x]: If your application is a C or C++ application you must list a
     BuildRequires against gcc, gcc-c++ or clang.
[x]: Header files in -devel subpackage, if present.
[x]: Package does not contain any libtool archives (.la)
[x]: Rpath absent or only used for internal libs.

Generic:
[ ]: Package is licensed with an open-source compatible license and meets
     other legal requirements as defined in the legal section of Packaging
     Guidelines.
PASS

[ ]: License field in the package spec file matches the actual license.
     Note: Checking patched sources after %prep for licenses. Licenses
     found: "*No copyright* GPL (v2)", "Unknown or generated", "GPL (v2)".
     23 files have unknown license. Detailed output of licensecheck in
     /home/honli/review/1814682-rshim/licensecheck.txt
PASS. It is a review tool issue, as those files have "SPDX-License-Identifier: GPL-2.0-only" tag.

[ ]: License file installed when any subpackage combination is installed.
PASS.

[ ]: Package must own all directories that it creates.
     Note: Directories without known owners: /usr/lib/systemd,
     /usr/lib/systemd/system
PASS

[ ]: %build honors applicable compiler flags or justifies otherwise.
PASS

[ ]: Package contains no bundled libraries without FPC exception.
PASS

[ ]: Changelog in prescribed format.
PASS

[ ]: Sources contain only permissible code or content.
PASS

[ ]: Package contains desktop file if it is a GUI application.
PASS. It is not a GUI application.

[ ]: Development files must be in a -devel package
PASS. no devel

[ ]: Package uses nothing in %doc for runtime.
PASS

[ ]: Package consistently uses macros (instead of hard-coded directory
     names).
PASS

[ ]: Package is named according to the Package Naming Guidelines.
PASS

[ ]: Package does not generate any conflict.
PASS

[ ]: Package obeys FHS, except libexecdir and /usr/target.
PASS

[ ]: If the package is a rename of another package, proper Obsoletes and
     Provides are present.
PASS

[ ]: Requires correct, justified where necessary.
PASS

[ ]: Spec file is legible and written in American English.
PASS

[ ]: Package contains systemd file(s) if in need.
PASS. start/stop rshim service confirmed the systemd service file works.

[ ]: Useful -debuginfo package or justification otherwise.
PASS

[ ]: Package is not known to require an ExcludeArch tag.
PASS

[ ]: Large documentation must go in a -doc subpackage. Large could be size
     (~1MB) or number of files.
     Note: Documentation size is 10240 bytes in 1 files.
PASS

[ ]: Package complies to the Packaging Guidelines
PASS

[x]: Package successfully compiles and builds into binary rpms on at least
     one supported primary architecture.
[x]: Package installs properly.
[x]: Rpmlint is run on all rpms the build produces.
     Note: No rpmlint messages.
[x]: If (and only if) the source package includes the text of the
     license(s) in its own file, then that file, containing the text of the
     license(s) for the package is included in %license.
[x]: Package requires other packages for directories it uses.
[x]: Package does not own files or directories owned by other packages.
[x]: Package uses either %{buildroot} or $RPM_BUILD_ROOT
[x]: Package does not run rm -rf %{buildroot} (or $RPM_BUILD_ROOT) at the
     beginning of %install.
[x]: Macros in Summary, %description expandable at SRPM build time.
[x]: Dist tag is present.
[x]: Package does not contain duplicates in %files.
[x]: Permissions on files are set properly.
[x]: Package must not depend on deprecated() packages.
[x]: Package use %makeinstall only when make install DESTDIR=... doesn't
     work.
[x]: Package is named using only allowed ASCII characters.
[x]: Package does not use a name that already exists.
[x]: Package is not relocatable.
[x]: Sources used to build the package match the upstream source, as
     provided in the spec URL.
[x]: Spec file name must match the spec package %{name}, in the format
     %{name}.spec.
[x]: systemd_post is invoked in %post, systemd_preun in %preun, and
     systemd_postun in %postun for Systemd service files.
     Note: Systemd service file(s) in rshim
[x]: File names are valid UTF-8.
[x]: Packages must not store files under /srv, /opt or /usr/local

===== SHOULD items =====

Generic:
[ ]: If the source package does not include license text(s) as a separate
     file from upstream, the packager SHOULD query upstream to include it.
PASS

[ ]: Final provides and requires are sane (see attachments).
PASS

[ ]: Package functions as described.
PASS. I had run functional tests. It works.

[ ]: Latest version is packaged.
PASS

[ ]: Package does not include license text files separate from upstream.
PASS

[ ]: Sources are verified with gpgverify first in %prep if upstream
     publishes signatures.
     Note: gpgverify is not used.
PASS. Upstream does not use gpgverify for rshim.

[ ]: Description and summary sections in the package spec file contains
     translations for supported Non-English languages, if available.
PASS

[ ]: Package should compile and build into binary rpms on all supported
     architectures.
PASS

[ ]: %check is present and all tests pass.
PASS, no %check

[ ]: Packages should try to preserve timestamps of original installed
     files.
PASS

[x]: Reviewer should test that the package builds in mock.
[x]: Buildroot is not present
[x]: Package has no %clean section with rm -rf %{buildroot} (or
     $RPM_BUILD_ROOT)
[x]: No file requires outside of /etc, /bin, /sbin, /usr/bin, /usr/sbin.
[x]: Fully versioned dependency in subpackages if applicable.
[x]: Packager, Vendor, PreReq, Copyright tags should not be in spec file
[x]: Sources can be downloaded from URI in Source: tag
[x]: SourceX is a working URL.
[x]: Spec use %global instead of %define unless justified.

===== EXTRA items =====

Generic:
[x]: Rpmlint is run on debuginfo package(s).
     Note: No rpmlint messages.
[x]: Rpmlint is run on all installed packages.
     Note: There are rpmlint messages (see attachment).
[x]: Large data in /usr/share should live in a noarch subpackage if package
     is arched.
[x]: Package should not use obsolete m4 macros
[x]: Spec file according to URL is the same as in SRPM.


Rpmlint
-------
Checking: rshim-2.0.4-1.fc33.x86_64.rpm
          rshim-debuginfo-2.0.4-1.fc33.x86_64.rpm
          rshim-debugsource-2.0.4-1.fc33.x86_64.rpm
          rshim-2.0.4-1.fc33.src.rpm
4 packages and 0 specfiles checked; 0 errors, 0 warnings.




Rpmlint (debuginfo)
-------------------
Checking: rshim-debuginfo-2.0.4-1.fc33.x86_64.rpm
1 packages and 0 specfiles checked; 0 errors, 0 warnings.





Rpmlint (installed packages)
----------------------------
rshim-debuginfo.x86_64: W: invalid-url URL: https://github.com/mellanox/rshim-user-space <urlopen error [Errno -2] Name or service not known>
rshim.x86_64: W: invalid-url URL: https://github.com/mellanox/rshim-user-space <urlopen error [Errno -2] Name or service not known>
rshim-debugsource.x86_64: W: invalid-url URL: https://github.com/mellanox/rshim-user-space <urlopen error [Errno -2] Name or service not known>
3 packages and 0 specfiles checked; 0 errors, 3 warnings.
PASS. It is a fedora-review tool issue. Those URLs are valid.



Source checksums
----------------
https://github.com/Mellanox/rshim-user-space/releases/download/rshim-2.0.4/rshim-2.0.4.tar.gz :
  CHECKSUM(SHA256) this package     : 48d0e0467a3d13286c810d2c50751bfbb58cd8e328b82ebe8e08264add47b052
  CHECKSUM(SHA256) upstream package : 48d0e0467a3d13286c810d2c50751bfbb58cd8e328b82ebe8e08264add47b052


Requires
--------
rshim (rpmlib, GLIBC filtered):
    /bin/sh
    kmod(cuse.ko)
    libc.so.6()(64bit)
    libfuse.so.2()(64bit)
    libfuse.so.2(FUSE_2.4)(64bit)
    libfuse.so.2(FUSE_2.5)(64bit)
    libfuse.so.2(FUSE_2.8)(64bit)
    libpci.so.3()(64bit)
    libpci.so.3(LIBPCI_3.0)(64bit)
    libpci.so.3(LIBPCI_3.5)(64bit)
    libpthread.so.0()(64bit)
    libusb-1.0.so.0()(64bit)
    rtld(GNU_HASH)

rshim-debuginfo (rpmlib, GLIBC filtered):

rshim-debugsource (rpmlib, GLIBC filtered):



Provides
--------
rshim:
    rshim
    rshim(x86-64)

rshim-debuginfo:
    debuginfo(build-id)
    rshim-debuginfo
    rshim-debuginfo(x86-64)

rshim-debugsource:
    rshim-debugsource
    rshim-debugsource(x86-64)



Generated by fedora-review 0.7.5 (5fa5b7e) last change: 2020-02-16
Command line :/usr/bin/fedora-review -b 1814682
Buildroot used: fedora-rawhide-x86_64
Active plugins: Shell-api, Generic, C/C++
Disabled plugins: Perl, SugarActivity, R, PHP, Ocaml, Python, Haskell, Java, fonts, Ruby
Disabled flags: EPEL6, EPEL7, DISTTAG, BATCH, EXARCH

Comment 55 limings 2020-05-04 19:51:44 UTC
Michal, thanks for adding me packager group! 

I tried to do "fedpkg request-repo ...", but got rejected. It complained that this bugzilla hasn't been approved yet.

Do you need to approve this one first? Or any further comments or advice what the next step would be? 

Thanks!

Comment 56 Michal Schmidt 2020-05-05 13:04:44 UTC
The "fedora-review" flag is already set to "+" and it should be the only flag that matters for this purpose.
Please try again, double-check that the Bugzilla number is correct. 
If it still does not work, try it with verbose output and paste here the output:
fedpkg -v request-repo rshim 1814682

Comment 57 limings 2020-05-06 02:04:46 UTC
This was the issue that was rejected.
https://pagure.io/releng/fedora-scm-requests/issue/24679

It said "The review is not approved by the assignee of the Bugzilla bug".

I just submitted a new request: 
https://pagure.io/releng/fedora-scm-requests/issue/24793

Comment 58 limings 2020-05-06 02:05:51 UTC
[lsun@fedora ~]$ fedpkg -v request-repo rshim 1814682
https://pagure.io/releng/fedora-scm-requests/issue/24793

Comment 59 Michal Schmidt 2020-05-06 07:23:45 UTC
(In reply to lsun from comment #57)
> It said "The review is not approved by the assignee of the Bugzilla bug".

Oh, I see. The server side checks who set the fedora-review+ flag.

Comment 60 Gwyn Ciesla 2020-05-06 14:14:56 UTC
(fedscm-admin):  The Pagure repository was created at https://src.fedoraproject.org/rpms/rshim

Comment 61 limings 2020-05-06 14:59:32 UTC
Thanks a lot for the help!!!

The changes have been pushed. The 'fedpkg build' also seem passed.

https://src.fedoraproject.org/rpms/rshim

$ fedpkg import ~/rshim-2.0.4-1.fc31.src.rpm
$ git commit -m "Initial import (#1814682)."
$ git push
$ fedpkg build
https://koji.fedoraproject.org/koji/taskinfo?taskID=44157911

Comment 62 limings 2020-05-06 19:14:10 UTC
Michal / Honggang,

This is my first Fedora package. Any advice what the next steps would be?
Such as how to get it into fedora 33 or Redhat, how to make it visible to 'dnf' install, etc.

Thanks!

Comment 63 Michal Schmidt 2020-05-06 20:30:50 UTC
The package is successfully built and tagged "f33". It will be installable with dnf in Fedora Rawhide once a daily compose of Rawhide finishes and propagates to mirrors.

$ koji latest-pkg f33 rshim
Build                                     Tag                   Built by
----------------------------------------  --------------------  ----------------
rshim-2.0.4-1.fc33                        f33                   lsun


You may want to push rshim also as an update to Fedora 32, the current stable release.
You'd do that by creating the f32 branch in Fedora dist-git:
  fedpkg request-branch --repo rshim f32

I don't remember if the new branch would be created empty, or as a fork of master.
You can do a git merge or cherry-pick between branches.
Then you can do a build while having the f32 branch checked out: fedpkg build
Then you can publish your build as a package update for Fedora 32 using Bodhi (read https://fedoraproject.org/wiki/Bodhi).

Getting rshim into RHEL is out of scope for this Fedora review (unless you're interested in EPEL, https://fedoraproject.org/wiki/EPEL). We should continue in RHEL bug 1744737.

Comment 64 Honggang LI 2020-05-07 01:31:23 UTC
https://release-monitoring.org/project/98355/

Create this project for upstream release monitoring for rshim.


Note You need to log in before you can comment on or make changes to this bug.