Bug 2065886 - hostid/hostnqn not generated after install nvme-cli
Summary: hostid/hostnqn not generated after install nvme-cli
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: nvme-cli
Version: rawhide
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Assignee: Andy Lutomirski
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-03-19 03:15 UTC by Zhang Yi
Modified: 2022-07-15 17:45 UTC (History)

Fixed In Version: nvme-cli-2.0-1.fc36
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-05-07 04:22:14 UTC
Type: Bug
Embargoed:



Description Zhang Yi 2022-03-19 03:15:26 UTC
Description of problem:
nvme-cli doesn't work during blktests

Version-Release number of selected component (if applicable):
libnvme-1.0~rc6-1.fc37.aarch64
nvme-cli-2.0~rc6-1.fc37.aarch64

How reproducible:
100%

Steps to Reproduce:
1. blktests ./check nvme/004

Actual results:


Expected results:


Additional info:


[root@ampere-mtsnow-altra-13 blktests]# ./check nvme/004
nvme/004 (test nvme and nvmet UUID NS descriptors)           [failed]
    runtime  0.417s  ...  0.473s
    --- tests/nvme/004.out	2022-03-18 22:44:09.814863720 -0400
    +++ /mnt/tests/gitlab.com/cki-project/kernel-tests/-/archive/main/kernel-tests-main.zip/storage/blktests/blk/blktests/results/nodev/nvme/004.out.bad	2022-03-18 23:15:06.493009866 -0400
    @@ -1,5 +1,5 @@
     Running nvme/004
    -91fdba0d-f87b-4c25-b80f-db7be1418b9e
    -uuid.91fdba0d-f87b-4c25-b80f-db7be1418b9e
    -NQN:blktests-subsystem-1 disconnected 1 controller(s)
    +cat: /sys/block/n1/uuid: No such file or directory
    +cat: /sys/block/n1/wwid: No such file or directory
    +NQN:blktests-subsystem-1 disconnected 0 controller(s)
    ...
    (Run 'diff -u tests/nvme/004.out /mnt/tests/gitlab.com/cki-project/kernel-tests/-/archive/main/kernel-tests-main.zip/storage/blktests/blk/blktests/results/nodev/nvme/004.out.bad' to see the entire diff)
[root@ampere-mtsnow-altra-13 blktests]# dmesg
[ 1927.441042] run blktests nvme/004 at 2022-03-18 23:15:06
[ 1927.473378] loop0: detected capacity change from 0 to 2097152
[ 1927.484105] nvmet: adding nsid 1 to subsystem blktests-subsystem-1

Comment 1 Zhang Yi 2022-03-19 03:22:01 UTC
It seems the failure was caused by the hostid/hostnqn files not being created when nvme-cli was installed.
I installed an el9 nvme-cli build, which generated hostid/hostnqn, and then reinstalled nvme-cli-2.0~rc6-1.fc37.aarch64; the test passes now.



[root@ampere-mtsnow-altra-13 blktests]# ls /etc/nvme/
discovery.conf  hostid  hostnqn
[root@ampere-mtsnow-altra-13 blktests]# ./check nvme/004
nvme/004 (test nvme and nvmet UUID NS descriptors)           [passed]
    runtime  0.373s  ...  6.253s
[root@ampere-mtsnow-altra-13 blktests]# rpm -qa nvme-cli
nvme-cli-2.0~rc6-1.fc37.aarch64

Comment 2 Zhang Yi 2022-03-19 03:32:24 UTC
We need to update the nvme.spec file to generate the hostid/hostnqn files during the %post phase.
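
A minimal sketch of what the body of such a scriptlet could look like, written as plain shell so it can be tried outside of RPM. The function name `ensure_nvme_host_files` and the `NVME_BIN`/`UUIDGEN_BIN` override variables are made up for illustration. `nvme gen-hostnqn` is the command used elsewhere in this report; using `uuidgen` for the host ID is an assumption, not something this bug confirms.

```shell
#!/bin/sh
# Hypothetical helper: create hostnqn/hostid in a config directory if they
# are missing or empty. Existing non-empty files are never overwritten.
# NVME_BIN and UUIDGEN_BIN are illustrative overrides, not real packaging knobs.
ensure_nvme_host_files() {
    conf_dir=${1:-/etc/nvme}
    nvme_bin=${NVME_BIN:-nvme}
    uuidgen_bin=${UUIDGEN_BIN:-uuidgen}

    mkdir -p "$conf_dir" || return 1

    # -s is true only for an existing, non-empty file.
    if [ ! -s "$conf_dir/hostnqn" ]; then
        "$nvme_bin" gen-hostnqn > "$conf_dir/hostnqn" || return 1
    fi
    if [ ! -s "$conf_dir/hostid" ]; then
        "$uuidgen_bin" > "$conf_dir/hostid" || return 1
    fi
}
```

Pointing the function at a scratch directory (for example `ensure_nvme_host_files /tmp/nvme-test`) is a safe way to try it before touching /etc/nvme.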

Comment 3 Andy Lutomirski 2022-03-20 17:01:34 UTC
Are you sure this is a problem with nvme-cli and not with the tests?  nvme-cli is supposed to work correctly (and arguably better from a containerization / stateless perspective) without hostid and hostnqn.

Comment 4 Zhang Yi 2022-03-21 02:55:23 UTC
(In reply to Andy Lutomirski from comment #3)
> Are you sure this is a problem with nvme-cli and not with the tests? 
> nvme-cli is supposed to work correctly (and arguably better from a
> containerization / stateless perspective) without hostid and hostnqn.

Yes, I manually generated the hostnqn, and the test works. 

# ls /etc/nvme/
discovery.conf

# ./check nvme/004
nvme/004 (test nvme and nvmet UUID NS descriptors)           [failed]
    runtime  1.759s  ...  0.376s
    --- tests/nvme/004.out	2022-03-20 22:37:12.136706113 -0400
    +++ /root/blktests/results/nodev/nvme/004.out.bad	2022-03-20 22:47:36.995712572 -0400
    @@ -1,5 +1,5 @@
     Running nvme/004
    -91fdba0d-f87b-4c25-b80f-db7be1418b9e
    -uuid.91fdba0d-f87b-4c25-b80f-db7be1418b9e
    -NQN:blktests-subsystem-1 disconnected 1 controller(s)
    +cat: /sys/block/n1/uuid: No such file or directory
    +cat: /sys/block/n1/wwid: No such file or directory
    +NQN:blktests-subsystem-1 disconnected 0 controller(s)
    ...
    (Run 'diff -u tests/nvme/004.out /root/blktests/results/nodev/nvme/004.out.bad' to see the entire diff)

# nvme gen-hostnqn >/etc/nvme/hostnqn

# cat /etc/nvme/hostnqn 
nqn.2014-08.org.nvmexpress:uuid:4c4c4544-0044-4c10-8059-b5c04f4c4732

# ./check nvme/004
nvme/004 (test nvme and nvmet UUID NS descriptors)           [passed]
    runtime  0.376s  ...  1.758s


The default nvme.spec.in also shows hostnqn/hostid being generated during the %post phase:
https://github.com/linux-nvme/nvme-cli/blob/master/nvme.spec.in
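
When checking a machine by hand, it can help to confirm that whatever ended up in /etc/nvme/hostnqn has the same shape as the value printed above (the `nqn.2014-08.org.nvmexpress:uuid:` prefix followed by a hyphenated UUID). A small sketch; the function name `is_uuid_nqn` is hypothetical, and the check only covers the UUID-based form, not other valid NQN forms.

```shell
# Hypothetical helper: succeed if the argument looks like a UUID-based NQN,
# i.e. the fixed prefix followed by a 36-character hyphenated hex UUID.
is_uuid_nqn() {
    printf '%s\n' "$1" | grep -Eq \
        '^nqn\.2014-08\.org\.nvmexpress:uuid:[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$'
}
```

For example, `is_uuid_nqn "$(cat /etc/nvme/hostnqn)" && echo "hostnqn looks sane"`.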

Comment 5 Andy Lutomirski 2022-03-23 19:43:19 UTC
I think you misunderstand me.  nvme-cli is not supposed to require a hostnqn file for proper functionality.  See, for example:

https://github.com/linux-nvme/nvme-cli/blob/555e9d750464dd3c09701eb35e40ad3f5f8eaf97/Documentation/nvme-show-hostnqn.html

(search for systemd)

So whatever problem you're experiencing may well be a bug, but the correct fix isn't to start creating /etc/nvme/hostnqn.

Comment 6 Tomáš Bžatek 2022-04-01 14:12:05 UTC
HostNQN is now (nvme-cli 2.x series) generated through libnvme (https://github.com/linux-nvme/libnvme/blob/master/src/nvme/fabrics.c#L967) using stable machine DMI identifiers, when available. In case of a missing /etc/nvme/hostnqn file, a default hostnqn is generated on-the-fly in memory.
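
To make the lookup order described above concrete, here is a rough shell illustration. It is not libnvme's code: the function name `resolve_hostnqn` is hypothetical, and using /sys/class/dmi/id/product_uuid as the DMI identifier is an assumption made only for this sketch.

```shell
# Illustrative only: mimics the described lookup order, not libnvme's code.
# Arguments (both defaults are assumptions for this sketch):
#   $1  path of the configured hostnqn file
#   $2  path of a file holding the machine's DMI product UUID
resolve_hostnqn() {
    conf_file=${1:-/etc/nvme/hostnqn}
    dmi_file=${2:-/sys/class/dmi/id/product_uuid}

    # 1. A non-empty configuration file wins.
    if [ -s "$conf_file" ]; then
        head -n 1 "$conf_file"
        return 0
    fi

    # 2. Otherwise derive a value on the fly from the DMI UUID, if readable.
    if [ -r "$dmi_file" ]; then
        uuid=$(head -n 1 "$dmi_file")
        if [ -n "$uuid" ]; then
            printf 'nqn.2014-08.org.nvmexpress:uuid:%s\n' "$uuid"
            return 0
        fi
    fi

    # 3. Nothing usable: report failure to the caller.
    return 1
}
```

Nothing is written to disk in the second branch, which is the property that matters for stateless or containerized setups.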

So this file should not be needed for nvme-cli; if it is, that looks like a bug. Which calls in blktests fail exactly?
Also, I've just built libnvme-1.0~rc8-1 that brings some more fixes in this regard.

However, the upcoming nvme-stas daemons will require the /etc/nvme/hostnqn and /etc/nvme/hostid files to be present and will refuse to start otherwise. I think it should be nvme-cli's task to generate those files in %post, just like we do in RHEL. This of course brings its own set of troubles, e.g. when generating an installer image, and these issues are currently being fixed.

Comment 7 Zhang Yi 2022-04-02 08:39:36 UTC
(In reply to Tomáš Bžatek from comment #6)
> HostNQN is now (nvme-cli 2.x series) generated through libnvme
> (https://github.com/linux-nvme/libnvme/blob/master/src/nvme/fabrics.c#L967)
> using stable machine DMI identifiers, when available. In case of a missing
> /etc/nvme/hostnqn file, a default hostnqn is generated on-the-fly in memory.

So maybe there is an issue in nvmf_hostnqn_generate with nvme-cli 2.x?

I tried F36 nvme-cli 1.x, and it works even without /etc/nvme/hostnqn.

> 
> So this file should not be needed for nvme-cli, otherwise it looks like a
> bug. What calls in blktests do fail exactly?

I will check it later.

> Also, I've just built libnvme-1.0~rc8-1 that brings some more fixes in this
> regard.

I tried the latest Fedora rawhide and the nvme command doesn't work at all now. I think we need to fix this issue first.

# uname -r
5.18.0-0.rc0.20220401gite8b767f5e04097a.15.fc37.x86_64

# rpm -qa nvme-cli libnvme
libnvme-1.0~rc8-1.fc37.x86_64
nvme-cli-2.0~rc6-1.fc37.x86_64

# nvme
nvme: symbol lookup error: nvme: undefined symbol: nvme_init_id_ns, version LIBNVME_1_0

> 
> However the upcoming nvme-stas daemons will require /etc/nvme/hostnqn and
> /etc/nvme/hostid files to be present and will refuse to start otherwise. I
> think it should be a nvme-cli task to generate those files in %post, just
> like we do in RHEL. This is of course brings its own set of troubles, e.g.
> when generating an installer image and these issues are being currently
> fixed.

Comment 8 Zhang Yi 2022-04-02 08:44:01 UTC
Filed the issue below to track it:
Bug 2071219 - nvme: symbol lookup error: nvme: undefined symbol: nvme_init_id_ns, version LIBNVME_1_0

Comment 9 Zhang Yi 2022-04-02 11:06:46 UTC
(In reply to Zhang Yi from comment #7)
> (In reply to Tomáš Bžatek from comment #6)
> > HostNQN is now (nvme-cli 2.x series) generated through libnvme
> > (https://github.com/linux-nvme/libnvme/blob/master/src/nvme/fabrics.c#L967)
> > using stable machine DMI identifiers, when available. In case of a missing
> > /etc/nvme/hostnqn file, a default hostnqn is generated on-the-fly in memory.
> 
> So maybe there has issue in nvmf_hostnqn_generate with nvme-cli 2.x?
> 
> I tried F36 nvme-cli 1.x, it works even without /etc/nvme/hostnqn
> 
> > 
> > So this file should not be needed for nvme-cli, otherwise it looks like a
> > bug. What calls in blktests do fail exactly?
> 
> I will check it later.

blktests executes the command below to connect to the target:
# nvme connect -t loop -n blktests-subsystem-1


With nvme-cli 1.x, the hostnqn file does not seem to be necessary.

With 2.x, the connect fails without a hostnqn file. I've filed an issue to confirm it:

https://github.com/linux-nvme/nvme-cli/issues/1473
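
While debugging, one way to take the on-disk file out of the picture is to hand the host NQN to `nvme connect` explicitly through its `--hostnqn` option. Whether that avoids the failure tracked in the issue above is not confirmed anywhere in this report. The sketch below only prints the command line so it can be inspected first; the function name `nvme_connect_cmd` is hypothetical.

```shell
# Print (do not run) the connect command blktests uses, adding --hostnqn
# when a host NQN is supplied as the second argument.
nvme_connect_cmd() {
    subsys=$1
    hostnqn=${2:-}
    if [ -n "$hostnqn" ]; then
        printf 'nvme connect -t loop -n %s --hostnqn=%s\n' "$subsys" "$hostnqn"
    else
        printf 'nvme connect -t loop -n %s\n' "$subsys"
    fi
}
```

For example, `nvme_connect_cmd blktests-subsystem-1 "$(nvme gen-hostnqn)"` prints a command that carries a freshly generated host NQN.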

Comment 10 Zhang Yi 2022-04-02 12:06:37 UTC
(In reply to Zhang Yi from comment #9)
> (In reply to Zhang Yi from comment #7)
> > (In reply to Tomáš Bžatek from comment #6)
> > > HostNQN is now (nvme-cli 2.x series) generated through libnvme
> > > (https://github.com/linux-nvme/libnvme/blob/master/src/nvme/fabrics.c#L967)
> > > using stable machine DMI identifiers, when available. In case of a missing
> > > /etc/nvme/hostnqn file, a default hostnqn is generated on-the-fly in memory.
> > 
> > So maybe there has issue in nvmf_hostnqn_generate with nvme-cli 2.x?
> > 
> > I tried F36 nvme-cli 1.x, it works even without /etc/nvme/hostnqn
> > 
> > > 
> > > So this file should not be needed for nvme-cli, otherwise it looks like a
> > > bug. What calls in blktests do fail exactly?
> > 
> > I will check it later.
> 
> The blktests will execute below cmd to connect the target:
> # nvme connect -t loop -n blktests-subsystem-1
> 
> 
> With the nvme-cli 1.x version, seems the hostnqn is not necessary.

In nvme-cli 1.16, the hostnqn is obtained from hostnqn_read_dmi():

https://github.com/linux-nvme/nvme-cli/blob/deee9cae1ac94760deebd71f8e5449061338666c/fabrics.c#L905

Comment 11 Andy Lutomirski 2022-04-02 19:34:30 UTC
(In reply to Zhang Yi from comment #10)
> (In reply to Zhang Yi from comment #9)
> > (In reply to Zhang Yi from comment #7)
> > > (In reply to Tomáš Bžatek from comment #6)
> > > > HostNQN is now (nvme-cli 2.x series) generated through libnvme
> > > > (https://github.com/linux-nvme/libnvme/blob/master/src/nvme/fabrics.c#L967)
> > > > using stable machine DMI identifiers, when available. In case of a missing
> > > > /etc/nvme/hostnqn file, a default hostnqn is generated on-the-fly in memory.
> > > 
> > > So maybe there has issue in nvmf_hostnqn_generate with nvme-cli 2.x?
> > > 
> > > I tried F36 nvme-cli 1.x, it works even without /etc/nvme/hostnqn
> > > 
> > > > 
> > > > So this file should not be needed for nvme-cli, otherwise it looks like a
> > > > bug. What calls in blktests do fail exactly?
> > > 
> > > I will check it later.
> > 
> > The blktests will execute below cmd to connect the target:
> > # nvme connect -t loop -n blktests-subsystem-1
> > 
> > 
> > With the nvme-cli 1.x version, seems the hostnqn is not necessary.
> 
> The nvme-cli 1.16, hostnqn got from hostnqn_read_dmi()
> 
> https://github.com/linux-nvme/nvme-cli/blob/
> deee9cae1ac94760deebd71f8e5449061338666c/fabrics.c#L905

Do you mean nvme_generate_systemd(), four lines down?

https://github.com/linux-nvme/nvme-cli/blob/deee9cae1ac94760deebd71f8e5449061338666c/fabrics.c#L909

Comment 12 Tomáš Bžatek 2022-04-04 14:17:30 UTC
(In reply to Zhang Yi from comment #7)
> I tried latest fedora rawhide, the nvme command doesn't work now, I think we
> need fix this issue first.
> 
> # uname -r
> 5.18.0-0.rc0.20220401gite8b767f5e04097a.15.fc37.x86_64
> 
> # rpm -qa nvme-cli libnvme
> libnvme-1.0~rc8-1.fc37.x86_64
> nvme-cli-2.0~rc6-1.fc37.x86_64
> 
> # nvme
> nvme: symbol lookup error: nvme: undefined symbol: nvme_init_id_ns, version
> LIBNVME_1_0

FYI, this is a separate issue, tracked in bug 2071219, caused by the recent libnvme-1.0~rc8 update from last Friday. The original issue in this bug report was seen with libnvme-1.0~rc6, so it is a different one.

Comment 13 Tomáš Bžatek 2022-04-04 14:59:14 UTC
(In reply to Andy Lutomirski from comment #11)
> (In reply to Zhang Yi from comment #10)
> > The nvme-cli 1.16, hostnqn got from hostnqn_read_dmi()
> > 
> > https://github.com/linux-nvme/nvme-cli/blob/
> > deee9cae1ac94760deebd71f8e5449061338666c/fabrics.c#L905
> 
> Do you mean nvme_generate_systemd(), four lines down?
> 
> https://github.com/linux-nvme/nvme-cli/blob/
> deee9cae1ac94760deebd71f8e5449061338666c/fabrics.c#L909

This is very different in nvme-cli 2.x now. And yes, there is a lot of confusion on various fronts.


(In reply to Zhang Yi from comment #9)
> With 2.x, it will be failed w/o hostnqn, I've filed one issue to confirm it:
> 
> https://github.com/linux-nvme/nvme-cli/issues/1473

I'll pick that change up for nvme-cli-2.0~rc8. While the libnvme codebase was okay with missing /etc/nvme/hostnqn and hostid files, the difference might have been in what commandline arguments were supplied to nvme-cli and how the fallback used to work.

FYI, don't expect full functional parity between the 1.x and 2.x series; there are differences in command-line arguments, return codes and the output (a slightly different JSON structure in particular). And since it's a partly new codebase, we need to identify as many differences as possible and potentially fix related bugs.

Comment 14 Zhang Yi 2022-04-06 05:35:06 UTC
OK, thanks for the update.

Then the remaining question is whether Fedora will update nvme.spec to generate hostid/hostnqn during the %post phase.

I'm fine if we don't generate them, as in the previous release, but note that RHEL does generate them.

Comment 15 Tomáš Bžatek 2022-04-11 13:59:05 UTC
(In reply to Zhang Yi from comment #14)
> Then the remaining question is will fedora update the nvme.spec to generate
> hostid/hostnqn during the %post phase?

Yes, I added that in nvme-cli-2.0~rc8-1. Still I think your original issue is elsewhere. Please retest with nvme-cli-2.0-1.fc37 and report any differences. The 'blktests' scripts might need adaptations for the new nvme-cli 2.0 as well.

Comment 16 Fedora Update System 2022-04-11 14:01:30 UTC
FEDORA-2022-31819ca3b3 has been submitted as an update to Fedora 36. https://bodhi.fedoraproject.org/updates/FEDORA-2022-31819ca3b3

Comment 17 Fedora Update System 2022-04-11 14:57:59 UTC
FEDORA-2022-31819ca3b3 has been pushed to the Fedora 36 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2022-31819ca3b3`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2022-31819ca3b3

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 18 Fedora Update System 2022-05-07 04:22:14 UTC
FEDORA-2022-31819ca3b3 has been pushed to the Fedora 36 stable repository.
If problem still persists, please make note of it in this bug report.

Comment 19 Maurizio Lombardi 2022-05-27 15:43:26 UTC
FYI, the rpm-ostree team is complaining about this %post script in RHEL and would like to see it removed:

https://bugzilla.redhat.com/show_bug.cgi?id=1900691#c4

They are suggesting to "add a check for /run/ostree-booted in your %post script to skip all uuid or fdqn generation
when running under rpm-ostree.
Then, you need to create a systemd unit that will be run on boot and will generate those files if they don't exists."
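
The guard they describe could be sketched roughly as follows. The function name `should_generate_in_post` is hypothetical, and the marker path is taken as an argument only so the logic can be exercised without an ostree system. The boot-time systemd unit is not shown.

```shell
# Hypothetical guard for a %post scriptlet: return success (0) when the host
# identity files should be generated right away, failure (1) when generation
# should be left to a boot-time unit.
should_generate_in_post() {
    marker=${1:-/run/ostree-booted}
    if [ -e "$marker" ]; then
        return 1    # rpm-ostree system: skip generation in %post
    fi
    return 0
}
```

A scriptlet would then wrap its generation step in `if should_generate_in_post; then ...; fi`.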

Comment 20 Andy Lutomirski 2022-05-28 00:56:14 UTC
Or remove it for all systems, not just ostree, and instead fix nvme-cli to be fully functional based on machine-id.

I'm not maintaining this any more, but if I was, I would not have made this change to the RPM.

Comment 21 Tomáš Bžatek 2022-07-15 17:45:45 UTC
See the discussion in bug 1900691. For the time being nvme-stas will take over generating /etc/nvme/hostnqn and /etc/nvme/hostid via the stas-config@.service. I've pushed nvme-cli-2.1~rc0-1.fc37 with the change.

There are obviously pros and cons to each approach. Things might change again once the NVMe Technical Proposals currently in the works are finalized and published.

