Description of problem:
nvme-cli doesn't work during blktests

Version-Release number of selected component (if applicable):
libnvme-1.0~rc6-1.fc37.aarch64
nvme-cli-2.0~rc6-1.fc37.aarch64

How reproducible:
100%

Steps to Reproduce:
1. blktests ./check nvme/004

Actual results:

Expected results:

Additional info:
[root@ampere-mtsnow-altra-13 blktests]# ./check nvme/004
nvme/004 (test nvme and nvmet UUID NS descriptors) [failed]
    runtime 0.417s ... 0.473s
    --- tests/nvme/004.out 2022-03-18 22:44:09.814863720 -0400
    +++ /mnt/tests/gitlab.com/cki-project/kernel-tests/-/archive/main/kernel-tests-main.zip/storage/blktests/blk/blktests/results/nodev/nvme/004.out.bad 2022-03-18 23:15:06.493009866 -0400
    @@ -1,5 +1,5 @@
     Running nvme/004
    -91fdba0d-f87b-4c25-b80f-db7be1418b9e
    -uuid.91fdba0d-f87b-4c25-b80f-db7be1418b9e
    -NQN:blktests-subsystem-1 disconnected 1 controller(s)
    +cat: /sys/block/n1/uuid: No such file or directory
    +cat: /sys/block/n1/wwid: No such file or directory
    +NQN:blktests-subsystem-1 disconnected 0 controller(s)
    ...
    (Run 'diff -u tests/nvme/004.out /mnt/tests/gitlab.com/cki-project/kernel-tests/-/archive/main/kernel-tests-main.zip/storage/blktests/blk/blktests/results/nodev/nvme/004.out.bad' to see the entire diff)
[root@ampere-mtsnow-altra-13 blktests]# dmesg
[ 1927.441042] run blktests nvme/004 at 2022-03-18 23:15:06
[ 1927.473378] loop0: detected capacity change from 0 to 2097152
[ 1927.484105] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
It seems this was caused by the hostid/hostnqn files not being created when nvme-cli is installed. I installed an el9 nvme-cli, which generated hostid/hostnqn, then reinstalled nvme-cli-2.0~rc6-1.fc37.aarch64, and the test passes now.

[root@ampere-mtsnow-altra-13 blktests]# ls /etc/nvme/
discovery.conf  hostid  hostnqn
[root@ampere-mtsnow-altra-13 blktests]# ./check nvme/004
nvme/004 (test nvme and nvmet UUID NS descriptors) [passed]
    runtime 0.373s ... 6.253s
[root@ampere-mtsnow-altra-13 blktests]# rpm -qa nvme-cli
nvme-cli-2.0~rc6-1.fc37.aarch64
We need to update the nvme.spec file to generate hostid/hostnqn during the %post phase.
Are you sure this is a problem with nvme-cli and not with the tests? nvme-cli is supposed to work correctly (and arguably better from a containerization / stateless perspective) without hostid and hostnqn.
(In reply to Andy Lutomirski from comment #3)
> Are you sure this is a problem with nvme-cli and not with the tests?
> nvme-cli is supposed to work correctly (and arguably better from a
> containerization / stateless perspective) without hostid and hostnqn.

Yes. The test fails without the hostnqn file and passes once I generate it manually:

# ls /etc/nvme/
discovery.conf
# ./check nvme/004
nvme/004 (test nvme and nvmet UUID NS descriptors) [failed]
    runtime 1.759s ... 0.376s
    --- tests/nvme/004.out 2022-03-20 22:37:12.136706113 -0400
    +++ /root/blktests/results/nodev/nvme/004.out.bad 2022-03-20 22:47:36.995712572 -0400
    @@ -1,5 +1,5 @@
     Running nvme/004
    -91fdba0d-f87b-4c25-b80f-db7be1418b9e
    -uuid.91fdba0d-f87b-4c25-b80f-db7be1418b9e
    -NQN:blktests-subsystem-1 disconnected 1 controller(s)
    +cat: /sys/block/n1/uuid: No such file or directory
    +cat: /sys/block/n1/wwid: No such file or directory
    +NQN:blktests-subsystem-1 disconnected 0 controller(s)
    ...
    (Run 'diff -u tests/nvme/004.out /root/blktests/results/nodev/nvme/004.out.bad' to see the entire diff)
# nvme gen-hostnqn >/etc/nvme/hostnqn
# cat /etc/nvme/hostnqn
nqn.2014-08.org.nvmexpress:uuid:4c4c4544-0044-4c10-8059-b5c04f4c4732
# ./check nvme/004
nvme/004 (test nvme and nvmet UUID NS descriptors) [passed]
    runtime 0.376s ... 1.758s

The default upstream nvme.spec.in also generates hostnqn/hostid during the %post phase:
https://github.com/linux-nvme/nvme-cli/blob/master/nvme.spec.in
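For illustration, here is a minimal shell sketch of the kind of "create if missing" step a %post scriptlet could run. It is not the actual upstream or Fedora scriptlet. The function name ensure_host_identity, the HOSTNQN_GEN variable, and the choice to derive hostid from the UUID part of the hostnqn are assumptions made for this example.

```shell
#!/bin/sh
# Hypothetical helper: create hostnqn/hostid only when they are missing or empty.
# HOSTNQN_GEN is an assumption of this sketch so the generator can be swapped out.
HOSTNQN_GEN="${HOSTNQN_GEN:-nvme gen-hostnqn}"

ensure_host_identity() {
    dir="${1:-/etc/nvme}"
    mkdir -p "$dir" || return 1

    # Never overwrite an existing identity on reinstall or upgrade.
    if [ ! -s "$dir/hostnqn" ]; then
        $HOSTNQN_GEN > "$dir/hostnqn" || return 1
    fi

    # Assumption: reuse the UUID suffix of the NQN as the host ID
    # so that the two files stay consistent with each other.
    if [ ! -s "$dir/hostid" ]; then
        sed 's/^.*uuid://' "$dir/hostnqn" > "$dir/hostid" || return 1
    fi
}
```

Running `ensure_host_identity /etc/nvme` as root would populate both files on a system where they are absent and leave existing files alone.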
I think you misunderstand me. nvme-cli is not supposed to require a hostnqn file for proper functionality. See, for example:

https://github.com/linux-nvme/nvme-cli/blob/555e9d750464dd3c09701eb35e40ad3f5f8eaf97/Documentation/nvme-show-hostnqn.html (search for systemd)

So whatever problem you're experiencing may well be a bug, but the correct fix isn't to start creating /etc/nvme/hostnqn.
In the nvme-cli 2.x series, the HostNQN is generated through libnvme (https://github.com/linux-nvme/libnvme/blob/master/src/nvme/fabrics.c#L967) using stable machine DMI identifiers, when available. If the /etc/nvme/hostnqn file is missing, a default hostnqn is generated on the fly in memory.

So this file should not be needed for nvme-cli; if it is, that looks like a bug. Which calls in blktests fail exactly?

Also, I've just built libnvme-1.0~rc8-1, which brings some more fixes in this regard.

However, the upcoming nvme-stas daemons will require the /etc/nvme/hostnqn and /etc/nvme/hostid files to be present and will refuse to start otherwise. I think it should be an nvme-cli task to generate those files in %post, just like we do in RHEL. This of course brings its own set of troubles, e.g. when generating an installer image, and these issues are currently being fixed.
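The lookup order described above can be sketched roughly as follows. This is an illustrative shell approximation only, not libnvme's implementation: the function name resolve_hostnqn, the use of the sysfs product_uuid file as the DMI source, and the random-UUID last resort are all assumptions of this sketch. The paths are parameters so the function can be pointed at test files.

```shell
#!/bin/sh
# Hypothetical sketch of a hostnqn lookup order: config file first,
# then a stable DMI-derived value, then (assumption) a random UUID.
resolve_hostnqn() {
    conf_file="${1:-/etc/nvme/hostnqn}"
    dmi_file="${2:-/sys/class/dmi/id/product_uuid}"
    rand_file="${3:-/proc/sys/kernel/random/uuid}"
    prefix="nqn.2014-08.org.nvmexpress:uuid:"

    # 1. A configured hostnqn wins when the file is present and non-empty.
    if [ -s "$conf_file" ]; then
        head -n 1 "$conf_file"
        return 0
    fi

    # 2. Otherwise build a stable NQN from the DMI product UUID.
    #    On most systems this file is readable by root only.
    if [ -r "$dmi_file" ] && [ -s "$dmi_file" ]; then
        printf '%s%s\n' "$prefix" "$(cat "$dmi_file")"
        return 0
    fi

    # 3. Assumed last resort: a random UUID, which is not stable across calls.
    if [ -r "$rand_file" ]; then
        printf '%s%s\n' "$prefix" "$(cat "$rand_file")"
        return 0
    fi

    return 1
}
```

Calling `resolve_hostnqn` with no arguments uses the default paths. The point of the sketch is that a missing config file alone should not make the lookup fail as long as one of the fallbacks is available.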
(In reply to Tomáš Bžatek from comment #6)
> HostNQN is now (nvme-cli 2.x series) generated through libnvme
> (https://github.com/linux-nvme/libnvme/blob/master/src/nvme/fabrics.c#L967)
> using stable machine DMI identifiers, when available. In case of a missing
> /etc/nvme/hostnqn file, a default hostnqn is generated on-the-fly in memory.

So maybe there is an issue in nvmf_hostnqn_generate with nvme-cli 2.x?

I tried nvme-cli 1.x on F36, and it works even without /etc/nvme/hostnqn.

>
> So this file should not be needed for nvme-cli, otherwise it looks like a
> bug. What calls in blktests do fail exactly?

I will check it later.

> Also, I've just built libnvme-1.0~rc8-1 that brings some more fixes in this
> regard.

I tried the latest Fedora Rawhide, and the nvme command doesn't work at all now. I think we need to fix this issue first.

# uname -r
5.18.0-0.rc0.20220401gite8b767f5e04097a.15.fc37.x86_64

# rpm -qa nvme-cli libnvme
libnvme-1.0~rc8-1.fc37.x86_64
nvme-cli-2.0~rc6-1.fc37.x86_64

# nvme
nvme: symbol lookup error: nvme: undefined symbol: nvme_init_id_ns, version LIBNVME_1_0

>
> However the upcoming nvme-stas daemons will require /etc/nvme/hostnqn and
> /etc/nvme/hostid files to be present and will refuse to start otherwise. I
> think it should be a nvme-cli task to generate those files in %post, just
> like we do in RHEL. This is of course brings its own set of troubles, e.g.
> when generating an installer image and these issues are being currently
> fixed.
Filed the following bug to track it:
Bug 2071219 - nvme: symbol lookup error: nvme: undefined symbol: nvme_init_id_ns, version LIBNVME_1_0
(In reply to Zhang Yi from comment #7)
> (In reply to Tomáš Bžatek from comment #6)
> > HostNQN is now (nvme-cli 2.x series) generated through libnvme
> > (https://github.com/linux-nvme/libnvme/blob/master/src/nvme/fabrics.c#L967)
> > using stable machine DMI identifiers, when available. In case of a missing
> > /etc/nvme/hostnqn file, a default hostnqn is generated on-the-fly in memory.
>
> So maybe there has issue in nvmf_hostnqn_generate with nvme-cli 2.x?
>
> I tried F36 nvme-cli 1.x, it works even without /etc/nvme/hostnqn
>
> >
> > So this file should not be needed for nvme-cli, otherwise it looks like a
> > bug. What calls in blktests do fail exactly?
>
> I will check it later.

blktests runs the following command to connect to the target:
# nvme connect -t loop -n blktests-subsystem-1

With nvme-cli 1.x, the hostnqn file does not seem to be necessary.
With 2.x, the connect fails without a hostnqn file. I've filed an issue to confirm it:
https://github.com/linux-nvme/nvme-cli/issues/1473
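One way to check whether the implicit hostnqn fallback is what breaks the connect is to pass the host NQN explicitly. The shell sketch below only prints the command line and does not run it. The helper name nvme_connect_cmd is hypothetical, and using `--hostnqn` as a diagnostic step is a suggestion of this sketch, not something blktests does.

```shell
#!/bin/sh
# Hypothetical helper: print an `nvme connect` command for the loop target,
# adding --hostnqn only when a non-empty hostnqn file is available.
nvme_connect_cmd() {
    subsys="$1"
    hostnqn_file="${2:-/etc/nvme/hostnqn}"
    cmd="nvme connect -t loop -n $subsys"

    if [ -s "$hostnqn_file" ]; then
        cmd="$cmd --hostnqn=$(head -n 1 "$hostnqn_file")"
    fi

    printf '%s\n' "$cmd"
}
```

For example, `nvme_connect_cmd blktests-subsystem-1` prints the plain command from the test when /etc/nvme/hostnqn is absent, and the same command with an explicit `--hostnqn=` argument when the file exists.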
(In reply to Zhang Yi from comment #9)
> (In reply to Zhang Yi from comment #7)
> > (In reply to Tomáš Bžatek from comment #6)
> > > HostNQN is now (nvme-cli 2.x series) generated through libnvme
> > > (https://github.com/linux-nvme/libnvme/blob/master/src/nvme/fabrics.c#L967)
> > > using stable machine DMI identifiers, when available. In case of a missing
> > > /etc/nvme/hostnqn file, a default hostnqn is generated on-the-fly in memory.
> >
> > So maybe there has issue in nvmf_hostnqn_generate with nvme-cli 2.x?
> >
> > I tried F36 nvme-cli 1.x, it works even without /etc/nvme/hostnqn
> >
> > >
> > > So this file should not be needed for nvme-cli, otherwise it looks like a
> > > bug. What calls in blktests do fail exactly?
> >
> > I will check it later.
>
> The blktests will execute below cmd to connect the target:
> # nvme connect -t loop -n blktests-subsystem-1
>
>
> With the nvme-cli 1.x version, seems the hostnqn is not necessary.

In nvme-cli 1.16, the hostnqn is obtained from hostnqn_read_dmi():

https://github.com/linux-nvme/nvme-cli/blob/deee9cae1ac94760deebd71f8e5449061338666c/fabrics.c#L905
(In reply to Zhang Yi from comment #10)
> (In reply to Zhang Yi from comment #9)
> > (In reply to Zhang Yi from comment #7)
> > > (In reply to Tomáš Bžatek from comment #6)
> > > > HostNQN is now (nvme-cli 2.x series) generated through libnvme
> > > > (https://github.com/linux-nvme/libnvme/blob/master/src/nvme/fabrics.c#L967)
> > > > using stable machine DMI identifiers, when available. In case of a missing
> > > > /etc/nvme/hostnqn file, a default hostnqn is generated on-the-fly in memory.
> > >
> > > So maybe there has issue in nvmf_hostnqn_generate with nvme-cli 2.x?
> > >
> > > I tried F36 nvme-cli 1.x, it works even without /etc/nvme/hostnqn
> > >
> > > >
> > > > So this file should not be needed for nvme-cli, otherwise it looks like a
> > > > bug. What calls in blktests do fail exactly?
> > >
> > > I will check it later.
> >
> > The blktests will execute below cmd to connect the target:
> > # nvme connect -t loop -n blktests-subsystem-1
> >
> >
> > With the nvme-cli 1.x version, seems the hostnqn is not necessary.
>
> The nvme-cli 1.16, hostnqn got from hostnqn_read_dmi()
>
> https://github.com/linux-nvme/nvme-cli/blob/deee9cae1ac94760deebd71f8e5449061338666c/fabrics.c#L905

Do you mean nvme_generate_systemd(), four lines down?

https://github.com/linux-nvme/nvme-cli/blob/deee9cae1ac94760deebd71f8e5449061338666c/fabrics.c#L909
(In reply to Zhang Yi from comment #7)
> I tried latest fedora rawhide, the nvme command doesn't work now, I think we
> need fix this issue first.
>
> # uname -r
> 5.18.0-0.rc0.20220401gite8b767f5e04097a.15.fc37.x86_64
>
> # rpm -qa nvme-cli libnvme
> libnvme-1.0~rc8-1.fc37.x86_64
> nvme-cli-2.0~rc6-1.fc37.x86_64
>
> # nvme
> nvme: symbol lookup error: nvme: undefined symbol: nvme_init_id_ns, version LIBNVME_1_0

FYI, this is a separate issue, tracked in bug 2071219 and caused by the libnvme-1.0~rc8 update from last Friday. The original issue in this bug report was seen with libnvme-1.0~rc6, so it is a different one.
(In reply to Andy Lutomirski from comment #11)
> (In reply to Zhang Yi from comment #10)
> > The nvme-cli 1.16, hostnqn got from hostnqn_read_dmi()
> >
> > https://github.com/linux-nvme/nvme-cli/blob/deee9cae1ac94760deebd71f8e5449061338666c/fabrics.c#L905
>
> Do you mean nvme_generate_systemd(), four lines down?
>
> https://github.com/linux-nvme/nvme-cli/blob/deee9cae1ac94760deebd71f8e5449061338666c/fabrics.c#L909

This is very different in nvme-cli 2.x now. And yes, there is a lot of confusion on various fronts.

(In reply to Zhang Yi from comment #9)
> With 2.x, it will be failed w/o hostnqn, I've filed one issue to confirm it:
>
> https://github.com/linux-nvme/nvme-cli/issues/1473

I'll pick that change up for nvme-cli-2.0~rc8. While the libnvme codebase was okay with missing /etc/nvme/hostnqn and hostid files, the difference might have been in which command-line arguments were supplied to nvme-cli and how the fallback used to work.

FYI, don't expect full functional parity between the 1.x and 2.x series. There are differences in command-line arguments, return codes and the output (a slightly different JSON structure in particular). And since it's a partly new codebase, we need to identify as many differences as possible and potentially fix related bugs.
OK, thanks for the update.
Then the remaining question is whether Fedora will update nvme.spec to generate hostid/hostnqn during the %post phase. I'm fine if we don't generate them, same as the previous release, although RHEL does.
(In reply to Zhang Yi from comment #14)
> Then the remaining question is will fedora update the nvme.spec to generate
> hostid/hostnqn during the %post phase?

Yes, I added that in nvme-cli-2.0~rc8-1. Still, I think your original issue is elsewhere. Please retest with nvme-cli-2.0-1.fc37 and report any differences. The blktests scripts might need adaptations for the new nvme-cli 2.0 as well.
FEDORA-2022-31819ca3b3 has been submitted as an update to Fedora 36. https://bodhi.fedoraproject.org/updates/FEDORA-2022-31819ca3b3
FEDORA-2022-31819ca3b3 has been pushed to the Fedora 36 testing repository. Soon you'll be able to install the update with the following command: `sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2022-31819ca3b3` You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2022-31819ca3b3 See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.
FEDORA-2022-31819ca3b3 has been pushed to the Fedora 36 stable repository. If problem still persists, please make note of it in this bug report.
FYI, the rpm-ostree team is complaining about this %post script in RHEL and would like to see it removed:
https://bugzilla.redhat.com/show_bug.cgi?id=1900691#c4

They are suggesting to "add a check for /run/ostree-booted in your %post script to skip all uuid or fdqn generation when running under rpm-ostree. Then, you need to create a systemd unit that will be run on boot and will generate those files if they don't exists."
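The suggested check could look roughly like the shell sketch below. The /run/ostree-booted marker comes from the suggestion quoted above; the function name post_action and its "generate"/"defer" outputs are hypothetical, and the marker path is a parameter only so the sketch can be exercised outside a real system.

```shell
#!/bin/sh
# Hypothetical guard for a %post scriptlet: decide whether to generate the
# host identity files now or leave that to a boot-time unit.
post_action() {
    marker="${1:-/run/ostree-booted}"

    if [ -e "$marker" ]; then
        # Running under rpm-ostree: skip generation in the scriptlet.
        echo defer
    else
        echo generate
    fi
}
```

A scriptlet could then wrap its generation step in something like `[ "$(post_action)" = generate ] && ...`, with the deferred case handled by a separate unit that creates the files on boot if they do not exist.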
Or remove it for all systems, not just ostree, and instead fix nvme-cli to be fully functional based on machine-id. I'm not maintaining this any more, but if I was, I would not have made this change to the RPM.
See the discussion in bug 1900691. For the time being, nvme-stas will take over generating /etc/nvme/hostnqn and /etc/nvme/hostid via the stas-config@.service. I've pushed nvme-cli-2.1~rc0-1.fc37 with the change. There are obviously pros and cons to each approach. Things might change again once the NVMe Technical Proposals currently in the works are finalized and published.