Bug 1652894

Summary: when create vhba without indicating wwpn/wwnn, libvirt will generate duplicated ones
Product: Red Hat Enterprise Linux 7 Reporter: yisun
Component: libvirtAssignee: John Ferlan <jferlan>
Status: CLOSED ERRATA QA Contact: yisun
Severity: high Docs Contact:
Priority: high    
Version: 7.6CC: hhan, jdenemar, jferlan, jsuchane, lmen, mtessun, xuzhang
Target Milestone: rcKeywords: Automation, Regression, ZStream
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: libvirt-4.5.0-11.el7 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1655586 1667329 (view as bug list) Environment:
Last Closed: 2019-08-06 13:14:02 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1651787, 1655586, 1667329    

Description yisun 2018-11-23 12:22:27 UTC
Description of problem:
when create vhba without indicating wwpn/wwnn, libvirt will generate duplicated ones

Version-Release number of selected component (if applicable):
libvirt-4.5.0-10.virtcov.el7_6.3.x86_64

How reproducible:
100%

Steps to Reproduce:
1. prepare a vhba xml without indicating wwpn/wwnn, as follow:
# cat vhba.xml 
<device><capability type="scsi_host"><capability type="fc_host"><wwnn /><wwpn /></capability></capability><parent>scsi_host1</parent></device>

2. create vhba repeatedly with the vhba.xml, with 5 secs interval
[root@hp-dl360eg8-06 ~]# for i in {1..10}; do virsh nodedev-create vhba.xml; sleep 5; done
Node device scsi_host79 created from vhba.xml
Node device scsi_host80 created from vhba.xml
error: Failed to create node device from vhba.xml
error: Write of '5001a4a000000006:5001a4a00000000b' to '/sys/class/fc_host/host1/vport_create' during vport create/delete failed: Name not unique on network
Node device scsi_host81 created from vhba.xml
Node device scsi_host82 created from vhba.xml
error: Failed to create node device from vhba.xml
error: Write of '5001a4a00000000d:5001a4a00000000e' to '/sys/class/fc_host/host1/vport_create' during vport create/delete failed: Name not unique on network
Node device scsi_host83 created from vhba.xml
error: Failed to create node device from vhba.xml
error: Write of '5001a4a000000006:5001a4a000000000' to '/sys/class/fc_host/host1/vport_create' during vport create/delete failed: Name not unique on network
error: Failed to create node device from vhba.xml
error: Write of '5001a4a00000000f:5001a4a00000000e' to '/sys/class/fc_host/host1/vport_create' during vport create/delete failed: Name not unique on network
Node device scsi_host84 created from vhba.xml

3. if remove the 5 secs interval, more failures happened.

Actual results:
A lot of vhba creations failed due to wwpn duplicated. 

Expected results:
Libvirt should provide unique wwpns to vhba if no wwpns provided by xml.

Comment 2 John Ferlan 2018-11-27 00:05:42 UTC
Generation of the wwnn/wwpn is handled via virRandomGenerateWWN which as the name suggests is supposed to be a random generation.

I attempted your script on the host I have and 10 unique scsi_host/fc_host's were created with unique wwnn and wwpn values.

Compared to the generated numbers you show using repeated patterns where the "random" number is essentially rotate/increment in a linear and non random pattern removing the "5001a4a" which is the QUMRANET_OUI value from your output (e.g. "000000006", "00000000b", "00000000d", "0000000e", etc. looping back to "00000000").

I'd say something is very wrong with your environment either gnutls_rnd is returning bad values or the /dev/urandom is bogus.

Comment 3 yisun 2018-11-28 03:19:05 UTC
(In reply to John Ferlan from comment #2)
> Generation of the wwnn/wwpn is handled via virRandomGenerateWWN which as the
> name suggests is supposed to be a random generation.
> 
> I attempted your script on the host I have and 10 unique scsi_host/fc_host's
> were created with unique wwnn and wwpn values.
> 
> Compared to the generated numbers you show using repeated patterns where the
> "random" number is essentially rotate/increment in a linear and non random
> pattern removing the "5001a4a" which is the QUMRANET_OUI value from your
> output (e.g. "000000006", "00000000b", "00000000d", "0000000e", etc. looping
> back to "00000000").
> 
> I'd say something is very wrong with your environment either gnutls_rnd is
> returning bad values or the /dev/urandom is bogus.

We just have one machine with hba card to test npiv cases, and the OS of that machine will always be clearly installed by beaker before auto script executed. I'll trigger a npiv auto job to reserve that machine and try to see if anything wrong on it. btw, do you have any suggestions about how to check gnutls_rnd? I tried using "dd if=/dev/urandom of=/tmp/file bs=1K count=1024" and can get random output. thx

Comment 4 John Ferlan 2018-11-29 02:09:32 UTC
Usage of gnutls_rnd is a build environment thing. IOW: whatever built the rpm for libvirt-4.5.0-10.virtcov.el7_6.3.x86_64 shown above. I'm sure there's some way in some log file to know whether gnutls is being used, but it's not in my short term memory.

Either way, it doesn't make sense that the value is essentially increasing by 1 rotating back through 16 values unless someone installed some sort of library that returned "known" values in order to perform some other test. Our test environment allows returning known/specific values...

Comment 9 John Ferlan 2018-11-29 17:27:27 UTC
Here's a "quick hack" for something to try on the system that I've cobbled together from extracting out the code libvirt uses in order to generate the random byte string.

$ cat random.c
#define _GNU_SOURCE
#include <stdio.h>
#include <unistd.h>
#include <string.h>
#include <stdlib.h>
#include <limits.h>
#include <gnutls/gnutls.h>
#include <gnutls/crypto.h>


static int
virRandomBytes(unsigned char *buf,
               size_t buflen)
{
    int rv;

    /* Generate the byte stream using gnutls_rnd() if possible */
    if ((rv = gnutls_rnd(GNUTLS_RND_RANDOM, buf, buflen)) < 0) {
        printf("failed to generate byte stream: %s\n", gnutls_strerror(rv));
        return -1;
    }
    return 0;
}

static unsigned long long
virRandomBits(int nbits)
{
    unsigned long long ret = 0;

    if (virRandomBytes((unsigned char *) &ret, sizeof(ret)) < 0) {
        /* You're already hosed, so this particular non-random value
         * isn't any worse.  */
        return 0;
    }

    if (nbits < 64)
        ret &= (1ULL << nbits) - 1;

    return ret;
}

#define QUMRANET_OUI "001a4a"
static int
virRandomGenerateWWN(char **wwn)
{
    if (asprintf(wwn, "5" "%s%09llx", QUMRANET_OUI,
                 (unsigned long long)virRandomBits(36)) < 0)
        return -1;
    return 0;
}

int main(int argc, int **argv)

{
    char *wwnn = NULL;
    char *wwpn = NULL;

    if (virRandomGenerateWWN(&wwnn) < 0 ||
        virRandomGenerateWWN(&wwpn) < 0) {
        printf("Failure to generate wwnn/wwpn\n");
        exit(1);
    }

    printf("wwnn=0x%s wwpn=0x%s\n", wwnn, wwpn);
}

$ cc -o random random.c -lgnutls
$ ./random
wwnn=0x5001a4a6380cd412 wwpn=0x5001a4a9b1aa0277
$

That's output on my host.

Comment 10 yisun 2018-11-30 03:25:57 UTC
on the clean installed 7.6.z systems i mentioned in comment 5, with the gnutls-devel version mentioned in comment 7 and with the test code mentioned in comment 9... nothing wrong happens in my hosts:

Host1:
[root@ibm-x3850x5-05 ~]# rpm -qa | grep gnutls-devel
gnutls-devel-3.3.29-8.el7.x86_64
[root@ibm-x3850x5-05 ~]# for i in {1..10}; do ./random; done
wwnn=0x5001a4a3f9444d3b wwpn=0x5001a4aaf9fb744a
wwnn=0x5001a4a4eddefd7f wwpn=0x5001a4ad38b0818c
wwnn=0x5001a4a2f33f7dc2 wwpn=0x5001a4aebf743fe0
wwnn=0x5001a4a6b173d4f1 wwpn=0x5001a4aef1357189
wwnn=0x5001a4a6bb7601aa wwpn=0x5001a4adae0f530f
wwnn=0x5001a4afb50d447f wwpn=0x5001a4a162df0ceb
wwnn=0x5001a4a4d3f9329c wwpn=0x5001a4a242009d4a
wwnn=0x5001a4a94bf299ef wwpn=0x5001a4adfe246ed4
wwnn=0x5001a4a56e66792d wwpn=0x5001a4a35bf61c48
wwnn=0x5001a4a9367cdd27 wwpn=0x5001a4a82d5b48c5

Host2:
[root@hp-dl360eg8-06 ~]# rpm -qa | grep gnutls-devel
gnutls-devel-3.3.29-8.el7.x86_64
[root@hp-dl360eg8-06 ~]# for i in {1..10}; do ./random; done
wwnn=0x5001a4a3341815dd wwpn=0x5001a4a42d9a4341
wwnn=0x5001a4ae0375dc37 wwpn=0x5001a4ae15c94b91
wwnn=0x5001a4ad6dbc1b4d wwpn=0x5001a4a28288d56c
wwnn=0x5001a4abd0b070b9 wwpn=0x5001a4a5c3b9a6ea
wwnn=0x5001a4a6c22bdf98 wwpn=0x5001a4a8aa7fdd19
wwnn=0x5001a4a4fe083e3d wwpn=0x5001a4a80cc59754
wwnn=0x5001a4aa48f01fd3 wwpn=0x5001a4a00c68d95a
wwnn=0x5001a4a5758f709b wwpn=0x5001a4a0a5e0af8f
wwnn=0x5001a4a01bb5f7b8 wwpn=0x5001a4ad9fa8136c
wwnn=0x5001a4a3d38d17c9 wwpn=0x5001a4a32591f7ce

So do we need to run test code on the rpm builder host for a check?

Comment 11 John Ferlan 2018-11-30 17:38:14 UTC
> So do we need to run test code on the rpm builder host for a check?

I do not believe that matters. The failure is happening on the target host not the build system.

At this point I'm leaning towards environmental with respect to what gets loaded when libvirtd is loaded.  IOW: Is there something installed on the target host(s) that is intercepting the libgnutls.so gnutls_rnd call and returning non random values.

Does the "failed to generate byte stream: %s" message show up in the libvirt logs (assuming you turn them on)? 

Since you've proven you can build directly linking to the "host" gnutls library and not get certain results, but then run as part of libvirtd and get those bad results, then there's something in that "stack" that's causing the problem.

So a bit of debugging is needed. I think a 'starting point' would be to run 'lsof -p <pid> | grep gnutls' on the pid that's running libvirtd since in the long run that's what will make the call to gnutls_rnd. Let's make sure that the library is there and loaded.

Another data point could be to use gdb to attach to the libvirtd process (gdb <pathto>/libvirtd and attach <pid>). From the gdb> prompt using "info symbol gnutls_rnd" should tell you which library loaded the symbol. Staying in gdb> (after a continue) and running the client call again to create the vHBA may result in seeing the above mentioned "failed to generate" error (it may not, but it may).

The "good news" to a degree is that I don't believe creating vHBAs in a loop such as is being done in the problem statement is not a "typical" operation.

Comment 12 yisun 2018-12-03 08:50:09 UTC
(In reply to John Ferlan from comment #11)
> > So do we need to run test code on the rpm builder host for a check?
> 
> I do not believe that matters. The failure is happening on the target host
> not the build system.

Hi John,
I checked the latest rhel7.6 branch and found that there maybe a data type misusing.
***
root@yisun /tmp/libvirt-rhel  ## git show 60da4a11
...
@@ -108,26 +61,14 @@ VIR_ONCE_GLOBAL_INIT(virRandom)
 uint64_t virRandomBits(int nbits)
 {
     uint64_t ret = 0;
...
+    ret &= (1U << nbits) - 1;
***

Here the 64 bits ret is &= with a 1U which is a 32 bits type, seems a lot of zeros will be generated...
In comment 9, your test code using a 1ULL which is a 64 bits value so no error happened, but if changed the test code to 1U, same error will happened with the test code.

Comment 13 yisun 2018-12-03 08:56:28 UTC
just change the 1ULL to 1U of your test code, same error happens as follow:
[root@hp-dl360eg8-06 tmp]# ./random 
wwnn=0x5001a4a000000000 wwpn=0x5001a4a00000000e
[root@hp-dl360eg8-06 tmp]# ./random 
wwnn=0x5001a4a000000003 wwpn=0x5001a4a00000000c
[root@hp-dl360eg8-06 tmp]# ./random 
wwnn=0x5001a4a00000000d wwpn=0x5001a4a00000000b
[root@hp-dl360eg8-06 tmp]# ./random 
wwnn=0x5001a4a00000000b wwpn=0x5001a4a00000000c
[root@hp-dl360eg8-06 tmp]# ./random 
wwnn=0x5001a4a000000002 wwpn=0x5001a4a000000002
[root@hp-dl360eg8-06 tmp]# ./random 
wwnn=0x5001a4a000000006 wwpn=0x5001a4a000000001

Comment 14 yisun 2018-12-03 09:03:39 UTC
Set as a regression. Some other functions may be effected by the random number generator.

Comment 16 John Ferlan 2018-12-03 11:51:48 UTC
A patch is already in upstream libvirt 4.7.0 via commit:

commit 0a5a6f0d019996b015bb0acbe30efa8f2fbbb351
Author: Michal Privoznik <mprivozn>
Date:   Thu Aug 2 09:29:38 2018 +0200

    virrandom: Avoid undefined behaviour in virRandomBits
    
    If nbits is 64 (or greater) then shifting 1ULL left is undefined.
    
    Signed-off-by: Michal Privoznik <mprivozn>
    Reviewed-by: Daniel P. Berrangé <berrange>

diff --git a/src/util/virrandom.c b/src/util/virrandom.c
index 3c011a8615..7915f6531e 100644
--- a/src/util/virrandom.c
+++ b/src/util/virrandom.c
@@ -68,7 +68,9 @@ uint64_t virRandomBits(int nbits)
         return 0;
     }
 
-    ret &= (1ULL << nbits) - 1;
+    if (nbits < 64)
+        ret &= (1ULL << nbits) - 1;
+
     return ret;
 }
 

Thus "process wise" for RHEL 7.7, this bz changes to POST for 7.7 and I've set the 7.6.z flag. Although the usage in this case is for vHBA WWNN/WWPN in a loop, the virRandomBits function is far more widely used for things like generation of a MAC or a UUID.

Comment 17 John Ferlan 2018-12-03 12:12:31 UTC
Seems commit 78c47a92ecb450c9f8bcabd35da7006dc2547882 from libvirt.4.6.0 is also needed:

Author: Michal Privoznik <mprivozn>
Date:   Wed Aug 1 13:26:46 2018 +0200

    util: Don't overflow in virRandomBits
    
    The function is supposed to return up to 64bit long integer. In
    order to do that it calls virRandomBytes() to fill the integer
    with random bytes and then masks out everything but requested
    bits. However, when doing that it shifts 1U and not 1ULL. So
    effectively, requesting 32 random bis or more always return 0
    which is not random enough.
    
    Signed-off-by: Michal Privoznik <mprivozn>
    Reviewed-by: Daniel P. Berrangé <berrange>
    Reviewed-by: Pino Toscano <ptoscano>

diff --git a/src/util/virrandom.c b/src/util/virrandom.c
index 01cc82a052..3c011a8615 100644
--- a/src/util/virrandom.c
+++ b/src/util/virrandom.c
@@ -68,7 +68,7 @@ uint64_t virRandomBits(int nbits)
         return 0;
     }
 
-    ret &= (1U << nbits) - 1;
+    ret &= (1ULL << nbits) - 1;
     return ret;
 }

Comment 21 yisun 2019-04-18 06:20:15 UTC
test result:
PASSED

tested with:
libvirt-4.5.0-12.virtcov.el7.x86_64

covered by auto cases:
npiv.nodedev_create_destroy.negative_testing.create_max_vhbas
npiv.nodedev_create_destroy.negative_testing.frequently_create_destroy_vhbas.random_wwn

Comment 24 errata-xmlrpc 2019-08-06 13:14:02 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:2294