Bug 1349218

Summary: Host localhost installation failed. Failed to configure management network on the host.
Product: Red Hat Enterprise Virtualization Manager
Component: vdsm
Version: 3.5.7
Hardware: All
OS: Linux
Status: CLOSED ERRATA
Severity: urgent
Priority: urgent
Reporter: Douglas Schilling Landgraf <dougsland>
Assignee: Yaniv Bronhaim <ybronhei>
QA Contact: Pavol Brilla <pbrilla>
Docs Contact:
CC: acanan, bazulay, cshao, danken, dougsland, eedri, fromani, gklein, lsurette, mgoldboi, mkalinin, oourfali, pkliczew, pstehlik, rhev-integ, srevivo, tlitovsk, ycui, ykaul
Keywords: ZStream
Target Milestone: ovirt-4.0.1
Target Release: ---
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Clones: 1353105 (view as bug list)
Environment:
Last Closed: 2016-08-23 20:16:41 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Infra
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1353105

Attachments:
Description                                       Flags
ovirt-20160622210811-192.168.122.98-41116180.log  none
vdsm logs                                         none

Description Douglas Schilling Landgraf 2016-06-23 01:16:59 UTC
Description of problem:

Due to bug #1349068 I generated a new rhev-hypervisor6-6.8-20160621.1.iso which includes the updated sanlock. Now I can register the node and the registration process continues, but then I see: "Host localhost installation failed. Failed to configure management network on the host."

Version-Release number of selected component (if applicable):


# rpm -qa | grep -i vdsm
vdsm-yajsonrpc-4.16.37-1.el6ev.noarch
vdsm-hook-vhostmd-4.16.37-1.el6ev.noarch
vdsm-reg-4.16.37-1.el6ev.noarch
vdsm-cli-4.16.37-1.el6ev.noarch
vdsm-jsonrpc-4.16.37-1.el6ev.noarch
vdsm-python-zombiereaper-4.16.37-1.el6ev.noarch
vdsm-xmlrpc-4.16.37-1.el6ev.noarch
vdsm-4.16.37-1.el6ev.x86_64
vdsm-hook-ethtool-options-4.16.37-1.el6ev.noarch
ovirt-node-plugin-vdsm-0.2.0-26.el6ev.noarch
vdsm-python-4.16.37-1.el6ev.noarch

How reproducible:

- Register the node.
- Approve the node.

Actual results:
Host is not up 

Expected results:
Host is up 

Additional info:
After a few minutes I see the host status change to "Unassigned" via the
"Host localhost was autorecovered" operation.

Comment 1 Douglas Schilling Landgraf 2016-06-23 01:17:59 UTC
# ifconfig -a
;vdsmdummy; Link encap:Ethernet  HWaddr 92:97:F3:86:7B:25  
          BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)

eth0      Link encap:Ethernet  HWaddr 52:54:00:C5:0C:39  
          inet6 addr: fe80::5054:ff:fec5:c39/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:2260 errors:0 dropped:0 overruns:0 frame:0
          TX packets:971 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:1575933 (1.5 MiB)  TX bytes:402677 (393.2 KiB)
          Interrupt:11 Base address:0x2000 

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:381 errors:0 dropped:0 overruns:0 frame:0
          TX packets:381 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:64923 (63.4 KiB)  TX bytes:64923 (63.4 KiB)

rhevm     Link encap:Ethernet  HWaddr 52:54:00:C5:0C:39  
          inet addr:192.168.122.98  Bcast:192.168.122.255  Mask:255.255.255.0
          inet6 addr: fe80::5054:ff:fec5:c39/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1005 errors:0 dropped:0 overruns:0 frame:0
          TX packets:611 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:96289 (94.0 KiB)  TX bytes:357943 (349.5 KiB)


virsh > net-list --all
Name                 State      Autostart     Persistent
--------------------------------------------------
;vdsmdummy;          active     no            no
default              inactive   no            yes
vdsm-rhevm           active     yes           yes

Comment 2 Douglas Schilling Landgraf 2016-06-23 01:19:24 UTC
Created attachment 1171164 [details]
ovirt-20160622210811-192.168.122.98-41116180.log

Comment 3 Douglas Schilling Landgraf 2016-06-23 01:20:51 UTC
Created attachment 1171165 [details]
vdsm logs

Comment 5 Dan Kenigsberg 2016-06-23 08:21:13 UTC
libvirtError: Failed to connect socket to '/var/run/libvirt/libvirt-sock': Permission denied

this is an infra issue.

Does `virsh -r list` work? And does `virsh list` (with the shibboleth, of course)?

Comment 6 Dan Kenigsberg 2016-06-23 08:23:52 UTC
(In reply to Douglas Schilling Landgraf from comment #0)

> Additional info:
> After a few minutes I see the host status change to "Unassigned" via the
> "Host localhost was autorecovered" operation.

This sounds much like bug 1348103

Comment 7 Piotr Kliczewski 2016-06-23 13:47:06 UTC
It looks like a different issue. I see plenty of:

Thread-12::ERROR::2016-06-23 01:08:33,639::sampling::547::vds::(run) Error while sampling stats
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/sampling.py", line 529, in run
  File "/usr/share/vdsm/virt/sampling.py", line 519, in sample
  File "/usr/share/vdsm/virt/sampling.py", line 294, in __init__
  File "/usr/share/vdsm/virt/sampling.py", line 182, in __init__
  File "/usr/lib/python2.6/site-packages/vdsm/utils.py", line 987, in __call__
  File "/usr/share/vdsm/caps.py", line 283, in getNumaTopology
  File "/usr/lib/python2.6/site-packages/vdsm/utils.py", line 987, in __call__
  File "/usr/share/vdsm/caps.py", line 183, in _getCapsXMLStr
  File "/usr/lib/python2.6/site-packages/vdsm/libvirtconnection.py", line 160, in get
  File "/usr/lib/python2.6/site-packages/vdsm/libvirtconnection.py", line 95, in _open_qemu_connection
  File "/usr/lib/python2.6/site-packages/vdsm/utils.py", line 1109, in retry
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 102, in openAuth
libvirtError: Failed to connect socket to '/var/run/libvirt/libvirt-sock': Permission denied

in the logs. Francesco, have you seen it before?

Comment 8 Francesco Romani 2016-06-23 13:51:00 UTC
(In reply to Piotr Kliczewski from comment #7)
> It looks like a different issue. I see plenty of:
> [...]
> libvirtError: Failed to connect socket to '/var/run/libvirt/libvirt-sock':
> Permission denied
> 
> in the logs. Francesco, have you seen it before?

It is the same issue Dan mentioned previously.
It looks new to me; I can only say that, AFAIK, the socket should look like:

$ ls -lh /var/run/libvirt/libvirt-sock
srwxrwx---. 1 root qemu 0 Jun 23 14:31 /var/run/libvirt/libvirt-sock
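
A quick cross-check of both sides of that permission (a sketch; vdsm and qemu are the user and group names used elsewhere in this bug):

$ stat -c '%U:%G %a' /var/run/libvirt/libvirt-sock   # expect root:qemu 770
$ id -nG vdsm                                        # qemu should be listed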

Comment 9 Douglas Schilling Landgraf 2016-06-23 18:29:34 UTC
(In reply to Dan Kenigsberg from comment #5)
> libvirtError: Failed to connect socket to '/var/run/libvirt/libvirt-sock':
> Permission denied
> 
> this is an infra issue.
> 
> Does `virsh -r list` work? And does `virsh list` (with shibboleth of course).

# virsh -r 
Welcome to virsh, the virtualization interactive terminal.

Type:  'help' for help with commands
       'quit' to quit

virsh > list
 Id    Name                           State
----------------------------------------------------

virsh > quit



# virsh list
Please enter your authentication name: vdsm@rhevh
Please enter your password: 
 Id    Name                           State
----------------------------------------------------

[root@localhost ~]#

Comment 13 Yaniv Bronhaim 2016-06-28 15:13:04 UTC
The vdsm user is not configured correctly: for some reason it is not in the qemu group.
Running "usermod -a -G qemu vdsm" and restarting the service solves it.

I am not sure how this happens, as in vdsm 4.16.37 we run /usr/sbin/usermod -a -G %{qemu_group},%{snlk_group} %{vdsm_user}, but maybe someone modified it before the build?
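
Spelled out, that workaround is (a sketch; vdsmd as the el6 service name is an assumption here). The -a flag appends the supplementary group without dropping existing ones, and the restart lets the running daemon pick up the new membership:

# usermod -a -G qemu vdsm
# service vdsmd restart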

Comment 14 Yaniv Bronhaim 2016-06-28 16:53:28 UTC
The following is the code we run before installing the vdsm rpm. Maybe one of the commands failed because its executable was not installed (getent, useradd, usermod, ...), which led to skipping part of the calls here.

%pre                                                                            
# Force standard locale behavior (English)                                      
export LC_ALL=C                                                                 
                                                                                
/usr/bin/getent passwd %{vdsm_user} >/dev/null || \                             
    /usr/sbin/useradd -r -u 36 -g %{vdsm_group} -d /var/lib/vdsm \              
        -s /sbin/nologin -c "Node Virtualization Manager" %{vdsm_user}          
/usr/sbin/usermod -a -G %{qemu_group},%{snlk_group} %{vdsm_user}                
/usr/sbin/usermod -a -G %{cdrom_group} %{qemu_user}                             
                                                                                
# We keep the previous rpm version number in a file for managing upgrade flow   
# in vdsmd_init_script upgraded_version_check task                              
if [ "$1" -gt 1 ]; then                                                         
    rpm -q %{vdsm_name} > "%{_localstatedir}/lib/%{vdsm_name}/upgraded_version" 
fi  

Please verify whether this could be the case. Maybe you can check /var/log/messages after the vdsm rpm installation and find which command failed to run?
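
One way to look for that (a sketch; the grep pattern is illustrative):

# grep -iE 'getent|useradd|usermod|vdsm' /var/log/messages | tail -n 50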

Comment 15 Douglas Schilling Landgraf 2016-06-28 18:35:36 UTC
(In reply to Yaniv Bronhaim from comment #14)
> The following is the code we run before installing the vdsm rpm. Maybe one
> of the commands failed because its executable was not installed (getent,
> useradd, usermod, ...), which led to skipping part of the calls here.
> [...]
> Please verify whether this could be the case. Maybe you can check
> /var/log/messages after the vdsm rpm installation and find which command
> failed to run?

Thanks Yaniv. It looks like a requirement was missing in the vdsm spec: I manually built vdsm with a requirement on shadow-utils and the registration now goes ahead. It is the same issue the sanlock folks faced: https://bugzilla.redhat.com/show_bug.cgi?id=1344139#c1

This appears to have happened because some dependency stopped pulling in the user{add,mod,del} tools, which affected both packages. A patch is ready for review.
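
The fix is presumably along these lines (a sketch of the idea only; the actual patch is not attached to this bug). Declaring shadow-utils as a scriptlet-time dependency forces rpm to install it before %pre runs:

# useradd/usermod come from shadow-utils, so the spec must require it
# at %pre time rather than relying on another package to pull it in.
Requires(pre): shadow-utils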

Comment 16 Oved Ourfali 2016-06-29 12:30:51 UTC
So this will require a release of vdsm for 3.5?

Comment 17 Douglas Schilling Landgraf 2016-06-29 13:37:31 UTC
(In reply to Oved Ourfali from comment #16)
> So this will require a release of vdsm for 3.5?

Yes, please.

Comment 18 Oved Ourfali 2016-06-29 13:40:37 UTC
So I'm not sure there is one.
Eyal?

Douglas - can you handle that on node level?

Comment 19 Douglas Schilling Landgraf 2016-06-29 13:47:39 UTC
(In reply to Oved Ourfali from comment #18)
> So I'm not sure there is one.
> Eyal?
> 
> Douglas - can you handle that on node level?

The best approach is a VDSM rebuild, as the missing requirement affects vdsm installation itself.
Additionally, VDSM needs to be rebuilt anyway because of bz#1349068.

Comment 20 Oved Ourfali 2016-06-29 13:51:41 UTC
(In reply to Douglas Schilling Landgraf from comment #19)
> [...]
> The best approach is a VDSM rebuild, as the missing requirement affects
> vdsm installation itself. Additionally, VDSM needs to be rebuilt anyway
> because of bz#1349068.

I see bz#1349068 is targeted to 3.6.8.
Douglas - Anyway, I assign it to you, okay?
Currently targeting the bug to 3.6.8, but if there is a VDSM build for 3.5 then it should be there as well.

Comment 21 Douglas Schilling Landgraf 2016-06-29 14:01:36 UTC
(In reply to Oved Ourfali from comment #20)
> [...]
> I see bz#1349068 is targeted to 3.6.8.
> Douglas - Anyway, I assign it to you, okay?
> Currently targeting the bug to 3.6.8, but if there is a VDSM build for 3.5
> then it should be there as well.

Both bugs are blocking the rhevh build for 3.5 on top of RHEL 6.8, so I don't think the target is correct. Moran, could you please help?

Comment 23 Dan Kenigsberg 2016-06-29 14:59:27 UTC
In my opinion, the useradd fix should be "shoved" into node, without waiting for another vdsm respin - just as I suggested for 3.6.

Comment 32 Pavol Brilla 2016-08-16 14:04:24 UTC
vdsm now requires shadow-utils as a dependency:

# yum list vdsm && yum deplist vdsm | grep shadow 
Loaded plugins: product-id, search-disabled-repos
Available Packages
vdsm.x86_64                       4.18.11-1.el7ev                        4.0.2-8
  dependency: shadow-utils
   provider: shadow-utils.x86_64 2:4.1.5.1-18.el7
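
On a host where vdsm is installed, the same dependency is also visible on the package itself (a sketch; output abridged):

# rpm -q --requires vdsm | grep shadow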

Comment 34 errata-xmlrpc 2016-08-23 20:16:41 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-1671.html