Bug 1349218 - Host localhost installation failed. Failed to configure management network on the host.
Status: CLOSED ERRATA
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 3.5.7
Hardware: All
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: ovirt-4.0.1
Target Release: ---
Assigned To: Yaniv Bronhaim
QA Contact: Pavol Brilla
Keywords: ZStream
Depends On:
Blocks: 1353105
Reported: 2016-06-22 21:16 EDT by Douglas Schilling Landgraf
Modified: 2016-08-23 16:16 EDT (History)
CC: 19 users

See Also:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Cloned To: 1353105
Environment:
Last Closed: 2016-08-23 16:16:41 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Infra
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
ovirt-20160622210811-192.168.122.98-41116180.log (247.94 KB, text/plain), 2016-06-22 21:19 EDT, Douglas Schilling Landgraf
vdsm logs (30.75 KB, application/x-gzip), 2016-06-22 21:20 EDT, Douglas Schilling Landgraf


External Trackers
Tracker ID Priority Status Summary Last Updated
oVirt gerrit 59900 master NEW spec: Add shadow-utils as requirement 2016-06-28 15:27 EDT
oVirt gerrit 60152 ovirt-4.0 MERGED spec: Add shadow-utils as requirement 2016-07-04 11:02 EDT
oVirt gerrit 60160 ovirt-3.6 MERGED spec: Add shadow-utils as requirement 2016-07-04 11:49 EDT
Red Hat Product Errata RHEA-2016:1671 normal SHIPPED_LIVE VDSM 4.0 GA bug fix and enhancement update 2016-09-02 17:32:03 EDT

Description Douglas Schilling Landgraf 2016-06-22 21:16:59 EDT
Description of problem:

Due to bug#1349068 I generated a new rhev-hypervisor6-6.8-20160621.1.iso which includes the updated sanlock. Now I can register the node and the registration process continues, but I then see: "Host localhost installation failed. Failed to configure management network on the host."

Version-Release number of selected component (if applicable):


# rpm -qa | grep -i vdsm
vdsm-yajsonrpc-4.16.37-1.el6ev.noarch
vdsm-hook-vhostmd-4.16.37-1.el6ev.noarch
vdsm-reg-4.16.37-1.el6ev.noarch
vdsm-cli-4.16.37-1.el6ev.noarch
vdsm-jsonrpc-4.16.37-1.el6ev.noarch
vdsm-python-zombiereaper-4.16.37-1.el6ev.noarch
vdsm-xmlrpc-4.16.37-1.el6ev.noarch
vdsm-4.16.37-1.el6ev.x86_64
vdsm-hook-ethtool-options-4.16.37-1.el6ev.noarch
ovirt-node-plugin-vdsm-0.2.0-26.el6ev.noarch
vdsm-python-4.16.37-1.el6ev.noarch

How reproducible:

- Register the node.
- Approve the node

Actual results:
Host is not up 

Expected results:
Host is up 

Additional info:
After a few minutes I see the host status change to "Unassigned" via the
"Host localhost was autorecovered" operation.
Comment 1 Douglas Schilling Landgraf 2016-06-22 21:17:59 EDT
# ifconfig -a
;vdsmdummy; Link encap:Ethernet  HWaddr 92:97:F3:86:7B:25  
          BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)

eth0      Link encap:Ethernet  HWaddr 52:54:00:C5:0C:39  
          inet6 addr: fe80::5054:ff:fec5:c39/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:2260 errors:0 dropped:0 overruns:0 frame:0
          TX packets:971 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:1575933 (1.5 MiB)  TX bytes:402677 (393.2 KiB)
          Interrupt:11 Base address:0x2000 

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:381 errors:0 dropped:0 overruns:0 frame:0
          TX packets:381 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:64923 (63.4 KiB)  TX bytes:64923 (63.4 KiB)

rhevm     Link encap:Ethernet  HWaddr 52:54:00:C5:0C:39  
          inet addr:192.168.122.98  Bcast:192.168.122.255  Mask:255.255.255.0
          inet6 addr: fe80::5054:ff:fec5:c39/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1005 errors:0 dropped:0 overruns:0 frame:0
          TX packets:611 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:96289 (94.0 KiB)  TX bytes:357943 (349.5 KiB)


virsh > net-list --all
Name                 State      Autostart     Persistent
--------------------------------------------------
;vdsmdummy;          active     no            no
default              inactive   no            yes
vdsm-rhevm           active     yes           yes
Comment 2 Douglas Schilling Landgraf 2016-06-22 21:19 EDT
Created attachment 1171164 [details]
ovirt-20160622210811-192.168.122.98-41116180.log
Comment 3 Douglas Schilling Landgraf 2016-06-22 21:20 EDT
Created attachment 1171165 [details]
vdsm logs
Comment 5 Dan Kenigsberg 2016-06-23 04:21:13 EDT
libvirtError: Failed to connect socket to '/var/run/libvirt/libvirt-sock': Permission denied

This is an infra issue.

Does `virsh -r list` work? And does `virsh list` (with the shibboleth, of course)?
Comment 6 Dan Kenigsberg 2016-06-23 04:23:52 EDT
(In reply to Douglas Schilling Landgraf from comment #0)

> Additional info:
> After some minutes I see the host status change to "Unassigned" from 	
> "Host localhost was autorecovered" operation.

This sounds much like bug 1348103
Comment 7 Piotr Kliczewski 2016-06-23 09:47:06 EDT
It looks like a different issue. I see plenty of:

Thread-12::ERROR::2016-06-23 01:08:33,639::sampling::547::vds::(run) Error while sampling stats
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/sampling.py", line 529, in run
  File "/usr/share/vdsm/virt/sampling.py", line 519, in sample
  File "/usr/share/vdsm/virt/sampling.py", line 294, in __init__
  File "/usr/share/vdsm/virt/sampling.py", line 182, in __init__
  File "/usr/lib/python2.6/site-packages/vdsm/utils.py", line 987, in __call__
  File "/usr/share/vdsm/caps.py", line 283, in getNumaTopology
  File "/usr/lib/python2.6/site-packages/vdsm/utils.py", line 987, in __call__
  File "/usr/share/vdsm/caps.py", line 183, in _getCapsXMLStr
  File "/usr/lib/python2.6/site-packages/vdsm/libvirtconnection.py", line 160, in get
  File "/usr/lib/python2.6/site-packages/vdsm/libvirtconnection.py", line 95, in _open_qemu_connection
  File "/usr/lib/python2.6/site-packages/vdsm/utils.py", line 1109, in retry
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 102, in openAuth
libvirtError: Failed to connect socket to '/var/run/libvirt/libvirt-sock': Permission denied

in the logs. Francesco, have you seen it before?
Comment 8 Francesco Romani 2016-06-23 09:51:00 EDT
(In reply to Piotr Kliczewski from comment #7)
> It looks like different issue. I see plenty of:
> 
> Thread-12::ERROR::2016-06-23 01:08:33,639::sampling::547::vds::(run) Error while sampling stats
> Traceback (most recent call last):
>   File "/usr/share/vdsm/virt/sampling.py", line 529, in run
>   File "/usr/share/vdsm/virt/sampling.py", line 519, in sample
>   File "/usr/share/vdsm/virt/sampling.py", line 294, in __init__
>   File "/usr/share/vdsm/virt/sampling.py", line 182, in __init__
>   File "/usr/lib/python2.6/site-packages/vdsm/utils.py", line 987, in __call__
>   File "/usr/share/vdsm/caps.py", line 283, in getNumaTopology
>   File "/usr/lib/python2.6/site-packages/vdsm/utils.py", line 987, in __call__
>   File "/usr/share/vdsm/caps.py", line 183, in _getCapsXMLStr
>   File "/usr/lib/python2.6/site-packages/vdsm/libvirtconnection.py", line 160, in get
>   File "/usr/lib/python2.6/site-packages/vdsm/libvirtconnection.py", line 95, in _open_qemu_connection
>   File "/usr/lib/python2.6/site-packages/vdsm/utils.py", line 1109, in retry
>   File "/usr/lib64/python2.6/site-packages/libvirt.py", line 102, in openAuth
> libvirtError: Failed to connect socket to '/var/run/libvirt/libvirt-sock': Permission denied
> 
> in the logs. Francesco have you seen it before?

It is the same issue Dan mentioned previously.
It looks new to me; I can only tell that, AFAIK, the socket should look like:

$ ls -lh /var/run/libvirt/libvirt-sock
srwxrwx---. 1 root qemu 0 Jun 23 14:31 /var/run/libvirt/libvirt-sock
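
A quick way to check whether the vdsm user can actually reach that socket (a minimal sketch; it assumes sudo is available on the node and uses the vdsm user and qemu group names from the spec):

# The socket should be root:qemu with mode srwxrwx---
ls -lh /var/run/libvirt/libvirt-sock
# Can the vdsm user open it? This fails if vdsm is not in the qemu group
sudo -u vdsm test -w /var/run/libvirt/libvirt-sock && echo "vdsm can open the socket" || echo "permission denied for vdsm"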
Comment 9 Douglas Schilling Landgraf 2016-06-23 14:29:34 EDT
(In reply to Dan Kenigsberg from comment #5)
> libvirtError: Failed to connect socket to '/var/run/libvirt/libvirt-sock':
> Permission denied
> 
> this is an infra issue.
> 
> Does `virsh -r list` work? And does `virsh list` (with shibboleth of course).

# virsh -r 
Welcome to virsh, the virtualization interactive terminal.

Type:  'help' for help with commands
       'quit' to quit

virsh > list
 Id    Name                           State
----------------------------------------------------

virsh > quit



# virsh list
Please enter your authentication name: vdsm@rhevh
Please enter your password: 
 Id    Name                           State
----------------------------------------------------

[root@localhost ~]#
Comment 13 Yaniv Bronhaim 2016-06-28 11:13:04 EDT
The vdsm user is not configured correctly: it is not in the qemu group for some reason.
Running "usermod -a -G qemu vdsm" and restarting the service solves it.

I'm not sure how this happens, as in vdsm 4.16.37 we run /usr/sbin/usermod -a -G %{qemu_group},%{snlk_group} %{vdsm_user}, but maybe someone modified it before the build?
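
A minimal sketch of that workaround on the node (vdsmd is the standard service name on RHEL 6):

# Add the vdsm user to the qemu group, then restart vdsm so it picks up the new membership
usermod -a -G qemu vdsm
service vdsmd restart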
Comment 14 Yaniv Bronhaim 2016-06-28 12:53:28 EDT
The following is the code we run before installing the vdsm rpm. Maybe one of the commands failed because the executable was not installed (getent, useradd, usermod, ...), which led to part of the calls here being skipped.

%pre                                                                            
# Force standard locale behavior (English)                                      
export LC_ALL=C                                                                 
                                                                                
/usr/bin/getent passwd %{vdsm_user} >/dev/null || \                             
    /usr/sbin/useradd -r -u 36 -g %{vdsm_group} -d /var/lib/vdsm \              
        -s /sbin/nologin -c "Node Virtualization Manager" %{vdsm_user}          
/usr/sbin/usermod -a -G %{qemu_group},%{snlk_group} %{vdsm_user}                
/usr/sbin/usermod -a -G %{cdrom_group} %{qemu_user}                             
                                                                                
# We keep the previous rpm version number in a file for managing upgrade flow   
# in vdsmd_init_script upgraded_version_check task                              
if [ "$1" -gt 1 ]; then                                                         
    rpm -q %{vdsm_name} > "%{_localstatedir}/lib/%{vdsm_name}/upgraded_version" 
fi  

Please verify whether this could be the case. Maybe you can check /var/log/messages after the vdsm rpm installation and find which command failed to run?
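
One way to check that hypothesis on an installed node (a sketch; it only verifies that the tools %pre needs are present and that the usermod call took effect):

# Are the tools used by %pre installed?
rpm -q shadow-utils
command -v getent useradd usermod
# Did the usermod call in %pre take effect?
id -nG vdsm | grep -qw qemu || echo "vdsm is NOT in the qemu group"
# Look for scriptlet errors logged around installation time
grep -iE 'useradd|usermod' /var/log/messages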
Comment 15 Douglas Schilling Landgraf 2016-06-28 14:35:36 EDT
(In reply to Yaniv Bronhaim from comment #14)
> The following is the code we run before installing vdsm rpm. maybe one of
> the command failed because the executable was not installed (getent,
> useradd, usermod .. ) and it leaded to skip part of the calls here
> 
> %pre
> # Force standard locale behavior (English)
> export LC_ALL=C
>
> /usr/bin/getent passwd %{vdsm_user} >/dev/null || \
>     /usr/sbin/useradd -r -u 36 -g %{vdsm_group} -d /var/lib/vdsm \
>         -s /sbin/nologin -c "Node Virtualization Manager" %{vdsm_user}
> /usr/sbin/usermod -a -G %{qemu_group},%{snlk_group} %{vdsm_user}
> /usr/sbin/usermod -a -G %{cdrom_group} %{qemu_user}
>
> # We keep the previous rpm version number in a file for managing upgrade flow
> # in vdsmd_init_script upgraded_version_check task
> if [ "$1" -gt 1 ]; then
>     rpm -q %{vdsm_name} > "%{_localstatedir}/lib/%{vdsm_name}/upgraded_version"
> fi
> 
> please verify if this can be the case. maybe you can check /var/log/messages
> after vdsm rpm installation and find which command failed to run..?

Thanks Yaniv. It looks like a requirement was missing in the vdsm spec; I manually built vdsm with a requirement on shadow-utils and the registration now goes ahead. Same issue the sanlock folks faced: https://bugzilla.redhat.com/show_bug.cgi?id=1344139#c1

This looks like it happened because some dependency dropped its requirement on the user{add,mod,del} tools, which affected both packages. A patch is ready for review.
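
For reference, the gerrit patches listed above ("spec: Add shadow-utils as requirement") add the missing dependency to the vdsm spec; a minimal sketch of that kind of change (the exact tag and placement in vdsm.spec may differ from the merged patch):

# vdsm.spec (sketch): shadow-utils provides useradd/usermod, which the %pre scriptlet relies on
Requires(pre): shadow-utils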
Comment 16 Oved Ourfali 2016-06-29 08:30:51 EDT
So this will require a release of vdsm for 3.5?
Comment 17 Douglas Schilling Landgraf 2016-06-29 09:37:31 EDT
(In reply to Oved Ourfali from comment #16)
> So this will require a release of vdsm for 3.5?

Yes, please.
Comment 18 Oved Ourfali 2016-06-29 09:40:37 EDT
So I'm not sure there is one.
Eyal?

Douglas - can you handle that on node level?
Comment 19 Douglas Schilling Landgraf 2016-06-29 09:47:39 EDT
(In reply to Oved Ourfali from comment #18)
> So I'm not sure there is one.
> Eyal?
> 
> Douglas - can you handle that on node level?

The best approach is a VDSM rebuild, as it's a requirement for vdsm installation.
Additionally, VDSM needs to be rebuilt anyway because of bz#1349068.
Comment 20 Oved Ourfali 2016-06-29 09:51:41 EDT
(In reply to Douglas Schilling Landgraf from comment #19)
> (In reply to Oved Ourfali from comment #18)
> > So I'm not sure there is one.
> > Eyal?
> > 
> > Douglas - can you handle that on node level?
> 
> The best approach is VDSM rebuild as it's a requirement for vdsm
> installation. 
> Additionally, VDSM needs to be rebuild anyway because of bz#1349068.

I see bz#1349068 is targeted to 3.6.8.
Douglas - anyway, I'm assigning it to you, okay?
I'm currently targeting the bug to 3.6.8, but if there is a VDSM build for 3.5 then it should go there as well.
Comment 21 Douglas Schilling Landgraf 2016-06-29 10:01:36 EDT
(In reply to Oved Ourfali from comment #20)
> (In reply to Douglas Schilling Landgraf from comment #19)
> > (In reply to Oved Ourfali from comment #18)
> > > So I'm not sure there is one.
> > > Eyal?
> > > 
> > > Douglas - can you handle that on node level?
> > 
> > The best approach is VDSM rebuild as it's a requirement for vdsm
> > installation. 
> > Additionally, VDSM needs to be rebuild anyway because of bz#1349068.
> 
> I see bz#1349068 is targeted to 3.6.8.
> Douglas - Anyway, I assign it to you, okay?
> Currently targeting the bug to 3.6.8, but if there is a VDSM build for 3.5
> then it should be there as well.

Both bugs are blocking the rhevh build for 3.5 on top of RHEL 6.8. I don't think the target is correct. Moran, could you please help?
Comment 23 Dan Kenigsberg 2016-06-29 10:59:27 EDT
In my opinion, useradd should be "shoved" into the node, without waiting for another vdsm respin - just as I suggested for 3.6.
Comment 32 Pavol Brilla 2016-08-16 10:04:24 EDT
vdsm now requires shadow-utils as a dependency:

# yum list vdsm && yum deplist vdsm | grep shadow 
Loaded plugins: product-id, search-disabled-repos
Available Packages
vdsm.x86_64                       4.18.11-1.el7ev                        4.0.2-8
  dependency: shadow-utils
   provider: shadow-utils.x86_64 2:4.1.5.1-18.el7
Comment 34 errata-xmlrpc 2016-08-23 16:16:41 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-1671.html
