Bug 1349218
Summary: | Host localhost installation failed. Failed to configure management network on the host. | | |
---|---|---|---|
Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Douglas Schilling Landgraf <dougsland> |
Component: | vdsm | Assignee: | Yaniv Bronhaim <ybronhei> |
Status: | CLOSED ERRATA | QA Contact: | Pavol Brilla <pbrilla> |
Severity: | urgent | Docs Contact: | |
Priority: | urgent | | |
Version: | 3.5.7 | CC: | acanan, bazulay, cshao, danken, dougsland, eedri, fromani, gklein, lsurette, mgoldboi, mkalinin, oourfali, pkliczew, pstehlik, rhev-integ, srevivo, tlitovsk, ycui, ykaul |
Target Milestone: | ovirt-4.0.1 | Keywords: | ZStream |
Target Release: | --- | | |
Hardware: | All | | |
OS: | Linux | | |
Whiteboard: | | | |
Fixed In Version: | | Doc Type: | If docs needed, set a value |
Doc Text: | | Story Points: | --- |
Clone Of: | | | |
: | 1353105 | Environment: | |
Last Closed: | 2016-08-23 20:16:41 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | Infra | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Bug Depends On: | | | |
Bug Blocks: | 1353105 | | |
Attachments: | | | |

Description
Douglas Schilling Landgraf
2016-06-23 01:16:59 UTC
# ifconfig -a
;vdsmdummy; Link encap:Ethernet  HWaddr 92:97:F3:86:7B:25
          BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)

eth0      Link encap:Ethernet  HWaddr 52:54:00:C5:0C:39
          inet6 addr: fe80::5054:ff:fec5:c39/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:2260 errors:0 dropped:0 overruns:0 frame:0
          TX packets:971 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:1575933 (1.5 MiB)  TX bytes:402677 (393.2 KiB)
          Interrupt:11 Base address:0x2000

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:381 errors:0 dropped:0 overruns:0 frame:0
          TX packets:381 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:64923 (63.4 KiB)  TX bytes:64923 (63.4 KiB)

rhevm     Link encap:Ethernet  HWaddr 52:54:00:C5:0C:39
          inet addr:192.168.122.98  Bcast:192.168.122.255  Mask:255.255.255.0
          inet6 addr: fe80::5054:ff:fec5:c39/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1005 errors:0 dropped:0 overruns:0 frame:0
          TX packets:611 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:96289 (94.0 KiB)  TX bytes:357943 (349.5 KiB)

virsh > net-list --all
Name                 State      Autostart     Persistent
--------------------------------------------------
;vdsmdummy;          active     no            no
default              inactive   no            yes
vdsm-rhevm           active     yes           yes

Created attachment 1171164 [details]
ovirt-20160622210811-192.168.122.98-41116180.log
Created attachment 1171165 [details]
vdsm logs
libvirtError: Failed to connect socket to '/var/run/libvirt/libvirt-sock': Permission denied

this is an infra issue.

Does `virsh -r list` work? And does `virsh list` (with shibboleth of course).

(In reply to Douglas Schilling Landgraf from comment #0)
> Additional info:
> After some minutes I see the host status change to "Unassigned" from
> "Host localhost was autorecovered" operation.

This sounds much like bug 1348103

It looks like different issue. I see plenty of:

Thread-12::ERROR::2016-06-23 01:08:33,639::sampling::547::vds::(run) Error while sampling stats
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/sampling.py", line 529, in run
  File "/usr/share/vdsm/virt/sampling.py", line 519, in sample
  File "/usr/share/vdsm/virt/sampling.py", line 294, in __init__
  File "/usr/share/vdsm/virt/sampling.py", line 182, in __init__
  File "/usr/lib/python2.6/site-packages/vdsm/utils.py", line 987, in __call__
  File "/usr/share/vdsm/caps.py", line 283, in getNumaTopology
  File "/usr/lib/python2.6/site-packages/vdsm/utils.py", line 987, in __call__
  File "/usr/share/vdsm/caps.py", line 183, in _getCapsXMLStr
  File "/usr/lib/python2.6/site-packages/vdsm/libvirtconnection.py", line 160, in get
  File "/usr/lib/python2.6/site-packages/vdsm/libvirtconnection.py", line 95, in _open_qemu_connection
  File "/usr/lib/python2.6/site-packages/vdsm/utils.py", line 1109, in retry
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 102, in openAuth
libvirtError: Failed to connect socket to '/var/run/libvirt/libvirt-sock': Permission denied

in the logs. Francesco have you seen it before?

(In reply to Piotr Kliczewski from comment #7)
> It looks like different issue.
> I see plenty of:
>
> Thread-12::ERROR::2016-06-23 01:08:33,639::sampling::547::vds::(run) Error while sampling stats
> Traceback (most recent call last):
>   File "/usr/share/vdsm/virt/sampling.py", line 529, in run
>   File "/usr/share/vdsm/virt/sampling.py", line 519, in sample
>   File "/usr/share/vdsm/virt/sampling.py", line 294, in __init__
>   File "/usr/share/vdsm/virt/sampling.py", line 182, in __init__
>   File "/usr/lib/python2.6/site-packages/vdsm/utils.py", line 987, in __call__
>   File "/usr/share/vdsm/caps.py", line 283, in getNumaTopology
>   File "/usr/lib/python2.6/site-packages/vdsm/utils.py", line 987, in __call__
>   File "/usr/share/vdsm/caps.py", line 183, in _getCapsXMLStr
>   File "/usr/lib/python2.6/site-packages/vdsm/libvirtconnection.py", line 160, in get
>   File "/usr/lib/python2.6/site-packages/vdsm/libvirtconnection.py", line 95, in _open_qemu_connection
>   File "/usr/lib/python2.6/site-packages/vdsm/utils.py", line 1109, in retry
>   File "/usr/lib64/python2.6/site-packages/libvirt.py", line 102, in openAuth
> libvirtError: Failed to connect socket to '/var/run/libvirt/libvirt-sock': Permission denied
>
> in the logs. Francesco have you seen it before?

It is the same issue Dan mentioned previously.

Looks new to me, I can only tell that AFAIK the socket should look like

$ ls -lh /var/run/libvirt/libvirt-sock
srwxrwx---. 1 root qemu 0 Jun 23 14:31 /var/run/libvirt/libvirt-sock

(In reply to Dan Kenigsberg from comment #5)
> libvirtError: Failed to connect socket to '/var/run/libvirt/libvirt-sock':
> Permission denied
>
> this is an infra issue.
>
> Does `virsh -r list` work? And does `virsh list` (with shibboleth of course).

# virsh -r
Welcome to virsh, the virtualization interactive terminal.

Type:  'help' for help with commands
       'quit' to quit

virsh > list
 Id    Name                           State
----------------------------------------------------

virsh > quit

# virsh list
Please enter your authentication name: vdsm@rhevh
Please enter your password:
 Id    Name                           State
----------------------------------------------------

[root@localhost ~]#

User vdsm is not configured well: it is not in the qemu group for some reason. Running "usermod -a -G qemu vdsm" and restarting the service solves it. Not sure how it happens, as in vdsm 4.16.37 we do - /usr/sbin/usermod -a -G %{qemu_group},%{snlk_group} %{vdsm_user} , but maybe someone modified it before the build?

The following is the code we run before installing the vdsm rpm. Maybe one of the commands failed because the executable was not installed (getent, useradd, usermod ..) and it led to skipping part of the calls here:

%pre
# Force standard locale behavior (English)
export LC_ALL=C

/usr/bin/getent passwd %{vdsm_user} >/dev/null || \
    /usr/sbin/useradd -r -u 36 -g %{vdsm_group} -d /var/lib/vdsm \
        -s /sbin/nologin -c "Node Virtualization Manager" %{vdsm_user}
/usr/sbin/usermod -a -G %{qemu_group},%{snlk_group} %{vdsm_user}
/usr/sbin/usermod -a -G %{cdrom_group} %{qemu_user}

# We keep the previous rpm version number in a file for managing upgrade flow
# in vdsmd_init_script upgraded_version_check task
if [ "$1" -gt 1 ]; then
    rpm -q %{vdsm_name} > "%{_localstatedir}/lib/%{vdsm_name}/upgraded_version"
fi

Please verify if this can be the case. Maybe you can check /var/log/messages after the vdsm rpm installation and find which command failed to run?

(In reply to Yaniv Bronhaim from comment #14)
> The following is the code we run before installing the vdsm rpm. Maybe one of
> the commands failed because the executable was not installed (getent,
> useradd, usermod ..
> ) and it led to skipping part of the calls here:
>
> %pre
> # Force standard locale behavior (English)
> export LC_ALL=C
>
> /usr/bin/getent passwd %{vdsm_user} >/dev/null || \
>     /usr/sbin/useradd -r -u 36 -g %{vdsm_group} -d /var/lib/vdsm \
>         -s /sbin/nologin -c "Node Virtualization Manager" %{vdsm_user}
> /usr/sbin/usermod -a -G %{qemu_group},%{snlk_group} %{vdsm_user}
> /usr/sbin/usermod -a -G %{cdrom_group} %{qemu_user}
>
> # We keep the previous rpm version number in a file for managing upgrade flow
> # in vdsmd_init_script upgraded_version_check task
> if [ "$1" -gt 1 ]; then
>     rpm -q %{vdsm_name} > "%{_localstatedir}/lib/%{vdsm_name}/upgraded_version"
> fi
>
> please verify if this can be the case. maybe you can check /var/log/messages
> after vdsm rpm installation and find which command failed to run..?

Thanks Yaniv, it looks like a requirement was missing in the vdsm spec. I manually built vdsm with a requirement on shadow-utils and the registration now goes ahead. Same issue that the sanlock folks faced:
https://bugzilla.redhat.com/show_bug.cgi?id=1344139#c1

This looks like it happened because some dependency removed its requirement on the user{add,mod,del} tools, which affected both packages.

Patch is ready for review.

So this will require a release of vdsm for 3.5?

(In reply to Oved Ourfali from comment #16)
> So this will require a release of vdsm for 3.5?

Yes, please.

So I'm not sure there is one.
Eyal?

Douglas - can you handle that on node level?

(In reply to Oved Ourfali from comment #18)
> So I'm not sure there is one.
> Eyal?
>
> Douglas - can you handle that on node level?

The best approach is a VDSM rebuild, as it's a requirement for vdsm installation.
Additionally, VDSM needs to be rebuilt anyway because of bz#1349068.

(In reply to Douglas Schilling Landgraf from comment #19)
> (In reply to Oved Ourfali from comment #18)
> > So I'm not sure there is one.
> > Eyal?
> >
> > Douglas - can you handle that on node level?
>
> The best approach is VDSM rebuild as it's a requirement for vdsm
> installation.
> Additionally, VDSM needs to be rebuilt anyway because of bz#1349068.

I see bz#1349068 is targeted to 3.6.8.
Douglas - Anyway, I assign it to you, okay?
Currently targeting the bug to 3.6.8, but if there is a VDSM build for 3.5 then it should be there as well.

(In reply to Oved Ourfali from comment #20)
> (In reply to Douglas Schilling Landgraf from comment #19)
> > (In reply to Oved Ourfali from comment #18)
> > > So I'm not sure there is one.
> > > Eyal?
> > >
> > > Douglas - can you handle that on node level?
> >
> > The best approach is VDSM rebuild as it's a requirement for vdsm
> > installation.
> > Additionally, VDSM needs to be rebuilt anyway because of bz#1349068.
>
> I see bz#1349068 is targeted to 3.6.8.
> Douglas - Anyway, I assign it to you, okay?
> Currently targeting the bug to 3.6.8, but if there is a VDSM build for 3.5
> then it should be there as well.

Both bugs are blocking the rhevh build for 3.5 on top of RHEL 6.8. I don't think the target is correct. Moran, could you please help?

In my opinion, useradd should be "shoved" into node, without waiting for another vdsm respin - just like what I suggested for 3.6.

vdsm is requiring shadow-utils as a dependency:

# yum list vdsm && yum deplist vdsm | grep shadow
Loaded plugins: product-id, search-disabled-repos
Available Packages
vdsm.x86_64    4.18.11-1.el7ev    4.0.2-8
  dependency: shadow-utils
   provider: shadow-utils.x86_64 2:4.1.5.1-18.el7

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-1671.html
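
For anyone hitting the same "Permission denied" on the libvirt socket, the diagnosis in this thread (user vdsm missing from the qemu group) can be checked with a short script. The following is only a minimal sketch, not part of vdsm: the function name `missing_groups` and its injectable lookup parameters are hypothetical, and the only names taken from this report are the user `vdsm` and the group `qemu`. It uses the Python standard library `pwd` and `grp` modules.

```python
import grp
import pwd


def missing_groups(user, required, getgrnam=grp.getgrnam, getpwnam=pwd.getpwnam):
    """Return the group names in `required` that `user` does not belong to.

    Hypothetical helper for illustration only. The lookup functions are
    parameters so the check can be exercised without touching the real
    passwd/group databases.
    """
    # A user belongs to a group either through the primary gid in passwd
    # or by being listed as a supplementary member in the group entry.
    primary_gid = getpwnam(user).pw_gid
    missing = []
    for name in required:
        try:
            entry = getgrnam(name)
        except KeyError:
            # The group itself does not exist, so membership is impossible.
            missing.append(name)
            continue
        if user not in entry.gr_mem and entry.gr_gid != primary_gid:
            missing.append(name)
    return missing
```

On an affected host, `missing_groups("vdsm", ["qemu"])` would be expected to return `["qemu"]`, and an empty list after running the `usermod -a -G qemu vdsm` workaround mentioned above. If the user does not exist at all, `pwd.getpwnam` raises `KeyError`, which the sketch deliberately lets propagate.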