Bug 1318812 - Activating a host fails if it has nrpe installed
Summary: Activating a host fails if it has nrpe installed
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: vdsm
Classification: oVirt
Component: Core
Version: ---
Hardware: x86_64
OS: Linux
Priority: low
Severity: low
Target Milestone: ovirt-4.1.0-alpha
Target Release: ---
Assignee: Edward Haas
QA Contact: Pavel Stehlik
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-03-17 22:23 UTC by Darryl Bond
Modified: 2016-06-14 21:42 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-06-14 21:42:19 UTC
oVirt Team: Network
Embargoed:
danken: ovirt-4.1?
rule-engine: planning_ack?
rule-engine: devel_ack?
rule-engine: testing_ack?


Attachments
host deploy log from engine server (26.96 KB, application/x-gzip)
2016-03-17 22:23 UTC, Darryl Bond
vdsm.log (863.21 KB, application/octet-stream)
2016-03-23 03:13 UTC, Darryl Bond
supervdsm.log (477.68 KB, application/x-xz)
2016-03-23 03:14 UTC, Darryl Bond
host h5 vdsm log (32.05 KB, application/x-gzip)
2016-03-24 01:03 UTC, Darryl Bond
supervdsm log on h5 (1.22 KB, application/x-gzip)
2016-03-24 01:04 UTC, Darryl Bond
ovirt-host-deploy log from failed deploy (54.36 KB, application/x-gzip)
2016-03-24 01:16 UTC, Darryl Bond
vdsm.log after successful re-install after nrpe is removed (49.73 KB, application/x-gzip)
2016-03-24 01:57 UTC, Darryl Bond
supervdsm.log after successful deploy after nrpe was removed (9.32 KB, application/x-gzip)
2016-03-24 01:58 UTC, Darryl Bond
engine log after successful deploy (28.44 KB, application/x-gzip)
2016-03-24 01:59 UTC, Darryl Bond
Engine log for failed deploy (24.70 KB, application/x-gzip)
2016-03-29 01:42 UTC, Darryl Bond
Engine.log after successful deploy (32.55 KB, application/x-gzip)
2016-03-29 01:47 UTC, Darryl Bond

Description Darryl Bond 2016-03-17 22:23:04 UTC
Created attachment 1137533 [details]
host deploy log from engine server

Description of problem: 
A host with nrpe installed fails to activate. The errors displayed are:
Mar 17, 2016 10:09:48 AM
Host ovirt36-h6 installation failed. Failed to configure management network on the host.
4a8dcd3d
oVirt

Mar 17, 2016 10:09:48 AM
Failed to configure management network on host ovirt36-h6 due to setup networks failure.
oVirt


Version-Release number of selected component (if applicable):
CentOS 7.2
oVirt 3.6.3.4

How reproducible: 
Fails every time until nrpe is removed (stopping the nrpe service is not enough)


Steps to Reproduce:
1. Install CentOS 7.2, run yum update, then yum install nrpe nagios-plugins-all
2. Attempt to deploy the host via the engine GUI (New)

Actual results:
Host deploy fails when configuring the management network.

Expected results:
Successful deploy (Subject to networks compliance on the host)


Additional info:
This was the first of our hosts where we used Ansible for the initial configuration, which deployed our monitoring customisations such as nrpe. Previous host deploys had proceeded successfully, with the Ansible script run after the deploy. For this host we used Ansible to set the host up ready for deploy (installing the oVirt release rpm, etc.).

Noticed that deploy was restarting nrpe, wondered why it even cared.
Stopped nrpe, ovirt reinstall still started it and failed.
Removed nrpe, ovirt reinstall configured ovirtmgmt network correctly and succeeded.


Log from the GUI:

Mar 17, 2016 10:11:34 AM
Status of host ovirt36-h6 was set to NonOperational.
oVirt

Mar 17, 2016 10:11:33 AM
Host ovirt36-h6 does not comply with the cluster SandyBridge networks, the following networks are missing on host: 'Admin,Control,iSCSI,MGMT,ovirtmgmt'
47125e91
oVirt

Mar 17, 2016 10:09:48 AM
Host ovirt36-h6 installation failed. Failed to configure management network on the host.
4a8dcd3d
oVirt

Mar 17, 2016 10:09:48 AM
Failed to configure management network on host ovirt36-h6 due to setup networks failure.
oVirt

Mar 17, 2016 10:09:40 AM
Installing Host ovirt36-h6. Stage: Termination.
4a8dcd3d
oVirt

Mar 17, 2016 10:09:39 AM
Installing Host ovirt36-h6. Retrieving installation logs to: '/var/log/ovirt-engine/host-deploy/ovirt-host-deploy-20160317100939-ovirt36-h6.gps.local-4a8dcd3d.log'.
4a8dcd3d
oVirt

Mar 17, 2016 10:09:39 AM
Installing Host ovirt36-h6. Stage: Pre-termination.
4a8dcd3d
oVirt

Mar 17, 2016 10:09:39 AM
Installing Host ovirt36-h6. Starting ovirt-vmconsole-host-sshd.
4a8dcd3d
oVirt

Mar 17, 2016 10:09:38 AM
Installing Host ovirt36-h6. Starting vdsm.
4a8dcd3d
oVirt

Mar 17, 2016 10:09:37 AM
Installing Host ovirt36-h6. Stopping libvirtd.
4a8dcd3d
oVirt

Mar 17, 2016 10:09:36 AM
Installing Host ovirt36-h6. Restarting nrpe service.
4a8dcd3d
oVirt

Mar 17, 2016 10:09:36 AM
Installing Host ovirt36-h6. Stage: Closing up.
4a8dcd3d
oVirt

Mar 17, 2016 10:09:36 AM
Installing Host ovirt36-h6. Stage: Transaction commit.
4a8dcd3d
oVirt

Mar 17, 2016 10:09:34 AM
Installing Host ovirt36-h6. Enrolling serial console certificate.
4a8dcd3d
oVirt

Mar 17, 2016 10:09:32 AM
Installing Host ovirt36-h6. Enrolling certificate.
4a8dcd3d
oVirt

Mar 17, 2016 10:09:32 AM
Installing Host ovirt36-h6. Stage: Misc configuration.
4a8dcd3d
oVirt

Mar 17, 2016 10:09:30 AM
Installing Host ovirt36-h6. Stage: Package installation.
4a8dcd3d
oVirt

Mar 17, 2016 10:09:30 AM
Installing Host ovirt36-h6. Stage: Misc configuration.
4a8dcd3d
oVirt

Mar 17, 2016 10:09:30 AM
Installing Host ovirt36-h6. Stage: Transaction setup.
4a8dcd3d
oVirt

Mar 17, 2016 10:09:30 AM
Installing Host ovirt36-h6. Hardware supports virtualization.
4a8dcd3d
oVirt

Mar 17, 2016 10:09:30 AM
Installing Host ovirt36-h6. Stage: Setup validation.
4a8dcd3d
oVirt

Mar 17, 2016 10:09:29 AM
Installing Host ovirt36-h6. Disabling Kdump integration.
4a8dcd3d
oVirt

Mar 17, 2016 10:09:29 AM
Installing Host ovirt36-h6. Logs at host located at: '/tmp/ovirt-host-deploy-20160317100921-l9fuvv.log'.
4a8dcd3d
oVirt

Mar 17, 2016 10:09:29 AM
Installing Host ovirt36-h6. Kdump supported.
4a8dcd3d
oVirt

Mar 17, 2016 10:09:29 AM
Installing Host ovirt36-h6. Stage: Environment customization.
4a8dcd3d
oVirt

Mar 17, 2016 10:09:29 AM
Installing Host ovirt36-h6. Stage: Programs detection.
4a8dcd3d
oVirt
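
For anyone triaging a similar event log, the non-compliance message above already lists the networks the host is missing. The following is a minimal sketch, not part of oVirt, that pulls them out of such a message. The regular expression is an assumption based only on the single message quoted in this report, and `missing_networks` is a hypothetical helper name.

```python
import re

# Assumption: the message layout matches the one event quoted in this report.
MISSING_RE = re.compile(
    r"Host (?P<host>\S+) does not comply with the cluster (?P<cluster>\S+) networks, "
    r"the following networks are missing on host: '(?P<networks>[^']*)'"
)


def missing_networks(message):
    """Return (host, cluster, [networks]), or None if the message does not match."""
    match = MISSING_RE.search(message)
    if match is None:
        return None
    names = [n.strip() for n in match.group("networks").split(",") if n.strip()]
    return match.group("host"), match.group("cluster"), names


event = (
    "Host ovirt36-h6 does not comply with the cluster SandyBridge networks, "
    "the following networks are missing on host: "
    "'Admin,Control,iSCSI,MGMT,ovirtmgmt'"
)
print(missing_networks(event))
```

A non-empty list here means the host cannot become operational until those networks are configured on it, whatever else is installed.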

Comment 1 Sandro Bonazzola 2016-03-18 09:19:58 UTC
Dan, can you please check why the host fails network configuration if nrpe is installed? I guess people may be interested in monitoring the network with Nagios, so I'd like to avoid removing it during host deployment.

Comment 2 Dan Kenigsberg 2016-03-20 10:27:58 UTC
Darryl, please attach {super,}vdsm.log from the host that you fail to install.

Comment 3 Darryl Bond 2016-03-23 03:13:54 UTC
Created attachment 1139313 [details]
vdsm.log

Log of first failed attempt

Comment 4 Darryl Bond 2016-03-23 03:14:31 UTC
Created attachment 1139314 [details]
supervdsm.log

Comment 5 Darryl Bond 2016-03-23 03:16:30 UTC
I just removed and re-installed a host that was otherwise working fine. It had its networks configured etc. It successfully activated with nrpe running.

Comment 6 Edward Haas 2016-03-23 06:54:08 UTC
Not necessarily related, but the log shows the Host.ping tight loop that was solved by https://gerrit.ovirt.org/#/c/54644/

08:59:51,535::__init__::533::jsonrpc.JsonRpcServer::(_serveRequest) Return 'Host.ping' in bridge with True
jsonrpc.Executor/6::DEBUG::2016-03-17 08:59:51,538::__init__::503::jsonrpc.JsonRpcServer::(_serveRequest) Calling 'Host.ping' in bridge with {}
jsonrpc.Executor/6::DEBUG::2016-03-17 08:59:51,538::__init__::533::jsonrpc.JsonRpcServer::(_serveRequest) Return 'Host.ping' in bridge with True
jsonrpc.Executor/7::DEBUG::2016-03-17 08:59:51,707::__init__::503::jsonrpc.JsonRpcServer::(_serveRequest) Calling 'Host.getCapabilities' in bridge with {}



'enp9s0': {'addr': '10.4.14.42', 'ipv6gateway': '::', 'ipv6addrs': ['fe80::76d0:2bff:fec8:dcba/64'], 'mtu': '1500', 'dhcpv4': False, 'netmask': '255.255.0.0', 'dhcpv6': False, 'ipv4addrs': ['10.4.14.42/16'], 'cfg': {'DOMAIN': 'gps.local', 'NAME': 'enp9s0', 'DNS3': '10.4.171.3', 'DNS2': '10.4.171.2', 'DNS1': '10.4.171.1', 'DEFROUTE': 'yes', 'IPV6_AUTOCONF': 'yes', 'PREFIX': '16', 'IPV6_DEFROUTE': 'yes', 'IPV6_FAILURE_FATAL': 'no', 'IPV6_PEERROUTES': 'yes', 'IPV4_FAILURE_FATAL': 'no', 'ONBOOT': 'yes', 'IPV6_PEERDNS': 'yes', 'IPV6_PRIVACY': 'no', 'BOOTPROTO': 'none', 'DEVICE': 'enp9s0', 'UUID': 'de4f2cce-e2dd-452a-bf6a-ba48d69d4344', 'IPV6INIT': 'yes', 'IPADDR': '10.4.14.42', 'TYPE': 'Ethernet', 'GATEWAY': '10.4.254.254'}, 'hwaddr': '74:d0:2b:c8:dc:ba', 'speed': 1000, 'gateway': '10.4.254.254'}

Did you add the host using the 10.4.14.42 address?
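
As a quick sanity check on the values reported for enp9s0 above, the following sketch (not part of VDSM) uses Python's standard `ipaddress` module to confirm that the address, netmask and gateway are mutually consistent. `gateway_reachable` is a hypothetical helper name introduced only for this illustration.

```python
import ipaddress

# Values copied from the 'enp9s0' entry in the capabilities output above.
iface = ipaddress.ip_interface("10.4.14.42/16")
gateway = ipaddress.ip_address("10.4.254.254")
netmask = "255.255.0.0"


def gateway_reachable(interface, gw):
    """True if the gateway lies inside the interface's directly connected network."""
    return gw in interface.network


print(iface.network)                      # 10.4.0.0/16
print(str(iface.netmask) == netmask)      # True
print(gateway_reachable(iface, gateway))  # True
```

The static configuration itself looks coherent, so the open question is only which address was used when adding the host.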

Comment 7 Edward Haas 2016-03-23 06:56:01 UTC
The vdsm and supervdsm logs seem to be from different time ranges.
Please recreate the problem and send the two logs of that period.

Comment 8 Darryl Bond 2016-03-24 01:03:34 UTC
Created attachment 1139805 [details]
host h5 vdsm log

VDSM.log on failed host

Comment 9 Darryl Bond 2016-03-24 01:04:21 UTC
Created attachment 1139806 [details]
supervdsm log  on h5

supervdsm log on failed host

Comment 10 Darryl Bond 2016-03-24 01:06:58 UTC
Here is a freshly installed host with nrpe installed, which fails in the same way as before.

Note: the networks must be configured on the host after it is installed.
The first thing you notice is that the ovirtmgmt network is not attached to the initial interface. When nrpe is removed, this succeeds.

Comment 11 Darryl Bond 2016-03-24 01:16:41 UTC
Created attachment 1139810 [details]
ovirt-host-deploy log from failed deploy

Comment 12 Darryl Bond 2016-03-24 01:57:44 UTC
Created attachment 1139812 [details]
vdsm.log after successful re-install after nrpe is removed

Comment 13 Darryl Bond 2016-03-24 01:58:22 UTC
Created attachment 1139813 [details]
supervdsm.log after successful deploy after nrpe was removed

Comment 14 Darryl Bond 2016-03-24 01:59:08 UTC
Created attachment 1139815 [details]
engine log after successful deploy

Comment 15 Darryl Bond 2016-03-24 02:11:50 UTC
Performed a re-install after nrpe package was removed.
1. Could not be performed until the failed network tasks were removed. The only way I know to do this is to run engine-setup on the engine which clears tasks and locks.

2. Install still fails due to the network not being configured correctly. Some of the storage relies on the additional networks having IP addresses on the host.

3. The ovirtmgmt network is attached to the default interface after nrpe is removed and the networks can be configured as normal.

4. The host is then successfully activated

Further information:
If the host (after the initial successful install with nrpe removed) has nrpe re-installed and started, and is then removed and re-added (via New), it will activate correctly. I assume this is because the networks are already set up. Nrpe somehow affects the attachment of ovirtmgmt to the default interface, which then breaks the network configuration.

Comment 16 Edward Haas 2016-03-24 07:08:00 UTC
Comment on attachment 1139810 [details]
ovirt-host-deploy log from failed deploy

Can you please provide the engine.log file from the Engine server?
/var/log/ovirt-engine/engine.log

Comment 17 Darryl Bond 2016-03-29 01:42:10 UTC
Created attachment 1141050 [details]
Engine log for failed deploy

Comment 18 Darryl Bond 2016-03-29 01:47:44 UTC
Created attachment 1141062 [details]
Engine.log after successful deploy

Comment 19 Sandro Bonazzola 2016-05-02 09:52:06 UTC
Moving from 4.0 alpha to 4.0 beta since 4.0 alpha has been already released and bug is not ON_QA.

Comment 20 Yaniv Lavi 2016-05-23 13:15:10 UTC
oVirt 4.0 beta has been released, moving to RC milestone.

Comment 21 Yaniv Lavi 2016-05-23 13:19:55 UTC
oVirt 4.0 beta has been released, moving to RC milestone.

Comment 22 Edward Haas 2016-05-24 06:35:28 UTC
I could not see any useful information from Engine or VDSM, with relation to networking.
Recreating a setup with nrpe installed has not shown any problem.


I can see an error in the ovirt-host-deploy log (see below).
Didi, is this a known issue? Is it related?

2016-03-24 10:48:58 DEBUG otopi.plugins.otopi.packagers.dnfpackager dnfpackager._boot:178 Cannot initialize minidnf
Traceback (most recent call last):
  File "/tmp/ovirt-8HvEFjNVVJ/otopi-plugins/otopi/packagers/dnfpackager.py", line 165, in _boot
    constants.PackEnv.DNF_DISABLED_PLUGINS
  File "/tmp/ovirt-8HvEFjNVVJ/otopi-plugins/otopi/packagers/dnfpackager.py", line 75, in _getMiniDNF
    from otopi import minidnf
  File "/tmp/ovirt-8HvEFjNVVJ/pythonlib/otopi/minidnf.py", line 31, in <module>
    import dnf
ImportError: No module named dnf
2016-03-24 10:48:58 DEBUG otopi.context context.dumpEnvironment:500 ENVIRONMENT DUMP - BEGIN

Comment 23 Yedidyah Bar David 2016-05-24 07:00:09 UTC
(In reply to Edward Haas from comment #22)
> I could not see any useful information from Engine or VDSM, with relation to
> networking.
> Recreating a setup with nrpe installed has not shown any problem.
> 
> 
> I can see an error in the ovirt-host-deploy log (see below).
> Didi, is this a known issue? Is it related?
> 
> 2016-03-24 10:48:58 DEBUG otopi.plugins.otopi.packagers.dnfpackager
> dnfpackager._boot:178 Cannot initialize minidnf
> Traceback (most recent call last):
>   File "/tmp/ovirt-8HvEFjNVVJ/otopi-plugins/otopi/packagers/dnfpackager.py",
> line 165, in _boot
>     constants.PackEnv.DNF_DISABLED_PLUGINS
>   File "/tmp/ovirt-8HvEFjNVVJ/otopi-plugins/otopi/packagers/dnfpackager.py",
> line 75, in _getMiniDNF
>     from otopi import minidnf
>   File "/tmp/ovirt-8HvEFjNVVJ/pythonlib/otopi/minidnf.py", line 31, in
> <module>
>     import dnf
> ImportError: No module named dnf
> 2016-03-24 10:48:58 DEBUG otopi.context context.dumpEnvironment:500
> ENVIRONMENT DUMP - BEGIN

No, it's unrelated and can be ignored.

Normally, since we added dnf support, either it or yum will fail, depending on which is installed.

If there is a real problem with the packager, it will fail later on when actually used.
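
To illustrate why one of the two import failures is expected, here is a minimal sketch of the general try-each-backend pattern. This is not the actual otopi code; `first_available` is a hypothetical helper, and the only names taken from this bug are the `dnf` and `yum` module names.

```python
import importlib


def first_available(module_names):
    """Return (name, module) for the first importable module, else (None, None)."""
    for name in module_names:
        try:
            return name, importlib.import_module(name)
        except ImportError:
            # Expected when this backend is not installed; try the next one.
            continue
    return None, None


name, module = first_available(["dnf", "yum"])
print(name or "no packager backend importable")
```

On a host that only has yum, the dnf attempt raises ImportError and is skipped, which is the kind of harmless failure the DEBUG traceback above records.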

Comment 24 Dan Kenigsberg 2016-06-01 10:12:26 UTC
Darryl, do you - unlike Edy - still see the bug? Could you somehow let us log into your system to understand what is wrong there?

Comment 25 Darryl Bond 2016-06-14 03:31:03 UTC
I just re-installed one of the hosts from scratch. Still exhibits the same fault.
I just noticed the journal refers to a failure to configure network vdsm-ovirtmgmt. The ovirtmgmt network is configured.
Jun 14 13:14:58 ovirt36-h7.gps.local multipathd[20928]: dm-3: remove map (uevent)
Jun 14 13:14:58 ovirt36-h7.gps.local multipathd[20928]: dm-3: remove map (uevent)
Jun 14 13:15:01 ovirt36-h7.gps.local vdsm[21130]: vdsm vds.dispatcher ERROR SSL error during reading data: unexpected eof
Jun 14 13:15:01 ovirt36-h7.gps.local python[20910]: DIGEST-MD5 client step 2
Jun 14 13:15:01 ovirt36-h7.gps.local python[20910]: DIGEST-MD5 parse_server_challenge()
Jun 14 13:15:01 ovirt36-h7.gps.local python[20910]: DIGEST-MD5 ask_user_info()
Jun 14 13:15:01 ovirt36-h7.gps.local python[20910]: DIGEST-MD5 client step 2
Jun 14 13:15:01 ovirt36-h7.gps.local python[20910]: DIGEST-MD5 ask_user_info()
Jun 14 13:15:01 ovirt36-h7.gps.local python[20910]: DIGEST-MD5 make_client_response()
Jun 14 13:15:01 ovirt36-h7.gps.local python[20910]: DIGEST-MD5 client step 3
Jun 14 13:15:02 ovirt36-h7.gps.local systemd[1]: Started /usr/sbin/ifup enp9s0.
Jun 14 13:15:02 ovirt36-h7.gps.local systemd[1]: Starting /usr/sbin/ifup enp9s0.
Jun 14 13:15:02 ovirt36-h7.gps.local kernel: r8169 0000:09:00.0 enp9s0: link down
Jun 14 13:15:02 ovirt36-h7.gps.local kernel: IPv6: ADDRCONF(NETDEV_UP): enp9s0: link is not ready
Jun 14 13:15:02 ovirt36-h7.gps.local kernel: r8169 0000:09:00.0 enp9s0: link down
Jun 14 13:15:02 ovirt36-h7.gps.local kernel: device enp9s0 entered promiscuous mode
Jun 14 13:15:02 ovirt36-h7.gps.local systemd[1]: Started /usr/sbin/ifup ovirtmgmt.
Jun 14 13:15:02 ovirt36-h7.gps.local systemd[1]: Starting /usr/sbin/ifup ovirtmgmt.
Jun 14 13:15:02 ovirt36-h7.gps.local kernel: IPv6: ADDRCONF(NETDEV_UP): ovirtmgmt: link is not ready
Jun 14 13:15:04 ovirt36-h7.gps.local daemonAdapter[20910]: libvirt: Network Driver error : Network not found: no network with matching name 'vdsm-ovirtmgmt'
Jun 14 13:15:05 ovirt36-h7.gps.local kernel: r8169 0000:09:00.0 enp9s0: link up
Jun 14 13:15:05 ovirt36-h7.gps.local kernel: IPv6: ADDRCONF(NETDEV_CHANGE): enp9s0: link becomes ready
Jun 14 13:15:05 ovirt36-h7.gps.local kernel: ovirtmgmt: port 1(enp9s0) entered forwarding state
Jun 14 13:15:05 ovirt36-h7.gps.local kernel: ovirtmgmt: port 1(enp9s0) entered forwarding state
Jun 14 13:15:05 ovirt36-h7.gps.local kernel: IPv6: ADDRCONF(NETDEV_CHANGE): ovirtmgmt: link becomes ready
Jun 14 13:15:07 ovirt36-h7.gps.local vdsm[21130]: vdsm vds.dispatcher ERROR SSL error during reading data: unexpected eof
Jun 14 13:15:09 ovirt36-h7.gps.local sshd[18567]: pam_unix(sshd:session): session closed for user root
Jun 14 13:15:09 ovirt36-h7.gps.local systemd-logind[823]: Removed session 9.

Please contact me directly via email to arrange a log in if necessary.
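
For reference, the libvirt error in the journal excerpt above can be picked out mechanically. The following is a minimal sketch under the assumption that the message text matches the single line quoted here; `missing_libvirt_networks` is a hypothetical helper, not an existing tool.

```python
import re

# Assumption: libvirt words the error exactly as in the journal line quoted above.
NOT_FOUND_RE = re.compile(
    r"Network not found: no network with matching name '([^']+)'"
)


def missing_libvirt_networks(journal_lines):
    """Collect network names libvirt reported as not found, in first-seen order."""
    seen = []
    for line in journal_lines:
        match = NOT_FOUND_RE.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


sample = [
    "Jun 14 13:15:02 ovirt36-h7.gps.local systemd[1]: Started /usr/sbin/ifup ovirtmgmt.",
    "Jun 14 13:15:04 ovirt36-h7.gps.local daemonAdapter[20910]: libvirt: Network Driver "
    "error : Network not found: no network with matching name 'vdsm-ovirtmgmt'",
]
print(missing_libvirt_networks(sample))  # ['vdsm-ovirtmgmt']
```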

Comment 26 Darryl Bond 2016-06-14 21:42:19 UTC
Oops, one of my colleagues pointed out that the host wasn't activating because the rest of the networks were not configured, which is not the same problem we were running into with nrpe.
I configured the networks and the host activated just fine.

