| Summary: | Activating a host fails if it has nrpe installed | ||
|---|---|---|---|
| Product: | [oVirt] vdsm | Reporter: | Darryl Bond <darryl.bond> |
| Component: | Core | Assignee: | Edward Haas <edwardh> |
| Status: | CLOSED WORKSFORME | QA Contact: | Pavel Stehlik <pstehlik> |
| Severity: | low | Docs Contact: | |
| Priority: | low | ||
| Version: | --- | CC: | bugs, danken, darryl.bond, didi |
| Target Milestone: | ovirt-4.1.0-alpha | Flags: | danken: ovirt-4.1? rule-engine: planning_ack? rule-engine: devel_ack? rule-engine: testing_ack? |
| Target Release: | --- | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | Bug Fix | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2016-06-14 21:42:19 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | Network | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Attachments: | |||
Dan, can you please check why the host fails network configuration when nrpe is installed? I guess people may want to monitor the network with Nagios, so I'd like to avoid removing it during host deployment.
Darryl, please attach {super,}vdsm.log from the host that fails to install.
Created attachment 1139313 [details]
vdsm.log
Log of first failed attempt
Created attachment 1139314 [details]
supervdsm.log
I just removed and installed a host that was otherwise working fine. It had its networks configured etc. It successfully activated with nrpe running.
Not necessarily related, but we see the ping tight loop solved by https://gerrit.ovirt.org/#/c/54644/
08:59:51,535::__init__::533::jsonrpc.JsonRpcServer::(_serveRequest) Return 'Host.ping' in bridge with True
jsonrpc.Executor/6::DEBUG::2016-03-17 08:59:51,538::__init__::503::jsonrpc.JsonRpcServer::(_serveRequest) Calling 'Host.ping' in bridge with {}
jsonrpc.Executor/6::DEBUG::2016-03-17 08:59:51,538::__init__::533::jsonrpc.JsonRpcServer::(_serveRequest) Return 'Host.ping' in bridge with True
jsonrpc.Executor/7::DEBUG::2016-03-17 08:59:51,707::__init__::503::jsonrpc.JsonRpcServer::(_serveRequest) Calling 'Host.getCapabilities' in bridge with {}
'enp9s0': {'addr': '10.4.14.42', 'ipv6gateway': '::', 'ipv6addrs': ['fe80::76d0:2bff:fec8:dcba/64'], 'mtu': '1500', 'dhcpv4': False, 'netmask': '255.255.0.0', 'dhcpv6': False, 'ipv4addrs': ['10.4.14.42/16'], 'cfg': {'DOMAIN': 'gps.local', 'NAME': 'enp9s0', 'DNS3': '10.4.171.3', 'DNS2': '10.4.171.2', 'DNS1': '10.4.171.1', 'DEFROUTE': 'yes', 'IPV6_AUTOCONF': 'yes', 'PREFIX': '16', 'IPV6_DEFROUTE': 'yes', 'IPV6_FAILURE_FATAL': 'no', 'IPV6_PEERROUTES': 'yes', 'IPV4_FAILURE_FATAL': 'no', 'ONBOOT': 'yes', 'IPV6_PEERDNS': 'yes', 'IPV6_PRIVACY': 'no', 'BOOTPROTO': 'none', 'DEVICE': 'enp9s0', 'UUID': 'de4f2cce-e2dd-452a-bf6a-ba48d69d4344', 'IPV6INIT': 'yes', 'IPADDR': '10.4.14.42', 'TYPE': 'Ethernet', 'GATEWAY': '10.4.254.254'}, 'hwaddr': '74:d0:2b:c8:dc:ba', 'speed': 1000, 'gateway': '10.4.254.254'}
Did you add the host with the 10.4.14.42 address?
The vdsm and supervdsm logs seem to be from different time ranges. Please recreate the problem and send the two logs from that period.
Created attachment 1139805 [details]
host h5 vdsm log
VDSM.log on failed host
Created attachment 1139806 [details]
supervdsm log on h5
supervdsm log on failed host
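When two logs have to cover the same period, a quick check of their timestamp ranges can confirm that before they are attached. The following is a minimal sketch, assuming both logs carry timestamps in the `2016-03-17 08:59:51,538` form seen in the vdsm excerpt above; the function names are hypothetical and not part of vdsm.

```python
import re
from datetime import datetime

# Timestamp shape taken from the vdsm.log excerpt, e.g. "2016-03-17 08:59:51,538".
TS = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}")


def time_range(log_text):
    """Return (earliest, latest) timestamp found in log_text, or None if there is none."""
    stamps = [
        datetime.strptime(raw, "%Y-%m-%d %H:%M:%S,%f")
        for raw in TS.findall(log_text)
    ]
    if not stamps:
        return None
    return min(stamps), max(stamps)


def ranges_overlap(a, b):
    """True if the two (start, end) ranges share at least one instant."""
    return a[0] <= b[1] and b[0] <= a[1]
```

Reading each file with `open(path).read()` and passing the text to `time_range` gives the two ranges to compare with `ranges_overlap`.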
Here is a freshly installed host with nrpe installed which fails in the same way as before. Note: the networks must be configured on the host after it is installed. The first thing you notice is that the ovirtmgmt network is not attached to the initial interface. When nrpe is removed, this succeeds.
Created attachment 1139810 [details]
ovirt-host-deploy log from failed deploy
Created attachment 1139812 [details]
vdsm.log after successful re-install after nrpe is removed
Created attachment 1139813 [details]
supervdsm.log after successful deploy after nrpe was removed
Created attachment 1139815 [details]
engine log after successful deploy
Performed a re-install after the nrpe package was removed.
1. The re-install could not be performed until the failed network tasks were removed. The only way I know to do this is to run engine-setup on the engine, which clears tasks and locks.
2. The install still fails due to the network not being configured correctly. Some of the storage relies on the additional networks having IP addresses on the host.
3. The ovirtmgmt network is attached to the default interface after nrpe is removed, and the networks can be configured as normal.
4. The host is then successfully activated.
Further information: If the host (after the initial successful install with nrpe removed) has nrpe re-installed and started, and is then removed and re-added (via New), it will activate correctly. I assume this is due to the networks being set up and ready to go. Nrpe somehow affects the attachment of ovirtmgmt to the default interface, which then breaks the network configuration.
Comment on attachment 1139810 [details]
ovirt-host-deploy log from failed deploy
Can you please provide the engine.log file from the Engine server?
/var/log/ovirt-engine/engine.log
Created attachment 1141050 [details]
Engine log for failed deploy
Created attachment 1141062 [details]
Engine.log after successful deploy
Moving from 4.0 alpha to 4.0 beta since 4.0 alpha has been already released and bug is not ON_QA.
oVirt 4.0 beta has been released, moving to RC milestone.
I could not see any useful information from Engine or VDSM, with relation to networking.
Recreating a setup with nrpe installed has not shown any problem.
I can see an error in the ovirt-host-deploy log (see below).
Didi, is this a known issue? Is it related?
2016-03-24 10:48:58 DEBUG otopi.plugins.otopi.packagers.dnfpackager dnfpackager._boot:178 Cannot initialize minidnf
Traceback (most recent call last):
File "/tmp/ovirt-8HvEFjNVVJ/otopi-plugins/otopi/packagers/dnfpackager.py", line 165, in _boot
constants.PackEnv.DNF_DISABLED_PLUGINS
File "/tmp/ovirt-8HvEFjNVVJ/otopi-plugins/otopi/packagers/dnfpackager.py", line 75, in _getMiniDNF
from otopi import minidnf
File "/tmp/ovirt-8HvEFjNVVJ/pythonlib/otopi/minidnf.py", line 31, in <module>
import dnf
ImportError: No module named dnf
2016-03-24 10:48:58 DEBUG otopi.context context.dumpEnvironment:500 ENVIRONMENT DUMP - BEGIN
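An `ImportError` logged at DEBUG level, as in the traceback above, is not necessarily fatal: a tool that supports more than one package manager can probe each backend and carry on with whichever one imports. The following is a hypothetical sketch of that probing pattern only; it is not otopi's actual code, and `detect_packagers` is an invented name.

```python
import importlib


def detect_packagers(candidates=("dnf", "yum")):
    """Try to import each candidate module; report which are usable.

    Returns (available, failures), where failures maps a module name
    to the ImportError message recorded for it.
    """
    available = []
    failures = {}
    for name in candidates:
        try:
            importlib.import_module(name)
        except ImportError as exc:
            # Expected on hosts that ship only one of the backends.
            failures[name] = str(exc)
        else:
            available.append(name)
    return available, failures
```

With this pattern a missing backend only becomes a real error if `available` ends up empty.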
(In reply to Edward Haas from comment #22)
> I could not see any useful information from Engine or VDSM, with relation to
> networking.
> Recreating a setup with nrpe installed has not shown any problem.
>
> I can see an error in the ovirt-host-deploy log (see below).
> Didi, is this a known issue? Is it related?
>
> 2016-03-24 10:48:58 DEBUG otopi.plugins.otopi.packagers.dnfpackager dnfpackager._boot:178 Cannot initialize minidnf
> Traceback (most recent call last):
> File "/tmp/ovirt-8HvEFjNVVJ/otopi-plugins/otopi/packagers/dnfpackager.py", line 165, in _boot
> constants.PackEnv.DNF_DISABLED_PLUGINS
> File "/tmp/ovirt-8HvEFjNVVJ/otopi-plugins/otopi/packagers/dnfpackager.py", line 75, in _getMiniDNF
> from otopi import minidnf
> File "/tmp/ovirt-8HvEFjNVVJ/pythonlib/otopi/minidnf.py", line 31, in <module>
> import dnf
> ImportError: No module named dnf
> 2016-03-24 10:48:58 DEBUG otopi.context context.dumpEnvironment:500 ENVIRONMENT DUMP - BEGIN
No, it's unrelated and can be ignored. Normally, since we added dnf support, either it or yum will fail, depending on which is installed. If there is a real problem with the packager, it will fail later on when actually used.
Darryl, do you - unlike Edy - still see the bug? Could you somehow let us log into your system to understand what is wrong there?
I just re-installed one of the hosts from scratch. It still exhibits the same fault. I just noticed the journal refers to a failure to find the network vdsm-ovirtmgmt. The ovirtmgmt network is configured.
Jun 14 13:14:58 ovirt36-h7.gps.local multipathd[20928]: dm-3: remove map (uevent)
Jun 14 13:14:58 ovirt36-h7.gps.local multipathd[20928]: dm-3: remove map (uevent)
Jun 14 13:15:01 ovirt36-h7.gps.local vdsm[21130]: vdsm vds.dispatcher ERROR SSL error during reading data: unexpected eof
Jun 14 13:15:01 ovirt36-h7.gps.local python[20910]: DIGEST-MD5 client step 2
Jun 14 13:15:01 ovirt36-h7.gps.local python[20910]: DIGEST-MD5 parse_server_challenge()
Jun 14 13:15:01 ovirt36-h7.gps.local python[20910]: DIGEST-MD5 ask_user_info()
Jun 14 13:15:01 ovirt36-h7.gps.local python[20910]: DIGEST-MD5 client step 2
Jun 14 13:15:01 ovirt36-h7.gps.local python[20910]: DIGEST-MD5 ask_user_info()
Jun 14 13:15:01 ovirt36-h7.gps.local python[20910]: DIGEST-MD5 make_client_response()
Jun 14 13:15:01 ovirt36-h7.gps.local python[20910]: DIGEST-MD5 client step 3
Jun 14 13:15:02 ovirt36-h7.gps.local systemd[1]: Started /usr/sbin/ifup enp9s0.
Jun 14 13:15:02 ovirt36-h7.gps.local systemd[1]: Starting /usr/sbin/ifup enp9s0.
Jun 14 13:15:02 ovirt36-h7.gps.local kernel: r8169 0000:09:00.0 enp9s0: link down
Jun 14 13:15:02 ovirt36-h7.gps.local kernel: IPv6: ADDRCONF(NETDEV_UP): enp9s0: link is not ready
Jun 14 13:15:02 ovirt36-h7.gps.local kernel: r8169 0000:09:00.0 enp9s0: link down
Jun 14 13:15:02 ovirt36-h7.gps.local kernel: device enp9s0 entered promiscuous mode
Jun 14 13:15:02 ovirt36-h7.gps.local systemd[1]: Started /usr/sbin/ifup ovirtmgmt.
Jun 14 13:15:02 ovirt36-h7.gps.local systemd[1]: Starting /usr/sbin/ifup ovirtmgmt.
Jun 14 13:15:02 ovirt36-h7.gps.local kernel: IPv6: ADDRCONF(NETDEV_UP): ovirtmgmt: link is not ready
Jun 14 13:15:04 ovirt36-h7.gps.local daemonAdapter[20910]: libvirt: Network Driver error : Network not found: no network with matching name 'vdsm-ovirtmgmt'
Jun 14 13:15:05 ovirt36-h7.gps.local kernel: r8169 0000:09:00.0 enp9s0: link up
Jun 14 13:15:05 ovirt36-h7.gps.local kernel: IPv6: ADDRCONF(NETDEV_CHANGE): enp9s0: link becomes ready
Jun 14 13:15:05 ovirt36-h7.gps.local kernel: ovirtmgmt: port 1(enp9s0) entered forwarding state
Jun 14 13:15:05 ovirt36-h7.gps.local kernel: ovirtmgmt: port 1(enp9s0) entered forwarding state
Jun 14 13:15:05 ovirt36-h7.gps.local kernel: IPv6: ADDRCONF(NETDEV_CHANGE): ovirtmgmt: link becomes ready
Jun 14 13:15:07 ovirt36-h7.gps.local vdsm[21130]: vdsm vds.dispatcher ERROR SSL error during reading data: unexpected eof
Jun 14 13:15:09 ovirt36-h7.gps.local sshd[18567]: pam_unix(sshd:session): session closed for user root
Jun 14 13:15:09 ovirt36-h7.gps.local systemd-logind[823]: Removed session 9.
Please contact me directly via email to arrange a login if necessary.
Oops, it was pointed out by one of my colleagues that the host wasn't activating because the rest of the networks were not configured, which is not the same as when we were running into the issues with nrpe. I configured the networks and the host activated just fine. |
Created attachment 1137533 [details]
host deploy log from engine server

Description of problem:
A host with nrpe installed fails to activate. The error displayed is:
Mar 17, 2016 10:09:48 AM Host ovirt36-h6 installation failed. Failed to configure management network on the host. 4a8dcd3d oVirt
Mar 17, 2016 10:09:48 AM Failed to configure management network on host ovirt36-h6 due to setup networks failure. oVirt

Version-Release number of selected component (if applicable):
CentOS 7.2
oVirt 3.6.3.4

How reproducible:
Fails every time until nrpe is removed (stopping the nrpe service is not enough)

Steps to Reproduce:
1. Install CentOS 7.2 / yum update / yum install nrpe nagios-plugins-all
2. Attempt to deploy host via engine GUI (New)

Actual results:
Host deploy fails when configuring the management network.

Expected results:
Successful deploy (subject to networks compliance on the host)

Additional info:
This was the first of our hosts that used Ansible for initial configuration, which deployed our monitoring customisations like nrpe. Previous host deploys had proceeded successfully, and the Ansible script was run after deploy. For this host we used Ansible to set the host up ready for deploy (install the oVirt release rpm etc).
Noticed that deploy was restarting nrpe, wondered why it even cared.
Stopped nrpe; oVirt reinstall still started it and failed.
Removed nrpe; oVirt reinstall configured the ovirtmgmt network correctly and succeeded.

Log from GUI:
Mar 17, 2016 10:11:34 AM Status of host ovirt36-h6 was set to NonOperational. oVirt
Mar 17, 2016 10:11:33 AM Host ovirt36-h6 does not comply with the cluster SandyBridge networks, the following networks are missing on host: 'Admin,Control,iSCSI,MGMT,ovirtmgmt' 47125e91 oVirt
Mar 17, 2016 10:09:48 AM Host ovirt36-h6 installation failed. Failed to configure management network on the host. 4a8dcd3d oVirt
Mar 17, 2016 10:09:48 AM Failed to configure management network on host ovirt36-h6 due to setup networks failure. oVirt
Mar 17, 2016 10:09:40 AM Installing Host ovirt36-h6. Stage: Termination. 4a8dcd3d oVirt
Mar 17, 2016 10:09:39 AM Installing Host ovirt36-h6. Retrieving installation logs to: '/var/log/ovirt-engine/host-deploy/ovirt-host-deploy-20160317100939-ovirt36-h6.gps.local-4a8dcd3d.log'. 4a8dcd3d oVirt
Mar 17, 2016 10:09:39 AM Installing Host ovirt36-h6. Stage: Pre-termination. 4a8dcd3d oVirt
Mar 17, 2016 10:09:39 AM Installing Host ovirt36-h6. Starting ovirt-vmconsole-host-sshd. 4a8dcd3d oVirt
Mar 17, 2016 10:09:38 AM Installing Host ovirt36-h6. Starting vdsm. 4a8dcd3d oVirt
Mar 17, 2016 10:09:37 AM Installing Host ovirt36-h6. Stopping libvirtd. 4a8dcd3d oVirt
Mar 17, 2016 10:09:36 AM Installing Host ovirt36-h6. Restarting nrpe service. 4a8dcd3d oVirt
Mar 17, 2016 10:09:36 AM Installing Host ovirt36-h6. Stage: Closing up. 4a8dcd3d oVirt
Mar 17, 2016 10:09:36 AM Installing Host ovirt36-h6. Stage: Transaction commit. 4a8dcd3d oVirt
Mar 17, 2016 10:09:34 AM Installing Host ovirt36-h6. Enrolling serial console certificate. 4a8dcd3d oVirt
Mar 17, 2016 10:09:32 AM Installing Host ovirt36-h6. Enrolling certificate. 4a8dcd3d oVirt
Mar 17, 2016 10:09:32 AM Installing Host ovirt36-h6. Stage: Misc configuration. 4a8dcd3d oVirt
Mar 17, 2016 10:09:30 AM Installing Host ovirt36-h6. Stage: Package installation. 4a8dcd3d oVirt
Mar 17, 2016 10:09:30 AM Installing Host ovirt36-h6. Stage: Misc configuration. 4a8dcd3d oVirt
Mar 17, 2016 10:09:30 AM Installing Host ovirt36-h6. Stage: Transaction setup. 4a8dcd3d oVirt
Mar 17, 2016 10:09:30 AM Installing Host ovirt36-h6. Hardware supports virtualization. 4a8dcd3d oVirt
Mar 17, 2016 10:09:30 AM Installing Host ovirt36-h6. Stage: Setup validation. 4a8dcd3d oVirt
Mar 17, 2016 10:09:29 AM Installing Host ovirt36-h6. Disabling Kdump integration. 4a8dcd3d oVirt
Mar 17, 2016 10:09:29 AM Installing Host ovirt36-h6. Logs at host located at: '/tmp/ovirt-host-deploy-20160317100921-l9fuvv.log'. 4a8dcd3d oVirt
Mar 17, 2016 10:09:29 AM Installing Host ovirt36-h6. Kdump supported. 4a8dcd3d oVirt
Mar 17, 2016 10:09:29 AM Installing Host ovirt36-h6. Stage: Environment customization. 4a8dcd3d oVirt
Mar 17, 2016 10:09:29 AM Installing Host ovirt36-h6. Stage: Programs detection. 4a8dcd3d oVirt
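
The GUI event list above is newest-first, which makes the deploy sequence harder to follow. A minimal sketch for splitting such pasted text into entries and listing them oldest-first is shown here. It assumes every entry starts with a timestamp of the form `Mar 17, 2016 10:09:48 AM` and that month abbreviations are parsed under an English locale; `parse_events` is a hypothetical helper, not an oVirt tool.

```python
import re
from datetime import datetime

# Timestamp shape used by the pasted GUI events, e.g. "Mar 17, 2016 10:09:48 AM".
STAMP = re.compile(r"[A-Z][a-z]{2} \d{1,2}, \d{4} \d{1,2}:\d{2}:\d{2} [AP]M")


def parse_events(text):
    """Split pasted event text into (timestamp, message) pairs, oldest first."""
    matches = list(STAMP.finditer(text))
    events = []
    for i, match in enumerate(matches):
        # An entry runs from its timestamp up to the next timestamp (or end of text).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        when = datetime.strptime(match.group(0), "%b %d, %Y %I:%M:%S %p")
        events.append((when, text[match.end():end].strip()))
    # The input is newest-first; reversing before a stable sort keeps
    # same-second entries in their original chronological order.
    return sorted(reversed(events), key=lambda event: event[0])
```

Printing the result shows, for example, that "Restarting nrpe service." occurs before the management network failure is reported.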