Bug 1064047 - Hosts non responsive and fencing does not work
Summary: Hosts non responsive and fencing does not work
Keywords:
Status: CLOSED DUPLICATE of bug 1055153
Alias: None
Product: oVirt
Classification: Retired
Component: ovirt-engine-core
Version: 3.4
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
: 3.4.0
Assignee: Eli Mesika
QA Contact: bugs@ovirt.org
URL:
Whiteboard: infra
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2014-02-11 21:41 UTC by Netbulae
Modified: 2014-03-13 08:08 UTC (History)
11 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-03-13 08:08:36 UTC
oVirt Team: ---


Attachments (Terms of Use)
screenshot of node connection status (22.31 KB, image/png)
2014-02-11 21:45 UTC, Netbulae
no flags Details
power management (38.56 KB, image/png)
2014-02-21 11:03 UTC, Netbulae
no flags Details
getVdsCaps output (7.89 KB, text/plain)
2014-03-06 14:47 UTC, Netbulae
no flags Details

Description Netbulae 2014-02-11 21:41:09 UTC
Description of problem:
Hosts are non responsive and power management/fencing does not work. When I check the hosts they appear connected and registered themselves to the engine.
Pinging from the nodes works as well as telnet/curl to and from the engine and the other node.

So I don't know if power management doesn't work because they are non responsive or they are non responsive because power management doesn't work.

Version-Release number of selected component (if applicable):
3.4.0beta2


Manual fencing works from the node:

    fence_ipmilan -A password -p ************* -a xxxx01.xxxxx.xxxx -P -l power -o status

        Getting status of IPMI:xxxx01.xxxxx.xxxx...Chassis power = On

        Done

    fence_ipmilan -A password -p ************* -a xxxx01.xxxxx.xxxx -P -l power -o reboot

        Rebooting machine @ IPMI:xxxx01.xxxxx.xxxx...Done

    
When I leave out "-P" (-P Use Lanplus to improve security of connection) it fails.

    fence_ipmilan -A password -p *********** -a xxxx01.xxxxx.xxxx -l power -o status

    Getting status of IPMI:xxxx01.xxxxx.xxxx...Chassis power = Unknown

    Failed

from the engine.log:
014-02-11 22:24:28,551 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand] (DefaultQuartzScheduler_Worker-1) Command GetCapabilitiesVDSCommand(HostName = xxxx02.xxxxx.xxxxx, HostId = 9067a99f-992b-41cf-89b5-fbf385b5473d, vds=Host[xxxx02.xxxxx.xxxxx]) execution failed. Exception: VDSNetworkException: java.net.ConnectException: Connection refused
2014-02-11 22:24:29,338 INFO  [org.ovirt.engine.core.bll.ActivateVdsCommand] (ajp--127.0.0.1-8702-8) [52012893] Lock Acquired to object EngineLock [exclusiveLocks= key: b08aba0a-ad0a-4960-a4ca-8b9abbd18c00 value: VDS
, sharedLocks= ]
2014-02-11 22:24:29,420 INFO  [org.ovirt.engine.core.bll.ActivateVdsCommand] (org.ovirt.thread.pool-6-thread-11) [52012893] Running command: ActivateVdsCommand internal: false. Entities affected :  ID: b08aba0a-ad0a-4960-a4ca-8b9abbd18c00 Type: VDS
2014-02-11 22:24:29,421 INFO  [org.ovirt.engine.core.bll.ActivateVdsCommand] (org.ovirt.thread.pool-6-thread-11) [52012893] Before acquiring lock in order to prevent monitoring for host xxxx01.xxxxx.xxxxx from data-center Default
2014-02-11 22:24:29,421 INFO  [org.ovirt.engine.core.bll.ActivateVdsCommand] (org.ovirt.thread.pool-6-thread-11) [52012893] Lock acquired, from now a monitoring of host will be skipped for host xxxx01.xxxxx.xxxxx from data-center Default
2014-02-11 22:24:29,482 INFO  [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (org.ovirt.thread.pool-6-thread-11) [52012893] START, SetVdsStatusVDSCommand(HostName = xxxx01.xxxxx.xxxxx, HostId = b08aba0a-ad0a-4960-a4ca-8b9abbd18c00, status=Unassigned, nonOperationalReason=NONE, stopSpmFailureLogged=false), log id: 2e0ef762
2014-02-11 22:24:29,497 INFO  [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (org.ovirt.thread.pool-6-thread-11) [52012893] FINISH, SetVdsStatusVDSCommand, log id: 2e0ef762
2014-02-11 22:24:29,523 INFO  [org.ovirt.engine.core.vdsbroker.ActivateVdsVDSCommand] (org.ovirt.thread.pool-6-thread-11) [52012893] START, ActivateVdsVDSCommand(HostName = xxxx01.xxxxx.xxxxx, HostId = b08aba0a-ad0a-4960-a4ca-8b9abbd18c00), log id: 7ce5d852
2014-02-11 22:24:29,545 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand] (org.ovirt.thread.pool-6-thread-11) [52012893] Command GetCapabilitiesVDSCommand(HostName = xxxx01.xxxxx.xxxxx, HostId = b08aba0a-ad0a-4960-a4ca-8b9abbd18c00, vds=Host[xxxx01.xxxxx.xxxxx]) execution failed. Exception: VDSNetworkException: java.net.ConnectException: Connection refused
2014-02-11 22:24:29,546 INFO  [org.ovirt.engine.core.vdsbroker.VdsManager] (org.ovirt.thread.pool-6-thread-11) [52012893] Server failed to respond, vds_id = b08aba0a-ad0a-4960-a4ca-8b9abbd18c00, vds_name = xxxx01.xxxxx.xxxxx, vm_count = 0, spm_status = None, non-responsive_timeout (seconds) = 120, error = java.net.ConnectException: Connection refused
2014-02-11 22:24:29,576 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-6-thread-11) [52012893] Correlation ID: null, Call Stack: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException: java.net.ConnectException: Connection refused
	at org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand.executeVDSCommand(VdsBrokerCommand.java:102)
	at org.ovirt.engine.core.vdsbroker.VDSCommandBase.executeCommand(VDSCommandBase.java:56)
	at org.ovirt.engine.core.dal.VdcCommandBase.execute(VdcCommandBase.java:28)
	at org.ovirt.engine.core.vdsbroker.VdsManager.refreshCapabilities(VdsManager.java:537)
	at org.ovirt.engine.core.vdsbroker.VdsManager.refreshHost(VdsManager.java:403)
	at org.ovirt.engine.core.vdsbroker.VdsManager.activate(VdsManager.java:389)
	at org.ovirt.engine.core.vdsbroker.ActivateVdsVDSCommand.executeVdsIdCommand(ActivateVdsVDSCommand.java:13)
	at org.ovirt.engine.core.vdsbroker.VdsIdVDSCommandBase.executeVDSCommand(VdsIdVDSCommandBase.java:42)
	at org.ovirt.engine.core.vdsbroker.VDSCommandBase.executeCommand(VDSCommandBase.java:56)
	at org.ovirt.engine.core.dal.VdcCommandBase.execute(VdcCommandBase.java:28)
	at org.ovirt.engine.core.vdsbroker.ResourceManager.runVdsCommand(ResourceManager.java:399)
	at org.ovirt.engine.core.bll.VDSBrokerFrontendImpl.RunVdsCommand(VDSBrokerFrontendImpl.java:33)
	at org.ovirt.engine.core.bll.CommandBase.runVdsCommand(CommandBase.java:1971)
	at org.ovirt.engine.core.bll.ActivateVdsCommand.executeCommand(ActivateVdsCommand.java:53)
	at org.ovirt.engine.core.bll.CommandBase.executeWithoutTransaction(CommandBase.java:1123)
	at org.ovirt.engine.core.bll.CommandBase.executeActionInTransactionScope(CommandBase.java:1208)
	at org.ovirt.engine.core.bll.CommandBase.runInTransaction(CommandBase.java:1884)
	at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInSuppressed(TransactionSupport.java:174)
	at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInScope(TransactionSupport.java:116)
	at org.ovirt.engine.core.bll.CommandBase.execute(CommandBase.java:1228)
	at org.ovirt.engine.core.bll.CommandBase.executeAction(CommandBase.java:351)
	at org.ovirt.engine.core.bll.MultipleActionsRunner.executeValidatedCommand(MultipleActionsRunner.java:179)
	at org.ovirt.engine.core.bll.MultipleActionsRunner.runCommands(MultipleActionsRunner.java:151)
	at org.ovirt.engine.core.bll.MultipleActionsRunner$2.run(MultipleActionsRunner.java:160)
	at org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil$InternalWrapperRunnable.run(ThreadPoolUtil.java:97)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:744)
Caused by: java.net.ConnectException: Connection refused
	at java.net.PlainSocketImpl.socketConnect(Native Method)
	at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
	at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
	at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
	at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
	at java.net.Socket.connect(Socket.java:579)
	at sun.security.ssl.SSLSocketImpl.connect(SSLSocketImpl.java:618)
	at org.ovirt.engine.core.utils.ssl.AuthSSLProtocolSocketFactory.createSocket(AuthSSLProtocolSocketFactory.java:288)
	at org.apache.commons.httpclient.HttpConnection.open(Unknown Source)
	at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.open(Unknown Source)
	at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(Unknown Source)
	at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(Unknown Source)
	at org.apache.commons.httpclient.HttpClient.executeMethod(Unknown Source)
	at org.apache.commons.httpclient.HttpClient.executeMethod(Unknown Source)
	at org.apache.xmlrpc.client.XmlRpcCommonsTransport.writeRequest(XmlRpcCommonsTransport.java:227)
	at org.apache.xmlrpc.client.XmlRpcStreamTransport.sendRequest(XmlRpcStreamTransport.java:151)
	at org.apache.xmlrpc.client.XmlRpcHttpTransport.sendRequest(XmlRpcHttpTransport.java:143)
	at org.apache.xmlrpc.client.XmlRpcClientWorker.execute(XmlRpcClientWorker.java:56)
	at org.apache.xmlrpc.client.XmlRpcClient.execute(XmlRpcClient.java:167)
	at org.apache.xmlrpc.client.XmlRpcClient.execute(XmlRpcClient.java:137)
	at org.apache.xmlrpc.client.XmlRpcClient.execute(XmlRpcClient.java:126)
	at org.apache.xmlrpc.client.util.ClientFactory$1.invoke(ClientFactory.java:140)
	at com.sun.proxy.$Proxy54.getVdsCapabilities(Unknown Source)
	at sun.reflect.GeneratedMethodAccessor39.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.ovirt.engine.core.vdsbroker.xmlrpc.XmlRpcUtils$AsyncProxy$InternalCallable.call(XmlRpcUtils.java:225)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	... 5 more
, Custom Event ID: -1, Message: Host xxxx01.xxxxx.xxxxx is non responsive.
2014-02-11 22:24:29,609 INFO  [org.ovirt.engine.core.bll.VdsEventListener] (org.ovirt.thread.pool-6-thread-8) [52012893] ResourceManager::vdsNotResponding entered for Host b08aba0a-ad0a-4960-a4ca-8b9abbd18c00, 10.50.50.111
2014-02-11 22:24:29,676 INFO  [org.ovirt.engine.core.bll.VdsNotRespondingTreatmentCommand] (org.ovirt.thread.pool-6-thread-8) [14d591cc] Lock Acquired to object EngineLock [exclusiveLocks= key: b08aba0a-ad0a-4960-a4ca-8b9abbd18c00 value: VDS_FENCE
, sharedLocks= ]
2014-02-11 22:24:29,720 INFO  [org.ovirt.engine.core.bll.HandleVdsCpuFlagsOrClusterChangedCommand] (org.ovirt.thread.pool-6-thread-11) [228b1556] Running command: HandleVdsCpuFlagsOrClusterChangedCommand internal: true. Entities affected :  ID: b08aba0a-ad0a-4960-a4ca-8b9abbd18c00 Type: VDS
2014-02-11 22:24:29,726 ERROR [org.ovirt.engine.core.bll.HandleVdsCpuFlagsOrClusterChangedCommand] (org.ovirt.thread.pool-6-thread-11) [228b1556] Could not find server cpu for server b08aba0a-ad0a-4960-a4ca-8b9abbd18c00:xxxx01.xxxxx.xxxxx, flags: null
2014-02-11 22:24:29,769 WARN  [org.ovirt.engine.core.bll.HandleVdsVersionCommand] (org.ovirt.thread.pool-6-thread-11) [558c5410] CanDoAction of action HandleVdsVersion failed. Reasons:VDS_CANNOT_CHECK_VERSION_HOST_NON_RESPONSIVE
2014-02-11 22:24:29,774 INFO  [org.ovirt.engine.core.vdsbroker.VdsManager] (org.ovirt.thread.pool-6-thread-11) [558c5410] Failed to activate VDS = b08aba0a-ad0a-4960-a4ca-8b9abbd18c00 with error: java.net.ConnectException: Connection refused.
2014-02-11 22:24:29,775 INFO  [org.ovirt.engine.core.vdsbroker.ActivateVdsVDSCommand] (org.ovirt.thread.pool-6-thread-11) [558c5410] FINISH, ActivateVdsVDSCommand, return: Host[xxxx01.xxxxx.xxxxx], log id: 7ce5d852
2014-02-11 22:24:29,887 INFO  [org.ovirt.engine.core.bll.ActivateVdsCommand] (org.ovirt.thread.pool-6-thread-11) [558c5410] Activate finished. Lock released. Monitoring can run now for host xxxx01.xxxxx.xxxxx from data-center Default
2014-02-11 22:24:29,898 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-6-thread-11) [558c5410] Correlation ID: 52012893, Job ID: a1ffea0f-33dd-4850-aa1a-90f9d4ca7d23, Call Stack: null, Custom Event ID: -1, Message: Host xxxx01.xxxxx.xxxxx was activated by admin.
2014-02-11 22:24:29,909 INFO  [org.ovirt.engine.core.bll.ActivateVdsCommand] (org.ovirt.thread.pool-6-thread-11) [558c5410] Lock freed to object EngineLock [exclusiveLocks= key: b08aba0a-ad0a-4960-a4ca-8b9abbd18c00 value: VDS
, sharedLocks= ]
2014-02-11 22:24:29,954 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-6-thread-8) Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: Host xxxx02.xxxxx.xxxxx from cluster Default was chosen as a proxy to execute Restart command on Host xxxx01.xxxxx.xxxxx.
2014-02-11 22:24:29,954 INFO  [org.ovirt.engine.core.bll.FenceExecutor] (org.ovirt.thread.pool-6-thread-8) Using Host xxxx02.xxxxx.xxxxx from cluster Default as proxy to execute Restart command on Host xxxx01.xxxxx.xxxxx
2014-02-11 22:24:29,976 INFO  [org.ovirt.engine.core.bll.VdsNotRespondingTreatmentCommand] (org.ovirt.thread.pool-6-thread-8) Running command: VdsNotRespondingTreatmentCommand internal: true. Entities affected :  ID: b08aba0a-ad0a-4960-a4ca-8b9abbd18c00 Type: VDS
2014-02-11 22:24:30,053 INFO  [org.ovirt.engine.core.bll.StopVdsCommand] (org.ovirt.thread.pool-6-thread-8) [4636cc31] Running command: StopVdsCommand internal: true. Entities affected :  ID: b08aba0a-ad0a-4960-a4ca-8b9abbd18c00 Type: VDS
2014-02-11 22:24:30,073 INFO  [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (org.ovirt.thread.pool-6-thread-8) [4636cc31] START, SetVdsStatusVDSCommand(HostName = xxxx01.xxxxx.xxxxx, HostId = b08aba0a-ad0a-4960-a4ca-8b9abbd18c00, status=Reboot, nonOperationalReason=NONE, stopSpmFailureLogged=false), log id: 247e5902
2014-02-11 22:24:30,087 INFO  [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (org.ovirt.thread.pool-6-thread-8) [4636cc31] FINISH, SetVdsStatusVDSCommand, log id: 247e5902
2014-02-11 22:24:30,209 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-6-thread-8) [4636cc31] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: Host xxxx02.xxxxx.xxxxx from cluster Default was chosen as a proxy to execute Stop command on Host xxxx01.xxxxx.xxxxx.
2014-02-11 22:24:30,210 INFO  [org.ovirt.engine.core.bll.FenceExecutor] (org.ovirt.thread.pool-6-thread-8) [4636cc31] Using Host xxxx02.xxxxx.xxxxx from cluster Default as proxy to execute Stop command on Host xxxx01.xxxxx.xxxxx
2014-02-11 22:24:30,228 INFO  [org.ovirt.engine.core.bll.FenceExecutor] (org.ovirt.thread.pool-6-thread-8) [4636cc31] Executing <Stop> Power Management command, Proxy Host:xxxx02.xxxxx.xxxxx, Agent:ipmilan, Target Host:xxxx01.xxxxx.xxxxx, Management IP:xxxx01ipmi.xxxxx.xxxxx, User:power, Options:lanplus=yes
2014-02-11 22:24:30,250 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.FenceVdsVDSCommand] (org.ovirt.thread.pool-6-thread-8) [4636cc31] START, FenceVdsVDSCommand(HostName = xxxx02.xxxxx.xxxxx, HostId = 9067a99f-992b-41cf-89b5-fbf385b5473d, targetVdsId = b08aba0a-ad0a-4960-a4ca-8b9abbd18c00, action = Stop, ip = xxxx01ipmi.xxxxx.xxxxx, port = , type = ipmilan, user = power, password = ******, options = 'lanplus=yes'), log id: 9216d3c
2014-02-11 22:24:30,269 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.FenceVdsVDSCommand] (org.ovirt.thread.pool-6-thread-8) [4636cc31] Command FenceVdsVDSCommand(HostName = xxxx02.xxxxx.xxxxx, HostId = 9067a99f-992b-41cf-89b5-fbf385b5473d, targetVdsId = b08aba0a-ad0a-4960-a4ca-8b9abbd18c00, action = Stop, ip = xxxx01ipmi.xxxxx.xxxxx, port = , type = ipmilan, user = power, password = ******, options = 'lanplus=yes') execution failed. Exception: VDSNetworkException: java.net.ConnectException: Connection refused
2014-02-11 22:24:30,270 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.FenceVdsVDSCommand] (org.ovirt.thread.pool-6-thread-8) [4636cc31] FINISH, FenceVdsVDSCommand, log id: 9216d3c
2014-02-11 22:24:30,270 ERROR [org.ovirt.engine.core.bll.StopVdsCommand] (org.ovirt.thread.pool-6-thread-8) [4636cc31] Failed to stop VDS using Primary Power Management agent
2014-02-11 22:24:30,290 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-6-thread-8) [4636cc31] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: Host xxxx01.xxxxx.xxxxx became non responsive. It has no power management configured. Please check the host status, manually reboot it, and click "Confirm Host Has Been Rebooted"
2014-02-11 22:24:30,311 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-6-thread-8) [4636cc31] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: Failed to power fence host xxxx01.xxxxx.xxxxx. Please check the host status and it's power management settings, and then manually reboot it and click "Confirm Host Has Been Rebooted"
2014-02-11 22:24:30,312 ERROR [org.ovirt.engine.core.bll.StopVdsCommand] (org.ovirt.thread.pool-6-thread-8) [4636cc31] Command org.ovirt.engine.core.bll.StopVdsCommand throw Vdc Bll exception. With error message VdcBLLException: VDS_FENCE_OPERATION_FAILED (Failed with error VDS_FENCE_OPERATION_FAILED and code 5021)
2014-02-11 22:24:30,331 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-6-thread-8) [4636cc31] Correlation ID: 4636cc31, Job ID: e60365f1-1fcb-49a8-8402-2e9d4c6e15b6, Call Stack: null, Custom Event ID: -1, Message: Failed to stop Host xxxx01.xxxxx.xxxxx, (User: Engine).
2014-02-11 22:24:30,342 WARN  [org.ovirt.engine.core.bll.VdsNotRespondingTreatmentCommand] (org.ovirt.thread.pool-6-thread-8) [4636cc31] Restart host action failed, updating host b08aba0a-ad0a-4960-a4ca-8b9abbd18c00 to NonResponsive
2014-02-11 22:24:30,394 INFO  [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (org.ovirt.thread.pool-6-thread-8) [4636cc31] START, SetVdsStatusVDSCommand(HostName = xxxx01.xxxxx.xxxxx, HostId = b08aba0a-ad0a-4960-a4ca-8b9abbd18c00, status=NonResponsive, nonOperationalReason=NONE, stopSpmFailureLogged=false), log id: 25d0c0dc
2014-02-11 22:24:30,409 INFO  [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (org.ovirt.thread.pool-6-thread-8) [4636cc31] FINISH, SetVdsStatusVDSCommand, log id: 25d0c0dc
2014-02-11 22:24:30,420 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-6-thread-8) [4636cc31] Correlation ID: 14d591cc, Job ID: 0cdb31d5-bb77-42d1-8764-9bb473b5ced3, Call Stack: null, Custom Event ID: -1, Message: Host xxxx01.xxxxx.xxxxx is rebooting.
2014-02-11 22:24:30,431 INFO  [org.ovirt.engine.core.bll.VdsNotRespondingTreatmentCommand] (org.ovirt.thread.pool-6-thread-8) [4636cc31] Lock freed to object EngineLock [exclusiveLocks= key: b08aba0a-ad0a-4960-a4ca-8b9abbd18c00 value: VDS_FENCE
, sharedLocks= ]
2014-02-11 22:24:31,862 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand] (DefaultQuartzScheduler_Worker-61) Command GetCapabilitiesVDSCommand(HostName = xxxx02.xxxxx.xxxxx, HostId = 9067a99f-992b-41cf-89b5-fbf385b5473d, vds=Host[xxxx02.xxxxx.xxxxx]) execution failed. Exception: VDSNetworkException: java.net.ConnectException: Connection refused
2014-02-11 22:24:33,415 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand] (DefaultQuartzScheduler_Worker-71) Command GetCapabilitiesVDSCommand(HostName = xxxx01.xxxxx.xxxxx, HostId = b08aba0a-ad0a-4960-a4ca-8b9abbd18c00, vds=Host[xxxx01.xxxxx.xxxxx]) execution failed. Exception: VDSNetworkException: java.net.ConnectException: Connection refused
2014-02-11 22:24:34,964 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand] (DefaultQuartzScheduler_Worker-60) Command GetCapabilitiesVDSCommand(HostName = xxxx02.xxxxx.xxxxx, HostId = 9067a99f-992b-41cf-89b5-fbf385b5473d, vds=Host[xxxx02.xxxxx.xxxxx]) execution failed. Exception: VDSNetworkException: java.net.ConnectException: Connection refused
2014-02-11 22:24:36,770 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand] (DefaultQuartzScheduler_Worker-100) Command GetCapabilitiesVDSCommand(HostName = xxxx01.xxxxx.xxxxx, HostId = b08aba0a-ad0a-4960-a4ca-8b9abbd18c00, vds=Host[xxxx01.xxxxx.xxxxx]) execution failed. Exception: VDSNetworkException: java.net.ConnectException: Connection refused
2014-02-11 22:24:38,085 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand] (DefaultQuartzScheduler_Worker-91) Command GetCapabilitiesVDSCommand(HostName = xxxx02.xxxxx.xxxxx, HostId = 9067a99f-992b-41cf-89b5-fbf385b5473d, vds=Host[xxxx02.xxxxx.xxxxx]) execution failed. Exception: VDSNetworkException: java.net.ConnectException: Connection refused

Comment 1 Netbulae 2014-02-11 21:45:53 UTC
Created attachment 862013 [details]
screenshot of node connection status

Comment 2 Itamar Heim 2014-02-16 08:23:59 UTC
Setting target release to current version for consideration and review. please
do not push non-RFE bugs to an undefined target release to make sure bugs are
reviewed for relevancy, fix, closure, etc.

Comment 3 Eli Mesika 2014-02-20 14:31:18 UTC
Can you please explain/show how did you configured the PM agent in the UI?

Few comments :

1) you have to make sure that you have an additional host that can server as a proxy for fencing operations in your Cluster/DC
2) You can add/set the lanplus in the options field
3) What is happened after you do 1) and 2) and click the TEST button for the host that has PM configured ?

Comment 4 Netbulae 2014-02-20 14:52:24 UTC
1) All the nodes are in an unresponding state so I don't know if they will be able to act as proxy.
2)I have "lanplus=yes¨ specified in the options field but it doesn't appear to help
3)In beta2 there were no options and now with the premature beta3 upgrade it at least gives me cluster and dc. The error I get is:

Test Failed, VdcBLLException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException: java.net.ConnectException: Connection refused (Failed with error VDS_NETWORK_ERROR and code 5022)

It could be that something got broken during all the beta upgrades so I will do a clean redeployment of the entire system with the latest beta3 tomorrow.

If I can reproduce, I will sniff the traffic to see what happens.

Comment 5 Eli Mesika 2014-02-20 14:59:26 UTC
(In reply to Netbulae from comment #4)
> 1) All the nodes are in an unresponding state so I don't know if they will
> be able to act as proxy.

This points to a deeper problem in your env , from the log I can see that there are lot of network errors 

> 2)I have "lanplus=yes¨ specified in the options field but it doesn't appear
> to help

BTW key=value is for non boolean keys , you should but only "lanplus" in the options field

> 3)In beta2 there were no options and now with the premature beta3 upgrade it
> at least gives me cluster and dc. The error I get is:
> 
> Test Failed, VdcBLLException:
> org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException:
> java.net.ConnectException: Connection refused (Failed with error
> VDS_NETWORK_ERROR and code 5022)
> 
> It could be that something got broken during all the beta upgrades so I will
> do a clean redeployment of the entire system with the latest beta3 tomorrow.

Good idea 

> 
> If I can reproduce, I will sniff the traffic to see what happens.

Please keep us updated on that , Thanks

Comment 6 Netbulae 2014-02-21 11:03:28 UTC
Created attachment 865921 [details]
power management

I redeployed everything from scratch with beta3, but still the same issue

Comment 7 Netbulae 2014-02-21 11:04:38 UTC
I do have some problems installing the node:

2014-02-21 11:28:08,619 INFO  [org.ovirt.engine.core.vdsbroker.VdsManager] (ajp--127.0.0.1-8702-5) [7c3ea78a] Initialize vdsBroker (xx.xx.xx.xx1,54,321)
2014-02-21 11:28:08,752 INFO  [org.ovirt.engine.core.vdsbroker.ResourceManager] (ajp--127.0.0.1-8702-5) [7c3ea78a] VDS bec0ed78-d8c0-453c-a69e-eb4d337d8cda was added to the Resource Manager
2014-02-21 11:28:08,752 INFO  [org.ovirt.engine.core.vdsbroker.AddVdsVDSCommand] (ajp--127.0.0.1-8702-5) [7c3ea78a] FINISH, AddVdsVDSCommand, log id: 78a939c0
2014-02-21 11:28:08,798 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ajp--127.0.0.1-8702-5) [7c3ea78a] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: Failed to verify Power Management configuration for Host node1.
2014-02-21 11:28:09,279 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ajp--127.0.0.1-8702-5) [7c3ea78a] Correlation ID: 73d3c257, Call Stack: null, Custom Event ID: -1, Message: Host node1 was added by Non interactive user.
2014-02-21 11:28:09,279 INFO  [org.ovirt.engine.core.bll.RegisterVdsQuery] (ajp--127.0.0.1-8702-5) [7c3ea78a] RegisterVdsQuery::Register - Registered a new VDS pending approval - Name: node1, Hostname: xx.xx.xx.xx1, UniqueID: 54443858-4E54-2500-904B-0025904B9C22
2014-02-21 11:28:09,289 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ajp--127.0.0.1-8702-5) [7c3ea78a] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: Host node1 registered.
2014-02-21 11:28:25,249 INFO  [org.ovirt.engine.core.services.RegisterServlet] (ajp--127.0.0.1-8702-1) Registration request: source='xx.xx.xx.xx1', secured='true', address='xx.xx.xx.xx1', vdsPort=54321, name='node1', uniqueId='54443858-4E54-2500-904B-0025904B9C22_00:25:90:01:d1:74'
2014-02-21 11:28:25,268 ERROR [org.ovirt.engine.core.bll.RegisterVdsQuery] (ajp--127.0.0.1-8702-1) Query execution failed due to invalid inputs. VDS_STATUS_NOT_VALID_FOR_UPDATE
2014-02-21 11:28:44,316 ERROR [org.ovirt.engine.core.bll.GetHostArchitectureQuery] (ajp--127.0.0.1-8702-8) Query GetHostArchitectureQuery failed. Exception message is null : java.lang.NullPointerException: java.lang.NullPointerException
	at org.ovirt.engine.core.bll.GetHostArchitectureQuery.executeQueryCommand(GetHostArchitectureQuery.java:21) [bll.jar:]
	at org.ovirt.engine.core.bll.QueriesCommandBase.executeCommand(QueriesCommandBase.java:66) [bll.jar:]
	at org.ovirt.engine.core.dal.VdcCommandBase.execute(VdcCommandBase.java:28) [dal.jar:]
	at org.ovirt.engine.core.bll.Backend.runQueryImpl(Backend.java:480) [bll.jar:]
	at org.ovirt.engine.core.bll.Backend.runQuery(Backend.java:457) [bll.jar:]
	at sun.reflect.GeneratedMethodAccessor24.invoke(Unknown Source) [:1.7.0_51]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.7.0_51]
	at java.lang.reflect.Method.invoke(Method.java:606) [rt.jar:1.7.0_51]
	at org.jboss.as.ee.component.ManagedReferenceMethodInterceptorFactory$ManagedReferenceMethodInterceptor.processInvocation(ManagedReferenceMethodInterceptorFactory.java:72) [jboss-as-ee-7.1.1.Final.jar:7.1.1.Final]
	at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final]
	at org.jboss.invocation.InterceptorContext$Invocation.proceed(InterceptorContext.java:374) [jboss-invocation.jar:1.1.1.Final]
	at org.ovirt.engine.core.bll.interceptors.ThreadLocalSessionCleanerInterceptor.injectWebContextToThreadLocal(ThreadLocalSessionCleanerInterceptor.java:13) [bll.jar:]
	at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source) [:1.7.0_51]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.7.0_51]
	at java.lang.reflect.Method.invoke(Method.java:606) [rt.jar:1.7.0_51]
	at org.jboss.as.ee.component.ManagedReferenceLifecycleMethodInterceptorFactory$ManagedReferenceLifecycleMethodInterceptor.processInvocation(ManagedReferenceLifecycleMethodInterceptorFactory.java:123) [jboss-as-ee-7.1.1.Final.jar:7.1.1.Final]
	at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final]
	at org.jboss.invocation.WeavedInterceptor.processInvocation(WeavedInterceptor.java:53) [jboss-invocation.jar:1.1.1.Final]
	at org.jboss.as.ee.component.interceptors.UserInterceptorFactory$1.processInvocation(UserInterceptorFactory.java:36) [jboss-as-ee-7.1.1.Final.jar:7.1.1.Final]
	at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final]
	at org.jboss.invocation.InitialInterceptor.processInvocation(InitialInterceptor.java:21) [jboss-invocation.jar:1.1.1.Final]
	at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final]
	at org.jboss.invocation.ChainedInterceptor.processInvocation(ChainedInterceptor.java:61) [jboss-invocation.jar:1.1.1.Final]
	at org.jboss.as.ee.component.interceptors.ComponentDispatcherInterceptor.processInvocation(ComponentDispatcherInterceptor.java:53) [jboss-as-ee-7.1.1.Final.jar:7.1.1.Final]
	at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final]
	at org.jboss.as.ejb3.component.singleton.SingletonComponentInstanceAssociationInterceptor.processInvocation(SingletonComponentInstanceAssociationInterceptor.java:53) [jboss-as-ejb3-7.1.1.Final.jar:7.1.1.Final]
	at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final]
	at org.jboss.as.ejb3.tx.CMTTxInterceptor.invokeInNoTx(CMTTxInterceptor.java:211) [jboss-as-ejb3-7.1.1.Final.jar:7.1.1.Final]
	at org.jboss.as.ejb3.tx.CMTTxInterceptor.supports(CMTTxInterceptor.java:363) [jboss-as-ejb3-7.1.1.Final.jar:7.1.1.Final]
	at org.jboss.as.ejb3.tx.CMTTxInterceptor.processInvocation(CMTTxInterceptor.java:194) [jboss-as-ejb3-7.1.1.Final.jar:7.1.1.Final]
	at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final]
	at org.jboss.as.ejb3.component.interceptors.CurrentInvocationContextInterceptor.processInvocation(CurrentInvocationContextInterceptor.java:41) [jboss-as-ejb3-7.1.1.Final.jar:7.1.1.Final]
	at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final]
	at org.jboss.as.ejb3.component.interceptors.LoggingInterceptor.processInvocation(LoggingInterceptor.java:59) [jboss-as-ejb3-7.1.1.Final.jar:7.1.1.Final]
	at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final]
	at org.jboss.as.ee.component.NamespaceContextInterceptor.processInvocation(NamespaceContextInterceptor.java:50) [jboss-as-ee-7.1.1.Final.jar:7.1.1.Final]
	at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final]
	at org.jboss.as.ee.component.TCCLInterceptor.processInvocation(TCCLInterceptor.java:45) [jboss-as-ee-7.1.1.Final.jar:7.1.1.Final]
	at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final]
	at org.jboss.invocation.ChainedInterceptor.processInvocation(ChainedInterceptor.java:61) [jboss-invocation.jar:1.1.1.Final]
	at org.jboss.as.ee.component.ViewService$View.invoke(ViewService.java:165) [jboss-as-ee-7.1.1.Final.jar:7.1.1.Final]
	at org.jboss.as.ee.component.ViewDescription$1.processInvocation(ViewDescription.java:173) [jboss-as-ee-7.1.1.Final.jar:7.1.1.Final]
	at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final]
	at org.jboss.invocation.ChainedInterceptor.processInvocation(ChainedInterceptor.java:61) [jboss-invocation.jar:1.1.1.Final]
	at org.jboss.as.ee.component.ProxyInvocationHandler.invoke(ProxyInvocationHandler.java:72) [jboss-as-ee-7.1.1.Final.jar:7.1.1.Final]
	at org.ovirt.engine.core.common.interfaces.BackendLocal$$$view9.runQuery(Unknown Source) [common.jar:]
	at org.ovirt.engine.ui.frontend.server.gwt.GenericApiGWTServiceImpl.RunQuery(GenericApiGWTServiceImpl.java:59)
	at sun.reflect.GeneratedMethodAccessor31.invoke(Unknown Source) [:1.7.0_51]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.7.0_51]
	at java.lang.reflect.Method.invoke(Method.java:606) [rt.jar:1.7.0_51]
	at com.google.gwt.rpc.server.RPC.invokeAndStreamResponse(RPC.java:196)
	at com.google.gwt.rpc.server.RpcServlet.processCall(RpcServlet.java:172)
	at com.google.gwt.rpc.server.RpcServlet.processPost(RpcServlet.java:233)
	at com.google.gwt.user.server.rpc.AbstractRemoteServiceServlet.doPost(AbstractRemoteServiceServlet.java:62)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:754) [jboss-servlet-3.0-api.jar:1.0.1.Final]
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:847) [jboss-servlet-3.0-api.jar:1.0.1.Final]
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:329)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:248)
	at org.ovirt.engine.ui.frontend.server.gwt.GwtCachingFilter.doFilter(GwtCachingFilter.java:132)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:280)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:248)
	at org.ovirt.engine.core.branding.BrandingFilter.doFilter(BrandingFilter.java:72)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:280)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:248)
	at org.ovirt.engine.core.utils.servlet.LocaleFilter.doFilter(LocaleFilter.java:64) [utils.jar:]
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:280)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:248)
	at org.ovirt.engine.core.bll.AutomaticLoginFilter.doFilter(AutomaticLoginFilter.java:58) [bll.jar:]
	at org.ovirt.engine.core.bll.AutomaticLoginFilter.doFilter(AutomaticLoginFilter.java:49) [bll.jar:]
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:280)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:248)
	at org.ovirt.engine.core.authentication.AuthenticationFilter.doFilter(AuthenticationFilter.java:80) [common.jar:]
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:280)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:248)
	at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:275)
	at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:161)
	at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:489)
	at org.jboss.as.web.security.SecurityContextAssociationValve.invoke(SecurityContextAssociationValve.java:153)
	at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:155)
	at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
	at org.jboss.web.rewrite.RewriteValve.invoke(RewriteValve.java:466)
	at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
	at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:368)
	at org.apache.coyote.ajp.AjpProcessor.process(AjpProcessor.java:505)
	at org.apache.coyote.ajp.AjpProtocol$AjpConnectionHandler.process(AjpProtocol.java:445)
	at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:930)
	at java.lang.Thread.run(Thread.java:744) [rt.jar:1.7.0_51]

2014-02-21 11:29:13,760 WARN  [org.ovirt.engine.core.compat.backendcompat.PropertyInfo] (ajp--127.0.0.1-8702-6) Unable to get value of property: vdsName for class org.ovirt.engine.core.common.businessentities.VdsStatic
2014-02-21 11:29:13,760 WARN  [org.ovirt.engine.core.compat.backendcompat.PropertyInfo] (ajp--127.0.0.1-8702-6) Unable to get value of property: vdsName for class org.ovirt.engine.core.common.businessentities.VdsStatic
2014-02-21 11:29:13,761 INFO  [org.ovirt.engine.core.bll.UpdateVdsCommand] (ajp--127.0.0.1-8702-6) [1f1fa082] Running command: UpdateVdsCommand internal: false. Entities affected :  ID: bec0ed78-d8c0-453c-a69e-eb4d337d8cda Type: VDS
2014-02-21 11:29:14,101 WARN  [org.ovirt.engine.core.compat.backendcompat.PropertyInfo] (ajp--127.0.0.1-8702-6) Unable to get value of property: vdsName for class org.ovirt.engine.core.common.businessentities.VdsStatic
2014-02-21 11:29:14,101 WARN  [org.ovirt.engine.core.compat.backendcompat.PropertyInfo] (ajp--127.0.0.1-8702-6) Unable to get value of property: vdsName for class org.ovirt.engine.core.common.businessentities.VdsStatic
2014-02-21 11:29:14,499 ERROR [org.ovirt.engine.core.bll.FenceExecutor] (ajp--127.0.0.1-8702-6) [1f1fa082] Failed to run Power Management command on Host node1, no running proxy Host was found.
2014-02-21 11:29:14,845 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ajp--127.0.0.1-8702-6) [1f1fa082] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: Power Management test failed for Host node1.There is no other host in the data center that can be used to test the power management settings.
2014-02-21 11:29:15,012 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ajp--127.0.0.1-8702-6) [1f1fa082] Correlation ID: 1f1fa082, Call Stack: null, Custom Event ID: -1, Message: Host node1 configuration was updated by admin.
2014-02-21 11:29:15,390 INFO  [org.ovirt.engine.core.bll.ApproveVdsCommand] (org.ovirt.thread.pool-6-thread-4) [54b06ade] Running command: ApproveVdsCommand internal: false. Entities affected :  ID: bec0ed78-d8c0-453c-a69e-eb4d337d8cda Type: VDS
2014-02-21 11:29:15,423 INFO  [org.ovirt.engine.core.bll.InstallVdsCommand] (org.ovirt.thread.pool-6-thread-4) [54b06ade] Before Installation host bec0ed78-d8c0-453c-a69e-eb4d337d8cda, node1
2014-02-21 11:29:15,485 INFO  [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (org.ovirt.thread.pool-6-thread-4) [54b06ade] START, SetVdsStatusVDSCommand(HostName = node1, HostId = bec0ed78-d8c0-453c-a69e-eb4d337d8cda, status=Installing, nonOperationalReason=NONE, stopSpmFailureLogged=false), log id: 1fa6d77b
2014-02-21 11:29:15,518 INFO  [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (org.ovirt.thread.pool-6-thread-4) [54b06ade] FINISH, SetVdsStatusVDSCommand, log id: 1fa6d77b
2014-02-21 11:29:15,946 INFO  [org.ovirt.engine.core.bll.InstallerMessages] (org.ovirt.thread.pool-6-thread-4) [54b06ade] Installation xx.xx.xx.xx1: Connected to host xx.xx.xx.xx1 with SSH key fingerprint: e5:a4:07:da:3a:42:62:52:87:92:50:26:ea:2a:66:33
2014-02-21 11:29:15,979 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-6-thread-4) [54b06ade] Correlation ID: 54b06ade, Call Stack: null, Custom Event ID: -1, Message: Installing Host node1. Connected to host xx.xx.xx.xx1 with SSH key fingerprint: e5:a4:07:da:3a:42:62:52:87:92:50:26:ea:2a:66:33.
2014-02-21 11:29:16,005 INFO  [org.ovirt.engine.core.bll.VdsDeploy] (org.ovirt.thread.pool-6-thread-4) [54b06ade] Installation of xx.xx.xx.xx1. Executing command via SSH umask 0077; MYTMP="$(mktemp -t ovirt-XXXXXXXXXX)"; trap "chmod -R u+rwX \"${MYTMP}\" > /dev/null 2>&1; rm -fr \"${MYTMP}\" > /dev/null 2>&1" 0; rm -fr "${MYTMP}" && mkdir "${MYTMP}" && tar --warning=no-timestamp -C "${MYTMP}" -x &&  "${MYTMP}"/setup DIALOG/dialect=str:machine DIALOG/customization=bool:True < /var/cache/ovirt-engine/ovirt-host-deploy.tar
2014-02-21 11:29:16,006 INFO  [org.ovirt.engine.core.utils.archivers.tar.CachedTar] (org.ovirt.thread.pool-6-thread-4) Tarball '/var/cache/ovirt-engine/ovirt-host-deploy.tar' is missing, creating
2014-02-21 11:29:16,008 ERROR [org.ovirt.engine.core.utils.archivers.tar.CachedTar] (org.ovirt.thread.pool-6-thread-4) Cannot create tarball '/var/cache/ovirt-engine/ovirt-host-deploy.tar': java.io.FileNotFoundException: File or directory /usr/share/ovirt-host-deploy/interface-3 not found
	at org.ovirt.engine.core.utils.archivers.tar.Tar.doTar(Tar.java:97) [utils.jar:]
	at org.ovirt.engine.core.utils.archivers.tar.CachedTar.create(CachedTar.java:41) [utils.jar:]
	at org.ovirt.engine.core.utils.archivers.tar.CachedTar.ensure(CachedTar.java:105) [utils.jar:]
	at org.ovirt.engine.core.utils.archivers.tar.CachedTar.getFile(CachedTar.java:151) [utils.jar:]
	at org.ovirt.engine.core.bll.VdsDeploy.execute(VdsDeploy.java:1045) [bll.jar:]
	at org.ovirt.engine.core.bll.InstallVdsCommand.installHost(InstallVdsCommand.java:225) [bll.jar:]
	at org.ovirt.engine.core.bll.InstallVdsCommand.executeCommand(InstallVdsCommand.java:140) [bll.jar:]
	at org.ovirt.engine.core.bll.ApproveVdsCommand.executeCommand(ApproveVdsCommand.java:45) [bll.jar:]
	at org.ovirt.engine.core.bll.CommandBase.executeWithoutTransaction(CommandBase.java:1123) [bll.jar:]
	at org.ovirt.engine.core.bll.CommandBase.executeActionInTransactionScope(CommandBase.java:1208) [bll.jar:]
	at org.ovirt.engine.core.bll.CommandBase.runInTransaction(CommandBase.java:1884) [bll.jar:]
	at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInSuppressed(TransactionSupport.java:174) [utils.jar:]
	at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInScope(TransactionSupport.java:116) [utils.jar:]
	at org.ovirt.engine.core.bll.CommandBase.execute(CommandBase.java:1228) [bll.jar:]
	at org.ovirt.engine.core.bll.CommandBase.executeAction(CommandBase.java:351) [bll.jar:]
	at org.ovirt.engine.core.bll.MultipleActionsRunner.executeValidatedCommand(MultipleActionsRunner.java:189) [bll.jar:]
	at org.ovirt.engine.core.bll.MultipleActionsRunner.runCommands(MultipleActionsRunner.java:156) [bll.jar:]
	at org.ovirt.engine.core.bll.MultipleActionsRunner$2.run(MultipleActionsRunner.java:165) [bll.jar:]
	at org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil$InternalWrapperRunnable.run(ThreadPoolUtil.java:97) [utils.jar:]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [rt.jar:1.7.0_51]
	at java.util.concurrent.FutureTask.run(FutureTask.java:262) [rt.jar:1.7.0_51]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [rt.jar:1.7.0_51]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [rt.jar:1.7.0_51]
	at java.lang.Thread.run(Thread.java:744) [rt.jar:1.7.0_51]

2014-02-21 11:29:16,013 ERROR [org.ovirt.engine.core.bll.VdsDeploy] (org.ovirt.thread.pool-6-thread-4) [54b06ade] Error during host xx.xx.xx.xx1 install: java.io.IOException: Cannot create tarball '/var/cache/ovirt-engine/ovirt-host-deploy.tar'
	at org.ovirt.engine.core.utils.archivers.tar.CachedTar.create(CachedTar.java:46) [utils.jar:]
	at org.ovirt.engine.core.utils.archivers.tar.CachedTar.ensure(CachedTar.java:105) [utils.jar:]
	at org.ovirt.engine.core.utils.archivers.tar.CachedTar.getFile(CachedTar.java:151) [utils.jar:]
	at org.ovirt.engine.core.bll.VdsDeploy.execute(VdsDeploy.java:1045) [bll.jar:]
	at org.ovirt.engine.core.bll.InstallVdsCommand.installHost(InstallVdsCommand.java:225) [bll.jar:]
	at org.ovirt.engine.core.bll.InstallVdsCommand.executeCommand(InstallVdsCommand.java:140) [bll.jar:]
	at org.ovirt.engine.core.bll.ApproveVdsCommand.executeCommand(ApproveVdsCommand.java:45) [bll.jar:]
	at org.ovirt.engine.core.bll.CommandBase.executeWithoutTransaction(CommandBase.java:1123) [bll.jar:]
	at org.ovirt.engine.core.bll.CommandBase.executeActionInTransactionScope(CommandBase.java:1208) [bll.jar:]
	at org.ovirt.engine.core.bll.CommandBase.runInTransaction(CommandBase.java:1884) [bll.jar:]
	at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInSuppressed(TransactionSupport.java:174) [utils.jar:]
	at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInScope(TransactionSupport.java:116) [utils.jar:]
	at org.ovirt.engine.core.bll.CommandBase.execute(CommandBase.java:1228) [bll.jar:]
	at org.ovirt.engine.core.bll.CommandBase.executeAction(CommandBase.java:351) [bll.jar:]
	at org.ovirt.engine.core.bll.MultipleActionsRunner.executeValidatedCommand(MultipleActionsRunner.java:189) [bll.jar:]
	at org.ovirt.engine.core.bll.MultipleActionsRunner.runCommands(MultipleActionsRunner.java:156) [bll.jar:]
	at org.ovirt.engine.core.bll.MultipleActionsRunner$2.run(MultipleActionsRunner.java:165) [bll.jar:]
	at org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil$InternalWrapperRunnable.run(ThreadPoolUtil.java:97) [utils.jar:]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [rt.jar:1.7.0_51]
	at java.util.concurrent.FutureTask.run(FutureTask.java:262) [rt.jar:1.7.0_51]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [rt.jar:1.7.0_51]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [rt.jar:1.7.0_51]
	at java.lang.Thread.run(Thread.java:744) [rt.jar:1.7.0_51]
Caused by: java.io.FileNotFoundException: File or directory /usr/share/ovirt-host-deploy/interface-3 not found
	at org.ovirt.engine.core.utils.archivers.tar.Tar.doTar(Tar.java:97) [utils.jar:]
	at org.ovirt.engine.core.utils.archivers.tar.CachedTar.create(CachedTar.java:41) [utils.jar:]
	... 22 more

2014-02-21 11:29:16,023 ERROR [org.ovirt.engine.core.bll.InstallVdsCommand] (org.ovirt.thread.pool-6-thread-4) [54b06ade] Host installation failed for host bec0ed78-d8c0-453c-a69e-eb4d337d8cda, node1.: java.io.IOException: Cannot create tarball '/var/cache/ovirt-engine/ovirt-host-deploy.tar'
	at org.ovirt.engine.core.utils.archivers.tar.CachedTar.create(CachedTar.java:46) [utils.jar:]
	at org.ovirt.engine.core.utils.archivers.tar.CachedTar.ensure(CachedTar.java:105) [utils.jar:]
	at org.ovirt.engine.core.utils.archivers.tar.CachedTar.getFile(CachedTar.java:151) [utils.jar:]
	at org.ovirt.engine.core.bll.VdsDeploy.execute(VdsDeploy.java:1045) [bll.jar:]
	at org.ovirt.engine.core.bll.InstallVdsCommand.installHost(InstallVdsCommand.java:225) [bll.jar:]
	at org.ovirt.engine.core.bll.InstallVdsCommand.executeCommand(InstallVdsCommand.java:140) [bll.jar:]
	at org.ovirt.engine.core.bll.ApproveVdsCommand.executeCommand(ApproveVdsCommand.java:45) [bll.jar:]
	at org.ovirt.engine.core.bll.CommandBase.executeWithoutTransaction(CommandBase.java:1123) [bll.jar:]
	at org.ovirt.engine.core.bll.CommandBase.executeActionInTransactionScope(CommandBase.java:1208) [bll.jar:]
	at org.ovirt.engine.core.bll.CommandBase.runInTransaction(CommandBase.java:1884) [bll.jar:]
	at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInSuppressed(TransactionSupport.java:174) [utils.jar:]
	at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInScope(TransactionSupport.java:116) [utils.jar:]
	at org.ovirt.engine.core.bll.CommandBase.execute(CommandBase.java:1228) [bll.jar:]
	at org.ovirt.engine.core.bll.CommandBase.executeAction(CommandBase.java:351) [bll.jar:]
	at org.ovirt.engine.core.bll.MultipleActionsRunner.executeValidatedCommand(MultipleActionsRunner.java:189) [bll.jar:]
	at org.ovirt.engine.core.bll.MultipleActionsRunner.runCommands(MultipleActionsRunner.java:156) [bll.jar:]
	at org.ovirt.engine.core.bll.MultipleActionsRunner$2.run(MultipleActionsRunner.java:165) [bll.jar:]
	at org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil$InternalWrapperRunnable.run(ThreadPoolUtil.java:97) [utils.jar:]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [rt.jar:1.7.0_51]
	at java.util.concurrent.FutureTask.run(FutureTask.java:262) [rt.jar:1.7.0_51]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [rt.jar:1.7.0_51]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [rt.jar:1.7.0_51]
	at java.lang.Thread.run(Thread.java:744) [rt.jar:1.7.0_51]
Caused by: java.io.FileNotFoundException: File or directory /usr/share/ovirt-host-deploy/interface-3 not found
	at org.ovirt.engine.core.utils.archivers.tar.Tar.doTar(Tar.java:97) [utils.jar:]
	at org.ovirt.engine.core.utils.archivers.tar.CachedTar.create(CachedTar.java:41) [utils.jar:]
	... 22 more

2014-02-21 11:29:16,071 INFO  [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (org.ovirt.thread.pool-6-thread-4) [54b06ade] START, SetVdsStatusVDSCommand(HostName = node1, HostId = bec0ed78-d8c0-453c-a69e-eb4d337d8cda, status=InstallFailed, nonOperationalReason=NONE, stopSpmFailureLogged=false), log id: 31165e3e
2014-02-21 11:29:16,112 INFO  [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (org.ovirt.thread.pool-6-thread-4) [54b06ade] FINISH, SetVdsStatusVDSCommand, log id: 31165e3e
2014-02-21 11:29:16,134 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-6-thread-4) [54b06ade] Correlation ID: 54b06ade, Job ID: 4a971c3d-4a79-4bde-98bb-ca4ee29d1d20, Call Stack: null, Custom Event ID: -1, Message: Host node1 installation failed. Cannot create tarball '/var/cache/ovirt-engine/ovirt-host-deploy.tar'.
2014-02-21 11:29:17,954 INFO  [org.ovirt.engine.core.vdsbroker.VdsManager] (DefaultQuartzScheduler_Worker-67) Initializing Host: node1
2014-02-21 11:33:11,802 INFO  [org.ovirt.engine.core.services.RegisterServlet] (ajp--127.0.0.1-8702-6) Registration request: source='xx.xx.xx.xx1', secured='true', address='xx.xx.xx.xx1', vdsPort=54321, name='node1', uniqueId='54443858-4E54-2500-904B-0025904B9C22_00:25:90:01:d1:74'
2014-02-21 11:33:11,842 ERROR [org.ovirt.engine.core.bll.RegisterVdsQuery] (ajp--127.0.0.1-8702-6) Query execution failed due to invalid inputs. VDS_STATUS_NOT_VALID_FOR_UPDATE

Comment 8 Eli Mesika 2014-02-23 21:20:36 UTC
Vitor, see comment 7 , this seems to be a result of NPE on 


GetHostArchitectureQuery.java: 
---snip---
ServerCpu sc =
                CpuFlagsManagerHandler.FindMaxServerCpuByFlags(host.getCpuFlags(),
                        host.getVdsGroupCompatibilityVersion());

        getQueryReturnValue().setReturnValue(sc.getArchitecture());

---snip---

It seems that sc is NULL ...

What can cause that ???

Comment 9 Vitor de Lima 2014-02-24 11:30:21 UTC
This can happen if the engine cannot retrieve the host cpu flags (which seems to be the case) or if the CPU is unsupported (I guess this is impossible in this context).

Comment 10 Netbulae 2014-02-24 12:41:38 UTC
cat /proc/cpuinfo
processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 44
model name	: Intel(R) Xeon(R) CPU           E5620  @ 2.40GHz
stepping	: 2
cpu MHz		: 2400.045
cache size	: 12288 KB
physical id	: 1
siblings	: 8
core id		: 10
cpu cores	: 4
apicid		: 53
initial apicid	: 53
fpu		: yes
fpu_exception	: yes
cpuid level	: 11
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 popcnt lahf_lm ida arat epb dts tpr_shadow vnmi flexpriority ept vpid
bogomips	: 4801.95
clflush size	: 64
cache_alignment	: 64
address sizes	: 40 bits physical, 48 bits virtual
power management:

Comment 11 Eli Mesika 2014-02-24 12:52:06 UTC
Vitor, please look at info given by the BZ reporter on comment 10 and advise how to check/resolve 

Thanks

Comment 12 Vitor de Lima 2014-02-24 13:50:43 UTC
What is the output of:

vdsClient -s 1.2.3.4 getVdsCaps

(Replacing 1.2.3.4 by the ip address of the non-responsive host and calling it from the machine with the ovirt-engine installed first and then from the host itself later)

Comment 13 Netbulae 2014-02-24 16:06:05 UTC
Sorry, I got frustrated and wiped the nodes to install centos 6 with VDSM on it.

I only have limited time to test things between my other projects :-(

Now I have working nodes but can't test with vdsClient properly anymore. 

So terribly sorry I cannot debug anymore and thanks for all your help.

I will try testing another node image when released (RC?) but for now I closed the bug. I need to get on with testing other parts.

Comment 14 Netbulae 2014-03-06 14:46:45 UTC
Hi,

I´m having the same non-responsive issues again with 3.4.0 RC.

When I install the nodes, login with admin, register the node and reboot right after registering. I see the node registering in the management console and the nodes become "Up". When I reboot again after that however, the nodes become unresponsive and I cannot get them Up again. It doesn't matter if I use power management to reboot or do it manually from the admin console/root.

When I try vdsClient -s node1 getVdsCaps from the engine (after installing vdsm-cli), I get:
"Exception: No permission to read file: /etc/pki/vdsm/keys/vdsmkey.pem"

I attached the output when I try the vdsClient from node2.

When I try the same from node1 to node2, I get:

Connection to node2:54321 refused


from engine.log:
2014-03-06 15:31:40,274 INFO  [org.ovirt.engine.core.bll.ClearNonResponsiveVdsVmsCommand] (org.ovirt.thread.pool-6-thread-4) [2e8ff470] Running command: ClearNonResponsiveVdsVmsCommand internal: true. Entities affected :  ID: 79f25466-7989-4bb1-9f8b-9b8657a64b7b Type: VDS
2014-03-06 15:31:40,292 INFO  [org.ovirt.engine.core.vdsbroker.UpdateVdsVMsClearedVDSCommand] (org.ovirt.thread.pool-6-thread-4) [2e8ff470] START, UpdateVdsVMsClearedVDSCommand(HostName = node2, HostId = 79f25466-7989-4bb1-9f8b-9b8657a64b7b), log id: 6eb1c2f4
2014-03-06 15:31:40,293 INFO  [org.ovirt.engine.core.vdsbroker.UpdateVdsVMsClearedVDSCommand] (org.ovirt.thread.pool-6-thread-4) [2e8ff470] FINISH, UpdateVdsVMsClearedVDSCommand, log id: 6eb1c2f4
2014-03-06 15:31:40,692 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-6-thread-4) [2e8ff470] Correlation ID: 2e8ff470, Call Stack: null, Custom Event ID: -1, Message: All VMs' status on Non Responsive Host node2 were changed to 'Down' by admin
2014-03-06 15:31:40,737 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-6-thread-4) [2e8ff470] Correlation ID: 304920df, Call Stack: null, Custom Event ID: -1, Message: Manual fence for host node2 was started.
2014-03-06 15:31:40,737 INFO  [org.ovirt.engine.core.bll.storage.FenceVdsManualyCommand] (org.ovirt.thread.pool-6-thread-4) [2e8ff470] Lock freed to object EngineLock [exclusiveLocks= key: 79f25466-7989-4bb1-9f8b-9b8657a64b7b value: VDS_FENCE
, sharedLocks= ]
2014-03-06 15:31:41,767 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand] (DefaultQuartzScheduler_Worker-25) Command GetCapabilitiesVDSCommand(HostName = node2, HostId = 79f25466-7989-4bb1-9f8b-9b8657a64b7b, vds=Host[node2]) execution failed. Exception: VDSNetworkException: java.net.ConnectException: Connection refused
2014-03-06 15:31:44,825 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand] (DefaultQuartzScheduler_Worker-32) Command GetCapabilitiesVDSCommand(HostName = node2, HostId = 79f25466-7989-4bb1-9f8b-9b8657a64b7b, vds=Host[node2]) execution failed. Exception: VDSNetworkException: java.net.ConnectException: Connection refused
2014-03-06 15:31:47,902 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand] (DefaultQuartzScheduler_Worker-26) Command GetCapabilitiesVDSCommand(HostName = node2, HostId = 79f25466-7989-4bb1-9f8b-9b8657a64b7b, vds=Host[node2]) execution failed. Exception: VDSNetworkException: java.net.ConnectException: Connection refused
[root@manage01en01 ovirt-engine]# tail engine.log 
2014-03-06 15:31:41,767 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand] (DefaultQuartzScheduler_Worker-25) Command GetCapabilitiesVDSCommand(HostName = node2, HostId = 79f25466-7989-4bb1-9f8b-9b8657a64b7b, vds=Host[node2]) execution failed. Exception: VDSNetworkException: java.net.ConnectException: Connection refused
2014-03-06 15:31:44,825 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand] (DefaultQuartzScheduler_Worker-32) Command GetCapabilitiesVDSCommand(HostName = node2, HostId = 79f25466-7989-4bb1-9f8b-9b8657a64b7b, vds=Host[node2]) execution failed. Exception: VDSNetworkException: java.net.ConnectException: Connection refused
2014-03-06 15:31:47,902 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand] (DefaultQuartzScheduler_Worker-26) Command GetCapabilitiesVDSCommand(HostName = node2, HostId = 79f25466-7989-4bb1-9f8b-9b8657a64b7b, vds=Host[node2]) execution failed. Exception: VDSNetworkException: java.net.ConnectException: Connection refused
2014-03-06 15:31:50,990 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand] (DefaultQuartzScheduler_Worker-39) Command GetCapabilitiesVDSCommand(HostName = node2, HostId = 79f25466-7989-4bb1-9f8b-9b8657a64b7b, vds=Host[node2]) execution failed. Exception: VDSNetworkException: java.net.ConnectException: Connection refused
2014-03-06 15:31:54,058 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand] (DefaultQuartzScheduler_Worker-42) Command GetCapabilitiesVDSCommand(HostName = node2, HostId = 79f25466-7989-4bb1-9f8b-9b8657a64b7b, vds=Host[node2]) execution failed. Exception: VDSNetworkException: java.net.ConnectException: Connection refused
2014-03-06 15:31:57,114 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand] (DefaultQuartzScheduler_Worker-50) Command GetCapabilitiesVDSCommand(HostName = node2, HostId = 79f25466-7989-4bb1-9f8b-9b8657a64b7b, vds=Host[node2]) execution failed. Exception: VDSNetworkException: java.net.ConnectException: Connection refused
2014-03-06 15:32:00,179 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand] (DefaultQuartzScheduler_Worker-68) Command GetCapabilitiesVDSCommand(HostName = node2, HostId = 79f25466-7989-4bb1-9f8b-9b8657a64b7b, vds=Host[node2]) execution failed. Exception: VDSNetworkException: java.net.ConnectException: Connection refused
2014-03-06 15:32:03,235 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand] (DefaultQuartzScheduler_Worker-75) Command GetCapabilitiesVDSCommand(HostName = node2, HostId = 79f25466-7989-4bb1-9f8b-9b8657a64b7b, vds=Host[node2]) execution failed. Exception: VDSNetworkException: java.net.ConnectException: Connection refused
2014-03-06 15:32:06,503 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand] (DefaultQuartzScheduler_Worker-86) Command GetCapabilitiesVDSCommand(HostName = node2, HostId = 79f25466-7989-4bb1-9f8b-9b8657a64b7b, vds=Host[node2]) execution failed. Exception: VDSNetworkException: java.net.ConnectException: Connection refused
2014-03-06 15:32:09,869 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand] (DefaultQuartzScheduler_Worker-80) [21b5b0a2] Command GetCapabilitiesVDSCommand(HostName = node2, HostId = 79f25466-7989-4bb1-9f8b-9b8657a64b7b, vds=Host[node2]) execution failed. Exception: VDSNetworkException: java.net.ConnectException: Connection refused

Comment 15 Netbulae 2014-03-06 14:47:24 UTC
Created attachment 871499 [details]
getVdsCaps output

Comment 16 Eli Mesika 2014-03-09 09:29:25 UTC
please attach full engine.log and VDSM log

Comment 17 Netbulae 2014-03-11 13:10:44 UTC
I redeployed a full setup because I've done a lot of messing about. I now did a wipe and reinstall of the nodes and the engine. I then logged in to the admin on the nodes, enabled dhcp and registered the management server.

This time I didn't get the nodes connected even before reboot. 

In /var/log/messages, I see these errrors:

Mar 11 12:43:33 localhost libvirtd: Could not find keytab file: /etc/libvirt/krb5.tab: No such file or directory

Mar 11 12:43:48 localhost python: error in accessing vdsm log file
Mar 11 12:43:48 localhost respawn: slave '/usr/share/vdsm/vdsm --pidfile /var/run/vdsm/vdsmd.pid' died too quickly, respawning slave
Mar 11 12:43:48 localhost python: error in accessing vdsm log file
Mar 11 12:43:48 localhost respawn: slave '/usr/share/vdsm/vdsm --pidfile /var/run/vdsm/vdsmd.pid' died too quickly, respawning slave
Mar 11 12:43:48 localhost python: error in accessing vdsm log file
Mar 11 12:43:48 localhost respawn: slave '/usr/share/vdsm/vdsm --pidfile /var/run/vdsm/vdsmd.pid' died too quickly, respawning slave
<<<<snip>>>>
Mar 11 12:44:19 localhost python: error in accessing vdsm log file
Mar 11 12:44:19 localhost respawn: slave '/usr/share/vdsm/vdsm --pidfile /var/run/vdsm/vdsmd.pid' died too quickly for more than 30 seconds, master sleeping for 900 seconds

I see the vdsm.log is owned by root and 0 bytes, shouldn't this be vdsm ?

[root@node1 vdsm]# ls -lart
total 24
drwxr-xr-x.  2 vdsm kvm  4096 2014-03-11 09:45 backup
drwxr-xr-x. 13 root root 4096 2014-03-11 12:43 ..
-rw-r--r--.  1 root root 5464 2014-03-11 12:43 supervdsm.log
-rw-r--r--.  1 root root    0 2014-03-11 12:43 vdsm.log
-rw-r--r--.  1 root root    0 2014-03-11 12:43 metadata.log
drwxr-xr-x.  3 vdsm kvm  4096 2014-03-11 12:43 .
-rw-r--r--.  1 vdsm kvm   352 2014-03-11 12:43 upgrade.log

It appears vdsm is not running at all:

[root@node1 vdsm]# /etc/init.d/vdsmd status
VDS daemon is not running, and its watchdog is running

When I chown /var/log/vdsm/* to user vdsm:kvm, I get vdsm started but throwing this errror:

Mar 11 13:04:57 localhost vdsm vds WARNING Unable to load the json rpc server module. Please make sure it is installed.
Mar 11 13:05:17 localhost vdsm vds WARNING Unable to load the json rpc server module. Please make sure it is installed.


But the nodes are now up. So it appears to be a simple permission problem.

Comment 18 Yaniv Bronhaim 2014-03-12 13:29:09 UTC
the json warning is not related, ignore it for now. the bug is that the log becomes owned by root. Douglas, is it refer to bug 1055153 ?

Comment 19 Douglas Schilling Landgraf 2014-03-12 15:15:42 UTC
(In reply to Yaniv Bronhaim from comment #18)
> the json warning is not related, ignore it for now. the bug is that the log
> becomes owned by root. Douglas, is it refer to bug 1055153 ?

Hello Yaniv, BZ#1055153 should resolves this bug on ovit 3.4 beta3.

Comment 20 Netbulae 2014-03-12 15:48:02 UTC
I think we need a new node image from fabian then, I use the latest available ovirt-node-iso-3.0.4-1.0.201401291204.vdsm34.el6.iso and it's dated 11-Feb-2014... before the bugfix. 

There will probably be some more fixes that didn't make this node image.

Comment 21 Yaniv Bronhaim 2014-03-12 20:25:35 UTC
may i close as duplicated?

Comment 22 Netbulae 2014-03-13 07:44:38 UTC
Yes you may close it. I'd still like a new node image to test with though....

Thanks for the help.

Comment 23 Eli Mesika 2014-03-13 08:08:36 UTC

*** This bug has been marked as a duplicate of bug 1055153 ***


Note You need to log in before you can comment on or make changes to this bug.