Bug 1007362

Summary: RHEV-H 6.5 is approved failed, network error during communication with the host.
Product: Red Hat Enterprise Virtualization Manager Reporter: Ying Cui <ycui>
Component: vdsm Assignee: Yaniv Bronhaim <ybronhei>
Status: CLOSED DUPLICATE QA Contact:
Severity: high Docs Contact:
Priority: high    
Version: 3.3.0 CC: abaron, alonbl, bazulay, danken, gouyang, hateya, iheim, leiwang, lpeer, masayag, mburns, mpavlik, pstehlik, ycui, yeylon
Target Milestone: --- Keywords: Triaged
Target Release: 3.3.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: network
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2013-09-16 12:06:43 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Network RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Attachments:
Description Flags
host deploy log none

Comment 3 Alon Bar-Lev 2013-09-12 10:56:42 UTC
Please also attach host deploy log.

Comment 4 Martin Pavlik 2013-09-12 11:14:58 UTC
I believe the qemu part is not related to the network problem; I've reported a separate bug 1007375 for it.

Comment 5 Mike Burns 2013-09-12 12:07:39 UTC
This may be ovirt-node-plugin-vdsm as well.

Comment 6 Dan Kenigsberg 2013-09-14 20:42:28 UTC
The vdsm.log excerpt is missing the all-important call to setupNetwork (or any other API call). Please verify whether none exists.

If setupNetwork does show up in the log, please attach supervdsm.log as well.

I'm setting the needinfo flag for Alon's comment 3 request for ovirt-host-deploy logs.

Comment 7 Moti Asayag 2013-09-14 21:09:26 UTC
(In reply to Dan Kenigsberg from comment #6)
> The vdsm.log excerpt is missing the all-important call to setupNetwork (or
> any other API call). Please verify whether none exists.
> 
> If setupNetwork does show up in the log, please attach supervdsm.log as well.
> 
> I'm setting the needinfo flag for Alon's comment 3 request for
> ovirt-host-deploy logs.

There will be no call to setupNetworks in either the engine or the vdsm logs.
According to engine.log, the vdsm agent was not ready to accept any requests from the engine within the expected time after host deploy completed.

2013-09-12 18:01:28,954 INFO  [org.ovirt.engine.core.bll.InstallerMessages] (VdsDeploy) Installation 10.66.65.147: Stage: Termination
2013-09-12 18:01:28,960 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (VdsDeploy) Correlation ID: 3e34977e, Call Stack: null, Custom Event ID: -1, Message: Installing Host dhcp-65-147.nay.redhat.com. Stage: Termination.
2013-09-12 18:01:29,009 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.PollVDSCommand] (pool-5-thread-49) [3e34977e] java.util.concurrent.ExecutionException: java.lang.reflect.InvocationTargetException
...
2013-09-12 18:02:03,642 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.PollVDSCommand] (pool-5-thread-49) [3e34977e] java.util.concurrent.ExecutionException: java.lang.reflect.InvocationTargetException
2013-09-12 18:02:03,642 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.PollVDSCommand] (pool-5-thread-49) [3e34977e] Command PollVDS execution failed. Exception: RuntimeException: java.util.concurrent.ExecutionException: java.lang.reflect.InvocationTargetException

During the same period, engine.log also contains errors from 'getVdsCaps' calls to another host, which might indicate a problem on the engine side in reaching the hosts, but this is not conclusive.
2013-09-12 18:02:03,642 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.PollVDSCommand] (pool-5-thread-49) [3e34977e] java.util.concurrent.ExecutionException: java.lang.reflect.InvocationTargetException
2013-09-12 18:02:03,642 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.PollVDSCommand] (pool-5-thread-49) [3e34977e] Command PollVDS execution failed. Exception: RuntimeException: java.util.concurrent.ExecutionException: java.lang.reflect.InvocationTargetException
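
As a rough check of the timing described above, the gap between the host-deploy Termination stage and the last PollVDS failure can be computed directly from the two timestamps in the excerpt. This is only a sketch: the helper names (`parse_ts`, `seconds_between`) are hypothetical, and the second sample line is shortened from the log line quoted above.

```python
from datetime import datetime

# Lines taken from the engine.log excerpt above (second line shortened).
TERMINATION = ("2013-09-12 18:01:28,954 INFO  "
               "[org.ovirt.engine.core.bll.InstallerMessages] (VdsDeploy) "
               "Installation 10.66.65.147: Stage: Termination")
LAST_POLL_FAILURE = ("2013-09-12 18:02:03,642 ERROR "
                     "[org.ovirt.engine.core.vdsbroker.vdsbroker.PollVDSCommand] "
                     "(pool-5-thread-49) [3e34977e] Command PollVDS execution failed.")


def parse_ts(line):
    """Parse the leading 'YYYY-MM-DD HH:MM:SS,mmm' timestamp of a log line."""
    return datetime.strptime(line[:23], "%Y-%m-%d %H:%M:%S,%f")


def seconds_between(first_line, last_line):
    """Return the elapsed seconds between two log lines."""
    return (parse_ts(last_line) - parse_ts(first_line)).total_seconds()


if __name__ == "__main__":
    print(seconds_between(TERMINATION, LAST_POLL_FAILURE))
```

For the two lines above this gives roughly 34.7 seconds between the end of host deploy and the point where the engine gave up polling.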

Comment 8 Alon Bar-Lev 2013-09-14 21:13:33 UTC
(In reply to Moti Asayag from comment #7)
> (In reply to Dan Kenigsberg from comment #6)
> > The vdsm.log excerpt is missing the all-important call to setupNetwork (or
> > any other API call). Please verify whether none exists.
> > 
> > If setupNetwork does show up in the log, please attach supervdsm.log as well.
> > 
> > I'm setting the needinfo flag for Alon's comment 3 request for
> > ovirt-host-deploy logs.
> 
> There will be no call to setupNetworks in either the engine or the vdsm logs.
> According to engine.log, the vdsm agent was not ready to accept any
> requests from the engine within the expected time after host deploy
> completed.

Yes, but host-deploy did complete, so for some reason vdsm is not responding.

Comment 9 Moti Asayag 2013-09-14 21:50:22 UTC
The engine.log contains multiple "no route to host" errors:

2013-09-12 18:02:07,060 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand] (DefaultQuartzScheduler_Worker-83) Command GetCapabilitiesVDS execution failed. Exception: VDSNetworkException: java.net.NoRouteToHostException: No route to host
2013-09-12 18:02:07,880 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand] (DefaultQuartzScheduler_Worker-92) Command GetCapabilitiesVDS execution failed. Exception: VDSNetworkException: java.net.SocketTimeoutException: connect timed out

but the log doesn't say which host each error refers to. Judging by the frequency of the errors, more than one host might be unreachable for the same reason.
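
To get a feel for how the failures split by cause, the underlying Java exception can be tallied from the GetCapabilitiesVDS error lines. The sketch below runs over just the two lines quoted above; the regular expression and the `count_causes` name are assumptions made for illustration, not part of the engine.

```python
import re
from collections import Counter

# The two GetCapabilitiesVDS failures quoted above, copied from engine.log.
SAMPLE = (
    "2013-09-12 18:02:07,060 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker."
    "GetCapabilitiesVDSCommand] (DefaultQuartzScheduler_Worker-83) Command "
    "GetCapabilitiesVDS execution failed. Exception: VDSNetworkException: "
    "java.net.NoRouteToHostException: No route to host\n"
    "2013-09-12 18:02:07,880 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker."
    "GetCapabilitiesVDSCommand] (DefaultQuartzScheduler_Worker-92) Command "
    "GetCapabilitiesVDS execution failed. Exception: VDSNetworkException: "
    "java.net.SocketTimeoutException: connect timed out\n"
)

# Capture the Java exception class that follows "VDSNetworkException: ".
CAUSE = re.compile(r"Exception: VDSNetworkException: ([\w.]+):")


def count_causes(log_text):
    """Count VDSNetworkException failures per underlying Java exception."""
    return Counter(match.group(1) for match in CAUSE.finditer(log_text))


if __name__ == "__main__":
    for cause, count in count_causes(SAMPLE).most_common():
        print(count, cause)
```

Run against the full engine.log, the same tally would show whether the failures are mostly routing errors or timeouts, although it still cannot say which host each error belongs to.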

So if we start from the beginning, this is how the host was installed/approved:
 2013-09-12 18:01:00,754 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (pool-5-thread-49) [3e34977e] Correlation ID: 3e34977e, Call Stack: null, Custom Event ID: -1, Message: Installing Host dhcp-65-147.nay.redhat.com. Connected to host 10.66.65.147 with SSH key fingerprint: 22:4c:cb:54:99:10:9b:32:1d:a1:ff:dc:df:b4:a8:0e.

It was added by its IP address 10.66.65.147 and not by its resolvable name dhcp-65-147.nay.redhat.com.

After host-deploy finished, there was no longer a route to the host.
Perhaps /var/log/messages can reveal some more info.
Ying, could you attach it in addition to the ovirt-host-deploy log?
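
A quick way to confirm the lost route from the engine machine is a plain TCP connection attempt to the host. This is a minimal sketch: the `probe` helper is hypothetical, and port 54321 is assumed to be the vdsm listening port (the thread does not state it).

```python
import socket


def probe(host, port=54321, timeout=5.0):
    """Return None if a TCP connection succeeds, otherwise the error text.

    The default port 54321 is an assumption about where vdsm listens.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return None
    except OSError as exc:
        # Covers refused connections, timeouts and "No route to host".
        return f"{type(exc).__name__}: {exc}"


if __name__ == "__main__":
    # IP address taken from the engine.log line above.
    print(probe("10.66.65.147"))
```

A "No route to host" result here would match the NoRouteToHostException in engine.log and point at the host's network configuration rather than at vdsm itself.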

Comment 10 Ying Cui 2013-09-16 10:54:13 UTC
Created attachment 798222
host deploy log

Comment 11 Ying Cui 2013-09-16 10:56:01 UTC
I installed a new RHEV-H host and registered it with RHEV-M again. The problem still reproduces.

The host deploy log is attached in comment 10.

Comment 12 Alon Bar-Lev 2013-09-16 11:06:11 UTC
I don't know the meaning of this, but:

2013-09-16 10:45:23 DEBUG otopi.plugins.otopi.services.rhel plugin.executeRaw:366 execute: ('/sbin/service', 'vdsmd', 'start'), executable='None', cwd='None', env=None
2013-09-16 10:45:43 DEBUG otopi.plugins.otopi.services.rhel plugin.executeRaw:383 execute-result: ('/sbin/service', 'vdsmd', 'start'), rc=0
2013-09-16 10:45:43 DEBUG otopi.plugins.otopi.services.rhel plugin.execute:441 execute-output: ('/sbin/service', 'vdsmd', 'start') stdout:
checking certs..
libvirt is already configured for vdsm
Starting iscsid...
Starting supervdsmd...
SUCCESS: ssl configured to true. No conflicts
Starting up vdsm daemon: 
[  OK  ]
vdsm start[  OK  ]

2013-09-16 10:45:43 DEBUG otopi.plugins.otopi.services.rhel plugin.execute:446 execute-output: ('/sbin/service', 'vdsmd', 'start') stderr:
Traceback (most recent call last):
  File "/usr/bin/vdsm-tool", line 143, in <module>
    sys.exit(main())
  File "/usr/bin/vdsm-tool", line 140, in main
    return tool_command[cmd]["command"](*args[1:])
  File "/usr/lib64/python2.6/site-packages/vdsm/tool/service.py", line 405, in service_disable
  File "/usr/lib64/python2.6/site-packages/vdsm/tool/service.py", line 353, in _runAlts
service.ServiceNotExistError: ServiceNotExistError: Tried all alternatives but failed:
ServiceNotExistError: libvirt-guests is not a SysV service
ServiceNotExistError: libvirt-guests is not an Upstart service
libvir: Network Filter Driver error : Network filter not found: no nwfilter with matching name 'vdsm-no-mac-spoofing'
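
Since the service start returned rc=0 while stderr still carried a traceback, it can help to pull only the exception summary lines out of the host-deploy log. The following is a sketch with a hypothetical helper name; the sample text is the stderr shown above.

```python
# stderr text copied from the host-deploy log excerpt above.
STDERR = """\
Traceback (most recent call last):
  File "/usr/bin/vdsm-tool", line 143, in <module>
    sys.exit(main())
  File "/usr/bin/vdsm-tool", line 140, in main
    return tool_command[cmd]["command"](*args[1:])
  File "/usr/lib64/python2.6/site-packages/vdsm/tool/service.py", line 405, in service_disable
  File "/usr/lib64/python2.6/site-packages/vdsm/tool/service.py", line 353, in _runAlts
service.ServiceNotExistError: ServiceNotExistError: Tried all alternatives but failed:
ServiceNotExistError: libvirt-guests is not a SysV service
ServiceNotExistError: libvirt-guests is not an Upstart service
libvir: Network Filter Driver error : Network filter not found: no nwfilter with matching name 'vdsm-no-mac-spoofing'
"""


def exception_summaries(stderr_text):
    """Return the first unindented line after each Python traceback header."""
    summaries = []
    in_traceback = False
    for line in stderr_text.splitlines():
        if line.startswith("Traceback (most recent call last):"):
            in_traceback = True
        elif in_traceback and line and not line[0].isspace():
            # Stack frames are indented; the exception summary is not.
            summaries.append(line)
            in_traceback = False
    return summaries


if __name__ == "__main__":
    for summary in exception_summaries(STDERR):
        print(summary)
```

On the excerpt above this isolates the ServiceNotExistError raised from service_disable for libvirt-guests; whether that error is related to the lost connectivity is not established in this thread.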

Comment 13 Moti Asayag 2013-09-16 12:06:43 UTC

*** This bug has been marked as a duplicate of bug 1006842 ***