Created attachment 1182277
setup_and_vdsm_logs

Description of problem:
If the system hostname is changed with hostnamectl,

Version-Release number of selected component (if applicable):
ovirt-hosted-engine-setup-2.0.1-1
vdsm-4.18.6-1

How reproducible:
100%

Steps to Reproduce:
1. Install a system
2. hostnamectl set-hostname foobar && hosted-engine --deploy

Actual results:
ovirt-hosted-engine-setup times out trying to connect to vdsm

Expected results:
This works

Additional info:
We will not be fixing this type of issue. You can do a lot of ugly things during setup like change networks, storage, host name and much more. We will not support this. You can either change name before or after the setup.
(In reply to Yaniv Dary from comment #3)
> We will not be fixing this type of issue. You can do a lot of ugly things
> during setup like change networks, storage, host name and much more. We will
> not support this. You can either change name before or after the setup.

Yaniv, this is a very basic flow that probably every user is expected to follow: they log in to Cockpit, change the hostname, and then install hosted-engine. It happened to me last night with the beta2 bits. I think it needs immediate attention to understand its scope (it wasn't reproduced on the virt-qe env).
(In reply to Yaniv Dary from comment #3)
> We will not be fixing this type of issue. You can do a lot of ugly things
> during setup like change networks, storage, host name and much more. We will
> not support this. You can either change name before or after the setup.

Note that this is about changing the hostname BEFORE setup, with hostnamectl.
OK, in bash: hostnamectl set-hostname foobar && hosted-engine --deploy means to run at the same time.
Sandro, can you have a look?
I think that this could be somehow related to:
http://lists.ovirt.org/pipermail/users/2016-June/040578.html
and we have other complaints about similar issues.

In hosted-engine-setup we are using something like:

    requestQueues = vdsmconfig.get('addresses', 'request_queues')
    requestQueue = requestQueues.split(",")[0]
    cli = jsonrpcvdscli.connect(
        requestQueue=requestQueue,
    )

Under lib/vdsm/jsonrpcvdscli.py in vdsm we have:

    def _create(requestQueue, host=None, port=None, useSSL=None, responseQueue=None):
        if host is None:
            host = socket.gethostname()
        if port is None:
            port = int(config.getint('addresses', 'management_port'))

So, since we are not passing any value for the host parameter, the behavior simply depends on the result of socket.gethostname().
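To see what that default resolves to on an affected host, a small diagnostic along these lines might help. This is only a sketch: `describe_default_host` is a hypothetical helper, not part of vdsm or hosted-engine-setup, and it uses nothing beyond the standard `socket` module.

```python
import socket


def describe_default_host(name=None, resolver=socket.getaddrinfo):
    """Return (name, addresses, error) for the name the client would default to.

    Hypothetical helper, not part of vdsm. `resolver` is injectable so the
    function can be exercised without touching DNS or /etc/hosts.
    """
    if name is None:
        # Same call the quoted _create() uses when no host is passed.
        name = socket.gethostname()
    try:
        infos = resolver(name, None)
    except socket.gaierror as e:
        # Resolution failed: report the error instead of raising.
        return name, [], str(e)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the textual address.
    addresses = sorted({info[4][0] for info in infos})
    return name, addresses, None
```

Calling `describe_default_host()` with no arguments on the host should show whether the current hostname resolves at all from Python, and to which addresses.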
While from VDSM logs we see:
MainThread::INFO::2016-07-20 18:32:57,812::protocoldetector::179::vds.MultiProtocolAcceptor::(__init__) Listening at :::54321
so it seems related to the IPv4/IPv6 topic.
(In reply to Yaniv Dary from comment #6)
> OK, in bash:
> hostnamectl set-hostname foobar && hosted-engine --deploy
> means to run at the same time.

Actually, '&' means run in the background. '&&' is the 'and' operator, which means only run the second command if the first succeeds.
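The '&&' behaviour is easy to confirm. A minimal sketch, driving `sh` from Python's standard `subprocess` module; `run_sh` is just an illustrative helper name, and `true` / `false` stand in for a succeeding or failing first command:

```python
import subprocess


def run_sh(command):
    """Run `command` through sh and return whatever it wrote to stdout."""
    result = subprocess.run(["sh", "-c", command], capture_output=True, text=True)
    return result.stdout


# `false` exits non-zero, so the command after && is never started.
skipped = run_sh("false && echo second")

# `true` exits zero, so the command after && runs once `true` has finished.
ran = run_sh("true && echo second")
```

So in the reproducer, hosted-engine --deploy only starts after hostnamectl has returned successfully; the two are sequential, not concurrent.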
I experienced a similar issue when setting up one of my hosts on 3.6. I set the hostname with nmtui before disabling NetworkManager and the vdsm would not start during hosted-engine --deploy. I can't say for certain how I got it working (sorry) as I tried many things. I did open a second session to the server and try starting (or restarting) the vdsmd service before hosted-engine --deploy timed out. Now I am on 4.0.1 and having errors. It has been suggested that the errors I am having are related to this one.
Dan, Oved, can you please have somebody take a look? It seems quite relevant, since we already have several complaints from upstream users.
Simone, Ryan, can you please check if the myhostname module is used in the hosts line in /etc/nsswitch.conf. I suspect that this is bug 1329943.
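For anyone who wants to script that check, a minimal sketch follows. `hosts_sources` and `uses_myhostname` are hypothetical helper names; the parsing is deliberately simple and returns any action items such as `[NOTFOUND=return]` as plain tokens.

```python
def hosts_sources(nsswitch_text):
    """Return the lookup sources on the active `hosts:` line, in order."""
    for line in nsswitch_text.splitlines():
        # Drop comments (whole-line or trailing) and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if line.startswith("hosts:"):
            return line[len("hosts:"):].split()
    return []


def uses_myhostname(path="/etc/nsswitch.conf"):
    """Return True if the myhostname module is listed for host lookups."""
    with open(path) as f:
        return "myhostname" in hosts_sources(f.read())
```

The position of myhostname in the returned list matters as well, since sources are consulted in order.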
I don't think it is the same. The things that are failing in https://bugzilla.redhat.com/show_bug.cgi?id=1329943 all seem to work fine for me:

```
$ hostname -f
cultivar3.grove.silverorange.com
$ cat /etc/nsswitch.conf | grep host
#hosts: db files nisplus nis dns
hosts: files dns myhostname
$ cat /etc/hostname
cultivar3.grove.silverorange.com
$ cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
### oVirt role entries ###
192.168.0.203 cultivar.grove.silverorange.com cultivar
192.168.0.204 cultivar0.grove.silverorange.com cultivar0
192.168.0.205 cultivar1.grove.silverorange.com cultivar1
192.168.0.206 cultivar2.grove.silverorange.com cultivar2
192.168.0.207 cultivar3.grove.silverorange.com cultivar3
$ ping cultivar3.grove.silverorange.com
PING cultivar3.grove.silverorange.com (192.168.0.207) 56(84) bytes of data.
64 bytes from cultivar3.grove.silverorange.com (192.168.0.207): icmp_seq=1 ttl=64 time=0.028 ms
...
```

(In reply to Fabian Deutsch from comment #13)
> Simone, Ryan, can you please check if the myhostname module is used in the
> hosts line in /etc/nsswitch.conf.
>
> I suspect that this is bug 1329943.
Can you share the /var/log/messages log as well? I can't tell whether vdsm failed to start; judging by vdsm.log, vdsm is up. Can you check whether vdsClient getVdsCaps works? It looks like the requests from HE don't reach vdsm.
AFAIK Yaniv, VDSM is up but HE could not connect to it using jsonrpcvdscli with the default values.
As Simone pointed out in comment #8, we use socket.gethostname() as the host if no host name was provided. We assume that we want to connect to the local host, and whatever value this call returns is what we use to connect to vdsm.

There are 2 ways to overcome the issue:
1. Make sure that socket.gethostname() returns a pingable hostname
2. Update the code to provide a proper hostname
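Option 2 could look roughly like the sketch below on the caller side. `pick_vdsm_host` is a hypothetical helper, not existing code in hosted-engine-setup or vdsm, and falling back to 'localhost' is an assumption that would still need testing with IPv6 enabled and disabled.

```python
import socket


def pick_vdsm_host(resolver=socket.getaddrinfo, fallback="localhost"):
    """Return socket.gethostname() if it resolves, otherwise `fallback`.

    Hypothetical caller-side helper. `resolver` is injectable so the
    decision can be tested without changing the system configuration.
    """
    name = socket.gethostname()
    try:
        resolver(name, None)
    except socket.gaierror:
        # The current hostname cannot be resolved: do not hand it to the client.
        return fallback
    return name
```

The result would then be passed as the host when creating the client, assuming the public connect call accepts a host argument the same way the quoted `_create()` does.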
So in general, name resolution must work properly even to connect to vdsm on the local host. Under this assumption, rhbz#1160423 becomes more relevant.
See also rhbz#1350883: vdsm now binds on :: by default, so we can also experience a connection issue if IPv6 is disabled on the host.
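Whether a listener on :: is even possible on a given host can be probed with a few lines. This is a rough sketch using only the standard `socket` module; `can_bind_ipv6_any` is a hypothetical name, and the check says nothing about how vdsm itself reacts when the bind fails.

```python
import socket


def can_bind_ipv6_any(port=0):
    """Return True if a TCP socket can be bound to '::' on this host.

    port=0 asks the kernel for any free port, so the probe does not
    collide with a running vdsm on 54321.
    """
    if not socket.has_ipv6:
        # Python itself was built without IPv6 support.
        return False
    try:
        with socket.socket(socket.AF_INET6, socket.SOCK_STREAM) as s:
            s.bind(("::", port))
        return True
    except OSError:
        # Typically raised when IPv6 is disabled at the host level.
        return False
```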
The description of the bug is not reasonable:
1. Install a system
2. hostnamectl set-hostname foobar && hosted-engine --deploy

The new hostname must be resolvable, of course. As mentioned in comment #14, the issue is probably related to some other configuration and not to the connectivity to vdsm. Please update if that's the case. From my check, after setting the hostname to something that is mapped to localhost in /etc/hosts, jsonrpcvdscli works well.
(In reply to Yaniv Bronhaim from comment #20)
> The description of the bug is not reasonable:
> 1. Install a system
> 2. hostnamectl set-hostname foobar && hosted-engine --deploy
>
> The new hostname must be resolvable of course. As mentioned in comment #14
> the issue is probably related to some other configuration and not the
> connectivity to vdsm. Please update if that's the case - from my check,
> after setting hostname to something that defined to localhost in /etc/hosts
> , jsonrpcvdscli works well.

This is true (just validated it); nevertheless, it's a bad user experience. Is it a hostnamectl bug from your perspective?
(In reply to Moran Goldboim from comment #21)
> this is true (just validated it), nevertheless it's a bad user experience.
> is it a hostnamectl bug from your perspective?

We just found another reproducer: we have the patch for 1350883, we didn't change the hostname with hostnamectl, and the hostname is resolvable, but the issue is still there.
(In reply to Simone Tiraboschi from comment #22)
> (In reply to Moran Goldboim from comment #21)
> > this is true (just validated it), nevertheless it's a bad user experience.
> > is it a hostnamectl bug from your perspective?
>
> Now we just found another reproducer, we have the patch for 1350883, we
> didn't change the hostname with hostnamectl, the hostname is resolvable but
> the issue is still here.

What issue exactly?
We started with the hostname unresolved and mentioned the IPv6 issue.
Please include the logs with the problem.
(In reply to Edward Haas from comment #25)
> What issue exactly?
> We started with the hostname unresolved and mentioned the IPv6 issue.

The IPv6 issue has been correctly addressed here:
https://gerrit.ovirt.org/#/c/61363/

hosted-engine-setup uses jsonrpcvdscli to connect to VDSM over loopback on the same host. It just uses the default address, which ends up being socket.gethostname().
The issue is what happens if that name is not correctly or easily resolvable.

In the reproducer we saw yesterday, due to other reasons, the default route was missing, so the host wasn't able to reach the DNS. The hostname was correctly configured in /etc/hostname, but there wasn't any specific entry in /etc/hosts. ping <hostname> was working, but hosted-engine-setup was still not able to connect to vdsm using the same hostname, so it seems that Python name resolution works a bit differently from the OS one.

> Please include the logs with the problem.

In the hosted-engine-setup logs:
2016-07-28 12:03:43 INFO otopi.plugins.gr_he_setup.system.vdsmenv util.connect_vdsm_json_rpc:194 Waiting for VDSM to reply
in a loop until:
2016-07-28 12:03:43 DEBUG otopi.context context._executeMethod:142 method exception
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/otopi/context.py", line 132, in _executeMethod
    method['method']()
  File "/usr/share/ovirt-hosted-engine-setup/scripts/../plugins/gr-he-setup/system/vdsmenv.py", line 94, in _late_setup
    timeout=ohostedcons.Const.VDSCLI_SSL_TIMEOUT,
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/util.py", line 198, in connect_vdsm_json_rpc
    timeout=MAX_RETRY * DELAY
RuntimeError: Couldnt connect to VDSM within 240 seconds

jsonrpcclient doesn't log anywhere, and in the VDSM logs we didn't find any sign of an attempted connection.
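Since the timeout message above hides the underlying cause, a retry loop that keeps the last error would make this kind of failure easier to diagnose. The following is only a sketch of the idea, not the actual connect_vdsm_json_rpc code; `wait_for`, `connect`, `max_retry` and `delay` are hypothetical names, and no real retry values are implied beyond what the log shows.

```python
import time


def wait_for(connect, max_retry, delay, sleep=time.sleep):
    """Call `connect` until it succeeds or `max_retry` attempts are used up.

    The last exception is kept and included in the final error, so a
    resolution failure is visible instead of a bare timeout.
    """
    last_error = None
    for _ in range(max_retry):
        try:
            return connect()
        except OSError as e:  # socket.gaierror is a subclass of OSError
            last_error = e
            sleep(delay)
    raise RuntimeError(
        "Couldn't connect within %d seconds (last error: %s)"
        % (max_retry * delay, last_error)
    )
```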
Reducing priority to high according to comment #26
The client heuristic is to use localhost if no target is provided; it seems reasonable to me to keep this logic. If the caller prefers, the client can be called with a specific target, overriding the default.

What is the output of:
python -c 'import socket ; print socket.gethostname()'
How does it compare to the output of hostname?
Also, whoever replies: please move to ON_QA and add TestOnly if this should already be fixed.
(In reply to Edward Haas from comment #28)
> The client heuristic is to use localhost if no target is provided, it seems
> reasonable to me to keep this logic.

This is not really true; the default is socket.gethostname():
https://gerrit.ovirt.org/gitweb?p=vdsm.git;a=blob;f=lib/vdsm/jsonrpcvdscli.py;h=98f6f9ca02ae0590c8ad8eeb70ebcfcd8a457b97;hb=refs/heads/ovirt-4.0#l201

This requires the hostname to be correctly resolvable, hence this issue when we have DNS or network problems.
Using 'localhost' as the default would probably make it more robust against this kind of issue.
Moving to vdsm - jsonrpcclient as per comment 30.
We could fix it either by changing jsonrpcvdscli or by having the calling code provide the correct host name when calling the client.
(In reply to Simone Tiraboschi from comment #30)
> (In reply to Edward Haas from comment #28)
> > The client heuristic is to use localhost if no target is provided, it seems
> > reasonable to me to keep this logic.
>
> This is not really true,
> the default is socket.gethostname()

Sorry, I meant hostname.

> https://gerrit.ovirt.org/gitweb?p=vdsm.git;a=blob;f=lib/vdsm/jsonrpcvdscli.py;h=98f6f9ca02ae0590c8ad8eeb70ebcfcd8a457b97;hb=refs/heads/ovirt-4.0#l201
>
> and this requires the hostname to be correctly resolvable and so this issue
> when we have DNS or network issues.

hostname is usually resolvable from the hosts file; it does not require DNS or network access. I guess it depends on the /etc/nsswitch.conf settings.

> Probably using 'localhost' as the default can make it more robust on this
> kind of issues.

hostname was used with the hope of avoiding the question of which IP version to use.
I am not sure how localhost will behave in this case.
We can try it, but it must be checked with IPv6 enabled and disabled (at host level).
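One way to sidestep the IP-version question for a name like 'localhost' is to try every address the resolver returns, in order, which is essentially what the standard library's socket.create_connection does. A minimal sketch, with `connect_any` as a hypothetical helper name and plain TCP only (no SSL, no STOMP):

```python
import socket


def connect_any(host, port, timeout=5.0):
    """Try each address getaddrinfo returns for `host`; return the first
    socket that connects, or raise the last error seen."""
    last_error = None
    for family, socktype, proto, _canon, sockaddr in socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM
    ):
        try:
            s = socket.socket(family, socktype, proto)
        except OSError as e:
            # For example, the address family is disabled on this host.
            last_error = e
            continue
        s.settimeout(timeout)
        try:
            s.connect(sockaddr)
            return s
        except OSError as e:
            last_error = e
            s.close()
    raise last_error or OSError("no addresses for %r" % host)
```

With this approach, 'localhost' resolving to both ::1 and 127.0.0.1 would not matter as long as the server is reachable on at least one of them, though that would still need to be verified on hosts with IPv6 enabled and disabled.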
*** Bug 1367458 has been marked as a duplicate of this bug. ***
*** Bug 1373018 has been marked as a duplicate of this bug. ***
Verified in 4.0.4-4 vdsm-4.18.13-1.el7ev.x86_64