Bug 1262550 - 3.6.0 nightly hosted-engine deploy quits silently on second host
Summary: 3.6.0 nightly hosted-engine deploy quits silently on second host
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-hosted-engine-setup
Classification: oVirt
Component: General
Version: 1.3.0
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: ovirt-3.6.0-beta
: 1.3.0
Assignee: Simone Tiraboschi
QA Contact: Nikolai Sednev
URL:
Whiteboard: integration
Depends On: 1255638
Blocks:
 
Reported: 2015-09-12 17:00 UTC by acrow
Modified: 2015-11-27 07:52 UTC (History)
CC: 6 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2015-11-27 07:52:42 UTC
oVirt Team: ---
Embargoed:
rule-engine: ovirt-3.6.0+
rule-engine: blocker+
ylavi: planning_ack+
sbonazzo: devel_ack+
gklein: testing_ack+


Attachments
hosted-engine --deploy done from scratch. (161.32 KB, text/plain)
2015-09-12 18:36 UTC, acrow
strace and logs (3.44 MB, application/x-gzip)
2015-09-13 14:44 UTC, acrow

Description acrow 2015-09-12 17:00:08 UTC
Description of problem:


I've managed to get a 3.6.0-pre self-hosted engine running on one CentOS 7.1 host (glusterfs-backed), but on the second host the deploy process just quits with no error before I even get to configure the storage.


Version-Release number of selected component (if applicable):

1.3.0-0.0.master.20150909090214.git794400d.el7.centos

How reproducible:

Always

Steps to Reproduce:
1. Set up a clean CentOS 7.1 host
2. Install glusterfs 3.7 repos
3. Install ovirt-3.6 and ovirt-3.6 snapshot repos
4. Install ovirt-hosted-engine-setup
5. Run hosted-engine --deploy

Actual results:

Install just quits:

[root@olympia ~]# hosted-engine --deploy
[ INFO  ] Stage: Initializing
[ INFO  ] Generating a temporary VNC password.
[ INFO  ] Stage: Environment setup
          Continuing will configure this host for serving as hypervisor and create a VM where you have to install oVirt Engine afterwards.
          Are you sure you want to continue? (Yes, No)[Yes]: yes
          Configuration files: []
          Log file: /var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-20150912174714-4jr1ru.log
          Version: otopi-1.4.0_master (otopi-1.4.0-0.0.master.20150910103318.gitdd73099.el7)
[ INFO  ] Hardware supports virtualization
[ INFO  ] Stage: Environment packages setup
[ INFO  ] Stage: Programs detection
[ INFO  ] Stage: Environment setup
[root@olympia ~]# 

Expected results:

Install continues to ask required questions and sets up second host for engine.

Additional info:

Tail end of setup log:

2015-09-12 17:15:12 DEBUG otopi.context context.dumpEnvironment:510 ENV COMMAND/reboot=str:'/sbin/reboot'
2015-09-12 17:15:12 DEBUG otopi.context context.dumpEnvironment:510 ENV COMMAND/remote-viewer=str:'/bin/remote-viewer'
2015-09-12 17:15:12 DEBUG otopi.context context.dumpEnvironment:510 ENV COMMAND/service=str:'/sbin/service'
2015-09-12 17:15:12 DEBUG otopi.context context.dumpEnvironment:510 ENV COMMAND/sshd=str:'/sbin/sshd'
2015-09-12 17:15:12 DEBUG otopi.context context.dumpEnvironment:510 ENV COMMAND/sudo=str:'/bin/sudo'
2015-09-12 17:15:12 DEBUG otopi.context context.dumpEnvironment:510 ENV COMMAND/systemctl=str:'/bin/systemctl'
2015-09-12 17:15:12 DEBUG otopi.context context.dumpEnvironment:510 ENV COMMAND/truncate=str:'/bin/truncate'
2015-09-12 17:15:12 DEBUG otopi.context context.dumpEnvironment:510 ENV COMMAND/umount=str:'/bin/umount'
2015-09-12 17:15:12 DEBUG otopi.context context.dumpEnvironment:510 ENV COMMAND/vdsm-tool=str:'/bin/vdsm-tool'
2015-09-12 17:15:12 DEBUG otopi.context context.dumpEnvironment:514 ENVIRONMENT DUMP - END
2015-09-12 17:15:12 DEBUG otopi.context context._executeMethod:142 Stage programs METHOD otopi.plugins.otopi.services.systemd.Plugin._programs
2015-09-12 17:15:12 DEBUG otopi.plugins.otopi.services.systemd plugin.executeRaw:828 execute: ('/bin/systemctl', 'show-environment'), executable='None', cwd='None', env=None
2015-09-12 17:15:12 DEBUG otopi.plugins.otopi.services.systemd plugin.executeRaw:878 execute-result: ('/bin/systemctl', 'show-environment'), rc=0
2015-09-12 17:15:12 DEBUG otopi.plugins.otopi.services.systemd plugin.execute:936 execute-output: ('/bin/systemctl', 'show-environment') stdout:
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin
LANG=en_GB.UTF-8

2015-09-12 17:15:12 DEBUG otopi.plugins.otopi.services.systemd plugin.execute:941 execute-output: ('/bin/systemctl', 'show-environment') stderr:


2015-09-12 17:15:12 DEBUG otopi.plugins.otopi.services.systemd systemd._programs:64 registering systemd provider
2015-09-12 17:15:12 DEBUG otopi.context context._executeMethod:142 Stage programs METHOD otopi.plugins.otopi.services.rhel.Plugin._programs
2015-09-12 17:15:12 DEBUG otopi.plugins.otopi.services.rhel plugin.executeRaw:828 execute: ('/bin/systemctl', 'show-environment'), executable='None', cwd='None', env=None
2015-09-12 17:15:12 DEBUG otopi.plugins.otopi.services.rhel plugin.executeRaw:878 execute-result: ('/bin/systemctl', 'show-environment'), rc=0
2015-09-12 17:15:12 DEBUG otopi.plugins.otopi.services.rhel plugin.execute:936 execute-output: ('/bin/systemctl', 'show-environment') stdout:
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin
LANG=en_GB.UTF-8

2015-09-12 17:15:12 DEBUG otopi.plugins.otopi.services.rhel plugin.execute:941 execute-output: ('/bin/systemctl', 'show-environment') stderr:


2015-09-12 17:15:12 DEBUG otopi.context context._executeMethod:142 Stage programs METHOD otopi.plugins.otopi.services.openrc.Plugin._programs
2015-09-12 17:15:12 DEBUG otopi.context context._executeMethod:142 Stage programs METHOD otopi.plugins.ovirt_hosted_engine_setup.ha.ha_services.Plugin._programs
2015-09-12 17:15:12 DEBUG otopi.plugins.otopi.services.systemd systemd.status:105 check service ovirt-ha-agent status
2015-09-12 17:15:12 DEBUG otopi.plugins.otopi.services.systemd plugin.executeRaw:828 execute: ('/bin/systemctl', 'status', 'ovirt-ha-agent.service'), executable='None', cwd='None', env=None
2015-09-12 17:15:12 DEBUG otopi.plugins.otopi.services.systemd plugin.executeRaw:878 execute-result: ('/bin/systemctl', 'status', 'ovirt-ha-agent.service'), rc=3
2015-09-12 17:15:12 DEBUG otopi.plugins.otopi.services.systemd plugin.execute:936 execute-output: ('/bin/systemctl', 'status', 'ovirt-ha-agent.service') stdout:
ovirt-ha-agent.service - oVirt Hosted Engine High Availability Monitoring Agent
   Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-agent.service; disabled)
   Active: inactive (dead)


2015-09-12 17:15:12 DEBUG otopi.plugins.otopi.services.systemd plugin.execute:941 execute-output: ('/bin/systemctl', 'status', 'ovirt-ha-agent.service') stderr:


2015-09-12 17:15:12 DEBUG otopi.plugins.otopi.services.systemd systemd.status:105 check service ovirt-ha-broker status
2015-09-12 17:15:12 DEBUG otopi.plugins.otopi.services.systemd plugin.executeRaw:828 execute: ('/bin/systemctl', 'status', 'ovirt-ha-broker.service'), executable='None', cwd='None', env=None
2015-09-12 17:15:12 DEBUG otopi.plugins.otopi.services.systemd plugin.executeRaw:878 execute-result: ('/bin/systemctl', 'status', 'ovirt-ha-broker.service'), rc=3
2015-09-12 17:15:12 DEBUG otopi.plugins.otopi.services.systemd plugin.execute:936 execute-output: ('/bin/systemctl', 'status', 'ovirt-ha-broker.service') stdout:
ovirt-ha-broker.service - oVirt Hosted Engine High Availability Communications Broker
   Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-broker.service; disabled)
   Active: inactive (dead)


2015-09-12 17:15:12 DEBUG otopi.plugins.otopi.services.systemd plugin.execute:941 execute-output: ('/bin/systemctl', 'status', 'ovirt-ha-broker.service') stderr:


2015-09-12 17:15:12 INFO otopi.context context.runSequence:427 Stage: Environment setup
2015-09-12 17:15:12 DEBUG otopi.context context.runSequence:431 STAGE late_setup
2015-09-12 17:15:12 DEBUG otopi.context context._executeMethod:142 Stage late_setup METHOD otopi.plugins.ovirt_hosted_engine_setup.vdsmd.vdsmconf.Plugin._late_setup
2015-09-12 17:15:12 DEBUG otopi.context context._executeMethod:142 Stage late_setup METHOD otopi.plugins.ovirt_hosted_engine_setup.system.vdsmenv.Plugin._late_setup
2015-09-12 17:15:12 DEBUG otopi.plugins.otopi.services.systemd systemd.status:105 check service vdsmd status
2015-09-12 17:15:12 DEBUG otopi.plugins.otopi.services.systemd plugin.executeRaw:828 execute: ('/bin/systemctl', 'status', 'vdsmd.service'), executable='None', cwd='None', env=None
2015-09-12 17:15:12 DEBUG otopi.plugins.otopi.services.systemd plugin.executeRaw:878 execute-result: ('/bin/systemctl', 'status', 'vdsmd.service'), rc=0
2015-09-12 17:15:12 DEBUG otopi.plugins.otopi.services.systemd plugin.execute:936 execute-output: ('/bin/systemctl', 'status', 'vdsmd.service') stdout:
vdsmd.service - Virtual Desktop Server Manager
   Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled)
   Active: active (running) since Sat 2015-09-12 17:13:53 BST; 1min 19s ago
  Process: 3140 ExecStartPre=/usr/libexec/vdsm/vdsmd_init_common.sh --pre-start (code=exited, status=0/SUCCESS)
 Main PID: 3234 (vdsm)
   CGroup: /system.slice/vdsmd.service
           └─3234 /usr/bin/python /usr/share/vdsm/vdsm

Sep 12 17:13:55 olympia python[3234]: DIGEST-MD5 client step 1
Sep 12 17:13:55 olympia python[3234]: DIGEST-MD5 ask_user_info()
Sep 12 17:13:55 olympia python[3234]: DIGEST-MD5 client step 1
Sep 12 17:13:55 olympia python[3234]: DIGEST-MD5 ask_user_info()
Sep 12 17:13:55 olympia python[3234]: DIGEST-MD5 make_client_response()
Sep 12 17:13:55 olympia python[3234]: DIGEST-MD5 client step 2
Sep 12 17:13:55 olympia python[3234]: DIGEST-MD5 parse_server_challenge()
Sep 12 17:13:55 olympia python[3234]: DIGEST-MD5 ask_user_info()
Sep 12 17:13:55 olympia python[3234]: DIGEST-MD5 make_client_response()
Sep 12 17:13:55 olympia python[3234]: DIGEST-MD5 client step 3

2015-09-12 17:15:12 DEBUG otopi.plugins.otopi.services.systemd plugin.execute:941 execute-output: ('/bin/systemctl', 'status', 'vdsmd.service') stderr:


2015-09-12 17:15:12 DEBUG otopi.plugins.ovirt_hosted_engine_setup.system.vdsmenv vdsmenv._connect:62 {'status': {'message': 'Done', 'code': 0}, 'info': {'systemProductName': 'X9DR3-F', 'systemSerialNumber': '0123456789', 'systemFamily': 'To be filled by O.E.M.', 'systemVersion': '0123456789', 'systemUUID': '00000000-0000-0000-0000-002590E5DAD8', 'systemManufacturer': 'Supermicro'}}

Comment 1 acrow 2015-09-12 18:36:30 UTC
Created attachment 1072739 [details]
hosted-engine --deploy done from scratch.

Firewall off, SELinux disabled. Still bails out with no obvious error.

Alex

Comment 2 Yedidyah Bar David 2015-09-13 06:15:10 UTC
Looks really weird.

Can you please try to run this under strace and post the log? Something like:

strace -tt -s 4096 -f -o /tmp/hosted-engine-trace hosted-engine --deploy

Also please check/post vdsm logs.

Thanks.

Comment 3 acrow 2015-09-13 14:44:39 UTC
Created attachment 1072902 [details]
strace and logs

Comment 4 acrow 2015-09-13 15:02:47 UTC
By the way, this is from a fresh setup on a "first host", so it's not restricted to the second-host use case.

I think it's a very recent change: a few days ago (on 9/9/15 I think, possibly a bit earlier) I could deploy the hosted engine with no issues.

Alex

Comment 5 Simone Tiraboschi 2015-09-14 08:41:27 UTC
It seems that the issue is in the connection to VDSM:

3800  15:39:46.143821 socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 10
3800  15:39:46.143887 fcntl(10, F_GETFL) = 0x2 (flags O_RDWR)
3800  15:39:46.143912 fcntl(10, F_SETFL, O_RDWR|O_NONBLOCK) = 0
3800  15:39:46.143946 connect(10, {sa_family=AF_INET, sin_port=htons(54321), sin_addr=inet_addr("0.0.0.0")}, 16) = -1 EINPROGRESS (Operation now in progress)
3800  15:39:46.144069 poll([{fd=10, events=POLLOUT}], 1, 900000) = 1 ([{fd=10, revents=POLLOUT}])
3800  15:39:46.144106 getsockopt(10, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
3800  15:39:46.144159 getpeername(10, {sa_family=AF_INET, sin_port=htons(54321), sin_addr=inet_addr("127.0.0.1")}, [16]) = 0

....

3800  15:39:46.152220 close(10)         = 0
3800  15:39:46.152612 rt_sigaction(SIGINT, {SIG_DFL, [], SA_RESTORER, 0x7efe5af03130}, {0x7efe5b221620, [], SA_RESTORER, 0x7efe5af03130}, 8) = 0
3800  15:39:46.152660 rt_sigaction(SIGHUP, {SIG_DFL, [], SA_RESTORER, 0x7efe5af03130}, {0x7efe5b221620, [], SA_RESTORER, 0x7efe5af03130}, 8) = 0
3800  15:39:46.152693 rt_sigaction(SIGQUIT, {SIG_DFL, [], SA_RESTORER, 0x7efe5af03130}, {0x7efe5b221620, [], SA_RESTORER, 0x7efe5af03130}, 8) = 0
3800  15:39:46.152719 rt_sigaction(SIGABRT, {SIG_DFL, [], SA_RESTORER, 0x7efe5af03130}, {0x7efe5b221620, [], SA_RESTORER, 0x7efe5af03130}, 8) = 0
3800  15:39:46.152744 rt_sigaction(SIGPIPE, {SIG_DFL, [], SA_RESTORER, 0x7efe5af03130}, {SIG_IGN, [], SA_RESTORER, 0x7efe5af03130}, 8) = 0
3800  15:39:46.152768 rt_sigaction(SIGTERM, {SIG_DFL, [], SA_RESTORER, 0x7efe5af03130}, {0x7efe5b221620, [], SA_RESTORER, 0x7efe5af03130}, 8) = 0
3800  15:39:46.152805 rt_sigaction(SIGIO, {SIG_DFL, [], SA_RESTORER, 0x7efe5af03130}, {0x7efe5b221620, [], SA_RESTORER, 0x7efe5af03130}, 8) = 0
3800  15:39:46.177030 exit_group(1)     = ?
3800  15:39:46.180417 +++ exited with 1 +++
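For reference, the syscall sequence at the top of this excerpt (connect() returning EINPROGRESS, poll() waiting for POLLOUT, then getsockopt(SO_ERROR)) is the standard non-blocking TCP connect-with-timeout pattern, so the connect itself succeeded; the process exits for another reason afterwards. A minimal Python sketch of that pattern (function name hypothetical, not from the setup code):

```python
import errno
import select
import socket

def connect_with_timeout(host, port, timeout=900.0):
    # Mirror the syscall sequence in the strace: non-blocking connect,
    # wait for writability, then read SO_ERROR for the real result.
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setblocking(False)
    rc = s.connect_ex((host, port))  # typically EINPROGRESS on Linux
    if rc not in (0, errno.EINPROGRESS, errno.EWOULDBLOCK):
        s.close()
        raise OSError(rc, "connect failed immediately")
    _, writable, _ = select.select([], [s], [], timeout)
    if not writable:
        s.close()
        raise TimeoutError("connect timed out")
    err = s.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
    if err != 0:
        s.close()
        raise OSError(err, "connect failed")
    return s
```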

Comment 6 Simone Tiraboschi 2015-09-14 10:18:30 UTC
OK, found it by running with OTOPI_DEBUG=1:

2015-09-14 12:12:43 DEBUG otopi.plugins.ovirt_hosted_engine_setup.system.vdsmenv vdsmenv._connect:62 {'status': {'message': 'Done', 'code': 0}, 'info': {'systemProductName': 'oVirt Node', 'systemUUID': '8d225eaa-6dcd-441a-80a1-1e43335dc7b0', 'systemSerialNumber': '4C4C4544-0056-4B10-8044-B9C04F513032', 'systemVersion': '20-4', 'systemManufacturer': 'oVirt'}}
('FATAL Internal error (main): <Fault 1: \'<type \\\'exceptions.Exception\\\'>:method "__unicode__" is not supported\'>',)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/otopi/__main__.py", line 103, in main
    installer.execute()
  File "/usr/lib/python2.7/site-packages/otopi/main.py", line 167, in execute
    self.context.runSequence()
  File "/usr/lib/python2.7/site-packages/otopi/context.py", line 443, in runSequence
    self.dumpEnvironment(old=oldEnvironment)
  File "/usr/lib/python2.7/site-packages/otopi/context.py", line 492, in dumpEnvironment
    value = common.toStr(self.environment[key])
  File "/usr/lib/python2.7/site-packages/otopi/common.py", line 88, in toStr
    return toStr(o.__unicode__())
  File "/usr/lib64/python2.7/xmlrpclib.py", line 1224, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib64/python2.7/xmlrpclib.py", line 1578, in __request
    verbose=self.__verbose
  File "/usr/lib64/python2.7/xmlrpclib.py", line 1264, in request
    return self.single_request(host, handler, request_body, verbose)
  File "/usr/lib64/python2.7/xmlrpclib.py", line 1297, in single_request
    return self.parse_response(response)
  File "/usr/lib/python2.7/site-packages/vdsm/vdscli.py", line 43, in wrapped_parse_response
    return old_parse_response(*args, **kwargs)
  File "/usr/lib64/python2.7/xmlrpclib.py", line 1473, in parse_response
    return u.close()
  File "/usr/lib64/python2.7/xmlrpclib.py", line 793, in close
    raise Fault(**self._stack[0])
Fault: <Fault 1: '<type \'exceptions.Exception\'>:method "__unicode__" is not supported'>

It logs the fault only with OTOPI_DEBUG=1; otherwise it quits silently.
It would be better to find a way to have it logged in non-debug mode as well.

I suspect that it was introduced by this one:
https://gerrit.ovirt.org/#/c/45990
which tries __unicode__() before __repr__().
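The failure mode can be reproduced in miniature: xmlrpclib's ServerProxy resolves *any* attribute name via __getattr__ into a remote method call, so a duck-typing probe like hasattr(o, '__unicode__') always succeeds and the subsequent call becomes an RPC that the server rejects with a Fault. A sketch (Fault, Proxy, and the two helpers below are illustrative stand-ins, not the actual otopi or xmlrpclib code):

```python
class Fault(Exception):
    """Stand-in for xmlrpclib.Fault."""

class Proxy(object):
    """Stand-in for xmlrpclib.ServerProxy: every attribute lookup
    returns a callable that performs an RPC, which fails here."""
    def __getattr__(self, name):
        def rpc(*args):
            raise Fault('method "%s" is not supported' % name)
        return rpc

def naive_to_str(o):
    # Mimics the broken behaviour: hasattr() is True because
    # __getattr__ answers every name, so the fake __unicode__ is called
    # and the Fault escapes.
    if hasattr(o, '__unicode__'):
        return str(o.__unicode__())
    return repr(o)

def safe_to_str(o):
    # Look the method up on the *type*, bypassing instance-level
    # __getattr__, so proxy objects fall back to repr().
    if '__unicode__' in dir(type(o)):
        return str(o.__unicode__())
    return repr(o)
```

The design point is that feature probes against proxy-like objects must inspect the class, not the instance, because instance-level __getattr__ makes every probe look successful.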

Comment 7 Nikolai Sednev 2015-11-25 08:15:58 UTC
Works for me on these components:
# rpm -qa vdsm libvirt-client sanlock qemu-kvm-rhev mom ovirt*
ovirt-vmconsole-host-1.0.1-0.0.master.20151105234454.git3e5d52e.el7.noarch
ovirt-release36-002-2.noarch
ovirt-engine-sdk-python-3.6.0.4-0.2.20151123.gita2f81ed.el7.centos.noarch
sanlock-3.2.4-1.el7.x86_64
ovirt-setup-lib-1.0.1-0.0.master.20151119123055.gitfa908be.el7.centos.noarch
qemu-kvm-rhev-2.3.0-31.el7_2.3.x86_64
ovirt-hosted-engine-ha-1.3.3-0.0.master.20151118145556.20151118145552.git71b535e.el7.noarch
ovirt-vmconsole-1.0.1-0.0.master.20151105234454.git3e5d52e.el7.noarch
ovirt-release36-snapshot-002-2.noarch
libvirt-client-1.2.17-13.el7.x86_64
ovirt-hosted-engine-setup-1.3.1-0.0.master.20151118143825.gitc013638.el7.centos.noarch
ovirt-host-deploy-1.4.2-0.0.master.20151122153544.gitfc808fc.el7.noarch
vdsm-4.17.10.1-0.el7ev.noarch
mom-0.5.1-2.el7.noarch

Deployed 2 hosts over NFS SD.

Comment 8 Sandro Bonazzola 2015-11-27 07:52:42 UTC
Since oVirt 3.6.0 has been released, moving from VERIFIED to CLOSED CURRENTRELEASE.

