Bug 1262550

Summary: 3.6.0 nightly hosted-engine deploy quits silently on second host
Product: [oVirt] ovirt-hosted-engine-setup Reporter: acrow
Component: General    Assignee: Simone Tiraboschi <stirabos>
Status: CLOSED CURRENTRELEASE QA Contact: Nikolai Sednev <nsednev>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 1.3.0    CC: alonbl, bugs, didi, gklein, hawk, sbonazzo
Target Milestone: ovirt-3.6.0-beta    Keywords: TestOnly, Triaged
Target Release: 1.3.0    Flags: rule-engine: ovirt-3.6.0+
rule-engine: blocker+
ylavi: planning_ack+
sbonazzo: devel_ack+
gklein: testing_ack+
Hardware: x86_64   
OS: Linux   
Whiteboard: integration
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2015-11-27 07:52:42 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1255638    
Bug Blocks:    
Attachments:
Description Flags
hosted-engine --deploy done from scratch.
none
strace and logs none

Description acrow 2015-09-12 17:00:08 UTC
Description of problem:


I've managed to get a 3.6.0-pre self-hosted engine running on one CentOS 7.1 host (glusterfs-backed), but on the second host the deploy process just quits with no error before I even get to configure the storage.


Version-Release number of selected component (if applicable):

1.3.0-0.0.master.20150909090214.git794400d.el7.centos

How reproducible:

Always

Steps to Reproduce:
1. Set up a clean CentOS 7.1 host
2. Install glusterfs 3.7 repos
3. Install ovirt-3.6 and ovirt-3.6 snapshot repos
4. Install ovirt-hosted-engine-setup
5. Run hosted-engine --deploy

Actual results:

Install just quits:

[root@olympia ~]# hosted-engine --deploy
[ INFO  ] Stage: Initializing
[ INFO  ] Generating a temporary VNC password.
[ INFO  ] Stage: Environment setup
          Continuing will configure this host for serving as hypervisor and create a VM where you have to install oVirt Engine afterwards.
          Are you sure you want to continue? (Yes, No)[Yes]: yes
          Configuration files: []
          Log file: /var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-20150912174714-4jr1ru.log
          Version: otopi-1.4.0_master (otopi-1.4.0-0.0.master.20150910103318.gitdd73099.el7)
[ INFO  ] Hardware supports virtualization
[ INFO  ] Stage: Environment packages setup
[ INFO  ] Stage: Programs detection
[ INFO  ] Stage: Environment setup
[root@olympia ~]# 

Expected results:

The install continues, asks the required questions, and sets up the second host for the engine.

Additional info:

Tail end of setup log:

2015-09-12 17:15:12 DEBUG otopi.context context.dumpEnvironment:510 ENV COMMAND/reboot=str:'/sbin/reboot'
2015-09-12 17:15:12 DEBUG otopi.context context.dumpEnvironment:510 ENV COMMAND/remote-viewer=str:'/bin/remote-viewer'
2015-09-12 17:15:12 DEBUG otopi.context context.dumpEnvironment:510 ENV COMMAND/service=str:'/sbin/service'
2015-09-12 17:15:12 DEBUG otopi.context context.dumpEnvironment:510 ENV COMMAND/sshd=str:'/sbin/sshd'
2015-09-12 17:15:12 DEBUG otopi.context context.dumpEnvironment:510 ENV COMMAND/sudo=str:'/bin/sudo'
2015-09-12 17:15:12 DEBUG otopi.context context.dumpEnvironment:510 ENV COMMAND/systemctl=str:'/bin/systemctl'
2015-09-12 17:15:12 DEBUG otopi.context context.dumpEnvironment:510 ENV COMMAND/truncate=str:'/bin/truncate'
2015-09-12 17:15:12 DEBUG otopi.context context.dumpEnvironment:510 ENV COMMAND/umount=str:'/bin/umount'
2015-09-12 17:15:12 DEBUG otopi.context context.dumpEnvironment:510 ENV COMMAND/vdsm-tool=str:'/bin/vdsm-tool'
2015-09-12 17:15:12 DEBUG otopi.context context.dumpEnvironment:514 ENVIRONMENT DUMP - END
2015-09-12 17:15:12 DEBUG otopi.context context._executeMethod:142 Stage programs METHOD otopi.plugins.otopi.services.systemd.Plugin._programs
2015-09-12 17:15:12 DEBUG otopi.plugins.otopi.services.systemd plugin.executeRaw:828 execute: ('/bin/systemctl', 'show-environment'), executable='None', cwd='None', env=None
2015-09-12 17:15:12 DEBUG otopi.plugins.otopi.services.systemd plugin.executeRaw:878 execute-result: ('/bin/systemctl', 'show-environment'), rc=0
2015-09-12 17:15:12 DEBUG otopi.plugins.otopi.services.systemd plugin.execute:936 execute-output: ('/bin/systemctl', 'show-environment') stdout:
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin
LANG=en_GB.UTF-8

2015-09-12 17:15:12 DEBUG otopi.plugins.otopi.services.systemd plugin.execute:941 execute-output: ('/bin/systemctl', 'show-environment') stderr:


2015-09-12 17:15:12 DEBUG otopi.plugins.otopi.services.systemd systemd._programs:64 registering systemd provider
2015-09-12 17:15:12 DEBUG otopi.context context._executeMethod:142 Stage programs METHOD otopi.plugins.otopi.services.rhel.Plugin._programs
2015-09-12 17:15:12 DEBUG otopi.plugins.otopi.services.rhel plugin.executeRaw:828 execute: ('/bin/systemctl', 'show-environment'), executable='None', cwd='None', env=None
2015-09-12 17:15:12 DEBUG otopi.plugins.otopi.services.rhel plugin.executeRaw:878 execute-result: ('/bin/systemctl', 'show-environment'), rc=0
2015-09-12 17:15:12 DEBUG otopi.plugins.otopi.services.rhel plugin.execute:936 execute-output: ('/bin/systemctl', 'show-environment') stdout:
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin
LANG=en_GB.UTF-8

2015-09-12 17:15:12 DEBUG otopi.plugins.otopi.services.rhel plugin.execute:941 execute-output: ('/bin/systemctl', 'show-environment') stderr:


2015-09-12 17:15:12 DEBUG otopi.context context._executeMethod:142 Stage programs METHOD otopi.plugins.otopi.services.openrc.Plugin._programs
2015-09-12 17:15:12 DEBUG otopi.context context._executeMethod:142 Stage programs METHOD otopi.plugins.ovirt_hosted_engine_setup.ha.ha_services.Plugin._programs
2015-09-12 17:15:12 DEBUG otopi.plugins.otopi.services.systemd systemd.status:105 check service ovirt-ha-agent status
2015-09-12 17:15:12 DEBUG otopi.plugins.otopi.services.systemd plugin.executeRaw:828 execute: ('/bin/systemctl', 'status', 'ovirt-ha-agent.service'), executable='None', cwd='None', env=None
2015-09-12 17:15:12 DEBUG otopi.plugins.otopi.services.systemd plugin.executeRaw:878 execute-result: ('/bin/systemctl', 'status', 'ovirt-ha-agent.service'), rc=3
2015-09-12 17:15:12 DEBUG otopi.plugins.otopi.services.systemd plugin.execute:936 execute-output: ('/bin/systemctl', 'status', 'ovirt-ha-agent.service') stdout:
ovirt-ha-agent.service - oVirt Hosted Engine High Availability Monitoring Agent
   Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-agent.service; disabled)
   Active: inactive (dead)


2015-09-12 17:15:12 DEBUG otopi.plugins.otopi.services.systemd plugin.execute:941 execute-output: ('/bin/systemctl', 'status', 'ovirt-ha-agent.service') stderr:


2015-09-12 17:15:12 DEBUG otopi.plugins.otopi.services.systemd systemd.status:105 check service ovirt-ha-broker status
2015-09-12 17:15:12 DEBUG otopi.plugins.otopi.services.systemd plugin.executeRaw:828 execute: ('/bin/systemctl', 'status', 'ovirt-ha-broker.service'), executable='None', cwd='None', env=None
2015-09-12 17:15:12 DEBUG otopi.plugins.otopi.services.systemd plugin.executeRaw:878 execute-result: ('/bin/systemctl', 'status', 'ovirt-ha-broker.service'), rc=3
2015-09-12 17:15:12 DEBUG otopi.plugins.otopi.services.systemd plugin.execute:936 execute-output: ('/bin/systemctl', 'status', 'ovirt-ha-broker.service') stdout:
ovirt-ha-broker.service - oVirt Hosted Engine High Availability Communications Broker
   Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-broker.service; disabled)
   Active: inactive (dead)


2015-09-12 17:15:12 DEBUG otopi.plugins.otopi.services.systemd plugin.execute:941 execute-output: ('/bin/systemctl', 'status', 'ovirt-ha-broker.service') stderr:


2015-09-12 17:15:12 INFO otopi.context context.runSequence:427 Stage: Environment setup
2015-09-12 17:15:12 DEBUG otopi.context context.runSequence:431 STAGE late_setup
2015-09-12 17:15:12 DEBUG otopi.context context._executeMethod:142 Stage late_setup METHOD otopi.plugins.ovirt_hosted_engine_setup.vdsmd.vdsmconf.Plugin._late_setup
2015-09-12 17:15:12 DEBUG otopi.context context._executeMethod:142 Stage late_setup METHOD otopi.plugins.ovirt_hosted_engine_setup.system.vdsmenv.Plugin._late_setup
2015-09-12 17:15:12 DEBUG otopi.plugins.otopi.services.systemd systemd.status:105 check service vdsmd status
2015-09-12 17:15:12 DEBUG otopi.plugins.otopi.services.systemd plugin.executeRaw:828 execute: ('/bin/systemctl', 'status', 'vdsmd.service'), executable='None', cwd='None', env=None
2015-09-12 17:15:12 DEBUG otopi.plugins.otopi.services.systemd plugin.executeRaw:878 execute-result: ('/bin/systemctl', 'status', 'vdsmd.service'), rc=0
2015-09-12 17:15:12 DEBUG otopi.plugins.otopi.services.systemd plugin.execute:936 execute-output: ('/bin/systemctl', 'status', 'vdsmd.service') stdout:
vdsmd.service - Virtual Desktop Server Manager
   Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled)
   Active: active (running) since Sat 2015-09-12 17:13:53 BST; 1min 19s ago
  Process: 3140 ExecStartPre=/usr/libexec/vdsm/vdsmd_init_common.sh --pre-start (code=exited, status=0/SUCCESS)
 Main PID: 3234 (vdsm)
   CGroup: /system.slice/vdsmd.service
           └─3234 /usr/bin/python /usr/share/vdsm/vdsm

Sep 12 17:13:55 olympia python[3234]: DIGEST-MD5 client step 1
Sep 12 17:13:55 olympia python[3234]: DIGEST-MD5 ask_user_info()
Sep 12 17:13:55 olympia python[3234]: DIGEST-MD5 client step 1
Sep 12 17:13:55 olympia python[3234]: DIGEST-MD5 ask_user_info()
Sep 12 17:13:55 olympia python[3234]: DIGEST-MD5 make_client_response()
Sep 12 17:13:55 olympia python[3234]: DIGEST-MD5 client step 2
Sep 12 17:13:55 olympia python[3234]: DIGEST-MD5 parse_server_challenge()
Sep 12 17:13:55 olympia python[3234]: DIGEST-MD5 ask_user_info()
Sep 12 17:13:55 olympia python[3234]: DIGEST-MD5 make_client_response()
Sep 12 17:13:55 olympia python[3234]: DIGEST-MD5 client step 3

2015-09-12 17:15:12 DEBUG otopi.plugins.otopi.services.systemd plugin.execute:941 execute-output: ('/bin/systemctl', 'status', 'vdsmd.service') stderr:


2015-09-12 17:15:12 DEBUG otopi.plugins.ovirt_hosted_engine_setup.system.vdsmenv vdsmenv._connect:62 {'status': {'message': 'Done', 'code': 0}, 'info': {'systemProductName': 'X9DR3-F', 'systemSerialNumber': '0123456789', 'systemFamily': 'To be filled by O.E.M.', 'systemVersion': '0123456789', 'systemUUID': '00000000-0000-0000-0000-002590E5DAD8', 'systemManufacturer': 'Supermicro'}}

Comment 1 acrow 2015-09-12 18:36:30 UTC
Created attachment 1072739 [details]
hosted-engine --deploy done from scratch.

Firewall off, SELinux disabled. Still bails out with no obvious error.

Alex

Comment 2 Yedidyah Bar David 2015-09-13 06:15:10 UTC
Looks really weird.

Can you please try to run this under strace and post the log? Something like:

strace -tt -s 4096 -f -o /tmp/hosted-engine-trace hosted-engine --deploy

Also please check/post vdsm logs.

Thanks.

Comment 3 acrow 2015-09-13 14:44:39 UTC
Created attachment 1072902 [details]
strace and logs

Comment 4 acrow 2015-09-13 15:02:47 UTC
By the way, this is from a fresh setup on a "first host", so it's not restricted to the second-host use case.

I think this is a very recent change, as a few days ago (I think on 9/9/15, possibly a bit earlier) I could deploy the hosted engine with no issues.

Alex

Comment 5 Simone Tiraboschi 2015-09-14 08:41:27 UTC
It seems that the issue is in the connection to VDSM:

3800  15:39:46.143821 socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 10
3800  15:39:46.143887 fcntl(10, F_GETFL) = 0x2 (flags O_RDWR)
3800  15:39:46.143912 fcntl(10, F_SETFL, O_RDWR|O_NONBLOCK) = 0
3800  15:39:46.143946 connect(10, {sa_family=AF_INET, sin_port=htons(54321), sin_addr=inet_addr("0.0.0.0")}, 16) = -1 EINPROGRESS (Operation now in progress)
3800  15:39:46.144069 poll([{fd=10, events=POLLOUT}], 1, 900000) = 1 ([{fd=10, revents=POLLOUT}])
3800  15:39:46.144106 getsockopt(10, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
3800  15:39:46.144159 getpeername(10, {sa_family=AF_INET, sin_port=htons(54321), sin_addr=inet_addr("127.0.0.1")}, [16]) = 0

....

3800  15:39:46.152220 close(10)         = 0
3800  15:39:46.152612 rt_sigaction(SIGINT, {SIG_DFL, [], SA_RESTORER, 0x7efe5af03130}, {0x7efe5b221620, [], SA_RESTORER, 0x7efe5af03130}, 8) = 0
3800  15:39:46.152660 rt_sigaction(SIGHUP, {SIG_DFL, [], SA_RESTORER, 0x7efe5af03130}, {0x7efe5b221620, [], SA_RESTORER, 0x7efe5af03130}, 8) = 0
3800  15:39:46.152693 rt_sigaction(SIGQUIT, {SIG_DFL, [], SA_RESTORER, 0x7efe5af03130}, {0x7efe5b221620, [], SA_RESTORER, 0x7efe5af03130}, 8) = 0
3800  15:39:46.152719 rt_sigaction(SIGABRT, {SIG_DFL, [], SA_RESTORER, 0x7efe5af03130}, {0x7efe5b221620, [], SA_RESTORER, 0x7efe5af03130}, 8) = 0
3800  15:39:46.152744 rt_sigaction(SIGPIPE, {SIG_DFL, [], SA_RESTORER, 0x7efe5af03130}, {SIG_IGN, [], SA_RESTORER, 0x7efe5af03130}, 8) = 0
3800  15:39:46.152768 rt_sigaction(SIGTERM, {SIG_DFL, [], SA_RESTORER, 0x7efe5af03130}, {0x7efe5b221620, [], SA_RESTORER, 0x7efe5af03130}, 8) = 0
3800  15:39:46.152805 rt_sigaction(SIGIO, {SIG_DFL, [], SA_RESTORER, 0x7efe5af03130}, {0x7efe5b221620, [], SA_RESTORER, 0x7efe5af03130}, 8) = 0
3800  15:39:46.177030 exit_group(1)     = ?
3800  15:39:46.180417 +++ exited with 1 +++

Comment 6 Simone Tiraboschi 2015-09-14 10:18:30 UTC
OK, found it by running with OTOPI_DEBUG=1:

2015-09-14 12:12:43 DEBUG otopi.plugins.ovirt_hosted_engine_setup.system.vdsmenv vdsmenv._connect:62 {'status': {'message': 'Done', 'code': 0}, 'info': {'systemProductName': 'oVirt Node', 'systemUUID': '8d225eaa-6dcd-441a-80a1-1e43335dc7b0', 'systemSerialNumber': '4C4C4544-0056-4B10-8044-B9C04F513032', 'systemVersion': '20-4', 'systemManufacturer': 'oVirt'}}
('FATAL Internal error (main): <Fault 1: \'<type \\\'exceptions.Exception\\\'>:method "__unicode__" is not supported\'>',)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/otopi/__main__.py", line 103, in main
    installer.execute()
  File "/usr/lib/python2.7/site-packages/otopi/main.py", line 167, in execute
    self.context.runSequence()
  File "/usr/lib/python2.7/site-packages/otopi/context.py", line 443, in runSequence
    self.dumpEnvironment(old=oldEnvironment)
  File "/usr/lib/python2.7/site-packages/otopi/context.py", line 492, in dumpEnvironment
    value = common.toStr(self.environment[key])
  File "/usr/lib/python2.7/site-packages/otopi/common.py", line 88, in toStr
    return toStr(o.__unicode__())
  File "/usr/lib64/python2.7/xmlrpclib.py", line 1224, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib64/python2.7/xmlrpclib.py", line 1578, in __request
    verbose=self.__verbose
  File "/usr/lib64/python2.7/xmlrpclib.py", line 1264, in request
    return self.single_request(host, handler, request_body, verbose)
  File "/usr/lib64/python2.7/xmlrpclib.py", line 1297, in single_request
    return self.parse_response(response)
  File "/usr/lib/python2.7/site-packages/vdsm/vdscli.py", line 43, in wrapped_parse_response
    return old_parse_response(*args, **kwargs)
  File "/usr/lib64/python2.7/xmlrpclib.py", line 1473, in parse_response
    return u.close()
  File "/usr/lib64/python2.7/xmlrpclib.py", line 793, in close
    raise Fault(**self._stack[0])
Fault: <Fault 1: '<type \'exceptions.Exception\'>:method "__unicode__" is not supported'>

It logs the fault only with OTOPI_DEBUG=1; otherwise it quits silently.
It would be better to find a way to have it logged in non-debug mode as well.

I suspect that it was introduced by this one:
https://gerrit.ovirt.org/#/c/45990
which tries __unicode__() before __repr__().
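
For context, a minimal illustrative sketch of the failure mode (this is not the actual otopi code; the converter name and the localhost endpoint are just placeholders): xmlrpclib.ServerProxy turns any unknown attribute access into a remote method call, so probing __unicode__() on the VDSM proxy stored in the otopi environment issues an RPC named "__unicode__", and the resulting Fault escapes instead of falling back to repr().

import xmlrpclib  # Python 2; xmlrpc.client in Python 3

def to_str(obj):
    # Illustrative converter, not otopi.common.toStr: it prefers __unicode__()
    # and only falls back to repr() on AttributeError.
    try:
        return unicode(obj.__unicode__()).encode('utf-8')
    except AttributeError:
        return repr(obj)

proxy = xmlrpclib.ServerProxy('http://localhost:54321')
# ServerProxy.__getattr__ maps the missing __unicode__ attribute to a remote
# method, so to_str(proxy) would perform an XML-RPC call named "__unicode__".
# With VDSM listening, that call is answered with the Fault seen above instead
# of reaching the AttributeError fallback.
# to_str(proxy)  # -> xmlrpclib.Fault: method "__unicode__" is not supported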

Comment 7 Nikolai Sednev 2015-11-25 08:15:58 UTC
Works for me on these components:
# rpm -qa vdsm libvirt-client sanlock qemu-kvm-rhev mom ovirt*
ovirt-vmconsole-host-1.0.1-0.0.master.20151105234454.git3e5d52e.el7.noarch
ovirt-release36-002-2.noarch
ovirt-engine-sdk-python-3.6.0.4-0.2.20151123.gita2f81ed.el7.centos.noarch
sanlock-3.2.4-1.el7.x86_64
ovirt-setup-lib-1.0.1-0.0.master.20151119123055.gitfa908be.el7.centos.noarch
qemu-kvm-rhev-2.3.0-31.el7_2.3.x86_64
ovirt-hosted-engine-ha-1.3.3-0.0.master.20151118145556.20151118145552.git71b535e.el7.noarch
ovirt-vmconsole-1.0.1-0.0.master.20151105234454.git3e5d52e.el7.noarch
ovirt-release36-snapshot-002-2.noarch
libvirt-client-1.2.17-13.el7.x86_64
ovirt-hosted-engine-setup-1.3.1-0.0.master.20151118143825.gitc013638.el7.centos.noarch
ovirt-host-deploy-1.4.2-0.0.master.20151122153544.gitfc808fc.el7.noarch
vdsm-4.17.10.1-0.el7ev.noarch
mom-0.5.1-2.el7.noarch

Deployed 2 hosts over NFS SD.

Comment 8 Sandro Bonazzola 2015-11-27 07:52:42 UTC
Since oVirt 3.6.0 has been released, moving from verified to closed current release.