Bug 1262550
Summary: | 3.6.0 nighty hosted-engine deploy quits silently on second host | ||||||||
---|---|---|---|---|---|---|---|---|---|
Product: | [oVirt] ovirt-hosted-engine-setup | Reporter: | acrow | ||||||
Component: | General | Assignee: | Simone Tiraboschi <stirabos> | ||||||
Status: | CLOSED CURRENTRELEASE | QA Contact: | Nikolai Sednev <nsednev> | ||||||
Severity: | urgent | Docs Contact: | |||||||
Priority: | urgent | ||||||||
Version: | 1.3.0 | CC: | alonbl, bugs, didi, gklein, hawk, sbonazzo | ||||||
Target Milestone: | ovirt-3.6.0-beta | Keywords: | TestOnly, Triaged | ||||||
Target Release: | 1.3.0 | Flags: | rule-engine:
ovirt-3.6.0+
rule-engine: blocker+ ylavi: planning_ack+ sbonazzo: devel_ack+ gklein: testing_ack+ |
||||||
Hardware: | x86_64 | ||||||||
OS: | Linux | ||||||||
Whiteboard: | integration | ||||||||
Fixed In Version: | Doc Type: | Bug Fix | |||||||
Doc Text: | Story Points: | --- | |||||||
Clone Of: | Environment: | ||||||||
Last Closed: | 2015-11-27 07:52:42 UTC | Type: | Bug | ||||||
Regression: | --- | Mount Type: | --- | ||||||
Documentation: | --- | CRM: | |||||||
Verified Versions: | Category: | --- | |||||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
Cloudforms Team: | --- | Target Upstream Version: | |||||||
Embargoed: | |||||||||
Bug Depends On: | 1255638 | ||||||||
Bug Blocks: | |||||||||
Attachments: |
|
Description
acrow
2015-09-12 17:00:08 UTC
Created attachment 1072739 [details]
hosted-engine --deploy done from scratch.
Firewall off, selinux disabled. Still bails out with no obvious error.
Alex
Looks really weird. Can you please try to run this under strace and post the log? Something like: strace -tt -s 4096 -f -o /tmp/hosted-engine-trace hosted-engine --deploy Also please check/post vdsm logs. Thanks. Created attachment 1072902 [details]
strace and logs
By the way this is from a fresh setup on a "first host", so it's not restricted to the second host use case. I think it's a very recent change as a few days ago (I think on 9/9/15, possibly a bit earlier) I could deploy the hosted engine with no issues. ALex It seams that the issue is on the connect with VDSM: 3800 15:39:46.143821 socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 10 3800 15:39:46.143887 fcntl(10, F_GETFL) = 0x2 (flags O_RDWR) 3800 15:39:46.143912 fcntl(10, F_SETFL, O_RDWR|O_NONBLOCK) = 0 3800 15:39:46.143946 connect(10, {sa_family=AF_INET, sin_port=htons(54321), sin_addr=inet_addr("0.0.0.0")}, 16) = -1 EINPROGRESS (Operation now in progress) 3800 15:39:46.144069 poll([{fd=10, events=POLLOUT}], 1, 900000) = 1 ([{fd=10, revents=POLLOUT}]) 3800 15:39:46.144106 getsockopt(10, SOL_SOCKET, SO_ERROR, [0], [4]) = 0 3800 15:39:46.144159 getpeername(10, {sa_family=AF_INET, sin_port=htons(54321), sin_addr=inet_addr("127.0.0.1")}, [16]) = 0 .... 3800 15:39:46.152220 close(10) = 0 3800 15:39:46.152612 rt_sigaction(SIGINT, {SIG_DFL, [], SA_RESTORER, 0x7efe5af03130}, {0x7efe5b221620, [], SA_RESTORER, 0x7efe5af03130}, 8) = 0 3800 15:39:46.152660 rt_sigaction(SIGHUP, {SIG_DFL, [], SA_RESTORER, 0x7efe5af03130}, {0x7efe5b221620, [], SA_RESTORER, 0x7efe5af03130}, 8) = 0 3800 15:39:46.152693 rt_sigaction(SIGQUIT, {SIG_DFL, [], SA_RESTORER, 0x7efe5af03130}, {0x7efe5b221620, [], SA_RESTORER, 0x7efe5af03130}, 8) = 0 3800 15:39:46.152719 rt_sigaction(SIGABRT, {SIG_DFL, [], SA_RESTORER, 0x7efe5af03130}, {0x7efe5b221620, [], SA_RESTORER, 0x7efe5af03130}, 8) = 0 3800 15:39:46.152744 rt_sigaction(SIGPIPE, {SIG_DFL, [], SA_RESTORER, 0x7efe5af03130}, {SIG_IGN, [], SA_RESTORER, 0x7efe5af03130}, 8) = 0 3800 15:39:46.152768 rt_sigaction(SIGTERM, {SIG_DFL, [], SA_RESTORER, 0x7efe5af03130}, {0x7efe5b221620, [], SA_RESTORER, 0x7efe5af03130}, 8) = 0 3800 15:39:46.152805 rt_sigaction(SIGIO, {SIG_DFL, [], SA_RESTORER, 0x7efe5af03130}, {0x7efe5b221620, [], SA_RESTORER, 0x7efe5af03130}, 8) = 0 3800 15:39:46.177030 exit_group(1) = ? 3800 15:39:46.180417 +++ exited with 1 +++ OK, found running it with OTOPI_DEBUG=1 2015-09-14 12:12:43 DEBUG otopi.plugins.ovirt_hosted_engine_setup.system.vdsmenv vdsmenv._connect:62 {'status': {'message': 'Done', 'code': 0}, 'info': {'systemProductName': 'oVirt Node', 'systemUUID': '8d225eaa-6dcd-441a-80a1-1e43335dc7b0', 'systemSerialNumber': '4C4C4544-0056-4B10-8044-B9C04F513032', 'systemVersion': '20-4', 'systemManufacturer': 'oVirt'}} ('FATAL Internal error (main): <Fault 1: \'<type \\\'exceptions.Exception\\\'>:method "__unicode__" is not supported\'>',) Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/otopi/__main__.py", line 103, in main installer.execute() File "/usr/lib/python2.7/site-packages/otopi/main.py", line 167, in execute self.context.runSequence() File "/usr/lib/python2.7/site-packages/otopi/context.py", line 443, in runSequence self.dumpEnvironment(old=oldEnvironment) File "/usr/lib/python2.7/site-packages/otopi/context.py", line 492, in dumpEnvironment value = common.toStr(self.environment[key]) File "/usr/lib/python2.7/site-packages/otopi/common.py", line 88, in toStr return toStr(o.__unicode__()) File "/usr/lib64/python2.7/xmlrpclib.py", line 1224, in __call__ return self.__send(self.__name, args) File "/usr/lib64/python2.7/xmlrpclib.py", line 1578, in __request verbose=self.__verbose File "/usr/lib64/python2.7/xmlrpclib.py", line 1264, in request return self.single_request(host, handler, request_body, verbose) File "/usr/lib64/python2.7/xmlrpclib.py", line 1297, in single_request return self.parse_response(response) File "/usr/lib/python2.7/site-packages/vdsm/vdscli.py", line 43, in wrapped_parse_response return old_parse_response(*args, **kwargs) File "/usr/lib64/python2.7/xmlrpclib.py", line 1473, in parse_response return u.close() File "/usr/lib64/python2.7/xmlrpclib.py", line 793, in close raise Fault(**self._stack[0]) Fault: <Fault 1: '<type \'exceptions.Exception\'>:method "__unicode__" is not supported'> It's logging the fault only with OTOPI_DEBUG=1, otherwise is silently quitting: it would be better to find a way to have it logged also non in debug mode. I suspect that it has been introduced with this one: https://gerrit.ovirt.org/#/c/45990 witch tries __unicode__() before __repr__() Works for me on these components: # rpm -qa vdsm libvirt-client sanlock qemu-kvm-rhev mom ovirt* ovirt-vmconsole-host-1.0.1-0.0.master.20151105234454.git3e5d52e.el7.noarch ovirt-release36-002-2.noarch ovirt-engine-sdk-python-3.6.0.4-0.2.20151123.gita2f81ed.el7.centos.noarch sanlock-3.2.4-1.el7.x86_64 ovirt-setup-lib-1.0.1-0.0.master.20151119123055.gitfa908be.el7.centos.noarch qemu-kvm-rhev-2.3.0-31.el7_2.3.x86_64 ovirt-hosted-engine-ha-1.3.3-0.0.master.20151118145556.20151118145552.git71b535e.el7.noarch ovirt-vmconsole-1.0.1-0.0.master.20151105234454.git3e5d52e.el7.noarch ovirt-release36-snapshot-002-2.noarch libvirt-client-1.2.17-13.el7.x86_64 ovirt-hosted-engine-setup-1.3.1-0.0.master.20151118143825.gitc013638.el7.centos.noarch ovirt-host-deploy-1.4.2-0.0.master.20151122153544.gitfc808fc.el7.noarch vdsm-4.17.10.1-0.el7ev.noarch mom-0.5.1-2.el7.noarch Deployed 2 hosts over NFS SD. Since oVirt 3.6.0 has been released, moving from verified to closed current release. |