Bug 1127224
Summary: | Hosted-engine --deploy failed with "Failed to execute stage 'Misc configuration': Connection to storage server failed" | |
---|---|---|---
Product: | [Retired] oVirt | Reporter: | cshao <cshao>
Component: | ovirt-hosted-engine-setup | Assignee: | Sandro Bonazzola <sbonazzo>
Status: | CLOSED NOTABUG | QA Contact: | meital avital <mavital>
Severity: | high | Docs Contact: |
Priority: | high | |
Version: | 3.5 | CC: | alukiano, amureini, bugs, cshao, danken, dfediuck, didi, ecohen, fdeutsch, gklein, gouyang, hadong, huiwa, iheim, leiwang, lveyde, mgoldboi, ovirt-bugs, pstehlik, rbalakri, rbarry, sbonazzo, stirabos, yaniwang, ycui, yeylon
Target Milestone: | --- | Keywords: | Reopened, TestBlocker, Triaged, Unconfirmed
Target Release: | 3.5.0 | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | integration | |
Fixed In Version: | | Doc Type: | Bug Fix
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2014-09-09 12:25:13 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Attachments: | | |
Description
cshao
2014-08-06 12:24:09 UTC
Created attachment 924464 [details]
ovirt-hosted-engine-setup.1.log
Created attachment 924465 [details]
ovirt-hosted-engine-setup.2.log
Created attachment 924466 [details]
ovirt-hosted-engine-setup.3.log
Created attachment 924467 [details]
ovirt-node.log
Hi Shao Chen,
Do we test it run hosted-engine --deploy on fedora? If it works good in fedora, but failed on ovirt-node, so this bug may be ovirt-node-plugin-hosted-engine specific issue only, need Fabian to help check.
Thanks
Ying

(In reply to Ying Cui from comment #5)
> Hi Shao Chen,
> Do we test it run hosted-engine --deploy on fedora? If it works good in
> fedora, but failed on ovirt-node, so this bug may be
> ovirt-node-plugin-hosted-engine specific issue only, need Fabian to help
> check.
> Thanks
> Ying

Hi ying,
Yes, we have tested it on fedora19 and it can work fine. It only failed on ovirt-node side.
Thanks!

According to comment 6, I changed the component to ovirt-node to pay attention in advance, and there is no ovirt-node-plugin-hosted-engine component in bugzilla.

Hi fabiand,
"hosted-engine --deploy" test result on CentOS is here:

Test version:
CentOS release 6.5 (Final)
ovirt-hosted-engine-setup-1.2.0-0.1.master.el6.noarch
ovirt-hosted-engine-ha-1.2.1-0.2.master.20140805072346.el6.noarch

Test result:
[ INFO ] Configuring the management bridge
[ ERROR ] Failed to execute stage 'Misc configuration': Connection to storage server failed
[ INFO ] Stage: Clean up
[ INFO ] Generating answer file '/etc/ovirt-hosted-engine/answers.conf'
[ INFO ] Answer file '/etc/ovirt-hosted-engine/answers.conf' has been updated
[ INFO ] Stage: Pre-termination
[ INFO ] Stage: Termination

Hosted-engine --deploy still got failed. The nic lost after fail.
Leave test env to here: root.9.139 p:redhat

Chen said:
"""
"hosted-engine --deploy" test result on CentOS is here:

Test version:
CentOS release 6.5 (Final)
ovirt-hosted-engine-setup-1.2.0-0.1.master.el6.noarch
ovirt-hosted-engine-ha-1.2.1-0.2.master.20140805072346.el6.noarch

Test result:
[ INFO ] Configuring the management bridge
[ ERROR ] Failed to execute stage 'Misc configuration': Connection to storage server failed
[ INFO ] Stage: Clean up
[ INFO ] Generating answer file '/etc/ovirt-hosted-engine/answers.conf'
[ INFO ] Answer file '/etc/ovirt-hosted-engine/answers.conf' has been updated
[ INFO ] Stage: Pre-termination
[ INFO ] Stage: Termination

Hosted-engine --deploy still got failed. The nic lost after fail.
"""
So according to this comment the problem also exists on plain centos, thus moving this bug to hosted-engine.

Hi,
looking at the logs, VDSM replied with:

'statuslist': [{'status': 477, 'id': 'c59fa471-707f-48b4-8118-3aab2af8a462'}]}

I need full vdsm, supervdsm, libvirt and sanlock logs in order to try to understand why it happened.

What do you mean by "The nic lost after fail." ?

If I understood correctly, this affects EL6 only right?

(In reply to Sandro Bonazzola from comment #10)
> Hi,
> looking at the logs, VDSM replied with:
>
> 'statuslist': [{'status': 477, 'id':
> 'c59fa471-707f-48b4-8118-3aab2af8a462'}]}
>
> I need full vdsm, supervdsm, libvirt and sanlock logs in order to try to
> understand why it happened.

log in attachment.

>
> What do you mean by "The nic lost after fail." ?

That means: When hosted-engine configuring the management bridge, but actually it will failed, so the network will lost, please see ifcfg-eth0.

DEVICE=eth0
ONBOOT=no
HWADDR=00:24:21:7f:b7:19
BRIDGE=ovirtmgmt
NM_CONTROLLED=no

>
> If I understood correctly, this affects EL6 only right?

not sure, I just reproduce this issue on CentOS6.5 & ovirt-node3.5.

Created attachment 925053 [details]
needlog.tar.gz
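
The ifcfg-eth0 content reported above (NIC attached to ovirtmgmt but ONBOOT=no) can be checked mechanically when comparing hosts. The following is a minimal sketch, not part of hosted-engine or VDSM: `parse_ifcfg` and `bridge_left_down` are hypothetical helper names, and the rule "a bridged NIC with ONBOOT other than yes is suspicious" is an assumption drawn from the reporter's observation, not a documented oVirt check.

```python
def parse_ifcfg(text):
    """Parse KEY=VALUE lines of an ifcfg-style file into a dict."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip('"')
    return settings


def bridge_left_down(settings):
    """True when the NIC is attached to a bridge but is not brought up at boot.

    Assumption: this mirrors the symptom described in this bug only.
    """
    return "BRIDGE" in settings and settings.get("ONBOOT", "").lower() != "yes"


# Content of ifcfg-eth0 as pasted by the reporter.
SAMPLE = """DEVICE=eth0
ONBOOT=no
HWADDR=00:24:21:7f:b7:19
BRIDGE=ovirtmgmt
NM_CONTROLLED=no
"""

if __name__ == "__main__":
    print(bridge_left_down(parse_ifcfg(SAMPLE)))
```

On a real host the text would be read from the interface's ifcfg file instead of the `SAMPLE` string.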
(In reply to shaochen from comment #1)
> Created attachment 924464 [details]
> ovirt-hosted-engine-setup.1.log

2014-08-06 12:09:59 DEBUG otopi.context context._executeMethod:152 method exception
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/otopi/context.py", line 142, in _executeMethod
    method['method']()
  File "/usr/share/ovirt-hosted-engine-setup/scripts/../plugins/ovirt-hosted-engine-setup/network/bridge.py", line 261, in _misc
    raiseOnError=True
  File "/usr/lib/python2.6/site-packages/otopi/plugin.py", line 871, in execute
    command=args[0],
RuntimeError: Command '/usr/bin/vdsClient' failed to execute

is not in the vdsm and supervdsm logs, they start on 2014-08-07.

(In reply to shaochen from comment #2)
> Created attachment 924465 [details]
> ovirt-hosted-engine-setup.2.log

Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/otopi/context.py", line 142, in _executeMethod
    method['method']()
  File "/usr/share/ovirt-hosted-engine-setup/scripts/../plugins/ovirt-hosted-engine-setup/vm/boot_cdrom.py", line 162, in _customization
    self.environment[ohostedcons.VMEnv.CDROM]
  File "/usr/share/ovirt-hosted-engine-setup/scripts/../plugins/ovirt-hosted-engine-setup/vm/boot_cdrom.py", line 54, in _check_iso_readable
    file_stat = os.stat(realpath)
OSError: [Errno 2] No such file or directory: '/None'

looks like iso file path supplied by answer file was empty., not a bug

(In reply to shaochen from comment #3)
> Created attachment 924466 [details]
> ovirt-hosted-engine-setup.3.log

Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/otopi/context.py", line 142, in _executeMethod
    method['method']()
  File "/usr/share/ovirt-hosted-engine-setup/scripts/../plugins/ovirt-hosted-engine-setup/storage/storage.py", line 899, in _misc
    self._storageServerConnection()
  File "/usr/share/ovirt-hosted-engine-setup/scripts/../plugins/ovirt-hosted-engine-setup/storage/storage.py", line 471, in _storageServerConnection
    _('Connection to storage server failed')
RuntimeError: Connection to storage server failed

Need older logs, the ones attached starts the day after the failure.

> is not in the vdsm and supervdsm logs, they start on 2014-08-07.
Those logs come from CentOS. Do you mean you need the logs from the ovirt-node side?
If so, let me set up the environment again for you.
Thanks!
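
The request above for logs that cover the failure time can be verified before attaching anything. This is a minimal sketch under stated assumptions: `log_covers` is a hypothetical helper, the failure time is the one shown in the setup log traceback in this bug, and the vdsm log bounds are placeholders, since the thread only says those logs "start on 2014-08-07".

```python
from datetime import datetime

TS_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_ts(text):
    """Parse a 'YYYY-MM-DD HH:MM:SS' timestamp as it appears in the setup log."""
    return datetime.strptime(text, TS_FORMAT)


def log_covers(first_entry, last_entry, event):
    """True when the event time falls inside [first_entry, last_entry]."""
    return parse_ts(first_entry) <= parse_ts(event) <= parse_ts(last_entry)


# Failure time taken from the setup log traceback quoted in this bug.
FAILURE = "2014-08-06 12:09:59"

# Assumed bounds: only the start date of the vdsm logs is known from the
# thread; the exact first and last timestamps here are placeholders.
VDSM_FIRST = "2014-08-07 00:00:00"
VDSM_LAST = "2014-08-07 23:59:59"

if __name__ == "__main__":
    print(log_covers(VDSM_FIRST, VDSM_LAST, FAILURE))
```

Running the same check for each subsystem log (vdsm, supervdsm, libvirt, sanlock) shows whether the set shares one time interval with the setup log.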
(In reply to shaochen from comment #16)
> > is not in the vdsm and supervdsm logs, they start on 2014-08-07.
> Those logs come form CentOS, you means you need the log from ovirt-node side?
> If so let me re-setup env for you.
>
> Thanks!

Well, I need the logs on both ovirt-hosted-engine-setup and subsystems (vdsm, supervdsm, libvirt, sanlock) with the same time interval so I can try to trace where the issue originated.

I'm not able to reproduce installing hosted-engine with nfs3 storage.
I don't have AMD hardware but I don't think it's related.

- ovirt-hosted-engine-ha-1.2.1-0.2.master.20140805072346.el6.noarch
- ovirt-hosted-engine-setup-1.2.0-0.1.master.el6.noarch
- vdsm-4.16.1-16.git27555ec.el6.x86_64
- libvirt-0.10.2-29.el6_5.10.x86_64
- sanlock-2.8-1.el6.x86_64

Can you please try to reproduce with ovirt-3.5-snapshot?
Reducing severity and priority since it's not 100% reproducible.

(In reply to Sandro Bonazzola from comment #17)
> (In reply to shaochen from comment #16)
> > > is not in the vdsm and supervdsm logs, they start on 2014-08-07.
> > Those logs come form CentOS, you means you need the log from ovirt-node side?
> > If so let me re-setup env for you.
> >
> > Thanks!
>
> Well, I need the logs on both ovirt-hosted-engine-setup and subsystems
> (vdsm, supervdsm, libvirt, sanlock) with the same time interval so I can try
> to trace where the issue originated.

Re-setup env for you debug.
ovirt-node env: ssh admin.9.139

(In reply to shaochen from comment #19)
> (In reply to Sandro Bonazzola from comment #17)
> > (In reply to shaochen from comment #16)
> > > > is not in the vdsm and supervdsm logs, they start on 2014-08-07.
> > > Those logs come form CentOS, you means you need the log from ovirt-node side?
> > > If so let me re-setup env for you.
> > >
> > > Thanks!
> >
> > Well, I need the logs on both ovirt-hosted-engine-setup and subsystems
> > (vdsm, supervdsm, libvirt, sanlock) with the same time interval so I can try
> > to trace where the issue originated.
>
> Re-setup env for you debug.
> ovirt-node env: ssh admin.9.139

Re-setup env for you debug.
ovirt-node env: ssh admin.9.139 P:redhat

Cannot access storage connection 10.66.8.184:/home: mount.nfs: access denied by server while mounting 10.66.8.184:/home

Looks like you configured an unreachable storage.
Please reopen if you're able to reproduce with valid storage.

(In reply to Sandro Bonazzola from comment #21)
> Cannot access storage connection 10.66.8.184:/home: mount.nfs: access denied
> by server while mounting 10.66.8.184:/home

No, that is not the reason, actually the nfs is available all the time, and the failure is not in here.

# showmount -e 10.66.8.184
Export list for 10.66.8.184:
/home/vol/cshao/export *
/home/vol/cshao/iso4 *
/home/vol/cshao/iso3 *
/home/vol/cshao/iso2 *
/home/vol/cshao/iso *
/home/vol/cshao/data3 *
/home/vol/cshao/data2 *
/home/vol/cshao/data

It will report error if the nfs can't access and the deploy will interrupt.
e.g.
Error while mpunting specified path: mount.nfs: access denied by server while mounting 10.66.8.184:/home/vol/cshao/iso5.
Maybe there were some another reason of failure.

(In reply to shaochen from comment #23)
> (In reply to Sandro Bonazzola from comment #21)
> > Cannot access storage connection 10.66.8.184:/home: mount.nfs: access denied
> > by server while mounting 10.66.8.184:/home
>
> No, that is not the reason, actually the nfs is available all the time, and
> the failure is not in here.
> # showmount -e 10.66.8.184
> Export list for 10.66.8.184:
> /home/vol/cshao/export *
> /home/vol/cshao/iso4 *
> /home/vol/cshao/iso3 *
> /home/vol/cshao/iso2 *
> /home/vol/cshao/iso *
> /home/vol/cshao/data3 *
> /home/vol/cshao/data2 *
> /home/vol/cshao/data
>
> It will report error if the nfs can't access and the deploy will interrupt.
> e.g.
> Error while mpunting specified path: mount.nfs: access denied by server
> while mounting 10.66.8.184:/home/vol/cshao/iso5.
> Maybe there were some another reason of failure.

Can't find one in the provided logs

(In reply to Sandro Bonazzola from comment #24)
> Can't find one in the provided logs

1. Run "hosted-engine --deploy"
Configure for the first time, The network will lost, "ONBOOT=no", "BRIDGE=ovirtmgmt"
A new bridge will be create but actually failed.

ifcfg-eth0
DEVICE=eth0
ONBOOT=no
HWADDR=00:24:21:7f:b7:19
BRIDGE=ovirtmgmt
NM_CONTROLLED=no

#brctl show
bridge name     bridge id               STP enabled     interface
;vdsmdummy;     8000.000000000000       no

2. Configure the network back.
3. Run "hosted-engine --deploy"
[INFO] Configure the management bridge.
[ERROR] Failed to execute stage Misc configuration': Command '/usr/bin/vdsClient' failed to execute
[INFO] Stage: Clean up

Please see log:
admin.11.13 p:redhat
/var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-20140813065404-nanrcg.log

Moving back to NEW, looks like it's related to Bug #1128065.
A new node iso is needed for testing this again.
Fabian, can you build a new Node iso?

rbarry built a new iso, can you try to reproduce with:
http://resources.ovirt.org/pub/ovirt-3.5-pre/iso/ovirt-node-iso-3.5.0.ovirt35.20140827.el6.iso

Test version:
ovirt-node-iso-3.5.0.ovirt35.20140827.el6.iso
ovirt-node-3.1.0-0.0.master.el6.noarch
ovirt-node-plugin-hosted-engine-0.1.0-0.0.master.el6.x86_64

Test steps:
1. Run "hosted-engine --deploy"

Test result:
Met bug 1134873.
it will report error as follows:
[ERROR] Failed to execute stage 'Programs detection': Hosted Engine HA services are already running on this system. Hosted Engine cannot be deployed on a host already running those services.

Test env:
10.66.11.13 P:redhat

Hi Sandro,
I tried install hosted-engine to clean host and I also had problem with connectivity, when hosted-engine-setup create bridge, setup change ONBOOT=no(exactly like shaochen said)of interface where you create bridge and after it restart network, clearly that network still in status down, it lead to installation failed from some connectivity reason.
Problem exist only for clean host installation.
It also can be a big problem for installation via ssh, if you have no network after bridge configuration the only way to reach host it physically or via power management.
ovirt-hosted-engine-setup-1.2.0-0.1.master.20140820130617.gitd832f86.el6.noarch
If you will need additional logs ask me.

My mistake it was correct for earlier version for ovirt-hosted-engine-setup-1.2.0-0.1.master.20140820130617.gitd832f86.el6.noarch all works fine also on clean install.

Closing based on comment 30. If there's a specific ovirt-node issue, please open an ovirt-node issue. The same goes for networking issues- please open a new bz with the relevant information.
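
For reference, the NFS exchange in comments 21 to 24 turns on whether the path that failed to mount (10.66.8.184:/home) is among the paths the server exports. The following is a minimal sketch for comparing a requested path with `showmount -e` output. `parse_showmount` and `is_exported` are hypothetical helpers, and the simple prefix match is an assumption: a real NFS server applies its own export and client rules, so this only flags an obvious mismatch and does not settle the cause of the failure.

```python
def parse_showmount(output):
    """Return the exported paths from `showmount -e` output, skipping the header."""
    exports = []
    for line in output.splitlines():
        line = line.strip()
        if line.startswith("/"):
            # First column is the path; the client list (e.g. '*') is ignored.
            exports.append(line.split()[0])
    return exports


def is_exported(path, exports):
    """True when `path` is an export itself or lies underneath one."""
    path = path.rstrip("/") or "/"
    return any(
        path == export or path.startswith(export.rstrip("/") + "/")
        for export in exports
    )


# Export list as pasted in comment 23.
SHOWMOUNT_OUTPUT = """Export list for 10.66.8.184:
/home/vol/cshao/export *
/home/vol/cshao/iso4 *
/home/vol/cshao/iso3 *
/home/vol/cshao/iso2 *
/home/vol/cshao/iso *
/home/vol/cshao/data3 *
/home/vol/cshao/data2 *
/home/vol/cshao/data
"""

EXPORTS = parse_showmount(SHOWMOUNT_OUTPUT)

if __name__ == "__main__":
    for candidate in ("/home", "/home/vol/cshao/data", "/home/vol/cshao/iso5"):
        print(candidate, is_exported(candidate, EXPORTS))
```

With the list above, /home and /home/vol/cshao/iso5 do not match any export, while /home/vol/cshao/data does.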