Bug 1127224 - Hosted-engine --deploy failed with "Failed to execute stage 'Misc configuration': Connection to storage server failed"
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: oVirt
Classification: Retired
Component: ovirt-hosted-engine-setup
Version: 3.5
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 3.5.0
Assignee: Sandro Bonazzola
QA Contact: meital avital
URL:
Whiteboard: integration
Depends On:
Blocks:
Reported: 2014-08-06 12:24 UTC by cshao
Modified: 2014-09-26 14:56 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-09-09 12:25:13 UTC
oVirt Team: ---
Embargoed:


Attachments
ovirt-hosted-engine-setup.1.log (248.94 KB, text/plain)
2014-08-06 12:25 UTC, cshao
ovirt-hosted-engine-setup.2.log (171.43 KB, text/plain)
2014-08-06 12:25 UTC, cshao
ovirt-hosted-engine-setup.3.log (268.84 KB, text/plain)
2014-08-06 12:26 UTC, cshao
ovirt-node.log (24.92 KB, text/plain)
2014-08-06 12:26 UTC, cshao
needlog.tar.gz (50.80 KB, application/x-gzip)
2014-08-08 03:13 UTC, cshao

Description cshao 2014-08-06 12:24:09 UTC
Description of problem:
Hosted-engine --deploy failed with "Failed to execute stage 'Misc configuration': Connection to storage server failed"

Version-Release number of selected component (if applicable):
ovirt-node-iso-3.5.0.ovirt35.20140805.0.el6.iso
ovirt-node-3.1.0-0.0.master.20140805.git67e6b92.el6.noarch
ovirt-node-plugin-hosted-engine-0.1.0-0.0.master.20140701.gitb651331.el6.x86_64
ovirt-node-plugin-vdsm-0.2.0-1.el6.noarch

How reproducible:
100%

Steps to Reproduce:
1. Install ovirt-node-iso-3.5.0.ovirt35.20140805.0.el6.iso.
2. Run hosted-engine --deploy

Actual results:
Hosted-engine --deploy failed with "Failed to execute stage 'Misc configuration': Connection to storage server failed"

Expected results:
hosted-engine --deploy completes successfully.

Additional info:
Adding the keyword "test_blocker" because this bug blocks our testing of the hosted-engine feature.

Comment 1 cshao 2014-08-06 12:25:16 UTC
Created attachment 924464
ovirt-hosted-engine-setup.1.log

Comment 2 cshao 2014-08-06 12:25:56 UTC
Created attachment 924465
ovirt-hosted-engine-setup.2.log

Comment 3 cshao 2014-08-06 12:26:22 UTC
Created attachment 924466
ovirt-hosted-engine-setup.3.log

Comment 4 cshao 2014-08-06 12:26:48 UTC
Created attachment 924467
ovirt-node.log

Comment 5 Ying Cui 2014-08-07 02:22:51 UTC
Hi Shao Chen,
  Did we test running hosted-engine --deploy on Fedora? If it works well on Fedora but fails on ovirt-node, this bug may be an ovirt-node-plugin-hosted-engine specific issue; Fabian needs to help check.
Thanks
Ying

Comment 6 cshao 2014-08-07 02:43:02 UTC
(In reply to Ying Cui from comment #5)
> Hi Shao Chen,
>   Did we test running hosted-engine --deploy on Fedora? If it works well on
> Fedora but fails on ovirt-node, this bug may be an
> ovirt-node-plugin-hosted-engine specific issue; Fabian needs to help
> check.
> Thanks
> Ying

Hi Ying,

Yes, we tested it on Fedora 19 and it works fine.
It only fails on the ovirt-node side.

Thanks!

Comment 7 Ying Cui 2014-08-07 04:29:10 UTC
Based on comment 6, I changed the component to ovirt-node so that it gets attention early, since there is no ovirt-node-plugin-hosted-engine component in Bugzilla.

Comment 8 cshao 2014-08-07 11:31:35 UTC
Hi fabiand,

"hosted-engine --deploy" test result on CentOS is here:

Test version:
CentOS release 6.5 (Final)
ovirt-hosted-engine-setup-1.2.0-0.1.master.el6.noarch
ovirt-hosted-engine-ha-1.2.1-0.2.master.20140805072346.el6.noarch

Test result:
[ INFO  ] Configuring the management bridge
[ ERROR ] Failed to execute stage 'Misc configuration': Connection to storage server failed
[ INFO  ] Stage: Clean up
[ INFO  ] Generating answer file '/etc/ovirt-hosted-engine/answers.conf'
[ INFO  ] Answer file '/etc/ovirt-hosted-engine/answers.conf' has been updated
[ INFO  ] Stage: Pre-termination
[ INFO  ] Stage: Termination

Hosted-engine --deploy still got failed.
The nic lost after fail.

The test environment is left available here:
root.9.139  p:redhat
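
When comparing several setup runs, it can help to pull the failed stage and its message out of the console output shown above. The following is a minimal sketch, assuming the "[ ERROR ] Failed to execute stage '...': ..." line format seen in this comment; the function name and the regular expression are illustrative and not part of ovirt-hosted-engine-setup.

```python
import re

# Matches lines such as:
# [ ERROR ] Failed to execute stage 'Misc configuration': Connection to storage server failed
STAGE_ERROR = re.compile(r"\[\s*ERROR\s*\]\s*Failed to execute stage '([^']+)': (.*)")


def stage_failures(output):
    """Return a (stage, message) tuple for every failed stage in the output."""
    matches = (STAGE_ERROR.search(line) for line in output.splitlines())
    return [m.groups() for m in matches if m]


# Console output copied from the comment above.
sample = """\
[ INFO  ] Configuring the management bridge
[ ERROR ] Failed to execute stage 'Misc configuration': Connection to storage server failed
[ INFO  ] Stage: Clean up
"""

for stage, message in stage_failures(sample):
    print(stage, "->", message)
```

Only the stage name and message are extracted; the sketch does not try to interpret the cause of the failure.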

Comment 9 Fabian Deutsch 2014-08-07 12:44:25 UTC
Chen said:
"""
"hosted-engine --deploy" test result on CentOS is here:

Test version:
CentOS release 6.5 (Final)
ovirt-hosted-engine-setup-1.2.0-0.1.master.el6.noarch
ovirt-hosted-engine-ha-1.2.1-0.2.master.20140805072346.el6.noarch

Test result:
[ INFO  ] Configuring the management bridge
[ ERROR ] Failed to execute stage 'Misc configuration': Connection to storage server failed
[ INFO  ] Stage: Clean up
[ INFO  ] Generating answer file '/etc/ovirt-hosted-engine/answers.conf'
[ INFO  ] Answer file '/etc/ovirt-hosted-engine/answers.conf' has been updated
[ INFO  ] Stage: Pre-termination
[ INFO  ] Stage: Termination

Hosted-engine --deploy still got failed.
The nic lost after fail.
"""

So according to this comment the problem also exists on plain CentOS; moving this bug to hosted-engine.

Comment 10 Sandro Bonazzola 2014-08-07 12:54:38 UTC
Hi,
looking at the logs, VDSM replied with:

'statuslist': [{'status': 477, 'id': 'c59fa471-707f-48b4-8118-3aab2af8a462'}]}

I need full vdsm, supervdsm, libvirt and sanlock logs in order to try to understand why it happened.

What do you mean by "The nic lost after fail." ?

If I understood correctly, this affects EL6 only right?
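
For triage, the reply fragment quoted above can be checked programmatically. This is a minimal sketch, assuming that a status of 0 means success (a common convention that this report does not state explicitly); the function name is hypothetical and the sketch does not interpret what status 477 means.

```python
def failed_connections(reply):
    """Return (id, status) pairs for every entry whose status is non-zero."""
    return [
        (entry["id"], entry["status"])
        for entry in reply.get("statuslist", [])
        if entry["status"] != 0  # assumption: 0 means success
    ]


# Reply fragment quoted in the comment above.
reply = {
    "statuslist": [
        {"status": 477, "id": "c59fa471-707f-48b4-8118-3aab2af8a462"},
    ]
}

print(failed_connections(reply))
```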

Comment 11 cshao 2014-08-08 03:12:44 UTC
(In reply to Sandro Bonazzola from comment #10)
> Hi,
> looking at the logs, VDSM replied with:
> 
> 'statuslist': [{'status': 477, 'id':
> 'c59fa471-707f-48b4-8118-3aab2af8a462'}]}
> 
> I need full vdsm, supervdsm, libvirt and sanlock logs in order to try to
> understand why it happened.
The logs are in the attachment.
> 
> What do you mean by "The nic lost after fail." ?
It means that when hosted-engine configures the management bridge, the configuration actually fails and the network connection is lost. Please see ifcfg-eth0:

DEVICE=eth0
ONBOOT=no
HWADDR=00:24:21:7f:b7:19
BRIDGE=ovirtmgmt
NM_CONTROLLED=no


> 
> If I understood correctly, this affects EL6 only right?
Not sure; I have only reproduced this issue on CentOS 6.5 and ovirt-node 3.5.
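
The ifcfg-eth0 state described above (attached to a bridge but with ONBOOT=no) can be detected with a small check. This is a minimal sketch, assuming the simple KEY=VALUE layout shown in this comment; both function names are hypothetical helpers, not part of any oVirt tool.

```python
def parse_ifcfg(text):
    """Parse the KEY=VALUE lines of an ifcfg file into a dict."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip('"')
    return settings


def bridged_but_not_onboot(settings):
    """True if the NIC is attached to a bridge and explicitly has ONBOOT=no."""
    return "BRIDGE" in settings and settings.get("ONBOOT", "").lower() == "no"


# File content copied from the comment above.
ifcfg_eth0 = """\
DEVICE=eth0
ONBOOT=no
HWADDR=00:24:21:7f:b7:19
BRIDGE=ovirtmgmt
NM_CONTROLLED=no
"""

print(bridged_but_not_onboot(parse_ifcfg(ifcfg_eth0)))
```

The check only flags the combination reported here; it says nothing about whether the bridge itself exists.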

Comment 12 cshao 2014-08-08 03:13:14 UTC
Created attachment 925053
needlog.tar.gz

Comment 13 Sandro Bonazzola 2014-08-08 06:06:50 UTC
(In reply to shaochen from comment #1)
> Created attachment 924464
> ovirt-hosted-engine-setup.1.log

2014-08-06 12:09:59 DEBUG otopi.context context._executeMethod:152 method exception
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/otopi/context.py", line 142, in _executeMethod
    method['method']()
  File "/usr/share/ovirt-hosted-engine-setup/scripts/../plugins/ovirt-hosted-engine-setup/network/bridge.py", line 261, in _misc
    raiseOnError=True
  File "/usr/lib/python2.6/site-packages/otopi/plugin.py", line 871, in execute
    command=args[0],
RuntimeError: Command '/usr/bin/vdsClient' failed to execute

This failure is not in the vdsm and supervdsm logs; they start on 2014-08-07.

Comment 14 Sandro Bonazzola 2014-08-08 06:09:01 UTC
(In reply to shaochen from comment #2)
> Created attachment 924465
> ovirt-hosted-engine-setup.2.log

Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/otopi/context.py", line 142, in _executeMethod
    method['method']()
  File "/usr/share/ovirt-hosted-engine-setup/scripts/../plugins/ovirt-hosted-engine-setup/vm/boot_cdrom.py", line 162, in _customization
    self.environment[ohostedcons.VMEnv.CDROM]
  File "/usr/share/ovirt-hosted-engine-setup/scripts/../plugins/ovirt-hosted-engine-setup/vm/boot_cdrom.py", line 54, in _check_iso_readable
    file_stat = os.stat(realpath)
OSError: [Errno 2] No such file or directory: '/None'

Looks like the ISO file path supplied by the answer file was empty; not a bug.
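
The OSError on '/None' above suggests that an unset value reached os.stat. A defensive check before touching the filesystem might look like the following. This is a hypothetical sketch, not the code in boot_cdrom.py; the function name and error messages are assumptions made for illustration.

```python
import os


def check_iso_path(path):
    """Reject an unset or empty ISO path before touching the filesystem."""
    # An unset answer-file value may arrive as None, "" or the string "None".
    if path is None or not str(path).strip() or str(path) == "None":
        raise ValueError("ISO path is not set")
    real = os.path.realpath(path)
    if not os.path.isfile(real):
        raise ValueError("ISO path does not exist: %s" % real)
    return real


try:
    check_iso_path(None)
except ValueError as error:
    print(error)
```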

Comment 15 Sandro Bonazzola 2014-08-08 06:10:26 UTC
(In reply to shaochen from comment #3)
> Created attachment 924466
> ovirt-hosted-engine-setup.3.log

Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/otopi/context.py", line 142, in _executeMethod
    method['method']()
  File "/usr/share/ovirt-hosted-engine-setup/scripts/../plugins/ovirt-hosted-engine-setup/storage/storage.py", line 899, in _misc
    self._storageServerConnection()
  File "/usr/share/ovirt-hosted-engine-setup/scripts/../plugins/ovirt-hosted-engine-setup/storage/storage.py", line 471, in _storageServerConnection
    _('Connection to storage server failed')
RuntimeError: Connection to storage server failed

I need older logs; the ones attached start the day after the failure.

Comment 16 cshao 2014-08-08 06:11:21 UTC
> is not in the vdsm and supervdsm logs, they start on 2014-08-07.
Those logs come from CentOS. Do you mean you need the logs from the ovirt-node side?
If so, let me set up the environment again for you.

Thanks!

Comment 17 Sandro Bonazzola 2014-08-08 06:15:41 UTC
(In reply to shaochen from comment #16)
> > is not in the vdsm and supervdsm logs, they start on 2014-08-07.
> Those logs come from CentOS. Do you mean you need the logs from the ovirt-node side?
> If so, let me set up the environment again for you.
> 
> Thanks!

Well, I need the logs of both ovirt-hosted-engine-setup and the subsystems (vdsm, supervdsm, libvirt, sanlock) covering the same time interval, so I can try to trace where the issue originated.
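
To collect lines from several logs for the same time interval, each file can be filtered around the failure timestamp. This is a minimal sketch, assuming every relevant line contains a "YYYY-MM-DD HH:MM:SS" timestamp somewhere (as the setup log line quoted in comment 13 does) and that all logs use the same time zone; the function name and the second sample line are made up for illustration.

```python
import re
from datetime import datetime, timedelta

TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")


def lines_near(lines, failure_time, minutes=5):
    """Yield lines whose first timestamp is within +/- minutes of failure_time."""
    window = timedelta(minutes=minutes)
    for line in lines:
        match = TIMESTAMP.search(line)
        if match is None:
            continue  # e.g. traceback continuation lines carry no timestamp
        stamp = datetime.strptime(match.group(0), "%Y-%m-%d %H:%M:%S")
        if abs(stamp - failure_time) <= window:
            yield line


sample = [
    # Line quoted from ovirt-hosted-engine-setup.1.log in comment 13.
    "2014-08-06 12:09:59 DEBUG otopi.context context._executeMethod:152 method exception",
    # Made-up line standing in for a log that starts one day later.
    "2014-08-07 09:00:00 INFO example line from a later log",
]

failure = datetime(2014, 8, 6, 12, 9, 59)
for line in lines_near(sample, failure):
    print(line)
```

Running the same filter over each subsystem log makes it easy to see when a log does not cover the failure at all, because it then yields nothing.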

Comment 18 Sandro Bonazzola 2014-08-08 08:44:57 UTC
I'm not able to reproduce this when installing hosted-engine with NFSv3 storage.
I don't have AMD hardware but I don't think it's related.

 - ovirt-hosted-engine-ha-1.2.1-0.2.master.20140805072346.el6.noarch
 - ovirt-hosted-engine-setup-1.2.0-0.1.master.el6.noarch
 - vdsm-4.16.1-16.git27555ec.el6.x86_64
 - libvirt-0.10.2-29.el6_5.10.x86_64
 - sanlock-2.8-1.el6.x86_64

Can you please try to reproduce with ovirt-3.5-snapshot?
Reducing severity and priority since it's not 100% reproducible.

Comment 19 cshao 2014-08-08 08:50:49 UTC
(In reply to Sandro Bonazzola from comment #17)
> (In reply to shaochen from comment #16)
> > > is not in the vdsm and supervdsm logs, they start on 2014-08-07.
> > Those logs come from CentOS. Do you mean you need the logs from the ovirt-node side?
> > If so, let me set up the environment again for you.
> > 
> > Thanks!
> 
> Well, I need the logs on both ovirt-hosted-engine-setup and subsystems
> (vdsm, supervdsm, libvirt, sanlock) with the same time interval so I can try
> to trace where the issue originated.

I set up the environment again for you to debug.
ovirt-node env: ssh admin.9.139

Comment 20 cshao 2014-08-08 08:52:02 UTC
(In reply to shaochen from comment #19)
> (In reply to Sandro Bonazzola from comment #17)
> > (In reply to shaochen from comment #16)
> > > > is not in the vdsm and supervdsm logs, they start on 2014-08-07.
> > > Those logs come from CentOS. Do you mean you need the logs from the ovirt-node side?
> > > If so, let me set up the environment again for you.
> > > 
> > > Thanks!
> > 
> > Well, I need the logs on both ovirt-hosted-engine-setup and subsystems
> > (vdsm, supervdsm, libvirt, sanlock) with the same time interval so I can try
> > to trace where the issue originated.
> 
> I set up the environment again for you to debug.
> ovirt-node env: ssh admin.9.139

I set up the environment again for you to debug.
ovirt-node env: ssh admin.9.139  P:redhat

Comment 21 Sandro Bonazzola 2014-08-08 10:51:09 UTC
Cannot access storage connection 10.66.8.184:/home: mount.nfs: access denied by server while mounting 10.66.8.184:/home

Comment 22 Sandro Bonazzola 2014-08-08 10:51:25 UTC
Looks like you configured unreachable storage.

Please reopen if you're able to reproduce with valid storage.

Comment 23 cshao 2014-08-11 03:05:56 UTC
(In reply to Sandro Bonazzola from comment #21)
> Cannot access storage connection 10.66.8.184:/home: mount.nfs: access denied
> by server while mounting 10.66.8.184:/home

No, that is not the reason. The NFS server was available the whole time, and the failure is not there.
# showmount -e 10.66.8.184
Export list for 10.66.8.184:
/home/vol/cshao/export *
/home/vol/cshao/iso4   *
/home/vol/cshao/iso3   *
/home/vol/cshao/iso2   *
/home/vol/cshao/iso    *
/home/vol/cshao/data3  *
/home/vol/cshao/data2  *
/home/vol/cshao/data   

If the NFS export cannot be accessed, setup reports an error and the deployment is interrupted, e.g.:
Error while mounting specified path: mount.nfs: access denied by server while mounting 10.66.8.184:/home/vol/cshao/iso5.
Maybe there is some other reason for the failure.
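
One way to compare a configured storage path with what the server exports is to parse the showmount output above. This is a minimal sketch, assuming the plain "path clients" layout shown in this comment; the function name is hypothetical. It only tests literal membership in the export list and does not by itself establish the cause of the failure.

```python
def exported_paths(showmount_output):
    """Extract export paths from `showmount -e` output, skipping the header line."""
    paths = []
    for line in showmount_output.splitlines():
        fields = line.split()
        if fields and fields[0].startswith("/"):
            paths.append(fields[0])
    return paths


# Output copied from the comment above.
showmount = """\
Export list for 10.66.8.184:
/home/vol/cshao/export *
/home/vol/cshao/iso4   *
/home/vol/cshao/iso3   *
/home/vol/cshao/iso2   *
/home/vol/cshao/iso    *
/home/vol/cshao/data3  *
/home/vol/cshao/data2  *
/home/vol/cshao/data
"""

exports = exported_paths(showmount)
# "/home" is the path from the mount error quoted in comment 21.
print("/home" in exports)
print("/home/vol/cshao/data" in exports)
```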

Comment 24 Sandro Bonazzola 2014-08-11 13:08:56 UTC
(In reply to shaochen from comment #23)
> (In reply to Sandro Bonazzola from comment #21)
> > Cannot access storage connection 10.66.8.184:/home: mount.nfs: access denied
> > by server while mounting 10.66.8.184:/home
> 
> No, that is not the reason. The NFS server was available the whole time, and
> the failure is not there.
> # showmount -e 10.66.8.184
> Export list for 10.66.8.184:
> /home/vol/cshao/export *
> /home/vol/cshao/iso4   *
> /home/vol/cshao/iso3   *
> /home/vol/cshao/iso2   *
> /home/vol/cshao/iso    *
> /home/vol/cshao/data3  *
> /home/vol/cshao/data2  *
> /home/vol/cshao/data   
> 
> If the NFS export cannot be accessed, setup reports an error and the deployment is interrupted,
> e.g.
> Error while mounting specified path: mount.nfs: access denied by server
> while mounting 10.66.8.184:/home/vol/cshao/iso5.
> Maybe there is some other reason for the failure.

Can't find one in the provided logs

Comment 25 cshao 2014-08-13 07:04:10 UTC
(In reply to Sandro Bonazzola from comment #24)

> Can't find one in the provided logs

1. Run "hosted-engine --deploy" and configure it for the first time.
The network is lost: ifcfg-eth0 gets "ONBOOT=no" and "BRIDGE=ovirtmgmt".
A new bridge should be created, but the creation actually fails.
ifcfg-eth0:
DEVICE=eth0
ONBOOT=no
HWADDR=00:24:21:7f:b7:19
BRIDGE=ovirtmgmt
NM_CONTROLLED=no

#brctl show
bridge name  bridge id            STP enabled   interface
;vdsmdummy;  8000.000000000000    no

2. Configure the network back.
3. Run "hosted-engine --deploy"
[INFO] Configure the management bridge.
[ERROR] Failed to execute stage 'Misc configuration': Command '/usr/bin/vdsClient' failed to execute
[INFO] Stage: Clean up

Please see log:
admin.11.13  p:redhat
/var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-20140813065404-nanrcg.log
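
The "brctl show" output in step 1 can be checked for the expected bridge in the same way as the ifcfg file. This is a minimal sketch, assuming the first line is the column header and that each bridge starts at the beginning of a line (continuation lines listing further interfaces are indented); the function name is hypothetical.

```python
def bridge_names(brctl_output):
    """Return the bridge names listed by `brctl show`."""
    lines = brctl_output.splitlines()[1:]  # drop the column header row
    return [
        line.split()[0]
        for line in lines
        if line.strip() and not line[0].isspace()  # skip indented continuation lines
    ]


# Output copied from step 1 above.
brctl_show = """\
bridge name  bridge id            STP enabled   interface
;vdsmdummy;  8000.000000000000    no
"""

names = bridge_names(brctl_show)
print(names)
print("ovirtmgmt" in names)
```

With the output from this comment, only ";vdsmdummy;" is listed, which matches the statement that the ovirtmgmt bridge was not actually created.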

Comment 26 Sandro Bonazzola 2014-08-27 06:39:40 UTC
Moving back to NEW, looks like it's related to Bug #1128065.
A new node iso is needed for testing this again.

Fabian, can you build a new Node iso?

Comment 27 Sandro Bonazzola 2014-08-29 11:43:57 UTC
rbarry built a new iso, can you try to reproduce with:
http://resources.ovirt.org/pub/ovirt-3.5-pre/iso/ovirt-node-iso-3.5.0.ovirt35.20140827.el6.iso

Comment 28 cshao 2014-09-01 02:37:58 UTC
Test version:
ovirt-node-iso-3.5.0.ovirt35.20140827.el6.iso
ovirt-node-3.1.0-0.0.master.el6.noarch
ovirt-node-plugin-hosted-engine-0.1.0-0.0.master.el6.x86_64

Test steps:
1. Run "hosted-engine --deploy"

Test result:
Hit bug 1134873.
It reports the following error:
[ERROR] Failed to execute stage 'Programs detection': Hosted Engine HA services are already running on this system. Hosted Engine cannot be deployed on a host already running those services.

Test env:
10.66.11.13 P:redhat

Comment 29 Artyom 2014-09-01 08:57:51 UTC
Hi Sandro, I tried to install hosted-engine on a clean host and I also had a connectivity problem. When hosted-engine-setup creates the bridge, it sets ONBOOT=no (exactly as shaochen said) on the interface on which the bridge is created and then restarts the network. The interface then stays down, so the installation fails for connectivity reasons. The problem exists only for clean host installations. It can also be a big problem for installation via ssh: if there is no network after bridge configuration, the only way to reach the host is physically or via power management.

ovirt-hosted-engine-setup-1.2.0-0.1.master.20140820130617.gitd832f86.el6.noarch

If you need additional logs, ask me.

Comment 30 Artyom 2014-09-01 12:14:20 UTC
My mistake: that was true for an earlier version. With ovirt-hosted-engine-setup-1.2.0-0.1.master.20140820130617.gitd832f86.el6.noarch everything works fine, also on a clean install.

Comment 31 Doron Fediuck 2014-09-09 12:25:13 UTC
Closing based on comment 30.

If there's a specific ovirt-node issue, please open an ovirt-node issue.
The same goes for networking issues: please open a new BZ with the relevant information.

