Bug 1276736 - After upgrading vdsm rpms from 3.5 to 3.6, vdsm fails to restart cause it's not configured [NEEDINFO]
After upgrading vdsm rpms from 3.5 to 3.6, vdsm fails to restart cause it's n...
Status: CLOSED CURRENTRELEASE
Product: vdsm
Classification: oVirt
Component: Tools (Show other bugs)
4.17.10
Unspecified Unspecified
urgent Severity urgent (vote)
: ovirt-3.6.1
: ---
Assigned To: Yeela Kaplan
Jiri Belka
infra
:
Depends On:
Blocks: RHEV_36_HTB RHEV3.6Upgrade
  Show dependency treegraph
 
Reported: 2015-10-30 13:09 EDT by Simone Tiraboschi
Modified: 2016-05-23 07:03 EDT (History)
13 users (show)

See Also:
Fixed In Version: v4.17.10.1
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-12-22 08:28:30 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Infra
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
sbonazzo: needinfo? (ykaplan)
rule-engine: ovirt‑3.6.z+
ylavi: planning_ack+
oourfali: devel_ack+
pstehlik: testing_ack+


Attachments (Terms of Use)
messages log contianing start of vdsm after upgrade (3.96 KB, text/plain)
2015-11-06 03:53 EST, Piotr Kliczewski
no flags Details


External Trackers
Tracker ID Priority Status Summary Last Updated
oVirt gerrit 47941 master MERGED init: configure multipath on upgrade from ovirt-3.5 Never
oVirt gerrit 47956 ovirt-3.6 MERGED init: configure multipath on upgrade from ovirt-3.5 Never
oVirt gerrit 47957 ovirt-3.6.0 MERGED init: configure multipath on upgrade from ovirt-3.5 Never
oVirt gerrit 48194 master ABANDONED init: run configure --force on upgrade from ovirt-3.5 Never
oVirt gerrit 48222 master MERGED Adding call to configure passwd (libvirt sasl password) Never
oVirt gerrit 48231 ovirt-3.6.0 MERGED Adding call to configure passwd (libvirt sasl password) Never
oVirt gerrit 48232 ovirt-3.6 MERGED Adding call to configure passwd (libvirt sasl password) Never

  None (edit)
Description Simone Tiraboschi 2015-10-30 13:09:17 EDT
Description of problem:
After upgrading vdsm rpms from 3.5 to 3.6, vdsm fails to restart:

yum install -y http://resources.ovirt.org/pub/yum-repo/ovirt-release36.rpm
yum update -y
[root@c71het20151029 ~]# systemctl status vdsmd
vdsmd.service - Virtual Desktop Server Manager
   Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled)
   Active: failed (Result: start-limit) since ven 2015-10-30 17:57:24 CET; 1min 24s ago
 Main PID: 3083 (code=exited, status=0/SUCCESS)
   CGroup: /system.slice/vdsmd.service

ott 30 17:57:24 c71het20151029.localdomain systemd[1]: Failed to start Virtual Desktop Server Manager.
ott 30 17:57:24 c71het20151029.localdomain systemd[1]: Unit vdsmd.service entered failed state.
ott 30 17:57:24 c71het20151029.localdomain systemd[1]: Starting Virtual Desktop Server Manager...
ott 30 17:57:24 c71het20151029.localdomain systemd[1]: vdsmd.service holdoff time over, scheduling restart.
ott 30 17:57:24 c71het20151029.localdomain systemd[1]: Stopping Virtual Desktop Server Manager...
ott 30 17:57:24 c71het20151029.localdomain systemd[1]: Starting Virtual Desktop Server Manager...
ott 30 17:57:24 c71het20151029.localdomain systemd[1]: vdsmd.service start request repeated too quickly, refusing to start.
ott 30 17:57:24 c71het20151029.localdomain systemd[1]: Failed to start Virtual Desktop Server Manager.
ott 30 17:57:24 c71het20151029.localdomain systemd[1]: Unit vdsmd.service entered failed state.


[root@c71het20151029 ~]# systemctl restart vdsmd
Job for vdsmd.service failed. See 'systemctl status vdsmd.service' and 'journalctl -xn' for details.

VDSM fails cause is not fully configured:

Oct 30 18:03:03 c71het20151029 vdsmd_init_common.sh: Error:
Oct 30 18:03:03 c71het20151029 vdsmd_init_common.sh: One of the modules is not configured to work with VDSM.
Oct 30 18:03:03 c71het20151029 vdsmd_init_common.sh: To configure the module use the following:
Oct 30 18:03:03 c71het20151029 vdsmd_init_common.sh: 'vdsm-tool configure [--module module-name]'.
Oct 30 18:03:03 c71het20151029 vdsmd_init_common.sh: If all modules are not configured try to use:
Oct 30 18:03:03 c71het20151029 vdsmd_init_common.sh: 'vdsm-tool configure --force'
Oct 30 18:03:03 c71het20151029 vdsmd_init_common.sh: (The force flag will stop the module's service and start it
Oct 30 18:03:03 c71het20151029 vdsmd_init_common.sh: afterwards automatically to load the new configuration.)
Oct 30 18:03:03 c71het20151029 vdsmd_init_common.sh: Downrev multipath.conf detected, upgrade required
Oct 30 18:03:03 c71het20151029 vdsmd_init_common.sh: libvirt is already configured for vdsm
Oct 30 18:03:03 c71het20151029 vdsmd_init_common.sh: Modules multipath are not configured
Oct 30 18:03:03 c71het20151029 vdsmd_init_common.sh: vdsm: stopped during execute check_is_configured task (task returned with error code 1).
Oct 30 18:03:03 c71het20151029 systemd: vdsmd.service: control process exited, code=exited status=1

To fix is necessary to manually run:
vdsm-tool configure --force

At that point it will start again.


Version-Release number of selected component (if applicable):
upgrading from vdsm-4.16.27-0.el7.centos to vdsm-4.16.27-0.el7.centos

How reproducible:
100%

Steps to Reproduce:
1. deploy the host with 3.5 repo
2. add 3.6 repo
3. yum update
4. VDSM will not restart

Actual results:
VDSM will not restart after yum update cause is not well configured

Expected results:
VDSM should be retrocompatible on its own configuration or at least it should auto fix on rpm post install

Additional info:
Redploy the host from the engine is not an issue cause host-deploy will always call vdsm-tool configure --force
Comment 1 Yaniv Bronhaim 2015-11-01 03:50:07 EST
Redeploy is indeed not relevant here. Manual intervention shouldn't take place as well. 
I see "Modules multipath are not configured" - First I thought that it was sanlock which alerted it, and it can happen if the host was connected to storage domain while you perform the upgrade. means that it might hold locks which disturb sanlock from being restarted, as vdsm-tool tries to restart the services after the reconfigure.

We might need to prevent upgrade if the host is attached and might hold sanlock locks .

But this is not the case here - its multipath configurator which we need to check. 

Please attach /var/log/messages and vdsm.log and we'll see if it can give a lead, if not - we will try to reproduce it.
Comment 2 Oved Ourfali 2015-11-01 04:05:43 EST
Simone, please also check what happens when upgrading through the UI (if you're working with 3.6 engine).

So, after adding the 3.6 repo, do the following:
1. engine side: engine-config -s HostPackagesUpdateTimeInHours=0.05 
2. restart the engine
3. go to webadmin
4. after no more than 3 minutes, there should be an "upgrade" button enabled on that host.
5. press upgrade

It will move the host to maintenance and then upgrade.
Comment 3 Yeela Kaplan 2015-11-01 08:28:23 EST
This should have been tested manually much earlier.

Multipath configurator through vdsm-tool was added only in ovirt-3.6

On upgrade from ovirt-3.5 to ovirt-3.6 configuration is done in postun section of the spec file of ovirt-3.5. 

Which currently only configures libvirt.


The way I see it, we have to options to fix this.

Either add to the ovirt-3.5 postun section a call to configure all modules through vdsm-tool.

Or change is-configure for multipath configurator in vdsm-tool on 3.6 to configure multipath in case it is not configured.

Dan,
What do you think is a better option? Or do you have a better idea?

Do you remember why we did not add the configure of all modules to the 3.5 spec in the first place?
Comment 4 Dan Kenigsberg 2015-11-01 10:06:51 EST
We do not have an open-source time machine, so we cannot change ovirt-3.5's %postun section.

Adding a side effect to is-configure is too evil.

I'm afraid that our only option is to do something like
http://gerrit.ovirt.org/31561

I'd consider this bug as an ovirt-3.6.0-ga blocker: it would make every upgraded host unusable.
Comment 5 Red Hat Bugzilla Rules Engine 2015-11-01 10:06:53 EST
This bug is marked for z-stream, yet the milestone is for a major version, therefore the milestone has been reset.
Please set the correct milestone or drop the z stream flag.
Comment 6 Dan Kenigsberg 2015-11-01 10:27:12 EST
Come to think of that, the workaround is simple

  vdsm-tool configure --module multipath; service vdsmd start

so a prominent release note could be enough.
Comment 7 Red Hat Bugzilla Rules Engine 2015-11-02 04:06:19 EST
This bug is marked for z-stream, yet the milestone is for a major version, therefore the milestone has been reset.
Please set the correct milestone or drop the z stream flag.
Comment 8 Simone Tiraboschi 2015-11-02 04:18:57 EST
(In reply to Dan Kenigsberg from comment #6)
> Come to think of that, the workaround is simple
> 
>   vdsm-tool configure --module multipath; service vdsmd start
> 
> so a prominent release note could be enough.

We have an issue with hosted-engine upgrade path:
on 3.5 -> 3.6 upgrade, the HA agent is performing some operations witch requires VDSM to be up (creating a new volume to store the hosted-engine configuration on the shared volume for instance).

The upgrade path is simply started upgrading ha-agent rpm (and it's dependencies including VDSM) from 3.5 to 3.6 without requiring manual actions on the host.
So if VDSM doesn't restart after the upgrade, the whole hosted-engine upgrade path is locked till the user manually reconfigures vdsm connecting to the host.
Comment 9 Sandro Bonazzola 2015-11-02 04:46:51 EST
Oved I suggest to re consider this as blocker seeing comment #8
Comment 10 Oved Ourfali 2015-11-02 04:52:06 EST
(In reply to Sandro Bonazzola from comment #9)
> Oved I suggest to re consider this as blocker seeing comment #8

For such use-cases there is one host that you can do the manual step above after upgrading vdsm, and then run the hosted engine upgrade.

We have a patch that is under review, but not sure worth postponing the release on that. We'll consult this offline, and re-address this bug if needed.
Comment 11 Simone Tiraboschi 2015-11-02 06:15:39 EST
(In reply to Oved Ourfali from comment #2)
> Simone, please also check what happens when upgrading through the UI (if
> you're working with 3.6 engine).

Yes, it happens also upgrading from the engine.


Nov 2, 2015 12:07:55 PM

Host not_he is non responsive.

Nov 2, 2015 12:07:51 PM

Host not_he is not responding. Host cannot be fenced automatically because power management for the host is disabled.

Nov 2, 2015 12:07:49 PM

Host not_he upgrade was completed successfully.

Nov 2, 2015 12:07:49 PM

Installing Host not_he. Stage: Termination.

Nov 2, 2015 12:07:49 PM

Installing Host not_he. Retrieving installation logs to: '/var/log/ovirt-engine/host-deploy/ovirt-host-mgmt-20151102120749-c71het20151030.localdomain-a8eb0c4.log'.

Nov 2, 2015 12:07:49 PM

Installing Host not_he. Stage: Pre-termination.

Nov 2, 2015 12:07:49 PM

Installing Host not_he. Stage: Closing up.

Nov 2, 2015 12:07:49 PM

Installing Host not_he. Stage: Transaction commit.

Nov 2, 2015 12:07:49 PM

Installing Host not_he. Stage: Misc configuration.

Nov 2, 2015 12:07:48 PM

Installing Host not_he. Yum Verify: 15/15: vdsm-python.noarch 0:4.16.27-0.el7.centos - ud.

Nov 2, 2015 12:07:48 PM

Installing Host not_he. Yum Verify: 14/15: vdsm-cli.noarch 0:4.16.27-0.el7.centos - ud.

Nov 2, 2015 12:07:48 PM

Installing Host not_he. Yum Verify: 13/15: vdsm-yajsonrpc.noarch 0:4.16.27-0.el7.centos - ud.

Nov 2, 2015 12:07:48 PM

Installing Host not_he. Yum Verify: 12/15: vdsm-jsonrpc.noarch 0:4.16.27-0.el7.centos - ud.

Nov 2, 2015 12:07:48 PM

Installing Host not_he. Yum Verify: 11/15: vdsm.x86_64 0:4.16.27-0.el7.centos - ud.

Nov 2, 2015 12:07:48 PM

Installing Host not_he. Yum Verify: 10/15: vdsm-xmlrpc.noarch 0:4.16.27-0.el7.centos - ud.

Nov 2, 2015 12:07:48 PM

Installing Host not_he. Yum Verify: 9/15: vdsm-python.noarch 0:4.17.10-0.el7.centos - u.

Nov 2, 2015 12:07:48 PM

Installing Host not_he. Yum Verify: 8/15: vdsm-cli.noarch 0:4.17.10-0.el7.centos - u.

Nov 2, 2015 12:07:48 PM

Installing Host not_he. Yum Verify: 7/15: vdsm-infra.noarch 0:4.17.10-0.el7.centos - u.

Nov 2, 2015 12:07:48 PM

Installing Host not_he. Yum Verify: 6/15: vdsm-xmlrpc.noarch 0:4.17.10-0.el7.centos - u.

Nov 2, 2015 12:07:47 PM

Installing Host not_he. Yum Verify: 5/15: vdsm.noarch 0:4.17.10-0.el7.centos - u.

Nov 2, 2015 12:07:47 PM

Installing Host not_he. Yum Verify: 4/15: vdsm-jsonrpc.noarch 0:4.17.10-0.el7.centos - u.

Nov 2, 2015 12:07:47 PM

Installing Host not_he. Yum Verify: 3/15: safelease.x86_64 0:1.0-5.el7 - u.

Nov 2, 2015 12:07:47 PM

Installing Host not_he. Yum Verify: 2/15: vdsm-yajsonrpc.noarch 0:4.17.10-0.el7.centos - u.

Nov 2, 2015 12:07:47 PM

Installing Host not_he. Yum Verify: 1/15: ovirt-vmconsole.noarch 0:1.0.0-1.el7.centos - u.

Nov 2, 2015 12:07:46 PM

Installing Host not_he. Yum updated: 15/15: vdsm-yajsonrpc.

Nov 2, 2015 12:07:46 PM

Installing Host not_he. Yum updated: 14/15: vdsm-python.

Nov 2, 2015 12:07:46 PM

Installing Host not_he. Yum updated: 13/15: vdsm-xmlrpc.

Nov 2, 2015 12:07:45 PM

Installing Host not_he. Yum updated: 12/15: vdsm-cli.

Nov 2, 2015 12:07:45 PM

Installing Host not_he. Yum updated: 11/15: vdsm-jsonrpc.

Nov 2, 2015 12:07:36 PM

Installing Host not_he. Yum updated: 10/15: vdsm.

Nov 2, 2015 12:07:36 PM

Installing Host not_he. Yum update: 9/15: vdsm-cli-4.17.10-0.el7.centos.noarch.

Nov 2, 2015 12:07:32 PM

Installing Host not_he. Yum update: 8/15: vdsm-4.17.10-0.el7.centos.noarch.

Nov 2, 2015 12:07:12 PM

Installing Host not_he. Yum install: 7/15: ovirt-vmconsole-1.0.0-1.el7.centos.noarch.

Nov 2, 2015 12:07:11 PM

Installing Host not_he. Yum update: 6/15: vdsm-jsonrpc-4.17.10-0.el7.centos.noarch.

Nov 2, 2015 12:07:11 PM

Installing Host not_he. Yum update: 5/15: vdsm-yajsonrpc-4.17.10-0.el7.centos.noarch.

Nov 2, 2015 12:07:10 PM

Installing Host not_he. Yum install: 4/15: safelease-1.0-5.el7.x86_64.

Nov 2, 2015 12:07:09 PM

Installing Host not_he. Yum update: 3/15: vdsm-xmlrpc-4.17.10-0.el7.centos.noarch.

Nov 2, 2015 12:07:06 PM

Installing Host not_he. Yum update: 2/15: vdsm-python-4.17.10-0.el7.centos.noarch.

Nov 2, 2015 12:07:06 PM

Installing Host not_he. Yum install: 1/15: vdsm-infra-4.17.10-0.el7.centos.noarch.

Nov 2, 2015 12:07:05 PM

Installing Host not_he. Yum Status: Running Transaction.

Nov 2, 2015 12:07:05 PM

Installing Host not_he. Yum Status: Running Test Transaction.

Nov 2, 2015 12:07:05 PM

Installing Host not_he. Yum Status: Check Package Signatures.

Nov 2, 2015 12:07:05 PM

Installing Host not_he. Yum Download/Verify: vdsm-yajsonrpc-4.17.10-0.el7.centos.noarch.

Nov 2, 2015 12:07:05 PM

Installing Host not_he. Yum Download/Verify: vdsm-xmlrpc-4.17.10-0.el7.centos.noarch.

Nov 2, 2015 12:07:05 PM

Installing Host not_he. Yum Download/Verify: vdsm-python-4.17.10-0.el7.centos.noarch.

Nov 2, 2015 12:07:04 PM

Installing Host not_he. Yum Download/Verify: vdsm-jsonrpc-4.17.10-0.el7.centos.noarch.

Nov 2, 2015 12:07:04 PM

Installing Host not_he. Yum Download/Verify: vdsm-infra-4.17.10-0.el7.centos.noarch.

Nov 2, 2015 12:07:04 PM

Installing Host not_he. Yum Download/Verify: vdsm-cli-4.17.10-0.el7.centos.noarch.

Nov 2, 2015 12:07:04 PM

Installing Host not_he. Yum Download/Verify: vdsm-4.17.10-0.el7.centos.noarch.

Nov 2, 2015 12:07:03 PM

Installing Host not_he. Yum Download/Verify: safelease-1.0-5.el7.x86_64.

Nov 2, 2015 12:07:03 PM

Installing Host not_he. Yum Download/Verify: ovirt-vmconsole-1.0.0-1.el7.centos.noarch.

Nov 2, 2015 12:07:03 PM

Installing Host not_he. Yum Status: Downloading Packages.

Nov 2, 2015 12:06:58 PM

Installing Host not_he. Stage: Package installation.

Nov 2, 2015 12:06:58 PM

Installing Host not_he. Stage: Misc configuration.

Nov 2, 2015 12:06:58 PM

Installing Host not_he. Stage: Transaction setup.

Nov 2, 2015 12:06:58 PM

Installing Host not_he. Stage: Setup validation.

Nov 2, 2015 12:06:57 PM

Installing Host not_he. Logs at host located at: '/tmp/ovirt-host-mgmt-20151102120646-7uf4be.log'.

Nov 2, 2015 12:06:57 PM

Installing Host not_he. Stage: Environment customization.

Nov 2, 2015 12:06:57 PM

Installing Host not_he. Stage: Programs detection.

Nov 2, 2015 12:06:48 PM

Storage Pool Manager runs on Host hosted_engine_1 (Address: c71het20151028.localdomain).

Nov 2, 2015 12:06:46 PM

Installing Host not_he. Stage: Environment packages setup.

Nov 2, 2015 12:06:46 PM

Installing Host not_he. Stage: Environment setup.

Nov 2, 2015 12:06:46 PM

Installing Host not_he. Stage: Initializing.

Nov 2, 2015 12:06:46 PM

Invalid status on Data Center Default. Setting status to Non Responsive.

Nov 2, 2015 12:06:45 PM

Installing Host not_he. Connected to host c71het20151030.localdomain with SSH key fingerprint: SHA256:KXfePh0dbmBw3ZXcHkgaov14XUbF00XKnCPVPGdnmns.

Nov 2, 2015 12:06:41 PM

Host not_he upgrade was started (User: admin@internal).

Nov 2, 2015 12:06:41 PM

Host not_he was switched to Maintenance Mode.

Nov 2, 2015 12:06:28 PM

User admin@internal logged in.

Nov 2, 2015 12:05:32 PM

Host not_he has available updates: vdsm,vdsm-cli.
Comment 12 Simone Tiraboschi 2015-11-02 06:18:32 EST
(In reply to Oved Ourfali from comment #10)
> (In reply to Sandro Bonazzola from comment #9)
> > Oved I suggest to re consider this as blocker seeing comment #8
> 
> For such use-cases there is one host that you can do the manual step above
> after upgrading vdsm, and then run the hosted engine upgrade.

There isn't a specific user command to start the hosted-engine upgrade path:
just upgrading ovirt-hosted-engine-ha rpm (and its dependencies including vdsm) it's enough to trigger it.
The issue is that here it blocks cause vdsm doesn't restart restart and the user ha to manually recover.
Comment 13 Red Hat Bugzilla Rules Engine 2015-11-04 07:25:53 EST
Bug tickets that are moved to testing must have target release set to make sure tester knows what to test. Please set the correct target release before moving to ON_QA.
Comment 14 Sandro Bonazzola 2015-11-04 09:43:39 EST
This issue should be fixed in oVirt 3.6.0 released on November 4th 2015 but still need to be checked by QE
Comment 15 Jiri Belka 2015-11-05 11:20:45 EST
after having a 3.5.6 host in 3.5 cluster (3.6 engine) highlighted as ready for upgrade and executing Upgrade action, upgrade was successful to vdsm-4.17.10.1-0.el7ev.noarch but the host is nonresponsive state.

2015-11-05 17:08:28,447 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (VdsDeploy) [23bcfe3] Correlation ID: 23bcfe3, Call Stack: null, Custom Event ID: -1, Message: Installing Host 
dell-r210ii-13. Yum updated: 11/17: vdsm.
2015-11-05 17:08:33,288 ERROR [org.ovirt.vdsm.jsonrpc.client.reactors.Reactor] (SSL Stomp Reactor) [] Unable to process messages
...
2015-11-05 17:08:40,339 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (pool-6-thread-2) [23bcfe3] Correlation ID: 23bcfe3, Call Stack: null, Custom Event ID: -1, Message: Host dell-r210ii-13 upgrade was completed successfully.
2015-11-05 17:08:40,504 INFO  [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) [] Connecting to dell-r210ii-13.example.com/10.34.62.205
2015-11-05 17:08:40,506 WARN  [org.ovirt.vdsm.jsonrpc.client.utils.retry.Retryable] (SSL Stomp Reactor) [] Retry failed
2015-11-05 17:08:40,506 ERROR [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (DefaultQuartzScheduler_Worker-19) [] Exception during connection
2015-11-05 17:08:40,526 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand] (DefaultQuartzScheduler_Worker-19) [] Command 'GetCapabilitiesVDSCommand(HostName = dell-r210ii-13, VdsIdAndVdsVDSCommandParametersBase:{runAsync='true', hostId='b4e27f36-4789-48f9-8f6e-1aed155e0648', vds='Host[dell-r210ii-13,b4e27f36-4789-48f9-8f6e-1aed155e0648]'})' execution failed: java.net.ConnectException: Connection refused
2015-11-05 17:08:40,527 ERROR [org.ovirt.engine.core.vdsbroker.HostMonitoring] (DefaultQuartzScheduler_Worker-19) [] Failure to refresh Vds runtime info: java.net.ConnectException: Connection refused
2015-11-05 17:08:40,528 ERROR [org.ovirt.engine.core.vdsbroker.HostMonitoring] (DefaultQuartzScheduler_Worker-19) [] Exception: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException: java.net.ConnectException: Connection refused
        at org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand.createNetworkException(VdsBrokerCommand.java:157) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand.executeVDSCommand(VdsBrokerCommand.java:120) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.VDSCommandBase.executeCommand(VDSCommandBase.java:65) [vdsbroker.jar:]


[root@dell-r210ii-13 yum.repos.d]# vdsClient -s 0 getVdsCaps
Connection to 0.0.0.0:54321 refused

[root@dell-r210ii-13 yum.repos.d]# systemctl status vdsmd -l
● vdsmd.service - Virtual Desktop Server Manager
   Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled; vendor preset: enabled)
   Active: failed (Result: start-limit) since Thu 2015-11-05 17:08:38 CET; 7min ago
  Process: 15672 ExecStopPost=/usr/libexec/vdsm/vdsmd_init_common.sh --post-stop (code=exited, status=0/SUCCESS)
  Process: 16004 ExecStartPre=/usr/libexec/vdsm/vdsmd_init_common.sh --pre-start (code=exited, status=1/FAILURE)
 Main PID: 12914 (code=exited, status=0/SUCCESS)

Nov 05 17:08:38 dell-r210ii-13.example.com systemd[1]: Unit vdsmd.service entered failed state.
Nov 05 17:08:38 dell-r210ii-13.example.com systemd[1]: vdsmd.service failed.
Nov 05 17:08:38 dell-r210ii-13.example.com systemd[1]: vdsmd.service holdoff time over, scheduling restart.
Nov 05 17:08:38 dell-r210ii-13.example.com systemd[1]: start request repeated too quickly for vdsmd.service
Nov 05 17:08:38 dell-r210ii-13.example.com systemd[1]: Failed to start Virtual Desktop Server Manager.
Nov 05 17:08:38 dell-r210ii-13.example.com systemd[1]: Unit vdsmd.service entered failed state.
Nov 05 17:08:38 dell-r210ii-13.example.com systemd[1]: vdsmd.service failed.

# yum history info 4 | grep vdsm
    Updated     vdsm-4.16.29-1.el7ev.x86_64             @rhevm35
    Updated     vdsm-cli-4.16.29-1.el7ev.noarch         @rhevm35
    Dep-Install vdsm-infra-4.17.10.1-0.el7ev.noarch     @rhev-master
    Updated     vdsm-jsonrpc-4.16.29-1.el7ev.noarch     @rhevm35
    Updated     vdsm-python-4.16.29-1.el7ev.noarch      @rhevm35
    Updated     vdsm-xmlrpc-4.16.29-1.el7ev.noarch      @rhevm35
    Updated     vdsm-yajsonrpc-4.16.29-1.el7ev.noarch   @rhevm35
   1 warning: /etc/vdsm/vdsm.conf created as /etc/vdsm/vdsm.conf.rpmnew
Comment 16 Jiri Belka 2015-11-05 11:27 EST
Created attachment 1090277 [details]
/var/log/ovirt-engine
Comment 18 Oved Ourfali 2015-11-05 12:37:19 EST
Jiri, all I see in the vdsm.log is messages like:
Detector thread::DEBUG::2015-11-05 17:21:11,047::protocoldetector::201::vds.MultiProtocolAcceptor::(_remove_connection) Connection removed from 10.34.63.116:34710
Detector thread::DEBUG::2015-11-05 17:21:14,081::protocoldetector::187::vds.MultiProtocolAcceptor::(_add_connection) Adding connection from 10.34.63.116:34711


Repeating all the time.
Since November 4th.

Are you sure the host was up in the engine before you upgraded it?
Can you make sure it is up?

Also, please try to do that manually on the host itself (as reported in the bug).
Comment 19 Yaniv Bronhaim 2015-11-05 12:43:36 EST
Please attach /var/log/messages - only there you see what failed
Comment 20 Piotr Kliczewski 2015-11-06 03:16:19 EST
Looking at the engine and vdsm logs I can see that we fail handshake. The engine was never able to connect and we can see in the logs:

Connection refused

whereas in vdsm logs we can see:

(_process_handshake) Error during handshake: unexpected eof

Has ssl configuration changed or certificates were renewed?
Comment 21 Oved Ourfali 2015-11-06 03:21:10 EST
Also, logs suggest it ws like that even before upgrading according to the date.
Comment 22 Piotr Kliczewski 2015-11-06 03:53 EST
Created attachment 1090517 [details]
messages log contianing start of vdsm after upgrade
Comment 23 Piotr Kliczewski 2015-11-06 04:00:03 EST
in messages file I can see:

Nov  6 09:44:48 dell-r210ii-13 daemonAdapter: libvirt: XML-RPC error : authentication failed: authentication failed
Nov  6 09:44:49 dell-r210ii-13 journal: authentication failed: Failed to start SASL negotiation: -20 (SASL(-13): user not found: unable to canonify user and get auxprops)
Nov  6 09:44:49 dell-r210ii-13 journal: authentication failed: authentication failed
Nov  6 09:44:49 dell-r210ii-13 journal: End of file while reading data: Input/output error
Nov  6 09:44:49 dell-r210ii-13 daemonAdapter: libvirt: XML-RPC error : authentication failed: authentication failed

and mom-vdsm service is down and not able to start.
Comment 24 Oved Ourfali 2015-11-06 04:53:18 EST
Restoring needinfo as I've asked Jiri additional questions. 

In addition, Piotr, do you think the issue is with mom?
Comment 25 Piotr Kliczewski 2015-11-06 05:10:49 EST
I am not sure why mom was not able to start. I collected required information and checked that configure --force fixes the issue. I suggest to run it instead only reconfiguration for multipath.
Comment 26 Oved Ourfali 2015-11-06 05:12:12 EST
Yaniv/Dan, thoughts?
Comment 27 Jiri Belka 2015-11-06 06:14:33 EST
read from bottom to top. the host was Up before upgrade.
 
 2015-11-05 17:08:01.492+01 | Installing Host dell-r210ii-13. Connected to host dell-r210ii-13.rhev.lab.eng.brq.redhat.com with SSH key fingerprint: SHA256:RySgMVzyTYEUFuaA9Az+/jvc/aIiC2ZTraVNLTczCRU.
 2015-11-05 17:07:57.347+01 | Host dell-r210ii-13 upgrade was started (User: admin@internal).
 2015-11-05 17:07:57.236+01 | Host dell-r210ii-13 was switched to Maintenance Mode.
 2015-11-05 17:02:24.714+01 | Host dell-r210ii-13 has available updates: vdsm,vdsm-cli.
 2015-11-05 16:47:44.458+01 | Status of host dell-r210ii-13 was set to Up.
 2015-11-05 16:47:42.609+01 | Host dell-r210ii-13 installed
 2015-11-05 16:47:42.557+01 | Network changes were saved on host dell-r210ii-13
 2015-11-05 16:47:28.91+01  | Installing Host dell-r210ii-13. Stage: Termination.
Comment 28 Jiri Belka 2015-11-06 07:08:58 EST
I just reproduced it again, vdsm from 3.5.6 -> vdsm-4.17.10.1-0.el7ev.noarch
Comment 29 Yaniv Bronhaim 2015-11-06 15:32:43 EST
So the problem is not the multipath thing now - this is new issue in upgrade that we didn't check at all. 

In 3.5.6 rhevh we added specific change to rhevm build (https://gerrit.eng.lab.tlv.redhat.com/#/c/15009/) if you can't access the link it just one liner which configures the sasl user (that vdsm uses to access libvirt) to be vdsm@rhevh (the reasons does not matter for our bug)

Now, in 3.6 we check that vdsm@ovirt user is configured when starting vdsm - because we do not touch the passwd configuration during upgrade (as explained in comment #3 we only run libvirt configurator during upgrade between 3.5 to 3.6), the user that is configured is only vdsm@rhevh which vdsm 4.17.x does not care about..
running manually "vdsm-tool configure" for all modules will run the passwd.py configurator which will set vdsm@ovirt - so indeed running it as pioter suggests will do the work - but won't remove the vdsm@rhevh user (not sure how much its important, but its ugly)

I don't see any risks to run configure for all modules as in https://gerrit.ovirt.org/#/c/48194/1/init/vdsmd_init_common.sh.in.

btw, it works in ovirt 3.5 to ovirt 3.6 - you used rhevm build, that's why Yeela and I failed to reproduce it
Comment 30 Yaniv Bronhaim 2015-11-07 08:42:13 EST
another innocent alternative https://gerrit.ovirt.org/#/c/48222/
Comment 31 Oved Ourfali 2015-11-08 01:40:54 EST
One correction, the change was added in RHEV 3.5 (before 3.5.6), so it will effect any upgrade between RHEV 3.5 and RHEV 3.6.

And, as Yaniv stated, oVirt isn't affected.
Comment 32 Oved Ourfali 2015-11-16 08:47:08 EST
As VDSM wasn't respinned, moving to 3.6.1.
Comment 33 Jiri Belka 2015-11-26 05:56:01 EST
ok, vdsm-4.17.11-0.el7ev.noarch

upgrade from vdsm 3.5.6 to vdsm from 3.6.0-22 build, all works ok
Comment 34 Sandro Bonazzola 2015-12-16 07:22:23 EST
Accordiing to target milestone and verification status this bug should be solved in oVirt 3.6.1. However, missing the target release field, this bug can't be moved to closed current release automatically.
Please check target_milestone and target_release and move to closed current release if included in oVirt <= 3.6.1.
Thanks.
Comment 35 Sandro Bonazzola 2015-12-22 08:28:30 EST
oVirt 3.6.1 has been released and bz verified, moving to closed current release

Note You need to log in before you can comment on or make changes to this bug.