Bug 1528868 - problems upgrading from ovirt 4.1.
Summary: problems upgrading from ovirt 4.1.
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: General
Version: 4.2.0.2
Hardware: x86_64
OS: Linux
Priority: high
Severity: medium
Target Milestone: ovirt-4.2.3
Target Release: 4.2.3.2
Assignee: Arik
QA Contact: Jiri Belka
URL:
Whiteboard:
Duplicates: 1559332
Depends On:
Blocks: 1420115
 
Reported: 2017-12-24 20:11 UTC by jas
Modified: 2018-05-10 06:32 UTC
CC List: 9 users

Fixed In Version: ovirt-engine-4.2.3.2
Doc Type: Bug Fix
Doc Text:
Cause: Graphics devices of existing virtual machines may be unplugged when upgrading to version 4.2. Consequence: It was not possible to connect to such virtual machines using a graphical console. Fix: Ensure that graphics devices are plugged. Result: It is now possible to connect to virtual machines that were created in lower versions of oVirt using a graphical console.
Clone Of:
Environment:
Last Closed: 2018-05-10 06:32:23 UTC
oVirt Team: Virt
Embargoed:
rule-engine: ovirt-4.2+


Attachments
ovirt engine log (1.96 MB, text/plain), 2017-12-24 20:11 UTC, jas
shows console option grayed out (81.80 KB, image/png), 2017-12-28 22:05 UTC, jas
shows console options for vm with grayed out console (104.57 KB, image/png), 2017-12-28 22:05 UTC, jas


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 90175 0 master MERGED core: ensure that graphics and console devices are plugged 2018-04-16 05:24:01 UTC
oVirt gerrit 90288 0 ovirt-engine-4.2 MERGED core: ensure that graphics and console devices are plugged 2018-04-17 08:03:13 UTC
oVirt gerrit 90367 0 master MERGED core: ensure imported console device is plugged 2018-04-17 07:37:57 UTC

Description jas 2017-12-24 20:11:26 UTC
Created attachment 1371956 [details]
ovirt engine log

I had several problems upgrading 4.1.0.8 to 4.2.0.2.

1) The VM type on most VMs changed from Server to Desktop.

2) I'm unable to get a console on CentOS 7 hosts after changing cluster compatibility from 4.1 to 4.2, and rebooting hosts.

On the first host, archive, I can't get a console, but it's up and I can access it on the network.
On the second host, dist, oVirt says that it was up, but not only could I not get a console, I couldn't access the host on the network either.  I was able to "reboot" it and shut it down through oVirt, but not access it.  I reinstalled this host, and found that after the reinstall I could access the console even after several reboots.
On the third host, ftp, I can't access the console after rebooting it, but it's on the network.

I need to access the console on these machines without reinstalling them all.

3) The "host update" function fails.  All the hosts have been updated with yum to the latest, and rebooted, but host updated fails.

This is what I see:

2017-12-23 19:11:36,479-05 INFO [org.ovirt.engine.core.bll.hostdeploy.HostUpgradeCheckCommand] (default task-156) [ae11a704-3b40-45d3-9850-932f6ed91ed9] Running command: HostUpgradeCheckCommand internal: false. Entities affected :  ID: 45f8b331-842e-48e7-9df8-56adddb93836 Type: VDSAction group EDIT_HOST_CONFIGURATION with role type ADMIN
2017-12-23 19:11:36,496-05 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-156) [] EVENT_ID: HOST_AVAILABLE_UPDATES_STARTED(884), Started to check for available updates on host virt1.
2017-12-23 19:11:36,500-05 INFO [org.ovirt.engine.core.bll.hostdeploy.HostUpgradeCheckInternalCommand] (EE-ManagedThreadFactory-commandCoordinator-Thread-7) [ae11a704-3b40-45d3-9850-932f6ed91ed9] Running command: HostUpgradeCheckInternalCommand internal: true. Entities affected : ID: 45f8b331-842e-48e7-9df8-56adddb93836 Type: VDS
2017-12-23 19:11:36,504-05 INFO [org.ovirt.engine.core.common.utils.ansible.AnsibleExecutor] (EE-ManagedThreadFactory-commandCoordinator-Thread-7) [ae11a704-3b40-45d3-9850-932f6ed91ed9] Executing Ansible command: ANSIBLE_STDOUT_CALLBACK=hostupgradeplugin [/usr/bin/ansible-playbook, --check, --private-key=/etc/pki/ovirt-engine/keys/engine_id_rsa, --inventory=/tmp/ansible-inventory1039100972039373314, /usr/share/ovirt-engine/playbooks/ovirt-host-upgrade.yml] [Logfile: null]
2017-12-23 19:11:37,897-05 INFO [org.ovirt.engine.core.common.utils.ansible.AnsibleExecutor] (EE-ManagedThreadFactory-commandCoordinator-Thread-7) [ae11a704-3b40-45d3-9850-932f6ed91ed9] Ansible playbook command has exited with value: 4
2017-12-23 19:11:37,897-05 ERROR [org.ovirt.engine.core.bll.host.HostUpgradeManager] (EE-ManagedThreadFactory-commandCoordinator-Thread-7) [ae11a704-3b40-45d3-9850-932f6ed91ed9] Failed to run check-update of host 'virt1-mgmt'.
2017-12-23 19:11:37,897-05 ERROR [org.ovirt.engine.core.bll.hostdeploy.HostUpdatesChecker] (EE-ManagedThreadFactory-commandCoordinator-Thread-7) [ae11a704-3b40-45d3-9850-932f6ed91ed9] Failed to check if updates are available for host 'virt1' with error message 'Failed to run check-update of host 'virt1-mgmt'.'
2017-12-23 19:11:37,904-05 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-commandCoordinator-Thread-7) [ae11a704-3b40-45d3-9850-932f6ed91ed9] EVENT_ID: HOST_AVAILABLE_UPDATES_FAILED(839), Failed to check for available updates on host virt1 with message 'Failed to run check-update of host 'virt1-mgmt'.'.

I know these are 3 separate issues, but I'm going to post my logs, and it doesn't make sense to post them 3 separate times, so I'm collecting these all together into one bug report.  There are so many logs for oVirt that I have no idea which are right to include.  I'm starting with the engine log on the standalone engine and the vdsm log on virt1 (though I have 4 hosts).

Comment 1 jas 2017-12-24 20:14:41 UTC
See the virt1 vdsm log here (too big to attach): https://www.eecs.yorku.ca/~jas/vdsm.log.virt1

Comment 2 Yaniv Kaul 2017-12-25 07:38:31 UTC
Arik - please take a look.

Comment 3 Arik 2017-12-25 10:30:18 UTC
I can't reproduce #1 and can't figure out from examining the code how the type of VMs may change unintentionally from server to desktop.
Can you please provide the output of:
"select vm_guid, vm_name, vm_type from vm_static;"
and name a VM that changed?

Comment 4 Arik 2017-12-25 13:10:54 UTC
(In reply to jas from comment #0)
> 2) I'm unable to get a console on CentOS 7 hosts after changing cluster
> compatibility from 4.1 to 4.2, and rebooting hosts.
> 
> On the first host, archive, I can't get console, but it's up and I can
> access it on the network.
> On the second host, dist, ovirt says that it was up, but not only I couldn't
> get console, I couldn't access the host either on the network.  I was able
> to "reboot" it, and shut it down through ovirt, but not able to access it. 
> For this host, I reinstalled it, and found that after reinstall, I could
> access console even after several reboots.
> On the third host, ftp, I can't access console after rebooting it, but it's
> on the network.
> 
> I need to access console on these machines without reinstalling them all.

Do you have the cockpit service running on these hosts?

> 
> 3) The "host update" function fails.  All the hosts have been updated with
> yum to the latest, and rebooted, but host updated fails.

I separated this one out (bz 1528974) since it is in the scope of a different team.

Comment 5 Red Hat Bugzilla Rules Engine 2017-12-25 13:11:48 UTC
Target release should be placed once a package build is known to fix an issue. Since this bug is not modified, the target version has been reset. Please use the target milestone to plan a fix for an oVirt release.

Comment 6 jas 2017-12-25 16:38:13 UTC
Hi.
The cockpit service is not running on the engine; "systemctl status cockpit" shows:
# systemctl status cockpit
● cockpit.service - Cockpit Web Service
   Loaded: loaded (/usr/lib/systemd/system/cockpit.service; static; vendor preset: disabled)
   Active: inactive (dead)
     Docs: man:cockpit-ws(8)
On virt1, virt2, virt3, and virt4 the same command reveals: Unit cockpit.service could not be found.
Same on the actual VMs.
Yet "dist" allows a console (after reinstallation), which leads me to believe that the cockpit service is unlikely to be the cause of the console failure?

In order to give you the output of the SQL that you require, I had to figure out how to access the oVirt DB...

sudo sh
# psql
psql (9.2.23, server 9.5.9)
WARNING: psql version 9.2, server version 9.5.
         Some psql features might not work.
Type "help" for help.
Hmm: I know that oVirt 4.2 upgraded PostgreSQL, but only the server, and not the client?
postgres=# \connect engine
You are now connected to database "engine" as user "postgres".
engine=# select vm_guid, vm_name, vm_type from vm_static;
               vm_guid                |  vm_name   | vm_type
--------------------------------------+------------+---------
 00000010-0010-0010-0010-0000000002a0 | Tiny       |       0
 00000012-0012-0012-0012-00000000016b | Small      |       0
 00000016-0016-0016-0016-00000000025e | Large      |       0
 00000018-0018-0018-0018-0000000003eb | XLarge     |       0
 386504b5-9abd-4d4b-b8c3-17fb2c79f199 | newtest    |       1
 00000014-0014-0014-0014-00000000010d | Medium     |       0
 00000000-0000-0000-0000-000000000000 | Blank      |       0
 81eb2dfc-4cf1-4136-9811-29683998e3c0 | dist       |       1
 aa487207-7ff4-465a-9d9b-2a103d50dc77 | ftp        |       1
 07331c78-6e09-41b4-9b29-edda32f46351 | samba      |       1
 1325cebf-834e-4571-a3e1-82beb0021ffe | dns1       |       0
 1734fed5-fc97-4714-bb63-55e8f4eb6d34 | wiki       |       1
 1c7e966f-29e8-4788-b64c-86e52f415061 | udb        |       1
 21409a44-a8cd-409c-91f1-1f654b0f3e5a | vpn3       |       0
 3bb83ad5-dc00-445d-8708-2db18c53e0e6 | www-vhost  |       1
 3e5a5728-8940-4fe7-b4af-11fb8e12f116 | www        |       1
 4c865b27-bce6-493b-8ad7-856599a2dd33 | winlicense |       1
 5885c43a-b133-45dc-b8d5-1aa2a4a19579 | prs        |       0
 59287796-c634-4f3c-94b1-2c9699986637 | vpn1       |       1
 64734fca-4d6c-443c-82d1-abf7cf106601 | webdav     |       1
 77766cb4-0625-4432-a16f-def5e702102a | labtest1   |       1
 899e3a1c-dcc4-4426-9dd4-0f1e8b94a2b8 | ipa        |       1
 911d6883-607c-464e-b46b-8e6141bdd2de | hdb        |       1
 922e2017-0fd6-4ac8-a204-449bfcd3f2c5 | ingtest    |       1
 92f312fd-dd53-42e6-8a7c-bf5c7de48661 | print      |       1
 9ea2821e-557f-45cb-9872-1e6c5aeaf246 | rs         |       0
 a82b4cc3-6ef6-4a8e-97ef-1d4ea1b7e07c | indigo     |       1
 a8d4e149-55ec-4961-a414-d7d1e1b05a59 | fix        |       0
 ae56c31c-71f7-433a-b198-a1c2c846325e | forum      |       1
 ae65a104-ed3f-40e0-ba92-5708cf338918 | vpn2       |       1
 bcc648ed-3c6e-4434-a09e-275d92502c8b | www-las    |       1
 bfe9a524-2238-4bab-a91f-5dc6ed270896 | pcmode     |       1
 c64d80f9-a3bd-44f3-89c4-f1b9b509cdde | red        |       1
 c8682859-9e55-460a-9ef0-d03e4a85f50e | dhcp1      |       0
 cc1dbe93-bb80-451e-8b3b-c9cc5e90432f | license    |       0
 d56363ad-9cb1-4a70-a33a-ce439b044d7d | svn-old    |       1
 dc66c051-24b7-4cb8-99eb-db398d396ef1 | samba1     |       1
 ed2ef552-f570-4670-9ef6-15b2ea71677f | svn        |       1
 f87ec08d-0fdf-42ba-8aac-c66c404d7aae | webapp     |       1
 fffde246-af07-480d-819c-5614c5ddf5ec | win7test   |       0
 8391f9a7-7fe7-4025-8fba-6273ae587a52 | archive    |       1
 bcd5776e-978b-4723-933d-70a80322a796 | cape       |       1
 ddf37076-e795-4c87-a4fb-9efcaae76d72 | dc1        |       1
(43 rows)

dns1, dhcp1, fix, license, prs, rs, and vpn3 were, to my knowledge, Server before the upgrade.  If this is an anomaly that isn't repeatable, it's obviously not a problem for me to change them back.  On the other hand, the console issue is a much bigger one.

Comment 7 Arik 2017-12-25 19:00:14 UTC
(In reply to jas from comment #6)
> Yet "dist" allows console (after reinstallation).
>  which leads me to believe that this cockpit service is unlikely the cause
> of the console failure?

I'm a bit confused - do you have both a VM and a host named 'dist'? Because connecting to a VM console is very different from connecting to a host console.

In general, you don't need the cockpit service to be running on the engine, but it must be running on the host(s) that you'd like to open a console to. You may want to have the service named 'ovirt-cockpit-sso' running on the engine's host to get single sign-on into the cockpit interface, but cockpit should be accessible to you even without it.

So please install cockpit on the hosts virt1,...,virt4 and make sure it is running.
 
> dns1, dhcp1, fix, license, prs, rs, vpn3 were, to my knowledge, server
> before the upgrade.  If this is an anomoly that isn't repeatable, it's
> obviously not a problem for me to change them back.. on the other hand, the
> console issue is a much bigger issue.

Thanks for this input.
I noticed that there were several non-automatic VM updates (i.e., not because of the cluster being upgraded) and I suspected that this may have caused the type of the VMs to change, but this theory does not hold for the VMs you mention here.
Sure, the effect of changing the VM type is insignificant after the VM is created (it may only add a sound card device to those VMs), so you can change it back to 'Server' - but I'd still like to understand if there's a bug hiding here.
Before upgrading the engine to version 4.2, the system took a backup of your database; would it be possible for you to share that backup?

Comment 8 jas 2017-12-25 20:03:15 UTC
It seems that I'm not using the right terminology.
I have hosts virt1, virt2, virt3, and virt4.  You see my list of VMs above: dist, ftp, samba, etc.
I intended to report a problem with my inability to get a console on the VMs.  For example, I had a console on the dist VM after the upgrade, but after changing the cluster compatibility to 4.2 and rebooting, I lost access to the VM console for dist.
However, since you mention the ability to access a console on the hosts (virt1-virt4), I see that I now have that ability in 4.2, but it doesn't work either. To solve the host console problem, you say that I would need to run ovirt-cockpit-sso on the hosts.  I actually think this should be something that setup configures automatically; I generally haven't had to enable any additional services on the hosts by hand.  That being said, on virt1 I just tried doing a "yum install ovirt-cockpit-sso".
I then did a: systemctl enable ovirt-cockpit-sso
I then did a: systemctl start ovirt-cockpit-sso
I got back:
Job for ovirt-cockpit-sso.service failed because the control process exited with error code. See "systemctl status ovirt-cockpit-sso.service" and "journalctl -xe" for details.
Further status info is:

● ovirt-cockpit-sso.service - oVirt-Cockpit SSO service
   Loaded: loaded (/usr/lib/systemd/system/ovirt-cockpit-sso.service; enabled; vendor preset: disabled)
   Active: failed (Result: start-limit) since Mon 2017-12-25 14:54:59 EST; 1min 21s ago
  Process: 12179 ExecStartPre=/usr/share/ovirt-cockpit-sso/prestart.sh (code=exited, status=217/USER)

Dec 25 14:54:58 virt1.eecs.yorku.ca systemd[1]: Failed to start oVirt-Cockpit SSO service.
Dec 25 14:54:58 virt1.eecs.yorku.ca systemd[1]: Unit ovirt-cockpit-sso.service entered failed state.
Dec 25 14:54:58 virt1.eecs.yorku.ca systemd[1]: ovirt-cockpit-sso.service failed.
Dec 25 14:54:59 virt1.eecs.yorku.ca systemd[1]: ovirt-cockpit-sso.service holdoff time over, scheduling restart.
Dec 25 14:54:59 virt1.eecs.yorku.ca systemd[1]: start request repeated too quickly for ovirt-cockpit-sso.service
Dec 25 14:54:59 virt1.eecs.yorku.ca systemd[1]: Failed to start oVirt-Cockpit SSO service.
Dec 25 14:54:59 virt1.eecs.yorku.ca systemd[1]: Unit ovirt-cockpit-sso.service entered failed state.
Dec 25 14:54:59 virt1.eecs.yorku.ca systemd[1]: ovirt-cockpit-sso.service failed.

Engine backup shared.

Comment 9 jas 2017-12-25 20:04:46 UTC
Created attachment 1372224 [details]
engine dump before upgrade

Comment 10 jas 2017-12-25 20:07:14 UTC
If possible, mark the engine backup private so it is not publicly downloadable.  I don't know what other details are in the dump.

Comment 11 jas 2017-12-25 21:40:09 UTC
My apologies.  I re-read what you wrote, and I realized that ovirt-cockpit-sso runs on the engine, and cockpit runs on the hosts.

On virt1 and virt2, I was able to successfully enable and start cockpit.
Now the hosts listen on port 9090.
When I try to connect through the engine, by selecting the host and clicking "Host Console", I now get:
Authentication failed: internal-error: Error validating auth token

One thing to note is that ovirt-mgmt is a private network between the engine and all the hosts, and my local machine from which I'm accessing the engine isn't on that network.  If accessing the host from my machine is a prerequisite, then I won't be able to make this work.

(Though fixing the problem on VM console is more of a priority for me anyway.)
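
(For reference: on an EL7 host, cockpit is normally socket-activated, so enabling it typically comes down to something like the sketch below. The exact unit and firewalld service names are assumed from a standard CentOS 7 install rather than taken from this report.)

# yum install -y cockpit
# systemctl enable --now cockpit.socket    (cockpit-ws is socket-activated and then listens on port 9090)
# firewall-cmd --permanent --add-service=cockpit && firewall-cmd --reload    (only if firewalld is active)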

Comment 12 Arik 2017-12-26 08:56:53 UTC
(In reply to jas from comment #9)
> Created attachment 1372224 [details]
> engine dump before upgrade

Thanks, that is indeed the right dump (taken while the compatibility of the single cluster in the system was 4.1). I checked the mentioned VMs (dns1, dhcp1, fix, license, prs, rs, vpn3) and they all were defined as 'desktop' before the upgrade.

Comment 13 Arik 2017-12-26 09:13:09 UTC
(In reply to jas from comment #11)
> My apologies.  I re-read what you wrote, and I realized that
> ovirt-cockpit-sso runs on the engine, and cockpit runs on the hosts.
> 
> On virt1 and virt2, I was able to successfully enable and start cockpit.
> Now the hosts listen on port 9090.
> When I try to connect through engine,by selecting the host and clicking
> "Host Console",  I now get:
> Authentication failed: internal-error: Error validating auth token

And what's the status of these hosts (virt1 and virt2)?
If they are not up, I would try putting them into maintenance and doing 'Enroll certificate'.
Anyway, if single sign-on is not important for you, you can disable the ovirt-cockpit-sso service on the engine's host and manually log in to cockpit.

> 
> One thing to note is that ovirt-mgmt is a private network between engine,
> and all the hosts, and my local machine from which I'm accessing engine
> isn't on that network.   If accessing the host via my machine is a preqreq,
> then I won't be able to make this work.  

That's ok, we don't rely on having connectivity between clients (webadmin/sdk) and the hosts.

> 
> (Though fixing the problem on VM console is more of a priority for me
> anyway.)

So let's concentrate on that - could you also attach the dump of the current database?

Comment 14 Arik 2017-12-26 10:58:07 UTC
(In reply to Arik from comment #13) 
> > (Though fixing the problem on VM console is more of a priority for me
> > anyway.)
> 
> So let's concentrate on that - could you also attach the dump of the current
> database?

And also if you could please tell:
1. Where did you try to open a console from - VM portal or webadmin?
2. In the edit-VM dialog, what's the console type of VM 'archive'?

Comment 15 jas 2017-12-26 15:05:34 UTC
Actually, we can probably deal with both issues simultaneously...

In terms of the host console: virt1 and virt2 are up (as are virt3 and virt4).  virt1 presently has 5 VMs, virt2 has 11.  I only log in through the administrative web portal to the engine.  If I stop ovirt-cockpit-sso on the engine, then when I click on the virt1 host console, it goes to the URL https://virt1-mgmt:9090.  However, my home machine isn't on that network, of course, so my web browser cannot connect.  On the other hand, virt1.eecs.yorku.ca is on the external network, which is marked as the display network.  That's the network it should be using - that's the network that "console" uses.  In my web browser, if I go to https://virt1.eecs.yorku.ca:9090, then I actually DO get the cockpit service running on virt1.  I can log in, and I can access the console.  It's very nice.  It looks like the host console is potentially not using the network marked as the display network, so ovirt-cockpit-sso fails.

In terms of the VM console: I opened the console from the web admin portal (logging into https://engine.eecs.yorku.ca), clicking on the VM and choosing Console.  archive is set to "Server".  I believe it was initially set to Desktop after the upgrade, but I changed it to Server before rebooting it.

"fix" VM is set to desktop.  Without changing it to server, I connect to console (it works because it hasn't been rebooted), then I choose reboot.. the console closes, and the console option never ungrays out after that even after "fix" VM reboots.  "fix" is on host virt4.

"samba1" VM is server type on host "virt3".  I open up console on "samba1",  I then choose "reboot".  This time, I am surprisingly ABLE to connect to console on "samba1" after the reboot.

I then try on server "labtest1".  "labtest1" is running on virt4.  This time, I can't get to the console just as before.  Just in case it's an issue with virt4 (I don't think it is), I migrate labtest1 to virt3, then reboot it.  Again, I get no more console.

The current database is attached.

Comment 16 jas 2017-12-26 15:07:11 UTC
Created attachment 1372469 [details]
current engine backup

Comment 17 Michal Skrivanek 2017-12-27 12:50:34 UTC
(In reply to jas from comment #15)
> Actually, we can probably deal with both issues simultaneously...
> 
> In terms of host console: virt1, and virt2 are up (as are virt3 and virt4). 
> virt1 presently has 5 VMs, virt2 11. I only login through the administrative
> web portal to engine.  If I stop ovirt-cockpit-sso on engine, then when I
> click on virt1 host console, it goes to the URL https://virt1-mgmt:9090.

it opens whatever you configured as a hostname for that host

> However, my home machine isn't on that network,of course, so my web browser
> cannot connect.  On the other hand, virt1.eecs.yorku.ca is the external
> network, and is marked as the display network.  That's the network it should
> be using.  That's the network that "console" uses. In my web browser, if I
> go to https://virt1.eecs.yorku.ca:9090, then I actually DO get the cockpit
> service running on virt1.  I can login, and I can access the console.  It's
> very nice. It looks like host console is potentially not using the network
> marked as display, so ovirt-cockpit-sso fails.

It does not use the display network. The display network is only for graphical VM consoles (SPICE/VNC), not for the host management console, which is just a web service.

Comment 18 jas 2017-12-27 13:22:26 UTC
But isn't the hostname the name used by the engine for monitoring the host?
Doesn't it make sense for that to happen over a private network that the typical client accessing the engine would not have access to?
Sure, I could change the hostname from virt1-mgmt to virt1.eecs.yorku.ca and then cockpit would work, but then the engine wouldn't be using the virt1-mgmt private network for monitoring the host - a network specifically for that purpose!
It sounds to me like each host needs an extra configurable option for specifying the cockpit address.

Comment 19 jas 2017-12-27 20:31:35 UTC
Arik - just to let you know, I rebooted all the VMs.  Some have the console option, but most do not.  The following do:

cape (a windows host) on virt4.
dc1 on virt4.
dist on virt4. (I had reinstalled this one)
pcmode on virt2.
print on virt3.
samba1 on virt3.
svn on virt4.
udb on virt4.
vpn1 on virt1.
vpn2 on virt1.
winlicense on virt3.
www-las on virt2.

All the other VMs do not have the console option: archive, dhcp1, dns1, fix, forum, ftp, hdb, indigo, ipa, labtest1, license, prs, red, rs, samba, webapp, wiki, www, www-vhost

Comment 20 jas 2017-12-28 15:54:36 UTC
I can return the console option to a VM by deleting the VM, keeping the disk, recreating the VM with the exact same options and reattaching the old disk.  I have now done this successfully on archive, dhcp1, and dns1.

Comment 21 Arik 2017-12-28 16:18:59 UTC
(In reply to jas from comment #20)
> I can return the console option to a VM by deleting the VM, keeping the
> disk, recreating the VM with the exact same options and reattaching the old
> disk.  I have now done this successfully on archive, dhcp1, and dns1.

Sure, recreating the VMs bypasses any issue you might have with your previous VM settings.
I would still prefer to understand whether there is a real issue here, but the investigation is complicated. I looked at the VM 'archive' that you wrote you have no console to, and I see in the attached engine log:

2017-12-23 15:34:26,234-05 INFO  [org.ovirt.engine.core.bll.SetVmTicketCommand] (default task-316) [70286839] Running command: SetVmTicketCommand internal: false. Entities affected :  ID: 8391f9a7-7fe7-4025-8fba-6273ae587a52 Type: VMAction group CONNECT_TO_VM with role type USER
2017-12-23 15:34:26,245-05 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.SetVmTicketVDSCommand] (default task-316) [70286839] START, SetVmTicketVDSCommand(HostName = virt1, SetVmTicketVDSCommandParameters:{hostId='45f8b331-842e-48e7-9df8-56adddb93836', vmId='8391f9a7-7fe7-4025-8fba-6273ae587a52', protocol='SPICE', ticket='nd+Itv2M0jtF', validTime='120', userName='jas', userId='4a225b32-bc9d-42b7-90f8-f7848db0e59f', disconnectAction='LOCK_SCREEN'}), log id: 65d30db8
2017-12-23 15:34:26,273-05 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.SetVmTicketVDSCommand] (default task-316) [70286839] FINISH, SetVmTicketVDSCommand, log id: 65d30db8
2017-12-23 15:34:26,310-05 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-316) [70286839] EVENT_ID: VM_SET_TICKET(164), User jas.ca-authz initiated console session for VM archive

So oVirt succeeded in making the preparations needed for the viewer application you're using to connect to the VM, at least in this particular case. I would need more details from you in order to understand what goes wrong there.

Comment 22 jas 2017-12-28 16:35:36 UTC
The thing is, I wrote down the exact settings from before and after.  I make very few changes from the defaults.  Everything in the newly created VM is identical to the old VM.  Therefore, there has to be something different in the DB?  Do you, for example, see any difference in archive before and after?  The log entry that you've provided would have to be from before the upgrade; after the upgrade, the console option is entirely grayed out, and hence I couldn't have selected it.

Comment 23 Arik 2017-12-28 21:32:03 UTC
(In reply to jas from comment #22)
> The thing is - I wrote down the exact settings from before and after.  I
> make very few changes from the defaults.   Anything in the newly created VM
> is identical to the old VM.  Therefore, there has to be something different
> in the DB?  Do you, for example, see any difference in archive before and
> after?    The log entry that you've provided would have had to have been
> before the upgrade.  After the upgrade, the console option is entirely
> grayed out, and hence I couldn't have selected it.

Is the console option grayed out in the webadmin?
Can you please edit one of the VMs that the console option is grayed out for and upload a screenshot of the Console tab?

Comment 24 jas 2017-12-28 22:05:01 UTC
Created attachment 1373465 [details]
shows console option grayed out

Comment 25 jas 2017-12-28 22:05:46 UTC
Created attachment 1373466 [details]
shows console options for vm with grayed out console

Comment 26 jas 2017-12-28 22:10:20 UTC
Screenshots have been attached....
It's cut off, but "Use Guest Agent", "Enable SPICE file transfer" and "Enable SPICE clipboard copy and paste" are selected. 
A direct comparison between this host and another host where the console is working shows that they are identical.

Comment 27 jas 2017-12-29 19:21:06 UTC
I have now gone through the time-consuming process of deleting all the VMs but one and re-attaching the original disks, because it seems this problem may take time to solve and I need console access to these machines in the event of problems.  This is my production oVirt setup.  I will leave that one VM for now to help debug the problem if you need further feedback from me.

I am also interested in further feedback on the host console issue.  An attempt to access the host console made my web browser try to access the host on the private network, https://virt1-mgmt.(domain), instead of the public network, https://virt1.(domain).  Michal says that this just uses the "hostname".  If I change the hostname of my virt hosts from the private address (virt1-priv) to the public address (virt1), am I also moving all communication between the engine and the hosts from the private network to the public network?  I'd like to understand whether this is a problem in oVirt that needs to be dealt with, or a simple change for me to make which will not affect how the engine communicates with the hosts.

Comment 28 Yaniv Kaul 2017-12-31 11:46:04 UTC
(In reply to jas from comment #27)
> I have now gone through the time consuming process of deleted all the VMs,
> but 1, and re-attaching the original disks because it seems this problem may
> take time to solve, and I need access to host console on these hosts in the
> event of problems.  This is my production ovirt setup.  I will leave the 1
> for now to help debug the problem if you need further feedback from me. 
> 
> I am also interested in further feedback on the host console issue.  An
> attempt to access host console resulted in making my web browser attempt to
> access my host on the private network, https://virt1-mgmt.(domain), instead
> of the public network, https://virt1.(domain). Michal says that this is just
> using "hostname".  If I change hostname on my virt hosts from private
> address (virt1-priv) to public address (virt1), am I also moving all
> communication between engine and the hosts from private network to public
> network?  I'd like to understand if this is a problem in ovirt that needs to
> be dealt with, or a simple change for me to make which will not affect how
> engine communicates with the hosts.

How did you define the hosts in Engine? It'll use whatever you have defined for the host - is it the public name or the private name?
It's definitely an interesting, non-standard setup we should look at.

Comment 29 jas 2017-12-31 19:22:39 UTC
Hi Yaniv.

I have 4 hosts named virt1, virt2, virt3, virt4.
Each host has a network interface on a private 10G network: virt1-mgmt, virt2-mgmt, virt3-mgmt, virt4-mgmt
They also have a separate NIC for public network (bonded 2 x 1G interfaces): virt1, virt2, virt3, virt4
(There's also a few other networks, but they don't interfere with this.)

When I added the hosts to engine, I used their private network address.  I thought this made sense because engine would be contacting each host, setting up SSH key, etc.   That being said, each host has a network that is defined for management and migration (the -mgmt address), and I suspect that engine is using this for monitoring the hosts anyway.  In that case, I could switch the primary hostname to the public address, then cockpit would just work.  Let me know what you think.

Comment 30 Michal Skrivanek 2018-01-11 09:00:28 UTC
It would require a feature similar to Display Address Override, on a per-host basis.
Perhaps the same feature can actually be used... I guess if one bothers with configuring an IP for the external address, it's similarly useful for management console access. At least I do not really see any drawback.

Comment 31 jas 2018-01-11 11:06:03 UTC
I opened an RFE for it ... That's exactly what I was thinking (similar to Display Address Override)..
https://bugzilla.redhat.com/show_bug.cgi?id=1532686

Comment 32 Marek Libra 2018-01-11 13:10:16 UTC
The possibility to override the address might make sense.

jas, the address used to connect to the host console is the same as listed here: `/ovirt-engine/api/hosts/[HOST_ID]`.
It is expected that the address can be reached from:
- the engine, if ovirt-cockpit-sso is used
- the browser, if SSO is not involved
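
(For reference, a quick way to check which address the engine has on record for a host is to query that API resource directly; the sketch below assumes admin@internal credentials and placeholder engine/host values, which are not taken from this report.)

# curl -ks -u 'admin@internal:PASSWORD' -H 'Accept: application/xml' 'https://ENGINE_FQDN/ovirt-engine/api/hosts/HOST_ID' | grep -o '<address>[^<]*</address>'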

Comment 33 jas 2018-01-11 15:29:09 UTC
If running ovirt-cockpit-sso means that I can still access my host consoles from my web browser, then I'm **more** than happy to do that!

It was enabled, but I was getting this error:

Jan 05 19:55:02 engine.cs.yorku.ca systemd[1]: Starting oVirt-Cockpit SSO service...
Jan 05 19:55:03 engine.cs.yorku.ca prestart.sh[1141]: /bin/ln: failed to create symbolic link ‘/usr/share/ovirt-cockpit-sso/config/cockpit/ws-certs.d/ws-certs.d’: File exists
Jan 05 19:55:03 engine.cs.yorku.ca systemd[1]: Started oVirt-Cockpit SSO service.
Jan 05 19:55:06 engine.cs.yorku.ca start.sh[1283]: Installed cockpit version: 155
Jan 05 19:55:06 engine.cs.yorku.ca start.sh[1283]: Cockpit version check passed
Jan 05 19:55:06 engine.cs.yorku.ca cockpit-ws[1518]: Using certificate: /usr/share/ovirt-cockpit-sso/config/cockpit/ws-certs.d/0-self-signed.cert
Jan 09 08:38:53 engine.cs.yorku.ca cockpit-ws[1518]: couldn't read from connection: Error reading data from TLS socket: A TLS fatal alert has been received.
Jan 09 08:39:03 engine.cs.yorku.ca cockpit-ws[1518]: couldn't read from connection: Error reading data from TLS socket: A TLS fatal alert has been received.
Jan 09 08:39:04 engine.cs.yorku.ca cockpit-ws[1518]: couldn't read from connection: Error reading data from TLS socket: A TLS fatal alert has been received.

Anyway, I stopped it, rebooted, reinstalled ovirt-cockpit-sso, and now it actually works!  That's awesome! Thank you!

One minor issue is that the first time I start up a console on each host, I get the "Connect to oVirt Engine" box where I have to enter the fully qualified name for engine.  It should know that already!

One issue which seems like a bug: if I open a VM console from within the host console (I select a VM, then "Console", then click on "Launch Remote Viewer"), I get the typical Firefox box asking me if I want to launch with "Remote Viewer (default)", and when I click OK, I get an error: Unable to connect to graphic server <file>
[It would be great if the console could open up directly in the web session without requiring remote viewer.]



is if I try to access a VM console from within the host console, it called remote viewer, but then there's this error:


  I'm not really sure why...

(It would be great if I could use similar functionality for VM consoles rather than starting up a remote viewer!)

Comment 34 jas 2018-01-11 15:31:22 UTC
(sorry.. the last 9 lines were supposed to be deleted, but they were off my screen.. ignore those.)

Comment 35 Marek Libra 2018-02-07 12:51:58 UTC
Regarding the need to enter the engine's FQDN in cockpit: right, this would be a nice feature. Unfortunately, there is currently no way to determine the engine's URL on the host, so to implement this feature, host deployment would need to store it there.

Comment 36 Marek Libra 2018-02-07 12:56:25 UTC
Regarding the graphical console: currently only VNC is supported for in-browser rendering. The SPICE console is left for an external application, like remote-viewer. Anyway, the graphical console should work even from Cockpit (meaning the Host Console). I suggest opening a separate bug for that containing a description of your setup, including info about the proxy if one is used.

Comment 37 Yaniv Kaul 2018-02-21 11:48:32 UTC
Arik, can you clarify what we need to do here?

Comment 38 Michal Skrivanek 2018-02-26 10:24:43 UTC
I believe the only remaining issue is the grayed-out console button. We need a reproducer there; so far we have no clues.

Comment 39 Nicolas Ecarnot 2018-02-28 16:16:56 UTC
Hello,

I just exported a first VM from a 4.1 oVirt into an oVirt 4.2.
Everything worked well.

I did the exact same thing with a second VM from the same source to the same target, and I hit the grayed-out console button issue.

In both cases, the source was set to 'Cirrus', and after the export they came out as 'VGA'.

I tried the headless off-then-on workaround and it worked: the button isn't grayed out anymore.

Comment 40 Yaniv Kaul 2018-03-15 13:48:39 UTC
Arik, is this on track to 4.2.2? Otherwise, please defer.

Comment 41 Michal Skrivanek 2018-03-19 11:06:33 UTC
(In reply to Nicolas Ecarnot from comment #39)
> Hello,
> 
> I just exported a first VM from a 4.1 oVirt into an oVirt 4.2.

what were the source and target cluster versions in each product version?

> Everything worked well.
> 
> I did the exact same thing with a second VM from the same source to the same
> target and I hit the grayed out console button issue.

Hm, we really need to understand what was different for the second VM. Do you have logs by any chance? Or DB backups?
 
> In both case, the source was setup to 'Cirrus', and after the export, they
> came out as 'VGA'.

So you did export and import, not a regular upgrade? Or are they 2 environments?

Thanks,
michal

Comment 42 Nicolas Ecarnot 2018-04-09 12:28:42 UTC
I should have answered all these when my memory was fresh enough and logs were still there...

(In reply to Michal Skrivanek from comment #41)
> (In reply to Nicolas Ecarnot from comment #39)
> > Hello,
> > 
> > I just exported a first VM from a 4.1 oVirt into an oVirt 4.2.
> 
> what were the source and target cluster versions in each product version?

I guess 4.1 and 4.2

> 
> > Everything worked well.
> > 
> > I did the exact same thing with a second VM from the same source to the same
> > target and I hit the grayed out console button issue.
> 
> hm, we really need to understand what was different for the second VM. Do
> you have logs by any chance? or db backups.

I do have DB backups since the 19th of Feb 2018 on this new engine, but I don't have the old 4.1 one.
I must admit I tweaked the setup of this first VM so much that I can't rule out having made additional changes.
What looks more interesting to me is that for the 10+ VMs after this one, which I exported from 4.1 to 4.2, I had to follow the workaround, and it worked that way.
So for me, this bug was reproducible.

>  
> > In both case, the source was setup to 'Cirrus', and after the export, they
> > came out as 'VGA'.
> 
> So you did export and import, not a regular upgrade? Or are they 2
> environments?

They were two completely distinct environments, only thinly linked via a common NFS export domain, obviously attached to only one DC at a time, used as an export-import gateway.

> 
> Thanks,
> michal

Comment 43 Michal Skrivanek 2018-04-10 07:37:15 UTC
So it was an export from 4.1 and an import to 4.2. That is a very different flow than an upgrade. Arik, can you think of any recent changes there? Do we import it as a 4.1 or a 4.2 VM?

Comment 44 Arik 2018-04-11 13:41:30 UTC
I think I got it.

All the VMs mentioned in comment 19 appear in the database with unplugged graphics devices. This is a problem of neither cluster upgrade nor VM import, as the database looked exactly the same before the upgrade occurred.

So my theory is as follows:
All the VMs that were affected by this issue were created in a version earlier than oVirt 3.6. Since those versions didn't include [1], the graphics devices could get unplugged. Up until oVirt 4.2, that non-ideal configuration didn't affect users because the graphics devices were always sent on run-VM. However, in oVirt 4.2 only plugged devices are sent, so in this scenario the VMs were started without graphics devices and therefore the console option was disabled.

I will post an upgrade script that changes all existing graphics devices to be plugged.

Nicolas, can you please confirm that the VM that the console option was enabled for was created in oVirt >= 3.6 and the one that the console option was disabled for was created in an earlier version of oVirt?

[1] https://gerrit.ovirt.org/#/c/51707/
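
(For anyone who wants to check whether their environment is affected before the fix lands: a minimal query sketch against the engine database, using only the tables and columns already shown in this bug (vm_device.vm_id/type/is_plugged and vm_static.vm_guid/vm_name); run it from a psql session like the one in comment 6.)

select vs.vm_name
  from vm_device vd
  join vm_static vs on vs.vm_guid = vd.vm_id
 where vd.type = 'graphics'
   and vd.is_plugged = FALSE;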

Comment 45 Nicolas Ecarnot 2018-04-11 15:18:33 UTC
(In reply to Arik from comment #44)
> I think I got it.
> 
> All the mentioned VMs in comment 19 appear in the database with unplugged
> graphics devices. This is a problem of neither cluster-upgrade nor
> vm-import, as it is exactly the same in the database before the upgrade
> occurred.
> 
> So my theory is as follow:
> All the VMs that were affected by this issue were created in earlier version
> than oVirt  3.6. Since that version didn't include [1], the graphics devices
> could get unplugged. Up until oVirt 4.2, that non-ideal configuration didn't
> affect users because the graphics devices were always sent on run-VM.
> However, in oVirt 4.2, only plugged devices are sent and so in this scenario
> the VMs were started without graphics devices and therefore the console
> option was disabled.
> 
> I will post an upgrade script that changes all existing graphics devices to
> be plugged.
> 
> Nicolas, can you please confirm that the VM that the console option was
> enabled for was created in oVirt >= 3.6 and the one that the console option
> was disabled for was created in an earlier version of oVirt?
> 
> [1] https://gerrit.ovirt.org/#/c/51707/

Arik,

Sorry, but most of the VMs in this DC were created on 4.1.
Furthermore, I saw no difference between the 3.6+ and 4.1 VMs.
I had to apply the workaround the same way on every VM.

Comment 46 Arik 2018-04-11 15:31:38 UTC
(In reply to Nicolas Ecarnot from comment #45)
> Arik,
> 
> Sorry but most of the VMs in this DC were created on 4.1 .
> Furthermore, I saw no diff between 3.6+ and 4.1 VMs.
> I had to apply the workaround the same way on every VM.

Thanks. Let me rephrase the question: the VM that you exported from 4.1 and that, when imported into 4.2, ended up with its console option grayed out - can you tell in what version of oVirt it was created?

Comment 47 Nicolas Ecarnot 2018-04-11 16:05:16 UTC
(In reply to Arik from comment #46)
> (In reply to Nicolas Ecarnot from comment #45)
> > Arik,
> > 
> > Sorry but most of the VMs in this DC were created on 4.1 .
> > Furthermore, I saw no diff between 3.6+ and 4.1 VMs.
> > I had to apply the workaround the same way on every VM.
> 
> Thanks. Let me rephrase the question - that VM that you exported in 4.1 and
> when importing to 4.2 ended up with its console option being grayed-out, can
> you tell in what version of oVirt it was created?

Arik,

I can't be 100% sure, but AFAIR, maybe 3 of them were created with oVirt 3.6.x, and the 10 other ones with oVirt 4.1.

Both sets hit the same issue.

It doesn't seem to be a criterion.

Comment 48 Arik 2018-04-11 16:23:36 UTC
(In reply to Nicolas Ecarnot from comment #47) 
> Arik,
> 
> I can't be 100% sure, but AFAIR, maybe 3 of them were created with oVirt
> 3.6.x, and the 10 other ones with oVirt 4.1.
> 
> Both sets hit the same issue.
> 
> It doesn't seem to be a criteria.

Even if not (although I still think it is most probably the right criterion), the posted patch would fix it.

Thanks for the input!

Comment 49 Arik 2018-04-11 16:58:53 UTC
(In reply to Arik from comment #48)
> Even if not, although I still think it is most probably the right criteria,
> the posted patch would fix it.

Two notes:
1. The posted patch needs to be enhanced a bit (to address new imports)
2. It would fix the original problem that this bug was filed for. The problem described in comment 39 may be different - there is not enough information to figure out what happens in that case. It may be the same issue though.

Comment 50 jas 2018-04-11 17:19:53 UTC
Hi Arik,
I suspect that your description of the problem is relevant to my situation.  How would you like me to test? I have one VM left with the grayed console.

Comment 51 Arik 2018-04-11 17:32:58 UTC
(In reply to jas from comment #50)
> Hi Arik,
> I suspect that your description of the problem is relevant to my situation. 
> How would you like me to test? I have one VM left with the grayed console.

Hi,
It would be great if you could execute the following statement on your database and see whether it solves the issue for that VM (the VM needs to be restarted afterward):

update vm_device set is_plugged=TRUE where type='graphics' and vm_id in (select vm_guid from vm_static where vm_name='<vm_name>');

where <vm_name> is properly set.
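
(If many VMs are affected, the same change can be applied in one statement; this is just a generalization of the statement above, not the actual upgrade script from the linked patches, and each affected VM still needs to be restarted afterwards.)

update vm_device set is_plugged = TRUE where type = 'graphics' and is_plugged = FALSE;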

Comment 52 jas 2018-04-12 12:22:52 UTC
Yup. That fixed it.

Comment 53 Arik 2018-04-12 12:28:44 UTC
(In reply to jas from comment #52)
> Yup. That fixed it.

Thanks!

Comment 54 Michal Skrivanek 2018-04-17 07:42:03 UTC
*** Bug 1559332 has been marked as a duplicate of this bug. ***

Comment 55 Jiri Belka 2018-04-20 12:42:19 UTC
ok, ovirt-engine-webadmin-portal-4.2.3.2-0.1.el7.noarch

Tested with is_plugged=FALSE set in the DB in a 4.1 env and then upgraded; it's the dbscripts/upgrade SQL scripts which changed the DB.
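
(For reference, a sketch of how that verification can be reproduced, assuming a disposable 4.1 environment and a test VM name; it only uses the tables and columns already shown above.)

-- on the 4.1 engine database, before the upgrade:
update vm_device set is_plugged = FALSE
 where type = 'graphics'
   and vm_id in (select vm_guid from vm_static where vm_name = 'test_vm');

-- after upgrading the engine to 4.2.3.2, the dbscripts/upgrade step should have set it back to TRUE:
select is_plugged from vm_device
 where type = 'graphics'
   and vm_id in (select vm_guid from vm_static where vm_name = 'test_vm');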

Comment 56 Sandro Bonazzola 2018-05-10 06:32:23 UTC
This bugzilla is included in oVirt 4.2.3 release, published on May 4th 2018.

Since the problem described in this bug report should be
resolved in oVirt 4.2.3 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

