Bug 1403956

Summary: After logging in to the engine, getting the " Operation Canceled Error while executing action: A Request to the Server failed: Unable to evaluate payload" and can't do anything.
Product: [oVirt] ovirt-engine Reporter: Nikolai Sednev <nsednev>
Component: Frontend.WebAdminAssignee: Alexander Wels <awels>
Status: CLOSED WORKSFORME QA Contact: Pavel Stehlik <pstehlik>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 4.1.0CC: bugs, mavital, nsednev, oourfali, sbonazzo, stirabos
Target Milestone: ovirt-4.1.0-betaKeywords: Regression, Reopened
Target Release: ---Flags: oourfali: ovirt-4.1?
nsednev: planning_ack?
nsednev: devel_ack?
nsednev: testing_ack?
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-12-19 14:39:01 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: UX RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1264085, 1313916, 1361511    
Attachments:
Description Flags
Screenshot from 2016-12-12 19-22-22.png
none
Screenshot from 2016-12-12 19-24-00.png
none
engine.log
none
sosreport from alma04
none
server.log
none
ui.log none

Description Nikolai Sednev 2016-12-12 17:21:30 UTC
Description of problem:
After logging in to the engine, getting the "	 	 Operation Canceled Error while executing action: A Request to the Server failed: Unable to evaluate payload" and can't do anything.

Version-Release number of selected component (if applicable):
these components on host:
ovirt-engine-appliance-4.1-20161202.1.el7.centos.noarch
ovirt-imageio-common-0.5.0-0.201611201242.gitb02532b.el7.centos.noarch
ovirt-setup-lib-1.1.0-0.0.master.20161107100014.gitb73abeb.el7.centos.noarch
ovirt-hosted-engine-setup-2.1.0-0.0.master.20161130101611.gitb3ad261.el7.centos.noarch
ovirt-engine-sdk-python-3.6.9.1-1.el7ev.noarch
qemu-kvm-rhev-2.6.0-28.el7_3.2.x86_64
ovirt-host-deploy-1.6.0-0.0.master.20161107121647.gitfd7ddcd.el7.centos.noarch
rhev-release-4.0.6-6-001.noarch
sanlock-3.4.0-1.el7.x86_64
mom-0.5.8-1.el7ev.noarch
ovirt-imageio-daemon-0.5.0-0.201611201242.gitb02532b.el7.centos.noarch
vdsm-4.18.999-1020.git1ff41b1.el7.centos.x86_64
ovirt-vmconsole-host-1.0.4-1.el7ev.noarch
ovirt-release41-pre-4.1.0-0.0.beta.20161201085255.git731841c.el7.centos.noarch
ovirt-hosted-engine-ha-2.1.0-0.0.master.20161130135331.20161130135328.git3541725.el7.centos.noarch
libvirt-client-2.0.0-10.el7_3.2.x86_64
ovirt-vmconsole-1.0.4-1.el7ev.noarch
Linux version 3.10.0-514.2.2.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Wed Nov 16 13:15:13 EST 2016
Linux 3.10.0-514.2.2.el7.x86_64 #1 SMP Wed Nov 16 13:15:13 EST 2016 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.3 (Maipo)

Used ovirt-engine-appliance-4.1-20161202.1.el7.centos.noarch for the engine.

How reproducible:
100%

Steps to Reproduce:
1.Deploy HE using ovirt-engine-appliance-4.1-20161202.1.el7.centos.noarch over NFS.
2.Once deployment finished, try logging in to your engine.
3.

Actual results:
Operation Canceled
Error while executing action: A Request to the Server failed: Unable to evaluate payload

Expected results:
Should get normal UI.

Additional info:

Comment 1 Nikolai Sednev 2016-12-12 17:23:42 UTC
Created attachment 1230857 [details]
Screenshot from 2016-12-12 19-22-22.png

Comment 2 Nikolai Sednev 2016-12-12 17:24:27 UTC
Created attachment 1230858 [details]
Screenshot from 2016-12-12 19-24-00.png

Comment 3 Nikolai Sednev 2016-12-12 17:26:37 UTC
Created attachment 1230859 [details]
engine.log

Comment 4 Nikolai Sednev 2016-12-12 17:31:30 UTC
Created attachment 1230873 [details]
sosreport from alma04

Comment 5 Oved Ourfali 2016-12-13 13:02:06 UTC
Please also attach the server.log and ui.log.
Also, what jboss version you're working with?

Comment 6 Nikolai Sednev 2016-12-13 15:46:06 UTC
Created attachment 1231255 [details]
server.log

Comment 7 Nikolai Sednev 2016-12-13 15:46:27 UTC
Created attachment 1231256 [details]
ui.log

Comment 8 Nikolai Sednev 2016-12-13 15:49:53 UTC
(In reply to Oved Ourfali from comment #5)
> Please also attach the server.log and ui.log.
> Also, what jboss version you're working with?

From /var/log/ovirt-engine/boot.log:

19:10:29,183 INFO  [org.jboss.modules] JBoss Modules version 1.5.2.Final
19:10:29,502 INFO  [org.jboss.msc] JBoss MSC version 1.2.6.Final
19:10:29,626 INFO  [org.jboss.as] WFLYSRV0049: WildFly Full 10.1.0.Final (WildFly Core 2.2.0.Final) starting
19:10:31,902 INFO  [org.jboss.as.server.deployment.scanner] WFLYDS0004: Found restapi.war in deployment directory. To trigger deployment create a file called restapi.war.dodeploy
19:10:31,903 INFO  [org.jboss.as.server.deployment.scanner] WFLYDS0004: Found engine.ear in deployment directory. To trigger deployment create a file called engine.ear.dodeploy
19:10:31,903 INFO  [org.jboss.as.server.deployment.scanner] WFLYDS0004: Found apidoc.war in deployment directory. To trigger deployment create a file called apidoc.war.dodeploy
19:10:31,934 INFO  [org.jboss.as.server] WFLYSRV0039: Creating http management service using socket-binding (management)
19:10:31,962 INFO  [org.xnio] XNIO version 3.4.0.Final
19:10:31,973 INFO  [org.xnio.nio] XNIO NIO Implementation Version 3.4.0.Final

Comment 9 Alexander Wels 2016-12-13 16:31:57 UTC
Looks like there is an issue with a database connection?

2016-12-12 19:07:50,310+02 ERROR [org.quartz.impl.jdbcjobstore.JobStoreCMT] (QuartzScheduler_QuartzOvirtDBScheduler-NON_CLUSTERED_MisfireHandler) MisfireHandler: Error handling misfires: Failed to obtain DB connection from data source 'NMEngineDS': java.sql.SQLException: Could not retrieve datasource via JNDI url 'java:/ENGINEDataSourceNoJTA' java.sql.SQLException: javax.resource.ResourceException: IJ000470: You are trying to use a connection factory that has been shut down: java:/ENGINEDataSourceNoJTA: org.quartz.JobPersistenceException: Failed to obtain DB connection from data source 'NMEngineDS': java.sql.SQLException: Could not retrieve datasource via JNDI url 'java:/ENGINEDataSourceNoJTA' java.sql.SQLException: javax.resource.ResourceException: IJ000470: You are trying to use a connection factory that has been shut down: java:/ENGINEDataSourceNoJTA [See nested exception: java.sql.SQLException: Could not retrieve datasource via JNDI url 'java:/ENGINEDataSourceNoJTA' java.sql.SQLException: javax.resource.ResourceException: IJ000470: You are trying to use a connection factory that has been shut down: java:/ENGINEDataSourceNoJTA]
	at org.quartz.impl.jdbcjobstore.JobStoreCMT.getNonManagedTXConnection(JobStoreCMT.java:168) [quartz.jar:]
	at org.quartz.impl.jdbcjobstore.JobStoreSupport.doRecoverMisfires(JobStoreSupport.java:3169) [quartz.jar:]
	at org.quartz.impl.jdbcjobstore.JobStoreSupport$MisfireHandler.manage(JobStoreSupport.java:3946) [quartz.jar:]
	at org.quartz.impl.jdbcjobstore.JobStoreSupport$MisfireHandler.run(JobStoreSupport.java:3967) [quartz.jar:]
Caused by: java.sql.SQLException: Could not retrieve datasource via JNDI url 'java:/ENGINEDataSourceNoJTA' java.sql.SQLException: javax.resource.ResourceException: IJ000470: You are trying to use a connection factory that has been shut down: java:/ENGINEDataSourceNoJTA
	at org.quartz.utils.JNDIConnectionProvider.getConnection(JNDIConnectionProvider.java:163) [quartz.jar:]
	at org.quartz.utils.DBConnectionManager.getConnection(DBConnectionManager.java:108) [quartz.jar:]
	at org.quartz.impl.jdbcjobstore.JobStoreCMT.getNonManagedTXConnection(JobStoreCMT.java:165) [quartz.jar:]
	... 3 more

Comment 10 Alexander Wels 2016-12-13 16:32:54 UTC
Are you sure you configured everything correctly? Looking at the database connection issue, looks like a huge problem and everything else comes from that.

Comment 11 Nikolai Sednev 2016-12-14 07:46:09 UTC
(In reply to Alexander Wels from comment #10)
> Are you sure you configured everything correctly? Looking at the database
> connection issue, looks like a huge problem and everything else comes from
> that.

Simple deployment on fresh environment.
I've used repos from Mojo for host and http://www.ovirt.org/release/4.1.0/ for upstream engine. This flow of deployment worked correctly before. BTW, I'm getting in to the user portal without any issue, the only affected part here is a webadmin portal.

Comment 12 Oved Ourfali 2016-12-14 07:55:11 UTC
Can you try to restart the DB and the engine?

Comment 13 Oved Ourfali 2016-12-14 07:55:47 UTC
Sandro - can you also take a look at the HE deployment?

Comment 14 Nikolai Sednev 2016-12-14 09:01:51 UTC
(In reply to Oved Ourfali from comment #12)
> Can you try to restart the DB and the engine?

I've tried to restart the whole environment several times already without any effect.

Comment 15 Oved Ourfali 2016-12-14 09:04:14 UTC
Restoring needinfo on Sandro.

Comment 16 Simone Tiraboschi 2016-12-14 10:01:13 UTC
From hosted-engine-setup point of view, everything here worked as expected.

2016-12-12 19:04:32 DEBUG otopi.plugins.otopi.dialog.human dialog.__logString:204 DIALOG:SEND                 |- [ INFO  ] Stage: Termination
2016-12-12 19:04:32 DEBUG otopi.plugins.otopi.dialog.human dialog.__logString:204 DIALOG:SEND                 |- [ INFO  ] Execution of setup completed successfully
2016-12-12 19:04:32 DEBUG otopi.plugins.otopi.dialog.human dialog.__logString:204 DIALOG:SEND                 |- HE_APPLIANCE_ENGINE_SETUP_SUCCESS
...
2016-12-12 19:08:01 INFO otopi.plugins.gr_he_common.core.misc misc._terminate:188 Hosted Engine successfully deployed
2016-12-12 19:08:01 DEBUG otopi.context context._executeMethod:128 Stage terminate METHOD otopi.plugins.otopi.dialog.human.Plugin._terminate
2016-12-12 19:08:02 DEBUG otopi.context context._executeMethod:128 Stage terminate METHOD otopi.plugins.otopi.dialog.machine.Plugin._terminate
2016-12-12 19:08:02 DEBUG otopi.context context._executeMethod:134 condition False
2016-12-12 19:08:02 DEBUG otopi.context context._executeMethod:128 Stage terminate METHOD otopi.plugins.otopi.core.log.Plugin._terminate

Comment 17 Simone Tiraboschi 2016-12-14 10:09:57 UTC
On my opinion the issue is due to the fact that for some reason at a certain point the DBMS died:

LOG:  database system was interrupted; last known up at 2016-12-14 10:16:09 IST
FATAL:  the database system is starting up
LOG:  database system was not properly shut down; automatic recovery in progress
LOG:  redo starts at 0/180CD728
LOG:  record with zero length at 0/1812C488
LOG:  redo done at 0/1812C458
LOG:  last completed transaction was at log time 2016-12-14 10:17:29.59582+02
LOG:  autovacuum launcher started
LOG:  database system is ready to accept connections

Comment 18 Nikolai Sednev 2016-12-14 10:18:17 UTC
After https://bugzilla.redhat.com/show_bug.cgi?id=1403956#c14 I've purged my Firefox cache and then got logged in normally to WEBADMIN portal. But still need to investigate why we've hit the https://bugzilla.redhat.com/show_bug.cgi?id=1403956#c17 after successful HE deployment, which passed OK as I've seen it via CLI.

Comment 19 Alexander Wels 2016-12-19 13:52:11 UTC
Okay so I am going to close this as not a bug as clearing the cache fixed it.

Comment 20 Nikolai Sednev 2016-12-19 14:32:38 UTC
(In reply to Alexander Wels from comment #19)
> Okay so I am going to close this as not a bug as clearing the cache fixed it.

Why closing the bug?! Look at the comments 16 and 17, the DBMS died.
This should be investigated using logs and resolved.

I'm returning this bug to assigned.

Comment 21 Oved Ourfali 2016-12-19 14:39:01 UTC
The issue with the cache has nothing to do with the DB.
For some reason your DB was down, and we don't have any way to know why.
If it reproduces please re-open and give us access to the environment immediately.