Bug 1158016

Summary: service ovirt-engine restart while there are tasks in progress cause Err status code 500
Product: Red Hat Enterprise Virtualization Manager
Reporter: Gal Amado <gamado>
Component: ovirt-engine
Assignee: Ravi Nori <rnori>
Status: CLOSED DUPLICATE
QA Contact: Petr Beňas <pbenas>
Severity: urgent
Docs Contact:
Priority: urgent
Version: 3.5.0
CC: acanan, cmestreg, ebenahar, ecohen, gamado, gklein, iheim, lpeer, lsurette, oourfali, pstehlik, rbalakri, Rhev-m-bugs, yeylon, yzaslavs
Target Milestone: ---
Keywords: Regression
Target Release: 3.5.0
Hardware: Unspecified
OS: Unspecified
Whiteboard: infra
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-11-05 20:27:16 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Infra
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description: Engine / vdsm logs; Flags: none

Description Gal Amado 2014-10-28 10:07:28 UTC
Created attachment 951341: Engine / vdsm logs

Description of problem:
After restarting the engine while there are tasks in progress:
1. The engine GUI does not respond.
2. The REST API returns an Internal Server Error response.
3. java.lang.IllegalStateException appears in engine.log.



Version-Release number of selected component (if applicable):
Red Hat Enterprise Virtualization Manager Version: 3.5.0-0.17.beta.el6ev
(c) 2014 Red Hat Inc. All rights reserved.

VDSM OS:[root@puma29 ~]# cat /etc/redhat-release 
Red Hat Enterprise Linux Server release 7.0 (Maipo)

VDSM : vdsm-4.16.7.1-1.el7.x86_64

How reproducible:
Happens every time in the automated job http://jenkins.qa.lab.tlv.redhat.com:8080/view/Storage/view/3.5/job/3.5-storage_async_tasks-iscsi/

Reproduced manually once (make sure there are active tasks while restarting the engine).

Steps to Reproduce:
Automated test: http://jenkins.qa.lab.tlv.redhat.com:8080/view/Storage/view/3.5/job/3.5-storage_async_tasks-iscsi/

which implements the TCMS case (fails on step #2): https://tcms.engineering.redhat.com/case/288728/?from_plan=10029

I used an up-and-running engine + 1 RHEL 7 host + iSCSI SD.
 
1. Create a new VM with a large preallocated disk so that the task takes some time (I used 10G).
2. Run "service ovirt-engine restart" on the engine.
3. Wait a while (about 20 seconds).

Actual results:
4. Cannot log in to the engine GUI (status code 500).
5. java.lang.IllegalStateException in engine.log.

Expected results:
No exceptions in engine.log; the GUI and REST API remain responsive.
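
For anyone automating the check after step 3, the expected result can be sketched as a small polling loop that waits for the engine to stop answering with 5xx. This is only an illustration, not part of the attached automation: the class name EngineHealthPoll, the method waitUntilResponsive, and the stubbed status probe are all hypothetical. A real probe would issue an HTTP GET against the engine's REST entry point and return the status code.

```java
import java.util.function.IntSupplier;

public class EngineHealthPoll {

    // Polls the given probe until it reports an HTTP status below 500
    // (the server answered, even if with 401/404) or attempts run out.
    // statusProbe is a hypothetical hook; it should return the HTTP status
    // code of one request, or a value <= 0 if the request failed outright.
    static boolean waitUntilResponsive(IntSupplier statusProbe, int maxAttempts, long delayMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            int status = statusProbe.getAsInt();
            if (status > 0 && status < 500) {
                return true;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delayMillis);
                } catch (InterruptedException e) {
                    // Preserve the interrupt flag and give up polling.
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Stub probe standing in for a real request: two 500s, then 200.
        int[] canned = {500, 500, 200};
        int[] next = {0};
        IntSupplier stub = () -> canned[Math.min(next[0]++, canned.length - 1)];

        System.out.println("responsive: " + waitUntilResponsive(stub, 5, 10));
    }
}
```

With this bug present, such a loop would be expected to exhaust its attempts, since the engine keeps returning status code 500 after the restart.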

Additional info:

Comment 1 Oved Ourfali 2014-10-29 08:18:11 UTC
Can you attach the engine+host logs?

Comment 2 Gal Amado 2014-11-02 06:55:37 UTC
logs are attached.

Comment 3 Ravi Nori 2014-11-03 13:58:50 UTC
The issue does not seem to be related to the backend. If I refresh the webadmin before trying to log in, the error does not appear.

Please refresh the webadmin, try to log in, and let me know whether it is still reproducible.

Comment 4 Oved Ourfali 2014-11-03 13:59:48 UTC
Reducing severity based on comment #3.

Comment 5 Gal Amado 2014-11-05 12:18:23 UTC
Browser refresh didn't help.
Besides, it fails our automation, where we don't use webadmin at all.
Did you manage to see the exceptions in the attached engine log?

Comment 6 Elad 2014-11-05 12:21:34 UTC
Raising the severity again since we've encountered this issue on several setups. After it occurs, nothing helps except cleaning up the environment with rhevm-cleanup.

Comment 7 Ravi Nori 2014-11-05 17:38:31 UTC
The error in the logs should have been fixed by Change-Id: I03873642ac5

Caused by: org.apache.commons.lang.SerializationException: org.codehaus.jackson.map.JsonMappingException: No default constructor for [collection type; class java.util.Collections$UnmodifiableCollection, contains [simple type, class java.lang.String]] (through reference chain: org.ovirt.engine.core.common.action.CreateAllSnapshotsFromVmParameters["parametersCurrentUser"]->org.ovirt.engine.core.common.businessentities.aaa.DbUser
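
The "No default constructor" part of that message can be illustrated without Jackson. The following is a minimal sketch using only the JDK: it checks by reflection that java.util.Collections$UnmodifiableCollection declares no no-argument constructor, which a reflection-based deserializer would generally need in order to instantiate the collection. The class name UnmodifiableCtorCheck and the helper toSerializableCopy are hypothetical, and copying into an ArrayList is just one possible workaround; it is not claimed to be what Change-Id I03873642ac5 does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;

public class UnmodifiableCtorCheck {

    // Returns true if the class declares a no-argument constructor.
    static boolean hasNoArgConstructor(Class<?> type) {
        try {
            type.getDeclaredConstructor();
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    // Hypothetical workaround: copy into a plain ArrayList, which has a
    // public no-argument constructor, before the value is persisted.
    static <T> Collection<T> toSerializableCopy(Collection<T> source) {
        return new ArrayList<>(source);
    }

    public static void main(String[] args) {
        Collection<String> wrapped =
                Collections.unmodifiableCollection(Arrays.asList("group-a", "group-b"));
        Collection<String> copy = toSerializableCopy(wrapped);

        System.out.println(wrapped.getClass().getName()
                + " has no-arg constructor: " + hasNoArgConstructor(wrapped.getClass()));
        System.out.println(copy.getClass().getName()
                + " has no-arg constructor: " + hasNoArgConstructor(copy.getClass()));
    }
}
```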

Comment 8 Yair Zaslavsky 2014-11-05 20:27:16 UTC

*** This bug has been marked as a duplicate of bug 1155084 ***