Bug 1302374

Summary: [Upgrade] Upgrade 3.5->3.6 failed when a system is using the legacy engine-manage-domains
Product: [oVirt] ovirt-engine Reporter: Gil Klein <gklein>
Component: AAA Assignee: Martin Perina <mperina>
Status: CLOSED NOTABUG QA Contact: Ondra Machacek <omachace>
Severity: urgent Docs Contact:
Priority: unspecified    
Version: 3.6.2 CC: bugs, iheim, mgoldboi, oourfali
Target Milestone: ovirt-3.6.3 Flags: gklein: ovirt-3.6.z?
mgoldboi: blocker+
mgoldboi: planning_ack+
rule-engine: devel_ack?
rule-engine: testing_ack?
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-02-04 08:13:07 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Infra RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1285700    
Attachments:
Description Flags
ovirt engine setup log none
server.log none

Description Gil Klein 2016-01-27 16:18:35 UTC
Created attachment 1118830 [details]
ovirt engine setup log

Description of problem:
When upgrading a 3.5.7 engine to 3.6.3 while using the legacy engine-manage-domains (AD), the upgrade fails with an error:

[ INFO  ] Rolling back database schema
[ INFO  ] Clearing Engine database engine
[ INFO  ] Restoring Engine database engine
[ INFO  ] Restoring file '/var/lib/ovirt-engine/backups/engine-20160127154812.6H7WvC.dump' to database localhost:engine.
[ INFO  ] Stage: Clean up
          Log file is located at /var/log/ovirt-engine/setup/ovirt-engine-setup-20160127153009-dvdf1g.log
[ INFO  ] Generating answer file '/var/lib/ovirt-engine/setup/answers/20160127155435-setup.conf'
[ INFO  ] Stage: Pre-termination
[ INFO  ] Stage: Termination
[ ERROR ] Execution of setup failed


2016-01-27 15:49:09 DEBUG otopi.context context._executeMethod:156 method exception
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/otopi/context.py", line 146, in _executeMethod
    method['method']()
  File "/usr/share/ovirt-engine/setup/bin/../plugins/ovirt-engine-setup/ovirt-engine/config/aaajdbc.py", line 381, in _misc
    self._setupAdminUser()
  File "/usr/share/ovirt-engine/setup/bin/../plugins/ovirt-engine-setup/ovirt-engine/config/aaajdbc.py", line 281, in _setupAdminUser
    name=adminUser,
  File "/usr/share/ovirt-engine/setup/bin/../plugins/ovirt-engine-setup/ovirt-engine/config/aaajdbc.py", line 61, in _userExists
    envAppend=toolEnv,
  File "/usr/lib/python2.6/site-packages/otopi/plugin.py", line 946, in execute
    command=args[0],
RuntimeError: Command '/usr/bin/ovirt-aaa-jdbc-tool' failed to execute
2016-01-27 15:49:09 ERROR otopi.context context._executeMethod:165 Failed to execute stage 'Misc configuration': Command '/usr/bin/ovirt-aaa-jdbc-tool' failed to execute


Version-Release number of selected component (if applicable):
From: rhevm-3.5.7-0.1.el6ev.noarch
To: rhevm-3.6.2.6-0.1.el6.noarch


How reproducible:
100% on this system


Steps to Reproduce:
1. Set up a 3.5.7 engine
2. Configure AD authentication using engine-manage-domains
3. Upgrade the system to 3.6.3

Actual results:
Upgrade is failing:
[ ERROR ] Execution of setup failed
 


Expected results:
Upgrade should succeed


Additional info:

Comment 1 Gil Klein 2016-01-27 16:26:30 UTC
Found a time gap between the engine and the AD server. It might be related.

# engine-manage-domains validate
Failure while testing domain qa.lab.tlv.redhat.com. Details: Authentication Failed. The Engine clock is not synchronized with directory services (must be within 5 minutes difference). Please verify the clocks are synchronized

Comment 2 Gil Klein 2016-01-27 16:54:03 UTC
Syncing AD and engine time did not help.

I also noticed that internal.properties is missing:

# ls -l /etc/ovirt-engine/aaa/internal.properties
ls: cannot access /etc/ovirt-engine/aaa/internal.properties: No such file or directory

Comment 3 Gil Klein 2016-02-02 08:36:55 UTC
Created attachment 1120357 [details]
server.log

Comment 4 Martin Perina 2016-02-02 10:13:43 UTC
Hi Gil,

I wasn't able to reproduce it on my machine using these steps:

1. Install rhevm-3.5.7-0.1, configure AD access using manage-domains
2. Add repos for rhevm 3.6.3-1
3. Execute:
     yum update -y 'rhevm-setup*'
     engine-setup

Everything went fine. I haven't found any errors in the logs after the upgrade, and I was able to log in successfully to webadmin using both admin@internal and a user from the AD domain.

I used the latest JBoss EAP 6.4.6 (jboss-as-server-7.5.6-1) for 3.5.7. Was JBoss also upgraded in your case?

Did I miss anything from your steps? Did this happen only on one machine?

I will try again to look into your logs, but at the moment I don't see a reason why it failed in your case.

Comment 5 Gil Klein 2016-02-02 11:09:28 UTC
(In reply to Martin Perina from comment #4)
> Hi Gil,
> 
> I wasn't able to reproduce it on my machine using these steps:
> 
> 1. Install rhevm-3.5.7-0.1, configure AD access using manage-domains
> 2. Add repos for rhevm 3.6.3-1
> 3. Execute:
>      yum update -y 'rhevm-setup*'
>      engine-setup
> 
> Everything went fine. I haven't found any errors in the logs after the upgrade,
> and I was able to log in successfully to webadmin using both admin@internal
> and a user from the AD domain.
> 
So I guess it is something more specific to this system.

> I used the latest JBoss EAP 6.4.6 (jboss-as-server-7.5.6-1) for 3.5.7. Was
> JBoss also upgraded in your case?
No JBoss upgrade was done. I used 7.5.5-2:

# grep "jboss-as-server" /var/log/yum.log 
Jan 20 16:52:07 Installed: jboss-as-server-7.5.5-2.Final_redhat_3.1.ep6.el6.noarch

> 
> Did I miss anything from your steps? Did this happen only on one machine?
> 
> I will try again to look into your logs, but at the moment I don't see a
> reason why it failed in your case.
So I guess my assumption was wrong, and the failure is related to something else on this system.

I believe the problem has something to do with this failure [1]

What is the purpose of this call, can it be fixed somehow, and should we fail an upgrade if it fails?

[1]  
# /usr/bin/ovirt-aaa-jdbc-tool --db-config=/etc/ovirt-engine/aaa/internal.properties query --what=user --pattern=name=admin
# echo $?
1
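
The exit status in [1] shows that the tool failed, but not why. A minimal sketch, assuming Python is available on the engine host, that runs a command and keeps its exit code together with whatever it wrote to stdout and stderr. run_and_capture is a hypothetical helper name used only for illustration; it is not part of otopi or ovirt-engine.

```python
import subprocess


def run_and_capture(command):
    """Run command (a list of arguments) and return (exit code, stdout, stderr)."""
    process = subprocess.Popen(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # return text instead of raw bytes
    )
    # communicate() waits for the process to exit and collects both streams.
    stdout, stderr = process.communicate()
    return process.returncode, stdout, stderr
```

Called with the argument list from [1], for example run_and_capture(["/usr/bin/ovirt-aaa-jdbc-tool", "--db-config=/etc/ovirt-engine/aaa/internal.properties", "query", "--what=user", "--pattern=name=admin"]), it would return the exit code together with any error text the tool printed, which could then be attached to the bug.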

Comment 6 Martin Perina 2016-02-02 15:17:47 UTC
I investigated the logs again and I haven't found any reason why execution of ovirt-aaa-jdbc-tool during engine-setup should fail with the following exception:

Exception in thread "main" org.jboss.modules.ModuleLoadError: org.ovirt.engine.api.ovirt-engine-extensions-api:main
        at org.jboss.modules.ModuleLoadException.toError(ModuleLoadException.java:78)
        at org.jboss.modules.Module.getPathsUnchecked(Module.java:1392)
        at org.jboss.modules.Module.loadModuleClass(Module.java:563)
        at org.jboss.modules.ModuleClassLoader.findClass(ModuleClassLoader.java:205)
        at org.jboss.modules.ConcurrentClassLoader.performLoadClassUnchecked(ConcurrentClassLoader.java:459)
        at org.jboss.modules.ConcurrentClassLoader.performLoadClassChecked(ConcurrentClassLoader.java:408)
        at org.jboss.modules.ConcurrentClassLoader.performLoadClass(ConcurrentClassLoader.java:389)
        at org.jboss.modules.ConcurrentClassLoader.loadClass(ConcurrentClassLoader.java:134)
        at org.ovirt.engine.extension.aaa.jdbc.binding.cli.Cli.<clinit>(Cli.java:86)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:278)
        at org.jboss.modules.Module.run(Module.java:302)
        at org.jboss.modules.Main.main(Main.java:473)

Every step in the engine-setup flow was successful up to this point.

The only difference between your setup and mine is that you have rhevm-reports configured, so I will try to reproduce again with reports configured.

In the meantime, is this setup still available? If so, could you please do the following:

1. Please verify whether the ovirt-engine service is running. It should have been stopped successfully according to the log, but please check the processes, and if it is stuck somewhere, please kill it.
2. Please execute engine-setup again so we know whether this error is persistent or just some random JBoss bug.

Thanks

Comment 7 Gil Klein 2016-02-02 16:13:36 UTC
(In reply to Martin Perina from comment #6)
> I investigated logs again and I haven't found any reason why execution of
> ovirt-aaa-jdbc-tool during engine-setup should fail with following exception:
> 
> Exception in thread "main" org.jboss.modules.ModuleLoadError: org.ovirt.engine.api.ovirt-engine-extensions-api:main
>         at org.jboss.modules.ModuleLoadException.toError(ModuleLoadException.java:78)
>         at org.jboss.modules.Module.getPathsUnchecked(Module.java:1392)
>         at org.jboss.modules.Module.loadModuleClass(Module.java:563)
>         at org.jboss.modules.ModuleClassLoader.findClass(ModuleClassLoader.java:205)
>         at org.jboss.modules.ConcurrentClassLoader.performLoadClassUnchecked(ConcurrentClassLoader.java:459)
>         at org.jboss.modules.ConcurrentClassLoader.performLoadClassChecked(ConcurrentClassLoader.java:408)
>         at org.jboss.modules.ConcurrentClassLoader.performLoadClass(ConcurrentClassLoader.java:389)
>         at org.jboss.modules.ConcurrentClassLoader.loadClass(ConcurrentClassLoader.java:134)
>         at org.ovirt.engine.extension.aaa.jdbc.binding.cli.Cli.<clinit>(Cli.java:86)
>         at java.lang.Class.forName0(Native Method)
>         at java.lang.Class.forName(Class.java:278)
>         at org.jboss.modules.Module.run(Module.java:302)
>         at org.jboss.modules.Main.main(Main.java:473)
> 
> Every step in the engine-setup flow was successful up to this point.
> 
> The only difference between your setup and mine is that you have
> rhevm-reports configured, so I will try to reproduce again with reports
> configured.
> 
> In the meantime, is this setup still available? If so, could you please do
> the following:
> 
> 1. Please verify whether the ovirt-engine service is running. It should have
> been stopped successfully according to the log, but please check the
> processes, and if it is stuck somewhere, please kill it.
It is stopped completely during the upgrade
> 2. Please execute engine-setup again so we know if this error is persistent
> or just some random JBoss bug
100% reproduced on this system on 2 additional attempts  
> 
> Thanks

Comment 8 Gil Klein 2016-02-04 08:13:07 UTC
This turned out to be caused by a misconfigured file, added manually under /etc/ovirt-engine/engine.conf.d/ as "1-ovirt-engine.conf".

The added file was a copy of a file containing the defaults. Because of
the numeric prefix it sorts after the 10-setup-... file, and one of its
effects is that it resets the ENGINE_JAVA_MODULEPATH variable. In 3.6
the modules have been moved to subdirectories (common and tools), and
this means that the tools won't find them, because 1-ovirt-engine.conf
instructs them to look only in /usr/share/ovirt-engine/modules, and not
in the subdirectories.

To work around it, I:
1. Renamed the file to "99-increase-heap-size.conf"
2. Made sure the new file includes only the minimal settings that need to be overridden:
 ENGINE_HEAP_MIN=1g
 ENGINE_HEAP_MAX=2g

engine-setup passed this phase as soon as I applied those changes.
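
A minimal sketch for spotting this kind of stray override, assuming the files under /etc/ovirt-engine/engine.conf.d/ contain simple KEY=value lines. find_overrides is a hypothetical helper name; the parsing is deliberately simplified and is not the engine's own configuration loader, and files are visited in Python's sorted() order, which is not guaranteed to match the order in which the engine applies them.

```python
import glob
import os


def find_overrides(conf_dir, variable):
    """Return (file name, value) pairs for every *.conf file in conf_dir
    that assigns the given variable, visiting files in sorted() order."""
    hits = []
    for path in sorted(glob.glob(os.path.join(conf_dir, "*.conf"))):
        with open(path) as conf:
            for line in conf:
                line = line.strip()
                # Skip blank lines and comments.
                if not line or line.startswith("#"):
                    continue
                # Simplified parsing (an assumption): split on the first "=".
                key, sep, value = line.partition("=")
                if sep and key.strip() == variable:
                    # Drop surrounding whitespace and double quotes from the value.
                    hits.append((os.path.basename(path), value.strip().strip('"')))
    return hits
```

For example, find_overrides("/etc/ovirt-engine/engine.conf.d", "ENGINE_JAVA_MODULEPATH") lists every file that sets the module path. On the system described above it would be expected to report 1-ovirt-engine.conf, while a file holding only the two heap settings would not show up.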