Bug 1129815

Summary: /var/cache/ovirt-engine/ovirt-host-deploy.tar is not being updated during a RHEV upgrade
Product: Red Hat Enterprise Virtualization Manager
Reporter: James W. Mills <jamills>
Component: ovirt-engine
Assignee: Alon Bar-Lev <alonbl>
Status: CLOSED INSUFFICIENT_DATA
QA Contact:
Severity: high
Docs Contact:
Priority: unspecified
Version: 3.4.1-1
CC: acathrow, bazulay, dougsland, ecohen, iheim, jamills, jentrena, lpeer, oourfali, pdwyer, Rhev-m-bugs, yeylon
Target Milestone: ---
Keywords: Reopened
Target Release: ---
Hardware: All
OS: Linux
Whiteboard: infra
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-08-06 09:48:14 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Infra
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description James W. Mills 2014-08-13 17:46:04 UTC
Description of problem:

After upgrading from 3.1 to 3.2 and then 3.2 to 3.3, no hypervisors can be added to the system.

The problem turned out to be an older version of /var/cache/ovirt-engine/ovirt-host-deploy.tar

Removing this outdated file caused the process to regenerate an updated file and full functionality was immediately restored.


Version-Release number of selected component (if applicable):

rhevm 3.1.x / 3.2.x / 3.3.x


How reproducible:

Unknown

Steps to Reproduce (based on customer case):
1. Upgrade from 3.1 to 3.2
2. Immediately upgrade from 3.2 to 3.3
3. Try to add a hypervisor

Actual results:

* No hypervisors can be added
* No host deploy log created under /var/log/ovirt-engine/host-deploy

Expected results:

Hypervisors can be added

Additional info:

Exceptions like this are thrown when trying to add a hypervisor:

2014-07-29 14:46:17,172 ERROR [org.ovirt.engine.core.bll.VdsDeploy] (VdsDeploy) Error during deploy dialog: java.lang.NullPointerException
        at org.ovirt.otopi.dialog.MachineDialogParser.cliEnvironmentGet(MachineDialogParser.java:236) [otopi.jar:]
        at org.ovirt.engine.core.bll.VdsDeploy$6.call(VdsDeploy.java:286) [bll.jar:]
        at org.ovirt.engine.core.bll.VdsDeploy._nextCustomizationEntry(VdsDeploy.java:594) [bll.jar:]
        at org.ovirt.engine.core.bll.VdsDeploy._threadMain(VdsDeploy.java:806) [bll.jar:]
        at org.ovirt.engine.core.bll.VdsDeploy.access$1800(VdsDeploy.java:77) [bll.jar:]
        at org.ovirt.engine.core.bll.VdsDeploy$45.run(VdsDeploy.java:897) [bll.jar:]
        at java.lang.Thread.run(Thread.java:744) [rt.jar:1.7.0_55]

2014-07-29 14:46:17,182 ERROR [org.ovirt.engine.core.bll.VdsDeploy] (pool-4-thread-47) [26e18e73] Error during host XX.XX.XX.XX install: java.lang.NullPointerException
        at org.ovirt.otopi.dialog.MachineDialogParser.cliEnvironmentGet(MachineDialogParser.java:236) [otopi.jar:]
        at org.ovirt.engine.core.bll.VdsDeploy$6.call(VdsDeploy.java:286) [bll.jar:]
        at org.ovirt.engine.core.bll.VdsDeploy._nextCustomizationEntry(VdsDeploy.java:594) [bll.jar:]
        at org.ovirt.engine.core.bll.VdsDeploy._threadMain(VdsDeploy.java:806) [bll.jar:]
        at org.ovirt.engine.core.bll.VdsDeploy.access$1800(VdsDeploy.java:77) [bll.jar:]
        at org.ovirt.engine.core.bll.VdsDeploy$45.run(VdsDeploy.java:897) [bll.jar:]
        at java.lang.Thread.run(Thread.java:744) [rt.jar:1.7.0_55]

2014-07-29 14:46:17,184 ERROR [org.ovirt.engine.core.bll.InstallerMessages] (pool-4-thread-47) [26e18e73] Installation XX.XX.XX.XX: null
2014-07-29 14:46:17,204 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (pool-4-thread-47) [26e18e73] Correlation ID: 26e18e73, Call Stack: null, Custom Event ID: -1, Message: Failed to install Host XX.XX.XX.XX. <UNKNOWN>.
2014-07-29 14:46:17,204 ERROR [org.ovirt.engine.core.bll.VdsDeploy] (pool-4-thread-47) [26e18e73] Error during host XX.XX.XX.XX install, prefering first exception: java.lang.NullPointerException
        at org.ovirt.otopi.dialog.MachineDialogParser.cliEnvironmentGet(MachineDialogParser.java:236) [otopi.jar:]
        at org.ovirt.engine.core.bll.VdsDeploy$6.call(VdsDeploy.java:286) [bll.jar:]
        at org.ovirt.engine.core.bll.VdsDeploy._nextCustomizationEntry(VdsDeploy.java:594) [bll.jar:]
        at org.ovirt.engine.core.bll.VdsDeploy._threadMain(VdsDeploy.java:806) [bll.jar:]
        at org.ovirt.engine.core.bll.VdsDeploy.access$1800(VdsDeploy.java:77) [bll.jar:]
        at org.ovirt.engine.core.bll.VdsDeploy$45.run(VdsDeploy.java:897) [bll.jar:]
        at java.lang.Thread.run(Thread.java:744) [rt.jar:1.7.0_55]

2014-07-29 14:46:17,204 ERROR [org.ovirt.engine.core.bll.InstallVdsCommand] (pool-4-thread-47) [26e18e73] Host installation failed for host e50ca0ea-b96a-4732-8692-91019fda5816, XX.XX.XX.XX.: java.lang.NullPointerException
        at org.ovirt.otopi.dialog.MachineDialogParser.cliEnvironmentGet(MachineDialogParser.java:236) [otopi.jar:]
        at org.ovirt.engine.core.bll.VdsDeploy$6.call(VdsDeploy.java:286) [bll.jar:]
        at org.ovirt.engine.core.bll.VdsDeploy._nextCustomizationEntry(VdsDeploy.java:594) [bll.jar:]
        at org.ovirt.engine.core.bll.VdsDeploy._threadMain(VdsDeploy.java:806) [bll.jar:]
        at org.ovirt.engine.core.bll.VdsDeploy.access$1800(VdsDeploy.java:77) [bll.jar:]
        at org.ovirt.engine.core.bll.VdsDeploy$45.run(VdsDeploy.java:897) [bll.jar:]
        at java.lang.Thread.run(Thread.java:744) [rt.jar:1.7.0_55]


The code responsible for determining whether or not the tar package needs to be updated is:

backend/manager/modules/utils/src/main/java/org/ovirt/engine/core/utils/archivers/tar/CachedTar.java, specifically the "ensure" method:

private void ensure() throws IOException {
    if (!this.archive.exists()) {
        log.info(
            String.format(
                "Tarball '%1$s' is missing, creating",
                this.archive.getAbsolutePath()
            )
        );
        this.nextCheckTime = System.currentTimeMillis() + this.refreshInterval;
        create(getTimestampRecursive(this.dir));
    }
    else if (this.nextCheckTime <= System.currentTimeMillis()) {
        this.nextCheckTime = System.currentTimeMillis() + this.refreshInterval;

        long treeTimestamp = getTimestampRecursive(this.dir);
        if (archive.lastModified() != treeTimestamp) {
            log.info(
                String.format(
                    "Tarball '%1$s' is out of date, re-creating",
                    this.archive.getAbsolutePath()
                )
            );
            create(treeTimestamp);
        }
    }
}
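
To check by hand whether the method above would consider a given tarball stale, something like the following standalone sketch could be run against the cached archive and the directory it is built from. It is not taken from the oVirt sources: the class TarballFreshness and the methods newestTimestamp and wouldRecreate are hypothetical names, and it assumes that getTimestampRecursive returns the newest modification time found in the tree (directories included) and that create() stamps the archive with that value. The refresh-interval throttling in ensure() is deliberately left out.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class TarballFreshness {

    // Newest last-modified time (milliseconds since the epoch) of dir and
    // everything below it. ASSUMPTION: this approximates what
    // getTimestampRecursive computes; the real method may differ.
    public static long newestTimestamp(Path dir) throws IOException {
        try (Stream<Path> paths = Files.walk(dir)) {
            long newest = 0L;
            for (Path p : (Iterable<Path>) paths::iterator) {
                newest = Math.max(newest, Files.getLastModifiedTime(p).toMillis());
            }
            return newest;
        }
    }

    // Mirrors the two conditions under which ensure() calls create():
    // the archive is missing, or its mtime differs from the tree timestamp.
    public static boolean wouldRecreate(Path archive, Path dir) throws IOException {
        if (!Files.exists(archive)) {
            return true;
        }
        long archiveTimestamp = Files.getLastModifiedTime(archive).toMillis();
        return archiveTimestamp != newestTimestamp(dir);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: TarballFreshness <archive> <source-dir>");
            System.exit(2);
        }
        Path archive = Paths.get(args[0]);
        Path dir = Paths.get(args[1]);

        System.out.println("tree timestamp:    " + newestTimestamp(dir));
        if (Files.exists(archive)) {
            System.out.println("archive timestamp: "
                + Files.getLastModifiedTime(archive).toMillis());
        } else {
            System.out.println("archive timestamp: (archive missing)");
        }
        System.out.println("would re-create:   " + wouldRecreate(archive, dir));
    }
}
```

If the two timestamps match on a system where the tarball content is nonetheless outdated, that would point away from the comparison itself and toward how the timestamps of the source files were set during the upgrade.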

Comment 1 Alon Bar-Lev 2014-08-13 17:51:22 UTC
The tarball is updated per the last timestamp of the files that are at /usr/share/ovirt-host-deploy

Have you saved the tarball, so its timestamp and content can be compared to files?

If there were a problem in this mechanism, I would expect lots of bugs regarding this issue, but we have not gotten any.

Comment 2 James W. Mills 2014-08-13 20:23:02 UTC
I'll have the customer send me the information you've requested, and we'll see if the timestamp is still intact.

I'll also try to set up a test environment here, as this might have something to do with the "double upgrade" from 3.1 to 3.3.

Thanks!
~james

Comment 3 Oved Ourfali 2014-08-25 06:29:11 UTC
any updates?

Comment 4 James W. Mills 2014-08-26 16:23:47 UTC
No update yet.  The customer has not responded, and I am unable to replicate this here.

Thanks!
~james

Comment 5 Alon Bar-Lev 2014-08-26 17:11:59 UTC
OK, please reopen when new information is available.
Thanks!

Comment 6 Julio Entrena Perez 2015-08-06 09:38:14 UTC
(In reply to Alon Bar-Lev from comment #5)
> OK, please reopen when new information is available.

Re-opening since this has been reported by another customer, I'm following up with the details.

Comment 7 Alon Bar-Lev 2015-08-06 09:48:14 UTC
(In reply to Julio Entrena Perez from comment #6)
> (In reply to Alon Bar-Lev from comment #5)
> > OK, please reopen when new information is available.
> 
> Re-opening since this has been reported by another customer, I'm following
> up with the details.

Please open a new bug; you do not re-open a year-old bug for something that may or may not happen in a different environment and probably different versions.