Bug 1416337 - Uncaught exception when increasing the number of VMs in a pool
Summary: Uncaught exception when increasing the number of VMs in a pool
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 4.0.3
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ovirt-4.0.7
Target Release: ---
Assignee: Fred Rolland
QA Contact: Raz Tamir
URL:
Whiteboard:
Depends On: 1392461
Blocks:
Reported: 2017-01-25 10:29 UTC by Tal Nisan
Modified: 2017-03-16 15:31 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1392461
Environment:
Last Closed: 2017-03-16 15:31:44 UTC
oVirt Team: Storage
Target Upstream Version:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2017:0542 0 normal SHIPPED_LIVE Red Hat Virtualization Manager 4.0.7 2017-03-16 19:25:04 UTC
oVirt gerrit 69851 0 None None None 2017-01-25 10:29:00 UTC
oVirt gerrit 69858 0 None None None 2017-01-25 10:29:00 UTC
oVirt gerrit 69859 0 None None None 2017-01-25 10:29:00 UTC

Description Tal Nisan 2017-01-25 10:29:00 UTC
+++ This bug was initially created as a clone of Bug #1392461 +++

Description of problem:

When increasing the number of VMs on an already existing pool, an exception is thrown.

2016-11-07 14:35:44,265 ERROR [org.ovirt.engine.ui.frontend.server.gwt.OvirtRemoteLoggingService] (default task-29) [] Permutation name: F3121572B0E0E92B5F00060D10B3F8B7
Uncaught exception: com.google.gwt.event.shared.UmbrellaException: Exception caught: (TypeError)
SEVERE: Uncaught exceptioncom.google.gwt.event.shared.UmbrellaException: Exception caught: (TypeError) 
 __gwt$exception: <skipped>: Cannot read property 'Vg' of undefined
	at Unknown.Ev(webadmin-0.js@25078)
	at Unknown.Mv(webadmin-0.js@41)
	at Unknown.X7(webadmin-0.js@19)
	at Unknown.$7(webadmin-0.js@19)
	at Unknown.i7(webadmin-0.js@117)
	at Unknown.oq(webadmin-0.js@26)
	at Unknown.yq(webadmin-0.js@23798)
	at Unknown.Y2(webadmin-0.js@149)
	at Unknown.qq(webadmin-0.js@112)
	at Unknown.R9e(webadmin-0.js@964)
	at Unknown.z$e(webadmin-0.js@85)
	at Unknown.B0e(webadmin-0.js@46)
	at Unknown.Sx(webadmin-0.js@29)
	at Unknown.Wx(webadmin-0.js@57)
	at Unknown.eval(webadmin-0.js@54)
	at Unknown.OC(webadmin-0.js@20)
	at Unknown.v9e(webadmin-0.js@98)
	at Unknown.lvl(webadmin-0.js@10635)
	at Unknown.R9e(webadmin-0.js@582)
	at Unknown.z$e(webadmin-0.js@85)
	at Unknown.y$e(webadmin-0.js@60)
	at Unknown.z0e(webadmin-0.js@52)
	at Unknown.Sx(webadmin-0.js@29)
	at Unknown.Wx(webadmin-0.js@57)
	at Unknown.eval(webadmin-0.js@54)

Hash Ev in permutation F3121572B0E0E92B5F00060D10B3F8B7 is:

Ev,java.lang.Throwable::fillInStackTrace()Ljava/lang/Throwable;,java.lang.Throwable,fillInStackTrace,com/google/gwt/emul/java/lang/Throwable.java,114,0
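The symbol-map line above can be split into named fields to make the stack trace easier to read. This is only a minimal sketch: the field names follow my reading of the GWT symbol map column order (jsName, jsniIdent, className, memberName, sourceUri, sourceLine, fragmentNumber) and are an assumption, and `parse_symbol_line` is a hypothetical helper, not part of any oVirt or GWT tool.

```python
from typing import NamedTuple


class SymbolEntry(NamedTuple):
    # Field names are an assumption based on the GWT symbol map column order.
    js_name: str
    jsni_ident: str
    class_name: str
    member_name: str
    source_uri: str
    source_line: int
    fragment: int


def parse_symbol_line(line: str) -> SymbolEntry:
    """Split one comma-separated symbol-map line into its seven fields."""
    fields = line.strip().split(",")
    if len(fields) != 7:
        raise ValueError("expected 7 comma-separated fields, got %d" % len(fields))
    js_name, jsni_ident, class_name, member_name, source_uri, src_line, fragment = fields
    return SymbolEntry(js_name, jsni_ident, class_name, member_name,
                       source_uri, int(src_line), int(fragment))


# The line quoted in this report.
entry = parse_symbol_line(
    "Ev,java.lang.Throwable::fillInStackTrace()Ljava/lang/Throwable;,"
    "java.lang.Throwable,fillInStackTrace,"
    "com/google/gwt/emul/java/lang/Throwable.java,114,0"
)
print(entry.class_name, entry.member_name, entry.source_line)
```

Read this way, the top frame `Ev` only records where the stack trace was filled in (`Throwable.fillInStackTrace`), so it does not by itself identify the code that dereferenced the undefined value.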

Version-Release number of selected component (if applicable):

4.0.4.4-1

How reproducible:

Always 

Steps to Reproduce:
1. Create a VmPool (with one VM, for example)
2. Edit the pool
3. In the "Increase VM pool size in..." field enter 1
4. Save

Actual results:

The exception above is thrown

Expected results:

The pool should be extended to the requested size

--- Additional comment from Oved Ourfali on 2016-11-08 08:26:38 IST ---

Tomas - moving this to virt. If it turns out relevant to recent UX changes then please move it to UX.

--- Additional comment from Tomas Jelinek on 2016-11-08 10:03:03 IST ---

I cannot reproduce this in my environment. Please follow these instructions to provide the debug logs for the frontend:

- yum install ovirt-engine-webadmin-portal-debuginfo
- restart the engine
- reproduce the issue again and provide the logs from /var/log/ovirt-engine/ui.log

Thank you!

--- Additional comment from  on 2016-11-08 10:08 IST ---



--- Additional comment from Michal Skrivanek on 2016-11-08 11:35:15 IST ---

looks related to latest ux changes indeed

--- Additional comment from Oved Ourfali on 2016-11-08 12:58:49 IST ---

Actually it looks quite different.
Nothing in the call stack is related to the latest cleanup code.
Alexander - can you take a look and see if that's related?

--- Additional comment from Tomas Jelinek on 2016-11-08 13:46:01 IST ---

This is the obfuscated function of validation which fails:
function zAq(a,b){
  var c,d,e,f;
  _Ao(a,b);
  if(!a.i){return}
  f=true;
  for(d=new tVf(a.i);d.c<d.e.Zg();){
    c=mfb(rVf(d),1612);
    e=c.V;
    if(!e.xeb().Vg().xk()||e.zeb()==null){
      c.V.Jc.Pg('Storage Domain must be specified.');
      Qtl(c.V,false);
      f=false
    }
    c.J.ueb(cfb(tEe,Epy,1691,[new J_r,new TSq]));
    f=f&&c.J.Oc
  }
  Qtl(a,f)
}
The part which fails is e.xeb().Vg() which is the diskStorageDomains.getItems().iterator(). Long story short, the diskStorageDomains.getItems() returns null. 

@Tal, do you know about some changes in the storage code which could cause the diskStorageDomains.getItems() to return null?

--- Additional comment from Greg Sheremeta on 2016-11-08 18:33:44 IST ---

It looks like a cleanup regression because it's one Model referencing another Model that died. That's been our pattern.


Caused by: com.google.gwt.core.client.JavaScriptException: (TypeError) 
 __gwt$exception: <skipped>: Cannot read property 'Vg' of undefined
	at org.ovirt.engine.ui.uicommonweb.models.storage.DisksAllocationModel.$validateEntity(DisksAllocationModel.java:369)


So the validator is walking the DisksAllocationModel. That property it can't read is another Model (well, ListModel) that is null because -- my guess -- it got prematurely cleaned?


        boolean isModelValid = true;
        for (DiskModel diskModel : getDisks()) {
            ListModel diskStorageDomains = diskModel.getStorageDomain();
            if (!diskStorageDomains.getItems().iterator().hasNext() || diskStorageDomains.getSelectedItem() == null) {
                diskModel.getStorageDomain().getInvalidityReasons().add(
                        constants.storageDomainMustBeSpecifiedInvalidReason());
                diskModel.getStorageDomain().setIsValid(false);
                isModelValid = false;
            }
            diskModel.getAlias().validateEntity(new IValidation[] { new NotEmptyValidation(), new I18NNameValidation() });
            isModelValid = isModelValid && diskModel.getAlias().getIsValid();
        }
        setIsValid(isModelValid);


Need Vojtech or Alexander to chime in, as I'm not sure what the cleanup pattern is that caused the diskStorageDomains to get null.
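To illustrate the failure mode discussed above, here is a minimal sketch of the same validation loop written in Python, with a guard for an items collection that was never populated. It is not the patch that was merged for this bug; `ListModel`, `has_usable_selection` and `validate_disks` are hypothetical stand-ins that only mirror the shape of the Java snippet.

```python
from typing import List, Optional


class ListModel:
    """Hypothetical stand-in for the UI list model; items may be None."""

    def __init__(self, items: Optional[List[str]] = None,
                 selected: Optional[str] = None) -> None:
        self.items = items
        self.selected = selected
        self.is_valid = True
        self.invalidity_reasons: List[str] = []


def has_usable_selection(model: ListModel) -> bool:
    # bool(None) and bool([]) are both False, so a missing item list is
    # treated like an empty one instead of raising an error.
    return bool(model.items) and model.selected is not None


def validate_disks(storage_domain_models: List[ListModel]) -> bool:
    """Mark every model without a usable storage domain as invalid."""
    all_valid = True
    for model in storage_domain_models:
        if not has_usable_selection(model):
            model.invalidity_reasons.append("Storage Domain must be specified.")
            model.is_valid = False
            all_valid = False
    return all_valid


ok = ListModel(items=["sd1", "sd2"], selected="sd1")
unpopulated = ListModel(items=None)
print(validate_disks([ok]), validate_disks([ok, unpopulated]))
```

With a guard of this kind the dialog would report a validation message for the affected disk rather than throwing, although it would not explain why the list was empty in the first place.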

--- Additional comment from  on 2016-11-08 19:20:48 IST ---

After switching to 4.0.4 branch [commit `build: post ovirt-engine-4.0.4.4`], I don't see *any* commits related to memory leak fixes. In other words, commit [1] isn't there.

[1] https://gerrit.ovirt.org/#/c/65357/ (lead commit for memory leak fixes)

This leads me to the conclusion that this bug isn't related to the memory leak fixes.

Please switch to 4.0.4 branch, build Engine & reproduce the problem.

--- Additional comment from Oved Ourfali on 2016-11-08 19:25:40 IST ---

Moving back to virt as 4.0.4 indeed didn't include any cleanup fix.

--- Additional comment from Tomas Jelinek on 2016-11-09 17:27:47 IST ---

I cannot reproduce this on any branch, and I don't even see from the code how diskStorageDomains.getItems() could become null.

@Nicolas: Could you please also provide the engine logs from the time this happens? Maybe some query failed and returned something the frontend did not expect...

Also, could you please give some more details about your storage? Which storage domains do you have, and which disks does the template this pool is based on have?

--- Additional comment from  on 2016-11-09 19:49:15 IST ---

Unfortunately, there's absolutely nothing in the engine logs. Just the ui.log shows the exception.

There are some additional facts I can provide: there is no need to extend the VmPool; just clicking on Edit and then OK (with no changes) throws the exception. So anything I do to edit the VmPool raises the exception.

The template on which the VmPool is based is an ubuntu-1404 template. As for storage domains, we currently have 7, and each of them has a copy of the template. All of them are iSCSI based: 4 of them are native iSCSI and 3 of them are based on a gateway for Ceph. No Cinder installation here. The template has only one disk (20GB), thin provisioned.

If I can provide any additional info don't hesitate to ask.

--- Additional comment from Tomas Jelinek on 2016-11-10 12:28:12 IST ---

Any chance that some of the storage domains holding the template disks are in maintenance/down? I got a similar stack trace when this was the case.

Also, could you please provide a screenshot of the edit pool dialog's resource allocation side tab and the bottom where the "Disk Allocation" is shown?

--- Additional comment from  on 2016-11-10 12:43 IST ---

See snapshot attached.

I see the 'Target' and 'Disk profile' fields are empty; could this be the reason for the exception? I see we have this issue in all of our VmPools, even if they use different templates.

We have a script that each night copies all the templates to all of the Storage Domains (to allow migrating any disk to any storage domain).

This is the snippet that does this action:

    template = api.templates.get(name='...')
    disk = template.disk.list()[0]          # Just an example
    action = params.Action(storage_domain=destination_storagedomain, async=False)
    disk.copy(action=action)

Maybe this action is not setting the profile and target fields?

--- Additional comment from  on 2016-11-10 12:52:09 IST ---

BTW, forgot to tell: No DSs in maintenance/down.

--- Additional comment from Tomas Jelinek on 2016-11-10 13:59:08 IST ---

The NPE happens because the "target" is empty in the dialog.

It seems the API call described in comment 13 leaves the disks without a "target" filled in the edit pool dialog.

Moving to storage for further investigation.

--- Additional comment from Fred Rolland on 2016-12-21 16:42:13 IST ---

(In reply to nicolas from comment #13)
> Created attachment 1219306 [details]
> Screenshot of the disk allocation tab
> 
> See snapshot attached.
> 
> I see the 'Target' and 'Disk profile' fields are empty; could this be the
> reason for the exception? I see we have this issue in all of our VmPools,
> even if they use different templates.
> 
> We have a script that each night copies all the templates to all of the
> Storage Domains (to allow migrating any disk to any storage domain).
> 
> This is the snippet that does this action:
> 
>     template = api.templates.get(name='...')
>     disk = template.disk.list()[0]          # Just an example
>     action = params.Action(storage_domain=destination_storagedomain,
> async=False)
>     disk.copy(action=action)
> 
> Maybe this action is not setting the profile and target fields?

Hi,
What do you mean by "each night copies all the templates to all of the Storage Domains" ?
The disks should be already in all SD after the first run, no ?

--- Additional comment from Fred Rolland on 2016-12-21 18:06:21 IST ---

Can you also describe exactly what your script is doing?
If I try to copy a disk to a storage domain where it already exists, I get an error:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<action>
    <fault>
        <detail>[Cannot copy Virtual Disk. One of the Template Images already exists.]</detail>
        <reason>Operation Failed</reason>
    </fault>
    <status>failed</status>
    <storage_domain>
        <name>NFS_SD10</name>
    </storage_domain>
</action>
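A script that issues such copy actions could detect this kind of fault by inspecting the response body. The following is a minimal sketch using only the Python standard library and the exact XML shown above; `extract_fault` is a hypothetical helper name, and how the response bytes are obtained is left out.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

# The response body quoted above, as bytes so the XML declaration is honoured.
RESPONSE = b"""<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<action>
    <fault>
        <detail>[Cannot copy Virtual Disk. One of the Template Images already exists.]</detail>
        <reason>Operation Failed</reason>
    </fault>
    <status>failed</status>
    <storage_domain>
        <name>NFS_SD10</name>
    </storage_domain>
</action>
"""


def extract_fault(xml_bytes: bytes) -> Optional[Tuple[str, str]]:
    """Return (reason, detail) if the action carries a <fault>, else None."""
    root = ET.fromstring(xml_bytes)
    fault = root.find("fault")
    if fault is None:
        return None
    return (fault.findtext("reason", default=""),
            fault.findtext("detail", default=""))


fault = extract_fault(RESPONSE)
if fault is not None:
    reason, detail = fault
    print("copy failed: %s %s" % (reason, detail))
```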

--- Additional comment from  on 2016-12-22 17:26:50 IST ---

> Hi,
> What do you mean by "each night copies all the templates to all of the
> Storage Domains" ?
> The disks should be already in all SD after the first run, no ?

Yes for stateful disks, but we work with lots of VMPools and these need the template to be available on each of the DS if you want to be able to migrate their disk to any of them. We also have several iSCSI-based DS, usually not bigger than 500GB to avoid iSCSI performance issues (if you have more than X LVs per DS, performance starts degrading).

I'm on holidays so I have no access to the script that replicates the templates, but basically the algorithm is:

1. Iterate over each template, x
2. Iterate over each DS, y
3. If x is not in y:
4.    copy x to y

If you need to know exactly how it's done I can send it after holidays.
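The algorithm outlined above can be sketched without any SDK calls as a pure planning step. This is an illustration only, not the attached script: `plan_copies` is a hypothetical function, the storage domain names are made up, and the mapping of templates to the domains that already hold them is assumed to have been collected beforehand.

```python
from typing import Dict, List, Set, Tuple


def plan_copies(template_locations: Dict[str, Set[str]],
                storage_domains: List[str]) -> List[Tuple[str, str]]:
    """List (template, storage_domain) pairs that still need a copy."""
    plan = []
    # 1. Iterate over each template, x
    for template in sorted(template_locations):
        present_on = template_locations[template]
        # 2. Iterate over each DS, y
        for domain in storage_domains:
            # 3. If x is not in y: 4. copy x to y
            if domain not in present_on:
                plan.append((template, domain))
    return plan


# Hypothetical inventory: one template, already present on two of three domains.
inventory = {"ubuntu-1404": {"iscsi-1", "iscsi-2"}}
domains = ["iscsi-1", "iscsi-2", "ceph-gw-1"]
print(plan_copies(inventory, domains))
```

Separating the planning from the copy calls also makes it easy to log what a nightly run intends to do before it touches any disk.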

--- Additional comment from Fred Rolland on 2017-01-02 12:20:37 IST ---

(In reply to nicolas from comment #18)
> > Hi,
> > What do you mean by "each night copies all the templates to all of the
> > Storage Domains" ?
> > The disks should be already in all SD after the first run, no ?
> 
> Yes for stateful disks, but we work with lots of VMPools and these need the
> template to be available on each of the DS if you want to be able to migrate
> their disk to any of them. We also have several iSCSI-based DS, usually not
> bigger than 500GB to avoid iSCSI performance issues (if you have more than X
> LVs per DS, performance starts degrading).
> 
> I'm on holidays so I have no access to the script that replicates the
> templates, but basically the algorithm is:
> 
> 1. Iterate over each template, x
> 2. Iterate over each DS, y
> 3. If x is not in y:
> 4.    copy x to y
> 
> If you need to know exactly how it's done I can send it after holidays.

Hi,

Thanks for your reply.
What about the pools? Do they already exist when the script runs?
Are you creating new storage domains and then copying the template's disks?
If you can send the script, that would be great.

Thanks

--- Additional comment from  on 2017-01-02 12:58:29 IST ---

(In reply to Fred Rolland from comment #19)
> (In reply to nicolas from comment #18)
> > > Hi,
> > > What do you mean by "each night copies all the templates to all of the
> > > Storage Domains" ?
> > > The disks should be already in all SD after the first run, no ?
> > 
> > Yes for stateful disks, but we work with lots of VMPools and these need the
> > template to be available on each of the DS if you want to be able to migrate
> > their disk to any of them. We also have several iSCSI-based DS, usually not
> > bigger than 500GB to avoid iSCSI performance issues (if you have more than X
> > LVs per DS, performance starts degrading).
> > 
> > I'm on holidays so I have no access to the script that replicates the
> > templates, but basically the algorithm is:
> > 
> > 1. Iterate over each template, x
> > 2. Iterate over each DS, y
> > 3. If x is not in y:
> > 4.    copy x to y
> > 
> > If you need to know exactly how it's done I can send it after holidays.
> 
> Hi,
> 
> Thanks for your reply.
> What about the pools? Do they already exist when the script runs?

Yes, we have plenty of VmPools since a long time ago because they're used for academic subjects, so all of them are already created when the script runs.

> Are you creating new storage domains and then copying the template's disks?

Yes. We have been creating some SDs lately, and each time an SD is created, the next run of the script sees which templates have not been copied yet (all of them the first time) and copies them to the new SD.

> If you can send the script, that would be great.
> 

Find the script attached to this BZ.

> Thanks

--- Additional comment from  on 2017-01-02 12:59 IST ---



--- Additional comment from Fred Rolland on 2017-01-05 17:11:09 IST ---

Thanks for the script. I used it to try to reproduce the issue, but I could not get to the same situation you encountered.

Can you provide the engine log? Maybe I will be able to find some clue there.

Thanks.

--- Additional comment from  on 2017-01-05 17:31 IST ---

Find the log attached.

I successfully managed to extend one of our pools which I don't seem to have tried before (it's called DEMO), so you'll be able to find in the logs that it has been extended successfully.

Immediately afterwards I tried to extend a different pool (SIGATEST) and the exception showed up again. I believe it didn't generate any event in the engine log but hopefully you'll be able to find something relevant in it that might shed some light on the issue.

--- Additional comment from Fred Rolland on 2017-01-08 17:44:51 IST ---

Thanks for the log. I see there that the update of "Demo" succeeded, but no clues for the other.

Looking at the code, there is a possibility that it is related to permissions.
Which user do you use for editing the pools?
Which user created the storage domains?

BTW, is remote debug an option?

Thanks

--- Additional comment from  on 2017-01-08 17:59:22 IST ---

Hey Fred,

It was all created and edited with the admin@internal user: Storage domains, pools, etc.

Remote debug might be an option, I'll ask my boss to enable an account with permissions for all the relevant servers and get back to you. We're a moderately big institution so maybe it will take some days but as soon as I get a response I'll reach you.

--- Additional comment from Fred Rolland on 2017-01-09 16:23:29 IST ---

Hi,

I finally succeeded in reproducing the issue.

- Create a few Storage Domains
- Create a Template with 1 disk
- Create a Pool from template with 1 VM prestarted
- Copy the template disk using the script provided by Nicolas
- Edit the VM pool

--- Additional comment from  on 2017-01-09 21:15:56 IST ---

Nice!

So no remote debug is needed by now?

May I know what's actually the issue that makes the exception be thrown? It's kind of curious that I can reproduce it on one pool and can't on another, as you could see in the logs.

--- Additional comment from Fred Rolland on 2017-01-15 08:58:17 IST ---

(In reply to nicolas from comment #27)
> Nice!
> 
> So no remote debug is needed by now?
> 
> May I know what's actually the issue that makes the exception be thrown?
> It's kind of curious that I can reproduce it on one pool and can't on
> another, as you could see in the logs.

Hi,
No need for remote debug. Thanks for the goodwill!
It takes quite a specific combination, as I described in the steps to reproduce.
For example, if the pool VMs are not prestarted, it did not reproduce.

In any case, it is a UI-side validation issue.
If you need to update the pool, it will work via the REST API even without this fix.
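As a rough illustration of the REST workaround mentioned above, the sketch below only builds a request and does not send it. Several things here are assumptions that should be checked against the REST API documentation for your engine version: that the pool resource lives at `/ovirt-engine/api/vmpools/{id}`, that it accepts a PUT with a `<vm_pool><size>` body, and that `size` means the total number of VMs in the pool. The engine URL and pool ID are placeholders, `build_pool_resize_request` is a hypothetical helper, and authentication is omitted.

```python
import urllib.request
import xml.etree.ElementTree as ET


def build_pool_resize_request(engine_url: str, pool_id: str,
                              new_size: int) -> urllib.request.Request:
    """Build (but do not send) a PUT request that sets the pool size."""
    # Assumed body shape: <vm_pool><size>N</size></vm_pool>
    root = ET.Element("vm_pool")
    ET.SubElement(root, "size").text = str(new_size)
    body = ET.tostring(root)

    # Assumed resource path for a VM pool.
    url = "{}/ovirt-engine/api/vmpools/{}".format(engine_url.rstrip("/"), pool_id)
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/xml", "Accept": "application/xml"},
    )


# Placeholder engine URL and pool ID.
request = build_pool_resize_request("https://engine.example.com", "POOL-ID", 2)
print(request.get_method(), request.full_url)
print(request.data.decode("ascii"))
```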

--- Additional comment from Sandro Bonazzola on 2017-01-25 09:55:01 IST ---

4.0.6 has been the last oVirt 4.0 release, please re-target this bug.

Comment 1 Raz Tamir 2017-01-26 12:32:34 UTC
Verified on rhevm-4.0.7-0.1.el7ev

Increased the number of VMs in a VM pool - works fine

Comment 3 errata-xmlrpc 2017-03-16 15:31:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2017-0542.html

