Bug 1109749
| Summary: | User fails to get attached to a prestarted pool when the messages parameter of canRunVm is null - NPE is thrown |  |  |
|---|---|---|---|
| Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Tomas Dosek <tdosek> |
| Component: | ovirt-engine | Assignee: | Arik <ahadas> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Ilanit Stein <istein> |
| Severity: | medium | Docs Contact: |  |
| Priority: | high |  |  |
| Version: | 3.3.0 | CC: | aberezin, acathrow, ederevea, eedri, gchaplik, gwatson, iheim, lpeer, mavital, michal.skrivanek, ofrenkel, Rhev-m-bugs, tdosek, yeylon, yobshans |
| Target Milestone: | --- | Keywords: | ZStream |
| Target Release: | 3.4.1 | Flags: | istein: needinfo+ |
| Hardware: | Unspecified |  |  |
| OS: | Unspecified |  |  |
| Whiteboard: | virt |  |  |
| Fixed In Version: | ovirt-engine-3.5.0_beta | Doc Type: | Bug Fix |
| Doc Text: |  | Story Points: | --- |
| Clone Of: |  | Environment: |  |
| Last Closed: | 2014-08-05 08:25:36 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: |  |
| Verified Versions: |  | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |  |
| Cloudforms Team: | --- | Target Upstream Version: |  |
| Embargoed: |  |  |  |
| Attachments: |  |  |  |
There is no flow where the passed 'messages' can be null. In 3.3.3, line 229 of VmPoolCommandBase is:

```java
runVmParams.setUseVnc(osRepository.isLinux(vm.getVmOsId()) || vm.getVmType() == VmType.Server);
```

Since the VM can't be null (because of a prior check), the only option is that osRepository is null. We suspect that loading of the os-repository file failed. Please:

1. Check with the customer whether he can edit/create a VM.
2. Check whether the customer edited the os-repository file.

OK, so we discovered that there was a compensation for AddVmAndAttachToPool. It means we touch the VmPoolCommandBase class, so its static fields are initialized, and this happens before we initialize os-info.

So in order to reproduce this bug:

1. Define a pool with prestarted VMs.
2. While the VMs are being prestarted, reboot the engine (and hope that it will be in the middle of one of the operations).

After the reboot, you won't be able to attach users to prestarted VMs.

I'll move the os-info initialization to be executed before the compensation.

(In reply to Arik from comment #5)
> So in order to reproduce this bug:
> 1. define pool with prestarted VMs
> 2. while VMs are being prestarted, reboot the engine (and hope that it will
> be in the middle of one of the operations)

Correction: the problematic operation is AttachUserToVmFromPoolAndRunCommand, so you need to attach users to VMs from a pool and restart the engine in the middle.

> After the reboot, you won't be able to attach users to prestarted VMs.
>
> I'll move the os-info initialization to be executed before the compensation.
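The class-initialization ordering described in this thread can be illustrated with a minimal, self-contained sketch. All class names below are hypothetical stand-ins for illustration, not the actual oVirt classes: a static field captures the value of its dependency when the class is first touched, so if compensation loads the command class before the registry is populated, the captured reference stays null.

```java
// Minimal sketch of the static-initialization ordering pitfall described
// above. All classes here are hypothetical stand-ins, not oVirt code.
public class InitOrderDemo {

    // Stand-in for the global os-info registry, populated during engine startup.
    static class OsInfo {
        static Object osRepository; // still null until startup initializes it
    }

    // Stand-in for VmPoolCommandBase: its static field captures the registry
    // value at class-initialization time, i.e. the first time the class is used.
    static class PoolCommand {
        static final Object osRepository = OsInfo.osRepository;
    }

    public static void main(String[] args) {
        // Compensation on startup touches the command class first...
        System.out.println(PoolCommand.osRepository); // null - captured too early

        // ...and only afterwards is os-info initialized.
        OsInfo.osRepository = new Object();

        // The command class still holds the stale null reference, so a later
        // call like osRepository.isLinux(...) throws a NullPointerException.
        System.out.println(PoolCommand.osRepository); // still null
    }
}
```

Under this reading, running the os-info initialization before compensation (the fix direction mentioned above) ensures the command class observes a non-null repository when its static fields are initialized.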
(In reply to Arik from comment #6)
> correction: the problematic operation is AttachUserToVmFromPoolAndRunCommand
> so you need to attach users to VMs from pool and restart the engine in the
> middle

The compensation is created in AddPermissionsCommand, so it is important that the engine is restarted after the compensation was created in that command.

This was tested on 3.4.1 (av10):

1. Add a VM pool via webadmin.
2. Add permission 'everyone', 'UserRole' to the VM pool via webadmin.
3. Connect via the user portal and run a VM.
4. Immediately afterwards, restart the ovirt-engine service.

engine.log shows the engine restart was not done in the desired timing, which is between these 2 steps:

```
2014-07-03 14:46:38,880 INFO  [org.ovirt.engine.core.bll.AddPermissionCommand] (ajp-/127.0.0.1:8702-1) [23a06d0f] Running command: AddPermissionCommand internal: true. Entities affected : ID: b1fd528e-804b-4b4e-86de-d1a8f53e6412 Type: VM
2014-07-03 14:46:38,891 INFO  [org.ovirt.engine.core.bll.AttachUserToVmFromPoolAndRunCommand] (ajp-/127.0.0.1:8702-1) [23a06d0f] Succceeded giving user fdfc627c-d875-11e0-90f0-83df133b58cc permission to Vm b1fd528e-804b-4b4e-86de-d1a8f53e6412
```

I've tried several times, on the user portal, to stop the VM, start the VM, and restart the engine, but couldn't create the desired state of having the engine restart happen between the 2 steps mentioned above.

Gil, I recommend moving this bug to the QE scale team, to run many VMs, try to stop & start them, and restart the engine in parallel.

We tested on RHEVM version 3.4.1-0.23.el6ev:

1. Created a pool with 50 VMs.
2. Added 50 users to the pool (UserRole).
3. Ran a JMeter script performing 50 concurrent threads which call the REST API /api/vmpools/765e3c19-a118-49cc-bc83-3ae56016c668/allocatevm with different user authorizations.
4. Immediately restarted the engine.
5. Analyzed the engine.log file.

Bug was verified by the test in comment #10: the engine restart was done in the middle of the add-permission command, and there was a rollback and a compensation, as expected:

```
2014-07-08 15:41:20,798 ERROR [org.ovirt.engine.core.bll.AddPermissionCommand] (ajp-/127.0.0.1:8702-12) Transaction rolled-back for command: org.ovirt.engine.core.bll.AddPermissionCommand.
2014-07-08 15:41:20,798 ERROR [org.ovirt.engine.core.bll.AttachUserToVmFromPoolAndRunCommand] (ajp-/127.0.0.1:8702-12) Transaction rolled-back for command: org.ovirt.engine.core.bll.AttachUserToVmFromPoolAndRunCommand.
2014-07-08 15:41:27,607 INFO  [org.ovirt.engine.core.bll.AttachUserToVmFromPoolAndRunCommand] (ServerService Thread Pool -- 47) Command [id=df6e8b71-3d00-473d-b1f5-968fa5f007d2]: Compensating NEW_ENTITY_ID of org.ovirt.engine.core.common.businessentities.Permissions; snapshot: 9d466422-988a-4561-a098-cec04959c99f.
2014-07-08 15:41:27,620 INFO  [org.ovirt.engine.core.bll.Backend] (ServerService Thread Pool -- 47) Running compensation on startup for Command : org.ovirt.engine.core.bll.AttachUserToVmFromPoolAndRunCommand , Command Id : df6e8b71-3d00-473d-b1f5-968fa5f007d2
```

Created attachment 916383 [details]
verification engine.log
This bugzilla was mistakenly not part of the official RHEV 3.4.1 errata announcement. Please consider this BZ as released by the relevant errata. More information is available via the following article: https://access.redhat.com/solutions/1155243

All these bugs were fixed and released as part of 3.4.1, and weren't closed since they weren't included in any errata. Closing as current release.
Description of problem:

The engine validator canRunVm cannot handle the situation of the messages parameter being null. This, under certain circumstances, blocks users from being attached to a pool VM, and as a result they are not able to get a VM from the pool.

Version-Release number of selected component (if applicable):
3.3.3

How reproducible:
Not sure of the reproducer. In the customer's use case, any pool created under 3.3 compatibility mode exhibits this behavior.

Additional info:

```
2014-06-02 09:05:51,603 INFO  [org.ovirt.engine.core.bll.AttachUserToVmFromPoolAndRunCommand] (ajp-/127.0.0.1:8702-41) [343c44e3] Lock freed to object EngineLock [exclusiveLocks= key: 2ce9ba7c-bc2c-4b36-afdd-44122092b16d value: USER_VM_POOL , sharedLocks= ]
2014-06-02 09:05:51,687 INFO  [org.ovirt.engine.core.bll.AttachUserToVmFromPoolAndRunCommand] (ajp-/127.0.0.1:8702-16) [4aa9b181] Lock Acquired to object EngineLock [exclusiveLocks= key: 3486bf14-1e2a-49d8-a3b8-53e4edee3f0e value: USER_VM_POOL , sharedLocks= ]
2014-06-02 09:05:51,932 ERROR [org.ovirt.engine.core.bll.AttachUserToVmFromPoolAndRunCommand] (ajp-/127.0.0.1:8702-16) [4aa9b181] Error during CanDoActionFailure.: java.lang.NullPointerException
    at org.ovirt.engine.core.bll.VmPoolCommandBase.canRunPoolVm(VmPoolCommandBase.java:229) [bll.jar:]
    at org.ovirt.engine.core.bll.VmPoolCommandBase.isVmFree(VmPoolCommandBase.java:139) [bll.jar:]
    at org.ovirt.engine.core.bll.VmPoolCommandBase.canAttachNonPrestartedVmToUser(VmPoolCommandBase.java:105) [bll.jar:]
    at org.ovirt.engine.core.bll.AttachUserToVmFromPoolAndRunCommand.getNonPrestartedVmToAttach(AttachUserToVmFromPoolAndRunCommand.java:127) [bll.jar:]
    at org.ovirt.engine.core.bll.AttachUserToVmFromPoolAndRunCommand.getVmToAttach(AttachUserToVmFromPoolAndRunCommand.java:104) [bll.jar:]
    at org.ovirt.engine.core.bll.AttachUserToVmFromPoolAndRunCommand.canDoAction(AttachUserToVmFromPoolAndRunCommand.java:64) [bll.jar:]
    at org.ovirt.engine.core.bll.CommandBase.internalCanDoAction(CommandBase.java:741) [bll.jar:]
    at org.ovirt.engine.core.bll.CommandBase.executeAction(CommandBase.java:356) [bll.jar:]
    at org.ovirt.engine.core.bll.Backend.runAction(Backend.java:416) [bll.jar:]
    at org.ovirt.engine.core.bll.Backend.runActionImpl(Backend.java:395) [bll.jar:]
    at org.ovirt.engine.core.bll.Backend.RunAction(Backend.java:353) [bll.jar:]
    at sun.reflect.GeneratedMethodAccessor187.invoke(Unknown Source) [:1.7.0_51]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.7.0_51]
    at java.lang.reflect.Method.invoke(Method.java:606) [rt.jar:1.7.0_51]
    at org.jboss.as.ee.component.ManagedReferenceMethodInterceptorFactory$ManagedReferenceMethodInterceptor.processInvocation(ManagedReferenceMethodInterceptorFactory.java:72) [jboss-as-ee.jar:7.3.1.Final-redhat-3]
```

Taking a look into the code, the NullPointerException is thrown from the canDoAction method, which checks the availability of a pool VM in the AttachUserToVmFromPoolAndRunCommand class:

```java
protected boolean canDoAction() {
    boolean returnValue = true;
    synchronized (_lockObject) {
        // no available VMs:
        if (Guid.Empty.equals(getVmToAttach(getParameters().getVmPoolId()))) {
            addCanDoActionMessage(VdcBllMessages.ACTION_TYPE_FAILED_NO_AVAILABLE_POOL_VMS);
            returnValue = false;
        }
    }
```

This means the NPE has to be thrown by the following clause:

```java
if (Guid.Empty.equals(getVmToAttach(getParameters().getVmPoolId())))
```

The following code is the one throwing the NPE:

```java
protected static boolean canRunPoolVm(Guid vmId, ArrayList<String> messages) {
    VM vm = DbFacade.getInstance().getVmDao().get(vmId);
    if (vm == null) {
        messages.add(VdcBllMessages.ACTION_TYPE_FAILED_VM_NOT_FOUND.name());
        return false;
    }

    // TODO: This is done to keep consistency with VmDAO.getById.
    // It can probably be removed, but that requires some more research
    VmHandler.updateNetworkInterfacesFromDb(vm);
    RunVmParams runVmParams = new RunVmParams(vmId);
    runVmParams.setUseVnc(osRepository.isLinux(vm.getVmOsId()) || vm.getVmType() == VmType.Server);

    return new RunVmValidator(vm, runVmParams, false, findActiveISODomain(vm.getStoragePoolId()))
            .canRunVm(
                    messages, // ---------> it looks like the messages parameter of canRunVm is null
                    fetchStoragePool(vm.getStoragePoolId()),
                    Collections.<Guid> emptyList(),
                    null,
                    null,
                    DbFacade.getInstance().getVdsGroupDao().get(vm.getVdsGroupId()));
}
```

Checking lower-level code, it looks like the validator expects messages to be a list, but does not handle the case where the list is null, so no messages can be populated in it:

```java
public boolean canRunVm(List<String> messages, StoragePool storagePool, List<Guid> vdsBlackList,
        List<Guid> vdsWhiteList, Guid destVds, VDSGroup vdsGroup) {

    if (vm.getStatus() == VMStatus.Paused) {
        // if the VM is paused, we should only check the VDS status
        // as the rest of the checks were already checked before
        return validate(validateVdsStatus(vm), messages);
    }

    return validateVmProperties(vm, runVmParam.getCustomProperties(), messages) &&
            validate(validateBootSequence(vm, runVmParam.getBootSequence(), getVmDisks(), activeIsoDomainId), messages) &&
            validate(validateDisplayType(), messages) &&
            validate(new VmValidator(vm).vmNotLocked(), messages) &&
            validate(getSnapshotValidator().vmNotDuringSnapshot(vm.getId()), messages) &&
            validate(validateVmStatusUsingMatrix(vm), messages) &&
            validate(validateStoragePoolUp(vm, storagePool, getVmImageDisks()), messages) &&
            validate(validateIsoPath(vm, runVmParam.getDiskPath(), runVmParam.getFloppyPath(), activeIsoDomainId), messages) &&
            validate(vmDuringInitialization(vm), messages) &&
            validate(validateStatelessVm(vm, getVmDisks(), runVmParam.getRunAsStateless()), messages) &&
            validate(validateStorageDomains(vm, isInternalExecution, getVmImageDisks()), messages) &&
            validate(validateImagesForRunVm(vm, getVmImageDisks()), messages) &&
            SchedulingManager.getInstance().canSchedule(
                    vdsGroup, vm, vdsBlackList, vdsWhiteList, destVds, messages);
}
```
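Independent of the eventual root cause (a null osRepository, per comment #5), the reporter's observation stands: the validate chain would NPE on messages.add(...) if a null list were ever passed. A defensive variant could substitute an empty list so that failure messages are silently dropped rather than crashing. The sketch below is illustrative only, not the actual oVirt fix, and the helper name orEmpty is invented for this example:

```java
import java.util.ArrayList;
import java.util.List;

// Hedged sketch: a null-tolerant variant of the validate(result, messages)
// accumulation pattern shown above. Not the actual oVirt code; the helper
// 'orEmpty' is a hypothetical name introduced for illustration.
public class NullSafeValidation {

    // Substitute a throwaway list when the caller passed null.
    static List<String> orEmpty(List<String> messages) {
        return messages != null ? messages : new ArrayList<String>();
    }

    // Mirrors the validate(...) idiom: record a failure message (if there is
    // somewhere to record it) and propagate the boolean result unchanged.
    static boolean validate(boolean valid, List<String> messages, String failure) {
        if (!valid) {
            orEmpty(messages).add(failure);
        }
        return valid;
    }

    public static void main(String[] args) {
        // Normal path: the failure message is accumulated into the list.
        List<String> messages = new ArrayList<String>();
        validate(false, messages, "ACTION_TYPE_FAILED_VM_NOT_FOUND");
        System.out.println(messages); // [ACTION_TYPE_FAILED_VM_NOT_FOUND]

        // Null path: the failure is still reported via the return value,
        // but no NullPointerException is thrown.
        System.out.println(validate(false, null, "ANY_FAILURE")); // false
    }
}
```

The trade-off of this approach is that validation failures become invisible to the caller who passed null, which is why fixing the initialization order (so the real dependency is never null) was the direction taken in the bug itself.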