Bug 1966535

Summary: NullPointerException when trying to delete uploaded disks with using transfer_url
Product: [oVirt] ovirt-engine
Reporter: Ilan Zuckerman <izuckerm>
Component: BLL.Storage
Assignee: Bella Khizgiyaev <bkhizgiy>
Status: CLOSED CURRENTRELEASE
QA Contact: sshmulev
Severity: medium
Docs Contact:
Priority: high
Version: future
CC: aefrat, bugs, dfodor, eshames, eshenitz, michal.skrivanek
Target Milestone: ovirt-4.4.8
Keywords: Automation, Regression, ZStream
Target Release: ---
Flags: pm-rhel: ovirt-4.4+, michal.skrivanek: blocker-
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-08-19 06:23:13 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Attachments: Logs (flags: none)

Description Ilan Zuckerman 2021-06-01 11:04:36 UTC
Created attachment 1788466 [details]
Logs

Description of problem:

A NullPointerException appears in the engine log when removing disks that were earlier uploaded to NFS storage using transfer_url.
The disks are deleted AFTER the checksums of the uploaded disks are taken, so that might be the reason for the NPE.
This NPE was not seen when deleting disks uploaded with proxy_url.
This NPE was also seen when deleting disks (with proxy) on a block storage type (iSCSI).

Despite the NPE in the engine log, the disk removal itself finishes successfully, and the engine UI shows no sign of a failure either.

Marking this as a regression because this NPE was not seen earlier. Although it is a regression, I am marking it as 'medium' because it does not appear to affect the disk upload or the removal of the disks.
However, it could affect other areas that we are not yet aware of, and the NPE should not be there if everything works correctly.

Version-Release number of selected component (if applicable):
rhv-4.4.6-10

How reproducible:
100%

Steps to Reproduce:
1. Starting parallel upload of disks ['/root/upload/qcow2_v2_rhel74_ovirt42_guest_disk_1G.qcow2', '/root/upload/qcow2_v3_cow_sparse_disk_1G.qcow2', '/root/upload/test_raw_to_delete.raw', '/root/upload/1G_Fedora-Workstation-Live-x86_64-25-1.3.iso']
2. Check image checksums for local images: ['qcow2_v2_rhel74_ovirt42_guest_disk_1G.qcow2', 'qcow2_v3_cow_sparse_disk_1G.qcow2', 'test_raw_to_delete.raw', '1G_Fedora-Workstation-Live-x86_64-25-1.3.iso']
3. Check disk checksums for the uploaded disks: ['qcow2_v2_rhel74_ovirt42_guest_disk_1G.qcow2', 'qcow2_v3_cow_sparse_disk_1G.qcow2', 'test_raw_to_delete.raw', '1G_Fedora-Workstation-Live-x86_64-25-1.3.iso']


Actual results:
NPE

2021-06-01 13:42:58,077+03 INFO  [org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-1) [a74c219e-622e-45a0-b05c-c80437802c13] Exception in invoking callback of command TransferDiskImage (922f35a5-a542-4c79-9a0e-0be0d920f84d): NullPointerException: 
2021-06-01 13:42:58,077+03 ERROR [org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-1) [a74c219e-622e-45a0-b05c-c80437802c13] Error invoking callback method 'doPolling' for 'ACTIVE' command '922f35a5-a542-4c79-9a0e-0be0d920f84d'
2021-06-01 13:42:58,077+03 ERROR [org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-1) [a74c219e-622e-45a0-b05c-c80437802c13] Exception: java.lang.NullPointerException
        at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand.getImageAlias(TransferDiskImageCommand.java:326)
        at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand.getTransferDescription(TransferDiskImageCommand.java:1401)
        at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand.handleFinishedSuccess(TransferDiskImageCommand.java:904)
        at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand.executeStateHandler(TransferDiskImageCommand.java:535)
        at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand.proceedCommandExecution(TransferDiskImageCommand.java:492)
        at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.storage.disk.image.TransferImageCommandCallback.doPolling(TransferImageCommandCallback.java:21)
        at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller.invokeCallbackMethodsImpl(CommandCallbacksPoller.java:175)
        at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller.invokeCallbackMethods(CommandCallbacksPoller.java:109)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
        at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
        at org.glassfish.javax.enterprise.concurrent@1.0.0.redhat-1//org.glassfish.enterprise.concurrent.internal.ManagedScheduledThreadPoolExecutor$ManagedScheduledFutureTask.access$201(ManagedScheduledThreadPoolExecutor.java:383)
        at org.glassfish.javax.enterprise.concurrent@1.0.0.redhat-1//org.glassfish.enterprise.concurrent.internal.ManagedScheduledThreadPoolExecutor$ManagedScheduledFutureTask.run(ManagedScheduledThreadPoolExecutor.java:534)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:829)
        at org.glassfish.javax.enterprise.concurrent@1.0.0.redhat-1//org.glassfish.enterprise.concurrent.ManagedThreadFactoryImpl$ManagedThread.run(ManagedThreadFactoryImpl.java:250)




Expected results:
There should be no NPE.
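
For illustration only, here is a minimal Java sketch of the failure mode that the stack trace above suggests. It assumes (this report does not confirm it) that the lookup behind getImageAlias returns null once the disk has been removed. The class, field, and method names below are hypothetical stand-ins, not engine code.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-ins; none of these types are the engine's real classes.
class AliasLookupSketch {
    static final class Disk {
        final String id;
        final String alias;

        Disk(String id, String alias) {
            this.id = id;
            this.alias = alias;
        }
    }

    // Stand-in for the disk table: disk id -> disk.
    static final Map<String, Disk> disks = new ConcurrentHashMap<>();

    // Unsafe variant: dereferences the lookup result directly,
    // so it throws a NullPointerException when the disk is gone.
    static String unsafeAlias(String diskId) {
        return disks.get(diskId).alias;
    }

    // Null-safe variant: falls back to the disk id when the disk
    // (or its alias) is missing.
    static String safeAlias(String diskId) {
        return Optional.ofNullable(disks.get(diskId))
                .map(d -> d.alias)
                .orElse(diskId);
    }

    public static void main(String[] args) {
        disks.put("d1", new Disk("d1", "test_raw_to_delete.raw"));
        System.out.println(safeAlias("d1"));

        disks.remove("d1"); // the disk is deleted before the callback runs
        System.out.println(safeAlias("d1"));

        try {
            unsafeAlias("d1");
        } catch (NullPointerException e) {
            System.out.println("NPE from unsafe lookup");
        }
    }
}
```

Whether a fallback value or skipping the description entirely is the right behavior is a design decision for the engine; the sketch only shows that the callback can tolerate a missing disk.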

Additional info:
Attaching engine + vdsm + daemon logs from all involved.

Comment 1 RHEL Program Management 2021-06-14 09:51:07 UTC
This bug report has Keywords: Regression or TestBlocker.
Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.

Comment 2 Bella Khizgiyaev 2021-08-12 08:02:51 UTC
The steps listed above do not reproduce the bug; the only way to reproduce it was by debugging and adding breakpoints to the transfer thread.

The following scenarios need to be checked:

1. Initiate a disk upload -> as soon as the disk status becomes OK, try to delete it.
2. Initiate a disk download -> as soon as the disk status becomes OK, try to delete it.

No NPE should be raised in either scenario.
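
The timing in these scenarios can be sketched as a small, deterministic Java simulation. It is not engine code: the map, the latches, and the method name are hypothetical, and the latch that pauses the polling thread stands in for the debugger breakpoint mentioned above. The sketch assumes the finalization step reads the disk alias after the disk has already become OK.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical simulation; names do not correspond to engine classes.
class DeleteDuringFinalizeSketch {
    // Stand-in for the disk table: disk id -> alias.
    static final Map<String, String> disks = new ConcurrentHashMap<>();

    // Runs one transfer whose finalization is paused (like a breakpoint)
    // after the disk becomes OK, deletes the disk during the pause, and
    // returns the description that the finalization step produced.
    static String runScenario(String diskId, String alias) {
        disks.put(diskId, alias);
        CountDownLatch diskIsOk = new CountDownLatch(1);
        CountDownLatch diskDeleted = new CountDownLatch(1);
        ExecutorService poller = Executors.newSingleThreadExecutor();
        try {
            Future<String> description = poller.submit(() -> {
                diskIsOk.countDown();               // disk status becomes OK
                diskDeleted.await();                // "breakpoint": wait for the delete
                String current = disks.get(diskId); // null once the disk is deleted
                return "Transfer of "
                        + (current != null ? current : "disk " + diskId);
            });
            diskIsOk.await();
            disks.remove(diskId);                   // delete as soon as the disk is OK
            diskDeleted.countDown();
            return description.get();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            poller.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(runScenario("d1", "test_raw_to_delete.raw"));
    }
}
```

Because the delete always lands inside the pause, the finalization step sees a missing disk on every run, which makes the null handling easy to check without a debugger.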

Comment 4 sshmulev 2021-08-16 14:25:21 UTC
Verified.

Versions:
ovirt-engine-4.4.8.4-0.7.el8ev.noarch

Steps to Reproduce:
The same steps as described in the bug Description, which were run by the automation test "TestUploadImages".
First, reproduced the NullPointerException on an older version without the fix (ovirt-engine-4.4.8.3-0.10.el8ev.noarch):

2021-08-16 16:18:47,423+03 ERROR [org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-5) [b5864677-df89-4554-9a92-b2ad41ff3164] Exception: java.lang.NullPointerException
        at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand.getImageAlias(TransferDiskImageCommand.java:326)
        at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand.getTransferDescription(TransferDiskImageCommand.java:1401)
        at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand.handleFinishedSuccess(TransferDiskImageCommand.java:904)
        at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand.executeStateHandler(TransferDiskImageCommand.java:535)
        at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand.proceedCommandExecution(TransferDiskImageCommand.java:492)
        at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.storage.disk.image.TransferImageCommandCallback.doPolling(TransferImageCommandCallback.java:21)
        at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller.invokeCallbackMethodsImpl(CommandCallbacksPoller.java:175)
        at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller.invokeCallbackMethods(CommandCallbacksPoller.java:109)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
        at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
        at org.glassfish.javax.enterprise.concurrent@1.0.0.redhat-1//org.glassfish.enterprise.concurrent.internal.ManagedScheduledThreadPoolExecutor$ManagedScheduledFutureTask.access$201(ManagedScheduledThreadPoolExecutor.java:383)
        at org.glassfish.javax.enterprise.concurrent@1.0.0.redhat-1//org.glassfish.enterprise.concurrent.internal.ManagedScheduledThreadPoolExecutor$ManagedScheduledFutureTask.run(ManagedScheduledThreadPoolExecutor.java:534)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:829)
        at org.glassfish.javax.enterprise.concurrent@1.0.0.redhat-1//org.glassfish.enterprise.concurrent.ManagedThreadFactoryImpl$ManagedThread.run(ManagedThreadFactoryImpl.java:250)


Then ran it twice on the fixed version (ovirt-engine-4.4.8.4-0.7.el8ev.noarch); both runs succeeded with no NPE.
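
A check like this can also be scripted. Below is a minimal Java sketch that counts log entries matching the trace above: an "Exception: java.lang.NullPointerException" line immediately followed by a TransferDiskImageCommand.getImageAlias frame. The class name is hypothetical and the log path is taken from the command line; this is not part of the automation test.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper for scanning an engine log; not part of any test suite.
class NpeLogScanSketch {
    static final String EXCEPTION_MARKER = "Exception: java.lang.NullPointerException";
    static final String FRAME_MARKER = "TransferDiskImageCommand.getImageAlias";

    // Counts NPE log entries whose first stack frame is getImageAlias.
    static int countAliasNpes(List<String> lines) {
        int count = 0;
        for (int i = 0; i + 1 < lines.size(); i++) {
            if (lines.get(i).contains(EXCEPTION_MARKER)
                    && lines.get(i + 1).contains(FRAME_MARKER)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: NpeLogScanSketch <path-to-engine.log>");
            return;
        }
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        System.out.println("matching NPE entries: " + countAliasNpes(lines));
    }
}
```

A count of zero after running the upload and delete steps is consistent with the fix, though it only covers this one stack trace signature.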

Comment 5 Sandro Bonazzola 2021-08-19 06:23:13 UTC
This bugzilla is included in oVirt 4.4.8 release, published on August 19th 2021.

Since the problem described in this bug report should be resolved in oVirt 4.4.8 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.