Bug 1966535 - NullPointerException when trying to delete uploaded disks with using transfer_url
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: BLL.Storage
Version: future
Hardware: x86_64
OS: Linux
Priority: high
Severity: medium
Target Milestone: ovirt-4.4.8
Assignee: Bella Khizgiyaev
QA Contact: sshmulev
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-06-01 11:04 UTC by Ilan Zuckerman
Modified: 2021-08-19 06:23 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-08-19 06:23:13 UTC
oVirt Team: Storage
pm-rhel: ovirt-4.4+
michal.skrivanek: blocker-


Attachments
Logs (361.95 KB, application/x-xz)
2021-06-01 11:04 UTC, Ilan Zuckerman


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 116064 0 master MERGED core: avoid passing lock to child command for image transfer 2021-08-09 13:20:53 UTC

Description Ilan Zuckerman 2021-06-01 11:04:36 UTC
Created attachment 1788466
Logs

Description of problem:

An NPE appears in the engine log when removing disks that were previously uploaded to an NFS storage type using transfer_url.
The disks are deleted AFTER the checksums of the uploaded disks are taken, so that may be what triggers the NPE.
The NPE was not seen when deleting disks uploaded with proxy_url.
It was, however, also seen when deleting disks (uploaded with proxy) on a block storage type (iSCSI).

Despite the NPE in the engine log, the disk removal itself completes successfully, and the engine UI shows no sign of a failure.

Marking this as a regression because this NPE was not seen earlier. Although it is a regression, I am marking it as 'medium' because it does not appear to affect the disk upload or the disk removal.
However, it could affect other areas that we are not yet aware of, and the NPE should not be there if everything works correctly.

Version-Release number of selected component (if applicable):
rhv-4.4.6-10

How reproducible:
100%

Steps to Reproduce:
1. Starting parallel upload of disks ['/root/upload/qcow2_v2_rhel74_ovirt42_guest_disk_1G.qcow2', '/root/upload/qcow2_v3_cow_sparse_disk_1G.qcow2', '/root/upload/test_raw_to_delete.raw', '/root/upload/1G_Fedora-Workstation-Live-x86_64-25-1.3.iso']
2. Check image checksums for local images: ['qcow2_v2_rhel74_ovirt42_guest_disk_1G.qcow2', 'qcow2_v3_cow_sparse_disk_1G.qcow2', 'test_raw_to_delete.raw', '1G_Fedora-Workstation-Live-x86_64-25-1.3.iso']
3. Check disk checksums for the uploaded disks: ['qcow2_v2_rhel74_ovirt42_guest_disk_1G.qcow2', 'qcow2_v3_cow_sparse_disk_1G.qcow2', 'test_raw_to_delete.raw', '1G_Fedora-Workstation-Live-x86_64-25-1.3.iso']


Actual results:
An NPE is logged in the engine log:

2021-06-01 13:42:58,077+03 INFO  [org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-1) [a74c219e-622e-45a0-b05c-c80437802c13] Exception in invoking callback of command TransferDiskImage (922f35a5-a542-4c79-9a0e-0be0d920f84d): NullPointerException: 
2021-06-01 13:42:58,077+03 ERROR [org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-1) [a74c219e-622e-45a0-b05c-c80437802c13] Error invoking callback method 'doPolling' for 'ACTIVE' command '922f35a5-a542-4c79-9a0e-0be0d920f84d'
2021-06-01 13:42:58,077+03 ERROR [org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-1) [a74c219e-622e-45a0-b05c-c80437802c13] Exception: java.lang.NullPointerException
        at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand.getImageAlias(TransferDiskImageCommand.java:326)
        at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand.getTransferDescription(TransferDiskImageCommand.java:1401)
        at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand.handleFinishedSuccess(TransferDiskImageCommand.java:904)
        at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand.executeStateHandler(TransferDiskImageCommand.java:535)
        at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand.proceedCommandExecution(TransferDiskImageCommand.java:492)
        at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.storage.disk.image.TransferImageCommandCallback.doPolling(TransferImageCommandCallback.java:21)
        at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller.invokeCallbackMethodsImpl(CommandCallbacksPoller.java:175)
        at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller.invokeCallbackMethods(CommandCallbacksPoller.java:109)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
        at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
        at org.glassfish.javax.enterprise.concurrent@1.0.0.redhat-1//org.glassfish.enterprise.concurrent.internal.ManagedScheduledThreadPoolExecutor$ManagedScheduledFutureTask.access$201(ManagedScheduledThreadPoolExecutor.java:383)
        at org.glassfish.javax.enterprise.concurrent@1.0.0.redhat-1//org.glassfish.enterprise.concurrent.internal.ManagedScheduledThreadPoolExecutor$ManagedScheduledFutureTask.run(ManagedScheduledThreadPoolExecutor.java:534)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:829)
        at org.glassfish.javax.enterprise.concurrent@1.0.0.redhat-1//org.glassfish.enterprise.concurrent.ManagedThreadFactoryImpl$ManagedThread.run(ManagedThreadFactoryImpl.java:250)
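
To triage a trace like the one above, it can help to pull out the first frame that belongs to the engine's own code. The following is a minimal sketch, not part of the engine or of any oVirt tooling: the names `FRAME_RE`, `parse_frame` and `first_frame_in` are hypothetical, and the regular expression only assumes the frame layout visible in this report.

```python
import re

# Hypothetical helper, written against the frame layout seen in this report:
#   at <module>//<class>.<method>(<File>.java:<line>)
# The "<module>//" or "<module>/" prefix is optional.
FRAME_RE = re.compile(
    r"^\s*at\s+(?:(?P<module>[^\s/]+)//?)?"
    r"(?P<cls>[\w.$]+)\.(?P<method>[\w$<>]+)"
    r"\((?P<file>[^:()]+):(?P<line>\d+)\)\s*$"
)


def parse_frame(line):
    """Return a dict describing one stack frame, or None if the line is not a frame."""
    m = FRAME_RE.match(line)
    if m is None:
        return None
    return {
        "class": m["cls"],
        "method": m["method"],
        "file": m["file"],
        "line": int(m["line"]),
    }


def first_frame_in(lines, package_prefix):
    """Return the first frame whose class name starts with package_prefix."""
    for line in lines:
        frame = parse_frame(line)
        if frame is not None and frame["class"].startswith(package_prefix):
            return frame
    return None
```

Applied to the trace above with the prefix `org.ovirt.engine`, this would point at `getImageAlias` in `TransferDiskImageCommand.java`, which is where the null dereference is reported.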




Expected results:
There should be no NPE.

Additional info:
Attaching engine + vdsm + daemon logs from all involved.

Comment 1 RHEL Program Management 2021-06-14 09:51:07 UTC
This bug report has Keywords: Regression or TestBlocker.
Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.

Comment 2 Bella Khizgiyaev 2021-08-12 08:02:51 UTC
The steps listed above do not reproduce the bug; the only way to reproduce it was by debugging and adding breakpoints to the transfer thread.

The following scenarios need to be checked:

1. Initiate a disk upload -> as soon as the disk status becomes OK, try to delete it.
2. Initiate a disk download -> as soon as the disk status becomes OK, try to delete it.

No NPE should be raised in either scenario.
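
The "delete as soon as the status becomes OK" part of both scenarios could be scripted roughly as below. This is only a sketch under stated assumptions: `delete_when_ok` is a hypothetical helper, `get_status` and `remove` are caller-supplied callables (how they talk to the engine is deliberately left out), and the status string `"ok"` is an assumption about what `get_status` returns.

```python
import time


def delete_when_ok(get_status, remove, timeout=60.0, interval=0.5,
                   sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() and call remove() the first time it returns "ok".

    get_status and remove are hypothetical callables supplied by the caller.
    Returns True if remove() was called, False if the timeout expired first.
    """
    deadline = clock() + timeout
    while True:
        if get_status() == "ok":
            # Delete immediately to keep the window between "OK" and the
            # removal as small as possible.
            remove()
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

A short polling interval keeps the deletion close to the moment the transfer finalizes, which is the window this bug is about; the `sleep` and `clock` parameters exist only so the helper can be exercised without real waiting.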

Comment 4 sshmulev 2021-08-16 14:25:21 UTC
Verified.

Versions:
ovirt-engine-4.4.8.4-0.7.el8ev.noarch

Steps to Reproduce:
The same steps as in the bug Description, which are covered by the automation test "TestUploadImages".
First, tried to reproduce the NullPointerException on an older version without the fix (ovirt-engine-4.4.8.3-0.10.el8ev.noarch):

2021-08-16 16:18:47,423+03 ERROR [org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-5) [b5864677-df89-4554-9a92-b2ad41ff3164] Exception: java.lang.NullPointerException
        at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand.getImageAlias(TransferDiskImageCommand.java:326)
        at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand.getTransferDescription(TransferDiskImageCommand.java:1401)
        at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand.handleFinishedSuccess(TransferDiskImageCommand.java:904)
        at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand.executeStateHandler(TransferDiskImageCommand.java:535)
        at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand.proceedCommandExecution(TransferDiskImageCommand.java:492)
        at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.storage.disk.image.TransferImageCommandCallback.doPolling(TransferImageCommandCallback.java:21)
        at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller.invokeCallbackMethodsImpl(CommandCallbacksPoller.java:175)
        at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller.invokeCallbackMethods(CommandCallbacksPoller.java:109)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
        at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
        at org.glassfish.javax.enterprise.concurrent@1.0.0.redhat-1//org.glassfish.enterprise.concurrent.internal.ManagedScheduledThreadPoolExecutor$ManagedScheduledFutureTask.access$201(ManagedScheduledThreadPoolExecutor.java:383)
        at org.glassfish.javax.enterprise.concurrent@1.0.0.redhat-1//org.glassfish.enterprise.concurrent.internal.ManagedScheduledThreadPoolExecutor$ManagedScheduledFutureTask.run(ManagedScheduledThreadPoolExecutor.java:534)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:829)
        at org.glassfish.javax.enterprise.concurrent@1.0.0.redhat-1//org.glassfish.enterprise.concurrent.ManagedThreadFactoryImpl$ManagedThread.run(ManagedThreadFactoryImpl.java:250)


Then ran it twice on the fixed version (ovirt-engine-4.4.8.4-0.7.el8ev.noarch); both runs succeeded with no NPE.
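
A "no NPE" check like this one can also be done mechanically by scanning the engine log for ERROR entries that mention `java.lang.NullPointerException`. The sketch below is an illustration only: `LOG_RE` and `find_npe_entries` are hypothetical names, and the pattern assumes the log line layout quoted in this report (timestamp, level, logger, thread, correlation ID, message).

```python
import re

# Assumed layout, taken from the log lines quoted in this report:
#   <timestamp> <LEVEL> [<logger>] (<thread>) [<correlation id>] <message>
LOG_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}[+-]\d{2,4})\s+"
    r"(?P<level>[A-Z]+)\s+\[(?P<logger>[^\]]+)\]\s+\((?P<thread>[^)]*)\)\s+"
    r"\[(?P<corr>[^\]]*)\]\s+(?P<msg>.*)$"
)


def find_npe_entries(lines):
    """Return (timestamp, correlation_id) for each ERROR entry reporting an NPE."""
    hits = []
    for line in lines:
        m = LOG_RE.match(line)
        if m is None:
            continue  # stack frames and other continuation lines
        if m["level"] == "ERROR" and "java.lang.NullPointerException" in m["msg"]:
            hits.append((m["ts"], m["corr"]))
    return hits
```

An empty result for the log of a test run corresponds to the "no NPE" outcome; a non-empty result gives the correlation IDs to search for in the surrounding log lines.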

Comment 5 Sandro Bonazzola 2021-08-19 06:23:13 UTC
This bugzilla is included in oVirt 4.4.8 release, published on August 19th 2021.

Since the problem described in this bug report should be resolved in oVirt 4.4.8 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

