Bug 1875951

Summary: Disk hot-unplug fails on engine side with NPE in setDiskVmElements after unplugging from the VM.
Product: Red Hat Enterprise Virtualization Manager
Reporter: Anitha Udgiri <audgiri>
Component: ovirt-engine
Assignee: Ahmad Khiet <akhiet>
Status: CLOSED ERRATA
QA Contact: Ilan Zuckerman <izuckerm>
Severity: medium
Docs Contact:
Priority: medium
Version: 4.3.9
CC: aefrat, bcholler, dfodor, eshenitz, gveitmic, jeokim, mavital, tnisan
Target Milestone: ovirt-4.4.4
Target Release: 4.4.4
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-02-02 13:57:12 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
ssl_access_log of RHV-M's sosreport (Flags: none)

Description Anitha Udgiri 2020-09-04 17:24:40 UTC
Description of problem:

The customer is running RHVM 4.3.9 and gets an NPE (NullPointerException) in ovirt-engine when some disks are detached (hot-unplugged) from a VM via the REST API.

Version-Release number of selected component (if applicable):

rhvm-4.3.9.4-15.bz1830762.el7.noarch

How reproducible:

Intermittent

Comment 4 Eyal Shenitzky 2020-09-07 14:16:52 UTC
Hi Anitha,

Can you please add the steps to reproduce the issue?

Comment 5 Anitha Udgiri 2020-09-08 16:57:11 UTC
(In reply to Eyal Shenitzky from comment #4)
> Hi Anitha,
> 
> Can you please add the steps to reproduce the issue?

Eyal,
I have asked the customer for the steps they followed when they saw this error (it is intermittent, per their update). I will update as soon as they respond.

Comment 13 Jeongtae Kim 2020-09-11 16:08:03 UTC
Created attachment 1714584 [details]
ssl_access_log of RHV-M's sosreport

Comment 14 Anitha Udgiri 2020-09-14 18:44:30 UTC
(In reply to Eyal Shenitzky from comment #4)
> Hi Anitha,
> 
> Can you please add the steps to reproduce the issue?

Eyal,

Jeongtae Kim has provided you with the details, and Bimal has also provided some steps for reproducing the issue.

I have added another customer who is facing the issue.

Let me know if there is anything else that you would like.

Thanks,
Anitha

Comment 17 Germano Veit Michel 2020-09-17 00:37:32 UTC
Ahmad, I've reproduced the NPE at BaseDisk.java:88 in 4.4. 
It seems to be caused by simultaneous access to vm_disk_element.

Version-Release number of selected component (if applicable):
ovirt-engine-4.4.1.10-0.1.el8ev.noarch

How reproducible:
Always

Steps to Reproduce:
1. Create 10 Disks
2. Attach all of them to a VM
3. Detach them in parallel
4. One or two of the detach requests fail with an NPE at BaseDisk.java:88

Example:
~~~
# cat reproducer.sh 
#!/bin/bash
ENGINE='engine.kvm'
USER='admin'
DOMAIN='internal'
PASS='redhat'
DATA_VM='Ubuntu'
DISK_IDS=(
'3d666ced-288f-4ddb-9c31-a533af191469'
'105d8c0b-566d-47ef-8cda-d7add5253f08'
'22fe589e-0257-4232-bbd7-0ee7d2a8ab20'
'17baf778-b1d9-4061-9537-a0de8113f5e7'
'2245d18d-95bd-4966-b60d-38e82f8d574c'
'dbf65c1a-9dbf-49c6-9bb4-3b461d6d862f'
'a43885df-2855-4262-ab78-dfece6708924'
'a8e17a6f-2725-4aa9-8501-f472f3aa3962'
'7fc44709-89b2-40ee-8418-fda7fc984561'
'8b79e0d5-2997-4c2b-a79c-5678fac4fea1'
)

# Look up the VM's API path by name (URL quoted so the shell does not glob on '?').
VM_URL=$(curl -k -u "${USER}@${DOMAIN}:${PASS}" -X GET "https://${ENGINE}/ovirt-engine/api/vms?search=name%3D${DATA_VM}" | xmllint --xpath 'string(/vms/vm/@href)' -)

# Attach every disk to the VM, one request at a time.
for ID in "${DISK_IDS[@]}"
do
  curl --cacert /etc/pki/ovirt-engine/ca.pem -u "${USER}@${DOMAIN}:${PASS}" -X POST -H "Accept: application/xml" -H "Content-type: application/xml" "https://${ENGINE}${VM_URL}/diskattachments/" --data-binary @- << EOF
<disk_attachment>
  <active>true</active>
  <interface>virtio_scsi</interface>
  <disk id="${ID}"/>
</disk_attachment>
EOF
done

# Detach all disks in parallel; the concurrent requests trigger the NPE.
for ID in "${DISK_IDS[@]}"
do
  curl -k -u "${USER}@${DOMAIN}:${PASS}" -X DELETE -H "Accept: application/xml" -H "Content-type: application/xml" "https://${ENGINE}${VM_URL}/diskattachments/${ID}?detach_only=true" &
done

# Wait for all background detach requests to finish.
wait

~~~

The NPEs are the same as in comment #1.

2020-09-17 10:20:39,330+10 ERROR [org.ovirt.engine.core.bll.storage.disk.DetachDiskFromVmCommand] (default task-23) [cd7e5534-0d1d-438f-b609-305da6e004ff] Command 'org.ovirt.engine.core.bll.storage.disk.DetachDiskFromVmCommand' failed: null
2020-09-17 10:20:39,330+10 ERROR [org.ovirt.engine.core.bll.storage.disk.DetachDiskFromVmCommand] (default task-23) [cd7e5534-0d1d-438f-b609-305da6e004ff] Exception: java.lang.NullPointerException
        at org.ovirt.engine.core.common//org.ovirt.engine.core.common.businessentities.storage.BaseDisk.lambda$setDiskVmElements$0(BaseDisk.java:88)
        at java.base/java.util.stream.Collectors.lambda$uniqKeysMapAccumulator$1(Collectors.java:177)
        at java.base/java.util.stream.ReduceOps$3ReducingSink.accept(ReduceOps.java:169)
        at java.base/java.util.Collections$2.tryAdvance(Collections.java:4747)
        at java.base/java.util.Collections$2.forEachRemaining(Collections.java:4755)
        at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
        at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
        at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:913)
        at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
        at java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:578)
        at org.ovirt.engine.core.common//org.ovirt.engine.core.common.businessentities.storage.BaseDisk.setDiskVmElements(BaseDisk.java:88)
        at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.VmHandler.updateDisksVmDataForVm(VmHan
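
For readers unfamiliar with this failure mode: the top frame of the trace is the lambda passed to the collector, so the NullPointerException was raised while a mapper function dereferenced something that was null. The standalone sketch below reproduces that general pattern only. It is not the engine code: the class `SingletonNullToMapSketch`, the `Element` type, and both `index` methods are hypothetical names invented for illustration, and the assumption that a concurrent detach left a null element in the collection is a guess based on the trace, not something confirmed in this report. The null-filtering variant is likewise just one possible guard, not necessarily the fix that shipped.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.Map;
import java.util.Objects;
import java.util.stream.Collectors;

public class SingletonNullToMapSketch {

    // Hypothetical stand-in for a disk/VM attachment row (not the engine's class).
    static final class Element {
        private final String vmId;

        Element(String vmId) {
            this.vmId = vmId;
        }

        String getVmId() {
            return vmId;
        }
    }

    // Unguarded: the key-mapper lambda dereferences every element,
    // so a null element raises NullPointerException inside the lambda.
    static Map<String, Element> index(Collection<Element> elements) {
        return elements.stream()
                .collect(Collectors.toMap(e -> e.getVmId(), e -> e));
    }

    // Guarded: null elements are dropped before they reach the collector.
    static Map<String, Element> indexSkippingNulls(Collection<Element> elements) {
        return elements.stream()
                .filter(Objects::nonNull)
                .collect(Collectors.toMap(Element::getVmId, e -> e));
    }

    public static void main(String[] args) {
        // Assumption: a lookup returned null because the row was already removed.
        Element missing = null;
        Collection<Element> rows = Collections.singleton(missing);

        try {
            index(rows);
            System.out.println("no exception");
        } catch (NullPointerException e) {
            System.out.println("NPE from key mapper");
        }

        System.out.println(indexSkippingNulls(rows).size());
    }
}
```

Running `main` prints `NPE from key mapper` followed by `0`: the unguarded call fails in the mapper lambda, while the guarded call returns an empty map.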

Comment 20 Ilan Zuckerman 2020-12-08 14:36:00 UTC
Verified on rhv-release-4.4.4-4-001.noarch with the steps and script from comment #17

1. Create 10 Disks
2. Attach all of them to a VM
3. Detach them in parallel

All the detachments succeeded without any ERRORs.
I repeated the steps with the VM up and down, first with thin disks and then with raw disks.

Comment 24 errata-xmlrpc 2021-02-02 13:57:12 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Low: RHV-M(ovirt-engine) 4.4.z security, bug fix, enhancement update [ovirt-4.4.4]), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:0381

Comment 25 meital avital 2022-08-08 08:48:44 UTC
Due to QE capacity, we are not going to cover this issue in our automation.