Bug 1875951
| Summary: | Disk hot-unplug fails on engine side with NPE in setDiskVmElements after unplugging from the VM. | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Anitha Udgiri <audgiri> |
| Component: | ovirt-engine | Assignee: | Ahmad Khiet <akhiet> |
| Status: | CLOSED ERRATA | QA Contact: | Ilan Zuckerman <izuckerm> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 4.3.9 | CC: | aefrat, bcholler, dfodor, eshenitz, gveitmic, jeokim, mavital, tnisan |
| Target Milestone: | ovirt-4.4.4 | | |
| Target Release: | 4.4.4 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-02-02 13:57:12 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | Storage | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | ssl_access_log of RHV-M's sosreport | | |
Description (Anitha Udgiri, 2020-09-04 17:24:40 UTC)
Hi Anitha,

Can you please add the steps to reproduce the issue?

(In reply to Eyal Shenitzky from comment #4)
> Hi Anitha,
> Can you please add the steps to reproduce the issue?

Eyal, I have asked the customer for the steps they followed when they saw this error (it is intermittent, as per their update). I will update as soon as they respond.

Created attachment 1714584 [details]
ssl_access_log of RHV-M's sosreport

(In reply to Eyal Shenitzky from comment #4)
> Hi Anitha,
> Can you please add the steps to reproduce the issue?

Eyal, Jeongtae Kim has provided you the details, and Bimal has also provided some steps for reproducing the issue. I have added another customer facing the issue. Let me know if there is anything else that you would like.

Thanks,
Anitha

Ahmad, I've reproduced the NPE at BaseDisk.java:88 in 4.4.
It seems to be caused by simultaneous access to vm_disk_element.
Version-Release number of selected component (if applicable):
ovirt-engine-4.4.1.10-0.1.el8ev.noarch
How reproducible:
Always
Steps to Reproduce:
1. Create 10 Disks
2. Attach all of them to a VM
3. Detach them in parallel
4. One or two of the detach operations fail with an NPE at BaseDisk.java:88
Example:
~~~
# cat reproducer.sh
#!/bin/bash
# Environment-specific settings: engine FQDN, API credentials, and the target VM name
ENGINE='engine.kvm'
USER='admin'
DOMAIN='internal'
PASS='redhat'
DATA_VM='Ubuntu'
# IDs of the floating disks to attach and then detach
DISK_IDS=(
'3d666ced-288f-4ddb-9c31-a533af191469'
'105d8c0b-566d-47ef-8cda-d7add5253f08'
'22fe589e-0257-4232-bbd7-0ee7d2a8ab20'
'17baf778-b1d9-4061-9537-a0de8113f5e7'
'2245d18d-95bd-4966-b60d-38e82f8d574c'
'dbf65c1a-9dbf-49c6-9bb4-3b461d6d862f'
'a43885df-2855-4262-ab78-dfece6708924'
'a8e17a6f-2725-4aa9-8501-f472f3aa3962'
'7fc44709-89b2-40ee-8418-fda7fc984561'
'8b79e0d5-2997-4c2b-a79c-5678fac4fea1'
)
# Resolve the VM's API href by name
VM_URL=$(curl -k -u "${USER}@${DOMAIN}:${PASS}" -X GET "https://${ENGINE}/ovirt-engine/api/vms?search=name%3D${DATA_VM}" | xmllint --xpath 'string(/vms/vm/@href)' -)
# Attach each disk to the VM, one request at a time
for ID in "${DISK_IDS[@]}"
do
curl --cacert /etc/pki/ovirt-engine/ca.pem -u "${USER}@${DOMAIN}:${PASS}" -X POST -H "Accept: application/xml" -H "Content-type: application/xml" https://${ENGINE}${VM_URL}/diskattachments/ --data-binary @- << EOF
<disk_attachment>
<active>true</active>
<interface>virtio_scsi</interface>
<disk id="${ID}"/>
</disk_attachment>
EOF
done
# Detach all disks in parallel (background requests) to trigger the race
for ID in "${DISK_IDS[@]}"
do
curl -k -u "${USER}@${DOMAIN}:${PASS}" -X DELETE -H "Accept: application/xml" -H "Content-type: application/xml" "https://${ENGINE}${VM_URL}/diskattachments/${ID}?detach_only=true" &
done
# Wait for the background detach requests to finish before the script exits
wait
~~~
The NPEs are the same as in comment #1.
~~~
2020-09-17 10:20:39,330+10 ERROR [org.ovirt.engine.core.bll.storage.disk.DetachDiskFromVmCommand] (default task-23) [cd7e5534-0d1d-438f-b609-305da6e004ff] Command 'org.ovirt.engine.core.bll.storage.disk.DetachDiskFromVmCommand' failed: null
2020-09-17 10:20:39,330+10 ERROR [org.ovirt.engine.core.bll.storage.disk.DetachDiskFromVmCommand] (default task-23) [cd7e5534-0d1d-438f-b609-305da6e004ff] Exception: java.lang.NullPointerException
at org.ovirt.engine.core.common//org.ovirt.engine.core.common.businessentities.storage.BaseDisk.lambda$setDiskVmElements$0(BaseDisk.java:88)
at java.base/java.util.stream.Collectors.lambda$uniqKeysMapAccumulator$1(Collectors.java:177)
at java.base/java.util.stream.ReduceOps$3ReducingSink.accept(ReduceOps.java:169)
at java.base/java.util.Collections$2.tryAdvance(Collections.java:4747)
at java.base/java.util.Collections$2.forEachRemaining(Collections.java:4755)
at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:913)
at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:578)
at org.ovirt.engine.core.common//org.ovirt.engine.core.common.businessentities.storage.BaseDisk.setDiskVmElements(BaseDisk.java:88)
at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.VmHandler.updateDisksVmDataForVm(VmHan
~~~
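One way the simultaneous access to vm_disk_element could produce this trace is a null entry in the collection passed to setDiskVmElements, for example because a parallel detach already removed the corresponding row; the key mapper of the stream-to-map collector then dereferences null inside the lambda. A minimal, self-contained sketch of that failure mode, assuming a hypothetical DiskVmElement stand-in rather than the real ovirt-engine classes:

~~~
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.function.Function;
import java.util.stream.Collectors;

// Illustration only; DiskVmElement here is a hypothetical stand-in,
// not the org.ovirt.engine.core.common.businessentities class.
public class DetachRaceNpeSketch {

    static class DiskVmElement {
        private final UUID vmId;

        DiskVmElement(UUID vmId) {
            this.vmId = vmId;
        }

        UUID getVmId() {
            return vmId;
        }
    }

    public static void main(String[] args) {
        // Simulate the elements loaded for a disk while a parallel detach
        // deletes one of the rows: the list ends up holding a null entry.
        List<DiskVmElement> elements = new ArrayList<>();
        elements.add(new DiskVmElement(UUID.randomUUID()));
        elements.add(null); // row already removed by the concurrent detach

        // Same stream-to-map shape as the failing code path: the key mapper
        // dereferences each element, so the null entry throws a
        // NullPointerException inside the lambda, as in the trace above.
        Map<UUID, DiskVmElement> byVmId = elements.stream()
                .collect(Collectors.toMap(DiskVmElement::getVmId, Function.identity()));

        System.out.println(byVmId.size()); // not reached when a null entry is present
    }
}
~~~

Running the sketch throws a NullPointerException from inside the toMap lambda, matching the shape of the trace above.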
Verified on rhv-release-4.4.4-4-001.noarch with the steps and script from comment #17:

1. Create 10 disks.
2. Attach all of them to a VM.
3. Detach them in parallel.

All the detachments succeeded without any errors. I repeated the steps with the VM up and down, with thin-provisioned disks and with raw disks.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Low: RHV-M (ovirt-engine) 4.4.z security, bug fix, enhancement update [ovirt-4.4.4]), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:0381

Due to QE capacity, we are not going to cover this issue in our automation.