Bug 1704468 - [ISCSI] vmware snapshots getting corrupted following hardware move and network change
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: iSCSI
Version: 3.1
Hardware: All
OS: All
Priority: unspecified
Severity: urgent
Target Milestone: rc
Target Release: 4.0
Assignee: Mike Christie
QA Contact: Madhavi Kasturi
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-04-29 20:32 UTC by jquinn
Modified: 2019-12-02 13:54 UTC
CC List: 8 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-06-03 16:18:21 UTC
Embargoed:


Attachments

Comment 1 Mike Christie 2019-04-29 23:23:27 UTC
How are you alerted that a VMware snapshot is corrupted? Is it when you try to open it? Could you attach the vmkernel.log from when this happens?

Does the issue happen with snapshots you have already taken, or with ones taken while the lock error messages are being reported?


Could you give me the output of:

esxcli storage core device vaai status get -d your_device
esxcli storage core device list -d your_device

from one of the hosts.
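
If it is easier, something along these lines (an untested sketch; the device ID below is only a placeholder for your RBD image's naa identifier) will capture both outputs into files you can attach here:

# placeholder - substitute the real naa ID for the device
DEV=naa.<your_device_id>
esxcli storage core device vaai status get -d "$DEV" > "/tmp/${DEV}_vaai.txt"
esxcli storage core device list -d "$DEV" > "/tmp/${DEV}_device.txt"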

For the lock bouncing messages, you should also run this command:

esxcli storage nmp path list -d your_device

on all the ESX hosts.
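
If SSH is enabled on the hosts, a loop along these lines (sketch only; the host names and device ID are placeholders) will gather the path lists into one file so they can be compared side by side:

# placeholders - substitute your ESX host names and the device naa ID
DEV=naa.<your_device_id>
for host in esx1 esx2 esx3; do
    echo "== $host =="
    ssh root@"$host" "esxcli storage nmp path list -d $DEV"
done > nmp_paths_all_hosts.txt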


The lock bouncing messages are likely due to the ESX hosts seeing igw-ssd-01.vmw-ssd-05 through different paths. The hosts will then try to access the disk through different iSCSI gateways, and the lock will bounce between the gateways. This commonly happens when the hosts have network issues on the iSCSI paths. Running the nmp path list command should show some of the hosts trying to use different paths.
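
As a rough first pass over that collected output (the exact field names can vary between ESXi releases, so treat this as a sketch), grepping the per-path group state for each host will show whether the hosts agree on which gateway they are actively using:

grep -E '^== |Group State:' nmp_paths_all_hosts.txt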

Note that this should not cause corruption, but it could cause IO failures: the lock bouncing can cause a command to be retried too many times, at which point ESX marks the command as failed.

