Bug 1224921

Summary: I/O hangs on an RBD image with the mandatory exclusive-lock feature enabled
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Tanay Ganguly <tganguly>
Component: RBD
Assignee: Jason Dillaman <jdillama>
Status: CLOSED ERRATA
QA Contact: ceph-qe-bugs <ceph-qe-bugs>
Severity: high
Priority: unspecified
Version: 1.3.0
CC: ceph-eng-bugs, flucifre, hnallurv, jdillama, kdreyer, kurs, tganguly
Target Milestone: rc
Target Release: 1.3.2
Hardware: x86_64
OS: Linux
Fixed In Version: RHEL: ceph-0.94.5-2.el7cp, Ubuntu: ceph_0.94.5-2redhat1
Doc Type: Bug Fix
Type: Bug
Last Closed: 2016-02-29 14:42:12 UTC

Description Tanay Ganguly 2015-05-26 08:37:52 UTC
Description of problem:
I/O hangs while trying to write to the same RBD image (mandatory exclusive-lock feature enabled) from 3 different VMs, one after the other.

Version-Release number of selected component (if applicable):
ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)
librbd1-0.94.1-10.el7cp.x86_64

How reproducible:
2 out of 2 times

Steps to Reproduce:
1. Create an RBD image, take a snapshot, and clone it with --image-features 5 (layering + exclusive lock); see the command sketch after these steps.

rbd image 'snap1':
        size 13240 MB in 3310 objects
        order 22 (4096 kB objects)
        block_name_prefix: rbd_data.a0f202eb141f2
        format: 2
        features: layering, exclusive
        flags: 
        parent: Tanay-RBD/Anfield@1
        overlap: 13240 MB

2. Attach the same RBD image to 3 different VMs as a spare disk.
3. Write a small 10 MB file from each of the 3 VMs, one after the other:

while true; do dd if=10M of=/dev/vdb bs=1M count=10; echo "Sleeping now"; sleep 45; done

4. When the write completes on the 1st VM and it starts sleeping (the write was taking around 4-5 seconds to complete), start the same command on the 2nd VM, and then on the 3rd VM.
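
For reference, a minimal command sketch for step 1. The pool, image, snapshot, and clone names (Tanay-RBD, Anfield, 1, snap1) and the size are taken from the rbd info output above; the exact options used in the original run are an assumption:

rbd create Tanay-RBD/Anfield --size 13240 --image-format 2 --image-features 5
rbd snap create Tanay-RBD/Anfield@1
rbd snap protect Tanay-RBD/Anfield@1
rbd clone Tanay-RBD/Anfield@1 Tanay-RBD/snap1 --image-features 5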

Actual results:
The lock ownership was changing when VM2 tried to write the file, and likewise for VM3 and VM1:

TG1     client.663435 (VM1)
TG2     client.663762 (VM2)
TEST    client.662967 (VM3)
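
For reference, the owner-to-client mapping above can be checked from the rbd CLI (a sketch, assuming the clone is Tanay-RBD/snap1 as in the rbd info output; with the exclusive-lock feature the current owner is expected to show up in the lock listing):

rbd lock list Tanay-RBD/snap1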

It worked smoothly for some 20-odd iterations, but after that the I/O did not continue and the lock got stuck with VM1 forever, so I/O stalled on all 3 VMs.

I could also see that all the clients got blacklisted:

ceph osd blacklist ls
listed 3 entries
10.12.27.45:0/1005465 2015-05-26 04:06:34.530285
10.12.27.45:0/2005465 2015-05-26 04:02:37.331390
10.12.27.45:0/1005376 2015-05-26 04:03:16.993895

Expected results:
The I/O should have continued, with lock ownership passing among the 3 VMs indefinitely.

Additional info:
After a while, the blacklist shows no entries:
listed 0 entries

Didn't see any log messages on the MONs and OSDs.
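
If more client-side detail is needed, one option is to raise librbd logging on the hypervisor clients; this is a hypothetical debugging sketch, not something captured in this report:

# /etc/ceph/ceph.conf on the client (hypervisor) nodes
[client]
    debug rbd = 20
    debug rados = 20
    log file = /var/log/ceph/$name.$pid.log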

Comment 2 Jason Dillaman 2015-05-26 13:10:56 UTC
Is this different from BZ 1223652 (besides three VMs instead of two)?

Comment 3 Tanay Ganguly 2015-05-27 09:08:30 UTC
Yes,

The writes were not parallel across the 3 VMs; they ran one after the other.
There was a sleep for 45 seconds after every write.

It was fine for some 20-odd iterations, then the lock owner stopped changing.

Comment 4 Josh Durgin 2015-05-28 19:05:33 UTC
Since the dd didn't use oflag=direct, the writes did go through the page cache in the VM, and there could have been some parallelism.

This bug may be different in that all three clients became blacklisted. Similar reasoning as for BZ 1223652 makes me inclined to address this in 1.3.1 or z-stream, and not block 1.3.0.
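
For comparison, a variant of the write loop from the reproduction steps that bypasses the guest page cache (a sketch only; the original test did not use direct I/O):

while true; do dd if=10M of=/dev/vdb bs=1M count=10 oflag=direct; echo "Sleeping now"; sleep 45; done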

Comment 7 Ken Dreyer (Red Hat) 2015-12-10 20:45:45 UTC
Fixed upstream in v0.94.4

Comment 9 Tanay Ganguly 2016-02-05 10:17:39 UTC
Marking this BUG as Verified.

Ran the same test for 1000+ iterations.
The lock owner changed continuously.
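
For completeness, the package level on the clients can be cross-checked against the Fixed In Version from the header (RHEL packaging assumed):

ceph --version
rpm -q librbd1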

Comment 11 errata-xmlrpc 2016-02-29 14:42:12 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:0313