Description of problem:
IO operations hang while writing to the same rbd image (mandatory exclusive-lock feature enabled) one after the other from 3 different VMs.

Version-Release number of selected component (if applicable):
ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)
librbd1-0.94.1-10.el7cp.x86_64

How reproducible:
2 out of 2 times

Steps to Reproduce:
1. Create an rbd image, take a snapshot and clone it with --image-features 5 (exclusive lock enabled). See the sketch of the commands after this list.

rbd image 'snap1':
        size 13240 MB in 3310 objects
        order 22 (4096 kB objects)
        block_name_prefix: rbd_data.a0f202eb141f2
        format: 2
        features: layering, exclusive
        flags:
        parent: Tanay-RBD/Anfield@1
        overlap: 13240 MB

2. Attach the same rbd image to 3 different VMs as a spare disk.
3. Write a small 10 MB file from all 3 VMs, one after the other:

while true; do dd if=10M of=/dev/vdb bs=1M count=10; echo "Sleeping now"; sleep 45; done

4. When the write completes on the 1st VM and it starts sleeping (each write took around 4-5 seconds), start the same command on the 2nd VM, and likewise for the 3rd VM.

Actual results:
The lock owner changed when VM2 tried to write the file, and the same happened for VM3 and VM1.

TG1  client.663435 (VM1)
TG2  client.663762 (VM2)
TEST client.662967 (VM3)

It worked smoothly for 20-odd iterations, but after that IO did not continue and the lock stayed stuck with VM1 forever, so IO stalled on all 3 VMs. All the clients were blacklisted:

ceph osd blacklist ls
listed 3 entries
10.12.27.45:0/1005465 2015-05-26 04:06:34.530285
10.12.27.45:0/2005465 2015-05-26 04:02:37.331390
10.12.27.45:0/1005376 2015-05-26 04:03:16.993895

Expected results:
IO should have continued and lock ownership should have kept rotating among the 3 VMs indefinitely.

Additional info:
After a while the blacklist shows no entries:
listed 0 entries
No log messages were seen on the MON or OSDs.
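For reference, a minimal sketch of what step 1 might look like on the admin node. The pool, image and snapshot names (Tanay-RBD, Anfield, @1, snap1) are taken from the rbd info output above; the size and the exact command sequence are assumptions, not a record of what was actually run.

# create the parent image, snapshot and protect it (assumed sequence)
rbd create Tanay-RBD/Anfield --size 13240
rbd snap create Tanay-RBD/Anfield@1
rbd snap protect Tanay-RBD/Anfield@1
# clone with --image-features 5 = layering (1) + exclusive-lock (4)
rbd clone Tanay-RBD/Anfield@1 Tanay-RBD/snap1 --image-features 5
rbd info Tanay-RBD/snap1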
Is this different from BZ 1223652 (besides three VMs instead of two)?
Yes. The writes were not parallel from the 3 VMs; they were issued one after the other, with a 45-second sleep after every write. It worked for some 20-odd iterations, then the lock owner stopped changing.
Since the dd didn't use oflag=direct, the writes went through the page cache in the VM, so there could still have been some parallelism. This bug may differ in that all three clients became blacklisted. By the same reasoning as for BZ 1223652, I am inclined to address this in 1.3.1 or a z-stream rather than block 1.3.0.
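To rule out page-cache buffering, the same loop could be re-run with direct I/O; this is only a suggested variant of the command from the reproduction steps, assuming the same /dev/vdb device and 10M source file:

while true; do dd if=10M of=/dev/vdb bs=1M count=10 oflag=direct; echo "Sleeping now"; sleep 45; done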
Fixed upstream in v0.94.4
Marking this bug as Verified. Ran the same test for 1000+ iterations; the lock owner changed continuously.
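The exact verification procedure is not recorded here, but one hypothetical way to watch the lock owner rotating from the admin node, assuming the clone Tanay-RBD/snap1 from the original report, is to poll the image locks while the dd loops run in the VMs:

while true; do rbd lock list Tanay-RBD/snap1; sleep 5; done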
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2016:0313