Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1119226

Summary: Thin provisioning disks broken on block storage when using pthreading 1.3
Product: Red Hat Enterprise Virtualization Manager
Reporter: rhev-integ
Component: python-pthreading
Assignee: Yaniv Bronhaim <ybronhei>
Status: CLOSED ERRATA
QA Contact: Jiri Belka <jbelka>
Severity: urgent
Docs Contact: Ruediger Landmann <rlandman>
Priority: urgent
Version: 3.4.0
CC: amureini, bazulay, danken, dougsland, gamado, gklein, iheim, lbopf, nsoffer, oourfali, pstehlik, scohen, tnisan, ybronhei, yeylon
Target Milestone: ---
Keywords: ZStream
Target Release: 3.4.1
Hardware: Unspecified
OS: Unspecified
Whiteboard: infra
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Previously, when using pthreading 1.3, disk extend requests sent over the mailbox were not handled, which caused virtual machines to pause when their disks became full. This happened because VDSM called a locked() method that the pthreading lock did not implement. Now, a patch adds the missing method, implementing it in the same way that Python implements it, and VDSM interacts correctly with pthreading 1.3.
Story Points: ---
Clone Of: 1117795
Environment:
Last Closed: 2014-07-29 14:19:29 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Infra
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1117795
Bug Blocks:
Attachments:
Description: Fake dd simulating errors when accessing the inbox
Flags: none

Comment 2 Jiri Belka 2014-07-18 13:16:05 UTC
Please provide exact steps for reproduction/verification. If that is not possible, would it be enough to check whether the Python code contains the hack with the Lock/_Lock class?

Comment 3 Jiri Belka 2014-07-22 08:13:05 UTC
Could you provide exact steps to reproduce this issue? I can't reproduce it with an iSCSI-based VM disk on 0.1.3-1: I wrote into the disk and it got extended without the VM pausing.

Comment 4 Nir Soffer 2014-07-22 12:32:27 UTC
This happens when an I/O error occurs while reading from the inbox LV
on the master domain. The current code logs an error and tries again,
while the old code checked whether a lock was locked; because the lock
in pthreading < 0.1.3-3 did not implement the locked() method,
the thread would exit.
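
The failure mode described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the VDSM or pthreading source: the class names, monitor_loop(), and run_in_thread() are hypothetical, and the loop simply raises a simulated IOError on every round. It shows how calling locked() on a lock that lacks the method ends the monitoring thread on the first error, and how a locked() built from a non-blocking acquire lets the loop keep retrying.

```python
import threading


class LockWithoutLocked(object):
    """Hypothetical stand-in for a lock that has no locked() method."""

    def __init__(self):
        self._lock = threading.Lock()

    def acquire(self, blocking=True):
        return self._lock.acquire(blocking)

    def release(self):
        self._lock.release()


class LockWithLocked(LockWithoutLocked):
    """Same lock with locked() added via a non-blocking acquire probe."""

    def locked(self):
        # If the lock can be taken without blocking, it was free.
        if self.acquire(False):
            self.release()
            return False
        return True


def monitor_loop(lock, rounds):
    """Hypothetical monitor: every read fails, the handler checks the lock."""
    handled = 0
    for _ in range(rounds):
        try:
            raise IOError(5, "Could not read mailbox")  # simulated I/O error
        except IOError:
            # Without locked(), this raises AttributeError and ends the loop.
            if lock.locked():
                lock.release()
            handled += 1
    return handled


def run_in_thread(lock, rounds=3):
    """Run the monitor in a thread and report whether it survived."""
    result = {}

    def target():
        try:
            result["handled"] = monitor_loop(lock, rounds)
        except AttributeError as e:
            result["died"] = str(e)

    t = threading.Thread(target=target)
    t.start()
    t.join()
    return result
```

With LockWithoutLocked the result contains a "died" entry after the first simulated error; with LockWithLocked all rounds are handled and the loop would go on to retry.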

So to reproduce, you have to cause an I/O error when accessing the inbox
LV on the master domain.

One option to do this is to replace /usr/bin/dd with a script simulating
errors. Check the dd.fake attachment for details.
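
For readers without access to the attachment, a wrapper of this kind might look roughly like the sketch below. It is not the attached dd.fake, whose contents are not reproduced in this report. It assumes that the inbox is read with an if=... operand, that the real program has been preserved as /usr/bin/dd.real (the name used in comment 8), and that the inbox path ends in /dom_md/inbox as in the log in comment 6; should_fail() and main() are hypothetical names.

```python
import os
import sys

# Assumed location of the preserved real dd (name taken from comment 8).
REAL_DD = "/usr/bin/dd.real"
# Assumed suffix of the inbox path (taken from the log in comment 6).
INBOX_SUFFIX = "/dom_md/inbox"


def should_fail(argv):
    """Return True if any if=... operand points at the inbox."""
    for arg in argv[1:]:
        if arg.startswith("if=") and arg[3:].endswith(INBOX_SUFFIX):
            return True
    return False


def main(argv):
    if should_fail(argv):
        # Simulate a read failure instead of running the real dd.
        sys.stderr.write("dd: simulated Input/output error on inbox\n")
        return 1
    # Any other invocation is handed to the real dd unchanged.
    os.execv(REAL_DD, [REAL_DD] + argv[1:])


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

Removing the wrapper, or making should_fail() return False, ends the simulated errors.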

Comment 5 Nir Soffer 2014-07-22 12:33:14 UTC
Created attachment 919911 [details]
Fake dd simulating errors when accessing the inbox

Comment 6 Jiri Belka 2014-07-22 14:43:11 UTC
OK on 0.1.3-3. (With the older version, no messages for the mailbox LV I/O errors were logged.)

Simulating an I/O issue on the inbox LV:

IOError: [Errno 5] _handleRequests._checkForMail - Could not read mailbox: /rhev/data-center/b7ee9232-7844-4d06-9f66-d347cb9e0f66/mastersd/dom_md/inbox
IOError: [Errno 5] _handleRequests._checkForMail - Could not read mailbox: /rhev/data-center/b7ee9232-7844-4d06-9f66-d347cb9e0f66/mastersd/dom_md/inbox
IOError: [Errno 5] _handleRequests._checkForMail - Could not read mailbox: /rhev/data-center/b7ee9232-7844-4d06-9f66-d347cb9e0f66/mastersd/dom_md/inbox
IOError: [Errno 5] _handleRequests._checkForMail - Could not read mailbox: /rhev/data-center/b7ee9232-7844-4d06-9f66-d347cb9e0f66/mastersd/dom_md/inbox
IOError: [Errno 5] _handleRequests._checkForMail - Could not read mailbox: /rhev/data-center/b7ee9232-7844-4d06-9f66-d347cb9e0f66/mastersd/dom_md/inbox
IOError: [Errno 5] _handleRequests._checkForMail - Could not read mailbox: /rhev/data-center/b7ee9232-7844-4d06-9f66-d347cb9e0f66/mastersd/dom_md/inbox
IOError: [Errno 5] _handleRequests._checkForMail - Could not read mailbox: /rhev/data-center/b7ee9232-7844-4d06-9f66-d347cb9e0f66/mastersd/dom_md/inbox
libvirtEventLoop::INFO::2014-07-22 14:38:13,548::vm::4574::vm.Vm::(_onIOError) vmId=`cb025f1d-e18f-44cc-96f8-8aab6c99de8a`::abnormal vm stop device virtio-disk0 error eother

The VM gets paused.

When the simulated I/O issue is stopped, the VM status changes back to Up.

Comment 7 Jiri Belka 2014-07-23 07:11:03 UTC
Verified based on #6.

Comment 8 Gal Amado 2014-07-24 12:25:21 UTC
It seems we need to sort out the installation instructions (inline as a comment in dd.fake).

What worked for me was:
...
chmod +x dd.fake
cd /usr/bin
mv /bin/dd /usr/bin/dd.real
ln -sf /usr/bin/dd.fake /bin/dd
...

Comment 9 Jiri Belka 2014-07-24 12:32:57 UTC
You discovered a difference between Fedora and RHEL :)

Comment 10 Nir Soffer 2014-07-24 13:07:36 UTC
(In reply to Gal Amado from comment #8)
> seems like we need to sort out the Installation instruction (inline as
> comment on dd.fake)
> 
> What worked for me was :
> ...
> chmod +x dd.fake
> cd /usr/bin
> mv /bin/dd /usr/bin/dd.real

This leaves you for a moment without a dd program, so if vdsm tries to run dd at that point, it will fail.

This is why I was using ln, which creates a hard link to dd, so you now have two dd
programs.

> ln -sf /usr/bin/dd.fake /bin/dd

And this line replaces one of the real dd programs with a symbolic link atomically.
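
The same idea (keep the real program reachable under a second name, then swap in the fake without a window in which dd is missing) can be sketched in Python. This is an illustration under stated assumptions, not the instructions from the attachment: install_fake() and the ".tmp" temporary name are hypothetical, and the sketch relies on rename, which POSIX guarantees to replace the target atomically when both names are on the same filesystem.

```python
import os


def install_fake(dd_path, fake_path, backup_path):
    """Replace dd_path with a symlink to fake_path without a gap."""
    # Hard link first: the real program now has two names.
    if not os.path.exists(backup_path):
        os.link(dd_path, backup_path)

    # Build the symlink under a temporary name in the same directory.
    tmp_path = dd_path + ".tmp"  # hypothetical temporary name
    if os.path.lexists(tmp_path):
        os.unlink(tmp_path)
    os.symlink(fake_path, tmp_path)

    # rename() swaps the name atomically; dd_path always resolves to
    # either the real program or the fake one.
    os.rename(tmp_path, dd_path)
```

To undo the change, the backup can be renamed back over dd_path in the same way.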

Comment 12 errata-xmlrpc 2014-07-29 14:19:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-0975.html