Bug 173157

Summary: kernel dm-log: big endian 64-bit corruption
Product: Red Hat Enterprise Linux 4 Reporter: Alasdair Kergon <agk>
Component: kernel Assignee: Alasdair Kergon <agk>
Status: CLOSED ERRATA QA Contact: Brian Brock <bbrock>
Severity: medium Docs Contact:
Priority: medium    
Version: 4.0 CC: jbaron
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: RHSA-2006-0132 Doc Type: Bug Fix
Last Closed: 2006-03-07 20:43:47 UTC Type: ---
Bug Blocks: 168429    

Description Alasdair Kergon 2005-11-14 18:28:56 UTC
The Linux bitset operators (test_bit, set_bit, etc.) work on arrays of
"unsigned long".  dm-log uses such bitsets but treats them as
arrays of uint32_t, allocating and zeroing only a multiple of 4 bytes
(as 'clean_bits' points to uint32_t).
                                                                                
The patch below fixes this problem.
                                                                                
The problem is specific to 64-bit big-endian machines such as s390x or
ppc64 and can prevent pvmove from terminating.
                                                                                
                                                                                
In the simplest case, if "region_count" were (say) 30, then
bitset_size (below) would be 4 and bitset_uint32_count would be 1.
Thus the memory for this bitset, after allocation and zeroing, would
be
   0 0 0 0 X X X X
On a big-endian 64-bit machine, bit 0 for this bitset is in the 8th
byte! (And every bit that dm-log would use would be in the X area.)
                                                                                
   0 0 0 0 X X X X
                 ^
                 here
                                                                                
which hasn't been cleared properly.
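
The arithmetic above can be checked with a small portable model. This is
only a sketch, not the dm-log code or the patch: bitset_bytes() and
bit_byte_offset() are hypothetical helper names, and the model assumes
the usual numbering where bit n lives in word n / BITS_PER_LONG at value
bit n % BITS_PER_LONG. Endianness is passed in as a parameter so the
result does not depend on the machine running it.

```c
#include <stddef.h>

/* Hypothetical helper: bytes needed for `nbits` bits when the bitset is
 * sized in units of `unit_bytes` bytes (4 models uint32_t, 8 models a
 * 64-bit unsigned long). */
size_t bitset_bytes(size_t nbits, size_t unit_bytes)
{
    size_t unit_bits = unit_bytes * 8;

    /* Round up to a whole number of units, then convert to bytes. */
    return (nbits + unit_bits - 1) / unit_bits * unit_bytes;
}

/* Hypothetical helper: offset, from the start of the bitset, of the byte
 * that holds `bit` when the bitset is accessed as an array of
 * `word_bytes`-byte words. */
size_t bit_byte_offset(size_t bit, size_t word_bytes, int big_endian)
{
    size_t word_bits = word_bytes * 8;
    size_t word = bit / word_bits;
    size_t byte_in_word = (bit % word_bits) / 8;

    /* On big endian the least significant byte is stored last. */
    if (big_endian)
        byte_in_word = word_bytes - 1 - byte_in_word;

    return word * word_bytes + byte_in_word;
}
```

For region_count = 30, bitset_bytes(30, 4) is 4, so only offsets 0 to 3
are zeroed, while bit_byte_offset(0, 8, 1) is 7 and
bit_byte_offset(29, 8, 1) is 4: every bit in use falls in the X area.
Sizing in unsigned long units, bitset_bytes(30, 8), gives 8 bytes and
covers all of them.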
                                                                                
As the dm-raid1 code only syncs and counts regions which have a 0 in
the 'sync_bits' bitset, and only finishes when it has counted high
enough, a large number of 1s among those 'X's will prevent the sync
from completing.
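
A minimal model of that counting behaviour, under the assumption that
the sync count starts at 0 and goes up by one for each region whose bit
is found to be 0. resync_visits() and resync_completes() are
hypothetical names, not dm-raid1 functions, and the sketch holds the
bitset in a single 64-bit value, so it only covers region_count <= 64.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: number of regions a resync pass would visit,
 * i.e. those whose bit in sync_bits is 0. Requires region_count <= 64. */
unsigned resync_visits(uint64_t sync_bits, unsigned region_count)
{
    unsigned visited = 0;

    for (unsigned r = 0; r < region_count; r++)
        if (!((sync_bits >> r) & 1u))
            visited++;

    return visited;
}

/* Hypothetical helper: the resync is done only when every region has
 * been counted. A stale 1 bit hides its region from the loop, so the
 * count can never reach region_count. */
bool resync_completes(uint64_t sync_bits, unsigned region_count)
{
    return resync_visits(sync_bits, region_count) == region_count;
}
```

With a properly zeroed bitset, resync_completes(0, 30) is true. With two
stale bits left set, for example sync_bits = 0x5, only 28 of the 30
regions are ever counted and resync_completes(0x5, 30) is false.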
                                                                                
It is worth noting that the code uses the same bitsets for in-memory
and on-disk logs.  As these bitsets are host-endian and host-sized,
this means that they cannot safely be moved between computers with
different architectures.
                                                                                
                                                                                
Signed-off-by: Neil Brown <neilb>

dm-log-bitset-fix.patch

Comment 5 Red Hat Bugzilla 2006-03-07 20:43:47 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2006-0132.html