Bug 173157 - kernel dm-log: big endian 64-bit corruption
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.0
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Assigned To: Alasdair Kergon
QA Contact: Brian Brock
Blocks: 168429
Reported: 2005-11-14 13:28 EST by Alasdair Kergon
Modified: 2007-11-30 17:07 EST (History)
Fixed In Version: RHSA-2006-0132
Doc Type: Bug Fix
Last Closed: 2006-03-07 15:43:47 EST

Description Alasdair Kergon 2005-11-14 13:28:56 EST
The Linux bitset operators (test_bit, set_bit, etc.) work on arrays of
"unsigned long".  dm-log uses such bitsets but treats them as
arrays of uint32_t, allocating and zeroing only a multiple of 4 bytes
(as 'clean_bits' is a uint32_t pointer).
                                                                                
The patch below fixes this problem.
                                                                                
The problem is specific to 64-bit big-endian machines such as s390x or
ppc64 and can prevent pvmove from terminating.
                                                                                
                                                                                
In the simplest case, if "region_count" were (say) 30, then
bitset_size (below) would be 4 and bitset_uint32_count would be 1.
Thus the memory for this bitset, after allocation and zeroing, would
be
   0 0 0 0 X X X X
On a big-endian 64-bit machine, bit 0 for this bitset is in the 8th
byte! (and every bit that dm-log would use would be in the X area).
                                                                                
   0 0 0 0 X X X X
                 ^
                 here
                                                                                
which hasn't been cleared properly.
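
The layout described above can be checked with a small standalone model.
This is only a sketch, not dm-log code: the helper names bitset_bytes and
byte_offset_of_bit are made up for illustration, and it assumes the generic
bitops convention that bit n lives in word n / BITS_PER_LONG with value
1UL << (n % BITS_PER_LONG).  Byte order is passed in as a parameter so the
result does not depend on the machine running it.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes needed to hold `bits` bits, rounded up to whole words of
 * `word_bytes` bytes (4 for uint32_t, 8 for a 64-bit unsigned long). */
size_t bitset_bytes(size_t bits, size_t word_bytes)
{
    size_t word_bits = word_bytes * 8;

    assert(word_bytes > 0);
    return (bits + word_bits - 1) / word_bits * word_bytes;
}

/* Offset, from the start of the bitset, of the byte that holds bit `bit`
 * when the bitset is an array of `word_bytes`-byte words. */
size_t byte_offset_of_bit(size_t bit, size_t word_bytes, int big_endian)
{
    size_t word_bits = word_bytes * 8;
    size_t word = bit / word_bits;
    /* Counted from the least significant byte of the word. */
    size_t byte_in_word = (bit % word_bits) / 8;

    assert(word_bytes > 0);
    if (big_endian)
        byte_in_word = word_bytes - 1 - byte_in_word;
    return word * word_bytes + byte_in_word;
}
```

With region_count = 30, bitset_bytes(30, 4) gives the 4 bytes that were
allocated and zeroed, whereas bitset_bytes(30, 8) gives the 8 bytes the bit
operators actually touch.  byte_offset_of_bit(0, 8, 1) is 7 (the 8th byte)
and byte_offset_of_bit(29, 8, 1) is 4, so all 30 bits fall in the X area.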
                                                                                
As the dm-raid1 code only syncs and counts regions which have a 0 in
the 'sync_bits' bitset, and only finishes when it has counted high
enough, a large number of 1's among those 'X's will cause the sync to
not complete.
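
The effect on the sync count can be illustrated with a deliberately
simplified model.  It is not the dm-raid1 code: count_synced and
sync_completes are hypothetical names, and the model assumes a single
64-bit word of sync bits (so at most 64 regions) in which only regions
whose bit starts as 0 are ever synced and counted.

```c
#include <assert.h>
#include <stdint.h>

/* Count the regions that would be synced: those among the first
 * `region_count` bits whose sync bit is initially 0. */
unsigned count_synced(uint64_t sync_bits, unsigned region_count)
{
    unsigned r, counted = 0;

    assert(region_count <= 64);
    for (r = 0; r < region_count; r++) {
        if (!(sync_bits & ((uint64_t)1 << r)))
            counted++;
    }
    return counted;
}

/* The sync is considered finished only when every region was counted. */
int sync_completes(uint64_t sync_bits, unsigned region_count)
{
    return count_synced(sync_bits, region_count) == region_count;
}
```

With properly cleared bits, sync_completes(0, 30) is true.  If the
uncleared memory happens to contain 1's, for example sync_bits = 0x5, only
28 of the 30 regions are counted and the completion test never succeeds.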
                                                                                
It is worth noting that the code uses the same bitsets for in-memory
and on-disk logs.  As these bitsets are host-endian and host-sized,
this means that they cannot safely be moved between computers with
different architectures.
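
To make the portability concern concrete, the sketch below models what
happens when one machine sets a bit in a raw byte buffer and a machine with
a different word size or byte order reads the same bytes.  The function
reinterpret_bit is a hypothetical helper, not part of dm-log, and it
assumes the same bit-numbering convention as the generic bitops (bit n has
value 1UL << (n % BITS_PER_LONG) in word n / BITS_PER_LONG).

```c
#include <assert.h>
#include <stddef.h>

/* Index of the bit a reader (word size `to_bytes`, byte order `to_be`)
 * sees set after a writer (`from_bytes`, `from_be`) set bit `bit` in the
 * same raw byte buffer. */
size_t reinterpret_bit(size_t bit, size_t from_bytes, int from_be,
                       size_t to_bytes, int to_be)
{
    size_t from_bits = from_bytes * 8;
    size_t byte_in_word, offset, bit_in_byte;

    assert(from_bytes > 0 && to_bytes > 0);

    /* Writer: find the raw byte and the bit position inside it. */
    byte_in_word = (bit % from_bits) / 8;
    if (from_be)
        byte_in_word = from_bytes - 1 - byte_in_word;
    offset = (bit / from_bits) * from_bytes + byte_in_word;
    bit_in_byte = bit % 8;

    /* Reader: map that same raw byte back to a bit index. */
    byte_in_word = offset % to_bytes;
    if (to_be)
        byte_in_word = to_bytes - 1 - byte_in_word;
    return (offset / to_bytes) * to_bytes * 8 + byte_in_word * 8 + bit_in_byte;
}
```

Under this model, region 0 written on a 64-bit big-endian host is read as
region 56 on a 64-bit little-endian host (reinterpret_bit(0, 8, 1, 8, 0)),
and region 0 written as a big-endian uint32_t is read as region 32 by a
64-bit big-endian unsigned long (reinterpret_bit(0, 4, 1, 8, 1)).  Between
little-endian hosts the index is unchanged regardless of word size.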
                                                                                
                                                                                
Signed-off-by: Neil Brown <neilb@suse.de>

dm-log-bitset-fix.patch
Comment 5 Red Hat Bugzilla 2006-03-07 15:43:47 EST
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2006-0132.html
