Bug 173157 - kernel dm-log: big endian 64-bit corruption
Summary: kernel dm-log: big endian 64-bit corruption
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.0
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Alasdair Kergon
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks: 168429
Reported: 2005-11-14 18:28 UTC by Alasdair Kergon
Modified: 2007-11-30 22:07 UTC

Fixed In Version: RHSA-2006-0132
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2006-03-07 20:43:47 UTC
Target Upstream Version:
Embargoed:


Links
Red Hat Product Errata RHSA-2005:808
  Priority: normal; Status: SHIPPED_LIVE; Last Updated: 2005-10-27 04:00:00 UTC
  Summary: Important: kernel security update
Red Hat Product Errata RHSA-2006:0132
  Priority: qe-ready; Status: SHIPPED_LIVE; Last Updated: 2006-03-09 16:31:00 UTC
  Summary: Moderate: Updated kernel packages available for Red Hat Enterprise Linux 4 Update 3

Description Alasdair Kergon 2005-11-14 18:28:56 UTC
The Linux bitset operators (test_bit, set_bit, etc.) work on arrays of
"unsigned long". dm-log uses such bitsets but treats them as arrays of
uint32_t, so it allocates and zeroes only a multiple of 4 bytes (as
'clean_bits' is a uint32_t).
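
A minimal sketch of the resulting size mismatch, assuming the rounding
described above. The helper names bytes_zeroed_buggy() and
bytes_touched_by_bitops() are hypothetical, and the word size is passed
in explicitly (4 for 32-bit, 8 for 64-bit) so the arithmetic can be
checked on any host. This is not code from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helpers (not kernel functions) modelling the sizes above.
 * word_bytes stands for sizeof(unsigned long) on the target machine:
 * 4 on 32-bit, 8 on 64-bit.  It is passed in so this runs on any host.
 */

/* Bytes allocated and zeroed when the size is rounded up to uint32_t words. */
size_t bytes_zeroed_buggy(size_t region_count)
{
	size_t uint32_count = (region_count + 31) / 32;

	return uint32_count * sizeof(uint32_t);
}

/* Bytes spanned by the unsigned long words that test_bit()/set_bit() use. */
size_t bytes_touched_by_bitops(size_t region_count, size_t word_bytes)
{
	size_t bits_per_word;

	assert(word_bytes == 4 || word_bytes == 8);
	bits_per_word = word_bytes * 8;
	return ((region_count + bits_per_word - 1) / bits_per_word) * word_bytes;
}
```

For region_count = 30 the first helper gives 4 bytes while a 64-bit
unsigned long word spans 8; on a 32-bit machine the two agree. Sizing
the allocation in whole unsigned longs, as the second helper does, is
one plausible shape for a fix, offered here only as an assumption.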
                                                                                
The patch below fixes this problem.

The problem is specific to 64-bit big-endian machines such as s390x or
ppc-64 and can prevent pvmove from terminating.

In the simplest case, if "region_count" were (say) 30, then
bitset_size (below) would be 4 and bitset_uint32_count would be 1.
Thus, after allocation and zeroing, the memory for this bitset would
be:
   0 0 0 0 X X X X
On a big-endian 64-bit machine, bit 0 of this bitset is in the 8th
byte (and every bit that dm-log would use is in the X area):

   0 0 0 0 X X X X
                 ^
                 here

which lies outside the 4 bytes that were allocated and zeroed.
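
The byte that holds a given bit can be computed directly. The sketch
below is a hypothetical model (byte_holding_bit() is not a kernel
function). It assumes bit nr lives in unsigned long number
nr / BITS_PER_LONG at bit position nr % BITS_PER_LONG, and that on a
big-endian machine the least significant byte of each word is stored
last.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: index of the byte holding bit 'nr' in an
 * unsigned-long bitset on a machine whose unsigned long is word_bytes
 * wide and whose byte order is big-endian (big_endian != 0) or
 * little-endian.  The arguments describe the target, not this host.
 */
size_t byte_holding_bit(size_t nr, size_t word_bytes, int big_endian)
{
	size_t bits_per_word, word, bit_in_word, byte_in_word;

	assert(word_bytes == 4 || word_bytes == 8);
	bits_per_word = word_bytes * 8;
	word = nr / bits_per_word;         /* which unsigned long */
	bit_in_word = nr % bits_per_word;  /* 0 = least significant bit */
	byte_in_word = bit_in_word / 8;    /* counted from the LSB end */

	if (big_endian)
		/* The least significant byte sits at the highest address. */
		byte_in_word = word_bytes - 1 - byte_in_word;

	return word * word_bytes + byte_in_word;
}
```

Under these assumptions, bits 0 to 29 of a 30-region bitset map to
bytes 4 to 7 on a big-endian 64-bit machine, entirely inside the X
area, but to bytes 0 to 3 on little-endian or 32-bit machines, which is
why only 64-bit big-endian hosts are affected.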
                                                                                
As the dm-raid1 code only syncs and counts regions that have a 0 in
the 'sync_bits' bitset, and only finishes once its count is high
enough, a large number of 1s among those X bytes will prevent the
sync from completing.
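
The effect can be illustrated with a deliberately simplified,
hypothetical model. resync_completes() is not the dm-raid1 code; it
only encodes the rule stated above, using one 0/1 flag per region
instead of a packed bitset.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model of the completion rule: only regions whose sync
 * flag is 0 are synced and counted, and the resync finishes only when
 * the count reaches region_count.  Returns 1 if it would finish.
 */
int resync_completes(unsigned char *sync_bits, size_t region_count)
{
	size_t i, sync_count = 0;

	assert(sync_bits != NULL || region_count == 0);
	for (i = 0; i < region_count; i++) {
		if (sync_bits[i] == 0) {
			/* Looks out of sync: copy the region, then mark and count it. */
			sync_bits[i] = 1;
			sync_count++;
		}
		/* A region whose flag is already 1 is skipped and never counted. */
	}
	return sync_count == region_count;
}
```

In this model any region whose flag starts as a stray 1 is skipped and
never counted, so the count stays short of region_count and the resync
never reports completion.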
                                                                                
It is worth noting that the code uses the same bitsets for in-memory
and on-disk logs. Because these bitsets are host-endian and
host-sized, they cannot safely be moved between computers with
different architectures.
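
The portability problem can be shown the same way. In this
hypothetical sketch, image_set_bit() and image_test_bit() operate on a
raw byte image, such as an on-disk log, as a machine with the given
word size and byte order would; none of these names come from the
kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Byte holding bit 'nr' for the given word size and byte order. */
static size_t bit_byte(size_t nr, size_t word_bytes, int big_endian)
{
	size_t bits_per_word = word_bytes * 8;
	size_t byte_in_word = (nr % bits_per_word) / 8;

	if (big_endian)
		byte_in_word = word_bytes - 1 - byte_in_word;
	return (nr / bits_per_word) * word_bytes + byte_in_word;
}

/* Set bit 'nr' in a raw image as a machine of the given type would. */
void image_set_bit(unsigned char *img, size_t nr, size_t word_bytes,
		   int big_endian)
{
	assert(word_bytes == 4 || word_bytes == 8);
	img[bit_byte(nr, word_bytes, big_endian)] |=
		(unsigned char)(1u << (nr % 8));
}

/* Test bit 'nr' in a raw image as a machine of the given type would. */
int image_test_bit(const unsigned char *img, size_t nr, size_t word_bytes,
		   int big_endian)
{
	assert(word_bytes == 4 || word_bytes == 8);
	return (img[bit_byte(nr, word_bytes, big_endian)] >> (nr % 8)) & 1;
}
```

Under these assumptions, a big-endian 64-bit writer that sets bit 0
stores 0x01 in byte 7; a little-endian 64-bit reader sees bit 0 as
clear and reports bit 56 as set instead.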
                                                                                
                                                                                
Signed-off-by: Neil Brown <neilb>

dm-log-bitset-fix.patch

Comment 5 Red Hat Bugzilla 2006-03-07 20:43:47 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2006-0132.html


