Bug 468438

Summary: List corruption in cmirror causes machine lock-up or cmirror processing stoppage
Product: Red Hat Enterprise Linux 5 Reporter: Jonathan Earl Brassow <jbrassow>
Component: cmirror Assignee: Jonathan Earl Brassow <jbrassow>
Status: CLOSED ERRATA QA Contact: Cluster QE <mspqa-list>
Severity: medium Docs Contact:
Priority: medium    
Version: 5.3 CC: agk, ccaulfie, cmarthal, dwysocha, edamato, heinzm, mbroz, syeghiay
Target Milestone: rc   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Last Closed: 2009-01-20 21:26:19 UTC Type: ---

Description Jonathan Earl Brassow 2008-10-24 18:33:13 UTC
Commit e07369b28d7a569e742d80152ef10c9d42bc2650 introduced a bug in which a tfr struct could be added to one queue (cluster_queue) before being removed from another (x->delay_queue).

This causes a variety of issues, including:
- machine hang (if clogd is in real time scheduling mode)
- LVM/dmsetup command hangs
- sync stoppage
... and any number of other problems that can result from a corrupted list or lost requests.
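
To illustrate why the ordering matters, here is a minimal sketch of the failure using a small circular doubly linked list. It is not the clogd source: the list helpers are a stand-in modeled on Linux-style list code, and buggy_requeue, corruption_demo, and the three-request scenario are hypothetical names chosen for the example.

```c
#include <stddef.h>

/* Minimal circular doubly linked list (illustrative stand-in only). */
struct list_head {
    struct list_head *next, *prev;
};

void list_init(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

/* Append n at the tail of the list headed by h. */
void list_add_tail(struct list_head *n, struct list_head *h)
{
    n->prev = h->prev;
    n->next = h;
    h->prev->next = n;
    h->prev = n;
}

/* Unlink n from whatever its own next/prev pointers currently reference. */
void list_del(struct list_head *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    list_init(n);
}

/* Count entries with a bounded forward walk. Returns -1 if the walk does
 * not get back to the head within `limit` steps (a corrupted list). */
int list_count(const struct list_head *h, int limit)
{
    const struct list_head *p;
    int n = 0;

    for (p = h->next; p != h; p = p->next)
        if (++n > limit)
            return -1;
    return n;
}

/* The problematic order: link into the new queue while the entry is still
 * linked into the old one, then unlink. By the time list_del runs, the
 * entry's pointers refer to the NEW queue, so it is removed from there and
 * the old queue keeps a stale pointer to it. */
void buggy_requeue(struct list_head *item, struct list_head *to)
{
    list_add_tail(item, to);
    list_del(item);
}

/* Hypothetical scenario: three requests wait on the delay queue and the
 * middle one is requeued in the buggy order. */
void corruption_demo(int *cluster_len, int *delay_len)
{
    struct list_head delay_queue, cluster_queue, a, b, c;

    list_init(&delay_queue);
    list_init(&cluster_queue);
    list_add_tail(&a, &delay_queue);
    list_add_tail(&b, &delay_queue);
    list_add_tail(&c, &delay_queue);

    buggy_requeue(&b, &cluster_queue);

    *cluster_len = list_count(&cluster_queue, 16); /* request b is lost */
    *delay_len = list_count(&delay_queue, 16);     /* walk never reaches head */
}
```

In this sketch the requeued request ends up on neither queue: cluster_queue is empty afterwards, and a forward walk over delay_queue spins on the stale entry, which is why list_count needs a step limit. That matches the two symptom families above, lost requests and loops that never terminate.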

Comment 1 Jonathan Earl Brassow 2008-10-27 22:58:19 UTC
commit cc5877e65fad20dd8657881a7ca7361e6e4c08bf
Author: Jonathan Brassow <jbrassow>
Date:   Fri Oct 24 13:42:06 2008 -0500

    clogd: Fix for bug 468438 - list corruption

    'commit e07369b28d7a569e742d80152ef10c9d42bc2650' introduced the
    concept of a delay queue to hold requests while membership changes
    occurred.  Sometimes, a request would be added to the delay_queue
    /and/ the cluster_queue, resulting in list corruption.  Depending
    on how the list was corrupted, infinite loops could occur, or
    requests could simply be lost.
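
One common way to rule out this class of bug is to make "move between queues" a single helper that always unlinks before linking. The sketch below shows that pattern under stated assumptions: it is not the patch from this commit, and the helper names follow Linux-style list conventions rather than anything confirmed to exist in clogd.

```c
#include <stddef.h>

/* Minimal circular doubly linked list (illustrative stand-in only). */
struct list_head {
    struct list_head *next, *prev;
};

void list_init(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

/* Append n at the tail of the list headed by h. */
void list_add_tail(struct list_head *n, struct list_head *h)
{
    n->prev = h->prev;
    n->next = h;
    h->prev->next = n;
    h->prev = n;
}

/* Unlink n and leave it self-linked, so unlinking it again is harmless.
 * The entry must have been initialized or linked at least once. */
void list_del_init(struct list_head *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    list_init(n);
}

/* Move n to the tail of `to`: remove it from its current queue first,
 * and only then add it to the destination queue. */
void list_move_tail(struct list_head *n, struct list_head *to)
{
    list_del_init(n);
    list_add_tail(n, to);
}

/* Bounded forward walk; -1 means the list never returns to its head. */
int list_count(const struct list_head *h, int limit)
{
    const struct list_head *p;
    int n = 0;

    for (p = h->next; p != h; p = p->next)
        if (++n > limit)
            return -1;
    return n;
}
```

With this ordering, moving one of three delayed requests leaves two entries on the source queue and one on the destination, and both lists remain walkable. Keeping the unlink and the link inside one function means a caller cannot get the order wrong.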

Comment 7 errata-xmlrpc 2009-01-20 21:26:19 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2009-0158.html