Cluster mirrors do not (and cannot) allow writes to mirror regions that have not yet been recovered. When a mirror suspends, it:
1) stops recovery
2) flushes the work queue (finishing outstanding writes)
3) calls log presuspend

The log has no way of knowing that the mirror is suspending until step 3, so during the flush in step 2 it still blocks any outstanding writes to regions that are not in sync. This can hang the mirror: recovery has been stopped, so the not-in-sync regions will never be cleared, and the writes are delayed indefinitely.

The solution is to switch the order of operations to:
1) stop recovery
2) call log presuspend
3) flush work queue

By doing this, the log knows that recovery has been stopped and that there will be no collisions with writes. Therefore, it can allow writes to regions which have not yet been recovered.
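The effect of the reordering can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the kernel code: `ToyLog`, `ToyMirror`, `allows_write`, and the two `suspend_*` functions are hypothetical names invented for illustration, and `flush()` returns the writes that would remain blocked instead of actually hanging.

```python
class ToyLog:
    """Hypothetical stand-in for the cluster log (not the real kernel API)."""

    def __init__(self, in_sync):
        self.in_sync = set(in_sync)   # regions already recovered
        self.presuspended = False     # set once the log learns of the suspend

    def presuspend(self):
        self.presuspended = True

    def allows_write(self, region):
        # In-sync regions are always writable. Not-in-sync regions become
        # writable only once the log knows recovery has been stopped.
        return region in self.in_sync or self.presuspended


class ToyMirror:
    """Hypothetical stand-in for the mirror with queued writes."""

    def __init__(self, log, pending_writes):
        self.log = log
        self.pending = list(pending_writes)  # regions with outstanding writes
        self.recovering = True

    def stop_recovery(self):
        self.recovering = False

    def flush(self):
        # Complete every write the log allows; return the ones still blocked.
        # In the real system a blocked write here would mean a hang.
        self.pending = [r for r in self.pending if not self.log.allows_write(r)]
        return self.pending


def suspend_old_order(mirror):
    mirror.stop_recovery()
    stuck = mirror.flush()        # log does not yet know about the suspend
    mirror.log.presuspend()
    return stuck


def suspend_new_order(mirror):
    mirror.stop_recovery()
    mirror.log.presuspend()       # log learns recovery is stopped first
    return mirror.flush()


# Regions 0 and 1 are in sync; region 2 is not. Writes are queued to 0 and 2.
old_stuck = suspend_old_order(ToyMirror(ToyLog({0, 1}), [0, 2]))
new_stuck = suspend_new_order(ToyMirror(ToyLog({0, 1}), [0, 2]))
print("old order, blocked writes:", old_stuck)
print("new order, blocked writes:", new_stuck)
```

In this model the old order leaves the write to the not-in-sync region blocked, while the new order lets the flush drain every queued write.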
Committed in stream U6 build 61. A test kernel with this patch is available from http://people.redhat.com/~jbaron/rhel4/
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2007-0791.html