Description of problem:
I was running revolver on the 4 node x86_64 link cluster with 2 gfs filesystems (one with 2 legs and one with 3 legs). During the second iteration I failed link-04, and that caused the mirror recovery to go haywire. When I brought link-04 back into the cluster and attempted to remount the first gfs, it hung. I'll leave the cluster in this state if you'd like to gather more info than provided below:

Messages from link-07:
[...]
dm-cmirror: unable to notify server of completed resync work
Mar 23 15:19:50 link-07 kernel: dm-cmirror: unable to get server (3) to mark region (8415)
Mar 23 15:19:50 link-07 kernel: dm-cmirror: Reason :: 1
Mar 23 15:19:50 link-07 kernel: dm-cmirror: unable to get server (3) to mark region (5793)
Mar 23 15:19:50 link-07 kernel: dm-cmirror: Reason :: 1
Mar 23 15:19:50 link-07 kernel: dm-cmirror: unable to get server (3) to mark region (5795)
Mar 23 15:19:50 link-07 kernel: dm-cmirror: Reason :: 1
Mar 23 15:19:50 link-07 kernel: dm-cmirror: unable to get server (3) to mark region (6273)
Mar 23 15:19:50 link-07 kernel: dm-cmirror: Reason :: 1
Mar 23 15:19:51 link-07 kernel: dm-cmirror: unable to notify server of completed resync work
dm-cmirror: unable to get server (3) to mark region (8192)
dm-cmirror: Reason :: 1
Mar 23 15:20:00 link-07 kernel: dm-cmirror: unable to get server (3) to mark region (8192)
Mar 23 15:20:00 link-07 kernel: dm-cmirror: Reason :: 1
dm-cmirror: unable to get server (3) to mark region (2067)
dm-cmirror: Reason :: 1
Mar 23 15:20:35 link-07 kernel: dm-cmirror: unable to get server (3) to mark region (2067)
Mar 23 15:20:35 link-07 kernel: dm-cmirror: Reason :: 1
dm-cmirror: unable to get server (3) to mark region (2067)
dm-cmirror: Reason :: 1
Mar 23 15:20:35 link-07 kernel: dm-cmirror: unable to get server (3) to mark region (2067)
Mar 23 15:20:35 link-07 kernel: dm-cmirror: Reason :: 1

Messages from link-08 (looping over and over):
[...]
Mar 23 11:44:21 link-08 kernel: dm-cmirror: Attempt to mark a region 5578/C33UfFkJ which is being recovered.
Mar 23 11:44:21 link-08 kernel: dm-cmirror: Current recoverer: 1
Mar 23 11:44:21 link-08 kernel: dm-cmirror: Mark requester : 4
Mar 23 11:44:21 link-08 kernel: dm-cmirror: Attempt to mark a region 5578/C33UfFkJ which is being recovered.
Mar 23 11:44:21 link-08 kernel: dm-cmirror: Current recoverer: 1
Mar 23 11:44:21 link-08 kernel: dm-cmirror: Mark requester : 4
Mar 23 11:44:22 link-08 kernel: dm-cmirror: Attempt to mark a region 5578/C33UfFkJ which is being recovered.
Mar 23 11:44:22 link-08 kernel: dm-cmirror: Current recoverer: 1
Mar 23 11:44:22 link-08 kernel: dm-cmirror: Mark requester : 4
Mar 23 11:44:22 link-08 kernel: dm-cmirror: Attempt to mark a region 5578/C33UfFkJ which is being recovered.
Mar 23 11:44:22 link-08 kernel: dm-cmirror: Current recoverer: 1
Mar 23 11:44:22 link-08 kernel: dm-cmirror: Mark requester : 4
Mar 23 11:44:23 link-08 kernel: dm-cmirror: Attempt to mark a region 5578/C33UfFkJ which is being recovered.
Mar 23 11:44:23 link-08 kernel: dm-cmirror: Current recoverer: 1
Mar 23 11:44:23 link-08 kernel: dm-cmirror: Mark requester : 4
Mar 23 11:44:23 link-08 kernel: dm-cmirror: Attempt to mark a region 5578/C33UfFkJ which is being recovered.
Mar 23 11:44:23 link-08 kernel: dm-cmirror: Current recoverer: 1
Mar 23 11:44:23 link-08 kernel: dm-cmirror: Mark requester : 4

[root@link-07 ~]# dmsetup table
revolver-mirror2_mimage_2: 0 10485760 linear 8:49 384
revolver-mirror1_mlog: 0 8192 linear 8:113 384
revolver-mirror2_mimage_1: 0 10485760 linear 8:33 384
revolver-mirror2_mimage_0: 0 10485760 linear 8:1 10486144
revolver-mirror2_mlog: 0 8192 linear 8:17 10486144
revolver-mirror1_mimage_1: 0 10485760 linear 8:17 384
revolver-mirror1_mimage_0: 0 10485760 linear 8:1 384
revolver-mirror2: 0 10485760 mirror clustered_disk 5 253:6 1024 LVM-xVWv7JiOsSgNPv95Lg9FU6ckwsTQeik3U9Iz0MnCtDa0QV7z8Qpsi749eaAIovqe nosync block_on_error 3 253:7 0 253:8 0 253:9 0
VolGroup00-LogVol01: 0 4063232 linear 3:2 151781760
revolver-mirror1: 0 10485760 mirror clustered_disk 5 253:2 1024 LVM-xVWv7JiOsSgNPv95Lg9FU6ckwsTQeik34coTfAbowArYJ4dLpVYZKagWC33UfFkJ nosync block_on_error 2 253:3 0 253:4 0
VolGroup00-LogVol00: 0 151781376 linear 3:2 384

[root@link-07 ~]# dmsetup info
Name: revolver-mirror2_mimage_2
State: ACTIVE
Tables present: LIVE
Open count: 1
Event number: 0
Major, minor: 253, 9
Number of targets: 1
UUID: LVM-xVWv7JiOsSgNPv95Lg9FU6ckwsTQeik3isDcxNcz4wZdJDn4Xe8iZUruUQJ4ZRTe

Name: revolver-mirror1_mlog
State: ACTIVE
Tables present: LIVE
Open count: 1
Event number: 0
Major, minor: 253, 2
Number of targets: 1
UUID: LVM-xVWv7JiOsSgNPv95Lg9FU6ckwsTQeik34coTfAbowArYJ4dLpVYZKagWC33UfFkJ

Name: revolver-mirror2_mimage_1
State: ACTIVE
Tables present: LIVE
Open count: 1
Event number: 0
Major, minor: 253, 8
Number of targets: 1
UUID: LVM-xVWv7JiOsSgNPv95Lg9FU6ckwsTQeik3mZUe0RrJKkU91zuIoqHmYHWD8uboH8f5

Name: revolver-mirror2_mimage_0
State: ACTIVE
Tables present: LIVE
Open count: 1
Event number: 0
Major, minor: 253, 7
Number of targets: 1
UUID: LVM-xVWv7JiOsSgNPv95Lg9FU6ckwsTQeik34L8Fe5dimlQkHP0J4VRTo6VYGF1Vl6yp

Name: revolver-mirror2_mlog
State: ACTIVE
Tables present: LIVE
Open count: 1
Event number: 0
Major, minor: 253, 6
Number of targets: 1
UUID: LVM-xVWv7JiOsSgNPv95Lg9FU6ckwsTQeik3U9Iz0MnCtDa0QV7z8Qpsi749eaAIovqe

Name: revolver-mirror1_mimage_1
State: ACTIVE
Tables present: LIVE
Open count: 1
Event number: 0
Major, minor: 253, 4
Number of targets: 1
UUID: LVM-xVWv7JiOsSgNPv95Lg9FU6ckwsTQeik3kUiOVkdLj6U18LdGB92sLcSI1TO7Rgts

Name: revolver-mirror1_mimage_0
State: ACTIVE
Tables present: LIVE
Open count: 1
Event number: 0
Major, minor: 253, 3
Number of targets: 1
UUID: LVM-xVWv7JiOsSgNPv95Lg9FU6ckwsTQeik3W26y7KjUX3f6NjgBr0GjYrdSuxMZHA4b

Name: revolver-mirror2
State: ACTIVE
Tables present: LIVE
Open count: 1
Event number: 1
Major, minor: 253, 10
Number of targets: 1
UUID: LVM-xVWv7JiOsSgNPv95Lg9FU6ckwsTQeik3OTZEJ9N5A6CnpxcgWLLsFYES7vrRWrGE

Name: VolGroup00-LogVol01
State: ACTIVE
Tables present: LIVE
Open count: 1
Event number: 0
Major, minor: 253, 1
Number of targets: 1
UUID: LVM-8qGbKfLuKYoljGNFE1gsS77AYQM3dC4xQjIYEP6InPgUU5nsDPYSZl5EAEKqRWcY

Name: revolver-mirror1
State: ACTIVE
Tables present: LIVE
Open count: 1
Event number: 1
Major, minor: 253, 5
Number of targets: 1
UUID: LVM-xVWv7JiOsSgNPv95Lg9FU6ckwsTQeik33eay0GrlKTcJaZKdaEAig2hY2MNmHS5q

Name: VolGroup00-LogVol00
State: ACTIVE
Tables present: LIVE
Open count: 1
Event number: 0
Major, minor: 253, 0
Number of targets: 1
UUID: LVM-8qGbKfLuKYoljGNFE1gsS77AYQM3dC4xrapcuzOGNgADTzIRUNTk0MZBbtWAyXhh

Version-Release number of selected component (if applicable):
2.6.9-50.ELsmp
cmirror-kernel-2.6.9-25.0
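For anyone reading the `dmsetup table` output above, here is a rough Python sketch that splits a mirror line into its log arguments and legs. It assumes the usual mirror target layout (log type, log argument count, log arguments, leg count, then device/offset pairs). The function name `parse_mirror_table` and the idea that the third log argument is the log UUID are my assumptions, not something taken from the cmirror source. The suffix in the link-08 messages (5578/C33UfFkJ) matches the end of the UUID on the revolver-mirror1 line, which suggests the stuck region belongs to the 2-leg mirror.

```python
def parse_mirror_table(line):
    """Split one `dmsetup table` mirror line into log arguments and legs."""
    name, rest = line.split(":", 1)          # device name precedes the first colon
    fields = rest.split()
    start, length, target = int(fields[0]), int(fields[1]), fields[2]
    if target != "mirror":
        raise ValueError(f"{name}: not a mirror target ({target})")
    log_type = fields[3]
    n_log_args = int(fields[4])
    log_args = fields[5:5 + n_log_args]
    pos = 5 + n_log_args                     # index of the leg count
    n_legs = int(fields[pos])
    leg_fields = fields[pos + 1:pos + 1 + 2 * n_legs]
    legs = [(leg_fields[i], int(leg_fields[i + 1]))
            for i in range(0, 2 * n_legs, 2)]
    return {
        "name": name,
        "start": start,
        "length": length,
        "log_type": log_type,
        "log_args": log_args,
        "legs": legs,
    }


# Line copied from the dmsetup table output in this report.
MIRROR1 = (
    "revolver-mirror1: 0 10485760 mirror clustered_disk 5 253:2 1024 "
    "LVM-xVWv7JiOsSgNPv95Lg9FU6ckwsTQeik34coTfAbowArYJ4dLpVYZKagWC33UfFkJ "
    "nosync block_on_error 2 253:3 0 253:4 0"
)

info = parse_mirror_table(MIRROR1)
log_uuid = info["log_args"][2]               # assumption: third log arg is the UUID
print(info["name"], info["log_type"], info["legs"])
print("matches link-08 messages:", log_uuid.endswith("C33UfFkJ"))
```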
'Reason' should be a negative number. This suggests that the client is receiving a message from the server that is not the response it is expecting. Sequence numbers were put in (3/22/2007) to fix this problem. The cmirror-kernel package you are using was built 3/14/2007.

new -> post
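To illustrate the idea behind the sequence numbers mentioned above, here is a minimal Python sketch of a client that only accepts a response whose sequence number matches its outstanding request. This is not the dm-cmirror code. The `Message` and `Client` names, the field layout, and the "mark_region" label are all hypothetical and only show the general technique.

```python
import itertools
from dataclasses import dataclass


@dataclass
class Message:
    seq: int       # sequence number copied from request into response
    kind: str      # request type, e.g. "mark_region" (hypothetical label)
    payload: int   # region number in a request, status code in a response


class Client:
    def __init__(self):
        self._seq = itertools.count(1)
        self.pending = None

    def make_request(self, kind, region):
        """Build a new request; it replaces any request still outstanding."""
        self.pending = Message(next(self._seq), kind, region)
        return self.pending

    def accept(self, response):
        """Return the status if this answers the pending request, else None."""
        if (self.pending is None
                or response.seq != self.pending.seq
                or response.kind != self.pending.kind):
            return None  # stale or unrelated message: ignore it and keep waiting
        self.pending = None
        return response.payload


client = Client()
first = client.make_request("mark_region", 8192)   # assume this one timed out
second = client.make_request("mark_region", 2067)  # retry for another region

late = Message(first.seq, "mark_region", 0)        # late reply to the first request
good = Message(second.seq, "mark_region", 0)       # reply to the current request

print(client.accept(late))
print(client.accept(good))
```

Without the sequence check, the late reply to the first request would be taken as the answer to the second one, which is the kind of mismatch described above.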
post -> modified
Fix verified in cmirror-kernel-2.6.9-30.0.
Assuming this VERIFIED fix got released. Closing. Reopen if it's not yet resolved.