Description of problem:
After killing one of the legs in a 3-leg corelog mirror, as well as one of the nodes in the cluster (link-07), all gfs I/O that I had going to that one cmirror stopped, and all future I/O got the standard "`/mnt/mirror3': Input/output error". The down-conversion appeared to work fine (although it was suspiciously instantaneous) and clvmd is not deadlocked. The only issue is that one gfs filesystem appears to be stuck in dlm recovery.

# BEFORE THE FAILURE
[root@link-08 ~]# lvs -a -o +devices
  LV                 VG         Attr   LSize  Origin Snap%  Move Log          Copy%  Devices
  LogVol00           VolGroup00 -wi-ao 72.44G                                        /dev/hda2(0)
  LogVol01           VolGroup00 -wi-ao  1.94G                                        /dev/hda2(2318)
  mirror1            corey1     Mwi-ao 10.00G                    mirror1_mlog 100.00 mirror1_mimage_0(0),mirror1_mimage_1(0)
  [mirror1_mimage_0] corey1     iwi-ao 10.00G                                        /dev/sda1(0)
  [mirror1_mimage_1] corey1     iwi-ao 10.00G                                        /dev/sdb1(0)
  [mirror1_mlog]     corey1     lwi-ao  4.00M                                        /dev/sdb2(0)
  mirror2            corey2     Mwi-ao 10.00G                                 100.00 mirror2_mimage_0(0),mirror2_mimage_1(0)
  [mirror2_mimage_0] corey2     iwi-ao 10.00G                                        /dev/sdd1(0)
  [mirror2_mimage_1] corey2     iwi-ao 10.00G                                        /dev/sdc1(0)
  mirror3            corey3     Mwi-ao 10.00G                                 100.00 mirror3_mimage_0(0),mirror3_mimage_1(0),mirror3_mimage_2(0)
  [mirror3_mimage_0] corey3     iwi-ao 10.00G                                        /dev/sde1(0)
  [mirror3_mimage_1] corey3     iwi-ao 10.00G                                        /dev/sdf1(0)
  [mirror3_mimage_2] corey3     iwi-ao 10.00G                                        /dev/sdg1(0)

# ON ALL NODES
[root@link-08 ~]# echo offline > /sys/block/sde/device/state

# I KILLED LINK-07
[root@link-08 ~]# lvs -a -o +devices
  /dev/sde1: open failed: No such device or address
  LV                 VG         Attr   LSize  Origin Snap%  Move Log          Copy%  Devices
  LogVol00           VolGroup00 -wi-ao 72.44G                                        /dev/hda2(0)
  LogVol01           VolGroup00 -wi-ao  1.94G                                        /dev/hda2(2318)
  mirror1            corey1     Mwi-ao 10.00G                    mirror1_mlog 100.00 mirror1_mimage_0(0),mirror1_mimage_1(0)
  [mirror1_mimage_0] corey1     iwi-ao 10.00G                                        /dev/sda1(0)
  [mirror1_mimage_1] corey1     iwi-ao 10.00G                                        /dev/sdb1(0)
  [mirror1_mlog]     corey1     lwi-ao  4.00M                                        /dev/sdb2(0)
  mirror2            corey2     Mwi-ao 10.00G                                 100.00 mirror2_mimage_0(0),mirror2_mimage_1(0)
  [mirror2_mimage_0] corey2     iwi-ao 10.00G                                        /dev/sdd1(0)
  [mirror2_mimage_1] corey2     iwi-ao 10.00G                                        /dev/sdc1(0)
  mirror3            corey3     Mwi-ao 10.00G                                 100.00 mirror3_mimage_2(0),mirror3_mimage_1(0)
  [mirror3_mimage_1] corey3     iwi-ao 10.00G                                        /dev/sdf1(0)
  [mirror3_mimage_2] corey3     iwi-ao 10.00G                                        /dev/sdg1(0)

[root@link-08 ~]# touch /mnt/mirror3/foo
touch: cannot touch `/mnt/mirror3/foo': Input/output error

[root@link-08 ~]# df -h
Filesystem                       Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00   72G  2.3G   66G   4% /
/dev/hda1                         99M   19M   75M  21% /boot
none                             500M     0  500M   0% /dev/shm
/dev/mapper/corey1-mirror1       9.5G  3.9M  9.5G   1% /mnt/mirror1
df: `/mnt/mirror3': Input/output error
/dev/mapper/corey2-mirror2       9.5G  3.9M  9.5G   1% /mnt/mirror2

[root@link-08 ~]# cman_tool nodes
Node  Votes Exp Sts  Name
   1    1    4   M   link-08
   2    1    4   M   link-02
   3    1    4   X   link-07
   4    1    4   M   link-04

[root@link-08 ~]# cman_tool services
Service          Name                              GID LID State      Code
Fence Domain:    "default"                           5   2 run        -
[2 1 4]
DLM Lock Space:  "clvmd"                            74   4 run        -
[2 1 4]
DLM Lock Space:  "clustered_log"                    75   5 run        -
[2 1 4]
DLM Lock Space:  "1"                                77   6 run        -
[2 1 4]
DLM Lock Space:  "3"                                85   8 run        S-10,200,0
[2 1 4]
DLM Lock Space:  "2"                                81  10 run        -
[2 1 4]
GFS Mount Group: "1"                                79   7 run        -
[2 1 4]
GFS Mount Group: "3"                                87   9 recover 2  -
[2 1 4]
GFS Mount Group: "2"                                83  11 run        -
[2 1 4]

[root@link-08 ~]# dmsetup status
corey1-mirror1_mimage_1: 0 20971520 linear
corey1-mirror1: 0 20971520 mirror 2 253:7 253:8 20480/20480 1 AA 3 clustered_disk 253:6 A
corey1-mirror1_mimage_0: 0 20971520 linear
corey2-mirror2_mimage_1: 0 20971520 linear
corey2-mirror2_mimage_0: 0 20971520 linear
corey2-mirror2: 0 20971520 mirror 2 253:10 253:11 20480/20480 1 AA 1 clustered_core
corey3-mirror3_mimage_2: 0 20971520 linear
corey3-mirror3_mimage_1: 0 20971520 linear
VolGroup00-LogVol01: 0 4063232 linear
corey3-mirror3: 0 20971520 mirror 2 253:4 253:3 20480/20480 1 AA 1 clustered_core
VolGroup00-LogVol00: 0 151912448 linear
corey1-mirror1_mlog: 0 8192 linear

[root@link-08 ~]# dmsetup info
Name:              corey1-mirror1_mimage_1
State:             ACTIVE
Tables present:    LIVE
Open count:        1
Event number:      0
Major, minor:      253, 8
Number of targets: 1
UUID: LVM-0QLi8UKjRguEwy4KI1k9KriBg4lyb2j1V9V23NFX7ZxmKdVvCMdSh2B0Q5uPTSKB

Name:              corey1-mirror1
State:             ACTIVE
Tables present:    LIVE
Open count:        1
Event number:      1
Major, minor:      253, 9
Number of targets: 1
UUID: LVM-0QLi8UKjRguEwy4KI1k9KriBg4lyb2j1fydUC356OtfUZVfh74E4yWHGepLEAQu5

Name:              corey1-mirror1_mimage_0
State:             ACTIVE
Tables present:    LIVE
Open count:        1
Event number:      0
Major, minor:      253, 7
Number of targets: 1
UUID: LVM-0QLi8UKjRguEwy4KI1k9KriBg4lyb2j1YjGdtpLiSDkmhud2mVjyNgZveaSecV87

Name:              corey2-mirror2_mimage_1
State:             ACTIVE
Tables present:    LIVE
Open count:        1
Event number:      0
Major, minor:      253, 11
Number of targets: 1
UUID: LVM-caAVDd2dPhYq3rRmtemqUvgAt90oxO1vpnsQHYANmpEyo0yXJQwqe62JRBXRDtT9

Name:              corey2-mirror2_mimage_0
State:             ACTIVE
Tables present:    LIVE
Open count:        1
Event number:      0
Major, minor:      253, 10
Number of targets: 1
UUID: LVM-caAVDd2dPhYq3rRmtemqUvgAt90oxO1vczGS49XAjOu44xh1tr4PQr7WiR5ErZl3

Name:              corey2-mirror2
State:             ACTIVE
Tables present:    LIVE
Open count:        1
Event number:      1
Major, minor:      253, 12
Number of targets: 1
UUID: LVM-caAVDd2dPhYq3rRmtemqUvgAt90oxO1vC3HFk9CLB9ELCxhSZBOf1lkkE11Rdz25

Name:              corey3-mirror3_mimage_2
State:             ACTIVE
Tables present:    LIVE
Open count:        1
Event number:      0
Major, minor:      253, 4
Number of targets: 1
UUID: LVM-vCWvB54l0xEiiQuUH1anc8UJwd1KeH2R2yZvBBJDzam6k5CuJbD1uAfD3FTZvNwi

Name:              corey3-mirror3_mimage_1
State:             ACTIVE
Tables present:    LIVE
Open count:        1
Event number:      0
Major, minor:      253, 3
Number of targets: 1
UUID: LVM-vCWvB54l0xEiiQuUH1anc8UJwd1KeH2RJMiGbaBAD0GCnWpSCxIZ7ot8if1bjIBE

Name:              VolGroup00-LogVol01
State:             ACTIVE
Tables present:    LIVE
Open count:        1
Event number:      0
Major, minor:      253, 1
Number of targets: 1
UUID: LVM-TCx5xJ7FuRhXzJ4g7CvPsw2AhhFBNLQUvNlDc7SvClgdBMh2WD6TraPFgzjSVMRp

Name:              corey3-mirror3
State:             ACTIVE
Tables present:    LIVE
Open count:        1
Event number:      7
Major, minor:      253, 5
Number of targets: 1
UUID: LVM-vCWvB54l0xEiiQuUH1anc8UJwd1KeH2RrVxFHPheVHLeYGToOyHqciYLB1QRho7o

Name:              VolGroup00-LogVol00
State:             ACTIVE
Tables present:    LIVE
Open count:        1
Event number:      0
Major, minor:      253, 0
Number of targets: 1
UUID: LVM-TCx5xJ7FuRhXzJ4g7CvPsw2AhhFBNLQUrfFjpdEgWnBeJx2UpyC5Mr6XpgXdHWCh

Name:              corey1-mirror1_mlog
State:             ACTIVE
Tables present:    LIVE
Open count:        1
Event number:      0
Major, minor:      253, 6
Number of targets: 1
UUID: LVM-0QLi8UKjRguEwy4KI1k9KriBg4lyb2j1hCJh1EVAOxhYcsIkh5Fwap1yOkxFeidY

Version-Release number of selected component (if applicable):
2.6.9-55.ELlargesmp
dlm-kernel-2.6.9-46.16
cmirror-kernel-2.6.9-32.0
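Side note: in the `dmsetup status` output above, the mirror health shows up as the character string after the `<in-sync>/<total>` region count ("AA" = both legs alive; a "D" marks a dead leg). A minimal sketch that flags degraded mirrors from that output, assuming the field layout shown above; the `check_mirrors` helper name is made up for illustration, not part of any tool here:

```shell
# Hypothetical helper: reads `dmsetup status` lines on stdin and reports
# mirror targets whose health string contains a dead ("D") leg.
check_mirrors() {
    while read -r line; do
        case "$line" in
            *" mirror "*)
                name=${line%%:*}
                # The health string sits two fields after the "<in-sync>/<total>"
                # sync-ratio field (the intervening field is a count), per the
                # mirror status lines shown in this report.
                health=$(printf '%s\n' "$line" |
                    awk '{ for (i = 1; i <= NF; i++)
                               if ($i ~ /^[0-9]+\/[0-9]+$/) { print $(i+2); exit } }')
                case "$health" in
                    *D*) echo "DEGRADED: $name ($health)" ;;
                    *)   echo "ok: $name ($health)" ;;
                esac ;;
        esac
    done
}

# usage: dmsetup status | check_mirrors
```

With the status lines from this report, `corey3-mirror3` would report "ok (AA)" here because the failed leg had already been removed by the down-conversion before this snapshot was taken.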
More info...

[root@link-02 ~]# cman_tool nodes
Node  Votes Exp Sts  Name
   1    1    4   M   link-08
   2    1    4   M   link-02
   3    1    4   X   link-07
   4    1    4   M   link-04

[root@link-02 ~]# cman_tool services
Service          Name                              GID LID State      Code
Fence Domain:    "default"                           5   2 run        -
[2 1 4]
DLM Lock Space:  "clvmd"                            74  13 run        -
[2 1 4]
DLM Lock Space:  "clustered_log"                    75  14 run        -
[2 1 4]
DLM Lock Space:  "1"                                77  15 run        -
[2 1 4]
DLM Lock Space:  "2"                                81  17 run        -
[2 1 4]
DLM Lock Space:  "3"                                85  19 run        -
[2 1 4]
GFS Mount Group: "1"                                79  16 run        -
[2 1 4]
GFS Mount Group: "2"                                83  18 run        -
[2 1 4]
GFS Mount Group: "3"                                87  20 recover 4  -
[2 1 4]

[root@link-04 ~]# cman_tool nodes
Node  Votes Exp Sts  Name
   1    1    4   M   link-08
   2    1    4   M   link-02
   3    1    4   X   link-07
   4    1    4   M   link-04

[root@link-04 ~]# cman_tool services
Service          Name                              GID LID State      Code
Fence Domain:    "default"                           5   2 run        -
[2 1 4]
DLM Lock Space:  "clvmd"                            74   4 run        -
[2 1 4]
DLM Lock Space:  "clustered_log"                    75   5 run        -
[2 1 4]
DLM Lock Space:  "1"                                77   6 run        -
[2 1 4]
DLM Lock Space:  "2"                                81   8 run        -
[2 1 4]
DLM Lock Space:  "3"                                85  10 run        -
[2 1 4]
GFS Mount Group: "1"                                79   7 run        -
[2 1 4]
GFS Mount Group: "2"                                83   9 run        -
[2 1 4]
GFS Mount Group: "3"                                87  11 recover 4  -
[2 1 4]

[root@link-08 ~]# cman_tool nodes
Node  Votes Exp Sts  Name
   1    1    4   M   link-08
   2    1    4   M   link-02
   3    1    4   X   link-07
   4    1    4   M   link-04

[root@link-08 ~]# cman_tool services
Service          Name                              GID LID State      Code
Fence Domain:    "default"                           5   2 run        -
[2 1 4]
DLM Lock Space:  "clvmd"                            74   4 run        -
[2 1 4]
DLM Lock Space:  "clustered_log"                    75   5 run        -
[2 1 4]
DLM Lock Space:  "1"                                77   6 run        -
[2 1 4]
DLM Lock Space:  "3"                                85   8 run        S-10,200,0
[2 1 4]
DLM Lock Space:  "2"                                81  10 run        -
[2 1 4]
GFS Mount Group: "1"                                79   7 run        -
[2 1 4]
GFS Mount Group: "3"                                87   9 recover 2  -
[2 1 4]
GFS Mount Group: "2"                                83  11 run        -
[2 1 4]

[root@link-02 ~]# cat /proc/cluster/sm_debug
3
0200004f recover state 2
0200004f cb recover state 2
02000057 recover state 4
02000053 recover state 4
0200004f recover state 3
02000057 recover state 4
02000053 recover state 4
0200004f recover state 5
02000057 recover state 4
02000053 recover state 5

[root@link-04 ~]# cat /proc/cluster/sm_debug
3
0200004f recover state 2
0200004f cb recover state 2
02000057 recover state 4
02000053 recover state 4
0200004f recover state 3
02000057 recover state 4
02000053 recover state 4
0200004f recover state 5
02000057 recover state 4
02000053 recover state 5

[root@link-08 ~]# cat /proc/cluster/sm_debug
2
02000057 recover state 2
0200004f recover state 3
02000053 recover state 2
02000057 recover state 2
0200004f recover state 5
02000053 cb recover state 2
02000053 recover state 3
02000057 recover state 2
02000053 recover state 5
02000057 recover state 2

[root@link-02 ~]# cat /proc/cluster/dlm_debug
s
clvmd update remastered resources
3 updated 1 resources
3 rebuild locks
2 updated 2 resources
2 rebuild locks
1 updated 3 resources
1 rebuild locks
clvmd updated 3 resources
clvmd rebuild locks
3 rebuilt 1 locks
3 recover event 114 done
2 rebuilt 2 locks
2 recover event 114 done
1 rebuilt 3 locks
1 recover event 114 done
clvmd rebuilt 3 locks
clvmd recover event 114 done
3 move flags 0,0,1 ids 112,114,114
3 process held requests
3 processed 0 requests
3 resend marked requests
3 resent 0 requests
3 recover event 114 finished
1 move flags 0,0,1 ids 108,114,114
1 process held requests
1 processed 0 requests
1 resend marked requests
1 resent 0 requests
1 recover event 114 finished
2 move flags 0,0,1 ids 110,114,114
2 process held requests
2 processed 0 requests
2 resend marked requests
2 resent 0 requests
2 recover event 114 finished
clvmd move flags 0,0,1 ids 106,114,114
clvmd process held requests
clvmd processed 0 requests
clvmd resend marked requests
clvmd resent 0 requests
clvmd recover event 114 finished

[root@link-04 ~]# cat /proc/cluster/dlm_debug
s
1 marked 0 requests
1 purge locks of departed nodes
clvmd mark waiting requests
clvmd marked 0 requests
clvmd purge locks of departed nodes
clvmd purged 0 locks
clvmd update remastered resources
2 purged 1 locks
2 update remastered resources
1 purged 2 locks
1 update remastered resources
1 updated 3 resources
1 rebuild locks
2 updated 2 resources
2 rebuild locks
clvmd updated 3 resources
clvmd rebuild locks
1 rebuilt 3 locks
1 recover event 12 done
2 rebuilt 2 locks
2 recover event 12 done
clvmd rebuilt 3 locks
clvmd recover event 12 done
1 move flags 0,0,1 ids 6,12,12
1 process held requests
1 processed 0 requests
1 resend marked requests
1 resent 0 requests
1 recover event 12 finished
2 move flags 0,0,1 ids 8,12,12
2 process held requests
2 processed 0 requests
2 resend marked requests
2 resent 0 requests
2 recover event 12 finished
clvmd move flags 0,0,1 ids 4,12,12
clvmd process held requests
clvmd processed 0 requests
clvmd resend marked requests
clvmd resent 0 requests
clvmd recover event 12 finished

[root@link-08 ~]# cat /proc/cluster/dlm_debug
t 0 locks
3 recover event 24 done
2 purged 1 locks
2 update remastered resources
clvmd purged 0 locks
clvmd update remastered resources
2 updated 2 resources
2 rebuild locks
2 rebuilt 0 locks
2 recover event 24 done
clvmd updated 3 resources
clvmd rebuild locks
1 updated 3 resources
1 rebuild locks
clvmd rebuilt 0 locks
1 rebuilt 0 locks
clvmd recover event 24 done
1 recover event 24 done
3 move flags 0,0,1 ids 22,24,24
3 process held requests
3 processed 0 requests
3 resend marked requests
3 resent 0 requests
3 recover event 24 finished
1 move flags 0,0,1 ids 18,24,24
1 process held requests
1 processed 0 requests
1 resend marked requests
1 resent 0 requests
1 recover event 24 finished
2 move flags 0,0,1 ids 20,24,24
2 process held requests
2 processed 0 requests
2 resend marked requests
2 resent 0 requests
2 recover event 24 finished
clvmd move flags 0,0,1 ids 16,24,24
clvmd process held requests
clvmd processed 0 requests
clvmd resend marked requests
clvmd resent 0 requests
clvmd recover event 24 finished
[root@link-02 ~]# ps ax -o pid,stat,cmd,wchan
PID STAT CMD WCHAN
1 S init [3] -
2 S [migration/0] migration_thread
3 SN [ksoftirqd/0] ksoftirqd
4 S [migration/1] migration_thread
5 SN [ksoftirqd/1] ksoftirqd
6 S< [events/0] worker_thread
7 S< [events/1] worker_thread
8 S< [khelper] worker_thread
9 S< [kacpid] worker_thread
37 S< [kblockd/0] worker_thread
38 S< [kblockd/1] worker_thread
39 S [khubd] hub_thread
62 S [pdflush] pdflush
63 D [pdflush] wait_on_buffer
64 S [kswapd0] kswapd
65 S< [aio/0] worker_thread
66 S< [aio/1] worker_thread
210 S [kseriod] serio_thread
445 S [scsi_eh_0] 16045567552327254017
446 S< [qla2300_0_dpc] 16045567552327254017
506 S [kjournald] kjournald
1896 S< [kmirrord] worker_thread
1993 S<s udevd -
2099 S< [kedac] -
2226 S< [kauditd] kauditd_thread
2338 S [kjournald] kjournald
2918 Ss /sbin/dhclient - -
2961 Ss syslogd -m 0 -
2965 Ss klogd -x syslog
2978 Ss irqbalance -
2989 Ss portmap -
3008 Ss rpc.statd -
3036 Ss rpc.idmapd -
3121 S /usr/sbin/smartd -
3130 Ss /usr/sbin/acpid -
3218 Ss /usr/sbin/sshd -
3237 Ss xinetd -stayaliv -
3255 Ss sendmail: accept -
3264 Ss sendmail: Queue pause
3318 Ss gpm -m /dev/inpu -
3469 Ss crond -
3490 Ss xfs -droppriv -d -
3507 Ss /usr/sbin/atd -
3516 Ss dbus-daemon-1 -- -
3527 Ss hald -
3542 S<sl modclusterd -
3617 Ss /usr/sbin/oddjob -
3653 Ss /usr/sbin/saslau fcntl_setlk
3656 S /usr/sbin/saslau -
3657 S /usr/sbin/saslau fcntl_setlk
3658 S /usr/sbin/saslau fcntl_setlk
3659 S /usr/sbin/saslau fcntl_setlk
3680 S<s ricci -u 101 -
3685 Ss login -- root wait
3686 Ss+ /sbin/mingetty t -
3687 Ss+ /sbin/mingetty t -
3688 Ss+ /sbin/mingetty t -
3689 Ss+ /sbin/mingetty t -
3691 Ss+ /sbin/mingetty t -
3692 Ss+ /sbin/mingetty t -
4238 R+ ps ax -o pid,sta -
4509 Ss -bash wait
4616 Ssl ccsd -
4665 S [cman_comms] cluster_kthread
4666 S [cman_memb] membership_kthread
4667 S< [cman_serviced] serviced
4668 S [cman_hbeat] hello_kthread
4687 Ss fenced -t 120 -w rt_sigsuspend
10514 Ss cupsd -
11595 Rs sshd: root@notty -
27225 S< [dlm_astd] dlm_astd
27226 S< [dlm_recvd] dlm_recvd
27227 S< [dlm_sendd] dlm_sendd
27723 Ssl clvmd -T20 -t 90 -
27724 S< [dlm_recoverd] dlm_recoverd
28467 S [cluster_log_ser -
28512 S< [kmirrord] worker_thread
28513 S< [kcopyd] worker_thread
30090 S<Lsl [dmeventd] -
30111 S< [dlm_recoverd] dlm_recoverd
30112 S< [lock_dlm1] dlm_async
30113 S< [lock_dlm2] dlm_async
30114 S [gfs_scand] -
30115 S [gfs_glockd] gfs_glockd
30121 S [gfs_recoverd] -
30122 S [gfs_logd] -
30123 S [gfs_quotad] -
30124 S [gfs_inoded] -
30128 S< [dlm_recoverd] dlm_recoverd
30138 S< [lock_dlm1] dlm_async
30139 S< [lock_dlm2] dlm_async
30140 S [gfs_scand] -
30141 S [gfs_glockd] gfs_glockd
30142 S [gfs_recoverd] -
30148 S [gfs_logd] -
30149 S [gfs_quotad] -
30150 S [gfs_inoded] -
30154 S< [dlm_recoverd] dlm_recoverd
30164 S< [lock_dlm1] dlm_async
30165 S< [lock_dlm2] dlm_async
30166 S [gfs_scand] -
30167 S [gfs_glockd] gfs_glockd
30168 S [gfs_recoverd] -
30174 S [gfs_logd] -
30175 S [gfs_quotad] -
30176 S [gfs_inoded] -
30242 S xiogen -f buffer pipe_wait
30243 S xdoio -vD pipe_wait
30244 S xiogen -f buffer pipe_wait
30245 S xdoio -vD pipe_wait
30246 S xiogen -f buffer pipe_wait
30248 S xdoio -vD pipe_wait
30252 D xdoio -vD -
30253 D xdoio -vD glock_wait_internal
30254 R xdoio -vD -
30534 S+ tail -f /var/log -
30672 Ss sshd: root@pts/0 -
30679 Ss -bash wait
31346 S< [kmirrord] worker_thread

[root@link-04 ~]# ps ax -o pid,stat,cmd,wchan
PID STAT CMD WCHAN
1 S init [3] -
2 S [migration/0] migration_thread
3 SN [ksoftirqd/0] ksoftirqd
4 S< [events/0] worker_thread
5 S< [khelper] worker_thread
6 S< [kacpid] worker_thread
30 S< [kblockd/0] worker_thread
31 S [khubd] hub_thread
52 S [pdflush] pdflush
53 D [pdflush] wait_on_buffer
54 S [kswapd0] kswapd
55 S< [aio/0] worker_thread
199 S [kseriod] serio_thread
429 S [scsi_eh_0] 16045567552327254017
454 S [kjournald] kjournald
1530 S<s udevd -
1595 S< [kedac] -
1717 S< [kauditd] kauditd_thread
1822 S [kjournald] kjournald
2383 Ss /sbin/dhclient - -
2426 Ss syslogd -m 0 -
2430 Ss klogd -x syslog
2450 Ss portmap -
2469 Ss rpc.statd -
2496 Ss rpc.idmapd -
2572 S /usr/sbin/smartd -
2581 Ss /usr/sbin/acpid -
2590 Ss cupsd -
2648 Ss /usr/sbin/sshd -
2661 Ss xinetd -stayaliv -
2679 Ss sendmail: accept -
2689 Ss sendmail: Queue pause
2736 Ss gpm -m /dev/inpu -
2883 Ss crond -
2904 Ss xfs -droppriv -d -
2921 Ss /usr/sbin/atd -
2930 Ss dbus-daemon-1 -- -
2941 Ss hald -
2956 S<sl modclusterd -
3051 Ss /usr/sbin/oddjob -
3087 Ss /usr/sbin/saslau fcntl_setlk
3088 S /usr/sbin/saslau -
3089 S /usr/sbin/saslau fcntl_setlk
3090 S /usr/sbin/saslau fcntl_setlk
3091 S /usr/sbin/saslau fcntl_setlk
3100 S<s ricci -u 101 -
3105 Ss login -- root wait
3106 Ss+ /sbin/mingetty t -
3107 Ss+ /sbin/mingetty t -
3108 Ss+ /sbin/mingetty t -
3109 Ss+ /sbin/mingetty t -
3110 Ss+ /sbin/mingetty t -
3111 Ss+ /sbin/mingetty t -
4744 Ss sshd: root@notty -
4801 Ssl ccsd -
4854 S [cman_comms] cluster_kthread
4855 S [cman_memb] membership_kthread
4856 S< [cman_serviced] serviced
4863 S [cman_hbeat] hello_kthread
4878 Ss -bash wait
4914 S+ tail -f /var/log -
4934 Ss fenced -t 120 -w rt_sigsuspend
6402 Ss sshd: root@pts/1 -
6404 Ss -bash wait
6592 S [scsi_eh_1] 16045567552327254017
7192 Ssl clvmd -T20 -t 90 -
7193 S< [dlm_astd] dlm_astd
7194 S< [dlm_recvd] dlm_recvd
7195 S< [dlm_sendd] dlm_sendd
7196 S< [dlm_recoverd] dlm_recoverd
7274 S [cluster_log_ser -
7284 S< [kcopyd] worker_thread
7286 S<Lsl [dmeventd] -
7311 S< [kmirrord] worker_thread
7364 S< [kmirrord] worker_thread
7520 S< [dlm_recoverd] dlm_recoverd
7521 S< [lock_dlm1] dlm_async
7522 S< [lock_dlm2] dlm_async
7523 S [gfs_scand] -
7524 S [gfs_glockd] gfs_glockd
7525 S [gfs_recoverd] -
7526 S [gfs_logd] -
7527 S [gfs_quotad] -
7528 S [gfs_inoded] -
7532 S< [dlm_recoverd] dlm_recoverd
7538 S< [lock_dlm1] dlm_async
7539 S< [lock_dlm2] dlm_async
7540 S [gfs_scand] -
7541 S [gfs_glockd] gfs_glockd
7551 S [gfs_recoverd] -
7552 S [gfs_logd] -
7553 S [gfs_quotad] -
7554 S [gfs_inoded] -
7558 S< [dlm_recoverd] dlm_recoverd
7559 S< [lock_dlm1] dlm_async
7560 S< [lock_dlm2] dlm_async
7561 S [gfs_scand] -
7562 S [gfs_glockd] gfs_glockd
7572 S [gfs_recoverd] -
7573 S [gfs_logd] -
7574 S [gfs_quotad] -
7575 S [gfs_inoded] -
7632 S xiogen -f buffer pipe_wait
7633 S xdoio -vD pipe_wait
7634 S xiogen -f buffer pipe_wait
7635 S xdoio -vD pipe_wait
7636 S xiogen -f buffer pipe_wait
7637 S xdoio -vD pipe_wait
7642 R xdoio -vD -
7643 R xdoio -vD -
7644 D xdoio -vD glock_wait_internal
7708 S< [kmirrord] worker_thread
10021 R+ ps ax -o pid,sta -

[root@link-08 ~]# ps ax -o pid,stat,cmd,wchan
PID STAT CMD WCHAN
1 S init [3] -
2 S [migration/0] migration_thread
3 SN [ksoftirqd/0] ksoftirqd
4 S [migration/1] migration_thread
5 SN [ksoftirqd/1] ksoftirqd
6 S< [events/0] worker_thread
7 S< [events/1] worker_thread
8 S< [khelper] worker_thread
9 S< [kacpid] worker_thread
38 S< [kblockd/0] worker_thread
39 S< [kblockd/1] worker_thread
40 S [khubd] hub_thread
63 S [pdflush] pdflush
64 D [pdflush] wait_on_buffer
65 S [kswapd1] kswapd
66 S [kswapd0] kswapd
67 S< [aio/0] worker_thread
68 S< [aio/1] worker_thread
212 S [kseriod] serio_thread
447 S [scsi_eh_0] 16045567552327254017
449 S [scsi_eh_1] 16045567552327254017
466 S [scsi_eh_2] 16045567552327254017
467 S< [qla2300_2_dpc] 16045567552327254017
527 S [kjournald] kjournald
2036 S<s udevd -
2145 S< [kedac] -
2272 S< [kauditd] kauditd_thread
2381 S [kjournald] kjournald
2975 Ss syslogd -m 0 -
2979 Ss klogd -x syslog
2992 Ss irqbalance -
3003 Ss portmap -
3022 Ss rpc.statd -
3050 Ss rpc.idmapd -
3129 S /usr/sbin/smartd -
3138 Ss /usr/sbin/acpid -
3147 Ss cupsd -
3210 Ss /usr/sbin/sshd -
3223 Ss xinetd -stayaliv -
3241 Ss sendmail: accept -
3250 Ss sendmail: Queue pause
3298 Ss gpm -m /dev/inpu -
3435 Ss crond -
3456 Ss xfs -droppriv -d -
3473 Ss /usr/sbin/atd -
3482 Ss dbus-daemon-1 -- -
3493 Ss hald -
3509 S<sl modclusterd -
3576 Ss /usr/sbin/oddjob -
3598 Ss /usr/sbin/saslau fcntl_setlk
3599 S /usr/sbin/saslau -
3600 S /usr/sbin/saslau fcntl_setlk
3601 S /usr/sbin/saslau fcntl_setlk
3602 S /usr/sbin/saslau fcntl_setlk
3624 S<s ricci -u 101 -
3629 Ss login -- root wait
3630 Ss+ /sbin/mingetty t -
3631 Ss+ /sbin/mingetty t -
3632 Ss+ /sbin/mingetty t -
3633 Ss+ /sbin/mingetty t -
3634 Ss+ /sbin/mingetty t -
3635 Ss+ /sbin/mingetty t -
4624 Ss -bash wait
5492 Ss /sbin/dhclient - -
5552 Ssl ccsd -
5643 S [cman_comms] cluster_kthread
5644 S [cman_memb] membership_kthread
5645 S< [cman_serviced] serviced
5646 S [cman_hbeat] hello_kthread
5665 Ss fenced -t 120 -w rt_sigsuspend
5774 S<Lsl [dmeventd] -
5977 Ssl clvmd -T20 -t 90 -
5978 S< [dlm_astd] dlm_astd
5979 S< [dlm_recvd] dlm_recvd
5980 S< [dlm_sendd] dlm_sendd
5981 S< [dlm_recoverd] dlm_recoverd
6024 S [cluster_log_ser -
6069 S< [kcopyd] worker_thread
6089 S< [kmirrord] worker_thread
6260 Ss sshd: root@pts/0 -
6262 Ss -bash -
6333 S< [dlm_recoverd] dlm_recoverd
6339 S< [lock_dlm1] dlm_async
6340 S< [lock_dlm2] dlm_async
6341 S [gfs_scand] -
6342 S [gfs_glockd] gfs_glockd
6343 S [gfs_recoverd] -
6344 D [gfs_logd] -
6345 S [gfs_quotad] -
6346 S [gfs_inoded] -
6351 S< [dlm_recoverd] dlm_recoverd
6352 D< [lock_dlm1] kcl_leave_service
6353 S< [lock_dlm2] dlm_async
6354 S [gfs_scand] -
6355 S [gfs_glockd] gfs_glockd
6365 D [gfs_recoverd] glock_wait_internal
6366 S [gfs_logd] -
6367 S [gfs_quotad] -
6368 S [gfs_inoded] -
6431 Ss sshd: root@pts/1 -
6433 Ss -bash wait
6496 S< [kmirrord] worker_thread
6550 S< [dlm_recoverd] dlm_recoverd
6551 S< [lock_dlm1] dlm_async
6552 S< [lock_dlm2] dlm_async
6553 S [gfs_scand] -
6554 S [gfs_glockd] gfs_glockd
6579 S+ tail -f /var/log -
8746 S [gfs_recoverd] -
8877 S [gfs_logd] -
8878 S [gfs_quotad] -
8879 S [gfs_inoded] -
9376 S xiogen -f buffer pipe_wait
9377 S xdoio -vD pipe_wait
9378 S xiogen -f buffer pipe_wait
9379 S xdoio -vD pipe_wait
9381 S+ xiogen -f buffer pipe_wait
9382 S+ xdoio -vD pipe_wait
9385 D+ xdoio -vD glock_wait_internal
9386 R xdoio -vD -
9387 D xdoio -vD -
9543 S< [kmirrord] worker_thread
11855 R+ ps ax -o pid,sta -
Looks like this may be the problem, on link-08:

Jun 6 16:31:21 link-08 lvm[5774]: Completed: vgreduce --config devices{ignore_suspended_devices=1} --removemissing corey3
Jun 6 16:31:21 link-08 lvm[5774]: corey3-mirror3 is now in-sync
Jun 6 16:31:22 link-08 kernel: GFS: fsid=LINK_128:1.1: jid=2: Replaying journal...
Jun 6 16:31:22 link-08 lvm[5774]: No longer monitoring mirror device corey3-mirror3 for events
Jun 6 16:31:23 link-08 kernel: GFS: fsid=LINK_128:1.1: jid=2: Replayed 2 of 2 blocks
Jun 6 16:31:23 link-08 kernel: GFS: fsid=LINK_128:1.1: jid=2: replays = 2, skips = 0, sames = 0
Jun 6 16:31:23 link-08 kernel: GFS: fsid=LINK_128:2.1: jid=2: Replaying journal...
Jun 6 16:31:23 link-08 kernel: GFS: fsid=LINK_128:2.1: jid=2: Replayed 2 of 2 blocks
Jun 6 16:31:23 link-08 kernel: GFS: fsid=LINK_128:2.1: jid=2: replays = 2, skips = 0, sames = 0
Jun 6 16:31:23 link-08 kernel: GFS: fsid=LINK_128:3.1: fatal: filesystem consistency error
Jun 6 16:31:23 link-08 kernel: GFS: fsid=LINK_128:3.1: function = trans_go_xmote_bh
Jun 6 16:31:23 link-08 kernel: GFS: fsid=LINK_128:3.1: file = /builddir/build/BUILD/gfs-kernel-2.6.9-72/largesmp/src/gfs/glops.c, line = 542
Jun 6 16:31:23 link-08 kernel: GFS: fsid=LINK_128:3.1: time = 1181165483
Jun 6 16:31:23 link-08 kernel: GFS: fsid=LINK_128:3.1: about to withdraw from the cluster
Jun 6 16:31:23 link-08 kernel: GFS: fsid=LINK_128:3.1: waiting for outstanding I/O
Jun 6 16:31:23 link-08 kernel: GFS: fsid=LINK_128:3.1: telling LM to withdraw
Jun 6 16:31:24 link-08 kernel: GFS: fsid=LINK_128:1.1: jid=2: Journal replayed in 4s
Jun 6 16:31:24 link-08 kernel: GFS: fsid=LINK_128:1.1: jid=2: Done
Jun 6 16:31:24 link-08 kernel: GFS: fsid=LINK_128:2.1: jid=2: Journal replayed in 4s
Jun 6 16:31:24 link-08 kernel: GFS: fsid=LINK_128:2.1: jid=2: Done
Jun 6 16:32:34 link-08 kernel: cdrom: open failed.
Jun 6 16:33:35 link-08 kernel: cdrom: open failed.
Reproduced this issue without the "function = trans_go_xmote_bh" assertion.

[root@link-02 ~]# cman_tool nodes
Node  Votes Exp Sts  Name
   1    1    4   M   link-02
   2    1    4   X   link-08
   3    1    4   M   link-04
   4    1    4   M   link-07

[root@link-02 ~]# cman_tool services
Service          Name                              GID LID State      Code
Fence Domain:    "default"                           5   2 run        -
[1 4 3]
DLM Lock Space:  "clvmd"                            58  11 run        -
[1 3 4]
DLM Lock Space:  "clustered_log"                    59  12 run        -
[1 3 4]
DLM Lock Space:  "1"                                61  13 run        -
[1 3 4]
DLM Lock Space:  "2"                                65  15 run        -
[1 3 4]
DLM Lock Space:  "3"                                69  17 run        -
[1 3 4]
GFS Mount Group: "1"                                63  14 run        -
[1 3 4]
GFS Mount Group: "2"                                67  16 run        -
[1 3 4]
GFS Mount Group: "3"                                71  18 recover 4  -
[1 3 4]

[root@link-02 ~]# dmsetup info
Name:              corey1-mirror1_mimage_1
State:             ACTIVE
Tables present:    LIVE
Open count:        1
Event number:      0
Major, minor:      253, 3
Number of targets: 1
UUID: LVM-0QLi8UKjRguEwy4KI1k9KriBg4lyb2j11Q90OmNWZQ6Pox3i0Ng53hJV0Tw6oKpc

Name:              corey1-mirror1
State:             ACTIVE
Tables present:    LIVE
Open count:        1
Event number:      1
Major, minor:      253, 5
Number of targets: 1
UUID: LVM-0QLi8UKjRguEwy4KI1k9KriBg4lyb2j1TYbQrVgkVw59UKMAeI2N6MQ8XJohDhmq

Name:              corey1-mirror1_mimage_0
State:             ACTIVE
Tables present:    LIVE
Open count:        1
Event number:      0
Major, minor:      253, 2
Number of targets: 1
UUID: LVM-0QLi8UKjRguEwy4KI1k9KriBg4lyb2j1vvItnjlKtYjveNxMlJPa2LEWsyskBGD5

Name:              corey2-mirror2_mimage_1
State:             ACTIVE
Tables present:    LIVE
Open count:        1
Event number:      0
Major, minor:      253, 7
Number of targets: 1
UUID: LVM-caAVDd2dPhYq3rRmtemqUvgAt90oxO1vVcr1q3EQly4f8qd6A66B9dlTNkRh0V0T

Name:              corey2-mirror2_mimage_0
State:             ACTIVE
Tables present:    LIVE
Open count:        1
Event number:      0
Major, minor:      253, 6
Number of targets: 1
UUID: LVM-caAVDd2dPhYq3rRmtemqUvgAt90oxO1v2Ap1nemAZlcYodVA8c5oX6J5GYd8zuEs

Name:              corey2-mirror2
State:             ACTIVE
Tables present:    LIVE
Open count:        1
Event number:      1
Major, minor:      253, 8
Number of targets: 1
UUID: LVM-caAVDd2dPhYq3rRmtemqUvgAt90oxO1vc0yz9j3KbEPcj4crAfUQJ9Y0s97ihs1t

Name:              corey3-mirror3_mimage_2
State:             ACTIVE
Tables present:    LIVE
Open count:        2
Event number:      0
Major, minor:      253, 11
Number of targets: 1
UUID: LVM-vCWvB54l0xEiiQuUH1anc8UJwd1KeH2RAK1EBHNcPALYNfaYSHr2bY6AyKwHHnza

Name:              corey3-mirror3_mimage_1
State:             ACTIVE
Tables present:    LIVE
Open count:        2
Event number:      0
Major, minor:      253, 10
Number of targets: 1
UUID: LVM-vCWvB54l0xEiiQuUH1anc8UJwd1KeH2RT3l70pL9L2NATt0FXghyhR94GkwJo8W0

Name:              corey3-mirror3_mimage_0
State:             ACTIVE
Tables present:    LIVE
Open count:        1
Event number:      0
Major, minor:      253, 9
Number of targets: 1
UUID: LVM-vCWvB54l0xEiiQuUH1anc8UJwd1KeH2RmuLBJPJc9DB6v2gRUV9rU4gUY11kgrHQ

Name:              VolGroup00-LogVol01
State:             ACTIVE
Tables present:    LIVE
Open count:        1
Event number:      0
Major, minor:      253, 1
Number of targets: 1
UUID: LVM-dq1liKVsB8CzZiuNRtOF1tYTkXqgKO85b28r2eyrPqurvpuPPS83R8fcaG7QULHG

Name:              corey3-mirror3
State:             ACTIVE
Tables present:    LIVE & INACTIVE
Open count:        1
Event number:      5
Major, minor:      253, 12
Number of targets: 1
UUID: LVM-vCWvB54l0xEiiQuUH1anc8UJwd1KeH2RCC3uqHwUhyadUOOqFLGNuGfqmWpgP3OK

Name:              VolGroup00-LogVol00
State:             ACTIVE
Tables present:    LIVE
Open count:        1
Event number:      0
Major, minor:      253, 0
Number of targets: 1
UUID: LVM-dq1liKVsB8CzZiuNRtOF1tYTkXqgKO85W9BzgAV2x8i4cT3pKw0TArv1HJm54Tu6

Name:              corey1-mirror1_mimage_2
State:             ACTIVE
Tables present:    LIVE
Open count:        1
Event number:      0
Major, minor:      253, 4
Number of targets: 1
UUID: LVM-0QLi8UKjRguEwy4KI1k9KriBg4lyb2j1kWTC4gefpSn1T4NFkhOoSw4DoWdYP1iv
This could very well be a dup of bz 257241
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
Making this a dup of 359341 (going to the newer bug because it has better recreation information).

*** This bug has been marked as a duplicate of 359341 ***