Bug 243013
| Summary: | stuck dlm recovery causes corrupt filesystem after cmirror leg and node failure | ||
|---|---|---|---|
| Product: | [Retired] Red Hat Cluster Suite | Reporter: | Corey Marthaler <cmarthal> |
| Component: | cmirror | Assignee: | Jonathan Earl Brassow <jbrassow> |
| Status: | CLOSED DUPLICATE | QA Contact: | Cluster QE <mspqa-list> |
| Severity: | low | Docs Contact: | |
| Priority: | low | ||
| Version: | 4 | CC: | agk, dwysocha, mbroz, prockai |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | All | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | Bug Fix | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2008-03-26 17:51:42 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
More info...

[root@link-02 ~]# cman_tool nodes
Node  Votes Exp Sts  Name
   1    1    4   M   link-08
   2    1    4   M   link-02
   3    1    4   X   link-07
   4    1    4   M   link-04

[root@link-02 ~]# cman_tool services
Service          Name             GID  LID  State      Code
Fence Domain:    "default"          5    2  run        -    [2 1 4]
DLM Lock Space:  "clvmd"           74   13  run        -    [2 1 4]
DLM Lock Space:  "clustered_log"   75   14  run        -    [2 1 4]
DLM Lock Space:  "1"               77   15  run        -    [2 1 4]
DLM Lock Space:  "2"               81   17  run        -    [2 1 4]
DLM Lock Space:  "3"               85   19  run        -    [2 1 4]
GFS Mount Group: "1"               79   16  run        -    [2 1 4]
GFS Mount Group: "2"               83   18  run        -    [2 1 4]
GFS Mount Group: "3"               87   20  recover 4  -    [2 1 4]

[root@link-04 ~]# cman_tool nodes
Node  Votes Exp Sts  Name
   1    1    4   M   link-08
   2    1    4   M   link-02
   3    1    4   X   link-07
   4    1    4   M   link-04

[root@link-04 ~]# cman_tool services
Service          Name             GID  LID  State      Code
Fence Domain:    "default"          5    2  run        -    [2 1 4]
DLM Lock Space:  "clvmd"           74    4  run        -    [2 1 4]
DLM Lock Space:  "clustered_log"   75    5  run        -    [2 1 4]
DLM Lock Space:  "1"               77    6  run        -    [2 1 4]
DLM Lock Space:  "2"               81    8  run        -    [2 1 4]
DLM Lock Space:  "3"               85   10  run        -    [2 1 4]
GFS Mount Group: "1"               79    7  run        -    [2 1 4]
GFS Mount Group: "2"               83    9  run        -    [2 1 4]
GFS Mount Group: "3"               87   11  recover 4  -    [2 1 4]

[root@link-08 ~]# cman_tool nodes
Node  Votes Exp Sts  Name
   1    1    4   M   link-08
   2    1    4   M   link-02
   3    1    4   X   link-07
   4    1    4   M   link-04

[root@link-08 ~]# cman_tool services
Service          Name             GID  LID  State      Code
Fence Domain:    "default"          5    2  run        -           [2 1 4]
DLM Lock Space:  "clvmd"           74    4  run        -           [2 1 4]
DLM Lock Space:  "clustered_log"   75    5  run        -           [2 1 4]
DLM Lock Space:  "1"               77    6  run        -           [2 1 4]
DLM Lock Space:  "3"               85    8  run        S-10,200,0  [2 1 4]
DLM Lock Space:  "2"               81   10  run        -           [2 1 4]
GFS Mount Group: "1"               79    7  run        -           [2 1 4]
GFS Mount Group: "3"               87    9  recover 2  -           [2 1 4]
GFS Mount Group: "2"               83   11  run        -           [2 1 4]

[root@link-02 ~]# cat /proc/cluster/sm_debug
3
0200004f recover state 2
0200004f cb recover state 2
02000057 recover state 4
02000053 recover state 4
0200004f recover state 3
02000057 recover state 4
02000053 recover state 4
0200004f recover state 5
02000057 recover state 4
02000053 recover state 5

[root@link-04 ~]# cat /proc/cluster/sm_debug
3
0200004f recover state 2
0200004f cb recover state 2
02000057 recover state 4
02000053 recover state 4
0200004f recover state 3
02000057 recover state 4
02000053 recover state 4
0200004f recover state 5
02000057 recover state 4
02000053 recover state 5

[root@link-08 ~]# cat /proc/cluster/sm_debug
2
02000057 recover state 2
0200004f recover state 3
02000053 recover state 2
02000057 recover state 2
0200004f recover state 5
02000053 cb recover state 2
02000053 recover state 3
02000057 recover state 2
02000053 recover state 5
02000057 recover state 2

[root@link-02 ~]# cat /proc/cluster/dlm_debug
s
clvmd update remastered resources
3 updated 1 resources
3 rebuild locks
2 updated 2 resources
2 rebuild locks
1 updated 3 resources
1 rebuild locks
clvmd updated 3 resources
clvmd rebuild locks
3 rebuilt 1 locks
3 recover event 114 done
2 rebuilt 2 locks
2 recover event 114 done
1 rebuilt 3 locks
1 recover event 114 done
clvmd rebuilt 3 locks
clvmd recover event 114 done
3 move flags 0,0,1 ids 112,114,114
3 process held requests
3 processed 0 requests
3 resend marked requests
3 resent 0 requests
3 recover event 114 finished
1 move flags 0,0,1 ids 108,114,114
1 process held requests
1 processed 0 requests
1 resend marked requests
1 resent 0 requests
1 recover event 114 finished
2 move flags 0,0,1 ids 110,114,114
2 process held requests
2 processed 0 requests
2 resend marked requests
2 resent 0 requests
2 recover event 114 finished
clvmd move flags 0,0,1 ids 106,114,114
clvmd process held requests
clvmd processed 0 requests
clvmd resend marked requests
clvmd resent 0 requests
clvmd recover event 114 finished

[root@link-04 ~]# cat /proc/cluster/dlm_debug
s
1 marked 0 requests
1 purge locks of departed nodes
clvmd mark waiting requests
clvmd marked 0 requests
clvmd purge locks of departed nodes
clvmd purged 0 locks
clvmd update remastered resources
2 purged 1 locks
2 update remastered resources
1 purged 2 locks
1 update remastered resources
1 updated 3 resources
1 rebuild locks
2 updated 2 resources
2 rebuild locks
clvmd updated 3 resources
clvmd rebuild locks
1 rebuilt 3 locks
1 recover event 12 done
2 rebuilt 2 locks
2 recover event 12 done
clvmd rebuilt 3 locks
clvmd recover event 12 done
1 move flags 0,0,1 ids 6,12,12
1 process held requests
1 processed 0 requests
1 resend marked requests
1 resent 0 requests
1 recover event 12 finished
2 move flags 0,0,1 ids 8,12,12
2 process held requests
2 processed 0 requests
2 resend marked requests
2 resent 0 requests
2 recover event 12 finished
clvmd move flags 0,0,1 ids 4,12,12
clvmd process held requests
clvmd processed 0 requests
clvmd resend marked requests
clvmd resent 0 requests
clvmd recover event 12 finished

[root@link-08 ~]# cat /proc/cluster/dlm_debug
t 0 locks
3 recover event 24 done
2 purged 1 locks
2 update remastered resources
clvmd purged 0 locks
clvmd update remastered resources
2 updated 2 resources
2 rebuild locks
2 rebuilt 0 locks
2 recover event 24 done
clvmd updated 3 resources
clvmd rebuild locks
1 updated 3 resources
1 rebuild locks
clvmd rebuilt 0 locks
1 rebuilt 0 locks
clvmd recover event 24 done
1 recover event 24 done
3 move flags 0,0,1 ids 22,24,24
3 process held requests
3 processed 0 requests
3 resend marked requests
3 resent 0 requests
3 recover event 24 finished
1 move flags 0,0,1 ids 18,24,24
1 process held requests
1 processed 0 requests
1 resend marked requests
1 resent 0 requests
1 recover event 24 finished
2 move flags 0,0,1 ids 20,24,24
2 process held requests
2 processed 0 requests
2 resend marked requests
2 resent 0 requests
2 recover event 24 finished
clvmd move flags 0,0,1 ids 16,24,24
clvmd process held requests
clvmd processed 0 requests
clvmd resend marked requests
clvmd resent 0 requests
clvmd recover event 24 finished

[root@link-02 ~]# ps ax -o pid,stat,cmd,wchan
PID STAT CMD WCHAN
1 S init [3] -
2 S [migration/0] migration_thread
3 SN [ksoftirqd/0] ksoftirqd
4 S [migration/1] migration_thread
5 SN [ksoftirqd/1] ksoftirqd
6 S< [events/0] worker_thread
7 S< [events/1] worker_thread
8 S< [khelper] worker_thread
9 S< [kacpid] worker_thread
37 S< [kblockd/0] worker_thread
38 S< [kblockd/1] worker_thread
39 S [khubd] hub_thread
62 S [pdflush] pdflush
63 D [pdflush] wait_on_buffer
64 S [kswapd0] kswapd
65 S< [aio/0] worker_thread
66 S< [aio/1] worker_thread
210 S [kseriod] serio_thread
445 S [scsi_eh_0] 16045567552327254017
446 S< [qla2300_0_dpc] 16045567552327254017
506 S [kjournald] kjournald
1896 S< [kmirrord] worker_thread
1993 S<s udevd -
2099 S< [kedac] -
2226 S< [kauditd] kauditd_thread
2338 S [kjournald] kjournald
2918 Ss /sbin/dhclient - -
2961 Ss syslogd -m 0 -
2965 Ss klogd -x syslog
2978 Ss irqbalance -
2989 Ss portmap -
3008 Ss rpc.statd -
3036 Ss rpc.idmapd -
3121 S /usr/sbin/smartd -
3130 Ss /usr/sbin/acpid -
3218 Ss /usr/sbin/sshd -
3237 Ss xinetd -stayaliv -
3255 Ss sendmail: accept -
3264 Ss sendmail: Queue pause
3318 Ss gpm -m /dev/inpu -
3469 Ss crond -
3490 Ss xfs -droppriv -d -
3507 Ss /usr/sbin/atd -
3516 Ss dbus-daemon-1 -- -
3527 Ss hald -
3542 S<sl modclusterd -
3617 Ss /usr/sbin/oddjob -
3653 Ss /usr/sbin/saslau fcntl_setlk
3656 S /usr/sbin/saslau -
3657 S /usr/sbin/saslau fcntl_setlk
3658 S /usr/sbin/saslau fcntl_setlk
3659 S /usr/sbin/saslau fcntl_setlk
3680 S<s ricci -u 101 -
3685 Ss login -- root wait
3686 Ss+ /sbin/mingetty t -
3687 Ss+ /sbin/mingetty t -
3688 Ss+ /sbin/mingetty t -
3689 Ss+ /sbin/mingetty t -
3691 Ss+ /sbin/mingetty t -
3692 Ss+ /sbin/mingetty t -
4238 R+ ps ax -o pid,sta -
4509 Ss -bash wait
4616 Ssl ccsd -
4665 S [cman_comms] cluster_kthread
4666 S [cman_memb] membership_kthread
4667 S< [cman_serviced] serviced
4668 S [cman_hbeat] hello_kthread
4687 Ss fenced -t 120 -w rt_sigsuspend
10514 Ss cupsd -
11595 Rs sshd: root@notty -
27225 S< [dlm_astd] dlm_astd
27226 S< [dlm_recvd] dlm_recvd
27227 S< [dlm_sendd] dlm_sendd
27723 Ssl clvmd -T20 -t 90 -
27724 S< [dlm_recoverd] dlm_recoverd
28467 S [cluster_log_ser -
28512 S< [kmirrord] worker_thread
28513 S< [kcopyd] worker_thread
30090 S<Lsl [dmeventd] -
30111 S< [dlm_recoverd] dlm_recoverd
30112 S< [lock_dlm1] dlm_async
30113 S< [lock_dlm2] dlm_async
30114 S [gfs_scand] -
30115 S [gfs_glockd] gfs_glockd
30121 S [gfs_recoverd] -
30122 S [gfs_logd] -
30123 S [gfs_quotad] -
30124 S [gfs_inoded] -
30128 S< [dlm_recoverd] dlm_recoverd
30138 S< [lock_dlm1] dlm_async
30139 S< [lock_dlm2] dlm_async
30140 S [gfs_scand] -
30141 S [gfs_glockd] gfs_glockd
30142 S [gfs_recoverd] -
30148 S [gfs_logd] -
30149 S [gfs_quotad] -
30150 S [gfs_inoded] -
30154 S< [dlm_recoverd] dlm_recoverd
30164 S< [lock_dlm1] dlm_async
30165 S< [lock_dlm2] dlm_async
30166 S [gfs_scand] -
30167 S [gfs_glockd] gfs_glockd
30168 S [gfs_recoverd] -
30174 S [gfs_logd] -
30175 S [gfs_quotad] -
30176 S [gfs_inoded] -
30242 S xiogen -f buffer pipe_wait
30243 S xdoio -vD pipe_wait
30244 S xiogen -f buffer pipe_wait
30245 S xdoio -vD pipe_wait
30246 S xiogen -f buffer pipe_wait
30248 S xdoio -vD pipe_wait
30252 D xdoio -vD -
30253 D xdoio -vD glock_wait_internal
30254 R xdoio -vD -
30534 S+ tail -f /var/log -
30672 Ss sshd: root@pts/0 -
30679 Ss -bash wait
31346 S< [kmirrord] worker_thread
[root@link-04 ~]# ps ax -o pid,stat,cmd,wchan
PID STAT CMD WCHAN
1 S init [3] -
2 S [migration/0] migration_thread
3 SN [ksoftirqd/0] ksoftirqd
4 S< [events/0] worker_thread
5 S< [khelper] worker_thread
6 S< [kacpid] worker_thread
30 S< [kblockd/0] worker_thread
31 S [khubd] hub_thread
52 S [pdflush] pdflush
53 D [pdflush] wait_on_buffer
54 S [kswapd0] kswapd
55 S< [aio/0] worker_thread
199 S [kseriod] serio_thread
429 S [scsi_eh_0] 16045567552327254017
454 S [kjournald] kjournald
1530 S<s udevd -
1595 S< [kedac] -
1717 S< [kauditd] kauditd_thread
1822 S [kjournald] kjournald
2383 Ss /sbin/dhclient - -
2426 Ss syslogd -m 0 -
2430 Ss klogd -x syslog
2450 Ss portmap -
2469 Ss rpc.statd -
2496 Ss rpc.idmapd -
2572 S /usr/sbin/smartd -
2581 Ss /usr/sbin/acpid -
2590 Ss cupsd -
2648 Ss /usr/sbin/sshd -
2661 Ss xinetd -stayaliv -
2679 Ss sendmail: accept -
2689 Ss sendmail: Queue pause
2736 Ss gpm -m /dev/inpu -
2883 Ss crond -
2904 Ss xfs -droppriv -d -
2921 Ss /usr/sbin/atd -
2930 Ss dbus-daemon-1 -- -
2941 Ss hald -
2956 S<sl modclusterd -
3051 Ss /usr/sbin/oddjob -
3087 Ss /usr/sbin/saslau fcntl_setlk
3088 S /usr/sbin/saslau -
3089 S /usr/sbin/saslau fcntl_setlk
3090 S /usr/sbin/saslau fcntl_setlk
3091 S /usr/sbin/saslau fcntl_setlk
3100 S<s ricci -u 101 -
3105 Ss login -- root wait
3106 Ss+ /sbin/mingetty t -
3107 Ss+ /sbin/mingetty t -
3108 Ss+ /sbin/mingetty t -
3109 Ss+ /sbin/mingetty t -
3110 Ss+ /sbin/mingetty t -
3111 Ss+ /sbin/mingetty t -
4744 Ss sshd: root@notty -
4801 Ssl ccsd -
4854 S [cman_comms] cluster_kthread
4855 S [cman_memb] membership_kthread
4856 S< [cman_serviced] serviced
4863 S [cman_hbeat] hello_kthread
4878 Ss -bash wait
4914 S+ tail -f /var/log -
4934 Ss fenced -t 120 -w rt_sigsuspend
6402 Ss sshd: root@pts/1 -
6404 Ss -bash wait
6592 S [scsi_eh_1] 16045567552327254017
7192 Ssl clvmd -T20 -t 90 -
7193 S< [dlm_astd] dlm_astd
7194 S< [dlm_recvd] dlm_recvd
7195 S< [dlm_sendd] dlm_sendd
7196 S< [dlm_recoverd] dlm_recoverd
7274 S [cluster_log_ser -
7284 S< [kcopyd] worker_thread
7286 S<Lsl [dmeventd] -
7311 S< [kmirrord] worker_thread
7364 S< [kmirrord] worker_thread
7520 S< [dlm_recoverd] dlm_recoverd
7521 S< [lock_dlm1] dlm_async
7522 S< [lock_dlm2] dlm_async
7523 S [gfs_scand] -
7524 S [gfs_glockd] gfs_glockd
7525 S [gfs_recoverd] -
7526 S [gfs_logd] -
7527 S [gfs_quotad] -
7528 S [gfs_inoded] -
7532 S< [dlm_recoverd] dlm_recoverd
7538 S< [lock_dlm1] dlm_async
7539 S< [lock_dlm2] dlm_async
7540 S [gfs_scand] -
7541 S [gfs_glockd] gfs_glockd
7551 S [gfs_recoverd] -
7552 S [gfs_logd] -
7553 S [gfs_quotad] -
7554 S [gfs_inoded] -
7558 S< [dlm_recoverd] dlm_recoverd
7559 S< [lock_dlm1] dlm_async
7560 S< [lock_dlm2] dlm_async
7561 S [gfs_scand] -
7562 S [gfs_glockd] gfs_glockd
7572 S [gfs_recoverd] -
7573 S [gfs_logd] -
7574 S [gfs_quotad] -
7575 S [gfs_inoded] -
7632 S xiogen -f buffer pipe_wait
7633 S xdoio -vD pipe_wait
7634 S xiogen -f buffer pipe_wait
7635 S xdoio -vD pipe_wait
7636 S xiogen -f buffer pipe_wait
7637 S xdoio -vD pipe_wait
7642 R xdoio -vD -
7643 R xdoio -vD -
7644 D xdoio -vD glock_wait_internal
7708 S< [kmirrord] worker_thread
10021 R+ ps ax -o pid,sta -
[root@link-08 ~]# ps ax -o pid,stat,cmd,wchan
PID STAT CMD WCHAN
1 S init [3] -
2 S [migration/0] migration_thread
3 SN [ksoftirqd/0] ksoftirqd
4 S [migration/1] migration_thread
5 SN [ksoftirqd/1] ksoftirqd
6 S< [events/0] worker_thread
7 S< [events/1] worker_thread
8 S< [khelper] worker_thread
9 S< [kacpid] worker_thread
38 S< [kblockd/0] worker_thread
39 S< [kblockd/1] worker_thread
40 S [khubd] hub_thread
63 S [pdflush] pdflush
64 D [pdflush] wait_on_buffer
65 S [kswapd1] kswapd
66 S [kswapd0] kswapd
67 S< [aio/0] worker_thread
68 S< [aio/1] worker_thread
212 S [kseriod] serio_thread
447 S [scsi_eh_0] 16045567552327254017
449 S [scsi_eh_1] 16045567552327254017
466 S [scsi_eh_2] 16045567552327254017
467 S< [qla2300_2_dpc] 16045567552327254017
527 S [kjournald] kjournald
2036 S<s udevd -
2145 S< [kedac] -
2272 S< [kauditd] kauditd_thread
2381 S [kjournald] kjournald
2975 Ss syslogd -m 0 -
2979 Ss klogd -x syslog
2992 Ss irqbalance -
3003 Ss portmap -
3022 Ss rpc.statd -
3050 Ss rpc.idmapd -
3129 S /usr/sbin/smartd -
3138 Ss /usr/sbin/acpid -
3147 Ss cupsd -
3210 Ss /usr/sbin/sshd -
3223 Ss xinetd -stayaliv -
3241 Ss sendmail: accept -
3250 Ss sendmail: Queue pause
3298 Ss gpm -m /dev/inpu -
3435 Ss crond -
3456 Ss xfs -droppriv -d -
3473 Ss /usr/sbin/atd -
3482 Ss dbus-daemon-1 -- -
3493 Ss hald -
3509 S<sl modclusterd -
3576 Ss /usr/sbin/oddjob -
3598 Ss /usr/sbin/saslau fcntl_setlk
3599 S /usr/sbin/saslau -
3600 S /usr/sbin/saslau fcntl_setlk
3601 S /usr/sbin/saslau fcntl_setlk
3602 S /usr/sbin/saslau fcntl_setlk
3624 S<s ricci -u 101 -
3629 Ss login -- root wait
3630 Ss+ /sbin/mingetty t -
3631 Ss+ /sbin/mingetty t -
3632 Ss+ /sbin/mingetty t -
3633 Ss+ /sbin/mingetty t -
3634 Ss+ /sbin/mingetty t -
3635 Ss+ /sbin/mingetty t -
4624 Ss -bash wait
5492 Ss /sbin/dhclient - -
5552 Ssl ccsd -
5643 S [cman_comms] cluster_kthread
5644 S [cman_memb] membership_kthread
5645 S< [cman_serviced] serviced
5646 S [cman_hbeat] hello_kthread
5665 Ss fenced -t 120 -w rt_sigsuspend
5774 S<Lsl [dmeventd] -
5977 Ssl clvmd -T20 -t 90 -
5978 S< [dlm_astd] dlm_astd
5979 S< [dlm_recvd] dlm_recvd
5980 S< [dlm_sendd] dlm_sendd
5981 S< [dlm_recoverd] dlm_recoverd
6024 S [cluster_log_ser -
6069 S< [kcopyd] worker_thread
6089 S< [kmirrord] worker_thread
6260 Ss sshd: root@pts/0 -
6262 Ss -bash -
6333 S< [dlm_recoverd] dlm_recoverd
6339 S< [lock_dlm1] dlm_async
6340 S< [lock_dlm2] dlm_async
6341 S [gfs_scand] -
6342 S [gfs_glockd] gfs_glockd
6343 S [gfs_recoverd] -
6344 D [gfs_logd] -
6345 S [gfs_quotad] -
6346 S [gfs_inoded] -
6351 S< [dlm_recoverd] dlm_recoverd
6352 D< [lock_dlm1] kcl_leave_service
6353 S< [lock_dlm2] dlm_async
6354 S [gfs_scand] -
6355 S [gfs_glockd] gfs_glockd
6365 D [gfs_recoverd] glock_wait_internal
6366 S [gfs_logd] -
6367 S [gfs_quotad] -
6368 S [gfs_inoded] -
6431 Ss sshd: root@pts/1 -
6433 Ss -bash wait
6496 S< [kmirrord] worker_thread
6550 S< [dlm_recoverd] dlm_recoverd
6551 S< [lock_dlm1] dlm_async
6552 S< [lock_dlm2] dlm_async
6553 S [gfs_scand] -
6554 S [gfs_glockd] gfs_glockd
6579 S+ tail -f /var/log -
8746 S [gfs_recoverd] -
8877 S [gfs_logd] -
8878 S [gfs_quotad] -
8879 S [gfs_inoded] -
9376 S xiogen -f buffer pipe_wait
9377 S xdoio -vD pipe_wait
9378 S xiogen -f buffer pipe_wait
9379 S xdoio -vD pipe_wait
9381 S+ xiogen -f buffer pipe_wait
9382 S+ xdoio -vD pipe_wait
9385 D+ xdoio -vD glock_wait_internal
9386 R xdoio -vD -
9387 D xdoio -vD -
9543 S< [kmirrord] worker_thread
11855 R+ ps ax -o pid,sta -
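The stuck groups in the `cman_tool services` output above can be spotted mechanically: every healthy group reports State "run", so any other State value flags the hung recovery. A minimal sketch, assuming the output has been reflowed to one record per line; the file path and sample rows below are illustrative, not verbatim tool output.

```shell
# Save a one-record-per-line copy of `cman_tool services` output
# (illustrative sample; real output has one such record per service group).
cat > /tmp/services.txt <<'EOF'
Fence Domain:    "default"          5    2  run        -
DLM Lock Space:  "clvmd"           74   13  run        -
GFS Mount Group: "3"               87   20  recover 4  -
EOF

# Any record whose State column is not "run" is a group stuck in recovery.
grep -v ' run ' /tmp/services.txt
```

Run across all nodes, this picks out the same group everywhere (here GFS Mount Group "3"), which is what distinguishes a cluster-wide stuck recovery from a single slow node.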
Looks like this may be the problem, on link-08:
Jun 6 16:31:21 link-08 lvm[5774]: Completed: vgreduce --config
devices{ignore_suspended_devices=1} --removemissing corey3
Jun 6 16:31:21 link-08 lvm[5774]: corey3-mirror3 is now in-sync
Jun 6 16:31:22 link-08 kernel: GFS: fsid=LINK_128:1.1: jid=2: Replaying journal...
Jun 6 16:31:22 link-08 lvm[5774]: No longer monitoring mirror device
corey3-mirror3 for events
Jun 6 16:31:23 link-08 kernel: GFS: fsid=LINK_128:1.1: jid=2: Replayed 2 of 2
blocks
Jun 6 16:31:23 link-08 kernel: GFS: fsid=LINK_128:1.1: jid=2: replays = 2,
skips = 0, sames = 0
Jun 6 16:31:23 link-08 kernel: GFS: fsid=LINK_128:2.1: jid=2: Replaying journal...
Jun 6 16:31:23 link-08 kernel: GFS: fsid=LINK_128:2.1: jid=2: Replayed 2 of 2
blocks
Jun 6 16:31:23 link-08 kernel: GFS: fsid=LINK_128:2.1: jid=2: replays = 2,
skips = 0, sames = 0
Jun 6 16:31:23 link-08 kernel: GFS: fsid=LINK_128:3.1: fatal: filesystem
consistency error
Jun 6 16:31:23 link-08 kernel: GFS: fsid=LINK_128:3.1: function =
trans_go_xmote_bh
Jun 6 16:31:23 link-08 kernel: GFS: fsid=LINK_128:3.1: file =
/builddir/build/BUILD/gfs-kernel-2.6.9-72/largesmp/src/gfs/glops.c, line = 542
Jun 6 16:31:23 link-08 kernel: GFS: fsid=LINK_128:3.1: time = 1181165483
Jun 6 16:31:23 link-08 kernel: GFS: fsid=LINK_128:3.1: about to withdraw from
the cluster
Jun 6 16:31:23 link-08 kernel: GFS: fsid=LINK_128:3.1: waiting for outstanding I/O
Jun 6 16:31:23 link-08 kernel: GFS: fsid=LINK_128:3.1: telling LM to withdraw
Jun 6 16:31:24 link-08 kernel: GFS: fsid=LINK_128:1.1: jid=2: Journal replayed
in 4s
Jun 6 16:31:24 link-08 kernel: GFS: fsid=LINK_128:1.1: jid=2: Done
Jun 6 16:31:24 link-08 kernel: GFS: fsid=LINK_128:2.1: jid=2: Journal replayed
in 4s
Jun 6 16:31:24 link-08 kernel: GFS: fsid=LINK_128:2.1: jid=2: Done
Jun 6 16:32:34 link-08 kernel: cdrom: open failed.
Jun 6 16:33:35 link-08 kernel: cdrom: open failed.
Reproduced this issue without the "function = trans_go_xmote_bh" assertion.

[root@link-02 ~]# cman_tool nodes
Node  Votes Exp Sts  Name
   1    1    4   M   link-02
   2    1    4   X   link-08
   3    1    4   M   link-04
   4    1    4   M   link-07

[root@link-02 ~]# cman_tool services
Service          Name             GID  LID  State      Code
Fence Domain:    "default"          5    2  run        -    [1 4 3]
DLM Lock Space:  "clvmd"           58   11  run        -    [1 3 4]
DLM Lock Space:  "clustered_log"   59   12  run        -    [1 3 4]
DLM Lock Space:  "1"               61   13  run        -    [1 3 4]
DLM Lock Space:  "2"               65   15  run        -    [1 3 4]
DLM Lock Space:  "3"               69   17  run        -    [1 3 4]
GFS Mount Group: "1"               63   14  run        -    [1 3 4]
GFS Mount Group: "2"               67   16  run        -    [1 3 4]
GFS Mount Group: "3"               71   18  recover 4  -    [1 3 4]

[root@link-02 ~]# dmsetup info
Name:              corey1-mirror1_mimage_1
State:             ACTIVE
Tables present:    LIVE
Open count:        1
Event number:      0
Major, minor:      253, 3
Number of targets: 1
UUID: LVM-0QLi8UKjRguEwy4KI1k9KriBg4lyb2j11Q90OmNWZQ6Pox3i0Ng53hJV0Tw6oKpc

Name:              corey1-mirror1
State:             ACTIVE
Tables present:    LIVE
Open count:        1
Event number:      1
Major, minor:      253, 5
Number of targets: 1
UUID: LVM-0QLi8UKjRguEwy4KI1k9KriBg4lyb2j1TYbQrVgkVw59UKMAeI2N6MQ8XJohDhmq

Name:              corey1-mirror1_mimage_0
State:             ACTIVE
Tables present:    LIVE
Open count:        1
Event number:      0
Major, minor:      253, 2
Number of targets: 1
UUID: LVM-0QLi8UKjRguEwy4KI1k9KriBg4lyb2j1vvItnjlKtYjveNxMlJPa2LEWsyskBGD5

Name:              corey2-mirror2_mimage_1
State:             ACTIVE
Tables present:    LIVE
Open count:        1
Event number:      0
Major, minor:      253, 7
Number of targets: 1
UUID: LVM-caAVDd2dPhYq3rRmtemqUvgAt90oxO1vVcr1q3EQly4f8qd6A66B9dlTNkRh0V0T

Name:              corey2-mirror2_mimage_0
State:             ACTIVE
Tables present:    LIVE
Open count:        1
Event number:      0
Major, minor:      253, 6
Number of targets: 1
UUID: LVM-caAVDd2dPhYq3rRmtemqUvgAt90oxO1v2Ap1nemAZlcYodVA8c5oX6J5GYd8zuEs

Name:              corey2-mirror2
State:             ACTIVE
Tables present:    LIVE
Open count:        1
Event number:      1
Major, minor:      253, 8
Number of targets: 1
UUID: LVM-caAVDd2dPhYq3rRmtemqUvgAt90oxO1vc0yz9j3KbEPcj4crAfUQJ9Y0s97ihs1t

Name:              corey3-mirror3_mimage_2
State:             ACTIVE
Tables present:    LIVE
Open count:        2
Event number:      0
Major, minor:      253, 11
Number of targets: 1
UUID: LVM-vCWvB54l0xEiiQuUH1anc8UJwd1KeH2RAK1EBHNcPALYNfaYSHr2bY6AyKwHHnza

Name:              corey3-mirror3_mimage_1
State:             ACTIVE
Tables present:    LIVE
Open count:        2
Event number:      0
Major, minor:      253, 10
Number of targets: 1
UUID: LVM-vCWvB54l0xEiiQuUH1anc8UJwd1KeH2RT3l70pL9L2NATt0FXghyhR94GkwJo8W0

Name:              corey3-mirror3_mimage_0
State:             ACTIVE
Tables present:    LIVE
Open count:        1
Event number:      0
Major, minor:      253, 9
Number of targets: 1
UUID: LVM-vCWvB54l0xEiiQuUH1anc8UJwd1KeH2RmuLBJPJc9DB6v2gRUV9rU4gUY11kgrHQ

Name:              VolGroup00-LogVol01
State:             ACTIVE
Tables present:    LIVE
Open count:        1
Event number:      0
Major, minor:      253, 1
Number of targets: 1
UUID: LVM-dq1liKVsB8CzZiuNRtOF1tYTkXqgKO85b28r2eyrPqurvpuPPS83R8fcaG7QULHG

Name:              corey3-mirror3
State:             ACTIVE
Tables present:    LIVE & INACTIVE
Open count:        1
Event number:      5
Major, minor:      253, 12
Number of targets: 1
UUID: LVM-vCWvB54l0xEiiQuUH1anc8UJwd1KeH2RCC3uqHwUhyadUOOqFLGNuGfqmWpgP3OK

Name:              VolGroup00-LogVol00
State:             ACTIVE
Tables present:    LIVE
Open count:        1
Event number:      0
Major, minor:      253, 0
Number of targets: 1
UUID: LVM-dq1liKVsB8CzZiuNRtOF1tYTkXqgKO85W9BzgAV2x8i4cT3pKw0TArv1HJm54Tu6

Name:              corey1-mirror1_mimage_2
State:             ACTIVE
Tables present:    LIVE
Open count:        1
Event number:      0
Major, minor:      253, 4
Number of targets: 1
UUID: LVM-0QLi8UKjRguEwy4KI1k9KriBg4lyb2j1kWTC4gefpSn1T4NFkhOoSw4DoWdYP1iv

This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
Description of problem:
After killing one of the legs in a 3 leg corelog mirror, as well as one of the nodes in the cluster (link-07), all gfs I/O that I had going to that one cmirror stopped, and all future I/O got the standard "`/mnt/mirror3': Input/output error". The down conversion appeared to work fine (although it was suspiciously instantaneous) and clvmd is not deadlocked. The only issue is that one gfs filesystem, which appears to be stuck in dlm recovery.

# BEFORE THE FAILURE
[root@link-08 ~]# lvs -a -o +devices
  LV                 VG         Attr   LSize  Origin Snap% Move Log          Copy%  Devices
  LogVol00           VolGroup00 -wi-ao 72.44G                                       /dev/hda2(0)
  LogVol01           VolGroup00 -wi-ao  1.94G                                       /dev/hda2(2318)
  mirror1            corey1     Mwi-ao 10.00G                   mirror1_mlog 100.00 mirror1_mimage_0(0),mirror1_mimage_1(0)
  [mirror1_mimage_0] corey1     iwi-ao 10.00G                                       /dev/sda1(0)
  [mirror1_mimage_1] corey1     iwi-ao 10.00G                                       /dev/sdb1(0)
  [mirror1_mlog]     corey1     lwi-ao  4.00M                                       /dev/sdb2(0)
  mirror2            corey2     Mwi-ao 10.00G                                100.00 mirror2_mimage_0(0),mirror2_mimage_1(0)
  [mirror2_mimage_0] corey2     iwi-ao 10.00G                                       /dev/sdd1(0)
  [mirror2_mimage_1] corey2     iwi-ao 10.00G                                       /dev/sdc1(0)
  mirror3            corey3     Mwi-ao 10.00G                                100.00 mirror3_mimage_0(0),mirror3_mimage_1(0),mirror3_mimage_2(0)
  [mirror3_mimage_0] corey3     iwi-ao 10.00G                                       /dev/sde1(0)
  [mirror3_mimage_1] corey3     iwi-ao 10.00G                                       /dev/sdf1(0)
  [mirror3_mimage_2] corey3     iwi-ao 10.00G                                       /dev/sdg1(0)

# ON ALL NODES
[root@link-08 ~]# echo offline > /sys/block/sde/device/state

# I KILLED LINK-07

[root@link-08 ~]# lvs -a -o +devices
/dev/sde1: open failed: No such device or address
  LV                 VG         Attr   LSize  Origin Snap% Move Log          Copy%  Devices
  LogVol00           VolGroup00 -wi-ao 72.44G                                       /dev/hda2(0)
  LogVol01           VolGroup00 -wi-ao  1.94G                                       /dev/hda2(2318)
  mirror1            corey1     Mwi-ao 10.00G                   mirror1_mlog 100.00 mirror1_mimage_0(0),mirror1_mimage_1(0)
  [mirror1_mimage_0] corey1     iwi-ao 10.00G                                       /dev/sda1(0)
  [mirror1_mimage_1] corey1     iwi-ao 10.00G                                       /dev/sdb1(0)
  [mirror1_mlog]     corey1     lwi-ao  4.00M                                       /dev/sdb2(0)
  mirror2            corey2     Mwi-ao 10.00G                                100.00 mirror2_mimage_0(0),mirror2_mimage_1(0)
  [mirror2_mimage_0] corey2     iwi-ao 10.00G                                       /dev/sdd1(0)
  [mirror2_mimage_1] corey2     iwi-ao 10.00G                                       /dev/sdc1(0)
  mirror3            corey3     Mwi-ao 10.00G                                100.00 mirror3_mimage_2(0),mirror3_mimage_1(0)
  [mirror3_mimage_1] corey3     iwi-ao 10.00G                                       /dev/sdf1(0)
  [mirror3_mimage_2] corey3     iwi-ao 10.00G                                       /dev/sdg1(0)

[root@link-08 ~]# touch /mnt/mirror3/foo
touch: cannot touch `/mnt/mirror3/foo': Input/output error

[root@link-08 ~]# df -h
Filesystem                       Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00   72G  2.3G   66G   4% /
/dev/hda1                         99M   19M   75M  21% /boot
none                             500M     0  500M   0% /dev/shm
/dev/mapper/corey1-mirror1       9.5G  3.9M  9.5G   1% /mnt/mirror1
df: `/mnt/mirror3': Input/output error
/dev/mapper/corey2-mirror2       9.5G  3.9M  9.5G   1% /mnt/mirror2

[root@link-08 ~]# cman_tool nodes
Node  Votes Exp Sts  Name
   1    1    4   M   link-08
   2    1    4   M   link-02
   3    1    4   X   link-07
   4    1    4   M   link-04

[root@link-08 ~]# cman_tool services
Service          Name             GID  LID  State      Code
Fence Domain:    "default"          5    2  run        -           [2 1 4]
DLM Lock Space:  "clvmd"           74    4  run        -           [2 1 4]
DLM Lock Space:  "clustered_log"   75    5  run        -           [2 1 4]
DLM Lock Space:  "1"               77    6  run        -           [2 1 4]
DLM Lock Space:  "3"               85    8  run        S-10,200,0  [2 1 4]
DLM Lock Space:  "2"               81   10  run        -           [2 1 4]
GFS Mount Group: "1"               79    7  run        -           [2 1 4]
GFS Mount Group: "3"               87    9  recover 2  -           [2 1 4]
GFS Mount Group: "2"               83   11  run        -           [2 1 4]

[root@link-08 ~]# dmsetup status
corey1-mirror1_mimage_1: 0 20971520 linear
corey1-mirror1: 0 20971520 mirror 2 253:7 253:8 20480/20480 1 AA 3 clustered_disk 253:6 A
corey1-mirror1_mimage_0: 0 20971520 linear
corey2-mirror2_mimage_1: 0 20971520 linear
corey2-mirror2_mimage_0: 0 20971520 linear
corey2-mirror2: 0 20971520 mirror 2 253:10 253:11 20480/20480 1 AA 1 clustered_core
corey3-mirror3_mimage_2: 0 20971520 linear
corey3-mirror3_mimage_1: 0 20971520 linear
VolGroup00-LogVol01: 0 4063232 linear
corey3-mirror3: 0 20971520 mirror 2 253:4 253:3 20480/20480 1 AA 1 clustered_core
VolGroup00-LogVol00: 0 151912448 linear
corey1-mirror1_mlog: 0 8192 linear

[root@link-08 ~]# dmsetup info
Name:              corey1-mirror1_mimage_1
State:             ACTIVE
Tables present:    LIVE
Open count:        1
Event number:      0
Major, minor:      253, 8
Number of targets: 1
UUID: LVM-0QLi8UKjRguEwy4KI1k9KriBg4lyb2j1V9V23NFX7ZxmKdVvCMdSh2B0Q5uPTSKB

Name:              corey1-mirror1
State:             ACTIVE
Tables present:    LIVE
Open count:        1
Event number:      1
Major, minor:      253, 9
Number of targets: 1
UUID: LVM-0QLi8UKjRguEwy4KI1k9KriBg4lyb2j1fydUC356OtfUZVfh74E4yWHGepLEAQu5

Name:              corey1-mirror1_mimage_0
State:             ACTIVE
Tables present:    LIVE
Open count:        1
Event number:      0
Major, minor:      253, 7
Number of targets: 1
UUID: LVM-0QLi8UKjRguEwy4KI1k9KriBg4lyb2j1YjGdtpLiSDkmhud2mVjyNgZveaSecV87

Name:              corey2-mirror2_mimage_1
State:             ACTIVE
Tables present:    LIVE
Open count:        1
Event number:      0
Major, minor:      253, 11
Number of targets: 1
UUID: LVM-caAVDd2dPhYq3rRmtemqUvgAt90oxO1vpnsQHYANmpEyo0yXJQwqe62JRBXRDtT9

Name:              corey2-mirror2_mimage_0
State:             ACTIVE
Tables present:    LIVE
Open count:        1
Event number:      0
Major, minor:      253, 10
Number of targets: 1
UUID: LVM-caAVDd2dPhYq3rRmtemqUvgAt90oxO1vczGS49XAjOu44xh1tr4PQr7WiR5ErZl3

Name:              corey2-mirror2
State:             ACTIVE
Tables present:    LIVE
Open count:        1
Event number:      1
Major, minor:      253, 12
Number of targets: 1
UUID: LVM-caAVDd2dPhYq3rRmtemqUvgAt90oxO1vC3HFk9CLB9ELCxhSZBOf1lkkE11Rdz25

Name:              corey3-mirror3_mimage_2
State:             ACTIVE
Tables present:    LIVE
Open count:        1
Event number:      0
Major, minor:      253, 4
Number of targets: 1
UUID: LVM-vCWvB54l0xEiiQuUH1anc8UJwd1KeH2R2yZvBBJDzam6k5CuJbD1uAfD3FTZvNwi

Name:              corey3-mirror3_mimage_1
State:             ACTIVE
Tables present:    LIVE
Open count:        1
Event number:      0
Major, minor:      253, 3
Number of targets: 1
UUID: LVM-vCWvB54l0xEiiQuUH1anc8UJwd1KeH2RJMiGbaBAD0GCnWpSCxIZ7ot8if1bjIBE

Name:              VolGroup00-LogVol01
State:             ACTIVE
Tables present:    LIVE
Open count:        1
Event number:      0
Major, minor:      253, 1
Number of targets: 1
UUID: LVM-TCx5xJ7FuRhXzJ4g7CvPsw2AhhFBNLQUvNlDc7SvClgdBMh2WD6TraPFgzjSVMRp

Name:              corey3-mirror3
State:             ACTIVE
Tables present:    LIVE
Open count:        1
Event number:      7
Major, minor:      253, 5
Number of targets: 1
UUID: LVM-vCWvB54l0xEiiQuUH1anc8UJwd1KeH2RrVxFHPheVHLeYGToOyHqciYLB1QRho7o

Name:              VolGroup00-LogVol00
State:             ACTIVE
Tables present:    LIVE
Open count:        1
Event number:      0
Major, minor:      253, 0
Number of targets: 1
UUID: LVM-TCx5xJ7FuRhXzJ4g7CvPsw2AhhFBNLQUrfFjpdEgWnBeJx2UpyC5Mr6XpgXdHWCh

Name:              corey1-mirror1_mlog
State:             ACTIVE
Tables present:    LIVE
Open count:        1
Event number:      0
Major, minor:      253, 6
Number of targets: 1
UUID: LVM-0QLi8UKjRguEwy4KI1k9KriBg4lyb2j1hCJh1EVAOxhYcsIkh5Fwap1yOkxFeidY

Version-Release number of selected component (if applicable):
2.6.9-55.ELlargesmp
dlm-kernel-2.6.9-46.16
cmirror-kernel-2.6.9-32.0
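The failure injection above is a single sysfs write per node. A stand-in sketch of that sequence, assuming a scratch file in place of the real /sys/block/sde/device/state node (the file path here is illustrative; on a live node the same echo against the sysfs path is what offlines the leg):

```shell
# Stand-in for the leg-failure injection: in the report the write goes to
# /sys/block/sde/device/state on every node. STATE_FILE is a scratch file
# used here so the sequence can be shown without offlining real hardware.
STATE_FILE=/tmp/sde_state_demo
echo running > "$STATE_FILE"   # the device's initial sysfs state
echo offline > "$STATE_FILE"   # the same write the report issues via sysfs
cat "$STATE_FILE"              # → offline
```

Once the leg reads offline, subsequent opens of the partition fail (the "/dev/sde1: open failed: No such device or address" seen in the second lvs run), which is what forces the mirror down-conversion.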