Bug 503941

Summary: After a node is fenced, got messages about unlink ckpt error
Product: Red Hat Enterprise Linux 5
Reporter: Flávio do Carmo Júnior <flaviocj>
Component: cman
Assignee: David Teigland <teigland>
Status: CLOSED WONTFIX
QA Contact: Cluster QE <mspqa-list>
Severity: medium
Priority: low
Version: 5.3
CC: cluster-maint, flaviocj, pdemauro, rpeterso, sdake, teigland
Target Milestone: ---
Target Release: ---
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Last Closed: 2010-04-16 20:27:23 UTC

Description Flávio do Carmo Júnior 2009-06-03 13:42:09 UTC
Description of problem:

Whenever one node gets fenced, for whatever reason, I see gfs_controld messages reporting "unlink ckpt error".

See /var/log/messages below:

Jun  3 09:33:24 aramis fenced[6819]: fence "porthos-priv" success
Jun  3 09:33:24 aramis gfs_controld[6831]: unlink ckpt error 12 ctdb
Jun  3 09:33:24 aramis gfs_controld[6831]: unlink ckpt status error 9 ctdb
Jun  3 09:33:24 aramis gfs_controld[6831]: unlink ckpt error 12 arquivodigital
Jun  3 09:33:24 aramis gfs_controld[6831]: unlink ckpt status error 9 arquivodigital
Jun  3 09:33:24 aramis gfs_controld[6831]: unlink ckpt error 12 geral
Jun  3 09:33:24 aramis gfs_controld[6831]: unlink ckpt status error 9 geral
Jun  3 09:33:24 aramis gfs_controld[6831]: unlink ckpt error 12 imagens
Jun  3 09:33:24 aramis gfs_controld[6831]: unlink ckpt status error 9 imagens
Jun  3 09:33:24 aramis gfs_controld[6831]: unlink ckpt error 12 plotagem
Jun  3 09:33:24 aramis gfs_controld[6831]: unlink ckpt status error 9 plotagem
Jun  3 09:33:24 aramis kernel: GFS2: fsid=MUSKETEER:geral.1: jid=0: Trying to acquire journal lock...
Jun  3 09:33:25 aramis gfs_controld[6831]: unlink ckpt error 12 projetoslv
Jun  3 09:33:25 aramis kernel: GFS2: fsid=MUSKETEER:imagens.1: jid=0: Trying to acquire journal lock...
Jun  3 09:33:25 aramis gfs_controld[6831]: unlink ckpt status error 9 projetoslv
Jun  3 09:33:25 aramis kernel: GFS2: fsid=MUSKETEER:arquivodigital.1: jid=0: Trying to acquire journal lock...
Jun  3 09:33:25 aramis gfs_controld[6831]: unlink ckpt error 12 projetosfechados
Jun  3 09:33:25 aramis kernel: GFS2: fsid=MUSKETEER:ctdb.1: jid=0: Trying to acquire journal lock...
Jun  3 09:33:25 aramis gfs_controld[6831]: unlink ckpt status error 9 projetosfechados
Jun  3 09:33:25 aramis kernel: GFS2: fsid=MUSKETEER:geral.1: jid=0: Looking at journal...
Jun  3 09:33:25 aramis gfs_controld[6831]: unlink ckpt error 12 scripts
Jun  3 09:33:25 aramis kernel: GFS2: fsid=MUSKETEER:imagens.1: jid=0: Looking at journal...
Jun  3 09:33:25 aramis gfs_controld[6831]: unlink ckpt status error 9 scripts
Jun  3 09:33:25 aramis kernel: GFS2: fsid=MUSKETEER:plotagem.1: jid=0: Trying to acquire journal lock...
Jun  3 09:33:25 aramis gfs_controld[6831]: unlink ckpt error 12 sharedadm
Jun  3 09:33:25 aramis kernel: GFS2: fsid=MUSKETEER:arquivodigital.1: jid=0: Looking at journal...
Jun  3 09:33:25 aramis gfs_controld[6831]: unlink ckpt status error 9 sharedadm
Jun  3 09:33:25 aramis kernel: GFS2: fsid=MUSKETEER:ctdb.1: jid=0: Looking at journal...
Jun  3 09:33:25 aramis gfs_controld[6831]: unlink ckpt error 12 sharedprod
Jun  3 09:33:25 aramis kernel: GFS2: fsid=MUSKETEER:plotagem.1: jid=0: Looking at journal...
Jun  3 09:33:25 aramis gfs_controld[6831]: unlink ckpt status error 9 sharedprod
Jun  3 09:33:25 aramis kernel: GFS2: fsid=MUSKETEER:projetoslv.1: jid=0: Trying to acquire journal lock...
Jun  3 09:33:25 aramis gfs_controld[6831]: unlink ckpt error 12 util
Jun  3 09:33:25 aramis kernel: GFS2: fsid=MUSKETEER:projetoslv.1: jid=0: Looking at journal...
Jun  3 09:33:25 aramis gfs_controld[6831]: unlink ckpt status error 9 util
Jun  3 09:33:25 aramis kernel: GFS2: fsid=MUSKETEER:projetosfechados.1: jid=0: Trying to acquire journal lock...
Jun  3 09:33:25 aramis kernel: GFS2: fsid=MUSKETEER:geral.1: jid=0: Acquiring the transaction lock...
Jun  3 09:33:25 aramis kernel: GFS2: fsid=MUSKETEER:geral.1: jid=0: Replaying journal...
Jun  3 09:33:25 aramis kernel: GFS2: fsid=MUSKETEER:geral.1: jid=0: Replayed 0 of 0 blocks
Jun  3 09:33:25 aramis kernel: GFS2: fsid=MUSKETEER:geral.1: jid=0: Found 0 revoke tags
Jun  3 09:33:25 aramis kernel: GFS2: fsid=MUSKETEER:geral.1: jid=0: Journal replayed in 1s
Jun  3 09:33:25 aramis kernel: GFS2: fsid=MUSKETEER:geral.1: jid=0: Done
Jun  3 09:33:25 aramis kernel: GFS2: fsid=MUSKETEER:scripts.1: jid=0: Trying to acquire journal lock...
Jun  3 09:33:25 aramis kernel: GFS2: fsid=MUSKETEER:scripts.1: jid=0: Looking at journal...
Jun  3 09:33:25 aramis kernel: GFS2: fsid=MUSKETEER:sharedadm.1: jid=0: Trying to acquire journal lock...
Jun  3 09:33:25 aramis kernel: GFS2: fsid=MUSKETEER:imagens.1: jid=0: Done
Jun  3 09:33:25 aramis kernel: GFS2: fsid=MUSKETEER:ctdb.1: jid=0: Acquiring the transaction lock...
Jun  3 09:33:25 aramis kernel: GFS2: fsid=MUSKETEER:ctdb.1: jid=0: Replaying journal...
Jun  3 09:33:25 aramis kernel: GFS2: fsid=MUSKETEER:ctdb.1: jid=0: Replayed 1 of 1 blocks
Jun  3 09:33:25 aramis kernel: GFS2: fsid=MUSKETEER:ctdb.1: jid=0: Found 0 revoke tags
Jun  3 09:33:25 aramis kernel: GFS2: fsid=MUSKETEER:ctdb.1: jid=0: Journal replayed in 1s
Jun  3 09:33:25 aramis kernel: GFS2: fsid=MUSKETEER:ctdb.1: jid=0: Done
Jun  3 09:33:25 aramis kernel: GFS2: fsid=MUSKETEER:sharedprod.1: jid=0: Trying to acquire journal lock...
Jun  3 09:33:25 aramis kernel: GFS2: fsid=MUSKETEER:util.1: jid=0: Trying to acquire journal lock...
Jun  3 09:33:25 aramis kernel: GFS2: fsid=MUSKETEER:util.1: jid=0: Looking at journal...
Jun  3 09:33:25 aramis kernel: GFS2: fsid=MUSKETEER:plotagem.1: jid=0: Done
Jun  3 09:33:25 aramis kernel: GFS2: fsid=MUSKETEER:arquivodigital.1: jid=0: Done
Jun  3 09:33:25 aramis kernel: GFS2: fsid=MUSKETEER:scripts.1: jid=0: Done
Jun  3 09:33:25 aramis kernel: GFS2: fsid=MUSKETEER:projetoslv.1: jid=0: Done
Jun  3 09:33:25 aramis kernel: GFS2: fsid=MUSKETEER:util.1: jid=0: Done
Jun  3 09:33:32 aramis kernel: GFS2: fsid=MUSKETEER:sharedprod.1: jid=0: Looking at journal...
Jun  3 09:33:32 aramis kernel: GFS2: fsid=MUSKETEER:sharedadm.1: jid=0: Looking at journal...
Jun  3 09:33:32 aramis kernel: GFS2: fsid=MUSKETEER:projetosfechados.1: jid=0: Looking at journal...
Jun  3 09:33:33 aramis kernel: GFS2: fsid=MUSKETEER:sharedprod.1: jid=0: Acquiring the transaction lock...
Jun  3 09:33:33 aramis kernel: GFS2: fsid=MUSKETEER:sharedprod.1: jid=0: Replaying journal...
Jun  3 09:33:33 aramis kernel: GFS2: fsid=MUSKETEER:sharedprod.1: jid=0: Replayed 13 of 19 blocks
Jun  3 09:33:33 aramis kernel: GFS2: fsid=MUSKETEER:sharedprod.1: jid=0: Found 4 revoke tags
Jun  3 09:33:33 aramis kernel: GFS2: fsid=MUSKETEER:sharedprod.1: jid=0: Journal replayed in 1s
Jun  3 09:33:33 aramis kernel: GFS2: fsid=MUSKETEER:sharedprod.1: jid=0: Done
Jun  3 09:33:33 aramis kernel: GFS2: fsid=MUSKETEER:sharedadm.1: jid=0: Acquiring the transaction lock...
Jun  3 09:33:33 aramis kernel: GFS2: fsid=MUSKETEER:sharedadm.1: jid=0: Replaying journal...
Jun  3 09:33:33 aramis kernel: GFS2: fsid=MUSKETEER:sharedadm.1: jid=0: Replayed 0 of 0 blocks
Jun  3 09:33:33 aramis kernel: GFS2: fsid=MUSKETEER:sharedadm.1: jid=0: Found 0 revoke tags
Jun  3 09:33:33 aramis kernel: GFS2: fsid=MUSKETEER:sharedadm.1: jid=0: Journal replayed in 1s
Jun  3 09:33:33 aramis kernel: GFS2: fsid=MUSKETEER:sharedadm.1: jid=0: Done
Jun  3 09:33:33 aramis kernel: GFS2: fsid=MUSKETEER:projetosfechados.1: jid=0: Done


This doesn't really seem to be a problem: after these messages the filesystems mount and, from a user's point of view, work normally.
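
To make the pattern in the log easier to see, the gfs_controld lines can be grouped per filesystem. The following is only a rough sketch: the regular expression is inferred from the message format shown above, and the function name `summarize_unlink_errors` is hypothetical, not part of any cluster tool.

```python
import re
from collections import defaultdict

# Pattern inferred from the log excerpt above (an assumption, not an
# official format). It matches lines such as:
#   ... gfs_controld[6831]: unlink ckpt error 12 ctdb
#   ... gfs_controld[6831]: unlink ckpt status error 9 ctdb
PATTERN = re.compile(
    r"gfs_controld\[\d+\]: unlink ckpt (status )?error (\d+) (\S+)"
)


def summarize_unlink_errors(lines):
    """Return {filesystem: [(kind, code), ...]} for unlink ckpt messages."""
    summary = defaultdict(list)
    for line in lines:
        match = PATTERN.search(line)
        if match:
            kind = "status" if match.group(1) else "unlink"
            summary[match.group(3)].append((kind, int(match.group(2))))
    return dict(summary)


# A few lines copied from the excerpt above.
sample = [
    'Jun  3 09:33:24 aramis fenced[6819]: fence "porthos-priv" success',
    "Jun  3 09:33:24 aramis gfs_controld[6831]: unlink ckpt error 12 ctdb",
    "Jun  3 09:33:24 aramis gfs_controld[6831]: unlink ckpt status error 9 ctdb",
    "Jun  3 09:33:24 aramis kernel: GFS2: fsid=MUSKETEER:geral.1: jid=0: "
    "Trying to acquire journal lock...",
]

for fs, errors in summarize_unlink_errors(sample).items():
    print(fs, errors)
```

Run against the full excerpt, this should show the same pair (error 12 on unlink, error 9 on the status call) for every mounted filesystem, which suggests one common cause rather than a per-filesystem fault.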

Version-Release number of selected component (if applicable):
[root@athos ~]# rpm -qa| grep -iE 'gfs2|cman|openais|clust|ipmi'
system-config-cluster-1.0.55-1.0
cluster-snmp-0.12.1-2.el5
gfs2-utils-0.1.53-1.el5_3.3
cman-2.0.98-1.el5_3.1
Cluster_Administration-en-US-5.2-1
lvm2-cluster-2.02.40-7.el5
modcluster-0.12.1-2.el5
cluster-cim-0.12.1-2.el5
openais-0.80.3-22.el5_3.4
OpenIPMI-tools-2.0.6-11.el5
OpenIPMI-libs-2.0.6-11.el5
OpenIPMI-2.0.6-11.el5
[root@athos ~]# uname -a
Linux athos.intranet.prosul 2.6.18-128.1.10.el5 #1 SMP Wed Apr 29 13:53:08 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux
[root@athos ~]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 5.3 (Tikanga)

Additional info:
 I'm fencing via IPMILAN, and the GFS2 filesystems back a Samba+CTDB file server.

Comment 1 Robert Peterson 2009-06-03 14:29:22 UTC
Reassigning to Dave Teigland as per our discussion this morning.
I'm also changing the product to RHEL5, since this is clearly not
RHEL4.

Comment 2 Robert Peterson 2009-06-03 14:30:09 UTC
Adding Steve Dake to the cc list in case this is an openais issue.

Comment 3 David Teigland 2009-06-03 15:40:16 UTC
We've always seen ckpt unlink errors without knowing quite why they appear; they're generally not a problem.  In this case they seem likely to be a result of the node failure.  The critical parts of gfs_controld's checkpoint handling are creating and reading the checkpoints.
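
For reference, the numbers in the messages can be decoded, assuming gfs_controld prints the raw SaAisErrorT return value of the openais checkpoint calls (an assumption, not confirmed in this report). The table below is a small subset of the SA Forum AIS error enumeration; the helper name `describe` is hypothetical.

```python
# Subset of SaAisErrorT values from the SA Forum AIS specification.
# Assumption: the codes logged by gfs_controld are these raw values.
SA_AIS_ERRORS = {
    1: "SA_AIS_OK",
    6: "SA_AIS_ERR_TRY_AGAIN",
    9: "SA_AIS_ERR_BAD_HANDLE",
    12: "SA_AIS_ERR_NOT_EXIST",
    14: "SA_AIS_ERR_EXIST",
}


def describe(code):
    """Map a numeric code to its symbolic name, if it is in the subset."""
    return SA_AIS_ERRORS.get(code, "unknown ({})".format(code))


# The two codes that appear in the report.
for code in (12, 9):
    print(code, describe(code))
```

Under that assumption, error 12 would mean the checkpoint being unlinked did not exist, and error 9 would mean the following status call was made on an invalid handle. Both readings fit the view that the messages are a side effect of the node failure rather than a fault in their own right.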

Comment 4 RHEL Program Management 2010-04-16 20:27:23 UTC
Development Management has reviewed and declined this request.  You may appeal
this decision by reopening this request.