Bug 1207528

Summary: glusterd: unable to start glusterd after hard reboot because one of the peer info files is truncated to 0 bytes
Product: [Red Hat Storage] Red Hat Gluster Storage Reporter: Rachana Patel <racpatel>
Component: glusterd Assignee: Bug Updates Notification Mailing List <rhs-bugs>
Status: CLOSED NOTABUG QA Contact: amainkar
Severity: high Docs Contact:
Priority: unspecified    
Version: rhgs-3.0 CC: nlevinki, vbellur
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2015-03-31 06:51:36 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1186580    

Description Rachana Patel 2015-03-31 06:41:28 UTC
Description of problem:
=======================
After a hard reboot of all servers, glusterd does not come up on one node, and one of the peer info files on that node has been truncated to zero bytes.


Version-Release number of selected component (if applicable):
=============================================================
0.803.gitf64666f.el6.x86_64

How reproducible:
=================
Have not tried to reproduce; appears to be intermittent.

Steps to Reproduce:
===================
1. Had a cluster of 3 nodes and ran a few tests.
2. Was unable to log in to the servers, so performed a hard reboot of all servers.
3. After the reboot, glusterd did not come up on one node, and a peer info file on that node had been truncated to zero bytes.

[root@rhs-client38 ~]# service glusterd status
glusterd is stopped
[root@rhs-client38 ~]# service glusterd start
Starting glusterd:                                         [FAILED]
[root@rhs-client38 ~]# ls -l /var/lib/glusterd/peers/
total 4
-rw------- 1 root root 73 Mar 26 14:29 33fd732c-41c3-4fa5-a588-f7b352333724
-rw------- 1 root root  0 Mar 30 06:29 743cfed7-9578-4fae-8a93-47ef73be22ed
[root@rhs-client38 ~]# tail -f /var/log/glusterfs/etc-glusterfs-glusterd.vol.log
[2015-03-31 00:50:14.787013] I [glusterd.c:154:glusterd_uuid_init] 0-management: retrieved UUID: 9a7435a5-877e-45a4-a94e-4d7ef2ab9cbc
[2015-03-31 00:50:14.787107] I [rpc-clnt.c:972:rpc_clnt_connection_init] 0-glustershd: setting frame-timeout to 600
[2015-03-31 00:50:14.787288] I [rpc-clnt.c:972:rpc_clnt_connection_init] 0-nfs: setting frame-timeout to 600
[2015-03-31 00:50:14.787416] I [rpc-clnt.c:972:rpc_clnt_connection_init] 0-quotad: setting frame-timeout to 600
[2015-03-31 00:50:14.787551] I [rpc-clnt.c:972:rpc_clnt_connection_init] 0-bitd: setting frame-timeout to 600
[2015-03-31 00:50:14.787676] I [rpc-clnt.c:972:rpc_clnt_connection_init] 0-scrub: setting frame-timeout to 600
[2015-03-31 00:50:16.050763] E [xlator.c:426:xlator_init] 0-management: Initialization of volume 'management' failed, review your volfile again
[2015-03-31 00:50:16.050795] E [graph.c:322:glusterfs_graph_init] 0-management: initializing translator failed
[2015-03-31 00:50:16.050807] E [graph.c:661:glusterfs_graph_activate] 0-graph: init failed
[2015-03-31 00:50:16.051269] W [glusterfsd.c:1212:cleanup_and_exit] (--> 0-: received signum (0), shutting down
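To check other nodes for the same condition, a small helper along these lines could list zero-byte files in the peers directory. This is a minimal sketch, not part of glusterd: the function name `find_empty_peer_files` is hypothetical, and the default path is simply the /var/lib/glusterd/peers directory inspected in the transcript above.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with glusterd): print every regular file
# in the given directory that is exactly 0 bytes long.
# Defaults to /var/lib/glusterd/peers, the directory listed in this report.
find_empty_peer_files() {
    peers_dir="${1:-/var/lib/glusterd/peers}"
    # -maxdepth 1 : peer info files sit directly in the peers directory
    # -type f     : regular files only
    # -size 0     : size in 512-byte blocks, rounded up, so only empty files match
    find "$peers_dir" -maxdepth 1 -type f -size 0 -print
}
```

Run as `find_empty_peer_files /var/lib/glusterd/peers` on rhs-client38, it would be expected to report the 743cfed7-9578-4fae-8a93-47ef73be22ed entry, which the `ls -l` output shows at 0 bytes.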


Log snippet:

Mar 30 06:37:30 rhs-client38 kernel: EXT4-fs (dm-0): orphan cleanup on readonly fs
Mar 30 06:37:30 rhs-client38 kernel: ------------[ cut here ]------------
Mar 30 06:37:30 rhs-client38 kernel: WARNING: at fs/ext4/inode.c:3929 ext4_flush_unwritten_io+0x74/0x80 [ext4]() (Not tainted)
Mar 30 06:37:30 rhs-client38 kernel: Hardware name: X9DRW-3LN4F+/X9DRW-3TF+
Mar 30 06:37:30 rhs-client38 kernel: Modules linked in: ext4 jbd2 mbcache sd_mod crc_t10dif isci libsas scsi_transport_sas ahci dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
Mar 30 06:37:30 rhs-client38 kernel: Pid: 862, comm: mount Not tainted 2.6.32-504.12.2.el6.x86_64 #1
Mar 30 06:37:30 rhs-client38 kernel: Call Trace:
Mar 30 06:37:30 rhs-client38 kernel: [<ffffffff81074df7>] ? warn_slowpath_common+0x87/0xc0
Mar 30 06:37:30 rhs-client38 kernel: [<ffffffff81074e4a>] ? warn_slowpath_null+0x1a/0x20
Mar 30 06:37:30 rhs-client38 kernel: [<ffffffffa00c8bb4>] ? ext4_flush_unwritten_io+0x74/0x80 [ext4]
Mar 30 06:37:30 rhs-client38 kernel: [<ffffffffa00f0937>] ? ext4_ext_truncate+0x37/0x1f0 [ext4]
Mar 30 06:37:30 rhs-client38 kernel: [<ffffffffa00cebf8>] ? ext4_truncate+0x4c8/0x6a0 [ext4]
Mar 30 06:37:30 rhs-client38 kernel: [<ffffffff81529445>] ? printk+0x41/0x44
Mar 30 06:37:30 rhs-client38 kernel: [<ffffffffa00dfed8>] ? ext4_msg+0x68/0x80 [ext4]
Mar 30 06:37:30 rhs-client38 kernel: [<ffffffffa00c61d3>] ? ext4_orphan_get+0xb3/0x1f0 [ext4]
Mar 30 06:37:30 rhs-client38 kernel: [<ffffffffa00e50d1>] ? ext4_fill_super+0x26e1/0x28f0 [ext4]
Mar 30 06:37:30 rhs-client38 kernel: [<ffffffff812969d4>] ? snprintf+0x34/0x40
Mar 30 06:37:30 rhs-client38 kernel: [<ffffffff8119156e>] ? get_sb_bdev+0x18e/0x1d0
Mar 30 06:37:30 rhs-client38 kernel: [<ffffffffa00e29f0>] ? ext4_fill_super+0x0/0x28f0 [ext4]
Mar 30 06:37:30 rhs-client38 kernel: [<ffffffffa00de178>] ? ext4_get_sb+0x18/0x20 [ext4]
Mar 30 06:37:30 rhs-client38 kernel: [<ffffffff811909bb>] ? vfs_kern_mount+0x7b/0x1b0
Mar 30 06:37:30 rhs-client38 kernel: [<ffffffff81190b62>] ? do_kern_mount+0x52/0x130
Mar 30 06:37:30 rhs-client38 kernel: [<ffffffff811b270b>] ? do_mount+0x2fb/0x930
Mar 30 06:37:30 rhs-client38 kernel: [<ffffffff81145734>] ? strndup_user+0x64/0xc0
Mar 30 06:37:30 rhs-client38 kernel: [<ffffffff811b2dd0>] ? sys_mount+0x90/0xe0
Mar 30 06:37:30 rhs-client38 kernel: [<ffffffff8100b072>] ? system_call_fastpath+0x16/0x1b
Mar 30 06:37:30 rhs-client38 kernel: ---[ end trace a0b4d05ba5881b34 ]---
Mar 30 06:37:30 rhs-client38 kernel: EXT4-fs (dm-0): 6 orphan inodes deleted
Mar 30 06:37:30 rhs-client38 kernel: EXT4-fs (dm-0): 1 truncate cleaned up
Mar 30 06:37:30 rhs-client38 kernel: EXT4-fs (dm-0): recovery complete
Mar 30 06:37:30 rhs-client38 kernel: EXT4-fs (dm-0): mounted filesystem with ordered data mode. Opts: 
Mar 30 06:37:30 rhs-client38 kernel: dracut: Mounted root filesystem /dev/mapper/vg_rhsclient38-lv_root
Mar 30 06:37:30 rhs-client38 kernel: SELinux:  Disabled at runtime.
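The snippet shows ext4 performing orphan and truncate cleanup on the root filesystem at mount time, which is consistent with (though not proof of) the peer info file being truncated by filesystem recovery rather than by glusterd itself. A sketch for pulling just those recovery messages out of a syslog file follows. The function name `ext4_recovery_lines` is hypothetical, and the default path /var/log/messages is an assumption (the usual rsyslog destination on RHEL 6).

```shell
#!/bin/sh
# Hypothetical helper: print ext4 mount-time recovery messages
# (orphan cleanup, orphan inodes deleted, truncates cleaned up,
# journal recovery complete) from a syslog-format file.
# Assumed default location: /var/log/messages (RHEL 6 rsyslog default).
ext4_recovery_lines() {
    log_file="${1:-/var/log/messages}"
    # -E enables extended regex; \( and \) match the literal parentheses
    # around the device name, e.g. "EXT4-fs (dm-0):".
    grep -E 'EXT4-fs \([^)]*\): (orphan cleanup|[0-9]+ orphan inodes deleted|[0-9]+ truncates? cleaned up|recovery complete)' "$log_file"
}
```

Applied to the log above, it would select the "orphan cleanup", "6 orphan inodes deleted", "1 truncate cleaned up" and "recovery complete" lines and skip the call trace.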


Actual results:
===============
- glusterd does not start
- peer info file truncated to 0 bytes