Bug 1209831 - peer probe fails because of missing glusterd.info file
Summary: peer probe fails because of missing glusterd.info file
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: GlusterFS
Classification: Community
Component: glusterd
Version: 3.6.2
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2015-04-08 09:54 UTC by ssamanta
Modified: 2015-06-11 01:09 UTC
CC List: 6 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2015-04-14 17:33:07 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments

Description ssamanta 2015-04-08 09:54:46 UTC
Description of problem:
Peer probe on a fresh cluster fails because of the missing glusterd.info file.


Version-Release number of selected component (if applicable):
[root@gqas009 ~]# rpm -qa | grep glusterfs
glusterfs-api-devel-3.6.2-1.fc20.x86_64
glusterfs-hadoop-distribution-glusterfs-hadoop-test_bigtop_hive-0.1-11.noarch
glusterfs-hadoop-distribution-glusterfs-hadoop-test_bigtop_hbase-0.1-3.noarch
glusterfs-hadoop-distribution-glusterfs-hadoop-test_fs_counters-0.1-10.noarch
glusterfs-hadoop-distribution-glusterfs-hadoop-test_multiuser_support-0.1-3.noarch
glusterfs-hadoop-distribution-glusterfs-hadoop-test_bigtop_hadoop_hcfs_fileappend-0.1-4.noarch
glusterfs-hadoop-distribution-glusterfs-hadoop-setup_hadoop-0.1-121.noarch
glusterfs-hadoop-distribution-glusterfs-hadoop-test_bigtop_hadoop_hcfs_quota-0.1-5.noarch
glusterfs-hadoop-distribution-glusterfs-hadoop-test_multiple_volumes-0.1-17.noarch
glusterfs-libs-3.6.2-1.fc20.x86_64
glusterfs-hadoop-distribution-glusterfs-hadoop-test_dfsio_io_exception-0.1-8.noarch
glusterfs-fuse-3.6.2-1.fc20.x86_64
glusterfs-hadoop-distribution-glusterfs-hadoop-test_shim_access_error_messages-0.1-5.noarch
glusterfs-hadoop-distribution-glusterfs-hadoop-test_sqoop-0.1-1.noarch
glusterfs-devel-3.6.2-1.fc20.x86_64
glusterfs-hadoop-distribution-glusterfs-hadoop-setup_gluster-0.2-77.noarch
glusterfs-resource-agents-3.5.3-1.fc20.noarch
glusterfs-hadoop-distribution-glusterfs-hadoop-test_brick_sorted_order_of_filenames-0.1-1.noarch
glusterfs-hadoop-distribution-glusterfs-hadoop-setup_bigtop-0.2.1-23.noarch
glusterfs-hadoop-distribution-glusterfs-hadoop-test_erroneous_multivolume_filepaths-0.1-3.noarch
glusterfs-hadoop-distribution-glusterfs-hadoop-test_gluster_selfheal-0.1-5.noarch
glusterfs-hadoop-distribution-glusterfs-hadoop-test_file_dir_permissions-0.1-8.noarch
glusterfs-hadoop-distribution-glusterfs-hadoop-test_selinux_persistently_disabled-0.1-1.noarch
glusterfs-hadoop-distribution-glusterfs-hadoop-test_user_mapred_job-0.1-4.noarch
glusterfs-hadoop-distribution-glusterfs-hadoop-test_generate_gridmix2_data-0.1-2.noarch
glusterfs-hadoop-distribution-glusterfs-hadoop-setup_hadoop_security-0.0.1-7.noarch
glusterfs-hadoop-distribution-glusterfs-hadoop-test_dfsio-0.1-1.noarch
glusterfs-api-3.6.2-1.fc20.x86_64
glusterfs-extra-xlators-3.6.2-1.fc20.x86_64
glusterfs-server-3.6.2-1.fc20.x86_64
glusterfs-hadoop-distribution-glusterfs-hadoop-setup_common-0.2-111.noarch
glusterfs-hadoop-2.1.2-2.fc20.noarch
glusterfs-geo-replication-3.6.2-1.fc20.x86_64
glusterfs-hadoop-distribution-glusterfs-hadoop-test_special_char_in_path-0.1-1.noarch
glusterfs-hadoop-distribution-glusterfs-hadoop-test_groovy_sync-0.1-23.noarch
glusterfs-hadoop-distribution-glusterfs-hadoop-test_gluster_quota_selfheal-0.2-10.noarch
glusterfs-hadoop-distribution-glusterfs-hadoop-test_multifilewc_null_pointer_exception-0.1-5.noarch
glusterfs-hadoop-distribution-glusterfs-hadoop-test_bigtop_pig-0.1-8.noarch
glusterfs-hadoop-distribution-glusterfs-hadoop-test_gridmix3-0.1-1.noarch
glusterfs-hadoop-distribution-glusterfs-hadoop-test_setting_working_directory-0.1-1.noarch
glusterfs-hadoop-distribution-glusterfs-hadoop-setup_rhs_georep-0.1-2.noarch
glusterfs-hadoop-distribution-glusterfs-hadoop-test_home_dir_listing-0.1-4.noarch
glusterfs-hadoop-distribution-glusterfs-hadoop-test_bigtop_hadoop_hcfs_testcli-0.2-6.noarch
glusterfs-hadoop-javadoc-2.1.2-2.fc20.noarch
glusterfs-debuginfo-3.6.2-1.fc20.x86_64
glusterfs-hadoop-distribution-glusterfs-hadoop-test_missing_dirs_create-0.1-3.noarch
glusterfs-3.6.2-1.fc20.x86_64
glusterfs-hadoop-distribution-glusterfs-hadoop-test_bigtop_hadoop_mapreduce-0.1-5.noarch
glusterfs-cli-3.6.2-1.fc20.x86_64
glusterfs-hadoop-distribution-glusterfs-hadoop-test_append_to_file-0.1-5.noarch
glusterfs-hadoop-distribution-glusterfs-hadoop-test_bigtop_mahout-0.1-5.noarch
glusterfs-rdma-3.6.2-1.fc20.x86_64
glusterfs-hadoop-distribution-glusterfs-hadoop-test_bigtop-0.1-7.noarch
glusterfs-hadoop-distribution-glusterfs-hadoop-test_default_block_size-0.1-3.noarch
glusterfs-hadoop-distribution-glusterfs-hadoop-test_ldap-0.1-6.noarch
glusterfs-hadoop-distribution-glusterfs-hadoop-test_junit_shim-0.1-12.noarch
[root@gqas009 ~]# 


How reproducible:
Tried once


Steps to Reproduce:
1. Install Fedora 20 and the GlusterFS 3.6.2 RPMs on 2 nodes.
2. Start the glusterd service after modifying the glusterd.vol file to allow RPC requests from non-privileged ports.
3. From node1, run: gluster peer probe <node2-ip>


Actual results:
Peer probe fails as the /var/lib/glusterd/glusterd.info file is missing.

Expected results:
Peer probe should not fail.

Workaround: after creating a volume with bricks on node1, the peer probe succeeds.

I think glusterd should have failed to start on the node if the glusterd.info file was missing for some reason.

Additional info:
I will attach the sos-reports shortly.
 

[root@gqas009 ~]# cat /etc/glusterfs/glusterd.vol
volume management
    type mgmt/glusterd
    option working-directory /var/lib/glusterd
    option transport-type socket,rdma
    option transport.socket.keepalive-time 10
    option transport.socket.keepalive-interval 2
    option transport.socket.read-fail-log off
    option ping-timeout 30
    option rpc-auth-allow-insecure on
#   option base-port 49152
end-volume
[root@gqas009 ~]#

[root@gqas009 ~]# pgrep glusterd
 25489
[root@gqas009 ~]#

[root@gqas009 ~]# less /var/lib/glusterd/glusterd.info
/var/lib/glusterd/glusterd.info: No such file or directory
[root@gqas009 ~]#


Created a volume with a set of bricks hosted on the same node:

[root@gqas009 ~]# gluster volume info
 
Volume Name: testvol1
Type: Distributed-Replicate
Volume ID: 5ee47ecc-e22c-4099-acfa-53d5364a16cc
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: 10.16.156.24:/rhs/brick1/new_testvol2
Brick2: 10.16.156.24:/rhs/brick2/new_testvol2
Brick3: 10.16.156.24:/rhs/brick3/new_testvol2
Brick4: 10.16.156.24:/rhs/brick4/new_testvol2
Options Reconfigured:
server.ssl: on
client.ssl: on

Comment 1 SATHEESARAN 2015-04-08 10:41:21 UTC
Hi Sobhan,

Peer probe does not fail because the glusterd.info file is unavailable.
The glusterd.info file is created later, when you initiate the first peer probe transaction.

I don't see any correlation between the existence of the glusterd.info file and gluster peer probe failing.

Please attach the glusterd log (/var/log/glusterfs/etc-glusterfs-glusterd.vol.log) and the gluster command-line history log (/var/log/glusterfs/cmd_history.log).

Comment 3 ssamanta 2015-04-10 07:18:11 UTC
This problem is always reproducible. I tried to create a new cluster, and the peer probe failed with the glusterd.info file missing, although glusterd is running on all nodes.


Node-1
======

[root@gqas006 ssl]# service glusterd status
Redirecting to /bin/systemctl status  glusterd.service
glusterd.service - GlusterFS, a clustered file-system server
   Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled)
   Active: active (running) since Thu 2015-04-09 10:29:02 EDT; 16h ago
 Main PID: 2068 (glusterd)
   CGroup: /system.slice/glusterd.service
           └─2068 /usr/sbin/glusterd -p /var/run/glusterd.pid

Apr 09 10:29:02 gqas006.sbu.lab.eng.bos.redhat.com systemd[1]: Started GlusterFS, a clustered file-system server.
[root@gqas006 ssl]# hostname
gqas006.sbu.lab.eng.bos.redhat.com
[root@gqas006 ssl]# 

Node-2
======

[root@gqas005 ssl]# service glusterd status
Redirecting to /bin/systemctl status  glusterd.service
glusterd.service - GlusterFS, a clustered file-system server
   Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled)
   Active: active (running) since Thu 2015-04-09 10:29:14 EDT; 16h ago
 Main PID: 2066 (glusterd)
   CGroup: /system.slice/glusterd.service
           └─2066 /usr/sbin/glusterd -p /var/run/glusterd.pid

Apr 09 10:29:14 gqas005.sbu.lab.eng.bos.redhat.com systemd[1]: Started GlusterFS, a clustered file-system server.
[root@gqas005 ssl]#

Node-3
======
[root@gqas009 ~]# service glusterd status
Redirecting to /bin/systemctl status  glusterd.service
glusterd.service - GlusterFS, a clustered file-system server
   Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled)
   Active: active (running) since Fri 2015-04-10 02:49:29 EDT; 18min ago
  Process: 868 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid (code=exited, status=0/SUCCESS)
 Main PID: 880 (glusterd)
   CGroup: /system.slice/glusterd.service
           └─880 /usr/sbin/glusterd -p /var/run/glusterd.pid

Apr 10 02:49:23 gqas009.sbu.lab.eng.bos.redhat.com systemd[1]: Starting GlusterFS, a clustered file-system server...
Apr 10 02:49:29 gqas009.sbu.lab.eng.bos.redhat.com systemd[1]: Started GlusterFS, a clustered file-system server.
[root@gqas009 ~]#

Adding the peers from Node1 fails:

[root@gqas005 ssl]# gluster peer status
Connection failed. Please check if gluster daemon is operational.
[root@gqas005 ssl]# 
[root@gqas005 ssl]# 
[root@gqas005 ssl]# 
[root@gqas005 ssl]# gluster peer probe gqas006.sbu.lab.eng.bos.redhat.com
Connection failed. Please check if gluster daemon is operational.
[root@gqas005 ssl]# 


[2015-04-09 14:29:11.205892] I [glusterd.c:1214:init] 0-management: Maximum allowed open file descriptors set to 65536
[2015-04-09 14:29:11.205926] I [glusterd.c:1259:init] 0-management: Using /var/lib/glusterd as working directory
[2015-04-09 14:29:11.210274] W [rdma.c:4221:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm event channel creation failed (No such device)
[2015-04-09 14:29:11.210297] E [rdma.c:4519:init] 0-rdma.management: Failed to initialize IB Device
[2015-04-09 14:29:11.210309] E [rpc-transport.c:333:rpc_transport_load] 0-rpc-transport: 'rdma' initialization failed
[2015-04-09 14:29:11.210374] W [rpcsvc.c:1524:rpcsvc_transport_create] 0-rpc-service: cannot create listener, initing the transport failed
[2015-04-09 14:29:11.210741] E [socket.c:792:__socket_server_bind] 0-socket.management: binding to  failed: Address already in use
[2015-04-09 14:29:11.210762] E [socket.c:795:__socket_server_bind] 0-socket.management: Port is already in use
[2015-04-09 14:29:11.210777] W [rpcsvc.c:1531:rpcsvc_transport_create] 0-rpc-service: listening on transport failed
[2015-04-09 14:29:14.064473] E [store.c:432:gf_store_handle_retrieve] 0-: Path corresponding to /var/lib/glusterd/glusterd.info, returned error: (No such file or directory)
[2015-04-09 14:29:14.064515] E [store.c:432:gf_store_handle_retrieve] 0-: Path corresponding to /var/lib/glusterd/glusterd.info, returned error: (No such file or directory)
[2015-04-09 14:29:14.064528] I [glusterd-store.c:2063:glusterd_restore_op_version] 0-management: Detected new install. Setting op-version to maximum : 30600
[2015-04-09 14:29:14.064644] I [glusterd-store.c:3497:glusterd_store_retrieve_missed_snaps_list] 0-management: No missed snaps list.
Final graph:

Comment 4 SATHEESARAN 2015-04-14 17:33:07 UTC
The "glusterd.info" file-missing error message appears on every fresh installation, because that file is generated later, when a peer probe is initiated or a volume is created. This error message can mislead anyone reading the log.
There is already a bug to downgrade this message from ERROR to DEBUG level.

That error message is unrelated to the problem reported in this bug.
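To illustrate why the missing file is harmless on a fresh install, here is a minimal Python sketch of the logged behaviour ("Detected new install. Setting op-version to maximum : 30600"). It is a hypothetical model, not glusterd source code: the function names are invented, and it assumes glusterd.info is a plain key=value file with `UUID` and `operating-version` keys. A missing file is treated as "new install", not as a fatal error.

```python
import os
import tempfile
import uuid

# Maximum op-version reported by the glusterd 3.6.2 log above.
MAX_OP_VERSION = 30600


def restore_op_version(workdir: str) -> tuple[int, bool]:
    """Return (op_version, is_new_install).

    Hypothetical model of the logged behaviour, not glusterd source code.
    Assumes glusterd.info is a key=value file with an "operating-version" key.
    """
    info_path = os.path.join(workdir, "glusterd.info")
    try:
        with open(info_path) as fh:
            for line in fh:
                key, _, value = line.strip().partition("=")
                if key == "operating-version":
                    return int(value), False
    except FileNotFoundError:
        # Expected on a fresh install: the file has not been written yet.
        return MAX_OP_VERSION, True
    # File present but without an op-version entry: fall back (assumption).
    return MAX_OP_VERSION, False


def write_info(workdir: str, node_uuid: str, op_version: int) -> None:
    """Hypothetical stand-in for the first write of glusterd.info
    (per Comment 4, on the first peer probe or volume creation)."""
    with open(os.path.join(workdir, "glusterd.info"), "w") as fh:
        fh.write(f"UUID={node_uuid}\n")
        fh.write(f"operating-version={op_version}\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        print(restore_op_version(workdir))  # fresh install: file missing
        write_info(workdir, str(uuid.uuid4()), MAX_OP_VERSION)
        print(restore_op_version(workdir))  # file now present
```

The point of the sketch is that the ENOENT on the first lookup is part of normal startup, which is why logging it at ERROR level is misleading.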

I looked at the setup: the old glusterd.socket file had not been cleaned up, and that caused this issue. This is evident from cli.log.
After removing this socket file, gluster commands executed successfully.
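A minimal Python sketch of how a leftover Unix-domain socket file behaves, assuming standard Linux AF_UNIX semantics. A socket file left behind by a process that exited without unlinking it still exists on disk. Clients connecting to it get ECONNREFUSED, which is consistent with the CLI's "Connection failed" message, and a new bind() on the same path fails with EADDRINUSE until the file is removed. The helper names are hypothetical, and a temporary path stands in for the real socket file, whose location is not given in this report.

```python
import errno
import os
import socket
import tempfile


def is_stale_unix_socket(path: str) -> bool:
    """Return True if `path` exists but no process accepts connections on it.

    Hypothetical helper; assumes Linux AF_UNIX semantics.
    """
    if not os.path.exists(path):
        return False  # nothing left behind
    probe = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        probe.connect(path)
    except ConnectionRefusedError:
        return True  # socket file exists, but nobody is listening on it
    finally:
        probe.close()
    return False  # a live listener accepted the connection


def simulate_leftover_socket(workdir: str) -> tuple[bool, int]:
    """Leave a socket file behind, then try to reuse its path.

    Returns (stale_detected, errno_of_second_bind); 0 means the bind succeeded.
    """
    path = os.path.join(workdir, "glusterd.socket")  # stand-in path, not the real one
    # A "previous daemon" binds and listens, then exits without unlinking the file.
    old = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    old.bind(path)
    old.listen(1)
    old.close()

    stale = is_stale_unix_socket(path)

    # A "restarted daemon" tries to bind the same path without removing it first.
    new = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        new.bind(path)
        bind_errno = 0
    except OSError as exc:
        bind_errno = exc.errno
    finally:
        new.close()
    return stale, bind_errno


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        stale, err = simulate_leftover_socket(tmp)
        print("stale socket file:", stale)
        print("second bind:", errno.errorcode.get(err, "ok"))
        # The remedy described above: remove the leftover file before restarting.
        os.unlink(os.path.join(tmp, "glusterd.socket"))
```

Probing with connect() rather than only checking that the file exists distinguishes a live listener from a leftover file, so a cleanup step built this way would not delete the socket of a running daemon.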

This is NOTABUG

Comment 5 SATHEESARAN 2015-04-14 17:34:47 UTC
Refer to https://bugzilla.redhat.com/show_bug.cgi?id=1211718, which tracks moving the "glusterd.info file missing" message to DEBUG level.

