Bug 1209831
| Summary: | peer probe fails because of missing glusterd.info file | | |
|---|---|---|---|
| Product: | [Community] GlusterFS | Reporter: | ssamanta |
| Component: | glusterd | Assignee: | bugs <bugs> |
| Status: | CLOSED NOTABUG | QA Contact: | |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 3.6.2 | CC: | bugs, gluster-bugs, mmadhusu, mzywusko, sasundar, ssamanta |
| Target Milestone: | --- | Keywords: | Triaged |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2015-04-14 17:33:07 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Hi Sobhan, peer probing does not fail because of the non-availability of the glusterd.info file. That file is created lazily, only after the first peer probe transaction is initiated, so I don't see a correlation between the absence of glusterd.info and peer probe failing. Please attach the glusterd log (/var/log/glusterfs/etc-glusterfs-glusterd.vol.log) and the gluster command-line history file (/var/log/glusterfs/cmd_history.log).

This problem is always reproducible: I tried to create a new cluster, and the peer probe failed with a missing glusterd.info file even though glusterd is running on all nodes.
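The lazy-creation behaviour described above can be sketched as follows. This is an illustrative simulation in a temporary directory, not glusterd code: the real file lives at /var/lib/glusterd/glusterd.info, and the UUID below is a placeholder.

```shell
# Simulate the fresh-install state in a temp dir instead of the real
# /var/lib/glusterd, so this snippet is safe to run anywhere.
workdir=$(mktemp -d)

# On a fresh install the UUID file does not exist yet.
if [ ! -f "$workdir/glusterd.info" ]; then
    state="missing on fresh install"
fi

# Only after the first peer probe (or volume create) does glusterd
# persist its identity. Placeholder content; the real file is written
# by glusterd itself.
printf 'UUID=00000000-0000-0000-0000-000000000000\noperating-version=30600\n' \
    > "$workdir/glusterd.info"

grep -q '^UUID=' "$workdir/glusterd.info" && state2="created after first transaction"
echo "$state; $state2"
rm -rf "$workdir"
```

In other words, the file being absent is the normal state of a node that has never joined a pool, which is why its absence alone cannot explain a probe failure.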
Node-1
======
[root@gqas006 ssl]# service glusterd status
Redirecting to /bin/systemctl status glusterd.service
glusterd.service - GlusterFS, a clustered file-system server
Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled)
Active: active (running) since Thu 2015-04-09 10:29:02 EDT; 16h ago
Main PID: 2068 (glusterd)
CGroup: /system.slice/glusterd.service
└─2068 /usr/sbin/glusterd -p /var/run/glusterd.pid
Apr 09 10:29:02 gqas006.sbu.lab.eng.bos.redhat.com systemd[1]: Started GlusterFS, a clustered file-system server.
[root@gqas006 ssl]# hostname
gqas006.sbu.lab.eng.bos.redhat.com
[root@gqas006 ssl]#
Node-2
======
[root@gqas005 ssl]# service glusterd status
Redirecting to /bin/systemctl status glusterd.service
glusterd.service - GlusterFS, a clustered file-system server
Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled)
Active: active (running) since Thu 2015-04-09 10:29:14 EDT; 16h ago
Main PID: 2066 (glusterd)
CGroup: /system.slice/glusterd.service
└─2066 /usr/sbin/glusterd -p /var/run/glusterd.pid
Apr 09 10:29:14 gqas005.sbu.lab.eng.bos.redhat.com systemd[1]: Started GlusterFS, a clustered file-system server.
[root@gqas005 ssl]#
Node-3
======
[root@gqas009 ~]# service glusterd status
Redirecting to /bin/systemctl status glusterd.service
glusterd.service - GlusterFS, a clustered file-system server
Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled)
Active: active (running) since Fri 2015-04-10 02:49:29 EDT; 18min ago
Process: 868 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid (code=exited, status=0/SUCCESS)
Main PID: 880 (glusterd)
CGroup: /system.slice/glusterd.service
└─880 /usr/sbin/glusterd -p /var/run/glusterd.pid
Apr 10 02:49:23 gqas009.sbu.lab.eng.bos.redhat.com systemd[1]: Starting GlusterFS, a clustered file-system server...
Apr 10 02:49:29 gqas009.sbu.lab.eng.bos.redhat.com systemd[1]: Started GlusterFS, a clustered file-system server.
[root@gqas009 ~]#
From one of the nodes, adding the peer fails:
[root@gqas005 ssl]# gluster peer status
Connection failed. Please check if gluster daemon is operational.
[root@gqas005 ssl]#
[root@gqas005 ssl]#
[root@gqas005 ssl]#
[root@gqas005 ssl]# gluster peer probe gqas006.sbu.lab.eng.bos.redhat.com
Connection failed. Please check if gluster daemon is operational.
[root@gqas005 ssl]#
[2015-04-09 14:29:11.205892] I [glusterd.c:1214:init] 0-management: Maximum allowed open file descriptors set to 65536
[2015-04-09 14:29:11.205926] I [glusterd.c:1259:init] 0-management: Using /var/lib/glusterd as working directory
[2015-04-09 14:29:11.210274] W [rdma.c:4221:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm event channel creation failed (No such device)
[2015-04-09 14:29:11.210297] E [rdma.c:4519:init] 0-rdma.management: Failed to initialize IB Device
[2015-04-09 14:29:11.210309] E [rpc-transport.c:333:rpc_transport_load] 0-rpc-transport: 'rdma' initialization failed
[2015-04-09 14:29:11.210374] W [rpcsvc.c:1524:rpcsvc_transport_create] 0-rpc-service: cannot create listener, initing the transport failed
[2015-04-09 14:29:11.210741] E [socket.c:792:__socket_server_bind] 0-socket.management: binding to failed: Address already in use
[2015-04-09 14:29:11.210762] E [socket.c:795:__socket_server_bind] 0-socket.management: Port is already in use
[2015-04-09 14:29:11.210777] W [rpcsvc.c:1531:rpcsvc_transport_create] 0-rpc-service: listening on transport failed
[2015-04-09 14:29:14.064473] E [store.c:432:gf_store_handle_retrieve] 0-: Path corresponding to /var/lib/glusterd/glusterd.info, returned error: (No such file or directory)
[2015-04-09 14:29:14.064515] E [store.c:432:gf_store_handle_retrieve] 0-: Path corresponding to /var/lib/glusterd/glusterd.info, returned error: (No such file or directory)
[2015-04-09 14:29:14.064528] I [glusterd-store.c:2063:glusterd_restore_op_version] 0-management: Detected new install. Setting op-version to maximum : 30600
[2015-04-09 14:29:14.064644] I [glusterd-store.c:3497:glusterd_store_retrieve_missed_snaps_list] 0-management: No missed snaps list.
Final graph:
The "glusterd.info file missing" error is seen on every fresh installation, because that file is generated late, when the first peer probe is initiated or a volume is created. The message may be misleading to anyone looking at it, and there is already a bug to demote it from ERROR to DEBUG, but it has no bearing on the problem reported here.

Looking at the setup, the old glusterd.socket file was not cleaned up, and that is what caused this issue; this is evident from cli.log. After removing the socket file, gluster commands executed successfully. This is NOTABUG.

Refer to https://bugzilla.redhat.com/show_bug.cgi?id=1211718, which tracks moving the "glusterd.info file missing" message to DEBUG.
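The cleanup step from the resolution can be sketched as below. The socket path /var/run/glusterd.socket is an assumption based on the referenced bug; the snippet simulates it in a temporary directory so it is self-contained and safe to run.

```shell
# Stand-in for /var/run: simulate a stale glusterd.socket left behind
# by a previous glusterd instance.
rundir=$(mktemp -d)
touch "$rundir/glusterd.socket"

# Cleanup per the NOTABUG resolution: remove the stale socket, then
# restart glusterd (systemctl restart glusterd on a real system).
if [ -e "$rundir/glusterd.socket" ]; then
    rm -f "$rundir/glusterd.socket"
    removed=yes
fi
echo "stale socket removed: ${removed:-no}"
rm -rf "$rundir"
```

With the stale socket gone, the CLI can reach glusterd again, which explains why "Connection failed. Please check if gluster daemon is operational." disappeared after the cleanup.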
Description of problem:
Peer probe on a fresh cluster fails because of the missing glusterd.info file.

Version-Release number of selected component (if applicable):
[root@gqas009 ~]# rpm -qa | grep glusterfs
glusterfs-3.6.2-1.fc20.x86_64
glusterfs-libs-3.6.2-1.fc20.x86_64
glusterfs-cli-3.6.2-1.fc20.x86_64
glusterfs-fuse-3.6.2-1.fc20.x86_64
glusterfs-api-3.6.2-1.fc20.x86_64
glusterfs-api-devel-3.6.2-1.fc20.x86_64
glusterfs-devel-3.6.2-1.fc20.x86_64
glusterfs-server-3.6.2-1.fc20.x86_64
glusterfs-extra-xlators-3.6.2-1.fc20.x86_64
glusterfs-geo-replication-3.6.2-1.fc20.x86_64
glusterfs-rdma-3.6.2-1.fc20.x86_64
glusterfs-debuginfo-3.6.2-1.fc20.x86_64
glusterfs-resource-agents-3.5.3-1.fc20.noarch
glusterfs-hadoop-2.1.2-2.fc20.noarch
glusterfs-hadoop-javadoc-2.1.2-2.fc20.noarch
(plus assorted glusterfs-hadoop-distribution-* test and setup packages, .noarch)

How reproducible:
Tried once

Steps to Reproduce:
1. Installed Fedora 20 and installed the glusterfs 3.6.2 rpms on 2 nodes.
2. Started the glusterd service after modifying the glusterd.vol file to allow RPC requests from non-privileged ports.
3. Issued the command from node1: gluster peer probe <node2-ip>

Actual results:
Peer probe fails, as the /var/lib/glusterd/glusterd.info file is missing.

Expected results:
Peer probe should not fail.

Workaround:
After creating a volume with the bricks of node1, peer probe is successful. I think the start of glusterd on the node should have failed if the glusterd.info file is missing for some reason.

Additional info:
I will attach the sos-reports shortly.

[root@gqas009 ~]# cat /etc/glusterfs/glusterd.vol
volume management
    type mgmt/glusterd
    option working-directory /var/lib/glusterd
    option transport-type socket,rdma
    option transport.socket.keepalive-time 10
    option transport.socket.keepalive-interval 2
    option transport.socket.read-fail-log off
    option ping-timeout 30
    option rpc-auth-allow-insecure on
#   option base-port 49152
end-volume
[root@gqas009 ~]#
[root@gqas009 ~]# pgrep glusterd
25489
[root@gqas009 ~]#
[root@gqas009 ~]# less /var/lib/glusterd/glusterd.info
/var/lib/glusterd/glusterd.info: No such file or directory
[root@gqas009 ~]#

Create a volume with a set of bricks hosted on the same node.
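Step 2 of the reproduction (allowing RPC requests from non-privileged ports) corresponds to the rpc-auth-allow-insecure option visible in the glusterd.vol listing. A hedged sketch of that edit, applied to a temporary copy rather than the real /etc/glusterfs/glusterd.vol:

```shell
# Work on a temp copy so the snippet is self-contained; on a real node
# you would edit /etc/glusterfs/glusterd.vol and restart glusterd.
volfile=$(mktemp)
cat > "$volfile" <<'EOF'
volume management
    type mgmt/glusterd
    option working-directory /var/lib/glusterd
end-volume
EOF

# Add the option before end-volume unless it is already present.
grep -q 'rpc-auth-allow-insecure' "$volfile" || \
    sed -i 's/^end-volume/    option rpc-auth-allow-insecure on\nend-volume/' "$volfile"

added=$(grep -c 'option rpc-auth-allow-insecure on' "$volfile")
cat "$volfile"
rm -f "$volfile"
```

The sed invocation assumes GNU sed (as shipped on Fedora 20); on other platforms the in-place flag differs.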
[root@gqas009 ~]# gluster volume info

Volume Name: testvol1
Type: Distributed-Replicate
Volume ID: 5ee47ecc-e22c-4099-acfa-53d5364a16cc
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: 10.16.156.24:/rhs/brick1/new_testvol2
Brick2: 10.16.156.24:/rhs/brick2/new_testvol2
Brick3: 10.16.156.24:/rhs/brick3/new_testvol2
Brick4: 10.16.156.24:/rhs/brick4/new_testvol2
Options Reconfigured:
server.ssl: on
client.ssl: on