Bug 1656828

Summary: [heketi]: Mount failing for PVC created with 14 characters
Product: Red Hat Gluster Storage Reporter: Rochelle <rallan>
Component: heketi Assignee: John Mulligan <jmulligan>
Status: CLOSED WONTFIX QA Contact: Prasanth <pprakash>
Severity: high Docs Contact:
Priority: unspecified    
Version: rhgs-3.4 CC: hchiramm, kramdoss, madam, rallan, rhs-bugs, rtalur, sankarshan, storage-qa-internal
Target Milestone: --- Keywords: ZStream
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-02-11 22:09:54 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:

Description Rochelle 2018-12-06 12:13:46 UTC
Description of problem:
=======================

A PVC was created with the name 'masterfailover' (14 characters).

Running 'oc get pvc' showed the PVC in Bound state.

On the gluster end, the volume was created and in Started state:
============================================
Volume Name: vol_app-storage_masterfailover_b5c15c6d-f78e-11e8-ac79-525400bb3330
Type: Replicate
Volume ID: 4c04229f-9024-48ae-a043-9f8dd1e02dee
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 10.70.35.157:/var/lib/heketi/mounts/vg_204f5586340926f092fe8ae143c16947/brick_b7b1305bef0e0e91cf51b8ae0e7b130a/brick
Brick2: 10.70.35.216:/var/lib/heketi/mounts/vg_a4caebcc4a7a571560da6147c2c0ae09/brick_956f1db4480f729af4f516cb7b8c425e/brick
Brick3: 10.70.35.192:/var/lib/heketi/mounts/vg_c1b037de05a4476a1178163f5c559bad/brick_7e4ebdd45b52a2f2bcd3bdf45bd04427/brick
Options Reconfigured:
changelog.changelog: on
geo-replication.ignore-pid-check: on
geo-replication.indexing: on
user.heketi.average-file-size: 64
user.heketi.arbiter: false
server.tcp-user-timeout: 42
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
cluster.enable-shared-storage: enable
 

I wasn't able to mount the volume:

[root@dhcp35-216 ~]# mount -t glusterfs 10.70.43.216:/vol_app-storage_masterfailover_b5c15c6d-f78e-11e8-ac79-525400bb3330 /mnt/failover1/
Mount failed. Please check the log file for more details.


The following were the mount logs:
==================================
[2018-12-06 11:19:16.040357] I [MSGID: 100030] [glusterfsd.c:2537:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.12.2 (args: /usr/sbin/glusterfs --volfile-server=10.70.43.216 --volfile-id=/vol_app-storage_masterfailover_b5c15c6d-f78e-11e8-ac79-525400bb3330 /mnt/failover1)
[2018-12-06 11:19:16.081787] W [MSGID: 101002] [options.c:995:xl_opt_validate] 0-glusterfs: option 'address-family' is deprecated, preferred is 'transport.address-family', continuing with correction
[2018-12-06 11:19:16.112031] I [MSGID: 101190] [event-epoll.c:613:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2018-12-06 11:19:16.123373] E [glusterfsd-mgmt.c:1925:mgmt_getspec_cbk] 0-glusterfs: failed to get the 'volume file' from server
[2018-12-06 11:19:16.123424] E [glusterfsd-mgmt.c:2061:mgmt_getspec_cbk] 0-mgmt: failed to fetch volume file (key:/vol_app-storage_masterfailover_b5c15c6d-f78e-11e8-ac79-525400bb3330)
[2018-12-06 11:19:16.123725] W [glusterfsd.c:1367:cleanup_and_exit] (-->/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0x90) [0x7f291f549960] -->/usr/sbin/glusterfs(mgmt_getspec_cbk+0x680) [0x55993f0d6510] -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x55993f0cf41b] ) 0-: received signum (0), shutting down
[2018-12-06 11:19:16.123773] I [fuse-bridge.c:5863:fini] 0-fuse: Unmounting '/mnt/failover1'.
[2018-12-06 11:19:16.155074] I [fuse-bridge.c:5868:fini] 0-fuse: Closing fuse connection to '/mnt/failover1'.
mnt-failover1.log (END)




Related bug:
============
 https://bugzilla.redhat.com/show_bug.cgi?id=1622493

In that bug the volume creation itself fails; in this case, however, the volume is created and started.

The geo-rep session is in FAULTY state as well.
It fails to move into ACTIVE/PASSIVE state because the worker reports "failed to create logfile": the log file name is too long.
-mount-3Astwd     error=255
[2018-12-05 22:15:36.173824] E [syncdutils(/var/lib/heketi/mounts/vg_a4caebcc4a7a571560da6147c2c0ae09/brick_956f1db4480f729af4f516cb7b8c425e/brick):825:logerr] Popen: /usr/sbin/glusterfs> ERROR: failed to create logfile "/var/log/glusterfs/geo-replication/vol_app-storage_masterfailover_b5c15c6d-f78e-11e8-ac79-525400bb3330/ssh%3A%2F%2Froot%4010.70.35.13%3Agluster%3A%2F%2F127.0.0.1%3Avol_app-storage_slavefailover_6e0215c6-f78d-11e8-b813-52540018d110.%2Fvar%2Flib%2Fheketi%2Fmounts%2Fvg_a4caebcc4a7a571560da6147c2c0ae09%2Fbrick_956f1db4480f729af4f516cb7b8c425e%2Fbrick.gluster.log" (File name too long)
[2018-12-05 22:15:36.173957] E [syncdutils(/var/lib/heketi/mounts/vg_a4caebcc4a7a571560da6147c2c0ae09/brick_956f1db4480f729af4f516cb7b8c425e/brick):825:logerr] Popen: /usr/sbin/glusterfs> ERROR: failed to open logfile /var/log/glusterfs/geo-replication/vol_app-storage_masterfailover_b5c15c6d-f78e-11e8-ac79-525400bb3330/ssh%3A%2F%2Froot%4010.70.35.13%3Agluster%3A%2F%2F127.0.0.1%3Avol_app-storage_slavefailover_6e0215c6-f78d-11e8-b813-52540018d110.%2Fvar%2Flib%2Fheketi%2Fmounts%2Fvg_a4caebcc4a7a571560da6147c2c0ae09%2Fbrick_956f1db4480f729af4f516cb7b8c425e%2Fbrick.gluster.log
[2018-12-05 22:15:36.209686] I [syncdutils(/var/lib/heketi/mounts/vg_a4caebcc4a7a571560da6147c2c0ae09/brick_956f1db4480f729af4f516cb7b8c425e/brick):295:finalize] <top>: exiting.
[2018-12-05 22:15:36.241293] I [repce(/var/lib/heketi/mounts/vg_a4caebcc4a7a571560da6147c2c0ae09/brick_956f1db4480f729af4f516cb7b8c425e/brick):92:service_loop] RepceServer: terminating on reaching EOF.
[2018-12-05 22:15:36.242067] I [syncdutils(/var/lib/heketi/mounts/vg_a4caebcc4a7a571560da6147c2c0ae09/brick_956f1db4480f729af4f516cb7b8c425e/brick):295:finalize] <top>: exiting.
[2018-12-05 22:15:36.246022] I [monitor(monitor):289:monitor] Monitor: worker died before establishing connection       brick=/var/lib/heketi/mounts/vg_a4caebcc4a7a571560da6147c2c0ae09/brick_956f1db4480f729af4f516cb7b8c425e/brick
[2018-12-05 22:15:36.285170] I [gsyncdstatus(monitor):244:set_worker_status] GeorepStatus: Worker Status Change status=Faulty
[2018-12-05 22:15:46.918277] I [gsyncdstatus(monitor):244:set_worker_status] GeorepStatus: Worker Status Change status=Initializing...
[2018-12-05 22:15:46.919061] I [monitor(monitor):199:monitor] Monitor: starting gsyncd worker   brick=/var/lib/heketi/mounts/vg_a4caebcc4a7a571560da6147c2c0ae09/brick_956f1db4480f729af4f516cb7b8c425e/brick   slave_node=ssh://root@10.70.35.236:gluster://localhost:vol_app-storage_slavefailover_6e0215c6-f78d-11e8-b813-52540018d110
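
The "File name too long" error can be checked against the per-component file name limit. The sketch below is a minimal check, not part of geo-rep: it takes the last path component of the log file from the error above and compares its byte length with NAME_MAX. The value 255 is an assumption (it is the usual limit on XFS and ext4 and was not read from the affected host), and the helper name excess_bytes is made up for illustration.

```python
# Per-component file name limit. 255 bytes is the usual value on XFS and
# ext4; it is assumed here, not read from the affected host.
NAME_MAX = 255

# Last path component of the log file, copied from the
# "failed to create logfile" error in the geo-rep log.
LOG_FILE_NAME = (
    "ssh%3A%2F%2Froot%4010.70.35.13%3Agluster%3A%2F%2F127.0.0.1%3A"
    "vol_app-storage_slavefailover_6e0215c6-f78d-11e8-b813-52540018d110."
    "%2Fvar%2Flib%2Fheketi%2Fmounts%2Fvg_a4caebcc4a7a571560da6147c2c0ae09"
    "%2Fbrick_956f1db4480f729af4f516cb7b8c425e%2Fbrick.gluster.log"
)


def excess_bytes(name: str, limit: int = NAME_MAX) -> int:
    """Return by how many bytes `name` exceeds `limit` (<= 0 means it fits)."""
    return len(name.encode("utf-8")) - limit


if __name__ == "__main__":
    print("length:", len(LOG_FILE_NAME.encode("utf-8")))
    print("over limit by:", excess_bytes(LOG_FILE_NAME))
```

The file name in the log is 257 bytes, 2 bytes over the assumed limit. Note that this component embeds the percent-encoded slave URL (including the slave volume name) and the brick path; the master volume name appears only in the parent directory.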





Version-Release number of selected component (if applicable):
============================================================

[root@dhcp35-216 ~]# rpm -qa | grep gluster
glusterfs-client-xlators-3.12.2-29.el7rhgs.x86_64
gluster-block-0.2.1-28.el7rhgs.x86_64
python2-gluster-3.12.2-29.el7rhgs.x86_64
glusterfs-geo-replication-3.12.2-29.el7rhgs.x86_64
gluster-nagios-common-0.2.4-1.el7rhgs.noarch
glusterfs-libs-3.12.2-29.el7rhgs.x86_64
glusterfs-fuse-3.12.2-29.el7rhgs.x86_64
glusterfs-rdma-3.12.2-29.el7rhgs.x86_64
gluster-nagios-addons-0.2.10-2.el7rhgs.x86_64
glusterfs-3.12.2-29.el7rhgs.x86_64
glusterfs-cli-3.12.2-29.el7rhgs.x86_64
libvirt-daemon-driver-storage-gluster-4.5.0-10.el7_6.2.x86_64
vdsm-gluster-4.19.43-2.3.el7rhgs.noarch
glusterfs-server-3.12.2-29.el7rhgs.x86_64
glusterfs-api-3.12.2-29.el7rhgs.x86_64
[root@dhcp35-216 ~]# 


How reproducible:
=================
Always

Steps to Reproduce:
===================
1. Create a PVC whose name is 14 characters long (for example, 'masterfailover')
2. Wait for it to be in bound state
3. Mount the volume

Results:
========
The mount fails.

If a geo-rep session is created, it is in FAULTY state because the name of the log file to be created is too long.

Comment 5 Raghavendra Talur 2019-01-24 20:51:44 UTC
Is the 14-character volume name causing the log file creation failure? It appears that geo-rep could take an input for the log file name that is not derived from the path of the bricks. Please update the title with a proper root cause and also describe how to reproduce.
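
To illustrate the idea of a log file name that is not derived from the brick path, here is a minimal sketch. It is a hypothetical naming scheme, not what gsyncd does: the function short_log_name, the "georep-" prefix and the use of SHA-256 are all assumptions made for illustration. The slave URL and brick path are the ones that appear (percent-encoded) in the failing log file name.

```python
import hashlib

NAME_MAX = 255  # assumed per-component limit, as on XFS and ext4


def short_log_name(slave_url: str, brick_path: str,
                   suffix: str = ".gluster.log") -> str:
    """Hypothetical scheme: use a fixed-length digest of the slave URL and
    brick path instead of embedding both, percent-encoded, in the name."""
    key = f"{slave_url}|{brick_path}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()  # always 64 hex characters
    return f"georep-{digest}{suffix}"


if __name__ == "__main__":
    name = short_log_name(
        "ssh://root@10.70.35.13:gluster://127.0.0.1:"
        "vol_app-storage_slavefailover_6e0215c6-f78d-11e8-b813-52540018d110",
        "/var/lib/heketi/mounts/vg_a4caebcc4a7a571560da6147c2c0ae09/"
        "brick_956f1db4480f729af4f516cb7b8c425e/brick",
    )
    print(name, len(name))
```

With such a scheme the name length stays constant regardless of the volume name or brick path, at the cost of a name that is no longer human-readable without a lookup.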