Bug 1318136 - Distributed-Replicate sharding corrupt VMs in ESXi
Summary: Distributed-Replicate sharding corrupt VMs in ESXi
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: sharding
Version: 3.7.12
Hardware: x86_64
OS: Linux
unspecified
high
Target Milestone: ---
Assignee: Krutika Dhananjay
QA Contact: bugs@gluster.org
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2016-03-16 07:35 UTC by Mahdi Adnan
Modified: 2016-09-29 17:35 UTC (History)
5 users (show)

Fixed In Version: glusterfs-3.7.13
Clone Of:
Environment:
Last Closed: 2016-07-21 10:15:56 UTC
Regression: ---
Mount Type: nfs
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments (Terms of Use)

Description Mahdi Adnan 2016-03-16 07:35:41 UTC
Description of problem: Creating a shard enabled Distributed Replicated volume and mounted in ESXi leads to corrupted VM disk, i cannot move, migrate, clone, or create VM in the mounted volume.


Version-Release number of selected component (if applicable): Gluster 3.7.8 - CentOS 7.2 - ESXi 6.0 U1


How reproducible:


Steps to Reproduce:
1. Create Distributed-Replicate volume, Enable Sharding and add it to Virt group.
2. Mount the volume in ESXi.
3. Clone a VM or create a new VM in the mounted volume.

Actual results:
Error: the virtual disk is either corrupted or not a supported format.


Expected results:
The VM created or migrated without errors.

Additional info:

Volume Info;

Volume Name: m
Type: Distributed-Replicate
Volume ID: db22b441-089d-413d-bbc7-b65dfe559ef9
Status: Started
Number of Bricks: 3 x 2 = 6
Transport-type: tcp
Bricks:
Brick1: 192.168.208.138:/mnt/b1/m
Brick2: 192.168.208.138:/mnt/b2/m
Brick3: 192.168.208.138:/mnt/b3/m
Brick4: 192.168.208.138:/mnt/b4/m
Brick5: 192.168.208.138:/mnt/b5/m
Brick6: 192.168.208.138:/mnt/b6/m
Options Reconfigured:
features.shard-block-size: 256MB
cluster.server-quorum-type: server
cluster.quorum-type: auto
network.remote-dio: enable
cluster.eager-lock: enable
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
features.shard: on
performance.readdir-ahead: on


NFS.log;

[2016-03-16 07:30:04.770129] I [MSGID: 109036] [dht-common.c:8043:dht_log_new_layout_for_dir_selfheal] 0-m-dht: Setting layout of /Asterisk-B with [Subvol_name: m-replicate-0, Err: -1 , Start: 2863311530 , Stop: 4294967295 , Hash: 1 ], [Subvol_name: m-replicate-1, Err: -1 , Start: 0 , Stop: 1431655764 , Hash: 1 ], [Subvol_name: m-replicate-2, Err: -1 , Start: 1431655765 , Stop: 2863311529 , Hash: 1 ], 
[2016-03-16 07:30:05.192586] I [MSGID: 109066] [dht-rename.c:1413:dht_rename] 0-m-dht: renaming /Asterisk-B/Asterisk-B-000001.vmdk~ (hash=m-replicate-0/cache=m-replicate-0) => /Asterisk-B/Asterisk-B-000001.vmdk (hash=m-replicate-0/cache=m-replicate-0)
[2016-03-16 07:30:05.326397] W [MSGID: 112032] [nfs3.c:3622:nfs3svc_rmdir_cbk] 0-nfs: 610b075: /Asterisk-B => -1 (Directory not empty) [Directory not empty]
[2016-03-16 07:30:05.329332] W [MSGID: 112032] [nfs3.c:3622:nfs3svc_rmdir_cbk] 0-nfs: 610b079: /Asterisk-B => -1 (Directory not empty) [Directory not empty]


ESXi VMkernel.log

2016-03-16T07:30:05.605Z cpu35:32825)WARNING: NFS: 4566: Short read for object b00f 60 281d92d6 653bd40b 4c474f3a 41b422db 3d419d08 5db6c7bb f99e55fe 5c536468 dfe44d99a34381c7 f652310f 0 431200000000 offset: 0x0 requested: 0x200 read: 0x94
2016-03-16T07:30:05.608Z cpu34:35990)WARNING: NFS: 4566: Short read for object b00f 60 281d92d6 653bd40b 4c474f3a 41b422db 3d419d08 5db6c7bb f99e55fe 5c536468 dfe44d99a34381c7 f652310f 0 431200000000 offset: 0x0 requested: 0x200 read: 0x94
2016-03-16T07:30:05.610Z cpu34:35990)WARNING: NFS: 4566: Short read for object b00f 60 281d92d6 653bd40b 4c474f3a 41b422db 3d419d08 5db6c7bb f99e55fe 5c536468 dfe44d99a34381c7 f652310f 0 431200000000 offset: 0x0 requested: 0x200 read: 0x94
2016-03-16T07:30:07.579Z cpu6:33451)NMP: nmp_ResetDeviceLogThrottling:3345: last error status from device naa.618e7283727721a01da402e8058b1c21 repeated 7 times
2016-03-16T07:30:26.724Z cpu4:33530)NMP: nmp_ThrottleLogForDevice:3178: Cmd 0x1a (0x439e182fa080, 0) to dev "naa.618e7283727721a01da402e8058b1c21" on path "vmhba0:C2:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0. Act:NONE
2016-03-16T07:30:30.145Z cpu4:33530)ScsiDeviceIO: 2645: Cmd(0x439e16d9cb40) 0x1a, CmdSN 0x62ff from world 0 to dev "naa.618e7283727721a01da402e8058b1c21" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.
2016-03-16T07:30:30.149Z cpu4:33530)ScsiDeviceIO: 2645: Cmd(0x439e15dabc40) 0x1a, CmdSN 0x6304 from world 0 to dev "naa.618e7283727721a01da402e8058b1c21" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.
2016-03-16T07:30:30.155Z cpu4:33530)ScsiDeviceIO: 2645: Cmd(0x439e17ea0100) 0x1a, CmdSN 0x6309 from world 0 to dev "naa.618e7283727721a01da402e8058b1c21" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.
2016-03-16T07:30:30.785Z cpu21:89337)WARNING: NFS: 2208: Failed to get attributes (No connection)
2016-03-16T07:30:30.785Z cpu21:89337)NFS: 2264: Failed to get object 60 f73a15b1 1413267 4c474f3a 9d4bd56f df417ad4 5f9b7995 f70bd841 0 0 1000000 0 0 :No connection

Comment 1 Krutika Dhananjay 2016-03-16 09:03:08 UTC
Hi Mahdi,

In the nfs.log snippet that you have pasted, the error is due to rename() failing because the destination is not empty. It doesn't look like an error originating from sharding as it should have logged something if that was the case.

Could you please do the following:

1) attach the brick logs in their current state;
2) create an image on a FUSE mount, and share the logs of the FUSE client

-Krutika

Comment 2 Mahdi Adnan 2016-03-16 13:46:39 UTC
(In reply to Krutika Dhananjay from comment #1)
> Hi Mahdi,
> 
> In the nfs.log snippet that you have pasted, the error is due to rename()
> failing because the destination is not empty. It doesn't look like an error
> originating from sharding as it should have logged something if that was the
> case.
> 
> Could you please do the following:
> 
> 1) attach the brick logs in their current state;
> 2) create an image on a FUSE mount, and share the logs of the FUSE client
> 
> -Krutika


The volume is empty and there's no directory with such a name, i did clone another VM with a new name and same log appeared in nfs.log

1) brick log:


[2016-03-16 15:39:12.539421] I [glusterfsd-mgmt.c:1596:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing
[2016-03-16 15:39:12.541584] I [glusterfsd-mgmt.c:1596:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing

2) FUSE mount:


[2016-03-16 13:33:12.900934] W [MSGID: 114031] [client-rpc-fops.c:2974:client3_3_lookup_cbk] 0-glustervol1-client-4: remote operation failed. Path: (null) (00000000-0000-0000-0000-000000000000) [Invalid argument]
[2016-03-16 13:33:12.901184] W [MSGID: 114031] [client-rpc-fops.c:2974:client3_3_lookup_cbk] 0-glustervol1-client-5: remote operation failed. Path: (null) (00000000-0000-0000-0000-000000000000) [Invalid argument]
[2016-03-16 13:33:15.392434] W [MSGID: 114031] [client-rpc-fops.c:2974:client3_3_lookup_cbk] 0-glustervol1-client-3: remote operation failed. Path: (null) (00000000-0000-0000-0000-000000000000) [Invalid argument]
[2016-03-16 13:33:15.392685] W [MSGID: 114031] [client-rpc-fops.c:2974:client3_3_lookup_cbk] 0-glustervol1-client-2: remote operation failed. Path: (null) (00000000-0000-0000-0000-000000000000) [Invalid argument]
The message "W [MSGID: 114031] [client-rpc-fops.c:2974:client3_3_lookup_cbk] 0-glustervol1-client-2: remote operation failed. Path: (null) (00000000-0000-0000-0000-000000000000) [Invalid argument]" repeated 11 times between [2016-03-16 13:33:15.392685] and [2016-03-16 13:33:42.791602]
The message "W [MSGID: 114031] [client-rpc-fops.c:2974:client3_3_lookup_cbk] 0-glustervol1-client-3: remote operation failed. Path: (null) (00000000-0000-0000-0000-000000000000) [Invalid argument]" repeated 11 times between [2016-03-16 13:33:15.392434] and [2016-03-16 13:33:42.791801]
The message "W [MSGID: 114031] [client-rpc-fops.c:2974:client3_3_lookup_cbk] 0-glustervol1-client-4: remote operation failed. Path: (null) (00000000-0000-0000-0000-000000000000) [Invalid argument]" repeated 26 times between [2016-03-16 13:33:12.900934] and [2016-03-16 13:33:47.519553]
The message "W [MSGID: 114031] [client-rpc-fops.c:2974:client3_3_lookup_cbk] 0-glustervol1-client-5: remote operation failed. Path: (null) (00000000-0000-0000-0000-000000000000) [Invalid argument]" repeated 26 times between [2016-03-16 13:33:12.901184] and [2016-03-16 13:33:47.519565]

Comment 3 Krutika Dhananjay 2016-06-15 07:38:07 UTC
Hi,

Quite a few fixes in sharding and replicate have gone in between 3.7.8 and 3.7.12. 3.7.12 is going to be released soon (you can look for announcement in gluster-users and gluster-devel mailing lists). Could you try the test with 3.7.12 once it is out and let us know if it fixes the issue for you?

-Krutika

Comment 4 Mahdi Adnan 2016-07-05 10:07:54 UTC
Hi,

So i have installed glusterfs 3.7.12 using three new servers.

Volume Info:

Volume Name: volume01
Type: Distributed-Replicate
Volume ID: 8ec983c8-aa94-4bf5-a32e-62a1fbdfffbd
Status: Started
Number of Bricks: 2 x 3 = 6
Transport-type: tcp
Bricks:
Brick1: gfs01:/bricks/b01/volume01
Brick2: gfs02:/bricks/b01/volume01
Brick3: gfs03:/bricks/b01/volume01
Brick4: gfs01:/bricks/b02/volume01
Brick5: gfs02:/bricks/b02/volume01
Brick6: gfs03:/bricks/b02/volume01
Options Reconfigured:
cluster.server-quorum-type: server
cluster.quorum-type: auto
network.remote-dio: enable
cluster.eager-lock: enable
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
performance.readdir-ahead: on
features.shard: on


I mounted the volume using gluster client and tried dd to write some bits to it and i got the same results.
i cannot even List the mounted directory.

Thanks.

Comment 5 Krutika Dhananjay 2016-07-05 10:12:27 UTC
What's the error that you get?
Could you attach the mount logs? You'll find them under /var/log/glusterfs.

-Krutika

Comment 6 Mahdi Adnan 2016-07-05 10:23:42 UTC
glustershd.log
########

[2016-07-05 02:58:15.162821] I [MSGID: 114047] [client-handshake.c:1224:client_setvolume_cbk] 0-volume01-client-3: Server and Client lk-version numbers are not same, reopening the fds
[2016-07-05 02:58:15.162898] I [MSGID: 108005] [afr-common.c:4142:afr_notify] 0-volume01-replicate-1: Subvolume 'volume01-client-3' came back up; going online.
[2016-07-05 02:58:15.163002] I [MSGID: 114035] [client-handshake.c:193:client_set_lk_version_cbk] 0-volume01-client-3: Server lk version = 1
[2016-07-05 02:58:15.165273] E [MSGID: 114058] [client-handshake.c:1524:client_query_portmap_cbk] 0-volume01-client-5: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
[2016-07-05 02:58:15.165361] I [MSGID: 114018] [client.c:2030:client_rpc_notify] 0-volume01-client-5: disconnected from volume01-client-5. Client process will keep trying to connect to glusterd until brick's port is available
[2016-07-05 02:58:15.165385] W [MSGID: 108001] [afr-common.c:4222:afr_notify] 0-volume01-replicate-1: Client-quorum is not met
[2016-07-05 02:58:19.106263] I [rpc-clnt.c:1868:rpc_clnt_reconfig] 0-volume01-client-1: changing port to 49153 (from 0)
[2016-07-05 02:58:19.111354] I [rpc-clnt.c:1868:rpc_clnt_reconfig] 0-volume01-client-2: changing port to 49153 (from 0)
[2016-07-05 02:58:19.115796] I [MSGID: 114057] [client-handshake.c:1437:select_server_supported_programs] 0-volume01-client-1: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2016-07-05 02:58:19.116398] I [MSGID: 114046] [client-handshake.c:1213:client_setvolume_cbk] 0-volume01-client-1: Connected to volume01-client-1, attached to remote volume '/bricks/b01/volume01'.
[2016-07-05 02:58:19.116436] I [MSGID: 114047] [client-handshake.c:1224:client_setvolume_cbk] 0-volume01-client-1: Server and Client lk-version numbers are not same, reopening the fds
[2016-07-05 02:58:19.116514] I [MSGID: 108002] [afr-common.c:4218:afr_notify] 0-volume01-replicate-0: Client-quorum is met
[2016-07-05 02:58:19.117113] I [MSGID: 114035] [client-handshake.c:193:client_set_lk_version_cbk] 0-volume01-client-1: Server lk version = 1
[2016-07-05 02:58:19.120402] I [rpc-clnt.c:1868:rpc_clnt_reconfig] 0-volume01-client-4: changing port to 49154 (from 0)
[2016-07-05 02:58:19.125066] I [MSGID: 114057] [client-handshake.c:1437:select_server_supported_programs] 0-volume01-client-2: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2016-07-05 02:58:19.125841] I [MSGID: 114046] [client-handshake.c:1213:client_setvolume_cbk] 0-volume01-client-2: Connected to volume01-client-2, attached to remote volume '/bricks/b01/volume01'.
[2016-07-05 02:58:19.125879] I [MSGID: 114047] [client-handshake.c:1224:client_setvolume_cbk] 0-volume01-client-2: Server and Client lk-version numbers are not same, reopening the fds
[2016-07-05 02:58:19.126221] I [MSGID: 114035] [client-handshake.c:193:client_set_lk_version_cbk] 0-volume01-client-2: Server lk version = 1
[2016-07-05 02:58:19.129339] I [rpc-clnt.c:1868:rpc_clnt_reconfig] 0-volume01-client-5: changing port to 49154 (from 0)
[2016-07-05 02:58:19.133480] I [MSGID: 114057] [client-handshake.c:1437:select_server_supported_programs] 0-volume01-client-4: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2016-07-05 02:58:19.134047] I [MSGID: 114046] [client-handshake.c:1213:client_setvolume_cbk] 0-volume01-client-4: Connected to volume01-client-4, attached to remote volume '/bricks/b02/volume01'.
[2016-07-05 02:58:19.134075] I [MSGID: 114047] [client-handshake.c:1224:client_setvolume_cbk] 0-volume01-client-4: Server and Client lk-version numbers are not same, reopening the fds
[2016-07-05 02:58:19.134176] I [MSGID: 108002] [afr-common.c:4218:afr_notify] 0-volume01-replicate-1: Client-quorum is met
[2016-07-05 02:58:19.134312] I [MSGID: 114035] [client-handshake.c:193:client_set_lk_version_cbk] 0-volume01-client-4: Server lk version = 1
[2016-07-05 02:58:19.137717] I [MSGID: 114057] [client-handshake.c:1437:select_server_supported_programs] 0-volume01-client-5: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2016-07-05 02:58:19.138353] I [MSGID: 114046] [client-handshake.c:1213:client_setvolume_cbk] 0-volume01-client-5: Connected to volume01-client-5, attached to remote volume '/bricks/b02/volume01'.
[2016-07-05 02:58:19.138385] I [MSGID: 114047] [client-handshake.c:1224:client_setvolume_cbk] 0-volume01-client-5: Server and Client lk-version numbers are not same, reopening the fds
[2016-07-05 02:58:19.138714] I [MSGID: 114035] [client-handshake.c:193:client_set_lk_version_cbk] 0-volume01-client-5: Server lk version = 1





etc-glusterfs-glusterd.vol.log
#########

[2016-07-05 02:58:07.044962] E [run.c:190:runner_log] (-->/usr/lib64/glusterfs/3.7.12/xlator/mgmt/glusterd.so(glusterd_op_commit_hook+0x195) [0x7faa890a6e55] -->/usr/lib64/glusterfs/3.7.12/xlator/mgmt/glusterd.so(glusterd_hooks_run_hooks+0x426) [0x7faa8913baa6] -->/lib64/libglusterfs.so.0(runner_log+0x115) [0x7faa945a2815] ) 0-management: Failed to execute script: /var/lib/glusterd/hooks/1/stop/pre/S30samba-stop.sh --volname=volume01 --last=yes
[2016-07-05 02:58:07.045089] I [MSGID: 106132] [glusterd-utils.c:1560:glusterd_service_stop] 0-management: brick already stopped
[2016-07-05 02:58:07.046940] I [MSGID: 101053] [mem-pool.c:616:mem_pool_destroy] 0-management: size=588 max=1 total=1
[2016-07-05 02:58:07.046983] I [MSGID: 101053] [mem-pool.c:616:mem_pool_destroy] 0-management: size=124 max=1 total=1
[2016-07-05 02:58:07.046996] I [MSGID: 106132] [glusterd-utils.c:1560:glusterd_service_stop] 0-management: brick already stopped
[2016-07-05 02:58:08.200611] I [MSGID: 106540] [glusterd-utils.c:4409:glusterd_nfs_pmap_deregister] 0-glusterd: De-registered MOUNTV3 successfully
[2016-07-05 02:58:08.201152] I [MSGID: 106540] [glusterd-utils.c:4418:glusterd_nfs_pmap_deregister] 0-glusterd: De-registered MOUNTV1 successfully
[2016-07-05 02:58:08.201714] I [MSGID: 106540] [glusterd-utils.c:4427:glusterd_nfs_pmap_deregister] 0-glusterd: De-registered NFSV3 successfully
[2016-07-05 02:58:08.202259] I [MSGID: 106540] [glusterd-utils.c:4436:glusterd_nfs_pmap_deregister] 0-glusterd: De-registered NLM v4 successfully
[2016-07-05 02:58:08.202798] I [MSGID: 106540] [glusterd-utils.c:4445:glusterd_nfs_pmap_deregister] 0-glusterd: De-registered NLM v1 successfully
[2016-07-05 02:58:08.203340] I [MSGID: 106540] [glusterd-utils.c:4454:glusterd_nfs_pmap_deregister] 0-glusterd: De-registered ACL v3 successfully
[2016-07-05 02:58:09.204068] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: scrub already stopped
[2016-07-05 02:58:09.205895] I [MSGID: 101053] [mem-pool.c:616:mem_pool_destroy] 0-management: size=588 max=1 total=1
[2016-07-05 02:58:09.205947] I [MSGID: 101053] [mem-pool.c:616:mem_pool_destroy] 0-management: size=124 max=1 total=1
[2016-07-05 02:58:09.204004] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: bitd already stopped
[2016-07-05 02:58:14.812996] W [common-utils.c:1685:gf_string2boolean] (-->/usr/lib64/glusterfs/3.7.12/xlator/mgmt/glusterd.so(gd_mgmt_v3_commit_fn+0xf0) [0x7faa89164cc0] -->/usr/lib64/glusterfs/3.7.12/xlator/mgmt/glusterd.so(glusterd_op_start_volume+0x290) [0x7faa8912a860] -->/lib64/libglusterfs.so.0(gf_string2boolean+0x15a) [0x7faa9456483a] ) 0-management: argument invalid [Invalid argument]
[2016-07-05 02:58:14.866654] I [MSGID: 106143] [glusterd-pmap.c:231:pmap_registry_bind] 0-pmap: adding brick /bricks/b01/volume01 on port 49152
[2016-07-05 02:58:14.867874] I [rpc-clnt.c:1004:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2016-07-05 02:58:14.919192] I [MSGID: 106143] [glusterd-pmap.c:231:pmap_registry_bind] 0-pmap: adding brick /bricks/b02/volume01 on port 49153
[2016-07-05 02:58:14.922752] I [rpc-clnt.c:1004:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2016-07-05 02:58:15.073932] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: nfs already stopped
[2016-07-05 02:58:15.082580] W [socket.c:3135:socket_connect] 0-nfs: Ignore failed connection attempt on /var/run/gluster/1618178fab232f90df0449ce51a4a4cd.socket, (No such file or directory)
[2016-07-05 02:58:15.083671] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: glustershd already stopped
[2016-07-05 02:58:15.090174] W [socket.c:3135:socket_connect] 0-glustershd: Ignore failed connection attempt on /var/run/gluster/18a578aa8e7ec695066e71cabcd0a026.socket, (No such file or directory)
[2016-07-05 02:58:15.090462] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: bitd already stopped
[2016-07-05 02:58:15.090552] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: scrub already stopped
[2016-07-05 02:58:15.091593] I [MSGID: 106006] [glusterd-svc-mgmt.c:323:glusterd_svc_common_rpc_notify] 0-management: nfs has disconnected from glusterd.
[2016-07-05 02:58:15.091658] I [MSGID: 106006] [glusterd-svc-mgmt.c:323:glusterd_svc_common_rpc_notify] 0-management: glustershd has disconnected from glusterd.
[2016-07-05 02:58:15.420966] I [run.c:190:runner_log] (-->/usr/lib64/glusterfs/3.7.12/xlator/mgmt/glusterd.so(+0x102092) [0x7faa8913c092] -->/usr/lib64/glusterfs/3.7.12/xlator/mgmt/glusterd.so(glusterd_hooks_run_hooks+0x4c5) [0x7faa8913bb45] -->/lib64/libglusterfs.so.0(runner_log+0x115) [0x7faa945a2815] ) 0-management: Ran script: /var/lib/glusterd/hooks/1/start/post/S29CTDBsetup.sh --volname=volume01 --first=yes --version=1 --volume-op=start --gd-workdir=/var/lib/glusterd
[2016-07-05 02:58:15.438271] E [run.c:190:runner_log] (-->/usr/lib64/glusterfs/3.7.12/xlator/mgmt/glusterd.so(+0x102092) [0x7faa8913c092] -->/usr/lib64/glusterfs/3.7.12/xlator/mgmt/glusterd.so(glusterd_hooks_run_hooks+0x426) [0x7faa8913baa6] -->/lib64/libglusterfs.so.0(runner_log+0x115) [0x7faa945a2815] ) 0-management: Failed to execute script: /var/lib/glusterd/hooks/1/start/post/S30samba-start.sh --volname=volume01 --first=yes --version=1 --volume-op=start --gd-workdir=/var/lib/glusterd
[2016-07-05 03:03:09.051660] I [MSGID: 106488] [glusterd-handler.c:1533:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req
[2016-07-05 03:08:44.939825] I [MSGID: 106488] [glusterd-handler.c:1533:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req
[2016-07-05 03:08:44.941367] I [MSGID: 106488] [glusterd-handler.c:1533:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req




nfs.log
######

[2016-07-05 02:58:15.227163] I [MSGID: 114046] [client-handshake.c:1213:client_setvolume_cbk] 0-volume01-client-2: Connected to volume01-client-2, attached to remote volume '/bricks/b01/volume01'.
[2016-07-05 02:58:15.227195] I [MSGID: 114047] [client-handshake.c:1224:client_setvolume_cbk] 0-volume01-client-2: Server and Client lk-version numbers are not same, reopening the fds
[2016-07-05 02:58:15.227497] I [MSGID: 114035] [client-handshake.c:193:client_set_lk_version_cbk] 0-volume01-client-2: Server lk version = 1
[2016-07-05 02:58:15.250237] I [rpc-clnt.c:1868:rpc_clnt_reconfig] 0-volume01-client-5: changing port to 49154 (from 0)
[2016-07-05 02:58:15.255742] I [MSGID: 114057] [client-handshake.c:1437:select_server_supported_programs] 0-volume01-client-5: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2016-07-05 02:58:15.260527] I [MSGID: 114046] [client-handshake.c:1213:client_setvolume_cbk] 0-volume01-client-5: Connected to volume01-client-5, attached to remote volume '/bricks/b02/volume01'.
[2016-07-05 02:58:15.260564] I [MSGID: 114047] [client-handshake.c:1224:client_setvolume_cbk] 0-volume01-client-5: Server and Client lk-version numbers are not same, reopening the fds
[2016-07-05 02:58:15.260900] I [MSGID: 114035] [client-handshake.c:193:client_set_lk_version_cbk] 0-volume01-client-5: Server lk version = 1
[2016-07-05 02:58:15.403402] I [rpc-clnt.c:1868:rpc_clnt_reconfig] 0-volume01-client-4: changing port to 49154 (from 0)
[2016-07-05 02:58:15.408956] I [MSGID: 114057] [client-handshake.c:1437:select_server_supported_programs] 0-volume01-client-4: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2016-07-05 02:58:15.413317] I [MSGID: 114046] [client-handshake.c:1213:client_setvolume_cbk] 0-volume01-client-4: Connected to volume01-client-4, attached to remote volume '/bricks/b02/volume01'.
[2016-07-05 02:58:15.413352] I [MSGID: 114047] [client-handshake.c:1224:client_setvolume_cbk] 0-volume01-client-4: Server and Client lk-version numbers are not same, reopening the fds
[2016-07-05 02:58:15.413879] I [MSGID: 114035] [client-handshake.c:193:client_set_lk_version_cbk] 0-volume01-client-4: Server lk version = 1
[2016-07-05 02:58:15.415211] I [MSGID: 108031] [afr-common.c:1913:afr_local_discovery_cbk] 0-volume01-replicate-0: selecting local read_child volume01-client-0
[2016-07-05 02:58:15.416220] I [MSGID: 108031] [afr-common.c:1913:afr_local_discovery_cbk] 0-volume01-replicate-1: selecting local read_child volume01-client-3





bricks/bricks-b01-volume01.log
#########


[2016-07-05 02:58:15.409307] I [login.c:81:gf_auth] 0-auth/login: allowed user names: 5f66aba0-51f0-4d54-a959-b0e7f1176fe4
[2016-07-05 02:58:15.409344] I [MSGID: 115029] [server-handshake.c:690:server_setvolume] 0-volume01-server: accepted client from gfs03-9909-2016/07/05-16:14:11:91736-volume01-client-0-0-0 (version: 3.7.12)
[2016-07-05 02:58:15.451376] I [login.c:81:gf_auth] 0-auth/login: allowed user names: 5f66aba0-51f0-4d54-a959-b0e7f1176fe4
[2016-07-05 02:58:15.451410] I [MSGID: 115029] [server-handshake.c:690:server_setvolume] 0-volume01-server: accepted client from gfs02-10051-2016/07/05-09:59:26:615362-volume01-client-0-0-0 (version: 3.7.12)
[2016-07-05 02:58:15.508151] I [login.c:81:gf_auth] 0-auth/login: allowed user names: 5f66aba0-51f0-4d54-a959-b0e7f1176fe4
[2016-07-05 02:58:15.508189] I [MSGID: 115029] [server-handshake.c:690:server_setvolume] 0-volume01-server: accepted client from gfs02-10049-2016/07/05-09:59:26:609195-volume01-client-0-0-0 (version: 3.7.12)
[2016-07-05 02:58:19.118429] E [MSGID: 113091] [posix.c:178:posix_lookup] 0-volume01-posix: null gfid for path (null)
[2016-07-05 02:58:19.118466] E [MSGID: 113018] [posix.c:196:posix_lookup] 0-volume01-posix: lstat on null failed [Invalid argument]
[2016-07-05 02:58:19.118868] E [MSGID: 113091] [posix.c:178:posix_lookup] 0-volume01-posix: null gfid for path (null)
[2016-07-05 02:58:19.118899] E [MSGID: 113018] [posix.c:196:posix_lookup] 0-volume01-posix: lstat on null failed [Invalid argument]
[2016-07-05 02:58:19.125625] E [MSGID: 113091] [posix.c:178:posix_lookup] 0-volume01-posix: null gfid for path (null)
[2016-07-05 02:58:19.125663] E [MSGID: 113018] [posix.c:196:posix_lookup] 0-volume01-posix: lstat on null failed [Invalid argument]
[2016-07-05 02:58:19.126048] E [MSGID: 113091] [posix.c:178:posix_lookup] 0-volume01-posix: null gfid for path (null)
[2016-07-05 02:58:19.126078] E [MSGID: 113018] [posix.c:196:posix_lookup] 0-volume01-posix: lstat on null failed [Invalid argument]
[2016-07-05 02:58:20.128974] E [MSGID: 113091] [posix.c:178:posix_lookup] 0-volume01-posix: null gfid for path (null)
[2016-07-05 02:58:20.129009] E [MSGID: 113018] [posix.c:196:posix_lookup] 0-volume01-posix: lstat on null failed [Invalid argument]
[2016-07-05 02:58:20.129390] E [MSGID: 113091] [posix.c:178:posix_lookup] 0-volume01-posix: null gfid for path (null)
[2016-07-05 02:58:20.129427] E [MSGID: 113018] [posix.c:196:posix_lookup] 0-volume01-posix: lstat on null failed [Invalid argument]
[2016-07-05 02:58:20.131787] E [MSGID: 113091] [posix.c:178:posix_lookup] 0-volume01-posix: null gfid for path (null)
[2016-07-05 02:58:20.131819] E [MSGID: 113018] [posix.c:196:posix_lookup] 0-volume01-posix: lstat on null failed [Invalid argument]
[2016-07-05 02:58:20.132211] E [MSGID: 113091] [posix.c:178:posix_lookup] 0-volume01-posix: null gfid for path (null)
[2016-07-05 02:58:20.132241] E [MSGID: 113018] [posix.c:196:posix_lookup] 0-volume01-posix: lstat on null failed [Invalid argument]
[2016-07-05 02:58:31.161596] I [MSGID: 115029] [server-handshake.c:690:server_setvolume] 0-volume01-server: accepted client from Mohammed-10532-2016/07/05-10:00:25:934353-volume01-client-0-0-0 (version: 3.7.1)
[2016-07-05 03:08:19.478058] I [MSGID: 115036] [server.c:552:server_rpc_notify] 0-volume01-server: disconnecting connection from Mohammed-10532-2016/07/05-10:00:25:934353-volume01-client-0-0-0
[2016-07-05 03:08:19.478213] I [MSGID: 115013] [server-helpers.c:294:do_fd_cleanup] 0-volume01-server: fd cleanup on /
[2016-07-05 03:08:19.478380] I [MSGID: 101055] [client_t.c:420:gf_client_unref] 0-volume01-server: Shutting down connection Mohammed-10532-2016/07/05-10:00:25:934353-volume01-client-0-0-0
[2016-07-05 03:08:21.005482] E [MSGID: 113091] [posix.c:178:posix_lookup] 0-volume01-posix: null gfid for path (null)
[2016-07-05 03:08:21.005520] E [MSGID: 113018] [posix.c:196:posix_lookup] 0-volume01-posix: lstat on null failed [Invalid argument]
The message "E [MSGID: 113091] [posix.c:178:posix_lookup] 0-volume01-posix: null gfid for path (null)" repeated 3 times between [2016-07-05 03:08:21.005482] and [2016-07-05 03:08:21.008874]
The message "E [MSGID: 113018] [posix.c:196:posix_lookup] 0-volume01-posix: lstat on null failed [Invalid argument]" repeated 3 times between [2016-07-05 03:08:21.005520] and [2016-07-05 03:08:21.008876]
[2016-07-05 03:18:22.002142] E [MSGID: 113091] [posix.c:178:posix_lookup] 0-volume01-posix: null gfid for path (null)
[2016-07-05 03:18:22.002184] E [MSGID: 113018] [posix.c:196:posix_lookup] 0-volume01-posix: lstat on null failed [Invalid argument]
The message "E [MSGID: 113091] [posix.c:178:posix_lookup] 0-volume01-posix: null gfid for path (null)" repeated 3 times between [2016-07-05 03:18:22.002142] and [2016-07-05 03:18:22.005724]
The message "E [MSGID: 113018] [posix.c:196:posix_lookup] 0-volume01-posix: lstat on null failed [Invalid argument]" repeated 3 times between [2016-07-05 03:18:22.002184] and [2016-07-05 03:18:22.005726]





bricks/bricks-b02-volume01.log
################

[2016-07-05 02:58:15.432749] I [login.c:81:gf_auth] 0-auth/login: allowed user names: 5f66aba0-51f0-4d54-a959-b0e7f1176fe4
[2016-07-05 02:58:15.432785] I [MSGID: 115029] [server-handshake.c:690:server_setvolume] 0-volume01-server: accepted client from gfs03-9909-2016/07/05-16:14:11:91736-volume01-client-3-0-0 (version: 3.7.12)
[2016-07-05 02:58:15.478484] I [login.c:81:gf_auth] 0-auth/login: allowed user names: 5f66aba0-51f0-4d54-a959-b0e7f1176fe4
[2016-07-05 02:58:15.478525] I [MSGID: 115029] [server-handshake.c:690:server_setvolume] 0-volume01-server: accepted client from gfs02-10051-2016/07/05-09:59:26:615362-volume01-client-3-0-0 (version: 3.7.12)
[2016-07-05 02:58:15.530911] I [login.c:81:gf_auth] 0-auth/login: allowed user names: 5f66aba0-51f0-4d54-a959-b0e7f1176fe4
[2016-07-05 02:58:15.530948] I [MSGID: 115029] [server-handshake.c:690:server_setvolume] 0-volume01-server: accepted client from gfs02-10049-2016/07/05-09:59:26:609195-volume01-client-3-0-0 (version: 3.7.12)
[2016-07-05 02:58:19.135961] E [MSGID: 113091] [posix.c:178:posix_lookup] 0-volume01-posix: null gfid for path (null)
[2016-07-05 02:58:19.136001] E [MSGID: 113018] [posix.c:196:posix_lookup] 0-volume01-posix: lstat on null failed [Invalid argument]
[2016-07-05 02:58:19.136389] E [MSGID: 113091] [posix.c:178:posix_lookup] 0-volume01-posix: null gfid for path (null)
[2016-07-05 02:58:19.136429] E [MSGID: 113018] [posix.c:196:posix_lookup] 0-volume01-posix: lstat on null failed [Invalid argument]
[2016-07-05 02:58:19.138825] E [MSGID: 113091] [posix.c:178:posix_lookup] 0-volume01-posix: null gfid for path (null)
[2016-07-05 02:58:19.138864] E [MSGID: 113018] [posix.c:196:posix_lookup] 0-volume01-posix: lstat on null failed [Invalid argument]
[2016-07-05 02:58:19.139186] E [MSGID: 113091] [posix.c:178:posix_lookup] 0-volume01-posix: null gfid for path (null)
[2016-07-05 02:58:19.139206] E [MSGID: 113018] [posix.c:196:posix_lookup] 0-volume01-posix: lstat on null failed [Invalid argument]
[2016-07-05 02:58:20.142462] E [MSGID: 113091] [posix.c:178:posix_lookup] 0-volume01-posix: null gfid for path (null)
[2016-07-05 02:58:20.142501] E [MSGID: 113018] [posix.c:196:posix_lookup] 0-volume01-posix: lstat on null failed [Invalid argument]
[2016-07-05 02:58:20.142824] E [MSGID: 113091] [posix.c:178:posix_lookup] 0-volume01-posix: null gfid for path (null)
[2016-07-05 02:58:20.142843] E [MSGID: 113018] [posix.c:196:posix_lookup] 0-volume01-posix: lstat on null failed [Invalid argument]
[2016-07-05 02:58:20.145361] E [MSGID: 113091] [posix.c:178:posix_lookup] 0-volume01-posix: null gfid for path (null)
[2016-07-05 02:58:20.145400] E [MSGID: 113018] [posix.c:196:posix_lookup] 0-volume01-posix: lstat on null failed [Invalid argument]
[2016-07-05 02:58:20.145764] E [MSGID: 113091] [posix.c:178:posix_lookup] 0-volume01-posix: null gfid for path (null)
[2016-07-05 02:58:20.145795] E [MSGID: 113018] [posix.c:196:posix_lookup] 0-volume01-posix: lstat on null failed [Invalid argument]
[2016-07-05 02:58:31.177073] I [MSGID: 115029] [server-handshake.c:690:server_setvolume] 0-volume01-server: accepted client from Mohammed-10532-2016/07/05-10:00:25:934353-volume01-client-3-0-0 (version: 3.7.1)
[2016-07-05 03:08:19.478014] I [MSGID: 115036] [server.c:552:server_rpc_notify] 0-volume01-server: disconnecting connection from Mohammed-10532-2016/07/05-10:00:25:934353-volume01-client-3-0-0
[2016-07-05 03:08:19.478102] I [MSGID: 115013] [server-helpers.c:294:do_fd_cleanup] 0-volume01-server: fd cleanup on /
[2016-07-05 03:08:19.478222] I [MSGID: 101055] [client_t.c:420:gf_client_unref] 0-volume01-server: Shutting down connection Mohammed-10532-2016/07/05-10:00:25:934353-volume01-client-3-0-0
[2016-07-05 03:08:21.002312] E [MSGID: 113091] [posix.c:178:posix_lookup] 0-volume01-posix: null gfid for path (null)
[2016-07-05 03:08:21.002350] E [MSGID: 113018] [posix.c:196:posix_lookup] 0-volume01-posix: lstat on null failed [Invalid argument]
The message "E [MSGID: 113091] [posix.c:178:posix_lookup] 0-volume01-posix: null gfid for path (null)" repeated 3 times between [2016-07-05 03:08:21.002312] and [2016-07-05 03:08:21.006761]
The message "E [MSGID: 113018] [posix.c:196:posix_lookup] 0-volume01-posix: lstat on null failed [Invalid argument]" repeated 3 times between [2016-07-05 03:08:21.002350] and [2016-07-05 03:08:21.006762]
[2016-07-05 03:18:22.002049] E [MSGID: 113091] [posix.c:178:posix_lookup] 0-volume01-posix: null gfid for path (null)
[2016-07-05 03:18:22.002089] E [MSGID: 113018] [posix.c:196:posix_lookup] 0-volume01-posix: lstat on null failed [Invalid argument]
The message "E [MSGID: 113091] [posix.c:178:posix_lookup] 0-volume01-posix: null gfid for path (null)" repeated 3 times between [2016-07-05 03:18:22.002049] and [2016-07-05 03:18:22.007762]
The message "E [MSGID: 113018] [posix.c:196:posix_lookup] 0-volume01-posix: lstat on null failed [Invalid argument]" repeated 3 times between [2016-07-05 03:18:22.002089] and [2016-07-05 03:18:22.007764]

Comment 7 Mahdi Adnan 2016-07-11 15:33:05 UTC
I would like to inform you the issue is now fixed after upgrading to 3.7.13.
Both FUSE and NFS mount works well.

Thanks a lot

Comment 8 Krutika Dhananjay 2016-07-21 10:15:56 UTC
This bug is getting closed because a glusterfs-3.7.13 release has been made.

glusterfs-3.7.13 has been released [1]. The source tarball and prebuilt packages can be downloaded from [2].
Please report any bugs found using [3].

[1] http://www.gluster.org/pipermail/gluster-devel/2016-July/050139.html
[2] https://download.gluster.org/pub/gluster/glusterfs/3.7/3.7.13/
[3] https://bugzilla.redhat.com/enter_bug.cgi?product=GlusterFS&version=3.7.13


Note You need to log in before you can comment on or make changes to this bug.