Bug 1410132

Summary:	[NFS-Ganesha] Linux kernel untar failed after add-brick
Product:	[Red Hat Storage] Red Hat Gluster Storage	Reporter:	Prasad Desala <tdesala>
Component:	nfs-ganesha	Assignee:	Kaleb KEITHLEY <kkeithle>
Status:	CLOSED CURRENTRELEASE	QA Contact:	surabhi <sbhaloth>
Severity:	high	Docs Contact:
Priority:	unspecified
Version:	rhgs-3.2	CC:	amukherj, dang, ffilz, jthottan, mbenjamin, nbalacha, pkarampu, ravishankar, rgowdapp, rhs-bugs, sankarshan, skoduri, smohan, spalai, sraj, storage-qa-internal, tdesala
Target Milestone:	---
Target Release:	---
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:	rhgs-3.3.0	Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2017-08-21 12:45:32 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Prasad Desala 2017-01-04 14:41:35 UTC

Description of problem:
=======================
On a distributed-replicate volume, Linux kernel untar failed with below error messages when added few bricks to the volume.

tar: linux-4.4.1/arch/arm/mach-pxa/colibri-pxa3xx.c: Cannot open: Invalid argument
tar: linux-4.4.1/arch/arm/mach-pxa/corgi.c: Cannot open: Invalid argument
tar: linux-4.4.1/arch/arm/mach-pxa/corgi_pm.c: Cannot open: Invalid argument
tar: linux-4.4.1/arch/arm/mach-pxa/csb701.c: Cannot open: Invalid argument
tar: linux-4.4.1/arch/arm/mach-pxa/csb726.c: Cannot open: Invalid argument
tar: linux-4.4.1/arch/arm/mach-pxa/include: Cannot mkdir: Invalid argument
tar: linux-4.4.1/arch/arm/mach-pxa/include: Cannot mkdir: Invalid argument
tar: linux-4.4.1/arch/arm/mach-pxa/include/mach: Cannot mkdir: No such file or directory
tar: linux-4.4.1/arch/arm/mach-pxa/include/mach/addr-map.h: Cannot open: No such file or directory
tar: linux-4.4.1/arch/arm/mach-pxa/include: Cannot mkdir: Invalid argument
tar: linux-4.4.1/arch/arm/mach-pxa/include/mach/audio.h: Cannot open: No such file or directory
tar: linux-4.4.1/arch/arm/mach-pxa/include: Cannot mkdir: Invalid argument
tar: linux-4.4.1/arch/arm/mach-pxa/include/mach/balloon3.h: Cannot open: No such file or directory

Version-Release number of selected component (if applicable):
glusterfs version: 3.8.4-10.el7rhgs.x86_64
Ganesh version: nfs-ganesha-2.4.1-3.el7rhgs.x86_64

How reproducible:
1/1

Steps to Reproduce:
===================
1) Create ganesha cluster and create a distributed-replicate volume.
2) Enable nfs-ganesha on the volume with mdcache settings.
3) Mount the volume.
4) On the mount point, copy linux kernel package and create two empty directories say 1 and 2
5) Now, simultaneously start untarring the kernel package to both directories 1 and 2 respectively.
tar -xvf linux-4.4.1.tar -C 1
tar -xvf linux-4.4.1.tar -C 2
6) While untar is in-progress, add few bricks to the volume.

Immediate upon the add-brick command, untar failed with above error messages.

Continuous inode context not found error messages are seen in ganesha-gfapi.log and split-brain observed errors are also seen.

[2017-01-04 13:03:41.443033] E [snapview-client.c:557:gf_svc_stat] 6-distrep-snapview-client: inode context not found for gfid 5852cf25-1503-47b3-bab7-ca658333b05b
[2017-01-04 13:03:41.443376] E [snapview-client.c:1097:gf_svc_mkdir] 6-distrep-snapview-client: failed to get the inode context for ce4eabb8-607f-4f26-a549-d67dd6f253dd
[2017-01-04 13:03:41.443805] E [snapview-client.c:557:gf_svc_stat] 6-distrep-snapview-client: inode context not found for gfid 5852cf25-1503-47b3-bab7-ca658333b05b
[2017-01-04 13:03:41.444207] E [snapview-client.c:1290:gf_svc_create] 6-distrep-snapview-client: failed to get the inode context for b5189efe-78bd-4694-894f-d6aa903fafd7
[2017-01-04 13:03:41.446055] E [snapview-client.c:1290:gf_svc_create] 6-distrep-snapview-client: failed to get the inode context for b5189efe-78bd-4694-894f-d6aa903fafd7

[2017-01-04 13:14:23.313709] E [MSGID: 108008] [afr-transaction.c:2602:afr_write_txn_refresh_done] 6-distrep-replicate-6: Failing SETATTR on gfid e7e08a7a-e423-443d-99d3-a9ddd468a23e: split-brain observed.
[2017-01-04 13:14:23.350708] E [MSGID: 108008] [afr-transaction.c:2602:afr_write_txn_refresh_done] 6-distrep-replicate-6: Failing SETXATTR on gfid 680f153c-7db6-4180-bedf-37f93cb13250: split-brain observed.
[2017-01-04 13:14:23.501320] E [MSGID: 108008] [afr-transaction.c:2602:afr_write_txn_refresh_done] 6-distrep-replicate-6: Failing SETATTR on gfid 680f153c-7db6-4180-bedf-37f93cb13250: split-brain observed.
[2017-01-04 13:14:23.526790] E [MSGID: 108008] [afr-transaction.c:2602:afr_write_txn_refresh_done] 6-distrep-replicate-6: Failing SETXATTR on gfid b735f0de-445f-46fa-9b06-1e233df3119b: split-brain observed.
[2017-01-04 13:14:23.994326] E [MSGID: 108008] [afr-transaction.c:2602:afr_write_txn_refresh_done] 6-distrep-replicate-6: Failing SETATTR on gfid b735f0de-445f-46fa-9b06-1e233df3119b: split-brain observed.

Actual results:
===============
Immediate upon the add-brick command, untar failed with above error messages.

Expected results:
=================
untar should not throw any errors.

Comment 7 Soumya Koduri 2017-01-05 07:42:34 UTC

Thanks Prasad. Even with the original volume, we are unable to reproduce the issue.  Looking at the logs/sosreport provided -


tar: linux-4.4.1/arch/arm/mach-pxa/colibri-pxa3xx.c: Cannot open: Invalid argument
tar: linux-4.4.1/arch/arm/mach-pxa/corgi.c: Cannot open: Invalid argument
tar: linux-4.4.1/arch/arm/mach-pxa/corgi_pm.c: Cannot open: Invalid argument

In gfapi.log , I see 

http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/Prasad/1410132/ganesha-gfapi/server-1/ganesha-gfapi.log

[2017-01-04 13:03:40.845609] I [MSGID: 109063] [dht-layout.c:713:dht_layout_normalize] 6-distrep-dht: Found anomalies in /1/linux-4.4.1/arch/arm (gfid = 8b3b468c-2c5a-4438-9147-2d1510d1120a). Holes=3 overlaps=0
[2017-01-04 13:03:40.845645] W [MSGID: 109005] [dht-selfheal.c:2106:dht_selfheal_directory] 6-distrep-dht: Directory selfheal failed: 3 subvolumes down.Not fixing. path = /1/linux-4.4.1/arch/arm, gfid = 8b3b468c-2c5a-4438-9147-2d1510d1120a
[2017-01-04 13:03:40.845718] W [MSGID: 109011] [dht-layout.c:186:dht_layout_search] 6-distrep-dht: no subvolume for hash (value) = 3001413547
[2017-01-04 13:03:40.845912] I [MSGID: 108006] [afr-common.c:4823:afr_local_init] 6-distrep-replicate-1: no subvolumes up
[2017-01-04 13:03:40.846116] I [MSGID: 108006] [afr-common.c:4823:afr_local_init] 6-distrep-replicate-3: no subvolumes up
[2017-01-04 13:03:40.846288] I [MSGID: 108006] [afr-common.c:4823:afr_local_init] 6-distrep-replicate-5: no subvolumes up
[2017-01-04 13:03:40.847437] I [MSGID: 109063] [dht-layout.c:713:dht_layout_normalize] 6-distrep-dht: Found anomalies in /1/linux-4.4.1/arch/arm/mach-pxa (gfid = ce4eabb8-607f-4f26-a549-d67dd6f253dd). Holes=3 overlaps=0
[2017-01-04 13:03:40.847501] W [MSGID: 109005] [dht-selfheal.c:2106:dht_selfheal_directory] 6-distrep-dht: Directory selfheal failed: 3 subvolumes down.Not fixing. path = /1/linux-4.4.1/arch/arm/mach-pxa, gfid = ce4eabb8-607f-4f26-a549-d67dd6f253dd
[2017-01-04 13:03:40.847948] I [MSGID: 108006] [afr-common.c:4823:afr_local_init] 6-distrep-replicate-1: no subvolumes up


Susant confirmed when there are network disconnects, directory healing doesn't happen. 

From the newly-added brick log  (bricks-brick5-b5.log) -

[2017-01-04 13:03:44.445721] I [MSGID: 115036] [server.c:552:server_rpc_notify] 0-distrep-server: disconnecting connection from dhcp47-155.lab.eng.blr.redhat.com-6911-2017/01/04-13:03:26:328233-distrep-client-12-0-0
[2017-01-04 13:03:44.445912] I [MSGID: 101055] [client_t.c:415:gf_client_unref] 0-distrep-server: Shutting down connection dhcp47-155.lab.eng.blr.redhat.com-6911-2017/01/04-13:03:26:328233-distrep-client-12-0-0
[2017-01-04 13:03:44.457352] I [MSGID: 115036] [server.c:552:server_rpc_notify] 0-distrep-server: disconnecting connection from dhcp46-101.lab.eng.blr.redhat.com-7157-2017/01/04-13:03:26:612690-distrep-client-12-0-0
[2017-01-04 13:03:44.457529] I [MSGID: 101055] [client_t.c:415:gf_client_unref] 0-distrep-server: Shutting down connection dhcp46-101.lab.eng.blr.redhat.com-7157-2017/01/04-13:03:26:612690-distrep-client-12-0-0
[2017-01-04 13:03:44.560467] I [login.c:76:gf_auth] 0-auth/login: allowed user names: dfec5104-52aa-44a6-95d1-79dfcd333c61
[2017-01-04 13:03:44.560522] I [MSGID: 115029] [server-handshake.c:693:server_setvolume] 0-distrep-server: accepted client from dhcp47-155.lab.eng.blr.redhat.com-13639-2017/01/03-10:05:27:67298-distrep-client-12-6-0 (version: 3.8.4)
[2017-01-04 13:03:44.668517] I [login.c:76:gf_auth] 0-auth/login: allowed user names: dfec5104-52aa-44a6-95d1-79dfcd333c61
[2017-01-04 13:03:44.668560] I [MSGID: 115029] [server-handshake.c:693:server_setvolume] 0-distrep-server: accepted client from dhcp47-167.lab.eng.blr.redhat.com-32528-2017/01/04-13:03:27:935078-distrep-client-12-0-0 (version: 3.8.4)
[2017-01-04 13:03:44.708822] I [login.c:76:gf_auth] 0-auth/login: allowed user names: dfec5104-52aa-44a6-95d1-79dfcd333c61
[2017-01-04 13:03:44.708868] I [MSGID: 115029] [server-handshake.c:693:server_setvolume] 0-distrep-server: accepted client from dhcp47-167.lab.eng.blr.redhat.com-7769-2017/01/03-10:05:27:68091-distrep-client-12-6-0 (version: 3.8.4)
[2017-01-04 13:03:44.778648] I [MSGID: 115036] [server.c:552:server_rpc_notify] 0-distrep-server: disconnecting connection from dhcp47-167.lab.eng.blr.redhat.com-32528-2017/01/04-13:03:27:935078-distrep-client-12-0-0
[2017-01-04 13:03:44.778866] I [MSGID: 101055] [client_t.c:415:gf_client_unref] 0-distrep-server: Shutting down connection dhcp47-167.lab.eng.blr.redhat.com-32528-2017/01/04-13:03:27:935078-distrep-client-12-0-0

There are disconnect messages. 

So most probably, directory heal dint happen because of network disconnects then resulting in these errors on the console.


However, from the logs looks like there seem to be another issue  -

[2017-01-04 13:25:57.756641] W [MSGID: 108019] [afr-lk-common.c:1064:afr_log_entry_locks_failure] 6-distrep-replicate-6: Unable to obtain sufficient blocking entry locks on at least one child while attempting MKDIR on {pgfid:b314cb37-f8a5-447a-a66c-147191de23ef, name:io}.
[2017-01-04 13:25:57.760166] I [MSGID: 108019] [afr-transaction.c:1870:afr_post_blocking_entrylk_cbk] 6-distrep-replicate-6: Blocking entrylks failed.

If I do lookup on the parent gfid directory at the backend- 

[root@dhcp46-42 gluster]# ls /bricks/brick5/b5/.glusterfs/b3/
00/ 05/ 06/ 15/ 29/ 2e/ 65/ 71/ 80/ 9c/ e2/ f4/

There is no such directory created on the newly added brick - b5 

[root@dhcp46-42 gluster]# ls /bricks/brick7/b7/.glusterfs/b3/1
10/ 12/ 13/ 14/ 15/ 16/ 1a/ 1c/ 1e/ 1f/ 
[root@dhcp46-42 gluster]# ls /bricks/brick7/b7/.glusterfs/b3/14/
b314cb37-f8a5-447a-a66c-147191de23ef

However that directory is present in the earlier bricks.

Checked with Ravi on this. If the directory heal hasn't happened on replicate-6 subvol, we are not sure why the fops(/entrylk) are sent to AFR in the first place. Request dht team to take a look and provide comments.

Comment 8 Soumya Koduri 2017-01-05 07:51:23 UTC

As suggested by Susant doing 'ls -R' on mountpoint seemed to have healed the directories on all the bricks now.

Comment 9 Susant Kumar Palai 2017-01-05 08:59:15 UTC

When Soumya pointed to this setup, there were network disconnections. 

If the network disconnection happens once the mkdir completes on all the bricks, but before layout setting on them, then there will be holes and tar operations will fail. 

Soumya/Prasad,
   Is this reproduced all the time?

Comment 11 Susant Kumar Palai 2017-01-05 09:40:05 UTC

Ideally without any network disconnect, we should not hit this bug.

Comment 14 Soumya Koduri 2017-01-09 10:24:41 UTC

Raghavendra and myself have looked at the logs -


[root@dhcp46-42 ~]#  grep -i "14-distrep" /var/log/ganesha-gfapi.log-20170109 | grep -v "2017-01-05" | grep -E "client-6|client-7"  | grep -v "Connection refused" | grep -v " changing port"
[2017-01-06 06:43:49.281871] I [MSGID: 114020] [client.c:2356:notify] 14-distrep-client-6: parent translators are ready, attempting connect on transport
[2017-01-06 06:43:49.285108] I [MSGID: 114020] [client.c:2356:notify] 14-distrep-client-7: parent translators are ready, attempting connect on transport
[2017-01-06 06:43:49.810107] I [MSGID: 114057] [client-handshake.c:1446:select_server_supported_programs] 14-distrep-client-7: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2017-01-06 06:43:49.816029] I [MSGID: 114046] [client-handshake.c:1222:client_setvolume_cbk] 14-distrep-client-7: Connected to distrep-client-7, attached to remote volume '/bricks/brick8/b8'.
[2017-01-06 06:43:49.816058] I [MSGID: 114047] [client-handshake.c:1233:client_setvolume_cbk] 14-distrep-client-7: Server and Client lk-version numbers are not same, reopening the fds
[2017-01-06 06:43:49.816192] I [MSGID: 108005] [afr-common.c:4654:afr_notify] 14-distrep-replicate-3: Subvolume 'distrep-client-7' came back up; going online.
[2017-01-06 06:43:49.843843] I [MSGID: 114035] [client-handshake.c:201:client_set_lk_version_cbk] 14-distrep-client-7: Server lk version = 1
[2017-01-06 06:43:49.846150] I [MSGID: 114057] [client-handshake.c:1446:select_server_supported_programs] 14-distrep-client-6: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2017-01-06 06:43:49.849027] I [MSGID: 114046] [client-handshake.c:1222:client_setvolume_cbk] 14-distrep-client-6: Connected to distrep-client-6, attached to remote volume '/bricks/brick8/b8'.
[2017-01-06 06:43:49.849052] I [MSGID: 114047] [client-handshake.c:1233:client_setvolume_cbk] 14-distrep-client-6: Server and Client lk-version numbers are not same, reopening the fds
[2017-01-06 06:43:49.850164] I [MSGID: 114035] [client-handshake.c:201:client_set_lk_version_cbk] 14-distrep-client-6: Server lk version = 1
[2017-01-06 07:06:15.409507] W [socket.c:590:__socket_rwv] 14-distrep-client-7: readv on 10.70.47.167:49153 failed (No data available)
[2017-01-06 07:06:15.410199] I [MSGID: 114018] [client.c:2280:client_rpc_notify] 14-distrep-client-7: disconnected from distrep-client-7. Client process will keep trying to connect to glusterd until brick's port is available
[2017-01-06 11:03:09.948565] I [MSGID: 114057] [client-handshake.c:1446:select_server_supported_programs] 14-distrep-client-7: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2017-01-06 11:03:09.949940] I [MSGID: 114046] [client-handshake.c:1222:client_setvolume_cbk] 14-distrep-client-7: Connected to distrep-client-7, attached to remote volume '/bricks/brick8/b8'.
[2017-01-06 11:03:09.949983] I [MSGID: 114047] [client-handshake.c:1233:client_setvolume_cbk] 14-distrep-client-7: Server and Client lk-version numbers are not same, reopening the fds
[2017-01-06 11:03:09.950451] I [MSGID: 114035] [client-handshake.c:201:client_set_lk_version_cbk] 14-distrep-client-7: Server lk version = 1
[root@dhcp46-42 ~]# 


Client-7 was disconnected for about ~4hrs with error "Connection refused". Not sure what is the cause of network disconnect. There seem to be iptable rules as well for the brick ports. Not sure if they blocked any I/Os. 


[root@dhcp46-42 ~]#  grep -i "14-distrep" /var/log/ganesha-gfapi.log-20170109 | grep -v "2017-01-05"  | grep -i "replicate-3"
[2017-01-06 06:43:49.816192] I [MSGID: 108005] [afr-common.c:4654:afr_notify] 14-distrep-replicate-3: Subvolume 'distrep-client-7' came back up; going online.
[root@dhcp46-42 ~]# 

Since there are no errors reported wrt to on-going I/O in the logs, maybe there were some errors wrt ganesha server-client communication itself.

Eitherwise request Prasad to reproduce and provide us the setup while client I/Os going on. Thanks!


However the replicate-3 was up and never seem to have gone down. So I/Os shouldn't have been disrupted. We shall need live setup to further debug using gdb or tcpdump. Though there is rpc network disconection ,it should n't

Comment 15 Atin Mukherjee 2017-01-09 11:51:55 UTC

I am moving this BZ back to NFS-Ganesha as the issue looks to be on the nfs client & ganesha server side as per comment 14.

Comment 19 Kaleb KEITHLEY 2017-08-21 12:45:32 UTC

reopen if seen again