Description of problem:
While performing an in-service upgrade (after stopping the ganesha and glusterd services on the 1st node and putting the pcs cluster in standby mode), I/O errors are observed on both clients while running a linux untar on the NFS mount point. After stopping the I/O on one of the clients, the I/O hung on the other client.

Version-Release number of selected component (if applicable):
# rpm -qa | grep ganesha
nfs-ganesha-2.4.4-18.el7rhgs.x86_64
nfs-ganesha-gluster-2.4.4-18.el7rhgs.x86_64
glusterfs-ganesha-3.8.4-54.6.el7rhgs.x86_64

How reproducible:

Steps to Reproduce:
1. Create a 4 node ganesha cluster.
2. Create a 4 x 3 Distributed-replicate volume. Export the volume via ganesha.
3. Mount the volume on 2 clients via 2 different VIPs with vers=4.0.
4. Create 2 directories on the mount point, say dir1 and dir2. Copy the tar file into both dirs.
5. From client 1, start a linux untar on dir1 and similarly, from client 2, start an untar on dir2.
6. Perform the following steps for the in-service upgrade:
   a) pcs cluster standby
   b) # pcs cluster disable
      # pcs status
   c) systemctl stop nfs-ganesha
   d) # systemctl stop glusterd
      # pkill glusterfs
      # pkill glusterfsd

Actual results:
After some time, the linux untars on both clients failed with I/O errors. Stopped the I/O from the second client; the I/O then hung for the 1st client. The mount point is accessible from both clients.

=================
tar: linux-4.9.5/drivers/gpu/drm/nouveau/nvkm/engine/msvld/gf100.c: Cannot open: No such file or directory
linux-4.9.5/drivers/gpu/drm/nouveau/nvkm/engine/msvld/gk104.c
tar: linux-4.9.5/drivers/gpu/drm/nouveau/nvkm/engine/msvld: Cannot mkdir: Input/output error
tar: linux-4.9.5/drivers/gpu/drm/nouveau/nvkm/engine/msvld/gk104.c: Cannot open: No such file or directory
linux-4.9.5/drivers/gpu/drm/nouveau/nvkm/engine/msvld/gt215.c
tar: linux-4.9.5/drivers/gpu/drm/nouveau/nvkm/engine/msvld: Cannot mkdir: Input/output error
tar: linux-4.9.5/drivers/gpu/drm/nouveau/nvkm/engine/msvld/gt215.c: Cannot open: No such file or directory
linux-4.9.5/drivers/gpu/drm/nouveau/nvkm/engine/msvld/mcp89.c
tar: linux-4.9.5/drivers/gpu/drm/nouveau/nvkm/engine/msvld: Cannot mkdir: Input/output error
tar: linux-4.9.5/drivers/gpu/drm/nouveau/nvkm/engine/msvld/mcp89.c: Cannot open: No such file or directory
linux-4.9.5/drivers/gpu/drm/nouveau/nvkm/engine/msvld/priv.h
tar: linux-4.9.5/drivers/gpu/drm/nouveau/nvkm/engine/msvld: Cannot mkdir: Input/output error
tar: linux-4.9.5/drivers/gpu/drm/nouveau/nvkm/engine/msvld/priv.h: Cannot open: No such file or directory
linux-4.9.5/drivers/gpu/drm/nouveau/nvkm/engine/nvdec/
tar: linux-4.9.5/drivers/gpu/drm/nouveau/nvkm/engine/nvdec: Cannot mkdir: Input/output error
linux-4.9.5/drivers/gpu/drm/nouveau/nvkm/engine/nvdec/Kbuild
tar: linux-4.9.5/drivers/gpu/drm/nouveau/nvkm/engine/nvdec: Cannot mkdir: Input/output error
tar: linux-4.9.5/drivers/gpu/drm/nouveau/nvkm/engine/nvdec/Kbuild: Cannot open: No such file or directory
linux-4.9.5/drivers/gpu/drm/nouveau/nvkm/engine/nvenc/
linux-4.9.5/drivers/gpu/drm/nouveau/nvkm/engine/nvenc/Kbuild
linux-4.9.5/drivers/gpu/drm/nouveau/nvkm/engine/pm/
=========================

ganesha.log of the node on which the failover happened-
============================
6/04/2018 14:49:37 : epoch 42e40000 : dhcp37-182.lab.eng.blr.redhat.com : ganesha.nfsd-14466[work-127] glusterfs_setattr2 :FSAL :CRIT :setattrs failed with error Read-only file system
16/04/2018 14:49:37 : epoch 42e40000 : dhcp37-182.lab.eng.blr.redhat.com : ganesha.nfsd-14466[work-76] glusterfs_setattr2 :FSAL :CRIT :setattrs failed with error Read-only file system
16/04/2018 14:49:37 : epoch 42e40000 : dhcp37-182.lab.eng.blr.redhat.com : ganesha.nfsd-14466[work-67] glusterfs_setattr2 :FSAL :CRIT :setattrs failed with error Read-only file system
16/04/2018 14:50:07 : epoch 42e40000 : dhcp37-182.lab.eng.blr.redhat.com : ganesha.nfsd-14466[work-90] glusterfs_setattr2 :FSAL :CRIT :setattrs failed with error No such file or directory
16/04/2018 14:50:14 : epoch 42e40000 : dhcp37-182.lab.eng.blr.redhat.com : ganesha.nfsd-14466[work-234] glusterfs_setattr2 :FSAL :CRIT :setattrs failed with error Read-only file system
16/04/2018 14:50:14 : epoch 42e40000 : dhcp37-182.lab.eng.blr.redhat.com : ganesha.nfsd-14466[work-196] glusterfs_setattr2 :FSAL :CRIT :setattrs failed with error Read-only file system
16/04/2018 14:50:33 : epoch 42e40000 : dhcp37-182.lab.eng.blr.redhat.com : ganesha.nfsd-14466[work-2] glusterfs_setattr2 :FSAL :CRIT :setattrs failed with error No such file or directory
16/04/2018 14:50:33 : epoch 42e40000 : dhcp37-182.lab.eng.blr.redhat.com : ganesha.nfsd-14466[work-27] glusterfs_setattr2 :FSAL :CRIT :setattrs failed with error Read-only file system
16/04/2018 14:52:15 : epoch 42e40000 : dhcp37-182.lab.eng.blr.redhat.com : ganesha.nfsd-14466[work-131] glusterfs_setattr2 :FSAL :CRIT :setattrs failed with error Read-only file system
16/04/2018 14:52:26 : epoch 42e40000 : dhcp37-182.lab.eng.blr.redhat.com : ganesha.nfsd-14466[work-39] glusterfs_setattr2 :FSAL :CRIT :setattrs failed with error Read-only file system
16/04/2018 14:52:26 : epoch 42e40000 : dhcp37-182.lab.eng.blr.redhat.com : ganesha.nfsd-14466[work-194] glusterfs_setattr2 :FSAL :CRIT :setattrs failed with error Read-only file system
16/04/2018 14:52:26 : epoch 42e40000 : dhcp37-182.lab.eng.blr.redhat.com : ganesha.nfsd-14466[work-61] glusterfs_setattr2 :FSAL :CRIT :setattrs failed with error Read-only file system
16/04/2018 14:52:51 : epoch 42e40000 : dhcp37-182.lab.eng.blr.redhat.com : ganesha.nfsd-14466[work-51] glusterfs_setattr2 :FSAL :CRIT :setattrs failed with error No such file or directory
16/04/2018 14:52:51 : epoch 42e40000 : dhcp37-182.lab.eng.blr.redhat.com : ganesha.nfsd-14466[work-202] glusterfs_setattr2 :FSAL :CRIT :setattrs failed with error Read-only file system
16/04/2018 14:53:07 : epoch 42e40000 : dhcp37-182.lab.eng.blr.redhat.com : ganesha.nfsd-14466[work-189] glusterfs_setattr2 :FSAL :CRIT :setattrs failed with error Read-only file system
16/04/2018 14:56:50 : epoch 42e40000 : dhcp37-182.lab.eng.blr.redhat.com : ganesha.nfsd-14466[work-165] glusterfs_setattr2 :FSAL :CRIT :setattrs failed with error No such file or directory
16/04/2018 14:57:41 : epoch 42e40000 : dhcp37-182.lab.eng.blr.redhat.com : ganesha.nfsd-14466[work-105] glusterfs_setattr2 :FSAL :CRIT :setattrs failed with error No such file or directory
==================================

Ganesha-gfapi.log-
=============
[Input/output error]
[2018-04-16 09:26:40.548608] E [MSGID: 114031] [client-rpc-fops.c:301:client3_3_mkdir_cbk] 0-Ganeshavol1-client-3: remote operation failed. Path: /dir2/linux-4.9.5/drivers/gpu/drm/arc [Input/output error]
[2018-04-16 09:26:40.548990] W [MSGID: 108001] [afr-transaction.c:814:afr_handle_quorum] 0-Ganeshavol1-replicate-1: /dir2/linux-4.9.5/drivers/gpu/drm/arc: Failing MKDIR as quorum is not met [Input/output error]
[2018-04-16 09:26:55.283265] E [MSGID: 114031] [client-rpc-fops.c:301:client3_3_mkdir_cbk] 0-Ganeshavol1-client-3: remote operation failed. Path: /dir1/linux-4.9.5/drivers/crypto/qat/qat_dh895xccvf [Input/output error]
[2018-04-16 09:26:55.283650] W [MSGID: 108001] [afr-transaction.c:814:afr_handle_quorum] 0-Ganeshavol1-replicate-1: /dir1/linux-4.9.5/drivers/crypto/qat/qat_dh895xccvf: Failing MKDIR as quorum is not met [Input/output error]
[2018-04-16 09:26:55.900982] E [MSGID: 114031] [client-rpc-fops.c:301:client3_3_mkdir_cbk] 0-Ganeshavol1-client-3: remote operation failed. Path: /dir1/linux-4.9.5/drivers/crypto/qce [Input/output error]
[2018-04-16 09:26:55.902276] E [MSGID: 114031] [client-rpc-fops.c:301:client3_3_mkdir_cbk] 0-Ganeshavol1-client-4: remote operation failed. Path: /dir1/linux-4.9.5/drivers/crypto/qce [Input/output error]
[2018-04-16 09:27:08.889041] E [MSGID: 114031] [client-rpc-fops.c:301:client3_3_mkdir_cbk] 0-Ganeshavol1-client-3: remote operation failed. Path: /dir1/linux-4.9.5/drivers/dma/ioat [Input/output error]
[2018-04-16 09:27:08.889353] W [MSGID: 108001] [afr-transaction.c:814:afr_handle_quorum] 0-Ganeshavol1-replicate-1: /dir1/linux-4.9.5/drivers/dma/ioat: Failing MKDIR as quorum is not met [Input/output error]
[2018-04-16 09:27:34.435245] E [MSGID: 114031] [client-rpc-fops.c:301:client3_3_mkdir_cbk] 0-Ganeshavol1-client-3: remote operation failed. Path: /dir2/linux-4.9.5/drivers/gpu/drm/msm/hdmi [Input/output error]
[2018-04-16 09:27:34.437509] W [MSGID: 108001] [afr-transaction.c:814:afr_handle_quorum] 0-Ganeshavol1-replicate-1: /dir2/linux-4.9.5/drivers/gpu/drm/msm/hdmi: Failing MKDIR as quorum is not met [Input/output error]
[2018-04-16 09:28:05.106352] E [MSGID: 114031] [client-rpc-fops.c:301:client3_3_mkdir_cbk] 0-Ganeshavol1-client-3: remote operation failed. Path: /dir1/linux-4.9.5/drivers/gpu/drm/amd/include/asic_reg/gca [Input/output error]
===============================

Expected results:
No I/O failure should be observed.

Additional info:
Attaching sosreports shortly.

So it looks like the cluster may not have enough bricks left online to take down one node and still write data? I'm not a Gluster expert, but this line from Ganesha-gfapi.log is suggestive:

Failing MKDIR as quorum is not met [Input/output error]

There's nothing Ganesha can do in these circumstances, as it's getting EROFS from the layer below.
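
To see which bricks and replica sets are involved, the gfapi log can be summarised per client translator and per replicate subvolume. The following is only a rough sketch: the function name `summarize` and both regular expressions are hypothetical helpers written against the line format quoted in this report, not part of Gluster or Ganesha, and other log formats may need different patterns.

```python
import re
from collections import Counter

# Patterns are assumptions derived from the log lines quoted in this report.
CLIENT_RE = re.compile(
    r"0-(?P<vol>\S+)-client-(?P<idx>\d+): remote operation failed"
)
QUORUM_RE = re.compile(
    r"0-(?P<vol>\S+)-replicate-(?P<idx>\d+): .*"
    r"Failing (?P<fop>\w+) as quorum is not met"
)


def summarize(lines):
    """Count failed remote operations per client xlator and
    quorum failures per (replicate subvolume, file operation)."""
    failed_clients = Counter()
    quorum_failures = Counter()
    for line in lines:
        m = CLIENT_RE.search(line)
        if m:
            failed_clients[f"{m['vol']}-client-{m['idx']}"] += 1
        m = QUORUM_RE.search(line)
        if m:
            key = (f"{m['vol']}-replicate-{m['idx']}", m["fop"])
            quorum_failures[key] += 1
    return failed_clients, quorum_failures


if __name__ == "__main__":
    import sys

    # Usage (path is whatever gfapi log you want to inspect):
    #   python summarize_gfapi.py < ganesha-gfapi.log
    clients, quorum = summarize(sys.stdin)
    for name, count in sorted(clients.items()):
        print(f"{name}: {count} failed remote operations")
    for (subvol, fop), count in sorted(quorum.items()):
        print(f"{subvol}: {count} {fop} failures (quorum not met)")
```

On the excerpt above, this would show failures reported for both client-3 and client-4 and quorum failures on replicate-1, which is the kind of pattern worth checking against the brick layout of the volume.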