Bug 1575927

Summary: [Ganesha+EC] rm -rf failed with Input/output Error when ran from 2 clients
Product: [Red Hat Storage] Red Hat Gluster Storage Reporter: Manisha Saini <msaini>
Component: nfs-ganesha Assignee: Kaleb KEITHLEY <kkeithle>
Status: CLOSED INSUFFICIENT_DATA QA Contact: Manisha Saini <msaini>
Severity: low Docs Contact:
Priority: medium    
Version: rhgs-3.4 CC: dang, ffilz, grajoria, jahernan, jijoy, jthottan, pasik, rhs-bugs, sankarshan, storage-qa-internal
Target Milestone: --- Keywords: ZStream
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-04-29 12:00:52 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Manisha Saini 2018-05-08 10:07:01 UTC
Description of problem:

rm -rf * fails with Input/output error when run from 2 clients simultaneously on ganesha v4 mount points


Version-Release number of selected component (if applicable):

# rpm -qa | grep ganesha
nfs-ganesha-gluster-2.5.5-6.el7rhgs.x86_64
nfs-ganesha-debuginfo-2.5.5-6.el7rhgs.x86_64
nfs-ganesha-2.5.5-6.el7rhgs.x86_64
glusterfs-ganesha-3.12.2-8.el7rhgs.x86_64


How reproducible:
2/2

Steps to Reproduce:
1. Create an 8-node ganesha cluster
2. Create a 3 x (4 + 2) Distributed-Disperse volume
3. Export the volume via ganesha
4. Mount the volume on 2 Linux clients (v4.0) and 1 Windows client (v3)
5. Run Linux untars from the 2 Linux clients and the diskfill utility from the Windows client
6. After IO completes, run rm -rf * from the 2 Linux clients on the same mount point

Actual results:
rm -rf * fails with Input/output error

Client 1
----------------

[root@dhcp46-223 ganesha]# rm -rf * 
rm: cannot remove ‘dir1/linux-4.9.5/arch/ia64/uv’: Input/output error
rm: cannot remove ‘dir1/linux-4.9.5/arch/s390/include’: Input/output error
rm: cannot remove ‘dir1/linux-4.9.5/arch/x86/entry/vdso’: Input/output error
rm: cannot remove ‘dir1/linux-4.9.5/arch/alpha/include/uapi/asm’: Input/output error
rm: cannot remove ‘dir1/linux-4.9.5/arch/avr32/boards/atngw100’: Input/output error
rm: cannot remove ‘dir1/linux-4.9.5/arch/microblaze/lib’: Input/output error
rm: cannot remove ‘dir1/linux-4.9.5/arch/parisc’: Input/output error
rm: cannot remove ‘dir2/linux-4.9.5/arch/arm/plat-pxa’: Input/output error
rm: cannot remove ‘dir2/linux-4.9.5/arch/m32r/platforms/opsput’: Input/output error
rm: cannot remove ‘dir2/linux-4.9.5/arch/blackfin/mach-bf537/include’: Input/output error
rm: cannot remove ‘dir2/linux-4.9.5/Documentation/devicetree/bindings/soc/fsl/cpm_qe/qe’: Input/output error
rm: cannot remove ‘dir2/linux-4.9.5/Documentation/devicetree/bindings/arm/mrvl’: Input/output error
rm: cannot remove ‘dir2/linux-4.9.5/Documentation/media/uapi/mediactl’: Is a directory

[root@dhcp46-223 ganesha]# ls
dir1  dir2

-------------------

Client 2
--------------

[root@dhcp47-35 ganesha]# rm -rf * 
rm: cannot remove ‘dir1/linux-4.9.5/arch/arm64/boot/dts/broadcom’: Input/output error
rm: cannot remove ‘dir1/linux-4.9.5/arch/avr32/boards/hammerhead’: Input/output error
rm: cannot remove ‘dir1/linux-4.9.5/arch/blackfin/mach-bf538/include’: Input/output error
rm: cannot remove ‘dir1/linux-4.9.5/arch/c6x/include/uapi/asm’: Input/output error
rm: cannot remove ‘dir1/linux-4.9.5/drivers/accessibility’: Input/output error
rm: cannot remove ‘dir1/linux-4.9.5/drivers/block/zram’: Input/output error
rm: cannot remove ‘dir1/linux-4.9.5/Documentation/arm’: Input/output error
rm: cannot remove ‘dir1/linux-4.9.5/Documentation/parisc’: Input/output error
rm: cannot remove ‘dir1/linux-4.9.5/Documentation/devicetree/bindings/soc’: Input/output error
rm: cannot remove ‘dir1/linux-4.9.5/Documentation/devicetree/bindings/display/bridge’: Directory not empty
rm: cannot remove ‘dir2/linux-4.9.5/arch/arm/mach-picoxcell’: Input/output error
rm: cannot remove ‘dir2/linux-4.9.5/arch/arm/mach-realview’: Input/output error
rm: cannot remove ‘dir2/linux-4.9.5/arch/powerpc/boot/dts/include’: Input/output error
rm: cannot remove ‘dir2/linux-4.9.5/arch/mips/include/asm/mach-rm’: Input/output error
rm: cannot remove ‘dir2/linux-4.9.5/arch/s390/mm’: Input/output error
rm: cannot remove ‘dir2/linux-4.9.5/arch/xtensa/boot/dts’: Input/output error
rm: cannot remove ‘dir2/linux-4.9.5/arch/arc/configs’: Input/output error
--------------


Expected results:

It should not fail with Input/output error


Additional info:


ganesha-gfapi.logs

--------------------
[2018-05-08 08:25:45.872433] E [MSGID: 101046] [dht-common.c:751:dht_discover_complete] 1-Ganeshavol1-dht: dict is null
[2018-05-08 08:25:45.872497] W [MSGID: 104011] [glfs-handleops.c:1316:pub_glfs_h_create_from_handle] 0-meta-autoload: inode refresh of 1a6a6839-4777-48cf-9b80-d89680116af5 failed: No such file or directory [No such file or directory]
[2018-05-08 08:25:45.979344] W [MSGID: 101159] [inode.c:1250:__inode_unlink] 0-inode: 0bb95030-db50-4e3a-9117-e0537c548f8f/Makefile: dentry not found in d4612d18-9622-4262-9a56-39b373105d0e
[2018-05-08 08:25:45.984083] W [MSGID: 101159] [inode.c:1250:__inode_unlink] 0-inode: 0bb95030-db50-4e3a-9117-e0537c548f8f/extable.c: dentry not found in 882d08fe-202f-4c6d-9d60-bf0c0e78054f
[2018-05-08 08:25:45.992208] W [MSGID: 101159] [inode.c:1250:__inode_unlink] 0-inode: 0bb95030-db50-4e3a-9117-e0537c548f8f/highmem_32.c: dentry not found in 8ee194a6-375f-4d09-9045-61f62651628c
[2018-05-08 08:25:46.000128] W [MSGID: 101159] [inode.c:1250:__inode_unlink] 0-inode: 0bb95030-db50-4e3a-9117-e0537c548f8f/init_32.c: dentry not found in 18077669-01e6-471c-aed8-419c6b663ea2
[2018-05-08 08:25:46.005811] W [MSGID: 122033] [ec-common.c:1793:ec_locked] 1-Ganeshavol1-disperse-0: Failed to complete preop lock [Input/output error]
[2018-05-08 08:25:46.008481] W [MSGID: 122033] [ec-common.c:1793:ec_locked] 1-Ganeshavol1-disperse-0: Failed to complete preop lock [Stale file handle]
[2018-05-08 08:25:46.016533] E [MSGID: 101046] [dht-common.c:751:dht_discover_complete] 1-Ganeshavol1-dht: dict is null
[2018-05-08 08:25:46.018178] E [MSGID: 109040] [dht-helper.c:1386:dht_migration_complete_check_task] 1-Ganeshavol1-dht: /dir1/linux-4.9.5/arch/x86/mm/ioremap.c: failed to lookup the file on Ganeshavol1-dht [Stale file handle]
[2018-05-08 08:25:46.125877] W [MSGID: 101159] [inode.c:1250:__inode_unlink] 0-inode: 88cc5c7a-6548-4f7f-a3e8-da252d57ee21/error.c: dentry not found in aced0025-4d77-4952-929a-0e435990ee1a
[2018-05-08 08:25:46.132540] W [MSGID: 101159] [inode.c:1250:__inode_unlink] 0-inode: 88cc5c7a-6548-4f7f-a3e8-da252d57ee21/opcode.c: dentry not found in 18e5a7cb-74b6-46f6-8a2b-5d10862d3efa
[2018-05-08 08:25:46.139924] W [MSGID: 101159] [inode.c:1250:__inode_unlink] 0-inode: 88cc5c7a-6548-4f7f-a3e8-da252d57ee21/pte.c: dentry not found in 358b085f-383d-47ad-a9b5-bff5a8b159f9
[2018-05-08 08:25:46.151029] W [MSGID: 101159] [inode.c:1250:__inode_unlink] 0-inode: 88cc5c7a-6548-4f7f-a3e8-da252d57ee21/error.h: dentry not found in a96815da-efa7-472a-825a-2ac18ae37a2d
[2018-05-08 08:25:46.156234] W [MSGID: 122033] [ec-common.c:1793:ec_locked] 1-Ganeshavol1-disperse-2: Failed to complete preop lock [Stale file handle]
[2018-05-08 08:25:46.159738] W [MSGID: 122033] [ec-common.c:1793:ec_locked] 1-Ganeshavol1-disperse-2: Failed to complete preop lock [Stale file handle]
[2018-05-08 08:25:46.164653] E [MSGID: 101046] [dht-common.c:751:dht_discover_complete] 1-Ganeshavol1-dht: dict is null
[2018-05-08 08:25:46.164764] E [MSGID: 109040] [dht-helper.c:1386:dht_migration_complete_check_task] 1-Ganeshavol1-dht: /dir1/linux-4.9.5/arch/x86/mm/kmemcheck/kmemcheck.c: failed to lookup the file on Ganeshavol1-dht [Stale file handle]
[2018-05-08 08:25:46.191861] W [MSGID: 114031] [client-rpc-fops.c:2860:client3_3_lookup_cbk] 1-Ganeshavol1-client-2: remote operation failed. Path: <gfid:88cc5c7a-6548-4f7f-a3e8-da252d57ee21> (88cc5c7a-6548-4f7f-a3e8-da252d57ee21) [No such file or directory]
[2018-05-08 08:25:46.191944] W [MSGID: 114031] [client-rpc-fops.c:2860:client3_3_lookup_cbk] 1-Ganeshavol1-client-1: remote operation failed. Path: <gfid:88cc5c7a-6548-4f7f-a3e8-da252d57ee21> (88cc5c7a-6548-4f7f-a3e8-da252d57ee21) [No such file or directory]
[2018-05-08 08:25:46.192058] W [MSGID: 114031] [client-rpc-fops.c:2860:client3_3_lookup_cbk] 1-Ganeshavol1-client-3: remote operation failed. Path: <gfid:88cc5c7a-6548-4f7f-a3e8-da252d

------------------------------



Attaching full logs shortly

Comment 3 Kaleb KEITHLEY 2018-05-08 12:19:59 UTC
Needs RCA before we can decide whether to take this into 3.4.0

Comment 4 Daniel Gryniewicz 2018-05-08 13:15:15 UTC
So, I don't believe this is a bug.  This is a consequence of the way POSIX APIs work.  Here's what happens:

Each client does a readdir.  They get back dirents and start deleting them.  However, because multiple clients are doing this, there's a chance that some of the objects represented by the dirents have already been deleted by the other client.  There's no way to know this other than attempting to unlink(), which will (of course) fail.  It's arguable whether EIO or ENOENT is the correct error (the manpage is somewhat unclear on this, IMO), but EIO is a valid return from unlink(), so this is not something that should hold up a release.

I was able to get errors on my local filesystem with this scenario.  It's much, much harder to hit on my SSD than on a remote FS, but it does happen.
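
The readdir-then-unlink race described in this comment can be sketched in Python. This is only an illustration under stated assumptions, not what rm, Ganesha or Gluster actually do: remove_tree_tolerant is a hypothetical helper that lists a directory, tries to remove each entry, and counts an entry as "vanished" when the removal reports ENOENT because another process got there first. It covers only the ENOENT case; an EIO like the one in this report, or ENOTEMPTY from rmdir, would still propagate to the caller.

```python
import os


def remove_tree_tolerant(path):
    """Recursively remove path, tolerating entries removed by someone else.

    Hypothetical helper for illustration. Returns how many entries had
    already vanished (ENOENT) by the time this process tried to remove them.
    """
    vanished = 0
    try:
        # The "readdir" step: a snapshot that may be stale immediately.
        entries = list(os.scandir(path))
    except FileNotFoundError:
        return 1  # the whole directory was removed before we listed it

    for entry in entries:
        if entry.is_dir(follow_symlinks=False):
            vanished += remove_tree_tolerant(entry.path)
        else:
            try:
                os.unlink(entry.path)
            except FileNotFoundError:
                # The other client unlinked it between our listing and now.
                vanished += 1

    try:
        os.rmdir(path)
    except FileNotFoundError:
        vanished += 1
    return vanished
```

Run from two processes against the same tree, each call would be expected to return a nonzero count on some runs, which is the same window in which rm -rf reports its errors.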

Comment 5 Frank Filz 2018-05-08 17:25:40 UTC
It looks like the I/O error may have originated here:

[2018-05-08 08:25:46.005811] W [MSGID: 122033] [ec-common.c:1793:ec_locked] 1-Ganeshavol1-disperse-0: Failed to complete preop lock [Input/output error]

I also see stale file handle errors, which would be expected in this scenario.
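
To see which translator reported which error without reading the log by eye, the entries can be tallied with a short script. This is a rough sketch: the line layout in LINE_RE is inferred from the excerpt quoted in this report rather than from any format specification, and tally_errors is a hypothetical helper name.

```python
import re
from collections import Counter

# Layout inferred from the excerpt in this report (an assumption, not a spec):
# [timestamp] LEVEL [MSGID: n] [file:line:function] xlator: message [error string]
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z])\s+"
    r"\[MSGID:\s*(?P<msgid>\d+)\]\s+"
    r"\[(?P<src>[^\]]+)\]\s+"
    r"(?P<xlator>\S+):\s+(?P<msg>.*?)"
    r"(?:\s+\[(?P<err>[^\]]+)\])?$"
)


def tally_errors(lines):
    """Count (translator, trailing error string) pairs in log lines.

    Lines that do not match the assumed layout, or that carry no
    trailing [error string], are skipped.
    """
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and match.group("err"):
            counts[(match.group("xlator"), match.group("err"))] += 1
    return counts


# Two lines copied from the ganesha-gfapi log excerpt in the description.
SAMPLE = [
    "[2018-05-08 08:25:46.005811] W [MSGID: 122033] [ec-common.c:1793:ec_locked] "
    "1-Ganeshavol1-disperse-0: Failed to complete preop lock [Input/output error]",
    "[2018-05-08 08:25:46.008481] W [MSGID: 122033] [ec-common.c:1793:ec_locked] "
    "1-Ganeshavol1-disperse-0: Failed to complete preop lock [Stale file handle]",
]

for (xlator, err), n in tally_errors(SAMPLE).items():
    print(n, xlator, err)
```

Fed the full log, a tally like this would show whether Input/output error comes only from the disperse translators or from other layers as well.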

I definitely agree that two rm -Rf racing with each other are going to trip over each other in unpredictable ways.

While this is an interesting test, and one on which we should not crash or do anything else horrible, it also seems unrealistic.

Frank