Description of problem:

Error reading some files. I'm trying to export a VM from a Gluster volume because oVirt paused the VM after a storage error, but the export is not possible due to "Stale file handle" errors.

I mounted the volume on another server:

s23gfs.ovirt:VOL_VMDATA on /mnt/VOL_VMDATA type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,allow_other,max_read=131072)

Trying to read the file with cp, rsync or qemu-img convert has the same result:

# qemu-img convert -p -t none -T none -f qcow2 /mnt/VOL_VMDATA/d4f82517-5ce0-4705-a89f-5d3c81adf764/images/dbb038ee-2794-40e8-877a-a4806c47f11f/f81e0be9-db3e-48ac-876f-57b6f7cb3fe8 -O raw PLONE_active-raw.img
qemu-img: error while reading sector 2448441344: Stale file handle

Version-Release number of selected component (if applicable):
Gluster 3.12.15-1.el7

In the mount log file I get many errors like:

[2018-11-20 03:20:24.471344] E [MSGID: 133010] [shard.c:1724:shard_common_lookup_shards_cbk] 0-VOL_VMDATA-shard: Lookup on shard 3558 failed. Base file gfid = 4feb4a7e-e1a3-4fa3-8d38-3b929bf52d14 [Stale file handle]
[2018-11-20 08:56:21.110258] E [MSGID: 133010] [shard.c:1724:shard_common_lookup_shards_cbk] 0-VOL_VMDATA-shard: Lookup on shard 541 failed. Base file gfid = 2c1b6402-87b0-45cd-bd81-2cd3f38dd530 [Stale file handle]

Is there a way to fix this? It's a 2 x 3 distributed-replicate volume with sharding.

Thanks,
Marco

Additional info:

# gluster volume info VOL_VMDATA

Volume Name: VOL_VMDATA
Type: Distributed-Replicate
Volume ID: 7bd4e050-47dd-481e-8862-cd6b76badddc
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 3 = 6
Transport-type: tcp
Bricks:
Brick1: s20gfs.ovirt.prisma:/gluster/VOL_VMDATA/brick
Brick2: s21gfs.ovirt.prisma:/gluster/VOL_VMDATA/brick
Brick3: s22gfs.ovirt.prisma:/gluster/VOL_VMDATA/brick
Brick4: s23gfs.ovirt.prisma:/gluster/VOL_VMDATA/brick
Brick5: s24gfs.ovirt.prisma:/gluster/VOL_VMDATA/brick
Brick6: s25gfs.ovirt.prisma:/gluster/VOL_VMDATA/brick
Options Reconfigured:
auth.allow: 192.168.50.*,172.16.4.*,192.168.56.203
performance.readdir-ahead: on
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
storage.owner-uid: 36
storage.owner-gid: 36
features.shard: enable
features.shard-block-size: 512MB
cluster.data-self-heal-algorithm: full
nfs.disable: on
transport.address-family: inet

# gluster volume heal VOL_VMDATA info
Brick s20gfs.ovirt.prisma:/gluster/VOL_VMDATA/brick
Status: Connected
Number of entries: 0

Brick s21gfs.ovirt.prisma:/gluster/VOL_VMDATA/brick
Status: Connected
Number of entries: 0

Brick s22gfs.ovirt.prisma:/gluster/VOL_VMDATA/brick
Status: Connected
Number of entries: 0

Brick s23gfs.ovirt.prisma:/gluster/VOL_VMDATA/brick
Status: Connected
Number of entries: 0

Brick s24gfs.ovirt.prisma:/gluster/VOL_VMDATA/brick
Status: Connected
Number of entries: 0

Brick s25gfs.ovirt.prisma:/gluster/VOL_VMDATA/brick
Status: Connected
Number of entries: 0

# gluster volume status VOL_VMDATA
Status of volume: VOL_VMDATA
Gluster process                                       TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick s20gfs.ovirt.prisma:/gluster/VOL_VMDATA/brick   49153     0          Y       3186
Brick s21gfs.ovirt.prisma:/gluster/VOL_VMDATA/brick   49153     0          Y       5148
Brick s22gfs.ovirt.prisma:/gluster/VOL_VMDATA/brick   49153     0          Y       3792
Brick s23gfs.ovirt.prisma:/gluster/VOL_VMDATA/brick   49153     0          Y       3257
Brick s24gfs.ovirt.prisma:/gluster/VOL_VMDATA/brick   49153     0          Y       4402
Brick s25gfs.ovirt.prisma:/gluster/VOL_VMDATA/brick   49153     0          Y       3231
Self-heal Daemon on localhost                         N/A       N/A        Y       4192
Self-heal Daemon on s25gfs.ovirt.prisma               N/A       N/A        Y       63185
Self-heal Daemon on s24gfs.ovirt.prisma               N/A       N/A        Y       39535
Self-heal Daemon on s20gfs.ovirt.prisma               N/A       N/A        Y       2785
Self-heal Daemon on s23gfs.ovirt.prisma               N/A       N/A        Y       765
Self-heal Daemon on s22.ovirt.prisma                  N/A       N/A        Y       5828

Task Status of Volume VOL_VMDATA
------------------------------------------------------------------------------
There are no active volume tasks
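For context on what those shard numbers mean: with features.shard enabled, everything past the first shard-block-size bytes of a file is stored as pieces named <base-file-gfid>.<index> under the hidden .shard directory at the root of each brick, and shard N covers bytes [N*block-size, (N+1)*block-size) of the file. A minimal sketch for locating a shard on a brick (brick path and base gfid taken from the output above; note the sector qemu-img reports is a guest-disk offset, so it does not necessarily map 1:1 to an offset in the qcow2 file or to a particular shard index):

# Shard index for a given byte offset into the image file, assuming the
# 512MB (536870912-byte) shard-block-size configured on this volume:
offset=1253601968128                 # illustrative value only
echo $(( offset / 536870912 ))       # prints the shard index

# Inspect one of the failing shards directly on a brick (gfid from the log):
ls -l /gluster/VOL_VMDATA/brick/.shard/4feb4a7e-e1a3-4fa3-8d38-3b929bf52d14.3558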
Can you please attach the gluster logs from /var/log/glusterfs from the host on which the VM is running? Please also attach the vdsm log /var/log/vdsm/vdsm.log.

Is this a hyperconverged setup?
Created attachment 1508529 [details] vdsm.log
Created attachment 1508530 [details] glusterfs vms data volume mount log
Hi Sahina, no, it's not a hyperconverged setup. The Engine runs on a separate KVM server.
Hi,

I tried running dd from the VM:

dd if=/dev/vda of=/dev/null bs=1M status=progress
13519000000 (14GB) copied, 14.056719s, 962 MB/s

The VM paused. On the host, /var/log/glusterfs/rhev-data-center-mnt-glusterSD-s20gfs.ovirt.prisma\:_VOL__VMDATA.log shows:

[2018-11-26 12:26:04.176267] I [MSGID: 109069] [dht-common.c:1474:dht_lookup_unlink_stale_linkto_cbk] 0-VOL_VMDATA-dht: Returned with op_ret 0 and op_errno 0 for /.shard/62e1c5d8-8533-4e6b-826e-030680043011.27
[2018-11-26 12:26:04.177778] I [MSGID: 109069] [dht-common.c:1474:dht_lookup_unlink_stale_linkto_cbk] 0-VOL_VMDATA-dht: Returned with op_ret -1 and op_errno 2 for /.shard/62e1c5d8-8533-4e6b-826e-030680043011.27
[2018-11-26 12:26:04.207594] W [MSGID: 109009] [dht-common.c:2210:dht_lookup_linkfile_cbk] 0-VOL_VMDATA-dht: /.shard/62e1c5d8-8533-4e6b-826e-030680043011.27: gfid different on data file on VOL_VMDATA-replicate-0, gfid local = 00000000-0000-0000-0000-000000000000, gfid node = 7b295e4a-7a48-48ab-94ad-14fecb3c96db
[2018-11-26 12:26:04.208249] W [MSGID: 109009] [dht-common.c:1949:dht_lookup_everywhere_cbk] 0-VOL_VMDATA-dht: /.shard/62e1c5d8-8533-4e6b-826e-030680043011.27: gfid differs on subvolume VOL_VMDATA-replicate-0, gfid local = 421b2564-332f-4761-a85c-b1b86f9f23c7, gfid node = 7b295e4a-7a48-48ab-94ad-14fecb3c96db
[2018-11-26 12:26:04.208668] W [MSGID: 109009] [dht-common.c:1949:dht_lookup_everywhere_cbk] 0-VOL_VMDATA-dht: /.shard/62e1c5d8-8533-4e6b-826e-030680043011.27: gfid differs on subvolume VOL_VMDATA-replicate-1, gfid local = 7b295e4a-7a48-48ab-94ad-14fecb3c96db, gfid node = 421b2564-332f-4761-a85c-b1b86f9f23c7
[2018-11-26 12:26:04.208700] E [MSGID: 133010] [shard.c:1724:shard_common_lookup_shards_cbk] 0-VOL_VMDATA-shard: Lookup on shard 27 failed. Base file gfid = 62e1c5d8-8533-4e6b-826e-030680043011 [Stale file handle]
[2018-11-26 12:26:04.208722] W [fuse-bridge.c:2318:fuse_readv_cbk] 0-glusterfs-fuse: 237819774: READ => -1 gfid=62e1c5d8-8533-4e6b-826e-030680043011 fd=0x7f5d8c14e680 (Stale file handle)
[2018-11-26 12:26:04.209947] W [MSGID: 109009] [dht-common.c:1949:dht_lookup_everywhere_cbk] 0-VOL_VMDATA-dht: /.shard/62e1c5d8-8533-4e6b-826e-030680043011.27: gfid differs on subvolume VOL_VMDATA-replicate-0, gfid local = 421b2564-332f-4761-a85c-b1b86f9f23c7, gfid node = 7b295e4a-7a48-48ab-94ad-14fecb3c96db
[2018-11-26 12:26:04.210225] W [MSGID: 109009] [dht-common.c:1949:dht_lookup_everywhere_cbk] 0-VOL_VMDATA-dht: /.shard/62e1c5d8-8533-4e6b-826e-030680043011.27: gfid differs on subvolume VOL_VMDATA-replicate-1, gfid local = 7b295e4a-7a48-48ab-94ad-14fecb3c96db, gfid node = 421b2564-332f-4761-a85c-b1b86f9f23c7
[2018-11-26 12:26:04.210265] W [fuse-bridge.c:2318:fuse_readv_cbk] 0-glusterfs-fuse: 237819772: READ => -1 gfid=62e1c5d8-8533-4e6b-826e-030680043011 fd=0x7f5d8c14e680 (Stale file handle)
The message "W [MSGID: 109009] [dht-common.c:2210:dht_lookup_linkfile_cbk] 0-VOL_VMDATA-dht: /.shard/62e1c5d8-8533-4e6b-826e-030680043011.27: gfid different on data file on VOL_VMDATA-replicate-0, gfid local = 00000000-0000-0000-0000-000000000000, gfid node = 7b295e4a-7a48-48ab-94ad-14fecb3c96db " repeated 2 times between [2018-11-26 12:26:04.207594] and [2018-11-26 12:26:04.213769]
[2018-11-26 12:26:04.214709] W [MSGID: 109009] [dht-common.c:1949:dht_lookup_everywhere_cbk] 0-VOL_VMDATA-dht: /.shard/62e1c5d8-8533-4e6b-826e-030680043011.27: gfid differs on subvolume VOL_VMDATA-replicate-0, gfid local = 421b2564-332f-4761-a85c-b1b86f9f23c7, gfid node = 7b295e4a-7a48-48ab-94ad-14fecb3c96db
[2018-11-26 12:26:04.214767] W [fuse-bridge.c:2318:fuse_readv_cbk] 0-glusterfs-fuse: 237819773: READ => -1 gfid=62e1c5d8-8533-4e6b-826e-030680043011 fd=0x7f5d8c14e680 (Stale file handle)
The message "E [MSGID: 133010] [shard.c:1724:shard_common_lookup_shards_cbk] 0-VOL_VMDATA-shard: Lookup on shard 27 failed. Base file gfid = 62e1c5d8-8533-4e6b-826e-030680043011 [Stale file handle]" repeated 2 times between [2018-11-26 12:26:04.208700] and [2018-11-26 12:26:04.214761]
[2018-11-26 12:26:04.214801] W [MSGID: 109009] [dht-common.c:2210:dht_lookup_linkfile_cbk] 0-VOL_VMDATA-dht: /.shard/62e1c5d8-8533-4e6b-826e-030680043011.27: gfid different on data file on VOL_VMDATA-replicate-0, gfid local = 00000000-0000-0000-0000-000000000000, gfid node = 7b295e4a-7a48-48ab-94ad-14fecb3c96db
[2018-11-26 12:26:04.215431] W [MSGID: 109009] [dht-common.c:1949:dht_lookup_everywhere_cbk] 0-VOL_VMDATA-dht: /.shard/62e1c5d8-8533-4e6b-826e-030680043011.27: gfid differs on subvolume VOL_VMDATA-replicate-0, gfid local = 421b2564-332f-4761-a85c-b1b86f9f23c7, gfid node = 7b295e4a-7a48-48ab-94ad-14fecb3c96db
[2018-11-26 12:26:04.215436] E [MSGID: 133010] [shard.c:1724:shard_common_lookup_shards_cbk] 0-VOL_VMDATA-shard: Lookup on shard 27 failed. Base file gfid = 62e1c5d8-8533-4e6b-826e-030680043011 [Stale file handle]
[2018-11-26 12:26:04.215483] W [fuse-bridge.c:2318:fuse_readv_cbk] 0-glusterfs-fuse: 237819771: READ => -1 gfid=62e1c5d8-8533-4e6b-826e-030680043011 fd=0x7f5d8c14e680 (Stale file handle)
[2018-11-26 12:26:04.215482] W [MSGID: 109009] [dht-common.c:2210:dht_lookup_linkfile_cbk] 0-VOL_VMDATA-dht: /.shard/62e1c5d8-8533-4e6b-826e-030680043011.27: gfid different on data file on VOL_VMDATA-replicate-0, gfid local = 00000000-0000-0000-0000-000000000000, gfid node = 7b295e4a-7a48-48ab-94ad-14fecb3c96db
[2018-11-26 12:26:04.215957] W [MSGID: 109009] [dht-common.c:1949:dht_lookup_everywhere_cbk] 0-VOL_VMDATA-dht: /.shard/62e1c5d8-8533-4e6b-826e-030680043011.27: gfid differs on subvolume VOL_VMDATA-replicate-0, gfid local = 421b2564-332f-4761-a85c-b1b86f9f23c7, gfid node = 7b295e4a-7a48-48ab-94ad-14fecb3c96db
[2018-11-26 12:26:04.216003] W [fuse-bridge.c:2318:fuse_readv_cbk] 0-glusterfs-fuse: 237819751: READ => -1 gfid=62e1c5d8-8533-4e6b-826e-030680043011 fd=0x7f5d8c14e680 (Stale file handle)
[2018-11-26 12:26:04.216307] W [fuse-bridge.c:2318:fuse_readv_cbk] 0-glusterfs-fuse: 237819750: READ => -1 gfid=62e1c5d8-8533-4e6b-826e-030680043011 fd=0x7f5d8c14e680 (Stale file handle)
The message "E [MSGID: 133010] [shard.c:1724:shard_common_lookup_shards_cbk] 0-VOL_VMDATA-shard: Lookup on shard 27 failed. Base file gfid = 62e1c5d8-8533-4e6b-826e-030680043011 [Stale file handle]" repeated 2 times between [2018-11-26 12:26:04.215436] and [2018-11-26 12:26:04.216301]
[2018-11-26 12:26:04.218010] W [MSGID: 109009] [dht-common.c:2210:dht_lookup_linkfile_cbk] 0-VOL_VMDATA-dht: /.shard/62e1c5d8-8533-4e6b-826e-030680043011.27: gfid different on data file on VOL_VMDATA-replicate-0, gfid local = 00000000-0000-0000-0000-000000000000, gfid node = 7b295e4a-7a48-48ab-94ad-14fecb3c96db
[2018-11-26 12:26:04.218610] W [MSGID: 109009] [dht-common.c:1949:dht_lookup_everywhere_cbk] 0-VOL_VMDATA-dht: /.shard/62e1c5d8-8533-4e6b-826e-030680043011.27: gfid differs on subvolume VOL_VMDATA-replicate-0, gfid local = 421b2564-332f-4761-a85c-b1b86f9f23c7, gfid node = 7b295e4a-7a48-48ab-94ad-14fecb3c96db
[2018-11-26 12:26:04.218616] E [MSGID: 133010] [shard.c:1724:shard_common_lookup_shards_cbk] 0-VOL_VMDATA-shard: Lookup on shard 27 failed. Base file gfid = 62e1c5d8-8533-4e6b-826e-030680043011 [Stale file handle]
[2018-11-26 12:26:04.218651] W [fuse-bridge.c:2318:fuse_readv_cbk] 0-glusterfs-fuse: 237819753: READ => -1 gfid=62e1c5d8-8533-4e6b-826e-030680043011 fd=0x7f5d8c14e680 (Stale file handle)
[2018-11-26 12:26:04.219400] W [MSGID: 109009] [dht-common.c:2210:dht_lookup_linkfile_cbk] 0-VOL_VMDATA-dht: /.shard/62e1c5d8-8533-4e6b-826e-030680043011.27: gfid different on data file on VOL_VMDATA-replicate-0, gfid local = 00000000-0000-0000-0000-000000000000, gfid node = 7b295e4a-7a48-48ab-94ad-14fecb3c96db
[2018-11-26 12:26:04.219923] W [MSGID: 109009] [dht-common.c:1949:dht_lookup_everywhere_cbk] 0-VOL_VMDATA-dht: /.shard/62e1c5d8-8533-4e6b-826e-030680043011.27: gfid differs on subvolume VOL_VMDATA-replicate-0, gfid local = 421b2564-332f-4761-a85c-b1b86f9f23c7, gfid node = 7b295e4a-7a48-48ab-94ad-14fecb3c96db
[2018-11-26 12:26:04.219999] W [fuse-bridge.c:2318:fuse_readv_cbk] 0-glusterfs-fuse: 237819756: READ => -1 gfid=62e1c5d8-8533-4e6b-826e-030680043011 fd=0x7f5d8c14e680 (Stale file handle)
[2018-11-26 12:26:04.219990] E [MSGID: 133010] [shard.c:1724:shard_common_lookup_shards_cbk] 0-VOL_VMDATA-shard: Lookup on shard 27 failed. Base file gfid = 62e1c5d8-8533-4e6b-826e-030680043011 [Stale file handle]
[2018-11-26 12:26:04.220589] W [MSGID: 109009] [dht-common.c:2210:dht_lookup_linkfile_cbk] 0-VOL_VMDATA-dht: /.shard/62e1c5d8-8533-4e6b-826e-030680043011.27: gfid different on data file on VOL_VMDATA-replicate-0, gfid local = 00000000-0000-0000-0000-000000000000, gfid node = 7b295e4a-7a48-48ab-94ad-14fecb3c96db
[2018-11-26 12:26:04.221134] W [MSGID: 109009] [dht-common.c:1949:dht_lookup_everywhere_cbk] 0-VOL_VMDATA-dht: /.shard/62e1c5d8-8533-4e6b-826e-030680043011.27: gfid differs on subvolume VOL_VMDATA-replicate-0, gfid local = 421b2564-332f-4761-a85c-b1b86f9f23c7, gfid node = 7b295e4a-7a48-48ab-94ad-14fecb3c96db
[2018-11-26 12:26:04.221140] E [MSGID: 133010] [shard.c:1724:shard_common_lookup_shards_cbk] 0-VOL_VMDATA-shard: Lookup on shard 27 failed. Base file gfid = 62e1c5d8-8533-4e6b-826e-030680043011 [Stale file handle]
[2018-11-26 12:26:04.221174] W [fuse-bridge.c:2318:fuse_readv_cbk] 0-glusterfs-fuse: 237819742: READ => -1 gfid=62e1c5d8-8533-4e6b-826e-030680043011 fd=0x7f5d8c14e680 (Stale file handle)
[2018-11-26 12:26:04.221921] W [MSGID: 109009] [dht-common.c:2210:dht_lookup_linkfile_cbk] 0-VOL_VMDATA-dht: /.shard/62e1c5d8-8533-4e6b-826e-030680043011.27: gfid different on data file on VOL_VMDATA-replicate-0, gfid local = 00000000-0000-0000-0000-000000000000, gfid node = 7b295e4a-7a48-48ab-94ad-14fecb3c96db
[2018-11-26 12:26:04.222407] W [MSGID: 109009] [dht-common.c:1949:dht_lookup_everywhere_cbk] 0-VOL_VMDATA-dht: /.shard/62e1c5d8-8533-4e6b-826e-030680043011.27: gfid differs on subvolume VOL_VMDATA-replicate-0, gfid local = 421b2564-332f-4761-a85c-b1b86f9f23c7, gfid node = 7b295e4a-7a48-48ab-94ad-14fecb3c96db
[2018-11-26 12:26:04.222458] W [fuse-bridge.c:2318:fuse_readv_cbk] 0-glusterfs-fuse: 237819754: READ => -1 gfid=62e1c5d8-8533-4e6b-826e-030680043011 fd=0x7f5d8c14e680 (Stale file handle)
[2018-11-26 12:26:04.222451] E [MSGID: 133010] [shard.c:1724:shard_common_lookup_shards_cbk] 0-VOL_VMDATA-shard: Lookup on shard 27 failed. Base file gfid = 62e1c5d8-8533-4e6b-826e-030680043011 [Stale file handle]
[2018-11-26 12:26:04.223214] W [MSGID: 109009] [dht-common.c:2210:dht_lookup_linkfile_cbk] 0-VOL_VMDATA-dht: /.shard/62e1c5d8-8533-4e6b-826e-030680043011.27: gfid different on data file on VOL_VMDATA-replicate-0, gfid local = 00000000-0000-0000-0000-000000000000, gfid node = 7b295e4a-7a48-48ab-94ad-14fecb3c96db
[2018-11-26 12:26:04.223747] W [MSGID: 109009] [dht-common.c:1949:dht_lookup_everywhere_cbk] 0-VOL_VMDATA-dht: /.shard/62e1c5d8-8533-4e6b-826e-030680043011.27: gfid differs on subvolume VOL_VMDATA-replicate-0, gfid local = 421b2564-332f-4761-a85c-b1b86f9f23c7, gfid node = 7b295e4a-7a48-48ab-94ad-14fecb3c96db
[2018-11-26 12:26:04.223753] E [MSGID: 133010] [shard.c:1724:shard_common_lookup_shards_cbk] 0-VOL_VMDATA-shard: Lookup on shard 27 failed. Base file gfid = 62e1c5d8-8533-4e6b-826e-030680043011 [Stale file handle]
[2018-11-26 12:26:04.223789] W [fuse-bridge.c:2318:fuse_readv_cbk] 0-glusterfs-fuse: 237819755: READ => -1 gfid=62e1c5d8-8533-4e6b-826e-030680043011 fd=0x7f5d8c14e680 (Stale file handle)
[2018-11-26 12:26:04.224431] W [MSGID: 109009] [dht-common.c:2210:dht_lookup_linkfile_cbk] 0-VOL_VMDATA-dht: /.shard/62e1c5d8-8533-4e6b-826e-030680043011.27: gfid different on data file on VOL_VMDATA-replicate-0, gfid local = 00000000-0000-0000-0000-000000000000, gfid node = 7b295e4a-7a48-48ab-94ad-14fecb3c96db
[2018-11-26 12:26:04.224905] W [MSGID: 109009] [dht-common.c:1949:dht_lookup_everywhere_cbk] 0-VOL_VMDATA-dht: /.shard/62e1c5d8-8533-4e6b-826e-030680043011.27: gfid differs on subvolume VOL_VMDATA-replicate-0, gfid local = 421b2564-332f-4761-a85c-b1b86f9f23c7, gfid node = 7b295e4a-7a48-48ab-94ad-14fecb3c96db
[2018-11-26 12:26:04.224954] W [fuse-bridge.c:2318:fuse_readv_cbk] 0-glusterfs-fuse: 237819757: READ => -1 gfid=62e1c5d8-8533-4e6b-826e-030680043011 fd=0x7f5d8c14e680 (Stale file handle)
[2018-11-26 12:26:04.224948] E [MSGID: 133010] [shard.c:1724:shard_common_lookup_shards_cbk] 0-VOL_VMDATA-shard: Lookup on shard 27 failed. Base file gfid = 62e1c5d8-8533-4e6b-826e-030680043011 [Stale file handle]
[2018-11-26 12:26:04.225654] W [MSGID: 109009] [dht-common.c:2210:dht_lookup_linkfile_cbk] 0-VOL_VMDATA-dht: /.shard/62e1c5d8-8533-4e6b-826e-030680043011.27: gfid different on data file on VOL_VMDATA-replicate-0, gfid local = 00000000-0000-0000-0000-000000000000, gfid node = 7b295e4a-7a48-48ab-94ad-14fecb3c96db
The message "W [MSGID: 109009] [dht-common.c:1949:dht_lookup_everywhere_cbk] 0-VOL_VMDATA-dht: /.shard/62e1c5d8-8533-4e6b-826e-030680043011.27: gfid differs on subvolume VOL_VMDATA-replicate-1, gfid local = 7b295e4a-7a48-48ab-94ad-14fecb3c96db, gfid node = 421b2564-332f-4761-a85c-b1b86f9f23c7" repeated 2 times between [2018-11-26 12:26:04.210225] and [2018-11-26 12:26:04.226192]
[2018-11-26 12:26:04.226202] E [MSGID: 133010] [shard.c:1724:shard_common_lookup_shards_cbk] 0-VOL_VMDATA-shard: Lookup on shard 27 failed. Base file gfid = 62e1c5d8-8533-4e6b-826e-030680043011 [Stale file handle]
[2018-11-26 12:26:04.226283] W [fuse-bridge.c:2318:fuse_readv_cbk] 0-glusterfs-fuse: 237819758: READ => -1 gfid=62e1c5d8-8533-4e6b-826e-030680043011 fd=0x7f5d8c14e680 (Stale file handle)
[2018-11-26 12:26:04.227461] W [fuse-bridge.c:2318:fuse_readv_cbk] 0-glusterfs-fuse: 237819759: READ => -1 gfid=62e1c5d8-8533-4e6b-826e-030680043011 fd=0x7f5d8c14e680 (Stale file handle)
[2018-11-26 12:26:04.228805] W [fuse-bridge.c:2318:fuse_readv_cbk] 0-glusterfs-fuse: 237819760: READ => -1 gfid=62e1c5d8-8533-4e6b-826e-030680043011 fd=0x7f5d8c14e680 (Stale file handle)
[2018-11-26 12:26:04.229964] W [fuse-bridge.c:2318:fuse_readv_cbk] 0-glusterfs-fuse: 237819761: READ => -1 gfid=62e1c5d8-8533-4e6b-826e-030680043011 fd=0x7f5d8c14e680 (Stale file handle)
[2018-11-26 12:26:04.231246] W [fuse-bridge.c:2318:fuse_readv_cbk] 0-glusterfs-fuse: 237819762: READ => -1 gfid=62e1c5d8-8533-4e6b-826e-030680043011 fd=0x7f5d8c14e680 (Stale file handle)
[2018-11-26 12:26:04.232471] W [fuse-bridge.c:2318:fuse_readv_cbk] 0-glusterfs-fuse: 237819763: READ => -1 gfid=62e1c5d8-8533-4e6b-826e-030680043011 fd=0x7f5d8c14e680 (Stale file handle)
[2018-11-26 12:26:04.233704] W [fuse-bridge.c:2318:fuse_readv_cbk] 0-glusterfs-fuse: 237819766: READ => -1 gfid=62e1c5d8-8533-4e6b-826e-030680043011 fd=0x7f5d8c14e680 (Stale file handle)
[2018-11-26 12:26:04.234984] W [fuse-bridge.c:2318:fuse_readv_cbk] 0-glusterfs-fuse: 237819767: READ => -1 gfid=62e1c5d8-8533-4e6b-826e-030680043011 fd=0x7f5d8c14e680 (Stale file handle)
[2018-11-26 12:26:04.236171] W [fuse-bridge.c:2318:fuse_readv_cbk] 0-glusterfs-fuse: 237819768: READ => -1 gfid=62e1c5d8-8533-4e6b-826e-030680043011 fd=0x7f5d8c14e680 (Stale file handle)
The message "W [MSGID: 109009] [dht-common.c:1949:dht_lookup_everywhere_cbk] 0-VOL_VMDATA-dht: /.shard/62e1c5d8-8533-4e6b-826e-030680043011.27: gfid differs on subvolume VOL_VMDATA-replicate-0, gfid local = 421b2564-332f-4761-a85c-b1b86f9f23c7, gfid node = 7b295e4a-7a48-48ab-94ad-14fecb3c96db" repeated 10 times between [2018-11-26 12:26:04.224905] and [2018-11-26 12:26:04.237655]
[2018-11-26 12:26:04.237795] W [MSGID: 109009] [dht-common.c:1949:dht_lookup_everywhere_cbk] 0-VOL_VMDATA-dht: /.shard/62e1c5d8-8533-4e6b-826e-030680043011.27: gfid differs on subvolume VOL_VMDATA-replicate-1, gfid local = 7b295e4a-7a48-48ab-94ad-14fecb3c96db, gfid node = 421b2564-332f-4761-a85c-b1b86f9f23c7
[2018-11-26 12:26:04.237846] W [fuse-bridge.c:2318:fuse_readv_cbk] 0-glusterfs-fuse: 237819765: READ => -1 gfid=62e1c5d8-8533-4e6b-826e-030680043011 fd=0x7f5d8c14e680 (Stale file handle)
The message "W [MSGID: 109009] [dht-common.c:2210:dht_lookup_linkfile_cbk] 0-VOL_VMDATA-dht: /.shard/62e1c5d8-8533-4e6b-826e-030680043011.27: gfid different on data file on VOL_VMDATA-replicate-0, gfid local = 00000000-0000-0000-0000-000000000000, gfid node = 7b295e4a-7a48-48ab-94ad-14fecb3c96db " repeated 10 times between [2018-11-26 12:26:04.225654] and [2018-11-26 12:26:04.239028]
[2018-11-26 12:26:04.239518] W [MSGID: 109009] [dht-common.c:1949:dht_lookup_everywhere_cbk] 0-VOL_VMDATA-dht: /.shard/62e1c5d8-8533-4e6b-826e-030680043011.27: gfid differs on subvolume VOL_VMDATA-replicate-0, gfid local = 421b2564-332f-4761-a85c-b1b86f9f23c7, gfid node = 7b295e4a-7a48-48ab-94ad-14fecb3c96db
[2018-11-26 12:26:04.239569] W [fuse-bridge.c:2318:fuse_readv_cbk] 0-glusterfs-fuse: 237819769: READ => -1 gfid=62e1c5d8-8533-4e6b-826e-030680043011 fd=0x7f5d8c14e680 (Stale file handle)
The message "E [MSGID: 133010] [shard.c:1724:shard_common_lookup_shards_cbk] 0-VOL_VMDATA-shard: Lookup on shard 27 failed. Base file gfid = 62e1c5d8-8533-4e6b-826e-030680043011 [Stale file handle]" repeated 10 times between [2018-11-26 12:26:04.226202] and [2018-11-26 12:26:04.239562]
[2018-11-26 12:26:04.241144] W [MSGID: 109009] [dht-common.c:2210:dht_lookup_linkfile_cbk] 0-VOL_VMDATA-dht: /.shard/62e1c5d8-8533-4e6b-826e-030680043011.27: gfid different on data file on VOL_VMDATA-replicate-0, gfid local = 00000000-0000-0000-0000-000000000000, gfid node = 7b295e4a-7a48-48ab-94ad-14fecb3c96db
[2018-11-26 12:26:04.241697] W [MSGID: 109009] [dht-common.c:1949:dht_lookup_everywhere_cbk] 0-VOL_VMDATA-dht: /.shard/62e1c5d8-8533-4e6b-826e-030680043011.27: gfid differs on subvolume VOL_VMDATA-replicate-0, gfid local = 421b2564-332f-4761-a85c-b1b86f9f23c7, gfid node = 7b295e4a-7a48-48ab-94ad-14fecb3c96db
[2018-11-26 12:26:04.241704] E [MSGID: 133010] [shard.c:1724:shard_common_lookup_shards_cbk] 0-VOL_VMDATA-shard: Lookup on shard 27 failed. Base file gfid = 62e1c5d8-8533-4e6b-826e-030680043011 [Stale file handle]
[2018-11-26 12:26:04.241743] W [fuse-bridge.c:2318:fuse_readv_cbk] 0-glusterfs-fuse: 237819764: READ => -1 gfid=62e1c5d8-8533-4e6b-826e-030680043011 fd=0x7f5d8c14e680 (Stale file handle)
The message "I [MSGID: 109069] [dht-common.c:1474:dht_lookup_unlink_stale_linkto_cbk] 0-VOL_VMDATA-dht: Returned with op_ret -1 and op_errno 2 for /.shard/62e1c5d8-8533-4e6b-826e-030680043011.27" repeated 2 times between [2018-11-26 12:26:04.177778] and [2018-11-26 12:26:04.180779]
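The dht_lookup_everywhere messages above show the same shard path resolving to two different gfids on the two distribute subvolumes (421b2564-... on one, 7b295e4a-... on the other), which is what ultimately surfaces to the client as "Stale file handle". A way to check this directly on the bricks (paths and gfid taken from the log above; whether one of the copies is a stale DHT link file is an assumption to be verified, not a conclusion from the log):

# On one brick of each replica set, dump the shard's xattrs and stat it.
# A stale DHT link file would show up as a zero-size file with the sticky bit
# set, carrying a trusted.glusterfs.dht.linkto xattr and a different trusted.gfid.
getfattr -d -m . -e hex /gluster/VOL_VMDATA/brick/.shard/62e1c5d8-8533-4e6b-826e-030680043011.27
stat /gluster/VOL_VMDATA/brick/.shard/62e1c5d8-8533-4e6b-826e-030680043011.27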
These are the outputs of getfattr -d -m . -e hex on the file that gives the error, on each of the 3 replica servers:

security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.version=0x020000000000000058e63c48000b64cb
trusted.gfid=0x4feb4a7ee1a34fa38d383b929bf52d14
trusted.glusterfs.shard.block-size=0x0000000020000000
trusted.glusterfs.shard.file-size=0x00000200000000000000000000000000000000009929f79e0000000000000000

security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.VOL_VMDATA-client-3=0x000000000000000000000000
trusted.afr.VOL_VMDATA-client-5=0x000000000000000000000000
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.version=0x020000000000000058da63390009e5b2
trusted.gfid=0x4feb4a7ee1a34fa38d383b929bf52d14
trusted.glusterfs.shard.block-size=0x0000000020000000
trusted.glusterfs.shard.file-size=0x00000200000000000000000000000000000000009929f79e0000000000000000

security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.VOL_VMDATA-client-3=0x000000000000000000000000
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.version=0x020000000000000058dd284f000dee61
trusted.gfid=0x4feb4a7ee1a34fa38d383b929bf52d14
trusted.glusterfs.shard.block-size=0x0000000020000000
trusted.glusterfs.shard.file-size=0x00000200000000000000000000000000000000009929f79e0000000000000000
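For reference when reading these dumps, the shard xattr values are big-endian integers, and decoding the size fields is consistent with the volume configuration. That the first 8 bytes of shard.file-size hold the logical file size is my reading of the xattr layout, so treat it as an assumption:

# Decode trusted.glusterfs.shard.block-size (big-endian 64-bit):
printf '%d\n' 0x0000000020000000   # 536870912 bytes = 512MB, matching features.shard-block-size
# First 8 bytes of trusted.glusterfs.shard.file-size (assumed: logical file size):
printf '%d\n' 0x0000020000000000   # 2199023255552 bytes = 2 TiB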
Here are some of the tests we have done:
- Compared the extended attributes of all three replicas of the involved shard. Found identical attributes.
- Compared the SHA512 message digest of all three replicas of the involved shard. Found identical digests (along the lines of the sketch below).
- Tried to delete the shard from a replica set, one at a time, along with its hard link. The shard is always rebuilt correctly, but the error from the client persists.
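A sketch of that comparison, run on each of the three replica bricks (the brick path is from the volume info above; the shard name is illustrative, built from the base gfid and shard index seen in the first log excerpt):

# Hash the same shard on each replica brick and compare the digests:
sha512sum /gluster/VOL_VMDATA/brick/.shard/4feb4a7e-e1a3-4fa3-8d38-3b929bf52d14.3558
# Also compare hard-link count, size and extended attributes:
stat -c '%h %s %n' /gluster/VOL_VMDATA/brick/.shard/4feb4a7e-e1a3-4fa3-8d38-3b929bf52d14.3558
getfattr -d -m . -e hex /gluster/VOL_VMDATA/brick/.shard/4feb4a7e-e1a3-4fa3-8d38-3b929bf52d14.3558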
Hi,

Just wanted to understand if you are still seeing this issue? From the logs, it seems shard is merely logging the error it got from the layers below. The problem doesn't appear to be in the shard translator.
This bug is moved to https://github.com/gluster/glusterfs/issues/937, and will be tracked there from now on. Visit the GitHub issue URL for further details.
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days