Description of problem: glusterd continues writing (and recreates all the missing directories) when the brick device gets unmounted (due to high latency and mdraid, in my scenario).

Here is a full description:

19:56:50: disk is detached from my system; this disk backs the brick of volume V
19:56:50: LVM sees the disk as unreachable and starts its maintenance procedures
19:56:50: LVM unmounts my thin-provisioned volumes
19:57:02: health check on the affected bricks fails, moving them to a down state
19:57:32: XFS filesystem unmounts

At this point, the brick filesystem is no longer mounted. The underlying filesystem is empty (the brick directory is missing too). My assumption was that gluster would stop itself under such conditions: it does not.

MD (yes, I use md to aggregate 4 disks into a single 4TB volume):

/dev/md128:
        Version : 1.2
  Creation Time : Mon Aug 29 18:10:45 2016
     Raid Level : raid0
     Array Size : 4290248704 (4091.50 GiB 4393.21 GB)
   Raid Devices : 4
  Total Devices : 4
    Persistence : Superblock is persistent

    Update Time : Mon Aug 29 18:10:45 2016
          State : clean
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0

     Chunk Size : 512K

           Name : 128
           UUID : d5c51214:43e48da9:49086616:c1371514
         Events : 0

    Number   Major   Minor   RaidDevice State
       0       8       80        0      active sync   /dev/sdf
       1       8       96        1      active sync   /dev/sdg
       2       8      112        2      active sync   /dev/sdh
       3       8      128        3      active sync   /dev/sdi

PV, VG, LV status:

  PV         VG      Fmt  Attr PSize PFree DevSize PV UUID
  /dev/md127 VGdata  lvm2 a--  2.00t 2.00t 2.00t   Kxb6C0-FLIH-4rB1-DKyf-IQuR-bbPE-jm2mu0
  /dev/md128 gluster lvm2 a--  4.00t 1.07t 4.00t   lDazuw-zBPf-Duis-ZDg1-3zfg-53Ba-2ZF34m

  VG      Attr   Ext   #PV #LV #SN VSize VFree VG UUID                                VProfile
  VGdata  wz--n- 4.00m   1   0   0 2.00t 2.00t XI2V2X-hdxU-0Jrn-TN7f-GSEk-7aNs-GCdTtn
  gluster wz--n- 4.00m   1   6   0 4.00t 1.07t ztxX4f-vTgN-IKop-XePU-OwqW-T9k6-A6uDk0

  LV                  VG      #Seg Attr       LSize   Maj Min KMaj KMin Pool     Origin Data%  Meta% Move Cpy%Sync Log Convert LV UUID                                LProfile
  apps-data           gluster    1 Vwi-aotz--  50.00g  -1  -1  253   12 thinpool         0.08                                  znUMbm-ax1N-R7aj-dxLc-gtif-WOvk-9QC8tq
  feed                gluster    1 Vwi-aotz-- 100.00g  -1  -1  253   14 thinpool         0.08                                  hZ4Isk-dELG-lgFs-2hJ6-aYid-8VKg-3jJko9
  homes               gluster    1 Vwi-aotz--   1.46t  -1  -1  253   11 thinpool        58.58                                  salIPF-XvsA-kMnm-etjf-Uaqy-2vA9-9WHPkH
  search-data         gluster    1 Vwi-aotz-- 100.00g  -1  -1  253   13 thinpool        16.41                                  Z5hoa3-yI8D-dk5Q-2jWH-N5R2-ge09-RSjPpQ
  thinpool            gluster    1 twi-aotz--   2.93t  -1  -1  253    9                 29.85  60.00                           oHTbgW-tiPh-yDfj-dNOm-vqsF-fBNH-o1izx2
  video-asset-manager gluster    1 Vwi-aotz-- 100.00g  -1  -1  253   15 thinpool         0.07                                  4dOXga-96Wa-u3mh-HMmE-iX1I-o7ov-dtJ8lZ

Gluster volume configuration (all volumes use the exact same configuration, so listing them all would be redundant):

Volume Name: vol-homes
Type: Replicate
Volume ID: 0c8fa62e-dd7e-429c-a19a-479404b5e9c6
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: glu01.prd.azr:/bricks/vol-homes/brick1
Brick2: glu02.prd.azr:/bricks/vol-homes/brick1
Brick3: glu03.prd.azr:/bricks/vol-homes/brick1
Options Reconfigured:
performance.readdir-ahead: on
cluster.server-quorum-type: server
nfs.disable: disable
cluster.lookup-unhashed: auto
performance.nfs.quick-read: on
performance.nfs.read-ahead: on
performance.cache-size: 4096MB
cluster.self-heal-daemon: enable
diagnostics.brick-log-level: ERROR
diagnostics.client-log-level: ERROR
nfs.rpc-auth-unix: off
nfs.acl: off
performance.nfs.io-cache: on
performance.client-io-threads: on
performance.nfs.stat-prefetch: on
performance.nfs.io-threads: on
diagnostics.latency-measurement: on
diagnostics.count-fop-hits: on
performance.md-cache-timeout: 1
performance.cache-refresh-timeout: 1
performance.io-thread-count: 16
performance.high-prio-threads: 16
performance.normal-prio-threads: 16
performance.low-prio-threads: 16
performance.least-prio-threads: 1
cluster.server-quorum-ratio: 60

fstab:

/dev/gluster/homes /bricks/vol-homes xfs defaults,noatime,nobarrier,nofail 0 2
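For context, the health-check messages in the logs below come from the posix health checker, which periodically probes each brick's backing filesystem and takes the brick down on failure. Its frequency can be tuned per volume; a minimal sketch against this report's vol-homes (the interval value here is arbitrary):

  # storage.health-check-interval is the probe period in seconds (0 disables it);
  # a shorter interval narrows the window between the unmount and brick shutdown.
  gluster volume set vol-homes storage.health-check-interval 10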
Logs:

Sep 22 19:56:50 glu03 lvm[868]: WARNING: Device for PV lDazuw-zBPf-Duis-ZDg1-3zfg-53Ba-2ZF34m not found or rejected by a filter.
Sep 22 19:56:50 glu03 lvm[868]: Cannot change VG gluster while PVs are missing.
Sep 22 19:56:50 glu03 lvm[868]: Consider vgreduce --removemissing.
Sep 22 19:56:50 glu03 lvm[868]: Failed to extend thin metadata gluster-thinpool-tpool.
Sep 22 19:56:50 glu03 lvm[868]: Unmounting thin volume gluster-thinpool-tpool from /bricks/vol-homes.
Sep 22 19:56:50 glu03 lvm[868]: Unmounting thin volume gluster-thinpool-tpool from /bricks/vol-search-data.
Sep 22 19:56:50 glu03 lvm[868]: Unmounting thin volume gluster-thinpool-tpool from /bricks/vol-apps-data.
Sep 22 19:56:50 glu03 lvm[868]: Unmounting thin volume gluster-thinpool-tpool from /bricks/vol-video-asset-manager.
Sep 22 19:57:02 glu03 bricks-vol-video-asset-manager-brick1[45162]: [2016-09-22 17:57:02.713428] M [MSGID: 113075] [posix-helpers.c:1844:posix_health_check_thread_proc] 0-vol-video-asset-manager-posix: health-check failed, going down
Sep 22 19:57:05 glu03 bricks-vol-apps-data-brick1[44536]: [2016-09-22 17:57:05.186146] M [MSGID: 113075] [posix-helpers.c:1844:posix_health_check_thread_proc] 0-vol-apps-data-posix: health-check failed, going down
Sep 22 19:57:18 glu03 bricks-vol-search-data-brick1[40928]: [2016-09-22 17:57:18.674279] M [MSGID: 113075] [posix-helpers.c:1844:posix_health_check_thread_proc] 0-vol-search-data-posix: health-check failed, going down
Sep 22 19:57:32 glu03 bricks-vol-video-asset-manager-brick1[45162]: [2016-09-22 17:57:32.714461] M [MSGID: 113075] [posix-helpers.c:1850:posix_health_check_thread_proc] 0-vol-video-asset-manager-posix: still alive! -> SIGTERM
Sep 22 19:57:32 glu03 kernel: XFS (dm-15): Unmounting Filesystem
Sep 22 19:57:35 glu03 bricks-vol-apps-data-brick1[44536]: [2016-09-22 17:57:35.186352] M [MSGID: 113075] [posix-helpers.c:1850:posix_health_check_thread_proc] 0-vol-apps-data-posix: still alive! -> SIGTERM
Sep 22 19:57:35 glu03 kernel: XFS (dm-12): Unmounting Filesystem
Sep 22 19:57:48 glu03 bricks-vol-search-data-brick1[40928]: [2016-09-22 17:57:48.674444] M [MSGID: 113075] [posix-helpers.c:1850:posix_health_check_thread_proc] 0-vol-search-data-posix: still alive! -> SIGTERM
Sep 22 19:57:48 glu03 kernel: XFS (dm-13): Unmounting Filesystem

Version-Release number of selected component (if applicable):

CentOS Linux release 7.1.1503 (Core)
glusterfs-api-3.7.13-1.el7.x86_64
glusterfs-libs-3.7.13-1.el7.x86_64
glusterfs-3.7.13-1.el7.x86_64
glusterfs-fuse-3.7.13-1.el7.x86_64
glusterfs-server-3.7.13-1.el7.x86_64
glusterfs-client-xlators-3.7.13-1.el7.x86_64
glusterfs-cli-3.7.13-1.el7.x86_64

How reproducible:

Steps to Reproduce:
1. Create a filesystem on a network-attached disk (iSCSI works fine)
2. Sever the link to the target
3. Write data to the volume such that it gets replicated onto the severed disk
4. Observe gluster writing to the local filesystem (a scripted sketch of these steps follows)
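For reference, a sketch of a single-node reproduction that substitutes a loop device plus a forced unmount for the iSCSI link drop; all paths and the volume name are hypothetical, and the lazy unmount only approximates the LVM-triggered one shown in the logs above:

  # Hypothetical stand-in for steps 1-4 using a loop device instead of iSCSI.
  truncate -s 10G /tmp/brick.img
  losetup /dev/loop0 /tmp/brick.img
  mkfs.xfs /dev/loop0
  mkdir -p /bricks/vol-test
  mount /dev/loop0 /bricks/vol-test
  gluster volume create vol-test $(hostname):/bricks/vol-test/brick1
  gluster volume start vol-test
  mount -t glusterfs $(hostname):/vol-test /mnt/vol-test
  # Sever the backing device: lazily unmount while the brick is still running.
  umount -l /bricks/vol-test
  # Keep writing through the client...
  dd if=/dev/zero of=/mnt/vol-test/testfile bs=1M count=100
  # ...and check whether the brick tree reappears on the root filesystem.
  ls -la /bricks/vol-test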
Actual results: glusterd continues writing to the local disk, recreating the full directory structure.

Expected results: glusterd refuses to write to a filesystem that is missing the root brick structure.

Additional info: a new option could be useful here, either to explicitly allow glusterfs to keep writing in this situation or to force it to bring the bricks down when the underlying filesystem goes missing.
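Until such an option exists, one possible workaround is an external watchdog that stops any brick whose path is no longer a mount point. A minimal sketch, assuming the brick layout from this report and one dedicated glusterfsd process per brick:

  #!/bin/sh
  # Hypothetical cron watchdog: if a brick path is no longer a mounted
  # filesystem, kill the glusterfsd serving it so nothing lands on the root fs.
  for mnt in /bricks/vol-homes /bricks/vol-apps-data /bricks/vol-search-data; do
      if ! mountpoint -q "$mnt"; then
          pkill -f "glusterfsd.*${mnt}" && logger "brick at $mnt unmounted; glusterfsd killed"
      fi
  done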
Part of the email thread is here: http://www.gluster.org/pipermail/gluster-users/2016-September/028445.html

Please reply to the question that was posted (how were the brick processes restarted, and is there anything in the brick logs?). It would be best to reply to the email and post the response here as well. Thanks!
This bug is getting closed because GlusterFS-3.7 has reached its end-of-life. Note: This bug is being closed using a script. No verification has been performed to check if it still exists on newer releases of GlusterFS. If this bug still exists in newer GlusterFS releases, please reopen this bug against the newer release.
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days.