Bug 1779616 - GlusterFS mount dir becomes a read-only file system after 1 day
Summary: GlusterFS mount dir becomes a read-only file system after 1 day
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: GlusterFS
Classification: Community
Component: posix
Version: 5
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Mohammed Rafi KC
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-12-04 11:34 UTC by sharmaakshay890
Modified: 2020-03-12 12:29 UTC
CC List: 4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-03-12 12:29:54 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description sharmaakshay890 2019-12-04 11:34:10 UTC
Description of problem:
We are mounting GlusterFS volumes into Kubernetes pods.
After the setup has been running for a long time, the VM itself becomes inaccessible.
The pods using these mounts are Kafka, etcd, Logstash, and InfluxDB.

Version-Release number of selected component (if applicable):
Glusterfs version - 5.9
linux version - 4.4.0-131-generic #157-Ubuntu SMP Thu Jul 12 15:51:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
Kubernetes version - 1.9.5


How reproducible:
Keep the setup running long-term on VirtualBox VMs with 24 GB RAM and 8 CPUs.


Actual results:
The VM becomes inaccessible; we cannot even SSH into it.

Brick logs (/var/log/glusterfs/bricks):
[2019-11-28 04:53:12.195465] E [MSGID: 113072] [posix-inode-fd-ops.c:1905:posix_writev] 0-gluster-volume-posix: write failed: offset 0, [Read-only file system]
[2019-11-28 04:53:12.195523] E [MSGID: 115067] [server-rpc-fops_v2.c:1373:server4_writev_cbk] 0-gluster-volume-server: 2565966: WRITEV 173 (ece0dcd6-9c14-4b94-bcd3-c8559c299852), client: CTX_ID:cb52c772-77e8-44be-b2c2-9b7b87ef8f7a-GRAPH_ID:0-PID:21526-HOST:deploy1-PC_NAME:gluster-volume-client-0-RECON_NO:-0, error-xlator: gluster-volume-posix [Read-only file system]
[2019-11-28 04:53:12.195673] E [MSGID: 115067] [server-rpc-fops_v2.c:1373:server4_writev_cbk] 0-gluster-volume-server: 2565967: WRITEV 232 (a99243ad-3346-4d5c-951c-f5e821c98bfe), client: CTX_ID:cb52c772-77e8-44be-b2c2-9b7b87ef8f7a-GRAPH_ID:0-PID:21526-HOST:deploy1-PC_NAME:gluster-volume-client-0-RECON_NO:-0, error-xlator: gluster-volume-posix [Read-only file system]
[2019-11-28 04:53:12.195923] E [MSGID: 113072] [posix-inode-fd-ops.c:1905:posix_writev] 0-gluster-volume-posix: write failed: offset 4096, [Read-only file system]
[2019-11-28 04:53:12.195993] E [MSGID: 115067] [server-rpc-fops_v2.c:1373:server4_writev_cbk] 0-gluster-volume-server: 2565968: WRITEV 45 (539acf44-e3e4-4083-95b8-98f08380a8eb), client: CTX_ID:cb52c772-77e8-44be-b2c2-9b7b87ef8f7a-GRAPH_ID:0-PID:21526-HOST:deploy1-PC_NAME:gluster-volume-client-0-RECON_NO:-0, error-xlator: gluster-volume-posix [Read-only file system]
[2019-11-28 04:53:12.196199] E [MSGID: 113072] [posix-inode-fd-ops.c:1905:posix_writev] 0-gluster-volume-posix: write failed: offset 61440, [Read-only file system]
[2019-11-28 04:53:12.196284] E [MSGID: 113072] [posix-inode-fd-ops.c:1905:posix_writev] 0-gluster-volume-posix: write failed: offset 40960, [Read-only file system]
[2019-11-28 04:53:12.196306] E [MSGID: 115067] [server-rpc-fops_v2.c:1373:server4_writev_cbk] 0-gluster-volume-server: 2565970: WRITEV 90 (3bd318e9-fd6b-46af-8b82-ce9f4ce22934), client: CTX_ID:cb52c772-77e8-44be-b2c2-9b7b87ef8f7a-GRAPH_ID:0-PID:21526-HOST:deploy1-PC_NAME:gluster-volume-client-0-RECON_NO:-0, error-xlator: gluster-volume-posix [Read-only file system]
[2019-11-28 04:53:12.196356] E [MSGID: 115067] [server-rpc-fops_v2.c:1373:server4_writev_cbk] 0-gluster-volume-server: 2565973: WRITEV 89 (2fce19d6-ae20-4b29-a381-63ef15b876eb), client: CTX_ID:cb52c772-77e8-44be-b2c2-9b7b87ef8f7a-GRAPH_ID:0-PID:21526-HOST:deploy1-PC_NAME:gluster-volume-client-0-RECON_NO:-0, error-xlator: gluster-volume-posix [Read-only file system]
[2019-11-28 04:53:12.196370] E [MSGID: 113072] [posix-inode-fd-ops.c:1905:posix_writev] 0-gluster-volume-posix: write failed: offset 0, [Read-only file system]
[2019-11-28 04:53:12.196511] E [MSGID: 115067] [server-rpc-fops_v2.c:1373:server4_writev_cbk] 0-gluster-volume-server: 2565971: WRITEV 179 (470e3858-f97a-4ebe-9c9c-b878879c9130), client: CTX_ID:cb52c772-77e8-44be-b2c2-9b7b87ef8f7a-GRAPH_ID:0-PID:21526-HOST:deploy1-PC_NAME:gluster-volume-client-0-RECON_NO:-0, error-xlator: gluster-volume-posix [Read-only file system]
[2019-11-28 04:53:12.196551] E [MSGID: 113072] [posix-inode-fd-ops.c:1905:posix_writev] 0-gluster-volume-posix: write failed: offset 0, [Read-only file system]
-----------------------------------------------------------------------
----------------------------------------------------------------------
/var/log/syslog

deploy1 kernel: [89146.220855] blk_update_request: I/O error, dev sdb, sector 38523272
Nov 28 04:53:11 deploy1 kernel: [89146.222122] EXT4-fs warning (device sdb1): ext4_end_bio:330: I/O error -5 writing to inode 1048903 (offset 0 size 0 starting block 4815410)
Nov 28 04:53:11 deploy1 kernel: [89146.222132] Buffer I/O error on device sdb1, logical block 4815153
Nov 28 04:53:11 deploy1 kernel: [89146.223425] sd 3:0:0:0: rejecting I/O to offline device
Nov 28 04:53:11 deploy1 kernel: [89146.224577] sd 3:0:0:0: [sdb] killing request
Nov 28 04:53:11 deploy1 kernel: [89146.224590] sd 3:0:0:0: rejecting I/O to offline device
Nov 28 04:53:11 deploy1 kernel: [89146.225752] EXT4-fs warning (device sdb1): ext4_end_bio:330: I/O error -5 writing to inode 1315505 (offset 8904704 size 20480 starting block 5472643)
Nov 28 04:53:11 deploy1 kernel: [89146.225759] Buffer I/O error on device sdb1, logical block 5472381
Nov 28 04:53:11 deploy1 kernel: [89146.226883] Buffer I/O error on device sdb1, logical block 5472382
Nov 28 04:53:11 deploy1 kernel: [89146.227674] Buffer I/O error on device sdb1, logical block 5472383
Nov 28 04:53:11 deploy1 kernel: [89146.228241] Buffer I/O error on device sdb1, logical block 5472384
Nov 28 04:53:11 deploy1 kernel: [89146.228834] Buffer I/O error on device sdb1, logical block 5472385
Nov 28 04:53:11 deploy1 kernel: [89146.229429] Buffer I/O error on device sdb1, logical block 5472386
Nov 28 04:53:11 deploy1 kernel: [89146.229991] sd 3:0:0:0: rejecting I/O to offline device
Nov 28 04:53:11 deploy1 kernel: [89146.230517] EXT4-fs warning (device sdb1): ext4_end_bio:330: I/O error -5 writing to inode 1053838 (offset 0 size 0 starting block 5490330)
Nov 28 04:53:11 deploy1 kernel: [89146.230521] Buffer I/O error on device sdb1, logical block 5490073
Nov 28 04:53:11 deploy1 kernel: [89146.231046] EXT4-fs warning (device sdb1): ext4_end_bio:330: I/O error -5 writing to inode 1053838 (offset 1679360 size 12288 starting block 5490333)
Nov 28 04:53:11 deploy1 kernel: [89146.231050] Buffer I/O error on device sdb1, logical block 5490074
Nov 28 04:53:11 deploy1 kernel: [89146.231581] Buffer I/O error on device sdb1, logical block 5490075
Nov 28 04:53:11 deploy1 kernel: [89146.232119] sd 3:0:0:0: rejecting I/O to offline device
Nov 28 04:53:11 deploy1 kernel: [89146.232648] EXT4-fs warning (device sdb1): ext4_end_bio:330: I/O error -5 writing to inode 1053886 (offset 6234112 size 77824 starting block 5498629)
Nov 28 04:53:11 deploy1 kernel: [89146.232676] sd 3:0:0:0: rejecting I/O to offline device
Nov 28 04:53:11 deploy1 kernel: [89146.233197] EXT4-fs warning (device sdb1): ext4_end_bio:330: I/O error -5 writing to inode 1053892 (offset 6234112 size 77824 starting block 5500677)
Nov 28 04:53:11 deploy1 kernel: [89146.233229] sd 3:0:0:0: rejecting I/O to offline device
Nov 28 04:53:11 deploy1 kernel: [89146.233739] EXT4-fs warning (device sdb1): ext4_end_bio:330: I/O error -5 writing to inode 1053891 (offset 6234112 size 77824 starting block 5502725)
Nov 28 04:53:11 deploy1 kernel: [89146.233766] sd 3:0:0:0: rejecting I/O to offline device

Expected results:
The volume mount should remain accessible at all times.


Additional info:
We are doing static provisioning of the GlusterFS volume mount in Kubernetes.

We create GlusterFS with our own scripts, using the following commands (a sketch of the full sequence is shown below):
sudo gluster peer probe 
sudo gluster volume create 
sudo gluster volume start

The GlusterFS brick lives on a secondary disk of the Linux machine (e.g. /dev/sdb).
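For reference, a minimal end-to-end sketch of that creation sequence; the hostnames (server1, server2), volume name (gv0), and brick path (/data/brick1, assumed to be the mount point of /dev/sdb1) are hypothetical:

  # on every node: a brick directory on the secondary disk
  sudo mkdir -p /data/brick1/gv0

  # from one node: form the trusted pool, then create and start the volume
  sudo gluster peer probe server2
  sudo gluster volume create gv0 replica 2 \
      server1:/data/brick1/gv0 server2:/data/brick1/gv0   # gluster warns about split-brain risk with replica 2
  sudo gluster volume start gv0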


While cleaning up we run:
sudo gluster peer detach
sudo gluster volume stop
sudo gluster volume delete
Then we unmount the disk and reformat it as ext4 (a sketch of the full cleanup sequence is shown below).
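For reference, a minimal cleanup sketch using the same hypothetical names as above; note that gluster refuses to detach a peer that still hosts bricks, so the volume is stopped and deleted before the detach:

  sudo gluster volume stop gv0       # prompts for confirmation
  sudo gluster volume delete gv0
  sudo gluster peer detach server2
  sudo umount /data/brick1           # assumed brick mount point
  sudo mkfs.ext4 /dev/sdb1           # reformat the brick partition as ext4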


Please let me know if I'm missing any command or not following the proper procedure.

Comment 1 Sahina Bose 2019-12-09 13:24:47 UTC
Rafi, can you take a look?

Comment 2 Mohammed Rafi KC 2019-12-11 18:18:31 UTC
It looks like the backend brick mount is corrupted. The system logs show I/O errors, most likely caused by a hardware failure or something related to the storage device. The errors in the gluster brick logs also come from the POSIX layer, the layer where gluster talks to the backend device. So it is highly likely that there is a problem with the backend mount.
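A quick way to confirm this from the affected node (a minimal sketch, assuming the brick sits on /dev/sdb1 as the syslog excerpt above suggests):

  dmesg | grep -iE 'sdb|I/O error'             # kernel-level errors for the brick disk
  grep sdb1 /proc/mounts                       # an "ro" flag means ext4 remounted itself read-only after errors
  sudo tune2fs -l /dev/sdb1 | grep -i state    # "clean with errors" indicates a damaged filesystem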

Comment 3 sharmaakshay890 2019-12-13 06:25:55 UTC
Thanks Rafi,

1) Can I know what could be the reason for this I/O error failure?

We are facing this issue on some of the machines, and we are unable to do anything until we reboot the machine.
After that everything comes back to normal.

We are unable to find the root cause.

2) I have shared some steps in the bug's Additional Info.

Can you please let us know whether the mentioned steps are correct (for creation and cleanup of GlusterFS)?

3) What is the recommended backend storage for GlusterFS?

We are using a SATA disk with ext4.

Comment 4 Mohammed Rafi KC 2020-01-16 08:23:23 UTC
(In reply to sharmaakshay890 from comment #3)
> Thanks Rafi,
> 
> 1) Can I know what could be the reason for this I/O error failure?
> 
> We are facing this issue on some of the machines, and we are unable to do
> anything until we reboot the machine.
> After that everything comes back to normal.
> 
> We are unable to find the root cause.

I have to admit I'm not an expert in disk failure cases. You may start with a disk health check and then see whether the file system is corrupted.
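For example (a minimal sketch; the device names assume the brick sits on /dev/sdb1 as in the logs above, and smartmontools is installed):

  sudo smartctl -a /dev/sdb       # SMART health report for the disk
  sudo umount /dev/sdb1           # fsck should not run on a mounted filesystem
  sudo fsck.ext4 -f /dev/sdb1     # check (and, if you confirm, repair) the ext4 brick filesystem

Note that inside a VirtualBox VM the virtual disk usually exposes no SMART data, so the health of the host's physical disk may need to be checked from the host instead.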

> 
> 2) I have shared some steps in the bug's Additional Info.
> 
> Can you please let us know whether the mentioned steps are correct (for
> creation and cleanup of GlusterFS)?

Please refer to https://docs.gluster.org/en/latest/Administrator%20Guide/setting-up-storage/

> 
> 3) What is the recommended backend storage for GlusterFS?
> 
> We are using a SATA disk with ext4.

ext4 is perfectly fine, though Gluster recommends XFS.
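If you want to follow that recommendation, here is a minimal sketch of an XFS brick setup along the lines of the admin guide linked above (device name and mount point are illustrative):

  sudo mkfs.xfs -i size=512 /dev/sdb1    # 512-byte inodes leave room for gluster's extended attributes
  sudo mkdir -p /data/brick1
  echo '/dev/sdb1 /data/brick1 xfs defaults 1 2' | sudo tee -a /etc/fstab
  sudo mount -a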

Comment 5 Worker Ant 2020-03-12 12:29:54 UTC
This bug has been moved to https://github.com/gluster/glusterfs/issues/896 and will be tracked there from now on. Visit the GitHub issue URL for further details.

