Bug 1774598 - GlusterFS gets corrupted when moves are done when capacity is 100%
Summary: GlusterFS gets corrupted when moves are done when capacity is 100%
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: GlusterFS
Classification: Community
Component: core
Version: 6
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-11-20 14:26 UTC by Rob de Wit
Modified: 2020-03-12 12:17 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-03-12 12:17:44 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Rob de Wit 2019-11-20 14:26:01 UTC
Description of problem:

When a GlusterFS filesystem is written to until it reaches 100% capacity, files that are moved while the volume is full become corrupted.

Version-Release number of selected component (if applicable):
Version 6.1

How reproducible:
Fairly reproducible. It looks like a race condition that is quite easy to trigger.

Steps to Reproduce:

# mount HOST:VOLUME -t glusterfs /PATH
# df -k /PATH
Filesystem	1K-blocks  Used Available Use% Mounted on
HOST:VOLUME       5232640 90984   5141656   2% /PATH
total 0
# dd if=/dev/zero of=/PATH/TEST bs=4096 ; mv /PATH/TEST /PATH/TEST.X
dd: error writing '/PATH/TEST': Input/output error
dd: closing output file '/PATH/TEST': Input/output error
mv: cannot stat '/PATH/TEST': Transport endpoint is not connected
# ls -la /PATH
ls: cannot access '/PATH/TEST': Transport endpoint is not connected
drwxr-xr-x  3 root root   36 Nov 20 14:57 ./
drwxr-xr-x 42 root root 4096 Nov 18 13:38 ../
-?????????  ? ?    ?       ?            ? TEST
# df -k /PATH
Filesystem	1K-blocks    Used Available Use% Mounted on
HOST:VOLUME       5232640 5232640         0 100% /PATH


Actual results:
Corrupted filesystem

Expected results:
Valid filesystem


Additional info:
The volume is actually still mounted. It is the file that is corrupted.
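
The `ls -la` output above lists TEST by name but cannot stat it. A minimal sketch of how such entries could be found programmatically, assuming a directory on the mounted volume is passed in (the function name `find_unstatable_entries` is made up for this illustration):

```python
import errno
import os


def find_unstatable_entries(directory):
    """Return (name, errno name) pairs for entries that are listed but cannot be stat'ed.

    An entry removed between listdir() and lstat() would also show up here
    (as ENOENT), so treat the result as a hint, not proof of corruption.
    """
    broken = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        try:
            os.lstat(path)
        except OSError as exc:
            # Map the numeric errno to its symbolic name, e.g. ENOTCONN.
            broken.append((name, errno.errorcode.get(exc.errno, str(exc.errno))))
    return broken
```

On the mount shown above, the entry for TEST would presumably be reported with ENOTCONN, which is the errno behind "Transport endpoint is not connected".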

We saw this corrupt files on a GlusterFS volume that was used by a Redis server. Redis writes data to a temporary file and then moves it to its final location. The final file names got corrupted (instead of the temporary names), which caused the Redis server to stop responding because persistence had become unreliable :-(
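
For reference, the write-to-temporary-then-rename pattern described above looks roughly like the sketch below. This is not Redis's actual code; the function name and the `.tmp` suffix are assumptions made for illustration.

```python
import os


def write_then_rename(final_path, data):
    """Write data to a temporary sibling file, then rename it over final_path.

    Raises OSError (for example ENOSPC) if any step fails; final_path is only
    replaced by the rename itself.
    """
    tmp_path = final_path + ".tmp"  # hypothetical naming scheme
    try:
        with open(tmp_path, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # surface write errors before renaming
        os.replace(tmp_path, final_path)
    except OSError:
        # Best-effort cleanup of the temporary file, then re-raise.
        try:
            os.unlink(tmp_path)
        except OSError:
            pass
        raise
```

On a well-behaved filesystem a failed rename leaves the destination untouched. In the mount log below, the rename of /TEST to /TEST.X itself fails with "No space left on device" and the file is unusable afterwards, so this pattern alone does not protect against the behaviour reported here.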


Mount log:
[2019-11-20 13:56:12.760373] W [MSGID: 114031] [client-rpc-fops_v2.c:680:client4_0_writev_cbk] 0-VOLUME-client-2: remote operation failed [No space left on device]
[2019-11-20 13:56:24.310731] W [MSGID: 114031] [client-rpc-fops_v2.c:680:client4_0_writev_cbk] 0-VOLUME-client-1: remote operation failed [No space left on device]
[2019-11-20 13:56:24.328860] W [MSGID: 114031] [client-rpc-fops_v2.c:680:client4_0_writev_cbk] 0-VOLUME-client-0: remote operation failed [No space left on device]
[2019-11-20 13:56:27.559474] W [fuse-bridge.c:1266:fuse_attr_cbk] 0-glusterfs-fuse: 9065357: FSTAT() /TEST => -1 (No space left on device)
[2019-11-20 13:56:29.566267] W [fuse-bridge.c:2915:fuse_writev_cbk] 0-glusterfs-fuse: 9065338: WRITE => -1 gfid=ca6f6460-e0b3-42bc-b3dc-0be264300010 fd=0x7f6bc40e78b8 (No space left on device)
[2019-11-20 13:56:31.565946] W [fuse-bridge.c:1823:fuse_err_cbk] 0-glusterfs-fuse: 9065360: FLUSH() ERR => -1 (No space left on device)
[2019-11-20 13:56:31.659272] W [MSGID: 114031] [client-rpc-fops_v2.c:2405:client4_0_rename_cbk] 0-VOLUME-client-0: remote operation failed [No space left on device]
[2019-11-20 13:56:31.691804] W [MSGID: 114031] [client-rpc-fops_v2.c:2405:client4_0_rename_cbk] 0-VOLUME-client-1: remote operation failed [No space left on device]
[2019-11-20 13:56:31.693838] W [MSGID: 114031] [client-rpc-fops_v2.c:2405:client4_0_rename_cbk] 0-VOLUME-client-2: remote operation failed [No space left on device]
[2019-11-20 13:56:31.694769] W [fuse-bridge.c:2366:fuse_rename_cbk] 0-glusterfs-fuse: 9065392: /TEST -> /TEST.X => -1 (No space left on device)
The message "W [MSGID: 114031] [client-rpc-fops_v2.c:680:client4_0_writev_cbk] 0-VOLUME-client-1: remote operation failed [No space left on device]" repeated 28 times between [2019-11-20 13:56:24.310731] and [2019-11-20 13:56:29.597505]
The message "W [MSGID: 114031] [client-rpc-fops_v2.c:680:client4_0_writev_cbk] 0-VOLUME-client-0: remote operation failed [No space left on device]" repeated 28 times between [2019-11-20 13:56:24.328860] and [2019-11-20 13:56:29.605913]
The message "W [MSGID: 114031] [client-rpc-fops_v2.c:680:client4_0_writev_cbk] 0-VOLUME-client-2: remote operation failed [No space left on device]" repeated 3122 times between [2019-11-20 13:56:12.760373] and [2019-11-20 13:56:29.607119]
[2019-11-20 13:59:21.189234] W [MSGID: 114031] [client-rpc-fops_v2.c:680:client4_0_writev_cbk] 0-VOLUME-client-1: remote operation failed [No space left on device]
[2019-11-20 13:59:21.233674] W [MSGID: 114031] [client-rpc-fops_v2.c:680:client4_0_writev_cbk] 0-VOLUME-client-2: remote operation failed [No space left on device]
[2019-11-20 13:59:22.112250] W [MSGID: 108027] [afr-common.c:2273:afr_attempt_readsubvol_set] 0-VOLUME-replicate-0: no read subvols for /TEST
[2019-11-20 13:59:22.112304] W [fuse-bridge.c:939:fuse_entry_cbk] 0-glusterfs-fuse: 11664700: LOOKUP() /TEST => -1 (Transport endpoint is not connected)
[2019-11-20 13:59:22.113797] W [fuse-bridge.c:939:fuse_entry_cbk] 0-glusterfs-fuse: 11664701: LOOKUP() /TEST => -1 (Transport endpoint is not connected)
[2019-11-20 13:59:24.129198] W [fuse-bridge.c:2915:fuse_writev_cbk] 0-glusterfs-fuse: 11664683: WRITE => -1 gfid=6005b4e3-7e72-44f1-aa8a-1a4b7ba308e3 fd=0x7f6bcc05e6b8 (Input/output error)
[2019-11-20 13:59:24.130731] W [fuse-bridge.c:939:fuse_entry_cbk] 0-glusterfs-fuse: 11664720: LOOKUP() /TEST => -1 (Transport endpoint is not connected)
[2019-11-20 13:59:25.592637] W [fuse-bridge.c:1823:fuse_err_cbk] 0-glusterfs-fuse: 11664721: FLUSH() ERR => -1 (Input/output error)
[2019-11-20 13:59:25.642640] W [fuse-bridge.c:939:fuse_entry_cbk] 0-glusterfs-fuse: 11664729: LOOKUP() /TEST => -1 (Transport endpoint is not connected)
[2019-11-20 13:59:25.647432] W [fuse-bridge.c:939:fuse_entry_cbk] 0-glusterfs-fuse: 11664731: LOOKUP() /TEST => -1 (Transport endpoint is not connected)
[2019-11-20 13:59:26.147832] W [fuse-bridge.c:939:fuse_entry_cbk] 0-glusterfs-fuse: 11664747: LOOKUP() /TEST => -1 (Transport endpoint is not connected)
[2019-11-20 13:59:28.166271] W [fuse-bridge.c:939:fuse_entry_cbk] 0-glusterfs-fuse: 11664766: LOOKUP() /TEST => -1 (Transport endpoint is not connected)
[2019-11-20 13:59:21.233778] W [MSGID: 114031] [client-rpc-fops_v2.c:680:client4_0_writev_cbk] 0-VOLUME-client-2: remote operation failed [No space left on device]
The message "W [MSGID: 108027] [afr-common.c:2273:afr_attempt_readsubvol_set] 0-VOLUME-replicate-0: no read subvols for /TEST" repeated 10 times between [2019-11-20 13:59:22.112250] and [2019-11-20 13:59:28.166256]
[2019-11-20 13:59:30.183268] W [MSGID: 108027] [afr-common.c:2273:afr_attempt_readsubvol_set] 0-VOLUME-replicate-0: no read subvols for /TEST
[2019-11-20 13:59:30.184436] W [fuse-bridge.c:939:fuse_entry_cbk] 0-glusterfs-fuse: 11664785: LOOKUP() /TEST => -1 (Transport endpoint is not connected)
[2019-11-20 13:59:32.201961] W [fuse-bridge.c:939:fuse_entry_cbk] 0-glusterfs-fuse: 11664804: LOOKUP() /TEST => -1 (Transport endpoint is not connected)
[2019-11-20 13:59:34.222450] W [fuse-bridge.c:939:fuse_entry_cbk] 0-glusterfs-fuse: 11664823: LOOKUP() /TEST => -1 (Transport endpoint is not connected)
[2019-11-20 13:59:36.243094] W [fuse-bridge.c:939:fuse_entry_cbk] 0-glusterfs-fuse: 11664842: LOOKUP() /TEST => -1 (Transport endpoint is not connected)
[2019-11-20 13:59:38.259811] W [fuse-bridge.c:939:fuse_entry_cbk] 0-glusterfs-fuse: 11664861: LOOKUP() /TEST => -1 (Transport endpoint is not connected)
[2019-11-20 13:59:40.276368] W [fuse-bridge.c:939:fuse_entry_cbk] 0-glusterfs-fuse: 11664880: LOOKUP() /TEST => -1 (Transport endpoint is not connected)
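
A log like the one above can be condensed by counting entries per callback function. The sketch below assumes the line layout seen in this report (`[timestamp] LEVEL [MSGID: n] [file:line:function] ...`, with the MSGID part optional); the regular expression and the helper name `count_by_function` are illustrative only and not part of GlusterFS.

```python
import re
from collections import Counter

# Layout assumed from the log lines in this report; MSGID is optional.
LOG_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] (?P<level>[A-Z]) "
    r"(?:\[MSGID: (?P<msgid>\d+)\] )?"
    r"\[(?P<file>[^:\]]+):(?P<line>\d+):(?P<func>[^\]]+)\] "
    r"(?P<rest>.*)$"
)


def count_by_function(lines):
    """Count log entries per (level, function).

    'The message ... repeated N times' summary lines do not start with a
    timestamp and are skipped, so the counts are a lower bound.
    """
    counts = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if m:
            counts[(m.group("level"), m.group("func"))] += 1
    return counts
```

Passing the lines of the mount log to `count_by_function` gives one counter entry per callback, such as `("W", "fuse_entry_cbk")` for the repeated LOOKUP failures.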


Glusterd server brick log 1:

[2019-11-20 13:04:09.055353] I [glusterfsd-mgmt.c:2019:mgmt_getspec_cbk] 0-glusterfs: No change in volfile,continuing
[2019-11-20 13:04:08.983233] I [MSGID: 100011] [glusterfsd.c:1641:reincarnate] 0-glusterfsd: Fetching the volume file from server...
[2019-11-20 13:56:24.302554] E [MSGID: 113072] [posix-inode-fd-ops.c:1898:posix_writev] 0-VOLUME-posix: write failed: offset 5318565888, [No space left on device]
[2019-11-20 13:56:24.310902] E [MSGID: 113072] [posix-inode-fd-ops.c:1898:posix_writev] 0-VOLUME-posix: write failed: offset 5318696960, [No space left on device]
[2019-11-20 13:56:24.310930] E [MSGID: 115067] [server-rpc-fops_v2.c:1373:server4_writev_cbk] 0-VOLUME-server: 728899: WRITEV 1 (ca6f6460-e0b3-42bc-b3dc-0be264300010), client: CTX_ID:b19ad7e9-1568-45b0-b721-ce4196011029-GRAPH_ID:0-PID:17323-HOST:lucia-PC_NAME:VOLUME-client-0-RECON_NO:-0, error-xlator: VOLUME-posix [No space left on device]
[2019-11-20 13:56:24.310936] E [MSGID: 115067] [server-rpc-fops_v2.c:1373:server4_writev_cbk] 0-VOLUME-server: 728900: WRITEV 1 (ca6f6460-e0b3-42bc-b3dc-0be264300010), client: CTX_ID:b19ad7e9-1568-45b0-b721-ce4196011029-GRAPH_ID:0-PID:17323-HOST:lucia-PC_NAME:VOLUME-client-0-RECON_NO:-0, error-xlator: VOLUME-posix [No space left on device]
[2019-11-20 13:56:24.334698] E [MSGID: 113072] [posix-inode-fd-ops.c:1898:posix_writev] 0-VOLUME-posix: write failed: offset 5318828032, [No space left on device]
[2019-11-20 13:56:24.334734] E [MSGID: 115067] [server-rpc-fops_v2.c:1373:server4_writev_cbk] 0-VOLUME-server: 728909: WRITEV 1 (ca6f6460-e0b3-42bc-b3dc-0be264300010), client: CTX_ID:b19ad7e9-1568-45b0-b721-ce4196011029-GRAPH_ID:0-PID:17323-HOST:lucia-PC_NAME:VOLUME-client-0-RECON_NO:-0, error-xlator: VOLUME-posix [No space left on device]
[2019-11-20 13:56:24.336609] E [MSGID: 113072] [posix-inode-fd-ops.c:1898:posix_writev] 0-VOLUME-posix: write failed: offset 5318959104, [No space left on device]
[2019-11-20 13:56:24.336634] E [MSGID: 115067] [server-rpc-fops_v2.c:1373:server4_writev_cbk] 0-VOLUME-server: 728910: WRITEV 1 (ca6f6460-e0b3-42bc-b3dc-0be264300010), client: CTX_ID:b19ad7e9-1568-45b0-b721-ce4196011029-GRAPH_ID:0-PID:17323-HOST:lucia-PC_NAME:VOLUME-client-0-RECON_NO:-0, error-xlator: VOLUME-posix [No space left on device]
...
...
[2019-11-20 13:56:26.044039] E [MSGID: 113072] [posix-inode-fd-ops.c:1898:posix_writev] 0-VOLUME-posix: write failed: offset 5318828032, [No space left on device]
[2019-11-20 13:56:26.044069] E [MSGID: 115067] [server-rpc-fops_v2.c:1373:server4_writev_cbk] 0-VOLUME-server: 728947: WRITEV 1 (ca6f6460-e0b3-42bc-b3dc-0be264300010), client: CTX_ID:b19ad7e9-1568-45b0-b721-ce4196011029-GRAPH_ID:0-PID:17323-HOST:lucia-PC_NAME:VOLUME-client-0-RECON_NO:-0, error-xlator: VOLUME-posix [No space left on device]
[2019-11-20 13:56:26.046266] E [MSGID: 113072] [posix-inode-fd-ops.c:1898:posix_writev] 0-VOLUME-posix: write failed: offset 5318565888, [No space left on device]
[2019-11-20 13:56:26.046296] E [MSGID: 115067] [server-rpc-fops_v2.c:1373:server4_writev_cbk] 0-VOLUME-server: 728948: WRITEV 1 (ca6f6460-e0b3-42bc-b3dc-0be264300010), client: CTX_ID:b19ad7e9-1568-45b0-b721-ce4196011029-GRAPH_ID:0-PID:17323-HOST:lucia-PC_NAME:VOLUME-client-0-RECON_NO:-0, error-xlator: VOLUME-posix [No space left on device]
[2019-11-20 13:56:27.562009] E [MSGID: 115067] [server-rpc-fops_v2.c:1373:server4_writev_cbk] 0-VOLUME-server: 728954: WRITEV 1 (ca6f6460-e0b3-42bc-b3dc-0be264300010), client: CTX_ID:b19ad7e9-1568-45b0-b721-ce4196011029-GRAPH_ID:0-PID:17323-HOST:lucia-PC_NAME:VOLUME-client-0-RECON_NO:-0, error-xlator: VOLUME-posix [No space left on device]
[2019-11-20 13:56:29.590605] E [MSGID: 115067] [server-rpc-fops_v2.c:1373:server4_writev_cbk] 0-VOLUME-server: 728961: WRITEV 1 (ca6f6460-e0b3-42bc-b3dc-0be264300010), client: CTX_ID:b19ad7e9-1568-45b0-b721-ce4196011029-GRAPH_ID:0-PID:17323-HOST:lucia-PC_NAME:VOLUME-client-0-RECON_NO:-0, error-xlator: VOLUME-posix [No space left on device]
[2019-11-20 13:56:29.594334] E [MSGID: 115067] [server-rpc-fops_v2.c:1373:server4_writev_cbk] 0-VOLUME-server: 728963: WRITEV 1 (ca6f6460-e0b3-42bc-b3dc-0be264300010), client: CTX_ID:b19ad7e9-1568-45b0-b721-ce4196011029-GRAPH_ID:0-PID:17323-HOST:lucia-PC_NAME:VOLUME-client-0-RECON_NO:-0, error-xlator: VOLUME-posix [No space left on device]
[2019-11-20 13:56:29.597777] E [MSGID: 115067] [server-rpc-fops_v2.c:1373:server4_writev_cbk] 0-VOLUME-server: 728964: WRITEV 1 (ca6f6460-e0b3-42bc-b3dc-0be264300010), client: CTX_ID:b19ad7e9-1568-45b0-b721-ce4196011029-GRAPH_ID:0-PID:17323-HOST:lucia-PC_NAME:VOLUME-client-0-RECON_NO:-0, error-xlator: VOLUME-posix [No space left on device]
[2019-11-20 13:56:29.601819] E [MSGID: 115067] [server-rpc-fops_v2.c:1373:server4_writev_cbk] 0-VOLUME-server: 728965: WRITEV 1 (ca6f6460-e0b3-42bc-b3dc-0be264300010), client: CTX_ID:b19ad7e9-1568-45b0-b721-ce4196011029-GRAPH_ID:0-PID:17323-HOST:lucia-PC_NAME:VOLUME-client-0-RECON_NO:-0, error-xlator: VOLUME-posix [No space left on device]
[2019-11-20 13:56:29.606035] E [MSGID: 115067] [server-rpc-fops_v2.c:1373:server4_writev_cbk] 0-VOLUME-server: 728966: WRITEV 1 (ca6f6460-e0b3-42bc-b3dc-0be264300010), client: CTX_ID:b19ad7e9-1568-45b0-b721-ce4196011029-GRAPH_ID:0-PID:17323-HOST:lucia-PC_NAME:VOLUME-client-0-RECON_NO:-0, error-xlator: VOLUME-posix [No space left on device]
[2019-11-20 13:56:31.659325] I [MSGID: 115061] [server-rpc-fops_v2.c:991:server4_rename_cbk] 0-VOLUME-server: 728986: RENAME /TEST (00000000-0000-0000-0000-000000000001/TEST) -> /TEST.X (00000000-0000-0000-0000-000000000001/TEST.X), client: CTX_ID:b19ad7e9-1568-45b0-b721-ce4196011029-GRAPH_ID:0-PID:17323-HOST:lucia-PC_NAME:VOLUME-client-0-RECON_NO:-0, error-xlator: VOLUME-posix [No space left on device]
[2019-11-20 13:59:38.110931] W [MSGID: 113117] [posix-metadata.c:627:posix_set_ctime] 0-VOLUME-posix: posix set mdata failed, No ctime : /var/lib/heketi/mounts/vg_a24598d2599320d9eea64cae3dbfdd96/brick_46346a11daff8a65949cbdc48fddf11e/brick/.glusterfs/60/05/6005b4e3-7e72-44f1-aa8a-1a4b7ba308e3 gfid:6005b4e3-7e72-44f1-aa8a-1a4b7ba308e3
The message "W [MSGID: 113117] [posix-metadata.c:627:posix_set_ctime] 0-VOLUME-posix: posix set mdata failed, No ctime : /var/lib/heketi/mounts/vg_a24598d2599320d9eea64cae3dbfdd96/brick_46346a11daff8a65949cbdc48fddf11e/brick/.glusterfs/60/05/6005b4e3-7e72-44f1-aa8a-1a4b7ba308e3 gfid:6005b4e3-7e72-44f1-aa8a-1a4b7ba308e3" repeated 3 times between [2019-11-20 13:59:38.110931] and [2019-11-20 13:59:42.161994]
[2019-11-20 14:04:07.758674] I [MSGID: 100011] [glusterfsd.c:1641:reincarnate] 0-glusterfsd: Fetching the volume file from server...


Glusterd server brick log 2:

[2019-11-20 13:04:09.023811] I [glusterfsd-mgmt.c:2019:mgmt_getspec_cbk] 0-glusterfs: No change in volfile,continuing
[2019-11-20 13:04:08.970623] I [MSGID: 100011] [glusterfsd.c:1641:reincarnate] 0-glusterfsd: Fetching the volume file from server...
[2019-11-20 13:56:12.729306] E [MSGID: 115067] [server-rpc-fops_v2.c:1373:server4_writev_cbk] 0-VOLUME-server: 720983: WRITEV 1 (ca6f6460-e0b3-42bc-b3dc-0be264300010), client: CTX_ID:b19ad7e9-1568-45b0-b721-ce4196011029-GRAPH_ID:0-PID:17323-HOST:lucia-PC_NAME:VOLUME-client-2-RECON_NO:-0, error-xlator: VOLUME-posix [No space left on device]
[2019-11-20 13:56:12.742427] E [MSGID: 115067] [server-rpc-fops_v2.c:1373:server4_writev_cbk] 0-VOLUME-server: 720985: WRITEV 1 (ca6f6460-e0b3-42bc-b3dc-0be264300010), client: CTX_ID:b19ad7e9-1568-45b0-b721-ce4196011029-GRAPH_ID:0-PID:17323-HOST:lucia-PC_NAME:VOLUME-client-2-RECON_NO:-0, error-xlator: VOLUME-posix [No space left on device]
[2019-11-20 13:56:12.742508] E [MSGID: 115067] [server-rpc-fops_v2.c:1373:server4_writev_cbk] 0-VOLUME-server: 720988: WRITEV 1 (ca6f6460-e0b3-42bc-b3dc-0be264300010), client: CTX_ID:b19ad7e9-1568-45b0-b721-ce4196011029-GRAPH_ID:0-PID:17323-HOST:lucia-PC_NAME:VOLUME-client-2-RECON_NO:-0, error-xlator: VOLUME-posix [No space left on device]
[2019-11-20 13:56:12.742608] E [MSGID: 115067] [server-rpc-fops_v2.c:1373:server4_writev_cbk] 0-VOLUME-server: 720989: WRITEV 1 (ca6f6460-e0b3-42bc-b3dc-0be264300010), client: CTX_ID:b19ad7e9-1568-45b0-b721-ce4196011029-GRAPH_ID:0-PID:17323-HOST:lucia-PC_NAME:VOLUME-client-2-RECON_NO:-0, error-xlator: VOLUME-posix [No space left on device]
...
...
[2019-11-20 13:56:29.603997] E [MSGID: 115067] [server-rpc-fops_v2.c:1373:server4_writev_cbk] 0-VOLUME-server: 726541: WRITEV 1 (ca6f6460-e0b3-42bc-b3dc-0be264300010), client: CTX_ID:b19ad7e9-1568-45b0-b721-ce4196011029-GRAPH_ID:0-PID:17323-HOST:lucia-PC_NAME:VOLUME-client-2-RECON_NO:-0, error-xlator: VOLUME-posix [No space left on device]
[2019-11-20 13:56:29.607181] E [MSGID: 115067] [server-rpc-fops_v2.c:1373:server4_writev_cbk] 0-VOLUME-server: 726542: WRITEV 1 (ca6f6460-e0b3-42bc-b3dc-0be264300010), client: CTX_ID:b19ad7e9-1568-45b0-b721-ce4196011029-GRAPH_ID:0-PID:17323-HOST:lucia-PC_NAME:VOLUME-client-2-RECON_NO:-0, error-xlator: VOLUME-posix [No space left on device]
[2019-11-20 13:56:31.693805] I [MSGID: 115061] [server-rpc-fops_v2.c:991:server4_rename_cbk] 0-VOLUME-server: 726562: RENAME /TEST (00000000-0000-0000-0000-000000000001/TEST) -> /TEST.X (00000000-0000-0000-0000-000000000001/TEST.X), client: CTX_ID:b19ad7e9-1568-45b0-b721-ce4196011029-GRAPH_ID:0-PID:17323-HOST:lucia-PC_NAME:VOLUME-client-2-RECON_NO:-0, error-xlator: VOLUME-posix [No space left on device]
[2019-11-20 13:59:21.219747] E [MSGID: 113072] [posix-inode-fd-ops.c:1898:posix_writev] 0-VOLUME-posix: write failed: offset 5318500352, [No space left on device]
[2019-11-20 13:59:21.233593] E [MSGID: 115067] [server-rpc-fops_v2.c:1373:server4_writev_cbk] 0-VOLUME-server: 768392: WRITEV 1 (6005b4e3-7e72-44f1-aa8a-1a4b7ba308e3), client: CTX_ID:b19ad7e9-1568-45b0-b721-ce4196011029-GRAPH_ID:0-PID:17323-HOST:lucia-PC_NAME:VOLUME-client-2-RECON_NO:-0, error-xlator: VOLUME-posix [No space left on device]
[2019-11-20 13:59:21.233599] E [MSGID: 113072] [posix-inode-fd-ops.c:1898:posix_writev] 0-VOLUME-posix: write failed: offset 5318631424, [No space left on device]
[2019-11-20 13:59:21.233726] E [MSGID: 115067] [server-rpc-fops_v2.c:1373:server4_writev_cbk] 0-VOLUME-server: 768393: WRITEV 1 (6005b4e3-7e72-44f1-aa8a-1a4b7ba308e3), client: CTX_ID:b19ad7e9-1568-45b0-b721-ce4196011029-GRAPH_ID:0-PID:17323-HOST:lucia-PC_NAME:VOLUME-client-2-RECON_NO:-0, error-xlator: VOLUME-posix [No space left on device]
[2019-11-20 13:59:38.119838] W [MSGID: 113117] [posix-metadata.c:627:posix_set_ctime] 0-VOLUME-posix: posix set mdata failed, No ctime : /var/lib/heketi/mounts/vg_27ab4f2ccdc2674a3270206903ab1cad/brick_d8a31ba2fdb863691be6f3a85f9f816f/brick/.glusterfs/60/05/6005b4e3-7e72-44f1-aa8a-1a4b7ba308e3 gfid:6005b4e3-7e72-44f1-aa8a-1a4b7ba308e3
The message "W [MSGID: 113117] [posix-metadata.c:627:posix_set_ctime] 0-VOLUME-posix: posix set mdata failed, No ctime : /var/lib/heketi/mounts/vg_27ab4f2ccdc2674a3270206903ab1cad/brick_d8a31ba2fdb863691be6f3a85f9f816f/brick/.glusterfs/60/05/6005b4e3-7e72-44f1-aa8a-1a4b7ba308e3 gfid:6005b4e3-7e72-44f1-aa8a-1a4b7ba308e3" repeated 3 times between [2019-11-20 13:59:38.119838] and [2019-11-20 13:59:42.161685]
[2019-11-20 14:04:07.778897] I [MSGID: 100011] [glusterfsd.c:1641:reincarnate] 0-glusterfsd: Fetching the volume file from server...


Glusterd server brick log 3:

[2019-11-20 13:04:08.624108] I [glusterfsd-mgmt.c:2019:mgmt_getspec_cbk] 0-glusterfs: No change in volfile,continuing
[2019-11-20 13:04:08.551525] I [MSGID: 100011] [glusterfsd.c:1641:reincarnate] 0-glusterfsd: Fetching the volume file from server...
[2019-11-20 13:56:24.298820] E [MSGID: 113072] [posix-inode-fd-ops.c:1898:posix_writev] 0-VOLUME-posix: write failed: offset 5318565888, [No space left on device]
[2019-11-20 13:56:24.310682] E [MSGID: 113072] [posix-inode-fd-ops.c:1898:posix_writev] 0-VOLUME-posix: write failed: offset 5318696960, [No space left on device]
[2019-11-20 13:56:24.310689] E [MSGID: 115067] [server-rpc-fops_v2.c:1373:server4_writev_cbk] 0-VOLUME-server: 900755: WRITEV 0 (ca6f6460-e0b3-42bc-b3dc-0be264300010), client: CTX_ID:b19ad7e9-1568-45b0-b721-ce4196011029-GRAPH_ID:0-PID:17323-HOST:lucia-PC_NAME:VOLUME-client-1-RECON_NO:-0, error-xlator: VOLUME-posix [No space left on device] 
[2019-11-20 13:56:24.310721] E [MSGID: 115067] [server-rpc-fops_v2.c:1373:server4_writev_cbk] 0-VOLUME-server: 900756: WRITEV 0 (ca6f6460-e0b3-42bc-b3dc-0be264300010), client: CTX_ID:b19ad7e9-1568-45b0-b721-ce4196011029-GRAPH_ID:0-PID:17323-HOST:lucia-PC_NAME:VOLUME-client-1-RECON_NO:-0, error-xlator: VOLUME-posix [No space left on device]
[2019-11-20 13:56:24.334699] E [MSGID: 113072] [posix-inode-fd-ops.c:1898:posix_writev] 0-VOLUME-posix: write failed: offset 5318828032, [No space left on device]
[2019-11-20 13:56:24.334730] E [MSGID: 115067] [server-rpc-fops_v2.c:1373:server4_writev_cbk] 0-VOLUME-server: 900765: WRITEV 0 (ca6f6460-e0b3-42bc-b3dc-0be264300010), client: CTX_ID:b19ad7e9-1568-45b0-b721-ce4196011029-GRAPH_ID:0-PID:17323-HOST:lucia-PC_NAME:VOLUME-client-1-RECON_NO:-0, error-xlator: VOLUME-posix [No space left on device]
[2019-11-20 13:56:24.335873] E [MSGID: 113072] [posix-inode-fd-ops.c:1898:posix_writev] 0-VOLUME-posix: write failed: offset 5318959104, [No space left on device]
[2019-11-20 13:56:24.335894] E [MSGID: 115067] [server-rpc-fops_v2.c:1373:server4_writev_cbk] 0-VOLUME-server: 900766: WRITEV 0 (ca6f6460-e0b3-42bc-b3dc-0be264300010), client: CTX_ID:b19ad7e9-1568-45b0-b721-ce4196011029-GRAPH_ID:0-PID:17323-HOST:lucia-PC_NAME:VOLUME-client-1-RECON_NO:-0, error-xlator: VOLUME-posix [No space left on device]
...
...
[2019-11-20 13:56:26.035575] E [MSGID: 113072] [posix-inode-fd-ops.c:1898:posix_writev] 0-VOLUME-posix: write failed: offset 5318828032, [No space left on device]
[2019-11-20 13:56:26.035599] E [MSGID: 115067] [server-rpc-fops_v2.c:1373:server4_writev_cbk] 0-VOLUME-server: 900805: WRITEV 0 (ca6f6460-e0b3-42bc-b3dc-0be264300010), client: CTX_ID:b19ad7e9-1568-45b0-b721-ce4196011029-GRAPH_ID:0-PID:17323-HOST:lucia-PC_NAME:VOLUME-client-1-RECON_NO:-0, error-xlator: VOLUME-posix [No space left on device]
[2019-11-20 13:56:26.037812] E [MSGID: 113072] [posix-inode-fd-ops.c:1898:posix_writev] 0-VOLUME-posix: write failed: offset 5318565888, [No space left on device]
[2019-11-20 13:56:26.037834] E [MSGID: 115067] [server-rpc-fops_v2.c:1373:server4_writev_cbk] 0-VOLUME-server: 900806: WRITEV 0 (ca6f6460-e0b3-42bc-b3dc-0be264300010), client: CTX_ID:b19ad7e9-1568-45b0-b721-ce4196011029-GRAPH_ID:0-PID:17323-HOST:lucia-PC_NAME:VOLUME-client-1-RECON_NO:-0, error-xlator: VOLUME-posix [No space left on device]
[2019-11-20 13:56:27.561771] E [MSGID: 115067] [server-rpc-fops_v2.c:1373:server4_writev_cbk] 0-VOLUME-server: 900812: WRITEV 0 (ca6f6460-e0b3-42bc-b3dc-0be264300010), client: CTX_ID:b19ad7e9-1568-45b0-b721-ce4196011029-GRAPH_ID:0-PID:17323-HOST:lucia-PC_NAME:VOLUME-client-1-RECON_NO:-0, error-xlator: VOLUME-posix [No space left on device]
[2019-11-20 13:56:29.589826] E [MSGID: 115067] [server-rpc-fops_v2.c:1373:server4_writev_cbk] 0-VOLUME-server: 900819: WRITEV 0 (ca6f6460-e0b3-42bc-b3dc-0be264300010), client: CTX_ID:b19ad7e9-1568-45b0-b721-ce4196011029-GRAPH_ID:0-PID:17323-HOST:lucia-PC_NAME:VOLUME-client-1-RECON_NO:-0, error-xlator: VOLUME-posix [No space left on device] 
[2019-11-20 13:56:29.592014] E [MSGID: 115067] [server-rpc-fops_v2.c:1373:server4_writev_cbk] 0-VOLUME-server: 900821: WRITEV 0 (ca6f6460-e0b3-42bc-b3dc-0be264300010), client: CTX_ID:b19ad7e9-1568-45b0-b721-ce4196011029-GRAPH_ID:0-PID:17323-HOST:lucia-PC_NAME:VOLUME-client-1-RECON_NO:-0, error-xlator: VOLUME-posix [No space left on device]
[2019-11-20 13:56:29.592771] E [MSGID: 115067] [server-rpc-fops_v2.c:1373:server4_writev_cbk] 0-VOLUME-server: 900822: WRITEV 0 (ca6f6460-e0b3-42bc-b3dc-0be264300010), client: CTX_ID:b19ad7e9-1568-45b0-b721-ce4196011029-GRAPH_ID:0-PID:17323-HOST:lucia-PC_NAME:VOLUME-client-1-RECON_NO:-0, error-xlator: VOLUME-posix [No space left on device]
[2019-11-20 13:56:29.595035] E [MSGID: 115067] [server-rpc-fops_v2.c:1373:server4_writev_cbk] 0-VOLUME-server: 900823: WRITEV 0 (ca6f6460-e0b3-42bc-b3dc-0be264300010), client: CTX_ID:b19ad7e9-1568-45b0-b721-ce4196011029-GRAPH_ID:0-PID:17323-HOST:lucia-PC_NAME:VOLUME-client-1-RECON_NO:-0, error-xlator: VOLUME-posix [No space left on device]
[2019-11-20 13:56:29.597448] E [MSGID: 115067] [server-rpc-fops_v2.c:1373:server4_writev_cbk] 0-VOLUME-server: 900824: WRITEV 0 (ca6f6460-e0b3-42bc-b3dc-0be264300010), client: CTX_ID:b19ad7e9-1568-45b0-b721-ce4196011029-GRAPH_ID:0-PID:17323-HOST:lucia-PC_NAME:VOLUME-client-1-RECON_NO:-0, error-xlator: VOLUME-posix [No space left on device]
[2019-11-20 13:56:31.691648] I [MSGID: 115061] [server-rpc-fops_v2.c:991:server4_rename_cbk] 0-VOLUME-server: 900846: RENAME /TEST (00000000-0000-0000-0000-000000000001/TEST) -> /TEST.X (00000000-0000-0000-0000-000000000001/TEST.X), client: CTX_ID:b19ad7e9-1568-45b0-b721-ce4196011029-GRAPH_ID:0-PID:17323-HOST:lucia-PC_NAME:VOLUME-client-1-RECON_NO:-0, error-xlator: VOLUME-posix [No space left on device]
[2019-11-20 13:59:21.168590] E [MSGID: 113072] [posix-inode-fd-ops.c:1898:posix_writev] 0-VOLUME-posix: write failed: offset 5318238208, [No space left on device]
[2019-11-20 13:59:21.189017] E [MSGID: 115067] [server-rpc-fops_v2.c:1373:server4_writev_cbk] 0-VOLUME-server: 942790: WRITEV 1 (6005b4e3-7e72-44f1-aa8a-1a4b7ba308e3), client: CTX_ID:b19ad7e9-1568-45b0-b721-ce4196011029-GRAPH_ID:0-PID:17323-HOST:lucia-PC_NAME:VOLUME-client-1-RECON_NO:-0, error-xlator: VOLUME-posix [No space left on device]
[2019-11-20 13:59:38.114854] W [MSGID: 113117] [posix-metadata.c:627:posix_set_ctime] 0-VOLUME-posix: posix set mdata failed, No ctime : /var/lib/heketi/mounts/vg_2e69e68e3bbecd5bd53f6f8730765169/brick_c95022a8c98ef3a61b7b392c1e957b55/brick/.glusterfs/60/05/6005b4e3-7e72-44f1-aa8a-1a4b7ba308e3 gfid:6005b4e3-7e72-44f1-aa8a-1a4b7ba308e3
The message "W [MSGID: 113117] [posix-metadata.c:627:posix_set_ctime] 0-VOLUME-posix: posix set mdata failed, No ctime : /var/lib/heketi/mounts/vg_2e69e68e3bbecd5bd53f6f8730765169/brick_c95022a8c98ef3a61b7b392c1e957b55/brick/.glusterfs/60/05/6005b4e3-7e72-44f1-aa8a-1a4b7ba308e3 gfid:6005b4e3-7e72-44f1-aa8a-1a4b7ba308e3" repeated 3 times between [2019-11-20 13:59:38.114854] and [2019-11-20 13:59:42.161636]
[2019-11-20 14:04:07.759296] I [MSGID: 100011] [glusterfsd.c:1641:reincarnate] 0-glusterfsd: Fetching the volume file from server...

Comment 1 Mohit Agrawal 2020-02-19 14:31:05 UTC
I believe the issue should be fixed after backporting the patch (https://review.gluster.org/#/c/glusterfs/+/23572/) to release 6.0.

Would it be possible for you to try to reproduce the issue after applying the same patch?

Thanks,
Mohit Agrawal

Comment 2 Rob de Wit 2020-02-27 07:50:46 UTC
@Mohit I need some time to test this properly - will come back and report.

Comment 3 Rob de Wit 2020-02-27 12:09:36 UTC
The original setup is no longer available for testing this. 

When I try to reproduce it on (slower) hardware I cannot reproduce the error, so I can't tell if the patch fixes the problem :-(

Comment 4 Worker Ant 2020-03-12 12:17:44 UTC
This bug has been moved to https://github.com/gluster/glusterfs/issues/862 and will be tracked there from now on. Visit the GitHub issue URL for further details.

