Bug 1565623 - glusterfs disperse volume input output error
Status: CLOSED NOTABUG
Alias: None
Product: GlusterFS
Classification: Community
Component: disperse
Version: mainline
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Xavi Hernandez
Reported: 2018-04-10 12:49 UTC by Alexey Shcherbakov
Modified: 2018-06-20 18:10 UTC

Last Closed: 2018-04-11 14:32:53 UTC

Attachments
glusterfs-logs (1.92 MB, application/x-gzip)
2018-04-10 14:39 UTC, Alexey Shcherbakov
glusterfs-all-nodes-logs Part-1 (19.00 MB, application/x-7z-compressed)
2018-04-10 19:22 UTC, Alexey Shcherbakov
glusterfs-all-nodes-logs Part-2 (13.60 MB, application/octet-stream)
2018-04-10 19:32 UTC, Alexey Shcherbakov

Description Alexey Shcherbakov 2018-04-10 12:49:50 UTC
Description of problem:

After rebooting one of the glusterfs nodes, I have a problem with a file on a disperse volume. When I try to read it from the mount point, I receive an error:

# md5sum /mnt/glfs/vmfs/slake-test-bck-m1-d1.qcow2
md5sum: /mnt/glfs/vmfs/slake-test-bck-m1-d1.qcow2: Input/output error


# gluster --version
glusterfs 3.9.0 built on Nov 22 2016 17:08:59
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GlusterFS under the terms of the GNU General Public License.


Configuration and status of the volume:


# gluster volume info vol1
 
Volume Name: vol1
Type: Disperse
Volume ID: a7d52933-fccc-4b07-9c3b-5b92f398aa79
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (13 + 2) = 15
Transport-type: tcp
Bricks:
Brick1: glfs-node11.local:/data1/bricks/brick1
Brick2: glfs-node12.local:/data1/bricks/brick1
Brick3: glfs-node13.local:/data1/bricks/brick1
Brick4: glfs-node14.local:/data1/bricks/brick1
Brick5: glfs-node15.local:/data1/bricks/brick1
Brick6: glfs-node16.local:/data1/bricks/brick1
Brick7: glfs-node17.local:/data1/bricks/brick1
Brick8: glfs-node18.local:/data1/bricks/brick1
Brick9: glfs-node19.local:/data1/bricks/brick1
Brick10: glfs-node20.local:/data1/bricks/brick1
Brick11: glfs-node21.local:/data1/bricks/brick1
Brick12: glfs-node22.local:/data1/bricks/brick1
Brick13: glfs-node23.local:/data1/bricks/brick1
Brick14: glfs-node24.local:/data1/bricks/brick1
Brick15: glfs-node25.local:/data1/bricks/brick1
Options Reconfigured:
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on


# gluster volume status vol1
Status of volume: vol1
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick glfs-node11.local:/data1/bricks/brick
1                                           49152     0          Y       1781
Brick glfs-node12.local:/data1/bricks/brick
1                                           49152     0          Y       3026
Brick glfs-node13.local:/data1/bricks/brick
1                                           49152     0          Y       1991
Brick glfs-node14.local:/data1/bricks/brick
1                                           49152     0          Y       2029
Brick glfs-node15.local:/data1/bricks/brick
1                                           49152     0          Y       1745
Brick glfs-node16.local:/data1/bricks/brick
1                                           49152     0          Y       1841
Brick glfs-node17.local:/data1/bricks/brick
1                                           49152     0          Y       3597
Brick glfs-node18.local:/data1/bricks/brick
1                                           49152     0          Y       2035
Brick glfs-node19.local:/data1/bricks/brick
1                                           49152     0          Y       1785
Brick glfs-node20.local:/data1/bricks/brick
1                                           49152     0          Y       1755
Brick glfs-node21.local:/data1/bricks/brick
1                                           49152     0          Y       1772
Brick glfs-node22.local:/data1/bricks/brick
1                                           49152     0          Y       1757
Brick glfs-node23.local:/data1/bricks/brick
1                                           49152     0          Y       1825
Brick glfs-node24.local:/data1/bricks/brick
1                                           49152     0          Y       1963
Brick glfs-node25.local:/data1/bricks/brick
1                                           49152     0          Y       2376
Self-heal Daemon on localhost               N/A       N/A        Y       2018
Self-heal Daemon on glfs-node15.local       N/A       N/A        Y       38261
Self-heal Daemon on glfs-node16.local       N/A       N/A        Y       36005
Self-heal Daemon on glfs-node12.local       N/A       N/A        Y       25785
Self-heal Daemon on glfs-node27.local       N/A       N/A        Y       13248
Self-heal Daemon on glfs-node19.local       N/A       N/A        Y       38535
Self-heal Daemon on glfs-node18.local       N/A       N/A        Y       21067
Self-heal Daemon on glfs-node21.local       N/A       N/A        Y       5926
Self-heal Daemon on glfs-node22.local       N/A       N/A        Y       12980
Self-heal Daemon on glfs-node23.local       N/A       N/A        Y       8368
Self-heal Daemon on glfs-node26.local       N/A       N/A        Y       8268
Self-heal Daemon on glfs-node25.local       N/A       N/A        Y       7872
Self-heal Daemon on glfs-node17.local       N/A       N/A        Y       15884
Self-heal Daemon on glfs-node11.local       N/A       N/A        Y       36075
Self-heal Daemon on glfs-node24.local       N/A       N/A        Y       37905
Self-heal Daemon on glfs-node30.local       N/A       N/A        Y       31820
Self-heal Daemon on glfs-node14.local       N/A       N/A        Y       3236
Self-heal Daemon on glfs-node13.local       N/A       N/A        Y       25817
Self-heal Daemon on glfs-node29.local       N/A       N/A        Y       21261
Self-heal Daemon on glfs-node28.local       N/A       N/A        Y       32641
 
Task Status of Volume vol1
------------------------------------------------------------------------------
There are no active volume tasks



And heal info shows me this:


# gluster volume heal vol1 info
Brick glfs-node11.local:/data1/bricks/brick1
/vmfs/slake-test-bck-m1-d1.qcow2
Status: Connected
Number of entries: 1

Brick glfs-node12.local:/data1/bricks/brick1
/vmfs/slake-test-bck-m1-d1.qcow2
Status: Connected
Number of entries: 1

Brick glfs-node13.local:/data1/bricks/brick1
/vmfs/slake-test-bck-m1-d1.qcow2
Status: Connected
Number of entries: 1

Brick glfs-node14.local:/data1/bricks/brick1
/vmfs/slake-test-bck-m1-d1.qcow2
Status: Connected
Number of entries: 1

Brick glfs-node15.local:/data1/bricks/brick1
/vmfs/slake-test-bck-m1-d1.qcow2
Status: Connected
Number of entries: 1

Brick glfs-node16.local:/data1/bricks/brick1
/vmfs/slake-test-bck-m1-d1.qcow2
Status: Connected
Number of entries: 1

Brick glfs-node17.local:/data1/bricks/brick1
/vmfs/slake-test-bck-m1-d1.qcow2
Status: Connected
Number of entries: 1

Brick glfs-node18.local:/data1/bricks/brick1
/vmfs/slake-test-bck-m1-d1.qcow2
Status: Connected
Number of entries: 1

Brick glfs-node19.local:/data1/bricks/brick1
/vmfs/slake-test-bck-m1-d1.qcow2
Status: Connected
Number of entries: 1

Brick glfs-node20.local:/data1/bricks/brick1
/vmfs/slake-test-bck-m1-d1.qcow2
Status: Connected
Number of entries: 1

Brick glfs-node21.local:/data1/bricks/brick1
/vmfs/slake-test-bck-m1-d1.qcow2
Status: Connected
Number of entries: 1

Brick glfs-node22.local:/data1/bricks/brick1
/vmfs/slake-test-bck-m1-d1.qcow2
Status: Connected
Number of entries: 1

Brick glfs-node23.local:/data1/bricks/brick1
/vmfs/slake-test-bck-m1-d1.qcow2
Status: Connected
Number of entries: 1

Brick glfs-node24.local:/data1/bricks/brick1
/vmfs/slake-test-bck-m1-d1.qcow2
Status: Connected
Number of entries: 1

Brick glfs-node25.local:/data1/bricks/brick1
/vmfs/slake-test-bck-m1-d1.qcow2
Status: Connected
Number of entries: 1

Other data files on the volume are accessible.

How can I recover the file (/vmfs/slake-test-bck-m1-d1.qcow2) from this volume?

Comment 1 Xavi Hernandez 2018-04-10 14:05:18 UTC
What version of gluster are you using?

I will need the output of the following command from all bricks of the volume:

    getfattr -m. -e hex -d /data1/bricks/brick1/vmfs/slake-test-bck-m1-d1.qcow2

Was there any issue before rebooting the node? Can you upload the logs from the time before the node was rebooted?

Comment 2 Alexey Shcherbakov 2018-04-10 14:37:53 UTC
(In reply to Xavi Hernandez from comment #1)
> What version of gluster are you using?
> 
> I will need the output of the following command from all bricks of the
> volume:
> 
>     getfattr -m. -e hex -d
> /data1/bricks/brick1/vmfs/slake-test-bck-m1-d1.qcow2
> 
> Was there any issue before rebooting the node? Can you upload the logs from
> the time before the node was rebooted?


glusterfs 3.9.0 built on Nov 22 2016 17:08:59

#  getfattr -m. -e hex -d /data1/bricks/brick1/vmfs/slake-test-bck-m1-d1.qcow2
getfattr: Removing leading '/' from absolute path names
# file: data1/bricks/brick1/vmfs/slake-test-bck-m1-d1.qcow2
trusted.bit-rot.version=0x03000000000000005a2fb10a00015e43
trusted.ec.config=0x0000080f02000200
trusted.ec.dirty=0x0000000000006aca0000000000000000
trusted.ec.size=0x0000000b83f30000
trusted.ec.version=0x0000000004e1910e0000000004e19112
trusted.gfid=0x9e236221dbe04096ae4a5546cde59b1d

Comment 3 Alexey Shcherbakov 2018-04-10 14:39:41 UTC
Created attachment 1419914
glusterfs-logs

Comment 4 Alexey Shcherbakov 2018-04-10 14:46:15 UTC
Output of the command from all nodes:


glfs-node11.avp.ru
getfattr: Removing leading '/' from absolute path names
# file: data1/bricks/brick1/vmfs/slake-test-bck-m1-d1.qcow2
trusted.bit-rot.version=0x020000000000000059bffb9500077191
trusted.ec.config=0x0000080f02000200
trusted.ec.dirty=0x0000000000006aca0000000000000000
trusted.ec.size=0x0000000b83f30000
trusted.ec.version=0x0000000004e1910e0000000004e19112
trusted.gfid=0x9e236221dbe04096ae4a5546cde59b1d

glfs-node12.avp.ru
getfattr: Removing leading '/' from absolute path names
# file: data1/bricks/brick1/vmfs/slake-test-bck-m1-d1.qcow2
trusted.bit-rot.version=0x020000000000000059bffeb000040e04
trusted.ec.config=0x0000080f02000200
trusted.ec.dirty=0x0000000000006aca0000000000000000
trusted.ec.size=0x0000000b83f30000
trusted.ec.version=0x0000000004e1910e0000000004e19112
trusted.gfid=0x9e236221dbe04096ae4a5546cde59b1d

glfs-node13.avp.ru
getfattr: Removing leading '/' from absolute path names
# file: data1/bricks/brick1/vmfs/slake-test-bck-m1-d1.qcow2
trusted.bit-rot.version=0x04000000000000005a44a9eb00071d91
trusted.ec.config=0x0000080f02000200
trusted.ec.dirty=0x00000000000058670000000000000000
trusted.ec.size=0x0000000b83f30000
trusted.ec.version=0x4000000004e0e29b0000000004e19112
trusted.gfid=0x9e236221dbe04096ae4a5546cde59b1d

glfs-node14.avp.ru
# file: data1/bricks/brick1/vmfs/slake-test-bck-m1-d1.qcow2
trusted.bit-rot.version=0x020000000000000059c0040a0007ab0b
trusted.ec.config=0x0000080f02000200
trusted.ec.dirty=0x0000000000006aca0000000000000000
trusted.ec.size=0x0000000b83f30000
trusted.ec.version=0x0000000004e1910e0000000004e19112
trusted.gfid=0x9e236221dbe04096ae4a5546cde59b1d

getfattr: Removing leading '/' from absolute path names
glfs-node15.avp.ru
# file: data1/bricks/brick1/vmfs/slake-test-bck-m1-d1.qcow2
trusted.bit-rot.version=0x020000000000000059bfc284000a0f4b
trusted.ec.config=0x0000080f02000200
trusted.ec.dirty=0x0000000000006aca0000000000000000
trusted.ec.size=0x0000000b83f30000
trusted.ec.version=0x0000000004e1910e0000000004e19112
trusted.gfid=0x9e236221dbe04096ae4a5546cde59b1d

getfattr: Removing leading '/' from absolute path names
glfs-node16.avp.ru
getfattr: Removing leading '/' from absolute path names
# file: data1/bricks/brick1/vmfs/slake-test-bck-m1-d1.qcow2
trusted.bit-rot.version=0x020000000000000059f176f00009b9b3
trusted.ec.config=0x0000080f02000200
trusted.ec.dirty=0x0000000000006aca0000000000000000
trusted.ec.size=0x0000000b83f30000
trusted.ec.version=0x0000000004e1910e0000000004e19112
trusted.gfid=0x9e236221dbe04096ae4a5546cde59b1d

glfs-node17.avp.ru
getfattr: Removing leading '/' from absolute path names
# file: data1/bricks/brick1/vmfs/slake-test-bck-m1-d1.qcow2
trusted.bit-rot.version=0x020000000000000059bfced500026426
trusted.ec.config=0x0000080f02000200
trusted.ec.dirty=0x0000000000006aca0000000000000000
trusted.ec.size=0x0000000b83f30000
trusted.ec.version=0x0000000004e1910e0000000004e19112
trusted.gfid=0x9e236221dbe04096ae4a5546cde59b1d

glfs-node18.avp.ru
getfattr: Removing leading '/' from absolute path names
# file: data1/bricks/brick1/vmfs/slake-test-bck-m1-d1.qcow2
trusted.bit-rot.version=0x03000000000000005a2fb10a00015e43
trusted.ec.config=0x0000080f02000200
trusted.ec.dirty=0x0000000000006aca0000000000000000
trusted.ec.size=0x0000000b83f30000
trusted.ec.version=0x0000000004e1910e0000000004e19112
trusted.gfid=0x9e236221dbe04096ae4a5546cde59b1d

glfs-node19.avp.ru
getfattr: Removing leading '/' from absolute path names
# file: data1/bricks/brick1/vmfs/slake-test-bck-m1-d1.qcow2
trusted.ec.config=0x0000080f02000200
trusted.ec.version=0x00000000000000000000000004e19112
trusted.gfid=0x9e236221dbe04096ae4a5546cde59b1d

glfs-node20.avp.ru
getfattr: Removing leading '/' from absolute path names
# file: data1/bricks/brick1/vmfs/slake-test-bck-m1-d1.qcow2
trusted.bit-rot.version=0x020000000000000059fb19cc000ca1ed
trusted.ec.config=0x0000080f02000200
trusted.ec.dirty=0x0000000000006aca0000000000000000
trusted.ec.size=0x0000000b83f30000
trusted.ec.version=0x0000000004e1910e0000000004e19112
trusted.gfid=0x9e236221dbe04096ae4a5546cde59b1d

glfs-node21.avp.ru
getfattr: Removing leading '/' from absolute path names
# file: data1/bricks/brick1/vmfs/slake-test-bck-m1-d1.qcow2
trusted.bit-rot.version=0x03000000000000005a3296e90003f74d
trusted.ec.config=0x0000080f02000200
trusted.ec.dirty=0x0000000000006aca0000000000000000
trusted.ec.size=0x0000000b83f30000
trusted.ec.version=0x0000000004e1910e0000000004e19112
trusted.gfid=0x9e236221dbe04096ae4a5546cde59b1d

glfs-node22.avp.ru
getfattr: Removing leading '/' from absolute path names
# file: data1/bricks/brick1/vmfs/slake-test-bck-m1-d1.qcow2
trusted.bit-rot.version=0x020000000000000059fb1a7900027722
trusted.ec.config=0x0000080f02000200
trusted.ec.dirty=0x0000000000006aca0000000000000000
trusted.ec.size=0x0000000b83f30000
trusted.ec.version=0x0000000004e1910e0000000004e19112
trusted.gfid=0x9e236221dbe04096ae4a5546cde59b1d

glfs-node23.avp.ru
getfattr: Removing leading '/' from absolute path names
# file: data1/bricks/brick1/vmfs/slake-test-bck-m1-d1.qcow2
trusted.bit-rot.version=0x020000000000000059bfd5d5000f3484
trusted.ec.config=0x0000080f02000200
trusted.ec.dirty=0x0000000000006aca0000000000000000
trusted.ec.size=0x0000000b83f30000
trusted.ec.version=0x0000000004e1910e0000000004e19112
trusted.gfid=0x9e236221dbe04096ae4a5546cde59b1d

glfs-node24.avp.ru
# file: data1/bricks/brick1/vmfs/slake-test-bck-m1-d1.qcow2
trusted.bit-rot.version=0x020000000000000059bfd8e7000962e5
trusted.ec.config=0x0000080f02000200
trusted.ec.dirty=0x0000000000006aca0000000000000000
trusted.ec.size=0x0000000b83f30000
trusted.ec.version=0x0000000004e1910e0000000004e19112
trusted.gfid=0x9e236221dbe04096ae4a5546cde59b1d

getfattr: Removing leading '/' from absolute path names
glfs-node25.avp.ru
getfattr: Removing leading '/' from absolute path names
# file: data1/bricks/brick1/vmfs/slake-test-bck-m1-d1.qcow2
trusted.bit-rot.version=0x06000000000000005a6090cf0006c90f
trusted.ec.config=0x0000080f02000200
trusted.ec.dirty=0x00000000000058670000000000000000
trusted.ec.size=0x0000000b83f30000
trusted.ec.version=0x4000000004e0e29b0000000004e19112
trusted.gfid=0x9e236221dbe04096ae4a5546cde59b1d
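
For readers comparing the dumps above, the hex values can be decoded with a short script. This is a minimal sketch, not a GlusterFS tool. It assumes that trusted.ec.version holds two big-endian 64-bit counters (data first, then metadata), that trusted.ec.size is a single big-endian 64-bit byte count, and that trusted.ec.config packs five 8-bit fields followed by a 24-bit chunk size. That reading is consistent with the 15-brick, redundancy-2 volume in this report, but the layout and all function names here are assumptions.

```python
import struct


def _raw(hex_value):
    """Turn a 'getfattr -e hex' value such as '0x0000...' into bytes."""
    return bytes.fromhex(hex_value[2:] if hex_value.startswith("0x") else hex_value)


def decode_pair(hex_value):
    """Assumed layout: two big-endian 64-bit counters (data, metadata)."""
    return struct.unpack(">QQ", _raw(hex_value))


def decode_size(hex_value):
    """Assumed layout: one big-endian 64-bit file size in bytes."""
    return struct.unpack(">Q", _raw(hex_value))[0]


def decode_config(hex_value):
    """Assumed layout: five 8-bit fields, then a 24-bit chunk size."""
    word = struct.unpack(">Q", _raw(hex_value))[0]
    return {
        "version": (word >> 56) & 0xFF,
        "algorithm": (word >> 48) & 0xFF,
        "gf_word_size": (word >> 40) & 0xFF,
        "bricks": (word >> 32) & 0xFF,
        "redundancy": (word >> 24) & 0xFF,
        "chunk_size": word & 0xFFFFFF,
    }


# Values copied from the getfattr output above.
print(decode_config("0x0000080f02000200"))
print(decode_size("0x0000000b83f30000"))
for node, value in (
    ("glfs-node11", "0x0000000004e1910e0000000004e19112"),
    ("glfs-node13", "0x4000000004e0e29b0000000004e19112"),
    ("glfs-node19", "0x00000000000000000000000004e19112"),
):
    data, meta = decode_pair(value)
    print(node, hex(data), hex(meta))
```

Under these assumptions the config decodes to 15 bricks, redundancy 2 and a 512-byte chunk, and the size decodes to 49458380800 bytes. The metadata counter is the same on all three sample bricks; only the data counter differs.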

Comment 5 Xavi Hernandez 2018-04-10 17:03:22 UTC
Which node did you reboot before seeing those errors? At what time was the node rebooted?

What I see from the data you posted is that glfs-node13.avp.ru and glfs-node25.avp.ru were doing a heal at some point (probably at the time of reboot, but I'm not completely sure yet). This is ok because your configuration allows 2 bad bricks, but we have another node with mismatching data: glfs-node19.avp.ru.

This is what is causing the EIO error, since we have 3 failures but a maximum of 2 are allowed.

We can try to determine if one of the mismatching versions is good enough to be considered as good and recover the file.

I still need to check the logs to see if there's more information. In the meantime, knowing which node was rebooted and at what time will be very useful to analyze the logs.
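
The counting argument above can be sketched as follows. This is a simplified illustration, not the logic of the disperse translator itself: it only groups bricks by their trusted.ec.version value as posted in comment 4 and checks whether the largest group reaches the 13 data fragments of a 13 + 2 volume. The function name `check` and the constant names are hypothetical.

```python
from collections import Counter

DATA_FRAGMENTS = 13  # from "Number of Bricks: 1 x (13 + 2) = 15"

GOOD = "0x0000000004e1910e0000000004e19112"
HEALING = "0x4000000004e0e29b0000000004e19112"
REBOOTED = "0x00000000000000000000000004e19112"

# trusted.ec.version per brick, as posted in comment 4.
versions = {f"glfs-node{n}": GOOD for n in range(11, 26)}
versions["glfs-node13"] = HEALING
versions["glfs-node25"] = HEALING
versions["glfs-node19"] = REBOOTED


def check(versions, data_fragments):
    """Return (enough_matching, outliers) for the largest group of equal versions."""
    majority, count = Counter(versions.values()).most_common(1)[0]
    outliers = sorted(node for node, v in versions.items() if v != majority)
    return count >= data_fragments, outliers


ok, outliers = check(versions, DATA_FRAGMENTS)
print("readable:", ok)
print("outliers:", outliers)
```

With the posted values only 12 bricks agree, so the sketch reports the file as not readable and lists glfs-node13, glfs-node19 and glfs-node25 as the mismatching bricks.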

Comment 6 Alexey Shcherbakov 2018-04-10 19:22:12 UTC
Created attachment 1420021
glusterfs-all-nodes-logs Part-1

Comment 7 Alexey Shcherbakov 2018-04-10 19:32:43 UTC
Created attachment 1420023
glusterfs-all-nodes-logs Part-2

Comment 8 Alexey Shcherbakov 2018-04-10 19:49:17 UTC
(In reply to Xavi Hernandez from comment #5)
> Which node did you reboot before seeing those errors? At what time was the
> node rebooted?
> 

Only node glfs-node19.avp.ru was rebooted; the rest worked normally. The last time (UTC) the qcow disk image worked:

-rwxrwx--- 1  107  107  47G apr  6 06:37 slake-test-bck-m1-d1.qcow2

and the log from the virtual machine whose disk was disconnected at this time:

[2018-04-06 06:39:04.177631] E [MSGID: 114031] [client-rpc-fops.c:1550:client3_3_inodelk_cbk] 0-vol1-client-8: remote operation failed [Transport endpoint is not connected]
[2018-04-06 06:39:04.189701] E [rpc-clnt.c:365:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7fe0801186fb] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7fe08b20b79e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fe08b20b8ae] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x84)[0x7fe08b20d004] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x110)[0x7fe08b20d8d0] ))))) 0-vol1-client-8: forced unwinding frame type(GF-DUMP) op(NULL(2)) called at 2018-04-06 06:38:22.143778 (xid=0xc908c31a)
block I/O error in device 'drive-virtio-disk0': Operation not permitted (1)
block I/O error in device 'drive-virtio-disk0': Operation not permitted (1)
[2018-04-06 06:39:04.334167] E [MSGID: 122034] [ec-common.c:461:ec_child_select] 0-vol1-disperse-0: Insufficient available children for this request (have 0, need 13)
block I/O error in device 'drive-virtio-disk0': Operation not permitted (1)
block I/O error in device 'drive-virtio-disk0': Operation not permitted (1)



> What I see from the data you posted is that glfs-node13.avp.ru and
> glfs-node25.avp.ru were doing a heal at some point (probably at the time of
> reboot, but I'm not completely sure yet). This is ok because your
> configuration allows 2 bad bricks, but we have another node with mismatching
> data: glfs-node19.avp.ru.
> 
> This is what is causing the EIO error, since we have 3 failures but a
> maximum of 2 are allowed.
> 
> We can try to determine if one of the mismatching versions is good enough to
> be considered as good and recover the file.
> 
> I still need to check the logs to see if there's more information. In the
> meantime, knowing which node was rebooted and at what time will be very
> useful to analyze the logs.

I collected and attached more logs from all nodes.

Comment 9 Xavi Hernandez 2018-04-11 06:18:46 UTC
I'll analyze the logs. Once this is solved, I strongly recommend that you upgrade to 3.12, since 3.9 is no longer maintained and 3.10 will be EOL soon.

Comment 10 Xavi Hernandez 2018-04-11 09:46:40 UTC
Everything seems to indicate that a heal was happening on nodes glfs-node13.avp.ru and glfs-node25.avp.ru at the time of restarting node glfs-node19.avp.ru. Unfortunately this coincided with a modification that caused 3 simultaneous failures on the file.

We need to manually repair the file or recover it from a backup.

To recover the file manually we have two options:

1. Guess which of the 3 bad fragments is "less" bad. Probably the best candidate would be the fragment on node glfs-node19.avp.ru, but we need to check it. It would be interesting to see the modification times of all fragments on all bricks. To do so we can execute 'stat /data1/bricks/brick1/vmfs/slake-test-bck-m1-d1.qcow2'. This will help us to decide, but it's not a 100% reliable way to determine the best option.

2. Try to check integrity of fragments. To do this we'll need to develop a small tool able to do the check. It will require some time but it will tell us if the file is good or, if there's something bad, we'll know where the problem is (in which block). The advantage of this method is that unless the 3 bad fragments are damaged on the same block, we may be able to recover the whole file.

Comment 11 Alexey Shcherbakov 2018-04-11 10:52:03 UTC
(In reply to Xavi Hernandez from comment #10)
> Everything seems to indicate that a heal was happening on nodes
> glfs-node13.avp.ru and glfs-node25.avp.ru at the time of restarting node
> glfs-node19.avp.ru. Unfortunately this coincided with a modification that
> caused 3 simultaneous failures on the file.
> 
> We need to manually repair the file or recover it from a backup.
> 
> To recover the file manually we have two options:
> 
> 1. Guess which of the 3 bad fragments is "less" bad. Probably the best
> candidate would be the fragment on node glfs-node19.avp.ru, but we need to
> check it. It would be interesting to see the modification times of all
> fragments on all bricks. To do so we can execute 'stat
> /data1/bricks/brick1/vmfs/slake-test-bck-m1-d1.qcow2'. This will help us to
> decide, but it's not a 100% reliable way to determine the best option.
> 
> 2. Try to check integrity of fragments. To do this we'll need to develop a
> small tool able to do the check. It will require some time but it will tell
> us if the file is good or, if there's something bad, we'll know where the
> problem is (in which block). The advantage of this method is that unless the
> 3 bad fragments are damaged on the same block, we may be able to recover the
> whole file.

Stat command result on all nodes:

glfs-node11.avp.ru
  File: ‘/data1/bricks/brick1/vmfs/slake-test-bck-m1-d1.qcow2’
  Size: 3804491264	Blocks: 7430656    IO Block: 4096   regular file
Device: 815h/2069d	Inode: 81618486372  Links: 2
Access: (0770/-rwxrwx---)  Uid: (  107/ UNKNOWN)   Gid: (  107/ UNKNOWN)
Access: 2018-04-06 09:37:40.086093439 +0300
Modify: 2018-04-06 09:37:40.056093815 +0300
Change: 2018-04-10 12:05:31.442268179 +0300
 Birth: -
glfs-node12.avp.ru
  File: ‘/data1/bricks/brick1/vmfs/slake-test-bck-m1-d1.qcow2’
  Size: 3804491264	Blocks: 7430656    IO Block: 4096   regular file
Device: 815h/2069d	Inode: 77312455458  Links: 2
Access: (0770/-rwxrwx---)  Uid: (  107/ UNKNOWN)   Gid: (  107/ UNKNOWN)
Access: 2018-04-06 09:37:40.086927017 +0300
Modify: 2018-04-06 09:37:40.056927261 +0300
Change: 2018-04-10 12:05:31.442493868 +0300
 Birth: -
glfs-node13.avp.ru
  File: ‘/data1/bricks/brick1/vmfs/slake-test-bck-m1-d1.qcow2’
  Size: 2351140352	Blocks: 2898080    IO Block: 4096   regular file
Device: 815h/2069d	Inode: 77309833852  Links: 2
Access: (0770/-rwxrwx---)  Uid: (  107/ UNKNOWN)   Gid: (  107/ UNKNOWN)
Access: 2018-04-06 09:19:41.347484000 +0300
Modify: 2018-04-06 09:37:43.501667158 +0300
Change: 2018-04-10 12:05:31.443246800 +0300
 Birth: -
glfs-node14.avp.ru
  File: ‘/data1/bricks/brick1/vmfs/slake-test-bck-m1-d1.qcow2’
  Size: 3804491264	Blocks: 7430656    IO Block: 4096   regular file
Device: 815h/2069d	Inode: 45097173507  Links: 2
Access: (0770/-rwxrwx---)  Uid: (  107/ UNKNOWN)   Gid: (  107/ UNKNOWN)
Access: 2018-04-06 09:37:40.085762909 +0300
Modify: 2018-04-06 09:37:40.055763093 +0300
Change: 2018-04-10 12:05:31.442878601 +0300
 Birth: -
glfs-node15.avp.ru
  File: ‘/data1/bricks/brick1/vmfs/slake-test-bck-m1-d1.qcow2’
  Size: 3804491264	Blocks: 7430656    IO Block: 4096   regular file
Device: 815h/2069d	Inode: 47297640679  Links: 2
Access: (0770/-rwxrwx---)  Uid: (  107/ UNKNOWN)   Gid: (  107/ UNKNOWN)
Access: 2018-04-06 09:37:40.084025418 +0300
Modify: 2018-04-06 09:37:40.054025522 +0300
Change: 2018-04-10 12:05:31.443444637 +0300
 Birth: -
glfs-node16.avp.ru
  File: ‘/data1/bricks/brick1/vmfs/slake-test-bck-m1-d1.qcow2’
  Size: 3804491264	Blocks: 7430656    IO Block: 4096   regular file
Device: 815h/2069d	Inode: 62287697396  Links: 2
Access: (0770/-rwxrwx---)  Uid: (  107/ UNKNOWN)   Gid: (  107/ UNKNOWN)
Access: 2018-04-06 09:37:40.084973842 +0300
Modify: 2018-04-06 09:37:40.054974225 +0300
Change: 2018-04-10 12:05:31.450825340 +0300
 Birth: -
glfs-node17.avp.ru
  File: ‘/data1/bricks/brick1/vmfs/slake-test-bck-m1-d1.qcow2’
  Size: 3804491264	Blocks: 7430656    IO Block: 4096   regular file
Device: 815h/2069d	Inode: 25769816893  Links: 2
Access: (0770/-rwxrwx---)  Uid: (  107/ UNKNOWN)   Gid: (  107/ UNKNOWN)
Access: 2018-04-06 09:37:40.087142929 +0300
Modify: 2018-04-06 09:37:40.057143241 +0300
Change: 2018-04-10 12:05:31.444507483 +0300
 Birth: -
glfs-node18.avp.ru
  File: ‘/data1/bricks/brick1/vmfs/slake-test-bck-m1-d1.qcow2’
  Size: 3804491264	Blocks: 7430656    IO Block: 4096   regular file
Device: 815h/2069d	Inode: 81645506748  Links: 2
Access: (0770/-rwxrwx---)  Uid: (  107/ UNKNOWN)   Gid: (  107/ UNKNOWN)
Access: 2018-04-06 09:37:40.085410682 +0300
Modify: 2018-04-06 09:37:40.055411031 +0300
Change: 2018-04-10 12:05:31.445081386 +0300
 Birth: -
glfs-node19.avp.ru
  File: ‘/data1/bricks/brick1/vmfs/slake-test-bck-m1-d1.qcow2’
  Size: 3804491264	Blocks: 7430648    IO Block: 4096   regular file
Device: 815h/2069d	Inode: 55834592083  Links: 2
Access: (0770/-rwxrwx---)  Uid: (  107/ UNKNOWN)   Gid: (  107/ UNKNOWN)
Access: 2018-04-06 09:37:39.000000000 +0300
Modify: 2018-04-06 09:37:39.000000000 +0300
Change: 2018-04-10 13:43:32.589505057 +0300
 Birth: -
glfs-node20.avp.ru
  File: ‘/data1/bricks/brick1/vmfs/slake-test-bck-m1-d1.qcow2’
  Size: 3804491264	Blocks: 7430656    IO Block: 4096   regular file
Device: 815h/2069d	Inode: 64483022647  Links: 2
Access: (0770/-rwxrwx---)  Uid: (  107/ UNKNOWN)   Gid: (  107/ UNKNOWN)
Access: 2018-04-06 09:37:40.086797717 +0300
Modify: 2018-04-06 09:37:40.056797940 +0300
Change: 2018-04-10 12:05:31.447094554 +0300
 Birth: -
glfs-node21.avp.ru
  File: ‘/data1/bricks/brick1/vmfs/slake-test-bck-m1-d1.qcow2’
  Size: 3804491264	Blocks: 7430648    IO Block: 4096   regular file
Device: 815h/2069d	Inode: 30399803420  Links: 2
Access: (0770/-rwxrwx---)  Uid: (  107/ UNKNOWN)   Gid: (  107/ UNKNOWN)
Access: 2018-04-06 09:37:40.084872465 +0300
Modify: 2018-04-06 09:37:40.054872712 +0300
Change: 2018-04-10 12:05:31.445816033 +0300
 Birth: -
glfs-node22.avp.ru
  File: ‘/data1/bricks/brick1/vmfs/slake-test-bck-m1-d1.qcow2’
  Size: 3804491264	Blocks: 7430656    IO Block: 4096   regular file
Device: 815h/2069d	Inode: 10740329468  Links: 2
Access: (0770/-rwxrwx---)  Uid: (  107/ UNKNOWN)   Gid: (  107/ UNKNOWN)
Access: 2018-04-06 09:37:40.087255449 +0300
Modify: 2018-04-06 09:37:40.057255732 +0300
Change: 2018-04-10 12:05:31.447168330 +0300
 Birth: -
glfs-node23.avp.ru
  File: ‘/data1/bricks/brick1/vmfs/slake-test-bck-m1-d1.qcow2’
  Size: 3804491264	Blocks: 7430656    IO Block: 4096   regular file
Device: 815h/2069d	Inode: 73019182467  Links: 2
Access: (0770/-rwxrwx---)  Uid: (  107/ UNKNOWN)   Gid: (  107/ UNKNOWN)
Access: 2018-04-06 09:37:40.088930999 +0300
Modify: 2018-04-06 09:37:40.058931201 +0300
Change: 2018-04-10 12:05:31.449991404 +0300
 Birth: -
glfs-node24.avp.ru
  File: ‘/data1/bricks/brick1/vmfs/slake-test-bck-m1-d1.qcow2’
  Size: 3804491264	Blocks: 7430656    IO Block: 4096   regular file
Device: 815h/2069d	Inode: 219231      Links: 2
Access: (0770/-rwxrwx---)  Uid: (  107/    qemu)   Gid: (  107/    qemu)
Access: 2018-04-06 09:37:40.087524522 +0300
Modify: 2018-04-06 09:37:40.057524854 +0300
Change: 2018-04-10 12:05:31.448713916 +0300
 Birth: -
glfs-node25.avp.ru
  File: ‘/data1/bricks/brick1/vmfs/slake-test-bck-m1-d1.qcow2’
  Size: 2351140352	Blocks: 2898080    IO Block: 4096   regular file
Device: 815h/2069d	Inode: 55834924933  Links: 2
Access: (0770/-rwxrwx---)  Uid: (  107/    qemu)   Gid: (  107/    qemu)
Access: 2018-04-06 09:19:41.347484000 +0300
Modify: 2018-04-06 09:37:43.502748628 +0300
Change: 2018-04-10 12:05:31.449630601 +0300
 Birth: -

Comment 12 Xavi Hernandez 2018-04-11 11:31:41 UTC
From this data, we can clearly see that nodes glfs-node13.avp.ru and glfs-node25.avp.ru have incomplete fragments (their size is smaller than the others) because they were in the middle of a heal operation.
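
The size comparison can be checked numerically. The sketch below assumes that the file is stored in whole stripes of 13 x 512 bytes and that each brick holds 512 bytes per stripe; the helper name `fragment_size` is hypothetical. Under that assumption the expected fragment size derived from trusted.ec.size matches the 3804491264 bytes reported by stat on the complete bricks.

```python
def fragment_size(file_size, data_fragments=13, chunk_size=512):
    """Expected per-brick fragment size, assuming whole stripes of
    data_fragments * chunk_size bytes with chunk_size bytes per brick."""
    stripe = data_fragments * chunk_size
    stripes = -(-file_size // stripe)  # ceiling division
    return stripes * chunk_size


file_size = int("0x0000000b83f30000", 16)  # trusted.ec.size from comment 4

# Size reported by stat for a few of the bricks in comment 11.
stat_sizes = {
    "glfs-node11": 3804491264,
    "glfs-node13": 2351140352,
    "glfs-node19": 3804491264,
    "glfs-node25": 2351140352,
}

expected = fragment_size(file_size)
incomplete = sorted(node for node, size in stat_sizes.items() if size < expected)
print("expected fragment size:", expected)
print("incomplete fragments:", incomplete)
```

Only glfs-node13 and glfs-node25 fall short of the expected size; the fragment on glfs-node19 has the full length.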

So the best possibility is to consider the fragment on glfs-node19.avp.ru as good. It has the correct size, but its modification time differs by one second compared to the others. It's possible that any change made during this time will contain garbage data now.

Does this qcow image correspond to a machine with heavy disk activity?

We can proceed with the recovery of the fragment on glfs-node19.avp.ru and see what happens, or wait to see if we can recover the file using a specific tool, though without guarantees. Note that since we have two nodes with a fragment size of a little more than 2 GB, we can only recover errors in glfs-node19.avp.ru below this size; any errors above this size are unrecoverable.

You can also make a manual copy of all fragments (directly from bricks to somewhere else) before attempting to recover the fragment on glfs-node19.avp.ru, just to be able to try other approaches if the first one doesn't work.

Once we recover the fragment on glfs-node19.avp.ru, we cannot attempt a manual repair unless a manual copy of all fragments has been done previously.

If you want to proceed with the recovery, you can do this on node glfs-node19.avp.ru:

    setfattr -n trusted.ec.size -v 0x0000000b83f30000 /data1/bricks/brick1/vmfs/slake-test-bck-m1-d1.qcow2
    setfattr -n trusted.ec.version -v 0x0000000004e1910e0000000004e19112 /data1/bricks/brick1/vmfs/slake-test-bck-m1-d1.qcow2

This should fix the Input/Output error and a heal should be triggered shortly after to fix the other remaining fragments, but there is no guarantee that the virtual machine will work correctly. If it doesn't work, you can recover from backup or we can try to manually recover the file (the lower 2GB at most).
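
The two values written above are the ones shared by the twelve consistent bricks in comment 4. As an illustration only, the sketch below builds the same command lines from the xattrs of one consistent brick. It prints the commands and does not run them; the helper name `setfattr_commands` is hypothetical, and anything like this should only be applied to the brick on glfs-node19.avp.ru after the fragments have been copied elsewhere.

```python
import shlex

FRAGMENT = "/data1/bricks/brick1/vmfs/slake-test-bck-m1-d1.qcow2"

# xattrs of a consistent brick (glfs-node11 in comment 4).
healthy = {
    "trusted.ec.size": "0x0000000b83f30000",
    "trusted.ec.version": "0x0000000004e1910e0000000004e19112",
}


def setfattr_commands(xattrs, path):
    """Build one 'setfattr -n NAME -v VALUE PATH' line per attribute."""
    return [
        shlex.join(["setfattr", "-n", name, "-v", value, path])
        for name, value in xattrs.items()
    ]


commands = setfattr_commands(healthy, FRAGMENT)
for command in commands:
    print(command)
```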

Comment 13 Alexey Shcherbakov 2018-04-11 13:17:48 UTC
(In reply to Xavi Hernandez from comment #12)
> From this data, we can clearly see that nodes glfs-node13.avp.ru and
> glfs-node25.avp.ru have incomplete fragments (their size is smaller than the
> others) because they were in the middle of a heal operation.
> 
> So the best possibility is to consider the fragment on glfs-node19.avp.ru as
> good. It has the correct size, but its modification time differs by one
> second compared to the others. It's possible that any change made during
> this time will contain garbage data now.
> 
> Does this qcow image correspond to a machine with heavy disk activity?
> 

Yes, the machine is heavily loaded.

> We can proceed with the recovery of the fragment on glfs-node19.avp.ru and
> see what happens, or wait to see if we can recover the file using a specific
> tool, though without guarantees (note that since we have two nodes with a
> fragment size of a little more than 2 GB, we can only recover errors in
> glfs-node19.avp.ru below this size. Any errors above this size are
> unrecoverable).
> 
> You can also make a manual copy of all fragments (directly from bricks to
> somewhere else) before attempting to recover the fragment on
> glfs-node19.avp.ru, just to be able to try other approaches if the first one
> doesn't work.
> 
> Once we recover the fragment on glfs-node19.avp.ru, we cannot attempt a
> manual repair unless a manual copy of all fragments has been done previously.
> 
> If you want to proceed with the recovery, you can do this on node
> glfs-node19.avp.ru:
> 
>     setfattr -n trusted.ec.size -v 0x0000000b83f30000
> /data1/bricks/brick1/vmfs/slake-test-bck-m1-d1.qcow2
>     setfattr -n trusted.ec.version -v 0x0000000004e1910e0000000004e19112
> /data1/bricks/brick1/vmfs/slake-test-bck-m1-d1.qcow2
> 

I ran these commands on glfs-node19.avp.ru, and the virtual machine started with the qcow disk and is working correctly, without errors.

Thank you so much!!!

> This should fix the Input/Output error and a heal should be triggered
> shortly after to fix the other remaining fragments, but it's no guarantee
> that the virtual machine will work correctly. If it doesn't work, you can
> recover from backup or we can try to manually recover the file (the lower
> 2GB at most).

However, heal info is still in its previous state:

# gluster volume heal vol1 info
Brick glfs-node11.local:/data1/bricks/brick1
/vmfs/slake-test-bck-m1-d1.qcow2 
Status: Connected
Number of entries: 1

Brick glfs-node12.local:/data1/bricks/brick1
/vmfs/slake-test-bck-m1-d1.qcow2 
Status: Connected
Number of entries: 1

Brick glfs-node13.local:/data1/bricks/brick1
/vmfs/slake-test-bck-m1-d1.qcow2 
Status: Connected
Number of entries: 1

Brick glfs-node14.local:/data1/bricks/brick1
/vmfs/slake-test-bck-m1-d1.qcow2 
Status: Connected
Number of entries: 1

Brick glfs-node15.local:/data1/bricks/brick1
/vmfs/slake-test-bck-m1-d1.qcow2 
Status: Connected
Number of entries: 1

Brick glfs-node16.local:/data1/bricks/brick1
/vmfs/slake-test-bck-m1-d1.qcow2 
Status: Connected
Number of entries: 1

Brick glfs-node17.local:/data1/bricks/brick1
/vmfs/slake-test-bck-m1-d1.qcow2 
Status: Connected
Number of entries: 1

Brick glfs-node18.local:/data1/bricks/brick1
/vmfs/slake-test-bck-m1-d1.qcow2 
Status: Connected
Number of entries: 1

Brick glfs-node19.local:/data1/bricks/brick1
/vmfs/slake-test-bck-m1-d1.qcow2 
Status: Connected
Number of entries: 1

Brick glfs-node20.local:/data1/bricks/brick1
/vmfs/slake-test-bck-m1-d1.qcow2 
Status: Connected
Number of entries: 1

Brick glfs-node21.local:/data1/bricks/brick1
/vmfs/slake-test-bck-m1-d1.qcow2 
Status: Connected
Number of entries: 1

Brick glfs-node22.local:/data1/bricks/brick1
/vmfs/slake-test-bck-m1-d1.qcow2 
Status: Connected
Number of entries: 1

Brick glfs-node23.local:/data1/bricks/brick1
/vmfs/slake-test-bck-m1-d1.qcow2 
Status: Connected
Number of entries: 1

Brick glfs-node24.local:/data1/bricks/brick1
/vmfs/slake-test-bck-m1-d1.qcow2 
Status: Connected
Number of entries: 1

Brick glfs-node25.local:/data1/bricks/brick1
/vmfs/slake-test-bck-m1-d1.qcow2 
Status: Connected
Number of entries: 1


What next steps should be taken?

Comment 14 Xavi Hernandez 2018-04-11 13:25:34 UTC
Self-heal should already have been triggered to fix the remaining fragments. You can monitor its progress by looking at the size of the file on nodes 13 and 25; it should be growing. The output of heal info won't change until the heal is complete.
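
A minimal sketch of that size check, to be run directly on the bricks of nodes 13 and 25. The helper name is_growing is hypothetical, and it relies on GNU stat (`stat -c %s`), which is an assumption about the nodes' userland.

```shell
#!/bin/sh
# Report whether a file grows between two size samples.
# Usage: is_growing FILE [INTERVAL_SECONDS]
# Returns 0 if the file grew, 1 if it did not, 2 if stat failed.
is_growing() {
    file=$1
    interval=${2:-10}
    before=$(stat -c %s "$file") || return 2
    sleep "$interval"
    after=$(stat -c %s "$file") || return 2
    echo "$file: $before -> $after bytes"
    [ "$after" -gt "$before" ]
}

# Example (brick path taken from this report):
# is_growing /data1/bricks/brick1/vmfs/slake-test-bck-m1-d1.qcow2 30
```

A single sample that shows no growth is not conclusive, so it is worth repeating the check a few times with a longer interval.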

However, there was a bug (I'm not sure right now in which version it was fixed) that prevented self-heal from finishing on files that are constantly being modified. If that's your case, the only thing you can do is to completely stop the virtual machine and let self-heal finish before starting it again.

I recommend upgrading gluster to a newer version, where many self-heal related issues have been fixed.
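
To see at a glance whether the heal has finished, the per-brick counts in the heal info output can be summed. The following is only a sketch; count_heal_entries is a hypothetical helper that reads the output from standard input and assumes the "Number of entries: N" line format shown in this report.

```shell
#!/bin/sh
# Sum the "Number of entries:" lines of heal info output read from stdin.
# Non-numeric counts are treated as 0 by awk.
count_heal_entries() {
    awk -F': ' '/^Number of entries:/ { total += $2 } END { print total + 0 }'
}

# Example (volume name taken from this report):
# gluster volume heal vol1 info | count_heal_entries
```

A total of 0 means no brick reports pending entries.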

Comment 15 Alexey Shcherbakov 2018-04-11 14:21:25 UTC
(In reply to Xavi Hernandez from comment #14)
> Self-heal should already have been triggered to fix the remaining files. You
> can monitor its progress by looking at the size of the file on nodes 13 and
> 25. it should be growing. The output of heal info won't change until the
> heal is complete.
> 
> However there was a bug (not sure at which version it was solved right now)
> that was preventing self-heal to finish on files that are constantly being
> modified. If that's your case, the only thing you can do is to completely
> stop the virtual machine and let self-heal finish before starting it again.
> 
> I recommend you to upgrade gluster to a newer version where many self-heal
> related issues have been fixed.

I stopped the virtual machine, and the heal info state returned to normal:

# gluster volume heal vol1 info
Brick glfs-node11.local:/data1/bricks/brick1
Status: Connected
Number of entries: 0

Brick glfs-node12.local:/data1/bricks/brick1
Status: Connected
Number of entries: 0

Brick glfs-node13.local:/data1/bricks/brick1
Status: Connected
Number of entries: 0

Brick glfs-node14.local:/data1/bricks/brick1
Status: Connected
Number of entries: 0

Brick glfs-node15.local:/data1/bricks/brick1
Status: Connected
Number of entries: 0

Brick glfs-node16.local:/data1/bricks/brick1
Status: Connected
Number of entries: 0

Brick glfs-node17.local:/data1/bricks/brick1
Status: Connected
Number of entries: 0

Brick glfs-node18.local:/data1/bricks/brick1
Status: Connected
Number of entries: 0

Brick glfs-node19.local:/data1/bricks/brick1
Status: Connected
Number of entries: 0

Brick glfs-node20.local:/data1/bricks/brick1
Status: Connected
Number of entries: 0

Brick glfs-node21.local:/data1/bricks/brick1
Status: Connected
Number of entries: 0

Brick glfs-node22.local:/data1/bricks/brick1
Status: Connected
Number of entries: 0

Brick glfs-node23.local:/data1/bricks/brick1
Status: Connected
Number of entries: 0

Brick glfs-node24.local:/data1/bricks/brick1
Status: Connected
Number of entries: 0

Brick glfs-node25.local:/data1/bricks/brick1
Status: Connected
Number of entries: 0


I will prepare for the upgrade.

Comment 16 Xavi Hernandez 2018-04-11 14:24:43 UTC
If everything is working fine, are you OK with me closing the bug?

Comment 17 Alexey Shcherbakov 2018-04-11 14:31:20 UTC
(In reply to Xavi Hernandez from comment #16)
> If everything is working fine, are you ok if I close the bug ?

Yes, ok.

