Description of problem:

After rebooting one of the glusterfs nodes, I have a problem with a file on a disperse volume. When I try to read it from the mount point I receive an error:

# md5sum /mnt/glfs/vmfs/slake-test-bck-m1-d1.qcow2
md5sum: /mnt/glfs/vmfs/slake-test-bck-m1-d1.qcow2: Input/output error

# gluster --version
glusterfs 3.9.0 built on Nov 22 2016 17:08:59
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GlusterFS under the terms of the GNU General Public License.

Configuration and status of the volume:

# gluster volume info vol1

Volume Name: vol1
Type: Disperse
Volume ID: a7d52933-fccc-4b07-9c3b-5b92f398aa79
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (13 + 2) = 15
Transport-type: tcp
Bricks:
Brick1: glfs-node11.local:/data1/bricks/brick1
Brick2: glfs-node12.local:/data1/bricks/brick1
Brick3: glfs-node13.local:/data1/bricks/brick1
Brick4: glfs-node14.local:/data1/bricks/brick1
Brick5: glfs-node15.local:/data1/bricks/brick1
Brick6: glfs-node16.local:/data1/bricks/brick1
Brick7: glfs-node17.local:/data1/bricks/brick1
Brick8: glfs-node18.local:/data1/bricks/brick1
Brick9: glfs-node19.local:/data1/bricks/brick1
Brick10: glfs-node20.local:/data1/bricks/brick1
Brick11: glfs-node21.local:/data1/bricks/brick1
Brick12: glfs-node22.local:/data1/bricks/brick1
Brick13: glfs-node23.local:/data1/bricks/brick1
Brick14: glfs-node24.local:/data1/bricks/brick1
Brick15: glfs-node25.local:/data1/bricks/brick1
Options Reconfigured:
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on

# gluster volume status vol1
Status of volume: vol1
Gluster process                                 TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick glfs-node11.local:/data1/bricks/brick1    49152     0          Y       1781
Brick glfs-node12.local:/data1/bricks/brick1    49152     0          Y       3026
Brick glfs-node13.local:/data1/bricks/brick1    49152     0          Y       1991
Brick glfs-node14.local:/data1/bricks/brick1    49152     0          Y       2029
Brick glfs-node15.local:/data1/bricks/brick1    49152     0          Y       1745
Brick glfs-node16.local:/data1/bricks/brick1    49152     0          Y       1841
Brick glfs-node17.local:/data1/bricks/brick1    49152     0          Y       3597
Brick glfs-node18.local:/data1/bricks/brick1    49152     0          Y       2035
Brick glfs-node19.local:/data1/bricks/brick1    49152     0          Y       1785
Brick glfs-node20.local:/data1/bricks/brick1    49152     0          Y       1755
Brick glfs-node21.local:/data1/bricks/brick1    49152     0          Y       1772
Brick glfs-node22.local:/data1/bricks/brick1    49152     0          Y       1757
Brick glfs-node23.local:/data1/bricks/brick1    49152     0          Y       1825
Brick glfs-node24.local:/data1/bricks/brick1    49152     0          Y       1963
Brick glfs-node25.local:/data1/bricks/brick1    49152     0          Y       2376
Self-heal Daemon on localhost                   N/A       N/A        Y       2018
Self-heal Daemon on glfs-node15.local           N/A       N/A        Y       38261
Self-heal Daemon on glfs-node16.local           N/A       N/A        Y       36005
Self-heal Daemon on glfs-node12.local           N/A       N/A        Y       25785
Self-heal Daemon on glfs-node27.local           N/A       N/A        Y       13248
Self-heal Daemon on glfs-node19.local           N/A       N/A        Y       38535
Self-heal Daemon on glfs-node18.local           N/A       N/A        Y       21067
Self-heal Daemon on glfs-node21.local           N/A       N/A        Y       5926
Self-heal Daemon on glfs-node22.local           N/A       N/A        Y       12980
Self-heal Daemon on glfs-node23.local           N/A       N/A        Y       8368
Self-heal Daemon on glfs-node26.local           N/A       N/A        Y       8268
Self-heal Daemon on glfs-node25.local           N/A       N/A        Y       7872
Self-heal Daemon on glfs-node17.local           N/A       N/A        Y       15884
Self-heal Daemon on glfs-node11.local           N/A       N/A        Y       36075
Self-heal Daemon on glfs-node24.local           N/A       N/A        Y       37905
Self-heal Daemon on glfs-node30.local           N/A       N/A        Y       31820
Self-heal Daemon on glfs-node14.local           N/A       N/A        Y       3236
Self-heal Daemon on glfs-node13.local           N/A       N/A        Y       25817
Self-heal Daemon on glfs-node29.local           N/A       N/A        Y       21261
Self-heal Daemon on glfs-node28.local           N/A       N/A        Y       32641

Task Status of Volume vol1
------------------------------------------------------------------------------
There are no active volume tasks

And heal info shows me this:

# gluster volume heal vol1 info
Brick glfs-node11.local:/data1/bricks/brick1
/vmfs/slake-test-bck-m1-d1.qcow2
Status: Connected
Number of entries: 1

Brick glfs-node12.local:/data1/bricks/brick1
/vmfs/slake-test-bck-m1-d1.qcow2
Status: Connected
Number of entries: 1

Brick glfs-node13.local:/data1/bricks/brick1
/vmfs/slake-test-bck-m1-d1.qcow2
Status: Connected
Number of entries: 1

Brick glfs-node14.local:/data1/bricks/brick1
/vmfs/slake-test-bck-m1-d1.qcow2
Status: Connected
Number of entries: 1

Brick glfs-node15.local:/data1/bricks/brick1
/vmfs/slake-test-bck-m1-d1.qcow2
Status: Connected
Number of entries: 1

Brick glfs-node16.local:/data1/bricks/brick1
/vmfs/slake-test-bck-m1-d1.qcow2
Status: Connected
Number of entries: 1

Brick glfs-node17.local:/data1/bricks/brick1
/vmfs/slake-test-bck-m1-d1.qcow2
Status: Connected
Number of entries: 1

Brick glfs-node18.local:/data1/bricks/brick1
/vmfs/slake-test-bck-m1-d1.qcow2
Status: Connected
Number of entries: 1

Brick glfs-node19.local:/data1/bricks/brick1
/vmfs/slake-test-bck-m1-d1.qcow2
Status: Connected
Number of entries: 1

Brick glfs-node20.local:/data1/bricks/brick1
/vmfs/slake-test-bck-m1-d1.qcow2
Status: Connected
Number of entries: 1

Brick glfs-node21.local:/data1/bricks/brick1
/vmfs/slake-test-bck-m1-d1.qcow2
Status: Connected
Number of entries: 1

Brick glfs-node22.local:/data1/bricks/brick1
/vmfs/slake-test-bck-m1-d1.qcow2
Status: Connected
Number of entries: 1

Brick glfs-node23.local:/data1/bricks/brick1
/vmfs/slake-test-bck-m1-d1.qcow2
Status: Connected
Number of entries: 1

Brick glfs-node24.local:/data1/bricks/brick1
/vmfs/slake-test-bck-m1-d1.qcow2
Status: Connected
Number of entries: 1

Brick glfs-node25.local:/data1/bricks/brick1
/vmfs/slake-test-bck-m1-d1.qcow2
Status: Connected
Number of entries: 1

Other data files on the volume are accessible. How can I recover the file (/vmfs/slake-test-bck-m1-d1.qcow2) on this volume?
What version of gluster are you using?

I will need the output of the following command from all bricks of the volume:

getfattr -m. -e hex -d /data1/bricks/brick1/vmfs/slake-test-bck-m1-d1.qcow2

Was there any issue before rebooting the node? Can you upload the logs from the time before the node was rebooted?
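If it helps, something like the following loop can gather that output from every brick in one pass. This is only a sketch: it assumes passwordless SSH as root to the node hostnames shown in the volume definition, and that the brick path is the same everywhere.

for n in $(seq 11 25); do
    host="glfs-node${n}.local"
    echo "=== ${host} ==="
    ssh "root@${host}" \
        "getfattr -m. -e hex -d /data1/bricks/brick1/vmfs/slake-test-bck-m1-d1.qcow2"
done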
(In reply to Xavi Hernandez from comment #1)
> What version of gluster are you using?
>
> I will need the output of the following command from all bricks of the
> volume:
>
> getfattr -m. -e hex -d /data1/bricks/brick1/vmfs/slake-test-bck-m1-d1.qcow2
>
> Was there any issue before rebooting the node? Can you upload the logs from
> the time before the node was rebooted?

glusterfs 3.9.0 built on Nov 22 2016 17:08:59

# getfattr -m. -e hex -d /data1/bricks/brick1/vmfs/slake-test-bck-m1-d1.qcow2
getfattr: Removing leading '/' from absolute path names
# file: data1/bricks/brick1/vmfs/slake-test-bck-m1-d1.qcow2
trusted.bit-rot.version=0x03000000000000005a2fb10a00015e43
trusted.ec.config=0x0000080f02000200
trusted.ec.dirty=0x0000000000006aca0000000000000000
trusted.ec.size=0x0000000b83f30000
trusted.ec.version=0x0000000004e1910e0000000004e19112
trusted.gfid=0x9e236221dbe04096ae4a5546cde59b1d
Created attachment 1419914 [details]
glusterfs-logs
Output of the command from all nodes:

glfs-node11.avp.ru
getfattr: Removing leading '/' from absolute path names
# file: data1/bricks/brick1/vmfs/slake-test-bck-m1-d1.qcow2
trusted.bit-rot.version=0x020000000000000059bffb9500077191
trusted.ec.config=0x0000080f02000200
trusted.ec.dirty=0x0000000000006aca0000000000000000
trusted.ec.size=0x0000000b83f30000
trusted.ec.version=0x0000000004e1910e0000000004e19112
trusted.gfid=0x9e236221dbe04096ae4a5546cde59b1d

glfs-node12.avp.ru
getfattr: Removing leading '/' from absolute path names
# file: data1/bricks/brick1/vmfs/slake-test-bck-m1-d1.qcow2
trusted.bit-rot.version=0x020000000000000059bffeb000040e04
trusted.ec.config=0x0000080f02000200
trusted.ec.dirty=0x0000000000006aca0000000000000000
trusted.ec.size=0x0000000b83f30000
trusted.ec.version=0x0000000004e1910e0000000004e19112
trusted.gfid=0x9e236221dbe04096ae4a5546cde59b1d

glfs-node13.avp.ru
getfattr: Removing leading '/' from absolute path names
# file: data1/bricks/brick1/vmfs/slake-test-bck-m1-d1.qcow2
trusted.bit-rot.version=0x04000000000000005a44a9eb00071d91
trusted.ec.config=0x0000080f02000200
trusted.ec.dirty=0x00000000000058670000000000000000
trusted.ec.size=0x0000000b83f30000
trusted.ec.version=0x4000000004e0e29b0000000004e19112
trusted.gfid=0x9e236221dbe04096ae4a5546cde59b1d

glfs-node14.avp.ru
getfattr: Removing leading '/' from absolute path names
# file: data1/bricks/brick1/vmfs/slake-test-bck-m1-d1.qcow2
trusted.bit-rot.version=0x020000000000000059c0040a0007ab0b
trusted.ec.config=0x0000080f02000200
trusted.ec.dirty=0x0000000000006aca0000000000000000
trusted.ec.size=0x0000000b83f30000
trusted.ec.version=0x0000000004e1910e0000000004e19112
trusted.gfid=0x9e236221dbe04096ae4a5546cde59b1d

glfs-node15.avp.ru
getfattr: Removing leading '/' from absolute path names
# file: data1/bricks/brick1/vmfs/slake-test-bck-m1-d1.qcow2
trusted.bit-rot.version=0x020000000000000059bfc284000a0f4b
trusted.ec.config=0x0000080f02000200
trusted.ec.dirty=0x0000000000006aca0000000000000000
trusted.ec.size=0x0000000b83f30000
trusted.ec.version=0x0000000004e1910e0000000004e19112
trusted.gfid=0x9e236221dbe04096ae4a5546cde59b1d

glfs-node16.avp.ru
getfattr: Removing leading '/' from absolute path names
# file: data1/bricks/brick1/vmfs/slake-test-bck-m1-d1.qcow2
trusted.bit-rot.version=0x020000000000000059f176f00009b9b3
trusted.ec.config=0x0000080f02000200
trusted.ec.dirty=0x0000000000006aca0000000000000000
trusted.ec.size=0x0000000b83f30000
trusted.ec.version=0x0000000004e1910e0000000004e19112
trusted.gfid=0x9e236221dbe04096ae4a5546cde59b1d

glfs-node17.avp.ru
getfattr: Removing leading '/' from absolute path names
# file: data1/bricks/brick1/vmfs/slake-test-bck-m1-d1.qcow2
trusted.bit-rot.version=0x020000000000000059bfced500026426
trusted.ec.config=0x0000080f02000200
trusted.ec.dirty=0x0000000000006aca0000000000000000
trusted.ec.size=0x0000000b83f30000
trusted.ec.version=0x0000000004e1910e0000000004e19112
trusted.gfid=0x9e236221dbe04096ae4a5546cde59b1d

glfs-node18.avp.ru
getfattr: Removing leading '/' from absolute path names
# file: data1/bricks/brick1/vmfs/slake-test-bck-m1-d1.qcow2
trusted.bit-rot.version=0x03000000000000005a2fb10a00015e43
trusted.ec.config=0x0000080f02000200
trusted.ec.dirty=0x0000000000006aca0000000000000000
trusted.ec.size=0x0000000b83f30000
trusted.ec.version=0x0000000004e1910e0000000004e19112
trusted.gfid=0x9e236221dbe04096ae4a5546cde59b1d

glfs-node19.avp.ru
getfattr: Removing leading '/' from absolute path names
# file: data1/bricks/brick1/vmfs/slake-test-bck-m1-d1.qcow2
trusted.ec.config=0x0000080f02000200
trusted.ec.version=0x00000000000000000000000004e19112
trusted.gfid=0x9e236221dbe04096ae4a5546cde59b1d

glfs-node20.avp.ru
getfattr: Removing leading '/' from absolute path names
# file: data1/bricks/brick1/vmfs/slake-test-bck-m1-d1.qcow2
trusted.bit-rot.version=0x020000000000000059fb19cc000ca1ed
trusted.ec.config=0x0000080f02000200
trusted.ec.dirty=0x0000000000006aca0000000000000000
trusted.ec.size=0x0000000b83f30000
trusted.ec.version=0x0000000004e1910e0000000004e19112
trusted.gfid=0x9e236221dbe04096ae4a5546cde59b1d

glfs-node21.avp.ru
getfattr: Removing leading '/' from absolute path names
# file: data1/bricks/brick1/vmfs/slake-test-bck-m1-d1.qcow2
trusted.bit-rot.version=0x03000000000000005a3296e90003f74d
trusted.ec.config=0x0000080f02000200
trusted.ec.dirty=0x0000000000006aca0000000000000000
trusted.ec.size=0x0000000b83f30000
trusted.ec.version=0x0000000004e1910e0000000004e19112
trusted.gfid=0x9e236221dbe04096ae4a5546cde59b1d

glfs-node22.avp.ru
getfattr: Removing leading '/' from absolute path names
# file: data1/bricks/brick1/vmfs/slake-test-bck-m1-d1.qcow2
trusted.bit-rot.version=0x020000000000000059fb1a7900027722
trusted.ec.config=0x0000080f02000200
trusted.ec.dirty=0x0000000000006aca0000000000000000
trusted.ec.size=0x0000000b83f30000
trusted.ec.version=0x0000000004e1910e0000000004e19112
trusted.gfid=0x9e236221dbe04096ae4a5546cde59b1d

glfs-node23.avp.ru
getfattr: Removing leading '/' from absolute path names
# file: data1/bricks/brick1/vmfs/slake-test-bck-m1-d1.qcow2
trusted.bit-rot.version=0x020000000000000059bfd5d5000f3484
trusted.ec.config=0x0000080f02000200
trusted.ec.dirty=0x0000000000006aca0000000000000000
trusted.ec.size=0x0000000b83f30000
trusted.ec.version=0x0000000004e1910e0000000004e19112
trusted.gfid=0x9e236221dbe04096ae4a5546cde59b1d

glfs-node24.avp.ru
getfattr: Removing leading '/' from absolute path names
# file: data1/bricks/brick1/vmfs/slake-test-bck-m1-d1.qcow2
trusted.bit-rot.version=0x020000000000000059bfd8e7000962e5
trusted.ec.config=0x0000080f02000200
trusted.ec.dirty=0x0000000000006aca0000000000000000
trusted.ec.size=0x0000000b83f30000
trusted.ec.version=0x0000000004e1910e0000000004e19112
trusted.gfid=0x9e236221dbe04096ae4a5546cde59b1d

glfs-node25.avp.ru
getfattr: Removing leading '/' from absolute path names
# file: data1/bricks/brick1/vmfs/slake-test-bck-m1-d1.qcow2
trusted.bit-rot.version=0x06000000000000005a6090cf0006c90f
trusted.ec.config=0x0000080f02000200
trusted.ec.dirty=0x00000000000058670000000000000000
trusted.ec.size=0x0000000b83f30000
trusted.ec.version=0x4000000004e0e29b0000000004e19112
trusted.gfid=0x9e236221dbe04096ae4a5546cde59b1d
Which node did you reboot before seeing those errors? At what time was it rebooted?

What I see from the data you posted is that glfs-node13.avp.ru and glfs-node25.avp.ru were doing a heal at some point (probably at the time of the reboot, but I'm not completely sure yet). This is ok because your configuration allows 2 bad bricks, but we have another node with mismatching data: glfs-node19.avp.ru.

This is what is causing the EIO error, since we have 3 failures but a maximum of 2 are allowed.

We can try to determine if one of the mismatching versions is good enough to be considered good and recover the file.

I still need to check the logs to see if there's more information. Meantime, knowing which node was rebooted and at what time will be very useful to analyze the logs.
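For reference, the mismatch is visible in trusted.ec.version. Here is a small bash sketch that splits the hex value into its two 64-bit halves; as far as I recall, the first half counts data writes and the second counts metadata operations, but treat that interpretation as an assumption rather than a guarantee:

split_ec_version() {
    # strip the leading 0x, then print the two 64-bit counters as integers
    local v="${1#0x}"
    printf 'data=%d metadata=%d\n' "$((16#${v:0:16}))" "$((16#${v:16:16}))"
}

split_ec_version 0x0000000004e1910e0000000004e19112   # healthy bricks
split_ec_version 0x4000000004e0e29b0000000004e19112   # glfs-node13/25 (note the high bits)
split_ec_version 0x00000000000000000000000004e19112   # glfs-node19 (data counter is zero)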
Created attachment 1420021 [details] glusterfs-all-nodes-logs Part-1
Created attachment 1420023 [details] glusterfs-all-nodes-logs Part-2
(In reply to Xavi Hernandez from comment #5)
> Which node did you reboot before seeing those errors? At what time was it
> rebooted?

Only node glfs-node19.avp.ru was rebooted; the rest worked normally. The last time (UTC) the qcow disk image worked:

-rwxrwx--- 1 107 107 47G apr  6 06:37 slake-test-bck-m1-d1.qcow2

and this is the log from the virtual machine whose disk was disconnected at that time:

[2018-04-06 06:39:04.177631] E [MSGID: 114031] [client-rpc-fops.c:1550:client3_3_inodelk_cbk] 0-vol1-client-8: remote operation failed [Transport endpoint is not connected]
[2018-04-06 06:39:04.189701] E [rpc-clnt.c:365:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7fe0801186fb] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7fe08b20b79e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fe08b20b8ae] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x84)[0x7fe08b20d004] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x110)[0x7fe08b20d8d0] ))))) 0-vol1-client-8: forced unwinding frame type(GF-DUMP) op(NULL(2)) called at 2018-04-06 06:38:22.143778 (xid=0xc908c31a)
block I/O error in device 'drive-virtio-disk0': Operation not permitted (1)
block I/O error in device 'drive-virtio-disk0': Operation not permitted (1)
[2018-04-06 06:39:04.334167] E [MSGID: 122034] [ec-common.c:461:ec_child_select] 0-vol1-disperse-0: Insufficient available children for this request (have 0, need 13)
block I/O error in device 'drive-virtio-disk0': Operation not permitted (1)
block I/O error in device 'drive-virtio-disk0': Operation not permitted (1)

> What I see from the data you posted is that glfs-node13.avp.ru and
> glfs-node25.avp.ru were doing a heal at some point (probably at the time of
> the reboot, but I'm not completely sure yet). This is ok because your
> configuration allows 2 bad bricks, but we have another node with
> mismatching data: glfs-node19.avp.ru.
>
> This is what is causing the EIO error, since we have 3 failures but a
> maximum of 2 are allowed.
>
> We can try to determine if one of the mismatching versions is good enough
> to be considered good and recover the file.
>
> I still need to check the logs to see if there's more information.
> Meantime, knowing which node was rebooted and at what time will be very
> useful to analyze the logs.

I collected and attached more logs from all nodes.
I'll analyze the logs. Once this is solved, I strongly recommend upgrading to 3.12, since 3.9 is not maintained anymore and 3.10 will be EOL soon.
Everything seems to indicate that a heal was happening on nodes glfs-node13.avp.ru and glfs-node25.avp.ru at the time node glfs-node19.avp.ru was restarted. Unfortunately this coincided with a modification that caused 3 simultaneous failures on the file.

We need to manually repair the file or recover it from a backup. To recover the file manually we have two options:

1. Guess which of the 3 bad fragments is "less" bad. Probably the best candidate is the fragment on node glfs-node19.avp.ru, but we need to check it. It would be interesting to see the modification times of the fragments on all bricks; to do so we can execute 'stat /data1/bricks/brick1/vmfs/slake-test-bck-m1-d1.qcow2' on each node (see the loop below). This will help us decide, but it's not a 100% reliable way to determine the best option.

2. Try to check the integrity of the fragments. To do this we'll need to develop a small tool able to do the check. It will take some time, but it will tell us whether the file is good or, if something is bad, in which block the problem is. The advantage of this method is that unless the 3 bad fragments are damaged in the same block, we may be able to recover the whole file.
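A sketch for gathering the stat output, under the same SSH assumptions as the earlier getfattr loop:

for n in $(seq 11 25); do
    host="glfs-node${n}.avp.ru"
    echo "=== ${host} ==="
    ssh "root@${host}" "stat /data1/bricks/brick1/vmfs/slake-test-bck-m1-d1.qcow2"
done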
(In reply to Xavi Hernandez from comment #10)
> Everything seems to indicate that a heal was happening on nodes
> glfs-node13.avp.ru and glfs-node25.avp.ru at the time node
> glfs-node19.avp.ru was restarted. Unfortunately this coincided with a
> modification that caused 3 simultaneous failures on the file.
>
> We need to manually repair the file or recover it from a backup. To recover
> the file manually we have two options:
>
> 1. Guess which of the 3 bad fragments is "less" bad. Probably the best
> candidate is the fragment on node glfs-node19.avp.ru, but we need to check
> it. It would be interesting to see the modification times of the fragments
> on all bricks; to do so we can execute 'stat
> /data1/bricks/brick1/vmfs/slake-test-bck-m1-d1.qcow2' on each node (see the
> loop below). This will help us decide, but it's not a 100% reliable way to
> determine the best option.
>
> 2. Try to check the integrity of the fragments. To do this we'll need to
> develop a small tool able to do the check. It will take some time, but it
> will tell us whether the file is good or, if something is bad, in which
> block the problem is. The advantage of this method is that unless the 3 bad
> fragments are damaged in the same block, we may be able to recover the
> whole file.

Stat command result on all nodes:

glfs-node11.avp.ru
  File: ‘/data1/bricks/brick1/vmfs/slake-test-bck-m1-d1.qcow2’
  Size: 3804491264    Blocks: 7430656    IO Block: 4096   regular file
Device: 815h/2069d    Inode: 81618486372  Links: 2
Access: (0770/-rwxrwx---)  Uid: (  107/ UNKNOWN)   Gid: (  107/ UNKNOWN)
Access: 2018-04-06 09:37:40.086093439 +0300
Modify: 2018-04-06 09:37:40.056093815 +0300
Change: 2018-04-10 12:05:31.442268179 +0300
 Birth: -

glfs-node12.avp.ru
  File: ‘/data1/bricks/brick1/vmfs/slake-test-bck-m1-d1.qcow2’
  Size: 3804491264    Blocks: 7430656    IO Block: 4096   regular file
Device: 815h/2069d    Inode: 77312455458  Links: 2
Access: (0770/-rwxrwx---)  Uid: (  107/ UNKNOWN)   Gid: (  107/ UNKNOWN)
Access: 2018-04-06 09:37:40.086927017 +0300
Modify: 2018-04-06 09:37:40.056927261 +0300
Change: 2018-04-10 12:05:31.442493868 +0300
 Birth: -

glfs-node13.avp.ru
  File: ‘/data1/bricks/brick1/vmfs/slake-test-bck-m1-d1.qcow2’
  Size: 2351140352    Blocks: 2898080    IO Block: 4096   regular file
Device: 815h/2069d    Inode: 77309833852  Links: 2
Access: (0770/-rwxrwx---)  Uid: (  107/ UNKNOWN)   Gid: (  107/ UNKNOWN)
Access: 2018-04-06 09:19:41.347484000 +0300
Modify: 2018-04-06 09:37:43.501667158 +0300
Change: 2018-04-10 12:05:31.443246800 +0300
 Birth: -

glfs-node14.avp.ru
  File: ‘/data1/bricks/brick1/vmfs/slake-test-bck-m1-d1.qcow2’
  Size: 3804491264    Blocks: 7430656    IO Block: 4096   regular file
Device: 815h/2069d    Inode: 45097173507  Links: 2
Access: (0770/-rwxrwx---)  Uid: (  107/ UNKNOWN)   Gid: (  107/ UNKNOWN)
Access: 2018-04-06 09:37:40.085762909 +0300
Modify: 2018-04-06 09:37:40.055763093 +0300
Change: 2018-04-10 12:05:31.442878601 +0300
 Birth: -

glfs-node15.avp.ru
  File: ‘/data1/bricks/brick1/vmfs/slake-test-bck-m1-d1.qcow2’
  Size: 3804491264    Blocks: 7430656    IO Block: 4096   regular file
Device: 815h/2069d    Inode: 47297640679  Links: 2
Access: (0770/-rwxrwx---)  Uid: (  107/ UNKNOWN)   Gid: (  107/ UNKNOWN)
Access: 2018-04-06 09:37:40.084025418 +0300
Modify: 2018-04-06 09:37:40.054025522 +0300
Change: 2018-04-10 12:05:31.443444637 +0300
 Birth: -

glfs-node16.avp.ru
  File: ‘/data1/bricks/brick1/vmfs/slake-test-bck-m1-d1.qcow2’
  Size: 3804491264    Blocks: 7430656    IO Block: 4096   regular file
Device: 815h/2069d    Inode: 62287697396  Links: 2
Access: (0770/-rwxrwx---)  Uid: (  107/ UNKNOWN)   Gid: (  107/ UNKNOWN)
Access: 2018-04-06 09:37:40.084973842 +0300
Modify: 2018-04-06 09:37:40.054974225 +0300
Change: 2018-04-10 12:05:31.450825340 +0300
 Birth: -

glfs-node17.avp.ru
  File: ‘/data1/bricks/brick1/vmfs/slake-test-bck-m1-d1.qcow2’
  Size: 3804491264    Blocks: 7430656    IO Block: 4096   regular file
Device: 815h/2069d    Inode: 25769816893  Links: 2
Access: (0770/-rwxrwx---)  Uid: (  107/ UNKNOWN)   Gid: (  107/ UNKNOWN)
Access: 2018-04-06 09:37:40.087142929 +0300
Modify: 2018-04-06 09:37:40.057143241 +0300
Change: 2018-04-10 12:05:31.444507483 +0300
 Birth: -

glfs-node18.avp.ru
  File: ‘/data1/bricks/brick1/vmfs/slake-test-bck-m1-d1.qcow2’
  Size: 3804491264    Blocks: 7430656    IO Block: 4096   regular file
Device: 815h/2069d    Inode: 81645506748  Links: 2
Access: (0770/-rwxrwx---)  Uid: (  107/ UNKNOWN)   Gid: (  107/ UNKNOWN)
Access: 2018-04-06 09:37:40.085410682 +0300
Modify: 2018-04-06 09:37:40.055411031 +0300
Change: 2018-04-10 12:05:31.445081386 +0300
 Birth: -

glfs-node19.avp.ru
  File: ‘/data1/bricks/brick1/vmfs/slake-test-bck-m1-d1.qcow2’
  Size: 3804491264    Blocks: 7430648    IO Block: 4096   regular file
Device: 815h/2069d    Inode: 55834592083  Links: 2
Access: (0770/-rwxrwx---)  Uid: (  107/ UNKNOWN)   Gid: (  107/ UNKNOWN)
Access: 2018-04-06 09:37:39.000000000 +0300
Modify: 2018-04-06 09:37:39.000000000 +0300
Change: 2018-04-10 13:43:32.589505057 +0300
 Birth: -

glfs-node20.avp.ru
  File: ‘/data1/bricks/brick1/vmfs/slake-test-bck-m1-d1.qcow2’
  Size: 3804491264    Blocks: 7430656    IO Block: 4096   regular file
Device: 815h/2069d    Inode: 64483022647  Links: 2
Access: (0770/-rwxrwx---)  Uid: (  107/ UNKNOWN)   Gid: (  107/ UNKNOWN)
Access: 2018-04-06 09:37:40.086797717 +0300
Modify: 2018-04-06 09:37:40.056797940 +0300
Change: 2018-04-10 12:05:31.447094554 +0300
 Birth: -

glfs-node21.avp.ru
  File: ‘/data1/bricks/brick1/vmfs/slake-test-bck-m1-d1.qcow2’
  Size: 3804491264    Blocks: 7430648    IO Block: 4096   regular file
Device: 815h/2069d    Inode: 30399803420  Links: 2
Access: (0770/-rwxrwx---)  Uid: (  107/ UNKNOWN)   Gid: (  107/ UNKNOWN)
Access: 2018-04-06 09:37:40.084872465 +0300
Modify: 2018-04-06 09:37:40.054872712 +0300
Change: 2018-04-10 12:05:31.445816033 +0300
 Birth: -

glfs-node22.avp.ru
  File: ‘/data1/bricks/brick1/vmfs/slake-test-bck-m1-d1.qcow2’
  Size: 3804491264    Blocks: 7430656    IO Block: 4096   regular file
Device: 815h/2069d    Inode: 10740329468  Links: 2
Access: (0770/-rwxrwx---)  Uid: (  107/ UNKNOWN)   Gid: (  107/ UNKNOWN)
Access: 2018-04-06 09:37:40.087255449 +0300
Modify: 2018-04-06 09:37:40.057255732 +0300
Change: 2018-04-10 12:05:31.447168330 +0300
 Birth: -

glfs-node23.avp.ru
  File: ‘/data1/bricks/brick1/vmfs/slake-test-bck-m1-d1.qcow2’
  Size: 3804491264    Blocks: 7430656    IO Block: 4096   regular file
Device: 815h/2069d    Inode: 73019182467  Links: 2
Access: (0770/-rwxrwx---)  Uid: (  107/ UNKNOWN)   Gid: (  107/ UNKNOWN)
Access: 2018-04-06 09:37:40.088930999 +0300
Modify: 2018-04-06 09:37:40.058931201 +0300
Change: 2018-04-10 12:05:31.449991404 +0300
 Birth: -

glfs-node24.avp.ru
  File: ‘/data1/bricks/brick1/vmfs/slake-test-bck-m1-d1.qcow2’
  Size: 3804491264    Blocks: 7430656    IO Block: 4096   regular file
Device: 815h/2069d    Inode: 219231  Links: 2
Access: (0770/-rwxrwx---)  Uid: (  107/    qemu)   Gid: (  107/    qemu)
Access: 2018-04-06 09:37:40.087524522 +0300
Modify: 2018-04-06 09:37:40.057524854 +0300
Change: 2018-04-10 12:05:31.448713916 +0300
 Birth: -

glfs-node25.avp.ru
  File: ‘/data1/bricks/brick1/vmfs/slake-test-bck-m1-d1.qcow2’
  Size: 2351140352    Blocks: 2898080    IO Block: 4096   regular file
Device: 815h/2069d    Inode: 55834924933  Links: 2
Access: (0770/-rwxrwx---)  Uid: (  107/    qemu)   Gid: (  107/    qemu)
Access: 2018-04-06 09:19:41.347484000 +0300
Modify: 2018-04-06 09:37:43.502748628 +0300
Change: 2018-04-10 12:05:31.449630601 +0300
 Birth: -
From this data we can clearly see that nodes glfs-node13.avp.ru and glfs-node25.avp.ru have incomplete fragments (their size is smaller than the others') because they were in the middle of a heal operation.

So the best possibility is to consider the fragment on glfs-node19.avp.ru as good. It has the correct size, but its modification time differs by one second from the others, so it's possible that changes made during that second now contain garbage data. Does this qcow image correspond to a machine with heavy disk activity?

We can proceed with the recovery of the fragment on glfs-node19.avp.ru and see what happens, or wait to see if we can recover the file using a specific tool, though without guarantees (note that since we have two nodes with a fragment size of little more than 2 GB, we can only recover errors in glfs-node19.avp.ru below this size; any errors above it are unrecoverable).

You can also make a manual copy of all fragments (directly from the bricks to somewhere else) before attempting to recover the fragment on glfs-node19.avp.ru, just to be able to try other approaches if the first one doesn't work. Once we recover the fragment on glfs-node19.avp.ru, we cannot attempt a manual repair unless a manual copy of all fragments was made previously.

If you want to proceed with the recovery, you can do this on node glfs-node19.avp.ru:

setfattr -n trusted.ec.size -v 0x0000000b83f30000 /data1/bricks/brick1/vmfs/slake-test-bck-m1-d1.qcow2
setfattr -n trusted.ec.version -v 0x0000000004e1910e0000000004e19112 /data1/bricks/brick1/vmfs/slake-test-bck-m1-d1.qcow2

This should fix the Input/Output error, and a heal should be triggered shortly after to fix the other remaining fragments, but there's no guarantee that the virtual machine will work correctly. If it doesn't work, you can recover from backup or we can try to manually recover the file (the lower 2 GB at most).
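As a sanity check on the sizes above, the healthy fragment size follows directly from trusted.ec.size, assuming the default 512-byte EC chunk per brick (an assumption worth double-checking for non-default configurations):

# trusted.ec.size is the logical file size in bytes
printf '%d\n' 0x0000000b83f30000              # 49458380800 (the ~46 GiB image)
# each stripe spreads 13*512 = 6656 data bytes across the 13 data bricks,
# so a complete fragment is ceil(49458380800 / 6656) * 512 bytes
echo $(( (49458380800 + 6655) / 6656 * 512 )) # 3804491264, as seen on the good bricks

The fragments on nodes 13 and 25 are 2351140352 bytes, so the interrupted heal had rebuilt roughly 62% of the fragment when it stopped.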
(In reply to Xavi Hernandez from comment #12)
> From this data we can clearly see that nodes glfs-node13.avp.ru and
> glfs-node25.avp.ru have incomplete fragments (their size is smaller than
> the others') because they were in the middle of a heal operation.
>
> So the best possibility is to consider the fragment on glfs-node19.avp.ru
> as good. It has the correct size, but its modification time differs by one
> second from the others, so it's possible that changes made during that
> second now contain garbage data. Does this qcow image correspond to a
> machine with heavy disk activity?

Yes, the machine is heavily loaded.

> We can proceed with the recovery of the fragment on glfs-node19.avp.ru and
> see what happens, or wait to see if we can recover the file using a
> specific tool, though without guarantees (note that since we have two nodes
> with a fragment size of little more than 2 GB, we can only recover errors
> in glfs-node19.avp.ru below this size; any errors above it are
> unrecoverable).
>
> You can also make a manual copy of all fragments (directly from the bricks
> to somewhere else) before attempting to recover the fragment on
> glfs-node19.avp.ru, just to be able to try other approaches if the first
> one doesn't work. Once we recover the fragment on glfs-node19.avp.ru, we
> cannot attempt a manual repair unless a manual copy of all fragments was
> made previously.
>
> If you want to proceed with the recovery, you can do this on node
> glfs-node19.avp.ru:
>
> setfattr -n trusted.ec.size -v 0x0000000b83f30000
> /data1/bricks/brick1/vmfs/slake-test-bck-m1-d1.qcow2
> setfattr -n trusted.ec.version -v 0x0000000004e1910e0000000004e19112
> /data1/bricks/brick1/vmfs/slake-test-bck-m1-d1.qcow2

I ran these commands on glfs-node19.avp.ru and the virtual machine started with the qcow disk and is working correctly, without errors. Thank you so much!!!

> This should fix the Input/Output error, and a heal should be triggered
> shortly after to fix the other remaining fragments, but there's no
> guarantee that the virtual machine will work correctly. If it doesn't work,
> you can recover from backup or we can try to manually recover the file (the
> lower 2 GB at most).
But heal info is still in the previous state:

# gluster volume heal vol1 info
Brick glfs-node11.local:/data1/bricks/brick1
/vmfs/slake-test-bck-m1-d1.qcow2
Status: Connected
Number of entries: 1

Brick glfs-node12.local:/data1/bricks/brick1
/vmfs/slake-test-bck-m1-d1.qcow2
Status: Connected
Number of entries: 1

Brick glfs-node13.local:/data1/bricks/brick1
/vmfs/slake-test-bck-m1-d1.qcow2
Status: Connected
Number of entries: 1

Brick glfs-node14.local:/data1/bricks/brick1
/vmfs/slake-test-bck-m1-d1.qcow2
Status: Connected
Number of entries: 1

Brick glfs-node15.local:/data1/bricks/brick1
/vmfs/slake-test-bck-m1-d1.qcow2
Status: Connected
Number of entries: 1

Brick glfs-node16.local:/data1/bricks/brick1
/vmfs/slake-test-bck-m1-d1.qcow2
Status: Connected
Number of entries: 1

Brick glfs-node17.local:/data1/bricks/brick1
/vmfs/slake-test-bck-m1-d1.qcow2
Status: Connected
Number of entries: 1

Brick glfs-node18.local:/data1/bricks/brick1
/vmfs/slake-test-bck-m1-d1.qcow2
Status: Connected
Number of entries: 1

Brick glfs-node19.local:/data1/bricks/brick1
/vmfs/slake-test-bck-m1-d1.qcow2
Status: Connected
Number of entries: 1

Brick glfs-node20.local:/data1/bricks/brick1
/vmfs/slake-test-bck-m1-d1.qcow2
Status: Connected
Number of entries: 1

Brick glfs-node21.local:/data1/bricks/brick1
/vmfs/slake-test-bck-m1-d1.qcow2
Status: Connected
Number of entries: 1

Brick glfs-node22.local:/data1/bricks/brick1
/vmfs/slake-test-bck-m1-d1.qcow2
Status: Connected
Number of entries: 1

Brick glfs-node23.local:/data1/bricks/brick1
/vmfs/slake-test-bck-m1-d1.qcow2
Status: Connected
Number of entries: 1

Brick glfs-node24.local:/data1/bricks/brick1
/vmfs/slake-test-bck-m1-d1.qcow2
Status: Connected
Number of entries: 1

Brick glfs-node25.local:/data1/bricks/brick1
/vmfs/slake-test-bck-m1-d1.qcow2
Status: Connected
Number of entries: 1

What next steps should be taken?
Self-heal should already have been triggered to fix the remaining fragments. You can monitor its progress by looking at the size of the file on nodes 13 and 25; it should be growing. The output of heal info won't change until the heal is complete.

However, there was a bug (I'm not sure right now in which version it was fixed) that prevented self-heal from finishing on files that are constantly being modified. If that's your case, the only thing you can do is completely stop the virtual machine and let self-heal finish before starting it again.

I recommend upgrading gluster to a newer version where many self-heal related issues have been fixed.
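One possible way to watch the heal progress, again assuming SSH access to the two healing nodes ('stat -c %s' prints just the size in bytes):

watch -n 60 '
for host in glfs-node13.avp.ru glfs-node25.avp.ru; do
    printf "%s: " "${host}"
    ssh "root@${host}" "stat -c %s /data1/bricks/brick1/vmfs/slake-test-bck-m1-d1.qcow2"
done'

Once both fragments reach the size of the healthy ones (3804491264 bytes), the data heal should be done.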
(In reply to Xavi Hernandez from comment #14)
> Self-heal should already have been triggered to fix the remaining
> fragments. You can monitor its progress by looking at the size of the file
> on nodes 13 and 25; it should be growing. The output of heal info won't
> change until the heal is complete.
>
> However, there was a bug (I'm not sure right now in which version it was
> fixed) that prevented self-heal from finishing on files that are constantly
> being modified. If that's your case, the only thing you can do is
> completely stop the virtual machine and let self-heal finish before
> starting it again.
>
> I recommend upgrading gluster to a newer version where many self-heal
> related issues have been fixed.

I stopped the virtual machine and the heal info state returned to normal:

# gluster volume heal vol1 info
Brick glfs-node11.local:/data1/bricks/brick1
Status: Connected
Number of entries: 0

Brick glfs-node12.local:/data1/bricks/brick1
Status: Connected
Number of entries: 0

Brick glfs-node13.local:/data1/bricks/brick1
Status: Connected
Number of entries: 0

Brick glfs-node14.local:/data1/bricks/brick1
Status: Connected
Number of entries: 0

Brick glfs-node15.local:/data1/bricks/brick1
Status: Connected
Number of entries: 0

Brick glfs-node16.local:/data1/bricks/brick1
Status: Connected
Number of entries: 0

Brick glfs-node17.local:/data1/bricks/brick1
Status: Connected
Number of entries: 0

Brick glfs-node18.local:/data1/bricks/brick1
Status: Connected
Number of entries: 0

Brick glfs-node19.local:/data1/bricks/brick1
Status: Connected
Number of entries: 0

Brick glfs-node20.local:/data1/bricks/brick1
Status: Connected
Number of entries: 0

Brick glfs-node21.local:/data1/bricks/brick1
Status: Connected
Number of entries: 0

Brick glfs-node22.local:/data1/bricks/brick1
Status: Connected
Number of entries: 0

Brick glfs-node23.local:/data1/bricks/brick1
Status: Connected
Number of entries: 0

Brick glfs-node24.local:/data1/bricks/brick1
Status: Connected
Number of entries: 0

Brick glfs-node25.local:/data1/bricks/brick1
Status: Connected
Number of entries: 0

I will prepare for the upgrade.
If everything is working fine, are you ok with me closing the bug?
(In reply to Xavi Hernandez from comment #16)
> If everything is working fine, are you ok with me closing the bug?

Yes, ok.