Bug 1744883 - GlusterFS problem dataloss
Summary: GlusterFS problem dataloss
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: GlusterFS
Classification: Community
Component: core
Version: 6
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-08-23 06:12 UTC by Nicola Battista
Modified: 2020-03-12 12:14 UTC
CC List: 6 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-03-12 12:14:27 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments
strace output (7.35 KB, application/zip) - 2019-08-23 06:12 UTC, Nicola Battista
GlusterFS log dbroot1 debug mode (958.19 KB, text/plain) - 2019-08-29 07:31 UTC, Nicola Battista
GlusterFS log dbroot2 debug mode (999.28 KB, text/plain) - 2019-08-29 07:31 UTC, Nicola Battista
GlusterFS log dbroot3 debug mode (976.74 KB, text/plain) - 2019-08-29 07:31 UTC, Nicola Battista
Tcpdump (9.51 MB, application/zip) - 2019-10-14 09:03 UTC, Nicola Battista

Description Nicola battista 2019-08-23 06:12:45 UTC
Created attachment 1607200 [details]
strace output

Description of problem:
Greetings,
MariaDB ColumnStore uses GFS bricks as persistent storage for data. We store table data in so-called segment files. Here [1] is an overview of how we use GFS for redundancy.
In our case there are errors with some of those segment files that live on GFS. Here are examples of the errors I've seen in the strace of the reading process.

86814 stat("/000.dir/000.dir/015.dir/064.dir/008.dir/FILE002.cdf", 0x7ff488ffdc80) = -1 ENOENT (No such file or directory)
86814 open("/000.dir/000.dir/015.dir/064.dir/008.dir/FILE002.cdf", O_RDONLY|O_NOATIME) = -1 ENOENT (No such file or directory)

I've attached the strace output itself and you can filter on the segment file name /000.dir/000.dir/015.dir/064.dir/008.dir/FILE002.cdf to find the relevant errors.
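For example, the failing calls can be pulled out of the attached output with something like the following (strace.out is a placeholder name for the attached dump):

grep -n '008.dir/FILE002.cdf' strace.out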

1. https://mariadb.com/resources/blog/mariadb-columnstore-data-redundancy-a-look-under-the-hood/

Version-Release number of selected component (if applicable):
GlusterFS version 6.X

Related MariaDB ticket: https://jira.mariadb.org/browse/MCOL-3392

Comment 1 Atin Mukherjee 2019-08-27 06:22:06 UTC
Could you explain the problem in a bit more detail and provide the volume configuration (gluster v info output)?

I'm moving this bug to core component.

Comment 2 Nicola battista 2019-08-27 07:27:26 UTC
Hi,
Sure, this is the output:

[root@cstore-pm01 ~]# glusterfs --version
glusterfs 6.5

gluster> volume status
Status of volume: dbroot1
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 172.16.31.5:/usr/local/mariadb/column
store/gluster/brick1                        49152     0          Y       12001
Brick 172.16.31.6:/usr/local/mariadb/column
store/gluster/brick1                        49152     0          Y       11632
Brick 172.16.31.7:/usr/local/mariadb/column
store/gluster/brick1                        49152     0          Y       11640
Self-heal Daemon on localhost               N/A       N/A        Y       12021
Self-heal Daemon on 172.16.31.6             N/A       N/A        Y       11663
Self-heal Daemon on 172.16.31.7             N/A       N/A        Y       11673
 
Task Status of Volume dbroot1
------------------------------------------------------------------------------
There are no active volume tasks
 
Status of volume: dbroot2
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 172.16.31.5:/usr/local/mariadb/column
store/gluster/brick2                        49153     0          Y       12000
Brick 172.16.31.6:/usr/local/mariadb/column
store/gluster/brick2                        49153     0          Y       11633
Brick 172.16.31.7:/usr/local/mariadb/column
store/gluster/brick2                        49153     0          Y       11651
Self-heal Daemon on localhost               N/A       N/A        Y       12021
Self-heal Daemon on 172.16.31.6             N/A       N/A        Y       11663
Self-heal Daemon on 172.16.31.7             N/A       N/A        Y       11673
 
Task Status of Volume dbroot2
------------------------------------------------------------------------------
There are no active volume tasks
 
Status of volume: dbroot3
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 172.16.31.5:/usr/local/mariadb/column
store/gluster/brick3                        49154     0          Y       12002
Brick 172.16.31.6:/usr/local/mariadb/column
store/gluster/brick3                        49154     0          Y       11648
Brick 172.16.31.7:/usr/local/mariadb/column
store/gluster/brick3                        49154     0          Y       11662
Self-heal Daemon on localhost               N/A       N/A        Y       12021
Self-heal Daemon on 172.16.31.6             N/A       N/A        Y       11663
Self-heal Daemon on 172.16.31.7             N/A       N/A        Y       11673
 
Task Status of Volume dbroot3
------------------------------------------------------------------------------
There are no active volume tasks


gluster> volume info all
 
Volume Name: dbroot1
Type: Replicate
Volume ID: ecf4fd04-2e96-47d9-8a40-4f84a48657fb
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 172.16.31.5:/usr/local/mariadb/columnstore/gluster/brick1
Brick2: 172.16.31.6:/usr/local/mariadb/columnstore/gluster/brick1
Brick3: 172.16.31.7:/usr/local/mariadb/columnstore/gluster/brick1
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
 
Volume Name: dbroot2
Type: Replicate
Volume ID: f2b49f9f-3a91-4ac4-8eb3-4a327d0dbc61
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 172.16.31.5:/usr/local/mariadb/columnstore/gluster/brick2
Brick2: 172.16.31.6:/usr/local/mariadb/columnstore/gluster/brick2
Brick3: 172.16.31.7:/usr/local/mariadb/columnstore/gluster/brick2
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
 
Volume Name: dbroot3
Type: Replicate
Volume ID: 73b96917-c842-4fc2-8bca-099735c4aa6a
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 172.16.31.5:/usr/local/mariadb/columnstore/gluster/brick3
Brick2: 172.16.31.6:/usr/local/mariadb/columnstore/gluster/brick3
Brick3: 172.16.31.7:/usr/local/mariadb/columnstore/gluster/brick3
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off


Thanks,
Regards
Nicola Battista

Comment 3 Roman 2019-08-27 15:56:12 UTC
Greetings,

I'm from the MariaDB ColumnStore development team.
CS is a database engine that works extensively with data stored on GFS bricks. In the case described by Nicola we are facing errors when trying to access files that exist according to ls output. Nicola will add the ls output to show that the files are visible from the OS perspective.
However, when we try to open the files programmatically through the VFS, these calls fail with ENOENT.

86814 stat("/000.dir/000.dir/015.dir/064.dir/008.dir/FILE002.cdf", 0x7ff488ffdc80) = -1 ENOENT (No such file or directory)
86814 open("/000.dir/000.dir/015.dir/064.dir/008.dir/FILE002.cdf", O_RDONLY|O_NOATIME) = -1 ENOENT (No such file or directory)

Hope this makes the issue clear.

Comment 4 Nicola battista 2019-08-28 06:57:03 UTC
Hi,
The file exists:
[root@cstore-pm01 ~]# ls -lthr /usr/local/mariadb/columnstore/gluster/brick2/000.dir/000.dir/015.dir/064.dir/008.dir/FILE002.cdf
-rw-r--r-- 2 root root 2.1G May 30 16:55 /usr/local/mariadb/columnstore/gluster/brick2/000.dir/000.dir/015.dir/064.dir/008.dir/FILE002.cdf

Thanks,
best regards
Nicola Battista

Comment 5 Nithya Balachandran 2019-08-28 07:00:37 UTC
Please provide the following:

On the volume on which you see this issue (dbroot2 based on the above comment and the volume info provided earlier):

1. The ls -l /000.dir/000.dir/015.dir/064.dir/008.dir/ output from the gluster mount point
2. The ls -l output for the same directory on each brick of the volume


Do you see any error messages in the gluster client mount log when you perform the stat?

Comment 6 Nicola battista 2019-08-28 07:34:58 UTC
Hi,

[root@cstore-pm03 ~]# ls -l /usr/local/mariadb/columnstore/gluster/brick2/000.dir/000.dir/015.dir/064.dir/008.dir
total 2097420
-rw-r--r-- 2 root root 2147753984 May 30 16:55 FILE002.cdf

[root@cstore-pm01 ~]# ls -l /usr/local/mariadb/columnstore/gluster/brick2/000.dir/000.dir/015.dir/064.dir/008.dir/
total 2097420
-rw-r--r-- 2 root root 2147753984 May 30 16:55 FILE002.cdf

[root@cstore-pm02 ~]# ls -l /usr/local/mariadb/columnstore/gluster/brick2/000.dir/000.dir/015.dir/064.dir/008.dir/
total 2097420
-rw-r--r-- 2 root root 2147753984 May 30 16:55 FILE002.cdf

Comment 7 Nithya Balachandran 2019-08-28 08:05:52 UTC
(In reply to Nicola battista from comment #6)
> Hi,
> 
> [root@cstore-pm03 ~]# ls -l
> /usr/local/mariadb/columnstore/gluster/brick2/000.dir/000.dir/015.dir/064.
> dir/008.dir
> total 2097420
> -rw-r--r-- 2 root root 2147753984 May 30 16:55 FILE002.cdf
> 
> [root@cstore-pm01 ~]# ls -l
> /usr/local/mariadb/columnstore/gluster/brick2/000.dir/000.dir/015.dir/064.
> dir/008.dir/
> total 2097420
> -rw-r--r-- 2 root root 2147753984 May 30 16:55 FILE002.cdf
> 
> [root@cstore-pm02 ~]# ls -l
> /usr/local/mariadb/columnstore/gluster/brick2/000.dir/000.dir/015.dir/064.
> dir/008.dir/
> total 2097420
> -rw-r--r-- 2 root root 2147753984 May 30 16:55 FILE002.cdf

Hi,

What does ls -l return from the client mount point? 
Please also provide the xattrs set on this file on each brick.

Comment 8 Nicola battista 2019-08-28 09:05:45 UTC
Hi,

[root@cstore-pm03 ~]# df -h
Filesystem                            Size  Used Avail Use% Mounted on
devtmpfs                               16G     0   16G   0% /dev
tmpfs                                  16G   11M   16G   1% /dev/shm
tmpfs                                  16G  9.0M   16G   1% /run
tmpfs                                  16G     0   16G   0% /sys/fs/cgroup
/dev/mapper/ol-root                    46G  6.1G   40G  14% /
/dev/mapper/glusterfs_dbroot1-brick1  400G  224G  177G  56% /usr/local/mariadb/columnstore/gluster/brick1
/dev/mapper/glusterfs_dbroot2-brick2  400G   96G  304G  24% /usr/local/mariadb/columnstore/gluster/brick2
/dev/mapper/glusterfs_dbroot3-brick3  400G   92G  309G  23% /usr/local/mariadb/columnstore/gluster/brick3
/dev/sda1                             497M  232M  266M  47% /boot
tmpfs                                 3.2G     0  3.2G   0% /run/user/0
172.16.31.7:/dbroot3                 400G   96G  305G  24% /usr/local/mariadb/columnstore/data3
172.16.31.7:/dbroot2                  400G  100G  300G  25% /usr/local/mariadb/columnstore/data2
172.16.31.7:/dbroot1                  400G  228G  173G  57% /usr/local/mariadb/columnstore/data1


[root@cstore-pm03 ~]# ls -lhtr /usr/local/mariadb/columnstore/data2/000.dir/000.dir/015.dir/064.dir/008.dir/FILE002.cdf 
-rw-r--r-- 1 root root 2.1G May 30 16:55 /usr/local/mariadb/columnstore/data2/000.dir/000.dir/015.dir/064.dir/008.dir/FILE002.cdf

[root@cstore-pm02 ~]#  ls -lhtr /usr/local/mariadb/columnstore/data2/000.dir/000.dir/015.dir/064.dir/008.dir/FILE002.cdf
-rw-r--r-- 1 root root 2.1G May 30 16:55 /usr/local/mariadb/columnstore/data2/000.dir/000.dir/015.dir/064.dir/008.dir/FILE002.cdf

[root@cstore-pm02 ~]#  ls -lhtr /usr/local/mariadb/columnstore/data2/000.dir/000.dir/015.dir/064.dir/008.dir/FILE002.cdf
-rw-r--r-- 1 root root 2.1G May 30 16:55 /usr/local/mariadb/columnstore/data2/000.dir/000.dir/015.dir/064.dir/008.dir/FILE002.cdf


Could you explain this step:
"Please also provide the xattrs set on this file on each brick."
I've executed xattr -l /usr/local/mariadb/columnstore/data2/000.dir/000.dir/015.dir/064.dir/008.dir/FILE002.cdf but it produced no output.

Regards.

Comment 9 Nithya Balachandran 2019-08-28 09:44:48 UTC
Hi,

How are you accessing the volume? I'm assuming you are using a fuse mount. I would need to see the ls -l output for the same directory from that client fuse mount. The ls -l output you have provided was taken directly on the bricks; we now need to compare that information with the view the client sees.

As for the xattrs, please use the command 

getfattr -e hex -m . -d <path to dir on brick>

for the dir on each brick.
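For example, for the directory containing the problematic file, on each of the three servers that would be something along the lines of (path taken from the listings in comment #6):

getfattr -e hex -m . -d /usr/local/mariadb/columnstore/gluster/brick2/000.dir/000.dir/015.dir/064.dir/008.dir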

If you are on #gluster on IRC, it might be easier to sync up there.

Comment 10 Nicola battista 2019-08-28 11:44:58 UTC
Hi,
I'm using the fuse mount.
fstab entries:

172.16.31.5:/dbroot3 /usr/local/mariadb/columnstore/data3 glusterfs defaults,direct-io-mode=enable 00
172.16.31.5:/dbroot2 /usr/local/mariadb/columnstore/data2 glusterfs defaults,direct-io-mode=enable 00
172.16.31.5:/dbroot1 /usr/local/mariadb/columnstore/data1 glusterfs defaults,direct-io-mode=enable 00

The path /usr/local/mariadb/columnstore/data2/ is the client mount of the volume.


[root@cstore-pm01 ~]# getfattr -e hex -m . -d /usr/local/mariadb/columnstore/gluster/brick2
getfattr: Removing leading '/' from absolute path names
# file: usr/local/mariadb/columnstore/gluster/brick2
trusted.afr.dbroot2-client-1=0x000000000000000000000000
trusted.afr.dbroot2-client-2=0x000000000000000000000000
trusted.afr.dirty=0x000000000000000000000000
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.mdata=0x010000000000000000000000005c94a223000000002de177a2000000005c94a223000000002de177a2000000005c2e0f3f0000000007d401b3
trusted.glusterfs.volume-id=0xf2b49f9f3a914ac48eb34a327d0dbc61

[root@cstore-pm01 ~]# getfattr -e hex -m . -d /usr/local/mariadb/columnstore/gluster/brick1
getfattr: Removing leading '/' from absolute path names
# file: usr/local/mariadb/columnstore/gluster/brick1
trusted.afr.dbroot1-client-1=0x000000000000000000000000
trusted.afr.dirty=0x000000000000000000000000
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.mdata=0x010000000000000000000000005c94a1f600000000200d67fc000000005c94a1f600000000200d67fc000000005c2e049d00000000101aff76
trusted.glusterfs.volume-id=0xecf4fd042e9647d98a404f84a48657fb

[root@cstore-pm01 ~]# getfattr -e hex -m . -d /usr/local/mariadb/columnstore/gluster/brick3
getfattr: Removing leading '/' from absolute path names
# file: usr/local/mariadb/columnstore/gluster/brick3
trusted.afr.dbroot3-client-1=0x000000000000000000000000
trusted.afr.dbroot3-client-2=0x000000000000000000000000
trusted.afr.dirty=0x000000000000000000000000
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.volume-id=0x73b96917c8424fc28bca099735c4aa6a

Comment 11 Ravishankar N 2019-08-29 05:54:06 UTC
Hi Nicola,

a) Can you provide the gluster fuse mount log of the node which is used by the application in comment #3? 
If this is something you can reproduce at will with your application, please provide a debug-level log.
Steps:
1. `gluster volume set dbroot2 client-log-level DEBUG`
2. Run the application and note the timestamp (UTC) at which it gets ENOENT.
3. Provide the fuse mount log (something like /var/log/glusterfs/usr-local-mariadb-columnstore-data2.log).
4. Also tell us the time noted in step 2 to make it easier to look for issues in the log from step 3 (see the grep sketch after this list).
5. `gluster volume set dbroot2 client-log-level INFO` <===== Restores the default log level.
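For reference, once the log is captured, the lines around the failure can be located with something like the following (a sketch; the log path is the one suggested in step 3 and may differ on your setup):

`grep -n -E 'ENOENT|No such file' /var/log/glusterfs/usr-local-mariadb-columnstore-data2.log`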

b) The getfattr output we need is that of the file in question from all bricks of the volume. Comment #10 seems to give the output for the brick roots of different volumes.
What we need is:
`getfattr -e hex -m . -d /usr/local/mariadb/columnstore/gluster/brick2/000.dir/000.dir/015.dir/064.dir/008.dir/FILE002.cdf` from 3 bricks of dbroot2.

Comment 12 Nicola battista 2019-08-29 07:31:14 UTC
Created attachment 1609285 [details]
GlusterFS log dbroot1 debug mode.

Comment 13 Nicola battista 2019-08-29 07:31:33 UTC
Created attachment 1609286 [details]
GlusterFS log dbroot2 debug mode.

Comment 14 Nicola battista 2019-08-29 07:31:51 UTC
Created attachment 1609287 [details]
GlusterFS log dbroot3 debug mode.

Comment 15 Nicola battista 2019-08-29 07:37:33 UTC
Hi,
I've attached the log for each dbroot.

Timestamps of the logs:
cstore-pm01 File :  GlusterFS log dbroot1 debug mode.
start [2019-08-29 07:22:02.082937]
finish [2019-08-29 07:22:24.471092]

cstore-pm02 File : GlusterFS log dbroot2 debug mode.
start [2019-08-29 07:17:20.287038]
finish [2019-08-29 07:18:03.043808] 


[root@cstore-pm01 glusterfs]# getfattr -e hex -m . -d /usr/local/mariadb/columnstore/gluster/brick2/000.dir/000.dir/015.dir/064.dir/008.dir/FILE002.cdf
getfattr: Removing leading '/' from absolute path names
# file: usr/local/mariadb/columnstore/gluster/brick2/000.dir/000.dir/015.dir/064.dir/008.dir/FILE002.cdf
trusted.afr.dirty=0x000000000000000000000000
trusted.gfid=0x48acd62fe5c34ae69ce4ce5cb23067c7
trusted.gfid2path.f3904b12d67725d3=0x66653035366232622d616437622d343934392d393463312d6433353031306338633966632f46494c453030322e636466

[root@cstore-pm02 glusterfs]# getfattr -e hex -m . -d /usr/local/mariadb/columnstore/gluster/brick2/000.dir/000.dir/015.dir/064.dir/008.dir/FILE002.cdf
getfattr: Removing leading '/' from absolute path names
# file: usr/local/mariadb/columnstore/gluster/brick2/000.dir/000.dir/015.dir/064.dir/008.dir/FILE002.cdf
trusted.afr.dirty=0x000000000000000000000000
trusted.gfid=0x48acd62fe5c34ae69ce4ce5cb23067c7
trusted.gfid2path.f3904b12d67725d3=0x66653035366232622d616437622d343934392d393463312d6433353031306338633966632f46494c453030322e636466

[root@cstore-pm03 glusterfs]# getfattr -e hex -m . -d /usr/local/mariadb/columnstore/gluster/brick2/000.dir/000.dir/015.dir/064.dir/008.dir/FILE002.cdf
getfattr: Removing leading '/' from absolute path names
# file: usr/local/mariadb/columnstore/gluster/brick2/000.dir/000.dir/015.dir/064.dir/008.dir/FILE002.cdf
trusted.afr.dirty=0x000000000000000000000000
trusted.gfid=0x48acd62fe5c34ae69ce4ce5cb23067c7
trusted.gfid2path.f3904b12d67725d3=0x66653035366232622d616437622d343934392d393463312d6433353031306338633966632f46494c453030322e636466


Do you need any other information?
Thanks
Regards
Nicola Battista

Comment 16 Nicola battista 2019-08-29 07:41:44 UTC
[root@cstore-pm01 glusterfs]# getfattr -e hex -m . -d /usr/local/mariadb/columnstore/gluster/brick2/000.dir/000.dir/015.dir/064.dir/008.dir/FILE002.cdf
getfattr: Removing leading '/' from absolute path names
# file: usr/local/mariadb/columnstore/gluster/brick2/000.dir/000.dir/015.dir/064.dir/008.dir/FILE002.cdf
trusted.afr.dirty=0x000000000000000000000000
trusted.gfid=0x48acd62fe5c34ae69ce4ce5cb23067c7
trusted.gfid2path.f3904b12d67725d3=0x66653035366232622d616437622d343934392d393463312d6433353031306338633966632f46494c453030322e636466

[root@cstore-pm01 glusterfs]# getfattr -e hex -m . -d /usr/local/mariadb/columnstore/gluster/brick1/000.dir/000.dir/015.dir/064.dir/008.dir/FILE000.cdf 
getfattr: Removing leading '/' from absolute path names
# file: usr/local/mariadb/columnstore/gluster/brick1/000.dir/000.dir/015.dir/064.dir/008.dir/FILE000.cdf
trusted.afr.dirty=0x000000000000000000000000
trusted.gfid=0x6d6349a71d6e43a2a603e65c768d99ab
trusted.gfid2path.1b3027e4fc5fdfd2=0x66373865643264662d633632652d343761652d383634342d3764653933336131613430392f46494c453030302e636466

[root@cstore-pm01 glusterfs]# getfattr -e hex -m . -d /usr/local/mariadb/columnstore/gluster/brick3/000.dir/000.dir/015.dir/064.dir/008.dir/FILE001.cdf 
getfattr: Removing leading '/' from absolute path names
# file: usr/local/mariadb/columnstore/gluster/brick3/000.dir/000.dir/015.dir/064.dir/008.dir/FILE001.cdf
trusted.afr.dirty=0x000000000000000000000000
trusted.gfid=0xfb097efd00044c0caf6eea79a40aab6a
trusted.gfid2path.9af95bfcaac3386a=0x33373264663634382d393137642d343738332d393736652d3463336334666430366563352f46494c453030312e636466

Comment 17 Ravishankar N 2019-08-29 10:55:35 UTC
The xattrs don't seem to indicate any pending heals from AFR point of view. The mount logs also do not contain any information about lookup/stat/open failing (either for the file name FILE000.cdf/its gfid or even in general). Given that you are able to access the file using the fuse mount as per comment#8, I'm not sure this is a bug in gluster. Is there a chance of races in the application where a thread tries to access the file before a creat() from another thread?

Comment 18 Roman 2019-08-29 15:37:40 UTC
That's not comforting news. No, CS doesn't even use that call, and at this point in the data files' existence they are read-only.

Comment 19 Nicola battista 2019-08-29 15:45:13 UTC
Hi all,
Maybe this can help:
############ CSTORE PM01 ########
[root@cstore-pm01 ~]# hexdump -C /usr/local/mariadb/columnstore/gluster/brick2/000.dir/000.dir/015.dir/064.dir/008.dir/FILE002.cdf | head 
00000000  8e 77 d0 84 a3 19 c1 fd  02 00 00 00 00 00 00 00  |.w..............|
00000010  01 00 00 00 00 00 00 00  00 20 04 00 00 00 00 00  |......... ......|
00000020  00 00 04 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000030  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00001000  00 20 04 00 00 00 00 00  00 40 35 00 00 00 00 00  |. .......@5.....|
00001010  00 20 66 00 00 00 00 00  00 60 97 00 00 00 00 00  |. f......`......|
00001020  00 a0 c8 00 00 00 00 00  00 e0 f9 00 00 00 00 00  |................|
00001030  00 20 2b 01 00 00 00 00  00 60 5c 01 00 00 00 00  |. +......`\.....|
00001040  00 80 8d 01 00 00 00 00  00 80 bf 01 00 00 00 00  |................|
[root@cstore-pm01 ~]# hexdump -C /usr/local/mariadb/columnstore/gluster/brick1/000.dir/000.dir/015.dir/064.dir/008.dir/FILE000.cdf | head 
00000000  8e 77 d0 84 a3 19 c1 fd  02 00 00 00 00 00 00 00  |.w..............|
00000010  01 00 00 00 00 00 00 00  00 20 04 00 00 00 00 00  |......... ......|
00000020  00 20 04 00 00 00 00 00  00 00 00 00 00 00 00 00  |. ..............|
00000030  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00001000  00 20 04 00 00 00 00 00  00 60 35 00 00 00 00 00  |. .......`5.....|
00001010  00 e0 66 00 00 00 00 00  00 40 98 00 00 00 00 00  |..f......@......|
00001020  00 60 c9 00 00 00 00 00  00 a0 fa 00 00 00 00 00  |.`..............|
00001030  00 c0 2b 01 00 00 00 00  00 40 5d 01 00 00 00 00  |..+......@].....|
00001040  00 60 8e 01 00 00 00 00  00 20 be 01 00 00 00 00  |.`....... ......|
[root@cstore-pm01 ~]# hexdump -C /usr/local/mariadb/columnstore/gluster/brick3/000.dir/000.dir/015.dir/064.dir/008.dir/FILE001.cdf | head 
00000000  8e 77 d0 84 a3 19 c1 fd  02 00 00 00 00 00 00 00  |.w..............|
00000010  01 00 00 00 00 00 00 00  00 20 04 00 00 00 00 00  |......... ......|
00000020  00 00 04 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000030  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00001000  00 20 04 00 00 00 00 00  00 60 35 00 00 00 00 00  |. .......`5.....|
00001010  00 a0 66 00 00 00 00 00  00 00 98 00 00 00 00 00  |..f.............|
00001020  00 00 c9 00 00 00 00 00  00 e0 f9 00 00 00 00 00  |................|
00001030  00 40 2b 01 00 00 00 00  00 60 5c 01 00 00 00 00  |.@+......`\.....|
00001040  00 c0 8d 01 00 00 00 00  00 00 bf 01 00 00 00 00  |................|
############ CSTORE PM02 ########
[root@cstore-pm02 ~]# hexdump -C /usr/local/mariadb/columnstore/gluster/brick1/000.dir/000.dir/015.dir/064.dir/008.dir/FILE000.cdf | head
00000000  8e 77 d0 84 a3 19 c1 fd  02 00 00 00 00 00 00 00  |.w..............|
00000010  01 00 00 00 00 00 00 00  00 20 04 00 00 00 00 00  |......... ......|
00000020  00 20 04 00 00 00 00 00  00 00 00 00 00 00 00 00  |. ..............|
00000030  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00001000  00 20 04 00 00 00 00 00  00 60 35 00 00 00 00 00  |. .......`5.....|
00001010  00 e0 66 00 00 00 00 00  00 40 98 00 00 00 00 00  |..f......@......|
00001020  00 60 c9 00 00 00 00 00  00 a0 fa 00 00 00 00 00  |.`..............|
00001030  00 c0 2b 01 00 00 00 00  00 40 5d 01 00 00 00 00  |..+......@].....|
00001040  00 60 8e 01 00 00 00 00  00 20 be 01 00 00 00 00  |.`....... ......|
[root@cstore-pm02 ~]# hexdump -C /usr/local/mariadb/columnstore/gluster/brick2/000.dir/000.dir/015.dir/064.dir/008.dir/FILE002.cdf | head
00000000  8e 77 d0 84 a3 19 c1 fd  02 00 00 00 00 00 00 00  |.w..............|
00000010  01 00 00 00 00 00 00 00  00 20 04 00 00 00 00 00  |......... ......|
00000020  00 00 04 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000030  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00001000  00 20 04 00 00 00 00 00  00 40 35 00 00 00 00 00  |. .......@5.....|
00001010  00 20 66 00 00 00 00 00  00 60 97 00 00 00 00 00  |. f......`......|
00001020  00 a0 c8 00 00 00 00 00  00 e0 f9 00 00 00 00 00  |................|
00001030  00 20 2b 01 00 00 00 00  00 60 5c 01 00 00 00 00  |. +......`\.....|
00001040  00 80 8d 01 00 00 00 00  00 80 bf 01 00 00 00 00  |................|
[root@cstore-pm02 ~]# hexdump -C /usr/local/mariadb/columnstore/gluster/brick3/000.dir/000.dir/015.dir/064.dir/008.dir/FILE001.cdf | head
00000000  8e 77 d0 84 a3 19 c1 fd  02 00 00 00 00 00 00 00  |.w..............|
00000010  01 00 00 00 00 00 00 00  00 20 04 00 00 00 00 00  |......... ......|
00000020  00 00 04 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000030  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00001000  00 20 04 00 00 00 00 00  00 60 35 00 00 00 00 00  |. .......`5.....|
00001010  00 a0 66 00 00 00 00 00  00 00 98 00 00 00 00 00  |..f.............|
00001020  00 00 c9 00 00 00 00 00  00 e0 f9 00 00 00 00 00  |................|
00001030  00 40 2b 01 00 00 00 00  00 60 5c 01 00 00 00 00  |.@+......`\.....|
00001040  00 c0 8d 01 00 00 00 00  00 00 bf 01 00 00 00 00  |................|
############ CSTORE PM03 ########
[root@cstore-pm03 ~]# hexdump -C /usr/local/mariadb/columnstore/gluster/brick1/000.dir/000.dir/015.dir/064.dir/008.dir/FILE000.cdf | head
00000000  8e 77 d0 84 a3 19 c1 fd  02 00 00 00 00 00 00 00  |.w..............|
00000010  01 00 00 00 00 00 00 00  00 20 04 00 00 00 00 00  |......... ......|
00000020  00 20 04 00 00 00 00 00  00 00 00 00 00 00 00 00  |. ..............|
00000030  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00001000  00 20 04 00 00 00 00 00  00 60 35 00 00 00 00 00  |. .......`5.....|
00001010  00 e0 66 00 00 00 00 00  00 40 98 00 00 00 00 00  |..f......@......|
00001020  00 60 c9 00 00 00 00 00  00 a0 fa 00 00 00 00 00  |.`..............|
00001030  00 c0 2b 01 00 00 00 00  00 40 5d 01 00 00 00 00  |..+......@].....|
00001040  00 60 8e 01 00 00 00 00  00 20 be 01 00 00 00 00  |.`....... ......|
[root@cstore-pm03 ~]# hexdump -C /usr/local/mariadb/columnstore/gluster/brick2/000.dir/000.dir/015.dir/064.dir/008.dir/FILE002.cdf | head
00000000  8e 77 d0 84 a3 19 c1 fd  02 00 00 00 00 00 00 00  |.w..............|
00000010  01 00 00 00 00 00 00 00  00 20 04 00 00 00 00 00  |......... ......|
00000020  00 00 04 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000030  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00001000  00 20 04 00 00 00 00 00  00 40 35 00 00 00 00 00  |. .......@5.....|
00001010  00 20 66 00 00 00 00 00  00 60 97 00 00 00 00 00  |. f......`......|
00001020  00 a0 c8 00 00 00 00 00  00 e0 f9 00 00 00 00 00  |................|
00001030  00 20 2b 01 00 00 00 00  00 60 5c 01 00 00 00 00  |. +......`\.....|
00001040  00 80 8d 01 00 00 00 00  00 80 bf 01 00 00 00 00  |................|
[root@cstore-pm03 ~]# hexdump -C /usr/local/mariadb/columnstore/gluster/brick3/000.dir/000.dir/015.dir/064.dir/008.dir/FILE001.cdf | head
00000000  8e 77 d0 84 a3 19 c1 fd  02 00 00 00 00 00 00 00  |.w..............|
00000010  01 00 00 00 00 00 00 00  00 20 04 00 00 00 00 00  |......... ......|
00000020  00 00 04 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000030  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00001000  00 20 04 00 00 00 00 00  00 60 35 00 00 00 00 00  |. .......`5.....|
00001010  00 a0 66 00 00 00 00 00  00 00 98 00 00 00 00 00  |..f.............|
00001020  00 00 c9 00 00 00 00 00  00 e0 f9 00 00 00 00 00  |................|
00001030  00 40 2b 01 00 00 00 00  00 60 5c 01 00 00 00 00  |.@+......`\.....|
00001040  00 c0 8d 01 00 00 00 00  00 00 bf 01 00 00 00 00  |................|


Thanks
Regards
Nicola Battista

Comment 20 Nicola battista 2019-09-03 13:25:28 UTC
Hi all,
Any news?

Thanks
Regards
Nicola Battista

Comment 21 Raghavendra G 2019-09-04 04:35:46 UTC
Does turning off performance.readdir-ahead help?

Comment 22 Nicola battista 2019-09-04 06:43:43 UTC
Hi,
I've turned off performance.readdir-ahead but the problem persists.

Thanks,
regards
Nicola Battista

Comment 23 Amar Tumballi 2019-09-04 08:35:27 UTC
Can we rule out the possibility of Gluster's perf xlators causing any of this?

Hi Nicola,

Can you try the following?

  sh# for xl in read-ahead open-behind md-cache quick-read io-cache ; do gluster volume set VOLNAME $xl off; done


and see if this fixes the problem? If yes, then we can narrow it down to one (or a few) of the above options. If not, then the problem is in something core, IMO.
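For this particular setup, a concrete form of that loop over the three volumes reported above might be (a sketch; it assumes the same option keys work unchanged on glusterfs 6.5):

  sh# for vol in dbroot1 dbroot2 dbroot3 ; do for xl in read-ahead open-behind md-cache quick-read io-cache ; do gluster volume set $vol $xl off ; done ; done

If the keys are accepted, the new values will show up under "Options Reconfigured" in `gluster volume info`.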

Comment 24 Ravishankar N 2019-09-05 05:47:51 UTC
Also, if disabling the above mentioned translators doesn't help, can you provide the following when you hit the issue:
1. fuse dump of the gluster fuse mount. You will need to mount it again with -o dump-path=$PATH option. (see `man mount.glusterfs`)
2. tcp dump of the fuse client node to capture traffic between the client and the three bricks.
3. If possible, both the client and three brick logs in TRACE log level.
4. strace of the three brick processes.

Another question: are you getting the ENOENT for the same file FILE002.cdf (with the same trusted.gfid) on dbroot2? Or is it a different file, or the same file name deleted and created again as part of the application I/O?

Comment 25 Nicola battista 2019-09-05 12:19:20 UTC
(In reply to Amar Tumballi from comment #23)
> Can we rule out the possibility of Gluster's perf xlators causing any of
> this?
> 
> Hi Nicola,
> 
> Can you try below ?
> 
>   sh# for xl in read-ahead open-behind md-cache quick-read io-cache ; do
> gluster volume set VOLNAME $xl off; done
> 
> 
> and see if this fixes the problem? If yes, then we can corner it to one (or
> few) of the above option. If not, then the problem is in something core, IMO.

Hi all,
Even after running this script, the problem persists.

Thanks
Regards
Nicola Battista

Comment 26 Nicola battista 2019-09-16 07:08:58 UTC
Hi, for these steps:
1. fuse dump of the gluster fuse mount. You will need to mount it again with -o dump-path=$PATH option. (see `man mount.glusterfs`)
2. tcp dump of the fuse client node to capture traffic between the client and the three bricks.
3. If possible, both the client and three brick logs in TRACE log level.
4. strace of the three brick processes.

Could you send me the command to execute for each step?

Thanks
Regards.
Nicola Battista

Comment 27 Ravishankar N 2019-09-16 08:50:09 UTC
1.`mount -t glusterfs -o dump-fuse=/tmp/fuse_dump.dat ip-address:volume_name /path/to/fuse/mount`
2.`tcpdump -i <dev> -s 256 -B 32768 -w <pcap file> <filter> `
3.`gluster volume set $volname client-log-level TRACE`
  `gluster volume set $volname brick-log-level TRACE`
4. For all three bricks of the volume: `strace -ff -T -p <process-id-of-brick process> -o <path-where-you-want-strace-output-saved>`
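For step 2, a concrete invocation on the client node could look like the following (a sketch; the interface name and output path are assumptions, while 49153 is the dbroot2 brick port shown in the volume status above):

`tcpdump -i eth0 -s 256 -B 32768 -w /tmp/dbroot2-bricks.pcap 'tcp port 49153'`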


> 
> Another question- are you getting the ENOENT for the same file FILE002.cdf
> (with the same trusted.gfid) on dbroot2? Or is a different file,  or the
> same file name deleted and created again as a part of the application I/O?

What is the answer for this?

Comment 28 Nicola battista 2019-09-16 13:53:32 UTC
Hi,
The files are too large to upload here.
I've uploaded the strace files to WeTransfer.

This is the link : https://we.tl/t-J5VmRBxkxc

Thanks,
Regards
Nicola Battista

Comment 29 Nicola battista 2019-09-17 15:08:20 UTC
Hi all,
Any news?

Thanks,
Regards
Nicola Battista

Comment 30 Ravishankar N 2019-09-22 04:50:29 UTC
(In reply to Nicola battista from comment #29)

> 
> Another question- are you getting the ENOENT for the same file FILE002.cdf
> (with the same trusted.gfid) on dbroot2? Or is a different file,  or the
> same file name deleted and created again as a part of the application I/O?

What is the answer for this? Did you get ENOENT for the file "/000.dir/000.dir/015.dir/064.dir/008.dir/FILE002.cdf" having 'trusted.gfid=0x48acd62fe5c34ae69ce4ce5cb23067c7' when you captured all the information in comment#28?

Comment 31 Nicola battista 2019-09-23 08:24:58 UTC
Hi,
In which trace should I find this information?
Because using the grep command I didn't find anything.

Thanks,
Regards
Nicola

Comment 32 Ravishankar N 2019-09-23 08:51:29 UTC
(In reply to Nicola battista from comment #31)
> Hi,
> In which trace should I find this information?
> Because using the grep command I didn't find anything.
> 
> Thanks,
> Regards
> Nicola

Umm, when you opened the bug, you said you were getting "No such file or directory" for the 'segment' files, so would you not know which files are in question?

What I understood from the bug description and comment #3 was that you got ENOENT for a file on the gluster client when it was accessed programmatically, despite the file being present on all 3 bricks of the gluster volume. So, assuming gluster is the suspect, I wanted to know which layer in gluster was giving that error. And for that, I needed to know which file (and its corresponding gfid) is the problematic one, so that I could look further into the logs you provided.

I did not find FILE002.cdf (with '008.dir' as parent) or its gfid in any of the 3 tcp dumps and hence wanted to know which file to look for.
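For reference, one way to check whether a capture contains the file name at all is something like (a sketch; dump.pcap is a placeholder for the capture file):

`tcpdump -r dump.pcap -A | grep -c 'FILE002.cdf'`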

Comment 33 Nicola battista 2019-09-26 06:58:11 UTC
In which dump file should the word FILE002.cdf appear?
If needed, I can reproduce the dump with tcpdump.

Thanks
Regards
Nicola

Comment 34 Nicola battista 2019-10-08 05:52:30 UTC
Hi,
Any news?
Thanks
Regards

Comment 35 Nicola battista 2019-10-14 09:03:01 UTC
Created attachment 1625514 [details]
Tcpdump

Tcpdump of the columnstore process.

Comment 36 Worker Ant 2020-03-12 12:14:27 UTC
This bug has been moved to https://github.com/gluster/glusterfs/issues/846 and will be tracked there from now on. Visit the GitHub issue URL for further details.

