Created attachment 1519880 [details]
asb-etcd-1-smjcf.log

Description of problem:
Etcd, Cassandra and PostgreSQL each print a stack trace shortly after starting when their DB files are on a Gluster 5.2 volume that has the volume option performance.write-behind enabled. Using the same Gluster volumes to serve ordinary files does not trigger the issue.

Version-Release number of selected component (if applicable):
5.2

How reproducible:

Steps to Reproduce:
1. Start Etcd with its DB files on a Gluster volume where the option performance.write-behind is on
2. Etcd starts, then crashes once it is listening for clients (unexpected fault address 0x7fca0c001040)
3. Disable performance.write-behind on the Gluster volume
4. Restart Etcd
5. Etcd starts normally

Actual results:
Output of Etcd crashing (asb-etcd-1-smjcf.log)

Expected results:
Output of Etcd running with performance.write-behind off (asb-etcd-3-dsfxf.log)

Additional info:
The content or size of the Etcd DB doesn't matter. The issue is also reproducible when the DB is created from scratch.
Created attachment 1519881 [details] asb-etcd-3-dsfxf.log
Can you paste the backtrace here? If possible can you attach the core?
Sorry, I misread the bug as glusterfs crashing. I see now that it is etcd that has problems coming up. Can you get the following information (I don't need a core of glusterfs, as there is none):
* strace of etcd (strace -ff -v ...), to find out which syscalls it made.
* dump of the traffic between the fuse kernel module and glusterfs (see the --dump-fuse option of glusterfs)
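A minimal sketch of how the two captures requested above could be collected. SERVER, VOLUME, MOUNTPOINT, FUSE_DUMP and STRACE_PREFIX are placeholder names, not values from this report, and the helper functions are hypothetical. The script only prints the commands unless DRY_RUN=0 is set. It assumes the volume is mounted by calling the glusterfs client binary directly so that --dump-fuse can be passed.

```shell
#!/bin/sh
# Placeholders (assumptions); adjust to the actual setup.
SERVER="${SERVER:-gluster-server}"
VOLUME="${VOLUME:-gluster-pv18}"
MOUNTPOINT="${MOUNTPOINT:-/mnt/gluster-pv18}"
FUSE_DUMP="${FUSE_DUMP:-/tmp/gluster-fuse.dump}"
STRACE_PREFIX="${STRACE_PREFIX:-/tmp/strace/initdb}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command; execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# Mount through the glusterfs client so FUSE traffic is written to FUSE_DUMP.
mount_with_fuse_dump() {
    run glusterfs --volfile-server="$SERVER" --volfile-id="$VOLUME" \
        --dump-fuse="$FUSE_DUMP" "$MOUNTPOINT"
}

# Trace a command and its children; with -ff, -o writes one file per PID
# (STRACE_PREFIX.<pid>).
trace_cmd() {
    run mkdir -p "$(dirname "$STRACE_PREFIX")"
    run strace -ff -v -o "$STRACE_PREFIX" "$@"
}
```

The defaults keep both the fuse dump and the strace files under /tmp, so that collecting the traces does not add I/O to the volume under test.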
Also, detailed steps for a reproducer (even better, a script or a capture of the commands you executed) would greatly speed up the debugging.
Reproduction case: exactly as described in the original ticket.

# Prepare the gluster volume
gluster volume set gluster-pv18 performance.write-behind off

# Mount the volume
mount -t glusterfs <gluster-server>:/gluster-pv18 /mnt/gluster-pv18

# Start Postgres
docker run --name psql-test --rm -v /mnt/gluster-pv18:/var/lib/postgresql/data docker.io/postgres:9.5
# This should work as expected

# Clean up
docker stop psql-test
rm -rf /mnt/gluster-pv18/*
umount /mnt/gluster-pv18

# Enable write-behind
gluster volume set gluster-pv18 performance.write-behind on

# Mount the volume
mount -t glusterfs <gluster-server>:/gluster-pv18 /mnt/gluster-pv18

# Start Postgres
docker run --name psql-test --rm -v /mnt/gluster-pv18:/var/lib/postgresql/data docker.io/postgres:9.5
# !!! This will now fail:
#   creating template1 database in /var/lib/postgresql/data/base/1 ... ok
#   initializing pg_authid ... LOG: invalid primary checkpoint record
#   LOG: invalid secondary checkpoint record
#   PANIC: could not locate a valid checkpoint record
#   Aborted (core dumped)
#   child process exited with exit code 134
#   initdb: removing contents of data directory "/var/lib/postgresql/data"
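The steps above could be wrapped in a small script so that both passes run the same way. This is only a sketch: SERVER is a placeholder for the elided gluster server name, repro_pass and run are hypothetical helpers, and the script prints the commands unless DRY_RUN=0 is set. In the passing case the container stays in the foreground until it is stopped (for example with docker stop psql-test from another terminal), as in the manual steps.

```shell
#!/bin/sh
# Placeholders (assumptions); SERVER stands in for <gluster-server>.
SERVER="${SERVER:-gluster-server}"
VOLUME="${VOLUME:-gluster-pv18}"
MOUNTPOINT="${MOUNTPOINT:-/mnt/gluster-pv18}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command; execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# One pass of the test with performance.write-behind set to "on" or "off".
repro_pass() {
    mode="$1"
    run gluster volume set "$VOLUME" performance.write-behind "$mode"
    run mount -t glusterfs "$SERVER:/$VOLUME" "$MOUNTPOINT"
    run docker run --name psql-test --rm \
        -v "$MOUNTPOINT:/var/lib/postgresql/data" docker.io/postgres:9.5
    # Clean up: empty the volume and unmount it before the next pass.
    run find "$MOUNTPOINT" -mindepth 1 -delete
    run umount "$MOUNTPOINT"
}
```

Calling repro_pass off followed by repro_pass on with DRY_RUN=0 would run the working case first and the failing case second.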
Created attachment 1525793 [details] dump-fuse, gzipped
Created attachment 1525794 [details]
strace of initdb (which crashed)

Also interesting: while creating the TGZ archive (not on the gluster volume) of all strace files (which were on the gluster volume), a lot of messages like this appeared:

tar: strace/initdb.42: file changed as we read it
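To check whether the tar warning points at file metadata changing on the volume while nothing is writing, one could sample the metadata of an idle file twice. This is a rough sketch, not something used in this report: check_stable is a hypothetical helper, it assumes GNU stat (the -c format option), and it only compares size, mtime and ctime.

```shell
#!/bin/sh
# Sample size, mtime and ctime of a file twice and report any difference.
# Assumes GNU stat; the second argument is the wait between samples in seconds.
check_stable() {
    file="$1"
    interval="${2:-1}"
    before="$(stat -c '%s %Y %Z' "$file")" || return 2
    sleep "$interval"
    after="$(stat -c '%s %Y %Z' "$file")" || return 2
    if [ "$before" = "$after" ]; then
        echo "stable: $file"
    else
        echo "changed: $file ($before -> $after)"
        return 1
    fi
}
```

For example, check_stable /mnt/gluster-pv18/strace/initdb.42 5 would sample that file five seconds apart, after the traced process has exited.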
Hi, were you able to reproduce the issue?
Could not reproduce this issue anymore with Gluster 5.5 and Etcd, Cassandra and PostgreSQL.
(In reply to gabisoft from comment #9)
> Could not reproduce this issue anymore with Gluster 5.5 and Etcd, Cassandra
> and PostgreSQL.

It's likely that the fixes for bz 1512691 have helped. Can you please close the bug?