Description of problem: ======================= When a file is being continuously modified or written to, data base locks are being observed as below [2015-07-07 12:46:00.025876] E [MSGID: 101106] [gfdb_sqlite3.c:694:gf_sqlite3_find_recently_chan ged_files] 0-sqlite3: Failed preparing statment select GF_FILE_TB.GF_ID, (select group_concat( G F_PID || ',' || FNAME || ',' || FPATH || ',' || W_DEL_FLAG ||',' || LINK_UPDATE , '::') from GF_ FLINK_TB where GF_FILE_TB.GF_ID = GF_FLINK_TB.GF_ID) from GF_FILE_TB where ((GF_FILE_TB.W_SEC * 1000000 + GF_FILE_TB.W_MSEC) >= ? ) OR ((GF_FILE_TB.W_READ_SEC * 1000000 + GF_FILE_TB.W_READ_MS EC) >= ?) : database is locked Due to this a file getting continuosly written is being demoted unncessarily Version-Release number of selected component (if applicable): ========================================================== [root@nchilaka-tier01 ~]# gluster --version glusterfs 3.7.1 built on Jul 2 2015 21:01:51 Repository revision: git://git.gluster.com/glusterfs.git Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com> GlusterFS comes with ABSOLUTELY NO WARRANTY. You may redistribute copies of GlusterFS under the terms of the GNU General Public License. [root@nchilaka-tier01 ~]# rpm -qa|grep gluster gluster-nagios-common-0.2.0-1.el6rhs.noarch vdsm-gluster-4.16.20-1.2.el6rhs.noarch glusterfs-client-xlators-3.7.1-7.el6rhs.x86_64 glusterfs-server-3.7.1-7.el6rhs.x86_64 nfs-ganesha-gluster-2.2.0-3.el6rhs.x86_64 python-gluster-3.7.1-6.el6rhs.x86_64 glusterfs-3.7.1-7.el6rhs.x86_64 glusterfs-api-3.7.1-7.el6rhs.x86_64 glusterfs-cli-3.7.1-7.el6rhs.x86_64 glusterfs-geo-replication-3.7.1-7.el6rhs.x86_64 glusterfs-rdma-3.7.1-7.el6rhs.x86_64 gluster-nagios-addons-0.2.4-2.el6rhs.x86_64 glusterfs-libs-3.7.1-7.el6rhs.x86_64 glusterfs-fuse-3.7.1-7.el6rhs.x86_64 glusterfs-ganesha-3.7.1-7.el6rhs.x86_64 Steps to Reproduce: ================== 1.create a tiered volume 2.set the tier vol options 3.now create a file and keep appending lines in a loop as below `for in in {0..1000000};do echo "hello world" >>file1 ;done` 4. Check the tier.log and it can be seen that database lock messages would be thrown
sos reports at /home/repo/sosreports/bug.1240569
sos report server rhsqe-repo.lab.eng.blr.redhat.com:/home/repo/sosreports/bug.1240569
The reason this happens is in WAL mode opening a new db connection is an expensive operation as it will try to a lock on the WAL file(even though for a short time). The migration process opens a new connection per brick per promotion/demotion cycle. This is bad scheme. Solution: 1) Create a new connection only in tier_init per brick and have the connection alive and use it for every promotion/demotion 2) Enable pooling (Pooling=True in the connection string) when the connection is established, there isn't any locking of WAL file, because existing connections are reused internal to sqlite.
http://dev.yorhel.nl/doc/sqlaccess
raised an upstream bug 1240577
On further analysis found that "Database locked" error was appearing not at init of db. It was happen in the query path of tier migrator. he message "E [MSGID: 101106] [gfdb_sqlite3.c:779:gf_sqlite3_find_unchanged_for_time] 0-sqlite3: Failed preparing statment select GF_FILE_TB.GF_ID, (select group_concat( GF_PID || ',' || FNAME || ',' || FPATH || ',' || W_DEL_FLAG ||',' || LINK_UPDATE , '::') from GF_FLINK_TB where GF_FILE_TB.GF_ID = GF_FLINK_TB.GF_ID) from GF_FILE_TB where ((GF_FILE_TB.W_SEC * 1000000 + GF_FILE_TB.W_MSEC) <= ? ) OR ((GF_FILE_TB.W_READ_SEC * 1000000 + GF_FILE_TB.W_READ_MSEC) <= ?) : database is locked" repeated 2 times between [2015-07-10 08:55:40.438914] and [2015-07-10 08:57:30.559193] Also another observation was that the database was not operating in the WAL journal mode. It was running in the regular journal mode! In the regular journal mode has issues with concurrent access. [root@rhgs1 .glusterfs]# ls 00 6e b7 changelogs d1.db d1.db-journal d5 health_check indices landfill [root@rhgs1 .glusterfs]# pwd /home/disk/d1/.glusterfs [root@rhgs1 .glusterfs]# Looks like sqlite 3.6.20 (which comes with RHEL 6.7) doesnot have WAL journal mode [root@rhgs1 ~]# sqlite3 joseph.db; SQLite version 3.6.20 Enter ".help" for instructions Enter SQL statements terminated with a ";" sqlite> PRAGMA journal_mode=WAL; delete sqlite> This problem doesnt happen on latest sqlite 3.8, tested on fedora21[root@fedora1 ~]# sqlite3 joseph.db SQLite version 3.8.10.2 2015-05-20 18:17:19 Enter ".help" for usage hints. sqlite> PRAGMA journal_mode=WAL; wal sqlite> We would require a higher version of sqlite3 on RHGS 3.x, which supports WAL journal mode.