Bug 1240569

Summary:	Data Tiering: Database locks observed on tiered volumes on continous writes to a file
Product:	[Red Hat Storage] Red Hat Gluster Storage	Reporter:	Nag Pavan Chilakam <nchilaka>
Component:	tier	Assignee:	Joseph Elwin Fernandes <josferna>
Status:	CLOSED NOTABUG	QA Contact:	Nag Pavan Chilakam <nchilaka>
Severity:	urgent	Docs Contact:
Priority:	urgent
Version:	rhgs-3.1	CC:	dlambrig, ndevos, rcyriac, rhs-bugs, sankarshan, storage-qa-internal
Target Milestone:	---	Keywords:	ZStream
Target Release:	---
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:	tier-migration
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:
Clones:	1256591 1267242 (view as bug list)		Environment:
Last Closed:	2016-05-25 13:21:37 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:	1256591, 1265399, 1267242, 1270123
Bug Blocks:

Description Nag Pavan Chilakam 2015-07-07 09:08:02 UTC

Description of problem:
=======================
When  a file is being continuously modified or written to, data base locks are being observed as below
[2015-07-07 12:46:00.025876] E [MSGID: 101106] [gfdb_sqlite3.c:694:gf_sqlite3_find_recently_chan
ged_files] 0-sqlite3: Failed preparing statment select GF_FILE_TB.GF_ID, (select group_concat( G
F_PID || ',' || FNAME || ',' || FPATH || ',' || W_DEL_FLAG ||',' || LINK_UPDATE , '::') from GF_
FLINK_TB where GF_FILE_TB.GF_ID = GF_FLINK_TB.GF_ID)  from GF_FILE_TB where ((GF_FILE_TB.W_SEC *
 1000000 + GF_FILE_TB.W_MSEC) >= ? ) OR ((GF_FILE_TB.W_READ_SEC * 1000000 + GF_FILE_TB.W_READ_MS
EC) >= ?) : database is locked



Due to this a file getting continuosly written is being demoted unncessarily


Version-Release number of selected component (if applicable):
==========================================================
[root@nchilaka-tier01 ~]# gluster --version
glusterfs 3.7.1 built on Jul  2 2015 21:01:51
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GlusterFS under the terms of the GNU General Public License.
[root@nchilaka-tier01 ~]# rpm -qa|grep gluster
gluster-nagios-common-0.2.0-1.el6rhs.noarch
vdsm-gluster-4.16.20-1.2.el6rhs.noarch
glusterfs-client-xlators-3.7.1-7.el6rhs.x86_64
glusterfs-server-3.7.1-7.el6rhs.x86_64
nfs-ganesha-gluster-2.2.0-3.el6rhs.x86_64
python-gluster-3.7.1-6.el6rhs.x86_64
glusterfs-3.7.1-7.el6rhs.x86_64
glusterfs-api-3.7.1-7.el6rhs.x86_64
glusterfs-cli-3.7.1-7.el6rhs.x86_64
glusterfs-geo-replication-3.7.1-7.el6rhs.x86_64
glusterfs-rdma-3.7.1-7.el6rhs.x86_64
gluster-nagios-addons-0.2.4-2.el6rhs.x86_64
glusterfs-libs-3.7.1-7.el6rhs.x86_64
glusterfs-fuse-3.7.1-7.el6rhs.x86_64
glusterfs-ganesha-3.7.1-7.el6rhs.x86_64






Steps to Reproduce:
==================
1.create a tiered volume
2.set the tier vol options 
3.now create a file and keep appending lines in a loop as below 
  `for in in {0..1000000};do echo "hello world" >>file1 ;done`
4. Check the tier.log and it can be seen that database lock messages would be thrown

Comment 2 Nag Pavan Chilakam 2015-07-07 09:11:26 UTC

sos reports at /home/repo/sosreports/bug.1240569

Comment 3 Nag Pavan Chilakam 2015-07-07 09:12:47 UTC

sos report server rhsqe-repo.lab.eng.blr.redhat.com:/home/repo/sosreports/bug.1240569

Comment 4 Joseph Elwin Fernandes 2015-07-07 09:20:41 UTC

The reason this happens is in WAL mode opening a new db connection is an expensive operation as it will try to a lock on the WAL file(even though for a short time). The migration process opens a new connection per brick per promotion/demotion cycle. This is bad scheme.

Solution:
1) Create a new connection only in tier_init per brick and have the connection alive and use it for every promotion/demotion 
2) Enable pooling (Pooling=True in the connection string) when the connection is established, there isn't any locking of WAL file, because existing connections are reused internal to sqlite.

Comment 5 Joseph Elwin Fernandes 2015-07-07 09:24:11 UTC

http://dev.yorhel.nl/doc/sqlaccess

Comment 6 Nag Pavan Chilakam 2015-07-07 09:31:07 UTC

raised an upstream bug  1240577

Comment 7 Joseph Elwin Fernandes 2015-07-10 09:09:14 UTC

On further analysis found that "Database locked" error was appearing not at init of db. It was happen in the query path of tier migrator.

he message "E [MSGID: 101106] [gfdb_sqlite3.c:779:gf_sqlite3_find_unchanged_for_time] 0-sqlite3: Failed preparing statment select GF_FILE_TB.GF_ID, (select group_concat( GF_PID || ',' || FNAME || ',' || FPATH || ',' || W_DEL_FLAG ||',' || LINK_UPDATE , '::') from GF_FLINK_TB where GF_FILE_TB.GF_ID = GF_FLINK_TB.GF_ID)  from GF_FILE_TB where ((GF_FILE_TB.W_SEC * 1000000 + GF_FILE_TB.W_MSEC) <= ? ) OR ((GF_FILE_TB.W_READ_SEC * 1000000 + GF_FILE_TB.W_READ_MSEC) <= ?) : database is locked" repeated 2 times between [2015-07-10 08:55:40.438914] and [2015-07-10 08:57:30.559193]

Also another observation was that the database was not operating in the WAL journal mode. It was running in the regular journal mode! In the regular journal mode has issues with concurrent access.

[root@rhgs1 .glusterfs]# ls
00  6e  b7  changelogs  d1.db  d1.db-journal  d5  health_check  indices  landfill
[root@rhgs1 .glusterfs]# pwd
/home/disk/d1/.glusterfs
[root@rhgs1 .glusterfs]#

Looks like sqlite 3.6.20 (which comes with RHEL 6.7) doesnot have WAL journal mode 
[root@rhgs1 ~]# sqlite3 joseph.db;
SQLite version 3.6.20
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite> PRAGMA journal_mode=WAL;
delete
sqlite> 

This problem doesnt happen on latest sqlite 3.8, tested on fedora21[root@fedora1 ~]# sqlite3 joseph.db
SQLite version 3.8.10.2 2015-05-20 18:17:19
Enter ".help" for usage hints.
sqlite> PRAGMA journal_mode=WAL;
wal
sqlite> 

We would require a higher version of sqlite3 on RHGS 3.x, which supports WAL journal mode.