Bug 1273295 - [Tier]: glusterfs crashed --volfile-id rebalance/tiervolume
Summary: [Tier]: glusterfs crashed --volfile-id rebalance/tiervolume
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: GlusterFS
Classification: Community
Component: tiering
Version: 3.7.5
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Dan Lambright
QA Contact: bugs@gluster.org
URL:
Whiteboard:
Depends On:
Blocks: 1273347
Reported: 2015-10-20 07:26 UTC by Rahul Hinduja
Modified: 2016-08-10 05:26 UTC (History)
2 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 1273347 (view as bug list)
Environment:
Last Closed: 2016-08-10 05:26:50 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Rahul Hinduja 2015-10-20 07:26:48 UTC
Description of problem:
=======================

A few cores were reported on the longevity setup, all with the same backtrace:

(gdb) bt
#0  0x00007f0b53de58b1 in __strlen_sse2_pminub () from /lib64/libc.so.6
#1  0x00007f0b46dc2481 in gf_sql_query_function (prep_stmt=0x7f0b203c9348, query_callback=query_callback@entry=0x7f0b473ee150 <tier_gf_query_callback>, _query_cbk_args=_query_cbk_args@entry=0x7f0b28ff8e90)
    at gfdb_sqlite3_helper.c:1157
#2  0x00007f0b46dc3c00 in gf_sqlite3_find_recently_changed_files (db_conn=0x7f0b203d21e0, query_callback=0x7f0b473ee150 <tier_gf_query_callback>, query_cbk_args=0x7f0b28ff8e90, from_time=0x7f0b28ff8e00)
    at gfdb_sqlite3.c:728
#3  0x00007f0b46dbdef1 in find_recently_changed_files (_conn_node=<optimized out>, query_callback=0x7f0b473ee150 <tier_gf_query_callback>, _query_cbk_args=0x7f0b28ff8e90, from_time=0x7f0b28ff8e00)
    at gfdb_data_store.c:551
#4  0x00007f0b473ed179 in tier_process_self_query (local_brick=local_brick@entry=0x7f0b140033f0, args=args@entry=0x7f0b28ff8e10) at tier.c:682
#5  0x00007f0b473edd23 in tier_process_brick (args=0x7f0b28ff8e10, local_brick=0x7f0b140033f0) at tier.c:953
#6  tier_build_migration_qfile (args=args@entry=0x7f0b3cfdec60, query_cbk_args=query_cbk_args@entry=0x7f0b28ff8e90, is_promotion=is_promotion@entry=_gf_true) at tier.c:1028
#7  0x00007f0b473f0822 in tier_promote (args=0x7f0b3cfdec60) at tier.c:1128
#8  0x00007f0b54432dc5 in start_thread () from /lib64/libpthread.so.0
#9  0x00007f0b53d791cd in clone () from /lib64/libc.so.6
(gdb) f 2
#2  0x00007f0b46dc3c00 in gf_sqlite3_find_recently_changed_files (db_conn=0x7f0b203d21e0, query_callback=0x7f0b473ee150 <tier_gf_query_callback>, query_cbk_args=0x7f0b28ff8e90, from_time=0x7f0b28ff8e00)
    at gfdb_sqlite3.c:728
728	        ret = gf_sql_query_function (prep_stmt, query_callback, query_cbk_args);
(gdb) f 1
#1  0x00007f0b46dc2481 in gf_sql_query_function (prep_stmt=0x7f0b203c9348, query_callback=query_callback@entry=0x7f0b473ee150 <tier_gf_query_callback>, _query_cbk_args=_query_cbk_args@entry=0x7f0b28ff8e90)
    at gfdb_sqlite3_helper.c:1157
1157	                                                                (text_column);
(gdb) l
1152	                        /* Get link string. Do shallow copy here
1153	                         * query_callback function should do a
1154	                         * deep copy and then do operations on this field*/
1155	                        gfdb_query_record->_link_info_str = text_column;
1156	                        gfdb_query_record->link_info_size = strlen
1157	                                                                (text_column);
1158	
1159	                        /* Call the call back function provided*/
1160	                        ret = query_callback (gfdb_query_record,
1161	                                                        _query_cbk_args);
(gdb) p text_column
$1 = <optimized out>
(gdb) p *text_column
value has been optimized out
(gdb) f 0
#0  0x00007f0b53de58b1 in __strlen_sse2_pminub () from /lib64/libc.so.6
(gdb) f 1
#1  0x00007f0b46dc2481 in gf_sql_query_function (prep_stmt=0x7f0b203c9348, query_callback=query_callback@entry=0x7f0b473ee150 <tier_gf_query_callback>, _query_cbk_args=_query_cbk_args@entry=0x7f0b28ff8e90)
    at gfdb_sqlite3_helper.c:1157
1157	                                                                (text_column);
(gdb) p gfdb_query_record
$2 = (gfdb_query_record_t *) 0x0
(gdb) p *gfdb_query_record
Cannot access memory at address 0x0
(gdb) q
[root@dhcp37-162 glusterfs]#

[root@dhcp37-162 glusterfs]# file /core.23681
/core.23681: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style, from '/usr/sbin/glusterfs -s localhost --volfile-id rebalance/tiervolume --xlator-opt'
[root@dhcp37-162 glusterfs]# 

Version-Release number of selected component (if applicable):
=============================================================

glusterfs-3.7.5-0.19.git0f5c3e8.el7.centos.x86_64

Setup Details:
==============
1. Created a 12-node cluster
2. Created a tiered volume with the hot tier as 6 x 2 and the cold tier as 2 x (6 + 2) = 16 bricks
3. Fuse-mounted the volume on 3 clients: RHEL 7.2, RHEL 7.1, and RHEL 6.7
4. Started creating data from each client:

Client 1:
=========
[root@dj ~]# crefi --multi -n 10 -b 10 -d 10 --max=1024k --min=5k --random -T 5 -t text -I 5 --fop=create /mnt/fuse/

Client 2:
=========
[root@mia ~]# cd /mnt/fuse/
[root@mia fuse]# for i in {1..10}; do cp -rf /etc etc.$i ; sleep 100 ; done

Client 3:
=========
[root@wingo fuse]# for i in {1..999}; do dd if=/dev/zero of=dd.$i bs=1M count=1 ; sleep 10 ; done

5. After a while, the data creation from client 1 and client 2 completed while the data creation from client 3 was still in progress

6. At this point, the only data creation was client 3 writing 1 file every 10 seconds

7. Disabled and re-enabled quota

8. The system was idle for a couple of days between step 6 and step 7

Actual results:
===============

Cores were generated after step 6.

Comment 2 Nithya Balachandran 2016-08-10 05:26:50 UTC
This should no longer happen, as the code that caused the issue has been replaced by the patch http://review.gluster.org/#/c/12535/.


Closing this BZ (WontFix)

