Bug 1031614

Summary: Client machine gets rebooted when gluster volume is unmounted
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Shruti Sampat <ssampat>
Component: glusterfs
Assignee: Raghavendra Talur <rtalur>
Status: CLOSED DUPLICATE
QA Contact: Sudhir D <sdharane>
Severity: high
Docs Contact:
Priority: high
Version: 2.1
CC: barumuga, dtsang, knarra, mmahoney, pprakash, rtalur, sdharane, vagarwal, vbellur
Target Milestone: ---
Keywords: ZStream
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-12-23 09:22:12 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description | Flags
/var/log/messages | none
sosreport from the client machine | none

Description Shruti Sampat 2013-11-18 11:51:22 UTC
Description of problem:
-------------------------------------------

When a gluster volume is unmounted from the client, the client reboots.

The following appears in /var/log/messages around the time the machine rebooted:

Nov 18 17:05:18 rhs rsyslogd: the last error occured in /etc/rsyslog.d/gluster.conf, line 17:"$ModLoad mmcount"
Nov 18 17:05:18 rhs rsyslogd-3003: invalid or yet-unknown config file command - have you forgotten to load a module? [try http://www.rsyslog.com/e/3003 ]
Nov 18 17:05:18 rhs rsyslogd: the last error occured in /etc/rsyslog.d/gluster.conf, line 18:"$mmcountKey gf_code # start counting value of gf_code"
Nov 18 17:05:18 rhs rsyslogd: the last error occured in /etc/rsyslog.d/gluster.conf, line 28:"if $app-name contains 'gluster' then :mmcount:"
Nov 18 17:05:18 rhs rsyslogd: warning: selector line without actions will be discarded
Nov 18 17:05:18 rhs rsyslogd: the last error occured in /etc/rsyslog.conf, line 31:"$IncludeConfig /etc/rsyslog.d/*.conf"
Nov 18 17:05:18 rhs rsyslogd-2124: CONFIG ERROR: could not interpret master config file '/etc/rsyslog.conf'. [try http://www.rsyslog.com/e/2124 ]


Version-Release number of selected component (if applicable):
On the client - 

[root@rhs ~]# rpm -qa|grep glusterfs
glusterfs-fuse-3.4.0.42.1u2rhs-1.el6rhs.x86_64
glusterfs-libs-3.4.0.42.1u2rhs-1.el6rhs.x86_64
glusterfs-3.4.0.42.1u2rhs-1.el6rhs.x86_64

How reproducible:
Frequently

Steps to Reproduce:
1. FUSE-mounted a gluster volume on the client.
2. Created data on the mount point and deleted some of it, repeating this multiple times.
3. Unmounted the volume.

Actual results:
When the command to unmount the volume was run, the client got rebooted.

Expected results:
Unmounting the volume should not cause the client to reboot.

Additional info:

Comment 1 Shruti Sampat 2013-11-18 11:53:50 UTC
Created attachment 825564 [details]
/var/log/messages

Comment 2 Shruti Sampat 2013-11-18 11:58:00 UTC
Created attachment 825566 [details]
sosreport from the client machine

Comment 5 Bala.FA 2013-12-03 05:32:33 UTC
Regarding the messages from /var/log/messages: bz#1015630 already covers them, and these rsyslog warnings have nothing to do with this bug.

Comment 6 Bala.FA 2013-12-03 08:44:00 UTC
Why is bz#1015630 listed as a dependency for fixing this bug?

Comment 7 Raghavendra Talur 2013-12-03 09:04:27 UTC
Sorry, the messages in the description pointed to bug#1015630.
However, they are just a side effect of the reboot.

When the client machine reboots, we simply see the rsyslog messages in the log.
Removing the depends flag.

Comment 8 Raghavendra Talur 2013-12-03 11:17:13 UTC
Logged in to the machine and checked for crash logs, as nothing useful was available in /var/log/messages.

Found a vm-core and vm-dmesg.txt related to the same reboot.

Here is a snippet of the dmesg:

<4>------------[ cut here ]------------
<2>kernel BUG at fs/dcache.c:670!
<4>invalid opcode: 0000 [#1] SMP 
<4>last sysfs file: /sys/devices/pci0000:00/0000:00:03.0/virtio0/net/eth0/broadcast
<4>CPU 1 
<4>Modules linked in: bridge stp llc fuse autofs4 sunrpc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 microcode virtio_balloon snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc virtio_net i2c_piix4 i2c_core ext4 jbd2 mbcache virtio_blk virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib]
<4>
<4>Pid: 17008, comm: umount Not tainted 2.6.32-358.18.1.el6.x86_64 #1 Red Hat KVM
<4>RIP: 0010:[<ffffffff8119acd8>]  [<ffffffff8119acd8>] shrink_dcache_for_umount_subtree+0x2a8/0x2b0
<4>RSP: 0018:ffff8801127ebdb8  EFLAGS: 00010292
<4>RAX: 0000000000000055 RBX: ffff8800c79043c0 RCX: 0000000000000000
<4>RDX: 0000000000000000 RSI: 0000000000000046 RDI: 0000000000000246
<4>RBP: ffff8801127ebdf8 R08: 0000000000000000 R09: ffffffff8163fde0
<4>R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000005
<4>R13: ffffffff81a83fc0 R14: ffff8801017b7d80 R15: ffff8800c7904420
<4>FS:  00007fe0e7a6e740(0000) GS:ffff880028300000(0000) knlGS:0000000000000000
<4>CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>CR2: 00007fe0e70d73a0 CR3: 0000000111d26000 CR4: 00000000000006e0
<4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
<4>Process umount (pid: 17008, threadinfo ffff8801127ea000, task ffff880111449500)
<4>Stack:
<4> ffff880111cd0e70 0000000000000000 ffff8801127ebdd8 ffff880111cd0c00
<4><d> ffffffffa01fc200 ffff8801124cc338 ffff880111cd0c00 ffff880113a011c0
<4><d> ffff8801127ebe18 ffffffff8119ad16 0000000000000000 ffff880111cd0c00
<4>Call Trace:
<4> [<ffffffff8119ad16>] shrink_dcache_for_umount+0x36/0x60
<4> [<ffffffff811835ff>] generic_shutdown_super+0x1f/0xe0
<4> [<ffffffff81183726>] kill_anon_super+0x16/0x60
<4> [<ffffffffa01f95d2>] fuse_kill_sb_anon+0x52/0x60 [fuse]
<4> [<ffffffff81183ec7>] deactivate_super+0x57/0x80
<4> [<ffffffff811a215f>] mntput_no_expire+0xbf/0x110
<4> [<ffffffff811a2bcb>] sys_umount+0x7b/0x3a0
<4> [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
<4>Code: 50 30 4c 8b 0a 31 d2 48 85 f6 74 04 48 8b 56 40 48 05 70 02 00 00 48 89 de 48 c7 c7 f0 3a 7b 81 48 89 04 24 31 c0 e8 08 2e 37 00 <0f> 0b eb fe 0f 0b eb fe 55 48 89 e5 53 48 83 ec 08 0f 1f 44 00 
<1>RIP  [<ffffffff8119acd8>] shrink_dcache_for_umount_subtree+0x2a8/0x2b0
<4> RSP <ffff8801127ebdb8>
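
For readers less familiar with the oops format: each frame above is printed as symbol+offset/size, both in hexadecimal, optionally followed by the owning module in square brackets. A minimal Python sketch for splitting such a line into its parts is shown here; the helper name parse_frame is made up for illustration and is not part of any kernel or crash tooling.

```python
import re

# Matches the "symbol+0xOFFSET/0xSIZE" notation used in kernel oops output,
# with an optional trailing "[module]" tag.
FRAME_RE = re.compile(
    r"(?P<symbol>[A-Za-z_][\w.]*)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def parse_frame(line):
    """Return (symbol, offset, size, module) for one oops line, or None."""
    m = FRAME_RE.search(line)
    if m is None:
        return None
    return (
        m.group("symbol"),
        int(m.group("offset"), 16),
        int(m.group("size"), 16),
        m.group("module"),  # None when the symbol is not in a loadable module
    )


# The RIP line from the dmesg snippet in this comment.
rip = "<1>RIP  [<ffffffff8119acd8>] shrink_dcache_for_umount_subtree+0x2a8/0x2b0"
symbol, offset, size, module = parse_frame(rip)
print(symbol, hex(offset), hex(size), module)
```

Here the offset (0x2a8) is close to the function size (0x2b0), so the faulting instruction sits near the end of shrink_dcache_for_umount_subtree, which matches the BUG reported at fs/dcache.c:670.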



I have asked Shruti (the reporter of this bug) to file a bug against the kernel with the above information and to make the current bug dependent on it.


A similar bug was filed against the same kernel version for a crash when a cifs mount is unmounted. The patch for that went into the cifs module only.
Hence it is possible that the bug still exists for fuse mounts.
Here is the bug link: https://bugzilla.redhat.com/show_bug.cgi?id=917890

Asking PM to remove the U2 tag from this bug, as we can't do much here.

Comment 9 Vivek Agarwal 2013-12-03 11:32:46 UTC
This bug occurs very inconsistently and is a bug in the RHEL 6.4 kernel. Moving it out of corbett.

Comment 11 Raghavendra Talur 2013-12-23 09:22:12 UTC

*** This bug has been marked as a duplicate of bug 981741 ***

Comment 12 Raghavendra Talur 2013-12-23 09:24:05 UTC
Verified that the backtraces found in both vm-cores are the same.
We have not seen the crash in any kernel version later than the "fixed in" kernel version.
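
A check of this kind amounts to confirming that both cores show the same sequence of functions, ignoring the addresses. A rough Python sketch follows, assuming each trace is available as vm-dmesg text. The helper name call_trace is illustrative, and the second trace is a stand-in built from this bug's own frames with altered addresses, since the dmesg from bug 981741 is not reproduced in this report.

```python
import re

# A frame line looks like:
#   <4> [<ffffffff8119ad16>] shrink_dcache_for_umount+0x36/0x60
# An optional "? " marks a frame the kernel considers unreliable.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?:\?\s+)?(?P<symbol>[A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def call_trace(dmesg_text):
    """Return the function names listed after 'Call Trace:', in order."""
    names = []
    in_trace = False
    for line in dmesg_text.splitlines():
        if "Call Trace:" in line:
            in_trace = True
            continue
        if in_trace:
            m = FRAME_RE.search(line)
            if m is None:
                break  # the first non-frame line ends the trace
            names.append(m.group("symbol"))
    return names


# First frames of the trace from comment 8.
this_bug = """<4>Call Trace:
<4> [<ffffffff8119ad16>] shrink_dcache_for_umount+0x36/0x60
<4> [<ffffffff811835ff>] generic_shutdown_super+0x1f/0xe0
<4> [<ffffffff81183726>] kill_anon_super+0x16/0x60
<4> [<ffffffffa01f95d2>] fuse_kill_sb_anon+0x52/0x60 [fuse]
<4>Code: 50 30 4c
"""

# Hypothetical stand-in for the other core: same frames, different addresses.
other = this_bug.replace("ffffffff81", "ffffffff82")

print(call_trace(this_bug) == call_trace(other))
```

Comparing only symbol names keeps the check independent of load addresses, which can differ between machines and boots.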