Bug 765585 (GLUSTER-3853)

Summary: cyrus-imapd unable to reliably start/function on top of GlusterFS volume
Product: [Community] GlusterFS
Reporter: Erik <erikmjacobs>
Component: core
Assignee: Amar Tumballi <amarts>
Status: CLOSED WORKSFORME
Severity: medium
Priority: high
Version: 3.2.5
CC: ejacobs, erikmjacobs, gluster-bugs, vraman
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Last Closed: 2012-11-21 09:31:09 UTC
Attachments:
/var/log/maillog showing cyrus errors when starting from clean (no files) state (initial cyrus start)

Description Erik 2011-12-08 01:54:13 UTC
I was hoping to back cyrus-imapd with GlusterFS in order to build a scalable mail system.  Unfortunately, cyrus-imapd is unable to reliably start or function on top of a GlusterFS volume.

Environment overview:
CentOS 6, kernel 2.6.32-71.29.1.el6.x86_64
Two virtual machines on KVM, each with 1024 MB RAM and 1 vCPU.
Each virtual machine has a 10 GB VirtIO raw virtual disk that is used solely for Gluster.

Nodes are called "gluster1" and "gluster2", and there is a dedicated virtual network.  

All hosts/IPs are in /etc/hosts:
192.168.122.11 gluster1.forklocal gluster1
192.168.100.11 gluster1-storage.forklocal gluster1-storage
192.168.122.12 gluster2.forklocal gluster2
192.168.100.12 gluster2-storage.forklocal gluster2-storage

The 10 GB disk is formatted with ext4 and mounted at /mnt/glustervol.
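
For completeness, the format-and-mount steps would have looked roughly like this (the device name /dev/vdb1 is taken from the mount output further below):

[root@gluster1 ~]# mkfs.ext4 /dev/vdb1
[root@gluster1 ~]# mkdir -p /mnt/glustervol
[root@gluster1 ~]# mount /dev/vdb1 /mnt/glustervol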

The Gluster volume, mailvol, was created with the following command:
gluster volume create mailvol replica 2 transport tcp gluster1-storage.forklocal:/mnt/glustervol gluster2-storage.forklocal:/mnt/glustervol
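
(The volume must also be started before it can be mounted; the "Status: Started" line below reflects that step, shown here for completeness:)

[root@gluster1 ~]# gluster volume start mailvol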

[root@gluster1 ~]# gluster volume info

Volume Name: mailvol
Type: Replicate
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: gluster1-storage.forklocal:/mnt/glustervol
Brick2: gluster2-storage.forklocal:/mnt/glustervol

The GlusterFS volume is then mounted locally on gluster1.forklocal at /var/lib/imap, which is where Cyrus wants to do its business:
[root@gluster1 ~]# mount -t glusterfs gluster2-storage.forklocal:/mailvol /var/lib/imap
[root@gluster1 ~]# mount
/dev/mapper/VolGroup-lv_root on / type ext4 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
tmpfs on /dev/shm type tmpfs (rw,rootcontext="system_u:object_r:tmpfs_t:s0")
/dev/vda1 on /boot type ext4 (rw)
/dev/vdb1 on /mnt/glustervol type ext4 (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
gluster2-storage.forklocal:/mailvol on /var/lib/imap type fuse.glusterfs (rw,allow_other,default_permissions,max_read=131072)
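
To make this client mount persist across reboots, the usual approach would be an /etc/fstab entry along these lines (an assumption, not part of the original report):

gluster2-storage.forklocal:/mailvol /var/lib/imap glusterfs defaults,_netdev 0 0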

Starting the cyrus-imapd service is where things get wacky.  I wish I could provide a solid description of what goes wrong, but I can't.  I've tried running the cyrus startup manually under strace -f, but it doesn't reveal anything of interest.
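
For reference, the strace invocation was along these lines (the init script path is the stock CentOS 6 one; the output file name is illustrative):

[root@gluster1 ~]# strace -f -o /tmp/cyrus-start.strace /etc/init.d/cyrus-imapd start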

* Sometimes cyrus-imapd will start cleanly.  
* If it does manage to start cleanly, sometimes one of the database services hangs or eats tons of CPU.
* If that doesn't happen, I then usually can't make a connection to cyrus with cyradm, the administration tool.
* Sometimes cyrus will complain bitterly about not being able to open files.
* One time I got some strange memory cache / out of memory / unable to write issues.
* Sometimes, even when everything starts cleanly but I'm unable to connect, it won't stop cleanly and processes need to be killed.
* Sometimes some of the spawned processes can't be killed even with -9 (SIGKILL) or SIGSEGV, and the machine must be rebooted.

If I unmount the glusterfs from /var/lib/imap and start the service on the local disk, everything works perfectly.  If I then remount and try again, the issues reappear.
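
That test amounts to something like the following (the service name is the stock CentOS 6 one):

[root@gluster1 ~]# service cyrus-imapd stop
[root@gluster1 ~]# umount /var/lib/imap
[root@gluster1 ~]# service cyrus-imapd start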

I am happy to provide any output or other information people are interested in seeing.  But at this time, the best summary I can give is that Cyrus is unusable on top of GlusterFS in this particular configuration, since it will not reliably start or function.

Comment 1 Amar Tumballi 2011-12-08 07:50:52 UTC
Hi Erik,

Thanks for the bug report:

* Next time the hang happens, try doing 'kill -USR1 <PID of process>' and get the statedump from /tmp/glusterdump.<pid> (a minimal sketch of the sequence follows this list).

* What is the behavior if you mount the volume with Gluster's NFS server?

* Did you happen to check the log files /var/log/glusterfs/* for any abnormalities?

* Can you try the http://bits.gluster.com/pub/gluster/glusterfs/3.3.0qa15/x86_64/ release once and see the behavior?
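
For reference, a minimal statedump sequence would look something like this (the PID-discovery command is illustrative; the dump path follows the convention above):

[root@gluster1 ~]# ps -C glusterfs -o pid=    # find the glusterfs FUSE client PID
[root@gluster1 ~]# kill -USR1 <PID>           # requests a statedump; it does not terminate the process
[root@gluster1 ~]# ls /tmp/glusterdump.<PID>

To follow the client log while reproducing (the log file name is derived from the mount point path):

[root@gluster1 ~]# tail -f /var/log/glusterfs/var-lib-imap.log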

Regards,
Amar

Comment 2 Erik 2011-12-08 12:39:15 UTC
(In reply to comment #1)
> Hi Erik,
> 
> Thanks for the bug report:
> 
> * Next time the hang happens, try doing 'kill -USR1 <PID of process>' and
> get the statedump from /tmp/glusterdump.<pid>

Are you asking me to send USR1 to one of the multi-threaded cyrus processes and then look at /tmp/glusterdump.<pid>, or to one of the glusterfs processes?  Gluster itself never appears to hang; only bits and pieces of Cyrus do.

> 
> * What is the behavior if you mount the volume with Gluster's NFS server?

I have not tried, but I will attempt to.

> 
> * Did you happen to check the log files /var/log/glusterfs/* for any
> abnormalities?

I would not know what I am looking for, but I will attach some logs shortly.  I'll mount with the debug option.
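
For reference, a debug mount would look something like this (the log-level mount option spelling is an assumption for this release line):

[root@gluster1 ~]# mount -t glusterfs -o log-level=DEBUG gluster2-storage.forklocal:/mailvol /var/lib/imap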

> 
> * Can you try the http://bits.gluster.com/pub/gluster/glusterfs/3.3.0qa15/x86_64/
> release once and see the behavior?

I will attempt to test/log with what I currently have (JoeJulian's 3.2.5-1.el6.joe packages) and with what you suggest (the bits.gluster build).

Comment 3 Erik 2011-12-08 16:34:58 UTC
Created attachment 728


I removed all of cyrus' files and started cyrus.  It immediately reported filesystem/database access errors.

Still using 3.2.5-1.el6.joe

Comment 4 Erik 2011-12-08 16:40:43 UTC
http://www.erikjacobs.com/stuff/var-lib-imap.log

Log file from mounting in debug mode, using 3.2.5-1.el6.joe.

Comment 5 Erik 2011-12-08 16:45:07 UTC
Trying to mount with NFS never completes:
[root@gluster1 ~]# mount -t nfs gluster2-storage.forklocal:/mailvol /var/lib/imap

I had to ^C out of it.  Top on both nodes didn't really show any activity, so I'm not sure what was going on.

This was with 3.2.5-1.el6.joe

Comment 6 Erik 2011-12-08 16:48:02 UTC
http://bits.gluster.com/pub/gluster/glusterfs/3.3.0qa15/x86_64/

These bits require rsync >= 3.0.7, which is not available in CentOS 6/EPEL 6, so I am unable to test.

Comment 7 Amar Tumballi 2012-04-17 18:53:24 UTC
Erik, sorry for the delay in getting this sorted out. We have since released 3.3.0beta3 to the community with some enhancements. If you have free cycles, please test this version.

(In reply to comment #5)
> Trying to mount with NFS never completes:
> [root@gluster1 ~]# mount -t nfs gluster2-storage.forklocal:/mailvol
> /var/lib/imap
> 
> I had to ^C out of it.  Top on both nodes didn't really show any activity, so
> I'm not sure what was going on.
> 
> This was with 3.2.5-1.el6.joe

This can happen because your client tries to mount NFS version 4 by default, which GlusterFS does not support. Please try the option below:

[root@gluster1 ~]# mount -t nfs -o vers=3,nolock gluster2-storage.forklocal:/mailvol /var/lib/imap
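
If the mount still hangs after that, it may also be worth confirming that the Gluster NFS server is exporting the volume at all (a generic check, not from the original exchange):

[root@gluster1 ~]# showmount -e gluster2-storage.forklocal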

Let us know if all this fixes the issues for you.

-Amar

Comment 8 Amar Tumballi 2012-10-31 10:02:32 UTC
Erik, I wanted to check whether this is still an issue after the 3.3.0 release of GlusterFS. If things are working, I would like to close the bug.

Comment 9 Erik M Jacobs 2012-10-31 17:24:00 UTC
Amar, I do not have this test environment readily accessible, and I will not have time to test for many months. The project never materialized, so it fell by the wayside. You can go ahead and close the ticket; if I ever come back to it, I'll open a new one.

Comment 10 Amar Tumballi 2012-11-21 09:31:09 UTC
As per comment #9, closing the ticket.