Bug 915741 - [RHEV-RHS] Plugging out a disk hosting one of the RHS bricks brings down the storagedomain
Summary: [RHEV-RHS] Plugging out a disk hosting one of the RHS bricks brings down the storagedomain
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: glusterfs
Version: 2.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: low
Target Milestone: ---
Target Release: RHGS 2.1.2
Assignee: Bug Updates Notification Mailing List
QA Contact: SATHEESARAN
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2013-02-26 12:32 UTC by Anush Shetty
Modified: 2015-05-15 18:36 UTC (History)
8 users

Fixed In Version: glusterfs-3.4.0.44.1u2rhs-1
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
virt rhev integration
Last Closed: 2014-02-25 07:24:25 UTC
Embargoed:


Attachments (Terms of Use)
Screenshot of the kernel panic (22.69 KB, image/png)
2013-02-26 12:57 UTC, Anush Shetty


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2014:0208 0 normal SHIPPED_LIVE Red Hat Storage 2.1 enhancement and bug fix update #2 2014-02-25 12:20:30 UTC

Description Anush Shetty 2013-02-26 12:32:31 UTC
Description of problem: 

We have 2 datacenters. 

1) NFS datacenter
2) Posixfs datacenter

On the NFS datacenter, we created 4 RHS VMs. Each VM had an additional thin-provisioned virtual disk of 300G, on which 3 LVM volumes of 90G each were created and mounted. A 6x2 Distributed-Replicate RHS volume was created across these VMs with the LVM volumes as RHS bricks.
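The brick layout described above can be sketched with standard LVM commands. This is a hedged illustration only: the device name, VG/LV names, mount points, and mkfs options are assumptions, not the exact commands used in this setup.

```shell
#!/bin/sh
# Hypothetical sketch: carve one 300G virtual disk (/dev/vdb) into
# three 90G logical volumes, format them as XFS, and mount them as
# brick directories. All names below are illustrative.
pvcreate /dev/vdb
vgcreate rhs_vg /dev/vdb
for n in 1 2 3; do
    lvcreate -L 90G -n brick${n}_lv rhs_vg
    mkfs.xfs /dev/rhs_vg/brick${n}_lv
    mkdir -p /brick${n}
    mount /dev/rhs_vg/brick${n}_lv /brick${n}
done
```

Each mounted logical volume (e.g. /brick1) then hosts a brick subdirectory such as /brick1/s1, matching the brick paths shown in the volume info below.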

In the posixfs datacenter, we added the 6x2 Distributed Replicate RHS volume as a storagedomain. This storagedomain was used as vm-store. 

Plugging out a disk in any one of these VMs brought down the entire posixfs storagedomain.


Version-Release number of selected component (if applicable):

RHEV-H: rhev-hypervisor6-6.4-20130214.0.auto191.el6.iso
RHS: RHS-2.0-20130219.3-RHS-x86_64-DVD1.iso	


How reproducible: Consistently.


Steps to Reproduce:
1. Create VMs over posixfs storagedomain
2. Plug out the disk in one of the RHS brick VMs
Actual results:

Storage domain is down.

Expected results:

Storage domain should be up since its replica pair is up. 

Additional info:

RHS Volume info:

# gluster volume info vm-store
 
Volume Name: vm-store
Type: Distributed-Replicate
Volume ID: 3a6e5c4e-f684-4559-a4ec-e6eeb5ef2b17
Status: Started
Number of Bricks: 6 x 2 = 12
Transport-type: tcp
Bricks:
Brick1: 10.70.37.45:/brick1/s1
Brick2: 10.70.37.79:/brick1/s2
Brick3: 10.70.37.45:/brick2/s3
Brick4: 10.70.37.79:/brick2/s4
Brick5: 10.70.37.45:/brick3/s5
Brick6: 10.70.37.79:/brick3/s6
Brick7: 10.70.37.81:/brick1/s7
Brick8: 10.70.37.82:/brick1/s8
Brick9: 10.70.37.81:/brick2/s9
Brick10: 10.70.37.82:/brick2/s10
Brick11: 10.70.37.81:/brick3/s11
Brick12: 10.70.37.82:/brick3/s12
Options Reconfigured:
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: on
storage.owner-uid: 36
storage.owner-gid: 36


Recovery: We were able to recover the posixfs storagedomain by rebooting the hypervisor.

Comment 1 Anush Shetty 2013-02-26 12:42:49 UTC
There is an additional issue of the RHS VMs hitting a kernel panic when the disk is plugged out. This is not the root disk of the VM but the additional disk.

Also, all the disks in the VMs have to be plugged back in for the storagedomain to come up and for the VMs to restart.

Comment 4 Anush Shetty 2013-02-26 12:57:19 UTC
Created attachment 702849 [details]
Screenshot of the kernel panic

Comment 5 Vijay Bellur 2013-02-27 09:11:36 UTC
Brian, can you please look into this?

Comment 6 Brian Foster 2013-02-27 13:29:49 UTC
Hi Anush,

It probably isn't a factor, but just to be sure... is the NFS data center a native NFS server with local storage on some remote node, or is it gluster-based NFS (on what kind of volume)? Also, what type of virtual disks are fed into the RHS VMs for the data center volume?

I suspect the panic is a factor here. This converts a lost disk to a lost server. Looking at the vm-store volume, it appears we should survive losing a server, so perhaps there are multiple issues here (i.e., losing the server on a disk pull, losing the data center on a server loss).

Comment 8 Brian Foster 2013-02-27 17:19:29 UTC
I just ran a quick test on a single RHS VM to deactivate a mounted lvm volume (e.g., brick) and didn't reproduce a panic. Could you also provide the kernel version of the RHS nodes to make sure I'm testing the correct bits? This is on 2.6.32-220.31.1.el6.x86_64.

Comment 9 Brian Foster 2013-02-27 17:29:43 UTC
Never mind, the panic actually does occur when I reattach the drive. This particular behavior looks like a duplicate of bug 867280, which I believe has a fix. We may need the fix in 6.2.z to address this problem (though the use case is not clear to me, as the disk should not disappear from a guest without user interaction).

Comment 10 Brian Foster 2013-03-01 15:11:14 UTC
Ok, I configured a somewhat similar setup:

- RHEV datacenter on native RHS. This is a posixfs datacenter rather than NFS, since I already had it configured. I suspect this is not a factor, as this remains static during the test (but I could be wrong of course).
- 4 RHS VMs allocated on the native datacenter, 300GB secondary VDs per VM.
- Created and exposed a 6x2 dht+rep volume across the 4 RHS VMs as a new "virtual" datacenter.
- Allocated a rhel6.3 VM on the virtual datacenter.

... and have run some basic tests based on the virtblk panic:

- Crash a host not serving the active rhel6.3 VM.

- Crash a non-domain host serving one leg of the replica set for the active VM (while running dbench in the VM).
- Recover the non-domain host while running dbench in the VM.
  (Repeating this a second time did not result in the panic, but hung the RHS host, in turn leading to hung task messages in the guest until I powered off the RHS VM).

- Crash the domain host (e.g., host with IP of the mount point fed into RHEVM) while running dbench in the VM. Reproduced a latency spike for a period of time (60s).
- Recover the domain host while running dbench in the VM.

- Bring down the rhev host associated with the virt domain, bring it back up.

At no point during these tests did I lose the storage domain. The domain also returns without issue when I cycle the associated rhev host in and out of maintenance mode.

Comment 11 Scott Haines 2013-09-23 19:46:36 UTC
Targeting for 2.1.z (Big Bend) U1.

Comment 12 Amar Tumballi 2013-11-26 08:40:11 UTC
Considering it's been 9 months since we last tested this, can we re-test it?

Comment 13 SATHEESARAN 2013-12-02 13:12:55 UTC
Assigning the NEEDINFO on Anush to myself.

Let me try this case and furnish the result.

Comment 14 Gowrishankar Rajaiyan 2013-12-10 16:30:59 UTC
Clearing needinfo flag since this bug is ON_QA for verification.

Comment 15 Gowrishankar Rajaiyan 2013-12-13 18:06:52 UTC
Per integration bug triage 12/13:

Action item if this fails QE:

1. Remove "Corbett" from Internal Whiteboard.
2. Remove "zstream" from Keyword
3. Remove 2.1.z flag
4. set rhs-future?

Comment 16 SATHEESARAN 2014-01-18 08:18:07 UTC
Tested with glusterfs-3.4.0.57rhs-1.el6rhs
ISO - RHSS-2.1-20140116.2
RHEV - IS 32.2 (3.3.0-0.45.el6ev)

Tested the scenarios as suggested by Brian in comment 10

With RHSS 2.1 Update 2, client-side quorum is enabled.
So, with a replica 2 volume, bringing down the first brick makes the volume go read-only, thereby pausing the App VMs. All the cases below were therefore done by bringing down the second node of a replica pair, or by killing the second brick of the replica pair.

In all of these cases, 'dd' was running continuously.
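The quorum behaviour described above is controlled by gluster volume options such as cluster.quorum-type. A hedged sketch of enabling it manually follows; the volume name is illustrative, and in this release the option is set as part of the virt profile rather than by hand.

```shell
# cluster.quorum-type=auto on a replica 2 volume requires the first
# brick of each pair to be up for writes; losing the first brick turns
# the volume read-only, which is what pauses the App VMs.
gluster volume set vm-store cluster.quorum-type auto

# Confirm the option took effect.
gluster volume info vm-store | grep quorum
```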

Test-1
1. In a 4-node cluster, created a distributed-replicate volume of 2x2
2. Optimized the volume for virt store
(i.e) gluster volume set <vol-name> group virt
gluster volume set <vol-name> storage.owner-uid 36
gluster volume set <vol-name> storage.owner-gid 36
3. Started the volume
4. Created a Data domain with the above created gluster volume
5. Created an App VM with its root disk of size 20GB on the above data domain.
6. Installed the App VM with RHEL 6.5
7. After reboot, started running the 'dd' command in a loop inside the VM
(i.e) dd if=/dev/urandom of=/home/file1 bs=4k count=1000
8. Brought down the second node of the replica pair
VMs are up and running.
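The step 7 workload can be written as a small loop. This sketch is bounded so it terminates (the original test looped until the fault was injected), and the target path is an illustrative stand-in for a path inside the guest.

```shell
#!/bin/sh
# Repeatedly write 1000 4 KiB blocks of random data to generate
# steady I/O, as in step 7. Bounded to 3 iterations here so the
# sketch terminates; the original test looped indefinitely.
TARGET=/tmp/dd_file1
i=0
while [ "$i" -lt 3 ]; do
    dd if=/dev/urandom of="$TARGET" bs=4k count=1000 2>/dev/null
    i=$((i + 1))
done
echo "wrote $(stat -c %s "$TARGET") bytes"
```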

Test-2
1. In a 4-node cluster, created a distributed-replicate volume of 2x2
2. Optimized the volume for virt store
(i.e) gluster volume set <vol-name> group virt
gluster volume set <vol-name> storage.owner-uid 36
gluster volume set <vol-name> storage.owner-gid 36
3. Started the volume
4. Created a Data domain with the above created gluster volume
5. Created an App VM with its root disk of size 20GB on the above data domain.
6. Installed the App VM with RHEL 6.5
7. After reboot, started running the 'dd' command in a loop inside the VM
(i.e) dd if=/dev/urandom of=/home/file1 bs=4k count=1000
8. Killed the brick process (glusterfsd) of the second brick in the replica pair
VMs are up and running.
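Step 8 of Test-2 (killing a single brick process) can be sketched as follows. The volume name and brick path are hypothetical examples, not the ones from this setup.

```shell
# Each brick is served by its own glusterfsd process; killing one
# simulates a single-brick failure while its replica keeps serving.
gluster volume status vm-store        # lists each brick with its PID

# Alternatively, find the PID from the process table by brick path.
pid=$(pgrep -f 'glusterfsd.*brick1/s2')
kill "$pid"                           # brick goes offline
```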

Test-3
-------
1. In a 4-node cluster, created a distributed-replicate volume of 6x2
2. Optimized the volume for virt store
(i.e) gluster volume set <vol-name> group virt
gluster volume set <vol-name> storage.owner-uid 36
gluster volume set <vol-name> storage.owner-gid 36
3. Started the volume
4. Created a Data domain with the above created gluster volume
5. Created an App VM with its root disk of size 20GB on the above data domain.
6. Installed the App VM with RHEL 6.5
7. After reboot, started running the 'dd' command in a loop inside the VM
(i.e) dd if=/dev/urandom of=/home/file1 bs=4k count=1000
8. Plugged out one of the disks in the RHSS node that is not serving any brick of the volume
This can be done from RHEVM by deactivating the disk

There were no problems and all went well. The App VM was healthy.

Test-4
------- 
This is the actual case, where the disk serving a brick was unplugged.
1. In a 4-node cluster, created a distributed-replicate volume of 6x2
2. Optimized the volume for virt store
(i.e) gluster volume set <vol-name> group virt
gluster volume set <vol-name> storage.owner-uid 36
gluster volume set <vol-name> storage.owner-gid 36
3. Started the volume
4. Created a Data domain with the above created gluster volume
5. Created an App VM with its root disk of size 20GB on the above data domain.
6. Installed the App VM with RHEL 6.5
7. After reboot, started running the 'dd' command in a loop inside the VM
(i.e) dd if=/dev/urandom of=/home/file1 bs=4k count=1000
8. Plugged out one of the disks serving a brick of the volume
This can be done from RHEVM by deactivating the disk

Result:
There were immediate error messages on the RHSS node console, as follows:

[Sat Jan 18 09:50:16 UTC 2014 root.37.187:~ ] # 
Message from syslogd@rhss2 at Jan 18 05:03:57 ...
 glusterfsd: [2014-01-18 10:03:57.898358] M [posix-helpers.c:1313:posix_health_check_thread_proc] 0-drstore-posix: health-check failed, going down

Message from syslogd@rhss2 at Jan 18 05:04:27 ...
 glusterfsd: [2014-01-18 10:04:27.898560] M [posix-helpers.c:1318:posix_health_check_thread_proc] 0-drstore-posix: still alive! -> SIGTERM

Log messages in /var/log/messages on the same node:
<snip>
Jan 18 05:03:55 rhss2 multipathd: vdb: remove path (uevent)
Jan 18 05:03:55 rhss2 kernel: virtio-pci 0000:00:07.0: PCI INT A disabled
Jan 18 05:03:56 rhss2 kernel: XFS (dm-2): metadata I/O error: block 0x6400600 ("xlog_iodone") error 19 buf count 262144
Jan 18 05:03:56 rhss2 kernel: XFS (dm-2): xfs_do_force_shutdown(0x2) called from line 1052 of file fs/xfs/xfs_log.c.  Return address = 0xffffffffa02dc011
Jan 18 05:03:56 rhss2 kernel: XFS (dm-2): Log I/O Error Detected.  Shutting down filesystem
Jan 18 05:03:56 rhss2 kernel: XFS (dm-2): Please umount the filesystem and rectify the problem(s)
Jan 18 05:03:57 rhss2 glusterfsd: [2014-01-18 10:03:57.898358] M [posix-helpers.c:1313:posix_health_check_thread_proc] 0-drstore-posix: health-check failed, going down
Jan 18 05:04:22 rhss2 kernel: XFS (dm-2): xfs_log_force: error 5 returned.
Jan 18 05:04:27 rhss2 glusterfsd: [2014-01-18 10:04:27.898560] M [posix-helpers.c:1318:posix_health_check_thread_proc] 0-drstore-posix: still alive! -> SIGTERM
Jan 18 05:04:52 rhss2 kernel: XFS (dm-2): xfs_log_force: error 5 returned.
Jan 18 05:05:22 rhss2 kernel: XFS (dm-2): xfs_log_force: error 5 returned.
Jan 18 05:05:52 rhss2 kernel: XFS (dm-2): xfs_log_force: error 5 returned.
Jan 18 05:06:22 rhss2 kernel: XFS (dm-2): xfs_log_force: error 5 returned.
Jan 18 05:06:52 rhss2 kernel: XFS (dm-2): xfs_log_force: error 5 returned.
Jan 18 05:07:22 rhss2 kernel: XFS (dm-2): xfs_log_force: error 5 returned.
Jan 18 05:07:52 rhss2 kernel: XFS (dm-2): xfs_log_force: error 5 returned.
Jan 18 05:08:22 rhss2 kernel: XFS (dm-2): xfs_log_force: error 5 returned.
Jan 18 05:08:52 rhss2 kernel: XFS (dm-2): xfs_log_force: error 5 returned.
Jan 18 05:09:22 rhss2 kernel: XFS (dm-2): xfs_log_force: error 5 returned.
Jan 18 05:09:52 rhss2 kernel: XFS (dm-2): xfs_log_force: error 5 returned.
Jan 18 05:10:22 rhss2 kernel: XFS (dm-2): xfs_log_force: error 5 returned.
Jan 18 05:10:52 rhss2 kernel: XFS (dm-2): xfs_log_force: error 5 returned.
Jan 18 05:11:22 rhss2 kernel: XFS (dm-2): xfs_log_force: error 5 returned.
Jan 18 05:11:52 rhss2 kernel: XFS (dm-2): xfs_log_force: error 5 returned.
</snip>

After all this, the App VM was running healthy and no issues were seen.
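The health-check messages in the log above come from glusterfsd's posix health-check thread, which periodically verifies the brick backend is still reachable. A minimal sketch of the idea, using an illustrative stand-in path rather than a real brick:

```shell
#!/bin/sh
# Sketch of the posix health-check idea: periodically stat the brick
# root; if the stat fails (as it does once the backing disk is gone
# and XFS has shut down), log the failure and take the brick down.
# The path below is an illustrative stand-in for a brick directory.
BRICK=/tmp/brick_healthcheck_demo
mkdir -p "$BRICK"

health_check() {
    stat "$BRICK" >/dev/null 2>&1
}

health_check && echo "health-check ok: $BRICK"
rmdir "$BRICK"              # simulate the backing disk disappearing
health_check || echo "health-check failed, going down: $BRICK"
```

In the real daemon, a failed check leads to the "health-check failed, going down" message and a SIGTERM to the brick process, which is exactly the sequence seen in the log above.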

Test-5
-------
As a continuation of the previous test:
1. Plugged the disk back into the RHSS node (the disk that was unplugged in the previous test)
This can be done by activating the disk from the RHEVM UI
2. Again all went well; the App VM was healthy, with no other issues.
But the disk that was earlier shown as 'vdb' is now shown as 'vdf', which caused a little confusion.

Rebooted the RHSS node and everything came back up.

3. Triggered self-heal; it ran smoothly and completed successfully.


Based on these results, marking this bug as VERIFIED.

Comment 18 errata-xmlrpc 2014-02-25 07:24:25 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2014-0208.html

