Bug 1062848 - [RHS-RHOS] Root disk corruption on a nova instance booted from a cinder volume after a remove-brick/rebalance
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: distribute
Version: 2.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Nithya Balachandran
QA Contact: storage-qa-internal@redhat.com
URL:
Whiteboard:
Depends On:
Blocks: 1286133
 
Reported: 2014-02-08 08:56 UTC by shilpa
Modified: 2015-11-27 11:43 UTC
CC List: 3 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Cloned As: 1286133
Environment:
Last Closed: 2015-11-27 11:43:02 UTC
Embargoed:


Attachments
Log messages from VM instance (80.46 KB, image/png), 2014-02-08 08:58 UTC, shilpa

Description shilpa 2014-02-08 08:56:04 UTC
Description of problem:
When a nova instance is rebooted while a rebalance is in progress on the gluster volume, the root filesystem is mounted read-only after the instance comes back up, and corruption messages are seen.


Version-Release number of selected component (if applicable):
glusterfs-3.4.0.59rhs-1.el6_4.x86_64

How reproducible: Always


Steps to Reproduce:
1. Create two 6*2 distribute-replicate volumes called glance-vol and cinder-vol for glance images and cinder volumes respectively.

2. Tag the volumes with group virt:
   # gluster volume set glance-vol group virt
   # gluster volume set cinder-vol group virt

3. Set the storage.owner-uid and storage.owner-gid of glance-vol to 161:
   # gluster volume set glance-vol storage.owner-uid 161
   # gluster volume set glance-vol storage.owner-gid 161

4. On the RHOS machine, mount the RHS glance-vol on /mnt/gluster/glance/images and start the glance-api service. Also configure glance so that nova instances use the gluster glance-vol for images.
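
For reference, the mount and glance configuration on the RHOS node typically look something like the following (rhs-vm1 is an illustrative hostname; adjust to the actual setup):

# mkdir -p /mnt/gluster/glance/images
# mount -t glusterfs rhs-vm1:/glance-vol /mnt/gluster/glance/images

and in /etc/glance/glance-api.conf:

default_store = file
filesystem_store_datadir = /mnt/gluster/glance/images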

5. Mount RHS cinder-vol on /var/lib/cinder/volumes and configure RHOS to use RHS volume for cinder storage.
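
A typical GlusterFS backend configuration for cinder in this release looks something like the following (hostname illustrative):

# cat /etc/cinder/shares.conf
rhs-vm1:/cinder-vol

and in /etc/cinder/cinder.conf:

volume_driver = cinder.volume.drivers.glusterfs.GlusterfsDriver
glusterfs_shares_config = /etc/cinder/shares.conf
glusterfs_mount_point_base = /var/lib/cinder/volumes

(Note that the pathinfo output in step 7 shows the volume file under /var/lib/cinder/mnt, so the mount point base on the test setup may differ.)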

6. Create a glance image, create a cinder volume, and copy the image to the volume.

# cinder create --display-name vol3 --image-id dfac4c39-7946-4baa-9fb3-444ec6348a88 10
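
The glance image referenced by --image-id above would have been created with something like the following (image name and source file are illustrative):

# glance image-create --name rhel6 --disk-format qcow2 --container-format bare --is-public True --file rhel6.qcow2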

7. Boot a nova instance from the bootable cinder volume.

# nova boot --flavor 2 --boot-volume 71973975-7952-4d66-a3d8-3cd38de18431 instance-5

# getfattr -d -etext -m. -n trusted.glusterfs.pathinfo /var/lib/cinder/mnt/4db90e5492997091a102ba6ad764dade/volume-71973975-7952-4d66-a3d8-3cd38de18431
getfattr: Removing leading '/' from absolute path names
# file: var/lib/cinder/mnt/4db90e5492997091a102ba6ad764dade/volume-71973975-7952-4d66-a3d8-3cd38de18431
trusted.glusterfs.pathinfo="(<DISTRIBUTE:cinder-vol-dht> (<REPLICATE:cinder-vol-replicate-0> <POSIX(/rhs/brick1/c2):rhs-vm2:/rhs/brick1/c2/volume-71973975-7952-4d66-a3d8-3cd38de18431> <POSIX(/rhs/brick1/c1):rhs-vm1:/rhs/brick1/c1/volume-71973975-7952-4d66-a3d8-3cd38de18431>))"

8. Now run remove-brick on the bricks identified in the pathinfo output above.

# gluster v remove-brick cinder-vol 10.70.37.180:/rhs/brick1/c1 10.70.37.120:/rhs/brick1/c2 start
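
The file migration can be monitored with remove-brick status; the reboot in the next step must happen while migration is still in progress:

# gluster volume remove-brick cinder-vol 10.70.37.180:/rhs/brick1/c1 10.70.37.120:/rhs/brick1/c2 status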

9. While volume-71973975-7952-4d66-a3d8-3cd38de18431 is being migrated, reboot the instance (instance-5) that was created from this volume.
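
For example, issuing a soft reboot from the controller (instance name from step 7):

# nova reboot instance-5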

10. Check the instance console once it has rebooted and look for corruption error messages. Once the instance is up, the rootfs /dev/vda is mounted R/O. Running fsck manually to correct the errors did not help; the instance is rendered unusable.
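
The corruption messages can also be captured from the controller, e.g.:

# nova console-log instance-5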

Expected results:

The rootfs should be mounted R/W after the reboot, and no corruption messages should be seen.


Additional info:

Sosreports and a VM screenshot are attached.

Comment 1 shilpa 2014-02-08 08:58:17 UTC
Created attachment 860851
Log messages from VM instance

Comment 2 shilpa 2014-02-08 09:07:39 UTC
sosreports in http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1062848/

Comment 4 Susant Kumar Palai 2015-11-27 11:43:02 UTC
Cloning this to 3.1. To be fixed in a future release.

