Bug 1002863 - [RHS-RHOS] Kernel panic on bootable cinder volume instance when bricks are brought back up.
Summary: [RHS-RHOS] Kernel panic on bootable cinder volume instance when bricks are brought back up.
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: glusterfs
Version: 2.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: ---
Assignee: Bug Updates Notification Mailing List
QA Contact: Sudhir D
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2013-08-30 06:44 UTC by Anush Shetty
Modified: 2014-03-27 09:55 UTC
CC List: 6 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
virt rhos cinder rhs integration
Last Closed: 2014-03-27 09:55:48 UTC
Embargoed:


Attachments
screenshot of the panic (12.14 KB, image/png)
2013-08-30 06:46 UTC, Anush Shetty

Description Anush Shetty 2013-08-30 06:44:34 UTC
Description of problem: We saw a kernel panic while rebooting the nova instance after installing RHEL 6.4 from ISO on it. The instance was booted from a bootable cinder volume hosted on an RHS cluster.


Version-Release number of selected component (if applicable):

Openstack:

http://download.lab.bos.redhat.com/rel-eng/OpenStack/Grizzly/2013-07-08.1/
openstack-cinder-2013.1.2-3.el6ost.noarch
openstack-nova-compute-2013.1.2-4.el6ost.noarch


RHS: glusterfs-3.4.0.24rhs-1.el6rhs.x86_64


How reproducible: Saw it for the first time.


Steps to Reproduce:
1. Create a 2x2 Distributed-Replicate RHS volume
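
   A create command along these lines would produce such a volume (sketch only;
   the exact original command was not recorded, and the brick hosts and paths
   are taken from the volume info in step 4):

   # gluster volume create cinder-vol replica 2 \
       rhshdp01.lab.eng.blr.redhat.com:/cinder1/s1 \
       rhshdp02.lab.eng.blr.redhat.com:/cinder1/s2 \
       rhshdp03.lab.eng.blr.redhat.com:/cinder1/s3 \
       rhshdp04.lab.eng.blr.redhat.com:/cinder1/s4
   # gluster volume start cinder-vol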

2. Tag the volume with group virt
   (i.e.) gluster volume set cinder-vol group virt

3. Set owner uid and gid to the volume
   (i.e.) gluster volume set cinder-vol storage.owner-uid 165
         gluster volume set cinder-vol storage.owner-gid 165

4. gluster volume info and status

Volume Name: cinder-vol
Type: Distributed-Replicate
Volume ID: 25b9729b-b326-4eb8-9068-961c67ee25c6
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: rhshdp01.lab.eng.blr.redhat.com:/cinder1/s1
Brick2: rhshdp02.lab.eng.blr.redhat.com:/cinder1/s2
Brick3: rhshdp03.lab.eng.blr.redhat.com:/cinder1/s3
Brick4: rhshdp04.lab.eng.blr.redhat.com:/cinder1/s4
Options Reconfigured:
network.remote-dio: enable
cluster.eager-lock: enable
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
storage.owner-uid: 165
storage.owner-gid: 165

# gluster volume status cinder-vol
Status of volume: cinder-vol
Gluster process						Port	Online	Pid
------------------------------------------------------------------------------
Brick rhshdp01.lab.eng.blr.redhat.com:/cinder1/s1	49153	Y	11027
Brick rhshdp02.lab.eng.blr.redhat.com:/cinder1/s2	49153	Y	20438
Brick rhshdp03.lab.eng.blr.redhat.com:/cinder1/s3	49157	Y	763
Brick rhshdp04.lab.eng.blr.redhat.com:/cinder1/s4	49157	Y	24021
NFS Server on localhost					2049	Y	20502
Self-heal Daemon on localhost				N/A	Y	20510
NFS Server on rhshdp03.lab.eng.blr.redhat.com		2049	Y	825
Self-heal Daemon on rhshdp03.lab.eng.blr.redhat.com	N/A	Y	833
NFS Server on rhshdp04.lab.eng.blr.redhat.com		2049	Y	850
Self-heal Daemon on rhshdp04.lab.eng.blr.redhat.com	N/A	Y	859
NFS Server on 10.70.36.116				2049	Y	20366
Self-heal Daemon on 10.70.36.116			N/A	Y	20375
 
There are no active volume tasks


5. Configure cinder to use the glusterfs volume

  a. 
      # openstack-config --set /etc/cinder/cinder.conf DEFAULT volume_driver cinder.volume.drivers.glusterfs.GlusterfsDriver
      # openstack-config --set /etc/cinder/cinder.conf DEFAULT glusterfs_shares_config /etc/cinder/shares.conf
      # openstack-config --set /etc/cinder/cinder.conf DEFAULT glusterfs_mount_point_base /var/lib/cinder/volumes
  
  b. # cat /etc/cinder/shares.conf
     rhshdp01.lab.eng.blr.redhat.com:cinder-vol

  c. for i in api scheduler volume; do sudo service openstack-cinder-${i} restart; done
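
   After the restart, the cinder volume service mounts the share under
   glusterfs_mount_point_base; a quick sanity check (not part of the original
   steps) is to list the glusterfs fuse mounts:

   # mount -t fuse.glusterfs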

6. Create a 2x2 Distributed-Replicate volume named glance-vol, tagged with the virt group and with owner uid/gid set to 161 (settings sketched after the volume info below)
  
# gluster volume info glance-vol
 
Volume Name: glance-vol
Type: Distributed-Replicate
Volume ID: c3fe0412-9fec-4914-8fcc-648dc8632a2e
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: rhshdp01.lab.eng.blr.redhat.com:/glance1/s1
Brick2: rhshdp02.lab.eng.blr.redhat.com:/glance1/s2
Brick3: rhshdp03.lab.eng.blr.redhat.com:/glance1/s3
Brick4: rhshdp04.lab.eng.blr.redhat.com:/glance1/s4
Options Reconfigured:
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: enable
storage.owner-uid: 161
storage.owner-gid: 161
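
   The virt tag and ownership settings for glance-vol mirror steps 2-3; as a
   sketch (uid/gid 161 as reflected in the volume info above):

   # gluster volume set glance-vol group virt
   # gluster volume set glance-vol storage.owner-uid 161
   # gluster volume set glance-vol storage.owner-gid 161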

7. Mount the RHS glance volume on /var/lib/glance/images
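
   A mount along these lines does this (sketch; native glusterfs client, using
   the same server and volume name seen elsewhere in this report):

   # mount -t glusterfs rhshdp01.lab.eng.blr.redhat.com:glance-vol /var/lib/glance/images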
     
8. Uploaded the RHEL 6.4 ISO through the openstack horizon dashboard:
   
   # http://download.eng.blr.redhat.com/pub/rhel/released/RHEL-6/6.4/Server/x86_64/iso/RHEL6.4-20130130.0-Server-x86_64-DVD1.iso
   
9. Created a cinder volume of 10G
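
    With the Grizzly-era cinder CLI this is roughly (sketch; the display name
    matches the listing under Additional info, the exact flags used here are an
    assumption):

    # cinder create --display-name vol_nova 10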

10. Launched a nova instance with cinder as boot volume and RHEL 6.4 ISO 
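
    One plausible CLI equivalent (sketch only; the instance may equally have
    been launched from Horizon, and the flavor, image id, instance name and
    block-device-mapping details are assumptions; the volume id is taken from
    the cinder listing under Additional info):

    # nova boot --flavor m1.medium \
        --image <rhel-6.4-iso-image-id> \
        --block-device-mapping vda=ede3931f-6332-460e-a9b4-544d715241e8:::0 \
        rhel64-bfv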

11. During the installation, brought down 2 bricks 
    rhshdp02.lab.eng.blr.redhat.com:/cinder1/s2 (kill -9 pid)
    rhshdp03.lab.eng.blr.redhat.com:/cinder1/s3 (kill -9 pid)
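
    The brick PIDs come from the "gluster volume status" output in step 4; on
    rhshdp02, for example, the kill amounts to (sketch):

    # gluster volume status cinder-vol | grep '/cinder1/s2'
    # kill -9 20438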

12. After the installation completed, rebooted the instance. While it was rebooting, brought up all the bricks that were down using "gluster volume start cinder-vol force".
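
    Once the bricks are back, any pending self-heals can be checked with
    something like the following (sketch; not part of the original steps):

    # gluster volume heal cinder-vol info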


Actual results:

Kernel panic in the instance

Expected results:

Instance should boot up fine.

Additional info:


1) df output on the openstack host machine:
# df -h
Filesystem            Size  Used Avail Use% Mounted on

rhshdp01.lab.eng.blr.redhat.com:glance-vol
                      200G   11G  190G   6% /var/lib/glance/images
rhshdp01.lab.eng.blr.redhat.com:cinder-vol
                      200G  1.7G  199G   1% /var/lib/cinder/volumes/2b0d90354f56d251613926a47374f77b
rhshdp01.lab.eng.blr.redhat.com:cinder-vol
                      200G  1.7G  199G   1% /var/lib/nova/mnt/2b0d90354f56d251613926a47374f77b

2) # cinder list
+--------------------------------------+----------------+--------------+------+-------------+----------+--------------------------------------+
|                  ID                  |     Status     | Display Name | Size | Volume Type | Bootable |             Attached to              |
+--------------------------------------+----------------+--------------+------+-------------+----------+--------------------------------------+
| ede3931f-6332-460e-a9b4-544d715241e8 |     in-use     |   vol_nova   |  10  |     None    |  false   | 6c39d39c-2517-426e-bf7a-e7de12436a99 |
+--------------------------------------+----------------+--------------+------+-------------+----------+--------------------------------------+

Comment 1 Anush Shetty 2013-08-30 06:46:16 UTC
Created attachment 792037 [details]
screenshot of the panic

Comment 2 Anush Shetty 2013-08-30 06:46:42 UTC
Sosreports and statedumps,

http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1002863/

Comment 4 Divya 2013-09-10 11:58:10 UTC
Amar,

This bug has been identified as a known issue. Please provide CCFR information in the Doc Text field.

Comment 5 Amar Tumballi 2013-09-18 10:17:16 UTC
Divya, as of now, the RCA for the bug is not done; hence the summary of the bug itself serves as the CCFR.

Comment 6 shilpa 2014-03-27 09:02:04 UTC
Tested on RHOS 4.0 with RHS 2.1 (glusterfs-3.4.0.59rhs-1.el6_4.x86_64). With client-quorum enabled in the latest RHS version, I brought down only the second brick of each replica pair in the cluster. Could not reproduce this issue.

Comment 7 shilpa 2014-03-27 09:03:06 UTC
Comment #6 was a wrong BZ update.

Comment 8 shilpa 2014-03-27 09:04:52 UTC
Tested on RHOS 4.0 with RHS 2.1 (glusterfs-3.4.0.59rhs-1.el6_4.x86_64). With client-quorum enabled in the latest RHS version, I brought down only the second brick of each replica pair in the cluster. Could not reproduce this issue.
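
For reference, client-quorum on a replica volume is controlled by the
cluster.quorum-type volume option; enabling it explicitly would look something
like this (sketch, assuming the same volume name as in the original report):

# gluster volume set cinder-vol cluster.quorum-type auto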

