Bug 1390013

Summary: Unable to create Cinder backup in IPv6 environment
Product: Red Hat OpenStack   Reporter: Marius Cornea <mcornea>
Component: python-os-brick   Assignee: Erno Kuvaja <ekuvaja>
Status: CLOSED ERRATA   QA Contact: Tzach Shefi <tshefi>
Severity: urgent   Docs Contact:
Priority: high
Version: 10.0 (Newton)   CC: apevec, cschwede, dbecker, eharney, emacchi, jcoufal, jobernar, jschluet, jslagle, lhh, lkuchlan, mburns, mcornea, morazi, pgrist, rhel-osp-director-maint, tbarron, tshefi
Target Milestone: rc   Keywords: Triaged
Target Release: 10.0 (Newton)
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: python-os-brick-1.6.1-3.el7ost   Doc Type: If docs needed, set a value
Last Closed: 2016-12-14 16:26:55 UTC Type: Bug
Attachments:
OSPD and cinder config and log files (flags: none)

Description Marius Cornea 2016-10-30 19:32:44 UTC
Description of problem:
In an IPv6 environment I am unable to create a Cinder backup of a volume stored on a Ceph backend. The Cinder backup storage backend is set to Swift.

Version-Release number of selected component (if applicable):
openstack-tripleo-heat-templates-5.0.0-0.9.0rc3.el7ost.noarch
python-cinder-9.0.0-5.el7ost.noarch
openstack-cinder-9.0.0-5.el7ost.noarch
puppet-cinder-9.4.1-1.el7ost.noarch
python-cinderclient-1.9.0-1.el7ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. Deploy overcloud:
source ~/stackrc
export THT=/usr/share/openstack-tripleo-heat-templates/

openstack overcloud deploy --templates $THT \
-r ~/openstack_deployment/roles/roles_data.yaml \
-e $THT/environments/network-isolation-v6.yaml \
-e $THT/environments/network-management.yaml \
-e $THT/environments/storage-environment.yaml \
-e $THT/environments/cinder-backup.yaml \
-e ~/openstack_deployment/environments/cinder_backup_swift_backend.yaml \
-e ~/openstack_deployment/environments/nodes.yaml \
-e ~/openstack_deployment/environments/network-environment.yaml \
-e ~/openstack_deployment/environments/disk-layout.yaml \
-e ~/openstack_deployment/environments/neutron-settings.yaml \
--log-file overcloud_deployment.log &> overcloud_install.log

2. Launch an instance, create a volume, and attach it to the instance
3. Create a backup of the volume:
cinder backup-create a151dd4a-412b-4e59-b92b-c53621bf3dd0 --name vol-backup --force

Actual results:
cinder backup-list
+--------------------------------------+--------------------------------------+--------+------------+------+--------------+-----------+
| ID                                   | Volume ID                            | Status | Name       | Size | Object Count | Container |
+--------------------------------------+--------------------------------------+--------+------------+------+--------------+-----------+
| f784d9c0-3ce5-4fcd-bead-8a526b3c6f7b | a151dd4a-412b-4e59-b92b-c53621bf3dd0 | error  | vol-backup | 1    | 0            | -         |
+--------------------------------------+--------------------------------------+--------+------------+------+--------------+-----------+

Expected results:
The backup gets created successfully.

Additional info:
The deployment uses composable roles, with all Pacemaker-managed services collocated on the controller nodes and the systemd-managed services on a custom role. The complete service-to-role assignment is in the environment files listed below.

Environment files:
http://paste.openstack.org/show/587409/

/var/log/cinder/backup.log:
http://paste.openstack.org/show/587410/

/etc/cinder/cinder.conf
http://paste.openstack.org/show/587411/

Comment 1 Emilien Macchi 2016-10-31 14:59:33 UTC
I've looked at the Cinder config and THT; it seems we missed configuring the Swift parameters in cinder.conf.
For that, we need to pass more parameters to the ::cinder::backup::swift class; otherwise Cinder Backup won't be able to connect to the Swift backup container.

Comment 4 Erno Kuvaja 2016-11-02 16:12:44 UTC
As we do not specify backup_swift_url anywhere in THT and it works in IPv4, I would assume the IPv6 environment does not provide object-store:swift:publicURL in the Service Catalog. Can you verify?

Comment 5 Marius Cornea 2016-11-02 16:18:05 UTC
This is how the service catalog looks in my environment:

http://paste.openstack.org/show/587664/

Please let me know if there's anything else I can check. Thanks

Comment 6 Erno Kuvaja 2016-11-03 10:21:39 UTC
Thanks Marius,

This rules one possible issue out. I'll try to dig deeper and get back to you if I need something else.

Comment 7 Tzach Shefi 2016-11-06 13:56:54 UTC
Still failed. 

build  -p 2016-11-04.2
HA controller, 1 compute, 3 ceph nodes.
Cinder backup backend is Swift.

Creating a Cinder volume works.
Cinder backup of the volume (Swift backend) fails.

[stack@undercloud-0 ~]$ cinder backup-show a67a93bc-f422-49b9-8a25-7a9a27d7f20d
+-----------------------+--------------------------------------+
| Property              | Value                                |
+-----------------------+--------------------------------------+
| availability_zone     | nova                                 |
| container             | None                                 |
| created_at            | 2016-11-06T13:45:32.000000           |
| data_timestamp        | 2016-11-06T13:45:32.000000           |
| description           | None                                 |
| fail_reason           | Error connecting to ceph cluster.    |
| has_dependent_backups | False                                |
| id                    | a67a93bc-f422-49b9-8a25-7a9a27d7f20d |
| is_incremental        | False                                |
| name                  | None                                 |
| object_count          | 0                                    |
| size                  | 1                                    |
| snapshot_id           | None                                 |
| status                | error                                |
| updated_at            | 2016-11-06T13:50:35.000000           |
| volume_id             | ae225d81-1aeb-40b2-9373-292e2f2c53c2 |
+-----------------------+--------------------------------------+

FYI: before this deployment I had, by mistake, installed the same deployment except that the Cinder backup backend was left as Ceph.
That setup also failed with the same "Error connecting to ceph cluster." error.

Attaching OSPD/Cinder config and logs.
Keeping system up for debugging.

Comment 8 Tzach Shefi 2016-11-06 13:57:55 UTC
Created attachment 1217748
OSPD and cinder config and log files

Comment 11 Christian Schwede (cschwede) 2016-11-09 14:24:57 UTC
Please note that there were two other IPv6 issues with Swift itself recently; these have since been fixed and builds are now available:

puppet-tripleo-5.1.0-0.20160921213932.ceccbfd.el7ost
https://bugzilla.redhat.com/show_bug.cgi?id=1378428

openstack-tripleo-heat-templates-5.0.0-0.20160922100830.75c20d5.1.el7ost
puppet-tripleo-5.3.0-7.el7ost
https://bugzilla.redhat.com/show_bug.cgi?id=1390010

Please verify that these fixes are in to ensure Swift itself works properly in IPv6.

Comment 14 Jon Bernard 2016-11-10 17:25:51 UTC
When a backup of a Ceph volume is initiated, os_brick is called with a temporary ceph.conf file that contains the necessary connection information. I believe this file is created on demand by the cinder-backup process. It is written to /tmp (e.g. /tmp/tmppNdFZF).

This file does not contain a final newline, and librbd ignores the unterminated last line:

[root@controller-1 ~]# cat /tmp/tmppNdFZF 
mon_host = fd00:fd00:fd00:3000::14:6789,fd00:fd00:fd00:3000::17:6789,fd00:fd00:fd00:3000::1d:6789
[client.openstack]
keyring = /etc/ceph/ceph.client.openstack.keyring[root@controller-1 ~]# rbd -c /tmp/tmppNdFZF --id openstack --cluster ceph ls -l volumes
2016-11-10 17:23:56.550541 7ff4f850ad80 -1 Errors while parsing config file!
2016-11-10 17:23:56.550547 7ff4f850ad80 -1 read_conf: ignoring line 3 because it doesn't end with a newline! Please end the config file with a newline.
2016-11-10 17:23:56.550936 7ff4f850ad80 -1 Errors while parsing config file!
2016-11-10 17:23:56.550939 7ff4f850ad80 -1 read_conf: ignoring line 3 because it doesn't end with a newline! Please end the config file with a newline.

This causes cinder-backup to fail to connect to the cluster and is the reason backups are failing in this environment.
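A minimal sketch of how such a temporary config could be generated so that it always ends with a newline. The helpers `render_ceph_conf()` and `write_temp_ceph_conf()` are hypothetical and are not os-brick's actual code; the section and key names are taken from the config files quoted in this report, and `mon_host` is assumed to be passed in already formatted.

```python
import os
import tempfile


def render_ceph_conf(mon_host, user, keyring):
    """Render a minimal ceph.conf body (hypothetical helper).

    mon_host is the already-formatted monitor list, e.g.
    "[fd00:fd00:fd00:3000::14]:6789".
    """
    lines = [
        "[global]",
        "mon_host = " + mon_host,
        "[client.%s]" % user,
        "keyring = " + keyring,
    ]
    # "\n".join() only puts newlines BETWEEN lines; the explicit trailing
    # "\n" terminates the last line so librbd does not ignore the keyring.
    return "\n".join(lines) + "\n"


def write_temp_ceph_conf(mon_host, user, keyring):
    """Write the config to a new file in the temp dir and return its path."""
    fd, path = tempfile.mkstemp(prefix="tmp")
    with os.fdopen(fd, "w") as conf_file:
        conf_file.write(render_ceph_conf(mon_host, user, keyring))
    return path
```

In this sketch the caller is responsible for deleting the returned path (e.g. with `os.remove()`) once the connection has been made.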

Comment 15 Jon Bernard 2016-11-10 17:26:25 UTC
In this case, ipv6 appears to be a red herring.

Comment 16 Jon Bernard 2016-11-10 17:34:23 UTC
Ahh, scratch that, there is also a bug in the contents of the file itself.  Looking further, one minute.

Comment 17 Marius Cornea 2016-11-10 17:38:08 UTC
(In reply to Jon Bernard from comment #15)
> In this case, ipv6 appears to be a red herring.

I believe we have two issues here; please correct me if I'm wrong:

1. Backup doesn't work with Ceph backend and IPv6 networking due to the issues mentioned in comment #14

2. The environment in the initial report used Swift for the CinderBackupBackend parameter, but it didn't seem to get configured: the logs still pointed to Ceph errors (see comment #1).

Comment 18 Jon Bernard 2016-11-10 17:47:27 UTC
So this actually is an IPv6 issue. The temporary ceph.conf file created by cinder-backup contains malformed mon_host entries for IPv6 addresses. Specifically, the port number must be moved outside the square brackets.

For example, this is what is created now:

[global]
mon_host = [fd00:fd00:fd00:3000::14:6789]
[client.openstack]
keyring = /etc/ceph/ceph.client.openstack.keyring

And this should instead be:

[global]
mon_host = [fd00:fd00:fd00:3000::14]:6789
[client.openstack]
keyring = /etc/ceph/ceph.client.openstack.keyring
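The corrected format can be produced with Python's standard `ipaddress` module. This is a sketch built around a hypothetical `format_mon_host()` helper, not the code that shipped in python-os-brick-1.6.1-3.el7ost: IPv6 literals are wrapped in square brackets with the port appended after the closing bracket, while IPv4 addresses and hostnames are left unbracketed. The default port 6789 is assumed from the examples above.

```python
import ipaddress


def format_mon_host(host, port):
    """Return 'host:port', bracketing IPv6 literals (hypothetical helper).

    host must be a bare address or hostname, without brackets.
    """
    try:
        is_v6 = ipaddress.ip_address(host).version == 6
    except ValueError:
        is_v6 = False  # not an IP literal, e.g. a DNS hostname
    if is_v6:
        # The port goes OUTSIDE the brackets: [addr]:port
        return "[%s]:%d" % (host, port)
    return "%s:%d" % (host, port)


def format_mon_hosts(hosts, port=6789):
    """Comma-join several monitors into a single mon_host value."""
    return ",".join(format_mon_host(h, port) for h in hosts)
```

Applied to the three monitors from comment 14, `format_mon_hosts(["fd00:fd00:fd00:3000::14", "fd00:fd00:fd00:3000::17", "fd00:fd00:fd00:3000::1d"])` yields `[fd00:fd00:fd00:3000::14]:6789,[fd00:fd00:fd00:3000::17]:6789,[fd00:fd00:fd00:3000::1d]:6789`.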

Comment 19 Jon Bernard 2016-11-10 18:14:04 UTC
For reference, see RFC 3986 (http://www.ietf.org/rfc/rfc3986.txt), section 3.2.2.
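Why the brackets matter can be illustrated with a small parser. With the port inside the brackets (or with no brackets at all), `fd00:fd00:fd00:3000::14:6789` is itself a syntactically valid IPv6 address, so a parser cannot tell that `6789` was meant as a port. The `split_host_port()` helper below is a hypothetical sketch following the bracket convention of RFC 3986 section 3.2.2, not a function from Ceph or os-brick.

```python
import ipaddress


def split_host_port(value):
    """Split '[v6addr]:port' or 'host:port' into (host, port).

    Hypothetical helper; port is None when absent. Raises ValueError
    when the input is malformed or the port cannot be determined.
    """
    if value.startswith("["):
        host, sep, rest = value[1:].partition("]")
        if not sep:
            raise ValueError("missing closing bracket: %r" % value)
        ipaddress.IPv6Address(host)  # raises ValueError if not valid IPv6
        if not rest:
            # e.g. '[fd00:fd00:fd00:3000::14:6789]': the intended port was
            # absorbed into the address, so no port is found at all.
            return host, None
        if not rest.startswith(":"):
            raise ValueError("unexpected text after ']': %r" % value)
        return host, int(rest[1:])
    if value.count(":") > 1:
        # Unbracketed IPv6: the last group could be part of the address
        # or a port, so refuse to guess.
        raise ValueError("ambiguous unbracketed IPv6 literal: %r" % value)
    host, sep, port = value.partition(":")
    return host, (int(port) if sep else None)
```

With this sketch, `split_host_port("[fd00:fd00:fd00:3000::14]:6789")` returns `("fd00:fd00:fd00:3000::14", 6789)`, while the malformed `[fd00:fd00:fd00:3000::14:6789]` parses as an address with no port.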

Comment 22 Tzach Shefi 2016-11-15 11:54:33 UTC
Verified!
Version: 
python-os-brick-1.6.1-3.el7ost.noarch
puppet-tripleo-5.3.0-9.el7ost.noarch
openstack-tripleo-heat-templates-5.0.0-1.7.el7ost.noarch

3 controller + 1 compute + 3 ceph

Cinder backup backend is swift: 

[stack@undercloud-0 ~]$ sudo grep swift /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml 
  ## Cinder Backup backend can be either 'ceph' or 'swift'.
  CinderBackupBackend: swift

Booted an instance and attached a new Cinder volume to it:

[stack@undercloud-0 ~]$ nova volume-attach 22fc3174-0643-4ae6-b523-2f0967b7650b 3b6ab8d3-4cb9-466c-8109-568f40f2f920 auto
+----------+--------------------------------------+
| Property | Value                                |
+----------+--------------------------------------+
| device   | /dev/vdb                             |
| id       | 3b6ab8d3-4cb9-466c-8109-568f40f2f920 |
| serverId | 22fc3174-0643-4ae6-b523-2f0967b7650b |
| volumeId | 3b6ab8d3-4cb9-466c-8109-568f40f2f920 |
+----------+--------------------------------------+


[stack@undercloud-0 ~]$ cinder backup-create 3b6ab8d3-4cb9-466c-8109-568f40f2f920 --force 
+-----------+--------------------------------------+
| Property  | Value                                |
+-----------+--------------------------------------+
| id        | 3109e1a5-13b7-4e1c-a5b4-57c6bdb8fe27 |
| name      | None                                 |
| volume_id | 3b6ab8d3-4cb9-466c-8109-568f40f2f920 |
+-----------+--------------------------------------+

[stack@undercloud-0 ~]$ cinder backup-list
+--------------------------------------+--------------------------------------+-----------+------+------+--------------+---------------+
| ID                                   | Volume ID                            | Status    | Name | Size | Object Count | Container     |
+--------------------------------------+--------------------------------------+-----------+------+------+--------------+---------------+
| 3109e1a5-13b7-4e1c-a5b4-57c6bdb8fe27 | 3b6ab8d3-4cb9-466c-8109-568f40f2f920 | available | -    | 1    | 22           | volumebackups |
+--------------------------------------+--------------------------------------+-----------+------+------+--------------+---------------+

Cinder backup restore:
[stack@undercloud-0 ~]$ cinder backup-restore 3109e1a5-13b7-4e1c-a5b4-57c6bdb8fe27
+-------------+-----------------------------------------------------+
| Property    | Value                                               |
+-------------+-----------------------------------------------------+
| backup_id   | 3109e1a5-13b7-4e1c-a5b4-57c6bdb8fe27                |
| volume_id   | 7c936355-be6f-45cb-94e8-428c9f770ee9                |
| volume_name | restore_backup_3109e1a5-13b7-4e1c-a5b4-57c6bdb8fe27 |
+-------------+-----------------------------------------------------+

A new volume is created by the Cinder backup restore:
[stack@undercloud-0 ~]$ cinder list
+--------------------------------------+-----------+-----------------------------------------------------+------+-------------+----------+--------------------------------------+
| ID                                   | Status    | Name                                                | Size | Volume Type | Bootable | Attached to                          |
+--------------------------------------+-----------+-----------------------------------------------------+------+-------------+----------+--------------------------------------+
| 3b6ab8d3-4cb9-466c-8109-568f40f2f920 | in-use    | -                                                   | 1    | -           | false    | 22fc3174-0643-4ae6-b523-2f0967b7650b |
| 7c936355-be6f-45cb-94e8-428c9f770ee9 | available | restore_backup_3109e1a5-13b7-4e1c-a5b4-57c6bdb8fe27 | 1    | -           | false    |                                      |
+--------------------------------------+-----------+-----------------------------------------------------+------+-------------+----------+--------------------------------------+

Comment 25 errata-xmlrpc 2016-12-14 16:26:55 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-2948.html