Bug 1905301

Summary: volume backup to ceph backend gets stuck in creating status when debug and use_json are True
Product: Red Hat OpenStack Reporter: Takashi Kajinami <tkajinam>
Component: python-oslo-serializationAssignee: Sofia Enriquez <senrique>
Status: CLOSED ERRATA QA Contact: Tzach Shefi <tshefi>
Severity: high Docs Contact:
Priority: medium    
Version: 16.1 (Train)CC: apevec, hberaud, lhh, ltoscano, senrique, tshefi
Target Milestone: z7Keywords: Triaged
Target Release: 16.1 (Train on RHEL 8.2)   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: python-oslo-serialization-2.29.2-1.20201116091656.el8ost Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-05-26 13:49:37 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Takashi Kajinami 2020-12-08 01:49:55 UTC
Description of problem:

When cinder-volume uses ceph backend and has debug log and json format log enabled, any backup gets stuck in creating status

~~~
(overcloud) [stack@undercloud-0 ~]$ cinder backup-create b9aeba71-0e10-449a-a755-b1b541002488 --name backup2
+-----------+--------------------------------------+
| Property  | Value                                |
+-----------+--------------------------------------+
| id        | 7419ef7e-f18a-4ab4-95ad-a8632d1647a6 |
| name      | backup2                              |
| volume_id | b9aeba71-0e10-449a-a755-b1b541002488 |
+-----------+--------------------------------------+
(overcloud) [stack@undercloud-0 ~]$ cinder backup-show 7419ef7e-f18a-4ab4-95ad-a8632d1647a6
+-----------------------+--------------------------------------+
| Property              | Value                                |
+-----------------------+--------------------------------------+
| availability_zone     | nova                                 |
| container             | None                                 |
| created_at            | 2020-12-08T01:48:15.000000           |
| data_timestamp        | 2020-12-08T01:48:15.000000           |
| description           | None                                 |
| fail_reason           | None                                 |
| has_dependent_backups | False                                |
| id                    | 7419ef7e-f18a-4ab4-95ad-a8632d1647a6 |
| is_incremental        | False                                |
| name                  | backup2                              |
| object_count          | 0                                    |
| size                  | 1                                    |
| snapshot_id           | None                                 |
| status                | creating                             |
| updated_at            | 2020-12-08T01:48:15.000000           |
| volume_id             | b9aeba71-0e10-449a-a755-b1b541002488 |
+-----------------------+--------------------------------------+
(overcloud) [stack@undercloud-0 ~]$ 
~~~

I don't see any errors in cinder-backup.log or stdout from the cinder-backup container.
I observe the following 2 lines in the bottom of cinder-backup.log
~~~
{"message": "==> connect_volume: call \"{'self': <os_brick.initiator.connectors.rbd.RBDConnector object at 0x7fcd90061550>, 'connection_properties': {'name': 'volumes/volume-b9aeba71-0e10-449a-a755-b1b541002488', 'hosts': ['172.17.3.61', '172.17.3.70', '172.17.3.125'], 'ports': ['6789', '6789', '6789'], 'cluster_name': 'ceph', 'auth_enabled': True, 'auth_username': 'openstack', 'secret_type': 'ceph', 'secret_uuid': '***', 'volume_id': 'b9aeba71-0e10-449a-a755-b1b541002488', 'discard': True, 'keyring': None, 'qos_specs': None, 'access_mode': 'rw', 'encrypted': False}}\"", "asctime": "2020-12-08 01:48:17", "name": "os_brick.initiator.connectors.rbd", "msg": "==> %(func)s: call %(all_args)r", "args": {"func": "connect_volume", "all_args": "{'self': <os_brick.initiator.connectors.rbd.RBDConnector object at 0x7fcd90061550>, 'connection_properties': {'name': 'volumes/volume-b9aeba71-0e10-449a-a755-b1b541002488', 'hosts': ['172.17.3.61', '172.17.3.70', '172.17.3.125'], 'ports': ['6789', '6789', '6789'], 'cluster_name': 'ceph', 'auth_enabled': True, 'auth_username': 'openstack', 'secret_type': 'ceph', 'secret_uuid': '***', 'volume_id': 'b9aeba71-0e10-449a-a755-b1b541002488', 'discard': True, 'keyring': None, 'qos_specs': None, 'access_mode': 'rw', 'encrypted': False}}"}, "levelname": "DEBUG", "levelno": 10, "pathname": "/usr/lib/python3.6/site-packages/os_brick/utils.py", "filename": "utils.py", "module": "utils", "lineno": 146, "funcname": "trace_logging_wrapper", "created": 1607392097.3840482, "msecs": 384.0482234954834, "relative_created": 63425.718784332275, "thread": 140520887643328, "thread_name": "MainThread", "process_name": "MainProcess", "process": 43, "traceback": null, "hostname": "controller-1", "error_summary": "", "context": {"user_name": null, "project_name": "admin", "domain_name": null, "user_domain_name": null, "project_domain_name": null, "user": "b2cfabfa6bc74a9ba0ed91687d280cb3", "tenant": "15b7011587484d6e802741cdbc11dba2", "system_scope": null, "project": "15b7011587484d6e802741cdbc11dba2", "domain": null, "user_domain": "default", "project_domain": "default", "is_admin": true, "read_only": false, "show_deleted": false, "auth_token": "***", "request_id": "req-b525298b-20a6-46c4-863a-94889ccd6034", "global_request_id": "req-681ddf06-8068-4193-8dff-e16b65ed8cd3", "resource_uuid": null, "roles": ["reader", "admin", "member"], "user_identity": "b2cfabfa6bc74a9ba0ed91687d280cb3 15b7011587484d6e802741cdbc11dba2 - default default", "is_admin_project": true, "user_id": "b2cfabfa6bc74a9ba0ed91687d280cb3", "project_id": "15b7011587484d6e802741cdbc11dba2", "domain_id": null, "read_deleted": "no", "remote_address": "172.17.1.20", "timestamp": "2020-12-08T01:48:15.396657+00:00", "quota_class": null, "service_catalog": [{"type": "identity", "name": "keystone", "endpoints": [{"region": "regionOne", "publicURL": "https://10.0.0.101:13000", "adminURL": "http://192.168.24.30:35357", "internalURL": "http://172.17.1.41:5000"}]}, {"type": "image", "name": "glance", "endpoints": [{"region": "regionOne", "adminURL": "http://172.17.1.41:9292", "publicURL": "https://10.0.0.101:13292", "internalURL": "http://172.17.1.41:9292"}]}, {"type": "compute", "name": "nova", "endpoints": [{"region": "regionOne", "adminURL": "http://172.17.1.41:8774/v2.1", "internalURL": "http://172.17.1.41:8774/v2.1", "publicURL": "https://10.0.0.101:13774/v2.1"}]}, {"type": "object-store", "name": "swift", "endpoints": [{"region": "regionOne", "internalURL": "http://172.17.3.41:8080/swift/v1/AUTH_15b7011587484d6e802741cdbc11dba2", "publicURL": "https://10.0.0.101:13808/swift/v1/AUTH_15b7011587484d6e802741cdbc11dba2", "adminURL": "http://172.17.3.41:8080/swift/v1/AUTH_15b7011587484d6e802741cdbc11dba2"}]}]}, "extra": {"project": "15b7011587484d6e802741cdbc11dba2", "version": "unknown"}}
{"message": "opening connection to ceph cluster (timeout=-1).", "asctime": "2020-12-08 01:48:17", "name": "os_brick.initiator.linuxrbd", "msg": "opening connection to ceph cluster (timeout=%s).", "args": [-1], "levelname": "DEBUG", "levelno": 10, "pathname": "/usr/lib/python3.6/site-packages/os_brick/initiator/linuxrbd.py", "filename": "linuxrbd.py", "module": "linuxrbd", "lineno": 70, "funcname": "connect", "created": 1607392097.385284, "msecs": 385.2839469909668, "relative_created": 63426.95450782776, "thread": 140520887643328, "thread_name": "MainThread", "process_name": "MainProcess", "process": 43, "traceback": null, "hostname": "controller-1", "error_summary": "", "context": {"user_name": null, "project_name": "admin", "domain_name": null, "user_domain_name": null, "project_domain_name": null, "user": "b2cfabfa6bc74a9ba0ed91687d280cb3", "tenant": "15b7011587484d6e802741cdbc11dba2", "system_scope": null, "project": "15b7011587484d6e802741cdbc11dba2", "domain": null, "user_domain": "default", "project_domain": "default", "is_admin": true, "read_only": false, "show_deleted": false, "auth_token": "***", "request_id": "req-b525298b-20a6-46c4-863a-94889ccd6034", "global_request_id": "req-681ddf06-8068-4193-8dff-e16b65ed8cd3", "resource_uuid": null, "roles": ["reader", "admin", "member"], "user_identity": "b2cfabfa6bc74a9ba0ed91687d280cb3 15b7011587484d6e802741cdbc11dba2 - default default", "is_admin_project": true, "user_id": "b2cfabfa6bc74a9ba0ed91687d280cb3", "project_id": "15b7011587484d6e802741cdbc11dba2", "domain_id": null, "read_deleted": "no", "remote_address": "172.17.1.20", "timestamp": "2020-12-08T01:48:15.396657+00:00", "quota_class": null, "service_catalog": [{"type": "identity", "name": "keystone", "endpoints": [{"region": "regionOne", "publicURL": "https://10.0.0.101:13000", "adminURL": "http://192.168.24.30:35357", "internalURL": "http://172.17.1.41:5000"}]}, {"type": "image", "name": "glance", "endpoints": [{"region": "regionOne", "adminURL": "http://172.17.1.41:9292", "publicURL": "https://10.0.0.101:13292", "internalURL": "http://172.17.1.41:9292"}]}, {"type": "compute", "name": "nova", "endpoints": [{"region": "regionOne", "adminURL": "http://172.17.1.41:8774/v2.1", "internalURL": "http://172.17.1.41:8774/v2.1", "publicURL": "https://10.0.0.101:13774/v2.1"}]}, {"type": "object-store", "name": "swift", "endpoints": [{"region": "regionOne", "internalURL": "http://172.17.3.41:8080/swift/v1/AUTH_15b7011587484d6e802741cdbc11dba2", "publicURL": "https://10.0.0.101:13808/swift/v1/AUTH_15b7011587484d6e802741cdbc11dba2", "adminURL": "http://172.17.3.41:8080/swift/v1/AUTH_15b7011587484d6e802741cdbc11dba2"}]}]}, "extra": {"project": "15b7011587484d6e802741cdbc11dba2", "version": "unknown"}}
~~~


Version-Release number of selected component (if applicable):
I reproduced the issue with RHOSP16.1.2 .

How reproducible:
Always

Steps to Reproduce:
1. Deploy overcloud with ceph-backup enabled

2. Enable debug and json format log of cinder-backup

/var/lib/config-data/puppet-generated/cinder/etc/cinder/cinder.conf
~~~
[DEFAULT]
...
debug=True
...
use_json = true
~~~

3. Create volume backup
$ cinder backup-create <voume id> --name <backup name>

Actual results:
The created backup gets stuck in creating status

Expected results:
The backup is created and become active

Additional info:

Comment 4 Sofia Enriquez 2021-01-08 15:28:42 UTC
As Eric wrote in the upstream bug report: 

The oslo.log JSONFormatter gets stuck when passing an RBDVolumeIOWrapper from os-brick.
This happens via os-brick's utils.trace() method which passes a connector containing
{'path': RBDVolumeIOWrapper}. The oslo.log JSONFormatter format() method calls
oslo_serialization's jsonutils.to_primitive and passes in this RBDVolumeIOWrapper object.
 
Therefore the to_primitive method eventually calls RBDVolumeIOWrapper.read() which it's not intended.

Comment 6 Sofia Enriquez 2021-02-10 16:44:01 UTC
Fix in upstream stable/train

Comment 21 errata-xmlrpc 2021-05-26 13:49:37 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 16.1.6 bug fix and enhancement advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:2097