Bug 1463228

Summary: Cannot update pool in imported cluster
Product: [Red Hat Storage] Red Hat Storage Console
Reporter: Martin Kudlej <mkudlej>
Component: Ceph Integration
Assignee: Nishanth Thomas <nthomas>
Status: CLOSED WONTFIX
QA Contact: sds-qe-bugs
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 3
CC: mkarnik, nthomas, ppenicka, sankarshan
Target Milestone: ---
Target Release: 4
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-11-19 05:41:31 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---

Description Martin Kudlej 2017-06-20 12:10:49 UTC
Description of problem:
Updating a pool fails. Contents of the failed UpdatePool job in the etcd job queue:
$ for i in `etcdctl --endpoints=http://${HOSTNAME}:2379 ls /queue/b44b8e70-ae5f-40d1-bfe4-922a6000921b`; do echo "$i:"; etcdctl --endpoints=http://${HOSTNAME}:2379 get $i; echo "-------------"; done
/queue/b44b8e70-ae5f-40d1-bfe4-922a6000921b/payload:
{"status": "new", "username": "admin", "run": "ceph.objects.Pool.flows.UpdatePool", "job_id": "b44b8e70-ae5f-40d1-bfe4-922a6000921b", "parameters": {"Pool.pg_num": 128, "TendrlContext.integration_id": "673157f6-b90a-48cd-8fc4-56ec192efb32", "Pool.pool_id": 0, "_method": "PUT"}, "tags": ["tendrl/integration/673157f6-b90a-48cd-8fc4-56ec192efb32"], "created_at": "2017-06-20T10:37:29Z", "created_from": "API", "type": "sds", "name": "UpdatePool"}
-------------
/queue/b44b8e70-ae5f-40d1-bfe4-922a6000921b/locked_by:
{"node_id": "2bbe6edc-7e03-4d68-b683-9bd00049e939", "type": "sds", "fqdn": "dahorak-usm1-mon1._domain_", "tags": ["tendrl/integration/673157f6-b90a-48cd-8fc4-56ec192efb32", "detected_cluster/3524256a-3302-4c13-8cb2-e7f167a23fd3", "tendrl/node_2bbe6edc-7e03-4d68-b683-9bd00049e939", "tendrl/node", "tendrl/integration/ceph", "ceph/mon"]}
-------------
/queue/b44b8e70-ae5f-40d1-bfe4-922a6000921b/errors:
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/tendrl/commons/jobs/__init__.py", line 203, in _run
    the_flow.run()
  File "/usr/lib/python2.7/site-packages/tendrl/ceph_integration/objects/pool/flows/update_pool/__init__.py", line 28, in run
    super(UpdatePool, self).run()
  File "/usr/lib/python2.7/site-packages/tendrl/commons/flows/__init__.py", line 210, in run
    (atom_fqn, self._defs['help'])
AtomExecutionFailedError: Atom Execution failed. Error: Error executing atom: ceph.objects.Pool.atoms.Update on flow: Update Ceph Pool

-------------
/queue/b44b8e70-ae5f-40d1-bfe4-922a6000921b/hash:
1f7b2f7fd1bcb6110221729adfc0a4cc
-------------
/queue/b44b8e70-ae5f-40d1-bfe4-922a6000921b/job_id:
b44b8e70-ae5f-40d1-bfe4-922a6000921b
-------------
/queue/b44b8e70-ae5f-40d1-bfe4-922a6000921b/updated_at:
2017-06-20 10:42:58.553843+00:00
-------------
/queue/b44b8e70-ae5f-40d1-bfe4-922a6000921b/children:

-------------
/queue/b44b8e70-ae5f-40d1-bfe4-922a6000921b/status:
failed
-------------
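The parameters that matter for this bug are inside the JSON stored under the `payload` key. A minimal sketch that extracts them, assuming the payload has the shape shown above (it is trimmed here to the fields used); `requested_update` is a hypothetical helper, not part of tendrl:

```python
import json

# Trimmed copy of /queue/b44b8e70-ae5f-40d1-bfe4-922a6000921b/payload above.
PAYLOAD = ('{"run": "ceph.objects.Pool.flows.UpdatePool", '
           '"parameters": {"Pool.pg_num": 128, '
           '"TendrlContext.integration_id": "673157f6-b90a-48cd-8fc4-56ec192efb32", '
           '"Pool.pool_id": 0, "_method": "PUT"}}')


def requested_update(payload_json):
    """Return (pool_id, requested pg_num) from an UpdatePool job payload."""
    job = json.loads(payload_json)
    params = job["parameters"]
    return params["Pool.pool_id"], params["Pool.pg_num"]


pool_id, pg_num = requested_update(PAYLOAD)
print("pool_id=%d pg_num=%d" % (pool_id, pg_num))  # pool_id=0 pg_num=128
```

This confirms that the job asked for 128 PGs on pool 0, which is the "rbd" pool in this cluster.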

Version-Release number of selected component (if applicable):
ceph-ansible-2.2.11-1.el7scon.noarch
ceph-base-10.2.7-27.el7cp.x86_64
ceph-common-10.2.7-27.el7cp.x86_64
ceph-installer-1.3.0-10.g0e9fec2.el7.noarch
ceph-mon-10.2.7-27.el7cp.x86_64
ceph-osd-10.2.7-27.el7cp.x86_64
ceph-selinux-10.2.7-27.el7cp.x86_64
etcd-3.1.7-1.el7.x86_64
glusterfs-3.8.4-26.el7rhgs.x86_64
glusterfs-api-3.8.4-26.el7rhgs.x86_64
glusterfs-cli-3.8.4-26.el7rhgs.x86_64
glusterfs-client-xlators-3.8.4-26.el7rhgs.x86_64
glusterfs-events-3.8.4-26.el7rhgs.x86_64
glusterfs-fuse-3.8.4-26.el7rhgs.x86_64
glusterfs-geo-replication-3.8.4-26.el7rhgs.x86_64
glusterfs-libs-3.8.4-26.el7rhgs.x86_64
glusterfs-rdma-3.8.4-26.el7rhgs.x86_64
glusterfs-server-3.8.4-26.el7rhgs.x86_64
libcephfs1-10.2.7-27.el7cp.x86_64
python-cephfs-10.2.7-27.el7cp.x86_64
python-etcd-0.4.5-1.noarch
python-gluster-3.8.4-26.el7rhgs.noarch
rubygem-etcd-0.3.0-1.el7.noarch
tendrl-alerting-3.0-alpha.4.el7scon.noarch
tendrl-api-3.0-alpha.5.el7scon.noarch
tendrl-api-httpd-3.0-alpha.5.el7scon.noarch
tendrl-ceph-integration-3.0-alpha.6.el7scon.noarch
tendrl-commons-3.0-alpha.10.el7scon.noarch
tendrl-dashboard-3.0-alpha.5.el7scon.noarch
tendrl-node-agent-3.0-alpha.10.el7scon.noarch
tendrl-node-monitoring-3.0-alpha.6.el7scon.noarch
tendrl-performance-monitoring-3.0-alpha.8.el7scon.noarch


How reproducible:
100%

Steps to Reproduce:
1. Import a Ceph cluster deployed with ceph-ansible.
2. Try to increase the PG count of the default pool "rbd" from 64 to 128.
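Before step 2, the starting PG count can be confirmed directly on a monitor node with `ceph osd pool get rbd pg_num`. A minimal sketch, assuming the command is run on a node with an admin keyring and that its plain-text output has the form `pg_num: 64`; `parse_pg_num` and `get_pg_num` are hypothetical helpers:

```python
import subprocess


def parse_pg_num(output):
    """Parse output such as 'pg_num: 64\\n' into an int (assumed format)."""
    key, _, value = output.strip().partition(":")
    if key.strip() != "pg_num":
        raise ValueError("unexpected output: %r" % output)
    return int(value)  # int() tolerates the leading space


def get_pg_num(pool):
    """Query the current pg_num of `pool` via the ceph CLI."""
    out = subprocess.check_output(["ceph", "osd", "pool", "get", pool, "pg_num"])
    return parse_pg_num(out.decode())
```

On the cluster in this report, `get_pg_num("rbd")` would be expected to return 64 before the update and 128 after a successful one.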

Actual results:
Ceph pool update job fails.

Expected results:
The pool PG count can be updated.

Additional info:
tendrl-node-agent: Starting new HTTP connection (1): 10.34.108.93
tendrl-node-agent: Starting new HTTP connection (2): 10.34.108.93
tendrl-node-agent: Starting new HTTP connection (1): 10.34.108.93
tendrl-node-agent: Starting new HTTP connection (1): 10.34.108.93
tendrl-node-agent: Starting new HTTP connection (2): 10.34.108.93
tendrl-node-agent: Starting new HTTP connection (1): 10.34.108.93
tendrl-node-agent: Traceback (most recent call last):
tendrl-node-agent: File "/usr/lib64/python2.7/site-packages/gevent/greenlet.py", line 327, in run
tendrl-node-agent: result = self._run(*self.args, **self.kwargs)
tendrl-node-agent: File "/usr/lib/python2.7/site-packages/tendrl/commons/jobs/__init__.py", line 273, in _run
tendrl-node-agent: "exception": ex
tendrl-node-agent: File "/usr/lib/python2.7/site-packages/tendrl/commons/event.py", line 18, in __init__
tendrl-node-agent: Logger(message)
tendrl-node-agent: File "/usr/lib/python2.7/site-packages/tendrl/commons/logger.py", line 18, in __init__
tendrl-node-agent: self.push_message()
tendrl-node-agent: File "/usr/lib/python2.7/site-packages/tendrl/commons/logger.py", line 79, in push_message
tendrl-node-agent: caller=self.message.caller
tendrl-node-agent: File "/usr/lib/python2.7/site-packages/tendrl/node_agent/objects/node_message/__init__.py", line 19, in save
tendrl-node-agent: 'message_retention_time'])
tendrl-node-agent: File "/usr/lib/python2.7/site-packages/tendrl/commons/objects/__init__.py", line 81, in save
tendrl-node-agent: NS._int.wclient.refresh(self.value, ttl=ttl)
tendrl-node-agent: File "/usr/lib/python2.7/site-packages/etcd/client.py", line 523, in refresh
tendrl-node-agent: return self.write(key=key, value=None, ttl=ttl, refresh=True, **kwdargs)
tendrl-node-agent: File "/usr/lib/python2.7/site-packages/etcd/client.py", line 500, in write
tendrl-node-agent: response = self.api_execute(path, method, params=params)
tendrl-node-agent: File "/usr/lib/python2.7/site-packages/etcd/client.py", line 907, in wrapper
tendrl-node-agent: return self._handle_server_response(response)
tendrl-node-agent: File "/usr/lib/python2.7/site-packages/etcd/client.py", line 987, in _handle_server_response
tendrl-node-agent: etcd.EtcdError.handle(r)
tendrl-node-agent: File "/usr/lib/python2.7/site-packages/etcd/__init__.py", line 306, in handle
tendrl-node-agent: raise exc(msg, payload)
tendrl-node-agent: EtcdKeyNotFound: Key not found : /nodes/d2af5c46-42cc-4c22-8ced-90b24ef72c9f/messages/58e2fcf3-ec6c-4e53-b12d-22c10914fd58
tendrl-node-agent: <JobConsumerThread at 0x161d910> failed with EtcdKeyNotFound
tendrl-node-agent: Starting new HTTP connection (1): 10.34.108.93
tendrl-node-agent: 2017-06-20 08:12:41.172790+00:00 - node_agent - /usr/lib/python2.7/site-packages/tendrl/node_agent/node_sync/__init__.py:371 - _run - ERROR - node_sync failed: Raft Internal Error : etcdserver: request timed out - EtcdException: [{u'function': u'run', u'line': 327, u'file': u'/usr/lib64/python2.7/site-packages/gevent/greenlet.py', u'statement': u'result = self._run(*self.args, **self.kwargs)'}, {u'function': u'_run', u'line': 211, u'file': u'/usr/lib/python2.7/site-packages/tendrl/node_agent/node_sync/__init__.py', u'statement': u'NS.tendrl.objects.VirtualDisk(**disks[disk]).save(ttl=200)'}, {u'function': u'save', u'line': 81, u'file': u'/usr/lib/python2.7/site-packages/tendrl/commons/objects/__init__.py', u'statement': u'NS._int.wclient.refresh(self.value, ttl=ttl)'}, {u'function': u'refresh', u'line': 523, u'file': u'/usr/lib/python2.7/site-packages/etcd/client.py', u'statement': u'return self.write(key=key, value=None, ttl=ttl, refresh=True, **kwdargs)'}, {u'function': u'write', u'line': 500, u'file': u'/usr/lib/python2.7/site-packages/etcd/client.py', u'statement': u'response = self.api_execute(path, method, params=params)'}, {u'function': u'wrapper', u'line': 907, u'file': u'/usr/lib/python2.7/site-packages/etcd/client.py', u'statement': u'return self._handle_server_response(response)'}, {u'function': u'_handle_server_response', u'line': 987, u'file': u'/usr/lib/python2.7/site-packages/etcd/client.py', u'statement': u'etcd.EtcdError.handle(r)'}, {u'function': u'handle', u'line': 306, u'file': u'/usr/lib/python2.7/site-packages/etcd/__init__.py', u'statement': u'raise exc(msg, payload)'}]
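The EtcdKeyNotFound traceback above is raised from python-etcd's `Client.refresh`, which only extends the TTL of a key that already exists; the message key under `/nodes/.../messages/` did not exist when the refresh arrived (for example, because its TTL had already expired). A minimal sketch of a defensive pattern, assuming a client with python-etcd's `refresh(key, ttl)` and `write(key, value, ttl=...)` methods; `refresh_or_write` and `FakeClient` are hypothetical and not tendrl's code:

```python
class FakeClient(object):
    """Illustrative in-memory stand-in for etcd.Client (hypothetical)."""

    def __init__(self):
        self.store = {}

    def refresh(self, key, ttl):
        # Mirrors the failure in the log: refreshing a missing key raises.
        if key not in self.store:
            raise KeyError(key)
        value, _ = self.store[key]
        self.store[key] = (value, ttl)

    def write(self, key, value, ttl=None):
        self.store[key] = (value, ttl)


def refresh_or_write(client, key, value, ttl, not_found_exc=KeyError):
    """Extend the TTL of `key`, or write it again if it no longer exists.

    With python-etcd, pass not_found_exc=etcd.EtcdKeyNotFound.
    """
    try:
        client.refresh(key, ttl=ttl)
        return "refreshed"
    except not_found_exc:
        # A TTL refresh cannot create a key, so fall back to a normal write.
        client.write(key, value, ttl=ttl)
        return "written"


client = FakeClient()
print(refresh_or_write(client, "/nodes/example/messages/1", "msg", 200))  # written
print(refresh_or_write(client, "/nodes/example/messages/1", "msg", 200))  # refreshed
```

Against a real cluster the call would look like `refresh_or_write(etcd.Client(host="...", port=2379), key, value, 200, etcd.EtcdKeyNotFound)`; the exception class is passed in only so the sketch runs without python-etcd installed.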

Comment 3 Martin Kudlej 2017-06-20 12:23:40 UTC
After refreshing the page, I noticed new errors in the UI:

error
New pg-num cannot be less than existing value
20 Jun 2017 01:59:34

error
Failed pre-run: ceph.objects.Pool.atoms.ValidUpdateParameters for flow: Update Ceph Pool
20 Jun 2017 01:59:34

error
Job failed Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/tendrl/commons/jobs/__init__.py", line 203, in _run
    the_flow.run()
  File "/usr/lib/python2.7/site-packages/tendrl/ceph_integration/objects/pool/flows/update_pool/__init__.py", line 28, in run
    super(UpdatePool, self).run()
  File "/usr/lib/python2.7/site-packages/tendrl/commons/flows/__init__.py", line 160, in run
    (atom_fqn, self._defs['help'])
AtomExecutionFailedError: Atom Execution failed. Error: Error executing pre run function: ceph.objects.Pool.atoms.ValidUpdateParameters for flow: Update Ceph Pool
20 Jun 2017 01:59:34

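However, I was increasing the number of PGs, not decreasing it: the "rbd" pool has 64 PGs by default, and, as the job payload in the bug description shows ("Pool.pg_num": 128), I requested 128 PGs. The "New pg-num cannot be less than existing value" error therefore should not have been raised.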
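The argument in this comment can be checked with a tiny sketch. `check_pg_num_update` is a hypothetical stand-in for the ValidUpdateParameters pre-run check, not tendrl's actual code; it shows that a numeric comparison accepts the requested change from 64 to 128. The last line shows one purely hypothetical way such a check could still produce the reported error, namely comparing the values as strings; the report does not establish the actual cause.

```python
def check_pg_num_update(current, requested):
    """Hypothetical pg_num check: reject a requested value lower than the current one.

    Both values are converted to int so the comparison is numeric.
    """
    current, requested = int(current), int(requested)
    if requested < current:
        raise ValueError("New pg-num cannot be less than existing value")
    return requested


# The request from this report passes a numeric check.
print(check_pg_num_update(64, 128))  # 128

# Illustration only: as strings, "128" sorts before "64" ('1' < '6'),
# so a string comparison would reject this increase.
print("128" < "64")  # True
```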

Comment 9 Shubhendu Tripathi 2018-11-19 05:41:31 UTC
This product is now EOL.