Bug 1722066
| Field | Value |
|---|---|
| Summary | Replace controller scenario - RUNNING HANDLER [ceph-handler : restart ceph mon daemon(s) - container] failed with "unable to exec into ceph-mon-controller-3: no container with name or ID ceph-mon-controller-3 found: no such container" |
| Product | [Red Hat Storage] Red Hat Ceph Storage |
| Component | Ceph-Ansible |
| Status | CLOSED ERRATA |
| Severity | medium |
| Priority | urgent |
| Version | 4.0 |
| Target Milestone | rc |
| Target Release | 4.0 |
| Hardware | x86_64 |
| OS | Linux |
| Fixed In Version | ceph-ansible-4.0.0-0.1.rc10.el8cp |
| Doc Type | If docs needed, set a value |
| Reporter | Artem Hrechanychenko <ahrechan> |
| Assignee | Dimitri Savineau <dsavinea> |
| QA Contact | Yogev Rabl <yrabl> |
| CC | aschoen, ceph-eng-bugs, ceph-qe-bugs, dsavinea, emacchi, gcharot, gfidente, gmeno, hgurav, johfulto, nthomas, pgrist, ssmolyak, tserlin, vashastr |
| Keywords | Reopened, Triaged |
| Clones | 1732157 (view as bug list) |
| Last Closed | 2020-01-31 12:46:20 UTC |
| Type | Bug |
| Bug Depends On | 1718981 |
| Bug Blocks | 1594251 |
Description (Artem Hrechanychenko, 2019-06-19 13:00:48 UTC)
Asking for blocker flag because of the regression scenario.

---

*** This bug has been marked as a duplicate of bug 1719013 ***

---

I think these are different. There's a similarity to bug 1719013, but I'm re-opening to dig into it more. I think this might be a duplicate of a different bug.

---

Please retry this test but add the following to the deployment:

```yaml
CephAnsibleExtraConfig:
  handler_health_mon_check_retries: 10
  handler_health_mon_check_delay: 20
```

---

Until ceph-ansible bug 1718981 is resolved you'll need to do the workaround in #4, so I'm marking it as a blocker to this bug.

---

(In reply to John Fulton from comment #4)
> Please retry this test but add the following to the deployment:
>
> CephAnsibleExtraConfig:
>   handler_health_mon_check_retries: 10
>   handler_health_mon_check_delay: 20

```shell
[stack@undercloud-0 ~]$ cat overcloud_replace.sh
#!/bin/bash
openstack overcloud deploy \
  --timeout 100 \
  --templates /usr/share/openstack-tripleo-heat-templates \
  --stack overcloud \
  --libvirt-type kvm \
  --ntp-server clock.redhat.com \
  -e /home/stack/virt/internal.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
  -e /home/stack/virt/network/network-environment.yaml \
  -e /home/stack/virt/network/dvr-override.yaml \
  -e /home/stack/virt/enable-tls.yaml \
  -e /home/stack/virt/inject-trust-anchor.yaml \
  -e /home/stack/virt/public_vip.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/ssl/tls-endpoints-public-ip.yaml \
  -e /home/stack/virt/hostnames.yml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-ansible.yaml \
  -e /home/stack/virt/nodes_data.yaml \
  -e ~/containers-prepare-parameter.yaml \
  -e /home/stack/virt/extra_templates.yaml \
  -e ~/remove-controller.yaml \
  -e ~/ceph_wa.yaml \
  --log-file overcloud_deployment_92.log

[stack@undercloud-0 ~]$ cat ceph_wa.yaml
parameter_defaults:
  CephAnsibleExtraConfig:
    handler_health_mon_check_retries: 10
    handler_health_mon_check_delay: 20
```

The deployment still failed:

```
"<192.168.24.21> Failed to connect to the host via ssh: ",
"failed: [controller-3 -> 192.168.24.21] (item=controller-3) => changed=true ",
"  - /usr/bin/env",
"  - bash",
"  - /tmp/restart_mon_daemon.sh",
"  delta: '0:03:47.469168'",
"  end: '2019-06-19 17:12:37.805898'",
"  _raw_params: /usr/bin/env bash /tmp/restart_mon_daemon.sh",
"  start: '2019-06-19 17:08:50.336730'",
"  exit status 1",
"  unable to exec into ceph-mon-controller-3: no container with name or ID ceph-mon-controller-3 found: no such container",
"  Error with quorum.",
```

---

PR 1410 has not yet merged.

---

```shell
(undercloud) [stack@undercloud-0 ~]$ rpm -qa ceph-ansible
ceph-ansible-4.0.0-0.1.rc10.el8cp.noarch
```

```
"failed: [ceph-2 -> 192.168.24.8] (item=[{'application': 'openstack_gnocchi', 'name': 'metrics', 'pg_num': 32, 'rule_name': 'replicated_rule'}, {'msg': 'non-zero return code', 'cmd': ['podman', 'exec', 'ceph-mon-controller-0', 'ceph', '--cluster', 'ceph', 'osd', 'pool', 'get', 'metrics', 'size'], 'stdout': '', 'stderr': 'unable to exec into ceph-mon-controller-0: no container with name or ID ceph-mon-controller-0 found: no such container', 'rc': 125, 'start': '2019-07-17 16:49:47.920625', 'end': '2019-07-17 16:49:47.966148', 'delta': '0:00:00.045523', 'changed': True, 'failed': False, 'invocation': {'module_args': {'_raw_params': 'podman exec ceph-mon-controller-0 ceph --cluster ceph osd pool get metrics size\\n', 'warn': True, '_uses_shell': False, 'stdin_add_newline': True, 'strip_empty_ends': True, 'argv': None, 'chdir': None, 'executable': None, 'creates': None, 'removes': None, 'stdin': None}}, 'stdout_lines': [], 'stderr_lines': ['unable to exec into ceph-mon-controller-0: no container with name or ID ceph-mon-controller-0 found: no such container'], 'failed_when_result': False, 'item': {'application': 'openstack_gnocchi', 'name': 'metrics', 'pg_num': 32, 'rule_name': 'replicated_rule'}, 'ansible_loop_var': 'item'}]) => changed=false ",
"  delta: '0:00:00.053923'",
"  end: '2019-07-17 16:49:49.504360'",
"  podman exec ceph-mon-controller-0 ceph --cluster ceph osd pool create metrics 32 32 replicated_rule 1",
"  - application: openstack_gnocchi",
"  - metrics",
"  delta: '0:00:00.045523'",
"  end: '2019-07-17 16:49:47.966148'",
"  podman exec ceph-mon-controller-0 ceph --cluster ceph osd pool get metrics size",
"  application: openstack_gnocchi",
"  name: metrics",
"  start: '2019-07-17 16:49:47.920625'",
"  start: '2019-07-17 16:49:49.450437'",
```

```shell
[heat-admin@ceph-2 ~]$ sudo podman ps -a
CONTAINER ID  IMAGE                                                COMMAND               CREATED       STATUS           PORTS  NAMES
77e3cf880b9c  192.168.24.1:8787/rhosp15/openstack-cron:20190711.1  dumb-init --singl...  23 hours ago  Up 23 hours ago         logrotate_crond
9947cb175aed  192.168.24.1:8787/ceph/rhceph-4.0-rhel8:latest       /opt/ceph-contain...  23 hours ago  Up 23 hours ago         ceph-osd-8
6321d76031e1  192.168.24.1:8787/ceph/rhceph-4.0-rhel8:latest       /opt/ceph-contain...  23 hours ago  Up 23 hours ago         ceph-osd-5
00ddb30cbf84  192.168.24.1:8787/ceph/rhceph-4.0-rhel8:latest       /opt/ceph-contain...  23 hours ago  Up 23 hours ago         ceph-osd-14
b83a4a18df38  192.168.24.1:8787/ceph/rhceph-4.0-rhel8:latest       /opt/ceph-contain...  23 hours ago  Up 23 hours ago         ceph-osd-11
47242e9e34b7  192.168.24.1:8787/ceph/rhceph-4.0-rhel8:latest       /opt/ceph-contain...  23 hours ago  Up 23 hours ago         ceph-osd-1
```

---

Created attachment 1591736 [details]
oc logs
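For context, the two tunables used in the workaround above bound how long the restart handler waits for the mon to rejoin quorum before giving up with "Error with quorum." The following is only a minimal sketch of that retry pattern; the function name and the check command are illustrative assumptions, not the actual contents of ceph-ansible's restart_mon_daemon.sh:

```shell
#!/bin/sh
# Hypothetical sketch of a health-check retry loop like the one that
# handler_health_mon_check_retries / handler_health_mon_check_delay tune.
# NOT ceph-ansible's actual script; retry_check and the example command
# are assumptions for illustration.
retry_check() {
    retries=$1
    delay=$2
    shift 2
    i=0
    while [ "$i" -lt "$retries" ]; do
        # "$@" stands in for the real health check, e.g. something like:
        #   podman exec ceph-mon-$HOSTNAME ceph --cluster ceph quorum_status
        if "$@"; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    echo "Error with quorum." >&2
    return 1
}

# A check that succeeds immediately ends the loop on the first attempt:
retry_check 10 0 true && echo "mon back in quorum"
```

Raising the retries/delay only helps when the mon container eventually comes back; it cannot fix the "no container with name or ID ... found" case, where the container the handler wants to exec into does not exist at all on that host.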
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0312