Bug 1128285
| Summary: | Rubygem-Staypuft: HA-neutron deployment fails - puppet agent run fails on keystone which reports OperationalError: (OperationalError) (2013, "Lost connection to MySQL server at 'reading initial communication packet', system error: 0") None None | ||
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Alexander Chuzhoy <sasha> |
| Component: | rubygem-staypuft | Assignee: | Scott Seago <sseago> |
| Status: | CLOSED ERRATA | QA Contact: | Leonid Natapov <lnatapov> |
| Severity: | high | Docs Contact: | |
| Priority: | high | ||
| Version: | 5.0 (RHEL 7) | CC: | aberezin, cwolfe, mburns, morazi, rhos-maint, yeylon |
| Target Milestone: | ga | Keywords: | TestOnly |
| Target Release: | Installer | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | Bug Fix | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2014-08-21 18:08:34 UTC | Type: | Bug |
|
Description
Alexander Chuzhoy
2014-08-08 19:34:17 UTC
Created attachment 925289 [details]
keystone.log where the error happens.
Created attachment 925290 [details]
the messages log from the host where the error happens.
Created attachment 925292 [details]
mariadb.log where the error doesn't happen.
This looks suspicious in /var/log/messages:

Aug 8 14:56:29 maca25400702876 mysqld_safe: 140808 14:56:29 mysqld_safe WSREP: Running position recovery with --log_error='/var/lib/mysql/wsrep_recovery.29hlr6' --pid-file='/var/lib/mysql/maca25400702876.example.com-recover.pid'
Aug 8 14:56:32 maca25400702876 mysqld_safe: 140808 14:56:32 mysqld_safe WSREP: Recovered position 00000000-0000-0000-0000-000000000000:-1
Aug 8 14:56:35 maca25400702876 rsyncd[16020]: rsyncd version 3.0.9 starting, listening on port 4444
Aug 8 14:56:35 maca25400702876 rsyncd[16034]: name lookup failed for 192.168.100.138: Name or service not known
Aug 8 14:56:35 maca25400702876 rsyncd[16034]: connect from UNKNOWN (192.168.100.138)
Aug 8 14:56:35 maca25400702876 rsyncd[16034]: rsync to rsync_sst/ from UNKNOWN (192.168.100.138)
Aug 8 14:56:35 maca25400702876 rsyncd[16034]: receiving file list
Aug 8 14:56:36 maca25400702876 rsyncd[16048]: name lookup failed for 192.168.100.138: Name or service not known
Aug 8 14:56:36 maca25400702876 rsyncd[16048]: connect from UNKNOWN (192.168.100.138)
Aug 8 14:56:36 maca25400702876 rsyncd[16034]: sent 72 bytes received 18877038 bytes total size 18874368
Aug 8 14:56:36 maca25400702876 rsyncd[16048]: rsync to rsync_sst-log_dir/ from UNKNOWN (192.168.100.138)
Aug 8 14:56:36 maca25400702876 rsyncd[16048]: receiving file list
Aug 8 14:56:37 maca25400702876 rsyncd[16048]: sent 73 bytes received 10487256 bytes total size 10485760
Aug 8 14:56:37 maca25400702876 rsyncd[16050]: name lookup failed for 192.168.100.138: Name or service not known
Aug 8 14:56:37 maca25400702876 rsyncd[16050]: connect from UNKNOWN (192.168.100.138)
Aug 8 14:56:37 maca25400702876 rsyncd[16050]: rsync to rsync_sst/./mysql from UNKNOWN (192.168.100.138)
Aug 8 14:56:37 maca25400702876 rsyncd[16050]: receiving file list

In my working setup, similar rsync messages (except that they succeed) reference the cluster_control_ip, unlike the above.
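For anyone triaging a similar messages log, the rsync peer addresses can be pulled out mechanically rather than by eye. The following is only a minimal sketch: the function name `rsync_peers` and the regular expression are illustrative assumptions based on the line format quoted above, not part of staypuft, galera, or rsync.

```python
import re

# Assumed line shape, taken from the excerpt above:
#   "... rsyncd[16034]: connect from UNKNOWN (192.168.100.138)"
CONNECT_RE = re.compile(
    r"rsyncd\[\d+\]: connect from \S+ \((\d{1,3}(?:\.\d{1,3}){3})\)"
)

def rsync_peers(log_text):
    """Return the set of IPv4 addresses that connected to rsyncd."""
    return set(CONNECT_RE.findall(log_text))

# Two lines copied from the log excerpt in this report.
sample = (
    "Aug 8 14:56:35 maca25400702876 rsyncd[16034]: "
    "connect from UNKNOWN (192.168.100.138)\n"
    "Aug 8 14:56:36 maca25400702876 rsyncd[16048]: "
    "connect from UNKNOWN (192.168.100.138)\n"
)

peers = rsync_peers(sample)
print(sorted(peers))
```

Any address in the result that is not the expected cluster_control_ip is worth a closer look.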
So the question is: where is 192.168.100.138 coming from? Either cluster_control_ip is set wrong, or galera or puppet is doing some extra (bad) inference. Based on the attached messages log, the cluster_control_ip *should* be one of 192.168.0.9, 192.168.0.10, 192.168.0.11.

Reproduced with rhel-osp-installer-0.1.9-1.el6ost.noarch

Correction to the previous comment: reproduced with rhel-osp-installer-0.1.9-1.el6ost.noarch with an HA-Nova deployment.

Further investigation with sseago and sasha on IRC revealed that private_ip was not set in the same subnet as the pacemaker cluster members / cluster_control_ip, which is most likely the underlying cause of this bug.

This is related to the bug where the IP address changes randomly, which is fixed in staypuft 0.2.5. Please retest with that version.

Did not reproduce on rhel-osp-installer-0.1.10-2.el6ost.noarch

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-1090.html |
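The subnet mismatch identified in the comments can be checked with a few lines of Python. This is a sketch under stated assumptions: the report does not give the netmask, so the /24 prefix length is hypothetical, and `same_subnet` is an illustrative helper, not something provided by staypuft or puppet. The addresses are the ones quoted in this report.

```python
import ipaddress

def same_subnet(addresses, prefix_len=24):
    """Return True if all addresses fall in one network of the given prefix.

    prefix_len=24 is an assumption; substitute the real netmask of the
    deployment's private network.
    """
    networks = {
        ipaddress.ip_interface(f"{addr}/{prefix_len}").network
        for addr in addresses
    }
    return len(networks) == 1

# Candidate cluster_control_ip values named in the comments above.
cluster_members = ["192.168.0.9", "192.168.0.10", "192.168.0.11"]
# Address seen as the rsync SST peer in /var/log/messages.
sst_peer = "192.168.100.138"

print(same_subnet(cluster_members))
print(same_subnet(cluster_members + [sst_peer]))
```

Under the /24 assumption, the three cluster members share a network while the SST peer does not, which is consistent with the private_ip explanation given in the comments.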