Looks like https://bugzilla.redhat.com/show_bug.cgi?id=1349733, which still doesn't have a fix.
I see this in mistral logs: mistral/executor.log:2018-02-08 17:31:12.565 1931 ERROR ironicclient.common.http [-] Error contacting Ironic server: ノード e36e9754-c392-4903-a296-1fc6fb5afaff がホスト d-director01.ntt.co.jp によりロックされています。現在の操作の完了後に再試行してください。 (HTTP 409). Attempt 6 of 6 mistral/executor.log:2018-02-08 17:31:12.566 1931 ERROR mistral.engine.default_executor [-] Failed to run action [action_ex_id=73c28817-d060-4fb0-be31-35eef1e7a0ab, action_cls='<class 'mistral.actions.action_factory.IronicAction'>', attributes='{u'client_method_name': u'node.set_provision_state'}', params='{u'state': u'provide', u'node_uuid': u'e36e9754-c392-4903-a296-1fc6fb5afaff', u'configdrive': None, u'cleansteps': None}'] mistral/executor.log:2018-02-08 17:31:12.566 1931 ERROR mistral.engine.default_executor ActionException: IronicAction.node.set_provision_state failed: <class 'ironicclient.common.apiclient.exceptions.Conflict'>: \u30ce\u30fc\u30c9 e36e9754-c392-4903-a296-1fc6fb5afaff \u304c\u30db\u30b9\u30c8 d-director01.ntt.co.jp \u306b\u3088\u308a\u30ed\u30c3\u30af\u3055\u308c\u3066\u3044\u307e\u3059\u3002\u73fe\u5728\u306e\u64cd\u4f5c\u306e\u5b8c\u4e86\u5f8c\u306b\u518d\u8a66\u884c\u3057\u3066\u304f\u3060\u3055\u3044\u3002 However, its not clear that there is a problem with provisioning as per the Ironic logs. There are some communication errors with the node: ironic-conductor.log:2018-02-08 17:25:04.952 1934 WARNING dracclient.wsman [req-dfe29650-e10e-4e8f-a52d-8fd79bc0294c - - - - -] A SSLError error occurred while communicating with 192.0.2.201, attempt 1 of 3 and RPC timeout errors: ources /usr/lib/python2.7/site-packages/ironic/conductor/task_manager.py:331 2018-02-08 17:19:46.583 1934 DEBUG oslo.messaging._drivers.impl_rabbit [-] Timed out waiting for RPC response: timed out _raise_timeout /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py:1049 2018-02-08 17:19:46.584 1934 DEBUG oslo.messaging._drivers.impl_rabbit [-] Timed out waiting for RPC response: Timeout while waiting on RPC response - topic: "<unknown>", RPC method: "<unknown>" info: "<unknown>" _raise_timeout /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py:1049 However the node is successfully powered off: 2018-02-08 17:31:27.614 1934 INFO ironic.conductor.utils [req-765f4045-7b5a-42e8-8635-fa1c4e6dd68b - - - - -] Successfully set node e36e9754-c392-4903-a296-1fc6fb5afaff power state to power off. Can you provide the output of "openstack baremetal node list" AFTER the introspect command? Also is the introspection data available for this node via "openstack baremetal introspection data save <uuid>"? It looks like it was retrieved in ironic-inspector.log: 2018-02-08 17:30:36.446 1941 INFO ironic_inspector.process [-] [node: e36e9754-c392-4903-a296-1fc6fb5afaff] Introspection data was stored in Swift in object inspector_data-e36e9754-c392-4903-a296-1fc6fb5afaff So besides the output message there may not be a functional problem.
Thanks Masaki. This looks a I18N issue with encoding: 018-03-06 08:36:06Z [overcloud-Controller-dzy3ysczluaf.0]: UPDATE_FAILED resources[0]: 'ascii' codec can't encode characters in position 0-3: ordinal not in range(128) 2018-03-06 08:36:06Z [overcloud-Controller-dzy3ysczluaf]: UPDATE_FAILED resources[0]: 'ascii' codec can't encode characters in position 0-3: ordinal not in range(128) 2018-03-06 08:36:06Z [Controller]: UPDATE_FAILED resources.Controller: resources[0]: 'ascii' codec can't encode characters in position 0-3: ordinal not in range(128) 2018-03-06 08:36:06Z [CephStorage]: UPDATE_FAILED UPDATE aborted 2018-03-06 08:36:06Z [Compute]: UPDATE_FAILED UPDATE aborted 2018-03-06 08:36:06Z [overcloud-Compute-2lv4f7nkcbtp-0-jsv4c36fbhcc.NodeExtraConfig]: UPDATE_COMPLETE state changed 2018-03-06 08:36:06Z [overcloud]: UPDATE_FAILED resources.Controller: resources[0]: 'ascii' codec can't encode characters in position 0-3: ordinal not in range(128) 2018-03-06 08:36:07Z [overcloud-Compute-2lv4f7nkcbtp.1]: CREATE_FAILED CREATE aborted 2018-03-06 08:36:07Z [overcloud-CephStorage-b4hawzaltrsg.1]: UPDATE_FAILED UPDATE aborted 2018-03-06 08:36:07Z [overcloud-Compute-2lv4f7nkcbtp.0]: UPDATE_FAILED UPDATE aborted 2018-03-06 08:36:07Z [overcloud-Compute-2lv4f7nkcbtp]: UPDATE_FAILED Operation cancelled
Masaki, I'd like to get clarity as to what the actual issue is that we need to look at. Trying to summarize here: 1. This BZ was initially created due to the "is locked by host" warning message. As described in https://bugzilla.redhat.com/show_bug.cgi?id=1349733, the is a nuisance message and should not cause a failed introspection. There was no evidence provided in comments 3 or 4 (the overcloud_deploy logs in comment 4 are not actually in the case dir) that this in fact a problem with introspection. Comment describes another user config issue with flavors. I am therefore assuming that the initial problem described in the bug is no longer occurring, or if it is, is not a functional problem. 2. Comments 5-10 describe a different totally unrelated problem with the I18N encoding. This looks like a valid bug. I am going to change the title of this bug and redirect it to the team that can look a the I18N encoding issue, unless you have any objections. Thanks.
Thank you Masaki. Updated summary.
Masaki - in order to make progress on debugging the encoding issue on 2-1_Failed_NO_custom-hostname_ja_JP is it possible to get logs or an sosreport on the undercloud.
Looks like stack trace is here - in subscription_manager backtrace: :rhsmcertd-worker:140:<module>:UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-26: ordinal not in range(128) : :Traceback (most recent call last): : File "/usr/libexec/rhsmcertd-worker", line 140, in <module> : print _('Unable to update entitlement certificates and repositories') :UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-26: ordinal not in range(128) : :Local variables in innermost frame: :ga_loader: <module 'subscription_manager.ga_loader' from '/usr/lib/python2.7/site-packages/subscription_manager/ga_loader.pyc'> :exit_on_signal: <function exit_on_signal at 0x2a81d70> :managerlib: <module 'subscription_manager.managerlib' from '/usr/lib/python2.7/site-packages/subscription_manager/managerlib.pyc'> :inj: <module 'subscription_manager.injection' from '/usr/lib/python2.7/site-packages/subscription_manager/injection.pyc'> :log: <logging.Logger object at 0x28f2350> :args: [] :generate_correlation_id: <function generate_correlation_id at 0x2953398> :__package__: None :logutil: <module 'subscription_manager.logutil' from '/usr/lib/python2.7/site-packages/subscription_manager/logutil.pyc'> :main: <function main at 0x2a820c8> :__doc__: None :OptionParser: <class subscription_manager.i18n_optparse.OptionParser at 0x2a7f2c0> :ConsumerIdentity: <class subscription_manager.identity.ConsumerIdentity at 0x296c460> :parser: <subscription_manager.i18n_optparse.OptionParser instance at 0x2a35908> :__builtins__: <module '__builtin__' (built-in)> :__file__: '/usr/libexec/rhsmcertd-worker' :WrappedIndentedHelpFormatter: <class subscription_manager.i18n_optparse.WrappedIndentedHelpFormatter at 0x2a7f258> :sys: <module 'sys' (built-in)> :USAGE: u'%prog [\u30aa\u30d7\u30b7\u30e7\u30f3]' :__name__: '__main__' :action_client: <module 'subscription_manager.action_client' from '/usr/lib/python2.7/site-packages/subscription_manager/action_client.pyc'> :_: <bound method GNUTranslations.ugettext of <gettext.GNUTranslations instance at 0x27e6b90>> :init_dep_injection: <function init_dep_injection at 0x29800c8> :logging: <module 'logging' from '/usr/lib64/python2.7/logging/__init__.pyc'> :dbus: <module 'dbus' from '/usr/lib64/python2.7/site-packages/dbus/__init__.pyc'> :options: <Values at 0x2a74a28: {'autoheal': False}> :connection: <module 'rhsm.connection' from '/usr/lib64/python2.7/site-packages/rhsm/connection.pyc'> :signal: <module 'signal' (built-in)> :e: UnicodeEncodeError('ascii', u'\u30a8\u30f3\u30bf\u30a4\u30c8\u30eb\u30e1\u30f3\u30c8\u8a3c\u660e\u66f8\u3068\u30ea\u30dd\u30b8\u30c8\u30ea\u30fc\u3092\u66f4\u65b0\u4e2d', 0, 23, 'ordinal not in range(128)')
It looks like this subscription-manager issue with the Japanese locale is being tracked here - https://bugzilla.redhat.com/show_bug.cgi?id=1480395. I'm going to mark this as a duplicate, a fix is available in RHEL 7.5, so it should be in the next 10z release. *** This bug has been marked as a duplicate of bug 1480395 ***
Masaki - good analysis. I agree that the fix for the subscription manager issue should be in subscription-manager-1.19.23-1.el7_4.x86_64 installed on this system, per the RHEL7.4z fix https://bugzilla.redhat.com/show_bug.cgi?id=1484265. As far as the Dbus bug, this bug does not look to have the same stack trace signature as Bug 1480395, so I'm not sure its a duplicate. I'm changing the Product and Component so the RHEL team can decide if this is a duplicate.
Code paths involved don't use the D-BUS services from bug 1483746, so closing as dup of 1480395 *** This bug has been marked as a duplicate of bug 1480395 ***
Thanks Kevin. So it seems that the subscription manager trace should be fixed with the installed version. If that is resolved, going back to Comment 13, it seems that we don't have a bug and this BZ should be closed.
Masaki - subscription manager is a RHEL package, there is no specific openstack component for subscription manager. It failed with subscription-manager-1.19.21-1.el7 because this was before the update. Comment 18 came from the sosreport, its likely this is an old stack trace when subscription-manager-1.19.21-1.el7 was being used that is in the sosreport. If the customer is still getting the subscription manager stack trace even when subscription-manager-1.19.23-1.el7_4.x86_64 is installed then you will need to open a new bug against RHEL component subscription_manager. Again there is no specific openstack component for subscription_manager.
Thanks Masaki and Kevin. I do see the 'ascii' codec can't encode" message in 2-2_Failed_custom-hostname_ja_JP.log but likewise can't correlate it to a sosreport to make any progress in determining the source of the problem. From this log file it looks like the error is occurring on the Controller. It would be most useful to check this node for log files that indicate which component is causing the error. It could be heat, but heat is just rolling up message it receives from other components so its also likely to be occurring elsewhere. 2018-03-27 06:05:27Z [overcloud-Controller-spez2uipfgxr.0]: UPDATE_FAILED resources[0]: 'ascii' codec can't encode characters in position 0-3: ordinal not in range(128) 2018-03-27 06:05:27Z [overcloud-Controller-spez2uipfgxr]: UPDATE_FAILED resources[0]: 'ascii' codec can't encode characters in position 0-3: ordinal not in range(128) For the sosreports - in sosreport-d-director01.ntt.co.jp.02027916-20180327150623 I only see the old subscription manager stack trace which we now know isn't valid. I don't see any other sosreports under /cases/02027916 [bfournie@collab-shell 02027916]$ ls /cases/02027916 citellus.html magui.json magui.ou We do have this recent bug in Openstack https://bugzilla.redhat.com/show_bug.cgi?id=1535399, not sure if related.
Masaki - yes it is the same stack errors as in the previous log, and seems to be from about the same time - March 28: heat-engine.log-20180328:2018-03-27 09:59:17.664 4790 INFO heat.engine.stack [req-14b8b62b-f227-4824-9a8c-c681a0d052fc b1c118f8825d49ce9e256ef34bfe360c fc50d03f63864ce7a6c3c9bfd847dd18 - - -] Stack UPDATE FAILED (overcloud-Controller-haysmeyafqpx-0-h65m3a22bbwy): 'ascii' codec can't encode characters in position 0-3: ordinal not in range(128) heat-engine.log-20180328:2018-03-27 09:59:18.575 4792 ERROR heat.engine.resource ResourceFailure: resources[0]: 'ascii' codec can't encode characters in position 0-3: ordinal not in range(128) heat-engine.log-20180328:2018-03-27 10:18:28.370 4945 INFO heat.engine.stack [req-ee2200f2-f430-40b1-9de1-180abee0e7db - - - - -] Stack UPDATE FAILED (overcloud-Controller-haysmeyafqpx): 'ascii' codec can't encode character u'\xf3' in position 67: ordinal not in range(128) We would need to get logs from the particular controller in order to pinpoint which component is causing the problem as indicated in previous comment. Including the DF DFG as they be able to better pinpoint the source of the error from the logs.
Thomas - yes agreed this case is confusing. The final issue is that we're getting these I18N errors being reported in the heat log during a stack create: heat-engine.log-20180328:2018-03-27 10:18:28.370 4945 INFO heat.engine.stack [req-ee2200f2-f430-40b1-9de1-180abee0e7db - - - - -] Stack UPDATE FAILED (overcloud-Controller-haysmeyafqpx): 'ascii' codec can't encode character u'\xf3' in position 67: ordinal not in range(128) We are waiting for logs from the controller nodes to see if we can pinpoint where the error occurs. As far as doing an install with non-english locale, I'm not sure or not if that is supported...
Masaki - I don't see the sosreport in this bug or in the case. Can you attach it to this bug?
Masaki - can you please identify which are the new files that you've added and if they clearly exhibit the error? As per Comment 35, the request is for controller logs when the problem occurs. There are literally hundreds of log files associated with this case and yanking them in the case shows them all with the same datestamp. The latest set of sosreports has an id of 0514 but those logs had no errors as per Comment 41.