1578849 – Overcloud deployment failed with BlockStorage

Summary: Overcloud deployment failed with BlockStorage

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat OpenStack
Classification:	Red Hat
Component:	openstack-tripleo-heat-templates
Sub Component:
Version:	13.0 (Queens)
Hardware:	x86_64
OS:	Linux
Priority:	high
Severity:	high
Target Milestone:	Upstream M3
Target Release:	14.0 (Rocky)
Assignee:	Alex Schultz
QA Contact:	Gurenko Alex
Docs Contact:
URL:
Whiteboard:
Duplicates (1):	1592505
Depends On:
Blocks:	1656540

Reported:	2018-05-16 13:38 UTC by Oksana Voshchana
Modified:	2022-03-13 15:00 UTC
CC List:	30 users
Fixed In Version:	openstack-tripleo-heat-templates-9.0.0-0.20180915013702.860c9a3.el7ost
Doc Type:	Bug Fix
Doc Text:	With this update, NTP time is synced early in the deployment process to prevent container configuration and deployment failure. If the NTP servers are not accessible and cannot be synced, deployment fails immediately. Prior to this update, failures could occur later with a cryptic error message.
Clone Of:
Environment:
Last Closed:	2019-04-17 19:35:39 UTC
Target Upstream Version:
Embargoed:

Attachments
Openstack long failures (595.64 KB, text/plain) 2018-05-16 13:38 UTC, Oksana Voshchana

Links
System	ID	Priority	Status	Summary	Last Updated
Github	redhat-openstack infrared issues 331	'None'	closed	RFE: validate that virt host machine is NTP synced	2020-12-04 18:34:38 UTC
Launchpad	1776869	None	None	None	2018-06-20 13:03:04 UTC
OpenStack gerrit	576888	'None'	MERGED	Add host prep step for ntp time sync	2020-12-04 18:35:05 UTC
Red Hat Issue Tracker	OSP-11392	None	None	None	2021-12-10 16:14:28 UTC
Red Hat Knowledge Base (Solution)	5048941	None	None	None	2020-05-05 12:50:06 UTC
Red Hat Product Errata	RHEA-2019:0045	None	None	None	2019-01-11 11:50:35 UTC

Description Oksana Voshchana 2018-05-16 13:38:55 UTC

Created attachment 1437382
Openstack long failures

Description of problem:
We are trying to deploy OSP13 with standalone BlockStorage:
parameter_defaults:
   BlockStorageCount: 1
   OvercloudBlockStorageFlavor: cinder
   ControllerCount: 1
   OvercloudControlFlavor: controller
   ComputeCount: 1
   OvercloudComputeFlavor: compute

The deployment stops at the following stage:

2018-05-16 00:57:18Z [overcloud.AllNodesDeploySteps.ControllerDeployment_Step1.0]: CREATE_FAILED  Error: resources[0]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 2
2018-05-16 00:57:18Z [overcloud.AllNodesDeploySteps.ControllerDeployment_Step1]: CREATE_FAILED  Resource CREATE failed: Error: resources[0]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 2
2018-05-16 00:57:18Z [overcloud.AllNodesDeploySteps.ControllerDeployment_Step1]: CREATE_FAILED  Error: resources.ControllerDeployment_Step1.resources[0]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 2
2018-05-16 00:57:19Z [overcloud.AllNodesDeploySteps]: CREATE_FAILED  Resource CREATE failed: Error: resources.ControllerDeployment_Step1.resources[0]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 2
2018-05-16 00:57:20Z [overcloud.AllNodesDeploySteps]: CREATE_FAILED  Error: resources.AllNodesDeploySteps.resources.ControllerDeployment_Step1.resources[0]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 2
Heat Stack create failed.
Heat Stack create failed.
2018-05-16 00:57:20Z [overcloud]: CREATE_FAILED  Resource CREATE failed: Error: resources.AllNodesDeploySteps.resources.ControllerDeployment_Step1.resources[0]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 2

 Stack overcloud CREATE_FAILED 

overcloud.AllNodesDeploySteps.ControllerDeployment_Step1.0:
  resource_type: OS::Heat::StructuredDeployment
  physical_resource_id: 02911b18-9498-4a63-9f52-5256f6917c54
  status: CREATE_FAILED
  status_reason: |
    Error: resources[0]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 2
  deploy_stdout: |
    ...
            "    os.utime(dst, (st.st_atime, st.st_mtime))", 
            "OSError: [Errno 30] Read-only file system: '/etc/pki/ca-trust/extracted'", 
            "stdout: f394156bde72e578364583b14f5c0b624836b13657609f1e690062ea24722059"
        ]
    }
        to retry, use: --limit @/var/lib/heat-config/heat-config-ansible/6c4586d6-4a26-4e15-b995-ffaaec26acc2_playbook.retry
    
2018-05-16 00:57:20Z 


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Deploy the undercloud on VMs
2. Deploy the overcloud with composable roles, using the following role files: Controller, Compute, BlockStorage

Actual results:
Heat Stack create failed.

Expected results:
Heat Stack created 

Additional info:

Comment 1 Ricardo Noriega 2018-05-30 08:27:18 UTC

I'm hitting this issue, and it's just 1 controller + 2 computes, all virtualized:

           "Error running ['docker', 'run', '--name', 'mysql_bootstrap', '--label', 'config_id=tripleo_step1', '--label', 'container_name=mysql_bootstrap', '--label', 'managed_by=paunch', '--label', 'config_data={\"start_order\": 1, \"image\": \"192.168.24.1:8787/rhosp13/openstack-mariadb:2018-05-25.1\", \"environment\": [\"KOLLA_CONFIG_STRATEGY=COPY_ALWAYS\", \"KOLLA_BOOTSTRAP=True\", \"DB_MAX_TIMEOUT=60\", \"DB_CLUSTERCHECK_PASSWORD=g2MBvdwbrGWCecq9Tx9h2dMZT\", \"DB_ROOT_PASSWORD=DeRVs33bBu\", \"TRIPLEO_CONFIG_HASH=6fff9ad95bebaf4683dcd50016885e84\"], \"command\": [\"bash\", \"-ec\", \"if [ -e /var/lib/mysql/mysql ]; then exit 0; fi\\\\necho -e \\\\\"\\\\\\\\n[mysqld]\\\\\\\\nwsrep_provider=none\\\\\" >> /etc/my.cnf\\\\nkolla_set_configs\\\\nsudo -u mysql -E kolla_extend_start\\\\nmysqld_safe --skip-networking --wsrep-on=OFF &\\\\ntimeout ${DB_MAX_TIMEOUT} /bin/bash -c \\'until mysqladmin -uroot -p\\\\\"${DB_ROOT_PASSWORD}\\\\\" ping 2>/dev/null; do sleep 1; done\\'\\\\nmysql -uroot -p\\\\\"${DB_ROOT_PASSWORD}\\\\\" -e \\\\\"CREATE USER \\'clustercheck\\'@\\'localhost\\' IDENTIFIED BY \\'${DB_CLUSTERCHECK_PASSWORD}\\';\\\\\"\\\\nmysql -uroot -p\\\\\"${DB_ROOT_PASSWORD}\\\\\" -e \\\\\"GRANT PROCESS ON *.* TO \\'clustercheck\\'@\\'localhost\\' WITH GRANT OPTION;\\\\\"\\\\ntimeout ${DB_MAX_TIMEOUT} mysqladmin -uroot -p\\\\\"${DB_ROOT_PASSWORD}\\\\\" shutdown\"], \"user\": \"root\", \"volumes\": [\"/etc/hosts:/etc/hosts:ro\", \"/etc/localtime:/etc/localtime:ro\", \"/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro\", \"/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro\", \"/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro\", \"/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro\", \"/dev/log:/dev/log\", \"/etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro\", \"/etc/puppet:/etc/puppet:ro\", \"/var/lib/kolla/config_files/mysql.json:/var/lib/kolla/config_files/config.json\", 
\"/var/lib/config-data/puppet-generated/mysql/:/var/lib/kolla/config_files/src:ro\", \"/var/lib/mysql:/var/lib/mysql\"], \"net\": \"host\", \"detach\": false}', '--env=KOLLA_CONFIG_STRATEGY=COPY_ALWAYS', '--env=KOLLA_BOOTSTRAP=True', '--env=DB_MAX_TIMEOUT=60', '--env=DB_CLUSTERCHECK_PASSWORD=g2MBvdwbrGWCecq9Tx9h2dMZT', '--env=DB_ROOT_PASSWORD=DeRVs33bBu', '--env=TRIPLEO_CONFIG_HASH=6fff9ad95bebaf4683dcd50016885e84', '--net=host', '--user=root', '--volume=/etc/hosts:/etc/hosts:ro', '--volume=/etc/localtime:/etc/localtime:ro', '--volume=/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro', '--volume=/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro', '--volume=/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro', '--volume=/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro', '--volume=/dev/log:/dev/log', '--volume=/etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro', '--volume=/etc/puppet:/etc/puppet:ro', '--volume=/var/lib/kolla/config_files/mysql.json:/var/lib/kolla/config_files/config.json', '--volume=/var/lib/config-data/puppet-generated/mysql/:/var/lib/kolla/config_files/src:ro', '--volume=/var/lib/mysql:/var/lib/mysql', '192.168.24.1:8787/rhosp13/openstack-mariadb:2018-05-25.1', 'bash', '-ec', 'if [ -e /var/lib/mysql/mysql ]; then exit 0; fi\\necho -e \"\\\\n[mysqld]\\\\nwsrep_provider=none\" >> /etc/my.cnf\\nkolla_set_configs\\nsudo -u mysql -E kolla_extend_start\\nmysqld_safe --skip-networking --wsrep-on=OFF &\\ntimeout ${DB_MAX_TIMEOUT} /bin/bash -c \\'until mysqladmin -uroot -p\"${DB_ROOT_PASSWORD}\" ping 2>/dev/null; do sleep 1; done\\'\\nmysql -uroot -p\"${DB_ROOT_PASSWORD}\" -e \"CREATE USER \\'clustercheck\\'@\\'localhost\\' IDENTIFIED BY \\'${DB_CLUSTERCHECK_PASSWORD}\\';\"\\nmysql -uroot -p\"${DB_ROOT_PASSWORD}\" -e \"GRANT PROCESS ON *.* TO \\'clustercheck\\'@\\'localhost\\' WITH GRANT OPTION;\"\\ntimeout ${DB_MAX_TIMEOUT} mysqladmin -uroot -p\"${DB_ROOT_PASSWORD}\" shutdown']. [2]", 
            "stderr: INFO:__main__:Loading config file at /var/lib/kolla/config_files/config.json", 
            "INFO:__main__:Validating config file", 
            "INFO:__main__:Kolla config strategy set to: COPY_ALWAYS", 
            "INFO:__main__:Copying service configuration files", 
            "INFO:__main__:Copying /dev/null to /etc/libqb/force-filesystem-sockets", 
            "INFO:__main__:Setting permission for /etc/libqb/force-filesystem-sockets", 
            "INFO:__main__:Deleting /etc/my.cnf.d/galera.cnf", 
            "INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/my.cnf.d/galera.cnf to /etc/my.cnf.d/galera.cnf", 
            "ERROR:__main__:Unexpected error:", 
            "Traceback (most recent call last):", 
            "  File \"/usr/local/bin/kolla_set_configs\", line 411, in main", 
            "    execute_config_strategy(config)", 
            "  File \"/usr/local/bin/kolla_set_configs\", line 377, in execute_config_strategy", 
            "    copy_config(config)", 
            "  File \"/usr/local/bin/kolla_set_configs\", line 306, in copy_config", 
            "    config_file.copy()", 
            "  File \"/usr/local/bin/kolla_set_configs\", line 150, in copy", 
            "    self._merge_directories(source, dest)", 
            "  File \"/usr/local/bin/kolla_set_configs\", line 97, in _merge_directories", 
            "    os.path.join(dest, to_copy))", 
            "  File \"/usr/local/bin/kolla_set_configs\", line 92, in _merge_directories", 
            "    self._set_properties(source, dest)", 
            "  File \"/usr/local/bin/kolla_set_configs\", line 117, in _set_properties", 
            "    self._set_properties_from_file(source, dest)", 
            "  File \"/usr/local/bin/kolla_set_configs\", line 122, in _set_properties_from_file", 
            "    shutil.copystat(source, dest)", 
            "  File \"/usr/lib64/python2.7/shutil.py\", line 98, in copystat", 
            "    os.utime(dst, (st.st_atime, st.st_mtime))", 
            "OSError: [Errno 30] Read-only file system: '/etc/pki/ca-trust/extracted'", 
            "stdout: a90cb6fabce8e94fe42effe325a1350a9e47b134f211774acc3153a35a545a7f"
        ]
    }
        to retry, use: --limit @/var/lib/heat-config/heat-config-ansible/71cc6674-d6d2-4b0f-ad9c-4a9e470b36bb_playbook.retry
    
    PLAY RECAP *********************************************************************
    localhost                  : ok=28   changed=13   unreachable=0    failed=1   
    
  deploy_stderr: |

Comment 2 Alan Bishop 2018-06-13 13:02:35 UTC

The attachment listed in the BZ description and comment #1 both report the same failure, which occurs when starting the 'mysql_bootstrap' container from the openstack-mariadb container image. This is not a DF:Storage issue, so I am reassigning it to DFG:DF for further investigation.

Comment 3 Alex Schultz 2018-06-13 16:12:24 UTC

PIDONE owns mysql

Comment 4 Damien Ciabrini 2018-06-18 14:37:02 UTC

Hmmm, the error from the bug description seems to mean that when the transient container mysql_bootstrap is starting, kolla_init wants to overwrite /etc/pki/ca-trust/extracted with some config file that would come from /var/lib/config-data/puppet-generated/mysql, or change the folder permissions.

I can't think of any reason why this would happen without logs to analyze.

Did you enable TLS everywhere when you deployed the overcloud?
Can you provide the exact deployment command so I try to replicate locally?

Comment 5 Jiri Stransky 2018-06-20 13:01:07 UTC

Jakub hit this issue earlier in a virtualized environment; we managed to fix it by making sure the *virt host machine* was NTP-synced.

Note that it's not enough that we provide correct NtpServers to the overcloud Heat stack. If the setup is virtual, the VMs will get time from the hypervisor until they sync themselves via NTP. If the time is offset and there's an abrupt jump as the NTP sync takes effect, a wide variety of problems can appear. The only defense is to make sure that the virt host is NTP-synced too.

As such, this is a problem with the virtualized setups we use. I reported RFEs in TripleO Quickstart and Infrared asking that they at least validate that NTP on the virt host is synced and refuse to run if not (or maybe even set up NTP sync on the virt host too).
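
A rough sketch of the kind of virt host validation requested in those RFEs, assuming a plain SNTP query is acceptable for the check. The function names and the idea of comparing against a threshold are illustrative only and are not taken from TripleO Quickstart or Infrared. The packet layout (48 bytes, transmit timestamp in bytes 40-47, counted from 1900) follows the NTP specification.

```python
import socket
import struct
import time

# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_EPOCH_DELTA = 2208988800


def build_sntp_request():
    """Return a 48-byte SNTP client request (LI=0, VN=3, Mode=3)."""
    return b"\x1b" + 47 * b"\x00"


def server_unix_time(packet):
    """Extract the server's transmit timestamp from an NTP reply as Unix seconds."""
    if len(packet) < 48:
        raise ValueError("short NTP packet")
    seconds, fraction = struct.unpack("!II", packet[40:48])
    return seconds - NTP_EPOCH_DELTA + fraction / 2**32


def clock_offset(packet, local_unix_time):
    """Server time minus local time, ignoring network delay."""
    return server_unix_time(packet) - local_unix_time


def query_offset(server, timeout=5.0):
    """Ask `server` for the time over UDP port 123 and return the rough offset."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_sntp_request(), (server, 123))
        packet, _ = sock.recvfrom(512)
    return clock_offset(packet, time.time())
```

Run on the virt host before creating the VMs, a large absolute value from `query_offset(...)` would be a reason to stop and fix the host clock first. What counts as "large" is left to the caller.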

Comment 6 Jiri Stransky 2018-06-20 13:29:20 UTC

On second thought, if a bare metal machine goes into the deployment with a wrongly set hardware clock, a similar problem could happen. It's probably worth investigating whether we can do something to make this safer, e.g. make sure NTP sync is done early during deployment (and wait for its completion) before running the containerized puppet.

Comment 7 Alex Schultz 2018-06-20 15:10:07 UTC

I'll take this back since it's NTP. We can do this in the deployment via host_prep_tasks for ntp to ensure we have a time sync early on in the deployment.

Comment 8 Jiri Stransky 2018-06-20 15:38:28 UTC

Thanks for taking this, Alex. Yes, either host_prep_tasks or, if we want to continue using the Puppet module for NTP config for whatever reason, an `exec` resource added after it [1] with `tries`+`try_sleep` to wait until NTP is synced.

I looked at the code ordering a bit this afternoon and the issue reported here should disappear if we ensure that we never enter the `docker-puppet.py` phase with unsynced NTP. The sync needs to be asserted either in host_prep_tasks or step 1 of puppet run on the host [2]. The docker-puppet.py phase [3] comes after both.

Just transferring my thoughts, as I already spent a bit of time looking at this.

[1] https://github.com/openstack/puppet-tripleo/blob/cab0d34affeb171215e2bb288df7d478049e79cf/manifests/profile/base/time/ntp.pp#L29
[2] https://github.com/openstack/tripleo-heat-templates/blob/4286727ae70b1fa4ca6656c3f035afeac6eb2a95/common/deploy-steps-tasks.yaml#L156
[3] https://github.com/openstack/tripleo-heat-templates/blob/4286727ae70b1fa4ca6656c3f035afeac6eb2a95/common/deploy-steps-tasks.yaml#L184
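
The `tries`+`try_sleep` idea from this comment amounts to a bounded polling loop. A minimal sketch follows, assuming some check that reports whether the clock is synchronized. The `is_synced` callable and the default values are hypothetical placeholders; this is neither the merged patch nor the Puppet implementation.

```python
import time


def wait_until_synced(is_synced, tries=20, try_sleep=10, sleep=time.sleep):
    """Poll `is_synced()` up to `tries` times, sleeping `try_sleep` seconds in between.

    Returns the attempt number (starting at 1) on which the check passed.
    Raises TimeoutError if the check never passed, so the caller fails early
    instead of continuing with an unsynced clock.
    """
    for attempt in range(1, tries + 1):
        if is_synced():
            return attempt
        if attempt < tries:
            sleep(try_sleep)  # no sleep after the final failed attempt
    raise TimeoutError("clock not synchronized after %d attempts" % tries)
```

The `sleep` parameter exists only so the loop can be exercised without real delays. The point of raising instead of returning a flag is the one made above: the deployment should never reach the `docker-puppet.py` phase with unsynced time.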

Comment 9 Alex Schultz 2018-06-20 16:30:30 UTC

As we're eventually going to replace ntp with chrony for some configurations, I think the host prep task will work best for the service we're using. It's also something that we need to perform on the host as opposed to in the containers, so it makes the most sense there.  The host_prep_tasks get run before any of the container items, so it'll make sure we have a synced time before we start working with the containers themselves.

https://github.com/openstack/tripleo-heat-templates/blob/4286727ae70b1fa4ca6656c3f035afeac6eb2a95/common/deploy-steps.j2#L281

Comment 11 Alex Schultz 2018-06-20 16:33:05 UTC

*** Bug 1592505 has been marked as a duplicate of this bug. ***

Comment 12 Daniel Alvarez Sanchez 2018-06-20 21:05:14 UTC

In my case I fixed it by setting the timezone to UTC on the hypervisor:
cp /usr/share/zoneinfo/UTC /etc/localtime

Synchronized against NTP:
ntpdate clock.redhat.com

And updated the hwclock:
hwclock --systohc

After this, redeploying the overcloud worked.
Maybe the NTP sync in TripleO should happen first, before invoking docker-puppet at all?

Thanks Damien for the super late hours debugging :)!

Comment 26 Artem Hrechanychenko 2019-01-09 06:38:58 UTC

VERIFIED
openstack-tripleo-heat-templates-9.0.1-0.20181013060907.el7ost.noarch

Comment 29 AMOL LONARE 2019-01-11 11:15:27 UTC

Please provide an update on this BZ.

Comment 30 Artem Hrechanychenko 2019-01-11 11:18:11 UTC

Will test it today. Thanks.

Comment 31 errata-xmlrpc 2019-01-11 11:49:53 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:0045

Comment 35 Chris Smart 2019-02-07 00:31:31 UTC

For what it's worth, I seem to have hit this bug in Queens (openstack-tripleo-heat-templates-8.0.7-21.el7ost.noarch) when deploying with some new x86 nodes (I previously hit this on ppc64le). For me the nova containers cannot start. Here's a log snippet:


[heat-admin@compute-prod-1 ~]$ sudo docker logs nova_libvirt
...
+ sudo -E kolla_set_configs
INFO:__main__:Loading config file at /var/lib/kolla/config_files/config.json
INFO:__main__:Validating config file
INFO:__main__:Kolla config strategy set to: COPY_ALWAYS
INFO:__main__:Copying service configuration files
INFO:__main__:Deleting /etc/libvirt/libvirtd.conf
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/libvirt/libvirtd.conf to /etc/libvirt/libvirtd.conf
INFO:__main__:Deleting /etc/libvirt/passwd.db
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/libvirt/passwd.db to /etc/libvirt/passwd.db
INFO:__main__:Deleting /etc/libvirt/qemu.conf
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/libvirt/qemu.conf to /etc/libvirt/qemu.conf
INFO:__main__:Deleting /etc/my.cnf.d/tripleo.cnf
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/my.cnf.d/tripleo.cnf to /etc/my.cnf.d/tripleo.cnf
INFO:__main__:Deleting /etc/nova/migration/authorized_keys
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/nova/migration/authorized_keys to /etc/nova/migration/authorized_keys
INFO:__main__:Deleting /etc/nova/migration/identity
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/nova/migration/identity to /etc/nova/migration/identity
INFO:__main__:Deleting /etc/nova/nova.conf
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/nova/nova.conf to /etc/nova/nova.conf
INFO:__main__:Deleting /etc/nova/secret.xml
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/nova/secret.xml to /etc/nova/secret.xml
ERROR:__main__:Unexpected error:
Traceback (most recent call last):
  File "/usr/local/bin/kolla_set_configs", line 411, in main
    execute_config_strategy(config)
  File "/usr/local/bin/kolla_set_configs", line 377, in execute_config_strategy
    copy_config(config)
  File "/usr/local/bin/kolla_set_configs", line 306, in copy_config
    config_file.copy()
  File "/usr/local/bin/kolla_set_configs", line 150, in copy
    self._merge_directories(source, dest)
  File "/usr/local/bin/kolla_set_configs", line 97, in _merge_directories
    os.path.join(dest, to_copy))
  File "/usr/local/bin/kolla_set_configs", line 97, in _merge_directories
    os.path.join(dest, to_copy))
  File "/usr/local/bin/kolla_set_configs", line 97, in _merge_directories
    os.path.join(dest, to_copy))
  File "/usr/local/bin/kolla_set_configs", line 92, in _merge_directories
    self._set_properties(source, dest)
  File "/usr/local/bin/kolla_set_configs", line 117, in _set_properties
    self._set_properties_from_file(source, dest)
  File "/usr/local/bin/kolla_set_configs", line 122, in _set_properties_from_file
    shutil.copystat(source, dest)
  File "/usr/lib64/python2.7/shutil.py", line 98, in copystat
    os.utime(dst, (st.st_atime, st.st_mtime))
OSError: [Errno 30] Read-only file system: '/etc/pki/ca-trust/extracted'


I created a dirty hack around this (at least I think I did) by adding an ntpdate and hwclock sync in a custom OS::TripleO::NodeUserData (tested by setting the hwclock on a node into the past and re-deploying).

Comment 40 Aram Alipoor 2019-02-26 12:10:31 UTC

As a workaround, when we faced a similar issue while installing the undercloud, we had to touch all files, dirs and symlinks, remove the containers and re-run the install:


find /etc -exec touch -h {} +
docker rm -f $(docker ps -a -q)
rm -rf /var/lib/config-data/puppet-generated

openstack undercloud install --verbose
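
The comment does not say why touching everything helps. One reading, offered here only as an assumption, is that some entries carry modification times that no longer make sense relative to the corrected clock. A small sketch that lists entries whose mtime lies in the future, which could be run before resorting to a blanket `touch` (the function name and parameters are illustrative):

```python
import os
import time


def future_mtimes(root, now=None, tolerance=0.0):
    """Yield (path, seconds_ahead) for entries under `root` whose mtime is later than `now`."""
    now = time.time() if now is None else now
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat inspects symlinks themselves, matching `touch -h`.
                mtime = os.lstat(path).st_mtime
            except OSError:
                continue  # entry vanished or is unreadable
            if mtime - now > tolerance:
                yield path, mtime - now
```

For example, `list(future_mtimes("/etc"))` on an affected node would show whether any timestamps are ahead of the current system time. An empty result would suggest the explanation assumed here does not apply.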

Comment 43 Brendan Shephard 2019-04-04 05:31:17 UTC

(In reply to Aram Alipoor from comment #40)
> As a workaround when faced a similar issue when installing undercloud we had
> to touch all files, dirs and symlinks, remove containers and re-run the
> install:
> 
> 
> find /etc -exec touch -h {} +
> docker rm -f $(docker ps -a -q)
> rm -rf /var/lib/config-data/puppet-generated
> 
> openstack undercloud install --verbose

Very interesting. I hit this on upstream Rocky twice now. And this does indeed fix the issue for the Undercloud. Possibly need a new BZ for this though. I'll create one if I can get a reliable reproducer.

Comment 44 Chris Smart 2019-04-11 01:41:03 UTC

If it's useful, this is how I worked around it, by running the script below on first boot. Include -e userdata_env.yaml in the deploy command and set your NTP server variable in network-environment.yaml, e.g. NtpServer: 'your.ntp.server'


Contents of userdata_env.yaml:

resource_registry:
  OS::TripleO::NodeUserData: userdata_custom.yaml


Contents of userdata_custom.yaml:

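# A HOT template needs a version line; "queens" matches the release discussed here.
heat_template_version: queens

parameters: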
  NtpServer:
    description: NTP server to use to sync hw clock bz#1578849
    type: string
    default: pool.ntp.org

description: >
  Do stuff on first boot

resources:
  userdata:
    type: OS::Heat::MultipartMime
    properties:
      parts:
      - config: {get_resource: sync_hw_clock_config}

  sync_hw_clock_config:
    type: OS::Heat::SoftwareConfig
    properties:
      config:
        str_replace:
          template: |
            #!/bin/bash
            echo "pre" > /root/ntp.results
            echo "$NTPSERVER" >> /root/ntp.results
            date >> /root/ntp.results
            hwclock >> /root/ntp.results
            systemctl stop ntpd
            ntpdate $NTPSERVER
            hwclock --systohc
            echo "post" >> /root/ntp.results
            date >> /root/ntp.results
            hwclock >> /root/ntp.results
          params:
            $NTPSERVER: {get_param: NtpServer}

outputs:
  OS::stack_id:
    value: {get_resource: userdata}

Comment 45 Christopher Brown 2019-04-17 19:26:05 UTC

I'm re-opening this because I have also hit this issue and am not convinced the backport is in effect.

This is namely because in the upstream commit, I see:

  EnablePackageInstall:
    default: 'false'
    description: Set to true to enable package installation at deploy time
    type: boolean

Therefore, unless we specifically set this boolean to true, this patch won't take effect. Does this make sense?

Comment 46 Alex Schultz 2019-04-17 19:35:39 UTC

Please file a new bug. Once it's been closed in errata we can't reopen this.   Also the EnablePackageInstall has no effect here because the package should already be on the image.  When you open a new bug, please include all the logs and reproducer information.

Comment 47 Christopher Brown 2019-04-17 19:39:12 UTC

(In reply to Alex Schultz from comment #46)
> Please file a new bug. Once it's been closed in errata we can't reopen this.
> Also the EnablePackageInstall has no effect here because the package should
> already be on the image.  When you open a new bug, please include all the
> logs and reproducer information.

OK, we've had to work around this with the firstboot fix like other reporters, so we don't have logs for this any more.

The package might already be on the image, but clearly it's not being configured soon enough.

Comment 48 Alex Schultz 2019-04-17 20:01:22 UTC

It's likely that the hardware time itself is off when the system is provisioned.  The host_prep_tasks are run first in the software deployment, so from a deployment framework standpoint it's about as early as we can get. We could try adding hwclock to the host prep tasks as well.  That's the only difference between the patch and what was mentioned in comment 44.

Comment 49 ggrimaux 2020-01-22 17:59:49 UTC

If you come here because you have docker containers stuck in 'restarting' and you have the error:

INFO:__main__:Deleting /etc/nova/secret.xml
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/nova/secret.xml to /etc/nova/secret.xml
ERROR:__main__:Unexpected error:
Traceback (most recent call last):
  File "/usr/local/bin/kolla_set_configs", line 411, in main
    execute_config_strategy(config)
  File "/usr/local/bin/kolla_set_configs", line 377, in execute_config_strategy
    copy_config(config)
  File "/usr/local/bin/kolla_set_configs", line 306, in copy_config
    config_file.copy()
  File "/usr/local/bin/kolla_set_configs", line 150, in copy
    self._merge_directories(source, dest)
  File "/usr/local/bin/kolla_set_configs", line 97, in _merge_directories
    os.path.join(dest, to_copy))
  File "/usr/local/bin/kolla_set_configs", line 97, in _merge_directories
    os.path.join(dest, to_copy))
  File "/usr/local/bin/kolla_set_configs", line 97, in _merge_directories
    os.path.join(dest, to_copy))
  File "/usr/local/bin/kolla_set_configs", line 92, in _merge_directories
    self._set_properties(source, dest)
  File "/usr/local/bin/kolla_set_configs", line 117, in _set_properties
    self._set_properties_from_file(source, dest)
  File "/usr/local/bin/kolla_set_configs", line 122, in _set_properties_from_file
    shutil.copystat(source, dest)
  File "/usr/lib64/python2.7/shutil.py", line 98, in copystat
    os.utime(dst, (st.st_atime, st.st_mtime))
OSError: [Errno 30] Read-only file system: '/etc/pki/ca-trust/extracted'


You might be facing this issue:
https://bugzilla.redhat.com/show_bug.cgi?id=1794119
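
To check quickly whether a container log shows the signature quoted in this bug, the final traceback line can be matched directly. This is a convenience sketch with a hypothetical helper name; it only recognizes the error text and cannot tell which of the related bugs is the cause.

```python
import re

# Final line of the traceback quoted in this bug; the path is captured.
READ_ONLY_ERROR = re.compile(
    r"OSError: \[Errno 30\] Read-only file system: '(?P<path>[^']+)'"
)


def read_only_failures(log_text):
    """Return paths named in 'Errno 30' errors from a log that also mentions kolla_set_configs."""
    if "kolla_set_configs" not in log_text:
        return []
    return [m.group("path") for m in READ_ONLY_ERROR.finditer(log_text)]
```

Feeding it the output of a command such as `sudo docker logs nova_libvirt`, as in comment 35, would return `/etc/pki/ca-trust/extracted` for the logs quoted here.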
