Bug 1744128 - After log rotation ironic-api container cannot access it's log file during a db sync.
Summary: After log rotation ironic-api container cannot access it's log file during a ...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 15.0 (Stein)
Hardware: Unspecified
OS: Unspecified
high
high
Target Milestone: ---
: ---
Assignee: Cédric Jeanneret
QA Contact: Sasha Smolyak
URL:
Whiteboard:
Depends On:
Blocks: 1727808
TreeView+ depends on / blocked
 
Reported: 2019-08-21 12:23 UTC by Sofer Athlan-Guyot
Modified: 2019-09-17 15:49 UTC (History)
4 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-09-17 15:49:19 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)

Description Sofer Athlan-Guyot 2019-08-21 12:23:04 UTC
Description of problem: ironic-dbsync fails during an update from Beta-1.0  to passed_phased.

TASK [Ensure all online data migrations for Ironic have been applied] **********                                                                                                                                     
fatal: [undercloud-0]: FAILED! => {"changed": true, "cmd": "if podman ps | grep ironic_api; then\n  podman exec ironic_api ironic-dbsync --config-file /etc/ironic/ironic.conf online_data_migrations\n# handle situa
tion when container_cli is podman but\n# the containers are still under docker\nelif docker ps | grep ironic_api; then\n  docker exec ironic_api ironic-dbsync --config-file /etc/ironic/ironic.conf online_data_migr
ations\nfi\n", "delta": "0:00:01.685398", "end": "2019-08-20 12:01:32.036538", "msg": "non-zero return code", "rc": 1, "start": "2019-08-20 12:01:30.351140", "stderr": "Traceback (most recent call last):\n  File \
"/usr/bin/ironic-dbsync\", line 10, in <module>\n    sys.exit(main())\n  File \"/usr/lib/python3.6/site-packages/ironic/cmd/dbsync.py\", line 365, in main\n    service.prepare_service(sys.argv)\n  File \"/usr/lib/
python3.6/site-packages/ironic/common/service.py\", line 34, in prepare_service\n    log.setup(CONF, 'ironic')\n  File \"/usr/lib/python3.6/site-packages/oslo_log/log.py\", line 274, in setup\n    _setup_logging_f
rom_conf(conf, product_name, version)\n  File \"/usr/lib/python3.6/site-packages/oslo_log/log.py\", line 382, in _setup_logging_from_conf\n    filelog = file_handler(logpath)\n  File \"/usr/lib64/python3.6/logging
/handlers.py\", line 437, in __init__\n    logging.FileHandler.__init__(self, filename, mode, encoding, delay)\n  File \"/usr/lib64/python3.6/logging/__init__.py\", line 1032, in __init__\n    StreamHandler.__init
__(self, self._open())\n  File \"/usr/lib64/python3.6/logging/__init__.py\", line 1061, in _open\n    return open(self.baseFilename, self.mode, encoding=self.encoding)\nPermissionError: [Errno 13] Permission denie
d: '/var/log/ironic/ironic-dbsync.log'\nexit status 1", "stderr_lines": ["Traceback (most recent call last):", "  File \"/usr/bin/ironic-dbsync\", line 10, in <module>", "    sys.exit(main())", "  File \"/usr/lib/
python3.6/site-packages/ironic/cmd/dbsync.py\", line 365, in main", "    service.prepare_service(sys.argv)", "  File \"/usr/lib/python3.6/site-packages/ironic/common/service.py\", line 34, in prepare_service", "  
  log.setup(CONF, 'ironic')", "  File \"/usr/lib/python3.6/site-packages/oslo_log/log.py\", line 274, in setup", "    _setup_logging_from_conf(conf, product_name, version)", "  File \"/usr/lib/python3.6/site-packa
ges/oslo_log/log.py\", line 382, in _setup_logging_from_conf", "    filelog = file_handler(logpath)", "  File \"/usr/lib64/python3.6/logging/handlers.py\", line 437, in __init__", "    logging.FileHandler.__init__
(self, filename, mode, encoding, delay)", "  File \"/usr/lib64/python3.6/logging/__init__.py\", line 1032, in __init__", "    StreamHandler.__init__(self, self._open())", "  File \"/usr/lib64/python3.6/logging/__i
nit__.py\", line 1061, in _open", "    return open(self.baseFilename, self.mode, encoding=self.encoding)", "PermissionError: [Errno 13] Permission denied: '/var/log/ironic/ironic-dbsync.log'", "exit status 1"], "s
tdout": "c651bbf9266f  192.168.24.1:8787/rhosp15/openstack-ironic-api:20190715.1                 dumb-init --singl...  26 hours ago  Up 26 hours ago         ironic_api", "stdout_lines": ["c651bbf9266f  192.168.24.
1:8787/rhosp15/openstack-ironic-api:20190715.1                 dumb-init --singl...  26 hours ago  Up 26 hours ago         ironic_api"]} 

If you look inside the container you get:

[stack@undercloud-0 ~]$ sudo podman exec -ti ironic_api  bash
()[root@undercloud-0 /]# ls -lrth /var/log/ironic/                                                                                                                                                                   
ls: cannot access '/var/log/ironic/ironic-dbsync.log': Permission denied                                                                                                                                             
ls: cannot access '/var/log/ironic/app.log': Permission denied                                                                                                                                                       
ls: cannot access '/var/log/ironic/ironic-conductor.log': Permission denied                                                                                                                                          
total 14M                                                                                                                                                                                                            
-?????????? ? ?      ?         ?            ? ironic-dbsync.log                                                                                                                                                      
-?????????? ? ?      ?         ?            ? ironic-conductor.log                                                                                                                                                   
-?????????? ? ?      ?         ?            ? app.log                                                                                                                                                                
drwxr-xr-x. 2 ironic ironic 4.0K Aug 19 10:44 deploy                                                                                                                                                                 
-rw-r--r--. 1 ironic ironic 261K Aug 19 12:01 app.log.2.gz
-rw-r--r--. 1 ironic ironic 581K Aug 19 16:00 ironic-conductor.log.3.gz
-rw-r--r--. 1 ironic ironic 378K Aug 20 00:00 ironic-conductor.log.2.gz
-rw-r--r--. 1 ironic ironic 1.5M Aug 20 00:00 app.log.1
-rw-------. 1 ironic ironic 1.4K Aug 20 00:00 ironic-dbsync.log.1
-rw-r--r--. 1 ironic ironic  11M Aug 20 08:01 ironic-conductor.log.1

The new log files are not readable inside the container.

Version-Release number of selected component (if applicable):


How reproducible: always, as long as logrotation happened.


Steps to Reproduce:
1. Deploy osp15-trunk/Beta-1.0 with infrared, no special configuration.

2. Wait a while for the logrotation to happen.

3. Try updating the undercloud to passed_phase2

Actual results: update fails because log files are not readable/writable


Expected results: update succeed and log files are in a good state.

Comment 2 Cédric Jeanneret 2019-08-21 14:36:27 UTC
Hello,

So after having checked on a running Beta-1.0 node, we can push tmpwatch and logrotate out of the way: the node I could access had several logrotate runs, and all the files were correct in it:
()[root@undercloud-0 /]# ls -l /var/log/ironic/
total 24940
-rw-r--r--. 1 ironic ironic  4531546 Aug 12 08:28 app.log
-rw-r--r--. 1 ironic ironic 11069098 Aug 12 06:00 app.log.1
-rw-r--r--. 1 ironic ironic   241921 Aug 10 00:01 app.log.10.gz
-rw-r--r--. 1 ironic ironic   237870 Aug  9 18:00 app.log.11.gz
-rw-r--r--. 1 ironic ironic   241811 Aug  9 12:00 app.log.12.gz
-rw-r--r--. 1 ironic ironic   237928 Aug  9 06:00 app.log.13.gz
-rw-r--r--. 1 ironic ironic   267218 Aug  9 00:00 app.log.14.gz
-rw-r--r--. 1 ironic ironic   241143 Aug 12 00:00 app.log.2.gz
-rw-r--r--. 1 ironic ironic   237409 Aug 11 18:01 app.log.3.gz
-rw-r--r--. 1 ironic ironic   242512 Aug 11 12:01 app.log.4.gz
-rw-r--r--. 1 ironic ironic   237308 Aug 11 06:01 app.log.5.gz
-rw-r--r--. 1 ironic ironic   242039 Aug 11 00:01 app.log.6.gz
-rw-r--r--. 1 ironic ironic   237481 Aug 10 18:00 app.log.7.gz
-rw-r--r--. 1 ironic ironic   243355 Aug 10 12:00 app.log.8.gz
-rw-r--r--. 1 ironic ironic   237220 Aug 10 06:00 app.log.9.gz
drwxr-xr-x. 2 ironic ironic      222 Aug  8 19:21 deploy
-rw-r--r--. 1 ironic ironic  3906253 Aug 12 08:28 ironic-conductor.log
-rw-r--r--. 1 ironic ironic   453537 Aug 12 00:00 ironic-conductor.log.1
-rw-r--r--. 1 ironic ironic   316220 Aug 11 23:01 ironic-conductor.log.2.gz
-rw-r--r--. 1 ironic ironic    14878 Aug 11 00:01 ironic-conductor.log.3.gz
-rw-r--r--. 1 ironic ironic   316446 Aug 10 23:01 ironic-conductor.log.4.gz
-rw-r--r--. 1 ironic ironic    15240 Aug 10 00:01 ironic-conductor.log.5.gz
-rw-r--r--. 1 ironic ironic   317110 Aug  9 23:00 ironic-conductor.log.6.gz
-rw-r--r--. 1 ironic ironic   204896 Aug  9 00:00 ironic-conductor.log.7.gz
-rw-------. 1 ironic ironic        0 Aug  9 00:00 ironic-dbsync.log
-rw-------. 1 ironic ironic      690 Aug  9 00:00 ironic-dbsync.log.1

Regarding tmpwatch, it is set to NOT delete .log files, among others:
tmpwatch --nodirs \
  -X "/var/log/containers/*/*log" \
  -X "/var/log/containers/*/*/*log" \
  -X "/var/log/containers/*/*err" \
  15 \
  /var/log/containers/ 2>&1 | logger -t container-tmpwatch

Small note: the "15" is wrong, newer version/compose will have the correction adding a missing "d", making it drop files after 15 days instead of hours.

The issue might be linked to the several ones we have with paunch/podman, for instance https://bugzilla.redhat.com/show_bug.cgi?id=1739224 - this has to be controlled.

Cheers,

C.

Comment 5 Sofer Athlan-Guyot 2019-09-09 12:54:22 UTC
I haven't experienced this issue anymore. I think until we've got a clear reproducer we can close that one.

Comment 6 Bob Fournier 2019-09-17 15:49:19 UTC
Closing this, please reopen if problem reoccurs.


Note You need to log in before you can comment on or make changes to this bug.