921537 – rhc threaddump have wrong log file prompt

Bug 921537 - rhc threaddump have wrong log file prompt

Summary: rhc threaddump have wrong log file prompt

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Node
Sub Component:
Version:	1.0.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	low
Severity:	low
Target Milestone:	---
Target Release:	---
Assignee:	Luke Meyer
QA Contact:	libra bugs
Docs Contact:
URL:
Whiteboard:
Depends On:	922844 923405 1244545
Blocks:
TreeView+	depends on / blocked

Reported:	2013-03-14 11:53 UTC by nsun
Modified:	2017-03-08 17:34 UTC (History)
CC List:	7 users (show)
Fixed In Version:	openshift-origin-cartridge-ruby-1.8-1.0.8-1, openshift-origin-cartridge-ruby-1.9-scl-1.0.9-1
Doc Type:	Bug Fix
Doc Text:	rhc threaddump against a Ruby app provides thread dumps in a rotated server error log with the file name containing a time stamp. The file reported did not always match the actual log file due to time zone complexities. This bug has been fixed by specifying the file with a blob uniquely identifying the log file but not containing a time stamp.
Clone Of:
Environment:
Last Closed:	2013-04-02 08:50:50 UTC
Target Upstream Version:
Embargoed:

Attachments	(Terms of Use)

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2013:0694	0	normal	SHIPPED_LIVE	OpenShift Enterprise 1.1.3 bug fix update	2013-04-02 12:49:10 UTC

Description nsun 2013-03-14 11:53:12 UTC

Description of problem:
After setup an app to threaddump mode, It will prompt our the wrong log file.

Version-Release number of selected component (if applicable):
Puddle: http://buildvm-devops.usersys.redhat.com/puddle/build/OpenShiftEnterprise/1.1.z/2013-03-13.1/

How reproducible:
alaws

Steps to Reproduce:
1.Given a ruby-1.9 app created
# rhc app create r19 ruby-1.9 -predhat
2.Threaddump the app
# rhc threaddump r19 -predhat
Application event 'thread-dump' successful
Success
The thread dump file will be available via: rhc tail r19 -f ruby-1.9/logs/error_log-20130314-000000-EDT -o '-n 250'
3. Saw thread dump file 
# rhc tail r19 -f ruby-1.9/logs/error_log-20130314-000000-EDT -o '-n 250' -predhat
Actual Result:
Could not find any files matching glob

Expect Result:
successed


Additional info:
 I found there is "error_log-20130314-000000-EST" file exist when I login to app.
 and run command blow successed:
 # rhc tail r19 -f ruby-1.9/logs/error_log-20130314-000000-EST -o '-n 250' -predhat
 But the timezone of node's system is EDT.
 [root@node2 openshift]# date
Thu Mar 14 07:42:42 EDT 2013

Comment 2 Luke Meyer 2013-03-14 13:31:39 UTC

I know we saw a bug like this before. I think it was fixed such that the broker's timezone is used and we assume the node's is the same.

Are both broker and node timezone EDT?

I wonder if it's just that the app creating the log hasn't updated its idea of what the timezone is since last week. A restart might fix it but that's not a great customer solution....

Perhaps we should recommend using GMT.

Comment 3 Brenton Leanhardt 2013-03-14 13:37:36 UTC

Yeah, switching to GMT/UTC would probably be the best thing going forward.  Is the timezone in the logfiles coming from the OS?

Comment 4 Luke Meyer 2013-03-14 16:21:04 UTC

It is coming from the OS. Specifically this bit in /usr/libexec/openshift/cartridges/ruby-1.9/info/hooks/threaddump:

    DATE=`date -u '+%Y%m%d'`
    ZONE=`date +%Z`
    client_result "Success"
    client_result ""
    client_result "The thread dump file will be available via: rhc tail ${application} -f ${cartridge_type}/logs/error_log-$DATE-000000-$ZONE -o '-n 250'"

So... very mysterious that this didn't work. It works fine for me. I wonder if there's some way in which date can get confused about the TZ having changed. Perhaps an env var it reads?

Comment 5 nsun 2013-03-15 10:51:04 UTC

Verify it using manually.

1. Inquiry the pid which created the file named error_log-*-000000-EST: 

[root@node1 openshift]# lsof  /var/lib/openshift/8878e0f829d64346b3766cf96cae0f1d//ruby-1.9/logs/error_log-20130315-000000-EST
COMMAND     PID                             USER   FD   TYPE DEVICE SIZE/OFF   NODE NAME
rotatelog 27887 8878e0f829d64346b3766cf96cae0f1d    3w   REG  253,0     5231 279483 /var/lib/openshift/8878e0f829d64346b3766cf96cae0f1d//ruby-1.9/logs/error_log-20130315-000000-EST
found pid is 27887

2. Inquiry the command execused by the pid 27887
[root@node1 openshift]# ps -ef |grep 27887
1617     27887 27884  0 02:09 ?        00:00:00 /usr/sbin/rotatelogs /var/lib/openshift/8878e0f829d64346b3766cf96cae0f1d/ruby-1.9/logs/error_log-%Y%m%d-%H%M%S-%Z 86400
Found generate "error_log-*-000000-EST" command:
/usr/sbin/rotatelogs $file $rotate_rules

3.Exec this command manually:
/usr/sbin/rotatelogs /var/log/redhat-%Y%m%d-%H%M%S-%Z 86400

4.check the logfile gereted by last command:
[root@node1 openshift]# ls /var/log/redhat-*
/var/log/redhat-20130315-000000-EST

5.show local date:
[root@node1 openshift]# date
Fri Mar 15 06:48:27 EDT 2013

So, I think it caused by httpd log tools rotatelogs.

Comment 7 Luke Meyer 2013-03-18 16:33:20 UTC

Looks like a bug in rotatelogs. I filed one: bug 922844 - and BTW it's the same whether using JBoss or RHEL httpd.

Seems to me we could work around this a couple ways:
1) Configure the system with UTC or another timezone that doesn't change with DST.
2) Edit the cartridges so that configs don't include the TZ in the logfile name. I don't see that it really helps anything to have it. But the change would be rather tedious and we should also check whether upstream changes make this moot after the rebase.

Comment 8 Brenton Leanhardt 2013-03-18 17:09:45 UTC

I'm a bit confused regarding the severity of this bug.  When do the logs get rotated? daily?

If someone performs a thread dump is the problem only reproduced if the wait until after the logs are rotated?

Comment 9 Luke Meyer 2013-03-18 17:18:29 UTC

Agree, lowering severity.

The bug doesn't really have anything to do with rotation, it's just that rotatelogs is setting the file name on the log, and it's using the wrong timezone, so the name that the node thinks the log should be according to node TZ (blah-log-EDT) is not the name rotatelogs is actually giving it (blah-log-EST). We wouldn't notice while DST is not in effect.

Comment 10 Brenton Leanhardt 2013-03-18 17:44:28 UTC

Interesting, now I understand how rotatelogs is being used.  Interestingly enough Online seems to have this problem too.

I just logged into my app and 'rhc tail' didn't work because it couldn't find any matching files.  However I do see access_log-20130318-000000-EST instead of EDT.

Comment 11 Luke Meyer 2013-03-18 18:12:44 UTC

In the past this was actually hardcoded to use EST. Obviously that was wrong and we had an enterprise bug 855264 and fixed it to use the actual timezone, upstream too. Only now it's wrong for 7 months out of the year.

Ideally the rotatelogs bug would be fixed but no idea when to expect that.

Comment 12 Luke Meyer 2013-03-18 19:11:02 UTC

Actually even with rotatelogs correct, I think we would see a mismatch on the two days a year where DST changes. So I lean toward either moving to UTC or just removing the TZ. I think including UTC in the name is clearer.

Comment 13 Luke Meyer 2013-03-19 19:44:58 UTC

Actually now I lean away from trying to fix the name, and toward making the logic for finding the name less fragile. Either have the node logic find the actual file name, or supply it as a glob that won't be broken by timezone.

Comment 14 Luke Meyer 2013-03-21 13:57:27 UTC

Problem really only existed in the ruby cartridges, where the thread dump goes to rotated server logs. I decided it was simplest to just supply a glob that should uniquely find the file without being TOO specific.

Comment 16 nsun 2013-03-22 10:04:40 UTC

[jenkins2@qe-auto-slave ~]$ rhc threaddump r19 -predhat
Application event 'thread-dump' successful
Success

The thread dump file will be available via: rhc tail r19 -f ruby-1.9/logs/error_log-20130322-* -o '-n 250'
[jenkins2@qe-auto-slave ~]$ rhc tail r19 -f ruby-1.9/logs/error_log-20130322-* -o '-n 250'
Password: ******
[Fri Mar 22 02:53:29 2013] [notice] SELinux policy enabled; httpd running as context system_u:system_r:openshift_t:s0:c1,c387
[Fri Mar 22 02:53:29 2013] [notice] mod_bw : Memory Allocated 32 bytes (each conf takes 32 bytes)
[Fri Mar 22 02:53:29 2013] [notice] mod_bw : Version 0.8 - Initialized [1 Confs]
[Fri Mar 22 02:53:29 2013] [notice] Digest: generating secret for digest authentication ...
...

Prompt is ok on puddle 2013-03-21.1

Comment 18 errata-xmlrpc 2013-04-02 08:50:50 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-0694.html

Note You need to log in before you can comment on or make changes to this bug.