Bug 1463002 - ksh function calls hang in loop
ksh function calls hang in loop
Status: NEW
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: ksh (Show other bugs)
6.9
x86_64 Linux
unspecified Severity medium
: rc
: ---
Assigned To: Siteshwar Vashisht
BaseOS QE - Apps
: Reopened
Depends On:
Blocks:
  Show dependency treegraph
 
Reported: 2017-06-19 17:29 EDT by Greg Chandler
Modified: 2017-07-17 12:40 EDT (History)
1 user (show)

See Also:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-06-22 14:07:03 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)

  None (edit)
Description Greg Chandler 2017-06-19 17:29:06 EDT
Description of problem:
External call of function hangs requiring kill to stop script.

Version-Release number of selected component (if applicable):
ksh-20120801-33.el6.x86_64

How reproducible:
100%
#!/bin/ksh
_urlencode ()
  {
####For every line in the string encode it properly.
####UNIX cut will take the same cut section from every input line, so you have to walk one line at a time.
####Note, unfortunately this method will always dump an extra line feed at the end, which must be cleaned up
####  outside of this function, due to the way it is called.
    string="$1"
    echo "$string" | while IFS= read -r line
      do
        x=1
        length=${#line}
        while [ $x -le $length ]
          do
            CHAR="`echo \"$line\" |cut -c$x`"
            if [ -z "`echo \"$CHAR\" |egrep \"[0-9]|[a-z]|[A-Z]|[._/-]\"`" ] ; then
              DECIMAL="`echo \"$CHAR\" |od -t d1 |awk '{printf \"%s\",\$2}'`"
              HEX="`echo \"obase=16; $DECIMAL\" |bc`"
              printf "%%%s" "$HEX"
            else
              printf "%s" "$CHAR"
            fi
            x=`expr $x + 1`
          done
        printf "%%%s" "0A"
      done
  }
ENCODED_STRING="`_urlencode \" Some ra^ndo#m string \"`"
echo "$ENCODED_STRING"


Steps to Reproduce:
1.Run above code
2.
3.

Actual results:
Inner while loop hangs after 1st character {$CHAR} is set.  Will not continue, does not eat CPU or RAM.  Acts as if waiting for input, but it isn't.
When called like this:
_urlencode "Some ra^ndo#m string"

It acts as expected, however it will not be able to capture the output.

Expected results:
Some%20ran%5Edo%23m%20stri&ng

Additional info:
This works properly on other ksh installations on other OSes.
Comment 2 Siteshwar Vashisht 2017-06-20 04:42:54 EDT
It does not wait on my system. Here is the output from my test system :

# rpm -q ksh
ksh-20120801-33.el6.x86_64

# cat test.sh 
#!/bin/ksh
_urlencode ()
  {
####For every line in the string encode it properly.
####UNIX cut will take the same cut section from every input line, so you have to walk one line at a time.
####Note, unfortunately this method will always dump an extra line feed at the end, which must be cleaned up
####  outside of this function, due to the way it is called.
    string="$1"
    echo "$string" | while IFS= read -r line
      do
        x=1
        length=${#line}
        while [ $x -le $length ]
          do
            CHAR="`echo \"$line\" |cut -c$x`"
            if [ -z "`echo \"$CHAR\" |egrep \"[0-9]|[a-z]|[A-Z]|[._/-]\"`" ] ; then
              DECIMAL="`echo \"$CHAR\" |od -t d1 |awk '{printf \"%s\",\$2}'`"
              HEX="`echo \"obase=16; $DECIMAL\" |bc`"
              printf "%%%s" "$HEX"
            else
              printf "%s" "$CHAR"
            fi
            x=`expr $x + 1`
          done
        printf "%%%s" "0A"
      done
  }
ENCODED_STRING="`_urlencode \" Some ra^ndo#m string \"`"
echo "$ENCODED_STRING"

# ksh test.sh 
%20Some%20ra%5Endo%23m%20string%20%0A
Comment 3 Greg Chandler 2017-06-20 11:35:39 EDT
I am able to reproduce the problem 100% of the time.
It's been tried on 30+ servers, all of which fail, and all of which are RHEL 6.9.  I haven't had a chance to test on other versions of RHEL yet, but it has worked on our AIX and Solaris systems.

What other info do we need to figure out what the problem might be?
Comment 4 Greg Chandler 2017-06-20 12:25:36 EDT
Ok, so I went to check again this morning, and the script above, as just a snippet does indeed work.  However the code it is being called from does not.

The function is copied verbatim from the larger script, and the ENCODED_STRING set only differs from the string being passed.
ENCODED_STRING="`_urlencode \" Some ra^ndo#m string \"`"
  vs
ENCODED_STRING="`_urlencode \"$2\"`"

I am starting to wonder now if a previously patched ksh problem is now to blame.  There was a bug I found in Solaris' ksh where the limit size for a job had been capped.  It seems that they decided that the script size + env + memory would be capped at an arbitrary number rather than allow for an unlimited size like ksh was designed specifically to do.

The script itself is 410,821 bytes, 8106 lines long.
------------------------------------------------------------------------------
Here is the top info for it once it is hung:

top - 11:13:04 up 29 days,  8:28,  3 users,  load average: 0.67, 0.71, 0.46
Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
Cpu(s): 10.4%us,  2.8%sy,  0.0%ni, 86.6%id,  0.2%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  32752628k total, 28189840k used,  4562788k free,   399044k buffers
Swap:  3145724k total,        0k used,  3145724k free, 16630996k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
18369 root      20   0  108m 4004 1528 S  0.0  0.0   0:01.79 ksh

------------------------------------------------------------------------------
A second run:

top - 11:17:59 up 29 days,  8:33,  3 users,  load average: 0.67, 0.70, 0.51
Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.5%us,  0.2%sy,  0.0%ni, 99.3%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  32752628k total, 28177092k used,  4575536k free,   399064k buffers
Swap:  3145724k total,        0k used,  3145724k free, 16633400k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
30161 root      20   0  108m 4000 1528 S  0.0  0.0   0:01.74 ksh

------------------------------------------------------------------------------

[root@ipvrw00a0006 ~]# lsof |grep 30161
ksh       30161      root  cwd       DIR              253,0        4096     524315 /var/tmp
ksh       30161      root  rtd       DIR              253,0        4096          2 /
ksh       30161      root  txt       REG              253,0     1507840     917663 /bin/ksh93
ksh       30161      root  mem       REG              253,0      161704     786454 /lib64/ld-2.12.so
ksh       30161      root  mem       REG              253,0     1930416     786472 /lib64/libc-2.12.so
ksh       30161      root  mem       REG              253,0       23088     786584 /lib64/libdl-2.12.so
ksh       30161      root  mem       REG              253,0      600048     786497 /lib64/libm-2.12.so
ksh       30161      root  mem       REG              253,0       18056     787004 /lib64/libutil-2.12.so
ksh       30161      root  mem       REG              253,0    99170352    1068621 /usr/lib/locale/locale-archive
ksh       30161      root    0u     unix 0xffff880119625780         0t0   41259139 socket
ksh       30161      root    1u      REG              253,0           2    1182388 /tmp/sf1c.j3a (deleted)
ksh       30161      root    2u      CHR              136,3         0t0          6 /dev/pts/3
ksh       30161      root   10r      DIR              253,0        4096     524315 /var/tmp
ksh       30161      root   11r      REG              253,0      410821    1180654 /tmp/hcm.sh
ksh       30161      root   12r      DIR              253,0        4096     524315 /var/tmp
ksh       30161      root   13u      CHR              136,3         0t0          6 /dev/pts/3
ksh       30161      root   14u      CHR              136,3         0t0          6 /dev/pts/3
ksh       30161      root   15r      DIR              253,0        4096     524315 /var/tmp
ksh       30161      root   16u      REG              253,0           0    1182380 /tmp/sf12.iih (deleted)

------------------------------------------------------------------------------

The exact code from the script:
_urlencode ()
  {
####For every line in the string encode it properly.
####UNIX cut will take the same cut section from every input line, so you have to walk one line at a time.
####Note, unfortunately this method will always dump an extra line feed at the end, which must be cleaned up
####  outside of this function, due to the way it is called.
    string="$1"
    echo "$string" | while IFS= read -r line
      do
        x=1
        length=${#line}
        while [ $x -le $length ]
          do
            CHAR="`echo \"$line\" |cut -c$x`"
            if [ -z "`echo \"$CHAR\" |egrep \"[0-9]|[a-z]|[A-Z]|[._/-]\"`" ] ; then
              DECIMAL="`echo \"$CHAR\" |od -t d1 |awk '{printf \"%s\",\$2}'`"
              HEX="`echo \"obase=16; $DECIMAL\" |bc`"
              printf "%%%s" "$HEX"
            else
              printf "%s" "$CHAR"
            fi
            x=`expr $x + 1`
          done
        printf "%%%s" "0A"
      done
  }

append_CA_super_field ()
  {
    echo "$1 = $2" |$TEE $CONLOG
    ENCODED_STRING="`_urlencode \"$2\"`"
    CLEANED_ENCODED_STRING="`echo \"$ENCODED_STRING\" |sed -e 's/%0A\$//g'`"
    CA_SUPER_FIELD_MAIN_VALUES="$CA_SUPER_FIELD_MAIN_VALUES^$1:$CLEANED_ENCODED_STRING"
  }

set_attributes ()
  {
####For a complete list of all attributes being set, do not forget to check all the calls to file_to_CA, as they reference this function
    output_header "Setting HPSA Custom Attributes"
    append_CA_super_field "ADDM_Keys" "$OR_ADDM_KEYS"
    append_CA_super_field "ADDM_User" "$OR_ADDM_USER"
####
#### About 200 more lines of the same kind of calls removed
####
  }

set_attributes is called near the end of the script.
Prior to it's call, OR_ADDM_KEYS="No" , OR_ADDM_USER=""
Comment 5 Greg Chandler 2017-06-20 12:33:49 EDT
I modified the function to include the 3 debug statements below:
The echos must be redirected to a file since the function takes the output and sends it to a variable rather than stdout


_urlencode ()
  {
####For every line in the string encode it properly.
####UNIX cut will take the same cut section from every input line, so you have to walk one line at a time.
####Note, unfortunately this method will always dump an extra line feed at the end, which must be cleaned up
####  outside of this function, due to the way it is called.
    string="$1"
    echo "$string" | while IFS= read -r line
      do
        x=1
        length=${#line}
        while [ $x -le $length ]
          do
            CHAR="`echo \"$line\" |cut -c$x`"
echo "$x: $CHAR" >>/tmp/lines
            if [ -z "`echo \"$CHAR\" |egrep \"[0-9]|[a-z]|[A-Z]|[._/-]\"`" ] ; then
              DECIMAL="`echo \"$CHAR\" |od -t d1 |awk '{printf \"%s\",\$2}'`"
              HEX="`echo \"obase=16; $DECIMAL\" |bc`"
echo "$x: HEX: $HEX" >>/tmp/lines
              printf "%%%s" "$HEX"
            else
echo "$x: CHAR: $CHAR" >>/tmp/lines
              printf "%s" "$CHAR"
            fi
            x=`expr $x + 1`
          done
        printf "%%%s" "0A"
      done
  }


[root@ipvrw00a0006 tmp]# ls -al /tmp/lines
ls: cannot access /tmp/lines: No such file or directory
[root@ipvrw00a0006 tmp]# ksh hcm.sh -pre
{standard output}
ADDM_Keys = No
{hang}

{ctrl-z}
^Z
[1]+  Stopped                 ksh hcm.sh -pre
[root@ipvrw00a0006 tmp]# ls -al /tmp/lines
-rw-r--r-- 1 root root 5 Jun 20 11:29 /tmp/lines
[root@ipvrw00a0006 tmp]# cat /tmp/lines
1: N
[root@ipvrw00a0006 tmp]#



This is the behavior I am seeing, and able to replicate.
For whatever reason which I am still having trouble figuring out, it gets 1 character into the encode loop and hangs.  it seemingly hangs in the if statement, but I am not 100% certain that the if statement actually is the hang point.

To show there are no aliases for echo/egrep:
[root@ipvrw00a0006 tmp]# alias
alias cp='cp -i'
alias l.='ls -d .* --color=auto'
alias ll='ls -l --color=auto'
alias ls='ls --color=auto'
alias mv='mv -i'
alias rm='rm -i'
alias which='alias | /usr/bin/which --tty-only --read-alias --show-dot --show-tilde'
[root@ipvrw00a0006 tmp]#
[root@ipvrw00a0006 tmp]# ulimit
unlimited
[root@ipvrw00a0006 tmp]#
Comment 6 Greg Chandler 2017-06-20 12:36:02 EDT
I also wanted to drop a reminder, when this hangs at that point, ctrl-c does not work to stop the script.   The only way I can get out of it is to pause the process and then kill it:

crtl-z
kill %1
Comment 7 Greg Chandler 2017-06-20 12:53:09 EDT
I removed the code for AIX, to see if we were hitting a size limit like with the Oracle bug:
[root@ipvrw00a0006 tmp]# ls -al hcm.sh
-rw-r--r-- 1 x819713 usasa 370705 Jun 20 11:37 hcm.sh

But it didn't make that much of a difference, and the script still hung in the exact same place.


I then removed the AIX and HP/UX code as well:
[root@ipvrw00a0006 tmp]# ls -al hcm.sh
-rw-r--r-- 1 x819713 usasa 265511 Jun 20 11:46 hcm.sh


[root@ipvrw00a0006 tmp]# cat /tmp/lines
1: N

[root@ipvrw00a0006 tmp]# top -p 24611
top - 11:48:25 up 29 days,  9:03,  4 users,  load average: 0.35, 0.23, 0.21
Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.8%us,  0.4%sy,  0.0%ni, 98.8%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  32752628k total, 28297756k used,  4454872k free,   399084k buffers
Swap:  3145724k total,        0k used,  3145724k free, 16640852k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
24611 root      20   0  108m 3256 1528 S  0.0  0.0   0:01.32 ksh



And again it hung in the exact same place, the resident size of ram has reduced, and in the case of the Oracle bug, the script would have continued to a point where it hit the limit, and then died there.  Each of the global variables is combined after encoding into a single string to be sent to another application, so the reduced size of the environment in the script load itself, freed more ram for the final string.  This however is NOT happening here, so I have at least ruled that bug out.
Comment 8 Greg Chandler 2017-06-20 13:46:33 EDT
While testing...
RHEL 7 has the same issue, but much farther up in the script.  
In this case, there is a function, that executes all the lines of the function, but does not properly return.

message_parser ()
  {
####   $1 input file
####   Sort -u?
####   Pass/Fail
####   $ERROR_STRINGS is a PIPE demlimited list
    if [ -f "$1" ] ; then
      for CHECKS in `echo "$ERROR_STRINGS" |tr "|" " "`
        do
          output_header "$1 $CHECKS logs"
          LINES="`cat \"$1\" 2>/dev/null |grep -i \"$CHECKS\" |wc -l |sed -e 's/ //g'`"
          echo "$1 has $LINES lines with $CHECKS within it." |$TEE $CONLOG
          LINES="`cat \"$1\" 2>/dev/null |grep -i \"$CHECKS\" |sort -uk 6 |wc -l |sed -e 's/ //g'`"
####  Create a ratio of found lines to the max lines, and print that many only.  Add one for the case that it returns 0 but should have been rounded upwards.
          GET_LINES="`echo \"scale=4; $LINES/$MAX_LINES*100+1\" |bc |awk -F\".\" '{print \$1}'`"
          echo "The unique lines, {up to $GET_LINES lines}, of type \"$CHECKS\" in $1 are:" |$TEE $CONLOG
          grep -i "$CHECKS" "$1" |sort -uk 6 |$TAIL -"$GET_LINES" |$TEE $CONLOG
        done
    fi
echo "exiting"
  }

mesg_dmesg_check ()
  {
####  This is meant for dmesg...  AIX needs something like alog -o -t console which looks completely different
    if [ ! "$OS" = "AIX" ] ; then
      dmesg >$DMESG_LOG
      message_parser "$DMESG_LOG"
      retain_messages "$DMESG_LOG" "dmesg_Log" "$CA_SUPER_FIELD_LOGS"
    fi
    message_parser "$MESSAGE_LOG"
echo "before"
    retain_messages "$MESSAGE_LOG" "Error_Log" "$CA_SUPER_FIELD_LOGS"
echo "after"
    \rm "$DMESG_LOG" 2>/dev/null
  }




I see "exiting", but not "before"



top - 12:43:48 up 5 days, 14:50,  3 users,  load average: 0.04, 0.15, 0.18
Tasks:   1 total,   0 running,   0 sleeping,   1 stopped,   0 zombie
%Cpu(s):  0.4 us,  0.3 sy,  0.0 ni, 99.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 16268312 total,  8696812 free,  1429144 used,  6142356 buff/cache
KiB Swap: 16777212 total, 16777212 free,        0 used. 14466576 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
21753 root      20   0  119096   4120   1636 T   0.0  0.0   0:00.25 ksh





I am noticing that the RAM usage on RHEL 7 is also higher, and I am starting to wonder again if there is indeed a limit being placed somewhere.  However, this thing should be crashing/exiting or seg-faulting somewhere if it hit a hard limit.  There are no core files anywhere on the systems. {RHEL 6 or 7}
Comment 9 Siteshwar Vashisht 2017-06-22 10:40:29 EDT
You should install debuginfo symbols for ksh (debuginfo-install ksh), reproduce the problem and execute 'pstack <ksh process id>" to check where it hangs in the code.
Comment 10 Greg Chandler 2017-06-22 13:53:38 EDT
These servers are all in a secure environment.  I neither have access to, or have the legal ability to install those packages.
Comment 11 Siteshwar Vashisht 2017-06-22 14:07:03 EDT
I have not been able to reproduce this bug and it will not be possible for me to make any progress with current data. Please reopen if you are able to provide a reproducer.
Comment 12 Greg Chandler 2017-06-22 14:12:51 EDT
I don't appreciate the ticket being closed while we are still actively looking for a solution.  I was able to find a system in a non-secure environment, to run this on, and here is the stack trace:


[root@dev1a ~]# pstack 8736
#0  0x0000003a63aacbd0 in __pause_nocancel () from /lib64/libc.so.6
#1  0x000000000047265c in sleep ()
#2  0x000000000047274f in sh_delay ()
#3  0x0000000000472904 in b_sleep ()
#4  0x000000000045e0f6 in sh_exec ()
#5  0x0000000000407660 in exfile ()
#6  0x0000000000407e70 in sh_main ()
#7  0x0000003a63a1ed1d in __libc_start_main () from /lib64/libc.so.6
#8  0x0000000000406b39 in _start ()
Comment 13 Siteshwar Vashisht 2017-06-22 14:29:24 EDT
The debuginfo symbols are missing from your system, but the stacktrace shows that sleep builtin is being executed. I would suggest you to look at any calls to 'sleep' builtin in your scripts.
Comment 14 Greg Chandler 2017-06-22 14:51:35 EDT
My apologies...  I ran a trace on the tripwire fork and not the main execution.
Here is the proper trace:
[root@dev1a ~]# pstack 8698
#0  0x0000003a63aac82e in waitpid () from /lib64/libc.so.6
#1  0x0000000000428fca in job_reap ()
#2  0x0000000000429fc6 in job_wait ()
#3  0x000000000045cd64 in sh_exec ()
#4  0x0000000000454448 in sh_subshell ()
#5  0x00000000004377cf in comsubst ()
#6  0x00000000004371c3 in copyto ()
#7  0x000000000043874d in sh_macexpand ()
#8  0x000000000047a3bb in sh_argbuild ()
#9  0x0000000000459637 in sh_exec ()
#10 0x000000000045b153 in sh_exec ()
#11 0x000000000045992b in sh_exec ()
#12 0x000000000045c91b in sh_exec ()
#13 0x000000000045992b in sh_exec ()
#14 0x000000000045c91b in sh_exec ()
#15 0x000000000045a2fa in sh_exec ()
#16 0x000000000045b3b5 in sh_exec ()
#17 0x000000000045994a in sh_exec ()
#18 0x000000000046daed in b_dot_cmd ()
#19 0x000000000045ed75 in sh_funct ()
#20 0x000000000045aca2 in sh_exec ()
#21 0x0000000000454448 in sh_subshell ()
#22 0x00000000004377cf in comsubst ()
#23 0x00000000004371c3 in copyto ()
#24 0x00000000004374ea in sh_mactrim ()
#25 0x000000000044048f in nv_setlist ()
#26 0x0000000000459ed3 in sh_exec ()
#27 0x000000000045992b in sh_exec ()
#28 0x000000000046daed in b_dot_cmd ()
#29 0x000000000045ed75 in sh_funct ()
#30 0x000000000045aca2 in sh_exec ()
#31 0x000000000045992b in sh_exec ()
#32 0x000000000046daed in b_dot_cmd ()
#33 0x000000000045ed75 in sh_funct ()
#34 0x000000000045aca2 in sh_exec ()
#35 0x000000000045992b in sh_exec ()
#36 0x0000000000459afb in sh_exec ()
#37 0x000000000045992b in sh_exec ()
#38 0x000000000045b17a in sh_exec ()
#39 0x0000000000407660 in exfile ()
#40 0x0000000000407e70 in sh_main ()
#41 0x0000003a63a1ed1d in __libc_start_main () from /lib64/libc.so.6
#42 0x0000000000406b39 in _start ()
Comment 15 Siteshwar Vashisht 2017-06-22 15:25:01 EDT
debuginfo symbols are still missing from the stacktrace. 

Do you have a Red Hat support ticket for this bug ? If not, I would suggest you to start by opening a ticket with Red Hat Customer Support. They will help you in narrowing down this issue.
Comment 16 Greg Chandler 2017-06-22 15:30:56 EDT
[root@dev1a ~]# rpm -qa |grep debuginfo
ksh-debuginfo-20120801-33.el6.x86_64

[root@dev1a ~]# ps -aux |grep hcm.sh
Warning: bad syntax, perhaps a bogus '-'? See /usr/share/doc/procps-3.2.8/FAQ
root      8698  0.0  0.0 110792  3248 pts/3    S+   13:07   0:01 ksh hcm.sh -pre
root     28125  0.0  0.0 103328   888 pts/5    S+   14:27   0:00 grep hcm.sh

[root@dev1a ~]# pstack 8698
#0  0x0000003a63aac82e in waitpid () from /lib64/libc.so.6
#1  0x0000000000428fca in job_reap ()
#2  0x0000000000429fc6 in job_wait ()
#3  0x000000000045cd64 in sh_exec ()
#4  0x0000000000454448 in sh_subshell ()
#5  0x00000000004377cf in comsubst ()
#6  0x00000000004371c3 in copyto ()
#7  0x000000000043874d in sh_macexpand ()
#8  0x000000000047a3bb in sh_argbuild ()
#9  0x0000000000459637 in sh_exec ()
#10 0x000000000045b153 in sh_exec ()
#11 0x000000000045992b in sh_exec ()
#12 0x000000000045c91b in sh_exec ()
#13 0x000000000045992b in sh_exec ()
#14 0x000000000045c91b in sh_exec ()
#15 0x000000000045a2fa in sh_exec ()
#16 0x000000000045b3b5 in sh_exec ()
#17 0x000000000045994a in sh_exec ()
#18 0x000000000046daed in b_dot_cmd ()
#19 0x000000000045ed75 in sh_funct ()
#20 0x000000000045aca2 in sh_exec ()
#21 0x0000000000454448 in sh_subshell ()
#22 0x00000000004377cf in comsubst ()
#23 0x00000000004371c3 in copyto ()
#24 0x00000000004374ea in sh_mactrim ()
#25 0x000000000044048f in nv_setlist ()
#26 0x0000000000459ed3 in sh_exec ()
#27 0x000000000045992b in sh_exec ()
#28 0x000000000046daed in b_dot_cmd ()
#29 0x000000000045ed75 in sh_funct ()
#30 0x000000000045aca2 in sh_exec ()
#31 0x000000000045992b in sh_exec ()
#32 0x000000000046daed in b_dot_cmd ()
#33 0x000000000045ed75 in sh_funct ()
#34 0x000000000045aca2 in sh_exec ()
#35 0x000000000045992b in sh_exec ()
#36 0x0000000000459afb in sh_exec ()
#37 0x000000000045992b in sh_exec ()
#38 0x000000000045b17a in sh_exec ()
#39 0x0000000000407660 in exfile ()
#40 0x0000000000407e70 in sh_main ()
#41 0x0000003a63a1ed1d in __libc_start_main () from /lib64/libc.so.6
#42 0x0000000000406b39 in _start ()

I'm not sure what else to do, the debuginfo package was installed prior to the execution that is currently hanging {I've left it hung so that it stays the same process ID, currently hung 81 minutes}.

Was there something else I missed?
Comment 17 Greg Chandler 2017-07-17 12:40:12 EDT
Ok, so the code above is still breaking as-is.

I have found another method which works, which leads me to believe this is a stream/pipe bug, possibly in libreadline, or the shell's use of it.

    old_IFS=$IFS
####IFS should be set to $'\n' but it does not properly translate on some OSes, so it is a hard CR here
    IFS=$'
'
    for line in $1
      do
        x=1
        length=${#line}
        while [ $x -le $length ]
          do
            CHAR="`echo \"$line\" |cut -c$x`"
            if [ -z "`echo \"$CHAR\" |egrep \"[0-9]|[a-z]|[A-Z]|[._/-]\"`" ] ; then
              DECIMAL="`echo \"$CHAR\" |od -t d1 |awk '{printf \"%s\",\$2}'`"
              HEX="`echo \"obase=16; $DECIMAL\" |bc`"
              printf "%%%s" "$HEX"
            else
              printf "%s" "$CHAR"
            fi
            x=`expr $x + 1`
          done
        printf "%s" "%0A"
       done
    IFS=$old_IFS
  }


The IFS set has to be done with an actual carriage return, as this script needs to work on about 32 variants of Unix, and the \n is not interpreted properly on all of them.

For some reason, modifying the IFS statement and running a for loop, works perfectly.  The while loop above takes an echo in as a pipe, and from what I can tell, it hangs in the background waiting on input from read.
the IFS=  does not properly set in the while loop like it is supposed to, and I assume that is the reason it seems to be waiting on read. 

My best guess is that read is waiting on user input inside of a forked process, and that process lost it's way to send that back to the invoking shell.  Maybe not, but at this point it's all I've got.

Since this is 100% repeatable for me on all my RHEL 6 boxes, and the for loop does work, this is definitely a bug somewhere in ksh or read.

Note You need to log in before you can comment on or make changes to this bug.