Thursday, December 13, 2012

Email interface to Blogger

Somehow it had slipped my attention that there is an email interface for Blogger.

It works very easily:

Setup in Blogger

  • in Settings -> Mobile and email you first have to choose an email address like first.last.xxxxx@blogger.com, where xxxxx is a secret word (supposedly known only to you)
  • you can choose whether emails sent to this address are posted right away, saved as drafts, or whether the mechanism is disabled.

    Blog creation

    Now you can use any email client to blog:

  • To: first.last.xxxxx@blogger.com
  • Subject: the title of the blog post
  • The email body (in HTML) will be your blog content.


    Very nice and simple, but there is no security check and the email address could be guessed (maybe not likely but also not impossible), i.e. anyone could send anything. It is therefore advisable to choose save as draft.

    I found this when looking for a way to use LibreOffice as a blog writing tool, thanks to this hint. Here are the steps again.

    Setup in LibreOffice

    (on a Mac)
  • In Preferences -> Internet -> E-mail browse the applications, choose /Applications/Thunderbird.app (or any other email client) and press OK.
  • Write the blog post and File -> Save As HTML.
  • Choose File -> Send -> Document as E-mail ... and an email client Write window should open with your LibreOffice document attached. Fill in recipient and subject and send. Done.
    Saturday, December 8, 2012

    Some beginner's ideas about JavaScript mouse event handling in different browsers

    From the very beginning JavaScript had to fight the various browser incompatibilities and this is still an issue today.

    In this blog I'd like to highlight some of the issues with JavaScript mouse events.
    Nothing new for the experienced JavaScript developer, but a source of confusion for newcomers. I won't go into the differences of the IE/Microsoft event model (look here for a good description) but will restrict myself to Firefox and Safari.

    My experimental setup consists of two SELECT boxes called Left and Right. The Left box contains one OPTION element. The boxes and the option are coloured to highlight their boundaries. A TEXTAREA shows the log of events (and can be cleared with the Clear log button).

    The SELECT boxes have been set up to capture MOUSEOVER, MOUSEDOWN and MOUSEUP events. There is one line in the log for each event consisting of:

    • the type of event (MOUSEOVER etc.)
    • the type of element which caused the event (OPTION or SELECT)
    • the name (if SELECT) or value (if OPTION) of the element
    • the element which triggered the mouse event handler

    Left
    Right
    Log of mouse events


    Experiment 1a

    Put your mouse pointer at the title Left and move the mouse down to the blue area of the Left box in a normal tempo.
    Watch the event log.
    You see three MOUSEOVER events:
  • one for the SELECT box: this is because there is a tiny blue area between the top of the SELECT box and the top of the OPTION box.
  • one for the OPTION and
  • one for the SELECT box again.
    No surprises so far. Here are the results (I tested on Firefox 16.0.2 and Safari 5.0.6):

    MOUSEOVER SELECT Left (triggered by Left)
    MOUSEOVER OPTION item 1 (triggered by Left)
    MOUSEOVER SELECT Left (triggered by Left)
    

    Experiment 1b

    Like before, but move your mouse fast. You'll notice that the first (and sometimes even the second) MOUSEOVER event has disappeared, i.e. the way users move the mouse influences which events are fired. Some events don't show up even when you think they should have happened, so if you implement dependencies on certain events you might go wrong when those events don't occur.


    Experiment 2

    In this experiment I explore the MOUSEDOWN and MOUSEUP events, in particular when there is some more action in between DOWN and UP.

    2a

    Start again at the title Left and move your mouse onto the OPTION box, click and release the left mouse button.
    There should be two MOUSEOVER events like before for SELECT and OPTION (which I will not display below) and two new events for the OPTION element: one MOUSEDOWN and one MOUSEUP
    MOUSEDOWN OPTION item 1 (triggered by Left)
    MOUSEUP OPTION item 1 (triggered by Left)
    

    2b

    Start again at the title Left and move your mouse onto the OPTION box, click the left mouse button and hold it, drag the mouse further down into the blue area of Left and release the mouse button.
  • The MOUSEDOWN event should be like before.
  • There is one new MOUSEOVER event for entering the SELECT box and the MOUSEUP event has changed: its target is now the SELECT box rather than the OPTION box like before.

    MOUSEDOWN OPTION item 1 (triggered by Left) 
    MOUSEOVER SELECT Left (triggered by Left) 
    MOUSEUP SELECT Left (triggered by Left)
    

    2c

    Start again at the title Left and move your mouse onto the OPTION box, click the left mouse button and hold it, drag the mouse to the right into the blue area of the second SELECT box Right and release the mouse button there.

    The MOUSEDOWN event should be like before.
    Then there are three events:

  • a MOUSEOVER in Left for entering the SELECT box after leaving the OPTION box to the right
  • a MOUSEOVER in Right for entering the second SELECT box
  • a MOUSEUP event.

    Here we find important differences in the way browsers handle these events:

  • the order in which the events are logged is different
  • the target of the MOUSEUP event is different

    Firefox 16.0.2
    MOUSEDOWN OPTION item 1 (triggered by Left) 
    MOUSEOVER SELECT Left (triggered by Left) 
    MOUSEUP SELECT Left (triggered by Left) 
    MOUSEOVER SELECT Right (triggered by Right) 
    
    Safari 5.0.6
    MOUSEDOWN OPTION item 1 (triggered by Left)
    MOUSEOVER SELECT Left (triggered by Left)
    MOUSEOVER SELECT Right (triggered by Right)
    MOUSEUP SELECT Right (triggered by Right)
    
    Compared to Firefox the last two events are switched and the MOUSEUP event's target is Right (not Left), i.e. for the MOUSEUP event Safari takes the element under the current mouse position as the event target, whereas Firefox takes the element where the original MOUSEDOWN happened.

    Conclusion: this blog just showed the tip of the iceberg when it comes to mouse event handling in JavaScript. Various browsers, browser releases and event models make it very difficult to write JavaScript programs which are generally valid. The experienced JavaScript developer will point to libraries like jQuery which smooth over a lot of these incompatibilities, and their study is worth the effort. But I always think it is nice to experience some of these issues directly; it makes the appreciation of good libraries even higher.


    For those interested here is the full code:
    <FORM ACTION="nothing" METHOD="POST" NAME="FormObject">
    <TABLE BORDER>
    <TR><TD ALIGN="CENTER">
    Left<BR>
    <SELECT NAME="Left" 
    ONMOUSEDOWN="handle_event(event)" 
    ONMOUSEOVER="handle_event(event)" 
    ONMOUSEUP="handle_event(event)" 
    SIZE="5" STYLE="background: #0000ff; width: 80px;">
    <OPTION NAME="Option 1" STYLE="background: #ffff00;" VALUE="item 1">Item 1</OPTION>
    </SELECT>
    
    </TD><TD ALIGN="CENTER">
    Right<BR>
    <SELECT NAME="Right" 
    ONMOUSEDOWN="handle_event(event)" 
    ONMOUSEOVER="handle_event(event)" 
    ONMOUSEUP="handle_event(event)" 
    SIZE="5" STYLE="background: #0000ff; width: 80px;">
    </SELECT>
    
    </TD><TD VALIGN="TOP">Log of mouse events<BR>
    <TEXTAREA COLS="50" NAME="TextObject" ROWS="10"></TEXTAREA>
    </TD><TD VALIGN="CENTER">
    <INPUT ONCLICK="text.value=''" TYPE="button" VALUE="Clear log" >
    </TD></TR>
    </TABLE>
    </FORM>
    
    <SCRIPT>
    var text        = document.FormObject.TextObject;
    function handle_event(e) {
            var name;
            var target      = e.target ;
            var curr        = e.currentTarget ;
    
            if(target.nodeName=="OPTION") {
                    name    = target.value
            } else {
                    name    = target.name
            }
            text.value      = text.value
                    + e.type.toUpperCase()
                    + " " + target.nodeName + " " + name
                    + " (triggered by " + curr.name +  ")"
                    + "\n";
    }
    </SCRIPT>
    


    Note: since events seem to be prevented in the Blogger preview I needed to test the JavaScript pieces outside of the editing process. That is a little cumbersome and potentially dangerous when the functionality of your blog can only be fully tested after publishing. Oh well.
    Tuesday, November 27, 2012

    A little script driven two stage dialog with zenity

    On Solaris and Linux there is a little utility called zenity which displays a dialog window. It has various options, from simply displaying a string to file selection dialogs and progress bars. I sometimes use it if I write a script but don't want the user to follow the output of the script in a terminal; especially if the script is going to be used by a larger group of mainly non-technical users, a GUI dialog is needed. zenity is not very fancy in its capabilities, configurability or design, but it serves its purpose.

    In this article I want to show how to use zenity several times in a script.

  • At first the users are presented with a list of servers to choose from and
  • secondly there is a list of applications to be selected. The list of applications and the geometry of the zenity window depend on the previous selection.
  • Finally in a third invocation of zenity there is a summary window shown.

    The task has been split into two files.
    One is a config file which contains the definitions of the various lists.
    The second file is the actual script. Since I'm using arrays to define the lists I'm using bash in this example.
    (Note: originally the script ran on Solaris 10 in ksh, but the 'echo ... | zenity' construct somehow did not work on Linux.)

    Here is the config file

    #!/bin/echo Error:_this_needs_to_be_sourced_in:
    #
    
    # A set of arrays (for ksh)
    # (server and domain names are fake of course)
    
      ALIAS[1]="Japan server"
    MACHINE[1]="server1.japan"
       MENU[1]='Finance
    Logistics
    Sales'
      MSIZE[1]="--height=220 --width=330"
    
      ALIAS[2]="UK server"
    MACHINE[2]="serv2a.uk"
       MENU[2]='Finance
    Sales'
      MSIZE[2]="--height=190 --width=330"
    
      ALIAS[3]="US server"
    MACHINE[3]="newserv.us"
       MENU[3]='Finance
    Logistics
    Marketing
    Sales'
      MSIZE[3]="--height=220 --width=330"
    
    # Set the number of array elements
    numElements=3
    
    The config file defines four arrays:
  • ALIAS is the nickname of the actual server and will be displayed in the first menu
  • MACHINE is the actual machine name
  • MENU is the list of choices in the second zenity window
  • MSIZE is the geometry setting for this menu (the height varies)

    Finally there is a variable numElements which defines the length of the arrays, 3 in this case.

    Here is the script:

    #!/bin/bash 
    
    # The config file 
    #   Should be in the same directory as the script.
    #   Could of course be passed as an argument too.
    ########################################
    CONFIGFILE=`/usr/bin/dirname $0`/zenity.config
    
    # Check if the config file exists
    ########################################
    [ -r $CONFIGFILE ] || { zenity --error --text "Config file does not exist: $CONFIGFILE" ; exit 1 ; }
      
    # Source the config file
    ########################################
    . $CONFIGFILE   || { zenity --error --text "Error while sourcing config file: $CONFIGFILE" ; exit 1 ; }
    
    # Step 1: select an option out of the list of aliases
    ########################################
    selection=`\
    ind=1
    while [ $ind -le $numElements ] ; do
            echo "${ALIAS[$ind]}"   # Piped to zenity
            ind=$((ind+1))
    done | zenity --list --height=200 --width=350 \
    --column="Choose your Server (highlight and press OK)" \
    --title="Remote Server Launcher" \
    `
       
    # If zenity has been canceled or nothing selected then exit
    ########################################
    [ $? -eq 0 -a "x$selection" != "x" ] || exit 1
      
    # Now map the user selection to a machine name
    # and show the machine specific menu
    ########################################
    ind=1
    while [ $ind -le $numElements ] ; do
            if [ "x$selection" = "x${ALIAS[$ind]}" ] ; then
                    server=${MACHINE[$ind]}
    
                    # Stage 2: select the window size and a specific application
                    ########################################
                    selection=`\
                    echo "${MENU[ind]}" | zenity --list ${MSIZE[$ind]} \
                    --column="Application (highlight one and press OK)" \
                    --title="Remote Server Launcher" \
                    `
                    # If zenity has been canceled then exit
                    [ $? -eq 0 ] || exit 1
                    break
            fi
            ind=$((ind+1))
    done
    
    # Just for information purposes
    ########################################
    zenity --info --text "You chose:\nServer: ${ALIAS[$ind]} (${MACHINE[$ind]})\nApplication:  $selection  "
    

    A few notes:

  • zenity is also used to display error messages
  • the while loops could be replaced by for ind in {1..$numElements} in ksh but this does not work in bash
  • the zenity windows are invoked inside backticks in order to capture their output
  • the script shows various options of zenity and how to use them.
  • after having gone through these choices of course something real should be done rather than just displaying the choices (but that is beyond this article).
  • this array technique allows you to dynamically set the content of zenity dialogs, to set variables which are hidden from the user, and to separate the logic from the content (and more could be done, e.g. defining all geometries in the config file)
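    As noted above, the ksh-style for ind in {1..$numElements} does not work in bash because brace expansion happens before variable expansion. A minimal sketch of bash-compatible alternatives (the array values are copied from the config file above):

```shell
#!/bin/bash

# Sketch: iterate 1..numElements in bash, where {1..$numElements}
# would not expand. Values copied from the config file above.
numElements=3
ALIAS[1]="Japan server"
ALIAS[2]="UK server"
ALIAS[3]="US server"

# seq-based loop; bash also offers: for ((ind=1; ind<=numElements; ind++))
for ind in $(seq 1 "$numElements") ; do
        echo "${ALIAS[$ind]}"
done
```

    Either form produces one line per server alias, exactly like the while loop in the script.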

    Here is the first zenity window (on Ubuntu 11) after having clicked one server.

    Here is the second zenity window with the appropriate menu and geometry

    And finally here is the last zenity window to recap the selections

    Sunday, November 25, 2012

    VirtualBox: command line (ssh) access to guest system

    Today I needed to access my VirtualBox Solaris session from my Mac (the screensaver seemed to hang and I wanted to kill it to regain access).

    I found this very informative posting which contained all the information, and I took its contents and put them into this little script (not really necessary, but a nice little exercise to write a loop for these kinds of repetitive statements).

    #!/bin/bash
    
    Machine="Solaris 10"    # My guest system
    Adapter=e1000
    GuestPort=22
    HostPort=2222
    Protocol=TCP
    
    # Note: do not put any spaces into the brace enclosed string,
    # it will break brace expansion
    for var in {Protocol,HostPort,GuestPort} ; do
      echo $var
      eval VBoxManage setextradata \"$Machine\"  "VBoxInternal/Devices/$Adapter/0/LUN#0/Config/ssh/$var" \${$var}
    done
    
    # This is just to check if the settings are ok
    VBoxManage getextradata "$Machine"  enumerate
    
    # And this is also only a test to see if the host is reachable
    # (note: ping knows nothing about ports; to test the forwarded
    #  port itself one could use e.g. 'nc -z localhost $HostPort')
    ping localhost
    

    Access to the guest then works like this
    ssh -p 2222 localhost

    Access did not work right away for my VirtualBox Solaris session since it was running at the time when I executed the commands. In order to enable access I had to save and restart the session (of course a reboot would have worked too).

    Given that VBoxManage is in your PATH the script should work on any UNIX system with bash.

    I admit though that this script is probably more confusing than helpful since it kind of hides what is actually going on; the original posting simply lists three statements which are easy to read and understand.

    Saturday, November 24, 2012

    Reusable code in shell scripts or How to create a shell library

    When working a lot with shell scripts (either your own or others') you get to the point where certain pieces of code seem to be repeated numerous times, so eventually one starts to wonder if and how one could build and use a library of reusable code.
    A seasoned programmer could eventually end up with a library of standard functions, or better, a library for various shells (sh, bash, ksh, etc.) and various operating systems (Solaris vs. Linux being the major distinction, but the various releases of each major OS show differences too).

    For a particular project written in sh/ksh on Solaris I built a library and below I'll explain a few of the considerations.

    What is a shell library

    Without having seen the phrase 'shell library' anywhere else: to me it is a collection of environment variables and functions. Using a shell library means sourcing a file containing this collection and thus setting the environment of the executing shell script.

    Why is a shell library useful

    Many shell scripts contain settings of environment variables and definitions of functions at the beginning of the script. When working in projects with multiple scripts where the same or similar settings are being used it does seem to make sense to put these settings into a single place. An important advantage of such an approach: if the setting needs to be changed later it needs to be changed in only one place. This concept is obvious for programmers but I have seen it rarely used in shell scripts.
    When I see pieces of code like HOST=`hostname` and many more of such statements repeated in dozens of scripts (all of them part of a big project) it is time to start using a library.
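    As a minimal sketch of the idea (the file name and contents here are my own illustration, not from the project described below; the demo creates a throwaway library file in /tmp):

```shell
#!/bin/bash

# Demo setup: write a tiny throwaway library file.
LIBFILE=/tmp/mylib.$$.sh
cat > "$LIBFILE" <<'EOF'
HOST=`hostname` ; export HOST
die() { echo "Error: $*" >&2 ; exit 1 ; }
EOF

# A script sources the library once, then uses its variables and functions.
. "$LIBFILE"
echo "Running on $HOST"

rm -f "$LIBFILE"
```

    Every script that sources the file gets HOST and die for free; changing how HOST is determined later means changing one line in one file.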

    What goes in

    That is probably the simplest question: I'm almost tempted to say any piece of code that is used twice or more should be in the library.

    Changes over time

    One of the big questions is how to handle a library over time.
    New things need to be added i.e. the library grows.
    Maybe one has ideas to improve the current code and thus code changes.
    Will the changed library still work in all older invocations (backward compatibility)?

    How to invoke the library?

    Use the dot operator "." as in
    . lib/mylib.sh
    assuming that your library sits in a file called mylib.sh in a sub directory lib.

    Location

    In order to invoke the library the calling script needs to locate it. Where should the library reside?
    Assuming that it is part of a project (and thus a collection of scripts which are deployed in conjunction) you need to define a directory. Without established standards for shell script libraries you might as well call it lib, following the convention of other languages.
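    A common pattern is to locate lib/ relative to the calling script so the invocation works regardless of the current directory. A sketch (for demonstration this creates a throwaway lib/ in /tmp; a real script would use dirname "$0" as shown in the comment):

```shell
#!/bin/bash

# Demo setup: create a throwaway lib/ directory with a one-line library.
DEMO=/tmp/libdemo.$$
mkdir -p "$DEMO/lib"
echo 'GREETING="hello from the library"' > "$DEMO/lib/mylib.sh"

# The pattern a real script would use, with lib/ next to the script:
#   LIB=`dirname "$0"`/lib/mylib.sh
LIB="$DEMO/lib/mylib.sh"
[ -r "$LIB" ] || { echo "Error: cannot read $LIB" >&2 ; exit 1 ; }
. "$LIB"

echo "$GREETING"
rm -rf "$DEMO"
```

    Checking readability before sourcing gives a clear error message instead of a cryptic shell failure.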

    Some examples

    Simplest case: setting a variable

    HOST=`hostname` ; export HOST

    So your scripts need to run the hostname command only once. Of course the underlying assumption here is that the hostname command can be found in the PATH of the user executing the script.
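    A slightly more defensive variant (my own addition, not part of the original library) guards against hostname missing from PATH by falling back to uname -n:

```shell
#!/bin/sh

# Fall back to uname -n when hostname cannot be found in PATH.
if command -v hostname >/dev/null 2>&1 ; then
        HOST=`hostname`
else
        HOST=`uname -n`
fi
export HOST
```

    (command -v is POSIX; on older Solaris /bin/sh one might use type instead.)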

    Extract a variable

    i.e. extract pieces of information out of a larger output.
    Say you have the output of id and you want the username:

    id
    uid=712(joe) gid=100(other) groups=100(other),22(staff)

    The following extracts the string between the first parentheses.

    USER=`id |sed -e 's/).*//' -e 's/.*(//'` ; export USER
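    The extraction can be checked in isolation by feeding the sample id output from above through the same sed pipeline:

```shell
# First substitution deletes everything from the first ')' to the end,
# the second deletes everything up to the remaining '(' - leaving "joe".
echo 'uid=712(joe) gid=100(other) groups=100(other),22(staff)' \
    | sed -e 's/).*//' -e 's/.*(//'
```

    This prints joe.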

    Setting a variable for if clauses

    The control flow in scripts very often depends on whether a variable has a certain value or not. You can introduce a (boolean) variable to subsume this logic.

    Imagine that you want to test whether the script is executed by root or not. One could use the USER variable and (always) test like this

    if [ "$USER" = "root" ] ; then ... ; fi

    An alternative could be this setting in your library which creates a new variable isRootUser

    isRootUser=`ID=\`id | sed -e 's/uid=//' -e 's/(.*//'\`; [ $ID -eq 0 ] && echo $ID` ; export isRootUser

    This at first glance complex piece of code simply

  • runs the id command, extracts the uid and sets the variable ID
  • checks whether ID is zero (this would also cover the case that there is a second superuser account with uid 0) and if so sets the variable isRootUser to ID

    The variable can then be used as follows:

    if test $isRootUser ; then ... ; fi

    Advantages of this approach:

  • the root check is encapsulated in the setting of isRootUser (if you decide to use a different method to identify the root user you can change it here and change it only once in the library)
  • it runs only once at the invocation of the library (not possibly multiple times in your script)
  • thereafter a very simple check using a variable with a telling name can be used as many times as needed

    Common functions

    Maybe this is the more interesting piece and related to other programming languages: defining a set of reusable functions. Due to the nature of shells you have to watch out for the scope and use of variables (local / global / input / return).

    A simple function to print an error message and stop the script:

    die() {
      echo "Error: $*" >&2
      exit 1
    }

    # Usage: 
    #     die Some condition has not been met
    # or: die "Some condition has not been met"
    # or: die "Some condition" has "not been" met

    A wrapper to mkdir including nicer error handling:

    mk_dir() {
      [ -z "${1}" ] && return 1
      [ -d "${1}" ] || mkdir -p "${1}" 2>/dev/null || { echo "Error: cannot mkdir $1" >&2 ; return 1 ; }
      return 0
    }

    # Usage: 
    #     mk_dir DIRECTORY
    #        if you are not interested if successful or not
    # or: mk_dir DIRECTORY || return 1
    #        if you want to stop further execution of a function after failure
    # or: mk_dir DIRECTORY || exit 1
    #        if you want to stop the script after failure

    Check if you are dealing with a number (positive integer or zero) by invoking another shell (in this case: ksh):

    isNum() {
      ksh -c "[[ \"$1\" = +([0-9]) ]]"
      return $?
    }
    # Usage:
    #     isNum $N && echo "yes"
    #        do something if ok
    # or: isNum $N || echo "no"
    #        do something if not ok
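    An alternative sketch (my own variant, not from the library above) does the same positive-integer check in plain sh with a case pattern, avoiding the extra ksh process:

```shell
#!/bin/sh

# Return 0 (true) only for a non-empty string consisting of digits.
isNum2() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit
    *)           return 0 ;;
  esac
}
```

    Usage is the same as for isNum, e.g. isNum2 "$N" && echo "yes".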

    Have fun building your own libraries.

    Friday, November 23, 2012

    A little exercise about a recent forum question (input field handling in awk and Perl)

    Just recently the following question was posted to the UNIX scripting group in Linkedin:
    remove all duplicate entries in a colon separated list of strings
    e.g. a: b:c: b: d:a:a: b:e::f should be transformed to a: b:c: d:e::f

    Some of the fields contain spaces which should of course be preserved in the output, and there is an empty field too, which (to me and other authors) indicates that the fields are not necessarily ordered. I won't discuss the suggested solutions here; I also did not reply to the original posting because I read it one month too late.

    awk
    But when reading the question my brain already got working and I could not help trying it for myself. The obvious tool of choice for exercises like this is awk, because awk has built-in mechanisms for viewing lines as a sequence of fields with configurable field separators.

    A solution could be

    BEGIN {
      FS=":" ;    # field separator
      ORS=":"     # output record separator
    }
    { for(i=1;i<=NF;i++) {  # for all input fields
        if( f[$i] ) {       # check if array entry for field already exists
         continue;          # if yes: go to next field
        } else {
         print $i;          # if no: print the field content
         f[$i] = 1;          # and record it in array 'f'
        }  }
    }

    which leads to this output:
    a: b:c: d:e::f:

    The script can be shortened by omitting superfluous braces and 'else' to

    BEGIN { FS=":" ; ORS=":" } 
    { for(i=1;i<=NF;i++) { if(f[$i]) continue; f[$i]=1; print $i; } } 

    The script uses a very simple, straightforward logic: loop through all input fields; if a field is new, print it, otherwise skip it. This is achieved by storing each field in an associative array 'f' when it first occurs.
    Using the field separator FS for splitting the input line and the output record separator ORS when printing (you need to know that 'print' automatically appends ORS) makes this an easy task.

    There is one issue though: this solution adds an extra colon at the very end (compared to the requested output). Whether this is a problem depends on the context of the request, so one might prefer this code:

    BEGIN { FS=":" } 
    { printf $1; f[$1]=1; 
      for(i=2;i<=NF;i++) { if(f[$i]) continue; f[$i]=1; printf FS $i } }

    which uses a slightly different logic: the first field is printed straight away (and recorded), the loop checks the remaining fields 2..NF and prints the field separator as a prefix to the field content. This code also works for the extreme case where there is just one field and no colon.
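    Running this second version against the example from the question, as a shell one-liner (echo -n feeds the string without a trailing newline):

```shell
# Deduplicate the colon-separated fields, printing FS only between fields
# so no trailing colon appears.
echo -n 'a: b:c: b: d:a:a: b:e::f' | awk '
BEGIN { FS=":" }
{ printf $1; f[$1]=1
  for(i=2;i<=NF;i++) { if(f[$i]) continue; f[$i]=1; printf FS $i } }'
```

    This prints a: b:c: d:e::f, matching the requested output.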

    Perl
    I then wondered if this couldn't be done equivalently or even shorter in Perl, but my best solution is a little lengthier because I have to use 'split' to get the individual fields.

    $FS=":";
    @s = split($FS,<>);
    for($i=0;$i<=$#s;$i++) {$e=$s[$i]; next if(exists($f{$e})); $f{$e}=1; print $e,$FS }


    I could have used the command line options "-a -F:" to avoid the 'split' but I need FS to be defined anyway for the output (I don't know if the split pattern defined by -F can be accessed in Perl).
    I use 'split' to chop up the input line and put it into an array 's'. Then the same logic applies as in awk. Instead of an associative array I'm using a hash 'f' in Perl. The variable 'e' is only used to avoid repeated occurrences of $s[$i]. In the end it's a matter of personal preference which solution you take.


    It should be noted that I tested with

    echo -n "...." | awk '...' or perl -e '...'

    which feeds a string without a newline into the pipe; this helped to avoid 'chomp' in Perl for removing the newline in the last field.

    Thursday, November 22, 2012

    Create anonymous pdf file

    I often use LibreOffice (previously I used OpenOffice.org; I'm not sure where these two are heading) to create a text, and its Export to PDF... function to create the corresponding pdf file.

    Today I wanted to create an anonymous pdf file. It was the copy of a text where I had omitted all personal references (name, address, links to personal web sites, etc.) and I also wanted the pdf file to be anonymous in the sense that it shouldn't contain any personal trace.

    I'm not sure that I reached 100% anonymity, but here is what I found: the Properties section of the pdf file contained a number of references to my name which I had to get rid of one by one.

    File: this is the file name. Originally my initials were part of it, but I had already saved the file under a neutral file name without them.

    Title: often I give documents a title in File -> Properties... In the Description tab there is a Title entry where in this instance I had put my name too, which I needed to remove.
    (My first try was the LibreOffice Export to PDF... dialog. There is a tab User Interface with a sub section Windows which shows a tick box Display document title, ticked by default. After removing the tick the title disappeared from the top of the pdf window, but it still showed in the pdf properties.)

    Author: I have set up my profile in the LibreOffice preferences under User Data (name, address etc.) and use parts of it whenever necessary. In order to exclude this information from showing up in a pdf file I had to change the LibreOffice file's properties:
    go to File -> Properties.. and choose the General tab. There is a tick box Apply user data which is ticked by default; removing the tick prevents user data from being used. The LibreOffice file needs to be saved first and then exported again.

    Location: since I usually create and save files on my system under my account (which contains my full name), the location showed as Macintosh HD:Users:fullname:Documents, so I had to find an anonymous place and chose /Users/Shared.

    Since I'm not a pdf expert there might be other (maybe hidden) references somewhere in the pdf file. I'd love to know.

    Newline in awk and Perl

    When someone switches from awk to Perl, one of the beginner's mistakes is not realizing that awk does some things automatically which you need to code yourself in Perl.

    One example: end of line.
    awk automatically drops the newline character at the end of a line. In Perl you need to do that manually by using the chomp function (or some other method).

    Since echo abc prints the string abc plus a newline character, the following string comparison works well in awk:

    echo abc | awk '{if($0=="abc") print "yes"}'

    will print "yes" whereas the seemingly equivalent in Perl does not:

    echo abc | perl -e 'if(<> eq "abc") { print "yes\n" }'

    The experienced Perl coder probably does this:

    echo abc | perl -e '$a=<>; chomp($a); if($a eq "abc") {print "yes\n" }'

    Of course you could change the test and explicitly check for the newline (or better, the input record separator $/ in Perl):

    echo abc | perl -e 'if(<> eq "abc$/") {print "yes\n" }'

    Or you could use echo -n to feed just the string without a newline. And of course print in Perl requires an explicit newline, whereas awk adds it automatically.

    A bit more complex: check the content of the last field in an input line.
    awk:
    echo abc xyz | awk '{ if($NF=="xyz") print "yes" }' 
    Perl:
    echo abc xyz | perl -e '@a=split / /,<>; chomp(@a); if($a[$#a] eq "xyz") {print "yes\n" }'
    Before chomp() the last field equals the string xyz plus a newline and the comparison test will fail.

    Perl in that sense is more precise and gives the user greater control, on the other hand awk is an old but well established UNIX tool whose inbuilt features can be used to one's advantage.

    It is nice to have tools which do things automatically; the drawback is that you get so used to it that over time you forget these automations exist.

    (A real life example for me: my car has parking sensors. After one year I'm already so used to its existence that whenever driving another car I tend to forget that I have to use the good old fashioned method rather than waiting for the frequency of the beeps.) 


    Thursday, January 26, 2012

    Text-to-Audio on my Mac

    I recently experimented a little with creating speech from a given text.

    First of all I had not known that this functionality existed on my Mac. A little web search turned up a number of pages explaining the functionality, but for the reader's sake (and maybe even more for my own sake, not having to remember all this stuff) I'll describe it here.

    There are two applications involved:
    1. TextEdit where you write the text to be converted to speech
    2. Automator which will do the conversion
    So in the first step open TextEdit (Finder->Applications->TextEdit) and write some text which you would like to hear.
    Then you need to start Automator (Finder->Applications->Automator).
    • In the first window choose the workflow Text .
    • Change the field Get content from and select TextEdit
    • Click the Choose button
    • In the lefthand column under Library click Text and in the next column double click Text to Audio File.
    • In the Text to Audio File frame you can choose a voice by selecting an entry in System Voice: I pick Alex.
    • Choose a filename (it will be saved in aiff format) and a location where to store the result.
    • Click the Run button in the upper righthand corner of Automator. Now your text should get transformed into speech and a file containing the output will be created.
    • Click the Results button in the Text to Audio File frame. Listen to the result by double clicking on the file icon.
    The recipe above works fine if you do text to speech translation once in a while.

    Regular task:
    If you do it regularly you can create an Automator workflow and reuse it whenever needed. Simply do a 'Save As...' and save this workflow under a recognizable name. Note that it will always use the same output filename and location, overwriting previous audio files.

    More voices:
    You can also download and install other voices if you're not happy with the standard ones. Good ones possibly need to be paid for, some sites offer trials e.g. InfoVox from Assistiveware.

    Speech control:
    You can insert certain control elements into the text to control the speech more precisely, like volume changes for certain words, extra pauses etc. I have been using the silence element, e.g. a pause of 5 seconds can be achieved with [[slnc 5000]]
    Here is a comprehensive list of speech commands from Apple (the page seems to be deprecated but the commands still work).
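
    For example, a text with an embedded 5 second pause entered into TextEdit could look like this:

```
This is the first sentence. [[slnc 5000]] This sentence starts after a pause of 5 seconds.
```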

    Wednesday, January 18, 2012

    How to track sub processes

    I write a lot of scripts, and one of the common problems (at least in my area of work) is how to keep track of all sub processes and how to clean up all processes which a script might have started.
    (Note: this has been developed on a Solaris 10 system which features the particular ptree command to easily check the process tree of a given process)
    In this article I will only deal with the tracking of sub processes; eventually one would want to kill them if needed, which is either fairly easy with kill -9 (but risks leftovers like temporary files) or can become complex if a script spawns new processes when receiving a weaker kill signal.

    So here is the scenario:
    script a.sh runs another script b.sh.
    Before a.sh exits it wants to ensure that b.sh has not left any processes behind i.e. it wants to identify b.sh and all of its child processes (so that they can be killed if still running).

    Running another shell script in the background


    Example 1: a.sh runs b.sh in the background

    ptree suffices in such a case
    b.sh:
    #!/bin/sh
    sleep 200
    
    a.sh:
    #!/bin/sh
    ./b.sh&
    ptree $!
    ps -u $USER -o pid,ppid,args |grep $!
    
    (ptree should show the process tree of the last background process;
    the ps command should show process ids (pid), parent process ids (ppid) and script arguments (args) of all processes of user $USER)

    Output of ptree:
    59691 /bin/csh -c a.sh
       59716 /bin/sh a.sh
         59717 /bin/sh b.sh
           59719 sleep 200
    

    Output of ps:
    59731 59716 grep 59717
    59717 59716 /bin/sh b.sh
    59719 59717 sleep 200
    

    So both b.sh and its sub process 'sleep' are shown in the process list and one could get the pids and kill them if needed.

    There are more complex situations where ptree/ps don't help, and these are covered in the next parts.

    Sub process detaching

    This time we consider an example where the sub process detaches itself from the current process tree.

    What do I mean by that?
    Every process has a parent id, so that if a process spawns a process which spawns another process they are all connected via their parent ids (process 1's id becomes the parent id of process 2, process 2's id becomes the parent id of process 3, and so on).
    A process can break this chain though and detach itself from its parent so that it gets init (pid 1) as its parent id (all processes can be traced back to pid 1 in a UNIX system).

    Example 2: here b.sh runs a process in the background itself
    b.sh:
    #!/bin/sh
    sleep 200&
    

    If you run this script and check your process list you will find something like this: a sleep process with ppid 1
    7968     1 sleep 200
    

    If you run a.sh from the previous example with the new b.sh your ptree and ps output will look as follows:
    Output of ptree:
    8932  /bin/sh ./a.sh
       8933  <defunct>
    
    Output of ps:
    8936  8932 grep 8933
    
    i.e. ps does not show anything at all and ptree shows a.sh with a defunct sub process. This defunct sub process is the leftover of b.sh.
    Why is it defunct? Because it has ended but its parent a.sh has not (yet) waited for it to finish.

    Here is a new a.sh which solves that (remember this rule: a defunct process is always due to bad code in the parent, not the process which became defunct):
    #!/bin/sh
    ./b.sh&
    wait
    ptree $!
    ps -u $USER -o pid,ppid,args |grep $!
    
    Running this a.sh will generate no ptree output at all:
    b.sh has finished by the time ptree runs, and the sleep process is detached from the b.sh process hierarchy.

    So how can we track down the 'sleep' process?
    We need to use another process attribute: the process group id (pgid).

    In the new a.sh I have removed the ptree call (since it won't return anything, as shown above) and enhanced the ps command to also show the pgid, this time grepping for the pid of a.sh (rather than b.sh's as before).
    #!/bin/sh
    ./b.sh&
    wait
    ps -u $USER -o pid,ppid,pgid,args |grep $$
    
    Output of ps:
    18028 32741 18028 /bin/sh a.sh
    18030     1 18028 sleep 200
    18031 18028 18028 grep 18028
    18032 18031 18028 ps -u andreash -o pid,ppid,pgid,args
    
    So the sleep process can be found in the list of processes with pgid 18028 (the pid of a.sh) since all sub processes of a.sh seem to be grouped by pgid.
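
    This grouping can be verified directly; a sketch using ps -o pgid= (assumed to be available, as on Linux and Solaris):

```shell
sleep 5 &                            # a background child of this script
CHILD=$!
PG_SELF=$(ps -o pgid= -p $$)         # my own process group id
PG_CHILD=$(ps -o pgid= -p $CHILD)    # the child sits in the same group
echo "my pgid: $PG_SELF, child pgid: $PG_CHILD"
kill $CHILD 2>/dev/null
```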

    Happy? Not quite. The next part will show that this solution also might fail.

    What if there is no pgid?

    The former example does work under certain assumptions only:
    you need to run a.sh in a shell which supports pgid creation (csh, ksh); it does not work if you run it in Bourne shell.
    (All the examples above were tested in csh, the users' standard working shell in our environment.)

    sunflower% sh
    $ ./a.sh
    27103 27099 27098 grep 27099
    27099 27098 27098 /bin/sh ./a.sh
    $ ps -o pid,ppid,pgid,args|grep sleep
    27277 27098 27098 grep sleep
    27102     1 27098 sleep 200
    
    What you notice is that the sleep process has pgid 27098, which is also the parent pid of a.sh, i.e. a.sh did not create its own process group. Searching for processes with pgid equal to the pid of a.sh is futile.

    The solution is to write a script which puts its sub processes into a process group of its own, and one way to do it is to use the monitor option of ksh:
    set -m
    will put b.sh (and all sub processes of b.sh) into a process group with pgid equal to b.sh's pid,
    i.e. I'm again grepping for $! (so I reversed the $$ again)

    a.sh:
    #!/bin/ksh
    set -m
    ./b.sh&
    wait
    ps -u $USER -o pid,ppid,pgid,args |grep $!
    
    will lead to output of ps:
    31103     1 31102 sleep 200
    31105 31101 31101 grep 31102
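
    Since ksh is not installed everywhere, here is a sketch of the same mechanism using bash's set -m (bash and a ps that knows pgid are assumed):

```shell
# with set -m every background job is placed into its own process group,
# and that group's pgid equals the job's pid
OUT=$(bash -c '
  set -m
  sleep 3 &
  CHILD=$!
  PGID=$(ps -o pgid= -p $CHILD | tr -d " ")
  echo "$CHILD $PGID"
  kill $CHILD
')
echo "child pid and pgid: $OUT"
```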
    

    This seemed to me a very nice solution until it dawned on me how this could fail too.

    Recursive use of pgid creation

    Using the same technique as described in the last part, a sub process can not just detach itself from the process hierarchy but also create its own process group, and thus the original script will have lost track completely.

    Replace b.sh by the following code:
    b.sh:
    #!/bin/ksh
    set -m
    sleep 200&
    

    Output of a.sh will look like this (just the grep command):
    38331 38327 38327 grep 38328
    
    and when you check the 'sleep' process it shows its own pid as pgid:
    % ps -o pid,ppid,pgid,args |grep sleep
    38329     1 38329 sleep 200
    

    How can such a process be identified as being a grandchild of a.sh?

    Up to now I don't have an answer; it seems to me that a process can completely hide its origins and thus cannot be tracked or followed.
    (A long time ago I posted the question to comp.unix.shell but didn't receive an answer at the time.)

    If you have wondered throughout the article why I bother at all:
    very often I'm facing the scenario that I have to write script a.sh (i.e. I own it and control what it does) but script b.sh comes from a colleague, a different department or even another company. I need/want to ensure that - if I start other scripts in my script - no processes are left behind when my script ends. This cannot be guaranteed.

    Why it is impossible to track all sub processes

    Over time I got suggestions to use newtask (and then kill off all processes found by pkill -T taskid), or to write a C program and use setsid, or a Perl program and use POSIX::setsid, to create a new session leader so that basically all child processes are tagged with the same kind of attribute which can then be used to identify them and do something about them.

    All of these suggestions have the same flaw as the pgid approach which I described above, and the following argument should prove that it is impossible to track all sub processes and their sub processes (if the sub processes can be any kind of process and their code is not controlled by you).

    Assume that your flavour of UNIX supports a way to generate a sub process with a certain attribute which distinguishes the sub process and its offspring from the current process (and possible parent processes).
    In the same fashion a sub process of the sub process can use this technique to distinguish itself from its parent. The current process will find the sub process but it can no longer find the sub process of the sub process.

    A solution would require the OS either to restrict the setting of that attribute in a way that the current process can set it for sub processes but sub processes of the sub process are blocked from setting it, or to have processes notify their parent processes about attribute changes somehow; neither is available in any of the UNIXes I know.

    Summary:
    • a process can track (and kill) all of its sub processes
    • a process can track (and kill) all of a sub process's descendants
      • if the sub process sets a certain attribute equal to its process id
      • if none of the sub process descendants changes that attribute

    Even if you think you are in (code) control of all sub processes and their descendants you might not be aware of all side effects: a process might unknowingly start a daemon.
    Just envision the calling of gconfd: it will be started if it is not running yet. The process which actually caused the start of gconfd will very likely have no idea that it is there since it is only trying to get a service. That the service required a daemon, that proper cleanup would mean killing the daemon, and that the daemon may be serving other processes too (and thus should not be killed) are all considerations with no easy answers.

    Tuesday, January 17, 2012

    OpenOffice.org - copy subtotal cells only

    Recently a question was raised: how does one copy the cells showing sub totals rather than copying both data cells and sub totals?
    I could not find an easy solution. Below I describe a two-step solution which basically consists of
    • Applying a filter to show only the sub total rows
    • Using Copy / Paste special to get a copy of the sub totals (without the formulas)


    The data

    Assume you have 2 columns of data like this:

    The sub totals

    Data -> Subtotals... and then ticking X and OK will result in adding extra rows for sub totals.


    This was the starting point of the question being asked.

    Applying a filter


    Data -> Standard Filter and enter
    • 2 filter criteria for X: .*Sum and .*Total, in order to capture both the Sum and Total rows
    • Tick Regular expression
    • Tick Copy results to... and enter a cell on the sheet (A16 in this example)
      (this is important: don't copy to another sheet since the formulas won't work)


    Copy the result

    The filter resulted in A16:B20. Two things to note:
    • There are no data rows anymore
    • Column B still contains the formulas


    Paste special

    Now use Paste special to paste the sub totals into a new position. Deselect Formulas in order to copy the data only. Ensure that everything else is ticked, in particular Numbers.


    The result

    A copy of the sub totals in D16:E20.
    Note that column D does not contain formulas.

    How to create a histogram in OpenOffice.org

    The recipe below has been tested in OpenOffice 3.0.1.
    It is untested (though expected) that it will also work in newer revisions.

    The issue

    Suppose you have a set of data, time data in my example, each entry representing when a certain event happened. But rather than having to digest the detailed data you're only interested in high-level information, like how often the measured event occurred in an hour.

    Here's the example data:
    12:08
    15:36
    13:00
    14:59
    13:59
    12:45
    15:47
    14:29
    15:01
    
    So you have a number of events at certain times, unsorted, uncounted.

    Assume that these data are in column A in your spreadsheet, maybe labeled Time in the first row.

    The goal

    A histogram which depicts the frequency of the events per hour like this





    The resulting histogram shows
    3 events before 13:00
    1 event between 13:00 and 14:00
    2 events between 14:00 and 15:00
    3 events after 15:00

    How to get there


    Identify the bins


    In the example above identifying the bins for the histogram is rather easy: you pick full hours. It is also rather easy to find the minimum and maximum hours.
    When your list of data is very long you might not easily see the minimum or maximum, nor might it be obvious how to set the bin intervals; a little trial and error is necessary to get there.

    So looking at the data all events are later than 12:00 and none is beyond 16:00, therefore I'm choosing these bins:
    13:00
    14:00
    15:00
    

    I'm entering the bins into column B so that the spreadsheet looks like this now:

    Calculate the frequencies


    StarOffice contains a FREQUENCY function which takes two arrays as input and also returns an array of results (maybe something one has to get used to). The example will make it clear how to use it.
    • Enter Ticks (or any other describing string) into cell C1
    • Click into cell C2 and click on the functions icon.
    • Out of the list of functions select FREQUENCY
    • Enter your data range and your bin range into the resp. parameter fields so that it looks like this:
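
    Alternatively the array formula can be typed in directly. With the example data in A2:A10 and the bins in B2:B4 (cell references are assumptions based on the layout above), enter it into C2 and confirm with Ctrl+Shift+Enter so that the result array fills C2:C5:

```
=FREQUENCY(A2:A10;B2:B4)
```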

    The spreadsheet should now look like this:

    Create the chart

    • Mark columns B and C (click on B and drag towards C so that both are highlighted)
    • Insert -> Chart...
    • Step 1: leave the chart type at Column
    • Step 2: Data series in columns and First row as label should be ticked, additionally tick also First column as label
    • Step 3: simply click Next
    • Step 4: enter describing strings e.g. Histogram into Title, hours into X axis, frequency into Y axis

    You're done.