Little Tips

    # Run a command in each git submodule (show status)
    for foo in `find . -iname .git -type d ` ; do repo=`dirname $foo` ; ( echo "                == $repo ==" ; cd $repo ; git status ) ; done
    # Run a command in each git submodule (show remote URL)
   for foo in `find . -iname .git -type d ` ; do repo=`dirname $foo` ; ( cd $repo ; url=`git remote show origin | grep URL | cut -c 8-`; printf "%-47s\t%s\n" "$repo" "$url" ) ; done

Make your command-line history extend to the beginning of time

I save my entire command-line history, archived by month, and have a shell script that lets me search back through it — if I need to recall the command line parameters to do an ssh tunnel or to make curl do a form POST I can pull it up from that time in June when I figured out how.

  # no limit on history file size
  unset  HISTFILESIZE
  # 10k lines limit on in-memory history
  export HISTSIZE=10000
  # name the history file after the date
  export HISTFILE=$HOME/.history-bash/"hist-`date +%Y-%W`.hist"
  # if starting a brand-new history file
  if [[ ! -f $HISTFILE ]]; then
    # seed new history file with the last part of the most recent history file
    LASTHIST=~/.history-bash/`/bin/ls -1tr ~/.history-bash/ | tail -n 1`;
    if [[ -f "$LASTHIST" ]]; then tail -n 1000 "$LASTHIST" > $HISTFILE  ; fi
  fi
  # seed history buffer from history file
  history -n $HISTFILE

h3. Password safety from the command line

For many commands — mysql, curl/wget, others — it’s convenient to pass your credentials from the command line rather than (unsafely) in a file or (inconveniently) enter them each time. There’s a danger, though, that you’ll accidentally save that password in your .history for anyone with passing access to find.

In my .bashrc, I set export HISTCONTROL=ignorespace — now any command entered with leading spaces on the command line is NOT saved in the history buffer (use ignoreboth to also ignore repeated commands). If I know I’m going to be running repeated commands that require a password on the command line, I can just set an environment variable in an ignored line, and then recall the password variable:

womper ~/wukong$      DBPASS=my.sekritpass1234
womper ~/wukong$ mysql -u ics --password=$DBPASS ics_dev

or for another example,

womper ~/wukong$       twuserpass="myusername:twittterpassword"
womper ~/wukong$ curl -s -u $twuserpass http://stream.twitter.com/1/statuses/sample.json

TSV / Hadoop Streaming Fu

Hadoop streaming uses tab-separated text files.

Quickie histogram:

I keep around a few useful

    cat file.tsv | cuttab 3 | cutc 6 | sort | uniq -c


This take file.tsv and extracts the third column (cuttab 3), takes the first six characters (YYYYMM) ; then sorts (putting all distinct entries together in a run) ; and takes the count of each run. Its output is

   4245 200904
  14660 200905
   7654 200906

A few other useful hadoop commands:

A filename of ‘-’ to the -put command makes it use STDIN. For example, this creates a file on the HDFS with a recursive listing of the somedir directory:

    hadoop fs -lsr somedir | hadoop fs -put - tmp/listing


Wukong’s hdp-du command is tab-separated

    hdp-du somedir | hdp-put - tmp/listing


So you can also run an HDFS file through a quick filter:

   hdp-cat somefile.tsv | cuttab 2 | hdp-put - somefile-col2.tsv 


(If you brainfart and use ‘… > hdp-put …’ know that I’ve done so a dozen times too).

Comments are closed.