Little Tips
- November 2, 2009
# Run a command in each git submodule (show status)
for foo in $(find . -iname .git -type d) ; do repo=$(dirname "$foo") ; ( echo " == $repo ==" ; cd "$repo" && git status ) ; done
# Run a command in each git submodule (show remote URL)
for foo in $(find . -iname .git -type d) ; do repo=$(dirname "$foo") ; ( cd "$repo" && url=$(git config --get remote.origin.url) && printf "%-47s\t%s\n" "$repo" "$url" ) ; done
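The two loops above differ only in the command that runs inside each repository, so the pattern can be wrapped in a small function. This is a minimal sketch rather than anything from the original tip: the name each_repo is made up, and it drops the -type d test so that it also matches the .git files that newer git versions place in submodules.

```shell
# each_repo (hypothetical name): run a command in every repository below a directory.
# Usage: each_repo ROOT COMMAND [ARGS...]
each_repo() {
  local root=$1
  shift
  # -prune stops find from descending into the .git directories themselves.
  # Reading line by line keeps paths with spaces intact (newlines in paths still break it).
  find "$root" -name .git -prune -print | while IFS= read -r gitdir; do
    repo=$(dirname "$gitdir")
    # The subshell keeps the cd from leaking into the next iteration;
    # /dev/null stops the command from swallowing the list of paths on stdin.
    ( cd "$repo" && echo " == $repo ==" && "$@" ) < /dev/null
  done
}
```

With that in place, `each_repo . git status` reproduces the first loop, and `each_repo . git config --get remote.origin.url` lists the remote URLs.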
Make your command-line history extend to the beginning of time
I save my entire command-line history, archived by week, and have a shell script that lets me search back through it. If I need to recall the command-line parameters for an ssh tunnel, or for making curl do a form POST, I can pull them up from that time in June when I figured out how.
# no limit on history file size
unset HISTFILESIZE
# 10k lines limit on in-memory history
export HISTSIZE=10000
# make sure the archive directory exists
mkdir -p "$HOME/.history-bash"
# name the history file after the current year and week number
export HISTFILE="$HOME/.history-bash/hist-$(date +%Y-%W).hist"
# if starting a brand-new history file
if [[ ! -f "$HISTFILE" ]]; then
  # seed new history file with the last part of the most recent history file
  LASTHIST="$HOME/.history-bash/$(/bin/ls -1tr "$HOME/.history-bash/" | tail -n 1)"
  if [[ -f "$LASTHIST" ]]; then tail -n 1000 "$LASTHIST" > "$HISTFILE" ; fi
fi
# seed history buffer from history file
history -n "$HISTFILE"
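The search script mentioned above is not shown in the post. A minimal sketch of what such a helper could look like follows; the function name histgrep and the HIST_ARCHIVE_DIR variable are assumptions made for illustration, and the file pattern simply mirrors the HISTFILE naming used in the snippet.

```shell
# histgrep (hypothetical helper): search every archived history file for a pattern.
# HIST_ARCHIVE_DIR is an assumed override; it defaults to the directory used for HISTFILE.
# Usage: histgrep [GREP-OPTIONS] PATTERN
histgrep() {
  local dir=${HIST_ARCHIVE_DIR:-$HOME/.history-bash}
  # -H prefixes each match with its file name, which shows the week it came from
  grep -H "$@" "$dir"/hist-*.hist
}
```

For example, `histgrep 'ssh -L'` would list every archived tunnel command along with the file it was recorded in.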
Password safety from the command line
For many commands (mysql, curl/wget, and others) it’s convenient to pass your credentials on the command line rather than (unsafely) storing them in a file or (inconveniently) entering them each time. There’s a danger, though, that you’ll accidentally save that password in your shell history for anyone with passing access to find.
In my .bashrc, I set export HISTCONTROL=ignorespace. Now any command entered with a leading space is NOT saved in the history (use ignoreboth to also skip consecutive duplicate commands). If I know I’m going to be running repeated commands that require a password on the command line, I can set a shell variable on an ignored line (note the leading space before the assignment) and then refer to that variable in later commands:
womper ~/wukong$  DBPASS=my.sekritpass1234
womper ~/wukong$ mysql -u ics --password="$DBPASS" ics_dev
Or, for another example:
womper ~/wukong$  twuserpass="myusername:twitterpassword"
womper ~/wukong$ curl -s -u "$twuserpass" http://stream.twitter.com/1/statuses/sample.json
TSV / Hadoop Streaming Fu
Hadoop streaming uses tab-separated text files.
Quickie histogram:
I keep around a few useful helper commands (cuttab and cutc below) for jobs like this:
cat file.tsv | cuttab 3 | cutc 6 | sort | uniq -c
This takes file.tsv, extracts the third column (cuttab 3), keeps its first six characters (cutc 6, giving YYYYMM), sorts the result (putting identical entries together in a run), and counts the length of each run (uniq -c). Its output looks like:
4245 200904
14660 200905
7654 200906
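The cuttab and cutc helpers are not defined in this post. If you don't have them, thin wrappers around cut along the following lines should behave the same way in the pipeline above. These definitions are an assumption about what the helpers do, inferred from the description, not the originals.

```shell
# Hypothetical stand-ins for the cuttab and cutc helpers used above.

# cuttab N: keep tab-separated field N (tab is cut's default delimiter)
cuttab() { cut -f "$@"; }

# cutc N: keep the first N characters of each line
cutc() { cut -c "1-$1"; }
```

Because both are plain filters on standard input, they can be chained with sort and uniq -c exactly as in the histogram one-liner.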
A few other useful Hadoop commands:
A filename of ‘-’ to the -put command makes it use STDIN. For example, this creates a file on the HDFS with a recursive listing of the somedir directory:
hadoop fs -lsr somedir | hadoop fs -put - tmp/listing
Wukong’s hdp-du command produces tab-separated output, so you can save it to the HDFS the same way:
hdp-du somedir | hdp-put - tmp/listing
So you can also run an HDFS file through a quick filter:
hdp-cat somefile.tsv | cuttab 2 | hdp-put - somefile-col2.tsv
(If you brainfart and type ‘… > hdp-put …’ instead of using a pipe, know that I’ve done so a dozen times too.)