login

Test Driven Development published on Oct 2, tagged with linux, ruby, work

With my recent job shift, I’ve found myself in a much more sophisticated environment than I’m used to with respect to Software Engineering.

At my last position, there wasn’t much existing work in the X++ realm; We were breaking new ground, no one cared about elegance; if you got the thing working – more power to you.

Here, it’s slightly different.

People here are working in a sane, documented, open-source world; and they’re good. Everyone is acutely aware of what’s good design and what’s not. There’s a focus on elegant code, industry standards, solid OOP principles, and most importantly, we practice Test Driven Development.

I’m completely new to this method for development, and I gotta say, it’s quite nice.

Now, I’m not going to say that this is the be-all-end-all of development styles (I’m a functional, strictly-typed, compiler-checked code guy at heart), but I do find it quite interesting – and effective.

So why not do a write-up on it?

Test Framework

The prerequisite for doing anything in TDD is a good test framework. Luckily, ruby is pretty strong in this area. The way it works is the following:

You subclass Test::Unit and define methods that start with test_ where you execute system logic and make assertions about certain results; and then you run that class.

Ruby looks for those methods named as test_whatever and runs them “as tests”. Running a method as a test means that errors and failures (any of your assert methods returning false) will be logged and displayed at the end as part of the “test report”.

All of these test classes can be run automatically by a build-bot and (depending on your test coverage) you get good visibility into what’s working and what’s not.

This is super convenient and empowering in its own right. In a dynamic language like ruby, tests are the only way you have any level of confidence that your most recent code change doesn’t blow up in production.

So now that you’ve got this ability to write and run tests against your code base, here’s a wacky idea, write the tests first.

Test Driven

It’s amazing what this approach does to the design process.

I’ve always been the type that just starts coding. I’m completely comfortable throwing out 6 hours worth of code and starting over. I know my “first draft” isn’t going to be right (though it will be useful). I whole-heartedly believe in refactorings, etc. But most importantly, I need to code to sketch things out. It’s how I’ve always worked.

TDD is sort of the same thing. You do a “rough sketch” of the functionality you’ll add simply by writing tests that enforce that functionality.

You think of this opaque object – a black box. You don’t know how it does what it does, but you’re trying to test it doing it.

This automatically gives you an end-user perspective. You now focus solely on the interface, the input and the output.

This is a wise position to design from.

You also tend to design small self-contained pieces of functionality. Methods that don’t care about state, return the same output for a given input, and generally do one simple thing. Of course, you do this because these are the easiest kind of methods to test.

So, out of sheer laziness, you design a cohesive, easy to use, and completely simple interface, an API.

Now you just have to “plumb it up”. Hack until the tests pass, and you’re done. That might be an over-simplification, but it’s not off by much…

Come to think of it, this is exactly the type of design Haskell favors. With gratuitous use of undefined, the super-high-level logic of a Haskell program can be written out with named functions to “do the heavy lifting”. If you make these functions simple enough and give them descriptive enough names, they practically write themselves.

So that’s TDD (at least my take on it). So far, I like it.

Mairix published on Jul 3, tagged with arch, bash, linux, mutt

Mairix is a nice little utility for indexing and searching your emails. Its smooth integration with mutt is also a plus.

I used to use native mutt search, but it’s pretty slow. So far, mairix is giving me a good approximation of the google-powered search available in the web interface and it’s damn fast.

As I go through this setup, keep in mind the example config files are designed to work with my overall mutt setup; one which is described in two other posts here and here.

If you need a little context, checkout my mutt-config repo which has a fully functioning ~/.mutt, example files for the other apps involved (offlineimap, msmtprc, and now mairix), and any scripts the setup needs.

Mairix

First, of course, install mairix:

pacman -S mairix

Then, setup a ~/.mairixrc which defines where your mails are and their type as well as where to store the results and index. Here’s an example:

# where you keep your mail
base=/home/<you>/Mail

# colon separated list of maildirs to index.
#
# I have two accounts each in their own subfolder. the '...' means there 
# are subdirectories to search as well; it's like saying GMail/* and 
# GMX/*
maildir=GMail...:GMX...

# I omit gmail's archive folder so as to pevent duplicate hits
omit=GMail/all_mail

# search results will be copied to base/<this folder> for viewing in 
# mutt
mfolder=mfolder

# and the path to the index itself
database=/home/<you>/Mail/.mairix_database

With that in place, run mairix once to build the initial index. This first run will be slower but in my tests, subsequent rebuilds were almost instant.

In situations like these, I’ll usually add a verbose flag so I can be sure things are working as expected.

At this point, you could actually do some searching right from the commandline:

mairix some search term # search and populate mfolder
mutt -f mfolder         # open it in mutt

This wasn’t the usage I was after however, I’m typically already in mutt when I want to search my mails.

Mutt

My original script for this purpose was pretty simple. It prompted for the search term and ran it. The problem was you then needed a separate keybind to actually view the results.

Thankfully, Scott commented and provided a more advanced script which got around this issue. Many thanks to Scott and whomever wrote the script in the first place.

This version does some manual tty trickery to build its own prompt, read your input, execute the search and open the results. All from just one keybind.

I merged the two scripts together into what you see below. The main changes from Scott’s version are the following:

  1. I kept my clear, purge, search method rather than relying on cron to keep the index up to date.
  2. I removed the append-search functionality; not my use-case.
  3. I removed the <return> from the ^G trap; it was getting executed by mutt and opening the first message in the inbox after a cancelled search.
  4. I fixed it so that backspace works properly in the prompt.

So, here it is:

#!/bin/bash

read_from_config() {
  local key="$1" config="$HOME/.mairixrc"

  sed '/^'"$key"'=\([^ ]*\) *.*$/!d; s//\1/g' "$config"
}

read -r base    < <(read_from_config 'base')
read -r mfolder < <(read_from_config 'mfolder')

# prevent rm / further down...
[[ -z "$base$mfolder" ]] && exit 1

searchdir="$base/$mfolder"

set -f                          # disable globbing.
exec < /dev/tty 3>&1 > /dev/tty # restore stdin/stdout to the terminal,
                                # fd 3 goes to mutt's backticks.
saved_tty_settings=$(stty -g)   # save tty settings before modifying
                                # them

# trap <Ctrl-G> to cancel search
trap '
  printf "\r"; tput ed; tput rc
  printf "/" >&3
  stty "$saved_tty_settings"
  exit
' INT TERM

# put the terminal in cooked mode. Set eof to <return> so that pressing
# <return> doesn't move the cursor to the next line. Disable <Ctrl-Z>
stty icanon echo -ctlecho crterase eof '^M' intr '^G' susp ''

set $(stty size) # retrieve the size of the screen
tput sc          # save cursor position
tput cup "$1" 0  # go to last line of the screen
tput ed          # clear and write prompt
tput sgr0
printf 'Mairix search for: '

# read from the terminal. We can't use "read" because, there won't be
# any NL in the input as <return> is eof.
search=$(dd count=1 2>/dev/null)

# clear the folder and execute a fresh search
( rm -rf "$searchdir"
  mairix -p
  mairix $search
) &>/dev/null

# fix the terminal
printf '\r'; tput ed; tput rc
stty "$saved_tty_settings"

# to be executed by mutt when we return
printf "<change-folder-readonly>=$mfolder<return>" >&3

A non-trivial macro provides the interface to the script. It sets a variable called my_cmd to the output of the script, which should be the actual change-folder command, then executes it.

macro generic ,s "<enter-command>set my_cmd = \`$HOME/.mutt/msearch\`<return><enter-command>push \$my_cmd<return>" "search messages"
I’ve gotten used to “comma-keybinds” from setting that as my localleader in vim. It’s nice because it very rarely conflicts with anything existing and it’s quite fast to type.

One downside which I’ve been unable to fix (and believe me, I’ve tried!) is that if you press ^G to cancel a search but you’ve typed a few letters into the prompt, mutt will read those letters as commands (via the push) and execute them.

The only thing I could do is prefix those characters with something. I’ve decided to use /. That makes mutt see it as a normal search which you can execute or ^G again to cancel. Annoying, but better than mutt flailing around executing rando commands…

I haven’t had the time yet to learn all the tricks, but here are some of the more useful-looking searches from man mairix:

Useful searches

   t:word                             Match word in the To: header.

   c:word                             Match word in the Cc: header.

   f:word                             Match word in the From: header.

   s:word                             Match word in the Subject: header.

   m:word                             Match word in the Message-ID: 

                                      header.

   b:word                             Match word in the message body 
                                      (text or html!)

   d:[start-datespec]-[end-datespec]  Match messages with Date: headers 
                                      lying in the specific range.

Multiple body parts may be grouped together, if a match in any of them 
is sought.

   tc:word  Match word in either the To: or Cc: headers (or both).

   bs:word  Match word in either the Subject: header or the message body 
            (or both).

   The a: search pattern is an abbreviation for tcf:; i.e. match the 
   word in the To:, Cc: or From: headers.  ("a" stands for "address" in 
   this case.)

The "word" argument to the search strings can take various forms.

   ~word        Match messages not containing the word.  

   word1,word2  This matches if both the words are matched in the 
                specified message part.

   word1/word2  This matches if either of the words are matched in the 
                specified message part.

   substring=   Match any word containing substring as a substring

   substring=N  Match any word containing substring, allowing up to N 
                errors in the match.

   ^substring=  Match any word containing substring as a substring, with 
                the requirement that substring occurs at the beginning 
                of the matched word.

Happy searching!

Pacprune published on Jun 11, tagged with linux, bash, arch

A fairly long time ago, there was a thread on the Arch forums about clearing your pacman cache.

Pacman’s normal -Sc will remove all versions of any packages that are no longer installed and -Scc will clear that plus old versions of packages that are still installed.

The poster wanted a way to run -Scc but also keep the last 1 or 2 versions back from installed. There was no support for this in pacman directly, so a bit of a bash-off ensued.

I wrote a pretty crappy script which I posted there, it laid around in my ~/.bin collecting dust for a while, but I recently rewrote it. I’m pretty proud of the result for its effectiveness and succinctness, so I think it deserves a little discussion.

The methodology of the two versions is the same, but this new version leans heavily on good ol’ unix shell-scripting principles to provide the exact same functionality in way less code, memory, and time.

Approach

The first approach discussed on the thread was to parse filenames for package and version, then do a little sort-grepping to figure out which versions to keep and which versions to discard. This method is fast, but provably inaccurate if a package name contains numbers on the end.

I went a different way.

For each package, pull the .PKGINFO file out of the archive, parse the pkgname and pkgversion variables out of it, then do the same sort-grepping to figure out what to discard.

My first implementation of this algorithm was really bad. I’d parse and write pkgname|pkgversion to a file in /tmp. Then I’d grep unique package names using -m to return at most the number of versions you want to keep (of each package) and store that in another file. I’d then walk those files and rm the packages.

Ick.

Needs moar unix

The aforementioned ugliness, plus some configuration and error checking weighed in at 162 lines of code, used two files, and was dirt slow. I decided to re-attack the problem with a unix mindset.

In a nutshell: write small units that do one thing and communicate via simple text streams.

The first unit this script needs is a parser. It should accept a list of packages (relative file paths) on stdin, parse and output two space-separated values on stdout: name and path. The path will be needed by the next unit down the line, so we need to pass it through.

parse() {
  local package opt

  while read -r package; do
    case "$package" in
      *gz) opt='-qxzf' ;;
      *xz) opt='-qxJf' ;;
    esac

    bsdtar -O $opt "$package" .PKGINFO |\
        awk -v package="$package" '/^pkgname/ { printf("%s %s\n", $3, package) }'
  done
}

11 lines and damn fast. Thank god for bsdtar’s -q option. It tells the extraction to stop after finding the file I’ve requested. Since the .PKGINFO file is usually the first thing in the archive, we barely do any work to get the values.

It’s also done completely in RAM by piping tar directly to awk.

Step two would be the actual pruning. Accept that same space-separated list on stdin and for any package versions beyond the ones we want to keep (the 3 most recent), echo the full path to the package file on stdout.

prune() {
  local name package last_seen='' num_seen=0

  while read -r name package; do
    [[ -n "$last_seen" ]] && [[ "$last_seen" != "$name" ]] && num_seen=0

    num_seen=$((num_seen+1))

    # print full path
    [[ $num_seen -gt $versions_to_keep ]] && readlink -f "$package"

    last_seen="$name"
  done
}

Just watch the list go by and count the number of packages for each name. I’m ensuring that the list is coming in reverse sorted already, so once we see the number of packages we want to keep, any same-named packages after that should be printed.

So simple.

This function can get away with being simple because it doesn’t take into account what’s actually installed on your system. It just keeps the most recent 3 versions of each unique package in the cache. Therefore, to do a full clean, run pacman -Sc first to remove all versions of uninstalled software. Then use this script to clear all but installed plus the two previous versions. This assumes the highest version in the cache is the installed version which may or may not be true in all cases.

All that’s left is to make that reverse sorted list and pipe it through.

find ./ -maxdepth 1 -type f -name '*.pkg.tar.[gx]z' | LC_ALL='C' sort -r | parse | prune

So the whole script (new version) weighs in at ~30 lines (with whitespace) and I claim it is exactly as feature-rich as the first version.

I know what you’re saying: there’s no definition of the cache, no optional safe-list vs actual-removing behavior, there’s no removing at all!

Well, you’re just not thinking unix.

$ cd /some/cache/of/packages
$ pacprune                  # as a normal user, just print the list that 
                            # should be removed -- totally safe.
$ pacprune | sudo xargs rm  # then do the actual removal

You’re free to get as fancy as you’d like too…

$ archiveit() { sudo mv "$@" ~/pkg_archive/; }
$ pacprune | xargs archiveit

And the only configuration is setting the versions_to_keep variable at the top of the script.

The script can be found in my scripts repo.

Forks and Children published on Jun 2, tagged with c, linux

While writing a small learning exercise in C, I came across a nifty little concept. The task itself was a common one: I wanted to spawn a subprocess to the background while letting the main process continue to loop.

Many thanks go to falconindy who spoon fed me quite a bit as I was wrapping my head around all of this knowledge I’m now shamelessly presenting as my own.

In most languages you have some facility to group code into a logical unit (a haskell function or a bash subshell) then pass that unit to a command which forks it off into the background for you (haskell’s forkProcess or bash’s simple &).

Forking C

C takes a far different, but I’d say more elegant, approach. C provides a function, fork() which returns a pid_t.

The beauty of fork() is in its simplicity. All it does is create an exact copy of your program in its current state in memory. That’s it.

int main() {
    pid_t pid;

    pid = fork();

    // ...
}

Guess what, now you’ve got two copies of your running program, both sitting at the exact spot where pid is being assigned the output of fork().

In the copy that was the original (the parent), pid will be the process id of the other copy (the child). And in that child copy, pid will be assigned 0. That’s it; the full extent of fork().

So how do we use this?

Well, let’s say you’ve got a program (as I did) which should sit and loop forever. When some event happens, we want to take some asynchronous action (in my case throw up a dzen notification).

This is the perfect time to use fork(). We’ll let the main thread run continuously, and fork off a child to do its thing when the triggering event occurs.

Here’s a simplified version:

#define _GNU_SOURCE

#include <stdio.h>
#include <stdlib.h>

int main() {
    int ret;
    pid_t pid;

    while (1) {
       /* wait for the "event" */
       ret = some_blocking_process();

       if (ret) {
           /* fork it! */
           pid = fork();

           if (pid == 0) {
               /* we are the child, take action! */
               some_action(ret);
               exit(EXIT_SUCCESS);
           }

           /* and the parent loops forever... */
       }
    }
}

So as you can see, the main program waits until some_blocking_process returns an int. If that int is nonzero, we consider that “the event” so we fork to create a copy of ourselves. If pid is zero, we know we are the child process so we take some_action and then simply exit. The parent process will skip that if statement, loop again and wait for some_blocking_process to signal the next event.

Zombie kids

So I may have lied to you slightly about the simplicity of this approach. The above is all well and good – it is simple – but I ran into a small snag while working with my little learner’s app…

Zombies.

Turns out, when a child process exits, it reports its return value to its parent; like any good child should. The child does this by sending a SIGCHLD signal.

The parent then knows if all or some of its spawned children finished successfully or not. This is important if you’ve got some dependant logic or simply want to log that fact.

In my case, I couldn’t care less. Succeed fail, whatever. I’m done with you kid – go away.

Double turns out, if the parent neglects to act on the signal sent from the dying child, it can remain a zombie.

I think this is poor form. I mean, come on, a negligent parent is no reason to make a process wander around aimlessly as a zombie until the next reboot.

Ok, ok, enough with the metaphor. Bottom line – all you need to do to prevent this is install a simple signal handler which will read (and ignore) the status of the child process in response to said signal.

Here’s our same example, but this time with a simple handler added:

#define _GNU_SOURCE

#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
#include <sys/wait.h>

static void sigchld_handler(int signum) {
    /* this just silences a compiler warning you might get since we 
     * discard the signum parameter that is passed in */
    (void) signum;

    /* the actual handling of the signal... */
    while (waitpid(-1, NULL, WNOHANG) > 0);
}

int main() {
    int ret;
    pid_t pid;

    struct sigaction sig_child;

    sig_child.sa_handler = &sigchld_handler;
    sigemptyset(&sig_child.sa_mask);
    sig_child.sa_flags = 0;
    sigaction(SIGCHLD, &sig_child, NULL);

    while (1) {
       ret = some_blocking_process();

       if (ret) {
           pid = fork();

           if (pid == 0) {
               some_action(ret);
               exit(EXIT_SUCCESS);
           }
       }
    }
}

That’s it, no more zombies.

I noticed in the source for dzen2 that they use a double-fork approach which also prevents zombies – with no need for signal handlers (yay KISS!):

if (fork() == 0) {
    if (fork() == 0) {
        //
        // child logic...
        //

        exit(EXIT_SUCCESS);
    }

    exit(EXIT_SUCCESS);
}

wait(0);

//
// continue parent logic...
//
I like this approach better.

Notes published on Mar 26, tagged with bash, gmail, linux, mutt

For me, any sort of general purpose note taking and/or keeping solution needs to meet only a few requirements:

  1. Noting something has to be quick and easy (in a terminal and scriptable)
  2. Notes should be available from anywhere… tothecloud!
  3. Notes should be searchable

Now, just to clarify – I’m not talking about classroom notes, those things go in note-books. I’m talking about short little blurbs of information I would like to keep and reference at a later time.

Though, I suppose this could work for classroom notes too…

I’m also not talking about reminders, those are the stuff of calendars, not note-keeping apps.

So what’s my solution? What else, Gmail!

Gmail

Setting up gmail as a note keeper/searcher is simple. A note is an email from me with the prefix “Note -” in the subject line. Therefore, it’s easy to setup a label and a filter to funnel note-mails into a defined folder:

From:    me@whatever.com
Subject: ^Note - 

I also add “Skip inbox” and “Mark as read” as part of the rule.

I know the gmail filters support some level of regex and/or globbing, but I don’t know where it ends. I’m hoping that the ^ anchor is supported but I’m not positive.

Requirements 2 and 3 done.

Mutt

So if taking a note is done by just sending an email of a particular consistent format, then it’s easy for me to achieve requirement 1 since I use that awesome terminal mail client mutt.

A short bash function gives us uber-simple note taking abilities by handling the boilerplate of a note-mail:

noteit() {
  _have mutt || return 1 # see my dotfiles re: _have

  local subject="$*" message

  [[ -z "$subject" ]] && { echo 'no subject.'; return 1; }

  echo -e "^D to save, ^C to cancel\nNote:\n"

  message="$(cat)"

  [[ -z "$message" ]] && { echo 'no message.'; return 1; }

  # send message
  echo "$message" | mutt -s "Note - $subject" -- pbrisbin@gmail.com

  echo -e "\nnoted.\n"
}
You could probably also streamline note taking by leveraging mutt’s -H option. I’ll leave reading that man page snippet as an exercise to the reader.

And here’s how that might work out in the day-to-day:

//blue/0/~/ noteit test note
^D to save, ^C to cancel
Note:

This is a test note.

< I pressed ^D here >
noted.

//blue/0/~/
You could also use sendmail, mailx, msmtp or whatever other CLI mail solution you want for this.

And there it is, ready to be indexed by the almighty google:

Mutt notes shot

With a few mutt macros, I think this could get pretty featureful without a lot of code.

Let me know in the comments if there are any other simple or out-of-the-box note-keeping solutions you know of.

Oh, and before anyone mentions it – no, you can’t take notes without internet when you’re using this approach. I’m ok with that, I understand if you’re not.