The Two Best Command-Line Tools You’ve Never Heard Of

(because they’re custom hacks)

I’ve been using Unix and Linux command line tools since 1983. Yes, that’s 36 years. And in that time I’ve learned quite a few of them and forgotten most of those.

And I wish I could forget others, like that time the NCR Unix System V fortune command told me “try stty 0; it works better!”… and so, like a dope, I naturally tried it. It turns out this command means “hang up all terminals”, and for some bizarre reason it required no root privilege to run. I was not the popular guy at work that day. (But those were also wild west days in which a major corporation would leave an Easter egg like that lying around its commercial product.)

So after all those years, it might surprise you to know that the two command-line tools I reach for every day, for a wide variety of uses, are not even standard. They’re so non-standard, in fact, that as far as I know I’m the only one who uses them (although it seems likely others have also invented them independently).

The two tools are basically modifications of existing tools. One is a shell wrapper for an existing tool (awk), and the other is an alias involving sort and uniq.

The field command shell script

The field command is a filter that selects fields from regularly organized input (e.g. log files, passwd files, or csv files). Its usage is straightforward.

So “cat input | field 1 7” will show you the first and seventh space-separated fields from every line in input. But if you don’t want space as the separator, throw it in as the only available option: “cat input.csv | field -, 8” or “cat /etc/passwd | field -: 4”. (Note that it’s only a filter; it doesn’t handle filenames directly, but I almost always end up using it after a grep anyway.)
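As a rough sketch of how such a wrapper could be written (an illustration under the assumptions noted in the comments, not a definitive version), here is a POSIX shell function that turns the field numbers into an awk print statement. The error messages and the choice to wrap it in a function are assumptions made for the example.

```shell
#!/bin/sh
# field: print selected fields from each line of standard input.
# Usage: field [-SEP] N [N ...]     e.g.  field 1 7   or   field -: 4
# A sketch only: the function layout and error messages are assumptions.
field() (
    # The body runs in a subshell, so these variables do not leak
    # into the calling shell and "exit" only leaves the function.
    sep=" "                           # a single space means awk's default splitting
    case "$1" in
        -?*) sep="${1#-}"; shift ;;   # "-:" becomes ":", "-," becomes ","
    esac

    prog=""
    for n in "$@"; do
        case "$n" in
            ''|*[!0-9]*)
                echo "field: not a field number: $n" >&2
                exit 2 ;;
        esac
        prog="${prog:+$prog, }\$$n"   # builds "$1, $7" for awk's print
    done

    if [ -z "$prog" ]; then
        echo "usage: field [-SEP] N [N ...]" >&2
        exit 2
    fi

    awk -F"$sep" "{ print $prog }"
)
```

With this definition loaded, “cat /etc/passwd | field -: 4” runs the equivalent of awk -F: '{ print $4 }'. When several fields are requested, awk joins them with a single space on output, whatever the input separator was.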

So sure, you might be saying, awk already does all of this. And really it’s just a streamlining of having to type “awk -F: '{ print $4 }'”, but let’s face it, that’s awkward to type. Anyway, I use this command a lot. It’s one of the most useful things I’ve ever done. But it’s made all the more useful by the other great command, which is an alias: “alias susn 'sort | uniq -c | sort -n'”.

(That’s the tcsh version anyway; just add in an equals sign for bash.) That seems equally trivial, and what’s with using sort twice? Well, this is a quickly-cobbled-together histogram command. It takes the input, counts each unique input line (by first sorting, and then using uniq -c), and shows the counts to you in order of least to most occurrences (the second sort -n). When I first threw this alias together, I had no idea how often I’d use it.
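Bash does not expand aliases in non-interactive scripts by default, so a shell function is a handy equivalent there. Here is a minimal sketch of the same sort, uniq -c, sort -n pipeline written as a function that works in any POSIX shell; the function form is my assumption for the example, not the alias itself.

```shell
#!/bin/sh
# susn: histogram of identical input lines, least frequent first.
susn() {
    sort |        # group identical lines together so uniq can see them
    uniq -c |     # collapse each group to "count line"
    sort -n       # order by count, smallest first
}
```

Feeding it the six lines b, a, b, c, b, c lists a with a count of 1, then c with 2, then b with 3. The exact padding in front of each count depends on your version of uniq.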

Almost everything I do with any large bunch of log data or other regular data will at some point involve me wanting to count how often something occurs. That’s what susn does.

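Suppose we’re being hammered by hackers trying to randomly find known exploits (which we don’t have) on our web servers. “grep -h ' 404 ' access_log.* | field 1 | susn” will give me a list of addresses and how many attempts each made, sorted by number of attempts. (The -h stops grep from prefixing each line with its file name when it searches several files; otherwise the file name ends up glued to the address in field 1.)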
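To make that pipeline concrete, here is a small self-contained sketch. The field and susn definitions are compact stand-ins written for this example, and count_404s is a hypothetical helper name. It passes -h to grep so that matches from several files are not prefixed with the file name, which would otherwise become part of field 1.

```shell
#!/bin/sh
# Compact stand-ins for the two tools (assumptions for this example).
field() (
    sep=" "
    case "$1" in -?*) sep="${1#-}"; shift ;; esac
    prog=""
    for n in "$@"; do prog="${prog:+$prog, }\$$n"; done
    awk -F"$sep" "{ print $prog }"
)
susn() { sort | uniq -c | sort -n; }

# count_404s FILE...: count 404 responses per client address.
# Hypothetical helper; -h suppresses grep's "filename:" prefix.
count_404s() {
    grep -h ' 404 ' "$@" | field 1 | susn
}
```

Usage would look like “count_404s access_log.*”, with the busiest address appearing on the last line.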

But we also might want to know when these attacks started and when they were at their worst. The date/time field is the fourth field in standard web logs, and it is a long string of “yyyy/Mon/dd:hh:mm:ss”. So: “grep 111.222.33.44 access_log | field 4 | field -: 2 | susn” will give me a histogram of when this IP was active, by the hour. The first “field 4” grabs the date/time string out of the web log, and the “field -: 2” pulls the hour out of the date/time string. Effectively this uses multiple instances of the field command to slice a string down by its delimiters, and in this sense it is a replacement for the sed command.

I can make the same histogram with days instead of hours, but the day (dd in the format shown above) is a problem because dd has ‘/’ on the left and ‘:’ on the right. The simple solution is yet another field command: “grep 111.222.33.44 access_log* | field 4 | field -/ 3 | field -: 1 | susn” does what I need. It grabs the date/time field. Then it grabs the “dd:hh:mm:ss” section with “field -/ 3”, followed by stripping away “hh:mm:ss” with “field -: 1”.
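The chained slicing is easier to follow on concrete data. The sketch below runs both the by-hour and by-day pipelines over a few made-up log lines that use the date layout described above. The sample lines, the compact field and susn stand-ins, and the names sample_log, by_hour, and by_day are all assumptions for the example.

```shell
#!/bin/sh
# Compact stand-ins for the two tools (assumptions for this example).
field() (
    sep=" "
    case "$1" in -?*) sep="${1#-}"; shift ;; esac
    prog=""
    for n in "$@"; do prog="${prog:+$prog, }\$$n"; done
    awk -F"$sep" "{ print $prog }"
)
susn() { sort | uniq -c | sort -n; }

# Made-up log lines using the date layout described in the text.
sample_log() {
    cat <<'EOF'
111.222.33.44 - - [2019/Mar/07:13:55:36 -0700] "GET /a HTTP/1.1" 404 209
111.222.33.44 - - [2019/Mar/07:13:58:02 -0700] "GET /b HTTP/1.1" 404 209
111.222.33.44 - - [2019/Mar/08:02:14:40 -0700] "GET /c HTTP/1.1" 404 209
10.0.0.5 - - [2019/Mar/08:02:15:00 -0700] "GET /index.html HTTP/1.1" 200 512
EOF
}

# field 4 yields "[2019/Mar/07:13:55:36"; splitting on ":" puts the hour in field 2.
by_hour() { sample_log | grep 111.222.33.44 | field 4 | field -: 2 | susn; }

# Splitting on "/" first leaves "07:13:55:36"; splitting that on ":" leaves the day.
by_day() { sample_log | grep 111.222.33.44 | field 4 | field -/ 3 | field -: 1 | susn; }
```

On this sample, by_hour reports hour 02 once and hour 13 twice, while by_day reports day 08 once and day 07 twice.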

The field command can simplify your grep also. I actually have another alias, “getgrep”, that I use to grep from log files sometimes. It ensures that the thing I’m trying to grep is only within the URL field (and not, for example, in the referrer or user-agent fields). But I don’t even use it that much, because I can do this with the field command too, simply by rearranging the fields and stripping away the ones I don’t care about: “cat access.log | field 7 1 | grep \? | field 2 | susn” will generate a list of all hosts (field 1 in the web log) for requests that included query information (i.e., a question mark in field 7). Once the grep has done its job on field 7, I use another field command to throw away that part of the line and keep only the hostnames. And this solution is more general than something like getgrep, because I can apply it to any field. If I want to match the user field without risking matching the URL or other fields, I select only that field first and then grep. Much cleaner than the nasty regular expression required to avoid matching the wrong thing.
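Here is a small sketch of that select-then-grep trick on made-up data. One sample line has a question mark only in its referrer, which a plain grep over the whole line would match but this pipeline does not. The sample lines, the compact field and susn stand-ins, and the names sample_log and hosts_with_queries are assumptions for the example.

```shell
#!/bin/sh
# Compact stand-ins for the two tools (assumptions for this example).
field() (
    sep=" "
    case "$1" in -?*) sep="${1#-}"; shift ;; esac
    prog=""
    for n in "$@"; do prog="${prog:+$prog, }\$$n"; done
    awk -F"$sep" "{ print $prog }"
)
susn() { sort | uniq -c | sort -n; }

# Made-up log lines; the third has "?" only in the referrer field.
sample_log() {
    cat <<'EOF'
10.0.0.1 - - [2019/Mar/07:13:55:36 -0700] "GET /search?q=a HTTP/1.1" 200 512 "-" "Mozilla/5.0"
10.0.0.1 - - [2019/Mar/07:13:56:10 -0700] "GET /search?q=b HTTP/1.1" 200 512 "-" "Mozilla/5.0"
10.0.0.2 - - [2019/Mar/07:13:57:00 -0700] "GET /index.html HTTP/1.1" 200 512 "http://example.com/?ref=1" "Mozilla/5.0"
10.0.0.3 - - [2019/Mar/07:13:58:45 -0700] "GET /item?id=7 HTTP/1.1" 200 512 "-" "Mozilla/5.0"
EOF
}

# Keep only URL and host, grep the URL, then keep only the host.
# The shell strips the backslash, so grep sees a literal "?".
hosts_with_queries() {
    sample_log | field 7 1 | grep \? | field 2 | susn
}
```

On this sample, 10.0.0.3 is counted once and 10.0.0.1 twice, and 10.0.0.2 does not appear at all.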

These commands are useful for almost any log file. You want a summary of who is generating failed logins? Use grep to find the appropriate sshd lines in /var/log/messages, and then the field command to pull the usernames out and count them with susn. Do you want to analyze stuff in a database? SQL might be fine for some things, but if you’re already sitting there with a csv version of a big table, some of the simple stuff you want will be handled with grep, field, and susn.

I created both of these tools after a particularly intense day of web log analysis, and after typing the same long tedious commands over and over again. I honestly first thought it was just a one-time thing, and I’d never go back to them, but once I was in this mindset, I just kept reaching for these tools again and again. Hopefully you’ll find them as useful as I have.
