THE WRITER MUST EAT -> patreon.com/trn1ty <-
| \ | | blah!
|\ | `\|\ | the rantings and ravings
|/ |(_|| | * of a depraved lunatic
2024-01-20
: why mm(1)
I started working on mm(1) probably around 2020-2021, when I was first
acquainting myself with the inner workings of UNIX-like operating systems which
I had been using for a couple years by then. I can't remember how I noticed it
but it bothered me that there was this cat(1p) utility which took multiple
input files and streamed them successively to standard output:
[ input ] [ input ] [ input ]...
     |_______ | _______|
           _|_|_|_
          |       |
          |cat(1p)|
          |_______|
              |
              V
       standard output
And then this tee(1p) utility which took from standard input and streamed its
bytes to multiple outputs:
        standard input
               V
            ___|___
           |       |
           |tee(1p)|
           |_______|
       ______| | |__________
      |        |            |
[ output ] [ output ] [ output ]...
And they were separate utilities despite both doing the job of writing input(s)
to output(s). I imagined a hypothetical utility mm(1) that does it all:
 [ input ] [ input ] [ input ]...
      |_______ | _______|
            _|_|_|_
           |       |
           | mm(1) |
           |_______|
       ______| | |__________
      |        |            |
[ output ] [ output ] [ output ]...
I attempted to write this magical "mm" (as in "middleman") utility, which
would act as a middleman for streams, but gave up for lack of C and POSIX API
experience and spent a couple of years practicing on easier programs in UNIX
environments.
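The mm(1) diagram boils down to one loop: read each input in turn and write
every chunk to every output. Here is a minimal sketch of that loop in C,
assuming the streams are already open. The function name mm_relay and the
4096-byte buffer are made up for illustration and are not taken from mm(1)'s
source.

```c
#include <assert.h>
#include <stdio.h>

/* Copy every input, in order, to every output.
 * Returns 0 on success, -1 on the first read or write error.
 * mm_relay is a hypothetical name, not mm(1)'s actual code. */
int mm_relay(FILE **ins, size_t nin, FILE **outs, size_t nout){
	char buf[4096]; /* arbitrary chunk size for this sketch */
	size_t i, o, n;

	assert(nin == 0 || ins != NULL);
	assert(nout == 0 || outs != NULL);

	for(i = 0; i < nin; ++i){
		/* Fan each chunk of the current input out to all outputs. */
		while((n = fread(buf, 1, sizeof buf, ins[i])) > 0)
			for(o = 0; o < nout; ++o)
				if(fwrite(buf, 1, n, outs[o]) != n)
					return -1;
		if(ferror(ins[i]))
			return -1;
	}
	/* Push out anything stdio is still buffering. */
	for(o = 0; o < nout; ++o)
		if(fflush(outs[o]) == EOF)
			return -1;
	return 0;
}
```

Called with several inputs and standard output as the only output, this
behaves like a bare-bones cat(1p); called with standard input as the only
input and several outputs, like a bare-bones tee(1p).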
There are a couple reasons to implement cat(1p) and tee(1p) as separate
utilities:
1) Ease of implementation
Differentiating input arguments from output arguments would require
either a separator mark (which would be inelegant and would exclude
that mark from being a usable file name) or option parsing.
Imagine a separator mark in the context of a hypothetical utility
insouts(1):
$ PS1='\n$ '
$ insouts -h
Usage: insouts (input...) "][" (output...)
$ printf %s\\n hello\ world
hello world
$ printf %s\\n hello\ world >in1
$ insouts ][
$ insouts ][ ][ /dev/stdout
Usage: insouts (input...) "][" (output...)
$ insouts ./][ ][ /dev/stdout
hello world
What a mess! The file ][ can no longer easily be used with insouts(1),
which may be acceptable (it's not a sensible file name anyway), but
it's sacrificed for horrendously ugly syntax featuring stressfully
unmatched square brackets.
I've written programs that use separator marks for arguments, namely
pscat(1), psrelay(1), and psroute(1) so far. Their particular flavor of
marker comes with a number of additional caveats, and I've been
hesitant about the syntax since I came up with it half a year ago. Best
not to make more things about which to fret.
Now imagine option parsing:
$ PS1='\n$ '
$ insouts
Usage: insouts (-i [input])... (-o [output])...
$ insouts -i in1
hello world
$ insouts -i in1 -i ][ -i out1
hello world
hello world
hello world
This works for everything and is how mm(1) works. The issue is with
the code itself. Imagine a very basic cat(1) implementation in C:
#include <stdio.h>

int main(int argc, char *argv[]){
    int c;
    FILE *f;
    int i;
    for(i = 1; i < argc; ++i){
        if((f = fopen(argv[i], "r")) == NULL){
            perror(argv[i]);
            return 1;
        }
        while((c = getc(f)) != EOF)
            putchar(c);
        fclose(f);
    }
}
This doesn't conform to POSIX (which requires 'cat -u' to be supported)
but illustrates the ease of using cat(1)'s arguments: for each
argument, open it as a file, write it out, close it, and that's it.
mm(1)'s option parsing for '-i' and '-o' alone is, as of writing, 24
lines, excluding the functions it calls. The above program is 16
lines of code. Some of that weight comes from supporting "-" as
shorthand for /dev/stdin or /dev/stdout (depending on whether it was
used for '-i' or '-o') and from trying to create an output file if it
doesn't exist. Without these two features, neither of which the above
program supports, the code for '-i' and '-o' would be considerably
lighter, but the point stands: option parsing adds complexity that can
be avoided by simply having two utilities.
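To give a feel for that extra weight, here is a rough sketch of '-i' and '-o'
handling built on POSIX getopt(3). It is not mm(1)'s actual code: the names
parse_names, open_name, and struct names are hypothetical, the fixed
MAX_FILES cap stands in for whatever allocation mm(1) really does, and
opening outputs with "w" (which also truncates an existing file) is an
assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

#define MAX_FILES 16 /* arbitrary cap, for this sketch only */

struct names {
	const char *in[MAX_FILES];
	const char *out[MAX_FILES];
	size_t nin, nout;
};

/* Collect the arguments of every -i and -o option.
 * Returns 0 on success, -1 on a usage error. */
int parse_names(int argc, char *argv[], struct names *n){
	int c;

	assert(n != NULL);
	n->nin = n->nout = 0;
	while((c = getopt(argc, argv, "i:o:")) != -1)
		switch(c){
		case 'i':
			if(n->nin == MAX_FILES)
				return -1;
			n->in[n->nin++] = optarg;
			break;
		case 'o':
			if(n->nout == MAX_FILES)
				return -1;
			n->out[n->nout++] = optarg;
			break;
		default: /* unknown option or missing argument */
			return -1;
		}
	/* This sketch treats leftover operands as a usage error. */
	return optind == argc ? 0 : -1;
}

/* Open one named input or output; "-" means the standard stream. */
FILE *open_name(const char *name, int is_output){
	if(name[0] == '-' && name[1] == '\0')
		return is_output ? stdout : stdin;
	/* Assumption: "w" creates a missing file but also truncates. */
	return fopen(name, is_output ? "w" : "r");
}
```

Even without error messages or dynamic allocation, this is already longer than
the whole cat(1) listing, and none of it has copied a byte yet.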
Furthermore, options have drawbacks for users.
2) Ease of use
One relatively common use of cat(1p) is to catenate all files matching
a glob pattern. Imagine:
$ PS1='\n$ '
$ ls
in1
in2
in3
$ cat "$f"; done
$ mm .
While '-i' and '-o' are 24 lines in total, the rest of the options logic is
needed by cat(1p) and tee(1p) anyway, is unavoidable, and outweighs the '-i'
and '-o' handling. Much of the '-i' and '-o' logic is itself still necessary in
both cat(1p) and tee(1p) (supporting "-" and, in tee(1p)'s case, creating an
output if it doesn't exist). Though there is additional memory juggling due to
supporting arbitrary inputs and outputs, in most uses actual memory use isn't
noticeably affected (10 extra bytes for 5 file arguments, or one tenth of the
data used by this parenthetical statement).
It is possible to write implementations of cat(1p) and tee(1p) in POSIX shell
script as wrappers on mm(1) and I have done so, so users who want to use globs
can simply call cat or tee as usual.
mm -i input -o output tends to be intuitive for existing shell users once they
learn the name "middleman".
No rights reserved, all rights exercised, rights turned to lefts, left in this
corner of the web.