THE WRITER MUST EAT -> patreon.com/trn1ty <-

|  \   |   | blah!
|\ | `\|\  | the rantings and ravings
|/ |(_|| | * of a depraved lunatic

<^>

2024-01-20 : why mm(1)

I started working on mm(1) probably around 2020-2021, when I was first
acquainting myself with the inner workings of UNIX-like operating systems,
which I had been using for a couple of years by then. I can't remember how I
noticed it, but it bothered me that there was this cat(1p) utility which took
multiple input files and streamed them successively to standard output:

    [ input ] [ input ] [ input ]...
        |_______  |  _______|
               _|_|_|_
              |       |
              |cat(1p)|
              |_______|
                  |
                  V
           standard output

And then this tee(1p) utility which took from standard input and streamed its
bytes to multiple outputs:

           standard input
                  V
               ___|___
              |       |
              |tee(1p)|
              |_______|
          ______| | |__________
         |        |            |
    [ output ] [ output ] [ output ]...

And they were separate utilities despite both doing the job of writing
input(s) to output(s). I imagined a hypothetical utility mm(1) that does it
all:

    [ input ] [ input ] [ input ]...
        |_______  |  _______|
               _|_|_|_
              |       |
              | mm(1) |
              |_______|
          ______| | |__________
         |        |            |
    [ output ] [ output ] [ output ]...

I attempted to write this magical "mm" (as in "middleman") utility, which
would act as a middleman for streams, before giving up for a couple of years
(for lack of C and POSIX API experience) to practice writing easier programs
in UNIX environments.

There are a couple of reasons to implement cat(1p) and tee(1p) as separate
utilities:

1) Ease of implementation

Differentiating input arguments from output arguments would require either a
separator mark (which would be inelegant and would exclude that mark from
being a usable file name) or option parsing.

Imagine a separator mark in the context of a hypothetical utility insouts(1):

    $ PS1='\n$ '

    $ insouts -h
    Usage: insouts (input...) "][" (output...)

    $ printf %s\\n hello\ world
    hello world

    $ printf %s\\n hello\ world >in1

    $ insouts][

    $ insouts ][ ][ /dev/stdout
    Usage: insouts (input...) "][" (output...)

    $ insouts ./][ ][ /dev/stdout
    hello world

What a mess! The file ][ can no longer easily be used with insouts(1), which
may be acceptable (it's not a sensible file name anyway), but it's sacrificed
for horrendously ugly syntax featuring stressfully unmatched square brackets.
I've written programs that use separator marks for arguments, namely pscat(1),
psrelay(1), and psroute(1) so far. Their particular flavor of marker comes
with a number of additional caveats, and I've been hesitant about the syntax
since I came up with it half a year ago. Best not to make more things about
which to fret.

Now imagine option parsing:

    $ PS1='\n$ '

    $ insouts
    Usage: insouts (-i [input])... (-o [output])...

    $ insouts -i in1
    hello world

    $ insouts -i in1 -i ][ -i out1
    hello world
    hello world
    hello world

This works for everything and is how mm(1) works. The issue is with the code
itself. Imagine a very basic cat(1) implementation in C:

    #include <stdio.h>
    int main(int argc, char *argv[]){
        int c;
        FILE *f;
        int i;

        for(i = 1; i < argc; ++i){ /* each argument names an input file */
            if((f = fopen(argv[i], "r")) == NULL){
                perror(argv[i]);
                return 1;
            }
            while((c = getc(f)) != EOF) /* copy it to standard output */
                putchar(c);
            fclose(f);
        }
    }

This doesn't conform to POSIX (which requires 'cat -u' to be supported) but
illustrates the ease of using cat(1)'s arguments: for each argument, open it
as a file, write it out, close it, and that's it. mm(1)'s option parsing for
'-i' and '-o' is, as of writing, 24 lines on its own, excluding the functions
it calls. The above program is 16 lines of code in total.

Some of this weight comes from supporting "-" as shorthand for /dev/stdin or
/dev/stdout (depending on whether it was given to '-i' or '-o') and from
trying to create an output file if it doesn't exist. Without these two
features, neither of which the above program supports, the code for '-i' and
'-o' would be considerably lighter. The point is that option parsing adds
complexity that can be avoided by simply having two utilities.
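To make that weight concrete, here is a minimal sketch of '-i'/'-o' handling
built on POSIX getopt(3). It is not mm(1)'s actual source. The names
parse_streams, open_arg, copy_streams, and struct streams, and the fixed
MAX_STREAMS cap, are all made up for illustration; mm(1) itself supports
arbitrary numbers of inputs and outputs.

```c
#define _POSIX_C_SOURCE 200809L /* expose getopt(3) under strict -std= modes */
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define MAX_STREAMS 16 /* hypothetical cap, only to keep the sketch short */

struct streams{
    FILE *in[MAX_STREAMS];
    FILE *out[MAX_STREAMS];
    int nin;
    int nout;
};

/* Open one operand; "-" means stdin after '-i' and stdout after '-o'. */
FILE *open_arg(const char *arg, int is_output){
    if(strcmp(arg, "-") == 0)
        return is_output ? stdout : stdin;
    /* "w" creates the file if it is missing (and truncates it otherwise) */
    return fopen(arg, is_output ? "w" : "r");
}

/* Collect every '-i' and '-o' operand; returns 0 on success, -1 on error. */
int parse_streams(int argc, char *argv[], struct streams *s){
    int c;
    FILE *f;

    s->nin = s->nout = 0;
    while((c = getopt(argc, argv, "i:o:")) != -1){
        if(c != 'i' && c != 'o')
            return -1; /* unknown option or missing operand */
        if((c == 'i' ? s->nin : s->nout) >= MAX_STREAMS)
            return -1; /* over the illustrative cap */
        if((f = open_arg(optarg, c == 'o')) == NULL){
            perror(optarg);
            return -1;
        }
        if(c == 'i')
            s->in[s->nin++] = f;
        else
            s->out[s->nout++] = f;
    }
    return 0;
}

/* Stream each input in turn to every output, one byte at a time. */
void copy_streams(struct streams *s){
    int c, i, j;

    for(i = 0; i < s->nin; ++i)
        while((c = getc(s->in[i])) != EOF)
            for(j = 0; j < s->nout; ++j)
                putc(c, s->out[j]);
}
```

A main() would call parse_streams(argc, argv, &s), then copy_streams(&s), and
close whatever it opened. Even with a fixed cap and no dynamic allocation,
this is already well over twice the length of the basic cat(1) above.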
Furthermore, options have drawbacks for users.

2) Ease of use

One relatively common use of cat(1p) is to catenate all files matching a glob
pattern. Imagine:

    $ PS1='\n$ '

    $ ls
    in1 in2 in3

    $ cat "$f"; done

    $ mm .

While '-i' and '-o' take 24 lines in total, the rest of the option logic is
needed for cat(1p) and tee(1p) anyway, is unavoidable, and outweighs the '-i'
and '-o' options. Much of the '-i' and '-o' logic is also still necessary in
both cat(1p) and tee(1p): supporting "-" and, in tee(1p)'s case, creating an
output if it doesn't exist. Though there is additional memory juggling due to
supporting arbitrary inputs and outputs, in most uses actual memory use isn't
noticeably affected (10 extra bytes for 5 file arguments, or one tenth of the
data used by this parenthetical statement).

It is possible to write implementations of cat(1p) and tee(1p) in POSIX shell
script as wrappers around mm(1), and I have done so, so users who want to use
globs can simply call cat or tee as usual. 'mm -i input -o output' tends to be
intuitive for existing shell users once they learn the name "middleman".

<^>

No rights reserved, all rights exercised, rights turned to lefts, left in this
corner of the web.