Why we need the shell
Why the shell is awful
What we can do about it
Some use cases
The Safe Shell Script idea
I assume you've all studied this.
The shell is for starting an app
What are the alternatives?
upstart or systemd with the init.d directory
supervisord [http://supervisord.org]
getty -- the login prompt -- which can run an app when you login
When booted, the OS runs one and only one app
This is how embedded devices work:
Boot will load the kernel (PID=0) and the drivers
Then start the one-and-only app (with PID=1)
Use supervisord
Define the processes you want run in the /etc/supervisord.conf file.
Boot will load the kernel and the drivers
Then start supervisord (a Python app) as the one-and-only app
supervisord will make sure your processes are always running
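A minimal sketch of what that configuration might look like; the program name and paths here are hypothetical:

; /etc/supervisord.conf (fragment) -- hypothetical example
[supervisord]
logfile=/var/log/supervisord.log

[program:myapp]
command=/usr/bin/python /opt/myapp/myapp.py
autostart=true
autorestart=true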
Developers want to start more than one app
In the olden days of exactly one terminal
Start an app; exit the app; start another app
This is the use case -- the ONLY use case for the shell.
import pathlib, subprocess

while True:
    app = pathlib.Path(input("$ "))
    try:
        subprocess.run(app)
    except Exception as ex:
        print(ex)
Only to provide a good "User Experience."
The UX is focused on ease-of-use.
Ease of interactive use on a Model 33 Teletype. (Image: Rama & Musée Bolo, own work, CC BY-SA 2.0 FR)
The previous example requires full paths: /bin/ls. Not so easy to use.
"programming" is (almost) an after-thought.
It's really hard to find a shell feature that's obviously "over the top".
They're all useful.
My advice is Safe Shell Scripts are Small.
This is where the trouble begins: "Why write a program?"
Example: Removing a file.
Write, compile, test, and deploy a C program.
It's the unlink() function. A dozen lines of code? Fewer?
Or use the /bin/rm program that someone else wrote.
An app's "environment"...
Shell environment variables
The Current Working Directory (expand relative paths to absolute)
Current user and group
Effective user (and group) after setuid
stdin, stdout, stderr
Set the environment.
Start the app.
Anything over a few (about 3) lines of code is a bad idea.
%%sh
# myapp.sh
export MYAPP_HOME='/Users/slott/Documents/Writing/Python/Bashing the Bash/myapp-v1.2.3'
source "${MYAPP_HOME}/env_prod.sh"
python "${MYAPP_HOME}/myapp" $*
MyApp Settings
MYAPP_LOG='/myapp/db/log'
MYAPP_ENV='/myapp/db/prod'
MYAPP_HOME='/Users/slott/Documents/Writing/Python/Bashing the Bash/myapp-v1.2.3'
No unit test framework
The only data structure is the string (split on spaces to make a list-like thing)
Bizarro-world syntax and quoting rules
No easy way to have stateful objects
Resource-intensive run-time
Quality Issues
testability (i.e., no unit test framework)
too much configurability (Tweaking the script just this once)
reliability (worked for me)
Concurrent Pipelines.
app1.py <input.txt | app2.py >output2.txt
More General Concurrent Processing.
app1.py <input.txt >output1.txt & app2.py <output1.txt >output2.txt
a & b -- concurrently
a | b -- connected as a pipeline
a ; b -- sequentially
a && b -- conditionally, if a succeeds
a || b -- conditionally, if a fails
(a & b) >log && c -- applying the redirect to the composite of two steps
These are cool.
Stay safe. Keep them small.
Do you have a unit testing framework for your shell scripts?
The answer is almost always "no" and 😭 that's why people like them.
(There are some unit testing frameworks. It's not impossible.)
Realistically, it's easier to test code with mock OS objects than to mess with a shell script where you forgot a mock and oopsie trashed the database.
echo What about ${MY_FAVORITE_FEATURE}?
import os; print(f"What about {os.environ['MY_FAVORITE_FEATURE']}?")
Good point...
"""What about my favorite feature notification
>>> os.environ['MY_FAVORITE_FEATURE'] = "echo"
>>> main()
What about echo?
"""
import os
def main() -> None:
print(f"What about {os.environ['MY_FAVORITE_FEATURE']}?")
if __name__ == "__main__":
main()
It has a test case.
It runs on all OS's.
A stand-alone "echo" program is a symptom of "shell first" thinking.
Why did you need echo?
Debugging? Logging? Audit?
Focus on the real use case. Ask "Why?" five times.
Remember: No One Wins at Code Golf
The two-char commands: mv, rm, cp to perform file-system operations
Conditional Processing: if-fi, &&, ||, case-esac
Iterative Processing: while-do-done, for-do-done
Math (Seriously?)
Date/Time
The find command: a nested world of horror
The grep | sed | awk unholy mess of fake programming
Parsing JSON/HTML/XML/TOML/CSV etc.
Concurrent Pipelines: app | app and app & app
mv a b is pathlib.Path("a").rename("b")
rm a is pathlib.Path("a").unlink()
cp a b is tricky. (--preserve in particular.)
Often, it's this:
pathlib.Path("b").write_bytes(pathlib.Path("a").read_bytes())
If you want the --preserve semantics, use shutil.
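A minimal sketch of that, assuming a source file "a" and a target "b":

import shutil

# copy2() copies the file's data plus its metadata (timestamps and
# permission bits) -- the closest standard-library match to cp --preserve.
shutil.copy2("a", "b")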
There's a limit: mount, umount, etc., aren't simple path manipulations.
So many syntax alternatives: if-fi, &&, ||, case-esac.
The shell is awful.
Python has if and match. Use those.
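For example, a minimal sketch of a case-esac replacement (requires Python 3.10+; the command names are illustrative):

def dispatch(command: str) -> str:
    match command:
        case "start":
            return "starting"
        case "stop":
            return "stopping"
        case _:
            return f"unknown command {command!r}"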
Consider this
app1.py <input.txt >output.txt && cp output.txt ${BACKUP}/output.txt
First. Why doesn't app1 handle this "save a backup copy"?
Seriously. Why constrain app1 to stdout only? Why separate the "backup" consideration?
import pathlib
import subprocess
import os

def app1() -> int:
    with pathlib.Path("input.txt").open() as input:
        with pathlib.Path("output.txt").open("w") as output:
            p = subprocess.run(["python", "app1.py"], stdin=input, stdout=output, check=False)
    return p.returncode

def cp() -> int:
    source = pathlib.Path("output.txt")
    target = pathlib.Path(os.environ.get("BACKUP", "/tmp")) / source.name
    target.write_bytes(source.read_bytes())
    return 0

def main() -> None:
    r1 = app1()
    if r1 == 0:  # The && operator
        r2 = cp()
    # exit(r2)  # Spooks Jupyter Lab

if __name__ == "__main__":
    main()
It's testable code.
It's readable code.
It's slightly faster than the shell.
You can import this module when doing "programming in the large."
Most important: Want logging? Audit? Debugging?
You can add that to an app.
You'll often struggle to add it everywhere it's needed in a script.
Remember: No One Wins at Code Golf
for f in *.txt
do
    nm=${f##*/}
    b=${nm%.*}
    e=${nm##*.}
    app2.py ${f} >${b}_a2.${e} || echo "Problem with ${f}"
done
Is this Audit? Debugging? What's really going on?
Is it okay that the return code from app2.py is lost?
And where does this log go? What's done with it? Who does the remedial processing?
Ask "Why?". Repeatedly.
import pathlib
import subprocess

def app2(file: pathlib.Path) -> int:
    output_path = pathlib.Path.cwd() / f"{file.stem}_a2{file.suffix}"
    with file.open() as input:
        with output_path.open('w') as output:
            p = subprocess.run(["python", "app2.py"], stdin=input, stdout=output, check=False)
    return p.returncode

def echo(file: pathlib.Path) -> int:
    print(f"Problem with {file!s}")
    return 0

def main() -> None:
    for f in pathlib.Path.cwd().glob("*.txt"):
        r1 = app2(f)
        if r1 != 0:  # The || operator
            r2 = echo(f)
    # exit(0)

if __name__ == "__main__":
    main()
You cannot rationally do math with the shell.
The expr program is crazy.
The test program, also known as [, is crazier.
The dc and bc programs are obscure.
If you think you need awk: stop. Draw the line there and use Python.
This, however, is ok:
(Technically, not the shell. Mostly a feature of GNU/Linux.)
%%sh
dc -e '5k 355 113 / p'
3.14159
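The same computation in Python, for comparison:

print(f"{355 / 113:.5f}")  # 3.14159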
Date/Time? Use Python's datetime module, where you can handle exceptions and edge cases cleanly.
import datetime
import pathlib

def log_path() -> pathlib.Path:
    now = datetime.datetime.now(datetime.timezone.utc)
    path = (
        pathlib.Path.cwd() / now.strftime("%Y-%m-%d")
    ).with_suffix(".log")
    return path

log_path()
PosixPath('/Users/slott/Documents/Writing/Python/Bashing the Bash/2022-06-12.log')
The find command
This is an entire script in a unique, distinct syntax. The shell is awful.
find . -name '*.txt' -exec sh -c 'app2.py <"$1" >"$1".out' _ {} \;
The relevant feature is recursive descent through a directory tree.
Use pathlib
And glob("**...") for recursive descent.
import pathlib
import subprocess

class App2:
    def run(self, input: pathlib.Path, output: pathlib.Path) -> None:
        with input.open() as source:
            with output.open('w') as target:
                subprocess.run(["python", "app2.py"], stdin=source, stdout=target, check=True)

def find_and_exec():
    app2 = App2()
    for path in pathlib.Path.cwd().glob("**/*_a2.txt"):
        app2.run(path, pathlib.Path(f"{path.name}.out"))

def main():
    find_and_exec()

if __name__ == "__main__":
    main()
Or this...
find . \( -name '*_a2.txt' -or -name '*.txt.out' \) -print -delete >cleanup.log
import pathlib
import itertools
import contextlib

def main():
    matches = itertools.chain(
        pathlib.Path.cwd().glob("**/*_a2.txt"),
        pathlib.Path.cwd().glob("**/*.txt.out")
    )
    for path in matches:
        print(path)
        path.unlink()

if __name__ == "__main__":
    log = pathlib.Path("cleanup.log")
    with log.open("w") as log_file:
        with contextlib.redirect_stdout(log_file):
            main()
The grep app, like ls, is acceptable for interactive use.
Interactive use is fine. But. Not this.
for d in ${DIRECTORIES}
do
    fixups=`grep --with-filename 'print\s*(' ${d}/*.py | awk -F: '{print $1}' | sort -u`
    for f in ${fixups}
    do
        echo fixing ${f}
        cat ${f} | sed 's/print/logger.info/' | awk 'BEGIN {print "import logging\nlogger = logging.getLogger(__file__)\n"} {print $0}' >${f}.new
    done
done
What are you trying to do?
Find and update all Python modules with print?
Don't parse Python (or HTML or JSON) with grep.
Don't make a bewilderingly opaque shell script.
Even if you think it's a one-time special case that will never be used again.
There are never one-time special cases.
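If the real goal is "find every module that calls print()", here's a minimal sketch using the ast module to parse the Python properly (the rewrite step is a separate problem):

import ast
import pathlib

def has_print(path: pathlib.Path) -> bool:
    # Parse the module and look for any call to the name "print".
    tree = ast.parse(path.read_text())
    return any(
        isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id == "print"
        for node in ast.walk(tree)
    )

for path in pathlib.Path.cwd().glob("**/*.py"):
    if has_print(path):
        print(path)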
Extracting from JSON/HTML/XML/TOML/CSV
Often... RESTful clients done poorly.
repos=`curl https://api.github.com/users/slott56 | jq '.repos_url'`
curl $repos
Don't.
So many unhandled edge cases...
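For comparison, a minimal sketch of those two requests in Python, using only the standard library, where the edge cases have a natural place to be handled:

import json
import urllib.request

with urllib.request.urlopen("https://api.github.com/users/slott56") as response:
    user = json.load(response)

with urllib.request.urlopen(user["repos_url"]) as response:
    repos = json.load(response)

print([repo["name"] for repo in repos])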
You have app | app or app & app & in the shell.
And it's not a grep | awk | sed kind of thing.
Real long-running apps. Not fake programming via pipeline hackery.
Use https://cgarciae.github.io/pypeln/ or https://www.dask.org or https://github.com/pytoolz/toolz
The shell's pipeline is something it does well. Hard to draw a line here.
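If you do need a pipeline from inside Python, a minimal sketch with subprocess; "app1.py" and "app2.py" are placeholders:

import subprocess

# app1 | app2: connect app1's stdout to app2's stdin.
app1 = subprocess.Popen(["python", "app1.py"], stdout=subprocess.PIPE)
app2 = subprocess.Popen(["python", "app2.py"], stdin=app1.stdout, stdout=subprocess.PIPE)
app1.stdout.close()  # Let app1 see SIGPIPE if app2 exits early.
output, _ = app2.communicate()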
The only thing the shell should be used for is to launch your Python apps.
Safe Shell Scripts are Short
Short Means: (1) Set the environment; (2) Start Python.
Mgr: "It's a simple one-time script. You don't need to write an app."
Dev: "Testing is essential."
Mgr: "It won't have a catastrophic impact. You can simply clean up any problems."
Dev: "It's another example of the X and Y scripts we ran last month." ⬅️ This 💯
Mgr: "Right. It's a simple copy and paste X or Y, making simple changes for this."
Dev: "A generalization-specialization is best handled with an OO programming language. Python."
continued...
Mgr: "It's simple filesystem changes to cleanup a known bug."
Dev: "Pathlib and shutil do these."
Mgr: "It's a simple search throuh JSON files."
Dev: "JSON parsing is a first-class part of Python. Also. Nothing is simple, no matter how many times you say it."
Places we justify shell scripts:
First-class part of the app. Written in the shell for no good reason. (The bad reason is to avoid testing.)
OS administrative part of an app. Allow admins to tweak the shell scripts. (Unreliable after manual tweaks.)
Semi-permanent Bug-fixes and workarounds. (Architectural nightmare: what system of record owns the hack?)
Cleanup after installation or upgrade. (Auditing nightmare.)
Installation...
TL;DR: failing to include testability, configurability, or reliability
Alternative solutions to the Bootstrap Problem:
Assume a given shell is part of the OS. zsh? bash? (Pick one & hope.)
Install some tooling prior to installing your app.
With miniconda, install Python.
With Python, do your installation, using real programming.
If miniconda won't install? Your shell script wouldn't have worked, either.
Install Docker
py2app and py2exe
Define classes following the Command pattern for stateful operations. (See the sketch below.)
Define a composite sequence-of-command object for multi-step operations.
Use a processing pool (via concurrent.futures) for concurrent operations.
Formalize configurations via a Python module that's included in the app.
All modules use logging to define loggers; only a single top-level command-line app does the configuration.
Follow the git pattern with a (single) parent app that includes all the children and admin things and tasks and special cases and workarounds and cleanups and extensions.
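A minimal sketch of the Command pattern plus the composite sequence-of-command object; the class names are illustrative, not a standard API:

import abc
import pathlib
import subprocess

class Command(abc.ABC):
    """One stateful operation."""
    @abc.abstractmethod
    def execute(self) -> None: ...

class Run(Command):
    """Launch an app, like a simple shell command."""
    def __init__(self, *argv: str) -> None:
        self.argv = argv
    def execute(self) -> None:
        subprocess.run(self.argv, check=True)

class Copy(Command):
    """Copy a file, like cp."""
    def __init__(self, source: pathlib.Path, target: pathlib.Path) -> None:
        self.source = source
        self.target = target
    def execute(self) -> None:
        self.target.write_bytes(self.source.read_bytes())

class Sequence(Command):
    """Composite: a multi-step operation, like a ; b in the shell."""
    def __init__(self, *steps: Command) -> None:
        self.steps = steps
    def execute(self) -> None:
        for step in self.steps:
            step.execute()

if __name__ == "__main__":
    job = Sequence(
        Run("python", "app1.py"),
        Copy(pathlib.Path("output.txt"), pathlib.Path("backup.txt")),
    )
    job.execute()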