AWS SageMaker Notebooks have become a great way for me to tinker with new ML techniques and libraries. One of the biggest annoyances with playing around with a new PyTorch model or some new TensorFlow ops is the environment setup.
A working GPU setup, the latest CUDA installed and configured correctly, and the ability to scale to really large instances if needed are all nice to have. I’d always prefer to just have a nice Docker setup, but no matter how great Docker is, it can’t fake having a beefy GPU or completely remove the time to install and configure new libraries.
I have an old Linux laptop (with what once was a good GPU), but these days it doesn’t stand up to the latest & greatest.
SageMaker isn’t super economical for long-running tasks though! The smallest GPU machine available is ~$4/hr.
One pain point nearly everyone has with SageMaker is accidentally leaving notebooks on and getting surprised with a bill a few weeks later.
The good news is it’s easy to prevent this with Lifecycle Scripts.
Lifecycle Scripts
Some notes on these scripts:
- Bash scripts run at certain points in the lifecycle of a notebook, namely at creation and at start.
- The start script runs each time the notebook starts
- Details:
  - Run as root
    - Notebooks run as ec2-user, so to get that user in a shell: sudo -u ec2-user
  - Limit of 16384 characters
  - Limit of 5 minute runtime
  - In scripts, $PATH=/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/sbin:/sbin:/bin
- Can set up connections to other AWS resources as well
- CloudWatch Logs for issues
  - Log group: /aws/sagemaker/NotebookInstances
  - Log stream: [notebook-instance-name]/[LifecycleConfigHook]
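Since the character limit is easy to hit once a script grows, a quick local check before pasting it into the console can help. This is only a sketch: `check_lifecycle_script_size` is a hypothetical helper, not part of SageMaker or the AWS CLI, and it assumes the limit applies to the raw script text. The API may measure the base64-encoded content instead, which is roughly a third larger, so treat a pass close to the limit with suspicion.

```shell
#!/bin/bash
# Hypothetical helper: checks a lifecycle script against the
# 16384-character limit before you upload it.
MAX_CHARS=16384

check_lifecycle_script_size() {
    local script_path="$1"
    local char_count
    # wc -m counts characters; tr strips padding that some wc versions add
    char_count=$(wc -m < "$script_path" | tr -d '[:space:]')
    if [ "$char_count" -gt "$MAX_CHARS" ]; then
        echo "too long: $char_count characters (limit $MAX_CHARS)"
        return 1
    fi
    echo "ok: $char_count characters (limit $MAX_CHARS)"
}

# Usage (on-start.sh is a placeholder file name):
#   check_lifecycle_script_size on-start.sh
```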
The integration with CloudWatch logs is nice - easy to debug what’s going on in your script.
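For pulling those logs without clicking through the console, something like the following should work with the AWS CLI. The function names are made up for illustration, and `LifecycleConfigOnStart` is my assumption for the hook part of the stream name; check the actual stream names in the log group if it doesn't match.

```shell
#!/bin/bash
LOG_GROUP="/aws/sagemaker/NotebookInstances"

# Builds the stream name: [notebook-instance-name]/[LifecycleConfigHook]
lifecycle_log_stream() {
    local notebook_name="$1"
    local hook="${2:-LifecycleConfigOnStart}"  # assumed default hook name
    echo "${notebook_name}/${hook}"
}

# Prints the log messages from that stream.
# Needs the AWS CLI installed and credentials configured.
show_lifecycle_logs() {
    aws logs get-log-events \
        --log-group-name "$LOG_GROUP" \
        --log-stream-name "$(lifecycle_log_stream "$@")" \
        --query 'events[].message' \
        --output text
}

# Usage (my-notebook is a placeholder instance name):
#   show_lifecycle_logs my-notebook
```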
Configuration Steps
- Turn off your notebook
- Create a new Lifecycle Configuration
- Edit the “Notebook Start” script with the script below
- Attach the Lifecycle Configuration to the notebook (in the notebook’s settings)
- Restart the notebook
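The same steps can probably be scripted with the AWS CLI instead of the console. Treat this as a rough sketch rather than a tested recipe: `encode_script` and `attach_autostop_config` are names I made up, the arguments are placeholders, and it assumes the CLI is installed with credentials that can manage SageMaker notebook instances.

```shell
#!/bin/bash
# Encodes a script the way --on-start expects it: base64 on a single line.
encode_script() {
    base64 < "$1" | tr -d '\n'
}

# Stops the notebook, creates the lifecycle config from a local script,
# attaches it, then starts the notebook again.
attach_autostop_config() {
    local notebook="$1" config="$2" script="$3"
    aws sagemaker stop-notebook-instance --notebook-instance-name "$notebook" &&
    aws sagemaker wait notebook-instance-stopped --notebook-instance-name "$notebook" &&
    aws sagemaker create-notebook-instance-lifecycle-config \
        --notebook-instance-lifecycle-config-name "$config" \
        --on-start "Content=$(encode_script "$script")" &&
    aws sagemaker update-notebook-instance \
        --notebook-instance-name "$notebook" \
        --lifecycle-config-name "$config" &&
    # The update briefly changes the status, so wait for Stopped again
    aws sagemaker wait notebook-instance-stopped --notebook-instance-name "$notebook" &&
    aws sagemaker start-notebook-instance --notebook-instance-name "$notebook"
}

# Usage (all three arguments are placeholders):
#   attach_autostop_config my-notebook auto-stop-idle on-start.sh
```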
Use the script from the official AWS samples GitHub, adjusting the IDLE_TIME variable as you desire:
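#!/bin/bash
# Stop on the first failing command so problems show up in the CloudWatch logs
set -e
# PARAMETERS
IDLE_TIME=10800 # seconds of inactivity before the notebook is stopped (3 hours)
echo "Fetching the autostop script"
wget https://raw.githubusercontent.com/aws-samples/amazon-sagemaker-notebook-instance-lifecycle-config-samples/master/scripts/auto-stop-idle/autostop.py
echo "Starting the SageMaker autostop script in cron"
# Runs the idle check at minute 5 of every hour; use "*/5 * * * *" to check every 5 minutes
(crontab -l 2>/dev/null; echo "5 * * * * /usr/bin/python $PWD/autostop.py --time $IDLE_TIME --ignore-connections") | crontab -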
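To confirm the cron entry actually got installed after a restart, a check along these lines could be run from a notebook terminal. `has_autostop_entry` is a hypothetical helper, and the usage line assumes ec2-user has sudo access on the instance, since the lifecycle script runs as root and so the entry lands in root's crontab.

```shell
#!/bin/bash
# Reads crontab text on stdin and succeeds if an autostop.py entry is present.
has_autostop_entry() {
    grep -q 'autostop\.py'
}

# Usage from a notebook terminal:
#   sudo crontab -l | has_autostop_entry && echo "autostop is scheduled"
```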
And you’re done. Hope this saves anyone reading this a few dollars!
I put a few more details & notes on GitHub here.