Shutting Down Idle SageMaker Notebooks with Lifecycle Configurations

AWS SageMaker Notebooks have become a great way for me to tinker with new ML techniques and libraries. One of the biggest annoyances when playing around with a new PyTorch model or some new TensorFlow ops is the environment setup.

Things like GPU setup, having the latest CUDA installed and configured correctly, and being able to scale to really large instances when needed are nice to have. I’d always prefer to just have a nice Docker setup, but no matter how great Docker is, it can’t fake having a beefy GPU or completely remove the time it takes to install and configure new libraries.

I have an old Linux laptop (with what once was a good GPU), but these days it doesn’t stand up to the latest & greatest.

SageMaker isn’t super economical for long-running tasks, though! The smallest GPU machine available is ~$4/hr.

One pain point nearly everyone hits with SageMaker is accidentally leaving notebooks running and getting surprised by a bill a few weeks later.

The good news is it’s easy to prevent this with Lifecycle Scripts.

Lifecycle Scripts

Some notes on these scripts:

  • Bash scripts that run at two points in the lifecycle of a notebook: at creation and at start.
  • The creation script runs once; the start script runs each time the notebook starts, including the first start after creation
  • Details:
    • Run as root
      • Notebooks run as ec2-user, so to run a command as that user: sudo -u ec2-user
    • Limit of 16384 characters
    • Limit of 5 minute runtime
    • In scripts, $PATH=/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/sbin:/sbin:/bin
  • Can set up connections to other AWS resources as well
  • Script output goes to CloudWatch Logs, which helps when troubleshooting
    • Log group: /aws/sagemaker/NotebookInstances
    • Log stream: [notebook-instance-name]/[LifecycleConfigHook]
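
To make the size limit above concrete, here is a minimal sketch of a pre-flight check you could run on your own machine before pasting a script into the console. The function name script_fits is hypothetical (it is not part of any AWS tooling), and the check counts bytes, which matches the character count only for plain-ASCII scripts. The commented lines at the end show one common way to use sudo -u ec2-user from inside a lifecycle script.

```shell
#!/bin/bash
# Hypothetical pre-flight check for a lifecycle script file.
# MAX_CHARS mirrors the 16384-character limit noted in the list above.
MAX_CHARS=16384

# Succeeds (exit status 0) when the file given as $1 is within the limit.
# wc -c counts bytes, which equals characters for plain-ASCII scripts.
script_fits() {
    local size
    size=$(( $(wc -c < "$1") ))
    [ "$size" -le "$MAX_CHARS" ]
}

# Inside a lifecycle script (which runs as root), a block of commands can be
# run as the notebook user like this:
#   sudo -u ec2-user -i <<'EOF'
#   echo "running as $(whoami)"
#   EOF
```

Usage would look like: script_fits on-start.sh && echo "OK to upload", where on-start.sh is whatever file holds your script.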

The integration with CloudWatch Logs is nice: it makes it easy to debug what’s going on in your script.
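
If you would rather read those logs from a terminal than from the console, something like the sketch below should work with the AWS CLI. The helper names log_stream_name and show_hook_logs are mine, and the hook name LifecycleConfigOnStart in the usage line is an assumption about how the stream is named for the start script; check the log group in your account for the exact stream names.

```shell
#!/bin/bash
# Log group used for notebook instance logs, as listed in the notes above.
LOG_GROUP="/aws/sagemaker/NotebookInstances"

# Build a log stream name of the form [notebook-instance-name]/[LifecycleConfigHook].
log_stream_name() {
    printf '%s/%s\n' "$1" "$2"
}

# Print the log messages for one hook of one notebook instance.
# Needs the AWS CLI and credentials that can read CloudWatch Logs.
# $1 = notebook instance name, $2 = hook name
show_hook_logs() {
    aws logs get-log-events \
        --log-group-name "$LOG_GROUP" \
        --log-stream-name "$(log_stream_name "$1" "$2")" \
        --query 'events[].message' \
        --output text
}

# Usage (hypothetical instance name, assumed hook name):
#   show_hook_logs my-notebook LifecycleConfigOnStart
```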

Configuration Steps

  1. Stop your notebook
  2. Create a new Lifecycle Configuration
  3. Edit the “Notebook Start” script with the script below
  4. Attach the Lifecycle Configuration to your notebook in the notebook’s settings
  5. Start the notebook again

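The same steps can also be done from the AWS CLI instead of the console. This is a rough sketch, not something from the AWS samples: the function names encode_script and attach_autostop are hypothetical, and you supply your own notebook name, configuration name, and script path. The API expects the script content base64-encoded, which is what encode_script handles.

```shell
#!/bin/bash
# Encode a script file as single-line base64, the form the Content field expects.
encode_script() {
    base64 < "$1" | tr -d '\n'
}

# Stop a notebook, create a lifecycle configuration from a start script,
# attach it, and start the notebook again. Stops at the first failing step.
# $1 = notebook instance name, $2 = lifecycle config name, $3 = script path
attach_autostop() {
    local notebook="$1" config="$2" script="$3"
    aws sagemaker stop-notebook-instance --notebook-instance-name "$notebook" &&
    aws sagemaker wait notebook-instance-stopped --notebook-instance-name "$notebook" &&
    aws sagemaker create-notebook-instance-lifecycle-config \
        --notebook-instance-lifecycle-config-name "$config" \
        --on-start "Content=$(encode_script "$script")" &&
    aws sagemaker update-notebook-instance \
        --notebook-instance-name "$notebook" \
        --lifecycle-config-name "$config" &&
    # The update takes a moment; wait until the instance is back to Stopped.
    aws sagemaker wait notebook-instance-stopped --notebook-instance-name "$notebook" &&
    aws sagemaker start-notebook-instance --notebook-instance-name "$notebook"
}

# Usage (all three names are placeholders):
#   attach_autostop my-notebook autostop-config on-start.sh
```
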
Use the script from the official AWS samples GitHub repository, adjusting the IDLE_TIME variable (in seconds) as you desire:

#!/bin/bash

# Exit immediately if any command fails
set -e

IDLE_TIME=10800  # 3 hours

echo "Fetching the autostop script"
wget https://raw.githubusercontent.com/aws-samples/amazon-sagemaker-notebook-instance-lifecycle-config-samples/master/scripts/auto-stop-idle/autostop.py

echo "Starting the SageMaker autostop script in cron"
# Append a cron entry that runs the idle check every 5 minutes
(crontab -l 2>/dev/null; echo "*/5 * * * * /usr/bin/python $PWD/autostop.py --time $IDLE_TIME --ignore-connections") | crontab -
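
If you would rather think in hours than seconds, a tiny helper like the one below can compute IDLE_TIME for you. The name hours_to_seconds is my own and not part of the AWS sample. The commented line at the end is one way to confirm the cron entry was installed, assuming your notebook user is allowed to use sudo; the entry lives in root’s crontab because lifecycle scripts run as root.

```shell
#!/bin/bash
# Convert a whole number of hours to seconds (hypothetical helper).
hours_to_seconds() {
    echo $(( $1 * 60 * 60 ))
}

# Same value as the hard-coded 10800 in the script above.
IDLE_TIME=$(hours_to_seconds 3)

# From a terminal on the running notebook, check that the entry exists:
#   sudo crontab -l | grep autostop.py
```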

And you’re done. Hope this saves anyone reading this a few dollars!

I put a few more details & notes on GitHub here.