Amazon Web Services (AWS) offers a great many services to suit user needs. For ML/DL, all one needs is a computer with access to a Graphics Processing Unit (GPU). Since AWS offers a virtual PC with a GPU at a reasonable price, we will use their service. This blog post walks through setting up an AWS EC2 instance so it is ready for running ML tasks.

Prerequisites: This guide expects you to be familiar with a Linux (Ubuntu) environment and to have an activated AWS account.

Fast.ai

We will use the fastai library to build our ML models. Fastai is an open-source software library for deep learning built on top of PyTorch; developing ML models with fastai is much easier than using PyTorch alone.

Note: This post is similar to the fastai’s guide for AWS setup but with added commentary and updated final instructions.

AWS EC2

AWS EC2 is a service that provides virtual servers for our use. Instances launch from a template called an Amazon Machine Image (AMI), which makes it easy to get a virtual cloud server running in less than two minutes.

For our ML purposes, we will use a g4dn.xlarge EC2 on-demand instance. This particular instance has a 16GB Nvidia T4 GPU. For more info on instance types, check out this page at AWS.

Note: Regions such as North Virginia (N. Virginia) and Ohio offer the cheapest prices for g4dn instances, starting at $0.526 per hour. (Each region is subdivided into availability zones (AZs).) Check out the pricing here.

Requesting service limit increase

To prevent misuse of GPUs, Amazon restricts their usage by default. However, we can request a service limit increase with a support ticket. It is important to remember that the limit needs to be increased separately for each region. The number of vCPUs an instance has is a good measure of its capacity; in this case, request 16 vCPUs (up to g4dn.4xlarge) in the support ticket. Follow this step to get access to a GPU.

Tip: Usually a description like 'to use for an ML/DL course and training models' is an acceptable reason for approval.
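If you prefer the command line, the same request can be filed through the Service Quotas API. A minimal sketch, assuming the AWS CLI is installed and configured; the quota name filter below is based on the console's wording ('Running On-Demand G and VT instances'), so verify the code it returns before submitting:

# find the quota code for on-demand G instances in the chosen region
aws service-quotas list-service-quotas --service-code ec2 --region us-east-1 \
    --query "Quotas[?contains(QuotaName, 'G and VT')].[QuotaCode,QuotaName,Value]" --output table

# request 16 vCPUs with the code found above
aws service-quotas request-service-quota-increase --service-code ec2 \
    --region us-east-1 --quota-code <quota-code> --desired-value 16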

Import key pair to AWS EC2 region

Use this Amazon guide to generate and import an RSA key pair into an AWS EC2 region. It is important to store the private key in a secure directory.
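For reference, the same can be done from a terminal; a minimal sketch assuming the AWS CLI is configured, with placeholder file and key names:

# generate an RSA key pair locally
ssh-keygen -t rsa -b 4096 -f ~/.ssh/aws_key.pem -N ""
chmod 400 ~/.ssh/aws_key.pem

# import the public half into the region you plan to use
aws ec2 import-key-pair --region us-east-1 --key-name aws_key \
    --public-key-material fileb://$HOME/.ssh/aws_key.pem.pub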

Launch instance

Use this fastai step to launch a g4dn.xlarge instance.
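For the CLI-inclined, the launch can also be scripted. A sketch with placeholder IDs: the AMI ID varies by region, so we look up the latest official Ubuntu 20.04 image first (Canonical's AWS account ID is 099720109477). You may also need a security group that allows inbound SSH on port 22.

# find the latest official Ubuntu 20.04 AMI in the region
aws ec2 describe-images --region us-east-1 --owners 099720109477 \
    --filters "Name=name,Values=ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-*" \
    --query "sort_by(Images, &CreationDate)[-1].ImageId" --output text

# launch a g4dn.xlarge with the key pair imported earlier
aws ec2 run-instances --region us-east-1 --image-id <ami-id> \
    --instance-type g4dn.xlarge --key-name <key-name> --count 1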

Connect to instance

Using the public IP of the instance and the RSA private key, we can log in to our instance.

Log in using:

ssh -i <path-to-pem-file> ubuntu@<ip-address>

The username is ubuntu for Ubuntu EC2 instances, ec2-user for Amazon Linux AMIs, and so on. We will be using an Ubuntu g4dn instance.

Example:

ssh -i ~/.ssh/myprivatekey.pem ubuntu@<ip>

You may be prompted about trusting this address; reply 'yes'.
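One common stumbling block: OpenSSH refuses a private key whose file permissions are too open and aborts with an 'UNPROTECTED PRIVATE KEY' warning. Restrict the key to your user before connecting:

chmod 400 ~/.ssh/myprivatekey.pem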

Solution to possible shell environment problems: If the host terminal uses a different terminal type such as xterm-kitty (check with echo $TERM), that TERM value is carried over to the remote EC2 instance. If that is the case, it is best to use this command:

TERM='xterm-256color' ssh -i <path-to-pem-file> ubuntu@<ip-address>

Another possibility is to add the TERM variable to the remote ~/.bashrc file:

echo 'export TERM=xterm-256color' >> ~/.bashrc
source ~/.bashrc

Why set the TERM variable at all? Because the remote host only has entries for basic terminal types; with an unrecognised TERM, keystrokes such as recalling the previously run command (up arrow) produce scrambled output on the screen.

Setup Ubuntu server

First, to do the basic Ubuntu configuration, such as updating packages and turning on auto-updates, execute:

sudo apt update && sudo apt -y install git
git clone https://github.com/dr563105/fastsetup.git
cd fastsetup
sudo ./ubuntu-initial.sh

The setup shell script creates a new user on the cloud server, installs/updates the necessary packages and libraries, configures a firewall, and sets up SSH.

Note: This new user is not to be confused with an IAM user created by the AWS root user.

Reboot when prompted. Wait a couple of minutes for reboot, then ssh back in.

To reconnect using ssh, add the -L flag, which forwards a local port so you can connect to Jupyter Notebook once it's installed:

ssh -i <path-to-pem-file> -L localhost:8888:localhost:8888 ubuntu@<ip-address>
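Typing that every time gets old; the connection details can live in ~/.ssh/config on your local machine instead. A minimal sketch, with the host alias aws-ml and the key path as placeholders:

cat >> ~/.ssh/config <<'EOF'
Host aws-ml
    HostName <ip-address>
    User ubuntu
    IdentityFile ~/.ssh/myprivatekey.pem
    LocalForward 8888 localhost:8888
EOF

After that, ssh aws-ml logs in and forwards the notebook port in one go.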

(Optional): Change default shell to ZSH with Oh My ZSH:

sh -c "$(curl -fsSL https://raw.githubusercontent.com/ohmyzsh/ohmyzsh/master/tools/install.sh)" "" --unattended

Change the default shell to ZSH using

sudo chsh $USER -s /bin/zsh

When prompted for a password, use the one entered at the user creation step during the setup process.

Log out and log in again to activate the ZSH shell. It is possible to switch without disconnecting, but I find this step simpler than the alternatives.

Setup Conda

Conda is an open-source package management system for setting up environments irrespective of the OS. One of conda's major advantages is that it installs all of a package's dependencies automatically. Anaconda provides a conda installer with thousands of packages bundled in; for our case, we will use Miniconda, a minimal installer for conda.

cd fastsetup
./setup-conda.sh

Note: This deviates from fastai's fastsetup setup-conda.sh script with respect to creating a .condarc file; my experiments have shown that file to be troublesome.

If you’ve worked with conda installs before, you’ll know they can be painfully slow. Mamba is a reimplementation of the conda package manager in C++, so we install that next.

source ~/.bashrc  # or source ~/.zshrc
conda install mamba -n base -c fastchan

We use fastchan as the channel source. Fastchan is an anaconda.org channel maintained by the fastai team; there is a detailed post on fastchan from Weights & Biases.

Don’t mix conda-forge with the fastai channels; stick to either fastchan or anaconda.

Install Nvidia

It is much faster to train ML models on a GPU, and currently only Nvidia GPUs are supported. Since our instance has a GPU, we need to install its drivers and activate it.

Use this command to list available drivers: ubuntu-drivers devices

Tip: Choose the “recommended” option, plus the -server suffix.

When you install, “470” might be a different number, depending on the ubuntu-drivers output above:

sudo apt-fast install -y nvidia-driver-470-server
sudo modprobe nvidia
nvidia-smi

When prompted for a password, enter the one you set while running the Ubuntu setup script.

Note: The command installs CUDA 11.4. This is not to be confused with cudatoolkit=11.1, which is needed for pytorch=1.9. Also, each PyTorch conda install comes with its own cuDNN runtime, so installing cuDNN separately is not needed. CUDA 11.4 only comes into play if PyTorch is built from source.

Create conda environment

It is good practice to install the necessary packages in a new conda environment; installing everything into the conda base environment is not advisable.

conda create -n <envname> -y  # for example:
conda create -n mlenv -y
conda activate mlenv

Install fastbook with all its dependencies, including CUDA-enabled PyTorch libraries

Now you’re ready to install all the packages needed for the fast.ai course. Make sure there is enough space to install (df -h); you need about 15GB.

mamba install fastbook python=3.8 -c fastai -c fastchan -y
conda install pytorch torchaudio torchvision python=3.8 cudatoolkit=11.1 -c fastchan -y

Fastbook is fastai’s book on using fastai for ML development. As a Python package, it installs all the dependent packages. To see what it installs, remove -y from the command.

Previously, one command (mamba install fastbook) was enough to install everything. However, there now seem to be package conflicts when it is executed; hence the additional conda install step, which upgrades pytorch and torchvision to the latest GPU versions.

Tip: For a dry run, use the ‘-d’ argument in the mamba install command.

Note: Installing fastbook doesn’t mean you must use fastai; it just installs everything needed for ML tasks. If you watch carefully while installing, it is the PyTorch and CUDA packages that require the huge disk space; the other packages come to just under 2 GB.

Sanity checks:

These checks verify that CUDA-enabled PyTorch is indeed installed correctly. The first two commands run in the shell; the rest run inside a Python shell.

$ which python
/home/ubuntu/miniconda3/envs/mlenv/bin/python
$ python --version
Python 3.8.5
$ python
>>> import torch
>>> torch.version.cuda
'11.1'
>>> torch.cuda.is_available()
True
>>> torch.cuda.device_count()
1
>>> torch.cuda.current_device()
0
>>> torch.cuda.get_device_name(0)
'Tesla T4'

If the commands return the same results, then we’re ready for ML development.
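Beyond detection, it is worth running one tiny computation on the GPU to confirm the whole stack works end to end; a one-liner sketch from the shell:

python -c "import torch; print(torch.ones(2).cuda() * 2)"

If it prints tensor([2., 2.], device='cuda:0'), the driver, the CUDA runtime, and PyTorch are all talking to each other.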

Run jupyter notebook

To download the notebooks, run:

cd
git clone https://github.com/fastai/fastbook
cd fastbook
jupyter notebook

Click on the localhost URL that is displayed; it will open the Jupyter notebook interface in your default browser. Alternatively, the link can be copied and opened in any browser of your choice.
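The notebook server dies with the SSH session, so for long-running work it can help to keep it alive in the background; a minimal sketch using nohup (tmux or screen work just as well):

nohup jupyter notebook --no-browser --port 8888 > jupyter.log 2>&1 &
# list the running servers along with their access tokens
jupyter notebook list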

Other Jupyter notebook guides require setting a password for access. So far I’ve not needed those steps for my tasks.

(Optional) Email Setup:

To set up email:

sudo ./opensmtpd-install.sh

To test email, create a text file msg containing a message to send, then send it with:

cat msg | mail -r "x@$(hostname -d)" -s 'subject' EMAIL_ADDR

Replace EMAIL_ADDR with an address to send to. You can get a useful testing address from mail-tester.

Stopping the instance

After you have finished your work, it is critical to stop the instance (unless a training run is still going) if you wish to keep your AWS bill down.

Use either the EC2 dashboard to stop the instance (Instance -> Actions -> Instance state -> Stop) or run this command in the terminal:

sudo shutdown -h now
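Stopping can also be done from your local terminal with the AWS CLI; a sketch assuming a placeholder instance ID:

# look up the instance ID and state if you don't have them handy
aws ec2 describe-instances --region us-east-1 \
    --query "Reservations[].Instances[].[InstanceId,State.Name]" --output table

# stop (not terminate!) the instance
aws ec2 stop-instances --region us-east-1 --instance-ids <instance-id>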

Note: Pressing Terminate will remove the instance completely, and your work will be lost forever.

Wrapping up

There it is: an instance that is ready for ML development. If you have questions or feedback, reach me through the comments or via Twitter.