Setting up AWS EC2 instance for ML
A guide to setting up an AWS EC2 GPU instance for machine learning tasks.
- Fast.ai
- AWS EC2
- Launch instance
- Connect to instance
- Setup Ubuntu server
- Setup Conda
- Install Nvidia
- Create conda environment
- Install fastbook with all its dependencies including CUDA enabled pytorch libraries
- Sanity checks:
- Run jupyter notebook
- (Optional) Email Setup:
- Stopping the instance
- Wrapping up
Amazon Web Services (AWS) offers a great many services to suit user needs. For ML/DL, all one needs is a computer with access to a Graphics Processing Unit (GPU). Since AWS offers a virtual PC with a GPU at a reasonable price, we will use their service. This blog post walks through setting up an AWS EC2 instance so that it is ready for running ML tasks.
Pre-requisites: This guide expects you to be familiar with the Linux (Ubuntu) environment and to have an activated AWS account.
Fast.ai
We will use the fastai library to build our ML models. Fastai is an open source software library for deep learning. Developing ML models with fastai is much easier than using PyTorch alone.
Note: This post is similar to the fastai’s guide for AWS setup but with added commentary and updated final instructions.
AWS EC2
AWS EC2 is a service that provides a server PC for our use. It comes with templates called Amazon Machine Images (AMIs) that make it possible to launch a virtual cloud server in less than two minutes.
For our ML purposes, we will use a g4dn.xlarge EC2 on-demand instance. This particular instance has a 16GB Nvidia T4 GPU. For more info on instances, check out this page at AWS.
Note: Servers in regions such as North Virginia (N. Virginia) and Ohio offer the cheapest prices for g4dn instances, starting at $0.526 per hour. (Each region is further divided into availability zones (AZs).) Check out the pricing here.
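To get a feel for what that hourly rate means in practice, here is a back-of-envelope estimate. The 8 hours/day and 30 days/month figures are purely illustrative assumptions, not from AWS:

```shell
# Rough monthly cost estimate for a g4dn.xlarge at $0.526/hr.
# hours_per_day and days_per_month are illustrative assumptions.
rate=0.526
hours_per_day=8
days_per_month=30
monthly_cost=$(awk -v r="$rate" -v h="$hours_per_day" -v d="$days_per_month" \
    'BEGIN { printf "%.2f", r * h * d }')
echo "Estimated monthly on-demand cost: \$$monthly_cost"
```

At 8 hours a day that works out to roughly $126 a month, which is why the "Stopping the instance" section at the end matters.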
Requesting service limit increase
To prevent misuse of GPUs, Amazon restricts their usage by default. However, we can request a service limit increase with a support ticket. It is important to remember that the service limit needs to be increased for each region separately.
The number of vCPUs an instance has is a good measure of its capacity. In this case, request 16 vCPUs (up to g4dn.4xlarge) in the support ticket.
Follow this step to get access to a GPU.
Tip: Usually a description of ‘to use for ML/DL course and training models’ would be an acceptable reason for approval.
Import key pair to AWS EC2 region
Use this Amazon guide to generate and import an RSA key pair into an AWS EC2 region. It is important to store the private key in a secure directory.
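The guide's steps can be sketched on the command line as well. This is only a sketch: the key name "mykey", the file path, and the region below are placeholders I made up, and it assumes you have the AWS CLI configured with credentials:

```shell
# Generate an RSA key pair locally (skipped if the file already exists),
# then import the public half into a region. Names and region are placeholders.
keyfile="$HOME/.ssh/myprivatekey.pem"
mkdir -p "$HOME/.ssh"
[ -f "$keyfile" ] || ssh-keygen -t rsa -b 4096 -m PEM -N "" -f "$keyfile"
chmod 400 "$keyfile"   # private key readable only by you
if command -v aws >/dev/null 2>&1; then
    aws ec2 import-key-pair --key-name mykey --region us-east-1 \
        --public-key-material "fileb://$keyfile.pub"
else
    echo "aws CLI not installed; import $keyfile.pub via the EC2 console instead"
fi
```

Remember that, like service limits, key pairs are per region: a key imported into us-east-1 is not visible in us-east-2.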
Launch instance
Use this fastai step to launch a g4dn.xlarge instance.
Connect to instance
Using the public IP of the instance and the RSA private key, we can log in to our instance.
Login using
ssh -i <path-to-pem-file> ubuntu@<ip-address>
The username is ubuntu for Ubuntu EC2 instances, ec2-user for Amazon Linux AMIs, and so on. We will be using an Ubuntu EC2 g4dn instance.
Example:
ssh -i ~/.ssh/myprivatekey.pem ubuntu@<ip>
You may be prompted about trusting this address, to which you should reply ‘yes’.
Solution to possible shell environment problems: In some cases, if the host terminal uses a different TERM environment such as xterm-kitty (check with echo $TERM), that environment is carried over to the remote EC2 instance. If that is the case, it is best to use this command:
TERM='xterm-256color' ssh -i <path-to-pem-file> ubuntu@<ip-address>
Alternatively, you can add the TERM env to the bashrc file:
echo 'export TERM=xterm-256color' >> ~/.bashrc
source ~/.bashrc
Why set the TERM env at all? Because the remote host has only a basic environment set, and keystrokes such as recalling the previously run command (up arrow) would otherwise return scrambled output on the screen.
Setup Ubuntu server
First, to do basic Ubuntu configuration, such as updating packages and turning on auto-updates, execute:
sudo apt update && sudo apt -y install git
git clone https://github.com/dr563105/fastsetup.git
cd fastsetup
sudo ./ubuntu-initial.sh
The setup shell script creates a new user inside the cloud server, installs/updates the necessary packages and libraries, installs a firewall, and sets up ssh.
Note: This new user creation is not to be confused with an IAM user created by the AWS root user.
Reboot when prompted. Wait a couple of minutes for the reboot, then ssh back in.
To reconnect using ssh, add an additional -L flag which will allow you to open up ports to connect to Jupyter Notebook once it’s installed:
ssh -i <path-to-pem-file> -L localhost:8888:localhost:8888 ubuntu@<ip-address>
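That ssh command gets tedious to retype. One option is an ~/.ssh/config entry that bundles the user, key, and port forward under one alias. The alias "ml-box" is made up, and the placeholders must be edited to match your setup:

```shell
# Hypothetical ~/.ssh/config entry; "ml-box" and the placeholders are made up.
# The single-quoted heredoc keeps <ip-address> literal until you edit it.
mkdir -p "$HOME/.ssh"
cat >> "$HOME/.ssh/config" <<'EOF'
Host ml-box
    HostName <ip-address>
    User ubuntu
    IdentityFile ~/.ssh/myprivatekey.pem
    LocalForward 8888 localhost:8888
EOF
```

After editing in the real IP and key path, `ssh ml-box` logs in with the Jupyter port forward already in place. Note that the public IP changes each time a stopped instance is started, so HostName needs updating after each stop/start cycle.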
(Optional): Change default shell to ZSH with Oh My ZSH:
sh -c "$(curl -fsSL https://raw.githubusercontent.com/ohmyzsh/ohmyzsh/master/tools/install.sh)" "" --unattended
Change default SHELL to ZSH using
sudo chsh $USER -s /bin/zsh
When prompted for password, use the password entered at the user creation step during the setup process.
Log out and log in again to activate the ZSH shell. It is possible to switch without disconnecting, but I find this step simpler than other solutions.
Setup Conda
Conda is an open source package management system for setting up environments irrespective of the OS. One of its major advantages is that it easily installs all the dependencies for a package. Anaconda provides a conda installer with thousands of packages; for our case, however, we will use Miniconda, a minimal installer of conda.
cd fastsetup
./setup-conda.sh
Note: Deviates from fastai's fastsetup setup-conda.sh script in creating a .condarc file. My experiments have shown this file to be troublesome.
If you've worked with conda installs before, you'll know that they can be painfully slow. Mamba is a reimplementation of the conda package manager in C++, so we install that next.
source ~/.bashrc (or source ~/.zshrc)
conda install mamba -n base -c fastchan
We use fastchan as the channel source. fastchan is an anaconda package channel from the fastai team. A detailed post on fastchan is available from Weights and Biases.
Don't mix conda-forge with fastai channels. Stick to either fastchan or anaconda.
Install Nvidia
It is much faster to train ML models on a GPU, and currently only Nvidia GPUs are supported. Since our instance has a GPU, we need to install its drivers and activate it.
Use this command to list available drivers: ubuntu-drivers devices
Tip: Choose the "recommended" option, plus the -server suffix.
When you install, "470" might be a different number, based on the ubuntu-drivers output above:
sudo apt-fast install -y nvidia-driver-470-server
sudo modprobe nvidia
nvidia-smi
When prompted for a password, enter the password you entered while you ran the ubuntu setup script.
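If you want to script this verification rather than eyeball the nvidia-smi output, a small guarded check like the following works; it is just a sketch and degrades gracefully on machines without a driver:

```shell
# Check that the Nvidia driver is loaded; fall back to a message if it isn't.
if command -v nvidia-smi >/dev/null 2>&1; then
    gpu_status=$(nvidia-smi --query-gpu=name,memory.total --format=csv,noheader)
else
    gpu_status="nvidia-smi not found; the driver install may have failed"
fi
echo "$gpu_status"
```

On the g4dn.xlarge this should report a Tesla T4 with roughly 16GB of memory.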
Note: The command installs cuda 11.4. This is not to be confused with cudatoolkit=11.1, which is needed for pytorch=1.9. Also, each pytorch conda install comes with its own cuDNN runtime, so installing cuDNN separately is not needed. cuda 11.4 comes into play if pytorch is built from source.
Create conda environment
It is good practice to install the necessary packages in a new conda environment; installing everything in the conda base environment is not advisable.
conda create -n <envname> -y
For example:
conda create -n mlenv -y
conda activate mlenv
Install fastbook with all its dependencies including CUDA enabled pytorch libraries
Now you're ready to install all the needed packages for the fast.ai course. Make sure there is enough space for the install (df -h); you need about 15GB of free space:
mamba install fastbook python=3.8 -c fastai -c fastchan -y
conda install pytorch torchaudio torchvision python=3.8 cudatoolkit=11.1 -c fastchan -y
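The df -h check mentioned above can also be scripted, so a long install doesn't die halfway through on a full disk. A sketch, assuming GNU coreutils df (standard on Ubuntu):

```shell
# Verify at least 15GB free in the home filesystem before installing.
need_gb=15
avail_kb=$(df --output=avail -k "$HOME" | tail -n 1 | tr -d ' ')
if [ "$avail_kb" -ge $((need_gb * 1024 * 1024)) ]; then
    space_ok="yes"
else
    space_ok="no"
fi
echo "Enough space for the install: $space_ok"
```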
Fastbook is fastai's book on using fastai for ML development. As a python package, it installs all the dependent packages. To see what it installs, remove -y from the command.
Previously, a single command (mamba install fastbook) was enough to install everything. However, there seem to be package conflicts when it is executed now; hence the additional conda install step, which upgrades pytorch and torchvision to the latest GPU version.
Tip: For a dry run, use the -d argument in the mamba install command.
Note: Installing fastbook doesn't mean you must use fastai. It just installs everything needed for ML tasks. If you watch closely while installing, it is the Pytorch and Cuda packages that require huge disk space; the other packages come in at just under 2GB.
Sanity checks:
These checks verify that CUDA-enabled pytorch is indeed installed correctly. Run the first two commands in the terminal, then the rest inside a python shell.
$ which python
/home/ubuntu/miniconda3/envs/mlenv/bin/python
$ python --version
Python 3.8.5
$ python
>>> import torch
>>> torch.version.cuda
'11.1'
>>> torch.cuda.is_available()
True
>>> torch.cuda.device_count()
1
>>> torch.cuda.current_device()
0
>>> torch.cuda.get_device_name(0)
'Tesla T4'
If the commands return the same results, then we’re ready for ML development.
Run jupyter notebook
To download the notebooks, run:
cd
git clone https://github.com/fastai/fastbook
cd fastbook
jupyter notebook
Click on the localhost URL that is displayed. It will open the Jupyter notebook in your default browser. Alternatively, the link can be copied and opened in any browser of your choice.
There are other Jupyter notebook guides that require adding a password for access. So far I've not needed those steps for my tasks.
(Optional) Email Setup:
To set up email:
sudo ./opensmtpd-install.sh
To test email, create a text file msg containing a message to send, then send it with:
cat msg | mail -r "x@$(hostname -d)" -s 'subject' EMAIL_ADDR
Replace EMAIL_ADDR with an address to send to. You can get a useful testing address from mail-tester.
Stopping the instance
After you have finished your work, it is critical to stop the instance (unless a training run is still going) if you wish to keep your AWS bill down.
Use either the EC2 dashboard to stop the instance (Instance -> Actions -> Instance state -> Stop) or run this command in the instance's terminal:
sudo shutdown -h now
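A third option is stopping the instance from your local machine with the AWS CLI. This is a sketch assuming the CLI is configured; the instance ID below is a made-up placeholder you would replace with your own (visible in the EC2 dashboard):

```shell
# Stop a running instance by ID from your local machine.
# The instance ID is a hypothetical placeholder.
instance_id="i-0123456789abcdef0"
if command -v aws >/dev/null 2>&1; then
    aws ec2 stop-instances --instance-ids "$instance_id" \
        || echo "stop request failed; check credentials and instance ID"
else
    echo "aws CLI not installed; stop the instance from the EC2 dashboard"
fi
```

Stopping (as opposed to terminating) preserves the EBS volume, so your environment and notebooks are intact the next time you start the instance.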
Note: Pressing Terminate will remove the instance completely, and your work will be lost forever.
Wrapping up
There it is: an instance that is ready for ML development. If you have questions or feedback, reach me through the comments or via Twitter.