Software Infrastructure


There are many ways to get a job done in Python: different IDEs (Colab, VS Code, PyCharm, Spyder, etc.), different libraries (NumPy, PyTorch, JAX, etc.), and different hardware (CPU, GPU, TPU, parallel CPU/GPU/TPU, etc.). This brief overview should help you find the best setup given your hardware and operating system.

GPU vs CPU#

If your hardware includes a CUDA GPU (see "How to know if my GPU supports CUDA"), then you should probably use it. Note that MacBooks do not have CUDA GPUs, and Windows machines require a somewhat different setup than the one we use in the exercises (e.g. "Installing Pytorch with CUDA support on Windows 10"). Our setup during the exercises is based on Ubuntu 22.04.

If you do not have a CUDA GPU, you should consider using Google Colab with a GPU runtime, see "How to use Colab". On a MacBook you have access to the Apple GPU, which PyTorch supports through its MPS (Metal Performance Shaders) backend. Follow the instructions by Apple and the PyTorch team to install the right PyTorch version on your MacBook. After the installation has completed, check that PyTorch utilizes the Apple GPU correctly by performing this test.

Why GPU?#

By moving the core computations to the GPU, the training of deep learning models can be accelerated by more than one order of magnitude.
With PyTorch this is relatively straightforward to implement. You will often see lines like `device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')` and `inputs, labels = data[0].to(device), data[1].to(device)`.
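The device selection above can be extended to also cover the Apple GPU. The following is a minimal sketch, not the course's official setup: the helper names `pick_device_name` and `detect_device_name` are hypothetical, and the fallback order (CUDA, then MPS, then CPU) is an assumption. It only relies on `torch.cuda.is_available()` and `torch.backends.mps.is_available()`, and it falls back to `'cpu'` if PyTorch is not installed.

```python
import importlib.util


def pick_device_name(cuda_available: bool, mps_available: bool) -> str:
    """Return a device string, preferring CUDA, then the Apple GPU, then the CPU."""
    if cuda_available:
        return "cuda:0"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device_name() -> str:
    """Ask PyTorch which accelerators are usable on this machine."""
    if importlib.util.find_spec("torch") is None:
        # PyTorch is not installed, so only the CPU can be reported.
        return "cpu"
    import torch

    # torch.backends.mps is missing in old PyTorch versions, hence getattr.
    mps_backend = getattr(torch.backends, "mps", None)
    mps_available = mps_backend is not None and mps_backend.is_available()
    return pick_device_name(torch.cuda.is_available(), mps_available)


device_name = detect_device_name()
print(f"Selected device: {device_name}")
```

In a training script, the resulting string can be passed to `torch.device(device_name)` and then used with `.to(device)` exactly as in the lines quoted above.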

Windows vs Unix (Linux, macOS)#

One of the main disadvantages of Windows in the scope of this class is that the ML community has generally agreed to use Unix, most often Linux (if you are interested in looking into Linux, try Ubuntu 22.04). This means that a lot of the command-line code on Stack Overflow is tested on Linux, and you most probably want to profit from this knowledge base. A few things need special care when working on Windows, and it would be your responsibility to take care of them.

IDEs#

When to use Colab?

For quick-and-dirty testing of some functionality
For demonstration purposes
If you don’t have a CUDA GPU

When not to use Colab?

Sessions expire after some time, see Google Colab session timeout. This means that any files other than the notebook itself will be deleted forever.
The hardware you get for free is not the best one. If you have better hardware, don’t use Colab.
If you want to work on a long-term project, you would need to set up the Colab environment from scratch every time. In contrast, when working locally you could use a virtual environment which you set up once in the beginning.
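Because a fresh Colab session starts without your packages, the repeated setup is often put into the first notebook cell. The sketch below illustrates this under stated assumptions: checking `"google.colab" in sys.modules` is a commonly used heuristic rather than a documented guarantee, the helper names are hypothetical, and the `requirements.txt` file is assumed to have been uploaded or cloned into the session beforehand.

```python
import os
import subprocess
import sys


def in_colab() -> bool:
    """Heuristic: Colab preloads the google.colab module into its kernels."""
    return "google.colab" in sys.modules


def setup_command(requirements: str = "requirements.txt") -> list:
    """Build the pip command that installs the project's dependencies."""
    return [sys.executable, "-m", "pip", "install", "-r", requirements]


# Only reinstall on Colab, and only if the requirements file is present.
if in_colab() and os.path.exists("requirements.txt"):
    subprocess.run(setup_command(), check=True)
else:
    print("Not on Colab (or no requirements.txt found); nothing to install.")
```

Locally the cell does nothing, so the same notebook can be used in both settings.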

Local IDEs (as opposed to the web IDE Google Colab)#

Just pick any of the IDEs mentioned above: VS Code, PyCharm, Spyder, etc. They have more or less the same functionality, and the choice is mainly a matter of personal preference rather than software constraints.

Environments#

There are two very popular ways to manage Python environments:

Conda (Miniconda or Anaconda) - If you see a repository whose dependencies are described in an environment.yml file, then conda was used to create the environment

conda env create --file environment.yml # creates a conda environment from that .yml file
conda activate <ENV_NAME> # this name is typically in the top-most line of the .yml file

Pip (together with venv) - If you see a repository whose dependencies are described in a requirements.txt file, then pip was used to install the packages into the environment

python3 -m venv <ENV_NAME>
source <ENV_NAME>/bin/activate
pip install -r requirements.txt

There are many tutorials on the internet on how to set up such an environment. Check them for yourself.

Why a virtual environment?#

Your operating system has its own Python installation, which you probably don’t want to burden with the highly specific packages you will be using throughout this course.
If you work on many projects and each of them requires different libraries, or even worse, different versions of the same library, then you should use one virtual environment per project.
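To see whether the interpreter you are running belongs to a virtual environment or to the system installation, a small check like the following can help. This is a minimal sketch with a hypothetical helper name: it assumes that a venv is recognizable by `sys.prefix` differing from `sys.base_prefix`, and that an activated conda environment sets the `CONDA_DEFAULT_ENV` variable.

```python
import os
import sys


def active_environment():
    """Return a (kind, identifier) pair describing the running interpreter."""
    if sys.prefix != sys.base_prefix:
        # Inside a venv, sys.prefix points at the environment directory.
        return ("venv", sys.prefix)
    conda_env = os.environ.get("CONDA_DEFAULT_ENV")
    if conda_env:
        # Set by `conda activate`; contains the environment name.
        return ("conda", conda_env)
    return ("system", sys.prefix)


kind, identifier = active_environment()
print(f"Running in a {kind} environment: {identifier}")
```

If this reports `system` right before you install course packages, the environment has most likely not been activated yet.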

`.py` vs `.ipynb`#

Python files (.py) offer more flexibility, powerful debugging options, OOP, etc. They should be the preferred choice. Jupyter notebooks (.ipynb) are designed to be used as self-contained projects or for demonstration purposes. In this course, all exercises are in separate notebooks, and the practical part of the exam will also be a notebook.

Using Notebooks#

If you work on a Linux or macOS device, then you should be able to run notebooks in the same IDE you use for coding, e.g. VS Code.

However, on Windows this might not work for multiple reasons and, as noted above, we do not provide support for running code on Windows.

Notebooks on Windows

I didn’t find a way to work with Jupyter notebooks in Spyder on Windows. There is one hint on the how to pages that t

The Spyder add-on for working with Jupyter notebooks doesn’t work with standalone Windows versions. To work with the Jupyter notebook in the browser instead (see that), follow the steps below:

Open Anaconda Navigator.
Go to Environments and select the sciml environment
Right-click on the environment and open the terminal
Run `ipython kernel install --user --name=sciml`, keeping `--user` exactly as written
Open Jupyter in any browser and select the sciml environment as the kernel
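After selecting the kernel, it can be worth confirming that the notebook really runs on the interpreter of the sciml environment. The cell below is a minimal sketch: the helper name `kernel_matches` is hypothetical, and it assumes that the environment name appears as a folder in the interpreter path, which is typical for conda environments but not guaranteed.

```python
import sys
from pathlib import Path


def kernel_matches(env_name: str, executable: str = sys.executable) -> bool:
    """Check whether env_name is one of the folders in the interpreter path."""
    return env_name in Path(executable).parts


print(f"Interpreter used by this kernel: {sys.executable}")
print(f"Looks like the sciml environment: {kernel_matches('sciml')}")
```

If the second line prints `False`, running `jupyter kernelspec list` in the terminal shows which kernels are registered.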