The ability to run small models on SBCs is an amazing opportunity to learn about Large Language Models (LLMs) at low cost.

Machine Learning Compilation (mlc.ai) has done a tremendous job of making it easy to run LLMs on ARM, and with the release of Llama 3, we get the opportunity to run the model on such a development board.

In this post, we'll install and use MLC LLM to chat with Llama 3 through its Python API. Prerequisite: the OpenCL Mali drivers must be installed on the machine (instructions can be found here). Don't forget to reboot before proceeding with this tutorial.
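Before going further, you can check that the Mali GPU is visible to OpenCL with clinfo (an optional sanity check, assuming the driver registered an OpenCL platform):

sudo apt-get install clinfo
clinfo | grep -i mali

Then install the system libraries the build depends on: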

sudo apt-get update
sudo apt-get install libgl1-mesa-glx libegl1-mesa libxrandr2 libxss1 libxcursor1 libxcomposite1 libasound2 libxi6 libxtst6 cargo

Install the Anaconda data science distribution. Download Anaconda's installer:

curl -O https://repo.anaconda.com/archive/Anaconda3-2023.09-0-Linux-aarch64.sh
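Optionally, verify the download against the SHA-256 hash published on repo.anaconda.com for this installer:

sha256sum Anaconda3-2023.09-0-Linux-aarch64.sh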

Execute installer:

bash Anaconda3-2023.09-0-Linux-aarch64.sh

Edit .bashrc to initialise Anaconda on startup/SSH sessions:

nano ~/.bashrc

And add the following:

# >>> conda initialize >>>
# !! Contents within this block are managed by 'conda init' !!
__conda_setup="$('/home/pathto/anaconda3/bin/conda' 'shell.bash' 'hook' 2> /dev/null)"
if [ $? -eq 0 ]; then
    eval "$__conda_setup"
else
    if [ -f "/home/pathto/anaconda3/etc/profile.d/conda.sh" ]; then
        . "/home/pathto/anaconda3/etc/profile.d/conda.sh"
    else
        export PATH="/home/pathto/anaconda3/bin:$PATH"
    fi
fi
unset __conda_setup
# <<< conda initialize <<<
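Reload the configuration so the change takes effect in the current session:

source ~/.bashrc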

Choose whether conda should activate its base environment automatically:

conda config --set auto_activate_base True

Or:

conda config --set auto_activate_base False
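You can confirm the current setting with:

conda config --show auto_activate_base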

Create and activate our conda environment:

conda create -n mlc-chat-venv -c conda-forge "llvmdev>=15" "cmake>=3.24" git
conda activate mlc-chat-venv

Download and build MLC LLM:

git clone --recursive https://github.com/mlc-ai/mlc-llm.git && cd mlc-llm/

Download the Llama 3 8B library and model weights. Note that access to Llama models is gated: you first need to request access to them and log in with your credentials.

git lfs install
mkdir -p dist/prebuilt && cd dist/prebuilt
git clone https://github.com/mlc-ai/binary-mlc-llm-libs.git lib
git clone https://huggingface.co/mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC
cd ../..
mkdir -p build && cd build
python3 ../cmake/gen_cmake_config.py

Press Enter for the first question, then answer n to every prompt except Use OpenCL? (y/n), where we answer y. Build from source:

cmake .. && cmake --build . --parallel $(nproc) && cd ..

Install the necessary Python packages, found inside the python folder:

pip install numpy psutil
cd python
pip install -e . && cd ../

Still inside the mlc-llm directory, install MLC's version of TVM:

git clone --recursive https://github.com/mlc-ai/relax.git tvm_unity && cd tvm_unity/
mkdir -p build && cd build
cp ../cmake/config.cmake .
echo "set(CMAKE_BUILD_TYPE RelWithDebInfo)" >> config.cmake
echo "set(USE_LLVM \"llvm-config --ignore-libllvm --link-static\")" >> config.cmake
echo "set(HIDE_PRIVATE_SYMBOLS ON)" >> config.cmake
echo "set(USE_OPENCL ON)" >> config.cmake
cmake .. && cmake --build . --target runtime --parallel $(nproc) && cd ../..
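If the build succeeded, the runtime library should now exist (a quick check, assuming the default build layout):

ls tvm_unity/build/libtvm_runtime.so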

And while still inside the mlc-llm directory, set the environment variables:

export TVM_HOME=$(pwd)/tvm_unity
export MLC_LLM_HOME=$(pwd)
export PYTHONPATH=$TVM_HOME/python:$MLC_LLM_HOME/python:${PYTHONPATH}
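As an optional sanity check (not part of the original instructions), the package should now import from the paths we just exported; if this fails, re-check TVM_HOME and PYTHONPATH:

python3 -c "from mlc_llm import ChatModule; print('mlc_llm OK')"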

Run Python and send your first prompt to Llama 3:

python3
# Import module to chat with model
from mlc_llm import ChatModule
from mlc_llm.callback import StreamToStdout

# Select compiled model, its library for MLC LLM and the used platform
cm = ChatModule(
    model="dist/prebuilt/Llama-3-8B-Instruct-q4f16_1-MLC",
    model_lib_path="dist/prebuilt/lib/Llama-3-8b-Instruct/Llama-3-8B-Instruct-q4f16_1-mali.so",
    device="opencl"
)

# Prompt to text generation
cm.generate(prompt="How was the first computing language invented?", progress_callback=StreamToStdout(callback_interval=2))
# Prompt to text generation speed: prefill tok/s and decode tok/s
print(f"Statistics: {cm.stats()}\n")

Create a script that activates the conda environment, gives Python the right paths to your installation, and launches Python: nano start.sh

#!/bin/bash

# Make 'conda activate' available in a non-interactive shell
source /home/pathto/anaconda3/etc/profile.d/conda.sh
conda activate mlc-chat-venv
# Run from inside the mlc-llm directory so these paths resolve
export TVM_HOME=$(pwd)/tvm_unity
export MLC_LLM_HOME=$(pwd)
export PYTHONPATH=$TVM_HOME/python:$MLC_LLM_HOME/python:${PYTHONPATH}
python3
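Make the script executable and run it from inside the mlc-llm directory (the $(pwd)-based paths depend on it):

chmod +x start.sh
./start.sh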
