The ability to run small models on single-board computers (SBCs) is a great opportunity to learn about Large Language Models (LLMs) at low cost.

We saw previously that MLC LLM can run the Llama 3 8B model through its Python API. While the project has added many prominent models to its deployment stack, some newer models still lack the library needed to run on Mali GPUs such as the Orange Pi 5's RK3588S.

In this post we will set up MLC LLM from source and use it to produce the necessary library (*-opencl.so) for any model that lacks one.

Create a conda environment and activate it:

conda create -n tvm-build-venv -c conda-forge "llvmdev>=15" "cmake>=3.24" git
conda activate tvm-build-venv

Install dependencies:

sudo apt-get install -y python3 python3-dev python3-setuptools gcc libtinfo-dev zlib1g-dev build-essential cmake libedit-dev libxml2-dev

Create a folder that will house both MLC's fork of TVM and a copy of MLC LLM used only to build libraries:

mkdir build-mlc-llm && cd build-mlc-llm

We clone MLC LLM's fork of TVM from GitHub:

git clone --recursive https://github.com/mlc-ai/relax.git tvm_unity && cd tvm_unity/

The newest revision of MLC LLM's TVM fork does not work on the Orange Pi 5 (RK3588S) and ends up failing with the following error:

LOG(FATAL) << "RuntimeError: Memory verification failed with the following errors:\n

We need to downgrade to a more suitable version.
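As a sketch, assuming you have identified a known-good commit or tag of the fork (the placeholder below is hypothetical), check it out before configuring the build:

git checkout <known-good-commit>
git submodule update --init --recursive

With the sources at the desired revision, prepare the build directory and copy the default configuration: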

rm -rf build && mkdir build && cd build
cp ../cmake/config.cmake .

Add the following options to our build configuration:

echo "set(CMAKE_BUILD_TYPE RelWithDebInfo)" >> config.cmake
echo "set(USE_LLVM \"llvm-config --ignore-libllvm --link-static\")" >> config.cmake
echo "set(HIDE_PRIVATE_SYMBOLS ON)" >> config.cmake
echo "set(USE_OPENCL ON)" >> config.cmake

Build TVM:

cmake .. && cmake --build . --parallel $(nproc)

This will take some time to build (tens of minutes). Once it finishes, point Python in our conda environment at the freshly built TVM:

export PYTHONPATH=~/build-mlc-llm/tvm_unity/python:$PYTHONPATH
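To check that Python picks up the freshly built TVM and that the OpenCL runtime was compiled in, here is a quick sanity check (a sketch using TVM's standard device API):

python3 -c "import tvm; print(tvm.__file__)"
python3 -c "import tvm; print(tvm.opencl().exist)"

The second command should print True once the OpenCL-enabled build is on the path.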

Now clone MLC LLM itself:

cd ~/build-mlc-llm
git clone --recursive https://github.com/mlc-ai/mlc-llm.git && cd mlc-llm

Create the folders that will hold the compiled model weights and libraries in our main MLC LLM installation:

mkdir -p ~/mlc-llm/dist/prebuilt/lib

Use Hugging Face's user_name/model_name syntax to build the desired model's library. Beware that you must choose one of the models already compiled by MLC at https://huggingface.co/mlc-ai, as the Orange Pi 5 is not powerful enough to compile one itself. Also keep in mind that Hugging Face now requires you to request access to gated models and to authenticate with your username and an access token instead of your password; you will have to enter your credentials twice.
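To avoid retyping the token at every prompt, you can optionally let git cache your Hugging Face credentials (a sketch; note that the store helper keeps them in plain text in ~/.git-credentials):

git config --global credential.helper store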

sudo sh -c 'echo 1 > /proc/sys/vm/drop_caches'
python3 -m mlc_llm.build --hf-path user_name/model_name --target opencl --quantization q4f16_1

Once done, copy the resulting model-library-opencl.so to your main MLC LLM installation:

cp ~/build-mlc-llm/mlc-llm/dist/models/model-library-opencl.so ~/mlc-llm/dist/prebuilt/lib/

You might need to specify the path to your newly created library; in that case, add

--model-lib-path dist/prebuilt/lib/model-library-opencl.so

For example, let's say we want to run Mistral 7B. First we build the library for Mistral:

cd ~/build-mlc-llm/mlc-llm
python3 -m mlc_llm.build --hf-path mistralai/Mistral-7B-Instruct-v0.1 --target opencl --quantization q4f16_1

Then we copy the newly created library to our main MLC LLM installation:

cp ~/build-mlc-llm/mlc-llm/dist/Mistral-7B-Instruct-v0.1-q4f16_1/Mistral-7B-Instruct-v0.1-q4f16_1-opencl.so ~/mlc-llm/dist/prebuilt/lib/

After which we go into our main MLC LLM and download the MLC-adapted model:

cd ~/mlc-llm/dist/prebuilt
git lfs install
git clone https://huggingface.co/mlc-ai/mlc-chat-Mistral-7B-Instruct-v0.1-q4f16_1

Now we can run Mistral 7B from MLC LLM's root directory (~/mlc-llm).
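Here is a minimal sketch, assuming the same ChatModule Python API shown at the end of this post and the folder names produced by the clone and copy steps above (run it from ~/mlc-llm):

# Chat with the MLC-converted Mistral weights using the OpenCL library we built
from mlc_llm import ChatModule
from mlc_llm.callback import StreamToStdout

cm = ChatModule(
    model="dist/prebuilt/mlc-chat-Mistral-7B-Instruct-v0.1-q4f16_1",
    model_lib_path="dist/prebuilt/lib/Mistral-7B-Instruct-v0.1-q4f16_1-opencl.so",
    device="opencl"
)
cm.generate(prompt="Hello! What can you do?", progress_callback=StreamToStdout(callback_interval=2))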

Now that you have Mistral 7B working, you can also use OpenHermes 2.5. Download the model:

cd ~/mlc-llm/dist/prebuilt
git lfs install
git clone https://huggingface.co/mlc-ai/mlc-chat-OpenHermes-2.5-Mistral-7B-q4f16_1

And run it using the Mistral 7B library you just built (OpenHermes 2.5 is a fine-tune of Mistral 7B, so the same library works):

# Import module to chat with model
from mlc_llm import ChatModule
from mlc_llm.callback import StreamToStdout

# Select the compiled model, its library and the device (run from ~/mlc-llm so the relative paths resolve)
cm = ChatModule(
    model="dist/prebuilt/mlc-chat-OpenHermes-2.5-Mistral-7B-q4f16_1",
    model_lib_path="dist/prebuilt/lib/Mistral-7B-Instruct-v0.1-q4f16_1-opencl.so",
    device="opencl"
)

# Prompt to text generation
cm.generate(prompt="How was the first computing language invented?", progress_callback=StreamToStdout(callback_interval=2))
# Prompt to text generation speed: prefill tok/s and decode tok/s
print(f"Statistics: {cm.stats()}\n")

Side note: as in our previous post on using MLC LLM chat from Python, for ease of use you can create a bash script that enters the conda environment and exports the TVM Python path:

nano ~/build-mlc-llm/mlc-llm/start.sh

Add:

#!/bin/sh

conda activate tvm-build-venv
export PYTHONPATH=~/build-mlc-llm/tvm_unity/python:$PYTHONPATH

Run the script with source so that the conda activation and the PYTHONPATH export apply to your current shell:

source start.sh

