
Running Small LLMs on the Raspberry Pi 5

Pseudo Raspberry Pi 5

“Small” LLMs

PHI-2 is an LLM (Large Language Model) created by Microsoft with 2.7 billion parameters. It is part of Microsoft’s efforts to develop language models that can be used in a variety of applications, from text generation to natural language understanding, and it has been trained on an enormous amount of text so that it can understand and generate human-like content coherently.

Although 2.7 billion parameters may sound like a lot, this is actually a small model within the LLM category. In fact, its performance has proven to be very competitive with 7B and 13B models such as Llama 2, and benchmarks show that PHI-2 has the potential to compete with Google’s Gemini Nano model.

Applications in IoT and Mobile Computing

The idea is that models of this size can run directly on IoT devices and mobile phones.

About llama.cpp

llama.cpp is an inference engine for large language models developed by the free-software community. It provides an efficient, easy-to-use interface for working with these models and is designed to run them on hardware with limited resources, which is exactly what makes executing PHI-2 on a Raspberry Pi 5 feasible.
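
To make this concrete, here is a minimal sketch of driving llama.cpp from Python through the llama-cpp-python bindings. The model file name is an assumption; point it at whatever quantized PHI-2 GGUF file you have downloaded.

```python
# Minimal sketch: running PHI-2 through the llama-cpp-python bindings,
# which wrap llama.cpp. The model file name below is an assumption --
# use the path of the quantized GGUF file you actually downloaded.
from llama_cpp import Llama

llm = Llama(
    model_path="./phi-2.Q5_K_M.gguf",  # hypothetical 5-bit quantized PHI-2
    n_ctx=2048,    # context window
    n_threads=4,   # the Raspberry Pi 5 has four Cortex-A76 cores
)

out = llm(
    "Instruct: Explain what a Raspberry Pi is in one sentence.\nOutput:",
    max_tokens=64,
)
print(out["choices"][0]["text"])
```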

Tests on the Raspberry Pi 5

Recently, I had the opportunity to test the performance of PHI-2 on my new Raspberry Pi 5, and to my surprise, the results were much better than I expected.

For the test, I used a quantized version of PHI-2 (with 5-bit weights) and the well-known llama.cpp engine.

As you can see in the following GIF, the model generates almost one token per second.

PHI-2 on Raspberry Pi 5
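
If you want a number instead of eyeballing a GIF, the bindings report token usage in the completion response, so a rough throughput measurement can look like the following sketch (again, the model file name is an assumption):

```python
# Rough throughput measurement: the completion response includes token
# counts, so tokens per second falls out of a simple wall-clock timing.
import time

from llama_cpp import Llama

llm = Llama(model_path="./phi-2.Q5_K_M.gguf", n_ctx=2048, n_threads=4)

start = time.time()
out = llm("Instruct: Explain edge computing briefly.\nOutput:", max_tokens=64)
elapsed = time.time() - start

generated = out["usage"]["completion_tokens"]
print(f"{generated} tokens in {elapsed:.1f}s -> {generated / elapsed:.2f} tok/s")
```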

Potential for Edge Computing

Imagine the potential this has for edge computing. As a first practical use case, I can think of analyzing blocks of logs to detect anomalous behavior, as in the sketch below.
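
As an illustrative sketch of that idea (the log lines and the prompt wording here are invented for the example), you could feed a block of logs to the local model and ask it to flag anything suspicious:

```python
# Illustrative sketch: asking a local PHI-2 instance to flag anomalous
# log lines. The log lines and the prompt wording are invented examples.
from llama_cpp import Llama

llm = Llama(model_path="./phi-2.Q5_K_M.gguf", n_ctx=2048, n_threads=4)

logs = """\
sshd[812]: Accepted publickey for admin from 10.0.0.5
sshd[813]: Failed password for root from 203.0.113.7
sshd[814]: Failed password for root from 203.0.113.7
kernel: eth0: link up, 1000 Mbps
"""

prompt = (
    "Instruct: Review these log lines and list any that look anomalous "
    "or suspicious, with a short reason for each.\n"
    f"{logs}"
    "Output:"
)

out = llm(prompt, max_tokens=128)
print(out["choices"][0]["text"])
```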

Another interesting application would be the use of simplified TimeLLM models for time series analysis.

Conclusions

In summary, there is a lot of potential here to explore and develop during 2024.