Another way to run a ChatGPT-like bot, thanks to an app on the Play Store and a framework that's one of the better things that happened to humanity - no seriously, everyone should know about it
October 04, 2024 #llama.cpp #A.I. #Chatbots #huggingface #termux #klog.website #Large Language Models #ollama #ChatGPT #Open Source Large Language Models #K_log Website

A video of me doing it - quickly, read the blog for more explanations!
This is running on a better phone than the one in the guide, and it has a different operating system and a different shell for looks - if you want to know more about that, let me know!
Setup
Okay so in this post we will use another system that you could also set up through Nix, BUT a few things first.
First, you can download this app through the Google Play Store, so most people won't be intimidated about getting it.
Second, Nix is great for reproducibility - not performance - so if you can compile things yourself, it's better.
So this time we will do just that. Also, the app we will utilise is the same app Nix-on-Droid was built on, since I will always try to introduce people to as many frameworks and systems as possible.
Okay, so first get Termux from the Google Play Store - or you can even get it from the F-Droid store if you've been here before.
This is the official guide from the repository. I will add one extra step just to make sure we are all running the same versions and that the guide stays compatible with the instructions in this post. And a few commands to make Termux better / nicer (at least to me).
pkg update && pkg upgrade &&
# curl is for getting models - Termux might not set up its environment in the same place for everyone, so downloading straight into the terminal keeps the paths predictable
# git is for getting other files we need for the framework
# cmake, make and ccache are so we can play around with automated instructions for making / compiling .c/.cpp files
# jq is for interacting with the server
# tmux is so we can run the server and the chat script side by side later on
pkg install -y curl git cmake make ccache jq tmux
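With the tools in place we can grab the framework itself and compile it. Here's a minimal sketch following the official llama.cpp build instructions - the release tag below is a placeholder, pin whichever one you want everyone to share:

git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
# the extra step: check out a fixed release tag so we all build the same version
git checkout <release-tag>
# generate the build files, then compile an optimized build
cmake -B build
cmake --build build --config Release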
Okay, we're still just doing the preparation - so far we've only got the framework. Last time that part alone was the magic of Nix...
Now we need to get the models. You can download them in your browser, figure out where your Downloads directory is, and move the file from there into your Termux home. Or open the model's page in the browser, copy the download link and use it with curl. Remember, you can use any model on huggingface - just make sure it's in the GGUF file format. The latest models are usually released by the company in that format; for older and different models I recommend checking out TheBloke.
Now that we've yapped a bit, let's actually download one. I used the same model we did in the nix-on-droid example, just for a performance comparison.
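I won't assume which model you picked, so here is a hypothetical curl example - the repository and filename are stand-ins, substitute the link of the GGUF you actually want:

mkdir -p ~/models
# -L follows huggingface's redirects; the resolve/main path is where HF serves raw files
curl -L -o ~/models/your-model.gguf \
  "https://huggingface.co/<user>/<repo>/resolve/main/<file>.gguf"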
Running the models
Okay, we have everything. Now we can run the models - remember we specified where to save them, and now we will be specifying to llama.cpp where to find them. If you tried being clever and did some tinkering with the commands, keep tinkering now.
# Start a new tmux session named 'llama'
tmux new-session -d -s llama
# Create a new window for the server - the binary lands in build/bin after compiling; point -m at wherever you saved your model
tmux new-window -t llama -n server './build/bin/llama-server -m ~/models/your-model.gguf'
# Create a new window for the chat script
tmux new-window -t llama -n chat 'bash chat.sh'
This is just the basic, general script; for the best performance you will have to check the parameters on huggingface - every model will be a bit different - and then, based on that, edit the chat.sh file or create your own more complex system on top of it. Even ollama is built on top of this (not this script specifically, but the llama.cpp server). That's one of the reasons I'd rather recommend ollama over llama.cpp: it's more user-friendly and it has its own repository of models, which makes it easier to load them. But you need Nix for ollama on a phone - it's actually the Nix people that maintain an arm64 (smartphone architecture) build of ollama. Anyone can essentially use the ollama files to build the framework with any build system that can interpret them. That would be like compiling llama.cpp with your own make command - very, very hard and system specific. The makefile for llama.cpp has instructions for multiple systems, even arm64; that's why we don't need much.
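In case you don't have a chat.sh yet, here's a minimal sketch of what one can do - it assumes llama-server is listening on its default port 8080, and the "User:"/"Assistant:" prompt format is a stand-in you should replace with whatever your model's card recommends:

#!/data/data/com.termux/files/usr/bin/bash
# Loop: read a question, send it to the server's /completion endpoint, print the reply
while true; do
  printf 'You: '
  read -r QUESTION || break
  PROMPT="User: $QUESTION
Assistant:"
  # jq -n builds the JSON request body; the second jq extracts the generated text
  curl -s http://127.0.0.1:8080/completion \
    -H 'Content-Type: application/json' \
    -d "$(jq -n --arg p "$PROMPT" '{prompt: $p, n_predict: 128, stop: ["User:"]}')" \
    | jq -r '.content'
done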
# Attach to the tmux session
tmux attach -t llama
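Once attached, Ctrl-b then a window number switches between the server and the chat, and Ctrl-b d detaches without killing anything. And since every model card recommends different parameters, here's a made-up example of what tuning the server might look like - the flag values are invented, take the real ones from your model's card:

# -c is the context size, -t the CPU threads, --temp the sampling temperature
./build/bin/llama-server -m ~/models/your-model.gguf -c 4096 -t 4 --temp 0.7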
Conclusion
This way is not guaranteed to build and requires more tinkering in the terminal, setting up models and prompts to make them the most efficient. What we did, though, should be sufficient for some minor chatting and testing the models. Again, go to the huggingface model cards and check their specific and recommended parameters and prompt formats. I wanted to show everyone llama.cpp, and in case there were people intimidated by downloading things outside of the Google Play Store, now they have something as well! Check out the other blogs! babye