Skip to main content

Command Palette

Search for a command to run...

Top 4 ways to Run LLM locally on Android and iOS

Published
5 min read
Top 4 ways to Run LLM locally on Android and iOS

In my previous blog, I explored the technical rabbit hole of running Llama.cpp in Termux on an old Android phone. While that was a rewarding experiment, let's be honest: it wasn't exactly "plug-and-play" for most people.

But it's 2026, and the game has changed. You no longer need to be a Linux wizard to have a private, powerful AI in your pocket. In this guide, I'm covering the Top 4 ways you can run LLMs locally on your Android or iOS device—completely for free, with unlimited requests, and zero data leaving your phone.

https://github.com/google-ai-edge/gallery

Google's official showcase is the perfect starting point. It's the easiest way to see what your phone is actually capable of. It's optimized for mobile NPUs, meaning it's fast and efficient for tasks like tone refinement, audio transcription, and image understanding.

The latest version now supports Gemma 4 and Gemma 3, models specifically "distilled" to run on mobile hardware without eating up your RAM.

The "Skills" Update: One of the coolest additions is support for SKILLS. You can now extend what the AI can do by pulling from a vast library at skills.sh. My personal favorite is the Humanizer skill—it's great for making AI-generated text sound more, well, human.

Demo

https://youtube.com/shorts/1H6z2-K7gvw

Models Available

Bundling your own model

If you're feeling adventurous, you can even bundle your own models by following the LiteRT community guide. It walks you through converting models from Pytorch or other formats.


2. PocketPal AI

PocketPal AI is the one I keep coming back to. It's built on llama.cpp (the industry standard), but the interface is actually designed for a human using a phone, not a developer staring at a terminal.

The 'Pals' feature is the standout here. Instead of one messy chat thread, you can set up different "Pals" with their own system prompts. I have one for coding and another for creative writing. It also handles the latest architectures like Qwen 3.5 and LFM 2.5, which are incredibly fast on mobile.

Why it works for me:

  • Easy Model Setup: You can download GGUF models directly within the app. For a smooth experience, I recommend any GGUF model (try Unsloth's uploads) with Q4_K_M quantization.

  • 100% Private: Once the model is on your phone, you can go into Airplane Mode and the AI won't even notice.

  • Live Stats: It shows you tokens per second in real-time. It's satisfying to see your phone's hardware in action.


3. AnythingLLM: The Researcher's Workspace

If you need your AI to actually do something with your files, AnythingLLM is a different beast. It's less of a chatbot and more of a portable workspace.

The killer feature is On-Device RAG. You can feed it a PDF or text file sitting on your phone, and the AI will answer questions based only on that document. I've used this to summarize 50-page technical docs during flights with zero internet.

What sets it apart:

  • Chat with your docs: Local indexing means your data never touches a server.

  • Tools & Agents: It supports web scraping, calendar editing, and the Model Context Protocol (MCP), so it's slowly becoming a true mobile agent.

  • The "Infinite Power" Trick: If your phone is struggling, you can connect it to a massive 70B model running on your home PC via API and use your phone as a remote window into that power.


4. Termux + Ollama: The Power User Shortcut

This is for my Android users who miss the command line. We're using Termux to run Ollama directly on the device. It's the closest you'll get to a desktop experience on a mobile screen.

The Quick Setup:

  1. Install Termux (use the version from F-Droid, not the Play Store).

  2. Run pkg install ollama.

  3. Type ollama run qwen3.5:0.8b.

That's it. Ollama will auto-download the model and drop you into a CLI chat. You can run pretty much anything from the Ollama Library as long as your RAM can handle it. It's the fastest way to test new models the second they drop.


Summary: Which one should you pick?

Method Best For Ease of Use Customization
Google Edge Gallery Beginners / Multi-Modal ⭐⭐⭐⭐⭐ ⭐⭐
PocketPal AI Daily Chat / Personalities ⭐⭐⭐⭐ ⭐⭐⭐
AnythingLLM Working with Documents / RAG ⭐⭐⭐ ⭐⭐⭐⭐⭐
Termux + Ollama Developers / CLI Lovers ⭐⭐⭐⭐

The era of "Local-First" AI

The era of relying on expensive subscriptions and cloud-tracking for AI is ending. Whether you want a simple one-click app or a full terminal setup, you can now carry a "God-tier" brain in your pocket for $0.

Read about more on Local AI and Self-Hosting on nkaushik.in.

Happy (Local) Chatting!

Local AI

Part 1 of 3

In this series, I would explore way to run different AI models ranging from LLMs to Stable Diffusion models locally on all sorts of consumer hardware, like android phones, iPads and Consumer Laptops or Desktops

Up next

Running 24/7 Local AI on an Old Android without Overheating

Last week I wrote about repurposing an old Android phone to run local AI models. In this follow-up I address the biggest obstacle to running the device 24/7: overheating and how I transformed it into