Choosing between Online and Mixed Mode
ONA.UNO offers two AI operational modes: Online and Mixed. Both modes let you search, summarize, and chat with your knowledge base — but they differ in where the AI processing happens, how fast it runs, and what it costs.
This guide helps you choose the right mode for your needs.
The Core Difference
Online mode sends your content to cloud AI services (via OpenRouter) for all processing stages — embeddings, summaries, titles, and tags. It uses state-of-the-art models with billions of parameters.
Mixed mode runs the high-volume processing stages locally on your Mac using downloaded models (~6 GB total). Chat and full summaries still require cloud services (and an API key) — they won’t work without one.
AI mode is stored per Set. Your OpenRouter API key is global and shared across all Sets. See Sets (Libraries).
Both modes produce a fully functional knowledge base. The difference is in quality, speed, cost, and where your data is processed.
Mixed Mode: Local Processing
Mixed mode runs embeddings, micro summaries, titles, and tags entirely on your Mac. Here’s what that means in practice:
Mixed mode requires local model support. If your Mac doesn’t support local AI, ONA.UNO falls back to Online mode.
Advantages
- Lower ongoing costs — The pipeline stages that run most frequently (embeddings, summaries, titles, tags) don’t use cloud services, so you pay nothing for them.
- Offline capability — Pipeline processing works without an internet connection. You can index new content while traveling or disconnected. (Chat and full summaries still require internet.)
- Data stays local — For the pipeline stages, your content never leaves your Mac.
- Responsive UI protection — During heavy interaction (for example active scrolling), ONA.UNO immediately pauses local helper work and then resumes automatically when interaction settles.
Trade-offs
- Processing is slow and resource-intensive — On Macs with 16 GB RAM or earlier Apple Silicon chips (M1, M2), local AI processing can strain your system significantly. Each item is processed sequentially — not in parallel — to avoid overwhelming memory. Expect your Mac to run hot, fans to spin up, and the machine to feel sluggish during processing.
- Large libraries take a long time — If you have tens of thousands of items, initial processing can take many days of continuous full-load operation. This isn’t an exaggeration. Your Mac will be working hard the entire time.
- Battery drain — On MacBooks, local AI processing consumes battery rapidly. Plan to stay plugged in during any significant processing.
- Quality ceiling — The local models (335 million parameters for embeddings, 7 billion for the LLM) are good for their size, but can’t match cloud models with 8+ billion parameters.
Running local AI models is computationally demanding. This isn’t a limitation of ONA.UNO — it’s the reality of running billion-parameter models on consumer hardware. For smaller sources (a few hundred to a few thousand items), Mixed mode works well if you’re willing to let your Mac process for hours. For larger libraries (10,000+ items), be prepared for days of processing with your machine running at full load. If that sounds daunting, Online mode is the better choice.
Best for
Users with smaller libraries, powerful Macs (32+ GB RAM, M3/M4 chips), or a strong preference for keeping pipeline processing local — and who are willing to let their Mac run unattended for extended periods.
Online Mode: Cloud Processing
Online mode uses cloud AI services for all processing stages. Here’s what that means:
Advantages
- Higher quality — Cloud models are significantly larger and better trained. Qwen3 Embedding 8B (8 billion parameters, 4096 dimensions) produces more accurate semantic search than the local model (335 million parameters, 1024 dimensions). Gemini 2.5 Flash Lite generates more polished summaries, more descriptive titles, and more accurate tags than the local Qwen 7B.
- Extremely fast — Cloud processing runs in parallel. A library of 10,000 items might take 30 minutes online vs. days locally. The difference is dramatic.
- No local resource drain — Your Mac’s CPU, GPU, and battery aren’t taxed by AI processing. The work happens on remote servers.
Cost Structure
Online mode does cost more — but probably less than you think.
Initial embedding is where most cost occurs. When you first add sources to ONA.UNO, every chunk of content needs an embedding vector. Embeddings are mathematical representations that enable semantic search — they’re computed once per item and stored locally. For a large library, this initial pass is the bulk of your cost, but it’s a one-time expense.
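To make "mathematical representations that enable semantic search" concrete, here is a toy sketch: search ranks stored items by the cosine similarity between the query's embedding vector and each item's stored vector. The 4-dimensional vectors below are made up for illustration; real embeddings have 1024 or 4096 dimensions, as noted above.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two embedding vectors (1.0 = identical direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 4-dimensional "embeddings" (real models use 1024 or 4096 dimensions).
query = np.array([0.9, 0.1, 0.0, 0.1])
item_same_topic = np.array([0.8, 0.2, 0.1, 0.0])
item_other_topic = np.array([0.0, 0.1, 0.9, 0.2])

# The semantically closer item scores higher, so it ranks first in results.
print(cosine_similarity(query, item_same_topic))   # high, near 1.0
print(cosine_similarity(query, item_other_topic))  # low, near 0.0
```

Because the vector for each item is computed once and stored locally, only the query needs embedding at search time, which is why the initial pass dominates cost.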
After initial indexing, costs drop dramatically. Processing 100 new items costs $0.04–0.06. Chat conversations cost $0.01–0.15 depending on length. Full summaries cost a fraction of a cent each.
No subscription. OpenRouter is pay-as-you-go — you add credit (starting at $5) and use it until it runs out. There’s no monthly fee, no recurring charges. If you don’t use ONA.UNO for a month, you pay nothing. The ONA.UNO developers receive nothing from your OpenRouter spending — it goes directly to the AI providers (Google, Alibaba, xAI).
Some ballpark figures to give you a sense of real-world costs:
- Initial processing of 10,000 items: $3–4
- Ongoing daily use (adding 30–50 new items per day, regular chat and summaries): $2–5 per month
These are estimates — actual costs depend on content length, how much you chat, and which models you use. But the point stands: Online mode isn’t expensive for typical use.
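The ballpark figures above follow from simple token arithmetic. Here is a back-of-the-envelope sketch; every number in it (tokens per item, per-million-token prices) is an illustrative assumption, not ONA.UNO's or OpenRouter's actual pricing.

```python
# Back-of-the-envelope estimate of initial processing cost.
# All numbers below are illustrative assumptions, not published prices.
ITEMS = 10_000
AVG_INPUT_TOKENS = 1_500    # assumed average content length per item
AVG_OUTPUT_TOKENS = 100     # assumed title + tags + micro summary per item

EMBED_PRICE = 0.01          # assumed USD per million embedding tokens
LLM_INPUT_PRICE = 0.10      # assumed USD per million LLM input tokens
LLM_OUTPUT_PRICE = 0.40     # assumed USD per million LLM output tokens

MILLION = 1_000_000
embed_cost = ITEMS * AVG_INPUT_TOKENS / MILLION * EMBED_PRICE
llm_cost = (ITEMS * AVG_INPUT_TOKENS / MILLION * LLM_INPUT_PRICE
            + ITEMS * AVG_OUTPUT_TOKENS / MILLION * LLM_OUTPUT_PRICE)

print(f"embedding: ${embed_cost:.2f}, summaries/titles/tags: ${llm_cost:.2f}")
print(f"total: ${embed_cost + llm_cost:.2f} for {ITEMS:,} items")
```

Even with generous assumptions, a library of this size lands in the low single digits of dollars, which is consistent with the ranges quoted above.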
Best for
Most users. Online mode offers the best experience — fast processing, superior quality, and costs that are lower than they appear at first glance.
Our Recommendation
We strongly recommend Online mode for the best experience. The quality difference is substantial, processing is dramatically faster, and the costs are genuinely modest.
What About Mixed Mode?
Mixed mode lets you use ONA.UNO’s pipeline (embeddings, micro summaries, titles, tags) without registering with OpenRouter or adding credit. If you want to get a first impression of how ONA.UNO organizes your knowledge before committing to an API account, Mixed mode makes that possible.
However, be aware of the limitations:
- Chat, full summaries, and day/daypart summaries require an OpenRouter API key — they simply don’t work without one. This isn’t a deliberate restriction; local models can’t produce this kind of output at acceptable quality. We tried. The results weren’t good enough to ship.
- Pipeline output quality is lower. Titles, tags, and micro summaries from local models are functional but noticeably less polished than what cloud models produce. If you only use Mixed mode, you may underestimate what ONA.UNO can actually do.
- Processing is slow. On typical hardware (16 GB RAM, M1/M2), even a few hundred items can take hours. Importing thousands of items before committing to Online mode is not advisable.
- You may see helper lifecycle transitions. In Mixed mode, helper restarts (for example after pause/scroll clamps) are normal behavior; the status bar Local AI indicator shows whether the helper is starting, ready, recovering, or in an issue state.
If you do want to try Mixed mode first: start with a small source (tens to hundreds of items), let it process, and explore the timeline and search. Just keep in mind that chat and full summaries won’t be available, and the quality you see isn’t representative of Online mode.
Mixed Mode + API Key
You can also use Mixed mode with an OpenRouter API key. This hybrid approach keeps pipeline processing local (for privacy or to reduce cloud costs) while enabling chat and full summaries via the cloud. Some users prefer this balance.
Switching Modes
You can switch between Online and Mixed mode anytime in Settings → AI.
Important details:
- Switching AI mode is per Set (different Sets can use different modes).
- Switching does not restart the app.
- Switching triggers a re-embedding for the current Set (because the embedding models produce incompatible vectors). For large libraries, this can take time.
- Mixed → Online also clears AI micro summaries/titles/AI tags for that Set so Online mode regenerates them.
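The re-embedding requirement isn’t arbitrary: the two embedding models produce vectors of different dimensionality (1024 locally vs. 4096 in the cloud, per the figures earlier in this guide), and similarity scores are only meaningful between vectors from the same model. A toy sketch of the mismatch:

```python
import numpy as np

local_vec = np.random.rand(1024)   # dimensionality of the local embedding model
cloud_vec = np.random.rand(4096)   # dimensionality of the cloud embedding model

# Vectors of different sizes can't even be compared:
try:
    np.dot(local_vec, cloud_vec)
except ValueError as err:
    print("incompatible:", err)
```

Even if the sizes matched, vectors from different models live in unrelated spaces, so every item in the Set must be re-embedded with the new model after a switch.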
About OpenRouter
ONA.UNO uses OpenRouter to access cloud AI models. OpenRouter is a unified API that routes requests to various AI providers (Google, Alibaba, xAI, and others) at near-cost pricing.
Getting Started with OpenRouter
- Create an account at openrouter.ai
- Add credits — you can start with as little as $5 USD
- Generate an API key and paste it into ONA.UNO (Settings → AI → Remote Models)
That’s it. No subscription, no recurring charges.
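If you’re curious what ONA.UNO does with that key under the hood, here is a minimal sketch of a direct request to OpenRouter’s OpenAI-compatible chat endpoint. You never need to do this yourself; the model ID below is an illustrative assumption, and the request is only sent if a key is configured.

```python
import json
import os
import urllib.request

# Minimal sketch of an OpenRouter chat request (OpenAI-compatible API).
# The model ID is illustrative; check openrouter.ai for current model IDs.
API_KEY = os.environ.get("OPENROUTER_API_KEY", "")

payload = {
    "model": "google/gemini-2.5-flash-lite",
    "messages": [{"role": "user", "content": "Say hello in five words."}],
}
request = urllib.request.Request(
    "https://openrouter.ai/api/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
)

if API_KEY:  # only send the request when a key is actually configured
    with urllib.request.urlopen(request) as response:
        reply = json.load(response)
        print(reply["choices"][0]["message"]["content"])
```

Each such request is metered against your prepaid credit, which is why a spending limit on the key (see below) acts as a hard cap.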
Why OpenRouter Works Well for ONA.UNO
- Credits never expire. Add $5 or $20 and use it over months — there’s no “use it or lose it” pressure.
- True API pricing. OpenRouter charges close to the underlying provider costs with minimal markup. You’re paying for actual AI usage, not a subscription that assumes heavy use.
- Set a spending limit. When you create your API key, you can set a hard cap on how much that key can spend. This gives you full control — no surprise bills, no runaway costs. If you set a $10 limit, spending stops at $10.
- Use it elsewhere. Your OpenRouter credits work with any service that supports OpenRouter, not just ONA.UNO. It’s a general-purpose AI API account.
The ONA.UNO developers receive nothing from your OpenRouter spending. Your money goes directly to the AI providers for the compute they perform.
A Note on Privacy
We want to be straightforward about data handling.
In Mixed mode, the pipeline stages (embeddings, micro summaries, titles, tags) run locally — your content stays on your Mac for these operations. However, chat and full summaries require cloud services in both modes. When you ask ONA.UNO a question or generate a full summary, relevant content is sent to the AI provider. (Without an API key, these features simply don’t work.)
In Online mode, all processing uses cloud services. Your content flows through OpenRouter to providers like Google, Alibaba, or xAI depending on the model.
What this means: Don’t add content to ONA.UNO that you would never want processed by cloud AI services. Even in Mixed mode, using chat or full summaries will send content to the cloud. If you have truly sensitive material that must never leave your device, ONA.UNO isn’t the right tool for that content.
We don’t operate servers that see your data — content flows directly from your Mac to the AI providers over encrypted connections. But the AI providers do process your content to generate responses. Review their privacy policies if this matters for your use case.
For more details on data handling, analytics, and third parties, see Privacy FAQ.
Comparison
| Aspect | Online Mode | Mixed Mode |
|---|---|---|
| Quality | Best available | Functional |
| Speed | Very fast (parallel) | Slow (sequential) |
| Initial cost | $3–4 per 10,000 items | Free (pipeline only) |
| Ongoing cost | Fraction of cents per item | Free (pipeline only) |
| Can use without an API key | No | Yes (timeline + search; chat/full summaries off) |
| Chat & full summaries | Yes (requires API key) | Yes (requires API key) |
| Hardware impact | Minimal | High (CPU, RAM, battery) |
| Offline capable | No | Pipeline only |
| Best for | Most users | Users wanting zero pipeline cloud cost |
Getting Started
Choose your mode during initial setup, or switch anytime in Settings → AI.
- Setup walkthrough: Installation and Setup
- Technical overview: Leveraging AI