Leveraging AI
A Precision-Engineered Pipeline for Your Personal Knowledge
ONA.UNO does not just “use AI.” It orchestrates a sequence of specialized stages and model choices so raw items become structured knowledge you can search, summarize, and chat with.
This page explains the pipeline architecture and why each stage exists.
Two AI Modes: Online and Mixed
ONA.UNO supports two operating modes so you can balance quality, cost, and hardware constraints.
Online Mode (Recommended)
In Online mode, the full pipeline runs through cloud models via OpenRouter.
- Uses top-tier cloud models for embeddings and pipeline generation.
- Supports parallel processing for large imports.
- Delivers the highest overall quality.
- Requires internet access and a valid OpenRouter API key.
Mixed Mode
In Mixed mode, high-volume pipeline stages run locally on Apple Silicon, while chat and deep summaries stay cloud-based.
- Local embeddings, concise summaries, titles, and tags.
- Sequential processing tuned for local GPU/memory constraints.
- Lower routine cloud cost.
- Core pipeline stages can continue offline.
Seamless Switching
Switching modes swaps the full pipeline profile, not just one model.
- Embedding dimensions differ between the cloud and local models, so the vector spaces are not interchangeable (see the sketch after this list).
- Each mode maintains an independent optimized workspace.
- Starred context carries over; pipeline/index state remains coherent per mode.
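Why the spaces cannot mix, as a rough Swift sketch; the type names here are illustrative, not ONA.UNO's actual internals:

```swift
// Hypothetical sketch of per-mode vector workspaces (illustrative types).
enum AIMode: String {
    case online  // cloud embeddings via OpenRouter
    case mixed   // local embeddings on Apple Silicon
}

struct VectorWorkspace {
    let mode: AIMode
    let dimensions: Int  // differs between the cloud and local embedding models

    // Vectors from one mode must never be searched against the other mode's
    // index: the two embedding models define incompatible vector spaces.
    func validate(_ vector: [Float]) {
        precondition(vector.count == dimensions,
                     "vector dimension \(vector.count) does not match workspace (\(dimensions))")
    }
}
```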
For practical mode selection guidance, see Choosing between Online and Mixed Mode.
The AI Pipeline: From Raw Content to Refined Knowledge
When you add sources (folders, clips, mail, notes), each item flows through a staged pipeline.
Stage 1: Content Extraction
ONA.UNO first normalizes content into clean text:
- Markdown/text: direct parse.
- PDFs: native extraction, with OCR fallback for scanned pages.
- Images/screenshots: OCR via Apple Vision (sketched after this list).
- YouTube clips: transcript retrieval so spoken content becomes searchable.
- Web pages: readability extraction to remove noise and keep article content.
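For the OCR bullet above, here is a minimal sketch using Apple's Vision framework; error handling is simplified for illustration:

```swift
import Foundation
import Vision

// A minimal sketch of the image/screenshot OCR step with Apple Vision.
func recognizeText(in imageURL: URL) throws -> String {
    let request = VNRecognizeTextRequest()
    request.recognitionLevel = .accurate  // favor quality over speed

    let handler = VNImageRequestHandler(url: imageURL, options: [:])
    try handler.perform([request])

    // Join the top candidate for each detected text region.
    let observations = request.results ?? []
    return observations
        .compactMap { $0.topCandidates(1).first?.string }
        .joined(separator: "\n")
}
```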
Stage 2: Chunking
Long content is split into overlapping chunks so semantic retrieval remains accurate across boundaries.
- Chunk size is tuned for retrieval efficiency.
- Overlap preserves context between adjacent chunks.
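A minimal sketch of the overlap scheme; the size and overlap values are placeholders, not ONA.UNO's actual tuning:

```swift
// Split text into overlapping chunks. Values are illustrative defaults.
func chunk(_ text: String, size: Int = 1_000, overlap: Int = 200) -> [String] {
    precondition(overlap < size, "overlap must be smaller than chunk size")
    var chunks: [String] = []
    var start = text.startIndex
    while start < text.endIndex {
        let end = text.index(start, offsetBy: size, limitedBy: text.endIndex) ?? text.endIndex
        chunks.append(String(text[start..<end]))
        if end == text.endIndex { break }
        // Step forward by (size - overlap) so adjacent chunks share context.
        start = text.index(start, offsetBy: size - overlap)
    }
    return chunks
}
```

Stepping forward by the chunk size minus the overlap means each boundary region appears in two adjacent chunks, so a sentence split across a boundary is still retrievable.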
Stage 3: Embedding
Each chunk is converted into a semantic vector.
| Mode | Model | Details |
|---|---|---|
| Online | Qwen3 Embedding 8B | Cloud, high-dimensional semantic retrieval |
| Mixed | mxbai-embed-large-v1 | Local Apple Silicon embedding model |
This is what enables meaning-based search (a query for “delivery schedule” can match “project deadlines”).
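Below is a hedged sketch of an Online-mode embedding request. It assumes an OpenAI-compatible /embeddings route behind OpenRouter and an illustrative model slug; both are assumptions, not confirmed ONA.UNO behavior:

```swift
import Foundation

struct EmbeddingResponse: Decodable {
    struct Item: Decodable { let embedding: [Float] }
    let data: [Item]
}

// Assumed route and payload shape (OpenAI-compatible), for illustration only.
func embed(_ texts: [String], apiKey: String) async throws -> [[Float]] {
    var request = URLRequest(url: URL(string: "https://openrouter.ai/api/v1/embeddings")!)
    request.httpMethod = "POST"
    request.setValue("Bearer \(apiKey)", forHTTPHeaderField: "Authorization")
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    request.httpBody = try JSONSerialization.data(withJSONObject: [
        "model": "qwen/qwen3-embedding-8b",  // illustrative model slug
        "input": texts,
    ])
    let (data, _) = try await URLSession.shared.data(for: request)
    return try JSONDecoder().decode(EmbeddingResponse.self, from: data).data.map(\.embedding)
}
```

In Mixed mode the same stage would call the local mxbai-embed-large-v1 model instead; only the vector source changes, not the pipeline shape.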
Stage 4: Concise Summary
Each item receives a short summary for fast scanning.
| Mode | Model | Details |
|---|---|---|
| Online | Gemini 2.5 Flash Lite | Cloud, high throughput |
| Mixed | Qwen 2.5 7B Instruct | Local via llama.cpp |
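A minimal sketch of this stage as a chat completion call through OpenRouter; the prompt wording and model slug are illustrative assumptions:

```swift
import Foundation

struct ChatResponse: Decodable {
    struct Choice: Decodable {
        struct Message: Decodable { let content: String }
        let message: Message
    }
    let choices: [Choice]
}

// Concise-summary request via OpenRouter's chat completions API.
func conciseSummary(of text: String, apiKey: String) async throws -> String {
    var request = URLRequest(url: URL(string: "https://openrouter.ai/api/v1/chat/completions")!)
    request.httpMethod = "POST"
    request.setValue("Bearer \(apiKey)", forHTTPHeaderField: "Authorization")
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    request.httpBody = try JSONSerialization.data(withJSONObject: [
        "model": "google/gemini-2.5-flash-lite",  // illustrative slug
        "messages": [
            ["role": "system", "content": "Summarize the item in 1-2 sentences for quick scanning."],  // illustrative prompt
            ["role": "user", "content": text],
        ],
    ])
    let (data, _) = try await URLSession.shared.data(for: request)
    return try JSONDecoder().decode(ChatResponse.self, from: data)
        .choices.first?.message.content ?? ""
}
```

In Mixed mode, the same stage routes to the local Qwen 2.5 7B Instruct model via llama.cpp instead.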
Stage 5: Intelligent Title
ONA.UNO generates a descriptive title from each item's content, so your timeline remains readable at scale.
Stage 6: Automatic Tags
ONA.UNO generates conceptual tags (not simple keyword extraction) to improve browse and retrieval flows.
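A hedged sketch of tag post-processing, assuming the prompt asks the model for a JSON array of tags (an illustrative convention, not ONA.UNO's documented format):

```swift
import Foundation

// Turn a model response like ["Project Planning", "deadlines"] into clean tags.
func parseTags(from response: String) -> [String] {
    guard let data = response.data(using: .utf8),
          let raw = try? JSONDecoder().decode([String].self, from: data)
    else { return [] }
    // Normalize: trim, lowercase, drop duplicates while keeping order.
    var seen = Set<String>()
    return raw
        .map { $0.trimmingCharacters(in: .whitespacesAndNewlines).lowercased() }
        .filter { !$0.isEmpty && seen.insert($0).inserted }
}
```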
Why These Model Choices
Embeddings: Retrieval Quality First
Embedding quality determines search and chat grounding quality. ONA.UNO prioritizes:
- semantic depth over lexical matching,
- robust retrieval at large library sizes,
- consistency within each mode’s vector space.
Pipeline Generation: Speed + Consistency
For summaries/titles/tags, ONA.UNO optimizes for:
- throughput during ingest,
- stable output quality,
- predictable cost behavior across long-running pipelines.
Chat: Conversations with Your Knowledge
Search gets you to relevant items. Chat synthesizes across them.
Retrieval Flow
- Your question is embedded into the same semantic space as your indexed chunks.
- Retrieval selects the most relevant chunks by semantic proximity (sketched after this list).
- Multi-turn context keeps the initial evidence and newly retrieved evidence aligned across the conversation.
- Citations tie generated claims back to source items.
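A minimal sketch of the retrieval step over an in-memory index; the types are illustrative:

```swift
// Illustrative chunk type: the text plus its embedding vector.
struct Chunk {
    let text: String
    let vector: [Float]
}

// Cosine similarity between two vectors from the same mode's space.
func cosineSimilarity(_ a: [Float], _ b: [Float]) -> Float {
    precondition(a.count == b.count, "vectors must come from the same mode's space")
    var dot: Float = 0, normA: Float = 0, normB: Float = 0
    for i in a.indices {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    return dot / (normA.squareRoot() * normB.squareRoot() + 1e-9)
}

// Rank indexed chunks against the embedded question, keep the top k.
func topChunks(for question: [Float], in index: [Chunk], k: Int = 8) -> [Chunk] {
    index
        .map { (chunk: $0, score: cosineSimilarity($0.vector, question)) }
        .sorted { $0.score > $1.score }
        .prefix(k)
        .map { $0.chunk }
}
```

In practice the scoring runs only over the active mode's own index, per the dimension constraints described under Seamless Switching.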
Citation Guarantees
Answers include clickable references so you can jump directly to supporting source content.
Summaries at Two Levels
Concise Summaries (Per Item)
- Precomputed during pipeline processing.
- Optimized for quick scanning in timeline workflows.
Full Summaries (On Demand)
- Generated for days, ranges, tags, searches, or selected items.
- Uses chat-grade models for deeper synthesis.
- Includes source-linked citations.
Privacy and Data Flow
ONA.UNO is local-first by design.
- Source content stays on your Mac.
- In Mixed mode, core pipeline stages can run locally.
- Cloud-dependent operations (for example, chat and deep summaries) send only the required content over encrypted connections to model providers via OpenRouter.
Cost Control
Cloud usage is bring-your-own-key (BYOK) through OpenRouter:
- you control spend limits,
- you see spend in OpenRouter + in-app status reporting,
- no ONA.UNO markup on model usage.
Detailed cost examples: LLM Cost (OpenRouter).