Models and backends

What LLM does Dhamaka use?

Short answer: Dhamaka tries not to use an LLM unless it has to. Most demo fast paths are rules, regexes, fuzzy matching, or structural formula rewrites. When a model is needed, the runtime picks the best local backend available.

Fast path

No LLM

Address autofill, smart paste, many spellcheck cases, and common formula edits usually resolve locally, with no model inference.

Fallback path

Local model

If the rules cannot resolve an input with confidence, Dhamaka can fall back to a resident browser model, a Transformers.js model, the WASM runtime, or a test mock.
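The split between the fast path and the fallback path can be sketched as follows. This is a minimal illustration, not Dhamaka's actual code: `resolveWithFallback`, the `{ value, confidence }` rule result, the 0.8 threshold, and the stub engine are all assumptions made for the example. Only the idea of an engine with an async `complete(prompt)` method comes from this page.

```javascript
// Hypothetical helper: try cheap deterministic rules before any model call.
// `rules` is an array of functions returning { value, confidence } or null.
// `engine` is anything with an async complete(prompt) method.
async function resolveWithFallback(input, rules, engine, threshold = 0.8) {
  for (const rule of rules) {
    const hit = rule(input);
    if (hit && hit.confidence >= threshold) {
      // Fast path: a rule was confident enough, so no inference runs.
      return { value: hit.value, source: "rule" };
    }
  }
  // Fallback path: no rule was confident, so ask the local model.
  const value = await engine.complete(input);
  return { value, source: "model" };
}

// Illustrative rule: accept a bare 5-digit US ZIP code via a regex.
const zipRule = (text) => {
  const match = /^\s*(\d{5})\s*$/.exec(text);
  return match ? { value: match[1], confidence: 1 } : null;
};

// Stub engine for the example; a real one would come from the runtime factory.
const stubEngine = { complete: async (prompt) => `model:${prompt}` };
```

With these definitions, `resolveWithFallback(" 94107 ", [zipRule], stubEngine)` resolves through the rule, while free-form text that no rule matches is handed to the engine.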

Runtime priority

How auto selection works in the code today.

The runtime factory prefers browser-native local inference first, then cross-browser Transformers.js, then Dhamaka's Rust WASM runtime, then MockEngine for Node and tests.

1. window.ai

The resident model behind Chrome's Prompt API, often described as Gemini Nano class. Dhamaka uses it when the browser exposes window.ai.languageModel.

fastest local path

2. Transformers.js

Lazy-loaded Hugging Face Transformers.js runtime for real cross-browser model inference in the tab.

primary fallback

3. Rust WASM

The compiled dhamaka-runtime.wasm path is tested and wired in, but it is a v2 target until real weights, quantization, and SIMD are production-ready.

v2 target

4. MockEngine

Deterministic local stand-in used by Node, tests, SSR, and development flows that should not download a model.

test only
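The priority order above amounts to a first-match check. The sketch below is an assumption about how such a check could look, not the runtime factory's real implementation: the capability flag names and the backend ids other than "transformers" are made up for illustration.

```javascript
// Hypothetical capability flags; the names are illustrative only.
// Order follows the list above: window.ai, Transformers.js, Rust WASM, MockEngine.
function pickBackend({ hasWindowAi, canRunTransformers, hasWasmRuntime } = {}) {
  if (hasWindowAi) return "window-ai";
  if (canRunTransformers) return "transformers";
  if (hasWasmRuntime) return "wasm";
  return "mock"; // Node, tests, SSR: nothing to download
}

// Probe for the Chrome Prompt API as the text describes it:
// the browser exposes window.ai.languageModel.
function detectWindowAi(globalObject) {
  return Boolean(globalObject?.window?.ai?.languageModel);
}
```

In a plain Node process `detectWindowAi(globalThis)` is false, so with no other capability flags set the sketch falls through to the mock backend.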

Default model choices

Different tasks want different models.

Text generation

HuggingFaceTB/SmolLM2-135M-Instruct is the Transformers.js default for generic completion and chat-like output.

small instruct LLM

Text-to-text

Xenova/LaMini-Flan-T5-248M is the default for instruction-following transform work.

rewrite/explain

Fill-mask

Xenova/distilbert-base-uncased is the default masked language model for contextual spellcheck.

spellcheck

Embeddings

Xenova/all-MiniLM-L6-v2 is the default feature-extraction model for semantic search and fuzzy matching work.

search/RAG

Dhamaka Micro

The hub manifest names dhamaka-micro, based on HuggingFaceTB/SmolLM2-360M-Instruct, as the default packaged model target.

WASM/hub target
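The per-task defaults above can be written as a small lookup table. The model ids are the ones listed on this page. The task keys are the standard Transformers.js pipeline task names, and whether Dhamaka uses exactly these strings (other than "text-generation", which appears in the configuration example) is an assumption. `defaultModelFor` is a hypothetical helper.

```javascript
// Task -> default Hugging Face model id, as listed in this section.
// Task keys other than "text-generation" are assumed, not confirmed.
const DEFAULT_MODELS = new Map([
  ["text-generation", "HuggingFaceTB/SmolLM2-135M-Instruct"],
  ["text2text-generation", "Xenova/LaMini-Flan-T5-248M"],
  ["fill-mask", "Xenova/distilbert-base-uncased"],
  ["feature-extraction", "Xenova/all-MiniLM-L6-v2"],
]);

// Hypothetical helper: look up the default, failing loudly on unknown tasks.
function defaultModelFor(task) {
  const model = DEFAULT_MODELS.get(task);
  if (!model) throw new Error(`No default model for task: ${task}`);
  return model;
}
```

A Map is used rather than a plain object so that inherited keys such as "toString" are not mistaken for tasks.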

What can be used

Any model that fits the local contract.

Dhamaka's engine contract is intentionally small: completion, streaming, masked-token prediction, or embeddings. That means models can be swapped by task as long as they run locally in the browser runtime or are wrapped by a compatible adapter.
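To make the contract concrete, here is a toy adapter that covers completion and streaming. It is a sketch under stated assumptions: `load()` and `complete()` match the calls shown in the configuration example on this page, while `stream()` as an async generator and the `createEchoEngine` name are hypothetical. The "model" simply uppercases its prompt.

```javascript
// Hypothetical adapter shape. load() and complete() mirror the calls used
// elsewhere on this page; stream() is an assumed name and signature.
function createEchoEngine() {
  let loaded = false;
  return {
    async load() {
      loaded = true; // a real adapter would fetch and initialize weights here
    },
    async complete(prompt) {
      if (!loaded) throw new Error("call load() before complete()");
      return prompt.toUpperCase(); // stand-in for real inference
    },
    // Yield the completion one whitespace-separated chunk at a time.
    async *stream(prompt) {
      const text = await this.complete(prompt);
      for (const word of text.split(" ")) yield word;
    },
  };
}
```

Anything exposing the same methods, whether it wraps Transformers.js, window.ai, or a test double, could be swapped in without changing the calling code.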

Good local candidates

Small instruct LLMs: SmolLM2, Qwen small models, Phi mini-class models, Gemma small models.
Masked LMs: BERT, DistilBERT, RoBERTa-style models.
Embeddings: MiniLM, E5, GTE-small style models.
Text-to-text: Flan-T5 and LaMini-style models.

What matters most

Runs in Transformers.js, window.ai, or a Dhamaka engine adapter.
Small enough for first-visit download and browser memory.
Quantized for WASM/WebGPU when possible.
Good enough on the specific task, not just good on chat.

Configuration

Force a Transformers.js model when you need to.

Most users leave backend set to "auto" and let the runtime choose. Advanced users can force a task and a Hugging Face model ID through the runtime factory.

runtime.js advanced

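import { createEngine } from "@dhamaka/runtime";

// Pin the backend, task, and model instead of relying on backend: "auto".
const engine = createEngine({
  backend: "transformers",
  task: "text-generation",
  model: "HuggingFaceTB/SmolLM2-135M-Instruct",
});

// Load the model before the first call, then run a single completion.
await engine.load();
const answer = await engine.complete("Rewrite this sentence.");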

Cloud models

OpenAI, Claude, Gemini API, and others are possible, but not the default.

Dhamaka's thesis is local-first: no server call, no per-token cost, and no user data leaving the tab. A product can still write a custom Engine adapter for a cloud LLM, but that changes the privacy and cost model and should be a conscious product choice.
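A cloud adapter of the kind described above might look like the sketch below. Everything here is hypothetical: `createCloudEngine`, the `isLocal` flag, and the caller-supplied `sendPrompt` function are illustrative names. No vendor API is assumed, so the transport is injected rather than hard-coded.

```javascript
// Hypothetical cloud adapter. `sendPrompt` is a caller-supplied async
// function that forwards the prompt to whichever hosted API the product uses.
function createCloudEngine(sendPrompt) {
  return {
    isLocal: false, // assumed flag so calling code can surface the privacy change
    async load() {
      // Nothing to download: the model lives on a server.
    },
    async complete(prompt) {
      const text = await sendPrompt(prompt); // user data leaves the tab here
      if (typeof text !== "string") {
        throw new TypeError("sendPrompt must resolve to a string");
      }
      return text;
    },
  };
}
```

Keeping the network call behind a single injected function makes the point where data leaves the tab explicit and easy to audit.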