The AI agency that ships, retrieves, scores, drafts and monitors AI features, not slide decks.
An AI agency that ships LLM features into your product, your CRM and your ops. Model selection, RAG on your real data, evals from day one, costs you can audit. We deploy AI where the work already happens — never as a separate dashboard nobody opens.
ActiveCampaign
Adalo
AdCreative.ai
Ahrefs
Airtable
Allo (The Mobile First Company)
Apify
Apollo.io
Attio
Base44
Baserow
Brevo
Bright Data
Browse AI
Bubble
CaptainData
ChatGPT
Claude
Claude Code
Claude Cowork
ClickUp
Cursor
DeepSeek
Dust
ElevenLabs
Fillout
FlutterFlow
Folk CRM
Freepik Spaces
Gamma
Gemini
Glide
Grok
Higgsfield
An AI feature that actually ships stands on 4 pillars.
Most AI pilots die between the demo and the rollout for the same reasons: wrong model picked for the task, no retrieval on real data, no eval suite, no cost monitoring. The stack we deploy in 2026 closes all four gaps from day one.
- Strategy
Use-case + model selection
We start at the business problem, not the model. We score candidate use cases on value, feasibility and unit economics, then pick the model that fits — Claude Sonnet for reasoning, GPT-4o for vision, Mistral for EU data residency, Llama 3.x on-premise for sensitive workloads. Model choice per task, never per fashion.
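For illustration, that decision can live in code rather than in a slide. A minimal sketch of a per-task routing table; the task labels and model IDs are placeholders to re-benchmark each generation, not a fixed recommendation:

```python
# Illustrative routing table: the model is a per-task config decision, not a default.
# Task labels and model IDs are placeholders, re-benchmarked when a new generation ships.
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelChoice:
    provider: str
    model: str
    reason: str

MODEL_ROUTES = {
    "long_context_reasoning": ModelChoice("anthropic", "claude-sonnet-latest", "long context + clean tool use"),
    "vision_or_voice":        ModelChoice("openai", "gpt-4o", "multimodal maturity"),
    "eu_residency":           ModelChoice("mistral", "mistral-large-latest", "French + EU data residency"),
    "on_prem_sensitive":      ModelChoice("self-hosted", "llama-3.x", "data cannot leave the perimeter"),
}

def pick_model(task: str) -> ModelChoice:
    """Fail loudly when a task has no recorded model decision."""
    if task not in MODEL_ROUTES:
        raise ValueError(f"no model decision recorded for task '{task}'")
    return MODEL_ROUTES[task]
```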
How we pick the right model
- Integration
RAG, retrieval + fine-tuning
AI features die when the model can't see your data. We wire retrieval-augmented generation on your real corpus (Notion, Drive, Confluence, support tickets, CRM notes), build the embedding pipeline, set the chunking strategy, and only fine-tune when retrieval alone hits a ceiling. The model reads your stuff before it answers.
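In practice "the model reads your stuff before it answers" boils down to retrieve-then-prompt. A minimal sketch, assuming the OpenAI Python SDK and an in-memory list standing in for the vector DB; the real pipeline adds chunking, refresh and access control:

```python
# Minimal retrieve-then-answer sketch (illustration, not the production pipeline).
# Assumes the OpenAI Python SDK; the in-memory corpus stands in for a vector DB.
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

corpus = [
    "Refund policy: refunds within 30 days of purchase.",
    "Enterprise plan includes SSO and a 99.9% uptime SLA.",
]
corpus_vecs = embed(corpus)

def answer(question: str, top_k: int = 2) -> str:
    q_vec = embed([question])[0]
    # cosine similarity against every chunk, keep the top_k closest as context
    sims = corpus_vecs @ q_vec / (np.linalg.norm(corpus_vecs, axis=1) * np.linalg.norm(q_vec))
    context = "\n".join(corpus[i] for i in np.argsort(sims)[-top_k:])
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": f"Answer using only this context:\n{context}"},
            {"role": "user", "content": question},
        ],
    )
    return resp.choices[0].message.content

print(answer("What is the refund window?"))
```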
See the data pipeline
- Deployment
Inside the product, not next to it
AI features live where the team and the user already work. A sidebar in the CRM, a slash command in Slack, an inline action in Webflow or Notion, a webhook reply on a Stripe event. No standalone "AI dashboard" nobody opens. The AI shortens the path the user was already taking.
See the integrations
- Monitoring
Evals, cost + guardrails
Every AI feature ships with an eval suite (30 to 80 input/output pairs), output filters (refusal, length, cost ceiling), and a logging pipeline you can audit. If a model upgrade silently regresses quality or unit cost drifts above $0.20, you catch it the same week, not the quarter after.
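A minimal sketch of what those output filters can look like; the refusal markers, length cap and $0.20 ceiling below are examples, the real thresholds come from each feature's eval suite:

```python
# Illustrative post-call guardrails: refusal detection, length cap, cost ceiling.
# Thresholds are examples; real values are set per feature from the eval suite.
REFUSAL_MARKERS = ("i can't help", "i cannot assist", "as an ai")
MAX_OUTPUT_CHARS = 4_000
COST_CEILING_USD = 0.20

def check_output(text: str, cost_usd: float) -> list[str]:
    violations = []
    if any(marker in text.lower() for marker in REFUSAL_MARKERS):
        violations.append("refusal")
    if len(text) > MAX_OUTPUT_CHARS:
        violations.append("length")
    if cost_usd > COST_CEILING_USD:
        violations.append("cost_ceiling")
    return violations  # an empty list means the output clears the filters
```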
What we measure
What an AI feature in prod actually moves.
- $0.04 · Avg cost per AI call
On a well-prompted Claude or GPT-4o feature with retrieval and 1-2 tool calls. We benchmark every deploy. If unit cost drifts above $0.20, the eval pipeline alerts us before it shows up on the invoice.
- −70% · Time on the workflow
Across the 3-5 use cases we typically ship on a mission — content drafting, ticket triage, RFP scoring, sales research, knowledge retrieval. The team handles only the edge cases.
- 4-6 wk · First feature in prod
From audit to a live AI feature inside your existing product. Week 1 audit, week 2-3 design + RAG setup, week 4-5 build + eval, week 6 deploy with kill switch. If an agency promises <2 weeks, they're skipping evals.
Our 4-step build, from use case to production.
We treat every AI feature as a small software product, not a prompt engineering experiment. Same shape regardless of whether the feature lives inside HubSpot, Zendesk, your app, or a custom internal tool.
- Discover · score candidate use cases on value, feasibility and unit economics
- Design · system prompt, RAG schema, eval set, cost ceiling, all written before code
- Build · feature wired in your existing app via SDK, MCP or no-code orchestration
- Deploy · embedded in your CRM, app, Slack or product surface — never standalone
We ship features into your product, not slides into your inbox.
Most AI consulting ends with a deck and a roadmap. We ship features that users actually use in their real workflows. Every mission is measured by the number of AI features running in production at month 3, not by the depth of the strategy doc.
- We ship LLM features into your product, not slide decks into your inbox
- Model picked per task, re-benchmarked every 6 months when a new generation lands
- RAG on your real corpus, evals on real inputs, monitoring on real costs
- Every prompt versioned, every call logged, every cost line attributable
We score your AI use cases, you leave with a plan.
Before quoting anything, we spend 60 minutes mapping the use cases where AI would actually move the needle, and ranking them on value, feasibility and unit economics. You walk away with a ranked candidate list and the design draft for the first feature — yours to ship in-house or with us. Zero pitch, just an outside look at where AI is actually worth deploying.
- Use-case scoring on every AI candidate you flag
- Top 3 candidates with cost-per-call estimate and expected ROI
- Design draft for the first feature (model, RAG schema, eval set)
- Honest take on the use cases where AI would be worse than status quo
How we run an AI engagement.
Five steps, in order, none skipped. We don't open an editor before the design doc is signed, we don't deploy without an eval pass, and we don't bill a retainer before the first feature is running in production. Every step has a definition of done (DoD), and you approve before we move to the next.
- Step 1 · AI audit
Audit where AI actually moves the needle
We sit down with the people doing the work — product, ops, sales, support, content — and score every candidate process on three axes: business value (how much time or revenue is on the table), feasibility (can current models actually solve this in 2026), and unit economics (cost per call vs. status-quo cost). Most teams have 3 to 5 obvious AI wins they were too close to the work to spot, plus a list of pet ideas where AI would be worse than the status quo. You walk away with a ranked candidate list and three quick wins to ship inside 30 days.
- Step 2 · Model + data design
Pick the model, design the data pipeline
Model picked per task, not per brand. Claude Sonnet 4.x for long-context reasoning, GPT-4o for multimodal and voice, Mistral Large for French and EU residency, Llama 3.x or DeepSeek on-premise when data legally can't leave your perimeter. Then we design the data flow: which corpus the model needs (Notion, Confluence, Drive, support tickets, CRM notes), how to chunk and embed it, how to refresh it, when to fall back to fine-tuning. RAG schema, embedding model, vector DB, refresh cadence — all signed off before a line of code.
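A sketch of the ingestion side under those assumptions, with fixed-size character chunking and a plain dict standing in for the vector DB; chunk size and overlap are starting points to tune per corpus, not fixed rules:

```python
# Ingestion-side sketch: chunk a document, embed the chunks, upsert into a store.
# Assumes the OpenAI Python SDK; `store` stands in for pgvector, Pinecone, etc.
from openai import OpenAI

client = OpenAI()
CHUNK_SIZE, OVERLAP = 800, 100  # characters; tune per corpus

def chunk(text: str) -> list[str]:
    step = CHUNK_SIZE - OVERLAP
    return [text[i:i + CHUNK_SIZE] for i in range(0, len(text), step)]

def ingest(doc_id: str, text: str, store: dict) -> None:
    chunks = chunk(text)
    resp = client.embeddings.create(model="text-embedding-3-small", input=chunks)
    for i, (c, d) in enumerate(zip(chunks, resp.data)):
        store[f"{doc_id}:{i}"] = {"text": c, "embedding": d.embedding}
    # re-running ingest on an updated doc overwrites its chunks: that's the refresh
```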
- Step 3 · Build + eval
Build the feature with an eval suite from day one
Feature wired via the right runtime: SDK calls in your existing app for tight latency, MCP servers when the model needs to act on multiple systems, n8n or Make when ops will need to extend the workflow without code. Eval suite written alongside the prompt — 30 to 80 representative input/output pairs the feature has to clear before promotion. Cost benchmarked per call from the first build. If unit cost is wrong by 5x, we catch it before deploy, not on the next AWS invoice.
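A minimal version of that promotion gate might look like the sketch below; `run_feature` and the pass criterion are placeholders for the real feature code and its structured checks:

```python
# Minimal eval harness sketch: the feature must clear the whole set before promotion.
# `run_feature` and `passes` are placeholders for the real feature and its checks.
import json

def run_feature(input_text: str) -> str:
    raise NotImplementedError("call the actual feature here")

def passes(expected: str, actual: str) -> bool:
    # simplest possible criterion; real suites use structured checks or a grader model
    return expected.strip().lower() in actual.strip().lower()

def run_evals(path: str = "evals.jsonl") -> float:
    # 30 to 80 {"input": ..., "expected": ...} pairs, one JSON object per line
    cases = [json.loads(line) for line in open(path)]
    results = [passes(c["expected"], run_feature(c["input"])) for c in cases]
    pass_rate = sum(results) / len(results)
    print(f"{sum(results)}/{len(results)} passed ({pass_rate:.0%})")
    return pass_rate

# Promotion gate: block the deploy if the pass rate drops below the agreed threshold.
# if run_evals() < 0.90: raise SystemExit("eval regression, do not promote")
```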
- Step 4 · Deploy in-product
Deploy the feature inside the product, not as a SaaS aside
AI features live where the team or the user already lives. A sidebar in the CRM, a slash command in Slack, an inline action in a Notion doc or Webflow CMS, a webhook reply on a Stripe event, a chat panel embedded in the product. No standalone AI dashboard nobody opens. We deploy with a kill switch, a feature flag and a graceful fallback so we can roll back in 30 seconds if the eval regresses.
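A sketch of that deploy-side wrapper, with `flag_enabled` and `call_model` as placeholders for your feature-flag provider and the real LLM call:

```python
# Deploy-side safety wrapper sketch: feature flag as kill switch, graceful fallback.
# `flag_enabled` and `call_model` are placeholders, not a specific provider's API.
def flag_enabled(flag: str, user_id: str) -> bool:
    return False  # replace with your flag provider (LaunchDarkly, Unleash, a config row)

def call_model(ticket: dict) -> str:
    raise NotImplementedError("the actual LLM call lives here")

def ai_draft_reply(ticket: dict, user_id: str) -> dict:
    # Kill switch: flipping the flag off is the 30-second rollback.
    if not flag_enabled("ai_reply_draft", user_id):
        return {"source": "human", "draft": None}
    try:
        return {"source": "ai", "draft": call_model(ticket)}
    except Exception:
        # Graceful fallback: the user gets the manual path, never an error screen.
        return {"source": "human", "draft": None}
```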
- Step 5 · Eval, cost, monthly iteration
Run the eval, watch the cost, iterate every month
Eval suite from step 3 runs on every prompt change and on a daily cadence. Costs tracked per feature per day (Helicone, Langfuse, custom logging into Supabase or BigQuery). Refusal rate, hallucinated outputs, response length distribution, latency, fallback rate, weekly cost per active user — all on a shared dashboard. Monthly review with us: what to extend, what to retire, what model to migrate to. Features get sharper over the months; they don't decay.
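As an illustration, per-call logging plus a daily cost check can be as small as the sketch below; the SQLite table and the $0.20 threshold are placeholders for the real logging stack (Helicone, Langfuse, Supabase, BigQuery):

```python
# Logging sketch: one row per call, so cost and quality are queryable per feature per day.
# SQLite, the schema and the $0.20 threshold are illustrative placeholders.
import sqlite3, time

db = sqlite3.connect("llm_calls.db")
db.execute("""CREATE TABLE IF NOT EXISTS llm_calls (
    ts REAL, feature TEXT, model TEXT,
    input_tokens INT, output_tokens INT, cost_usd REAL,
    latency_ms REAL, fallback INT)""")

def log_call(feature: str, model: str, usage: dict, cost_usd: float,
             latency_ms: float, fallback: bool) -> None:
    db.execute("INSERT INTO llm_calls VALUES (?,?,?,?,?,?,?,?)",
               (time.time(), feature, model, usage["input_tokens"],
                usage["output_tokens"], cost_usd, latency_ms, int(fallback)))
    db.commit()

def cost_alerts(threshold_usd: float = 0.20) -> list[tuple]:
    """Features whose average cost per call drifted above the ceiling in the last 24h."""
    day_ago = time.time() - 86_400
    return db.execute("""SELECT feature, AVG(cost_usd) FROM llm_calls
                         WHERE ts > ? GROUP BY feature
                         HAVING AVG(cost_usd) > ?""", (day_ago, threshold_usd)).fetchall()
```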
The same stack, across multiple client features.
The frames below are pulled from real monthly review calls with clients running AI features in production: eval pass-rate refreshes, cost-per-call trends, model migration plans, and the queue of new use cases to extend the feature set. Same operational rigor across different industries in B2B SaaS, services and ops. Our Trustpilot reviews come from the operators we work with.
- Monthly eval review with every client running 1+ AI features in prod
- Cost-per-call dashboard updated in real time, no quarterly slide deck
- An eval regression triggers a rollback before the next deploy
- Trustpilot reviews come from the operators using the features, not from marketing
The 10 questions we get asked on every call.
What's the difference between an AI agency and a generic IT consultancy?
A generic IT consultancy ships you a deck, a roadmap and a 6-month engagement that ends in 'recommendations'. An AI agency ships AI features into your product. Concrete output: a sidebar in your CRM that drafts replies, a slash command in Slack that summarizes a thread, a webhook that scores incoming RFPs, a chat panel embedded in your app. Measured by features in production and unit cost per call, not by hours billed. If the proposal mentions 'AI strategy' more than 'AI features shipped', it's consulting wearing AI cosplay.
How much does an AI agency cost in 2026?
Depends on scope. A focused mission (one AI feature, one product surface, audit + design + build + deploy) runs $8,000 to $25,000 depending on integration complexity. A monthly retainer covering 3-8 features in production (extensions, evals, model migration, cost monitoring) starts around $4,000-$8,000/month. Watch out for agencies that quote in 'AI hours' or pitch a vague 6-month AI transformation — that's repackaged consulting. Our approach: free audit first, then a price per feature shipped, not per hour talked.
Which model should we use — Claude, GPT-4o, Mistral or open-weights?
Depends on the task and the constraint. Claude Sonnet 4.x leads on long-context reasoning, clean tool use and refusing weird prompts cleanly. GPT-4o is faster on multimodal (vision, voice) and has the most mature function-calling tooling. Mistral Large is competitive on French language and EU data residency. Open-weights (Llama 3.x, DeepSeek, Qwen) work when you need data on-premises or your unit cost ceiling is sub-$0.01. We benchmark per use case and re-benchmark every 6 months when a new generation ships. The model is a choice, not a religion.
RAG, fine-tuning or prompting — which one do we need?
Prompt engineering first — 70% of features ship with just a well-structured system prompt and good examples. RAG (retrieval-augmented generation) second — when the model needs to read your specific corpus before answering: docs, support tickets, CRM notes, internal wiki. Fine-tuning last — only when retrieval alone hits a quality or cost ceiling, typically for high-volume narrow tasks (classifier-style, fixed output schema). We start with the cheapest layer and only escalate if the eval says we need to. Most fine-tuning pitches we see are actually a RAG problem in disguise.
How long does it take to ship a first AI feature in production?
Honest answer: 4 to 6 weeks for a first feature on a well-scoped use case. Week 1 audit + use-case scoring. Week 2-3 design (system prompt, RAG schema, eval set, cost ceiling). Week 4-5 build + integration into your product surface. Week 6 internal beta, eval pass, prod deploy with a kill switch. If an agency promises an AI feature in prod in 1 week, they're skipping evals — fine for a demo, dangerous in front of paying users.
Will AI replace our team?
It augments them. Every AI feature we ship has a fallback path back to a human operator — for the edge cases, the angry customers, the high-stakes decisions. What changes: the team stops doing the 80% of repetitive work the AI crushes and refocuses on the 20% that actually needs judgment. On the cohorts we've shipped: sales ops moves from CRM hygiene to building the playbook, support L1 moves from copy-paste replies to fixing the root cause that generated the ticket, content teams move from drafting to editing and ideation. Headcount stays, output multiplies.
Is our data safe with LLM providers?
Depends on the provider and the contract. Anthropic and OpenAI both offer zero-data-retention modes on their enterprise APIs — your prompts and outputs are never used for training and aren't stored beyond the request. Azure OpenAI, AWS Bedrock and Google Vertex AI give you the same models running in your own cloud account, with EU or US data residency you control. For workloads where data legally can't leave your perimeter (finance, defense, healthcare), we deploy open-weights on-premise via vLLM or TGI. We pick the deployment pattern that fits your risk profile, not the cheapest one by default.
What tools and CRMs do you wire AI features into?
Tool-agnostic. We've shipped AI features wired to HubSpot, Pipedrive, Salesforce, Attio, Folk, Airtable, Notion, Zendesk, Intercom, Slack, Gmail, Outlook, Stripe, Linear, GitHub, Webflow, Make, n8n, and custom internal systems via REST APIs or Postgres. The wiring lives behind an MCP server or a no-code workflow (Make / n8n) when the team will need to extend it without code. If you have a documented API and webhooks, we can wire AI to it.
How do you measure ROI on an AI mission?
We track 6 main KPIs per shipped feature, reported monthly in a shared dashboard: usage (calls per day, daily active users), time saved per call (vs. status quo), unit cost per call, eval pass rate, refusal / fallback rate, and revenue or savings attributable to the feature. We refuse to track vanity metrics (model parameters, prompt token counts) unless they serve a direct business goal. If a feature isn't moving the needle after 8 weeks of iteration, we retire it instead of dragging it.
How long do we commit for?
Three formats. (1) Audit only: flat fee, 2 weeks, deliverable is the ranked use-case list and the design doc for the first feature. (2) Build sprint: 4 to 8 weeks per feature shipped, fixed scope, fixed price. (3) Ongoing retainer: 6-month minimum for teams running 3+ AI features in production who want continuous eval, model migration and use-case extension. No forced annual contract, no convoluted exit clauses. If we don't ship, you stop.
Stop pitching the AI roadmap. Ship the first feature.
A 60-minute audit, three candidate use cases scored, one feature designed. If your team should build it in-house, we'll say so and hand over the design. If we're a better fit, we ship in 4 to 6 weeks.