AI tools

How you should choose tools

You don’t pick tools to look modern; you pick them to reduce risk and cycle time. Start with the workflow you want: data ingestion, retrieval, evaluation, deployment, and monitoring. Then choose the smallest set of tools that makes the workflow repeatable.

Prefer boring glue. You want tools that integrate cleanly with identity, CI/CD, and observability.
Optimize for evaluation. If you can’t measure quality, you can’t improve it.
Separate experiments from production. Keep a clean path from prototype to hardened service.

App and agent frameworks

LangChain / LangGraph. Useful when you need structured orchestration, tool-calling, and multi-step flows.
Semantic Kernel. Helpful if you prefer a plugin-style model for tools and prompts, especially in .NET shops.
OpenAI / Anthropic SDKs directly. Often the best default when your workflow is simple and you want fewer abstractions.

Retrieval (RAG) building blocks

Embedding models. You use these to turn text into vectors for similarity search; pick based on language/domain fit.
Vector stores. These store vectors alongside metadata; make sure metadata filtering and tenant isolation match your data model.
Chunking and re-ranking. You tune chunk size and overlap, then re-rank the retrieved candidates, so the model sees the most relevant passages; better grounding reduces hallucination risk.
Document pipelines. You need ingestion, deduplication, permissions, and freshness controls.
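
To make the retrieval pieces above concrete, here is a minimal sketch in plain Python, assuming an in-memory list stands in for the vector store and that vectors come from whatever embedding model you pick. The names `chunk_text`, `Record`, and `search` are hypothetical and not taken from any library. It shows two ideas: overlapping chunks, and a tenant filter applied before similarity ranking so results never cross tenancy boundaries.

```python
import math
from dataclasses import dataclass


def chunk_text(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into character windows of `size` that share `overlap` characters."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    step = size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the last window already reaches the end of the text
        start += step
    return chunks


@dataclass
class Record:
    tenant_id: str        # metadata used to enforce the tenancy boundary
    text: str
    vector: list[float]   # assumed to come from your embedding model


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(records: list[Record], query_vector: list[float],
           tenant_id: str, top_k: int = 3) -> list[Record]:
    """Filter by tenant first, then rank the remaining records by similarity."""
    allowed = [r for r in records if r.tenant_id == tenant_id]
    ranked = sorted(allowed, key=lambda r: cosine(query_vector, r.vector), reverse=True)
    return ranked[:top_k]
```

Filtering before ranking is a deliberate choice in this sketch: a record from another tenant can never appear in the results, however similar it is. A real vector store would normally apply the same filter inside the query rather than in application code.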

Evaluation and safety tools

Offline evaluation. Maintain golden datasets, regression suites, and rubric-based grading for prompts and RAG pipelines.
Online monitoring. Track latency, cost, refusal rates, tool errors, and user feedback signals.
Policy gates. You add prompt injection defenses, PII handling, and allowlists for tools/actions.
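
One way to picture offline evaluation is as a small regression gate that runs before a prompt or model change ships. The sketch below is an assumption-heavy illustration, not a real framework: `GoldenCase`, `grade`, `run_suite`, and `passes_regression` are hypothetical names, the rubric is a simple "must include these phrases" check, and `generate` stands in for whatever function calls your model.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class GoldenCase:
    prompt: str
    must_include: list[str]  # rubric: phrases a good answer has to contain


def grade(answer: str, case: GoldenCase) -> float:
    """Return the fraction of rubric phrases found in the answer (0.0 to 1.0)."""
    if not case.must_include:
        return 1.0
    lowered = answer.lower()
    found = sum(1 for phrase in case.must_include if phrase.lower() in lowered)
    return found / len(case.must_include)


def run_suite(generate: Callable[[str], str], cases: list[GoldenCase]) -> float:
    """Average rubric score across the golden dataset."""
    if not cases:
        raise ValueError("golden dataset is empty")
    return sum(grade(generate(case.prompt), case) for case in cases) / len(cases)


def passes_regression(score: float, baseline: float, tolerance: float = 0.02) -> bool:
    """Fail the gate when the score drops more than `tolerance` below the baseline."""
    return score >= baseline - tolerance
```

In CI you would call `run_suite` with the candidate prompt or model, compare the result against the last released score with `passes_regression`, and block the change if it returns False. Substring rubrics are crude; they are used here only to keep the example self-contained.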

Production basics

Identity and access. Your agents should use least privilege and short-lived credentials.
Observability. You log prompts safely (for example, with sensitive fields redacted), capture tool-call traces, and build dashboards for failures.
Cost controls. Use budgets, caching, and rate limits so experimentation doesn’t become a surprise bill.
Change management. Version prompts, datasets, and evals the same way you version code.
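
The budget and caching ideas under cost controls can be sketched as a thin wrapper around the model call. This is a minimal illustration under stated assumptions: `CostGuard` and `BudgetExceeded` are hypothetical names, every call is charged a flat estimated cost, and `call_model` stands in for your real client. Production systems usually meter by tokens and store the cache outside the process.

```python
import hashlib
from typing import Callable


class BudgetExceeded(RuntimeError):
    """Raised when another model call would push spend past the budget."""


class CostGuard:
    def __init__(self, call_model: Callable[[str], str],
                 budget_usd: float, cost_per_call_usd: float):
        self._call_model = call_model
        self._budget = budget_usd
        self._cost_per_call = cost_per_call_usd  # assumed flat estimate per call
        self._cache: dict[str, str] = {}
        self.spent_usd = 0.0

    def complete(self, prompt: str) -> str:
        # Identical prompts hit the cache and cost nothing.
        key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
        if key in self._cache:
            return self._cache[key]
        # Refuse the call before spending, not after.
        if self.spent_usd + self._cost_per_call > self._budget:
            raise BudgetExceeded(
                f"budget of ${self._budget:.2f} would be exceeded"
            )
        answer = self._call_model(prompt)
        self.spent_usd += self._cost_per_call
        self._cache[key] = answer
        return answer
```

Checking the budget before the call means an experiment fails fast with a clear error instead of producing a surprise bill. A rate limit could be added in the same wrapper, which keeps all spend-related policy in one place.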

References

Anthropic — model and safety research.
OpenAI — platform docs and model capabilities.
Semantic Kernel docs — plugins, planners, and integrations.
LangChain docs — chains, tools, and RAG patterns.