Satire · AI Engineering · 2026

How it feels to learn AI in 2026

In which a simple chatbot project spirals into an existential crisis across 47 competing frameworks.

March 2026 · 12 min read
No new AI frameworks were created during the writing of this article. Three existing ones were deprecated, though.

The following is inspired by "How it feels to learn JavaScript in 2016" by Jose Aguinaga. This piece is just an opinion, and like any AI agent framework, it shouldn't be taken too seriously.
···

Hey, I got this new project, but to be honest I haven't done much with AI beyond using ChatGPT, and I've heard the landscape changed a bit. You're the most up-to-date AI engineer around here, right?

-The actual term is AI Solutions Architect, but yeah, I'm the right guy. I do AI in 2026. Agentic workflows, multi-modal pipelines, autonomous code generation, you name it. I just came back from NeurIPS and the Anthropic developer summit, so I know the latest technologies to build AI applications.

Cool. I need to build a chatbot that answers questions about our company's internal docs. So I figure I can just send the docs to the OpenAI API with a prompt and have it answer questions?

-Oh my god no, no one just stuffs documents into a prompt anymore. You need to build a RAG pipeline. It's 2026.

RAG?

-Retrieval-Augmented Generation. Basically you chunk your documents, embed them into vectors, store them in a vector database, then at query time you retrieve the relevant chunks and pass them to the LLM as context.

That sounds like a lot of steps just to ask questions about some PDFs. What's a vector database?

-It's a database optimized for similarity search over high-dimensional embeddings. You've got Pinecone, Weaviate, Qdrant, Chroma, Milvus, pgvector, Turbopuffer—

OK, OK. Which one should I use?

-Well, it depends. Are you running locally or in the cloud? Do you need hybrid search? Multi-tenancy? What's your recall target? Are you planning to do late interaction with ColBERT or just cosine similarity over dense embeddings?

I just have like 50 internal docs.

-Oh. You could probably just use a JSON file honestly.
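(In fairness, the "just use a JSON file" version really is about twenty lines. Here's a toy sketch with made-up doc data, using word-count "embeddings" as a stand-in for a real embedding model — purely illustrative, not how you'd embed in production:)

```python
import math
from collections import Counter

# Hypothetical "vector database": a plain list of docs, as from a JSON file.
DOCS = [
    {"id": 1, "text": "Employees accrue 20 vacation days per year."},
    {"id": 2, "text": "Expense reports are due by the 5th of each month."},
    {"id": 3, "text": "Remote work requires manager approval."},
]

def embed(text):
    # Stand-in for a real embedding model: bag-of-words counts.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, k=1):
    # "Retrieval": rank every doc by similarity to the query.
    q = embed(query)
    return sorted(DOCS, key=lambda d: cosine(q, embed(d["text"])), reverse=True)[:k]

top = retrieve("how many vacation days do I get?")
print(top[0]["text"])
```

The top result goes into the prompt as context, and that's the whole pipeline. No recall targets were harmed.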

Then why did you—never mind. OK so I chunk the docs. How?

-Well there's your first real decision. Recursive character splitting, semantic chunking, sliding window, document-structure-aware parsing, agentic chunking—

I'll just split them into paragraphs.

-I mean, you COULD, but you'll get terrible retrieval quality. You really should be using a late-chunking strategy with contextual embeddings that preserve document-level semantics. Or you could skip chunking entirely and use a long-context model.

Long-context model?

-Yeah, some models now take 1 million, 2 million tokens. Gemini does like 2 million. You could just dump everything in.

So I don't need RAG at all? I can just stuff it all in the prompt?

-Oh god, no. Just because you CAN fit it all in doesn't mean the model will actually USE it all. There's the lost-in-the-middle problem. And the cost. Do you know what 2 million input tokens costs?

So I DO need RAG.

-Or you could fine-tune.

Fine-tune?

-Yeah, take a base model and train it specifically on your docs. That way the knowledge is in the weights, not the context window.

That sounds serious. How do I do that?

-Well, you probably don't want full fine-tuning, that's expensive. You'd use LoRA. Or QLoRA if you want to do it on a single GPU. You'd need to format your docs as instruction-completion pairs, set up your training pipeline with Axolotl or Unsloth or TRL—

Wait, I just want a chatbot for internal docs. Do I really need to train a model?

-I mean, no. RAG is probably fine. Let's go back to RAG.

Thank god.

-So you'll need an embedding model. There's OpenAI's text-embedding-3-large, Cohere's embed-v4, Voyage AI, or you can run an open-source one like BGE or Nomic. Or GTE-Qwen2. Oh, and you should probably be using Matryoshka embeddings so you can truncate dimensions at query time for efficiency.

I don't know what any of those words mean. Can I just use whatever's default?

-I mean, I guess, but your retrieval quality—

I don't care. What's next?

-OK so now you need an orchestration framework. You've got LangChain, LlamaIndex, Haystack, DSPy, Semantic Kernel, Spring AI if you're in Java, or you could go lighter with Instructor or Mirascope or—

I'll use LangChain, I've heard of that one.

-LangChain? Is this 2024? I mean, you COULD, but most people have moved on. The API surface was too large and it abstracted things in weird ways. Plus are you using LangChain or LangGraph? Because LangChain is mostly for simple chains now and LangGraph is for agentic workflows.

I don't need agents. I just want to answer questions about docs.

-You say that now, but what happens when a user asks a question that requires information from two different docs? Or when they ask a follow-up question that requires conversational memory? Or when they want to take an action based on the answer? You need agents.

I really don't think I need agents.

-Everyone needs agents. It's 2026. Single-shot prompt-response is dead. You need a reasoning loop with tool use. An agent that can decide when to retrieve, when to search, when to ask clarifying questions, when to hand off to a sub-agent—

Sub-agent?

-Yeah, multi-agent systems. You have a planner agent that breaks down the query, a retrieval agent that searches your docs, a critic agent that evaluates the answer, and a synthesis agent that combines everything into a response.

I need FOUR agents to answer a question about our vacation policy?

-Well, you could also look at agent frameworks like CrewAI, AutoGen, OpenAI Agents SDK, Google ADK, or LangGraph. Or Agno. Or Pydantic AI. Or—

How many agent frameworks are there?

-It's AI. There has to be hundreds of frameworks that all do the same thing. We know frameworks. In fact, we have the best frameworks. Our frameworks are huuuge.

OK. Let's say I build this thing. Which model do I actually use?

-Which model? Well, what's your latency budget? Cost per query? Do you need structured output? Tool calling? Do you care about the license? Are you operating in the EU?

I just want it to be good at answering questions.

-Define "good."

It gives the right answer?

-You need evals.

Evals?

-Evaluations. You can't just vibe check your AI application. You need systematic, automated evaluation. There's RAGAS for RAG-specific metrics, or you can use an LLM-as-a-judge framework. Are you doing pairwise evaluation or pointwise scoring? What about faithfulness, answer relevancy, context precision, context recall—
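(For the record, the smallest possible eval loop is not much code either. A toy pointwise eval with an invented test set and a word-overlap "judge" standing in for an actual LLM-as-a-judge or a library like RAGAS — the scorer is a deliberate oversimplification:)

```python
# Toy pointwise eval: score each answer against a reference answer.
# The "judge" is a word-overlap stand-in, not an actual LLM-as-a-judge.

EVAL_SET = [
    {"question": "How many vacation days do I get?",
     "reference": "20 vacation days per year",
     "answer": "You accrue 20 vacation days per year."},
    {"question": "When are expense reports due?",
     "reference": "by the 5th of each month",
     "answer": "Expense reports are optional."},
]

def judge(answer, reference):
    # Fraction of reference words that appear in the answer.
    ref = set(reference.lower().split())
    got = set(answer.lower().rstrip(".").split())
    return len(ref & got) / len(ref)

scores = [judge(case["answer"], case["reference"]) for case in EVAL_SET]
print([round(s, 2) for s in scores])
```

Swap the overlap scorer for a model call and you have LLM-as-a-judge; run it on every deploy and you have an eval pipeline. Susan in accounting remains the gold standard.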

I'll ask Susan in accounting if it gets her questions right.

-That's... I mean, you also need guardrails.

Guardrails?

-Yeah, you need to make sure the model doesn't hallucinate, doesn't leak sensitive information from your docs, doesn't go off-topic, doesn't output harmful content. You can use NeMo Guardrails, Guardrails AI, Lakera, or roll your own with a constitutional AI-style approach using a secondary model—

A secondary model to watch the first model?

-Obviously. Who watches the watchmen? Another LLM.

And who watches THAT model?

-Look, you're overthinking it. Let's talk about deployment.

Please, yes. How do I deploy this?

-Well, are you running a proprietary model or open-source?

What's the difference at this point?

-If you're using Claude or GPT, you're just hitting an API so deployment is pretty straightforward. But if you want to run open-source—say Llama 4, Mistral Large, DeepSeek-V4, or Qwen 3—you need an inference server.

Inference server?

-Yeah, like vLLM, TensorRT-LLM, SGLang, Ollama, or llama.cpp if you're feeling vintage. You'll need to think about quantization too—GPTQ, AWQ, GGUF, or maybe you want FP8 if you have the right hardware—

I don't have hardware. Can't I just call an API?

-I mean, sure, but then you're locked into a provider. What about vendor lock-in? What about when OpenAI changes their pricing again? What about data sovereignty?

I'm building a chatbot for 50 internal docs for a company of 200 people.

-Right, right. OK. So just use an API. But you should use a gateway.

A gateway?

-An AI gateway. It sits between your app and the model providers and handles rate limiting, fallbacks, caching, load balancing across providers. There's LiteLLM, Portkey, Martian, Helicone—or you could use your cloud provider's gateway, AWS Bedrock or Azure AI or Google Vertex—
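(The core of what a gateway sells you — fallback across providers — is, at heart, a loop. A toy sketch with stubbed provider functions standing in for real SDK calls:)

```python
# Toy gateway: try providers in order, fall back on failure.
# Both "providers" here are stubs, not real SDK calls.

def call_provider_a(prompt):
    raise TimeoutError("provider A is having a day")

def call_provider_b(prompt):
    return f"answer to: {prompt}"

def gateway(prompt, providers):
    errors = []
    for provider in providers:
        try:
            return provider(prompt)
        except Exception as e:
            # A real gateway would also log, retry, rate-limit, and cache here.
            errors.append(e)
    raise RuntimeError(f"all providers failed: {errors}")

print(gateway("what's the vacation policy?", [call_provider_a, call_provider_b]))
```

Everything else a gateway does is this loop wearing a dashboard.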

Can't I just... call the API directly?

-I mean, sure, if you like living dangerously.

OK fine. I call the API. I get the response. I show it to the user. We're done?

-What about observability?

What about it?

-You need to trace your LLM calls. Latency per step, token usage, cost tracking, prompt versioning, retrieval quality metrics. There's LangSmith, Langfuse, Arize Phoenix, Weights & Biases Weave, Braintrust, HoneyHive—
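(The minimum viable version of "trace your LLM calls" is a decorator. A sketch with a stubbed model call — the whitespace-split token counts are a crude stand-in for a real tokenizer:)

```python
import functools
import time

def traced(fn):
    # Minimal "observability": wall-clock latency plus crude token counts.
    @functools.wraps(fn)
    def wrapper(prompt):
        start = time.perf_counter()
        response = fn(prompt)
        trace = {
            "call": fn.__name__,
            "latency_ms": round((time.perf_counter() - start) * 1000, 2),
            "input_tokens": len(prompt.split()),    # crude, not a real tokenizer
            "output_tokens": len(response.split()),
        }
        print(trace)
        return response
    return wrapper

@traced
def ask_model(prompt):
    # Stub standing in for an actual API call.
    return "You accrue 20 vacation days per year."

ask_model("How many vacation days do I get?")
```

Point the `print` at a database instead of stdout and you have invented an observability platform. Several people already have.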

I feel like we're really far from "chatbot that answers questions about docs."

-And you still haven't set up your MCP server.

MCP?

-Model Context Protocol. It's a standard for connecting AI models to external tools and data sources. Anthropic created it but now everyone supports it. Your chatbot should expose your doc search as an MCP tool so any AI client can connect to it.

Why would I do that? I'm building a chatbot, not a tool server.

-It's 2026. Everything is a tool now. Every API, every database, every internal service—they all need MCP interfaces. And then there's A2A.

A2A?

-Agent-to-Agent protocol. Google released it. It's for agents that need to talk to other agents. Your doc chatbot might need to talk to the calendar agent, which talks to the email agent, which talks to the Slack agent—

My chatbot doesn't need to talk to other agents. It needs to answer questions. About documents. That we wrote. In English.

-Sure, sure. But what happens in 6 months when your PM says "can the chatbot also schedule meetings based on the policy docs?" Then you'll wish you had A2A set up.

I'll deal with that in 6 months.

-Famous last words. Oh, and have you thought about which prompt technique you're using?

I was going to write a system prompt that says "you are a helpful assistant that answers questions about our company docs."

-...Are you serious? You need chain-of-thought prompting at minimum. Or maybe tree-of-thought. Or ReAct. Or Reflexion. Or you could use DSPy to automatically optimize your prompts. Have you looked into meta-prompting? Or prompt chaining with—

I'm going to write a sentence that tells the model what to do.

-You know what, maybe you should just use a no-code platform. There's Flowise, Langflow, Dify, Stack AI, Relevance AI—you just drag and drop nodes to build your pipeline.

No-code? That actually sounds nice.

-I mean, it's fine for prototyping, but eventually you'll hit limitations. You won't be able to customize your chunking strategy, your retrieval is a black box, you can't do custom evals, and forget about running your own models—

You just told me I didn't need to run my own models!

-You don't. Right now. But what about when you want to distill a bigger model's outputs into a smaller, faster model for production? Or when you need to do continued pre-training on your domain data? Or—

OK. I'm going to stop you there. All I need is a chatbot. That answers questions. About 50 documents. Written by humans. In a company of 200 people. I have been hearing about vector databases, embedding models, chunking strategies, fine-tuning, LoRA, QLoRA, agents, multi-agent systems, sub-agents, orchestration frameworks, inference servers, quantization formats, AI gateways, observability platforms, MCP servers, agent-to-agent protocols, prompt optimization frameworks, guardrail systems, evaluation pipelines, and no-code builders, and I STILL have not answered a single question about our vacation policy.

-...Have you considered just putting all the docs in a Claude Project and telling people to ask it questions there?

THAT'S WHAT I SAID IN THE BEGINNING.

-Yeah, but it doesn't scale.

You know what. I think we're done here. Actually, I think I'm done. I'm done with AI. I'm done with agents. I'm done with RAG and vectors and embeddings and evals.

-That's fine, in a few years we'll all just be telling the AI to build the AI app for us.

I'm just going to put the docs in a shared folder and let people use Ctrl+F.

-I hear you. You should try the robotics community then.

Why?

-Ever heard of the Sim-to-Real gap?

No new AI frameworks were deprecated during the editing of this article. Five new ones were announced, though. Two of them are just wrappers around the other three.