Bassam Ismail
Internal AI

Ask the app: a Q&A agent over a kid's school year

6 min read

Every feature so far answered a question I could predict. What is due, what happened this week, what does this note mean. I knew the question at build time, so I could fetch exactly the right data and cache the answer. Then I added a text box that lets a parent ask anything, and all of that went out the window. "How is she doing in maths this term?" "Did they ever send the field-trip form?" "What was that book the teacher mentioned in February?" I cannot pre-fetch the context for a question I have not seen.

This is the fifth part of a series on a school-assignment assistant for one child. The earlier features were retrieval I could plan. This one is a small agent: a model with a few tools, pointed at a SQLite database of one kid's school year, allowed to go look things up until it can answer. This is the part where I had to decide how much autonomy to give it, and the answer was "some, with a hard ceiling."

Why an agent and not a retrieval pipeline

The reflexive architecture for "answer questions over my documents" is retrieval-augmented generation with a vector store: embed everything, embed the question, pull the nearest chunks, stuff them in the prompt. I did not build that, and I want to be clear it was not laziness. It was scale.

The entire corpus is one child's school year. A few hundred classroom notes, some events, a couple hundred notices. That is small enough that I do not need approximate nearest-neighbor search over embeddings to find relevant material. A LIKE query over the text finds "multiplication" just fine. A date range pulls "February." The whole premise of vector retrieval, that the corpus is too large to scan and too fuzzy to query directly, simply does not hold at this size. Adding an embedding pipeline and a vector index would be solving a scale problem I do not have, and I would still have to maintain it. So the tools the agent gets are not semantic search. They are the plain database queries I would run by hand.

Start loaded, expand on demand

The design that worked is "give it a sensible default, let it ask for more." When a question comes in, I do not start the model empty. I pre-load the last two weeks of assignments, events, and notices into the first prompt, because most questions a parent asks are about the recent past and that context answers them in one shot, with no tool calls and no extra latency.

If the question reaches further than two weeks, the model has three tools to widen its own view:

const ASK_TOOLS = [
  { name: "expand_history",      // a wider date range when the question reaches back
    params: ["start_date", "end_date"] },
  { name: "search",              // full-text over all assignments and notices, any date
    params: ["query"] },
  { name: "get_subject_history", // everything for one subject, newest first
    params: ["subject_key"] },
];

Each tool is a thin wrapper over a query that already existed for other parts of the app. search runs a substring match across titles, descriptions, and notice content. get_subject_history pulls one subject's full record. The model reads the question, decides whether its loaded context is enough, and if not, picks a tool. "How is she doing in maths" triggers get_subject_history. "The book from February" triggers expand_history or search. The system prompt nudges it toward one focused tool call over guessing, and toward answering immediately when the initial window already covers the question.

The ceiling is the whole point

An agent that can call tools in a loop can also loop forever, or rack up calls chasing a question that the data simply cannot answer. For a personal app, a runaway loop is not a catastrophe, but it is wasted money and a spinning page, so the loop is bounded hard at four iterations. More importantly, the last iteration is special: I take the tools away and force a JSON response.

for (let i = 0; i < MAX_TOOL_ITERATIONS; i++) {
  const isLast = i === MAX_TOOL_ITERATIONS - 1;
  const response = await chat({
    messages,
    tools: ASK_TOOLS,
    tool_choice: isLast ? "none" : "auto",            // last turn: no more tools
    ...(isLast ? { response_format: { type: "json_object" } } : {}),
  });
  // ...if the model called tools, run them, append results, continue.
  // ...otherwise, parse and return the answer.
}

That single isLast branch is the difference between an agent that always terminates with an answer and one that can hang. Without it, the model could ask for a fifth tool call that never comes and leave the parent staring at a spinner. With it, the worst case is "it gathered what it could in three rounds and then had to answer," which is exactly the behavior I want under uncertainty. The answer it returns carries short source citations, like "Wed 15 Apr classwork," so the parent can see what it leaned on rather than taking a confident paragraph on faith.

A BOUNDED AGENT LOOPquestionload windowlast 2 weeksagentreads, calls toolsanswer+ source cites[ tools loop up to 4 rounds, then the last round forces an answer ]

Trusting tool arguments about as far as I trust dates

The same instinct from the extraction part applies here: the model is a good decider and a sloppy typist. It picks the right tool reliably; its arguments need defending. Tool argument JSON is parsed inside a try-catch that falls back to an empty object rather than throwing, an empty search query short-circuits to an error result instead of scanning everything, and a bad subject key returns nothing rather than a crash. The model drives, but every door it opens has been checked from the other side. None of these guards have fired often. All of them mean a malformed tool call degrades to a weaker answer instead of a failed request, which for a parent typing a question one-handed at bedtime is the difference between "hmm, not much there" and a broken page.

When the small version is the right version

This is maybe sixty lines of orchestration around three trivial queries, and it answers open-ended questions about a child's year better than a vector pipeline would, because the corpus is small and the queries are exact. The lesson I keep relearning across this whole project: match the machinery to the actual size of the problem. A two-week default window covers the common case for free. Three tools cover the long tail. A four-iteration ceiling with a forced final answer keeps it honest and terminating. That is the entire agent, and it is small on purpose.

One question remains, and it is the one that keeps an LLM side project from quietly becoming a budget problem: what does all of this cost, and how would I even know. The last part is about the plumbing that answers that, and the admin page that makes it visible.