Teaching an LLM to ignore the recap and surface the homework
Here is a real classroom note, lightly paraphrased: "Today learners revised two-digit addition with regrouping and completed pages 14 to 16. Most were able to carry over correctly. We will continue practice. Spelling assessment on Unit 5 words this Friday." Four sentences. Three of them are a recap of a day that has already happened. One of them is the only thing a parent needs to act on. A useful school assignment assistant has to read all four and surface exactly one.
This is the third part of a series on building a school-assignment assistant for one child. Earlier parts covered why it exists and how the raw notes get scraped out of the school portal. This part is about the feature that makes the whole thing worth having, and it is almost entirely an exercise in teaching a language model to throw work away.
The value of a school assignment assistant is in the refusal
It is tempting to describe this as "extract events with an LLM," as if the interesting part is the extraction. It is not. Pulling structured items out of text is the easy, well-trodden thing models do. The hard part, the part that decides whether a parent trusts the app or quietly stops opening it, is everything the model declines to surface.
Teachers write these notes after the school day, in the past tense, as a record of what happened. That is correct for them and a trap for me. If I extract generously, the parent gets a list dominated by "revised addition," "completed worksheet," "discussed the water cycle." None of it is wrong. All of it is noise, because none of it is a thing the child has to do. A few days of that and the list is just a feed, and a feed is exactly what I was trying to escape in Part 1. The product is not "show me what happened." It is "show me the handful of things I have to act on." Optimizing for that means optimizing for precision and being willing to pay for it in recall.
The prompt is mostly a list of what to ignore
So the extraction prompt spends most of its length on exclusions, not instructions. It tells the model the crucial context up front, that these notes are after-the-fact recaps and most of the text is not actionable, and then it draws the line hard:
EXTRACT only genuinely future action items: a test scheduled for a future date, homework to complete at home, an item to bring, a project with a deadline.
DO NOT EXTRACT: anything taught or completed in class today, vague statements like "practice will continue" with no date, learning objectives, anything that already happened.
And then the instruction that does the heavy lifting: "Be strict. When in doubt, mark it as none. A parent should only see items they need to act on." I run it at a low temperature, because I want the same note to classify the same way every time, not creatively. The output is forced JSON: each item carries the source assignment id, a parent-actionable title, an optional date if one is genuinely stated, and an event type from a small fixed set (test, homework, bring_item, project, and so on). This is where the school assignment assistant becomes useful: the prompt is less about finding every possible event and more about refusing the ones that do not belong.
Marking the absence of events is the real design
The non-obvious part is not what happens when the model finds something. It is what happens when it finds nothing, which is most of the time.
Every assignment gets processed exactly once. Before each run I filter to assignments that have no extraction recorded yet, so a note is never sent to the model twice. But "I already looked and there was nothing actionable here" is itself a result I have to store, or I will re-examine the same empty notes forever and pay for the privilege. So when an assignment yields no future items, I write a sentinel row with the type none against that assignment id. It marks the note as seen-and-empty.
// After the model responds, any assignment that produced no real events
// still gets a sentinel so it is never reprocessed.
const processed = new Set(events.map(e => e.assignment_id));
for (const a of unprocessed) {
if (!processed.has(a.id)) {
db.insertEvents([{ assignment_id: a.id, event_type: "none", subject_key: a.subject_key }]);
}
}This sounds trivial and it is the difference between a job that costs a few cents a day and one whose cost grows with the entire history of the term. It also means the "did I check this?" question has a definite answer in the database, which matters more than it sounds: when the app shows a quiet day, I can tell whether that is because nothing is due or because the extractor has not run, and those are very different things to a parent. That database answer only works because the earlier shape of the project treated the data model as the spine, not as an afterthought.
Trusting the model exactly as far as the schema
I do not trust the model's output shape, even with forced JSON. The response gets normalized before it touches the database. A date is only kept if it matches a strict YYYY-MM-DD pattern, otherwise it becomes null rather than a hallucinated "next week." The subject is backfilled from the source assignment when the model omits it, because I already know which assignment each item came from and the model does not need to be the authority on that. The type defaults to a safe generic if the model invents one outside the set.
const events = parsed.map(e => ({
assignment_id: e.assignment_id || null,
title: e.title || "Unknown event",
event_date: /^\d{4}-\d{2}-\d{2}$/.test(e.event_date) ? e.event_date : null,
subject_key: e.subject_key || lookupSubject(e.assignment_id),
event_type: e.event_type || "activity",
}));The principle I keep coming back to with these features: the model is a good reader and an unreliable typist. Let it do the judgment, the reading and the classifying that I could not write rules for, and let plain code own the parts I can verify, the date format, the foreign keys, the enum. The boundary between "ask the model" and "check it in code" is where most of the reliability lives.
Where this leaves the project
This is a strict pipeline by design, and strictness has a real failure mode I should name: it will occasionally drop something genuinely actionable that was phrased ambiguously, because I told it to discard when unsure. I decided a missed item is a better failure than a noisy list, because a noisy list fails silently by training the parent to ignore the app, while a missed item is a single visible miss I can learn from. That trade is not free and it is not obviously right for every app. It is right for this school assignment assistant.
With future events now sitting in their own table, clean and de-noised, the next part is about turning them, and the raw notes behind them, into something warm enough to actually read: parent-facing summaries that do not cost me a model call every time someone refreshes the page.
