Plain-English research note

Why review timing matters for SOP recall.

RecallAI helps teams keep operational knowledge fresh: refund rules, escalation steps, onboarding checks, audit procedures, and customer-facing answers. FSRS (the Free Spaced Repetition Scheduler) is useful here because it adapts review timing to the person, the card, and the review history.

Quick answer

SM-2 is simple. FSRS is more adaptive.

SM-2-style scheduling, based on the SuperMemo SM-2 algorithm, is a proven, easy-to-understand approach. FSRS goes further by estimating memory state and predicted recall, which makes it a better fit when managers need efficient practice and useful evidence over time.

Question: What it tries to do
SM-2-style scheduling: Use simple rules to increase the gap between reviews when someone remembers a card.
FSRS scheduling: Predict how likely someone is to remember a card, then schedule the next review around that probability.

Question: How it sees memory
SM-2-style scheduling: Mostly works from the last rating, interval, and an ease factor.
FSRS scheduling: Tracks difficulty, stability, and retrievability so each card can behave differently over time.

Question: Personalization
SM-2-style scheduling: Reliable and understandable, but less sensitive to different users, topics, and review histories.
FSRS scheduling: Can adapt from real review logs and tune scheduling around actual learner behavior.

Question: Best use
SM-2-style scheduling: Good for basic flashcard review where simplicity matters most.
FSRS scheduling: Better for evidence-based training where review timing should be efficient and individualized.

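
To make the "simple rules" on the SM-2 side concrete, here is a minimal Python sketch of the classic SM-2 update. The class and function names are illustrative assumptions, and real SM-2-style schedulers differ in details such as rating scales and rounding, so treat this as a simplified illustration rather than any product's implementation.

```python
from dataclasses import dataclass


@dataclass
class Sm2Card:
    repetitions: int = 0    # consecutive successful reviews
    interval_days: int = 0  # current gap between reviews
    ease: float = 2.5       # SM-2 starting ease factor


def sm2_review(card: Sm2Card, quality: int) -> Sm2Card:
    """Apply one review using the classic SM-2 quality scale (0 to 5)."""
    if not 0 <= quality <= 5:
        raise ValueError("quality must be between 0 and 5")

    if quality < 3:
        # Failed recall: restart the sequence and leave the ease unchanged.
        return Sm2Card(repetitions=0, interval_days=1, ease=card.ease)

    # Successful recall: 1 day, then 6 days, then multiply by the ease factor.
    if card.repetitions == 0:
        interval = 1
    elif card.repetitions == 1:
        interval = 6
    else:
        interval = round(card.interval_days * card.ease)

    # Ease moves up for easy answers and down for hesitant ones (floor of 1.3).
    delta = 0.1 - (5 - quality) * (0.08 + (5 - quality) * 0.02)
    ease = max(1.3, card.ease + delta)
    return Sm2Card(card.repetitions + 1, interval, ease)
```

Note that the only inputs are the last rating, the current interval, and the ease factor, which is why this style of scheduling is easy to explain but less sensitive to individual review histories.
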
How FSRS thinks

FSRS models whether a memory is hard, durable, and currently retrievable.

The useful mental model is simple: some cards are harder, some memories are more stable, and recall fades over time. FSRS uses those signals to schedule the next review instead of treating every learner and card the same.

Difficulty

How hard a card is for a learner. Harder material usually needs more careful reinforcement.

Stability

How durable the memory is. Higher stability means the person can usually wait longer before the next review.

Retrievability

The chance that the person can recall the answer right now. It falls as time passes since the last review.
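
The relationship between stability and retrievability can be sketched with the power-law forgetting curve published for FSRS-4.5. This is an assumption for illustration: other FSRS versions use different constants, the function name is hypothetical, and nothing here describes RecallAI's internal code.

```python
# Constants from the published FSRS-4.5 forgetting curve (assumed version).
DECAY = -0.5
FACTOR = 19 / 81  # chosen so recall is 90% when elapsed time equals stability


def retrievability(elapsed_days: float, stability_days: float) -> float:
    """Estimated probability of recall after `elapsed_days` without review."""
    if stability_days <= 0:
        raise ValueError("stability_days must be positive")
    if elapsed_days < 0:
        raise ValueError("elapsed_days cannot be negative")
    return (1 + FACTOR * elapsed_days / stability_days) ** DECAY


if __name__ == "__main__":
    # Same 10-day gap, two different memories: the more stable one fades less.
    for stability in (5, 10, 40):
        print(stability, round(retrievability(10, stability), 3))
```

In this model, stability is the number of days it takes for predicted recall to fall to 90%, so a card with higher stability can wait longer before it needs review.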

Desired retention

Higher retention is not free.

Desired retention is the target chance that a learner remembers a card when it comes back. A higher target usually means shorter intervals and more daily reviews. For an operations team, the goal is not maximum review volume. The goal is enough reinforcement to reduce mistakes without creating a training burden.

90% target

A practical default balance for many teams: good retention without overwhelming review load.

95% target

More aggressive. Reviews come back sooner, which may be useful for critical procedures but increases workload.

97%+ target

Usually a specialist setting. Workload can rise quickly and may be too heavy for an SMB rollout.

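
The workload trade-off can be illustrated by asking how long a card can wait before predicted recall drops to the target. The sketch below inverts the published FSRS-4.5 forgetting curve; the version, the function name, and the 30-day stability value are assumptions chosen for illustration only.

```python
# Constants from the published FSRS-4.5 forgetting curve (assumed version).
DECAY = -0.5
FACTOR = 19 / 81


def next_interval_days(stability_days: float, desired_retention: float) -> float:
    """Days until predicted recall falls to `desired_retention`."""
    if stability_days <= 0:
        raise ValueError("stability_days must be positive")
    if not 0 < desired_retention < 1:
        raise ValueError("desired_retention must be between 0 and 1")
    return stability_days / FACTOR * (desired_retention ** (1 / DECAY) - 1)


if __name__ == "__main__":
    # One memory with 30 days of stability, scheduled at three targets.
    for target in (0.90, 0.95, 0.97):
        print(f"{target:.0%} target: {next_interval_days(30, target):.1f} days")
```

Under these assumptions, the same memory comes back after about 30 days at a 90% target, about 14 days at 95%, and about 8 days at 97%, which is why higher targets translate into more daily reviews.
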
How RecallAI uses it

FSRS schedules practice. RecallAI adds recall evidence around it.

FSRS decides when a card should return. RecallAI adds the business layer around it: source SOPs, manager-approved cards, typed answers, answer-match scoring, role-based assignment, and reporting.

1. The learner sees a question
The answer is hidden, so the session starts with active recall instead of recognition.

2. They type their answer
RecallAI captures what they actually attempted before revealing the approved answer.

3. The answer is scored
The typed answer is compared with the approved answer and logged as a match score.

4. They self-rate the card
Their rating of Again, Hard, Good, or Easy feeds the FSRS scheduler.

5. The next review is scheduled
The card returns when recall is likely to need reinforcement.

6. Managers see evidence
Team and user stats show weak decks, low-match answers, and recent review history.

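
The typed-answer and self-rating steps above could be recorded along these lines. This is a hypothetical sketch: the text does not say how RecallAI computes its match score, so a generic string-similarity ratio from Python's standard library stands in for it, and every class, field, and function name here is an assumption.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from enum import IntEnum


class Rating(IntEnum):
    """Self-rating passed to the scheduler (numeric values are an assumption)."""
    AGAIN = 1
    HARD = 2
    GOOD = 3
    EASY = 4


@dataclass
class ReviewLog:
    card_id: str
    typed_answer: str
    match_score: float  # 0.0 (no overlap) to 1.0 (identical after normalizing)
    rating: Rating


def _normalize(text: str) -> str:
    # Ignore case and extra whitespace so formatting does not affect the score.
    return " ".join(text.lower().split())


def match_score(typed: str, approved: str) -> float:
    """Stand-in similarity score; not RecallAI's actual scoring method."""
    return SequenceMatcher(None, _normalize(typed), _normalize(approved)).ratio()


def record_review(card_id: str, typed: str, approved: str, rating: Rating) -> ReviewLog:
    """Capture the attempt first, then attach the score and the self-rating."""
    return ReviewLog(card_id, typed, match_score(typed, approved), rating)
```

Keeping the match score separate from the self-rating means the scheduler and the manager-facing evidence can disagree, for example when a learner rates a card Good but the typed answer only partly matches.
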
Technical note

Good scheduling depends on predicting future recall.

Public spaced-repetition benchmarks compare scheduling methods against large review-history datasets. The practical question is whether the scheduler predicts later recall well enough to place reviews at useful intervals.

Desired retention

The target recall rate. A higher target means more frequent reviews and more daily workload.

Log loss

A benchmark metric for whether predicted recall probabilities match real outcomes. Lower is better.
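
As a rough illustration, log loss for a set of reviews can be computed as below. The function name and the sample numbers are made up for the example; the formula is the standard binary log loss.

```python
import math


def log_loss(predicted: list[float], recalled: list[bool], eps: float = 1e-15) -> float:
    """Average binary log loss of predicted recall probabilities."""
    if not predicted or len(predicted) != len(recalled):
        raise ValueError("inputs must be non-empty and the same length")
    total = 0.0
    for p, was_recalled in zip(predicted, recalled):
        p = min(max(p, eps), 1 - eps)  # avoid log(0) on extreme predictions
        total -= math.log(p) if was_recalled else math.log(1 - p)
    return total / len(predicted)


if __name__ == "__main__":
    outcomes = [True, True, True, False]
    print(log_loss([0.9, 0.9, 0.9, 0.2], outcomes))  # confident and mostly right
    print(log_loss([0.5, 0.5, 0.5, 0.5], outcomes))  # uninformative predictions
```

A scheduler that is confidently wrong is penalized heavily, so a lower score indicates predicted probabilities that track what learners actually remembered.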

RMSE bins

A benchmark metric that groups similar reviews into bins, then compares the average predicted recall with the actual recall rate in each bin. Lower is better.
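
A simplified version of a binned RMSE is sketched below. How reviews are assigned to bins is left to the caller through a hypothetical `bin_keys` argument; public benchmarks define their own binning rules, which this example does not try to reproduce.

```python
import math
from collections import defaultdict


def rmse_bins(predicted: list[float], recalled: list[bool], bin_keys: list[str]) -> float:
    """Size-weighted RMSE between mean predicted and actual recall per bin."""
    if not predicted or not (len(predicted) == len(recalled) == len(bin_keys)):
        raise ValueError("inputs must be non-empty and the same length")

    bins: dict[str, list[tuple[float, float]]] = defaultdict(list)
    for p, was_recalled, key in zip(predicted, recalled, bin_keys):
        bins[key].append((p, 1.0 if was_recalled else 0.0))

    weighted_sq_error = 0.0
    for rows in bins.values():
        n = len(rows)
        mean_predicted = sum(p for p, _ in rows) / n
        actual_rate = sum(y for _, y in rows) / n
        # Larger bins count for more in the final score.
        weighted_sq_error += n * (mean_predicted - actual_rate) ** 2

    return math.sqrt(weighted_sq_error / len(predicted))
```

Unlike log loss, this view rewards calibration within each group: if the scheduler predicts 80% recall for a bin and 80% of those reviews succeed, that bin contributes no error.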

What this means for managers

The practical point is simple: adaptive scheduling helps spend review time where it is most useful, while the typed-answer layer gives managers evidence of attempted recall.

Best use

Human approval keeps recall evidence useful.

A learner can still misunderstand a procedure, write a partial answer, or need manager coaching. The system is strongest when managers use the evidence to find weak areas, improve cards, and reinforce the procedures that matter most.

Managers keep context

Managers decide which SOPs matter and whether answers are acceptable for the business context.

Evidence supports coaching

Typed-answer history helps teams identify weak areas, reinforce training, and keep better records of SOP recall.

Best with real SOPs

The system is most useful when it trains current workflows, not generic training content.