How to track translator output and build a productivity baseline for your agency

Most translation agencies track words: words received, words invoiced, words past deadline. What they rarely measure with any precision is translator output: how much work a given person completes per day, by project type and domain. That gap doesn't feel like a problem until you need to quote a tight deadline, explain a missed delivery window, or decide whether to bring in another translator. Then the absence of real data becomes very expensive, very quickly.

What translator output actually measures

Word count is the obvious unit, but raw word count misleads unless you break it apart. A translator working in a CAT tool on a project where 70% of segments match the translation memory translates far fewer new words than the total word count implies. A post-editor working through machine translation on a light MTPE pass can cover two to three times the words per hour of someone doing full human translation from scratch. Lump all of this under one "words per day" figure and the number tells you very little.

A workable output taxonomy separates at minimum:

New words: segments with no TM match or below 50% fuzzy
Fuzzy words: segments with partial TM matches, typically in the 50–99% range
Exact matches: segments confirmed automatically from TM without editing
MTPE words: segments received from a machine translation engine, reviewed and edited by the translator
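
The taxonomy above can be expressed as a small classifier. This is a minimal sketch, assuming each segment carries a TM match percentage and a flag saying whether it was pre-filled by a machine translation engine. The names `classify_segment` and `tally_words`, the `from_mt` flag, and the choice of 100% as the exact-match threshold are illustrative assumptions, not part of any CAT tool's API.

```python
def classify_segment(match_percent: float, from_mt: bool = False) -> str:
    """Map one segment to an output category.

    Thresholds follow the taxonomy above: below 50% counts as new,
    50-99% as fuzzy, 100% as exact. `from_mt` is a hypothetical flag
    for segments that arrived pre-filled by an MT engine.
    """
    if not 0 <= match_percent <= 100:
        raise ValueError("match_percent must be between 0 and 100")
    if from_mt:
        return "mtpe"
    if match_percent >= 100:
        return "exact"
    if match_percent >= 50:
        return "fuzzy"
    return "new"


def tally_words(segments):
    """Sum source words per category.

    `segments` is an iterable of (word_count, match_percent, from_mt) tuples.
    """
    totals = {"new": 0, "fuzzy": 0, "exact": 0, "mtpe": 0}
    for word_count, match_percent, from_mt in segments:
        totals[classify_segment(match_percent, from_mt)] += word_count
    return totals
```

If your CAT tool uses different fuzzy bands, adjust the thresholds once here and keep them fixed, so the categories stay comparable from project to project.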

Each category has a different expected throughput. An experienced technical translator working on new content in their domain might sustain 2,000–2,500 new words per day across a full project. The same translator doing MTPE for a high-quality MT engine in the same domain might manage 5,000–7,000 words per day of reviewable output. These are ranges we've seen when agencies actually instrument their data—not industry benchmarks to copy wholesale.

The goal of measuring output is not to rank your translators or set aggressive daily quotas. It is to understand what your team can reliably deliver, so you can quote accurately and staff projects without guessing.

Why most agencies don't have a reliable baseline

The default state for most small-to-medium agencies is a combination of rough estimates, institutional knowledge, and experience-based guesswork. Project managers develop a feel for which translators work quickly—but that knowledge lives in one person's head and doesn't survive staff turnover.

A second common failure mode is mixing unlike work in the same tracking bucket. If translator A delivered 45,000 words last month and translator B delivered 28,000, you might draw conclusions about relative productivity. But if translator A was running MTPE on technical documents with strong TM coverage, and translator B was doing fresh legal translation with no prior TM, the comparison tells you nothing actionable.

The third problem is that tracking feels like administrative overhead. Adding a logging step to the delivery process is easy to defer when the project is already running late. We've worked with agencies where the only available data was invoice amounts and delivery dates—not enough to reconstruct working hours or per-day throughput for any individual.

None of this is surprising. Most agencies build their workflows around completing work, not measuring how it gets done. That calculus shifts when you start losing bids because your estimates were inaccurate, or when a client questions why a delivered project took as long as it did.

The metrics that matter

Once you commit to tracking, keep the metric set narrow. You don't need a custom dashboard from day one, and adding too many columns is a reliable way to ensure none of them get filled in consistently.

Net new words per working day is your primary throughput metric. This is the count of new (TM-unmatched) source words a translator completes in a working day, tracked by domain. This number will vary significantly between domains—legal and financial text typically produce lower daily throughput than technical documentation or marketing copy, for reasons that have more to do with cognitive load and terminology density than raw typing speed.

MTPE words per working day tracks post-editing output separately. If you don't separate this from full translation output, your averages are contaminated. MTPE productivity depends heavily on the quality of the underlying machine translation and the fit between that engine and your specific domain. Tracking it as a distinct category lets you evaluate whether a given MT engine is actually helping in that domain.

Revision volume per project is not a throughput metric directly, but it acts as a proxy for first-pass quality. A translator who consistently delivers 4,000 words per day but whose output requires heavy revision from your editor is not saving time—they are moving cost downstream.

TM contribution rate tracks how much new, usable translation a given translator adds to the project translation memory. For clients with repeat content, this compounds in value across projects.

Start with the first two: net new words per day and MTPE words per day, separated by domain. Three to six months of consistent data will produce averages you can rely on for planning.

How to set up translator output tracking without adding overhead

The most common mistake is building a system that requires translators to self-report daily output into a separate spreadsheet. Self-reported data has accuracy problems, and asking people to do extra administrative work after completing a project generates friction and incomplete records.

The better approach is pulling data from sources that already exist.

Most CAT tools—Trados, memoQ, Smartcat—produce project statistics that include segment counts by match type. Export those statistics at project close and log them to a running agency record. With a consistent process, this takes two to three minutes per project and can be done by the project manager rather than the translator.

For projects that bypass a CAT tool—direct DOCX files passed for full human translation, or ad hoc jobs outside your standard process—a lightweight submission form at project close works well. Fields needed: domain, word counts by type, approximate working days. The bar for completion has to stay low, or entries pile up and then don't get filled in at all.

A practical setup that works: a shared project log with one row per project-translator combination. Columns include project ID, translator, domain, date range, new words, fuzzy words, exact matches, MTPE words, and a notes field. The CAT tool export populates most columns automatically. The translator fills in working time if the project manager doesn't already track it through another channel.
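
The log described above can be sketched as one record type plus an append function. This assumes the log lives in a CSV file. The field names, the `working_days` column (standing in for "working time"), and the helper `append_row` are illustrative choices, not a prescribed schema.

```python
import csv
from dataclasses import asdict, dataclass, fields
from datetime import date
from pathlib import Path


@dataclass
class LogRow:
    """One row per project-translator combination (field names are illustrative)."""
    project_id: str
    translator: str
    domain: str
    start_date: date
    end_date: date
    new_words: int
    fuzzy_words: int
    exact_matches: int
    mtpe_words: int
    working_days: float  # assumption: working time recorded in days, may be fractional
    notes: str = ""


def append_row(path: str, row: LogRow) -> None:
    """Append a row to the CSV log, writing a header first if the file is new or empty."""
    target = Path(path)
    needs_header = not target.exists() or target.stat().st_size == 0
    with target.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=[f.name for f in fields(LogRow)])
        if needs_header:
            writer.writeheader()
        writer.writerow(asdict(row))
```

A CSV file opens directly in any spreadsheet application, so the same log serves the project manager and any later analysis.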

This doesn't require a new tool. A spreadsheet works fine at the start. The structure matters more than the medium.

Building a baseline from the data you collect

Once you have two to three months of consistent records for a translator working on similar content, you can build a first baseline, and it becomes more reliable as further months of data accumulate. The calculation is simple: total net new words delivered divided by total working days on those projects.

A concrete example: you have a freelance translator working on IT documentation, English to German. Over 90 days, you've logged 12 projects totaling 28,400 net new words. The translator worked on those projects across a combined 31 days, some of which were partial. That gives you a baseline of roughly 916 net new words per working day (28,400 ÷ 31) for this person, in this domain.
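
The arithmetic in the example above, written out as a minimal sketch. The function name `baseline_words_per_day` is an illustrative assumption. The inputs are the totals from the example.

```python
def baseline_words_per_day(total_new_words: int, total_working_days: float) -> float:
    """Baseline = total net new words delivered / total working days on those projects."""
    if total_working_days <= 0:
        raise ValueError("total_working_days must be positive")
    return total_new_words / total_working_days


# Totals from the IT documentation example: 28,400 net new words over 31 working days.
baseline = baseline_words_per_day(28_400, 31)
print(f"{baseline:.0f} net new words per working day")  # about 916
```

Counting partial days as full days, as the example does, pulls the figure down. If you log fractional days instead, pass the fractional total.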

Whether that number is adequate depends on the work. For full human translation on technical content, it sits below the 2,000–2,500 new words per day an experienced technical translator might sustain, partly because some of the 31 logged days were partial. For complex software documentation with inconsistent source writing, it is reasonable. The baseline is not a performance judgment; it's a calibration point for planning.

When you have baselines for five or six of your most active translators, deadline calculation becomes grounded in evidence. A 15,000-word new project that needs to complete in five business days: you know which translators can cover it independently and which need parallel assignment.
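
The deadline check described above can be sketched as follows. The translator names and baseline figures are hypothetical. The parallel estimate assumes the work splits cleanly with no coordination or review overhead, which real projects rarely achieve.

```python
import math


def days_needed(project_words: int, words_per_day: float) -> int:
    """Whole working days required at a given baseline."""
    if words_per_day <= 0:
        raise ValueError("words_per_day must be positive")
    return math.ceil(project_words / words_per_day)


def can_cover_alone(project_words: int, deadline_days: int, baselines: dict) -> list:
    """Translators whose own baseline meets the deadline without help."""
    return sorted(
        name for name, rate in baselines.items()
        if days_needed(project_words, rate) <= deadline_days
    )


def combined_days(project_words: int, rates: list) -> int:
    """Days needed when several translators work in parallel (idealized split)."""
    return days_needed(project_words, sum(rates))


# Hypothetical baselines, in net new words per working day for one domain.
baselines = {"translator_a": 3100, "translator_b": 2200, "translator_c": 916}

solo = can_cover_alone(15_000, 5, baselines)                 # only translator_a
paired = combined_days(15_000, [baselines["translator_b"],   # b and c together
                                baselines["translator_c"]])
```

With these numbers, translator_a finishes alone in five days, and translator_b and translator_c need to be paired to meet the same deadline.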

Baselines also surface things that intuition misses. We worked with one agency that had assumed their fastest translator was consistently their most productive. When they ran the numbers, they found that translator's daily output varied by nearly a factor of three depending on domain. In automotive technical content—their primary area—they were genuinely fast. Outside it, they performed close to average. That finding changed how the agency assigned work.

This only works if the data carries domain tags. A baseline that mixes legal, marketing, and technical output for the same translator is not a reliable number for any of those categories.
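
Keeping baselines separate per domain amounts to grouping log rows by translator and domain before dividing. This is a minimal sketch, assuming each row is a dict with `translator`, `domain`, `new_words`, and `working_days` keys. Those key names and the sample rows are illustrative, not real agency data.

```python
from collections import defaultdict


def baselines_by_domain(rows):
    """Return {(translator, domain): net new words per working day}."""
    words = defaultdict(int)
    days = defaultdict(float)
    for row in rows:
        key = (row["translator"], row["domain"])
        words[key] += row["new_words"]
        days[key] += row["working_days"]
    # Skip groups with no recorded working time rather than dividing by zero.
    return {key: words[key] / days[key] for key in words if days[key] > 0}


# Illustrative rows only.
sample_rows = [
    {"translator": "translator_a", "domain": "automotive", "new_words": 6000, "working_days": 2},
    {"translator": "translator_a", "domain": "automotive", "new_words": 3000, "working_days": 1},
    {"translator": "translator_a", "domain": "legal", "new_words": 2000, "working_days": 2},
]

per_domain = baselines_by_domain(sample_rows)
```

Summing words and days per group before dividing weights each project by its size, which is usually what you want for planning. Averaging per-project rates instead would let small jobs distort the figure.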

What to do with output data once you have it

Data that doesn't change decisions isn't worth collecting. Once you have baselines, the most direct applications are:

Quoting with real capacity numbers. Instead of estimating from instinct or external industry averages, you can quote based on your own team's measured throughput. This reduces both over-promising to clients and over-staffing on individual projects.

Identifying where AI-assisted workflows help. If you run AI pre-translation before passing work to translators—whether through a structured DOCX workflow or a CAT tool pre-translation step—output data tells you which translators see meaningful speed gains from post-editing and which would be faster working from scratch. AI tools are changing how translators structure their working day, but the impact is not uniform across translators or domains. Output tracking gives you evidence instead of assumptions.

Catching source quality problems early. If a translator's output drops on a project type where they usually perform well, the cause is often the source material—poorly written, inconsistent, or structurally unusual input that slows translation at every step. Without a baseline, you have no reference point for that conversation with the client.

Staffing decisions grounded in real capacity. When you know what your current roster can deliver per week by content type, you have a genuine basis for deciding when to add headcount versus when to restructure the workflow.
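
The source-quality check described above comes down to comparing observed throughput on a project against the translator's baseline for that domain. This is a minimal sketch. The 25% tolerance is an arbitrary assumption to tune against your own variance, and `output_dropped` is a hypothetical helper name.

```python
def output_dropped(observed_per_day: float, baseline_per_day: float,
                   tolerance: float = 0.25) -> bool:
    """True when observed throughput falls more than `tolerance` below baseline.

    `tolerance` is a fraction of the baseline; 0.25 is an assumed default,
    not a recommended threshold.
    """
    if baseline_per_day <= 0:
        raise ValueError("baseline_per_day must be positive")
    return observed_per_day < baseline_per_day * (1 - tolerance)


# Against a baseline of 916 words/day, the cutoff at 25% tolerance is 687.
flag_a = output_dropped(600, 916)   # True: worth checking the source material
flag_b = output_dropped(800, 916)   # False: within normal variation
```

A flag is a prompt to look at the source files, not a verdict on the translator.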

For agencies running structured AI translation workflows—uploading DOCX files and passing the outputs to translators for review—tools like SnapIntel provide per-file progress visibility and downloadable QA reports as part of each job, which gives you another data point per project alongside your manual logs.

Common pitfalls to avoid

Tracking without domain context. A words-per-day figure with no domain attached is not actionable. Always tag data with content type and, where relevant, subject matter complexity.

Using output as a proxy for quality. High throughput doesn't imply high quality. Track revision volume alongside output. Without that pairing, you'll make decisions that appear to optimize efficiency while quietly shifting cost into the editing stage.

Comparing across dissimilar work types. A rate of 2,000 new words per day is reasonable for a translator working on legal contracts from scratch. The same rate would indicate a problem for someone doing MTPE on IT strings where the MT engine is performing well. Mixing these categories without flagging them produces comparisons that mislead rather than inform.

Letting baselines go stale. A baseline built on data from 18 months ago may not reflect current conditions if the translator's domain mix, tools, or workload has shifted. Refresh baselines annually, or sooner if you notice consistent variance from projections.
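
The refresh rule in the last pitfall can be sketched as a single check. The function name `needs_refresh`, the 365-day limit, the 20% deviation limit, and the three-project window are all assumptions chosen for illustration. A deviation here means (observed − baseline) / baseline for one project.

```python
from datetime import date


def needs_refresh(last_refreshed: date, today: date, recent_deviations=(),
                  max_age_days: int = 365, deviation_limit: float = 0.2,
                  min_projects: int = 3) -> bool:
    """True if the baseline is older than a year or recent projects
    have all missed projections in the same direction."""
    if (today - last_refreshed).days > max_age_days:
        return True
    recent = list(recent_deviations)[-min_projects:]
    if len(recent) < min_projects:
        return False  # not enough recent projects to call the variance consistent
    all_above = all(d > deviation_limit for d in recent)
    all_below = all(d < -deviation_limit for d in recent)
    return all_above or all_below
```

Requiring every recent project to deviate in the same direction keeps a single unusual job from triggering a rebuild.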

If you're starting from nothing, here's a manageable first step: pick your five most active translators and your three most common project types. Set up a simple shared log and run it for 60 days without changing anything else. By the end, you'll have rough baselines that are more accurate than any external benchmark, because they reflect your actual workflow, your team, and the content you handle. From there, you can expand coverage, refine the categories, and start applying the numbers to quoting and project planning.