Translation error categories explained: accuracy, fluency, terminology, style
The main translation error categories — accuracy, fluency, terminology, style — with severity levels and practical QA guidance for agencies.

When a translation comes back with problems, the first instinct is often to mark it broadly as "poor quality" and send it back for revision. That works once. What doesn't scale is a QA process built on general impressions rather than defined categories. Translation error categories give reviewers and translators a shared language for describing problems precisely — not just flagging them. In our experience, the difference between a QA report that drives improvement and one that creates friction usually comes down to whether reviewers are working from a consistent framework or from whatever felt wrong to them on the day.
Why error categories matter more than error counts
It's tempting to reduce translation quality to a single metric: errors per thousand words, or what percentage of segments needed revision. These numbers have legitimate uses — they're useful for comparing outputs from different translation engines, or tracking a linguist's consistency over time — but they compress too much information to guide practical decisions.
Knowing that a file contains 34 errors tells you less than knowing that 28 of them are fluency issues and 6 are accuracy errors. The first number suggests a light edit might be sufficient; the second tells you there's a content reliability problem that no amount of style editing will fix.
Error categories also create accountability. A QA report that says "translation was poor quality" is difficult to act on. A QA report that says "12 accuracy errors — 3 critical (changed meaning of contractual clause), 9 minor (omitted modifier)" gives the translator something specific to address and the project manager something specific to assess.
The MQM (Multidimensional Quality Metrics) framework, developed with substantial academic and industry input, provides a detailed taxonomy of translation error types. For most agency workflows, the full MQM taxonomy offers more granularity than you need day-to-day. But the major categories it's organized around are worth understanding and applying.
Accuracy errors: when the meaning changes
Accuracy errors are the most serious category in translation QA. An accuracy error occurs when the translation changes, omits, or adds meaning relative to the source text — regardless of how fluent the result sounds in the target language.
The main subtypes include mistranslation (the source term or phrase is rendered with an incorrect meaning), omission (content present in the source is absent from the target), addition (content appears in the target that has no basis in the source), and untranslated text (source-language content that wasn't translated).
In practice, the most common accuracy errors in AI-assisted translation involve false friends (words that look similar across languages but carry different meanings), domain-specific terms where the general meaning differs from the technical meaning in context, and segment-by-segment translation that ignores how a phrase connects to the surrounding text.
A concrete example: in a financial document, the English phrase "to account for" can mean "to record in accounting" or "to consider/allow for." An AI translation model choosing the accounting interpretation when the text means "to allow for" produces an accuracy error. The translated sentence may read smoothly in the target language, but it says something different from the source — and a fluency review alone won't catch it.
Accuracy errors have to be identified in content review, not just fluency editing. QA workflows that ask reviewers only to make the AI output "sound more natural" miss this category entirely.
Fluency errors: when the text doesn't read like the target language
Fluency errors don't change the meaning — the translation is conveying the right content, but it's doing so in a way that sounds wrong to a native speaker. Subtypes include grammatical errors, spelling mistakes, punctuation problems, register mismatches (formal source rendered in informal target, or vice versa), and awkward phrasing that a native speaker wouldn't produce.
Fluency issues are often the most visible errors and the easiest for reviewers to notice, which is probably why they receive the most attention in translation QA. But visibility doesn't equal severity. A translation that reads beautifully but contains an accuracy error is more problematic for most use cases than one that reads slightly awkwardly but conveys the correct information.
Neural MT systems have generally improved faster on fluency than on accuracy. Modern AI translation output often reads naturally in high-resource language pairs. These fluency gains have made AI output appear better than it sometimes is — a translated document can pass a casual reading without triggering any reaction while containing accuracy errors that only a bilingual subject-matter reviewer would catch.
In a well-designed QA process, fluency review and accuracy review are treated as distinct steps, often handled by different reviewers. A native speaker of the target language who isn't a domain expert can identify grammatical and phrasing issues reliably. Accuracy review requires bilingual competence and domain-specific knowledge. Using the same reviewer for both steps, or treating them as a single task, typically means one of them gets done poorly.
Terminology errors: the consistency problem
Terminology errors occupy their own category because they affect consistency across a document or project, not just correctness at the level of a single segment. A terminology error occurs when a defined term isn't rendered with its approved equivalent, or when the same term is translated differently in different places across a document.
This matters most in specialized domains and in any project that has a glossary. A legal document that uses two different target-language terms for the same concept creates ambiguity about whether the same thing is being referred to. A technical manual that uses three different names for the same component makes procedures harder to follow accurately.
The practical tool for managing terminology errors is a project glossary. In a CAT tool workflow, glossaries are associated with projects and surface approved translations when a defined term appears in the source. When the glossary isn't set up, or when a translator ignores the suggestion, terminology inconsistency follows.
For QA reviewers, identifying terminology errors requires access to the approved glossary. You can't flag a terminology violation without knowing what the approved terms are. This is obvious in principle, but many QA workflows send files to reviewers without the glossary attached, which makes this error category effectively unauditable in practice.
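This kind of check also lends itself to simple automation once the glossary exists in a machine-readable form. The sketch below is a minimal illustration in Python, assuming a glossary as a plain source-to-approved-target mapping and bilingual segments as text pairs; these structures are hypothetical and not tied to any particular CAT tool's export format. It flags segments where a glossary source term appears but the approved target term does not.

```python
# Minimal terminology consistency check (a sketch, not any CAT tool's format).
# Assumes: glossary maps each source term to its single approved target term,
# and segments are (source_text, target_text) pairs extracted beforehand.

def find_terminology_issues(glossary: dict[str, str],
                            segments: list[tuple[str, str]]) -> list[dict]:
    issues = []
    for idx, (source, target) in enumerate(segments, start=1):
        for src_term, approved in glossary.items():
            # Case-insensitive substring matching keeps the sketch simple; a real
            # check would need tokenization, lemmatization, and inflection handling.
            if src_term.lower() in source.lower() and approved.lower() not in target.lower():
                issues.append({
                    "segment": idx,
                    "source_term": src_term,
                    "expected": approved,
                    "category": "terminology",
                })
    return issues

glossary = {"purchase order": "Bestellung"}          # hypothetical approved term
segments = [("Send the purchase order by Friday.",   # hypothetical bilingual pair
             "Senden Sie den Auftrag bis Freitag.")]
print(find_terminology_issues(glossary, segments))
# [{'segment': 1, 'source_term': 'purchase order', 'expected': 'Bestellung', 'category': 'terminology'}]
```

Even a check this crude catches the most common failure mode: the approved term simply never appearing in the target where the source term does.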
Style errors: the hardest category to make consistent
Style errors are real, not just differences of preference — but they're the hardest category to make consistent across reviewers, because they require a shared reference. A style guide defines what "correct style" means for a specific client or brand: sentence length, formality register, use of active versus passive constructions, how numbers are expressed, capitalization conventions, and so on.
Without a style guide, what one reviewer marks as a style error another might accept as a reasonable translation choice. This leads to inconsistencies in QA reports and friction with translators who feel they're being penalized for decisions that the client's documented preferences don't actually prohibit.
We've seen agencies improve style QA significantly by doing one thing: writing a brief style guide for each major client, even if it's just a page. A style guide that specifies the register, a handful of key preferences (active voice where possible, numbers below 10 spelled out, formal address form), and any client-specific conventions gives reviewers a shared standard. Style errors become identifiable — and therefore actionable — rather than subjective.
Severity levels: not all errors are equal
Error categories tell you what type of problem you're dealing with. Severity levels tell you how much it matters for this particular project and this particular use case. Most QA frameworks use a three-level classification:
Critical: errors that change the meaning in a way that could mislead a reader, create legal or safety risk, or make the content unusable. A mistranslated dosage in a pharmaceutical document. An inaccurate term in a contract clause that changes its scope. Critical errors require correction before delivery, without exception.
Major: errors that significantly affect quality but don't create immediate risk. A missed sentence in a product description. An inconsistent term across a large document. Major errors typically require correction before delivery, though some workflows handle them as revision priority rather than as a delivery block.
Minor: errors that affect polish but not substance. A slightly awkward phrase that a native speaker wouldn't have chosen. A punctuation preference. Minor errors can be documented in QA reports for future improvement without blocking delivery.
Applying severity to error categories produces QA reports that are actually useful. An accuracy error at minor severity might be a borderline term choice where both options are defensible. An accuracy error at critical severity might be a factual reversal. The category tells you what happened; the severity tells you what to do about it.
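To make that logic concrete, here is a minimal sketch of how the three-level classification above could be encoded as a triage rule. The category and severity labels follow this article; the action wording is illustrative, not an industry standard.

```python
# Sketch: mapping severity to a workflow action, following the three-level
# classification described above. Action strings are illustrative only.

SEVERITY_ACTIONS = {
    "critical": "block delivery until corrected",
    "major":    "correct before delivery, or queue as top revision priority",
    "minor":    "document in the QA report; no delivery block",
}

def triage(category: str, severity: str) -> str:
    action = SEVERITY_ACTIONS.get(severity, "review manually")
    return f"{category}/{severity}: {action}"

print(triage("accuracy", "critical"))  # accuracy/critical: block delivery until corrected
print(triage("fluency", "minor"))      # fluency/minor: document in the QA report; no delivery block
```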
Building a QA workflow around these categories
The goal isn't an exhaustive taxonomy that takes longer to fill out than the translation took to produce. The goal is enough structure that QA reports drive process improvement rather than just generating documentation.
For most agencies, a workable QA process involves four things: a defined set of error categories (accuracy, fluency, terminology, style covers most of what matters), a severity scale (critical, major, minor), a reference glossary available to reviewers for any project with defined terminology, and a style guide for clients where style review is part of the scope.
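One way to keep those four elements consistent in practice is to log every finding in the same minimal structure, whatever tool you happen to use. The dataclass below is a hypothetical sketch of what such a record could contain; the field names are ours, not a standard schema.

```python
from dataclasses import dataclass

# Hypothetical QA issue record; field names are illustrative, not a standard schema.
@dataclass
class QAIssue:
    project: str
    segment: int
    category: str        # "accuracy" | "fluency" | "terminology" | "style"
    severity: str        # "critical" | "major" | "minor"
    note: str = ""       # what was wrong, e.g. "omitted modifier in clause 4.2"

issue = QAIssue(project="ACME-2024-07", segment=112,
                category="accuracy", severity="critical",
                note="contractual scope reversed: 'shall' rendered as 'may'")
```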
Error data collected consistently over time reveals things that single-project reviews can't show. Whether a specific linguist consistently produces terminology errors. Whether a particular language pair generates accuracy issues at a higher rate from AI pre-translation than it saves in translation time. Whether post-editing rates are calibrated to the actual effort the content requires. This information only exists if you're tracking by category — a total error count doesn't tell you any of it.
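If issues are logged consistently, for example as records like the one sketched above, the pattern analysis itself is a few lines of code. The snippet below is a sketch; the input format is an assumption, and you would adapt it to however your QA reports are actually stored.

```python
from collections import Counter

# Sketch: tally logged issues by category and by (category, severity) pair.
logged_issues = [
    ("fluency", "minor"), ("fluency", "minor"), ("fluency", "major"),
    ("accuracy", "critical"), ("accuracy", "minor"),
    ("terminology", "major"),
]

by_category = Counter(category for category, _ in logged_issues)
by_pair = Counter(logged_issues)

print(by_category.most_common())
# [('fluency', 3), ('accuracy', 2), ('terminology', 1)]
print(by_pair.most_common(1))
# [(('fluency', 'minor'), 2)]
```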
If you work with AI-translated Smartcat bilingual DOCX files and want QA built into the translation output rather than handled as a separate step, SnapIntel includes a quality rating and QA report as part of its workflow — which means error visibility is part of what you download rather than something you have to generate separately.
For a broader look at how QA fits into different agency workflows and project types, our complete guide to translation quality assurance covers the process from pre-delivery checks through client feedback handling.
Actionable takeaway: Take your next QA report and categorize every issue by type (accuracy, fluency, terminology, style) and severity (critical, major, minor). Do this for three consecutive projects. At the end of the third, look at the pattern — which error type appears most often, and at what severity? That pattern tells you where to focus your process improvement effort, whether that's glossary setup, pre-translation quality, reviewer training, or something else entirely.