TÜV to certify AI: Why testing superior systems hits its limits

The EU aims to exert greater control over artificial intelligence through the AI Act, while TÜV prepares testing procedures for high-risk systems. The new legal framework is set to mandate testing, documentation, and proof of compliance for providers. Companies will be required to certify AI systems that impact areas such as healthcare, transport, public administration, critical infrastructure, or fundamental rights. However, the fundamental conflict runs deeper: when AI surpasses human capabilities, human-led testing procedures can offer only limited assurance of safety.

Certifying AI involves more than just testing an ordinary machine

TÜV is well-versed in the testing of technical systems. When dealing with cables, pressure vessels, or vehicle components, experts measure material properties, loads, and failure limits. However, this principle applies only partially to AI. Unlike a mechanical component, a model does not simply operate; instead, it generates results based on data, parameters, and probabilities.

This is precisely why a new problem of oversight arises. Auditors can define test cases and evaluate outputs, but they cannot fully grasp every internal weighting mechanism. Furthermore, it remains unclear whether a system will behave the same way in real-world operation as it did during the testing process.

High-risk AI affects sensitive areas

The EU classifies AI systems as high-risk if they significantly impact health, safety, or fundamental rights. This category includes applications for medical diagnoses, autonomous vehicles, personnel decisions, and credit assessments. Critical infrastructure also falls under this classification, whereas simple chatbots are generally subject to less stringent requirements.

Consequently, providers face extensive documentation obligations. They must document data sources, operational limits, logs, and human interventions. They are also required to demonstrate how they mitigate errors, biases, and potential attacks. While these requirements enhance transparency, they do not substitute for comprehensive oversight.

Human standards encounter technical limits

A key objection concerns the logic of the testing process itself. Humans evaluate AI using human standards, whereas highly capable systems may develop entirely different strategies. Passing a test therefore merely indicates that the AI has successfully navigated known scenarios; it does not prove that the system will act safely in novel situations.
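The gap between passing a test and behaving safely can be made concrete with a toy sketch. Everything in it is an assumption made for illustration: the pressure values, the function names `true_rule` and `model`, and the two case lists are hypothetical and do not come from any real testing procedure.

```python
from typing import Callable, Sequence


def true_rule(pressure: float) -> bool:
    """Hypothetical ground truth: safe only between 10 and 100 units."""
    return 10 <= pressure < 100


def model(pressure: float) -> bool:
    """Hypothetical system under test: it learned only the upper limit."""
    return pressure < 100


def pass_rate(
    system: Callable[[float], bool],
    truth: Callable[[float], bool],
    cases: Sequence[float],
) -> float:
    """Share of cases where the system agrees with the ground truth."""
    return sum(system(c) == truth(c) for c in cases) / len(cases)


# Scenarios the auditors thought of (assumed for this sketch).
known_cases = [50, 80, 99, 100, 120, 150]
# Situations nobody put into the test catalogue.
novel_cases = [0, 5, 9]

known_score = pass_rate(model, true_rule, known_cases)
novel_score = pass_rate(model, true_rule, novel_cases)
print(f"known scenarios: {known_score:.0%}, novel scenarios: {novel_score:.0%}")
```

The toy system passes every known scenario and fails every novel one. The test result was accurate, but only for the range the test catalogue covered.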

The challenge is compounded for systems that learn or undergo frequent updates. A model might yield different results following an update. For this reason, high-risk AI cannot simply be certified once and deemed permanently safe; continuous monitoring becomes more critical than a one-time seal of approval.
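What continuous monitoring could look like in its simplest form is sketched below: the same reference inputs are run through the old and the new version of a model, and the share of changed decisions is compared against a threshold. The two toy credit models, the reference inputs, and the 5 percent threshold are all hypothetical choices for this sketch, not requirements taken from the AI Act or from TÜV.

```python
from typing import Callable, Sequence


def disagreement_rate(
    old_model: Callable[[float], str],
    new_model: Callable[[float], str],
    inputs: Sequence[float],
) -> float:
    """Share of reference inputs on which the two versions decide differently."""
    if not inputs:
        raise ValueError("inputs must not be empty")
    changed = sum(1 for x in inputs if old_model(x) != new_model(x))
    return changed / len(inputs)


def needs_review(
    old_model: Callable[[float], str],
    new_model: Callable[[float], str],
    inputs: Sequence[float],
    threshold: float = 0.05,  # assumed tolerance, chosen only for illustration
) -> bool:
    """Flag an update for re-examination if too many decisions changed."""
    return disagreement_rate(old_model, new_model, inputs) > threshold


# Toy stand-ins for two versions of a credit-assessment model (hypothetical).
def model_v1(income: float) -> str:
    return "approve" if income >= 30_000 else "reject"


def model_v2(income: float) -> str:
    return "approve" if income >= 35_000 else "reject"


reference_inputs = [20_000, 28_000, 31_000, 34_000, 36_000, 50_000]
rate = disagreement_rate(model_v1, model_v2, reference_inputs)
flagged = needs_review(model_v1, model_v2, reference_inputs)
print(f"changed decisions: {rate:.0%}, review needed: {flagged}")
```

In this sketch the update changes two of six decisions, so it would be flagged. A check of this kind only detects that behavior has changed on the chosen inputs; it does not show whether the new behavior is safer.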

EU deadlines delay the real-world test

The AI Act entered into force on August 1, 2024. However, many obligations are being phased in, as the necessary standards, guidelines, and testing methodologies are still under development. The EU Commission has set a deadline of December 2, 2027, for certain high-risk applications. For AI integrated into products such as robots or industrial machinery, August 2, 2028, serves as a key milestone.

These deadlines give companies more time. But they also prolong a delicate transition phase, because technology, regulation, and practical testing methods are not evolving at the same pace. As a result, it remains unclear how robust the market-entry checks will actually prove to be.

A certification seal must not become a false promise of safety

Certification can reveal vulnerabilities. It allows for the assessment of training data, documentation, defenses against attacks, and failure thresholds. Furthermore, it compels providers to systematically disclose risks prior to deployment. Yet, it cannot guarantee that an advanced AI will remain controllable in every situation.

For patients, job applicants, consumers, and operators of critical infrastructure, a label alone is therefore not enough. What matters is who bears liability, who can intervene, and who has the authority to halt operations. To keep the seal from becoming an empty symbol of trust, a system would need to be recertified after major changes. Europe thus needs ongoing oversight, clear rules for system shutdowns, and transparent incident reporting.

Author: Blackout News
Sources: Zeit (16.06.26) – European Commission (as of 20.06.26) – TÜV Verband (as of 20.06.26) – Das Parlament (12.06.26)