August 15, 2025: Post updated to clarify the token limit for document ingestion.
Today, I am glad to share that Automated Reasoning checks, a new Amazon Bedrock Guardrails policy that we previewed during AWS re:Invent, is now generally available. Automated Reasoning checks help you validate the accuracy of content generated by foundation models (FMs) against your domain knowledge. This can help prevent factual errors due to AI hallucinations. The policy uses mathematical logic and formal verification techniques to validate accuracy, providing definitive rules and parameters against which AI responses are checked.
This approach is fundamentally different from probabilistic reasoning, which deals with uncertainty by assigning probabilities to outcomes. In fact, Automated Reasoning checks deliver up to 99% verification accuracy, providing provable assurance when detecting AI hallucinations, while also helping to surface ambiguity when the model output is open to more than one interpretation.
With general availability, you get the following new capabilities:
- Support for large documents in a single build, up to 122,880 tokens – you can process extensive documentation; we have found this can add up to about 100 pages of content
- Simplified policy validation – Save your validation tests and run them repeatedly, making it easier to maintain and validate your policies over time
- Automated scenario generation – Create test scenarios automatically from your definitions, saving time and effort while helping ensure comprehensive coverage
- Improved policy feedback – Provide suggested policy changes in natural language, simplifying the way you can refine your policies
- Customizable validation settings – Adjust the confidence score threshold to match your specific needs, giving you more control over validation strictness
Let’s see how it works in practice.
Creating an Automated Reasoning policy in Amazon Bedrock Guardrails
To use Automated Reasoning checks, you first encode the rules from your knowledge domain into an Automated Reasoning policy and then validate generated content using that policy. For this scenario, I create a mortgage approval policy that safeguards an AI assistant used to evaluate who can qualify for a mortgage. It is important that the predictions of the AI system do not deviate from the rules and guidelines established for mortgage approval. These rules and guidelines are captured in a policy document written in natural language.
In the Amazon Bedrock console, I choose Automated Reasoning from the navigation pane to create a policy.
I enter the name and description of the policy and upload the PDF of the policy document. The name and description are only metadata and do not contribute to building the Automated Reasoning policy. I describe the source content and add context on how it should be translated into formal logic. For example, I explain how I plan to use this policy in my application, including sample Q&As from the AI assistant.

When the policy is ready, I land on the policy overview page, which displays details about the policy and a summary of tests and definitions. I choose Definitions from the drop-down list to explore the Automated Reasoning policy, made up of the rules, variables, and types that were created to convert the natural language policy into formal logic.
Rules describe how variables relate in the policy and are used when evaluating generated content. In this case, for example, they capture which thresholds should be used and how certain decisions are made. For traceability, each rule has its own unique ID.

Variables represent the key concepts at play in the original natural language document. Each variable is connected to one or more rules. Variables make complex structures easier to understand. For this scenario, some rules need to look at the down payment or the credit score.

Custom types are created for variables that are neither boolean nor numeric, for example, variables that can only assume a limited number of values. In this case, there are two types of mortgages described in the policy: insured and conventional.
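To make the definition structure concrete, here is a hypothetical Python sketch of how rules, variables, and a custom type from a mortgage policy might fit together. The rule IDs, variable names, and thresholds are invented for illustration and are not taken from an actual Automated Reasoning policy:

```python
from dataclasses import dataclass
from enum import Enum

class MortgageType(Enum):
    # A custom type: a variable that is neither boolean nor numeric
    INSURED = "insured"
    CONVENTIONAL = "conventional"

@dataclass
class Application:
    # Variables: the key concepts extracted from the policy document
    credit_score: int
    down_payment_pct: float
    mortgage_type: MortgageType

# Rules: named predicates over the variables, each with a unique ID
# (hypothetical thresholds, not from a real policy document)
RULES = {
    "R1": lambda a: a.credit_score >= 620
                    or a.mortgage_type is MortgageType.INSURED,
    "R2": lambda a: a.mortgage_type is not MortgageType.CONVENTIONAL
                    or a.down_payment_pct >= 0.20,
}

def violated_rules(app: Application) -> list[str]:
    """Return the IDs of rules that this application does not satisfy."""
    return [rule_id for rule_id, rule in RULES.items() if not rule(app)]

app = Application(credit_score=700, down_payment_pct=0.10,
                  mortgage_type=MortgageType.CONVENTIONAL)
print(violated_rules(app))  # R2 fails: conventional mortgage, 10% down payment
```

Structuring the policy this way is what makes findings traceable: a failed check can point back to the specific rule IDs that were contradicted.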

Now I can assess the quality of the initial Automated Reasoning policy through testing. I choose Tests from the drop-down list. Here, I can manually enter a test consisting of an input (optional) and an output, such as a question and a possible answer from a customer interaction with the AI assistant. Then I set the expected result of the Automated Reasoning check. The expected result can be valid (the answer is correct), invalid (the answer is not correct), or satisfiable (the answer can be true or false depending on specific assumptions). I can also assign a confidence threshold for the translation of the input/output pair from natural language to logic.
Before entering tests manually, I use the option to automatically generate scenarios from the definitions. This is the easiest way to validate a policy and, unless you are a logic expert, should be the first step after creating one.
For each generated scenario, I provide the expected validation result, indicating whether it is something that can happen (satisfiable) or not (invalid). If not, I can add an annotation that can be used to update the definitions. For a more advanced understanding of a generated scenario, I can display the formal logical representation of the test in SMT-LIB syntax.
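To give an intuition for what the formal representation is checking: a scenario is satisfiable when some assignment of the policy variables makes both the rules and the scenario true at the same time, and invalid when no assignment does. The following brute-force Python sketch uses an invented rule and scenario; the real service hands the SMT-LIB representation to a solver rather than enumerating assignments:

```python
from itertools import product

# Hypothetical rule: a conventional mortgage requires at least a 20% down payment
rule = lambda mtype, down: mtype != "conventional" or down >= 0.20
# Hypothetical scenario to test: a conventional mortgage with a 10% down payment
scenario = lambda mtype, down: mtype == "conventional" and down == 0.10

# Enumerate a small grid of variable assignments (a solver does this symbolically)
assignments = product(["insured", "conventional"], [0.05, 0.10, 0.20, 0.30])
satisfiable = any(rule(m, d) and scenario(m, d) for m, d in assignments)
print("satisfiable" if satisfiable else "invalid")  # the scenario contradicts the rule
```

Here no assignment can satisfy both the rule and the scenario, so the scenario is invalid.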

After using scenario generation, I enter a few tests by hand. For these tests, I set different expected results: some are valid because they comply with the policy, some are invalid because they go against the policy, and some are satisfiable because their result depends on specific assumptions.

Then I choose Validate all tests to see the results. In this case, all tests pass. From now on, when I update the policy, I can use these tests to verify that the changes have not introduced errors.
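The saved tests effectively form a regression suite for the policy. Here is a minimal sketch of that idea, with a stand-in check function in place of the real Automated Reasoning evaluation (the test contents are invented):

```python
def check(content: str) -> str:
    # Stand-in verifier: the real service returns VALID, INVALID, or SATISFIABLE
    # based on the policy's formal rules, not on keyword matching.
    return "INVALID" if "always qualifies" in content else "VALID"

# Each saved test pairs content with the expected finding
saved_tests = [
    {"content": "A 10% down payment can qualify for an insured mortgage.",
     "expected": "VALID"},
    {"content": "Anyone with a credit score above 750 always qualifies.",
     "expected": "INVALID"},
]

# A policy change passes review only if every expected finding is reproduced
failures = [t for t in saved_tests if check(t["content"]) != t["expected"]]
print(f"{len(saved_tests) - len(failures)}/{len(saved_tests)} tests passed")
```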

For each test, I can look at the findings. If a test does not pass, I can look at the rules that created the contradiction that caused the test to fail against the expected result. Using this information, I can decide whether to add an annotation, improve the policy, or correct the test.

Now that I am satisfied with the tests, I can create a new Amazon Bedrock guardrail (or update an existing one) that uses up to two Automated Reasoning policies to check the validity of the AI assistant's responses. All six policies offered by Amazon Bedrock Guardrails are modular and can be used together or separately. For example, Automated Reasoning checks can be used together with other safeguards such as content filtering and contextual grounding. Guardrails can be applied to models served by Amazon Bedrock or to any third-party model (such as OpenAI and Google Gemini) via the ApplyGuardrail API. I can also use guardrails with an agent framework like Strands Agents, including agents deployed using Amazon Bedrock AgentCore.
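As a sketch of how a response could be validated programmatically, the following builds a request for the Bedrock Runtime ApplyGuardrail API. The guardrail ID, version, and model output are placeholders, and the boto3 call is shown commented out because it requires AWS credentials and a deployed guardrail:

```python
import json

def build_apply_guardrail_request(guardrail_id: str, guardrail_version: str,
                                  model_output: str) -> dict:
    """Build the payload for the Bedrock Runtime ApplyGuardrail API."""
    return {
        "guardrailIdentifier": guardrail_id,
        "guardrailVersion": guardrail_version,
        "source": "OUTPUT",  # validate model output; use "INPUT" for user prompts
        "content": [{"text": {"text": model_output}}],
    }

# Placeholder guardrail ID/version and a sample answer to validate
request = build_apply_guardrail_request(
    "gr-EXAMPLE", "1",
    "Applicants with a credit score above 750 always qualify.")
print(json.dumps(request, indent=2))

# With credentials configured and a real guardrail in place:
# import boto3
# client = boto3.client("bedrock-runtime")
# response = client.apply_guardrail(**request)
# print(response["action"], response["assessments"])
```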

Now that we have seen how to set up a policy, let's look at how Automated Reasoning checks are used in practice.
Customer case study
When the lights go out, every minute counts. That is why companies are turning to AI solutions to improve their outage management systems. We collaborated on a solution in this area together with PwC. Using Automated Reasoning checks, their tools can make utility operations more efficient through:
- Automated protocol generation – Creating standardized procedures that meet regulatory requirements
- Real-time plan validation – Verifying that response plans comply with established policies
- Structured workflow creation – Building severity-based response procedures with defined response objectives
At its core, this solution combines intelligent policy management with optimized response protocols. Automated Reasoning checks are used to assess the AI-generated responses. If a response is found to be invalid or satisfiable, the output of the Automated Reasoning checks is used to rewrite or improve the response.
This approach shows how AI can transform traditional utility operations, making them more efficient, more reliable, and more responsive to customer needs. By combining mathematical rigor with practical requirements, this solution sets a new standard for outage management in the utility sector. The result is faster response times, improved accuracy, and better outcomes for both utilities and their customers.
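A minimal sketch of that check-and-rewrite loop, with stand-in functions for the guardrail check and the model rewrite step; the finding names mirror the valid/invalid/satisfiable results described earlier, and everything else is invented for illustration:

```python
def check_answer(answer: str) -> dict:
    # Stand-in for the Automated Reasoning check; a real call would
    # evaluate the answer against the formal policy rules.
    if "guaranteed" in answer:
        return {"finding": "INVALID",
                "violated_rules": ["R1: approval is never unconditional"]}
    return {"finding": "VALID", "violated_rules": []}

def rewrite(answer: str, violated_rules: list[str]) -> str:
    # A real system would re-prompt the model with the rule feedback;
    # here we just soften the invalid claim.
    return answer.replace("guaranteed", "possible, subject to policy rules")

def respond(draft: str, max_attempts: int = 2) -> str:
    """Return a draft answer only after it passes the policy check,
    rewriting it with the findings when it does not."""
    answer = draft
    for _ in range(max_attempts):
        result = check_answer(answer)
        if result["finding"] == "VALID":
            return answer
        answer = rewrite(answer, result["violated_rules"])
    return answer

print(respond("Approval is guaranteed with a 700 credit score."))
```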
According to Matt Wood, Global and US Commercial Technology and Innovation Officer at PwC:
"At PwC, we help clients move from AI pilots to production with confidence, especially in highly regulated industries where the cost of a misstep is measured in more than dollars. Our collaboration with AWS on Automated Reasoning checks helps deliver solutions that are both innovative and compliant, where trust is not a feature, it is a requirement."
What to know
Automated Reasoning checks in Amazon Bedrock Guardrails are generally available in the following AWS Regions: US East (Ohio, N. Virginia), US West (Oregon), and Europe (Frankfurt, Ireland, Paris).
With Automated Reasoning checks, you pay based on the amount of text processed. For more information, see Amazon Bedrock pricing.
To learn more and build safe and secure AI applications, you can find technical documentation and code samples on GitHub. Follow this link for direct access to Amazon Bedrock in the console.
The videos in this playlist include an introduction to Automated Reasoning checks, a technical deep dive, and practical instructions for creating, testing, and refining policies. In the second video of the playlist, my colleague Wale provides a nice introduction to the capability.
– Danilo