Microsoft has introduced two new multi-model AI research systems called Critique and Council for its Copilot Researcher tool. These features pair OpenAI's GPT and Anthropic's Claude to work on the same task, either collaboratively or competitively, to improve factual accuracy and reduce hallucinations. The company states that on the DRACO benchmark, its combined system outperformed leading single-model AIs, scoring 57.4 points compared to the 42.7 achieved by Claude Opus 4.6 alone.
Microsoft announced two new features, Critique and Council, for its Copilot Researcher tool. These systems use both OpenAI's GPT and Anthropic's Claude to tackle complex research tasks, moving beyond reliance on a single AI model. As stated in the announcement, the goal is to separate generation from evaluation to enhance report quality.
In Critique, one model drafts a report and a second model acts as a reviewer, refining it for factual accuracy and citation quality. According to Microsoft's testing, this approach mitigates problems such as hallucinations and weak citations that are common in single-model research. The company reported significant gains on the DRACO benchmark, with Copilot using Critique scoring 57.4 points.
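Microsoft has not published an API for Critique, but the drafter-reviewer pattern it describes can be sketched as a simple two-stage pipeline. Everything below is illustrative: the function names and the stubbed model responses are hypothetical stand-ins, not Copilot's actual interface.

```python
def draft_model(task: str) -> str:
    # Hypothetical stand-in for the drafting model (e.g., GPT),
    # which produces an initial report that may contain weak spots.
    return f"Draft report on {task}. [citation needed]"

def review_model(draft: str) -> str:
    # Hypothetical stand-in for the reviewing model (e.g., Claude),
    # which checks the draft for factual accuracy and citation quality.
    return draft.replace("[citation needed]", "[citation verified]")

def critique_pipeline(task: str) -> str:
    # The key idea from the announcement: generation and evaluation
    # are handled by separate models rather than one model doing both.
    draft = draft_model(task)
    return review_model(draft)

print(critique_pipeline("quantum error correction"))
```

The design point is the separation of concerns: the reviewer never generates content from scratch, it only evaluates and refines what the drafter produced.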
The second feature, Council, runs GPT and Claude simultaneously on the same task and compares their outputs. A third "judge" model then writes a summary explaining where the two AIs agreed or diverged. Whereas the models in Critique collaborate, in Council they effectively compete, which is intended to produce a more comprehensive analysis.
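The Council pattern can likewise be sketched as parallel generation followed by a judging step. Again, this is a minimal illustration with hypothetical function names and canned responses, not Microsoft's implementation.

```python
def gpt_model(task: str) -> str:
    # Hypothetical stand-in for GPT answering independently.
    return f"GPT on {task}: the effect is significant."

def claude_model(task: str) -> str:
    # Hypothetical stand-in for Claude answering the same task.
    return f"Claude on {task}: the effect is significant but context-dependent."

def judge_model(answer_a: str, answer_b: str) -> str:
    # Hypothetical stand-in for the third "judge" model, which summarizes
    # agreement and divergence; here a trivial word-overlap heuristic.
    shared = set(answer_a.split()) & set(answer_b.split())
    return f"Judge summary: the models share {len(shared)} terms; divergences noted above."

def council(task: str) -> str:
    # Both models answer in parallel (sequential here for simplicity),
    # then the judge compares their outputs.
    a, b = gpt_model(task), claude_model(task)
    return "\n".join([a, b, judge_model(a, b)])

print(council("remote work productivity"))
```

In a real system the two model calls would run concurrently and the judge would itself be an LLM; the structure, not the stub logic, is the point.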
Both features are currently available to users enrolled in Microsoft’s Frontier early-access program. A Microsoft 365 Copilot license, costing $30 per user per month, is required to access these capabilities. The release underscores Microsoft’s strategy to focus on the orchestration layer that routes tasks to the best model combination.
