AI benchmarking organization faces criticism for delaying disclosure of OpenAI funding

An organization creating mathematical benchmarks for AI has come under scrutiny for failing to disclose funding from OpenAI until recently, sparking accusations of impropriety within the AI community.
Epoch AI, a nonprofit primarily funded by Open Philanthropy, revealed on December 20 that OpenAI supported the development of FrontierMath—a benchmark designed to test an AI’s mathematical skills using expert-level problems. FrontierMath was later used by OpenAI to showcase its upcoming flagship AI model, o3.
A contractor for Epoch AI, using the pseudonym “Meemi” on the LessWrong forum, stated that many contributors to FrontierMath were unaware of OpenAI’s involvement until it became public. Meemi criticized the lack of transparency, arguing that contributors deserved to know about OpenAI’s funding and the potential use of their work for enhancing AI capabilities before deciding to participate.
On social media, concerns emerged about the potential impact of this secrecy on FrontierMath’s credibility as an objective benchmark. Critics pointed out that OpenAI not only funded FrontierMath but also had access to many of the problems and solutions in the benchmark—a detail Epoch AI disclosed only after announcing o3.
Carina Hong, a Stanford PhD student in mathematics, echoed these concerns in a post on X, claiming that OpenAI’s exclusive access to FrontierMath upset some contributors. Hong reported that six mathematicians involved in FrontierMath were unaware of OpenAI’s privileged position and might have reconsidered their participation had they known.
In response to the criticism, Tamay Besiroglu, associate director and co-founder of Epoch AI, acknowledged the organization’s failure to be more transparent. He explained that contractual obligations with OpenAI restricted them from disclosing the partnership earlier but admitted they should have negotiated for greater openness with contributors.
“Our mathematicians deserved to know who might have access to their work,” Besiroglu wrote, adding that transparency should have been a non-negotiable condition of their agreement with OpenAI. He also clarified that OpenAI had verbally agreed not to use FrontierMath’s problems for training its AI models and that a separate holdout set was maintained for independent verification.
Despite this, Epoch AI’s lead mathematician, Elliot Glazer, admitted on Reddit that the organization has not yet independently verified OpenAI’s reported results for FrontierMath. While Glazer expressed personal confidence in the legitimacy of OpenAI’s scores, he noted that independent evaluations were still pending.
The situation highlights the difficulty of funding AI benchmark development without raising conflict-of-interest concerns. As AI capabilities advance, maintaining transparency and trust in benchmarking processes will remain critical.