April 30, 2025, 8:20 pm
A recent study alleges that LM Arena—the team behind the popular Chatbot Arena—has manipulated its benchmark evaluations to favor select AI labs. Critics argue this approach undermines the fairness of widely recognized scoring methods, fueling demands for greater transparency and accountability in the process.
You also mentioned the whole Chatbot Arena thing, which I think is interesting and points to the challenge around how you do benchmarking. How do you know what models are good for which things? One of the things we've generally tried to do over the last year is anchor more of our models in our Meta…
A new paper from AI lab Cohere, Stanford, MIT, and Ai2 accuses LM Arena, the organization behind the popular crowdsourced AI benchmark Chatbot Arena, of helping a select group of AI companies achieve better leaderboard scores at the expense of rivals. According to the authors, LM Arena allowed some…
The Chatbot Arena has become the go-to place for vibes-based evaluation of LLMs over the past two years. The project, originating at UC Berkeley, is home to a large community of model enthusiasts who submit prompts to two randomly selected anonymous models and pick their favorite response. This…
Speaking at high-profile tech events in Washington, D.C., Nvidia CEO Jensen Huang cautioned that the U.S.-China AI race is closer than many presume. He said China is not trailing behind, and he hinted at significant trade ramifications and job market booms amid relentless innovation.
Microsoft's Q3 report beat expectations with a 13% revenue increase and accelerating cloud growth, sending its shares higher. The results reaffirm Microsoft's leadership in cloud services in a competitive market.
Meta's quarterly results combined strong revenue growth with a significant setback in its Reality Labs division. Sales of $42 billion drew cheers, but the $4.2 billion loss from Reality Labs tempered an otherwise promising performance.
In a dramatic courtroom session, Google's Sundar Pichai lambasted the DOJ's proposal to break up the tech giant, warning that such drastic measures could effectively dismantle Google Search. His testimony highlighted the precarious balance between regulatory intervention and sustaining innovative market leadership.
Judges have repeatedly criticized Apple's restrictive App Store policies, citing antitrust violations and even executive dishonesty. The decisions have paved the way for Fortnite's impending return to iOS, forcing Apple to reexamine its ironclad payment rules.
Google Gemini Chatbot Update Introduces Image Editing Feature (9 hours ago)
Duolingo Expands AI Language Courses, Cutting Contractor Roles (15 hours ago)
Meta’s WhatsApp Rolls Out Private AI Chat Feature (16 hours ago)
Nvidia CEO Warns Amid U.S.-China AI Rivalry Escalations (15 hours ago)