
Inside OpenAI’s quest to make AI do anything for you



Shortly after Hunter Lightman joined OpenAI as a researcher in 2022, he watched his colleagues launch ChatGPT, one of the fastest-growing products ever. Meanwhile, Lightman quietly worked on a team teaching OpenAI’s models to solve high school math competitions. 

Today that team, known as MathGen, is considered instrumental to OpenAI’s industry-leading effort to create AI reasoning models: the core technology behind AI agents that can do tasks on a computer like a human would.

“We were trying to make the models better at mathematical reasoning, which at the time they weren’t very good at,” Lightman told TechCrunch, describing MathGen’s early work.

OpenAI’s models are far from perfect today — the company’s latest AI systems still hallucinate and its agents struggle with complex tasks.

But its state-of-the-art models have improved significantly on mathematical reasoning. One of OpenAI’s models recently won a gold medal at the International Math Olympiad, a math competition for the world’s brightest high school students. OpenAI believes these reasoning capabilities will translate to other subjects, and ultimately power general-purpose agents that the company has always dreamed of building.

ChatGPT was a happy accident — a low-key research preview turned viral consumer business — but OpenAI’s agents are the product of a years-long, deliberate effort within the company. 

“Eventually, you’ll just ask the computer for what you need and it’ll do all of these tasks for you,” said OpenAI CEO Sam Altman at the company’s first developer conference in 2023. “These capabilities are often talked about in the AI field as agents. The upsides of this are going to be tremendous.”


OpenAI CEO Sam Altman speaks during the OpenAI DevDay event on November 6, 2023 in San Francisco, California. (Photo by Justin Sullivan / Getty Images)

Whether agents will meet Altman’s vision remains to be seen, but OpenAI shocked the world with the release of its first AI reasoning model, o1, in the fall of 2024. Less than a year later, the 21 foundational researchers behind that breakthrough are the most highly sought-after talent in Silicon Valley.

Mark Zuckerberg recruited five of the o1 researchers to work on Meta’s new superintelligence-focused unit, offering some compensation packages north of $100 million. One of them, Shengjia Zhao, was recently named chief scientist of Meta Superintelligence Labs.

The reinforcement learning renaissance

The rise of OpenAI’s reasoning models and agents is tied to a machine learning training technique known as reinforcement learning (RL). RL provides feedback to an AI model on whether its choices were correct or not in simulated environments.
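In caricature, the RL loop is just: act, get scored, adjust. Here is a toy sketch of that loop (the one-state "environment," the action names, and the update rule are all invented for illustration; real RL systems use far richer environments and learned policies):

```python
import random

random.seed(0)

def reward(action: str) -> float:
    # Simulated environment: full reward for the correct action, none otherwise.
    return 1.0 if action == "correct" else 0.0

# Toy "policy": one preference score per action, nudged by reward feedback.
prefs = {"correct": 0.0, "wrong": 0.0}
learning_rate = 0.1

for _ in range(1000):
    # Sample an action in proportion to current preference (exponentiated weights).
    actions = list(prefs)
    weights = [2.718281828 ** prefs[a] for a in actions]
    action = random.choices(actions, weights=weights)[0]
    # Reinforce rewarded actions, discourage unrewarded ones.
    prefs[action] += learning_rate * (reward(action) - 0.5)

assert prefs["correct"] > prefs["wrong"]
```

After many iterations the policy drifts toward the rewarded action — the same feedback principle, at vastly larger scale, that underpins modern RL training.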

RL has been used for decades. In 2016, about a year after OpenAI was founded, AlphaGo — an AI system created by Google DeepMind using RL — gained global attention after beating a world champion at the board game Go.

South Korean professional Go player Lee Se-dol prepares for his fourth match against Google DeepMind’s AlphaGo program during the Google DeepMind Challenge Match on March 13, 2016 in Seoul, South Korea. (Photo by Google via Getty Images)

Around that time, one of OpenAI’s first employees, Andrej Karpathy, began pondering how to leverage RL to create an AI agent that could use a computer. But it would take years for OpenAI to develop the necessary models and training techniques.

By 2018, OpenAI had pioneered its first large language model in the GPT series, pretrained on massive amounts of internet data using large clusters of GPUs. GPT models excelled at text processing, eventually leading to ChatGPT, but struggled with basic math. 

It took until 2023 for OpenAI to achieve a breakthrough, initially dubbed “Q*” and then “Strawberry,” by combining LLMs, RL, and a technique called test-time computation. The latter gave the models extra time and computing power to plan and work through problems, verifying their steps, before providing an answer.

This allowed OpenAI to introduce a new approach called “chain-of-thought” (CoT), which improved AI’s performance on math questions the models hadn’t seen before.

“I could see the model starting to reason,” said Ahmed El Kishky, an OpenAI researcher. “It would notice mistakes and backtrack, it would get frustrated. It really felt like reading the thoughts of a person.” 

Though individually these techniques weren’t novel, OpenAI uniquely combined them to create Strawberry, which directly led to the development of o1. OpenAI quickly identified that the planning and fact checking abilities of AI reasoning models could be useful to power AI agents.

“We had solved a problem that I had been banging my head against for a couple of years,” said Lightman. “It was one of the most exciting moments of my research career.”

Scaling reasoning

With AI reasoning models, OpenAI determined it had two new axes that would allow it to improve AI models: using more computational power during the post-training of AI models, and giving AI models more time and processing power while answering a question.

“OpenAI, as a company, thinks a lot about not just the way things are, but the way things are going to scale,” said Lightman.

Shortly after the 2023 Strawberry breakthrough, OpenAI spun up an “Agents” team led by OpenAI researcher Daniel Selsam to make further progress on this new paradigm, two sources told TechCrunch. Although the team was called “Agents,”  OpenAI didn’t initially differentiate between reasoning models and agents as we think of them today. The company just wanted to make AI systems capable of completing complex tasks.

Eventually, the work of Selsam’s Agents team became part of a larger project to develop the o1 reasoning model, with leaders including OpenAI co-founder Ilya Sutskever, chief research officer Mark Chen, and chief scientist Jakub Pachocki.

Ilya Sutskever, co-founder and then-chief scientist of OpenAI, speaks at Tel Aviv University in Tel Aviv on June 5, 2023. (Photo by Jack Guez / AFP via Getty Images)

OpenAI would have to divert precious resources — mainly talent and GPUs — to create o1. Throughout OpenAI’s history, researchers have had to negotiate with company leaders to obtain resources; demonstrating breakthroughs was a surefire way to secure them.

“One of the core components of OpenAI is that everything in research is bottom up,” said Lightman. “When we showed the evidence [for o1], the company was like, ‘This makes sense, let’s push on it.’”

Some former employees say that the startup’s mission to develop AGI was the key factor in achieving breakthroughs around AI reasoning models. By focusing on developing the smartest-possible AI models, rather than products, OpenAI was able to prioritize o1 above other efforts. That type of large investment in ideas wasn’t always possible at competing AI labs.

The decision to try new training methods proved prescient. By late 2024, several leading AI labs started seeing diminishing returns on models created through traditional pretraining scaling. Today, much of the AI field’s momentum comes from advances in reasoning models.

What does it mean for an AI to “reason?”

In many ways, the goal of AI research is to recreate human intelligence with computers. Since the launch of o1, ChatGPT’s UX has been filled with more human-sounding features such as “thinking” and “reasoning.”

When asked whether OpenAI’s models were truly reasoning, El Kishky hedged, saying he thinks about the concept in terms of computer science.

“We’re teaching the model how to efficiently expend compute to get an answer. So if you define it that way, yes, it is reasoning,” said El Kishky.

Lightman takes the approach of focusing on the model’s results and not as much on the means or their relation to human brains.

The OpenAI logo on screen at the company’s developer day stage. (Image credit: Devin Coldewey)

“If the model is doing hard things, then it is doing whatever necessary approximation of reasoning it needs in order to do that,” said Lightman. “We can call it reasoning, because it looks like these reasoning traces, but it’s all just a proxy for trying to make AI tools that are really powerful and useful to a lot of people.”

OpenAI’s researchers note people may disagree with their nomenclature or definitions of reasoning — and surely, critics have emerged — but they argue it’s less important than the capabilities of their models. Other AI researchers tend to agree.

Nathan Lambert, an AI researcher with the non-profit AI2, compares AI reasoning models to airplanes in a blog post. Both, he says, are manmade systems inspired by nature — human reasoning and bird flight, respectively — but they operate through entirely different mechanisms. That doesn’t make them any less useful, or any less capable of achieving similar outcomes.

A group of AI researchers from OpenAI, Anthropic, and Google DeepMind agreed in a recent position paper that AI reasoning models are not well understood today, and more research is needed. It may be too early to confidently claim what exactly is going on inside them.

The next frontier: AI agents for subjective tasks

The AI agents on the market today work best for well-defined, verifiable domains such as coding. OpenAI’s Codex agent aims to help software engineers offload simple coding tasks. Meanwhile, Anthropic’s models have become particularly popular in AI coding tools like Cursor and Claude Code — these are some of the first AI agents that people are willing to pay up for.

However, general purpose AI agents like OpenAI’s ChatGPT Agent and Perplexity’s Comet struggle with many of the complex, subjective tasks people want to automate. When trying to use these tools for online shopping or finding a long-term parking spot, I’ve found the agents take longer than I’d like and make silly mistakes.

Agents are, of course, early systems that will undoubtedly improve. But researchers must first figure out how to better train the underlying models to complete tasks that are more subjective.


“Like many problems in machine learning, it’s a data problem,” said Lightman, when asked about the limitations of agents on subjective tasks. “Some of the research I’m really excited about right now is figuring out how to train on less verifiable tasks. We have some leads on how to do these things.” 

Noam Brown, an OpenAI researcher who helped create the IMO model and o1, told TechCrunch that OpenAI has new general-purpose RL techniques that allow it to teach AI models skills that aren’t easily verified. This was how the company built the model that achieved a gold medal at IMO, he said.

OpenAI’s IMO model was a newer AI system that spawns multiple agents, which simultaneously explore several ideas before the system chooses the best possible answer. These types of AI models are becoming more popular; Google and xAI have recently released state-of-the-art models using this technique.
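Sketched at a toy scale, that "spawn several explorers, keep the best" pattern looks like the following (the deterministic exploration rule, scoring function, and all names here are invented; real systems sample from a model and rank candidates with learned verifiers):

```python
from concurrent.futures import ThreadPoolExecutor

def explore(seed: int, target: int) -> int:
    # Each "agent" explores a different region (deterministic stand-in for sampling).
    return target + (seed * 3) % 7 - 3

def score(candidate: int, target: int) -> float:
    # A verifier ranks candidates; here, simply closeness to the true answer.
    return -abs(candidate - target)

def best_of_n(target: int, n: int = 8) -> int:
    # Spawn n explorers in parallel, then keep the highest-scoring candidate.
    with ThreadPoolExecutor(max_workers=n) as pool:
        candidates = list(pool.map(lambda s: explore(s, target), range(n)))
    return max(candidates, key=lambda c: score(c, target))

assert best_of_n(42) == 42  # the seed-1 explorer lands exactly on the target
```

The appeal of the design is that exploration parallelizes cleanly: more agents buy broader coverage of the answer space without making any single agent smarter.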

“I think these models will become more capable at math, and I think they’ll get more capable in other reasoning areas as well,” said Brown. “The progress has been incredibly fast. I don’t see any reason to think it will slow down.”

These techniques may help OpenAI’s models become more performant, gains that could show up in the company’s upcoming GPT-5 model. OpenAI hopes to assert its dominance over competitors with the launch of GPT-5, ideally offering the best AI model to power agents for developers and consumers.

But the company also wants to make its products simpler to use. El Kishky says OpenAI wants to develop AI agents that intuitively understand what users want, without requiring them to select specific settings. He says OpenAI aims to build AI systems that understand when to call up certain tools, and how long to reason for.

These ideas paint a picture of an ultimate version of ChatGPT: an agent that can do anything on the internet for you, and understand how you want it to be done. That’s a much different product than what ChatGPT is today, but the company’s research is squarely headed in this direction.

While OpenAI undoubtedly led the AI industry a few years ago, the company now faces a field of worthy opponents. The question is no longer just whether OpenAI can deliver its agentic future, but whether it can do so before Google, Anthropic, xAI, or Meta beats it to it.




NASA’s latest mission to the ISS features a bacterial experiment



Scientists are sending several strains of disease-causing bacteria to the International Space Station as part of the Crew-11 mission. This experiment isn’t the plot to some cheesy horror film, but a scientific investigation from the Sheba Medical Center in Israel and the US-based company Space Tango with the goal of better understanding how bacteria spread and behave under extreme conditions. The experiment includes E. coli, along with bacteria that cause typhoid fever and salmonellosis, the infection commonly known as Salmonella.

After reaching the ISS, the experiment will see the different bacterial species grow before being returned to Earth to be tested against counterparts that were grown simultaneously in an identical lab under normal conditions. The experiment’s results will help scientists understand how bacteria respond to microgravity and could help astronauts, who are more prone to infections during missions due to stress, exposure to radiation and changes in gravity. However, the research could prove useful beyond space missions. With the onset of superbugs that show antibiotic resistance, the experiment could reveal ways to combat more robust bacterial strains.

“This experiment will allow us, for the first time, to systematically and molecularly map how the genetic expression profile of several pathogenic bacteria changes in space,” Ohad Gal-Mor, head of the Infectious Diseases Research Laboratory at Sheba, said in a press release.

The medical center previously conducted a test with bacteria in simulated space conditions, which showed a reduced ability to develop antibiotic resistance, but the latest experiment is the first one to take place at the ISS. It’s not the first time scientists have studied bacteria’s behavior in microgravity conditions, since researchers from the University of Houston tested how E. coli would grow in a simulated space environment back in 2017. More recently, NASA launched an experiment tasking astronauts to swab the interiors of the ISS and test them for evidence of antibiotic-resistant bacteria.




Google Home media controller getting Material 3 redesign



As previewed last November, Google Home for Android is giving the media controller a Material 3 redesign.

From the Favorites or Devices tab, select a smart display (Nest Hub), speaker, or Android/Google TV device.

This redesign starts with a now playing card that’s taken directly from the Pixel’s notification shade. What you’re watching or listening to is noted at the center with a play/pause button to the right. When Google showed this off last year, there was “vibrant artwork” like Android media controls. As of today, we’re not seeing this live. However, it does show a Photo Frame preview.

A timeline scrubber appears at the bottom of the card, with previous and next flanking it. Below that is a volume slider that replaces the circular version.


Old vs. new

Google TV devices get an “Open remote” shortcut, while you also get full-width buttons for “Stop casting” (or “Cast my screen”) and a button to open the responsible app. The Nest Hub Max lets you “View Nest Cam.” Active buttons are pill-shaped, while inactive controls are rounded rectangles and Dynamic Color is leveraged.

At the time, Google teased “faster performance” for this “easy-to-use interface.” 

The Google Home release notes for version 3.35 refer to this as a “Cast controller refresh” that’s currently in the Public Preview for Android users: “Improves UI reliability and performance and aligns with the Google Home app’s latest design standards.”

We’re seeing this media controller redesign with Google Home 3.37 for Android on the Public Preview program. It’s not available on iOS yet.






Mass. police chiefs decide who can carry a gun. Lawsuits now question their role



Pioneer Valley Arms in East Longmeadow offers a variety of firearms and safety classes for its customers. July 30, 2025. (Douglas Hook / The Republican)

CHICOPEE — Police say Connor Doran drank a couple margaritas before he went into the men’s room at Frontera Grill in Chicopee, took his pistol out of its waistband holster and placed it against the wall.

What happened next, Doran told police, was an accident. It also triggered events that led to a lawsuit questioning the constitutionality of the state’s firearms licensing laws.





Tim Cook reportedly tells employees Apple ‘must’ win in AI



Apple CEO Tim Cook held an hourlong all-hands meeting in which he told employees that the company needs to win in AI, according to Bloomberg’s Mark Gurman.

The meeting came after an earnings call in which Cook told investors and analysts that Apple would “significantly” increase its AI investments. It seems he had a similar message for Apple employees, reportedly telling them, “Apple must do this. Apple will do this. This is sort of ours to grab.”

Despite launching a variety of AI-powered features in the past year under the Apple Intelligence umbrella, the company’s promised upgrades to its voice assistant Siri have been significantly delayed. And Cook seemed to acknowledge that the company has fallen behind its competitors.

“We’ve rarely been first,” he reportedly said. “There was a PC before the Mac; there was a smartphone before the iPhone; there were many tablets before the iPad; there was an MP3 player before iPod.” But in his telling, that didn’t stop Apple from inventing the “modern” versions of those products.




Darksiders 4 was not on my 2025 bingo card



Darksiders 4 is officially coming. During the THQ Nordic Digital Showcase on Friday, we got a glimpse at the next game in the hack-and-slash action-adventure franchise, alongside trailers for roughly a dozen other games that are in the works. It’s been a while since we’ve seen a new mainline Darksiders title from developer Gunfire Games, and while the fourth entry follows 2019’s prequel, Darksiders Genesis, the announcement says it will “continue where the original Darksiders game left off.”

Darksiders 4 features all four Horsemen, and you’ll get to choose which one to play as. The game promises “combat, traversal and puzzle solving in a lore rich post apocalyptic world.” The teaser doesn’t give us much information beyond that, and there’s no release date just yet, but we do know it’ll be available on PlayStation 5, Xbox Series X/S and PC.

If you missed the showcase, you can catch up on everything that was announced. And according to THQ Nordic, that’s only half of what it has up its sleeve. At the end of the showcase, the publisher said a total of 28 games are currently in development, with 15 we still have yet to see.




Google renames most of its new Gemini-Assistant voices for Nest



Back in June, Google introduced three new voices for the upcoming Gemini-Assistant revamp for the Nest Mini and Audio. Ahead of launch, Google has renamed most of them, but they otherwise appear to be the same.

These are the name changes to Gemini-Google Assistant voices if you’re in the Public Preview program:

Old        | New        | Description
Aloe       | Bloom      | Calm • Mid-range voice
Oxalis     | (unchanged) | Bright • Mid-range voice
Fern       | (unchanged) | Warm • Higher voice (previously Bright)
Verbena    | Magnolia   | Calm • Deeper voice
Ivy        | Violet     | British accent • Mid-range voice
Jade       | Pothos     | Engaging • Mid-range voice
Eucalyptus | Calathea   | Australian accent • Higher voice
Yarrow     | (unchanged) | Warm • Deeper voice
Croton     | (unchanged) | Smooth • Deeper voice
Pilea      | Amaryllis  | Bright • Higher voice

Six of the 10 voices have been renamed. They remain botanically themed, with “Violet” having a better connotation than (poison) “Ivy.”  



Similarly, people are probably more familiar with “Magnolia” than “Verbena.” That said, I’d think “Eucalyptus” has higher awareness than “Calathea.”

There is one change to the descriptions, with “Fern” now offering a “Warm” voice instead of “Bright.” To my ear, all the voices sound the same, and you can listen to both sets.

Last week, Google teased “major improvements” for Google Assistant in light of reliability complaints. This is most likely in reference to Gemini replacing Google Assistant. Beyond the switch to an LLM approach, it seems increasingly likely that the Assistant branding will go away entirely. For a moment, it seemed that Google was keeping around that brand for Home devices, but that changed now that “Gemini” is coming to Google TV instead of a Google Assistant upgraded with the Gemini models.  

These upgrades are said to be coming “in the fall” after public-facing testing began last year. 







‘Baseball rat’ and 2022 first-rounder Mikey Romero makes the jump to Triple A



Mikey Romero was promoted to Triple A on Thursday after the trade deadline and made his Triple-A debut on Friday. (Katie Morrison-O’Day)

WORCESTER — Red Sox prospect Mikey Romero has been through a lot in his life at just 21 years old.

The infielder lost his father in February 2024. He’s dealt with injuries and lost good chunks of his 2022 and 2023 seasons. But through the ups and downs, Romero has reached Triple-A three years after being drafted 24th overall by the Red Sox in 2022.





Anthropic cuts off OpenAI’s access to its Claude models



Anthropic has revoked OpenAI’s access to its Claude family of AI models, according to a report in Wired.

Sources told Wired that OpenAI was connecting Claude to internal tools that allowed the company to compare Claude’s performance to its own models in categories like coding, writing, and safety.

TechCrunch has reached out to Anthropic for comment. In a statement to Wired, an Anthropic spokesperson said, “OpenAI’s own technical staff were also using our coding tools ahead of the launch of GPT-5,” which is apparently “a direct violation of our terms of service.” (Anthropic’s commercial terms forbid companies from using Claude to build competing services.)

However, Anthropic also said it would continue to give OpenAI access for “benchmarking and safety evaluations.”

Meanwhile, in a statement of its own, an OpenAI spokesperson described its usage as “industry standard” and added, “While we respect Anthropic’s decision to cut off our API access, it’s disappointing considering our API remains available to them.”

Anthropic executives had already shown resistance to providing access to competitors, with Chief Science Officer Jared Kaplan previously justifying the company’s decision to cut off Windsurf (a rumored OpenAI acquisition target, subsequently acquired by Cognition) by saying, “I think it would be odd for us to be selling Claude to OpenAI.”




YouTube is testing Instagram-style collabs



YouTube has started testing a new collaboration feature, similar to Instagram’s and TikTok’s. A Google employee explained on YouTube Help that it will allow creators to add collaborators to a video so that they can be recommended to each other’s audiences. The test is only available to a small group of creators for now, but it sounds like YouTube has plans to expand its availability in the future. Lindsey Gamble, an influencer marketing consultant and advisor, has posted a screenshot showing how the experimental feature works on Threads.

As you can see in the image Gamble posted, adding collaborators would show their names alongside the creator’s on the video. If there are too many, at least on mobile, the collaborators would show as “…and more” next to the creator’s name. Tapping on it would bring up the list of people involved in the project, with the Subscribe button next to their names.

On Instagram and TikTok, the creator who uploads the content will have to invite another account as a collaborator, who’ll then have to approve the invitation. That’ll most likely be the case here, as well, in order to ensure that creators don’t randomly add other users to their videos. It’s not clear, however, whether the collaborators can see details typically reserved for the uploader’s eyes. As with any experimental YouTube and Google product, the company will be taking the testers’ feedback into account before deciding if it’ll give the feature a wide release.



