Migrating Systems to GPT-5: Tricks and Pitfalls
GPT-5, the highly anticipated latest version of OpenAI's flagship model, hit the streets a few weeks ago. Despite some breathless commentary from influencers who had been given early access, the eventual release was a bit underwhelming (in a way that only something that would have seemed like science fiction a few short years ago, but now seems passé, can be). Aside from the quality of the model itself, which some people have claimed was more about lowering OpenAI's costs than delivering a better result, the change to GPT-5 has introduced some issues when integrating it into a product, which we thought we should share.
Problem 1: You can’t accurately assess cost anymore, or set max output tokens
OpenAI have been accused of confusing model naming in the past, with similar-sounding but different names like GPT-4o vs. o4, and adjectives like mini, turbo, pro, nano, thinking and "deep research" appended to model names. GPT-5 attempted to resolve this (kind of) by offering a single API that routes to different models under the covers. The problem is not completely resolved, because they ALSO offer mini and nano versions of GPT-5. All of the GPT-5 family are reasoning models: you can't disable reasoning, and reasoning uses up tokens. How many tokens? You can't control that – there is no max reasoning tokens field, only max output tokens.
Although you can't disable reasoning altogether, you can set the reasoning effort to "low", which is a hint to the model about how much it should reason. In our tests the model ALWAYS decided to reason, burning a minimum of 1,000 tokens even for a tiny single-word message. If you've set max output tokens to 500 and GPT-5 decides to "reason" about your very simple message, burning through 1,000 tokens on the way, it ends up returning an empty response.
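As a sketch of what this failure mode looks like in practice (assuming the OpenAI Python SDK's Responses API and its reasoning / max_output_tokens parameters; the prompt and cap here are illustrative), the problem is at least detectable by inspecting the token usage and completion status:

```python
from openai import OpenAI

client = OpenAI()

# Hypothetical illustration: a trivial prompt with a small output cap.
# If the model spends the budget on reasoning tokens, the visible
# output can come back empty even though the tokens are still billed.
response = client.responses.create(
    model="gpt-5",
    input="Reply with one word: is the sky blue?",
    reasoning={"effort": "low"},  # a hint, not a guarantee
    max_output_tokens=500,        # caps reasoning + visible output combined
)

usage = response.usage
print("reasoning tokens:", usage.output_tokens_details.reasoning_tokens)
print("total output tokens:", usage.output_tokens)
print("visible text:", repr(response.output_text))

# When reasoning consumes the whole budget, the response is flagged
# as incomplete rather than raising an error.
if response.status == "incomplete":
    print("ran out of tokens:", response.incomplete_details.reason)
```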
OpenAI, in their advice on prompting reasoning models, suggest allocating at least 25,000 tokens for reasoning and adjusting down from there; but even if the final number you arrive at is only a tenth of that, it's still a lot of tokens, and cost, for each request.
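To put rough numbers on that overhead (the prices below are placeholders, not current rates; substitute the published pricing for whichever model you use):

```python
# Hypothetical per-token price -- substitute the current published rate.
PRICE_PER_OUTPUT_TOKEN = 10.00 / 1_000_000  # $10 per 1M output tokens (placeholder)

visible_output_tokens = 50  # the short answer you actually wanted
reasoning_tokens = 2_500    # a tenth of the suggested 25,000 budget

# Reasoning tokens are billed as output tokens, so a 50-token answer
# is priced like a 2,550-token one.
cost = (visible_output_tokens + reasoning_tokens) * PRICE_PER_OUTPUT_TOKEN
overhead = reasoning_tokens / visible_output_tokens
print(f"cost per request: ${cost:.4f} ({overhead:.0f}x token overhead)")
```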
Problem 2: Model Speed
We gave GPT-5 the benefit of five days' grace after the release, but in our tests the performance of the model was poor and extremely unpredictable, even with reasoning set to low. The quickest response took 8 seconds, whilst the slowest took close to 35-40 seconds. This can be mitigated by streaming the response, but users will probably still tire of a response that streams this slowly. It is in sharp contrast to GPT-4.1 mini, which responded in a predictable 3-4 seconds and felt lightning-fast by comparison when streamed.
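If you do stream, it's worth measuring time-to-first-token separately from total time, since the initial silence is what users actually perceive. A rough harness, again assuming the Responses API's streaming events (the event type name reflects the current SDK and may differ in other versions):

```python
import time
from openai import OpenAI

client = OpenAI()

start = time.monotonic()
first_token_at = None

# Stream the response and record when the first visible token arrives.
# Reasoning happens silently before the first delta, which is where
# the long pause comes from.
stream = client.responses.create(
    model="gpt-5",
    input="Summarise the plot of Hamlet in two sentences.",
    reasoning={"effort": "low"},
    stream=True,
)
for event in stream:
    if event.type == "response.output_text.delta" and first_token_at is None:
        first_token_at = time.monotonic()

end = time.monotonic()
print(f"time to first token: {first_token_at - start:.1f}s")
print(f"total time: {end - start:.1f}s")
```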
Problem 3: Model Intelligence
GPT-5 launched to much fanfare and expectations regarding its capabilities, with some wide-eyed accelerationists believing it could be the first example of Artificial General Intelligence (AGI). OpenAI’s Sam Altman said:
"We think you will love using GPT-5 much more than any previous Al. It is useful it is smart it is fast [and] intuitive. GPT-3 was sort of like talking to a high school student. There were flashes of brilliance lots of annoyance but people started to use it and get some value out of it. GPT-4o maybe it was like talking to a college student… With GPT-5 now it's like talking to an expert - a legitimate PhD level expert in anything any area you need on demand they can help you with whatever your goals are."
Although some typical stumbling blocks, like counting the number of times the letter 'r' appears in the word "strawberry", had been special-cased, it wasn't long before the usual set of problems that LLMs struggle with had been identified and called out, and nearly 5,000 people successfully petitioned OpenAI to keep access to the older GPT-4o models in ChatGPT.
In our tests, GPT-5 nano, mini and regular with medium reasoning (which consumed 1,000-2,000 tokens) and small text inputs all failed in comparison to GPT-4.1 mini. Instructions that we explicitly said not to include in the output were included anyway. Reasoning seemed to be a wild card here – some of our tests passed and then failed on subsequent runs, and it was hard to get consistent output.
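A cheap way to catch this kind of regression before migrating is to run the same prompt several times per model and check each output against your hard constraints. A minimal sketch (the model list, prompt, and banned phrase are all illustrative, not the tests we actually ran):

```python
from openai import OpenAI

client = OpenAI()

MODELS = ["gpt-4.1-mini", "gpt-5-mini"]  # candidates to compare
PROMPT = "List three fruits. Do not mention apples."
BANNED = "apple"                         # constraint the output must respect
RUNS = 5                                 # repeat to expose inconsistency

for model in MODELS:
    failures = 0
    for _ in range(RUNS):
        response = client.responses.create(model=model, input=PROMPT)
        if BANNED in response.output_text.lower():
            failures += 1
    print(f"{model}: {failures}/{RUNS} runs violated the instruction")
```

Repeating each prompt matters here: a model that passes once and fails on the next run is exactly the inconsistency we saw, and a single-shot test will miss it.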
Conclusion
We'd advise anyone thinking of migrating to GPT-5 to hold fire until some of these issues have been explored further, or at the very least to run a suite of tests evaluating the quality, speed, and cost of the model relative to others. The additional, uncontrollable cost of reasoning tokens could be mitigated by OpenAI allowing reasoning to be disabled altogether, though it's possible this would further degrade the quality of GPT-5's responses compared to the GPT-4 family of models.