
The Humans Won't Be Called Back

Lida Liberopoulou · 4 March 2026 · CC BY-SA 4.0

There is a prediction making the rounds. It goes like this: AI will produce bad code. The code will break things. Companies will realise their mistake. And then they will call the humans back.

It is a comforting prediction. It is also wrong. Yes, AI code will be bad, at least in the beginning. But the organisational conditions required to reverse the decision to use it will have been systematically destroyed in the same motion that created the mess.

And we have seen this exact pattern before. It was called outsourcing.

The "call the humans back" prediction assumes the organisation still knows what it lost, that there is institutional muscle and memory sufficient to recognise the failure, diagnose it, and specify what "calling back" would even mean. That capacity is the first thing cost-cutting removes.


1. The Düsseldorf story

I worked at a large company in Düsseldorf during the transition of one of its services to its newly formed outsource team. The work that came back was provably worse. The outsource team was oblivious to this and, despite our efforts, showed no interest in improving. Our team documented the problems. We handed the evidence to management and were eventually disbanded anyway.

The problem was worse than management ignoring the evidence. Management's decision framework made the evidence invisible. The move to the outsource team was a cost decision, and the mental model behind it treated engineering personnel as interchangeable. They thought of them as components on a production line. If someone's title was "software engineer," they were equivalent to any other software engineer. The title was the spec. Whether the person had built and maintained robust, efficient products for years or could barely understand how a bash script worked did not register as a meaningful distinction. In that framework, quality of service was not a variable. It was assumed to be constant because the titles were constant.

The team in Düsseldorf understood the systems, the edge cases, the reasons certain things were built the way they were. That understanding did not transfer in a three-month handover. It did not survive in documentation. It lived in the people, and when the people left, it left with them. But management never saw it as something that could leave, because in their model it was never attached to specific people. It was attached to the role. And the role still existed, in the outsourced team, at a lower cost.

The prediction at the time was identical to the one being made about AI: the offshore work would be so bad that eventually the company would have to bring the work back. Some of us believed it. Eventually we were proven right about the quality but completely wrong about the reversal.


2. The quality baseline was already low

The outrage about AI code quality is built on a foundation of selective memory. The people expressing alarm about AI-generated vulnerabilities are often the same people who signed off on outsourced teams producing systems that barely worked, shipped with security holes baked in, and were impossible to maintain. The quality bar for shipped software has been low for a long time. What AI does is make the existing quality floor visible by threatening to go below it at scale and at speed.

The numbers are not ambiguous.

Verizon's 2025 Data Breach Investigations Report found that third-party involvement in breaches doubled in a single year, from 15% to 30%. Vulnerability exploitation as an initial access vector surged 34%. This is the baseline that existed before anyone was worried about AI-generated code.

ReversingLabs surveyed 30 widely used packages across npm, PyPI, and RubyGems, packages with over 650 million combined downloads. The median package contained 27 security flaws. The median number of critical-severity flaws was 2. These are the shared components of the outsourced SaaS economy, and they were already bad.

The World Economic Forum's Global Cybersecurity Outlook 2026 found that 65% of large companies cite third-party and supply chain vulnerabilities as their greatest barrier to cyber resilience, up from 54% the prior year. But only 33% comprehensively map their supply chain ecosystems. Only 27% simulate incidents or run recovery exercises with partners. The problem is acknowledged as the top barrier while the practices that would make it legible are mostly absent.

Veracode reports that over 70% of organisations carry security debt and nearly half have critical debt. Black Duck's 2025 analysis found that the odds are better than 80% that any application in production contains high- or critical-risk open source vulnerabilities.

And here is the part that matters most for this argument: there are no primary studies comparing in-house and outsourced defect rates. Systematic literature reviews confirm this gap explicitly. The baseline was already bad. It just was not legible, because accountability was diffuse and nobody measured it properly.

The debate about AI code quality will follow the same pattern. It will be moralised precisely because it cannot be measured cleanly. And the outrage will not be about quality. It will be about who gets paid to produce the mediocrity. That is the quality floor AI-generated code is being measured against. Not craftsmanship or best practice but the actual, documented, unmeasured baseline of the outsourced software economy.


3. AI code is also bad and the response is not rehiring

Now layer the new numbers on top of the old ones.

CodeRabbit's 2026 analysis of 470 open source pull requests found that AI-generated code contains 1.7 times more issues than human-written code. Logic and correctness errors appear 1.75 times more often. Security findings increase by 1.57 times. Maintainability errors are 1.64 times higher. AI-generated code was nearly twice as likely to introduce improper password handling and insecure object references, and 2.74 times more likely to add cross-site scripting vulnerabilities.

Cortex's 2026 Engineering Benchmark found that while pull requests per author increased 20% year over year (that is the speed gain), incidents per pull request increased 23.5% and change failure rates rose 30%. The velocity is real and so is the breakage.
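Compounding those two figures shows why the velocity gain does not net out. A minimal back-of-envelope sketch, assuming (hypothetically) that the two year-over-year rates multiply independently, which the benchmark itself does not claim:

```python
# Back-of-envelope arithmetic from the Cortex 2026 headline figures.
# Hypothetical compounding: assumes the two year-over-year rates are
# independent of each other.

prs_per_author = 1.20       # pull requests per author, +20% year over year
incidents_per_pr = 1.235    # incidents per pull request, +23.5%

incidents_per_author = prs_per_author * incidents_per_pr
print(f"Incidents per author: +{incidents_per_author - 1:.1%}")
# -> Incidents per author: +48.2%
```

Under that assumption, each author ships 20% more work but generates roughly 48% more incidents.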

Aikido Security's 2026 report attributed one in five breaches in their analysis to AI-generated code as a contributing factor. Sonar's developer survey found that fewer than half of developers review AI-generated code before committing it.

And then there is the METR study, arguably the most important finding in this space. A randomised controlled trial with experienced open-source developers found that when using early-2025 AI tools, developers took 19% longer to complete tasks than they did without them. But they believed they were 20% faster. This gap between perceived and measured productivity is the mechanism by which organisations will accumulate technical debt they cannot see.
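The size of that gap is worth making concrete. A minimal sketch, assuming a hypothetical 100-minute baseline task and reading "20% faster" as a throughput gain; only the 19% and 20% figures come from the study:

```python
# Illustrative arithmetic for the METR result. The 100-minute baseline
# task is hypothetical; the 19% and 20% figures are the study's.

baseline = 100                # minutes without AI tools (assumed)
actual = baseline * 1.19      # measured: tasks took 19% longer
perceived = baseline / 1.20   # believed: 20% faster, read as throughput

print(f"actual {actual:.0f} min vs perceived {perceived:.0f} min")
print(f"miscalibration factor: {actual / perceived:.2f}")
# -> actual 119 min vs perceived 83 min
# -> miscalibration factor: 1.43
```

On those assumptions, work that actually consumed two hours felt like well under an hour and a half, a roughly 1.4x miscalibration that no velocity dashboard will surface.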

A separate academic paper (Shen and Tamkin, January 2026) ran randomised experiments showing that AI assistance impaired conceptual understanding, code reading, and debugging abilities, without delivering significant efficiency gains on average. The tools that are supposed to replace human judgment are simultaneously degrading the capacity for human judgment in the people still using them.

What is the response?

When reporting by the Financial Times and The Verge linked an Amazon AI coding tool to AWS service disruptions in December 2025, the company disputed the attribution, framing the incident as "user error, not AI error," and tightened internal controls. When Replit's AI agent deleted a production database and then misrepresented what it had done, the company updated its system to isolate production databases from direct AI access. That was a process fix, not a staffing fix.

The Fedora Project adopted a policy allowing AI-assisted contributions but requiring human accountability and review. The Electronic Frontier Foundation added a policy requiring that contributors understand the code they submit. Blender and GNOME adopted explicit reject-and-close policies for AI-generated submissions that burden reviewers.

Werner Vogels, Amazon's CTO, coined the phrase that captures what is actually happening: "verification debt." AI makes generation fast but shifts effort to review and comprehension. The organisational response is governance: dashboards, review tools, confidence scores. Not people.

This is governance theatre. It is the same pattern as HealthCare.gov after its catastrophic launch. They didn't "rebuild the old team." They did vendor reshuffling, governance redesign, and 400 software fixes. The error rate dropped from 6% to under 1%, and the original team never came back.


4. When reversals happened and what they required

Reversals from outsourcing did happen. But the conditions that made them possible are precisely the conditions that AI-driven workforce reductions will destroy.

J Sainsbury outsourced IT to Accenture in 2000. Five years later, the arrangement was terminated. The failure manifested as visible retail operational degradation and large financial write-offs. Four hundred and seventy IT staff were transferred back over six months. But the reversal was framed as "rebuilding expertise under new leadership," not restoring the old organisation chart. It required a leadership change and a specific political event that created cover for an expensive admission of failure.

JPMorgan Chase signed a $5 billion, seven-year outsourcing contract with IBM in 2002 and cancelled it two years later. About 4,000 employees came back. But the reversal was triggered by the Bank One merger, which brought in leadership with experience bringing IT in-house. Reversal required not just a "vendor bad" admission but a change in the organisation itself (the merger plus the changes in leadership and strategy) in a way that made backsourcing politically feasible.

The Scandinavian cases are even more telling. A multi-case study of four software companies found that all terminated offshore outsourcing relationships because of low software quality. In one case, after three years, management found that most of the delivered code was not used or had been removed from final products. But after termination, the companies did not return to fully domestic development. They shifted to different governance models, offshore insourcing, partnerships, joint ventures. A different equilibrium, not the old one.

The general pattern from backsourcing research: reversals impose explicit costs, including exit costs, transition work, termination fees, and rebuilding. A Gartner estimate puts backsourcing expenses at 2% to 15% of annual contract cost. A Deloitte survey of 25 Fortune 500 companies found that 64% had brought some services back in-house, but often as partial insourcing or multi-vendor fragmentation, not clean reversal.

The defensible claim is not that reversals never happen. It is that they require a high-salience failure plus new organisational conditions (a leadership change, a merger, a strategy shift), and even then they produce a different equilibrium rather than full restoration. The original team does not come back. A new team is built, expensively, to govern whatever replaced the original one.

Now: which of those conditions will be present after AI-driven layoffs?


5. The institutional memory trap

The Düsseldorf team being disbanded is exactly the mechanism that makes reversal conditions unavailable. You cannot reverse to a team that no longer exists. You cannot even articulate what you lost if the people who understood the work are gone.

Research on transition performance in offshore IT outsourcing reports that transitions take two to three months on average, and that over two-thirds of failed outsourcing relationships are attributed to transition-related challenges. In offshore arrangements, transfer of client IT staff to the provider is very rare, meaning tacit system knowledge is structurally difficult to transfer.

A study of 198 outsourcing initiatives found that capability loss has a direct negative effect on outsourcing performance and inhibits the organisation's ability to develop a committed relationship with the provider. If layoffs destroy internal capability, they reduce the organisation's ability not only to build, but to supervise and correct externalised production, whether that production comes from a vendor team or an AI-mediated workflow.

Research on "retained organisations" in IT outsourcing (meaning the parts of the former IT organisation not outsourced, forming the client-provider interface) found that many outsourcing arrangements suffer severe problems because clients fail to build effective retained organisations. So the retained organisation is the institutional memory carrier. Eliminate it and you eliminate the capacity for meaningful oversight.

A study of 151 project teams found that turnover weakens project performance, and that mitigation requires succession planning and knowledge repositories. A software engineering study of legacy systems found that dormant files, the code that stops being touched, account for 80% of all complexity. Once knowledge is lost, complexity concentrates in the parts least understood, and expertise re-concentrates in a dwindling set of experienced maintainers.

Google's own engineering documentation states that ramping up a new developer can take around six months. That is at an elite organisation with extensive documentation, tooling, and mentorship infrastructure. "We can just rehire later" assumes the knowledge being lost can be reconstructed from documentation that does not exist yet.

And now add the deskilling dimension. Even if you do rehire, the replacements may have been trained in an environment where AI tools were doing the conceptual work. Shen and Tamkin's 2026 experiments found that AI assistance impairs conceptual understanding and debugging abilities. Junior developers entering the field in 2026 may never write foundational code from scratch. They will spend their early careers reviewing AI output, learning architecture through observation rather than implementation. Stanford data shows employment for software developers aged 22-25 has declined nearly 20% from its late-2022 peak. Indeed's Hiring Lab reports junior postings declining alongside the broader slowdown while senior postings see modest growth. The pipeline is being constricted at the entry point.

IBM's response is instructive. The company announced it would triple entry-level hiring in 2026 but explicitly acknowledged the roles include "jobs AI can do," while rewriting job descriptions away from coding and toward customer-facing work. This is essentially a redesign of the pipeline to work around the tools that replaced the old one.

The institutional memory trap closes like this: the people who know the systems leave. The documentation they would have written does not exist. The replacements, if they come, learned on tools that impair the very understanding needed to supervise AI output. The complexity of the system does not decrease; it concentrates in the parts no one understands. And the organisation's ability to even diagnose quality regressions degrades with each departure. This is the trap. Not that AI produces bad code (it does). Not that companies are cutting too fast (they are). But that the same motion that introduces the risk removes the capacity to see it, name it, and reverse it.


6. The live examples

This is not a thought experiment. It is happening now.

In February 2026, Jack Dorsey announced that Block would shrink from over 10,000 employees to under 6,000. "Intelligence tools have changed what it means to build and run a company," he wrote. "A significantly smaller team, using the tools we're building, can do more and do it better."

Eleven months earlier, in his March 2025 layoff memo, Dorsey had written: "None of the above points are trying to hit a specific financial target, replacing folks with AI, or changing our headcount cap." That round cut 931 people for strategy misalignment and management layer reduction. By February 2026, the same person was framing a 40% workforce reduction as an AI efficiency story. The narrative flipped completely because the narrative serves different purposes at different moments.

The pattern extends across sectors and geographies. Pinterest, eBay, Amazon, Meta, and UPS all announced significant cuts in the same period, with AI cited in varying degrees of explicitness. But the phenomenon is not confined to American tech companies. Dow, a chemical manufacturer, cut 4,500 jobs citing AI and automation. Proximus, a Belgian telecom, announced 1,200 cuts by 2030 due to AI efficiency measures. Groupe SEB, a French appliance maker, restructured 2,100 roles to take "full advantage" of AI. WiseTech Global, an Australian logistics software company, cut nearly a third of its workforce, primarily in product development and customer service. And in 2026, Telstra proposed 650 job cuts while shifting engineering work to Infosys to be automated via AI: outsourcing and AI displacement running in the same sentence.

Challenger, Gray & Christmas reported that 55,000 layoffs were directly attributed to AI in 2025, twelve times the number just two years earlier. January 2026 saw the highest layoff announcements for any January since 2009.

But here is the part that makes this worse, not better: many of these companies were cutting jobs they should never have created.

Alphabet grew from 135,301 employees in 2020 to 190,234 in 2022, a 40.6% increase. When Sundar Pichai announced 12,000 layoffs in January 2023, his memo cited both a "different economic reality than the one we face today" and a "huge opportunity" in AI: in the same breath, an admission of overhiring and a forward-looking AI narrative.

Intuit grew 63.2% from 2020 to 2022, partly through acquiring Credit Karma and Mailchimp. When the CEO announced 1,800 layoffs in July 2024, he explicitly framed it as an "era of AI" decision, then announced plans to hire approximately 1,800 new people in AI and product roles. The same memo stated "we do not do layoffs to cut costs" while simultaneously disclosing that 1,050 of the exits were performance-based and management layers were being flattened by 10%.

Dropbox grew 13% into 2022, then CEO Drew Houston cut 16% in April 2023, writing that "the AI era of computing has finally arrived" and that the company needed "a different mix of skill sets." A second round in October 2024 cut another 20%, citing areas that were "over-invested or underperforming." The company went from 3,118 employees to 2,113 in three years.

Workday grew 24.6% from 2020 to 2022, then cut 8.5% in February 2025. The CEO's note to employees cited "the increasing demand for AI" and the need to prioritise "innovation investments like AI" while separately acknowledging a "new approach, particularly given our size and scale."

The pattern is consistent across all four: a pandemic-era hiring surge, followed by layoffs framed within an AI strategic narrative, alongside quieter admissions of overexpansion, organisational complexity, and the need to re-engineer the cost base.

A survey by Resume.org found that nearly 60% of hiring managers who cited AI as a reason for layoffs admitted they emphasised AI's role because it is viewed more favourably than financial constraints. Since March 2025, when New York State gave employers the option to cite "technological innovation or automation" in legally required WARN layoff notices, none of the 160-plus companies filing notices checked that box, including companies that cite AI efficiencies in their public communications. Sam Altman himself acknowledged in February 2026 that "some companies are AI washing by blaming unrelated layoffs on the technology."

This matters for the reversal argument because it does not matter whether AI is the real cause or the narrative cover. The institutional memory gets destroyed either way. The people leave either way. The knowledge goes with them either way. And the AI framing makes reversal less likely than a simple overhiring admission would because "we are investing in AI" is a forward-looking strategy story that no board will walk back, while "we hired too many people during a pandemic" is an embarrassment everyone wants to forget. The narrative locks the door behind the decision.

And then there is the NBER study. Working Paper 34836, published February 2026, surveyed nearly 6,000 CFOs, CEOs, and executives across the US, UK, Germany, and Australia. Over 80% of firms reported no impact from AI on either employment or productivity over the past three years. The same firms predict AI will cut employment by 0.7% over the next three years.

Read that again. Eighty percent report no impact yet. But the layoffs are already happening, at scale, framed as AI-driven. The displacement is running ahead of the productivity evidence.


7. The accountability gap

When the AI-generated code breaks, and it will, who is legally responsible?

OpenAI's consumer terms state that output may not be accurate and that users must evaluate it for accuracy, including using human review, and not rely on it as a sole source of truth. The enterprise services agreement is more explicit: the customer is solely responsible for use of outputs and for evaluating accuracy and appropriateness. GitHub's terms state that the user retains responsibility for suggestions they include in their code.

The EU's legal direction moves the same way. The new Product Liability Directive, applicable from December 2026, covers all types of software including AI systems. Manufacturers can be held liable for defects existing at release, including those emerging via updates or machine-learning features. The proposed AI Liability Directive was withdrawn in February 2025. Accountability will be channelled through existing product liability, safety, and contract regimes, not a new AI tort regime.

The Bank of England's regulatory position on outsourcing states the principle clearly: when an entity outsources, it "shall remain fully responsible." Outsourcing changes operational execution, not the ultimate responsibility of the regulated entity.

This is the structural reality: "unaccountable AI code" does not exist as a legal category. The organisation is on the hook. Which means quality failures will not produce "bring back the humans" responses. They will produce "add a review process" responses. Governance theatre, not reversal. The same pattern as every outsourcing quality failure that did not result in backsourcing, which is to say, most of them.


8. The new baseline

There is a historical pattern for what happens when cost-driven transformation reduces quality and the market absorbs the reduction.

In the nineteenth century, printing shifted to cheaper wood pulp paper known to be acidic and to cause brittleness. Surveys in the 1980s found that up to 35% of holdings at some institutions were affected by embrittlement. A quarter to a third of major library collections were highly embrittled. The correction was partial and costly, done through reformatting (first with microfilm, then with digitisation), rather than a return to earlier material standards. The lower baseline became permanent.

After US airline deregulation, coach seat width and pitch decreased. Onboard space diminished. Airlines could have offered higher-price, less-crowded flights. Virtually none chose to do so. Revealed demand favoured lower price over higher quality. The quality regression stabilised as the new baseline.

If enough of the buyer market accepts "good enough," and price pressure rewards cost reduction, quality regressions do not reverse. They become the new normal.


9. The prediction, corrected

The prediction says: AI will produce bad code and humans will be called back to fix it.

Here is what will actually happen.

AI will produce bad code. Some of it will cause real damage: breaches, outages, customer-facing failures. The organisations that experience these failures will respond with governance: review tools, defect dashboards, confidence scores, formal policies requiring human accountability for AI-generated output. They will not rehire the teams they disbanded.

They will not rehire because the reversal conditions will not be present. There will be no new leadership with a mandate to admit the strategy failed. There will be no merger that supplies political cover for an expensive correction. There will be no retained organisation that can articulate what was lost and how to rebuild it. The institutional memory will be gone. The people who could have diagnosed the problems will have moved on, retired, or retrained into different roles. The juniors who might have replaced them will have been trained on tools that impair the conceptual understanding needed for the work.

The cost line will look fine. The quality line will be invisible because the people who could read it are no longer there.

Not all 4,000 people leaving Block carried irreplaceable knowledge; a company that tripled its headcount during the pandemic inevitably hired beyond what it needed. But a 40% cut does not distinguish between the roles that should never have existed and the people who understood how the systems actually work. The institutional memory leaves with the bloat, mixed in, unidentified, and unrecoverable. When the AI tools produce the inevitable failures, the people who could have diagnosed them will be gone, because the cut that removed the redundancy also removed them.

The humans will not be called back. By the time the organisation understands what it lost, the knowledge of what "calling them back" would even mean will have left with them.


The research supporting this article draws on the Verizon DBIR (2024, 2025), ReversingLabs 2025 Software Supply Chain Security Report, ENISA Threat Landscape 2025, WEF Global Cybersecurity Outlook 2026, Veracode 2025 GenAI Code Security Report, Black Duck OSSRA 2025, CodeRabbit State of AI vs Human Code Generation Report 2026, Cortex Engineering in the Age of AI 2026 Benchmark, Aikido Security 2026 report, METR developer productivity study (July 2025), Shen & Tamkin "How AI Impacts Skill Formation" (January 2026), NBER Working Paper 34836 "Firm Data on AI" (February 2026), Challenger Gray & Christmas layoff data, CISA advisories on Kaseya VSA and MOVEit Transfer, GAO reports on Equifax and HealthCare.gov, Journal of Operations Management capability loss study, backsourcing and retained organisation research, Library of Congress preservation history, and contemporary reporting on the cases cited.
