Three questions people are hammering into search and chat right now, plus the short answers you can steal.
Do AI coding tools actually make experienced devs faster?
Not reliably. A randomized trial on real open-source issues found that experienced contributors took ~19% longer to finish tasks when AI tools were allowed — even though they felt faster and predicted the opposite. The trap is that AI reduces "blank page time" but increases verification + integration time.
If AI boosts output, why doesn't productivity always go up?
Because "more code" isn't "more progress." One OSS study found that after AI assistants arrived, core maintainers reviewed ~6.5% more code while their original code productivity dropped ~19%. In practice, AI can shift effort from building to filtering, fixing, and enforcing standards — and that load lands on the people with the most context.
How can teams use AI without slowing down their best engineers?
Treat AI like a draft generator, not an authority. Start with low-risk areas (tests, refactors, internal tools), enforce quality automatically (CI, linters, type checks), and measure what matters: time-to-merge, review load per senior, rework ratio, and change failure rate. If those improve, AI is helping. If only LOC/PR count improves, you're probably just generating future maintenance.
AI Bits for Techies | Issue #4 | 5 Feb 2026
Your weekly Aussie-flavoured deep dive into what changed in AI/ML, what matters, and what to do next (without living on release-note social media).
This week in one breath: Two papers challenge the "AI makes everyone faster" narrative: experienced OSS contributors were ~19% slower with AI, and while output metrics rose after Copilot adoption, core maintainers absorbed more review and rework. Tools for image generation with readable text (GLM-Image), agent workflows (Qwen3-Max-Thinking), LaTeX writing (Prism), and personal automation (Moltbot), plus a book on AI governance that connects policy to infrastructure. The takeaway: AI productivity isn't universal—it depends on who you are, what you're building, and where the verification cost lands.
The two papers you should pretend you read at lunch
1) "Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity" (Becker et al., METR)
2) "AI-Assisted Programming Decreases the Productivity of Experienced Developers by Increasing the Technical Debt and Maintenance Burden" (Xu et al., Tilburg University)
What is the setup?
There's a quiet mismatch between what devs feel AI is doing and what actually happens in mature codebases. These two papers attack that mismatch from opposite angles: one is a controlled experiment where you time real work; the other is a big-picture OSS analysis where you watch what happens to review load, rework, and who ends up paying the "maintenance tax." Neither is a "Harvard paper," and both focus on experienced developers and real-world workflows.
What they did (yes, really)
Paper 1 (METR / RCT): Model Evaluation & Threat Research (METR) ran a randomized controlled trial with 16 experienced OSS contributors doing 246 real issues in large, mature repositories they already knew well (≈5 years average repo experience). Each issue was randomly assigned (coin flip) to AI-allowed vs AI-disallowed. The AI condition largely meant using tools like Cursor plus models like Claude 3.5/3.7 Sonnet, with detailed time tracking and (for a subset) labeled screen recordings of how time was spent.
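To make the design concrete, here's a minimal sketch of that comparison: randomly assigned tasks, completion times compared on a log scale, with a bootstrapped confidence interval. The data below is synthetic and the estimator is simplified; METR's actual analysis is more involved.

```python
# Minimal sketch of a METR-style comparison: AI-allowed vs AI-disallowed
# task times, effect estimated as a ratio of geometric means.
# All data here is synthetic; only the analysis shape mirrors the paper.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical completion times in minutes (log-normal, like most task times).
ai_allowed = rng.lognormal(mean=4.2, sigma=0.6, size=120)
ai_disallowed = rng.lognormal(mean=4.0, sigma=0.6, size=126)

# Point estimate: ratio of geometric means (a ~1.19 ratio = ~19% slower).
ratio = np.exp(np.log(ai_allowed).mean() - np.log(ai_disallowed).mean())

# Bootstrap a 95% confidence interval for the ratio.
boot = []
for _ in range(5000):
    a = rng.choice(ai_allowed, size=ai_allowed.size, replace=True)
    b = rng.choice(ai_disallowed, size=ai_disallowed.size, replace=True)
    boot.append(np.exp(np.log(a).mean() - np.log(b).mean()))
lo, hi = np.percentile(boot, [2.5, 97.5])

print(f"slowdown ratio: {ratio:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```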
Paper 2 (OSS / causal-ish observational): Feiyang (Amber) Xu and coauthors used a Difference-in-Differences design around GitHub Copilot's technical preview (June 29, 2021). They focus on Microsoft-owned OSS projects, using language endorsement during the preview (e.g., Python/JS/Ruby/TypeScript/Go) as the "treatment" versus non-endorsed languages as a comparison. Then they measure both output (LOC, commits, PRs) and maintenance (review and rework) — and split effects between peripheral (less experienced) and core (experienced) contributors.
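If you haven't met difference-in-differences before, a minimal sketch of the shape of this design looks like the following; the panel file, column names, and outcome variable are hypothetical stand-ins, not the paper's actual data.

```python
# Minimal difference-in-differences sketch in the shape of the Copilot study:
# endorsed languages = treated, non-endorsed = control, before/after the
# June 2021 technical preview. File and column names are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical panel: one row per project-quarter.
df = pd.read_csv("project_quarters.csv")

df["treated"] = df["language"].isin(
    ["Python", "JavaScript", "Ruby", "TypeScript", "Go"]
).astype(int)
df["post"] = (pd.to_datetime(df["quarter_start"]) >= "2021-06-29").astype(int)

# The coefficient on treated:post is the DiD estimate of the effect,
# valid under the usual parallel-trends assumption.
model = smf.ols("review_load ~ treated * post", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["project_id"]}
)
print(model.summary())
```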
What happened
Paper 1: Developers predicted AI would make them faster. It didn't. Allowing AI increased completion time by ~19% on these real OSS tasks. The activity breakdown suggests why: time shifts away from "just coding + searching" toward prompting, waiting, and reviewing AI output (plus some extra idle time). In a high-standard repo, "generated" isn't "done" — it's "more to validate."
Paper 2: At the project level, "productivity" goes up (more code/output). But the benefits are lopsided: peripheral devs drive most of the output gains, while core maintainers pick up the downstream cost. After Copilot's introduction, core developers review ~6.5% more code and show about a 19% drop in their original code productivity — consistent with AI increasing the volume of contributions that still require expert filtering, fixes, and standards-compliance rework.
Why it is interesting (beyond the number)
Because it reframes the real unit of productivity. AI can raise visible throughput (more PRs, more LOC) while lowering system productivity if it increases:
verification overhead (reviewing and correcting plausible-but-wrong output)
integration friction (code that "works" locally but doesn't match repo conventions/architecture)
maintenance burden concentration (experts become janitors for everyone else's AI-assisted output)
The METR result says: even when experts are working in familiar codebases, AI can be a net drag on task time. The OSS result says: even when the project "looks faster," the cost may just be moving — onto the people you can least afford to slow down.
The real question
If AI boosts novices and inflates output metrics, but increases review/rework for maintainers, your org can get stuck in a Jevons-style loop: cheaper code generation → more code produced → more code to maintain → experienced devs become bottlenecks. The builder question becomes: are you optimizing for code produced, or for features shipped per unit of expert attention?
So does AI really increase productivity? Sometimes — but not as a universal rule. The best reading of these two papers is: AI often helps when the work is modular, the quality bar is forgiving, or the developer is still ramping up — but in mature codebases with strict standards, the hidden costs (prompting, verification, integration, and maintenance) can outweigh the gains, especially for experienced devs who end up doing the final quality control.
Tools worth poking this week (in a sandbox first)
GLM-Image (by Zhipu AI)
Best for: posters, slides, menus, and infographics, anywhere the image must carry legible text. You've seen the usual failure mode: the image looks great, the text is garbage. GLM-Image is a two-stage generator designed to render legible text (English + Chinese) more reliably, and it supports image editing + style transfer. https://huggingface.co/spaces/multimodalart/GLM-Image
Qwen3-Max-Thinking (from Alibaba)
Best for: agent pipelines and "think then act" workflows. A flagship reasoning model that leans into tool-using, agent-style work (it can decide when to search, run code, etc.), with a test-time scaling approach that boosts reasoning efficiency. If you want a strong reasoning model without manual tool wiring, this is worth a spin. https://chat.qwen.ai
Prism (from OpenAI)
Best for: drafting and revising scientific papers. A LaTeX-native writing workspace that puts the model inside the paper workflow: draft, revise, manage citations, and keep full-document context (equations included) in one place. Think "Overleaf, but AI-first." https://openai.com/prism/
Moltbot (formerly Clawdbot; also seen as OpenClaw)
Best for: hackable personal automation. An open-source personal agent you run yourself that connects to your messaging apps (e.g., Telegram, WhatsApp, and Apple iMessage) and can act on your machine via filesystem + Terminal access. Configuration and "memory" live as local folders/Markdown so you can inspect and modify behavior directly, and it can be extended with add-ons like speech-to-text (e.g., Whisper) and TTS via ElevenLabs.
Use cases: local automations, scripting, and integrating personal workflows/tools (e.g., Todoist, Notion, smart devices like Philips Hue and Sonos).
Security note: giving an agent broad machine access creates real risk (impersonation/supply-chain attacks have already been reported around the rename), so treat it like you would a powerful script: isolate it, minimize permissions, and avoid running it on a machine with sensitive secrets.
Book recommendation (because your brain deserves more than changelogs)
Governing the System, Not Just the Model — The Governance of Artificial Intelligence by Tshilidzi Marwala
This one is a straight pivot away from "what can the next model do?" toward the harder question: what rules, incentives, and infrastructure choices shape AI in the real world? The publisher lists it as a 1st edition dated February 6, 2026 (imprint: Morgan Kaufmann / Elsevier), though multiple retailers show an April 1, 2026 release window—so treat it as "new/just-releasing" depending on where you buy it.
The structure is explicitly broad: it spans principles/values, data topics, AI algorithms, computing, applications, and governance, and it even gets concrete about the physical constraints with dedicated chapters on computing energy and computing water—a refreshing reminder that AI governance isn't just policy, it's also supply chains and datacenters. And because Marwala is the Rector of United Nations University and a UN Under-Secretary-General, the tone leans "systems-level": how to balance innovation with accountability and public trust without pretending governance is optional.
Geeky thought of the day
Is AI making us more productive — or just changing where the work happens?
AI feels like a speed boost because it collapses the "blank page" problem: instant scaffolds, instant suggestions, instant momentum. You ship something faster, your git graph looks healthier, and your brain gets that satisfying sense of motion.
But productivity isn't tokens-per-minute — it's validated progress. In real codebases, the bottleneck is rarely typing; it's verification: reading, testing, integrating, and making sure today's shortcut doesn't become tomorrow's incident. AI can shift effort from "writing" to "auditing," and that audit cost compounds when the output is plausible-but-wrong.
So the question isn't "does AI write code faster?" It's: does it reduce the total cost of correctness — or does it quietly move that cost onto the most experienced people, turning senior devs into throughput filters?
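One way to see the trade-off is to price the whole loop, not just the drafting step. The numbers below are purely illustrative, not from either paper:

```python
# Toy arithmetic for "total cost of correctness": AI can cut drafting time
# while raising verification time, and the net can go either way.
# All numbers are illustrative.

def task_minutes(draft, verify, integrate, rework):
    return draft + verify + integrate + rework

without_ai = task_minutes(draft=30, verify=15, integrate=10, rework=5)   # 60 min
with_ai    = task_minutes(draft=5,  verify=35, integrate=15, rework=10)  # 65 min

print(f"without AI: {without_ai} min, with AI: {with_ai} min")
# Drafting fell 6x, yet the task got slower: the bottleneck moved to auditing.
```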
Housekeeping (so we stay honest)
This is general information, not legal advice. If you ship user-facing AI, be transparent about where AI is used, what it cannot do, and where humans stay in the loop.
About the Authors
Dr Sam Donegan
Founder & Lead Editor
Sam leads the MLAI editorial team, combining deep research in machine learning with practical guidance for Australian teams adopting AI responsibly.
Jun Kai (Luc) Chang
AI Software Developer
Luc is an AI Software Developer at Monash AIM, building neural networks on FPGA boards. He is pursuing a Master of AI at Monash and co-founding a startup in the event space.
Julia Ponder
Technical Writer
Julia specialises in translating developer jargon into plain English. She creates clear, expertly formatted documentation and tests products before they go to market.
Shivang Shekhar
Technical Writer
Shivang is a mechanical engineer and AI masters student at Monash University with a diverse science background. He is the main author for AI Bits for Techies each week.
AI-assisted drafting, human-edited and reviewed.
Frequently Asked Questions
Does AI actually make developers more productive?
Sometimes — but it depends on who you are and what you're doing. In the Model Evaluation & Threat Research (METR) randomized trial on real OSS issues, experienced contributors were ~19% slower with AI allowed. In the OSS study around GitHub Copilot adoption, overall output metrics rose, but the gains skewed toward less-experienced contributors while core maintainers absorbed more review and rework.
Why would experienced devs get slower with AI?
Because senior work is mostly correctness + integration, not raw typing. AI adds overhead: prompting, waiting, reading generated code, validating assumptions, adapting to repo conventions, and testing edge cases. In mature codebases, "looks right" is often the most expensive kind of wrong.
Is "more code shipped" a misleading productivity metric?
It can be. The second paper highlights a classic trap: AI can increase visible throughput (LOC/PRs) while also increasing the maintenance burden. The authors report core developers reviewing ~6.5% more code and showing a ~19% drop in original code productivity after Copilot's introduction — a sign that output may rise while expert time gets reallocated from building to filtering.
So what should teams measure instead of LOC or PR count?
Track "validated progress" metrics:
time-to-merge with review time included
change failure rate / rollback rate
escaped defects and incident links to changes
review load per senior engineer
rework ratio (follow-up commits / fixups)
If AI helps, you should see fewer cycles to correctness — not just more diffs.
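As a starting point, here's a rough pandas sketch of those metrics computed from a pull-request export. Every file, column, and reviewer name is a hypothetical placeholder; map them onto whatever your own tooling emits.

```python
# Sketch: computing "validated progress" metrics from PR records.
# Columns (opened_at, merged_at, reviewer, fixup_commits, commits,
# caused_rollback) are hypothetical stand-ins.
import pandas as pd

prs = pd.read_csv("pull_requests.csv", parse_dates=["opened_at", "merged_at"])

# Time-to-merge, review time included.
time_to_merge = (prs["merged_at"] - prs["opened_at"]).dt.total_seconds() / 3600
print("median time-to-merge (h):", time_to_merge.median())

# Review load per senior engineer: merged PRs reviewed, per reviewer.
seniors = {"alice", "bob"}  # hypothetical list of senior reviewers
print(prs[prs["reviewer"].isin(seniors)].groupby("reviewer").size())

# Rework ratio: follow-up/fixup commits as a share of all commits.
print(f"rework ratio: {prs['fixup_commits'].sum() / prs['commits'].sum():.1%}")

# Change failure rate: share of merged changes later rolled back.
print(f"change failure rate: {prs['caused_rollback'].mean():.1%}")
```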
When does AI help the most for coding?
When tasks are modular, specs are clear, repo standards are light, or the dev is ramping up. AI tends to shine at: scaffolding, test boilerplate, refactors with tight constraints, translating patterns across languages, and generating "first drafts" that a human will reshape.
What's the safest way to pilot AI coding tools without slowing seniors down?
Start narrow and measurable (a logging sketch follows this list):
allow AI for tests, docs, internal tooling, or low-risk services first
require tests + linters before merge
log prompts/outputs for auditability (within your privacy policy)
add "AI-assisted" labels in PRs so reviewers calibrate scrutiny
cap usage in critical paths until you have evidence it helps
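For the logging item, a minimal sketch might look like this; `call_model` is a placeholder for whatever client your team actually uses, and a local JSONL file is just one convenient audit format.

```python
# Sketch: wrap every model call so prompt/output pairs land in a JSONL log.
import json, time
from pathlib import Path

AUDIT_LOG = Path("ai_audit.jsonl")

def audited(model_call):
    """Append each prompt/output pair to the audit log before returning."""
    def wrapper(prompt, **kwargs):
        output = model_call(prompt, **kwargs)
        with AUDIT_LOG.open("a") as f:
            f.write(json.dumps(
                {"ts": time.time(), "prompt": prompt, "output": output}
            ) + "\n")
        return output
    return wrapper

@audited
def call_model(prompt, **kwargs):
    # Placeholder: wire in your real client here.
    return "stubbed response to: " + prompt

print(call_model("Write a unit test for parse_date()"))
```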
How do we avoid the "senior dev as cleanup crew" outcome?
Make quality constraints automatic. The more you can push verification into CI, the less human review becomes a bottleneck: strict formatters, type checks, unit/integration tests, contract tests, and policy-as-code guardrails. AI is best treated as a draft generator, not a trust anchor.
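A minimal version of that gate can be a single pre-merge script that runs every check and refuses to pass on any failure. The tool choices below (ruff, mypy, pytest) are assumptions; swap in your own stack.

```python
# Sketch: one CI quality gate that blocks merge if any check fails.
import subprocess, sys

CHECKS = [
    ["ruff", "check", "."],   # lint/format rules
    ["mypy", "src/"],         # static type checks
    ["pytest", "-q"],         # unit/integration tests
]

failed = False
for cmd in CHECKS:
    print(">>", " ".join(cmd))
    if subprocess.run(cmd).returncode != 0:
        failed = True

sys.exit(1 if failed else 0)  # non-zero exit blocks the merge in CI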
What's GLM-Image best for in this week's tool list?
Zhipu AI's GLM-Image is most valuable when your images must contain readable text: posters, slide graphics, marketing creatives, infographics, labels, bilingual signage. If you've been burned by image models mangling typography, this is the niche where it stands out.
Is Prism safe for writing papers with unpublished results?
OpenAI's Prism is aimed at end-to-end scientific writing, but "safe" depends on your lab/org rules. Before using any cloud workspace, confirm: what content is stored, who can access projects, retention controls, and whether your institution allows external processing of drafts/data. If you handle sensitive or regulated info, do a lightweight risk review before adopting.
Should I run a local personal agent like Moltbot on my main machine?
Be cautious. Tools like Moltbot (formerly Clawdbot) are powerful specifically because they can touch your filesystem and run commands — which also expands the blast radius if something goes wrong. Safer pattern: run it in a separate user account/container, minimize permissions, avoid giving it access to secrets, and treat "self-improving agent installs things" as a code-review-required workflow.
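One concrete version of that safer pattern, assuming Docker is available (the image name is hypothetical, and the flags are a starting point rather than a complete hardening guide):

```python
# Sketch: launch an agent in a locked-down container with no network,
# a read-only root FS, dropped capabilities, and one writable scratch dir.
import subprocess

subprocess.run([
    "docker", "run", "--rm",
    "--network", "none",           # no outbound access unless you opt in
    "--read-only",                 # immutable root filesystem
    "--cap-drop", "ALL",           # drop Linux capabilities
    "--memory", "1g", "--pids-limit", "128",
    "-v", "/tmp/agent-scratch:/workspace",  # the only writable mount
    "my-agent-image:latest",       # hypothetical image name
], check=True)
```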
So does AI really increase productivity?
AI increases output easily. It increases productivity only when it reduces the total cost of correctness — including review, debugging, integration, and long-term maintenance. The two papers this week suggest a realistic framing: AI can speed up some developers and some tasks, while slowing down experienced devs and concentrating maintenance work — meaning the net effect can be positive, negative, or "it depends" based on where your team's bottlenecks actually are.