




AI Bits for Techies | Issue #2 | 19 Jan 2026


Three questions people are hammering into search and chat right now, plus the short answers you can steal.

  • Do different personality types actually prefer different LLMs?

    Yes. This week's paper shows Rationals tend to favor GPT-4 while Idealists prefer Claude 3.5, even when overall helpfulness scores are nearly identical. Personality-stratified analysis reveals preferences that aggregate ratings hide.

  • Are we measuring model intelligence or user–model compatibility?

    Most evaluations mix the two. What we call "model quality" often reflects compatibility between the user, task, and interaction style—not just the model itself. Two equally powerful models can feel radically different to different people.

  • Should we stop asking "Which LLM is best?"

    Yes. Start asking "Best for whom, and for what?" Design experiments around real users, workflows, and friction—not just leaderboard scores. This shifts from model-centric thinking to user-centric design.

💡Quick note
This guide is part of our broader series, Weekly Deep Dive into AI and ML Advancements & Updates. Prefer to jump ahead? Browse related articles →

Read this if you are:

Founders & Teams

For leaders validating ideas, seeking funding, or managing teams.

Students & Switchers

For those building portfolios, learning new skills, or changing careers.

Community Builders

For workshop facilitators, mentors, and ecosystem supporters.


Your weekly Aussie-flavoured deep dive into what changed in AI/ML, what matters, and what to do next (without living on release-note social media).

This week in one breath: A paper showing personality types predict LLM preferences (Rationals favor GPT-4, Idealists prefer Claude 3.5), tools for comparing models and generating video/voice content, and a shift in thinking: "best LLM" isn't one answer—it depends on who you are and what you're doing.



The one paper you should pretend you read at lunch

Personality Matters: User Traits Predict LLM Preferences in Multi-Turn Collaborative Tasks

What is the setup?

Most LLM evals treat users as basically interchangeable and report an average "helpfulness" score. This paper flips it: it asks whether different personality types systematically prefer different models during real, multi-turn collaboration.

What they did (yes, really)

They ran a user study with 32 participants, evenly split across four Keirsey personality types, and had them complete four collaborative tasks (data analysis, creative writing, information retrieval, writing assistance) using either GPT-4 or Claude 3.5.

What happened

On the surface, the models looked tied: overall helpfulness ratings were nearly identical. But once they segmented by personality, strong preferences popped out. Rationals tended to prefer GPT-4 (especially on goal-oriented work), while Idealists tended to prefer Claude 3.5 (notably on creative and analytical tasks). Other types varied by task.
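
To see how averaging can bury this, here is a minimal sketch with invented helpfulness ratings (the numbers are illustrative only, not taken from the paper): the aggregate means come out nearly tied, while the personality-stratified means split sharply.

```python
import pandas as pd

# Toy 1-7 helpfulness ratings. Values are invented for illustration,
# not taken from the paper.
ratings = pd.DataFrame({
    "personality": ["Rational"] * 4 + ["Idealist"] * 4,
    "model": ["GPT-4", "GPT-4", "Claude 3.5", "Claude 3.5"] * 2,
    "helpfulness": [6.5, 6.3, 5.1, 5.3,   # Rationals lean GPT-4
                    5.2, 5.0, 6.4, 6.6],  # Idealists lean Claude 3.5
})

# Aggregate view: the two models look tied (~5.75 vs ~5.85).
print(ratings.groupby("model")["helpfulness"].mean())

# Stratified view: opposite per-type preferences pop out.
print(ratings.groupby(["personality", "model"])["helpfulness"].mean())
```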

Why it is interesting (beyond the number)

It's a clean example of how "best model" can be an illusion created by averaging across people. If you only look at aggregate ratings, you miss real usability differences that show up once you account for who is using the system and what they are trying to do.

The real question

If personality (and likely other user traits) changes what "helpful" even means, should we stop treating LLM evaluation as one leaderboard and start treating it like product fit? That pushes builders toward personalization, segmentation, and task-specific rollout decisions, not just model swaps.
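Taken to its logical end, that looks like routing logic rather than a single model pick. Here is a minimal sketch of a segment-and-task-aware router; the segment names, tasks, and model IDs are placeholders for whatever your own evals say, not recommendations from the paper.

```python
# Hypothetical routing table keyed by (user segment, task).
# Entries are placeholders, not findings from the paper.
ROUTING_TABLE = {
    ("rational", "data_analysis"): "gpt-4",
    ("rational", "writing"):       "gpt-4",
    ("idealist", "creative"):      "claude-3.5",
    ("idealist", "analysis"):      "claude-3.5",
}

DEFAULT_MODEL = "gpt-4"  # fallback for unmeasured segment/task pairs


def pick_model(segment: str, task: str) -> str:
    """Route to the model your per-segment evals preferred, else the default."""
    return ROUTING_TABLE.get((segment, task), DEFAULT_MODEL)


print(pick_model("idealist", "creative"))   # claude-3.5
print(pick_model("guardian", "retrieval"))  # gpt-4 (fallback)
```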

Full paper: https://arxiv.org/abs/2508.21628


Tools worth poking this week (in a sandbox first)

T3 Chat

Best for: Comparing multiple LLM outputs side-by-side in one conversation — ideal for experimentation, prototyping, and A/B testing different model behaviors.
https://t3.chat/
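
If you want the same side-by-side habit inside your own scripts, a tiny harness is enough. The two model functions below are stubs standing in for whichever SDK you actually call; swap in real client calls behind the same signature.

```python
# Minimal side-by-side comparison harness. The "models" are stubs;
# replace them with real API calls from your provider's SDK.
def model_a(prompt: str) -> str:
    return f"[model A draft for: {prompt!r}]"


def model_b(prompt: str) -> str:
    return f"[model B draft for: {prompt!r}]"


def compare(prompt: str) -> None:
    """Print each model's answer under a labelled header for quick eyeballing."""
    for name, fn in (("model A", model_a), ("model B", model_b)):
        print(f"--- {name} ---\n{fn(prompt)}\n")


compare("Summarise this week's paper in two sentences.")
```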

LTX Studio

Best for: AI video generation and editing — create scenes, motion graphics, and video content from text prompts, with manual controls for framing, camera direction, and storytelling.
https://ltx.studio/

Auralix

Best for: Emotional voice synthesis — produces human-grade, emotionally nuanced audio for podcasts, audiobooks, and voiceovers in 100+ languages; ideal for creators and educators.
https://www.auralix.ai/


Book recommendation (because your brain deserves more than changelogs)

The Alignment Problem – Brian Christian

As LLMs become collaborators rather than tools, the real challenge is not raw capability but alignment with human goals, values, and expectations. Brian Christian traces how even well-intentioned systems drift when optimization targets miss what humans actually care about.

Read alongside this week's paper, it sharpens the question: if models already "misalign" differently for different personalities, what does alignment even mean in a world of diverse users?


Geeky thought of the day

Is LLM evaluation measuring model intelligence… or user–model compatibility?

Two models can score the same on benchmarks and still feel radically different to different people. What looks like "better reasoning" to one user might feel rigid or frustrating to another.

The uncomfortable idea is that LLM performance may not be a single objective property at all. It may emerge from the interaction — shaped by the user's goals, personality, and expectations as much as the model itself.


Housekeeping (so we stay honest)

This is general information, not legal advice. If you ship user-facing AI, be transparent about where AI is used, what it cannot do, and where humans stay in the loop.

About the Authors

Dr Sam Donegan

Founder & Lead Editor

Sam leads the MLAI editorial team, combining deep research in machine learning with practical guidance for Australian teams adopting AI responsibly.

Jun Kai (Luc) Chang

AI Software Developer

Luc is an AI Software Developer at Monash AIM, building neural networks on FPGA boards. He is pursuing a Master of AI at Monash and co-founding a startup in the event space.

Julia Ponder

Technical Writer

Julia specialises in translating developer jargon into plain English. She creates clear, expertly formatted documentation and tests products before they go to market.

Shivang Shekhar

Technical Writer

Shivang is a mechanical engineer and AI master's student at Monash University with a diverse science background. He is the main author for AI Bits for Techies each week.

AI-assisted drafting, human-edited and reviewed.

Frequently Asked Questions

What does your choice of LLM say about you?

Different models optimize for different things: precision, creativity, tone, or structure. If you prefer one over another, you're often selecting for how you think and work, not raw intelligence. The model becomes a mirror of your cognitive style and goals.

Why do two equally powerful LLMs feel so different to use?

Small differences in response style, verbosity, and reasoning transparency compound over multi-turn conversations. Over time, these differences shape trust, frustration, and perceived intelligence more than benchmark scores.

Are some LLMs better for engineers and others for creators?

Yes. Some models excel at structured reasoning and deterministic tasks, while others feel more natural in open-ended or creative workflows. The "best" model depends on whether you value correctness, exploration, or collaboration.

Does model "helpfulness" mean the same thing for everyone?

No. Some users define helpfulness as speed and accuracy, others as clarity, empathy, or guidance. This makes aggregate ratings misleading unless you segment by task type or user profile.

Are we measuring AI intelligence or user–AI fit?

Most current evaluations mix the two. What we call "model quality" is often the result of compatibility between the user, the task, and the interaction style, not just the model itself.

What is the practical takeaway for builders this week?

Stop asking "Which LLM is best?" and start asking "Best for whom, and for what?" Design experiments around real users, real workflows, and real friction, not just leaderboard scores.
