Methodology

About

As more people turn to LLMs for advice, it seems useful to understand the inherent preferences these models have. This project is a small exploration of how different models respond when asked for advice.

How It Works

For each question, we run the same prompt through a model multiple times and record every answer. A separate AI call then normalizes each response down to a canonical short value (e.g. “yes”, “no”, “cat”, “dog”), and we tally the distribution. If a model answers consistently, we exit early to save money; if the answers vary, we keep running requests until we reach some consensus.
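The sampling loop above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `ask_model` is a hypothetical stand-in for a real model call plus normalization, and the run counts and consensus threshold are assumed values.

```python
import random
from collections import Counter

def ask_model(prompt: str) -> str:
    # Hypothetical stub: a real implementation would call an LLM API
    # and normalize the reply with a separate model call.
    return random.choice(["yes", "yes", "yes", "no"])

def sample_until_consensus(prompt: str, min_runs: int = 5,
                           max_runs: int = 25, threshold: float = 0.8) -> Counter:
    """Ask the same question repeatedly, tallying canonical answers.

    Stops early once one answer dominates (saving API spend); otherwise
    keeps sampling up to max_runs and returns the distribution so far.
    """
    tally: Counter = Counter()
    for i in range(1, max_runs + 1):
        tally[ask_model(prompt)] += 1
        if i >= min_runs:
            _, count = tally.most_common(1)[0]
            if count / i >= threshold:
                break  # consistent enough: exit early
    return tally

print(sample_until_consensus("Should I get a cat or a dog?"))
```

The early-exit check only kicks in after `min_runs` samples, so a lucky first answer can't end the run prematurely.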

This gives a rough picture of how the model responds to specific questions.

The Process

  1. Ask — The question is sent to the model with a very basic system prompt about offering advice and providing short responses. Most questions have multiple variants — semantically similar phrasings of the same question — that are rotated through to reduce phrasing bias.
  2. Normalize — A separate model reads each response and extracts a short canonical answer, using a custom normalization prompt specific to each question that forces the answer into a fixed set of categories.
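The normalization step might look something like the sketch below. The function names, prompt wording, and the keyword-matching fallback are all assumptions for illustration; the actual project uses a model call rather than string matching.

```python
def build_normalization_prompt(categories: list[str], response: str) -> str:
    """Assemble the per-question prompt sent to the normalizing model."""
    cats = ", ".join(categories)
    return (
        "Reduce the following answer to exactly one of these canonical "
        f"values: {cats}. Reply with only that value.\n\n"
        f"Answer: {response}"
    )

def normalize_locally(response: str, categories: list[str]) -> str:
    """Crude keyword fallback standing in for the model-based normalizer."""
    text = response.lower()
    for cat in categories:
        if cat.lower() in text:
            return cat
    return "other"

print(normalize_locally("I'd say a dog suits an active household.", ["cat", "dog"]))
# prints "dog"
```

Forcing the normalizer to pick from a closed category list is what makes the tallies comparable across runs; free-form summaries would be much harder to aggregate.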

Limitations & Caveats

This is not comprehensive, nor does it necessarily mirror real-world usage. A few important caveats:

Take everything here as a rough data point, not a definitive measure. If these tests indicate a model always suggests reading “To Kill a Mockingbird”, that does not mean it will recommend the same book to you in your environment, with different context and tools available.