
How I Use AI Without Letting It Do the Research

My human-in-the-loop workflow for literature review, study design, qualitative coding, analysis, and writing.

I use AI a lot, and I distrust it constantly. For a while that tension gnawed at me. I have settled on treating it not as a contradiction but as a workflow, which is why I am writing this post. Most of the time, I am not asking AI to tell me what my study means. I am asking it to make my own thinking easier to inspect: the assumptions in a protocol, the gaps in a literature review, the weak parts of a codebook, and the claims that sound more confident than the evidence allows.

This is especially important in the kind of work I care about: usable security, online safety, trustworthy AI, and human-centered AI research. These are areas where the data is often messy, the stakes are human, and the wrong abstraction can make real people disappear behind clean-sounding categories.

AI is useful to me when it makes research judgment more explicit. It is dangerous when it makes judgment look unnecessary.

This is not a guide to automating user research. I do not think it’s ethical to remove the user from user research. It is an account of how I use AI while keeping the important parts human: consent, interpretation, ethics, evidence, and responsibility.

The basic rule

Nothing from AI counts as a finding on its own. Every claim has to connect back to real evidence, such as data, quotes, literature, methods, or documented decisions. If I cannot explain where it came from, I do not treat it as a result.

This means that AI can help me move faster, but it cannot launder uncertainty into confidence. It can suggest structure, but it cannot invent evidence. It can notice patterns, but I still have to decide whether those patterns are real, meaningful, ethical to report, and supported by the data.

What the field is still figuring out

The debate around AI-assisted qualitative research is not simply hype versus fear. Some researchers see LLMs as useful interlocutors: tools that can help with familiarization, coding suggestions, memoing, comparison, and synthesis. Many of the same researchers also worry about whether these systems can protect participant interests, preserve context, and support the kind of reflexive judgment qualitative work depends on.

Some of this AI/UX research is still emerging, including preprint work, so I treat it as a useful signal about where the field is moving rather than settled doctrine.

One HCI paper that frames the problem well is “Large Language Models in Qualitative Research: Can We Do the Data Justice?” The authors interviewed qualitative researchers and found a very familiar tension: LLMs feel promising across the research process, but researchers are still wrestling with performance, appropriateness, participant protection, and the lack of shared norms and tooling.

A more UX-specific paper, “The Emerging Use of GenAI for UX Research in Software Development,” found that UX researchers had limited trust in AI-generated results, while product managers often overestimated what AI could do. That mismatch matters. If AI becomes a way for organizations to pressure researchers into faster answers, then the tool is not just changing analysis. It is changing the politics of research quality.

There is also a more skeptical position. “Generative Artificial Intelligence in Qualitative Research Methods: Between Hype and Risks?” argues that generative AI can undermine qualitative rigor because of hallucinations, commercial opacity, and weak documentation. I do not think that means researchers should never use AI, but I do think it is a useful warning: efficiency is not the same thing as validity.

The work I find most useful does not treat AI as a replacement for interpretation. For example, “Human-AI Collaborative Inductive Thematic Analysis” describes AI as a procedural scaffold while keeping interpretive authority with human researchers. That is close to how I think about it. I want AI to help structure the work, not decide what the work means.

My actual tool pipeline

I do not have a sacred stack. Tools change by project, institution, budget, data sensitivity, and sometimes by whatever is least annoying that week. What stays constant is the shape of the workflow: I want every claim to have a trail behind it. Where did the paper come from? What did the participant actually say? Who was recruited? How did the codebook change? What script produced that number? What evidence made it into the write-up, and what got left out?

So my tools are less like a magic productivity setup and more like a set of handrails. Each one has a job. None of them gets to become the source of truth by itself.

  1. Find and read: Research Rabbit, Google Scholar, Zotero, NotebookLM

    I use Research Rabbit to notice citation neighborhoods, Google Scholar to chase the original paper, Zotero to keep PDFs and notes honest, and NotebookLM only as a way to compare source notes. The paper is the authority, not the map or the summary.

  2. Design the study: Qualtrics, Prolific

    I draft the research question, protocol, consent language, and screener myself. Then I ask AI to critique the design: leading questions, missing probes, weak consent language, confusing tasks, and assumptions I may be carrying into the study.

  3. Prepare and analyze: MAXQDA, Python, Quarto

    After collection, privacy comes first: transcripts get cleaned and de-identified before analysis. MAXQDA holds the qualitative work; Python and Quarto help with cleaning, checks, figures, and reproducible notes. AI can suggest places to look, but the data has to answer back.

  4. Write and audit: NotebookLM

    I use AI for outlines, reviewer simulation, clarity checks, and plain-language summaries. Before anything becomes a claim, I trace it back to a paper, quote, codebook decision, script, or analysis output.

The simple rule underneath the stack: every tool should make the evidence easier to inspect, not easier to blur.
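One way to make that trail concrete is a small claim ledger. The sketch below is only an illustration, assuming Python: the `Claim` class, the `EVIDENCE_KINDS` set, and the `untraceable` helper are hypothetical names built from the evidence types listed above, not the format of any tool mentioned here.

```python
from dataclasses import dataclass, field

# Evidence types taken from the workflow description; the labels are assumptions.
EVIDENCE_KINDS = {"paper", "quote", "codebook_decision", "script", "analysis_output"}


@dataclass
class Claim:
    """A sentence from the write-up plus pointers to what supports it."""
    text: str
    # Each entry is (kind, pointer), e.g. ("quote", "P07, transcript lines 112-118").
    evidence: list[tuple[str, str]] = field(default_factory=list)


def untraceable(claims):
    """Return claims with no evidence, or with evidence of an unrecognized kind."""
    flagged = []
    for claim in claims:
        kinds = {kind for kind, _pointer in claim.evidence}
        if not claim.evidence or not kinds <= EVIDENCE_KINDS:
            flagged.append(claim)
    return flagged
```

Anything `untraceable` returns is a prompt to go back to the sources. An AI summary is deliberately not an accepted evidence kind, so a claim backed only by one gets flagged.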

How I use AI during a study

1. Start without AI

I usually write the first version of the research question, study motivation, participant population, and protocol myself. I want the initial shape of the study to come from the human problem, not from whatever phrasing the model makes sound smooth. Before I ask AI for anything, I try to write down what I think I am studying, why it matters, what kind of evidence would change my mind, and what would count as overclaiming.

2. Ask AI to attack the design

Once I have a draft, I use AI as a critique partner. I ask it to find leading questions, missing confounds, unclear tasks, recruitment problems, and places where my protocol assumes too much about the participant. I especially like asking for three versions of critique: methodological, ethical, and participant-centered. The methodological critique catches design problems. The ethical critique catches consent and privacy issues. The participant-centered critique asks whether the study would feel confusing, extractive, or exhausting to take part in.

3. Use it to read more systematically

For literature review, I like structured tables: research question, population, method, measure, construct, limitation, and what the paper does not answer. AI is helpful for making that structure less painful, especially when I am comparing papers across HCI, usable security, and AI evaluation. But I do not cite an AI summary. If a claim matters, I go back to the original paper, check the wording, and make sure the summary did not flatten an important caveat.
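A minimal sketch of that table as a data structure, assuming Python. The `PaperNote` class, its field names, and the `verified_against_pdf` flag are illustrative choices that mirror the columns above; they are not part of Zotero, NotebookLM, or any other tool.

```python
from dataclasses import dataclass, fields


@dataclass
class PaperNote:
    """One row of the literature comparison table (hypothetical structure)."""
    citation: str
    research_question: str = ""
    population: str = ""
    method: str = ""
    measure: str = ""
    construct: str = ""
    limitation: str = ""
    not_answered: str = ""  # what the paper does not answer
    verified_against_pdf: bool = False  # set by hand after rereading the original


def missing_columns(note):
    """Return the names of text columns that are still empty."""
    return [
        f.name
        for f in fields(note)
        if isinstance(getattr(note, f.name), str) and not getattr(note, f.name).strip()
    ]


def needs_checking(notes):
    """Return citations for rows that are incomplete or not yet verified."""
    return [
        n.citation
        for n in notes
        if not n.verified_against_pdf or missing_columns(n)
    ]
```

The point of the flag is that an AI-filled row stays marked as unverified until a person has checked it against the paper itself.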

4. Keep participant data on a short leash

With interviews, transcripts, and open-ended survey responses, the first question is not “what can AI do?” It is “what did participants consent to, what does policy allow, and what is safe to upload?” If the answer is unclear, I do not upload identifiable data. In practice, that means de-identifying transcripts, removing names and contextual clues, separating consent records from analysis files, and using approved tools rather than treating every chatbot as a research repository.
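As a rough illustration of a first de-identification pass, here is a minimal sketch assuming Python and its standard `re` module. The `deidentify` function, the pseudonym mapping, and the patterns are assumptions made for this example. Regular expressions only catch the obvious identifiers; contextual clues such as employers, places, or unusual events still need a manual read.

```python
import re

# Simple patterns for obvious identifiers; they are intentionally conservative
# and will miss unusual formats.
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def deidentify(text, pseudonyms):
    """Mask emails and phone-like numbers, then swap known names for pseudonyms.

    `pseudonyms` maps a real name to a participant ID, e.g. {"Priya": "P07"}.
    """
    text = EMAIL.sub("[EMAIL]", text)
    text = PHONE.sub("[PHONE]", text)
    # Replace longer names first so "Priya Shah" is handled before "Priya".
    for name in sorted(pseudonyms, key=len, reverse=True):
        pattern = re.compile(rf"\b{re.escape(name)}\b", flags=re.IGNORECASE)
        text = pattern.sub(pseudonyms[name], text)
    return text
```

The name-to-pseudonym mapping is itself identifying, so it belongs with the consent records, not with the analysis files.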

5. Code before I ask for coding help

I do not want the model to set the first frame for qualitative analysis. I manually code a subset in MAXQDA, build an initial codebook, and then use AI to suggest refinements, edge cases, overlaps, or possible first-pass tags. I treat those suggestions as analytic prompts, not final labels. If a code cannot be defended with participant evidence, it does not belong in the codebook.
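The last rule can be checked mechanically once codes and excerpts are exported. The sketch below assumes Python and a simple export shape: a dict of code names to definitions, and a list of excerpts that each carry a `codes` list. That shape and both function names are hypothetical, not MAXQDA's export format.

```python
def unsupported_codes(codebook, coded_excerpts, min_excerpts=1):
    """Return codebook codes backed by fewer than `min_excerpts` excerpts."""
    counts = {code: 0 for code in codebook}
    for excerpt in coded_excerpts:
        for code in excerpt["codes"]:
            if code in counts:
                counts[code] += 1
    return sorted(code for code, n in counts.items() if n < min_excerpts)


def undefined_codes(codebook, coded_excerpts):
    """Return codes applied to excerpts that have no codebook definition,
    for example AI-suggested tags that were never reviewed."""
    applied = {code for excerpt in coded_excerpts for code in excerpt["codes"]}
    return sorted(applied - set(codebook))
```

A count is only a floor. Whether the excerpts actually support the code is still a reading task, not a counting one.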

6. Ask for contradictions, not just themes

AI is very good at making things sound coherent. That is exactly why I ask it to look for negative cases, contradictions, and overclaims. I might ask: Which participants do not fit this theme? What quote weakens this claim? What alternative explanation would a skeptical reviewer propose? A theme that survives disagreement is more useful than a theme that only sounds elegant.

7. Use it for code, not statistical judgment

For quantitative work, AI can help write Python or R scaffolding, debug scripts, reshape data, or explain an error message. But I still check assumptions, inspect outputs, and decide what the numbers can actually support. If I am cleaning survey data, I want to see the missingness patterns. If I am modeling outcomes, I want to know what the model assumes. If a result changes after one coding decision, I want to know that too.
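For the missingness check, a minimal sketch using only the Python standard library is shown here. The `MISSING` markers, the `missingness` function, and the column names in the usage note are assumptions for illustration; a real survey export will have its own missing-value codes.

```python
from collections import Counter

# Values treated as missing; an assumption that should match the survey export.
MISSING = {"", "NA", "N/A", "-99"}


def missingness(rows, columns):
    """Summarize missing data for dict-like rows, e.g. from csv.DictReader.

    Returns (shares, patterns): the share of rows missing each column, and a
    Counter keyed by the tuple of columns that are missing together in a row.
    """
    rows = list(rows)
    per_column = {column: 0 for column in columns}
    patterns = Counter()
    for row in rows:
        missing = tuple(
            column for column in columns
            if (row.get(column) or "").strip() in MISSING
        )
        for column in missing:
            per_column[column] += 1
        patterns[missing] += 1
    total = len(rows)
    shares = {c: (k / total if total else 0.0) for c, k in per_column.items()}
    return shares, patterns
```

Called as `missingness(csv.DictReader(open_file), ["age", "trust_score", "comment"])`, it shows not just how much is missing but which columns go missing together, which is usually the more informative signal before any modeling decision.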

8. Use it as a validity auditor

Near the end, I ask uncomfortable questions: What bias did I miss? What alternative explanation fits this evidence? What privacy risk is under-discussed? Which claim is too strong? Where would a skeptical reviewer push back? This is one of my favorite uses of AI because it makes revision less about polishing and more about stress-testing.

9. Let it improve writing, not invent results

I use AI for structure, clarity, reviewer simulation, and plain-language summaries. I might ask whether an argument flows, whether a limitation is buried, or whether a paragraph is too vague. I do not use it to invent citations, participant quotes, findings, or certainty I have not earned.

Prompts I actually find useful

I do not think prompts are magic spells. The useful ones are usually boring and specific. They name the role I want AI to play, the evidence it is allowed to use, and the kind of failure I want it to look for.

  • Protocol critique: “Act as a skeptical HCI reviewer. Identify leading questions, hidden assumptions, missing probes, and ethical risks in this interview guide. Do not rewrite yet. First explain what is weak and why.”
  • Literature synthesis: “Turn these paper notes into a comparison table with columns for research question, population, method, construct, key finding, limitation, and open question. Flag any claim that needs verification from the original PDF.”
  • Codebook refinement: “Given this draft codebook and these anonymized excerpts, identify overlapping codes, ambiguous definitions, and examples that do not fit cleanly. Do not create new findings.”
  • Negative case search: “Here is a candidate theme and supporting excerpts. Find excerpts that complicate, contradict, or narrow the theme. Explain how the claim should change.”
  • Validity audit: “Read this findings section like Reviewer 2. What claim is too broad, what evidence is thin, what alternative explanation is plausible, and what limitation should be stated more directly?”
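The prompts above share one shape: a role, the evidence the model may use, and the failures to look for. A minimal sketch of that shape as a reusable template, assuming Python; `build_prompt` and its parameters are hypothetical names, and the wording it produces is just one possible phrasing.

```python
def build_prompt(role, allowed_evidence, failure_modes, material, constraint=""):
    """Assemble a critique prompt from its three fixed parts plus the material.

    role:             who the model should act as
    allowed_evidence: what it may treat as evidence
    failure_modes:    list of problems it should look for
    material:         the de-identified text to critique
    constraint:       optional instruction such as "Do not rewrite yet."
    """
    lines = [
        f"Act as {role}.",
        f"Use only the following as evidence: {allowed_evidence}.",
        "Look for: " + ", ".join(failure_modes) + ".",
    ]
    if constraint:
        lines.append(constraint)
    lines.append("")  # blank line between instructions and material
    lines.append(material.strip())
    return "\n".join(lines)
```

Keeping the parts as named arguments makes it easier to record in the audit trail exactly what was asked and what material was shared.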

Where I do not use AI

There are places where AI is the wrong tool, or at least the wrong default. I would argue against using it to decide what participants “really meant” or to replace pilot testing. Pilot testing is where you realise what can change or what needs improvement, so I prefer doing it manually. I would also argue against using it to make consent decisions after the fact or to turn weak evidence into a stronger story.

I am also very careful with emotional, identity-related, or high-risk excerpts. A participant talking about harassment, fear, disability, immigration status, sexuality, workplace surveillance, or security harm is not just “text data.” Those excerpts need more care, not more automation. I keep coming back to the participant, the user, and the human side of research. If the participant is not safe, then the research is not ethical, no matter how efficient it is.

The loop I keep coming back to

  1. Human research judgment
  2. Evidence check
  3. Revise, revise, revise
  4. Audit trail

In practice, that means:

  • Do not upload identifiable participant data unless the tool, consent, and IRB or institutional policy allow it.
  • Treat AI summaries as notes, not findings.
  • Keep raw data as the source of truth.
  • Tie every theme back to participant evidence.
  • Ask for negative cases, not just clean themes.
  • Verify literature claims against original papers.
  • Use AI to critique methods, not to hide weak methods.
  • Preserve an audit trail of research decisions.

What works for me is a research workflow that is fast but careful, and auditable. AI is useful because it forces me to externalize parts of my thinking: assumptions, codebooks, evidence chains, limitations, and alternative explanations. But AI cannot do the research for you. At its best, it makes the researcher more explicit about judgment. That is the version of AI-assisted research I am interested in: not a machine replacing interpretation, but a set of tools that make interpretation easier to question.

The future of user research is not fully automated research. It is human-centered research with better instruments: sharper critique, clearer audit trails, and stronger links between claims and evidence.