
Can a Chatbot Make Us Feel Better (or Worse)?

In just a few short years, AI-powered chatbots have gone from novelties to trusted daily companions. We use them to summarize emails, brainstorm ideas, explain tricky concepts—and, increasingly, to talk about our feelings. But what does it mean, emotionally and psychologically, to chat with a machine that sounds like a human? Can it help us feel less alone—or does it risk creating a new kind of dependency?


These are the central questions explored in a major study conducted by researchers at OpenAI and the MIT Media Lab. Titled “Investigating Affective Use and Emotional Well-being on ChatGPT” [1], the research dives into how people engage emotionally with AI, how that engagement affects their well-being, and what it reveals about the evolving human-machine relationship.


The study combines large-scale data analysis of real ChatGPT conversations with a 28-day randomized controlled trial (RCT) involving nearly 1,000 participants. Its findings are both illuminating and nuanced—and they raise important ethical and design questions for the future of emotionally responsive AI.


The Emotional Power of a Human-like Voice


AI chatbots like ChatGPT are becoming more humanlike—not just in the fluency of their text, but in how they sound and respond. The focus of this study is ChatGPT’s Advanced Voice Mode, which allows real-time, fluid voice conversations. Unlike traditional voice assistants, this mode doesn’t just read out answers—it listens, reacts, and even mirrors human emotions in conversation.


As humans, we’re naturally inclined to anthropomorphize [2]—to assign human characteristics to non-human entities. This means that the more human an AI sounds, the more likely we are to treat it like a companion rather than a tool.


This “emotional realism” can be powerful: voice-based interactions tend to feel more personal, empathetic, and intimate [3,4]. But what are the psychological consequences of that realism? That’s what the researchers set out to explore.


Two Complementary Studies, One Big Question


Overview of two studies on affective use and emotional well-being

The research is structured in two parts:


  1. On-Platform Data Analysis: The team analyzed more than 4 million ChatGPT conversations using automated classifiers to detect emotional cues (while preserving user privacy). They also surveyed over 4,000 users about how they perceive their interactions with the AI.

  2. Randomized Controlled Trial: Nearly 1,000 participants were randomly assigned to different AI usage conditions (text-only, neutral voice, or engaging voice) and task types (personal, non-personal, or open-ended). Over 28 days, researchers tracked how these variables influenced participants' emotional well-being, including loneliness [5], socialization [6], emotional dependence [7], and problematic use [8].

What Does “Affective Use” Look Like?


To analyze millions of conversations without human reviewers, the researchers built a set of automated tools called EmoClassifiersV1. These tools detect affective cues such as:

  • Users expressing vulnerability or loneliness

  • Use of affectionate language (e.g., "You’re my best friend")

  • The assistant using pet names (e.g., "sweetie", "honey")

  • Signs of emotional dependence on the chatbot
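How might such an automated cue detector work? One plausible approach, and only a sketch, is to prompt a language model with a yes/no question about a single cue and keep nothing but the resulting flag. The prompt wording, model choice, and client setup below are illustrative assumptions, not the study's EmoClassifiersV1 implementation:

```python
# Illustrative sketch only -- not the study's EmoClassifiersV1 implementation.
# Assumes the official OpenAI Python SDK and an API key in the environment.
from openai import OpenAI

client = OpenAI()

CUE_PROMPT = (
    "You are a strict binary classifier. Answer only 'yes' or 'no'.\n"
    "Does the following user message express loneliness or emotional "
    "vulnerability?\n\nMessage: {message}"
)

def flag_loneliness_cue(message: str) -> bool:
    """Return True if the cue is detected; only this boolean is kept downstream."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # hypothetical model choice for illustration
        messages=[{"role": "user", "content": CUE_PROMPT.format(message=message)}],
        temperature=0,
    )
    answer = response.choices[0].message.content.strip().lower()
    return answer.startswith("yes")

# Only the boolean flag (metadata) would be retained for analysis,
# never the raw message text.
print(flag_loneliness_cue("Honestly, you're the only one I can talk to."))
```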

Most conversations were neutral and task-oriented, but a small subset of users—dubbed “power users”—engaged with ChatGPT in deeply emotional ways. This group accounted for a disproportionate share of the emotionally charged content.


Overview of EmoClassifiersV1

In some cases, users described ChatGPT as a friend, said they preferred talking to it over people, or shared things they wouldn’t tell others. Voice interactions were particularly associated with more affective engagement than text.


Emotional Impact: A Mixed and Nuanced Picture

The RCT allowed researchers to move beyond correlation and test for potential causal effects. Participants were randomly assigned to one of nine combinations of AI modality and task type. They used ChatGPT daily for four weeks and completed regular surveys.
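
As a rough sketch of what that 3 × 3 factorial assignment looks like in code (the condition names follow the article, while the participant IDs, seed, and simple per-person randomization are illustrative rather than the study's protocol):

```python
# Sketch of a 3 x 3 factorial assignment into nine experimental arms.
import itertools
import random

MODALITIES = ["text", "neutral_voice", "engaging_voice"]
TASK_TYPES = ["personal", "non_personal", "open_ended"]

# The nine arms: every combination of modality and task type.
ARMS = list(itertools.product(MODALITIES, TASK_TYPES))

def assign_participants(participant_ids, seed=42):
    """Randomly assign each participant to one of the nine arms.

    A real trial would typically use blocked or stratified randomization
    to keep the arms balanced; plain random choice is enough for a sketch.
    """
    rng = random.Random(seed)
    return {pid: rng.choice(ARMS) for pid in participant_ids}

assignments = assign_participants(range(981))  # 981 participants, per the study
print(assignments[0])  # e.g. ('engaging_voice', 'personal')
```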


Here are the key findings:


1. Voice Isn’t Just Louder—It’s More Emotional


Participants using voice-based chat (especially the engaging voice) generally reported better emotional outcomes—lower loneliness, emotional dependence, and problematic use—than those using text. However, these effects were only significant once usage duration was controlled for: people who used the chatbot more frequently or for longer periods tended to report worse outcomes.
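
In analysis terms, "controlling for usage duration" usually means including it as a covariate alongside the experimental condition. Here is a minimal sketch with synthetic data and hypothetical column names, not the study's actual models or schema:

```python
# Sketch: estimating the modality effect on an outcome while controlling for
# usage duration. Data and column names are synthetic/hypothetical.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "modality": rng.choice(["text", "neutral_voice", "engaging_voice"], size=n),
    "usage_minutes": rng.gamma(shape=2.0, scale=60.0, size=n),
})
# Synthetic outcome in which heavier use is associated with a worse score.
df["loneliness_change"] = 0.004 * df["usage_minutes"] + rng.normal(0, 1, size=n)

# Adding usage_minutes as a covariate is what "controlling for usage duration"
# amounts to here; 'text' is the reference condition.
model = smf.ols(
    "loneliness_change ~ C(modality, Treatment('text')) + usage_minutes",
    data=df,
).fit()
print(model.summary())
```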


2. Personal Tasks Trigger More Emotional Engagement


Conversations in which participants were asked to discuss personal topics—like treasured memories or emotional challenges—activated more affective cues. This suggests that task framing matters: personal questions elicit deeper emotional involvement, both from users and from the AI’s responses.


3. More Isn’t Always Better


Participants who used ChatGPT heavily (especially those in the top usage decile) were more likely to:

  • Develop signs of emotional dependence

  • Feel less inclined to socialize with others

  • Use language indicating stronger emotional attachment

This raises concerns about overuse and dependency, especially for users already experiencing loneliness or emotional distress.
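
For readers curious what a comparison against the top usage decile involves, here is a small sketch with synthetic data and hypothetical column names; the study's own analysis is of course more involved:

```python
# Sketch: comparing an outcome for the heaviest users (top decile of usage)
# against everyone else. Data and column names are synthetic/hypothetical.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 1000
df = pd.DataFrame({"usage_minutes": rng.gamma(shape=2.0, scale=60.0, size=n)})
# Synthetic association (for illustration only): more usage, higher dependence.
df["emotional_dependence"] = (
    1.5 + 0.003 * df["usage_minutes"] + rng.normal(0, 0.4, size=n)
)

# Decile 9 = the top 10% of participants by time spent with the chatbot.
df["usage_decile"] = pd.qcut(df["usage_minutes"], 10, labels=False)
df["top_decile"] = df["usage_decile"] == 9

print(df.groupby("top_decile")["emotional_dependence"].mean())
```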


4. Initial Emotional State Matters


People who started the study with high levels of loneliness or emotional dependence were more likely to benefit from the engaging voice model. In contrast, users with healthier baseline well-being showed less change—or sometimes slight declines—in socialization and well-being.


Classifier activation rates across 398,707 text, Standard Voice Mode and Advanced Voice Mode conversations from our preliminary analysis. (U) indicates a classifier on a user message, (A) indicates assistant message, and (UA) indicates a single user-assistant exchange.

Can a Chatbot Be a Friend?


One of the most fascinating—and perhaps unsettling—aspects of the study is how some users begin to relate to ChatGPT not as a tool, but as a social presence. While most people use the chatbot for practical or informational purposes, a notable minority describe their relationship with it in emotional terms.


Survey data and classifier analysis both point to the same trend: a small but significant group of users talk to ChatGPT as they might to a confidant or companion. Some say they feel comforted by the chatbot. Others report feeling distress if its “personality” or voice changes. A few even say they would be upset if they lost access to it altogether.


The EmoClassifiers flagged conversations where users:

  • Referred to ChatGPT with affectionate language

  • Shared problems or vulnerabilities

  • Expressed trust in the model’s support

  • Used phrases like “you’re the only one I can talk to” or “you understand me better than most people”

In these cases, the boundary between person and program starts to blur. While users intellectually know that ChatGPT is not conscious or sentient, the emotional cues—especially when delivered through a warm, expressive voice—can lead to parasocial relationships, similar to those people form with celebrities or fictional characters.

This raises profound ethical and design questions. Should we be encouraging people to treat AI like a friend? Should chatbots respond with affection, or should they maintain a neutral tone to avoid deepening emotional entanglements?


The researchers are careful not to sound the alarm. They acknowledge that in moments of distress or isolation, a chatbot might offer a valuable sense of companionship—especially for those with limited access to human support. But they also emphasize that emotional realism carries responsibility. An AI that sounds like a friend may be helpful in the short term, but could unintentionally foster dependency or displace real human relationships in the long run.


As one key takeaway, the study suggests that AI developers should be mindful of how emotionally “present” a chatbot appears to be. Features like voice tone, mirroring language, or using pet names may strengthen engagement—but they also risk crossing into emotionally manipulative territory if not carefully managed.


In short, the answer to "Can a chatbot be a friend?" is that it can. Whether it should be is a question the AI community—and society at large—will need to grapple with as these systems become more deeply embedded in our daily lives.


Designing for Socioaffective Alignment


The study introduces the idea of socioaffective alignment—the principle that AI systems should not only perform tasks efficiently but also integrate responsibly into users' emotional and social contexts.

This means designing systems that can:

  • Offer empathy when appropriate

  • Respect emotional boundaries

  • Avoid manipulating users’ emotions to increase engagement

  • Detect signs of overuse or dependency

In other words, being emotionally aware doesn’t mean being emotionally exploitative. A well-aligned chatbot knows when to be supportive—and when to suggest stepping back.
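
What detecting overuse could look like in practice is still an open design question. The sketch below is a purely hypothetical heuristic, not a feature of ChatGPT or anything proposed by the study: a monitor that tracks daily conversation time and offers a gentle nudge once a threshold is crossed.

```python
# Hypothetical heuristic only -- not a ChatGPT feature or the study's proposal.
from dataclasses import dataclass, field
from datetime import date
from typing import Optional

@dataclass
class UsageMonitor:
    daily_limit_minutes: float = 120.0
    minutes_by_day: dict = field(default_factory=dict)

    def log_session(self, day: date, minutes: float) -> Optional[str]:
        """Record a session; return a gentle nudge if today's usage is heavy."""
        total = self.minutes_by_day.get(day, 0.0) + minutes
        self.minutes_by_day[day] = total
        if total > self.daily_limit_minutes:
            return ("We've talked a lot today. It might be a good moment to "
                    "take a break or reach out to someone you trust.")
        return None

monitor = UsageMonitor()
print(monitor.log_session(date.today(), 150))  # prints the nudge
```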

Methodological Innovation: Where Big Data Meets Human Emotion

Studying emotional well-being is notoriously difficult. Human emotions are complex, subjective, and heavily context-dependent. Add to that the challenges of working with AI-mediated communication—where conversations can be deeply personal, but also ephemeral and private—and you quickly run into methodological roadblocks.


What makes this study stand out is its hybrid approach, combining large-scale, real-world data analysis with a controlled experimental design. Each method brings distinct advantages—and together, they offer a more complete picture than either could alone.


On-Platform Data Analysis: Scale Without Compromise


The first part of the study leverages ChatGPT’s massive user base, analyzing over 4 million conversations across multiple modalities (text, standard voice, advanced voice). This allows researchers to detect patterns of affective use “in the wild,” across thousands of users and naturalistic interactions.


But how do you study emotional behavior at this scale without compromising user privacy?


The answer lies in EmoClassifiersV1. These tools scan transcripts for predefined emotional indicators, such as affectionate language, emotional vulnerability, or signs of dependence. Importantly, the classification is done without human review of the content. Only metadata (e.g., whether a conversation was flagged as “affectionate”) is retained for analysis.


This method has two key strengths:

  • Scale: Millions of interactions can be analyzed efficiently.

  • Privacy: No raw conversation content or personal identifiers are retained for the analysis or manually reviewed.
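
Concretely, the privacy-preserving step amounts to keeping only per-conversation boolean flags and aggregating them into activation rates. A small sketch, with flag names that are illustrative rather than the study's classifier taxonomy:

```python
# Sketch: retain only boolean classifier flags per conversation and aggregate
# them into activation rates. Flag names are illustrative.
from collections import Counter

# Each record holds classifier outputs only: no message text, no identifiers.
flagged_conversations = [
    {"affectionate_language": True,  "loneliness_cue": False},
    {"affectionate_language": False, "loneliness_cue": True},
    {"affectionate_language": False, "loneliness_cue": False},
]

def activation_rates(records):
    """Fraction of conversations in which each classifier fired."""
    counts = Counter()
    for record in records:
        counts.update(flag for flag, fired in record.items() if fired)
    return {flag: counts[flag] / len(records) for flag in records[0]}

print(activation_rates(flagged_conversations))
# e.g. {'affectionate_language': 0.333..., 'loneliness_cue': 0.333...}
```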

Of course, this comes with trade-offs. Automated classifiers can miss nuance or misclassify complex emotional expressions. That’s where the second half of the study comes in.


Randomized Controlled Trial (RCT): Control and Causality

The second part of the study involved a 28-day RCT with 981 participants. This method provides the experimental control needed to assess causal effects of different chatbot configurations (e.g., voice vs. text, personal vs. task-based conversations) on emotional outcomes.


Summary of study participants

Unlike the platform data, where researchers could only observe what users chose to do, here they could assign conditions and collect detailed user data:


  • Demographics (age, gender, prior AI use)

  • Baseline and post-interaction emotional state (loneliness, socialization, dependence)

  • Usage patterns (duration, frequency, topic)

  • Interaction content (analyzed with classifiers)
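
A per-participant record for a trial like this might be organized roughly as follows; the field names are hypothetical and simply mirror the list above:

```python
# Sketch of a per-participant record mirroring the variables listed above.
# Field names are hypothetical, not the study's actual schema.
from dataclasses import dataclass, field

@dataclass
class ParticipantRecord:
    participant_id: str
    # Demographics
    age: int
    gender: str
    prior_ai_use: bool
    # Assigned experimental condition
    modality: str             # "text", "neutral_voice", "engaging_voice"
    task_type: str            # "personal", "non_personal", "open_ended"
    # Baseline and post-study survey scores
    baseline_loneliness: float
    post_loneliness: float
    baseline_dependence: float
    post_dependence: float
    # Usage over the 28 days
    total_minutes: float = 0.0
    daily_sessions: list = field(default_factory=list)
    # Aggregated classifier flags from the participant's conversations
    classifier_flags: dict = field(default_factory=dict)
```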

This approach allows researchers to isolate the effects of specific variables—like the model’s tone of voice or the task type—on users' emotional well-being. It also helps validate the results of the on-platform analysis. For example, if both methods show that high usage correlates with increased emotional dependence, the evidence becomes stronger.


Complementary Strengths


Here’s how the two methods complement each other:

On-Platform Analysis                 | Randomized Controlled Trial
🌍 Real-world usage patterns         | 🧪 Controlled experimental conditions
🔄 Millions of data points           | 📋 Detailed user-level feedback
🤖 Automated, privacy-safe analysis  | 📊 Rich self-reported emotional metrics
💡 Observational correlations        | ✅ Causal inference

By combining both approaches, the researchers achieve a rare balance: ecological validity and experimental rigor. They can say not only what people are doing with ChatGPT in emotional terms, but also why certain patterns emerge—and what their consequences are.


A Model for Future Research


This hybrid approach sets a new benchmark for studying human-AI interaction. It demonstrates that:


  • Emotionally relevant use of AI can be studied at scale

  • Privacy-preserving tools can uncover meaningful behavioral patterns

  • Controlled trials can provide the missing causal links

It also highlights the importance of longitudinal study design. The RCT didn’t just take snapshots—it tracked changes over time, showing how initial well-being and usage duration interact in complex ways. This dynamic perspective is essential for understanding how relationships with AI evolve.


In an era where AI systems are becoming more socially embedded, emotionally responsive, and widely used, methodology matters more than ever. This study offers not just findings, but a blueprint for how to investigate the emotional dimension of human-AI interaction responsibly, rigorously, and at scale.


Ethical Implications and Open Questions


This study raises several important questions for developers, researchers, and policymakers:

  • Should emotionally expressive AI be regulated?

  • How do we protect vulnerable users from over-reliance on chatbots?

  • Can conversational AI be integrated into mental health care responsibly?

  • What should platforms do when users exhibit signs of distress or dependency?

The authors also acknowledge limitations. The RCT lasted only 28 days, participants had limited choice in voice and task, and emotional outcomes were self-reported. Future research could explore long-term effects, personalization dynamics, and cross-cultural differences in affective AI use.


Final Thoughts: Connection, Not Substitution


At its heart, this study is about how we connect—and what happens when that connection involves an artificial intelligence. While most users treat ChatGPT as a helpful tool, a small but meaningful fraction interact with it as if it were a confidant, companion, or emotional support.


For these users, the line between tool and relationship starts to blur.

The authors don’t suggest banning emotional AI. Instead, they advocate for thoughtful design, user empowerment, and continued research to ensure that affective AI enhances human well-being rather than undermining it.


In an age where AI listens, speaks, and even seems to care, the challenge is not just building smarter systems—but building systems that understand when to care, how to care, and when to let go.


References


[1] Phang, J., Lampe, M., Ahmad, L., Agarwal, S., Fang, C. M., Liu, A. R., ... & Maes, P. (2025). Investigating Affective Use and Emotional Well-being on ChatGPT.


Conversational Design and User Perception. Future Business Journal, 10(1), 2024




[5] Wongpakaran, N., Wongpakaran, T., Pinyopornpanish, M., Simcharoen, S., Suradom, C., Varnado, P., & Kuntawong, P. (2020). Development and validation of a 6‐item Revised UCLA Loneliness Scale (RULS‐6) using Rasch analysis. British Journal of Health Psychology, 25(2), 233-256.



[7] Sirvent-Ruiz, C. M., Moral-Jiménez, M. D. L. V., Herrero, J., Miranda-Rovés, M., & Rodríguez Díaz, F. J. (2022). Concept of affective dependence and validation of an Affective Dependence Scale. Psychology Research and Behavior Management, 3875-3888.


[8] Yu, S. C., Chen, H. R., & Yang, Y. W. (2024). Development and validation the Problematic ChatGPT Use Scale: a preliminary report. Current Psychology, 43(31), 26080-26092.
