How It Works
How 132,342 messages became data — and what that data can and can't tell us.
This page explains the methodology in plain language. The datapoints come from keyword modelling and standard data-science methods, not from AI. Messages were scanned for specific types of words and phrases, counted over time, and used to identify patterns. If you want the details, or if you want to understand the limitations before drawing conclusions, keep reading.
What was analyzed
The entire analysis is based on a text-only export of messages between A and S, spanning mid-2017 through early 2023. The dataset includes 132,342 total messages — 58,562 from A and 71,864 from S — across 131 separate chat logs.
This means: only the words they typed. No photos, no voice messages, no video calls, and — crucially — no in-person interactions. The messages were exported from their messaging platform and parsed into a structured timeline with timestamps, speaker labels, and message content.
Speakers were mapped to anonymized labels: A and S. Real names, nicknames, platform names, and identifying place names were systematically removed or replaced using a de-identification function that runs on every piece of text before it's displayed.
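For illustration, here is a minimal sketch of what that parsing and de-identification step might look like. The export format, the regular expression, and the replacement map are assumptions made for the example; the actual name, nickname, and place lists are deliberately not reproduced.

```python
import re
from datetime import datetime

# Hypothetical export format: "2019-03-14 21:07 - SpeakerName: message text".
# The real export layout is not reproduced here.
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}) - (?P<speaker>[^:]+): (?P<text>.*)$"
)

# Placeholder de-identification map; the actual lists of names, nicknames,
# platform names, and places are intentionally not shown.
REPLACEMENTS = {
    r"\b(?:RealNameA|NicknameA)\b": "A",
    r"\b(?:RealNameS|NicknameS)\b": "S",
    r"\b(?:PlaceName1|PlaceName2)\b": "[place]",
}

def deidentify(text: str) -> str:
    """Replace identifying strings before any text is displayed."""
    for pattern, label in REPLACEMENTS.items():
        text = re.sub(pattern, label, text, flags=re.IGNORECASE)
    return text

def parse_line(line: str):
    """Return a structured message dict, or None for continuation lines."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    return {
        "timestamp": datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M"),
        "speaker": deidentify(m.group("speaker")),
        "text": deidentify(m.group("text")),
    }
```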
How language was measured
The analysis is driven by keyword modelling and transparent data-science methods, not by AI or black-box sentiment models. It uses lexicon-based matching: each message is scanned for specific words and phrases that belong to defined categories. If "sorry" appears in a message, it's counted as an apology/repair signal. If "hurt" or "angry" appears, it's counted as conflict. The word lists are auditable: you can look at any score and trace it back to specific words.
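A minimal sketch of that matching step, assuming case-insensitive whole-word matching. The LEXICONS dictionary below holds only a few example terms per family, drawn from the descriptions that follow; it is not the full word list.

```python
import re

# Abbreviated example lexicons; the full word lists live in the Deep Reading section.
LEXICONS = {
    "affection": ["love", "cute", "hug", "miss you"],
    "repair": ["sorry", "apologize", "my bad", "forgive me"],
    "conflict": ["hurt", "angry", "upset", "argument"],
    "withdrawal": ["need space", "shut down", "can't talk"],
    "financial": ["rent", "bills", "budget", "interview"],
}

def count_matches(message: str) -> dict:
    """Count lexicon hits per family in one message (case-insensitive, whole words)."""
    text = message.lower()
    return {
        family: sum(len(re.findall(rf"\b{re.escape(term)}\b", text)) for term in terms)
        for family, terms in LEXICONS.items()
    }
```

Running count_matches on "I'm so sorry I hurt you" registers one repair hit and one conflict hit; every per-week number on this site is an aggregate of counts like these.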
The five language families
Affection — Love, heart emojis, hugs, cute, endearments, pet names. When these are high, the conversation is warm. A produced 4,158 instances of affirmation language across the relationship; S produced 4,139.
Apology & Repair — Sorry, apologize, my bad, forgive me, "let's talk," "I understand." The presence of this language after a difficult period is what "repair" means throughout this site. A: 1,572 instances. S: 1,878 instances.
Conflict — Upset, angry, hurt, fight, argument, overwhelmed, depressed, anxious, lonely, cry, afraid, scared. These don't always mean a fight is happening — sometimes they reflect personal struggles being shared — but in aggregate, they indicate strain.
Withdrawal — Need space, shut down, can't talk, leave me be, too much. This tracks emotional distance as a behavior, distinct from conflict. A's 183 avoidant markers were her highest attachment category — withdrawal was her dominant stress response.
Financial / logistical stress — Rent, bills, money, budget, job, interview, work, unemployment. Financial strain was a recurring stressor, so it was tracked separately to distinguish financial episodes from purely emotional ones.
The advantage of this approach is complete transparency: every score traces back to specific words that can be verified. The disadvantage is that it misses nuance. Sarcasm, context, inside jokes, and the meaning behind words that don't fit neatly into categories are all invisible to this method. "I'm fine" counts as neutral even when it's shielding. "I love you" counts the same whether it's joyful or desperate.
How the scores work
Raw word counts aren't comparable across weeks — a week with 500 messages will naturally have more of everything than a week with 50. So all counts are converted to rates per 1,000 messages, with light Laplace-style smoothing to avoid wild swings in low-volume weeks.
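As a sketch, that conversion might look like the function below. The smoothing constants alpha and beta are illustrative; the analysis only specifies that the smoothing is "light" and Laplace-style.

```python
def rate_per_1000(hits: int, messages: int, alpha: float = 1.0, beta: float = 10.0) -> float:
    """Smoothed rate per 1,000 messages for one lexicon family in one week.

    alpha/beta are illustrative Laplace-style constants: they pull extreme
    rates toward a baseline when a week has very few messages.
    """
    return 1000.0 * (hits + alpha) / (messages + beta)
```

The smaller the week, the more the constants damp the swing; in high-volume weeks they barely change the raw rate.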
From these rates, two composite scores are computed weekly using z-scores (a standard technique that converts different units to the same scale):
Connection score
Engagement (message volume, balanced participation) + Warmth (affection language, quick response times) − Imbalance − Latency. Think: "how connected and warm was this week?"
Strain score
Conflict language + Withdrawal language + Financial stress + Repair attempts. Think: "how much difficulty was showing up in the messages this week?" Repair is included because its presence signals something needed repairing.
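A minimal pandas sketch of those two composites, assuming a weekly table whose column names are invented here for illustration; the real feature names may differ.

```python
import pandas as pd

def zscore(col: pd.Series) -> pd.Series:
    """Convert a weekly rate to standard-deviation units so features share a scale."""
    return (col - col.mean()) / col.std(ddof=0)

def add_composites(weekly: pd.DataFrame) -> pd.DataFrame:
    """weekly: one row per week of smoothed per-1,000 rates; column names are assumed."""
    z = weekly.apply(zscore)
    out = weekly.copy()
    out["connection"] = z["engagement"] + z["warmth"] - z["imbalance"] - z["latency"]
    out["strain"] = z["conflict"] + z["withdrawal"] + z["financial"] + z["repair"]
    return out
```

Because both composites are sums of z-scores, a week's value is read relative to the relationship's own baseline rather than in absolute terms.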
How episodes are detected
An "episode" is a distinct period where conflict or withdrawal language clustered together. The algorithm works like this:
Any message containing conflict or withdrawal language is marked as a trigger message. Trigger messages less than 72 hours apart are grouped into the same episode. Gaps over 72 hours start a new episode.
Severity is scored by combining conflict, withdrawal, and financial stress density relative to the episode's length. Episodes are classified as Low, Medium, or High.
Repair detection looks for the first apology or reconciliation language within 7 days of episode start. The "repair lag" measures how quickly someone reached out.
Initiator identifies whose messages contained the first trigger language. This is not a determination of who "caused" the conflict — the first text signal might be a reaction to days of invisible buildup, something that happened offline, or accumulated tension finally surfacing.
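Put together, the grouping and repair steps might look roughly like the sketch below. The message dicts (with "timestamp", "speaker", and per-family counts under "families") are an assumed structure, and the Low/Medium/High severity thresholds are not reproduced here.

```python
from datetime import timedelta

EPISODE_GAP = timedelta(hours=72)   # triggers closer than this join the same episode
REPAIR_WINDOW = timedelta(days=7)   # how long to look for repair language

def group_episodes(trigger_msgs):
    """Group trigger messages (sorted by time) into episodes separated by >72h gaps."""
    episodes = []
    for msg in trigger_msgs:
        if episodes and msg["timestamp"] - episodes[-1][-1]["timestamp"] < EPISODE_GAP:
            episodes[-1].append(msg)
        else:
            episodes.append([msg])
    return episodes

def first_repair(episode, all_msgs):
    """First repair-language message within 7 days of the episode's start, if any.

    The repair lag is this message's timestamp minus the episode start;
    episode[0]["speaker"] is recorded as the initiator.
    """
    start = episode[0]["timestamp"]
    for msg in all_msgs:  # all_msgs assumed sorted by timestamp
        if start <= msg["timestamp"] <= start + REPAIR_WINDOW and msg["families"]["repair"] > 0:
            return msg
    return None
```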
How the model works
A logistic regression model was trained on the weekly metrics to identify which language features best predict weeks in the top 15% of strain. The value is in the coefficients — which features matter, how much, and in which direction — not in the predictions themselves.
Important caveat: the labels (top 15% strain weeks) are derived from the same features used as inputs. This means accuracy metrics are artificially optimistic. But for understanding which signals carry the most weight, this is still informative — think of it as a structured way of asking "what language features are most associated with the hardest periods?"
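A sketch of that model fit with scikit-learn, under the same caveat: the top-15% labels are derived from the strain score itself, so the coefficients are read for sign and relative size, not for predictive accuracy. The feature array and its column order are assumptions for the example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

def fit_strain_model(weekly_features: np.ndarray, strain_score: np.ndarray):
    """Fit a logistic regression separating the top 15% strain weeks.

    weekly_features: one row per week of language rates (column order assumed).
    Returns the standardized coefficients, read for direction and relative weight.
    """
    labels = (strain_score >= np.quantile(strain_score, 0.85)).astype(int)
    X = StandardScaler().fit_transform(weekly_features)
    model = LogisticRegression(max_iter=1000).fit(X, labels)
    return model.coef_[0]
```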
Limitations — what this can't tell you
These limitations are real and important. Everything on this site should be read with them in mind.
Text only. No tone of voice, body language, facial expressions, or in-person interactions. The "good quiet" — comfortable silence spent together, a hand held without words, laughter that doesn't get typed — is completely invisible. This means the data systematically over-represents conflict and under-represents contentment.
Lexicon limits. Word counting is blunt. "I'm fine" might be genuine or might be shielding. Sarcasm, inside jokes, and context-dependent meaning are all missed. "I love you" after a fight carries different weight than the same words on a lazy Sunday, but the lexicon counts them the same.
Bias toward conflict and logistics. Chat logs over-represent bids for connection, conflict processing, and logistics — because those are the things people type about. The peaceful, contented, mundane moments are under-represented. This means the data paints a more turbulent picture than the full lived experience.
This site was built by A. That introduces potential bias in framing, emphasis, and what gets highlighted. The analysis methods are transparent and the data is the data — but the decisions about what to narrate and how carry one person's perspective. Where interpretations differ, both sides are shown, but the risk is real and worth naming honestly.
Not a diagnosis. Nothing here is therapy, clinical assessment, or professional evaluation. Attachment theory and relationship psychology provide useful frameworks for understanding patterns, but frameworks are not verdicts. The attachment markers are counted from text — not from a clinical interview, not from behavioral observation, not from a validated assessment tool.
Why this might be useful for you
The pursuer-withdrawer dynamic is one of the most common patterns in relationship distress. The fearful-avoidant / anxious-preoccupied attachment pairing that A and S represent is documented extensively in research. What's unusual here is the depth of the data — 132,342 messages, over 5.5 years, with measurable language shifts that track the progression from connection through strain to dissolution.
If you recognize these patterns in your own relationship, the data here can help in a few specific ways: it can help you name what's happening (naming the loop is the first step toward interrupting it); it can help you see that the pattern isn't about blame (both people's responses make sense from the inside); and it can point you toward specific interventions (EFT therapy, structured timeout protocols, intimacy menus) that are designed for exactly these dynamics.
The single most important insight from this data: love and effort are necessary but not sufficient. A and S had plenty of both — 8,297 affirmations and 3,450 repair attempts prove that. What they didn't have was the structural tools to interrupt their loops in real time. That's not a character failing. It's a skills gap. And skills can be learned.
Technical details. The analysis was performed using Python, pandas, and scikit-learn. Message parsing used regex-based line-by-line extraction with speaker mapping. Charts use Chart.js. The full methodology documentation, including the specific word lists for each lexicon family, is available in the Deep Reading section. Source code and data processing are documented in the repository.