reading results update ready for review #37790

Open
lukasgoetzweiss wants to merge 1 commit into master from lukas.goetzweiss/exp-results-update

@lukasgoetzweiss

Contributor

AI assistance

Gave Claude a draft, edited output manually

@lukasgoetzweiss lukasgoetzweiss requested a review from a team as a code owner June 26, 2026 14:18
@github-actions github-actions Bot added the Images Images are added/removed with this PR label Jun 26, 2026
@github-actions

Contributor

Preview links (active after the build_preview check completes)

Modified Files


### Confidence intervals

The confidence interval is a range of lift values that the observed data supports. The true lift could fall outside this range, but values inside the interval are more consistent with what the experiment measured.

I'd prefer to say "that are consistent with the observed data." Frequentist hypothesis testing doesn't directly "support" specific values or null hypotheses; rather, it "rules out" hypotheses or values that (if true) would make the observed data very unlikely. I think it's technically more precise to say "not inconsistent with the observed data", but it reads like a double negative, so I'm willing to compromise there haha.


- If the **entire interval is above zero**, the result is statistically significant in the positive direction. The improvement to the metric is unlikely to be attributable to random variation.

Stats snobs would push back on these because they sound like Bayesian interpretations: the wording sounds like it's referring to P(lift is real | data) rather than P(observed lift at least this large | true effect = 0). I'd say something like "An improvement at least this large is unlikely to occur if there is no true effect" or even just "The observed lift is inconsistent with no true effect."


- If the **entire interval is below zero**, the result is statistically significant in the negative direction. The treatment likely reduced the metric.
- If the **interval crosses zero**, the result is not statistically significant. The observed lift may have occurred by chance.

I'd say "the result is consistent with a true effect of zero" instead of "The observed lift may have occurred by chance."

"Lift may have occurred by chance" sounds like it's making a statement about P(H0), not P(Data | H0).
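For reference, the three interval cases in the quoted list can be sketched in a few lines of Python. This is only an illustration of the decision rule, phrased the way the comments above suggest; the function name `classify_interval` and the returned strings are hypothetical and are not part of Datadog's product or API.

```python
def classify_interval(lower: float, upper: float) -> str:
    """Classify a lift confidence interval by its position relative to zero.

    Hypothetical helper for illustration only.
    """
    if lower > upper:
        raise ValueError("lower bound must not exceed upper bound")
    if lower > 0:
        # Entire interval above zero: data inconsistent with no true effect.
        return "significant positive"
    if upper < 0:
        # Entire interval below zero: data inconsistent with no true effect.
        return "significant negative"
    # Interval contains zero: result is consistent with a true effect of zero.
    return "not significant"


print(classify_interval(0.02, 0.08))
print(classify_interval(-0.03, 0.05))
```

Note that an interval with a bound exactly at zero is treated as containing zero here; that boundary choice is an assumption of the sketch.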

For each metric, the {{< ui >}}Global lift{{< /ui >}} tab displays:

- **Control and treatment values**: The average per-subject metric value in each variant—the same values shown on the main scorecard tab.
- **Coverage**: The estimated proportion of your global metric total that would come from the eligible population under a control-only rollout.

I think that the "control-only rollout" description sounds a little jargony and obscures the nice intuition of coverage. Maybe something like:

"The estimated proportion of your global metric total associated with the experiment's eligible population (excluding the effect of the experiment)"

I really want people to grok coverage as e.g., the % of revenue exposed to the change being tested rather than some technical causal inference concept. I think the "control only" correction is an afterthought/implementation detail

@brett0000FF brett0000FF self-assigned this Jun 26, 2026

@brett0000FF brett0000FF left a comment

Contributor

Thanks! The only blocking feedback from Docs is that we need to add back the deleted image.

Contributor

Can you please add this file back? We have a job that deletes outdated images. We have to leave this here temporarily so that the image doesn't break on non-English pages. Thanks!

Comment on lines +42 to +47
<div class="alert alert-info"><strong>How metrics are calculated</strong><br><br>
Datadog analyzes experiments at the <strong>subject</strong> level—the unit you configured when you set up the experiment, typically a user. Datadog computes a metric value for each enrolled subject (for example, revenue per user or whether the user completed a signup). These per-subject values form a distribution for each variant. Datadog's statistical engine then compares these distributions between control and treatment.<br><br>
<strong>Relative lift</strong> measures how much the treatment shifted the average per-subject metric value compared to the control:<br><br>
<pre><code>Relative lift = (Treatment − Control) / Control</code></pre>
A relative lift of 10% means the treatment group's average per-subject value is 10% higher than the control group's average. Negative lift means the treatment performed worse on average.
</div>
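
As a rough illustration of the relative lift formula in the quoted callout, the calculation on per-subject values might look like the sketch below. The function name `relative_lift` and the sample values are hypothetical, and this is not Datadog's statistical engine; it only applies `(Treatment − Control) / Control` to the two variant averages.

```python
from statistics import mean


def relative_lift(control_values, treatment_values):
    """Relative lift of the treatment mean over the control mean.

    Hypothetical helper: each argument is a list of per-subject metric
    values (for example, revenue per user) for one variant.
    """
    control_mean = mean(control_values)
    treatment_mean = mean(treatment_values)
    if control_mean == 0:
        raise ValueError("relative lift is undefined when the control mean is zero")
    return (treatment_mean - control_mean) / control_mean


# Made-up per-subject values: control averages 10.0, treatment averages 11.0.
control = [8.0, 12.0, 10.0]
treatment = [9.0, 13.0, 11.0]
print(f"{relative_lift(control, treatment):.0%}")
```

With these sample values the treatment average is 10% higher than the control average, matching the interpretation given in the callout.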

Contributor

This reads a bit too big to fit into a Note callout. I'd recommend pulling it back out into a section, or if you want to deemphasize it, you could put it inside a collapse-content shortcode.

3 participants