reading results update ready for review by lukasgoetzweiss · Pull Request #37790 · DataDog/documentation

lukasgoetzweiss · 2026-06-26T14:18:38Z

AI assistance

Gave Claude a draft, edited output manually

github-actions · 2026-06-26T14:23:11Z

Preview links (active after the `build_preview` check completes)

Modified Files

https://docs-staging.datadoghq.com/lukas.goetzweiss/exp-results-update/experiments/reading_results

tbuffington7 · 2026-06-26T19:39:13Z

+
+### Confidence intervals
+
+The confidence interval is a range of lift values that the observed data supports. The true lift could fall outside this range, but values inside the interval are more consistent with what the experiment measured.


I'd prefer to say "that are consistent with the observed data." The way frequentist hypothesis testing works is that it doesn't directly "support" specific values/null hypotheses, but rather it "rules out" hypotheses/values that (if true) would make the observed data very unlikely. I think it's technically more precise to say "not inconsistent with the observed data" but it reads like a double negative so I'm willing to compromise there haha.

tbuffington7 · 2026-06-26T19:39:25Z

+
+The confidence interval is a range of lift values that the observed data supports. The true lift could fall outside this range, but values inside the interval are more consistent with what the experiment measured.
+
+- If the **entire interval is above zero**, the result is statistically significant in the positive direction. The improvement to the metric is unlikely to be attributable to random variation.


Stats snobs would push back on these because they sound like Bayesian interpretations (the wording sounds like it's referring to P(lift is real | data) rather than P(observed lift at least this large | true effect = 0)). I'd say something like "An improvement at least this large is unlikely to occur if there is no true effect" or even just "The observed lift is inconsistent with no true effect."

tbuffington7 · 2026-06-26T19:40:09Z

+
+- If the **entire interval is above zero**, the result is statistically significant in the positive direction. The improvement to the metric is unlikely to be attributable to random variation.
+- If the **entire interval is below zero**, the result is statistically significant in the negative direction. The treatment likely reduced the metric.
+- If the **interval crosses zero**, the result is not statistically significant. The observed lift may have occurred by chance.


I'd say "the result is consistent with a true effect of zero" instead of "The observed lift may have occurred by chance."

"Lift may have occurred by chance" sounds like it's making a statement about P(H0), not P(Data | H0).
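
To make the three interval cases concrete, here is a minimal Python sketch that builds a normal-approximation confidence interval for relative lift and labels it with the frequentist wording suggested in this thread. The function names, the delta-method variance, and the sample numbers are all assumptions for illustration; the PR does not say how Datadog's statistical engine actually computes its intervals.

```python
import math
from statistics import NormalDist


def relative_lift_ci(mean_c, var_c, n_c, mean_t, var_t, n_t, confidence=0.95):
    """Approximate CI for (treatment - control) / control.

    Assumption: independent groups and a first-order (delta method)
    variance for the ratio of the two sample means.
    """
    lift = (mean_t - mean_c) / mean_c
    var_mean_c = var_c / n_c  # variance of the control sample mean
    var_mean_t = var_t / n_t  # variance of the treatment sample mean
    var_lift = var_mean_t / mean_c**2 + (mean_t**2) * var_mean_c / mean_c**4
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * math.sqrt(var_lift)
    return lift - half_width, lift + half_width


def describe(lower, upper):
    """Label an interval using P(data | no effect) style wording."""
    if lower > 0:
        return "significant positive: the observed lift is inconsistent with no true effect"
    if upper < 0:
        return "significant negative: the observed drop is inconsistent with no true effect"
    return "not significant: the result is consistent with a true effect of zero"


# Hypothetical per-subject summary statistics for each variant.
low, high = relative_lift_ci(mean_c=10.0, var_c=4.0, n_c=1000,
                             mean_t=10.5, var_t=4.0, n_t=1000)
print(describe(low, high))
```

Note that none of the labels claim a probability that the lift is real; each one only says whether a true effect of zero is ruled out by the data.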

tbuffington7 · 2026-06-26T19:44:46Z

+For each metric, the {{< ui >}}Global lift{{< /ui >}} tab displays:
+
+- **Control and treatment values**: The average per-subject metric value in each variant—the same values shown on the main scorecard tab.
+- **Coverage**: The estimated proportion of your global metric total that would come from the eligible population under a control-only rollout.


I think that the "control-only rollout" description sounds a little jargon-ey and obscures the nice intuition of coverage. Maybe something like:

"The estimated proportion of your global metric total associated with the experiment's eligible population (excluding the effect of the experiment)"

I really want people to grok coverage as e.g., the % of revenue exposed to the change being tested rather than some technical causal inference concept. I think the "control only" correction is an afterthought/implementation detail
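
As a rough illustration of that intuition, coverage can be read as a simple share of the global total. The sketch below is an assumption-laden example, not Datadog's estimator: the function name and the revenue figures are hypothetical, and it takes the eligible population's baseline contribution (excluding the effect of the experiment) as a given input rather than estimating it.

```python
def coverage(eligible_baseline_total: float, global_total: float) -> float:
    """Share of the global metric total associated with the eligible population.

    `eligible_baseline_total` is assumed to already exclude the effect of
    the experiment; how that baseline is estimated is not shown here.
    """
    if global_total <= 0:
        raise ValueError("global_total must be positive")
    return eligible_baseline_total / global_total


# Hypothetical numbers: $250k of a $1M global revenue total comes from
# subjects eligible for the experiment.
print(f"{coverage(250_000.0, 1_000_000.0):.0%}")  # prints "25%"
```

Read this way, a coverage of 25% means a quarter of the global metric is exposed to the change being tested.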

brett0000FF

Thanks! The only blocking feedback from Docs is that we need to add back the deleted image.

brett0000FF · 2026-06-26T20:11:13Z

Can you please add this file back? We have a job that deletes outdated images. We have to leave this here temporarily so that the image doesn't break on non-English pages. Thanks!

brett0000FF · 2026-06-26T20:16:15Z

+<div class="alert alert-info"><strong>How metrics are calculated</strong><br><br>
+Datadog analyzes experiments at the <strong>subject</strong> level—the unit you configured when you set up the experiment, typically a user. Datadog computes a metric value for each enrolled subject (for example, revenue per user or whether the user completed a signup). These per-subject values form a distribution for each variant. Datadog's statistical engine then compares these distributions between control and treatment.<br><br>
+<strong>Relative lift</strong> measures how much the treatment shifted the average per-subject metric value compared to the control:<br><br>
+<pre><code>Relative lift = (Treatment − Control) / Control</code></pre>
+A relative lift of 10% means the treatment group's average per-subject value is 10% higher than the control group's average. Negative lift means the treatment performed worse on average.
+</div>


This reads a bit too big to fit into a Note callout. I'd recommend pulling it back out into a section, or if you want to deemphasize it, you could put it inside a collapse-content shortcode.
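
For reference, the relative lift formula quoted in the callout above can be sketched in a few lines of Python. The function name and the example means are hypothetical; the zero-control guard is an added assumption, since the quoted text does not say how that case is handled.

```python
def relative_lift(control_mean: float, treatment_mean: float) -> float:
    """Relative lift = (Treatment - Control) / Control, on per-subject averages."""
    if control_mean == 0:
        # Assumption: treat a zero control mean as undefined rather than infinite.
        raise ValueError("relative lift is undefined when the control mean is zero")
    return (treatment_mean - control_mean) / control_mean


# Hypothetical average revenue per user in each variant.
print(f"{relative_lift(20.0, 22.0):.1%}")  # prints "10.0%"
print(f"{relative_lift(20.0, 18.0):.1%}")  # prints "-10.0%"
```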

reading results update ready for review

8fe3866

lukasgoetzweiss requested a review from a team as a code owner June 26, 2026 14:18

github-actions Bot added the Images label ("Images are added/removed with this PR") Jun 26, 2026

lukasgoetzweiss requested a review from tbuffington7 June 26, 2026 14:18

tbuffington7 reviewed Jun 26, 2026


brett0000FF self-assigned this Jun 26, 2026

brett0000FF approved these changes Jun 26, 2026


lukasgoetzweiss wants to merge 1 commit into master from lukas.goetzweiss/exp-results-update


3 participants

