Outliyr BioHarmony Score — Evidence-Based Intervention Rating Framework

The Outliyr BioHarmony Score is a single 0-10 number that tells you whether a health intervention is actually worth it. Not just whether it “works.” Whether the full picture, upside and downside, justifies the cost.

It weighs 13 dimensions of benefit and harm. It penalizes risk more heavily than reward. It adjusts for evidence quality.

Every supplement, device, protocol, and therapy gets the same transparent treatment. You can see the formula, the raw scores, and exactly why anything lands where it does.

I built this after years of testing interventions on myself. No existing framework captured the full decision. A supplement can have strong efficacy data and still be a bad choice if it costs $200/month, creates dependency, and requires precise timing.

Star ratings flatten all of that into one useless number.

What’s Wrong With Current Ratings?

Star ratings and letter grades hide more than they reveal.

You’ve seen the pattern. A supplement gets an “A” on one site and a “C” on another. Neither tells you why.

Existing platforms rate on narrow slices. Examine.com does excellent work summarizing research, but it doesn’t produce a single comparable score across interventions. You can’t look at an Examine page for creatine and another for red light therapy and know which one deserves your budget this month.

Labdoor tests whether what’s on the label matches what’s in the bottle. That’s purity testing, not outcome evaluation. ConsumerLab does the same.

None of them answer the question I actually care about: given everything we know, is this worth my time, money, and biological risk?

That question requires weighing efficacy against side effects. Speed against dependency. Cost against breadth. No existing system does this in a structured, transparent way.

So I built one.

What Are the 13 Dimensions?

Every intervention gets rated on six upside dimensions and seven downside dimensions. Each scored 1-5.

The upside dimensions:

Efficacy: how well it works
Breadth: how many body systems it benefits
Evidence quality: strength of the supporting research
Speed of onset: how quickly you’ll notice results
Durability: how long those results persist
Bioindividuality: how consistently it works across different people

The downside dimensions:

Safety: risk of serious harm
Side effects: common negative reactions
Cost: monthly financial expense
Effort: time and complexity required
Opportunity cost: what you could do instead
Dependency: whether you need to keep taking it
Reversibility: whether negative effects resolve when you stop

Upside Dimensions

Dimension	Weight	Description
Efficacy	25%	Effect size on primary health outcomes, anchored to Cohen's d or NNT benchmarks
Breadth of Benefits	15%	Number of distinct biological systems positively affected
Evidence Quality	25%	Strength and volume of human clinical evidence, from meta-analyses down to mechanistic-only
Speed of Onset	10%	How quickly measurable effects appear, from hours to months
Durability	10%	How long benefits persist after discontinuation
Bioindividuality Upside	15%	Percentage of the population likely to benefit, accounting for genetic and contextual variation

Downside Dimensions

Dimension	Weight	Multiplier	Description
Safety Risk	30%	1.4×	Probability and severity of serious adverse events, including catastrophic risk floor
Side Effect Profile	15%	1.4×	Frequency and severity of mild-to-moderate adverse effects
Financial Cost	5%	1.0×	Monthly or per-cycle expense at recommended dosage
Time/Effort Burden	5%	1.0×	Daily time for preparation, administration, or practice
Opportunity Cost	5%	1.0×	Whether this intervention crowds out better-evidenced alternatives
Dependency / Withdrawal	15%	1.4×	Risk of physiological adaptation, tolerance, rebound, or withdrawal on cessation
Reversibility	25%	1.4×	Ability to stop the intervention and return to baseline without permanent changes

I chose these 13 because they cover the axes real people actually weigh when deciding whether to try something.

You don’t just want to know “does creatine build muscle.” You want to know if it’s safe long-term, whether the benefits fade when you stop, how much it costs, and whether the research is based on rodent studies or large human RCTs.

Each dimension is scored using a structured rubric tied to evidence. No gut feelings. No vibes. Every score links back to the data that justifies it.

Why Does Harm Get a Higher Multiplier?

Safety-related downside dimensions carry a 1.4x multiplier in the final calculation. This is deliberate.

The logic follows the precautionary principle. Harm is harder to undo than benefit is to gain.

If a supplement gives you a modest cognitive boost but causes liver stress, those aren’t equal trade-offs. The liver damage matters more.

Think of it like a coin flip. Most people would turn down a 50/50 bet between gaining $1,000 and losing $1,000, because the weight of the loss exceeds the weight of the equivalent gain. In health decisions, that’s not a bias. It’s wisdom.

Harm-Type Dimensions

These dimensions carry a risk multiplier above 1.0×, reflecting the precautionary principle: irreversible or health-threatening downsides should be weighted more heavily than their raw scores suggest.

Safety Risk 1.4×
Side Effect Profile 1.4×
Dependency / Withdrawal 1.4×
Reversibility 1.4×

Opportunity-Type Dimensions

These dimensions use a 1.0× multiplier. They represent recoverable costs (money, time, opportunity) that can be regained if the intervention is stopped.

Financial Cost 1.0×
Time/Effort Burden 1.0×
Opportunity Cost 1.0×

The 1.4x multiplier means an intervention needs meaningfully more upside than downside to score well. When the weighted upside exactly offsets the weighted, multiplied downside, the score lands at 5.0/10 (Neutral), not 7.0.

You have to genuinely earn a high score.
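
To put a number on that asymmetry, here is a small Python sketch that totals the downside weights before and after the multipliers. The weights and multipliers come from the downside table; the dictionary and variable names are illustrative only.

```python
# (weight, risk multiplier) per downside dimension, copied from the table above.
# The dictionary keys are illustrative names, not official identifiers.
DOWNSIDE = {
    "safety_risk":      (0.30, 1.4),
    "side_effects":     (0.15, 1.4),
    "financial_cost":   (0.05, 1.0),
    "time_effort":      (0.05, 1.0),
    "opportunity_cost": (0.05, 1.0),
    "dependency":       (0.15, 1.4),
    "reversibility":    (0.25, 1.4),
}

# Total downside weight before and after applying the multipliers.
raw_total = sum(weight for weight, _ in DOWNSIDE.values())
effective_total = sum(weight * mult for weight, mult in DOWNSIDE.values())

print(round(raw_total, 2))        # 1.0
print(round(effective_total, 2))  # 1.34
```

The upside weights sum to 1.00 with no multiplier. The downside side effectively carries 1.34, because the four harm-type dimensions hold 85% of the downside weight and the three opportunity-type dimensions stay at 1.0×.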

How Does the Formula Work?

The Outliyr BioHarmony Score uses an expected-value calculation. Each dimension’s 1-5 score is shifted down by a baseline offset of 1.0, multiplied by its weight, and summed. Harm-type downside dimensions also get the 1.4x multiplier (opportunity-type dimensions stay at 1.0x) before the downside total is subtracted from the upside total.

Step 1: Expected Value (EV)

EV = Σ((upside_score − 1) × weight) − Σ((downside_score − 1) × weight × risk_multiplier)

The baseline offset (1) shifts the neutral point so a score of 1 on any dimension contributes zero to EV. This prevents the "everything scores positive" problem: an intervention must exceed the baseline to register as a benefit.

Step 2: Normalize to 0–10

If EV ≥ 0: Score = 5 + (EV ÷ 5) × 5
If EV < 0: Score = 5 + (EV ÷ 7) × 5

A score of 5 is the neutral point: expected benefits roughly equal expected costs. Above neutral, 1 EV point equals 1 score point. Below neutral, 1 EV point equals about 0.71 score points, so EV = -7 maps to 0.0.
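
The two steps can be sketched in a few lines of Python. This is a minimal illustration that assumes the weights and multipliers from the dimension tables; the function names, dictionary keys, and example scores are hypothetical, and any adjustment the live engine makes beyond the published formula is not modeled.

```python
# Upside weights from the upside table (keys are illustrative names).
UPSIDE_WEIGHTS = {
    "efficacy": 0.25, "breadth": 0.15, "evidence_quality": 0.25,
    "speed_of_onset": 0.10, "durability": 0.10, "bioindividuality": 0.15,
}

# Downside (weight, risk multiplier) pairs from the downside table.
DOWNSIDE_WEIGHTS = {
    "safety_risk": (0.30, 1.4), "side_effects": (0.15, 1.4),
    "financial_cost": (0.05, 1.0), "time_effort": (0.05, 1.0),
    "opportunity_cost": (0.05, 1.0), "dependency": (0.15, 1.4),
    "reversibility": (0.25, 1.4),
}


def expected_value(upside, downside):
    """Step 1: weighted upside minus weighted, multiplied downside."""
    up = sum((upside[k] - 1) * w for k, w in UPSIDE_WEIGHTS.items())
    down = sum((downside[k] - 1) * w * m
               for k, (w, m) in DOWNSIDE_WEIGHTS.items())
    return up - down


def normalize(ev):
    """Step 2: map EV onto the 0-10 scale, with 5 as the neutral point."""
    if ev >= 0:
        return 5 + (ev / 5) * 5
    return 5 + (ev / 7) * 5


def bioharmony_score(upside, downside):
    """Score rounded to one decimal, as the published scores are shown."""
    return round(normalize(expected_value(upside, downside)), 1)


# Hypothetical intervention: middling upside, downside at the floor.
upside = {k: 3 for k in UPSIDE_WEIGHTS}
downside = {k: 1 for k in DOWNSIDE_WEIGHTS}
print(bioharmony_score(upside, downside))  # 7.0
```

In this example every upside dimension sits two points above the baseline, so EV is 2.0 and the score is 7.0. Raising any harm-type downside score pulls the result down faster than raising a cost or effort score of the same weight.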

In plain language: add up all the good. Subtract all the bad (with harm-type dimensions weighted 40% heavier). Convert to a 0-10 scale.

5.0 = benefits and risks are balanced. Above 5.0 = net positive. Below 5.0 = harms outweigh benefits.

The formula is fully transparent. You can see every input, every weight, and recalculate any score yourself.

Most scoring systems are black boxes. Here, the math is the argument.

How Does Scoring Work in Practice?

Two real examples show how the formula plays out.

Astaxanthin: 7.9/10 (Strong Recommend)

Astaxanthin is a carotenoid antioxidant found in wild salmon, krill, and microalgae. It scores well because the upside is solid across the board while the downside stays near the floor.

Efficacy: 3.3. Breadth: 3.5. Evidence: 3.0, backed by dozens of human RCTs.

On the downside: Safety is just 1.2. Side effects: 1.3. Cost: moderate at 2.5.

Result: 7.9/10. The risk-reward math clearly favors trying it.

Semaglutide: 5.6/10 (Neutral)

Semaglutide (Ozempic/Wegovy) tells a completely different story.

The raw efficacy is remarkable. It scores 4.8 for weight loss outcomes. Evidence quality: 4.5, backed by large phase III trials with thousands of participants.

But the downside dimensions hit hard. Dependency: 4.0 (weight regain after stopping is well-documented). Side effects: 3.5. Cost: 4.0. Safety: 3.0.

After the 1.4x harm multiplier, those downside scores erode most of the efficacy advantage. Final score: 5.6/10. Barely above neutral.
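
To make that erosion concrete, here is a partial calculation in Python using only the six semaglutide scores quoted above and the published weights. The remaining dimension scores aren’t listed in this article, so this sketch does not reproduce the final 5.6; the names are illustrative.

```python
# (score, weight, risk multiplier) for the four downside scores quoted above.
STATED_DOWNSIDE = {
    "safety_risk":    (3.0, 0.30, 1.4),
    "side_effects":   (3.5, 0.15, 1.4),
    "financial_cost": (4.0, 0.05, 1.0),
    "dependency":     (4.0, 0.15, 1.4),
}

# (score, weight) for the two upside scores quoted above.
STATED_UPSIDE = {
    "efficacy":         (4.8, 0.25),
    "evidence_quality": (4.5, 0.25),
}


def contribution(score, weight, multiplier=1.0):
    """EV points contributed by one dimension after the baseline offset."""
    return (score - 1) * weight * multiplier


downside_with = sum(contribution(s, w, m) for s, w, m in STATED_DOWNSIDE.values())
downside_without = sum(contribution(s, w) for s, w, _ in STATED_DOWNSIDE.values())
upside = sum(contribution(s, w) for s, w in STATED_UPSIDE.values())

print(round(downside_with, 3))     # 2.145
print(round(downside_without, 3))  # 1.575
print(round(upside, 3))            # 1.825
```

Efficacy and evidence quality together add about 1.8 EV points. The four quoted downside scores subtract about 2.1, and roughly 0.6 of that comes from the multiplier alone.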

This is exactly the kind of nuance that star ratings destroy. Semaglutide is simultaneously one of the most effective weight loss interventions ever developed and a borderline-neutral recommendation once you account for the full picture.

What Do the Tiers Mean?

Scores map to six recommendation tiers so you can quickly interpret any number.

	Tier	Score Range	Meaning
✅	Top-tier	8.0–10.0	Do this yesterday
💪	Strong recommend	7.0–7.9	Worth prioritizing
👍	Worth trying	5.8–6.9	Good for the right person
⚖️	Neutral	4.8–5.7	Context-dependent
⚠️	Proceed with caution	3.7–4.7	Significant downsides to weigh
🚫	Skip	0.0–3.6	Not worth the risk
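
The tier table can be sketched as a simple lookup. This is an illustrative Python version; the function name is hypothetical, and rounding to one decimal before the lookup is an assumption based on the one-decimal ranges in the table.

```python
# (lower bound, tier label) pairs from the table above, highest tier first.
TIERS = [
    (8.0, "Top-tier"),
    (7.0, "Strong recommend"),
    (5.8, "Worth trying"),
    (4.8, "Neutral"),
    (3.7, "Proceed with caution"),
    (0.0, "Skip"),
]


def tier_for(score: float) -> str:
    """Return the recommendation tier for a 0-10 score."""
    if not 0.0 <= score <= 10.0:
        raise ValueError("score must be between 0 and 10")
    rounded = round(score, 1)  # assumption: tiers apply to the displayed score
    for lower_bound, label in TIERS:
        if rounded >= lower_bound:
            return label
    raise AssertionError("unreachable: the last tier starts at 0.0")


print(tier_for(7.9))  # Strong recommend
print(tier_for(5.6))  # Neutral
```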

The tiers exist for fast scanning. But always look at the dimensional breakdown, not just the label.

Two interventions can both score 6.5 for completely different reasons. One might have moderate efficacy with zero risk. Another might have exceptional efficacy dragged down by high cost and dependency. Same tier. Different decision.

How Do We Grade Evidence?

Evidence quality directly affects how confident we are in any score. A supplement backed by 30 human RCTs gets treated very differently than one with only cell studies and anecdotal reports.

Evidence Tier	Score Anchor	Description
Systematic Reviews & Meta-Analyses of RCTs	4.5-5.0	Pooled data from multiple randomized controlled trials
Large Randomized Controlled Trials	3.5-4.4	Well-designed RCTs with adequate sample sizes and controls
Small RCTs & Controlled Studies	2.5-3.4	Pilot RCTs, crossover studies, or controlled but non-randomized trials
Observational & Epidemiological Studies	2.0-2.4	Cohort, case-control, cross-sectional, or ecological studies
Traditional Use & Expert Consensus	1.5-1.9	Long historical use, clinical expert opinion, or professional guidelines without RCT backing
Anecdotal & Case Reports	1.0-1.4	Individual case reports, user testimonials, forum consensus, N=1 personal data
Mechanistic & In Vitro Only	1.0-1.2	Plausible mechanism of action from cell or animal studies, no human data

Weak evidence means a lower score. Period. No amount of mechanistic plausibility compensates for missing human data.

This is where the Outliyr BioHarmony Score diverges most from influencer recommendations. Something can have a beautiful mechanism of action, make total theoretical sense, and still score mediocre because nobody’s run a proper trial on it yet.

Popularity doesn’t move the needle. Published human data does.

What Are Confidence Bands?

Every score comes with an implicit confidence range based on evidence depth and consistency.

Creatine or vitamin D? Tight confidence band. Extensive, consistent human trial data. The score probably won’t shift much.

Newer peptides or experimental nootropic stacks? Wide band. The current score is our best estimate, but a single well-designed trial could move it significantly.

Confidence bands keep me honest. They prevent false precision. You deserve to know how firm the ground is beneath any number I publish.

Can You Personalize Your Scores?

Default scores use population-level weights. That’s a reasonable starting point. But your priorities aren’t average.

Maybe cost barely matters to you but you’re extremely risk-averse about side effects. Maybe you care most about speed because you’re prepping for a competition in eight weeks. Maybe dependency is a dealbreaker regardless of how effective something is.

The BioHarmony profile quiz captures your health goals, sensitivities, and constraints. It adjusts dimension weights to reflect what actually matters to you.

The result: your version of the score. The same intervention might be a 7.9 for the general population and an 8.4 for you, because your profile emphasizes the dimensions where it excels. Or it could drop to 6.1 because your profile penalizes the exact dimensions where it’s weakest.
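
This article doesn’t spell out how the quiz adjusts the weights, so the following Python sketch shows only one plausible approach: scale each default weight by a personal emphasis factor, then renormalize so the weights still sum to 100%. The function name, the emphasis factors, and the renormalization step are all assumptions for illustration.

```python
def personalize(base_weights: dict[str, float],
                emphasis: dict[str, float]) -> dict[str, float]:
    """Scale weights by per-dimension emphasis, then renormalize to sum to 1.

    Hypothetical approach: dimensions missing from `emphasis` keep a factor
    of 1.0, meaning no change relative to the others.
    """
    if any(factor < 0 for factor in emphasis.values()):
        raise ValueError("emphasis factors must be non-negative")
    scaled = {name: weight * emphasis.get(name, 1.0)
              for name, weight in base_weights.items()}
    total = sum(scaled.values())
    if total == 0:
        raise ValueError("at least one dimension needs a positive weight")
    return {name: value / total for name, value in scaled.items()}


# Default upside weights from the upside table (keys are illustrative).
upside_weights = {
    "efficacy": 0.25, "breadth": 0.15, "evidence_quality": 0.25,
    "speed_of_onset": 0.10, "durability": 0.10, "bioindividuality": 0.15,
}

# Someone prepping for a competition who cares twice as much about speed.
mine = personalize(upside_weights, {"speed_of_onset": 2.0})
print(round(mine["speed_of_onset"], 3))  # 0.182
print(round(mine["efficacy"], 3))        # 0.227
```

Renormalizing keeps the neutral point meaningful: emphasizing one dimension necessarily takes weight from the others instead of inflating the whole upside.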

What This Score Is Not

The Outliyr BioHarmony Score is not medical advice. It’s a framework for thinking about health interventions more clearly. It doesn’t replace your doctor, your lab work, or your own judgment.

It’s also not a product review. I’m not testing purity or comparing brands within a category. Two magnesium glycinate products might differ dramatically in quality. This score tells you whether magnesium glycinate as an intervention is worth pursuing. Brand selection is a separate step.

It’s not a popularity contest. Sales volume, influencer endorsements, and Reddit hype don’t move the number. The inputs are structured evidence and dimensional analysis. Nothing else.

Scores reflect current evidence. They’ll change as new research publishes. That’s a feature, not a bug.

Start With Your Profile

The fastest way to make these scores useful is to personalize them.

👉 Take the BioHarmony Profile Quiz to get scores weighted to your health priorities, goals, and risk tolerance.

Already know the system? Browse all rated interventions to compare scores across supplements, devices, protocols, and therapies.

Frequently Asked Questions

What is the Outliyr BioHarmony Score?

The Outliyr BioHarmony Score is a transparent 0-10 rating that captures how worthwhile a health intervention is across 13 evidence-weighted dimensions. It covers everything from supplements and peptides to devices and lifestyle protocols. Unlike simple star ratings, it separately evaluates six upside dimensions (efficacy, breadth, evidence quality, speed, durability, bioindividuality) and seven downside dimensions (safety, side effects, cost, effort, opportunity cost, dependency, reversibility), then combines them using a harm-weighted expected value formula.

How is the BioHarmony Score calculated?

Each of the 13 dimensions is scored 1-5 based on available evidence. Upside and downside scores are each shifted by a baseline offset, multiplied by their weights, and summed. Harm-type downside dimensions (safety, side effects, dependency, reversibility) receive a 1.4x multiplier before subtraction; cost, effort, and opportunity cost stay at 1.0x. The resulting expected value maps to a 0-10 scale where 5.0 is neutral. The full formula and every input are visible for each rated intervention.

What types of interventions does BioHarmony score?

Any health intervention with enough evidence to evaluate: supplements, prescription medications, medical devices, biohacking tools (red light therapy, cold plunge, neurostimulation), dietary protocols, exercise methodologies, and lifestyle practices. If it makes a health claim and has some evidence base, it can get a score.

How is this different from Examine.com or Labdoor?

Examine summarizes research but doesn’t produce a single cross-category score. Labdoor and ConsumerLab test product purity and label accuracy. The BioHarmony Score asks a different question entirely: given all known evidence across 13 dimensions, is this intervention worth your time, money, and risk? It’s the only system that applies a consistent, harm-weighted formula across intervention categories.

Can I personalize my BioHarmony scores?

Yes. Default scores use population-level dimension weights. The BioHarmony profile quiz captures your health priorities, risk tolerance, and goals, then adjusts dimension weights accordingly. Your personalized scores reflect what matters most to you rather than population averages.

How often are scores updated?

Scores update when significant new evidence publishes. A major RCT, a safety signal from post-market surveillance, or a systematic review can all trigger a re-evaluation. Each score page shows its last-reviewed date and engine version.

Is the BioHarmony Score medical advice?

No. The score is an educational framework for structured thinking about health interventions. It does not replace diagnosis, treatment recommendations, or supervision from a qualified healthcare provider. Always consult your doctor before starting, stopping, or changing any health intervention, especially prescription medications.

Your Next Move

You now understand how the Outliyr BioHarmony Score works. The formula, the dimensions, the multipliers, the evidence hierarchy. None of that matters until you use it.

Start by browsing the scored interventions. Find something you’re already taking or considering. Look at the dimensional breakdown. See where it scores well and where it doesn’t. That one exercise will change how you evaluate every health decision going forward.

If you want scores calibrated to your priorities, take the BioHarmony profile quiz. It takes about two minutes and reweights every score based on what actually matters to you.

Your biology is unique. Your evaluation framework should be too.

Know someone who obsesses over supplement research? Send them this page.

The information on this page is for educational purposes only. It is not intended to diagnose, treat, cure, or prevent any disease. Consult a qualified healthcare professional before making changes to your health regimen.

Comparison Methodology

A BioHarmony comparison is a head-to-head report that maps each use case to its winning intervention. Comparisons are dynamic: they pull live scores from every referenced intervention, so when a component is re-scored, every comparison that uses it refreshes automatically.

The verdict matrix is editorial, not formulaic. Each use case is independently evaluated against the available evidence; the winner is the intervention with stronger outcomes for that specific use case. Ties and “stack both” results are first-class outcomes, not placeholder values.

Every comparison includes citation-ready passages: self-contained paragraphs a journalist or language model can quote without losing meaning. A validator enforces this; comparisons that reference prior sections or use pronouns without clear antecedents are rejected before publication.

Confidence is inherited conservatively. A comparison’s confidence equals the lowest confidence across its component interventions. A high-confidence intervention paired with a low-confidence one yields a low-confidence comparison, because the decision depends on both sides being reliable.
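
The inheritance rule is a minimum over the components. Here is a short Python sketch; the three confidence levels and all names are assumptions for illustration, since this page doesn’t define the actual confidence scale.

```python
from enum import IntEnum


class Confidence(IntEnum):
    """Hypothetical ordered confidence levels (higher means more reliable)."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


def comparison_confidence(components: list[Confidence]) -> Confidence:
    """A comparison inherits the lowest confidence among its components."""
    if not components:
        raise ValueError("a comparison needs at least one intervention")
    return min(components)


result = comparison_confidence([Confidence.HIGH, Confidence.LOW])
print(result.name)  # LOW
```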

Comparisons are re-reviewed every 90 days by default, and immediately whenever any component intervention’s BioHarmony score changes. That cadence keeps the verdicts current without forcing manual audit cycles.
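
The review trigger can be sketched as a simple check. This Python example assumes a comparison stores its last review date and a snapshot of component scores taken at that review; the function and parameter names are hypothetical.

```python
from datetime import date, timedelta

REVIEW_INTERVAL = timedelta(days=90)  # default cadence stated above


def needs_review(last_reviewed: date,
                 today: date,
                 scores_at_review: dict[str, float],
                 current_scores: dict[str, float]) -> bool:
    """True if 90 days have passed or any component score has changed."""
    if today - last_reviewed >= REVIEW_INTERVAL:
        return True
    return any(current_scores.get(name) != old_score
               for name, old_score in scores_at_review.items())


snapshot = {"intervention_a": 7.9, "intervention_b": 5.6}

# Reviewed 30 days ago, nothing changed: no review needed yet.
print(needs_review(date(2025, 1, 1), date(2025, 1, 31), snapshot, snapshot))
# False

# Reviewed 30 days ago, but one component was re-scored: review immediately.
changed = {"intervention_a": 7.9, "intervention_b": 5.9}
print(needs_review(date(2025, 1, 1), date(2025, 1, 31), snapshot, changed))
# True
```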