AI vs. the Scientific Method: How Machine Learning Is Changing What Counts as an Explanation

By Leo Hart • 7 min read

When Equations Aren’t the Endgame Anymore

Science has long had a preferred narrative structure: observe, hypothesize, derive equations, test. Clean, compact formulas were the gold standard of understanding.

Machine learning doesn’t care about that story.

Deep neural networks now:

- Predict protein structures at scale
- Design new materials and catalysts
- Emulate turbulent flows and climate systems

Often, they do this **without giving us a neat equation or mechanism in return**.

> “We’re getting answers without reasons,” notes a 2023 editorial in *Nature Machine Intelligence*. “That sits uneasily with how science thinks of itself.”

This isn’t a philosophical sideshow. It’s starting to reshape what counts as a result, a model, and an explanation.

---

Black Boxes That Outperform Hand-Built Models

Consider three high-impact use cases.

1. Protein Folding

AlphaFold and its successors largely cracked protein structure prediction, a problem that had occupied biophysics for roughly 50 years.

- Input: amino acid sequence
- Output: likely 3D structure

The model doesn’t encode explicit physical laws. It learns **statistical regularities** from vast structural datasets.

Do we “understand” folding better as a result? In a practical sense, yes. In a mechanistic sense, not necessarily.

2. Climate and Weather

ML surrogates now emulate components of climate models and accelerate high-resolution weather prediction.

Some groups train neural networks directly on satellite and reanalysis data to forecast weather **as well as or better than traditional models**, but through pattern recognition rather than fluid dynamics equations.

3. Materials and Chemistry

Graph neural networks and transformer models predict:

- Band gaps
- Reaction pathways
- Battery performance

Often they outperform hand-crafted descriptors and approach the accuracy of DFT calculations, with far less compute at inference time.

These tools are becoming **standard lab equipment**. But they force a question: is a model that predicts well but explains poorly “good science”?

---

The New Currency: Predictive Accuracy vs. Mechanistic Insight

Historically, scientific value was tightly linked to **mechanistic narratives**:

- Why does this happen?
- Which variables matter and how?

ML can shortcut to:

- What will happen next?
- Under what conditions will this break?

> “The hierarchy is flipping,” argues philosopher of science Sabina Leonelli. “In some domains, prediction now leads explanation, not the other way around.”

That shift has consequences:

- Funding panels and journals must judge work where the key artifact is a **trained model**, not an equation.
- Policymakers may rely on forecasts whose **internal logic isn’t fully transparent**.

Some fields, like high-energy physics, still demand mechanistic depth. Others, like drug discovery and climate risk, are willing to trade some interpretability for speed and accuracy.

---

Opening the Black Box: Interpretability as a Scientific Tool

Researchers aren’t just shrugging and trusting opaque models. A parallel effort is turning **interpretability techniques** into scientific instruments.

Tools in play:

- **Feature importance and attribution** – Which inputs drive predictions most?
- **Saliency maps** – Which regions of an image or structure matter?
- **Latent space probing** – How do a model's learned internal variables cluster and correlate?
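To make the first of these tools concrete, here is a minimal sketch of permutation importance, one common attribution technique: shuffle a single input and measure how much the model's error grows. The `black_box` function is a hypothetical stand-in for a trained model, and the data is synthetic. Both are assumptions for illustration, not taken from any study mentioned here.

```python
import random

def mean_squared_error(model, rows, targets):
    """Average squared gap between model predictions and targets."""
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)

def permutation_importance(model, rows, targets, n_repeats=20, seed=0):
    """Mean increase in error when one input column is shuffled.

    A larger increase suggests the model leans more heavily on that input.
    """
    rng = random.Random(seed)
    baseline = mean_squared_error(model, rows, targets)
    importances = []
    for col in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            shuffled = [r[col] for r in rows]
            rng.shuffle(shuffled)
            # Rebuild each row with only this column's values scrambled.
            permuted = [r[:col] + [v] + r[col + 1:] for r, v in zip(rows, shuffled)]
            increases.append(mean_squared_error(model, permuted, targets) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances

# Hypothetical stand-in for a trained model: leans heavily on input 0,
# weakly on input 1, and ignores input 2 entirely.
def black_box(row):
    return 3.0 * row[0] + 0.5 * row[1]

data_rng = random.Random(42)
rows = [[data_rng.uniform(-1, 1) for _ in range(3)] for _ in range(200)]
targets = [black_box(r) for r in rows]

scores = permutation_importance(black_box, rows, targets)
for index, score in enumerate(scores):
    print(f"input {index}: error increase {score:.3f}")
```

The ranking recovers which inputs the toy model depends on. With real data, strongly correlated inputs can distort these scores, so they are better read as leads to investigate than as causal claims.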

Real-world payoffs:

- In genomics, ML models trained on regulatory sequences can highlight **motifs** that look like previously unknown transcription factor sites.
- In neuroscience, deep networks trained on visual tasks develop **internal representations** that mirror activity in the visual cortex.

> “We can mine these models as hypotheses generators,” says a 2022 *Neuron* review. “They’re compressed summaries of how nature’s data behaves.”

Used carefully, explainable AI becomes a **new microscope for patterns**, not an oracle.

---

When AI Starts Designing Experiments

Machine learning isn’t just crunching data; it’s starting to **select what data to collect**.

Examples:

- **Active learning** systems propose the next experiment expected to reduce uncertainty the most.
- **Bayesian optimization** loops tune experimental conditions (temperature, pressure, composition) to hit a target property.
- **Robotic labs** execute those plans autonomously, 24/7.
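A rough sketch of such a loop, in the Bayesian optimization style: a small Gaussian-process surrogate predicts the outcome of untested settings, and an upper-confidence-bound rule picks the next one to run. Everything here is an assumption for illustration: `run_experiment` is a hypothetical stand-in for a real measurement, the temperature is normalized to the range 0 to 1, and the kernel length scale and exploration weight are arbitrary choices.

```python
import math

def rbf(a, b, length_scale=0.15):
    """Squared-exponential kernel: nearby settings get similar predictions."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)

def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x

def gp_posterior(xs, ys, x_new, noise=1e-4):
    """Posterior mean and standard deviation of a GP surrogate at x_new."""
    y_mean = sum(ys) / len(ys)
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    k_star = [rbf(a, x_new) for a in xs]
    alpha = solve(K, [y - y_mean for y in ys])
    v = solve(K, k_star)
    mean = y_mean + sum(k * a for k, a in zip(k_star, alpha))
    variance = max(1.0 - sum(k * w for k, w in zip(k_star, v)), 0.0)
    return mean, math.sqrt(variance)

def run_experiment(temperature):
    """Hypothetical lab measurement: yield peaks at a normalized setting of 0.7."""
    return math.exp(-((temperature - 0.7) ** 2) / 0.02)

candidates = [i / 50 for i in range(51)]   # normalized temperature grid
measured_x = [0.0, 0.5, 1.0]               # three seed experiments
measured_y = [run_experiment(x) for x in measured_x]

for _ in range(10):
    untested = [x for x in candidates if x not in measured_x]

    def acquisition(x):
        # Upper confidence bound: favor high predicted yield or high uncertainty.
        mean, std = gp_posterior(measured_x, measured_y, x)
        return mean + 2.0 * std

    next_x = max(untested, key=acquisition)
    measured_x.append(next_x)
    measured_y.append(run_experiment(next_x))

best_y, best_x = max(zip(measured_y, measured_x))
print(f"best setting {best_x:.2f} with yield {best_y:.3f}")
```

The loop spends its first few runs exploring the gaps between seed experiments, then concentrates near the promising region. A production system would more likely use an established optimization library and account for measurement noise, cost, and safety constraints.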

Closed-loop systems have:

- Discovered new catalysts and alloys
- Optimized quantum device parameters
- Shortened search timelines from months to days

This changes the role of human judgment:

- From hand-picking each experiment to **specifying objectives, constraints, and priors**
- From direct tinkering to **monitoring and steering** algorithmic exploration

The scientific method doesn’t disappear, but **who (or what) navigates hypothesis space** is shifting.

---

New Failure Modes: Spurious Patterns and Data Hazards

The upside is real. So are the traps.

Key risks:

1. **Overfitting to messy data**
If your training set encodes lab biases, instrumentation quirks, or sampling gaps, the model just bakes them in.

2. **Shortcut learning**
Models may rely on correlates, not causes (e.g., inferring material stability from synthesis lab ID instead of composition).

3. **Distribution shift**
Predictions degrade silently when a model extrapolates beyond its training data, which is *exactly* what science often needs it to do.

> “We risk mistaking clever curve-fitting for discovery,” warns a *PNAS* commentary on AI in physics.

Good practice now includes:

- Rigorous **out-of-distribution testing**
- Synthetic benchmarks that stress extrapolation, not just interpolation
- Publishing datasets and code for **reproducibility and auditing**
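A toy illustration of why the first two practices matter, under loudly stated assumptions: the "truth" is a sine curve chosen purely for demonstration, and the "model" is a straight-line fit rather than a neural network. The pattern it shows, a model that looks excellent on held-out data from the training regime and fails outside it, is the one out-of-distribution tests are meant to expose.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x

def rmse(params, xs, ys):
    """Root-mean-square error of the fitted line on the given points."""
    slope, intercept = params
    errors = [(slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)]
    return math.sqrt(sum(errors) / len(errors))

truth = math.sin  # hypothetical "law of nature" the model never sees directly

train_x = [i / 20 for i in range(0, 21, 2)]      # training regime: [0, 1]
test_in_x = [i / 20 for i in range(1, 20, 2)]    # held out, same regime
test_out_x = [2 + i / 20 for i in range(21)]     # held out, new regime: [2, 3]

model = fit_line(train_x, [truth(x) for x in train_x])
rmse_in = rmse(model, test_in_x, [truth(x) for x in test_in_x])
rmse_out = rmse(model, test_out_x, [truth(x) for x in test_out_x])
print(f"in-distribution RMSE:     {rmse_in:.3f}")
print(f"out-of-distribution RMSE: {rmse_out:.3f}")
```

Reporting only the first number would make the model look trustworthy. Reporting both shows where its competence ends.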

---

How This Redraws Scientific Roles

This isn’t just tooling; it’s job design.

Shifts underway:

- **From single-discipline experts to hybrids** – Physicist–programmer, chemist–data scientist, biologist–ML engineer
- **From small bespoke scripts to shared ML infrastructure** – Lab groups relying on centralized model hubs and data lakes
- **From individual intuition to team-based priors** – Encoding domain knowledge into loss functions, architectures, and constraints

Journals and conferences are scrambling to keep up:

- Demanding **baselines against physics-based models**
- Requiring **ablation studies and interpretability analysis**
- Debating how to credit **model architects vs. experimentalists vs. data curators**

---

What to Watch Next

If you care about how AI is reshaping the scientific method itself, keep an eye on:

1. **Hybrid models** – Architectures that explicitly embed physical laws (symmetries, conservation) into neural networks.
2. **Standards for ML-based claims** – Community norms on what constitutes sufficient explanation, validation, and openness.
3. **Automated discovery platforms** – Fully closed-loop labs that publish results with minimal human intervention.
4. **Policy reliance** – How much regulators lean on ML-driven climate, epidemiology, and economic models.
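As a small taste of the first item, here is one simple way to build a symmetry into a model by construction: average its output over every ordering of inputs that physics says should be interchangeable. The `raw_model` below is a hypothetical stand-in for a trained network with arbitrary fixed weights, and the two-atom setup is an assumption for illustration. Real hybrid architectures, such as equivariant networks or physics-informed losses, are considerably more elaborate.

```python
import math

def raw_model(r_a, r_b):
    """Hypothetical stand-in for a trained network scoring a pair of atoms.

    The fixed weights are arbitrary and treat the two inputs differently,
    so swapping the atoms changes the prediction.
    """
    hidden = math.tanh(0.8 * r_a - 0.3 * r_b + 0.1)
    return 1.5 * hidden + 0.2 * r_a

def symmetrized_model(r_a, r_b):
    """Average over both orderings so the output cannot depend on atom labels."""
    return 0.5 * (raw_model(r_a, r_b) + raw_model(r_b, r_a))

a, b = 1.2, 0.4
print(raw_model(a, b), raw_model(b, a))                  # two different values
print(symmetrized_model(a, b), symmetrized_model(b, a))  # the same value twice
```

The wrapped model cannot violate the symmetry no matter what the underlying network learned, which is the appeal of hard constraints over hoping the data teaches the rule.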

The core tension won’t vanish: humans want **stories about why**, not just **predictions about what**.

AI doesn’t solve that. It forces science to decide where — and when — each is non-negotiable.