Computational Musicology · Research Analysis

Network Science of Indian Classical Music — The Unclaimed Frontier

What Western music science found, what Indian classical music still hasn't, and why the gap is real

Research date: May 17, 2026
Source: Nature Scientific Reports (s41598-026-42872-7)
Context: Note-transition network analysis applied to Indian classical music

The Western Baseline

In 2026, Di Marco, Loru, Galeazzi, Cinelli, and Quattrociocchi published a study in Nature Scientific Reports that did something musicology had never done at scale. They took about 21,500 MIDI files spanning six Western genres and four centuries and modeled every piece as a weighted directed network (notes as nodes, note-to-note transitions as weighted edges). They then measured that network's topology to quantify structural complexity.

The method: each MIDI file becomes a directed graph, with one node for each distinct pitch (12-tone system) and an edge from note x to note y whose weight counts how many times y followed x. Self-loops (a note followed by itself) are removed. Chords are handled as complete bipartite graphs between consecutive sets of simultaneous notes, so every note of one chord links to every note of the next. They then measured five quantities on every graph:

• Density — fraction of possible transitions actually used
• Reciprocity — tendency of transitions to run in both directions (x→y and y→x, i.e. ping-pong note patterns)
• Mean Node Entropy — how evenly each note's outgoing transitions are spread across the notes that follow it, averaged over all notes
• Global Efficiency — average of the inverse shortest-path lengths between all pairs of notes; high efficiency = non-repetitive, varied sequences
• Weighted Efficiency — same idea but accounting for transition weights
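The construction and the five metrics above can be sketched in Python with networkx. This is a minimal sketch under stated assumptions, not the authors' code. It assumes each piece has already been parsed into a time-ordered list of note events (each a set of MIDI pitch numbers, so a chord is a multi-note set). Mean node entropy is taken as unnormalized Shannon entropy in bits. Weighted efficiency converts counts to distances as max_weight / weight, so frequent transitions count as short hops. The paper's exact normalizations may differ. Efficiency is computed by hand because networkx's built-in `global_efficiency` is implemented for undirected graphs only.

```python
import math
from itertools import product

import networkx as nx


def build_transition_graph(events):
    """Weighted directed note-transition graph.

    `events` is a time-ordered list of sets of MIDI pitch numbers: a single
    note is a one-element set, a chord is a multi-element set.
    """
    G = nx.DiGraph()
    for event in events:
        G.add_nodes_from(event)
    for prev, curr in zip(events, events[1:]):
        # Every note of one event links to every note of the next
        # (a complete bipartite connection between consecutive note sets).
        for x, y in product(prev, curr):
            if x == y:
                continue  # self-loops are removed
            if G.has_edge(x, y):
                G[x][y]["weight"] += 1
            else:
                G.add_edge(x, y, weight=1)
    return G


def mean_node_entropy(G):
    """Mean Shannon entropy (bits) of each node's outgoing transition probabilities.

    Assumption: nodes without outgoing edges contribute 0; no normalization.
    """
    entropies = []
    for node in G.nodes:
        weights = [w for _, _, w in G.out_edges(node, data="weight")]
        total = sum(weights)
        h = -sum((w / total) * math.log2(w / total) for w in weights) if total else 0.0
        entropies.append(h)
    return sum(entropies) / len(entropies) if entropies else 0.0


def directed_efficiency(G, weighted=False):
    """Mean of 1 / d(i, j) over all ordered pairs i != j (unreachable pairs add 0)."""
    n = G.number_of_nodes()
    if n < 2 or G.number_of_edges() == 0:
        return 0.0
    if weighted:
        # Assumption: distance = max_weight / weight, so the most frequent
        # transition has length 1 and rarer transitions are longer.
        max_w = max(w for _, _, w in G.edges(data="weight"))
        H = nx.DiGraph()
        H.add_weighted_edges_from(
            ((u, v, max_w / w) for u, v, w in G.edges(data="weight")), weight="dist"
        )
        lengths = dict(nx.all_pairs_dijkstra_path_length(H, weight="dist"))
    else:
        lengths = dict(nx.all_pairs_shortest_path_length(G))
    total = sum(1.0 / d for src, row in lengths.items() for dst, d in row.items() if src != dst)
    return total / (n * (n - 1))


def network_metrics(G):
    return {
        "density": nx.density(G),  # m / (n(n-1)) for a directed graph without loops
        "reciprocity": nx.reciprocity(G),
        "mean_node_entropy": mean_node_entropy(G),
        "global_efficiency": directed_efficiency(G),
        "weighted_efficiency": directed_efficiency(G, weighted=True),
    }


# Toy input: a short melody, then a three-note chord, then a single note.
events = [{60}, {64}, {67}, {64}, {60, 64, 67}, {67}]
print(network_metrics(build_transition_graph(events)))
```

In this toy input the chord contributes edges 64→60 and 64→67 (the 64→64 self-loop is dropped), which is how the bipartite chord rule inflates the weight of transitions that recur inside chords.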

Key Findings

Finding 01

Classical and Jazz are structurally distinct and most complex

Diverse note transitions, low reciprocity (not going back-and-forth), high entropy. Measurably different from each other as well as from everything else in the corpus.

Finding 02

Pop, Rock, Electronic, Hip Hop cluster together — low complexity, high reciprocity

Heavier repetition of specific transitions, more bidirectional ping-pong patterns, less topological variety in their note graphs.

Finding 03

Music has been simplifying over time

Pre-1950 music was structurally richer and more differentiated by genre. Post-2000, everything converges toward the same simpler structural template. Classical and Jazz themselves have measurably simplified. The 1950–1979 era is the inflection point.

Finding 04

Post-2000 music is more homogeneous

Fewer distinct note transitions, more repetition, less topological variety. A factory that used to make 500 parts now makes 50 — efficiently, but with less structural optionality.

"Since our analysis spans several centuries and diverse musical genres, notes represent the fundamental unit common to all of them. In contrast, the role and function of chords vary significantly across genres and centuries."

Dataset & Methodology

The primary source was the MetaMIDI Dataset, initially ~160,000 MIDI files. After filtering for the six genres, duration over 60 seconds, and parsability, the authors retained 21,480 unique pieces. Release dates were assigned to 72% of the corpus. For pre-1980 music, where Spotify lists remaster dates rather than original release dates, they used Google's Gemini LLM to infer original release dates, validated against 100 manually annotated tracks. In that validation, Gemini was more accurate for older music and Spotify for recent music.
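The corpus filter amounts to a few predicates applied per file. A minimal sketch, assuming each candidate has already been parsed into a record with a genre label and a duration in seconds (None when parsing failed). The `MidiRecord` type, the `content_hash` deduplication key, and the lowercase genre strings are hypothetical illustrations, not the MetaMIDI schema or the authors' pipeline.

```python
from dataclasses import dataclass
from typing import List, Optional

GENRES = {"classical", "jazz", "pop", "rock", "electronic", "hip hop"}
MIN_DURATION_S = 60.0


@dataclass
class MidiRecord:
    content_hash: str            # hypothetical: hash of the file bytes, used for dedup
    genre: Optional[str]         # None if the file has no genre label
    duration_s: Optional[float]  # None if the file failed to parse


def filter_corpus(records: List[MidiRecord]) -> List[MidiRecord]:
    """Keep parsable, in-genre pieces longer than 60 s, one copy per hash."""
    kept, seen = [], set()
    for r in records:
        if r.duration_s is None:             # unparsable file
            continue
        if r.genre not in GENRES:            # outside the six genres
            continue
        if r.duration_s <= MIN_DURATION_S:   # too short
            continue
        if r.content_hash in seen:           # duplicate of a kept piece
            continue
        seen.add(r.content_hash)
        kept.append(r)
    return kept
```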

Limitations the authors acknowledge

MIDI bias — dataset skews toward Western formal compositions, incomplete on contemporary underground and DIY scenes
Release date accuracy is shaky for pre-1980 material
Chord handling is crude (complete bipartite links between consecutive sets of simultaneous notes) and misses harmonic nuance
12-tone quantization may flatten microtonal variation
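The quantization concern can be made concrete with interval arithmetic. Two intonations of the same scale degree that differ by a syntonic comma (81/80, about 21.5 cents) land on the same 12-TET note. The sketch uses the Western just (5/4) and Pythagorean (81/64) major thirds only because their ratios are standard. It makes no claim about the exact cent values of any particular shruti.

```python
import math


def cents(ratio):
    """Interval size in cents for a frequency ratio."""
    return 1200 * math.log2(ratio)


def nearest_semitone(ratio):
    """Scale degree (semitones above the tonic) after 12-TET quantization."""
    return round(cents(ratio) / 100)


just_third = 5 / 4           # about 386.3 cents
pythagorean_third = 81 / 64  # about 407.8 cents

print(round(cents(pythagorean_third) - cents(just_third), 1))            # 21.5
print(nearest_semitone(just_third), nearest_semitone(pythagorean_third))  # 4 4
```

Once both pitches become MIDI note 4 semitones above the tonic, any network built on top of them cannot tell them apart.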

Has Anyone Done This for Indian Classical Music?

Short answer: not at scale, not with networks, not yet. Here's the full landscape.

The closest existing work

Thakur, Saluja & Ughade (2026)

Asian Journal of Probability and Statistics

Built directed graphs and Markov chains for exactly two ragas — Raga Yaman and Raga Bhupali. Used a novel "raga-restricted operation" to enforce grammatical constraints. Demonstrates the framework is mathematically sound for ICM, but the corpus is two ragas. Proof of concept, not corpus analysis.
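For readers unfamiliar with the Markov-chain side, a first-order chain over svaras is just the row-normalized table of transition counts. The sketch below is generic. It does not reproduce Thakur et al.'s "raga-restricted operation", whose definition is not given here. The token sequence is a toy line using Bhupali's five svaras (S R G P D, with S' for upper Sa) under a hypothetical ASCII labeling, not a transcription.

```python
from collections import Counter, defaultdict


def transition_probabilities(sequence):
    """First-order Markov chain: P(next = b | current = a) from observed counts."""
    counts = defaultdict(Counter)
    for a, b in zip(sequence, sequence[1:]):
        counts[a][b] += 1
    probs = {}
    for a, row in counts.items():
        total = sum(row.values())
        probs[a] = {b: c / total for b, c in row.items()}
    return probs


toy_line = "S R G P D S' D P G R S".split()
for svara, row in transition_probabilities(toy_line).items():
    print(svara, row)
```

The same counts, left unnormalized, are exactly the edge weights of the note-transition network, which is why the Markov-chain and graph views of a raga are two readings of one table.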

CompMusic / MTG (UPF Barcelona)

Most sophisticated data infrastructure in the field. Saraga dataset: largest annotated open corpus for Indian Art Music, with time-aligned melody, rhythm, and structural annotations for both Hindustani and Carnatic. compIAM toolkit: pitch tracking, tonic identification, beat detection, raga recognition. However — their approach is audio-feature-based (spectrograms, chromagrams, self-similarity matrices), not note-transition networks.

Singh & Arora (2024)

CNN-LSTM deep learning for raga identification from 191 hours of Prasar Bharati recordings across 144 ragas. Uses explainable AI. Again: audio → raga label, not structural graph analysis. They classify which raga a recording is in, not what network topology a raga produces.

Why nobody has done it at scale

Challenge	What It Means for Network Analysis
Continuous pitch / 22 shrutis	Indian classical uses microtones that don't map to 12-tone MIDI quantization. Glides (meend) are structural, not ornamental. MIDI representation kills the music's identity.
Improvisation is the corpus	Western classical has fixed compositions. ICM's "text" is the raga grammar + live performance. Every alap is unique — so what exactly is the "piece" you're graphing?
Raga grammar is a constraint system	Aaroh, avroh, vadi, samvadi, nyasa, pakad: these specify which note sequences are permitted and which notes and phrases carry emphasis. A generic network model would need to encode these rules, or it would produce graphs containing transitions the raga forbids.
No large MIDI corpus	MetaMIDI had 160K Western files to draw from. There's nothing comparable for Indian classical. Audio-to-MIDI transcription for ICM is an active research problem, not a solved one.
Two traditions, not one	Hindustani (North) and Carnatic (South) have different theoretical frameworks. A unified analysis would need to handle fundamentally different ontological assumptions.
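The simplest piece of raga grammar to encode is note-set membership. A minimal sketch, assuming transcribed svara sequences with a hypothetical ASCII labeling (S R G M P D N, S' for upper Sa). It only checks that both ends of each transition belong to the raga's note set (Bhupali omits Ma and Ni). Aroh/avroh direction rules, vakra movement, and nyasa would need richer encodings, such as separate allowed-transition sets for ascent and descent. A flagged transition may be a transcription error or a deliberate departure; the code does not decide which.

```python
# Hypothetical ASCII svara labels; S' is upper-octave Sa.
BHUPALI_SVARAS = {"S", "R", "G", "P", "D", "S'"}  # Bhupali omits Ma and Ni


def check_transitions(sequence, allowed_svaras):
    """Split observed transitions into those inside and outside the raga's note set."""
    legal, flagged = [], []
    for a, b in zip(sequence, sequence[1:]):
        if a in allowed_svaras and b in allowed_svaras:
            legal.append((a, b))
        else:
            flagged.append((a, b))
    return legal, flagged


legal, flagged = check_transitions("S R G M G R S".split(), BHUPALI_SVARAS)
print("flagged:", flagged)  # transitions touching M, which Bhupali does not use
```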

How You'd Actually Do It

If someone wanted to replicate the Nature study's methodology for Indian classical music, here's the honest implementation path:

Option A — Audio-first (harder but more authentic)

Use the Saraga Carnatic and Hindustani datasets, which include time-aligned pitch contours. Extract pitch with existing compIAM pitch trackers where needed, then segment the contours into discrete note events. Build transition networks from those note segments and apply the same graph metrics.

Problem: microtonal ambiguity, ornamentation noise, transcription errors. The line between "an ornament" and "a structural note" is genuinely contested in ICM theory.
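The segmentation step in Option A is where most of the design decisions hide. A deliberately naive sketch, assuming a pitch tracker has already produced a frame-wise f0 contour in Hz (non-positive values for unvoiced frames) and that the tonic is known. It snaps each frame to the nearest 12-TET semitone relative to the tonic and drops runs shorter than `min_frames`, treating them as glide transients. It then merges same-note runs that the dropped frames had split. This is not compIAM's method, and snapping to semitones reintroduces exactly the microtonal flattening listed among the limitations above.

```python
import math
from itertools import groupby


def segment_notes(f0_hz, tonic_hz, min_frames=5):
    """Return (semitones_above_tonic, n_frames) note events from an f0 contour."""
    bins = []
    for f in f0_hz:
        if f <= 0:
            bins.append(None)  # unvoiced frame
        else:
            cents = 1200 * math.log2(f / tonic_hz)
            bins.append(round(cents / 100))  # nearest semitone relative to the tonic
    events = []
    for b, run in groupby(bins):
        n = len(list(run))
        if b is None or n < min_frames:
            continue  # drop silence and short transients (e.g. part of a glide)
        if events and events[-1][0] == b:
            # Same note as the previous event: the frames between were dropped.
            events[-1] = (b, events[-1][1] + n)
        else:
            events.append((b, n))
    return events


# Toy contour: Sa held, a brief glide frame pair, Re held, a gap, Re again.
f0 = [220.0] * 8 + [233.0] * 2 + [246.94] * 6 + [0.0] * 3 + [246.94] * 5
print(segment_notes(f0, tonic_hz=220.0))
```

Note that this naive merge also fuses a note repeated across a rest; whether that is right depends on where one draws the ornament/structure line discussed above.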

Option B — MIDI / Notation approach (cleaner but smaller)

Compile a corpus from digitized notation (Bhatkhande volumes, Indian music MIDI repositories). Build networks from the notated compositions. Apply the full graph metric pipeline.

Problem: written notation represents the grammar, not the performed reality. You're analyzing prescription, not description. Same raga, two artists, completely different networks — but the notation is identical.

Option C — Hybrid (probably the right answer)

Use Saraga annotations to extract "skeleton" note sequences from multiple performances of the same raga. Build per-raga networks from those skeletons across several performances. Compare across ragas, gharanas, and time periods.

Problem: requires someone who understands both computational network analysis and ICM music theory deeply. That's a very small intersection of skills.
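Aggregating Option C's skeletons into one network per raga could look like the sketch below, assuming each performance has already been reduced to a skeleton svara sequence. The `min_support` threshold keeps only transitions seen in at least that many distinct performances. It is a hypothetical choice meant to separate shared raga grammar from one artist's idiosyncrasies; setting it to 1 keeps everything. The toy skeletons are invented for illustration.

```python
from collections import Counter


def per_raga_network(skeletons, min_support=2):
    """Summed transition counts across performances of one raga.

    Keeps an edge (a, b) only if it occurs in at least `min_support`
    different performances. Repeated notes (self-loops) are dropped.
    """
    weights = Counter()  # total occurrences across all performances
    support = Counter()  # number of performances containing the edge
    for seq in skeletons:
        edges = [(a, b) for a, b in zip(seq, seq[1:]) if a != b]
        weights.update(edges)
        support.update(set(edges))
    return {edge: w for edge, w in weights.items() if support[edge] >= min_support}


performances = [
    "S R G R S".split(),
    "S R G P G R S".split(),
    "S G R S".split(),
]
print(per_raga_network(performances))
```

The returned edge-weight dict can be loaded into a directed graph and passed through the same density, reciprocity, entropy, and efficiency measurements, which keeps per-raga results comparable with the Western baseline.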

What You'd Probably Find

If a well-executed version of this study were run on a decent Indian classical corpus:

Informed Hypothesis

Indian classical would blow Western classical out of the water on network complexity metrics. The raga system is built to generate melodic variety within a constrained note set, which is exactly the kind of structure that should produce high entropy, low reciprocity, and dense transition graphs. A raga with a wide aroh-avroh and many vakra (zigzag) phrases would show extraordinary topological richness compared to a Pop song that essentially loops 4–6 chord-tone transitions.

Hindustani vs. Carnatic would show distinct topological signatures. Carnatic music's more fixed compositional structure (kritis follow a set pallavi–anupallavi–charanam form) vs. Hindustani's freer, alap-led improvisation would map to measurably different density and efficiency patterns.

Gharana differences might be detectable as sub-cluster variations within a single raga's network space. The Kirana school's meend-heavy approach vs. the Gwalior school's more angular contours would produce different edge-weight distributions even on the same raga.

The "simplification over time" trend might not apply, or might reverse. Contemporary ICM performers often add more notes, faster tempi, and cross-raga experiments. Post-2000 Hindustani experimental music may show increasing complexity — a striking counterpoint to the Western finding.

For MINY Context

This is a blue ocean — nobody has built the "network science of Indian classical" at scale. The closest related work is Thakur et al.'s two-raga proof of concept. Everything else is audio-feature classification, not structural graph analysis.

Partner with CompMusic/MTG — they have the Saraga data, the annotations, and the audio processing tools. We have the application layer and the audience.
Start with a pilot on Saraga Hindustani + Carnatic — 300+ hours of annotated audio, enough to build a credible per-raga network corpus.
Use compIAM pitch extraction → note segmentation → network construction — the pipeline components largely exist; they just need to be wired together and run.
Compare per-raga networks — density, reciprocity, entropy, efficiency. Publish the methodology and the dataset. This would be genuinely new science.
Cost estimate: 2–3 months of a research engineer who knows both Python music analysis and network science (igraph / networkx). The data exists. The tools exist. The idea is sitting there unclaimed.

Signal vs. Noise

The Nature paper's methodology is genuinely applicable to Indian classical music — the core insight (model pieces as weighted directed note networks, measure topology) translates directly. The challenge isn't the method, it's the data infrastructure and musicological encoding.

The Western finding (music has been structurally simplifying and homogenizing) is itself interesting for ICM. If Indian classical also shows this trend, it would be a striking confirmation across traditions. If it doesn't, it pins down exactly what makes ICM different from Western music at the network level.

Either result is publishable. That's rare in computational musicology.