What Western music science found, what Indian classical music still hasn't, and why the gap is real
In 2026, Di Marco, Loru, Galeazzi, Cinelli, and Quattrociocchi published a study in Scientific Reports (a Nature Portfolio journal) that did something musicology had never done at scale. They took roughly 20,000 MIDI files spanning six Western genres and four centuries and modeled every piece as a weighted directed network, with notes as nodes and note-to-note transitions as weighted edges. They then measured each network's topology to quantify structural complexity.
The method: each MIDI file becomes a directed graph, with one node for each distinct pitch (12-tone system) and an edge from note x to note y whose weight is the number of times y followed x. Self-loops are removed. Chords are handled as complete bipartite graphs between simultaneous note sets. They then measured five quantities on every graph:
• Density — fraction of possible transitions actually used
• Reciprocity — share of transitions that also occur in the reverse direction (ping-pong note patterns)
• Mean Node Entropy — how uniformly transitions are distributed from each note
• Global Efficiency — mean of the inverse shortest-path lengths over all node pairs; high efficiency = non-repetitive, varied sequences
• Weighted Efficiency — same idea but accounting for transition weights
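The construction and four of the five metrics above can be sketched in plain Python. This is a minimal illustration, not the authors' code. It rests on several assumptions: events are given as sets of pitch classes (a set with more than one element is a chord), chords are linked to the next event as a complete bipartite graph, density is computed over the nodes that actually appear, and node entropy is Shannon entropy in bits over outgoing edge weights. Weighted efficiency is left out because the text above does not say how transition weights are turned into distances. All function names are hypothetical.

```python
from collections import Counter, deque
from itertools import product
from math import log2

def build_graph(events):
    """events: list of sets of pitch classes (0-11). Returns {(x, y): weight}."""
    edges = Counter()
    for cur, nxt in zip(events, events[1:]):
        # Assumed reading of the chord rule: every note in one event
        # links to every note in the next (complete bipartite graph).
        for x, y in product(cur, nxt):
            if x != y:  # self-loops removed
                edges[(x, y)] += 1
    return edges

def _nodes(edges):
    return {n for edge in edges for n in edge}

def density(edges):
    n = len(_nodes(edges))
    return len(edges) / (n * (n - 1)) if n > 1 else 0.0

def reciprocity(edges):
    # Share of directed edges whose reverse edge is also present.
    if not edges:
        return 0.0
    return sum((y, x) in edges for x, y in edges) / len(edges)

def mean_node_entropy(edges):
    out_weights = {}
    for (x, _), w in edges.items():
        out_weights.setdefault(x, []).append(w)
    entropies = []
    for weights in out_weights.values():
        total = sum(weights)
        entropies.append(-sum(w / total * log2(w / total) for w in weights))
    return sum(entropies) / len(entropies) if entropies else 0.0

def global_efficiency(edges):
    # Mean of 1/d(s, t) over ordered node pairs; unreachable pairs add 0.
    nodes = _nodes(edges)
    n = len(nodes)
    if n < 2:
        return 0.0
    adj = {node: set() for node in nodes}
    for x, y in edges:
        adj[x].add(y)
    total = 0.0
    for s in nodes:
        dist = {s: 0}
        queue = deque([s])
        while queue:  # breadth-first search on the unweighted graph
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(1 / d for t, d in dist.items() if t != s)
    return total / (n * (n - 1))

# Toy melody C-E-C-E-G as pitch classes
toy = build_graph([{0}, {4}, {0}, {4}, {7}])
```

On the toy melody, the C-to-E edge has weight 2 and the graph has three edges among three nodes, so half of the possible transitions are used.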
Diverse note transitions, low reciprocity (not going back-and-forth), high entropy. Measurably different from each other as well as from everything else in the corpus.
Heavier repetition of specific transitions, more bidirectional ping-pong patterns, less topological variety in their note graphs.
Pre-1950 music was structurally richer and more differentiated by genre. Post-2000, everything converges toward the same simpler structural template. Classical and Jazz themselves have measurably simplified. The 1950–1979 era is the inflection point.
Fewer distinct note transitions, more repetition, less topological variety. A factory that used to make 500 parts now makes 50 — efficiently, but with less structural optionality.
"Since our analysis spans several centuries and diverse musical genres, notes represent the fundamental unit common to all of them. In contrast, the role and function of chords vary significantly across genres and centuries."
The primary source was the MetaMIDI Dataset, which initially held ~160,000 MIDI files. After filtering for six genres, duration >60 seconds, and parsability, they retained 21,480 unique pieces. Release dates were assigned to 72% of the corpus. For pre-1980 music, where Spotify lists remaster dates rather than originals, they used Google's Gemini LLM to infer original release dates, validated against 100 manually annotated tracks. Gemini is more accurate for older music; Spotify for recent music.
Short answer: not at scale, not with networks, not yet. Here's the full landscape.
Asian Journal of Probability and Statistics
Built directed graphs and Markov chains for exactly two ragas — Raga Yaman and Raga Bhupali. Used a novel "raga-restricted operation" to enforce grammatical constraints. Demonstrates the framework is mathematically sound for ICM, but the corpus is two ragas. Proof of concept, not corpus analysis.
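To make the idea concrete, here is a minimal sketch of a first-order Markov chain restricted to a raga's note set. The paper's actual "raga-restricted operation" is not described here, so the rule below (drop any transition that touches a note outside the raga, then renormalize each row) is only an assumed stand-in. The note set is Bhupali's pentatonic scale; the input sequence and function name are invented for illustration.

```python
from collections import Counter, defaultdict

# Raga Bhupali uses five swaras: Sa, Re, Ga, Pa, Dha
BHUPALI = {"S", "R", "G", "P", "D"}

def restricted_transition_matrix(sequence, allowed):
    """Estimate P(next | current), ignoring transitions outside `allowed`."""
    counts = defaultdict(Counter)
    for a, b in zip(sequence, sequence[1:]):
        # Assumed restriction rule: both endpoints must belong to the raga.
        if a in allowed and b in allowed:
            counts[a][b] += 1
    return {
        a: {b: c / sum(row.values()) for b, c in row.items()}
        for a, row in counts.items()
    }

# Invented phrase with one out-of-raga note ("M") as transcription noise
phrase = ["S", "R", "G", "M", "P", "G", "R", "S"]
P = restricted_transition_matrix(phrase, BHUPALI)
```

Here the stray "M" disappears from the chain, and the row for "R" splits evenly between "G" and "S".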
Most sophisticated data infrastructure in the field. Saraga dataset: largest annotated open corpus for Indian Art Music, with time-aligned melody, rhythm, and structural annotations for both Hindustani and Carnatic. compIAM toolkit: pitch tracking, tonic identification, beat detection, raga recognition. However, their approach is audio-feature-based (spectrograms, chromagrams, self-similarity matrices), not note-transition networks.
CNN-LSTM deep learning for raga identification from 191 hours of Prasar Bharati recordings across 144 ragas. Uses explainable AI. Again: audio → raga label, not structural graph analysis. They classify what raga a recording is, not what network topology a raga produces.
| Challenge | What It Means for Network Analysis |
|---|---|
| Continuous pitch / 22 shrutis | Indian classical uses microtones that don't map to 12-tone MIDI quantization. Glides (meend) are structural, not ornamental. MIDI representation kills the music's identity. |
| Improvisation is the corpus | Western classical has fixed compositions. ICM's "text" is the raga grammar + live performance. Every alap is unique — so what exactly is the "piece" you're graphing? |
| Raga grammar is a constraint system | Aaroh, avroh, vadi, samvadi, nyasa, pakad — these define legal note sequences. A generic network model would need to encode these rules or it would generate graphs that include impossible transitions. |
| No large MIDI corpus | MetaMIDI had 160K Western files to draw from. There's nothing comparable for Indian classical. Audio-to-MIDI transcription for ICM is an active research problem, not a solved one. |
| Two traditions, not one | Hindustani (North) and Carnatic (South) have different theoretical frameworks. A unified analysis would need to handle fundamentally different ontological assumptions. |
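The first row of the table can be shown with a few lines of arithmetic. The sketch below takes two just-intonation ratios that are commonly listed among the 22 shrutis for komal Re (256/243 and 16/15; the exact shruti tuning is itself debated, so treat these as illustrative) and shows that rounding to the nearest equal-tempered semitone, which is what a 12-tone MIDI note number does, maps both to the same pitch class. Function names are hypothetical.

```python
from math import log2

def cents_above_tonic(ratio):
    """Interval size in cents for a frequency ratio relative to the tonic."""
    return 1200 * log2(ratio)

def nearest_semitone(ratio):
    """Pitch class a 12-tone quantizer would assign (100 cents per semitone)."""
    return round(cents_above_tonic(ratio) / 100) % 12

low_komal_re = 256 / 243   # about 90 cents
high_komal_re = 16 / 15    # about 112 cents

gap = cents_above_tonic(high_komal_re) - cents_above_tonic(low_komal_re)
same_bin = nearest_semitone(low_komal_re) == nearest_semitone(high_komal_re)
```

The two pitches sit more than 20 cents apart, yet both land on semitone 1, so a 12-node graph cannot tell them apart.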
If someone wanted to replicate the Nature study's methodology for Indian classical music, here's the honest implementation path:
Use the Saraga Carnatic and Hindustani datasets with time-aligned pitch contours. Segment into discrete note events using existing compIAM pitch trackers. Build transition networks from those pitch segments. Apply the same graph metrics.
Problem: microtonal ambiguity, ornamentation noise, transcription errors. The line between "an ornament" and "a structural note" is genuinely contested in ICM theory.
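A rough sketch of the segmentation step shows where that ambiguity enters. It assumes the pitch contour is already available as cents above the tonic per frame (with `None` for unvoiced frames) and uses a plain duration threshold to separate "notes" from "ornaments". It does not call compIAM or any real library, and the function name, hop size, and threshold are hypothetical choices that an ICM specialist would likely want to replace.

```python
def segment_notes(cents_contour, hop_s=0.01, min_dur_s=0.15):
    """Quantize a pitch contour to pitch classes and keep only sustained runs.

    cents_contour: per-frame pitch in cents above the tonic, None if unvoiced.
    hop_s: seconds per frame (assumed).
    min_dur_s: runs shorter than this are treated as ornaments and dropped.
    """
    notes = []
    current, start = None, 0
    # A trailing None acts as a sentinel so the final run is flushed.
    for i, c in enumerate(list(cents_contour) + [None]):
        pc = None if c is None else round(c / 100) % 12
        if pc != current:
            if current is not None and (i - start) * hop_s >= min_dur_s:
                notes.append(current)
            current, start = pc, i
    return notes

# Invented contour: Sa held, a 30 ms flick near komal Re, Re held, silence, Pa held
contour = [0] * 20 + [130] * 3 + [200] * 20 + [None] * 5 + [700] * 20
skeleton = segment_notes(contour)
```

With these settings the brief flick is discarded and the skeleton is Sa, Re, Pa. Lowering `min_dur_s` to 0.02 would keep it as a note, which is exactly the contested ornament-versus-structure decision, now hidden inside a single parameter.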
Compile a corpus from digitized notation (Bhatkhande volumes, Indian music MIDI repositories). Build networks from the notated compositions. Apply the full graph metric pipeline.
Problem: written notation represents the grammar, not the performed reality. You're analyzing prescription, not description. Same raga, two artists, completely different networks — but the notation is identical.
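The notation route is mechanically the simplest, which the sketch below illustrates. It assumes an ASCII sargam convention (uppercase for shuddha swaras, lowercase for komal, "m" for shuddha Ma and "M" for tivra Ma) and ignores octave marks and rhythm. Real Bhatkhande notation uses its own symbols, so the mapping and the function name here are hypothetical.

```python
from collections import Counter

# Assumed ASCII convention, not Bhatkhande's printed symbols
SWARA_TO_PC = {
    "S": 0, "r": 1, "R": 2, "g": 3, "G": 4, "m": 5,
    "M": 6, "P": 7, "d": 8, "D": 9, "n": 10, "N": 11,
}

def notation_to_edges(line):
    """Turn a space-separated sargam line into weighted transition counts."""
    pcs = [SWARA_TO_PC[token] for token in line.split()]
    # Self-loops (repeated swaras) are skipped, as in the Western study.
    return Counter((a, b) for a, b in zip(pcs, pcs[1:]) if a != b)

# An ascending Bhupali line, octave marks omitted
edges = notation_to_edges("S R G P D S")
```

Any two artists reading this line yield the same five edges, which is the prescription-versus-description problem in miniature.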
Use Saraga annotations to extract "skeleton" note sequences from multiple performances of the same raga. Build per-raga networks from those skeletons across several performances. Compare across ragas, gharanas, and time periods.
Problem: requires someone who understands both computational network analysis and ICM music theory deeply. That's a very small intersection of skills.
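The aggregation and comparison steps could look something like the sketch below. It pools edge counts from several performances into one per-raga network, then compares two networks by the Jensen-Shannon divergence of their normalized edge-weight distributions. The divergence is a choice made for this illustration, not something the source prescribes, and the toy edge counts are invented.

```python
from collections import Counter
from math import log2

def aggregate(performances):
    """Pool edge counts from several performances of the same raga."""
    total = Counter()
    for edges in performances:
        total.update(edges)
    return total

def js_divergence(a, b):
    """Jensen-Shannon divergence (base 2, range 0..1) between edge-weight
    distributions. Both networks must contain at least one edge."""
    keys = set(a) | set(b)
    sum_a, sum_b = sum(a.values()), sum(b.values())
    p = {k: a.get(k, 0) / sum_a for k in keys}
    q = {k: b.get(k, 0) / sum_b for k in keys}
    m = {k: (p[k] + q[k]) / 2 for k in keys}

    def kl(x):
        return sum(x[k] * log2(x[k] / m[k]) for k in keys if x[k] > 0)

    return (kl(p) + kl(q)) / 2

# Invented edge counts: two performances of one raga, one of another
perf_1 = Counter({("S", "R"): 3, ("R", "G"): 2})
perf_2 = Counter({("S", "R"): 1, ("G", "P"): 4})
raga_a = aggregate([perf_1, perf_2])
raga_b = Counter({("S", "g"): 5, ("g", "m"): 5})
```

A value of 0 means identical edge-weight distributions and 1 means no shared transitions. The same distance could be computed between gharanas or decades, given enough performances per group.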
If a well-executed version of this study were run on a decent Indian classical corpus:
Indian classical would likely outscore Western classical on network complexity metrics. The raga system is explicitly designed to maximize melodic variety within a constrained note set, which is exactly the kind of structure that produces high entropy, low reciprocity, and dense transition graphs. A raga with a wide aaroh-avroh and many vakra (zigzag) phrases would show extraordinary topological richness compared to a Pop song that essentially loops 4–6 chord-tone transitions.
Hindustani vs. Carnatic would show distinct topological signatures. Carnatic music's more rigid compositional structure (krithis follow a fixed pallavi-anupallavi-charanam layout) and Hindustani's freer alap, badhat, and taan improvisation would map to measurably different density and efficiency patterns.
Gharana differences might be detectable as sub-cluster variations within a single raga's network space. The Kirana school's meend-heavy approach vs. the Gwalior school's more angular contours would produce different edge-weight distributions even on the same raga.
The "simplification over time" trend might not apply, or might reverse. Contemporary ICM performers often add more notes, faster tempi, and cross-raga experiments. Post-2000 Hindustani experimental music may show increasing complexity — a striking counterpoint to the Western finding.
This is a blue ocean — nobody has built the "network science of Indian classical" at scale. The closest related work is Thakur et al.'s two-raga proof of concept. Everything else is audio-feature classification, not structural graph analysis.
The Nature paper's methodology is genuinely applicable to Indian classical music — the core insight (model pieces as weighted directed note networks, measure topology) translates directly. The challenge isn't the method, it's the data infrastructure and musicological encoding.
The Western finding — music has been structurally simplifying and homogenizing — is itself interesting for ICM. If Indian classical also shows this trend, it would be a striking confirmation across traditions. If it doesn't, it defines exactly what makes ICM different from Western popular music at the network level.
Either result is publishable. That's rare in computational musicology.