getLyricsBySongId

Add support for synchronized lyrics, multiple languages, and retrieval by song ID.

OpenSubsonic: Extension

OpenSubsonic version: 1

OpenSubsonic extension name: songLyrics (as returned by getOpenSubsonicExtensions)

Retrieves all structured lyrics from the server for a given song. The lyrics can come from embedded tags (SYLT/USLT), LRC file/text file, or any other external source.

http://your-server/rest/getLyricsBySongId

Parameters

| Parameter | Req. | OpenS. | Default | Comment |
| --- | --- | --- | --- | --- |
| `id` | Yes | Yes | | The track ID. |
| `enhanced` | No | Yes | `false` | When `true`, the response includes `cueLine` arrays and non-main `kind` tracks (translations, pronunciations). When `false` or omitted, only `kind="main"` entries are returned with no `cueLine` data. Added in `songLyrics` version 2. |
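As a rough illustration of how the two parameters combine, the sketch below assembles a request URL. The helper name `build_lyrics_url` is hypothetical, and the authentication and client parameters are simply the demo values used in this page's example URLs. A client would presumably confirm via getOpenSubsonicExtensions that the server reports `songLyrics` version 2 before sending `enhanced=true`, but that check is not shown here.

```python
from urllib.parse import urlencode


def build_lyrics_url(base_url, song_id, enhanced=False, **common):
    """Build a getLyricsBySongId URL (hypothetical helper, not part of the API)."""
    params = {"id": song_id, **common}
    if enhanced:
        # `enhanced` defaults to false, so it is only sent when requested.
        params["enhanced"] = "true"
    return f"{base_url.rstrip('/')}/rest/getLyricsBySongId.view?{urlencode(params)}"


# Demo credentials and client name are taken from the example URLs on this page.
url = build_lyrics_url(
    "http://your-server", "456", enhanced=True,
    u="demo", p="demo", v="1.13.0", c="AwesomeClientName", f="json",
)
print(url)
```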

Special notes about the lang field

Ideally, the server will return lang as an ISO 639 (2/3) code. However, tagged files and external lyrics can come with any value as a potential language code, so clients should take care when displaying lang.

Furthermore, there is special behavior for the value xxx. While not an ISO code, it is commonly used by taggers and other parsing software. Clients should treat xxx as not having a specified language (equivalent to the und code).
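One way a client might apply this rule before display is sketched below. The function name `normalize_lang` is hypothetical. Matching `xxx` case-insensitively and treating a missing or empty value as `und` are assumptions of this sketch, not requirements stated above.

```python
def normalize_lang(lang):
    """Map 'xxx' (and, by assumption, missing or empty values) to 'und'."""
    if lang is None:
        return "und"
    cleaned = lang.strip()
    # Assumption: taggers may write 'XXX' or 'xxx', so compare case-insensitively.
    if not cleaned or cleaned.lower() == "xxx":
        return "und"
    # Any other value is passed through untouched, since it may not be a valid ISO code.
    return cleaned


for raw in ("eng", "xxx", "ko-Latn", ""):
    print(repr(raw), "->", normalize_lang(raw))
```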

Example

http://your-server/rest/getLyricsBySongId.view?id=123&u=demo&p=demo&v=1.13.0&c=AwesomeClientName&f=json

Result

A subsonic-response element with a nested lyricsList element.

Version 1 (default)

{
  "subsonic-response": {
    "status": "ok",
    "version": "1.16.1",
    "type": "AwesomeServerName",
    "serverVersion": "0.1.3 (tag)",
    "openSubsonic": true,
    "lyricsList": {
      "structuredLyrics": [
        {
          "displayArtist": "Muse",
          "displayTitle": "Hysteria",
          "lang": "eng",
          "offset": -100,
          "synced": true,
          "line": [
            {
              "start": 0,
              "value": "It's bugging me"
            },
            {
              "start": 2000,
              "value": "Grating me"
            },
            {
              "start": 3001,
              "value": "And twisting me around..."
            }
          ]
        },
        {
          "displayArtist": "Muse",
          "displayTitle": "Hysteria",
          "lang": "und",
          "offset": 100,
          "synced": false,
          "line": [
            {
              "value": "It's bugging me"
            },
            {
              "value": "Grating me"
            },
            {
              "value": "And twisting me around..."
            }
          ]
        }
      ]
    }
  }
}

<subsonic-response status="ok" version="1.16.1" type="AwesomeServerName" serverVersion="0.1.3 (tag)" openSubsonic="true">
  <lyricsList>
    <structuredLyrics displayArtist="Muse" displayTitle="Hysteria" lang="eng" offset="-100" synced="true">
      <line start="0">It's bugging me</line>
      <line start="2000">Grating me</line>
      <line start="3001">And twisting me around...</line>
    </structuredLyrics>
    <structuredLyrics displayArtist="Muse" displayTitle="Hysteria" lang="und" offset="100" synced="false">
      <line>It's bugging me</line>
      <line>Grating me</line>
      <line>And twisting me around...</line>
    </structuredLyrics>
  </lyricsList>
</subsonic-response>

Subsonic: this endpoint does not exist in the original Subsonic API.
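For synced lyrics, a player typically needs the line that is active at the current playback position. The sketch below shows one way to look it up with a binary search. It assumes that `start` values are in milliseconds and that `line` is sorted by `start`. It deliberately ignores `offset`, whose sign convention is not described in this section. The function name `current_line_index` is hypothetical.

```python
from bisect import bisect_right


def current_line_index(lines, position_ms):
    """Return the index of the last line with start <= position_ms, or None."""
    starts = [line["start"] for line in lines]
    index = bisect_right(starts, position_ms) - 1
    return index if index >= 0 else None


# Lines copied from the version 1 example above.
lines = [
    {"start": 0, "value": "It's bugging me"},
    {"start": 2000, "value": "Grating me"},
    {"start": 3001, "value": "And twisting me around..."},
]

index = current_line_index(lines, 2500)
print(lines[index]["value"])  # prints "Grating me"
```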

Version 2 (`enhanced=true`)

When enhanced=true is passed, the response includes kind to classify lyric tracks, cueLine arrays with word/syllable-level timing, optional per-entry agents metadata for agent attribution, and additional tracks such as translations and pronunciations.

http://your-server/rest/getLyricsBySongId.view?id=456&enhanced=true&u=demo&p=demo&v=1.13.0&c=AwesomeClientName&f=json

{
  "subsonic-response": {
    "status": "ok",
    "version": "1.16.1",
    "type": "AwesomeServerName",
    "serverVersion": "0.1.3 (tag)",
    "openSubsonic": true,
    "lyricsList": {
      "structuredLyrics": [
        {
          "kind": "main",
          "lang": "ko",
          "synced": true,
          "line": [
            { "start": 2747, "value": "눈을 뜬 순간" },
            { "start": 6214, "value": "모든 게 달라졌어" }
          ],
          "cueLine": [
            {
              "index": 0,
              "start": 2747,
              "end": 6214,
              "value": "눈을 뜬 순간",
              "cue": [
                { "start": 2747, "end": 3018, "value": "눈" },
                { "start": 3018, "end": 3179, "value": "을" },
                { "start": 3179, "end": 3582, "value": " " },
                { "start": 3582, "end": 4100, "value": "뜬" },
                { "start": 4100, "end": 4500, "value": " " },
                { "start": 4500, "end": 5200, "value": "순" },
                { "start": 5200, "end": 6214, "value": "간" }
              ]
            },
            {
              "index": 1,
              "start": 6214,
              "end": 9000,
              "value": "모든 게 달라졌어",
              "cue": [
                { "start": 6214, "end": 6800, "value": "모" },
                { "start": 6800, "end": 7200, "value": "든" },
                { "start": 7200, "end": 7600, "value": " " },
                { "start": 7600, "end": 8000, "value": "게" },
                { "start": 8000, "end": 8400, "value": " " },
                { "start": 8400, "end": 9000, "value": "달라졌어" }
              ]
            }
          ]
        },
        {
          "kind": "translation",
          "lang": "eng",
          "synced": true,
          "line": [
            { "start": 2747, "value": "The moment I opened my eyes" },
            { "start": 6214, "value": "Everything had changed" }
          ]
        },
        {
          "kind": "pronunciation",
          "lang": "ko-Latn",
          "synced": true,
          "line": [
            { "start": 2747, "value": "nuneul tteun sungan" },
            { "start": 6214, "value": "modeun ge dallajyeosseo" }
          ],
          "cueLine": [
            {
              "index": 0,
              "start": 2747,
              "end": 6214,
              "cue": [
                { "start": 2747, "end": 3179, "value": "nuneul" },
                { "start": 3582, "end": 4100, "value": "tteun" },
                { "start": 4500, "end": 6214, "value": "sungan" }
              ]
            },
            {
              "index": 1,
              "start": 6214,
              "end": 9000,
              "cue": [
                { "start": 6214, "end": 7200, "value": "modeun" },
                { "start": 7600, "end": 8000, "value": "ge" },
                { "start": 8400, "end": 9000, "value": "dallajyeosseo" }
              ]
            }
          ]
        }
      ]
    }
  }
}

<subsonic-response status="ok" version="1.16.1" type="AwesomeServerName" serverVersion="0.1.3 (tag)" openSubsonic="true">
  <lyricsList>
    <structuredLyrics kind="main" lang="ko" synced="true">
      <line start="2747">눈을 뜬 순간</line>
      <line start="6214">모든 게 달라졌어</line>
      <cueLine index="0" start="2747" end="6214" value="눈을 뜬 순간">
        <cue start="2747" end="3018">눈</cue>
        <cue start="3018" end="3179">을</cue>
        <cue start="3179" end="3582"> </cue>
        <cue start="3582" end="4100">뜬</cue>
        <cue start="4100" end="4500"> </cue>
        <cue start="4500" end="5200">순</cue>
        <cue start="5200" end="6214">간</cue>
      </cueLine>
      <cueLine index="1" start="6214" end="9000" value="모든 게 달라졌어">
        <cue start="6214" end="6800">모</cue>
        <cue start="6800" end="7200">든</cue>
        <cue start="7200" end="7600"> </cue>
        <cue start="7600" end="8000">게</cue>
        <cue start="8000" end="8400"> </cue>
        <cue start="8400" end="9000">달라졌어</cue>
      </cueLine>
    </structuredLyrics>
    <structuredLyrics kind="translation" lang="eng" synced="true">
      <line start="2747">The moment I opened my eyes</line>
      <line start="6214">Everything had changed</line>
    </structuredLyrics>
    <structuredLyrics kind="pronunciation" lang="ko-Latn" synced="true">
      <line start="2747">nuneul tteun sungan</line>
      <line start="6214">modeun ge dallajyeosseo</line>
      <cueLine index="0" start="2747" end="6214">
        <cue start="2747" end="3179">nuneul</cue>
        <cue start="3582" end="4100">tteun</cue>
        <cue start="4500" end="6214">sungan</cue>
      </cueLine>
      <cueLine index="1" start="6214" end="9000">
        <cue start="6214" end="7200">modeun</cue>
        <cue start="7600" end="8000">ge</cue>
        <cue start="8400" end="9000">dallajyeosseo</cue>
      </cueLine>
    </structuredLyrics>
  </lyricsList>
</subsonic-response>
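To highlight individual words or syllables, a client can search a cueLine for the cue that covers the playback position. The following is a minimal sketch, assuming millisecond timestamps and half-open intervals (`start` inclusive, `end` exclusive). When `end` is omitted from the cues, falling back to the next cue's `start` (or the cueLine's `end`) is an assumption of this sketch rather than documented behavior. The function name `active_cue` is hypothetical.

```python
def active_cue(cue_line, position_ms):
    """Return the cue covering position_ms within one cueLine, or None."""
    cues = cue_line["cue"]
    for n, cue in enumerate(cues):
        if "end" in cue:
            end = cue["end"]
        elif n + 1 < len(cues):
            # Assumption: without end times, a cue lasts until the next cue starts.
            end = cues[n + 1]["start"]
        else:
            end = cue_line.get("end", float("inf"))
        if cue["start"] <= position_ms < end:
            return cue
    return None


# First pronunciation cueLine from the example above.
cue_line = {
    "index": 0,
    "start": 2747,
    "end": 6214,
    "cue": [
        {"start": 2747, "end": 3179, "value": "nuneul"},
        {"start": 3582, "end": 4100, "value": "tteun"},
        {"start": 4500, "end": 6214, "value": "sungan"},
    ],
}

print(active_cue(cue_line, 3000)["value"])  # prints "nuneul"
print(active_cue(cue_line, 3300))           # prints "None" (gap between cues)
```

With half-open intervals, a zero-duration cue never matches a position, which is consistent with treating it as an instantaneous marker.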

Example with background vocals (agents + agentId)

When a source distinguishes both a lead/default vocal layer and background vocals within the same lyric line, the server emits a shared agents array on that structuredLyrics entry and splits the lyric into separate cueLines with the same index. Each cueLine references one agent via agentId, and the cueLine whose referenced agent has role: "main" comes first:

{
  "agents": [
    { "id": "lead", "role": "main", "name": "Lead Vocal" },
    { "id": "backing", "role": "bg" }
  ],
  "cueLine": [
    {
      "index": 0,
      "agentId": "lead",
      "start": 1000,
      "end": 3000,
      "value": "Hello echo",
      "cue": [
        { "start": 1000, "end": 1400, "value": "He" },
        { "start": 1400, "end": 1800, "value": "llo" }
      ]
    },
    {
      "index": 0,
      "agentId": "backing",
      "start": 1000,
      "end": 3000,
      "value": "Hello echo",
      "cue": [
        { "start": 2000, "end": 2500, "value": "echo" }
      ]
    }
  ]
}

<agent id="lead" role="main" name="Lead Vocal" />
<agent id="backing" role="bg" />
<cueLine index="0" agentId="lead" start="1000" end="3000" value="Hello echo">
  <cue start="1000" end="1400">He</cue>
  <cue start="1400" end="1800">llo</cue>
</cueLine>
<cueLine index="0" agentId="backing" start="1000" end="3000" value="Hello echo">
  <cue start="2000" end="2500">echo</cue>
</cueLine>
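A client that renders background vocals on a separate row might first group the cueLines that share an `index`. The sketch below does that for the example above, keeping the server's order so the `main` layer stays first. The function name `layers_by_index` is hypothetical, and reading `role` with a default of `None` is a defensive assumption.

```python
from collections import defaultdict


def layers_by_index(entry):
    """Group cueLines by index as {index: [(role, cueLine), ...]} in server order."""
    roles = {agent["id"]: agent.get("role") for agent in entry.get("agents", [])}
    grouped = defaultdict(list)
    for cue_line in entry.get("cueLine", []):
        role = roles.get(cue_line.get("agentId"))
        grouped[cue_line["index"]].append((role, cue_line))
    return dict(grouped)


# Data copied from the background vocals example above.
entry = {
    "agents": [
        {"id": "lead", "role": "main", "name": "Lead Vocal"},
        {"id": "backing", "role": "bg"},
    ],
    "cueLine": [
        {"index": 0, "agentId": "lead", "start": 1000, "end": 3000, "value": "Hello echo",
         "cue": [{"start": 1000, "end": 1400, "value": "He"},
                 {"start": 1400, "end": 1800, "value": "llo"}]},
        {"index": 0, "agentId": "backing", "start": 1000, "end": 3000, "value": "Hello echo",
         "cue": [{"start": 2000, "end": 2500, "value": "echo"}]},
    ],
}

for index, layers in layers_by_index(entry).items():
    for role, cue_line in layers:
        text = "".join(cue["value"] for cue in cue_line["cue"])
        print(index, role, text)
```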

Example with multiple agents (TTML-style attribution)

When a source has multiple named singers (e.g. a duet from TTML with ttm:agent and ttm:name), the server stores those identities once in agents and each cueLine references the relevant singer or group via agentId:

{
  "agents": [
    { "id": "lead", "role": "main", "name": "Chris Martin" },
    { "id": "guest", "role": "voice", "name": "Jin" },
    { "id": "choir", "role": "group", "name": "All" }
  ],
  "cueLine": [
    {
      "index": 0,
      "agentId": "lead",
      "start": 1000,
      "end": 4000,
      "value": "You and I",
      "cue": [
        { "start": 1000, "end": 1800, "value": "You " },
        { "start": 1800, "end": 2400, "value": "and " },
        { "start": 2400, "end": 3200, "value": "I" }
      ]
    },
    {
      "index": 1,
      "agentId": "guest",
      "start": 4000,
      "end": 7000,
      "value": "Under this sky",
      "cue": [
        { "start": 4000, "end": 4800, "value": "Un" },
        { "start": 4800, "end": 5400, "value": "der " },
        { "start": 5400, "end": 5900, "value": "this " },
        { "start": 5900, "end": 7000, "value": "sky" }
      ]
    },
    {
      "index": 2,
      "agentId": "choir",
      "start": 7000,
      "end": 10000,
      "value": "Together tonight",
      "cue": [
        { "start": 7000, "end": 8000, "value": "To" },
        { "start": 8000, "end": 8800, "value": "ge" },
        { "start": 8800, "end": 9200, "value": "ther " },
        { "start": 9200, "end": 10000, "value": "tonight" }
      ]
    }
  ]
}

<agent id="lead" role="main" name="Chris Martin" />
<agent id="guest" role="voice" name="Jin" />
<agent id="choir" role="group" name="All" />
<cueLine index="0" agentId="lead" start="1000" end="4000" value="You and I">
  <cue start="1000" end="1800">You </cue>
  <cue start="1800" end="2400">and </cue>
  <cue start="2400" end="3200">I</cue>
</cueLine>
<cueLine index="1" agentId="guest" start="4000" end="7000" value="Under this sky">
  <cue start="4000" end="4800">Un</cue>
  <cue start="4800" end="5400">der </cue>
  <cue start="5400" end="5900">this </cue>
  <cue start="5900" end="7000">sky</cue>
</cueLine>
<cueLine index="2" agentId="choir" start="7000" end="10000" value="Together tonight">
  <cue start="7000" end="8000">To</cue>
  <cue start="8000" end="8800">ge</cue>
  <cue start="8800" end="9200">ther </cue>
  <cue start="9200" end="10000">tonight</cue>
</cueLine>
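To show who sings each line, a client can resolve every `agentId` against the entry's `agents` array. The sketch below is one possible approach. The function name `attribute_lines` is hypothetical, and falling back to `role` when an agent has no `name` is an assumption. The `cue` arrays are left out of the sample data for brevity.

```python
def attribute_lines(entry):
    """Return (label, text) pairs, one per cueLine, in server order."""
    agents = {agent["id"]: agent for agent in entry.get("agents", [])}
    result = []
    for cue_line in entry.get("cueLine", []):
        agent = agents.get(cue_line.get("agentId"), {})
        # Assumption: prefer the display name, then the role, then an empty label.
        label = agent.get("name") or agent.get("role") or ""
        text = cue_line.get("value") or "".join(
            cue["value"] for cue in cue_line.get("cue", [])
        )
        result.append((label, text))
    return result


# Agents and line values copied from the duet example above.
entry = {
    "agents": [
        {"id": "lead", "role": "main", "name": "Chris Martin"},
        {"id": "guest", "role": "voice", "name": "Jin"},
        {"id": "choir", "role": "group", "name": "All"},
    ],
    "cueLine": [
        {"index": 0, "agentId": "lead", "start": 1000, "end": 4000, "value": "You and I"},
        {"index": 1, "agentId": "guest", "start": 4000, "end": 7000, "value": "Under this sky"},
        {"index": 2, "agentId": "choir", "start": 7000, "end": 10000, "value": "Together tonight"},
    ],
}

for label, text in attribute_lines(entry):
    print(f"{label}: {text}")
```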

Response fields

| Field | Type | Req. | OpenS. | Details |
| --- | --- | --- | --- | --- |
| `lyricsList` | `lyricsList` | Yes | Yes | List of structured lyrics |

Implementation notes

Backward compatibility

Without enhanced=true, the response is identical to version 1:

Only kind="main" entries are returned (the kind field itself is omitted)
No cueLine arrays are included
The existing line array is always present and unchanged
cueLine is a parallel structure, not a replacement for line

Servers that don’t support TTML or word-level timing simply never include these fields. Clients that don’t support karaoke display simply ignore them.
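A client can handle both response shapes with the same code path by treating a missing `kind` as `main` and a missing `cueLine` as "no word-level timing". The sketch below illustrates this. The function names `split_tracks` and `has_word_timing` are hypothetical, and the sample data is trimmed from the examples on this page.

```python
def split_tracks(lyrics_list):
    """Split structuredLyrics into (main entries, other kinds)."""
    main, extras = [], []
    for entry in lyrics_list.get("structuredLyrics", []):
        # `kind` is omitted in version 1 responses, where every entry is a main track.
        kind = entry.get("kind", "main")
        (main if kind == "main" else extras).append(entry)
    return main, extras


def has_word_timing(entry):
    """True only for synced entries that carry a non-empty cueLine array."""
    return bool(entry.get("synced")) and bool(entry.get("cueLine"))


v1 = {"structuredLyrics": [
    {"lang": "eng", "synced": True,
     "line": [{"start": 0, "value": "It's bugging me"}]},
]}
v2 = {"structuredLyrics": [
    {"kind": "main", "lang": "ko", "synced": True,
     "line": [{"start": 2747, "value": "눈을 뜬 순간"}],
     "cueLine": [{"index": 0, "start": 2747, "end": 6214,
                  "cue": [{"start": 2747, "end": 3018, "value": "눈"}]}]},
    {"kind": "translation", "lang": "eng", "synced": True,
     "line": [{"start": 2747, "value": "The moment I opened my eyes"}]},
]}

for response in (v1, v2):
    main, extras = split_tracks(response)
    print(len(main), [e["kind"] for e in extras], has_word_timing(main[0]))
```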

cueLine behavior

cueLine data is only meaningful when synced=true. Servers must not emit cueLine arrays for unsynced lyrics.
Within a cueLine, cue.end must be either present on all cues or none (all-or-nothing). When the source provides partial end times, servers must fill missing values. When no cues have end times, end is omitted from all cues. This is a documented contract rule; the OpenAPI schema does not encode the all-or-none shape structurally.
agents is scoped to a single structuredLyrics entry and is optional for simple, unattributed, single-layer lyrics. When present, agents must contain at least one entry, and each agents[].id must be unique within that structuredLyrics entry. An entry that represents multiple vocal agents or layers must include agents. An entry with a single attributed or default agent may also include agents, in which case exactly one agent must use role: "main". agents should not be emitted without cueLine data.
When multiple cueLines share the same index, the cueLine whose referenced agent has role: "main" must come first. Clients should not assume every source can distinguish or emit multiple agents.
If agents is present, every cueLine in that entry must include agentId, and each agentId must match exactly one agents[].id in that entry. If agents is absent, cueLines must not include agentId.
Cues within a cueLine must not overlap (i.e. cue[n].end must be ≤ cue[n+1].start). Servers must normalize any source overlaps so that clients can iterate cues sequentially without overlap-resolution logic. Overlapping timing across different cueLines (different agentId values) is expected, since those represent parallel vocal layers.
Cues where start == end (zero-duration) may occur. Clients should treat these as instantaneous markers.
structuredLyrics entries are independent across kind tracks, including main. Clients should not assume 1:1 correspondence of line arrays or cueLine arrays between tracks.
Cue counts may differ across kind tracks for the same lyric passage. Clients should not assume 1:1 cue correspondence between tracks.
For right-to-left scripts (Arabic, Hebrew), cues are in logical reading order. Clients are responsible for bidi rendering.
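Several of the rules above can be checked mechanically, which may be useful in server tests or when debugging a client. The following sketch covers only a subset: cueLine on unsynced lyrics, empty or duplicate agents, agentId consistency, all-or-none `end`, and overlap within a cueLine. The function name `check_entry` and the wording of its messages are hypothetical, and the ordering and role rules are not checked.

```python
def check_entry(entry):
    """Return a list of rule violations found in one structuredLyrics entry."""
    problems = []
    cue_lines = entry.get("cueLine", [])
    agents = entry.get("agents")
    if cue_lines and not entry.get("synced"):
        problems.append("cueLine present on unsynced lyrics")
    agent_ids = None
    if agents is not None:
        agent_ids = [agent["id"] for agent in agents]
        if not agent_ids:
            problems.append("agents is present but empty")
        if len(set(agent_ids)) != len(agent_ids):
            problems.append("duplicate agents[].id")
    for cue_line in cue_lines:
        where = f"cueLine index {cue_line.get('index')}"
        if agent_ids is None:
            if "agentId" in cue_line:
                problems.append(f"{where}: agentId without agents")
        elif cue_line.get("agentId") not in agent_ids:
            problems.append(f"{where}: missing or unknown agentId")
        cues = cue_line.get("cue", [])
        with_end = sum("end" in cue for cue in cues)
        if 0 < with_end < len(cues):
            problems.append(f"{where}: end present on only some cues")
        elif cues and with_end == len(cues):
            # cue[n].end must be <= cue[n+1].start within one cueLine.
            if any(a["end"] > b["start"] for a, b in zip(cues, cues[1:])):
                problems.append(f"{where}: overlapping cues")
    return problems


bad = {
    "synced": True,
    "cueLine": [
        {"index": 0, "agentId": "lead", "start": 0, "end": 900,
         "cue": [{"start": 0, "end": 500, "value": "a"},
                 {"start": 400, "end": 900, "value": "b"}]},
    ],
}
print(check_entry(bad))
```

Overlap is only checked inside a single cueLine, since overlapping timing across cueLines with different `agentId` values is expected.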

Last modified March 26, 2026: feat: songLyrics extension version 2 — word/syllable-level timing (#218) (7072ab1)