Lyrical thoughts and moans and groans

HMMMM

Active member
I’ve just finished attempting to add lyrics and synchronize them with the music on my website. It’s a mess, but it has been fun! It all boils down to synchronizing lyrics.

Can anyone create a system that will automatically synchronize lyrics with songs across a large library (like 40,000 songs) using AI and metadata such as ID3 tags? This would involve:

  1. Reading the metadata (ID3 tags) from MP3 files.
  2. Fetching or generating the lyrics for each song.
  3. Synchronizing the lyrics with the song’s timeline (timestamps for each line or verse).
  4. Embedding the lyrics back into the MP3 file with synchronized timing.

Plan Outline for the Project:

This could be broken down into key components, and the tool would be designed to:

  1. Read the MP3 file’s metadata to get song details like artist, album, title, etc. This will help us match the song to its lyrics.
  2. Use an AI-powered tool or service to fetch lyrics for the given song. You could use databases like Genius API or Musixmatch API to get lyrics based on song metadata. Optionally, a model could be trained to extract lyrics from the song itself if they aren’t available.
  3. Sync the lyrics to the music: This is the most complex part. You would need to analyze the audio of the song to sync the lyrics with the music. Using audio analysis tools (like SonicAPI, Aubio, or machine learning models) to identify when specific words or phrases are sung would work. These timestamps would need to be calibrated to match the flow of the song accurately.
  4. Embed the lyrics into the MP3: After synchronizing the lyrics, we need to embed them back into the MP3 file using ID3 tags. The SYLT (Synchronized Lyrics) frame is the one designed for timed lyrics. The USLT (Unsynchronized Lyrics) frame holds plain text, although LRC-formatted text with timestamps can be stored there for players that parse it.
  5. Batch processing for efficiency: The program will need to handle batch processing efficiently to process thousands of songs at once. You could run it in parallel for multiple files to speed up the process.

Key Steps in More Detail:

Step 1: Use a Python library like eyed3 or tinytag to extract MP3 file metadata.

Example:

```python
import eyed3

def get_mp3_metadata(mp3_path):
    """Return (title, artist, album); any of them may be None."""
    audio_file = eyed3.load(mp3_path)
    # eyed3.load() returns None for files it cannot parse, and .tag is None
    # when the file has no ID3 tag at all.
    if audio_file is None or audio_file.tag is None:
        return None, None, None
    tag = audio_file.tag
    return tag.title, tag.artist, tag.album
```

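Step 2: To fetch lyrics from an API (e.g., the Genius API), search by song title and artist. The search result gives you a song page URL rather than the lyrics text, so the lyrics still have to be obtained from that page.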

Example using requests:

```python
import requests

def fetch_lyrics(song_title, artist_name):
    headers = {"Authorization": "Bearer YOUR_ACCESS_TOKEN"}
    # Pass the query via params so spaces and punctuation get URL-encoded.
    params = {"q": f"{song_title} {artist_name}"}
    response = requests.get("https://api.genius.com/search",
                            params=params, headers=headers, timeout=10)
    response.raise_for_status()
    hits = response.json()["response"]["hits"]
    if not hits:
        return None  # no match for this title/artist
    song_url = hits[0]["result"]["url"]
    # extract_lyrics_from_url is a placeholder you would have to write: the
    # search result is a page URL, not the lyrics text.
    return extract_lyrics_from_url(song_url)
```

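Step 3: This step needs audio analysis. Libraries like Aubio can detect onsets, beats, and tempo, but they do not recognize words. Matching known lyrics to the audio is usually called forced alignment, and it needs a speech or singing recognition model that outputs word-level timestamps, or a custom deep learning model trained on song lyrics and timings.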

A simpler method would involve:

  • Audio segmentation to identify the tempo and beats.
  • Speech-to-text or lyric recognition tools to detect the timing of lyrics.

For more precision, consider using an existing deep learning model or a hosted audio analysis service such as SonicAPI. Check first that the service actually returns lyric timestamps rather than only tempo and beat data.

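If a speech-to-text tool gives you word-level timestamps, one rough way to turn them into per-line timings is to align the recognized words against the known lyrics. The sketch below uses Python's standard difflib for that. The function names (`normalize`, `line_timestamps`) and the input shape (a list of `(word, start_seconds)` pairs) are assumptions for illustration, not the output format of any particular service, and a line's time is taken from its first matched word, so it can be slightly late when the first word was misheard.

```python
from difflib import SequenceMatcher

def normalize(word):
    # Lowercase and drop punctuation so "I've" and "ive" compare equal.
    return "".join(ch for ch in word.lower() if ch.isalnum())

def line_timestamps(lyric_lines, recognized):
    """Return (start_seconds or None, line) for each lyric line.

    recognized: list of (word, start_seconds) pairs from a speech-to-text
    tool (hypothetical input shape).
    """
    # Flatten the lyrics into words, remembering each word's line index.
    lyric_words, word_line = [], []
    for idx, line in enumerate(lyric_lines):
        for raw in line.split():
            word = normalize(raw)
            if word:
                lyric_words.append(word)
                word_line.append(idx)

    heard = [normalize(word) for word, _ in recognized]
    starts = [None] * len(lyric_lines)

    # Align the two word sequences; unmatched words are simply skipped.
    matcher = SequenceMatcher(a=lyric_words, b=heard, autojunk=False)
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            line_idx = word_line[block.a + k]
            if starts[line_idx] is None:
                starts[line_idx] = recognized[block.b + k][1]
    return list(zip(starts, lyric_lines))
```

Lines that come back with `None` had no matched words at all and would need manual timing or interpolation from their neighbours.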
Step 4: After synchronizing the lyrics with timestamps, embed them into the MP3. With the eyed3 library you can write the lyrics text (for example in LRC format, so the timings survive) to the USLT frame through the tag's lyrics accessor.

Example:

```python
import eyed3

def embed_lyrics_in_mp3(mp3_path, lyrics):
    audio_file = eyed3.load(mp3_path)
    if audio_file is None:
        raise ValueError(f"Not a readable MP3 file: {mp3_path}")
    if audio_file.tag is None:
        audio_file.initTag()  # create an ID3 tag if the file has none
    # tag.lyrics writes a USLT frame; pass LRC-formatted text to keep timings.
    audio_file.tag.lyrics.set(lyrics)
    audio_file.tag.save()
```

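Since a plain lyrics tag has no timing fields of its own, the timestamps have to travel inside the text. A minimal sketch of producing LRC-style text from timed lines follows. The helper names (`format_lrc_timestamp`, `to_lrc`) are made up for this example, and it assumes you already have `(start_seconds, text)` pairs from the synchronization step.

```python
def format_lrc_timestamp(seconds):
    """Format seconds as an LRC time tag of the form [mm:ss.xx]."""
    total_centis = round(seconds * 100)
    minutes, centis = divmod(total_centis, 6000)
    secs, centis = divmod(centis, 100)
    return f"[{minutes:02d}:{secs:02d}.{centis:02d}]"

def to_lrc(timed_lines):
    """Build LRC text from an iterable of (start_seconds, text) pairs."""
    # LRC players expect the lines in time order.
    ordered = sorted(timed_lines, key=lambda pair: pair[0])
    return "\n".join(f"{format_lrc_timestamp(t)}{text}" for t, text in ordered)
```

The resulting string is what you would pass as the lyrics argument when embedding.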
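Step 5: To handle 40,000 songs, you can run the script in batches, processing files in parallel (using multiprocessing or an async approach). Threads suit the network-bound lyric fetching, while CPU-heavy audio analysis benefits from separate processes.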

Example:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def process_mp3_file(mp3_file):
    title, artist, album = get_mp3_metadata(mp3_file)
    lyrics = fetch_lyrics(title, artist)
    # synchronize_lyrics is the Step 3 routine; it is not defined here.
    sync_lyrics = synchronize_lyrics(lyrics, mp3_file)
    embed_lyrics_in_mp3(mp3_file, sync_lyrics)

mp3_files = ["song1.mp3", "song2.mp3", "song3.mp3"]
with ThreadPoolExecutor(max_workers=8) as executor:
    futures = {executor.submit(process_mp3_file, f): f for f in mp3_files}
    for future in as_completed(futures):
        try:
            future.result()  # re-raises any exception from the worker
        except Exception as exc:
            print(f"{futures[future]} failed: {exc}")
```
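The example above covers the parallel part but not the batching. One simple way to split a big file list into batches is sketched below. `in_batches` is a hypothetical helper name, and the batch size is something you would tune yourself.

```python
def in_batches(items, batch_size):
    """Yield consecutive lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])
```

Each batch can then be handed to its own executor run, which keeps memory use bounded and makes it easier to resume after a failure.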

Challenges and Considerations:

  • Accuracy of Lyrics Sync: Automatically syncing lyrics is quite challenging, especially with songs that have fast lyrics or varying tempos. Using pre-trained models for speech recognition or music-specific analysis would help, but it may require fine-tuning.
  • Handling Missing Lyrics: Not every song will have available lyrics through APIs. A fallback could be to transcribe the vocals with a speech-to-text model (the optional approach in step 2 of the plan), or to flag the song for manual review.
  • Performance: Processing 40,000 songs requires optimization. Ensure you’re handling memory usage, concurrency, and large file sizes effectively.

Tools/Libraries to Consider:

  • eyed3: For reading/writing ID3 tags.
  • requests: For interacting with external lyric APIs.
  • Aubio: For audio analysis and tempo detection.
  • SonicAPI or Google Speech-to-Text: For more advanced audio-to-lyrics synchronization.
  • multiprocessing: For parallel processing.

Conclusion:

Creating an executable that processes 40,000 songs, extracts metadata, fetches lyrics, syncs them, and embeds them back into the MP3s is definitely feasible. The core challenge is syncing the lyrics with the audio, but leveraging existing APIs and audio analysis libraries can simplify this process.

 
My new website is a bit more complicated. It uses AJAX to spawn a new copy of itself, embedding server-run PHP code, which restarts the whole process. I'm encountering some teething problems, including CSS issues and a lyrics popup that isn't working.

The reading and dumping to SQL are faster now, and I've hooked the lyrics into JavaScript on the backend. The idea was to time the lyrics by breaking them down into sentences and calculating the duration for each word. Multiplying that duration by the number of words in a sentence would then give the timing for the sentence. It would have been great, but the issue is that the songs don't start with lyrics; they have an instrumental intro first. I wondered if the intro timings could be acquired but have not found anything to date. Have you any ideas?
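For what it's worth, the per-word idea can be sketched in a few lines. This is only an illustration under the assumption that every word takes the same time and that the intro length is already known; `estimate_line_starts` and its parameters are made-up names.

```python
def estimate_line_starts(lines, seconds_per_word, intro_seconds=0.0):
    """Return (start_seconds, line) pairs, assuming equal time per word."""
    timed = []
    elapsed = intro_seconds  # the first line starts right after the intro
    for line in lines:
        timed.append((elapsed, line))
        # Advance by this line's word count times the per-word duration.
        elapsed += len(line.split()) * seconds_per_word
    return timed
```

Real singing is nowhere near this even, so the results drift, but it gives a starting point to correct by hand.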

hugs D x
 
I’ve had another thought, and I believe this is doable.

Time A would be the timestamp when the last lyric finishes (the end of the lyrics), and Time B would be the timestamp when the song actually ends (the very last note, beat, or fade-out). Time B minus Time A gives the outro length. Subtracting the outro and the total length of the lyrics from the song length then leaves the time before the lyrics start, which is the intro length (assuming there are no long instrumental breaks in the middle).

The total length of the lyrics can be estimated as the number of words multiplied by the average time per word. That per-word time has to come from something other than the full track length (for example the tempo), because dividing the song length by the word count assumes the lyrics fill the whole song and leaves no intro at all. If we have the song length and the timings for each sentence, we should be able to work out the intro before the lyrics.

For example, if the lyrics take 8 seconds per word (based on BPM and track length) and there are 23 words in the song, the total length of the lyrics is 8 seconds × 23 words = 184 seconds. If the total song length is 202 seconds, subtracting the lyrics (184 seconds) leaves 18 seconds without lyrics. With no outro, that is 18 seconds of intro before the lyrics start; otherwise the outro (Time B minus Time A) has to be subtracted as well.
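The arithmetic above fits in a small function. This is just a sketch of the calculation: `estimate_intro` and its parameter names are made up, and the optional outro argument (Time B minus Time A) is an assumption added so the outro is not counted as intro.

```python
def estimate_intro(song_seconds, word_count, seconds_per_word, outro_seconds=0.0):
    """Estimate the intro length as song length minus lyrics minus outro."""
    lyric_seconds = word_count * seconds_per_word
    intro = song_seconds - lyric_seconds - outro_seconds
    if intro < 0:
        # The per-word estimate is too large for this track.
        raise ValueError("lyrics and outro are longer than the song")
    return intro

print(estimate_intro(202, 23, 8))  # 202 - 184 = 18
```

A negative result is a useful warning sign that the per-word time is off for that song.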

Hugs D
 