UTAU Sidequest
Written on January 26, 2026 by ._______166
Recently, I went on a few random sidetracks and ended up writing some tools for UTAU (yes, that UTAU), and discovered some things along the way.
This side project came about after I wrote a VocaDB metadata/yt-dlp wrapper tool for Vocaloid songs and saw that I still had OpenUTAU on my laptop from the last time I tried to get my computer to talk (long story, it did not sound that good).
So I got a different voicebank (Momone Momo, sorry about the fandom link, I couldn't find a better resource) and realised an issue: Momo (unlike Teto) is JP only, and (surprise) I don't speak Japanese. Luckily, OpenUTAU has a solution: a built-in EN-to-JP phonemizer.
And I wrote a small (at the time) script that would output note timings. Well... that script eventually became libmomone, a note timings command and, at some point, an UTAU project generator (for TTS/Talkaloid).
A week ago I thought it was ready. I was wrong: there was a bug in the initial version of libmomone. It used to calculate note length using the following line of Rust:
note_lengths.push((note.replace(".","").replace(",","").replace("?","").replace("!","").replace("'","").len()*180).max(360).to_string());
Well, that is actually wrong: the length grew linearly with the character count, so longer words were spoken too slowly (and, as you can see from the .max(360) bit, shorter words were too fast). I fixed it in f91ca56fbfd59e296b62348f9d14d1a403e061f3. It now calculates the length with let mut note_length = (base + vowels.sqrt() * scale) as i32; and then snaps it to UTAU's default 60-tick grid using note_length = snap_to_grid(note_length, 60);
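To make the difference concrete, here is a rough sketch of both formulas side by side. It is not the actual libmomone code: the function names, the way vowels are counted (plain ASCII a/e/i/o/u), and the base and scale values you pass in are all assumptions made up for illustration. Only the shape of the two formulas comes from the lines quoted above.

```rust
/// Old approach (as quoted above): strip some punctuation, then use
/// 180 ticks per remaining byte, with a floor of 360 ticks.
fn old_note_length(word: &str) -> usize {
    let stripped: String = word
        .chars()
        .filter(|c| !matches!(c, '.' | ',' | '?' | '!' | '\''))
        .collect();
    (stripped.len() * 180).max(360)
}

/// New approach: base + sqrt(vowels) * scale, truncated to i32.
/// ASSUMPTION: `vowels` is a simple count of ASCII vowels in the word;
/// the real code may derive it differently.
/// ASSUMPTION: `base` and `scale` are hypothetical parameters here.
fn new_note_length(word: &str, base: f64, scale: f64) -> i32 {
    let vowels = word
        .chars()
        .filter(|c| matches!(c.to_ascii_lowercase(), 'a' | 'e' | 'i' | 'o' | 'u'))
        .count() as f64;
    // The square root makes long words grow much more slowly than linearly.
    (base + vowels.sqrt() * scale) as i32
}
```

With a made-up base of 240.0 and scale of 120.0, "hello" comes out at 409 ticks and "audio" at 480 (before snapping), while the old formula gives "international" a whopping 2340 ticks.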
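(Also, if you are wondering why snap_to_grid() does ((value + grid / 2) / grid) * grid, it is because of rounding. Integer division throws away the remainder, so it first adds half the grid to the value, then divides by the grid, then multiplies the result back by the grid. That way the value lands on the nearest multiple of the grid instead of always being rounded down. It looks dumb, but it works.)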
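Here is that rounding trick as a tiny standalone function. The expression is the one quoted above; the i32 signature is my assumption (based on the as i32 cast), and the trick only rounds correctly for non-negative values, which is all a note length should ever be.

```rust
/// Round `value` to the nearest multiple of `grid` (ties round up).
/// ASSUMPTION: i32 arguments and a non-negative `value`; the signature
/// in libmomone may differ.
fn snap_to_grid(value: i32, grid: i32) -> i32 {
    // Adding half the grid first turns truncating integer division
    // into round-to-nearest.
    ((value + grid / 2) / grid) * grid
}
```

For example, snap_to_grid(409, 60) gives 420 and snap_to_grid(389, 60) gives 360. Note that very small values (below half the grid) snap to 0.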
The code is here. It is written in Rust and was tested on Arch Linux using OpenUTAU 0.1.565 and 桃音モモ連続音Soft (2011).
Important note: I am not a music person (I am an Android dev, ffs), so I had no idea what I was doing going into this. All this code is for Talkaloid (using singing synthesis software for talking), not for making music.