Lipsync

From Milo Modding Wiki

Every game Harmonix has released contains some form of lip sync.

Lip Sync Systems in Games

Guitar Hero

Guitar Hero has a very simple lip sync system. The mouth is either closed or open, and the singer always makes an “Ah” shape. The amount of opening cannot be controlled: it is either 0% or 100%. The mouth is controlled by note 108 on the T1 GEMS track (the guitar track).
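As a rough illustration of the on/off behaviour described above, the sketch below filters a list of note events down to mouth open/close changes. The event tuple layout and the `mouth_states` helper are hypothetical, and it assumes that a note-on for note 108 opens the mouth and the matching note-off closes it, which the text does not state explicitly.

```python
MOUTH_NOTE = 108  # note number given in the text above


def mouth_states(events):
    """Return (time_seconds, is_open) changes for the singer's mouth.

    `events` is a hypothetical list of (time_seconds, note, is_on) tuples,
    sorted by time. Assumption: note-on opens the mouth, note-off closes it.
    """
    states = []
    for time, note, is_on in events:
        if note != MOUTH_NOTE:
            continue  # every other note on the track is ignored here
        states.append((time, bool(is_on)))
    return states


events = [(0.0, 96, True), (1.0, 108, True), (1.5, 108, False), (2.0, 97, True)]
print(mouth_states(events))
```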

Guitar Hero II

For a feature that is hardly visible, Guitar Hero II has a fairly complex lip sync system.

All lip movements are now in a separate file that ends with the extension .voc. Harmonix could use the separated vocals to create lip movements. There are 16 different lip positions (called visemes). Each viseme is listed, along with time stamps that determine when to open the mouth. In addition, a “strength” value is introduced, meaning the mouth can open a little bit or all the way. Different visemes can be combined to create many mouth movements.

The different lip positions are named after the mouth shape they make.

The following is a list of all available lip movements:

Neutral, Eat, Earth, If, Ox, Oat, Wet, Size, Church, Fave, Though, Told, Bump, New, Roar, Cage

Try saying some of these words and notice the shape your lips make. For multi-syllable words, the viseme only makes the shape of the first syllable.

In addition to mouth movements, the voc file also contains data that controls the eyes and head. You can direct the singer to look a certain way, lift the eyebrows, and control the pitch, roll, and yaw of the head.

All animations smoothly transition into one another from the moment a viseme is mentioned. For example, if at 0:35 a viseme has strength 0, and at 0:40 the same viseme’s strength is 85%, it will smoothly transition from 0 to 85% over those 5 seconds. How finely the transition is sampled is set by an integer in the header of the voc file (the sample rate).
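The transition in the example above can be sketched as an interpolation between keyframes. The code assumes a straight linear ramp, which is a guess, since the text only says the transition is smooth. The `viseme_strength` helper and the keyframe layout are hypothetical.

```python
from bisect import bisect_right


def viseme_strength(keyframes, t):
    """Strength of one viseme at time t (seconds).

    `keyframes` is a list of (time_seconds, strength) pairs sorted by time.
    Assumption: strength ramps linearly between neighbouring keyframes and
    is held constant before the first and after the last keyframe.
    """
    times = [time for time, _ in keyframes]
    i = bisect_right(times, t)
    if i == 0:
        return keyframes[0][1]
    if i == len(keyframes):
        return keyframes[-1][1]
    (t0, s0), (t1, s1) = keyframes[i - 1], keyframes[i]
    return s0 + (s1 - s0) * (t - t0) / (t1 - t0)


# The example from the text: strength 0 at 0:35, 85% at 0:40.
keys = [(35.0, 0.0), (40.0, 0.85)]
halfway = viseme_strength(keys, 37.5)  # roughly half of 0.85
print(halfway)
```

Evaluating this function at a fixed step, for instance every 1/15 of a second, would give one value per sample at a 15 Hz rate.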

The lip sync is not always visible, because it only appears during certain camera cuts in-game. Since the camera cuts are randomized, you may not always see the singer singing, even if he’s in the shot!

Rock Band 1

Rock Band’s lip sync is very similar to Guitar Hero II’s. It still uses the voc file, and the base visemes are the same. However, a few more expressions have been added, allowing characters to show more emotion.

Unlike Guitar Hero II, characters always seem to be singing, even if the camera is far away, unless the vocalist is told to idle in the MIDI file.

Rock Band 2

The lip sync system received an overhaul with Rock Band 2. Lip sync is now stored in a container file called .milo_xxx (where xxx is either 360, ps3, or wii). The structure of the file and how the game reads it changed substantially (more on that below in the structure section). In addition, all the visemes that control the lip movements have been split into hi and lo, allowing animators to animate the upper and lower lips separately for more control. No additional expressions seem to have been added, and the lip sync always transitions at 30 frames per second. All songs released for Rock Band 1 received updated lip sync files for use in Rock Band 2, which are found on the disc.

Rock Band 3

Rock Band 3’s lip sync system is the same as Rock Band 2’s. However, the milo container can now hold up to 4 lip sync files so that each band member can sing or show emotions separately.

Rock Band 4

From the release of Rock Band 4 until November 29th, 2018, the lip sync data was quite similar to Rock Band 3’s. However, all lip sync was moved into a file called rbsong, which contained everything from the song’s metadata to animations for the venue. The actual lip sync data is almost exactly the same as in Rock Band 2 and 3, but some data was moved around.

After November of 2018, the song structure got an overhaul. Lip sync was moved out of the rbsong file and got its own file again, called lipsync_rb4. The visemes were not updated, but how the game reads the data changed significantly again. A configurable frame rate was added back, so transitions can be made smoother or rougher.

Structure

VOC file

VOC Header
| Name | Type | Size | Function |
| --- | --- | --- | --- |
| FACE | Byte Array | 4 bytes | Tells the game this file is for animation. All VOC files start with FACE |
| Sample Rate | Integer | 4 bytes | Sample rate of the transition between visemes. 1500 (DC 05 00 00) seems to be used for all files in both GH2 and RB and represents a 15 Hz sample rate |
| Unknown1 | Short | 2 bytes | Unknown, seems to always be 1 (01 00) |
| Developer Name | Integer + Byte Array | At least 4 bytes | The name of the developer. Is always Harmonix, but can be anything (or nothing). Size is always 4 bytes + the number of characters in the name |
| Unknown2 | Short | 2 bytes | Unknown, seems to always be 1 (01 00) |
| String1 | Integer + Byte Array | At least 4 bytes | A string of text. In Harmonix games, this always says "5 projects developed before 5/7/2007" but can contain anything |
| Unknown3 | Integer | 4 bytes | Listed as always 1000 with bytes E3 03 00 00. Note that E3 03 00 00 decodes to 995 as a little-endian integer; 1000 would be E8 03 00 00 |
| Unknown4 | Integer + Short | 6 bytes | Six zero bytes in a row. Function unknown |
| VOC Type | Short | 2 bytes | 1 for lip sync, 0 for face animations only (for example, the guitarist in GH2) |
| Song Name | Integer + Byte Array | At least 4 bytes | A string of text giving the name of the file used to generate the VOC file |
| Unknown5 | Short | 2 bytes | Unknown, seems to always be 3 (03 00) |
| File Size | Integer | 4 bytes | Complete file size of the VOC file |
| Unknown6 | Short | 2 bytes | Unknown, seems to always be 0 (00 00) |
| Viseme Count | Integer | 4 bytes | The number of visemes used in this VOC file |
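The header fields listed above can be read in order with Python's `struct` module. The following is a minimal sketch, not a verified parser. It assumes little-endian values (consistent with DC 05 00 00 being 1500) and that each "Integer + Byte Array" string is a 4-byte length followed by that many single-byte characters. It also assumes the sample rate is stored as Hz times 100, which is extrapolated from the single 1500 = 15 Hz data point. The `VocHeader`, `read_string`, and `parse_voc_header` names are hypothetical.

```python
import struct
from dataclasses import dataclass


@dataclass
class VocHeader:
    sample_rate_hz: float
    developer: str
    string1: str
    voc_type: int
    song_name: str
    file_size: int
    viseme_count: int
    end_offset: int  # offset of the first viseme structure


def read_string(data, pos):
    """Read a 4-byte little-endian length, then that many bytes of text."""
    (length,) = struct.unpack_from("<I", data, pos)
    start = pos + 4
    return data[start:start + length].decode("latin-1"), start + length


def parse_voc_header(data):
    if data[:4] != b"FACE":
        raise ValueError("not a VOC file: missing FACE magic")
    pos = 4
    (raw_rate,) = struct.unpack_from("<I", data, pos)
    pos += 4
    pos += 2  # Unknown1
    developer, pos = read_string(data, pos)
    pos += 2  # Unknown2
    string1, pos = read_string(data, pos)
    pos += 4  # Unknown3
    pos += 6  # Unknown4 (six zero bytes)
    (voc_type,) = struct.unpack_from("<H", data, pos)
    pos += 2
    song_name, pos = read_string(data, pos)
    pos += 2  # Unknown5
    (file_size,) = struct.unpack_from("<I", data, pos)
    pos += 4
    pos += 2  # Unknown6
    (viseme_count,) = struct.unpack_from("<I", data, pos)
    pos += 4
    # Assumption: the stored value is the rate in Hz multiplied by 100.
    return VocHeader(raw_rate / 100, developer, string1, voc_type,
                     song_name, file_size, viseme_count, pos)


def _s(text):
    """Encode a length-prefixed string for the synthetic sample below."""
    return struct.pack("<I", len(text)) + text.encode("latin-1")


# Synthetic header built only for this example; not taken from a real file.
sample = (b"FACE" + struct.pack("<IH", 1500, 1) + _s("Harmonix")
          + struct.pack("<H", 1) + _s("note") + struct.pack("<I", 1000)
          + bytes(6) + struct.pack("<H", 1) + _s("song.wav")
          + struct.pack("<H", 3) + struct.pack("<I", 1234)
          + struct.pack("<H", 0) + struct.pack("<I", 16))
header = parse_voc_header(sample)
print(header)
```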
Viseme Structure Header
| Name | Type | Size | Function |
| --- | --- | --- | --- |
| Unknown1 | Integer + Short | 6 bytes | Six zero bytes in a row. Function unknown |
| Unknown2 | Short | 2 bytes | Unknown, seems to always be 1 (01 00) |
| Viseme Name | Integer + Byte Array | At least 4 bytes | A string of text giving the name of the viseme |
| Unknown3 | Integer | 4 bytes | Is always 0 (00 00 00 00) |
| Unknown4 | Integer | 4 bytes | Is always 0 (00 00 00 00) |
| Event Number | Integer | 4 bytes | The number of events (strength changes) recorded for this viseme |
Viseme Event Structure
| Name | Type | Size | Function |
| --- | --- | --- | --- |
| Unknown1 | Short | 2 bytes | Unknown, seems to always be 0 (00 00) |
| Time | Float | 4 bytes | Time of the event in seconds |
| Strength | Float | 4 bytes | Strength of the event, from 0 to 1. Non-lip visemes can range from -4 to 4 |
| Unknown2 | Integer | 4 bytes | Is always 0 (00 00 00 00) |
| Unknown3 | Integer | 4 bytes | Is always 0 (00 00 00 00) |

The Viseme Event Structure repeats once per event, as many times as given by Event Number in the Viseme Structure Header.
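Reading one viseme, with its header followed by its repeated events, might look like the sketch below. It makes the same assumptions as the layouts above imply: little-endian values, 4-byte length-prefixed names, and 18 bytes per event (2 + 4 + 4 + 4 + 4). The `parse_viseme` and `read_string` helpers are hypothetical, and the sample bytes are synthetic.

```python
import struct

EVENT_SIZE = 18  # short + float + float + two integers


def read_string(data, pos):
    """Read a 4-byte little-endian length, then that many bytes of text."""
    (length,) = struct.unpack_from("<I", data, pos)
    start = pos + 4
    return data[start:start + length].decode("latin-1"), start + length


def parse_viseme(data, pos):
    """Parse one viseme starting at `pos`.

    Returns (name, [(time_seconds, strength), ...], next_pos).
    """
    pos += 6  # Unknown1 (six zero bytes)
    pos += 2  # Unknown2
    name, pos = read_string(data, pos)
    pos += 8  # Unknown3 and Unknown4
    (event_count,) = struct.unpack_from("<I", data, pos)
    pos += 4
    events = []
    for _ in range(event_count):
        # Only the leading short and the two floats are read; the two
        # trailing unknown integers are skipped by the EVENT_SIZE step.
        _unknown, time, strength = struct.unpack_from("<Hff", data, pos)
        events.append((time, strength))
        pos += EVENT_SIZE
    return name, events, pos


# Synthetic viseme "Eat" with two events, built only for this example.
sample = (bytes(6) + struct.pack("<H", 1)
          + struct.pack("<I", 3) + b"Eat" + bytes(8)
          + struct.pack("<I", 2)
          + struct.pack("<HffII", 0, 35.0, 0.0, 0, 0)
          + struct.pack("<HffII", 0, 40.0, 0.5, 0, 0))
name, events, end = parse_viseme(sample, 0)
print(name, events)
```

Calling `parse_viseme` repeatedly, passing each returned `next_pos` back in, would walk through as many visemes as the header's Viseme Count indicates.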

VOC Footer

After going through all visemes, all VOC files end with the following bytes: 00 00 00 00 00 00 0A D7 23 3E AE 47 61 3E 00 00 00 00 00 00 01 00 00 00 00 00 01 00 00 00 00 00 FF FF FF FF
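A quick sanity check for a file is to compare its last bytes against the footer quoted above. This is a small sketch; the `has_voc_footer` helper name is hypothetical.

```python
# Footer bytes copied from the text above (36 bytes in total).
VOC_FOOTER = bytes.fromhex(
    "00 00 00 00 00 00 0A D7 23 3E AE 47 61 3E 00 00 00 00 "
    "00 00 01 00 00 00 00 00 01 00 00 00 00 00 FF FF FF FF"
)


def has_voc_footer(data):
    """Return True if the byte string ends with the expected VOC footer."""
    return data.endswith(VOC_FOOTER)


print(len(VOC_FOOTER), has_voc_footer(b"FACE" + VOC_FOOTER))
```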

Lipsync file

Lipsync_RB4 file