I find your game marvelous and want to thank you for creating this amazing world!

However, from several pieces of feedback I've concluded that it is lacking one specific trait, which drags the immersive experience down a level. It feels like a silent film... which is completely understandable considering the estimated cost of hiring enough voice actors for the amount of dialogue.

Nevertheless, there is one option that is very affordable, especially for dialogue-rich games like GotT and, for example, Beholder 2. Beholder 2 uses this voiceover technique, which makes the game much more immersive than it would be without any voiceover at all.

The idea behind this technique is simply that you don't need every single sentence fully dubbed. A made-up language is used instead, one that sounds like babbling, similar to the Simlish used in The Sims games. This way each character needs only a few sets of generic samples, one set per type of response, matching the avatar's pose or expression. For example, 3 samples for each expression (surprised, angry, amused and so on) would be plenty. I did not count exactly how many types of avatar expressions the characters have, but let's estimate 10 for easy calculation.

10 expressions
3 samples to randomly choose from for varied impression
15 characters
10 x 3 x 15 = 450 samples, which don't even have to be as long as the full sentences. This is probably the highest possible figure, and it may easily be halved in the end.
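
For what it's worth, here is a rough Python sketch of the estimate above and of how a random sample could be picked for a line of dialogue. The function names and the file naming scheme are made up for illustration; I don't know how the game actually stores or plays its audio.

```python
import random

# Estimates taken from the numbers above.
EXPRESSIONS = 10             # rough guess at the number of avatar expressions
SAMPLES_PER_EXPRESSION = 3   # variants to choose from at random
CHARACTERS = 15


def total_samples(characters, expressions, samples_per_expression):
    """Upper bound on how many voice samples would need recording."""
    return characters * expressions * samples_per_expression


def pick_sample(character, expression, last_played=None, rng=random):
    """Pick one sample at random, avoiding an immediate repeat.

    Returns a hypothetical file name and the chosen index, so the caller
    can pass the index back as last_played for the next line.
    """
    choices = [i for i in range(SAMPLES_PER_EXPRESSION) if i != last_played]
    index = rng.choice(choices)
    return f"{character}_{expression}_{index}.ogg", index


print(total_samples(CHARACTERS, EXPRESSIONS, SAMPLES_PER_EXPRESSION))
```

Skipping the last played sample is optional, but it should keep the same babble from playing twice in a row when a character holds one expression across several lines.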

The made-up language would fit perfectly into this world, where speaking in sounds makes much more sense than English. Frogs, mice and rats could all sound different to match their natural noises.

To hear what this technique sounds like, listen to the YouTube video named
I could not find any other with audible voices.
I recommend skipping to 41:55, from where it is easily heard, though the samples are not as varied in the beta as in the final release.

Does it sound doable?
Do you want it to happen?
