Obsidian/Speech synthesis.md

## Goal
The goal is to write a text-to-speech program in which every sound is generated purely from mathematical equations, so that I end up with my own true 100% machine speech synthesis.

## Inspirations
These inspirations include some newer products and some very old software that is practically impossible to obtain and has therefore been almost completely forgotten.
### [MUSIC-N](https://en.wikipedia.org/wiki/MUSIC-N)
It was used to create [this famous piece of music](https://www.youtube.com/watch?v=41U78QP8nBk). If my guess is correct, it was made using formant synthesis. Since MUSIC-N is a product of Bell Labs and because of its age, it probably lives in an archive somewhere or may even have been lost.
### [gnuspeech](https://savannah.gnu.org/projects/gnuspeech)
It uses articulatory synthesis and is an open source project. Sadly, it only runs on extremely old versions of Mac OS X and on the BSD-based NeXTSTEP, which is closed source and as such practically impossible to get. These factors have led to gnuspeech being unmaintained and mostly forgotten.
### [UTAU](http://utau2008.web.fc2.com/)
I really like Utane Uta (Defoko) and Adachi Rei. Both of them are completely synthesised and use no voicebanks.
### [Vocaloid](https://www.vocaloid.com/en/)
Not the kind of speech synthesis I am going for, but I still really like them.

### SAM
Even after a short look at the source code of [this newer implementation](https://github.com/s-macke/SAM), I cannot easily tell which type of synthesis SAM uses. My current guess is that it uses some form of concatenative synthesis, since it combines phonemes to create words.

## Planning
### Spoken Language
The language is one of the most important aspects of this program and as such needs to be chosen carefully. The candidate languages were German, English and Japanese.
To help me decide which of those three languages to choose, I came up with these requirements:
- Small alphabet in order to minimise the amount of work required
- Individual letters stitched together into words should be understandable without any further processing
- The words need to be understandable even without proper emphasis, or without any emphasis at all

Since I speak German and English fluently and both have a relatively small alphabet, I chose to do both, starting with English.

### Programming Language
There are many options, but to help narrow them down, I came up with a few requirements.
- Should be functional (Haskell, for example) or procedural (C, for example)
- Compiles to a binary (should require no interpreter, so no Python or Java, for example)

### What type of speech synthesis
When I looked on [Wikipedia](https://en.wikipedia.org/wiki/Speech_synthesis#Synthesizer_technologies), I saw that there are multiple types of speech synthesis.
Since I am restricting myself to pure machine synthesis, I can only choose between two types:
- Formant synthesis
- Articulatory synthesis

I really want to do both, but I am going to start with articulatory synthesis, simply because it seems way cooler to me.

## History
- 2024-10-21 22:52:33 +02:00: initial commit
- 2024-10-25 00:58:30 +02:00: small progress on speech synthesis research