Try Orpheus TTS here
I don't see the 150M-parameter model. Can anyone share it?
Generate natural-sounding speech from text