Late to the party, so I'm answering mostly for future reference.
Advances in the field, plus Mozilla's mindset and agenda, led to these two projects toward that end:
The latter has a 12 GB dataset available for download. The former, to my understanding, lets you train a model on your own audio files.