Feedback

#1
by asigalov61 - opened

@Chickaboo Hey Lucas!

I am posting here to keep things organized and relevant...

I've reviewed your inference code notebook so here is what I think:

First of all, I think you are over-quantizing onsets and durations! 32 bins is far too few for good continuations, especially for performance piano. I understand that you are budgeting and optimizing, but the rule of thumb is that you simply must have at least 128 bins for onsets and durations to get good results. In fact, even 128 is barely enough, which is why I use 256 in my Orpheus.
This is the most significant issue I found with your implementation, and it is why some of your model's output samples do not sound smooth. It may also be a reason why you ended up with a rather high loss. Also, make sure that when/if you quantize, you use int() and not round().
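To illustrate what I mean, here is a rough sketch of that kind of quantization; the function name, the millisecond units, and the bin step sizes are my own examples, not your actual code:

```python
# Sketch only: names/units/resolutions are illustrative assumptions.

def quantize(value_ms, step_ms, num_bins):
    """Map a time value to a bin index by truncation: int(), not round()."""
    return min(int(value_ms / step_ms), num_bins - 1)

# 32 bins at 125 ms per bin: a 437 ms onset collapses into bin 3,
# losing the micro-timing that makes performance piano sound human.
print(quantize(437, 125, 32))     # -> 3

# 128 bins at 31.25 ms per bin preserves far more timing detail.
print(quantize(437, 31.25, 128))  # -> 13
```

Note how the coarse grid snaps very different onsets onto the same bin; that is exactly what makes continuations sound stiff.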

In regard to architecture: I think the model is too small for good results. I am assuming that you are going to make it larger in the next release. GDN may be different in that regard, but I still think it's a universal rule, so consider at least 1024 dim with at least 8 layers. Also, double-check your implementation for problems. I understand that you are exploring never-before-used tech, which can make troubleshooting difficult, but there are still reliable methods: for example, try to overfit on a tiny homogeneous dataset (e.g. POP909) and see if it works well. In fact, since you seem wary of scaling right away, I would highly recommend POP909 for your tests/demos. I can also recommend some nice tiny classical datasets if you do not want POP.
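The overfit test I am describing can be wrapped in a few lines; `train_step` here is just a stand-in for one optimization step of your model on a single fixed tiny batch, so the names and thresholds are my assumptions:

```python
# Minimal sketch of the overfit sanity check (names/thresholds assumed).

def overfit_check(train_step, max_steps=2000, target_loss=0.1):
    """Repeatedly train on one tiny fixed batch. If the model cannot
    drive the loss near zero, suspect an implementation bug before
    blaming the data or the model scale."""
    loss = float("inf")
    for step in range(max_steps):
        loss = train_step()
        if loss < target_loss:
            return True, step, loss
    return False, max_steps, loss
```

A healthy implementation should memorize a handful of POP909 pieces almost perfectly; if it plateaus, the bug is in the code, not the architecture.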

In regard to the posted samples: obviously, there is a lot of work that needs to be done. The samples are very nice all things considered, but many of them are far from the gold standard (MuseNet), so you definitely need to work towards improving the output. Since you are concentrating on piano continuations and long-term structure, definitely add bar tokens and, if possible, separate embeddings for onsets and durations. This is what allowed MuseNet to be very good at long-term structure. Please listen carefully to the reference samples I gave you (the MuseNet ones) so that you have a good idea of what it should sound like. Specifically, listen carefully to the Two Minutes to Midnight MuseNet sample and compare it to my Orpheus sample, which will let you clearly hear the difference between implementations, with MuseNet still being superior. By the way, do not mind that they are multi-instrumental - simply convert them to solo piano; it does not really matter in this case.
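By bar tokens I mean interleaving an explicit marker into the event stream at every bar boundary; the event format (onset_tick, token) and the 480 ticks per bar below are my own assumptions for illustration:

```python
# Sketch of inserting bar tokens into a note-event stream (format assumed).

BAR = "<bar>"

def add_bar_tokens(events, ticks_per_bar):
    """events: (onset_tick, token) pairs sorted by onset time.
    Emits a BAR marker at the start of every bar, so the model can
    anchor long-term structure to explicit bar boundaries."""
    out, current_bar = [], -1
    for onset, token in events:
        bar = onset // ticks_per_bar
        while current_bar < bar:
            current_bar += 1
            out.append(BAR)
        out.append(token)
    return out

notes = [(0, "C4"), (100, "E4"), (500, "G4"), (1100, "C5")]
print(add_bar_tokens(notes, 480))
# -> ['<bar>', 'C4', 'E4', '<bar>', 'G4', '<bar>', 'C5']
```

The separate onset/duration embeddings are the complementary half: give each attribute its own embedding table instead of flattening everything into one vocabulary, so the model does not have to relearn that an onset and a duration are different kinds of quantities.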

Anyways, this is all for now. I would be happy to elaborate more if you have any questions.

Please keep me posted on your progress and updates. I find your work very interesting.

Sincerely,

Alex.
