Pretty good!
The biggest issue with this is definitely the timing of the animation with the audio. I've seen this happen to so many animations, and tbh it's a really simple problem to fix, and if done correctly it can even help the animation get more views! You even tried doing lipsync (I applaud the effort, barely anyone even wants to touch that), so don't let all of that effort go to waste over desynchronized audio!
I suggest you do animations with all the scenes within one card, since if there are too many scenes, there's gonna be a lot of cards.