The audio is 19.5 seconds long. The speaker starts talking at 1.5 seconds, and the first word, "soki", ends at 1.9 seconds, so one "soki" lasts 1.9 - 1.5 = 0.4 seconds. Cutting the first 1.5 seconds leaves 19.5 - 1.5 = 18 seconds, and for about the last 0.3 seconds the word "soki" is not said, which leaves 18 - 0.3 = 17.7 seconds. Assuming every repetition takes the same 0.4 seconds, dividing the two values gives the number of repetitions: 17.7 / 0.4 = 44.25, rounded down to 44. So the speaker did not say "soki" a thousand times.
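
The same estimate can be written as a short Python sketch. The function name `estimate_repetitions` and its parameter names are made up for illustration, and the calculation assumes that every "soki" lasts the same 0.4 seconds with no pauses in between. `Decimal` is used only to keep the arithmetic exact.

```python
from decimal import Decimal


def estimate_repetitions(total, speech_start, first_word_end, trailing_gap):
    """Estimate how many times a word fits into the spoken part of a clip.

    All arguments are times in seconds (hypothetical parameter names).
    Assumes every repetition is as long as the first one.
    """
    # Length of one word: from the start of speech to the end of the first word.
    word_length = first_word_end - speech_start
    # Spoken time: drop the silence before speech and the gap at the end.
    speaking_time = total - speech_start - trailing_gap
    ratio = speaking_time / word_length
    # A partial repetition does not count, so truncate toward zero.
    return ratio, int(ratio)


ratio, count = estimate_repetitions(
    Decimal("19.5"), Decimal("1.5"), Decimal("1.9"), Decimal("0.3")
)
print(ratio, count)  # 44.25 44
```

Even if the word length were off by a wide margin, the count would stay far below a thousand: fitting 1000 repetitions into 17.7 seconds would need each one to last under 0.018 seconds.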