First, every game is different and calls for different methods. I can only attest to what works well for me.
1) @cybereality is correct, in my experience.
2) Node hierarchy is different for everyone, but what works best for me is to have a main node, "world", that holds the constant/persistent nodes, e.g. player, camera, etc. For the world and/or levels, I just add and remove them as necessary. This keeps load times low and reduces RAM usage. It also works well if you are streaming chunks in a large open world, or if you are going through level by level, keeping the previous and next levels in memory or loading them in the background.
3) AudioStreamPlayer is just a reference, like any other node AFAIK. What matters is how many streams are playing simultaneously. Most computers deal with that pretty well; I have never run into issues with sound taking up too much of my resource pool. Like anything else, you can optimize sound with some clever techniques IF it gives you trouble.
4) This seems like either a bug or something wrong in the code. If you are using very large sound files, I suppose it can do this when loading them into memory, but I'm pretty sure they get preloaded. It's better to use many small sound files (like an orchestra) than a few very large ones. It could also be unrelated to the sound and caused by the logic that runs while the sound is playing. You can easily test this by assigning a key to trigger the sound while you are motionless. It's not the most accurate test, but it could give you a hint that your logic is just too heavy, with many loops, lots of physics, or many deeply nested statements (if/else).
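To make point 2 a bit more concrete, here is a minimal sketch in plain Python rather than engine code. It only models the bookkeeping: a persistent world object keeps the previous, current, and next level loaded and drops everything else. The `World` class, the `go_to` method, and the `load_level` callback are all made-up names for illustration. In a real project the loader would be whatever your engine uses to load and instance a scene, and dropping a level would mean freeing its node.

```python
class World:
    """Stands in for the persistent root node; levels come and go beneath it."""

    def __init__(self, level_names, load_level):
        self.level_names = list(level_names)
        self.load_level = load_level            # hypothetical loader callback
        self.persistent = ["player", "camera"]  # never unloaded
        self.loaded = {}                        # level index -> loaded level
        self.current = None

    def go_to(self, index):
        """Make `index` the current level, keeping only its neighbours loaded."""
        if not 0 <= index < len(self.level_names):
            raise IndexError("no such level")

        # Window of levels to keep in memory: previous, current, next.
        wanted = {
            i for i in (index - 1, index, index + 1)
            if 0 <= i < len(self.level_names)
        }

        # Drop levels that fell outside the window.
        for i in list(self.loaded):
            if i not in wanted:
                del self.loaded[i]

        # Load whatever is missing; levels already in memory are reused.
        for i in sorted(wanted):
            if i not in self.loaded:
                self.loaded[i] = self.load_level(self.level_names[i])

        self.current = index


# Usage with a fake loader that just records what was requested.
loads = []

def fake_loader(name):
    loads.append(name)
    return {"name": name}

world = World(["forest", "cave", "castle", "tower"], fake_loader)
world.go_to(0)
world.go_to(1)
world.go_to(2)
```

Walking from level 0 to level 2 one step at a time loads each level only once, and the first level is dropped as soon as it falls outside the window. The window size of one level on each side is just my assumption; widen it if your levels are small.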
If you need more clarification, feel free to ask more questions. The more specific you can be, the better people here can help. Good luck :)