This makes them ideal for code that creates dozens of thousands of instances in servers and controls them from threads. Of course, it requires a bit more code, as this is used directly and not within the scene tree.
If you mean the statement above, I find it vague. It says "instances in servers", which suggests server-side objects manipulated from threads in Godot.
OK, my reason for raising an issue with creating threads in _physics_process() and waiting on them there is that it all smacks of large latencies introduced into your engine's fixed-rate calls, which must be budgeted alongside everything else you are doing.
You can only demand that it completes in a single frame if the total processing time of all the threads (including the round trip to the server) is under 1/60th of a second. I have found instancing scenes in _physics_process() to be a bad idea, and threads may be no different: they are usually somewhat costly to create and start.
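To put rough numbers on that budget: at Godot's default of 60 physics ticks per second, each tick gets about 16.67 ms for everything. The sketch below is plain Python arithmetic, not Godot code. The per-call costs are hypothetical inputs you would have to measure yourself, and it assumes the calls run back to back (the worst case, and roughly what you get if the server handles them one after another).

```python
# Rough physics-tick budget check (plain arithmetic, not Godot code).
# Assumption: calls run back to back, so their costs simply add up.

DEFAULT_TICKS_PER_SECOND = 60  # Godot's default physics rate


def tick_budget_ms(ticks_per_second: int = DEFAULT_TICKS_PER_SECOND) -> float:
    """Milliseconds available per physics tick."""
    return 1000.0 / ticks_per_second


def fits_in_tick(n_calls: int, cost_per_call_ms: float,
                 other_work_ms: float = 0.0,
                 ticks_per_second: int = DEFAULT_TICKS_PER_SECOND) -> bool:
    """True if n_calls of a measured cost, plus other per-tick work, fit in one tick."""
    total_ms = n_calls * cost_per_call_ms + other_work_ms
    return total_ms <= tick_budget_ms(ticks_per_second)


def max_cost_per_call_ms(n_calls: int, other_work_ms: float = 0.0,
                         ticks_per_second: int = DEFAULT_TICKS_PER_SECOND) -> float:
    """Largest average cost per call that still fits in one tick."""
    return (tick_budget_ms(ticks_per_second) - other_work_ms) / n_calls


if __name__ == "__main__":
    # 15 shape casts per tick, the count mentioned further down in this post:
    print(f"{max_cost_per_call_ms(15):.2f} ms per call")  # 1.11 ms per call
```

So with 15 casts per tick, each one has to average under about 1.11 ms before anything else in the game gets a slice, and any thread start-up overhead only eats further into that.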
I also wouldn't expect them to run synchronously with your _physics_process() call, or at any particular priority (the same, lower or higher) relative to it.
I think it might be cheaper to simply call ray_test() N times in _physics_process().
I don't know if intersect_ray() is guaranteed to be 'immediate', that is, not subject to a physics-frame delay, so that could be a cause of your issue. I played around with lots of calls using the direct state, and at some point _physics_process() was running well over a 60th of a second. I think I had 15 or so calls going, and these were motion casts on shapes (more expensive than ray casts).
My worry about the physics direct state had been that it might be stateful, at least on the Godot side. But distinct instances show the same problem, which rules that worry out.
I have experimented with physics direct state calls versus, say, using a RayCast node or 'casting' KinematicBodies with move_and_collide() in test-only mode (or test_move()), and found the latter faster and less hassle. At the very least, those nodes and calls were written with these specific tasks in mind, and they run in C++ rather than in script.
RayCast is nice because you set it up, enable it, wait one physics frame, and from then on it has a collision report ready for you whenever you want it. Just read the results; they keep updating as long as the node is enabled and in the scene tree.
So, if you know the RayCast node, you might try the same thing you are doing manually, but instance several RayCast nodes in your scene and have each of them do what ray_test() does.
Basically scoop up your code and stuff it in a RayCast.
For me, RayCasts only seemed to work if I made them look_at() their target (end) and set cast_to to Vector3.FORWARD scaled by the distance from the RayCast's origin to the target (end). Placing the node at 'start' and aiming it at 'end' is the way to go. The look_at() part may not be necessary for a statically positioned node, but I move mine around, which may have caused an issue that look_at() solved.
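Here is a plain-Python sketch of the geometry behind that setup, assuming Godot's conventions that FORWARD is -Z and that cast_to is expressed in the RayCast's local space. It builds a look_at-style basis whose -Z axis points from start to end (as far as I can tell this is the same cross-product order Godot 3's Transform.looking_at uses, but treat that as an assumption), sets cast_to to FORWARD scaled by the start-to-end distance, and maps that local vector back to world space to check that it lands on the target. The helper names (aim_ray, world_end, etc.) are my own, not Godot API.

```python
import math

FORWARD = (0.0, 0.0, -1.0)  # mirrors Godot's Vector3.FORWARD (-Z)
UP = (0.0, 1.0, 0.0)        # mirrors Vector3.UP


def add(a, b):
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def scale(a, s):
    return (a[0] * s, a[1] * s, a[2] * s)


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def length(a):
    return math.sqrt(a[0] ** 2 + a[1] ** 2 + a[2] ** 2)


def normalized(a):
    n = length(a)
    if n == 0.0:
        raise ValueError("zero-length vector")
    return scale(a, 1.0 / n)


def look_at_basis(eye, target, up=UP):
    """Basis columns (x, y, z) whose -Z axis points from eye toward target.
    Raises ValueError if up is parallel to the view direction (look_at() breaks there too)."""
    z = normalized(sub(eye, target))  # +Z points away from the target
    x = normalized(cross(up, z))
    y = cross(z, x)                   # already unit length: z and x are orthonormal
    return (x, y, z)


def basis_xform(basis, v):
    """Local vector -> world direction: v.x * X + v.y * Y + v.z * Z."""
    x, y, z = basis
    return add(add(scale(x, v[0]), scale(y, v[1])), scale(z, v[2]))


def aim_ray(start, end, up=UP):
    """Basis for a ray placed at start and aimed at end, plus its local cast_to."""
    basis = look_at_basis(start, end, up)
    cast_to = scale(FORWARD, length(sub(end, start)))
    return basis, cast_to


def world_end(start, basis, cast_to):
    """Where the ray's local cast_to ends up in world space."""
    return add(start, basis_xform(basis, cast_to))


if __name__ == "__main__":
    start, end = (1.0, 2.0, 3.0), (4.0, 6.0, 3.0)
    basis, cast_to = aim_ray(start, end)
    print(cast_to)  # (0.0, 0.0, -5.0)
    print(world_end(start, basis, cast_to))  # approximately (4.0, 6.0, 3.0)
```

This also shows why the look_at() matters once the node moves: cast_to lives in local space, so a fixed FORWARD * length only ends at the target while the node is actually aimed at it. If you would rather not rotate the node, the target expressed in the node's local space (in Godot 3, Spatial.to_local(end)) can be assigned to cast_to directly.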
-- I think the thing to say here is that the ray casts happen on the server side. Throwing threads at it on the Godot side doesn't change how the server decides to handle multiple requests. If it runs them synchronously and in its own threads (cores), I don't really think you gain anything.