Currently the program is single-threaded. Due to the sequential nature of events, the scope for multi-threading is limited. Also, multi-threading is complex and can create difficult-to-pin-down bugs, so there needs to be sufficient upside to go down that route.
Having said that, if I find an area that has performance issues and has potential for multi-threading, I will look into it at that point.
One thing that can cut down on the robustness issues is to do "SIMD" (Single Instruction, Multiple Data) parallelism rather than "MIMD" (Multiple Instruction, Multiple Data) parallelism when using threads.
MIMD is what most people think of/use when they're doing multi-threading: every thread is running its own little chunk of program, with different stack pointers and (potentially) different branching in the code. As you say, this is hard to keep robust, since the individual threads don't execute deterministically relative to each other, so you have to be careful about shared globals causing races, deadlock, etc.
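To make the hazard concrete, here's a minimal sketch (not Aurora code, just an illustration) of the classic shared-global problem: many workers doing a plain increment on a shared counter lose updates, while the atomic version doesn't.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

long unsafeCount = 0;
long safeCount = 0;

// MIMD-style hazard in miniature: many threads mutating a shared
// global. unsafeCount++ is a non-atomic read-modify-write, so
// concurrent increments can interleave and be lost.
// Interlocked.Increment performs the same update atomically.
Parallel.For(0, 100_000, _ =>
{
    unsafeCount++;
    Interlocked.Increment(ref safeCount);
});

Console.WriteLine(safeCount);   // 100000
Console.WriteLine(unsafeCount); // frequently less than 100000
```

The nasty part is that the unsafe version often prints the right answer anyway, which is exactly why these bugs are so hard to pin down.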
An example of SIMD is GPU programming, where you only do "data parallel" operations. The parallelism comes from running a small (in lines of code) loop that does the same operation on arrays of objects (rather than individual objects). For example: foreach(ii) { c[ii] = a[ii]*b[ii]; }. The idea is that the CPU is running the same instruction on all the data in parallel, so you can't put any branches in the loop that depend on the data. These terms originally came from supercomputing; the Connection Machine 2 was an example of a SIMD machine, while Intel clusters can be configured to do MIMD (in a multi-processing context). The old "vector" machines like Crays also fall into the SIMD category in terms of how you structure the program, although under the covers a vector machine performs the operations in the loop in a slightly different way than a SIMD machine would - the CPU is able to start the next floating point operation in a vector loop before it's finished the one it's working on. ("Vector" was synonymous with "array", and a vector operation was an operation performed in parallel on arrays of data.)
The thing you have to be careful with in vector/SIMD programming is that the different iterations of the loop are truly independent - the 9th iteration can't depend on the results of the 5th iteration. For things like sensor detection this shouldn't be a problem; for movement you would need to be careful to separate out TGs that are following other TGs (and have better initiative) and do those in a second, follow-up loop. Otherwise the thread updating the following TG might execute before the TG it's following has moved, and it might aim at the old location. This is MUCH simpler to manage conceptually, however, than the usual multi-threading business with global variables etc.
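The two-pass idea might look something like this sketch - the TG representation (parallel arrays, a following index of -1 meaning "independent") is entirely made up for illustration, not Aurora's actual data model:

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

// Four hypothetical TGs, SIMD-style: positions in one array,
// following[i] = index of the TG that TG i follows, or -1 for none.
double[] x = { 0.0, 10.0, -5.0, -7.0 };
int[] following = { -1, -1, 0, 1 };

int[] leaders = Enumerable.Range(0, 4).Where(i => following[i] == -1).ToArray();
int[] followers = Enumerable.Range(0, 4).Where(i => following[i] != -1).ToArray();

// Pass 1: independent TGs move (here: just advance by 1 unit).
// Each iteration writes only its own index, so no locking needed.
Parallel.ForEach(leaders, i => x[i] += 1.0);

// Pass 2: starts only after every leader has finished moving, so a
// follower always aims at its leader's NEW location, never the old one.
Parallel.ForEach(followers, i => x[i] = x[following[i]]);

Console.WriteLine(x[2]); // 1
```

The ordering guarantee comes entirely from the loop structure (pass 2 doesn't start until pass 1 returns), not from any per-thread synchronization - which is the whole appeal.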
The reason I'm going here is that C# has a lovely little threadpool mechanism and a "System.Threading.Tasks.Parallel" class that make it dead simple to code up data-parallel loops: https://docs.microsoft.com/en-us/dotnet/standard/parallel-programming/data-parallelism-task-parallel-library. The example they use in the link is overly ambitious in terms of MIMD vs. SIMD (since they're calling functions inside the body of the loop), but you can use the same machinery to code up small loops like the one I listed above, and the threadpool keeps the overhead of launching the threads small. I realized a few years back that vector code (written for e.g. a Cray) could be directly transcribed to multi-threaded code simply by switching all the parallel loops over to using Parallel. Disclaimer: I don't think I've ever actually used the Parallel stuff in C# myself, but I've got colleagues who have.
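For what it's worth, the c[ii] = a[ii]*b[ii] loop from earlier comes out like this with Parallel.For (array names are just the ones from the example):

```csharp
using System;
using System.Threading.Tasks;

int n = 1_000_000;
var a = new double[n];
var b = new double[n];
var c = new double[n];
for (int i = 0; i < n; i++) { a[i] = i; b[i] = 2.0; }

// Parallel.For hands chunks of the index range [0, n) to threadpool
// workers. Because iteration ii reads and writes only index ii, the
// iterations are independent and no locking is required.
Parallel.For(0, n, ii => c[ii] = a[ii] * b[ii]);

Console.WriteLine(c[10]); // 20
```

The body is exactly the scalar loop body; the only change is the loop header, which is what makes transcribing vector code so mechanical.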
The downside of the above is that the structure of the code tends to be very different - your core data objects when writing the logic of a function are vectors/arrays rather than the elements of the vectors that are used in "scalar" code. In other words, scalar code tends to have a big outer loop over an index, then very complicated logic with branching etc. inside that loop that is only working on the one slice of the data corresponding to the loop index. In vector code this is turned inside out (again, think GPU programming) - the complicated logic is on the outside and the individual operations in that logic are performed on entire arrays at a time. So, for example, a double temporary variable in scalar code is a single double; in vector code it's an array of doubles. This can lead to a big rearrangement of code for e.g. scientific programs that need to be data parallel everywhere.
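A tiny before/after sketch of that inside-out transformation (an invented inverse-square calculation, just to have something with a temporary in it):

```csharp
using System;
using System.Threading.Tasks;

int n = 4;
double power = 100.0;
double[] dx = { 1, 2, 3, 4 };
double[] dy = { 2, 4, 6, 8 };

// Scalar style: one big outer loop over the index; the temporary
// r2 is a single double, scoped to one slice of the data.
var scalarResult = new double[n];
for (int i = 0; i < n; i++)
{
    double r2 = dx[i] * dx[i] + dy[i] * dy[i];
    scalarResult[i] = power / r2;
}

// Vector style: each logical step becomes a whole-array operation,
// and the temporary r2 becomes an array of doubles.
var r2v = new double[n];
var vectorResult = new double[n];
Parallel.For(0, n, i => r2v[i] = dx[i] * dx[i] + dy[i] * dy[i]);
Parallel.For(0, n, i => vectorResult[i] = power / r2v[i]);

Console.WriteLine(scalarResult[0] == vectorResult[0]); // True
```

Same arithmetic, same answers - but the vector version needs extra array storage for the temporaries, which is part of why the rearrangement can be invasive.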
The good news is that Aurora probably has fairly self-contained loops that can be rearranged in isolation, so if you find hotspots in e.g. sensors or movement, it should be fairly easy to parallelize them.
Apologies if the stuff above is stuff you already knew about. Also, I'm not advocating that you jump in and parallelize right away, just making a Public Service Announcement that it's available.
Happy to discuss more if you're interested.
John