Hi slonjh
Thanks for your response. I was not aware that the substeps were already being cached, since I could not find a way to detect this in the game, nor any outside way to tell, so I assumed it was working the same way as the rest of the game. I will try to go over your different points.
1)
My "From what I can tell" is based on a combination different sources of information and some reasoning. The primary information source being the effect it appears to have on the game, especially concerning shut downs. From my experiance, it seems as if every actual action you take in the game is recorded in the database immidiately, since whenever the game shuts down to some random bug (like clicking the system name in the colony summery) every thing seems to be just as I left it, never missing a turn or some other modification. The second information source is small bits from different discussions of the game over different forums, where my conclussions seemed to be reinforced. Finally I have run some minor analysis of CPU use during some several second turns, where it only amounted to 11-12% on a 4 core CPU, and on closer analysis, it was one of the CPU's firing up at the end of the turn calculations, and not through the majority of the turn processing, which one would have expected if CPU was the major bottleneck. I could not get usable readings from my hard drive, due to not being able to get propor information of exact what type it is (I use an external hard drive for this, due to certain practicalities), and even if I did, without such knowledge they would be of limmited use. The next rational was a sort of possibility elimination, which goes as follows: Some of the problems with performance that people reported would not actually be that bad on the CPU, unless the algorithms and compiler/language was terribly slow. Here I am partially talking about the detection of other ships in a system. Here it was mentioned that when hundreds of ships had to detect each other, it would require the calculation of a large matrix, in other words it would be O(N*M) (where N and M are the ships from 2 factions). However, if we even assume that each faction has 1000 ships (way more than otherwise mentioned) it would only require 10^6 calculations, but almost every (mordern) CPU is capable of several 10^9 calculations per second per core, so this should at most take a few miliseconds, unless each calculation would take hundreds or thousand clock cycles, which I assumed to be unlikely. Even if they had to check on themselves too, it would only be O((N+M)^2) which would only be a factor 4 larger, and still nowhere near a problem range. Since the CPU could not account for the performance, even when the problem was increase significantly, there would have to be some other form of bottleneck. There simply were not that many other options that would make sense to give such problems, other data load problems. Considering that memory and caching just as often can be bottlenecks, even when a program only uses RAM, it seemed to be screaming at me that there would be a bottleneck if the data objects manipulated during turn processing had to be loaded and saved to a the database on a secondary hard drive.
Concerning the use of solid state drives (SSDs): they would actually be expected to perform relatively poorly with Aurora, since solid state drives are much slower at overwriting data than at reading it (technically, the flash cells take a long time to erase before they can be rewritten). Since Aurora has to write the new state to the database, which for consistency writes it to the drive, it will be overwriting a lot of data during each update, and one would therefore lose most of the gain from the improved read speed.
As I do not have access to the source code, and my distribution appears to be a compiled version, it would be quite hard to examine it with breakpoints. I would also rather not have to reverse engineer the assembly code, if that were the only way to do it.
2)
You have every right to be sceptical about the 100,000 factor, since in practice it could be several orders of magnitude smaller or larger (mostly smaller). This is partly due to the complexity of the general problem and exactly how one should measure it. Since I was looking into speedups related to database use, I searched for an article describing this property. What I found was: http://www.directionsmag.com/articles/ram-is-100-thousand-times-faster-than-disk-for-database-access/123964 (be aware that I cannot get hyperlinks to work in the preview). My main point is really that the scale of the potential (theoretical) speedup is massive, not just a factor of 2-10, which one could otherwise wave away far more easily if the costs were substantial.

The numbers you quoted concern sequential reads, which is generally the optimal scenario, where certain overhead costs can be ignored. Even if we could use those numbers, it would still amount to real time: reading and writing, say, 10 MB would still take hundreds of milliseconds. That alone is not enough to cover multi-second turn times, but it is fairly unlikely that we actually get long sequential reads and writes on the hard disk. The reason is twofold: first, our data is not large enough that we can ignore the seek overhead, which is on the order of milliseconds; secondly, we would expect the reads to happen as many separate operations. This is because each SQL request is usually processed atomically by the database, so they have to be separate reads, and if more complicated SQL queries are used, they have to read from different tables, each of which costs some seek time. This means that every SQL query can be expected to take several milliseconds, compared to the nanoseconds it would take if the data were primarily located in RAM.
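To illustrate the per-query cost, here is a minimal sketch using SQLite purely as a stand-in (whatever engine Aurora actually uses, the point is the same: each individually committed statement on a disk-backed database must reach the platter, while the in-memory version never touches the disk; the table and file names are hypothetical):

```python
import sqlite3
import time

def timed_small_updates(conn, n=200):
    # n single-row updates, each committed on its own so every one
    # must be made durable individually -- mimicking many small
    # atomic SQL statements during turn processing.
    cur = conn.cursor()
    cur.execute("CREATE TABLE IF NOT EXISTS ships (id INTEGER PRIMARY KEY, fuel REAL)")
    cur.executemany("INSERT OR REPLACE INTO ships VALUES (?, 100.0)",
                    [(i,) for i in range(n)])
    conn.commit()
    start = time.perf_counter()
    for i in range(n):
        cur.execute("UPDATE ships SET fuel = fuel - 1 WHERE id = ?", (i,))
        conn.commit()  # on a disk-backed DB this forces a flush to storage
    return time.perf_counter() - start

for target in ("bench_disk.sqlite", ":memory:"):
    conn = sqlite3.connect(target)
    print(f"{target}: {timed_small_updates(conn):.3f} s for {200} commits")
    conn.close()
```

On a spinning disk the disk-backed run is dominated by the per-commit flush, i.e. milliseconds per statement, which is exactly the overhead I am arguing about.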
I will agree that it is very hard to get good performance numbers on caching schemes, and if there is some more sophisticated caching scheme running, the numbers will come out very differently. The problem here is that persistence seems to be enforced, which means everything has to be written down to the hard drive and read back up from it (otherwise we would not see everything intact after a crash). Since caches are voided when such a crash happens (the operating system takes over those resources, so they are effectively dirtied), we are working in a situation where caching is not really helping us, and it can therefore be ignored.
Your point about Amdahl's law was actually already considered: I mentioned earlier that other bottlenecks would probably rise, which is why I guessed the actual performance increase would probably be on the order of 100-10,000. Part of my point is precisely that since the hard disk read/write times constitute such a huge proportion of the runtime (according to my rationale), there is not much point in improving performance in areas unrelated to this; those gains would end up minuscule, since this bottleneck dominates. That is exactly the point of Amdahl's law.
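As a quick illustration of that cap (the 99% and 99.9% fractions below are made-up figures, not measurements):

```python
def amdahl_speedup(p, s):
    # Overall speedup when a fraction p of total runtime is
    # accelerated by a factor s (Amdahl's law).
    return 1.0 / ((1.0 - p) + p / s)

# Even a theoretical 100,000x on the I/O itself is capped by
# whatever fraction of the turn is NOT disk-bound:
print(amdahl_speedup(0.99, 100_000))   # ~100x if disk I/O is 99% of turn time
print(amdahl_speedup(0.999, 100_000))  # ~990x if it is 99.9%
```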
3)
I will admit that since I have no access to the code base, nor any experience with Visual Basic, I found it wiser not to give actual recommendations for exactly how one should move data to RAM. There are several approaches, among them the steps you mention Steve has already taken, and some of them would have quite severe development costs. Nor do I have any idea how flexible the code base is in this regard, and how costly it would be to rewrite it in different ways (since that depends on the former). Among the possible approaches would be: more bulk-like loading from the database, doing the actual sorting in RAM instead of via several SQL queries; simply using the constructs he already uses for substeps over longer periods, and only writing back to the database more rarely; and finally, using a database that is loaded entirely into RAM, which would then load from and save to the disk less often. I suppose the method with the lowest development cost is the latter (a RAM-based database), since it would mainly require converting the SQL queries to a slightly different database dialect; a sketch of the idea follows below.
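Here is a minimal sketch of what that RAM-based-database option could look like, again using SQLite purely as a stand-in (the table and file names are hypothetical):

```python
import sqlite3

# Run turn processing against an in-memory database and only
# persist it to disk at coarse checkpoints (every X days of game
# time, or on a proper shutdown) instead of after every action.
mem = sqlite3.connect(":memory:")
mem.execute("CREATE TABLE ships (id INTEGER PRIMARY KEY, fuel REAL)")

def checkpoint_to_disk(path="aurora_checkpoint.sqlite"):
    # sqlite3's backup API copies the whole in-memory database to a
    # disk file in one sequential pass -- far cheaper than thousands
    # of individually committed random writes.
    disk = sqlite3.connect(path)
    mem.backup(disk)
    disk.close()

# ... all reads and writes during a turn go against `mem` ...
checkpoint_to_disk()
```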
Consistency would have to go down with this, because consistency is what we are paying such a huge performance price for. While a database could still be used for long-term consistency, one could relax the requirement that every action be included, so that, say, only the last turn on an X-day schedule (or of course a proper termination of the program) would send data to the database for long-term storage.
I will agree with you that it is probably quite unrealistic to expect a full rewrite. If one ever did happen, though, this would make it even more appealing, since better performance would come automatically from eliminating this kind of problem.
Also, thanks for the link to the blog. I do not usually follow blogs, so I was not aware of it.