Posted March 14, 2016
:D ..um, well, yes. But quantum computing as a concept relies on producing very complex instruction sets that execute virtually instantly, and so on. And this is... very old theory, from the 70s, before the x86 approach proved to be the most useful one. PPC and various SIMD- and MIPS-type processors survived, though, in anything from game consoles to mp3 players. And it's of course very obvious that relying on linear processing power increasing forever is not realistic.
For example, most if not all of the performance increases on Intel hardware since the Core 2 Duo have been made on the microcode level, not through hardware improvement. The engineering advances have made the cores smaller, but the fundamental problem - that performance on silicon only holds up to a certain temperature and power limit - follows it around. It's impressive to get very high performance down on a smaller chip with massively lower power consumption than before. But the last 10 years have been scraping the barrel, technologically speaking.
So the optimisations are essentially drawn out of reducing x86 atomic instructions that recur very often to the stored product of the process. On the very low level now - you have a reasonably complex instruction word, and when it's split up, you add one register to another, subtract the third, add the tail, increment, etc. Something like that. So when the result completes (which takes several clock cycles), the result of that initial operation is kept, and can be produced instantly without actually performing the reduction again. That's efficient, and increased cache sizes - and therefore hit rates - essentially do 90% of that. Since a lot of these operations also run over and over again for more complex high-level tasks, the optimisation gain is fairly high.
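A rough software analogy of that (just a sketch - this is not how microcode or the hardware caches are actually implemented, and the function and its steps below are made up for illustration): keep the finished result of a recurring sequence around instead of redoing the work.

    #include <cstdint>
    #include <map>
    #include <tuple>

    // Hypothetical "decoded operation": a handful of register-style steps
    // fused into one function, roughly like the sequence described above.
    std::uint64_t decoded_op(std::uint64_t a, std::uint64_t b, std::uint64_t c)
    {
        std::uint64_t r = a + b;   // add one register to another
        r -= c;                    // subtract the third
        r += (r & 0xFF);           // add the tail
        return r + 1;              // increment
    }

    // Keep the result keyed on the inputs, so a recurring sequence is
    // "produced instantly" instead of recomputed - the same idea as a
    // cache hit in hardware, just done by hand in software.
    std::uint64_t cached_op(std::uint64_t a, std::uint64_t b, std::uint64_t c)
    {
        static std::map<std::tuple<std::uint64_t, std::uint64_t, std::uint64_t>,
                        std::uint64_t> cache;
        const auto key = std::make_tuple(a, b, c);
        const auto it = cache.find(key);
        if (it != cache.end())
            return it->second;             // hit: no work performed
        return cache[key] = decoded_op(a, b, c);
    }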
Then there are the CISC principles. On a mathematical level you can deduce that a certain series of operations always produces the same result as a less complex instruction, for example. There are numerous examples of sequences of these very common instructions that can be logically reduced to different, less complex ones. So while the instruction set offered on the processor doesn't actually compute it directly without subdividing it, when that sequence occurs it can be shortened. So a good compiler will try to reduce these known sequences into batches, and further increase the optimisations of the kind in the paragraph above. This subset is where Intel wins out over AMD in tests.
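The compiler-side version of this is classic strength reduction and peephole rewriting. A trivial sketch (the exact transforms are entirely compiler-dependent):

    // Two ways of writing the same computation. A decent optimiser will
    // typically reduce both to the same cheaper instruction sequence -
    // e.g. the multiply by 8 becomes a shift, or folds into a single LEA on x86.
    unsigned scaled_sum(unsigned x, unsigned y)
    {
        return (x * 8) + y;      // as written: a multiply plus an add
    }

    unsigned scaled_sum_reduced(unsigned x, unsigned y)
    {
        return (x << 3) + y;     // the reduced form the compiler aims for
    }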
At the same time, none of these optimisations are required for the code to run on an x86-compatible processor; it just won't run as fast.
Outside of that - there's also the problem, of course, that no amount of platform optimisation can rescue extremely inefficient code. Although people claim otherwise, the truth is that well-written code will always be faster than what an automatic compiler can make of sloppy code. If we assume programmers generally are not very good, make heavy use of high-level SDKs, and so on - then the real-world output is of course different. Then the hand-written code should have as linear a running time as possible, and avoid anything that might blow up to a much longer running time, even if it could occasionally finish faster. That's where we get the "law" that we should always program with a linear run-time, no matter how high the constant, since in "practice" the linear cost is still lower than an exponential run-time element. We of course know that's not always true, but it's still done that way.
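A toy illustration of that trade-off (timings deliberately left out - the point is only the shape of it):

    #include <cstddef>
    #include <unordered_set>
    #include <vector>

    // O(n^2) duplicate check: tiny constant, no allocations. Often wins
    // for very small inputs despite the worse growth rate.
    bool has_duplicate_quadratic(const std::vector<int>& v)
    {
        for (std::size_t i = 0; i < v.size(); ++i)
            for (std::size_t j = i + 1; j < v.size(); ++j)
                if (v[i] == v[j]) return true;
        return false;
    }

    // O(n) duplicate check: much larger constant (hashing, allocation),
    // but the growth rate wins as n gets big. This is the "take the linear
    // version, whatever the constant" rule in practice.
    bool has_duplicate_linear(const std::vector<int>& v)
    {
        std::unordered_set<int> seen;
        for (int x : v)
            if (!seen.insert(x).second) return true;
        return false;
    }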
There are also the extended instruction sets (encryption, for example, or video encode and decode - frequently a target here) - and that's probably where the biggest increase in performance will be over the next ten years: in specific compilation optimisations designed to execute specific routines on specific memory areas the platform will crunch. ..It's no different from Nintendo's black boxes on the SNES cartridges, really.. But again, the package will still execute all the instructions without those extended instructions, just not as fast.
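A minimal sketch of that fallback point: the same routine written twice, once plain and once with SSE intrinsics. Which one you dispatch to at run time depends on what the CPU reports (via cpuid or similar); the plain one always works, just slower.

    #include <immintrin.h>   // x86 SSE/AVX intrinsics
    #include <cstddef>

    // Scalar fallback: runs on any x86-compatible processor.
    float dot_scalar(const float* a, const float* b, std::size_t n)
    {
        float sum = 0.0f;
        for (std::size_t i = 0; i < n; ++i)
            sum += a[i] * b[i];
        return sum;
    }

    // SSE path: same result (up to floating-point rounding), four lanes
    // at a time. Only worth selecting when the CPU reports the feature.
    float dot_sse(const float* a, const float* b, std::size_t n)
    {
        __m128 acc = _mm_setzero_ps();
        std::size_t i = 0;
        for (; i + 4 <= n; i += 4)
            acc = _mm_add_ps(acc, _mm_mul_ps(_mm_loadu_ps(a + i),
                                             _mm_loadu_ps(b + i)));
        float lanes[4];
        _mm_storeu_ps(lanes, acc);
        float sum = lanes[0] + lanes[1] + lanes[2] + lanes[3];
        for (; i < n; ++i)       // leftover elements
            sum += a[i] * b[i];
        return sum;
    }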
And we want to keep that general approach to programming on a conceptual level, because it means your compiler always has the same target, and the target can always be executed on different hardware. That's the strength of all of this, and it's not a bad thing at all to have a general "programming language" at the low level. Just imagine if we had one instruction set for AMD and another one for Intel, with each of their horrible in-house compilers possibly working in concert with some programming languages but not well with others, etc. It'd be the end of open source, for one. A common target has proven to be the most useful way to go about executing high-level code consistently.
Not least because you know that the low-level limitations then won't force high-level conventions into the programming languages.
Alas, that is of course the result anyway, when we have such a thing as particular ways to compile with extended platform-dependent instruction sets. And arguably, what we've seen with C++ a lot is that we know beforehand that certain types of routines run much, much faster than anything else. So you start making these shortcuts that actually lock you into a specific type of hardware. And that's where we get things such as a particular benchmark being the gold standard for measuring performance - while it really gives no indication of how fast the computer will actually finish "real world" tasks. The amount of favoritism and convention here frankly borders on the religious. Although C++ on its own is completely flexible, and allows anything from a high-level, object-oriented, completely hardware-agnostic approach all the way down to near-assembly implementations, specific memory addressing and manual handling of locks on specific memory areas, and so on. So it's not actually the programming language that causes this, but the practical output you get with a particular approach.
What could conceivably replace that? Thing is, before something actually does replace common programming conventions, it has to be preceded by an approach to programming that actually allows that alternative to exist. It's not going to be a new language that everyone switches to. It's going to have to be a structure and an approach to high-level programming that then, on the low level, potentially benefits in huge and massive amounts from parallel distribution and execution of routines.
And that's complex. It means that you would have to design your code on the high level to actually subdivide the task into what you already now determine is going to potentially be parallelizable. And you need real talent to see that. It's easy with graphics and per-pixel operations, of course, because each result only ever depends on what's directly next to the origin pixel, without fail. All the graphics card logic, and the way SIMD is used on graphics cards, depends on that. The entire industry of graphics cards with hardware instruction sets (of the "extended instruction set" kind in the example above) comes from that: the ability to parallelize the task with 100% predictability in graphics contexts.
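The per-pixel case in its simplest form (the OpenMP pragma is just one assumed way of splitting the loop - the point is that every iteration is independent):

    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // Per-pixel brightness scale: every output pixel depends only on the
    // corresponding input pixel, so every iteration is independent and the
    // loop can be split across any number of cores, SIMD lanes or shader
    // invocations with 100% predictability.
    void scale_brightness(std::vector<std::uint8_t>& pixels, float gain)
    {
        #pragma omp parallel for   // assumes OpenMP (-fopenmp); illustrative only
        for (std::ptrdiff_t i = 0; i < static_cast<std::ptrdiff_t>(pixels.size()); ++i) {
            const float v = pixels[i] * gain;
            pixels[i] = static_cast<std::uint8_t>(v > 255.0f ? 255.0f : v);
        }
    }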
But if you wanted to do this with more complex objects, that rely on potentially a lot of other objects, a very large number of functions, complex math, and so on, then you're suddenly running into a problem. Because when things are not 100% parallelizable, the performance drops utterly and completely - the serial part quickly dominates, so with, say, 95% of the work parallelizable, 64 cores still give you at most roughly a 15x speedup. It's just not useful.
So that works against you if you design graphics code like that. You still can't get above a certain level of processing power without essentially making the shader code obsolete. OpenCL is a good answer to that, because it attempts to push incrementally more complex math onto the peripheral cards. But from a more tech-nerd perspective it's crossing the stream to fetch water - a very roundabout way of doing it.
And... what you want to do instead is to program the complex instruction sets manually. To create routines that execute that complex math in, for example, the same pattern as you would execute shader code on a graphics card - but routines that also rely on that more complex "CPU math".
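A hypothetical sketch of what that could look like: structure the CPU code like a shader kernel - one pure per-element function - but let it use the heavier math a shader pipeline handles poorly. All names below are invented for illustration.

    #include <cmath>
    #include <cstddef>
    #include <vector>

    struct Body { double mass, x, v; };

    // The "kernel": reads only its own element plus shared constants, so
    // invocations stay independent and can be farmed out like shader work,
    // while still using doubles, branches and transcendentals freely.
    Body step(const Body& b, double dt)
    {
        Body out = b;
        const double a = std::exp(-b.x * b.x) / (b.mass + 1e-9);  // arbitrary heavy math
        out.v += a * dt;
        out.x += out.v * dt;
        return out;
    }

    void step_all(std::vector<Body>& bodies, double dt)
    {
        for (std::size_t i = 0; i < bodies.size(); ++i)
            bodies[i] = step(bodies[i], dt);   // each iteration independent
    }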
But it's not something that will turn up with a new language, or even with hardware that is capable of doing this. As explained, the principles have been around since the 70s, and we've had numerous examples of this in consumer electronics to varying degrees (although not to the level of the Cell BE... rip... but be that as it may).
Instead it's going to have to rely on a different approach to programming, one that doesn't reduce to very simple solutions anyone can do. And that's... not an easy sell, right..? Not against "predictable platform increase every single year". It's like selling home-made buns as a market concept against a deli chain. Doesn't happen.