So far I have tried:

NMS options: the 30 fps cap gives 25 fps instead of 30 fps, i.e. constant judder, so the option is unusable.
AMD frame limiter: doesn't work
RivaTuner Statistics Server: doesn't work
D3DOverrider: duh, it's only for D3D games, not OpenGL ones
DXtory: frame limiter doesn't work
Perkel: So far I have tried: (...)
Can I ask why you're trying to lock your frame rate at 30?
..to avoid overheating, to limit the traffic between the graphics card and the bus (which avoids microstutter during compute-type operations), and to produce a more stable picture?

Just a guess!

And they've done something in the experimental branch that improves part of the rendering pipeline.. not sure what. Presumably it means they intend to solve it eventually. But yeah, they have something that makes a specific pass run as fast as possible while a different part of the pipeline is normally the limit. And I'm pretty sure that freeing up the graphics card so it starves the compute-type process will decrease the stutter slightly. Not a solution, obviously.. not all that great to have something like that at launch, really.
nipsen: ..to avoid overheating, to limit the traffic between the graphics card and the bus (which avoids microstutter during compute-type operations), and to produce a more stable picture?
Water cooling and/or fans always comes down to having proper hardware...

Microstuttering is caused by sudden frame rate drops... uncapped fps will help more, because you are less likely to notice a drop from 80 fps to 40 fps than one from 30 to 15.
15 is unplayable.
40 is meh, playable, but meh..

Frame rate limiting is only there to prevent screen tearing when the system is TOO powerful for the game.

Vsync is frame limiting to the refresh rate to prevent screen tearing... custom settings for custom problems.

If your frame rate is lower than the cap... that means your system cannot even maintain the cap.
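(A quick back-of-the-envelope way to see the point about drops, if anyone wants it - this is just my own illustration in C++, nothing from either post: the same halving of frame rate costs far more frame time at 30 fps than at 80 fps.)

```cpp
#include <cstdio>

// Frame time in milliseconds for a given frame rate.
static double frame_time_ms(double fps) { return 1000.0 / fps; }

int main() {
    // A drop from 80 to 40 fps adds ~12.5 ms per frame...
    std::printf("80 -> 40 fps: %.1f ms -> %.1f ms (+%.1f ms per frame)\n",
                frame_time_ms(80), frame_time_ms(40),
                frame_time_ms(40) - frame_time_ms(80));
    // ...while a drop from 30 to 15 fps adds ~33.3 ms per frame,
    // which is why it reads as a visible stutter.
    std::printf("30 -> 15 fps: %.1f ms -> %.1f ms (+%.1f ms per frame)\n",
                frame_time_ms(30), frame_time_ms(15),
                frame_time_ms(15) - frame_time_ms(30));
    return 0;
}
```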
Post edited August 16, 2016 by Regals
nipsen: ..to avoid overheating, to limit the traffic between the graphics card and the bus (which avoids microstutter during compute-type operations), and to produce a more stable picture?
Regals: Water cooling and/or fans always comes down to having proper hardware...

Microstuttering is caused by sudden frame rate drops... uncapped fps will help more, because you are less likely to notice a drop from 80 fps to 40 fps than one from 30 to 15.

Frame rate limiting is only there to prevent screen tearing when the system is TOO powerful for the game.
Actually, no. Once upon a time, pretty much all chips were made so that the delta in power draw between "idle" and "full load" was very small. So if you managed to get a chip running at all, you only had to adjust a little bit downwards to properly cool the peak power draw. The downside was that the south- and northbridge components on Intel platforms, for example, might then run above what they were designed for, and you could get errors on memory and IO operations that you wouldn't get normally. And looked at more abstractly, in terms of actual program throughput, what you were really doing by raising the clock speed was making sure the bus transport was used more optimally. There is literally no such thing as a code snippet that runs faster only because the clock speed on the processor is higher (with the exception of L2/L3 cache hits without resubmits). In fact, if you look at process diagrams for a heavily overclocked old Pentium 3 running something other than synthetic max-load code, what you actually see when you overclock it right is that the processor returns somewhat complex assembly results to IO a bit faster, and then the process idles for longer. So what really speeds up the process isn't that the processor runs more operations in a given period of time, but that it returns a completed operation to the bus quicker.

The way chips are made now is a bit different. All CPU-type processors have boost states to exploit this effect more deliberately (read: 90% of the performance increases you see in processors come from more strategic use of the actual resources on the chip). Basically, the processor "overclocks" in small bursts way over specification, and runs at a higher bus frequency.

So the delta between maximum and minimum power draw is much wider by design.

On graphics cards, this has lately been taken to a completely new level. All Nvidia cards are essentially capable of running at insanely high clock speeds in bursts (that's what makes the Kepler cards such good overclockers - they're designed to run in load bursts).

And the reason is that for normal linear graphics operations, like shader runs and so on, what you need is short bursts of power. As with the overclocked Pentium 3, the actual process diagram has relatively long pauses with low activity, so what you need is a way to burst during the commit while turning the power down during idle. This strategy is what makes Pascal perform so well (it's really a laptop chip, like Maxwell, designed specifically for higher efficiency per watt rather than overall power draw, so the strategy is to have elements on the chip that can be enabled and disabled at run-time to add more grunt when needed).

It makes sense, too, because a normal 3D game, as explained, has very little need for sustained high performance according to the process diagrams (or rather, boosting all components is a very inefficient way to increase performance). What you need instead is to increase performance during the commits, before the process returns. So you get something like 1ms of burst performance, followed by 10ms of idle and bus transport/preparation.

So what happens if you run complex compute-type operations on these cards? It's a complete crisis, because well-designed compute is intended to run over and over again without resubmits/preparation. So the burst length increases, and suddenly the cards run at an overall higher power draw. On top of that, the internal bus, especially on the "GTX" Nvidia cards, is not designed to deal with context switches - it's designed for linear bursts - so even though the chip on your Maxwell and Kepler cards is the same between the "professional" Quadro cards and the GTX cards, the function set is limited.

There are sometimes architecture differences too. But on the Kepler GTX cards, for example, the chip is literally the same one as on the Quadro cards; the only difference is the software/firmware with the scheduler/queue-depth planner.
(...)

So here's the best-case "theoretical" scenario on a GTX card during compute runs: the compute task runs during the idle time (those 9ms for every burst that happens every 10ms), increases the power draw, but still maintains the same burst speed during normal runs.

The worst case scenario is a bit different. What you get instead is that the preparation stage stalls, because the context switches are very slow. Not only that, the return time during a compute run can be extremely variable.

So imagine a process diagram for a compute run that needs to complete every frame. What you're looking at is that the shader-operation burst run needs to complete once every 16.6ms to maintain 60fps. The actual run might take as little as 2-3ms, but the preparation time might increase this to 6ms. We're still good, though, for the most part. So you still get 60fps no problem.

Now we add the compute run. There's a 10ms window of inactivity on the GPU bus, so this should be a piece of cake, right? Let's imagine the context shift takes 5ms. If the compute run then completes in 5ms, we're still pushing 60fps, since the response time for a completed operation is less than 16.6ms. And most compute runs are like that - the preparation time is high, and the actual run completes in a very short time.

Turns out it's not that simple, because the shader run has to complete before the compute task can start its context shift - it effectively locks all the resources on the graphics card until the frame is generated. So if you get a frame drop for a short period, every frame generated suddenly has an extra tail, and there's no functionality to halt the compute runs. This is when you get the insane frame drops: when the shader run is very slow (as it sometimes is), the extra tail still has to complete for each of the delayed frames.

So - what if you could limit the max framerate to 30fps? Now you have 33ms to work with, and the compute run has plenty of time to complete even if the context shifts are slow. In effect, ahead of the framerate crunch, you hope to release the locks in time for the tail to complete. And you get overall smoother performance, without the irregularly displaced process diagram.
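(To put the arithmetic above in one place - a small sketch using the illustrative numbers from this post: a ~6ms shader pass, a ~5ms context shift and a ~5ms compute run. They're example figures, not measurements from NMS or any real card.)

```cpp
#include <cstdio>

// Illustrative per-frame costs from the post above (not measured values):
// a shader pass of ~6ms including preparation, a ~5ms context shift into
// compute, and a ~5ms compute run that must finish before the next frame.
constexpr double kShaderMs = 6.0;
constexpr double kContextShiftMs = 5.0;
constexpr double kComputeMs = 5.0;

static void check_budget(double target_fps) {
    const double budget_ms = 1000.0 / target_fps;  // 16.6ms at 60fps, 33.3ms at 30fps
    const double frame_ms = kShaderMs + kContextShiftMs + kComputeMs;
    std::printf("%.0f fps: %.1f ms of work vs a %.1f ms budget -> %s (%.1f ms of slack)\n",
                target_fps, frame_ms, budget_ms,
                frame_ms <= budget_ms ? "fits" : "misses",
                budget_ms - frame_ms);
}

int main() {
    check_budget(60.0);  // barely fits: ~0.7 ms of slack, so any shader hiccup spills over
    check_budget(30.0);  // plenty of headroom for slow context shifts and delayed tails
    return 0;
}
```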

(...)


There are much better strategies for handling this (like using compute for indirect calculations instead of per frame, and only sending results to rendering once you have the information needed - basically, you don't need to render everything equally accurately). But it's complex, and it doesn't really solve the underlying issue, which is that the hardware we have now simply has architectural problems with performing complex math concurrently.

So ideally, you would have several processors with instruction sets that guarantee completion within such and such a number of ms. Then you could plan the process diagram properly and know that you will have completed everything within those 16.6ms or 33ms.

The problem is that we don't have computers like that. PCs are essentially linear, and you can only "cheat" concurrent processes by having very quick context shifts/time-sharing/multithreading. Actual multitasking at the hardware abstraction level simply doesn't exist. You're always running a process and flagging an interrupt that then releases the hardware.

And planning that type of IO execution is extremely difficult. Intel's solution is their extended instruction sets (read: SSE), where you can complete more complex operations in as little as a single clock cycle and guarantee the return and release. So the time-sharing works more efficiently if you can move certain functionality to the extended instruction set. Compiler optimisation can make use of this to shorten the time it takes a complex math function to complete (or to make a repetitive simple math operation complete faster). But the fundamental problem - execution time that varies purely depending on when the context shift happens - still exists.
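(For the curious, a minimal sketch of what "moving functionality to the extended instruction set" looks like in practice: adding four pairs of floats with a single SSE instruction via compiler intrinsics. Assumes an x86 compiler with SSE support; it's a generic illustration, not anything NMS-specific.)

```cpp
#include <xmmintrin.h>  // SSE intrinsics
#include <cstdio>

int main() {
    alignas(16) float a[4]   = {1.0f, 2.0f, 3.0f, 4.0f};
    alignas(16) float b[4]   = {10.0f, 20.0f, 30.0f, 40.0f};
    alignas(16) float out[4];

    // One ADDPS instruction adds all four lanes at once, instead of
    // four separate scalar additions.
    __m128 va = _mm_load_ps(a);
    __m128 vb = _mm_load_ps(b);
    _mm_store_ps(out, _mm_add_ps(va, vb));

    std::printf("%.1f %.1f %.1f %.1f\n", out[0], out[1], out[2], out[3]);
    return 0;
}
```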


In the same way, the "multicore" processors we have now are not actually multicore. What they do is divide the process load at the assembly level across several cores, so that each single operation can execute faster. Likewise, much of the architectural advance actually comes from being able to context shift faster and more reliably, to shorten the resubmit phase. At the assembly level, when you execute functions that repeatedly reduce long, predictable functions into a smaller output, this is invaluable. But there is actually no such thing as concurrent high-level processes running.

Uh.. cough... so yeah, limiting the framerate to avoid resource crunch/starved processes makes sense.
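(And for anyone who wants to see what a cap actually does mechanically: most external limiters boil down to sleeping out the rest of each frame's budget. A minimal sketch of that idea - a generic render loop with the per-frame work left as a hypothetical render_frame() stub, not NMS's or any particular tool's code.)

```cpp
#include <chrono>
#include <thread>

// Sleep-based frame limiter: after each frame's work, sleep until the next
// ~33.3ms boundary so the loop never runs faster than ~30 fps.
int main() {
    using clock = std::chrono::steady_clock;
    const auto frame_budget = std::chrono::microseconds(33333);  // ~30 fps

    auto next_frame = clock::now() + frame_budget;
    for (int frame = 0; frame < 300; ++frame) {
        // render_frame();  // hypothetical: whatever work the game does per frame

        // If we finished early, give the rest of the budget back; this is the
        // "release the locks in time for the tail" idea from the post above.
        std::this_thread::sleep_until(next_frame);
        next_frame += frame_budget;
    }
    return 0;
}
```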


In this specific case, with NMS, it's possible that increasing the shader-type runs pre-empts the compute runs, starves that process (which likely still has a fixed execution time once prepared), and essentially allows an overall higher framerate, since the real-time computation/compute preparation then isn't scheduled as often.


So as I said, it's probably a scheduling problem that causes the heavy stutter on all rigs, high end and low end.
...and I broke the boards again.
Post edited August 16, 2016 by nipsen
nipsen: Wall of pointless text
If you knew what I did for a living, you would know how fucking useless any of the crap you wasted time saying to me is. I didn't read it; it's a waste to me.

Good fuck, you might as well have just posted the book you read it off of...

https://www.youtube.com/watch?v=azM6xSTT2I0

Difference between you and me... I know the truth, child.
Post edited August 16, 2016 by Regals
Perkel: So far I have tried:

NMS options: the 30 fps cap gives 25 fps instead of 30 fps, i.e. constant judder, so the option is unusable.
(...)
Unless your monitor is natively 30Hz (is that even a thing??), limiting it to that is pointless, especially when this game does strange things the lower you set the frame cap. As Regals said, if you use vsync it doesn't matter if you have the game on the MAX frame cap; you will still only draw at your monitor's native refresh rate. For example, my monitor is 60Hz, so with vsync and MAX (160) on, it's still only going to do 60fps.
nipsen: Wall of pointless text
Regals: Good fuck, you might as well have just posted the book you read it off of...
lol.. which book would that be, then? "I hate Intel, x86, and the entire industry is a sham and a dead end: practical application of the last spasmic throes to extend the current ISA"?

Knew I should have copyrighted that one a while ago.