Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimized Render code for TextBuffers #2865

Open
wants to merge 2 commits into
base: master-MC1.7.10
Choose a base branch
from

Conversation

cam72cam
Copy link

@cam72cam cam72cam commented Jun 17, 2018

This PR replaces the existing infrastructure for TextBuffer Rendering with a texture sheet.

Benefits:

  • Significantly less OpenGL calls
  • Dramatic increase in FPS on certain GPUS (see below for stats)
  • Only re-draws the parts of the screen that need it
  • Viewing the screen both in world and in window draw a single quad each

TODO:

  • Changing screen resolution does not stretch as expected. I would argue that this is a feature, but it could be considered a bug

Stress Test Program (Courtesy of Ristelle):

local component = require("component")
local gpu = component.gpu
local r = {"00","33","66","99","cc","ff"}
local g = {"00","24","49","6d","92","b6","db","ff"}
local b = {"00","40","80","c0","ff"}
local x,y = gpu.getResolution()
local rr = 1
local rg = 1
local rb = 1
while true do
    rr = rr+ 1
    if rr == 6 then
        rr = 1
        rg = rg + 1
    end
    if rg == 8 then
        rg = 1
        rb = rb +1
    end
    if rb == 5 then
        rb = 1
    end
    local rgb = string.format("0x%s%s%s",r[rr],g[rg],b[rb])
    gpu.setBackground(tonumber(rgb))
    print("Hello Cam asjdkfasdhjkhfakh For example, this.12839054812908902!")
    os.sleep(0)
end

Test Results:
AMD R9 380: 5FPS - 60FPS (linux vsync locked)
NVIDIA GTX 650: 80FPS - 80FPS
INTEL Celeron 847 HD Graphics (1.1GHz): 10FPS vs 30FPS
more coming soon

@Vexatos Vexatos requested review from payonel and fnuecke June 17, 2018 16:28
@fnuecke
Copy link
Member

fnuecke commented Jun 17, 2018

So if I understand correctly, this replaces display lists with textures? In which case, cool, this is something I would have liked to try out (back when I was more active on OC dev :P) but never got around to, in part because I wasn't certain it'd actually be feasible, GPU memory-wise.

And that's one of the two points where you could make me a lot more at ease with merging this: would you mind running a comparison of the new rendering code and the old one with a superflat (less noise) scene that has a whole bunch (*) of individual (actively rendering) T3 screens in it, and see how GPU memory usage reads? GPU-Z should probably work, otherwise there's an OpenGL callback to get memory usage IIRC. I have no idea how memory-(in)efficient display lists are, so this might even be better with textures, but still, I'd like to see actual numbers.

Second, I'd very much like for the old method to be available via an option, still. Just so we can tell people in case they should run into any kind of issue to use that until it can get fixed. Once it's proven itself, we can kill the old codepath (say in half a year/with MC 1.13 or so).

I'll be gone for most of the week, but I expect to be able to look at the code more in-depth next weekend.

However this turns out, thanks in advance for taking a shot at this!

(*) Very specific, I know. I'd go with 32 or so, just to be able to see a significant difference and avoid noise messing up results.

@Ristellise
Copy link

Ristellise commented Jun 18, 2018

I'll like to mentions some more details as well.
Some users may experience poorer performances
Cam Test:
2018-06-18_21 11 11
Main Branch:
2018-06-18_21 23 38
Same Exact Setting for both.

@Ristellise
Copy link

Oh wow I can upload the Test world... Sure!
Testworld.zip

@skyem123
Copy link
Contributor

First thing I noticed, this does not fix issue #779. That's more of a different thing focused on the Lua side, this PR is more about rendering efficiency (and for rendering it's good! 👍)

Secondly, the stretching would be fine if it only stretches by integer multiples, which it doesn't when the size it is stretched to wouldn't otherwise fit, leading to it looking wrong when a high resolution is set.

@asiekierka
Copy link
Contributor

A big change I'd like to introduce with this is making linear scaling the min filter for screens - which was impossible in the old system, but would definitely help performance.

I have my own concerns over VRAM usage at scale, though.

@Vexatos Vexatos added this to the v1.7.3 milestone Sep 9, 2018
@payonel
Copy link
Member

payonel commented Sep 9, 2018

if we've added this to the our 1.7.3 milestone, we need to have the PR author finish the concerns of the review.

@cam72cam before we can merge this, we would need all of the following addressed

  1. how GPU memory usage reads comparing 32 screen with superflat scenes? GPU-Z should probably work.

  2. keep the old method avaialble via config until it's proven

  3. Ristelle demonstrated on nvidia 650 we drop from 91fps to 52fps? That seems bad, doesn't it?

  4. asie would like to see making linear scaling the min filter for screens.

  5. asie is concerned about vram usage at scale. @asiekierka: what scale value matters?

@cam72cam
Copy link
Author

cam72cam commented Sep 9, 2018

  1. I am on linux and don't have GPU-Z
  2. I'll try to see if I can figure out a way to do this
  3. That's a fair comparison, a lot of the performance of this code depends on the GPU you are using. Intel and AMD seem to work better with my PR. NVIDIA is a toss up IIRC
    4: I'll see what I can do
    5: aise?

@Vexatos
Copy link
Contributor

Vexatos commented Sep 9, 2018

For what it's worth, for NVIDIA GPUs on Linux (with the official drivers), a very useful tool is running nvidia-smi -l to see basically all relevant data at a glance. I know nothing about other GPUs, or about vendor-independent ways to view this data, unfortunately.

@payonel
Copy link
Member

payonel commented Nov 10, 2018

@cam72cam I can confirm there is a drop in FPS to half on integrated graphics. This will affect a good number of users, unfortunately
This speedup, however, would be a welcome improvement. The right thing going forward is to provide a configuration option to enable this.

We are very close to releasing our 1.7.3 patch. I choice on this PR is three-fold

  1. I'm removing the 1.7.3 miletone from this PR
  2. I think this change in behavior should be configurable, but off by default. We should continue using the old code by default.
  3. I think it is okay to only upgrade our 1.12 branch with this feature. I'm not interested in making these improvements to the lower versions

@payonel payonel removed this from the v1.7.3 milestone Nov 10, 2018
@cam72cam
Copy link
Author

Sorry I have not been able to work on this PR much. Work has kept me busy and all my current free time goes into IR.

@Ristellise
Copy link

@cam72cam Any updates on this?

@asiekierka asiekierka added this to the OC 1.8.0 milestone Jun 8, 2022
@asiekierka asiekierka modified the milestones: OC 1.8.0, OC 1.9.0 May 30, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants