TrueColor: implement optional tablified translucency#1233
TrueColor: implement optional tablified translucency#1233JNechaevsky wants to merge 12 commits intomasterfrom
Conversation
For nicer look and better alphabetical sorting.
src/i_truecolor.c
Outdated
| } tcpixel_t; | ||
|
|
||
| // [JN] LUTs to store precomputed values for all possible 512 color combinations | ||
| static uint32_t blendAddLUT[512][512]; // Additive blending |
There was a problem hiding this comment.
Looks pretty good already, thank you! Do you think we could get away with only two arrays, a "main" one and an "alt" one (which would be additive in Doom)?
There was a problem hiding this comment.
Like this:
diff --git a/src/i_truecolor.c b/src/i_truecolor.c
index 21f9128b..b3b4b95a 100644
--- a/src/i_truecolor.c
+++ b/src/i_truecolor.c
@@ -38,9 +38,8 @@ typedef union
} tcpixel_t;
// [JN] LUTs to store precomputed values for all possible 512 color combinations
-static uint32_t blendAddLUT[512][512]; // Additive blending
static uint32_t blendOverLUT[512][512]; // Overlay blending
-static uint32_t blendOverAltLUT[512][512]; // Overlay "alt" blending
+static uint32_t blendAltLUT[512][512]; // Additive blending (Doom), Overlay "alt" blending
const uint32_t (*I_BlendAddFunc) (const uint32_t bg_i, const uint32_t fg_i);
const uint32_t (*I_BlendOverFunc) (const uint32_t bg_i, const uint32_t fg_i, const int amount);
@@ -104,23 +103,28 @@ void R_InitBlendMaps (GameMission_t mission)
fg.g = ((fg_index >> 3) & 0x07) << 5;
fg.b = (fg_index & 0x07) << 5;
- // Additive blending calculation
- retAdd.r = MIN(bg.r + fg.r, 0xFFU);
- retAdd.g = MIN(bg.g + fg.g, 0xFFU);
- retAdd.b = MIN(bg.b + fg.b, 0xFFU);
- blendAddLUT[bg_index][fg_index] = retAdd.i;
-
// Overlay blending calculation
retOver.r = (overlay_alpha * fg.r + (0xFFU - overlay_alpha) * bg.r) >> 8;
retOver.g = (overlay_alpha * fg.g + (0xFFU - overlay_alpha) * bg.g) >> 8;
retOver.b = (overlay_alpha * fg.b + (0xFFU - overlay_alpha) * bg.b) >> 8;
blendOverLUT[bg_index][fg_index] = retOver.i;
- // Overlay "alt" blending calculation
- retOver.r = (overlay_alt_alpha * fg.r + (0xFFU - overlay_alt_alpha) * bg.r) >> 8;
- retOver.g = (overlay_alt_alpha * fg.g + (0xFFU - overlay_alt_alpha) * bg.g) >> 8;
- retOver.b = (overlay_alt_alpha * fg.b + (0xFFU - overlay_alt_alpha) * bg.b) >> 8;
- blendOverAltLUT[bg_index][fg_index] = retOver.i;
+ if (overlay_alt_alpha == 0)
+ {
+ // Additive blending calculation
+ retAdd.r = MIN(bg.r + fg.r, 0xFFU);
+ retAdd.g = MIN(bg.g + fg.g, 0xFFU);
+ retAdd.b = MIN(bg.b + fg.b, 0xFFU);
+ blendAltLUT[bg_index][fg_index] = retAdd.i;
+ }
+ else
+ {
+ // Overlay "alt" blending calculation
+ retOver.r = (overlay_alt_alpha * fg.r + (0xFFU - overlay_alt_alpha) * bg.r) >> 8;
+ retOver.g = (overlay_alt_alpha * fg.g + (0xFFU - overlay_alt_alpha) * bg.g) >> 8;
+ retOver.b = (overlay_alt_alpha * fg.b + (0xFFU - overlay_alt_alpha) * bg.b) >> 8;
+ blendAltLUT[bg_index][fg_index] = retOver.i;
+ }
}
}
}
@@ -157,7 +161,7 @@ const uint32_t I_BlendAddLow (const uint32_t bg_i, const uint32_t fg_i)
bg_index = PixelToLUTIndex(bg);
fg_index = PixelToLUTIndex(fg);
- return blendAddLUT[bg_index][fg_index];
+ return blendAltLUT[bg_index][fg_index];
}
const uint32_t I_BlendDark (const uint32_t bg_i, const int d)
@@ -214,7 +218,7 @@ const uint32_t I_BlendOverAltLow (const uint32_t bg_i, const uint32_t fg_i, cons
bg_index = PixelToLUTIndex(bg);
fg_index = PixelToLUTIndex(fg);
- return blendOverAltLUT[bg_index][fg_index];
+ return blendAltLUT[bg_index][fg_index];
}
// [crispy] TRANMAP blending emulation, used for DoomThere was a problem hiding this comment.
Ah, I was thinking how to divide tablifying per game, but in the end chickened out because of:
- Possible hit 512x512 times of checking
if/elseconditions. - To avoid bloating
forloop.
But looks likes it's not that scary, so sure, let's go this way. Most important question for this PR: how should we name menu item? Maybe something like:
- Translucency quality: 8bit / 32bit (or 8bpp / 32bpp? or low/high?)
- Blending mode: speed / quality
Hot-switch is fairly simple, we just have to kick R_InitBlendQuality, even post-rendering doesn't seems to be not needed.
Finally, I have an itch that something else could be done. Current implementation helps to improve performance and get higher average framerate in -timedemo runs, but results are not as high as expected. I'm thinking to play around with R_DrawTLColumn by unrolling by four, @rfomin recommending to go farther by turning blending functions to inline's, but it's tricky since we are using function pointers, except for BlendDark for fuzz drawing.
So maybe you can give few recommendations regarding where to think or dig? 🙂
There was a problem hiding this comment.
- Translucency quality: 8bit / 32bit (or 8bpp / 32bpp? or low/high?)
That'd be 9bit, strictly speaking. 😉
- Blending mode: speed / quality
Translucency mode: speed / quality
I'd avoid using the term "blending" if we only call it "translucency" just one line above.
Finally, I have an itch that something else could be done. Current implementation helps to improve performance and get higher average framerate in
-timedemoruns, but results are not as high as expected. I'm thinking to play around withR_DrawTLColumnby unrolling by four,
Sure, there's so much left for optimization in Doom's source code.
@rfomin recommending to go farther by turning blending functions to
inline's, but it's tricky since we are using function pointers, except for BlendDark for fuzz drawing.
Nope, we cannot inline function pointers, obviously.
I'm fine with the current implementation. Heck, I was even too lazy for the last ~10 years to write the code that you just contributed in this PR. 😁
If you find some code path that could use some further optimization, sure, please feel free to investigate it. But this is probably topic for another PR.
There was a problem hiding this comment.
@rfomin recommending to go farther by turning blending functions to
inline's, but it's tricky since we are using function pointers, except for BlendDark for fuzz drawing.Nope, we cannot inline function pointers, obviously.
I suggest making more functions (e.g. R_DrawTLColumnAdd, R_DrawTLColumnOver). Function calls on every pixel are expensive, I checked when implementing Woof's "function factory" (about ~30% performance hit).
There was a problem hiding this comment.
Dammit... There is just a small difference in performance comparing between 512x512 and 256x256 LUTs. Something really have to be done with column drawing functions.
Just in case, 256x256 LUT code, still much better than just 256 colors:
Details
--- R:/GITHUB/crispy-doom/src/i_truecolor.c Thu Oct 31 22:15:20 2024
+++ R:/MSYS/home/Julia/crispy-doom/src/i_truecolor.c Fri Nov 1 23:33:47 2024
@@ -38,8 +38,8 @@
} tcpixel_t;
// [crispy] LUTs to store precomputed values for all possible 512x512 color combinations
-static uint32_t blendOverLUT[512][512]; // Overlay blending
-static uint32_t blendAltLUT[512][512]; // Additive blending (Doom), Overlay "alt" blending
+static uint32_t blendOverLUT[256][256]; // Overlay blending
+static uint32_t blendAltLUT[256][256]; // Additive blending (Doom), Overlay "alt" blending
const uint32_t (*I_BlendAddFunc) (const uint32_t bg_i, const uint32_t fg_i);
const uint32_t (*I_BlendOverFunc) (const uint32_t bg_i, const uint32_t fg_i, const int amount);
@@ -90,18 +90,18 @@
retAdd.a = 0xFFU;
retOver.a = 0xFFU;
- for (int bg_index = 0; bg_index < 512; ++bg_index)
+ for (int bg_index = 0; bg_index < 256; ++bg_index)
{
- for (int fg_index = 0; fg_index < 512; ++fg_index)
+ for (int fg_index = 0; fg_index < 256; ++fg_index)
{
// Convert LUT indices back to RGB values with reduced bit depth
- bg.r = (bg_index >> 6) << 5;
- bg.g = ((bg_index >> 3) & 0x07) << 5;
- bg.b = (bg_index & 0x07) << 5;
-
- fg.r = (fg_index >> 6) << 5;
- fg.g = ((fg_index >> 3) & 0x07) << 5;
- fg.b = (fg_index & 0x07) << 5;
+ bg.r = (bg_index >> 5) << 5;
+ bg.g = ((bg_index >> 2) & 0x07) << 5;
+ bg.b = (bg_index & 0x03) << 6;
+
+ fg.r = (fg_index >> 5) << 5;
+ fg.g = ((fg_index >> 2) & 0x07) << 5;
+ fg.b = (fg_index & 0x03) << 6;
// Overlay blending calculation
retOver.r = (overlay_alpha * fg.r + (0xFFU - overlay_alpha) * bg.r) >> 8;
@@ -130,9 +130,9 @@
}
// [crispy] Helper function to convert a pixel color to a LUT index
-static inline uint16_t PixelToLUTIndex (const tcpixel_t color)
+static inline uint8_t PixelToLUTIndex (const tcpixel_t color)
{
- return ((color.r & 0xE0) << 1) | ((color.g & 0xE0) >> 2) | (color.b >> 5);
+ return ((color.r & 0xE0) | ((color.g & 0xE0) >> 3) | (color.b >> 6));
}
const uint32_t I_BlendAdd (const uint32_t bg_i, const uint32_t fg_i)
There was a problem hiding this comment.
Dammit... There is just a small difference in performance comparing between 512x512 and 256x256 LUTs.
Array access time doesn't depend on array size. 🤷
There was a problem hiding this comment.
Ah, thanks, good to know - that's important. I'm playing around with inline-ing, at least for additive blending in R_DrawTLColumn and it actually helps, -timedemo differences before and after inlining are about ~466 w/o inlining and ~522 with inlining.
Looks likes it is have to be done, maybe really in another PR, since we have to divide R_DrawTLColumn to blend/alt functions (and to high/low detail + high/low quality, oh God!), and unlikely I will handle Format Factory approach. Otherwise, this whole idea and implementation is nothing more than cosmetical change with tiny performance improvement.
There was a problem hiding this comment.
Oops, tongue was ahead of thoughts. So the idea is put TrueColor TL drawing functions as macroses to i_truecolor.h where we can have inlining without overbloating existing code in r_draw.c of all four games and we don't need to care about tranmap[], tinttab[] and xlatab[] there. This will still require small separating of colfunc pointers for TrueColor only, but in theory, this will do the trick.
There was a problem hiding this comment.
We shouldn't overload this PR, though. Let's do one step after the other...
Also, fix nasty typo: it's 512x512, not 512 colors, i.e. 262144 in total. Co-Authored-By: Fabian Greffrath <fabian@greffrath.com>
This is slightly faster than fully dynamic and may look more consistent when TrueColor mode is off. Few remarks:
mallocneeded.PLAYPALcolors.