Uniform registers requirement on Nvidia with a sampler in vertex shader

























The issue described below can only be reproduced on some hardware and driver combinations.



I managed to reproduce it on a GeForce GTX 760 with the 399.24 driver and on a GeForce 950M with 416.34. The issue does not show up on a GTX 650 with the 390.87 driver. All these GPUs return 1024 for GL_MAX_VERTEX_UNIFORM_VECTORS, so I hardcoded 1000 as the vec4 array size in the vertex shader.



I try to compile and link the following shader program.



Vertex shader:



#version 330

uniform vec4 vs[1000];
uniform sampler2D tex;

void main()
{
    gl_Position = vs[42];
    //gl_Position += texture(tex, vs[0].xy); // (1)
}



Fragment shader:



#version 330

out vec4 outFragColor;


void main()
{
    outFragColor = vec4(0.0);
}



Everything is OK while line (1) is commented out and the tex sampler is therefore optimized away. But if we uncomment it, linking fails with the following log:



-----------
Internal error: assembly compile error for vertex shader at offset 427:
-- error message --
line 16, column 9: error: invalid parameter array size
line 20, column 16: error: out of bounds array access
line 22, column 30: error: out of bounds array access
-- internal assembly text --
!!NVvp5.0
OPTION NV_internal;
OPTION NV_gpu_program_fp64;
OPTION NV_bindless_texture;
Error: could not link.
# cgc version 3.4.0001, build date Oct 10 2018
# command line args:
#vendor NVIDIA Corporation
#version 3.4.0.1 COP Build Date Oct 10 2018
#profile gp5vp
#program main
#semantic vs
#semantic tex
#var float4 gl_Position : $vout.POSITION : HPOS : -1 : 1
#var float4 vs[0] : : c[0], 1000 : -1 : 1
#var ulong tex : : c[1000] : -1 : 1
PARAM c[2002] = program.local[0..2001] ;
TEMP R0;
LONG TEMP D0;
TEMP T;
PK64.U D0.x, c[1000];
TEX.F R0, c[0], handle(D0.x), 2D;
ADD.F result.position, R0, c[42];
END
# 3 instructions, 1 R-regs, 1 D-regs


Here we see that the array takes registers 0..999 and the sampler takes register 1000. Elements above 1000 are not referenced anywhere except in the line PARAM c[2002] = program.local[0..2001] ;.



Further experiments with the array size showed that 2002 is not a constant but twice the number of registers required (here 1000 for the array plus 1 for the sampler).
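To make the arithmetic concrete, here is a small Python sketch (not part of the original program). The function names are made up for illustration. The first function only restates the doubling observed in the log. The second rests on an unverified assumption: that linking fails because the doubled count exceeds GL_MAX_VERTEX_UNIFORM_VECTORS. The log confirms the doubling, not the cause of the rejection.

```python
def declared_param_registers(array_size, sampler_count=1):
    """Size of the PARAM array in the generated assembly.

    Restates the observation from the log: the driver declares twice
    the registers actually needed (array elements plus samplers).
    """
    return 2 * (array_size + sampler_count)


def largest_array_size(limit, sampler_count=1):
    """Largest vec4 array whose doubled requirement stays within `limit`.

    ASSUMPTION (not confirmed by the log): the link fails when the
    doubled count exceeds GL_MAX_VERTEX_UNIFORM_VECTORS.
    """
    return limit // 2 - sampler_count


if __name__ == "__main__":
    # Matches "PARAM c[2002]" for vs[1000] plus one sampler.
    print(declared_param_registers(1000))  # 2002
    # Under the assumption above, with a limit of 1024.
    print(largest_array_size(1024))  # 511
```

If the assumption holds, an array of 511 vec4 elements plus one sampler would be the largest combination that still links on the affected drivers. That figure is a prediction to test, not a measured result.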



I remember that




OpenGL implementations are allowed to reject shaders for
implementation-dependent reasons.




So is there a workaround to use all available registers along with a sampler in a vertex shader?



If not, what might be the rationale behind this behavior? Obviously, this shader does not use any registers for temporary computation results.



Is it a misoptimization in the shader compiler?



UPD: a quick-n-dirty reproduction example is here.



UPD2: WebGL repro with workaround description.


































  • IIRC NVidia plays (played?) fast 'n loose with what GLSL it accepts in Compatibility contexts. Anything change if you request an actual versioned Core context via the GLFW_OPENGL_PROFILE, GLFW_CONTEXT_VERSION_MAJOR, and GLFW_CONTEXT_VERSION_MINOR hints?

    – genpfault
    Nov 15 '18 at 15:16












  • @genpfault Thanks for your advice. I tried to request 3.3 core context, but it didn't help.

    – Sergey
    Nov 16 '18 at 3:23











  • @Sergey: "So is there a workaround to use all available registers along with a sampler in a vertex shader?" Um, why do you want to? What's the point of using "all available" storage, when things would probably be far more efficient with a nice UBO or something.

    – Nicol Bolas
    Nov 30 '18 at 20:39











  • @NicolBolas I need WebGL 1 compatibility. For WebGL 2 and modern native contexts we use UBOs, of course.

    – Sergey
    Nov 30 '18 at 20:43











  • BTW, I found a workaround described here: sergeyext.github.io/sergeyext/…, but the reason for such behavior would still be interesting.

    – Sergey
    Nov 30 '18 at 20:44















opengl glsl shader nvidia














edited Nov 30 '18 at 20:33







Sergey

















asked Nov 15 '18 at 6:58









Sergey
