Uniform registers requirement on Nvidia with a sampler in vertex shader
The issue described here reproduces only on some hardware and driver combinations.
I managed to reproduce it on a GeForce GTX 760 with driver 399.24 and on a GeForce 950M with driver 416.34. It does not show up on a GTX 650 with driver 390.87. All of these GPUs return 1024 for GL_MAX_VERTEX_UNIFORM_VECTORS, so I hardcoded 1000 as the vec4 array size in the vertex shader.
I am trying to compile and link the following shader program.
Vertex shader:
#version 330
uniform vec4 vs[1000];
uniform sampler2D tex;
void main()
{
    gl_Position = vs[42];
    //gl_Position += texture(tex, vs[0].xy); // (1)
}
Fragment shader:
#version 330
out vec4 outFragColor;
void main()
{
    outFragColor = vec4(0.0);
}
Everything is OK while line (1) is commented out, because the tex sampler is then optimized away. If we uncomment it, linking fails with the following log:
-----------
Internal error: assembly compile error for vertex shader at offset 427:
-- error message --
line 16, column 9: error: invalid parameter array size
line 20, column 16: error: out of bounds array access
line 22, column 30: error: out of bounds array access
-- internal assembly text --
!!NVvp5.0
OPTION NV_internal;
OPTION NV_gpu_program_fp64;
OPTION NV_bindless_texture;
Error: could not link.
# cgc version 3.4.0001, build date Oct 10 2018
# command line args:
#vendor NVIDIA Corporation
#version 3.4.0.1 COP Build Date Oct 10 2018
#profile gp5vp
#program main
#semantic vs
#semantic tex
#var float4 gl_Position : $vout.POSITION : HPOS : -1 : 1
#var float4 vs[0] : : c[0], 1000 : -1 : 1
#var ulong tex : : c[1000] : -1 : 1
PARAM c[2002] = program.local[0..2001] ;
TEMP R0;
LONG TEMP D0;
TEMP T;
PK64.U D0.x, c[1000];
TEX.F R0, c[0], handle(D0.x), 2D;
ADD.F result.position, R0, c[42];
END
# 3 instructions, 1 R-regs, 1 D-regs
Here we see that the array takes registers 0..999 and the sampler takes register 1000. Registers above 1000 are not referenced anywhere except in the declaration PARAM c[2002] = program.local[0..2001];.
Further experiments with the array size showed that 2002 is not a constant: the declared size is always twice the number of registers actually required (here 2 × 1001).
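To make the arithmetic concrete, here is a minimal sketch that reads the relevant lines of the dump above and checks the doubling. The helper names (`used_registers`, `declared_registers`, `max_array_size`) are made up for illustration. The last helper rests on an unverified assumption: that the assembler rejects a PARAM array larger than GL_MAX_VERTEX_UNIFORM_VECTORS and that the factor of 2 applies to every register. If that guess holds, it estimates the largest array that still fits next to one sampler.

```python
import re

# Three lines copied from the assembly dump quoted in the question.
LOG = """\
#var float4 vs[0] : : c[0], 1000 : -1 : 1
#var ulong tex : : c[1000] : -1 : 1
PARAM c[2002] = program.local[0..2001] ;
"""

def used_registers(log: str) -> int:
    """Highest register index (+1) referenced by the '#var ... c[base], count' lines."""
    end = 0
    for m in re.finditer(r"^#var .*?: c\[(\d+)\](?:, (\d+))? :", log, re.MULTILINE):
        base = int(m.group(1))
        count = int(m.group(2)) if m.group(2) else 1  # no count means a single register
        end = max(end, base + count)
    return end

def declared_registers(log: str) -> int:
    """Size of the PARAM array the driver actually declares."""
    return int(re.search(r"PARAM c\[(\d+)\]", log).group(1))

def max_array_size(limit: int, samplers: int = 1, factor: int = 2) -> int:
    """ASSUMPTION (not confirmed by the driver): factor * (n + samplers) must be <= limit."""
    return limit // factor - samplers

used = used_registers(LOG)          # 1000 array elements + 1 sampler handle
declared = declared_registers(LOG)  # what the PARAM line asks for
print(used, declared, declared / used)
print(max_array_size(1024))         # estimate only, under the assumption above
```

Under that assumption the declared size for `vs[1000]` exceeds the reported limit of 1024, which would be consistent with the "invalid parameter array size" error, but the real rule inside the driver is not known from this log.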
I remember that "OpenGL implementations are allowed to reject shaders for implementation-dependent reasons."
So is there a workaround to use all available registers along with a sampler in a vertex shader?
If not, what might be the rationale behind this behavior? Obviously, this shader does not use any registers for temporary computation results.
Is it a misoptimization in the shader compiler?
UPD: a quick-n-dirty reproduction example is here.
UPD2: WebGL repro with workaround description.
opengl glsl shader nvidia
IIRC NVidia plays (played?) fast 'n loose with what GLSL it accepts in Compatibility contexts. Anything change if you request an actual versioned Core context via the GLFW_OPENGL_PROFILE, GLFW_CONTEXT_VERSION_MAJOR, and GLFW_CONTEXT_VERSION_MINOR hints?
– genpfault
Nov 15 '18 at 15:16
@genpfault Thanks for your advice. I tried requesting a 3.3 core context, but it didn't help.
– Sergey
Nov 16 '18 at 3:23
@Sergey: "So is there a workaround to use all available registers along with a sampler in a vertex shader?" Um, why do you want to? What's the point of using "all available" storage, when things would probably be far more efficient with a nice UBO or something.
– Nicol Bolas
Nov 30 '18 at 20:39
@NicolBolas I need WebGL 1 compatibility. For WebGL 2 and modern native contexts we use UBOs, of course.
– Sergey
Nov 30 '18 at 20:43
BTW, I found a workaround described here: sergeyext.github.io/sergeyext/…, but the reason for this behavior would still be interesting to know.
– Sergey
Nov 30 '18 at 20:44
asked Nov 15 '18 at 6:58 by Sergey, edited Nov 30 '18 at 20:33