Khronos public bugtracker – Bug 435
GL 4.1/ARB_separate_shader_objects required to redefine built-in per-vertex block.
Last modified: 2016-01-07 08:38:21 PST
From the specification:
To use any built-in input or output in the gl_PerVertex and gl_PerFragment blocks in separable program objects, shader code must redeclare those blocks prior to use. A separable program will fail to link if:
* any shader uses a built-in block member not found in the redeclaration of that block.
This makes no sense and makes using this extension harder for no reason. Being forced to redeclare blocks doesn't give the compiler any information that it doesn't already know, since it can plainly see the name when you use it in your shader.
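For reference, the quoted rule can be illustrated with a small model. The Python sketch below embeds a GLSL vertex shader that redeclares gl_PerVertex, then applies a toy version of the "uses a member not found in the redeclaration" check. The helper names (`redeclared_members`, `used_members`, `separable_link_ok`) and the regex-based parsing are assumptions made for illustration only; a real GLSL compiler does not work this way.

```python
import re

# GLSL vertex shader that redeclares gl_PerVertex before using its members.
VS_OK = """
#version 410
out gl_PerVertex {
    vec4 gl_Position;
};
layout(location = 0) in vec4 position;
void main() {
    gl_Position = position;
}
"""

# Same shader, but it also writes gl_PointSize, which the redeclaration omits.
VS_BAD = VS_OK.replace("gl_Position = position;",
                       "gl_Position = position;\n    gl_PointSize = 4.0;")

# Vertex-stage output block members this toy checker knows about (not exhaustive).
PER_VERTEX_MEMBERS = {"gl_Position", "gl_PointSize", "gl_ClipDistance"}

def redeclared_members(src):
    """Return members named in an `out gl_PerVertex { ... }` block, or None."""
    m = re.search(r"\bout\s+gl_PerVertex\s*\{(.*?)\}", src, re.S)
    if m is None:
        return None
    return set(re.findall(r"\bgl_\w+", m.group(1)))

def used_members(src):
    """Return gl_PerVertex members referenced outside the redeclaration."""
    body = re.sub(r"\bout\s+gl_PerVertex\s*\{.*?\}\s*;", "", src, flags=re.S)
    return set(re.findall(r"\bgl_\w+", body)) & PER_VERTEX_MEMBERS

def separable_link_ok(src):
    """Toy version of the rule: every used member must be redeclared."""
    declared = redeclared_members(src)
    used = used_members(src)
    if not used:
        return True
    return declared is not None and used <= declared

print(separable_link_ok(VS_OK))   # redeclaration covers everything used
print(separable_link_ok(VS_BAD))  # gl_PointSize is used but not redeclared
```

In this model the first shader passes and the second fails, which mirrors the link failure the specification text describes.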
Yes, it would be nice to avoid this. There are two interrelated issues with doing so:
1) The goal is to declare a solid interface, independent of what's used, that will be shared across stages. Different stages may use different subsets of this interface.
2) The compiler does not have all compilation units at any one time that might be using this interface, and the point of Separate Shader Objects is to avoid a link step.
A "declare before you use" model works without hurting performance through a heavier-weight link step, and possibly without eliminating multiple compilation units per stage. Any other ideas?
> Any other ideas?
The problem to tackle is to make sure that values which are read in stage N were written in stage M, to avoid undefined behavior. As Alfonse pointed out, the compiler sees all built-ins used in a single program anyway, so what could be done is to store with the program object the read/write accesses to any built-in in question and store that information in, say, an interface record. This can be done at compile-time (and read-accesses could be optimized out in vertex shader records). The interface necessary for a subsequent stage to work properly is then already implicitly defined, i.e. built-ins read in stage N must have a matching write-access in a preceding program's interface record (unless some default value can be provided by definition or via the API).
Now, assuming linkage is much more involved than simply matching at most a few built-in names and RW-properties, we can move the problem to the point when a separable program is bound to a pipeline: glUseProgramStages() will check if *all* interface requirements are met between the current and preceding stages (if any), or generate an INVALID_OPERATION. If debug output is enabled, additional information should be generated.
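The proposal in this comment can be sketched as a small model. Everything here is hypothetical: `InterfaceRecord`, `missing_inputs`, and the `use_program_stages` function are illustrative stand-ins for the suggested behavior, not part of OpenGL, and the `HAS_DEFAULT` set is left empty because the comment does not say which built-ins would have defaults.

```python
from dataclasses import dataclass, field

@dataclass
class InterfaceRecord:
    """Hypothetical per-program record of built-in accesses, filled at compile time."""
    reads: set = field(default_factory=set)
    writes: set = field(default_factory=set)

# Built-ins assumed (for this sketch only) to have a usable default when unwritten.
HAS_DEFAULT = set()

def missing_inputs(producer, consumer):
    """Built-ins the consumer reads that the producer never writes."""
    return consumer.reads - producer.writes - HAS_DEFAULT

def use_program_stages(pipeline, stage_record):
    """Toy stand-in for the proposed check: append the stage or report an error."""
    if pipeline:
        missing = missing_inputs(pipeline[-1], stage_record)
        if missing:
            return "INVALID_OPERATION: unwritten built-ins " + ", ".join(sorted(missing))
    pipeline.append(stage_record)
    return "OK"

# A vertex stage that writes only gl_Position, followed by a geometry stage
# that also reads gl_PointSize.
vs = InterfaceRecord(writes={"gl_Position"})
gs = InterfaceRecord(reads={"gl_Position", "gl_PointSize"}, writes={"gl_Position"})

pipeline = []
print(use_program_stages(pipeline, vs))
print(use_program_stages(pipeline, gs))
```

Note that this model moves the matching work to bind time, which is the cost the later replies in this thread object to.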
> As Alfonse pointed out, the compiler sees all built-ins used in a single program anyway,
The compiler only sees one compilation unit at a time. A program has multiple stages, and each stage has multiple compilation units. The compiler sees one of the latter at a time. By the time it sees shader N uses a field for the first time, shader N-1 is already compiled.
With SSO, the goal is that there is no work done at the link step, so the linker also does not deal with the whole program.
(In reply to comment #3)
> > As Alfonse pointed out, the compiler sees all built-ins used in a single program anyway,
> The compiler only sees one compilation unit at a time. A program has
> multiple stages, and each stage has multiple compilation units. The
> compiler sees one of the latter at a time. By the time it sees shader N
> uses a field for the first time, shader N-1 is already compiled.
> With SSO, the goal is that there is no work done at the link step, so the
> linker also does not deal with the whole program.
Doesn't the ability to create separate programs from more than one shader object work against this goal? There is a lot of language in the spec that clearly indicates that it is perfectly valid to create a "separate" program containing, for example, a VS and a GS linked together. The spec says that the linkage between the VS and GS *must* be validated at link-time.
So if you are linking shader objects into a separate program, the link step will still have to do some work. Since the goal of doing no work has already failed, is there a problem with adding a bit more work?
If you're using glCreateShaderProgramv, then there is no need for an explicit declaration, since the compiler can detect during compilation what built-in outputs are being written to.
And if you're linking multiple objects together, the linking process is already going to have to do work in order to create the program. So having it scan each shader object for built-in outputs (or have the compilation of a shader object simply store any built-in outputs) should not be particularly difficult, compared to the rest of the linking process. So again, the compiler/linker can work all of these things out.
I know this is old, but we're doing a sweep and making sure we've looked at everything.
To summarize, this is as designed.
(In reply to Alfonse from comment #4)
> So if you are linking shader objects into a separate program, the link step
> will still have to do some work. Since the goal of doing no work has already
> failed, is there a problem with adding a bit more work?
The goal is not to do no work at link time. In fact, most code generation is done at link time. The goal is to do no work (or as little work as possible) at glUseProgramStages time.
So, for example, if I link a VS and a GS into one program object, and a FS into another, the compiler (linker) could probably figure out the interface between VS and GS as it always did, but it doesn't have the information to form the interface between GS and FS. Things get even hairier when there's tessellation involved.
If I create a program object with just a VS in it and another with just a GS in it, the output of the VS is compatible with the input of the GS if the redeclarations of gl_PerVertex match. If the VS writes to gl_PointSize and the GS doesn't consume it, that's fine, but the GS still needs to declare gl_PerVertex (as gl_in) with gl_PointSize in it in order for its interface to be compatible.
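A minimal sketch of the VS/GS case described above, with the GLSL redeclarations embedded as strings. The helper names `block_members` and `interfaces_compatible` are hypothetical, and the comparison deliberately treats member type, name, and declaration order as significant; that is a conservative assumption for this sketch rather than a statement of what any particular driver checks.

```python
import re

# Vertex shader output block: the VS writes gl_Position and gl_PointSize.
VS_SRC = """
out gl_PerVertex {
    vec4 gl_Position;
    float gl_PointSize;
};
"""

# Geometry shader input block that matches, even if the GS never reads gl_PointSize.
GS_MATCHING = """
in gl_PerVertex {
    vec4 gl_Position;
    float gl_PointSize;
} gl_in[];
"""

# Geometry shader input block that omits gl_PointSize.
GS_MISMATCHED = """
in gl_PerVertex {
    vec4 gl_Position;
} gl_in[];
"""

def block_members(src, qualifier):
    """Return [(type, name), ...] from a `<qualifier> gl_PerVertex { ... }` block."""
    m = re.search(r"\b" + qualifier + r"\s+gl_PerVertex\s*\{(.*?)\}", src, re.S)
    if m is None:
        return []
    return re.findall(r"(\w+)\s+(gl_\w+)", m.group(1))

def interfaces_compatible(vs_src, gs_src):
    """Toy check: the VS output block and GS input block must list the same members."""
    return block_members(vs_src, "out") == block_members(gs_src, "in")

print(interfaces_compatible(VS_SRC, GS_MATCHING))
print(interfaces_compatible(VS_SRC, GS_MISMATCHED))
```

Because each block is fully declared up front, the comparison needs only the two declarations, not knowledge of which members either shader actually reads or writes.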
(In reply to Alfonse from comment #4)
> And if you're linking multiple objects together, the linking process is
> already going to have to do work in order to create the program. So having
> it scan each shader object for built-in outputs (or have the compilation of a
> shader object simply store any built-in outputs) should not be particularly
> difficult, compared to the rest of the linking process. So again, the
> compiler/linker can work all of these things out.
Again, the problem isn't the compile or link phases. It's the "Use"-time, which is orders of magnitude more frequent than link. You need to fully declare the interface at the "dangling ends" of the program object to ensure that it matches up with the other parts of the pipeline. This has to include built-ins because on most implementations, at least some of the built-in variables don't map to fixed hardware and work more like automatically generated user variables. We enforce this redeclaration at link time because that's when we know whether the program is linked in separable mode. Even if we knew at compile time whether the resulting shader would be used in a separable program object, that still might not be enough because we'd want to enforce this only for the dangling interfaces, not the internal ones, and we wouldn't know that at compile time either.
If we didn't have this rule, drivers would either have to cross-validate all the interfaces at glUseProgramStages time and potentially recompile shaders on the fly, or assume worst-case behavior all the time (all built-ins active always). Either would be disastrous for performance.
Hope that helps.