Compute Shader
Compute Shader (CS) is a programming model for executing general-purpose computing tasks on the GPU. The Cocos Compute Shader inherits GLSL syntax and built-in variables, is authored the same way as rasterization shaders (in effect form), and can only be used in custom render pipelines. A compute shader runs many threads in parallel, which makes it highly efficient for processing large amounts of data.
Syntax
A compute shader is defined in the same way as a rasterization shader, as shown below. Configuring PipelineStates for a Compute Shader has no effect.
CCEffect %{
  techniques:
  - name: opaque
    passes:
    - compute: compute-main            # shader entry
      pass: user-compute               # pass layout name
      properties: &props
        mainTexture: { value: grey }   # material properties
}%
CCProgram compute-main %{
  precision highp float;
  precision mediump image2D;

  layout(local_size_x = 8, local_size_y = 4, local_size_z = 1) in;

  #pragma rate mainTexture batch
  uniform sampler2D mainTexture;

  #pragma rate outputImage pass
  layout (rgba8) writeonly uniform image2D outputImage;

  void main () {
    imageStore(outputImage, ivec2(gl_GlobalInvocationID.xy), vec4(1, 0, 0, 1));
  }
}%
For more, please refer to Shader Syntax.
Input / Output
Compute Shader input and output consist of built-in input variables and Shader Resource variables.
The built-in input includes:
in uvec3 gl_NumWorkGroups;
in uvec3 gl_WorkGroupID;
in uvec3 gl_LocalInvocationID;
in uvec3 gl_GlobalInvocationID;
in uint gl_LocalInvocationIndex;
layout(local_size_x = X, local_size_y = Y, local_size_z = Z) in;
Shader Resource includes:
- UniformBuffer
- StorageBuffer
- ImageSampler
- StorageImage
- SubpassInput
A compute shader has no built-in outputs; results are written out through a StorageBuffer or StorageImage (see the sketch below).
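For buffer output the flow mirrors the storage-image flow described in the Pipeline integration section below: declare the buffer on the pipeline, then reference it from the compute pass. The following is only a hedged sketch; it assumes the storage-buffer calls (addStorageBuffer on the pipeline and on the compute pass builder) behave like their storage-texture counterparts, and the names cs_result and resultBuffer are picked for this example, with pipeline and csBuilder reused from the section below.
// Hedged sketch: write compute results into a StorageBuffer instead of an image.
const bufName = 'cs_result';
const bufSize = 256 * 16; // 256 vec4 elements, in bytes (example size)
if (!pipeline.containsResource(bufName)) {
    pipeline.addStorageBuffer(bufName, gfx.Format.UNKNOWN, bufSize,
        rendering.ResourceResidency.MANAGED);
}
// 'resultBuffer' must match the storage buffer declared in the shader.
csBuilder.addStorageBuffer(bufName, rendering.AccessType.WRITE, 'resultBuffer');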
Shader resource declaration
Compute shader currently supports resource binding at two frequencies: PerPass and PerBatch, as shown below:
#pragma rate mainTexture batch
uniform sampler2D mainTexture;
#pragma rate outputImage pass
layout (rgba8) writeonly uniform image2D outputImage;
PerPass resources are those that require pipeline tracking to handle synchronization, while PerBatch resources are typically constant data or static textures that can be bound through the Material.
The PerBatch mainTexture can be configured in the Material panel. The PerPass outputImage must be declared in the pipeline and referenced by the ComputePass; its read/write synchronization and ImageLayout transitions are managed by the RenderGraph. See below for details.
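Besides the Material panel, a PerBatch texture such as mainTexture can also be set from script. Below is a minimal sketch; the effect asset name 'custom/user-compute' and the texture variable someTexture are illustrative assumptions, not part of the engine API.
import { Material, Texture2D } from 'cc';

// someTexture stands for an already-loaded texture asset (assumption for this sketch).
declare const someTexture: Texture2D;

// Create a Material from the compute effect; the effect name is illustrative.
const csMat = new Material();
csMat.initialize({ effectName: 'custom/user-compute' });

// Bind the PerBatch sampler from script; 'mainTexture' matches the properties declaration above.
csMat.setProperty('mainTexture', someTexture);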
Pipeline integration
Adding a Compute Shader in the Custom Render Pipeline involves three steps:
1. Add a Compute Pass, where passName is the Layout Name of the current pass and must correspond to the pass field in the Effect.
const csBuilder = pipeline.addComputePass('passName');
2. Declare and reference resources: set the access type and associate the shader resource.
const csOutput = 'cs_output';
if (!pipeline.containsResource(csOutput)) {
    pipeline.addStorageTexture(csOutput,
        gfx.Format.RGBA8,
        width, height,
        rendering.ResourceResidency.MANAGED);
} else {
    pipeline.updateStorageTexture(csOutput,
        width, height,
        gfx.Format.RGBA8);
}
csBuilder.addStorageImage(csOutput,   // resource name
    rendering.AccessType.WRITE,       // access type
    'outputImage');                   // shader resource name
3. Add a dispatch call and set the compute Material (a sketch of how to derive the dispatch counts follows these steps).
csBuilder.addQueue().addDispatch(x, y, z, rtMat);
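For step 3, a minimal sketch of deriving the dispatch counts from the output size and the local_size declared in the shader (8 × 4 × 1 in the example above); width, height, and the compute material rtMat are assumed to be available, as in the full sample at the end of this page.
// Workgroup counts derived from the output resolution and the shader's local_size (8 x 4 x 1).
// Math.ceil keeps the whole image covered when the size is not an exact multiple.
const groupsX = Math.ceil(width / 8);
const groupsY = Math.ceil(height / 4);
csBuilder.addQueue().addDispatch(groupsX, groupsY, 1, rtMat);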
Cross-platform support
| Feature | WebGL | WebGL2 | Vulkan | Metal | GLES3 | GLES2 |
| :-- | :-- | :-- | :-- | :-- | :-- | :-- |
| Support | N | N | Y | Y | Y (3.1) | N |
Support can be queried through device.hasFeature(gfx.Feature.COMPUTE_SHADER).
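For example, a small runtime check before building the compute pass; obtaining the device through director.root is just one option, use whatever device accessor your pipeline code already has.
import { director, gfx } from 'cc';

const device = director.root!.device;
if (!device.hasFeature(gfx.Feature.COMPUTE_SHADER)) {
    // Fall back to a fragment-shader path on platforms without compute support.
}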
Limitation
- maxComputeSharedMemorySize: maximum total shared storage size, in bytes.
- maxComputeWorkGroupInvocations: maximum total number of compute shader invocations in a single local workgroup.
- maxComputeWorkGroupSize: maximum size of a local compute workgroup.
- maxComputeWorkGroupCount: maximum number of local workgroups that can be dispatched by a single dispatch command.
These limits can be queried through device.capabilities.
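For example, a short sketch that logs these limits before choosing a workgroup size; the field names follow the list above, and director.root is again used only to reach the gfx.Device.
import { director } from 'cc';

const caps = director.root!.device.capabilities;
console.log('max shared memory (bytes):', caps.maxComputeSharedMemorySize);
console.log('max invocations per workgroup:', caps.maxComputeWorkGroupInvocations);
console.log('max workgroup size (per axis):', caps.maxComputeWorkGroupSize);
console.log('max workgroup count (per axis):', caps.maxComputeWorkGroupCount);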
Platform-specific differences
Cocos Creator converts the Cocos Compute Shader into platform-specific GLSL variants. To ensure compatibility across platforms, the shader should satisfy the restrictions of all target platforms as much as possible, including:
- In Vulkan and GLES, the format qualifier of a Storage Image must be specified explicitly, as required by the GLSL specification.
- GLES requires explicit memory qualifiers on Storage resources, and currently only readonly and writeonly are supported. In addition, the default precision must be specified explicitly.
Best Practices
- For screen-space image post-processing, prefer a Fragment Shader.
- Avoid large workgroups, especially when shared memory is used; keep each workgroup at 64 invocations or fewer.
Sample Code
The following code demonstrates a simple ray-tracing shader that renders a single sphere with one ray per pixel, implemented through a ComputePass. It uses a UniformBuffer, an ImageSampler, and a StorageImage.
Shader Pass Declaration:
techniques:
- name: opaque
  passes:
  - compute: compute-main
    pass: user-ray-tracing
    properties: &props
      mainTexture: { value: grey }
The compute-main implementation:
precision highp float;
precision mediump image2D;
layout(local_size_x = 8, local_size_y = 4, local_size_z = 1) in;
#pragma rate tex batch
uniform sampler2D tex;
#pragma rate constants pass
uniform constants {
mat4 projectInverse;
};
#pragma rate outputImage pass
layout (rgba8) writeonly uniform image2D outputImage;
void main () {
  vec3 spherePos = vec3(0, 0, -5);
  vec3 lightPos = vec3(1, 1, -3);
  vec3 camPos = vec3(0, 0, 0);
  float sphereRadius = 1.0;
  vec4 color = vec4(0, 0, 0, 0);

  ivec2 screen = imageSize(outputImage);
  ivec2 coords = ivec2(gl_GlobalInvocationID.x, gl_GlobalInvocationID.y);
  vec2 uv = vec2(float(coords.x) / float(screen.x), float(coords.y) / float(screen.y));

  // Reconstruct the camera ray from the pixel's NDC position.
  vec4 ndc = vec4(uv * 2.0 - vec2(1.0), 1.0, 1.0);
  vec4 pos = projectInverse * ndc;
  vec3 camD = vec3(pos.xyz / pos.w);
  vec3 rayL = normalize(camD - camPos);

  vec3 dirS = spherePos - camPos;
  vec3 rayS = normalize(dirS);
  float lenS = length(dirS);
  float dotLS = dot(rayL, rayS);
  float angle = acos(dotLS);
  float projDist = lenS * sin(angle);
  if (projDist < sphereRadius) {
    // Intersection: shade the hit point with a simple Lambert term.
    vec3 rayI = rayL * (lenS * dotLS - sqrt(sphereRadius * sphereRadius - projDist * projDist));
    vec3 N = normalize(rayI - dirS);
    vec3 L = normalize(lightPos - rayI);
    color = vec4(vec3(max(dot(N, L), 0.05)), 1.0);
  }
  imageStore(outputImage, coords, color);
}
The pipeline-side code is as follows:
import { gfx, renderer, rendering } from 'cc';

export function buildRayTracingComputePass (
    camera: renderer.scene.Camera,
    pipeline: rendering.Pipeline) {
    // Get the screen width and height.
    // getRenderArea: helper that computes the camera's render area (defined elsewhere in the pipeline code).
    const area = getRenderArea(camera,
        camera.window.width,
        camera.window.height);
    const width = area.width;
    const height = area.height;
    // Declare the Storage Image resource.
    const csOutput = 'rt_output';
    if (!pipeline.containsResource(csOutput)) {
        pipeline.addStorageTexture(csOutput,
            gfx.Format.RGBA8,
            width, height,
            rendering.ResourceResidency.MANAGED);
    } else {
        pipeline.updateStorageTexture(csOutput,
            width, height,
            gfx.Format.RGBA8);
    }
    // Declare the Compute Pass; the layout name must match the pass field in the effect.
    const cs = pipeline.addComputePass('user-ray-tracing');
    // Update the camera projection parameters.
    cs.setMat4('projectInverse', camera.matProjInv);
    // Reference the Storage Image in the current Compute Pass.
    cs.addStorageImage(csOutput, rendering.AccessType.WRITE, 'outputImage');
    // Add dispatch parameters and bind the Material.
    // rtMat: the Material created from the ray-tracing effect above (created elsewhere).
    cs.addQueue()
        .addDispatch(width / 8, height / 4, 1, rtMat);
    // Return the name of the Image resource, to be consumed by subsequent post-processing.
    return csOutput;
}
Users need to update and bind PerPass resources in the Compute Pass, while PerBatch resources are bound by the material system. The final rendered result is shown below.