Saturday, May 13, 2017

SteamEdit

Okay, so apparently I stopped writing anything here. Big surprise. But now I have some news!

I've just released to the public version 1.0 of a tool I've been using privately for a few years.

Presenting: SteamEdit v1.0!



So what is SteamEdit? Check the product page for more details, but I'll include a quick rundown here too. SteamEdit is a tool allowing you to make certain local changes to the games in your Steam Library. By local changes, I mean it's generally cosmetic-only (you can't enable new games you don't own or anything like that); any changes made are only visible on your local system.

In short, SteamEdit allows you to turn this:
Into this:

That's right. This is a solution to ugly, inconsistent naming in games, bad sorting, as well as a few other things!

Head over to the SteamEdit Website and download it today! It's free! While you're there, you might consider supporting me with a small contribution to my Tip Jar.

Saturday, April 23, 2016

A Material System, Part 2: Deciphering the HLSL Packing Rules

Tentative series plan:
  1. An Introduction
  2. Deciphering the HLSL Packing Rules (you are here)
  3. Shader Reflection (clever title pending)
  4. Runtime Parameters (clever title pending)

Last time, on...

In the previous installment of this series, we saw a high level overview of how a flexible material system could look. Ignoring a few details, the result was a largely data-driven approach, where the shader and the parameters that make a specific material can be defined in data, with enough flexibility to change the parameters -- not just the values, but even what the shader expects -- without any code changes.

One of the hand-wavey parts was how to go from the cbuffer layout in HLSL to the proper offsets at which to put the final parameter values within a buffer. This article will cover a part of that: the packing rules of HLSL cbuffers.

Disclaimer: Unless otherwise noted, the following are the results of my own experiments. It seems to be the case, but I can't guarantee it wasn't just a coincidence that things worked out.

Disclaimer: I am only concerned with the automatic packing done by HLSL. It's also possible to explicitly define the layout of cbuffer members, using the register keyword, but my aim is to minimize the work needed when writing shaders, putting the complicated finicky stuff in code instead.

First, we RTFM

Obviously the first thing we should do is check out the documentation, see what it says about things. So we go to Packing Rules for Constant Variables at the Windows Dev Center.
HLSL ...  packs data into 4-byte boundaries. Additionally, HLSL packs data so that it does not cross a 16-byte boundary. Variables are packed into a given four-component vector until the variable will straddle a 4-vector boundary; the next variables will be bounced to the next four-component vector.
Okay, so far so good. We can check various cases by running a simple shader through FXC. So let's try some basic stuff:

Simple Vectors

cbuffer A
{
    float a1;       // Offset:    0 Size:     4
    float2 a2;      // Offset:    4 Size:     8
    float3 a3;      // Offset:   16 Size:    12
    float a4;       // Offset:   28 Size:     4
    bool2 a5;       // Offset:   32 Size:     8
    int a6;         // Offset:   40 Size:     4
};
This is pretty much as advertised. a2 fits immediately after a1, but a3 needs to start on a new 16-byte boundary. a5 is 4 bytes per component even though it's just a boolean value. This is easy!
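
The vector rule can be written down in a few lines of C++. This is only a sketch of the behaviour observed above, not anything taken from the compiler; PlaceVector and PackVectors are hypothetical names, and every component is assumed to be 4 bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Byte offset for a scalar or vector made of `components` 4-byte values,
// given the first free byte `cursor` in the cbuffer.
// Hypothetical helper: the name and signature are for illustration only.
inline std::size_t PlaceVector(std::size_t cursor, std::size_t components)
{
    assert(components >= 1 && components <= 4);
    const std::size_t size = components * 4;
    // Bytes still free in the current 16-byte register.
    const std::size_t room = 16 - (cursor % 16);
    // A value that would straddle the boundary is bounced to the next register.
    return (size > room) ? cursor + room : cursor;
}

// Offsets for a run of scalars/vectors declared one after another.
inline std::vector<std::size_t> PackVectors(const std::vector<std::size_t>& componentCounts)
{
    std::vector<std::size_t> offsets;
    std::size_t cursor = 0;
    for (std::size_t components : componentCounts)
    {
        const std::size_t offset = PlaceVector(cursor, components);
        offsets.push_back(offset);
        cursor = offset + components * 4;
    }
    return offsets;
}
```

Feeding it the component counts of cbuffer A, PackVectors({1, 2, 3, 1, 2, 1}), gives the offsets 0, 4, 16, 28, 32, 40 that FXC reported.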

Maybe we want to put a matrix in there. What happens to those?

Matrices

cbuffer B
{
    float4x4 b1;    // Offset:    0 Size:    64
    float4x3 b2;    // Offset:   64 Size:    48
    float3x4 b3;    // Offset:  112 Size:    60
    float2x2 b4;    // Offset:  176 Size:    24
    float1x4 b5;    // Offset:  208 Size:    52
};
We can see b1 takes up a full 64 bytes, as expected. Likewise, b2 is 48 bytes (basically 3 x float4). But what about b3? If it were tightly packed, we would expect 48 bytes again, but if we treat it as 4 x float3, each float3 needs to start on a new 16-byte boundary, so a full 64 might make sense as well. But instead we have 60 bytes. Well, I guess the above excerpt only concerns where a value starts, not where it ends, so okay, b3 packs the same as if we had this:
cbuffer B
{
    float4x4 b1;    // Offset:    0 Size:    64
    float4x3 b2;    // Offset:   64 Size:    48
    float3   b3_0;  // Offset:  112 Size:    12
    float3   b3_1;  // Offset:  128 Size:    12
    float3   b3_2;  // Offset:  144 Size:    12
    float3   b3_3;  // Offset:  160 Size:    12
    float2x2 b4;    // Offset:  176 Size:    24
    float1x4 b5;    // Offset:  208 Size:    52
};
Moving on to b4, we see again something a bit unexpected. Based on what happened with b3, I would expect b4 to take 16 bytes (2 x float2), but instead we have 24! Well, as it turns out, this works out so that each vector of the matrix (each float2 here) starts on a new 16-byte boundary: 16 bytes for the first one, plus 8 for the second. The same carries over to b5.

Let's check the docs again, maybe it says something about this. The closest thing that resembles it is this about arrays:
Arrays are not packed in HLSL by default. To avoid forcing the shader to take on ALU overhead for offset computations, every element in an array is stored in a four-component vector.
This seems to indicate that each element in an array fills 16 bytes, but otherwise could match what's going on with the matrices. So let's play with arrays a bit:

Arrays

cbuffer C
{
    float4 c1[3];   // Offset:    0 Size:    48
    float3 c2[4];   // Offset:   48 Size:    60
    float2 c3[2];   // Offset:  112 Size:    24
    float  c4[4];   // Offset:  144 Size:    52
    float  c5;      // Offset:  196 Size:     4
};
Well this is familiar! c1,c2,c3,c4 look the same as b2,b3,b4,b5! So the docs are a little misleading here: array elements aren't stored in 4-component vectors, they're just aligned to 16 bytes. c5 verifies that the elements of c4 aren't filling the 16 bytes.
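
The same observation as a small C++ sketch: an array (or a matrix, treated as an array of vectors) starts on a 16-byte boundary, every element gets a 16-byte slot, and the last element only takes the bytes it needs. Placement and PlaceArray are hypothetical names, and the rule itself is just what the experiments above suggest.

```cpp
#include <cassert>
#include <cstddef>

// Where an array ends up and how many bytes it covers.
struct Placement
{
    std::size_t offset;
    std::size_t size;
};

// Hypothetical helper. `components` is the number of 4-byte values per
// element (float3 -> 3), `count` is the number of elements.
// A matrix can be passed as an array of vectors, e.g. the float3x4 above
// as 4 elements of 3 components.
inline Placement PlaceArray(std::size_t cursor, std::size_t components, std::size_t count)
{
    assert(components >= 1 && components <= 4 && count >= 1);
    // The array always starts on a fresh 16-byte register.
    const std::size_t offset = (cursor + 15) / 16 * 16;
    // Every element but the last occupies a full 16-byte slot;
    // the last one is not padded.
    const std::size_t size = 16 * (count - 1) + components * 4;
    return { offset, size };
}
```

For example, PlaceArray(108, 2, 2) gives offset 112 and size 24, which matches c3 (and b4) above.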

So where do we stand?
  1. Vectors are easy. Pack them together, but a single vector can't cross a 16-byte boundary.
  2. Matrices are treated as arrays of vectors.
  3. Each element in an array of vectors is aligned to 16 bytes. Padding is not inserted after the last element, so the next constant can be packed tightly if it fits.
We're almost done with our exploration of HLSL cbuffer packing. We next turn to structs.

Structs

Here's what the docs have to say about structs in cbuffers:
Each structure forces the next variable to start on the next four-component vector. This sometimes generates padding for arrays of structures. The resulting size of any structure will always be evenly divisible by sizeof(four-component vector).
And here's what some basic experimentation shows:
cbuffer D
{
    struct
    {
        float d1_1;     // Offset:    0
    } d1;               // Offset:    0 Size:     4

    struct
    {
        float2 d2_1;    // Offset:   16
    } d2;               // Offset:   16 Size:     8

    float d3;           // Offset:   24 Size:     4

    struct
    {
        float2x2 d4_1;  // Offset:   32
        float d4_2;     // Offset:   56
    } d4;               // Offset:   32 Size:    28

    float d5;           // Offset:   60 Size:     4
};
So right off the bat, the docs seem to be giving the wrong information. None of these structs have a size that's a multiple of sizeof(four-component vector). d1 has a single float, and is the 4 bytes you would expect if it weren't a struct. d2 starts on a 16-byte boundary, but again has only the size of its contents. d3 confirms that a value outside the struct is packed tightly after it. d4 has the 24 bytes we saw earlier for a float2x2, plus an additional 4 bytes for d4_2 following immediately. And d5 is again packed right after d4 without any padding.
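
Putting the rules so far together, a running cursor is enough to reproduce the offsets in cbuffer D. The class below is a sketch under the assumptions of this article (4-byte components, structs start on a 16-byte boundary, nothing is padded at the end); CBufferCursor and its methods are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper that walks through a cbuffer declaration and
// hands out byte offsets following the rules observed in this article.
class CBufferCursor
{
public:
    // Scalar or vector of `components` 4-byte values.
    std::size_t AddVector(std::size_t components)
    {
        assert(components >= 1 && components <= 4);
        const std::size_t size = components * 4;
        const std::size_t room = 16 - (m_cursor % 16);
        if (size > room)
            m_cursor += room;           // would straddle: bounce to next register
        const std::size_t offset = m_cursor;
        m_cursor += size;
        return offset;
    }

    // Array of vectors, or a matrix treated as one.
    std::size_t AddArray(std::size_t components, std::size_t count)
    {
        assert(components >= 1 && components <= 4 && count >= 1);
        AlignTo16();
        const std::size_t offset = m_cursor;
        m_cursor += 16 * (count - 1) + components * 4;   // last element unpadded
        return offset;
    }

    // A struct starts on a 16-byte boundary; its members are then added
    // with the calls above. There is no matching "end": nothing is padded.
    std::size_t BeginStruct()
    {
        AlignTo16();
        return m_cursor;
    }

    // First free byte after everything added so far.
    std::size_t Position() const { return m_cursor; }

private:
    void AlignTo16() { m_cursor = (m_cursor + 15) / 16 * 16; }

    std::size_t m_cursor = 0;
};
```

Walking cbuffer D with it (BeginStruct, AddVector(1), BeginStruct, AddVector(2), AddVector(1), BeginStruct, AddArray(2, 2), AddVector(1), AddVector(1)) yields the same offsets as the listing: 0, 16, 24, 32, 56 and 60.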

There is one final topic for us. What happens if we take a struct and put it in an array?

Arrays of Structs

Based on past experience, it's probably reasonable to assume that an array of structs will behave similarly to any other array. That is, each element starts on a 16-byte address, with no padding at the end. How does it look?
cbuffer E
{
    struct
    {
        float2 e1_1;    // Offset:    0
    } e1[3];            // Offset:    0 Size:    40
    
    float e2;           // Offset:   40 Size:     4
    
    struct
    {
        float  e3_1;    // Offset:   48
        float4 e3_2;    // Offset:   64
        float  e3_3;    // Offset:   80
    } e3[2];            // Offset:   48 Size:    84
};
Looks about how we expect! Going by the sizes given, each array element starts on a 16-byte address, with no padding after the last element.
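
In code form, the only new ingredient is the stride: the packed size of one element rounded up to 16 bytes. The sketch below assumes the element's own layout (its packed size and member offsets) has already been worked out with the earlier rules; StructArrayLayout, PlaceStructArray and MemberOffset are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

struct StructArrayLayout
{
    std::size_t offset;   // where element 0 starts
    std::size_t stride;   // distance between consecutive elements
    std::size_t size;     // total bytes covered, last element unpadded
};

inline std::size_t AlignTo16(std::size_t value)
{
    return (value + 15) / 16 * 16;
}

// `elementSize` is the packed size of one struct, e.g. 36 for
// { float; float4; float; } (members at 0, 16 and 32).
inline StructArrayLayout PlaceStructArray(std::size_t cursor, std::size_t elementSize, std::size_t count)
{
    assert(elementSize > 0 && count >= 1);
    StructArrayLayout layout;
    layout.offset = AlignTo16(cursor);
    layout.stride = AlignTo16(elementSize);
    layout.size = layout.stride * (count - 1) + elementSize;
    return layout;
}

// Byte offset of a member of element `index`, given the member's offset
// inside a single element.
inline std::size_t MemberOffset(const StructArrayLayout& layout, std::size_t index, std::size_t offsetInElement)
{
    return layout.offset + layout.stride * index + offsetInElement;
}
```

For e3, PlaceStructArray(44, 36, 2) gives offset 48, stride 48 and size 84, matching the listing. Under the same assumptions the second element would start at byte 96.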

Summary

So I'll just give a quick summary of what we found:
  1. Vectors are easy. Pack them together, but a single vector can't cross a 16-byte boundary.
  2. Matrices are treated as arrays of vectors.
  3. Each element in an array of vectors is aligned to 16 bytes. Padding is not inserted after the last element, so the next constant can be packed tightly if it fits.
  4. Structs are aligned to 16 bytes. As with arrays, padding is not inserted after the last member.
  5. Arrays of structs behave as expected with these rules.
It's really not so complicated, but it took a bit of experimentation to get a handle on it. The single page of documentation was mostly correct, but had some misleading bits. I didn't look at double values here, but I expect they would behave consistently -- just keeping in mind that each component is now 8 bytes instead of 4, while the alignment is probably still 16 bytes.

With this information, hopefully you can go forth and build all sorts of complex cbuffers, and pack them correctly.

Stay tuned for next time, when I use the D3D Shader Reflection interface to automatically figure out the entire cbuffer memory layout!

Saturday, April 9, 2016

A Material System, Part 1: An Introduction

Tentative series plan:
  1. An Introduction (you are here)
  2. Deciphering the HLSL Packing Rules
  3. Shader Reflection (clever title pending)
  4. Runtime Parameters (clever title pending)
I've been working on some sort of material system, for rendering objects and such. One feature I'm looking for is that it should be easy to set up new materials and shaders with minimal (or preferably no) code changes. At first glance, this may seem like a simple thing: "well, shaders are written in some shader language, generally as separate data files... so just write a new shader and attach it to your mesh!" But things are seldom so simple...

This post is going to look at some high-level concepts for my materials to set the stage. For the time being, I'm focusing on the pixel shader for the material design, as that's what I'm currently working on. Vertex/Input Assembly has not reached the level of flexibility I want yet, so maybe I'll write about that later when I get there (as well as maybe other crazy things like tessellation support).

Disclaimer 1 : The material system I have arrived at suits my needs at this time. There may be better ways to do it, but this is what I've gone with. To build up to my design, I'll talk about some other possibilities that don't work for me. I'm not saying they're terrible, just that they don't fit what I want. And even if I do say it's terrible, maybe it's perfect for some other purpose. If you're using one of them, and don't need to go any more advanced, then that's fine!

Disclaimer 2 : I'm talking about D3D11/HLSL here. The material design can probably be carried over to other APIs and shader languages, but I'm not generally considering that.

Start with something simple

The simplest thing, with limited flexibility, is probably to have your pixel shader like this:

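cbuffer Material : register(b0)
{
  float4 Color;
};
Texture2D DiffuseTex;
SamplerState DiffuseSampler;  // "sampler" is a reserved word in HLSL, so the sampler state needs its own name

// VSOUT is the vertex shader's output struct, defined elsewhere.
float4 psmain( VSOUT In ) : SV_Target
{
  return Color * DiffuseTex.Sample( DiffuseSampler, In.uv );
}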

Give or take some missing code, this gives you a configurable color parameter and a texture to sample from. In your C++ code, you might have something like:

struct MaterialParameterData
{
  float Color[4];
};
struct PerObjectMaterialParameters
{
  ID3D11Buffer* Constants;
  ID3D11Texture2D* DiffuseTexture;
};

Give each object a PerObjectMaterialParameters, with Constants filled with a MaterialParameterData. Bind everything and draw your thing. Maybe you read the colors from some data file when creating the object, along with a filename to grab a texture from. Totally flexible! Just change the data and get different colors and textures! Ship it!

Don't get too excited... What happens when color modulation isn't good enough? Maybe someone decided that some objects should use the texture as a mask over a solid color. Well that's easy, just use a new shader:

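cbuffer Material : register(b0)
{
  float4 Color;
};
Texture2D DiffuseTex;
SamplerState DiffuseSampler;  // "sampler" is a reserved word in HLSL, so the sampler state needs its own name

float4 psmain( VSOUT In ) : SV_Target
{
  float4 tex = DiffuseTex.Sample( DiffuseSampler, In.uv );
  return lerp( Color, tex, tex.a );
}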

Like magic! And look, the cbuffer and texture are the same, so no code changes required! Just point the object at this new shader, and it'll be perfect! ... but wait, someone now wants to have a layered material:

cbuffer Material : register(b0)
{
  float4 Color0;
  float4 Color1;
};
Texture2D DiffuseTex0;
Texture2D DiffuseTex1;
SamplerState DiffuseSampler;  // "sampler" is a reserved word in HLSL, so the sampler state needs its own name

float4 psmain( VSOUT In ) : SV_Target
{
  float4 tex0 = DiffuseTex0.Sample( DiffuseSampler, In.uv );
  float4 tex1 = DiffuseTex1.Sample( DiffuseSampler, In.uv );
  return lerp( Color0*tex0, Color1*tex1, tex1.a );
}

Phooey. We've got more constants and more textures. Now there are two obvious choices:
  1. Update the other shaders to have the same cbuffer and textures, just don't use the extra stuff. This will require some small C++ changes to use the new data, but it's pretty simple. But as the materials get more complex, the buffer size and number of possible texture bindings may rapidly increase.
  2. Add a new struct in C++. Objects can specify what type of material they use, and get the appropriate constant and texture bindings. Each material's buffer will only contain the data it needs, but any new material will require several code changes.

Which one is best? Neither, they're both terrible.

A Little More Flexible

It's likely that you don't want to spend all your time supporting new materials, with new parameters, new textures, new computations. Maybe eventually it would stabilize, but at what cost? There are more important things to do!

So throw out everything. From the C++ side, we'll treat the constant buffer as a black box. It's just a chunk of memory that gets filled with something. For textures, we'll just have a list of bindings (essentially a texture and the slot to bind it to). Considering the first shader above, with a single Color parameter and a single Texture, we might define the parameters in some data file like:

constants:
  1, 1, 0, 1
textures:
  0=texture.dds

... or whatever. I don't care how it's stored, but somehow we parse that, come up with a bunch of floats to stick in a buffer, and a texture to load and bind. And we get a lovely yellow thing. Now how about the clever layered material? Well, how about doing something like:

constants:
  1, 1, 0, 1,
  0, 1, 1, 1,
textures:
  0=bottom_layer.dds
  1=top_layer.dds

Now there are 8 floats for the buffer and two textures, but because we aren't making assumptions about it, there's no need to make any code changes. Amazing! Okay, this is the best thing since bacon-wrapped hot dogs! We can do anything now, what more could we want?
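
The "somehow we parse that" step could be as small as the sketch below. It assumes the values of a constants: block have already been pulled out of the data file as one string, and that every value is separated by a comma, as in the examples above. ParseConstants is a hypothetical name, and the file format itself is whatever you want it to be.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Turns "1, 1, 0, 1,\n 0, 1, 1, 1," into a flat list of floats.
// The C++ side never needs to know what the values mean.
inline std::vector<float> ParseConstants(const std::string& text)
{
    std::vector<float> values;
    std::istringstream stream(text);
    std::string token;
    while (std::getline(stream, token, ','))
    {
        // A trailing comma or a blank line leaves a whitespace-only token.
        if (token.find_first_not_of(" \t\r\n") == std::string::npos)
            continue;

        std::size_t used = 0;
        const float value = std::stof(token, &used);   // throws on non-numeric input
        // Anything after the number should be whitespace only; otherwise
        // two values were probably not separated by a comma.
        assert(token.find_first_not_of(" \t\r\n", used) == std::string::npos);
        values.push_back(value);
    }
    return values;
}
```

The resulting floats can be copied into the constant buffer as-is, which is exactly why the single-color material and the layered one need no different code.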

Well, as it happens, the moment you've finished off this masterpiece, someone comes along and gives you this:

cbuffer Material : register(b0)
{
  float4 Color0;
  float Blend;
  float4 Color1;
};

float4 psmain()
{
  return DoSomethingCleverWithTheParameters();
}

"Easy," you think, "I'll just give it data like this:"

constants:
  1, 0, 0, 1,
  0.3,
  0, 1, 0, 1

... and then it doesn't work as expected... This is where things can get a little complicated. The HLSL compiler has certain rules for how variables are packed into a cbuffer. When using float4, it's nice and easy. Using just float, or just float2, is also nice. When you start mixing things, it gets much worse. I'm not going to go into detail here; I'll just say that in this case, there are 12 bytes of padding inserted after the Blend variable, because Color1 is a float4 and can't straddle a 16-byte boundary. You can check the link for some more detail, although it's maybe not as complete as it should be.

Let's assume we've got the packing all worked out. We can explicitly pad stuff like so:

constants:
  1, 0, 0, 1,
  0.3, -1, -1, -1,
  0, 1, 0, 1

Or we can use shader reflection to figure out programmatically where every value needs to go. This is what I'm doing, and a future article in this series will cover all the annoying fiddly bits of that.
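
Whichever way the offsets are obtained, the C++ side ends up doing the same thing: copying each value to a known byte offset in a block of memory. A minimal sketch, with the offsets for the shader above (Color0 at 0, Blend at 16, Color1 at 32) supplied by hand; ParameterValue and BuildConstantData are hypothetical names, not part of any API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// One parameter: where it goes in the buffer and the floats to put there.
struct ParameterValue
{
    std::size_t offset;
    std::vector<float> values;
};

// Builds the CPU-side contents of a constant buffer. Bytes that no
// parameter covers (the padding) are left as zero.
inline std::vector<unsigned char> BuildConstantData(std::size_t bufferSize, const std::vector<ParameterValue>& parameters)
{
    std::vector<unsigned char> data(bufferSize, 0);
    for (const ParameterValue& parameter : parameters)
    {
        const std::size_t bytes = parameter.values.size() * sizeof(float);
        assert(parameter.offset + bytes <= data.size());
        if (bytes > 0)
            std::memcpy(data.data() + parameter.offset, parameter.values.data(), bytes);
    }
    return data;
}
```

With this, the data file only has to list real values; the 12 bytes after Blend are skipped because Color1 is written at offset 32 rather than right after it.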

Another potential problem here is that we're assuming the material parameters are packed into a single cbuffer. But what if we have some effect we want to apply on top of a regular material:

cbuffer Material : register(b0)
{
  float4 BaseColor;
};
cbuffer Effect : register(b1)
{
  float Amount;
  float2 Displacement;
};

These have been split up because Material is some basic properties that are likely shared between many objects (maybe instances of the same object, maybe entirely different, doesn't matter). Sure, we could merge the two, and just not share buffers when the effect is active. But if the base material is much bigger than a single color, and if the effect parameters are changing per frame, maybe it would be a good idea to have a small buffer to update.

An easy solution here is to do the same for constant buffers as we did for textures: Just have a list of them. Then the data might look like:

cb0:
  1, 1, 1, 1
cb1:
  0.75,
  -3, 2.7

This works fine for static data, but if it's static we're probably better off with everything in one buffer. For this effect, we want to update parameters at runtime, which requires runtime knowledge of where one value ends and the next begins. This will be another topic for the future.
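
As a rough idea of what that runtime knowledge might look like (a sketch only, not the design the later articles arrive at): keep a list of CPU-side buffers, plus a table from parameter name to buffer index, byte offset and size. All names below are hypothetical, and the offsets are entered by hand here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <unordered_map>
#include <vector>

struct ConstantBufferData
{
    std::vector<unsigned char> bytes;
    bool dirty = false;   // CPU copy changed; the GPU buffer needs an update
};

struct ParameterLocation
{
    std::size_t buffer;   // index into the list of buffers
    std::size_t offset;   // byte offset inside that buffer
    std::size_t size;     // size in bytes
};

class MaterialConstants
{
public:
    std::size_t AddBuffer(std::size_t sizeInBytes)
    {
        ConstantBufferData buffer;
        buffer.bytes.resize(sizeInBytes, 0);
        m_buffers.push_back(buffer);
        return m_buffers.size() - 1;
    }

    void AddParameter(const std::string& name, const ParameterLocation& location)
    {
        assert(location.buffer < m_buffers.size());
        assert(location.offset + location.size <= m_buffers[location.buffer].bytes.size());
        m_parameters[name] = location;
    }

    // Returns false if the name is unknown or the value has the wrong size.
    bool Set(const std::string& name, const float* values, std::size_t count)
    {
        const auto found = m_parameters.find(name);
        if (found == m_parameters.end() || count * sizeof(float) != found->second.size)
            return false;
        ConstantBufferData& buffer = m_buffers[found->second.buffer];
        std::memcpy(buffer.bytes.data() + found->second.offset, values, found->second.size);
        buffer.dirty = true;   // only this buffer has to be re-uploaded
        return true;
    }

    const ConstantBufferData& Buffer(std::size_t index) const { return m_buffers[index]; }

private:
    std::vector<ConstantBufferData> m_buffers;
    std::unordered_map<std::string, ParameterLocation> m_parameters;
};
```

For the Material/Effect pair above, setting Displacement (buffer 1, offset 4, size 8) marks only the small effect buffer as dirty, which is the point of splitting it out.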

That's All For Now

So far, we have a material system that allows a pixel shader to be written with any parameters packed into any constant buffers, and any texture bindings we want. The parameter values for the constant buffers, and names for the textures, can be specified in a separate data file. There are a lot of details that I've glossed over, which I hope to explore deeper in the future.

Saturday, January 30, 2016

Find Junk Released

 

That's right, Find Junk has been released to the wild!

This initial release marks my first Windows Phone game. There are over 250 objects to find across almost 70 different images. More will be added over time in future updates and add-on packs.

Check it out on the Windows Phone store!

Wednesday, December 23, 2015

Find Junk Tools: Leveler

I've made several small tools to aid in development of Find Junk. The oldest, and largest, is the "Leveler". This is where I set up the outlines and hit areas for all objects in all levels. I'm going to talk a bit about its evolution, from how it was back when I was working on iOS, to the current state.

I won't really show much code here, I don't think there's much of interest there. This is just to give an overview of how it works and some of the advancements that have been made.

Let's start out with a couple screenshots! First, this is the main window from the original tool, circa 2009 or so (screenshot is from today with Win10 styling... this is just the build from then):
Find Junk Leveler, 2009
And here's the main window today:
Find Junk Leveler, 2015

There are a few obvious differences, and a few that don't really show in a screenshot. The first thing you might notice is that with the new Leveler, you can see a lot more of the image. In the original version, I would work with the final cropped and sized image (with a fixed resolution of 320x420). I could resize the window to get a closer view of what I was doing, but that's about it. If I decided to reframe the image at all, I'd have to manually move every shape. When I started to revive the project, I knew I would be using the higher resolution original photos, so the cropping would likely change as well. But I also wanted to redo the outlines anyways, so I wasn't worried about that work. But, if the Leveler works with the full uncropped image, it will be much easier to make further tweaks later, so I won't be locked into any decisions. Anyways, so the Leveler takes in the original photo, and I can crop it within the program (the lighter grey areas on the sides are going to be cropped away). When I export the data, the images are cropped and resized as appropriate.

Aside from that, the outline in the preview image is drawn much nicer. The line is thicker and I can see at a glance where each control point is. Previously, I just drew the line, and highlighted whichever point the mouse was over. I also support zooming, so I can get an even better view than just by resizing the window.

Apart from the user interface and editing abilities, I've also changed how the data is stored on disk. The old Leveler would work with the same files as were bundled with the game. Each level would have a text file with a list of the shapes in it, along with x,y coordinates for each point in the drawn outline and hit shape, and also a separate jpg to go with the level. These were loaded and parsed as needed (so before switching to a level, I would read in the text file to find out what shapes are there, and load the image). In addition to that I had another file that listed how many objects could be found in each level, which I edited manually. Overall, this meant a lot of separate files to deal with:
Level files packaged with the game, 2009
This led to a few problems during my rewrite efforts. First, when developing for Windows Phone in Visual Studio, I need to add every content file to the project. While maybe not the worst thing ever, this was an inconvenience and popped up in other places besides these level files. Second, the "challenges.lst" file that I had to set up manually was really annoying. It existed because I originally didn't want to load all the level descriptions up front (I don't know why, that was a long time ago), but I still needed to know how many total objects there were and where I could find them. This time around I decided to load everything but the images during startup. And of course, reading and parsing a text file is not the best thing if it can be avoided.

I was already making some major changes to how the Leveler dealt with files, because I needed to store a bunch of extra info that wasn't needed for the game (how to crop the image, path for the source image), so while I was at it I also revised how this stuff is stored. Now the files that are bundled with the game look like this:
Level files packaged with the game, 2015
The files with the cleverly named "dat" extension store the list of objects in all levels, as well as the final cropped and resized images for each (still kept in JPG format, to keep file size down a bit). With Windows Phone, I can add those "scale-" annotations so that when I request a file named "levels.dat", it'll actually give me the appropriate version for the device's screen. So each one has the same images, but scaled to different resolutions. I've split out the object names into a separate "strings" file, again with a "lang-" annotation so I'll get the correct language for the user's settings. Leveler doesn't currently support multiple languages, but it's nice to have that available if I decide to do it eventually.

The files are laid out in a way that I can load them with just a couple of reads. Generally first there's a small header saying how big the data is; and then I can read the rest as a single chunk into one block of memory. I can then reference the individual pieces by treating different sections as arrays of structs. I keep the handle for the "dat" file open, so that as the game is played I can stream in the images as needed.
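
To illustrate the "header, then one big chunk" idea, here is a small C++ sketch. The actual file format isn't shown in this post, so everything here is made up for the example: FileHeader, LevelRecord, their fields, and the function names are all hypothetical. Records are copied out with memcpy rather than by casting the block pointer, which keeps the example within the language's aliasing rules.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical header: how big the data chunk is, and how many
// LevelRecord entries sit at the start of it.
struct FileHeader
{
    std::uint32_t dataSize;
    std::uint32_t levelCount;
};

// Hypothetical per-level record.
struct LevelRecord
{
    std::uint32_t objectCount;
    std::uint32_t imageOffset;
};

struct LevelData
{
    FileHeader header{};
    std::vector<unsigned char> block;   // everything after the header

    // Treats the start of the block as an array of LevelRecord.
    LevelRecord Level(std::size_t index) const
    {
        assert(index < header.levelCount);
        LevelRecord record;
        std::memcpy(&record, block.data() + index * sizeof(LevelRecord), sizeof(LevelRecord));
        return record;
    }
};

inline bool LoadLevelData(std::istream& in, LevelData& out)
{
    // Read 1: the small header.
    if (!in.read(reinterpret_cast<char*>(&out.header), sizeof(FileHeader)))
        return false;
    if (out.header.levelCount * sizeof(LevelRecord) > out.header.dataSize)
        return false;   // header is inconsistent

    // Read 2: the rest of the data, as a single chunk.
    out.block.resize(out.header.dataSize);
    return static_cast<bool>(in.read(reinterpret_cast<char*>(out.block.data()),
                                     static_cast<std::streamsize>(out.block.size())));
}

// Convenience wrapper for data that is already in memory.
inline bool LoadLevelDataFromBytes(const std::string& bytes, LevelData& out)
{
    std::istringstream stream(bytes, std::ios::binary);
    return LoadLevelData(stream, out);
}
```

After loading, Level(i) reads record i straight out of the block, with no per-level parsing or allocation.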

I mentioned that I have additional editor-only data that I need to keep track of. This does not go into the "dat" or "strings" files. Those are exported by the Leveler, and contain exactly what the game needs. Instead, I have a separate location to store all my source data. This includes the original photos, individual sprite images, and several JSON files describing fonts, string tables, levels, etc.

The levels description file is where this extra metadata is kept. So Leveler loads and saves the JSON file, and then I can run an Export which takes that and dumps out the final "strings" and "dat" files. There's no need for the tool to be able to read in those exported files.

So I think that's about it for now. If you're reading this, thanks! Feel free to leave any comments or questions. Next time around I may go into more technical detail of what the exported files look like, and how I use them.

Sunday, December 20, 2015

Welcome!

Hello! My name is Tim. I'm a game developer. ... and, I don't really know what else to say about that.

So this is something like my 3rd attempt this year to get into posting stuff regularly. This time for sure, though! In the past I wanted to make something like a dedicated dev blog for whatever project I was working on at the time. And then I'd lose interest, in either the blog or the project, and I'd end up writing nothing. But this time it'll be different! Hopefully! I'll still treat it as a sort of dev blog, but I won't limit it to just a single project, so that when that project is inevitably replaced with some new thing, it won't be awkward.

Lately I've been working on reviving an old project:


I don't have any easy iOS dev setup, so I can't really target that right now. Instead, I've ported the game over to Windows Phone. Maybe when I have that done I'll expand to other platforms.

Anyways, I'll post more about some of the challenges I've had with this, or anything else I feel like.