(A dat version of the devlog is available.)
Thanks to all patrons who are supporting us so we can keep the Duangle ship flying.
You are no patron yet but would like to support our work? Please visit our Patreon and check out your options! Every little contribution that moves us closer to our monthly goal helps.
Every time I think I'm out, they drag me back in. Yes, I'm once again occupied with rewriting a part of the compiler, but first: more info on the voxel renderer.
I got hierarchical raytracing working, as well as physically based shading. Here is a gif animation of what it looks like in action, demonstrated on a test model. Since the data is hierarchical, LOD support is also working.
The next step would be improving the SDF to voxel converter, as well as implementing a GPU-side bounding volume hierarchy to accellerate rendering many objects sharing a single scene, but the absence of basic memory management in scopes made itself painfully clear, and compile times were shooting up, so I had to take care of that first.
Unfortunately the solution to both problems was to completely rewrite the intermediate language and update associated systems.
A user reported that his simple project, which made heavy use of generics and compile time folding, had its start up time shoot up to 28 seconds. It became evident that heavy reliance on constant propagation and constant expression folding had been a bad call. We were practically running into a magnified version of the problem that C++ already has, with its usage of templates and constexprs slowing down compilation, which resulted in veteran gamedevs trying to avoid those features where they could.
Why are constant expressions slowing down the compiler? It's simple: having to solve expressions at compile time, the compiler has to resort to a mode of interpretation. So compile time expressions aren't being evaluated at machine code speeds.
Scopes made the problem worse by introducing constant loops which too fold at compile time, and have to be processed by an interpreter. This was what ultimately drove up compile times. The whole approach doesn't scale.
It is theoretically possible to put a JIT in the compiler whose only job it is to fold expressions, but even the JIT can't abstract away the fact that intermediate language datastructures have to be continuously updated.
How did I fix the problem? First, I disabled constant propagation & expression
folding completely: templates would now only be instantiated for argument types,
branches would not be folded, loops would not be unrolled, and any operation
on constants would now generate code. That also removed the need for
as well as made it easier to predict what code would actually be generated.
Then, to reintroduce controlled constant propagation at function boundaries, I added the concept of “inlines”, which are functions that are guaranteed to be inlined into the calling context. Since all arguments are inlined into the function body, that would effectively constitute constant propagation. Scopes did implicitly inline functions before to resolve compile time closures, but now the process is nicely explicit, and users can reason easier about what the compiler will do.
Lastly, one question remained: now that the compiler no longer folds constant expressions, how are we going to get that feature back where we actually need it, and how can we use it in a way that is faster than interpreting constant expressions atomically?
The answer was Label macros, which quickly got renamed to AST macros, and I'll tell you more about them after we talked about the next issue, and how it was solved:
I wanted C++ style RAII support that would run destructors of variables going out of scope, but the CFF representation, which I used up to then, made it costly to analyze scoping. I figured that an AST, which I didn't have until that point, would retain the information that is already user-authored through the usage of nested scoped constructions, so I moved away from labels towards a tree of operators and control structures. This also made it easier to generate SPIR-V bytecode, which requires scope annotations for branches and loops.
I also had to throw away the
Any type, as it became rendundant, as well as
Syntax type, as anchors are now directly retained by values, which made
syntax macro related code generally easier to read.
The whole code processing pipeline now roughly works like this: the parser takes a byte stream and outputs an S-expression tree. The syntax expander takes the S-expression tree and generates an untyped AST (which is more of an Abstract Semantic Tree). The prover (formerly called the specializer), which I had to rewrite completely, takes the untyped AST and outputs a typed SSAT (Single Static Assignment Tree), where all expression trees have been flattened and only control structures are nested. The code generator takes the typed SSAT and generates LLVM IR or SPIR-V, which are then translated to machine code by either LLVM or the graphics driver.
So, with the label graph now being an AST, our Label macros were now AST macros. How do they work? You call an AST macro like any other function, but it is invoked by the prover at compile time, receiving the arguments as typed AST nodes. The AST macro is a compiled function generated at an earlier compile stage, so it runs with the same performance profile as the compiler itself. You could say that it operates very much like a compiler plugin.
The idea is not new, but a staple of multistage programming. I discovered its first implementation in the Terra project, and truth be told, one of the earliest prototypes of Scopes had such an implementation, but I opted for constant expression folding instead, which was, I admit it, a huge mistake.
An AST macro can analyze the types of incoming arguments as well as extract possible constant arguments, and then return a new piece of untyped AST that the prover then types and expands to SSAT form. If that form contains more AST macro calls, these get expanded again, and so on.
AST macros can abbreviate a lot of constexpr boilerplate into a single function. Need to go over every argument, extract keyed attributes and construct a new type? For constexprs, each step would be interpreted individually. But AST macros use machine registers and API calls for this. The overhead is much smaller, and compile time goes down.
So, the AST and AST macros were the two big changes. Because of AST macros, I also had to rewrite the core bootstrap file, and I'm still busy finishing that up. The REPL console started working again yesterday, but none of the tests are working yet.
As soon as the core rewrite is complete and the tests are working again, I'll release version 0.14 of Scopes. If you are using Scopes, be prepared to make changes to your codebase. Your code should end up becoming smaller rather than bigger.
Once version 0.14 of Scopes is released, I'm continuing work on the engine and the raytracer, along with adding the RAII mechanism for Scopes 0.15.
Addendum: Oh did I mention, Scopes 0.14 will also have proper support for
exceptions. There are now builtin constructs for
and they are backwards compatible to C.
How are they implemented? Each function can have two return paths, a regular
one, entered with
return, and the exception path, entered with
On the C level, a function that can return an exception returns a struct
with three hidden members: a boolean, an except value and a return value.
The boolean indicates the index of which of the two value fields is valid. If
0, the exception field is valid, otherwise the return value is valid.
When such a function is called, the calling function implicitly generates a conditional check and either resumes execution with the return value or enters the currently active catch block with the exception value.
To allow type inference for exception types, a function can only throw a single
exception type, and likewise a
try block can only catch a single exception
type. If two different exception types are thrown in the same block, a
compile time error is generated.
How do we get around the limitation of single exception types? The Scopes core
will implement support for sum types, which makes sum exception types possible
as well. This way, the basic building blocks for exceptions can be leveraged
to raise different exception types within one sum type, and to dispatch
for different exception types in the catch section of
All in all, this brings us up to 1:1 feature parity with Rust's
while providing basic convenience for enforced and typesafe error handling.
Four months later! What can I say. I don't like to write devlogs when I have nothing interesting to report, and the first quarter of 2018 was indeed rather sluggish and eventless, with me mostly working on our programming language Scopes and our game engine Tukan — oh did I mention our game engine is named Tukan now, and it has a fancy logo too.
I was brooding over the fact that I had to regress to voxel raytracing, which would mean that I wouldn't get every feature that I wanted, but my mood has considerably lightened over the fact that the list of things that I could get with sparse voxel octrees grew longer and longer, and it might even be possible to get rid of the jaggy voxel look by using good magnification filters for the surface and/or raytracing operators directly. There are many options.
I just reached an important milestone in development. The language is mature enough to write complex programs and shaders with it. I'm still not happy with memory management and declaring mutable values, but it works, and it has to do for now. I invested a bit of time in borrow checking and still haven't found a way to do it that doesn't hurt. The engine is also coming along nicely and steadily growing in tool functions and programming sugar.
But that's not the important milestone. The important milestone is that the sparse voxel octree raytracer works! (click for gif) Considering that the game will render all models this way, that's a big deal. I'm currently using a variation of Perfect Spatial Hashing to store models of up to 1024\3 voxel resolution in two one-dimensional texture buffers currently about 3.2% the size of the full volume. A query only requires two sample fetches, which speeds up raytracing considerably. Cache coherence of the buffers isn't great tho, which is why I want to try Coherent Parallel Hashing later on (thanks to mmalex for pointing me to it).
Next up is clean up, optimizations, then cone tracing to improve performance and fidelity further, as well as first primitive attempts at realtime GI. The model is currently generated from a fixed distance field, which is supposed to become a data structure (list of operators), and finally, models are supposed to be generated on the GPU so the preprocessing step when the game first loads doesn't take forever.
As a final note, I should mention that while it took almost two years (!) to develop the new language, the work is paying off. I prototyped the SVO raytracer in two days from scratch, yielding excellent performance from the start while making coding nearly as easy as whipping up something in Python (unlike Python, Scopes forces you to think a little more about data structures, but not to an obnoxious degree). I still have to halt the Scopes-car every few kilometers and pop open the hood to make a correction, but it's coming together. I don't need C, C++, Python or Lua! Coding is not awful anymore!
I just noticed that the entire lifetime of Scopes, formerly Bangra, formerly Bang, is documented in the devlog! The inception begins 18.7.2016
Happy Halloween! Happy Thanksgiving! Happy Holydays! Happy New Year!
For the past three months I've worked on something called Frustum Tracing, which was an attempt to adaptively render CSG products of implicit surfaces in viewspace. I had severe problems with figuring out many details of the implementation, so I wrote and published a 2D version first, called RTCSG2D (the 3D project was just called RTCSG), from which I learned quite a bunch of things, but it wasn't enough. In order to efficiently render a series of subtractions for a tile in viewspace, it is necessary to sort all surfaces first in ray direction, which has a complexity of at least O(n log n), for each pass of a process that is already O(n log n), and that killed the idea. It's still possible to solve the problem in worldspace by raytracing a sparse voxel octree of CSG products, but the idea is still too new and I decided I have sunk enough time into this.
Over the holidays and in the weeks after I've sunk into a bit of a depression - not the medical kind, more the one you get when bad things happen to you, you don't see a way out and thus you're paralyzed. The new renderer didn't work out. The project is taking forever. I couldn't tell what comes next. I felt like a failure.
I worked a bit on the game design document, trying to figure out what communication of players with Nowherians would work like (an open problem since project inception), and didn't progress much there either, save for an interesting list of feature requirements for it. Intuitively I feel that the solution will be non-systemic: look at the individual needs for communication, devise a unique and fitting solution for each one, and then maybe some common attribute will present itself that could be turned into a system. But right now, it's impossible to say what that would look like.
And then Staxel got released, I farmed the hell out of it and am generally in love with the style. I toyed with the idea of doing a small parody farming game as a one-off, where you would start with a perfectly running farm, and then slowly sell everything off in order to build a cryptomining operation. I would only be able to do it if most of the code were already written and asset creation wasn't too difficult, so I figured, maybe UE4 + some voxel engine. I went through the UE4 marketplace and found nothing that would permit a Staxel-ish look or workflow. There are a lot of Minecraft engines, but nothing at a 1/16 voxel granularity. In desperation, I even checked the Unity marketplace, and nothing presented itself. Bummer.
I remembered Voxelnauts, which is a game that sadly never got released, but has a similar style as Staxel, and peculiar enough, is rendered by raytracing a sparse voxel octree. Hmm, I thought, a sparse voxel octree? I would have had to write one anyway. And they're really useful. You can do all nice sorts of rendering effects with them, thanks to a technique called voxel cone tracing.
Maybe, I reasoned, if I used the same technique for the main game, I could rationalize putting time into this. Yes, voxel graphics aren't as smooth and perfect as implicit surfaces. But they can easily be procedurally generated from the same functions. They can also be hand-authored with apps like MagicaVoxel, which is great for one-offs and auxilliary assets! They're well documented, and with a modern technique like VXGI, they can support physically-based shading and look beautiful. And later on, I could possibly continue building on SVO to get the implicit surface CSG representation that I originally wanted, for a Nowhere 2.0, perhaps.
So then I had a bit of a good morning moment and then went back to work. I'm still reading papers on the subject of implementing SVO's, and I'm writing shadertoys again to understand some of the details better. I expect to progress much faster this time. Let's see how it goes.
Let's get right to it. Last post, Bangra was still called Bangra, but now it's called Scopes! And it has a logo! I changed the name because I wanted something that fits in a single syllable, and a name that better reflects what the compiler is about. Scopes are a datatype within the language, and it's very much about domain specific solutions, depending on the scope of projects, so this is what I settled with. It's meant as sort of a meta-name; something that you would implement, for example, a markup language with, and then say something like “YAMAHAML (Yet Another Mildly Amazing Hypertext Annotation Markup Language) is a library for Scopes.”
Where are we on the Roadmap?
1. The few Scopes tests that I have need to be fixed to match the modified parsing rules and the reduced type system.
All tests run (in fact I added quite a bunch of new ones) and the bootstrap part is pretty complete.
3. A lifetime tracker needs to be written (in Scopes). (...)
I'm unfortunately too stupid to think of a good approach. Every time I try to think about it I realize that I know nothing. I try to google something about it but nothing useful comes up. To be honest at this point I'm hoping that someone's going to look at the IL and figure it out for me. The problem is that when a value goes out of scope, a destructor must be called, which means new labels must be injected into the existing code, and those labels need to be specialized. But we can only do the lifetime analysis after the labels have been specialized, and destructor functions also need to be specialized and then tracked. I don't look forward to any of this because it's going to make the pipeline ugly, but I also want that feature, but I want it in a clean form that fits the existing principles.
5. Analog to 4., a SPIR-V generator needs to be written that translates specialized IL into SPIR-V. (...) Now, with the power of Vulkan, we can magically run Scopes functions on the GPU as well as the CPU, so shader functions and gameplay functions can source the same math library.
It's all done and embedded in Scopes directly. It turns out that implementing a Vulkan loader wasn't needed at all (by god I tried, but the bureaucracy is insufferable). There exists a wonderful small library called SPIRV-Cross which translates SPIR-V binary back to GLSL. I embedded it directly into Scopes so code can be transpiled to GLSL on-the-fly.
The SPIR-V support has forced me to make many shader features intrinsically supported on the CPU as well; so functions like sin, cos, log2, normalize, length are now builtins; and the arithmetic operators all support vectors. This allows me to run shader code on the CPU and to write unit tests for it. It's delicious. Also, compile time closures and type inference are just as well supported for shaders as for native.
I'm now working on porting Liminal, the game engine for NOWHERE, over from None to Scopes. Many essential modules have been already ported, and there's support for SDL 2, ImGui, GL 4.5 and GLM-style vector and matrix types (not completely done, but working). Thanks to Scopes' GLSL-backend I can target CPU and GPU with a single language, so there are no more duplicate code bases for interval arithmetic, automatic derivatives, perspective projection, HSL/YCoCg color conversion, versor rotation, and so on. It's what I always wanted.
The biggest part of Liminal is the implicit volume renderer that I originally just wanted to port from the original but then realized that most of it is stuff that I will throw away anyway for the big refactoring to support realtime CSG. In preparation of that I dug out my one and a half year old notes and source code to illuminate some of these tricky detail corner cases that the devil proverbially likes to hide in.
The first part was figuring out how to do better frustum testing for primitives. I couldn't think of a good non-iterative analytical test for the sdSuperprim, so I decided I'd try it for the basic set of Box, Ellipsoid, Cone, Cylinder instead, and then compose scenes from those. I wasn't sure it would work at all, so I first wrote a implementation guide for Ellipsoids. Then I carefully approached the problem by doing shadertoy prototypes, first the 2D case for spheres, then ellipses. This looked all very promising, so I implemented the sphere 3D case next, then boxes. Doing cones and cylinders is too hard for me right now and non-obvious so I'll put that off until later.
I also had to verify that my CSG evaluation idea would actually work so I implemented a prototype for that one too, the Interval CSG Register Machine. There's a summary in the header that explains how this works in principle.
All this done, it is time to actually implement all this in the game engine, and this is where we are now.
Oh my god. Four months later, and there's no end in si— just kidding. I have had a breakthrough a few weeks ago when I did one more rewrite of Bangra in C++ and this time skipped the interpreter completely and went straight for type solving & compilation in the first step. It works! The worst is over, the result is fun to use, and most of the work is in userland now. Bangra even got its REPL console back :-)
Along with a working compiler, I have also written a Visual Studio Code extension for Bangra and a pygments script for Bangra's documentation. I would like to submit the pygments tokenizer to Bitbucket and Github at some point so my code previews and gists look okay, but right now I don't think they're going to accept it.
Let's revise the roadmap I laid out in the last post:
1. The few Bangra tests that I have need to be fixed to match the modified parsing rules and the reduced type system. The FFI needs to be ported before step 4, along with the clang parser (Both still exists only as C++ implementation).
The clang bridge has been ported from the old C++ version to the new one. The FFI got a rewrite and is part of the solver now, allowing to fold constant expressions involving “pure” C functions. The bootstrap part isn't complete and the language has changed, so most of the tests don't work (including the test runner itself).
2. A partial evaluator / specializer needs to be written (in Bangra) before Bangra functions can be trivially compiled. (...)
This is all done, but written in C++, not Bangra - which is fine.
normalize() solves all argument and return types in a graph of labels,
inlines constants, folds expressions and lowers closures. The result is
passed to a trivial IL->IR transpiler.
3. A lifetime tracker needs to be written (in Bangra). It annotates the IL with information on what values are required in each continuation, how they should be allocated and when they are dead. (...)
None of this exists. Bangra doesn't have any memory management
whatsoever, and I'm not sure if I want to change it. I want to do
lifetime tracking but I don't yet understand well how it would work,
what the syntax would look like in a type inferred language (as opposed
to Rust), and if it couldn't just be implemented on the user side. I
also want to add RAII support; The simplest version I can think of is to
defer keyword that allows to queue functions to be
executed when the scope is exited. As always, the design goal is to
provide the fewest building blocks necessary to give users the power to
express whatever idea they can think of.
4. A LLVM IR generator needs to be written (...) that takes specialized IL and turns it into LLVM IR (using the LLVM C library). (...)
All done (this is the IL->IR transpiler), including basic debug symbols to support GDB and GDB frontends.
5. Analog to 4., a SPIR-V generator needs to be written that translates specialized IL into SPIR-V. The format is very similar to LLVM IR, and I know at least one helper library designed to assist with generating SPIR-V. Now, with the power of Vulkan, we can magically run Bangra functions on the GPU as well as the CPU, so shader functions and gameplay functions can source the same math library.
Is not in the scope of Bangra directly, but will be part of the engine, and written in Bangra. I can't really start this work until the Vulkan framework is set up. This is where the actual game engine work continues.
Inbetween 6. to 8., there's going to be a problem of a rapidly growing codebase that takes longer and longer to load. The interpreter already takes 1.5 seconds to bootstrap Bangra.
...and at times, it got even worse. But the problem is gone now, because Bangra compiles all code directly. Bangra enters the REPL within 400ms in debug and 150ms in release on my machine. I would say anything up to one second is acceptable.
10. The interpreter needs to be ported from Lua to Bangra so Bangra can run and most importantly specialize itself. (...)
That is all over. Points 10 to 13 are off the agenda. I'm not going to rewrite Bangra in Bangra, there's nothing to be gained by that.
Alright. Back to work.
I was hoping the work on Bangra wouldn't take much longer, but I was wrong. Bootstrapping a static compiler is really really hard. Absolutely none of my original ideas for it have panned out. All ideas that came instead where actually better. In the process of building Bangra, I gained even more useful insights:
I was struggling to support continuation passing style with garbage collection in C++ in a compact, comfortable and debuggable way. The line count had swollen to about 8000 lines with no end in sight.
So, after doing some tests, I rewrote the interpreter in Lua within a week, to make use of LuaJIT's tail call optimization (permitting endless forward calling) and GC. The result has roughly 5000 lines. The language I worked on before, “None”, also runs on LuaJIT, but the difference is that None compiled to LuaJIT bytecode. As a result, many Lua-specific idiosyncracies bled into None, such as the type system, Lua tables, API functions, the 60 upvalue/200 locals limit, or indexing starting at 1 instead of 0 by convention. Bangra runs in a bootstrap interpreter written in Lua to isolate it from Lua internals and keep control over the design.
What I have now is an interpreter running in an interpreter. That
means Bangra runs about
4x times as slow where
x is an overhead
constant incurred every time a process is virtualized. But it's
fine, because we'll
the shit out of it.
I don't want to get too much into this right now, but there are a bunch of devils in the details. Suffice to say, once you have allocated some memory, you don't ever want to move it again, especially when other memory depends on it. Also, CMTA has no way of tracking dead memory, so a special solution is required here too if you need to invoke custom destructors. I put a week of work aside to explore this avenue and it was a waste of time.
The C++ implementation made extensive use of tuples (unnamed
structs) to pass values between functions. I removed tuple types
from the Lua port as soon as I realized that when you have closures,
you already have tuples. In continuation-passing style, you can
encode any tuple
(x0, x1, ..., xn) as a closure
fn (f) (f x0 x1 ... xn) where
f is a function that receives all
tuple values. If
f doesn't know or care how many elements the
tuple stores, it can take a variable number of arguments.
Furthermore, if the language supports multiple return values (and
Bangra does), the closure can be simplified to
fn () (return x0 x1 ... xn), where
return is an implicit return
function provided by the caller.
The same goes for enums and the inevitable switch cases that use them, which leads us to the next point.
This is an original solution I haven't seen elsewhere yet, and which I'm particularly proud of. The challenge is to allow types to overload relational operators, preferably through a single function, while also keeping the dispatcher to these operators simple.
C++, Python and Lua for example require a minimum number of
overloaded operators to consider the interface complete. If you
specify operators for
== (equal) and
> (greater), you can use
negation to deduce
<= (less-equal), and through
deduction, all other operators from the set
== != > >= < <=.
Because some types offer fast operations for different cases, or
have special values that don't compare (such as
NaN for real
types), the dispatcher that translates what the interface provides
to what the user expects has to consider every permutation.
Composed from these operators, C (like many other languages) offers
another approach to comparisons, mostly used in conjunction with
sorting algorithms, a function whose C signature resembles
Ordering compare(T a, T b), where
Ordering is an enumerator
(told you this would come up) with one of the outcomes
Greater. Here we have a single function dispatching
each relational operator to an equivalent enumerator, and on the
user side a switch case that is able to turn each enumerator value
back into code flow.
compare is composable; we can comfortably
compare for tuples from
compare functions for its
So why not attempt to invert this relationship and, rather than
compare functions from relational operators, compose our
relational operators from intrinsic type-dependent
functions instead? There are however two obstacles to pass before
this solution works.
The first obstacle is that
compare only covers ordered values, but
we also need to compare unordered values such as structs (and values
of unrelated type), or partially ordered values such as types (which
can be unrelated or supertypes of each other) and reals (where a
comparison with the
NaN type is always unordered). To that end we
add a fourth result to
Ordering, which is
Unordered, and covers
all these cases where the result is neither
Greater. Special thanks here to
@korenchkin, who provided me with
the mathematical idea of ordering and its accompanying technical
The second obstacle is the paradox of comparing the result of a
comparison. In order to compute
(compare a b) == Equal, we need to
expand it to
(compare (compare a b) Equal) == Equal, which expands
(compare (compare (compare a b) Equal) Equal) == Equal and I
think you get the picture.
We solve Zeno's Paradox of Comparisons by reaching once more for
church encoding and turning
compare into a four-way branch
function of the format
fn branch-compare (a b f-equal f-unordered f-less f-greater) that
b and, depending on the outcome, invokes one of
the four continuations. Then the relational operators can be
implemented with one-liners.
a >= b for example expands to
branch-compare a b return-true raise-error return-false return-true.
There are additional upsides to this method. Relational operators
often appear in conjunction with branches. Since
created four branches, any branch that depends on the returned
booleans can be folded into a direct invocation of
branch-compare has no direct equivalent on hardware (as far as I
know), but can be lowered without much difficulty depending on
atomic type and the number of unique outgoing paths.
I wanted to be done with the compiler by February so we could prepare the March release of NOWHERE, but was struggling with winter depression and the fact that things took way too long to move forward in C++. Really, every time I touch that damn language, pain and misery become my permanent companions. Now that the interpreter runs in Lua, I progress much faster with my agenda, but there are still a few things to do before I can assemble the next alpha. Here's a laundry list:
All Bangra functions are already stored in IL, so I don't foresee any logistic problems here. I have never written a partial evaluator before but conceptually, I understand the problem well. Nevertheless I expect surprises.
Up until a few weeks ago I had my doubts that steps 2. and 3. were even possible, considering that I hadn't yet seen a project based on aggressive deep type inference which did this, but then I discovered Carp and that lifted my spirits. I had productive talks with Erik Svedäng, head dev of Carp and also game developer (in fact, like Bangra, Carp is written with game development in mind), and I hope I can learn more from him on these subjects.
1xfor the first time!
Inbetween 6. to 8., there's going to be a problem of a rapidly growing codebase that takes longer and longer to load. The interpreter already takes 1.5 seconds to bootstrap Bangra. Macro expansion in particular is very costly. A quick fix is to cache the generated IL on disk (which is what None did), and that takes the writing of some serialization routines. IL caching can't however cover for the LLVM compiler which I'm told is not the fastest one on the block. I can also try to devise some caching here as well, but these are not pretty solutions.
What follows is the pretty solution as outlined by the Futamura projections, and I'll try to restrain myself to implementing it after the alpha has been released:
Now we have an interpreter (Bangra in Bangra) running in an
interpreter (Bangra in Lua) running in an interpreter (LuaJIT).
Would we actually run anything of value here, execution would slow
8x times as slow as a native execution. (We could also run
the interpreter repeatedly recursively and bring everything to a
crawl, which is more limited by available memory than computing
And that's it for now.
Hooboy, this Bangra business is taking way longer than anticipated. Bootstrapping a static compiler is hard. Nearly none of my original ideas for it have panned out. On the upside, the ideas that came instead were actually better. In the process of building Bangra, I gained several useful insights:
To this end, I picked CFF as intermediate form. It turned out CFF is also ridiculously easy to interpret: the eval-apply loop for CFF requires less than 100 lines of C++.
PyObjecttype for this), that require conversion to/from ffi compatible forms when interfacing with C functions. Instead, the interpreter should deal with C values, structs and arrays directly, so that no conversion is necessary.
But where do we store the types that an interpreter requires in order to make binary blobs inspectable? The answer is: fat pointers. Bangra's fat pointer is a dynamic pointer that stores a type reference along with its target in order to support interpreter features that depend on inspection: type checking, overloaded operators, bounds checking, element access and garbage collection.
Fat pointers turned out to be a fantastic solution for several of my problems, and my only regret is that I had not learned about them sooner. I am still in the process of transitioning the interpreter to this new format, along with the introduction of libffi for FFI calls instead of LLVM.
This is by the way the densest problem I've worked on so far. The interpreter hovers around only 6k lines of C++, but I've agonized over these lines in a way I never had to before.
I hope that the interpreter work is completed somewhere in the next week. Then I'll start work on the compiler, which will be needed for what's next: the new version of the engine uses Vulkan instead of GL by default, and the Bangra compiler will be able to target SPIR-V directly; it's just a matter of translating CFF to SPIR instead of LLVM IR. Once that's done, I hope I'll never have to deal with generating GLSL code again (except for the mac port), and that will significantly speed up the next development phase.
The realtime CSG renderer is practically done and waiting in a drawer to be taken out and put into use once the Vulkan backend stands. I can't wait for the moment when the plan finally comes together.
Looks like I've been slacking on my habit of regularly updating this devlog, probably mostly because I was ashamed that in the past three months I've been procrastinating on game engine work, and instead worked on perfecting our programming environment after I had a hard time writing a new visual editor for live programming, which was supposed to help with procedural audio design for NOWHERE, for which the ability to tweak parameters in real time is a must in order to be able to, ironically, progress swiftly. And there I was, shaving the yak again. And what a beautifully naked yak it has become.
My trouble started when I began to realize that keeping source in Lua tables was at odds with needing to render the UI as fast as possible, using dear imgui (Oh - I also wrote comfy bindings for imgui. So there's that.), as Lua doesn't really give you a high performance interface to its data structures. So it would have been either calling dynamically scripted functions from statically compiled code, or using the Lua C API to iterate and edit tables, which turned out to be a pretty awful experience. Once again, abstraction becomes the enemy. There's no direct access to the data structures, instead you're supposed to communicate with the VM using an awkward stack machine interface that renders all code unreadable, and doesn't lend itself well to abstraction due to constantly moving stack indices and a significant overhead associated with parameter checking.
I can't remember how I got to the conclusion so I must have intuitively felt that I had to fork Terra (as LLLua) in order to be able to fix a few long standing issues, primarily the way too difficult process of building win32 versions (didn't manage to build a single working version) but also the inability to annotate struct alignment for better auto-vectorized code generation, hooking initialization mechanisms in the AST, support for latest LLVM, throwing out code I didn't need (like the Lua language extensions).
In the process of maintaining LLLua, I had to concede that I don't understand LLVM and Clang very well; both C++ API's are incredibly complex, and one had to spend some time with the architecture to get a better understanding of how it worked.
So I put a week aside for an experiment, a small toy project which I felt would be an ideal way to get to know the LLVM infrastructure. The goal was to write a cousin of None, a minimalistic statically typed Scheme-like, implemented in a single C++ source file, that would leverage the low-level MCJIT capabilities of LLVM in order to create a self-expanding language. I nicknamed it “Bang”, after the big bang which created a universe from the smallest possible point.
Based on my experiences I expected the experiment to fail; There was a reason Scheme was a dynamic language, so I argued, and good metaprogramming would therefore need some kind of Terra-like two tier system: a dynamic VM like Lua would bootstrap a compiler using a language toolkit like LLVM.
But the cul-de-sac moment never came. Instead, the undertaking proved to be frustratingly fruitful, and it became impossible to quit working on it because I kept succeeding, and the system exposed attributes early on that None lacked.
With the help of MSYS2, Bang proved to be incredibly easy to build on Windows. So easy, actually, that I almost look forward to booting Windows to make a new binary. I don't even use a build system; Just a Bash script for Linux and a batch file for Windows. Bang uses the LLVM/clang distribution on Linux both as its sole dependency, and as its compiler. On Windows, compiling and linking had to be done with gcc.
Bang has recently been renamed to Bangra. Its release number is now at 0.2, there are binaries and documentation online and I wrote an introduction that gives a rationale for its existence. The only language Bangra supports at the moment is the more low level IR language, which has a restrictive type system, but due to its hygienic macro support, even coding at this level is beginning to become convenient.
I'm currently in the process of implementing a new version of Nonelang on top of it, as Bangra's default high level language. When I'm done, the Liminal game engine will be ported and the Nonelang runtime retired. Then it will be possible again to do Windows builds on a casual level, and a big hurdle to releasing the next NOWHERE alpha is overcome.
It's time to document what I've learned so far, before I dive in and tackle the next topic (which is diffuse indirect lighting). Let's start with a video of what the renderer looks like right now. And here's a video with slightly more geometry.
It turned out that the point renderer as I envisioned it was simply not feasible for dynamic scenes. Trying to solve geometry over multiple frames produced all sorts of reprojection woes that I don't want to get into right now.
The current renderer, which I'll stick to, is able to dynamically render up to 64K solid convex superprimitives per frame per viewport, raytracing only, no triangles. Everything is movable and morphable, there are no static elements, which is perfect for outer space and highly dynamic scenes with rearranging geometry.
The primitives (which in the engine are called brushes) are of a single type that I engineered for this application, sdUberprim. Uberprims can represent boxes, spheres, cylinders, capped cones, capsules, pellets, and smoothly morph between those shapes. (My ideal superprimitive would be a tetrahedron or hexahedron with individually weightable edge smoothness, but I couldn't figure out a distance function for those. Any takers?)
On my GTX 460, the rendering time ranges from 4-10ms for a single 1280×720 viewport. Depth complexity is handled reasonably well. These two worst case test scenes give a good sense of where the visual fidelity limit is. There's no material system yet, and the timing is for a scene without shadow mapping, so 30 FPS seems a reasonable target for this GPU. Current gen GPU's with three times the compute capability of the GTX 460 should have considerably less trouble.
I had to let go of smooth and subtractive CSG to get to this performance level, and the primitives must be convex, because the abysmally slow sphere tracing has been swapped for Newton-Raphson iteration, which can iterate towards convex surfaces in less than 8 steps, and even more importantly, is able to provide correct conservative bounds (the inner bound dissolves for features thinner in depth than the cone width, which is not so desirable. Any takers?).
(smooth) CSG using intersections is still possible with Newton-Raphson, as the intersections of two or more convex primitives are also convex, but these operations can not be performed directly as viewport objects are accumulated, but have to be performed in separate registers first, and I couldn't figure out a good data model for that yet.
If you would like to implement a similar system, here's roughly how the renderer works:
The efficiency of this method stems from the implied early occlusion culling and the refinement of contours only, which approximate curves with infinitely small thickness. N contour tiles are refined to 4N tiles on average, while the number of overall pixels increases by factor 16.
Some implementation notes:
f(t) - bias*t / (min(screenwidth,screenheight) * sqrt(2)) = 0where f(t) is a distance function, t is the ray scalar, and bias is either 1 or −1 depending on whether we want the outer or inner shell of the primitive. Subtracing the linear term effectively narrows the distance to the ray as it progresses, turning the ray into a widening cone with a spherical cap. See this shadertoy for a reference implementation.
As mentioned I tried raytracing again, RevAA, then back to SDF and then developed a piecewise polynomial evaluator for rays to speed up convergence, but it's simply too slow. I wrote a bit about this here. To sum it up, when evaluating complex distance fields, every iteration hurts. Just one to two iterations per frame are okay.
So I tried the next thing, based on an idea that Timothy Lottes had, about using particles to represent the scene, and am experimenting with points converging on a distance field scene along rays. Here is a video of the technique in action. Performance on my GTX 460 is fairly good (a steady 1-2ms, because the work is completely uniform, regardless of scene complexity), while image quality is terrible, but now I'm optimizing the picture rather than performance (could be worse: I used to work on both at the same time), which feels like the right domain to think about.
Several avenues of improvement remain: a) Attempt to no longer limit lifetime of points and instead do a perfect detection of when points are truly dead. b) Seed points not at random locations but only to fill cracks. c) Seed points not at near plane, but closer to depth at surface.
Some more details to these directions:
a) These points are recycled so far: backfacing points, points that leave screenspace. These points need to be recycled as well: points covered by the Z-Buffer, points in regions with too high point density. It all hints towards maintaining points not as a list of coordinates, but directly on the framebuffer (on a grid), because this way the grid automatically culls backfacing points, points out of screenspace, points covered by zbuffer and surfaces with high point density. Storing a point per pixel implies all those things. (However it is a big problem that I can only write 32-bit atomic-min values in GLSL, which limits the amount of information I can store with a single pixel. I will need two passes: one where points scatter their new depth values, and one where the same points evaluate if they've won and if so, write additional information for that pixel. Ho hum.) b) Seeding new points to fill cracks requires either a histopyramid on the Z-Buffer to enumerate empty pixels, or, when points are stored in the framebuffer, this can be part of the iteration pass. c) Seeding new points closer to depth at surface to reduce convergence time: not sure what to do here. Using the (unprojected) previous Z-Buffer seems okay most of the time unless stuff fades in from the sides, or new objects appear. The danger lies in picking a starting depth that is behind a distance field volume, effectively clipping that volume from view.
Happy new year everyone! We just whooshed past our original deadline, which was the end of 2015, aaand I'm still doing R&D.
What transpired? Reducing my tetrahedral mesh to a triangle mesh in 2D still left me with an approach that felt icky to work with. I still didn't know how to do painlessly procedurally generate structures with triangles. What I could work with, however, were lumps of classical game objects as I used in the last 2D prototype (a small demo movie here), but I thought that would bring back the old object vs. terrain dichtonomy, and I always wanted a single system to do everything, as the world is also not split into living and dead things, but everything can fluidly transition from resting to animate and back. Splitting a game into world and objects was a performance hack that ended up with the creation of two systems that are incompatible to each other, and make transitions hard.
Then Media Molecule started demoing their Dreams project (just look at this awesome demo and how easy they can create complex topology with it). I had occasional talks with Alex Evans about the technology they were using, and we shared our R&D woes in the past years. A few years back I had dabbled with similar systems for NOWHERE. I had thought that their approach would not work for the huge continuous planetary landscapes that I was envisioning. But as my dream of tetmesh backed giant realtime transforming asteroids was collapsing under the weight of what computer consumer hardware can realistically do in 2015 (it's still a great idea - for the computers of 2030!), and admitting that I was unable to explain to myself how I would actually teach a program to generate and alter shapes this way, I took another look at Alex' SIGGRAPH slides and repondered on the intersections of our projects.
Then it dawned on me.
The first thought was that since composition of operations was trivial with Media Molecule's smooth CSG approach, it would also be trivial to write procedural generators for it: smooth-add and -subtract a bunch of boxes and ellipsoids, and evaluate the resulting shape as a point cloud. Then compose the whole world out of these objects. An asteroid for example would then be simply a huge blob made from many medium-sized rocks clumped together. Digging a tunnel would be removing rocks from the structure. It would have the added benefit of keeping surface irregularity; any dug tunnel would still look rocky. And because we have smooth CSG, we can blend structures and make them seem more continuous instead of mashing up discrete elements.
That brought me to the second, and even more important realization: I was approaching the abandonment of the terrain/object dichtonomy from the wrong angle. All this time I thought that the way forward would be to understand objects as a special case of terrain, that any game model is a sort of mini-terrain - one volumetric body geometrically built out of a connected graph of tetrahedral cells with different attributes. As this was how the physical world worked: as a fluid laplacian field where everything was connected. The wave-like way of seeing the world.
Sidenote: It turns out that while graphs are the lowest common denominator of all data structures (you can describe any data structure as a graph), computers are exceptionally awful at parallelizing any form of graph work. It takes a lot of research to make these things fast. It's much better to work with trees, as trees only branch - they don't merge again. It's even better to work with linked lists, as lists don't branch - they just skip. It's absolutely excellent to work with arrays, as arrays neither branch or skip, and can just be traversed linearly, a job that parallelizes easily. GPUs want arrays. It is possible to describe a tree in form of a list, and a list in form of an array, and most certainly can a graph be described as a tree, but these reduced data sets are lossy and therefore immutable. Triangle meshes make editing bearable by keeping a few connectivities invariant (e.g. every triangle edge is adjacent to only two triangles), but graph editing incurs a constant chasing of pointers (implying random access all over memory) that is detrimental to performance of contemporary computer architectures.
Anyway. It did not occur to me that the terrain / object dichtonomy could also be approached from the other side: that terrains can be described as superset of objects. Our smallest atom is no longer a vertex, a singularity, but a geometric primitive that has surface and volume - the particle-like way of seeing the world. Any terrain can be composed from primitives, any game model is a composite of primitives anyway. We've been doing this for years already, in our rigid body physics systems. Assuming that we can find a simple way to smooth the surface gradient to get rid of hard corners that invariably form where primitives overlap, the result can look just as convincing - and Smooth-CSG offers just this approach.
Philosophically, it makes no sense for a computer game to attempt to do a full physical simulation. This way, we end up with an abstraction that is so occupied with every atom of a tree, we lose all semantic understanding of the tree. Traditionally, games prefer to organize their world semantically - the world not as we experience it, but as we make sense of it, and how we communicate with it. Every game object's appearance exists merely to convey its semantical points: the color of leaves of a tree may communicate the tree's genetic heritage, the season we are in, the quality of the soil it is embedded in, the cleanliness of the air that it breathes, the presence of parasites and how much sunlight it is getting. A physical simulation attempts to recreate a system where the semantic understanding is an emergent effect. A game takes shortcuts to explicitly recreate emergent effects without the systems that produce them, and so discards a wide range of possible emergent side effects for the efficient mapping of the few relationships it cares about. It is an edited, condensed version of reality forced by the evil ever-present constraints of computer performance :-)
“That is all nice and well”, you may interject impatiently, “but what have you actually done in the past weeks?”. How rude of you - but also how right you are!
The first thing I did was attempt to recreate the distance field evaluator that Alex described in his slides, first on the CPU. I started by assembling some of the algorithms I would need, such as a 3D hilbert curve encoder / decoder (here's a video of the 2D version). Then with the help of @rianflo (who runs the truenext forum on Slack, a place where developers interested in writing the next generation of dynamic world games meet to talk business). I wrote a few max-norm distance functions for popular primitives such as the box and the ellipsoid, and did some initial spatial bisection tests in 2D (picture). Then I wrote and experimented with a GPU backed 3D point renderer (a beautiful picture here) and added hilbert curve ordering (picture).
The last CPU-based version of the evaluator randomly spammed space with objects (picture), and then I.. I concluded that the whole thing was already getting too complex for dynamic scenes and threw it away.
That's not entirely how it happened. In the middle of it it occurred to me that my old friend Interval Arithmetic (IA) could help describing more primitives than max-norm, which has no composable arithmetic rules, would allow (in fact they often lead to identical results). Then it transpired that, since IA operates well in distorted domains, I could perhaps do the evaluation directly in window space and use a Z-buffer to cull hidden and occluded surfaces early (2D prototype picture). I prototyped a few more IA operators (picture) and rendered a beautiful testing scene here, which convinced me of the practicality of working with special arithmetic rather than distance norms.
It turned out that Alex had also written about a frustum renderer in his slides, but this one operated on trees of volume textures, while I went straight to dynamic surface evaluation. Alex helped me a lot by explaining the minutiae of his approach, which allowed me to make more informed decisions. I had to learn how to write compute shaders at the same time, and I'm still not entirely fluid in it, and am forced to figure a lot of things for the first time.
I wrote a first prototype of a GPU version (without CSG evaluator, just a simple implicit surface for testing). Here's a video of how the implicit surface frustum rasterizer (ISFR) converges towards a solution. A problem that I wrestled with was the behavior of evaluation close to grazing surfaces. Many surface samples can become relevant for the same pixel, and the evaluator tends to do 3-10 times more work than necessary on the finest resolution. With optimal scenes, the frame time can go as low as 1.5ms, while it climbs to 20-30ms at grazing angles. Anyway, here's a picture of some beautiful smooth-min CSG rendered with the ISFR.
I also did some ground work on a unifying function for many different primitives (sdSuperprim shadertoy) and smooth/hard CSG operations (Soft CSG Superlogic shadertoy).
Another issue with the ISFR is the required log2(N) stream expansion steps, as the frustum space has to be recursively bisected, and, to be honest, Interval Arithmetic is not very good at dealing with rotations and affine (shearing) transformations in general. Raytracing with IA has been done before but seemed particularly wasteful, as this effectively constructs a box around the ray, and reports positive for any surface inside that box, often neccessitating a bisection even when the ray is not even close to the surface.
So I bit the bullet and looked into the more complex and confounding successor to Interval Arithmetic, Affine Arithmetic (AA). That is where I am today. There is a beautiful paper on raytracing and voxelizing implicit surfaces with revised affine arithmetic (RevAA) that I am following currently. I have written a prototype on shadertoy that demonstrates how effective RevAA can be in finding zero-plane intersections between a ray (which remains error-free under rotation) and an implicit function.
Classic intervals describe a box around the range (effectively offering us a binary choice between “interval contains surface” and “interval does not contain surface”), and there is no other way to refine the result further but to bisect it into two parts and continue bisecting until the interval reaches our desired resolution. Not so the revised affine interval. Because each affine interval is a parallelogram (see shadertoy prototype), not only can it tell us whether it potentially intersects the surface, it can also prune the range further in the same step, giving us a far smaller search range to bisect. In the ideal case where the function is linear (as when evaluating the faces of a box), the resulting intersection width can even be zero, terminating our search after a single step!
This means that when evaluating surfaces with RevAA, we only pay for surface complexity (high curvature) and depth complexity (many layers of surface behind each other). Unlike with IA, grazing angles of near-flat surfaces are “free” because of the affine component and the resulting pruning efficiency. Each affine computation is costlier than classic IA, but we need less of them, netting a win (see the benchmarks in the paper).
I am now inclined to reconsider raytracing, as using RevAA to do spatial refinement doesn't offer the same ease of pruning, and on top requires two more affine components. In 1D ray evaluation (where the ray travels on f(x)=0), the range of intersection describes a simple interval (a line). In 2D evaluation we measure the intersection of a zonohedron with the f(x,y) = Z = 0 plane (which is our ray lifted by one dimension), so the range of intersection describes anything from a line to a parallelogram to a hexagon. Limited to searching a rectangular domain, we can only refine a bounding rectangle, greatly diminishing the benefit of pruning. You can imagine how bad the 3D case looks, where we do the intersection in 4D.
These are the problems I'm currently facing, and which I attempt to tackle in the next prototype:
min(a,b) = (a + b - abs(a - b))/2. Fryazinov's paper gives a step by step explanation on how to construct new operators, and that's what I'm studying right now.
Many additions to the 2D prototype, primarily figuring out the basic mode of movement, which is now a sort of flat version of first person controls; WSAD control lateral movement, while horizontal mouse movement rotates. I'm planning to test a depth-adjustable crosshair for vertical mouse movement. I started out with fish-like swimming controls, and while that was interesting, I think the basic mode of transport should offer high precision, as most of your movement in early life has high locality. There will be many other combinable forms of movement suitable for different activities later on, including the tether-based transportation we had in the last alpha.
The next bit to do is one of the most important mechanics in the game; In NOWHERE, your avatar is identical with your inventory. What you have is what you are. Therefore a way to manage your appearance is needed. I call it “self-assembly mode”. In this mode, bits and pieces floating by are held by your avatars electromagnetic telekinesis abilities, can be rotated and attached to your body, and assigned to keyboard keys and controller buttons. It's a sort of diegetic version of the Besiege interface. This way, editing appearance, special abilities, loadout, role, character stats, inventory is all folded into a single, infinitely expandable system. There will be no additional skill tree. One look at a Nowherian and you can tell what their primary profession is, based on the bits and pieces they are made of.
I also need to reintroduce the GL rendering pipeline so the screenshots become more presentable again (and the next alpha doesn't look too terrible). Right now I mostly use NanoVG to render debug graphics.
Lots of improvements to the engine codebase as of late, primarily obvious modernizations and ports from the old Python-based codebase:
Business wise, in order to avoid confusion among our founders, we talked about splitting off the 2D part into a separate prequel, working title “NOWHERE 2D” or “NOWHERE Flatland” ;-) The 3D version will still be completed as promised, but will have to be delayed because, as it turns out, directly developing in 3D without solving most of your gameplay in 2D first is so hard, I would wage that with what we know now, a 3D version takes the time of doing a 2D version, squared, if the 2D version isn't worked out first. I was more than frustrated with that realization, but then remembered that even GTA started out as a top-down 2D clone of the APB arcade game. Knowing how to do small iterations on your work is key, and sometimes you have to go three steps back in order to go six steps forward. Or something like that.
Anyway, the idea is that each founder gets a NOWHERE 2D download channel in his humble bundle account, with a separate alpha, and eventually both versions will be developed alongside, with the same engine. I hope that this way we can make up to our founders for the extended duration of development, and have a more productive time doing it. At this point I'd like to say that we have amazing founders who have so far been more than patient and understanding with how things have been going, and I'd love nothing more than make all of them happy. One thing we'll never do is abandon the project. To Sylvia and me, it's still the most interesting thing to do in the world. As the saying goes, “It does not matter how slowly you go as long as you do not stop.”
I finished the new input mapping system today, which allows to bind arbitrary input events to logical events. A declaration of a mapping is as easy as
input-mapping <name> <option> ...
An example declaration is:
input-mapping left ; if input is an axis, negate the value flip ; set an event handler for button presses on-press on-activate ; specify a list of default bindings defaults $ Input.axis-leftx Input.key-left Input.key-a Input.button-dpad-left
The system allows to register handlers for released buttons and value changes, and to set press/release threshold limits and controller id filters. The bindings can at any point be cleared and re-assigned. The generated event handling code is completely inlined.
I'm happy with how it works now, as the basic setup is open to future extensions that don't force big changes in the interface.
After struggling to implement the position based dynamics system I bit the bullet and postponed most world physics features in favor of working on gameplay, controls and AI. I went back to a simpler 2D physics system, Chipmunk 2D, and am now reimplementing player movement controls and the body editor. At this point, considering the pressure we're under to deliver, I lost all interest in doing more world engine research and just want to get it done. 3D and malleable worlds will have to be done at a later point when our funding level is better and we have regained the trust of our supporters.
The None programming language is now reasonably complete and I work with it every day.
Mesh refinement is working nicely now, and I wrote an article on how it works exactly for anyone who wants to implement it as well.
We were away four days to help my father-in-law move. I thought a lot about how to do overloaded names in None in a generic way and arrived at this solution (this is the test case for this feature), but could still not figure out how to deal with the corner cases of declaring method names this way.
In the end, I unified the way methods are called in None (both dynamic and static part). Before, there were three ways to do it:
(--> object (method-name ...) ...)which allows to chain method calls, but it turned out I hardly ever used the chaining feature.
(object --> (method-name ...)), the infix version, which was also cumbersome to use.
(object.method-name ...)which only works with specially rigged classes and descriptors in the dynamic language, and specially declared struct functions in the static language. That means all imported Lua objects, including strings, can't be method-called with this notation.
There was a bit of Twitter conversation about this topic with Michael
Zinn, and this was why I tried the
overloaded name solution (comparable to how Haskell and similar
functional languages do it), which effectively would allow something
(method-name object ...) and make methods indistinguishable from
regular functions. Alas, this approach doesn't work well with dynamic
languages (or rather creates more confusion than it's worth), and so I
ultimately settled for this notation:
(.method-name object ...)
The notation is recognized by both the static and dynamic list hook and translated into method calls. I removed the old notations and fixed all tests and dependent code, and now it's all neatly uniform. Should have done it like this from the start, good that I caught it in time.
Everything is now fun again. I also experimented with building a Voronoi diagram from the self-optimizing delaunay mesh. It seems that six neighbors (three inner, three outer vertices) allow to build a reasonable map, but there are often situations where this is not the case, because the mesh is not fully conforming. Considering the set of the triangle vertices full 1-ring neighborhood would give a complete map, but be more computationally expensive.
I also began work on the Position Based Dynamics physics system for the self-optimizing mesh, but in order to be able to play around with it more and devise interesting test cases, I need to cut down turn-around times and work on an editor for it, which brings me to a painful old problem, designing a good UI system.
I already wrote about this earlier in the devlog (7.9.2015), but I haven't progressed very well on this item so far. I guess it's time now to get it done, so I can move on and have more fun playing with physics.
Earliest footage of the position based dynamics physics system animating a bunch of points. Mesh refinement struggles keeping up when there are more than four points on screen, eventually a flip fails and the system collapses, don't know why yet.
Short update: I went back to the trimesh prototype and fixed all the
mesh operations that broke by porting to newer syntax. Next up is
finally implementing that purely gather-based approach I talked about
last month. It was
useful to devise a template for handle types (
to detect parameter mistakes early. I also wrote a general
Mesh.validate function that checks if all links within the mesh are
coherent and no triangle is degenerate, that allowed to fix functions
Some things were added as part of fixing the prototype:
float==comparison function now for doubles and floats that checks equality within half epsilon (the precision limit of each data type). Linear Algebra programmers know how important that one is.
crossfunctions for Liminal's GLSL-style vector types (it's growing more GLM-like by the day)
liminal.hashmodule that provides a generics-based wrapper around CityHash64 for both static and lua (hashes nearly anything you throw at it).
And two small things about None:
__globalcan be used to register new global names in the translated module scope-wise, and also redirect global names to a name that holds a table within the file.
usinghas been changed to directly require a module name. The macro loads the module during expansion and pastes
__globalstatements for the exported symbols into the source code. That means that
usinghas no further runtime cost than retrieving a name from the dictionary now, and therefore performs much better.
It could still be possible to devise a
using that resolves table
attributes in local scopes, either by passing the attributes you
want to access explicitly, or the name of an attribute list
(effectively a sort of type) that is available at compile time.
?yet, I might never do it or do it on a lazy afternoon. I can imagine that the Lisp-style
ifmight trip up a lot of people (it still trips me up), so I still feel that would be the right move. What do you think?
What happened! Another 10 days gone!
Some work on Liminal (the game engine):
Lots of features and fixes for None (the programming language):
lfsmodule (LuaFileSystem) in its path to be able to generate the necessary timestamps, otherwise cached bytecode will never be loaded.
meta-dothat have side effects other than altering the active translation context had to be removed because they can't be saved with the bytecode; That means all
syntax-*-globalfunctions operate in the script context, and newly registered syntax only becomes available after execution of the script. Also,
syntax-exportno longer allows to export the active syntax table, but has to be called in the script context with a table created by that script. That fix is usually quickly applied, but syntax tests typically need to be moved out of the module that defines them.
nilis now named
null, because None lends other keywords from C (
var), and therefore
nulljust fits better.
ast-list-unpackfor unpacking expressions with patterns
syntax-rulesno longer takes extra arguments for literals and hygienic variables. Instead, a literal can be defined inline using the
_:literalprefix, and hygienic names can be prefixed as
$:tempvarthe first time they appear in the template.
syntax-rulesalso uses pseudoquotes now, allowing to
unquote-splicedynamic expressions into patterns and templates.
switchmacro as an abbreviation to
enummacro working similar to C's
enumthat also stores the reverse lookup in the generated table and the enumerator count (last enumerator + 1)
tryhas an alternate notation friendlier for naked mode
Two more changes to None that I want to see through next:
ifworks like its Scheme and Lisp counterparts, but is in a modern sense much closer to C's ternary conditional operator cond?a:b.
condhas an unsual syntax for people used to classic
if. Therefore I'll experiment with renaming
?and adding an
ifsyntax that is convenient to use in naked mode and follows the structure
(if <condition> (then ...) (elseif <condition>) (else ...)to avoid surprising novices who come from other languages (it also trips me up when I switch between None and other languages).
(object.method ...)as one would expect.
Can I do the same thing for static structs? Thanks to terra's
metamethods, I can indeed, and now structs can also define methods
that use the simple accessor; LLVM's code generator is very
efficient and completely optimizes out the intermediate construction
of a struct that holds the instance reference.
- I work on better error reporting while I translate the TriMesh
prototype to use the new method syntax. At that point I realize that
static AST nodes will never get annotated properly with the present
system, and I return back to the task of realizing the bytecode
- It turns out that luajit-lang-toolkit is relatively simple, almost
fun to use and I refactor
eval_none.lua to output luajit AST nodes
instead of S-Lua in a day. It's amazing how quickly things are
- I also find out that the LuaJIT bytecode has no problem with
symbolic names that are considered illegal by Lua's parser.
my-f*ing-amazing-variable is a legal name to use there, and will
even be honored by LuaJIT's debug output. Lovely, but I'm not going
to make use of that fact yet.
- The S-Lua -> Lua AST generator is no longer required, neither is
the AST -> Lua code generator, so I remove them both from the
project. Good riddance.
- Lots of fixes to get coherent line number annotation all the way
from the parsed symbolic expressions to generated bytecode and
- A new
__pragma special form that allows to turn compiler features
on and off for debugging. Very helpful for figuring out why new
features (in this case it's inlining pure Lua expressions to avoid
too many temporary variables) break production code.
allows to dump generated Lua AST nodes, equivalent lua code,
expanded symbolic expressions, and the linenumber and filename that
AST nodes will be tagged with.
- The static DSL tags each and every expression with a filename and
linenumber, which I don't have to do anymore now. I add a rather
elaborate visitor pattern that patches each and every terra AST node
with the correct anchor, and that turns out to be pretty elaborate
work. Terra has too many AST nodes.
- One odd bug is that line numbers passed to the bytecode seem to be
modulated by 256. I guess that this is because LuaJIT uses the
lastline bytecode tags to pack linenumbers into
8-, 16- and 32-bit numbers, and indeed, once I generally define
lastline = firstline + 65536, the problem is gone. It's a hack but
I'm not going to expand the parser now so that it always supplies a
valid lastline. I hate the parser.
- 10 new tests have now been added as part of debugging new issues in
the prototype's codebase.
So there we go, bytecode translator all done. While I'm at it, I do some more cleanup:
__escapeto stop the macro expander from expanding any further, etc.). The static translator however still uses its own macro expander, and that means all linenumber propagation work that I did, I would have to do here again.
So I refactor
static.n to use the internal macro expander, and now
it's all splendid. It still has its own “compiler” stage that
translates the expanded forms to terra AST node constructors, but
given that the static “compiler” does next to nothing, I wonder if I
couldn't completely merge that stage with the macro expander.
Which brings us to the present day. The TriMesh prototype currently renders no content (but at least the hit testing works), so I'll have to fix that next up, and then I hope I can finally work on the game editor.
Now on to financial matters.
It appears we are running out of money again. The long waiting time for new content has eroded the number of new founders, and one of our long time patreons had to reduce his monthly support because he's also in a financial pickle right now. We also used to borrow a small complementary amount of money from my mum each month and are about to reach the cap next month, so we will also run out of that funding pot by end of October. Sylvia is still waiting on payment from one of her commissions (this was before Sylvia began to require advances), but it is very likely they screwed us and we're not going to see a cent. There are currently no new commisions in sight for her, but of course that can change any day.
That means we have to think about how to get fresh funds before September runs out. I've considered starting a Patreon for None/game engine development exclusively, so we are at least getting support for that part of the work (which is not what Nowhere players are necessarily interested in, although they damn well should ;-)), but Patreon only allows one page per account, and managing that one account is already hard work for Sylvia, so we'll have to merge the tool development part with the page and make it about more things than one. We'll have to think about how to best present this.
We also need something new to show so we can justify doing a new video and invite everyone to check out the new alpha. Sylvia has done lots of new concept art that could go into a new video, we can finally exchange that old dreadful logo, and replace “Psychedelic RPG” with a much more fitting “Alien Life Simulator”. The fundraising video is nearly 2 years old. Back then we gave away the end game without mentioning many of the intermediate steps, and I believe people would be more inclined to trust us if there were a more detailed plan of action, and we would be more clearly on our game being developed as a service, not as a one trick pony.
It also becomes clear that we will not be able to deliver on our full promise by the end of 2015, but I want to do a release anyway with what we get to work by that date. We don't want anyone to feel disappointed or exploited. The plan is to ship what's there, then continue on a “sequel” (which is really just a new major version) and keep our founders on board until they get what they paid for.
I can not believe how fresh our concept for Nowhere still is. The pitch has been out for several years and no one has stolen it from us, and I guess that's because it really is that ambitious and forward thinking. We have ideas for art, sound and gameplay design as well as programming, modding and player service which are quite different from how most game development is done today, and I would love to see all our goals realized in exactly the way we envision them.
I don't mind if it takes ten years to get there, but we need to compromise on the progress on the way and reintroduce regular releases of alphas, betas and “masters” to keep the company operational without incurring additional debt.
I wanted to hold off new alphas until we had a game model we could definitely stick to, but I see now that I have to get over my perfectionism and, as the mantra goes, release early, release often.
Lots of optimizations on the trimesh prototype. Streamlined the structs
and interfaces, translated the remaining shaders to the new GLSL
compiler (that went remarkably well, just had to add a missing
node). Now I'm back to thinking how I can add more map editing features
to the prototype, and the challenge of writing UI annoys the hell out of
me. I currently have no GUI framework I'm happy with. I considered
embedding ocornuts C-wrapped IMGUI, but I don't like how it's set up.
IMGUI attempts to provide a ready-to-use library of widgets, very much
the classical approach of UI frameworks. The problem with that system
is, that while it makes it easy for users to slap together a bunch of
preset widgets, they're typically completely on their own when they want
custom widgets - and they always want custom widgets. My OUI library
has a minimalistic abstraction, which I like a lot, because it is
agnostic to theming and allows to build any sort of widget from a
handful of configurable rectangles, a very Scheme-y design.
One could say: if IMGUI is the GL 2.0 (with lots of extensions) of immediate UI, then OUI is Vulkan (less presets, more combinatory opportunities) ;-) but OUI has a design flaw, which is that the way items are cached between frames (to maintain coherence during stateful operations) leads to the UI losing track if the order in the UI layout changes, and all in all, I had heavy optimization issues with this touch-everything-in-one-frame approach. It's awfully difficult to both track and cache operations when your UI objects are all supposed to be ephemeral.
Therefore I have begun work on a OUI2 (implemented in None), which is going to use a retained mode again. The vision I have for it is to become a minimalistic system to facilitate a browser-like DOM (something that I tried to write with OUI before). I enjoyed designing webpages in the browser more than working with any UI library, but I despised the shit stacking and the legacy baggage. Considering that symbolic expressions drive everything in my engine, I believe this is the perfect abstraction for GUI / data visualization / websites etc as well. Where Chrome has HTML/CSS/JS syntax, I'm eventually going to have S-expressions with just three different domain specific semantics (of which the JS part is already done).
The principle of items (the lowest abstraction of a widget) and the layout algorithms (box model, word wrapping) are kept, but the items will remain persistent in a pool. I'm also going to make changes to event handling. OUI will have an outgoing event queue (& associated callbacks) for abstract item events (hot/cold, clicked, focus/blur; drag, drop; etc.), and there will be no more polling of item state, as that technique is frame-dependent and typically ends up being lossy. The modifications open the door for implementing future optimizations like partial layout refreshes and draw surface caching.
The trimesh map editor is OUI2's first test case. Considering how many GUI systems I've used and written before, I hope that I'll end up with a lean minimal UI system that remains expressive and resourceful for years to come.
I managed to get the GLSL compiler to a working state, and the trimesh subdivision shader (which is fairly complex and uses three stages) is now working perfectly; https://gist.github.com/paniq/aff6395684c22aa978d1 is translated to https://gist.github.com/paniq/ad0783d4c61a5ba3f72a. It's now going to be comfortable to write and interface with new shaders. I still need to add a few small tweaks and improvements, plus a new testing mode that allows to directly test shader functions on the GPU and read out their return values. After that, I can start working on a faster Delaunay transform shader.
Not so much happened the past 10 days; We visited Sylvia's dad and then fought against the looming heatwave. Our studio has a mobile climate unit that eats a lot of power and makes the room halfway bearable. The heat is distracting and it's difficult for me to think about the current problem that I have, which is transforming a triangle mesh into a constrained Delaunay triangulation on the GPU, but without compute or atomics. I rubberducked the problem within this article, but the write-up got so long that I saved it here instead: https://gist.github.com/paniq/e09a9f6c5b8b9aa1b835
Lots of things happened the past two weeks, along with my birthday. I took three days off to catch up on recent games.
Conspire had matured enough that it was time to single it out into its own repository. Oh, and I also renamed it. It's now called “None”, which is a backronym. The website (which currently just redirects to Bitbucket) is at http://nonelang.org. The language should run on Linux, OSX and Windows. There are a dozen test cases now that demo aspects of None.
There are too many small changes to None to mention here, best check the commit log: https://bitbucket.org/duangle/none/commits/all - The most striking improvement is that the largest parts of macro expansion are now accessible as hooks from within the language, scoped, so that the DSL can be completely swapped, and global macros overridden. The static language still uses its own expression parser, but that is no longer necessary. Using the built-in macro expension has the benefit that the error reporting facilities can be reused. Expressions that cause trouble are included in the traceback.
After all these upgrades, I thought that it might finally be doable to try a GLSL backend, and that's what I'm sitting on now, the shader DSL. The general idea here is that you can compose your shader functions along with your dynamic and static functions (semantically, syntactically it's almost identical).
The GLSL code generator implicitly figures out the necessary
dependencies to generate vertex, fragment and geometry shaders (compute
to be added later). This becomes most convenient when dealing with
out variables that connect the stages with each other,
vertex streams and framebuffers; In the shader DSL, they're called
shared variables. The code generator tags these using a simple
heuristic: when a function sets a shared variable, it's an
this shader stage, otherwise it's an
in. If the shared variable is
neither read nor written to, it is not declared. This way, a shared
variable only needs to be declared once, and is then included in
generated code where needed.
There's also support for declaring overloaded external functions that are either built-in or residing in native GLSL source code. The code generator does a type inference pass, matches applicable overloads, enables the extensions and includes the source code required to make it work. To avoid notation fatigue (a condition that I just made up), type inference is also used to figure out when a numerical constant is a float, and when it's an int, with strict implicit conversion rules: an int can be upgraded to float, but not the other way around. Overloads first try a perfect match, then an implicitly casted one.
Then there's the implicit goodness of None's powerful macro system: dynamic functions can build permutated shader functions from templates and quasiquotation. Shader functions can directly inline constants declared in dynamic code. The system is no longer hermetically sealed off from the rest of the codebase.
I hope these extensions will make writing shaders frictionless enough to make me want to write more of them. I have the largest part of the code generator done, and am finishing up on the top level declarations, namely uniforms and uniform blocks. My first test case is the shader program I use in my 2D remeshing prototype, which is a fairly complex shader with three stages (the code is here https://gist.github.com/paniq/e5b5ebd4797065f33a57)
Once that works (should take this week at most), all hard language work is done and I can return to continue work on the prototype.
More features for Conspire!
rmlc-prefix has been changed to
static-. The RMLC DSL is now simply called the static DSL.
assertis now a macro; The second argument is only evaluated when the assertion fails. There's also an
assert-exceptfor unit testing against code that is expected to raise an error.
usingnow also works in the static DSL.
column-arrayarray type constructor.
dprintis like print, but outputting the module name in which it was called.
=#operator comparable to
reprallow introspection into tables and types in form of formatted symbolic expressions.
...argument is now passed unfiltered and argument unpacking is now a special form (
__unpack). Lua's select() function is available as
__select, allowing to enumerate varargs.
$operator now serves both as table constructor and namespace for table operations. It's a little jQuery-like.
length-ofgot renamed to a Pythonesque
*operators will expand to
^*if used as unary operators.
What... what happened? I spent all week writing all sorts of improvements and conveniences for the language and forgot the time because it's too much fun. Let's see.
exceptsyntax sugar for xpcall.
*can be used as symbol prefixes, as shortcuts for
(^*). Makes de-/referencing more C-like.
alias-castsyntax sugar, similar to C++'s
usingmechanism that allows to hook scoped lookups for variables, inspired by Jonathan Blow's extension to C++'s
usingin his Jai language. I believe my implementation is generic enough that it allows extending the language with new lookup strategies from within.
The dynamic GLSL front-end was fun until I ran into a tough problem: how to import the function declarations contained in existing GLSL shaders (and I have a lot of those). So I'll stick with the previous GLSL shader generator for now.
Therefore: the yak is shaved! I ported the trimesh prototype I talked about on 10.6.2015 to conspire2, everything is ready for a thorough refactoring now, and work on the main game can continue.
What are the features of conspire2 so far?
So, my branching problem got fixed with an alternative solution suggested on the mailing list (https://mailman.stanford.edu/pipermail/terralang/2015-June/000441.html ). I'm happy to report that the RMLC (runtime machine language compiler) works without much further changes, and is completely expression based. Writing some fast assembly now got super convenient.
I also added a bunch of macros for struct declarations and dynamic class definitions, based on the 30log library.
I'm currently sitting on a GLSL compiler with a syntax analog to the RMLC compiler, and have just completed a markup tree -> C-like code generator implementation to that end. The markup tree will be generated from a GLSL AST which in turn is constructed from a semantic parser.
This way it will be possible to mix dynamic functions, static functions and shader functions in the same context, which all use more or less the same syntax. Different shader main functions can use the same dynamic declarations yet generate completely different material shaders from them. The GLSL system will then also be able to use Conspire's module import system, convert struct declarations into uniform structs and other niceties. I hope writing shaders will then no longer be a total PITA.
Turns out the bug report was necessary, I submitted it here: https://github.com/zdevito/terra/issues/99
As long as this isn't fixed, I'm not quite sure how to proceed with
branching support. I can't code the ternary conditional as a regular
if-block because those are statements; I would have to first declare a
variable, then set this variable in the
then or the
else block, but
the type of the variable has to be negotiated between what the returning
expression expects and what the two branches contain. Also, what to do
with situations where the final result is thrown away?
void can be
passed as a value for these situations, but it would help to know
beforehand that these expressions only matter as statements.
It looks like I'll have to revert the RMLC code generator to the same
putdest structure I use for the Lua translator. Sad, because I thought
terra's expression quoting system was flexible enough. Great, because
this gives me more control. If generators know what an expressions
destination will be (return value, variable or discard), and more
importantly, the destinations type, they can annotate temporary
variables correctly and generate different code.
(if cond-expr then-expr else-expr) can produce a ternary
conditional expression when the target is a variable or return value,
enforcing both branches to return types compatible to the target, or a
if statement block when the return value is discarded.
Terra and the LLVM compiler both optimize out excessive use of temporary variables, so generating code in this fashion should be fine, except for immutable struct types declared on the stack, where temporary values might evoke unnecessary copy constructors. I'll have to do some testing first to know what happens here.
I added the new RMLC DSL (RMLC = runtime machine language compiler),
which is designed to be a full analog of the dynamic language DSL, only
with static typing added. This way, it is easy to switch a static for a
dynamic function or vice versa. Terra makes it easy to write code that
is completely expression-based, thanks to the
quote a in b end form,
but I've been running into an evaluation issue with
the ternary conditional operator, where quoted statements get executed
regardless. I'm currently building a minimum reproduce to test with the
latest version and submit to the Terra project if necessary.
Before I could start re-adding the LLVM DSL I had to re-implement a
bunch of syntax forms to make the subsequent task a bit more
comfortable. In the process a few more special forms also got added. I
which makes writing syntax rules easier),
(cond) (Scheme's variant of
(import-from) (which automates the task of
declaring local variables and initializing them with values read from a
table, sort of the counterpiece of
imports variables from arrays, and
(foreach), with support for
auto-unpacking iterator results.
The first thing I did today was write an infix parser that rewrites
(func 1 + 2 * 3 arg2 4 + 5 arg3) as
(func (+ 1 (* 2 3)) arg2 (+ 4 5) arg3). While the most popular
operators are already bootstrapped, code can register new infix
operators that will be translated to regular s-expressions. Infix
operators are always treated as binary. Unary prefix and postfix
operators are not supported.
Unlike in Lua or C, operators and arguments need to be properly spaced
apart because the s-expression parser doesn't know anything about
operators, so e.g.
a+4 is read as a single token,
a + 4 is read as
After running into some semantic difficulties (it is simply easier for
syntax rewriters to assume that an expression list guarantees one
expression per token), I decided that infix operators always need to be
declared in dedicated parentheses; So
(func 1 + 2 3 + 4 + 5) must be
(func (1 + 2) (3 + 4 + 5)).
Variables are now running on the stack again, and
an ad-hoc table.
backquote have been ported.
The next step is to re-add the LLVM / terra DSL. I'm thinking of a style
that is transparent to the main language, with types added e.g.
(var name 0) becomes
(var name : int 0),
(function name (arg1 arg2) expr) becomes
(function name (arg1 : int arg2 : int) : int expr) or
(function name (arg1 arg2) : (int <- int int) expr).
The past few days I continued work on the new Conspire interpreter, which is coming along nicely. The special forms expanded to 12, a number that is continuously in flux. I'm still in the process of bootstrapping all language features in the core header, and today I managed to move syntax expansion to its own pre-pass before compiling.
There's one central language feature I'm having trouble implementing, which is hygienic macros for syntax-rules. I tried parsing and rewriting the templates, but it's difficult doing this without replicating the syntax expander.
The other feature I'll need to change is the way scope variables are captured; I'll move this back to the Lua stack, mapping generated symbols to the actual names to support free naming. Currently, the values are directly written and read from lua tables, which complicates the code and may cause problems with garbage collection; I'll implement the feature I did this for (locals) differently; (locals) will simply expand to a table mapping all variables captured in the current scope.
I took a week off from Twitter and that seems to have helped with the productivity. I forced myself to look at the code I have to refactor and realized that the Scheme-inspired language that I developed for game programming needed a few fixes in its semantics so it's fun again.
The first version was very closely built along the principles of Lisp and Scheme, and I made some trade-offs to be able to translate code to Lua, and it turns out a) I don't like (let) and (letrec) much, because they make variable declaration in functions semantically different from variable declaration in modules b) piggybacking variables on Lua's stack (with the drawback of not being able to use the full Lisp character set for variables) for speed wasn't really needed, because I write nearly all game code in the terra DSL, and so I can make scopes a little more Pythonic instead (supporting easier injection and extraction of variables) and c) The terra DSL part has become the heart of the implementation but runs on C-like semantics that are sufficiently different from the Scheme one that switching between both languages becomes jarring, so they need to meet in the middle.
I wrote a list of changes I want to do in the implementation (it's at https://gist.github.com/paniq/34525b55cd4ae8744961), then made a new translation unit to fit the specs. I'm very happy with how it turned out, it only needs 8 special forms, 6 constants and 5 builtin functions to bootstrap the rest of the language.
Conspire supports multiple translation units. The previous revision still works without any changes, so I don't have to port anything right away, and because transforming the AST is simple, it's going to be easy to translate some of the code I already have.
That should make it a lot less annoying to refactor the triangle mesh implementation, and I can think about more new language feature like support for SOA (struct of array).
It appears that many of my problems stem from the inability to rubberduck them with a colleague; When I was still working at a game studio, I could meet my colleagues in the hallway and bounce my ideas off them - someone I'd deem competent who would listen to my stream of thoughts and give it a seal of approval, giving me enough confidence to go down that road — or enlighten me with an alternative approach that leads to the same end.
Nowadays, I mostly work alone, and while Twitter has been helpful many many times in providing a rubberduck surrogate, it appears that often my problems are too complex to describe in few tweets in such a way that they invite others for collaboration or judgement. When everyone who is competent believes that I already know what I'm doing but does not tell me so, I feel like I'm on the wrong path. I get the sense that my friends have their doubts but keep them to themselves, and I procrastinate, sitting in a mental waiting room, waiting for my next shot of enthusiasm.
Therefore, in the hopes of catalyzing thought and confidence, I decided to start a devlog within a gist, which I will attempt to update each day. The devlog is to be seen as a precursor to a possible blog article; thoughts are less ordered and focused, documented approaches can be completely pointless in the end.
It is possible that some of the things written here might ultimately turn into a blog post; I usually begin planning one when enough people express their interest for a topic.
So, with that out of the way, what's going on? I'm currently in the process of writing a 2D prototype of the NOWHERE terrain and physics system as I imagine it to work; following the ideas of this paper, http://matthias-mueller-fischer.ch/publications/airMeshesPreprint.pdf the world will be completely embedded within a huge square 2D triangle mesh, tesselating both air and solid materials. The mesh will host collision detection, rigid and softbody physics, most likely using new position based dynamics techniques as documented in http://www.researchgate.net/profile/Jan_Bender/publication/274940214_Position-Based_Simulation_Methods_in_Computer_Graphics/links/552cc4a40cf29b22c9c466df.pdf and I also want to experiment with 2D light propagation and wind current simulation later on.
What I have so far is a 2D mesh class with support for generating the beginning quad that bounds the simulation space, splitting faces, splitting edges, flipping edges and an optimization pass that flips all edges where it improves the quality of the mesh.
There's also a testing and editing front-end that renders the mesh straight from a buffer texture using a GL geometry shader, allowing me to dynamically split faces and edges, flip edges and initiate an optimization pass — all this happens on the CPU right now and will continue to do so until a move to GPU is really necessary.
What is missing right now is the physics and collision system and the associated auto-mesh-optimization step. For this to happen, I believe I need a double-buffering scheme for the mesh. Currently, vertices and faces are stored in two separate pools, and I suppose those would then become four pools. Each pool would store the individual attributes in separate arrays (struct-of-array), so buffers could be flipped array-wise, not pool-wise; this also allows me to upload only the absolutely relevant data to the GPU.
The methods for the mesh would all have to support reading from a front- and writing to a backbuffer target which can be freely set.
For the opimization via edge flips, it appears I'll have to operate on the backbuffer completely, as operations update opposite half-edges (I call them “twins”), and the operation can't be parallelized. It probably can, but I can't think of or seem to find an efficient way to do this. So it will have to do for now.
When the double-buffering is implemented, I can probably start writing a primitive physics loop with some colliding bodies.