This post is Part I of a series on how smoλ handles resources. And I can already hear you sighing.
"Yet another take on memory safety. Aren't there enough out there?"
If you are at all familiar with programming language design, this is probably your first reaction. And, honestly, you would be right. But I believe there is a useful insight or two here. Not to mention that snippets like the next one are not going to explain themselves within smoλ's standard library. So I might as well give an explanation somewhere.
// from std/mem.device.s
smo ContiguousMemory (
    nominal,
    MemoryDevice,
    u64 size,
    Primitive,
    ptr mem,
    ptr underlying
)
-> @args
// from std/mem/arena.s
smo Arena(nominal type, ContiguousMemory contents)
    @noborrow
    length = 0
    size = contents.size
    with contents.Primitive:is(char)
    ---> type, contents, length, size
Don't feel guilty if your first reaction was to skip the above snippet. It's exactly what I would have done, too, because it makes little sense out of context. If you are more careful than me, you could also look more closely and decide that, with bloated data structures like these to move around, who needs Chromium in every app munching on our memory? We are going to consume it all ourselves first! And, wait, did I not say something about safety? Where's that?
I will address these concerns with a series of blog posts, starting in this one from the very basics of how one can have a safe memory release model. I will cover mutability and a fast memory system that can easily pivot between arenas and dynamic memory in the next installments.
In particular, here I will explain some of the zero-cost design principles of smoλ that help delay resource cleanup until it's safe. The principles discussed apply to any resource, from several kinds of memory to files.
Runtypes - Before starting, a brief recap: smoλ is centered around runtypes; functions whose returned values serve as both tuples and type fields. For example, the following declares a nominal type, so that if you write p=nominal:point2(x,y) you can access fields p.x and p.y.
smo point2(nominal, f64 _x, f64 _y)
    x = 2*_x
    y = 2*_y
    -> x,y // return statement
Nominal types - I shamelessly forced a couple more concepts upon you in the above explanation, because they are pretty important. First is nominal, which is a value indicating a type matched by its name instead of its structure (two f64 values). The language's typing is static, so nominal indicators are checked during compilation but are then removed and do not affect execution.
Currying - The language's currying symbol : transfers the left-hand side as the first argument to the right. For example, obj:fun means fun(obj), obj:fun(arg1,arg2) means fun(obj,arg1,arg2), and so on.
Smo vs services - In general, there are two ways to declare runtypes: as smo declarations, like above, or as services. The main difference is that the former are inlined, whereas services are implemented as co-routines and offer an opportunity for error handling when errors occur inside - to be addressed in another post.
A nice feature of smoλ is that its default runtype manipulation mechanism does not use heap memory. Instead, registers and the stack store fields as local variables. For example, the previous point2 runtype is simply stored in the underlying variables p__x, p__y.
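To make this concrete, here is roughly the shape of C the compiler could emit for point2; a hypothetical sketch with illustrative names, not the literal output:

#include <stdio.h>

int main() {
    double x = 1.0, y = 2.0;
    // p = nominal:point2(x, y) lowers to plain locals; no struct, no heap
    double p__x = 2*x; // field p.x
    double p__y = 2*y; // field p.y
    printf("%g\n", p__x + p__y); // print(p.x + p.y)
    return 0;
}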
The language transpiles to C that uses goto as its intermediate representation, mostly because I am uncomfortable around the LLVM stack's bloat. So these variable optimizations are deliberately delegated to modern C compilers (by default: gcc), which are very adept at them. What is C, after all, if not portable assembly? Jokes aside, LLVM is an engineering wonder and I am being stubborn here, but this would be the least of my moral failings.
As an example, if the value p.y is never used, p__y will be eliminated completely by the compilation process. Even if it is used elsewhere, inlining eventually means that we do not need to load p.y from memory because it is already in the cache. Of course, cache locality is still key. But our implementation is equipped with the tools to allow compilation to actually optimize for that. The end result: we are free to keep "metadata" variables that help us track resource properties, such as ptr underlying to track the common allocated memory, and u64 size to track the current accessible size. The costs of those variables are either eliminated by removing dead computations or would have been paid either way.
For example, z=point2(point2(x,y).x,y) print(z.x+z.y) has the same compilation outcome as the minimal required operations print(2*2*x+2*y).
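To illustrate (again a hypothetical lowering, not literal compiler output), the nested calls collapse as follows once dead fields are removed:

#include <stdio.h>

int main() {
    double x = 1.0, y = 2.0;
    // z = point2(point2(x,y).x, y): the inner point2's y field is dead
    double t__x = 2*x;           // inner .x; inner .y is eliminated
    double z__x = 2*t__x;        // outer .x
    double z__y = 2*y;           // outer .y
    printf("%g\n", z__x + z__y); // same as print(2*2*x + 2*y)
    return 0;
}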
As a final note, these optimizations do not occur for service arguments, since the latter are proper functions. But the cost there is negligible compared to co-routine switching (which is already reasonably lightweight but incurs costs due to mutexes).
Importantly, only one variable - typically of the pointer builtin type ptr - is sufficient to hold attached resources. For example, generic strings are declared as follows, where memory represents the memory resource they have been attached to, and first caches the first character; comparing that before the contents reduces cache misses when comparing strings.
// from std/builtins/str.s
smo str (
    nominal, // no storage or runtime cost
    ptr contents,
    u64 length,
    char first,
    ptr memory
)
-> @args
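To see why the cached first character helps, consider this sketch of string equality in C; the struct mirrors the declaration above, but it is my illustration rather than the standard library's actual comparison code:

#include <stdint.h>
#include <string.h>

// hypothetical C mirror of the str runtype above
struct str {
    char *contents;
    uint64_t length;
    char first;   // cached copy of the first character
    void *memory; // attached memory resource
};

int str_equals(struct str a, struct str b) {
    if (a.length != b.length) return 0;
    if (a.length == 0) return 1;
    // most mismatches are rejected here without dereferencing
    // contents, so no cache miss from touching the actual text
    if (a.first != b.first) return 0;
    return memcmp(a.contents, b.contents, (size_t)a.length) == 0;
}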
Concurrency aside, the main difference between smo and service is that resources like memory are gathered by the compiler and their release code is inserted at the end of service calls. Those working in domains where delays are critical therefore know exactly when systems are going to do work. But we also have the benefit that, upon failures like invalid file reads, we can safely terminate the service without leaking resources. Error handling is also not for now, though.
In the simplest case, there would only be the main service, and all releases would occur at its end - just before program termination.
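In C terms, a compiled service could look roughly like the sketch below (hypothetical names; __runtime_alloc and __runtime_free stand in for the runtime hooks that appear later in this post):

#include <stdlib.h>

// stand-ins for the runtime hooks mentioned later in this post
static void* __runtime_alloc(size_t n) { return malloc(n); }
static void  __runtime_free(void* p)   { free(p); }

void service_main() {
    void* mem = 0;          // zero-initialized so the release is always safe
    mem = __runtime_alloc(1024);
    if (!mem) goto finally; // failures still route through the gathered releases
    /* ... service body using mem ... */
finally:
    if (mem) { __runtime_free(mem); mem = 0; }
}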
By the way, releasing all resources related to a variable can also be done manually with @release var. In that case, smoλ will complain if you try to use/leak a resource later.
Something important to consider is what happens to returned data. For those, we just delay freeing until they are no longer used - as if they had been allocated by the calling service. The mechanism for doing so consists of detecting resource frees at compile time and delegating them to the caller. In short: if A calls B, the act of releasing resources on which B's return depends is delegated to A.
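A sketch of how that delegation could look after compilation to C (hypothetical shape, again with stand-in runtime hooks):

#include <stdlib.h>

static void* __runtime_alloc(size_t n) { return malloc(n); }
static void  __runtime_free(void* p)   { free(p); }

// B's return value depends on mem, so B itself does not free it...
char* service_B() {
    char* mem = __runtime_alloc(64);
    /* ... fill mem ... */
    return mem;
}

// ...instead, the release is planted at the end of the caller.
void service_A() {
    char* s = service_B();
    /* ... use s ... */
    if (s) { __runtime_free(s); s = 0; }
}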
Practical example - Below is an example using the language's built-in buffer system, where buffers are created by placing [] after a type. Default buffers reside on the heap to account for their dynamic nature, though this can be adjusted for systems with different memory models by adding new runtimes in the std/runtimes/ folder. The standard library's fast implementations of strings and vectors use other memory constructs, described next.
@include std.builtins // basic arithmetics, etc

service samples()
    buf = u64[]
    :push(42)
    :push(10)
    -> buf // return statement

service main()
    buf = samples()
    print(buf[0]) // prints 42
    -- // end block, buffer is deallocated here
Formalism - A particular family of programming fans may be ready to point out that I follow linear logic for ownership (hello there, Rust enthusiasts!). This is not entirely true, however, because I actually use linear temporal logic (LTL); the statement I am making is that "allocated memory is eventually deallocated, at the end of a service after which it is not referenced anymore". Ahem! Trying very hard to not spam notation here. For reals. But I promise to release a white paper for this... eventually. Just notice that the nature of this promise completely eliminates use-after-free, though it obviously comes at the cost of keeping some memory alive for longer.
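For the notation-curious, my informal rendering of that promise, where G reads "always" and F reads "eventually" (my sketch, not the white paper's formulation), would be something like:

G( alloc(m) -> F free(m) )

with free(m) discharged at the end of the last service that still references m.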
Safe arenas - The same principle is in action for other types of resources, such as arenas. I am not going to cover the memory model implemented by the standard library in this post, so it suffices to know that arenas (per their common definition) are pre-allocated chunks of memory where new allocations are as simple as incrementing pointers. The downside is that their size must be fixed beforehand.
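As a reference point for how cheap arena allocations are, a generic bump allocator in C is little more than the following (a sketch of the common technique, not smoλ's implementation):

#include <stdint.h>

typedef struct {
    char *base;    // pre-allocated chunk
    uint64_t used; // bump offset
    uint64_t size; // fixed capacity, chosen up front
} arena;

// allocating is just a pointer increment plus a bounds check
void *arena_alloc(arena *a, uint64_t n) {
    if (a->used + n > a->size) return 0; // the fixed-size downside
    void *p = a->base + a->used;
    a->used += n;
    return p;
}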
Without going into syntax details, in smoλ one can open an arena context via on Heap:arena(size). Memory contexts like this essentially provide an allocation mechanism for string concatenations and vector numeric operators. You can also replace the heap with Stack if you plan to remain within a service's call stack, but the compiler will raise an error if you try to return stack memory from a service. Or you could use on Heap:dynamic to allow dynamic allocations that are all freed together.
The example below uses an arena for string concatenations. Do note that the language has three types of strings: cstr, corresponding to raw text enclosed in quotations, like "there"; null-terminated strings nstr, which are normally the outcome of concatenation; and str, which is either null-terminated or a zero-cost substring view. Each of those types is convertible to the subsequent ones through zero-cost abstractions. The reason I am mentioning string types is to explain why we need to explicitly convert "there" to str in the code.
@include std.builtins
@include std.mem

service greet(str name)
    on Heap:arena(1024) // preallocate 1kB on the heap
        greeting = "Hi "+name+"!"
        ----> greeting

service main()
    greeting = greet("there":str)
    print(greeting)
    -- // end block, arena is deallocated here
Type details out of the way: we allocated a disproportionately large arena for string concatenation. This arena's freeing operations are attached to its allocated pointer, with string results keeping track of it. Hence, when the greeting string is returned, it is also accompanied by the whole arena and its deallocation code. Of course, "returning" the arena has no execution overhead other than moving its pointer value.
Automatic releases are similar to declaring destructors, but deallocation code is identified statically. Furthermore, despite the slightly bloated memory consumption in this case, speed gains from arenas (or fixed-size buffers) are what people often refer to when they mention the term "blazingly fast"; you get the benefits of minimized memory fragmentation, cache locality, and near-instantaneous allocation.
Smoλ aims to automate resource allocation and deallocation for higher-level data types. Still, you might be curious about which interfaces the language provides to allow the addition of new resource types. The way to transfer freeing code is by attaching it to underlying variables and moving it alongside those variables' values during compilation. I repeat, because it is worth repeating: we track memory VALUES during compilation, not variables, because we need to reason about the status of memory contents.
Below is an example of what resource acquisition looks like; this is how Heap memory can allocate a contiguous memory segment, for example to be used by arenas. Allocation code differs between types of memory; for instance, arenas themselves implement allocate based on pointer additions. The code below is a bit of a mess because it interweaves raw C. Any data that stores mem's value retains a link to the freeing code, and there is an additional dependency exploration during returns so that only one of those (either mem or, preferred if it exists, a returned value) owns the freeing code. Note that @unsafe is needed to convey that the author of this file - me - is who you should be trusting instead of the language. Now, I would not trust me too much, but that is another story...
// From std/mem/device.s, comments only here.
// File-level @about is needed whenever @unsafe
// is declared, to give a sense of why one could
// trust this unsafe file. The compiler can
// summarize these sources of presumed trust.
@unsafe
@about "Standard library implementation of memory management ..."

smo allocate(Heap, u64 size, Primitive)
    // Usually optimized away if, for example,
    // one called Heap:allocate(1024, char)
    if size==0
        -> fail("Cannot allocate zero size")
    // C header
    @head{#include <stdlib.h>}
    // A hack to declare a local variable with the
    // appropriate builtin type (among char,f64,u64,
    // i64) that the C code in @body below can see.
    // Optimized away.
    Primitive = Primitive
    // Direct C code here.
    @body{ptr mem=__runtime_alloc(size*sizeof(Primitive));} // malloc usually
    // Allocation safety - just fail the service whenever needed.
    if mem:bool:not
        -> fail("Failed a Heap allocation")
    // Attach freeing C code to the mem ptr address.
    // This is managed by the compiler and is curated to run correctly
    // even if the code above fails, triggering the release (uninitialized
    // variables are always set to zero).
    @finally mem {
        if(mem)
            __runtime_free(mem); // free usually
        mem=0;
    }
    -> nominal:ContiguousMemory(
        Heap,      // track type as zero-valued object (optimized away later)
        size,      // size of region within underlying memory
        Primitive, // track type as zero-valued object (optimized away later)
        mem,       // region start within underlying memory
        mem        // underlying memory
    )
Notice that there is a ton of information injected with the intent of tracking it during compilation and optimizing it away later. The resulting code may end up being just a malloc, an allocation test, and an eventual free. But the richness of all the intermediate information lets us do nice things during compilation, such as tracking where the memory should be released and the primitive alignment (64-bit for numbers or 8-bit for char primitives).
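Under those optimizations, the surviving C for something like Heap:allocate(1024, char) could be as little as this (a hypothetical residue, with the same stand-in hooks as before):

#include <stdlib.h>

static void* __runtime_alloc(size_t n) { return malloc(n); }
static void  __runtime_free(void* p)   { free(p); }

int example() {
    char* mem = __runtime_alloc(1024*sizeof(char)); // the malloc
    if (!mem) return -1;                            // the allocation test
    /* ... use mem ... */
    __runtime_free(mem);                            // the eventual free
    return 0;
}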
This post is part of the smoλ language's material. For more resources, check out these links: