A while back we had a demo in which we ran Cloudflare's workerd as a unikernel. To elaborate, that means a unikernel running as a virtual machine that was spinning up wasm payloads in isolates.
However, when we first started looking at this we ran into a feature we didn't support yet. It caused an abort signal, and the trace looked like this:
2 direct return: 2, rsp 0xfff69708
2 write
workerd/util/symbolizer.c++:96: warning: Not symbolizing stack traces
because $LLVM_SYMBOLIZER is not set. To symbolize stack traces, set
$LLVM_SYMBOLIZER to the location of the llvm-symbolizer binary. When
running tests under bazel, use `--test_env=LLVM_SYMBOLIZER=`.
2 direct return: 275, rsp 0xfff695f8
2 write libc++abi: 2 direct return: 11, rsp 0xfff6b1e8
2 write
terminating due to uncaught exception of type kj::ExceptionImpl:
kj/exception.c++:634: failed: sigaltstack(&stack, nullptr): Bad 2 direct return: 128, rsp 0xfff6ab68
2 write address 2 direct return: 8, rsp 0xfff6b0d8
2 write
2 direct return: 1, rsp 0xfff6b228
2 rt_sigprocmask
2 direct return: 0, rsp 0xfff6b200
2 gettid
2 direct return: 2, rsp 0xfff6b1b0
2 getpid
2 direct return: 2, rsp 0xfff6b1a8
2 tgkill
2 thread_attempt_interrupt: tid 2
2 uninterruptible or already running
2 direct return: 0, rsp 0xfff6b1b0
2 signal 6 received, errno 0, code -6
2 default action
*** signal 6 received by tid 2, errno 0, code -6
*** Thread context:
lastvector: 00000000000000ea
frame: ffffc00002a01000
type: thread
active_cpu: 00000000ffffffff
stack top: 0000000000000000
rax: 0000000000000000
rbx: 0000000000000002
rcx: 0000007821cad517
rdx: 0000000000000006
rsi: 0000000000000002
rdi: 0000000000000002
rbp: 0000000000000006
rsp: 00000000fff6b1b0
r8: 0000000000000073
r9: 0000000002c060d0
r10: 0000000000000008
r11: 0000000000000246
r12: 0000164abfc06530
r13: 0000000000000016
r14: 00000000007ca7ea
r15: 0000164abfc0e000
rip: 0000007821cad52b
rflags: 0000000000000246
ss: 000000000000002a
cs: 0000000000000023
ds: 0000000000000000
es: 0000000000000000
fsbase: 0000003b4f680900
gsbase: 0000000000000000
frame trace:
loaded klibs:
stack trace:
00000000fff6b1b0: 00000000fff6b290
00000000fff6b1b8: 83081b4d4181c000
00000000fff6b1c0: 0000000000000006
00000000fff6b1c8: 0000003b4f680900
00000000fff6b1d0: 0000164abfc06530
00000000fff6b1d8: 0000164abfc0c000
00000000fff6b1e0: 00000000007ca7ea
00000000fff6b1e8: 0000007821c583b6
00000000fff6b1f0: 0000007821e13e90
00000000fff6b1f8: 0000007821c3e87c
00000000fff6b200: 0000000000000020
00000000fff6b208: 6563786520746867
00000000fff6b210: 666f206e6f697470
00000000fff6b218: 6a6b206570797420
00000000fff6b220: 7470656378453a3a
00000000fff6b228: 0000007821ca14dd
00000000fff6b230: 656378652f6a6b20
00000000fff6b238: 0000007821e136a0
00000000fff6b240: 0000000000000001
00000000fff6b248: 0000007821e13723
00000000fff6b250: 0000000000000a68
00000000fff6b258: 0000007821ca2f51
00000000fff6b260: 706c6c756e202c6b
00000000fff6b268: 0000007821e136a0
00000000fff6b270: 000000000000000a
00000000fff6b278: 0000164abfc06530
00000000fff6b280: 0000164abfc0c000
00000000fff6b288: 00000000007ca7ea
00000000fff6b290: 0000164abfc0e000
00000000fff6b298: 83081b4d4181c000
00000000fff6b2a0: 0000007821e13860
00000000fff6b2a8: 0000007821e13860
2 core dump
core dump
In the process of getting it to run we had to add support for the MAP_GROWSDOWN flag on mmap.
If you look closely, you'll see that Kenton's KJ library was calling sigaltstack, which is a clue that we should dig into why it fails there:
kj::ExceptionImpl: kj/exception.c++:634: failed: sigaltstack(&stack, nullptr): Bad 2 direct return: 128, rsp 0xfff6ab68
KJ is billed as modern C++'s missing base library. It is bundled with Cap'n Proto and used in Sandstorm and Cloudflare Workers. Sandstorm itself shares some philosophical views on software with Nanos. I was curious where the name came from, and they mention it in their FAQ: apparently the keys 'k' and 'j' sit next to each other on both QWERTY and Dvorak layouts.
sigaltstack is used here to set up an alternate signal stack so that things like stack overflows can be handled.
Right before it is called, mmap is called with the MAP_GROWSDOWN flag set, and that is something we had no support for:
stack.ss_size = 65536;
// Note: ss_sp is char* on FreeBSD, void* on Linux and OSX.
stack.ss_sp = reinterpret_cast<char*>(mmap(
    nullptr, stack.ss_size, PROT_READ | PROT_WRITE,
    MAP_ANONYMOUS | MAP_PRIVATE | MAP_GROWSDOWN, -1, 0));
KJ_SYSCALL(sigaltstack(&stack, nullptr));
The stack pointer was chosen to run "downhill" (with the stack advancing toward lower memory) to simplify indexing into the stack from the user's program (positive indexing) and to simplify displaying the contents of the stack from a front panel. - Intel Microprocessors: 8008 to 8086, October, 1980 / IEEE
How Is Memory Laid Out?
Let's pause there and review our mental model of how memory is laid out on your machine.
Typically the stack grows down and the heap grows up.
That is, the stack grows from higher memory addresses toward lower ones.
KJ's exception handling creates a new stack with mmap and asks for it to extend downward in memory.
It looks like this:
----------
| stack  |
|   |    |
|  \|/   |
|        |
|  /|\   |
|   |    |
| heap   |
| data   |
| code   |
----------
To see the direction of growth more clearly we can look at a few examples:
#include <stdio.h>
void fun2() {
int c;
printf("Address: %p\n", (void *)&c);
}
void fun1() {
int b;
printf("Address: %p\n", (void *)&b);
fun2();
}
int main() {
int a;
printf("Address: %p\n", (void *)&a);
fun1();
return 0;
}
A common mistake is to declare a few variables on the stack in a single function and compare their addresses. We use multiple functions in this example because the compiler is free to reorder variables within the same function.
Now for the heap:
#include <stdio.h>
#include <stdlib.h>
int main() {
int *a;
int *b;
a = (int*)malloc(sizeof(int));
b = (int*)malloc(sizeof(int));
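// Print the heap addresses returned by malloc, not the addresses of the
// pointer variables themselves (those live on the stack).
printf("Address: %p\n", (void *)a);
printf("Address: %p\n", (void *)b);
free(a);
free(b);
return 0;
}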
For most purposes you can assume these directions hold. However, on some architectures the stack grows up, and on others, such as SPARC with its windowed registers, it becomes dealer's choice.
Back to MAP_GROWSDOWN
We can create a mapping and flag it with MAP_GROWSDOWN. Then we can attach it with pthread_attr_setstack so that we can use it as a stack.
When we set the MAP_GROWSDOWN flag on a mapping, the mapping can be expanded downwards, up to the stack size limit, by "touching" addresses lower than the base of the mapping. This requires that a guard gap is kept between this mapping and any adjacent mappings.
To see this in action we can look at this chunk of code I appropriated from our test suite:
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <sys/resource.h>
#include <sys/mman.h>
#define PAGESIZE 4096
#define test_error(msg, ...) do { \
fprintf(stderr, "Error at %s:%d: " msg "\n", __FILE__, __LINE__, ##__VA_ARGS__); \
exit(EXIT_FAILURE); \
} while (0)
#define test_perror(msg, ...) do { \
fprintf(stderr, "Error at %s:%d: " msg ": " , __FILE__, __LINE__, ##__VA_ARGS__); \
perror(NULL); \
exit(EXIT_FAILURE); \
} while (0)
static void *mmap_growsdown_thread(void *stack_limit)
{
if (__builtin_frame_address(0) >= stack_limit + PAGESIZE / 2) {
printf("frame addr: %p\n", __builtin_frame_address(0));
mmap_growsdown_thread(stack_limit);
}
return NULL;
}
static void mmap_growsdown(void)
{
const size_t guard_gap = 256 * PAGESIZE;
struct rlimit stack_limit;
if (getrlimit(RLIMIT_STACK, &stack_limit) < 0)
test_perror("getrlimit");
size_t map_len = guard_gap + stack_limit.rlim_cur;
void *addr = mmap(NULL, map_len, PROT_READ | PROT_WRITE,
MAP_PRIVATE | MAP_ANONYMOUS | MAP_GROWSDOWN, -1, 0);
if (addr == MAP_FAILED)
test_perror("mmap");
const size_t initial_stack_size =
#ifdef PTHREAD_STACK_MIN
PTHREAD_STACK_MIN;
#else
4 * PAGESIZE;
#endif
munmap(addr, map_len - initial_stack_size);
void *stack = addr + map_len - initial_stack_size;
pthread_attr_t attr;
pthread_attr_init(&attr);
if (pthread_attr_setstack(&attr, stack, initial_stack_size))
test_error("pthread_attr_setstack");
pthread_t thread;
if (pthread_create(&thread, &attr, mmap_growsdown_thread, addr + guard_gap))
test_error("pthread_create");
pthread_attr_destroy(&attr);
pthread_join(thread, NULL);
if (munmap(addr, map_len) < 0)
test_perror("munmap");
}
int main() {
mmap_growsdown();
}
__builtin_frame_address(0) returns the frame address of the current function (the related __builtin_return_address returns the return address). When we run this code, mmap_growsdown_thread calls itself recursively and we can see the address decreasing with each call.
You can keep calling this until the mapping grows to within a page of the high end of the next lower mapping, at which point touching the guard page will result in a SIGSEGV signal.
This is all to say - the next time you are trying to do a fancy demo showcasing a virtual machine running as a unikernel that is spinning up wasm payloads in isolates - just remember - the stack grows down.