FAAS in Go with WASM, WASI and Rust
https://eli.thegreenplace.net/2023/faas-in-go-with-wasm-wasi-and-rust/ [eli.thegreenplace.net]
2023-05-11 21:07
tags:
go
programming
rust
wasm
web
This post is best described as a technology demonstration; it melds together web servers, plugins, WebAssembly, Go, Rust and ABIs. Here’s what it shows:
How to load WASM code with WASI in a Go environment and hook it up to a web server.
How to implement web server plugins in any language that can be compiled to WASM.
How to translate Go programs into WASM that uses WASI.
How to translate Rust programs into WASM that uses WASI.
How to write WAT (WebAssembly Text) code that uses WASI to interact with a non-JS environment.
source: L
Beautiful Branchless Binary Search
https://probablydance.com/2023/04/27/beautiful-branchless-binary-search/ [probablydance.com]
2023-04-28 23:45
tags:
compsci
cxx
programming
I read a blog post by Alex Muscar, “Beautiful Binary Search in D”. It describes a binary search called “Shar’s algorithm”. I’d never heard of it and it’s impossible to google, but looking at the algorithm I couldn’t help but think “this is branchless.” And who knew that there could be a branchless binary search? So I did the work to translate it into an algorithm for C++ iterators, no longer requiring one-based indexing or fixed-size arrays.
https://muscar.eu/shar-binary-search-meta.html
source: L
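A rough sketch of the power-of-two stepping idea behind the post, written in Python rather than the author's C++ and not copied from it. The names `bit_floor`, `bit_ceil` and `branchless_lower_bound` are chosen here for illustration. Python itself still branches internally; multiplying the step by a boolean only stands in for the conditional move a C++ compiler could emit.

```python
def bit_floor(n):
    """Largest power of two <= n (n must be >= 1)."""
    return 1 << (n.bit_length() - 1)


def bit_ceil(n):
    """Smallest power of two >= n (n must be >= 1)."""
    return 1 << (n - 1).bit_length()


def branchless_lower_bound(items, value):
    """Index of the first element in sorted `items` that is >= value."""
    length = len(items)
    if length == 0:
        return 0
    begin = 0
    step = bit_floor(length)
    # If the length is not a power of two, one extra comparison decides
    # whether to search the first `step` elements or a power-of-two
    # sized window that ends at the end of the list.
    if step != length and items[step] < value:
        length -= step + 1
        if length == 0:
            return len(items)
        step = bit_ceil(length)
        begin = len(items) - step
    # Main loop: the step halves every iteration, and `begin` advances
    # by either `step` or 0 depending on the comparison result.
    step //= 2
    while step != 0:
        begin += step * (items[begin + step] < value)
        step //= 2
    return begin + (items[begin] < value)
```

The loop runs a number of times that depends only on the length of the input, never on the data, which is what makes the compiled version friendly to conditional moves.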
Dumb bugs: the PCI device that wasn't
https://sthbrx.github.io/blog/2023/04/04/dumb-bugs-the-pci-device-that-wasnt/ [sthbrx.github.io]
2023-04-05 18:21
tags:
bugfix
c
linux
programming
So pci_notify() gets called with our VIO device (somehow), and we’re converting that struct device into a struct pci_dev with no error checking. We could solve this particular bug by just checking that our device is actually a PCI device before we proceed - but we’re in a function called pci_notify, we’re expecting a PCI device to come in, so this would just be a bandaid.
source: L
A fork() in the road
https://www.microsoft.com/en-us/research/uploads/prod/2019/04/fork-hotos19.pdf [www.microsoft.com]
2023-03-25 04:02
tags:
malloc
paper
pdf
programming
systems
unix
The received wisdom suggests that Unix’s unusual combination of fork() and exec() for process creation was an inspired design. In this paper, we argue that fork was a clever hack for machines and programs of the 1970s that has long outlived its usefulness and is now a liability. We catalog the ways in which fork is a terrible abstraction for the modern programmer to use, describe how it compromises OS implementations, and propose alternatives.
source: L
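To make the contrast concrete, here is a small sketch comparing the classic fork-then-exec sequence with a spawn-style call, assuming a POSIX system and Python 3.9 or later. `posix_spawn` is used here as one example of a spawn-style interface; the helper names `run_with_fork_exec` and `run_with_spawn` are made up for this note.

```python
import os


def run_with_fork_exec(argv):
    """Classic Unix style: duplicate the process, then replace the child."""
    pid = os.fork()
    if pid == 0:
        # Child: runs in a copy of the parent's address space until exec.
        try:
            os.execvp(argv[0], argv)
        finally:
            # Only reached if exec failed.
            os._exit(127)
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status)


def run_with_spawn(argv):
    """Spawn style: ask the OS for a new process running argv directly."""
    pid = os.posix_spawnp(argv[0], argv, os.environ)
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status)
```

Both helpers return the child's exit code; the spawn version never runs any parent code inside the child.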
A world to win - WebAssembly for the rest of us
https://www.wingolog.org/archives/2023/03/20/a-world-to-win-webassembly-for-the-rest-of-us [www.wingolog.org]
2023-03-20 22:09
tags:
functional
garbage-collection
lisp
programming
transcript
wasm
As it turns out, there is a reason that there is no good Scheme implementation on WebAssembly: the initial version of WebAssembly is a terrible target if your language relies on the presence of a garbage collector. There have been some advances but this observation still applies to the current standardized and deployed versions of WebAssembly. To better understand this issue, let’s dig into the guts of the system to see what the limitations are.
source: HN
Paving the Road to Vulkan on Asahi Linux
https://asahilinux.org/2023/03/road-to-vulkan/ [asahilinux.org]
2023-03-20 18:25
tags:
concurrency
gl
graphics
linux
programming
systems
In every modern OS, GPU drivers are split into two parts: a userspace part, and a kernel part. The kernel part is in charge of managing GPU resources and how they are shared between apps, and the userspace part is in charge of converting commands from a graphics API (such as OpenGL or Vulkan) into the hardware commands that the GPU needs to execute.
Between those two parts, there is something called the Userspace API or “UAPI”. This is the interface that they use to communicate between them, and it is specific to each class of GPUs! Since the exact split between userspace and the kernel can vary depending on how each GPU is designed, and since different GPU designs require different bits of data and parameters to be passed between userspace and the kernel, each new GPU driver requires its own UAPI to go along with it.
source: HN
Discovering one bug after another in the UTF-8 decoding logic in OpenBSD, then going on to fix other aspects of related code.
https://research.exoticsilicon.com/articles/unbreaking_utf8_on_the_console [research.exoticsilicon.com]
2023-03-10 20:32
tags:
bugfix
investigation
openbsd
programming
text
tty
Still, the debugging process we went through here to discover the cause of the problems in the first place is worth sharing from the beginning, as the code in question was particularly bad with plenty of textbook mistakes. Who knows what you might find in your own investigations elsewhere.
Email: https://marc.info/?l=openbsd-tech&m=167734639712745&w=2
source: L
The futex_waitv() syscall and gaming on Linux
https://www.collabora.com/news-and-blog/blog/2023/02/17/the-futex-waitv-syscall-gaming-on-linux/ [www.collabora.com]
2023-02-17 23:48
tags:
concurrency
gaming
linux
perf
programming
systems
The futex_waitv syscall is a new syscall through which the process can wait for multiple futexes. The task wakes up when any futex in the list is awakened. This can be used to implement wait on multiple locks and wait lists, etc, without the limitations imposed by using eventfd.
source: L
double-free vulnerability in OpenSSH server 9.1 (CVE-2023-25136)
https://marc.info/?l=oss-security&m=167628974320957&w=2 [marc.info]
2023-02-16 20:18
tags:
exploit
malloc
openbsd
programming
security
Exploiting this vulnerability will not be easy: modern memory allocators provide protections against double frees, and the impacted sshd process is unprivileged and heavily sandboxed.
Quick update: we were able to gain arbitrary control of the “rip” register through this bug (i.e., we can jump wherever we want in sshd’s address space) on an unpatched installation of OpenBSD 7.2 (which runs OpenSSH 9.1 by default). This is by no means the end of the story: this was only step 1, bypass the malloc and double-free protections.
source: L
Do Not Taunt Happy Fun Branch Predictor
https://www.mattkeeter.com/blog/2023-01-25-branch/ [www.mattkeeter.com]
2023-01-25 20:09
tags:
cpu
perf
programming
I recently came up with a “clever” idea to eliminate one jump from an inner loop, and was surprised to find that it slowed things down. Allow me to explain my terrible error, so that you don’t fall victim in the future.
An instruction oddity in the ppc64 (PowerPC 64-bit) architecture
https://utcc.utoronto.ca/~cks/space/blog/tech/PowerPCInstructionOddity [utcc.utoronto.ca]
2023-01-21 19:45
tags:
bugfix
compiler
cpu
programming
turtles
As Raymond Chen notes, ‘or rd, ra, ra’ has the effect of ‘move ra to rd’. Moving a register to itself is a NOP, but several Power versions (the Go code’s comment says Power8, 9, and 10) overload this particular version of a NOP (and some others) to signal that the priority of your hardware thread should be changed by the CPU; in the specific case of ‘or r1, r1, r1’ it drops you to low priority. That leaves us with the mystery of why such an instruction would be used by a compiler, instead of the official NOP (per Raymond Chen, this is ‘ori r0, r0, 0’).
As covered in the specific ppc64 diff in the change that introduced this issue, Go wanted to artificially mark a particular runtime function this way (see CL 425396 and Go issue #54332 for more). To do this it needed to touch the stack pointer in a harmless way, which would trigger the toolchain’s weirdness detector. On ppc64, the stack pointer is in r1. So the obvious and natural thing to do is to move r1 to itself, which encodes as ‘or r1, r1, r1’, and which then triggers this special architectural behavior of lowering the priority of that hardware thread. Oops.
https://devblogs.microsoft.com/oldnewthing/20180809-00/?p=99455
https://github.com/golang/go/issues/54332
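For reference, a small sketch of how the two instructions encode as 32-bit words, using the standard Power field layout (primary opcode 31 with extended opcode 444 for `or`, primary opcode 24 for `ori`). The function names are invented for this note; the point is that the register-to-itself move and the preferred no-op are entirely different bit patterns, so the hardware can tell them apart.

```python
def encode_or(ra, rs, rb):
    """Encode `or ra, rs, rb` (X-form: opcode 31, extended opcode 444)."""
    return (31 << 26) | (rs << 21) | (ra << 16) | (rb << 11) | (444 << 1)


def encode_ori(ra, rs, uimm):
    """Encode `ori ra, rs, uimm` (D-form: opcode 24, 16-bit immediate)."""
    return (24 << 26) | (rs << 21) | (ra << 16) | (uimm & 0xFFFF)


# `or r1, r1, r1`: moves the stack pointer to itself, and doubles as the
# "drop this hardware thread to low priority" hint described above.
LOW_PRIORITY_HINT = encode_or(1, 1, 1)

# `ori r0, r0, 0`: the preferred no-op, with no priority side effect.
PREFERRED_NOP = encode_ori(0, 0, 0)

print(f"or  r1, r1, r1 = {LOW_PRIORITY_HINT:#010x}")
print(f"ori r0, r0, 0  = {PREFERRED_NOP:#010x}")
```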
Pointer compression in Oilpan
https://v8.dev/blog/oilpan-pointer-compression [v8.dev]
2022-11-30 03:00
tags:
cxx
malloc
programming
None of this is completely new though, which is why we launched pointer compression for V8 in 2020 and saw great improvements in memory across the web. With the Oilpan library we have another building block of the web under control. Oilpan is a trace-based garbage collector for C++ which is among other things used to host the Document Object Model in Blink and thus an interesting target for optimizing memory.
source: HN
Building the fastest Lua interpreter.. automatically!
https://sillycross.github.io/2022/11/22/2022-11-22/ [sillycross.github.io]
2022-11-22 23:10
tags:
compiler
jit
lua
perf
programming
I have been working on a research project to make writing VMs easier. The idea arises from the following observation: writing a naive interpreter is not hard (just write a big switch-case), but writing a good interpreter (or JIT compiler) is hard, as it unavoidably involves hand-coding assembly. So why can’t we implement a special compiler to automatically generate a high-performance interpreter (and even the JIT) from “the big switch-case”, or more formally, a semantical description of what each bytecode does?
source: HN
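As a reminder of what "the big switch-case" looks like, here is a toy stack-machine interpreter. The opcodes (`PUSH`, `ADD`, `MUL`, `HALT`) and the `run` function are invented for this note and have nothing to do with Lua's real bytecode; Python's `if`/`elif` chain plays the role of the `switch`.

```python
# Hypothetical opcodes for a tiny stack machine.
PUSH, ADD, MUL, HALT = range(4)


def run(code):
    """Execute `code` (a flat list of opcodes and operands); return the stack."""
    stack = []
    pc = 0
    while True:
        op = code[pc]
        pc += 1
        # The "big switch": one case per bytecode.
        if op == PUSH:
            stack.append(code[pc])  # operand follows the opcode
            pc += 1
        elif op == ADD:
            b = stack.pop()
            a = stack.pop()
            stack.append(a + b)
        elif op == MUL:
            b = stack.pop()
            a = stack.pop()
            stack.append(a * b)
        elif op == HALT:
            return stack
        else:
            raise ValueError(f"unknown opcode {op} at position {pc - 1}")


# (2 + 3) * 4
program = [PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, HALT]
result = run(program)
```

Each case body is the kind of per-bytecode semantic description the post wants to feed to a generator, instead of hand-writing the dispatch loop in assembly.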
How to Make Rust Leak Memory (Also: How to Make It Stop)
https://fly.io/blog/rust-memory-leak/ [fly.io]
2022-06-16 18:40
tags:
bugfix
investigation
malloc
programming
rust
Of course you can leak memory, even in Rust. For even medium-sized long-running applications, lots of graphs from a good memory profiler can make life better. And they’ll probably help you find the memory leak too.
CVE-2022-23088: Exploiting A Heap Overflow In The FreeBSD Wi-Fi Stack
https://www.zerodayinitiative.com/blog/2022/6/15/cve-2022-23088-exploiting-a-heap-overflow-in-the-freebsd-wi-fi-stack [www.zerodayinitiative.com]
2022-06-16 18:38
tags:
exploit
freebsd
programming
security
wifi
In April of this year, FreeBSD patched a 13-year-old heap overflow in the Wi-Fi stack that could allow network-adjacent attackers to execute arbitrary code on affected installations of FreeBSD Kernel. This bug was originally reported to the ZDI program by a researcher known as m00nbsd and patched in April 2022 as FreeBSD-SA-22:07.wifi_meshid. The researcher has graciously provided this detailed write-up of the vulnerability and a proof-of-concept exploit demonstrating the bug.
source: L
How fast are Linux pipes anyway?
https://mazzo.li/posts/fast-pipes.html [mazzo.li]
2022-06-02 22:56
tags:
concurrency
linux
malloc
perf
programming
systems
In this post, we will explore how Unix pipes are implemented in Linux by iteratively optimizing a test program that writes and reads data through a pipe.
We will proceed as follows:
A first slow version of our pipe test bench;
How pipes are implemented internally, and why writing and reading from them is slow;
How the vmsplice and splice syscalls let us get around some (but not all!) of the slowness;
A description of Linux paging, leading up to a faster version using huge pages;
The final optimization, replacing polling with busy looping;
Some closing thoughts.
source: L
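A minimal stand-in for the "first slow version" of such a test bench: one thread writes a fixed number of bytes into a pipe while the main thread reads until end-of-file, and the elapsed time gives a throughput figure. This is a Python sketch with made-up names (`pipe_throughput`, `total_bytes`, `chunk_size`), not the post's C program, and the numbers it produces will be dominated by interpreter overhead.

```python
import os
import threading
import time


def pipe_throughput(total_bytes=1 << 24, chunk_size=1 << 16):
    """Push total_bytes through a pipe; return (bytes_received, bytes_per_second)."""
    read_fd, write_fd = os.pipe()
    chunk = b"x" * chunk_size

    def writer():
        remaining = total_bytes
        try:
            while remaining > 0:
                # os.write may write fewer bytes than requested.
                written = os.write(write_fd, chunk[:min(chunk_size, remaining)])
                remaining -= written
        finally:
            # Closing the write end lets the reader see end-of-file.
            os.close(write_fd)

    thread = threading.Thread(target=writer)
    start = time.perf_counter()
    thread.start()

    received = 0
    while True:
        data = os.read(read_fd, chunk_size)
        if not data:
            break
        received += len(data)

    elapsed = time.perf_counter() - start
    thread.join()
    os.close(read_fd)
    return received, received / elapsed
```

Every chunk here is copied into the kernel's pipe buffer by `write` and copied out again by `read`, which is the cost the post sets out to reduce.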
All About Libpas, Phil's Super Fast Malloc
https://github.com/WebKit/WebKit/blob/main/Source/bmalloc/libpas/Documentation.md [github.com]
2022-06-01 21:43
tags:
c
malloc
perf
programming
Libpas is a fast and memory-efficient memory allocation toolkit capable of supporting many heaps at once, engineered with the hopes that someday it’ll be used for comprehensive isoheaping of all malloc/new callsites in C/C++ programs.
source: HN
Faster CRC32 on the Apple M1
https://dougallj.wordpress.com/2022/05/22/faster-crc32-on-the-apple-m1/ [dougallj.wordpress.com]
2022-05-22 19:25
tags:
cpu
hash
perf
programming
CRC32 is a checksum first proposed in 1961, and now used in a wide variety of performance sensitive contexts, from file formats (zip, png, gzip) to filesystems (ext4, btrfs) and protocols (like ethernet and SATA). So, naturally, a lot of effort has gone into optimising it over the years. However, I discovered a simple update to a widely used technique that makes it possible to run twice as fast as existing solutions on the Apple M1.
source: HN
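For context, this is the plain bit-at-a-time definition of the CRC32 used by zip, png and gzip, checked against Python's `zlib.crc32`. It is the slow baseline that hardware instructions and table or folding techniques speed up, not the technique from the post; the function name `crc32_bitwise` is chosen here.

```python
import zlib

# Reflected form of the CRC-32 polynomial used by zip, png and gzip.
POLY = 0xEDB88320


def crc32_bitwise(data, crc=0):
    """Compute CRC32 one bit at a time (slow, but mirrors the definition)."""
    crc ^= 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # If the low bit is set, shift and XOR in the polynomial;
            # otherwise just shift. -(crc & 1) is all ones or zero.
            crc = (crc >> 1) ^ (POLY & -(crc & 1))
    return crc ^ 0xFFFFFFFF


sample = b"How fast can we checksum this?"
matches_zlib = crc32_bitwise(sample) == zlib.crc32(sample)
```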
Lotus 1-2-3 For Linux
https://lock.cmpxchg8b.com/linux123.html [lock.cmpxchg8b.com]
2022-05-21 21:51
tags:
development
investigation
linux
programming
retro
unix
I’ll cut to the chase; through a combination of unlikely discoveries, crazy hacks and the 90s BBS warez scene I’ve been able to port Lotus 1-2-3 natively to Linux – an operating system that literally didn’t exist when 1-2-3 was released!
source: L
Logging C Functions
https://justine.lol/ftrace/ [justine.lol]
2022-05-20 17:01
tags:
c
investigation
programming
systems
The Cosmopolitan Libc _start() function starts by intercepting the --ftrace flag. If it exists, then it opens and sorts the symbol table from the ELF binary. Then it changes the protection of memory so it’s able to iterate over the program’s memory to look for nop instructions it can mutate. Those NOPs were inserted by GCC. It’s easy to self-modify them in memory, since they have the same byte length as the CALL instruction. Think of it like a mini linker. It just relinks the profiling nops. Once they’ve been rewritten, functions will start calling ftrace_hook() which is an assembly function that saves the CPU state to the stack. That means ftrace kind of acts like an operating system kernel. Once the assembly has saved the CPU state, it can call the ftracer() C code that acquires a reentrant mutex and unwinds the RBP backtrace pointer (via __builtin_frame_address(0)) to determine the address of the function that called it. Once it has the address of the function, it passes it along to kprintf() which has a special %t syntax for turning numbers into symbols.
source: HN