How I helped fix sleep-wake hangs on Linux with AMD GPUs
https://nyanpasu64.gitlab.io/blog/amdgpu-sleep-wake-hang/ [nyanpasu64.gitlab.io]
2025-01-03 09:52
tags:
bugfix
investigation
linux
malloc
programming
systems
Through some digging, I found that when a desktop enters S3 sleep, the system cuts power to PCIe GPUs, causing their VRAM chips to lose data. To preserve this data, GPU drivers copy VRAM in use to system RAM before the system sleeps, then restore it after the system wakes. However the Linux amdgpu driver has a bug where, if there is not enough free RAM to store all VRAM in use, the system will run out of memory and crash, instead of moving RAM to disk-based swap.
source: L
GWP-ASan: Sampling-Based Detection of Memory-Safety Bugs in Production
https://arxiv.org/abs/2311.09394v2 [arxiv.org]
2024-04-19 20:11
tags:
c
development
fuzzing
malloc
paper
pdf
programming
systems
Despite the recent advances in pre-production bug detection, heap-use-after-free and heap-buffer-overflow bugs remain the primary problem for security, reliability, and developer productivity for applications written in C or C++, across all major software ecosystems. Memory-safe languages solve this problem when they are used, but the existing code bases consisting of billions of lines of C and C++ continue to grow, and we need additional bug detection mechanisms.
This paper describes a family of tools that detect these two classes of memory-safety bugs, while running in production, at near-zero overhead. These tools combine page-granular guarded allocation and low-rate sampling. In other words, we added an “if” statement to a 36-year-old idea and made it work at scale.
Flipping Pages: An analysis of a new Linux vulnerability in nf_tables and hardened exploitation techniques
https://pwning.tech/nftables/ [pwning.tech]
2024-03-26 23:33
tags:
best
cpu
exploit
linux
malloc
paper
programming
security
systems
In this blogpost I present several novel techniques I used to exploit a 0-day double-free bug in hardened Linux kernels (i.e. KernelCTF mitigation instances) with 93%-99% success rate. The underlying bug is input sanitization failure of netfilter verdicts. Hence, the requirements for the exploit are that nf_tables is enabled and unprivileged user namespaces are enabled. The exploit is data-only and performs an kernel-space mirroring attack (KSMA) from userland with the novel Dirty Pagedirectory technique (pagetable confusion), where it is able to link any physical address (and its permissions) to virtual memory addresses by performing just read/writes to userland addresses.
Also: https://github.com/Notselwyn/CVE-2024-1086
source: HN
Gaining kernel code execution on an MTE-enabled Pixel 8
https://github.blog/2024-03-18-gaining-kernel-code-execution-on-an-mte-enabled-pixel-8/ [github.blog]
2024-03-20 07:36
tags:
android
exploit
malloc
security
systems
In this post, I’ll look at CVE-2023-6241, a vulnerability in the Arm Mali GPU that I reported to Arm on November 15, 2023 and was fixed in the Arm Mali driver version r47p0, which was released publicly on December 14, 2023. It was fixed in Android in the March security update. When exploited, this vulnerability allows a malicious Android app to gain arbitrary kernel code execution and root on the device. The vulnerability affects devices with newer Arm Mali GPUs that use the Command Stream Frontend (CSF) feature, such as Google’s Pixel 7 and Pixel 8 phones. What is interesting about this vulnerability is that it is a logic bug in the memory management unit of the Arm Mali GPU and it is capable of bypassing Memory Tagging Extension (MTE), a new and powerful mitigation against memory corruption that was first supported in Pixel 8. In this post, I’ll show how to use this bug to gain arbitrary kernel code execution in the Pixel 8 from an untrusted user application. I have confirmed that the exploit works successfully even with kernel MTE enabled by following these instructions.
source: HN
The case of the application that used thread local storage it never allocated
https://devblogs.microsoft.com/oldnewthing/20221128-00/?p=107456 [devblogs.microsoft.com]
2024-03-15 22:42
tags:
bugfix
concurrency
development
malloc
programming
windows
Upon closer inspection, the real problem was not that the application’s TLS was being corrupted. The problem was that the application was using TLS slots it never allocated, so it was inadvertently using somebody else’s TLS slots as its own. And of course, when the true owner updated the TLS value, the application interpreted that as corruption.
Identifying Rust's collect::<Vec<_>>() memory leak footgun
https://blog.polybdenum.com/2024/01/17/identifying-the-collect-vec-memory-leak-footgun.html [blog.polybdenum.com]
2024-01-18 17:32
tags:
malloc
programming
rust
turtles
This is the story of how I identified the bug. (TLDR: collect::<Vec<_>>() will sometimes reuse allocations, resulting in Vecs with large excess capacity, even when the length is exactly known in advance, so you need to call shrink_to_fit if you want to free the extra memory.)
Ordinarily, that wouldn’t have been a problem, since the into_iter().map().collect() line used to pack them into (u32, u32)s would allocate a new vector with only the exact amount of space required. However, thanks to the allocation reuse optimization added in Rust 1.76, the new vec shared the backing store of the input vec, and hence had a capacity of 16560, meaning it was using 132480 bytes of memory to store only 16 bytes of data.
source: HN
Arena allocator tips and tricks
https://nullprogram.com/blog/2023/09/27/ [nullprogram.com]
2023-10-01 18:51
tags:
c
development
hash
malloc
programming
Over the past year I’ve refined my approach to arena allocation. With practice, it’s effective, simple, and fast; typically as easy to use as garbage collection but without the costs. Depending on need, an allocator can weigh just 7–25 lines of code — perfect when lacking a runtime. With the core details of my own technique settled, now is a good time to document and share lessons learned. This is certainly not the only way to approach arena allocation, but these are practices I’ve worked out to simplify programs and reduce mistakes.
See also: https://nullprogram.com/blog/2023/09/30/
An easy-to-implement, arena-friendly hash map
source: L
When LIMIT 9 works but LIMIT 10 hangs
https://neon.tech/blog/when-limit-9-works-but-limit-10-hangs [neon.tech]
2023-05-31 18:06
tags:
bugfix
javascript
malloc
programming
So then bytes 3 and 4 should be that 16-bit payload length — and this is where things fall apart. The ws message says we have 126 bytes (00000000 01111110) of payload. That sounds plausible. The undici message says we have 25,888 bytes (01100101 00100000) of payload … in a 222 byte packet? Yeah: this one is fishy.
source: HN
Synthetic Memory Protections - An update on ROP mitigations
https://www.openbsd.org/papers/csw2023.pdf [www.openbsd.org]
2023-03-25 19:35
tags:
cpu
defense
malloc
openbsd
pdf
security
slides
systems
ROP methods have become increasingly sophisticated
But we can identify system behaviours which only ROP code requires
We can contrast this to what Regular Control Flow code needs
And then, find behaviours to block
source: HN
A fork() in the road
https://www.microsoft.com/en-us/research/uploads/prod/2019/04/fork-hotos19.pdf [www.microsoft.com]
2023-03-25 04:02
tags:
malloc
paper
pdf
programming
systems
unix
The received wisdom suggests that Unix’s unusual combination of fork() and exec() for process creation was an inspired design. In this paper, we argue that fork was a clever hack for machines and programs of the 1970s that has long outlived its usefulness and is now a liability. We catalog the ways in which fork is a terrible abstraction for the modern programmer to use, describe how it compromises OS implementations, and propose alternatives.
source: L
double-free vulnerability in OpenSSH server 9.1 (CVE-2023-25136)
https://marc.info/?l=oss-security&m=167628974320957&w=2 [marc.info]
2023-02-16 20:18
tags:
exploit
malloc
openbsd
programming
security
Exploiting this vulnerability will not be easy: modern memory allocators provide protections against double frees, and the impacted sshd process is unprivileged and heavily sandboxed.
Quick update: we were able to gain arbitrary control of the “rip” register through this bug (i.e., we can jump wherever we want in sshd’s address space) on an unpatched installation of OpenBSD 7.2 (which runs OpenSSH 9.1 by default). This is by no means the end of the story: this was only step 1, bypass the malloc and double-free protections.
source: L
Pointer compression in Oilpan
https://v8.dev/blog/oilpan-pointer-compression [v8.dev]
2022-11-30 03:00
tags:
cxx
malloc
programming
None of this is completely new though, which is why we launched pointer compression for V8 in 2020 and saw great improvements in memory across the web. With the Oilpan library we have another building block of the web under control. Oilpan is a traced-based garbage collector for C++ which is among other things used to host the Document Object Model in Blink and thus an interesting target for optimizing memory.
source: HN
How to Make Rust Leak Memory (Also: How to Make It Stop)
https://fly.io/blog/rust-memory-leak/ [fly.io]
2022-06-16 18:40
tags:
bugfix
investigation
malloc
programming
rust
Of course you can leak memory, even in Rust. For even medium-sized long-running applications, lots of graphs from a good memory profiler can make life better. And they’ll probably help you find the memory leak too.
How fast are Linux pipes anyway?
https://mazzo.li/posts/fast-pipes.html [mazzo.li]
2022-06-02 22:56
tags:
concurrency
linux
malloc
perf
programming
systems
In this post, we will explore how Unix pipes are implemented in Linux by iteratively optimizing a test program that writes and reads data through a pipe.
We will proceed as follows:
A first slow version of our pipe test bench;
How pipes are implemented internally, and why writing and reading from them is slow;
How the vmsplice and splice syscalls let us get around some (but not all!) of the slowness;
A description of Linux paging, leading up to a faster version using huge pages;
The final optimization, replacing polling with busy looping;
Some closing thoughts.
source: L
All About Libpas, Phil's Super Fast Malloc
https://github.com/WebKit/WebKit/blob/main/Source/bmalloc/libpas/Documentation.md [github.com]
2022-06-01 21:43
tags:
c
malloc
perf
programming
Libpas is a fast and memory-efficient memory allocation toolkit capable of supporting many heaps at once, engineered with the hopes that someday it’ll be used for comprehensive isoheaping of all malloc/new callsites in C/C++ programs.
source: HN
The case of the failed exchange of the vtable slot
https://devblogs.microsoft.com/oldnewthing/20220429-00/?p=106543 [devblogs.microsoft.com]
2022-05-04 20:24
tags:
bugfix
cxx
malloc
programming
windows
This shell extension is trying to detour the operating system, and it failed. (Note that Windows does not support apps detouring the operating system. This shell extension has exited into unsupported territory.)
Beyond the Remake of 'Shadow of the Colossus': A Technical Perspective
https://www.youtube.com/watch?v=fcBZEZWGYek [www.youtube.com]
2022-02-23 06:20
tags:
concurrency
development
gaming
malloc
programming
video
Intro to porting games between platforms, then also a deep walkthrough of a custom allocator libary.
The multi-generational LRU
https://lwn.net/Articles/851184/ [lwn.net]
2021-04-03 03:03
tags:
linux
malloc
systems
update
One of the key tasks assigned to the memory-management subsystem is to optimize the system’s use of the available memory; that means pushing out pages containing unused data so that they can be put to better use elsewhere. Predicting which pages will be accessed in the near future is a tricky task, and the kernel has evolved a number of mechanisms designed to improve its chances of guessing right. But the kernel not only often gets it wrong, it also can expend a lot of CPU time to make the incorrect choice. The multi-generational LRU patch set posted by Yu Zhao is an attempt to improve that situation.
https://lwn.net/ml/linux-kernel/20210313075747.3781593-1-yuzhao@google.com/
source: HN
Improving texture atlas allocation in WebRender
https://nical.github.io/posts/etagere.html [nical.github.io]
2021-02-05 02:11
tags:
compsci
graphics
malloc
programming
This is a longer version of the piece I published in the mozilla gfx team blog where I focus on the atlas allocation algorithms. In this one I’ll go into more details about the process and methodology behind these improvements. The first part is about the making of guillotiere, a crate that I first released in March 2019. In the second part we’ll have a look at more recent work building upon what I did with guillotiere, to improve texture memory usage in WebRender/Firefox.
https://mozillagfx.wordpress.com/2021/02/04/improving-texture-atlas-allocation-in-webrender/
source: HN
What they don’t tell you about demand paging in school
https://offlinemark.com/2020/10/14/demand-paging/ [offlinemark.com]
2020-10-20 18:29
tags:
linux
malloc
programming
systems
This post details my adventures with the Linux virtual memory subsystem, and my discovery of a creative way to taunt the OOM (out of memory) killer by accumulating memory in the kernel, rather than in userspace.
Good look at practical realities.
source: L