Links

tag: cpu

The radix 2^51 trick

https://www.chosenplaintext.ca/articles/radix-2-51-trick.html [www.chosenplaintext.ca]

2025-05-31 00:51

The obvious solution would be to break up each 256-bit number into four 64-bit pieces (commonly referred to as “limbs”).

The first reason is that adc is just slower to execute than a normal add on most popular x86 CPUs. Since adc has a third input (the carry flag), it’s a more complex instruction than add. It’s also used less often than add, so there is less incentive for CPU designers to spend chip area on optimizing adc performance.

The key insight here is that we can use this technique to delay carry propagation until the end. We can’t avoid carry propagation altogether, but we can avoid it temporarily. If we save up the carries that occur during the intermediate additions, we can propagate them all in one go at the end.
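A minimal sketch of the delayed-carry idea, assuming a 256-bit value is split into five limbs: the low four hold 51 bits each and the top limb holds the remaining bits. Each limb lives in a simulated 64-bit word, so it has at least 12 spare bits in which carries can pile up. The function names (to_limbs, add_lazy, normalize, from_limbs) are hypothetical and chosen for illustration; this is Python modelling the arithmetic, not the linked article's assembly.

```python
LIMB_BITS = 51
NUM_LIMBS = 5
LIMB_MASK = (1 << LIMB_BITS) - 1
WORD_MASK = (1 << 64) - 1  # each limb is modelled as one 64-bit word


def to_limbs(x):
    """Split a non-negative integer below 2**256 into five limbs."""
    limbs = [(x >> (LIMB_BITS * i)) & LIMB_MASK for i in range(NUM_LIMBS - 1)]
    # The top limb keeps whatever bits remain (up to 52 for a 256-bit input).
    limbs.append(x >> (LIMB_BITS * (NUM_LIMBS - 1)))
    return limbs


def add_lazy(a, b):
    """Add limb by limb with no carry passed between limbs."""
    out = [x + y for x, y in zip(a, b)]
    # Guard for the sketch: a real 64-bit word would silently wrap here.
    assert all(v <= WORD_MASK for v in out), "spare bits exhausted"
    return out


def normalize(limbs):
    """Propagate all the saved-up carries in one pass, low limb to high."""
    out = []
    carry = 0
    for v in limbs[:-1]:
        v += carry
        out.append(v & LIMB_MASK)
        carry = v >> LIMB_BITS
    # The top limb absorbs the final carry; this sketch does not wrap mod 2**256.
    out.append(limbs[-1] + carry)
    return out


def from_limbs(limbs):
    """Recombine limbs into a single integer."""
    return sum(v << (LIMB_BITS * i) for i, v in enumerate(limbs))


values = [(1 << 256) - 1, (1 << 255) + 12345, 0xDEADBEEF << 100, 1]
acc = to_limbs(0)
for v in values:
    acc = add_lazy(acc, to_limbs(v))  # independent adds, no carry chain
result = from_limbs(normalize(acc))   # one carry pass at the end
```

The limb additions in add_lazy do not depend on each other, which is what lets a CPU run them in parallel with plain add instead of a serial adc chain.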

source: L

The absurdly complicated circuitry for the 386 processor's registers

http://www.righto.com/2025/05/intel-386-register-circuitry.html [www.righto.com]

2025-05-04 20:22

tags: compsci cpu hardware photos retro tech

If you look in a book on processor design, you’ll find a description of how registers can be created from static memory cells. However, the 386 illustrates that the implementation in a real processor is considerably more complicated. Instead of using one circuit, Intel used six different circuits for the registers in the 386.

I want a good parallel computer

https://raphlinus.github.io/gpu/2025/03/21/good-parallel-computer.html [raphlinus.github.io]

2025-03-22 17:56

tags: concurrency cpu graphics hardware programming

I believe a simpler, more powerful parallel computer is possible, and that there are signs in the historical record. In a slightly alternate universe, we would have those computers now, and be doing the work of designing algorithms and writing programs to run well on them, for a very broad range of tasks.

source: L

The Pentium contains a complicated circuit to multiply by three

http://www.righto.com/2025/03/pentium-multiplier-adder-reverse-engineered.html [www.righto.com]

2025-03-14 23:21

tags: article cpu hardware investigation math

In 1993, Intel released the high-performance Pentium processor, the start of the long-running Pentium line. I’ve been examining the Pentium’s circuitry in detail and I came across a circuit to multiply by three, a complex circuit with thousands of transistors. Why does the Pentium have a circuit to multiply specifically by three? Why is it so complicated? In this article, I examine this multiplier—which I’ll call the ×3 circuit—and explain its purpose and how it is implemented.

Constant-Time Code: The Pessimist Case

https://eprint.iacr.org/2025/435 [eprint.iacr.org]

2025-03-08 06:09

tags: compiler cpu crypto paper pdf perf programming turtles

This note discusses the problem of writing cryptographic implementations in software, free of timing-based side-channels, and many ways in which that endeavour can fail in practice. It is a pessimist view: it highlights why such failures are expected to become more common, and how constant-time coding is, or will soon become, infeasible in all generality.

From compiler optimizations to CPU pipelines and register renaming.

Zen and the Art of Microcode Hacking

https://bughunters.google.com/blog/5424842357473280/zen-and-the-art-of-microcode-hacking [bughunters.google.com]

2025-03-08 06:03

tags: bios cpu exploit hash programming security systems

In this post, we first discuss the background of what microcode is, why microcode patches exist, why the integrity of microcode is important for security, and how AMD attempts to prevent tampering with microcode. Next, we focus on the microcode patch signature validation process and explain in detail the vulnerability present (using CMAC as a hash function). Finally, we discuss how to use some of the tools we’ve released today which can help researchers reproduce and expand on our work (skip to the Zentool section of this blogpost for a “how to” on writing your own microcode).

source: HN

How do modern compilers choose which variables to put in registers?

https://langdev.stackexchange.com/questions/4325/how-do-modern-compilers-choose-which-variables-to-put-in-registers [langdev.stackexchange.com]

2025-02-17 20:59

tags: compiler cpu programming

This is a very broad subject. The problem of deciding how to map a program with arbitrarily many variables onto a fixed set of registers is known as register allocation, and it has been the subject of much research, study, and engineering effort since the very earliest compilers. One of the canonical approaches, graph coloring, was first proposed in 1981. Countless other approaches and variants have been explored since then, and I cannot hope to cover the full breadth of the topic in a single answer.
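To make the graph-coloring framing concrete, here is a rough sketch under simplifying assumptions: each variable has a single live range given as hypothetical (start, end) instruction indices, two variables interfere when their ranges overlap, and registers are handed out greedily. This is a plain greedy coloring, not the 1981 algorithm the answer refers to, and the names build_interference and color are made up for illustration.

```python
def build_interference(live_ranges):
    """Build an interference graph from {name: (start, end)} inclusive ranges."""
    names = list(live_ranges)
    graph = {n: set() for n in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            (s1, e1), (s2, e2) = live_ranges[a], live_ranges[b]
            if s1 <= e2 and s2 <= e1:  # ranges overlap, so a and b interfere
                graph[a].add(b)
                graph[b].add(a)
    return graph


def color(graph, num_registers):
    """Greedily assign register numbers; None marks a variable to spill."""
    assignment = {}
    # Visit the most constrained variables (highest degree) first.
    for var in sorted(graph, key=lambda v: len(graph[v]), reverse=True):
        taken = {assignment[n] for n in graph[var]
                 if assignment.get(n) is not None}
        free = [r for r in range(num_registers) if r not in taken]
        assignment[var] = free[0] if free else None
    return assignment


# Hypothetical live ranges: a, b and c are all live at once, d is not.
ranges = {"a": (0, 3), "b": (1, 4), "c": (2, 5), "d": (6, 7)}
graph = build_interference(ranges)
with_three = color(graph, 3)  # enough registers for everyone
with_two = color(graph, 2)    # one of a, b, c has to be spilled
```

With two registers the three mutually interfering variables cannot all be colored, so one is marked for spilling to memory; choosing which one to spill well is a large part of what production allocators spend their effort on.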

source: HN

AMD: Microcode Signature Verification Vulnerability

https://github.com/google/security-research/security/advisories/GHSA-4xq7-4mgh-gp6w [github.com]

2025-02-03 19:53

tags: bios cpu exploit hash security systems virtualization

This vulnerability allows an adversary with local administrator privileges (ring 0 from outside a VM) to load malicious microcode patches. We have demonstrated the ability to craft arbitrary malicious microcode patches on Zen 1 through Zen 4 CPUs. The vulnerability is that the CPU uses an insecure hash function in the signature validation for microcode updates. This vulnerability could be used by an adversary to compromise confidential computing workloads protected by the newest version of AMD Secure Encrypted Virtualization, SEV-SNP or to compromise Dynamic Root of Trust Measurement.

https://www.amd.com/en/resources/product-security/bulletin/amd-sb-3019.html

source: HN

Don't clobber the frame pointer

https://nsrip.com/posts/clobberfp.html [nsrip.com]

2025-01-05 09:34

tags: bugfix compiler cpu go programming

Recently I diagnosed and fixed two frame pointer unwinding crashes in Go. The root causes were two flavors of the same problem: buggy assembly code clobbered a frame pointer. By “clobbered” I mean wrote over the value without saving & restoring it. One bug clobbered the frame pointer register. The other bug clobbered a frame pointer saved on the stack. This post explains the bugs, talks a bit about ABIs and calling conventions, and makes some recommendations for how to avoid the bugs.

source: L

It’s the Most Indispensable Machine in the World

https://www.wsj.com/tech/ai/asml-euv-machine-lithography-chips-967954d0 [www.wsj.com]

2025-01-04 07:12

tags: article business cpu tech

The piece of equipment that the entire world has come to rely on—and she is specially trained to handle—is called an extreme ultraviolet lithography machine. It’s the machine that produces the most advanced microchips on the planet. It was built with scientific technologies that sound more like science fiction—breakthroughs so improbable that they were once dismissed as impossible. And it has transformed wafers of silicon into the engines of modern life.

She’s one of the engineers assigned to the fabrication plants—or fabs—where ASML customers manufacture their semiconductors. Hall is based here in Boise, the headquarters of Micron Technology, where I hopped into a bunny suit of my own and followed her inside the chip fab. Then I got a rare, behind-the-scenes peek at what might just be the most important machine ever made.

source: DF

The Alder Lake SHLX anomaly

https://tavianator.com/2025/shlx.html [tavianator.com]

2025-01-03 09:54

tags: benchmark cpu perf programming

It seems like SHLX performs differently depending on how the shift count register is initialized. If you use a 64-bit instruction with an immediate, performance is slow. This is also true for instructions like INC (which is similar to ADD with a 1 immediate). On the other hand, 32-bit instructions, and 64-bit instructions without immediates (even no-op ones), make it fast. All of these ways to initialize RCX lead to 1-cycle latency:

source: L

Flipping Pages: An analysis of a new Linux vulnerability in nf_tables and hardened exploitation techniques

https://pwning.tech/nftables/ [pwning.tech]

2024-03-26 23:33

tags: best cpu exploit linux malloc paper programming security systems

In this blogpost I present several novel techniques I used to exploit a 0-day double-free bug in hardened Linux kernels (i.e. KernelCTF mitigation instances) with a 93%-99% success rate. The underlying bug is an input sanitization failure of netfilter verdicts. Hence, the requirements for the exploit are that nf_tables is enabled and unprivileged user namespaces are enabled. The exploit is data-only and performs a kernel-space mirroring attack (KSMA) from userland with the novel Dirty Pagedirectory technique (pagetable confusion), where it is able to link any physical address (and its permissions) to virtual memory addresses by performing just read/writes to userland addresses.

Also: https://github.com/Notselwyn/CVE-2024-1086

source: HN

Reverse engineering standard cell logic in the Intel 386 processor

http://www.righto.com/2024/01/intel-386-standard-cells.html [www.righto.com]

2024-03-13 07:33

tags: article compsci cpu hardware photos tech

The 386 processor (1985) was Intel’s most complex processor at the time, with 285,000 transistors. Intel had scheduled 50 person-years to design the processor, but it was falling behind schedule. The design team decided to automate chunks of the layout, developing “automatic place and route” software. This was a risky decision since if the software couldn’t create a dense enough layout, the chip couldn’t be manufactured. But in the end, the 386 finished ahead of schedule, an almost unheard-of accomplishment.

In this article, I take a close look at the “standard cells” used in the 386, the logic blocks that were arranged and wired by software. Reverse-engineering these circuits shows how standard cells implement logic gates, latches, and other components with CMOS transistors. Modern integrated circuits still use standard cells, much smaller now, of course, but built from the same principles.

An improved chkstk function on Windows

https://nullprogram.com/blog/2024/02/05/ [nullprogram.com]

2024-02-06 23:47

tags: compiler cpu programming windows

If you’ve spent much time developing with Mingw-w64 you’ve likely seen the symbol ___chkstk_ms, perhaps in an error message. It’s a little piece of runtime provided by GCC via libgcc which ensures enough of the stack is committed for the caller’s stack frame. The “function” uses a custom ABI and is implemented in assembly. So is the subject of this article, a slightly improved implementation soon to be included in w64devkit as libchkstk (-lchkstk).

source: L

Operation Triangulation: What You Get When Attack iPhones of Researchers

https://securelist.com/operation-triangulation-the-last-hardware-mystery/111669/ [securelist.com]

2023-12-27 19:52

tags: best cpu exploit investigation iphone security

This presentation was also the first time we had publicly disclosed the details of all exploits and vulnerabilities that were used in the attack. We discover and analyze new exploits and attacks using these on a daily basis, and we have discovered and reported more than thirty in-the-wild zero-days in Adobe, Apple, Google, and Microsoft products, but this is definitely the most sophisticated attack chain we have ever seen.

source: HN

Zenbleed

https://lock.cmpxchg8b.com/zenbleed.html [lock.cmpxchg8b.com]

2023-07-25 01:47

tags: cpu exploit programming security sidechannel systems

What should happen if the processor speculatively executed a vzeroupper, but then discovers that there was a branch misprediction? Well, we will have to revert that operation and put things back the way they were… maybe we can just unset that z-bit?

If we return to the analogy of malloc and free, you can see that it can’t be that simple - that would be like calling free() on a pointer, and then changing your mind!

That would be a use-after-free vulnerability, but there is no such thing as a use-after-free in a CPU… or is there?

source: L

The complex history of the Intel i960 RISC processor

http://www.righto.com/2023/07/the-complex-history-of-intel-i960-risc.html [www.righto.com]

2023-07-02 01:13

tags: cpu hardware retro

The Intel i960 was a remarkable 32-bit processor of the 1990s with a confusing set of versions. Although it is now mostly forgotten (outside the many people who used it as an embedded processor), it has a complex history. It had a shot at being Intel’s flagship processor until x86 overshadowed it. Later, it was the world’s best-selling RISC processor. One variant was a 33-bit processor with a decidedly non-RISC object-oriented instruction set; it became a military standard and was used in the F-22 fighter plane. Another version powered Intel’s short-lived Unix servers. In this blog post, I’ll take a look at the history of the i960, explain its different variants, and examine silicon dies. This chip has a lot of mythology and confusion (especially on Wikipedia), so I’ll try to clear things up.

source: HN

Understanding DeepMind's Sorting Algorithm

https://justine.lol/sorting/ [justine.lol]

2023-06-12 21:55

tags: compsci cpu performance sorting

A few days ago, DeepMind published a blog post talking about a paper they wrote, where they discovered tinier kernels for sorting algorithms. They did this by taking their deep learning wisdom, which they gained by building AlphaGo, and applying it to the discipline of superoptimization. That piqued my interest, since as a C library author, I’m always looking for opportunities to curate the best stuff. In some ways that’s really the whole purpose of the C library. There are so many functions that we as programmers take for granted, which are the finished product of decades of research, distilled into plain and portable code.

DeepMind earned a fair amount of well-deserved attention for this discovery, but unfortunately they could have done a much better job explaining it.

https://www.deepmind.com/blog/alphadev-discovers-faster-sorting-algorithms

source: HN

Epyc 7002 CPUs may hang after 1042 days of uptime

https://old.reddit.com/r/sysadmin/comments/13wmowy/psa_epyc_7002_cpus_may_hang_after_1042_days_of/ [old.reddit.com]

2023-06-01 18:27

tags: admin cpu hardware

Note that your server will almost definitely hang, requiring a physical (or IPMI) reboot, because no interrupts, including NMIs, can be delivered to the zombie cores: this means no scheduler, no IPIs, nothing will work.

source: HN

Synthetic Memory Protections - An update on ROP mitigations

https://www.openbsd.org/papers/csw2023.pdf [www.openbsd.org]

2023-03-25 19:35

tags: cpu defense malloc openbsd pdf security slides systems

ROP methods have become increasingly sophisticated
But we can identify system behaviours which only ROP code requires
We can contrast this to what Regular Control Flow code needs
And then, find behaviours to block

source: HN