inks

tag: benchmark

The Alder Lake SHLX anomaly

https://tavianator.com/2025/shlx.html [tavianator.com]

2025-01-03 09:54

It seems like SHLX performs differently depending on how the shift count register is initialized. If you use a 64-bit instruction with an immediate, performance is slow. This is also true for instructions like INC (which is similar to ADD with a 1 immediate). On the other hand, 32-bit instructions, and 64-bit instructions without immediates (even no-op ones), make it fast; the linked post lists several such ways to initialize RCX that all lead to 1-cycle latency.

source: L

The Biggest Scandal In Speed Typing History

https://www.youtube.com/watch?v=maCHHSussS4 [www.youtube.com]

2023-06-27 02:30

tags: benchmark factcheck hoipolloi investigation retro tty video

Barbara Blackburn is often cited as the fastest typist in history. She even appears in the Guinness Book of World Records! She must be legit, right? Well, maybe not. I was supposed to make a video about the new typing speed world record, and instead got pulled into a Barbara Blackburn rabbit hole that I can’t seem to escape. TL;DR: She’s not that fast.

Block Profiling in Go

https://github.com/felixge/go-profiler-notes/blob/main/block.md [github.com]

2021-02-10 01:46

tags: benchmark development go perf

The block profile in Go lets you analyze how much time your program spends waiting on blocking operations; the linked notes list the operations covered.

source: HN

Micro-Optimizing .tar.gz Archives by Changing File Order

https://justinblank.com/experiments/optimizingtar.html [justinblank.com]

2021-01-20 06:50

tags: benchmark compression perf storage

A few weeks ago, I was doing something with a sizeable .tar.gz file, and wondered how the order of files affected the process. I’m not that knowledgeable about compression, but I know that gzip uses a sliding window in which it looks for opportunities to compress repeating chunks of text. If you give it highly repetitive text, it does well; if you give it random data, it will probably give you a bigger file than when you started. So reordering files seems like it could matter.
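The sliding-window intuition can be checked with a small in-memory experiment. The sketch below is not the linked post's method: it assumes a contrived data set (two identical 24 KiB random files plus a 64 KiB random filler) and relies on DEFLATE's 32 KiB window. All names (entry, tarGzSize, compareOrders) are invented for the illustration.

```go
package main

import (
	"archive/tar"
	"bytes"
	"compress/gzip"
	"fmt"
	"math/rand"
)

// entry is a hypothetical in-memory file used only for this experiment.
type entry struct {
	name string
	data []byte
}

// randomBytes returns n pseudo-random, effectively incompressible bytes.
func randomBytes(r *rand.Rand, n int) []byte {
	b := make([]byte, n)
	for i := range b {
		b[i] = byte(r.Intn(256))
	}
	return b
}

// tarGzSize writes the entries to an in-memory .tar.gz in the given order
// and returns the compressed size in bytes.
func tarGzSize(entries []entry) (int, error) {
	var buf bytes.Buffer
	gz := gzip.NewWriter(&buf)
	tw := tar.NewWriter(gz)
	for _, e := range entries {
		hdr := &tar.Header{
			Typeflag: tar.TypeReg,
			Name:     e.name,
			Mode:     0o644,
			Size:     int64(len(e.data)),
		}
		if err := tw.WriteHeader(hdr); err != nil {
			return 0, err
		}
		if _, err := tw.Write(e.data); err != nil {
			return 0, err
		}
	}
	if err := tw.Close(); err != nil {
		return 0, err
	}
	if err := gz.Close(); err != nil {
		return 0, err
	}
	return buf.Len(), nil
}

// compareOrders archives the same three files in two different orders and
// returns both compressed sizes.
func compareOrders() (adjacent, separated int, err error) {
	r := rand.New(rand.NewSource(1))
	dup := randomBytes(r, 24<<10)    // 24 KiB, stored twice under two names
	filler := randomBytes(r, 64<<10) // 64 KiB, larger than the 32 KiB window
	a := entry{"a.bin", dup}
	b := entry{"b.bin", dup}
	f := entry{"filler.bin", filler}

	// Duplicates next to each other: the second copy is within the window.
	adjacent, err = tarGzSize([]entry{a, b, f})
	if err != nil {
		return 0, 0, err
	}
	// Duplicates split by the filler: the first copy has left the window.
	separated, err = tarGzSize([]entry{a, f, b})
	return adjacent, separated, err
}

func main() {
	adjacent, separated, err := compareOrders()
	if err != nil {
		panic(err)
	}
	fmt.Printf("duplicates adjacent:  %d bytes\n", adjacent)
	fmt.Printf("duplicates separated: %d bytes\n", separated)
}
```

With the duplicates adjacent, the second copy can be encoded as back-references to the first, so that archive should come out noticeably smaller. Real file sets are far less clear-cut than this setup.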

source: danluu

An Obscure American Automaker Now Has the World’s Fastest Car

https://www.bloomberg.com/news/articles/2020-10-19/ssc-tuatara-is-world-s-fastest-production-car-new-top-speed-record [www.bloomberg.com]

2020-10-20 18:50

tags: benchmark cars

SSC’s Tuatara clocked a record 316.11 mph (508.73 km/h) on a dusty desert highway outside Las Vegas.

https://www.youtube.com/embed/N22JfNHiC1k

source: K

AVIF has landed

https://jakearchibald.com/2020/avif-has-landed/ [jakearchibald.com]

2020-09-09 20:52

tags: benchmark graphics web

AVIF is a new image format derived from the keyframes of AV1 video. It’s a royalty-free format, and it’s already supported in Chrome 85 on desktop. Android support will be added soon, Firefox is working on an implementation, and although it took Safari 10 years to add WebP support, I don’t think we’ll see the same delay here, as Apple are a member of the group that created AV1.

Roughly speaking, at an acceptable quality, the WebP is almost half the size of JPEG, and AVIF is under half the size of WebP. I find it incredible that AVIF can do a good job of the image in just 18 kB.

source: L

Is WebP really better than JPEG?

https://siipo.la/blog/is-webp-really-better-than-jpeg [siipo.la]

2020-06-23 16:39

tags: benchmark graphics web

I think Google’s result of 25-34% smaller files is mostly caused by the fact that they compared their WebP encoder to the JPEG reference implementation, Independent JPEG Group’s cjpeg, not Mozilla’s improved MozJPEG encoder. I decided to run some tests to see how cjpeg, MozJPEG and WebP compare. I also tested the new AVIF format, based on the open AV1 video codec. AVIF support is already in Firefox behind a flag and should be coming soon to Chrome if this ticket is to be believed.

source: HN

Ice Lake Store Elimination

https://travisdowns.github.io/blog/2020/05/18/icelake-zero-opt.html [travisdowns.github.io]

2020-05-18 20:25

tags: benchmark cpu investigation perf systems

We have found that the store elimination optimization originally uncovered on Skylake client is still present in Ice Lake and is roughly twice as effective in our fill benchmarks. Elimination of 96% of L2 writebacks (to L3) and L3 writebacks (to RAM) was observed, compared to 50% to 60% on Skylake. We found speedups of up to 45% in the L3 region and speedups of about 25% in RAM, compared to improvements of less than 20% in Skylake.

But there’s a lot of investigation work to get there.

source: HN

ZFS versus RAID: Eight Ironwolf disks, two filesystems, one winner

https://arstechnica.com/gadgets/2020/05/zfs-versus-raid-eight-ironwolf-disks-two-filesystems-one-winner/ [arstechnica.com]

2020-05-18 19:32

tags: admin benchmark filesystem hardware storage

We exhaustively tested ZFS and RAID performance on our Storage Hot Rod server.

source: ars

Elixir and Postgres: A Rarely Mentioned Problem

https://blog.soykaf.com/post/postgresql-elixir-troubles/ [blog.soykaf.com]

2020-02-19 06:02

tags: benchmark database perf sql

Last time, we talked about the magic trick to make your full text searches go fast. This time, I’ll tell you about another performance issue I encountered that probably also affects your performance, at least if you are using Ecto and PostgreSQL.

Gathering Intel on Intel AVX-512 Transitions

https://travisdowns.github.io/blog/2020/01/17/avxfreq1.html [travisdowns.github.io]

2020-01-17 22:19

tags: benchmark cpu investigation perf programming

This is a post about AVX and AVX-512 related frequency scaling. Now, something more than nothing has been written about this already, including cautionary tales of performance loss and some broad guidelines, so do we really need to add to the pile?

Perhaps not, but I’m doing it anyway. My angle is a lower level look, almost microscopic really, at the specific transition behaviors. One would hope that this will lead to specific, quantitative advice about exactly when various instruction types are likely to pay off, but (spoiler) I didn’t make it there in this post.

source: HN

Clang format tanks performance

https://travisdowns.github.io/blog/2019/11/19/toupper.html [travisdowns.github.io]

2019-11-19 22:54

tags: benchmark c cxx perf programming turtles

Let’s benchmark toupper implementations.

Actually, I don’t really care about toupper much at all, but I was writing a different post and needed a peg to hang my narrative hat on, and hey toupper seems like a nice harmless benchmark. Despite my effort to choose something which should be totally straightforward and not sidetrack me, this weird thing popped out.

source: L

An analysis of performance evolution of Linux’s core operations

https://blog.acolyer.org/2019/11/04/an-analysis-of-performance-evolution-of-linuxs-core-operations/ [blog.acolyer.org]

2019-11-04 21:40

tags: benchmark development linux paper perf systems

When you get into the details I found it hard to come away with any strongly actionable takeaways though. Perhaps the most interesting lesson/reminder is this: it takes a lot of effort to tune a Linux kernel. For example:

“Red Hat and Suse normally required 6-18 months to optimise the performance of an upstream Linux kernel before it can be released as an enterprise distribution”, and
“Google’s data center kernel is carefully performance tuned for their workloads. This task is carried out by a team of over 100 engineers, and for each new kernel, the effort can also take 6-18 months.”

Real-world measurements of structured-lattices and supersingular isogenies in TLS

https://www.imperialviolet.org/2019/10/30/pqsivssl.html [www.imperialviolet.org]

2019-10-30 21:45

tags: benchmark browser crypto networking quantum security

This is the third in a series of posts about running experiments on post-quantum confidentiality in TLS. The first detailed experiments that measured the estimated network overhead of three families of post-quantum key exchanges. The second detailed the choices behind a specific structured-lattice scheme. This one gives details of a full, end-to-end measurement of that scheme and a supersingular isogeny scheme, SIKE/p434. This was done in collaboration with Cloudflare, who integrated Microsoft’s SIKE code into BoringSSL for the tests, and ran the server-side of the experiment.

Because optimised assembly implementations are labour-intensive to write, they were only available/written for AArch64 and x86-64. Because SIKE is computationally expensive, it wasn’t feasible to enable it without an assembly implementation, thus only AArch64 and x86-64 clients were included in the experiment and ARMv7 and x86 clients did not contribute to the results even if they were assigned to one of the experiment groups.

Also: https://blog.cloudflare.com/the-tls-post-quantum-experiment/

source: green

Making the Tokio scheduler 10x faster

https://tokio.rs/blog/2019-10-scheduler/ [tokio.rs]

2019-10-14 16:58

tags: benchmark concurrency perf programming rust systems update

We’ve been hard at work on the next major revision of Tokio, Rust’s asynchronous runtime. Today, a complete rewrite of the scheduler has been submitted as a pull request. The result is huge performance and latency improvements. Some benchmarks saw a 10x speed up! It is always unclear how much these kinds of improvements impact “full stack” use cases, so we’ve also tested how these scheduler improvements impacted use cases like Hyper and Tonic (spoiler: it’s really good).

In preparation for working on the new scheduler, I spent time searching for resources on scheduler implementations. Besides existing implementations, I did not find much. I also found the source of existing implementations difficult to navigate. To remedy this, I tried to keep Tokio’s new scheduler implementation as clean as possible. I also am writing this detailed article on implementing the scheduler in hope that others in similar positions find it useful.

The article starts with a high level overview of scheduler design, including work-stealing schedulers. It then gets into the details of specific optimizations made in the new Tokio scheduler.

source: HN

PyPy's new JSON parser

https://morepypy.blogspot.com/2019/10/pypys-new-json-parser.html [morepypy.blogspot.com]

2019-10-08 17:06

tags: benchmark jit perf programming python

In the last year or two I have worked on and off on making PyPy’s JSON faster, particularly when parsing large JSON files. In this post I am going to document those techniques and measure their performance impact.

source: HN

Benchmarking Fibers, Threads and Processes

http://engineering.appfolio.com/appfolio-engineering/2019/9/13/benchmarking-fibers-threads-and-processes [engineering.appfolio.com]

2019-09-19 19:37

tags: benchmark concurrency perf programming ruby

A while back, I set out to look at Fiber performance and how it’s improved in recent Ruby versions. After all, concurrency is one of the three pillars of Ruby 3x3! Also, there have been some major speedups in Ruby’s Fiber class by Samuel Williams.

It’s not hard to write a microbenchmark for something like Fiber.yield. But it’s harder, and more interesting, to write a benchmark that’s useful and representative.

source: L

Go compiler intrinsics

https://dave.cheney.net/2019/08/20/go-compiler-intrinsics [dave.cheney.net]

2019-08-22 05:33

tags: benchmark cpu go perf programming

Over the years there have been various proposals for an inline assembly syntax similar to gcc’s asm(...) directive. None have been accepted by the Go team. Instead, Go has added intrinsic functions.

An intrinsic function is Go code written in regular Go. These functions are known to the Go compiler, which contains replacements that it can substitute during compilation.
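A small sketch of the idea, using population count as the example: math/bits.OnesCount64 is an ordinary Go function that the compiler may replace with a single hardware instruction on architectures that have one, while the hand-written loop next to it is compiled as written. Whether the substitution actually happens depends on the Go version and target architecture, so treat that part as an assumption. The names popcountLoop and popcountIntrinsic are invented for this example.

```go
package main

import (
	"fmt"
	"math/bits"
)

// popcountLoop counts set bits by clearing the lowest set bit on each
// iteration. It is plain Go that the compiler treats like any other loop.
func popcountLoop(x uint64) int {
	n := 0
	for x != 0 {
		x &= x - 1 // clear the lowest set bit
		n++
	}
	return n
}

// popcountIntrinsic calls the standard library function. The compiler may
// substitute a native population-count instruction here (assumption: this
// depends on the toolchain and the target CPU).
func popcountIntrinsic(x uint64) int {
	return bits.OnesCount64(x)
}

func main() {
	for _, x := range []uint64{0, 1, 0xFF, 0xDEADBEEF} {
		fmt.Printf("%#x: loop=%d intrinsic=%d\n", x, popcountLoop(x), popcountIntrinsic(x))
	}
}
```

Both functions return the same counts. Any difference would show up only in the generated code, which can be inspected with go build -gcflags=-S.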

Upgrading from an Intel Core i7-2600K: Testing Sandy Bridge in 2019

https://www.anandtech.com/show/14043/upgrading-from-an-intel-core-i7-2600k-testing-sandy-bridge-in-2019 [www.anandtech.com]

2019-05-11 01:00

tags: benchmark cpu hardware perf retro

One of the most popular processors of the last decade has been the Intel Core i7-2600K. The design was revolutionary, as it offered a significant jump in single core performance and efficiency, and the top line processor was very overclockable. With the next few generations of processors from Intel being less exciting, or not giving users reasons to upgrade, the phrase ‘I’ll stay with my 2600K’ became ubiquitous on forums, and is even used today. For this review, we dusted off our box of old CPUs and put it in for a run through our 2019 benchmarks, both at stock and overclocked, to see if it is still a mainstream champion.

source: HN

Who has the fastest website in F1?

https://jakearchibald.com/2019/f1-perf/ [jakearchibald.com]

2019-04-03 03:01

tags: benchmark development html perf web

So, I’m going to make my predictions the only way I know how: by comparing the performance of their websites. That’ll work, right? If anything, it’ll be interesting to compare 10 sites that have been recently updated, perhaps even rebuilt, and see what the common issues are. I’ll also cover the tools and techniques I use to test web performance.

source: HN