inks

source: danluu

What are the most important statistical ideas of the past 50 years?

http://www.stat.columbia.edu/~gelman/research/unpublished/stat50.pdf [www.stat.columbia.edu]

2021-03-12 03:30

We argue that the most important statistical ideas of the past half century are: counterfactual causal inference, bootstrapping and simulation-based inference, overparameterized models and regularization, multilevel models, generic computation algorithms, adaptive decision analysis, robust inference, and exploratory data analysis. We discuss common features of these ideas, how they relate to modern computing and big data, and how they might be developed and extended in future decades. The goal of this article is to provoke thought and discussion regarding the larger themes of research in statistics and data science.

source: danluu

Micro-Optimizing .tar.gz Archives by Changing File Order

https://justinblank.com/experiments/optimizingtar.html [justinblank.com]

2021-01-20 06:50

tags: benchmark compression perf storage

A few weeks ago, I was doing something with a sizeable .tar.gz file, and wondered how the order of files affected the process. I’m not that knowledgeable about compression, but I know that gzip uses a sliding window in which it looks for opportunities to compress repeating chunks of text. If you give it highly repetitive text, it does well; if you give it random data, it will probably give you a bigger file than when you started. So reordering files seems like it could matter.
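
A minimal sketch of the effect described above, not the linked post's experiment. It assumes synthetic data: two pairs of identical 20 KB random files (the names `a1`, `a2`, `b1`, `b2` and the helper `targz_size` are made up for illustration). DEFLATE, the algorithm behind gzip, can only refer back 32 KB, so a duplicate compresses well only when its twin sits close by in the archive.

```python
import gzip
import io
import random
import tarfile

def targz_size(files, order):
    """Return the size in bytes of a gzip-compressed tar holding `files` in `order`."""
    raw = io.BytesIO()
    with tarfile.open(fileobj=raw, mode="w") as tar:
        for name in order:
            data = files[name]
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return len(gzip.compress(raw.getvalue()))

# Hypothetical test data: random bytes are incompressible on their own,
# so any saving must come from matching a file against its duplicate.
rng = random.Random(0)
blob_a = bytes(rng.getrandbits(8) for _ in range(20 * 1024))
blob_b = bytes(rng.getrandbits(8) for _ in range(20 * 1024))
files = {"a1": blob_a, "a2": blob_a, "b1": blob_b, "b2": blob_b}

# Duplicates adjacent: each twin is within the 32 KB window of the other.
grouped = targz_size(files, ["a1", "a2", "b1", "b2"])
# Duplicates separated by another 20 KB file: the twin has left the window.
interleaved = targz_size(files, ["a1", "b1", "a2", "b2"])

print("grouped:    ", grouped, "bytes")
print("interleaved:", interleaved, "bytes")
```

With real files the difference is usually far less dramatic than with this contrived data, which is consistent with the "micro-optimizing" in the title.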

source: danluu

Hunting a Linux kernel bug

https://blog.twitter.com/engineering/en_us/topics/open-source/2020/hunting-a-linux-kernel-bug.html [blog.twitter.com]

2020-04-29 19:49

tags: bugfix linux networking

Earlier last year, we identified a firewall misconfiguration which accidentally dropped most network traffic. We expected resetting the firewall configuration to fix the issue, but resetting the firewall configuration exposed a kernel bug!

source: danluu

95%-ile isn't that good

https://danluu.com/p95-skill/ [danluu.com]

2020-02-12 00:12

tags: development essay life

Reaching 95%-ile isn’t very impressive because it’s not that hard to do. I think this is one of my most ridiculable ideas. It doesn’t help that, when stated nakedly, that sounds elitist. But I think it’s just the opposite: most people can become (relatively) good at most things.

There are several sections here. Every time I thought I was nearing the end, more content showed up.

source: danluu

Deconstruct files

https://danluu.com/deconstruct-files/ [danluu.com]

2019-07-13 16:55

tags: best factcheck fs hardware linux programming storage systems turtles unix

Let’s talk about files! Most developers seem to think that files are easy.

In this talk, we’re going to look at how file systems differ from each other and other issues we might encounter when writing to files. We’re going to look at the file “stack”, starting at the top with the file API, moving down to the filesystem, and then moving down to disk.

source: danluu

Modifying reassociate for improved CSE: fairly large perf gains

http://lists.llvm.org/pipermail/llvm-dev/2017-October/118476.html [lists.llvm.org]

2018-07-25 16:48

tags: c compiler perf programming

Wed Oct 25 11:36:54 PDT 2017

When playing around with reassociate I noticed a seemingly obvious optimization that was not getting done anywhere in llvm… nor in gcc or ICC.

source: danluu

Some bounds checks are elided by Apple's compiler and possibly others

https://github.com/capnproto/capnproto/blob/master/security-advisories/2017-04-17-0-apple-clang-elides-bounds-check.md [github.com]

2018-03-22 16:45

tags: bugfix c compiler programming security standard

Although triggered by a compiler optimization, this is a bug in Cap’n Proto, not the compiler.

To most observers, this code would appear to be correct. However, as it turns out, pointer arithmetic that overflows is undefined behavior under the C standard. As a result, the compiler is allowed to assume that the addition on the first line never overflows.

source: danluu

C with ABC!

http://www.cs.cmu.edu/~tom7/abc/paper.txt [www.cs.cmu.edu]

2018-02-07 03:39

tags: c compiler cpu format paper programming text

In this paper, I describe a new compiler for the C89 programming language.

A paper and a compiler!

source: danluu

An Adaptive Packed-Memory Array

https://www3.cs.stonybrook.edu/~bender/newpub/BenderHu07-TODS.pdf [www3.cs.stonybrook.edu]

2018-01-23 14:11

tags: compsci paper pdf perf programming

The packed-memory array (PMA) is a data structure that maintains a dynamic set of N elements in sorted order in a Θ(N)-sized array. The idea is to intersperse Θ(N) empty spaces or gaps among the elements so that only a small number of elements need to be shifted around on an insert or delete. Because the elements are stored physically in sorted order in memory or on disk, the PMA can be used to support extremely efficient range queries.

source: danluu

linux-insides

https://0xax.gitbooks.io/linux-insides/ [0xax.gitbooks.io]

2017-12-30 18:46

tags: book linux systems

A book-in-progress about the linux kernel and its insides. The goal is simple - to share my modest knowledge about the insides of the linux kernel and help people who are interested in linux kernel insides, and other low-level subject matter.

source: danluu

Timers in Google Home!

https://twitter.com/danluu/status/942049082767495168 [twitter.com]

2017-12-17 22:08

tags: ioshit life tweet

Ok Google, set a timer for ninety-nine years

timer for *minus* one-thousand-nine-hundred-thirty-nine weeks, two days, six hours, twenty-eight minutes and sixteen seconds starting now

source: danluu

Is there data on the quality of management decisions?

https://danluu.com/bad-decisions/ [danluu.com]

2017-11-22 19:06

tags: ideas math sports valley

Unfortunately, arguments like this are difficult to settle because, even in retrospect, it’s usually not possible to get enough information to determine the precise “value” of a decision. Even in cases where the decision led to an unambiguous success or failure, there are so many factors that led to the result that it’s difficult to figure out precisely why something happened.

Are we right or wrong? Tune in next decade to see what’s changed.

source: danluu

Musings on Kotlin Ranges

http://blog.danlew.net/2017/06/05/musings-on-kotlin-ranges/ [blog.danlew.net]

2017-11-16 00:41

tags: intro-programming java

Here are a few interesting aspects of Kotlin ranges, some of which I’ve found to be less-than-intuitive.

source: danluu

Filesystem error handling

https://danluu.com/filesystem-errors/ [danluu.com]

2017-10-23 19:59

tags: fs linux paper storage systems

Prabhakaran et al. injected errors at the block device level (just underneath the filesystem) and found that ext3, reiserfs, ntfs, and jfs mostly handled read errors reasonably but ext3, ntfs, and jfs mostly ignored write errors. While the paper is interesting, someone installing Linux on a system today is much more likely to use ext4 than any of the now-dated filesystems tested by Prabhakaran et al. We’ll try to reproduce some of the basic results from the paper on more modern filesystems like ext4 and btrfs, some legacy filesystems like exfat, ext3, and jfs, as well as on overlayfs.

source: danluu

Strange Hash Instances in Ruby

https://kate.io/blog/strange-hash-instances-in-ruby/ [kate.io]

2017-10-02 01:26

tags: hash programming ruby

Everything can be patched, except the things that can’t.

source: danluu

A history of branch prediction from 1500000 BC to 1995

https://danluu.com/branch-prediction/ [danluu.com]

2017-08-23 18:29

tags: cpu hardware perf programming retro

We’ll start with the most naive things someone might do and work our way up to something better.

source: danluu

Why does Sattolo's algorithm produce a permutation with exactly one cycle?

https://danluu.com/sattolo/ [danluu.com]

2017-08-12 04:13

tags: compsci math

I recently had a problem where part of the solution was to do a series of pointer accesses that would walk around a chunk of memory in pseudo-random order. Sattolo’s algorithm provides a solution to this because it produces a permutation of a list with exactly one cycle, which guarantees that we will reach every element of the list even though we’re traversing it in random order.
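
For reference, a short sketch of Sattolo's algorithm as it is commonly stated. The function names `sattolo` and `cycle_length` are chosen here for illustration and are not taken from the linked post. The only difference from a Fisher-Yates shuffle is that the swap partner `j` is drawn from strictly below `i`, so an element never swaps with itself.

```python
import random

def sattolo(n, rng=random):
    """Return a random permutation of range(n) that forms a single cycle."""
    perm = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)  # 0 <= j < i, so j is never equal to i
        perm[i], perm[j] = perm[j], perm[i]
    return perm

def cycle_length(perm, start=0):
    """Follow perm from `start` and count the steps needed to return to it."""
    steps, pos = 1, perm[start]
    while pos != start:
        pos = perm[pos]
        steps += 1
    return steps

perm = sattolo(8)
# Following perm[0], perm[perm[0]], ... visits all 8 slots before repeating.
print(perm, cycle_length(perm))
```

Treating `perm[i]` as "the next index to visit after `i`" gives the pointer-chasing walk mentioned above: starting anywhere, the walk touches every element once before returning to its start.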

source: danluu

Book review: "Working Effectively with Legacy Code" by Michael C. Feathers

http://eli.thegreenplace.net/2017/book-review-working-effectively-with-legacy-code-by-michael-c-feathers/ [eli.thegreenplace.net]

2017-07-26 21:14

tags: book development

The hacks are a good match to the foe - they’re about as awful as the code itself, so young and innocent developers may find themselves (rightfully) horrified.

source: danluu

Terminal and shell performance

https://danluu.com/term-latency/ [danluu.com]

2017-07-18 19:50

tags: benchmark development perf swtools tty ux

Most terminals have enough latency that the user experience could be improved if the terminals concentrated more on latency and less on other features or other aspects of performance. However, when I search for terminal benchmarks, I find that terminal authors, if they benchmark anything, benchmark the speed of sinking stdout or memory usage at startup. This is unfortunate because most “low performance” terminals can already sink stdout many orders of magnitude faster than humans can keep up with, so further optimizing stdout sink speed has a relatively small impact on actual user experience for most users.

source: danluu

Writing a SAT Solver

http://andrew.gibiansky.com/blog/verification/writing-a-sat-solver/ [andrew.gibiansky.com]

2017-06-21 20:29

tags: compsci haskell programming

In this post, we’ll look at how to teach computers to solve puzzles. Specifically, we’ll look at a simple puzzle that can be expressed as a boolean constraint satisfaction problem, and we’ll write a simple constraint solver (a SAT solver) and mention how our algorithm, when augmented with a few optimizations, is used in modern SAT solvers.
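
A rough sketch of what a simple backtracking SAT solver can look like. This is not the post's Haskell code. It assumes a formula in conjunctive normal form, encoded as a list of clauses where each clause is a list of nonzero integers (`3` means variable 3 is true, `-3` means it is false); the names `simplify`, `solve`, and `satisfies` are made up for this example.

```python
def simplify(clauses, literal):
    """Assume `literal` is true: drop satisfied clauses, strip its negation elsewhere."""
    result = []
    for clause in clauses:
        if literal in clause:
            continue  # clause already satisfied
        result.append([lit for lit in clause if lit != -literal])
    return result

def solve(clauses, assignment=None):
    """Return a dict {variable: bool} satisfying all clauses, or None if none exists."""
    assignment = dict(assignment or {})
    if not clauses:
        return assignment  # nothing left to satisfy
    if any(len(clause) == 0 for clause in clauses):
        return None  # an empty clause cannot be satisfied
    literal = clauses[0][0]
    # Try the literal as true, then as false, backtracking on failure.
    for choice in (literal, -literal):
        extended = {**assignment, abs(choice): choice > 0}
        result = solve(simplify(clauses, choice), extended)
        if result is not None:
            return result
    return None

def satisfies(clauses, model):
    """Check that every clause has at least one literal made true by `model`."""
    return all(
        any(model.get(abs(lit)) == (lit > 0) for lit in clause)
        for clause in clauses
    )

formula = [[1, 2], [-1, 3], [-2, -3]]  # (x1 or x2) and (not x1 or x3) and (not x2 or not x3)
model = solve(formula)
print(model, satisfies(formula, model))
print(solve([[1], [-1]]))  # x1 and not x1 has no solution
```

Variables that never needed a decision are left out of the returned model and can take either value. Production solvers layer techniques such as unit propagation and clause learning on top of this basic search.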

source: danluu