What are the most important statistical ideas of the past 50 years?
http://www.stat.columbia.edu/~gelman/research/unpublished/stat50.pdf [www.stat.columbia.edu]
2021-03-12 03:30
tags:
ideas
math
paper
pdf
science
We argue that the most important statistical ideas of the past half century are: counterfactual causal inference, bootstrapping and simulation-based inference, overparameterized models and regularization, multilevel models, generic computation algorithms, adaptive decision analysis, robust inference, and exploratory data analysis. We discuss common features of these ideas, how they relate to modern computing and big data, and how they might be developed and extended in future decades. The goal of this article is to provoke thought and discussion regarding the larger themes of research in statistics and data science.
source: danluu
Micro-Optimizing .tar.gz Archives by Changing File Order
https://justinblank.com/experiments/optimizingtar.html [justinblank.com]
2021-01-20 06:50
tags:
benchmark
compression
perf
storage
A few weeks ago, I was doing something with a sizeable .tar.gz file, and wondered how the order of files affected the process. I’m not that knowledgeable about compression, but I know that gzip uses a sliding window in which it looks for opportunities to compress repeating chunks of text. If you give it highly repetitive text, it does well; if you give it random data, it will probably give you a bigger file than when you started. So reordering files seems like it could matter.
source: danluu
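A minimal sketch of the file-order effect described in the entry above, not the linked post's own experiment. It assumes synthetic data: eight 20 KiB random blobs, each stored twice under hypothetical file names. When a copy sits right after its original it falls inside gzip's 32 KiB window and compresses to almost nothing; when all copies come after all originals, the window has already moved on.

```python
import io
import random
import tarfile


def targz_size(files):
    """Return the size in bytes of an in-memory .tar.gz holding `files` in order."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in files:
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return len(buf.getvalue())


# Synthetic, seeded test data (an assumption for illustration only).
rng = random.Random(0)
blobs = [bytes(rng.getrandbits(8) for _ in range(20 * 1024)) for _ in range(8)]
originals = [(f"file{i}.bin", blob) for i, blob in enumerate(blobs)]
copies = [(f"file{i}_copy.bin", blob) for i, blob in enumerate(blobs)]

# Order A: each copy directly follows its original (within the 32 KiB window).
grouped = [entry for pair in zip(originals, copies) for entry in pair]
# Order B: all originals first, then all copies (far outside the window).
separated = originals + copies

grouped_size = targz_size(grouped)
separated_size = targz_size(separated)
print("copies adjacent: ", grouped_size, "bytes")
print("copies far apart:", separated_size, "bytes")
```

Real archives are rarely this clear-cut; the point is only that the same set of files can produce noticeably different archive sizes depending on order.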
Hunting a Linux kernel bug
https://blog.twitter.com/engineering/en_us/topics/open-source/2020/hunting-a-linux-kernel-bug.html [blog.twitter.com]
2020-04-29 19:49
tags:
bugfix
linux
networking
Earlier last year, we identified a firewall misconfiguration which accidentally dropped most network traffic. We expected resetting the firewall configuration to fix the issue, but resetting the firewall configuration exposed a kernel bug!
source: danluu
95%-ile isn't that good
https://danluu.com/p95-skill/ [danluu.com]
2020-02-12 00:12
tags:
development
essay
life
Reaching 95%-ile isn’t very impressive because it’s not that hard to do. I think this is one of my most ridiculable ideas. It doesn’t help that, when stated nakedly, that sounds elitist. But I think it’s just the opposite: most people can become (relatively) good at most things.
There are several sections here. Every time I thought I was nearing the end, more content showed up.
source: danluu
Deconstruct files
https://danluu.com/deconstruct-files/ [danluu.com]
2019-07-13 16:55
tags:
best
factcheck
fs
hardware
linux
programming
storage
systems
turtles
unix
Let’s talk about files! Most developers seem to think that files are easy.
In this talk, we’re going to look at how file systems differ from each other and other issues we might encounter when writing to files. We’re going to look at the file “stack”, starting at the top with the file API, moving down to the filesystem, and then moving down to disk.
source: danluu
Modifying reassociate for improved CSE: fairly large perf gains
http://lists.llvm.org/pipermail/llvm-dev/2017-October/118476.html [lists.llvm.org]
2018-07-25 16:48
tags:
c
compiler
perf
programming
Wed Oct 25 11:36:54 PDT 2017
When playing around with reassociate I noticed a seemingly obvious optimization that was not getting done anywhere in llvm… nor in gcc or ICC.
source: danluu
Some bounds checks are elided by Apple's compiler and possibly others
https://github.com/capnproto/capnproto/blob/master/security-advisories/2017-04-17-0-apple-clang-elides-bounds-check.md [github.com]
2018-03-22 16:45
tags:
bugfix
c
compiler
programming
security
standard
Although triggered by a compiler optimization, this is a bug in Cap’n Proto, not the compiler.
To most observers, this code would appear to be correct. However, as it turns out, pointer arithmetic that overflows is undefined behavior under the C standard. As a result, the compiler is allowed to assume that the addition on the first line never overflows.
source: danluu
C with ABC!
http://www.cs.cmu.edu/~tom7/abc/paper.txt [www.cs.cmu.edu]
2018-02-07 03:39
tags:
c
compiler
cpu
format
paper
programming
text
In this paper, I describe a new compiler for the C89 programming language.
A paper and a compiler!
source: danluu
An Adaptive Packed-Memory Array
https://www3.cs.stonybrook.edu/~bender/newpub/BenderHu07-TODS.pdf [www3.cs.stonybrook.edu]
2018-01-23 14:11
tags:
compsci
paper
pdf
perf
programming
The packed-memory array (PMA) is a data structure that maintains a dynamic set of N elements in sorted order in a Θ(N)-sized array. The idea is to intersperse Θ(N) empty spaces or gaps among the elements so that only a small number of elements need to be shifted around on an insert or delete. Because the elements are stored physically in sorted order in memory or on disk, the PMA can be used to support extremely efficient range queries.
source: danluu
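A heavily simplified sketch of the gapped-array idea from the entry above. This is not the adaptive PMA from the paper: it has no density thresholds or adaptive rebalancing, it finds positions by linear scan, and the class name `GappedSortedArray` is made up for illustration. It only shows how interspersed gaps keep most inserts from shifting many elements, while the elements stay physically sorted.

```python
class GappedSortedArray:
    """Sorted elements stored in an array with empty slots (None) between them."""

    def __init__(self, capacity=8):
        self.slots = [None] * capacity
        self.count = 0

    def _spread(self, capacity):
        # Re-lay all elements evenly over a new array of the given capacity.
        items = [x for x in self.slots if x is not None]
        self.slots = [None] * capacity
        for k, x in enumerate(items):
            self.slots[k * capacity // len(items)] = x

    def insert(self, x):
        # Keep the array at most half full so a gap is always available.
        if 2 * (self.count + 1) > len(self.slots):
            self._spread(2 * len(self.slots))
        n = len(self.slots)
        # Index of the first stored element >= x, or n if there is none.
        i = next((k for k in range(n)
                  if self.slots[k] is not None and self.slots[k] >= x), n)
        if i > 0 and self.slots[i - 1] is None:
            self.slots[i - 1] = x  # a gap is right there: no shifting needed
        else:
            gap = next((k for k in range(i, n) if self.slots[k] is None), None)
            if gap is not None:
                # Shift slots[i..gap-1] one step right into the gap.
                self.slots[i + 1:gap + 1] = self.slots[i:gap]
                self.slots[i] = x
            else:
                # No gap to the right: shift a run left into the nearest gap.
                gap = next(k for k in range(i - 1, -1, -1)
                           if self.slots[k] is None)
                self.slots[gap:i - 1] = self.slots[gap + 1:i]
                self.slots[i - 1] = x
        self.count += 1

    def items(self):
        return [x for x in self.slots if x is not None]

    def range_query(self, lo, hi):
        # Elements are physically in order, so this is one left-to-right pass.
        return [x for x in self.slots if x is not None and lo <= x <= hi]


arr = GappedSortedArray()
for value in [50, 10, 40, 20, 30, 60, 25, 5]:
    arr.insert(value)
print(arr.items())
print(arr.range_query(20, 40))
```

A real PMA would locate the insert position by binary search and rebalance only a bounded neighborhood of the array; the global re-spread here is the simplest stand-in for that step.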
linux-insides
https://0xax.gitbooks.io/linux-insides/ [0xax.gitbooks.io]
2017-12-30 18:46
tags:
book
linux
systems
A book-in-progress about the linux kernel and its insides. The goal is simple - to share my modest knowledge about the insides of the linux kernel and help people who are interested in linux kernel insides, and other low-level subject matter.
source: danluu
Timers in Google Home!
https://twitter.com/danluu/status/942049082767495168 [twitter.com]
2017-12-17 22:08
tags:
ioshit
life
tweet
Ok Google, set a timer for ninety-nine years
timer for *minus* one-thousand-nine-hundred-thirty-nine weeks, two days, six hours, twenty-eight minutes and sixteen seconds starting now
source: danluu
Is there data on the quality of management decisions?
https://danluu.com/bad-decisions/ [danluu.com]
2017-11-22 19:06
tags:
ideas
math
sports
valley
Unfortunately, arguments like this are difficult to settle because, even in retrospect, it’s usually not possible to get enough information to determine the precise “value” of a decision. Even in cases where the decision led to an unambiguous success or failure, there are so many factors that led to the result that it’s difficult to figure out precisely why something happened.
Are we right or wrong? Tune in next decade to see what’s changed.
source: danluu
Musings on Kotlin Ranges
http://blog.danlew.net/2017/06/05/musings-on-kotlin-ranges/ [blog.danlew.net]
2017-11-16 00:41
tags:
intro-programming
java
Here are a few interesting aspects of Kotlin ranges, some of which I’ve found to be less-than-intuitive.
source: danluu
Filesystem error handling
https://danluu.com/filesystem-errors/ [danluu.com]
2017-10-23 19:59
tags:
fs
linux
paper
storage
systems
Prabhakaran et al. injected errors at the block device level (just underneath the filesystem) and found that ext3, reiserfs, ntfs, and jfs mostly handled read errors reasonably but ext3, ntfs, and jfs mostly ignored write errors. While the paper is interesting, someone installing Linux on a system today is much more likely to use ext4 than any of the now-dated filesystems tested by Prabhakaran et al. We’ll try to reproduce some of the basic results from the paper on more modern filesystems like ext4 and btrfs, some legacy filesystems like exfat, ext3, and jfs, as well as on overlayfs.
source: danluu
Strange Hash Instances in Ruby
https://kate.io/blog/strange-hash-instances-in-ruby/ [kate.io]
2017-10-02 01:26
tags:
hash
programming
ruby
Everything can be patched, except the things that can’t.
source: danluu
A history of branch prediction from 1500000 BC to 1995
https://danluu.com/branch-prediction/ [danluu.com]
2017-08-23 18:29
tags:
cpu
hardware
perf
programming
retro
We’ll start with the most naive things someone might do and work our way up to something better.
source: danluu
Why does Sattolo's algorithm produce a permutation with exactly one cycle?
https://danluu.com/sattolo/ [danluu.com]
2017-08-12 04:13
tags:
compsci
math
I recently had a problem where part of the solution was to do a series of pointer accesses that would walk around a chunk of memory in pseudo-random order. Sattolo’s algorithm provides a solution to this because it produces a permutation of a list with exactly one cycle, which guarantees that we will reach every element of the list even though we’re traversing it in random order.
source: danluu
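A short sketch of Sattolo's algorithm as summarized in the entry above, written for illustration rather than copied from the linked post. The helper `cycle_length` is a hypothetical name; it follows `i -> perm[i]` from a starting index and counts the steps until it returns, which should equal the list length if the permutation is a single cycle.

```python
import random


def sattolo(n, rng=random):
    """Return a random permutation of range(n) consisting of exactly one cycle."""
    perm = list(range(n))
    for i in range(n - 1, 0, -1):
        # The only difference from Fisher-Yates: j is drawn from [0, i),
        # so an element is never swapped with itself.
        j = rng.randrange(i)
        perm[i], perm[j] = perm[j], perm[i]
    return perm


def cycle_length(perm, start=0):
    """Count the steps needed to return to `start` when following i -> perm[i]."""
    steps, i = 0, start
    while True:
        i = perm[i]
        steps += 1
        if i == start:
            return steps


perm = sattolo(10)
print(perm, cycle_length(perm))
```

Because the walk from any index visits all `n` positions before repeating, chasing `perm` as a chain of "pointers" touches every element in a pseudo-random order.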
Book review: "Working Effectively with Legacy Code" by Michael C. Feathers
http://eli.thegreenplace.net/2017/book-review-working-effectively-with-legacy-code-by-michael-c-feathers/ [eli.thegreenplace.net]
2017-07-26 21:14
tags:
book
development
The hacks are a good match to the foe - they’re about as awful as the code itself, so young and innocent developers may find themselves (rightfully) horrified.
source: danluu
Terminal and shell performance
https://danluu.com/term-latency/ [danluu.com]
2017-07-18 19:50
tags:
benchmark
development
perf
swtools
tty
ux
Most terminals have enough latency that the user experience could be improved if the terminals concentrated more on latency and less on other features or other aspects of performance. However, when I search for terminal benchmarks, I find that terminal authors, if they benchmark anything, benchmark the speed of sinking stdout or memory usage at startup. This is unfortunate because most “low performance” terminals can already sink stdout many orders of magnitude faster than humans can keep up with, so further optimizing stdout sink speed has a relatively small impact on actual user experience for most users.
source: danluu
Writing a SAT Solver
http://andrew.gibiansky.com/blog/verification/writing-a-sat-solver/ [andrew.gibiansky.com]
2017-06-21 20:29
tags:
compsci
haskell
programming
In this post, we’ll look at how to teach computers to solve puzzles. Specifically, we’ll look at a simple puzzle that can be expressed as a boolean constraint satisfaction problem, and we’ll write a simple constraint solver (a SAT solver) and mention how our algorithm, when augmented with a few optimizations, is used in modern SAT solvers.
source: danluu
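A minimal backtracking SAT solver in the spirit of the entry above. It is a sketch, not the linked post's Haskell code. It assumes clauses in conjunctive normal form, with each literal written as a nonzero integer (`3` means variable 3 is true, `-3` means it is false); the function names are chosen here for illustration.

```python
def simplify(clauses, literal):
    """Assume `literal` is true: drop satisfied clauses, shrink the others."""
    result = []
    for clause in clauses:
        if literal in clause:
            continue  # clause already satisfied
        result.append([l for l in clause if l != -literal])
    return result


def solve(clauses, assignment=None):
    """Return a dict {variable: bool} satisfying all clauses, or None."""
    assignment = dict(assignment or {})
    if not clauses:
        return assignment  # nothing left to satisfy
    if any(len(clause) == 0 for clause in clauses):
        return None  # an empty clause cannot be satisfied
    literal = clauses[0][0]
    # Try the literal as true, then as false, backtracking on failure.
    for choice in (literal, -literal):
        extended = {**assignment, abs(choice): choice > 0}
        result = solve(simplify(clauses, choice), extended)
        if result is not None:
            return result
    return None


def satisfies(clauses, model):
    """Check that every clause has at least one literal made true by `model`."""
    return all(any(model.get(abs(l)) == (l > 0) for l in clause)
               for clause in clauses)


# (x1 or x2) and (not x1 or x3) and (not x2 or not x3)
formula = [[1, 2], [-1, 3], [-2, -3]]
model = solve(formula)
print(model, satisfies(formula, model))
```

Modern solvers add optimizations on top of this kind of search, such as unit propagation and smarter variable selection; none of those are implemented here.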