A Couple of (Probabilistic) Worst-case Bounds for Robin Hood Linear Probing

https://www.pvk.ca/Blog/2019/09/29/a-couple-of-probabilistic-worst-case-bounds-for-robin-hood-linear-probing/ [www.pvk.ca]

2019-09-30 03:59

tags:
compsci
hash
perf

> I like to think of Robin Hood hash tables with linear probing as arrays sorted on uniformly distributed keys, with gaps. That makes it clearer that we can use these tables to implement algorithms based on merging sorted streams in bulk, as well as ones that rely on fast point lookups. A key question with randomised data structures is how badly we can expect them to perform.

source: L
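
The "sorted array with gaps" view is easiest to see in code. Below is a minimal Robin Hood table with linear probing (an illustrative sketch, not the post's code; the class and method names are mine): on insert, an element that has probed further than the incumbent steals the slot, which keeps entries ordered by their home bucket.

```python
class RobinHood:
    """Toy Robin Hood hash table with linear probing (no resizing)."""

    def __init__(self, capacity=16):
        self.slots = [None] * capacity  # each slot: (key, value) or None
        self.cap = capacity

    def _home(self, key):
        return hash(key) % self.cap

    def insert(self, key, value):
        idx, dist = self._home(key), 0
        entry = (key, value)
        while True:
            cur = self.slots[idx]
            if cur is None:
                self.slots[idx] = entry
                return
            if cur[0] == entry[0]:      # key already present: update
                self.slots[idx] = entry
                return
            cur_dist = (idx - self._home(cur[0])) % self.cap
            if cur_dist < dist:         # incumbent is "richer": swap, keep going
                self.slots[idx], entry = entry, cur
                dist = cur_dist
            idx = (idx + 1) % self.cap
            dist += 1

    def get(self, key):
        idx, dist = self._home(key), 0
        while True:
            cur = self.slots[idx]
            if cur is None:
                return None
            if cur[0] == key:
                return cur[1]
            # Robin Hood invariant: once we have probed further than the
            # incumbent, the key cannot be anywhere later in the run.
            if (idx - self._home(cur[0])) % self.cap < dist:
                return None
            idx = (idx + 1) % self.cap
            dist += 1
```

The early-exit in `get` is what makes worst-case probe lengths interesting: lookups (even misses) stop after at most the longest displacement in the table, which is exactly the quantity the post bounds.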

Visual Information Theory

https://colah.github.io/posts/2015-09-Visual-Information/ [colah.github.io]

2019-09-24 23:09

tags:
article
compsci
ideas
math
random
visualization

> Information theory gives us precise language for describing a lot of things. How uncertain am I? How much does knowing the answer to question A tell me about the answer to question B? How similar is one set of beliefs to another? I’ve had informal versions of these ideas since I was a young child, but information theory crystallizes them into precise, powerful ideas. These ideas have an enormous variety of applications, from the compression of data, to quantum physics, to machine learning, and vast fields in between.

> Unfortunately, information theory can seem kind of intimidating. I don’t think there’s any reason it should be. In fact, many core ideas can be explained completely visually!

source: E
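
The central quantity the essay visualizes is entropy, which is small enough to compute by hand. A minimal sketch (mine, not the essay's code):

```python
import math

def entropy(probs):
    """Shannon entropy in bits: the average length of an optimal code
    for a source with the given outcome probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A fair coin carries one full bit of information per flip;
# a heavily biased coin carries much less.
fair = entropy([0.5, 0.5])      # 1.0 bit
biased = entropy([0.9, 0.1])    # ~0.47 bits
```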

The secret-sharer: evaluating and testing unintended memorization in neural networks

https://blog.acolyer.org/2019/09/23/the-secret-sharer/ [blog.acolyer.org]

2019-09-24 02:04

tags:
ai
compsci
language
opsec
paper

> This is a really important paper for anyone working with language or generative models, and just in general for anyone interested in understanding some of the broader implications and possible unintended consequences of deep learning. There’s also a lovely sense of the human drama accompanying the discoveries that just creeps through around the edges.

> Disclosure of secrets is of particular concern in neural network models that classify or predict sequences of natural language text… even if sensitive or private training data text is very rare, one should assume that well-trained models have paid attention to its precise details…. The users of such models may discover— either by accident or on purpose— that entering certain text prefixes causes the models to output surprisingly revealing text completions.

The Synchronization of Periodic Routing Messages

https://www.icir.org/floyd/papers/sync_94.pdf [www.icir.org]

2019-09-06 22:05

tags:
compsci
networking
paper
pdf
perf
random
systems

> The paper considers a network with many apparently-independent periodic processes and discusses one method by which these processes can inadvertently become synchronized. In particular, we study the synchronization of periodic routing messages, and offer guidelines on how to avoid inadvertent synchronization. Using simulations and analysis, we study the process of synchronization and show that the transition from unsynchronized to synchronized traffic is not one of gradual degradation but is instead a very abrupt ‘phase transition’: in general, the addition of a single router will convert a completely unsynchronized traffic stream into a completely synchronized one. We show that synchronization can be avoided by the addition of randomization to the traffic sources and quantify how much randomization is necessary. In addition, we argue that the inadvertent synchronization of periodic processes is likely to become an increasing problem in computer networks.
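
The paper's remedy can be sketched in a few lines: instead of firing every `T` seconds exactly, each source draws its next interval from a window around `T`, so independent timers cannot drift into lockstep. (The window width below is illustrative, not the paper's recommended value.)

```python
import random

def next_interval(period, jitter=0.5):
    """Randomized timer: uniform over [period * (1 - jitter),
    period * (1 + jitter)] instead of a fixed period."""
    return period * random.uniform(1 - jitter, 1 + jitter)

# First few firing times of one source sending 30-second routing updates:
t, times = 0.0, []
for _ in range(5):
    t += next_interval(30.0)
    times.append(t)
```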

Chopping substrings

http://blogs.perl.org/users/damian_conway/2019/07/chopping-substrings.html [blogs.perl.org]

2019-07-30 19:20

tags:
compsci
perf
perl
programming

> The “best practice” solution is a technique known as suffix trees, which requires some moderately complex coding. However, we can get very reasonable performance for strings up to hundreds of thousands of characters long using a much simpler approach.

source: HN
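
The excerpt doesn't show Conway's simpler approach, but for reference, the textbook baseline for the longest-common-substring problem is a dynamic program quadratic in the input lengths (this sketch is mine, in Python rather than the post's Perl):

```python
def longest_common_substring(a, b):
    """Classic O(len(a) * len(b)) dynamic program.
    cur[j] holds the length of the common substring ending at
    a[i] and b[j-1]; the best such length locates the answer."""
    best, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i, ca in enumerate(a):
        cur = [0] * (len(b) + 1)
        for j, cb in enumerate(b, 1):
            if ca == cb:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best:
                    best, best_end = cur[j], i + 1
        prev = cur
    return a[best_end - best:best_end]
```

The quadratic cost is why it breaks down at the "hundreds of thousands of characters" scale the post targets; suffix trees get this down to linear time at the price of the "moderately complex coding" the quote mentions.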

Decades-Old Computer Science Conjecture Solved in Two Pages

https://www.quantamagazine.org/mathematician-solves-computer-science-conjecture-in-two-pages-20190725/ [www.quantamagazine.org]

2019-07-26 03:04

tags:
compsci
math
paper

> “This conjecture has stood as one of the most frustrating and embarrassing open problems in all of combinatorics and theoretical computer science,” wrote Scott Aaronson of the University of Texas, Austin, in a blog post. “The list of people who tried to solve it and failed is like a who’s who of discrete math and theoretical computer science,” he added in an email.

> Over the years, computer scientists have developed many ways to measure the complexity of a given Boolean function. Each measure captures a different aspect of how the information in the input string determines the output bit. For instance, the “sensitivity” of a Boolean function tracks, roughly speaking, the likelihood that flipping a single input bit will alter the output bit. And “query complexity” calculates how many input bits you have to ask about before you can be sure of the output.

Also: https://www.scottaaronson.com/blog/?p=4229

Paper: https://arxiv.org/abs/1907.00847

> Induced subgraphs of hypercubes and a proof of the Sensitivity Conjecture
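
Both measures in the excerpt can be computed directly for small functions. A brute-force sketch of sensitivity (mine, not from the article):

```python
from itertools import product

def sensitivity(f, n):
    """Max over all inputs x of the number of single-bit flips
    that change f(x)."""
    best = 0
    for x in product((0, 1), repeat=n):
        flips = sum(
            f(x) != f(x[:i] + (1 - x[i],) + x[i + 1:])
            for i in range(n)
        )
        best = max(best, flips)
    return best

OR = lambda x: int(any(x))
# OR on n bits has sensitivity n: at the all-zeros input,
# flipping any single bit changes the output from 0 to 1.
```

Huang's two-page proof shows sensitivity is polynomially related to the other complexity measures, settling the conjecture.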

Models of Generics and Metaprogramming: Go, Rust, Swift, D and More

http://thume.ca/2019/07/14/a-tour-of-metaprogramming-models-for-generics/ [thume.ca]

2019-07-21 22:14

tags:
article
compiler
compsci
programming
type-system

> In some domains of programming it’s common to want to write a data structure or algorithm that can work with elements of many different types, such as a generic list or a sorting algorithm that only needs a comparison function. Different programming languages have come up with all sorts of solutions to this problem: From just pointing people to existing general features that can be useful for the purpose (e.g C, Go) to generics systems so powerful they become Turing-complete (e.g. Rust, C++). In this post I’m going to take you on a tour of the generics systems in many different languages and how they are implemented. I’ll start from how languages without a special generics system like C solve the problem and then I’ll show how gradually adding extensions in different directions leads to the systems found in other languages.

source: L


From Punched Cards To Flat Screens

http://people.ds.cam.ac.uk/ph10/CIHK.pdf [people.ds.cam.ac.uk]

2019-06-19 04:27

tags:
compsci
development
essay
pdf
programming
retro

> When I retired at the end of September 2007, I knew I would be expected to make a speech at the retirement lunch. Looking back over the 40 years that I had been part of the computing community in Cambridge, it struck me again just how much change there had been, and also how many different areas of computing I had worked in. For the younger colleagues who listened to my speech, the early years must seem like ancient history.

Solving Sudoku with Prolog

https://www.metalevel.at/sudoku/ [www.metalevel.at]

2019-06-10 22:15

tags:
compsci
programming

> Prolog is extremely well suited for solving combinatorial tasks like Sudoku.

source: HN

BPF and formal verification

https://www.sccs.swarthmore.edu/users/16/mmcconv1/pl-reflection.html [www.sccs.swarthmore.edu]

2019-06-07 04:24

tags:
compsci
development
openbsd
programming

> I spent the spring of 2015 researching the Berkeley packet filter (BPF) and its formal verification with my programming languages professor, Joe Gibbs Politz. The project took some unexpected turns and we learned a lot about Coq and applied formal verification in the process.

source: HN

Going Critical

https://www.meltingasphalt.com/interactive/going-critical/ [www.meltingasphalt.com]

2019-05-14 14:38

tags:
compsci
ideas
interactive
networking
visualization

> Networks rule our world. From the chemical reaction pathways inside a cell, to the web of relationships in an ecosystem, to the trade and political networks that shape the course of history. Or consider this very post you’re reading. You probably found it on a social network, downloaded it from a computer network, and are currently deciphering it with your neural network. But as much as I’ve thought about networks over the years, I didn’t appreciate (until very recently) the importance of simple diffusion. This is our topic for today: the way things move and spread, somewhat chaotically, across a network.

source: HN
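
The post's central idea, going critical, shows up already in the simplest branching-process model: if each case exposes a fixed number of contacts and infects each with probability `p`, outbreaks fizzle when the reproduction number R0 is below 1 and can explode when it is above 1. A toy simulation (the parameters are illustrative, not from the post):

```python
import random

def outbreak_size(p, contacts=4, cap=10_000):
    """Total infections of a branching process where each case exposes
    `contacts` others, each infected independently with probability p,
    so R0 = contacts * p. Stops once `cap` infections are reached."""
    active, total = 1, 1
    while active and total < cap:
        new = sum(
            1
            for _ in range(active)
            for _ in range(contacts)
            if random.random() < p
        )
        active = new
        total += new
    return total

random.seed(1)
subcritical = [outbreak_size(0.1) for _ in range(200)]   # R0 = 0.4: fizzles
supercritical = [outbreak_size(0.5) for _ in range(200)]  # R0 = 2.0: can explode
```

The sharp qualitative change between the two regimes, rather than a gradual one, is the "phase transition" behavior the interactive essay lets you play with.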

Playing with model trains and calling it graph theory

https://11011110.github.io/blog/2019/05/02/playing-model-trains.html [11011110.github.io]

2019-05-03 02:38

tags:
compsci
gaming
paper

> You’ve probably played with model trains, for instance with something like the Brio set shown below. And if you’ve built a layout with a model train set, you may well have wondered: is it possible for my train to use all the parts of my track?

Verifying Popcount

https://blog.regehr.org/archives/1667 [blog.regehr.org]

2019-04-24 21:42

tags:
c
compsci
programming

> Popcount is the function that returns the number of set bits in its argument. Showing that a popcount implementation does what it claims to do has become one of my favorite examples to use when I need to quickly show students how we can reason about programs mathematically. Something like a selection sort is probably the more standard example here, but sorting gets into some subtleties that I’d rather reserve for a second example. Popcount is so easy that this whole example can be done in a half hour.
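
The post reasons about a C implementation mathematically; the informal cousin of that argument is an exhaustive check of a clever implementation against a naive reference over a small domain. A sketch in Python (mine, not the post's code):

```python
def popcount_naive(x):
    """Reference implementation: count set bits one at a time."""
    n = 0
    while x:
        n += x & 1
        x >>= 1
    return n

def popcount_kernighan(x):
    """Kernighan's trick: x & (x - 1) clears the lowest set bit,
    so the loop body runs once per set bit rather than once per bit."""
    n = 0
    while x:
        x &= x - 1
        n += 1
    return n

# Exhaustive agreement over all 16-bit values.
assert all(
    popcount_naive(x) == popcount_kernighan(x) for x in range(1 << 16)
)
```

Exhaustive testing only works for small input domains, which is exactly why the post reaches for a mathematical proof instead.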

Soundness and completeness: with precision

https://bertrandmeyer.com/2019/04/21/soundness-completeness-precision/ [bertrandmeyer.com]

2019-04-23 02:17

tags:
compsci

> These widely used concepts are sometimes misunderstood. The first answer I get when innocently asking people whether the concepts are clear is yes, of course, everyone knows! Then, as I bring up such examples as credit card rejection or dead code detection, assurance quickly yields to confusion. One sign that things are not going well is when people start throwing in terms like “true positive” and “false negative”. By then any prospect of reaching a clear conclusion has vanished. I hope that after reading this article you will never again (in a program analysis context) be tempted to use them.

source: HN

Parsing: The Solved Problem That Isn't

https://tratt.net/laurie/blog/entries/parsing_the_solved_problem_that_isnt.html [tratt.net]

2019-04-10 15:51

tags:
compsci
text

> One of the things that’s become increasingly obvious to me over the past few years is that the general consensus breaks down for one vital emerging trend: language composition. Composition is one of those long, complicated, but often vague terms that crops up a lot in theoretical work. Fortunately, for our purposes it means something simple: grammar composition, which is where we add one grammar to another and have the combined grammar parse text in the new language (exactly the sort of thing we want to do with Domain Specific Languages (DSLs)). To use a classic example, imagine that we wish to extend a Java-like language with SQL

Keeping CALM: when distributed consistency is easy

https://blog.acolyer.org/2019/03/06/keeping-calm-when-distributed-consistency-is-easy/ [blog.acolyer.org]

2019-03-06 22:26

tags:
compsci
concurrency
database
development
paper
perf

> When it comes to high performing scalable distributed systems, coordination is a killer. It’s the dominant term in the Universal Scalability Law. When we can avoid or reduce the need for coordination things tend to get simpler and faster. See for example Coordination avoidance in database systems, and more recently the amazing performance of Anna which gives a two-orders-of-magnitude speed-up through coordination elimination. So we should avoid coordination whenever we can.

> So far so good, but when exactly can we avoid coordination? Becoming precise in the answer to that question is what the CALM theorem is all about. You’re probably familiar with Brooks’ distinction between essential complexity and accidental complexity in his ‘No silver bullet’ essay. Here we get to tease apart the distinction between essential coordination, a guarantee that cannot be provided without coordinating, and accidental coordination, coordination that could have been avoided with a more careful design.

Also: https://blog.acolyer.org/2019/03/08/a-generalised-solution-to-distributed-consensus/

A Fairly Fast Fibonacci Function

http://www.oranlooney.com/post/fibonacci/ [www.oranlooney.com]

2019-02-22 02:05

tags:
compsci
math
perf
programming

> Fair warning: this is a bit of a rabbit hole, with no other purpose than to optimize the hell out of something for which there is frankly no practical use. But we get to do a bit of linear algebra and try out some pretty interesting optimization techniques; that’s what I call a good time!

source: HN
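
One of the standard destinations of this particular rabbit hole is the fast-doubling method, which follows from the matrix form the post's linear algebra develops. A sketch (mine; the post explores several variants):

```python
def fib(n):
    """Fast doubling:
        F(2k)   = F(k) * (2*F(k+1) - F(k))
        F(2k+1) = F(k)^2 + F(k+1)^2
    O(log n) big-integer multiplications instead of O(n) additions."""
    def pair(k):  # returns (F(k), F(k+1))
        if k == 0:
            return (0, 1)
        a, b = pair(k >> 1)
        c = a * (2 * b - a)      # F(2m)   where m = k >> 1
        d = a * a + b * b        # F(2m+1)
        return (d, c + d) if k & 1 else (c, d)
    return pair(n)[0]
```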

A Simple Explanation for the Existence of Adversarial Examples with Small Hamming Distance

https://arxiv.org/abs/1901.10861 [arxiv.org]

2019-02-12 18:45

tags:
ai
compsci
paper
pdf
security

> The existence of adversarial examples in which an imperceptible change in the input can fool well trained neural networks was experimentally discovered by Szegedy et al in 2013, who called them “Intriguing properties of neural networks”. Since then, this topic had become one of the hottest research areas within machine learning, but the ease with which we can switch between any two decisions in targeted attacks is still far from being understood, and in particular it is not clear which parameters determine the number of input coordinates we have to change in order to mislead the network. In this paper we develop a simple mathematical framework which enables us to think about this baffling phenomenon from a fresh perspective, turning it into a natural consequence of the geometry of R^n with the L_0 (Hamming) metric, which can be quantitatively analyzed. In particular, we explain why we should expect to find targeted adversarial examples with Hamming distance of roughly m in arbitrarily deep neural networks which are designed to distinguish between m input classes.


source: grugq

Modern LZ Compression

https://glinscott.github.io/lz/index.html [glinscott.github.io]

2019-02-03 18:08

tags:
compression
compsci
cxx
programming

> This article walks through the components of a modern LZ compressor.

> LZ compression has been around for a long time, but there are still major improvements being made. The deflate algorithm is perhaps the most widely used, implemented initially in PKZIP, and these days finding broad usage in gzip. Modern incarnations are much more powerful, and use a wide array of new tricks. Two examples of the next generation are zstd and lzma. For a comprehensive overview of open source LZ compressors, check out lzbench.

source: L
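
The "LZ" core that all of these compressors share fits in a page: replace repeated data with back-references into a sliding window. A bare-bones sketch (mine; real compressors like zstd and lzma layer entropy coding, fast match finding, and repeat-offset tricks on top):

```python
def lz77_compress(data, window=4096, min_len=3):
    """Tokenize bytes into (offset, length, literal) triples.
    A match token is (offset, length, None); a literal is (0, 0, byte)."""
    i, out = 0, []
    while i < len(data):
        best_off, best_len = 0, 0
        # Naive match search: scan the window for the longest match.
        for j in range(max(0, i - window), i):
            length = 0
            while (i + length < len(data)
                   and data[j + length] == data[i + length]
                   and length < 255):
                length += 1
            if length > best_len:
                best_off, best_len = i - j, length
        if best_len >= min_len:
            out.append((best_off, best_len, None))
            i += best_len
        else:
            out.append((0, 0, data[i]))
            i += 1
    return out

def lz77_decompress(tokens):
    out = bytearray()
    for off, length, lit in tokens:
        if length:
            # Copy byte-by-byte so overlapping matches (off < length,
            # i.e. run-length-style repeats) expand correctly.
            for _ in range(length):
                out.append(out[-off])
        else:
            out.append(lit)
    return bytes(out)
```

The naive O(window) match search is where modern compressors spend their cleverness (hash chains, binary trees, and so on), and the raw token stream is what deflate, zstd, and lzma then entropy-code.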

It’s Time for Some Queueing Theory

https://kottke.org/19/01/its-time-for-some-queueing-theory [kottke.org]

2019-01-30 22:26

tags:
compsci
links

> Queueing theory is the scientific study of waiting in line. It can apply to familiar lines like those at the grocery store or bank but also to things like web servers, highway traffic, and telecommunications…basically any situation where you have things entering a system, being processed by a system for a certain period of time, and leaving the system.

Assorted stories and links.

source: K
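
Two results that show up in almost every queueing-theory explainer are Little's law (L = λW) and the closed form for the M/M/1 queue. A minimal sketch (mine, not from the linked roundup):

```python
def mm1_stats(arrival_rate, service_rate):
    """Steady-state M/M/1 queue (Poisson arrivals, exponential service,
    one server). Returns utilization rho, mean number in system L,
    and mean time in system W = L / lambda by Little's law."""
    rho = arrival_rate / service_rate
    assert rho < 1, "unstable queue: arrivals outpace service"
    L = rho / (1 - rho)
    W = L / arrival_rate
    return rho, L, W

# A checkout that can serve 10 customers/hour with 9 arriving/hour is
# 90% busy, yet holds 9 customers on average, each waiting a full hour:
# near saturation, queues blow up nonlinearly.
rho, L, W = mm1_stats(9.0, 10.0)
```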