Chopping substrings

http://blogs.perl.org/users/damian_conway/2019/07/chopping-substrings.html [blogs.perl.org]

2019-07-30 19:20

tags:
compsci
perf
perl
programming

> The “best practice” solution is a technique known as suffix trees, which requires some moderately complex coding. However, we can get very reasonable performance for strings up to hundreds of thousands of characters long using a much simpler approach.
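
To show the shape of the simpler end of that trade-off (an illustration only, not necessarily Conway's Perl technique): a longest-repeated-substring search can be done by sorting all suffixes and comparing adjacent entries, no suffix tree required.

```python
def longest_repeated_substring(s):
    # Sort all suffixes; any repeated substring is a common prefix of two
    # suffixes, and the longest one shows up between sorted neighbors.
    suffixes = sorted(s[i:] for i in range(len(s)))
    best = ""
    for a, b in zip(suffixes, suffixes[1:]):
        n = 0
        while n < min(len(a), len(b)) and a[n] == b[n]:
            n += 1
        if n > len(best):
            best = a[:n]
    return best

print(longest_repeated_substring("banana"))  # ana
```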

source: HN

Decades-Old Computer Science Conjecture Solved in Two Pages

https://www.quantamagazine.org/mathematician-solves-computer-science-conjecture-in-two-pages-20190725/ [www.quantamagazine.org]

2019-07-26 03:04

tags:
compsci
math
paper

> “This conjecture has stood as one of the most frustrating and embarrassing open problems in all of combinatorics and theoretical computer science,” wrote Scott Aaronson of the University of Texas, Austin, in a blog post. “The list of people who tried to solve it and failed is like a who’s who of discrete math and theoretical computer science,” he added in an email.

> Over the years, computer scientists have developed many ways to measure the complexity of a given Boolean function. Each measure captures a different aspect of how the information in the input string determines the output bit. For instance, the “sensitivity” of a Boolean function tracks, roughly speaking, the likelihood that flipping a single input bit will alter the output bit. And “query complexity” calculates how many input bits you have to ask about before you can be sure of the output.
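
For small n, sensitivity is directly computable by brute force; a sketch (function and names are mine, not from the article):

```python
from itertools import product

def sensitivity(f, n):
    # Brute-force sensitivity: the max, over all 2^n inputs, of how many
    # single-bit flips change the output of the Boolean function f.
    best = 0
    for bits in product((0, 1), repeat=n):
        x = list(bits)
        flips = sum(f(x[:i] + [x[i] ^ 1] + x[i + 1:]) != f(x) for i in range(n))
        best = max(best, flips)
    return best

# OR on 3 bits: the all-zeros input is sensitive to every coordinate.
print(sensitivity(lambda v: int(any(v)), 3))  # 3
```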

Also: https://www.scottaaronson.com/blog/?p=4229

Paper: https://arxiv.org/abs/1907.00847

> Induced subgraphs of hypercubes and a proof of the Sensitivity Conjecture

Models of Generics and Metaprogramming: Go, Rust, Swift, D and More

http://thume.ca/2019/07/14/a-tour-of-metaprogramming-models-for-generics/ [thume.ca]

2019-07-21 22:14

tags:
article
compiler
compsci
programming
type-system

> In some domains of programming it’s common to want to write a data structure or algorithm that can work with elements of many different types, such as a generic list or a sorting algorithm that only needs a comparison function. Different programming languages have come up with all sorts of solutions to this problem: From just pointing people to existing general features that can be useful for the purpose (e.g. C, Go) to generics systems so powerful they become Turing-complete (e.g. Rust, C++). In this post I’m going to take you on a tour of the generics systems in many different languages and how they are implemented. I’ll start from how languages without a special generics system like C solve the problem and then I’ll show how gradually adding extensions in different directions leads to the systems found in other languages.
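
The starting point of the tour, "just use the general features you already have" (as in C's qsort taking a function pointer), can be sketched as a sort parameterized only by a comparison function:

```python
def generic_sort(items, less_than):
    # Insertion sort that knows nothing about its element type beyond
    # the caller-supplied comparison function.
    result = []
    for item in items:
        i = 0
        while i < len(result) and less_than(result[i], item):
            i += 1
        result.insert(i, item)
    return result

print(generic_sort([3, 1, 2], lambda a, b: a < b))              # [1, 2, 3]
print(generic_sort(["bb", "a"], lambda a, b: len(a) < len(b)))  # ['a', 'bb']
```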

source: L

From Punched Cards To Flat Screens

http://people.ds.cam.ac.uk/ph10/CIHK.pdf [people.ds.cam.ac.uk]

2019-06-19 04:27

tags:
compsci
development
essay
pdf
programming
retro

> When I retired at the end of September 2007, I knew I would be expected to make a speech at the retirement lunch. Looking back over the 40 years that I had been part of the computing community in Cambridge, it struck me again just how much change there had been, and also how many different areas of computing I had worked in. For the younger colleagues who listened to my speech, the early years must seem like ancient history.

Solving Sudoku with Prolog

https://www.metalevel.at/sudoku/ [www.metalevel.at]

2019-06-10 22:15

tags:
compsci
programming

> Prolog is extremely well suited for solving combinatorial tasks like Sudoku.

source: HN

BPF and formal verification

https://www.sccs.swarthmore.edu/users/16/mmcconv1/pl-reflection.html [www.sccs.swarthmore.edu]

2019-06-07 04:24

tags:
compsci
development
openbsd
programming

> I spent the spring of 2015 researching the Berkeley packet filter (BPF) and its formal verification with my programming languages professor, Joe Gibbs Politz. The project took some unexpected turns and we learned a lot about Coq and applied formal verification in the process.

source: HN

Going Critical

https://www.meltingasphalt.com/interactive/going-critical/ [www.meltingasphalt.com]

2019-05-14 14:38

tags:
compsci
ideas
interactive
networking
visualization

> Networks rule our world. From the chemical reaction pathways inside a cell, to the web of relationships in an ecosystem, to the trade and political networks that shape the course of history. Or consider this very post you’re reading. You probably found it on a social network, downloaded it from a computer network, and are currently deciphering it with your neural network. But as much as I’ve thought about networks over the years, I didn’t appreciate (until very recently) the importance of simple diffusion. This is our topic for today: the way things move and spread, somewhat chaotically, across a network.
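
The post's interactive models boil down to something like this SIR-style sketch (names and parameters mine): infected nodes transmit to neighbors with probability p, then recover, and whether the spread dies out or takes over depends on where p sits relative to the critical threshold.

```python
import random

def simulate_spread(neighbors, seed_node, p, steps, rng):
    # Each step, every infected node transmits to each susceptible
    # neighbor with probability p, then recovers (and stays immune).
    infected, recovered = {seed_node}, set()
    for _ in range(steps):
        newly = set()
        for node in infected:
            for nb in neighbors[node]:
                if nb not in infected and nb not in recovered and rng.random() < p:
                    newly.add(nb)
        recovered |= infected
        infected = newly
        if not infected:
            break
    return len(infected | recovered)

# A ring of 20 nodes: low p dies out early, p = 1 crawls all the way around.
ring = {i: [(i - 1) % 20, (i + 1) % 20] for i in range(20)}
print(simulate_spread(ring, 0, 1.0, 50, random.Random(0)))  # 20
print(simulate_spread(ring, 0, 0.1, 50, random.Random(0)))
```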

source: HN

Playing with model trains and calling it graph theory

https://11011110.github.io/blog/2019/05/02/playing-model-trains.html [11011110.github.io]

2019-05-03 02:38

tags:
compsci
gaming
paper

> You’ve probably played with model trains, for instance with something like the Brio set shown below. And if you’ve built a layout with a model train set, you may well have wondered: is it possible for my train to use all the parts of my track?
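
Stripped of the physical constraints the post actually wrestles with (switches, the train's inability to reverse), "use all the track" is the classical Eulerian-circuit question. A minimal sketch of Euler's criterion:

```python
from collections import defaultdict

def has_eulerian_circuit(edges):
    # Euler's theorem: a connected graph has a closed walk using every
    # edge exactly once iff every vertex has even degree.
    degree, adj = defaultdict(int), defaultdict(set)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        adj[u].add(v)
        adj[v].add(u)
    seen, stack = set(), [next(iter(adj))]
    while stack:  # connectivity check over vertices that carry edges
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(adj[node])
    return seen == set(adj) and all(d % 2 == 0 for d in degree.values())

loop = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(has_eulerian_circuit(loop))             # True:  a closed loop of track
print(has_eulerian_circuit(loop + [(0, 4)]))  # False: a dead-end spur
```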

Verifying Popcount

https://blog.regehr.org/archives/1667 [blog.regehr.org]

2019-04-24 21:42

tags:
c
compsci
programming

> Popcount is the function that returns the number of set bits in its argument. Showing that a popcount implementation does what it claims to do has become one of my favorite examples to use when I need to quickly show students how we can reason about programs mathematically. Something like a selection sort is probably the more standard example here, but sorting gets into some subtleties that I’d rather reserve for a second example. Popcount is so easy that this whole example can be done in a half hour.
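
Regehr's post does the reasoning in a verification tool over C; a lighter-weight version of the same claim, sketched here in Python, is a bit-trick popcount checked exhaustively against the obvious specification.

```python
def popcount32(x):
    # Classic SWAR popcount: fold bit counts in parallel within a 32-bit word.
    x = x - ((x >> 1) & 0x55555555)                  # 2-bit counts
    x = (x & 0x33333333) + ((x >> 2) & 0x33333333)   # 4-bit counts
    x = (x + (x >> 4)) & 0x0F0F0F0F                  # 8-bit counts
    return ((x * 0x01010101) & 0xFFFFFFFF) >> 24     # sum the four bytes

# Exhaustive check over all 16-bit inputs against the naive spec.
assert all(popcount32(n) == bin(n).count("1") for n in range(1 << 16))
```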

Soundness and completeness: with precision

https://bertrandmeyer.com/2019/04/21/soundness-completeness-precision/ [bertrandmeyer.com]

2019-04-23 02:17

tags:
compsci

> These widely used concepts are sometimes misunderstood. The first answer I get when innocently asking people whether the concepts are clear is yes, of course, everyone knows! Then, as I bring up such examples as credit card rejection or dead code detection, assurance quickly yields to confusion. One sign that things are not going well is when people start throwing in terms like “true positive” and “false negative”. By then any prospect of reaching a clear conclusion has vanished. I hope that after reading this article you will never again (in a program analysis context) be tempted to use them.

source: HN

Parsing: The Solved Problem That Isn't

https://tratt.net/laurie/blog/entries/parsing_the_solved_problem_that_isnt.html [tratt.net]

2019-04-10 15:51

tags:
compsci
text

> One of the things that’s become increasingly obvious to me over the past few years is that the general consensus breaks down for one vital emerging trend: language composition. Composition is one of those long, complicated, but often vague terms that crops up a lot in theoretical work. Fortunately, for our purposes it means something simple: grammar composition, which is where we add one grammar to another and have the combined grammar parse text in the new language (exactly the sort of thing we want to do with Domain Specific Languages (DSLs)). To use a classic example, imagine that we wish to extend a Java-like language with SQL…

Keeping CALM: when distributed consistency is easy

https://blog.acolyer.org/2019/03/06/keeping-calm-when-distributed-consistency-is-easy/ [blog.acolyer.org]

2019-03-06 22:26

tags:
compsci
concurrency
database
development
paper
perf

> When it comes to high performing scalable distributed systems, coordination is a killer. It’s the dominant term in the Universal Scalability Law. When we can avoid or reduce the need for coordination things tend to get simpler and faster. See for example Coordination avoidance in database systems, and more recently the amazing performance of Anna which gives a two-orders-of-magnitude speed-up through coordination elimination. So we should avoid coordination whenever we can.

> So far so good, but when exactly can we avoid coordination? Becoming precise in the answer to that question is what the CALM theorem is all about. You’re probably familiar with Brooks’ distinction between essential complexity and accidental complexity in his ‘No silver bullet’ essay. Here we get to tease apart the distinction between essential coordination, a guarantee that cannot be provided without coordinating, and accidental coordination, coordination that could have been avoided with a more careful design.

Also: https://blog.acolyer.org/2019/03/08/a-generalised-solution-to-distributed-consensus/

A Fairly Fast Fibonacci Function

http://www.oranlooney.com/post/fibonacci/ [www.oranlooney.com]

2019-02-22 02:05

tags:
compsci
math
perf
programming

> Fair warning: this is a bit of a rabbit hole, with no other purpose than to optimize the hell out of something for which there is frankly no practical use. But we get to do a bit of linear algebra and try out some pretty interesting optimization techniques; that’s what I call a good time!
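
One of the techniques in that family (equivalent to repeated squaring of the 2×2 Fibonacci matrix, though not necessarily the post's final code) is fast doubling:

```python
def fib(n):
    # Fast doubling, derived from squaring the 2x2 Fibonacci matrix:
    #   F(2k)   = F(k) * (2*F(k+1) - F(k))
    #   F(2k+1) = F(k)^2 + F(k+1)^2
    def pair(k):  # returns (F(k), F(k+1))
        if k == 0:
            return (0, 1)
        a, b = pair(k >> 1)
        c = a * (2 * b - a)
        d = a * a + b * b
        return (d, c + d) if k & 1 else (c, d)
    return pair(n)[0]

print(fib(10))   # 55
print(fib(100))  # 354224848179261915075
```

Only O(log n) doubling steps are needed, versus n additions for the naive loop.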

source: HN

A Simple Explanation for the Existence of Adversarial Examples with Small Hamming Distance

https://arxiv.org/abs/1901.10861 [arxiv.org]

2019-02-12 18:45

tags:
ai
compsci
paper
pdf
security

> The existence of adversarial examples in which an imperceptible change in the input can fool well trained neural networks was experimentally discovered by Szegedy et al. in 2013, who called them “Intriguing properties of neural networks”. Since then, this topic had become one of the hottest research areas within machine learning, but the ease with which we can switch between any two decisions in targeted attacks is still far from being understood, and in particular it is not clear which parameters determine the number of input coordinates we have to change in order to mislead the network. In this paper we develop a simple mathematical framework which enables us to think about this baffling phenomenon from a fresh perspective, turning it into a natural consequence of the geometry of {R}^n with the L_0 (Hamming) metric, which can be quantitatively analyzed. In particular, we explain why we should expect to find targeted adversarial examples with Hamming distance of roughly m in arbitrarily deep neural networks which are designed to distinguish between m input classes.

source: grugq

Modern LZ Compression

https://glinscott.github.io/lz/index.html [glinscott.github.io]

2019-02-03 18:08

tags:
compression
compsci
cxx
programming

> This article walks through the components of a modern LZ compressor.

> LZ compression has been around for a long time, but there are still major improvements being made. The deflate algorithm is perhaps the most widely used, implemented initially in PKZIP, and these days finding broad usage in gzip. Modern incarnations are much more powerful, and use a wide array of new tricks. Two examples of the next generation are zstd and lzma. For a comprehensive overview of open source LZ compressors, check out lzbench.

source: L

It’s Time for Some Queueing Theory

https://kottke.org/19/01/its-time-for-some-queueing-theory [kottke.org]

2019-01-30 22:26

tags:
compsci
links

> Queueing theory is the scientific study of waiting in line. It can apply to familiar lines like those at the grocery store or bank but also to things like web servers, highway traffic, and telecommunications…basically any situation where you have things entering a system, being processed by a system for a certain period of time, and leaving the system.
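
The "entering, being processed, leaving" dynamic is easy to see in a single-server FIFO sketch (my own toy, not from the linked pieces): when jobs arrive faster than they can be served, waits grow without bound.

```python
def fifo_waits(arrivals, service_times):
    # Single-server FIFO queue: each job starts when both it has arrived
    # and the server has finished every job before it.
    server_free, waits = 0, []
    for arrive, service in zip(arrivals, service_times):
        start = max(arrive, server_free)
        waits.append(start - arrive)
        server_free = start + service
    return waits

# Jobs arrive every 2 time units but take 3 to serve: waits grow linearly.
print(fifo_waits([0, 2, 4, 6], [3, 3, 3, 3]))  # [0, 1, 2, 3]
```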

Assorted stories and links.

source: K

Smoothsort Demystified

http://www.keithschwarz.com/smoothsort/ [www.keithschwarz.com]

2019-01-28 06:52

tags:
compsci
programming
sorting

> A few years ago I heard about an interesting sorting algorithm (invented by the legendary Edsger Dijkstra) called smoothsort with great memory and runtime guarantees. Although it is a comparison sort and thus on average cannot run faster than Ω(n lg n), smoothsort is an adaptive sort, meaning that if the input is somewhat sorted, smoothsort will run in time closer to O(n) than O(n lg n). In the best case, when the input is sorted, smoothsort will run in linear time. Moreover, smoothsort is an in-place sorting algorithm, and requires O(1) auxiliary storage space. Compare this to mergesort, which needs O(n) auxiliary space, or quicksort, which needs O(lg n) space on average and O(n) space in the worst case. In the worst case, smoothsort is an asymptotically optimal O(n lg n) sort. With all these advantages, smoothsort seemed like an excellent sort to code up and add to my archive of interesting code projects, and I set out to learn enough about it to build my own implementation.

> Unfortunately, I quickly found out that smoothsort is one of the least-documented sorts on the web. Sure, you can find many sites that mention smoothsort or its runtime characteristics, and in a few cases sites that provide an implementation, but hardly any sites explained the full intuition behind how the sort worked or where the runtime guarantees came from. Moreover, Dijkstra’s original paper on smoothsort is extremely difficult to read and gives little intuition behind some of the trickier optimizations. After spending a few days reading over existing sites and doing a bit of my own work, I finally managed to figure out the intuition behind smoothsort, as well as the source of many of the complex optimizations necessary to get smoothsort working in constant space. It turns out that smoothsort is actually a generalization of heapsort using a novel heap data structure. Surprisingly, I haven’t found this structure mentioned anywhere on the web, and this page may be the first time it’s been mentioned online.

source: HN

Algorithms by Jeff Erickson

http://jeffe.cs.illinois.edu/teaching/algorithms/ [jeffe.cs.illinois.edu]

2019-01-02 21:54

tags:
academia
book
compsci
dupe
reference
release

> This web page contains a free electronic version of my (soon to be) self-published textbook Algorithms, along with other lecture notes I have written for various theoretical computer science classes at the University of Illinois, Urbana-Champaign since 1998.

> 0th edition (prepublication draft), December 2018

Automating data-only attacks through Block Oriented Programming (BOP)

https://nebelwelt.net/blog/20181231-BOP.html [nebelwelt.net]

2019-01-01 02:44

tags:
compiler
compsci
exploit
programming
security

> With the rise of strong control-flow defenses such as Control-Flow Integrity (CFI), attackers will increasingly resort to data-only attacks that can be equally powerful. Earlier research demonstrated that data-only attacks can be as devastating as control-flow hijacking attacks. So far, constructing data-only attacks was cumbersome and required deep manual analysis. We introduce the idea of Block-Oriented Programming (BOP) where, based on a C-like programming language and the help of constraint solving, we automatically synthesize data-only exploits that run arbitrary payloads on host programs.

MinHashing

https://moultano.wordpress.com/2018/11/08/minhashing-3kbzhsxyg4467-6/ [moultano.wordpress.com]

2018-12-25 03:05

tags:
compsci
hash
paper

> At any rate, you would like to make clusters out of your data, but you only get to look at each item once in isolation. After looking at it you have to decide what cluster it should go to, at that moment, without looking at any other information, or any other items in your dataset. You only get one shot, do not throw it away! How can we accomplish this?
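
The primitive that makes this one-shot clustering possible is the MinHash sketch itself: the probability that two sets share a minimum hash equals their Jaccard similarity. A minimal illustration (using Python's built-in `hash` of (seed, element) tuples as the hash family, which is a simplification of what a production MinHash would use):

```python
def minhash_signature(items, seeds):
    # One signature slot per seeded hash function: the minimum hash
    # value over all elements of the set.
    return [min(hash((seed, x)) for x in items) for seed in seeds]

def estimate_jaccard(sig_a, sig_b):
    # P(min-hashes agree) equals the Jaccard similarity |A∩B| / |A∪B|,
    # so the fraction of agreeing slots estimates it.
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

a = set(range(0, 80))
b = set(range(40, 120))  # true Jaccard = 40/120 = 1/3
seeds = range(200)
est = estimate_jaccard(minhash_signature(a, seeds), minhash_signature(b, seeds))
print(round(est, 2))  # ≈ 0.33, up to sampling noise
```

Crucially for the streaming setting in the post, a signature can be maintained incrementally as elements arrive, so each item really is looked at only once.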