RNG.md

Use yes to reproduce flaky tests

I use yes to saturate CPUs and reproduce flaky test failures. Pair it with a small loop helper to rerun the test until it fails.

SlateDB occasionally has a flaky test failure, one that occurs randomly. Such failures are usually timing dependent, and they usually crop up as GitHub Actions failures. GitHub’s runners are notoriously unstable; they are often overloaded and you have noisy neighbors.

I find that I need to saturate my laptop’s CPU to replicate the flaky test failure. To do so, I use the yes command.

NAME
     yes – be repetitively affirmative

SYNOPSIS
     yes [expletive]

DESCRIPTION
     The yes utility outputs expletive, or, by default, “y”, forever.

SEE ALSO
     jot(1), seq(1)

HISTORY
     The yes command appeared in Version 7 AT&T UNIX.

I found the trick on Stack Overflow a while back and have been using it ever since. In fact, I’ve added a saturate function to my .zshrc:

# With an argument, spawn N background `yes` processes writing to /dev/null.
# With no argument, print the number of running `yes` processes.
#
# Usage:
#   saturate       # print count
#   saturate 20    # spawn 20 processes
saturate() {
  if [[ $# -eq 0 ]]; then
    pgrep -x yes | wc -l
    return
  fi

  local n="$1"
  local i

  for ((i = 0; i < n; i++)); do
    yes > /dev/null &
  done
}

The comment above is pretty self-explanatory. I usually run saturate 40. Once I’m done, I run pkill -9 yes to kill the background yes processes.
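One caveat: pkill -9 yes signals every process named yes that you are allowed to signal, not just the ones saturate started. A minimal sketch of a variant that remembers the PIDs it spawned is below. It assumes zsh or bash, and saturate_tracked, desaturate, and SATURATE_PIDS are hypothetical names made up for illustration, not part of the setup described above.

```shell
# Hypothetical variant of saturate that records the PID of each `yes` it spawns.
# Assumes zsh or bash (uses arrays and C-style for loops).
SATURATE_PIDS=()

# Usage: saturate_tracked 20   # spawn 20 processes (defaults to 1)
saturate_tracked() {
  local n="${1:-1}"
  local i

  for ((i = 0; i < n; i++)); do
    yes > /dev/null &
    SATURATE_PIDS+=("$!")   # $! is the PID of the most recent background job
  done
}

# Stop only the processes recorded by saturate_tracked.
desaturate() {
  local pid

  for pid in "${SATURATE_PIDS[@]}"; do
    kill "$pid" 2>/dev/null || true   # plain SIGTERM is enough for `yes`
    wait "$pid" 2>/dev/null || true   # reap the child so no zombie lingers
  done
  SATURATE_PIDS=()
}
```

With this, saturate_tracked 40 followed by desaturate leaves any unrelated yes processes alone.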

This command pairs nicely with a loop command that I have in .zshrc:

# Little loop helper function
# Call loop <command> to run in a loop until a non-zero exit is returned.
loop() {
  local count=1
  while true; do
    echo "loop_iter(#$count)"
    "$@" || break
    count=$((count+1))
  done
}

This allows me to run a test in a loop until it fails. For example, I can run loop cargo test --test flaky_test. Put together, the commands look like:

saturate 40
loop cargo test --test flaky_test
pkill -9 yes
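If the test never fails, loop keeps running until you interrupt it. A sketch of a bounded variant is below, assuming zsh or bash. The name loop_max and its leading iteration-cap argument are hypothetical additions for illustration. Unlike the helper above, it stops after a fixed number of passes and returns the failing command's exit status.

```shell
# Hypothetical bounded variant of loop.
# Usage: loop_max <max_iterations> <command> [args...]
# Runs the command until it exits non-zero or max_iterations is reached.
loop_max() {
  local max="$1"
  shift

  local count=1
  local rc=0   # named rc because `status` is a read-only special parameter in zsh

  while ((count <= max)); do
    echo "loop_iter(#$count)"
    "$@" || { rc=$?; break; }   # remember the failing exit status, then stop
    count=$((count + 1))
  done

  return "$rc"
}
```

For example, loop_max 200 cargo test --test flaky_test gives up after 200 clean runs and returns 0, or returns the test's non-zero exit status on the first failure.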
