This blog post is part of our Ruby 2.6 series. Ruby 2.6.0-preview2 was recently released.

Before Ruby 2.6, String#split always returned an array of the split substrings.

In Ruby 2.6, String#split also accepts a block. Each substring is yielded to the block as it is produced, so no result array is built, which saves memory.

We will add a method, is_fruit?, to show how to use split with a block.

def is_fruit?(value)
  %w(apple mango banana watermelon grapes guava lychee).include?(value)
end

The input is a comma-separated string of vegetable and fruit names. The goal is to pick out the fruit names from the input string and store them in an array.

String#split
input_str = "apple, mango, potato, banana, cabbage, watermelon, grapes"

splitted_values = input_str.split(", ")
=> ["apple", "mango", "potato", "banana", "cabbage", "watermelon", "grapes"]

fruits = splitted_values.select { |value| is_fruit?(value) }
=> ["apple", "mango", "banana", "watermelon", "grapes"]

With plain split, an intermediate array containing both fruit and vegetable names is created before we filter it.
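To get a rough sense of what that intermediate array costs on larger inputs, the sketch below measures the array's own buffer with ObjectSpace.memsize_of from the objspace standard library. It assumes CRuby (other implementations may report sizes differently), and the input string is a hypothetical one made up for illustration, not data from this post.

```ruby
require 'objspace'

# Hypothetical input: 100_000 identical ten-character words separated by spaces.
word_count = 100_000
test_string = "abcdefghij " * word_count

# Plain split builds an array holding every substring before any filtering.
intermediate = test_string.split(' ')

# memsize_of reports the memory used by the array object itself
# (its pointer buffer), not by the strings it references.
array_bytes = ObjectSpace.memsize_of(intermediate)

puts "elements:     #{intermediate.size}"
puts "array buffer: #{array_bytes} bytes (strings not included)"
```

The exact byte count depends on the Ruby version and platform, so treat the number as an indication rather than a precise measurement.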

String#split with a block
fruits = []

input_str = "apple, mango, potato, banana, cabbage, watermelon, grapes"

input_str.split(", ") { |value| fruits << value if is_fruit?(value) }
=> "apple, mango, potato, banana, cabbage, watermelon, grapes"

fruits
=> ["apple", "mango", "banana", "watermelon", "grapes"]

When a block is passed to split, it returns the string on which split was called and does not create an array. String#split yields each substring to the block, which in our case pushes the fruit names into a separate array.
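The behaviour described above can be checked with a short sketch. It assumes Ruby 2.6 or later; the FRUITS constant and the sample strings are hypothetical names chosen for illustration. It also shows two things the walkthrough above does not cover: the optional limit argument still applies when a block is given, and enum_for can wrap the block form so that Enumerable methods such as select can be chained.

```ruby
# Assumes Ruby 2.6+, where String#split accepts a block.
FRUITS = %w(apple mango banana).freeze # hypothetical list for this sketch

input = "apple, potato, mango, cabbage"

# With a block, split yields each substring and returns the receiver itself.
fruits = []
result = input.split(", ") { |value| fruits << value if FRUITS.include?(value) }

puts result.equal?(input) # same object as the receiver
p fruits

# The limit argument is honoured in the block form as well.
parts = []
"a,b,c,d".split(",", 2) { |part| parts << part }
p parts

# enum_for wraps the block form in an Enumerator, so select can be used
# without first building the full array of substrings.
selected = input.enum_for(:split, ", ").select { |value| FRUITS.include?(value) }
p selected
```

The enum_for variant trades a little overhead for readability; for hot paths, the explicit block with an accumulator array is the more direct choice.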

Update

Benchmark

We created a large random string to benchmark split against split with a block.

require 'securerandom'

test_string = ''

100_000.times do
  test_string += SecureRandom.alphanumeric(10)
  test_string += ' '
end

require 'benchmark'

Benchmark.bmbm do |bench|

  bench.report('split') do
    arr = test_string.split(' ')
    str_starts_with_a = arr.select { |str| str.start_with?('a') }
  end

  bench.report('split with block') do
    str_starts_with_a = []
    test_string.split(' ') { |str| str_starts_with_a << str if str.start_with?('a') }
  end

end

Results

Rehearsal ----------------------------------------------------
split              0.023764   0.000911   0.024675 (  0.024686)
split with block   0.012892   0.000553   0.013445 (  0.013486)
------------------------------------------- total: 0.038120sec

                       user     system      total        real
split              0.024107   0.000487   0.024594 (  0.024622)
split with block   0.010613   0.000334   0.010947 (  0.010991)

We ran the benchmark again using the benchmark-ips gem (required as benchmark/ips).

require 'benchmark/ips'

Benchmark.ips do |bench|

  bench.report('split') do
    splitted_arr = test_string.split(' ')
    str_starts_with_a = splitted_arr.select { |str| str.start_with?('a') }
  end

  bench.report('split with block') do
    str_starts_with_a = []
    test_string.split(' ') { |str| str_starts_with_a << str if str.start_with?('a') }
  end

  bench.compare!
end

Results

Warming up --------------------------------------
               split     4.000  i/100ms
    split with block    10.000  i/100ms
Calculating -------------------------------------
               split     46.906  (± 2.1%) i/s -    236.000  in   5.033343s
    split with block    107.301  (± 1.9%) i/s -    540.000  in   5.033614s

Comparison:
    split with block:      107.3 i/s
               split:       46.9 i/s - 2.29x  slower

In this benchmark, split with a block is about 2.3 times faster than split followed by select (107.3 vs. 46.9 iterations per second).

Here are the relevant commit and discussion for this change.