Class: Kiba::Extend::Transforms::Clean::RegexpFindReplaceFieldVals

Inherits:
Object
  • Object
show all
Includes:
Allable, MultivalPlusDelimDeprecatable, SepDeprecatable
Defined in:
lib/kiba/extend/transforms/clean/regexp_find_replace_field_vals.rb

Overview

Performs specified regular expression find/replace in the specified field(s)

Examples:

Basic match(default with find passed as String)

# Used in pipeline as:
# transform Clean::RegexpFindReplaceFieldVals,
#   fields: :val,
#   find: 'xx+',
#   replace: 'exes'
xform = Clean::RegexpFindReplaceFieldVals.new(
  fields: :val,
  find: 'xx+',
  replace: 'exes'
)
input = [
  {val: 'xxxxxx a thing'},
  {val: 'thing xxxx 123'},
  {val: 'x files'}
]
result = input.map{ |row| xform.process(row) }
expected = [
  {val: 'exes a thing'},
  {val: 'thing exes 123'},
  {val: 'x files'}
]
expect(result).to eq(expected)

Handles start/end anchors, find passed as Regexp

xform = Clean::RegexpFindReplaceFieldVals.new(
  fields: :val,
  find: /^xx+/,
  replace: 'exes'
)
input = [
  {val: 'xxxxxx a thing'},
  {val: 'thing xxxx 123'}
]
result = input.map{ |row| xform.process(row) }
expected = [
  {val: 'exes a thing'},
  {val: 'thing xxxx 123'}
]
expect(result).to eq(expected)

Case insensitive

xform = Clean::RegexpFindReplaceFieldVals.new(
  fields: :val,
  find: 'thing',
  replace: 'object',
  casesensitive: false
)
input = [
  {val: 'the thing'},
  {val: 'The Thing'}
]
result = input.map{ |row| xform.process(row) }
expected = [
  {val: 'the object'},
  {val: 'The object'}
]
expect(result).to eq(expected)

Case insensitive regexp

xform = Clean::RegexpFindReplaceFieldVals.new(
  fields: :val,
  find: /thing/i,
  replace: 'object'
)
input = [
  {val: 'the thing'},
  {val: 'The Thing'}
]
result = input.map{ |row| xform.process(row) }
expected = [
  {val: 'the object'},
  {val: 'The object'}
]
expect(result).to eq(expected)

Matching/replacing line breaks (note double quotes)

xform = Clean::RegexpFindReplaceFieldVals.new(
  fields: :val,
  find: "\n",
  replace: ''
)
s1 = <<~STR

       pace/mcgill
     STR
s2 = <<~STR
       pace/mcgill

     STR
input = [
  {val: s1},
  {val: s2},
]
result = input.map{ |row| xform.process(row) }
expected = [
  {val: 'pace/mcgill'},
  {val: 'pace/mcgill'}
]
expect(result).to eq(expected)

With capture groups

xform = Clean::RegexpFindReplaceFieldVals.new(
  fields: :val,
  find: '^(a) (thing)',
  replace: 'about \1 curious \2'
)
input = [
  {val: 'a thing'}
]
result = input.map{ |row| xform.process(row) }
expected = [
  {val: 'about a curious thing'},
]
expect(result).to eq(expected)

When result is empty string

xform = Clean::RegexpFindReplaceFieldVals.new(
  fields: :val,
  find: 'xx+',
  replace: ''
)
input = [
  {val: nil},
  {val: []},
  {val: ''},
  {val: 'xxxxx'}
]
result = input.map{ |row| xform.process(row) }
expected = [
  {val: nil},
  {val: []},
  {val: nil},
  {val: nil}
]
expect(result).to eq(expected)

With multiple fields

xform = Clean::RegexpFindReplaceFieldVals.new(
  fields: %i[val another],
  find: 'xx+',
  replace: ''
)
input = [
  {val: 'xxxx1', another: 'xxxx2xxxx'}
]
result = input.map{ |row| xform.process(row) }
expected = [
  {val: '1', another: '2'}
]
expect(result).to eq(expected)

With fields: :all

xform = Clean::RegexpFindReplaceFieldVals.new(
  fields: :all,
  find: 'xx+',
  replace: ''
)
input = [
  {val: 'xxxx1', another: 'xxxx2xxxx'}
]
result = input.map{ |row| xform.process(row) }
expected = [
  {val: '1', another: '2'}
]
expect(result).to eq(expected)

With debug: true

xform = Clean::RegexpFindReplaceFieldVals.new(
  fields: :val,
  find: 's$',
  replace: '',
  debug: true
)
input = [
  {val: 'bats|bats'}
]
result = input.map{ |row| xform.process(row) }
expected = [
  {val: 'bats|bats', val_repl: 'bats|bat'}
]
expect(result).to eq(expected)

With multival: true and :sep

xform = Clean::RegexpFindReplaceFieldVals.new(
  fields: :val,
  find: 's$',
  replace: '',
  multival: true,
  sep: ';'
)
input = [
  {val: 'bats;bats'}
]
result = input.map{ |row| xform.process(row) }
expected = [
  {val: 'bat;bat'}
]
expect(result).to eq(expected)

With multival: true and no :sep

Kiba::Extend.config.delim = '|'
xform = Clean::RegexpFindReplaceFieldVals.new(
  fields: :val,
  find: 's$',
  replace: '',
  multival: true
)
input = [
  {val: 'bats|bats'}
]
result = input.map{ |row| xform.process(row) }
Kiba::Extend.reset_config
expected = [
  {val: 'bat|bat'}
]
expect(result).to eq(expected)

With no multival param and delim

xform = Clean::RegexpFindReplaceFieldVals.new(
  fields: :val,
  find: 's$',
  replace: '',
  delim: "|"
)
input = [
  {val: 'bats|bats'}
]
result = input.map{ |row| xform.process(row) }
expected = [
  {val: 'bat|bat'}
]
expect(result).to eq(expected)

Instance Method Summary collapse

Methods included from SepDeprecatable

#usedelim

Methods included from MultivalPlusDelimDeprecatable

#set_multival

Constructor Details

#initialize(fields:, find:, replace:, casesensitive: true, multival: omitted = true, sep: nil, delim: nil, debug: false) ⇒ RegexpFindReplaceFieldVals

Returns a new instance of RegexpFindReplaceFieldVals.

Parameters:

  • fields (Array<Symbol>, Symbol, nil)

    in which to find/replace

  • find (String, Regexp)

    If passing a string, make sure to use double quotes to match slash escaped characters (\n, etc)

  • replace (String)
  • casesensitive (Boolean) (defaults to: true)
  • multival (Boolean) (defaults to: omitted = true)

    DEPRECATED - Do not use

  • sep (String, nil) (defaults to: nil)

    DEPRECATED - Do not use

  • delim (nil, String) (defaults to: nil)

    used to split the field value before performing find/replace if non-nil

  • debug (Boolean) (defaults to: false)

    if true, will put replacement value in a new field. New field name is same as old field name, with “_repl” suffix added



250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
# File 'lib/kiba/extend/transforms/clean/regexp_find_replace_field_vals.rb', line 250

def initialize(fields:, find:, replace:, casesensitive: true,
  multival: omitted = true, sep: nil, delim: nil,
  debug: false)
  @fields = [fields].flatten
  @find = build_pattern(find, casesensitive)
  @replace = replace
  @debug = debug
  @mv = if omitted && delim
    true
  else
    set_multival(multival, omitted, self)
  end

  if sep.nil? && delim.nil? && mv && !omitted
    msg = "If you are expecting Kiba::Extend.delim to be used as "\
      "default `sep` value, please pass it as explicit `delim` "\
      "argument. In a future release of kiba-extend, the `delim` "\
      "value will no longer default to Kiba::Extend.delim."
    warn("#{Kiba::Extend.warning_label}:\n  #{self.class}: #{msg}")
    sep = Kiba::Extend.delim
  end
  @delim = usedelim(sepval: sep, delimval: delim, calledby: self,
    default: nil)
end

Instance Method Details

#process(row) ⇒ Object

Parameters:

  • row (Hash{ Symbol => String, nil })


276
277
278
279
280
281
282
283
284
285
286
287
288
289
# File 'lib/kiba/extend/transforms/clean/regexp_find_replace_field_vals.rb', line 276

def process(row)
  finalize_fields(row)

  fields.each do |field|
    oldval = row.fetch(field, nil)
    next if oldval.nil?
    next unless oldval.is_a?(String)

    newval = mv ? mv_find_replace(oldval) : sv_find_replace(oldval)
    target = debug ? :"#{field}_repl" : field
    row[target] = newval.blank? ? nil : newval
  end
  row
end