Class: Kiba::Extend::Transforms::Clean::RegexpFindReplaceFieldVals

Inherits:
Object
Includes:
Allable
Defined in:
lib/kiba/extend/transforms/clean/regexp_find_replace_field_vals.rb

Overview

Performs the specified regular expression find/replace in the specified field(s).

Examples:

Basic match (default, with find passed as String)

# Used in pipeline as:
# transform Clean::RegexpFindReplaceFieldVals,
#   fields: :val,
#   find: 'xx+',
#   replace: 'exes'
xform = Clean::RegexpFindReplaceFieldVals.new(
  fields: :val,
  find: 'xx+',
  replace: 'exes'
)
input = [
  {val: 'xxxxxx a thing'},
  {val: 'thing xxxx 123'},
  {val: 'x files'}
]
result = input.map{ |row| xform.process(row) }
expected = [
  {val: 'exes a thing'},
  {val: 'thing exes 123'},
  {val: 'x files'}
]
expect(result).to eq(expected)

Handles start/end anchors, find passed as Regexp

xform = Clean::RegexpFindReplaceFieldVals.new(
  fields: :val,
  find: /^xx+/,
  replace: 'exes'
)
input = [
  {val: 'xxxxxx a thing'},
  {val: 'thing xxxx 123'}
]
result = input.map{ |row| xform.process(row) }
expected = [
  {val: 'exes a thing'},
  {val: 'thing xxxx 123'}
]
expect(result).to eq(expected)

Case insensitive

xform = Clean::RegexpFindReplaceFieldVals.new(
  fields: :val,
  find: 'thing',
  replace: 'object',
  casesensitive: false
)
input = [
  {val: 'the thing'},
  {val: 'The Thing'}
]
result = input.map{ |row| xform.process(row) }
expected = [
  {val: 'the object'},
  {val: 'The object'}
]
expect(result).to eq(expected)

Case insensitive regexp

xform = Clean::RegexpFindReplaceFieldVals.new(
  fields: :val,
  find: /thing/i,
  replace: 'object'
)
input = [
  {val: 'the thing'},
  {val: 'The Thing'}
]
result = input.map{ |row| xform.process(row) }
expected = [
  {val: 'the object'},
  {val: 'The object'}
]
expect(result).to eq(expected)

Matching/replacing line breaks (note double quotes)

xform = Clean::RegexpFindReplaceFieldVals.new(
  fields: :val,
  find: "\n",
  replace: ''
)
s1 = <<~STR

       pace/mcgill
     STR
s2 = <<~STR
       pace/mcgill

     STR
input = [
  {val: s1},
  {val: s2},
]
result = input.map{ |row| xform.process(row) }
expected = [
  {val: 'pace/mcgill'},
  {val: 'pace/mcgill'}
]
expect(result).to eq(expected)

With capture groups

xform = Clean::RegexpFindReplaceFieldVals.new(
  fields: :val,
  find: '^(a) (thing)',
  replace: 'about \1 curious \2'
)
input = [
  {val: 'a thing'}
]
result = input.map{ |row| xform.process(row) }
expected = [
  {val: 'about a curious thing'},
]
expect(result).to eq(expected)
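
The backreference syntax in the example above is standard Ruby replacement-string syntax. A minimal plain-Ruby sketch, assuming (not confirmed on this page) that the transform ultimately passes the pattern and replacement to String#gsub. It shows why single quotes are convenient for the replace value: in double quotes the backslashes must be doubled.

```ruby
# Plain Ruby only; no kiba-extend classes are used here.
pattern = Regexp.new('^(a) (thing)')

# Single quotes: \1 and \2 reach gsub unchanged and act as backreferences.
single = 'a thing'.gsub(pattern, 'about \1 curious \2')

# Double quotes: each backslash must be escaped to get the same string.
double = 'a thing'.gsub(pattern, "about \\1 curious \\2")

puts single
puts double
```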

When result is empty string

xform = Clean::RegexpFindReplaceFieldVals.new(
  fields: :val,
  find: 'xx+',
  replace: ''
)
input = [
  {val: nil},
  {val: []},
  {val: ''},
  {val: 'xxxxx'}
]
result = input.map{ |row| xform.process(row) }
expected = [
  {val: nil},
  {val: []},
  {val: nil},
  {val: nil}
]
expect(result).to eq(expected)
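
The behavior in the example above (non-String values left alone, empty results turned into nil) can be approximated in plain Ruby. This is a sketch only: replace_or_nil_sketch is a hypothetical name, and strip.empty? stands in for the blank? check used by the actual method.

```ruby
# Hypothetical helper; approximates the guard and blank-to-nil steps.
def replace_or_nil_sketch(value, pattern, replacement)
  # nil, [] and any other non-String value are returned untouched.
  return value unless value.is_a?(String)

  newval = value.gsub(pattern, replacement)
  # An empty or whitespace-only result is stored as nil.
  newval.strip.empty? ? nil : newval
end

results = [nil, [], '', 'xxxxx', 'xx1'].map do |value|
  replace_or_nil_sketch(value, /xx+/, '')
end
p results
```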

With multiple fields

xform = Clean::RegexpFindReplaceFieldVals.new(
  fields: %i[val another],
  find: 'xx+',
  replace: ''
)
input = [
  {val: 'xxxx1', another: 'xxxx2xxxx'}
]
result = input.map{ |row| xform.process(row) }
expected = [
  {val: '1', another: '2'}
]
expect(result).to eq(expected)

With fields: :all

xform = Clean::RegexpFindReplaceFieldVals.new(
  fields: :all,
  find: 'xx+',
  replace: ''
)
input = [
  {val: 'xxxx1', another: 'xxxx2xxxx'}
]
result = input.map{ |row| xform.process(row) }
expected = [
  {val: '1', another: '2'}
]
expect(result).to eq(expected)

With debug: true

xform = Clean::RegexpFindReplaceFieldVals.new(
  fields: :val,
  find: 's$',
  replace: '',
  debug: true
)
input = [
  {val: 'bats|bats'}
]
result = input.map{ |row| xform.process(row) }
expected = [
  {val: 'bats|bats', val_repl: 'bats|bat'}
]
expect(result).to eq(expected)

With multival: true and :sep

xform = Clean::RegexpFindReplaceFieldVals.new(
  fields: :val,
  find: 's$',
  replace: '',
  multival: true,
  sep: ';'
)
input = [
  {val: 'bats;bats'}
]
result = input.map{ |row| xform.process(row) }
expected = [
  {val: 'bat;bat'}
]
expect(result).to eq(expected)

With multival: true and no :sep

Kiba::Extend.config.delim = '|'
xform = Clean::RegexpFindReplaceFieldVals.new(
  fields: :val,
  find: 's$',
  replace: '',
  multival: true
)
input = [
  {val: 'bats|bats'}
]
result = input.map{ |row| xform.process(row) }
Kiba::Extend.reset_config
expected = [
  {val: 'bat|bat'}
]
expect(result).to eq(expected)

Constructor Details

#initialize(fields:, find:, replace:, casesensitive: true, multival: false, sep: nil, debug: false) ⇒ RegexpFindReplaceFieldVals

Returns a new instance of RegexpFindReplaceFieldVals.

Parameters:

  • fields (Array<Symbol>, Symbol, nil)

    in which to find/replace

  • find (String, Regexp)

    If passing a String, use double quotes so that backslash-escaped characters (\n, etc.) are matched

  • replace (String)
  • casesensitive (Boolean) (defaults to: true)
  • multival (Boolean) (defaults to: false)
  • sep (String, nil) (defaults to: nil)

    used only if multival: true; if not given, defaults to the Kiba::Extend.delim value

  • debug (Boolean) (defaults to: false)

    if true, writes the replacement value to a new field and leaves the original field unchanged. The new field name is the original field name with a "_repl" suffix



# File 'lib/kiba/extend/transforms/clean/regexp_find_replace_field_vals.rb', line 232

def initialize(fields:, find:, replace:, casesensitive: true,
  multival: false, sep: nil, debug: false)
  @fields = [fields].flatten
  @find = build_pattern(find, casesensitive)
  @replace = replace
  @debug = debug
  @mv = multival
  @sep = set_sep(sep)
end
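
The build_pattern helper called above is private and its source is not shown on this page. A minimal sketch of what such a helper might do, consistent with the examples: a Regexp keeps its own flags, while a String is compiled with or without the ignore-case option. The name build_pattern_sketch and its body are assumptions for illustration.

```ruby
# Hypothetical stand-in for the private build_pattern helper.
def build_pattern_sketch(find, casesensitive)
  # Assumption: a Regexp passes through untouched, keeping its own flags.
  return find if find.is_a?(Regexp)

  casesensitive ? Regexp.new(find) : Regexp.new(find, Regexp::IGNORECASE)
end

sensitive = build_pattern_sketch('thing', true)
insensitive = build_pattern_sketch('thing', false)
passthrough = build_pattern_sketch(/thing/i, true)

puts 'The Thing'.gsub(sensitive, 'object')
puts 'The Thing'.gsub(insensitive, 'object')
puts 'The Thing'.gsub(passthrough, 'object')
```

Under this sketch, casesensitive only affects String patterns; to make a Regexp case-insensitive, the /i flag goes on the Regexp itself, as in the "Case insensitive regexp" example.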

Instance Method Details

#process(row) ⇒ Object

Parameters:

  • row (Hash{ Symbol => String, nil })


# File 'lib/kiba/extend/transforms/clean/regexp_find_replace_field_vals.rb', line 243

def process(row)
  finalize_fields(row)

  fields.each do |field|
    oldval = row.fetch(field, nil)
    next if oldval.nil?
    next unless oldval.is_a?(String)

    newval = mv ? mv_find_replace(oldval) : sv_find_replace(oldval)
    target = debug ? "#{field}_repl".to_sym : field
    row[target] = newval.blank? ? nil : newval
  end
  row
end
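
The sv_find_replace and mv_find_replace helpers are private and not shown here. The sketch below is an assumption about their general shape, written in plain Ruby to show why an end anchor such as s$ behaves differently with and without multival: true. The method names ending in _sketch are hypothetical.

```ruby
# Hypothetical stand-ins for the private single-value and multi-value helpers.
def sv_find_replace_sketch(value, pattern, replacement)
  # The whole field value is treated as one string.
  value.gsub(pattern, replacement)
end

def mv_find_replace_sketch(value, pattern, replacement, sep)
  # Assumption: split on the separator, replace within each piece, rejoin.
  # The -1 limit keeps trailing empty values instead of dropping them.
  value.split(sep, -1)
    .map { |piece| piece.gsub(pattern, replacement) }
    .join(sep)
end

pattern = /s$/
whole = sv_find_replace_sketch('bats|bats', pattern, '')
each_value = mv_find_replace_sketch('bats|bats', pattern, '', '|')

puts whole
puts each_value
```

Without splitting, the anchor only matches at the end of the full field value; with splitting, it matches at the end of every delimited value.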