Class: Kiba::Extend::Transforms::Deduplicate::GroupedFieldValues

Inherits:
Object
Includes:
SepDeprecatable
Defined in:
lib/kiba/extend/transforms/deduplicate/grouped_field_values.rb

Overview

Note:

Tread with caution: this transform has not been used much and is not extensively tested.

Deduplicates the values of a field in a way that is at least semi-safe for grouped fields, where every field in the grouping is expected to hold the same number of values.
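
To illustrate why grouped fields need special handling, here is a minimal plain-Ruby sketch (it does not use this library) of what goes wrong when each field is deduplicated independently. The row contents are hypothetical.

```ruby
# Hypothetical row: three names with positionally matched roles.
row = {name: "Fred;Jan;Fred", role: "auth;auth;ed"}

# Naive approach: deduplicate every field on its own.
naive = row.transform_values { |val| val.split(";").uniq.join(";") }

# Both fields still hold two values, but the pairing has shifted:
# "Jan" originally lined up with "auth" and now lines up with "ed".
naive_pairs = naive[:name].split(";").zip(naive[:role].split(";"))
```

Deduplicating on one field and deleting the same positions from the other fields in the group avoids this shift.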

Examples:

Basic usage/defaults

# Used in pipeline as:
# transform Deduplicate::GroupedFieldValues,
#   on_field: :name,
#   grouped_fields: %i[work role],
#   delim: ';'
xform = Deduplicate::GroupedFieldValues.new(
  on_field: :name,
  grouped_fields: %i[work role],
  delim: ';'
)
input = [
  # empty/delim-only values in :on_field
  {name: ';',
   work: ';',
   role: 'auth;ed'},
  # nil value in :on_field
  {name: nil,
   work: 'auth;ed',
   role: ';'},
  # nil value in other field
  {name: 'Jan;Jan',
   work: nil,
   role: 'auth;ed'},
  # role has empty value for Jan
  {name: 'Bob;Jan;Bob',
   work: ';',
   role: 'auth;;ctb'},
  # work is empty string value; role has only 2 values
  {name: 'Cam;Jan;Cam',
   work: '',
   role: 'auth;ed'},
  # lots of values, multiple duplicates
  {name: 'Fred;Jan;Fred;Bob;Fred;Bob',
   work: 'Rpt;Bk;Paper;Bk;Pres;Bk',
   role: 'auth;photog;ed;ill;auth;ed.'},
  # single value
  {name: 'Martha',
   work: 'Bk',
   role: 'ctb'}
]
result = input.map{ |row| xform.process(row) }
expected = [
  # empty string values returned as nil values
  {name: nil,
   work: nil,
   role: 'auth'},
  # no processing possible, row passed through
  {name: nil,
   work: 'auth;ed',
   role: ';'},
  # nil values not processed
  {name: 'Jan',
   work: nil,
   role: 'auth'},
  # empty string values to be concatenated are treated as such
  {name: 'Bob;Jan',
   work: nil,
   role: 'auth;'},
  # empty string -> nil, role not having a 3rd value to delete does
  #   not cause failure or weirdness
  {name: 'Cam;Jan',
   work: nil,
   role: 'auth;ed'},
  # keeps first value associated with each name
  {name: 'Fred;Jan;Bob',
   work: 'Rpt;Bk;Bk',
   role: 'auth;photog;ill'},
  # passes row through; nothing to deduplicate
  {name: 'Martha',
   work: 'Bk',
   role: 'ctb'}
]
expect(result).to eq(expected)

Case insensitive deduplication

xform = Deduplicate::GroupedFieldValues.new(
  on_field: :name,
  grouped_fields: %i[work role],
  delim: ';',
  ignore_case: true
)
input = [
  {name: 'Jan;jan',
   work: nil,
   role: 'auth;ed'},
]
result = input.map{ |row| xform.process(row) }
expected = [
  {name: 'Jan',
   work: nil,
   role: 'auth'},
]
expect(result).to eq(expected)

Normalized deduplication

xform = Deduplicate::GroupedFieldValues.new(
  on_field: :role,
  grouped_fields: %i[name],
  delim: ';',
  normalized: true
)
input = [
  {name: 'Jan;Bob;Sam;Pat;Hops',
   role: 'auth./ill.;auth, ill;ed;ed.;Ed.'},
]
result = input.map{ |row| xform.process(row) }
expected = [
  {name: 'Jan;Sam;Hops',
   role: 'auth./ill.;ed;Ed.'},
]
expect(result).to eq(expected)

Instance Method Summary

Methods included from SepDeprecatable

#usedelim

Constructor Details

#initialize(on_field:, sep: nil, delim: nil, grouped_fields: [], ignore_case: false, normalized: false) ⇒ GroupedFieldValues

Returns a new instance of GroupedFieldValues.

Parameters:

  • on_field (Symbol)

    the field we are deduplicating (its values are compared, and duplicates are removed from it first)

  • sep (nil, String) (defaults to: nil)

    DEPRECATED: do not use in new transforms

  • delim (nil, String) (defaults to: nil)

    used to split/join multivalued field values

  • grouped_fields (Array<Symbol>) (defaults to: [])

    other field(s) in the same multi-field grouping as on_field. Values are removed from these fields positionally when the value in the corresponding position was removed from on_field

  • ignore_case (Boolean) (defaults to: false)

    if true, values are compared without regard to case

  • normalized (Boolean) (defaults to: false)

    if true, applies Utils::StringNormalizer (with arguments mode: :plain, downcased: false) to values before comparison
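
As a rough illustration of how ignore_case and normalized could affect which values count as duplicates, the sketch below builds a comparison key for a single value. The comparison_key method is hypothetical, and the regular expression is only a stand-in for Utils::StringNormalizer, whose actual rules may differ. How the library combines the two options is also an assumption here.

```ruby
# Hypothetical helper, not part of kiba-extend.
# Builds the string used for comparison; the original value is untouched.
def comparison_key(value, ignore_case: false, normalized: false)
  key = value.dup
  # Stand-in for normalization: drop everything except letters and digits.
  key = key.gsub(/[^[:alnum:]]/, "") if normalized
  # Case-insensitive comparison: fold to lower case.
  key = key.downcase if ignore_case
  key
end

a = comparison_key("auth./ill.", normalized: true)
b = comparison_key("auth, ill", normalized: true)
c = comparison_key("Ed.", normalized: true)
d = comparison_key("Jan", ignore_case: true)
```

Under these assumptions, a and b are equal, so the second value would be treated as a duplicate, while c keeps its capital letter because case is only folded when ignore_case is set.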



# File 'lib/kiba/extend/transforms/deduplicate/grouped_field_values.rb', line 139

def initialize(on_field:, sep: nil, delim: nil, grouped_fields: [],
  ignore_case: false, normalized: false)
  @field = on_field
  @other = grouped_fields
  @delim = usedelim(sepval: sep, delimval: delim, calledby: self)
  @getter = Kiba::Extend::Transforms::Helpers::FieldValueGetter.new(
    fields: grouped_fields,
    discard: %i[nil]
  )
  @ignore_case = ignore_case
  if normalized
    @normalizer = Utils::StringNormalizer.new(downcased: false)
  end
end

Instance Method Details

#process(row) ⇒ Object

Parameters:

  • row (Hash{ Symbol => String, nil })


# File 'lib/kiba/extend/transforms/deduplicate/grouped_field_values.rb', line 155

def process(row)
  val = row[field]
  return row if val.blank?

  vals = comparable_values(row)

  to_delete = deletable_elements(vals)
  return row if to_delete.empty?

  do_deletes(row, to_delete)

  row
end