Class: Kiba::Extend::Transforms::Deduplicate::FieldGroup

Inherits:
Object
Defined in:
lib/kiba/extend/transforms/deduplicate/field_group.rb

Overview

Note:

Use with caution: this transform has not been used much and is not extensively tested.

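Field value deduplication that is at least semi-safe for use with grouped fields, that is, fields expected to hold the same number of delimited values, with values at the same position belonging together. As the examples below show, values are compared position by position across the whole grouping: a position is dropped only when its combination of values repeats an earlier position, and the first occurrence is kept.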
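
The position-by-position comparison can be sketched in plain Ruby. This is an illustrative approximation inferred from the documented examples, not the library's implementation: `dedupe_group` is a hypothetical helper name, and details such as how nil or empty fields are skipped are assumptions.

```ruby
# Hypothetical sketch of grouped-field deduplication; not the library's code.
def dedupe_group(row, fields:, delim:)
  # Split each populated field; empty strings between delimiters become nil.
  split = fields.each_with_object({}) do |field, acc|
    val = row[field]
    next if val.nil? || val.empty?
    acc[field] = val.split(delim).map { |v| v.empty? ? nil : v }
  end
  return row if split.empty?

  # A position is kept only the first time its combination of values is seen.
  positions = split.values.map(&:length).max
  seen = {}
  keep = (0...positions).select do |idx|
    combo = split.values.map { |vals| vals[idx] }
    next false if seen.key?(combo)
    seen[combo] = true
  end

  # Rebuild each field from the kept positions it actually has.
  split.each do |field, vals|
    row[field] = keep.select { |idx| idx < vals.length }
      .map { |idx| vals[idx] }
      .join(delim)
  end
  row
end

row = {name: "Sue;Jill;Joan;Jill", work: "Bk;;Bk;", role: "auth;auth;ed;auth"}
deduped = dedupe_group(row, fields: %i[name work role], delim: ";")
```

Comparing whole positions rather than individual values is what keeps the fields aligned: "Sue" may legitimately repeat in name as long as the paired work or role value differs.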

Examples:

Basic usage/defaults

# Used in pipeline as:
# transform Deduplicate::FieldGroup,
#   grouped_fields: %i[name work role],
#   delim: ';'
xform = Deduplicate::FieldGroup.new(
  grouped_fields: %i[name work role],
  delim: ';'
)
input = [
  # nothing in group
  {name: nil,
   work: nil,
   role: nil},
  # single group
  {name: "Sue",
   work: "Bk",
   role: "auth"},
  # nil grouped field
  {name: "Sue;Sue;Sue",
   work: nil,
   role: "auth;ed;auth"},
  # nil value in other field
  {name: "Sue;Jill;Joan;Jill",
   work: "Bk;;Bk;",
   role: "auth;auth;ed;auth"},
  # work is empty string value; role has only 2 values
  {name: "Cam;Jan;Cam",
   work: "",
   role: "auth;ed"},
  # lots of values, multiple duplicates
  {name: "Fred;Jan;Fred;Bob;Fred;Bob",
   work: "Rpt;Bk;Paper;Bk;Rpt;Bk",
   role: "auth;photog;ed;ill;auth;ed."}
]
result = input.map{ |row| xform.process(row) }
expected = [
  # nothing in group
  {name: nil,
   work: nil,
   role: nil},
  # single group
  {name: "Sue",
   work: "Bk",
   role: "auth"},
  # nil grouped field
  {name: "Sue;Sue",
   work: nil,
   role: "auth;ed"},
  # nil value in other field
  {name: "Sue;Jill;Joan",
   work: "Bk;;Bk",
   role: "auth;auth;ed"},
  # work is empty string value; role has only 2 values
  {name: "Cam;Jan;Cam",
   work: "",
   role: "auth;ed"},
  # lots of values, multiple duplicates
  {name: "Fred;Jan;Fred;Bob;Bob",
   work: "Rpt;Bk;Paper;Bk;Bk",
   role: "auth;photog;ed;ill;ed."}
]
expect(result).to eq(expected)

Case-insensitive deduplication

xform = Deduplicate::FieldGroup.new(
  grouped_fields: %i[name role],
  delim: ';',
  ignore_case: true
)
input = [
  {name: 'Jan;jan',
   role: 'auth;Auth'},
]
result = input.map{ |row| xform.process(row) }
expected = [
  {name: 'Jan',
   role: 'auth'},
]
expect(result).to eq(expected)

Normalized deduplication

xform = Deduplicate::FieldGroup.new(
  grouped_fields: %i[name role],
  delim: ';',
  normalized: true
)
input = [
  {name: 'Jan;Jan.;Sam;Sam?;Hops',
   role: 'auth./ill.;auth, ill;ed;ed.;Ed.'},
]
result = input.map{ |row| xform.process(row) }
expected = [
  {name: 'Jan;Sam;Hops',
   role: 'auth./ill.;ed;Ed.'},
]
expect(result).to eq(expected)

Constructor Details

#initialize(grouped_fields: [], delim: Kiba::Extend.delim, ignore_case: false, normalized: false) ⇒ FieldGroup

Returns a new instance of FieldGroup.

Parameters:

  • grouped_fields (Array<Symbol>) (defaults to: [])

    fields in the multi-field grouping to be deduplicated.

  • delim (nil, String) (defaults to: Kiba::Extend.delim)

    used to split/join multivalued field values

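  • ignore_case (Boolean) (defaults to: false)

    if true, values that differ only in letter case are treated as duplicates; the first occurrence is kept as written

  • normalized (Boolean) (defaults to: false)

    if true, applies Utils::StringNormalizer (with arguments mode: :plain, downcased: false) to values for comparison only; output keeps the original form of the first occurrence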
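
How ignore_case and normalized affect comparison can be illustrated with a small sketch. `comparison_key` is a hypothetical helper, and the `gsub` call is only a stand-in for Utils::StringNormalizer: it assumes that plain mode drops punctuation and whitespace, which is consistent with the normalized example above but may not cover everything the real normalizer does.

```ruby
# Hypothetical helper: derives the value used for comparison only.
# The original string is never modified, so output can keep it as written.
def comparison_key(value, ignore_case: false, normalized: false)
  return nil if value.nil?

  key = value
  # Assumed stand-in for Utils::StringNormalizer in plain mode.
  key = key.gsub(/[^[:alnum:]]/, "") if normalized
  key = key.downcase if ignore_case
  key
end

same_role = comparison_key("auth./ill.", normalized: true) ==
  comparison_key("auth, ill", normalized: true)
case_kept = comparison_key("ed.", normalized: true) !=
  comparison_key("Ed.", normalized: true)
```

Under these assumptions `same_role` and `case_kept` are both true, which matches the documented result where "ed" and "Ed." both survive when normalized is true and ignore_case is left at its default.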



# File 'lib/kiba/extend/transforms/deduplicate/field_group.rb', line 118

def initialize(grouped_fields: [], delim: Kiba::Extend.delim,
  ignore_case: false, normalized: false)
  @fields = grouped_fields
  @delim = delim
  @getter = Kiba::Extend::Transforms::Helpers::FieldValueGetter.new(
    fields: grouped_fields
  )
  @ignore_case = ignore_case
  if normalized
    @normalizer = Utils::StringNormalizer.new(downcased: false)
  end
end

Instance Method Details

#process(row) ⇒ Object

Parameters:

  • row (Hash{ Symbol => String, nil })


# File 'lib/kiba/extend/transforms/deduplicate/field_group.rb', line 132

def process(row)
  vals = getter.call(row)
  return row if vals.empty?
  return row if vals.values.none? { |v| v.match?(delim) }

  vals.transform_values! do |v|
    v.split(delim).map { |v| v.empty? ? nil : v }
  end

  keep = indexes_to_keep(vals)
  deduplicate(vals, keep).each do |field, vals|
    row[field] = vals.join(delim)
  end

  row
end
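
Two standard Ruby String behaviors are worth keeping in mind when reading the listing above; the snippet below is a standalone illustration, not part of the library. String#split drops trailing empty strings by default, which is why "Bk;;Bk;" contributes three values rather than four in the basic example. String#match? converts a String argument to a Regexp, so a delimiter such as "|" is read as a pattern there, whereas String#split treats a String delimiter literally. As far as can be told from the listing, this only affects the early-return guard.

```ruby
with_trailing = "Bk;;Bk;"

# Default split discards trailing empty strings.
default_split = with_trailing.split(";")
# A negative limit keeps them.
full_split = with_trailing.split(";", -1)

# match? interprets the String as a regular expression:
# "|" is an alternation of two empty patterns, which matches any string.
pipe_as_regexp = "Sue".match?("|")
# Literal checks for the same delimiter.
pipe_as_literal = "Sue".include?("|")
pipe_escaped = "Sue".match?(Regexp.escape("|"))
```

Here `default_split` has three elements and `full_split` has four, while `pipe_as_regexp` is true even though the value contains no pipe character.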