Class: Kiba::Extend::Transforms::Deduplicate::FieldGroup

Inherits:
Object
Includes:
Normalizable
Defined in:
lib/kiba/extend/transforms/deduplicate/field_group.rb

Overview

Note:

Tread with caution, as this has not been used much and is not extensively tested.

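Field value deduplication that is at least semi-safe for use with grouped fields, which expect the same number of values in each field of the grouping. A position is removed only when its combination of values across all grouped fields duplicates that of an earlier position.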
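To illustrate why grouped fields need position-aware deduplication, here is a minimal plain-Ruby sketch contrasting per-field deduplication with deduplication of whole positional tuples. The helper names `naive_dedupe` and `group_dedupe` are hypothetical and are not part of kiba-extend; this is not the transform's actual implementation.

```ruby
# Hypothetical helpers for illustration only; not part of kiba-extend.

# Deduplicates each field independently, which can leave the fields
# with different numbers of values.
def naive_dedupe(row, fields, delim)
  fields.to_h { |f| [f, row[f].split(delim).uniq.join(delim)] }
end

# Deduplicates whole positional tuples, so every field keeps the same
# number of values.
def group_dedupe(row, fields, delim)
  columns = fields.map { |f| row[f].split(delim) }
  tuples = columns.first.zip(*columns.drop(1)).uniq
  fields.each_with_index.to_h do |f, i|
    [f, tuples.map { |t| t[i] }.join(delim)]
  end
end

row = {name: "Sue;Jill;Sue", role: "auth;auth;auth"}
naive = naive_dedupe(row, %i[name role], ";")
grouped = group_dedupe(row, %i[name role], ";")
```

In this sketch, `naive` ends up with two names but only one role, so the name/role pairing is lost, while `grouped` keeps two values in both fields.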

Examples:

Basic usage/defaults

# Used in pipeline as:
# transform Deduplicate::FieldGroup,
#   grouped_fields: %i[name work role],
#   delim: ';'
xform = Deduplicate::FieldGroup.new(
  grouped_fields: %i[name work role],
  delim: ';'
)
input = [
  # nothing in group
  {name: nil,
   work: nil,
   role: nil},
  # single group
  {name: "Sue",
   work: "Bk",
   role: "auth"},
  # nil grouped field
  {name: "Sue;Sue;Sue",
   work: nil,
   role: "auth;ed;auth"},
  # nil value in other field
  {name: "Sue;Jill;Joan;Jill",
   work: "Bk;;Bk;",
   role: "auth;auth;ed;auth"},
  # work is empty string value; role has only 2 values
  {name: "Cam;Jan;Cam",
   work: "",
   role: "auth;ed"},
  # lots of values, multiple duplicates
  {name: "Fred;Jan;Fred;Bob;Fred;Bob",
   work: "Rpt;Bk;Paper;Bk;Rpt;Bk",
   role: "auth;photog;ed;ill;auth;ed."}
]
result = input.map{ |row| xform.process(row) }
expected = [
  # nothing in group
  {name: nil,
   work: nil,
   role: nil},
  # single group
  {name: "Sue",
   work: "Bk",
   role: "auth"},
  # nil grouped field
  {name: "Sue;Sue",
   work: nil,
   role: "auth;ed"},
  # nil value in other field
  {name: "Sue;Jill;Joan",
   work: "Bk;;Bk",
   role: "auth;auth;ed"},
  # work is empty string value; role has only 2 values
  {name: "Cam;Jan;Cam",
   work: "",
   role: "auth;ed"},
  # lots of values, multiple duplicates
  {name: "Fred;Jan;Fred;Bob;Bob",
   work: "Rpt;Bk;Paper;Bk;Bk",
   role: "auth;photog;ed;ill;ed."}
]
expect(result).to eq(expected)

Case insensitive deduplication

xform = Deduplicate::FieldGroup.new(
  grouped_fields: %i[name role],
  delim: ';',
  ignore_case: true
)
input = [
  {name: 'Jan;jan',
   role: 'auth;Auth'},
]
result = input.map{ |row| xform.process(row) }
expected = [
  {name: 'Jan',
   role: 'auth'},
]
expect(result).to eq(expected)

Normalized deduplication

xform = Deduplicate::FieldGroup.new(
  grouped_fields: %i[name role],
  delim: ';',
  normalize: {xforms: %i[to_ascii nonword]}
)
input = [
  {name: 'Jan;Jan.;Sam;Sam?;Hops',
   role: 'auth./ill.;auth, ill;ed;ed.;Ed.'},
]
result = input.map{ |row| xform.process(row) }
expected = [
  {name: 'Jan;Sam;Hops',
   role: 'auth./ill.;ed;Ed.'},
]
expect(result).to eq(expected)

Since:

  • 5.1.0

Instance Method Summary

Methods included from Normalizable

#get_norm_args, #prepare_normalizer

Constructor Details

#initialize(grouped_fields: [], delim: Kiba::Extend.delim, ignore_case: false, normalize: nil) ⇒ FieldGroup

Returns a new instance of FieldGroup.

Parameters:

  • grouped_fields (Array<Symbol>) (defaults to: [])

    fields in the multi-field grouping to be deduplicated.

  • delim (nil, String) (defaults to: Kiba::Extend.delim)

    used to split/join multivalued field values

  • ignore_case (Boolean) (defaults to: false)

    whether values that differ only in case are treated as duplicates

  • normalize (nil, Hash) (defaults to: nil)

    pass the desired Utils::StringNormalizer keyword arguments in as a Hash to normalize values before deduplication
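
The effect of ignore_case and normalize can be pictured as comparing positions on a normalized key while leaving the stored values untouched. The sketch below is an assumption-laden illustration: `first_occurrence_indexes`, `case_fold`, and `strip_nonword` are hypothetical names, and the lambdas are simple stand-ins rather than the behavior of Utils::StringNormalizer or `prepare_normalizer`.

```ruby
# Hypothetical stand-ins; the real transform builds its normalizer via
# Normalizable#prepare_normalizer, whose internals are not shown here.
identity = ->(val) { val }
case_fold = ->(val) { val.to_s.downcase }
strip_nonword = ->(val) { val.to_s.gsub(/\W/, "") }

# Returns the indexes of the first occurrence of each distinct tuple,
# comparing on normalized keys only.
def first_occurrence_indexes(tuples, normalizer)
  seen = {}
  keep = []
  tuples.each_with_index do |tuple, idx|
    key = tuple.map { |val| normalizer.call(val) }
    next if seen.key?(key)

    seen[key] = true
    keep << idx
  end
  keep
end

names = "Jan;Jan.;Sam;Sam?;Hops".split(";")
roles = "auth./ill.;auth, ill;ed;ed.;Ed.".split(";")
tuples = names.zip(roles)

keep_plain = first_occurrence_indexes(tuples, identity)
keep_norm = first_occurrence_indexes(tuples, strip_nonword)
keep_fold = first_occurrence_indexes([%w[Jan auth], %w[jan Auth]], case_fold)
```

With the stand-in `strip_nonword`, positions 0, 2, and 4 are kept, which lines up with the documented "Normalized deduplication" example: the original, un-normalized values at those positions are what remain in the output.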

Since:

  • 5.1.0



# File 'lib/kiba/extend/transforms/deduplicate/field_group.rb', line 121

def initialize(grouped_fields: [], delim: Kiba::Extend.delim,
  ignore_case: false, normalize: nil)
  @fields = grouped_fields
  @delim = delim
  @getter = Kiba::Extend::Transforms::Helpers::FieldValueGetter.new(
    fields: grouped_fields
  )
  @normalizer = prepare_normalizer(ignore_case, normalize)
end

Instance Method Details

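#process(row) ⇒ Object

Splits each retrieved grouped field value on delim, removes positions whose combination of values across the group duplicates an earlier position, and returns the modified row. The row is returned unchanged if no grouped field values are retrieved or if none of them contains the delimiter.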

Parameters:

  • row (Hash{ Symbol => String, nil })

Since:

  • 5.1.0



# File 'lib/kiba/extend/transforms/deduplicate/field_group.rb', line 132

def process(row)
  vals = getter.call(row)
  return row if vals.empty?
  return row if vals.values.none? { |v| v.match?(delim) }

  vals.transform_values! do |v|
    v.split(delim).map { |v| v.empty? ? nil : v }
  end

  keep = indexes_to_keep(vals)
  deduplicate(vals, keep).each do |field, vals|
    row[field] = vals.join(delim)
  end

  row
end
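
The private helpers `indexes_to_keep` and `deduplicate` are not shown on this page. As a rough approximation of the overall flow, the following plain-Ruby sketch reproduces the outputs of the "Basic usage/defaults" example. The name `dedupe_group` is hypothetical, the blank-value filtering is an assumption about what FieldValueGetter returns, and `include?` is used here in place of `match?` purely to keep the sketch to literal string matching.

```ruby
# Plain-Ruby approximation for illustration; `dedupe_group` is a
# hypothetical name and this is not the library's implementation.
def dedupe_group(row, fields, delim)
  # Assumption: only fields with a non-blank value take part.
  vals = row.slice(*fields).reject { |_, v| v.nil? || v.empty? }
  return row if vals.empty?
  return row if vals.values.none? { |v| v.include?(delim) }

  # Split each field; empty segments become nil, as in #process.
  split = vals.transform_values do |v|
    v.split(delim).map { |seg| seg.empty? ? nil : seg }
  end

  # Build one tuple per position; short fields contribute nil.
  max_len = split.values.map(&:length).max
  tuples = (0...max_len).map { |i| split.values.map { |segs| segs[i] } }

  # Keep the index of the first occurrence of each distinct tuple.
  keep = tuples.each_index.select { |i| tuples.index(tuples[i]) == i }

  # Drop the non-kept positions from every field and rejoin.
  split.each do |field, segs|
    row[field] = segs.select.with_index { |_, i| keep.include?(i) }.join(delim)
  end
  row
end

fields = %i[name work role]
many = dedupe_group(
  {name: "Fred;Jan;Fred;Bob;Fred;Bob",
   work: "Rpt;Bk;Paper;Bk;Rpt;Bk",
   role: "auth;photog;ed;ill;auth;ed."},
  fields, ";"
)
gaps = dedupe_group(
  {name: "Sue;Jill;Joan;Jill", work: "Bk;;Bk;", role: "auth;auth;ed;auth"},
  fields, ";"
)
uneven = dedupe_group(
  {name: "Cam;Jan;Cam", work: "", role: "auth;ed"},
  fields, ";"
)
```

Note that Ruby's `String#split` drops trailing empty strings, so "Bk;;Bk;" yields three segments rather than four; the missing fourth position is treated as nil when tuples are compared, which is why it matches the empty second position in the `gaps` row.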