Class: Kiba::Extend::Transforms::Deduplicate::Table
- Inherits: Object
  - Object
  - Kiba::Extend::Transforms::Deduplicate::Table
- Defined in: lib/kiba/extend/transforms/deduplicate/table.rb
Overview
Note:
This transform runs in memory, so for very large sources it may take a long time or fail. In that case, use a combination of Flag and FilterRows::FieldEqualTo instead.
Given a field on which to deduplicate, removes duplicate rows from the table. The first row of each set of rows containing the same value in the given field is kept. Rows in which the given field is blank are not emitted at all. Various additional functionality is configurable via the arguments passed to the transform. See examples and #initialize for details.
Tip: Use CombineValues::FromFieldsWithDelimiter or CombineValues::FullRecord to create a combined field on which to deduplicate.
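As a rough illustration of the behavior described above, the following plain-Ruby sketch keeps the first row for each value of a field and skips rows where that field is blank. It does not use kiba-extend: `dedupe_rows` and the sample rows are hypothetical names for illustration, and the `:combined` key only imitates the kind of field the CombineValues transforms in the tip would produce.

```ruby
# Plain-Ruby sketch of first-row-wins deduplication; not the library's code.
# `dedupe_rows` is a hypothetical helper name used only for this illustration.
def dedupe_rows(rows, field)
  seen = {}
  rows.each do |row|
    val = row[field]
    # Skip rows with a nil, empty, or whitespace-only value in the field
    next if val.nil? || val.to_s.strip.empty?

    # Hash insertion order is preserved, so the first row per value wins
    seen[val] ||= row
  end
  seen.values
end

rows = [
  {name: "Ann", city: "Oslo"},
  {name: "Ann", city: "Rome"},
  {name: "Bob", city: "Oslo"},
  {name: "Ann", city: "Oslo"}
]

# Imitate a combined field so that several fields act as one deduplication key
combined = rows.map { |r| r.merge(combined: [r[:name], r[:city]].join("|")) }

by_name = dedupe_rows(rows, :name)
by_combined = dedupe_rows(combined, :combined)
```

Deduplicating on `:name` alone collapses all three "Ann" rows into the first one, while deduplicating on the combined key only collapses rows that match on both fields.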
Instance Method Summary
- #close ⇒ Object
- #initialize(field:, delete_field: false, example_source_field: nil, max_examples: 10, example_target_field: :examples, example_delim: Kiba::Extend.delim, include_occs: false, occs_target_field: :occurrences, compile_uniq_fieldvals: false, compile_delim: Kiba::Extend.delim) ⇒ Table (constructor): A new instance of Table.
- #process(row) ⇒ Object
Constructor Details
#initialize(field:, delete_field: false, example_source_field: nil, max_examples: 10, example_target_field: :examples, example_delim: Kiba::Extend.delim, include_occs: false, occs_target_field: :occurrences, compile_uniq_fieldvals: false, compile_delim: Kiba::Extend.delim) ⇒ Table
Returns a new instance of Table.
    # File 'lib/kiba/extend/transforms/deduplicate/table.rb', line 183

    def initialize(field:, delete_field: false, example_source_field: nil,
      max_examples: 10, example_target_field: :examples,
      example_delim: Kiba::Extend.delim, include_occs: false,
      occs_target_field: :occurrences, compile_uniq_fieldvals: false,
      compile_delim: Kiba::Extend.delim)
      @field = field
      @deduper = {}
      @delete = delete_field
      @example = example_source_field
      @max_examples = max_examples
      @ex_target = example_target_field
      @delim = example_delim
      @occs = include_occs
      @occ_target = occs_target_field
      @compile_uniq_fieldvals = compile_uniq_fieldvals
      @compile_delim = compile_delim
    end
Instance Method Details
#close ⇒ Object
    # File 'lib/kiba/extend/transforms/deduplicate/table.rb', line 213

    def close
      deduper.values.each do |hash|
        row = hash[:row]
        add_example_field(row, hash) if example
        row[occ_target] = hash[:occs] if occs
        row.delete(field) if delete
        if compile_uniq_fieldvals
          row = row.map do |fld, _val|
            next if fld == field

            [fld, hash[:fieldvals][fld].join(compile_delim)]
          end.compact.to_h
        end
        yield row
      end
    end
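The value-compiling branch of #close can be tried in isolation. The sketch below reuses the same `map`/`compact`/`to_h` chain on sample data. The shape of `fieldvals` (each field mapped to an array of its unique values) is an assumption, since the helper that builds it is not shown on this page; the sample row and the `"|"` delimiter are likewise made up for illustration.

```ruby
# Sketch of the compile step from #close, run on made-up data.
# ASSUMPTION: fieldvals maps each field to an array of its unique values;
# the helper that populates it is not shown in the source.
field = :id
compile_delim = "|"
row = {id: "1", color: "red", size: "S"}
fieldvals = {color: ["red", "blue"], size: ["S"]}

compiled = row.map do |fld, _val|
  # Returning nil here drops the deduplication field once compact runs
  next if fld == field

  [fld, fieldvals[fld].join(compile_delim)]
end.compact.to_h
```

With this input, `compiled` no longer contains `:id`, and each remaining field holds its unique values joined by the delimiter.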
#process(row) ⇒ Object
    # File 'lib/kiba/extend/transforms/deduplicate/table.rb', line 202

    def process(row)
      field_val = row.fetch(field, nil)
      return if field_val.blank?

      get_row(field_val, row)
      get_occ(field_val, row) if occs
      get_example(field_val, row) if example
      compile_values(field_val, row) if compile_uniq_fieldvals
      nil
    end
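Because #process always returns nil, no rows leave the transform until #close yields them. The following stand-in class is a minimal sketch of that buffer-then-emit pattern in plain Ruby. `FirstRowDeduper`, its internal hash layout, and the driver loop are hypothetical; the real private helpers (`get_row`, `get_occ`, and so on) are not shown on this page and may work differently.

```ruby
# Minimal stand-in for the buffer-then-emit pattern; not the library's code.
# ASSUMPTION: the internal layout {row:, occs:} is invented for this sketch.
class FirstRowDeduper
  def initialize(field:)
    @field = field
    @deduper = {}
  end

  # Buffer the row and return nil so nothing is passed downstream yet
  def process(row)
    val = row[@field]
    return if val.nil? || val.to_s.strip.empty?

    @deduper[val] ||= {row: row, occs: 0}
    @deduper[val][:occs] += 1
    nil
  end

  # Emit one row per distinct value once every source row has been seen
  def close
    @deduper.each_value do |hash|
      yield hash[:row].merge(occurrences: hash[:occs])
    end
  end
end

# Hypothetical driver imitating how a pipeline runner might call the transform
xform = FirstRowDeduper.new(field: :id)
source = [{id: "a", n: 1}, {id: "b", n: 2}, {id: "a", n: 3}, {id: "", n: 4}]

during = source.map { |row| xform.process(row) }.compact
result = []
xform.close { |row| result << row }
```

Here `during` stays empty because every call to `process` returns nil, and `result` receives the deduplicated rows only when `close` runs. This is also why the transform holds all distinct rows in memory, as the note in the overview warns.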