Class: Kiba::Extend::Transforms::Deduplicate::Table
- Inherits:
- Object
- Object
- Kiba::Extend::Transforms::Deduplicate::Table
- Defined in:
- lib/kiba/extend/transforms/deduplicate/table.rb
Overview
Note:
This transform runs in memory, so for very large sources it may take a long time or fail. In that case, use a combination of Flag and FilterRows::FieldEqualTo instead.
Given a field on which to deduplicate, removes duplicate rows from the table. The first row of each set of rows containing the same value in the given field is kept. Various additional functionality is configurable via the arguments passed to the transform. See examples and #initialize for details.
Tip: Use CombineValues::FromFieldsWithDelimiter or CombineValues::FullRecord to create a combined field on which to deduplicate.
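The first-row-wins behavior and the combined-field tip can be sketched in plain Ruby, without the library. This is a simplified stand-in, not the transform's implementation: `combined_key` and `dedupe_first_wins` are hypothetical helper names, and the `|` delimiter is an arbitrary choice for illustration.

```ruby
# Hypothetical helper: join the values of several fields into one key,
# loosely mimicking what a combined field would hold.
def combined_key(row, fields, delim = "|")
  fields.map { |f| row[f].to_s }.join(delim)
end

# Hypothetical helper: keep the first row seen for each key value.
# Everything is held in a Hash, so the whole table lives in memory.
def dedupe_first_wins(rows, key_field)
  seen = {}
  rows.each do |row|
    val = row[key_field]
    next if val.nil? || val.to_s.strip.empty? # skip rows with a blank key
    seen[val] ||= row
  end
  seen.values
end

rows = [
  {name: "Ann", city: "Oslo", note: "first"},
  {name: "Bob", city: "Rome", note: "second"},
  {name: "Ann", city: "Oslo", note: "third"}
]

# Add a combined field, then deduplicate on it.
with_key = rows.map { |r| r.merge(combined: combined_key(r, %i[name city])) }
deduped = dedupe_first_wins(with_key, :combined)
```

Because the lookup Hash holds one entry per distinct key, memory use grows with the number of unique values, which is the reason for the note about very large sources.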
Instance Method Summary
- #close ⇒ Object
- #compiled_row(hash, row) ⇒ Object
- #initialize(field:, delete_field: false, example_source_field: nil, max_examples: 10, example_target_field: :examples, example_delim: Kiba::Extend.delim, include_occs: false, occs_target_field: :occurrences, compile_uniq_fieldvals: false, compile_delim: Kiba::Extend.delim) ⇒ Table
  constructor: a new instance of Table.
- #process(row) ⇒ Object
Constructor Details
#initialize(field:, delete_field: false, example_source_field: nil, max_examples: 10, example_target_field: :examples, example_delim: Kiba::Extend.delim, include_occs: false, occs_target_field: :occurrences, compile_uniq_fieldvals: false, compile_delim: Kiba::Extend.delim) ⇒ Table
Returns a new instance of Table.
    # File 'lib/kiba/extend/transforms/deduplicate/table.rb', line 286

    def initialize(field:, delete_field: false, example_source_field: nil,
      max_examples: 10, example_target_field: :examples,
      example_delim: Kiba::Extend.delim, include_occs: false,
      occs_target_field: :occurrences, compile_uniq_fieldvals: false,
      compile_delim: Kiba::Extend.delim)
      @field = field
      @deduper = {}
      @delete = delete_field
      @example = example_source_field
      @max_examples = max_examples
      @ex_target = example_target_field
      @delim = example_delim
      @occs = include_occs
      @occ_target = occs_target_field
      @compile_uniq_fieldvals = compile_uniq_fieldvals
      @compile_delim = compile_delim
    end
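To give a feel for what the occurrence and example options might produce, here is a rough plain-Ruby sketch. The bookkeeping is an assumption inferred from the parameter names: the library's private helpers are not shown on this page, so details such as whether example values are de-duplicated may differ. `summarize` is a hypothetical function, and `|` is used only as an illustrative delimiter.

```ruby
# Simplified stand-in for the bookkeeping implied by include_occs,
# example_source_field, and max_examples. The behavior is assumed,
# not taken from the library.
def summarize(rows, field:, example_source_field:, max_examples: 10, example_delim: "|")
  groups = {}
  rows.each do |row|
    val = row[field]
    next if val.nil? || val.to_s.empty?

    # The first row for each value is stored; later rows only update counters.
    entry = (groups[val] ||= {row: row.dup, occs: 0, examples: []})
    entry[:occs] += 1
    ex = row[example_source_field]
    entry[:examples] << ex unless ex.nil?
  end

  groups.values.map do |entry|
    entry[:row].merge(
      occurrences: entry[:occs],
      examples: entry[:examples].first(max_examples).join(example_delim)
    )
  end
end

rows = [
  {id: "a", src: "x1"},
  {id: "b", src: "y1"},
  {id: "a", src: "x2"},
  {id: "a", src: "x3"}
]
result = summarize(rows, field: :id, example_source_field: :src, max_examples: 2)
```

In this sketch the row for `"a"` reports three occurrences and carries only the first two example values, because `max_examples` is set to 2.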
Instance Method Details
#close ⇒ Object
    # File 'lib/kiba/extend/transforms/deduplicate/table.rb', line 316

    def close
      deduper.each do |_val, hash|
        row = hash[:row]
        add_example_field(row, hash) if example
        row[occ_target] = hash[:occs] if occs
        row = compiled_row(hash, row) if compile_uniq_fieldvals
        row.delete(field) if delete
        yield row
      end
    end
#compiled_row(hash, row) ⇒ Object
    # File 'lib/kiba/extend/transforms/deduplicate/table.rb', line 327

    def compiled_row(hash, row)
      row.map do |fld, val|
        if fld == example
          [fld, nil]
        elsif [field, ex_target, occ_target].include?(fld)
          [fld, val]
        else
          [fld, hash[:fieldvals][fld].join(compile_delim)]
        end
      end.compact.to_h.compact
    end
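The effect of #compiled_row is easier to see with concrete data. The sketch below is a standalone rewrite, not the library method: instance state is passed in as keyword arguments so it runs outside the class, `compiled_row_sketch` is a hypothetical name, and the shape of `hash[:fieldvals]` (an Array of values per field) is an assumption based on the `join` call.

```ruby
# Standalone rewrite of the method with its instance state passed in as
# keyword arguments, so it can run outside the class.
def compiled_row_sketch(hash, row, field:, example:, ex_target:, occ_target:, compile_delim:)
  row.map do |fld, val|
    if fld == example
      [fld, nil] # example source field: set to nil so it is removed below
    elsif [field, ex_target, occ_target].include?(fld)
      [fld, val] # key and bookkeeping fields pass through unchanged
    else
      [fld, hash[:fieldvals][fld].join(compile_delim)]
    end
  end.to_h.compact # Hash#compact drops the nil-valued field
end

row = {id: "a", src: "x1", color: "red", occurrences: 2}
hash = {fieldvals: {color: ["red", "blue"]}} # assumed shape of the compiled values

compiled = compiled_row_sketch(
  hash, row,
  field: :id, example: :src, ex_target: :examples,
  occ_target: :occurrences, compile_delim: "|"
)
```

The Array#compact call from the original is left out of the sketch because the block always returns a two-element Array, so there are no nil elements to remove; only the final Hash#compact changes the result.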
#process(row) ⇒ Object
    # File 'lib/kiba/extend/transforms/deduplicate/table.rb', line 305

    def process(row)
      field_val = row.fetch(field, nil)
      return if field_val.blank?

      get_row(field_val, row)
      get_occ(field_val, row) if occs
      get_example(field_val, row) if example
      compile_values(field_val, row) if compile_uniq_fieldvals
      nil
    end
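Taken together, #process and #close follow a buffer-then-emit pattern: #process always returns nil, so no row is passed along while the source is being read, and #close yields the stored rows at the end. A minimal sketch of that pattern follows. `FirstWinsBuffer` and the `run` driver are hypothetical, the driver only imitates how a pipeline runner might call a transform, and the blank check is a plain-Ruby approximation of ActiveSupport's `blank?`.

```ruby
# Minimal stand-in for a buffering transform: process stores rows and
# returns nil; close yields the buffered rows.
class FirstWinsBuffer
  def initialize(field:)
    @field = field
    @deduper = {}
  end

  def process(row)
    val = row.fetch(@field, nil)
    return if val.nil? || val.to_s.strip.empty? # approximation of blank?

    @deduper[val] ||= {row: row} # only the first row per value is stored
    nil
  end

  def close
    @deduper.each_value { |hash| yield hash[:row] }
  end
end

# Hypothetical driver imitating how a pipeline runner might call the transform.
def run(transform, rows)
  out = []
  rows.each do |row|
    result = transform.process(row)
    out << result unless result.nil?
  end
  transform.close { |row| out << row }
  out
end

output = run(FirstWinsBuffer.new(field: :id), [
  {id: "a", n: 1}, {id: "", n: 2}, {id: "a", n: 3}, {id: "b", n: 4}
])
```

In this sketch, the row with an empty `id` never reaches the output, mirroring the early return on a blank field value in #process.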