Class: Kiba::Extend::Transforms::Deduplicate::Table
- Inherits: Object
- Defined in: lib/kiba/extend/transforms/deduplicate/table.rb
Overview
Note:
This transform holds the kept rows in memory until the source is exhausted, so on very large sources it may take a long time or fail. In that case, use a combination of Flag and FilterRows::FieldEqualTo instead.
Given a field on which to deduplicate, removes duplicate rows from the table.
Keeps the first row in which each value of the deduplicating field appears.
Tip: Use CombineValues::FromFieldsWithDelimiter or CombineValues::FullRecord to create a combined field on which to deduplicate.
Input table:
| foo | bar | baz | combined |
|-----------------------------|
| a | b | f | a b |
| c | d | g | c d |
| c | e | h | c e |
| c | d | i | c d |
Used in pipeline as:
transform Deduplicate::Table, field: :combined, delete_field: true
Results in:
| foo | bar | baz |
|-----------------|
| a | b | f |
| c | d | g |
| c | e | h |
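The example above can be reproduced outside a Kiba job with a minimal plain-Ruby sketch. `TableDedupSketch` is a hypothetical stand-in that mirrors the logic of the source listing on this page; it is not the library class. It replaces ActiveSupport's `blank?` with a simpler nil/empty/whitespace check, which is an assumption and does not cover every case `blank?` handles.

```ruby
# Hypothetical stand-in for Deduplicate::Table; not the library class itself.
class TableDedupSketch
  def initialize(field:, delete_field: false)
    @field = field
    @deduper = {}
    @delete = delete_field
  end

  # Buffers the first row seen for each value; never emits a row directly.
  def process(row)
    field_val = row.fetch(@field, nil)
    # Assumed substitute for ActiveSupport's blank? (nil, empty, or whitespace)
    return if field_val.nil? || field_val.to_s.strip.empty?
    return if @deduper.key?(field_val)

    @deduper[field_val] = row
    nil
  end

  # Emits the buffered rows in first-seen order (Ruby hashes keep insertion order).
  def close
    @deduper.each_value do |row|
      row.delete(@field) if @delete
      yield row
    end
  end
end

input = [
  {foo: "a", bar: "b", baz: "f", combined: "a b"},
  {foo: "c", bar: "d", baz: "g", combined: "c d"},
  {foo: "c", bar: "e", baz: "h", combined: "c e"},
  {foo: "c", bar: "d", baz: "i", combined: "c d"}
]

xform = TableDedupSketch.new(field: :combined, delete_field: true)
input.each { |row| xform.process(row) }

result = []
xform.close { |row| result << row }
result.each { |row| puts row.inspect }
```

The fourth input row is discarded because its `combined` value, `c d`, was already seen in the second row. With `delete_field: true`, the `combined` field is removed from each surviving row as it is emitted.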
Instance Method Summary
- #close ⇒ Object
- #initialize(field:, delete_field: false) ⇒ Table (constructor): A new instance of Table.
- #process(row) ⇒ Object
Constructor Details
#initialize(field:, delete_field: false) ⇒ Table
Returns a new instance of Table.
# File 'lib/kiba/extend/transforms/deduplicate/table.rb', line 50
def initialize(field:, delete_field: false)
  @field = field
  @deduper = {}
  @delete = delete_field
end
Instance Method Details
#close ⇒ Object
# File 'lib/kiba/extend/transforms/deduplicate/table.rb', line 66
def close
  @deduper.values.each do |row|
    row.delete(@field) if @delete
    yield row
  end
end
#process(row) ⇒ Object
# File 'lib/kiba/extend/transforms/deduplicate/table.rb', line 57
def process(row)
  field_val = row.fetch(@field, nil)
  return if field_val.blank?
  return if @deduper.key?(field_val)

  @deduper[field_val] = row
  nil
end
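One consequence of the `#process` listing is easy to miss: a row whose deduplicating field is missing or blank is dropped entirely, not passed through. The sketch below illustrates this with a hypothetical helper, `dedupe_keep_first`, which condenses the process and close logic into one method. As an assumption, it approximates `blank?` with a nil/empty/whitespace check.

```ruby
# Hypothetical helper condensing the process/close logic shown above.
def dedupe_keep_first(rows, field)
  kept = {}
  rows.each do |row|
    val = row.fetch(field, nil)
    # Assumed approximation of `return if field_val.blank?`: the row is dropped
    next if val.nil? || val.to_s.strip.empty?
    # Later rows with an already-seen value are dropped
    next if kept.key?(val)

    kept[val] = row
  end
  kept.values
end

rows = [
  {id: 1, key: "x"},
  {id: 2, key: ""},   # empty value: dropped
  {id: 3},            # field missing: dropped
  {id: 4, key: "x"},  # duplicate of id 1: dropped
  {id: 5, key: nil},  # nil value: dropped
  {id: 6, key: "y"}
]

survivors = dedupe_keep_first(rows, :key)
puts survivors.map { |r| r[:id] }.inspect  # prints [1, 6]
```

If rows with blank values need to survive, one option is to populate the deduplicating field for every row before this transform runs, for example with a combined field as suggested in the Overview tip.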