Class: Kiba::Extend::Transforms::Deduplicate::Table
- Inherits: Object
- Defined in: lib/kiba/extend/transforms/deduplicate/table.rb
Overview
Note:
This transform runs in memory, so with very large sources it may take a long time or fail. In that case, use a combination of Flag and FilterRows::FieldEqualTo instead.
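As a rough illustration of the alternative named in the note, the sketch below flags repeated values as rows stream past and then filters out the flagged rows, so only the set of seen key values is held rather than every kept row. SketchFlagger, SketchRejectEqual, run_steps, and the :duplicate field are hypothetical stand-ins written in plain Ruby; they are not the Flag or FilterRows::FieldEqualTo classes, whose options are not shown on this page.

```ruby
require "set"

# Hypothetical stand-in for a flagging step (not the library's Flag class).
# Writes "y" or "n" to in_field, remembering only the key values seen so far.
class SketchFlagger
  def initialize(on_field:, in_field:)
    @on_field = on_field
    @in_field = in_field
    @seen = Set.new
  end

  def process(row)
    # Set#add? returns nil when the value is already in the set.
    row[@in_field] = @seen.add?(row[@on_field]) ? "n" : "y"
    row
  end
end

# Hypothetical stand-in for a filtering step: drops rows where field == value.
class SketchRejectEqual
  def initialize(field:, value:)
    @field = field
    @value = value
  end

  def process(row)
    (row[@field] == @value) ? nil : row
  end
end

# Hypothetical mini-runner: passes each row through every step in order and
# drops the row as soon as a step returns nil.
def run_steps(rows, steps)
  rows.filter_map do |row|
    steps.reduce(row) { |current, step| current && step.process(current) }
  end
end

rows = [{combined: "a b"}, {combined: "c d"}, {combined: "c d"}]
steps = [
  SketchFlagger.new(on_field: :combined, in_field: :duplicate),
  SketchRejectEqual.new(field: :duplicate, value: "y")
]
run_steps(rows, steps).each { |row| puts row.inspect }
```

Unlike Table, a streaming approach of this kind cannot collect examples or occurrence counts onto the kept row, because the first row is emitted before later duplicates are seen.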
Given a field on which to deduplicate, removes duplicate rows from the table.
Keeps the first row in which each value of the deduplicating field occurs. Rows in which the deduplicating field is blank are dropped (see #process).
Tip: Use CombineValues::FromFieldsWithDelimiter or CombineValues::FullRecord to create a combined field on which to deduplicate.
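To make the combined-field idea concrete, here is a minimal plain-Ruby sketch that joins two fields into one key and keeps the first row per key. The helpers combined_key and keep_first are hypothetical names used only for illustration; they do not use or reproduce the CombineValues transforms or this class.

```ruby
# Plain-Ruby stand-in; no kiba-extend classes are used here.

# Hypothetical helper: join the chosen fields into one deduplication key.
def combined_key(row, fields, delim = " ")
  row.values_at(*fields).join(delim)
end

# Hypothetical helper: keep the first row seen for each combined key.
# Ruby hashes preserve insertion order, so output follows first-seen order.
def keep_first(rows, fields)
  seen = {}
  rows.each do |row|
    key = combined_key(row, fields)
    seen[key] = row unless seen.key?(key)
  end
  seen.values
end

rows = [
  {foo: "a", bar: "b", baz: "f"},
  {foo: "c", bar: "d", baz: "g"},
  {foo: "c", bar: "e", baz: "h"},
  {foo: "c", bar: "d", baz: "i"},
  {foo: "c", bar: "d", baz: "j"}
]

keep_first(rows, %i[foo bar]).each { |row| puts row.values.join(" | ") }
```

Deduplicating on foo and bar together treats "c d" and "c e" as distinct, which deduplicating on foo alone would not.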
Input table:
| foo | bar | baz | combined |
|-----|-----|-----|----------|
| a   | b   | f   | a b      |
| c   | d   | g   | c d      |
| c   | e   | h   | c e      |
| c   | d   | i   | c d      |
| c   | d   | j   | c d      |
Used in pipeline as:
transform Deduplicate::Table, field: :combined, delete_field: true
Results in:
| foo | bar | baz |
|-----|-----|-----|
| a   | b   | f   |
| c   | d   | g   |
| c   | e   | h   |
Used in pipeline as:
transform Deduplicate::Table, field: :combined, delete_field: true,
  example_source_field: :baz, max_examples: 2,
  example_target_field: :ex, example_delim: ";"
Results in:
| foo | bar | baz | ex  |
|-----|-----|-----|-----|
| a   | b   | f   | f   |
| c   | d   | g   | g;i |
| c   | e   | h   | h   |
Used in pipeline as:
transform Deduplicate::Table, field: :combined, delete_field: true,
  example_source_field: :baz, max_examples: 2,
  example_target_field: :ex, example_delim: ";", include_occs: true
Results in:
| foo | bar | baz | ex  | occurrences |
|-----|-----|-----|-----|-------------|
| a   | b   | f   | f   | 1           |
| c   | d   | g   | g;i | 3           |
| c   | e   | h   | h   | 1           |
Instance Method Summary
- #close ⇒ Object
- #initialize(field:, delete_field: false, example_source_field: nil, max_examples: 10, example_target_field: :examples, example_delim: Kiba::Extend.delim, include_occs: false, occs_target_field: :occurrences) ⇒ Table (constructor)
  A new instance of Table.
- #process(row) ⇒ Object
Constructor Details
#initialize(field:, delete_field: false, example_source_field: nil, max_examples: 10, example_target_field: :examples, example_delim: Kiba::Extend.delim, include_occs: false, occs_target_field: :occurrences) ⇒ Table
Returns a new instance of Table.
    # File 'lib/kiba/extend/transforms/deduplicate/table.rb', line 104

    def initialize(field:, delete_field: false, example_source_field: nil,
      max_examples: 10, example_target_field: :examples,
      example_delim: Kiba::Extend.delim, include_occs: false,
      occs_target_field: :occurrences)
      @field = field
      @deduper = {}
      @delete = delete_field
      @example = example_source_field
      @max_examples = max_examples
      @ex_target = example_target_field
      @delim = example_delim
      @occs = include_occs
      @occ_target = occs_target_field
    end
Instance Method Details
#close ⇒ Object
    # File 'lib/kiba/extend/transforms/deduplicate/table.rb', line 130

    def close
      deduper.values.each do |hash|
        row = hash[:row]
        add_example_field(row, hash) if example
        row[occ_target] = hash[:occs] if occs
        row.delete(field) if delete
        yield row
      end
    end
#process(row) ⇒ Object
    # File 'lib/kiba/extend/transforms/deduplicate/table.rb', line 120

    def process(row)
      field_val = row.fetch(field, nil)
      return if field_val.blank?

      get_row(field_val, row)
      get_occ(field_val, row) if occs
      get_example(field_val, row) if example
      nil
    end
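The listings above show that #process buffers rows and returns nil, and that #close yields the kept rows, but the helpers get_row, get_occ, get_example, and add_example_field are not shown on this page. The following self-contained sketch is one plausible way the pieces could fit together, written to reproduce the third example in the Overview. SketchTableDeduper and its internals are hypothetical and should not be read as the library's implementation; for instance, whether the real transform deduplicates example values is not known from this page.

```ruby
# Hypothetical stand-in class; not the kiba-extend implementation.
class SketchTableDeduper
  def initialize(field:, example_source_field:, max_examples: 10,
    example_delim: ";")
    @field = field
    @example = example_source_field
    @max_examples = max_examples
    @delim = example_delim
    @deduper = {}
  end

  # Buffers the row and returns nil, so nothing is emitted until #close.
  def process(row)
    val = row[@field]
    return if val.nil? || val.to_s.strip.empty?

    # The first row seen for a value is the one kept.
    entry = (@deduper[val] ||= {row: row, occs: 0, examples: []})
    entry[:occs] += 1
    if entry[:examples].length < @max_examples
      entry[:examples] << row[@example]
    end
    nil
  end

  # Yields one row per distinct value, in first-seen order.
  def close
    @deduper.each_value do |entry|
      row = entry[:row]
      row[:ex] = entry[:examples].join(@delim)
      row[:occurrences] = entry[:occs]
      row.delete(@field)
      yield row
    end
  end
end

rows = [
  {foo: "a", bar: "b", baz: "f", combined: "a b"},
  {foo: "c", bar: "d", baz: "g", combined: "c d"},
  {foo: "c", bar: "e", baz: "h", combined: "c e"},
  {foo: "c", bar: "d", baz: "i", combined: "c d"},
  {foo: "c", bar: "d", baz: "j", combined: "c d"}
]

deduper = SketchTableDeduper.new(
  field: :combined, example_source_field: :baz, max_examples: 2
)
rows.each { |row| deduper.process(row) }
deduper.close { |row| puts row.values.join(" | ") }
```

Because every kept row stays in the hash until #close runs, memory use grows with the number of distinct values, which is the limitation the note in the Overview describes.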