Class: Kiba::Extend::Transforms::Deduplicate::FlagAll
- Inherits: Object
  - Object
  - Kiba::Extend::Transforms::Deduplicate::FlagAll
- Defined in: lib/kiba/extend/transforms/deduplicate/flag_all.rb
Overview
Adds a field (specified as in_field) containing ‘y’ or ‘n’, indicating whether the value of on_field is a duplicate. If explicit_no is false, non-duplicate rows receive an empty string instead of ‘n’.
In contrast with Flag, where the first instance of a value in on_field is always marked n, FlagAll marks every row containing a duplicated value in on_field as y, including the first.
Input table:
| foo | bar | combined |
|-----------------------|
| a | b | a b |
| c | d | c d |
| c | e | c e |
| c | d | c d |
Used in pipeline as:
@deduper = {}
transform Deduplicate::FlagAll, on_field: :combined, in_field: :duplicate
Results in:
| foo | bar | combined | duplicate |
|----------------------------------|
| a | b | a b | n |
| c | d | c d | y |
| c | e | c e | n |
| c | d | c d | y |
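The difference between the two flagging strategies can be sketched in plain Ruby, without Kiba. This is only an illustration of the behavior described above, not the library's code: `flag_repeats` and `flag_all` are hypothetical helper names, and the Flag-style version assumes that repeats after the first instance are marked y.

```ruby
# Values of the `combined` field from the input table above.
COMBINED = ["a b", "c d", "c e", "c d"].freeze

# Flag-style marking (assumed): the first occurrence of a value is "n",
# and any later repeat of that value is "y".
def flag_repeats(values)
  seen = {}
  values.map do |val|
    flag = seen.key?(val) ? "y" : "n"
    seen[val] = true
    flag
  end
end

# FlagAll-style marking: count every value first, then mark each value
# that occurs more than once as "y", including its first occurrence.
def flag_all(values)
  counts = values.tally
  values.map { |val| counts[val] > 1 ? "y" : "n" }
end

p flag_repeats(COMBINED) # only the second "c d" is flagged
p flag_all(COMBINED)     # both "c d" rows are flagged
```

The second helper needs the complete list before it can answer for any single value, which is why FlagAll cannot emit a row until it has seen all rows.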
Instance Method Summary
- #close ⇒ Object
- #initialize(on_field:, in_field:, explicit_no: true) ⇒ FlagAll (constructor)
  A new instance of FlagAll.
- #process(row) ⇒ Object
Constructor Details
#initialize(on_field:, in_field:, explicit_no: true) ⇒ FlagAll
Returns a new instance of FlagAll.
# File 'lib/kiba/extend/transforms/deduplicate/flag_all.rb', line 52
def initialize(on_field:, in_field:, explicit_no: true)
  @on = on_field
  @in_field = in_field
  @deduper = {}
  @no_val = explicit_no ? "n" : ""
  @rows = []
end
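The explicit_no keyword only affects the value written for rows that are not duplicates. A minimal sketch of that choice follows; `no_value_for` is a hypothetical helper that mirrors the ternary in the constructor and is not part of kiba-extend.

```ruby
# Hypothetical helper mirroring `@no_val = explicit_no ? "n" : ""`.
def no_value_for(explicit_no: true)
  explicit_no ? "n" : ""
end

p no_value_for                     # default: non-duplicates get "n"
p no_value_for(explicit_no: false) # non-duplicates get an empty string
```

Passing explicit_no: false may suit cases where only the duplicate rows should carry a visible marker.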
Instance Method Details
#close ⇒ Object
# File 'lib/kiba/extend/transforms/deduplicate/flag_all.rb', line 68
def close
  @rows.each do |row|
    val = row[on]
    row[in_field] = (deduper[val] > 1) ? "y" : no_val
    yield row
  end
end
#process(row) ⇒ Object
# File 'lib/kiba/extend/transforms/deduplicate/flag_all.rb', line 61
def process(row)
  val = row[on]
  deduper.key?(val) ? deduper[val] += 1 : deduper[val] = 1
  rows << row
  nil
end
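Taken together, #process and #close form a buffering transform: #process counts and stores each row and returns nil, and #close yields the stored rows once every count is known. The sketch below walks through that lifecycle by hand. It is a condensed re-implementation so that it runs without kiba-extend; the class name `FlagAllSketch`, the `run_sketch` driver, and the private attr_reader line are assumptions, not confirmed parts of the library.

```ruby
# Condensed stand-in for the methods shown on this page.
class FlagAllSketch
  def initialize(on_field:, in_field:, explicit_no: true)
    @on = on_field
    @in_field = in_field
    @deduper = {}
    @no_val = explicit_no ? "n" : ""
    @rows = []
  end

  # Count the value, buffer the row, and emit nothing yet.
  def process(row)
    val = row[on]
    deduper[val] = deduper.fetch(val, 0) + 1
    rows << row
    nil
  end

  # All rows have been seen, so the counts are final: flag and emit.
  def close
    rows.each do |row|
      row[in_field] = (deduper[row[on]] > 1) ? "y" : no_val
      yield row
    end
  end

  private

  # Assumed: the documented methods call these readers without "@".
  attr_reader :on, :in_field, :deduper, :no_val, :rows
end

# Hypothetical driver: feeds rows by hand, then collects what close yields.
def run_sketch(input_rows, **opts)
  xform = FlagAllSketch.new(**opts)
  returned = input_rows.map { |row| xform.process(row) }
  emitted = []
  xform.close { |row| emitted << row }
  [returned, emitted]
end

sample = [{combined: "a b"}, {combined: "c d"}, {combined: "c e"}, {combined: "c d"}]
returned, emitted = run_sketch(sample, on_field: :combined, in_field: :duplicate)
p returned                               # one nil per processed row
p emitted.map { |row| row[:duplicate] }  # flags in original row order
```

Because every row is held in memory until #close, this approach trades streaming for the ability to flag the first occurrence of a duplicated value.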