Class: Kiba::Extend::Transforms::Deduplicate::Flag
- Inherits: Object
  - Object
    - Kiba::Extend::Transforms::Deduplicate::Flag
- Defined in: lib/kiba/extend/transforms/deduplicate/flag.rb
Overview
Adds a field (in_field) containing ‘y’ or ‘n’, indicating whether the value of on_field duplicates a value seen in an earlier row.
The first instance of a value in on_field is always marked ‘n’ (or left blank when explicit_no: false). Subsequent rows containing the same value are marked ‘y’.
Use this transform if you need to retain or report on what will be treated as a duplicate. Use FilterRows::FieldEqualTo to extract only the duplicate rows and/or to keep only the non-duplicate rows.
Use FlagAll if you need every row containing a duplicated value flagged ‘y’, including the first occurrence.
To delete duplicates all in one step, use Table.
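To make the difference between Flag and FlagAll concrete, here is a plain-Ruby sketch over the example rows used on this page. The helpers flag_first and flag_all are hypothetical and not part of Kiba::Extend; flag_all assumes FlagAll marks non-duplicated rows "n", which this page does not state. The last lines split rows on the flag value, roughly what a FilterRows::FieldEqualTo step would do in a pipeline.

```ruby
# Hypothetical helpers (not part of Kiba::Extend) contrasting two flagging strategies.

# Flag-style: only repeats after the first occurrence are marked "y".
def flag_first(rows, on_field, in_field)
  seen = {}
  rows.map do |row|
    val = row.fetch(on_field)
    flag = seen.key?(val) ? "y" : "n"
    seen[val] = nil # remember the value; only the key matters
    row.merge(in_field => flag)
  end
end

# FlagAll-style (assumed): every row whose value occurs more than once is marked "y".
def flag_all(rows, on_field, in_field)
  counts = rows.map { |row| row.fetch(on_field) }.tally
  rows.map do |row|
    row.merge(in_field => (counts[row.fetch(on_field)] > 1 ? "y" : "n"))
  end
end

rows = [
  { foo: "a", bar: "b", combined: "a b" },
  { foo: "c", bar: "d", combined: "c d" },
  { foo: "c", bar: "e", combined: "c e" },
  { foo: "c", bar: "d", combined: "c d" }
]

flagged_first = flag_first(rows, :combined, :duplicate)
flagged_all = flag_all(rows, :combined, :duplicate)

p flagged_first.map { |r| r[:duplicate] } # => ["n", "n", "n", "y"]
p flagged_all.map { |r| r[:duplicate] }   # => ["n", "y", "n", "y"]

# Splitting on the flag value, similar in spirit to filtering on duplicate == "n" / "y".
kept = flagged_first.select { |r| r[:duplicate] == "n" }
dupes = flagged_first.select { |r| r[:duplicate] == "y" }
```

With Flag, the first "c d" row survives as the kept copy; with a FlagAll-style pass, both "c d" rows are marked, which suits reporting on every member of a duplicate group.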
Input table:
| foo | bar | combined |
|-----|-----|----------|
| a | b | a b |
| c | d | c d |
| c | e | c e |
| c | d | c d |
Used in pipeline as:
```ruby
@deduper = {}
transform Deduplicate::Flag, on_field: :combined, in_field: :duplicate, using: @deduper
```
Results in:
| foo | bar | combined | duplicate |
|-----|-----|----------|-----------|
| a | b | a b | n |
| c | d | c d | n |
| c | e | c e | n |
| c | d | c d | y |
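The pipeline example can be reproduced without Kiba by calling a transform's process method on each row in order, which is how Kiba drives a transform. FlagSketch below is a hypothetical stand-in whose process body copies the source shown under Instance Method Details (using instance variables in place of the readers that source calls); it leaves out the NoUsingValueError check. The last lines show one consequence of receiving the hash through using: rather than creating it internally: the caller keeps it, so a second instance sharing it treats earlier values as already seen. That sharing pattern is an inference from the signature, not a documented use.

```ruby
# Hypothetical stand-in; #process mirrors Deduplicate::Flag#process as shown on this page.
class FlagSketch
  def initialize(on_field:, in_field:, using:, explicit_no: true)
    @on = on_field
    @in_field = in_field
    @using = using
    @no_val = explicit_no ? "n" : ""
  end

  # Called once per row, in row order.
  def process(row)
    val = row.fetch(@on)
    if @using.key?(val)
      row[@in_field] = "y"
    else
      @using[val] = nil # record the value as seen
      row[@in_field] = @no_val
    end
    row
  end
end

rows = [
  { foo: "a", bar: "b", combined: "a b" },
  { foo: "c", bar: "d", combined: "c d" },
  { foo: "c", bar: "e", combined: "c e" },
  { foo: "c", bar: "d", combined: "c d" }
]

deduper = {}
flagger = FlagSketch.new(on_field: :combined, in_field: :duplicate, using: deduper)
results = rows.map { |row| flagger.process(row) }
results.each { |row| puts row.values.join(" | ") }
# a | b | a b | n
# c | d | c d | n
# c | e | c e | n
# c | d | c d | y

# The caller-owned hash now holds every value seen so far.
p deduper.keys # => ["a b", "c d", "c e"]

# A second instance given the same hash treats those values as already seen.
later = FlagSketch.new(on_field: :combined, in_field: :duplicate, using: deduper)
p later.process({ combined: "a b" })[:duplicate] # => "y"
```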
Defined Under Namespace
Classes: NoUsingValueError
Instance Method Summary
- #initialize(on_field:, in_field:, using:, explicit_no: true) ⇒ Flag (constructor): A new instance of Flag.
- #process(row) ⇒ Object
Constructor Details
#initialize(on_field:, in_field:, using:, explicit_no: true) ⇒ Flag
Returns a new instance of Flag.
```ruby
# File 'lib/kiba/extend/transforms/deduplicate/flag.rb', line 59

def initialize(on_field:, in_field:, using:, explicit_no: true)
  @on = on_field
  @in_field = in_field
  @using = using
  unless @using
    raise NoUsingValueError, "#{self.class.name} `using` hash does not exist"
  end
  @no_val = explicit_no ? "n" : ""
end
```
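The constructor's two behaviors, the explicit_no switch and the guard on using, can be checked with a small stand-in. FlagInitSketch and the top-level NoUsingValueError below are hypothetical; in the library the error class is nested as Flag::NoUsingValueError, and the attr_reader exists only so the sketch can expose the computed value.

```ruby
# Hypothetical stand-ins mirroring the constructor shown above.
class NoUsingValueError < StandardError; end

class FlagInitSketch
  attr_reader :no_val # exposed only for this sketch

  def initialize(on_field:, in_field:, using:, explicit_no: true)
    @on = on_field
    @in_field = in_field
    @using = using
    # Same guard as the original: rejects only nil (or false).
    unless @using
      raise NoUsingValueError, "#{self.class.name} `using` hash does not exist"
    end
    @no_val = explicit_no ? "n" : ""
  end
end

default_no = FlagInitSketch.new(on_field: :combined, in_field: :duplicate, using: {}).no_val
blank_no = FlagInitSketch.new(on_field: :combined, in_field: :duplicate, using: {},
                              explicit_no: false).no_val
p default_no # => "n"
p blank_no   # => ""

error_message =
  begin
    FlagInitSketch.new(on_field: :combined, in_field: :duplicate, using: nil)
    nil
  rescue NoUsingValueError => e
    e.message
  end
puts error_message # prints: FlagInitSketch `using` hash does not exist
```

Because using: is a required keyword, omitting it entirely raises Ruby's ArgumentError before this guard runs, so NoUsingValueError fires only for an explicit nil or false. An empty hash is truthy and passes; so would a non-Hash such as an Array, which would then fail later in process, since Array has no key? method.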
Instance Method Details
#process(row) ⇒ Object
```ruby
# File 'lib/kiba/extend/transforms/deduplicate/flag.rb', line 71

def process(row)
  val = row.fetch(on)
  if using.key?(val)
    row[in_field] = "y"
  else
    using[val] = nil
    row[in_field] = no_val
  end
  row
end
```
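The process method relies on two Ruby Hash behaviors. This plain-Ruby snippet (no Kiba involved; seen is a hypothetical stand-in for the using hash) illustrates why the method tests using.key?(val) rather than using[val], and what row.fetch does when on_field is missing. It reflects our reading of the source above rather than documented guarantees.

```ruby
# Seen values are stored as keys with a nil value (using[val] = nil),
# so a truthiness test on using[val] could never report a repeat.
seen = {}
seen["c d"] = nil

p seen["c d"]      # => nil  (indistinguishable from a missing key)
p seen.key?("c d") # => true (the value has been seen)

# Hash#fetch raises KeyError if a row lacks on_field altogether,
# instead of quietly treating the missing field as nil.
row = { foo: "a", bar: "b" }
fetch_error =
  begin
    row.fetch(:combined)
    nil
  rescue KeyError => e
    e.class
  end
p fetch_error # => KeyError

# A field that is present but nil is fetched normally, so nil behaves like any
# other value: the first such row is flagged "n" and later ones "y".
flags = [nil, nil].map do |val|
  if seen.key?(val)
    "y"
  else
    seen[val] = nil
    "n"
  end
end
p flags # => ["n", "y"]
```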