Duplicate records can build up in an Alteryx workflow. Whether they come from aggregations, transformations, or joins and unions, at some point you may want to work only with unique records. It is then important to separate the unique records from the duplicates, and the Alteryx Unique Tool is the best way to do this.
Examples of use cases
- Only the first transaction of a customer is important, out of a long list of transactions.
- You have done one or more appends, and now you want to reduce some redundancy there.
- You want to check whether the list of customers or invoices contains duplicates.
The Unique Tool, an overview
The Unique Tool...
- has 1 input anchor and 2 output anchors: U (unique) and D (duplicate).
- is case-sensitive.
- scans through the data from top to bottom.
- in case of duplication, labels the first record as unique and all subsequent identical records as duplicate.
Based on the last 3 points above, you would do well to consider combining the Unique Tool with the Data Cleansing and/or Sort Tool, especially when the data has already passed through several other tools.
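To make that behavior concrete, here is a minimal Python sketch that mimics it outside Alteryx. It is not how the Unique Tool is implemented; the function name `split_unique` and the sample records are made up for illustration.

```python
def split_unique(records, key_fields):
    """Split records into (unique, duplicates), scanning from top to bottom.

    Hypothetical helper that imitates the U and D outputs described above.
    """
    seen = set()
    unique, duplicates = [], []
    for record in records:
        # The key is compared exactly, so "Ann" and "ann" are different keys.
        key = tuple(record[field] for field in key_fields)
        if key in seen:
            duplicates.append(record)   # every later identical key goes to D
        else:
            seen.add(key)
            unique.append(record)       # the first occurrence goes to U
    return unique, duplicates


# Invented sample data, not taken from the Alteryx example workflow.
customers = [
    {"FirstName": "Ann", "LastName": "Lee"},
    {"FirstName": "ann", "LastName": "Lee"},
    {"FirstName": "Ann", "LastName": "Lee"},
]

unique, duplicates = split_unique(customers, ["FirstName", "LastName"])
print(len(unique), len(duplicates))  # 2 1
```

Because the comparison is case-sensitive, "ann Lee" counts as unique here, and only the third record ends up as a duplicate.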
An example use case
For this I use the dataset from the example workflow of the Alteryx Unique Tool. It is a customer list of 96 records in total, and I want to know whether any customers appear in it more than once.
I drag the Unique Tool into my workflow and connect it to the Input Data Tool. The configuration window shows a list of fields to select. Let's first check for duplicates on the combination of FirstName and LastName: I select those two fields and run the workflow.
It turns out that 6 records are duplicates based on the combination of first name and last name. This requires further investigation: are these real duplicates, or are there different customers with the same name? When we run the workflow again with Address added to the combination, 5 records remain as duplicates. It is now very likely that these 5 records belong to customers who have been registered more than once.
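The effect of adding a field to the combination can be sketched in Python as well. The records below are invented and much smaller than the 96-record example, and `count_duplicates` is a hypothetical helper, not an Alteryx function.

```python
from collections import Counter


def count_duplicates(records, key_fields):
    """Count records that repeat an earlier combination of key_fields."""
    counts = Counter(tuple(r[f] for f in key_fields) for r in records)
    # For each combination, every occurrence after the first is a duplicate.
    return sum(n - 1 for n in counts.values())


# Invented sample data: two people share a name but live at different addresses.
customers = [
    {"FirstName": "Ann", "LastName": "Lee", "Address": "1 Main St"},
    {"FirstName": "Ann", "LastName": "Lee", "Address": "1 Main St"},
    {"FirstName": "Bob", "LastName": "Ray", "Address": "2 Oak Ave"},
    {"FirstName": "Bob", "LastName": "Ray", "Address": "9 Elm Rd"},
]

by_name = count_duplicates(customers, ["FirstName", "LastName"])
by_name_and_address = count_duplicates(
    customers, ["FirstName", "LastName", "Address"]
)
print(by_name, by_name_and_address)  # 2 1
```

Adding a field can only keep the duplicate count the same or lower it, which is why the stricter combination is a useful second check.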
Points of attention
Although the Unique Tool is very user-friendly, there are a few pitfalls worth naming. If the selected columns contain impurities or inconsistencies (such as the differences in case mentioned earlier), it is essential to clean them up first, for example with the Data Cleansing Tool.
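As a rough illustration of why cleaning matters, the Python sketch below normalizes whitespace and case before comparing keys. The `normalize` and `split_unique_cleaned` helpers and the sample data are assumptions for this example; they only approximate what a cleansing step in front of the Unique Tool would achieve.

```python
def normalize(value):
    """Trim and collapse whitespace, and ignore case (illustrative cleanup)."""
    return " ".join(value.split()).casefold()


def split_unique_cleaned(records, key_fields):
    """Split into (unique, duplicates) after normalizing the key fields."""
    seen = set()
    unique, duplicates = [], []
    for record in records:
        key = tuple(normalize(record[field]) for field in key_fields)
        if key in seen:
            duplicates.append(record)
        else:
            seen.add(key)
            unique.append(record)
    return unique, duplicates


# Invented sample data with stray spaces and inconsistent case.
customers = [
    {"FirstName": "Ann", "LastName": "Lee"},
    {"FirstName": " ann ", "LastName": "LEE"},
    {"FirstName": "Bob", "LastName": "Ray"},
]

unique, duplicates = split_unique_cleaned(customers, ["FirstName", "LastName"])
print(len(unique), len(duplicates))  # 2 1
```

Without the normalization step, the second record would be treated as a different customer and pass through as unique.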
In addition, consider how much the order of the records matters. For the first use case in the list at the top (the first transaction per customer), sort by date before the Unique Tool so that the earliest transaction is the one labeled unique.
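The sort-then-deduplicate idea looks like this in a small Python sketch. The field names `CustomerID`, `Date`, and `Amount`, the helper `first_per_key`, and the transactions themselves are hypothetical.

```python
from datetime import date


def first_per_key(records, key_field, sort_field):
    """Return the first record per key after sorting by sort_field."""
    ordered = sorted(records, key=lambda r: r[sort_field])
    seen = set()
    first = []
    for record in ordered:
        if record[key_field] not in seen:
            seen.add(record[key_field])
            first.append(record)   # earliest record for this key
    return first


# Invented sample data, deliberately not in date order.
transactions = [
    {"CustomerID": "C1", "Date": date(2023, 3, 5), "Amount": 40},
    {"CustomerID": "C2", "Date": date(2023, 1, 9), "Amount": 15},
    {"CustomerID": "C1", "Date": date(2023, 1, 2), "Amount": 25},
]

first_transactions = first_per_key(transactions, "CustomerID", "Date")
print([(t["CustomerID"], t["Amount"]) for t in first_transactions])
# [('C1', 25), ('C2', 15)]
```

Without the sort, customer C1's transaction of 5 March would be kept as the "first" one simply because it appears higher in the list.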
That is how you separate unique records from duplicates in Alteryx with the Unique Tool. Hopefully these tips have helped you. Check out our other blogs on Alteryx Tools. Need more help or explanation? Don't hesitate to contact us for our workshops and training or to hire a consultant.