{smcl}
{* 13dec2004}{...}
{hline}
help for {hi:nodup}{right: El }
{hline}

{title:Show discrepancies within sets of records}

{p 8 17 2}
{cmdab:nodup}
[{it:varlist}]
[{cmd:if} {it:exp}]
{cmd:,} {cmd:dup(}{it:varlist}{cmd:)}
[{cmd:over(}{it:integer}{cmd:)}]


{title:Description}

{p 4 4 2}Be sure to see {bf:Remarks}, below.

{p 4 4 2}{cmd:nodup} lists differences among similar records. The dataset is
divided into {bf:set}s based on the mandatory {cmd:dup()} option. The records
within each set are compared, and the differences are reported.

{p 4 4 2}For each variable in {it:varlist}, {cmd:nodup} reports the number of
{bf:set}s whose records are {it:not} all identical on that variable. Each count
is reported twice: once counting {it:missing} as a distinct value, and once not
counting it.


{title:Options}

{p 8 8 2}{cmd:dup(}{it:varlist}{cmd:)} is required and determines what
constitutes a {bf:set}: records that are identical on every variable in
{it:varlist} belong to the same set.

{p 8 8 2}{cmd:over(}{it:integer}{cmd:)} adds another level of reporting. In
addition to the number of sets with more than one value per variable,
{cmd:nodup} includes a column reporting the number of sets with more than
{it:integer} values per variable. The default is {cmd:over(2)}.


{title:Remarks}

{p 4 4 2}Though tooth-achingly difficult to describe in the abstract, this
command is easy to use in practice. For example, consider a patient database
containing SSN, name, and birth date. The command

{p 8 8 2}{cmd:. nodup name bdate, dup(ssn)}

{p 4 4 2}reports:

{p 8 11 2}1. the number of distinct patients, based on SSN;

{p 8 11 2}2. the number of patients (based on SSN) with more than one name in
the dataset;

{p 8 11 2}3. the number of patients (based on SSN) with more than one birth
date in the dataset.

{p 4 4 2}With {cmd:over(5)}, the output also includes the number of patients
with more than 5 names and the number with more than 5 birth dates.


{title:Also see}

{pstd}Online: help for {help duplicates}

{pstd}Contact: {browse "mailto:elliott.lowy@va.gov":Elliott Lowy}