
Data design thinking: data cleaning improvements using Tableau Prep


Tableau Prep automatically surfaces errors and outliers in data and uses fuzzy clustering to help with common, repetitive tasks such as fixing spelling errors or reconciling entities across data sources. Tableau Prep (previously known by its code name, Project Maestro) shares the same calculation language and governance structure as the rest of Tableau, so you can get started easily. And with streamlined sharing to Tableau Desktop, Tableau Server and Tableau Online, a user can move through data prep and analysis in one continuous flow.
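To make the idea of fuzzy clustering concrete, here is a minimal sketch of how near-duplicate spellings can be grouped. It is not Tableau Prep's own algorithm, which this article does not describe; it uses Python's standard `difflib` as a stand-in. The payer names, the `cluster_values` helper and the 0.85 threshold are all made up for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio, ignoring case and surrounding spaces."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def cluster_values(values, threshold=0.85):
    """Greedy grouping: a value joins the first cluster whose
    representative (its first member) is similar enough."""
    clusters = []
    for value in values:
        for cluster in clusters:
            if similarity(value, cluster[0]) >= threshold:
                cluster.append(value)
                break
        else:
            # No existing cluster was close enough, so start a new one.
            clusters.append([value])
    return clusters


# Hypothetical sample values, not taken from any UCSD Health data set.
payers = ["Blue Cross", "Blue Cros", "blue cross ", "Aetna", "Aetnaa", "Medicare"]
clusters = cluster_values(payers)

for cluster in clusters:
    print(cluster[0], "<-", cluster[1:])
```

Each cluster's first member can then serve as the corrected spelling for the rest of the group. The threshold is a judgment call: set it too low and distinct entities merge, set it too high and obvious misspellings stay separate.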

In this first phase of the Enterprise implementation of Tableau at UCSD Health, we have focused on getting data from five vendors (Epic, Experian and Bank of America / Healthlogic) ready for Tableau work. The data sets we've created in Hyper have a zero date of 2013 10 01. They are also made to conform to SDMX standards whenever possible. SDMX stands for Statistical Data and Metadata eXchange, an international initiative that aims to standardise and modernise (“industrialise”) the mechanisms and processes for exchanging statistical data and metadata among international organisations and their member countries.

Current operations depend on a data file loosely termed an 'accounts trial balance'. This data is actually a blend of account-level information and a 'bucket', which is a software-generated construct. The bucket information relies on affinity analysis, a data analysis and data mining technique that discovers co-occurrence relationships among activities performed by (or recorded about) specific individuals or groups. In the case of UCSD Health billing accounts, the affinity is based on the patient, their insurance coverages and the split of fiscal responsibility.
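As a rough illustration of what affinity analysis measures, the sketch below counts how often pairs of attributes occur together across accounts and reports each pair's support (the share of accounts containing both). It is not the vendor software's bucket logic, which is not documented here. The account records, the attribute labels and the `pair_support` helper are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical records: each set lists attributes seen together on one account.
accounts = [
    {"medicare", "medigap", "patient_share"},
    {"medicare", "medigap"},
    {"commercial", "patient_share"},
    {"medicare", "patient_share"},
]


def pair_support(records):
    """Return {(item_a, item_b): fraction of records containing both}."""
    counts = Counter()
    for record in records:
        # Sort so each pair is always keyed in the same order.
        for pair in combinations(sorted(record), 2):
            counts[pair] += 1
    total = len(records)
    return {pair: n / total for pair, n in counts.items()}


support = pair_support(accounts)

for pair, share in sorted(support.items(), key=lambda kv: -kv[1]):
    print(pair, share)
```

Pairs with high support are candidates for being treated as one group, which is the general sense in which co-occurrence can drive a construct like the bucket.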

SDMX is sponsored by seven international organisations: the Bank for International Settlements (BIS), the European Central Bank (ECB), Eurostat (Statistical Office of the European Union), the International Monetary Fund (IMF), the Organisation for Economic Co-operation and Development (OECD), the United Nations Statistics Division (UNSD), and the World Bank.

These organisations are the main players at world and regional levels in the collection of official statistics in a large variety of domains (agriculture statistics, health care, economic and financial statistics, social statistics, environment statistics etc.).

We've published a sequence of Adobe Spark guides that can help give UCSD employees recognition for learning that happens anywhere. A digital badge is an online representation of a data analysis skill that we are promoting and sharing with our sister UC medical centers.


Tableau 2018. Tableau Prep. <> last accessed 2018 04 24.

SDMX 2018. Learning about sdmx basics. <> last accessed 2018 04 13.