However, you may no longer need to call on the services of Severus Snape or Mystic Meg to get a workable estimate for data quality profiling. My colleague from QFire Software, Neil Currie, recently put me onto a post by David Loshin on SearchDataManagement.com, which proposes a more structured and rational approach to estimating data quality work effort.
At first glance, the overall methodology that David proposes is reasonable for estimating effort on a pure profiling exercise, at least in principle. (It's analogous to the "bottom-up" calculations I've used in the past to estimate ETL development on a job-by-job basis, or the creation of standard Business Intelligence reports on a report-by-report basis.)
I would observe that David's approach is predicated on the (big and probably optimistic) assumption that we're only doing the profiling step. The follow-on stages of analysis, remediation and prevention are excluded – and in my experience, that's where the real work most often lies! There is also the assumption that a checklist of assessment criteria already exists – and developing the library of quality check criteria can be a significant exercise in its own right.
- 10 mins: for each "Simple" item (standard format, no applied business rules, fewer than 100 member records)
- 30 mins: for each "Medium" complexity item (unusual formats, some embedded business logic, data sets of up to 1,000 member records)
- 60 mins: for each "Hard" high-complexity item (significant or complex business logic, data sets over 1,000 member records)
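The per-item estimates above reduce to simple arithmetic, so a quick sketch can turn an item inventory into a total effort figure. This is only an illustration of the rule of thumb: the complexity labels and the `profiling_effort` helper are my own naming, not anything from David's post.

```python
# Per-item effort figures (minutes) from the rules of thumb above.
EFFORT_MINUTES = {"simple": 10, "medium": 30, "hard": 60}

def profiling_effort(counts):
    """Total profiling effort in hours for a dict of {complexity: item_count}."""
    total_minutes = sum(EFFORT_MINUTES[level] * n for level, n in counts.items())
    return total_minutes / 60

# e.g. an inventory of 40 simple, 15 medium and 5 hard items:
hours = profiling_effort({"simple": 40, "medium": 15, "hard": 5})
print(f"{hours:.1f} hours")  # 400 + 450 + 300 = 1150 mins, i.e. about 19.2 hours
```

Even on a toy inventory like this, the point stands: the profiling pass itself is measured in days, not the weeks that the follow-on analysis and remediation can consume.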
How much socialisation? That depends on the number of stakeholders, and their nature. As a rule-of-thumb, I'd suggest the following:
- Two hours of preparation per workshop (if the stakeholder group is "tame"; double it if there are participants who are negatively inclined)
- One hour of face time per workshop (again, double it for "negatives")
- One hour post-workshop write-up time per workshop
- One workshop per 10 stakeholders.
- Two days to prepare any final papers and recommendations, and present to the Steering Group/Project Board.
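The socialisation rules of thumb above can likewise be folded into a small estimator. A minimal sketch, with two assumptions of mine: a working day is 8 hours (so "two days" of final papers becomes 16 hours), and a single `hostile` flag stands in for "participants who are negatively inclined".

```python
import math

def socialisation_effort_hours(stakeholders, hostile=False):
    """Estimate socialisation effort in hours, per the rules of thumb above.

    One workshop per 10 stakeholders (rounded up); 2h prep and 1h face time
    per workshop, both doubled for a hostile group; 1h write-up per workshop;
    plus 2 days (assumed 16h) for final papers and the Steering Group session.
    """
    workshops = math.ceil(stakeholders / 10)
    prep = 2 * (2 if hostile else 1)
    face = 1 * (2 if hostile else 1)
    write_up = 1
    return workshops * (prep + face + write_up) + 16

print(socialisation_effort_hours(25))                # 3 workshops x 4h + 16h = 28
print(socialisation_effort_hours(25, hostile=True))  # 3 workshops x 7h + 16h = 37
```

Note how quickly the "negatives" surcharge compounds: a hostile audience adds an hour of prep and an hour of face time to every workshop, which is exactly why it pays to size up the stakeholder group before quoting an estimate.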
Detailed root-cause analysis (Validate), remediation (Protect) and ongoing evaluation (Monitor) stages are a whole other ball-game.