    Data Cleansing Issues

    Data cleansing is the process of reconciling one or more data sets so that the resulting data is free of mismatched or invalid information. Cleansing operations figure prominently in the extraction, transformation and loading (ETL) process of many enterprise-grade data warehouses, the large-scale information databases that many large institutions use to track and report information. However, cleansing as a data-management strategy is not without challenges.
    Many automatic cleansing solutions identify outlier data and remove or re-code the outlier to match an expected value. However, an outlier is not necessarily invalid: an unusually large value may be a data-entry error, or it may be a correct record of an unusual case. An automated solution must therefore account for the legitimate business purposes of outlier data and handle each case appropriately, rather than discarding every outlier by default.
    Data cleansing must also handle null values in a way that serves the purpose of data reporting. There is a difference between having a genuine null and simply having no data. A community health worker might learn that a patient smokes zero cigarettes and therefore enter nothing on the collection form. That "nothing" is not the same thing as a field the health worker left blank because the question was never asked. Cleansing must be able to tell these situations apart, or at least avoid treating them as the same value.
    Sometimes a large database integrates data from multiple sources that record similar data in different ways. The cleansing process must identify these "type mismatch" scenarios and convert all of the related data into the same format and the same data type, so that report developers can report the data in a uniform fashion.
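    The three concerns above (outliers, nulls and type mismatches) can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions, not a reference implementation: the field name, the `MISSING_TOKENS` set, the `plausible_max` threshold and the status labels are all hypothetical, and the sketch assumes that the form records an explicit 0 for a non-smoker, so that a blank can be treated as "no data" rather than silently coerced to zero.

```python
import math
from dataclasses import dataclass
from typing import Optional

# Hypothetical spellings that different sources might use for "no answer".
MISSING_TOKENS = {"", "na", "n/a", "null", "not asked"}


@dataclass
class CleanValue:
    value: Optional[int]  # parsed count, or None when nothing usable was recorded
    status: str           # "ok", "missing", "invalid" or "outlier"


def clean_cigarettes_per_day(raw, plausible_max=100):
    """Normalize one raw form field to an integer plus a status flag.

    plausible_max is an assumed threshold chosen for illustration only.
    """
    # No data: keep it distinct from a recorded zero.
    if raw is None:
        return CleanValue(None, "missing")
    text = str(raw).strip().lower()
    if text in MISSING_TOKENS:
        return CleanValue(None, "missing")

    # Type mismatch: sources may send "20", " 20 ", 20 or 20.0.
    try:
        number = float(text)
    except ValueError:
        return CleanValue(None, "invalid")
    if not math.isfinite(number) or number < 0 or not number.is_integer():
        return CleanValue(None, "invalid")
    count = int(number)

    # Outlier: flag it for review but keep the value, since it may be legitimate.
    if count > plausible_max:
        return CleanValue(count, "outlier")
    return CleanValue(count, "ok")


if __name__ == "__main__":
    for raw in ["0", "", None, " 20 ", 20.0, "150", "ten"]:
        print(repr(raw), "->", clean_cigarettes_per_day(raw))
```

    The design choice here is to attach a status to each value instead of deleting or re-coding it, so that a report developer can decide later whether flagged outliers and missing answers belong in a given report.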
    In short, data cleansing is the process of detecting and correcting inconsistent records in a table, data set or database. It is used mainly in databases to identify incomplete, incorrect, inaccurate or irrelevant parts of the data and then to modify, replace or delete them.

