Quality of Data

Quality of Data (QoD) is a designation coined by L. Veiga that specifies the Quality of Service a distributed storage system must provide with respect to the consistency of its data. It can be used to support big data management frameworks, workflow management, and HPC systems, mainly for data replication and consistency. It takes data semantics into account along three dimensions: Time, the interval of data freshness; Sequence, the tolerable number of outstanding versions of the data that may be read before a refresh; and Value, the divergence allowed before the data is displayed. It was initially based on a model from earlier research on vector-field consistency,[1] which was awarded the best-paper prize at the ACM/IFIP/Usenix Middleware Conference 2007, and was later enhanced for increased scalability and fault tolerance.[2]

This consistency model has been applied to the big data key/value store Apache HBase,[3] having initially been designed as a middleware[4] module sitting between clusters in separate data centres. The HBase-QoD coupling[5] minimises bandwidth usage and optimises resource allocation during replication, achieving the desired consistency level at a finer granularity.
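
One way bounded divergence can reduce replication traffic is by holding updates back until a bound is reached and then shipping only the latest value per key. The following is a minimal sketch under that assumption; it is not taken from the cited HBase work, and `DeferredReplicator`, `max_pending`, and `ship` are hypothetical names used only for illustration.

```python
class DeferredReplicator:
    """Hypothetical buffer that defers replication until a sequence bound is hit."""

    def __init__(self, max_pending, ship):
        self.max_pending = max_pending  # assumed bound on unshipped writes
        self.ship = ship                # callback that sends one batch to the remote cluster
        self.pending = {}               # key -> latest value not yet shipped
        self.pending_count = 0          # writes accepted since the last flush

    def write(self, key, value):
        # A later write to the same key overwrites the earlier one,
        # so only the newest value is ever sent over the wire.
        self.pending[key] = value
        self.pending_count += 1
        if self.pending_count > self.max_pending:
            self.flush()

    def flush(self):
        # Ship a copy of the coalesced batch, then reset the buffer.
        if self.pending:
            self.ship(dict(self.pending))
        self.pending.clear()
        self.pending_count = 0
```

With `max_pending=2`, three writes where two hit the same key produce a single batch holding two entries, which is where the bandwidth saving in this sketch comes from.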

QoD is defined by the three dimensions of the vector k = (θ, σ, ν), which bound time, sequence, and value respectively, but it takes a broader view of the issue that also applies to large-scale data management techniques with regard to the timely delivery of data.[6]
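
The vector can be pictured as a small record of three limits checked against a replica's current state. The sketch below is illustrative only: the class and field names are hypothetical, the value dimension is assumed to be a relative difference against the primary copy, and a refresh is assumed to be required as soon as any single limit is exceeded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QoDBound:
    """Hypothetical container for the vector (theta, sigma, nu)."""
    theta: float  # max seconds a replica may go without a refresh
    sigma: int    # max number of updates not yet applied to the replica
    nu: float     # max relative divergence in value (0.05 means 5%)


@dataclass
class ReplicaState:
    """Illustrative view of how stale a replica currently is."""
    seconds_since_refresh: float
    pending_updates: int
    replica_value: float
    primary_value: float


def needs_refresh(state: ReplicaState, bound: QoDBound) -> bool:
    """Return True as soon as any one of the three limits is exceeded."""
    if state.seconds_since_refresh > bound.theta:
        return True
    if state.pending_updates > bound.sigma:
        return True
    # Relative divergence, guarding against division by zero.
    base = abs(state.primary_value)
    if base == 0:
        divergence = 0.0 if state.replica_value == 0 else float("inf")
    else:
        divergence = abs(state.primary_value - state.replica_value) / base
    return divergence > bound.nu
```

For example, with `QoDBound(theta=30, sigma=5, nu=0.1)`, a replica that is 10 seconds old, has 2 pending updates, and holds 100 against a primary value of 105 stays within all three limits, while the same replica with 6 pending updates does not.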

Other descriptions

Quality of Data should not be confused with other definitions of data quality, such as those based on completeness, validity, and accuracy.[7][8]

References

  1. Nuno Santos; Luís Veiga; Paulo Ferreira (2007). "Vector-Field Consistency for Adhoc Gaming" (PDF). ACM/IFIP/Usenix Middleware Conference 2007.
  2. Luís Veiga; André Negrão; Nuno Santos; Paulo Ferreira (2010). "Unifying Divergence Bounding and Locality Awareness in Replicated Systems with Vector-Field Consistency" (PDF). Journal of Internet Services and Applications (JISA), Volume 1, Number 2, pp. 95-115. Springer.
  3. "Apache HBase – Apache HBase™ Home". hbase.apache.org. Retrieved 2022-10-15.
  4. Sergio Estéves; João Silva; Luís Veiga (2013). "Quality-of-service for consistency of data geo-replication in cloud computing" (PDF). Euro-Par 2012 Parallel Processing. Springer Berlin Heidelberg, 2012. pp. 285-297.
  5. Álvaro García-Recuero; Sergio Estéves; Luís Veiga (2013). "Quality-of-Data for Consistency Levels in Geo-replicated Cloud Data Stores" (PDF). IEEE CloudCom 2013.
  6. "Data Quality". Published by IBM.
  7. Richard Y. Wang (1992). "Toward quality data: an attribute-based approach" (PDF). Decision Support Systems 13, MIT.
  8. George A. Mihaila; Louiqa Raschid; María-Esther Vidal (2000). "Using Quality of Data Metadata for Source Selection and Ranking". CiteSeerX 10.1.1.34.9361.