Published Oct 26, 2010





Miguel Eduardo Torres-Moreno

Germán Flórez-Larrahondo


Abstract

This paper presents an empirical study of how input size affects the performance of lossless data compression algorithms. We analyzed three performance measures and created a new dataset based on the Calgary and Canterbury corpora, which also includes two new “complex” files. We show that for large files the compression ratio of the lossless algorithms stays fairly constant, changing by only a small factor every 10 MB. Finally, we show that the execution time for compressing and decompressing data is a linear function of the input size.
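As an illustration of the kind of measurement described above, the following minimal sketch times compression and decompression and records the compression ratio for inputs of increasing size. It is not the authors' experimental setup: the use of Python's `zlib` module, the definition of the ratio as compressed size divided by original size, and the helper names `measure` and `make_sample` are all assumptions made for this example, and the synthetic repeated-text input stands in for the corpus files.

```python
import time
import zlib


def measure(data: bytes) -> dict:
    """Compress and decompress `data` once; return size and timing figures."""
    start = time.perf_counter()
    compressed = zlib.compress(data)
    compress_seconds = time.perf_counter() - start

    start = time.perf_counter()
    restored = zlib.decompress(compressed)
    decompress_seconds = time.perf_counter() - start

    # A lossless codec must reproduce the input exactly.
    if restored != data:
        raise ValueError("round trip failed")

    return {
        "input_bytes": len(data),
        # Assumed definition: compressed size / original size.
        "ratio": len(compressed) / len(data),
        "compress_seconds": compress_seconds,
        "decompress_seconds": decompress_seconds,
    }


def make_sample(size: int) -> bytes:
    """Hypothetical test input: repeated text truncated to `size` bytes."""
    unit = b"the quick brown fox jumps over the lazy dog\n"
    return (unit * (size // len(unit) + 1))[:size]


if __name__ == "__main__":
    # Double the input size each step and print one row per size.
    for megabytes in (1, 2, 4, 8):
        print(measure(make_sample(megabytes * 1024 * 1024)))
```

Plotting `compress_seconds` and `decompress_seconds` against `input_bytes` would be one way to check whether the timings grow linearly on a given machine. Single-run timings are noisy, so repeated runs per size would be needed for any real comparison.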

Keywords

Data compression, lossless compression algorithms, algorithm performance

How to Cite
Torres-Moreno, M. E., & Flórez-Larrahondo, G. (2010). Análisis empírico del efecto del tamaño de la información de entrada en el desempeño de herramientas de compresión sin pérdida. Ingenieria Y Universidad, 8(1). Retrieved from https://revistas.javeriana.edu.co/index.php/iyu/article/view/895
Section
Articles