Review and prospect of research on ancient book information processing in China


Zhongbao Liu
Zhenzhen Qin
Wenjuan Zhao


The era of big data presents an unprecedented opportunity for the development of ancient book information processing in China. A comprehensive review of research in this area can help researchers understand current progress and anticipate future trends. Organized around the life cycle of ancient book information processing, this paper reviews and summarizes progress in digital resource construction, data mining, system construction, and information service, and outlines future research trends. Although some progress has been made, ancient book information processing is still in its infancy. It is hoped that more researchers will pay attention to and engage in this field.



How to Cite
Liu, Z., Qin, Z., & Zhao, W. (2021). Review and prospect of research on ancient book information processing in China. Malaysian Journal of Library & Information Science, 26(3), 77–95. Retrieved from

