Uncertainty Identification in Microblogs

Document Type: Review Paper


1 Laboratoire de la Communication dans les Systèmes Informatiques, Ecole Nationale Supérieure d’Informatique, BP 68M, 16309, Oued-Smar, Alger, Algérie.

2 Department of Information Technology Management, Faculty of Management, University of Tehran, Tehran, Iran


Microblogging platforms such as Twitter have become popular venues for human expression, through which users can easily produce content on breaking news, public events, or products. The massive amount of microblogging data is a useful and timely source of mass sentiments, beliefs and opinions on various topics. Users express themselves freely and with varying levels of uncertainty. Exploiting microblogs as a data source is therefore a tedious task that must take this uncertainty into account. Here we address the uncertainty that users express in microblogs, not the uncertainty concerning the factuality of the information they claim. This aspect has received little attention in the context of microblogging, although it is important to know the degree of uncertainty with which users intend to provide information. Research on information retrieval or investigation in microblogs is particularly concerned by this subject. In this paper we present a state of the art on the identification of uncertainty in microblogs, with the aim of characterizing this issue and describing current knowledge through the study of similar or related work. Our main observations are that, to adapt to the characteristics of social media, uncertainty must be identified from contextual uncertain semantics rather than from traditional cue-phrases, and that considering multiple sub-classes of uncertainty could provide more information for research on handling uncertainty in social media texts.

Graphical Abstract



  • The paper addresses the issue of identifying uncertainty in microblogs.
  • Uncertainty identification is defined, and corpora annotated for uncertainty are presented.
  • The theory behind the semantic uncertainty levels is explained.
  • A comparative study is conducted of related works that applied semantic classification.

