Why do we need to become knowledgeable about multimodal text?