Research Group: Multimodal Learning Technologies

Head of the Research Group: Prof. Dr. Daniele Di Mitri

Multimodal Learning Technologies is a cutting-edge research area focused on developing AI systems that can process and integrate multiple types of data input, such as text, images, video, audio, and sensor data, to create more versatile and human-like models. The field explores advances in deep learning architectures, such as transformers, for multimodal tasks like image captioning, speech-to-text transcription, and video understanding. Applications span industries including healthcare (e.g., medical image analysis combined with patient records), education (e.g., interactive learning tools), and entertainment (e.g., immersive virtual reality experiences). The ultimate goal is intelligent systems that can seamlessly understand complex real-world scenarios.
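To make the idea of integrating multiple input types concrete, the sketch below shows one common pattern, late fusion: each modality is encoded into a feature vector and the vectors are concatenated into a single joint representation. This is a minimal illustrative example, not code from the research group; the toy encoders (random embedding tables and projections) are placeholders for real text and image models.

```python
# Minimal sketch of late fusion for two modalities (illustrative only).
import numpy as np

rng = np.random.default_rng(0)

def encode_text(tokens, dim=8):
    """Toy text encoder: mean of random per-token embeddings (placeholder
    for a real language model)."""
    table = rng.normal(size=(100, dim))
    return table[[hash(t) % 100 for t in tokens]].mean(axis=0)

def encode_image(pixels, dim=8):
    """Toy image encoder: global average pooling over the spatial axes,
    then a random linear projection (placeholder for a real vision model)."""
    pooled = pixels.mean(axis=(0, 1))           # average over height and width
    proj = rng.normal(size=(pooled.shape[0], dim))
    return pooled @ proj

def fuse(text_feat, image_feat):
    """Late fusion: concatenate per-modality features into one joint vector,
    which a downstream classifier or decoder would then consume."""
    return np.concatenate([text_feat, image_feat])

text_feat = encode_text(["a", "cat", "on", "a", "mat"])
image_feat = encode_image(rng.normal(size=(4, 4, 3)))   # fake 4x4 RGB image
joint = fuse(text_feat, image_feat)
print(joint.shape)  # (16,)
```

In practice, transformer-based systems often replace simple concatenation with cross-attention between modality streams, but the underlying goal is the same: a shared representation over heterogeneous inputs.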