Research Article

Trailer Reimagined: An Innovative, LLM-Driven, Expressive Automated Movie Summary Framework (TRAILDREAMS)

Roberto Balestri 1*, Pasquale Cascarano 1, Mirko Degli Esposti 1, Guglielmo Pescatore 1
1 Università di Bologna, Bologna, Italy
* Corresponding Author
Online Journal of Communication and Media Technologies, 15(3), July 2025, e202524, https://doi.org/10.30935/ojcmt/16669
Published: 28 July 2025

ABSTRACT

This paper introduces TRAILDREAMS, a framework that uses a large language model (LLM) to automate the production of movie trailers. The LLM selects key visual sequences and impactful lines of dialogue, and it guides the generation of audio elements such as music and voiceovers. The goal is to produce engaging, visually appealing trailers efficiently. In comparative evaluations, TRAILDREAMS surpasses current state-of-the-art trailer generation methods in viewer ratings, but it still falls short of real, human-crafted trailers. While TRAILDREAMS demonstrates significant promise and marks an advancement in automated creative processes, further improvements are needed to close the quality gap with traditional trailers.
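
The abstract describes the LLM's role only at a high level. As a rough, hypothetical illustration of what one such step might look like, the sketch below asks a chat model to rank subtitle cues by how trailer-worthy they are. It uses the standard OpenAI Python SDK; the model name, the prompt wording, and the pick_trailer_lines helper are illustrative assumptions, not the authors' implementation.

    # Hypothetical sketch of LLM-driven dialogue selection -- not the
    # authors' code. Assumes the OpenAI Python SDK (v1) and an API key
    # in the OPENAI_API_KEY environment variable.
    import json
    from openai import OpenAI

    client = OpenAI()

    def pick_trailer_lines(cues: list[str], k: int = 5) -> list[int]:
        """Ask a chat model for the indices of the k most trailer-worthy cues."""
        numbered = "\n".join(f"{i}: {c}" for i, c in enumerate(cues))
        prompt = (
            "Below are numbered dialogue lines from a film. Return only a "
            f"JSON list with the indices of the {k} lines that would be "
            "most impactful in a trailer, most impactful first.\n\n" + numbered
        )
        resp = client.chat.completions.create(
            model="gpt-4",  # placeholder model name
            messages=[{"role": "user", "content": prompt}],
        )
        # A production pipeline would validate the reply; this sketch assumes
        # the model returned plain JSON, e.g. "[7, 2, 41, 13, 5]".
        return json.loads(resp.choices[0].message.content)

A full pipeline would pair a selection step like this with shot selection on the video side and pass the chosen cues on to music and voiceover generation, as the abstract outlines.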

CITATION (APA)

Balestri, R., Cascarano, P., Degli Esposti, M., & Pescatore, G. (2025). Trailer reimagined: An innovative, LLM-driven, expressive automated movie summary framework (TRAILDREAMS). Online Journal of Communication and Media Technologies, 15(3), e202524. https://doi.org/10.30935/ojcmt/16669
