{"id":3010744,"date":"2025-12-23T10:00:00","date_gmt":"2025-12-23T18:00:00","guid":{"rendered":"urn:uuid:e6a11d51-b374-4784-a699-ea3c84b3b710"},"modified":"2026-01-30T09:21:12","modified_gmt":"2026-01-30T17:21:12","slug":"vision-language-models-for-quality-assurance","status":"publish","type":"research-post","link":"https:\/\/sonyinteractive.com\/en\/innovation\/research-academia\/research\/vision-language-models-for-quality-assurance\/","title":{"rendered":"VideoGameQA-Bench: Evaluating Vision-Language Models for Video Game Quality Assurance"},"content":{"rendered":"\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\" style=\"flex-basis:70%\">\t\t<div\n\t\t\tclass=\"wp-block-sie-social-share social-share--raw\"\n\t\t\tdata-aa-modulename=\"social-share\"\n\t\t\tdata-url=\"https:\/\/sonyinteractive.com\/en\/innovation\/research-academia\/research\/vision-language-models-for-quality-assurance\/\"\n\t\t\tdata-title=\"VideoGameQA-Bench: Evaluating Vision-Language Models for Video Game Quality Assurance\"\n\t\t\tdata-heading=\"Share this\"\n\t\t\tdata-bluesky-title=\"\"\n\t\t\tdata-twitter-title=\"\"\n\t\t\tdata-twitter-hashtags=\"\"\n\t\t\tdata-reddit-title=\"\"\n\t\t\tdata-email-subject=\"\"\n\t\t\tdata-email-body=\"\"\n\t\t\t>\n\t\t<\/div>\n\t\n\n<div class=\"post-author\" data-no-of-bylines=\"5\"><ul><li>Nabajeet Barman<span class=\"sie-style-body-small\">Sr. Research Scientist, Sony Interactive Entertainment<\/span><\/li><li>Abhijay Ghildyal<span class=\"sie-style-body-small\">Machine Learning Engineer, Sony Interactive Entertainment<\/span><\/li><li>Saman Zadtootaghaj<span class=\"sie-style-body-small\">Sr. 
Researcher, Sony Interactive Entertainment<\/span><\/li><li>Mohammad Reza Taesiri<span class=\"sie-style-body-small\">Postdoctoral Researcher, University of Alberta<\/span><\/li><li>Cor-Paul Bezemer<span class=\"sie-style-body-small\">Assoc. Professor, University of Alberta<\/span><\/li><\/ul><\/div>\n\n\n<p class=\"sie-paragraph sie-paragraph-5aee6b63-b432-4b63-b7bb-35463d56b1fd\">With video games now generating the highest revenues in the entertainment industry,&nbsp;optimizing&nbsp;game development workflows has become essential for the sector&#8217;s sustained growth. Recent advancements in Vision-Language Models (VLMs) offer considerable potential to&nbsp;automate and&nbsp;enhance various aspects of game development, particularly Quality Assurance (QA), which&nbsp;remains&nbsp;one of the industry&#8217;s most labor-intensive processes.&nbsp;To accurately evaluate the performance of VLMs in video game QA tasks and&nbsp;determine&nbsp;their effectiveness in handling real-world scenarios, there is a clear need for standardized benchmarks. Existing benchmarks are insufficient for the specific requirements of this domain: they tend to focus heavily on complex mathematical or textual reasoning tasks and overlook the visual comprehension tasks fundamental to video game QA.&nbsp;<\/p>\n\n\n\n<p class=\"sie-paragraph sie-paragraph-a1e7783d-024e-4c58-bf96-16b32146348e\">To bridge this gap, we introduce&nbsp;VideoGameQA-Bench, a comprehensive benchmark that covers a wide array of game QA activities, including visual unit testing, visual regression testing, needle-in-a-haystack tasks, glitch detection, and bug report generation for both images and videos of various games, and use it to evaluate 16 state-of-the-art VLMs.&nbsp;<\/p>\n\n\n\n<p class=\"sie-paragraph sie-paragraph-d3ee9e94-fc61-4011-af84-88b481d49a03\">The video game QA process can&nbsp;generally be&nbsp;abstracted into three main types of tasks:&nbsp;<\/p>\n\n\n\n<ul 
class=\"wp-block-list\">\n<li class=\"sie-paragraph sie-paragraph-10b116ec-5b28-4873-bfe1-76b5d1b4f7a8\"><strong class=\"sie-paragraph sie-paragraph-10b116ec-5b28-4873-bfe1-76b5d1b4f7a8\">Verifying&nbsp;scene integrity<\/strong> by comparing the visual representation of scenes against intended configurations and known reference states, such as an oracle or previously&nbsp;rendered&nbsp;versions of the same scenes.&nbsp;<\/li>\n\n\n\n<li class=\"sie-paragraph sie-paragraph-e1cac24b-6570-4e59-8e96-5b4cd447c4b4\"><strong class=\"sie-paragraph sie-paragraph-e1cac24b-6570-4e59-8e96-5b4cd447c4b4\">Detecting glitches<\/strong> through open-ended exploration. These glitches are unintended gameplay or visual artifacts that have no specific reference point, so testers must rely on common sense and general knowledge to detect them.&nbsp;<\/li>\n\n\n\n<li class=\"sie-paragraph sie-paragraph-8b15a292-295b-490d-bcd5-d486bb3e5429\"><strong class=\"sie-paragraph sie-paragraph-8b15a292-295b-490d-bcd5-d486bb3e5429\">Systematically reporting and documenting<\/strong> all identified glitches, ensuring developers receive clear and actionable information to address problems effectively during game development.&nbsp;<\/li>\n<\/ul>\n\n\n\n<p class=\"sie-paragraph sie-paragraph-4be6b4c8-ff9c-4810-bca3-37a3ff582947\">The&nbsp;results&nbsp;show that&nbsp;while&nbsp;current&nbsp;VLMs&nbsp;perform promisingly at&nbsp;identifying&nbsp;many visual issues and generating useful bug descriptions, they continue to struggle with fine-grained visual details, subtle regressions, and precisely&nbsp;pinpointing&nbsp;glitches&nbsp;in longer&nbsp;video&nbsp;clips.&nbsp;<\/p>\n\n\n\n<p class=\"sie-paragraph sie-paragraph-95c25226-0b5a-4603-b570-73c91222cae2\">For more details on this work, including detailed&nbsp;results&nbsp;and findings, please visit the project page&nbsp;<a class=\"sie-paragraph sie-paragraph-95c25226-0b5a-4603-b570-73c91222cae2\" 
href=\"https:\/\/asgaardlab.github.io\/videogameqa-bench\/\" target=\"_blank\" rel=\"noreferrer noopener\">here<\/a>.&nbsp;<\/p>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n\t\t<div class=\"related-posts__container\" data-selection-type=\"latest\" data-post-type=\"research-post\">\n\t\t\t<h2 class=\"related-posts__heading sie-style-h5\">Latest Research Posts<\/h2>\n\t\t\t<div class=\"related-posts related-posts--vertical\">\n\t\t\t\t<article class=\"related-post related-post--research-post\"><div class=\"related-post__content\"><h3 class=\"related-post__title sie-style-body-small-v2\">\n\t\t<a href=\"https:\/\/sonyinteractive.com\/en\/innovation\/research-academia\/research\/perceptually-guided-3dgs-streaming-and-rendering-for-mixed-reality\/\">\n\t\t\tPerceptually Guided 3DGS Streaming and Rendering for Mixed Reality\n\t\t<\/a>\n\t<\/h3><\/div><\/article><article class=\"related-post related-post--research-post\"><div class=\"related-post__content\"><h3 class=\"related-post__title sie-style-body-small-v2\">\n\t\t<a href=\"https:\/\/sonyinteractive.com\/en\/innovation\/research-academia\/research\/content-adaptive-encoding-for-interactive-game-streaming\/\">\n\t\t\tContent Adaptive Encoding for Interactive Game Streaming\n\t\t<\/a>\n\t<\/h3><\/div><\/article><article class=\"related-post related-post--research-post\"><div class=\"related-post__content\"><h3 class=\"related-post__title sie-style-body-small-v2\">\n\t\t<a href=\"https:\/\/sonyinteractive.com\/en\/innovation\/research-academia\/research\/learning-representations-in-video-game-agents\/\">\n\t\t\tLearning Representations in Video Game Agents with Supervised Contrastive Imitation Learning\n\t\t<\/a>\n\t<\/h3><\/div><\/article>\n\t\t\t<\/div>\n\t\t<\/div>\n\t<\/div>\n<\/div>\n\n\n\n<div class=\"wp-block-sie-scroll-to-top\" data-aa-modulename=\"sie-scroll-to-top\"><button class=\"sie-btn sie-btn--action\" type=\"button\"><span>Back to 
top<\/span><\/button><\/div>\n","protected":false},"author":15,"parent":0,"template":"","byline":[381,383,384,382,385],"research-post-category":[],"class_list":["post-3010744","research-post","type-research-post","status-publish","hentry","post-vision-language-models-for-quality-assurance"],"ab_tests":{},"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v27.4 (Yoast SEO v27.4) - https:\/\/yoast.com\/product\/yoast-seo-premium-wordpress\/ -->\n<title>VideoGameQA-Bench: Evaluating Vision-Language Models for Video Game Quality Assurance - Sony Interactive Entertainment<\/title>\n<meta name=\"description\" content=\"Vision-Language Models (VLMs) offer considerable potential to automate game development Quality Assurance (QA).\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/sonyinteractive.com\/en\/innovation\/research-academia\/research\/vision-language-models-for-quality-assurance\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"VideoGameQA-Bench: Evaluating Vision-Language Models for Video Game Quality Assurance\" \/>\n<meta property=\"og:description\" content=\"Vision-Language Models (VLMs) offer considerable potential to automate game development Quality Assurance (QA).\" \/>\n<meta property=\"og:url\" content=\"https:\/\/sonyinteractive.com\/en\/innovation\/research-academia\/research\/vision-language-models-for-quality-assurance\/\" \/>\n<meta property=\"og:site_name\" content=\"Sony Interactive Entertainment\" \/>\n<meta property=\"article:modified_time\" content=\"2026-01-30T17:21:12+00:00\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data1\" content=\"2 minutes\" \/>\n\t<meta name=\"twitter:label2\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data2\" content=\"Nabajeet Barman\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/sonyinteractive.com\\\/en\\\/innovation\\\/research-academia\\\/research\\\/vision-language-models-for-quality-assurance\\\/\",\"url\":\"https:\\\/\\\/sonyinteractive.com\\\/en\\\/innovation\\\/research-academia\\\/research\\\/vision-language-models-for-quality-assurance\\\/\",\"name\":\"VideoGameQA-Bench: Evaluating Vision-Language Models for Video Game Quality Assurance - Sony Interactive Entertainment\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/sonyinteractive.com\\\/en\\\/#website\"},\"datePublished\":\"2025-12-23T18:00:00+00:00\",\"dateModified\":\"2026-01-30T17:21:12+00:00\",\"description\":\"Vision-Language Models (VLMs) offer considerable potential to automate game development Quality Assurance (QA).\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/sonyinteractive.com\\\/en\\\/innovation\\\/research-academia\\\/research\\\/vision-language-models-for-quality-assurance\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/sonyinteractive.com\\\/en\\\/innovation\\\/research-academia\\\/research\\\/vision-language-models-for-quality-assurance\\\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/sonyinteractive.com\\\/en\\\/innovation\\\/research-academia\\\/research\\\/vision-language-models-for-quality-assurance\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/sonyinteractive.com\\\/en\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"VideoGameQA-Bench: Evaluating Vision-Language Models for Video Game Quality 
Assurance\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/sonyinteractive.com\\\/en\\\/#website\",\"url\":\"https:\\\/\\\/sonyinteractive.com\\\/en\\\/\",\"name\":\"Sony Interactive Entertainment\",\"description\":\"Pushing the Boundaries of Play\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/sonyinteractive.com\\\/en\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"}]}<\/script>\n<!-- \/ Yoast SEO Premium plugin. -->","yoast_head_json":{"title":"VideoGameQA-Bench: Evaluating Vision-Language Models for Video Game Quality Assurance - Sony Interactive Entertainment","description":"Vision-Language Models (VLMs) offer considerable potential to automate game development Quality Assurance (QA).","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/sonyinteractive.com\/en\/innovation\/research-academia\/research\/vision-language-models-for-quality-assurance\/","og_locale":"en_US","og_type":"article","og_title":"VideoGameQA-Bench: Evaluating Vision-Language Models for Video Game Quality Assurance","og_description":"Vision-Language Models (VLMs) offer considerable potential to automate game development Quality Assurance (QA).","og_url":"https:\/\/sonyinteractive.com\/en\/innovation\/research-academia\/research\/vision-language-models-for-quality-assurance\/","og_site_name":"Sony Interactive Entertainment","article_modified_time":"2026-01-30T17:21:12+00:00","twitter_card":"summary_large_image","twitter_misc":{"Est. 
reading time":"2 minutes","Written by":"Nabajeet Barman"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/sonyinteractive.com\/en\/innovation\/research-academia\/research\/vision-language-models-for-quality-assurance\/","url":"https:\/\/sonyinteractive.com\/en\/innovation\/research-academia\/research\/vision-language-models-for-quality-assurance\/","name":"VideoGameQA-Bench: Evaluating Vision-Language Models for Video Game Quality Assurance - Sony Interactive Entertainment","isPartOf":{"@id":"https:\/\/sonyinteractive.com\/en\/#website"},"datePublished":"2025-12-23T18:00:00+00:00","dateModified":"2026-01-30T17:21:12+00:00","description":"Vision-Language Models (VLMs) offer considerable potential to automate game development Quality Assurance (QA).","breadcrumb":{"@id":"https:\/\/sonyinteractive.com\/en\/innovation\/research-academia\/research\/vision-language-models-for-quality-assurance\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/sonyinteractive.com\/en\/innovation\/research-academia\/research\/vision-language-models-for-quality-assurance\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/sonyinteractive.com\/en\/innovation\/research-academia\/research\/vision-language-models-for-quality-assurance\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/sonyinteractive.com\/en\/"},{"@type":"ListItem","position":2,"name":"VideoGameQA-Bench: Evaluating Vision-Language Models for Video Game Quality Assurance"}]},{"@type":"WebSite","@id":"https:\/\/sonyinteractive.com\/en\/#website","url":"https:\/\/sonyinteractive.com\/en\/","name":"Sony Interactive Entertainment","description":"Pushing the Boundaries of 
Play","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/sonyinteractive.com\/en\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"}]}},"_links":{"self":[{"href":"https:\/\/sonyinteractive.com\/en\/wp-json\/wp\/v2\/research-post\/3010744","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/sonyinteractive.com\/en\/wp-json\/wp\/v2\/research-post"}],"about":[{"href":"https:\/\/sonyinteractive.com\/en\/wp-json\/wp\/v2\/types\/research-post"}],"author":[{"embeddable":true,"href":"https:\/\/sonyinteractive.com\/en\/wp-json\/wp\/v2\/users\/15"}],"version-history":[{"count":5,"href":"https:\/\/sonyinteractive.com\/en\/wp-json\/wp\/v2\/research-post\/3010744\/revisions"}],"predecessor-version":[{"id":3010976,"href":"https:\/\/sonyinteractive.com\/en\/wp-json\/wp\/v2\/research-post\/3010744\/revisions\/3010976"}],"wp:attachment":[{"href":"https:\/\/sonyinteractive.com\/en\/wp-json\/wp\/v2\/media?parent=3010744"}],"wp:term":[{"taxonomy":"byline","embeddable":true,"href":"https:\/\/sonyinteractive.com\/en\/wp-json\/wp\/v2\/byline?post=3010744"},{"taxonomy":"research-post-category","embeddable":true,"href":"https:\/\/sonyinteractive.com\/en\/wp-json\/wp\/v2\/research-post-category?post=3010744"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}