We have seen Large Language Models (LLMs) being used to transform ‘napkin sketches’ into wireframes and convert wireframes into working applications. However, we were unable to find any explorations of using LLMs to turn a list of UX requirements directly into a wireframe.

This post goes over some of our experiments and results, in the hope that it inspires others to share their work as well.

We approached the problem from two different directions: HTML/CSS, and free-floating axis-aligned bounding boxes (AABB).

Beyond basic prompt engineering, we explored several representations for each approach (links to images of the results below).

HTML/CSS

AABB

Requirements input to LLM:

  • Player hands: Three distinct areas showing the cards held by each player (face-up for the active player, face-down for others)
  • Discard pile: A central area showing the top card of the discard pile
  • Draw pile: A face-down stack of remaining cards
  • Turn indicator: A visual cue showing which player’s turn it is
  • Card count: A display of how many cards each player has left
  • Action area: A space where special actions (like drawing cards or skipping turns) are animated
  • Message area: A space for game messages (e.g., “Player 2 must skip their turn”)
  • Score display: If playing multiple rounds, an area showing each player’s current score
  • Settings/menu button: For accessing game options or returning to the main menu
  • Player avatars or names: To easily identify each player

Results:

CSS Flow Layout

CSS Flexbox Layout

CSS Grid Layout

Bootstrap

Bulma

Tailwind

JSON

YAML

XML 
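
To make the AABB direction concrete, here is a minimal sketch of the kind of representation the models were asked to emit, using XML (one of the formats listed above). The element and attribute names are illustrative assumptions, not our exact schema.

```xml
<!-- Illustrative AABB wireframe sketch: each element is an axis-aligned
     bounding box with a position, a size, and a label. Element and
     attribute names are hypothetical. -->
<wireframe width="1280" height="720">
  <box id="draw-pile"    x="520" y="300" width="100" height="140" label="Draw pile (face-down)"/>
  <box id="discard-pile" x="660" y="300" width="100" height="140" label="Discard pile (top card)"/>
  <box id="player-hand"  x="340" y="540" width="600" height="150" label="Active player hand (face-up)"/>
  <box id="message-area" x="340" y="40"  width="600" height="60"  label="Message area"/>
</wireframe>
```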

We also ran our experiments with several LLMs:

  • Claude 3.5 Sonnet
  • GPT-4o
  • OpenAI o1-preview
  • Llama 3.1 405B
  • Mistral Large

We observed that Claude 3.5 Sonnet was the winner in several key areas: it translated requirements accurately, covered the most requirements, and generated high-quality code with appropriate utility classes and UI components. Its wireframes showed better visual hierarchy, more semantic consistency, and better-proportioned elements than the other models' outputs.

Requirements input to LLM:

  1. Order Display: A section of the screen that shows the current order to be fulfilled, consisting of the required rice, fish type, and side dish.
  2. Timer: A progress bar timer (starting from 15 seconds) to complete the current order.
  3. Ingredient Cards: A collection of 16 cards representing the available ingredients (rice, fish types, and side dishes) that the player can select from to fulfill the order. 
  4. Selection Area: A designated area where the player can drag and drop or click to select the cards representing their chosen ingredients for the current order.
  5. Score/Money Display: A display showing the player’s current score or money earned, based on correct orders and penalties.
  6. Level Indicator: An indicator showing the player’s current level, which affects the reward and penalty values.
  7. Streak Counter: A display tracking the player’s current streak of consecutive correct orders.
  8. Power-up Icons: Icons or buttons representing the available power-ups (Extra Time, Double Reward, Reroll Order) and their remaining uses.
  9. Power-up Activation Buttons: Buttons or controls to activate the respective power-ups when needed.
  10. Game Status Display: A display area showing the game’s current status, such as “Order Correct,” “Order Incorrect,” “Game Won,” or “Game Over.”
  11. Pause/Resume Button: A button to pause or resume the game, if applicable.
  12. Restart/Quit Buttons: Buttons to restart the game or quit and return to the main menu.
  13. Customer Avatar: An avatar for the customer placing the order.

Results:

Claude 3.5 Sonnet

GPT 4o

OpenAI o1-preview

Meta Llama 3.1 405B

Mistral Large

Prompt Engineering

  • Chain of Thought (CoT) prompting
    • Using Auto-CoT: We included phrases like ‘think step by step’ in our prompt. This approach, proposed by Zhang et al. (2022), eliminates the need for manual effort in generating reasoning chains.
    • Few-Shot CoT: While Auto-CoT is powerful, we found that combining it with few-shot examples led to even better results. The expanded requirements provide a detailed spatial layout and hierarchical structure, serving as a mental map for the LLM. This approach guides the LLM to think in terms of specific visual organization and spatial relationships, leading to more coherent and practical UI designs.

For instance, we included an expanded-requirement example along the following lines (the sketch below is illustrative, not a verbatim excerpt from our prompt):
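
```text
(Illustrative sketch; not the exact example from our prompt.)

Requirement: "Discard pile: a central area showing the top card of the
discard pile."

Expanded requirement: Center the discard pile in the play field; render it as
a single face-up card directly to the right of the draw pile, with equal
spacing between the two piles; keep its dimensions identical to the cards in
the players' hands, so the hierarchy of hand areas, draw pile, and discard
pile stays visually consistent.
```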

  • Structured prompting: Claude is familiar with processing structured inputs. Using XML-like tags such as <example>, <styling_rules>, and <output> helps compartmentalize information, providing clear context and structure.
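
Putting these techniques together, a condensed sketch of such a structured prompt might look like the following. The tag names are the ones mentioned above; the contents are hypothetical rather than our exact wording.

```text
<!-- Illustrative sketch; tag names from the post, contents hypothetical. -->
Think step by step before producing any markup.

<example>
  ...one worked requirement-to-layout pair, like the one shown earlier...
</example>

<styling_rules>
  Use grayscale placeholder boxes only. Label every region. Keep card sizes
  and spacing consistent across the whole layout.
</styling_rules>

<output>
  Return a single HTML file that covers every requirement, using Tailwind
  utility classes.
</output>
```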

Conclusion

Based on our experiments, as illustrated in the figure (link), we found that HTML/CSS typically produced clean and balanced wireframes, with consistent card sizes and alignment. Among the frameworks tested, Tailwind CSS yielded the best results: the wireframes exhibited balance, well-defined action areas, appropriately sized elements, and consistent spacing, and they provided a clearer representation of the game state while appearing visually appealing and user-friendly. In contrast, the AABB approach struggled with clutter and inconsistent card sizes.
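
To make “clean and balanced” concrete, here is a minimal hand-written fragment in the style of the Tailwind outputs for the order-fulfillment game; the markup is our illustrative sketch, not verbatim model output.

```html
<!-- Illustrative wireframe fragment in the style of the Tailwind results:
     dashed grayscale boxes with labels, uniform sizing via utility classes. -->
<div class="grid grid-cols-4 gap-4 p-6 bg-gray-50">
  <div class="col-span-3 h-24 rounded border-2 border-dashed border-gray-400
              flex items-center justify-center text-gray-500">Order Display</div>
  <div class="h-24 rounded border-2 border-dashed border-gray-400
              flex items-center justify-center text-gray-500">Timer</div>
  <div class="col-span-4 grid grid-cols-8 gap-2">
    <!-- the 16 uniform ingredient cards would repeat this box -->
    <div class="h-20 rounded border-2 border-dashed border-gray-400"></div>
  </div>
</div>
```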

Final prompt used


Haven Studios is a game development studio with a goal to create systemic and evolving worlds focused on freedom, thrill and playfulness that will keep players entertained and engaged for years. Haven joined Sony Interactive Entertainment, PlayStation Studios in 2022 as the first Sony game development team based in Canada. Rucha Shende is an ML Researcher at Haven Studios.