I tend to write here about choice-based games as though they were all the same kind of thing, but that’s a perspective that very much comes from a history with parser IF and a tendency to distinguish clicking from typing. And I will freely admit that a few years ago I was pretty obtuse myself about the differences between types of interface option.
In fact there are many kinds of input methods for communicating the player’s decisions in a story, and many possible or actual variants, some of which allow the player to type a keyword from a set list, or search from among hidden choices by typing, or perform parser-like commands by pushing menu buttons. Seen in this light, the parser is part of a continuum with other input methods, not uniquely distinct from them.
Much has already been written, of course, about controllers and input for games. See for instance Tracy Fullerton’s Game Design Workshop for a detailed discussion of control design in general. And people continue to experiment, as in this alt controller jam. However, most of these focus on non-text-based or non-narrative games.
Here I’m going to discuss several input methods for text-based and/or highly narrative-focused games according to the following metrics:
How much effort is it? How expressive is it? How ambiguous is it? How discoverable is it? How much pressure is there? How much is the player required to embody the actions of the protagonist?
This is not the same as studying the verb set of the resulting games.
Effort refers to the amount of work the player has to do — sometimes actual physical work — in order to operate the machine. Dance-pads are high-effort; hypertext is low-effort. Typing is a bit more effort than clicking. Speaking might be more or less effort than typing, depending on how you feel about using your voice. High-effort input methods can intensify the sense of identification and complicity with the character, because we’re actually doing some form of work in tandem with them or on their behalf.
Expressiveness refers to how much information can be packed into a single action. Parser entry tends to be very expressive because the player is choosing a verb out of a large palette, as well as a noun with perhaps many viable options. Clicking links is relatively less expressive; entering natural language to be interpreted by a chatbot is relatively more expressive, though the latter runs into problems with ambiguity.
The more expressive a system’s input, the more the player can have a sense of intention — of planning and then executing the plan — within a single move. It is also possible to make plans that require multiple very simple moves to execute, of course, so this is just about the density of a single action.
Ambiguity refers to how the player’s input is mapped to world model changes, and whether that mapping is clear or unclear. All traditional CYOA is completely unambiguous: we make a choice, we go to a page. Parser IF tends to be unambiguous as well. We enter a command and we are immediately told whether that command succeeded or did not succeed, and we generally have a clear idea of precisely what we’re trying to achieve.
Some hypertext formats are more ambiguous, however: consider Twines in which it is not obvious which links will move the story forward and which will result in text cycling or a brief detour to a page of descriptive text. Other choice-based systems sometimes have vaguely-named options or conceal how their stats are changing in response to player selections.
Most ambiguous of all are things like chatbots or the natural language input in Facade, where it is hard to know exactly what the system is concluding from our input. Is it responding to keywords? To tone? To other aspects of the input? Is there any way to tell?
Discoverability refers to how easy it is to figure out a valid command. Highlighted hyperlinks are very discoverable. Hidden links and graphical interfaces with pixel-hunting are intentionally hard to discover. Parser systems are often unintentionally hard to discover.
Pressure refers to the stakes placed on successful and timely performance. If a way of indicating your choice is difficult (so that you might fail) or it’s on a tight timer, that’s high-pressure. Telltale adds pressure to many of its dialogue sequences by putting them on a timer.
Interactive film also tends to have built-in pressure, because your choice (often) needs to be registered with the system before the film hits the branching decision point. Alternatively, the film can just stop and wait for you, but that’s a bit immersion-breaking; this is an issue I imagine is likely also to come up in VR interactive stories quite a bit.
Embodiment refers to the player’s need to actually act out or at the very least perform actions physically analogous to what the protagonist is doing. This can create a strong sense of identification with the avatar, but at the high end requires a lot more space and technical setup to gauge, or else custom controllers.
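The bracketed tags attached to each method below (for example [exADpm]) appear to pack the six metrics into one letter each, in the order effort, expressiveness, ambiguity, discoverability, pressure, embodiment, with a capital letter meaning high and a lowercase letter meaning low. A parenthesised letter with a question mark, as in (d?), seems to mark a metric that depends on the specifics. For anyone who wants to tabulate the methods, here is a minimal Python sketch of a decoder under that reading; the function name `decode_tag` and the labels "high", "low", and "varies" are my own assumptions, not part of the notation.

```python
import re

# Order of metrics as the tags appear to list them (an inferred convention).
METRICS = [
    ("e", "effort"),
    ("x", "expressiveness"),
    ("a", "ambiguity"),
    ("d", "discoverability"),
    ("p", "pressure"),
    ("m", "embodiment"),
]

# A token is either a parenthesised letter with "?", e.g. "(d?)", or a bare letter.
TOKEN = re.compile(r"\(([A-Za-z])\?\)|([A-Za-z])")


def decode_tag(tag):
    """Turn a tag such as "[exADpm]" into a {metric: level} dictionary."""
    body = tag.strip().strip("[]")
    matches = list(TOKEN.finditer(body))
    # Reject tags with stray characters or the wrong number of letters.
    if "".join(m.group(0) for m in matches) != body or len(matches) != len(METRICS):
        raise ValueError(f"unrecognised tag: {tag!r}")
    levels = {}
    for (letter, name), match in zip(METRICS, matches):
        uncertain, plain = match.group(1), match.group(2)
        symbol = uncertain or plain
        if symbol.lower() != letter:
            raise ValueError(f"expected {letter!r} but found {symbol!r} in {tag!r}")
        if uncertain:
            levels[name] = "varies"  # hypothetical label for "(d?)"-style entries
        else:
            levels[name] = "high" if symbol.isupper() else "low"
    return levels


if __name__ == "__main__":
    for name, level in decode_tag("[exADpm]").items():
        print(f"{name}: {level}")
```

A two-level code is coarse, which is why several entries below qualify their tag in prose (for instance "moderate effort" or "medium embodiment").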
Hypertext. [exADpm] Links are embedded in the text, and may also appear at the bottom. Low effort, low expressiveness, low pressure, highly discoverable, and with moderate-to-high ambiguity depending on how the author handles link labeling and formatting.
Some people find that ambiguity frustrating; for others, it’s one of the strengths of the medium. In Porpentine’s games, I often feel that I’m making a selection based on aesthetic sensibilities or a general sense about which words appeal to me most, not with any expectation of controlling the outcome.
Timed hypertext. [exADPm] Occasionally hypertext works on a timer: Twine is capable of changing text on the page, including removing elements, and this appears to good effect in, for instance, Detritus, where on one screen some of the things you might want to interact with vanish before you have a chance. The combination of ambiguity and pressure can be particularly stressful since there isn’t much time to think interpretively about one’s choices before they vanish. This may be why I’ve mainly seen this method deployed to communicate stressful situations, and generally not for the full duration of a piece.
Dragged commands. [exaDpm] Texture provides a system whereby the player drags one word over another in order to indicate an action. It’s discoverable because all the actions are clearly enumerated, and starting to drag one action automatically highlights all the places where that action will apply, so there’s no pixel-hunting. It’s potentially more expressive than the typical hypertext because the player is generally choosing both action and object — though there are fewer available actions and objects than in a parser game. And it requires mild effort. To my mind, this explores an interesting sweet spot in the control space, one that may enhance the sense of complicity and intentionality for the player while remaining extremely accessible.
The initially available version of Texture is a bit underpowered, in the sense of not allowing enough variable tracking and other state features to build substantial interactive stories, but I know that new features are being added and I’m looking forward to seeing what happens there.
Untimed choice selection from a short list. [exaDpm] This is the standard of book CYOAs, of ChoiceScript and Undum stories, and of Twine pieces which cluster options at the end of the page rather than as interspersed hypertext. Typically these options directly express what the protagonist is supposed to do or think next. Low effort, low expressiveness, low pressure, low ambiguity, but highly discoverable.
Timed choice selection from a short list. [exaDPm] As above, but you have a finite amount of time to make an entry before one is chosen for you. Still low on effort, expressiveness, and ambiguity, but with a bit more pressure than previously. This also raises the possibility that inaction is itself an interesting choice.
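To make the "one is chosen for you" rule concrete, here is a minimal Python sketch of how a timed choice might be resolved. It is not taken from any particular engine: `resolve_timed_choice`, `ChoiceResult`, and the parameters `time_limit` and `default_index` are all hypothetical names, and the elapsed time is passed in as a number so the logic can be shown without a real clock.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class ChoiceResult:
    index: int       # which option ends up selected
    timed_out: bool  # True when the game made the selection for the player


def resolve_timed_choice(
    options: Sequence[str],
    picked: Optional[int],
    elapsed: float,
    time_limit: float,
    default_index: int = 0,
) -> ChoiceResult:
    """Return the player's pick if it arrived in time, else the default."""
    if not 0 <= default_index < len(options):
        raise ValueError("default_index must point at one of the options")
    if picked is not None and elapsed <= time_limit:
        if not 0 <= picked < len(options):
            raise ValueError("picked must point at one of the options")
        return ChoiceResult(index=picked, timed_out=False)
    # No input, or input after the deadline: fall back to the default option.
    return ChoiceResult(index=default_index, timed_out=True)


if __name__ == "__main__":
    options = ["Speak up", "Stay silent"]
    print(resolve_timed_choice(options, picked=None, elapsed=5.0,
                               time_limit=5.0, default_index=1))
```

Keeping `timed_out` in the result lets the story distinguish a player who chose silence from one who simply ran out of time, which is one way of treating inaction as a choice in its own right.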
Multiple distinct choices available on a single node. [eXaDpm] Here, the player can make several decisions before moving forward. First Draft of the Revolution does this; likewise Dr. Sourpuss Is Not A Choice-Based Game, at least some of the time. Encouraging the player to combine multiple elements means that there’s more opportunity within a particular node to think through what a combination might mean, so the expressiveness is higher; but because all the options are enumerated, the process is still much more discoverable than in parser IF.
Quick Time Events. [ExaDPm] QTEs are the mainstay of Telltale games, and pop up elsewhere in everything from the Arkham series to Heavy Rain: the game shows a diagram of which keys on the keyboard (or, often, buttons on the controller) correspond to which choices at the moment. Sometimes they ask you to mash a particular key repeatedly in order to express the character exerting effort, or tap out a particular sequence to express that the character does something requiring physical dexterity. These are still highly discoverable (there’s no uncertainty about what you’re supposed to do) but now both pressured and effortful. Expressiveness and ambiguity remain low. Embodiment is medium: greater than in other methods we’ve seen so far, but still very abstract.
Arguably, though it’s technically not a QTE, one might also include jamming the keys on the keyboard in order to progress through a certain amount of text as in the opening of Winter Storm Draco, because it doesn’t matter which keys you type. That mechanic is playing with embodiment.
Typing keywords. [exaDpm] In Caroline, among others, there are a handful of keywords that will move the story forward from any given node, and those are explicitly presented to the player: somewhat more effort than a choice selection, but otherwise not much different. One can add pressure by timing this, or add ambiguity by offering less context about what the keywords are going to do.
Ruiness incorporates typing more rarely, and the idea is for the player to type city names they’ve discovered: this slightly reduces discoverability (you have to do more work to figure out what might be a viable input), but since most of the game is still hypertext-based, it’s possible to keep wandering around until you’ve found what you’re looking for.
Typing words into a parser. [eXadpm] The full-bore parser approach is moderate effort, high expressiveness, low discoverability, low pressure (except in a handful of realtime typing games, but that’s very rare), and low embodiment. Almost always, it’s also low ambiguity, but there are those rare occasions where the game is doing more than you realize with your input. (Warbler’s Nest is one of my favorite examples: a certain object has multiple names, and unlike just about every other parser game, this one does track and does care which of those synonyms you use.)
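As a rough illustration of what tracking the synonym involves, here is a toy two-word parser in Python. It is emphatically not how Warbler’s Nest or any real IF system is implemented: the vocabulary, the `parse` function, and the `word_used` field are all invented for this sketch. The point is only that a parser normally throws away the player's exact wording once it has resolved the object, and a game that cares has to keep it.

```python
# Hypothetical vocabulary: every word the player may type maps to one object id.
SYNONYMS = {
    "lamp": "lantern",
    "lantern": "lantern",
    "light": "lantern",
    "door": "door",
}
VERBS = {"take", "examine", "open"}


def parse(command):
    """Parse a "verb noun" command; return None if it is not understood."""
    words = command.lower().split()
    if len(words) != 2:
        return None
    verb, noun = words
    if verb not in VERBS or noun not in SYNONYMS:
        return None
    return {
        "verb": verb,
        "object": SYNONYMS[noun],  # what most parsers keep
        "word_used": noun,         # what a synonym-sensitive game also keeps
    }


if __name__ == "__main__":
    print(parse("take lamp"))
    print(parse("take lantern"))
```

Both commands resolve to the same object, so the player sees the same unambiguous success message either way; any difference in how the story responds to `word_used` is invisible at the moment of input, which is where the added ambiguity comes from.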
Typing natural language dialogue. [eXAdPm] Expressiveness is obviously high: you can say anything you want. But you don’t necessarily know what the chatbot is taking from what you say. Is it just checking for keywords? Is it noticing tone and emotion? So you have high ambiguity and low discoverability unless you’re also supplying some additional helps, and this can result in some weird outcomes.
In Façade, this is combined with real-time activity from the NPCs, which is a high-pressure implementation; you could lower the pressure by doing this as a turn-based thing, but at the expense of some immersion. There are further interesting possibilities in this space as natural language processing improves.
Moving through a textual layout. [ExaDpm] Loose Strands and Device 6 both lay out their text in such a way that you can make selections within the story by how you choose to scroll through the text — swiping one direction or another, moving across a textual landscape. This doesn’t initially feel like a huge amount of effort, but it does involve some, and it’s more engaged and haptic than clicking hypertextual links. Expressiveness is limited, though: you have the choice of one of four directions, and typically it’s more like two or three. As for embodiment, this takes us even further away from it.
Moving through a 2D or 3D world. [ExADpm] I intensely dislike the phrase “walking simulator,” since it usually indicates contemptuous hostility on the part of the person who uses it. But it is fair to say that Everybody’s Gone to the Rapture or Dear Esther or Gone Home rely heavily on space-traversal as a way to express what the player is interested in learning about next, and occasionally to express more significant choices. I’ve occasionally referred to this as a narrative-by-exploration game, but there are other things you could describe that way.
Space-traversal is slightly expressive — there are many points in a modeled space that you could choose to occupy — but the occupation of space itself typically doesn’t communicate a great deal to a game except in specialized circumstances (standing on a pressure plate, being in the line of fire).
It’s also moderate to high effort, since you have to be driving your protagonist around the map. The amount of pressure depends on whether other timing and gating mechanisms exist in the world. Discoverability is a little unpredictable; it’s possible for the player to miss an important exit or event by just not seeing the relevant item on screen or facing the wrong direction at the wrong time.
Finally, it’s higher in embodiment than most of what we’ve seen so far.
Voice selection of a keyword. [ExaDpM] Codename Cygnus works by having the player speak a command. Medium effort, low ambiguity, low pressure, low expressiveness, high discoverability (you’re generally told exactly what the options are).
It could become more expressive if the voice input filters were able to make use of additional information about the player — pitch, emotional qualities — but then this would also make it more challenging for the player to enact the decisions they want to make.
There’s no reason you couldn’t do all the same variants here that you do with typing: voice-based parser input and voice-based natural language, with or without a real-time component to provide pressure. I know of a handful of experiments in these directions: for instance, Home Sweetie-Bot Home is designed to take voice input to a parser system.
Specialist props. [EXa(d?)pM] Some games have physical props that function as controllers and blur the line between the player’s world and the protagonist’s: this is why so many of us have old plastic guitars somewhere in our houses. This is not limited to musical instruments, though.
At GDC last year, there was a special handmade controller that let the player do hand gestures on a spell book in order to cast certain spells in a game called Book of Fate: it was (as you might imagine) a little bit finicky, but when it worked, the effect was rather cool. Book of Fate is not a text-based game, but spell-casting is such a common feature of old-school text adventures that it is easy to imagine how one might adapt that kind of controller to a text game if one wished to do so. Discoverability depends a lot on the specifics of the prop.
Escape rooms and functioning LARP props are arguably the ultimate extension of this, since they’re putting the player physically into a space and removing any difference between player and protagonist. But at that point, the input is also often being interpreted by a human being at the other end.
Whole-body engagement. [EXA(d?)PM] The standard existing examples are dancing games, VR, etc. Discoverability depends a lot on messaging; input is fuzzy and therefore likely always to be a bit ambiguous; the amount of effort means that some of these games are accessible only to the able-bodied and energetic.
I don’t know of any examples where this kind of input is combined with text output, except maybe, depending how you think about it, With Those We Love Alive – where the effect is being used purely for expressiveness and is not being “read” by the game itself in any way.
I’ve probably omitted some metrics, and possibly there is writing about this that I haven’t seen, as well, so if I’m ignoring something I should be mentioning, apologies, and please feel free to drop additional thoughts in the comments.