Transcript of Aug 9 ifMUD Discussion on Testing


Automated and Human Testing
Hard to find edge cases
Bug tracking
Tester access
Types of tester input
Getting testers with particular backgrounds
The IDE skein
Graphing and visualization




Emily says, “right! so it is the hour of theoryclub”
Gunther says, “not to mention the winter of our discontent with bugs”
Emily says, “our topic tonight is testing, and there are sort of two sides to what I was hoping we could discuss: on the one hand, tricks and tips and horror stories from a craft perspective, but also on the other hand how testing plays into the relationship with the audience”
Zach says, “ooooh, scary stories”
Emily says, “but let’s start with the first part of that: let’s talk about specific experiences with testing”
DavidW says, “I remember that Domicile got some flak because its About text admitted the author knew about bugs in the game yet released it anyway.”
zarf says, “raise your hand if you’ve ever released an IF game that you believe had zero bugs”
Emily sits on hands
DavidW says (to zarf), “ha ha”
zarf says, “right”
Roger says, “The first tester is the author, I guess.”
Gunther asks, “are we counting library bugs?”
Nitku says, “well there are bugs and there are bugs”
Emily says (to Gunther), “no.”
Zach asks, “Hmm, is it different if the bugs are known rather than unknown?”
Jota says, “There’s a difference between knowing about bugs and knowing that there are bugs that you don’t know about.”
Gunther says, “then Breaking the Code!”
Emily says (to Gunther), “(automatic drum roll)”
Gunther says, “(yes, it needed two releases because the first one screwed up the code)”
zarf says, “there are always bugs that I know about which are too minor for my level of pre-release exhaustion”
zarf says, “so no quibbling about that”
Roger says, “Surely there are bug-free CYOA”
zarf says, “but I do not mean to distract, only to set expectations”


Emily asks, “so does anyone want to talk about a testing procedure/etc. that they have found useful and would like to share?”
DavidW says (to Emily), “hm. I remember when testing Counterfeit Monkey discovering I couldn’t make a shrimp cocktail and you laughed yourself silly then added it. :)”
Emily says (to DavidW), “the shrimp cocktail was indeed a fine addition”
Roger says, “I tried implementing a classic test-driven-design framework, which worked, I think, but wasn’t worth its weight, really”
Zach says, “I don’t mean to distract either, but as more of a player than an author, I would appreciate tips to make me a better tester.”
Gunther says, “there does not appear to be a more useful technique than having lots of beta testers”
DavidW says (to Zach), “Make transcripts, for one thing.”
DavidW says, “I remember making lots and lots of notes when testing Beyond, particularly when the English translation wasn’t the best.”
Roger says, “livetesting via clubfloyd with the author in attendance has been an interesting development”
Emily says, “I’ve become increasingly enthusiastic about adding lots of automated tests”
Gunther says (to Emily), “those are good for puzzles and regression testing, but not for what happens once a human gets his hands on it”
Emily says (to Gunther), “sure, but you need both”
zarf says, “I did not think about automated testing until I started Hadean Lands, which has this vast quantity of internal mechanism. Now I have bazillions of tests”
zarf says, “because I was scared that I would break some tiny case of ‘ring bell 4 in this specific situation’ nine months after implementing it”
Roger says, “hmmm well it seems to me there’s sort of two related things: testing whether the game does what the author intends, and testing whether the game does what the player intends.”
zarf says, “I imagine that CM had similar requirements”
Emily says, “especially with Monkey, I wound up writing lots and lots of tests at the same time that I wrote the code, essentially to say “I think I’m writing code that will cover the following situations, but let me actually make a test to try all those commands””
zarf says (to roger), “certainly”
DavidW says, “re: Beyond, I even went so far as to suggest that the rood needed a better description so the audience would a) understand what it was, b) not think it was a misspelling.”
Emily says, “especially because that was a case where there were a very large number of objects some of which testers might never encounter”
DavidGriffith exclaims, “oh goody!”
Emily says, “so there’s the “does this piece behave according to spec” question and then there’s the “is the spec correct/sufficient to provide a good experience to the player” question”
Emily says, “and also the “not all players are alike, so how do other types of player than the one you imagined feel about your game” question”
zarf says, “automated testing is all about the former, as far as I can tell”
Emily says, “well, mostly, though I also find it really useful to add testing verbs that in some way tell me how many types of solution there are for a given puzzle”
Emily says, “which is kind of on the edge — it’s verifying that the game works right but also giving me authorial insight on whether I have what is likely to be adequate coverage”
zarf asks, “hm. You mean, auto-search through the game mechanics to count outcomes?”
Emily says (to zarf), “I mean “this puzzle requires a long straight item, but I’ve been writing this game for three years so I’ve forgotten what all falls in that category — let’s list them””
zarf says, “aha”
zarf says, “that’s a kind of game that HL isn’t”
zarf says, “but I can see it”
DavidW says, “I’m not sure how this fits into testing exactly, but one of my long long delayed WIPs uses XYZZY to generate a random command suitable for the location and in-scope items. It isn’t very systematic, but it does catch some things.”
Gunther says, “automated testing also makes sure, as zarf said, that stuff continues to work”
Gunther says, “I’d be interested to hear if anyone’s actually used it for TDD”
Gunther says, “(write tests first, then code until all tests pass)”
baf asks, “TDD?”
Gunther says (to baf), “test-driven development”
Roger says, “It’s philosophically quite similar to writing the transcript first, which of course is very old”
zarf says, “I’ve written testing commands like ‘list all rooms and objects that lack a description’”
Emily says, “oh, yeah, that too”
Emily says, “‘is my coverage adequate?’”
Ghogg says, “hm, I don’t think I’ve ever added rooms/objects without also doing the description at the same time”
Zach says, “Hmm, that makes me wonder if it were possible to write some sort of testing bot.”
Zach says, “That would test every possible combination of items and commands to see if something breaks.”
zarf says (to ghogg), “I always *think* I’ve been careful about that…”
Gunther says, “twitch plays IF”
DavidGriffith says, “If anyone has any questions about Inform6 6/12, I’m here.”
Emily says, “Versu does autotesting, so you can run e.g. 100 runthroughs of a story and see what all outcomes result, but that’s possible because the affordances at any given moment are explicit and usually less than 20”
zarf says (to zach), “that is possible in restricted domains”
DavidW says, “I tend to put a TODO into any inadequate description.”
Zach asks, “Wait a minute, why does *my* description have TODO in it?”
zarf says, “it’s not generally possible for typical IF, unless maybe you have a very specific definition of ‘breaks’”
Ghogg says, “what I tend to be missing is words mentioned in room descriptions that should be examinable but I’ve never made the objects to begin with that would go with them”
Roger says, “It does get prohibitive right around >SAY [arbitrary list of alphanumeric characters], which is not literally infinite”
Zach says (to zarf), “Right, or the author is willing to read through the entire random transcript.”
DavidGriffith says, “I find it helpful to give each room its own file”
Roger says (to Ghogg), “That’s the whole [every important thing in brackets] principle or whatever that was called”
Emily says (to Roger), “bracket every notable thing”
DavidW says, “Something I notice in I7 authoring is accidentally creating two similarly-named items when only one was wanted.”
Roger says, “Yeah, that’s it”
baf says, “A generalized bot seems like it would be good for answering ‘Does it crash’ and ‘Does it throw errors’, but not ‘Does this produce the output I want’. If you’ve got so many cases that it’s unwieldy to try them all, you’ve probably got so many cases that it’s unwieldy to read them all.”
DavidW says, “And I’m not sure how to catch that without reading the World list.”
Fang says, “are actual software bug bugs really an issue with IF? It seems the more typical ‘bug’ is really things behaving in ways the player does not expect. Which seems impossible to catch via an automated tool”
Emily says (to Fang), “sure they are”
zarf says, “actual software bug bugs happen”
Ghogg says (to Fang), “yeeeeah they are just caught more often in the final version”
Emily says, “I’ve had runtime errors, loops, hanging, various fun things”
Ghogg says, “(I had two playing through v2 of Invisible Parties, which should be fixed in v3)”
Nitku says, “compatibility with interpreters”
Emily says (to Nitku), “yeah, that’s a big one”
DavidW says (to Fang), “I played a game where the BUY command was coded so poorly, the PC could buy himself for $2 and the game attempts to put the PC into his own inventory. Which crashes the game, of course.”
zarf says, “and even things behaving weirdly can be caught by automated tools. Simplest case is just a run-command-check-outcome test”
Fang says (to DW), “that sounds sorta awesome”
Nitku says, “There was a game where you could eat yourself”
DavidW says, “There’s a fun bug in Last Sonnet of Marie Antoinette where I attempted to cut the buttery yellow force field with the butter knife. Hilarity ensued.”
Emily says (to DavidW), “dude, I barely remember that game at all and yet you recollect errors from it”
DavidW says (to Emily), “It was a fun error!”
Emily says (to zarf), “I did use Object Response Tests to hammer on objects in CM, though at some point my eyes always glazed over looking at the output for N verbs x hundreds of objects”
zarf says, “I don’t go in that direction”
Emily says, “mm”
zarf says, “but what I mean is, if you’ve implemented something and then written a test for it, that can save your ass from that stuff breaking *later*. Even when the breakage is of the form ‘can’t put things into the backpack any more’”
Ghogg says, “conclusion: just have DavidW do all your testing”
Roger says, “And of course the classic GIVE ME YOU bug in the Long Glass of Sherbet”
Emily says (to zarf), “yes, that’s a fair point”
zarf says, “which would qualify as a ‘world does not behave as user expects’ bug”
baf says, “Bugs can be memorable. One of my strongest memories of Savoir Faire was a bug. You probably know the one I’m talking about. That’s how memorable it was.”
Emily . o O ( why did I set the topic to this again? )
Zach says, “heh”
Jota says (to baf), “I do not know which one you’re talking about.”
baf says (to Jota), “I was really addressing Emily there.”
Emily says (to baf), “I assume you mean the bug where you could enter the class hierarchy”
Emily says, “though the “doors floating on puddles” bug was also classic, I think that didn’t make it past beta”
Gunther says, “of course, one of the most famous bugs is Once And Future’s “THROW ME ON GRENADE” not working years and years after the author mentioning it needed fixing”
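
The “run-command-check-outcome” test zarf mentions can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `ToyGame`, its `run` method, and the lamp and backpack are hypothetical stand-ins, not code from any game discussed here. A real harness would drive an actual interpreter instead.

```python
class ToyGame:
    """Hypothetical stand-in for a real game: a backpack the player can fill."""

    def __init__(self):
        self.inventory = {"lamp"}
        self.backpack = set()

    def run(self, command):
        """Return the game's response to one typed command."""
        words = command.lower().split()
        if words[:1] == ["put"] and words[-2:] == ["in", "backpack"]:
            item = words[1]
            if item in self.inventory:
                self.inventory.remove(item)
                self.backpack.add(item)
                return f"You put the {item} into the backpack."
            return "You aren't holding that."
        if words == ["inventory"]:
            carried = ", ".join(sorted(self.inventory)) or "nothing"
            return "You are carrying: " + carried
        return "I didn't understand that."


def run_checks(game, script):
    """Feed each command to the game; collect (command, expected, actual) on mismatch."""
    failures = []
    for command, expected in script:
        output = game.run(command)
        if expected not in output:
            failures.append((command, expected, output))
    return failures


# Each entry pairs a command with text that must appear in the response.
SCRIPT = [
    ("put lamp in backpack", "into the backpack"),
    ("inventory", "nothing"),
]

failures = run_checks(ToyGame(), SCRIPT)
print("all checks passed" if not failures else failures)
```

Rerunning a script like this after every change is what catches the “can’t put things into the backpack any more” kind of breakage months later. It says nothing about whether the spec itself gives players a good experience.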


Emily says, “someone just a couple days ago sent me a CM bug in which it’s apparently possible to stick your possessions into an NPC’s backpack and be unable to get them back”
Ghogg says, “so far I have heard of none of these bugs everyone is chattering about”
Emily says, “which must have been there all along, but no one mentioned it to me before now”
Emily says, “so even a largeish number of players doesn’t guarantee a particular outcome”
baf asks, “Hm. So, taking that backpack bug as an example: what would catch that?”
baf says, “Humans failed. Automated testing would probably fail too, because it’s a combination of two actions that you wouldn’t think of in setting up the tests.”
Emily says (to baf), “yeah, I’m not really sure”
Nitku says, “Conceivably you could have an automated test that tries to put everything in and on everything else and if it can’t pick them up anymore it’s a bug”
Gunther says (to Nitku), “yes, except what if it only triggers after the NPC leaves the current room”
Nitku says, “well there are always more edge cases”
Gunther says (to Nitku), “exactly.”
zarf says, “you never write a tool to catch the bugs you didn’t think of at all”
Nitku says, “but not writing a test because it doesn’t catch absolutely everything doesn’t sound productive”
Ghogg says (to Emily), “well, I think part of the problem is just because a player hits a bug doesn’t mean they report it”
Ghogg says, “it tends to be a fair amount of extra effort for me to put a bug report together and in some cases I just gave up”
Ghogg says, “anyway, something like auto-transcript for online play so the author can scan for bugs might help”
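
Nitku’s suggestion (put everything in everything else, and flag anything that cannot be picked up again) might look roughly like the sketch below. The `ToyWorld` model, its items, and its deliberately buggy “npc backpack” are all hypothetical, invented for illustration. As Gunther points out, a sweep like this will not see bugs that depend on further state, such as the NPC leaving the room.

```python
class ToyWorld:
    """Hypothetical world model; the 'npc backpack' container is deliberately buggy."""

    def __init__(self):
        self.items = ["coin", "key"]
        self.containers = ["box", "npc backpack"]
        self.location = {item: "player" for item in self.items}

    def put(self, item, container):
        """Move a held item into a container; return True on success."""
        if self.location[item] != "player":
            return False
        self.location[item] = container
        return True

    def take(self, item):
        """Pick an item back up; return True on success."""
        # Bug under test: nothing can be retrieved from the NPC's backpack.
        if self.location[item] == "npc backpack":
            return False
        self.location[item] = "player"
        return True


def find_traps(world_factory):
    """Try every item/container pair in a fresh world; report pairs that swallow the item."""
    probe = world_factory()
    traps = []
    for item in probe.items:
        for container in probe.containers:
            world = world_factory()  # fresh state so pairs cannot interfere
            if world.put(item, container) and not world.take(item):
                traps.append((item, container))
    return traps


print(find_traps(ToyWorld))
```

Building a fresh world for each pair keeps the cases independent. The cost is that interactions between two earlier actions, the kind baf worries about, go untested.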


Roger asks, “So how does everyone like to handle their stack of bugs, anyway? Something as formal as mantis?”
zarf says, “hah”
Emily says (to Roger), “dear god no”
DavidW says, “I remember crashing Hitchhiker’s Guide to the Galaxy by putting the cardboard box into the thing, the thing into your robe’s pocket, and the robe into the cardboard box.”
DavidGriffith says, “I write bugs into a TODO or BUGS file and go from there”
Emily says, “for CM, I have a text file into which I paste bug reports when they come in and then when they’re fixed I move an annotation to the change log”
harpua says (to roger), “I use Bitbucket for source control which also has a nice issue tracking system”
zarf says, “now I’m thinking of the Inform library bug where “GET X, Y, AND Z” doesn’t work. It’s never worked. But nobody noticed for years, or if they did, they didn’t report it”
zarf says, “I go with the text file also”
Emily says, “I feel like something like Mantis is only useful if I wanted to communicate with the outside world about my bugfixing progress”
Emily says, “given that on most of this stuff I’m a team of 1”
Emily says, “yeah, there was another Ultimate Quest issue where I had a bunch of autotesting that was reasonably sound, but there was a situation that occurred only when a particular scene was running, and the autotests didn’t trigger that so didn’t catch it”
Emily says, “on the “what can beta-testers do” front, there are different types of testers”
Roger says, “DavidW and everyone else”
Emily says, “I think the old article about LICK PARROT is still pretty accurate to my experience with testers”
baf says, “Seems like IF has an advantage over a lot of other software here in that it’s trivial to record exactly how a bug manifests and exactly what sequence of input led to it.”
Ghogg says, “well, as long as you aren’t using a randomizer”
zarf says, “mostly trivial. :)”
Gunther says, “here’s an anecdote: I mentioned once my game had a fixed character. The tester was extremely confused why the character was castrated.”
baf says, “In the systems I deal with at work, the biggest challenge in debugging is clarifying exactly what the tester is talking about.”
zarf says, “multimedia / hyperlink issues are classically a pain to track down. But yes, in “most cases”, things are repeatable”
Emily says, “it seems like in theory it’s probably possible to do an autorun through Twine or ChoiceScript games”
Emily says, “I don’t know in practice how much that is actually done”
Emily says, “but they do list all possible affordances, in theory, so I would think you could do it”
zarf says, “I don’t know either”
GDorn says, “Selenium would be a decent tool for running through Twine games”
zarf says, “surely the choicescript people, at least, have thought about this”
DavidW says, “I hope so.”
Emily says (to zarf), “I have the feeling dfabulich has actually said something to me about this and I’ve just forgotten what that something was”
Emily says, “one thing we found with Versu autotesting, perhaps semi-obviously, is that while you can get some interesting information from autotesting, the percentage of times the bot hits a particular outcome does not necessarily correspond at all well to how often a human would hit that outcome”
Emily says, “since the bot is much more likely to pick semi-boring choices or ignore romantic options or whatever”
baf says, “In the Shufflecomp testing ring, one of the games I was given was in Twine. I had no idea what to do.”
Emily says (to baf), “I guess there’s no way to retain a transcript”
DavidW says (to baf), “yeah, there’s no transcript, no room names to refer to”
Ghogg says, “choice of games seems to do old fashioned beta testers”
Ghogg says, “and they do have a fair number of bugs and updates”
Emily says (to Ghogg), “that doesn’t mean they don’t autotest, though”
baf says, “I guess it wasn’t just that it was in Twine, it was that it was in very simple Twine, not using any exotic features. It was like being asked to debug a sheaf of papers.”
Emily says, “I think it’s a both-and thing, ideally”
Emily says (to baf), “ah”
Ghogg says (to Emily), “they don’t to my knowledge”
Nitku says, “You can share Twine saves and the URL contains the choices”
Emily says, “well, in that case maybe it becomes like beta-reading fanfic”
Emily says, “‘this bit didn’t make sense, over here you describe Ginny Weasley as having black hair’”
Ghogg says, “quite a few of the bugs that come up are hard to catch automatically, anyway. They’ll be something like a hard-coded ‘he’ or ‘she’ on a character where the gender is supposed to be selected”
Roger says, “I shudder to think what testing that sinking-pirate-boat must have involved”
Ghogg says, “or some set combination of stats that is supposed to let a certain action work has it fail anyway”
Emily asks (of Roger), “Captain Verdeterre’s Plunder?”
Roger says, “Yup”
DavidW says, “ooh, that would be a difficult game to test, yeah.”
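
For choice-based work where every affordance is listed, the bulk-runthrough idea Emily describes for Versu can be sketched as a random walk over a passage graph with a tally of endings. The `STORY` graph below is a made-up example, and this is not how Versu, Twine, or ChoiceScript actually implement testing. It only shows the shape of the technique.

```python
import random
from collections import Counter

# Hypothetical story graph: passage -> list of (choice label, next passage).
# Passages with no choices are endings.
STORY = {
    "start": [("flirt", "garden"), ("talk about weather", "parlor")],
    "garden": [("confess", "romance ending"), ("retreat", "parlor")],
    "parlor": [("leave", "quiet ending"), ("argue", "quarrel ending")],
    "romance ending": [],
    "quiet ending": [],
    "quarrel ending": [],
}


def random_playthrough(story, rng, start="start"):
    """Pick uniformly random choices until a passage with no choices is reached."""
    passage = start
    while story[passage]:
        _, passage = rng.choice(story[passage])
    return passage


def tally_endings(story, runs=100, seed=0):
    """Run many random playthroughs and count how often each ending occurs."""
    rng = random.Random(seed)  # seeded so a surprising run can be repeated
    return Counter(random_playthrough(story, rng) for _ in range(runs))


counts = tally_endings(STORY)
for ending, n in counts.most_common():
    print(f"{ending}: {n}")
```

The bot chooses uniformly at random, so its ending percentages are exactly the kind that need not match what human players do. The tally is more useful for spotting endings that are never reached than for predicting player behaviour. A story that deliberately loops back to earlier sections would also need a step limit, which this sketch omits.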


Emily asks, “how do people feel about giving beta-testers game versions that let the testers skip ahead or otherwise have access to cheats?”
DavidGriffith says, “never thought about it before.”
zarf says, “Generally I don’t”
zarf says, “In a future Hadean Lands test I might hand out a save file for ‘some distance in’, but maybe not”
DavidW says (to Emily), “Beyond’s testers had that option. There were goto chapter commands.”
Ghogg asks, “how long is HL?”
Busta says, “I’m cool with giving the testers whatever they feel they need”
zarf says (to ghogg), “looooong”
Emily says, “I generally don’t like to do that because I want information about what it feels like to actually play, from the perspective of total ignorance, but that gets to be a harder line to hold when you’re testing a big game”
Ghogg asks (of zarf), “even when you know what to do?”
Ghogg says, “from your descriptions I don’t think HL would allow jumping very well”
zarf says, “yes”
Ghogg says, “but in a very long chapter-structure game I don’t see why you wouldn’t let them jump, especially so you can say ‘ok can you go back and test chapter 5 I think it’s fixed now’”
zarf says, “I jump myself around all the time, but that’s to test specific things”
Busta says, “The problem with jumps is if there are choice-based options in the beginning that affect later portions of the game”
Ghogg says, “of course in your case they aren’t just testing for bugs but structure/do the puzzles make sense/etc”
DavidGriffith asks, “how about coding in a cheat to jump ahead to a specific point in the game rather than allowing debug commands?”
zarf says, “the problem is that I’m not absolutely convinced that my debug jump commands produce the same kinds of world state as actually playing through”
Ghogg says, “if you let too many testers cheat then they won’t necessarily spot that puzzle #5 is way too abstruse to solve”
baf says, “Well, there are sort of two levels of testing. You want “user experience” testers, but you also want “scrape parrot with longbow” testers.”
zarf says, “and so I figure that user testing with debug commands is basically untrustworthy”
Ghogg says (to zarf), “yeah, I think it depends on your structure”
Ghogg says (to Emily), “anyway, the other problem with a CoG auto-bug-catcher is it is hard to automatically say what is a bug, exactly”
baf says, “And for the latter sort, being able to jump around and purloin stuff seems useful.”
Ghogg says, “‘loops back to a previous section’ is a feature on some of them”
Emily says (to Ghogg), “sure”
zarf says, “like if you can reach room X by solving either puzzle Y or puzzle Z, any cheat that jumps to room X makes hash of your game assumptions”
Emily says, “to be clear, I think it’s really important to have actual humans looking at and commenting on output; I just also have found on larger projects that I need to augment that substantially or else I’ll be asking my beta-testers to do more scutwork than any non-paid person should have to”
Emily says, “(and more than many paid persons would want to be bothered with)”
Ghogg asks, “how many live people did ultimate quest have as testers?”
Emily says, “six or seven”
Emily says, “(depending on how you count access to different portions)”
Roger says (to zarf), “You could distribute a saved game, I guess”
Emily says, “but it is a difficult case to make an example of because both the specs and the circumstances of creation are unlike pretty much any other IF game ever”
zarf says (to roger), “I’d have to distribute *two* saved games to get coverage”
Emily says (to zarf), “if only two, that doesn’t seem that bad”
zarf says, “sadly, this is a simplified example :/”
zarf says, “there are too many combinations for players to really cover everything. but I don’t want to find out at the end that I’ve only tested a single route through, either”
Emily asks (of zarf), “how practical would it be to autogenerate an array of test commands? like, make the “test me” content itself up automatically to cover a good set of options?”
zarf says, “I’ve done that for my own use.”
zarf says, “(Not using “TEST ME”, but using scripts that run the game in a framework for that purpose)”
zarf says, “I might use that capacity to distribute a handful of save files for major branches.”
zarf says, “it is valuable for me to be able to get into a particular midgame branch without using any debug commands”
Busta says, “I used TEST ME to automatically go to a specific chapter all the time”
Ghogg says, “man, all of you with long games with ‘chapters’ and stuff”
zarf says, “I’ve written short games but testing is so much easier there. :)”
zarf says, “(now I’m remembering when I could play through _So Far_ in three minutes flat)”
DavidW asks, “zounds. really?”
zarf says, “(So Far is a game which goes by fast if you know exactly what to do. HL, not so much.)”
DavidW says, “I wonder how a game like Mulldoon Legacy was tested. Brutal puzzles, and not really split into chapters all that much.”
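
Emily’s question about autogenerating test commands to cover a good set of options can be illustrated by enumerating every combination of alternative puzzle solutions. The puzzles and commands below are invented for the example, and zarf’s actual scripts are not shown in this discussion, so treat this as one possible approach rather than a description of his framework.

```python
from itertools import product

# Hypothetical puzzles, each with alternative solutions given as command lists.
PUZZLES = [
    ("open the gate", [["unlock gate with key", "open gate"], ["climb wall"]]),
    ("cross the river", [["row boat"], ["freeze river", "walk north"]]),
]


def branch_scripts(puzzles):
    """Yield one flat command list per combination of puzzle solutions."""
    solution_sets = [solutions for _, solutions in puzzles]
    for combo in product(*solution_sets):
        yield [command for solution in combo for command in solution]


scripts = list(branch_scripts(PUZZLES))
for script in scripts:
    # Slash-separated, in the style of a TEST ME command list.
    print(" / ".join(script))
```

Two puzzles with two solutions each already give four scripts, and the count multiplies with every further puzzle. That is the combinatorial problem zarf describes. Replaying each script and saving at the end would yield save files reached purely by ordinary play, without debug jumps.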


Emily asks, “so when it comes to live human testers, how much do you think they do and/or should alter the more aesthetic or conceptual aspects of the game?”
zarf says, “they do not”
zarf says, “I am the master”
zarf lightning flashes
Zach says, “That seems to really depend on the author.”
Busta says, “I have the opposite opinion. I wish they would absolutely comment on concept and aesthetics.”
Zach says, “And their godlike powers or lack of”
DavidW says (to Emily), “I remember an early version of CM where I wasn’t thrilled with the then-ending sequence.”
zarf says (to busta), “Oh, I love it when they comment!”
DavidGriffith says, “I would prefer to be notified if some prose of mine doesn’t sound right.”
Busta says, “Otherwise the reviews will tell you”
zarf says, “Testers should comment on all that stuff!”
harpua says, “I like them to comment”
Emily says (to DavidW), “the ending was a subject of ongoing debate throughout”
Busta says, “good endings are tough”
zarf says, “but tester comments about aesthetics are not commands to me. They’re my insight into how the game is working.”
zarf says, “(I think Mattie Brice was completely wrong, yes)”
Emily says (to zarf), “I’m not saying she was necessarily right, but I did think it was an interesting issue to raise”
DavidW asks, “uh, who’s Mattie Brice?”
zarf says, “if a tester says ‘I didn’t understand at all!’ then what happens next depends on whether I *want* players to understand at that point”
Emily says (to DavidW), “a game designer and journalist/critic who wrote the following article which I posted as background reading:”
Busta says, “When it comes to static fiction and betareading, it’s said that betareaders will always spot a problem – they just may not have the right solution. I think that applies here.”
zarf says, “yeah”
zarf says, “I’ve seen authors (and game designers) complain when testers try to solve design problems for them”
DavidW says, “oops. I have been exposed as not having done my homework.”
Ghogg asks, “oh, that was the don’t-test article?”
Emily says, “yeah, I generally discourage– well, no, that overstates it. I often essentially ignore suggested fixes”
zarf says, “I don’t complain, but the answers may be useless to me.”
zarf says, “(I’d still thank them for trying)”
Emily says, “the fact that there was a problem is useful information, though”
Ghogg says, “I mean, I think you should feel free to IGNORE what a tester says if you have Reasons, but not having anyone play it is a recipe for crashes”
baf says, “I remember testing one game where the author asked ‘What can I do to make the game better?’ and I felt like it was putting an unacceptable burden on me as a tester because I just didn’t like the style of the game at all”
Busta says (to baf), “Yeah, it’s the designer’s job to answer that question.”
baf asks, “I was quite willing to ferret out bugs, but how could I say ‘Write this to be less precious’ in a way that would be at all helpful?”
Emily says (to baf), “but that kind of answer can be extremely useful”
Ghogg says, “yeah, I want testers to go picking at my writing”
Emily says, “though admittedly it gets complicated by your relationship with the author”
Emily says, “admittedly, this is part of the reason I don’t mostly test for people I don’t know at least a bit — I’m not always sure what kind of feedback they would be okay with hearing from me”
Busta says, “you gotta be able to take criticism”
Emily says, “I think you *can* phrase things as “I would like it better if” rather than “the game would be better if” — that at least makes clear the subjectivity of your feedback”
zarf says, “Mind you, I’ve also had the experience of having someone say ‘That bit is boring’ and I reply ‘Well, it has to be that way and I can’t fix it now, sorry’ and then ten minutes later I have an idea on how to tweak it. Just because my brain is engaged”
Zach says, “Yeah, I sort of assume that anyone willing to publish a game is willing to take criticism.”
Nitku says, “That’s a bold assumption”
Emily says, “yeah, that is not really necessarily true at all”
DavidW says (to Zach), “You’re killing my baby! My baby!”
Emily says, “well, less cheesily: some people are writing games in order to communicate something really difficult or personal, and hearing about a bunch of typos and a sentence you didn’t care for may seem like a vicious missing of the point”
Busta says, “Or even worse, you have to tell them they aren’t communicating their deeply personal story well in the first place”
Zach says, “Hmm, I can understand that, but if it is deeply personal and the author doesn’t want criticism, maybe release to general public isn’t the best idea.”
harpua says, “or sometimes they look for deeper meaning in a puzzle game that is just that; an old fashioned puzzle game with no deeper meanings.”
Zach says, “By ‘criticism’ I mean reviews in general.”
Nitku says, “There are also people who can’t take negative feedback and didn’t assume they’d get any”
Nitku says, “or they don’t recognize they’re that kind of people”
Ghogg says, “there are certainly communities who don’t give much/any”
Emily says, “a lot depends on the context you’re coming from as a creator: if you’re writing in a community that is primarily a safe space and sanctuary, you might not think that critiques of your artistic *ability* are likely to arise, let alone something you need to defend against”
Busta says, “I’m far more likely to trust negative feedback than positive feedback”
Emily says, “tbh I have a certain sympathy for this every time I get an email or see a blog post from someone saying “gosh there’s not enough of a tutorial in Galatea”, because I wrote that game expecting it to be played by <50 people who were all hardcore regulars, so the possibility that I needed to make it friendly to IF-novice college students in 2012 did not occur to me”
K-Y says (to Zach), “I assume you are aware of the blowup a few months ago over unexpected criticism in the XYZZY reviews”
K-Y says, “from people who have certainly been releasing games for some time”
Zach says (to K-Y), “Only vaguely”
Zach asks, “Which review was that?”
K-Y says, “hmm, not a good tangent to stick into this discussion”
Emily says, “if people are interested, the top few posts in pretty much cover it”
Busta says, “post-release feedback should be taken differently than testing feedback anyway”


Emily asks, “how much do people consider their testers’ background when recruiting betas?”
zarf says, “hadn’t occurred to me. other than the background of ‘has tested my games before’”
Busta exclaims, “haha, right now it’s- Oh, you actually want to test my game? Here you go!”
zarf says, “I’m not exactly known for games with sensitive social contexts”
Gunther says, “I recall dfan’s testing of For A Change”
Gunther says, “in which someone thought he was not a native speaker”
Zach says, “No convicted felons”
Gunther says, “and corrected all of it.”
Emily says (to Gunther), “awesome”
Ghogg says, “oh wow”
Emily says, “I’m thinking more of things like sending something to a tester with particular subject knowledge, or, conversely, finding newbie testers for games that are meant to be reasonably accessible”
DavidW says, “I’ve mostly recruited testers from right here on ifmud. I forget how I got around to asking non-mudders, but they were also IF authors.”
Emily says, “or for that matter finding testers with a range of devices, or testers for meant-to-be-accessible-with-voiceover stuff”
Ghogg says, “if I finish my game-for-newbies I definitely will need to poke around for newbies”
Emily says, “if I’d had Allen test Savoir-Faire there are numerous musical anachronisms that would never have been permitted”
K-Y says (to Gunther), “my main contribution to that was a last-minute report that later stuck out to him as ‘nobody else would ever have noticed that’”
Emily says, “at this point it’s fairly hard for me to find new novice IF players because many of my friends are from the gaming/IF world and most of my friends who are not have at some point been pressed into “be my novice tester” service in the past, so I might need to acquire new good-enough-to-ask-favors-of friends in order to do that in future (or hire people)”
Zach says, “Allen is one big musical anachronism”
Taleslinger asks (of emily), “Is your husband into IF at all?”
Emily (to Taleslinger) laughs
Emily says, “we did at one point take Counterfeit Monkey over to his parents’ house and they played the first half hour or so”
Emily says, “during the testing phase”
Emily says, “but even they aren’t novices, for obvious reasons”
Gunther says, “”oh son, *still* this parser bug?””
Ghogg says, “I’d probably pop around tigsource or something”
Zach says (to Gunther), “heh”
zarf says, “it is possible that I will need more HL testers than I have available from known sources. (At the end of this month.)”
zarf says, “Haven’t decided what to do if so.”
Zach asks, “Is Graham not interested in mudding?”
DavidGriffith says, “I have a reimp of deja vu in the works”
Busta says, “I’m thinking I’ll eventually hit up some on-if game forums. Offer to test an indie platformer or something *shudder*”
Busta says, “*non-if”
zarf says, “or, I should say, ‘more HL testers that can put in assloads of time’”
Emily says (to Zach), “not hugely. he occasionally looks over my shoulder”
DavidW says (to Zach), “And then he writes fanfic about us.”
Zach says, “Yay, I’ve always wanted to be a star”
Ghogg waves to Graham if he’s watching
Emily reports this
K-Y says, “bug: mudders have developed sentience”
Taleslinger says, “I haven’t”
Emily says, “(in fact he has been doing things to the bug tracker, he says)”
Emily says, “(so, you know)”
Emily asks, “are there other things that people wanted to dig into? or horror stories they wanted to tell?”
DavidGriffith says, “I’ve wanted to do something lovecraft-like”


zarf says, “I want to ask whether anybody uses the I7 IDE skein”
zarf says, “because I tried once and felt drubbed afterwards”
Roger says, “I keep thinking I’ll wake up one morning understanding it, but it hasn’t happened yet”
Nitku says, “I have the walkthrough saved to the skein so that I can always test that the story is completable”
Emily says, “I have done so for small games but don’t attempt to for large ones”
Emily says, “but do go on”
DavidW says, “I ignore the skein.”
Nitku says, “also I use it to skip ahead”
DavidGriffith asks, “are there any I6 users still in here?”
zarf says, “I have this Python script that I use to do all the things that I think the skein was meant to use”
Busta says, “I don’t use the skein either. I write TEST ME commands to skip ahead to where I want”
zarf says, “replay sets of commands checking output, etc”
zarf says, “but I want to do all that with a text file, not a node-based UI”
zarf says, “(TEST ME counts as a text file)”
zarf says, “I do use the opt-cmd-R “Replay to here” menu command, which uses the skein underneath”
Nitku says, “The skein is easier to keep up to date than the test mes in my experience”
Emily says, “yeah, that I use loads”
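
A minimal sketch of the text-file approach zarf describes above (replay a set of commands, check the output). This is not his actual script. The file format (`> command` lines, each followed by fragments expected in the reply), the function names, and the `toy_game` stand-in for a real interpreter session are all assumptions made for illustration.

```python
from typing import Callable, List, Tuple


def parse_script(text: str) -> List[Tuple[str, List[str]]]:
    """Parse '> command' lines, each followed by fragments expected in the reply."""
    steps: List[Tuple[str, List[str]]] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith(">"):
            steps.append((line[1:].strip(), []))
        elif steps:
            steps[-1][1].append(line)
        else:
            raise ValueError(f"expected-output line before any command: {line!r}")
    return steps


def run_script(text: str, send: Callable[[str], str]) -> List[str]:
    """Replay every command through `send`; return a list of failure messages."""
    failures = []
    for command, expected in parse_script(text):
        reply = send(command)
        for fragment in expected:
            if fragment not in reply:
                failures.append(f"> {command}: missing {fragment!r}")
    return failures


# Hypothetical stand-in for a real interpreter session.
def toy_game(command: str) -> str:
    replies = {"look": "You are in a small kitchen.", "take lamp": "Taken."}
    return replies.get(command, "I didn't understand that.")


SCRIPT = """
# walkthrough fragment
> look
small kitchen
> take lamp
Taken.
> open door
The door swings open.
"""

failures = run_script(SCRIPT, toy_game)
for message in failures:
    print(message)
```

Because the checks live in a plain text file, a walkthrough or a bug reproduction can be edited and diffed like any other source file. In practice `send` would wrap whatever interpreter the game runs in.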


Busta says, “threaded conversation is really hard to test, especially when the dialogue changes depending on both facts and moods”
zarf says, “okay, this small survey is more than I knew before”
Emily says, “I like the Skein when I have a game that’s straightforward enough that I know all the major possible pathways through it; that’s just not always the model I’m writing for”
Emily says (to Busta), “are there things you think would make that easier? (Disclaimer: since I passed this off to other people, I’m not at all likely to do anything about your input, but I’m curious anyway)”
Busta says (to Emily), “I haven’t found the right answer yet, other than making sure you tackle every possibility manually”
Emily says, “for Alabaster I did some graph visualization stuff that helped me visually find bugs that would have been hard otherwise”
Emily says, “but I didn’t really generalize that to the code set as a whole”
Emily says, “but that’s also a useful thing that I think we haven’t talked about — autogenerating graphs of the code behavior that you can then look at for obvious defects”
Emily says, “so with Alabaster it would go through the trees of dialogue and look for places where there might not be a follow-up comment to dialogue that needed one, or things like that, and color the relevant nodes to show that fact”
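
A rough sketch of the kind of check Emily describes: walk the dialogue data, flag nodes that lack a follow-up where one is expected, and color them in a generated graph. This is not the Alabaster tooling. The quip names, the `TERMINAL` set, and the choice of Graphviz DOT as output are assumptions for illustration.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical dialogue data: each quip lists the quips that may follow it.
FOLLOW_UPS: Dict[str, List[str]] = {
    "greet": ["ask about mirror", "ask about apple"],
    "ask about mirror": ["accuse"],
    "ask about apple": [],  # a question with nothing after it
    "accuse": [],
}
# Quips that are meant to end a thread, so an empty follow-up list is fine.
TERMINAL: Set[str] = {"accuse"}


def dangling_quips(follow_ups: Dict[str, List[str]], terminal: Set[str]) -> List[str]:
    """Return quips that have no follow-up but are not marked as terminal."""
    return sorted(q for q, nxt in follow_ups.items() if not nxt and q not in terminal)


def to_dot(follow_ups: Dict[str, List[str]], flagged: Iterable[str]) -> str:
    """Render the graph as Graphviz DOT text, filling flagged nodes in red."""
    flagged = set(flagged)
    lines = ["digraph dialogue {"]
    for quip in follow_ups:
        style = " [style=filled, fillcolor=red]" if quip in flagged else ""
        lines.append(f'  "{quip}"{style};')
    for quip, nxt in follow_ups.items():
        for target in nxt:
            lines.append(f'  "{quip}" -> "{target}";')
    lines.append("}")
    return "\n".join(lines)


flagged = dangling_quips(FOLLOW_UPS, TERMINAL)
dot_text = to_dot(FOLLOW_UPS, flagged)
print(dot_text)
```

The DOT text can be fed to any Graphviz renderer. The red nodes are the ones to look at first.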
Busta says, “I made a few excel spreadsheets. One was for every object in the game and whether it had a generic response, an avoiding response, or a specific response when asking an NPC about it”
Emily asks, “it seems like it’s more possible to do that kind of thing for when you have an explicit model for what can lead to something else — have people here used any of Aaron Reed’s extensions for modeling puzzle graphs within the game code, and if so how did that go?”
DavidW says, “Again, my ignorance betrays me.”
zarf says, “I’ve got my (not-in-game-code) puzzle graph tool but it doesn’t output actual graphs”
Busta says, “I’ve only used his smarter parser, but I’ll check out his other stuff”
Emily says, “there’s Spin, which has the idea of letting players bypass puzzles entirely, but I thought there was another one to do with tracking which puzzles were and were not solved”
Emily says, “ah, I think I was thinking of Intelligent Hinting:”
Busta says, “Ahh that’s right. I designed my own hinting and spin system for my game, but I did look at it for ideas.”
Emily says (to zarf), “I feel like the puzzle graph tool concept also counts, though”
Zach asks, “Skipping puzzles? How would that be different than including a walkthrough?”
Emily says (to zach), “I *think* the idea is that you have a certain amount of in-game currency you’re allowed to spend on puzzle-skipping”
Emily says, “but I’ve never actually used it myself”
Roger says, “Like the Wishbringer stone, I guess”
Zach says, “Ah. NightFloyd played a game like that.”
Zach says, “Professor Frank, I think.”
Emily asks, “anything people wish their testers would do, or testers wish authors would do to make their lives easier?”
Emily says, “(crickets)”
Emily says, “our betatesters are indeed perfection itself”
zarf says, “I wish testers would play my game obsessively for days on end”
DavidW says, “hm, It would be nice if the input line could accept more characters for a comment, but that’s not in authorial control.”
zarf says, “(more crickets)”
Busta says, “none of my testers went back in to try to find the amusing things mentioned at the end, but I took that to mean the game wasn’t entertaining enough to make it enjoyable”
zarf says (to dw), “that can be hacked by the author, actually, but it’s a nuisance”
Zach says, “I want full dental”
Roger says, “I wouldn’t mind if more authors released their source code to their testers, but I may be in the minority there”
DavidW says, “I’m not sure what authors could do to minimize tester burnout.”
Busta says, “I just wish there were more testers :)”
DavidW says, “I wish I had more stamina for testing. I haven’t done much lately.”
Emily says, “mmkay. we’re about at the 2-hour mark and things seem to be winding down”
Emily says, “so unless there’s something else people wanted to add, I’m going to call it”
Busta exclaims, “thanks for hosting!”
Emily says, “next month I will be in Switzerland at the usual Theoryclub time, but that doesn’t mean that others can’t do something, so I’ll look into topics and see if there’s someone who wants to log”

5 thoughts on “Transcript of Aug 9 ifMUD Discussion on Testing”

  1. Pingback: Transcript of Testing Discussion | Emily Short's Interactive Storytelling

  2. I’d personally like a continuing feedback mechanism: when playing a game one could type ‘comment blablabla’, and the random number seed plus the typed commands would then be saved, with an option at the end of the play session to upload or mail them somewhere. But I don’t want that as much as I want acts, which were voted down, so I won’t take the trouble to propose it for Inform. It is also something I want more as a player than as an author – I am the type of guy who always makes notes of errors and wishes found for whatever program he uses, and normally ends up doing nothing with them other than discarding them when a new version comes out.

  3. As for What Really Works, one game I recently tested had thorough change logs, and I was able to make sure they got fixed and also look for knock-on bugs. I think checking off on change logs also forces the author to make sure the bugs really are fixed beyond just oh, look, it compiles, and it also allows them to add another tweak in. It’s again a good way to direct testing for those who want it, and also, a change log can read like a story, so it can make directed testing fun.

    Also, something like bitbucket, totally, for bugs. One of the biggest problems I’ve had is seeing a bug exists, knowing I’d like to get rid of it some time, and knowing it’s also very small & I don’t want to clutter my brain with it right now & I’d rather attack it during down time waiting for a transcript. Without bug tracking, so much got lost in the shuffle, and I worried about what more got lost, and I was never quite able to focus on big bugs.

    The UI for describing and categorizing a bug/task/enhancement in Bitbucket is very easy & also, for comp entrants, you can make it private. Plus Bitbucket has low barriers to entry and is very simple and flexible to use. It is wonderful for a 1-person project. You don’t even have to set up source control!

    Bitbucket can definitely tie into change logs. You can tag an issue as “fixed in build 4” or whatever. Then you can search for that. They are even developing tools to export issues into CSV and XLS. But you don’t have to be close to cutting edge to use it, or to write a change log.

    • As for What Really Works, one game I recently tested had thorough change logs, and I was able to make sure they got fixed and also look for knock-on bugs. I think checking off on change logs also forces the author to make sure the bugs really are fixed beyond just oh, look, it compiles, and it also allows them to add another tweak in.

      True, though not all beta-testers are up for this kind of meticulous verification, especially if you’re in the later stages of testing a game. If you’ve got access to testers who are, though, that’s excellent.

      Failing that, I often do something where, when a bug is reported, I write a test script to duplicate the bug before I attempt to fix it. That way I can a) reproduce the bug and then b) verify the fix after it’s done.

  4. About ChoiceScript, it absolutely has automated testing. It has been around at least since before Indigo SpeedIF. There are two main programs: Quicktest, which tests every line in a story sequentially to make sure there are no syntax errors or unreachable code, and Randomtest, which iterates hundreds or thousands of times through the story with random choices made, to empirically identify lines of code that rarely or never fire due to variables.
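
The random-playthrough idea in the last comment can be sketched in a few lines. This is not ChoiceScript's Randomtest, only an illustration of the technique. The story table, the `courage` variable, and every name below are invented for the example.

```python
import random
from collections import Counter
from typing import List

# Hypothetical story: node -> list of (label, target, minimum courage required).
STORY = {
    "start": [("be bold", "bold", 0), ("be careful", "careful", 0)],
    "bold": [("walk on", "gate", 0)],
    "careful": [("walk on", "gate", 0)],
    "gate": [("slip inside", "plain_ending", 0),
             ("kick the gate down", "hero_ending", 2)],
    "plain_ending": [],
    "hero_ending": [],
}
COURAGE_GAIN = {"bold": 1}  # visiting "bold" is the only way to gain courage


def play_once(rng: random.Random) -> List[str]:
    """Make random choices from the start to an ending; return the nodes visited."""
    node, courage, visited = "start", 0, []
    while True:
        visited.append(node)
        courage += COURAGE_GAIN.get(node, 0)
        options = [target for _, target, need in STORY[node] if courage >= need]
        if not options:
            return visited
        node = rng.choice(options)


def random_test(runs: int, seed: int = 0) -> Counter:
    """Count how often each node is reached over many random playthroughs."""
    rng = random.Random(seed)
    hits: Counter = Counter()
    for _ in range(runs):
        hits.update(play_once(rng))
    return hits


hits = random_test(1000)
never_fired = sorted(node for node in STORY if hits[node] == 0)
print("never reached:", never_fired)
```

Here `hero_ending` needs a courage of 2, but the story can only ever grant 1, so no number of random runs reaches it. That is the kind of variable-dependent dead content this style of test is meant to surface.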
