Engineering a 21st Century Education

As a game developer turned teacher, the single most difficult part of the transition was paper. I'm not even talking about the thousands of copies of handouts and worksheets for students. I'm talking about the paper that comprises the layers and layers of bureaucratic processes that any sufficiently large organization develops over time. The forms. The reports. The mundane paperwork that must be done to uphold the laws that govern the operation of a school.

I get it.

It's stuff that must be done.

The problem isn't that it exists, it's how schools deal with it. A school has limited resources, so it needs to get the most out of what it has. That makes sense, right? And the school administration has ready access to a large pool of highly trained, adaptable, resourceful, and intelligent individuals with a wide range of skills covering every discipline imaginable. It has teachers.

I'm always more than happy to help when needed! I just get frustrated when I'm asked to perform work that could be reduced or eliminated by technology.

In my last post, I talked about school being a game and the need to meta-game it. One of the first issues that I think we need to talk about is "opportunity cost". Every hour that teachers spend on administrative tasks is an hour that is not being spent on teaching. Furthermore, these costs are recurring. If schools could automate 10 minutes' worth of administrative tasks each day from a teacher's workload, they would save each teacher about 30 hours of work over the course of the year. That's a lot of time that teachers could reallocate towards improving instruction.
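
To put concrete numbers on that claim (the 180-day school year is my assumption, though it's typical in the US):

```python
# Back-of-the-envelope estimate of recurring time savings.
# The 180-day school year is my assumption (typical in the US).
minutes_saved_per_day = 10
school_days_per_year = 180

hours_saved_per_teacher = minutes_saved_per_day * school_days_per_year / 60
print(hours_saved_per_teacher)       # 30.0 hours per teacher per year

# Scaled to a hypothetical staff of 50 teachers:
print(hours_saved_per_teacher * 50)  # 1500.0 hours school-wide
```

That school-wide number is what makes automation worth the up-front engineering cost.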

Teacher time is a valuable and finite resource. Education needs to be engineered to get the most out of that time. Based on my short time as a teacher so far, here are some of the systems that I think could be optimized:

We need a complete "Electronic Individualized Educational Plan Record" system overhaul. The current generation of "Student Information Systems" is grossly insufficient to deal with the complexity of our educational legislation. Schools need to keep documented records of adhering to a student's legally entitled accommodations, and a significant amount of this documentation is still being done on paper. We have the technology to design an educational record system that is secure, fault tolerant, and efficient. It would take a substantial initial effort, but imagine the time that it could save school staff in the long run.

We need a better "asset management system" for school property used by students. It's very frustrating to me as a teacher when I need to fill out carbon-copy checkout lists for textbooks by hand in the year 2016. When a student doesn't return the textbook, I'm required to fill out another carbon-copy form, manually address an envelope to the student's home, and put it in the mail bin for processing. Why isn't this process electronic yet? I should be able to snap a picture on my phone, press a button to assign it to a student or document its return, and everything else should be taken care of by a computer program. We clearly have the technology to do this.
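
As a sketch of how simple the record-keeping behind such a system could be, here's a minimal model in Python. The class and field names are hypothetical inventions of mine, not any real product's API:

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class CheckoutRecord:
    """One loan of a school-owned item to a student (hypothetical schema)."""
    item_id: str          # e.g. a barcode scanned from the textbook
    student_id: str
    checked_out: date
    returned: Optional[date] = None

class AssetLedger:
    def __init__(self):
        self.records: list[CheckoutRecord] = []

    def check_out(self, item_id: str, student_id: str) -> None:
        self.records.append(CheckoutRecord(item_id, student_id, date.today()))

    def check_in(self, item_id: str) -> None:
        for r in self.records:
            if r.item_id == item_id and r.returned is None:
                r.returned = date.today()
                return

    def outstanding(self) -> list[CheckoutRecord]:
        """Items still out -- the list that currently triggers hand-written forms."""
        return [r for r in self.records if r.returned is None]
```

From there, generating the overdue notice and queuing it for mailing is just another function call instead of another carbon-copy form.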

We need a "behavioral intervention tracking and diagnostic system". The school keeps paper records of certain student behaviors such as tardy slips and misconduct reports -- which, again, are filled out by hand on carbon copy paper. There are also cases where the teacher is expected to intervene in certain ways, such as contacting a parent. The issue is that there are so many different rules to keep track of and so many prescribed responses to that data. We need a system that can track behavior data from multiple sources and suggest interventions based on statistical models of what has and has not worked for that student.
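
To illustrate the "suggest interventions" half of that idea, here's a toy sketch: rank each past intervention by how often it was followed by improvement for that particular student. The data shape and intervention names are invented for illustration; a real system would need far more careful statistics than a raw success rate.

```python
from collections import defaultdict

# Toy history for one student: (intervention, did behavior improve?) pairs.
# In a real system this would come from the behavior database.
history = [
    ("parent_contact", True),
    ("parent_contact", True),
    ("lunch_detention", False),
    ("parent_contact", False),
    ("seat_change", True),
]

def suggest_intervention(history):
    """Rank interventions by observed success rate for this student."""
    outcomes = defaultdict(list)
    for intervention, improved in history:
        outcomes[intervention].append(improved)
    rates = {name: sum(obs) / len(obs) for name, obs in outcomes.items()}
    return max(rates, key=rates.get)

print(suggest_intervention(history))  # seat_change (1 for 1 so far)
```

Even something this crude beats paging through a binder of carbon copies before deciding what to try next.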

On top of moving away from antiquated "pen and paper" systems, we also need to improve interoperability between the educational software we already use. There are some good ideas happening with the Tin Can API, but the support from technology providers just isn't there yet. I love to see new ideas in educational software! The problem is that some of these applications seem to neglect the teacher's experience with the product. We need to set higher standards for educational software.

Whenever my students complete a learning activity on the computer, it should automatically go into my grade-book. The grade-book should automatically flag any items that need to be manually graded, and the process of providing feedback to the student should be as streamlined as possible. More detailed information about the student's performance should be stored in a database for later statistical analysis.

The other problem is the lack of standards regarding assessment items. For example, my students love Kahoot. I would totally use it way more if it were easier for me to import multiple choice questions from an existing database. If I could program randomly generated questions in MyOpenMath, export them to a standardized format, and then import them into Kahoot, I would be one happy teacher.
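
Here's roughly the pipeline I'm imagining, sketched in Python. The CSV layout is hypothetical, just a stand-in for whatever format the importing app would actually accept:

```python
import csv
import random

def make_question(rng):
    """Randomly generate one multiple-choice arithmetic question."""
    a, b = rng.randint(2, 12), rng.randint(2, 12)
    correct = a * b
    # Plausible distractors: off-by-a-row/column errors.
    # (Duplicates are possible when a == b; a real generator would dedupe.)
    choices = [correct, a * (b + 1), (a + 1) * b, correct + a + b]
    rng.shuffle(choices)
    return {
        "question": f"What is {a} x {b}?",
        "choices": choices,
        "answer_index": choices.index(correct),
    }

def export_csv(questions, path):
    """Write questions to a simple CSV (hypothetical import format)."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["question", "a", "b", "c", "d", "correct"])
        for q in questions:
            writer.writerow([q["question"], *q["choices"], q["answer_index"] + 1])

rng = random.Random(42)  # seed it so the "random" quiz is reproducible
export_csv([make_question(rng) for _ in range(10)], "quiz.csv")
```

The point isn't this particular script; it's that with a shared interchange format, generating questions in one tool and playing them in another becomes a one-button operation.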

I don't think any of these technologies are unrealistic. It's not like I'm asking for facial recognition software to replace hall passes or an artificially intelligent grader (although those would be kinda awesome too). If schools want to instill "21st Century Skills" in their students, they need to lead by example. In the "21st Century", knowing what processes can be automated by technology is a crucial skill to have. To do otherwise is a disservice to both teachers and students.

I used to think schools needed more games

I love games! I love playing them. I love making them. I love theorizing about them. They're an essential part of who I am as a person.

I used to think schools needed more games.

I was working as a video game developer and was fascinated by "tutorial levels". You know, that part of the game that is designed to help you learn how to play the game. Some games neglect their tutorial level and it comes off feeling like a dry lecture. Go here. Push button. Repeat. However, I've also been completely awed by games that take their tutorials to a completely different level. Games like The Elder Scrolls and Guild Wars, for example. The experience is so seamlessly integrated with the "game" that you don't even realize you're playing a tutorial. You just play. By the time you've completed the tutorial, you're totally immersed in the game and know exactly what you need to do.

I used to think schools needed more games.

There's a certain authenticity to this learning that I never really experienced as a student. I thought if I could design the perfect "tutorial level" for math, then everything else would just fall into place. The students would have fun. They'd learn real mathematical concepts in a natural environment. They'd grow and develop as individuals and as a group. I'd be like a "math teacher" and "guild leader" all rolled into one (although I probably wouldn't run IWAY).

I used to think schools needed more games...

...and then I started teaching.

The problem is not that school doesn't have enough games, it's that school has too many games. Now, I'm not talking about the latest web apps: Kahoot, Quizizz, Manga High, etc. Those are certainly a type of game that has a place in school (although perhaps the number of apps is getting overabundant as well). I'm talking about the games that are school. School itself is like a "Live Action Role Playing Game". Everyone invents their character, acts out their role, cooperates with some players, competes with others, and is rewarded or punished in accordance with the game master's rules.

Now, school being an RPG isn't a problem on its own. The problem is that there are a whole bunch of mini-RPGs being played simultaneously, and all of them have conflicting rules. Here is a short list of a few games that might be going on at any given time:

  • The students are playing a game with each other. They compete with each other for social status while cooperating against outside threats to their system.
  • The teachers are playing a game with their students. The teachers are trying to maximize student learning while the students are trying to minimize the work they have to do.
  • The administrators are playing a game with their teachers. The administrators are trying to maximize test scores while minimizing teacher burn-out.
  • The school board is playing a game with their administrators. The school board is trying to maximize community approval while minimizing school funding.
  • The parents are playing a game with their school board. The parents are trying to maximize the quality of education while minimizing the amount of attention paid to local elections.

Within these games, temporary alliances are made to accomplish mutual goals. Teachers and parents might cooperate to get students in for extra tutoring. Administrators and school boards might cooperate for better community awareness. Sometimes these alliances help the system as a whole and sometimes they detract from it. It's one of the most complex network systems I've ever seen.

I used to think schools needed more games...

...and now I think schools need to have a closer look at the games that are already being played there.

In most of these games, competition is the dominant strategy. Students that are competing for limited scholarship funds have little incentive to help one another. Schools that receive funding based on standardized test scores have a very strong incentive to focus on instructional strategies that produce short-term results over long-term retention. School boards are underappreciated as a position of political power and tend to just "fly under the radar". Until we fix the reward systems so that they encourage cooperation, the games will continue to be frustratingly difficult for everyone involved.

We need to start meta-gaming school. We need to look at how the rules affect the relationships between players. We need to look at how those rules can be changed to encourage more cooperation and less competition between the parties involved. Until we have these conversations, we're never going to win.

Political Calculus

Disclosure:  This article is primarily mathematical in nature, but the very act of discussing politics makes it difficult to fully remove bias.  I feel obligated to disclose that I'm a member of the Green Party.  While I'm neither a Republican nor a Democrat, I tend to lean to the north-west section of the Nolan chart.  However, I do intend to try my best to make this analysis as neutral as humanly possible.

During my regular social media browsing the other day, I came across two posts of interest.

The first was a statement from the Green Party of Virginia about why they are not endorsing Bernie Sanders ahead of the primary.  While I had expected this to be the case, there was a section of this statement that really caught my attention: "Whether individual Greens choose to vote for Sanders on March 1st is a choice that will depend on their own calculus of what is best for the country" (emphasis mine).

Since one of the co-chairs of the GPVA is a mathematician, I could reasonably assume that the reference to calculus was intended to mean exactly what it says. The problem is that the general population doesn't usually look at elections from this perspective.  People tend to vote based on gut feelings rather than mathematical analysis. For this reason, I disagree with the GPVA's decision. I feel that they have a responsibility to provide party members with information on how to maximize their influence on the election, and calculus isn't a strong point for most voters. If the GPVA refuses to take sides in the primary, then I feel obligated to do so in their place.

The second was a data visualization of how various primary candidates would fare against each other in a general election:

With "Super Tuesday" fast approaching, this was exactly the kind of information that I needed!  This effectively provides a payoff matrix for the primary candidates to which I can apply my "political calculus".

Fallout 4 - Come As You Are

While building up my settlements in Fallout 4, I noticed that there was a "Powered Speaker" and wondered what I could do with it.   Between the "Interval Switch", "Delayed On Switch" and the "Delayed Off Switch", I figured I had enough tools to make some music in my Coastal Cottage.  To start, I decided on remaking one of the first songs I learned on guitar: Nirvana's Come As You Are.

Despite my best efforts, it's still not quite perfect.  It seems like the timing on the switches isn't as exact as I needed it to be for this wiring system, and it seems like some switches will occasionally stay on despite having no power.   I found that the most reliable method was to alternate between two offset interval switches with a slight overlap to form a loop, and use a chain of delays to space out the individual notes.

Resource-wise, I found myself needing tons of copper and ceramic.  The most time consuming part was setting up the delays and notes via terminal.  Below is the diagram I created as a reference so I could set several things at once.

[Wiring diagram: Fallout 4 - Come As You Are]

I hope this will inspire some discussion on how to create music in Fallout 4.  If you put one together or have an idea on how to streamline the process, please share in the comments!

Why is a teacher like a video game developer?

No, this isn't a raven and writing desk riddle.   Teachers and game developers have more in common than you might think!

You need to assume that any instructions you give will be promptly ignored

The classroom is like a giant sandbox game.  You need to think of every conceivable action that might be taken by the player/student and ensure there are some appropriate consequences in place.  Preferably realistic ones too. You could go with the insta-death lava to restrict movement if you want, but expect some angry phone calls from parents.

For every hour you spend planning you wish you had three

Seriously.  The difference between a well-planned lesson and an improvised one is night and day.   Unfortunately, I don't think my principal will go for a 1:1 class period to prep period schedule...

For every hour you spend working, you spend another hour documenting what you did

It's called CYA: Cover. Your. WE DON'T SAY THAT AT SCHOOL!

Would it be out of line for me to start tracking student behaviors in Bugzilla?

You refer to 60+ hour work weeks as "normal"

Veterans of the video game industry are no strangers to "crunch time".   It's the unavoidable time period before the end of a project where "get it out the door" fever starts to set in.  The title ships, you briefly reflect on what worked and what didn't, then the next project starts and before you know it you're back in "crunch" mode.  Teachers refer to this cycle as "a school day".

There's a ton of little things you'd like to fix, you just don't have the time

These are things that were probably noticed by some highly caffeinated tester in a poorly lit basement somewhere, added to the bug database, and ultimately stamped with those three fateful letters: WNF.  Will. Not. Fix.  "Yes, there's a typo in question 4.  GET OVER IT."



RFC: Are geometric constructions still relevant?

Dear friends, fellow math teachers, game developers and artists.

I've got this little dilemma I'm wondering if you could help me with. You see, part of my geometry curriculum deals with compass and straightedge constructions. My colleagues have suggested that this is a topic we, not exactly skip, but... I dunno what the appropriate word here is... trim?  They argue that it's largely irrelevant for our students, is overly difficult, and represents a minimal component of the SOL test. And I don't think they're wrong. I haven't used a compass and straightedge since I left high school either.

However, something about these constructions strikes me as beautiful. Part of me thinks that's reason enough to include them, but it also got me thinking about more practical applications of them.   Where did I use them?  I used them making video games.  Video games build worlds out of "lines" and "spheres".  Beautiful worlds.

My question is this: do my 3D artist friends feel the same way?  Do you remember your compass and straightedge constructions?  Do you use them, or some derivation thereof, in your everyday work?  Are you glad to have learned them?  Or are the elementary constructions made so trivial in modern 3D modeling software that you don't even think about them?

Please comment and share.

And now for a taste of things to come...

It's been a while since my last post, but I'm still here!  A lot has happened in the past 6 months and I'm not trying to be neglectful of my blogging.  In an effort to give myself some added motivation, I'm going to try to outline some of the things I have planned for this blog.  By making this list public, hopefully I'll feel pressured to hold myself to it.  So, without further ado, here's what you have to look forward to in the months to come:

  • I'm currently working on a custom WordPress theme for this blog.  It's taking a bit longer than anticipated, but it's coming!
  • I've spent a good deal of time transitioning my courses to use OER materials.  I wanted to take some time to reflect on the transition from MyMathLab to MyOpenMath and what the future may hold.
  • I've experimented in the past with automating my course syllabus creation.  Now, I'm trying to take this one step further and generate an entire course.  I don't know how far I'm going to get with this, but would like to at least do an article about how I think LMSs could save instructors a great deal of time through dynamic course data.
  • It's been a year since my foray into local politics.  I'd like to take a look back on what happened since then.
  • Lately, I've been playing a bit of FPS as opposed to my usual RPGs.   It's given me ideas for some new metagaming posts.
  • Finally, and perhaps the biggest news, I've been offered a new job!  I don't want to give away too much yet, but let's just say there's potentially going to be a lot more math lessons here in the future!  I do, however, feel obliged to reiterate that this is my personal blog and the views expressed here do not reflect the positions of any of my employers: past, present or future.

Anyways, I hope there are some exciting things to come.  Thanks for reading!

The Future of AI: 13 year old Ukrainian boy Looking for Guild?

I recently finished reading Michio Kaku's The Future of the Mind and found it very thought provoking.  A combination of cutting-edge advances in psychology, artificial intelligence and physics, mixed together with numerous pop-culture references made for a very informative and inspiring read.  Many of the ideas presented seemed very reminiscent of the narratives in The Mind's I, but with a greater emphasis on the practicality of technological advances.  While I would no doubt recommend it to an interested reader, I don't exactly intend for this post to turn into a book review.  This is more of a personal reflection on some of my thoughts while reading it.

Defining Consciousness: Kaku vs Jaynes

My first point of intrigue begins with Kaku's definition of consciousness, which he calls the "space-time theory of consciousness":

Consciousness is the process of creating a model of the world using multiple feedback loops in various parameters (e.g., in temperature, space time, and relation to others), in order to accomplish a goal (e.g. find mates, food, shelter).

Consciousness is a notoriously difficult phenomenon to define, and this is as good a definition as any in the context of the discussion. What's interesting about this definition is that it begins with a very broad base and scales upward.  Under Kaku's definition, even a thermostat has consciousness -- although to the lowest possible degree.  In fact, he defines several levels of consciousness and units of measurement within those levels.  Our thermostat is at the lowest end of the scale, Level 0, as it has only a single feedback loop (temperature).  Level 0 also includes other systems with limited mobility but more feedback variables, such as plants.  Level 1 consciousness adds spatial awareness and reasoning, while Level 2 adds social behavior.  Level 3, finally, includes human consciousness:

Human consciousness is a specific form of consciousness that creates a model of the world and then simulates it in time, by evaluating the past to simulate the future. This requires mediating and evaluating many feedback loops in order to make a decision to achieve a goal.

This definition is much closer to conventional usage of the word "consciousness".  However, for me this definition seemed exceptionally similar to a very specific definition I'd seen before.  This contains all the essential components of Julian Jaynes' definition in The Origin of Consciousness!

Jaynes argued that the four defining characteristics of consciousness are (1) an analog “I”, (2) a metaphor “me”, (3) an inner narrative, and (4) an introspective mind-space.  The "analog 'I'" is similar to what Kaku describes as the brain's "CEO" -- the centralized sense of self that makes decisions about possible courses of action.  Jaynes' "introspective mind-space" is analogous to the "model of the world" in Kaku's definition -- our comprehensive understanding of the environment around us.  The "metaphor 'me'" is the representation of oneself within that world model that provides the "feedback loop" about the costs and benefits of hypothetical actions.  Finally, what Jaynes describes as the "inner narrative" serves as the simulation in Kaku's model.

This final point is the primary difference between the two models.  One of the possible shortcomings of Jaynes' definition is that the notion of an "inner narrative" is too dependent on language.  However, Kaku avoids this confusion by using the term "simulation".  Jaynes' hypothesis was that language provided humanity with the mental constructs needed to simulate the future in a mental model.  I think the differences in language are understandable given the respective contexts.  Jaynes was attempting to establish a theory of how consciousness developed, while Kaku was attempting to summarize the model of consciousness that has emerged through brain imaging technology.

While I must admit some disappointment that Jaynes was not mentioned by name, it's partly understandable.  Jaynes' theory is still highly controversial and not yet widely accepted in the scientific community.  With Kaku's emphasis on scientific advances, it might have been inappropriate for this book.  Nevertheless, I'd be interested to hear Kaku's thoughts on Jaynes' theory after having written this book.  Jaynes didn't have the luxuries of modern neuroscience at his disposal, but that only makes the predictions of the theory more fascinating.

Artificial Intelligence (or the illusion thereof)

While I continued to read on, I happened to come across a news story proclaiming that the Turing Test had been passed.  Now, there are a couple of caveats to this claim.  For one, this is not the first time a computer has successfully duped people into thinking it was human.  Programs like ELIZA and ALICE have paved the way for more sophisticated chatterbots over the years.  What makes this new bot, Eugene, so interesting is the way in which it confused the judges.

There's plenty of room for debate about the technical merits of Eugene's programming.  However, I do think Eugene's success is a marvel of social engineering.  By introducing itself as a "13-year old Ukrainian boy", the bot effectively lowers the standard for acceptable conversation.  The bot is (1) pretending to be a child and (2) pretending to be a non-native speaker.  Childhood naivety excuses a lack of knowledge about the world, while the secondary language excuses a lack of grammar.   Together, these two conditions provide a cover for the most common shortcomings of chatterbots.

With Kaku's definition of consciousness in mind, I started to think more about the Turing Test and what it was really measuring.  Here we have a "Level 0" consciousness pretending to be a "Level 3" consciousness by exploiting the social behaviors typical of a "Level 2" consciousness.  I think it would be quite a stretch to label Eugene a "Level 3" consciousness, but does his social manipulation ability sufficiently demonstrate "Level 2" consciousness? I'm not really sure.

Before we can even answer that, Kaku's model of consciousness poses an even more fundamental question.  Is it possible to obtain "Level (n)" consciousness without obtaining "Level (n-1)"?

If yes, then maybe these levels aren't really levels at all.  Maybe one's "consciousness" isn't a scalar value, but a vector rating of each type of consciousness.  A human would score reasonably high in all four categories, while Eugene scores high on Level 0, moderate on Level 2, and poor on Levels 1 and 3.

If no, then maybe the flaw in A.I. development is that we're attempting to develop social skills before spatial skills.  This is partly due to the structure of the Turing Test.  Perhaps, like the Jaynesian definition of consciousness, we're focused a bit too much on language.  Maybe it's time to replace the Turing Test with something a little more robust that takes multiple levels of consciousness into consideration.

The MMORPG-Turing Test

Lately I've been playing a bit of Wildstar.  Like many popular MMORPGs, one of the significant problems facing the newly launched title is rampant botting.   As games of this genre have grown in popularity, the virtual in-game currency becomes a commodity with real-world value.  The time-consuming process behind the collection of in-game currency, or gold farming, provides ample motivation for sellers to automate the process using a computer program.  Developers like Carbine are in a constant arms race to keep these bots out of the game to preserve the game experience for human players.

Most of these bots are incredibly simple.  Many of them simply play back a pre-recorded set of keystrokes to the application.  More sophisticated bots might read, and potentially alter, the game program's memory to perform more complicated actions.  Oftentimes these bots double as an advertising platform for the gold seller, spamming the in-game communication channels with the seller's website.  It's also quite common for these websites to contain key-loggers, as hijacking an existing player's account is far more profitable than botting itself.

While I'm annoyed by bots as much as the next player, I must admit some level of intrigue with the phenomenon.  The MMORPG environment is essentially a Turing Test at an epic scale.  Not only is the player base of the game on constant lookout for bot-like behavior, but the developers also implement algorithms for detecting bots.  A successful AI would need to deceive not only humans, but other programs as well.  It makes me wonder how sophisticated a program would need to be for the bot to be indistinguishable from a human player.   The odds are probably stacked against such a program.

Having played games of this type for quite some time, I've played with gamers who are non-native speakers or children, and I've also seen my share of bots.  While the "13 year old Ukrainian boy" ploy might work in a text-based chat, I think it would be much more difficult to pull off in an online game.  It's difficult to articulate, but human players just move differently.  They react to changes in the environment in a way that is fluid and dynamic.  On the surface, they display a certain degree of unpredictability while also revealing high-level patterns.  Human players also show goal-oriented behavior, but the goal of the player may not necessarily align with the goal of the game. It's these types of qualities that I would expect to see from a "Level 1" consciousness.

Furthermore, online games have a complex social structure.  Players have friends, guilds, and random acquaintances.  Humans tend to interact differently depending on the nature of this relation.  In contrast, a typical chatterbot treats everyone it interacts with the same.  While some groups of players have very lax standards for who they play with, other groups hold very high standards for player ability and/or sociability.  Eugene would have a very difficult time getting into an end-game raiding guild.  If a bot could successfully infiltrate such a group, without their knowledge, it might qualify as a "Level 2" consciousness.

When we get to "Level 3" consciousness, that's where things get tricky.  The bot would not only need to understand the game world well enough to simulate the future, but it would also need to be able to communicate those predictions to the social group.  It is, after all, a cooperative game and that communication is necessary to predict the behavior of other players.  The bot also needs to be careful not to predict the future too well.  It's entirely possible for a bot to exhibit super-human behavior and consequently give itself away.

With those conditions for the various levels of consciousness, MMORPGs also enforce a form of natural selection on bots.  Behave too predictably?  Banned by bot detection algorithms.  Fail to fool human players?  Blacklisted by the community.  Wildstar also potentially offers an additional survival resource in the form of C.R.E.D.D., which could require the bot to make sufficient in-game funds to continue its subscription (and consequently, its survival).

Now, I'm not encouraging programmers to start making Wildstar bots.  It's against the Terms of Service and I really don't want to have to deal with any more than are already there.  However, I do think that an MMORPG-like environment offers a far more robust test of artificial intelligence than a simple text-based chat if we're looking at consciousness using Kaku's definition.   Perhaps in the future, a game where players and bots can play side-by-side will exist for this purpose.


When I first started reading Kaku's Future of the Mind, I felt like his definition of consciousness was merely a summary of the existing literature.  As I continued reading, the depth of his definition seemed to continually grow on me.  In the end, I think that it might actually offer some testable hypotheses for furthering AI development.  I still think Kaku needs to read Jaynes' work if he hasn't already, but I also think he's demonstrated that there's room for improvement in that definition.   Kaku certainly managed to stimulate my curiosity, and I call that a successful book.


In a previous post, I mentioned my fascination with Twitch Plays Pokemon (TPP). The reason behind this stems from the many layers of metagaming that take place in TPP. As I discussed in my previous post, the most basic definition of metagaming is "using resources outside the game to improve the outcome within the game". However, there's another definition of metagaming that has grown in usage thanks to Hofstadter: "a game about a game". This reflexive definition of metagaming is where the complexity of TPP begins to shine. Let's take a stroll through some of the types of metagaming taking place in TPP.

Outside resources

At the base level, we have players making use of a variety of outside resources to improve their performance inside the game. For Pokemon, the most useful resources might include maps, bestiaries, and Pokemon-type matchups. In TPP, many players also bring with them their own previous experiences with the game.

Game about a game

Pokemon itself is a metagame. Within the world of the game, the Pokemon League is its own game within the game. A Pokemon player is playing the role of a character who is taking part in a game tournament. What makes TPP so interesting is that it adds a game outside the game. Players in TPP can cooperate or compete for control of the game character. In effect, TPP is a meta-metagame: a game about a game about a game. Players in TPP are controlling the actions of a game character participating in a game tournament. It's Pokemon-ception!

Gaming the population

Another use of metagaming is to take knowledge of trends in player behavior and use that information to improve the outcome within the game. In TPP, players would use social media sites such as Reddit to encourage the spread of certain strategies. Knowledge of social patterns in the general population of TPP players enables a few players to guide the strategy of the collective in a desirable direction. Memes like "up over down" bring structure to an otherwise chaotic system and quickly become the dominant strategy.

Gaming the rules

One of my favorite pastimes is theory-crafting, which is itself a form of metagaming. Here, we take the rules of the game and treat the search for optimal strategies as a game in its own right. The method TPP used in the final boss fight is a perfect example. The boss is programmed to select a move type that the player's pokemon is weak against, and one of those moves happens to deal no damage. By using a pokemon that is weak against that particular move, the players lock the boss into a strategy that will never do any damage! Not only did the TPP players turn the rules of the game against it, they also had to organize the population to pull it off!

Rule modification games

One of the defining characteristics of a game is its rules. The rules of Pokemon are well defined by the game's code, but the rules of TPP are malleable. We can choose between "chaos" and "democracy". Under chaos, every player input gets sent to the game. Under democracy, players vote on the next action to send. When we look at the selection of rules as a game in which we try to maximize viewers/participants, we get yet another type of metagaming.

Understanding Voter Regret

Lately I've been doing a little bit of research on voting methods.  In particular, I've been fascinated by the idea of measuring Bayesian Regret.  Unfortunately, many of the supporting links are dead.  With a little detective work, I managed to track down the original study and the supporting source code.

Looking at this information critically, one of my concerns was the potential for bias in the study.  This is the only study I could find taking this approach, and the information is hosted on a site dedicated to supporting the method the study found most effective.  This doesn't necessarily mean the result is flawed, but it's one of the "red flags" I look for in research.  With that in mind, I did what any skeptic should: I attempted to replicate the results.

Rather than simply use the provided source code, I started writing my own simulation from scratch.  I still have some bugs to work out before I release my code, but the experience has been very educational so far.  I think I've learned more about these voting methods by fixing bugs in my code than by reading the original study.  My initial results seem consistent with Warren Smith's study, but there are still some kinks I need to work out.

What I'd like to do in this post is go over a sample election that came up while I was debugging my program.  I'm hoping to accomplish a couple of things by doing so.  First, I'd like to explain in plain English exactly what the simulation is doing.  The original study seems to be written with mathematicians in mind, and I'd like these results to be accessible to a wider audience.  Second, I'd like to outline some of the problems I ran into while implementing the simulation.  Reflecting on what I've done so far can only help, and perhaps a reader out there will be able to provide input that points me in the right direction.

Pizza Night at the Election House

It's Friday night in the Election household, and that means pizza night!  This family of 5 takes a democratic approach to their pizza selection and votes on what type of pizza they should order.  They all agree that they should get to vote on the pizza.  The only problem is that they can't quite agree on how to vote.  For the next 3 weeks, they've decided to try out 3 different election systems: Plurality, Instant-Runoff, and Score Voting.

Week 1: Plurality Voting

The first week they use Plurality Voting.  Everyone writes down their favorite pizza, and whichever pizza gets the most votes wins.

The youngest child votes for cheese.  The middle child votes for veggie.  The oldest votes for pepperoni.  Mom votes for veggie, while dad votes for hawaiian.

With two votes, veggie pizza is declared the winner.

Mom and the middle child are quite happy with this result.  Dad and the two others aren't too excited about it.  Because the 3 of them were split on their favorites, the vote went to an option that none of them really liked.  They feel hopeful that things will improve next week.
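For the curious, the week-1 tally is simple enough to sketch in a few lines of Python (this is just an illustration of the rule, not the code from my simulation):

```python
# Plurality: each voter names a single favorite; most votes wins.
from collections import Counter

def plurality_winner(ballots):
    """Return the option with the most first-choice votes."""
    return Counter(ballots).most_common(1)[0][0]

# The family's week-1 ballots, in the order they voted in the story.
ballots = ["cheese", "veggie", "pepperoni", "veggie", "hawaiian"]
print(plurality_winner(ballots))  # -> veggie
```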

Week 2: Instant Run-off Voting

The second week they use Instant Run-off Voting.  Since the last election narrowed the pizzas down to four options, everyone lists those four pizzas in order of preference.

The youngest doesn't really like veggie pizza, but absolutely hates pineapple.  Ranks cheese 1st, pepperoni 2nd, veggie 3rd, and hawaiian last.

The middle child is a vegetarian.  Both the hawaiian and pepperoni are bad options, but at least the hawaiian has pineapple and onions left over after picking off the ham. Ranks veggie 1st, cheese 2nd, hawaiian 3rd and pepperoni last.

The oldest child moderately likes all of them, but prefers fewer veggies on the pizza.  Ranks pepperoni 1st, cheese 2nd, hawaiian 3rd and veggie last.

Dad, too, moderately likes all of them, but prefers the options with meat and slightly prefers cheese to veggie.  Ranks hawaiian 1st, pepperoni 2nd, cheese 3rd and veggie last.

Mom doesn't like meat on the pizza as much as Dad, but doesn't avoid it entirely like the middle child.  Ranks veggie 1st, cheese 2nd, pepperoni 3rd and hawaiian last.

Adding up the first place votes gives the same result as the first election: 2 for veggie, 1 for hawaiian, 1 for pepperoni and 1 for cheese.  However, under IRV the votes for the last place pizza get transferred to the next ranked pizza on the ballot.

There's something of a problem here, though: a 3-way tie for last place!

A fight nearly breaks out in the Election house.  Neither dad, the oldest, nor the youngest wants their favorite to be eliminated.  The outcome of the election hinges on whose votes get transferred where!

Eventually mom steps in and tries to calm things down.  Since the oldest prefers cheese to hawaiian and the youngest prefers pepperoni to hawaiian, it makes sense that dad's vote for hawaiian should be the one eliminated.  Since the kids agree with mom's assessment, dad decides to go along and have his vote transferred to pepperoni.

Now the score is 2 votes for veggie, 2 votes for pepperoni, and 1 vote for cheese.  Since cheese is now the lowest, the youngest child's vote gets transferred to the next choice: pepperoni.  At 3 votes to 2, pepperoni has a majority and is declared the winner.

The middle child is kind of upset by this result because it means she'll need to pick all the meat off her pizza before eating.  Mom's not exactly happy with it either, but is more concerned about all the fighting.  They both hope that next week's election will go better.
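Here's a rough Python sketch of the IRV count (again, an illustration, not my simulation code).  Note that this version breaks the 3-way tie by ballot order, so it eliminates cheese first rather than hawaiian, yet it still arrives at pepperoni:

```python
# Instant Run-off: repeatedly drop the option with the fewest first-place
# votes and transfer those ballots to their next surviving choice.
from collections import Counter

def irv_winner(ballots):
    """ballots: lists of options, most preferred first."""
    ballots = [list(b) for b in ballots]
    while True:
        tally = Counter(b[0] for b in ballots)
        leader, votes = tally.most_common(1)[0]
        if votes * 2 > sum(tally.values()):     # majority reached
            return leader
        # fewest first-place votes; ties broken by insertion order here
        loser = min(tally, key=tally.get)
        ballots = [[c for c in b if c != loser] for b in ballots]

ballots = [
    ["cheese", "pepperoni", "veggie", "hawaiian"],   # youngest
    ["veggie", "cheese", "hawaiian", "pepperoni"],   # middle
    ["pepperoni", "cheese", "hawaiian", "veggie"],   # oldest
    ["hawaiian", "pepperoni", "cheese", "veggie"],   # dad
    ["veggie", "cheese", "pepperoni", "hawaiian"],   # mom
]
print(irv_winner(ballots))  # -> pepperoni
```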

Week 3: Score Voting

The third week the Election family goes with Score Voting.  Each family member assigns a score from 0 to 99 for each pizza.  The pizza with the highest score is declared the winner.  Everyone wants to give his/her favorite the highest score and least favorite the lowest, while putting the other options somewhere in between. Here's how they each vote:

The youngest rates cheese 99, hawaiian 0, veggie 33 and pepperoni 96.

The middle child rates cheese 89, hawaiian 12, veggie 99 and pepperoni 0.

The oldest child rates cheese 65, hawaiian 36, veggie 0 and pepperoni 99.

Dad rates cheese 13, hawaiian 99, veggie 0 and pepperoni 55.

Mom rates cheese 80, hawaiian 0, veggie 99 and pepperoni 40.

Adding all these scores up, the final tally is 346 for cheese, 147 for hawaiian, 231 for veggie and 290 for pepperoni.  Cheese is declared the winner.  Some of them are happier than others, but everyone's pretty much okay with cheese pizza.
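And a sketch of the week-3 score tally, for completeness:

```python
# Score voting: sum each option's scores across all ballots; highest wins.
def score_winner(score_ballots):
    totals = {}
    for ballot in score_ballots:
        for option, score in ballot.items():
            totals[option] = totals.get(option, 0) + score
    return max(totals, key=totals.get), totals

ballots = [
    {"cheese": 99, "hawaiian": 0,  "veggie": 33, "pepperoni": 96},  # youngest
    {"cheese": 89, "hawaiian": 12, "veggie": 99, "pepperoni": 0},   # middle
    {"cheese": 65, "hawaiian": 36, "veggie": 0,  "pepperoni": 99},  # oldest
    {"cheese": 13, "hawaiian": 99, "veggie": 0,  "pepperoni": 55},  # dad
    {"cheese": 80, "hawaiian": 0,  "veggie": 99, "pepperoni": 40},  # mom
]
winner, totals = score_winner(ballots)
print(winner, totals)  # cheese wins with 346
```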

Comparing the Results

Three different election methods.  Three different winners.  How do we tell which election method is best?

This is where "Bayesian Regret" comes in.

With each of these 3 elections, we get more and more information about the voters.  The first week, we get their favorites.  The second week, we get an order of preference.  The third week, we get a magnitude of preference.  What if we could bypass the voting altogether and peek inside the voters' heads to see their true preferences?  For the family above, those preferences would look like this:

          cheese    hawaiian  veggie    pepperoni
youngest  99.92%     2.08%    34.25%    95.79%
middle    65.95%    10.09%    73.94%     0.61%
oldest    74.55%    66.76%    57.30%    83.91%
dad       52.13%    77.03%    48.25%    64.16%
mom       87.86%    39.79%    99.72%    63.94%

These values are the relative "happiness levels" of each option for each voter.  It might help to visualize this with a graph.


If we had this data, we could figure out which option produced the highest overall happiness.  Adding up these "happiness" units, we get 380 for cheese, 195 for hawaiian, 313 for veggie and 308 for pepperoni.  This means the option that produces the most family happiness is the cheese pizza.  The difference between the max happiness and the outcome of the election gives us our "regret" for that election.  In this case: the plurality election has a regret of 67, the IRV election has a regret of 72, and the score voting election has a regret of 0 (since it chose the best possible outcome).
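The regret calculation above is easy to reproduce in Python from the happiness table (a toy illustration of the measure, not my simulation code):

```python
# Bayesian Regret for one election: the total utility of the best option
# minus the total utility of the option a method actually elected.
# Utilities are the family's "happiness" percentages from the table above.
utilities = {
    "cheese":    [99.92, 65.95, 74.55, 52.13, 87.86],
    "hawaiian":  [ 2.08, 10.09, 66.76, 77.03, 39.79],
    "veggie":    [34.25, 73.94, 57.30, 48.25, 99.72],
    "pepperoni": [95.79,  0.61, 83.91, 64.16, 63.94],
}
totals = {option: sum(vals) for option, vals in utilities.items()}
best = max(totals.values())          # cheese, ~380 happiness units

def regret(elected):
    return best - totals[elected]

print(round(regret("veggie")))       # plurality winner -> 67
print(round(regret("pepperoni")))    # IRV winner -> 72
print(round(regret("cheese")))       # score winner -> 0
```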

Now keep in mind that this is only the regret for this particular family's pizza selection.  To make a broader statement about which election method is the best, we need to look at all possible voter preferences.  This is where our computer simulation comes in.  We randomly assign a number for each voter's preference for each candidate, run the elections, calculate the regret, then repeat this process over and over to average the results together.  This will give us an approximation of how much regret will be caused by choosing a particular voting system.
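The core of that simulation loop looks something like this (a deliberately simplified toy version with honest plurality voters, not my full implementation):

```python
# Monte Carlo estimate of Bayesian Regret: draw random utilities, run the
# election method, and average (best social utility - elected's utility).
import random

def honest_plurality(utils):
    """Each voter votes for their highest-utility candidate (by index)."""
    votes = [u.index(max(u)) for u in utils]
    return max(set(votes), key=votes.count)   # ties broken arbitrarily

def simulate(method, voters=5, candidates=4, trials=2000, seed=1):
    rng = random.Random(seed)
    total_regret = 0.0
    for _ in range(trials):
        # one random utility per voter/candidate pair
        utils = [[rng.random() for _ in range(candidates)]
                 for _ in range(voters)]
        social = [sum(u[c] for u in utils) for c in range(candidates)]
        total_regret += max(social) - social[method(utils)]
    return total_regret / trials

print(simulate(honest_plurality))
```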

Open Questions

In writing my simulation from scratch, I've run into a number of interesting problems.  These aren't simply programming errors, but rather conceptual differences between my expectations and the implementation.  Some of these questions might be answerable through more research, but some of them might not have a clear-cut answer.  Reader input on these topics is most welcome.

Implementing IRV is complicated

Not unreasonably hard, but much more so than I had originally anticipated.  It seemed easy enough in theory: keep track of the candidates with the lowest number of votes and eliminate them one round at a time.  The problem I ran into was that in small elections, which I was using for debugging, there were frequently ties between low-ranked candidates in the first round (as in the story above).  In the event of a tie, my code would eliminate the candidate with the lower index first.  Since the order of the candidates was essentially random, this isn't necessarily an unfair method of elimination.  However, it did cause some ugly-looking elections where an otherwise "well qualified" candidate was eliminated early by nothing more than "bad luck".

This made me question how ties should be handled in IRV.  The sample elections my program produced showed that the order of elimination could have a large impact on the outcome.  In the election described above, my program actually eliminated "cheese" first.  Since the outcome was the same, it didn't really matter for this example.  However, if the random ordering of candidates had placed "pepperoni" first, then "cheese" would have won the election!  Looking at this probabilistically, the expected regret for this example would be 1/3*0+2/3*72 = 48.  A slight improvement, but the idea of non-determinism still feels out of place.

I started looking into some alternative methods of handling ties in IRV.  For a simulation like this, the random tie-breaker probably doesn't make a large difference.  With larger numbers of voters, ties get progressively less likely anyway.  However, I do think it could be interesting to compare the Bayesian Regret among a number of IRV variations to see if some tie-breaking mechanisms work better than others.

Bayesian Regret is a societal measure, not individual

When I first started putting together my simulation, I did so "blind".  I had a conceptual idea of what I was trying to measure, but was less concerned with the mathematical details.  As such, my first run produced some bizarre results.  I still saw a difference between the voting methods, but at a much different scale.  In larger elections, the difference between voting methods was closer to a factor of .001.  With a little bit of digging, and double-checking the mathematical formula for Bayesian Regret, I figured out what I did wrong.  My initial algorithm went something like this:

I took the difference between the utility of each voter's favorite and the candidate elected.  This gave me an "unhappiness" value for each voter.  I averaged the unhappiness of all the voters to find the average unhappiness caused by the election.  I then repeated this over randomized elections and kept a running average of the average unhappiness caused by each voting method.  For the sample election above, voters are about 11% unhappy with cheese versus 24% or 25% unhappy with veggie and pepperoni respectively.

I found this "mistake" rather intriguing.  For one thing, it produced a result that kind of made sense intuitively.  Voters were somewhat "unhappy" no matter which election system was used.  Even more intriguing was that if I rescaled the results of an individual election, I found that they were distributed in close to the same proportions as the results I was trying to replicate.  In fact, if I normalized the results from both methods, i.e.  R' = (R-MIN)/(MAX-MIN), then they'd line up exactly.

This has become something of a dilemma.  Bayesian Regret measures exactly what it says it does -- the difference between the best option for the society and the one chosen by a particular method.  However, it produces a result that is somewhat abstract.  On the other hand, my method produced something a little more tangible  -- "average unhappiness of individual voters" -- but makes it difficult to see the differences between methods over a large number of elections.  Averaging these unhappiness values over a large number of elections, the results seemed to converge around 33%.

Part of me wonders if the "normalized" regret value, which aligns between both models, might be a more appropriate measure.  In this world, it's not the absolute difference between the best candidate and the one elected that matters, but the difference relative to the worst candidate.  However, that measure doesn't really make sense in a world with the potential for write-in candidates.  I plan to do some more experimenting along these lines, but I think the method of how to measure "regret" is a very interesting question in itself.
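To make the idea concrete, here's the normalized regret applied to the family's pizza election, using the happiness totals from earlier (a sketch of the measure, not code from my simulation):

```python
# Normalized regret: 0 means the best possible option was elected,
# 1 means the worst possible option was elected.
# Totals are the summed happiness values from the family's table.
totals = {"cheese": 380.41, "hawaiian": 195.75,
          "veggie": 313.46, "pepperoni": 308.41}
hi, lo = max(totals.values()), min(totals.values())

def normalized_regret(elected):
    return (hi - totals[elected]) / (hi - lo)

print(round(normalized_regret("veggie"), 3))     # plurality winner
print(round(normalized_regret("pepperoni"), 3))  # IRV winner
print(round(normalized_regret("cheese"), 3))     # score winner -> 0.0
```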

"Honest" voting is more strategic than I thought

After correcting the aforementioned "bug", I ran into another troubling result.  I started getting values that aligned with Smith's results for IRV and Plurality, but the "Bayesian Regret" of Score Voting was coming up as zero.  Not just close to zero, but exactly zero.  I started going through my code and comparing it to Smith's methodology, when I realized what I did wrong.

In my first implementation of score voting, the voters were putting their internal utility values directly on the ballot.  This meant that the winner elected would always match up with the "magic best" winner.   Since the Bayesian Regret is the difference between the elected candidate and the "magic best", it was always zero.   I hadn't noticed this earlier because my first method for measuring "unhappiness" returned a non-zero value in every case -- there was always somebody unhappy no matter who was elected.

Eventually I found the difference.  In Smith's simulation, even the "honest" voters were using a very simple strategy: giving a max score to the best and a min score to the worst.  The reason the Bayesian Regret for Score Voting is non-zero is the scaling of scores between the best and worst candidates.  If a voter strongly supports one candidate and opposes another, this scaling doesn't make much of a difference.  It does, however, make a big difference when a voter is nearly indifferent between the candidates but still gives a large score differential to the candidate that's slightly better than the rest.
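Here's a sketch of that simple "honest" strategy (the function name and values are my own illustration):

```python
# "Honest" score ballot in Smith's sense: linearly rescale a voter's
# internal utilities so the favorite gets the max score and the least
# favorite gets the min score.
def honest_score_ballot(utilities, max_score=99):
    hi, lo = max(utilities.values()), min(utilities.values())
    if hi == lo:                      # truly indifferent voter
        return {c: 0 for c in utilities}
    return {c: round(max_score * (u - lo) / (hi - lo))
            for c, u in utilities.items()}

# A nearly indifferent voter still produces a huge score differential:
print(honest_score_ballot({"cheese": 0.51, "hawaiian": 0.48, "veggie": 0.50}))
# -> {'cheese': 99, 'hawaiian': 0, 'veggie': 66}
```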

With this observation, it became absolutely clear why Score Voting minimizes Bayesian Regret.  The more honest the voters are, the closer the Bayesian Regret gets to zero.  This raises another question: how much dishonesty can the system tolerate?

Measuring strategic vulnerability

One of the reasons for trying to reproduce this result was to experiment with additional voting strategies outside the scope of the original study.  Wikipedia cites another study, by M. Balinski and R. Laraki, suggesting that Score Voting is more susceptible to tactical voting than the alternatives.  However, those authors too may be biased towards their proposed method.  I think it's worthwhile to try to replicate that result as well.  The issue is that I'm not sure what the appropriate way to measure "strategic vulnerability" would even be.

Measuring the Bayesian Regret of strategic voters and comparing it with honest voters could potentially be a starting point.   The problem is how to normalize the difference.   With Smith's own results, the Bayesian Regret of Score Voting increases by 639% by using more complicated voting strategies while Plurality only increases by 188%.  The problem with comparing them this way is that the Bayesian Regret of the strategic voters in Score Voting is still lower than the Bayesian Regret of honest Plurality voters.   Looking only at the relative increase in Bayesian Regret isn't a fair comparison.

Is there a better way of measuring "strategic vulnerability"?  Bayesian Regret only measures the difference from the "best case scenario".  The very nature of strategic voting is that it shifts the result away from the optimal solution.  I think that measuring the effects of voting strategy requires some way of taking the "worst case scenario" into consideration as well.  The normalized regret I discussed above might be a step in the right direction.  Any input on this would be appreciated.


Please don't take anything said here as gospel.  This is a blog post, not a peer-reviewed journal.  This is my own personal learning endeavor and I could easily be wrong about many things.  I fully accept that and will hopefully learn from those mistakes.   If in doubt, experiment independently!

Update: The source code used in this article is available here.