For the past several entries in my column I have covered the fundamentals of practical data modeling techniques. Each of the basic principles of Entity Relationship Modeling has been covered so far, including the idea that all data is relational. This month I will delve into more advanced normalization and de-normalization concepts, extending the Angry Shards game data model example.
The Angry Shards Game Database
In the previous article, I built the start of the Angry Shards game data model, a demonstration database used at CodeFutures*. The first few normalized tables – shown in Figure 1 – were built out, including the player, game, and player_game entities.
Figure 1: The Angry Shards game data model, as we left it last week.
I’m going to be adding some new cool features to the model, to support a richer game experience for the user – and of course to demonstrate some new data modeling techniques.
Adding New Features to a Data Model
Let’s add a new feature to the model: support for multiple rounds in the game. This way, as players get better at the game and advance levels, we can track how well they do in each round. To make this more interesting for the player and to encourage competition, let’s record the highest score the player achieves, as well as the highest level obtained.
For our initial attempt at this, we will add some columns to the player_game table. To support up to 3 rounds for each game, let’s add three new columns to the player_game table: game_round_score_1, game_round_score _2, and game_round_score_3. Each of these columns will hold the resulting score for that particular round of the game. Similarly, to record the highest level and score, we will add high_level and high_score columns to the player_game table.
The new model is shown in Figure 2.
Figure 2: The model with additional columns for high scores.
I’m sure you may have already noticed this is not the ideal approach, as it causes several problems and limitations in our data model:
- The game is limited to 3 levels, unless we continue adding a new column each time another level is designed in. This is a hassle, and makes the model very inflexible.
- It’s not easy to track any other information about each round of play without adding multiple columns for each piece of data needed. For example, if we want to know how long it took the player to play a particular round, this approach would acquire 3 additional columns: game_round_time_1, game_round_time_2, and game_round_time_3. Therefore, as we enrich the game experience, the model will become more and more unwieldy and cluttered – making it harder for the development team to understand as well.
- Recording the high_score and high_level on the player_game table is just plan incorrect , as it directly violates the rules of data model normalization. The reason? Because the high_score and high_level are not related to player_game, but rather belong with player table, since a player can have at most one and only one value for these attributes. Besides breaking theoretical rules, there is a far more practical side to this common error: each time you want to record the high_score and high_level values (at the end of each game), the application must update every player_game row for the player. This is incredibly inefficient as an active player could have 100s of player_game rows, resulting in a large number of updates to the database (and updates are almost always the least efficient operation in a production database).
Given that our objective is a highly flexible model that will support the life of the application and be as easy to work with as possible, we will apply some simple normalization techniques to fix it.
The first step is to move the high_score and high_level columns to the player table. This associates these columns with the table to which they are truly related, as they can only occur once and only once per player instance.
The next step is to normalize the repeating fields for recording the scores for each round or level. Since the actual relationship between a player_game to a round or level of play is one-to-many, we need to add a new table: player_game_round. We will add the following attributes to the table, resulting in the model shown in Figure 3:
- player_game_round_id: Sequential ID, matching our standard for other tables.
- status_code: The status of the round, a numeric code value representing in-progress, complete, abandoned, etc..
- round_play_time: The length of time it took the player to complete the round.
- round_score: The score that the player achieved for the round.
- round_level: The level of the game that the round was for.
- player_game_id: The foreign key to the immediate parent table (player_game). This allows us to express the relationship of a given player_game instance to its player_game_round instances.
Figure 3: The model, with player_game_round details moved to a separate table.
You can see that, after applying our normalization techniques, this version is much improved. Now we can flexibly add as many levels to the game as we like, and track various attributes about how a player does while playing those levels. In addition, we can track the high_score and high_level for each player – very useful information that we can provide to the end user.
It should be obvious that with a little work and thought, database normalization techniques can dramatically improve your data model, making it more flexible, efficient, and much easier to understand by your development group.
* The Angry Shards game, data model and graphics are copyright © 2012-2013 by CodeFutures Corporation and is used with its permission.