[GMS] Dev. Blog: Server Maintenances

Hello again Maplers!

Let’s talk about server maintenances. Server maintenances, whether scheduled or unscheduled, happen fairly regularly, so I think it’s safe to assume that most Maplers know what “server maintenance” means: the game is unavailable and can’t be logged into. But have you ever wondered what we do during a server maintenance and why it’s necessary? If so, read on and I’ll explain!

Step One – Preparation

Server maintenance is an “all hands on deck” situation. It requires at least a server engineer, the developers, at least one producer, QA testers, and occasionally a database administrator. It also requires intense preparation and planning to keep the maintenance as short as possible.

First, the server engineer and the developers list all the tasks that must be performed and create an internal schedule. Next, the producer takes that schedule, adds time for testing, and adds some extra time in case there’s an emergency. Then the producer creates a “maintenance post” that goes up on the official website to let the players know a maintenance is coming.
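To make the scheduling arithmetic concrete, here is a minimal sketch of how a maintenance window could be added up. It is only an illustration, not our actual planning tool: the function name is invented, the task durations are rough figures borrowed from elsewhere in this post, and the 30-minute emergency buffer is a made-up number.

```python
from datetime import timedelta

def estimate_window(task_minutes, testing_minutes, buffer_minutes):
    """Announced window = engineering tasks + testing time + emergency buffer."""
    total = sum(task_minutes.values()) + testing_minutes + buffer_minutes
    return timedelta(minutes=total)

# Illustrative durations in minutes (assumptions, not an official schedule).
tasks = {
    "shut down servers": 60,
    "update server code": 5,
    "routine database work": 90,
    "restart servers": 60,
}

# Testing is budgeted at twice the roughly 30 minutes it usually takes.
window = estimate_window(tasks, testing_minutes=60, buffer_minutes=30)
print(window)  # 5:05:00
```

The point is simply that the posted maintenance time is longer than the engineering work alone, because testing and a safety margin are stacked on top.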

While this is going on, the developer is delivering all of the new files and a list of changes to the server engineer and the producer. The server engineer must check the files to ensure they contain no errors. Then they’re loaded onto our internal testing environment. That’s where the QA testers come in. They do some basic testing to ensure that the changes cause no serious issues. Once they’re satisfied, they give the server engineer the green light. This all happens in the two days before the maintenance itself.

Step Two – Bring it Down

When the maintenance begins, the first thing we do is block the login port. This keeps anyone but us from logging into the game. Then we shut down the login servers. These are the servers that maintain the players’ connection to the game servers. Once these are down, all players are removed from the game, though the game worlds are still running. Finally, the game servers themselves are shut down. This takes over an hour because each game server has to run a number of “synchronization checks” to make sure the game database has an accurate record of the current state of all players.

These synchronization checks are actually a large part of the maintenance period and one of the major reasons maintenances are necessary. If the game servers aren’t shut down on occasion to run these checks, they can become unstable. The last servers brought down are the databases, and that only happens if we have to work on them. If there’s an unscheduled server maintenance, we try not to bring down the database servers if we can help it. This helps keep the maintenance as short as possible.

Step Three – Update

Once the servers are down, we start updating the server code. Server code changes might fix minor bugs, add internal data tracking tools, or adjust in-game content. This is actually the quickest part of maintenance, usually taking only about five minutes.

The really time-consuming changes come when we have to work on the database. Even routine database work like world transfers and name changes can take about an hour and a half on average, depending on the number of requests. Sometimes, though, more complicated changes are needed. The databases contain all the information for every character that exists in the game, and I do mean every character. That character you haven’t logged into for two years? It’s in there. That Cygnus Knight you played to level 30 and never touched again? It’s in there too. That’s a lot of ones and zeros to dig through, so any change takes some time. It gets worse if a query for a database change fails and must be started over. That can add hours to the downtime.

While the servers are down, we also usually update the Cash Shop. Updating the Cash Shop doesn’t actually require a server maintenance as it’s located on a separate set of servers. That allows us to update and maintain the Cash Shop without affecting the game servers if needed, but we try to schedule Cash Shop maintenances around server maintenances.

Step Four – Bring it Back Up

Once all the changes have been made, we can restart the servers. Bringing servers back online happens in the reverse order from bringing them down. First we bring up the databases, then the game servers, then the login servers. Bringing servers back up takes about an hour and is usually the easiest part of the maintenance. We rarely encounter errors or issues during this step.
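For the curious, the ordering described in Steps Two and Four can be sketched in a few lines. This is a simplified illustration under stated assumptions, not our real tooling: the tier names are just labels, and the function names are invented for this example.

```python
# Tier names are illustrative labels for the server groups described in this post.
SHUTDOWN_ORDER = ["login servers", "game servers", "database servers"]

def tiers_in_scope(include_databases):
    """Databases are only taken down when there is work to do on them."""
    return [t for t in SHUTDOWN_ORDER
            if include_databases or t != "database servers"]

def shutdown_plan(include_databases):
    # The login port is blocked first so that only staff can get in.
    return ["block login port"] + [f"stop {t}" for t in tiers_in_scope(include_databases)]

def startup_plan(include_databases):
    # Reverse of the shutdown order; the login port stays blocked for testing.
    return [f"start {t}" for t in reversed(tiers_in_scope(include_databases))]

print(shutdown_plan(include_databases=True))
print(startup_plan(include_databases=True))
```

Note that the startup plan deliberately does not reopen the login port. That only happens after the testing step.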

Step Five – Testing

Once all the servers are back up and running, we leave the login port blocked and enter the game to test it. We log into every channel in every world, test the Cash Shop and Maple Trading System, create and delete characters, and test new account creation. These tests are to ensure that nothing abnormal has happened to the servers. This testing takes about thirty minutes. Once it’s completed we can open the game to players.

This is often where delays happen, even though we schedule twice as long as we need for the testing itself. That’s because we have hundreds of servers, and sometimes one or more fail to restart properly. Sometimes there are network connection issues. This step allows us to see those problems before players experience them. When serious issues like these occur, we can bring the servers back down to fix them. Of course, this causes the maintenance to be extended, so it’s the one thing we never want to see during this step.
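The testing pass in this step boils down to "try everything, and only open the doors if nothing failed." Here is a minimal sketch of that idea. The world names, channel counts, and the check function are all hypothetical stand-ins; in reality the checks are done by people logging into the game.

```python
def smoke_test(worlds, check_channel):
    """Try every channel in every world; return the (world, channel) pairs that failed."""
    failures = []
    for world, channel_count in worlds.items():
        for channel in range(1, channel_count + 1):
            if not check_channel(world, channel):
                failures.append((world, channel))
    return failures

def next_action(failures):
    # Any failure means the servers go back down and the maintenance is extended.
    return "open login port" if not failures else "bring servers back down"

# Hypothetical worlds, plus a fake check in which one channel failed to restart.
worlds = {"World A": 3, "World B": 2}
broken = {("World B", 2)}
failures = smoke_test(worlds, lambda world, channel: (world, channel) not in broken)
print(failures, next_action(failures))
```

With hundreds of servers in play, a single entry in that failure list is enough to push back the reopening time.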

Step Six – Back to adventure!

Once the maintenance is over and the game is stable, the developers and server engineer go home to sleep. Once a server maintenance begins, no one working on it leaves until it’s complete, so some of us will have been up for over 24 hours. In the meantime, the producer and QA testers monitor the game, read the forums, and hang out in populated areas of the game to make sure players are not experiencing any problems. Assuming everything went smoothly, the maintenance is all done, and then the rest of us can go home to sleep!

I hope you enjoyed this little peek into what we do during server maintenances. Quite frankly, they’re not my favorite part of my job (I know you don’t like them much either), but they’re the unglamorous necessity that keeps MapleStory running. Hopefully now they’re less of a mystery.

Until next time Maplers,

Eurydice

The MapleStory Team

Posted on February 4, 2011, in Dev Blog, Global MapleStory, Nexon America. 2 Comments.

  1. TY! THIS FINALLY LETS US UNDERSTAND NEXON’S DOWNTIME!!!!

  2. DSdavidDS :
    TY! THIS FINALLY LETS US UNDERSTAND NEXON’S DOWNTIME!!!!

    agreed .. dev blogs are always fun .. now waiting for the next (the second) blog party update vid
