Diwali is a time of celebration, of sharing happiness, sweets, and gifts, and somewhere along the way, of remembering the return of Lord Ram to Ayodhya after 14 years in exile. It is India’s largest festival, sort of like what Christmas is in America, only a lot brighter, more colorful, and louder.
While Diwali is the day to celebrate, have fun, and spend time with your family, it is also when web traffic is at its lowest at BCMTouring and my blog. That made it the perfect time to schedule a long and much-needed server maintenance, which had been pending for a while.
Our server had been running CentOS 5.x, and this OS has reached the end of its life. We needed to upgrade the operating system to the latest CentOS 7.x, a task I envisioned would take around 12 hours, since it involved backing up the HDD, re-imaging it with the latest version of CentOS, installing and configuring the requisite software, and then restoring all the data and accounts to the freshly installed OS.
Since our server is located in the US and the server technicians are Americans, the only Indian guy working on Diwali eve would be me. Even I didn’t have to do much, except monitor the progress and ensure the services were configured as per our requirements.
At 10pm on 29th October, I closed the forum to visitors to ensure no database writes occurred during the backup process, and gave the data center technicians the green signal to begin the upgrade.
While they worked on setting up the backup drive, I went ahead and updated the plugins and forum software.
BCMTouring has a lot of data, over 160GB, and backing up all that data takes a lot of time, especially since our regular automated backup was running alongside the manual one.
Around 3am I finally decided to hit the bed, and then proceeded to wake up every hour to monitor the progress. Eventually, early in the morning, I received an email from the DC tech asking for permission to take the server offline and begin re-imaging the drive. That process took a little longer than expected, thanks to a change in shift. The second technician then began installing and configuring the web server software, which needed a little intervention from me to ensure everything was configured the way we needed it to be.
By the time the process of restoring the backup onto the newly installed OS began, almost 12 hours had passed since the maintenance started.
This is why I had decided to schedule this maintenance on Diwali eve. Thanks to the Diwali celebrations, most members were busy, and the others already knew that scheduled maintenance was in progress.
Around 3pm on 30th October, I received an email from the technician stating they were unable to restore BCMTouring from the backup they had taken earlier. Thankfully, our recent automated backup had the same data, and a third technician (the second shift had ended by then) took over and began restoring BCMTouring from that backup.
Around 10pm, data was finally restored and BCMTouring was up and running once again, after 24 hours of downtime.
Apart from this, I spent the day running around doing a few household chores, though thankfully my sister managed most of them. Diwali was celebrated with its usual fervor and ended with a great dinner, watching BCMT come back to life.
Of course, as always happens, no server maintenance is ever complete without a couple of issues afterward. Today at around 5:30am, while I was half asleep, I received an email from our server stating that the HDD we had replaced during the weekend had developed SMART errors!
Now the technicians are back on duty, syncing the data and prepping for the HDD replacement, hopefully later this evening, on what is likely to be another long day…