25.11.2007, 19:54
|  | The Architect | | Join Date: May 2005 Location: Zollikon, Switzerland
Posts: 3,182
Groaned at 3 Times in 3 Posts
Thanked 418 Times in 115 Posts
| | | Recent forum downtime
Update 09.12.2007: Photo system back online. Details here.
Update 06.12.2007: 500 missing attachments restored. Details here.
The last four days have seen the biggest outage in English Forum's history. Between early Thursday morning and late Sunday evening I know that many of you were unable to get your "fix". Your employers probably noticed that the amount of actual work you did shot up, and maybe you rediscovered some important parts of your personal lives with all that free time. I wish I could say the same about the last four days...
Before we talk about what happened and why, let's start with what we have right now. This is mostly good news.
- The forum has lost only three hours of messages. These hours were in the early morning of Thursday 22nd November, when most of you were asleep. So for practical purposes - no posts were lost.
- Almost everybody has their same avatar and/or profile picture. Some of you who had updated it recently will find that it doesn't display. You can easily fix this by uploading it again.
- Some forum functions such as the photo gallery do not work. They will be fixed over time. In the case of the photo gallery many of the pictures were lost.
- All files attached to posts were lost. You can read about how you can help fix this situation here.
So all in all, as far as forum disasters go, I'm quite happy with the result. A lot of other stuff has been lost, but it's all behind the scenes stuff to do with the smooth maintenance of the server. In my own personal case it represents a lot of time and effort down the drain. Hopefully I'll be faster with some stuff the second time around.
So what happened and why did it take so long to fix? I'll try to keep the technical information for the end, because I realise that most of you aren't interested, but some of you are. Early in the morning on Thursday I discovered that the server that hosts English Forum was down. This is my own server, which lives in a data center (raised floors, aircon, backup power and all that) and it also hosts many other domains, though as far as traffic and server load go, English Forum takes up most of the resources.
To cut a long story short, in the process of trying to resolve the situation (which was caused by virtual disc files expanding to the point where the system which carried them ran out of space) I lost the server. Totally. Toasted. I had to reinstall it from scratch and go to my backups.
So why did it take four days? Well that's where the fun began. Because I'm a cautious kind of fellow I have all the backups made to a completely separate (non virtual, physical) drive. This drive was not affected, but I quickly found that most of my backups were corrupted and unusable. To make matters worse, the magic files I needed which tie all the backups together were not there. Why? The crash happened in the middle of a backup. Why didn't I have other backups? Well I should have, but it turns out the script which was supposed to save old copies of backups had not been working (only the script which was supposed to stop the discs becoming full had been).
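For anyone wondering how the two jobs described above (keeping old copies, and not filling the disc) can live in one place, here is a minimal sketch. It is not the script that ran on this server; the function name `store_backup`, the `backup-<stamp>` naming and the `keep` parameter are all made up for illustration. The new copy is written under a temporary name and only renamed once complete, so a crash in the middle of a backup cannot leave a half-written file that looks finished, and old copies are pruned only after the new one is safely in place.

```python
import os
import shutil
import tempfile
from pathlib import Path


def store_backup(source: Path, backup_dir: Path, stamp: str, keep: int = 5) -> Path:
    """Copy `source` into `backup_dir` and keep only the newest `keep` copies.

    Hypothetical helper: `stamp` is assumed to sort chronologically as text,
    for example "20071122T0300".
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")
    backup_dir.mkdir(parents=True, exist_ok=True)
    final = backup_dir / f"backup-{stamp}{source.suffix}"

    # Write under a temporary name first, so an interrupted copy never
    # appears under a name that looks like a finished backup.
    fd, tmp_name = tempfile.mkstemp(dir=backup_dir, suffix=".partial")
    try:
        with os.fdopen(fd, "wb") as tmp, open(source, "rb") as src:
            shutil.copyfileobj(src, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())
        os.replace(tmp_name, final)  # rename within the same directory
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise

    # Prune only after the new copy exists. Temporary files do not match
    # the "backup-*" pattern, so they are never counted or deleted here.
    finished = sorted(backup_dir.glob("backup-*"))
    for old in finished[:-keep]:
        old.unlink()
    return final
```

Called once per backup run, for example `store_backup(Path("forum.sql"), Path("/mnt/backup/forum"), "20071122T0300", keep=24)`, this bounds the space used while always leaving several older copies to fall back on. The paths in that call are placeholders.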
Some of you work in the IT industry, and some of you may have written disaster recovery plans. I've written a few in my time, but they are very seldom tested and are often a work of fiction. Anyone who has been through this themselves will tell you that a backup is worthless unless you actually try to restore it. But I digress.
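To make "actually try to restore it" concrete, here is a small sketch of an automated restore test. It assumes the backup is a plain zip archive of a directory, which is an assumption for the example and not how the backups on this server were made; `make_backup` and `restore_test` are hypothetical names. The idea is simply to unpack the archive into a scratch directory and compare every file against the original.

```python
import hashlib
import tempfile
import zipfile
import zlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def make_backup(source_dir: Path, archive: Path) -> None:
    """Zip every file under source_dir (archive must live outside it)."""
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(source_dir.rglob("*")):
            if path.is_file():
                zf.write(path, path.relative_to(source_dir).as_posix())


def restore_test(archive: Path, source_dir: Path) -> list[str]:
    """Restore into a scratch directory and list problems (empty means OK)."""
    problems = []
    with tempfile.TemporaryDirectory() as scratch:
        try:
            with zipfile.ZipFile(archive) as zf:
                zf.extractall(scratch)
        except (zipfile.BadZipFile, zlib.error, OSError, EOFError) as exc:
            return [f"archive unreadable: {exc}"]
        for path in sorted(source_dir.rglob("*")):
            if not path.is_file():
                continue
            rel = path.relative_to(source_dir)
            restored = Path(scratch) / rel
            if not restored.is_file():
                problems.append(f"missing: {rel.as_posix()}")
            elif file_digest(restored) != file_digest(path):
                problems.append(f"differs: {rel.as_posix()}")
    return problems
```

Run right after each backup, an empty list means the archive could be unpacked and matched the source at that moment. It says nothing about whether the drive will still return the same bytes a month later, which is why a periodic re-check is also worth having.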
Faced with the reality of the situation I had to work with what I had. I had to manually cobble together the 60 or so other domains on the server and restore what I could from various collections of older backups. It also took time to deal with all the unhappy customers (all were really great and understanding which helped a lot). Since English Forum would kill an unoptimised server (due to the high load), I couldn't even start to put it in place until a lot of other groundwork was in place and the other sites were up and running. All in all, considering the amount of corruption on the backup drive, pretty much everyone got almost everything back.
We have no loss of message content because I back the English Forum database up every hour. Call me paranoid, but with the number of posts I knew that in a disaster situation losing posts is bad for a forum. Unfortunately everything else around the forum was a little different. I did have backups, but most were unusable. The reason we lost a couple of hours was that the first two database backups were corrupt, but the third was ok.
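The "first two corrupt, third ok" search can be automated. The sketch below assumes the hourly dumps are gzip-compressed files named so that they sort by time (for example `forum-2007112203.sql.gz`); both the naming and the use of gzip are assumptions for the example, not a description of the real setup. It walks from newest to oldest and returns the first file that decompresses cleanly, relying on gzip's built-in CRC check.

```python
import gzip
import zlib
from pathlib import Path
from typing import Optional


def gzip_is_intact(path: Path) -> bool:
    """Read the whole file; gzip raises if the data or its CRC is damaged."""
    try:
        with gzip.open(path, "rb") as f:
            while f.read(1 << 20):
                pass
        return True
    except (OSError, EOFError, zlib.error):
        return False


def newest_intact_backup(backup_dir: Path) -> Optional[Path]:
    """Return the most recent dump that passes the check, or None."""
    # Assumes file names sort chronologically, newest last.
    for candidate in sorted(backup_dir.glob("*.sql.gz"), reverse=True):
        if gzip_is_intact(candidate):
            return candidate
    return None
```

A dump that decompresses cleanly is not guaranteed to load into the database, so this only narrows down which file to try restoring first.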
Why were my backups corrupted? I don't know. Could have been a physical problem with the disc, or the operating system may have done it. If you have a file of several hundred megabytes, it only takes a single byte of corruption to destroy it. The larger the files, the higher the failure rate. My backup drive is less than a year old, and I'd never seen any problems with it. Suffice to say, I'll be checking the integrity of the backups more thoroughly from now on.
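One common way to check integrity, sketched here under the assumption that each backup is a single file, is to store a SHA-256 checksum next to it when it is written and recompute it later. The function names and the `.sha256` sidecar naming are invented for the example. A single changed byte produces a different digest, so silent corruption shows up long before a restore is needed.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def write_checksum(path: Path) -> Path:
    """Record the digest in a sidecar file such as backup.tar.sha256."""
    sidecar = path.with_name(path.name + ".sha256")
    sidecar.write_text(f"{sha256_of(path)}  {path.name}\n")
    return sidecar


def verify_checksum(path: Path) -> bool:
    """True only if the sidecar exists and matches the file's current digest."""
    sidecar = path.with_name(path.name + ".sha256")
    if not sidecar.is_file():
        return False
    parts = sidecar.read_text().split()
    return bool(parts) and parts[0] == sha256_of(path)
```

Calling `write_checksum` at backup time and `verify_checksum` from a nightly job would flag a damaged file within a day. It cannot repair anything, so it only helps in combination with keeping more than one copy.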
Unfortunately I still have much work to do, so you won't see me on the forum very often in the next couple of weeks, but rest assured I'll be thinking of ways to avoid a repeat of this incident. Anyone who has worked in this field knows it's the sort of thing that can keep you awake at night.
I'd like to thank Lob Rockster, gregv and swissbob who helped me out with a few things during the recovery. I'm sure they and the rest of the hard working EF moderator team will be doing their best over the next few weeks to help us bounce back from this. I'd also like to thank those of my commercial customers who host their domains on the same server for their patience and understanding through this difficult time.
Last edited by mark; 12.12.2007 at 01:08.
Reason: removing forum downtime message - completed
| | The following 104 users would like to thank mark for this useful post: | adrian76, ali_the_nomad, BenK, Big Dream, billie, Blonaybear, Bookworm, Brianb_ie, brüder, Bubugala, caninecounselor, chipmaker, chrisch, ChrisW, clive7, Colonelboris, Cooper, CornerFlax, couta, Crumbs, dalehauskins, Darkphoenix, DaveA, dbsb, Dodger, draculin, EastEnders, edot, eejit, flow23, Galatea, Gav, grumpygrapefruit, helencho, i-b-deborah, Jack, jannewbold, Jekyll, jemma, jonnyt, jot, kfcfriend, Kittster, konijn, krlock3, lucy_sg, magyir, mannie organ, Maple Leaf, Mark75, Martin79, Mikeybroomers, mila cruz, mimi1981, miniMia, MissBehaving, moosealot, MrsZurich, muze7, möpp, Natasha, NatsBrit, Oldhand, Ollie, outrage, Pacman, panamahat, pat, PlantHead, Polorise, quinallex, readingsteve, ric, Ritchi, Rob, robban, RSargeant, SamC, SamCole, Sandgrounder, Scott, seb23, Simon, smackerjack, Smitty, southie, StirB, Suermel, Sutter, telandy, terryhall, thoean, tigerli, tildaoz, timpy, Uncle Max, undercovermoles, Woodsie, WorldTraveller, Yorkie | 
26.11.2007, 01:17
|  | The Architect | | Join Date: May 2005 Location: Zollikon, Switzerland
Posts: 3,182
Groaned at 3 Times in 3 Posts
Thanked 418 Times in 115 Posts
| | | Re: Recent forum downtime
Changed the content of the initial post with more useful info and opened the thread for discussion.
| 
26.11.2007, 01:47
|  | Member | | Join Date: Sep 2007 Location: Zurich
Posts: 164
Groaned at 1 Time in 1 Post
Thanked 56 Times in 40 Posts
| | | Re: Recent forum downtime
Having been a database administrator in a past life, I have had my share of weekends spent restoring from backups.
All your (and the other admins') effort is very much appreciated. I'm off to visit that Donate button. Even if you're doing it as a hobby, some beers are probably in order.
| 
26.11.2007, 07:06
|  | Senior Member | | Join Date: Mar 2006 Location: Zurich
Posts: 280
Groaned at 0 Times in 0 Posts
Thanked 54 Times in 36 Posts
| | | Re: Recent forum downtime
As always: Thank you so much, Mark. I am time and again impressed with the care and effort with which you run this forum to the benefit of us all. I wish you a good "personal recovery" after all this extra work.
Thanks!!!!
Idgie
| 
26.11.2007, 07:27
|  | Forum Veteran | | Join Date: Jun 2007 Location: Zürich
Posts: 932
Groaned at 6 Times in 5 Posts
Thanked 665 Times in 338 Posts
| | | Re: Recent forum downtime
Legendary effort mate. I was having nightmares that all the info on the forum would be lost and we would have to start from scratch. Thanks for all the hours and hard work getting it back up and running again. Bet you'll be testing those recoveries now.
|
26.11.2007, 07:56
|  | Forum Veteran | | Join Date: Oct 2006 Location: Thurgau
Posts: 1,474
Groaned at 0 Times in 0 Posts
Thanked 266 Times in 199 Posts
| | | Re: Recent forum downtime
Lost this weekend without the forum, that Donate button will be pressed. Fantastic job and much appreciated by all.
| 
26.11.2007, 08:44
|  | Forum Veteran | | Join Date: Apr 2007 Location: Lausanne / Weybridge UK
Posts: 592
Groaned at 10 Times in 5 Posts
Thanked 241 Times in 166 Posts
| | | Re: Recent forum downtime
Thank you guys. Your hard work is VERY MUCH appreciated. My weekend was just not the same without my hourly fix of EF!!!!
Thanks again
| 
26.11.2007, 09:14
|  | Forum Veteran | | Join Date: Jul 2007 Location: Lörrach/DE
Posts: 671
Groaned at 6 Times in 6 Posts
Thanked 568 Times in 294 Posts
| | | Re: Recent forum downtime
Great work, Mark. And thanks for getting everything up and running once again.
It's a good thing that it was just a server crash. At first I thought that the IT dept at work had blocked EF: now that would be a real disaster...
| 
26.11.2007, 09:45
| | | | Re: Recent forum downtime
Mate I feel your pain, been there, got the t-shirt. It's just very unfortunate that the backups went as well.
One customer I worked with in the past insisted on backing up 3 TB of data to tape, which took about 3-4 days to carry out. For some reason they couldn't get through their thick heads that if the server/disks went they'd lose up to 4 days of data.
To add insult to injury, this warehouse was continually growing due to their poor archiving/IT management of records.
Well after 2yrs of all roses and sunshine it finally happened, and you can warn people all you like.
I just regurgitated to their managers the e-mails I'd held in a little folder, warning them every 3 months that if it happened they would be in big trouble.
Oh well I got a gold star and 3 managers who ignored the advice were shown the door! It may look good on your budget now, but in the end it will cost you a job..
What was really funny was the data which was being held. I'm not going to go into this, but it's along the lines of criticality to a UK govt system, with what was lost on 2 cds by courier in England..(I might send you a pm Mark with the actual data stored, should bring a smile after all the carnage this w/e)
So again well done mate and time to write a disaster recovery plan to cover the disaster recovery plan I reckon.  (Damn tape drives eh!)
It's only now you begin to realise what a community EF is and how many people rely on its content. Another time to step back and be proud of your creation mate I reckon <raises glass>
Time for a beer I reckon...You + Helpers have earned it.
All the best
Karl
| 
26.11.2007, 09:45
|  | Forum Veteran | | Join Date: Nov 2006 Location: Biel/Bienne
Posts: 844
Groaned at 7 Times in 6 Posts
Thanked 576 Times in 254 Posts
| | | Re: Recent forum downtime
phewwww
Thanks a lot guys for putting so much effort into 're-animating' the EF back to life again. It's much appreciated by me as well!!
| 
26.11.2007, 10:43
|  | Forum Veteran | | Join Date: May 2007 Location: Meisenberg Zug
Posts: 1,003
Groaned at 21 Times in 13 Posts
Thanked 284 Times in 182 Posts
| | | Re: Recent forum downtime | 
26.11.2007, 10:45
|  | Junior Member | | Join Date: Mar 2007 Location: Lutry
Posts: 64
Groaned at 0 Times in 0 Posts
Thanked 48 Times in 21 Posts
| | | Re: Recent forum downtime
I would also like to thank you for the hard work. Working in IT myself, I can feel the pain.
May I request that part of my donation goes into buying your wives/girlfriends/partners a bunch of flowers? I am sure they did not see much of you this weekend and should be thanked as well!
| 
26.11.2007, 10:51
|  | Forum Legend | | Join Date: Jul 2007 Location: ZH
Posts: 5,764
Groaned at 43 Times in 37 Posts
Thanked 6,668 Times in 2,851 Posts
| | | Re: Recent forum downtime
I thought our IT dept had blocked it because it's open on my computer all day. Got a bit paranoid until I saw the message about the crash later in the day.
Got absolutely no idea about computers and servers and their internal gubbins but hats off to the sterling work you did at the weekend putting the widgets, nuts and bolts back together on the EF.
| 
26.11.2007, 11:08
| | | | Re: Recent forum downtime
My thanks too Mark, from another one who realises what a pain this kind of thing can be.
| 
26.11.2007, 11:21
| | Member | | Join Date: Apr 2007 Location: ZH
Posts: 121
Groaned at 1 Time in 1 Post
Thanked 15 Times in 12 Posts
| | | Re: Recent forum downtime
You f-*/)(§ useless /&%! Get yer backups in %&$=/ order! I mean, I always have everything reliably backed up and.. oh no.. wait..
(Just kidding! Thanks for all your efforts, Mark. Been there a few times myself, though in my case almost always through my own stupid fault. Last few days have been v. boring without EF.)
| 
26.11.2007, 11:27
| | | | Re: Recent forum downtime
@mark
Thanks for the efforts Mark -- as an IT person myself, I can feel your pain
May I suggest that you check out rsnapshot (www.rsnapshot.org). It's a great system for keeping backups always available. I use this in combination with regular tape backups and they have saved my hide more than once.
cheers,
Ben
| 
26.11.2007, 12:20
|  | Forum Veteran | | Join Date: Apr 2007 Location: Used to be Zurich
Posts: 1,257
Groaned at 22 Times in 18 Posts
Thanked 799 Times in 419 Posts
| | | Re: Recent forum downtime
Mark - I am sure that you have gotten loads of suggestions and probably don't need any more, but I use a company called SIAG for personal and corporate backup solutions (www.siag.ch). They are Swiss, and your data gets stored inside a mountain in Gstaad (kind of cool). The actual product is called SwissVault. Check it out.
fduvall
| 
26.11.2007, 12:37
|  | Forum Veteran | | Join Date: Mar 2007 Location: Die Südkürve
Posts: 1,864
Groaned at 12 Times in 11 Posts
Thanked 1,018 Times in 547 Posts
| | | Re: Recent forum downtime
Yep, cheers Mark - sorry for your lost weekend
I used to run a forum myself a few years ago, thankfully we never had any problems along these lines but just from knowing a small bit of the back end of a vBulletin forum I can imagine how much has gone on unseen and needs to be put back - ouch
Just to echo everyone else's sentiments, I was gutted on Thursday and Friday that I actually had to do some work for a change
(p.s. - have put all my attachments back in as requested in the link)
| 
26.11.2007, 14:24
|  | Forum Veteran | | Join Date: Apr 2007 Location: Zurich
Posts: 1,206
Groaned at 3 Times in 3 Posts
Thanked 991 Times in 521 Posts
| | | Re: Recent forum downtime
A huge thank you to Mark and all others who've got the system working again. Can't even begin to imagine the gallons of coffee needed for that!
| 
26.11.2007, 16:25
|  | Member | | Join Date: Jan 2007 Location: Lausanne
Posts: 191
Groaned at 3 Times in 3 Posts
Thanked 68 Times in 35 Posts
| | | Re: Recent forum downtime
Cheers Mark et al. I was sooooo glad to be able to get my fix again today.
|