  Four hour downtime as mysql tumbled  (Read 13389 times)
Offline Riven
Administrator

« JGO Overlord »


Medals: 1371
Projects: 4
Exp: 16 years


Hand over your head.


« Posted 2015-05-23 19:59:09 »

The MySQL server was yanked from underneath JGO, giving you the opportunity to finally get around to doing some coding, for once. I'm sorry it took so long for me to notice a Skype message from theagentd informing me of the matter, as I was drilling holes in walls and ceilings, and working on my upper body strength by attaching curtains to rails and moving them around for a bit.

Having learnt my lesson, I will probably set up some notification mechanism to signal my phone in case JGO goes belly up. I mean, with all the money flowing in from arguably distasteful to downright obscene ads, that's the least I could do, right? Okay!

Hi, appreciate more people! Σ ♥ = ¾
Learn how to award medals... and work your way up the social rankings!
Offline theagentd
« Reply #1 - Posted 2015-05-23 20:50:21 »

You responded within seconds of my Skype message, so you're not entirely correct. =P

Myomyomyo.
Offline Riven
« Reply #2 - Posted 2015-05-23 22:42:07 »

That was purely coincidental. I walked in my bedroom for the first time that day and saw a Skype message. It's as if the universe was trying to tell me something. True story.

Offline klaus
« Reply #3 - Posted 2015-05-24 10:43:33 »

If you don't want the hassle of setting up Nagios monitoring or similar, I can recommend the free service https://uptimerobot.com/

I use it for my sites and even web services.
Offline Riven
« Reply #4 - Posted 2015-05-24 15:14:40 »

I took JGO down for 3 minutes to check whether the down-alert emails/SMS/app-notifications were properly harassing me. Next time JGO goes down I'll know. Trust me, I'll know.

Offline ClaasJG

JGO Coder


Medals: 43



« Reply #5 - Posted 2015-05-25 10:45:56 »

But last time JGO was not actually down;
it showed some message about a sleeping admin Wink

I doubt UptimeRobot would have caught that?
(You could still ping the site successfully)

-ClaasJG

My english has to be tweaked. Please show me my mistakes.
Offline Riven
« Reply #6 - Posted 2015-05-25 10:49:56 »

I do not use UptimeRobot; I use Pingdom, which can be configured to check for specific words in the HTTP response. This 'JGO check' has been active since 2010, but I had forgotten about it. Yesterday I connected it to my smartphone & email.



http://stats.pingdom.com/1iyt8llwhe3z

In case you're wondering where the dreaded 4 hour gap is: I added the HTTP response check only yesterday. It does show the 3 min downtime from when I took MySQL down, even though Apache was still up. I set it up to check availability every minute, and to notify me when 5 consecutive checks fail.
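For readers who want to see the idea behind a keyword check, here is a minimal Java sketch. It is not how Pingdom works internally; the class name `KeywordCheck`, the 10 second timeout and the keyword passed on the command line are all assumptions made for illustration. The point is that a page only counts as "up" when the expected word is in the body, so a web server that still answers while the database is gone is reported as down, and an alert fires only after a run of consecutive failures.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

/** Hypothetical keyword-based uptime check (illustration only). */
final class KeywordCheck {
    private final String keyword;
    private final int threshold;      // consecutive failures needed for an alert
    private int consecutiveFailures;

    KeywordCheck(String keyword, int threshold) {
        this.keyword = keyword;
        this.threshold = threshold;
    }

    /** Healthy means: HTTP 200 AND the body contains the expected keyword. */
    static boolean isHealthy(int status, String body, String keyword) {
        return status == 200 && body != null && body.contains(keyword);
    }

    /** Records one result; returns true exactly when the threshold is reached. */
    boolean record(boolean healthy) {
        if (healthy) {
            consecutiveFailures = 0;
            return false;
        }
        consecutiveFailures++;
        return consecutiveFailures == threshold;
    }

    /** Fetches the page once; I/O errors and timeouts count as unhealthy. */
    boolean probe(HttpClient client, String url) {
        try {
            HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                    .timeout(Duration.ofSeconds(10)) // assumed timeout
                    .GET()
                    .build();
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            return isHealthy(response.statusCode(), response.body(), keyword);
        } catch (IOException e) {
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        if (args.length < 2) {
            System.err.println("usage: KeywordCheck <url> <keyword>");
            return;
        }
        KeywordCheck check = new KeywordCheck(args[1], 5); // 5 failed checks in a row
        HttpClient client = HttpClient.newHttpClient();
        while (true) {
            if (check.record(check.probe(client, args[0]))) {
                System.err.println("ALERT: " + args[0] + " failed 5 consecutive checks");
            }
            Thread.sleep(60_000); // one check per minute
        }
    }
}
```

The choice of keyword matters: it should be a word that only appears when the database-backed part of the page rendered correctly, otherwise an error page served by the same template would still pass.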

Offline Drenius
« Reply #7 - Posted 2015-05-25 14:48:41 »

Offline hwinwuzhere
« Reply #8 - Posted 2015-05-25 15:03:01 »

@Drenius That made me laugh very hard (inside my head that is). You sir made my day. Thanks  Pointing

There are two kinds of people in this world: Those who can extrapolate from incomplete data,
Offline Soulfoam
« Reply #9 - Posted 2015-05-25 18:46:35 »

It was a bit worrisome after the first 2 hours. The first hour went by and I was like, Riven's working on it for sure.

2 hours went by... okay he's still working on it....

3 hours went by... Riven.. ...  Emo Clueless

4 hours... ....

Figured it'd come up sooner or later though, glad it wasn't longer Tongue.
Offline Riven
« Reply #10 - Posted 2015-05-29 05:21:00 »

 persecutioncomplex

The database failed in a completely new way this time, so the SMF template was still being served. Pingdom worked nicely, but as I was actually asleep, it was to no avail. Sadly, I don't have time to investigate the underlying issue at the moment; I simply resorted to starting the MySQL database again. Something is seriously wrong, but it'll have to wait. I cannot even take a peek at work, as today is going to be a teambuilding day, where we'll be riding around in jeeps and doing other things like looking at 200-year-old buildings from a boat... yay. Cranky

Offline ra4king

JGO Kernel


Medals: 508
Projects: 3
Exp: 5 years


I'm the King!


« Reply #11 - Posted 2015-05-29 06:48:03 »

Aren't team building exercises fun?? Don't you enjoy building lasting relationships with your coworkers? They're such a productive use of time! Cranky

Offline Riven
« Reply #12 - Posted 2015-05-29 07:02:00 »

I get along with my colleagues just fine, even after work and occasionally on weekends. Who wudda thunk!

Unrelated: why are all MySQL log files missing info about the crash? It's as if the mysqld process was terminated instantly, leaving me with nothing to analyze. I'm feeling a bit hesitant about writing a script that restarts MySQL automatically once it notices it is down... persecutioncomplex
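The "notices it is down" half of such a watchdog could look roughly like the Java sketch below. This is an illustration, not a script from this thread: the class name `PortProbe`, the host and the timeout are assumptions, and 3306 is simply MySQL's default port. The restart step is left out on purpose, since it depends on the init system and is exactly the part worth being hesitant about.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical liveness probe: does anything accept TCP connections on the port? */
final class PortProbe {
    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false; // refused, unreachable, or timed out
        }
    }

    public static void main(String[] args) {
        // Host, port and timeout are placeholders; 3306 is MySQL's default port.
        boolean up = isListening("127.0.0.1", 3306, 2000);
        System.out.println(up ? "mysqld accepts connections" : "mysqld appears to be down");
    }
}
```

A TCP probe only proves that something is listening. It would not catch a server that accepts connections but cannot answer queries, so a real watchdog would probably also run a trivial query.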

Offline princec

« JGO Spiffy Duke »


Medals: 1146
Projects: 3
Exp: 20 years


Eh? Who? What? ... Me?


« Reply #13 - Posted 2015-05-29 07:47:57 »

Next time suggest some sort of painful team building exercise such as paintball. Oh yes.

Cas Smiley

Offline DarkCart

JGO Kernel


Medals: 124
Projects: 9
Exp: 50 years


It's all in the mind, y'know.


« Reply #14 - Posted 2015-05-29 12:01:08 »

Next time suggest some sort of painful team building exercise such as paintball. Oh yes.

Cas Smiley

John: Hey Bill!

Bill: What.

John: You're OUT!

*Bill gets hit in the chest with a paintball*

The darkest of carts.
Offline Opiop
« Reply #15 - Posted 2015-05-29 13:06:29 »

My company's tech department loves to watch the horror spread over a new software developer's face as they are told they need to tell a joke in front of ~50 other techies. I managed to evade mine, but other people weren't so lucky. Does that somehow count as a cruel form of team building?
Offline KevinWorkman

« JGO Plugged Duke »


Medals: 288
Projects: 12
Exp: 12 years


HappyCoding.io - Coding Tutorials!


« Reply #16 - Posted 2015-05-29 17:35:21 »

Is the "X new posts" notification thing that shows up at the top broken now?

(Why am I the only one who notices when this breaks, is everybody else doing something cool that I don't know about?)

HappyCoding.io - Coding Tutorials!
Happy Coding forum - Come say hello!
Offline DarkCart

« Reply #17 - Posted 2015-05-29 17:36:23 »

Is the "X new posts" notification thing that shows up at the top broken now?

(Why am I the only one who notices when this breaks, is everybody else doing something cool that I don't know about?)

I noticed this too. So no, you're not the only one.

Offline Riven
« Reply #18 - Posted 2015-05-29 20:36:25 »

It's a bug in the MySQL JDBC driver, which won't give you a fresh connection to the database after the database has crashed. When you request a brand new connection, you receive a stale one, which hangs indefinitely on the first query. To work around this I have to randomize the db connection string (URL), but that code isn't yet in the push-notification service I wrote years ago. So I have to restart the service after MySQL comes back up after a serious crash.

Offline hwinwuzhere
« Reply #19 - Posted 2015-05-29 20:46:31 »

but that code isn't yet in the push-notification service I wrote years ago. So I have to restart the service after MySQL comes back up after a serious crash.

Somehow this reminds me of an xkcd comic where the developer finds out a piece of his code is completely broken, yet he'd anticipated that when he wrote it and added a nice comment about it to his future self.

</rant>

Offline Opiop
« Reply #20 - Posted 2015-05-29 20:47:07 »

How do you randomize the connection string and not have a connection error? Don't you always need to use the same URL when connecting to the SQL box?
Offline Riven
« Reply #21 - Posted 2015-05-29 21:11:43 »

jdbc:............./...........?rndm=[nanos]
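Spelled out, the pattern above might look like the following Java sketch. The class name `FreshConnections` and the host and database in the example URL are placeholders, since the real connection string is elided in the post. It also assumes, as the post implies, that the driver ignores the unknown `rndm` parameter; that may not hold for every driver or version.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

/** Hypothetical helper that makes every JDBC URL string unique. */
final class FreshConnections {
    /** Appends a throwaway parameter carrying the current nanosecond counter. */
    static String randomize(String baseUrl) {
        char separator = baseUrl.contains("?") ? '&' : '?';
        return baseUrl + separator + "rndm=" + System.nanoTime();
    }

    /** Opens a connection through a URL that has not been used before. */
    static Connection open(String baseUrl, String user, String password)
            throws SQLException {
        return DriverManager.getConnection(randomize(baseUrl), user, password);
    }

    public static void main(String[] args) {
        // Placeholder URL; the real host and database are not given in the thread.
        System.out.println(randomize("jdbc:mysql://db.example.invalid/forum"));
    }
}
```

The server address and database stay the same, which is why no connection error occurs; only the URL string changes, so anything that looks connections up by that string sees a new value.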

Offline Riven
« Reply #22 - Posted 2015-05-29 21:25:50 »

but that code isn't yet in the push-notification service I wrote years ago. So I have to restart the service after MySQL comes back up after a serious crash.

Somehow this reminds me of an xkcd comic where the developer finds out a piece of his code is completely broken, yet he'd anticipated that when he wrote it and added a nice comment about it to his future self.

</rant>
It's not so much a TODO in the source code; I didn't know about it at the time, and couldn't be bothered to fix and redeploy ancient code. Having said that, I'll have to find the code first. Clueless


As for my team building day (I know you want to know!): it was basically half a day in a boat where everybody had to stfu as a guide was talking, and the other half we were driving around in jeeps screaming 'left after 375m! 200m! 50m! LEFT!' at the colleague behind the wheel, whilst banging our heads against the roof.

Is this considered team building? Getting too old for this.

Offline Riven
« Reply #23 - Posted 2015-05-29 21:50:41 »

okay, it just crashed again, and this time I got:
tail -n 1000 /var/log/messages | grep mysql

May 29 21:39:44 (none) kernel: mysqld invoked oom-killer: gfp_mask=0x10200da, order=0, oom_score_adj=0
May 29 21:39:44 (none) kernel: mysqld cpuset=/ mems_allowed=0
May 29 21:39:44 (none) kernel: CPU: 0 PID: 22025 Comm: mysqld Not tainted 3.18.5-x86_64-linode52 #1
May 29 21:39:44 (none) kernel: [14391]     0 14391     1048        5       7       33             0 mysqld_safe
May 29 21:39:44 (none) kernel: [21934]   105 21934   151610    13523      96     4306             0 mysqld
May 29 21:39:44 (none) kernel: mysqld: page allocation failure: order:2, mode:0x2000d0
May 29 21:39:44 (none) kernel: CPU: 0 PID: 21934 Comm: mysqld Not tainted 3.18.5-x86_64-linode52 #1
May 29 21:39:47 (none) kernel: [14391]     0 14391     1048       13       7       24             0 mysqld_safe
May 29 21:39:50 (none) kernel: [14391]     0 14391     1048       23       7       14             0 mysqld_safe


So mysql-server runs entirely out of memory, invoking the 'oom-killer'.

At least that's a place to start looking for solutions. I'll try to fine-tune my.cnf tomorrow, when I'm less braindead. If need be, I could upgrade the Linode; we're running on the cheapest plan at the moment, so it might be time to upgrade anyway. Of course, whatever the root cause is could exhaust a bigger memory pool just the same. We'll see.
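To put numbers on the log above, the per-process rows can be parsed with a few lines of Java. This sketch rests on two assumptions that the post does not state: that the columns in those rows are `[pid] uid tgid total_vm rss nr_ptes swapents oom_score_adj name` with sizes counted in pages, and that a page is 4 KiB. The class name `OomTaskRow` is made up for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical parser for one per-process row of the OOM-killer task dump. */
final class OomTaskRow {
    // Assumed column order (not stated in the post):
    // [pid] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
    private static final Pattern ROW = Pattern.compile(
            "\\[\\s*(\\d+)\\]\\s+(\\d+)\\s+(\\d+)\\s+(\\d+)\\s+(\\d+)"
            + "\\s+(\\d+)\\s+(\\d+)\\s+(-?\\d+)\\s+(\\S+)\\s*$");

    // Assumption: 4 KiB pages, the usual size on x86_64.
    private static final long PAGE_KIB = 4;

    final int pid;
    final long totalVmKiB;
    final long rssKiB;
    final long swapKiB;
    final String name;

    private OomTaskRow(int pid, long totalVmKiB, long rssKiB, long swapKiB, String name) {
        this.pid = pid;
        this.totalVmKiB = totalVmKiB;
        this.rssKiB = rssKiB;
        this.swapKiB = swapKiB;
        this.name = name;
    }

    /** Returns the parsed row, or null if the line is not a task-dump row. */
    static OomTaskRow parse(String line) {
        Matcher m = ROW.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new OomTaskRow(
                Integer.parseInt(m.group(1)),
                Long.parseLong(m.group(4)) * PAGE_KIB,  // total_vm in pages
                Long.parseLong(m.group(5)) * PAGE_KIB,  // rss in pages
                Long.parseLong(m.group(7)) * PAGE_KIB,  // swapents in pages
                m.group(9));
    }

    public static void main(String[] args) {
        String line = "May 29 21:39:44 (none) kernel: [21934]   105 21934   151610"
                + "    13523      96     4306             0 mysqld";
        OomTaskRow row = parse(line);
        System.out.printf("%s pid=%d rss=%d KiB swap=%d KiB%n",
                row.name, row.pid, row.rssKiB, row.swapKiB);
    }
}
```

Under those assumptions the mysqld row works out to roughly 53 MiB resident and 17 MiB swapped out at the time of the kill, which is small in absolute terms but enough to exhaust a VM with almost no free RAM.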

Offline Riven
« Reply #24 - Posted 2015-05-29 22:04:47 »

There. 13min later we're running on a brand new VM Smiley

Old trusty VM was hovering around 20MB free RAM just after launching MySQL. Now we have ~1GB headroom. Gotta get some sleep now!

Offline theagentd
« Reply #25 - Posted 2015-05-29 22:33:43 »

Jesus christ, one round of applause for Riven.

Everyone: "EVERYTHING'S BROKEN"
*5 min later*
Riven: "Okay, migrated server, patched the broken SMF code, added some new features, repelled spam bots and hacking attempts, etc etc etc"
Everyone: "=O"

Offline ra4king

« Reply #26 - Posted 2015-05-30 23:23:04 »

He's like some sort of wizard or something....

Offline Riven
« Reply #27 - Posted 2015-06-05 18:35:43 »

 Emo

The MySQL database was corrupted: records were lost and table column definitions had lost their 'default' values. I actually had to reinstall MySQL from scratch, as critical metadata (in the 'information_schema' and 'mysql' databases) about tables was corrupted or had vanished completely. When starting the mysql service, syslog exploded with errors.

I first tried to uninstall MySQL and reinstall it, but that retained the metadata and the corrupt SMF table definitions. As a last resort, I uninstalled MySQL, dropped the datadir and turned to backups to rebuild the JGO database from there. persecutioncomplex

The last post is from ~10 hours ago, so quite a bit of today's content has been lost, but at least we have write-access to the database again.


The root cause of all this instability is still a bit of a mystery, but 'luckily' it's the weekend, so I've got a ton of time to dig deeper.



Update: as JGO runs on a larger Linode VPS now, I have more disk space to hold backups. For the kind souls mirroring http://java-gaming.org/recovery/ it's important to know that the backup frequency has been increased from once per day to every 4 hours, increasing the size of the backups by a factor of 6.

Offline Riven
« Reply #28 - Posted 2015-06-05 19:09:44 »

As for why JGO was in limbo for such an eternity... the corrupted state of the database did not affect the homepage, causing Pingdom not to alert me of any problems... it was a busy day at work, and once again theagentd alerted me over Skype. It took me about 30 minutes to get home, and a horrifying full hour to get everything back up.

Now stop medal-slapping, I'm merely doing what I'm supposed to do Pointing

Offline theagentd
« Reply #29 - Posted 2015-06-05 19:16:06 »

Quote
Sorry, you can't repeat a karma action without waiting 1 hours.
       
._.
