Running a STABLE Linux Server?
Soldat Forums - Server Talk - Server Help
Vostok 4
March 3, 2005, 5:21 pm
First of all, hi everyone. I'm a new Soldat player who has set up a server (it seems to be quite full whenever it's running), but I'm getting really annoyed with running it. For one thing, the server seems to die randomly for no reason whatsoever. So I decided to run the soldatmonitor.pl script after reading it through, and everything looked good. Now I wake up today and there are 2 (TWO) instances of my screen session running. I have no bloody idea how that happened, and there are also 2 instances of soldatmonitor.pl and about 934539875938475 new 0-byte log files.
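How the second copy got started isn't clear from the logs, but a common guard against a monitor script running twice is to take an exclusive, non-blocking flock on a lock file at startup and bail out if another instance already holds it. A minimal Perl sketch; the function name and lock-file path are made up for illustration and are not part of soldatmonitor.pl:

```perl
use strict;
use warnings;
use Fcntl qw(:flock);

# Hypothetical helper: try to take an exclusive, non-blocking lock on
# $lockfile. Returns the open handle on success (keep it open for the whole
# life of the process), or undef if another instance already holds the lock.
sub acquire_single_instance_lock {
    my ($lockfile) = @_;
    open my $fh, '>>', $lockfile or die "cannot open $lockfile: $!\n";
    return flock($fh, LOCK_EX | LOCK_NB) ? $fh : undef;
}
```

At the top of the monitor you would do something like `my $lock = acquire_single_instance_lock('/tmp/soldatmonitor.lock') or exit 0;` and keep `$lock` in scope. The kernel drops the lock when the process exits, even after a crash, so a leftover lock file does not block the next start.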

Does anyone here run a STABLE Linux server, i.e., a month of uptime or more? I would really like to get some stats up plus some community features, but I can't do that if the server is dying every 10 seconds on me. After checking the soldatmonitor.pl log, this is what I'm seeing:

[code][Thu Mar 3 03:48:34 2005] Process ID: 30421
[Thu Mar 3 03:48:34 2005] Soldat process exiting.
[Thu Mar 3 03:48:34 2005] Killing SOLDAT: 30421[Thu Mar 3 03:48:39 2005] Soldat Monitor Script Started.
[Thu Mar 3 03:48:39 2005] Checking Started.
[Thu Mar 3 03:48:39 2005] Process ID: 30449
[Thu Mar 3 03:48:39 2005] Soldat process exiting.
[Thu Mar 3 03:48:39 2005] Killing SOLDAT: 30449[Thu Mar 3 03:48:44 2005] Soldat Monitor Script Started.
[Thu Mar 3 03:48:44 2005] Checking Started.
[Thu Mar 3 03:48:44 2005] Process ID: 30478
[Thu Mar 3 03:48:44 2005] Soldat process exiting.
[Thu Mar 3 03:48:44 2005] Killing SOLDAT: 30478[Thu Mar 3 03:48:49 2005] Soldat Monitor Script Started.[/code]

Over and over and over and over (you get the point).

Is there a fix for this? Should I contact the author of the soldatmonitor?

I don't seem to have as many problems when running the server on its own, BUT... but... if it dies I have to restart it right away, and I'm not always around. I could write a cron script to check on it... however... I'd like to get a clean fix :)
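For reference, the cron approach can be quite small. A minimal sketch in Perl, assuming the server's PID is written to a pidfile somewhere (that is an assumption; the path and the `check_and_restart` helper are hypothetical, and soldatserver may not write a pidfile on its own). It uses `kill 0`, which sends no signal and only checks whether the PID exists:

```perl
use strict;
use warnings;
use Errno qw(EPERM);

# True if a process with this PID currently exists.
# "kill 0" delivers no signal; it only asks whether the PID could be signalled.
sub is_alive {
    my ($pid) = @_;
    return 0 unless defined $pid && $pid =~ /^\d+$/ && $pid > 0;
    return 1 if kill 0, $pid;
    return $! == EPERM ? 1 : 0;    # exists, but belongs to another user
}

# Read the first line of a pidfile; undef if it is missing or empty.
sub read_pid {
    my ($pidfile) = @_;
    open my $fh, '<', $pidfile or return undef;
    my $pid = <$fh>;
    close $fh;
    return undef unless defined $pid;
    chomp $pid;
    return $pid;
}

# Hypothetical watchdog step: restart the server if the recorded PID is gone.
# Returns 1 if a restart was attempted, 0 if the server looked alive.
sub check_and_restart {
    my ($pidfile, @start_cmd) = @_;
    return 0 if is_alive(read_pid($pidfile));
    system(@start_cmd) == 0
        or warn "restart command failed: $?\n";
    return 1;
}
```

A crontab line such as `* * * * * /usr/bin/perl /home/soldat/watchdog.pl` (again, a made-up path) would then check once a minute. One caveat: a stale PID can be reused by an unrelated process, so a sturdier check would also confirm the process name (on Linux, via /proc/PID/cmdline). This also only catches a dead process, not a hung one.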

Thanks again, and I'll post the info under the lounge when I get some more things setup!

FliesLikeABrick
March 3, 2005, 5:31 pm
MM seems to be working on server-side issues more lately; hopefully he will resolve the infamous Linux instability issues sometime soon, more specifically before I convert my servers to Linux :P

Vostok 4
March 3, 2005, 6:02 pm
Hopefully :? It's kind of hard to get a community started when everyone is dropped every 30 minutes. Although I think this has something to do with servermonitor by Tank... I'll try to shoot him an email, but take a look at this, another typical restart:

[code][Thu Mar 3 12:41:27 2005] 0-Time Left: 1 minutes
32491 [Thu Mar 3 12:41:44 2005] Ports not found running 32491 UDP PORT 23073 - DOWN 32491
[Thu Mar 3 12:41:44 2005] Admin Port Disconnect
[Thu Mar 3 12:41:44 2005] Admin process restarting
[Thu Mar 3 12:41:44 2005] Connected.
32491 [Thu Mar 3 12:41:44 2005] Ports not found running 32491 UDP PORT 23073 - DOWN 32491
[Thu Mar 3 12:41:44 2005] Admin Port Disconnect
[Thu Mar 3 12:41:44 2005] ADMIN: Setting status to down.
[Thu Mar 3 12:41:44 2005] Soldat process exiting.
[Thu Mar 3 12:41:44 2005] Killing SOLDAT: 32492[Thu Mar 3 12:41:47 2005] Process ID: 8721
[Thu Mar 3 12:42:09 2005] Admin process restarting
[Thu Mar 3 12:42:19 2005] Connected.
32491 [Thu Mar 3 12:42:25 2005] Ports not found running 32491 UDP PORT 23073 - DOWN 32491
[Thu Mar 3 12:42:25 2005] Admin Port Disconnect
[Thu Mar 3 12:42:25 2005] ADMIN: Setting status to down.
[Thu Mar 3 12:42:25 2005] Soldat process exiting.
[Thu Mar 3 12:42:25 2005] Killing SOLDAT: 8721[Thu Mar 3 12:42:28 2005] Process ID: 8928
[Thu Mar 3 12:42:50 2005] Admin process restarting
[Thu Mar 3 12:43:00 2005] Connected.[/code]

It seemed to stop itself after that last SOLDAT line, and the server SEEMS to be running, but why is it "Setting status to down" all the time? Don't set status to down! hehe

FliesLikeABrick
March 3, 2005, 6:34 pm
yeah, but Tank's script is a means of dancing around the problem until MM can actually fix the Linux server instability

edit: tonight I will take a look at his script and see if I can provide any insight into what it is doing

Vostok 4
March 3, 2005, 6:40 pm
Ok, let me know if you get anywhere. I'm going to add some debug echoes so I know what's triggering the "ADMIN: Setting status to down", then I'll be able to pinpoint the problemo.

[EDIT]

Ok, I think I'm on to something. I'm trying to understand the damn logic behind this script, and I think I found a flaw.

[code]if ($soldatstatus->{failed} > 6) {
$time = localtime();
print LOG "[$time] ADMIN: Setting status to down.\n";
$soldatstatus->{status} = 0;
print MAIN_ADMIN "status = 0\n";
sleep 25;
}[/code]

As far as I can see, the variable is incremented every time an error occurs, say, a disconnect from the admin port. But if it disconnects from the admin port more than 6 times and reconnects each time (while the server is still running), the script will shut down the server and restart it even though nothing is wrong with the server. I'm going to dig deeper, but I think this is why the monitor script shuts down the server when it's not necessary.
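To make the flaw concrete, here is a small simulation (illustrative only, not Tank's code). Each event is either 'fail' (admin port dropped) or 'ok' (reconnected). A counter that only ever goes up trips after 7 recovered hiccups, while a counter that resets on every success only trips on an unbroken run of failures. The threshold mirrors the `failed > 6` check quoted above; everything else is an assumption:

```perl
use strict;
use warnings;
use constant FAIL_LIMIT => 6;   # mirrors "failed > 6" in the quoted block

# Counts every failure ever seen: recovered hiccups still pile up.
sub cumulative_trips {
    my @events = @_;
    my $failed = 0;
    for my $event (@events) {
        $failed++ if $event eq 'fail';
        return 1 if $failed > FAIL_LIMIT;
    }
    return 0;
}

# Resets on every success: only an unbroken run of failures trips.
sub consecutive_trips {
    my @events = @_;
    my $failed = 0;
    for my $event (@events) {
        $failed = $event eq 'fail' ? $failed + 1 : 0;
        return 1 if $failed > FAIL_LIMIT;
    }
    return 0;
}
```

With seven drop/reconnect pairs, `cumulative_trips` reports a restart and `consecutive_trips` does not.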

EDIT:

Ok, I found the block of code that SHOULD reset the failed status; however, fundamentally I think this is the wrong way of deciding whether or not we should restart the server. I'm not quite sure what this does with regard to the line count > 5, but I hope to find out.

[code]# Output Loop
foreach my $line (<$sock>)
{
if (!($line =~ m/\.\r\n/))
{
$time = localtime();
print LOG "[$time] $lineno-".$line;
$lineno++;
if ($lineno > 5)
{
$soldatstatus->{failed} = 0;
}
my @senddata = $parentselect->can_write(.1);
foreach my $adminhandle (@senddata)
{
print $adminhandle $line;
}
}
}[/code]

OK, not an edit but an update :)

I messed around with the script, and I told it to ONLY kill the server if it CANNOT connect to the admin port at all (which would only happen when the server is dead, fair enough).
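A minimal sketch of that "only restart if we cannot connect at all" check, using Perl's core IO::Socket::INET. The helper name, host, and timeout are assumptions; it also assumes the admin interface accepts TCP connections on the server port (23073 in the logs above). Only a failed connect counts; a connection that opens and later drops is not treated as a dead server:

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Hypothetical probe: returns 1 if a TCP connection to the admin port can be
# opened within $timeout seconds, 0 otherwise.
sub admin_port_reachable {
    my ($host, $port, $timeout) = @_;
    my $sock = IO::Socket::INET->new(
        PeerAddr => $host,
        PeerPort => $port,
        Proto    => 'tcp',
        Timeout  => $timeout // 5,
    );
    return 0 unless $sock;
    close $sock;
    return 1;
}
```

For example, `admin_port_reachable('127.0.0.1', 23073, 5)` would be the check to run before deciding anything.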

I also removed all the ASE crap because I don't really care for it... I'll keep ya updated on how smoothly it runs.

[EDIT]

OK, the server seems to be running fine! I'm gonna keep my fingers crossed, but we're probably gonna make it through one full map rotation... best yet!

Still got a little bug to work out; I get this a lot:

[code]
[Thu Mar 3 15:11:05 2005] Admin disconnected.
[Thu Mar 3 15:11:05 2005] Admin connected.
[Thu Mar 3 15:11:05 2005] Admin disconnected.
[Thu Mar 3 15:11:05 2005] Admin connected.
[Thu Mar 3 15:11:05 2005] Admin disconnected.
[Thu Mar 3 15:11:05 2005] Admin connected.
[Thu Mar 3 15:11:17 2005] Admin disconnected.
[Thu Mar 3 15:11:17 2005] Admin connected.
[/code]


Not sure why it does that so much within seconds, I'm gonna take a look at the script again, but for now I'll just leave it be, it actually seems to be working...

[EDIT AGAIN]

Okily dokily, so for now it seems to be working much better: no crashes as of yet, and the server has had some nice load, about 20 clients for the past 2 hours, with no problems. For me that's good enough (compared to what it was before), so I'm going to post the file I fixed and explain a bit of what I did... (*NOTE* I only fixed the .pl script, so you have to use that)

If you look above, the way the script tested whether the server was dead was to wait for more than 6 failures and then shut down and restart the server. But for some reason, every time the secure admin port dropped out it added another failure, and, as the log above shows, that could happen several times within a single second, causing the server to conk out. I'm not even sure the code that resets the failure count works, as I did not sit down and trace it.

So what I did was remove every place where the failed variable was incremented, and keep it only as a flag that is set on one occasion: when the secure admin port could not connect to the regular port (note the difference: in this situation we CANNOT connect, versus we got DISCONNECTED).
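One way to keep that "cannot connect" flag from firing on a single unlucky attempt is to retry a few times before giving up. A minimal sketch; the `should_restart` name, the 3 attempts, and the 10-second spacing are all assumptions, and `probe` is any code ref that returns true when the server answers (for instance, a TCP connect check against the admin port):

```perl
use strict;
use warnings;

# Hypothetical restart policy: call the probe up to "attempts" times,
# "delay" seconds apart. Returns 1 (restart) only if every attempt failed
# to connect; returns 0 as soon as one attempt succeeds.
sub should_restart {
    my (%args) = @_;
    my $probe    = $args{probe};
    my $attempts = $args{attempts} // 3;
    my $delay    = $args{delay}    // 10;
    for my $i (1 .. $attempts) {
        return 0 if $probe->();
        sleep $delay if $i < $attempts;   # no wait after the last attempt
    }
    return 1;
}
```

Disconnects never enter this decision at all; only repeated failures to open a connection do, which matches the distinction drawn above.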

Also, this script has ALL the ASE junk removed. I don't really care about ASE for my server, as it gets enough hits from the lobby and ASE just creates extra unnecessary overhead. If you can't make these changes yourself and you want ASE, I can post another version. The fixed script is here:

[URL]

FULL credit for the script goes to Tank and his original thread [URL] (there); I just fixed 'er up a little :) I also made some of the error messages a bit more verbose for my own needs.

[ANOTHER EDIT]

OK, SO... so far the server has been under quite considerable load, with people playing pretty much all day for the past 2 days, and the server hiccuped once, but that wasn't the script. But the script took care of it right away: it restarted the server and we were off for some fun again :D It works better than I could have hoped for right now. Hopefully other Linux admins will find this helpful... and again, thanks Tank!

FliesLikeABrick
March 3, 2005, 9:12 pm
please combine your three posts into one using 'edit'

personally I don't care if you leave them, but others will flame you for it. I'll delete this post after you play the 'let's manage our posts' game

Be sure to post whatever else you come up with from this

Michal Marcinkowski
March 5, 2005, 10:32 am
quote:
[Thu Mar 3 15:11:05 2005] Admin disconnected.
[Thu Mar 3 15:11:05 2005] Admin connected.
[Thu Mar 3 15:11:05 2005] Admin disconnected.
[Thu Mar 3 15:11:05 2005] Admin connected.

That's probably the script trying to connect, I don't know why so many times.
What error does it give you when the server dies without using any script?

Vostok 4
March 5, 2005, 10:54 am
It doesn't... it just dies. The gamestat.txt is still old, and nothing happens. However, while running the script I finally managed to hit an unrecoverable crash:

[code]
[Sat Mar 5 02:42:16 2005] 37-Bad packet size: rp8 rs5
[/code]

For some reason soldatmonitor did not catch this, and the server sat like this for about an hour until I manually restarted it.
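A hang like this leaves the process alive, so a PID or port check may not notice it. Since gamestat.txt was described above as going stale when the server dies, one possible extra signal is the file's modification time. A minimal sketch, assuming the server normally rewrites gamestat.txt regularly during play; the `is_stale` name and the 300-second threshold are guesses to tune, not known Soldat behaviour:

```perl
use strict;
use warnings;

# Hypothetical hang detector: treat the server as hung if the status file
# has not been modified within $max_age seconds. $now can be passed in for
# testing; it defaults to the current time.
sub is_stale {
    my ($path, $max_age, $now) = @_;
    $max_age //= 300;
    $now     //= time;
    my @st = stat $path;
    return 1 unless @st;                          # missing file counts as stale
    return ($now - $st[9]) > $max_age ? 1 : 0;    # $st[9] is the mtime
}
```

An empty server might legitimately stop updating the file, so this is best combined with a connection check rather than used as the only trigger.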

Michal Marcinkowski
March 5, 2005, 2:59 pm
hmm thanks I'll check it.

n00bface
March 6, 2005, 5:00 am
Very nice script edit... Be sure to post an update when you get the repeated admin connections and disconnections sorted... my average daily log size went from 2MB to 9MB because of it :O

Vostok 4
March 7, 2005, 5:01 pm
n00bface, I don't really see why this socket connection keeps reporting the admin port disconnecting and reconnecting; to me it just sounds like something is screwy. Instead of messing around with this script until Tank gets back or comes alive again, I think I'm going to write my own C-based script, and I'll try to start off with a somewhat different method of implementation. By the way, I just removed the Admin Port Disconnected/Restarted junk because it was clogging the logs, so you can get the update here (or remove it yourself). Again, full credit goes to Tank for the great script!

[URL]
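An alternative to deleting those messages outright is to rate-limit them. Note that a syslog-style "collapse identical consecutive lines" filter would not help here, because the log alternates between "Admin disconnected." and "Admin connected.". A per-message time window handles that. A minimal sketch; the function name and the 60-second window are assumptions, and messages should be passed without their timestamps so repeats match:

```perl
use strict;
use warnings;

# Hypothetical rate-limited logger: each distinct message is written at most
# once per $window seconds; repeats inside the window are counted and the
# count is appended the next time that message is actually written.
sub make_rate_limited_logger {
    my ($emit, $window) = @_;    # $emit: code ref that writes one log line
    my (%last_written, %suppressed);
    return sub {
        my ($msg, $now) = @_;
        $now //= time;
        if (exists $last_written{$msg} && $now - $last_written{$msg} < $window) {
            $suppressed{$msg}++;
            return;
        }
        my $n = delete $suppressed{$msg};
        $emit->($n ? "$msg (suppressed $n repeats)" : $msg);
        $last_written{$msg} = $now;
    };
}
```

One trade-off: repeats counted in the final window are never reported if that message does not come up again, so the counts are a hint rather than an exact tally.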

Mole_Incarnate
March 8, 2005, 2:03 am
I suppose I should drag out Tank to check these problems out for you. *goes off to snag Tank with a bent pointy stick*

Tank
March 8, 2005, 2:39 am
Hey Vostok,

I had a look through the modifications. It's hard to pick out the changes because the spacing has changed, so diff sees every line as different.

Anyway, line 444:
if ($soldatstatus->{failed} == 99) {

You're setting that rather high? It should also be > 98 rather than ==, because there are multiple sections of the code that increment failed, and it could very well skip over 99.
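A tiny illustration of that point, with made-up numbers: if several code paths can each bump failed during one pass of the loop before the check runs, the counter can jump from 98 straight to 100, and an equality test never fires while a "greater than" test does. The helper names are hypothetical:

```perl
use strict;
use warnings;

sub hits_99_exactly { return $_[0] == 99 }
sub above_98        { return $_[0] > 98 }

# Each element of @per_pass is how much "failed" grew during one pass of the
# main loop (several increments can happen before the check runs).
# Returns the index of the pass on which $check first fired, or -1.
sub first_trigger {
    my ($check, @per_pass) = @_;
    my $failed = 0;
    for my $pass (0 .. $#per_pass) {
        $failed += $per_pass[$pass];
        return $pass if $check->($failed);
    }
    return -1;
}
```

With per-pass growth of 50, 48, 2 the counter reads 50, 98, 100: `first_trigger(\&hits_99_exactly, 50, 48, 2)` never fires, while `first_trigger(\&above_98, 50, 48, 2)` fires on the third pass.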

print $session qq(Welcome Sir...
You are chillin as the admin of the Sluts [IMAGE]hole...
Make me pround.);
print $session "Tank Script modified by V4\n";

Have you tested this with remote admin yet? When I initially modified the welcome message, the remote admin tool stopped thinking it had connected successfully. That's why my script prints the exact same 3 lines as MM's, then my text.

Insight into your mods would be great, and I'll update the main version.

I stopped work on this because there is not much further you can go, and really it would be much easier to work on the Soldat code itself, but I was told MM is pretty protective about outside help with the source code.

Tank
March 8, 2005, 2:45 am

I've had a harder look at the code I've written, and you're right: failed never gets reset properly. The logic behind the reset was that if the server printed 5 lines of text in the time it took to reach 6 failed tries, the server was probably fine, and if it wasn't, 6 more failed tries would come up soon anyway. However, my code only resets on the 5th line, not every 5 lines.
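A minimal sketch of the intended "every 5 lines" rule: keep a separate count of output lines since the last reset, and clear both counters each time it reaches 5. The state hash and sub names are hypothetical and not taken from the script:

```perl
use strict;
use warnings;
use constant LINES_PER_RESET => 5;

# Call once per line of server output: every 5 lines since the last reset
# clear the failure counter, instead of only the first time 5 is passed.
sub on_output_line {
    my ($state) = @_;
    $state->{lines_since_reset}++;
    if ($state->{lines_since_reset} >= LINES_PER_RESET) {
        $state->{failed}            = 0;
        $state->{lines_since_reset} = 0;
    }
}

# Call wherever the script currently does $soldatstatus->{failed}++.
sub on_failure {
    my ($state) = @_;
    $state->{failed}++;
}
```

Because the line counter restarts from zero after each reset, a long-running server keeps clearing stale failures instead of clearing them only once at startup.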

Is there anything else you think I missed, Vostok?