Junos Space is starting…

I recently ran into an issue with my Junos Space server: after a seemingly random period of time, the application would crash and I'd be presented with the following message:

Junos Space is starting …

Junos Space is currently starting up. Please wait. Once it is ready, the log in page will be presented. Then clear the cache on your browser, close the browser and re-login from a new browser session.
Startup status messages will be displayed as they become available:

A reboot with “shutdown -r now” would usually fix it temporarily, until it crashed again a few days or weeks later. I eventually learned that the root cause was a full /var filesystem, which the application, its database, and other components depend on. Here is how I identified and corrected the issue:

Log in to the Junos Space CLI and choose option 7 to open the debug shell:

Welcome to the Junos Space network settings utility.

Initializing, please wait


Junos Space Settings Menu

1> Change Password
2> Change Network Settings
3> Change Time Options
4> Retrieve Logs
5> Security
6> Expand VM Drive Size
7> (Debug) run shell

A> Apply changes
Q> Quit
R> Redraw Menu

Choice [1-7,AQR]: 7

Check for free disk space using the df command:

[root@space-005056b136b5 ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/jmpvgnocf-lvroot
                       29G  7.3G   20G  27% /
/dev/mapper/jmpvgnocf-lvtmp
                       17G  325M   16G   3% /tmp
/dev/mapper/jmpvgnocf-lvvar
                       47G   44G     0 100% /var
/dev/mapper/jmpvgnocf-lvlog
                       29G   16G   12G  58% /var/log
/dev/sda1              99M   17M   78M  18% /boot
tmpfs                 7.8G     0  7.8G   0% /dev/shm

Notice I’m 100% on /var. Not good. So what in particular on /var is consuming the disk?

[root@space-005056b136b5 ~]# cd /var/
[root@space-005056b136b5 var]# du -hs *
12K     account
16K     backups
30G     cache
5.0G    chroot
8.0K    crash
16K     db
16K     empty
9.7G    lib
8.0K    local
44K     lock
16G     log
16K     lost+found
4.0K    mail
8.0K    nis
94M     opennms
8.0K    opt
8.0K    preserve
236K    run
96K     spool
287M    tmp
25M     www
8.0K    yp
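Because du -hs * lists entries alphabetically, the biggest one has to be spotted by eye at every level. A minimal sketch of a helper that sorts by size instead; largest_entries is a hypothetical name, and sizes are reported in KiB (du -k) so that a plain numeric sort works even on older coreutils that lack sort -h.

```shell
#!/bin/sh
# Hypothetical helper: print the N largest entries directly under a directory,
# biggest first. Sizes are in KiB (du -k), so a plain numeric sort is enough.
largest_entries() {
    dir=${1:-/var}
    count=${2:-10}
    # -s prints one total per entry; errors (e.g. unreadable files) are hidden
    du -sk "$dir"/* 2>/dev/null | sort -rn | head -n "$count"
}
```

For example, largest_entries /var 5 followed by largest_entries /var/cache/jboss 5 should surface the cache directory and then the largest .hprof files in two steps.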

The cache directory looks pretty large at 30G. (The 16G under log doesn't count against /var, because /var/log is its own volume.)

[root@space-005056b136b5 var]# cd cache/
[root@space-005056b136b5 cache]# du -hs *
200K    fontconfig
30G     jboss
16K     jmp
12K     jmp-watchdog
8.0K    libvirt
648K    man
8.0K    mod_proxy
8.0K    mod_ssl
8.0K    yum
[root@space-005056b136b5 cache]# cd jboss/
[root@space-005056b136b5 jboss]# du -hs *
212M    backup
40K     exportInventory
8.0K    fm
28K     import
2.7G    java_pid14561.hprof
2.7G    java_pid14569.hprof
2.7G    java_pid14637.hprof
2.7G    java_pid14720.hprof
2.7G    java_pid14919.hprof
2.9G    java_pid15590.hprof
3.0G    java_pid17052.hprof
756M    java_pid17278.hprof
2.7G    java_pid17379.hprof
2.7G    java_pid22426.hprof
3.9G    java_pid7340.hprof
16K     jmp
8.0K    LocalScript
8.0K    opennms

And here is the culprit: a pile of Java heap dumps (.hprof files) left behind by previous JBoss crashes. Checking the dates, some go back as far as July 2014:

[root@space-005056b136b5 jboss]# ll
total 21987360
drwx------ 2 apache space       4096 Jul 30  2014 backup
drwxrwx--- 2 jboss  space       4096 Jul 24  2014 exportInventory
drwxrwx--- 2 jboss  space       4096 Apr 26  2014 fm
drwxrwx--- 2 apache space       4096 Apr 26  2014 import
-rw------- 1 jboss  space 2823398329 Mar 18 12:01 java_pid14720.hprof
-rw------- 1 jboss  space 2848171703 Dec  4 20:09 java_pid14919.hprof
-rw------- 1 jboss  space 3026975179 Oct  2  2014 java_pid15590.hprof
-rw------- 1 jboss  space 3195963924 Mar 19 13:23 java_pid17052.hprof
-rw------- 1 jboss  space  791781376 Apr 26 03:49 java_pid17278.hprof
-rw------- 1 jboss  space 2829805295 Nov 18 16:32 java_pid17379.hprof
-rw------- 1 jboss  space 2883593097 Oct 25  2014 java_pid22426.hprof
-rw------- 1 jboss  space 4093285485 Jul 24  2014 java_pid7340.hprof
drwxrwx--- 2 jboss  space       4096 Apr 26  2014 jmp
drwxrwx--- 2 jboss  space       4096 Apr 26  2014 LocalScript
drwxrwx--- 2 jboss  space       4096 Jul 16  2014 opennms
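Before deleting anything, it can help to see exactly which dumps a cleanup would touch. A minimal sketch, assuming the dumps keep the java_pid<pid>.hprof naming seen above; old_heap_dumps is a hypothetical name, and the 7-day default is an arbitrary choice, not a Juniper recommendation. Note that -maxdepth is a GNU/BSD find extension rather than POSIX.

```shell
#!/bin/sh
# Hypothetical helper: list heap dumps in a directory older than N days.
# The 7-day default is arbitrary; adjust to taste.
old_heap_dumps() {
    dir=${1:-/var/cache/jboss}
    days=${2:-7}
    # -mtime +N matches files whose age, rounded down to whole days, exceeds N.
    # -maxdepth 1 keeps find out of subdirectories such as backup/ and import/.
    find "$dir" -maxdepth 1 -type f -name 'java_pid*.hprof' -mtime +"$days" -print
}
```

Running old_heap_dumps /var/cache/jboss 30, for example, would print only the dumps that have not been modified in more than 30 days.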

Let’s remove them all and check our disk space again.

[root@space-005056b136b5 jboss]# rm -rf *.hprof
[root@space-005056b136b5 jboss]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/jmpvgnocf-lvroot
                       29G  7.3G   20G  27% /
/dev/mapper/jmpvgnocf-lvtmp
                       17G  325M   16G   3% /tmp
/dev/mapper/jmpvgnocf-lvvar
                       47G   16G   29G  35% /var
/dev/mapper/jmpvgnocf-lvlog
                       29G   16G   12G  58% /var/log
/dev/sda1              99M   17M   78M  18% /boot
tmpfs                 7.8G     0  7.8G   0% /dev/shm
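The rm -rf *.hprof above removes every dump, including any recent one that might help diagnose the underlying JBoss crash. A more conservative sketch deletes only dumps older than a cutoff and reports how much space that frees; purge_heap_dumps is a hypothetical name, and the directory and 7-day defaults are assumptions.

```shell
#!/bin/sh
# Hypothetical helper: delete heap dumps older than N days and print the
# number of KiB reclaimed. Defaults (directory, 7 days) are assumptions.
purge_heap_dumps() {
    dir=${1:-/var/cache/jboss}
    days=${2:-7}
    # Total size (KiB) of the matching dumps, measured before deletion
    freed=$(find "$dir" -maxdepth 1 -type f -name 'java_pid*.hprof' \
                -mtime +"$days" -exec du -k {} + |
            awk '{ total += $1 } END { print total + 0 }')
    # Remove the same set of files
    find "$dir" -maxdepth 1 -type f -name 'java_pid*.hprof' \
        -mtime +"$days" -exec rm -f {} +
    echo "$freed"
}
```

Afterwards, df -h /var should confirm that the reported space was actually reclaimed.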

Looking better: /var is down to 35%. Now, to restore the application, stop the following services:

[root@space-005056b136b5 jboss]# /etc/init.d/jmp-watchdog stop
[root@space-005056b136b5 jboss]# /etc/init.d/jboss-dc stop
[root@space-005056b136b5 jboss]# /etc/init.d/jboss stop
[root@space-005056b136b5 jboss]# /etc/init.d/mysql stop
[root@space-005056b136b5 jboss]# /etc/init.d/heartbeat stop

Starting jmp-watchdog on its own brings them all back up:

[root@space-005056b136b5 ~]# /etc/init.d/jmp-watchdog start
jmp-watchdog running
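The stop-then-start sequence above can be wrapped in a small script so the services are always handled in the same order. A sketch that calls only the init scripts shown in this post; restart_space_services is a hypothetical name, and INIT_DIR is an assumption added only so the sequence can be dry-run against stub scripts. As described above, it relies on jmp-watchdog bringing the other services back up.

```shell
#!/bin/sh
# Hypothetical wrapper around the init scripts used in this post.
# INIT_DIR can be overridden to point at stub scripts for a dry run.
INIT_DIR=${INIT_DIR:-/etc/init.d}

restart_space_services() {
    # Stop in the same order as the manual steps: watchdog first
    # (presumably so it does not restart services mid-shutdown)
    for svc in jmp-watchdog jboss-dc jboss mysql heartbeat; do
        echo "Stopping $svc"
        "$INIT_DIR/$svc" stop || echo "warning: $svc stop returned non-zero" >&2
    done
    # Per the post, starting jmp-watchdog brings the other services back up
    "$INIT_DIR/jmp-watchdog" start
}
```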

To follow the startup progress, watch the watchdog log:

[root@space-005056b136b5 ~]# tailf /var/log/watchdog 

Back in business…


Note that you can also add another disk to the VM and expand the volume. Refer to this document for more detail:

http://www.juniper.net/documentation/en_US/junos-space14.1/topics/task/installation/junos-space-virtual-appliance-deploying.html#addingdiskresource
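Since the failure only showed up once /var was completely full, a periodic check (for example from cron) could flag the problem earlier. A minimal sketch; check_fs and fs_use_percent are hypothetical names, and the 90% threshold is an arbitrary assumption. It uses df -P so that long LVM device names such as /dev/mapper/jmpvgnocf-lvvar stay on one line, which keeps the column positions predictable.

```shell
#!/bin/sh
# Hypothetical check: warn when a filesystem crosses a usage threshold.
# The 90% default is an assumption, not a Juniper recommendation.

# Print the Use% (without the % sign) for the filesystem holding $1.
fs_use_percent() {
    # -P forces POSIX output: one line per filesystem, Capacity in column 5
    df -P "$1" | awk 'NR == 2 { sub("%", "", $5); print $5 }'
}

# Print a status line; return 1 if usage is at or above the threshold.
check_fs() {
    mount_point=${1:-/var}
    threshold=${2:-90}
    use=$(fs_use_percent "$mount_point")
    if [ "$use" -ge "$threshold" ]; then
        echo "WARNING: $mount_point is ${use}% full (threshold ${threshold}%)"
        return 1
    fi
    echo "OK: $mount_point is ${use}% full"
    return 0
}
```

On the server shown above, check_fs /var 90 would have returned 1 with /var at 100%, and 0 after the cleanup brought it down to 35%.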
