I recently ran into an issue with my Junos Space server where, after a seemingly random amount of time, the application would crash and I’d be presented with the following message:
Junos Space is starting …
Junos Space is currently starting up. Please wait. Once it is ready, the log in page will be presented. Then clear the cache on your browser, close the browser and re-login from a new browser session.
Startup status messages will be displayed as they become available:
Typically a reboot of the server via “shutdown -r now” would temporarily fix it, until it crashed again a few days or weeks later. I eventually learned that the root of the issue was the /var partition filling up, which is critical for the application, the database, and so on. Here is how the issue was identified and corrected:
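In hindsight, a scheduled disk check would have flagged this long before the application fell over. Here is a minimal sketch of that idea; the 90% threshold and the suggestion of running it from cron are my own assumptions, not something Junos Space ships with:

# Hypothetical check: warn when /var reaches 90% utilization.
# Could be dropped into cron (e.g. /etc/cron.daily/) on the appliance.
df -P /var | awk 'NR==2 && $5+0 >= 90 {print "WARNING: /var is at " $5 " used"}'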
Log into the CLI for Junos Space and select option 7 to enter the debug console:
Welcome to the Junos Space network settings utility.
Initializing, please wait
Junos Space Settings Menu
1> Change Password
2> Change Network Settings
3> Change Time Options
4> Retrieve Logs
5> Security
6> Expand VM Drive Size
7> (Debug) run shell
A> Apply changes
Q> Quit
R> Redraw Menu
Choice [1-7,AQR]: 7
Check for free disk space using the df command:
[root@space-005056b136b5 ~]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/jmpvgnocf-lvroot
29G 7.3G 20G 27% /
/dev/mapper/jmpvgnocf-lvtmp
17G 325M 16G 3% /tmp
/dev/mapper/jmpvgnocf-lvvar
47G 44G 0 100% /var
/dev/mapper/jmpvgnocf-lvlog
29G 16G 12G 58% /var/log
/dev/sda1 99M 17M 78M 18% /boot
tmpfs 7.8G 0 7.8G 0% /dev/shm
Notice that /var is at 100% utilization. Not good. So what in particular on /var is consuming the disk?
[root@space-005056b136b5 ~]# cd /var/
[root@space-005056b136b5 var]# du -hs *
12K account
16K backups
30G cache
5.0G chroot
8.0K crash
16K db
16K empty
9.7G lib
8.0K local
44K lock
16G log
16K lost+found
4.0K mail
8.0K nis
94M opennms
8.0K opt
8.0K preserve
236K run
96K spool
287M tmp
25M www
8.0K yp
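On a system where the offender is less obvious, sorting the du output makes the largest directories stand out. A quick, generic sketch (plain coreutils, nothing Junos Space specific):

# List entries under /var from smallest to largest; sizes in KB so sort -n is portable.
du -sk /var/* 2>/dev/null | sort -n | tail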
The cache directory looks pretty large.
[root@space-005056b136b5 var]# cd cache/
[root@space-005056b136b5 cache]# du -hs *
200K fontconfig
30G jboss
16K jmp
12K jmp-watchdog
8.0K libvirt
648K man
8.0K mod_proxy
8.0K mod_ssl
8.0K yum
[root@space-005056b136b5 cache]# cd jboss/
[root@space-005056b136b5 jboss]# du -hs *
212M backup
40K exportInventory
8.0K fm
28K import
2.7G java_pid14561.hprof
2.7G java_pid14569.hprof
2.7G java_pid14637.hprof
2.7G java_pid14720.hprof
2.7G java_pid14919.hprof
2.9G java_pid15590.hprof
3.0G java_pid17052.hprof
756M java_pid17278.hprof
2.7G java_pid17379.hprof
2.7G java_pid22426.hprof
3.9G java_pid7340.hprof
16K jmp
8.0K LocalScript
8.0K opennms
And here is the culprit: a pile of heap dump (.hprof) files left behind by previous JBoss crashes. Checking the dates, many of these are rather old:
[root@space-005056b136b5 jboss]# ll
total 21987360
drwx------ 2 apache space 4096 Jul 30 2014 backup
drwxrwx--- 2 jboss space 4096 Jul 24 2014 exportInventory
drwxrwx--- 2 jboss space 4096 Apr 26 2014 fm
drwxrwx--- 2 apache space 4096 Apr 26 2014 import
-rw------- 1 jboss space 2823398329 Mar 18 12:01 java_pid14720.hprof
-rw------- 1 jboss space 2848171703 Dec 4 20:09 java_pid14919.hprof
-rw------- 1 jboss space 3026975179 Oct 2 2014 java_pid15590.hprof
-rw------- 1 jboss space 3195963924 Mar 19 13:23 java_pid17052.hprof
-rw------- 1 jboss space 791781376 Apr 26 03:49 java_pid17278.hprof
-rw------- 1 jboss space 2829805295 Nov 18 16:32 java_pid17379.hprof
-rw------- 1 jboss space 2883593097 Oct 25 2014 java_pid22426.hprof
-rw------- 1 jboss space 4093285485 Jul 24 2014 java_pid7340.hprof
drwxrwx--- 2 jboss space 4096 Apr 26 2014 jmp
drwxrwx--- 2 jboss space 4096 Apr 26 2014 LocalScript
drwxrwx--- 2 jboss space 4096 Jul 16 2014 opennms
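Before deleting anything, it is worth confirming the dumps really are stale. If only some of them were old, find could list (and, with -delete, remove) just those; the 30-day cutoff below is an arbitrary choice on my part:

# List .hprof heap dumps directly under the JBoss cache that are older than 30 days.
find /var/cache/jboss -maxdepth 1 -name '*.hprof' -mtime +30 -ls
# After reviewing the list, appending -delete removes only those files.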
Let’s remove them all and check our disk space again.
[root@space-005056b136b5 jboss]# rm -rf *.hprof
[root@space-005056b136b5 jboss]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/jmpvgnocf-lvroot
29G 7.3G 20G 27% /
/dev/mapper/jmpvgnocf-lvtmp
17G 325M 16G 3% /tmp
/dev/mapper/jmpvgnocf-lvvar
47G 16G 29G 35% /var
/dev/mapper/jmpvgnocf-lvlog
29G 16G 12G 58% /var/log
/dev/sda1 99M 17M 78M 18% /boot
tmpfs 7.8G 0 7.8G 0% /dev/shm
Looking better. Now, to restore the application, we need to stop some services, starting with jmp-watchdog so it does not restart the others while we work:
[root@space-005056b136b5 jboss]# /etc/init.d/jmp-watchdog stop
[root@space-005056b136b5 jboss]# /etc/init.d/jboss-dc stop
[root@space-005056b136b5 jboss]# /etc/init.d/jboss stop
[root@space-005056b136b5 jboss]# /etc/init.d/mysql stop
[root@space-005056b136b5 jboss]# /etc/init.d/heartbeat stop
We can start them all back up again by just starting jmp-watchdog:
[root@space-005056b136b5 ~]# /etc/init.d/jmp-watchdog start
jmp-watchdog running
If you want to watch the startup progress, you can follow the watchdog log:
[root@space-005056b136b5 ~]# tailf /var/log/watchdog
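(tailf comes from util-linux and is not present on every system; the standard tail -f /var/log/watchdog does the same thing if it is missing.)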
Back in business…
Note: it is also possible to simply add another disk to the VM and expand the volume (option 6, “Expand VM Drive Size”, in the settings menu above); refer to Juniper’s documentation for more detail.