Saturday, December 15, 2012

Scala for Java properties file conversion utility

Recently it was decided that some of our custom applications should source their text (UI labels, messages, errors, etc.) from a database rather than from text files. The main application of interest uses ResourceBundles in many places and has around 10 base bundle names, which are translated (somewhat incompletely) from the main English data into Spanish. There are around 3200 data items involved. In a previous application, I was able to do some regex work in Notepad++ and generate SQL scripts to populate a database table backing the resource bundles (via a customized ResourceBundle.Control class).
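The customized ResourceBundle.Control class mentioned above is what lets a table stand in for the .properties files. Here is a minimal sketch of the idea in Scala. It is not the class from that earlier application. TableControl and InMemoryMsgs are hypothetical names, and the lookup function stands in for a JDBC query against a table keyed by base name and language (an in-memory Map keeps the example runnable). Returning null from newBundle tells ResourceBundle to try the next candidate locale, so a key missing from the Spanish rows falls back to the base (English) rows.

```scala
import java.util.{Collections, List => JList, ListResourceBundle, Locale, ResourceBundle}

// Hypothetical stand-in for the messages table: (BASENAME, LANGUAGE) -> key/value rows.
// A real implementation would run a JDBC SELECT instead.
object InMemoryMsgs {
  val rows: Map[(String, String), Map[String, String]] = Map(
    ("Messages", "EN") -> Map("greeting" -> "Hello", "farewell" -> "Goodbye"),
    ("Messages", "ES") -> Map("greeting" -> "Hola")
  )
}

class TableControl(lookup: (String, String) => Option[Map[String, String]])
    extends ResourceBundle.Control {

  // A custom format name; ResourceBundle only passes it back to newBundle.
  override def getFormats(baseName: String): JList[String] =
    Collections.singletonList("table")

  // Do not fall back to the JVM default locale; the base (ROOT) bundle is the fallback.
  override def getFallbackLocale(baseName: String, locale: Locale): Locale = null

  override def newBundle(baseName: String, locale: Locale, format: String,
                         loader: ClassLoader, reload: Boolean): ResourceBundle = {
    // The base bundle's text is stored under LANGUAGE = 'EN'.
    val lang = if (locale.getLanguage.isEmpty) "EN" else locale.getLanguage.toUpperCase
    lookup(baseName, lang) match {
      case Some(entries) =>
        new ListResourceBundle {
          override protected def getContents(): Array[Array[AnyRef]] =
            entries.toArray.map { case (k, v) => Array[AnyRef](k, v) }
        }
      case None => null // no rows: let ResourceBundle try the next candidate locale
    }
  }
}
```

Calling ResourceBundle.getBundle("Messages", Locale.forLanguageTag("es"), control) returns "Hola" for greeting and the English "Goodbye" for farewell. ResourceBundle caches loaded bundles, so a real database-backed version would also need a policy for table changes, for example overriding getTimeToLive or calling ResourceBundle.clearCache(). On Java 9 and later, getBundle with a custom Control throws UnsupportedOperationException when called from a named module.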

For the application at hand, though, repeating that manual approach seemed awkward given the lessons learned on the previous application. Instead, I chose to write a simple Scala script. Without spoiling the story, I will say that the end result is pretty nice: I can rerun the process without worrying about missing a step, as was the case when doing it by hand.

Why use Scala? It seems more "composable", and it definitely has less boilerplate code. Was it faster to create this than a similar Java utility? Maybe not. Scala isn't my day-to-day language, but you won't improve without working with it, and this was a low-risk opportunity to improve my Scala skills.

import scala.io._
import java.io.{File, PrintStream}

// Language code from the suffix after the last '_' (Messages_es.properties -> "ES").
// The base bundle has no suffix and holds the English text, so it maps to "EN".
def getlang(file: File) : String = file.getName.lastIndexOf('_') match
{
    case -1 => "EN"
    case x: Int => file.getName.substring(x+1, x+3).toUpperCase
}

// Base bundle name: everything before the first '_' or '.' (Messages_es.properties -> "Messages").
// File.getName never contains a directory separator, so no path handling is needed.
def getbasename(file : File) : String =
    file.getName.replaceFirst("_.+", "").replaceFirst("\\..+", "")

val myOut = new PrintStream(new File("C:\\<path>\\<app-id>-msgs.sql"), "UTF-8")

myOut.println("set escape on")
myOut.println("set define off")   // SQL*Plus: treat & as plain text, not a substitution variable
myOut.println("truncate table <app-id>_MSGS_TBL;")

// Non-text .properties files that should not be loaded into the table.
val ignoreFiles = Set("conn.properties", "log4j.properties", "packtag.properties", "struts.properties")

for (file <- new File("C:\\<path>").listFiles
                 .filter(f => f.getName.endsWith(".properties"))
                 .filter(f => !ignoreFiles.contains(f.getName)))
{
    val lang = getlang(file)
    val basename = getbasename(file)
    try
    {
        val source = Source.fromFile(file, "Cp1252")   // the files were authored in cp1252
        try
        {
            // Keep only key=value lines and skip comment lines.
            for (line <- source.getLines().filter(l => l.indexOf("=") != -1 && l.trim.charAt(0) != '#'))
            {
                val key = line.substring(0, line.indexOf("=")).trim
                // Double single quotes for the SQL literal. No ampersand escaping is needed with
                // "set define off" (replaceAll("&", "\\&") would be a no-op anyway, because \&
                // in a regex replacement string means a literal &).
                val v = line.substring(line.indexOf("=")+1).trim.replace("'", "''")

                myOut.printf("Insert into <app-id>_MSGS_TBL(LANGUAGE,COUNTRY,VARIANT,BASENAME,KEY,VAL) values ('%s',' ' ,' ','%s', '%s', NVL(N'%s', N' '));\n", lang, basename, key, v)
            }
        }
        finally
        {
            source.close()
        }
    }
    catch
    {
        case e: Exception => println("-- Error with file:" + file.getName + " (" + e + ")")
    }
}

myOut.close()   // flush and release the output file
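getlang takes the two letters after the last underscore, which works for files like Messages_es.properties but would report the country for a file such as Messages_es_MX.properties. A hedged alternative, assuming (as getbasename already does) that base names contain no underscores, is to match the whole file name with one regex and also pick up a country for the table's COUNTRY column. BundleFile, BundleId and parseBundleFile are hypothetical names, not part of the script.

```scala
// Matches <base>[_<lang>][_<country>].properties, assuming the base name has no '_' or '.'.
val BundleFile = """([^_.]+)(?:_([a-zA-Z]{2,3}))?(?:_([a-zA-Z]{2}))?\.properties""".r

// Hypothetical result type: the BASENAME, LANGUAGE and COUNTRY values for one file.
case class BundleId(basename: String, language: String, country: String)

def parseBundleFile(name: String): Option[BundleId] = name match {
  // Optional groups that did not take part in the match come back as null.
  case BundleFile(base, lang, country) =>
    Some(BundleId(
      base,
      Option(lang).map(_.toUpperCase).getOrElse("EN"),    // base bundle = English text
      Option(country).map(_.toUpperCase).getOrElse(" ")   // the script writes ' ' for COUNTRY
    ))
  case _ => None
}
```

Non-text files such as conn.properties also match this pattern, so the ignoreFiles check would still be needed.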
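The printf call is the heart of the script. Pulling it into a small pure function makes the quoting easy to check in isolation. This is a sketch rather than the script itself: insertStatement is a hypothetical name, and the table name is passed in instead of being hard-coded. String.replace does a literal substitution, which sidesteps the special meaning that backslash and $ have in the replacement argument of replaceAll.

```scala
// Hypothetical helper that builds one INSERT line the same way the script's printf does.
def insertStatement(table: String, lang: String, basename: String,
                    key: String, value: String): String = {
  // Double single quotes so the text can sit inside a SQL string literal.
  def q(s: String): String = s.replace("'", "''")
  s"Insert into $table(LANGUAGE,COUNTRY,VARIANT,BASENAME,KEY,VAL) " +
    s"values ('$lang',' ' ,' ','$basename', '${q(key)}', NVL(N'${q(value)}', N' '));"
}
```

An empty value turns into N'', which Oracle treats as NULL, so the NVL falls back to a single space and the NOT NULL column stays satisfied.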
Notes
  • Sorry, the formatting isn't optimal.
  • I purposely removed some path/file information directly related to my employer, so this example would need minor changes to run in a real environment.
  • The source properties files only existed in the root of the source code directory, so I didn't have to worry about full Java package names or traversing subdirectories.
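  • The script escapes single quotes by doubling them. Ampersands are covered by set define off, which keeps SQL*Plus from treating & as the start of a substitution variable.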
  • The source English text was in the base bundle, but I remap it to the English language (EN).
  • The script ignores some non-text resource files, which I identified up front.
  • The script skips comment lines (lines starting with #).
  • I coded the insert statement never to insert null into the VAL column, since I defined that column as NOT NULL. A single space is used in place of null; this just fits our environment better.
  • There isn't a lot of error handling in place.
  • The original resource authors apparently used the cp1252 encoding instead of the expected ISO-8859-1 encoding, which briefly caused some confusion.
  • Executing the resulting SQL script revealed issues such as duplicate property keys/values. Each of those was researched and resolved manually (since some had different text for the same key).
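The hand-rolled parser only understands key=value lines and # comments. The .properties format also allows ! comments, : or whitespace as the separator, backslash line continuations and \uXXXX escapes. If any of those show up, java.util.Properties.load(Reader) can do the parsing while the script keeps control of the encoding. This is a swapped-in technique, not what the script does, and loadEntries and loadEntriesFromFile are hypothetical names.

```scala
import java.io.{File, FileInputStream, InputStreamReader, Reader, StringReader}
import java.util.Properties

// Parse .properties text with the JDK's own parser: # and ! comments, '=' ':' or
// whitespace separators, backslash line continuations and unicode escapes.
// Returns the pairs sorted by key.
def loadEntries(reader: Reader): Seq[(String, String)] = {
  val props = new Properties()
  props.load(reader) // a repeated key silently overwrites the earlier value
  val keys = props.stringPropertyNames().toArray(new Array[String](0)).sorted
  keys.toSeq.map(k => k -> props.getProperty(k))
}

// The encoding is chosen here, since the files were written in cp1252.
def loadEntriesFromFile(file: File): Seq[(String, String)] = {
  val in = new InputStreamReader(new FileInputStream(file), "Cp1252")
  try loadEntries(in) finally in.close()
}
```

One trade-off: because Properties keeps only the last value for a repeated key, duplicate keys would no longer surface when the SQL script runs.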
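The cp1252 versus ISO-8859-1 mix-up is easy to reproduce. The two encodings agree everywhere except bytes 0x80 to 0x9F: cp1252 (Java's windows-1252, alias Cp1252) maps most of them to printable characters such as curly quotes (0x93, 0x94) and the ellipsis (0x85), while ISO-8859-1 maps them to invisible C1 control codes. The sketch decodes the same bytes both ways. looksLikeCp1252 is a hypothetical, heuristic check, not a reliable encoding detector.

```scala
import java.nio.charset.Charset

val cp1252 = Charset.forName("windows-1252")
val latin1 = Charset.forName("ISO-8859-1")

// 0x93 and 0x94 are curly double quotes in cp1252 but C1 control codes in ISO-8859-1.
val sample: Array[Byte] = Array(0x93, 0x48, 0x69, 0x94).map(_.toByte)

val asCp1252 = new String(sample, cp1252) // U+201C, "Hi", U+201D
val asLatin1 = new String(sample, latin1) // U+0093, "Hi", U+0094

// Heuristic only: bytes 0x80-0x9F rarely appear in real ISO-8859-1 text,
// so their presence suggests the file was written as cp1252.
def looksLikeCp1252(bytes: Array[Byte]): Boolean =
  bytes.exists { b => val u = b & 0xFF; u >= 0x80 && u <= 0x9F }
```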
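Duplicate keys only showed up once the SQL script ran against the table. A pre-check could report them per file before any SQL is generated. This is a minimal sketch using the same key=value and # comment rules as the script; duplicateKeys and conflictingKeys are hypothetical helpers. Keys repeated with identical text can probably just be collapsed, while keys whose text differs are the ones that need manual research.

```scala
// Hypothetical pre-check: key -> every value it was given, for keys seen more than once.
def duplicateKeys(lines: Seq[String]): Map[String, Seq[String]] = {
  val pairs = lines
    .map(_.trim)
    .filter(l => l.contains("=") && !l.startsWith("#")) // same rules as the script
    .map { l =>
      val i = l.indexOf('=')
      (l.substring(0, i).trim, l.substring(i + 1).trim)
    }
  pairs.groupBy(_._1).collect {
    case (key, entries) if entries.size > 1 => key -> entries.map(_._2)
  }
}

// Duplicates whose text differs need a human decision; identical ones do not.
def conflictingKeys(dups: Map[String, Seq[String]]): Set[String] =
  dups.filter { case (_, values) => values.distinct.size > 1 }.keySet
```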




