First steps with Scala, a functional alternative to bash scripts…


Update: after getting some feedback, I decided to write Bash and Python versions of this script. Check them out here.

Another update: I revisited the Scala version of the script and, while I was at it, talked a little about implicit conversions, Scala’s answer to Ruby’s open classes. Have a look at it here.

Those who know me are aware that I’ve been following Play framework, and actively taking part in its community, for a couple of years.

Play framework 2.0 is right around the corner, and its core is written in Scala, so it’s a wonderful opportunity to give this object-oriented / functional hybrid beast a try…

Like many others, I will pick a very simple script to take my first steps…

Finding an excuse to give Scala a try

With a couple of friends, I am translating the Play framework documentation into Spanish (go have a look at it at http://playdoces.appspot.com/; by the way, you are more than welcome to collaborate with us).

The documentation is composed of a bunch of .textile files, and I had a very simple and silly bash script to track our progress. Every file that has not yet been translated has the phrase “todavía no ha sido traducida” in its first line.

echo pending: `grep "todavía no ha sido traducida" * | wc -l` / `ls | wc -l`

Which produced something like

pending: 40 / 63

Pretty simple, right?

I just wanted to write a simple Scala script to count the translated files, and also their size, to know how much work we had ahead.

Scala as a scripting language

Using Scala as a scripting language is pretty simple. Just enter some Scala code in a text file, and execute it with “scala file.scala”. You can also try it with the interactive interpreter, better known as the REPL (well, it’s not really an interpreter, but a Read-Evaluate-Print Loop; that’s where the REPL name comes from).

On Linux, you can also execute scripts directly from the shell by marking the Scala file as executable and adding these lines to the beginning of the file:

#!/bin/sh
exec scala "$0" "$@"
!#

Tip: you can speed up script execution A LOT by adding the -savecompiled option, as described in the scala command man page, like this:

#!/bin/sh
exec scala -savecompiled "$0" "$@"
!#

Classes and type inference in scala

So I created a DocumentationFile class, with name, length and isTranslated properties.

class DocumentationFile(val file: File) {

  val name = file.getName
  val length = file.length
  val isTranslated = (firstLine.indexOf("Esta página todavía no ha sido traducida al castellano") == -1)

  def firstLine = new BufferedReader(new FileReader(file)).readLine

}

Scala takes away a lot of boilerplate code. The constructor is right there, along with the class declaration. In our case, the DocumentationFile constructor takes a java.io.File as argument.
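
To make the point about boilerplate concrete, here is a minimal sketch with a made-up Page class (it is not part of the script; the name and fields are just for illustration). The parameters marked with val become public read-only fields, and the class body runs as part of the constructor:

```scala
// Hypothetical class, only for illustration.
class Page(val name: String, val sizeInBytes: Long) {
  // Anything in the class body runs as part of the primary constructor.
  val isEmpty = sizeInBytes == 0L
}

val page = new Page("home.textile", 2048L)
println(page.name + " -> " + page.sizeInBytes + " bytes, empty: " + page.isEmpty)
```

No getters, no field assignments and no separate constructor body were needed.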

Scala also makes heavy use of type inference to relieve us of having to declare every variable’s type. That’s why you don’t have to specify that name is a String, length a Long and isTranslated a Boolean. You still have to declare the types of method arguments, but usually you can omit them everywhere else.
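
As a rough illustration of what the compiler infers, this sketch declares the same values twice, once relying on inference and once with the types spelled out. The file name and the sizeInKb helper are assumptions made up for the example:

```scala
import java.io.File

val file = new File("some-page.textile")   // hypothetical file name

val name = file.getName                    // inferred as String
val length = file.length                   // inferred as Long
val isTextile = name.endsWith(".textile")  // inferred as Boolean

// The same declarations with the types spelled out.
val nameExplicit: String = file.getName
val lengthExplicit: Long = file.length
val isTextileExplicit: Boolean = nameExplicit.endsWith(".textile")

// Method parameters always need a type annotation;
// the return type (Long here) can still be inferred.
def sizeInKb(bytes: Long) = bytes / 1000
```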

Working with collections

Next I needed to get all textile files from the current directory, instantiate a DocumentationFile for each of them, and save them in an Array for later processing.

import java.io._

val docs = new File(".").listFiles
  .filter(_.getName.endsWith(".textile"))   // process only textile files
  .map(new DocumentationFile(_))

Technically speaking, it’s just one line of code. The “_” is just syntactic sugar; we could have written it in a more verbose way like this:

val docs = new File(".").listFiles
  .filter( file => file.getName.endsWith(".textile") )   // process only textile files
  .map( file => new DocumentationFile(file) )

Or if you are a curly braces fan:

val docs = new File(".").listFiles
  .filter { file => 
    file.getName.endsWith(".textile")         // process only textile files
  }   
  .map { file => 
    new DocumentationFile(file)
  }
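
If you want to convince yourself that the three styles are equivalent, here is a small sketch that runs them on an in-memory list of made-up file names instead of real files:

```scala
val names = List("home.textile", "notes.txt", "routes.textile")  // made-up file names

// Placeholder syntax: each "_" stands for the function's single parameter.
val withPlaceholder = names.filter(_.endsWith(".textile")).map(_.length)

// The same functions with a named parameter.
val withNamedParam = names.filter(n => n.endsWith(".textile")).map(n => n.length)

// And once more with curly braces.
val withBraces = names.filter { n => n.endsWith(".textile") }.map { n => n.length }

println(withPlaceholder)
```

All three values hold the same list, so picking one style over another is purely a matter of taste.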

Higher order functions

Once we have all textile files, we’ll need the translated ones.

val translated = docs.filter(_.isTranslated)

Here we are passing a function to the filter method as a parameter (a method that takes a function as an argument, like filter, is what is called a higher order function). That function is evaluated for every item in the Array, and if it returns true, that item is added to the resulting Array. The “_.isTranslated” part is once again just syntactic sugar. We could have also written the function as follows:

val translated = docs.filter( (doc: DocumentationFile) => doc.isTranslated )
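
Writing your own higher order function is not much harder than using one. The following is just a sketch: countWhere is a hypothetical helper, not something from the Scala library or from the script.

```scala
// Hypothetical helper: counts the items for which the predicate holds.
def countWhere[A](items: Seq[A])(predicate: A => Boolean): Int =
  items.filter(predicate).length

val lengths = Seq(120L, 0L, 4500L, 0L)   // made-up file sizes

// A function stored in a value, then passed around like any other value.
val isEmpty = (n: Long) => n == 0L

val emptyCount = countWhere(lengths)(isEmpty)     // 2
val bigCount = countWhere(lengths)(_ > 1000L)     // 1
```

Putting the function in its own parameter list lets the compiler infer the element type from the first argument, so “_ > 1000L” needs no type annotation.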

Functional versus imperative: To var or not to var

Now I need to calculate the number and size of the translated and not yet translated files. Counting the files is pretty easy: I just have to use “translated.length” to know how many files have been translated so far. But to get their size I have to sum the size of each one of them.

This was my first attempt:

var translatedLength = 0L
translated.foreach( translatedLength += _.length ) 

In Scala we can declare variables with the “var” and “val” keywords: the former are mutable, while the latter are immutable. Mutable variables are read-write, while immutable variables can’t be reassigned once their value has been established (think of them as final variables in Java).

While Scala allows you to work in an imperative or functional style, it really encourages the latter. Programming in Scala, kind of the Scala bible, even teaches how to refactor your code to avoid the use of mutable variables, and get your head used to a more functional programming style.

These are several ways I’ve found to calculate it in a more functional style (thanks to Stack Overflow!):
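
A tiny sketch of the difference, with made-up names. Note that val only fixes the reference, just like final in Java: the object it points to can still be mutable.

```scala
val limit = 10
// limit = 11            // would not compile: reassignment to val

var counter = 0
counter += 1             // fine, counter is a var

// val fixes the reference, not the object it points to.
val buffer = new StringBuilder
buffer.append("still mutable")

println(limit + " " + counter + " " + buffer)
```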

val translatedLength: Long = translated.foldLeft(0L)( (acum: Long, element: DocumentationFile) => acum + element.length )

//type inference to the rescue
val translatedLength = translated.foldLeft(0L)( (acum, element) => acum + element.length )

//syntactic sugar
val translatedLength = translated.foldLeft(0L)( _ + _.length )

// yes, an if statement is also an expression, just like Java's a ? b : c operator.
val translatedLength = if (translated.length == 0) 0L else translated.map(_.length).sum

I finally settled on this simple and short form:

val translatedLength = translated.map(_.length).sum
val docsLength = docs.map(_.length).sum
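
To compare the approaches side by side, here is a sketch that applies them to a plain Array of made-up sizes, so it runs without any files on disk:

```scala
val sizes = Array(1200L, 800L, 4096L)   // made-up file sizes in bytes

// Imperative: accumulate into a mutable variable.
var imperativeTotal = 0L
sizes.foreach(imperativeTotal += _)

// Functional: fold the array into a single value, starting from 0.
val foldedTotal = sizes.foldLeft(0L)(_ + _)

// Shortest: let the collection do the sum.
val summedTotal = sizes.sum

// sum on an empty collection is simply zero.
val emptyTotal = Array.empty[Long].sum

println(imperativeTotal + " " + foldedTotal + " " + summedTotal + " " + emptyTotal)
```

Since sum already returns zero for an empty collection, guarding it with an if on the length is not strictly necessary.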

Default parameters and passing functions as arguments

Now I have all the information I needed, so I just have to show it on screen. I also wanted to show the file sizes in kilobytes.

Once again this was my first attempt:

println( 
  "translated size: " + asKB(translatedLength) + "/" + asKB(docsLength) + " " + 
  translatedLength * 100 / docsLength + "% "
)

println( 
  "translated files: " + translated.length + "/" + docs.length + " " + 
  translated.length * 100 / docs.length + "% "
)

def asKB(length: Long) = (length / 1000) + "kb"

And this was the output:

translated size: 256kb/612kb 41% 
translated files: 24/64 37% 
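
A side note on the arithmetic: everything here is integer math on Long values, so results are truncated, and the order of the operations matters. This sketch uses the file counts from the output above; the variable names are mine:

```scala
val translatedFiles = 24L
val totalFiles = 64L

// Multiply first: 2400 / 64 is 37.5, truncated to 37.
val percent = translatedFiles * 100 / totalFiles

// Divide first: 24 / 64 is truncated to 0, and 0 * 100 is still 0.
val divideFirst = translatedFiles / totalFiles * 100

// Going through a Double and rounding gives 38 instead.
val rounded = math.round(translatedFiles * 100.0 / totalFiles)

println(percent + " " + divideFirst + " " + rounded)
```

That is why the script multiplies by 100 before dividing.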

Well, it worked, but it could definitely be improved: there was too much code duplication.

So I created a function that took care of it all:

def status(
  title: String = "status", 
  current: Long, total: Long, 
  format: (Long) => String = (x) => x.toString): String = {

  val percent = current * 100 / total

  title + ": " + format(current) + "/" + format(total) + " " +
  percent + "%" +
  " (pending " + format(total - current) + " " +
  (100-percent) + "%)"
}

The only tricky part is the format parameter. It’s just a function passed as an argument (which makes status a higher order function), and by default it simply converts the number it receives to a String.

We use that function like this:

println( 
  status("translated size", translatedLength, docsLength, (length) => asKB(length) ) 
)

println( 
  status("translated files", translated.length, docs.length) 
)
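
Two more things are worth trying with this function. Because title is the first parameter, its default value can only be used by naming the other arguments. A method name such as asKB can also be passed directly where a function is expected. The sketch below repeats both definitions so it runs on its own, and uses made-up numbers:

```scala
def asKB(length: Long): String = (length / 1000).toString + "kb"

def status(
  title: String = "status",
  current: Long, total: Long,
  format: Long => String = x => x.toString): String = {

  val percent = current * 100 / total

  title + ": " + format(current) + "/" + format(total) + " " +
  percent + "%" +
  " (pending " + format(total - current) + " " +
  (100 - percent) + "%)"
}

// Named arguments let us skip title and fall back to its default.
val withDefaults = status(current = 24, total = 64)

// A method name can be passed where a function is expected.
val inKb = status("translated size", 256000L, 612000L, asKB)

println(withDefaults)   // status: 24/64 37% (pending 40 63%)
println(inKb)           // translated size: 256kb/612kb 41% (pending 356kb 59%)
```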

And that’s it.

It’s really easy to achieve this kind of thing using Scala as a scripting language, and along the way you may learn a couple of interesting concepts and take your first steps into functional programming.

In the next article, I have a look at bash and Python versions of this script, to compare the scripting capabilities of each language.

This is the complete script. There is also a GitHub gist, and you can find it in the Play Spanish documentation project.

#!/bin/sh
exec scala -savecompiled "$0" "$@"
!#

import java.io._

val docs = new File(".").listFiles
  .filter(_.getName.endsWith(".textile"))   // process only textile files
  .map(new DocumentationFile(_))

val translated = docs.filter(_.isTranslated)    // only already translated files

val translatedLength = translated.map(_.length).sum
val docsLength = docs.map(_.length).sum

println( 
  status("translated size", translatedLength, docsLength, (length) => asKB(length) ) 
)

println( 
  status("translated files", translated.length, docs.length) 
)

def status(
  title: String = "status", 
  current: Long, total: Long, 
  format: (Long) => String = (x) => x.toString): String = {

  val percent = current * 100 / total

  title + ": " + format(current) + "/" + format(total) + " " +
  percent + "%" +
  " (pending " + format(total - current) + " " +
  (100-percent) + "%)"
}

def asKB(length: Long) = (length / 1000) + "kb"

class DocumentationFile(val file: File) {

  val name = file.getName
  val length = file.length
  val isTranslated = (firstLine.indexOf("Esta página todavía no ha sido traducida al castellano") == -1)

  override def toString = "name: " + name + ", length: " + length + ", isTranslated: " + isTranslated

  def firstLine = new BufferedReader(new FileReader(file)).readLine

}
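
One thing the script glosses over is that firstLine never closes the reader, and readLine returns null for an empty file. For a short-lived script this hardly matters, but a more careful variant could look like the sketch below. firstLineOf is a hypothetical helper of mine, not part of the original script:

```scala
import java.io.{BufferedReader, File, FileReader}

// Hypothetical helper, not part of the original script.
def firstLineOf(file: File): String = {
  val reader = new BufferedReader(new FileReader(file))
  try {
    // readLine returns null when the file is empty; treat that as an empty line.
    Option(reader.readLine).getOrElse("")
  } finally {
    reader.close()   // always release the file handle
  }
}
```

Inside DocumentationFile it could be used as “def firstLine = firstLineOf(file)”.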

updates:

I added some tips and advice I got from all the positive feedback on this article:

    2012-01-13 changed the title from “First steps with Scala, say goodbye to bash scripts…” to the current one; it’s less polemical, but more accurate.

12 responses to this post.

  1. Thanks for the introduction. It was a bit hard to follow because of my lack of functional programming skills. By putting a bit more effort I was able to see a bit better how useful scala can be. Keep posting so we can get the most out of Play 2.0.

  2. Posted by Jörg B on 6 December, 2011 at 11:23

    Very nice post, thanks.
    At the end of the day, I’d prefer your bash-one-liner, though. ;-)

  3. ok, Jörg, maybe it’s not a goodbye but a see you later… jejeje

  4. Do you think it’s possible to avoid spawning an additional shell process by replacing the beginning (i.e., the shebang line) with something along the lines of:
    #!/bin/scala "$0" "$@"

    • I’m no bash expert (and obviously neither a scala expert). I tried making a symlink to /bin/scala with sudo ln /home/sas/devel/scala/bin/scala /bin/scala, but when I run scala I get the following error:

      java.lang.ClassNotFoundException
      - klass: 'java/lang/ClassNotFoundException'

      so I tried with
      #!/home/sas/devel/scala/bin/scala "$0" "$@"

      and when I run it I get:

      Exception in thread "main" java.lang.RuntimeException: Cannot figure out how to run target: "$0" "$@"
      at scala.sys.package$.error(package.scala:27)

      As I said, I’m no bash expert…

  5. Posted by Franck_re on 12 January, 2012 at 9:13

    Nice exercise. I personally find the one-liner better for anything beyond an exercise though. That’s reminiscent of Knuth vs Unix, see http://www.leancrew.com/all-this/2011/12/more-shell-less-egg/

  6. Startup time of scala is sure bigger… What is wrong with the bash one-liner? I think it is a lot more elegant…

    Of course, If you’re looking for an excuse to delve into scala, indulge…

  7. here’s a very interesting discussion on Hacker news, lots of useful knowledge and some bash and python wisdom: http://news.ycombinator.com/item?id=3455114

    Thanks to that, I wrote another article making peace with bash and python.

  8. Hi, what about library dependencies? Let’s say your script requires the jfreechart library. How do you manage the dependencies in a simple way? http://www.jfree.org/jfreechart/

    • I recall that for a tiny script I had to make to manipulate excel files there was a command line parameter to tell the folder where jars could be found.

      If I recall correctly it was something like this:

      scala -classpath joda-time.jar

      I’m also pretty sure there must be some way to achieve it with sbt (it seems like everything can be done with sbt) but sbt is not very user friendly…
