Monday, August 10, 2009

Calling Java APIs

Calling Java APIs from Scala is completely seamless. I will demonstrate this functionality by copying data from a URL to a file and then making a copy of that file.

Important Note: Scala 2.8 is getting a redesigned API for accessing files and streams, based on JSR-203 (the new NIO). It is quite nice to use. For example, File("/tmp") / "dir" / "dir2" will create a File for /tmp/dir/dir2. That is just the most basic of what you can expect. I will do a couple of topics on that when it gets closer to being finalized.
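
For a rough feel of that path-building style, here is a minimal sketch using the JSR-203 API as it ships in Java 7 and later (java.nio.file). It is only a stand-in: the Scala 2.8 API mentioned above was not final, so nothing here should be read as its actual shape. The names base and nested are just illustrative.

```scala
import java.nio.file.{Path, Paths}

// Java NIO.2 (JSR-203) path API, used as a stand-in for the
// not-yet-final Scala 2.8 file API described in the note above.
val base: Path = Paths.get("/tmp")
val nested: Path = base.resolve("dir").resolve("dir2")

// resolve only builds a path value; nothing is created on disk.
println(nested)
```

Like the File("/tmp") / "dir" / "dir2" expression, each step returns a new immutable path, so the pieces can be combined freely.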

    scala> import java.net._
    import java.net._
    scala> import scala.io._
    import scala.io._
    scala> import java.io.{File, FileWriter}
    import java.io.{File, FileWriter}
    scala> val in = Source.fromURL("http://www.google.com")
    in: scala.io.Source = non-empty iterator
    scala> // Until Scala 2.8 we have to use the standard Java streams and idioms
    scala> val out = new FileWriter("/tmp/daily-scala")
    out: java.io.FileWriter = java.io.FileWriter@71d0e17a
    scala> try {
         | out.write( in.getLines.mkString("\n") )
         | } finally {
         | out.close
         | }
    scala> // now let's copy the file
    scala> val copy = new FileWriter("/tmp/copy")
    copy: java.io.FileWriter = java.io.FileWriter@7bfd25ce
    scala> try {
         | copy.write( Source.fromFile("/tmp/daily-scala").getLines.mkString("\n") )
         | } finally {copy.close}
    scala> val copy2 = new FileWriter("/tmp/copy2")
    copy2: java.io.FileWriter = java.io.FileWriter@7bfd25ce
    scala> // You can reuse a source if you reset it
    scala> try {
         | copy2.write( in.reset.getLines.mkString("\n") )
         | } finally {copy2.close}
    scala> // Change all 'e' to upper case. We could write this to a file if we desired
    scala> in.reset.getLines.mkString("\n").map( c => if (c == 'e') c.toUpperCase else c).mkString("")
    res9: String = This is thE dEmo filE
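
The session above closes every FileWriter but never closes the Source it reads from. As a minimal sketch of the same write-then-copy flow outside the REPL, the script below wraps both sides in try/finally. The helpers readAll and writeAll are hypothetical names introduced here, and temporary files stand in for the URL and the /tmp paths. It assumes Scala 2.8 or later, where getLines takes parentheses.

```scala
import java.io.{File, FileWriter}
import scala.io.Source

// Hypothetical helper (not from the post): read a whole text file,
// always closing the Source afterwards.
def readAll(file: File): String = {
  val src = Source.fromFile(file)
  try src.getLines().mkString("\n") finally src.close()
}

// Hypothetical helper: write text to a file, always closing the writer.
def writeAll(file: File, text: String): Unit = {
  val out = new FileWriter(file)
  try out.write(text) finally out.close()
}

// Temporary files stand in for the URL and the /tmp paths used above.
val original = File.createTempFile("daily-scala", ".txt")
val copied = File.createTempFile("daily-scala-copy", ".txt")
original.deleteOnExit()
copied.deleteOnExit()

writeAll(original, "This is the demo file")
writeAll(copied, readAll(original))

println(readAll(copied)) // This is the demo file
```

Keeping the open/close pair inside one small function means a caller cannot forget the finally block.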

4 comments:

  1. Hi Jesse

    Thanks a lot for this valuable sample!

    I pushed the sample further and tried to go for binary copies in scala.

    So after a while I got the following snippet to do the job in scala:

    [code]

    import java.io._

    val src = "./test.txt"
    val dst = "/home/simon/workspace/scala/test_copy.txt"

    val in = new FileInputStream(src)
    val out = new FileOutputStream(dst)

    // Transfer bytes from in to out, one buffer at a time
    val buf = new Array[Byte](1024)
    var len = 0

    // read returns the number of bytes read, or -1 at end of stream
    len = in.read(buf)
    while (len > 0) {
      out.write(buf, 0, len)
      len = in.read(buf)
    }

    in.close()
    out.close()

    [/code]

    Now I feel I could do better: I don't like this additional line:
    len = in.read(buf)
    at the beginning of the loop.

    This looks as if with Java it'd be less code and more elegant because then I could leave out that line
    and write the while statement like this:

    while ((len = in.read(buf)) > 0) {

    This compiles and even runs under Scala, but for known reasons I get a zero-length copy
    plus a
    "comparing values of types Ordered[Unit] and Unit using `>' will always yield false" warning.

    So how could I do better?

    Thanks a lot
    Patrick

  2. I have a couple of points for you. First, = in Scala returns Unit, not the assigned value. So you cannot do:

    val x = y = 1

    This obviously rules out while ((len = in.read(buf)) > 0).

    I can't honestly remember the reasoning behind this.


    Right now I would not obsess about the best way to copy in Scala, because the API is changing dramatically in Scala 2.8. For example:

    Source.fromFile("xyz") pumpto Sink.fromFile("out")

    I don't know the API yet, but that is the sort of thing you can expect.

    The reason you get a zero-length copy is that (len = in.read(buf)) evaluates to Unit, and comparing Unit with 0 using > is always false, so the loop body never runs.
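
    Two ways to avoid the duplicated read line without relying on the value of an assignment can be sketched as follows. This is only an illustration under stated assumptions: copyWithWhile and copyWithIterator are hypothetical names, Iterator.continually needs Scala 2.8 or later, and in-memory streams stand in for the files.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, InputStream, OutputStream}

// Variant 1: keep the assignment inside a block whose last expression
// is the comparison, so the loop condition is a Boolean, not Unit.
def copyWithWhile(in: InputStream, out: OutputStream): Unit = {
  val buf = new Array[Byte](1024)
  var len = 0
  while ({ len = in.read(buf); len > 0 }) {
    out.write(buf, 0, len)
  }
}

// Variant 2: no var at all. Iterator.continually (Scala 2.8+) re-evaluates
// in.read(buf) on each step until it stops returning a positive count.
def copyWithIterator(in: InputStream, out: OutputStream): Unit = {
  val buf = new Array[Byte](1024)
  Iterator.continually(in.read(buf))
    .takeWhile(_ > 0)
    .foreach(n => out.write(buf, 0, n))
}

// In-memory streams stand in for the files so the sketch is self-contained.
val data = "This is the demo file".getBytes("UTF-8")
val sink1 = new ByteArrayOutputStream()
val sink2 = new ByteArrayOutputStream()
copyWithWhile(new ByteArrayInputStream(data), sink1)
copyWithIterator(new ByteArrayInputStream(data), sink2)
```

    Both variants read lazily, one buffer at a time, so each chunk is written before the buffer is overwritten by the next read.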

    As for a more "functional" way to do this, you could use recursion. It is totally unnecessary, but if you want to learn to program in a functional manner then you would probably do:
    import java.io._

    val in = new FileInputStream("/tmp/in")
    val out = new FileOutputStream("/tmp/out")

    // Copy one buffer, then recurse until read stops returning data
    def copy(in: FileInputStream, out: FileOutputStream): Unit = {
      val buf = new Array[Byte](1024)
      val len = in.read(buf)
      if (len > 0) {
        out.write(buf, 0, len)
        copy(in, out)
      }
    }

    copy(in, out)
    in.close()
    out.close()

    I have a bug in this little program that I don't have time to figure out. It should be turned into a loop thanks to tail-call optimization, so you don't get a stack overflow. I think I may have to ask the Scala list.

  3. Ah, I figured out the bug (with some help). The copy method could be overridden by a subclass, so it can't be optimized. It has to be final or it has to be an inner function:

    def copyFile(..)={
    def copy(...){...}
    }

    or final:

    final def copy(..){...}

    I made that change, compiled it, and then used jd-gui to look at the bytecode, and it is transformed into a for loop. I would still like to do a speed test to see whether the repeated allocation of the array is expensive. I have been told that the JVM should reuse the same object, so the difference should be minimal.
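
    A minimal sketch of the inner-function form, under a few assumptions: copyStream and loop are hypothetical names, the @tailrec annotation needs Scala 2.8 or later, and in-memory streams stand in for the files. Allocating the buffer outside the recursive step also sidesteps the repeated-allocation question.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, InputStream, OutputStream}
import scala.annotation.tailrec

// The recursive step is an inner function, so it cannot be overridden
// and the compiler is free to turn it into a loop.
def copyStream(in: InputStream, out: OutputStream): Unit = {
  // Allocate the buffer once instead of once per recursive call.
  val buf = new Array[Byte](1024)

  // @tailrec (Scala 2.8+) makes compilation fail if the call is not in
  // tail position, instead of silently risking a stack overflow.
  @tailrec
  def loop(): Unit = {
    val len = in.read(buf)
    if (len > 0) {
      out.write(buf, 0, len)
      loop()
    }
  }

  loop()
}

// 1 MB of data: over a thousand iterations with a 1024-byte buffer.
val data = Array.fill[Byte](1024 * 1024)(42.toByte)
val sink = new ByteArrayOutputStream()
copyStream(new ByteArrayInputStream(data), sink)
println(sink.size())
```

    With the annotation in place there is no need to inspect the bytecode: if the compiler cannot optimize the call, the code does not compile.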

  4. I think the final 'mkString("")' is redundant in this line:

    scala> in.reset.getLines.mkString("\n").map( c => if (c == 'e') c.toUpperCase else c).mkString("")
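
    A quick way to check this is sketched below, assuming Scala 2.8 or later, where mapping a Char => Char function over a String yields a String again. On the 2.7 release that was current when the post was written, the result may have been a plain sequence of characters, which would explain the extra call. The names text and upperE are just illustrative.

```scala
// Assumes Scala 2.8 or later: map with a Char => Char function
// returns a String, so no trailing mkString("") is needed.
val text = "This is the demo file"
val upperE: String = text.map(c => if (c == 'e') c.toUpperCase else c)
println(upperE) // This is thE dEmo filE
```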
