Spark, Shark, and BDAS In the News

(NOTE: I continue to update this list as I encounter new posts. Email me if you know of an article I haven’t listed yet.)

Several other contributors to the component projects of the Berkeley Data Analytics Stack (BDAS), especially Spark and Shark, and I have been trying to keep track of news articles that mention or are about the projects. Below, I’ll discuss why this sort of news coverage is unusual and well deserved, but first the list:

Why did we build this list? I’m a huge fan of the Spark project, and I also contribute to it (mostly documentation and community-related work). Spark was born and developed in the UCB AMPLab. The lab’s approach to releasing and supporting high-quality open source software projects, and to fostering communities around them, is unusual, and it has resulted in serious adoption of the projects coming out of the group.

Research labs traditionally spend very few resources on promoting adoption of the projects built as part of their research agenda, because the research ideas can often be tested using throw-away software prototypes. The extra energy required to turn such prototypes into production-quality projects is not obviously worth the effort. The folks in the AMPLab believe it is. Not only do graduate students spend a lot of time answering questions on developer and user mailing lists, but the lab itself is investing in the effort. For example, the AMPLab recently hired Matt Massie, Cloudera engineer #5, and he is recruiting a rock-star team charged with testing and hardening the software coming out of the lab.

If the goal is “free”, high-quality, next-generation software, then how can we measure whether we are succeeding? One way is to measure adoption of the BDAS software for production use, along with the grassroots community activity of hundreds of BDAS enthusiasts. Another is to track discussion of the software in the media, which the list above aims to do. By each of these measures, AMPLab projects like Spark and Shark are having tremendous impact!

Feel free to email me if you know of other articles, or if you are a technology journalist or reporter and would like introductions to the key folks on the Spark, Shark, Mesos or BDAS projects.

AppleScript, the most unnatural natural language

For someone coming from a computer science background, AppleScript is one of the most difficult languages to learn and use. Most scripting languages offer the same core functionality, just expressed in slightly different ways. For example, most languages support conditional statements such as if-then/else, iteration (e.g., for, while, until), and some mechanism for subroutines. In most popular scripting languages, operators are straightforward: plus and equal signs, brackets or braces, etc.

However, Apple has created a language that throws off the chains of cryptic language conventions: AppleScript, a programming language (of sorts) that is based on natural language. Sounds great, right? It will be just like writing a letter to mom. Let’s get started.

Dear AppleScript interpreter,

Will you please start at the number 1 and count up to 1000 and do [something useful] each time you count up to the next number?

–your loyal user.

AppleScript’s response:

Dear AppleScript user,

57974, error.

–signed AppleScript.

(Read: ha! Good luck with that. You’re lucky I even gave you this stupid error message, you stupid loser. Learn a real programming language.)

The real problem here is that we humans are really, really good at expressing one idea in a nearly infinite number of ways, but AppleScript only captures and recognizes about 3 of those. So much for natural.
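For reference, one of the phrasings AppleScript does accept for the request in the letter above is a repeat loop. This is only a minimal sketch: the running total is an illustrative stand-in for the [something useful] step, and the variable names are my own.

```applescript
-- Count from 1 to 1000, doing something on each pass.
-- Here the "something" is just adding the counter to a running total.
set total to 0
repeat with i from 1 to 1000
    set total to total + i
end repeat
total -- the script's result: 500500
```

Note that the loop counter, the bounds, and the closing "end repeat" all have to appear in exactly this arrangement, which is rather the point of the complaint.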

Next, in contrast to the premise of the language, AppleScript’s documentation is horrible. For example, in the official online AppleScript Language Guide, under “The Language At a Glance”, we find this unintuitive explanation of the syntax for using handlers:

Handler Syntax
Subroutine definition
(labeled parameters)
( on | to ) subroutineName ¬
    [ of | in directParameterVariable ] ¬
    [ subroutineParamLabel paramVariable ]… ¬
    [ given label:paramVariable [, label:paramVariable ]… ]
    [ global variable [, variable ]… ]
    [ local variable [, variable ]… ]
    [ statement ]…
end [ subroutineName ]

For a language whose motivation is doing things the intuitive, natural-language way, it’s ironic that the official AppleScript documentation is more difficult to understand than that of most other programming languages. It is far behind Java’s simple yet effective API documentation; even Perl’s online documentation is better.

Also, finding answers to AppleScript questions online is more difficult and less fruitful than analogous searches for help with many other languages. For my own future reference, and for others, here are a few snippets that I found useful along the way.

Handling errors:

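Use a try/end try block, and add an “on error” clause before the “end try” (the “on error” clause has no closing statement of its own; “end try” closes the whole block). Something like

try
    -- code goes here
on error errorStr number errorNum
    -- errorStr is the error message, errorNum the error number
    display dialog errorStr & ": " & errorNum
end try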

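Subroutines:

Here is a template for using a subroutine. Don’t forget the “my” in the function call: inside a tell block, a call without it is sent to the target application (here, Finder) instead of to your own script.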

tell application "Finder"
    -- "my" directs the call to this script rather than to Finder
    my mySub()
end tell

on mySub()
    display dialog "in mySub"
end mySub
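The same pattern extends to subroutines that take parameters and return a value. The sketch below is only an illustration: addNumbers is a hypothetical handler name I made up, not something provided by Finder or any other application.

```applescript
-- A hypothetical handler with two positional parameters and a return value
on addNumbers(a, b)
    return a + b
end addNumbers

tell application "Finder"
    -- "my" is still required because the call is inside a tell block
    set total to my addNumbers(2, 3)
end tell
total -- the script's result: 5
```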

If you’d like, leave a comment to share your own tips.