Android for ActionScript developers, part 4: Reading XML is nuts

One of the most baffling things for me when I started messing around with Android was dealing with XML files, especially parsing them. In ActionScript, parsing XML is simple enough:

var xml:XML = new XML("<root>Some-stuff</root>");

And you can get children pretty easily using E4X, or the standard syntax. For example, to get a list of nodes:

var children:XMLList = xml.child("someNodeName");

It is not quite so easy in Java. In a way, because of how the platform is constructed (you have lower-level access to most of its moving parts), XML parsing is a bit more convoluted, since you have to handle the data yourself. When I first set out to parse an XML response in Java, most of the documentation I read recommended something called a SAX parser. This is how you would do the top-level parsing, using code lifted from this tutorial:

SAXParserFactory spf = SAXParserFactory.newInstance();
try {
	//get a new instance of parser
	SAXParser sp = spf.newSAXParser();
	//parse the file and also register this class for call backs
	sp.parse("employees.xml", this);
}catch(SAXException se) {
	se.printStackTrace();
}catch(ParserConfigurationException pce) {
	pce.printStackTrace();
}catch (IOException ie) {
	ie.printStackTrace();
}

This handles the basic parsing call. But, of course, you first have to write the parser class itself (the object passed to the parse call, which receives the callbacks) before you can run that. This low-level XML parsing works similarly to some other languages (I specifically remember doing something similar in PHP), in the sense that you set up callbacks for certain events (a node starting or ending) and deal with the data that gets passed to you in each of these calls. Again, using code from the tutorial linked above:

//Event Handlers
public void startElement(String uri, String localName, String qName,
	Attributes attributes) throws SAXException {
	//reset
	tempVal = "";
	if(qName.equalsIgnoreCase("Employee")) {
		//create a new instance of employee
		tempEmp = new Employee();
		tempEmp.setType(attributes.getValue("type"));
	}
}

public void characters(char[] ch, int start, int length) throws SAXException {
	//append rather than overwrite: the parser may deliver one element's text in several chunks
	tempVal += new String(ch, start, length);
}

public void endElement(String uri, String localName,
	String qName) throws SAXException {

	if(qName.equalsIgnoreCase("Employee")) {
		//add it to the list
		myEmpls.add(tempEmp);

	}else if (qName.equalsIgnoreCase("Name")) {
		tempEmp.setName(tempVal);
	}else if (qName.equalsIgnoreCase("Id")) {
		tempEmp.setId(Integer.parseInt(tempVal));
	}else if (qName.equalsIgnoreCase("Age")) {
		tempEmp.setAge(Integer.parseInt(tempVal));
	}
}

This is the parsing for a very basic XML structure:

<?xml version="1.0" encoding="UTF-8"?>
<Personnel>
	<Employee type="permanent">
		<Name>Seagull</Name>
		<Id>3674</Id>
		<Age>34</Age>
	</Employee>
</Personnel>
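For reference, the pieces above can be assembled into one class that runs end to end. This is only a sketch under a few assumptions: the tutorial's Employee class is not shown in this post, so the Employee value object here is a hypothetical stand-in, the class and method names are made up, and the XML is read from a string instead of employees.xml so the example is self-contained.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class EmployeeSaxDemo {

    // Hypothetical value object standing in for the tutorial's Employee class.
    static final class Employee {
        String type;
        String name;
        int id;
        int age;
    }

    // Receives the SAX callbacks and builds the list of employees.
    static final class EmployeeHandler extends DefaultHandler {
        final List<Employee> employees = new ArrayList<>();
        private final StringBuilder text = new StringBuilder();
        private Employee current;

        @Override
        public void startElement(String uri, String localName, String qName,
                                 Attributes attributes) {
            text.setLength(0); // reset the text buffer for the new element
            if (qName.equals("Employee")) {
                current = new Employee();
                current.type = attributes.getValue("type");
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // Text may arrive in several chunks, so accumulate it.
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            String value = text.toString().trim();
            switch (qName) {
                case "Employee": employees.add(current); break;
                case "Name": current.name = value; break;
                case "Id": current.id = Integer.parseInt(value); break;
                case "Age": current.age = Integer.parseInt(value); break;
                default: break; // ignore anything else, e.g. Personnel
            }
        }
    }

    // Parses the XML string and returns the employees found in it.
    static List<Employee> parse(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            EmployeeHandler handler = new EmployeeHandler();
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return handler.employees;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<Personnel><Employee type=\"permanent\">"
                + "<Name>Seagull</Name><Id>3674</Id><Age>34</Age>"
                + "</Employee></Personnel>";
        for (Employee e : parse(xml)) {
            System.out.println(e.name + " (" + e.type + "), id " + e.id + ", age " + e.age);
        }
    }
}
```

Even in this compact form, every element name the handler cares about is hard-coded into it.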

The problem with this approach is simple to understand: this parser will only work for this specific XML structure. For every new XML format you use, you need a new parsing class. This makes sense if you always want your XML transformed into something specific (a collection of value objects, for instance), but if you just want to read some XML and quickly iterate through a node, you have to go through the whole process of creating a separate parser class and a new class to hold the values, doing the parsing, and only then iterating through the new object you created. If you have a complex XML schema, this can get pretty annoying. Comparing this to the one-line parsing method I was used to in Flash made me realize a big part of the ethos of the Java platform: everything is a bit more complicated, like zigzagging through a bunch of obstacles when going from point A to point B, and it’s up to you to create shortcuts to make your life easier.

Often, there is more than one way to do something in Java, without a single officially established way. That’s also true of XML. The same tutorial I linked, for instance, has a second example, using a DOM parser to read the XML. It’s a much more flexible solution.
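As a rough idea of what the DOM route looks like, here is a minimal sketch using the standard javax.xml.parsers and org.w3c.dom classes. It is not the tutorial's code: the class name EmployeeDomDemo and the method employeeNames are made up for illustration, and the element names assume the Personnel/Employee sample shown earlier.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class EmployeeDomDemo {

    // Loads the whole document into memory, then walks it like a tree.
    static List<String> employeeNames(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));

            List<String> names = new ArrayList<>();
            NodeList employees = doc.getElementsByTagName("Employee");
            for (int i = 0; i < employees.getLength(); i++) {
                Element employee = (Element) employees.item(i);
                Node name = employee.getElementsByTagName("Name").item(0);
                if (name != null) { // skip employees without a Name node
                    names.add(name.getTextContent());
                }
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<Personnel><Employee type=\"permanent\">"
                + "<Name>Seagull</Name><Id>3674</Id><Age>34</Age>"
                + "</Employee></Personnel>";
        System.out.println(employeeNames(xml));
    }
}
```

No handler class is needed here, and the same few calls work for any document, at the cost of holding the whole tree in memory.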

What nags me about this is that, maybe due to the over-engineering of the platform, the additional solutions are often still too complicated to use repeatedly. Consider an example of how to read XML using XPath syntax, found on this page:

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true); // never forget this!
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse("books.xml");

// Create an XPath factory, and use it to create an XPath object we can use
// (named xPathFactory because "factory" is already declared above)
XPathFactory xPathFactory = XPathFactory.newInstance();
XPath xpath = xPathFactory.newXPath();

// Compile the expression and evaluate it against the document
XPathExpression expr = xpath.compile("//book[author='Neal Stephenson']/title/text()");
Object result = expr.evaluate(doc, XPathConstants.NODESET);

// Cast the result to a DOM NodeList and iterate through that to find all the titles:
NodeList nodes = (NodeList) result;
for (int i = 0; i < nodes.getLength(); i++) {
	System.out.println(nodes.item(i).getNodeValue());
}

In a way, I understand the need for approaches like this; they’re giving you more power, and allowing you to tailor the results to your needs in terms of result and performance.

However, something should be said for ease and speed of development. If I have an XML object in front of me, in whatever shape or form, do I really want to write a complicated parser or additional classes every time I want to lift the value of a node from it? I find any (potential) disadvantages of higher-level libraries hard to hold against them when they make actual development time shorter, buying me time I could spend on UI polishing and responsiveness, for example.

So from my high-level platform developer point of view, the problem seems to be that, with Java, you have to look around for a solution that meets your needs. Much like some features missing from the ActionScript language (tweening, for example) are normally solved by third-party libraries, the Java world seems to benefit immensely from libraries created by developers who may have a different view of how their development time should be spent. In that sense, I know now that there are several more sensible XML solutions out there, but that didn’t stop me from rolling my own extremely simple XML parsing solution that, admittedly, mimics most of the way XML parsing works in ActionScript. It has no XPath/E4X parsing, but its interface is simple enough that instantiating an XML from a number of source types, iterating through nodes, and reading any kind of data from it is pretty straightforward.
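A wrapper along those lines could look something like the sketch below. It is not the actual library, just an illustration of the idea: the SimpleXml class and its parse, child, text and attribute methods are hypothetical names, built on top of the standard DOM classes, and it only accepts a string as its source.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical ActionScript-style wrapper around a DOM element.
final class SimpleXml {
    private final Element element;

    private SimpleXml(Element element) {
        this.element = element;
    }

    // One call to go from a string to a usable object, like new XML("...").
    static SimpleXml parse(String source) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(source)))
                    .getDocumentElement();
            return new SimpleXml(root);
        } catch (Exception e) {
            throw new IllegalArgumentException("Invalid XML", e);
        }
    }

    // Direct children with the given name, similar to xml.child("name").
    List<SimpleXml> child(String name) {
        List<SimpleXml> result = new ArrayList<>();
        NodeList nodes = element.getChildNodes();
        for (int i = 0; i < nodes.getLength(); i++) {
            Node node = nodes.item(i);
            if (node.getNodeType() == Node.ELEMENT_NODE
                    && node.getNodeName().equals(name)) {
                result.add(new SimpleXml((Element) node));
            }
        }
        return result;
    }

    // Text content of this node and its descendants.
    String text() {
        return element.getTextContent();
    }

    // Attribute value, or an empty string when the attribute is missing.
    String attribute(String name) {
        return element.getAttribute(name);
    }

    public static void main(String[] args) {
        SimpleXml xml = SimpleXml.parse("<Personnel><Employee type=\"permanent\">"
                + "<Name>Seagull</Name></Employee></Personnel>");
        for (SimpleXml employee : xml.child("Employee")) {
            System.out.println(employee.attribute("type") + ": "
                    + employee.child("Name").get(0).text());
        }
    }
}
```

Throwing an unchecked exception from parse is a deliberate choice in this sketch, so that the calling code stays as short as the ActionScript one-liner.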

XML parsing in Java is pretty symbolic of the whole platform: you have absolute power over what you’re doing, and how, but that also means you have to get the building blocks together, or at least employ blocks built by someone else, to create a solution you’re comfortable with. Otherwise, you’ll feel you are constantly repeating yourself, creating the same factories and builders over and over again. In the Java world, it’s as if the industrial revolution never ended.
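As one example of such a building block, the XPath factories can be hidden behind a single static method, written once and reused. This is just a sketch: XmlQuery and its strings method are hypothetical names, and the books XML is sample data made up to match the XPath expression used earlier.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper that hides the XPath factory boilerplate.
final class XmlQuery {

    // Evaluates an XPath expression and returns the text of each matching node.
    static List<String> strings(String xml, String expression) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance()
                    .newXPath()
                    .evaluate(expression,
                            new InputSource(new StringReader(xml)),
                            XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent());
            }
            return values;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Sample data invented for this sketch.
        String xml = "<books>"
                + "<book><author>Neal Stephenson</author><title>Snow Crash</title></book>"
                + "<book><author>Someone Else</author><title>Another Book</title></book>"
                + "</books>";
        System.out.println(strings(xml, "//book[author='Neal Stephenson']/title/text()"));
    }
}
```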

  • susie

    Could this be why Java is better for business models where the values are more static and nothing has to be developed on the fly? And some other method is better for more flexible web apps and such… another article on that would be great!

  • Zeh

    @susie: that’s probably it. My experience with the Java platform is not as big as what I have with ActionScript, so I can’t pretend to know what’s “better”, but I’d say the normal Java approach is to be less agile, but more rigid, and that’s what we see here. I’ll talk a bit more about it in another post I’m writing about the obsession with private members in the Android framework, and how it “protects” the framework but makes extending the framework’s capabilities extremely hard… I think that’s also a reflection of this business-model frame of mind.

  • Mirko

    Thanks for this post. I have preferred Java for the last 2 years and I like that language so much. But I must say that AS is much better on the client side, while Java as a server-side language is the best for me.

  • ClamChowder

    I totally agree. I come from an AS background and I am frustrated that I can’t find a simple XML parser. I don’t care if XMLPull parser is better than SAX. I can’t believe SAX makes me implement a subclass just for parsing XML. With DOM I have to do DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(url) which is insanely long and that crap doesn’t do anything other than parse. What the hell is that new new thing going on? I just want the app done first with readable code. If the final thing requires optimization or just gets enough attention then I’ll mess around with the lower level stuff.

  • nice one