Kick Out Java Fundamental

Tuesday, 5 November 2013

What are the pros and cons of the leading Java HTML parsers?

Almost all known HTML parsers implement the W3C DOM API (part of JAXP, the Java API for XML Processing) and give you back an org.w3c.dom.Document that is ready for direct use with the JAXP API. The major differences are usually found in the features of the parser in question. Most parsers, like JTidy, NekoHTML, TagSoup and HtmlCleaner, are to a certain degree forgiving and lenient with non-well-formed HTML ("tag soup"). You usually use this kind of HTML parser to "tidy" the HTML source (e.g. replacing the HTML-valid <br> with the XML-valid <br />), so that you can traverse it "the usual way" using the W3C DOM and JAXP API.
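Once a tidying parser has turned tag soup into well-formed markup, the JAXP parser that ships with the JDK can handle it. The sketch below skips the tidying step itself and simply assumes its input is already well-formed XHTML; the class name TidiedHtmlDom is hypothetical. It also shows why tidying matters: the HTML-valid <br> makes the XML parser fail, while <br /> parses fine.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

class TidiedHtmlDom {
    // Parses markup that must already be well-formed XML (e.g. the output of a
    // tidying HTML parser) and counts the elements with the given tag name.
    static int countElements(String wellFormedXhtml, String tagName) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document document = builder.parse(
                    new ByteArrayInputStream(wellFormedXhtml.getBytes(StandardCharsets.UTF_8)));
            // Plain W3C DOM traversal, "the usual way"
            NodeList elements = document.getElementsByTagName(tagName);
            return elements.getLength();
        } catch (Exception e) {
            // Un-tidied HTML such as a bare <br> ends up here as a parse error
            throw new IllegalStateException("input is not well-formed XML", e);
        }
    }
}
```

For example, countElements("<html><body><p>One<br />Two<br /></p></body></html>", "br") returns 2, whereas the same call with a bare <br> throws.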

The only ones that stand out are HtmlUnit and Jsoup.

HtmlUnit

HtmlUnit provides a completely separate API of its own that lets you act like a web browser programmatically: enter form values, click elements, invoke JavaScript, etcetera. It's much more than just an HTML parser. It's a real "GUI-less web browser" and HTML unit testing tool.

Jsoup

Jsoup also provides a completely separate API of its own. It lets you select elements using jQuery-like CSS selectors and provides a slick API to traverse the HTML DOM tree to get the elements of interest.

Traversing the HTML DOM tree in particular is the major strength of Jsoup. Anyone who has worked with org.w3c.dom.Document knows what a pain it is to traverse the DOM using the verbose NodeList and Node APIs. True, XPath makes life easier, but it's yet another learning curve and the code can still end up verbose.
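To make that verbosity concrete, here is roughly what collecting the link texts inside elements with a given class looks like with nothing but Node and NodeList (no XPath). It is a sketch that assumes well-formed input parsed with the JDK's JAXP parser; the class and method names are hypothetical, and the class check is a naive substring test, similar to XPath's contains().

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class ManualDomWalk {
    // Returns the text of every <a> nested inside an element whose class
    // attribute contains className, using only the W3C Node/NodeList API.
    static List<String> linkTexts(String wellFormedXhtml, String className) {
        try {
            Document document = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(wellFormedXhtml.getBytes(StandardCharsets.UTF_8)));
            List<String> texts = new ArrayList<>();
            collect(document.getDocumentElement(), className, false, texts);
            return texts;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse input", e);
        }
    }

    private static void collect(Node node, String className, boolean inside, List<String> out) {
        if (node.getNodeType() != Node.ELEMENT_NODE) {
            return; // skip text, comment and other non-element nodes
        }
        Element element = (Element) node;
        // getAttribute returns "" when the attribute is absent
        boolean nowInside = inside || element.getAttribute("class").contains(className);
        if (nowInside && element.getTagName().equals("a")) {
            out.add(element.getTextContent());
        }
        // Manual recursion over the children: this is the boilerplate XPath or CSS selectors hide
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            collect(children.item(i), className, nowInside, out);
        }
    }
}
```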

Here's an example that uses a "plain" W3C DOM parser like JTidy in combination with XPath to extract the first paragraph of your question and the names of all answerers. (I am using XPath because without it, the code needed to gather the information of interest would grow about 10 times as big, unless you write utility/helper methods.)


import java.net.URL;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.tidy.Tidy;

String url = "http://stackoverflow.com/questions/3152138";
Document document = new Tidy().parseDOM(new URL(url).openStream(), null);
XPath xpath = XPathFactory.newInstance().newXPath();

// First paragraph of the question
Node question = (Node) xpath.compile("//*[@id='question']//*[contains(@class,'post-text')]//p[1]").evaluate(document, XPathConstants.NODE);
System.out.println("Question: " + question.getFirstChild().getNodeValue());

// Names of all answerers
NodeList answerers = (NodeList) xpath.compile("//*[@id='answers']//*[contains(@class,'user-details')]//a[1]").evaluate(document, XPathConstants.NODESET);
for (int i = 0; i < answerers.getLength(); i++) {
    System.out.println("Answerer: " + answerers.item(i).getFirstChild().getNodeValue());
}
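Since fetching a live page needs network access, here is a self-contained variant of the answerer query that runs against a small, hand-written XHTML string using only the JDK's built-in javax.xml.xpath API. The markup and the class name LocalXPathExtraction are made up for illustration. Note that //a[1] selects the first <a> child of each parent element, so a second link in the same block is skipped.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class LocalXPathExtraction {
    // Same XPath expression as above, evaluated directly on a well-formed XHTML string.
    static List<String> answerers(String xhtml) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            NodeList links = (NodeList) xpath.evaluate(
                    "//*[@id='answers']//*[contains(@class,'user-details')]//a[1]",
                    new InputSource(new StringReader(xhtml)),
                    XPathConstants.NODESET);
            List<String> names = new ArrayList<>();
            for (int i = 0; i < links.getLength(); i++) {
                names.add(links.item(i).getTextContent());
            }
            return names;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("could not evaluate XPath on input", e);
        }
    }
}
```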

And here's an example of how to do exactly the same with Jsoup:


import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

String url = "http://stackoverflow.com/questions/3152138";
Document document = Jsoup.connect(url).get();

// First paragraph of the question
Element question = document.select("#question .post-text p").first();
System.out.println("Question: " + question.text());

// Names of all answerers
Elements answerers = document.select("#answers .user-details a");
for (Element answerer : answerers) {
    System.out.println("Answerer: " + answerer.text());
}

Do you see the difference? It's not only less code; Jsoup is also relatively easy to grasp if you already have moderate experience with CSS selectors (e.g. from developing websites and/or using jQuery).

Summary

The pros and cons of each should be clear enough now. If you just want to use the standard JAXP API to traverse the document, go for the first group of parsers mentioned. There are quite a lot of them. Which one to choose depends on the features it provides (how easy does it make HTML cleaning for you? are there listeners/interceptors and tag-specific cleaners?) and on the robustness of the library (how often is it updated/maintained/fixed?). If you want to unit test the HTML, HtmlUnit is the way to go. If you want to extract specific data from the HTML (which is more often than not the real-world requirement), Jsoup is the way to go.

WebLaF

WebLaF combines three parts required for successful UI development:

Cross-platform L&F for Swing applications
An additional set of extended Swing components
Various Swing utilities and managers

Binaries: https://github.com/mgarin/weblaf/releases

Source: https://github.com/mgarin/weblaf
Licenses: GPLv3 and Commercial

[Screenshots: a few examples of what some WebLaF components look like]

The main reason I started a totally new L&F is that most available L&Fs lack flexibility: in most cases you cannot re-style them (at best you can change a few colors and turn some UI elements on or off), or the only ways to do so are inconvenient.

My goal is to provide a fully customizable L&F with a pack of additional widely known and useful elements (like a date picker, tree table, dockable pane and lots of others) plus some helpful managers and utilities, which will reduce the amount of code required to create awesome Swing UIs.

How does Tomcat know that a JSP page has been modified?

When Tomcat is asked to execute a JSP, it compares the modification date of the JSP file with the modification time of the compiled class generated from that JSP. If the JSP is more recent, Tomcat recompiles it on the fly before executing it.

By the way, this check is an option that should be turned off in production, because performing it takes time.
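The timestamp comparison itself can be sketched with plain java.io.File timestamps. This is a minimal illustration, not Jasper's actual code: the class JspStalenessCheck and its method are hypothetical, and a missing class file is simply treated as "needs compiling". In the real Jasper servlet the behaviour is controlled by init parameters such as development and modificationTestInterval, described in the Jasper how-to.

```java
import java.io.File;

class JspStalenessCheck {
    // Hypothetical helper: true when the JSP must be (re)compiled before it is executed.
    static boolean needsRecompile(File jspFile, File compiledClass) {
        if (!compiledClass.exists()) {
            return true; // never compiled yet
        }
        // lastModified() is in milliseconds since the epoch; a newer JSP means a stale class
        return jspFile.lastModified() > compiledClass.lastModified();
    }
}
```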

See http://tomcat.apache.org/tomcat-7.0-doc/jasper-howto.html for details.

It is up to the container to decide when to load servlets. A servlet can be loaded at runtime, on demand. The same goes for JSPs: the servlet that a JSP is translated into can also be loaded at runtime.

Coming to your question:

Why does Tomcat not require a restart?

Because Tomcat can add to or modify the web application class loader's classpath at runtime. Tomcat has its own custom class loader implementation that allows classpath entries to be added at runtime.

How might such a custom class loader work?

One way to get this working: when a servlet/JSP is modified, a new class loader is created for that servlet/JSP, with the application class loader as its parent, and the new class loader loads the modified class again.
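The idea can be tried out with the JDK alone. In this sketch, which is an assumption and not taken from Tomcat's code, a hypothetical GeneratedPage class stands in for the servlet generated from a JSP: each call rewrites its source, compiles it with the javax.tools compiler, and loads the result through a brand-new URLClassLoader whose parent is the application class loader. It needs a full JDK, because ToolProvider.getSystemJavaCompiler() returns null on a plain JRE.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

class HotReloadSketch {
    // Hypothetical stand-in for "JSP changed -> regenerate, recompile, reload".
    static String compileAndRender(Path dir, String message) {
        try {
            // message is spliced into a string literal, so it must not contain quotes
            Path source = dir.resolve("GeneratedPage.java");
            Files.writeString(source,
                    "public class GeneratedPage { public String render() { return \"" + message + "\"; } }");

            JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
            if (javac == null) {
                throw new IllegalStateException("no system Java compiler (running on a JRE?)");
            }
            if (javac.run(null, null, null, "-d", dir.toString(), source.toString()) != 0) {
                throw new IllegalStateException("compilation failed");
            }

            // A fresh loader per (re)compilation, with the application loader as parent;
            // it reads the newly written GeneratedPage.class instead of a cached Class.
            try (URLClassLoader loader = new URLClassLoader(
                    new URL[] { dir.toUri().toURL() }, HotReloadSketch.class.getClassLoader())) {
                Class<?> page = loader.loadClass("GeneratedPage");
                Object instance = page.getDeclaredConstructor().newInstance();
                return (String) page.getMethod("render").invoke(instance);
            }
        } catch (RuntimeException e) {
            throw e;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Calling compileAndRender twice on the same directory with different messages returns the new text the second time without restarting the JVM; the old Class object simply stays with the old loader.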

Java String.equals versus ==

Jorman is a successful businessman and has 2 houses.

But others don't know that.

Is it the same Jorman?

When you ask the neighbours on either Madison Street or Burke Street, all they can tell you is the address where Jorman lives.

Using the residence alone, it's tough to confirm that it's the same Jorman. Since there are 2 different addresses, it's only natural to assume that they are 2 different people.

That's how the == operator behaves. So it will say that datos[0] == usuario is false: the two variables refer to two different String objects, and == only compares their addresses, not their characters.
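A small Java check of this. Here usuario and datos0 are stand-ins for the variables from the original question (datos0 plays the role of datos[0]), and new String(...) is used to force a second, distinct object that holds the same characters; the class name is hypothetical.

```java
class IdentityComparison {
    // == on object references compares the "addresses", not the contents.
    static boolean[] compare() {
        String usuario = "Jorman";
        String datos0 = new String("Jorman"); // a second, distinct object with the same characters
        String sameObject = usuario;          // another variable pointing at the very same object
        return new boolean[] {
            datos0 == usuario,     // false: two different objects ("two houses")
            sameObject == usuario  // true: one object, two variables
        };
    }
}
```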

An Investigator to the Rescue

What if we sent an investigator? We know that it's the same Jorman, but we need to prove it. Our detective will look closely at all physical aspects. With thorough inquiry, the agent will be able to conclude whether it's the same person or not. Let's see it happen in Java terms.

String's equals() method plays the role of that investigator.

It compares the two strings character by character in order to conclude whether they are indeed equal.
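A simplified sketch of that character-by-character comparison follows. It is illustrative only and is not the JDK's actual source: the real String.equals(Object) also checks that its argument is a String, and its implementation differs between JDK versions. The name sameCharacters is hypothetical.

```java
class SimplifiedStringEquals {
    // Illustrative take on what String.equals checks, not the JDK's implementation.
    static boolean sameCharacters(String a, String b) {
        if (a == b) {
            return true;  // same object: no investigation needed
        }
        if (a == null || b == null || a.length() != b.length()) {
            return false; // different lengths can never be equal
        }
        for (int i = 0; i < a.length(); i++) {
            if (a.charAt(i) != b.charAt(i)) {
                return false; // the first differing character settles it
            }
        }
        return true;
    }
}
```

With usuario = "Jorman" and datos0 = new String("Jorman"), sameCharacters(datos0, usuario) returns true, agreeing with datos0.equals(usuario), even though datos0 == usuario is false.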

That's how the String equals() method behaves. So datos[0].equals(usuario) will return true, because it performs a logical comparison of the contents rather than of the addresses.

Java is Pass-by-Value, Dammit!

Can you write a traditional swap(a,b) method/function in the language?

A traditional swap method or function takes two arguments and swaps them such that the variables passed into the function are changed outside the function. Its basic structure looks like this:


swap(Type arg1, Type arg2) {
    Type temp = arg1;
    arg1 = arg2;
    arg2 = temp;
}

If you can write such a method/function in your language such that calling


Type var1 = ...;
Type var2 = ...;
swap(var1, var2);

actually switches the values of the variables var1 and var2, the language supports pass-by-reference semantics.

For example, in Pascal, you can write


procedure swap(var arg1, arg2: SomeType);
   var
       temp : SomeType;
   begin
       temp := arg1;
       arg1 := arg2;
       arg2 := temp;
   end;

...

{ in some other procedure/function/program }

var
    var1, var2 : SomeType;
begin
    var1 := ...; { value "A" }
    var2 := ...; { value "B" }
    swap(var1, var2);
    { now var1 has value "B" and var2 has value "A" }
end;

or in C++ you could write


void swap(SomeType& arg1, SomeType& arg2) {
    SomeType temp = arg1;
    arg1 = arg2;
    arg2 = temp;
}

...

SomeType var1 = ...; // value "A"
SomeType var2 = ...; // value "B"
swap(var1, var2); // swaps their values!
// now var1 has value "B" and var2 has value "A"

(Please let me know if my Pascal or C++ has lapsed and I've messed up the syntax...)

But you cannot do this in Java!
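To see this concretely, here is a direct Java translation of that swap. It is a small sketch; the class SwapAttempt and its demo() helper are made up for illustration. It compiles and runs, but the caller's variables keep their original values.

```java
class SwapAttempt {
    // Looks like the Pascal/C++ swap, but arg1 and arg2 are copies of the caller's
    // pointer values, so reassigning them has no effect outside this method.
    static <T> void swap(T arg1, T arg2) {
        T temp = arg1;
        arg1 = arg2;
        arg2 = temp;
    }

    static String[] demo() {
        String var1 = "A";
        String var2 = "B";
        swap(var1, var2);
        // var1 and var2 are unchanged: still "A" and "B"
        return new String[] { var1, var2 };
    }
}
```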

Now the details...

The problem we're facing here is statements like

In Java, Objects are passed by reference, and primitives are passed by value.

This is half incorrect. Everyone can easily agree that primitives are passed by value; there's no such thing in Java as a pointer/reference to a primitive.

However, Objects are not passed by reference. A correct statement would be: "Object references are passed by value."

This may seem like splitting hairs, but it is far from it. There is a world of difference in meaning. The following examples should help make the distinction.

In Java, take the case of


public void foo(Dog d) {
    d = new Dog("Fifi"); // creating the "Fifi" dog
}

Dog aDog = new Dog("Max"); // creating the "Max" dog
// at this point, aDog points to the "Max" dog
foo(aDog);
// aDog still points to the "Max" dog

The variable passed in (aDog) is not modified! After calling foo, aDog still points to the "Max" Dog!
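A runnable version of this example, assuming a minimal Dog class with just a name (the text never shows Dog, so its shape here is an assumption, as is the class name ReassignDemo):

```java
class ReassignDemo {
    // Minimal stand-in for the Dog class used in the text.
    static class Dog {
        private final String name;
        Dog(String name) { this.name = name; }
        String getName() { return name; }
    }

    // Reassigns the parameter: only foo's own copy of the pointer now points to "Fifi".
    static void foo(Dog d) {
        d = new Dog("Fifi");
    }

    static String demo() {
        Dog aDog = new Dog("Max");
        foo(aDog);
        return aDog.getName(); // still "Max"
    }
}
```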

Many people mistakenly think/state that something like


public void foo(Dog d) {
    d.setName("Fifi");
}

shows that Java does in fact pass objects by reference.

The mistake they make is in the definition of

Dog d;

itself. When you write that definition, you are defining a pointer to a Dog object, not a Dog object itself.
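The distinction can be checked directly: foo follows its copy of the pointer and changes the Dog it points to, yet the caller's variable still points to the very same object. A minimal sketch; the Dog class with getName/setName and the class name MutateDemo are assumed here, since the text never defines them.

```java
class MutateDemo {
    // Minimal stand-in for the Dog class used in the text.
    static class Dog {
        private String name;
        Dog(String name) { this.name = name; }
        String getName() { return name; }
        void setName(String name) { this.name = name; }
    }

    // Follows the copied pointer and changes the object it points to.
    static void foo(Dog d) {
        d.setName("Fifi");
    }

    static boolean[] demo() {
        Dog aDog = new Dog("Max");
        Dog before = aDog;                   // remember which object aDog points to
        foo(aDog);
        return new boolean[] {
            aDog == before,                  // true: aDog was not re-pointed
            aDog.getName().equals("Fifi")    // true: the object's state was changed
        };
    }
}
```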

On Pointers versus References...

The problem here is that the folks at Sun made a naming mistake.

In programming language design, a "pointer" is a variable that indirectly tracks the location of some piece of data. The value of a pointer is often the memory address of the data you're interested in. Some languages allow you to manipulate that address; others do not.

A "reference" is an alias to another variable. Any manipulation done to the reference variable directly changes the original variable.

Check out the second sentence of http://java.sun.com/docs/books/jls/third_edition/html/typesValues.html#4.3.1.

"The reference values (often just references) are pointers to these objects, and a special null reference, which refers to no object"

They emphasize "pointers" in their description... Interesting...

When they were originally creating Java, they had "pointer" in mind (you can see some remnants of this in things like NullPointerException).

Sun wanted to push Java as a secure language, and one of Java's advantages was that it does not allow pointer arithmetic as C++ does.

They went so far as to try a different name for the concept, formally calling them "references". It was a big mistake, and it has caused even more confusion in the process.

There's a good explanation of reference variables at http://www.cprogramming.com/tutorial/references.html. (C++ specific, but it says the right thing about the concept of a reference variable.)

The word "reference" in programming language design originally comes from how you pass data to subroutines/functions/procedures/methods. A reference parameter is an alias to a variable passed as a parameter.

In the end, Sun made a naming mistake that has caused confusion. Java has pointers, and if you accept that, the way Java behaves makes much more sense.

Calling Methods

Calling

foo(d);

passes the value of d to foo; it does not pass the object that d points to!

The value of the pointer being passed is similar to a memory address. Under the covers it may be a tad different, but you can think of it in exactly the same way. The value uniquely identifies some object on the heap.

However, it makes no difference how pointers are implemented under the covers. You program with them exactly the same way in Java as you would in C or C++. The syntax is just slightly different (another poor choice in Java's design; they should have used the same -> syntax for de-referencing as C++).

In Java,

Dog d;

is exactly like C++'s

Dog *d;

And using

d.setName("Fifi");

is exactly like C++'s

d->setName("Fifi");

To sum up: Java has pointers, and the value of the pointer is passed in. There's no way to actually pass an object itself as a parameter. You can only pass a pointer to an object.

Keep in mind, when you call

foo(d);

you're not passing an object; you're passing a pointer to the object.
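Because a Java variable of type Dog holds a pointer, plain assignment copies the pointer, not the object, just as copying a Dog * does in C++. A minimal sketch, again assuming a simple Dog class with getName/setName and a hypothetical class name PointerCopyDemo:

```java
class PointerCopyDemo {
    // Minimal stand-in for the Dog class used in the text.
    static class Dog {
        private String name;
        Dog(String name) { this.name = name; }
        String getName() { return name; }
        void setName(String name) { this.name = name; }
    }

    static String demo() {
        Dog d = new Dog("Max");
        Dog e = d;           // copies the pointer value; there is still only one Dog
        e.setName("Fifi");   // like e->setName("Fifi") in C++
        e = null;            // changes only e's pointer value; d is unaffected
        return d.getName();  // "Fifi": the change went through the shared object
    }
}
```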

Friday, 25 October 2013

3D viewer navigation

The following keystrokes control navigation in the 3D viewer. For more information about navigating in the 3D viewer, see Using the Navigation Controls.

Note - The focus must be in the 3D viewer in order for these controls to take effect. Simply click anywhere in the 3D viewer to change focus.

Command	Windows/Linux Keystroke(s)	Mac Keystroke(s)	Result
Move left	Left arrow	Left arrow	Moves the viewer in the direction of the arrow.
Move right	Right arrow	Right arrow	Moves the viewer in the direction of the arrow.
Move up	Up arrow	Up arrow	Moves the viewer in the direction of the arrow.
Move down	Down arrow	Down arrow	Moves the viewer in the direction of the arrow.
Rotate clockwise	Shift + left arrow	Shift + left arrow	Rotates the view clockwise. The earth spins counter-clockwise.
Rotate counter-clockwise	Shift + right arrow	Shift + right arrow	Rotates the view counter-clockwise.
Show/hide Overview window	CTRL + M	⌘ + M	Displays or closes the Overview window.
Tilt up	Shift + left mouse button + drag down, Shift + down arrow	Shift + down arrow	Tilts the viewer toward "horizon" view.
Tilt down	Shift + left mouse button + drag up, Shift + up arrow	Shift + up arrow	Tilts the viewer toward "top-down" view.
Look	CTRL + left mouse button + drag	⌘ + mouse button + drag	Perspective points in another direction, as if you are turning your head up, down, left or right.
Zoom in	Scroll wheel, + key, PgUp key	Scroll wheel, + key	Zooms the viewer in. Tip: to use the 'Page Up' key, make sure 'Num Lock' on your keyboard is off.
Zoom out	Scroll wheel, - key (both keyboard and numpad), PgDn key	Scroll wheel, - key (both keyboard and numpad)	Zooms the viewer out. Tip: to use the 'Page Down' key, make sure 'Num Lock' on your keyboard is off.
Zoom + automatic tilt	Right mouse button + drag up or down	CTRL + click + drag up or down	Zooms the viewer in and automatically tilts your view as you approach ground level.
Stop current motion	Spacebar	Spacebar	Stops movement when the viewer is in motion.
Reset view to "north - up"	n	n	Rotates view so that view is 'n'orth-up.
Reset tilt to "top-down" view	u	u	Resets angle to view scene in "top-down" or "'u'p" mode.
Reset tilt and compass view to default	r	r	'R'esets angle to view "top-down" and rotates to "north-up" view. Use this feature to orient the earth in the center of the viewer.

Tip - Use the ALT key in combination with most of these keystrokes to move more slowly in the indicated direction.