Tuesday, 5 November 2013

What are the pros and cons of the leading Java HTML parsers?

Almost all known HTML parsers implement the W3C DOM API (part of JAXP, the Java API for XML Processing) and give you back an org.w3c.dom.Document that is ready for direct use with the JAXP API. The major differences are usually found in the features of the parser in question. Most parsers are to a certain degree forgiving and lenient with non-well-formed HTML ("tag soup"), like JTidy, NekoHTML, TagSoup and HtmlCleaner. You usually use this kind of HTML parser to "tidy" the HTML source (e.g. replacing the HTML-valid <br> with the XML-valid <br />), so that you can traverse it "the usual way" using the W3C DOM and JAXP API.
The only ones that stand out are HtmlUnit and Jsoup.

HtmlUnit

HtmlUnit provides its own API, which lets you act like a web browser programmatically: enter form values, click elements, invoke JavaScript, etcetera. It's much more than just an HTML parser. It's a real "GUI-less web browser" and HTML unit testing tool.

Jsoup

Jsoup also provides its own API. It lets you select elements using jQuery-like CSS selectors and provides a slick API for traversing the HTML DOM tree to get the elements of interest.
Traversing the HTML DOM tree is the major strength of Jsoup. Those who have worked with org.w3c.dom.Document know how painful it is to traverse the DOM using the verbose NodeList and Node APIs. True, XPath makes life easier, but it's another learning curve and it can still end up being verbose.
Here's an example which uses a "plain" W3C DOM parser like JTidy in combination with XPath to extract the first paragraph of your question and the names of all answerers (I am using XPath because without it, and without writing utility/helper methods, the code needed to gather the information of interest would grow about 10 times as big).
import java.net.URL;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.tidy.Tidy;

String url = "http://stackoverflow.com/questions/3152138";
Document document = new Tidy().parseDOM(new URL(url).openStream(), null);
XPath xpath = XPathFactory.newInstance().newXPath();
Node question = (Node) xpath.compile("//*[@id='question']//*[contains(@class,'post-text')]//p[1]").evaluate(document, XPathConstants.NODE);
System.out.println("Question: " + question.getFirstChild().getNodeValue());
NodeList answerers = (NodeList) xpath.compile("//*[@id='answers']//*[contains(@class,'user-details')]//a[1]").evaluate(document, XPathConstants.NODESET);
for (int i = 0; i < answerers.getLength(); i++) {
    System.out.println("Answerer: " + answerers.item(i).getFirstChild().getNodeValue());
}
And here's how to do exactly the same with Jsoup:
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

String url = "http://stackoverflow.com/questions/3152138";
Document document = Jsoup.connect(url).get();
Element question = document.select("#question .post-text p").first();
System.out.println("Question: " + question.text());
Elements answerers = document.select("#answers .user-details a");
for (Element answerer : answerers) {
    System.out.println("Answerer: " + answerer.text());
}
Do you see the difference? It's not only less code, but Jsoup is also relatively easy to grasp if you already have moderate experience with CSS selectors (by e.g. developing websites and/or using jQuery).
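To make the verbosity of the plain NodeList and Node APIs concrete, here is a rough sketch of a similar extraction without XPath. It uses only the JAXP parser that ships with the JDK, so it needs well-formed markup rather than tag soup. The class name PlainDomTraversal, the helper linkTexts and the sample markup are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class PlainDomTraversal {

    // Collects the text of every <a> found inside a <div> whose class
    // attribute contains cssClass. Hypothetical helper, not a library API.
    static List<String> linkTexts(String xhtml, String cssClass) {
        Document document;
        try {
            document = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("Input is not well-formed", e);
        }
        List<String> result = new ArrayList<>();
        NodeList divs = document.getElementsByTagName("div");
        for (int i = 0; i < divs.getLength(); i++) {
            Element div = (Element) divs.item(i);
            if (!div.getAttribute("class").contains(cssClass)) {
                continue;
            }
            NodeList links = div.getElementsByTagName("a");
            for (int j = 0; j < links.getLength(); j++) {
                result.add(links.item(j).getTextContent());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample markup standing in for a tidied page
        String xhtml = "<html><body>"
                + "<div class=\"user-details\"><a href=\"#\">Alice</a></div>"
                + "<div class=\"other\"><a href=\"#\">ignored</a></div>"
                + "<div class=\"user-details\"><a href=\"#\">Bob</a></div>"
                + "</body></html>";
        for (String name : linkTexts(xhtml, "user-details")) {
            System.out.println("Answerer: " + name);
        }
    }
}
```

Two nested index loops and a cast are needed for what the Jsoup version expresses as a single selector.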

Summary

The pros and cons of each should be clear enough now. If you just want to use the standard JAXP API to traverse the HTML, then go for the first group of parsers. There are quite a lot of them. Which one to choose depends on the features it provides (how easy does it make HTML cleaning for you? are there listeners/interceptors and tag-specific cleaners?) and the robustness of the library (how often is it updated/maintained/fixed?). If you want to unit test the HTML, then HtmlUnit is the way to go. If you want to extract specific data from the HTML (which is more often than not the real-world requirement), then Jsoup is the way to go.

WebLaF

It combines three parts required for successful UI development:
  • Cross-platform L&F for Swing applications
  • Additional extended Swing components set
  • Various Swing utilities and managers
Licenses: GPLv3 and Commercial

A few examples showing what some of the WebLaF components look like: [image]
The main reason I started a totally new L&F is that most available L&Fs lack flexibility: in most cases you cannot re-style them (at best you can change a few colors and turn some UI elements on or off), or there are only inconvenient ways to do so.
My goal is to provide a fully customizable L&F with a pack of additional widely known and useful elements (like a date picker, tree table, dockable pane and lots of others) and some additional helpful managers and utilities, which will reduce the amount of code required to create awesome Swing UIs.

How does the Tomcat server know that a JSP page has been modified?

Because when Tomcat is asked to execute a JSP, it compares the modification time of the JSP file with the modification time of the compiled class corresponding to this JSP, and if the JSP file is more recent, it recompiles it on the fly before executing it.
This is BTW an option that should be turned off in production, because it takes time to perform this check.
See http://tomcat.apache.org/tomcat-7.0-doc/jasper-howto.html for details.
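The timestamp comparison described above can be sketched as follows. This is not Tomcat's actual code, only a minimal illustration of the idea; the class name JspStalenessCheck and both method names are assumptions made up for this example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class JspStalenessCheck {

    // The core rule: recompile when the source is newer than the compiled class.
    static boolean isStale(long sourceModifiedMillis, long compiledModifiedMillis) {
        return sourceModifiedMillis > compiledModifiedMillis;
    }

    // Applies the rule to two files on disk. A missing compiled class
    // always counts as stale, since there is nothing to reuse.
    static boolean needsRecompile(Path jspFile, Path compiledClass) {
        try {
            if (!Files.exists(compiledClass)) {
                return true;
            }
            return isStale(
                    Files.getLastModifiedTime(jspFile).toMillis(),
                    Files.getLastModifiedTime(compiledClass).toMillis());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Every call to needsRecompile touches the file system, which illustrates why such a check costs time on each request.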


It is up to the container to decide when to load servlets. A servlet can be loaded at runtime on demand. The same goes for JSPs: a JSP translated to a servlet can also be loaded at runtime.
Coming to your question,
Why does Tomcat not require a restart?
Because Tomcat is capable of adding to or modifying the classpath of the web application classloader at runtime. Tomcat has its own custom classloader implementation, which allows it to add classpaths at runtime.
How might the custom classloader work?
One way to get this working: when a Servlet/JSP is modified, a new classloader is created for the Servlet/JSP with the application classloader as its parent classloader. The new classloader then loads the modified class again.
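The "new classloader per modified class" idea might look roughly like the sketch below. It is not Tomcat's implementation. The names ReloadSketch, Page and ReloadingLoader are hypothetical, and the sketch assumes the class file of Page can be read as a resource from the parent classloader (as is the case when classes are compiled to a directory or jar on the classpath).

```java
import java.io.IOException;
import java.io.InputStream;

public class ReloadSketch {

    /** Hypothetical stand-in for a compiled servlet/JSP class. */
    public static class Page {
    }

    /** Defines the target class itself; delegates everything else to the parent. */
    static class ReloadingLoader extends ClassLoader {
        private final String target;

        ReloadingLoader(ClassLoader parent, String target) {
            super(parent);
            this.target = target;
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
            if (!name.equals(target)) {
                return super.loadClass(name, resolve);
            }
            synchronized (getClassLoadingLock(name)) {
                Class<?> c = findLoadedClass(name);
                if (c == null) {
                    byte[] bytes = readBytes(name);
                    c = defineClass(name, bytes, 0, bytes.length);
                }
                if (resolve) {
                    resolveClass(c);
                }
                return c;
            }
        }

        // Reads the current class file through the parent classloader
        private byte[] readBytes(String name) throws ClassNotFoundException {
            String path = name.replace('.', '/') + ".class";
            try (InputStream in = getParent().getResourceAsStream(path)) {
                if (in == null) {
                    throw new ClassNotFoundException(name);
                }
                return in.readAllBytes();
            } catch (IOException e) {
                throw new ClassNotFoundException(name, e);
            }
        }
    }

    // Loads Page again through a brand-new classloader, as after a modification
    static Class<?> loadFresh() {
        String name = Page.class.getName();
        try {
            return new ReloadingLoader(ReloadSketch.class.getClassLoader(), name).loadClass(name);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Each call to loadFresh yields a distinct Class object with the same name, because a class is identified by its name together with the classloader that defined it. That is what lets a container swap in a recompiled class without restarting.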

Java String.equals versus ==

Jorman is a successful businessman and has 2 houses.
But others don't know that.

Is it the same Jorman?

When you ask neighbours from either Madison or Burke streets, this is the only thing they can say:
[image]
Using the residence alone, it's tough to confirm that it's the same Jorman. Since they're two different addresses, it's natural to assume that those are two different people.
That's how the operator == behaves. So it will say that datos[0]==usuario is false, because it only compares the addresses.

An Investigator to the Rescue

What if we sent an investigator? We know that it's the same Jorman, but we need to prove it. Our detective will look closely at all physical aspects. With thorough inquiry, the agent will be able to conclude whether it's the same person or not. Let's see it happen in Java terms.
Here's the source code of String's equals() method:
[image: source code of String's equals() method]
It compares the Strings character by character, in order to come to a conclusion that they are indeed equal.
That's how the String equals method behaves. So datos[0].equals(usuario) will return true, because it performs a logical comparison.
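The two kinds of comparison can be tried out with a small sketch. The class name SameJorman and its helper methods are made up for illustration, and sameCharacters is a simplified version of the character-by-character idea, not the JDK's actual source.

```java
public class SameJorman {

    // What == does for objects: compares the two references ("addresses")
    static boolean sameAddress(String a, String b) {
        return a == b;
    }

    // Simplified sketch of a character-by-character comparison
    static boolean sameCharacters(String a, String b) {
        if (a == b) {
            return true;
        }
        if (a == null || b == null || a.length() != b.length()) {
            return false;
        }
        for (int i = 0; i < a.length(); i++) {
            if (a.charAt(i) != b.charAt(i)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // new String(...) forces two separate objects with the same content
        String madison = new String("Jorman");
        String burke = new String("Jorman");
        System.out.println(sameAddress(madison, burke));    // prints false
        System.out.println(sameCharacters(madison, burke)); // prints true
        System.out.println(madison.equals(burke));          // prints true
    }
}
```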

Java is Pass-by-Value, Dammit!

Can you write a traditional swap(a,b) method/function in the language?
A traditional swap method or function takes two arguments and swaps them such that variables passed into the function are changed outside the function. Its basic structure looks like
Figure 1: (Non-Java) Basic swap function structure
swap(Type arg1, Type arg2) {
    Type temp = arg1;
    arg1 = arg2;
    arg2 = temp;
}
If you can write such a method/function in your language such that calling
Figure 2: (Non-Java) Calling the swap function
Type var1 = ...;
Type var2 = ...;
swap(var1,var2);
actually switches the values of the variables var1 and var2, the language supports pass-by-reference semantics.
For example, in Pascal, you can write
Figure 3: (Pascal) Swap function
procedure swap(var arg1, arg2: SomeType);
var
    temp : SomeType;
begin
    temp := arg1;
    arg1 := arg2;
    arg2 := temp;
end;
...
{ in some other procedure/function/program }
var
    var1, var2 : SomeType;
begin
    var1 := ...; { value "A" }
    var2 := ...; { value "B" }
    swap(var1, var2);
    { now var1 has value "B" and var2 has value "A" }
end;
or in C++ you could write
Figure 4: (C++) Swap function
void swap(SomeType& arg1, SomeType& arg2) {
    SomeType temp = arg1;
    arg1 = arg2;
    arg2 = temp;
}
...
SomeType var1 = ...; // value "A"
SomeType var2 = ...; // value "B"
swap(var1, var2); // swaps their values!
// now var1 has value "B" and var2 has value "A"
(Please let me know if my Pascal or C++ has lapsed and I've messed up the syntax...)
But you cannot do this in Java!
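A direct Java transcription of the swap function shows what happens instead. This is a minimal sketch; the class name SwapAttempt and the helper callSwap are made up for illustration.

```java
public class SwapAttempt {

    // Only the local copies arg1 and arg2 are exchanged;
    // the caller's variables are never touched.
    static void swap(String arg1, String arg2) {
        String temp = arg1;
        arg1 = arg2;
        arg2 = temp;
    }

    // Returns the caller-side variables as they are after calling swap
    static String[] callSwap() {
        String var1 = "A";
        String var2 = "B";
        swap(var1, var2);
        return new String[] { var1, var2 };
    }

    public static void main(String[] args) {
        String[] after = callSwap();
        System.out.println(after[0] + " " + after[1]); // prints "A B"
    }
}
```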

Now the details...

The problem we're facing here is statements like
In Java, Objects are passed by reference, and primitives are passed by value.
This is half incorrect. Everyone can easily agree that primitives are passed by value; there's no such thing in Java as a pointer/reference to a primitive.
However, Objects are not passed by reference. A correct statement would be Object references are passed by value.
This may seem like splitting hairs, but it is far from it. There is a world of difference in meaning. The following examples should help make the distinction.
In Java, take the case of
Figure 5: (Java) Pass-by-value example
public void foo(Dog d) {
    d = new Dog("Fifi"); // creating the "Fifi" dog
}
Dog aDog = new Dog("Max"); // creating the "Max" dog
// at this point, aDog points to the "Max" dog
foo(aDog);
// aDog still points to the "Max" dog
the variable passed in (aDog) is not modified! After calling foo, aDog still points to the "Max" Dog!
Many people mistakenly think/state that something like
Figure 6: (Java) Still pass-by-value...
public void foo(Dog d) {
    d.setName("Fifi");
}
shows that Java does in fact pass objects by reference.
The mistake they make is in the definition of
Figure 7: (Java) Defining a Dog pointer
Dog d;
itself. When you write that definition, you are defining a pointer to a Dog object, not a Dog object itself.
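Figures 5 and 6 can be put side by side in one runnable sketch. The Dog class here is a minimal stand-in, and the names DogDemo, reassign and rename are assumptions made up for this example.

```java
public class DogDemo {

    // Minimal stand-in for the Dog class used in the figures
    static class Dog {
        private String name;

        Dog(String name) {
            this.name = name;
        }

        String getName() {
            return name;
        }

        void setName(String name) {
            this.name = name;
        }
    }

    // As in Figure 5: reassigns only the local copy of the pointer
    static void reassign(Dog d) {
        d = new Dog("Fifi");
    }

    // As in Figure 6: follows the pointer and changes the object it points to
    static void rename(Dog d) {
        d.setName("Fifi");
    }

    static String nameAfterReassign() {
        Dog aDog = new Dog("Max");
        reassign(aDog);
        return aDog.getName(); // aDog still points to the "Max" dog
    }

    static String nameAfterRename() {
        Dog aDog = new Dog("Max");
        rename(aDog);
        return aDog.getName(); // same dog, but its name was changed
    }

    public static void main(String[] args) {
        System.out.println(nameAfterReassign()); // prints "Max"
        System.out.println(nameAfterRename());   // prints "Fifi"
    }
}
```

In both calls the value of the pointer is copied into d. The second call changes the object that both copies point to. It never changes which object aDog points to.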

On Pointers versus References...

The problem here is that the folks at Sun made a naming mistake.
In programming language design, a "pointer" is a variable that indirectly tracks the location of some piece of data. The value of a pointer is often the memory address of the data you're interested in. Some languages allow you to manipulate that address; others do not.
A "reference" is an alias to another variable. Any manipulation done to the reference variable directly changes the original variable.
The Java Language Specification (section 4.3.1) says: "The reference values (often just references) are pointers to these objects, and a special null reference, which refers to no object"
They emphasize "pointers" in their description... Interesting...
When they originally were creating Java, they had "pointer" in mind (you can see some remnants of this in things like NullPointerException).

Sun wanted to push Java as a secure language, and one of Java's advantages was that it does not allow pointer arithmetic as C++ does.
They went so far as to try a different name for the concept, formally calling them "references". A big mistake and it's caused even more confusion in the process.
There's a good explanation of reference variables at http://www.cprogramming.com/tutorial/references.html. (C++ specific, but it says the right thing about the concept of a reference variable.)
The word "reference" in programming language design originally comes from how you pass data to subroutines/functions/procedures/methods. A reference parameter is an alias to a variable passed as a parameter.
In the end, Sun made a naming mistake that's caused confusion. Java has pointers, and if you accept that, it makes the way Java behaves make much more sense.

Calling Methods

Calling
Figure 8: (Java) Passing a pointer by value
foo(d);
passes the value of d to foo; it does not pass the object that d points to!
The value of the pointer being passed is similar to a memory address. Under the covers it may be a tad different, but you can think of it in exactly the same way. The value uniquely identifies some object on the heap.
However, it makes no difference how pointers are implemented under the covers. You program with them exactly the same way in Java as you would in C or C++. The syntax is just slightly different (another poor choice in Java's design; they should have used the same -> syntax for de-referencing as C++).
In Java,
Figure 9: (Java) A pointer
Dog d;
is exactly like C++'s
Figure 10: (C++) A pointer
Dog *d;
And using
Figure 11: (Java) Following a pointer and calling a method
d.setName("Fifi");
is exactly like C++'s
Figure 12: (C++) Following a pointer and calling a method
d->setName("Fifi");
To sum up: Java has pointers, and the value of the pointer is passed in. There's no way to actually pass an object itself as a parameter. You can only pass a pointer to an object.
Keep in mind, when you call
Figure 13: (Java) Even more still passing a pointer by value
foo(d);

you're not passing an object; you're passing a pointer to the object.

Friday, 25 October 2013

3D viewer navigation


The following keystrokes control navigation in the 3D viewer. For more information about navigating in the 3D viewer, see Using the Navigation Controls.
Note - The focus must be in the 3D viewer in order for these controls to take effect. Simply click anywhere in the 3D viewer to change focus.

 
Command | Windows/Linux Keystroke(s) | Mac Keystroke(s) | Result
Move left | Left arrow | Left arrow | Moves the viewer in the direction of the arrow.
Move right | Right arrow | Right arrow | Moves the viewer in the direction of the arrow.
Move up | Up arrow | Up arrow | Moves the viewer in the direction of the arrow.
Move down | Down arrow | Down arrow | Moves the viewer in the direction of the arrow.
Rotate clockwise | Shift + left arrow | Shift + left arrow | Rotates the view clockwise. The earth spins counter-clockwise.
Rotate counter-clockwise | Shift + right arrow | Shift + right arrow | Rotates the view counter-clockwise.
Show/hide Overview window | CTRL + M | Command/Open Apple Key + M | Displays or closes overview window.
Tilt up | Shift + left mouse button + drag down, Shift + down arrow | Shift + down arrow | Tilts the viewer toward "horizon" view.
Tilt down | Shift + left mouse button + drag up, Shift + up arrow | Shift + up arrow | Tilts the viewer toward "top-down" view.
Look | CTRL + left mouse button + drag | Command/Open Apple Key + mouse button + drag | Perspective points in another direction, as if you are turning your head up, down, left or right.
Zoom in | Scroll wheel, + key, PgUp key | Scroll wheel, + key | Zooms the viewer in. Tip: to use the 'Page Up' key, make sure 'Num Lock' on your keyboard is off.
Zoom out | Scroll wheel, - key (both keyboard and numpad), PgDn key | Scroll wheel, - key (both keyboard and numpad) | Zooms the viewer out. Tip: to use the 'Page Down' key, make sure 'Num Lock' on your keyboard is off.
Zoom + automatic tilt | Right mouse button + drag up or down | CTRL + click + drag up or down | Zooms the viewer in and automatically tilts your view as you approach ground level.
Stop current motion | Spacebar | Spacebar | When the viewer is in motion, stops movement.
Reset view to "north - up" | n | n | Rotates view so that view is 'n'orth-up.
Reset tilt to "top-down" view | u | u | Resets angle to view scene in "top-down" or "'u'p" mode.
Reset tilt and compass view to default | r | r | 'R'esets angle to view "top-down" and rotates to "north-up" view. Use this feature to orient the earth in the center of the viewer.

Tip - Use the ALT key in combination with most of these keystrokes to move more slowly in the indicated direction.