Skip to content
Tags

Cropping lines

August 2, 2010

One thing I’ve written quickly a few weeks ago was an algorithm to crop the lines. The basic idea is to have a text where all the text is in an unique line and has to be written in separate lines. This is a typical problem when you try to send a text in a mail where the lines should not be larger than 72 characters.

I know many mail readers broke the lines if they can not be displayed correctly but I have written the following code you can reuse:


public static String cropLines(Reader in, int maxChars, boolean justify) throws IOException {
 StringBuilder buf = new StringBuilder(250);

 // Using a reader avoids issues with the fucking
 // end of lines.
 LineNumberReader reader = new LineNumberReader(in);
 String line;
 while ((line = reader.readLine()) != null) {
 // Read multiple lines if the "\" is discovered at the end of
 // the line (means the line is extended).
 while (line.endsWith("\\")) {
 line = line.substring(0, line.length() - 1)    + reader.readLine();
 }

 if( line.length() == 0 ){
 buf.append(EOL);
 }
 else {
 // Calculate the number of spaces used for indentation.
 StringBuilder indent = new StringBuilder();
 int len = line.length();
 char c;
 while (indent.length() < len
 && Character.isWhitespace(c = line.charAt(indent.length())))
 indent.append(c);

 // For indented lines, remove the first spaces as they will
 // added at the contenation part.
 if (indent.length() > 0) {
 line = line.trim();
 len = line.length();
 }

 // Until the line is exhausted.
 while (len > 0) {
 int pos = 0;
 int last = 0;

 // Look for the biggest part we can cut
 while (pos < len
 && (pos + indent.length() < maxChars || last == 0)) {
 c = line.charAt(pos);
 if (c == ' ') last = pos;
 pos++;
 }
 buf.append(indent);

 if (last == 0 || pos == len) {
 // No space, then put it all!
 buf.append(line);
 len = 0;
 }
 else {
 // Add the beginning of the line and truncate.
 StringBuilder cropped = new StringBuilder(line
 .substring(0, last));

 if (justify && pos < len) {
 // Justify on the left and on the right.
 boolean hasSpaces = true;
 while (cropped.length() < maxChars && hasSpaces ) {
 int i = cropped.length() - 2;
 hasSpaces = false;

 // Add spaces for the end to the beginning of
 // the cropped line.
 while (cropped.length() < maxChars && i > 0 ) {
 if (Character.isWhitespace(c = cropped.charAt(i--))) {
 while (i > 0 && Character.isWhitespace(c = cropped.charAt(i))){
 i--;
 }
 cropped.insert(i+1, ' ');
 hasSpaces = true;
 }
 }
 }
 }
 buf.append(cropped);
 line = line.substring(last + 1);
 len = line.length();
 }
 buf.append(EOL);
 }
 }
 }
 return buf.toString();
 }

 /**
 * Crop the lines to a number of chars. The algorithm does
 * at its best. If there are no spaces, the line will be
 * not broken.
 *
 * <p>
 * If your text begins with some spqces, those spaces are
 * kept if the line is divided (to keep the correct alignment).
 * </p>
 *
 * <p>
 * The algorithm is not optimized. But, the book of Jules Verne,
 * <i>20000 lieux sous les mers<i> (about 800Kb), is processed in
 * about 350ms for lines of 30 characters only with justification
 * on the right and on the left. The times falls to 200 ms for
 * the same text with lines of 70 characters.
 * </p>
 *
 * @param str the string contqining the text.
 * @param maxChars the maximum of characters per line. A value
 *         of 72 is reasonnqble, below 30 characters, the work to do
 *         is quite important.
 * @param justify <code>true</code> for a
 *         full justification (on the left and on the
 *         right).
 * @return the new string.
 */
 public static String cropLines(String str, int maxChars, boolean justify) {
 String str2 = null;
 try {
 str2 = cropLines(new StringReader(str), maxChars, justify);
 } catch (IOException e) {
 LOG.error("I/O ERROR: " + e.getMessage());
 }
 return str2;
 }



The most interesting thing in this code are the performances. I was very sure that my code was very bad (and I think it is) because I manipulate Strings in many ways: adding chars, cutting, copying and so on. My first concern was to say: this algorithm has very bad performances and I should not use it on a heavy loaded server. But, rather than refactoring my code which was working fine (with a full justification by adding spaces for example), I’ve just tried to do the line breaking on a very big text, not a very simple mail.

The results are speaking by themselves: 350 milliseconds for an entire book in the worst case (my computer is not a fast one, you can trust me). When I’ve seen the results, I’ve said: Whouah, don’t bother with performances…!

This is a regular issue in development teams: the programmer knows the code is not performant and he tries to enhance it. This is a costly mistake. A performance-killer for the team: because even if the code is bad in terms of performance, this is not annoying because this part of code is called only twice a day! And sometimes, the solution could be worse than the original one, then follow my advice:

  1. Check the performance of the current code;
  2. Be sure the code is called often;
  3. Optimize only the revelant parts.

If you follow these 3 rules (in this order), you will have more time to drink a beer…

Advertisements

From → Other

2 Comments
  1. Hello William,

    Your program sounds good to me too compared to par unix command!

    I am running a Dell xps M1710 on ubuntu 10.4 and here the results for a 1143 lines book with 120162 words:

    time cat “l’éthique de la liberté.txt” | par 72j > test.txt

    real 0m0.119s
    user 0m0.108s
    sys 0m0.012s

    The book is available here:
    http://pontoize.free.fr/l%27%C3%A9thique%20de%20la%20libert%C3%A9.txt

  2. Hi Nicolas,

    I’ve done the test with my UNIX box (UBUNTU). JAVA takes about 300 ms and the “par” command about 100 ms with your file.

    That’s mean my code is 3 times slower than an optimized code. Not so bad, in the sense the usage is very limited and not for big volumes. The JAVA code is suitable for a variety of usages without any further optimization. And you confirm the rules I gave: do not optimize first!

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: