Problem
A Wrangler step that reads data in XML or JSON format fails with errors like the following:
For JSON:
java.io.EOFException: End of input at line 1 column 2
    at com.google.gson.stream.JsonReader.nextNonWhitespace(JsonReader.java:1377) ~[com.google.code.gson.gson-2.2.4.jar:na]
    at com.google.gson.stream.JsonReader.doPeek(JsonReader.java:483) ~[com.google.code.gson.gson-2.2.4.jar:na]
For XML:
Caused by: org.json.JSONException: Mismatched close tag note at 6 [character 7 line 1]
    at org.json.JSONTokener.syntaxError(JSONTokener.java:505) ~[org.json.json-20090211.jar:na]
    at org.json.XML.parse(XML.java:311) ~[org.json.json-20090211.jar:na]
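To check which line endings the input file contains, print it with end-of-line markers visible:
$ cat -e <file_path>
Lines that end in ^M$ use Windows (\r\n) line endings; lines that end in $ alone use Unix (\n) line endings.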
Environment
- CDAP version 6.2.3 or earlier
Solution
- Before the parse-as-JSON or parse-XML-to-JSON directive, remove all newlines from the input and replace them with empty space.
- To remove the newlines, insert a find-and-replace step ahead of the parse directive and replace every match of the regex (\r\n|\r|\n) with empty space ``.
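
The effect of the regex can be checked outside Wrangler. The following is a minimal Python sketch, not a Wrangler directive: the function name `strip_line_endings`, its `replacement` parameter, and the sample record are assumptions made for illustration. It applies the same pattern to a string with mixed line endings and then parses the result.

```python
import json
import re

# Same pattern as in the find-and-replace step: matches Windows (\r\n),
# classic Mac (\r), and Unix (\n) line endings.
LINE_ENDINGS = re.compile(r"(\r\n|\r|\n)")


def strip_line_endings(text: str, replacement: str = "") -> str:
    """Replace every line ending in `text` with `replacement`."""
    return LINE_ENDINGS.sub(replacement, text)


# Hypothetical sample record that mixes all three line-ending styles.
raw = '{\r\n  "id": 1,\r  "name": "a"\n}'

flat = strip_line_endings(raw)
record = json.loads(flat)  # the flattened text is a single-line JSON document
print(repr(flat))
print(record)
```

The alternation lists `\r\n` first so that a Windows line ending is consumed as one match rather than as a separate `\r` and `\n`.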
Cause
The issue occurs because the Data Fusion Wrangler step cannot properly handle the line endings generated by different operating systems. Currently it only handles '' and \n.
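
To see which line-ending styles a given input actually contains, a short script can count them. This is a hedged sketch under stated assumptions: `count_line_endings` and the sample text are hypothetical names and data introduced here, and the script is independent of CDAP.

```python
import re
from collections import Counter

# Human-readable labels for each line-ending style.
NAMES = {"\r\n": "CRLF", "\r": "CR", "\n": "LF"}


def count_line_endings(text: str) -> Counter:
    """Count how many line endings of each style appear in `text`."""
    # \r\n comes first so a Windows ending is counted once, as CRLF.
    return Counter(NAMES[m] for m in re.findall(r"\r\n|\r|\n", text))


# Hypothetical sample with one CRLF, one bare CR, and two LF endings.
sample = "a\r\nb\rc\nd\n"
counts = count_line_endings(sample)
print(dict(counts))
```

When reading from a file, open it with `open(path, newline="")` so that Python does not translate the line endings before they are counted.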