Discussion:
[jira] [Updated] (CSV-222) invalid char between encapsulated token and delimiter
Patrick Gäckle (JIRA)
2018-03-21 11:06:00 UTC
[ https://issues.apache.org/jira/browse/CSV-222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Gäckle updated CSV-222:
-------------------------------
Description:
When trying to read the file [^faulty.csv] and parse it I get the following error:

{code}
java.io.IOException: (line 1) invalid char between encapsulated token and delimiter
at org.apache.commons.csv.Lexer.parseEncapsulatedToken(Lexer.java:275)
at org.apache.commons.csv.Lexer.nextToken(Lexer.java:152)
at org.apache.commons.csv.CSVParser.nextRecord(CSVParser.java:500)
at org.apache.commons.csv.CSVParser.initializeHeader(CSVParser.java:389)
at org.apache.commons.csv.CSVParser.<init>(CSVParser.java:284)
at org.apache.commons.csv.CSVParser.<init>(CSVParser.java:252)
at org.apache.commons.csv.CSVFormat.parse(CSVFormat.java:846)
{code}

The failing code is the part that parses the file and returns an iterator over the records:

{code:java}
csvFormat = CSVFormat.DEFAULT.withHeader().withDelimiter(';').withIgnoreHeaderCase();
iterator = csvFormat.parse(reader).iterator();
{code}

The invalid characters are the non-printable SOH and STX characters at the end of the line.
I debugged through the source and ran into the exception in {noformat}Lexer#parseEncapsulatedToken{noformat}.

Unfortunately, I can't offer any hints on fixing this, as I'm not familiar with these types of characters or the behaviour they should have.

Sincerely
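
For readers unfamiliar with these characters: SOH (U+0001) and STX (U+0002) are ASCII control characters, and Java's Character.isWhitespace returns false for both. The sketch below is a simplified model of the kind of check that could produce this message, assuming the lexer accepts only a delimiter, a line ending, end of input, or whitespace after a closing quote. The class and method names are hypothetical and this is not the Commons CSV source.

```java
/** Hypothetical, simplified model of a post-quote check; NOT the actual Commons CSV Lexer. */
final class AfterQuoteCheck {

    /**
     * Scans the text that follows a closing quote.
     * Returns the index of the first character this model would reject, or -1 if the
     * token ends normally (delimiter, line ending, or end of input is reached first).
     */
    static int firstInvalid(String afterClosingQuote, char delimiter) {
        for (int i = 0; i < afterClosingQuote.length(); i++) {
            char c = afterClosingQuote.charAt(i);
            if (c == delimiter || c == '\n' || c == '\r') {
                return -1; // token is terminated cleanly
            }
            if (!Character.isWhitespace(c)) {
                return i; // anything else that is not whitespace is treated as invalid
            }
        }
        return -1; // end of input
    }

    public static void main(String[] args) {
        // Made-up input resembling the report: SOH and STX between the closing quote and LF.
        System.out.println(firstInvalid("\u0001\u0002\n", ';')); // prints 0
    }
}
```

Under this assumption, SOH is rejected before the line ending is ever reached, which would match an error being reported on line 1.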

invalid char between encapsulated token and delimiter
-----------------------------------------------------
Key: CSV-222
URL: https://issues.apache.org/jira/browse/CSV-222
Project: Commons CSV
Issue Type: Bug
Components: Parser
Affects Versions: 1.4
Reporter: Patrick Gäckle
Priority: Major
Attachments: faulty.csv
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
Patrick Gäckle (JIRA)
2018-03-21 11:06:00 UTC
Patrick Gäckle created CSV-222:
----------------------------------

Summary: invalid char between encapsulated token and delimiter
Key: CSV-222
URL: https://issues.apache.org/jira/browse/CSV-222
Project: Commons CSV
Issue Type: Bug
Components: Parser
Affects Versions: 1.4
Reporter: Patrick Gäckle
Attachments: faulty.csv

Patrick Gäckle (JIRA)
2018-03-21 11:07:00 UTC
[ https://issues.apache.org/jira/browse/CSV-222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Gäckle updated CSV-222:
-------------------------------
Description:
When trying to read the file [^faulty.csv] and parse it I get the following error:

{code}
java.io.IOException: (line 1) invalid char between encapsulated token and delimiter
at org.apache.commons.csv.Lexer.parseEncapsulatedToken(Lexer.java:275)
at org.apache.commons.csv.Lexer.nextToken(Lexer.java:152)
at org.apache.commons.csv.CSVParser.nextRecord(CSVParser.java:500)
at org.apache.commons.csv.CSVParser.initializeHeader(CSVParser.java:389)
at org.apache.commons.csv.CSVParser.<init>(CSVParser.java:284)
at org.apache.commons.csv.CSVParser.<init>(CSVParser.java:252)
at org.apache.commons.csv.CSVFormat.parse(CSVFormat.java:846)
{code}

The failing code is the part that parses the file and returns an iterator over the records:

{code:java}
csvFormat = CSVFormat.DEFAULT.withHeader().withDelimiter(';').withIgnoreHeaderCase();
iterator = csvFormat.parse(reader).iterator();
{code}

The invalid characters are the non-printable SOH and STX characters at the end of the line.
I debugged through the source and ran into the exception in the Lexer, which does not handle these special characters.

Unfortunately, I can't offer any hints on fixing this, as I'm not familiar with these types of characters or the behaviour they should have.

Sincerely

Gary Gregory (JIRA)
2018-03-27 22:36:00 UTC
[ https://issues.apache.org/jira/browse/CSV-222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16416349#comment-16416349 ]

Gary Gregory commented on CSV-222:
----------------------------------

Are you expecting that Commons CSV should somehow recover from junk in the input?
Or do you want to be able to set the end-of-record marker to SOH-STX-LF?
Patrick Gäckle (JIRA)
2018-03-28 06:57:00 UTC
[ https://issues.apache.org/jira/browse/CSV-222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16416920#comment-16416920 ]

Patrick Gäckle commented on CSV-222:
------------------------------------

Setting the end-of-record marker to SOH-STX-LF would help me, as this would match my current problem.
Recovering from junk would be the longer-lasting solution. I can think of a _lazy reading option_ that, instead of throwing an error when something unexpected appears between an encapsulated token and the delimiter, just continues without taking any action such as appending text to the current field/header or continuing to the next field.

Thanks.
Gary Gregory (JIRA)
2018-03-29 13:57:00 UTC
[ https://issues.apache.org/jira/browse/CSV-222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16419025#comment-16419025 ]

Gary Gregory commented on CSV-222:
----------------------------------

Does {{org.apache.commons.csv.CSVFormat.withRecordSeparator(String)}} work for you then?
Patrick Gäckle (JIRA)
2018-03-29 17:30:00 UTC
[ https://issues.apache.org/jira/browse/CSV-222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16419448#comment-16419448 ]

Patrick Gäckle commented on CSV-222:
------------------------------------

This is the current workaround I use.
It might be nice to include the position in the log statement as another hint about where to search.

I'd really like to see an option to simply skip characters that are not identified as part of a column.
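
Until the error message carries a position, the offending characters can be located outside the parser. The following is a minimal sketch using only the JDK; the class and method names are hypothetical, and treating TAB, CR and LF as legitimate is an assumption about the data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper (not part of Commons CSV): reports where control characters sit in a line. */
final class ControlCharLocator {

    /** Returns one entry per ISO control character, skipping TAB, CR and LF; columns are 1-based. */
    static List<String> locate(String line) {
        List<String> hits = new ArrayList<>();
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            boolean lineStructure = c == '\t' || c == '\r' || c == '\n';
            if (Character.isISOControl(c) && !lineStructure) {
                hits.add(String.format(Locale.ROOT, "col %d: U+%04X", i + 1, (int) c));
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Made-up sample resembling the reported case: SOH and STX after the closing quote.
        System.out.println(locate("\"a\";\"b\"\u0001\u0002"));
    }
}
```

Running it over each raw line before parsing gives the column of every stray control character, which is the hint the exception message currently lacks.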
Patrick Gäckle (JIRA)
2018-04-03 08:26:00 UTC
[ https://issues.apache.org/jira/browse/CSV-222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16419448#comment-16419448 ]

Patrick Gäckle edited comment on CSV-222 at 4/3/18 8:25 AM:
------------------------------------------------------------

This is the option I'd like to use, but how can I set it to these non-printable characters?
It might be nice to include the position in the log statement as another hint about where to search.

I'd really like to see an option to simply skip characters that are not identified as part of a column.


Gary Gregory (JIRA)
2018-04-03 23:46:00 UTC
[ https://issues.apache.org/jira/browse/CSV-222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16424775#comment-16424775 ]

Gary Gregory commented on CSV-222:
----------------------------------

Call {{org.apache.commons.csv.CSVFormat.withRecordSeparator(String)}} and use Unicode literals to specify whatever characters you want like {{"\u0001\u0002\u0003"}}.
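
As an illustration of the literal, here is a small JDK-only sketch that builds the SOH-STX-LF string discussed earlier and prints its code points; the class name is hypothetical. One caveat worth verifying against the version in use: as far as I am aware, the Javadoc of withRecordSeparator notes that the setting is used only when printing and does not affect parsing, so this may not remove the parse error on its own.

```java
import java.util.Locale;
import java.util.StringJoiner;

/** Hypothetical holder for a record-separator literal made of control characters. */
final class RecordSeparatorLiteral {

    // SOH (U+0001), STX (U+0002), then LF. LF is written as the \n escape because a Unicode
    // escape for LF is translated before the compiler parses the literal and would break it.
    static final String SOH_STX_LF = "\u0001\u0002\n";

    /** Renders each char of the string as U+XXXX so invisible characters can be checked. */
    static String describe(String s) {
        StringJoiner out = new StringJoiner(" ");
        for (int i = 0; i < s.length(); i++) {
            out.add(String.format(Locale.ROOT, "U+%04X", (int) s.charAt(i)));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe(SOH_STX_LF)); // prints U+0001 U+0002 U+000A
    }
}
```

The constant could then be passed as the String argument of the method named above.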
Patrick Gäckle (JIRA)
2018-04-04 07:53:00 UTC
[ https://issues.apache.org/jira/browse/CSV-222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16425139#comment-16425139 ]

Patrick Gäckle commented on CSV-222:
------------------------------------

Thanks [~garydgregory]. I hadn't thought of this solution.

Anyway, I still think it is a bug: when these characters are placed between column 1 and column 2, nothing happens. The error occurs only when they are the last characters read before a possible "line end".

For my part, the solution is to use a FilterReader that throws away all non-printable characters, as the file I need to process turned out to contain many more of them.
Thanks
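
The FilterReader mentioned above was not posted, so the following is only a sketch of what such a reader might look like, using JDK classes alone. The class name, the helper method, and the choice to keep TAB, CR and LF while dropping every other ISO control character are assumptions.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

/** Hypothetical reader that drops ISO control characters except TAB, CR and LF. */
final class ControlCharFilterReader extends FilterReader {

    ControlCharFilterReader(Reader in) {
        super(in);
    }

    private static boolean keep(int c) {
        return c == '\t' || c == '\r' || c == '\n' || !Character.isISOControl(c);
    }

    @Override
    public int read() throws IOException {
        int c;
        do {
            c = super.read();
        } while (c != -1 && !keep(c)); // skip dropped characters, stop at end of stream
        return c;
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        while (true) {
            int n = super.read(buf, off, len);
            if (n <= 0) {
                return n; // -1 signals end of stream
            }
            int kept = 0;
            for (int i = off; i < off + n; i++) {
                if (keep(buf[i])) {
                    buf[off + kept++] = buf[i]; // compact kept chars to the front
                }
            }
            if (kept > 0) {
                return kept;
            }
            // The whole chunk consisted of dropped characters; read the next chunk.
        }
    }

    /** Convenience for trying the filter on an in-memory string. */
    static String clean(String raw) {
        try (Reader r = new ControlCharFilterReader(new StringReader(raw))) {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[64];
            for (int n; (n = r.read(buf, 0, buf.length)) != -1; ) {
                sb.append(buf, 0, n);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With the variables from the report, the wiring would presumably be csvFormat.parse(new ControlCharFilterReader(reader)), so the parser never sees the SOH and STX characters.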
Gary Gregory (JIRA)
2018-04-04 14:42:01 UTC
[ https://issues.apache.org/jira/browse/CSV-222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16425629#comment-16425629 ]

Gary Gregory commented on CSV-222:
----------------------------------

The issue you initially described talked about special characters in the record separator, not the column delimiter.
The column delimiter is currently limited to a single character. There is a separate ticket to enhance the column delimiter to a String instead of a char.
Patrick Gäckle (JIRA)
2018-04-04 14:50:00 UTC
[ https://issues.apache.org/jira/browse/CSV-222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Gäckle updated CSV-222:
-------------------------------
Attachment: faulty2.csv
Patrick Gäckle (JIRA)
2018-04-04 14:52:00 UTC
[ https://issues.apache.org/jira/browse/CSV-222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16425645#comment-16425645 ]

Patrick Gäckle commented on CSV-222:
------------------------------------

You slightly misunderstood me, or I was not precise enough.
I attached [^faulty2.csv], where you can see that the header row also has an SOH and STX in column 1 before the column separator.
This currently causes no problem, but it does for the last column in a row.

I hope I have described this a bit better now.
Gary Gregory (JIRA)
2018-04-04 15:33:00 UTC
[ https://issues.apache.org/jira/browse/CSV-222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16425725#comment-16425725 ]

Gary Gregory commented on CSV-222:
----------------------------------

In faulty2.csv, you have SOH+STX between headers and in record separators.
As of now, you need to filter these characters before they get to Commons CSV.
We would need a new feature that completely ignores a given set of characters between tokens.
Do you want to provide a PR for that?
Post by Patrick Gäckle (JIRA)
invalid char between encapsulated token and delimiter
-----------------------------------------------------
Key: CSV-222
URL: https://issues.apache.org/jira/browse/CSV-222
Project: Commons CSV
Issue Type: Bug
Components: Parser
Affects Versions: 1.4
Reporter: Patrick Gäckle
Priority: Major
Attachments: faulty.csv, faulty2.csv
{code}
java.io.IOException: (line 1) invalid char between encapsulated token and delimiter
at org.apache.commons.csv.Lexer.parseEncapsulatedToken(Lexer.java:275)
at org.apache.commons.csv.Lexer.nextToken(Lexer.java:152)
at org.apache.commons.csv.CSVParser.nextRecord(CSVParser.java:500)
at org.apache.commons.csv.CSVParser.initializeHeader(CSVParser.java:389)
at org.apache.commons.csv.CSVParser.<init>(CSVParser.java:284)
at org.apache.commons.csv.CSVParser.<init>(CSVParser.java:252)
at org.apache.commons.csv.CSVFormat.parse(CSVFormat.java:846)
{code}
{code:java}
csvFormat = CSVFormat.DEFAULT.withHeader().withDelimiter(';').withIgnoreHeaderCase();
iterator = csvFormat.parse(reader).iterator();
{code}
The invalid characters are the non-printable SOH and STX characters at the end of the line.
I debugged through the source and found that the exception is thrown in the Lexer, which does not handle these special characters.
Unfortunately, I'm not able to provide any hints on fixing this, as I'm not familiar with this type of character and what behaviour it should have.
Sincerely
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
Patrick Gäckle (JIRA)
2018-04-04 16:20:00 UTC
Permalink
[ https://issues.apache.org/jira/browse/CSV-222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16425794#comment-16425794 ]

Patrick Gäckle commented on CSV-222:
------------------------------------

Sorry I don't know what PR means.
Gary Gregory (JIRA)
2018-04-04 16:22:00 UTC
Permalink
[ https://issues.apache.org/jira/browse/CSV-222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16425799#comment-16425799 ]

Gary Gregory commented on CSV-222:
----------------------------------

That would be a "Pull Request" on GitHub: https://github.com/apache/commons-csv
Patrick Gäckle (JIRA)
2018-04-04 16:23:00 UTC
Permalink
[ https://issues.apache.org/jira/browse/CSV-222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Gäckle updated CSV-222:
-------------------------------
Comment: was deleted

(was: Sorry I don't know what PR means.)
Patrick Gäckle (JIRA)
2018-04-04 16:25:00 UTC
Permalink
[ https://issues.apache.org/jira/browse/CSV-222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16425803#comment-16425803 ]

Patrick Gäckle commented on CSV-222:
------------------------------------

Ah, sure. I'll see what I can do about that.
Patrick Gäckle (JIRA)
2018-04-14 00:34:00 UTC
Permalink
[ https://issues.apache.org/jira/browse/CSV-222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16438138#comment-16438138 ]

Patrick Gäckle commented on CSV-222:
------------------------------------

[~garydgregory] I did some coding on this, but as I'm not familiar with this project, I'm not quite sure whether I missed something.
Any chance you'd have a look before I create the PR (this is the first time I'm contributing)?
--> https://github.com/LostKatana/commons-csv/commits/feature/CSV-222_ignore_set_of_characters
Patrick Gäckle (JIRA)
2018-04-14 10:08:00 UTC
Permalink
[ https://issues.apache.org/jira/browse/CSV-222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16438297#comment-16438297 ]

Patrick Gäckle commented on CSV-222:
------------------------------------

Oh BTW I have an issue with dependencies for the CSVBenchmark class (\src\test\java\org\apache\commons\csv\CSVBenchmark.java).
Can you help me on this?
Patrick Gäckle (JIRA)
2018-04-14 10:14:00 UTC
Permalink
[ https://issues.apache.org/jira/browse/CSV-222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Gäckle updated CSV-222:
-------------------------------
Comment: was deleted

(was: Oh BTW I have an issue with dependencies for the CSVBenchmark class (\src\test\java\org\apache\commons\csv\CSVBenchmark.java).
Can you help me on this?)
Patrick Gäckle (JIRA)
2018-05-21 16:54:00 UTC
Permalink
[ https://issues.apache.org/jira/browse/CSV-222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16482724#comment-16482724 ]

Patrick Gäckle commented on CSV-222:
------------------------------------

Opened PR: https://github.com/apache/commons-csv/pull/29
Gary Gregory (JIRA)
2018-05-21 20:12:00 UTC
Permalink
[ https://issues.apache.org/jira/browse/CSV-222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16482989#comment-16482989 ]

Gary Gregory commented on CSV-222:
----------------------------------

Thank you for the PR. 

I am wondering if, instead of further complicating the lexer code, it wouldn't be cleaner and simpler to do the filtering in a reader. For example, I might propose something like the following for Commons IO:
{code:java}
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.commons.io.input;

import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.util.HashSet;
import java.util.Set;

/**
 * A filter reader that removes a given set of characters represented as int code points.
 */
public class IntegerSetFilterReader extends FilterReader {

    private static final HashSet<Integer> EMPTY_SET = new HashSet<>(0);
    private final Set<Integer> intSet;

    /**
     * Constructs a new reader.
     *
     * @param in
     *            the reader to filter
     * @param intSet
     *            what to filter
     */
    public IntegerSetFilterReader(Reader in, Set<Integer> intSet) {
        super(in);
        this.intSet = intSet == null ? EMPTY_SET : intSet;
    }

    @Override
    public int read() throws IOException {
        int ch;
        do {
            ch = super.read();
        } while (skip(ch));
        return ch;
    }

    private boolean skip(int ch) {
        // Note that you can increase the Integer cache with a system property.
        return intSet.contains(Integer.valueOf(ch));
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        int read = super.read(cbuf, off, len);
        if (read == -1) {
            return -1;
        }
        int pos = off - 1;
        for (int readPos = off; readPos < off + read; readPos++) {
            // Test the character at readPos, not the number of characters read.
            if (skip(cbuf[readPos])) {
                continue;
            }
            pos++;
            if (pos < readPos) {
                cbuf[pos] = cbuf[readPos];
            }
        }
        return pos - off + 1;
    }
}
{code}

Thoughts?
Gary Gregory (JIRA)
2018-05-21 16:32:00 UTC
Permalink
[ https://issues.apache.org/jira/browse/CSV-222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16482692#comment-16482692 ]

Gary Gregory commented on CSV-222:
----------------------------------

It's easier for anyone to review your changes if you create a PR...
Gary Gregory (JIRA)
2018-05-21 23:26:00 UTC
Permalink
[ https://issues.apache.org/jira/browse/CSV-222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16483196#comment-16483196 ]

Gary Gregory commented on CSV-222:
----------------------------------

See WIP in IO-577.
Gary Gregory (JIRA)
2018-05-22 16:10:00 UTC
Permalink
[ https://issues.apache.org/jira/browse/CSV-222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16484215#comment-16484215 ]

Gary Gregory commented on CSV-222:
----------------------------------

Please try to use the new classes in Commons IO 2.7-SNAPSHOT: {{CharacterSetFilterReader}} and {{CharacterFilterReader}}.
Gary Gregory (JIRA)
2018-05-22 16:16:01 UTC
Permalink
[ https://issues.apache.org/jira/browse/CSV-222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16484215#comment-16484215 ]

Gary Gregory edited comment on CSV-222 at 5/22/18 4:15 PM:
-----------------------------------------------------------

Please try to use the new classes in Commons IO 2.7-SNAPSHOT: {{CharacterSetFilterReader}} and {{CharacterFilterReader}}.

Please see the Apache snapshot repository here: https://repository.apache.org/content/repositories/snapshots/


was (Author: garydgregory):
Please try to use the new classes in Commons IO 2.7-SNAPSHOT: {{CharacterSetFilterReader}} and {{CharacterFilterReader}}.
Patrick Gäckle (JIRA)
2018-05-22 20:11:00 UTC
Permalink
[ https://issues.apache.org/jira/browse/CSV-222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16484492#comment-16484492 ]

Patrick Gäckle commented on CSV-222:
------------------------------------

I'm having a look into it. Also, I will close the PR, as, judging by the name alone, it really seems easier to use the reader.
I will get back to you when done.
