Dealing with EOL chars

Matt Lawrence matt.lawrence at virgin.net
Thu Jan 24 17:14:41 GMT 2008


Alex Brelsfoard wrote:
> Matt, All,
>
> Here's a more detailed explanation:
> We are reading in feeds (typically TSV or CSV files), parsing them,
> reformatting the data, and spitting it out as another file.
> These feeds can come from all sorts of people/places/things.
> So sometimes they are wonderfully formatted and we understand their content.
> Sometimes they are not, and we do not.
>
> Imagine a feed that is a list of products.
> Each row lists the name, type, description, and price of a product.
> Now say you have someone using some sort of CMS to create their feed.
> Now say that they have all of their information stored in Word files.
> It is very easy to see the scenario where the person will just copy and
> paste the description (with line breaks) into their CMS.
> Their CMS may just enclose this field in quotes and move on.
> So now we have a feed where the description column may have linebreaks in
> it.
> So I can't just split on any form of linebreak.
>
> Does this make a bit more sense?
>   

Have you already looked at Text::CSV_XS? As long as the fields that
contain line breaks are quoted, I think it will parse them properly.

http://search.cpan.org/dist/Text-CSV_XS/CSV_XS.pm
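
A rough sketch of what that might look like, in case it helps. The sample
feed and the variable names are made up for illustration, and I'm assuming
comma-separated input (for TSV you would pass sep_char => "\t" as well).
The binary => 1 option is what allows a quoted field to contain an embedded
newline:

```perl
use strict;
use warnings;
use Text::CSV_XS;

# Made-up sample feed: the description of the first product contains
# a line break inside the quotes.
my $feed = qq{name,type,description,price\n}
         . qq{Widget,tool,"Line one\nLine two",9.99\n}
         . qq{Gadget,toy,"Plain text",4.50\n};

# binary => 1 lets quoted fields contain embedded newlines.
my $csv = Text::CSV_XS->new({ binary => 1 })
    or die "Cannot use Text::CSV_XS: " . Text::CSV_XS->error_diag;

# Read from an in-memory filehandle here; a real feed would be opened
# from a file on disk instead.
open my $fh, '<', \$feed or die "Cannot open in-memory feed: $!";

my @rows;
while (my $row = $csv->getline($fh)) {
    push @rows, $row;    # each $row is an array ref of fields
}
$csv->eof or $csv->error_diag;    # report a parse error, if any
close $fh;

printf "%d records read\n", scalar @rows;
```

The point is that getline() reads one logical record at a time, not one
physical line, so the line break inside the quoted description stays part
of that field.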

> btw, there's no chance that I could define $/ as a regex is there?
>

No. My perlvar man page (5.8.8) says: "Remember: the value of $/ is a
string, not a regex.  awk has to be better for something. :-)"
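
If you only need to cope with mixed line endings, one possible workaround
is to slurp the whole file and split it with a regex yourself. This is just
a sketch with made-up sample data, and it assumes the file fits in memory.
Note that it splits on every line break, so it does not help with breaks
inside quoted fields:

```perl
use strict;
use warnings;

# Made-up sample with mixed line endings (CRLF, LF and bare CR).
my $data = "one\r\ntwo\nthree\rfour\n";

open my $fh, '<', \$data or die "Cannot open in-memory data: $!";

# Undefining $/ inside this block makes <$fh> return the whole file at once.
my $content = do { local $/; <$fh> };
close $fh;

# Split on any of the three common line endings. \r\n comes first so that
# a CRLF pair is not treated as two separate breaks.
my @lines = split /\r\n|\n|\r/, $content;

print scalar(@lines), " lines\n";
```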

> Thanks again for the help.
>   

Matt


