Dealing with EOL chars
matt.lawrence at virgin.net
Thu Jan 24 17:14:41 GMT 2008
Alex Brelsfoard wrote:
> Matt, All,
> Here's a more detailed explanation:
> We are reading in feeds (typically TSV or CSV files), parsing them,
> reformatting the data, and spitting it out as another file.
> These feeds can come from all sorts of people/places/things.
> So sometimes they are wonderfully formatted and we understand their content.
> Sometimes they are not, and we do not.
> Imagine a feed that is a list of products.
> Each row lists the name, type, description, and price of a product.
> Now say you have someone using some sort of CMS to create their feed.
> Now say that they have all of their information stored in Word files.
> It is very easy to see the scenario where the person will just copy and
> paste the description (with line breaks) into their CMS.
> Their CMS may just enclose this field in quotes and move on.
> So now we have a feed where the description column may have linebreaks in it.
> So I can't just split on any form of linebreak.
> Does this make a bit more sense?
Have you already looked at Text::CSV_XS? As long as the fields that
contain line breaks are quoted, and the parser is created with
binary => 1, I think it will properly parse values containing newlines.
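
Roughly what I have in mind, as a minimal sketch rather than anything
tuned to your feeds. The sample data and the parse_feed() helper name are
made up for illustration; the Text::CSV_XS calls (new, getline,
error_diag) are the module's own. It assumes the line breaks only ever
appear inside double-quoted fields.

```perl
use strict;
use warnings;
use Text::CSV_XS;

# parse_feed() is a hypothetical helper name, not part of Text::CSV_XS.
# It parses CSV text whose quoted fields may contain line breaks.
sub parse_feed {
    my ($text) = @_;

    # binary => 1 allows quoted fields to contain embedded newlines.
    my $csv = Text::CSV_XS->new({ binary => 1 })
        or die "Cannot use Text::CSV_XS: " . Text::CSV_XS->error_diag;

    # Read from an in-memory filehandle so the example is self-contained.
    open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";

    my @rows;
    while (my $row = $csv->getline($fh)) {
        push @rows, $row;    # each $row is an array ref of fields
    }
    close $fh;
    return \@rows;
}

# Made-up sample: the description field spans two physical lines.
my $feed = qq{name,type,description,price\n}
         . qq{Widget,tool,"A handy widget.\nComes in blue.",9.99\n};

my $rows = parse_feed($feed);
printf "%d rows\n", scalar @$rows;
```

The parser, not $/, decides where a record ends, so the line break inside
the quoted description stays part of that field. For TSV feeds you would
pass sep_char => "\t" to new() as well.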
> btw, there's no chance that I could define $/ as a regex is there?
my man perlvar (5.8.8) says: "Remember: the value of $/ is a string, not
a regex. awk has to be better for something. :-)"
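
To illustrate the difference, a small sketch (both helper names are
hypothetical): $/ can be set to any fixed string such as "\r\n", and if
the line endings are mixed you can read the whole text in and split it
with a regex instead. Note that neither approach knows about quoting, so
neither helps with line breaks inside quoted fields.

```perl
use strict;
use warnings;

# read_records() is a hypothetical helper: it reads records from a string
# using a fixed string separator, which is all that $/ supports.
sub read_records {
    my ($text, $sep) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
    local $/ = $sep;         # a literal string, never a pattern
    my @records;
    while (my $rec = <$fh>) {
        chomp $rec;          # chomp removes the current value of $/
        push @records, $rec;
    }
    close $fh;
    return @records;
}

# split_any_eol() is also hypothetical: it takes the whole text and splits
# on CRLF, CR or LF with a regex, which $/ cannot do.
sub split_any_eol {
    my ($text) = @_;
    return split /\r\n|\r|\n/, $text;
}

my @fixed = read_records("a\r\nb\r\nc\r\n", "\r\n");
my @mixed = split_any_eol("a\r\nb\nc\r");
print scalar(@fixed), " and ", scalar(@mixed), " records\n";
```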
> Thanks again for the help.
More information about the london.pm mailing list