Dealing with EOL chars

Alex Brelsfoard alex.brelsfoard at gmail.com
Thu Jan 24 15:30:48 GMT 2008


Matt, All,

Here's a more detailed explanation:
We are reading in feeds (typically TSV or CSV files), parsing them,
reformatting the data, and spitting it out as another file.
These feeds can come from all sorts of people/places/things.
So sometimes they are wonderfully formatted and we understand their content.
Sometimes they are not, and we do not.

Imagine a feed that is a list of products.
Each row lists the name, type, description, and price of a product.
Now say you have someone using some sort of CMS to create their feed.
Now say that they have all of their information stored in Word files.
It is very easy to see the scenario where the person will just copy and
paste the description (with line breaks) into their CMS.
Their CMS may just enclose this field in quotes and move on.
So now we have a feed where the description column may have linebreaks in
it.
So I can't just split on any form of linebreak.
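
A minimal sketch of what that implies: a line break ends a record only when it
falls outside a quoted field. The helper name split_records and the sample feed
are made up for illustration. It assumes fields are quoted with double quotes
and that a literal quote inside a field is doubled (""); real feeds may not
follow either rule.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): split raw feed text into
# records, treating a line break as a record separator only when it falls
# outside a double-quoted field. Assumes quotes inside a field are doubled.
sub split_records {
    my ($text) = @_;
    my @records;
    my $current   = '';
    my $in_quotes = 0;

    # Walk the text token by token: a quote, a line break (DOS, old Mac or
    # Unix style), or a run of any other characters.
    while ($text =~ /\G(?:(")|(\015{1,2}\012|\015|\012)|([^"\015\012]+))/gc) {
        if (defined $1) {
            $in_quotes = !$in_quotes;   # a doubled "" toggles twice, so state is unchanged
            $current .= '"';
        }
        elsif (defined $2) {
            if ($in_quotes) {
                $current .= "\n";       # inline break: keep it, normalised to \n
            }
            else {
                push @records, $current; # real end of record
                $current = '';
            }
        }
        else {
            $current .= $3;
        }
    }
    push @records, $current if length $current;
    return @records;
}

# Made-up two-row TSV feed with mixed line endings and one inline break.
my $feed = qq{Widget\ttoy\t"Line one\r\nline two"\t9.99\r\nGadget\ttool\t"Plain"\t4.50\r};
my @records = split_records($feed);
print scalar(@records), " records\n";   # 2 records
```

Blank lines outside quotes come back as empty records here, which may or may
not be what a given feed needs. A CSV-aware parser such as the CPAN module
Text::CSV (constructed with binary => 1) also copes with line breaks inside
quoted fields and is probably sturdier than a hand-rolled splitter like this.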

Does this make a bit more sense?

btw, there's no chance that I could define $/ as a regex is there?
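
For reference, perlvar documents $/ as a plain string, not a regex. A rough
workaround, sketched below, is to slurp the handle and split on a pattern
instead. The helper name read_records and the in-memory sample data are
hypothetical, and the sketch assumes the feed is small enough to hold in memory.

```perl
use strict;
use warnings;

# Hypothetical helper: emulate a "regex $/" by slurping the whole handle
# and splitting the contents on a pattern.
sub read_records {
    my ($fh, $separator_re) = @_;
    my $content = do { local $/; <$fh> };   # undef $/ means slurp mode
    return () unless defined $content;
    return split $separator_re, $content;
}

# In-memory filehandle standing in for a real feed file.
my $data = "one\r\ntwo\rthree\nfour";
open(my $fh, '<', \$data) or die "cannot open in-memory handle: $!";
my @lines = read_records($fh, qr/\015{1,2}\012|\015|\012/);
close($fh);
print join('|', @lines), "\n";
```

This knows nothing about quoted fields, so on its own it would still break a
description that contains line breaks.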

Thanks again for the help.
--Alex

On Jan 24, 2008 5:50 AM, Matt Lawrence <matt.lawrence at virgin.net> wrote:

> Alex Brelsfoard wrote:
> > Hi all,
> >
> > Sorry for making such an on-topic post, but seeing as this is my first post
> > with london.pm..... I figured I might pretend to be Perl-centric.
> >
> > I am currently trying to work on a system that reads in all kinds of feeds.
> > These feeds can be created on a PC, new/old Mac, or a *nix machine.
> > And I need to be able to deal with them all.
> > Here's the kicker, these feeds sometimes have inline breaks, and we need to
> > keep them.
> >
> > Does anyone have any suggestions on how to deal with this?
> >
> > This works:
> > ---------------------------
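> > my $newline = "\n";
> > my $file = '788_test.txt';
> > # three-arg open; 'or' binds looser than '||', so the die can actually fire
> > open(my $file_fh, '<', $file) or die "could not open $file for reading: $!";
> > # slurp the whole file: with $/ undefined, <> returns everything at once
> > my $file_content = do { local $/; <$file_fh> };
> > close($file_fh);
> > # normalise DOS (\r\n, \r\r\n), old Mac (\r) and Unix (\n) line endings
> > $file_content =~ s/(?:\015{1,2}\012|\015|\012)/$newline/g;
> > foreach (split(/\n/, $file_content)) {
> >     print "$_\n";
> > }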
> > ---------------------------
> >
> > I can even split on that regular expression and save a line of code.
> > But I'm just concerned about losing inline breaks.
> >
> > Thoughts?
> >
>
> What do you mean by inline breaks?
>
> Matt
>

