From david at cantrell.org.uk Mon Apr 21 01:42:23 2014 From: david at cantrell.org.uk (David Cantrell) Date: Mon, 21 Apr 2014 01:42:23 +0100 Subject: Finding the intersection between two regexes Message-ID: <20140421004223.GA15289@bytemark.barnyard.co.uk> Can anyone point me at some code on the CPAN that, given two regexes, can figure out whether there are any bits of text that will be matched by both? eg, given /abc.../ and /...def/ it should tell me that there is an intersection, because the string 'abcdef' matches both. I'm not interested in covering all the utterly loopy things that one can do with perl regexes, of course. What I'm really after is a version of this: http://qntm.org/greenery because I'm *far* too lazy to port it myself. -- David Cantrell | London Perl Mongers Deputy Chief Heretic We decided that in a world of 8 string guitars, 12 string basses and drumkits from hell "shredding" in weird scat-jazz time signatures that we wanted a stripped down song with lots of shouting, and where the verse and chorus are the same chords. From mark at twoshortplanks.com Mon Apr 21 03:14:48 2014 From: mark at twoshortplanks.com (Mark Fowler) Date: Sun, 20 Apr 2014 22:14:48 -0400 Subject: Finding the intersection between two regexes In-Reply-To: <20140421004223.GA15289@bytemark.barnyard.co.uk> References: <20140421004223.GA15289@bytemark.barnyard.co.uk> Message-ID: On Sunday, April 20, 2014, David Cantrell wrote: > Can anyone point me at some code on the CPAN that, given two regexes, > can figure out whether there are any bits of text that will be matched > by both? I'm not sure I understand the question here, or moreover why you want to do this..is it just an intellectual exercise? 
If it's just a matter of wanting a single Perl regular expression that can
match something iff both of these other regular expressions would match,
surely you can just do this by inserting the second regular expression at
the beginning of the first, encapsulated in a zero-width positive lookahead
assertion (with suitable variable-length doodads to pad if they're not
anchoring at the same place in the string).

What the link is talking about seems to be converting a regular expression
down into a finite state machine and then combining that finite state
machine with another finite state machine (i.e. non-deterministic, being
turned back into deterministic with maths). I can see how that's possible
for a strict regular expression, but as you say, not for a true Perl
non-regular regular expression.

So...why do you want to do this?

Mark

From djk at tobit.co.uk Mon Apr 21 09:45:09 2014
From: djk at tobit.co.uk (Dirk Koopman)
Date: Mon, 21 Apr 2014 09:45:09 +0100
Subject: Finding the intersection between two regexes
In-Reply-To:
References: <20140421004223.GA15289@bytemark.barnyard.co.uk>
Message-ID: <5354DA95.9020008@tobit.co.uk>

On 21/04/14 03:14, Mark Fowler wrote:
> On Sunday, April 20, 2014, David Cantrell wrote:
>
>> Can anyone point me at some code on the CPAN that, given two regexes,
>> can figure out whether there are any bits of text that will be matched
>> by both?
>
>
> I'm not sure I understand the question here, or moreover why you want to do
> this..is it just an intellectual exercise?
>
> If it's just a matter of wanting a single Perl regular expression that can
> match something iff both of these other regular expressions would match,
> surely you can just do this by inserting the second regular expression at
> the beginning of the first encapsulated in a zero-width positive look ahead
> assertion (with suitable variable length doodads to pad if they're not
> anchoring at the same place in the string.)
>
> What the link is talking about seems to be converting a regular expression
> down into a finite state machine and then combining that finite state
> machine with another finite state machine (I.e. non deterministic, being
> turned back into deterministic with maths). I can see how that's possible
> for a strict regular expression, but as you say, not for a true Perl
> non-regular regular expression.
>
> So...why do you want to do this?
>

This may be related to the question I asked recently about turning (up to)
a few hundred REGEXes into one giant REGEX. The goal being to test all
those disparate REGEXes in the most efficient way possible on a string.

Dirk

From diment at gmail.com Mon Apr 21 10:02:33 2014
From: diment at gmail.com (Kieren Diment)
Date: Mon, 21 Apr 2014 19:02:33 +1000
Subject: Finding the intersection between two regexes
In-Reply-To: <5354DA95.9020008@tobit.co.uk>
References: <20140421004223.GA15289@bytemark.barnyard.co.uk> <5354DA95.9020008@tobit.co.uk>
Message-ID:

On 21/04/2014, at 6:45 PM, Dirk Koopman wrote:

> This may be related to the question I asked recently about turning (up to) a few hundred REGEXes into one giant REGEX. The goal being to test all those disparate REGEXes in the most efficient way possible on a string.
>
> Dirk
>

With respect to stuff like this, I'm sure that Regexp::Assemble and the
modifications in core inspired by its problem space are well worth
significant thought for someone with motivation to do so.
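A minimal sketch of the lookahead approach Mark describes earlier in the thread, assuming both patterns are meant to match the whole string. The helper name `both_match_re` is hypothetical and not taken from any module mentioned here. Note that this only tests whether one given string is matched by both patterns; it does not decide whether any such string exists, which is the question David actually asked.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread or from CPAN): build a single
# pattern that matches a whole string only if both $re_a and $re_b match
# that whole string.
sub both_match_re {
    my ($re_a, $re_b) = @_;
    # The zero-width lookahead checks $re_a against the entire string
    # without consuming anything; $re_b then has to match the entire
    # string as well.
    return qr/\A(?=(?:$re_a)\z)(?:$re_b)\z/;
}

my $both = both_match_re(qr/abc.../, qr/...def/);

for my $candidate (qw(abcdef abcxyz xyzdef)) {
    printf "%-6s %s\n", $candidate,
        ($candidate =~ $both ? 'matched by both' : 'not matched by both');
}
```

Of the three candidates, only 'abcdef' satisfies both patterns. Patterns that are not anchored at both ends would need padding such as `.*` inside the lookahead, as Mark notes.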
From james.laver at gmail.com Mon Apr 21 10:03:22 2014
From: james.laver at gmail.com (James Laver)
Date: Mon, 21 Apr 2014 10:03:22 +0100
Subject: Finding the intersection between two regexes
In-Reply-To: <5354DA95.9020008@tobit.co.uk>
References: <20140421004223.GA15289@bytemark.barnyard.co.uk> <5354DA95.9020008@tobit.co.uk>
Message-ID: <32726F75-47AC-4CC2-88C0-FA4D4F52E125@gmail.com>

On 21 Apr 2014, at 09:45, Dirk Koopman wrote:

> This may be related to the question I asked recently about turning (up to) a few hundred REGEXes into one giant REGEX. The goal being to test all those disparate REGEXes in the most efficient way possible on a string.

Sounds like an implementation detail. My reading of the problem was this:

    package Evil;
    use Moose;

    has regexen => (
        is      => 'ro',
        default => sub { [] },
        traits  => ['Array'],
        handles => {
            add_regex => 'push',
            res       => 'elements',
        },
    );

    sub go {
        my ($self, $datum) = @_;
        for my $r ($self->res) {
            return 0 if $datum !~ $r;
        }
        1;
    }

    __PACKAGE__->meta->make_immutable;
    1;
    __END__

From djk at tobit.co.uk Mon Apr 21 11:24:57 2014
From: djk at tobit.co.uk (Dirk Koopman)
Date: Mon, 21 Apr 2014 11:24:57 +0100
Subject: Finding the intersection between two regexes
In-Reply-To: <32726F75-47AC-4CC2-88C0-FA4D4F52E125@gmail.com>
References: <20140421004223.GA15289@bytemark.barnyard.co.uk> <5354DA95.9020008@tobit.co.uk> <32726F75-47AC-4CC2-88C0-FA4D4F52E125@gmail.com>
Message-ID: <5354F1F9.1040507@tobit.co.uk>

On 21/04/14 10:03, James Laver wrote:
>
> On 21 Apr 2014, at 09:45, Dirk Koopman wrote:
>
>> This may be related to the question I asked recently about turning (up to) a few hundred REGEXes into one giant REGEX. The goal being to test all those disparate REGEXes in the most efficient way possible on a string.
>
> Sounds like an implementation detail.
>

Yes, it is *the* implementation detail if one is writing a message switch
whose primary purpose is to route messages based on said lists of regexes.
It's bad enough having just the one list of regexes on the one key, but when there are hierarchical lists dealing with tuples of the type (data, key1, key2, [...]), that "detail" really, really matters. Oh and then there is the possible random ordering of the keys and one may need regexes to choose which list of regexes to use. Dirk From david at cantrell.org.uk Tue Apr 22 12:16:14 2014 From: david at cantrell.org.uk (David Cantrell) Date: Tue, 22 Apr 2014 12:16:14 +0100 Subject: Finding the intersection between two regexes In-Reply-To: References: <20140421004223.GA15289@bytemark.barnyard.co.uk> Message-ID: <20140422111614.GA9444@bytemark.barnyard.co.uk> On Sun, Apr 20, 2014 at 10:14:48PM -0400, Mark Fowler wrote: > On Sunday, April 20, 2014, David Cantrell wrote: > > Can anyone point me at some code on the CPAN that, given two regexes, > > can figure out whether there are any bits of text that will be matched > > by both? > I'm not sure I understand the question here, or moreover why you want to do > this..is it just an intellectual exercise? I do actually have a use for it, which would help to explain the question. A large part of Number::Phone is based on data in google's libphonenumber project. That has, for most countries, regular expressions that match valid fixed lines and valid mobiles. For some countries those two regexes can both match some of the same numbers. Here's the data: http://goo.gl/hTBAhZ If you look at the data for Barbados, they have for fixed lines: 246[2-9]\d{6} and for mobiles: 246(?:(?:2[346]|45|82)\d|25[0-4])\d{4} then some strings will match both expressions - 2462303333, for example. But if you look at the data for Jamaica there are no strings that match both regexes. At the moment I detect these overlaps (and then throw the regexes away as being unfit for my purpose) by just going through each country's number space. 
This is practical for NANP countries as I can do it all with only about a million comparisons in the worst possible case. It would be impractical to apply this to the whole world though. -- David Cantrell | Bourgeois reactionary pig From paulm at paulm.com Tue Apr 22 17:05:58 2014 From: paulm at paulm.com (Paul Makepeace) Date: Tue, 22 Apr 2014 09:05:58 -0700 Subject: Finding the intersection between two regexes In-Reply-To: <20140422111614.GA9444@bytemark.barnyard.co.uk> References: <20140421004223.GA15289@bytemark.barnyard.co.uk> <20140422111614.GA9444@bytemark.barnyard.co.uk> Message-ID: On Tue, Apr 22, 2014 at 4:16 AM, David Cantrell wrote: > On Sun, Apr 20, 2014 at 10:14:48PM -0400, Mark Fowler wrote: >> On Sunday, April 20, 2014, David Cantrell wrote: >> > Can anyone point me at some code on the CPAN that, given two regexes, >> > can figure out whether there are any bits of text that will be matched >> > by both? >> I'm not sure I understand the question here, or moreover why you want to do >> this..is it just an intellectual exercise? > > I do actually have a use for it, which would help to explain the > question. > > A large part of Number::Phone is based on data in google's > libphonenumber project. That has, for most countries, regular > expressions that match valid fixed lines and valid mobiles. For some > countries those two regexes can both match some of the same numbers. > Here's the data: > http://goo.gl/hTBAhZ > > If you look at the data for Barbados, they have for fixed lines: > 246[2-9]\d{6} > > and for mobiles: > 246(?:(?:2[346]|45|82)\d|25[0-4])\d{4} I'd go out on a limb and say that the complete list of overlapping situations all share a /^prefix/ like this. (This doesn't necessarily help you since you'd have to exhaustively falsify it but if you're going for the quick win I bet just looking for prefixes gets you most/all of the way.) > then some strings will match both expressions - 2462303333, for example. 
> But if you look at the data for Jamaica there are no strings that match
> both regexes.
>
> At the moment I detect these overlaps (and then throw the regexes away
> as being unfit for my purpose) by just going through each country's
> number space. This is practical for NANP countries as I can do it
> all with only about a million comparisons in the worst possible case. It
> would be impractical to apply this to the whole world though.

If your goal is to simply identify overlaps rather than generate
encompassing regexes, you could try attacking it with
intelligently/heuristically generated random numbers.

Paul

>
> --
> David Cantrell | Bourgeois reactionary pig

From benjamin.john.evans at gmail.com Tue Apr 22 17:20:08 2014
From: benjamin.john.evans at gmail.com (Ben Evans)
Date: Tue, 22 Apr 2014 17:20:08 +0100
Subject: Finding the intersection between two regexes
In-Reply-To: <5354DA95.9020008@tobit.co.uk>
References: <20140421004223.GA15289@bytemark.barnyard.co.uk> <5354DA95.9020008@tobit.co.uk>
Message-ID:

This piece of anecdotal evidence is now a good ~8 years out of date, but I
found that there were some surprising performance regressions for a
complex, combined regex versus multiple runs with simple ones.

As ever the moral of the story is, if performance matters, always measure,
and get a friend to sanity check your results.

Cheers,

Ben

On Mon, Apr 21, 2014 at 9:45 AM, Dirk Koopman wrote:
> On 21/04/14 03:14, Mark Fowler wrote:
>>
>> On Sunday, April 20, 2014, David Cantrell wrote:
>>
>>> Can anyone point me at some code on the CPAN that, given two regexes,
>>> can figure out whether there are any bits of text that will be matched
>>> by both?
>>
>>
>>
>> I'm not sure I understand the question here, or moreover why you want to
>> do
>> this..is it just an intellectual exercise?
>> >> If it's just a matter of wanting a single Perl regular expression that can >> match something iff both of these other regular expressions would match, >> surely you can just do this by inserting the second regular expression at >> the beginning of the first encapsulated in a zero-width positive look >> ahead >> assertion (with suitable variable length doodads to pad if they're not >> anchoring at the same place in the string.) >> >> What the link is talking about seems to be converting a regular expression >> down into a finate state machine and then combining that finate state >> machine with another finate state machine (I.e. non deterministic, being >> turned back into deterministic with maths). I can see how that's possible >> for a strict regular expression, but as you say, not for a true Perl >> non-regular regular expression. >> >> So...why do you want to do this? >> > > This may be related to the question I asked recently about turning (up to) a > few hundred REGEXes into one giant REGEX. The goal being to test all those > disparate REGEXes in the most efficient way possible on a string. > > Dirk > From mjlush at gmail.com Thu Apr 24 09:04:04 2014 From: mjlush at gmail.com (Michael Lush) Date: Thu, 24 Apr 2014 09:04:04 +0100 Subject: Finding the intersection between two regexes In-Reply-To: References: <20140421004223.GA15289@bytemark.barnyard.co.uk> <20140422111614.GA9444@bytemark.barnyard.co.uk> Message-ID: On Tue, Apr 22, 2014 at 5:05 PM, Paul Makepeace wrote: > If your goal is to simply identify overlaps rather than generate > encompassing regexes, you could try attacking it with > intelligently/heuristically generated random numbers. 
>
> Paul
>

It's just about possible to brute force the problem. On my box (~2.4GHz
intel)

    my $count = 0;
    foreach my $x (1..5000000000) {
        if ($x =~ /^246[2-9]\d{6}$/ and
            $x =~ /^246(?:(?:2[346]|45|82)\d|25[0-4])\d{4}$/ ) {
            $count++;
        }
    }
    print "$count\n";

takes about 14 minutes to run. It would be easy enough to parallelise the
search with threads and to determine the minimum and maximum numbers that
could match. I guess you could get the runtime down to a minute or less.

Depends on how many regexes you have to evaluate.

From mark at twoshortplanks.com Thu Apr 24 23:28:33 2014
From: mark at twoshortplanks.com (Mark Fowler)
Date: Thu, 24 Apr 2014 18:28:33 -0400
Subject: Finding the intersection between two regexes
In-Reply-To:
References: <20140421004223.GA15289@bytemark.barnyard.co.uk> <20140422111614.GA9444@bytemark.barnyard.co.uk>
Message-ID:

On Thursday, April 24, 2014, Michael Lush wrote:

> if ($x =~ /^246[2-9]\d{6}$/ and $x =~
> /^246(?:(?:2[346]|45|82)\d|25[0-4])\d{4}$/ )

Those /d are incorrect. You want [0-9] or to use the /a regexp flag on a
suitably modern perl.

Mark

From david at cantrell.org.uk Fri Apr 25 00:47:04 2014
From: david at cantrell.org.uk (David Cantrell)
Date: Fri, 25 Apr 2014 00:47:04 +0100
Subject: Finding the intersection between two regexes
In-Reply-To:
References: <20140421004223.GA15289@bytemark.barnyard.co.uk> <20140422111614.GA9444@bytemark.barnyard.co.uk>
Message-ID: <5359A278.2010208@cantrell.org.uk>

On 24/04/2014 23:28, Mark Fowler wrote:
> On Thursday, April 24, 2014, Michael Lush wrote:
>
> if ($x =~ /^246[2-9]\d{6}$/ and $x =~
>> /^246(?:(?:2[346]|45|82)\d|25[0-4])\d{4}$/ )
>
> Those /d are incorrect. You want [0-9] or to use the /a regexp flag on a
> suitably modern perl.

https://code.google.com/p/libphonenumber/source/browse/trunk/resources/PhoneNumberMetadata.xml

My regexes come directly from Google's libphonenumber. They are happy to
accept patches provided you sign your life away in blood.
I require no such blood sacrifice for my code, but do insist that the tests still pass on perl 5.8.8. https://github.com/DrHyde/perl-modules-Number-Phone Assuming that you actually care, and aren't just pedanting in the finest custom of this august group, of course :-) -- David Cantrell | top google result for "internet beard fetish club" Nuke a disabled unborn gay baby whale for JESUS! From abigail at abigail.be Fri Apr 25 06:38:55 2014 From: abigail at abigail.be (Abigail) Date: Fri, 25 Apr 2014 07:38:55 +0200 Subject: Finding the intersection between two regexes In-Reply-To: <5359A278.2010208@cantrell.org.uk> References: <20140421004223.GA15289@bytemark.barnyard.co.uk> <20140422111614.GA9444@bytemark.barnyard.co.uk> <5359A278.2010208@cantrell.org.uk> Message-ID: <20140425053855.GA23212@almanda.fritz.box> On Fri, Apr 25, 2014 at 12:47:04AM +0100, David Cantrell wrote: > On 24/04/2014 23:28, Mark Fowler wrote: >> On Thursday, April 24, 2014, Michael Lush wrote: >> >> if ($x =~ /^246[2-9]\d{6}$/ and $x =~ >>> /^246(?:(?:2[346]|45|82)\d|25[0-4])\d{4}$/ ) >> >> Those /d are incorrect. You want [0-9] or to use the /a regexp flag on a >> suitably modern perl. > > https://code.google.com/p/libphonenumber/source/browse/trunk/resources/PhoneNumberMetadata.xml > > My regexes come directly from Google's libphonenumber. They are happy to > accept patches provided you sign your life away in blood. I require no > such blood sacrifice for my code, but do insist that the tests still > pass on perl 5.8.8. > > https://github.com/DrHyde/perl-modules-Number-Phone > > Assuming that you actually care, and aren't just pedanting in the finest > custom of this august group, of course :-) I'm with Mark. My view is that a /\d/ is almost always wrong, on any perl released in this century. /\d/a or /(?a:\d)/ is just a really ugly and confusing way to write /[0-9]/. I bet most Perl programmers [1] won't know what /a does, let alone know the difference between /a and /aa. 
[1] Assuming developers at $WORK are a representative sample.

Abigail

From hakim.cassimally at gmail.com Fri Apr 25 09:08:26 2014
From: hakim.cassimally at gmail.com (Hakim C)
Date: Fri, 25 Apr 2014 09:08:26 +0100
Subject: Finding the intersection between two regexes
In-Reply-To: <20140425053855.GA23212@almanda.fritz.box>
References: <20140421004223.GA15289@bytemark.barnyard.co.uk> <20140422111614.GA9444@bytemark.barnyard.co.uk> <5359A278.2010208@cantrell.org.uk> <20140425053855.GA23212@almanda.fritz.box>
Message-ID:

On 25 April 2014 06:38, Abigail wrote:

> On Fri, Apr 25, 2014 at 12:47:04AM +0100, David Cantrell wrote:
> I'm with Mark. My view is that a /\d/ is almost always wrong, on any perl
> released in this century.
>
> /\d/a or /(?a:\d)/ is just a really ugly and confusing way to write
> /[0-9]/.
> I bet most Perl programmers [1] won't know what /a does, let alone know the
> difference between /a and /aa.
>

That said, [0-9] is an ugly and error-prone way to write something that
used to be simply \d. I recently had fun spotting what I'd done wrong with
something along the lines of /[0-9]{2}:[0-8]{2}:[0-9]:{2}/.

Changing the semantics of \d was a Really Bad Idea.
osf' From david at cantrell.org.uk Fri Apr 25 15:09:27 2014 From: david at cantrell.org.uk (David Cantrell) Date: Fri, 25 Apr 2014 15:09:27 +0100 Subject: Finding the intersection between two regexes In-Reply-To: <20140425053855.GA23212@almanda.fritz.box> References: <20140421004223.GA15289@bytemark.barnyard.co.uk> <20140422111614.GA9444@bytemark.barnyard.co.uk> <5359A278.2010208@cantrell.org.uk> <20140425053855.GA23212@almanda.fritz.box> Message-ID: <20140425140927.GA9388@bytemark.barnyard.co.uk> On Fri, Apr 25, 2014 at 07:38:55AM +0200, Abigail wrote: > On Fri, Apr 25, 2014 at 12:47:04AM +0100, David Cantrell wrote: > > On 24/04/2014 23:28, Mark Fowler wrote: > >> On Thursday, April 24, 2014, Michael Lush wrote: > >> if ($x =~ /^246[2-9]\d{6}$/ and $x =~ > >>> /^246(?:(?:2[346]|45|82)\d|25[0-4])\d{4}$/ ) > >> Those /d are incorrect. You want [0-9] or to use the /a regexp flag on a > >> suitably modern perl. > > My regexes come directly from Google's libphonenumber. They are happy to > > accept patches provided you sign your life away in blood. I require no > > such blood sacrifice for my code, but do insist that the tests still > > pass on perl 5.8.8. > I'm with Mark. My view is that a /\d/ is almost always wrong, on any perl > released in this century. I'll accept patches from you too :-) -- David Cantrell | Official London Perl Mongers Bad Influence I'm in retox From mark at twoshortplanks.com Fri Apr 25 17:37:17 2014 From: mark at twoshortplanks.com (Mark Fowler) Date: Fri, 25 Apr 2014 12:37:17 -0400 Subject: Finding the intersection between two regexes In-Reply-To: <5359A278.2010208@cantrell.org.uk> References: <20140421004223.GA15289@bytemark.barnyard.co.uk> <20140422111614.GA9444@bytemark.barnyard.co.uk> <5359A278.2010208@cantrell.org.uk> Message-ID: On Thu, Apr 24, 2014 at 7:47 PM, David Cantrell wrote: > Mark wrote: >> Those /d are incorrect. You want [0-9] or to use the /a regexp flag on a >> suitably modern perl. 
>
> My regexes come directly from Google's libphonenumber. They are happy to
> accept patches provided you sign your life away in blood.

Let's not do that.

> I require no such
> blood sacrifice for my code, but do insist that the tests still pass on perl
> 5.8.8.

That makes sense. So we sadly can't use /a.

Ideally we'd want to munge the \d into [0-9]. It's as easy as
s/\\d/[0-9]/g, but that's relying on google to never use some
constructs in their regular expression (i.e. they don't put \\d in
their own regular expression).

What do you think about that? Otherwise, we need to start pulling
regular expressions apart (but if you're doing this anyway, maybe this
could be put in there).

Mark.

From david at cantrell.org.uk Fri Apr 25 19:12:43 2014
From: david at cantrell.org.uk (David Cantrell)
Date: Fri, 25 Apr 2014 19:12:43 +0100
Subject: [ANNOUNCE] ORIGINAL AND BEST social, Thu 1 May
Message-ID: <20140425181243.GA13377@bytemark.barnyard.co.uk>

May is one of those terribly confusing months, but in accordance with
ancient tradition some of us will be following the ORIGINAL AND BEST
calendar and meeting in the Trinity Arms in Brixton to quaff pints of
delicious ale on Thursday the 1st of May.

http://london.randomness.org.uk/wiki.cgi?Trinity_Arms,_SW9_8DR

-- 
David Cantrell | top google result for "topless karaoke murders"

"IMO, the primary historical significance of Unix is that it marks
the time in computer history where CPUs became so cheap that it was
possible to build an operating system without adult supervision."
-- Russ Holsclaw in a.f.c

From abigail at abigail.be Fri Apr 25 21:24:57 2014
From: abigail at abigail.be (Abigail)
Date: Fri, 25 Apr 2014 22:24:57 +0200
Subject: Finding the intersection between two regexes
In-Reply-To:
References: <20140421004223.GA15289@bytemark.barnyard.co.uk> <20140422111614.GA9444@bytemark.barnyard.co.uk> <5359A278.2010208@cantrell.org.uk>
Message-ID: <20140425202457.GC23212@almanda.fritz.box>

On Fri, Apr 25, 2014 at 12:37:17PM -0400, Mark Fowler wrote:
> On Thu, Apr 24, 2014 at 7:47 PM, David Cantrell wrote:
> > Mark wrote:
> >> Those /d are incorrect. You want [0-9] or to use the /a regexp flag on a
> >> suitably modern perl.
> >
> > My regexes come directly from Google's libphonenumber. They are happy to
> > accept patches provided you sign your life away in blood.
>
> Let's not do that.
>
> > I require no such
> > blood sacrifice for my code, but do insist that the tests still pass on perl
> > 5.8.8.
>
> That makes sense. So we sadly can't use /a.
>
> Ideally we'd want to munge the \d into [0-9]. It's as easy as
> s/\\d/[0-9]/g, but that's relying on google to never use some
> constructs in their regular expression (i.e. they don't put \\d in
> their own regular expression.)
>
> What do you think about that? Otherwise, we need to start pulling
> regular expressions apart (but if you're doing this anyway, maybe this
> could be put in there.
>

Do a pre-check? Reject anything that contains a non-ASCII character
flat out.

About the only case where it's ok to use /\d/ is if you can guarantee
the string you are matching against contains no non-ASCII digits.
Abigail From david at cantrell.org.uk Fri Apr 25 22:15:31 2014 From: david at cantrell.org.uk (David Cantrell) Date: Fri, 25 Apr 2014 22:15:31 +0100 Subject: Finding the intersection between two regexes In-Reply-To: <20140425202457.GC23212@almanda.fritz.box> References: <20140425202457.GC23212@almanda.fritz.box> <20140421004223.GA15289@bytemark.barnyard.co.uk> <20140422111614.GA9444@bytemark.barnyard.co.uk> <5359A278.2010208@cantrell.org.uk> Message-ID: <20140425211531.GA16147@bytemark.barnyard.co.uk> On Fri, Apr 25, 2014 at 12:37:17PM -0400, Mark Fowler wrote: > David Cantrell wrote: > > I require no such blood sacrifice for my code, but do insist that > > the tests still pass on perl 5.8.8. > That makes sense. So we sadly can't use /a. Although you can use fancy new features in the build scripts. > Ideally we'd want to munge the \d into [0-9]. It's as easy as > s/\\d/[0-9]/g, but that's relying on google to never use some > constructs in their regular expression (i.e. they don't put \\d in > their own regular expression.) There's very little chance of that happening. Backslashes and letters can't appear in phone numbers anyway. Go fer it! On Fri, Apr 25, 2014 at 10:24:57PM +0200, Abigail wrote: > Do a pre-check? Reject anything that contains a non-ASCII character flatout. That would be another option. Perhaps a better one, as it'll mean I won't have to remember to eschew \d in future. -- David Cantrell | even more awesome than a panda-fur coat You may now start misinterpreting what I just wrote, and attacking that misinterpretation.
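The two options the thread ends on, Mark's `s/\\d/[0-9]/g` munge and Abigail's ASCII pre-check, might be sketched roughly as follows. The helper names `asciify_digits` and `is_ascii_only` are hypothetical and are not part of Number::Phone or libphonenumber. The munge assumes, as Mark notes, that the pattern source never contains an escaped backslash followed by `d`, and additionally that `\d` never appears inside a bracketed character class.

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite \d as [0-9] in a pattern's source text.
# Assumes the source never contains "\\d" (an escaped backslash followed
# by d) and never uses \d inside a [...] class.
sub asciify_digits {
    my ($pattern) = @_;
    (my $munged = $pattern) =~ s/\\d/[0-9]/g;
    return $munged;
}

# Hypothetical pre-check: true only if the input consists purely of
# ASCII characters.
sub is_ascii_only {
    my ($input) = @_;
    return $input !~ /[^\x00-\x7F]/;
}

# The Barbados fixed-line pattern quoted earlier in the thread.
my $source = '246[2-9]\d{6}';
my $munged = asciify_digits($source);
print "$munged\n";    # prints 246[2-9][0-9]{6}

my $re = qr/\A(?:$munged)\z/;

# The second input ends in U+0663 ARABIC-INDIC DIGIT THREE.
my %inputs = (
    'plain ASCII digits'      => '2462303333',
    'trailing non-ASCII digit' => "246230333\x{0663}",
);

for my $label (sort keys %inputs) {
    my $number = $inputs{$label};
    printf "%-25s ascii_only=%s matches=%s\n", $label,
        (is_ascii_only($number) ? 'yes' : 'no'),
        ($number =~ $re         ? 'yes' : 'no');
}
```

Either measure on its own is enough to stop the non-ASCII digit getting through; neither relies on the /a flag, so the sketch should not need anything newer than the perl 5.8.8 David wants to keep supporting.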