[OT] select and sysread problem on solaris
mark at blackmans.org
Thu Sep 11 10:33:20 BST 2008
On 11 Sep 2008, at 02:12, Paul Johnson wrote:
> I'm looking for a little help in solving a problem which has me stumped,
> and I couldn't think of anywhere better to come. That's not the problem
> by the way, but I'll take answers to that as well.
> I have about 210 named pipes (FIFOs) and three processes which are
> running a select over a third of the pipes each, and then calling
> sysread on the pipe before writing out the data to log files.
> This has been working well in production for almost two years handling
> many GB of data daily.
> Recently, another thirty or so pipes have been added to this group and
> very occasionally I am noticing a problem whereby select will report
> that a pipe is ready for reading and sysread will attempt to read from
> the pipe, but there is actually nothing there to be read, and so the
> sysread call hangs waiting for input.
> Reproducing this problem is difficult, but I currently have the system
> in such a state. The pipe on which the sysread call is waiting is one
> of the new pipes.
> I can only think of four possible explanations here:
> 1. My code is broken. I don't think this is the case but don't want
> to rule it out.
> 2. Some other process has read the data in between the select returning
> and the sysread being called. lsof shows no unexpected processes
> accessing the pipe at the moment and no one should have been on the
> system to have run cat or anything. last shows nothing unexpected.
> 3. Perl's select is broken.
> 4. The OS is broken.
> Is my assumption correct that if select tells you there is something to
> be read then there should be something there to be read? Can anyone
> think of any other possibilities?
> What is curious to me is that the process writing to the named pipe is
> hung. Is the pipe locked somehow until the sysread call has returned?
> Unless I can think of anything better to do, tomorrow I will try to write
> some data to the named pipe that is being read to see if that will cause
> the sysread to return. If it does, I should be able to tell whether any
> data has been lost from the named pipe, which might indicate that
> another process had read it.
> I am running perl-5.8.8 on Solaris 8. The program writing to the named
> pipe is a Java program which is writing to STDOUT. That program has been
> called using system by a Perl wrapper which has reopened STDOUT to the
> named pipe. The program reading from the named pipe is using PERLIO.
> I'm open to any hints, suggestions or solutions.
This reminds me of an issue I found with select/sysread on Solaris too,
although it turned out to be a misunderstanding on my part of the Perl
sysread semantics compared to the read system call. It was something
to do with what happened when a pipe was closed unexpectedly, I think.
You might review the docs on sysread and select, but I'm sure you've
done that already.
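
For reference, a minimal sketch of the sysread behaviour I mean (not
your code, and not my original code either): select also reports a pipe
as readable once the writer has closed it, and sysread then returns 0
for end of file rather than any data, while undef means an error. The
anonymous pipe here is an assumption standing in for your FIFO, and
@results is purely illustrative.

```perl
use strict;
use warnings;
use IO::Select;

# Anonymous pipe standing in for the named pipe (illustration only).
pipe(my $reader, my $writer) or die "pipe: $!";
syswrite($writer, "hello\n") or die "syswrite: $!";
close $writer;    # the writer goes away unexpectedly

my $sel = IO::Select->new($reader);
my @results;

# can_read returns the handle both when data is waiting and at EOF.
while (my @ready = $sel->can_read(1)) {
    my $n = sysread($reader, my $buf, 4096);
    if (!defined $n) {            # undef: a real error, see $!
        push @results, "error: $!";
        last;
    }
    elsif ($n == 0) {             # 0: end of file, the writer closed
        push @results, "eof";
        $sel->remove($reader);
        close $reader;
        last;
    }
    else {                        # >0: that many bytes were read
        push @results, "data:$n";
    }
}

print join(",", @results), "\n";  # prints "data:6,eof"
```

If the EOF case is not handled and the handle stays in the select set,
select keeps reporting it as ready and the loop spins.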
The Perl select docs also suggest you use the O_NONBLOCK flag for the
case you're referring to.
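
A rough sketch of what that could look like, under the assumption that
the handles are plain pipes: with O_NONBLOCK set, a sysread on a pipe
that turns out to be empty fails with EAGAIN instead of hanging, so the
loop can simply go back to select. The helper names set_nonblocking and
read_chunk are made up for this example, and the anonymous pipe again
stands in for the FIFO.

```perl
use strict;
use warnings;
use Fcntl qw(F_GETFL F_SETFL O_NONBLOCK);
use Errno qw(EAGAIN EWOULDBLOCK);

# Hypothetical helper: switch an already-open handle to non-blocking mode.
sub set_nonblocking {
    my ($fh) = @_;
    my $flags = fcntl($fh, F_GETFL, 0) or die "fcntl F_GETFL: $!";
    fcntl($fh, F_SETFL, $flags | O_NONBLOCK) or die "fcntl F_SETFL: $!";
}

# Hypothetical helper: one read attempt, returning a status and the data.
sub read_chunk {
    my ($fh) = @_;
    my $n = sysread($fh, my $buf, 65536);
    return ('data', $buf) if $n;             # got some bytes
    return ('eof', '')    if defined $n;     # 0: writer has closed
    return ('again', '')                     # nothing there after all
        if $! == EAGAIN || $! == EWOULDBLOCK;
    die "sysread: $!";                       # anything else is a real error
}

# Demonstration on an anonymous pipe (illustration only).
pipe(my $reader, my $writer) or die "pipe: $!";
set_nonblocking($reader);

my ($s1) = read_chunk($reader);              # empty pipe: no hang
syswrite($writer, "log line\n") or die "syswrite: $!";
my ($s2, $data) = read_chunk($reader);       # data is available now
close $writer;
my ($s3) = read_chunk($reader);              # writer gone: end of file

print "$s1 $s2 $s3\n";                       # prints "again data eof"
```

In the real loop you would call read_chunk only on handles that select
returned, and treat 'again' as "go round the loop once more". For a
FIFO the flag can also be given at open time by passing
O_RDONLY | O_NONBLOCK to sysopen.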
Sorry, but that's all I can offer without doing any serious research.
More information about the london.pm mailing list