[OT] select and sysread problem on Solaris

Thu Sep 11 10:33:20 BST 2008

On 11 Sep 2008, at 02:12, Paul Johnson wrote:

> I'm looking for a little help in solving a problem which has me stumped
> and couldn't think of anywhere better to come.  That's not the problem
> by the way, but I'll take answers to that as well.
>
> I have about 210 named pipes (FIFOs) and three processes which are
> running a select over a third of the pipes each, and then calling
> sysread on the pipe before writing out the data to log files.
>
> This has been working well in production for almost two years handling
> many GB of data daily.
>
> Recently, another thirty or so pipes have been added to this group and
> very occasionally I am noticing a problem whereby select will indicate
> that a pipe is ready for reading and sysread will attempt to read from
> the pipe, but there is actually nothing there to be read, and so the
> sysread call hangs waiting for input.
>
> Reproducing this problem is difficult, but I currently have the system
> in such a state.  The pipe on which the sysread call is waiting is one
> of the new pipes.
>
> I can only think of four possible explanations here:
>
>  1.  My code is broken.  I don't think this is the case but don't want
>      to rule it out.
>
>  2.  Some other process has read the data in between the select returning
>      and the sysread being called.  lsof shows no unexpected processes
>      accessing the pipe at the moment and no one should have been on the
>      system to have run cat or anything.  last shows nothing suspicious.
>
>  3.  Perl's select is broken.
>
>  4.  The OS is broken.
>
> Is my assumption correct that if select tells you there is something to
> be read then there should be something there to be read?  Can anyone
> think of any other possibilities?
>
> What is curious to me is that the process writing to the named pipe is
> hung.  Is the pipe locked somehow until the sysread call has returned?
>
> Unless I can think of anything better to do, tomorrow I will try to send
> some data to the named pipe that is being read to see if that will allow
> the sysread to return.  If it does, I should be able to tell whether any
> data has been lost from the named pipe, which might indicate that
> another process had read it.
>
> I am running perl-5.8.8 on Solaris 8.  The program writing to the named
> pipe is a Java program which is writing to STDOUT.  That program has been
> called using system by a Perl wrapper which has reopened STDOUT to the
> named pipe.  The program reading from the named pipe is using PERLIO.
>
> I'm open to any hints, suggestions or solutions.
>

This reminds me of an issue I ran into with select/sysread on Solaris too,
although it turned out to be a misunderstanding on my part of Perl's
sysread semantics compared to the read system call. I think it had
something to do with what happens when a pipe is closed unexpectedly.
You might review the docs on sysread and select, but I'm sure you've
done that already.

The Perl select docs also suggest using the O_NONBLOCK flag for the
case you're describing.
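For illustration, here is a rough sketch of the select-then-read pattern with O_NONBLOCK. It is in Python rather than Perl, but select.select and os.read are thin wrappers over the same select(2) and read(2) calls that Perl's four-argument select and sysread use. With the FIFO opened non-blocking, a read after a stale or spurious "readable" gets EAGAIN (BlockingIOError) instead of hanging, and a read of 0 bytes means every writer has closed its end; select keeps reporting such a FIFO as readable, so it is dropped from the set. The helper name drain_fifos, the FIFO path, the 4096-byte chunk size and the timeout are all made up for the example, and I haven't tried this on Solaris 8, so treat it as a sketch rather than a fix for your particular hang.

```python
import os
import select
import tempfile


def drain_fifos(fds, timeout=1.0):
    """Hypothetical helper: read whatever is available from non-blocking
    FIFO fds until every one hits EOF or select() times out.
    Returns a dict mapping fd -> bytes read. The caller closes the fds."""
    data = {fd: b"" for fd in fds}
    watching = set(fds)
    while watching:
        readable, _, _ = select.select(list(watching), [], [], timeout)
        if not readable:
            break  # timed out: nothing more arrived on any watched fd
        for fd in readable:
            try:
                chunk = os.read(fd, 4096)  # arbitrary chunk size
            except BlockingIOError:
                # select said "readable" but nothing is there; because the
                # fd is O_NONBLOCK we get EAGAIN here instead of blocking.
                continue
            if chunk == b"":
                # EOF: all writers have closed. An EOF'd FIFO stays
                # "readable" forever, so stop selecting on it.
                watching.discard(fd)
            else:
                data[fd] += chunk
    return data


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.fifo")  # hypothetical pipe name
        os.mkfifo(path)
        # Open the read end first, non-blocking, so open() itself does not
        # wait for a writer to appear.
        rfd = os.open(path, os.O_RDONLY | os.O_NONBLOCK)
        wfd = os.open(path, os.O_WRONLY)  # succeeds: a reader already exists
        os.write(wfd, b"hello\n")
        os.close(wfd)  # writer goes away, so the reader will see EOF
        result = drain_fifos([rfd])
        os.close(rfd)
        print(result[rfd])  # prints b'hello\n'
```

In Perl the equivalent would be to open each FIFO with sysopen and O_NONBLOCK from Fcntl, then treat an undef return from sysread with $! set to EAGAIN as "nothing there after all" and a 0 return as the writer having gone away.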

Sorry, but that's all I can offer without doing any serious research.

- Mark