TSWriter deadlock potential fix. (3 Viewers)

ramu

Portal Pro
July 15, 2008
190
7
Hi,

I'm not quite sure what this technical stuff exactly means... Can You tell me whats going on and if there is any newer fix for that problem or is the fix allready in 1.4.0?

Thx
ramu
 

Owlsroost

Retired Team Member
  • Premium Supporter
  • October 28, 2008
    5,540
    5,038
    Cambridge
    Home Country
    United Kingdom United Kingdom
    @DJBlu
    Looks like you've made some excellent progress with this issue.
    Having said that, personally I don't understand how your changes are solving the problem.
    Are you able to explain a little?

    My perspective is this...
    The deadlock at GetNewSubChannel() (which is calling CMpTs::AddChannel()) is obviously blocking on the m_Lock declared in CMpTs.
    We can see who/what is holding the lock by reading/comparing logs very carefully. I've attached such a sample with the epg.log, tv.log and TsWriter.log stitched together. Compare the output of the last successful EPG grab (for "ANTENA 3"):

    2013-05-14 16:28:01.322053 [EPG Update thread(21)]: EpgGrabbing: Stop - user epg

    2013-05-14 16:28:01.357058 [EPG Update thread(21)]: user:epg remove
    2013-05-14 16:28:01.422066 [EPG Update thread(21)]: tvcard:FreeSubChannel: subchannels count 1 subch#0
    2013-05-14 16:28:01.486074 [EPG Update thread(21)]: DVB subch:0 Decompose()

    14-05-2013 16:28:01.518 del m_pVideoAnalyzer
    14-05-2013 16:28:01.522 analyzer: reset
    14-05-2013 16:28:01.526 del m_pPmtGrabber
    14-05-2013 16:28:01.529 del m_pRecorder
    14-05-2013 16:28:01.533 del m_pTimeShifting
    14-05-2013 16:28:01.536 del m_pTeletextGrabber
    14-05-2013 16:28:01.540 del m_pCaGrabber
    14-05-2013 16:28:01.543 del done...

    2013-05-14 16:28:01.550087 [EPG Update thread(21)]: FreeSubChannel CA: freeing sub channel : 0
    2013-05-14 16:28:01.614090 [EPG Update thread(21)]: tvcard:FreeSubChannel : no subchannels present, pausing graph
    2013-05-14 16:28:01.677098 [EPG Update thread(21)]: dvb:confused:topGraph called
    2013-05-14 16:28:01.743107 [EPG Update thread(21)]: tvcard:FreeAllSubChannels
    2013-05-14 16:28:01.807115 [EPG Update thread(21)]: dvb:confused:topGraph

    14-05-2013 16:28:01.838 CMpTsFilter::pause()
    14-05-2013 16:28:01.842 CMpTsFilter::confused:top()

    2013-05-14 16:28:02.085151 [EPG Update thread(21)]: Pausecard
    2013-05-14 16:28:02.150159 [EPG Update thread(21)]: dvb:confused:topGraph called
    2013-05-14 16:28:02.215167 [EPG Update thread(21)]: tvcard:FreeAllSubChannels
    2013-05-14 16:28:02.279175 [EPG Update thread(21)]: dvb:confused:topGraph filterstate already stopped, returning.

    2013-05-14 16:28:15.729892 [DVB EPG timer(13)]: Grab for card:#10 transponder #7/12 channel: Canal Catala Barcelona
    2013-05-14 16:28:15.741886 [DVB EPG timer(13)]: EpgCard: grab epg on card: #10 transponder: #6 ch:Canal Catala Barcelona

    ...with the failure:

    2013-05-14 16:29:20.132070 [(18)]: EpgGrabbing: Stop - user epg

    2013-05-14 16:29:20.136070 [(18)]: user:epg remove
    2013-05-14 16:29:20.169074 [(18)]: tvcard:FreeSubChannel: subchannels count 1 subch#0
    2013-05-14 16:29:20.200078 [(18)]: DVB subch:0 Decompose()

    14-05-2013 16:29:20.232 del m_pVideoAnalyzer
    14-05-2013 16:29:20.236 analyzer: reset
    14-05-2013 16:29:20.239 del m_pPmtGrabber
    14-05-2013 16:29:20.243 del m_pRecorder
    14-05-2013 16:29:20.246 del m_pTimeShifting
    14-05-2013 16:29:20.250 del m_pTeletextGrabber
    14-05-2013 16:29:20.253 del m_pCaGrabber
    14-05-2013 16:29:20.257 del done...

    2013-05-14 16:29:20.261086 [(18)]: FreeSubChannel CA: freeing sub channel : 0
    2013-05-14 16:29:20.294090 [(18)]: tvcard:FreeSubChannel : no subchannels present, pausing graph
    2013-05-14 16:29:20.325094 [(18)]: dvb:confused:topGraph called
    2013-05-14 16:29:20.358099 [(18)]: tvcard:FreeAllSubChannels
    2013-05-14 16:29:20.389102 [(18)]: dvb:confused:topGraph

    14-05-2013 16:29:20.421 CMpTsFilter::pause()
    14-05-2013 16:29:20.425 CMpTsFilter::confused:top()

    2013-05-14 16:29:45.773335 [DVB EPG timer(4)]: Grab for card:#10 transponder #8/12 channel: Nitro
    2013-05-14 16:29:45.786331 [DVB EPG timer(4)]: EpgCard: grab epg on card: #10 transponder: #7 ch:Nitro

    Note that thread 18 in the second section is locking in CMpTsFilter::confused:top(). There is no "Pausegraph" entry.
    This is telling me that for some reason the filter is refusing to stop.

    Here's the corresponding function:
    Code:
    STDMETHODIMP CMpTsFilter::Stop()
    {
      CAutoLock cObjectLock(m_pLock);
    	LogDebug("CMpTsFilter::Stop()");
      return CBaseFilter::Stop();
    }

    Quite simple really.
    We can tell that m_pLock is the same lock as is used for CMpTs::AddChannel() by examining the constructor of CMpTs, the constructor of CMpTsFilter and the declaration of CBaseFilter in amfilter.h:
    m_pFilter = new CMpTsFilter(this, GetOwner(), &m_Lock, phr);
    CMpTsFilter::CMpTsFilter(CMpTs *pDump,LPUNKNOWN pUnk,CCritSec *pLock,HRESULT *phr) :
    CBaseFilter(NAME("TsWriter"), pUnk, pLock, CLSID_MpTsFilter),
    m_pWriterFilter(pDump)
    {
    }
    CCritSec *m_pLock; // Object we use for locking
    CBaseFilter(
    __in_opt LPCTSTR pName, // Object description
    __inout_opt LPUNKNOWN pUnk, // IUnknown of delegating object
    __in CCritSec *pLock, // Object who maintains lock
    REFCLSID clsid); // The clsid to be used to serialize this filter


    What I don't understand is why CBaseFilter::confused:top() is blocking.

    Note that the example above is about as single-threaded as TV Server can get. Sequential tunes, over and over again... then we get this.

    mm

    It feels like the graph is hanging (and this ties in with having to kill the whole process to recover from it) - I've had this happen often enough when doing dev work on TsReader and dshowhelper....;) . The deadlock might not be just inside TsWriter - there are presumably other filters in the graph which are feeding TsWriter with data, and also receiving Pause(), Stop(), Run() commands from the filtergraph manager, so it could be timing issue involving more than one filter.

    One question is - do we actually need the locking in Pause(), Stop() etc - is it doing anything useful inside TsWriter ? (or do we need to hold the lock for the whole time in Pause(), Stop() etc ?)
     

    mm1352000

    Retired Team Member
  • Premium Supporter
  • September 1, 2008
    21,577
    8,224
    Home Country
    New Zealand New Zealand
    One question is - do we actually need the locking in Pause(), Stop() etc - is it doing anything useful inside TsWriter ? (or do we need to hold the lock for the whole time in Pause(), Stop() etc ?)
    I'm not sure, but I think our code here is pretty standard.
    I've been reading material like this trying to gain some knowledge and context on standard filter locking implementations.
    One thing which I think is a bit unusual in TsWriter is our locking with respect to Receive(). Stick with me here as this gets a bit curly... :)

    If you check CMpTsFilterPin::Receive() you'll see we don't lock directly within that function. Use of the receive lock is commented out... which is interesting in itself.
    Having said that, CMpTsFilterPin implements CPacketSync.
    Samples Receive()'d are passed to CPacketSync::OnRawData(). If a complete TS packet is available in the buffer, that function invokes the virtual OnTsPacket() function which is overriden by CMpTsFilterPin. That OnTsPacket() override in turn calls AnalyzeTsPacket() on the main filter object.
    Now here's something that I think could be key: AnalyzeTsPacket() attempts to acquire the filter lock.

    Consider the scenario.
    Filter graph is running. Upstream filter is delivering samples. Everything honky dory.
    TV service decides its time to stop the graph for whatever reason.
    CMpTsFilter::confused:top() is invoked. While the lock is not yet acquired, samples continue to be delivered and nothing changes.
    When the lock is acquired, any sample that completes a TS packet in the CPacketSync buffer will block the streaming thread (because AnalyzeTsPacket() attempts to acquire the same lock held by Stop()).
    Having the streaming thread blocked could be very problematic depending on what CBaseFilter::confused:top() does.
    I would presume that it in some way triggers the upstream filters to be stopped, and if any one of those filters tries to acquire an internal lock on its streaming thread (if it has one) the graph would immediately be in deadlock.

    Does that make any sense at all, or is it just nonsense?
    The other thing that is interesting to me is that we return S_FALSE for CMpTsFilterPin::ReceiveCanBlock(). I don't understand that considering the context of the above explanation that a lock is indeed acquired indirectly by the streaming thread.

    @tourettes @davidf @offbyone @georgius @morpheus_xx your thoughts would be most welcome! :)
    Also ping to @Developers in general. It is high time we figured this out!
     
    Last edited:

    mm1352000

    Retired Team Member
  • Premium Supporter
  • September 1, 2008
    21,577
    8,224
    Home Country
    New Zealand New Zealand
    Hmmm, okay.

    Anybody who is able to reproduce this problem: please could you try the attached TsWriter which should be compatible with MP 1.3 and 1.4.

    The patch is also attached for those who are interested. My main goal was to replace the common use of m_Lock with use of m_filterLock for filter related synchronisation (ie. Run(), Pause(), Stop()) and m_channelLock for synchronising access to m_vecChannels.

    Feedback is welcome! :)
     

    Attachments

    • tswriter_deadlock_v2.patch
      17.7 KB
    • TsWriter_mm_deadlock_v2.zip
      117.1 KB
    Last edited:

    azzuro

    Test Group
  • Team MediaPortal
  • May 10, 2007
    9,984
    5,663
    France - IDF
    Home Country
    France France
    @mm135200 :
    all files related to your PDF, we can add this document on FTP ?
    PDF
     
    Last edited:

    DJBlu

    Portal Pro
    August 14, 2007
    1,670
    813
    Llanelli
    Home Country
    United Kingdom United Kingdom
    @mm135200 The code looks good. Sometimes you can't see the wood for the trees.

    I'll have a chance to try this next week sometime.
     

    Users who are viewing this thread

    Top Bottom