#72 ✓resolved
bodhi

AMI lexer buffer offset

Reported by bodhi | May 24th, 2009 @ 11:53 PM | in 1.0.0

When data arrives in a couple of separate packets over AMI, there is a problem in AbstractAsteriskManagerInterfaceStreamLexer when the buffer size is exceeded. The start of the buffer is trimmed and the new data is added to the end of the buffer, but there are various pointers (indexes) hidden away in the code that aren't updated, such as @follows_text_start. Changing #extend_buffer_with to:

          def extend_buffer_with(new_data)
#            if new_data.size + @data.size > BUFFER_SIZE
#              view_buffer
#              @data.slice! 0...new_data.size
#              @current_pointer -= new_data.size
#              view_buffer
#            end
            @data << new_data
            @data_ending_pointer = @data.size
          end

(i.e., ignoring the buffer limit) fixes the problem, but introduces a memory leak, in that old data will never be cleared from the buffer.

One workaround would be to use an interface for @data to separate the indexing from the data-storage, but that will lead to overflow problems for a long-running process.

Another might be to clear the buffer whenever the lexer returns to the protocol state?
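The failure mode can be sketched in a few lines of Ruby (illustrative only; `follows_text_start` here is a stand-in for the lexer's internal pointers, not the actual lexer code):

```ruby
# A buffer holds "abcdefghi"; the parser has remembered where 'c' is.
buffer = "abcdefghi"
follows_text_start = buffer.index("c")  # => 2

# The buffer limit is hit, so the oldest two bytes are trimmed...
buffer.slice!(0...2)                    # buffer is now "cdefghi"

# ...but the saved index was never adjusted, so it now points at 'e'.
buffer[follows_text_start]              # => "e", not "c"
```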

Comments and changes to this ticket

  • Jay Phillips

    Jay Phillips June 4th, 2009 @ 03:04 AM

    • State changed from “new” to “open”

    Couldn't I just change the buffer to be 1.megabyte instead of 8.kilobytes? I can't imagine Asterisk having a million characters sent in one big chunk.

  • Jay Phillips

    Jay Phillips June 4th, 2009 @ 03:05 AM

    Also, what is the size of the chunk you're receiving from Asterisk that's greater than 8kb? Is it something you wrote into Asterisk yourself to be sent over AMI?

  • bodhi

    bodhi June 4th, 2009 @ 03:21 AM

    It's not one packet that's greater than 8kb, it's the total of all the data sent from Asterisk. So if you have a long-running process (e.g. a Rails/Merb process) that communicates with Asterisk on every page view, the buffer fills up.

  • Jay Phillips

    Jay Phillips June 4th, 2009 @ 04:02 AM

    Hmm, I'll take a look into this.

  • bodhi

    bodhi June 4th, 2009 @ 04:20 AM

    For testing I had something like this (not exact, I've since deleted the testing code):

    while !Manager.send_action("Command", "command" => "core show channels").empty?; end
    

    Which, while a pathological case, would show up the problem pretty quickly. I'm not sure that it really was core show channels, but it was definitely a command that resulted in Response: Follows.

  • Jay Phillips

    Jay Phillips June 4th, 2009 @ 04:22 AM

    lol, ok, I'll try to do something that pounds it like that.

  • dimas

    dimas November 11th, 2009 @ 01:27 PM

    Just installed 0.8.3
    It seems the problem is not solved there: a simple test running "core show channels" does about 10 iterations and then gets an empty response. The packet capture shows that the server actually sent a non-empty response.

  • Jay Phillips

    Jay Phillips November 11th, 2009 @ 02:36 PM

    "The packet capture shows that server actually sent non-empty response."

    Do you mean it sent the same data as the previous "core show channels" executions? By "non-empty" response do you mean "a response with one or more channels in it"? Does this only blow up for you when receiving responses to "core show channels" while channels are active?

  • dimas

    dimas November 14th, 2009 @ 06:19 AM

    Here is the test application you can try yourself:

    cmd = "iax2 show peers"
    res = $ami.send_action "Command", :Command => cmd
    target_text = res.text_body
    p "---------------"
    p "#{target_text}"
    p "---------------"
    p "Len: #{target_text.length}"
    p "---------------"
    
    i = 1
    while 1 do
        res = $ami.send_action "Command", :Command => cmd
        text = res.text_body
        break if text.length != target_text.length || i > 1000
        i += 1
    end
    
    p "---------------"
    p "#{text}"
    p "---------------"
    p "Len: #{text.length}"
    p "Counter: #{i}"
    p "---------------"
    

    Note the first line - it specifies the command to run.
    What the application does is simply execute the command once and remember the answer. Then it executes the same command up to 1000 times in a loop and breaks as soon as the next answer does not match the first.
    Of course, this only makes sense if your IAX2 peer list does not change every second. You may try another command which produces a more stable result over a short time frame (dundi show peers, core show channels, or meetme list XXX).

    On my system the length of the 'iax2 show peers' response is 1014 bytes. The loop runs successfully 7 times and at the end displays:

    "---------------"
    ""
    "---------------"
    "Len: 0"
    "Counter: 7"
    "---------------"
    

    which means the 8th response is just empty. At the same time I captured the traffic on eth0 TCP port 5038, and looking inside the traffic dump I see that the server's reply to the last AMI action is exactly the same as for the previous actions.

    If I change 'iax2 show peers' to 'sip show peers', which returns a response of 2845 bytes, the same loop dies on the third iteration.

    Please note:
    1. I had some trouble reproducing the issue when AMI connects to localhost - you'd better try it with a real network interface.
    2. It seems we are hitting the limit of some internal buffer (which, from the experiments above, I assume is 8K). Just increasing the buffer is not enough - as with all long-running applications, you will eventually fill a buffer of any size.

  • Jay Phillips

    Jay Phillips November 14th, 2009 @ 01:55 PM

    Does increasing the buffer size help the problem? Attached is a patch file that will do that. Give that a try. 8kb does seem kinda small...

  • dimas

    dimas November 14th, 2009 @ 05:45 PM

    Jay,
    I will give the patch a try on Monday.
    However, I already have concerns about your patch :)
    As I wrote, just increasing the buffer size seems a bit odd to me. I'm running synchronous actions, which means I always wait for the completion of one action before sending another. And the response from a single action in my tests is never even close to 8K, which means the buffer just can NOT be overflowed in my tests. So the problem is anything but buffer size.

    Obviously, increasing the buffer N times will fix the application dying on the third iteration of the loop. But can you be sure the problem is really solved? Maybe the application will still die after 3*N iterations...

    My point is: if the buffer is big enough to fit a single response, and if there is a guarantee that actions are serialized so that no more than one action is being executed at a given moment, there should be NO problems caused by buffer size. So if we have any problems, they are caused by anything but buffer size.

    PS: You can't imagine how happy I am that you are actively working on these issues :)

  • Jay Phillips

    Jay Phillips November 14th, 2009 @ 08:07 PM

    Doh, sorry, I misread your original post about the buffer size. I wrote that in a coffee shop earlier today and clearly wasn't paying much attention. :)

    I'm still struggling to see what exactly is causing that. Do you have Skype or something so we can talk about this more in realtime?

  • dimas

    dimas November 14th, 2009 @ 08:43 PM

    yes, I have skype. Please drop me an email to adhearsion at dima.spb.ru - will give my id.

  • bodhi

    bodhi November 15th, 2009 @ 07:59 PM

    Is the problem still as I described when I opened the ticket?

    The original fault was that the buffer was used as a FIFO queue, and when it filled up the early data was trimmed. But various parts of the parser code were keeping indices into the data, and these indices were invalidated when the buffer was sliced. Maybe I can explain better with an ASCII diagram:

    buffer is 7 characters long, initially empty
    -------
    
    -------
    
    3 bytes arrive, "abc". parser is looking for eg. 'c'
    -------
    abc
    -------
    
    so 'c' is at index 2.
    6 bytes arrive: 'defghi', 'c' is at 2.
    -------
    abcdefghi
    -------
    
    hmm, buffer is too long now, trim it. (lose "ab")
    -------
    cdefghi
    -------
    
    ok, so 'c' is at 2... err, wait: 'c' is now at 0!?
    

    So the buffer is trimmed, but the index is not updated to the new location of the data, and (it's been a long time since I looked at this) the parser's assumptions about the data are wrong.

  • dimas

    dimas November 16th, 2009 @ 07:39 PM

    Yep, I believe the problem is still the same.
    The code of extend_buffer_with has changed a little, but it still does not adjust the pointers. There are lots of them - @current_pointer, @start_of_version, @event_name_start and @current_key_position, to name a few.

    I do not know how much of the code is generated by the lexer generator (Ragel? Excuse my ignorance, I'm not a Ruby guy), so I'm not sure how to properly fix the thing. I see two options:
    1. Create a method like

        def adjust_pointers(delta)
            @current_pointer -= delta
            @start_of_version -= delta
            @event_name_start -= delta
            # ...
        end

    and call it whenever we shift the buffer contents.

    Pros:

    * Easy to implement

    Cons:

    * I'm not sure if there are any automatically generated pointers. If there are any, we'd better not mess with those.
    * If more pointers are added, it will be so damn easy to forget to adjust them, which will cause difficult-to-catch bugs.

    2. Create a class Buffer representing a window of a given size sliding along the data stream and knowing its position in the stream. Make @data = Buffer.new and implement the << and [] operations for the buffer so that no code working with @data needs to be changed.
      After this, all pointers in the lexer will refer to the absolute position of a character in the stream.
        class Buffer
            attr_reader :pos
            attr_reader :data
        end
        # implementation for [] and <<
    

    When you take a substring of the buffer with [x..y], it will subtract the stream pos from x and y to get the real position in the internal data string. When you insert too much data into the Buffer, it removes old data from the beginning and increases its stream pos appropriately.

    To illustrate: the Buffer capacity is 10 bytes. Initially it is empty.

    @data.data = ""
    @data.pos = 0
    

    if we insert a chunk with @data << "12345", the buffer changes as:

    @data.data = "12345"
    @data.pos = 0
    

    If you try @data[3..4], it will give you "45". Now we add more data: @data << "abcdefgh". The buffer changes to:

    @data.data = "45abcdefgh"
    @data.pos = 3
    

    Note that pos indicates we are three bytes away from the beginning of the data (in other words, the first byte in the buffer corresponds to position 3 of our virtual data stream).
    Now @data[3..4] still gives you "45" - absolute positions remain valid across the trim - and @data[6..7] gives you "bc", the two characters at position 6 of the whole stream seen so far (that is, "12345abcdefgh").

    Pros:

    * No matter how many pointers we have now or may have in the future - we may not even know about some internal pointers - they are all valid absolute positions within our virtual stream, so they do not need to be updated.

    Cons:

    * More difficult to implement
    * An AMI connection kept for months may overflow the integer pos of the Buffer (a stream longer than 4 gigabytes) and everything will die.

    I would probably choose #1...
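    For illustration, here is one way option 2's sliding-window Buffer might look (a sketch under the semantics described above - subtract pos on read, trim and advance pos on write - not code from Adhearsion):

```ruby
# Sketch of a sliding-window buffer indexed by absolute stream position.
# Names (Buffer, pos, data) follow the proposal above; details are hypothetical.
class Buffer
  attr_reader :pos, :data

  def initialize(capacity)
    @capacity = capacity
    @data = ""
    @pos = 0  # stream position of the first byte currently held
  end

  # Append new data, trimming the oldest bytes once capacity is exceeded.
  def <<(chunk)
    @data << chunk
    overflow = @data.size - @capacity
    if overflow > 0
      @data.slice!(0, overflow)
      @pos += overflow
    end
    self
  end

  # Index by absolute stream position; raise if that data was already trimmed.
  def [](range)
    raise IndexError, "data before position #{@pos} was trimmed" if range.first < @pos
    @data[(range.first - @pos)..(range.last - @pos)]
  end
end
```

    With a capacity of 10, inserting "12345" and then "abcdefgh" trims the internal string down to "45abcdefgh" with pos = 3; buffer[3..4] returns "45" both before and after the trim, and buffer[6..7] returns "bc".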

  • bodhi

    bodhi November 16th, 2009 @ 10:44 PM

    AMI connection kept for months may overflow integer pos of the Buffer

    Actually Ruby integers won't overflow; they will just switch to Bignum:

    1 << 65 => 36893488147419103232
    

    I imagine the impact on performance of this would be minimal.

    I still think my earlier idea

    Another might be to clear the buffer whenever the lexer returns to the protocol state?

    is worth thinking about. There's no reason to buffer data that has already been handled. But this might be difficult to integrate with Asterisk's communications protocol.

  • dimas

    dimas November 17th, 2009 @ 05:01 AM

    Agree, this should work too. I missed your idea in the text above.

    However, there is one thing you need to take care of: the buffer should not grow uncontrollably even if we get out of sync with the protocol and can't match some end token.

    With option 1 this can be handled by moving @data[x..y] into some getter method which checks that x and y are not negative and raises an exception if they are.
    With option 2 the same is done in the [x..y] method of the buffer - if x or y is less than the buffer's stream pos, we raise an exception.

    In your approach, if the buffer is guaranteed to be freed of already-processed chunks, then the corresponding check should probably go in extend_buffer_with: if the new data makes the buffer too big, raise an exception. Something like that...
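    A guarded getter for option 1 might look like this (a sketch; the class and method names are hypothetical, not the actual lexer API):

```ruby
# Raise instead of silently returning wrong data when a pointer has gone
# negative (i.e. the lexer is out of sync after a buffer trim).
class LexerBuffer
  def initialize(data)
    @data = data
  end

  def slice(x, y)
    raise IndexError, "lexer out of sync: #{x}..#{y}" if x < 0 || y < 0
    @data[x..y]
  end
end
```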

  • dimas

    dimas November 22nd, 2009 @ 07:11 AM

    Jay, I'm not sure if you are working on this issue (needless to say, it is the most critical of all the issues I reported, because it makes Adhearsion return rubbish responses from actions). I would be happy to contribute something - I can implement either number 1 or 2 from my proposal (not sure how to code bodhi's proposal, though).

    But what DEFINITELY needs to be done first is a unit testing framework for the AMI lexer. Without it I would not even touch the existing code, because I could break more than I fix. I believe UT for the lexer is a must-have thing anyway. So if you make one, I will be happy to extend the test cases with what I saw, and when test coverage is good, we can discuss fixing the issue itself.

    Thanks.

  • dimas

    dimas January 7th, 2010 @ 07:27 AM

    I'm curious if Jay has lost interest in the project? :)
    To me this is the most critical issue in Adhearsion ever, because it renders the AMI adapter nearly unusable for all long-running applications. And the issue has been here for half a year already :(

  • Ben Klang

    Ben Klang July 30th, 2010 @ 11:00 AM

    • State changed from “open” to “new”
    • Assigned user changed from “Jay Phillips” to “Ben Klang”
    • Milestone order changed from “0” to “0”

    I have just been testing for this issue with the Git version of Adhearsion. Using the "core show channels" command described above, I am not able to reproduce the issue, even after letting the loop run for several minutes.

    To the original reporter: can you test again and confirm whether this is still an issue?

  • bodhi

    bodhi July 30th, 2010 @ 09:08 PM

    Sorry, I've moved on from that job and I'm no longer working with Asterisk or Adhearsion.

  • Ben Klang

    Ben Klang August 24th, 2010 @ 01:51 PM

    • State changed from “new” to “resolved”

    Since I am unable to reproduce the issue I am going to assume it is fixed for now. Feel free to reopen if the issue recurs.

  • dimas

    dimas September 3rd, 2010 @ 05:11 PM

    Ben,
    The issue is still 100% reproducible. I just upgraded to 0.8.6 and see no changes in the related code. Please re-read the whole thread - I know it is boring :) but it will give you a better understanding of the issue and of how deep it is.

    Then please try the following code (fix host/user/pass appropriately).
    It just repeatedly executes the same command (core show applications) and stops when a response differs from the first one. Ideally it should run 1000 times, because that is the loop condition and the responses are the same.

    However, in practice it fails on the first run. This is because the default buffer size of 8K in /usr/lib/ruby/gems/1.8/gems/adhearsion-0.8.6/lib/adhearsion/voip/asterisk/manager_interface/ami_lexer.rb is not enough for the 9K command response.

               BUFFER_SIZE = 8.kilobytes unless defined? BUFFER_SIZE
    

    Now change the buffer size to 80K and re-run the app. It will make 8 iterations and stop.

    When you have reproduced it, please re-read the thread if you haven't done so before running the code :) as it explains what the problem is.

    And many thanks for supporting the project - I was starting to think it was dead...

    require 'rubygems'
    require 'adhearsion'
    require 'adhearsion/voip/asterisk/manager_interface'
    
    $ami = Adhearsion::VoIP::Asterisk::Manager::ManagerInterface.new :host => "", :username => "", :password => "", :events => false
    $ami.connect!
    
    
    cmd = "core show applications"
    res = $ami.send_action "Command", :Command => cmd
    target_text = res.text_body
    p "---------------"
    p "#{target_text}"
    p "---------------"
    p "Len: #{target_text.length}"
    p "---------------"
    
    i = 1
    while 1 do
        res = $ami.send_action "Command", :Command => cmd
        text = res.text_body
        break if text.length != target_text.length || i > 1000
        i += 1
    end
    
    p "---------------"
    p "#{text}"
    p "---------------"
    p "Len: #{text.length}"
    p "Counter: #{i}"
    p "---------------"
    
  • Ben Klang

    Ben Klang September 3rd, 2010 @ 05:22 PM

    • Milestone set to 1.0.0
    • State changed from “resolved” to “open”
    • Milestone order changed from “3013” to “0”

    Yep. I can reproduce this. I will re-read the ticket history and figure out the appropriate fix.

    Thanks very much for taking the time to recheck and update the ticket.

  • Ben Klang

    Ben Klang October 23rd, 2010 @ 03:31 PM

    I am working on this issue in the ami_rerobustification branch on Github. It's not quite fixed yet but I intend to address this before 1.0 is released.

    http://github.com/adhearsion/adhearsion/tree/ami_rerobustification

  • Ben Klang

    Ben Klang October 23rd, 2010 @ 05:22 PM

    • State changed from “open” to “fixcommitted”

    I have committed a fix for this issue here:
    http://github.com/adhearsion/adhearsion/commit/89667f528bd6033aa4b2...

    PLEASE TEST! This change required lots of careful math, and it's possible bugs still exist. I've re-run all the unit tests and it has not caused any new tests to fail (we still have 4 in error, but those are unrelated). The latest example code by dimas now runs continuously and without error. My personal applications appear to be running successfully as well.

    Feedback appreciated.

    Thanks to bodhi for a detailed initial report and thanks to dimas for continued feedback and suggesting the eventual fix.

  • dimas

    dimas October 27th, 2010 @ 05:55 PM

    Ben, I'm only able to test either a released version by gem update'ing to it, or by applying a patch to the installed version 0.8.6. But to be honest, I have no idea how to obtain such a patch.
    Since the change looks OK to me (it was a quick look, though) and you are sure no unit tests are broken, I propose going forward with it; when the next version is available, I will update and report the results here.

    Thanks a lot.

  • Ben Klang

    Ben Klang October 28th, 2010 @ 10:43 AM

    • State changed from “fixcommitted” to “resolved”
