Forking incoming RTP to a separate server
Posted: Thu Jan 26, 2017 9:26 am
Hi,
I am considering building a virtual assistant for incoming phone calls, but I am very new to SIP. Several people have recommended using Asterisk/FreeSWITCH, but I am concerned about the scalability of that solution.
Alternatively, I am considering existing services (Plivo, Tropo, Twilio) for Text To Speech (TTS) generation, combined with my own proprietary solution for speech recognition and automated dialog generation. However, the best I have been able to build with these services is a dialog where the user is prompted to speak with a beep and the audio only becomes available once the user has finished, which is hardly an improvement over previous technologies.
I wonder whether I could use SIP Sorcery to fork the incoming phone call into two legs: one for TTS output, which would be forwarded to Plivo, Tropo or Twilio so I can use their APIs to provide output to the user, and one for audio input, i.e., the incoming RTP, so we can run our speech recognition solution on it.
In particular, I would appreciate a solution that solves two problems: how to match incoming RTP streams to the corresponding incoming calls on the cloud-based IVR product, and how to guarantee that the RTP leg receives only audio from the user and nothing from the TTS output.
Yes, I am a newbie on SIP, so please try not to use overly fancy terms and, if you do, send me a link. I have considered getting it all done with an Asterisk or FreeSWITCH box, but I am concerned about scalability and would prefer to have existing solutions on the market take care of SIP signaling and TTS generation.
Thanks