Yealink Forums

Full Version: t28 phones lagged and unreachable after firmware upgrade
You're currently viewing a stripped down version of our content. View the full version with proper formatting.
We are in the middle of a deployment of 40 yealink t28 handsets.
As part of this deployment we have upgraded our standard install from using the
yealink_T28.2.70.0.140.rom firmware to using the yealink_T28.2.71.0.165.rom firmware (the older firmware has a bug with parking lots).

During trials at the site we noticed that the handsets were regularly falling off the network (becoming lagged and then unreachable).

We initially blamed this problem on the network as we had the phones/pc on a single network with the PC's bridged through the phones.
Yesterday we made the client split the network.
The phones are now on a separate network spread across 4 brand new POE switches.
With only 6 handsets attached to the network we were stunned to see that phones where still becoming lagged and unreachable.

As part of our deployment with the v.71 firmware we had also turned on the failover settings:

Code:
# Concurrent registration to both servers.
account.1.fallback.redundancy_type=0
account.1.sip_server.1.address = YY.YY.YY.YY
account.1.sip_server.1.port = 5060
account.1.sip_server.1.expires = 300
# try to fail back on every sip transaction, this could slow down calls?
account.1.sip_server.1.failback_mode = 0
account.1.sip_server.1.register_on_enable=1
  
# A DR server has been configured so lets enable it.
account.1.sip_server.2.address = XX.XX.XX.XX
account.1.sip_server.2.port = 5060
account.1.sip_server.3.expires = 300
# try to fail back on every sip transaction, this could slow down calls?
account.1.sip_server.2.failback_mode = 0
account.1.sip_server.2.register_on_enable=1

During the trial deployment sip_server.2 was not available so we did wonder if maybe the phones we unable to deal with the second server not being available so we went back and disabled the second server.

The result was no change as the handsets were still falling off the network (lagged and then unreachable).

At this point we decided to revert to the v70 firmware using essentially the same .cfg file with the exception of the sip_server settings as these don't work in v70:

Code:
account.1.sip_server_host = yy.yy.yy.yy
account.1.sip_server_port = 5060
##account.1.sip_listen_random = 0
account.1.sip_listen_port = 5060

As soon as we downgraded the phones, the phones stopped falling of the network.

You can see from the following logs the occurance of handsets falling of the network. Across a full 24 hours we see a handset become unreachable ever 15 to 30 minutes. We downgraded the phones at 15:30. As you can see we have not had a single phone become unreachable since then (and I'm writting this email some 15 hours after the downgrade).

So it looks to me like we have a major bug in the v71 firmware.
I've attached the full config file we use.

Note : we use a macro language (velocity) to generate the templates at runtime. I've left the velocity macros in place but I think you can easily see what is going on.
The variables NJPROVISION_SIP_REGISTRAR_DR and NJPROVISION_SIP_REGISTRAR simply contain the ip address of the SIP server, whilst expressions like $Extension contains the handsets extension no.

We need an urgent response on this as we need the v71 firmware for the call parking but we can't go live with handsets falling off the network (the client is an inbound call centre - message service - which operates 24 x 7).



Thanks in advance.
Brett

[Dec 22 14:47:28] NOTICE[1990] chan_sip.c: Peer '111' is now Reachable. (26ms / 2000ms)
[Dec 22 14:48:44] NOTICE[1990] chan_sip.c: Peer '126' is now UNREACHABLE! Last qualify: 521
[Dec 22 14:48:54] NOTICE[1990] chan_sip.c: Peer '126' is now Reachable. (31ms / 2000ms)
[Dec 22 14:49:32] NOTICE[1990] chan_sip.c: Peer '108' is now UNREACHABLE! Last qualify: 20
[Dec 22 14:49:43] NOTICE[1990] chan_sip.c: Peer '109' is now UNREACHABLE! Last qualify: 22
[Dec 22 14:49:43] NOTICE[1990] chan_sip.c: Peer '108' is now Reachable. (26ms / 2000ms)
[Dec 22 14:49:53] NOTICE[1990] chan_sip.c: Peer '109' is now Reachable. (24ms / 2000ms)
[Dec 22 14:51:59] NOTICE[1990] chan_sip.c: Peer '108' is now Lagged. (2192ms / 2000ms)
[Dec 22 14:52:10] NOTICE[1990] chan_sip.c: Peer '134' is now UNREACHABLE! Last qualify: 24
[Dec 22 14:52:10] NOTICE[1990] chan_sip.c: Peer '108' is now Reachable. (22ms / 2000ms)
[Dec 22 14:52:21] NOTICE[1990] chan_sip.c: Peer '111' is now UNREACHABLE! Last qualify: 64
[Dec 22 14:52:21] NOTICE[1990] chan_sip.c: Peer '134' is now Reachable. (53ms / 2000ms)
[Dec 22 14:52:25] NOTICE[1990] chan_sip.c: Peer '126' is now UNREACHABLE! Last qualify: 23
[Dec 22 14:52:31] NOTICE[1990] chan_sip.c: Peer '111' is now Reachable. (25ms / 2000ms)
[Dec 22 14:52:40] NOTICE[1990] chan_sip.c: Peer '126' is now Reachable. (21ms / 2000ms)
[Dec 22 14:52:54] NOTICE[1990] chan_sip.c: Peer '126' is now UNREACHABLE! Last qualify: 21
[Dec 22 14:53:18] NOTICE[1990] chan_sip.c: Peer '126' is now Reachable. (30ms / 2000ms)
[Dec 22 14:54:36] NOTICE[1990] chan_sip.c: Peer '134' is now Lagged. (2467ms / 2000ms)
[Dec 22 14:54:46] NOTICE[1990] chan_sip.c: Peer '134' is now Reachable. (36ms / 2000ms)
[Dec 22 14:55:22] NOTICE[1990] chan_sip.c: Peer '126' is now UNREACHABLE! Last qualify: 22
[Dec 22 14:55:31] NOTICE[1990] chan_sip.c: Peer '126' is now Reachable. (38ms / 2000ms)
[Dec 22 14:57:35] NOTICE[1990] chan_sip.c: Peer '126' is now UNREACHABLE! Last qualify: 34
[Dec 22 14:57:45] NOTICE[1990] chan_sip.c: Peer '126' is now Reachable. (44ms / 2000ms)
[Dec 22 14:59:06] NOTICE[1990] chan_sip.c: Peer '126' is now UNREACHABLE! Last qualify: 39
[Dec 22 14:59:17] NOTICE[1990] chan_sip.c: Peer '126' is now Reachable. (578ms / 2000ms)
[Dec 22 15:02:50] NOTICE[1990] chan_sip.c: Peer '126' is now UNREACHABLE! Last qualify: 40
[Dec 22 15:03:00] NOTICE[1990] chan_sip.c: Peer '126' is now Reachable. (26ms / 2000ms)
[Dec 22 15:04:21] NOTICE[1990] chan_sip.c: Peer '126' is now UNREACHABLE! Last qualify: 42
[Dec 22 15:04:31] NOTICE[1990] chan_sip.c: Peer '126' is now Reachable. (23ms / 2000ms)
[Dec 22 15:08:33] NOTICE[1990] chan_sip.c: Peer '126' is now UNREACHABLE! Last qualify: 79
[Dec 22 15:08:57] NOTICE[1990] chan_sip.c: Peer '126' is now Reachable. (29ms / 2000ms)
[Dec 22 15:11:02] NOTICE[1990] chan_sip.c: Peer '126' is now UNREACHABLE! Last qualify: 23
[Dec 22 15:11:20] NOTICE[1990] chan_sip.c: Peer '126' is now Reachable. (42ms / 2000ms)
[Dec 22 15:12:24] NOTICE[1990] chan_sip.c: Peer '126' is now UNREACHABLE! Last qualify: 42
[Dec 22 15:12:35] NOTICE[1990] chan_sip.c: Peer '126' is now Reachable. (623ms / 2000ms)
[Dec 22 15:14:38] NOTICE[1990] chan_sip.c: Peer '126' is now UNREACHABLE! Last qualify: 26
[Dec 22 15:15:01] NOTICE[1990] chan_sip.c: Peer '126' is now Reachable. (29ms / 2000ms)
[Dec 22 15:18:17] NOTICE[1990] chan_sip.c: Peer '134' is now UNREACHABLE! Last qualify: 789
[Dec 22 15:18:23] NOTICE[1990] chan_sip.c: Peer '126' is now UNREACHABLE! Last qualify: 22
[Dec 22 15:18:34] NOTICE[1990] chan_sip.c: Peer '129' is now Reachable. (36ms / 2000ms)
[Dec 22 15:18:50] NOTICE[1990] chan_sip.c: Peer '109' is now Reachable. (36ms / 2000ms)
[Dec 22 15:18:58] NOTICE[1990] chan_sip.c: Peer '134' is now Reachable. (40ms / 2000ms)
[Dec 22 15:19:10] NOTICE[1990] chan_sip.c: Peer '111' is now Reachable. (37ms / 2000ms)
[Dec 22 15:19:19] NOTICE[1990] chan_sip.c: Peer '106' is now Reachable. (37ms / 2000ms)
[Dec 22 15:19:27] NOTICE[1990] chan_sip.c: Peer '107' is now Reachable. (37ms / 2000ms)
[Dec 22 15:19:40] NOTICE[1990] chan_sip.c: Peer '108' is now Reachable. (33ms / 2000ms)
[Dec 22 15:19:55] NOTICE[1990] chan_sip.c: Peer '126' is now Reachable. (33ms / 2000ms)
[Dec 22 15:20:02] NOTICE[1990] chan_sip.c: Peer '134' is now UNREACHABLE! Last qualify: 40
[Dec 22 15:20:59] NOTICE[1990] chan_sip.c: Peer '126' is now UNREACHABLE! Last qualify: 33
[Dec 22 15:22:28] NOTICE[1990] chan_sip.c: Peer '134' is now Reachable. (430ms / 2000ms)
[Dec 22 15:24:15] NOTICE[1990] chan_sip.c: Peer '126' is now Reachable. (38ms / 2000ms)
Hi bsutton,

1. Can you set sip server port as 0 and copy the sip server address to the outbound server?
-----------------------------------------------------------------------------------
account.1.sip_server_host = $System.getProperty('NJPROVISION_SIP_REGISTRAR')
account.1.sip_server_port = 0
##account.1.sip_listen_random = 0
account.1.sip_listen_port = 5060
account.1.expires = 3600
account.1.outbound_proxy_enable = 1
account.1.outbound_host = $System.getProperty('NJPROVISION_SIP_REGISTRAR')
account.1.outbound_port = 0
-----------------------------------------------------------------------------------
2. If the issue is still on, please supply me syslog level 6, config.bin and pcap.
Before you export the syslog, please set log level as 6, and reboot the phone, then click Start,and reproduce the issue, then click Stop,and export the trace, syslog, config,bin to us.
(About where to export these files, please refer to attached screenshot.)

These three files are very important for us, hope you can kindly understand.
OK, i've not got to the logs yet but some more information.
As mentioned we have rolled back to the older firmware (yealink_T28.2.70.0.140.rom).
The system has been offline over the xmas period (we failed back to the customers old system) so we have only just got the handsets back online.
We are still however seeing some handsets become unreachable.
We currently have asterisk's sip debug options on and I'm seeing some interesting logs which to some extent line up with the handsets going unreachable:
Code:
[Jan  7 00:48:05] NOTICE[1990] chan_sip.c: Peer '110' is now UNREACHABLE!  Last qualify: 18
[Jan  7 00:48:15] NOTICE[1990] chan_sip.c: Peer '110' is now Reachable. (16ms / 2000ms)
[Jan  7 08:01:52] NOTICE[1990] chan_sip.c: Peer '119' is now UNREACHABLE!  Last qualify: 17
[Jan  7 08:01:52] NOTICE[1990] chan_sip.c: Correct auth, but based on stale nonce received from '"119" <sip:119@noojee.pmdomain.local>;tag=1976633730'
[Jan  7 08:01:52] NOTICE[1990] chan_sip.c: Correct auth, but based on stale nonce received from '"104" <sip:104@noojee.pmdomain.local>;tag=1507655074'
[Jan  7 08:01:58] NOTICE[1990] chan_sip.c: Peer '102' is now UNREACHABLE!  Last qualify: 22
[Jan  7 08:01:58] NOTICE[1990] chan_sip.c: Correct auth, but based on stale nonce received from '"102" <sip:102@noojee.pmdomain.local>;tag=1138441911'
[Jan  7 08:01:58] NOTICE[1990] chan_sip.c: Correct auth, but based on stale nonce received from '"102" <sip:102@noojee.pmdomain.local>;tag=1138441911'
[Jan  7 08:02:02] NOTICE[1990] chan_sip.c: Peer '119' is now Reachable. (16ms / 2000ms)
[Jan  7 08:02:08] NOTICE[1990] chan_sip.c: Peer '102' is now Reachable. (19ms / 2000ms)
[Jan  7 08:02:34] NOTICE[1990] chan_sip.c: Peer '134' is now UNREACHABLE!  Last qualify: 18
[Jan  7 08:02:34] NOTICE[1990] chan_sip.c: Correct auth, but based on stale nonce received from '"134" <sip:134@noojee.pmdomain.local>;tag=416868363'
[Jan  7 08:02:34] NOTICE[1990] chan_sip.c: Correct auth, but based on stale nonce received from '"134" <sip:134@noojee.pmdomain.local>;tag=416868363'
[Jan  7 08:02:44] NOTICE[1990] chan_sip.c: Peer '134' is now Reachable. (29ms / 2000ms)

You can see from above we are often seeing a 'nonce' warning after which the handset becomes unreachable.
Having said that its not consistent as you can see that handsets sometimes become unreachable even if we don't see the nonce warning.
OK, we have made a little progress.

As noted when running the yealink_T28.2.70.0.140.rom firmware we are still seeing the occasional handset becoming unregistered for 10 seconds.
With some sip debugging we tracked down the problem.

Basically asterisk sends out an 'OPTION' sip packet to do a qualify about every 2 minutes.
However asterisk also forces the phone to re-register every hour by sending out a 'REGISTER' sip packet.

It appears that if an OPTION packet is sent and then asterisk immediately sends out a REGISTER packet, before the handset has had a chance to reply to the above OPTION packet, then the phone will become un-reachable until the new registration completes.

I'm not certain if this is an asterisk bug or a handset bug.
I should note that the handset does send the 'OK' packet back in response to the OPTION packet just after the 'REGISTER' gets sent.

We are looking at changing the OPTION and REGISTRATION timeouts to prime no.s to stop the two overlapping.

Its clear that the above problem is not the same issue that we had with phones becoming unreachable and lagged when running the T28.2.71.0.165.rom. They would be come unreachable for 1-2 minutes not 10 seconds and we have never seen a 70.0.140 phone become lagged.

The weight of evidence is that there is a bug in the 71.0.165 version of the firmware that is causing lag which unltimately causes the phone to become unregistered.

Over the next week we will move some of the phones back to the 71.0.165 firmware to try and reproduce the problem in a controlled manner.
Hi bsutton´╝î

If the issue reappear in x.71.0.165 again, please help to supply three files for me. And it is better to supply compared files that phones work well in x.70.0.140.
Reference URL's