BCOP presentation at RIPE meeting in Warsaw
Dear BCOP group, I would like to get a short slot in the BCOP TF meeting in Warsaw to present the 00 version of a draft, "IPv6 troubleshooting for helpdesks". The 00 draft can be found in PDF format here: http://go6.si/ipv6-troubleshooting-for-helpdesks/ (find the link at the end of the text). I would encourage you to read the text and comment on whether this is something that could become a best current operational practice. In particular, I'm searching for people who would join the working pack and continue the technical IPv6 part in the IPv6 WG, where we would like to check the document for IPv6 technical validity and soundness. Thank you very much, Jan Zorz
Hi Jan,
finally I've managed to summarize my comments on the doc ;)
0) The document says that the target audience is residential ISPs and enterprise IT helpdesks, while most of the troubleshooting steps apply mainly to residential ISPs, as they imply that there are CPEs, home devices, etc. Shall we create another doc for enterprise networks or just extend this one?
1) You suggest having a local mirror of test-ipv6. While that definitely makes sense, using the local mirror might hide some edge connectivity issues, so it is worth mentioning. We might recommend using *both*.
2) Section 5: you provide detailed instructions on how to run ping, but then just say 'check DNS settings' and even 'configure different servers'. Support engineers who need explanations of ping would need more detailed explanations of how to change DNS settings.
'If IPv4 is working but the page is unavailable': I'd assume that an engineer could not tell whether 'IPv4 is working'. So IMHO we should just say that if the site is unreachable, it should be troubleshot like any other v4-only 'I can not open this page' case. If we believe that troubleshooting such a case is in scope of this document, I'd suggest including traceroute as a troubleshooting step.
3) Helpdesk code section: I'd use different fonts/colors to distinguish between 'fixed' and 'example' fields in the output.
4) Code 112 (v4 + broken IPv6)
- Can we show the process as a flowchart? if-then-else? For example, the doc says 'determine if IPv6 is offered'. I'd add: 'if it is not, the customer either has a misconfigured CPE (which has IPv6 enabled while it should not), or there is another IPv6-enabled device which is used as a router. Check the CPE configuration/state for IPv6 and disable it if it is enabled'.
- I'd rephrase step 2 as something like 'if IPv6 is offered to the customer and you manage their CPE, check if the CPE has an approved firmware version. Upgrade it if it does not'.
- I'd say that the grep (to find the IPv6 address on MacOS) should be 'grep -E "en|inet6"', so that the interface name is visible (to avoid the scenario where addresses are assigned to the wrong interface);
- the IPv6 addresses table is a little bit confusing:
-- it contains the 6to4 case, which should not cause code 112 (the same goes for Teredo);
-- some sections say 'call their router vendor for support'. I believe we should clarify somewhere at the beginning of the document that we assume the support deals with the customer's CPE; if that is not the case, those CPE-related instructions should be either 'escalate' or 'advise the customer to contact their router vendor for support' (which in my reality would never happen.....:)
-- as each section of the table has instructions (and the last row says 'escalate'), it is not clear in which case the engineer should proceed to the next step (checking the home router config).
The section on the home router config check is a little bit confusing. When we just say 'check the configuration of the device', it does not mean much, as we don't specify what we are looking for. Maybe we should say something like:
- 'if the CPE is managed by your support, check if the CPE is configured as per your internal documentation' (we might say somewhere in the doc that in that case it is strongly recommended to have a separate how-to on what should be configured on those CPEs);
- check if the LAN interface has an IPv6 address from the ISP-allocated range;
- check on the user device if it has both the router's link-local (fe80:...) and ISP-assigned addresses in the neighbor discovery cache (provide commands);
- check the routing table on the user's device; see if the default route points to the router's address. If not, check if DHCP and/or RAs are enabled on the home router.
- run an IPv6 traceroute to the isp.test-ipv6 site and see where the traceroute stops.
Code 46 section: I'd suggest running traceroute to see where it stops. If traceroute does not show any issues, escalate. If it shows a problem outside of your network, contact the affected network's NOC.
IMHO a section should be added which explains what kind of information should be collected for an escalation. I'd suggest: IPv4 and IPv6 traceroute; ifconfig output; routing table output; maybe a packet capture for the session which is having issues.
On Thu, Apr 10, 2014 at 12:41 PM, Jan Zorz @ go6.si <jan@go6.si> wrote:
Dear BCOP group,
I would like to get a short slot in the BCOP TF meeting in Warsaw to present the 00 version of a draft, "IPv6 troubleshooting for helpdesks".
The 00 draft can be found in PDF format here:
http://go6.si/ipv6-troubleshooting-for-helpdesks/ (find the link at the end of the text).
I would encourage you to read the text and comment on whether this is something that could become a best current operational practice. In particular, I'm searching for people who would join the working pack and continue the technical IPv6 part in the IPv6 WG, where we would like to check the document for IPv6 technical validity and soundness.
Thank you very much, Jan Zorz
-- SY, Jen Linkova aka Furry
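A minimal sketch of why the suggested grep pattern helps, assuming Python and a made-up, trimmed macOS-style ifconfig listing (the interface names and addresses are illustrative only): it mimics `ifconfig | grep -E "en|inet6"`, so each IPv6 address line stays next to the en* interface header it belongs to. The pattern is loose: it keeps any line containing "en", and the loopback's inet6 line still appears without its lo0 header.

```python
import re

# Same regular expression as the suggested grep -E "en|inet6": keep interface
# header lines containing "en" (e.g. "en0: ...") and every "inet6" line.
PATTERN = re.compile(r"en|inet6")

# Hypothetical, trimmed macOS-style ifconfig output used only for illustration.
SAMPLE = """lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 16384
\tinet6 ::1 prefixlen 128
\tinet 127.0.0.1 netmask 0xff000000
en0: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 1500
\tether 00:11:22:33:44:55
\tinet6 fe80::1%en0 prefixlen 64 scopeid 0x4
\tinet6 2001:db8:1::10 prefixlen 64 autoconf
\tinet 192.0.2.10 netmask 0xffffff00 broadcast 192.0.2.255
"""


def grep_en_inet6(ifconfig_output: str) -> list[str]:
    """Return the lines that grep -E "en|inet6" would print, in order."""
    return [line for line in ifconfig_output.splitlines() if PATTERN.search(line)]


for line in grep_en_inet6(SAMPLE):
    print(line)
```

With a plain `grep inet6` the "en0:" header line would be dropped, so an address sitting on the wrong interface would be hard to spot.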
On 15/05/14 00:57, Jen Linkova wrote:
Hi Jan,
finally I've managed to summarize my comments to the doc ;)
Hi, thank you very much for all your comments. I'm bringing part of this discussion to IPv6 WG mailing list as it is of technical nature and I think it belongs here. I put all your comments in our issues tracker, reachable publicly at https://git.steffann.nl/go6/ipv6-troubleshooting-for-helpdesks/issues
1) You suggest having a local mirror of test-ipv6. While that definitely makes sense, using the local mirror might hide some edge connectivity issues, so it is worth mentioning. We might recommend using *both*.
Yes, absolutely. This diversity would also help the ISP understand whether it has IPv6 issues in parts of the network other than access ;)
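Running the test against both the local mirror and the public test-ipv6 site gives two results, and their combination narrows down where to look. A minimal sketch of that reasoning, assuming Python; the function name and outcome wording are hypothetical, and the "mirror itself may be broken" branch is an assumption not stated in this thread.

```python
def interpret_mirror_results(local_ok: bool, remote_ok: bool) -> str:
    """Combine test results from the ISP's local mirror and a remote test site."""
    if local_ok and remote_ok:
        return "IPv6 works end to end"
    if local_ok:
        # The customer reaches the local mirror but not the outside world:
        # the problem likely lies beyond the access network.
        return "problem beyond the access network (edge/upstream): escalate"
    if remote_ok:
        # Assumption: outside works but the local mirror fails, so suspect the mirror.
        return "local mirror may be broken: check the mirror"
    return "problem in the home or access network: continue troubleshooting"
```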
2) Section 5: you provide detailed instructions on how to run ping, but then just say 'check DNS settings' and even 'configure different servers'. Support engineers who need explanations of ping would need more detailed explanations of how to change DNS settings.
'If IPv4 is working but the page is unavailable': I'd assume that an engineer could not tell whether 'IPv4 is working'. So IMHO we should just say that if the site is unreachable, it should be troubleshot like any other v4-only 'I can not open this page' case. If we believe that troubleshooting such a case is in scope of this document, I'd suggest including traceroute as a troubleshooting step.
Seems like a good idea. Let us think about that ;)
3) Helpdesk code section: I'd use different fonts/color to distinguish between 'fixed' and 'example' fields in the output.
4) Code 112 (v4 + broken IPv6)
- can we show the process as a flowchart? if-then-else?
That's a bit of a hard one... do we have anybody here who specializes in flowcharts?
For example, the doc says 'determine if IPv6 is offered'. I'd add: 'if it is not, the customer either has a misconfigured CPE (which has IPv6 enabled while it should not), or there is another IPv6-enabled device which is used as a router. Check the CPE configuration/state for IPv6 and disable it if it is enabled'.
- I'd rephrase step 2 as something like 'if IPv6 is offered to the customer and you manage their CPE, check if the CPE has an approved firmware version. Upgrade it if it does not'.
- I'd say that the grep (to find the IPv6 address on MacOS) should be 'grep -E "en|inet6"', so that the interface name is visible (to avoid the scenario where addresses are assigned to the wrong interface);
- the IPv6 addresses table is a little bit confusing:
-- it contains the 6to4 case, which should not cause code 112 (the same goes for Teredo);
-- some sections say 'call their router vendor for support'. I believe we should clarify somewhere at the beginning of the document that we assume the support deals with the customer's CPE; if that is not the case, those CPE-related instructions should be either 'escalate' or 'advise the customer to contact their router vendor for support' (which in my reality would never happen.....:)
-- as each section of the table has instructions (and the last row says 'escalate'), it is not clear in which case the engineer should proceed to the next step (checking the home router config).
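To make the address table easier to apply, the helpdesk could first classify each address shown on the customer's device. A minimal sketch assuming Python's standard `ipaddress` module; the well-known ranges are fe80::/10 (link-local), 2002::/16 (6to4), 2001::/32 (Teredo) and fc00::/7 (ULA), while the ISP prefix passed in is a hypothetical placeholder. 6to4 and Teredo addresses can then be reported separately instead of being treated as a code 112 cause.

```python
import ipaddress

# Well-known IPv6 ranges, checked in this order.
WELL_KNOWN = [
    ("link-local", ipaddress.ip_network("fe80::/10")),
    ("6to4", ipaddress.ip_network("2002::/16")),
    ("Teredo", ipaddress.ip_network("2001::/32")),
    ("ULA", ipaddress.ip_network("fc00::/7")),
]


def classify(addr: str, isp_prefixes: tuple[str, ...] = ()) -> str:
    """Return a label for one IPv6 address copied from the customer's device."""
    ip = ipaddress.IPv6Address(addr.split("%")[0])  # drop a zone such as "%en0"
    for label, net in WELL_KNOWN:
        if ip in net:
            return label
    for prefix in isp_prefixes:  # hypothetical ISP-assigned prefixes
        if ip in ipaddress.ip_network(prefix):
            return "ISP-assigned"
    return "other global" if ip.is_global else "other"
```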
The section on the home router config check is a little bit confusing. When we just say 'check the configuration of the device', it does not mean much, as we don't specify what we are looking for. Maybe we should say something like:
- 'if the CPE is managed by your support, check if the CPE is configured as per your internal documentation' (we might say somewhere in the doc that in that case it is strongly recommended to have a separate how-to on what should be configured on those CPEs);
- check if the LAN interface has an IPv6 address from the ISP-allocated range;
- check on the user device if it has both the router's link-local (fe80:...) and ISP-assigned addresses in the neighbor discovery cache (provide commands);
- check the routing table on the user's device; see if the default route points to the router's address. If not, check if DHCP and/or RAs are enabled on the home router.
- run an IPv6 traceroute to the isp.test-ipv6 site and see where the traceroute stops.
I think these are all good suggestions. We'll go through them during our authors' edit cycle meeting, probably during the IETF meeting in Toronto.
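The home router checks listed above could be turned into a short checklist. A minimal sketch, assuming Python and that the helpdesk has already read the router's LAN addresses, the device's neighbor cache and its IPv6 default-route next hop (for example from `ndp -an` and `netstat -rn` on macOS, or `ip -6 neigh show` and `ip -6 route show` on Linux); the function and argument names are hypothetical. One technical note: in IPv6 the default route is learned from Router Advertisements, not from DHCPv6, so a missing default route points at RAs first.

```python
import ipaddress


def check_home_lan(isp_prefix, router_lan_addrs, neighbor_cache, default_gateway):
    """Return a list of findings; an empty list means every check passed.

    isp_prefix       -- prefix assigned to the customer, e.g. "2001:db8:1::/48"
    router_lan_addrs -- IPv6 addresses on the home router's LAN interface
    neighbor_cache   -- IPv6 addresses in the user device's neighbor cache
    default_gateway  -- next hop of the device's IPv6 default route, or None
    """
    def addr(a):
        return ipaddress.IPv6Address(a.split("%")[0])  # strip zone like "%en0"

    findings = []
    prefix = ipaddress.ip_network(isp_prefix)
    link_local = ipaddress.ip_network("fe80::/10")
    router = [addr(a) for a in router_lan_addrs]
    neighbors = {addr(a) for a in neighbor_cache}

    isp_addrs = [a for a in router if a in prefix]
    ll_addrs = [a for a in router if a in link_local]
    if not isp_addrs:
        findings.append("router LAN interface has no address from the ISP range")
    if not any(a in neighbors for a in ll_addrs):
        findings.append("router link-local address missing from neighbor cache")
    if isp_addrs and not any(a in neighbors for a in isp_addrs):
        findings.append("router ISP-assigned address missing from neighbor cache")
    if default_gateway is None:
        # The IPv6 default route comes from Router Advertisements.
        findings.append("no IPv6 default route: check RAs (and DHCPv6, if used)")
    elif addr(default_gateway) not in router:
        findings.append("default route does not point to the home router")
    return findings
```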
Code 46 section: I'd suggest running traceroute to see where it stops. If traceroute does not show any issues, escalate. If it shows a problem outside of your network, contact the affected network's NOC.
ack...
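The code 46 rule suggested above (no visible issue: escalate; stops outside your network: contact that network's NOC) can be written down as a small decision. A minimal sketch, assuming Python, that the hops are already parsed from traceroute/tracert output into a list (None for a hop that did not answer), and that "your network" is given as a list of your own prefixes. The function name, the example prefixes and the "inside your network" branch are assumptions, and the last answering hop only approximates where the path breaks.

```python
import ipaddress
from typing import Optional, Sequence


def interpret_traceroute(hops: Sequence[Optional[str]],
                         own_prefixes: Sequence[str],
                         reached_destination: bool) -> str:
    """Suggest the next helpdesk action from a parsed traceroute."""
    if reached_destination:
        return "no issue visible in traceroute: escalate"
    answered = [h for h in hops if h is not None]
    if not answered:
        return "no hop answered: escalate"
    last = ipaddress.ip_address(answered[-1])
    # Membership tests between IPv4 and IPv6 objects simply return False.
    if any(last in ipaddress.ip_network(p) for p in own_prefixes):
        return f"stops inside your network after {last}: escalate internally"
    return f"stops outside your network after {last}: contact that network's NOC"
```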
IMHO a section should be added which explains what kind of information should be collected for an escalation. I'd suggest: IPv4 and IPv6 traceroute; ifconfig output; routing table output; maybe a packet capture for the session which is having issues.
I think this is asking a bit too much of a first level helpdesk employee... I don't know... Cheers, Jan
On Thu, May 22, 2014 at 11:35 AM, Jan Zorz @ go6.si <jan@go6.si> wrote:
4) Code 112 (v4 + broken IPv6)
- can we show the process as a flowchart? if-then-else?
That's a bit of a hard one... do we have anybody here who specializes in flowcharts?
As soon as we describe the procedure in plain text more clearly, it would be relatively easy to add a flowchart (I could do it). IMHO if we cannot draw a flowchart based on the text, it means that the troubleshooting steps are not defined clearly enough :) So let's concentrate on defining the procedure, and then I'll add a flowchart.
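To illustrate what "defined clearly enough to draw a flowchart" could mean, here is a minimal if-then-else sketch of the code 112 steps suggested earlier in this thread, assuming Python. The field names are hypothetical, and the final "traceroute, then escalate" step and the "escalate" outcomes for an unmanaged CPE are assumptions rather than text from the draft.

```python
from dataclasses import dataclass


@dataclass
class Code112Case:
    """Facts the helpdesk has gathered so far (hypothetical field names)."""
    ipv6_offered: bool           # does the ISP offer IPv6 to this customer?
    cpe_managed: bool            # does the ISP manage the customer's CPE?
    cpe_ipv6_enabled: bool = False
    firmware_approved: bool = True
    lan_checks_ok: bool = True   # home router LAN checks passed


def code_112_next_step(case: Code112Case) -> str:
    if not case.ipv6_offered:
        # IPv6 is visible although not offered: either a misconfigured CPE
        # or another IPv6-enabled device acting as a router.
        if case.cpe_ipv6_enabled:
            if case.cpe_managed:
                return "disable IPv6 on the CPE"
            return "escalate: CPE has IPv6 enabled but is not managed by us"
        return "escalate: look for another IPv6-enabled device acting as a router"
    if case.cpe_managed and not case.firmware_approved:
        return "upgrade the CPE to an approved firmware version"
    if not case.lan_checks_ok:
        return "fix the home router LAN configuration"
    return "run an IPv6 traceroute to the test site, then escalate"
```

Each `return` corresponds to a leaf of the flowchart, so every path ends in a concrete action, even if that action is "escalate".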
I think these are all good suggestions. We'll go through them during our authors' edit cycle meeting, probably during the IETF meeting in Toronto.
I'd be happy to join.
IMHO a section should be added which explains what kind of information should be collected for an escalation. I'd suggest: IPv4 and IPv6 traceroute; ifconfig output; routing table output; maybe a packet capture for the session which is having issues.
I think this is asking a bit too much of a first level helpdesk employee... I don't know...
Packet capture might be tricky as the user machine might not have any sniffer installed. However I believe we could expect even first level support engineer to be able to run the well-defined set of commands (such as traceroute, netstat and ifconfig) - they don't need to understand the output, just provide it while escalating. -- SY, Jen Linkova aka Furry
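A minimal sketch of such a well-defined command set, assuming Python and the standard `subprocess` module. The per-OS command lists are an example to adapt: on many current Linux systems `ip` replaces `ifconfig`/`netstat`, `traceroute -4/-6` depends on the installed traceroute package, and the host name is a placeholder for whichever test site the helpdesk uses. The first-level engineer only has to run it and attach the resulting text to the escalation.

```python
import subprocess

# Example command sets per OS; "{host}" is replaced with the target host.
COMMANDS = {
    "macos": [["ifconfig", "-a"], ["netstat", "-rn"], ["ndp", "-an"],
              ["traceroute", "{host}"], ["traceroute6", "{host}"]],
    "linux": [["ip", "addr", "show"], ["ip", "route", "show"],
              ["ip", "-6", "route", "show"], ["ip", "-6", "neigh", "show"],
              ["traceroute", "-4", "{host}"], ["traceroute", "-6", "{host}"]],
    "windows": [["ipconfig", "/all"], ["route", "print"],
                ["netsh", "interface", "ipv6", "show", "neighbors"],
                ["tracert", "-4", "{host}"], ["tracert", "-6", "{host}"]],
}


def escalation_commands(os_name: str, host: str) -> list[list[str]]:
    """Return the argument lists to run for one OS."""
    return [[arg.replace("{host}", host) for arg in cmd] for cmd in COMMANDS[os_name]]


def collect_report(os_name: str, host: str, timeout: int = 120) -> str:
    """Run every command and concatenate its output into one text report."""
    parts = []
    for cmd in escalation_commands(os_name, host):
        try:
            result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
            output = result.stdout + result.stderr
        except (OSError, subprocess.TimeoutExpired) as exc:
            output = f"could not run: {exc}"  # e.g. tool not installed
        parts.append("$ " + " ".join(cmd) + "\n" + output)
    return "\n".join(parts)
```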
On 05/06/14 10:40, Jen Linkova wrote:
On Thu, May 22, 2014 at 11:35 AM, Jan Zorz @ go6.si <jan@go6.si> wrote:
4) Code 112 (v4 + broken IPv6)
- can we show the process as a flowchart? if-then-else?
That's a bit of a hard one... do we have anybody here who specializes in flowcharts?
As soon as we describe the procedure in plain text more clearly, it would be relatively easy to add a flowchart (I could do it).
So we need to address all the comments first and make the text clearer before moving forward with the flowchart?
IMHO if we cannot draw a flowchart based on the text, it means that the troubleshooting steps are not defined clearly enough :) So let's concentrate on defining the procedure, and then I'll add a flowchart.
thnx!
I think these are all good suggestions. We'll go through them during our authors' edit cycle meeting, probably during the IETF meeting in Toronto.
I'd be happy to join.
ok, count yourself in ;)
IMHO a section should be added which explains what kind of information should be collected for an escalation. I'd suggest: IPv4 and IPv6 traceroute; ifconfig output; routing table output; maybe a packet capture for the session which is having issues.
I think this is asking a bit too much of a first level helpdesk employee... I don't know...
Packet capture might be tricky as the user machine might not have any sniffer installed. However I believe we could expect even first level support engineer to be able to run the well-defined set of commands (such as traceroute, netstat and ifconfig) - they don't need to understand the output, just provide it while escalating.
What do others think here? What is your experience with helpdesks and their ability to perform this stuff? Cheers, Jan
Hi Jan and lists, "Jan Zorz @ go6.si" <jan@go6.si> writes:
On 05/06/14 10:40, Jen Linkova wrote:
I think this is asking a bit too much of a first level helpdesk employee... I don't know...
Packet capture might be tricky as the user machine might not have any sniffer installed. However I believe we could expect even first level support engineer to be able to run the well-defined set of commands (such as traceroute, netstat and ifconfig) - they don't need to understand the output, just provide it while escalating.
What do others think here? What is your experience with helpdesks and their ability to perform this stuff?
that really depends. I've seen first level supporters who wouldn't need this at such a level because they'd know how to do that anyway, and I've seen ones who would read whatever a flowchart says without understanding a single word. But what's worrying me more is if we can actually come up with a one-size-fits-all flowchart that is actually any use to anybody. It might well be that this turns out to be a completely futile exercise, but the only way to find that out is to actually give it a try. Cheers, Benedikt -- Benedikt Stockebrand, Stepladder IT Training+Consulting Dipl.-Inform. http://www.stepladder-it.com/ Business Grade IPv6 --- Consulting, Training, Projects BIVBlog---Benedikt's IT Video Blog: http://www.stepladder-it.com/bivblog/
On Tue, Jun 10, 2014 at 2:11 PM, Benedikt Stockebrand <bs@stepladder-it.com> wrote:
However I believe we could expect even first level support engineer to be able to run the well-defined set of commands (such as traceroute, netstat and ifconfig) - they don't need to understand the output, just provide it while escalating.
What do others think here? What is your experience with helpdesks and their ability to perform this stuff?
that really depends. I've seen first level supporters who wouldn't need this at such a level because they'd know how to do that anyway
My understanding is that they are not the target audience of this document ;)
and I've seen ones who would read whatever a flowchart says without understanding a single word.
That's why I'd like to make sure that the troubleshooting procedure in this document could be presented as a flowchart and that all possible scenarios are covered (even if only as 'then escalate'), so first-level support always knows what to do. Re: collecting additional information: my point here is that when X-level support is escalating to X+1 level, it's always a good idea to have a well-defined list of what information should be collected and provided with an escalation request (like attaching 'show tech' to a Cisco TAC request :)) It does not mean that the escalating engineer has to understand every single word in the data they are collecting, and it does not mean that the collected information would necessarily contain everything needed to solve the case. The goal is to cover some common cases (and if somebody complained to me about poor v[46] connectivity from their workstation, I personally would ask for ifconfig, netstat and traceroute before doing anything else ;)
But what's worrying me more is if we can actually come up with a one-size-fits-all flowchart that is actually any use to anybody.
I'm sure we cannot do a 'one-size-fits-all' thing, but we could provide a kind of template which could be customized.
It might well be that this turns out to be a completely futile exercise, but the only way to find that out is to actually give it a try.
I agree. If I remember correctly, Ragnar mentioned during v6 WG session that his helpdesk was pretty happy with this document. So I believe we should get the draft to a slightly better state and let people try it in their networks. -- SY, Jen Linkova aka Furry
Hi Jen and list, Jen Linkova <furry13@gmail.com> writes:
that really depends. I've seen first level supporters who wouldn't need this at such a level because they'd know how to do that anyway
My understanding is that they are not the target audience of this document ;)
right---but they might still feel offended because they think we think they are idiots:-)
But what's worrying me more is if we can actually come up with a one-size-fits-all flowchart that is actually any use to anybody.
I'm sure we cannot do a 'one-size-fits-all' thing, but we could provide a kind of template which could be customized.
Fair enough, only then we should make absolutely sure that this is clearly stated in a prominent way.
It might well be that this turns out to be a completely futile exercise, but the only way to find that out is to actually give it a try.
I agree. If I remember correctly, Ragnar mentioned during v6 WG session that his helpdesk was pretty happy with this document. So I believe we should get the draft to a slightly better state and let people try it in their networks.
Well, forget about Ragnar's crowd. They are actually doing Gigabit fiber to people's homes in some remote corner of Norway, while Deutsche Telekom and their resellers only get me 10 Mbit/s DSL downstream in downtown Frankfurt; I consider that a pretty strong indication that their first level support is about two orders of magnitude better than the international average as well:-) Cheers, Benedikt -- Benedikt Stockebrand, Stepladder IT Training+Consulting Dipl.-Inform. http://www.stepladder-it.com/ Business Grade IPv6 --- Consulting, Training, Projects BIVBlog---Benedikt's IT Video Blog: http://www.stepladder-it.com/bivblog/
On 15/05/14 00:57, Jen Linkova wrote:
Hi Jan,
finally I've managed to summarize my comments to the doc ;)
0) The document says that the target audience is residential ISPs and enterprise IT helpdesks, while most of the troubleshooting steps apply mainly to residential ISPs, as they imply that there are CPEs, home devices, etc. Shall we create another doc for enterprise networks or just extend this one?
Hi, This is a discussion for BCOP crowd... Suggestions? Cheers, Jan
participants (3): Benedikt Stockebrand, Jan Zorz @ go6.si, Jen Linkova