
High CPU Interrupt Utilization on a Catalyst 3850 Switch Caused by ARP

  Step 1: Identify the process that is consuming the CPU

  Catalyst 3850 Series switches use a four-core CPU, so the show processes cpu command reports utilization for each core:

  3850-2#show processes cpu sort | exclude 0.0

  Core 0: CPU utilization for five seconds: 53%; one minute: 39%; five minutes: 41%

  Core 1: CPU utilization for five seconds: 43%; one minute: 57%; five minutes: 54%

  Core 2: CPU utilization for five seconds: 95%; one minute: 60%; five minutes: 58%

  Core 3: CPU utilization for five seconds: 32%; one minute: 31%; five minutes: 29%

  PID Runtime(ms) Invoked uSecs 5Sec 1Min 5Min TTY Process

  8525 472560 2345554 7525 31.37 30.84 30.83 0 iosd

  5661 2157452 9234031 698 13.17 12.56 12.54 1088 fed

  6206 19630 74895 262 1.83 0.43 0.10 0 eicored

  6197 725760 11967089 60 1.41 1.38 1.47 0 pdsd

  The output shows that the iosd and fed processes are consuming most of the CPU.
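  When the output is collected repeatedly, it can help to rank the processes programmatically. The following is a minimal Python sketch, assuming the column layout shown above (PID, Runtime, Invoked, uSecs, 5Sec, 1Min, 5Min, TTY, Process). The function name top_processes and the 10% threshold are illustrative choices, not part of any Cisco tooling.

```python
import re

# Sample rows copied from the show processes cpu output above.
SAMPLE = """\
PID Runtime(ms) Invoked uSecs 5Sec 1Min 5Min TTY Process
8525 472560 2345554 7525 31.37 30.84 30.83 0 iosd
5661 2157452 9234031 698 13.17 12.56 12.54 1088 fed
6206 19630 74895 262 1.83 0.43 0.10 0 eicored
6197 725760 11967089 60 1.41 1.38 1.47 0 pdsd
"""

# One process row: four integers, three percentages, TTY, then the name.
ROW = re.compile(
    r"^\s*(?P<pid>\d+)\s+(?P<runtime>\d+)\s+(?P<invoked>\d+)\s+(?P<usecs>\d+)\s+"
    r"(?P<five_sec>\d+\.\d+)\s+(?P<one_min>\d+\.\d+)\s+(?P<five_min>\d+\.\d+)\s+"
    r"(?P<tty>\d+)\s+(?P<name>\S.*?)\s*$"
)


def top_processes(text, threshold=10.0):
    """Return (name, 5-second CPU %) pairs at or above threshold, busiest first."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:  # header and "Core N:" lines do not match and are skipped
            rows.append((m["name"], float(m["five_sec"])))
    rows.sort(key=lambda r: r[1], reverse=True)
    return [r for r in rows if r[1] >= threshold]


if __name__ == "__main__":
    for name, pct in top_processes(SAMPLE):
        print(f"{name}: {pct}%")
```

  With the sample rows, only iosd and fed pass the threshold, which matches the reading of the output above.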

  *Forwarding Engine Driver (FED): This is the heart of the Cisco Catalyst 3850 Series Switch and is responsible for all hardware programming/forwarding.

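  When high CPU utilization is caused by interrupts, the IOSd and FED processes consume most of the CPU, and the following FED sub-processes account for the usage: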

  FED Punject TX

  FED Punject RX

  FED Punject replenish

  FED Punject TX complete

  Use the show processes cpu detailed command to get more detail about a process.

  In this case, look at both processes:

  3850-2#show processes cpu detailed process iosd sort | ex 0.0

  Core 0: CPU utilization for five seconds: 36%; one minute: 39%; five minutes: 40%

  Core 1: CPU utilization for five seconds: 73%; one minute: 52%; five minutes: 53%

  Core 2: CPU utilization for five seconds: 22%; one minute: 56%; five minutes: 58%

  Core 3: CPU utilization for five seconds: 46%; one minute: 40%; five minutes: 31%

  PID T C TID Runtime(ms)Invoked uSecs 5Sec 1Min 5Min TTY Process

  (%) (%) (%)

  8525 L 556160 2356540 7526 30.42 30.77 30.83 0 iosd

  8525 L 1 8525 712558 284117 0 23.14 23.33 23.38 0 iosd

  59 I 1115452 4168181 0 42.22 39.55 39.33 0 ARP Snoop

  198 I 3442960 4168186 0 25.33 24.22 24.77 0 IP Host Track Proce

  30 I 3802130 4168183 0 24.66 27.88 27.66 0 ARP Input

  283 I 574800 3225649 0 4.33 4.00 4.11 0 DAI Packet Process

  3850-2#show processes cpu detailed process fed sorted | ex 0.0

  Core 0: CPU utilization for five seconds: 45%; one minute: 44%; five minutes: 44%

  Core 1: CPU utilization for five seconds: 38%; one minute: 44%; five minutes: 45%

  Core 2: CPU utilization for five seconds: 42%; one minute: 41%; five minutes: 40%

  Core 3: CPU utilization for five seconds: 32%; one minute: 30%; five minutes: 31%

  PID T C TID Runtime(ms)Invoked uSecs 5Sec 1Min 5Min TTY Process

  (%) (%) (%)

  5638 L 612840 1143306 536 13.22 12.90 12.93 1088 fed

  5638 L 3 8998 396500 602433 0 9.87 9.63 9.61 0 PunjectTx

  5638 L 3 8997 159890 66051 0 2.70 2.70 2.74 0 PunjectRx

  In the IOSd output, ARP Snoop, IP Host Track Process, and ARP Input are high. This pattern is common when ARP packets are driving the CPU interrupts.
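  The detailed output mixes rows of different widths and process names that contain spaces, so it is easier to parse from the right-hand side. The sketch below is one way to do that in Python. It assumes that rows with type I are the threads running inside iosd, as the sample suggests; the function name ios_threads is hypothetical.

```python
import re

# Rows copied from the show processes cpu detailed process iosd output above.
SAMPLE = """\
8525 L 556160 2356540 7526 30.42 30.77 30.83 0 iosd
8525 L 1 8525 712558 284117 0 23.14 23.33 23.38 0 iosd
59 I 1115452 4168181 0 42.22 39.55 39.33 0 ARP Snoop
198 I 3442960 4168186 0 25.33 24.22 24.77 0 IP Host Track Proce
30 I 3802130 4168183 0 24.66 27.88 27.66 0 ARP Input
283 I 574800 3225649 0 4.33 4.00 4.11 0 DAI Packet Process
"""

# Anchor on the three percentage columns, the TTY column, and the name at the
# end of the row; the number of integer columns in the middle varies.
ROW = re.compile(
    r"^\s*(?P<pid>\d+)\s+(?P<type>[A-Z])\s+.*?"
    r"(?P<five_sec>\d+\.\d+)\s+(?P<one_min>\d+\.\d+)\s+(?P<five_min>\d+\.\d+)\s+"
    r"(?P<tty>\d+)\s+(?P<name>\S.*?)\s*$"
)


def ios_threads(text):
    """Return (name, 5-second CPU %) for type-I rows, busiest first."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m and m["type"] == "I":
            rows.append((m["name"], float(m["five_sec"])))
    return sorted(rows, key=lambda r: r[1], reverse=True)


if __name__ == "__main__":
    for name, pct in ios_threads(SAMPLE):
        print(f"{pct:6.2f}  {name}")
```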


  Step 2: Determine the CPU queue that causes the high CPU utilization

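  The Catalyst 3850 Series switch provides separate queues for different packet types (the FED maintains 32 RX CPU queues, and these queues go directly to the CPU).

  Monitoring these queues shows which packets are punted to the CPU and which are processed by the IOSd process. The queues are maintained per Port-ASIC. For example, on a switch with two Port-ASICs, interfaces 1 through 24 belong to Port-ASIC 0.

  Use the show platform punt statistics port-asic cpuq direction command to view them, supplying the Port-ASIC number, the queue number, and the direction.

  In show platform punt statistics port-asic 0 cpuq -1 direction rx, the -1 argument stands for all queues, so the command lists every receive queue on Port-ASIC 0.

  Next, determine which queue is receiving a large number of packets at a high rate. In this case, queue 16, named CPU_Q_PROTO_SNOOPING, stands out.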

  RX (ASIC2CPU) Stats (asic 0 qn 16 lqn 16):

  RXQ 16: CPU_Q_PROTO_SNOOPING

  ----------------------------------------

  Packets received from ASIC : 79099152

  Send to IOSd total attempts : 79099152

  Send to IOSd failed count : 1240331

  RX suspend count : 1240331

  RX unsuspend count : 1240330

  RX unsuspend send count : 1240330

  RX unsuspend send failed count : 0

  RX dropped count : 0

  RX conversion failure dropped : 0

  RX pkt_hdr allocation failure : 0

  RX INTACK count : 0

  RX packets dq'd after intack : 0

  Active RxQ event : 9906280

  RX spurious interrupt : 0
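  A single snapshot of these counters shows totals, not a rate. The Python sketch below parses the "label : value" lines into a dictionary, computes the share of send attempts to IOSd that failed, and derives a packets-per-second figure from two snapshots. The counter labels come from the output above; the function names and the two-snapshot numbers in the usage example are hypothetical.

```python
# A subset of the queue 16 counters shown above.
SAMPLE = """\
RXQ 16: CPU_Q_PROTO_SNOOPING
Packets received from ASIC : 79099152
Send to IOSd total attempts : 79099152
Send to IOSd failed count : 1240331
RX suspend count : 1240331
RX unsuspend count : 1240330
RX dropped count : 0
"""


def parse_counters(text):
    """Map each counter label to its integer value; non-numeric lines are skipped."""
    counters = {}
    for line in text.splitlines():
        key, sep, value = line.rpartition(":")
        if sep and value.strip().isdigit():
            counters[key.strip()] = int(value)
    return counters


def send_failure_ratio(counters):
    """Fraction of send attempts to IOSd that failed."""
    attempts = counters["Send to IOSd total attempts"]
    return counters["Send to IOSd failed count"] / attempts if attempts else 0.0


def punt_rate(before, after, interval_s):
    """Packets per second between two snapshots taken interval_s seconds apart."""
    return (after - before) / interval_s


if __name__ == "__main__":
    c = parse_counters(SAMPLE)
    print(f"failed sends: {send_failure_ratio(c):.2%}")
    # Hypothetical second snapshot taken 10 seconds later.
    later = c["Packets received from ASIC"] + 30000
    print(punt_rate(c["Packets received from ASIC"], later, 10), "pps")
```

  Comparing the rate across all 32 queues is what singles out a queue such as CPU_Q_PROTO_SNOOPING; the absolute counter alone can be large simply because the switch has been up for a long time.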

  Another way to find the queue is the show platform punt client command:

  3850-2#show platform punt client

  tag buffer jumbo fallback packets received failures

  65559 0/ 16/1600 0/4 0/0 0 0 0 0 0

  65560 0/ 16/1600 0/4 0/0 0 0 0 0 0

  s65561 421/ 512/1600 0/0 0/128 79565859 131644697 478984244 0 37467

  65563 0/ 512/1600 0/16 0/256 0 0 0 0 0

  65564 0/ 512/1600 0/16 0/256 0 0 0 0 0

  Find the tag on the row with the most packets. In this case, the tag is 65561.
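  Picking out the busiest row can be sketched in Python as follows. The sample rows are modeled on the output above. Two assumptions are made: the first integer after the slash-separated buffer columns is treated as the packet count, and any letter prefixed to the tag is kept as a flag rather than discarded. The function name busiest_tag is hypothetical.

```python
import re

# Rows modeled on the show platform punt client output above.
SAMPLE = """\
tag buffer jumbo fallback packets received failures
65559 0/ 16/1600 0/4 0/0 0 0 0 0 0
65560 0/ 16/1600 0/4 0/0 0 0 0 0 0
s65561 421/ 512/1600 0/0 0/128 79565859 131644697 478984244 0 37467
65563 0/ 512/1600 0/16 0/256 0 0 0 0 0
"""

# A tag is a number, optionally prefixed by a letter flag.
TAG = re.compile(r"^([A-Za-z]*)(\d+)$")


def busiest_tag(text):
    """Return (tag, flagged, packets) for the row with the largest packet count."""
    best = None
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        m = TAG.match(tokens[0])
        if not m:  # header line
            continue
        # Collect the run of plain integers at the end of the row.
        counters = []
        for tok in reversed(tokens[1:]):
            if not tok.isdigit():
                break
            counters.append(int(tok))
        counters.reverse()
        if not counters:
            continue
        row = (int(m.group(2)), bool(m.group(1)), counters[0])
        if best is None or row[2] > best[2]:
            best = row
    return best


if __name__ == "__main__":
    tag, flagged, packets = busiest_tag(SAMPLE)
    print(tag, "flagged" if flagged else "not flagged", packets)
```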

  *Handle: A handle can be thought of as a pointer. It is a means to discover more detailed information about specific variables that are used in the outputs that the box produces. This is similar to the concept of Local Target Logic (LTL) indices on the Cisco Catalyst 6500 Series Switch.

  *Packet Delivery System (PDS): This is the architecture and process of how packets are delivered to and from various subsystems. As an example, it controls how packets are delivered from the FED to the IOSd and vice versa.

  Enter the following command. The output shows that the queue name is Punt Rx Proto Snoop.

  3850-2#show pds tag all | in Active|Tags|65561

  Active Client Client

  Tags HandleName TDA SDA FDA TBufD TBytD

  65561 7296672 Punt Rx ProtoSnoop 79821397 79821397 0 79821397 494316524

  The s in front of 65561 means that the FED handle is suspended and overwhelmed by the volume of incoming packets. If the s does not disappear, the queue is permanently stuck.

  Step 3: Dump the packet sent to the CPU

  The show pds tag all output shows that the handle is 7296672.
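  The tag-to-handle lookup can be sketched in Python. This assumes the row layout of the show pds tag all output (tag, handle, a client name that may contain spaces, then five counters). The function name handles_by_tag is hypothetical; the command string it builds is the one used in this step.

```python
import re

# Rows copied from the show pds tag all output.
SAMPLE = """\
Tags HandleName TDA SDA FDA TBufD TBytD
65561 7296672 Punt Rx ProtoSnoop 79821397 79821397 0 79821397 494316524
"""

# tag, handle, client name (may contain spaces), then five integer counters.
ROW = re.compile(
    r"^\s*(\d+)\s+(\d+)\s+(.+?)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s*$"
)


def handles_by_tag(text):
    """Map each tag to a (handle, client name) pair."""
    result = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            result[int(m.group(1))] = (int(m.group(2)), m.group(3))
    return result


if __name__ == "__main__":
    handle, name = handles_by_tag(SAMPLE)[65561]
    print(name)
    print(f"show pds client {handle} packet last sink")
```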

  Before you use the show pds client packet last sink command, enable debug pds pktbuf-last. Otherwise, you see this output:

  3850-2#show pds client 7296672 packet last sink

  % switch-2:pdsd:This command works in debug mode only. Enable debug using

  "debug pds pktbuf-last" command

  With debug pds pktbuf-last enabled, you see output like this:

  3850-2#show pds client 7296672 packet last sink

  Dumping Packet(54528) # 0 of Length 60

  -----------------------------------------

  Meta-data

  0000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................

  0010 00 00 16 1d 00 00 00 00 00 00 00 00 55 5a 57 f0 ............UZW.

  0020 00 00 00 00 fd 01 10 df 00 5b 70 00 00 10 43 00 .........[p...C.

  0030 00 10 43 00 00 41 fd 00 00 41 fd 00 00 00 00 00 ..C..A...A......

  0040 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................

  0050 00 00 00 3c 00 00 00 00 00 01 00 19 00 00 00 00 ...<............

  0060 01 01 b6 80 00 00 00 4f 00 00 00 00 00 00 00 00 .......O........

  0070 01 04 d8 80 00 00 00 33 00 00 00 00 00 00 00 00 .......3........

  0080 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................

  0090 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................

  00a0 00 00 00 00 00 00 00 02 00 00 00 00 00 00 00 00 ................

  Data

  0000 ff ff ff ff ff ff aa bb cc dd 00 00 08 06 00 01 ................

  0010 08 00 06 04 00 01 aa bb cc dd 00 00 c0 a8 01 0a ................

  0020 ff ff ff ff ff ff c0 a8 01 14 00 01 02 03 04 05 ................

  0030 06 07 08 09 0a 0b 0c 0d 0e 0f 10 11 ............

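  The Meta-data is used internally by the system, and the Data section is the actual packet. The following command takes its IIF-ID from the first 8 bytes (16 hexadecimal digits) at Meta-data offset 0070.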
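  Reading the IIF-ID out of the dump by hand is error-prone, so here is a minimal Python sketch of the extraction. It assumes the dump layout shown above (a four-digit hexadecimal offset followed by up to 16 bytes) and that only the Meta-data lines are passed in, because the Data section restarts at offset 0000. The function names are hypothetical.

```python
import re

# Two Meta-data lines copied from the packet dump above.
SAMPLE = """\
0060 01 01 b6 80 00 00 00 4f 00 00 00 00 00 00 00 00 .......O........
0070 01 04 d8 80 00 00 00 33 00 00 00 00 00 00 00 00 .......3........
"""

# Offset, then up to 16 two-digit hex bytes; the ASCII column is ignored.
LINE = re.compile(r"^\s*([0-9a-fA-F]{4})((?:\s+[0-9a-fA-F]{2}\b){1,16})")


def parse_hexdump(text):
    """Map absolute byte offset to byte value for every dump line in text."""
    buf = {}
    for line in text.splitlines():
        m = LINE.match(line)
        if not m:
            continue
        offset = int(m.group(1), 16)
        for i, value in enumerate(bytes.fromhex(m.group(2))):
            buf[offset + i] = value
    return buf


def iif_id(buf, offset=0x70):
    """Format the 8 bytes at offset as a 64-bit big-endian hexadecimal IIF-ID."""
    raw = bytes(buf[offset + i] for i in range(8))
    return f"0x{int.from_bytes(raw, 'big'):016x}"


if __name__ == "__main__":
    meta = parse_hexdump(SAMPLE)
    print("show platform port-asic ifm iif-id", iif_id(meta))
```

  The same helper applied at offset 0x60 yields 0x0101b6800000004f, which matches the L3 IIF-id that appears in the FED trace in Step 4.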

  * IIF=Interface ID Factory

  3850-2#show platform port-asic ifm iif-id 0x0104d88000000033

  Interface Table

  Interface IIF-ID : 0x0104d88000000033

  Interface Name : Gi2/0/20

  Interface Block Pointer : 0x514d2f70

  Interface State : READY

  Interface Stauts : IFM-ADD-RCVD, FFM-ADD-RCVD

  Interface Ref-Cnt : 6

  Interface Epoch : 0

  Interface Type : ETHER

  Port Type : SWITCH PORT

  Port Location : LOCAL

  Slot : 2

  Unit : 20

  Slot Unit : 20

  Acitve : Y

  SNMP IF Index : 22

  GPN : 84

  EC Channel : 0

  EC Index : 0

  ASIC : 0

  ASIC Port : 14

  Port LE Handle : 0x514cd990

  Non Zero Feature Ref Counts

  FID : 48(AL_FID_L2_PM), Ref Count : 1

  FID : 77(AL_FID_STATS), Ref Count : 1

  FID : 51(AL_FID_L2_MATM), Ref Count : 1

  FID : 13(AL_FID_SC), Ref Count : 1

  FID : 26(AL_FID_QOS), Ref Count : 1

  Sub block information

  FID : 48(AL_FID_L2_PM), Private Data : 0x54072618

  FID : 26(AL_FID_QOS), Private Data : 0x514d31b8

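  The output shows that the ARP packets come from interface Gi2/0/20. Shutting down this interface at this point would resolve the problem and bring CPU utilization back down.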

  Step 4: Use FED tracing

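  The drawback of the method in Step 3 is that it dumps only the last packet in the queue, which may not be the packet that is actually driving the CPU high.

  A better troubleshooting method is FED tracing, which captures the packets that the FED punts to the CPU.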

  =====

  Procedure:

  =====

  1/ Enable detail tracing so that packets can be captured.

  3850-2#set trace control fed-punject-detail enable

  2/ Adjust the capture buffer. Resize the detail tracing buffer as needed. The default buffer size is 32768.

  3850-2#show mgmt-infra trace settings fed-punject-detail

  One shot Trace Settings:

  Buffer Name: fed-punject-detail

  Default Size: 32768

  Current Size: 32768

  Traces Dropped due to internal error: No

  Total Entries Written: 0

  One shot mode: No

  One shot and full: No

  Disabled: False

  Use the following command to change the buffer size:

  3850-2#set trace control fed-punject-detail buffer-size

  3850-2#set trace control fed-punject-detail buffer-size ?

  <8192-67108864> The new desired buffer size, in bytes

  default Reset trace buffer size to default

  3/ Add capture filters. You can add several filters and combine them with match_all or match_any.

  3850-2#set trace fed-punject-detail direction rx filter_add

  3850-2#set trace fed-punject-detail direction rx filter_add ?

  cpu-queue rxq 0..31

  field field

  offset offset

  3850-2#set trace fed-punject-detail direction rx ?

  Clear_all Clear all debug configured

  Dump_all Dump all debug info: on

  Dump_all_off Dump all debug info: off

  Filter_add Add debug filter condition

  Filter_clear Clear a debug filter configured

  Filter_disable Disable configured filter condition(s)

  Filter_enable Enable configured filter condition(s)

  Match_all Match all configured filter conditions

  Match_any Match any configured filter condition

  Show_all Show all debug configured

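  In Step 2, queue 16 was seen punting a large number of packets to the CPU, so trace this queue to see which packets are sent to the CPU.

  Use the following command to select the CPU queue to trace: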

  3850-2#set trace fed-punject-detail direction rx filter_add cpu-queue

  For example:

  3850-2#set trace fed-punject-detail direction rx filter_add cpu-queue 16 16

  Use match_all or match_any to combine the filters you have defined:

  3850-2#set trace fed-punject-detail direction rx match_all

  3850-2#set trace fed-punject-detail direction rx filter_enable
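  If the same trace setup has to be repeated for different queues, the command sequence can be generated rather than typed. The Python sketch below only assembles the commands already shown in this procedure. The function name fed_trace_commands is hypothetical, and repeating the queue number twice follows the example above; reading the two arguments as the start and end of a queue range is an assumption.

```python
def fed_trace_commands(queue, match="match_all"):
    """Return the CLI lines that enable FED detail tracing for one RX CPU queue."""
    if not 0 <= queue <= 31:  # the filter_add help text lists rxq 0..31
        raise ValueError("queue must be in the range 0..31")
    if match not in ("match_all", "match_any"):
        raise ValueError("match must be 'match_all' or 'match_any'")
    base = "set trace fed-punject-detail direction rx"
    return [
        "set trace control fed-punject-detail enable",
        # Assumption: the two numbers are the first and last queue to trace.
        f"{base} filter_add cpu-queue {queue} {queue}",
        f"{base} {match}",
        f"{base} filter_enable",
    ]


if __name__ == "__main__":
    for line in fed_trace_commands(16):
        print(line)
```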

  4/ Display the captured packets.

  Use the show mgmt-infra trace messages fed-punject-detail command to view the captured packets.

  3850-2#show mgmt-infra trace messages fed-punject-detail

  [11/25/13 07:05:53.814 UTC 2eb0c9 5661]

  00 00 00 00 00 4e 00 40 07 00 02 08 00 00 51 3b

  00 00 00 00 00 01 00 00 03 00 00 00 00 00 00 01

  00 00 00 00 20 00 00 0e 00 00 00 00 00 01 00 74

  00 00 00 04 00 54 41 02 00 00 00 00 00 00 00 00

  [11/25/13 07:05:53.814 UTC 2eb0ca 5661]

  ff ff ff ff ff ff aa bb cc dd 00 00 08 06 00 01

  08 00 06 04 00 01 aa bb cc dd 00 00 c0 a8 01 0a

  ff ff ff ff ff ff c0 a8 01 14 00 01 02 03 04 05

  06 07 08 09 0a 0b 0c 0d 0e 0f 10 11 f6 b9 10 32

  [11/25/13 07:05:53.814 UTC 2eb0cb 5661] Frame descriptors:

  [11/25/13 07:05:53.814 UTC 2eb0cc 5661]

  =========

  fdFormat=0x4 systemTtl=0xe

  loadBalHash1=0x8 loadBalHash2=0x8

  spanSessionMap=0x0 forwardingMode=0x0

  destModIndex=0x0 skipIdIndex=0x4

  srcGpn=0x54 qosLabel=0x41

  srcCos=0x0 ingressTranslatedVlan=0x3

  bpdu=0x0 spanHistory=0x0

  sgt=0x0 fpeFirstHeaderType=0x0

  srcVlan=0x1 rcpServiceId=0x2

  wccpSkip=0x0 srcPortLeIndex=0xe

  cryptoProtocol=0x0 debugTagId=0x0

  vrfId=0x0 saIndex=0x0

  pendingAfdLabel=0x0 destClient=0x1

  appId=0x0 finalStationIndex=0x74

  decryptSuccess=0x0 encryptSuccess=0x0

  rcpMiscResults=0x0 stackedFdPresent=0x0

  spanDirection=0x0 egressRedirect=0x0

  redirectIndex=0x0 exceptionLabel=0x0

  destGpn=0x0 inlineFd=0x0

  suppressRefPtrUpdate=0x0 suppressRewriteSideEfects=0x0

  cmi2=0x0 currentRi=0x1

  currentDi=0x513b dropIpUnreachable=0x0

  srcZoneId=0x0 srcAsicId=0x0

  originalDi=0x0 originalRi=0x0

  srcL3IfIndex=0x2 dstL3IfIndex=0x0

  dstVlan=0x0 frameLength=0x40

  fdCrc=0x7 tunnelSpokeId=0x0

  =========

  [11/25/13 07:05:53.814 UTC 2eb0cd 5661]

  [11/25/13 07:05:53.814 UTC 2eb0ce 5661] PUNT PATH (fed_punject_rx_process_packet:

  830):RX: Q: 16, Tag: 65561

  [11/25/13 07:05:53.814 UTC 2eb0cf 5661] PUNT PATH (fed_punject_get_physical_iif:

  579):RX: Physical IIF-id 0x104d88000000033

  [11/25/13 07:05:53.814 UTC 2eb0d0 5661] PUNT PATH (fed_punject_get_src_l3if_index:

  434):RX: L3 IIF-id 0x101b6800000004f

  [11/25/13 07:05:53.814 UTC 2eb0d1 5661] PUNT PATH (fed_punject_fd_2_pds_md:478):

  RX: l2_logical_if = 0x0

  [11/25/13 07:05:53.814 UTC 2eb0d2 5661] PUNT PATH (fed_punject_get_source_cos:638):

  RX: Source Cos 0

  [11/25/13 07:05:53.814 UTC 2eb0d3 5661] PUNT PATH (fed_punject_get_vrf_id:653):

  RX: VRF-id 0

  [11/25/13 07:05:53.814 UTC 2eb0d4 5661] PUNT PATH (fed_punject_get_src_zoneid:667):

  RX: Zone-id 0

  [11/25/13 07:05:53.814 UTC 2eb0d5 5661] PUNT PATH (fed_punject_fd_2_pds_md:518):

  RX: get_src_zoneid failed

  [11/25/13 07:05:53.814 UTC 2eb0d6 5661] PUNT PATH (fed_punject_get_acl_log_direction:

  695): RX: : Invalid CMI2

  [11/25/13 07:05:53.814 UTC 2eb0d7 5661] PUNT PATH (fed_punject_fd_2_pds_md:541):RX:

  get_acl_log_direction failed

  [11/25/13 07:05:53.814 UTC 2eb0d8 5661] PUNT PATH (fed_punject_get_acl_full_direction:

  724):RX: DI 0x513b ACL Full Direction 1

  [11/25/13 07:05:53.814 UTC 2eb0d9 5661] PUNT PATH (fed_punject_get_source_sgt:446):

  RX: Source SGT 0

  [11/25/13 07:05:53.814 UTC 2eb0da 5661] PUNT PATH (fed_punject_get_first_header_type:680):

  RX: FirstHeaderType 0

  [11/25/13 07:05:53.814 UTC 2eb0db 5661] PUNT PATH (fed_punject_rx_process_packet:916):

  RX: fed_punject_pds_send packet 0x1f00 to IOSd with tag 65561

  [11/25/13 07:05:53.814 UTC 2eb0dc 5661] PUNT PATH (fed_punject_rx_process_packet:744):

  RX: **** RX packet 0x2360 on qn 16, len 128 ****

  [11/25/13 07:05:53.814 UTC 2eb0dd 5661]

  buf_no 0 buf_len 128

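  This output provides enough information to tell where the packets come from and what they contain.

  ff ff ff ff ff ff - destination MAC address

  aa bb cc dd 00 00 - source MAC address

  The source MAC address can now be used to find the corresponding interface.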
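  The remaining bytes of the captured frame follow the standard Ethernet and ARP layouts, so the whole packet can be decoded, not only the two MAC addresses. The Python sketch below decodes the first 42 bytes of the dump (14-byte Ethernet header plus 28-byte ARP payload for IPv4 over Ethernet); the bytes after that are treated as padding. The function names are hypothetical.

```python
import ipaddress
import struct

# First 42 bytes of the captured frame from the trace above.
FRAME = bytes.fromhex(
    "ff ff ff ff ff ff aa bb cc dd 00 00 08 06 00 01 "
    "08 00 06 04 00 01 aa bb cc dd 00 00 c0 a8 01 0a "
    "ff ff ff ff ff ff c0 a8 01 14"
)


def mac(raw):
    """Format 6 raw bytes as a colon-separated MAC address."""
    return ":".join(f"{b:02x}" for b in raw)


def decode_arp(frame):
    """Decode an Ethernet frame that carries an IPv4-over-Ethernet ARP packet."""
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])
    if ethertype != 0x0806:
        raise ValueError("not an ARP frame")
    # htype, ptype, hlen, plen, oper, then sender and target addresses.
    _htype, _ptype, _hlen, _plen, oper, sha, spa, tha, tpa = struct.unpack(
        "!HHBBH6s4s6s4s", frame[14:42]
    )
    return {
        "dst_mac": mac(dst),
        "src_mac": mac(src),
        "operation": {1: "request", 2: "reply"}.get(oper, str(oper)),
        "sender_mac": mac(sha),
        "sender_ip": str(ipaddress.IPv4Address(spa)),
        "target_mac": mac(tha),
        "target_ip": str(ipaddress.IPv4Address(tpa)),
    }


if __name__ == "__main__":
    for key, value in decode_arp(FRAME).items():
        print(f"{key}: {value}")
```

  For this frame the decode shows a broadcast ARP request from 192.168.1.10 (aa:bb:cc:dd:00:00) asking for 192.168.1.20.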

  The log also contains very useful information:

  [11/25/13 07:05:53.814 UTC 2eb0ce 5661] PUNT PATH (fed_punject_rx_process_packet:

  830):RX: Q: 16, Tag: 65561

  [11/25/13 07:05:53.814 UTC 2eb0cf 5661] PUNT PATH (fed_punject_get_physical_iif:

  579):RX: Physical IIF-id 0x104d88000000033

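  The first log entry shows which queue and tag the packet came from.

  The second entry is more useful because it contains the IIF-ID of the source interface. Use the following command to find the source interface: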

  3850-2#show platform port-asic ifm iif-id 0x0104d88000000033

  Interface Table

  Interface IIF-ID : 0x0104d88000000033

  Interface Name : Gi2/0/20

  Interface Block Pointer : 0x514d2f70

  Interface State : READY

  Interface Stauts : IFM-ADD-RCVD, FFM-ADD-RCVD

  Interface Ref-Cnt : 6

  Interface Epoch : 0

  Interface Type : ETHER

  Port Type : SWITCH PORT

  Port Location : LOCAL

  Slot : 2

  Unit : 20

  Slot Unit : 20

  Acitve : Y

  SNMP IF Index : 22

  GPN : 84

  EC Channel : 0

  EC Index : 0

  ASIC : 0

  ASIC Port : 14

  Port LE Handle : 0x514cd990

  Non Zero Feature Ref Counts

  FID : 48(AL_FID_L2_PM), Ref Count : 1

  FID : 77(AL_FID_STATS), Ref Count : 1

  FID : 51(AL_FID_L2_MATM), Ref Count : 1

  FID : 13(AL_FID_SC), Ref Count : 1

  FID : 26(AL_FID_QOS), Ref Count : 1

  Sub block information

  FID : 48(AL_FID_L2_PM), Private Data : 0x54072618

  FID : 26(AL_FID_QOS), Private Data : 0x514d31b8
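  Because the FED trace holds many packets rather than only the last one, counting how often each Physical IIF-id appears gives a better picture of which port is responsible. The Python sketch below does that tally. It searches the whole text rather than line by line, since the trace entries wrap across lines, and it pads each ID to 16 hexadecimal digits so it can be pasted into the show platform port-asic ifm iif-id command. The function name count_by_iif is hypothetical.

```python
import re
from collections import Counter

# Two trace entries copied from the output above (wrapped as in the log).
SAMPLE = """\
[11/25/13 07:05:53.814 UTC 2eb0ce 5661] PUNT PATH (fed_punject_rx_process_packet:
830):RX: Q: 16, Tag: 65561
[11/25/13 07:05:53.814 UTC 2eb0cf 5661] PUNT PATH (fed_punject_get_physical_iif:
579):RX: Physical IIF-id 0x104d88000000033
"""

IIF = re.compile(r"Physical IIF-id\s+(0x[0-9a-fA-F]+)")


def count_by_iif(text):
    """Count traced packets per Physical IIF-id, normalized to 16 hex digits."""
    counts = Counter()
    for m in IIF.finditer(text):
        counts[f"0x{int(m.group(1), 16):016x}"] += 1
    return counts


if __name__ == "__main__":
    for iif, n in count_by_iif(SAMPLE).most_common():
        print(n, iif)
```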


