nmap os detection原理及golang实现

admin

141259
文章

117
评论

2022年12月31日01:39:24评论13 views字数 26794阅读89分18秒阅读模式

近期对nmap的操作系统识别功能造了个轮子，用golang实现了一遍，想未来能用于扫描器，资产发现/管理系统，网络空间引擎中。本文记录一些原理和踩过的坑。

造轮子也是一次深入理解它原理的过程，造完轮子后感觉到所有代码尽在我掌握之中，之后大规模扫描测试就可以以最有效率，发最小包，绕过防火墙的方式进行集成，也能轻易的进行扩展。

成果图，实现了对一个主机在5s内能够识别出其操作系统，能预测开机时间(部分系统有误差),用无状态扫描技术去识别大量主机的操作系统时，平均时间会更低。

造轮子其实是非常不容易的，虽然nmap是开源项目，但与操作系统识别有关的代码就超过6000行，nmap官网有文档专门有一节讲述了nmap操作系统指纹的技术实现 https://nmap.org/book/osdetect.html ，它们都不太容易看懂，我是文档和源码交互着看了几十遍才大概明白整个流程。

基本原理

Nmap OS 指纹识别通过向目标机器的Open和Close 端口发送多达 16 个 TCP、UDP 和 ICMP 探针来工作

其中 TCP探针13个，UDP探针 1 个，ICMP探针2个。

这些探针是根据RFC协议中的各种歧义来设计的。然后 Nmap 监听响应。分析响应中的数十个属性以生成124 个指纹，并和nmap内置的指纹库进行对比。

nmap对操作系统识别的代码大多集中在 osscan.cc osscan2.cc中。

osscan.cc：主要负责os指纹的解析、对比函数，可直接看如下的函数定义。

/* Parses a single fingerprint from the memory region given.  If a
 non-null fingerprint is returned, the user is in charge of freeing it
 when done.  This function does not require the fingerprint to be 100%
 complete since it is used by scripts such as scripts/fingerwatch for
 which some partial fingerprints are OK. */
FingerPrint *parse_single_fingerprint(const char *fprint_orig);

/* These functions take a file/db name and open+parse it, returning an
   (allocated) FingerPrintDB containing the results.  They exit with
   an error message in the case of error. */
FingerPrintDB *parse_fingerprint_file(const char *fname);

/* Compares 2 fingerprints -- a referenceFP (can have expression
   attributes) with an observed fingerprint (no expressions).  If
   verbose is nonzero, differences will be printed.  The comparison
   accuracy (between 0 and 1) is returned).  MatchPoints is
   a special "fingerprints" which tells how many points each test is worth. */
double compare_fingerprints(const FingerPrint *referenceFP, const FingerPrint *observedFP,
                            const FingerPrintDef *MatchPoints, int verbose);

osscan2.cc：主要负责发送探针以及根据返回内容生成指纹，下面是定义的一些主要函数，send开头为相关探针的发送函数，process为中间处理指纹函数，make为最终生成指纹函数。

  /* Probe send functions. */
  void sendTSeqProbe(HostOsScanStats *hss, int probeNo);
  void sendTOpsProbe(HostOsScanStats *hss, int probeNo);
  void sendTEcnProbe(HostOsScanStats *hss);
  void sendT1_7Probe(HostOsScanStats *hss, int probeNo);
  void sendTUdpProbe(HostOsScanStats *hss, int probeNo);
  void sendTIcmpProbe(HostOsScanStats *hss, int probeNo);
  /* Response process functions. */
  bool processTSeqResp(HostOsScanStats *hss, const struct ip *ip, int replyNo);
  bool processTOpsResp(HostOsScanStats *hss, const struct tcp_hdr *tcp, int replyNo);
  bool processTWinResp(HostOsScanStats *hss, const struct tcp_hdr *tcp, int replyNo);
  bool processTEcnResp(HostOsScanStats *hss, const struct ip *ip);
  bool processT1_7Resp(HostOsScanStats *hss, const struct ip *ip, int replyNo);
  bool processTUdpResp(HostOsScanStats *hss, const struct ip *ip);
  bool processTIcmpResp(HostOsScanStats *hss, const struct ip *ip, int replyNo);

  /* Generic sending functions used by the above probe functions. */
  int send_tcp_probe(HostOsScanStats *hss,
                     int ttl, bool df, u8* ipopt, int ipoptlen,
                     u16 sport, u16 dport, u32 seq, u32 ack,
                     u8 reserved, u8 flags, u16 window, u16 urp,
                     u8 *options, int optlen,
                     char *data, u16 datalen);
  int send_icmp_echo_probe(HostOsScanStats *hss,
                           u8 tos, bool df, u8 pcode,
                           unsigned short id, u16 seq, u16 datalen);
  int send_closedudp_probe(HostOsScanStats *hss,
                           int ttl, u16 sport, u16 dport);

  void makeTSeqFP(HostOsScanStats *hss);
  void makeTOpsFP(HostOsScanStats *hss);
  void makeTWinFP(HostOsScanStats *hss);

nmap将指纹分为以下几类

SEQ：基于探针进行序列分析的指纹结果
OPS：基于探针接受到的TCP选项
WIN：基于探针接受到的响应窗口大小 (TCP Windows Size)
T系列：基于探针响应的TCP数据包各种测试值的结果
ECN：ECN探针返回结果

ECN 是一种通过允许路由器在开始不得不丢弃数据包之前发出拥塞问题信号来提高 Internet 性能的方法。它记录在RFC 3168中.

当生成许多包通过路由器时会导致其负载变大，这称之为拥塞。其结果就是系统会变慢以降低拥堵，以便路由器不会发生丢包。这个包仅为了得到目标系统的响应而发送。因为不同的操作系统以不同的方式处理这个包，所以返回的特定值可以用来判断操作系统。
IE：ICMP响应的数据包测试值结果
U1：UDP响应的数据包测试值结果

nmap OS探测时，会向目标主机的一个Open状态TCP端口，一个Close状态TCP端口，一个关闭的UDP端口发送数据包，以及一个ICMP数据包。

nmap发包函数是os_scan_ipv4,可以通过源码看发包流程

/* Performs the OS detection for IPv4 hosts. This method should not be called
 * directly. os_scan() should be used instead, as it handles chunking so
 * you don't do too many targets in parallel */
int OSScan::os_scan_ipv4(std::vector<Target *> &Targets) {
  ...

  /* Initialize the pcap session handler in HOS */
  begin_sniffer(&HOS, Targets);
  
  ...
  // 准备测试，删除旧信息，初始化变量
  startRound(&OSI, &HOS, itry);
  // 执行顺序产生测试（发送6个TCP探测包，每隔100ms一个）
  doSeqTests(&OSI, &HOS);
  // 执行TCP、UDP、ICMP探测包测试
  doTUITests(&OSI, &HOS);
  // 对结果做指纹对比，获取OS扫描信息
  endRound(&OSI, &HOS, itry);
  // 将超时未匹配的主机移动
  expireUnmatchedHosts(&OSI, &unMatchedHosts);
  }

TCP探针

TCP将发送13个数据包，这些数据包分为三类，一类是对tcp option的测试，一类是对tcp/ip 其他字段的测试，最后是ECN测试。

可以直接看nmap构造tcp option数据包源码

/* 8 options:
 *  0~5: six options for SEQ/OPS/WIN/T1 probes.
 *  6:   ECN probe.
 *  7-12:   T2~T7 probes.
 *
 * option 0: WScale (10), Nop, MSS (1460), Timestamp, SackP
 * option 1: MSS (1400), WScale (0), SackP, T(0xFFFFFFFF,0x0), EOL
 * option 2: T(0xFFFFFFFF, 0x0), Nop, Nop, WScale (5), Nop, MSS (640)
 * option 3: SackP, T(0xFFFFFFFF,0x0), WScale (10), EOL
 * option 4: MSS (536), SackP, T(0xFFFFFFFF,0x0), WScale (10), EOL
 * option 5: MSS (265), SackP, T(0xFFFFFFFF,0x0)
 * option 6: WScale (10), Nop, MSS (1460), SackP, Nop, Nop
 * option 7-11: WScale (10), Nop, MSS (265), T(0xFFFFFFFF,0x0), SackP
 * option 12: WScale (15), Nop, MSS (265), T(0xFFFFFFFF,0x0), SackP
 */
static struct {
  u8* val;
  int len;
} prbOpts[] = {
  {(u8*) "x03x03x0Ax01x02x04x05xb4x08x0Axffxffxffxffx00x00x00x00x04x02", 20},
  {(u8*) "x02x04x05x78x03x03x00x04x02x08x0Axffxffxffxffx00x00x00x00x00", 20},
  {(u8*) "x08x0Axffxffxffxffx00x00x00x00x01x01x03x03x05x01x02x04x02x80", 20},
  {(u8*) "x04x02x08x0Axffxffxffxffx00x00x00x00x03x03x0Ax00", 16},
  {(u8*) "x02x04x02x18x04x02x08x0Axffxffxffxffx00x00x00x00x03x03x0Ax00", 20},
  {(u8*) "x02x04x01x09x04x02x08x0Axffxffxffxffx00x00x00x00", 16},
  {(u8*) "x03x03x0Ax01x02x04x05xb4x04x02x01x01", 12},
  {(u8*) "x03x03x0Ax01x02x04x01x09x08x0Axffxffxffxffx00x00x00x00x04x02", 20},
  {(u8*) "x03x03x0Ax01x02x04x01x09x08x0Axffxffxffxffx00x00x00x00x04x02", 20},
  {(u8*) "x03x03x0Ax01x02x04x01x09x08x0Axffxffxffxffx00x00x00x00x04x02", 20},
  {(u8*) "x03x03x0Ax01x02x04x01x09x08x0Axffxffxffxffx00x00x00x00x04x02", 20},
  {(u8*) "x03x03x0Ax01x02x04x01x09x08x0Axffxffxffxffx00x00x00x00x04x02", 20},
  {(u8*) "x03x03x0fx01x02x04x01x09x08x0Axffxffxffxffx00x00x00x00x04x02", 20}
};

发包每次的TCP Windows窗口大小也不一样

/* TCP Window sizes. Numbering is the same as for prbOpts[] */
u16 prbWindowSz[] = { 1, 63, 4, 4, 16, 512, 3, 128, 256, 1024, 31337, 32768, 65535 };

前六个数据包为 SEQ/OPS/WIN/T1 探针，第7个为ECN探针，后面的是T探针。

SEQ/OPS/WIN探针发SYN包，ECN探针发 TH_CWR|TH_ECE|TH_SYN，Urgent为63477，T探针发包flag如下，发包端口前四个为开放的tcp端口，后三个为关闭的TCP端口。

如果nmap未找到关闭的TCP端口，将随机取值

 closedTCPPort = (get_random_uint() % 14781) + 30000;

T	flag	Dst Port
T1	TH_SYN	openTCPPort
T2	0	openTCPPort
T3	TH_SYN\|TH_FIN\|TH_URG\|TH_PUSH	openTCPPort
T4	TH_ACK	openTCPPort
T5	TH_SYN	closedTCPPort
T6	TH_ACK	closedTCPPort
T7	TH_FIN\|TH_PUSH\|TH_URG	closedTCPPort

ICMP探针

发送两个 ICMP 探针。

第一个设置 IP协议 DF 位，TOS 为零，ICMP Code字段为 9（即ICMPv4CodeNetAdminProhibited），Seq为 295，payload 120 字节的 0x00。

第二个探针类似ping 查询，除了 TOS设置为IP_TOS_RELIABILITY，ICMP Code为0(ICMPv4CodeNet)，发送 150 字节的数据。

UDP探针

向一个关闭的UDP端口发包，IP协议的ID设置为0x1042,payload为300字节的0x43

如果nmap未找到关闭的UDP端口，将随机取值

closedUDPPort = (get_random_uint() % 14781) + 30000;

指纹生成

这是造轮子过程中最麻烦的部分，需要将这些指纹结果一一实现。

TCP ISN 最大公约数 ( `GCD`)

tcp前六个探测包中，tcp seq数值的差值作为一个数组，这个数组及有5个元素。取这个数组的最大公约数。

TCP ISN 计数器速率 ( `ISR`)

取探针返回包 SEQ的差除以发送时间的毫秒差即 SEQ的发送速率，再得出探针每个速率的平均值即seq_rate，最后通过一个公式得出最后的值即ISR。

seq_rate = log(seq_rate) / log(2.0);
seq_rate = (unsigned int) (seq_rate * 8 + 0.5);

TCP ISN 序列可预测性指数 ( `SP`)

代码和文档都难懂，弄一个简化版的代码就看懂了

seq_stddev = 0
for i =0;i<responseNum -1;i++{
  seq_stddev += （（SEQ[i]的发送速率 - SEQ平均速率）  / GCD 最大公约数）的平方
}
seq_stddev  /= responseNum-2
seq_stddev = sqrt(seq_stddev);
seq_stddev = log(seq_stddev) / log(2.0);
sp = (int) (seq_stddev * 8 + 0.5);

以我仅学过的线性代数，这个可以弄成一个公式的，但是我不会在markdown上展示，算了~

IP ID序列生成算法( `TI`, `CI`, `II`)

从TCP Open端口，Tcp Close端口，ICMP协议中，计算ip id的生成算法

如果所有 ID 号均为零，则测试值为Z。
如果 IP ID 序列至少增加 20,000，则该值为RD（随机）。这个结果是不可能的，II因为没有足够的样本来支持它。
如果所有 IP ID 都相同，则测试设置为该十六进制值。
如果两个连续 ID 之间的任何差异超过 1,000，并且不能被 256 整除，则测试的值为 RI（随机正增量）。如果差异可以被 256 整除，则它必须至少为 256,000 才能导致此 RI结果。
如果所有差异都可以被 256 整除且不大于 5,120，则测试设置为BI（中断增量）。这种情况发生在 Microsoft Windows 等系统上，其中 IP ID 以主机字节顺序而不是网络字节顺序发送。它工作正常并且没有任何类型的 RFC 违规，尽管它确实泄露了对攻击者有用的主机架构细节。
如果所有差异都小于十，则该值为I（增量）。我们在这里允许最多 10 个差异（而不是要求按顺序排列），因为来自其他主机的流量会导致序列间隙。
如果前面的步骤都没有识别生成算法，则从指纹中省略测试。

共享 IP ID 序列布尔值 ( `SS`)

根据前面推测出的IP ID增长方式，记录目标是否在 TCP 和 ICMP 协议之间共享其 IP ID 序列。

 /* SS: Shared IP ID sequence boolean */
  if ((tcp_ipid_seqclass == IPID_SEQ_INCR ||
        tcp_ipid_seqclass == IPID_SEQ_BROKEN_INCR ||
        tcp_ipid_seqclass == IPID_SEQ_RPI) &&
       (icmp_ipid_seqclass == IPID_SEQ_INCR ||
        icmp_ipid_seqclass == IPID_SEQ_BROKEN_INCR ||
        icmp_ipid_seqclass == IPID_SEQ_RPI)) {
    /* Both are incremental. Thus we have "SS" test. Check if they
       are in the same sequence. */
    u32 avg = (hss->ipid.tcp_ipids[good_tcp_ipid_num - 1] - hss->ipid.tcp_ipids[0]) / (good_tcp_ipid_num - 1);
    if (hss->ipid.icmp_ipids[0] < hss->ipid.tcp_ipids[good_tcp_ipid_num - 1] + 3 * avg) {
      test.setAVal("SS", "S");
    } else {
      test.setAVal("SS", "O");
    }
  }

TCP 时间戳选项算法 ( `TS`)

这个能预测出开机时间！

TS是另一个测试，它试图根据它如何生成一系列数字来确定目标操作系统的特征。这个查看响应SEQ探测的 TCP 时间戳选项（如果有）。它检查 TSval（选项的前四个字节）而不是回显的 TSecr（最后四个字节）值。它采用每个连续 TSval 之间的差异，并将其除以 Nmap 发送生成这些响应的两个探测器之间经过的时间量。

结果值给出了每秒时间戳增量的速率。Nmap 计算所有连续探测的平均每秒增量，然后计算TS如下：

如果任何响应没有时间戳选项，TS则设置为U（不支持）。
如果任何时间戳值为零，TS则设置为0。
如果每秒平均增量在、或范围内0-5.66，则分别设置为 1、7 或 8。这三个范围得到特殊处理，因为它们对应于许多主机使用的 2 Hz、100 Hz 和 200 Hz 频率。70-150``150-350``TS
在所有其他情况下，Nmap 记录每秒平均增量的二进制对数，四舍五入到最接近的整数。由于大多数主机使用 1,000 Hz 频率， A这是一个常见的结果。

这个结果可以推断出开机时间

/* Now we look at TCP Timestamp sequence prediction */
  /* Battle plan:
     1) Compute average increments per second, and variance in incr. per second
     2) If any are 0, set to constant
     3) If variance is high, set to random incr. [ skip for now ]
     4) if ~10/second, set to appropriate thing
     5) Same with ~100/sec
  */
  if (hss->si.ts_seqclass == TS_SEQ_UNKNOWN && hss->si.responses >= 2) {
    time_t uptime = 0;
    avg_ts_hz = 0.0;
    for (i = 0; i < hss->si.responses - 1; i++) {
      double dhz;

      dhz = (double) ts_diffs[i] / (time_usec_diffs[i] / 1000000.0);
      /*       printf("ts incremented by %d in %li usec -- %fHZn", ts_diffs[i], time_usec_diffs[i], dhz); */
      avg_ts_hz += dhz / (hss->si.responses - 1);
    }

    if (avg_ts_hz > 0 && avg_ts_hz < 5.66) { /* relatively wide range because sampling time so short and frequency so slow */
      hss->si.ts_seqclass = TS_SEQ_2HZ;
      uptime = hss->si.timestamps[0] / 2;
    }
    else if (avg_ts_hz > 70 && avg_ts_hz < 150) {
      hss->si.ts_seqclass = TS_SEQ_100HZ;
      uptime = hss->si.timestamps[0] / 100;
    }
    else if (avg_ts_hz > 724 && avg_ts_hz < 1448) {
      hss->si.ts_seqclass = TS_SEQ_1000HZ;
      uptime = hss->si.timestamps[0] / 1000;
    }
    else if (avg_ts_hz > 0) {
      hss->si.ts_seqclass = TS_SEQ_OTHER_NUM;
      uptime = hss->si.timestamps[0] / (unsigned int)(0.5 + avg_ts_hz);
    }

    if (uptime > 63072000) {
      /* Up 2 years?  Perhaps, but they're probably lying. */
      if (o.debugging) {
        /* long long is probably excessive for number of days, but sick of
         * truncation warnings and finding the right format string for time_t
         */
        log_write(LOG_STDOUT, "Ignoring claimed %s uptime of %lld daysn",
        hss->target->targetipstr(), (long long) (uptime / 86400));
      }
      uptime = 0;
    }
    hss->si.lastboot = hss->seq_send_times[0].tv_sec - uptime;
  }
  
  

switch (hss->si.ts_seqclass) {

  case TS_SEQ_ZERO:
    test.setAVal("TS", "0");
    break;
  case TS_SEQ_2HZ:
  case TS_SEQ_100HZ:
  case TS_SEQ_1000HZ:
  case TS_SEQ_OTHER_NUM:
    /* Here we "cheat" a little to make the classes correspond more
       closely to common real-life frequencies (particularly 100)
       which aren't powers of two. */
    if (avg_ts_hz <= 5.66) {
      /* 1 would normally range from 1.4 - 2.82, but we expand that
         to 0 - 5.66, so we won't ever even get a value of 2.  Needs
         to be wide because our test is so fast that it is hard to
         match slow frequencies exactly.  */
      tsnewval = 1;
    } else if (avg_ts_hz > 70 && avg_ts_hz <= 150) {
      /* mathematically 7 would be 90.51 - 181, but we change to 70-150 to
         better align with common freq 100 */
      tsnewval = 7;
    } else if (avg_ts_hz > 150 && avg_ts_hz <= 350) {
      /* would normally be 181 - 362.  Now aligns better with 200 */
      tsnewval = 8;
    } else {
      /* Do a log base2 rounded to nearest int */
      tsnewval = (unsigned int)(0.5 + log(avg_ts_hz) / log(2.0));
    }

    test.setAVal("TS", hss->target->FPR->cp_hex(tsnewval));
    break;
  case TS_SEQ_UNSUPPORTED:
    test.setAVal("TS", "U");
    break;
  }

TCP 选项 ( `O`, `O1–O6`)

TCP 数据包的Options。指纹保留了原始顺序，以及选项的值。有些操作系统没有实现这些选项或者实现不全。

Option Name	Character	Argument (if any)
End of Options List (EOL)	L
No operation (NOP)	N
Maximum Segment Size (MSS)	M	The value is appended. Many systems echo the value used in the corresponding probe.
Window Scale (WS)	W	The actual value is appended.
Timestamp (TS)	T	The T is followed by two binary characters representing the TSval and TSecr values respectively. The characters are 0 if the field is zero and 1 otherwise.
Selective ACK permitted (SACK)	S

int HostOsScan::get_tcpopt_string(const struct tcp_hdr *tcp, int mss, char *result, int maxlen) const {
  char *p;
  const char *q;
  u16 tmpshort;
  u32 tmpword;
  int length;
  int opcode;

  p = result;
  length = (tcp->th_off * 4) - sizeof(struct tcp_hdr);
  q = ((char *)tcp) + sizeof(struct tcp_hdr);

  /*
   * Example parsed result: M5B4ST11NW2
   *   MSS, Sack Permitted, Timestamp with both value not zero, Nop, WScale with value 2
   */

  /* Be aware of the max increment value for p in parsing,
   * now is 5 = strlen("Mxxxx") <-> MSS Option
   */
  while (length > 0 && (p - result) < (maxlen - 5)) {
    opcode = *q++;
    if (!opcode) { /* End of List */
      *p++ = 'L';
      length--;
    } else if (opcode == 1) { /* No Op */
      *p++ = 'N';
      length--;
    } else if (opcode == 2) { /* MSS */
      if (length < 4)
        break; /* MSS has 4 bytes */
      *p++ = 'M';
      q++;
      memcpy(&tmpshort, q, 2);
      /*  if (ntohs(tmpshort) == mss) */
      /*    *p++ = 'E'; */
      sprintf(p, "%hX", ntohs(tmpshort));
      p += strlen(p); /* max movement of p is 4 (0xFFFF) */
      q += 2;
      length -= 4;
    } else if (opcode == 3) { /* Window Scale */
      if (length < 3)
        break; /* Window Scale option has 3 bytes */
      *p++ = 'W';
      q++;
      snprintf(p, length, "%hhX", *((u8*)q));
      p += strlen(p); /* max movement of p is 2 (max WScale value is 0xFF) */
      q++;
      length -= 3;
    } else if (opcode == 4) { /* SACK permitted */
      if (length < 2)
        break; /* SACK permitted option has 2 bytes */
      *p++ = 'S';
      q++;
      length -= 2;
    } else if (opcode == 8) { /* Timestamp */
      if (length < 10)
        break; /* Timestamp option has 10 bytes */
      *p++ = 'T';
      q++;
      memcpy(&tmpword, q, 4);
      if (tmpword)
        *p++ = '1';
      else
        *p++ = '0';
      q += 4;
      memcpy(&tmpword, q, 4);
      if (tmpword)
        *p++ = '1';
      else
        *p++ = '0';
      q += 4;
      length -= 10;
    }
  }

  if (length > 0) {
    /* We could reach here for one of the two reasons:
     *  1. At least one option is not correct. (Eg. Should have 4 bytes but only has 3 bytes left).
     *  2. The option string is too long.
     */
    *result = '';
    return -1;
  }

  *p = '';
  return p - result;
}

TCP 初始窗口大小 ( `W`, `W1`– `W6`)

TCP.Windows

响应度 ( `R`)

记录目标是否响应给定的探测。可能的值为Y和N。如果没有回复，则省略测试的其余字段。

有返回包为Y，否则为N

IP 不分片位 ( `DF`)

#define IP_DF 0x4000 /* don't fragment flag */

IP.Flag && IP_DF

不要分段 (ICMP) ( `DFI`)

两个ICMP请求中计算 IP.Flag && IP_DF

都为真则为Y
一真一假则为S
都为假为N
否则为0

IP 初始生存时间 ( `T`)

IP.TTL

IP 初始生存时间猜测 ( TG)

根据ttl算tg

int get_initial_ttl_guess(u8 ttl) {
  if (ttl <= 32)
    return 32;
  else if (ttl <= 64)
    return 64;
  else if (ttl <= 128)
    return 128;
  else
    return 255;
}

显式拥塞通知 ( `CC`)

此测试仅用于ECN探头。该探测是一个 SYN 数据包，其中包括 CWR 和 ECE 拥塞控制标志。当接收到响应 SYN/ACK 时，检查这些标志以设置CC（拥塞控制）测试值

  /* Explicit Congestion Notification support test */
  if ((tcp->th_flags & TH_ECE) && (tcp->th_flags & TH_CWR))
    /* echo back */
    test.setAVal("CC", "S");
  else if (tcp->th_flags & TH_ECE)
    /* support */
    test.setAVal("CC", "Y");
  else if (!(tcp->th_flags & TH_CWR))
    /* not support */
    test.setAVal("CC", "N");
  else
    test.setAVal("CC", "O");

TCP 杂项怪癖 ( `Q`)

一些tcp怪癖 (俺也不知道现在还有用没)

 /* TCP miscellaneous quirks test */
  p = quirks_buf;
  if (tcp->th_x2) {
    /* Reserved field of TCP is not zero */
    assert(p + 1 < quirks_buf + sizeof(quirks_buf));
    *p++ = 'R';
  }
  if (!(tcp->th_flags & TH_URG) && tcp->th_urp) {
    /* URG pointer value when urg flag not set */
    assert(p + 1 < quirks_buf + sizeof(quirks_buf));
    *p++ = 'U';
  }
  *p = '';
  test.setAVal("Q", hss->target->FPR->cp_dup(quirks_buf, p - quirks_buf));

TCP 序列号 ( `S`)

测试返回包中tcp.seq和发送时tcp.ack的关系

 /* Seq test values:
     Z   = zero
     A   = same as ack
     A+  = ack + 1
     O   = other
  */
  if (ntohl(tcp->th_seq) == 0)
    test.setAVal("S", "Z");
  else if (ntohl(tcp->th_seq) == tcpAck)
    test.setAVal("S", "A");
  else if (ntohl(tcp->th_seq) == tcpAck + 1)
    test.setAVal("S", "A+");
  else
    test.setAVal("S", "O");

TCP 确认号 ( `A`)

  /* ACK test values:
     Z   = zero
     S   = same as syn
     S+  = syn + 1
     O   = other
  */
  if (ntohl(tcp->th_ack) == 0)
    test.setAVal("A", "Z");
  else if (ntohl(tcp->th_ack) == tcpSeqBase)
    test.setAVal("A", "S");
  else if (ntohl(tcp->th_ack) == tcpSeqBase + 1)
    test.setAVal("A", "S+");
  else
    test.setAVal("A", "O");

TCP 标志 ( `F`)

 /* Flags. They must be in this order:
     E = ECN Echo
     U = Urgent
     A = Acknowledgement
     P = Push
     R = Reset
     S = Synchronize
     F = Final
  */
  struct {
    u8 flag;
    char c;
  } flag_defs[] = {
    { TH_ECE, 'E' },
    { TH_URG, 'U' },
    { TH_ACK, 'A' },
    { TH_PUSH, 'P' },
    { TH_RST, 'R' },
    { TH_SYN, 'S' },
    { TH_FIN, 'F' },
  };
  assert(sizeof(flag_defs) / sizeof(flag_defs[0]) < sizeof(flags_buf));
  p = flags_buf;
  for (i = 0; i < (int) (sizeof(flag_defs) / sizeof(flag_defs[0])); i++) {
    if (tcp->th_flags & flag_defs[i].flag)
      *p++ = flag_defs[i].c;
  }
  *p = '';
  test.setAVal("F", hss->target->FPR->cp_dup(flags_buf, p - flags_buf));

TCP RST 数据校验和 ( `RD`)

 /* Rst Data CRC32 */
  length = (int) ntohs(ip->ip_len) - 4 * ip->ip_hl -4 * tcp->th_off;
  if ((tcp->th_flags & TH_RST) && length>0) {
    test.setAVal("RD", hss->target->FPR->cp_hex(nbase_crc32(((u8 *)tcp) + 4 * tcp->th_off, length)));
  } else {
    test.setAVal("RD", "0");
  }

IP总长( `IPL`)

ip.length

未使用的端口不可达字段非零 ( `UN`)

IP字段中 ID和SEQ字段整体的 uin32位数值

返回的探测 IP 总长度值 ( `RIPL`)

UDP测试中，对icmp返回包中ip结构的长度校验，返回长度的十六进制

  /* OK, lets check the returned IP length, some systems @$@ this
     up */
  if (ntohs(ip2->ip_len) == 328)
    test.setAVal("RIPL", "G");
  else
    test.setAVal("RIPL", hss->target->FPR->cp_hex(ntohs(ip2->ip_len)));

返回的探针 IP ID 值 ( `RID`)

UDP测试中，对icmp返回包中ip结构的id校验

  /* Now lets see how they treated the ID we sent ... */
  if (ntohs(ip2->ip_id) == hss->upi.ipid)
    test.setAVal("RID", "G"); /* The good "expected" value */
  else
    test.setAVal("RID", hss->target->FPR->cp_hex(ntohs(ip2->ip_id)));

返回的探测 IP 校验和值的完整性 ( `RIPCK`)

UDP测试中，对icmp返回包中ip完整性校验

  /* Let us see if the IP checksum we got back computes */

  /* Thanks to some machines not having struct ip member ip_sum we
     have to go with this BS */
  checksumptr = (unsigned short *)   ((char *) ip2 + 10);
  checksum = *checksumptr;

  if (checksum == 0) {
    test.setAVal("RIPCK", "Z");
  } else {
    *checksumptr = 0;
    if (in_cksum((unsigned short *)ip2, 20) == checksum) {
      test.setAVal("RIPCK", "G"); /* The "expected" good value */
    } else {
      test.setAVal("RIPCK", "I"); /* They modified it */
    }
    *checksumptr = checksum;
  }

返回的探测 UDP 校验和的完整性 ( `RUCK`)

  /* UDP checksum */
  if (udp->uh_sum == hss->upi.udpck)
    test.setAVal("RUCK", "G"); /* The "expected" good value */
  else
    test.setAVal("RUCK", hss->target->FPR->cp_hex(ntohs(udp->uh_sum)));

返回的 UDP 数据的完整性 ( `RUD`)

  /* Finally we ensure the data is OK */
  datastart = ((unsigned char *)udp) + 8;
  dataend = (unsigned char *)  ip + ntohs(ip->ip_len);

  while (datastart < dataend) {
    if (*datastart != hss->upi.patternbyte)
      break;
    datastart++;
  }
  if (datastart < dataend)
    test.setAVal("RUD", "I"); /* They modified it */
  else
    test.setAVal("RUD", "G");

ICMP 响应代码 ( `CD`)


  /* ICMP Code value. Test values:
   * [Value]. Both set Code to the same value [Value];
   * S. Both use the Code that the sender uses;
   * O. Other.
   */
  value1 = icmp1->icmp_code;
  value2 = icmp2->icmp_code;
  if (value1 == value2) {
    if (value1 == 0)
      test.setAVal("CD", "Z");
    else
      test.setAVal("CD", hss->target->FPR->cp_hex(value1));
  }
  else if (value1 == 9 && value2 == 0)
    /* both the same as in the corresponding probe */
    test.setAVal("CD", "S");
  else
    test.setAVal("CD", "O");

指纹匹配

nmap os指纹文件在https://github.com/nmap/nmap/blob/master/nmap-os-db

nmap指纹解析

Fingerprint关键字定义一个新的指纹，紧随其后的是指纹名字。

Class行用于指定该指纹所属的类别，依次指定该系统的vendor（生产厂家）,OS family（系统类别）,OS generation（第几代操作系统）,and device type（设备类型）。

接下来是CPE行，此行非常重要，使用CPE（CommonPlatformEnumeration，通用平台枚举）格式描述该系统的信息。以标准的CPE格式来描述操作系统类型，便于Nmap与外界信息的交换，比如可以很快从网上开源数据库查找到CPE描述的操作系统具体信息。

剩下SEQ OPS WIN 等等即各类指纹类型。

表达式

指纹可以是表达式类型的，nmap支持的表达式

大于：>
小于：<
范围： 1-3
或关系： GCD=1-6|64|256

指纹解析代码

nmap的指纹解析在parse_fingerprint_file函数。通过遍历每一行，n和#开头的跳过，Fingerprint开头则创建新指纹，Class ，CPE开头即新指纹的属性，发现(、)字符即新的指纹内容。

精简后的代码如下


FingerPrintDB *parse_fingerprint_file(const char *fname) {

  fp = fopen(fname, "r");

top:
  while (fgets(line, sizeof(line), fp)) {
    lineno++;
    /* Read in a record */
    // 换行 和 # 开头的跳过
    if (*line == 'n' || *line == '#')
      continue;

fparse:
    if (strncmp(line, "Fingerprint", 11) == 0) {
      // 指纹的开始行,创建新指纹
      current = new FingerPrint;
    } else if (strncmp(line, "MatchPoints", 11) == 0) {
      // 这是MatchPoint
    } else {
      error("Parse error on line %d of nmap-os-db file: %s", lineno, line);
      continue;
    }

    DB->prints.push_back(current);
    
    p = line + 12;
    while (*p && isspace((int) (unsigned char) *p))
      p++;

    q = strpbrk(p, "n#");
    while (isspace((int) (unsigned char) *(--q)));

    current->match.OS_name = cp_strndup(p, q - p + 1); // 当前指纹os name
    current->match.line = lineno; // 当前指纹行数

    /* Now we read the fingerprint itself */
    while (fgets(line, sizeof(line), fp)) {
      lineno++;
      if (*line == '#')
        continue;
      if (*line == 'n')
        break;

      q = strchr(line, 'n');

      if (0 == strncmp(line, "Fingerprint ",12)) {
        goto fparse;
      } else if (strncmp(line, "Class ", 6) == 0) {
        parse_classline(current, line, q, lineno);
      } else if (strncmp(line, "CPE ", 4) == 0) {
        parse_cpeline(current, line, q, lineno);
      } else {
        p = line;
        q = strchr(line, '(');
        FingerTest test(FPstr(p, q), *DB->MatchPoints);
        p = q+1;
        q = strchr(p, ')');
        if (!test.str2AVal(p, q)) {
          error("Parse error on line %d of nmap-os-db file: %s", lineno, line);
          goto top;
        }
        current->setTest(test);
      }
    }
  }

  fclose(fp);
  return DB;
}

指纹匹配算法

指纹的第一行有个叫MatchPoints的东东，它不是指纹，它定义了指纹的权重

# This first element provides the number of points every fingerprint
# test is worth.  Tests like TTL or Don't fragment are worth less
# (insectionidually) because there are so many of them and the values are
# often correlated with each other.  Meanwhile, elements such as TS
# (TCP timestamp) which are only used once, get more points.  Points
# are used when there are no perfect matches to determine which OS
# fingerprint matches a target machine most closely.
MatchPoints
SEQ(SP=25%GCD=75%ISR=25%TI=100%CI=50%II=100%SS=80%TS=100)
OPS(O1=20%O2=20%O3=20%O4=20%O5=20%O6=20)
WIN(W1=15%W2=15%W3=15%W4=15%W5=15%W6=15)
ECN(R=100%DF=20%T=15%TG=15%W=15%O=15%CC=100%Q=20)
T1(R=100%DF=20%T=15%TG=15%S=20%A=20%F=30%RD=20%Q=20)
T2(R=80%DF=20%T=15%TG=15%W=25%S=20%A=20%F=30%O=10%RD=20%Q=20)
T3(R=80%DF=20%T=15%TG=15%W=25%S=20%A=20%F=30%O=10%RD=20%Q=20)
T4(R=100%DF=20%T=15%TG=15%W=25%S=20%A=20%F=30%O=10%RD=20%Q=20)
T5(R=100%DF=20%T=15%TG=15%W=25%S=20%A=20%F=30%O=10%RD=20%Q=20)
T6(R=100%DF=20%T=15%TG=15%W=25%S=20%A=20%F=30%O=10%RD=20%Q=20)
T7(R=80%DF=20%T=15%TG=15%W=25%S=20%A=20%F=30%O=10%RD=20%Q=20)
U1(R=50%DF=20%T=15%TG=15%IPL=100%UN=100%RIPL=100%RID=100%RIPCK=100%RUCK=100%RUD=100)
IE(R=50%DFI=40%T=15%TG=15%CD=100)

nmap通过逐行对比指纹，指纹正确加上权重的分数，最后对每个指纹计算一个概率，即成功分数/总数，输出概率高的指纹。

nmap默认的概率阈值是0.85，即概率小于这个数，则认为指纹不准确了。

表达式解析

具体的指纹匹配函数，包含解析表达式。val是扫描生成的指纹数值，expr是内置OS库中指纹数值。

用 | 分割expr，分别比对,比对成功直接返回 true
将val转为数字，如果val 是数字，就根据逻辑比对 < > - 符号
对于其他结果，就直接对比 val和expr的文本即可。

/* Compare an observed value (e.g. "45") against an OS DB expression (e.g.
   "3B-47" or "8|A" or ">10"). Return true iff there's a match. The syntax uses
     < (less than)
     > (greater than)
     | (or)
     - (range)
   No parentheses are allowed. */
static bool expr_match(const char *val, const char *expr) {
  const char *p, *q, *q1;  /* OHHHH YEEEAAAAAHHHH!#!@#$!% */
  char *endptr;
  unsigned int val_num, expr_num, expr_num1;
  bool is_numeric;

  p = expr;

  val_num = strtol(val, &endptr, 16);
  is_numeric = !*endptr;
  // TODO: this could be a lot faster if we compiled fingerprints to a bytecode
  // instead of re-parsing every time.
  do {
    q = strchr(p, '|');
    if (is_numeric && (*p == '<' || *p == '>')) {
      expr_num = strtol(p + 1, &endptr, 16);
      if (endptr == q || !*endptr) {
        if ((*p == '<' && val_num < expr_num)
            || (*p == '>' && val_num > expr_num)) {
          return true;
        }
      }
    } else if (is_numeric && ((q1 = strchr(p, '-')) != NULL)) {
      expr_num = strtol(p, &endptr, 16);
      if (endptr == q1) {
        expr_num1 = strtol(q1 + 1, &endptr, 16);
        if (endptr == q || !*endptr) {
          assert(expr_num1 > expr_num);
          if (val_num >= expr_num && val_num <= expr_num1) {
            return true;
          }
        }
      }
    } else {
      if ((q && !strncmp(p, val, q - p)) || (!q && !strcmp(p, val))) {
        return true;
      }
    }
    if (q)
      p = q + 1;
  } while (q);

  return false;
}

用Golang造轮子

上述只是原理，真正写代码调试的时候会遇到更多困难。为什么我发的包和nmap不一样，为什么指纹计算的结果和nmap不一样，为什么指纹匹配不到。有几个小技巧可以缓解一下这些问题。

nmap 有个参数可以显示发送的包 --packet-trace，再将-vv 详细输出打开，事先找好Open和Close的TCP端口，用以下命令

nmap -p80,20 --packet-trace -vv -O 123.123.123.123

能看到具体发送的包和返回的包

以及最终输出的指纹

发包和收包用的是 https://github.com/google/gopacket

它基于libpcap，支持多个系统，并且可以发原始数据包。之前写过 https://github.com/boy-hack/ksubdomain ，所以对这个库也很熟悉。

nmap也是基于libpcap发包的。

数据链路层

因为这个包太底层，以至于要自己组装协议，数据链路层，网络层，传输层，都要自己组装。

对于数据链路层，需要知道自己的网卡Mac以及目标网卡Mac，对于外网，目标网卡就是路由器的Mac，对于内网，目标网卡就是arp探测到的mac。

在ksubdomain中，我用了一个取巧的方式，我先调用系统的nslookup去请求一个地址，然后监听返回包，返回包中会包含数据链路层，直接拿来用即可。

这次我采用一个古老又复杂的方法，是 https://github.com/v-byte-cpu/sx 这个项目提供的灵感，但它也没有很好实现。

先确定目标IP
调用系统路由表，确定目标IP该使用的网卡，此时就能获得本地网卡的mac地址和网关IP。
调用系统arp缓存，查找目标ip是否有，如果有，就能获得目标地址的网卡mac（无论是内网主机的mac还是网卡的mac，缓存中都有）
如果系统arp缓存没有找到，则使用arp协议，像内网段广播获得网关的mac，如果目标IP是同一个网段，则广播查找目标IP的Mac。

其他

其他部分暂时没有踩坑，按照nmap源码的实现来就行了。

参考

https://zhuanlan.zhihu.com/p/66075884
https://nmap.org/book/osdetect-methods.html

原文始发于微信公众号（Hacking就是好玩）：nmap os detection原理及golang实现

免责声明:文章中涉及的程序(方法)可能带有攻击性，仅供安全研究与教学之用，读者将其信息做其他用途，由读者承担全部法律及连带责任，本站不承担任何法律及连带责任；如有问题可邮件联系(建议使用企业邮箱或有效邮箱,避免邮件被拦截，联系方式见首页)，望知悉。

左青龙
微信扫一扫

右白虎
微信扫一扫

基本原理

TCP探针

ICMP探针

UDP探针

指纹生成

TCP ISN 最大公约数 ( GCD)

TCP ISN 计数器速率 ( ISR)

TCP ISN 序列可预测性指数 ( SP)

IP ID序列生成算法( TI, CI, II)

共享 IP ID 序列布尔值 ( SS)

TCP 时间戳选项算法 ( TS)

TCP 选项 ( O, O1–O6)

TCP 初始窗口大小 ( W, W1– W6)

响应度 ( R)

IP 不分片位 ( DF)

不要分段 (ICMP) ( DFI)

IP 初始生存时间 ( T)

IP 初始生存时间猜测 ( TG)

显式拥塞通知 ( CC)

TCP 杂项怪癖 ( Q)

TCP 序列号 ( S)

TCP 确认号 ( A)

TCP 标志 ( F)

TCP RST 数据校验和 ( RD)

IP总长( IPL)

未使用的端口不可达字段非零 ( UN)

返回的探测 IP 总长度值 ( RIPL)

返回的探针 IP ID 值 ( RID)

返回的探测 IP 校验和值的完整性 ( RIPCK)

返回的探测 UDP 校验和的完整性 ( RUCK)

返回的 UDP 数据的完整性 ( RUD)

ICMP 响应代码 ( CD)