【万事屋】一个区域斜向经纬网格的补全(下)

2023年9月11日11:34:22评论7 views字数 19842阅读66分8秒阅读模式

以下全文代码均已发布至和鲸社区，复制下面链接，可一键fork跑通：https://www.heywhale.com/mw/project/64f1be79abb16766700babe6?shareby=620ce369c1ae5e001747085e#

预警：本期内容相当干燥

大家好，在上一期《一个区域斜向经纬网格的补全(上)》中，我们先对原始表格数据进行了一轮“清洗”，把中间缺失的点用 NaN 补齐了。现在本期我们将开始这道题的重头戏，就是重建斜向坐标网格。

有了上一期中对坐标点的一些查找经验，我们知道了要怎么沿着斜纵列和斜横行进行坐标查找，于是我们可以复用一些上一期的代码，对其进行细微的改造，先导入我们要用的函数：

import os
from typing import List, Tuple, Optional, Union

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import interp1d
from tqdm import tqdm

然后构建4个坐标查找函数

def find_nearest_of_left_lower(
        lons: np.ndarray, 
        lats: np.ndarray, 
        lon: float, 
        lat: float, 
        existed_indices: List[int]) -> Optional[Union[int, Tuple[int, int]]]:
    """
    根据给定的经纬度坐标，找到左下方的最近点的索引。
    
    参数:
        lons (np.ndarray): 所有经度值的数组。
        lats (np.ndarray): 所有纬度值的数组。
        lon (float): 目标经度值。
        lat (float): 目标纬度值。
        existed_indices (List[int]): 已经存在的索引列表。
        
    返回:
        Optional[Union[int, Tuple[int, int]]]: 返回左下方的最近点的索引，如果找不到则返回None。
    """
    
    # 找出所有在目标点左下方的点的索引
    filter_idx = np.where((lons < lon)&(lats < lat))
    lons1 = lons[filter_idx]
    lats1 = lats[filter_idx]
    
    # 计算这些点与目标点的距离
    distance = ((lons1 - lon) ** 2 + (lats1 - lat) ** 2) ** 0.5
    
    # 对距离进行排序，并筛选在DIST_LIMIT以内并且大于0的距离
    sorted_dist = np.sort(distance)
    sorted_dist = sorted(sorted_dist[(sorted_dist <= DIST_LIMIT) & (sorted_dist > 0)])
    
    # 遍历排序后的距离，找到第一个不在existed_indices中的点的索引并返回
    if len(sorted_dist) > 0:
        for min_dist in sorted_dist:
            idx_ = np.where(distance == min_dist)
            lon_ = lons1[idx_]
            lat_ = lats1[idx_]
            idx = np.where((lons == lon_) & (lats == lat_))
            if idx[0][0] not in existed_indices:
                return idx

    return None

def find_nearest_of_right_upper(
        lons: np.ndarray, 
        lats: np.ndarray, 
        lon: float, 
        lat: float, 
        existed_indices: List[int]) -> Optional[Union[int, Tuple[int, int]]]:
    """
    根据给定的经纬度坐标，找到右上方的最近点的索引。
    
    参数:
        lons (np.ndarray): 所有经度值的数组。
        lats (np.ndarray): 所有纬度值的数组。
        lon (float): 目标经度值。
        lat (float): 目标纬度值。
        existed_indices (List[int]): 已经存在的索引列表。
        
    返回:
        Optional[Union[int, Tuple[int, int]]]: 返回右上方的最近点的索引，如果找不到则返回None。
    """
    
    # 找出所有在目标点右上方的点的索引
    filter_idx = np.where((lons > lon) & (lats > lat))
    lons1 = lons[filter_idx]
    lats1 = lats[filter_idx]
    
    # 计算这些点与目标点的距离
    distance = ((lons1 - lon) ** 2 + (lats1 - lat) ** 2) ** 0.5
    
    # 对距离进行排序，并筛选在DIST_LIMIT以内并且大于0的距离
    sorted_dist = np.sort(distance)
    sorted_dist = sorted(sorted_dist[(sorted_dist <= DIST_LIMIT) & (sorted_dist > 0)])
    
    # 遍历排序后的距离，找到第一个不在existed_indices中的点的索引并返回
    if len(sorted_dist) > 0:
        for min_dist in sorted_dist:
            idx_ = np.where(distance == min_dist)
            lon_ = lons1[idx_]
            lat_ = lats1[idx_]
            idx = np.where((lons == lon_) & (lats == lat_))
            if idx[0][0] not in existed_indices:
                return idx

    return None
    
def find_nearest_of_right_lower(
        lons: np.ndarray, 
        lats: np.ndarray, 
        lon: float, 
        lat: float) -> Optional[Union[int, Tuple[int, int]]]:
    """
    根据给定的经纬度坐标，找到右下方的最近点的索引。
    
    参数:
        lons (np.ndarray): 所有经度值的数组。
        lats (np.ndarray): 所有纬度值的数组。
        lon (float): 目标经度值。
        lat (float): 目标纬度值。
        
    返回:
        Optional[Union[int, Tuple[int, int]]]: 返回右下方的最近点的索引，如果找不到则返回None。
    """
    
    # 找出所有在目标点右下方的点的索引
    filter_idx = np.where((lons > lon) & (lats < lat))
    lons1 = lons[filter_idx]
    lats1 = lats[filter_idx]
    
    # 计算这些点与目标点的距离
    distance = ((lons1 - lon) ** 2 + (lats1 - lat) ** 2) ** 0.5
    
    # 对距离进行筛选，选出在DIST_LIMIT以内并且大于0的距离
    sorted_dist = sorted(distance[(distance <= DIST_LIMIT) & (distance > 0)])
    
    # 若存在满足条件的距离，找到距离最近的那个点的索引并返回
    if len(sorted_dist) > 0:
        idx_ = np.where(distance == sorted_dist[0])
        lon_ = lons1[idx_]
        lat_ = lats1[idx_]
        idx = np.where((lons == lon_) & (lats == lat_))
        return idx

    return None


def find_nearest_of_left_upper(
        lons: np.ndarray, 
        lats: np.ndarray, 
        lon: float, 
        lat: float) -> Optional[Union[int, Tuple[int, int]]]:
    """
    根据给定的经纬度坐标，找到左上方的最近点的索引。
    
    参数:
        lons (np.ndarray): 所有经度值的数组。
        lats (np.ndarray): 所有纬度值的数组。
        lon (float): 目标经度值。
        lat (float): 目标纬度值。
        
    返回:
        Optional[Union[int, Tuple[int, int]]]: 返回左上方的最近点的索引，如果找不到则返回None。
    """
    
    # 找出所有在目标点左上方的点的索引
    filter_idx = np.where((lons < lon) & (lats > lat))
    lons1 = lons[filter_idx]
    lats1 = lats[filter_idx]
    
    # 计算这些点与目标点的距离
    distance = ((lons1 - lon) ** 2 + (lats1 - lat) ** 2) ** 0.5
    
    # 对距离进行筛选，选出在DIST_LIMIT以内并且大于0的距离
    sorted_dist = sorted(distance[(distance <= DIST_LIMIT) & (distance > 0)])
    
    # 若存在满足条件的距离，找到距离最近的那个点的索引并返回
    if len(sorted_dist) > 0:
        idx_ = np.where(distance == sorted_dist[0])
        lon_ = lons1[idx_]
        lat_ = lats1[idx_]
        idx = np.where((lons == lon_) & (lats == lat_))
        return idx

    return None

这四个函数主要肩负着“按图索骥”的具体步骤，之所以要把四个方向都集齐，是因为为了结果的完整，我们需要从左上向右下做一次查找，再从右下向左上查找一次，最后把它们合并起来去重得到最终的结果。

需要注意的是，我们在这里对 find_nearest_of_left_lower 和 find_nearest_of_right_upper 做了一点小升级，增加了一个根据需要进行的去重的功能，传入 existed_indices 参数，可以实现让函数在本轮查找的过程中排除那些已经在上轮中查询过的点，以防止非预期的情况发生（例如查询方向改变）。

现在我们要把上一期的 seek_col_elements 函数也做一次升级。

def seek_col_elements(
        idx: Tuple[int, int], 
        lons: np.ndarray, 
        lats: np.ndarray, 
        existed_indices: List[int]) -> Tuple[List[float], List[float], Optional[Tuple[int, int]], List[int]]:
    """
    在给定的经纬度数组中，基于起始索引查找一列的元素，并返回这些元素的经纬度和索引。
    
    参数:
        idx (Tuple[int, int]): 起始点的索引。
        lons (np.ndarray): 所有经度值的数组。
        lats (np.ndarray): 所有纬度值的数组。
        existed_indices (List[int]): 已经存在的索引列表。
        
    返回:
        Tuple[List[float], List[float], Optional[Tuple[int, int]], List[int]]:
            - col_lons (List[float]): 找到的经度值列。
            - col_lats (List[float]): 找到的纬度值列。
            - next_idx (Optional[Tuple[int, int]]): 下一个可能的起始索引。
            - indices (List[int]): 找到的点的索引列。
    """
    
    first_idx = idx
    col_lons = []
    col_lats = []
    indices = []
    
    # 从起始索引开始，持续查找左下方的最近点，并添加到结果列表中
    while idx is not None:
        col_lons.append(lons[idx][0])
        col_lats.append(lats[idx][0])
        indices.append(idx[0][0])
        idx = find_nearest_of_left_lower(lons, lats, lons[idx], lats[idx], existed_indices)

    # 查找右下方的最近点，作为下一个可能的起始索引
    next_idx = find_nearest_of_right_lower(lons, lats, lons[first_idx], lats[first_idx])
    
    return col_lons, col_lats, next_idx, indices
    
def seek_col_elements2(
        idx: Tuple[int, int], 
        lons: np.ndarray, 
        lats: np.ndarray, 
        existed_indices: List[int]) -> Tuple[List[float], List[float], Optional[Tuple[int, int]], List[int]]:
    """
    在给定的经纬度数组中，基于起始索引查找一列的元素，并返回这些元素的经纬度和索引。
    与 `seek_col_elements` 相反，此函数从起始索引开始查找右上方的点。
    
    参数:
        idx (Tuple[int, int]): 起始点的索引。
        lons (np.ndarray): 所有经度值的数组。
        lats (np.ndarray): 所有纬度值的数组。
        existed_indices (List[int]): 已经存在的索引列表。
        
    返回:
        Tuple[List[float], List[float], Optional[Tuple[int, int]], List[int]]:
            - col_lons (List[float]): 找到的经度值列。
            - col_lats (List[float]): 找到的纬度值列。
            - next_idx (Optional[Tuple[int, int]]): 下一个可能的起始索引。
            - indices (List[int]): 找到的点的索引列。
    """
    
    first_idx = idx
    col_lons = []
    col_lats = []
    indices = []
    
    # 从起始索引开始，持续查找右上方的最近点，并添加到结果列表中
    while idx is not None:
        col_lons.append(lons[idx][0])
        col_lats.append(lats[idx][0])
        indices.append(idx[0][0])
        idx = find_nearest_of_right_upper(lons, lats, lons[idx], lats[idx], existed_indices)

    # 查找左上方的最近点，作为下一个可能的起始索引
    next_idx = find_nearest_of_left_upper(lons, lats, lons[first_idx], lats[first_idx])
    
    return col_lons, col_lats, next_idx, indices

在这里，我们除了 seek_col_elements 以外还引入了 seek_col_elements2 函数，二者互相为相反的查询逻辑，即seek_col_elements 是左上->右下，seek_col_elements2 是右下->左上。另外，两个行数相比上一期的那个版本的查找引入了 existed_indices 函数增加去重逻辑，同时在返回值上增加了数据索引值 indices 的返回，因为最后当我们要重塑表格的时候，需要按照新的顺序来排列原有的数据表，必须要有原始数据按坐标点的对应索引才能正确重塑。

下面，我们的一个重头戏来了，那就是对单列坐标点进行延长扩展，如果我们想要让原始坐标转为一个整齐斜向的规则矩阵，就必须将其行或列扩展补齐到满足预期规则的长度，下面就是实现一个在所在列上向前向后各延长一定长度，并返回相应坐标参数的函数：

def extend_col_coords(
        first_idx: Tuple[int, int], 
        lons: np.ndarray, 
        lats: np.ndarray, 
        existed_inidces: List[int],
        max_row: int,
        min_row: int) -> Tuple[List[float], List[float], List[int], List[float], List[float], List[int], Tuple[int, int]]:
    """
    基于已知的列坐标，向上和向下延长坐标序列，直到达到指定的最大和最小行数。
    
    参数:
        first_idx (Tuple[int, int]): 起始点的索引。
        lons (np.ndarray): 所有经度值的数组。
        lats (np.ndarray): 所有纬度值的数组。
        existed_inidces (List[int]): 已经存在的索引列表。
        max_row (int): 最大行数限制。
        min_row (int): 最小行数限制。
        
    返回:
        Tuple[List[float], List[float], List[int], List[float], List[float], List[int], Tuple[int, int]]:
            - x1s (List[float]): 向上延长的经度值列表。
            - y1s (List[float]): 向上延长的纬度值列表。
            - row1 (List[int]): 向上延长的行号列表。
            - x2s (List[float]): 向下延长的经度值列表。
            - y2s (List[float]): 向下延长的纬度值列表。
            - row2 (List[int]): 向下延长的行号列表。
            - next_idx (Tuple[int, int]): 下一个可能的起始索引。
    """
    
    col_lons, col_lats, next_idx, indices = seek_col_elements(first_idx, lons, lats, existed_inidces)
    
    # 计算经纬度值之间的差值
    deltax = np.array(col_lons)[1:] - np.array(col_lons)[:-1]
    deltay = np.array(col_lats)[1:] - np.array(col_lats)[:-1]
    
    # 计算平均差值
    deltax_mean = deltax.mean()
    deltay_mean = deltay.mean()
    
    # 使用已知的经纬度值创建一个插值函数
    interp_func = interp1d(col_lats, col_lons, fill_value='extrapolate')
    
    y1s = []
    x1s = []
    y2s = []
    x2s = []
    row1 = []
    row2 = []
    
    # 向上延长坐标序列
    i = 0
    row = len(col_lons)
    while row < max_row:
        y1 = i*deltay_mean+col_lats[-1]
        x1 = float(interp_func(y1))
        x1s.append(x1)
        y1s.append(float(y1))
        row = len(col_lons) + i
        row1.append(row)
        i += 1

    # 向下延长坐标序列
    i = 1
    row = 0
    while row > min_row:
        y2 = col_lats[0]-i*deltay_mean
        x2 = float(interp_func(y2))
        x2s.append(x2)
        y2s.append(float(y2))
        row = 0 - i
        row2.append(0 - i)
        i += 1

    return x1s, y1s, row1, x2s, y2s, row2, next_idx

当然，这是左上->右下查找模式的函数，对应地，右下->左上模式的函数：

def extend_col_coords2(
        first_idx: Tuple[int, int], 
        lons: np.ndarray, 
        lats: np.ndarray, 
        existed_inidces: List[int],
        max_row: int,
        min_row: int) -> Tuple[List[float], List[float], List[int], List[float], List[float], List[int], Tuple[int, int]]:
    """
    基于已知的列坐标，向上和向下延长坐标序列，直到达到指定的最大和最小行数。
    这是为 `seek_col_elements2` 函数定制的版本。
    
    参数:
        first_idx (Tuple[int, int]): 起始点的索引。
        lons (np.ndarray): 所有经度值的数组。
        lats (np.ndarray): 所有纬度值的数组。
        existed_inidces (List[int]): 已经存在的索引列表。
        max_row (int): 最大行数限制。
        min_row (int): 最小行数限制。
        
    返回:
        Tuple[List[float], List[float], List[int], List[float], List[float], List[int], Tuple[int, int]]:
            - x1s (List[float]): 向上延长的经度值列表。
            - y1s (List[float]): 向上延长的纬度值列表。
            - row1 (List[int]): 向上延长的行号列表。
            - x2s (List[float]): 向下延长的经度值列表。
            - y2s (List[float]): 向下延长的纬度值列表。
            - row2 (List[int]): 向下延长的行号列表。
            - next_idx (Tuple[int, int]): 下一个可能的起始索引。
    """
    
    col_lons, col_lats, next_idx, indices = seek_col_elements2(first_idx, lons, lats, existed_inidces)
    
    # 计算经纬度值之间的差值
    deltax = np.array(col_lons)[1:] - np.array(col_lons)[:-1]
    deltay = np.array(col_lats)[1:] - np.array(col_lats)[:-1]
    
    # 计算平均差值
    deltax_mean = deltax.mean()
    deltay_mean = deltay.mean()
    
    # 使用已知的经纬度值创建一个插值函数
    interp_func = interp1d(col_lats, col_lons, fill_value='extrapolate')
    
    y1s = []
    x1s = []
    y2s = []
    x2s = []
    row1 = []
    row2 = []
    
    # 向上延长坐标序列
    i = 0
    row = len(col_lons)
    while row < max_row:
        y1 = i*deltay_mean+col_lats[-1]
        x1 = float(interp_func(y1))
        x1s.append(x1)
        y1s.append(float(y1))
        row = len(col_lons) + i
        row1.append(row)
        i += 1

    # 向下延长坐标序列
    i = 1
    row = 0
    while row > min_row:
        y2 = col_lats[0]-i*deltay_mean
        x2 = float(interp_func(y2))
        x2s.append(x2)
        y2s.append(float(y2))
        row = 0 - i
        row2.append(0 - i)
        i += 1

    return x1s, y1s, row1, x2s, y2s, row2, next_idx

All right！准备工作已经完成，下面我们来真正开始坐标查找。我们还沿用上一期中提到的 DIST_LIMIT 参数值，先来按照类似上期的逻辑来查找补齐后数据的左上角点。

DIST_LIMIT = 0.011  # 相邻点距离应不超过这个数
df = pd.read_csv('./complete.csv')
df = df.sort_values(['lon', 'lat'])
lons = df['lon'].values
lats = df['lat'].values
# 查找最左上角的点
distance = ((lons - lons.min()) ** 2 + (lats - lats.max()) ** 2) ** 0.5
sorted_dist = sorted(distance)
first_idx = np.where(distance==sorted_dist[0])
first_idx
(array([184]),)

可以看到，左上角点还是之前的 184 位置，也是符合预期的。下面我们要让 extend_col_coords 函数小试牛刀，从左上到右下扩展补齐一下各斜向列的长度，让它形成一个整齐的斜向矩阵。

# 初始化next_idx为first_idx
next_idx = first_idx

# 创建一个20x20的图形
plt.figure(figsize=(20,20))

# 初始化一个字典来存储延伸后的坐标
extend1 = {}
col = 0  # 初始化列计数器

# 当还有下一个坐标点时进行循环
while next_idx is not None:
    # 通过extend_col_coords函数获取延伸后的坐标点
    x1s, y1s, row1, x2s, y2s, row2, next_idx = extend_col_coords(next_idx, lons, lats, set(), max_row=max_row, min_row=min_row)
    
    # 在图形上以绿色绘制x1s和y1s的坐标点
    plt.scatter(x1s, y1s, s=5, c='g') 
    # 在图形上以红色绘制x2s和y2s的坐标点
    plt.scatter(x2s, y2s, s=5, c='r',alpha=0.5) 
    
    # 更新extend1字典以存储当前列的坐标信息
    extend1[col] = {
        'lons': x1s + x2s,
        'lats': y1s + y2s,
        'row': row1 + row2
    }
    col += 1  # 更新列计数器

# 在图形上以灰色绘制lons和lats的原始坐标点
plt.scatter(lons, lats, s=5, c='grey',alpha=0.5)

【万事屋】一个区域斜向经纬网格的补全(下)

结果如图，可以看到该函数把左下方很好地做了“补齐”，但是在右上方似乎与原始矩阵的位置有了不少重合，这是由于我们的行查找方式不是沿着原始坐标上边界查找的。不过问题不大，我们后续会对它们进行去重处理，现在我们当务之急是构建能覆盖完整原始坐标的矩阵坐标，即使有重复也先不管。

现在我们按右下->左上的查找方式再进行一次查找。

# 查找右下角的点
distance = ((lons - lons.max()) ** 2 + (lats - lats.min()) ** 2) ** 0.5
sorted_dist = sorted(distance)
last_idx = np.where(distance==sorted_dist[0])

# 初始化next_idx为last_idx
next_idx = last_idx

# 创建一个20x20的图形
plt.figure(figsize=(20,20))

# 初始化一个变量来跟踪已经设置的坐标点
already_set = None

# 初始化一个字典来存储延伸后的坐标
extend2 = {}
col = 0  # 初始化列计数器

# 当还有下一个坐标点时进行循环
while next_idx is not None:
    # 通过extend_col_coords2函数获取延伸后的坐标点
    x1s, y1s, row1, x2s, y2s, row2, next_idx = extend_col_coords2(next_idx, lons, lats, set(), max_row=max_row, min_row=min_row)
    
    # 在图形上以绿色绘制x1s和y1s的坐标点
    plt.scatter(x1s, y1s, s=5, c='g') 
    # 在图形上以红色绘制x2s和y2s的坐标点
    plt.scatter(x2s, y2s, s=5, c='r',alpha=0.5) 
    
    # 更新extend2字典以存储当前列的坐标信息
    extend2[col] = {
        'lons': x1s + x2s,
        'lats': y1s + y2s,
        'row': row1 + row2
    }
    col += 1  # 更新列计数器

# 在图形上以灰色绘制lons和lats的原始坐标点
plt.scatter(lons, lats, s=5, c='grey',alpha=0.5)

【万事屋】一个区域斜向经纬网格的补全(下)

可以看出来第二种查找方式对右上部分进行了很好的补齐，在左下部分与原始数据有重合，而如果我们把两种查询方式查询的结果融合在一起再去重，就能构建出完整的斜向格点矩阵了。

现在我们开始进行去重工作，去重的原理其实就是查找最近点，我们取一个阈值，凡是两点距离低于该阈值的，都可被视为同一个点，并根据实际需要来选择删除哪一个点。为了提高查找效率，在这个过程中应该尽量使用 numpy 矩阵的向量运算技巧。

我们先以左上->右下查找方式的扩展点为例：

# 初始化两个列表，用于存储所有的延伸坐标点
extend1_lons = []
extend1_lats = []

# 遍历extend1字典中的每一列坐标数据
for col, line in extend1.items():
    extend1_lons_ = line['lons']
    extend1_lats_ = line['lats']

    # 合并坐标点
    extend1_lons += extend1_lons_
    extend1_lats += extend1_lats_

# 将延伸坐标点组合成一个NumPy数组
extend1_coords = np.array(list(zip(extend1_lons, extend1_lats)))
# 将原始坐标点组合成一个NumPy数组
raw_coords = np.array(list(zip(lons, lats)))

# 初始化一个列表，用于存储与原始坐标点非常接近的扩展坐标点的索引
nearby_indices = []

# 遍历每个原始坐标点
for coord in tqdm(raw_coords):
    # 计算与每个延伸坐标点的欧氏距离
    dist = np.sum((extend1_coords - coord) ** 2, axis=1) ** 0.5
    # 找到距离小于0.005的坐标点的索引
    idx = np.where((dist < 0.005)&(dist>=0))
    nearby = dist[idx]
    
    # 如果找到接近的坐标点，将其索引添加到列表中
    if len(nearby) > 0:
        nearby_indices.append(idx[0][0])

# 删除与原始坐标点非常接近的扩展坐标点
new_extend1_coords = np.delete(extend1_coords, nearby_indices, axis=0)

然后我们画出图来：

【万事屋】一个区域斜向经纬网格的补全(下)

可以看到延长点的重复部分已经完全去除，类似方法套用到右下->左上的结果：

extend2_lons = []
extend2_lats = []
for col, line in extend2.items():
    extend2_lons_ = line['lons']
    extend2_lats_ = line['lats']

    extend2_lons += extend2_lons_
    extend2_lats += extend2_lats_

extend2_coords = np.array(list(zip(extend2_lons, extend2_lats)))
nearby_indices = []
for coord in tqdm(raw_coords):
    dist = np.sum((extend2_coords - coord) ** 2, axis=1) ** 0.5
    idx = np.where((dist < 0.005)&(dist>=0))
    nearby = dist[idx]
    if len(nearby) > 0:
        nearby_indices.append(idx[0][0])
new_extend2_coords = np.delete(extend2_coords, nearby_indices, axis=0)

我们把两次延长的结果画出图来：

# 出图检查两种方案延长的坐标数据（未去重）
plt.figure(figsize=(20,20))
plt.scatter(new_extend1_coords[:,0], new_extend1_coords[:,1], s=5, c='r',alpha=0.5)
plt.scatter(new_extend2_coords[:,0], new_extend2_coords[:,1], s=5, c='r',alpha=0.5)
plt.scatter(lons, lats, s=5, c='grey',alpha=0.5)

【万事屋】一个区域斜向经纬网格的补全(下)

可以看到，两次扩展的结果互相之间也会有重合部分，那么我们在它们身上再套用一次去重逻辑：

nearby_indices = []
for coord in tqdm(new_extend1_coords):
    dist = np.sum((new_extend2_coords - coord) ** 2, axis=1) ** 0.5
    idx = np.where((dist < 0.005)&(dist>=0))
    nearby = dist[idx]
    if len(nearby) > 0:
        nearby_indices.append(idx[0][0])
new_extend2_coords = np.delete(new_extend2_coords, nearby_indices, axis=0)

现在我们再画出图来看一下：

plt.figure(figsize=(20,20))
plt.scatter(new_extend1_coords[:,0], new_extend1_coords[:,1], s=5, c='r',alpha=0.5)
plt.scatter(new_extend2_coords[:,0], new_extend2_coords[:,1], s=5, c='r',alpha=0.5)
plt.scatter(lons, lats, s=5, c='grey',alpha=0.5)

【万事屋】一个区域斜向经纬网格的补全(下)

可以看到斜向坐标矩阵基本已经构建出来了。现在我们要做的是把这些红色区域的点，按 NaN 的数据值填入原数据表中。

new_extend_coords = np.concatenate([new_extend1_coords, new_extend2_coords], axis=0)
extend_df = pd.DataFrame({'lon': new_extend_coords[:,0], 'lat': new_extend_coords[:,1], 'RASTERVALU': np.nan})
new_df = pd.concat([df, extend_df])
new_df.to_csv('./middle_result.csv', index=True)

最后，再重新套用一遍之前的左上->右下的查找逻辑，并在过程中记录斜向行列号和重新排序后的数据值。

# 查找最左上角的索引值
new_df = new_df.sort_values(['lon', 'lat'])

values = new_df['RASTERVALU'].values
lons = new_df['lon'].values
lats = new_df['lat'].values
distance = ((lons - lons.min()) ** 2 + (lats - lats.max()) ** 2) ** 0.5
sorted_dist = sorted(distance)
first_idx = np.where(distance==sorted_dist[0])

col_n = 1
existed_inidces = set()
next_idx = first_idx
columns = []
while next_idx is not None:
    col_lons, col_lats, next_idx_, indices  = seek_col_elements(next_idx, lons, lats, existed_inidces)
    columns.append({
        'col_n': [col_n] * len(indices), # 坐标所属的列号
        'lons': col_lons, 
        'lats': col_lats,
        'values': values[indices],  # 原始数据值的按新坐标顺序取值
        'row_n': list(range(len(indices)))  # 坐标所属的行号
    })
    next_idx = next_idx_
    existed_inidces |= set(indices)   # 在循环中，每轮都更新已存在索引集合，用于下一轮的去重
    col_n += 1
    
dfs = []
for column in columns:
    dfs.append(pd.DataFrame(column))

result_df = pd.concat(dfs)
result_df.sort_values(['row_n', 'col_n'], inplace=True)
result_df[['row_n', 'col_n', 'lons', 'lats', 'values']].to_csv('final.csv',index=False)

浅看一下最后的结果：

result_df

col_n lons lats values row_n
0 1 108.013114 27.134567 NaN 0
0 2 108.023316 27.134252 NaN 0
0 3 108.033278 27.133938 NaN 0
0 4 108.043403 27.133596 NaN 0
0 5 108.053517 27.133284 NaN 0
... ... ... ... ... ...
242 201 109.918037 24.873391 NaN 242
242 202 109.928063 24.872987 NaN 242
242 203 109.937678 24.872566 NaN 242
242 204 109.947661 24.872149 NaN 242
242 205 109.957554 24.871724 NaN 242
49815 rows × 5 columns

我们来看一下重建后矩阵的行和列的数量：

len(result_df[result_df['row_n']==1]) # 列数
205
len(result_df[result_df['col_n']==1]) # 行数
243

最后我们把数据构建为格点矩阵的方式来展现一下我们的成果：

lons = result_df['lons'].values
lats = result_df['lats'].values
values = result_df['values'].values

# 将铺平的一维向量转为二维格点矩阵
lats.shape = (243, 205)  
lons.shape = (243, 205)
values.shape = (243, 205)

plt.figure(figsize=(20,20))
plt.contourf(lons, lats, values)  #熟悉的 contourf 又可以用了
plt.scatter(lons, lats, s=5, c='r',alpha=0.5)
plt.savefig('./final.png', bbox_inches='tight')

【万事屋】一个区域斜向经纬网格的补全(下)

得到的是一个规整的二维矩阵数据，可以保存为常规格点的 nc 数据。

以上，示例代码有很多重复，为了避免在合并功能设计时引入更高的复杂度增加解释难度，所以没有进行合并精简。

原文始发于微信公众号（Clarmy吱声）：【万事屋】一个区域斜向经纬网格的补全(下)

左青龙
微信扫一扫

右白虎
微信扫一扫

【万事屋】一个区域斜向经纬网格的补全(下)

G.O.S.S.I.P 资源分享 2024-04-30 EXPLOIT.EDUCATION

【AI速读】深度剖析与降低OSINT脆弱性

《上飞机！(1)》

确定风险优先级的最佳方法 - 第 2 部分

VMware安装华为操作系统openEuler

手上3个offer怎么选，深信服的技服、奇安信的安全服务、绿盟的渗透工程师

面试经验分享 | 某普科技网络安全工程师

理解数据

我的妈妈

G.O.S.S.I.P 阅读推荐 2024-04-29 利用Fuzzing降低静态分析的误报

发表评论

在线咨询

微信