
How to calculate the metrics [email protected], [email protected], ROUGE and METEOR score in table 7, 8, 9? #32

Open
simba0626 opened this issue Apr 3, 2024 · 6 comments

Comments

@simba0626

Hi author,
Nice work!
When I run the evaluation code, I find that the output is a JSON file.
My question: how are the metrics in Tables 7, 8, and 9 calculated?
Could you provide the code for computing these metrics?

Thank you

ywsun

python geochat/eval/batch_geochat_grounding.py \
    --model-path /path/to/model \
    --question-file path/to/jsonl/file \
    --answer-file path/to/output/jsonl/file \
    --image_folder path/to/image/folder/

python geochat/eval/batch_geochat_referring.py \
    --model-path /path/to/model \
    --question-file path/to/jsonl/file \
    --answer-file path/to/output/jsonl/file \
    --image_folder path/to/image/folder/

@Hoteryoung

Hoteryoung commented Apr 7, 2024

I wrote a function for calculating the metrics of Table 8, and it looks good.

import json
from tqdm import tqdm

def evaluation_metrics(data_path):
    # Load the JSONL answer file (one JSON record per line).
    base = []
    with open(data_path, "r") as fp:
        for line in fp:
            base.append(json.loads(line))

    correct = 0
    incorrect = 0
    comp_correct = 0
    comp_incorrect = 0
    pre_correct = 0
    pre_incorrect = 0
    ru_correct = 0
    ru_incorrect = 0
    for answers in tqdm(base):
        gt = answers["gt"].lower()
        type_ = answers["type"]
        # Normalize the model answer: strip spaces and periods, lowercase.
        answer = answers["answer"].replace(" ", "").lower().replace(".", "")
        if gt == answer:
            correct = correct + 1
            if type_ == "comp":
                comp_correct = comp_correct + 1
            if type_ == "presence":
                pre_correct = pre_correct + 1
            if type_ == "rural_urban":
                ru_correct = ru_correct + 1
        else:
            incorrect = incorrect + 1
            if type_ == "comp":
                comp_incorrect = comp_incorrect + 1
            if type_ == "presence":
                pre_incorrect = pre_incorrect + 1
            if type_ == "rural_urban":
                ru_incorrect = ru_incorrect + 1

    print("presence_correct:", pre_correct)
    print("presence_incorrect:", pre_incorrect)
    print("presence_Total:", pre_correct + pre_incorrect)
    print("presence_Acc:", (pre_correct / (pre_correct + pre_incorrect)))
    print("-" * 100)
    print("comparison_correct:", comp_correct)
    print("comparison_incorrect:", comp_incorrect)
    print("comparison_Total:", comp_correct + comp_incorrect)
    print("comparison_Acc:", (comp_correct / (comp_correct + comp_incorrect)))
    print("-" * 100)
    if ru_correct + ru_incorrect != 0:
        print("rural_urban_correct:", ru_correct)
        print("rural_urban_incorrect:", ru_incorrect)
        print("rural_urban_Total:", ru_correct + ru_incorrect)
        print("rural_urban_Acc:", (ru_correct / (ru_correct + ru_incorrect)))
        print("-" * 100)
    print("total_correct:", correct)
    print("total_incorrect:", incorrect)
    print("total_Total:", correct + incorrect)
    print("total_Acc:", correct / (correct + incorrect))

I am also waiting for the metric calculation function of Table 7 and Table 9.

@Yting68

Yting68 commented Jun 2, 2024

I wrote a function for calculating the metrics of Table 8, and it looks good. […] I am also waiting for the metric calculation function of Table 7 and Table 9.

I am currently facing this issue. Have you implemented the metric calculations for the other tables?

@Hoteryoung

Not yet

@YizhuoQ

YizhuoQ commented Jul 10, 2024

I wrote a script for the visual grounding evaluation in Table 7, using the bounding box package BboxToolkit (https://github.com/jbwang1997/BboxToolkit/blob/master/USAGE.md). I think it is correct, but I cannot reproduce the result reported in the paper and I don't know what is wrong. The bbox_and_angle_to_polygon function is copied from geochat_demo.py.

import math
import numpy as np

def bbox_and_angle_to_polygon(x1, y1, x2, y2, a):
    # Center point of the box
    x_ctr = (x1 + x2) / 2
    y_ctr = (y1 + y2) / 2

    # Width and height
    w = abs(x2 - x1)
    h = abs(y2 - y1)

    # Rotation angle in radians
    angle_rad = math.radians(a)

    # Rotate the four corner points around the center
    cos_a = math.cos(angle_rad)
    sin_a = math.sin(angle_rad)

    x1_rot = cos_a * (-w / 2) - sin_a * (-h / 2) + x_ctr
    y1_rot = sin_a * (-w / 2) + cos_a * (-h / 2) + y_ctr

    x2_rot = cos_a * (w / 2) - sin_a * (-h / 2) + x_ctr
    y2_rot = sin_a * (w / 2) + cos_a * (-h / 2) + y_ctr

    x3_rot = cos_a * (w / 2) - sin_a * (h / 2) + x_ctr
    y3_rot = sin_a * (w / 2) + cos_a * (h / 2) + y_ctr

    x4_rot = cos_a * (-w / 2) - sin_a * (h / 2) + x_ctr
    y4_rot = sin_a * (-w / 2) + cos_a * (h / 2) + y_ctr

    # Return the polygon as (x1, y1, ..., x4, y4)
    polygon_coords = np.array((x1_rot, y1_rot, x2_rot, y2_rot, x3_rot, y3_rot, x4_rot, y4_rot))

    return polygon_coords

import os
from PIL import Image
from tqdm import tqdm
import BboxToolkit as bt

# Read the answer file output by `GeoChat/geochat/eval/batch_geochat_referring.py`,
# and save it as a list `geochat_predict` (one dict per JSONL line).
correct = 0
total_cnt = 0   # total number of ground-truth boxes
scale = 100     # assumed: GeoChat predicts box coordinates in a [0, 100]-normalized space
images_dir = '../Dataset/GeoChat/referring_images'

for predict in tqdm(geochat_predict):
    answer = predict['answer']
    answer = answer.replace("<unk>", "").replace(" ", "").strip()
    image_path = os.path.join(images_dir, predict['image_id'] + '.png')
    image = Image.open(image_path)
    width, height = image.size
    size_type = predict['type']               # not used below
    gt_bboxes = predict['ground_truth']       # list of ground-truth corner polygons
    predict_boxes = extract_bboxes(answer)    # list of predicted [x1, y1, x2, y2, angle]
    total_cnt += len(gt_bboxes)
    for i in range(len(gt_bboxes)):
        # ground truth: 4 corner points -> oriented box [cx, cy, w, h, theta]
        poly = np.array(gt_bboxes[i]).astype(np.float32).reshape(-1)
        gt_obb = bt.poly2obb(poly).reshape(1, 5)
        try:
            # prediction: rescale from the normalized space back to pixel coordinates
            pred_bbox = predict_boxes[i]
            pred_bbox[0] = pred_bbox[0] / scale * width
            pred_bbox[1] = pred_bbox[1] / scale * height
            pred_bbox[2] = pred_bbox[2] / scale * width
            pred_bbox[3] = pred_bbox[3] / scale * height
            pred_poly = bbox_and_angle_to_polygon(*pred_bbox)
            pred_obb = bt.poly2obb(pred_poly).reshape(1, 5)
            # oriented-box IoU via BboxToolkit
            iou_score = bt.geometry.bbox_overlaps(pred_obb, gt_obb)[0][0]
            if iou_score >= 0.5:
                correct += 1
        except Exception:
            # fewer predictions than ground-truth boxes, or a malformed box
            continue

dataset = 'GeoChat Bench referring'
print(f"Evaluating {dataset} ...")
print(f'Precision @ 0.5: {correct / total_cnt} \n')
  

Finally, I got a precision@0.5 of 0.22744; my test data came from the GeoChat Bench referring.jsonl, with 7,593 test samples. I am confused by the IoU result presented in the paper and don't know how to reproduce it.
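
As a sanity check that does not depend on BboxToolkit, the rotated-box IoU can also be computed directly from the two corner polygons with shapely (a minimal sketch, not the paper's evaluation code):

from shapely.geometry import Polygon

def polygon_iou(poly_a, poly_b):
    # poly_a, poly_b: flat sequences (x1, y1, x2, y2, x3, y3, x4, y4)
    a = Polygon(list(zip(poly_a[0::2], poly_a[1::2])))
    b = Polygon(list(zip(poly_b[0::2], poly_b[1::2])))
    if not a.is_valid or not b.is_valid:
        return 0.0
    union = a.union(b).area
    return a.intersection(b).area / union if union > 0 else 0.0

If this agrees with bt.geometry.bbox_overlaps on the same pair of boxes, the gap to the paper is more likely in the coordinate rescaling or the answer parsing than in the IoU computation itself.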

@simba0626
Author

I wrote a script for the visual grounding evaluation in Table 7, using the bounding box package BboxToolkit. […] Finally, I got a precision@0.5 of 0.22744 […] I don't know how to reproduce it.

How did you implement the extract_bboxes function?
Thank you.

@YizhuoQ

YizhuoQ commented Jul 23, 2024

How did you implement the extract_bboxes function? Thank you.

I implemented the extract_bboxes function as follows:

import re

def extract_bboxes(output):
    """
    Extract bounding box coordinates from the given string using regular expressions.
    :param output: string containing boxes in the format {<bx_left><by_top><bx_right><by_bottom>|<θ>}
    :return: list of bounding boxes, each as [bx_left, by_top, bx_right, by_bottom, θ]
    """
    # Match the full box pattern, including the trailing angle after the pipe symbol.
    pattern = r'\{<(-?\d+)><(-?\d+)><(-?\d+)><(-?\d+)>\|<(-?\d+)>\}'
    matches = re.findall(pattern, output)
    bboxes = []
    for match in matches:
        # Use int rather than float: the coordinates and angle are integers.
        bbox = [int(coord) for coord in match]
        bboxes.append(bbox)
    return bboxes
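
A quick check on a GeoChat-style output string (the example string and values below are made up for illustration):

print(extract_bboxes("There is a ship at {<22><40><76><89>|<45>} in the image."))
# -> [[22, 40, 76, 89, 45]]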

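For the ROUGE and METEOR scores in Table 9, which are still open in this thread, one possible starting point (not the authors' script) is the rouge_score and nltk packages. A minimal sketch, assuming each record of the answer JSONL carries the prediction in "answer" and the reference caption in "ground_truth" (both field names are assumptions):

import json
import nltk
from nltk.translate.meteor_score import meteor_score
from rouge_score import rouge_scorer

nltk.download("wordnet")   # METEOR needs WordNet data (run once)
nltk.download("omw-1.4")

def caption_metrics(data_path):
    scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)
    rouge1 = rougeL = meteor = 0.0
    n = 0
    with open(data_path, "r") as fp:
        for line in fp:
            record = json.loads(line)
            pred, ref = record["answer"], record["ground_truth"]
            scores = scorer.score(ref, pred)   # signature: score(target, prediction)
            rouge1 += scores["rouge1"].fmeasure
            rougeL += scores["rougeL"].fmeasure
            # Simple whitespace tokenization; a proper tokenizer may shift the score slightly.
            meteor += meteor_score([ref.split()], pred.split())
            n += 1
    print("ROUGE-1:", rouge1 / n)
    print("ROUGE-L:", rougeL / n)
    print("METEOR:", meteor / n)

Whether this matches the paper's numbers depends on which ROUGE variant and tokenization the authors used, which is exactly what this issue is asking about.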