A Survey of Multimodal Statistical Charts
1. Chart Classification
1.1 Survey
Title: A Survey and Approach to Chart Classification
Institution: Indian Institute of Technology
What is an infographic? An infographic is a collection of imagery, data visualizations like pie charts and bar graphs, and minimal text that gives an easy-to-understand overview of a topic. As in the example below, infographics use striking, engaging visuals to communicate information quickly and clearly.
1.2 Common Classification Datasets
UB-PMC, Chart-OCR, DocFigure (paper)
1.3 Common Chart Types
The 28 figure types defined in DocFigure: (a) Line graph, (b) Natural image, (c) Table, (d) 3D object, (e) Bar plot, (f) Scatter plot, (g) Medical image, (h) Sketch, (i) Geographic map, (j) Flow chart, (k) Heat map, (l) Mask, (m) Block diagram, (n) Venn diagram, (o) Confusion matrix, (p) Histogram, (q) Box plot, (r) Vector plot, (s) Pie chart, (t) Surface plot, (u) Algorithm, (v) Contour plot, (w) Tree diagram, (x) Bubble chart, (y) Polar plot, (z) Area chart, (A) Pareto chart and (B) Radar chart.
The 15 chart types in the UB-PMC sample
2. Chart Understanding
Representative tasks:
Chart VQA, chart captioning
Representative works:
2.1 VQA
2.1.1 DVQA (CVPR 2018)
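Title: DVQA: Understanding Data Visualizations via Question Answering
Institution: Rochester Institute of Technology, Adobe
Paper: https://arxiv.org/pdf/1801.08163.pdf
Code: https://github.com/kushalkafle/DVQA_dataset
Task: statistical chart VQA (bar charts)
Highlights: an early chart VQA work. QA pairs are built from templates and do not involve complex reasoning.
Dataset overview: one chart type (bar), 300K images, 3.4M QA pairs, and 26 templates. Both the underlying data and the bar charts are synthetic.
Our work will enable algorithms to automatically extract numeric and semantic information from vast quantities of bar charts found in scientific publications, Internet articles, business reports, and many other areas.
Three question types: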
Structure Understanding: questions about the global structure of the bar chart. Templates:
How many bars are there?
How many groups/stacks of bars are there?
How many bars are there per group?
Does the chart contain any negative values?
Are the bars horizontal?
Does the chart contain stacked bars?
Is each bar a single solid color without patterns?
Data Retrieval: questions about a specific local region of the chart. Templates:
Are the values in the chart presented in a logarithmic scale?
Are the values in the chart presented in a percentage scale?
What percentage of people prefer the object O?
What is the label of the third bar from the left?
What is the label of the first group of bars from the left?
What is the label of the second bar from the left in each group?
What element does the C color represent?
How many units of the item I were sold in the store S?
Reasoning: questions that require reasoning over multiple components of the chart. Templates:
Which algorithm has the highest accuracy?
How many items sold more than N units?
What is the difference between the largest and the smallest value in the chart?
How many algorithms have accuracies higher than N?
What is the sum of the values of L1 and L2?
Did the item I1 sold less units than I2?
How many groups of bars contain at least one bar with value greater than N?
Which item sold the most units in any store?
Which item sold the least number of units summed across all the stores?
Is the accuracy of the algorithm A1 in the dataset D1 larger than the accuracy of the algorithm A2 in the dataset D2?
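The following minimal sketch shows DVQA-style template filling for a single-row bar chart. It is not DVQA's actual generator. The function name `generate_qa`, the subset of templates, and the limit of ten bars are assumptions made for illustration. The answer convention follows the dataset's documented format: counts are written as words and values as cardinal numbers.

```python
import random
from typing import Dict, List, Tuple

ORDINALS = ["first", "second", "third", "fourth", "fifth"]
# Counts are answered in words; this sketch assumes at most ten bars
COUNT_WORDS = ["zero", "one", "two", "three", "four", "five",
               "six", "seven", "eight", "nine", "ten"]


def generate_qa(values: Dict[str, int],
                rng: random.Random) -> List[Tuple[str, str, str]]:
    """Return (question_type, question, answer) triples for one bar chart.

    `values` maps bar labels, in left-to-right order, to integer values.
    """
    labels = list(values)
    nums = list(values.values())
    qa = []
    # Structure understanding: global layout; a count, so answered in words
    qa.append(("structure", "How many bars are there?",
               COUNT_WORDS[len(labels)]))
    # Data retrieval: read one element at a sampled position
    i = rng.randrange(min(len(labels), len(ORDINALS)))
    qa.append(("data",
               f"What is the label of the {ORDINALS[i]} bar from the left?",
               labels[i]))
    # Reasoning: combine several bars (argmax, thresholded count, difference)
    qa.append(("reasoning", "Which algorithm has the highest accuracy?",
               max(labels, key=lambda lab: values[lab])))
    n = rng.choice(nums)
    above = sum(v > n for v in nums)
    qa.append(("reasoning",
               f"How many algorithms have accuracies higher than {n}?",
               COUNT_WORDS[above]))
    qa.append(("reasoning",
               "What is the difference between the largest and the "
               "smallest value in the chart?",
               str(max(nums) - min(nums))))  # a value, so a cardinal number
    return qa
```

You would call it as `generate_qa({"alpha": 3, "beta": 7, "gamma": 5}, random.Random(0))`. A complete generator would also randomize the chart's appearance and record `bbox_answer` whenever the answer appears as text in the chart.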
Dataset download: https://github.com/kushalkafle/DVQA_dataset
QA annotation format:
image: The image filename to which the given question-answer pair applies
question: The question
answer: The answer to the question. Note that cardinal numbers (1, 2, 3, ...) are used when the number denotes a value, and words (one, two, three, ...) are used when it denotes a count.
question_type: Whether the question is of the structure, data, or reasoning type
bbox_answer: If the answer is a text in the bar chart, its bounding box in the form [x, y, w, h]; otherwise []
question_id: Unique question_id associated with the question
Metadata format:
image: The image filename to which the given metadata applies
bars:
    bboxes: Bounding boxes for the different bars (number_of_bars x number_of_legends x 4)
    names: Name of each bar (number_of_bars x number_of_legends)
    colors: Color of each bar (number_of_bars x number_of_legends)
texts:
    text: The string of the text block in the bar chart
    text_function: The function of the text (e.g., title, legend)
    bbox: The bounding box surrounding the text block
table: The underlying table used to create the chart, saved in the following format.
single-row charts:
    C_1  C_2  C_3  ...  C_N
    -----------------------
    V_1  V_2  V_3  ...  V_N
multi-row charts:
    None | C_1   C_2   C_3   ...  C_N
    -----|---------------------------
    R_1  | V_11  V_21  V_31  ...  V_N1
    R_2  | V_12  V_22  V_32  ...  V_N2
    ...  | ...   ...   ...   ...  ...
    R_M  | V_1M  V_2M  V_3M  ...  V_NM
2.1.2 PlotQA (2019)
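Two small helpers show how the documented annotation fields might be consumed. Both are sketches built on assumptions, and both names (`xywh_to_xyxy` and `parse_multirow_table`) are hypothetical.

- `xywh_to_xyxy` only reinterprets the [x, y, w, h] convention of `bbox_answer`.
- `parse_multirow_table` assumes the multi-row `table` layout is stored as plain text, with whitespace-separated single-token labels and numeric values. The released files may store the table differently.

```python
from typing import Dict, List


def xywh_to_xyxy(box: List[float]) -> List[float]:
    """Convert an [x, y, w, h] box (bbox_answer style) to [x1, y1, x2, y2]."""
    if not box:  # non-text answers carry an empty list
        return []
    x, y, w, h = box
    return [x, y, x + w, y + h]


def parse_multirow_table(text: str) -> Dict[str, Dict[str, float]]:
    """Parse the multi-row layout into {row_label: {column: value}}.

    Assumes a header 'None | C_1 ... C_N', a separator made only of
    '-', '|' and spaces, and rows 'R_j | V_1j ... V_Nj'.
    """
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    header, *rest = lines
    columns = header.split("|", 1)[1].split()
    table: Dict[str, Dict[str, float]] = {}
    for line in rest:
        if set(line) <= set("-| "):  # skip the dashed separator line
            continue
        label, values = line.split("|", 1)
        nums = [float(v) for v in values.split()]
        table[label.strip()] = dict(zip(columns, nums))
    return table
```

For example, parsing `"None | acc loss\n-----|---------\ntrain | 0.9 0.1\ntest | 0.8 0.3"` yields one dictionary per row, keyed by column name.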
Title: PlotQA: Reasoning over Scientific Plots
Institution: Indian Institute of Technology
Paper: https://arxiv.org/pdf/1909.00997.pdf
Code: https://github.com/NiteshMethani/PlotQA
Task: chart VQA
Highlights: compared with FigureQA and DVQA, the data comes from real-world sources and the numeric values span a much wider range (0 to 3.50e15).
Dataset overview: three chart types (bar plots, line plots, and scatter plots), 224K images, 28M QA pairs, and 76 templates. The data is real, but the plots themselves are generated.
we provide bounding box annotations for legend boxes, legend names, legend markers, axes titles, axes ticks, bars, lines, and title.
Randomized variations (a form of data augmentation):
To ensure variety in the plots, we randomly chose the following parameters: grid lines (present/absent), font size, notation used for tick labels (scientific-E notation or standard notation), line style (solid, dashed, dotted, dash-dot), marker styles for marking data points (asterisk, circle, diamond, square, triangle, inverted triangle), position of legends (bottom-left, bottom-centre, bottom-right, center-right, top-right), and colors for the lines and bars from a set of 73 colors. The number of discrete elements on the x-axis varies from 2 to 12 and the number of entries in the legendbox varies from 1 to 4.
This approach of creating questions on real-world plot data with carefully curated question templates followed by manual paraphrasing is a key contribution of our work.
2.1.3 ChartQA (2022)
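PlotQA's style randomization comes down to sampling one configuration per plot. The sketch below encodes the listed options using matplotlib's line-style, marker, and legend-location codes. Several details are assumptions rather than the paper's specification:

- the `PlotStyle` and `sample_style` names;
- the 8 to 14 font-size range;
- the five-color `PALETTE`, which only stands in for the paper's 73 colors.

The block leaves out rendering so that it runs without matplotlib.

```python
import random
from dataclasses import dataclass
from typing import List

# matplotlib codes for the options listed in the PlotQA paper
LINE_STYLES = ["-", "--", ":", "-."]      # solid, dashed, dotted, dash-dot
MARKERS = ["*", "o", "D", "s", "^", "v"]  # asterisk, circle, diamond,
                                          # square, triangle, inverted triangle
LEGEND_LOCS = ["lower left", "lower center", "lower right",
               "center right", "upper right"]
# Hypothetical stand-in for PlotQA's 73-color palette (not the real list)
PALETTE = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728", "#9467bd"]


@dataclass
class PlotStyle:
    grid: bool
    font_size: int
    sci_ticks: bool  # scientific-E vs standard tick notation
    line_style: str
    marker: str
    legend_loc: str
    colors: List[str]  # one color per plotted series
    n_x_elements: int
    n_legend_entries: int


def sample_style(rng: random.Random) -> PlotStyle:
    n_legend = rng.randint(1, 4)  # 1 to 4 legend entries
    return PlotStyle(
        grid=rng.random() < 0.5,
        font_size=rng.randint(8, 14),  # assumed range, not from the paper
        sci_ticks=rng.random() < 0.5,
        line_style=rng.choice(LINE_STYLES),
        marker=rng.choice(MARKERS),
        legend_loc=rng.choice(LEGEND_LOCS),
        colors=rng.sample(PALETTE, n_legend),  # distinct colors per series
        n_x_elements=rng.randint(2, 12),  # 2 to 12 discrete x-axis elements
        n_legend_entries=n_legend,
    )
```

Each sampled style could then be passed to matplotlib, for example `ax.plot(..., linestyle=s.line_style, marker=s.marker)` followed by `ax.legend(loc=s.legend_loc)`.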
Title: ChartQA: A Benchmark for Question Answering about Charts with Visual and Logical Reasoning
Institution: York University, Nanyang Technological University, Salesforce
Paper: https://arxiv.org/pdf/2203.10244.pdf
Code: https://github.com/vis-nlp/ChartQA
Task: chart VQA
Highlights: three chart types, 21.9K images, and 32.7K QA pairs (9.6K human-written, 23.1K generated). The charts are real-world charts from a web crawl.
To address the unique challenges in our benchmark involving visual and logical reasoning over charts
Answering such questions requires a significant amount of perceptual and cognitive efforts as people need to combine multiple operations such as retrieving values, comparing values, finding maximum, calculating sums and differences of values.
The paper identifies the main problems with existing datasets:
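Questions are template-based. Charts are generated with plotting tools such as matplotlib, so they do not reflect the diversity of real-world charts. Answers come from a fixed vocabulary. Finally, existing datasets overlook the fact that many real questions involve complex reasoning with numerical operations such as aggregation and comparison.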
Charts are crawled from multiple sources:
Statista (statista.com) is an online platform that presents charts covering a variety of topics including economy, politics, and industry.
The Pew research (pewresearch.org) publishes report about social and economic issues, demographic trends and public opinion with a wide variety of charts.
Our World In Data or OWID (ourworldindata.org) is another platform that contains thousands of charts about different global issues such as economy, finance, and society.
Organisation for Economic Co-operation and Development or OECD (oecd.org) is a global organization which shares reports and data analysis for policymaking.
For the Pew dataset, we only crawled chart images since the underlying data tables are not available. For the other three, we extracted the underlying data tables, metadata (e.g., title, chart type), SVG file and associate text description. Finally, we extracted the bounding boxes information of the different chart elements (e.g., x-axis labels) from the SVG files to train our data extraction models.
QA pairs are annotated in two ways:
We have two main annotations procedures: (i) collect human-authored QA pairs using Amazon Mechanical Turk (AMT) and (ii) generate QA pairs from the Statista human-written summaries.
Human annotation focuses on two kinds of questions:
Compositional questions contain at least two mathematical/logical operations like sum, difference and average.
Visual questions refer to the visual attributes such as color, height, and length of graphical marks (e.g., bars) in the chart.
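With these two question types in mind, one annotator writes two questions and their answers, and a second annotator answers the same questions. If the two answers match, the QA pair is accepted; otherwise it is reviewed. Exact matches account for 61.04% of the pairs. If typos and differences in notation are ignored, agreement rises to 78.55%.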
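The two agreement figures correspond to exact string matching and a more lenient match. The minimal sketch below uses a hypothetical `normalize` rule set that ignores case, whitespace, thousands separators, "%" signs, a trailing period, and numeric formatting (e.g. 12 vs 12.0). It does not try to forgive genuine typos, which the 78.55% figure also does, so it only illustrates the idea.

```python
import re
from typing import List, Tuple


def normalize(ans: str) -> str:
    """Hypothetical normalization of a free-form answer before comparison."""
    a = ans.lower().replace(",", "").replace("%", "")
    a = re.sub(r"\s+", " ", a).strip().rstrip(".")  # collapse whitespace
    try:
        return str(float(a))  # "12", "12.0" and "12%" all become "12.0"
    except ValueError:
        return a


def agreement(pairs: List[Tuple[str, str]]) -> Tuple[float, float]:
    """Return (exact-match rate, normalized-match rate) over answer pairs."""
    exact = sum(a == b for a, b in pairs) / len(pairs)
    lenient = sum(normalize(a) == normalize(b) for a, b in pairs) / len(pairs)
    return exact, lenient
```

For the pairs `[("12", "12"), ("12%", "12"), ("Yes", "yes"), ("Red", "blue")]`, this gives 0.25 exact agreement and 0.75 lenient agreement.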
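Machine-generated QA pairs come from a T5 model that takes the chart summary as input. Only cases whose answers can be read directly from the chart are kept; cases that require commonsense knowledge are discarded.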
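For the non-Pew sources, the ChartQA authors extracted bounding boxes of chart elements from the crawled SVG files. The minimal sketch below does this with Python's standard `xml.etree.ElementTree`. It assumes that bars are plain `<rect>` elements and labels are `<text>` elements with absolute `x`/`y` attributes. Real SVG exports often nest `transform` attributes, which this sketch ignores. SVG `<text>` has no size attribute, so only the anchor point is returned; a full text box would need font metrics or a rendered layout. The function name `svg_elements` is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

SVG = "{http://www.w3.org/2000/svg}"  # namespace prefix used by ElementTree


def svg_elements(svg_text: str) -> Tuple[List[Dict], List[Dict]]:
    """Return (rect boxes as [x, y, w, h], text anchors) from an SVG chart.

    Assumes absolute coordinates: `transform` attributes are ignored.
    """
    root = ET.fromstring(svg_text)
    rects = []
    for r in root.iter(f"{SVG}rect"):
        rects.append({
            "cls": r.get("class", ""),
            "bbox": [float(r.get(k, 0)) for k in ("x", "y", "width", "height")],
        })
    texts = []
    for t in root.iter(f"{SVG}text"):
        texts.append({
            "text": "".join(t.itertext()).strip(),  # includes nested <tspan>s
            "x": float(t.get("x", 0)),
            "y": float(t.get("y", 0)),
        })
    return rects, texts
```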
Model architecture used in ChartQA
Some visualization results
2.2 Summary
2.2.1 Chart-to-Text (ACL 2022)
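Title: Chart-to-Text: A Large-Scale Benchmark for Chart Summarization
Institution: York University, Nanyang Technological University, Salesforce
Paper: https://aclanthology.org/2022.acl-long.277.pdf
Code: https://github.com/vis-nlp/chart-to-text
Task: chart summarization
Highlights: six chart types, 44K images, and 44K chart-summary pairs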
Two settings:
(1) the underlying data table is available; (2) the data must be extracted directly from the chart image.
An example summary
Data collection
As in ChartQA, charts are crawled from two third-party websites:
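https://www.statista.com/: for each chart, the authors collected the chart image and the underlying data table, along with the title, axis labels, and human-written description. Charts are split into two groups. Simple charts have only two columns. Complex charts include stacked or grouped bars and line charts with multiple lines. In total, 34,811 chart images were collected in December 2020.
https://www.pewresearch.org/: this site publishes data-driven articles focusing on social issues, public opinion, and demographic trends. Articles usually contain several charts accompanied by high-quality descriptions written by experts or editors. The authors crawled 3,999 web pages (January 2021) and obtained 9,285 charts. Unlike Statista, most Pew charts do not come with the underlying data table. For each chart, the authors downloaded the chart image, the surrounding paragraphs, and the alt attribute (if available). Like the title, the alt text usually gives a relatively concise description. Because the underlying data tables are unavailable, the charts were manually labeled as simple or complex.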
Data annotation
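For Statista, the first part of the text (from the chart icon to the next heading) is taken as the chart summary. Data extraction from this source is relatively easy because the underlying data table is provided. However, most charts (32,660 of 34,811) lack x-axis labels, so appropriate x-axis names were added by hand. Pew is harder to annotate: each web page contains several charts, and the paragraphs do not explicitly refer to the corresponding chart. Most charts also lack the underlying data table. To handle these challenges, the dataset is built in three steps:
(i) Data extraction from the charts. Using OCR output and detected bounding boxes, the authors annotated a small set of examples (319: 171 bar, 68 line, and 80 pie charts). They split these into training, validation, and test sets and trained a classification model, which reached 95.0% overall accuracy and 97.6% accuracy on titles.
(ii) Identifying candidate paragraphs.
(iii) Selecting the relevant paragraphs.
Author's note: this construction process shows that chart-to-text is not about converting a chart into an exact, table-like representation but into a natural-language description. The priority is capturing the key points. Not every value or trend needs to be covered, as long as the description is organized the way people naturally read charts.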
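Steps (ii) and (iii) for Pew are only named here. As a rough illustration of what matching a chart to its paragraph involves, the sketch below scores each candidate paragraph by the fraction of the chart's OCR tokens (title, axis labels) that it contains. It keeps the best paragraph if that score clears a threshold. This token-overlap heuristic, the function names, and the 0.3 threshold are assumptions for illustration; Chart-to-Text does not necessarily use this procedure.

```python
import re
from typing import List, Optional, Set


def tokens(text: str) -> Set[str]:
    """Lowercased alphanumeric tokens, e.g. '2005-2021' -> {'2005', '2021'}."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def best_paragraph(chart_text: str, paragraphs: List[str],
                   threshold: float = 0.3) -> Optional[int]:
    """Index of the paragraph covering most chart tokens, or None if too weak."""
    chart_tok = tokens(chart_text)
    if not chart_tok or not paragraphs:
        return None
    # Overlap relative to the chart: |chart ∩ paragraph| / |chart|
    scores = [len(chart_tok & tokens(p)) / len(chart_tok) for p in paragraphs]
    best = max(range(len(scores)), key=scores.__getitem__)  # first max on ties
    return best if scores[best] >= threshold else None
```

The score is normalized by the chart's token count rather than by the union. A short title compared against a long paragraph would otherwise always score near zero.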
The paper ends with an error analysis that identifies the following main error patterns:
Perceptual and reasoning aspects
Hallucinations
Factual errors
Computer vision challenges
Generalizability
3. Chart Generation
4. Unified Chart Models
4.1 UniChart (2023)
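Title: UniChart: A Universal Vision-language Pretrained Model for Chart Comprehension and Reasoning
Institution: York University, Nanyang Technological University, Salesforce
Paper: https://arxiv.org/pdf/2305.14761.pdf
Code: https://github.com/vis-nlp/unichart
Task: chart pretraining and a unified chart model
Highlights: three chart types, 627K images, and 7M pairs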