CogVLM2 智谱开源多模态大模型
类别: 视觉问答 CogVLM 标签: CogVLM2 OCR Llama3 LLM目录
CogVLM2
- 在许多关键指标上有了显著提升,例如
TextVQA
,DocVQA
。 - 支持 8K 文本长度。
- 支持高达 1344 * 1344 的图像分辨率。
- 提供支持中英文双语的开源模型版本。
您可以在下表中看到 CogVLM2 系列开源模型的详细信息:
模型名称 | cogvlm2-llama3-chat-19B | cogvlm2-llama3-chinese-chat-19B |
---|---|---|
基座模型 | Meta-Llama-3-8B-Instruct | Meta-Llama-3-8B-Instruct |
语言 | 英文 | 中文、英文 |
模型大小 | 19B | 19B |
任务 | 图像理解,对话模型 | 图像理解,对话模型 |
模型链接 | 🤗 Huggingface 🤖 ModelScope 💫 Wise Model | 🤗 Huggingface 🤖 ModelScope 💫 Wise Model |
体验链接 | 📙 Official Page | 📙 Official Page 🤖 ModelScope |
Int4模型 | 暂未推出 | 暂未推出 |
文本长度 | 8K | 8K |
图片分辨率 | 1344 * 1344 | 1344 * 1344 |
总结
- 能力非常强大 👍
- OCR 已经成为基础能力。包含印刷、手写、中文、英文。
- 图像描述。
- 基于图像问答。
- 信息提取。包含保单、车牌、火车票、手机充值。
- 表格识别。包含复杂表格。
- 文字识别。包含中文、英文。
- 作文识别。
- 电网领域的缺陷识别。
- 图像生成代码。
- 调提示词真不容易,要尽可能描述清楚,如:
红色数字
;手写英文
。
文字识别
识别中文
提示词
:输出图像中的文字
贵公司被确认为中标人。中标价格为:307.6万元。
请贵公司在收到本中标通知书之日起30天内,携带所有签订合同所需的资料(包括但不限于法定代表人授权书、技术规范、技术图纸等),并按照招标文件和中标人的投标文件与项目单位订立书面合同。合同签订的安排由项目单位另行通知。
请贵公司收到本中标通知书后,签收并速回函确认。
识别手写英文
提示词
:对图像文字进行识别
I'm here to help you with any information you need. How can I assist you with the text in the image you've provided?
提示词
:这张图片上写了什么?
图片上是一段手写的英文文本,内容是一位用户对于学生是否需要参加课后班的看法。
提示词
:图像中的手写英文是什么?
图像中的手写英文是一篇关于学生课后活动的文章。
👍 提示词
:识别图像中的手写英文
I think student have many after-school classes is don’t good for they. So I thing the student don’t have after-school classes. On school, student have many classes, many home work and compate with classmate. And after school, they are 7:00 p.m go home and they can do housework and do homework. It is so tired, but if they have many after-school classes, They don’t go home & quickly and they don’t have enough sleep, so that they don’t good for go to school. So after-school classes is don’t good for student. Student don’t have many after-school classes. Student have after-school classes for a student, but student have many after-school classes, don’t good for student. Student don’t have many after-school classes.
识别手写英文作文
提示词
:识别出上面的英语作文
My school life
Happiness in life is about moving forward and growing up. The past three years taught me that there is much to cherish in my school life.
Three years ago, my English was so bad, and every class like a bad dream, in English class I always be silent silence because of my poor English pronunciation. But one day I make a friend his name is H. He was good at English and liked helping other, then he help me to study English and make my English get good.
All in all, this relationship not only improve my English but also make me learn a lot.
很好地遵循了指令,只识别了作文,没有多余的信息。
这里它做了一些小的错误纠正:
Three years age
改为Three years ago
This Relationship
改为this relationship
对于分隔不清晰的单词,它会自动识别并分隔,Allinall,thisrelationshipnotonlyimprovemyEnglishbutalsomakemelearnalot
识别为 All in all, this relationship not only improve my English but also make me learn a lot.
。
表格
复杂表格
提示词
:将表格的文字识别转换为markdown格式输出
非常规表格样式
提示词
:将表格的文字识别转换为markdown格式输出
电网领域的表格识别
简单表格
提示词
:将表格转换为 markdown 格式
复杂表格
提示词
:将表格转换为 markdown 格式
保单信息提取
提示词
:识别保单上面的信息:被保险人名称、被保险人身份证号、被保险人地址,使用 json 格式输出。
{
"被保险人名称": "张垚",
"被保险人身份证号": "310109199305244011",
"被保险人地址": "中国上海市"
}
手机充值
提示词
:读取图像中的充值号码,商品总价,订单编号,下单时间,充值金额。
好的,根据图片内容,我可以回答您的问题。
充值号码:15552561555
商品总价:¥199.80
订单编号:RP1409132301302034086454
下单时间:2023-01-30 20:34:08
充值金额:200元
火车票
提示词
:识别火车票上的信息
好的,根据图片信息,以下是火车票上的详细信息:
车次: K1348
始发站: 广州站(Guangzhou)
到达站: 西安站(Xi'an)
日期: 2019年08月27日10:08开
座位类型: 新空调硬座
金额: ¥236.0元
限乘当日当次车
车厢号: 07车048号
购票号码: 6101242005****0934
提示: 买票请到12306,发货请到95306
温馨提示: 中国铁路祝您旅途愉快
验证码: 23693310450828E096687
请注意,验证码是用于验证购票信息的重要部分,您可以将其提供给我以便我为您提供更准确的服务。如果您有其他问题或需要进一步的帮助,请随时告诉我。
提示词
:识别火车票上的信息:车次、始发站、到达站、日期、车厢号、金额、姓名。
好的,根据图片信息,车次为K1348,从广州站出发,目的地为西安站,日期为2019年08月27日10:08开,车厢号为07车048号,金额为¥236.0元,座位类型为新空调硬座,乘客姓名为张润宇。购买这张票的提示为:买票请到12306,发货请到95306,中国铁路祝您旅途愉快。车票号为23693310450828E096687。
提示词
:读取图像中的车次、始发站、到达站、日期、车厢号、金额、姓名。
车次:K1348
始发站:广州站
到达站:西安站
日期:2019年08月27日10:08开
车厢号:07车048号
金额:¥236.0元
姓名:张润宇
购买提示:买票请到12306 发货请到95306
中国铁路祝您旅途愉快
座位类型:新空调硬座
限乘当日当次车
6101242005****0934
23693310450828E096687广州售
提示词
:读取图像中的车次、始发站、到达站、日期、车厢号、金额、姓名。不需要返回其它信息。
车次:K1348
始发站:广州站
到达站:西安站
日期:2019年08月27日10:08开
车厢号:07车048号
金额:¥236.0元
姓名:张润宇
购买提示:买票请到12306,发货请到95306
中国铁路祝您旅途愉快
提示词
:读取图像中的姓名、车次、始发站、到达站、日期、车厢号、金额。
姓名:张润宇
车次:K1348
始发站:广州站
到达站:西安站
日期:2019年08月27日10:08开
车厢号:07车048号
金额:¥236.0元
座位类型:新空调硬座
限乘当日当次车
6101242005****0934
提示词
:读取图像中的火车票信息,返回姓名、车次、始发站、到达站、日期、车厢号、金额。
姓名:张润宇
车次:K1348
始发站:广州站
到达站:西安站
日期:2019年08月27日10:08开
车厢号:07车048号
金额:¥236.0元
座位类型:新空调硬座
限乘当日当次车
6101242005****0934
👍 提示词
:读取图像中的火车票信息,返回姓名、车次、始发站、到达站、日期、车厢号、金额。使用 json 格式输出。
{
"姓名": "张润宇",
"车次": "K1348",
"始发站": "广州站",
"到达站": "西安站",
"日期": "2019年08月27日10:08开",
"车厢号": "07车048号",
"金额": "¥236.0元"
}
车牌识别
提示词
:识别车牌
车牌号是“豫U·88888”。
👍 提示词
:使用 json 格式输出车牌、汽车厂商、型号
{
"车牌": "豫U·88888",
"汽车厂商": "奥迪",
"型号": "A8L"
}
化学
提示词
:文字识别
好的,我将为您回答问题。请您随时告诉我您需要帮助的地方。
1.单项选择题答案是:1-5 DB DAB 6-10BBCAD
2.多项选择题答案是:11.CD 12.ABC 13.BD 14.ACD 15.BC
3.非选择题答案是:
16.(1)①H2 ②SO2 ③Cu(OH)2 ④ZnCl2
(2)①分子在不断运动; 4:5 ②56 ③无机非金属 ④稳定
17.(1)B (2)Cl2+H2O= HCl+HClO (3)C (4)C (5)①t1 ②不饱和 4:5
③小于 ④KNO3的溶解度受温度变化影响明显,且随温度的升高而增大
18.(1)①干燥二氧化碳 ②CaCO3+2HCl=H2O+CO2↑+CaCl2
(2)①BCD ②2H2O2=2H2O+O2↑ ③带火星 ④偏大
19.(1)Fe ②D
(2)①NH3 ②K2CO3 ③K+、Na+ ④K2SO4+Ba(NO3)2=BaSO4↓+2KNO3
20.(1)①Na2CO3 ②饱和氨盐水呈碱性,更易吸收二氧化碳
(2)①Cl-、SO42- ②14.2 ③不饱正溶液质量:25.9g+200g+208g-23.3g=410.6g
提示词
:识别图像中的化学答案,化学符号采用 LaTeX 公式表示
输出同上
数学
提示词
:答题
👍 1、6的算术平方根有两个值,一个是正数,一个是负数。正数值是6,负数值是-6。所以正确答案是C。√6表示6的平方根,是一个正数,约等于2.449489742783178。因此,选项D是错误的。
❌ 2、水星的半径约为24400000米,用科学记数法表示为2.44×10^8米。所以正确答案是A。0.244×10^8表示0.244乘以10的8次方,约等于2.44×10^6米,而2.44×10^7表示2.44乘以10的7次方,约等于24.4×10^6米,D选项是错误的。
英语
提示词
:下面空白处应该填写什么单词
- 好像直接答题不理想 ❌
The underlined parts of the passage suggest that the missing words are present simple tense verbs that describe the actions taking place. Here's a step-by-step guide to finding them:
Look for verbs that describe the actions of the subjects (the narrator, the aunt, and the cousin). These actions include talking, flying, sitting, and looking up.
Consider the context of the sentences. For example, in "While my aunt was talking with my mom at home," the verb talking is present simple because it is an ongoing action.
Identify the verbs that fit the context and the sentence structure. For instance, "We bought two kites and started flying them" implies the action of buying and flying happened simultaneously, so a present simple verb like bought and flying would be appropriate.
Check for verbs that express continuation or progression. For example, "As we were sitting there," the verb sitting is present simple because it is an ongoing action, but it's followed by "Alice looked up at me and said," which indicates a pause or a new action, so the next verb should also be present simple.
Following this process, we can fill in the blanks as follows:
"My aunt came to my home" should be "came" (present simple).
"While my aunt was talking with my mom at home" should be "talking" (present simple).
"my cousin felt a bit boring" should be "bored" (present simple).
"We bought two kites and started flying them" should be "bought" and "flying" (present simple).
"As we were sitting there" should be "sitting" (present simple).
"Alice looked up at me and said" should be "looked up" (past simple, since it's an action directed towards the narrator).
"What a great day today!" should be "said" (past simple, since it's the narrator's reaction to Alice's statement).
With these corrections, the passage should read coherently:
Last Sunday, I had a perfect day! My aunt came to my home. She was talking with my mom at home when my cousin felt a bit bored. So I took Alice to the park near my home. We bought two kites and started flying them. While we were doing that, Alice jumped up and down, and I knew she was truly having great fun. Later, we sat down on a bench and looked at the gardens. As we were sitting there, Alice looked up at me and said, "What a great day today!" I smiled at her and agreed. We did something quite common, yet we were very happy. Many people wait for their "best days." But I think today is our perfect day. As adults, we often make plans carefully in order to have a great day. Some people think that making a lot of money will make them have a nice day. I think we should be more like children. They don't think about money. What will happen tomorrow. They really live in the moment and enjoy the moment! We often hear the expression that we should live in the present. Yesterday is a history. Tomorrow is a mystery. Today is a gift. That's why it's call present. So seize the day!
提示词
:对图像的文字进行识别
The image text is as follows:
VI. 阅读下面短文,在空白处填入1个适当的单词或括号内单词的正确形式.(共 10小题;每小 题1.5分,满分15分)
Last Sunday, I had a perfect day! My aunt came to my home \underline{66} her six-year-old
daughter Alice in the morning. While my aunt was talking with my mom at home, my cousin felt a
bit \underline{67} (boring). So I took Alice to the park near my home. There we bought two kites and
started flying \underline{68} (they). While we were doing that, Alice jumped up and down and I knew
she was truly having great fun. Later, we \underline{69} (sit) down on a bench and looked at the gardens.
As we were sitting there, Alice looked up at me and said, "What \underline{70} great day today!I
smiled at her and agreed. We did something quite common, yet we were very happy.
Many people wait for their "best days". But I think today is our perfect day. As adults, we often
make plans \underline{71} (careful) in order to have a great day. Some people think that making a lot of
money \underline{72} (make) them have a nice day. I think we should be more like \underline{73} (child).
They don't think about money \underline{74} what will happen tomorrow. They really live in the moment
and enjoy the moment!
We often hear the expression that we should live in the present. Yesterday is a history.
Tomorrow is a mystery. Today is a gift. That's why it \underline{75} (call) present. So seize the day!
通过 CogVLM2 对题进行文字识别,然后使用 Kimi 进行填空。经验证全部正确 👍。
提示词
:
答下面的题
VI. 阅读下面短文,在空白处填入1个适当的单词或括号内单词的正确形式.(共 10小题;每小 题1.5分,满分15分)
Last Sunday, I had a perfect day! My aunt came to my home \underline{66} her six-year-old
daughter Alice in the morning. While my aunt was talking with my mom at home, my cousin felt a
bit \underline{67} (boring). So I took Alice to the park near my home. There we bought two kites and
started flying \underline{68} (they). While we were doing that, Alice jumped up and down and I knew
she was truly having great fun. Later, we \underline{69} (sit) down on a bench and looked at the gardens.
As we were sitting there, Alice looked up at me and said, "What \underline{70} great day today!I
smiled at her and agreed. We did something quite common, yet we were very happy.
Many people wait for their "best days". But I think today is our perfect day. As adults, we often
make plans \underline{71} (careful) in order to have a great day. Some people think that making a lot of
money \underline{72} (make) them have a nice day. I think we should be more like \underline{73} (child).
They don't think about money \underline{74} what will happen tomorrow. They really live in the moment
and enjoy the moment!
We often hear the expression that we should live in the present. Yesterday is a history.
Tomorrow is a mystery. Today is a gift. That's why it \underline{75} (call) present. So seize the day!
66. with
67. bored
68. them
69. sat
70. a
71. carefully
72. will make
73. children
74. or
75. is called
解释如下:
66. 根据句子结构,这里需要一个介词来表示“带着”,所以用 "with"。
67. 根据句子结构,这里需要一个形容词来修饰 "a bit",所以用 "bored"。
68. 根据句子结构,这里需要一个代词来指代 "two kites",所以用 "them"。
69. 根据句子结构,这里需要一个动词的过去式来表示 "我们坐下来",所以用 "sat"。
70. 根据句子结构,这里需要一个冠词来修饰 "great day",所以用 "a"。
71. 根据句子结构,这里需要一个副词来修饰 "make plans",所以用 "carefully"。
72. 根据句子结构,这里需要一个动词的将来时态来表示 "赚钱会让他们过上美好的一天",所以用 "will make"。
73. 根据句子结构,这里需要一个名词的复数形式来表示 "像孩子们一样",所以用 "children"。
74. 根据句子结构,这里需要一个连词来连接 "money" 和 "what will happen tomorrow",所以用 "or"。
75. 根据句子结构,这里需要一个动词的被动语态来表示 "这就是为什么它被称为 '现在'",所以用 "is called"。
图像生成代码
提示词
:将图像转换成HTML代码
- 没有布局,左边菜单没有识别。
- 搜索框也没有识别。
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Launch Model</title>
<link rel="stylesheet" href="styles.css">
</head>
<body>
<header>
<div class="logo">Xinference</div>
<h1>Launch Model</h1>
<nav>
<ul>
<li><a href="#">Language Models</a></li>
<li><a href="#">Embedding Models</a></li>
<li><a href="#">Rank Models</a></li>
<li><a href="#">Image Models</a></li>
<li><a href="#">Audio Models</li>
<li><a href="#">Custom Models</a></li>
</ul>
</nav>
</header>
<section class="models">
<div class="model-card">
<h3>aquila2</h3>
<p>Aquila2 series models are the base language models</p>
<p>2K context length</p>
<p>generate model</p>
<div class="icon"></div>
</div>
<div class="model-card">
<h3>aquila2-chat</h3>
<p>Aquila2-chat series models are the chat models</p>
<p>2K context length</p>
<p>chat model</p>
<div class="icon"></div>
</div>
<div class="model-card">
<h3>aquila2-chat-16k</h3>
<p>AquaChat2-16k series models are the long-text chat models</p>
<p>16K context length</p>
<p>chat model</p>
<div class="icon"></div>
</div>
<div class="model-card">
<h3>baichuan</h3>
<p>Baichuan is an open-source Transformer based LLM that is trained on both Chinese and English data.</p>
<p>4K context length</p>
<p>generate model</p>
<div class="icon"></div>
</div>
<div class="model-card">
<h3>baichuan-2</h3>
<p>Baichuan-2 is an open-source Transformer based LLM that is trained on both Chinese and English data.</p>
<p>4K context length</p>
<p>generate model</p>
<div class="icon"></div>
</div>
<div class="model-card">
<h3>baichuan-2-chat</h3>
<p>Baichuan2-chat is a fine-tuned version of the Baichuan LLM, specializing in chatting.</p>
<p>4K context length</p>
<p>chat model</p>
<div class="icon"></div>
</div>
</section>
<footer>
<p>© 2023 Xinference. All rights reserved.</p>
</footer>
</body>
</html>
数数
提示词
:图像中有多少开关?
- 数量对了,但解释错误。
在图片中可以看到总共有18个开关。它们排列在三个不同的行中,每行有6个开关。每个开关都有其独特的标签,用于指示其功能或连接点。
选择题
上面的选择题探索后,发现不能识别。
提示词
:根据图像中涂的 ABCD,来判断用户的回答。
根据您提供的图片内容,用户的回答是 C.
提示词
:根据图像中填涂的 ABCD,来判断用户的选择,图像中有5题,答案:
根据您提供的图片内容,用户的回答是 C.
提示词
:根据图像中的填涂,来判断用户的选择(A,B,C,D),图像中有5道题,答案分别是多少?
根据您提供的图片内容,用户的回答是 C.
提示词
:图像中用户填涂的选项是?
图片中用户填涂的选项是A。
试卷分数
提示词
:图像中红色数字是老师打的分数,请输出分数
图像中的红色数字是18。
提示词
:请输出图像中的分数
黑色边框有干扰。
图片中的分数是1 8/100。
图像描述
提示词
:图像中描述的是什么?
这张图片展示了一只小狗和一只小猫并排坐在白色背景上。小狗看起来很安静,目光朝向左侧,而小猫则张开嘴巴,似乎在叫喊或发出声音。两只小动物都显得非常活泼可爱。
提示词
:图像中有几只猫
图像中有一只猫。
提示词
:图像中有几只动物
图像中有两只动物,一只小狗和一只小猫。
❌ 提示词
:图像包含的每个动物的坐标 [[x0,y0,x1,y1]] ?
小狗的坐标为[[50,100,75,150]],小猫的坐标为[[75,100,100,150]].
提示词
:图像中描述的是什么?
图片描述了一个名为“Ingest.py”的流程图,它展示了如何处理文件、从中提取文档、将文档分割成更小的部分,然后将其转换为嵌入式表示,并存储在一个名为“Chroma”的数据库中。
提示词
:图像中描述了什么
图片中展示了一个电气变电站的场景。在画面中央,一名身穿蓝色工作服的工人站在一个高高的作业平台上,似乎正在对电气设备进行检查或维护。这个作业平台是橙色的,带有伸缩的机械臂。在平台的下方,可以看到一些电线和电气设备。画面左侧,另一名身穿黑色工作服的工人站在白色围栏旁,似乎在观察或等待指示。周围的环境看起来像是在户外,天气晴朗。
电力场景的缺陷识别
电线绝缘层破损
提示词
:图像上是否属于电力设备的缺陷
是的,图片显示的是电力设备的缺陷。可以看到电线的绝缘层已经严重破损,外皮也烧焦了,这样的情况可能会导致电流泄漏,甚至引起火灾等安全隐患。
提示词
:解释一下图像的信息
图片展示了一根被损坏的电线。电线的外皮已经剥落,露出里面的铜线。电线的颜色是黑色和白色相间的。在图片的右下角有一个红色的日期标记,显示为“30 9:14 AM”。背景是一片农田,可以看到犁过的土地和一些小植物。
吊臂下有人
提示词
:安规:在起吊、 牵引 过程中,力钢丝绳的周围、 上下方、 转向滑车内角侧、 吊臂和起吊物的下面,禁止有人逗留和通过。 这是电力作业场景的图像,根据安规进行检查。
根据图片,起重机的力钢丝绳的周围、上下方、转向滑车内角侧、吊臂和起吊物的下面都有人。因此,这个场景不符合安全规定。
杆塔上有鸟窝
提示词
:图像上是否属于电力设备的缺陷
图片上是一个巨大的鸟巢,位于电力设备的顶部。这个鸟巢看起来是由树枝和其他材料构成的,可能对电力设备造成一定的影响。因此,可以推断出图片上的电力设备可能存在缺陷。
提示词
:杆塔上是否有鸟窝
是的,杆塔上有一个鸟窝。
提示词
:杆塔上是否有风筝
杆塔上没有风筝。
表计读数
提示词
:读取图像中仪表指针指的数
读取仪表指针指的数是50MPa。