Subscribe to papers in your field: scan the QR code on WeChat and follow {晓理紫|小李子}. Papers are updated daily; if you find them useful, please forward this to anyone who needs it. Thank you for your support.

Categories: Large Language Models (LLM), Vision Models (VLM), Diffusion Models, Visual Navigation, Embodied AI, Robotics, Reinforcement Learning, Open Vocabulary, Detection and Segmentation

[晓理紫] Daily paper digest (with Chinese abstracts and source/project links)

Embodied Artificial Intelligence, Robotic Agents, Human-Robot Interaction
Title: Augmented Reality User Interface for Command, Control, and Supervision of Large Multi-Agent Teams
Authors: Frank Regal, Chris Suarez, Fabian Parra
Abstract: Multi-agent human-robot teaming allows for the potential to gather information about various environments more efficiently by exploiting and combining the strengths of humans and robots. In industries like defense, search and rescue, first-response, and others alike, heterogeneous human-robot teams show promise to accelerate data collection and improve team safety by removing humans from unknown and potentially hazardous situations. This work builds upon AugRE, an Augmented Reality (AR) based scalable human-robot teaming framework. It enables users to localize and communicate with more than 50 autonomous agents. Through our efforts, users are able to command, control, and supervise agents in large teams, both line-of-sight and non-line-of-sight, without the need to modify the environment beforehand and without requiring users to use typical hardware (e.g., joysticks, keyboards, laptops, tablets) in the field. The demonstrated work shows early indications that combining these AR-HMD-based user interaction modalities for command, control, and supervision will help improve human-robot team collaboration, robustness, and trust.
[Downlink:]http://arxiv.org/abs/2401.05665v1
[Project:]https://sites.google.com/view/xr-robotics-iros2023/home?authuser=0

Title: Unified Learning from Demonstrations, Corrections, and Preferences during Physical Human-Robot Interaction
Authors: Shaunak A. Mehta, Dylan P. Losey
Abstract: Humans can leverage physical interaction to teach robot arms. This physical interaction takes multiple forms depending on the task, the user, and what the robot has learned so far. State-of-the-art approaches focus on learning from a single modality, or combine multiple interaction types by assuming that the robot has prior information about the human’s intended task. By contrast, in this paper we introduce an algorithmic formalism that unites learning from demonstrations, corrections, and preferences. Our approach makes no assumptions about the tasks the human wants to teach the robot; instead, we learn a reward model from scratch by comparing the human’s inputs to nearby alternatives. We first derive a loss function that trains an ensemble of reward models to match the human’s demonstrations, corrections, and preferences. The type and order of feedback is up to the human teacher: we enable the robot to collect this feedback passively or actively. We then apply constrained optimization to convert our learned reward into a desired robot trajectory. Through simulations and a user study we demonstrate that our proposed approach more accurately learns manipulation tasks from physical human interaction than existing baselines, particularly when the robot is faced with new or unexpected objectives. Videos of our user study are available at: https://youtu.be/FSUJsTYvEKU
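The core idea of learning a reward from scratch by comparing the human's input to nearby alternatives can be sketched as a Bradley-Terry style preference loss over a linear reward model. This is a minimal illustrative sketch, not the paper's implementation: the function names, toy features, and finite-difference optimizer are all assumptions.

```python
import math
import random

def reward(theta, features):
    # Linear reward model: dot product of weights and trajectory features.
    return sum(t * f for t, f in zip(theta, features))

def preference_loss(theta, chosen, alternatives):
    # Negative log-likelihood (softmax / Bradley-Terry style) that the
    # human's input is preferred over nearby alternative trajectories.
    scores = [reward(theta, chosen)] + [reward(theta, a) for a in alternatives]
    m = max(scores)
    z = sum(math.exp(s - m) for s in scores)
    return -(scores[0] - m - math.log(z))

def train(theta, data, lr=0.1, steps=200, eps=1e-4):
    # Finite-difference gradient descent on the summed preference loss.
    for _ in range(steps):
        grad = []
        for i in range(len(theta)):
            tp = list(theta); tp[i] += eps
            tm = list(theta); tm[i] -= eps
            g = sum(preference_loss(tp, c, alts) - preference_loss(tm, c, alts)
                    for c, alts in data) / (2 * eps)
            grad.append(g)
        theta = [t - lr * g for t, g in zip(theta, grad)]
    return theta

# Toy data: the human consistently picks the trajectory with a higher
# first feature, so the learned weights should favor feature 0.
random.seed(0)
data = []
for _ in range(20):
    chosen = [1.0, random.random()]
    alts = [[0.0, random.random()] for _ in range(3)]
    data.append((chosen, alts))
theta = train([0.0, 0.0], data)
```

Under this loss, the weight on any feature the human consistently prefers grows until the model ranks the human's input above the nearby alternatives.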
[Downlink:]http://arxiv.org/abs/2207.03395v2
[Project:]https://youtu.be/FSUJsTYvEKU

Title: StROL: Stabilized and Robust Online Learning from Humans
Authors: Shaunak A. Mehta, Forrest Meng, Andrea Bajcsy
Abstract: Robots often need to learn the human’s reward function online, during the current interaction. This real-time learning requires fast but approximate learning rules: when the human’s behavior is noisy or suboptimal, current approximations can result in unstable robot learning. Accordingly, in this paper we seek to enhance the robustness and convergence properties of gradient descent learning rules when inferring the human’s reward parameters. We model the robot’s learning algorithm as a dynamical system over the human preference parameters, where the human’s true (but unknown) preferences are the equilibrium point. This enables us to perform Lyapunov stability analysis to derive the conditions under which the robot’s learning dynamics converge. Our proposed algorithm (StROL) uses these conditions to learn robust-by-design learning rules: given the original learning dynamics, StROL outputs a modified learning rule that now converges to the human’s true parameters under a larger set of human inputs. In practice, these autonomously generated learning rules can correctly infer what the human is trying to convey, even when the human is noisy, biased, and suboptimal. Across simulations and a user study we find that StROL results in a more accurate estimate and less regret than state-of-the-art approaches for online reward learning. See videos and code here: https://github.com/VT-Collab/StROL_RAL
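The dynamical-systems view that StROL builds on can be illustrated with a toy update rule whose equilibrium point is the human's true parameters: the estimate moves toward each (noisy) human correction, and converges despite the noise. A hedged sketch only; the update rule, noise model, and gains below are assumptions, not the StROL algorithm.

```python
import random

def human_input(theta_true, noise, rng):
    # Noisy observation of the human's correction, pointing toward
    # their true (but unknown) preference parameters.
    return [t + rng.gauss(0, noise) for t in theta_true]

def learning_step(theta, observation, lr=0.2):
    # Gradient-descent style update: move the estimate toward the
    # observed correction. The human's true parameters are the
    # equilibrium of this dynamical system (the step is zero when
    # theta already matches the observation).
    return [t + lr * (o - t) for t, o in zip(theta, observation)]

def simulate(theta0, theta_true, noise=0.3, steps=300, seed=1):
    rng = random.Random(seed)
    theta = list(theta0)
    for _ in range(steps):
        theta = learning_step(theta, human_input(theta_true, noise, rng))
    return theta

# Even starting far from the truth with noisy human input, the
# estimate settles near the equilibrium (the true preferences).
theta_hat = simulate([5.0, -5.0], [1.0, 2.0])
```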
[Downlink:]http://arxiv.org/abs/2308.09863v2
[GitHub:]https://github.com/VT-Collab/StROL_RAL

Title: Sample-efficient Reinforcement Learning in Robotic Table Tennis
Authors: Jonas Tebbe, Lukas Krauch, Yapeng Gao
Abstract: Reinforcement learning (RL) has achieved some impressive recent successes in various computer games and simulations. Most of these successes are based on having large numbers of episodes from which the agent can learn. In typical robotic applications, however, the number of feasible attempts is very limited. In this paper we present a sample-efficient RL algorithm applied to the example of a table tennis robot. In table tennis every stroke is different, with varying placement, speed and spin. An accurate return therefore has to be found depending on a high-dimensional continuous state space. To make learning in few trials possible, the method is embedded into our robot system. In this way we can use a one-step environment. The state space depends on the ball at hitting time (position, velocity, spin) and the action is the racket state (orientation, velocity) at hitting. An actor-critic based deterministic policy gradient algorithm was developed for accelerated learning. Our approach performs competitively both in a simulation and on the real robot in a number of challenging scenarios. Accurate results are obtained without pre-training in under 200 episodes of training. The video presenting our experiments is available at https://youtu.be/uRAtdoL6Wpw.
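A one-step environment of the kind described here (state = ball at hitting time, action = racket state, episode ends after a single stroke) might be sketched as follows. The class name, toy dynamics, and reward are illustrative assumptions, not the authors' system.

```python
import random

class OneStepTableTennisEnv:
    """Minimal one-step environment in the spirit of the paper:
    each episode is a single (state, action, reward) interaction."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def reset(self):
        # State: toy 1-D ball position, velocity, spin at hitting time.
        self.state = [self.rng.uniform(-1, 1) for _ in range(3)]
        return self.state

    def step(self, action):
        # Action: racket orientation and velocity at hitting time.
        # Toy dynamics: the ideal racket setting cancels the first two
        # state components; reward is negative squared error.
        # The episode always terminates after one step.
        target = [-s for s in self.state[:2]]
        err = sum((a - t) ** 2 for a, t in zip(action, target))
        return self.state, -err, True, {}

env = OneStepTableTennisEnv()
s = env.reset()
_, r_good, done, _ = env.step([-s[0], -s[1]])
```

Because every episode is one step long, the critic reduces to predicting the immediate reward of a (state, action) pair, which is what makes learning in few trials tractable.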
[Downlink:]http://arxiv.org/abs/2011.03275v4
[Project:]https://youtu.be/uRAtdoL6Wpw

Title: Motion Control of Interactive Robotic Arms Based on Mixed Reality Development
Authors: Hanxiao Chen
Abstract: Mixed Reality (MR) is constantly evolving to inspire new patterns of robot manipulation for more advanced Human-Robot Interaction under the 4th Industrial Revolution Paradigm. Since Mixed Reality aims to connect physical and digital worlds to provide special immersive experiences, it is necessary to establish the information exchange platform and robot control systems within the developed MR scenarios. In this work, we mainly present multiple effective motion control methods applied on different interactive robotic arms (e.g., UR5, UR5e, myCobot) for the Unity-based development of MR applications, including GUI control panel, text input control panel, end-effector object dynamic tracking and ROS-Unity digital-twin connection.
[Downlink:]http://arxiv.org/abs/2401.01644v1
[Project:]http://www.icca.net/

Title: Transferability of HRI Research: Potential and Challenges
Authors: Wafa Johal
Abstract: With advancement of robotics and artificial intelligence, applications for robotics are flourishing. Human-robot interaction (HRI) is an important area of robotics as it allows robots to work closer to humans (with them or for them). One crucial factor for the success of HRI research is transferability, which refers to the ability of research outputs to be adopted by industry and provide benefits to society. In this paper, we explore the potentials and challenges of transferability in HRI research. Firstly, we examine the current state of HRI research and identify various types of contributions that could lead to successful outcomes. Secondly, we discuss the potential benefits for each type of contribution and identify factors that could facilitate industry adoption of HRI research. However, we also recognize that there are several challenges associated with transferability, such as the diversity of well-defined job/skill-sets required from HRI practitioners, the lack of industry-led research, and the lack of standardization in HRI research methods. We discuss these challenges and propose potential solutions to bridge the gap between industry expectations and academic research in HRI.
[Downlink:]http://arxiv.org/abs/2401.05802v1

Reinforcement Learning (RL)
Title: Bridging the Gap Between Target Networks and Functional Regularization
Authors: Alexandre Piche, Valentin Thomas, Joseph Marino
Abstract: Bootstrapping is behind much of the successes of Deep Reinforcement Learning. However, learning the value function via bootstrapping often leads to unstable training due to fast-changing target values. Target Networks are employed to stabilize training by using an additional set of lagging parameters to estimate the target values. Despite the popularity of Target Networks, their effect on the optimization is still misunderstood. In this work, we show that they act as an implicit regularizer. This regularizer has disadvantages such as being inflexible and non-convex. To overcome these issues, we propose an explicit Functional Regularization that is a convex regularizer in function space and can easily be tuned. We analyze the convergence of our method theoretically and empirically demonstrate that replacing Target Networks with the more theoretically grounded Functional Regularization approach leads to better sample efficiency and performance improvements.
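The contrast between a lagging target network and an explicit functional regularizer can be sketched on a tabular Q-function. This is a minimal sketch assuming a two-action tabular setting; the function names and the exact form of the penalty are assumptions, not the authors' implementation.

```python
def td_loss_target_network(q, q_lag, transitions, gamma=0.9):
    # Standard semi-gradient TD loss: targets are computed from a
    # lagging copy q_lag of the value function (the target network).
    loss = 0.0
    for s, a, r, s2 in transitions:
        target = r + gamma * max(q_lag[(s2, b)] for b in (0, 1))
        loss += (q[(s, a)] - target) ** 2
    return loss / len(transitions)

def td_loss_functional_reg(q, q_prev, transitions, gamma=0.9, kappa=0.5):
    # Bootstrapped TD loss through the *current* q, plus an explicit,
    # convex penalty in function space keeping q close to a previous
    # snapshot q_prev instead of relying on lagged target parameters.
    loss = 0.0
    for s, a, r, s2 in transitions:
        target = r + gamma * max(q[(s2, b)] for b in (0, 1))
        loss += (q[(s, a)] - target) ** 2
        loss += kappa * (q[(s, a)] - q_prev[(s, a)]) ** 2
    return loss / len(transitions)

# Tiny tabular example: two states, two actions, one transition.
q = {(s, a): 0.0 for s in (0, 1) for a in (0, 1)}
transitions = [(0, 0, 1.0, 1)]
l1 = td_loss_target_network(q, dict(q), transitions)
l2 = td_loss_functional_reg(q, dict(q), transitions)
```

With the snapshot equal to the current values, the explicit penalty vanishes and both losses coincide; as q moves, the convex penalty plays the stabilizing role that lagged parameters play only implicitly.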
[Downlink:]http://arxiv.org/abs/2210.12282v2
[Project:]https://openreview.net/forum?id=BFvoemrmqX

Title: Understanding the Effects of RLHF on LLM Generalisation and Diversity
Authors: Robert Kirk, Ishita Mediratta, Christoforos Nalmpantis
Abstract: Large language models (LLMs) fine-tuned with reinforcement learning from human feedback (RLHF) have been used in some of the most widely deployed AI models to date, such as OpenAI’s ChatGPT or Anthropic’s Claude. While there has been significant work developing these methods, our understanding of the benefits and downsides of each stage in RLHF is still limited. To fill this gap, we present an extensive analysis of how each stage of the process (i.e., supervised fine-tuning (SFT), reward modelling, and RLHF) affects two key properties: out-of-distribution (OOD) generalisation and output diversity. OOD generalisation is crucial given the wide range of real-world scenarios in which these models are being used, while output diversity refers to the model’s ability to generate varied outputs and is important for a variety of use cases. We perform our analysis across two base models on both summarisation and instruction following tasks, the latter being highly relevant for current LLM use cases. We find that RLHF generalises better than SFT to new inputs, particularly as the distribution shift between train and test becomes larger. However, RLHF significantly reduces output diversity compared to SFT across a variety of measures, implying a tradeoff in current LLM fine-tuning methods between generalisation and diversity. Our results provide guidance on which fine-tuning method should be used depending on the application, and show that more research is needed to improve the tradeoff between generalisation and diversity.
[Downlink:]http://arxiv.org/abs/2310.06452v2
[GitHub:]https://github.com/facebookresearch/rlfh-gen-div

Title: Memory Gym: Towards Endless Tasks to Benchmark Memory Capabilities of Agents
Authors: Marco Pleines, Matthias Pallasch, Frank Zimmer
Abstract: Memory Gym presents a suite of 2D partially observable environments, namely Mortar Mayhem, Mystery Path, and Searing Spotlights, designed to benchmark memory capabilities in decision-making agents. These environments, originally with finite tasks, are expanded into innovative, endless formats, mirroring the escalating challenges of cumulative memory games such as ‘I packed my bag’. This progression in task design shifts the focus from merely assessing sample efficiency to also probing the levels of memory effectiveness in dynamic, prolonged scenarios. To address the gap in available memory-based Deep Reinforcement Learning baselines, we introduce an implementation that integrates Transformer-XL (TrXL) with Proximal Policy Optimization. This approach utilizes TrXL as a form of episodic memory, employing a sliding window technique. Our comparative study between the Gated Recurrent Unit (GRU) and TrXL reveals varied performances across different settings. TrXL, on the finite environments, demonstrates superior sample efficiency in Mystery Path and outperforms in Mortar Mayhem. However, GRU is more efficient on Searing Spotlights. Most notably, in all endless tasks, GRU makes a remarkable resurgence, consistently outperforming TrXL by significant margins. Website and Source Code: https://github.com/MarcoMeter/endless-memory-gym/
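The sliding-window form of episodic memory described here can be sketched with a fixed-size buffer of observation embeddings that the policy attends over; the class name and left-padding scheme are illustrative assumptions, not the authors' TrXL implementation.

```python
from collections import deque

class SlidingWindowMemory:
    """Episodic memory as a fixed-size sliding window: the agent keeps
    only the most recent `window` observation embeddings and feeds them
    to the (transformer) policy at every step."""

    def __init__(self, window):
        self.window = window
        self.buffer = deque(maxlen=window)  # old entries fall off automatically

    def add(self, embedding):
        self.buffer.append(embedding)

    def context(self, pad=0.0, dim=1):
        # Return exactly `window` embeddings, left-padded for short
        # episodes, so the attention input shape stays constant.
        pad_rows = [[pad] * dim] * (self.window - len(self.buffer))
        return pad_rows + list(self.buffer)

# After six steps with a window of four, only the last four
# embeddings remain visible to the policy.
mem = SlidingWindowMemory(window=4)
for t in range(6):
    mem.add([float(t)])
ctx = mem.context()
```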
[Downlink:]http://arxiv.org/abs/2309.17207v3
[GitHub:]https://github.com/MarcoMeter/endless-memory-gym/

Title: DeXtreme: Transfer of Agile In-hand Manipulation from Simulation to Reality
Authors: Ankur Handa, Arthur Allshire, Viktor Makoviychuk
Abstract: Recent work has demonstrated the ability of deep reinforcement learning (RL) algorithms to learn complex robotic behaviours in simulation, including in the domain of multi-fingered manipulation. However, such models can be challenging to transfer to the real world due to the gap between simulation and reality. In this paper, we present our techniques to train a) a policy that can perform robust dexterous manipulation on an anthropomorphic robot hand and b) a robust pose estimator suitable for providing reliable real-time information on the state of the object being manipulated. Our policies are trained to adapt to a wide range of conditions in simulation. Consequently, our vision-based policies significantly outperform the best vision policies in the literature on the same reorientation task and are competitive with policies that are given privileged state information via motion capture systems. Our work reaffirms the possibilities of sim-to-real transfer for dexterous manipulation in diverse kinds of hardware and simulator setups, and in our case, with the Allegro Hand and Isaac Gym GPU-based simulation. Furthermore, it opens up possibilities for researchers to achieve such results with commonly-available, affordable robot hands and cameras. Videos of the resulting policy and supplementary information, including experiments and demos, can be found at https://dextreme.org/
[Downlink:]http://arxiv.org/abs/2210.13702v2
[Project:]https://dextreme.org/

Title: Multi-agent Reinforcement Learning for Cooperative Lane Changing of Connected and Autonomous Vehicles in Mixed Traffic
Authors: Wei Zhou, Dong Chen, Jun Yan
Abstract: Autonomous driving has attracted significant research interests in the past two decades as it offers many potential benefits, including releasing drivers from exhausting driving and mitigating traffic congestion, among others. Despite promising progress, lane-changing remains a great challenge for autonomous vehicles (AV), especially in mixed and dynamic traffic scenarios. Recently, reinforcement learning (RL), a powerful data-driven control method, has been widely explored for lane-changing decision makings in AVs with encouraging results demonstrated. However, the majority of those studies are focused on a single-vehicle setting, and lane-changing in the context of multiple AVs coexisting with human-driven vehicles (HDVs) have received scarce attention. In this paper, we formulate the lane-changing decision making of multiple AVs in a mixed-traffic highway environment as a multi-agent reinforcement learning (MARL) problem, where each AV makes lane-changing decisions based on the motions of both neighboring AVs and HDVs. Specifically, a multi-agent advantage actor-critic network (MA2C) is developed with a novel local reward design and a parameter sharing scheme. In particular, a multi-objective reward function is proposed to incorporate fuel efficiency, driving comfort, and safety of autonomous driving. Comprehensive experimental results, conducted under three different traffic densities and various levels of human driver aggressiveness, show that our proposed MARL framework consistently outperforms several state-of-the-art benchmarks in terms of efficiency, safety and driver comfort.
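The multi-objective reward and local reward design might look like the following sketch; the weights, normalization to [0, 1], and neighbor-averaging scheme are assumptions for illustration, not the paper's exact formulation.

```python
def lane_change_reward(fuel_efficiency, comfort, safety,
                       weights=(0.3, 0.3, 0.4)):
    """Weighted multi-objective reward combining fuel efficiency,
    driving comfort, and safety (each term normalized to [0, 1])."""
    w_f, w_c, w_s = weights
    return w_f * fuel_efficiency + w_c * comfort + w_s * safety

def local_reward(own_reward, neighbor_rewards, alpha=0.5):
    # Local reward design: each agent mixes its own reward with the
    # average reward of neighboring vehicles to encourage cooperation
    # rather than purely selfish lane changes.
    if not neighbor_rewards:
        return own_reward
    return (1 - alpha) * own_reward + alpha * sum(neighbor_rewards) / len(neighbor_rewards)

r = lane_change_reward(0.8, 0.6, 1.0)
r_local = local_reward(r, [0.5, 0.7])
```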
[Downlink:]http://arxiv.org/abs/2111.06318v2

Title: Adaptive Discounting of Training Time Attacks
Authors: Ridhima Bector, Abhay Aradhya, Chai Quek
Abstract: Among the most insidious attacks on Reinforcement Learning (RL) solutions are training-time attacks (TTAs) that create loopholes and backdoors in the learned behaviour. Not limited to a simple disruption, constructive TTAs (C-TTAs) are now available, where the attacker forces a specific, target behaviour upon a training RL agent (victim). However, even state-of-the-art C-TTAs focus on target behaviours that could be naturally adopted by the victim if not for a particular feature of the environment dynamics, which C-TTAs exploit. In this work, we show that a C-TTA is possible even when the target behaviour is un-adoptable due to both environment dynamics as well as non-optimality with respect to the victim objective(s). To find efficient attacks in this context, we develop a specialised flavour of the DDPG algorithm, which we term gammaDDPG, that learns this stronger version of C-TTA. gammaDDPG dynamically alters the attack policy planning horizon based on the victim’s current behaviour. This improves effort distribution throughout the attack timeline and reduces the effect of uncertainty the attacker has about the victim. To demonstrate the features of our method and better relate the results to prior research, we borrow a 3D grid domain from a state-of-the-art C-TTA for our experiments. Code is available at “bit.ly/github-rb-gDDPG”.
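The idea of dynamically altering the attack policy's planning horizon can be illustrated by tying the discount factor, and hence the effective horizon 1/(1 - γ), to how far the victim's behaviour currently is from the target behaviour. This is purely a toy schedule under assumed names and bounds; gammaDDPG's actual mechanism is not reproduced here.

```python
def effective_horizon(gamma):
    # Standard RL identity: a discount of gamma corresponds to an
    # effective planning horizon of roughly 1 / (1 - gamma) steps.
    return 1.0 / (1.0 - gamma)

def adapt_gamma(distance_to_target, g_min=0.5, g_max=0.99, scale=10.0):
    """Toy schedule: plan with a long horizon (large gamma) while the
    victim's behaviour is far from the target, and shrink the horizon
    as the victim approaches it (distance -> 0)."""
    frac = min(distance_to_target / scale, 1.0)
    return g_min + (g_max - g_min) * frac

g_far = adapt_gamma(10.0)   # victim far from target: long horizon
g_near = adapt_gamma(0.0)   # victim at target: short horizon
```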
[Downlink:]http://arxiv.org/abs/2401.02652v1

Object Detection and Segmentation, Open-Vocabulary Detection
Title: OMG-Seg: Is One Model Good Enough For All Segmentation?
Authors: Xiangtai Li, Haobo Yuan, Wei Li
Abstract: In this work, we address various segmentation tasks, each traditionally tackled by distinct or partially unified models. We propose OMG-Seg, One Model that is Good enough to efficiently and effectively handle all the segmentation tasks, including image semantic, instance, and panoptic segmentation, as well as their video counterparts, open vocabulary settings, prompt-driven, interactive segmentation like SAM, and video object segmentation. To our knowledge, this is the first model to handle all these tasks in one model and achieve satisfactory performance. We show that OMG-Seg, a transformer-based encoder-decoder architecture with task-specific queries and outputs, can support over ten distinct segmentation tasks and yet significantly reduce computational and parameter overhead across various tasks and datasets. We rigorously evaluate the inter-task influences and correlations during co-training. Code and models are available at https://github.com/lxtGH/OMG-Seg.
[Downlink:]http://arxiv.org/abs/2401.10229v1
[Project:]https://lxtgh.github.io/project/omg_seg/
[GitHub:]https://github.com/lxtGH/OMG-Seg

Title: RAP-SAM: Towards Real-Time All-Purpose Segment Anything
Authors: Shilin Xu, Haobo Yuan, Qingyu Shi
Abstract: Advanced by the transformer architecture, vision foundation models (VFMs) achieve remarkable progress in performance and generalization ability. Segment Anything Model (SAM) is one remarkable model that can achieve generalized segmentation. However, most VFMs cannot run in real time, which makes it difficult to transfer them into several products. On the other hand, current real-time segmentation mainly has one purpose, such as semantic segmentation on the driving scene. We argue that diverse outputs are needed for real applications. Thus, this work explores a new real-time segmentation setting, named all-purpose segmentation in real-time, to transfer VFMs in real-time deployment. It contains three different tasks, including interactive segmentation, panoptic segmentation, and video segmentation. We aim to use one model to achieve the above tasks in real-time. We first benchmark several strong baselines. Then, we present Real-Time All Purpose SAM (RAP-SAM). It contains an efficient encoder and an efficient decoupled decoder to perform prompt-driven decoding. Moreover, we further explore different training strategies and tuning methods to boost co-training performance further. Our code and model are available at https://github.com/xushilin1/RAP-SAM/.
[Downlink:]http://arxiv.org/abs/2401.10228v1
[Project:]https://xushilin1.github.io/rap_sam/
[GitHub:]https://github.com/xushilin1/RAP-SAM/

Title: Adversarial Supervision Makes Layout-to-Image Diffusion Models Thrive
Authors: Yumeng Li, Margret Keuper, Dan Zhang
Abstract: Despite the recent advances in large-scale diffusion models, little progress has been made on the layout-to-image (L2I) synthesis task. Current L2I models either suffer from poor editability via text or weak alignment between the generated image and the input layout. This limits their usability in practice. To mitigate this, we propose to integrate adversarial supervision into the conventional training pipeline of L2I diffusion models (ALDM). Specifically, we employ a segmentation-based discriminator which provides explicit feedback to the diffusion generator on the pixel-level alignment between the denoised image and the input layout. To encourage consistent adherence to the input layout over the sampling steps, we further introduce the multistep unrolling strategy. Instead of looking at a single timestep, we unroll a few steps recursively to imitate the inference process, and ask the discriminator to assess the alignment of denoised images with the layout over a certain time window. Our experiments show that ALDM enables layout faithfulness of the generated images, while allowing broad editability via text prompts. Moreover, we showcase its usefulness for practical applications: by synthesizing target distribution samples via text control, we improve domain generalization of semantic segmentation models by a large margin (~12 mIoU points).
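The adversarial supervision and multistep unrolling described above can be sketched as a combined loss: a reconstruction term on the final unrolled prediction plus a segmentation-alignment penalty on every prediction in the unrolled window. All names, the toy pixel losses, and the threshold "segmenter" are assumptions, not ALDM's actual networks.

```python
def denoising_loss(pred_image, clean_image):
    # Toy diffusion reconstruction term: mean squared error over pixels.
    return sum((p - c) ** 2 for p, c in zip(pred_image, clean_image)) / len(clean_image)

def segmenter_alignment_loss(pred_labels, layout_labels):
    # Segmentation-based discriminator feedback: fraction of pixels
    # where the predicted segmentation disagrees with the input layout.
    wrong = sum(1 for p, t in zip(pred_labels, layout_labels) if p != t)
    return wrong / len(layout_labels)

def aldm_style_loss(unrolled_preds, clean_image, layout, segment, lam=0.5):
    """Total loss over a few unrolled denoising steps: reconstruction on
    the final prediction, plus alignment feedback on every step in the
    unrolled window (the multistep unrolling idea)."""
    recon = denoising_loss(unrolled_preds[-1], clean_image)
    align = sum(segmenter_alignment_loss(segment(x), layout)
                for x in unrolled_preds) / len(unrolled_preds)
    return recon + lam * align

# Toy 2-pixel example with a threshold "segmenter".
segment = lambda img: [1 if v > 0.5 else 0 for v in img]
loss_aligned = aldm_style_loss([[0.9, 0.2], [1.0, 0.0]], [1.0, 0.0], [1, 0], segment)
loss_misaligned = aldm_style_loss([[0.0, 1.0]], [1.0, 0.0], [1, 0], segment)
```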
[Downlink:]http://arxiv.org/abs/2401.08815v1
[Project:]https://yumengli007.github.io/ALDM/
[GitHub:]https://github.com/boschresearch/ALDM

Title: LESEN: Label-Efficient deep learning for Multi-parametric MRI-based Visual Pathway Segmentation
Authors: Alou Diakite, Cheng Li, Lei Xie
Abstract: Recent research has shown the potential of deep learning in multi-parametric MRI-based visual pathway (VP) segmentation. However, obtaining labeled data for training is laborious and time-consuming. Therefore, it is crucial to develop effective algorithms in situations with limited labeled samples. In this work, we propose a label-efficient deep learning method with self-ensembling (LESEN). LESEN incorporates supervised and unsupervised losses, enabling the student and teacher models to mutually learn from each other, forming a self-ensembling mean teacher framework. Additionally, we introduce a reliable unlabeled sample selection (RUSS) mechanism to further enhance LESEN’s effectiveness. Our experiments on the human connectome project (HCP) dataset demonstrate the superior performance of our method when compared to state-of-the-art techniques, advancing multimodal VP segmentation for comprehensive analysis in clinical and research settings. The implementation code will be available at: https://github.com/aldiak/Semi-Supervised-Multimodal-Visual-Pathway-Delineation.
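The self-ensembling mean-teacher mechanism that LESEN builds on is generic and easy to sketch: the teacher's weights are an exponential moving average (EMA) of the student's, and unlabeled samples contribute a consistency loss between the two models' outputs. The decay value and names below are illustrative, not taken from the paper.

```python
def ema_update(teacher, student, decay=0.99):
    """Mean-teacher update: teacher weights become an exponential
    moving average of the student's weights after each training step."""
    return [decay * t + (1 - decay) * s for t, s in zip(teacher, student)]

def consistency_loss(student_out, teacher_out):
    # Unsupervised consistency term: student predictions should match
    # the (more stable) teacher predictions on unlabeled samples.
    return sum((a - b) ** 2 for a, b in zip(student_out, teacher_out)) / len(student_out)

# With a fixed student, repeated EMA updates pull the teacher toward
# the student's weights, so the consistency term shrinks over training.
teacher = [0.0, 0.0]
student = [1.0, -1.0]
for _ in range(500):
    teacher = ema_update(teacher, student)
```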
[Downlink:]http://arxiv.org/abs/2401.01654v1
[GitHub:]https://github.com/aldiak/Semi-Supervised-Multimodal-Visual-Pathway-Delineation

Title: S3Net: Innovating Stereo Matching and Semantic Segmentation with a Single-Branch Semantic Stereo Network in Satellite Epipolar Imagery
Authors: Qingyuan Yang, Guanzhou Chen, Xiaoliang Tan
Abstract: Stereo matching and semantic segmentation are significant tasks in binocular satellite 3D reconstruction. However, previous studies primarily view these as independent parallel tasks, lacking an integrated multitask learning framework. This work introduces a solution, the Single-branch Semantic Stereo Network (S3Net), which innovatively combines semantic segmentation and stereo matching using Self-Fuse and Mutual-Fuse modules. Unlike preceding methods that utilize semantic or disparity information independently, our method identifies and leverages the intrinsic link between these two tasks, leading to a more accurate understanding of semantic information and disparity estimation. Comparative testing on the US3D dataset proves the effectiveness of our S3Net. Our model improves the mIoU in semantic segmentation from 61.38 to 67.39, and reduces the D1-Error and average endpoint error (EPE) in disparity estimation from 10.051 to 9.579 and 1.439 to 1.403 respectively, surpassing existing competitive methods. Our codes are available at: https://github.com/CVEO/S3Net.
[Downlink:]http://arxiv.org/abs/2401.01643v1
[GitHub:]https://github.com/CVEO/S3Net

Title: Context-Aware Interaction Network for RGB-T Semantic Segmentation
Authors: Ying Lv, Zhi Liu, Gongyang Li
Abstract: RGB-T semantic segmentation is a key technique for autonomous driving scenes understanding. For the existing RGB-T semantic segmentation methods, however, the effective exploration of the complementary relationship between different modalities is not implemented in the information interaction between multiple levels. To address such an issue, the Context-Aware Interaction Network (CAINet) is proposed for RGB-T semantic segmentation, which constructs interaction space to exploit auxiliary tasks and global context for explicitly guided learning. Specifically, we propose a Context-Aware Complementary Reasoning (CACR) module aimed at establishing the complementary relationship between multimodal features with the long-term context in both spatial and channel dimensions. Further, considering the importance of global contextual and detailed information, we propose the Global Context Modeling (GCM) module and Detail Aggregation (DA) module, and we introduce specific auxiliary supervision to explicitly guide the context interaction and refine the segmentation map. Extensive experiments on two benchmark datasets of MFNet and PST900 demonstrate that the proposed CAINet achieves state-of-the-art performance. The code is available at https://github.com/YingLv1106/CAINet.
[Downlink:]http://arxiv.org/abs/2401.01624v1
[GitHub:]https://github.com/YingLv1106/CAINet