Proprioceptive State Estimation for Amphibious Tactile Sensing

Southern University of Science and Technology
IEEE Transactions on Robotics

Abstract

This paper presents a novel vision-based proprioception approach for a soft robotic finger that can estimate and reconstruct tactile interactions in terrestrial and aquatic environments. The key to this system lies in the finger's unique metamaterial structure, which facilitates omni-directional passive adaptation during grasping, protecting delicate objects across diverse scenarios. A compact in-finger camera captures high-framerate images of the finger's deformation during contact, extracting crucial tactile data in real time. We present a volumetric discretized model of the soft finger and use the geometry constraints captured by the camera to find the optimal estimation of the deformed shape. The approach is benchmarked using a motion capture system with sparse markers and a haptic device with dense measurements. Both results show state-of-the-art accuracies, with a median error of 1.96 mm for overall body deformation, corresponding to 2.1% of the finger's length. More importantly, the state estimation is robust in both on-land and underwater environments as we demonstrate its usage for underwater object shape sensing. This combination of passive adaptation and real-time tactile sensing paves the way for amphibious robotic grasping applications.

Soft Polyhedral Network with In-Finger Vision

In this study, we adopt the Soft Polyhedral Network (SPN) with in-finger vision from our previous work as the soft robotic finger. An ArUco tag is attached to the underside of a rigid plate mechanically fixed to the four lower crossbeams of the soft finger. A monocular RGB camera with a 130° field of view (FOV) is fixed at the bottom inside a transparent support frame as the in-finger vision system, recording at a high frame rate of 120 frames per second (FPS) at 640×480 pixel resolution. When the soft robotic finger interacts with the external environment, the live video stream captured by the in-finger camera provides real-time pose data of the ArUco tag, which serves as rigid-soft kinematic coupling constraints for the proprioceptive state estimation (PropSE) of the soft robotic finger. This marker-based in-finger vision design is equivalent to a miniature motion capture system, efficiently converting the soft robotic finger's spatial deformation into real-time 6D pose data.
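
The marker pose recovery at the heart of this design reduces to standard fiducial detection plus planar PnP. Below is a minimal sketch of that step, assuming OpenCV's ArUco module (4.7+ API); the dictionary, marker side length, and camera intrinsics are illustrative placeholders, not our calibrated values.

# Minimal sketch: detect the in-finger ArUco tag and recover its 6D pose
# relative to the camera. Dictionary, marker size, and intrinsics below are
# illustrative placeholders, not the calibrated values used in the paper.
import cv2
import numpy as np

MARKER_LENGTH = 0.02  # tag side length in meters (assumed)
K = np.array([[600.0, 0.0, 320.0],
              [0.0, 600.0, 240.0],
              [0.0, 0.0, 1.0]])  # pinhole intrinsics for a 640x480 stream (assumed)
DIST = np.zeros(5)  # assume negligible lens distortion after calibration

# Tag corner coordinates in the tag frame, in the order IPPE_SQUARE expects.
h = MARKER_LENGTH / 2.0
OBJ_PTS = np.array([[-h, h, 0.0], [h, h, 0.0], [h, -h, 0.0], [-h, -h, 0.0]])

detector = cv2.aruco.ArucoDetector(
    cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50),
    cv2.aruco.DetectorParameters())

def tag_pose(frame):
    """Return (rvec, tvec) of the first detected tag, or None if not found."""
    corners, ids, _ = detector.detectMarkers(frame)
    if ids is None:
        return None
    ok, rvec, tvec = cv2.solvePnP(OBJ_PTS, corners[0].reshape(4, 2), K, DIST,
                                  flags=cv2.SOLVEPNP_IPPE_SQUARE)
    return (rvec, tvec) if ok else None

Each frame's (rvec, tvec) then enters the shape estimation below as the rigid constraint on the AMH region.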

[Figure: finger design]

Proprioceptive State Estimation of Soft Finger

Volumetric Modeling of Soft Deformation

Our proposed solution begins by formulating a volumetric model of the soft robotic finger as a 3D volume filled with homogeneous elastic material. The visually observed marker area is approximated as aggregated multi-handles (AMH) on the mesh, and a uniform rigid motion is applied to the AMH to drive the soft finger into a deformed configuration. Soft finger shape estimation is thereby translated into a constrained geometry optimization problem.
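
The full volumetric formulation is in the paper; as a simplified stand-in, the sketch below minimizes a quadratic smoothness energy (a graph bilaplacian on the mesh, not our actual elastic energy) over vertex displacements, with the AMH vertices pinned to the marker-driven pose, to show how handle constraints turn shape estimation into a sparse linear solve.

# Simplified stand-in for the constrained geometry optimization: minimize a
# quadratic smoothness energy ||L d||^2 over vertex displacements d, subject
# to prescribed displacements on the AMH (handle) vertices. The uniform graph
# bilaplacian below replaces the paper's volumetric elastic energy.
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def estimate_shape(verts, edges, handle_idx, handle_pos):
    """verts: (n,3) rest positions; edges: (m,2) mesh edges;
    handle_idx: AMH vertex indices; handle_pos: their marker-driven targets."""
    n = len(verts)
    # Uniform graph Laplacian L = D - W built from the mesh connectivity.
    W = sp.coo_matrix((np.ones(len(edges)), (edges[:, 0], edges[:, 1])),
                      shape=(n, n))
    W = (W + W.T).tocsr()
    L = sp.diags(np.asarray(W.sum(axis=1)).ravel()) - W
    Q = (L.T @ L).tocsr()  # quadratic form of the smoothness energy

    free = np.setdiff1d(np.arange(n), handle_idx)
    d_h = handle_pos - verts[handle_idx]  # prescribed handle displacements
    # Eliminate the handles: solve Q_ff d_f = -Q_fh d_h per coordinate.
    d_f = spla.splu(Q[free, :][:, free].tocsc()).solve(
        -(Q[free, :][:, handle_idx] @ d_h))
    x = verts.astype(float).copy()
    x[handle_idx] = handle_pos
    x[free] += d_f
    return x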

[Figure: deformation model]

Comparison with Conventional Methods

We performed a comparative analysis against two widely adopted techniques, Abaqus and As-Rigid-As-Possible (ARAP), to showcase the efficacy of our shape estimation method. Results show that our method is 40 to 700 times faster than Abaqus and 1 to 2 times faster than ARAP across different mesh resolutions. Moreover, our method's mean error decreases from 0.346 mm to 0.086 mm as the number of elements increases, a significant advantage over both Abaqus and ARAP.

[Figure: comparison with Abaqus and ARAP]

Evaluation with Motion Capture System

To evaluate the performance of the deformation estimation, the soft robotic finger was mounted on a three-axis motion platform for interactive deformation experiments. A motion capture system (Mars2H, Nokov) was used to track finger deformations with nine markers attached via rigid links. Six markers were divided into three pairs rigidly attached to the fingertip, the first layer, and the second layer of the soft finger, respectively. The remaining three markers were attached to the platform and served as reference readings to align the motion capture system's reference frame with the platform's coordinate frame.
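
This frame alignment amounts to a rigid Procrustes (Kabsch) fit between the three platform markers as seen by the motion capture system and their known positions in the platform frame; a minimal sketch, with variable names of our own choosing:

# Sketch of the mocap-to-platform frame alignment as a rigid (Kabsch) fit on
# the three platform reference markers. Coordinates are placeholders.
import numpy as np

def rigid_fit(src, dst):
    """Least-squares R, t such that dst ~ R @ src + t (Kabsch algorithm)."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)  # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # no reflection
    R = Vt.T @ D @ U.T
    return R, dst_c - R @ src_c

# Usage: express every marker reading in the platform coordinate frame.
# R, t = rigid_fit(ref_markers_mocap, ref_markers_platform)
# p_platform = R @ p_mocap + t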

We visualize the error distribution over 3k pairs of estimated and ground-truth marker positions. The norm of the total error remains within 3 mm, and the error distribution along each axis is concentrated in the (-2, 2) mm range. Since the marker prediction model comprises both calibration and geometric optimization, the error distribution of six sparse markers can only partially validate the proposed method, motivating the next experiment.

[Figure: evaluation with motion capture]

Evaluation with Touch Haptic Device

We designed another validation experiment using the pen-nib position of a haptic device (Touch, 3D Systems) as the ground-truth measurement. An operator holding the pen-nib initiated contact at a random point on the soft robotic finger, pushing it five times. Fifty points were sampled over half of the soft robotic finger, recording the pen-nib position and the corresponding point of contact on the estimated deformed mesh model.

Similar to the calibration process used with the motion capture system, we solve for the barycentric coordinates using the initial contact position of the pen-nib and the undeformed vertex positions of the tetrahedron nearest to the contact point. Due to variations in the pushing trajectories across the three locations, the errors differ slightly but all lie within a 2.5 mm range. The haptic device measurements cover an extensive portion of the soft robotic finger, revealing further details on the spatial distribution of the estimation errors. We visualize the mean deformation estimation errors evaluated at the fifty randomly selected contact locations. Contact locations near the observed AMH constraint are expected to exhibit smaller errors, since deviations near this region are penalized during deformation optimization. The median estimation error for whole-body deformation is 1.96 mm, corresponding to 2.1% of the finger's length.
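
For reference, this barycentric bookkeeping works as sketched below: the contact point is expressed in the barycentric coordinates of its enclosing undeformed tetrahedron, then mapped through the estimated deformation (a generic sketch, not our exact implementation):

# Generic barycentric-coordinate sketch: encode the initial contact point in
# its enclosing undeformed tetrahedron, then map it through the deformation.
import numpy as np

def barycentric(p, tet):
    """Barycentric coordinates of point p inside tetrahedron tet (4x3)."""
    T = (tet[1:] - tet[0]).T            # 3x3 matrix of edge vectors
    b = np.linalg.solve(T, p - tet[0])  # weights of vertices 1..3
    return np.array([1.0 - b.sum(), *b])

def track_contact(p0, tet_rest, tet_deformed):
    """Predict where rest-pose contact point p0 moves on the deformed mesh."""
    return barycentric(p0, tet_rest) @ tet_deformed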

[Figure: evaluation with Touch haptic device]

Amphibious Tactile Sensing

Benchmarking VBTS against Turbidity

Our proposed rigidity-aware AMH method effectively transforms the visual perception process for deformable shape reconstruction into a marker-based pose recognition problem. Therefore, the underwater performance of our vision-based tactile sensing solution is directly determined by how reliably the fiducial marker poses used in our system can be recognized under different turbidity conditions. Turbidity is an optical characteristic that measures the clarity of a water body and is reported in Nephelometric Turbidity Units (NTU). It limits the visibility of optical cameras for underwater inspection through light attenuation caused by suspended particles. As a critical indicator of water quality, the turbidity of large water bodies worldwide has been studied extensively; for example, previous research reports the Yangtze River's turbidity at between 1.71 and 154 NTU.

We investigated the robustness of our proposed VBTS solution under different water clarity conditions by mixing condensed standard turbidity liquid with clear water to reach different turbidity ratings. The soft robotic finger is installed on a linear actuator in a tank filled with 56 liters of clear water. A probe is fixed under the soft robotic finger, inducing contact-based whole-body deformation when the finger is commanded to move downward. For turbidities between 0 and 40 NTU, the raw images captured by our in-finger vision achieved a 100% success rate in ArUco pose recognition. At 50 NTU, the first failed marker pose recognition was observed when the largest deformation of 8 mm was induced; our experiment shows that this failure can be alleviated with simple image enhancement techniques, restoring a 100% marker pose recognition success rate. However, recognition performance under large-scale whole-body deformation deteriorated quickly at 60 NTU and became unusable at 70 NTU. Image enhancement effectively raised this upper bound to 100 NTU, beyond which marker pose recognition under large-scale whole-body deformation was entirely unusable. Above 100 NTU, simple image enhancement provides only limited benefit; at 160 NTU, our in-finger vision system failed to recognize any ArUco pose underwater, even after image enhancement.
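
As one plausible instance of such a simple enhancement, the sketch below applies contrast limited adaptive histogram equalization (CLAHE) to the luminance channel before re-running marker detection; CLAHE and its parameters are illustrative choices rather than the exact pipeline used in the experiment.

# One plausible "simple image enhancement" step for turbid-water frames:
# CLAHE on the luminance channel to restore local contrast before marker
# detection. The method choice and parameters are illustrative assumptions.
import cv2

clahe = cv2.createCLAHE(clipLimit=3.0, tileGridSize=(8, 8))

def enhance(frame_bgr):
    """Boost local contrast so the ArUco tag stays detectable in turbid water."""
    lab = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    return cv2.cvtColor(cv2.merge((clahe.apply(l), a, b)), cv2.COLOR_LAB2BGR)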

[Figure: turbidity benchmark]

Underwater Exteroceptive Estimation of Object Shape

While proprioception refers to the awareness of one's own movement, tactile sensing involves gathering information about the external environment through the sense of touch. This section presents an object shape estimation approach by extending the proposed PropSE method to tactile sensing.

Since our soft finger provides large-scale, adaptive deformation that conforms to an object's geometric features through contact, we can infer shape-related contact information from the finger's estimated shape during the process. We assume the soft finger's contact patch coincides with that of the object during grasping; as a result, we can predict object surface topography from spatially distributed contact points on the touching interface. In this case, we used a parallel two-finger gripper (Hand-E from Robotiq, Inc.) attached to the wrist flange of a robotic manipulator (Franka Emika) through a 3D-printed cylindrical rod for an extended range of motion. Our soft robotic fingers are attached to each fingertip of the gripper through customized 3D-printed adapters. With the gripper submerged underwater, the system is programmed to sequentially execute a series of actions, including gripping and releasing the object and moving along a prescribed direction for a fixed distance, to acquire underwater object shape information.
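
Conceptually, the exploration loop accumulates finger-frame contact points into a world-frame object cloud using the manipulator's known end-effector pose; a minimal sketch of that bookkeeping, with frame names and the pose source as our assumptions:

# Sketch of the tactile exploration bookkeeping: contact points estimated in
# the finger frame are mapped through the finger pose (from the manipulator's
# forward kinematics) into the world frame and accumulated across grasps.
# Frame names and the pose source are our assumptions.
import numpy as np

object_cloud = []  # list of (k,3) world-frame contact patches

def accumulate(contact_pts_finger, T_world_finger):
    """contact_pts_finger: (k,3) points; T_world_finger: 4x4 homogeneous pose."""
    pts_h = np.hstack([contact_pts_finger,
                       np.ones((len(contact_pts_finger), 1))])
    object_cloud.append((T_world_finger @ pts_h.T).T[:, :3])

# After the grip-release-move sequence, stack into one (N,3) object cloud:
# cloud = np.vstack(object_cloud)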

We demonstrate our method on real data collected during the underwater tactile exploration experiment. The shape estimate at each cutting plane is compared against the ground truth using the Chamfer Distance (CD). We chose five vertical cutting planes and one horizontal sectional plane for evaluating the reconstructed object surface. For each cutting plane, a calibration error between the vase and the Hand-E gripper leads to an expected gap between the reconstructed and ground-truth points. Beyond this systematic error, we observed slightly higher CD values at planes 1 and 5 than at planes 2, 3, and 4, which could be attributed to the limitations of the soft finger in adapting to small objects with significant curvature. On the other hand, by employing tactile exploration actions with a relatively large contact area on the soft finger's surface, the shape of objects similar in size to the vase can be estimated efficiently, typically within 8-12 touches.
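
For clarity, the symmetric Chamfer Distance between a reconstructed cross-section P and its ground-truth counterpart Q can be computed as below (one common convention; the paper's exact normalization may differ):

# Symmetric Chamfer Distance between reconstructed points P and ground-truth
# points Q on a cutting plane (sum of mean nearest-neighbor distances in
# both directions).
import numpy as np

def chamfer_distance(P, Q):
    """P: (n,d) reconstructed points; Q: (m,d) ground-truth points."""
    d = np.linalg.norm(P[:, None, :] - Q[None, :, :], axis=-1)  # (n,m) pairwise
    return d.min(axis=1).mean() + d.min(axis=0).mean()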

[Figure: vase shape sensing]

Vision-based Tactile Grasping with an Underwater ROV

Here, we provide a full-system demonstration using our vision-based soft robotic fingers on an underwater remotely operated vehicle (ROV, FIFISH PRO V6 PLUS, QYSEA). The ROV includes a single-DOF robotic gripper, which we modified with the proposed soft fingers using customized adapters. Our design conveniently adds omni-directional adaptation to the gripper's existing functionality, along with real-time underwater tactile sensing. Using the in-finger images, the methods proposed in this work achieve real-time reconstruction of contact events on the soft robotic finger.

BibTeX

@article{guo2024proprioceptive,
    title={Proprioceptive State Estimation for Amphibious Tactile Sensing},
    author={Guo, Ning and Han, Xudong and Zhong, Shuqiao and Zhou, Zhiyuan and Lin, Jian and Dai, Jian S and Wan, Fang and Song, Chaoyang},
    journal={IEEE Transactions on Robotics},
    volume={40},
    pages={4684-4698},
    year={2024},
    publisher={IEEE},
    doi={10.1109/TRO.2024.3463509}
}