[2024.02] Week 4 Today I Learned

02/19 Mon

1. Wrote HybrIK multi-person demo code

- For now, a loop runs pose estimation -> 3D mesh -> visualization on every bounding box in which a person was detected

- Plan to revise the code later to reduce inference time
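
The per-person loop described above can be sketched roughly as follows. This is a minimal sketch, not the actual HybrIK demo code: `estimate_pose`, `build_mesh`, and `render` are hypothetical callables standing in for the real model and renderer, and boxes are assumed to be `(x1, y1, x2, y2)` tuples.

```python
from typing import Any, Callable, Iterable, List, Tuple

Box = Tuple[float, float, float, float]  # assumed (x1, y1, x2, y2) format


def run_multi_person(
    frame: Any,
    boxes: Iterable[Box],
    estimate_pose: Callable[[Any, Box], Any],  # hypothetical: pose for one person
    build_mesh: Callable[[Any], Any],          # hypothetical: pose -> 3D mesh
    render: Callable[[Any, Any], Any],         # hypothetical: draw one mesh on the frame
) -> Tuple[Any, List[Any]]:
    """Process every detected person one at a time and return (frame, meshes)."""
    meshes = []
    for box in boxes:
        pose = estimate_pose(frame, box)
        mesh = build_mesh(pose)
        frame = render(frame, mesh)  # the rendered frame is reused for the next person
        meshes.append(mesh)
    return frame, meshes
```

Since each person goes through the whole pipeline separately, the cost grows roughly linearly with the number of detections. Batching all crops into a single forward pass would be one possible way to reduce inference time later.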

2. Reduced 3D rendering time with PyTorch3D

- Running the visualization code as-is took a long time, about 120 ms, to render the 3D mesh

- After modifying the code based on the GitHub issue below, the time dropped to about 24 ms!

https://github.com/facebookresearch/pytorch3d/issues/591

Speed of rendering · Issue #591 · facebookresearch/pytorch3d
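
Numbers like 120 ms versus 24 ms can be measured with a small timing helper. The sketch below is an assumption about how such a measurement might be done, not the code used in this post: `mean_latency_ms` is a hypothetical name, and `fn` stands for whatever callable renders one mesh.

```python
import time
from typing import Callable


def mean_latency_ms(fn: Callable[[], object], warmup: int = 3, runs: int = 20) -> float:
    """Return the mean wall-clock time of fn() in milliseconds."""
    for _ in range(warmup):  # warm-up calls are not timed
        fn()
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    elapsed = time.perf_counter() - start
    return elapsed * 1000.0 / runs
```

When timing GPU code, `fn` should block until the GPU work has finished (for example by calling `torch.cuda.synchronize()` at the end). Otherwise the measurement may only cover the kernel launch.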

02/20 Tue

1. Read a 3D mesh survey paper

https://arxiv.org/abs/2203.01923

Recovering 3D Human Mesh from Monocular Images: A Survey

2. Ran inference with the ROMP and BEV code

- Both were fast enough for real-time inference.

https://github.com/Arthur151/ROMP

GitHub - Arthur151/ROMP: Monocular, One-stage, Regression of Multiple 3D People and their 3D positions & trajectories in camera

02/21 Wed

1. Read the ROMP paper

https://arxiv.org/abs/2008.12272

Monocular, One-stage, Regression of Multiple 3D People

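2. When feeding a video in through Gradio and running a deep learning model (HybrIK-X), the predicted output video kept coming out with an abnormally small file size, so I tried to fix it

-> Failed... I have not found a clear cause yet, so I will have to keep debugging

3. English conversation study group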

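02/22 Thu

1. Fixed the FFMPEG error in the HybrIK-X Gradio demo

- Installing Torchvision with conda touches the ffmpeg package, so a different ffmpeg ends up being used instead of the system ffmpeg...

- Because of this the H.264 codec could not be used, and since the web demo does not work without it, the demo could not run.

- So I fixed it as follows.

(1) Uninstalled the torch and torchvision that had been installed with conda and reinstalled them with pip
(2) After the code that writes the video, ran an ffmpeg command through the os package to convert the generated video to the H.264 codec (previously mp4v)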
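
Step (2) and the "which ffmpeg is actually being used" check can be sketched as below. This is a hedged illustration, not the exact code from the demo: it uses `subprocess` rather than the `os` package, the function names and file names are hypothetical, and the `-pix_fmt yuv420p` option is my own addition for browser compatibility.

```python
import shutil
import subprocess
from typing import List, Optional


def which_ffmpeg() -> Optional[str]:
    """Return the path of the ffmpeg binary found first on PATH, or None."""
    return shutil.which("ffmpeg")


def h264_command(src: str, dst: str) -> List[str]:
    """Build an ffmpeg command that re-encodes src (e.g. mp4v) to H.264."""
    return [
        "ffmpeg", "-y",         # overwrite dst if it already exists
        "-i", src,
        "-c:v", "libx264",      # H.264 encoder; needs an ffmpeg build with libx264
        "-pix_fmt", "yuv420p",  # pixel format widely supported by browsers
        dst,
    ]


def reencode_to_h264(src: str, dst: str) -> None:
    """Run the re-encode and raise CalledProcessError if ffmpeg fails."""
    subprocess.run(h264_command(src, dst), check=True)
```

Printing `which_ffmpeg()` inside the conda environment is a quick way to see whether the environment's ffmpeg or the system one comes first on PATH. Calling something like `reencode_to_h264("result_mp4v.mp4", "result_h264.mp4")` after the video writer finishes would then produce the file handed to the web demo.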

2. English conversation study group

02/23 Fri

1. Read the BEV paper

https://arxiv.org/abs/2112.08274

Putting People in their Place: Monocular Regression of 3D People in Depth

2. Ran inference with the OSX model

https://github.com/IDEA-Research/OSX

GitHub - IDEA-Research/OSX: [CVPR 2023] Official implementation of the paper "One-Stage 3D Whole-Body Mesh Recovery with Component Aware Transformer"

02/24 Sat

1. English conversation study group
