Additional Publication Information
Journal : IET Electronics Letters
Website: link
Volume/Issue : 46/12
Page(s) : 837-839
Date : June 2010
Download : here
A new quality assessment model is presented to predict the sensation of depth in 3D video in the colour-plus-depth format, i.e. colour (monocular) video augmented by a grey-scale depth map. The proposed technique is capable of evaluating both monoscopic and stereoscopic contributions to depth perception. Results show that the sensation of depth can be effectively modelled by combining features that are visually important to the brain.
Conference : 3DTV Conference (3DTV-Con 2010)
Website: link
Place : Tampere, Finland
Page(s) : 1-4
Date(s) : 7-9 June 2010
Download : here
The emergence of three-dimensional (3D) video applications based on Depth Image Based Rendering (DIBR) has brought new dimensions to the video transmission problem, because additional depth information must be transmitted to the receiver. Until the transmission problem of 3D video is adequately addressed, consumer applications based on 3D video will not gain much popularity. Exploiting the unique correlations that exist between color images and their corresponding depth images will lead to more error-resilient video encoding schemes for 3D video. In this paper we present an error-resilient 3D video communication scheme that exploits the correlation of motion vectors in the color and depth video streams. The presented method achieves gains of up to 0.8 dB for color sequences and up to 0.7 dB for depth sequences over error-prone communication channels.
Conference : 3DTV Conference (3DTV-Con 2010)
Website: link
Place : Tampere, Finland
Page(s) : 1-4
Date(s) : 7-9 June 2010
Download : here
One method of evaluating the quality of stereoscopic video is the use of conventional two-dimensional (2D) objective metrics. Metrics that represent the Human Visual System (HVS) well will give a more accurate evaluation. In this paper we propose a perceptually based 2D objective metric for 3D video quality evaluation. The proposed Perceptual Quality Metric (PQM) shows better results for 3D video quality evaluation and outperforms the Video Quality Metric (VQM), as it is sensitive to slight changes in image degradation and quantifies error from the pixel level right up to the sequence level. The metric is verified through a series of subjective tests that show the level of correlation between PQM and user scores.
Conference : IEEE International Conference on Multimedia and Expo (ICME 2010)
Website: link
Place : Singapore
Page(s) : 1219 - 1224
Date(s) : 19-23 July 2010
Download : here
The most distinguishing feature of 3D display systems, compared to their traditional 2D counterparts, is their ability to provide an additional perception of depth to their viewers. Thus, the mechanisms behind human depth perception play a significant role in 3D video systems. While a significant amount of research has been carried out in physiology and psychology to understand human depth perception, its applicability to 3D display systems is seldom discussed. Understanding the mechanisms of depth perception is of utmost importance to the development of 3D video technologies, which are heavily based on the exploitation of human perception. This paper explains, with the aid of existing physiological and psychological models, how humans perceive depth in 3D video displays. Based on these explanations, a mathematical model is derived to explain the just noticeable difference in depth (JNDD) as perceived by a viewer watching 3D video. The derived model is experimentally validated on an auto-stereoscopic display. This model is expected to be useful in 3D content production as well as in 3D content processing and compression.
Conference : IEEE International Conference on Multimedia and Expo (ICME 2010)
Website: link
Place : Singapore
Page(s) : 1712 - 1717
Date(s) : 19-23 July 2010
Download : here
Research on error recovery in multi-view coding has received considerable interest in the recent past. However, research on error resilience in multi-view coding is still at a very primitive level. This paper addresses this issue by presenting a redundant data coding method based on disparity vectors. The proposed system is implemented using the JSVM codec and is tested in an Internet Protocol (IP) network environment. Due to the addition of redundant data, the proposed system experiences a slight quality degradation in an error-free environment. However, experimental results show that the proposed algorithm provides effective error recovery, in terms of both subjective and objective quality, in error-prone environments.
Conference : IEEE International Conference on Multimedia and Expo (ICME 2010)
Website: link
Place : Singapore
Page(s) : 667 - 672
Date(s) : 19-23 July 2010
Download : here
Motion estimation is the most computationally intensive element in a block-based video codec. In this paper, a novel Predictive Intensive Direction Searching (PIDS) algorithm is proposed to reduce the computational load of the H.264/AVC video codec. Based on the direction of the predicted motion vector, the search area is adaptively divided into one region of intensive-direction searching and several regions of coarse-direction searching. The extended hexagon search is then used to refine the final optimal motion vector. The experimental results indicate that the proposed algorithm achieves a 15% reduction in motion estimation time, while incurring only a 0.5% increase in bit rate compared with the UMHexagonS algorithm, which is adopted by the H.264/AVC reference software.
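The general idea of starting a block-matching search from a predicted motion vector and refining it with a hexagon pattern can be sketched as below. This is a minimal illustration of a generic hexagon-based search, not the PIDS algorithm or UMHexagonS: the function names, the six-point pattern, the cross-shaped refinement and the default parameters are all assumptions made for the example, and the block is assumed to lie inside the current frame.

```python
# Illustrative hexagon-based block matching (NOT the PIDS algorithm itself).
# Frames are plain 2D lists of integer luma samples.

def sad(cur, ref, bx, by, mvx, mvy, n):
    """Sum of absolute differences between the n x n block of `cur` at
    (bx, by) and the block of `ref` displaced by (mvx, mvy).
    Returns None when the displaced block falls outside `ref`."""
    h, w = len(ref), len(ref[0])
    x0, y0 = bx + mvx, by + mvy
    if x0 < 0 or y0 < 0 or x0 + n > w or y0 + n > h:
        return None
    return sum(abs(cur[by + j][bx + i] - ref[y0 + j][x0 + i])
               for j in range(n) for i in range(n))

# Assumed search patterns: a six-point large hexagon and a four-point cross.
LARGE_HEX = [(-2, 0), (2, 0), (-1, -2), (1, -2), (-1, 2), (1, 2)]
SMALL_CROSS = [(-1, 0), (1, 0), (0, -1), (0, 1)]

def hexagon_search(cur, ref, bx, by, n=4, predicted=(0, 0), max_steps=16):
    """Return (motion_vector, cost) for the block at (bx, by)."""
    best = predicted
    best_cost = sad(cur, ref, bx, by, best[0], best[1], n)
    if best_cost is None:
        # Predicted vector points outside the frame: fall back to zero motion.
        best = (0, 0)
        best_cost = sad(cur, ref, bx, by, 0, 0, n)

    # Coarse stage: move the hexagon centre until no vertex improves the cost.
    for _ in range(max_steps):
        centre = best
        for dx, dy in LARGE_HEX:
            cand = (centre[0] + dx, centre[1] + dy)
            cost = sad(cur, ref, bx, by, cand[0], cand[1], n)
            if cost is not None and cost < best_cost:
                best, best_cost = cand, cost
        if best == centre:
            break

    # Fine stage: one pass over the small cross around the coarse optimum.
    centre = best
    for dx, dy in SMALL_CROSS:
        cand = (centre[0] + dx, centre[1] + dy)
        cost = sad(cur, ref, bx, by, cand[0], cand[1], n)
        if cost is not None and cost < best_cost:
            best, best_cost = cand, cost
    return best, best_cost
```

A good predicted vector places the starting point close to the optimum, so fewer candidate positions need to be evaluated; this is the kind of saving that prediction-driven search schemes aim for.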
Conference : IEEE International Conference on Multimedia and Expo (ICME 2010)
Website: link
Place : Singapore
Page(s) : 1718 - 1723
Date(s) : 19-23 July 2010
Download : here
Interest in 3D video visualization systems is ever growing. Application areas include the provision of 3D content to users, which opens up the exploration of 3D video communication and transmission. To address communication and transmission, one must consider error resilience. Multiple Description Coding (MDC) can provide robust video communication over wireless networks. However, it can introduce high levels of redundancy. In this paper, we propose a scalable MDC architecture using motion vector (MV) encoding for 3D video. Experimental results show that the proposed algorithm can improve the frame quality by up to 2 dB over a pixel-based interpolation scheme with residual coding, while significantly reducing the bit rate compared to pixel and motion interpolation schemes.
Conference : International Conference on User Centric Media (UCMedia 2010)
Website: link
Place : Mallorca, Spain
Page(s) : N/A
Date(s) : September 2010
Download : here
To enjoy 3D video to its full extent, access to and consumption of 3D content should be user centric, which in turn ensures an enhanced quality of user experience. The experience is nevertheless easily influenced by several factors, including content characteristics, users' preferences and the contexts prevailing in various usage environments. In particular, the use of ambient illumination as an environmental context for the efficient provision of 3D video to users has not been studied in detail in the literature. This paper investigates the effects of ambient illumination on 3D video quality and depth perception, with a view to using this information as one of the key context elements in future user centric 3D access and consumption environments. Subjective tests conducted under different illumination conditions demonstrate that the illumination of the viewing environment surrounding the users has significant effects on the perceived 3D video quality as well as on depth perception.
Conference : IBC 2010
Website: link
Place : Amsterdam, The Netherlands
Page(s) : N/A
Date(s) : 9-14 September 2010
Download : here
MUSCADE (Multimedia Scalable 3D for Europe) is a European project, funded under the European Commission ICT 7th Framework Programme. MUSCADE aims at generating major innovations in the fields of 3DTV production equipment and tools, data representation, compression, transmission and rendering on various kinds of 3D displays. This paper provides an overview of the MUSCADE system architecture and the first technical choices made by the consortium. The final objective of MUSCADE is to demonstrate a complete multiview 3DTV live chain over wireline, wireless and satellite networks.
Conference : IEEE International Conference on Image Processing
Website: link
Place : Hong Kong, China
Page(s) : N/A
Date(s) : 26-29 September 2010
Download : here
The ability to provide a realistic perception of depth is the core added functionality of modern 3D video display systems. At present, there is no standard method to assess the perception of depth in 3D video. Existence of such methods would immensely enhance the progression of 3D video research. This paper focuses on the depth perception assessment in color plus depth representation of 3D video. In this paper, we subjectively evaluate the depth perceived by the users on an auto stereoscopic display, and analyze its variation with the impairments introduced during the compression of the depth images. The variation of the subjective perception of depth is explained based on another evaluation that is carried out to identify the Just Noticeable Difference in Depth (JNDD) perceived by the subjects. The JNDD corresponds to the sensitivity of the observers to the changes in depth in a 3D video scene. Even though only the effects of compression artifacts are considered in this paper, the proposed assessment technique, based on the JNDD values can be used in any future depth perception assessment work.
Conference : IEEE International Conference on Image Processing
Website: link
Place : Hong Kong, China
Page(s) : N/A
Date(s) : 26-29 September 2010
Download : here
The paper discusses an assistance system for stereo shooting and 3D production, called Stereoscopic Analyzer (STAN). A feature-based scene analysis estimates in real-time the relative pose of the two cameras in order to allow optimal camera alignment and lens settings directly at the set. It automatically eliminates undesired vertical disparities and geometrical distortions through image rectification. In addition, it detects the position of near- and far objects in the scene to derive the optimal inter-axial distance (stereo baseline), and gives a framing alert in case of stereoscopic window violation. Against this background the paper describes the system architecture, explains the theoretical background and discusses future developments.
Conference : IEEE International Workshop on Multimedia Signal Processing (MMSP 2010)
Website: link
Place : Saint-Malo, France
Page(s) : N/A
Date(s) : 4-6 October 2010
Download : here
The paper discusses an assistance system for stereo shooting and 3D production, called Stereoscopic Analyzer (STAN). A feature-based scene analysis estimates in real-time the relative pose of the two cameras in order to allow optimal camera alignment and lens settings directly at the set. It automatically eliminates undesired vertical disparities and geometrical distortions through image rectification. In addition, it detects the position of near- and far objects in the scene to derive the optimal inter-axial distance (stereo baseline), and gives a framing alert in case of stereoscopic window violation. Against this background the paper describes the system architecture, explains the theoretical background and discusses future developments.
Conference : NEM Summit 2010
Website: link
Place : Barcelona, Spain
Page(s) : N/A
Date(s) : 13-15 October 2010
Download : here
3D content to the home is becoming more and more a reality. Consequently, new challenges have to be addressed to ensure a good Quality of Experience with this new content. Since the human vision system is not perfectly adapted to visualizing 3D content on TV, dedicated video processing must be applied to enhance the 3D rendering. Furthermore, the variability of human vision is so high that 3D intensity adjustment tools will be required for some people and/or for some specific content. Several use cases are presented to illustrate this necessity. The view interpolation technology is described, using stereo content associated with a dense disparity map. Depending on the application, the insertion of this processing in the content workflow is discussed and some standardization challenges are presented.
Journal : IEEE Transactions on Consumer Electronics
Website: link
Volume/Issue : 56/4
Page(s) : 2735-2740
Date : November 2010
Download : here
A technique is proposed to minimize distortions in synthesized virtual views when encoding depth maps that are used in Depth Image Based Rendering (DIBR) applications. Depth maps are not viewed by end users, but are used for virtual view generation. Therefore, it is important to compress depth maps in a way that minimizes distortion in the views rendered with them. Doing so makes it possible to generate high quality virtual views using compressed depth maps. Firstly, an error model to approximate the rendering distortion caused by disparity changes is proposed. Thereafter, this error model is used at the encoding mode selection stage of depth map coding. Experimental results illustrate an average bit rate saving of 19%-76% compared with the mode selection method that is based on minimizing the pixel errors of the depth map only. Further, encoding depth maps with the proposed technique improves the overall visual quality of rendered views.
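To illustrate why depth coding errors translate into rendering distortion, the sketch below uses the commonly used mapping from an 8-bit depth value to disparity for a parallel camera setup. It is not the error model of the paper: the function names, the 8-bit inverse-depth quantization between assumed near and far planes, and the parameters focal_px, baseline, z_near and z_far are assumptions made for this example.

```python
# Illustrative depth-value-to-disparity mapping (assumed conventions, not the
# paper's error model): 8-bit depth values quantize inverse depth between
# z_near (value 255) and z_far (value 0), and cameras are parallel.

def depth_value_to_disparity(v, focal_px, baseline, z_near, z_far):
    """Disparity in pixels for 8-bit depth value v."""
    inv_z = (v / 255.0) * (1.0 / z_near - 1.0 / z_far) + 1.0 / z_far
    return focal_px * baseline * inv_z

def disparity_error(delta_v, focal_px, baseline, z_near, z_far):
    """Disparity shift in pixels caused by a depth value error delta_v.
    The mapping above is linear in v, so the shift depends only on delta_v."""
    scale = focal_px * baseline * (1.0 / z_near - 1.0 / z_far) / 255.0
    return scale * delta_v

if __name__ == "__main__":
    # Hypothetical camera set-up, chosen only for illustration.
    params = dict(focal_px=1000.0, baseline=0.05, z_near=1.0, z_far=10.0)
    for dv in (1, 4, 16):
        print(dv, disparity_error(dv, **params))
```

Under these assumptions a coding error in the depth value shifts the rendered pixel horizontally by a proportional amount, which is why a mode decision driven by rendered-view distortion can differ from one driven by depth map pixel error alone.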
Journal : IET Electronics Letters
Website: link
Volume/Issue : 46/23
Page(s) : 1546 - 1548
Date : November 2010
Download : here
Depth maps, which can be represented as greyscale images, are used to aid rendering of novel views in three-dimensional (3D) video systems. However, compressing them using existing video codecs, such as H.264/AVC, leads to low quality rendered views. Presented is a sharpening method based on adaptive bilateral filtering to eliminate certain artifacts observed in compressed depth maps to improve the quality of rendered views. Experimental results demonstrate that significant rendering quality improvements of up to 1.9 dB can be achieved with the proposed method.
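For readers unfamiliar with bilateral filtering, the following sketch shows a plain bilateral filter applied to a depth map stored as a 2D list. It is not the adaptive sharpening method of the paper: the function name, the fixed window radius and the fixed sigma_s and sigma_r values are assumptions chosen for illustration, whereas an adaptive variant would vary such parameters locally.

```python
import math

# Plain (non-adaptive) bilateral filter on a 2D list of depth values.
# Illustrative only; parameter names and defaults are assumptions.

def bilateral_filter(depth, radius=2, sigma_s=2.0, sigma_r=10.0):
    """Return a filtered copy of `depth`.

    Each output sample is a weighted mean of its neighbours, where the
    weight falls off with spatial distance (sigma_s) and with the
    difference in depth value (sigma_r). The second term keeps samples
    from across a depth edge from blurring into each other."""
    h, w = len(depth), len(depth[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            centre = depth[y][x]
            num = 0.0
            den = 0.0
            # Window is clipped at the image borders.
            for j in range(max(0, y - radius), min(h, y + radius + 1)):
                for i in range(max(0, x - radius), min(w, x + radius + 1)):
                    ds2 = (i - x) ** 2 + (j - y) ** 2
                    dr2 = (depth[j][i] - centre) ** 2
                    wgt = math.exp(-ds2 / (2.0 * sigma_s ** 2)
                                   - dr2 / (2.0 * sigma_r ** 2))
                    num += wgt * depth[j][i]
                    den += wgt
            out[y][x] = num / den  # den >= 1 because the centre has weight 1
    return out
```

With a small sigma_r relative to the depth step at object boundaries, small fluctuations inside a region are smoothed while the boundary itself stays sharp, which is the property that makes this family of filters attractive for depth maps.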
Conference : IEEE International Conference on Consumer Electronics (ICCE 2011)
Website: link
Place : Las Vegas, NV, USA
Page(s) : N/A
Date(s) : 9-12 January 2011
Download : here
Depth maps are used for rendering novel views in 3-Dimensional (3D) Television systems. When depth maps are compressed using existing codecs, the compression artifacts cause undesirable distortions in the rendered views. This paper proposes an adaptive bilateral filtering technique to eliminate such artifacts at the receiver end. The experimental results demonstrate that the proposed method significantly improves the quality of rendered views, by up to 1.5 dB, with a minimal increase in complexity.
Conference : NEM Summit 2010
Website: link
Place : Barcelona, Spain
Page(s) : N/A
Date(s) : 13-15 October 2010
Download : here
MUSCADE (Multimedia Scalable 3D for Europe) is a European project, funded under the European Commission ICT 7th Framework Programme. MUSCADE aims at generating major innovations in the fields of 3DTV production equipment and tools, data representation, compression, transmission and rendering on various kinds of 3D displays. This paper provides an overview of the MUSCADE system architecture and the first technical choices made by the consortium. The final objective of MUSCADE is to demonstrate a complete multiview 3DTV live chain over wireline, wireless and satellite networks.
Conference : 5th IEEE 3DTV Conference (3DTV-CON 2011)
Website: link
Place : Antalya, Turkey
Page(s) : N/A
Date(s) : 16-18 May 2011
Download : here
To speed up the proliferation of advanced 3-Dimensional (3D) technologies into the consumer market, the influence of these technologies on the perception of 3D video should be determined. Currently, this can only be achieved using either subjective assessment techniques or 2D objective quality evaluation models. Even though the subjective assessment techniques are more accurate than the objective models, they are time consuming and costly. Thus, 2D objective quality evaluation models that correlate with the Human Visual System (HVS) should be used to predict the 3D video quality perception of users reliably and with less effort. The Video Quality Metric (VQM), a standardized 2D objective quality measurement model that correlates well with the HVS, is used to predict the 3D video quality perception of users reliably. However, the ambient illumination context of the viewing environment, which has an effect on 3D video quality perception, is not considered in the quality assessments by VQM. Content adaptation is one of the key applications that need to use perceived 3D quality assessments under different ambient illumination conditions on a regular basis to ensure an improved video experience for users. Therefore, in this paper the standardized VQM model is extended using the ambient illumination context and content related contexts (i.e., motion, structural features, and luminance contrast) to predict 3D video quality under a particular ambient illumination condition. The results show that the extended VQM model can be efficiently utilized to predict the quality perception of 3D video under a particular ambient illumination condition.
Conference : 5th IEEE 3DTV Conference (3DTV-CON 2011)
Website: link
Place : Antalya, Turkey
Page(s) : N/A
Date(s) : 16-18 May 2011
Download : here
This paper presents an approach for rendering heavily extrapolated novel views to be used as input for light-field displays. This view generation method builds on a combination and enhancement of existing methods. The interpolation quality is assured by detecting and keeping the most reliable gap area information from the content using depth layers. For the extrapolation process, which is the most important part of this paper, we implemented an algorithm that favours isophote lines in order to reconstruct objects and patterns using gradient filling and Poisson reconstruction. Using the algorithms described, it is possible to generate wide baseline light field data from Multi-View plus Depth (MVD) data of moderate baseline. The approach is demonstrated by generating interpolated and extrapolated views for feeding a HoloVizio large-scale display with captured video data.
Conference : IEEE International Conference on Multimedia and Expo (ICME 2011)
Website: link
Place : Barcelona, Spain
Page(s) : N/A
Date(s) : 11-15 July 2011
Download : N/A
Content creation for autostereoscopic displays is a widely unresolved task. Typical methods rely on view synthesis based on depth image based rendering. Our method applies purely image domain warping instead. Input video is analyzed and information about sparse disparity, vertical edges and saliency is extracted. A constrained energy minimization problem is formulated and efficiently solved. The resulting image warping functions are used to synthesize novel views. Our approach is fully automatic, accurate, and reliable. Disocclusions and related artifacts are avoided due to smooth, saliency-driven warping functions. Our method also works well for extrapolation of views in a limited range, thus supporting multiview creation from stereo input, which is the most relevant use case scenario.
Conference : SMPTE 2nd annual International Conference on Stereoscopic 3D for Media & Entertainment
Website: N/A
Place : New York, USA
Page(s) : N/A
Date(s) : 21-22 June 2011
Download : here
What stereoscopic 3D content brings to the viewer is essentially a binocular cue to better understand the depth of a scene. A disparity map associated with the stereoscopic content is a key piece of data to ensure that the 3D effect will be well accepted by the user. Limitations of the human vision system in terms of binocular cue acceptance will be presented. This will explain when specific video processing can improve the quality of experience. Several use cases, either in post-production or on the end-user side, will be presented. All of them use the dense disparity map as additional input data. Means to generate this disparity map will be introduced. Then technical details of disparity-based algorithms will be presented, as well as their positioning in the global 3D workflow. Some standardization challenges will also be addressed.
Conference : IEEE International Conference on Industrial and Information Systems (ICIIS)
Website: N/A
Place : Peradeniya, Sri Lanka
Page(s) : N/A
Date(s) : 16-19 August 2011
Download : here
The performance of real-time video processing applications, such as surveillance systems and content-based search, is limited by the complexity of video content analysis in the pixel domain. A low-complexity alternative is to analyse the video in the compressed domain, where content features already available in the compressed video are used directly in the analysis. However, this is achieved at the expense of output precision and reliability, because feature selection at the encoder is driven by compression efficiency. Therefore, video applications could benefit from enhanced reliability of the data embedded in the compressed video. In this paper, we present a scalable optimization model that addresses the accuracy of content features in parallel with the conventional rate-distortion optimization criterion. We analyse and optimize the rate-distortion performance of a video encoder under a content description accuracy constraint, using a motion-calibrated synthetic data set containing a range of scene and motion complexity levels. Finally, using a natural video data set, we demonstrate that the proposed optimization framework can be used to enhance compressed feature accuracy without incurring a rate-distortion overhead.
Conference : IEEE International Conference on Industrial and Information Systems (ICIIS)
Website: N/A
Place : Peradeniya, Sri Lanka
Page(s) : N/A
Date(s) : 16-19 August 2011
Download : here
The proliferation of video consumption, especially on mobile devices, has created a demand for efficient interactive video applications and high-level video analysis. This is particularly significant in real-time applications and resource-limited scenarios. Pixel-domain video processing is often inefficient for many of these applications due to its complexity, whereas compressed-domain processing offers fast but unreliable results. In order to achieve fast and effective video processing, this paper proposes a novel video encoding architecture that facilitates efficient compressed-domain processing while maintaining compliance with the mainstream coding standards. This is achieved by optimizing the accuracy of the motion information embedded in the compressed video, in addition to compression efficiency. In a motion detection application, we demonstrate that the motion estimated by the proposed encoder can be used directly to extract object information, unlike the motion in conventionally coded video. The incurred rate-distortion overheads can be weighed against the reduced processing required for video analysis targeting a wide spectrum of computer vision applications.