Updated on 2024/12/20

HARA Sunao
 
Organization
Faculty of Environmental, Life, Natural Science and Technology Associate Professor
Position
Associate Professor
Profile

He received the B.S., M.S., and Ph.D. degrees from Nagoya University in 2003, 2005, and 2011, respectively.
He is currently an assistant professor in the Graduate School of Information Science, Nara Institute of Science and Technology.
His research interests include development and evaluation of spoken dialog in real environments.
He is a member of the Acoustical Society of Japan, the Human Interface Society, and the Information Processing Society of Japan.

Degree

  • Ph.D. (Information Science) (Nagoya University)

Research Interests

  • Human Interface

  • Spoken dialogue

  • Speech recognition

  • Lifelog

  • Acoustic scene analysis

  • Acoustic event detection

  • Deep Learning

  • Machine Learning

  • Speech processing

  • Spoken dialog system

Research Areas

  • Informatics / Intelligent informatics

  • Informatics / Web informatics and service informatics

  • Informatics / Perceptual information processing

Research History

  • Okayama University   Faculty of Environmental, Life, Natural Science and Technology   Associate Professor

    2024.4

  • Okayama University   Graduate School of Interdisciplinary Science and Engineering in Health Systems   Assistant Professor

    2019.4 - 2024.4

    Country:Japan

    Notes:Faculty of Engineering, Department of Information Technology

  • Okayama University   The Graduate School of Natural Science and Technology   Assistant Professor

    2012.9 - 2019.3

    Country:Japan

    Notes:Faculty of Engineering, Department of Information Technology

  • Nara Institute of Science and Technology   Assistant Professor

    2011.11 - 2012.9

Professional Memberships

  • IEEE

    2016.6

  • The Institute of Electronics, Information and Communication Engineers

    2012.2

  • Information Processing Society of Japan

    2007

  • Acoustical Society of Japan

    2004

  • The Japanese Society for Artificial Intelligence

    2024.5

Committee Memberships

  • Acoustical Society of Japan   Editorial Committee, Journal Section, Member

    2023.6

    Committee type:Academic society

  • Acoustical Society of Japan, Kansai Chapter   24th Young Researchers' Exchange Research Meeting, Executive Committee Chair

    2021.4 - 2022.3

    Committee type:Academic society

  • Acoustical Society of Japan   Public Relations and Digitization Committee, Member

    2013.10

    Committee type:Academic society

    Digitization and Public Relations Promotion Committee

  • Acoustical Society of Japan   Research Meeting Preparation Committee, Member

    2023.6

    Committee type:Academic society

  • Acoustical Society of Japan   2023 Spring Meeting, Remote Meeting Executive Committee, Member

    2022.12 - 2023.3

    Committee type:Academic society

  • Acoustical Society of Japan   2022 Spring Meeting, Remote Meeting Executive Committee, Member

    2021.10 - 2022.3

    Committee type:Academic society

  • Acoustical Society of Japan   2021 Autumn Meeting, Remote Meeting Executive Committee, Member

    2021.7 - 2022.3

    Committee type:Academic society

  • Acoustical Society of Japan   2021 Spring Meeting, Remote Meeting Executive Committee, Member

    2020.11 - 2021.3

  • Acoustical Society of Japan   2020 Autumn Meeting, Remote Meeting Executive Committee, Member

    2020.7 - 2020.9

    Committee type:Academic society

  • The Institute of Electronics, Information and Communication Engineers   Society Transactions Editorial Committee, Reviewer

    2017.8

    Committee type:Academic society

  • Information Processing Society of Japan, Chugoku Chapter   Chapter Steering Committee, Member

    2015.5 - 2019.5

    Committee type:Academic society

  • Acoustical Society of Japan   Editorial Committee, Reviewer

    2014.2

    Committee type:Academic society

  • Acoustical Society of Japan, Kansai Chapter   Young Researchers' Exchange Research Meeting, Executive Committee Member

    2011.11 - 2022.3

    Committee type:Academic society

  • Acoustical Society of Japan   Student and Young Researcher Forum, Steering Committee

    2007.3 - 2012.3

    Committee type:Academic society


Papers

  • Continual learning on audio scene classification using representative data and memory replay GANs Reviewed International coauthorship International journal

    Ibnu Daqiqil ID, Masanobu Abe, Sunao Hara

    Bulletin of Electrical Engineering and Informatics   14 ( 1 )   568 - 580   2025.2

    Authorship:Last author   Language:English   Publishing type:Research paper (scientific journal)   Publisher:Institute of Advanced Engineering and Science  

    This paper proposes a methodology for resolving the catastrophic forgetting problem by choosing a limited portion of the historical dataset to act as a representative memory. The method harnesses generative adversarial networks (GANs) to create samples that expand upon the representative memory. Its main advantage is that it not only prevents catastrophic forgetting but also improves backward transfer and has a relatively small and stable size. The experimental results show that combining real representative data with GAN-generated data yielded better outcomes and counteracted the negative effects of catastrophic forgetting more effectively than relying solely on GAN-generated data. This mixed approach creates a richer training environment, aiding the retention of previous knowledge. Additionally, when different data-selection methods were compared as the proportion of GAN-generated data increased, the low probability and mean cluster methods performed best. These methods exhibit resilience and consistency by selecting more informative samples, thus improving overall performance.

    DOI: 10.11591/eei.v14i1.8127

  • OtologMap: a case study on the construction of an environmental sound map recorded by smart devices at Okayama and Kurashiki Reviewed International journal

    Sunao Hara, Masanobu Abe

    Noise Control Engineering Journal   1 - 12   2025.2

    Authorship:Lead author   Language:English   Publishing type:Research paper (scientific journal)   Publisher:Institute of Noise Control Engineering of the USA  

    DOI: 10.3397/1/37731

  • Explicit Prosody Control to Realize Discourse Focus in End-to-End Text-to-Speech Reviewed International journal

    Takumi WADA, Sunao HARA, Masanobu ABE

    IEEE International Workshop on Machine Learning for Signal Processing   2024.9

    Language:English   Publishing type:Research paper (international conference proceedings)  

  • Speech Synthesis Using Ambiguous Inputs From Wearable Keyboards Reviewed International journal

    Matsuri Iwasaki, Sunao Hara, Masanobu Abe

    2023 Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC 2023)   1172 - 1178   2023.11

    Language:English   Publishing type:Research paper (international conference proceedings)  

    This paper proposes a new application of text-to-speech (TTS) in speech communication. The goal is to enable persons with dysarthria or articulation disorders, and others who have difficulty speaking, to communicate anywhere and anytime using speech to express their thoughts and feelings. To achieve this goal, an input method is required. Thus, we propose a new text-entry method based on three concepts. First, for ease of carrying, we used a wearable keyboard that inputs digits from 0 to 9 in decimal notation according to 10-finger movements. Second, to avoid the need for training, users input sentences as if touch typing on the wearable keyboard. With this method, we obtained a sequence of numbers corresponding to the sentence. Third, a neural machine translation (NMT) method is applied to estimate texts from the sequence of numbers. The NMT was trained using two datasets: a Japanese-English parallel corpus containing 2.8 million sentence pairs extracted from TV and movie subtitles, and a Japanese text dataset containing 32 million sentences extracted from a question-and-answer platform. Using the model, phonemes and accent symbols were estimated from a sequence of numbers. The symbol-level accuracy was 91.48%, and 43.45% of all sentences were estimated with no errors. To subjectively evaluate the feasibility of the NMT model, a two-person word association game was conducted: one person gave hints using synthesized speech generated from symbols estimated by NMT, while the other guessed answers. As a result, 67.95% of all the quizzes were correctly answered, and the experimental results show that the proposed method has the potential to let persons with dysarthria communicate through TTS using a wearable keyboard.

    DOI: 10.1109/APSIPAASC58517.2023.10317228

  • Speech-Emotion Control for Text-to-Speech in Spoken Dialogue Systems Using Voice Conversion and x-vector Embedding Reviewed International journal

    Shunichi Kohara, Masanobu Abe, Sunao Hara

    2023 Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC 2023)   2280 - 2286   2023.11

    Language:English   Publishing type:Research paper (international conference proceedings)  

    In this paper, we propose an algorithm to control both speaker individuality and emotional expressions in synthesized speech, where the most important feature is the controllability of the intensity of emotional expressions. The aim of the proposed algorithm is to generate various responses including emotions in text-to-speech (TTS) for spoken dialogue systems (SDS), which makes the system more human-like. The idea is to control emotion and its intensity in line with the user's utterances. For example, when a user talks happily to an SDS, the agent of the SDS responds with a happy voice. Generally, the voice qualities of a user and the agent differ. Therefore, the proposed algorithm consists of two steps: (1) voice conversion to change speaker individuality including emotional expressions and (2) TTS with an x-vector acting as an embedding vector to mainly control speech quality related to the intensity of emotions. Evaluation experiments were carried out using a scenario of a spoken dialogue system in which a TTS-based teacher system encourages or cheers up students according to the students' utterances. The experimental results showed that TTS can successfully reproduce the emotion and its intensity extracted from students' utterances while maintaining the teacher's speaker individuality.

    DOI: 10.1109/APSIPAASC58517.2023.10317413

  • Sound map of urban areas recorded by smart devices: case study at Okayama and Kurashiki Invited International journal

    Sunao Hara, Masanobu Abe

    Proceedings of the 52nd International Congress and Exposition on Noise Control Engineering (Inter-Noise 2023)   1 - 12   2023.8

    Authorship:Lead author   Language:English   Publishing type:Research paper (international conference proceedings)  

  • Predictions for sound events and soundscape impressions from environmental sound using deep neural networks Invited International journal

    Sunao Hara, Masanobu Abe

    Proceedings of the 52nd International Congress and Exposition on Noise Control Engineering (Inter-Noise 2023)   1 - 12   2023.8

    Authorship:Lead author   Language:English   Publishing type:Research paper (international conference proceedings)  

  • Prediction method of Soundscape Impressions using Environmental Sounds and Aerial Photographs Reviewed International journal

    Yusuke Ono, Sunao Hara, Masanobu Abe

    2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)   1222 - 1227   2022.11

    Authorship:Corresponding author   Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

    DOI: 10.23919/apsipaasc55919.2022.9980290

    Other Link: https://arxiv.org/abs/2209.04077

  • Incremental Audio Scene Classifier Using Rehearsal-Based Strategy Reviewed International journal

    Ibnu Daqiqil Id, Masanobu Abe, Sunao Hara

    2022 IEEE 10th Global Conference on Consumer Electronics (GCCE)   69 - 623   2022.10

    Authorship:Last author   Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

  • Speech-Like Emotional Sound Generation Using WaveNet Reviewed

    Kento Matsumoto, Sunao Hara, Masanobu Abe

    IEICE Transactions on Information and Systems   E105.D ( 9 )   1581 - 1589   2022.9

    Language:English   Publishing type:Research paper (scientific journal)   Publisher:Institute of Electronics, Information and Communication Engineers (IEICE)

    In this paper, we propose a new algorithm to generate Speech-like Emotional Sound (SES). Emotional expressions may be the most important factor in human communication, and speech is one of the most useful means of expressing emotions. Although speech generally conveys both emotional and linguistic information, we have undertaken the challenge of generating sounds that convey emotional information alone. We call the generated sounds "speech-like," because the sounds do not contain any linguistic information. SES can provide another way to generate emotional response in human-computer interaction systems. To generate "speech-like" sound, we propose employing WaveNet as a sound generator conditioned only by emotional IDs. This concept is quite different from the WaveNet Vocoder, which synthesizes speech using spectrum information as an auxiliary feature. The biggest advantage of our approach is that it reduces the amount of emotional speech data necessary for training by focusing on non-linguistic information. The proposed algorithm consists of two steps. In the first step, to generate a variety of spectrum patterns that resemble human speech as closely as possible, WaveNet is trained with auxiliary mel-spectrum parameters and Emotion ID using a large amount of neutral speech. In the second step, to generate emotional expressions, WaveNet is retrained with auxiliary Emotion ID only using a small amount of emotional speech. Experimental results reveal the following: (1) the two-step training is necessary to generate the SES with high quality, and (2) it is important that the training use a large neutral speech database and spectrum information in the first step to improve the emotional expression and naturalness of SES.

    DOI: 10.1587/transinf.2021edp7236

    Other Link: https://www.webofscience.com/wos/woscc/full-record/WOS:000852731400008

  • Concept drift adaptation for audio scene classification using high-level features Reviewed International journal

    Ibnu Daqiqil Id, Masanobu Abe, Sunao Hara

    2022 IEEE International Conference on Consumer Electronics (ICCE)   2022.1

    Authorship:Last author   Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

    DOI: 10.1109/icce53296.2022.9730332

  • Acoustic Scene Classifier Based on Gaussian Mixture Model in the Concept Drift Situation Reviewed International journal

    Ibnu Daqiqil Id, Masanobu Abe, Sunao Hara

    Advances in Science, Technology and Engineering Systems Journal   6 ( 5 )   167 - 176   2021.9

    Authorship:Last author   Language:English   Publishing type:Research paper (scientific journal)   Publisher:ASTES Journal  

    DOI: 10.25046/aj060519

  • Phonetic and Prosodic Information Estimation from Texts for Genuine Japanese End-to-End Text-to-Speech Reviewed International journal

    Naoto Kakegawa, Sunao Hara, Masanobu Abe, Yusuke Ijima

    Interspeech 2021   126 - 130   2021.8

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:ISCA  

    The biggest obstacle to developing end-to-end Japanese text-to-speech (TTS) systems is estimating phonetic and prosodic information (PPI) from Japanese texts. The reasons are as follows: (1) the Kanji characters of the Japanese writing system have multiple corresponding pronunciations, (2) there is no separation mark between words, and (3) an accent nucleus must be assigned at appropriate positions. In this paper, we propose to solve these problems by neural machine translation (NMT) on the basis of encoder-decoder models, and compare NMT models of recurrent neural networks and the Transformer architecture. The proposed model handles texts on a token (character) basis, whereas conventional systems handle them on a word basis. To confirm the potential of the proposed approach, NMT models are trained using pairs of sentences and their PPIs that are generated by a conventional Japanese TTS system from 5 million sentences. Evaluation experiments were performed using PPIs that were manually annotated for 5,142 sentences. The experimental results showed that the Transformer architecture has the best performance, with 98.0% accuracy for phonetic information estimation and 95.0% accuracy for PPI estimation. Judging from the results, NMT models are promising for end-to-end Japanese TTS.

    DOI: 10.21437/interspeech.2021-914

  • Model architectures to extrapolate emotional expressions in DNN-based text-to-speech Reviewed International journal

    Katsuki Inoue, Sunao Hara, Masanobu Abe, Nobukatsu Hojo, Yusuke Ijima

    Speech Communication   126   35 - 43   2021.2

    Language:English   Publishing type:Research paper (scientific journal)   Publisher:Elsevier BV  

    This paper proposes architectures that facilitate the extrapolation of emotional expressions in deep neural network (DNN)-based text-to-speech (TTS). In this study, the meaning of “extrapolate emotional expressions” is to borrow emotional expressions from others, and the collection of emotional speech uttered by target speakers is unnecessary. Although a DNN has potential power to construct DNN-based TTS with emotional expressions and some DNN-based TTS systems have demonstrated satisfactory performances in the expression of the diversity of human speech, it is necessary and troublesome to collect emotional speech uttered by target speakers. To solve this issue, we propose architectures to separately train the speaker feature and the emotional feature and to synthesize speech with any combined quality of speakers and emotions. The architectures are parallel model (PM), serial model (SM), auxiliary input model (AIM), and hybrid models (PM&AIM and SM&AIM). These models are trained through emotional speech uttered by few speakers and neutral speech uttered by many speakers. Objective evaluations demonstrate that the performances in the open-emotion test provide insufficient information. They make a comparison with those in the closed-emotion test, but each speaker has their own manner of expressing emotion. However, subjective evaluation results indicate that the proposed models could convey emotional information to some extent. Notably, the PM can correctly convey sad and joyful emotions at a rate of 60%.

    DOI: 10.1016/j.specom.2020.11.004

    Other Link: https://arxiv.org/abs/2102.10345

  • Module Comparison of Transformer-TTS for Speaker Adaptation based on Fine-tuning Reviewed International journal

    Katsuki Inoue, Sunao Hara, Masanobu Abe

    Proceedings of APSIPA Annual Summit and Conference (APSIPA-ASC 2020)   826 - 830   2020.12

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE/APSIPA  

    End-to-end text-to-speech (TTS) models have achieved remarkable results in recent times. However, the model requires a large amount of text and audio data for training. A speaker adaptation method based on fine-tuning has been proposed for constructing a TTS model using small scale data. Although these methods can replicate the target speaker's voice quality, synthesized speech includes the deletion and/or repetition of speech. The goal of speaker adaptation is to change the voice quality to match the target speaker's on the premise that adjusting the necessary modules will reduce the amount of data to be fine-tuned. In this paper, we clarify the role of each module in the Transformer-TTS process by not updating it. Specifically, we froze character embedding, encoder, layer predicting stop token, and loss function for estimating sentence ending. The experimental results showed the following: (1) fine-tuning the character embedding did not result in an improvement in the deletion and/or repetition of speech, (2) speech deletion increases if the encoder is not fine-tuned, (3) speech deletion was suppressed when the layer predicting stop token is not fine-tuned, and (4) there are frequent speech repetitions at sentence end when the loss function estimating sentence ending is omitted.

  • Concept Drift Adaptation for Acoustic Scene Classifier Based on Gaussian Mixture Model Reviewed

    Ibnu Daqiqil Id, Masanobu Abe, Sunao Hara

    Proceedings of IEEE REGION 10 CONFERENCE (TENCON 2020)   450 - 455   2020.11

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

    DOI: 10.1109/tencon50793.2020.9293766

  • Controlling the Strength of Emotions in Speech-Like Emotional Sound Generated by WaveNet Reviewed International journal

    Kento Matsumoto, Sunao Hara, Masanobu Abe

    Proceedings of Interspeech 2020   3421 - 3425   2020.10

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:ISCA  

    DOI: 10.21437/interspeech.2020-2064

    Other Link: https://www.webofscience.com/wos/woscc/full-record/WOS:000833594103112

  • Semi-Supervised Speaker Adaptation for End-to-End Speech Synthesis with Pretrained Models Reviewed International coauthorship International journal

    Katsuki Inoue, Sunao Hara, Masanobu Abe, Tomoki Hayashi, Ryuichi Yamamoto, Shinji Watanabe

    Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2020)   7634 - 7638   2020.5

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

    Recently, end-to-end text-to-speech (TTS) models have achieved remarkable performance but require a large amount of paired text and speech data for training. On the other hand, we can easily collect dozens of minutes of unpaired speech recordings for a target speaker without corresponding text data. To make use of such accessible data, the proposed method leverages the recent great success of state-of-the-art end-to-end automatic speech recognition (ASR) systems and obtains corresponding transcriptions from pretrained ASR models. Although these models could only provide text output instead of intermediate linguistic features like phonemes, end-to-end TTS can be well trained with such raw text data directly. Thus, the proposed method can greatly simplify a speaker adaptation pipeline by consistently employing end-to-end ASR/TTS ecosystems. The experimental results show that our proposed method achieved comparable performance to a paired data adaptation method in terms of subjective speaker similarity and objective cepstral distance measures.

    DOI: 10.1109/icassp40776.2020.9053371

  • DNN-based Voice Conversion with Auxiliary Phonemic Information to Improve Intelligibility of Glossectomy Patients' Speech Reviewed International journal

    Hiroki Murakami, Sunao Hara, Masanobu Abe

    Proceedings of APSIPA Annual Summit and Conference 2019   138 - 142   2019.11

    Authorship:Corresponding author   Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

    In this paper, we propose using phonemic information in addition to acoustic features to improve the intelligibility of speech uttered by patients with articulation disorders caused by a wide glossectomy. Our previous studies showed that a voice conversion algorithm improves the quality of glossectomy patients' speech. However, losses in acoustic features of glossectomy patients' speech are so large that the quality of the reconstructed speech is low. To solve this problem, we explored the potential of several kinds of additional information to improve speech intelligibility. One of the candidates is phonemic information, more specifically Phoneme Labels as Auxiliary input (PLA). To combine both acoustic features and PLA, we employed a DNN-based algorithm. PLA is represented by a kind of one-of-k vector, i.e., PLA has a weight value (<1.0) that gradually changes along the time axis, whereas one-of-k has a binary value (0 or 1). The results showed that the proposed algorithm reduced the mel-frequency cepstral distortion for all phonemes, and almost always improved intelligibility. Notably, the intelligibility was largely improved in phonemes /s/ and /z/, mainly because the tongue is used to sustain constriction to produce these phonemes. This indicates that PLA works well to compensate for the lack of a tongue.

    DOI: 10.1109/APSIPAASC47483.2019.9023168

  • Speech-like Emotional Sound Generator by WaveNet Reviewed International journal

    Kento Matsumoto, Sunao Hara, Masanobu Abe

    Proceedings of APSIPA Annual Summit and Conference 2019   143 - 147   2019.11

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

    In this paper, we propose a new algorithm to generate Speech-like Emotional Sound (SES). Emotional information plays an important role in human communication, and speech is one of the most useful media to express emotions. Although, in general, speech conveys emotional information as well as linguistic information, we have undertaken the challenge to generate sounds that convey emotional information without linguistic information, which results in making conversations in human-machine interactions more natural in some situations by providing non-verbal emotional vocalizations. We call the generated sounds "speech-like", because the sounds do not contain any linguistic information. For the purpose, we propose to employ WaveNet as a sound generator conditioned by only emotional IDs. The idea is quite different from WaveNet Vocoder that synthesizes speech using spectrum information as auxiliary features. The biggest advantage of the idea is to reduce the amount of emotional speech data for the training. The proposed algorithm consists of two steps. In the first step, WaveNet is trained to obtain phonetic features using a large speech database, and in the second step, WaveNet is re-trained using a small amount of emotional speech. Subjective listening evaluations showed that the SES could convey emotional information and was judged to sound like a human voice.

    DOI: 10.1109/APSIPAASC47483.2019.9023346

  • A signal processing perspective on human gait: Decoupling walking oscillations and gestures Reviewed International journal

    Adrien Gregorj, Zeynep Yücel, Sunao Hara, Akito Monden, Masahiro Shiomi

    Proceedings of the 4th International Conference on Interactive Collaborative Robotics 2019 (ICR 2019)   11659   75 - 85   2019.8

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:SPRINGER INTERNATIONAL PUBLISHING AG  

    This study focuses on gesture recognition in mobile interaction settings, i.e. when the interacting partners are walking. This kind of interaction requires a particular coordination, e.g. by staying in the field of view of the partner, avoiding obstacles without disrupting group composition and sustaining joint attention during motion. In the literature, various studies have shown that gestures are closely related to achieving such goals. Thus, a mobile robot moving in a group with human pedestrians has to identify such gestures to sustain group coordination. However, decoupling the inherent walking oscillations from gestures is a big challenge for the robot. To that end, we employ video data recorded in uncontrolled settings and detect arm gestures performed by human-human pedestrian pairs by adopting a signal processing approach. Namely, we exploit the fact that there is an inherent oscillatory motion at the upper limbs arising from the gait, independent of the view angle or distance of the user to the camera. We identify arm gestures as disturbances on these oscillations. In doing that, we use a simple pitch detection method from speech processing and assume data involving a low frequency periodicity to be free of gestures. In testing, we employ a video data set recorded in uncontrolled settings and show that we achieve a detection rate of 0.80.

    DOI: 10.1007/978-3-030-26118-4_8

  • Naturalness Improvement Algorithm for Reconstructed Glossectomy Patient's Speech Using Spectral Differential Modification in Voice Conversion Reviewed International journal

    Hiroki Murakami, Sunao Hara, Masanobu Abe, Masaaki Sato, Shogo Minagi

    Interspeech 2018, 19th Annual Conference of the International Speech Communication Association   2464 - 2468   2018.9

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:ISCA  

    In this paper, we propose an algorithm to improve the naturalness of the reconstructed glossectomy patient's speech that is generated by voice conversion to enhance the intelligibility of speech uttered by patients with a wide glossectomy. While existing VC algorithms make it possible to improve intelligibility and naturalness, the result is still not satisfying. To solve the continuing problems, we propose to directly modify the speech waveforms using a spectrum differential. The motivation is that glossectomy patients mainly have problems in their vocal tract, not in their vocal cords. The proposed algorithm requires no source parameter extractions for speech synthesis, so there are no errors in source parameter extractions and we are able to make the best use of the original source characteristics. In terms of spectrum conversion, we evaluate with both GMM and DNN. Subjective evaluations show that our algorithm can synthesize more natural speech than the vocoder-based method. Judging from observations of the spectrogram, power in high-frequency bands of fricatives and stops is reconstructed to be similar to that of natural speech.

    DOI: 10.21437/Interspeech.2018-1239

  • Sound sensing using smartphones as a crowdsourcing approach Reviewed International journal

    Sunao Hara, Asako Hatakeyama, Shota Kobayashi, Masanobu Abe

    2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2017   1328 - 1333   2017.12

    Authorship:Lead author, Corresponding author   Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

    Sounds are one of the most valuable information sources for human beings from the viewpoint of understanding the environment around them. We have been investigating a method of detecting and visualizing crowded situations in the city through sound sensing. For this purpose, we have developed a sound collection system oriented to a crowdsourcing approach and carried out sound collection in two Japanese cities, Okayama and Kurashiki. In this paper, we present an overview of the sound collections. Then, to show the effectiveness of analysis using sensed sounds, we profile characteristics of the cities through the visualization results of the sound.

    DOI: 10.1109/APSIPA.2017.8282238

    Web of Science

    researchmap

  • An investigation to transplant emotional expressions in DNN-based TTS synthesis Reviewed International journal

    Katsuki Inoue, Sunao Hara, Masanobu Abe, Nobukatsu Hojo, Yusuke Ijima

    2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2017   1253 - 1258   2017.12

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

    In this paper, we investigate deep neural network (DNN) architectures to transplant emotional expressions to improve the expressiveness of DNN-based text-to-speech (TTS) synthesis. DNNs are expected to be powerful at mapping between linguistic information and acoustic features. From multi-speaker and/or multi-language perspectives, several types of DNN architecture have been proposed and have shown good performance. We tried to extend the idea to transplanting emotion by constructing shared emotion-dependent mappings. The following three types of DNN architecture are examined: (1) the parallel model (PM) with an output layer consisting of both speaker-dependent layers and emotion-dependent layers, (2) the serial model (SM) with an output layer consisting of emotion-dependent layers preceded by speaker-dependent hidden layers, and (3) the auxiliary input model (AIM) with an input layer consisting of emotion and speaker IDs as well as linguistic feature vectors. The DNNs were trained using neutral speech uttered by 24 speakers, and sad speech and joyful speech uttered by 3 of those 24 speakers. In terms of unseen emotional synthesis, subjective evaluation tests showed that the PM performs much better than the SM and slightly better than the AIM. In addition, the tests showed that the SM is the best of the three models when the training data includes emotional speech uttered by the target speaker.

    DOI: 10.1109/APSIPA.2017.8282231

    Web of Science

    researchmap

  • New monitoring scheme for persons with dementia through monitoring-area adaptation according to stage of disease Reviewed International journal

    Shigeki Kamada, Yuji Matsuo, Sunao Hara, Masanobu Abe

    Proceedings of the 1st ACM SIGSPATIAL Workshop on Recommendations for Location-based Services and Social Networks, LocalRec@SIGSPATIAL 2017   1:1 - 1:7   2017.11

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:ACM  

    DOI: 10.1145/3148150.3148151

    researchmap

    Other Link: http://doi.acm.org/10.1145/3148150.3148151

  • Prediction of subjective assessments for a noise map using deep neural networks Reviewed International journal

    Shota Kobayashi, Masanobu Abe, Sunao Hara

    Adjunct Proceedings of the 2017 ACM International Joint Conference on Pervasive and Ubiquitous Computing and Proceedings of the 2017 ACM International Symposium on Wearable Computers, UbiComp/ISWC 2017   113 - 116   2017.9

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:ACM  

    In this paper, we investigate a method of creating noise maps that take account of human senses. Physical measurements alone are not enough to design our living environment; we also need to know subjective assessments. To predict subjective assessments from loudness values, we propose to use metadata describing where the recording was made, who made it, and what was recorded. The proposed method is implemented using deep neural networks because these can naturally treat a variety of information types. First, we evaluated its performance in predicting five-point subjective loudness levels based on a combination of several features: location-specific, participant-specific, and sound-specific features. The proposed method achieved a 16.3-point increase compared with the baseline method. Next, we evaluated its performance based on noise map visualization results. The proposed noise maps were generated from the predicted subjective loudness levels. Considering the differences between the two visualizations, the proposed method made fewer errors than the baseline method.

    DOI: 10.1145/3123024.3123091

    Web of Science

    researchmap

    Other Link: http://doi.acm.org/10.1145/3123024.3123091

  • Speaker Dependent Approach for Enhancing a Glossectomy Patient’s Speech via GMM-Based Voice Conversion

    Kei Tanaka, Sunao Hara, Masanobu Abe, Masaaki Sato, Shogo Minagi

    Interspeech 2017   2017.8

     More details

    Publishing type:Research paper (international conference proceedings)   Publisher:ISCA  

    DOI: 10.21437/interspeech.2017-841

    Web of Science

    researchmap

  • Speaker Dependent Approach for Enhancing a Glossectomy Patient's Speech via GMM-based Voice Conversion

    Kei Tanaka, Sunao Hara, Masanobu Abe, Masaaki Sato, Shogo Minagi

    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6   3384 - 3388   2017

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:ISCA-INT SPEECH COMMUNICATION ASSOC  

    In this paper, we propose using a GMM-based voice conversion algorithm to generate speaker-dependent mapping functions that improve the intelligibility of speech uttered by patients with a wide glossectomy. The speaker-dependent approach makes it possible to generate mapping functions that reconstruct the missing spectrum features of a patient's speech without being influenced by speaker factors. The proposed idea is simple, i.e., to collect speech uttered by a patient before and after the glossectomy, but in practice it is hard to ask patients to utter speech just for developing algorithms. To confirm the performance of the proposed approach, we simulated glossectomy patients by fabricating an intraoral appliance that covers the lower dental arch and tongue surface to restrain tongue movements. In terms of the Mel-frequency cepstrum (MFC) distance, applying the voice conversion reduced the distances by 25% and 42% for the speaker-dependent case and the speaker-independent case, respectively. In terms of phoneme intelligibility, dictation tests revealed that speech reconstructed by the speaker-dependent approach almost always showed better performance than the original speech uttered by simulated patients, while the speaker-independent approach did not.

    DOI: 10.21437/Interspeech.2017-841

    Web of Science

    researchmap

  • Enhancing a glossectomy patient's speech via GMM-based voice conversion Reviewed International journal

    Kei Tanaka, Sunao Hara, Masanobu Abe, Shogo Minagi

    2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA)   1 - 4   2016.12

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

    DOI: 10.1109/apsipa.2016.7820909

    Web of Science

    researchmap

  • LiBS: Lifelog browsing system to support sharing of memories Reviewed International journal

    Atsuya Namba, Sunao Hara, Masanobu Abe

    UbiComp 2016 Adjunct - Proceedings of the 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing   165 - 168   2016.9

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:Association for Computing Machinery, Inc  

    We propose a lifelog browsing system through which users can share memories of their experiences with other users. Most importantly, by using global positioning system data and time stamps, the system simultaneously displays a variety of log information in a time-synchronous manner. This function gives users not only an easy interpretation of other users' experiences but also nonverbal notifications. Shared information on this system includes photographs taken by users, Google street views, shops and restaurants on the map, daily weather, and other items relevant to users' interests. In evaluation experiments, users preferred the proposed system to conventional photograph albums and maps for explaining and sharing their experiences. Moreover, through the displayed information, listeners discovered items of interest that the speakers had not mentioned.

    DOI: 10.1145/2968219.2971401

    Scopus

    researchmap

    Other Link: http://doi.acm.org/10.1145/2968219.2971401

  • Safety vs. Privacy: User preferences from the monitored and monitoring sides of a monitoring system Reviewed International journal

    Shigeki Kamada, Sunao Hara, Masanobu Abe

    UbiComp 2016 Adjunct - Proceedings of the 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing   101 - 104   2016.9

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:Association for Computing Machinery, Inc  

    In this study, in order to develop a monitoring system that takes into account privacy issues, we investigated user preferences in terms of the monitoring and privacy protection levels. The people on the monitoring side wanted the monitoring system to allow them to monitor in detail. Conversely, it was observed for the people being monitored that the more detailed the monitoring, the greater the feelings of being surveilled intrusively. Evaluation experiments were performed using the location data of three people in different living areas. The results of the experiments show that it is possible to control the levels of monitoring and privacy protection without being affected by the shape of a living area by adjusting the quantization level of location information. Furthermore, it became clear that the granularity of location information satisfying the people on the monitored side and the monitoring side is different.

    DOI: 10.1145/2968219.2971412

    Scopus

    researchmap

    Other Link: http://doi.acm.org/10.1145/2968219.2971412
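
    The idea of adjusting the quantization level of location information can be illustrated with a minimal sketch. It assumes a plain latitude/longitude grid; the function name `grid_cell`, the cell sizes, and the coordinates are hypothetical and are not taken from the paper.

```python
import math


def grid_cell(lat, lon, cell_deg):
    """Map a (lat, lon) pair in degrees to integer grid-cell indices.

    cell_deg is the cell size in degrees (an assumed parameter);
    larger cells reveal less detail about the monitored person.
    """
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


# Hypothetical coordinates of two nearby places.
home = (34.6617, 133.9354)
shop = (34.6652, 133.9378)

# Fine quantization: the two places fall into different cells.
fine = [grid_cell(lat, lon, 0.001) for lat, lon in (home, shop)]

# Coarse quantization: both places collapse into the same cell.
coarse = [grid_cell(lat, lon, 0.01) for lat, lon in (home, shop)]
```

    With the fine grid the two places remain distinguishable, whereas the coarse grid reports only a shared cell, which is one simple way to trade monitoring detail for privacy.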

  • Sound collection systems using a crowdsourcing approach to construct sound map based on subjective evaluation Reviewed International journal

    Sunao Hara, Shota Kobayashi, Masanobu Abe

    IEEE ICME Workshop on Multimedia Mobile Cloud for Smart City Applications (MMCloudCity-2016)   1 - 6   2016

     More details

    Authorship:Lead author   Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

    This paper presents a sound collection system that uses crowdsourcing to gather information for visualizing area characteristics. First, we developed a sound collection system to simultaneously collect physical sounds, their statistics, and subjective evaluations. We then conducted a sound collection experiment using the developed system with 14 participants. We collected 693,582 samples of equivalent A-weighted loudness levels and their locations, and 5,935 samples of sounds and their locations. The data also include subjective evaluations by the participants. In addition, we analyzed the changes in sound properties of some areas before and after the opening of a large-scale shopping mall in a city. Next, we implemented visualizations on the server system to attract users' interests. Finally, we published the system, which can receive sounds from any Android smartphone user. The sound data were continuously collected and achieved a specified result.

    DOI: 10.1109/ICMEW.2016.7574694

    Web of Science

    researchmap

  • A Spoken Dialog System with Redundant Response to Prevent User Misunderstanding Reviewed International journal

    Masaki Yamaoka, Sunao Hara, Masanobu Abe

    Proceedings of APSIPA Annual Summit and Conference 2015   229 - 232   2015.12

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

    We propose a spoken dialog strategy for car navigation systems to facilitate safe driving. To drive safely, drivers need to concentrate on their driving; however, their concentration may be disrupted by disagreement with their spoken dialog system. Therefore, we need to solve the problems of user misunderstandings as well as misunderstandings by spoken dialog systems. For this purpose, we introduced a driver workload level in spoken dialog management in order to prevent user misunderstandings. A key strategy of the dialog management is to make speech redundant if the driver's workload is too high, on the assumption that the user is likely to misunderstand the system utterance under such a condition. An experiment was conducted to compare the performance of the proposed method and a conventional method using a user simulator. The simulator was developed under the assumption of two types of drivers: an experienced driver model and a novice driver model. Experimental results showed that the proposed strategies achieved better performance than the conventional one in task completion time, task completion rate, and the user's positive speech rate. In particular, these performance differences are greater for novice users than for experienced users.

    DOI: 10.1109/APSIPA.2015.7415511

    Web of Science

    researchmap

  • Extracting Daily Patterns of Human Activity Using Non-Negative Matrix Factorization Reviewed International journal

    Masanobu Abe, Akihiko Hirayama, Sunao Hara

    Proceedings of IEEE International Conference on Consumer Electronics (IEEE-ICCE 2015)   36 - 39   2015.1

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

    This paper presents an algorithm to mine basic patterns of human activities on a daily basis using non-negative matrix factorization (NMF). The greatest benefit of the algorithm is that it can elicit patterns from which meanings can be easily interpreted. To confirm its performance, the proposed algorithm was applied to PC logging data collected from three occupations in offices. Daily patterns of software usage were extracted for each occupation. Results show that each occupation uses specific software in its own time period, and uses several types of software in parallel in its own combinations. Experimental results also show that patterns of 144-dimensional vectors were compressible to those of 11-dimensional vectors without degradation in occupation classification performance. Therefore, the proposed algorithm compressed basic software usage patterns to about one-tenth of their original dimensions while preserving the original information. Moreover, the extracted basic patterns showed reasonable interpretation of daily working patterns in offices.

    DOI: 10.1109/ICCE.2015.7066309

    Web of Science

    researchmap
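
    As an illustration of how NMF extracts non-negative basis patterns, the following is a minimal pure-Python sketch. It assumes the standard Lee-Seung multiplicative updates for squared error, which the abstract does not specify; the toy matrix `V`, the rank, and all function names are hypothetical.

```python
import random


def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def nmf(V, rank, iters=200, seed=0, eps=1e-9):
    """Approximate non-negative V (n x m) as W (n x rank) times H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), element-wise
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[H[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return W, H


def reconstruction_error(V, W, H):
    """Squared Frobenius norm of V - W H."""
    WH = matmul(W, H)
    return sum((v - a) ** 2 for rv, ra in zip(V, WH) for v, a in zip(rv, ra))


# Toy data: each row is one day, each column one time slot (hypothetical).
V = [[2, 2, 0, 0],
     [0, 0, 3, 3],
     [1, 1, 1, 1],
     [2, 2, 1, 1]]
W, H = nmf(V, rank=2)
```

    In this sketch the rows of `H` play the role of basis patterns and each row of `W` gives the non-negative weights with which a day combines them, which is why the factors tend to be easy to interpret.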

  • Sub-Band Text-to-Speech Combining Sample-Based Spectrum with Statistically Generated Spectrum Reviewed International journal

    Tadashi Inai, Sunao Hara, Masanobu Abe, Yusuke Ijima, Noboru Miyazaki, Hideyuki Mizuno

    Proceedings of Interspeech 2015   264 - 268   2015

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:ISCA  

    In this paper, we propose a sub-band speech synthesis approach to develop a high-quality Text-to-Speech (TTS) system: a sample-based spectrum is used in the high-frequency band, and a spectrum generated by HMM-based TTS is used in the low-frequency band. Here, a sample-based spectrum means a spectrum selected from a phoneme database such that it is the most similar to the spectrum generated by HMM-based speech synthesis. A key idea is to compensate for the over-smoothing caused by statistical procedures by introducing a sample-based spectrum, especially in the high-frequency band. Listening test results show that the proposed method performs better than HMM-based speech synthesis in terms of clarity. It is at the same level as HMM-based speech synthesis in terms of smoothness. In addition, preference test results among the proposed method, HMM-based speech synthesis, and waveform speech synthesis using 80 minutes of speech data reveal that the proposed method is the most preferred.

    Web of Science

    researchmap

    Other Link: http://dblp.uni-trier.de/db/conf/interspeech/interspeech2015.html#conf/interspeech/InaiHAIMM15
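
    The sub-band idea can be sketched in a few lines. The sketch assumes that spectra are plain lists of magnitudes, that similarity is measured by squared Euclidean distance, and that the bands are split at a fixed bin; the abstract specifies none of these, and all names and values are hypothetical.

```python
def select_sample(target, database):
    """Return the database spectrum closest to target.

    Squared Euclidean distance is an assumed similarity measure.
    """
    return min(database,
               key=lambda cand: sum((c - t) ** 2 for c, t in zip(cand, target)))


def combine_subbands(low_band_source, high_band_source, cutoff_bin):
    """Take bins below cutoff_bin from one spectrum and the rest from another."""
    if len(low_band_source) != len(high_band_source):
        raise ValueError("spectra must have the same number of bins")
    return list(low_band_source[:cutoff_bin]) + list(high_band_source[cutoff_bin:])


# Hypothetical statistically generated spectrum (over-smoothed at high bins).
generated = [1.0, 0.8, 0.5, 0.2, 0.1, 0.1]

# Hypothetical database of sample spectra.
database = [
    [0.9, 0.9, 0.4, 0.6, 0.5, 0.3],
    [0.2, 0.1, 0.9, 0.9, 0.8, 0.7],
]

best = select_sample(generated, database)
hybrid = combine_subbands(generated, best, cutoff_bin=3)
```

    Here `hybrid` keeps the generated spectrum in the low band and the selected sample in the high band, mirroring the division of roles described in the abstract.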

  • Sound collection and visualization system enabled participatory and opportunistic sensing approaches Reviewed International journal

    Sunao Hara, Masanobu Abe, Noboru Sonehara

    Proceedings of 2015 IEEE International Conference on Pervasive Computing and Communication Workshops (PerCom Workshops)   390 - 395   2015

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

    This paper presents a sound collection system to visualize environmental sounds that are collected using a crowdsourcing approach. An analysis of physical features is generally used to analyze sound properties; however, human beings not only analyze but also emotionally connect to sounds. If we want to visualize the sounds according to the characteristics of the listener, we need to collect not only the raw sound, but also the subjective feelings associated with them. For this purpose, we developed a sound collection system using a crowdsourcing approach to collect physical sounds, their statistics, and subjective evaluations simultaneously. We then conducted a sound collection experiment using the developed system with ten participants. We collected 6,257 samples of equivalent loudness levels and their locations, and 516 samples of sounds and their locations. Subjective evaluations by the participants are also included in the data. Next, we tried to visualize the sound on a map. The loudness levels are visualized as a color map and the sounds are visualized as icons which indicate the sound type. Finally, we conducted a discrimination experiment on the sound to implement a function of automatic conversion from sounds to appropriate icons. The classifier is trained on the basis of the GMM-UBM (Gaussian Mixture Model and Universal Background Model) method. Experimental results show that the F-measure is 0.52 and the AUC is 0.79.

    DOI: 10.1109/PERCOMW.2015.7134069

    Web of Science

    researchmap

    Other Link: https://ousar.lib.okayama-u.ac.jp/ja/53271

  • Algorithm to Estimate a Living Area Based on Connectivity of Places with Home Reviewed International journal

    Yuji Matsuo, Sunao Hara, Masanobu Abe

    HCI International 2015 - Posters’ Extended Abstracts (Part II), CCIS 529   529   570 - 576   2015

     More details

    Language:English   Publishing type:Part of collection (book)   Publisher:SPRINGER-VERLAG BERLIN  

    We propose an algorithm to estimate a person's living area using his/her collected Global Positioning System (GPS) data. The most important feature of the algorithm is the connectivity of places with a home, i.e., a living area must consist of a home, important places, and routes that connect them. This definition is logical because people usually go to a place from home, and there can be several routes to that place. Experimental results show that the proposed algorithm can estimate a living area with a precision of 0.82 and a recall of 0.86 compared with the ground truth established by users. It is also confirmed that the connectivity of places with a home is necessary to estimate a reasonable living area.

    DOI: 10.1007/978-3-319-21383-5_95

    Web of Science

    researchmap
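
    To make the precision and recall figures concrete, the following sketch shows how the two measures could be computed if both the estimated living area and the ground truth were represented as sets of grid cells. That representation, the function name, and the cell values are assumptions for illustration only.

```python
def precision_recall(estimated, truth):
    """Compute precision and recall of an estimated area against ground truth.

    Both arguments are collections of hashable cell identifiers.
    """
    estimated, truth = set(estimated), set(truth)
    hits = len(estimated & truth)
    precision = hits / len(estimated) if estimated else 0.0
    recall = hits / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical grid cells, written as (row, column) pairs.
estimated_cells = {(0, 0), (0, 1), (1, 1), (2, 1), (2, 2)}
true_cells = {(0, 1), (1, 1), (2, 1), (2, 2), (3, 2), (3, 3)}

precision, recall = precision_recall(estimated_cells, true_cells)
```

    Precision is the share of estimated cells that the user confirmed, and recall is the share of confirmed cells that the estimate covered.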

  • Extraction of Key Segments from Day-Long Sound Data Reviewed International journal

    Akinori Kasai, Sunao Hara, Masanobu Abe

    HCI International 2015 - Posters’ Extended Abstracts (Part I), CCIS 528   528   620 - 626   2015

     More details

    Language:English   Publishing type:Part of collection (book)   Publisher:SPRINGER-VERLAG BERLIN  

    We propose a method to extract particular sound segments from the sound recorded during the course of a day in order to provide sound segments that can be used to facilitate memory. To extract important parts of the sound data, the proposed method utilizes human behavior based on a multisensing approach. To evaluate the performance of the proposed method, we conducted experiments using sound, acceleration, and global positioning system data collected by five participants for approximately two weeks. The experimental results are summarized as follows: (1) various sounds can be extracted by dividing a day into scenes using the acceleration data; (2) sound recorded in unusual places is preferable to sound recorded in usual places; and (3) speech is preferable to nonspeech sound.

    DOI: 10.1007/978-3-319-21380-4_105

    Web of Science

    researchmap

  • Inhibitory Effects of an Orally Active Small Molecule Alpha4beta1/Alpha4beta7 Integrin Antagonist, TRK-170, on Spontaneous Colitis in HLA-B27 Transgenic Rats

    Hiroe Hirokawa, Yoko Koga, Rie Sasaki, Sunao Hara, Hiroyuki Meguro, Mie Kainoh

    GASTROENTEROLOGY   146 ( 5 )   S640 - S640   2014.5

     More details

    Language:English   Publisher:W B SAUNDERS CO-ELSEVIER INC  

    Web of Science

    researchmap

  • A graph-based spoken dialog strategy utilizing multiple understanding hypotheses Reviewed

    Norihide Kitaoka, Yuji Kinoshita, Sunao Hara, Chiyomi Miyajima, Kazuya Takeda

    Information and Media Technologies   9 ( 1 )   111 - 120   2014.3

     More details

    Language:English   Publishing type:Research paper (scientific journal)   Publisher:Information and Media Technologies Editorial Board  

    We regarded a dialog strategy for information retrieval as a graph search problem and proposed several novel dialog strategies that can recover from misrecognition through a spoken dialog that traverses the graph. To recover from misrecognition without seeking confirmation, our system kept multiple understanding hypotheses at each turn and searched for a globally optimal hypothesis in the graph whose nodes express understanding states across user utterances in a whole dialog. In the search, we used a new criterion based on efficiency in information retrieval and consistency with understanding hypotheses, which is also used to select an appropriate system response. We showed that our system can make more efficient and natural dialogs than previous ones.

    DOI: 10.11185/imt.9.111

    CiNii Article

    researchmap

  • New approach to emotional information exchange: Experience metaphor based on life logs Reviewed International journal

    Masanobu Abe, Daisuke Fujioka, Kazuto Hamano, Sunao Hara, Rika Mochizuki, Tomoki Watanabe

    2014 IEEE International Conference on Pervasive Computing and Communication Workshops, PerCom 2014 Workshops   191 - 194   2014.3

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

    We are striving to develop a new communication technology based on individual experiences that can be extracted from life logs. We have proposed the "Emotion Communication Model" and confirmed that a significant correlation exists between experience and emotion. As the second step, focusing on impressive places and events, this paper investigates the extent to which we can share emotional information with others through individuals' experiences. Subjective experiments were conducted using life log data collected over periods of 7 to 47 months. Experimental results show that (1) impressive places are determined by the distance from home, visit frequency, and direction from home, and that (2) positive emotional information is highly consistent among people (71.4%), whereas negative emotional information is not. Therefore, experiences are useful as metaphors to express positive emotional information.

    DOI: 10.1109/PerComW.2014.6815198

    researchmap

  • Development of a Toolkit Handling Multiple Speech-Oriented Guidance Agents for Mobile Applications Reviewed International journal

    Sunao Hara, Hiromichi Kawanami, Hiroshi Saruwatari, Kiyohiro Shikano

    Natural Interaction with Robots, Knowbots and Smartphones, Putting Spoken Dialog Systems into Practice   79 - 85   2014

     More details

    Authorship:Lead author   Language:English   Publishing type:Part of collection (book)   Publisher:Springer  

    DOI: 10.1007/978-1-4614-8280-2_8

    researchmap

  • Evaluation of Invalid Input Discrimination Using Bag-of-Words for Speech-Oriented Guidance System Reviewed International journal

    Haruka Majima, Rafael Torres, Hiromichi Kawanami, Sunao Hara, Tomoko Matsui, Hiroshi Saruwatari, Kiyohiro Shikano

    Natural Interaction with Robots, Knowbots and Smartphones, Putting Spoken Dialog Systems into Practice   389 - 397   2014

     More details

    Language:English   Publishing type:Part of collection (book)   Publisher:Springer  

    DOI: 10.1007/978-1-4614-8280-2_35

    researchmap

  • A Hybrid Text-to-Speech Based on Sub-Band Approach Reviewed International journal

    Takuma Inoue, Sunao Hara, Masanobu Abe

    Proceedings of Asia-Pacific Signal and Information Processing Association 2014 Annual Summit and Conference   1 - 4   2014

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

    This paper proposes a sub-band speech synthesis approach to develop high-quality Text-to-Speech (TTS). For the low-frequency band and high-frequency band, Hidden Markov Model (HMM)-based speech synthesis and waveform-based speech synthesis are used, respectively. Both speech synthesis methods are widely known to show good performance and to have benefits and shortcomings from different points of view. One motivation is to apply the right speech synthesis method in the right frequency band. Experimental results show that the proposed approach performs better than waveform-based speech synthesis in terms of smoothness, and better than HMM-based speech synthesis in terms of clarity. Consequently, the proposed approach combines the inherent benefits of both waveform-based speech synthesis and HMM-based speech synthesis.

    DOI: 10.1109/APSIPA.2014.7041575

    Web of Science

    researchmap

  • Invalid Input Rejection Using Bag-of-Words for Speech-oriented Guidance System Reviewed

    Haruka Majima, Yoko Fujita, Rafael Torres, Hiromichi Kawanami, Sunao Hara, Tomoko Matsui, Hiroshi Saruwatari, Kiyohiro Shikano

    Journal of Information Processing   54 ( 2 )   443 - 451   2013.2

     More details

    Language:Japanese  

    In a speech-oriented information guidance system operating in a real environment, discriminating between valid and invalid inputs is important because invalid inputs such as noise, laughter, coughing, and conversation between users lead to unpredictable system responses. Generally, acoustic features such as MFCCs (Mel-Frequency Cepstral Coefficients) are used for discrimination. Comparing the acoustic likelihoods of GMMs (Gaussian Mixture Models) for speech data and noise data is one of the typical methods. In addition, linguistic features such as speech recognition results are expected to improve discrimination accuracy because they reflect the task domain of invalid inputs and the meaningless recognition results produced by noise inputs. In this paper, we introduce Bag-of-Words (BOW) as a feature to discriminate between valid and invalid inputs. Support Vector Machine (SVM) and Maximum Entropy method (ME) are also employed to realize robust classification. We evaluated the methods using real-environment data obtained from the guidance system "Takemaru-kun." By applying BOW with an SVM, the F-measure improved to 85.09% from the 82.19% obtained with GMMs. In addition, experiments using features combining BOW with acoustic likelihoods from GMMs, duration, and SNR improved the F-measure to 86.58%.

    CiNii Article

    CiNii Books

    researchmap

  • On-line detection of task incompletion for spoken dialog systems based on utterance and behavior tag N-gram Reviewed

    Sunao Hara, Norihide Kitaoka, Kazuya Takeda

    The IEICE Transactions on Information and Systems (Japanese edition)   J96-D ( 1 )   81 - 93   2013.1

     More details

    Authorship:Lead author   Language:Japanese   Publisher:The Institute of Electronics, Information and Communication Engineers  

    CiNii Article

    CiNii Books

    researchmap

    Other Link: http://search.ieice.org/bin/summary.php?id=j96-d_1_81&category=D&year=2013&lang=J&abst=

  • Acoustic Model Training Using Pseudo-Speaker Features Generated by MLLR Transformations for Robust Speaker-Independent Speech Recognition Reviewed

    Arata Itoh, Sunao Hara, Norihide Kitaoka, Kazuya Takeda

    IEICE TRANSACTIONS on Information and Systems   E95D ( 10 )   2479 - 2485   2012.10

     More details

    Language:English   Publishing type:Research paper (scientific journal)   Publisher:IEICE-INST ELECTRONICS INFORMATION COMMUNICATIONS ENG  

    A novel speech feature generation-based acoustic model training method for robust speaker-independent speech recognition is proposed. For decades, speaker adaptation methods have been widely used. All of these adaptation methods need adaptation data. However, our proposed method aims to create speaker-independent acoustic models that cover not only known but also unknown speakers. We achieve this by adopting inverse maximum likelihood linear regression (MLLR) transformation-based feature generation, and then we train our models using these features. First, we obtain MLLR transformation matrices from a limited number of existing speakers. Then we extract the bases of the MLLR transformation matrices using PCA. The distribution of the weight parameters that express the transformation matrices for the existing speakers is estimated. Next, we construct pseudo-speaker transformations by sampling the weight parameters from the distribution, and apply the transformation to the normalized features of the existing speaker to generate the features of the pseudo-speakers. Finally, using these features, we train the acoustic models. Evaluation results show that the acoustic models trained using our proposed method are robust for unknown speakers.

    DOI: 10.1587/transinf.E95.D.2479

    Web of Science

    researchmap

    Other Link: http://search.ieice.org/bin/summary.php?id=e95-d_10_2479

  • Causal analysis of task completion errors in spoken music retrieval interactions Reviewed International journal

    Sunao Hara, Norihide Kitaoka, Kazuya Takeda

    Proceedings of the 8th international conference on Language Resources and Evaluation (LREC 2012)   1365 - 1372   2012.5

     More details

    Authorship:Lead author   Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:EUROPEAN LANGUAGE RESOURCES ASSOC-ELRA  

    In this paper, we analyze the causes of task completion errors in spoken dialog systems, using a decision tree with N-gram features of the dialog to detect task-incomplete dialogs. The dialog for a music retrieval task is described by a sequence of tags related to user and system utterances and behaviors. The dialogs are manually classified into two classes: completed and uncompleted music retrieval tasks. Differences in tag classification performance between the two classes are discussed. We then construct decision trees which can detect if a dialog finished with the task completed or not, using information gain criterion. Decision trees using N-grams of manual tags and automatic tags achieved 74.2% and 80.4% classification accuracy, respectively, while the tree using interaction parameters achieved an accuracy rate of 65.7%. We also discuss more details of the causality of task incompletion for spoken dialog systems using such trees.

    Web of Science

    researchmap

    Other Link: http://www.lrec-conf.org/proceedings/lrec2012/summaries/1059.html
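
As a rough illustration of the information gain criterion mentioned in the abstract above, the sketch below scores a single tag N-gram by how well its presence separates completed from incomplete dialogs. It is not the authors' code: the tag names, the toy dialogs, and the binary presence feature are hypothetical choices made for the example.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a non-empty list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def ngrams(tags, n):
    """Set of tag N-grams occurring in one dialog."""
    return {tuple(tags[i:i + n]) for i in range(len(tags) - n + 1)}

def information_gain(dialogs, labels, gram):
    """Entropy reduction from splitting dialogs on presence of `gram`."""
    n = len(gram)
    with_gram = [y for d, y in zip(dialogs, labels) if gram in ngrams(d, n)]
    without = [y for d, y in zip(dialogs, labels) if gram not in ngrams(d, n)]
    remainder = sum(
        len(part) / len(labels) * entropy(part)
        for part in (with_gram, without) if part
    )
    return entropy(labels) - remainder

# Hypothetical tag sequences; the tag names are invented for illustration.
dialogs = [
    ["USER_REQUEST", "SYS_CONFIRM", "USER_YES", "SYS_PLAY"],
    ["USER_REQUEST", "SYS_CONFIRM", "USER_YES", "SYS_PLAY"],
    ["USER_REQUEST", "SYS_REJECT", "USER_REQUEST", "SYS_REJECT"],
    ["USER_REQUEST", "SYS_REJECT", "USER_QUIT"],
]
labels = ["completed", "completed", "incomplete", "incomplete"]

gain_reject = information_gain(dialogs, labels, ("USER_REQUEST", "SYS_REJECT"))
gain_request = information_gain(dialogs, labels, ("USER_REQUEST",))
```

In this toy data the bigram ending in a rejection separates the two classes perfectly, while the unigram that appears in every dialog carries no information. A decision tree learner would pick the highest-gain feature at each node.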

  • Robust seed model training for speaker adaptation using pseudo-speaker features generated by inverse CMLLR transformation Reviewed International journal

    Arata Itoh, Sunao Hara, Norihide Kitaoka, Kazuya Takeda

    Proceedings of 2011 Automatic Speech Recognition and Understanding Workshop (ASRU 2011)   169 - 172   2011.12

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

    In this paper, we propose a novel acoustic model training method which is suitable for speaker adaptation in speech recognition. Our method is based on feature generation from a small amount of speakers' data. For decades, speaker adaptation methods have been widely used. Such adaptation methods need some amount of adaptation data, and if the data is not sufficient, speech recognition performance degrades significantly. If the seed models to be adapted to a specific speaker can widely cover more speakers, speaker adaptation can perform robustly. To make such robust seed models, we adopt inverse maximum likelihood linear regression (MLLR) transformation-based feature generation, and then train our seed models using these features. First, we obtain MLLR transformation matrices from a limited number of existing speakers. Then we extract the bases of the MLLR transformation matrices using PCA. The distribution of the weight parameters that express the MLLR transformation matrices for the existing speakers is estimated. Next, we generate pseudo-speaker MLLR transformations by sampling the weight parameters from the distribution, and apply the inverse of the transformation to the normalized existing speaker features to generate the pseudo-speakers' features. Finally, using these features, we train the acoustic seed models. Using these seed models, we obtained better speaker adaptation results than with models that were simply environmentally adapted. © 2011 IEEE.

    DOI: 10.1109/ASRU.2011.6163925

    Scopus

    researchmap

  • Training Robust Acoustic Models Using Features of Pseudo-Speakers Generated by Inverse CMLLR Transformations Reviewed International journal

    Arata Itoh, Sunao Hara, Norihide Kitaoka, Kazuya Takeda

    Proceedings of 2011 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC 2011)   1 - 5   2011.12

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:APSIPA  

    In this paper, a novel speech feature generation-based acoustic model training method is proposed. For decades, speaker adaptation methods have been widely used. All existing adaptation methods need adaptation data. However, our proposed method creates speaker-independent acoustic models that cover not only known but also unknown speakers. We do this by adopting inverse maximum likelihood linear regression (MLLR) transformation-based feature generation, and then train our models using these features. First, we obtain MLLR transformation matrices from a limited number of existing speakers. Then we extract the bases of the MLLR transformation matrices using PCA. The distribution of the weight parameters that express the MLLR transformation matrices for the existing speakers is estimated. Next, we generate pseudo-speaker MLLR transformations by sampling the weight parameters from the distribution, and apply the inverse of the transformation to the normalized existing speaker features to generate the pseudo-speakers' features. Finally, using these features, we train the acoustic models. Evaluation results show that the resulting acoustic models are robust for unknown speakers.

    Scopus

    researchmap

  • On-line detection of task incompletion for spoken dialog systems using utterance and behavior tag N-gram vectors Reviewed International journal

    Sunao Hara, Norihide Kitaoka, Kazuya Takeda

    Proceedings of the Paralinguistic Information and Its Integration in Spoken Dialogue Systems Workshop   215 - 225   2011.9

     More details

    Authorship:Lead author   Language:English   Publishing type:Research paper (international conference proceedings)  

    researchmap

  • Detection of task-incomplete dialogs based on utterance-and-behavior tag N-gram for spoken dialog systems Reviewed International journal

    Sunao Hara, Norihide Kitaoka, Kazuya Takeda

    Proceedings of INTERSPEECH 2011   1312 - 1315   2011

     More details

    Authorship:Lead author   Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:ISCA-INT SPEECH COMMUNICATION ASSOC  

    We propose a method of detecting "task incomplete" dialogs in spoken dialog systems using N-gram-based dialog models. We used a database created during a field test in which inexperienced users used a client-server music retrieval system with a spoken dialog interface on their own PCs. In this study, the dialog for a music retrieval task consisted of a sequence of user and system tags that related their utterances and behaviors. The dialogs were manually classified into two classes: the dialog either completed the music retrieval task or it didn't. We then detected dialogs that did not complete the task, using N-gram probability models or a Support Vector Machine with N-gram feature vectors trained using manually classified dialogs. Off-line and on-line detection experiments were conducted on a large amount of real data, and the results show that our proposed method achieved good classification performance.

    Web of Science

    researchmap

    Other Link: http://www.isca-speech.org/archive/interspeech_2011/i11_1305.html

  • Music Recommendation System Based on Human-to-human Conversation Recognition Reviewed International journal

    Hiromasa Ohashi, Sunao Hara, Norihide Kitaoka, Kazuya Takeda

    Workshop Proceedings of the 7th International Conference on Intelligent Environments: Ambient Intelligence and Smart Environments   10   352 - 361   2011

     More details

    Language:English   Publishing type:Part of collection (book)   Publisher:IOS PRESS  

    We developed an ambient system that plays music suitable for the mood of a human-human conversation using words obtained from a continuous-speech recognition system. Using the correspondence between a document space based on the texts related to the music and an acoustic space that expresses various audio features, the continuous-speech recognition results are mapped to an acoustic space. We performed a subjective evaluation of the system. The subjects rated the recommended music and the result reveals that the 10 most highly recommended selections included suitable music.

    DOI: 10.3233/978-1-60750-795-6-352

    Web of Science

    researchmap

  • Automatic detection of task-incompleted dialog for spoken dialog system based on dialog act N-gram Reviewed International journal

    Sunao Hara, Norihide Kitaoka, Kazuya Takeda

    Proceedings of INTERSPEECH2010   3034 - 3037   2010.9

     More details

    Authorship:Lead author   Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:ISCA  

    In this paper, we propose a method of detecting users who did not complete their task in a spoken dialog system, using an N-gram-based dialog history model. We collected a large amount of spoken dialog data accompanied by usability evaluation scores given by users in real environments. The database was created through a field test in which naive users used a client-server music retrieval system with a spoken dialog interface on their own PCs. An N-gram model was trained from sequences that consist of user dialog acts and/or system dialog acts for two dialog classes: dialogs that completed the music retrieval task and dialogs that did not. The system then detects unseen dialogs that did not complete the task based on the N-gram likelihood. Experiments were conducted on a large amount of real data, and the results show that our proposed method achieved good classification performance. When the classifier correctly detected all of the task-incomplete dialogs, our proposed method achieved a false detection rate of 6%.

    Web of Science

    researchmap

    Other Link: http://www.isca-speech.org/archive/interspeech_2010/i10_3034.html
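
The likelihood-based detection described in the abstract above can be sketched as one N-gram model per dialog class, with an unseen dialog assigned to the class whose model scores it highest. The sketch below is only an assumption-laden illustration: it uses bigrams with add-one smoothing, and the dialog-act labels and the `BigramModel` and `classify` names are invented for the example rather than taken from the paper.

```python
import math
from collections import Counter

class BigramModel:
    """Add-one smoothed bigram model over dialog-act sequences."""

    def __init__(self, sequences, vocab):
        self.vocab_size = len(vocab) + 1  # +1 for the end marker
        self.bigrams = Counter()
        self.contexts = Counter()
        for seq in sequences:
            padded = ["<s>"] + list(seq) + ["</s>"]
            for prev, cur in zip(padded, padded[1:]):
                self.bigrams[(prev, cur)] += 1
                self.contexts[prev] += 1

    def log_likelihood(self, seq):
        """Smoothed log-probability of a whole dialog-act sequence."""
        padded = ["<s>"] + list(seq) + ["</s>"]
        return sum(
            math.log((self.bigrams[(p, c)] + 1)
                     / (self.contexts[p] + self.vocab_size))
            for p, c in zip(padded, padded[1:])
        )

def classify(seq, models):
    """Return the class whose model gives the highest log-likelihood."""
    return max(models, key=lambda name: models[name].log_likelihood(seq))

# Hypothetical training dialogs; the act labels are invented.
completed = [
    ["REQ", "CONF", "YES", "PLAY"],
    ["REQ", "CONF", "YES", "PLAY"],
    ["REQ", "PLAY"],
]
incomplete = [
    ["REQ", "REJ", "REQ", "REJ"],
    ["REQ", "REJ", "QUIT"],
]
vocab = {act for seq in completed + incomplete for act in seq}
models = {
    "completed": BigramModel(completed, vocab),
    "incomplete": BigramModel(incomplete, vocab),
}

pred_ok = classify(["REQ", "CONF", "YES", "PLAY"], models)
pred_fail = classify(["REQ", "REJ", "REQ", "REJ", "QUIT"], models)
```

Thresholding the difference between the two log-likelihoods, instead of taking the maximum, would allow trading missed detections against false detections.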

  • Rapid acoustic model adaptation using inverse MLLR-based feature generation Reviewed International journal

    Arata Ito, Sunao Hara, Norihide Kitaoka, Kazuya Takeda

    Proceedings of ICA2010   5   1 - 6   2010.8

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    We propose a technique for generating a large amount of target speaker-like speech features by converting a large amount of prepared speech features of many speakers into features similar to those of the target speaker using a transformation matrix. To generate a large amount of target speaker-like features, the system only needs a very small amount of the target speaker's utterances. This technique enables the system to adapt the acoustic model efficiently from a small amount of the target speaker's utterances. To evaluate the proposed method, we prepared 100 reference speakers and 12 target (test) speakers. We conducted the experiments in an isolated word recognition task using a speech database collected by real PC-based distributed environments and compared our proposed method with MLLR, MAP and the method theoretically equivalent to the SAT. Experimental results proved that the proposed method needed a significantly smaller amount of the target speaker's utterances than conventional MLLR, MAP and SAT.

    Scopus

    researchmap

  • Estimation method of user satisfaction using N-gram-based dialog history model for spoken dialog system Reviewed International journal

    Sunao Hara, Norihide Kitaoka, Kazuya Takeda

    LREC 2010 - SEVENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION   78 - 83   2010.5

     More details

    Authorship:Lead author   Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:EUROPEAN LANGUAGE RESOURCES ASSOC-ELRA  

    In this paper, we propose a method for estimating user satisfaction with a spoken dialog system using an N-gram-based dialog history model. We have collected a large amount of spoken dialog data accompanied by usability evaluation scores given by users in real environments. The database was created through a field test in which naive users used a client-server music retrieval system with a spoken dialog interface on their own PCs. An N-gram model is trained from the sequences that consist of users' dialog acts and/or the system's dialog acts for each of six user satisfaction levels: from 1 to 5 and phi (task not completed). Then, the satisfaction level is estimated based on the N-gram likelihood. Experiments were conducted on a large amount of real data, and the results show that our proposed method achieved good classification performance; the classification accuracy was 94.7% in the experiment on classifying dialogs into those with task completion and those without. Even when the classifier correctly detected all of the task-incomplete dialogs, our proposed method achieved a false detection rate of only 6%.

    Web of Science

    researchmap

    Other Link: http://www.lrec-conf.org/proceedings/lrec2010/summaries/579.html

  • Data collection and usability study of a PC-based speech application in various user environments Reviewed International journal

    Sunao Hara, Chiyomi Miyajima, Katsunobu Ito, Norihide Kitaoka, Kazuya Takeda

    Proceedings of Oriental-COCOSDA 2008   39 - 44   2008.11

     More details

    Authorship:Lead author   Language:English   Publishing type:Research paper (international conference proceedings)  

    researchmap

  • In-car Speech Data Collection along with Various Multimodal Signals Reviewed International journal

    Akira Ozaki, Sunao Hara, Takashi Kusakawa, Chiyomi Miyajima, Takanori Nishino, Norihide Kitaoka, Katunobu Itou, Kazuya Takeda

    Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)   1846 - 1851   2008.5

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:EUROPEAN LANGUAGE RESOURCES ASSOC-ELRA  

    In this paper, a large-scale real-world speech database is introduced along with other multimedia driving data. We designed a data collection vehicle equipped with various sensors to synchronously record twelve-channel speech, three-channel video, driving behavior including gas and brake pedal pressures, steering angles, and vehicle velocities, physiological signals including driver heart rate, skin conductance, and emotion-based sweating on the palms and soles, etc. These multimodal data are collected while driving on city streets and expressways under four different driving task conditions including two kinds of monologues, human-human dialog, and human-machine dialog. We investigated the response timing of drivers against navigator utterances and found that most overlapped with the preceding utterance due to the task characteristics and the features of Japanese. When comparing utterance length, speaking rate, and the filler rate of driver utterances in human-human and human-machine dialogs, we found that drivers tended to use longer and faster utterances with more fillers to talk with humans than machines.

    Web of Science

    researchmap

    Other Link: http://www.lrec-conf.org/proceedings/lrec2008/summaries/472.html

  • Data Collection System for the Speech Utterances to an Automatic Speech Recognition System under Real Environments Reviewed

    Sunao Hara, Chiyomi Miyajima, Katsunobu Itou, Kazuya Takeda

    The IEICE transactions on information and systems   J90-D ( 10 )   2807 - 2816   2007.10

     More details

    Authorship:Lead author   Language:Japanese   Publishing type:Research paper (scientific journal)   Publisher:The Institute of Electronics, Information and Communication Engineers  

    CiNii Article

    CiNii Books

    researchmap

  • An online customizable music retrieval system with a spoken dialogue interface Reviewed International journal

    Sunao Hara, Chiyomi Miyajima, Katsunobu Itou, Kazuya Takeda

    The Journal of the Acoustical Society of America   120 ( 5-2 )   3378 - 3379   2006.11

     More details

    Authorship:Lead author   Language:English   Publishing type:Research paper (international conference proceedings)  

    CiNii Article

    researchmap

  • Novel orally active alpha 4 integrin antagonist, T-728, attenuates dextran sodium sulfate-induced chronic colitis in mice

    Ken-Ichi Hayashi, Hiroyuki Meguro, Sunao Hara, Rie Sasaki, Yoko Koga, Meiko Takeshita, Naoyoshi Yamamoto, Hiroe Hirokawa, Mie Kainoh

    GASTROENTEROLOGY   130 ( 4 )   A352 - A352   2006.4

     More details

    Language:English   Publisher:W B SAUNDERS CO-ELSEVIER INC  

    Web of Science

    researchmap

  • Preliminary Study of a Learning Effect on Users to Develop a New Evaluation of the Spoken Dialogue System Reviewed International journal

    Sunao Hara, Ayako Shirose, Chiyomi Miyajima, Katsunobu Ito, Kazuya Takeda

    Proceedings of Oriental-COCOSDA 2005   164 - 168   2005.12

     More details

    Authorship:Lead author   Language:English   Publishing type:Research paper (international conference proceedings)  

    researchmap

MISC

  • 機械学習による環境音からの主観的な騒音マップ生成 Invited

    原直, 阿部匡伸

    騒音制御   46 ( 3 )   126 - 130   2022.6

     More details

    Authorship:Lead author   Language:Japanese   Publishing type:Article, review, commentary, editorial, etc. (scientific journal)  

    researchmap

  • クラウドセンシングによる環境音の収集 Invited

    阿部匡伸, 原直

    騒音制御   42 ( 1 )   20 - 23   2018

     More details

    Language:Japanese   Publishing type:Article, review, commentary, editorial, etc. (scientific journal)  

    researchmap

  • Environmental sound sensing by smartdevices, and its applications Invited

    73 ( 8 )   483 - 490   2017.8

     More details

    Authorship:Lead author   Language:Japanese   Publishing type:Article, review, commentary, editorial, etc. (scientific journal)  

    DOI: 10.20697/jasj.73.8_483

    CiNii Article

    CiNii Books

    researchmap

  • イベントを比喩に用いた感情伝達法の検討 Reviewed

    濱野和人, 原直, 阿部匡伸

    電子情報通信学会論文誌   J97-D ( 12 )   1680 - 1683   2014.12

     More details

    Language:Japanese   Publishing type:Rapid communication, short report, research note, etc. (scientific journal)   Publisher:電子情報通信学会  

    researchmap

  • Potential Applications of Acoustic Signal Processing from Lifelog Research Perspectives Invited

    38 ( 1 )   15 - 21   2014

     More details

    Authorship:Lead author   Language:Japanese   Publishing type:Article, review, commentary, editorial, etc. (scientific journal)  

    CiNii Article

    CiNii Books

    researchmap

  • 「音声対話システムの実用化に向けて」10年間の長期運用を支えた音声情報案内システム「たけまるくん」の技術 Invited

    西村竜一, 原直, 川波弘道, LEE Akinobu, 鹿野清宏

    人工知能学会誌   28 ( 1 )   52 - 59   2013.1

     More details

    Language:Japanese   Publishing type:Article, review, commentary, editorial, etc. (scientific journal)   Publisher:The Japanese Society for Artificial Intelligence  

    DOI: 10.11517/jjsai.28.1_52

    CiNii Article

    CiNii Books

    J-GLOBAL

    researchmap

  • Detection of Task Incomplete Dialogs Based on Utterance Sequences N-gram for Spoken Dialog System Reviewed

    Sunao Hara, Norihide Kitaoka, Kazuya Takeda

    The IEICE transactions on information and systems (Japanese edition)   J94-D ( 2 )   497 - 500   2011.2

     More details

    Language:Japanese   Publishing type:Rapid communication, short report, research note, etc. (scientific journal)   Publisher:The Institute of Electronics, Information and Communication Engineers  

    CiNii Article

    CiNii Books

    researchmap

  • 撥弦楽器特有のアーティキュレーションを考慮したDNNによる楽器音合成の検討

    廣畑 和音, 阿部 匡伸, 原 直

    情報処理学会技術研究報告(音楽情報科学MUS)   2024-MUS-140 ( 56 )   1 - 6   2024.6

     More details

    Language:Japanese   Publishing type:Research paper, summary (national, other academic conference)  

    researchmap

  • GPSデータを用いた個人認証における生活圏の有効性の検討

    遠山大督, 原直, 阿部匡伸

    電子情報通信学会技術研究報告   123 ( 364 )   57 - 62   2024.1

     More details

    Language:Japanese   Publishing type:Research paper, summary (national, other academic conference)  

    researchmap

  • 音声対話システムのテキスト音声合成における声質変換とx-vector埋め込みを用いた感情制御方式の検討

    小原俊一, 阿部匡伸, 原直

    日本音響学会講演論文集   1275 - 1278   2023.9

     More details

    Language:Japanese   Publisher:一般社団法人日本音響学会  

    researchmap

  • ウェアラブルデバイスを用いた曖昧な入力からの会話支援システムの検討

    岩崎茉理, 原直, 阿部匡伸

    日本音響学会講演論文集   1369 - 1372   2023.9

     More details

    Language:Japanese   Publisher:一般社団法人日本音響学会  

    researchmap

  • 音声対話システムのための入力音声の感情に同調する声質変換とx-vector埋め込みを用いたテキストからの音声合成方式の検討

    小原俊一, 原直, 阿部匡伸

    電子情報通信学会技術研究報告   122 ( 389 )   203 - 208   2023.3

     More details

    Language:Japanese   Publisher:一般社団法人電子情報通信学会  

    researchmap

  • 小説オーディオブックの強調部分を学習に用いる抑揚制御可能な End-to-End 音声合成方式の検討

    和田拓海, 阿部匡伸, 原直

    日本音響学会講演論文集   903 - 906   2023.3

     More details

    Language:Japanese   Publisher:一般社団法人日本音響学会  

    researchmap

  • ライフログに基づく共感的対話システムにおけるユーザの感情極性に応じた応答生成方式の検討

    前薗そよぎ, 原直, 阿部匡伸

    電子情報通信学会技術研究報告   122 ( 423 )   102 - 107   2023.3

     More details

    Language:Japanese   Publisher:一般社団法人電子情報通信学会  

    researchmap

  • GPSデータに基づく粒度変更可能な生活圏を用いた個人認証のための類似度計算方式の検討

    遠山大督, 原直, 阿部匡伸

    電子情報通信学会技術研究報告   122 ( 338 )   25 - 30   2023.1

     More details

    Language:Japanese   Publisher:一般社団法人電子情報通信学会  

    researchmap

  • 差分メルケプストラムを用いた声質変換による喉締め歌唱音声改善方式の検討

    植田遥人, 阿部匡伸, 原直

    日本音響学会講演論文集   1405 - 1408   2022.9

     More details

    Language:Japanese   Publisher:一般社団法人日本音響学会  

    researchmap

  • ライフログに応じて発話を変えることでユーザに親密さを感じさせる対話システムの検討

    前薗そよぎ, 原直, 阿部匡伸

    マルチメディア,分散,協調とモバイルシンポジウム(DICOMO2022)講演論文集   1182 - 1190   2022.7

     More details

    Language:Japanese   Publisher:一般社団法人情報処理学会  

    researchmap

  • 話者特徴量の操作によりシームレスに話者性を制御できるEnd-to-End音声合成方式の検討

    青谷直樹, 原直, 阿部匡伸

    電子情報通信学会技術研究報告   122 ( 81 )   55 - 60   2022.6

     More details

    Language:Japanese   Publisher:一般社団法人電子情報通信学会  

    researchmap

  • SSQPによる場所の印象情報を環境音と航空写真から推定する方式の検討

    小野祐介, 原直, 阿部匡伸

    電子情報通信学会技術研究報告   121 ( 401 )   51 - 56   2022.3

     More details

    Language:Japanese   Publisher:一般社団法人電子情報通信学会  

    researchmap

  • 口唇特徴量を利用した知識蒸留による舌亜全摘出者の音韻明瞭度改善法の検討

    高島和嗣, 阿部匡伸, 原直

    電子情報通信学会技術研究報告   121 ( 385 )   108 - 113   2022.3

     More details

    Language:Japanese   Publisher:一般社団法人電子情報通信学会  

    researchmap

  • バックコーラス歌唱合成のためのDNNを用いた自然性の高い歌声合成方式の検討

    木岡智宏, 阿部匡伸, 原直

    電子情報通信学会技術研究報告   121 ( 385 )   102 - 107   2022.3

     More details

    Language:Japanese   Publisher:一般社団法人電子情報通信学会  

    researchmap

  • インソール型圧力センサを用いたパーキンソン病重症度推定

    林倖生, 原直, 阿部匡伸, 武本麻美

    電子情報通信学会総合大会(H-4-7)   2022.3

     More details

    Language:Japanese   Publishing type:Research paper, summary (national, other academic conference)   Publisher:一般社団法人電子情報通信学会  

    researchmap

  • 音声と映像から議論への関与姿勢を推定するための特徴量の検討

    金岡翼, 原直, 阿部匡伸

    電子情報通信学会技術研究報告   121 ( 401 )   57 - 62   2022.3

     More details

    Language:Japanese   Publisher:一般社団法人電子情報通信学会  

    researchmap

  • 環境音と航空写真を用いた場所の印象を推定する方式の検討

    小野祐介, 原直, 阿部匡伸

    第24回 日本音響学会関西支部 若手研究者交流研究発表会 発表概要集   34 - 34   2021.12

     More details

    Language:Japanese   Publishing type:Research paper, summary (national, other academic conference)   Publisher:日本音響学会関西支部  

    researchmap

  • 呼気流路の容易な制御を目的とした面接触型人工舌の構音改善に関する実験的研究

    長塚 弘亮, 川上 滋央, 古寺 寛志, 佐藤 匡晃, 田中 祐貴, 兒玉 直紀, 原 直, 皆木 省吾

    顎顔面補綴   44 ( 2 )   76 - 76   2021.12

     More details

    Language:Japanese   Publisher:(一社)日本顎顔面補綴学会  

    researchmap

  • 歌声合成のための双方向LSTM によるビブラート表現方式の検討

    金子隼人, 阿部匡伸, 原直

    日本音響学会講演論文集   1109 - 1112   2021.9

     More details

    Language:Japanese   Publisher:日本音響学会  

    researchmap

  • 音素情報を知識蒸留する舌亜全摘出者の音韻明瞭度改善法

    高島和嗣, 阿部匡伸, 原直

    日本音響学会講演論文集   1057 - 1060   2021.9

     More details

    Language:Japanese   Publisher:日本音響学会  

    researchmap

  • パーキンソン病重症度推定に向けたインソール型圧力センサで計測した歩行データの分析

    林倖生, 原直, 阿部匡伸, 武本麻美

    第20回情報科学技術フォーラム (FIT 2021),CK-001   3   71 - 74   2021.8

     More details

    Language:Japanese   Publisher:一般社団法人情報処理学会  

    researchmap

  • 呼気流路の容易な制御を目的とした面接触型人工舌の構音改善に関する実験的研究

    長塚弘亮, 川上滋央, 古寺寛志, 佐藤匡晃, 田中祐貴, 兒玉直紀, 原直, 皆木省吾

    日本顎顔面補綴学会 第38回総会・学術大会   30 - 30   2021.6

     More details

    Language:Japanese   Publishing type:Research paper, summary (national, other academic conference)   Publisher:日本顎顔面補綴学会  

    researchmap

  • 人対人の会話で自然な話題展開を支援するための対話戦略の検討

    前薗そよぎ, 原直, 阿部匡伸

    情報処理学会研究報告   2021-SLP-137 ( 16 )   1 - 6   2021.6

     More details

    Language:Japanese   Publisher:一般社団法人情報処理学会  

    researchmap

  • ニューラル機械翻訳により推定された読み仮名・韻律記号を入力とする日本語 End-to-End 音声合成の評価

    懸川直人, 原直, 阿部匡伸, 井島勇祐

    日本音響学会講演論文集   847 - 850   2021.3

     More details

    Language:Japanese   Publisher:日本音響学会  

    researchmap

  • Evaluation of Concept Drift Adaptation for Acoustic Scene Classifier Based on Kernel Density Drift Detection and Combine Merge Gaussian Mixture Model

    Ibnu Daqiqil Id, Masanobu Abe, Sunao Hara

    日本音響学会講演論文集   201 - 204   2021.3

     More details

    Language:English   Publisher:日本音響学会  

    researchmap

  • 歌唱表現を付与できるBidirectional-LSTM を用いた歌声合成方式の検討

    金子隼人, 原直, 阿部匡伸

    日本音響学会講演論文集   987 - 990   2021.3

     More details

    Language:Japanese   Publisher:日本音響学会  

    researchmap

  • TTSによる会話支援システムのための感圧センサを用いた手袋型入力デバイスの開発と入力速度の評価

    小林誠, 原直, 阿部匡伸

    情報処理学会研究報告   2020-HCI-190 ( 20 )   1 - 6   2020.12

     More details

    Language:Japanese   Publisher:一般社団法人情報処理学会  

    researchmap

  • パーキンソン病重症度推定のためのインソール型圧力センサを用いた時間的特徴量の検討

    林倖生, 原直, 阿部匡伸, 武本麻美

    2020年度(第71回)電気・情報関連学会中国支部連合大会,R20-14-02-03   1 - 1   2020.10

     More details

    Language:Japanese   Publishing type:Research paper, summary (national, other academic conference)   Publisher:電気・情報関連学会中国支部  

    researchmap

  • Transformerを用いた日本語テキストからの読み仮名・韻律記号列推定

    懸川直人, 原直, 阿部匡伸, 井島勇祐

    日本音響学会講演論文集   829 - 832   2020.9

     More details

    Language:Japanese   Publisher:日本音響学会  

    researchmap

  • WaveNetを用いた言語情報なし感情音声合成における感情の強さ制御の検討

    松本剣斗, 原直, 阿部匡伸

    日本音響学会講演論文集   867 - 870   2020.9

     More details

    Language:Japanese   Publisher:日本音響学会  

    researchmap

  • 映像と音声を用いた議論への関与姿勢や肯定的・否定的態度の推定方式の検討

    金岡翼, 上原佑太郎, 原直, 阿部匡伸

    マルチメディア,分散,協調とモバイルシンポジウム(DICOMO2020)講演論文集   1422 - 1429   2020.6

     More details

    Language:Japanese   Publisher:一般社団法人情報処理学会  

    researchmap

  • 話題の対象に対する親密度に応じて応答する音声対話システムの検討

    加藤大地, 原直, 阿部匡伸

    情報処理学会研究報告   2020-SLP-132 ( 21 )   1 - 6   2020.6

     More details

    Language:Japanese   Publisher:一般社団法人情報処理学会  

    researchmap

  • GPSデータのクラスタリングによる日常生活における場所の重要度の分析

    平田瑠, 原直, 阿部匡伸

    マルチメディア,分散,協調とモバイルシンポジウム(DICOMO2020)講演論文集   785 - 793   2020.6

     More details

    Language:Japanese   Publisher:一般社団法人情報処理学会  

    researchmap

  • ウェアラブルデバイスによる曖昧な入力からのニューラル機械翻訳を用いた日本語文章推定方式

    渡邊淳, 原直, 阿部匡伸

    情報処理学会研究報告   2020-HCI-187 ( 7 )   1 - 7   2020.3

     More details

    Language:Japanese   Publisher:一般社団法人情報処理学会  

    researchmap

  • 舌亜全摘出者の音韻明瞭度改善のための推定音素事後確率を用いた声質変換の検討

    荻野聖也, 原直, 阿部匡伸

    電子情報通信学会総合大会 情報・システムソサイエティ特別企画 学生ポスターセッション予稿集   124 - 124   2020.3

     More details

    Language:Japanese   Publishing type:Research paper, summary (national, other academic conference)   Publisher:一般社団法人電子情報通信学会  

    researchmap

  • End-to-End 音声認識を用いた音声合成の半教師あり話者適応 International coauthorship

    井上勝喜, 原直, 阿部匡伸, 林知樹, 山本龍一, 渡部晋治

    日本音響学会講演論文集   1095 - 1098   2020.3

     More details

    Language:Japanese   Publisher:日本音響学会  

    researchmap

  • 言語情報なし感情合成音を学習に用いたCycleGANによる感情変換方式の検討

    松本剣斗, 原直, 阿部匡伸

    日本音響学会講演論文集   1165 - 1168   2020.3

     More details

    Language:Japanese   Publisher:日本音響学会  

    researchmap

  • End-to-End 音声認識を用いた End-to-End 音声合成の性能評価

    井上勝喜, 原直, 阿部匡伸, 渡部晋治

    第22回 日本音響学会 関西支部若手研究者交流研究発表会 概要集   6 - 6   2019.12

     More details

    Language:Japanese   Publishing type:Research paper, summary (national, other academic conference)   Publisher:日本音響学会関西支部  

    researchmap

  • WaveNet による言語情報を含まない感情音声合成方式における話者性の検討

    松本剣斗, 原直, 阿部匡伸

    日本音響学会講演論文集   993 - 996   2019.9

     More details

    Language:Japanese   Publisher:日本音響学会  

    researchmap

  • 新たにデザインされた人工舌と解剖学的人工舌の効果ならびにその選択基準

    佐藤匡晃, 長塚弘亮, 川上滋央, 兒玉直紀, 原直, 阿部匡伸, 皆木省吾

    日本補綴歯科学会 中国・四国支部学術大会,抄録集   28 - 28   2019.9

     More details

    Language:Japanese   Publishing type:Research paper, summary (national, other academic conference)   Publisher:日本補綴歯科学会  

    researchmap

  • CNN Autoencoderから抽出したボトルネック特徴量を用いた環境音分類

    松原拓未, 原直, 阿部匡伸

    マルチメディア,分散,協調とモバイルシンポジウム(DICOMO2019)講演論文集   339 - 346   2019.7

     More details

    Language:Japanese   Publisher:一般社団法人情報処理学会  

    researchmap

  • GPSデータに基づく日常生活における特別な行動の検出

    小林誠, 原直, 阿部匡伸

    マルチメディア,分散,協調とモバイルシンポジウム(DICOMO2019)講演論文集   846 - 853   2019.7

     More details

    Language:Japanese   Publisher:一般社団法人情報処理学会  

    researchmap

  • WaveNetによる言語情報を含まない感情音声合成方式の検討

    松本剣斗, 原直, 阿部匡伸

    情報処理学会研究報告   2019-SLP-127 ( 61 )   1 - 6   2019.6

     More details

    Language:Japanese   Publisher:一般社団法人情報処理学会  

    researchmap

  • i-vectorに基づく賑わい音の推定方式の検討

    呉セン陽, 朝田興平, 原直, 阿部匡伸

    情報処理学会研究報告   2019-SLP-127 ( 33 )   1 - 6   2019.6

     More details

    Language:Japanese   Publisher:一般社団法人情報処理学会  

    researchmap

  • DNN音声合成における少量の目標感情音声を用いた感情付与方式の検討

    井上勝喜, 原直, 阿部匡伸, 井島勇祐

    日本音響学会講演論文集   1085 - 1088   2019.3

     More details

    Language:Japanese   Publisher:日本音響学会  

    researchmap

  • 声質変換による舌亜全摘出者の音韻明瞭度改善のための音素補助情報の推定方式の検討

    荻野聖也, 村上博紀, 原直, 阿部匡伸

    日本音響学会講演論文集   1155 - 1158   2019.3

     More details

    Language:Japanese   Publisher:日本音響学会  

    researchmap

  • 舌亜全摘出者の音韻明瞭度改善のための Bidirectional LSTM-RNN に基づく音素補助情報を用いた声質変換方式の検討

    村上博紀, 原直, 阿部匡伸

    日本音響学会講演論文集   1151 - 1154   2019.3

     More details

    Language:Japanese   Publisher:日本音響学会  

    researchmap

  • ながら聴き用楽曲の作業負荷に及ぼす影響とその選択方式の検討

    高瀬郁, 阿部匡伸, 原直

    情報処理学会研究報告   2018-MUS-121 ( 19 )   1 - 6   2018.11

     More details

    Language:Japanese   Publisher:一般社団法人情報処理学会  

    researchmap

  • DNN音声合成における感情付与方式の評価

    井上勝喜, 原直, 阿部匡伸, 北条伸克, 井島勇祐

    日本音響学会講演論文集   1105 - 1108   2018.9

     More details

    Language:Japanese   Publisher:日本音響学会  

    researchmap

  • Speech Enhancement of Glossectomy Patient's Speech using Voice Conversion Approach

    Masanobu Abe, Seiya Ogino, Hiroki Murakami, Sunao Hara

    日本生物物理学会第56回年会,シンポジウム:ヘルスシステムの理解と応用   198 - 198   2018.9

     More details

    Authorship:Corresponding author   Language:English   Publishing type:Research paper, summary (national, other academic conference)   Publisher:日本生物物理学会  

    researchmap

  • 声質変換による舌亜全摘出者の音韻明瞭度改善のための補助情報の検討

    村上博紀, 原直, 阿部匡伸

    日本音響学会講演論文集   1175 - 1178   2018.9

     More details

    Language:Japanese   Publisher:日本音響学会  

    researchmap

  • クラウドソーシングによる環境音マップ構築のための主観的な騒々しさ推定方式の検討

    原直, 阿部匡伸

    第17回情報科学技術フォーラム (FIT 2018),O-001   4   343 - 346   2018.9

     More details

    Authorship:Lead author, Corresponding author   Language:Japanese   Publisher:一般社団法人情報処理学会  

    researchmap

  • 音声と口唇形状を用いた声質変換による舌亜全摘出者の音韻明瞭度改善の検討

    荻野聖也, 村上博紀, 原直, 阿部匡伸

    電子情報通信学会技術研究報告   118 ( 112 )   7 - 12   2018.6

     More details

    Language:Japanese   Publisher:一般社団法人電子情報通信学会  

    researchmap

  • 舌亜全摘出者の音韻明瞭性改善のためのマルチモーダルデータベースの構築

    村上博紀, 荻野聖也, 原直, 阿部匡伸, 佐藤匡晃, 皆木省吾

    日本音響学会講演論文集   355 - 358   2018.3

     More details

    Language:Japanese   Publisher:日本音響学会  

    researchmap

  • DNN音声合成における感情付与のための継続時間長モデルの検討

    井上勝喜, 原直, 阿部匡伸, 北条伸克, 井島勇祐

    日本音響学会講演論文集   279 - 282   2018.3

     More details

    Language:Japanese   Publisher:日本音響学会  

    researchmap

  • クラウドソーシングによる賑わい音識別方式のフィールド実験評価

    朝田興平, 原直, 阿部匡伸

    日本音響学会講演論文集   79 - 82   2018.3

     More details

    Language:Japanese   Publisher:日本音響学会  

    researchmap

  • DNN音声合成における話者と感情の情報を扱うためのモデル構造の検討

    井上勝喜, 原直, 阿部匡伸, 北条伸克, 井島勇祐

    日本音響学会講演論文集   263 - 266   2017.9

     More details

    Language:Japanese   Publisher:日本音響学会  

    researchmap

  • DNNに基づく差分スペクトル補正を用いた声質変換による舌亜全摘出者の音韻明瞭性改善の検討

    村上博紀, 原直, 阿部匡伸, 佐藤匡晃, 皆木省吾

    日本音響学会講演論文集   297 - 300   2017.9

     More details

    Language:Japanese   Publisher:日本音響学会  

    researchmap

  • DNNによる人間の感覚を考慮した騒々しさ推定方式に基づく騒音マップの作成

    小林将大, 原直, 阿部匡伸

    日本音響学会講演論文集   623 - 626   2017.9

     More details

    Language:Japanese   Publisher:日本音響学会  

    researchmap

  • 環境音収集に効果的なインセンティブを与える可視化方式の検討 Reviewed

    畠山晏彩子, 原直, 阿部匡伸

    マルチメディア,分散,協調とモバイルシンポジウム(DICOMO2017)講演論文集   255 - 262   2017.7

     More details

    Language:Japanese   Publisher:一般社団法人情報処理学会  

    researchmap

  • DNN音声合成における感情付与のためのモデル構造の検討

    井上勝喜, 原直, 阿部匡伸, 北条伸克, 井島勇祐

    電子情報通信学会技術研究報告   117 ( 106 )   23 - 28   2017.6

     More details

    Language:Japanese   Publisher:一般社団法人電子情報通信学会  

    researchmap

  • 2つの粒度の生活圏に基づく見守りシステム

    鎌田成紀, 原直, 阿部匡伸

    電子情報通信学会総合大会 (D-9-12)   1 - 1   2017.3

     More details

    Language:Japanese   Publishing type:Research paper, summary (national, other academic conference)   Publisher:一般社団法人電子情報通信学会  

    researchmap

  • DNNによる人間の感覚を考慮した騒音マップ作成のための騒々しさ推定方式

    小林将大, 原直, 阿部匡伸

    日本音響学会講演論文集   799 - 802   2017.3

     More details

    Language:Japanese   Publisher:日本音響学会  

    researchmap

  • スマートフォンで収録した環境音データベースを用いたCNNによる環境音分類

    鳥羽隼司, 原直, 阿部匡伸

    日本音響学会講演論文集   139 - 142   2017.3

     More details

    Language:Japanese   Publisher:日本音響学会  

    researchmap

  • 基本周波数変形を考慮したスペクトル変換手法の検討 Reviewed

    床建吾, 阿部匡伸, 原直

    第18回IEEE広島支部学生シンポジウム(HISS 18th)   174 - 176   2016.11

     More details

    Language:Japanese   Publisher:IEEE広島支部  

    researchmap

  • RNNによる実環境データからのマルチ音響イベント検出

    鳥羽隼司, 原直, 阿部匡伸

    日本音響学会講演論文集   43 - 44   2016.9

     More details

    Language:Japanese   Publisher:日本音響学会  

    researchmap

  • スマートフォンで収録した環境音に含まれるタップ音除去方式の検討

    朝田興平, 原直, 阿部匡伸

    日本音響学会講演論文集   45 - 48   2016.9

     More details

    Language:Japanese   Publisher:日本音響学会  

    researchmap

  • 重複音を含む環境音データベースにおける環境音検出のための特徴量の基本検討

    原直, 田中智康, 阿部匡伸

    日本音響学会講演論文集   3 - 6   2016.9

     More details

    Authorship:Lead author, Corresponding author   Language:Japanese   Publisher:日本音響学会  

    researchmap

  • GMMに基づく声質変換を用いた舌亜全摘出者の音韻明瞭性改善の検討

    田中慧, 原直, 阿部匡伸, 皆木省吾

    日本音響学会講演論文集   141 - 144   2016.9

     More details

    Language:Japanese   Publisher:日本音響学会  

    researchmap

  • Sound collection systems using a crowdsourcing approach for constructing subjective evaluation-based sound maps

    116 ( 189 )   41 - 46   2016.8

     More details

    Authorship:Lead author, Corresponding author   Language:Japanese  

    CiNii Article

    CiNii Books

    researchmap

  • 人間の感覚を考慮した騒音マップ作成のための騒々しさ推定方式 Reviewed

    小林将大, 原直, 阿部匡伸

    マルチメディア,分散,協調とモバイルシンポジウム(DICOMO2016)講演論文集   141 - 148   2016.7

     More details

    Language:Japanese   Publisher:一般社団法人情報処理学会  

    researchmap

  • GPSデータ匿名化レベルの主観的許容度を客観的に表現する指標の検討 Reviewed

    三藤優介, 原直, 阿部匡伸

    マルチメディア,分散,協調とモバイルシンポジウム(DICOMO2016)講演論文集   798 - 805   2016.7

     More details

    Authorship:Corresponding author   Language:Japanese   Publisher:一般社団法人情報処理学会  

    researchmap

  • A measure for transfer tendency between staying places

    116 ( 23 )   95 - 100   2016.5

     More details

    Language:Japanese  

    CiNii Article

    researchmap

  • A watching method to protect users' privacy using living area

    115 ( 409 )   19 - 24   2016.1

     More details

    Language:Japanese  

    CiNii Article

    researchmap

  • A classification method for crowded situation using environmental sounds based on Gaussian mixture model-universal background model Reviewed

    Tomoyasu Tanaka, Sunao Hara, Masanobu Abe

    The Journal of the Acoustical Society of America   140 ( 4 )   3110 - 3110   2016

     More details

    Language:Japanese   Publishing type:Research paper, summary (international conference)  

    DOI: 10.1121/1.4969721

    researchmap

  • Method to efficiently retrieve memorable scenes from video using automatically collected life log

    115 ( 27 )   23 - 28   2015.5

     More details

    Language:Japanese  

    CiNii Article

    researchmap

  • Method to efficiently retrieve memorable scenes from video using automatically collected life log

    IPSJ SIG Notes   2015 ( 4 )   1 - 6   2015.5

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    One application of life logs is retrieving memorable scenes. In this paper, to extract memorable scenes from video, we propose a method that uses life-log data collected automatically together with the video during an event. Here, the life-log data are GPS, pulse, and sound. Three kinds of important points are extracted from the three data types, and particular parts of the video are extracted based on these important points. Subjective evaluation experiments revealed that users can easily recall things by watching the extracted video and can remember details of the events, including things that were not recorded in the video.

    CiNii Article

    CiNii Books

    researchmap

  • A "a big day" search method using features of staying place

    HAYASHI Keigo, HARA Sunao, ABE Masanobu

    IEICE technical report. Life intelligence and office information systems   114 ( 500 )   89 - 94   2015.3

     More details

    Language:Japanese   Publisher:The Institute of Electronics, Information and Communication Engineers  

    In recent years, life logs have become popular and are used to provide services tailored to a particular person. One such service is memory retrieval. A life log helps us recall events, activities, accidents, etc., but the huge amount of data in a life log makes it difficult to find what we really want. In this paper, we propose a method to retrieve "a big day" using a feature value calculated from GPS data; i.e., the feature is defined as a function of the visiting frequency of places. According to experimental results, the proposed method can retrieve "a big day" at a rate of 60% and an unusual day at a rate of 90%. Subjective evaluations were also carried out from the perspectives of effectiveness, efficiency, and satisfaction. The results showed that the proposed method performs better than a conventional method.

    CiNii Article

    CiNii Books

    researchmap

  • Living area extraction using staying places and routes

    MATSUO Yuji, HARA Sunao, ABE Masanobu

    IEICE technical report. Life intelligence and office information systems   114 ( 500 )   77 - 82   2015.3

     More details

    Language:Japanese   Publisher:The Institute of Electronics, Information and Communication Engineers  

    Demand for watching over the safety of elderly people and children has been increasing rapidly. We believe that the living area plays an important role in improving the quality of such watching. Therefore, in this paper, we propose an algorithm to estimate a person's living area from his/her accumulated GPS data. The living area is defined by important places and routes. First, important places are extracted by taking visiting frequency into account; then routes connecting the important places are found using best-first search. Experiments were carried out for three users and evaluated by precision and recall. We confirmed that the proposed algorithm performs better than a conventional method.

    CiNii Article

    CiNii Books

    researchmap

  • Behavior analysis of persons by classification of moving routes

    SETO Ryo, HARA Sunao, ABE Masanobu

    IEICE technical report. Life intelligence and office information systems   114 ( 500 )   31 - 36   2015.3

     More details

    Language:Japanese   Publisher:The Institute of Electronics, Information and Communication Engineers  

    In this paper, we propose a method for analyzing people's behavior using GPS data, focusing on how active they are. We evaluated whether a person can be judged active or not from moving-route data using a PCA- or NMF-based classification approach. The evaluation results show that the number of important eigenvalues from PCA and the approximation error from NMF are effective for judging whether a person is active. In addition, we extracted patterns of moving routes and evaluated the differences between the routes extracted by PCA and by NMF. As a result, PCA extracted high-frequency patterns, whereas NMF extracted not only high-frequency patterns but also low-frequency patterns.

    CiNii Article

    CiNii Books

    researchmap

  • FLAG: Lifelog aggregation system that was centered on position information

    2014 ( 6 )   1 - 6   2014.7

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    Recently, applications and services that use location information from GPS have become widespread. In this paper, we developed a system called FLAG, which aggregates a variety of lifelogs around location information. FLAG manages location information by discriminating between moving and staying. Using FLAG, we visualize categorized location information on a map and a timetable. We also implemented a function that registers a user-specific name for each stay. In addition, as an example of aggregating various kinds of lifelogs, we link the location information from FLAG to Twitter using the posting time. This function enables a lifelog to be shown on the map even if it has no positional information. To evaluate the FLAG system, we created correct stay data for six users and compared the stay-detection accuracies of two detection methods. As a result, we confirmed that FLAG can detect stays with higher accuracy than the original data.

    CiNii Article

    CiNii Books

    researchmap

    Other Link: http://id.nii.ac.jp/1001/00102345/

  • FLAG : Lifelog aggregation system that was centered on position information

    IEICE technical report. SC, Services Computing   114 ( 157 )   29 - 34   2014.7

     More details

    Language:Japanese   Publisher:The Institute of Electronics, Information and Communication Engineers  

    Recently, applications and services that use location information from GPS have become widespread. In this paper, we developed a system called FLAG, which aggregates a variety of lifelogs around location information. FLAG manages location information by discriminating between moving and staying. Using FLAG, we visualize categorized location information on a map and a timetable. We also implemented a function that registers a user-specific name for each stay. In addition, as an example of aggregating various kinds of lifelogs, we link the location information from FLAG to Twitter using the posting time. This function enables a lifelog to be shown on the map even if it has no positional information. To evaluate the FLAG system, we created correct stay data for six users and compared the stay-detection accuracies of two detection methods. As a result, we confirmed that FLAG can detect stays with higher accuracy than the original data.

    CiNii Article

    CiNii Books

    researchmap

  • Development of environmental sound collection system using smart devices based on crowd-sourcing approach

    HARA Sunao, KASAI Akinori, ABE Masanobu, SONEHARA Noboru

    IEICE technical report. Speech   114 ( 52 )   177 - 180   2014.5

     More details

    Language:Japanese   Publisher:The Institute of Electronics, Information and Communication Engineers  

    In this study, we aim to construct an environmental sound database covering various sounds as wisdom of the crowds. For example, when considering environmental noise pollution as a kind of environmental sound, we need to consider not only signature sounds, e.g., car noise and railway noise, but also life-space noise, e.g., festivity noise on streets or in parks. However, collecting sound daily and over a wide area is difficult to achieve with only a few participants. Therefore, we aim to measure environmental sound over a vast area by applying a crowdsourcing approach. First, we developed a prototype application that runs on smart devices with Android OS, and then a prototype server system for collecting and browsing the collected sound data. Then, we calibrated the noise level measured by the smart devices and carried out a sound collection experiment to validate the accuracy of the sensors on the smart devices. In this report, we introduce the environmental sound collection system and the sound collection experiment using the system.

    CiNii Article

    CiNii Books

    researchmap

  • Preliminary study for behavior analysis based on degree of nodes in a network constructed from GPS data

    2014 ( 6 )   1 - 6   2014.5

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    We discuss a behavior-analysis method based on structural features of networks constructed from a personal location history. In this paper, a directed network constructed from a GPS location history is called a stay network. A stay network treats the set of stays extracted from the location history as its nodes. Generally, nodes of directed networks have out-degrees and in-degrees as structural features. The stay network has biased values of out-degrees, in-degrees, and their ratio. Therefore, we assumed the existence of relationships between the degree of bias in these values and human behaviors during the stay, and analyzed these relationships. We focused on the purpose of the stay, which is assumed to arise as a result of human behavior, and particularly analyzed the relationships between the degree of bias and the purpose of the stay.

    CiNii Article

    CiNii Books

    researchmap

  • Development of environmental sound collection system using smart devices based on crowd-sourcing approach

    2014 ( 36 )   1 - 4   2014.5

     More details

  • Influence analysis on user's workload in a spoken dialog strategy for a car navigation system

    Masaki Yamaoka, Sunao Hara, Masanobu Abe

    IPSJ SIG Notes   2014 ( 7 )   1 - 6   2014.5

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    We assess dialog strategies from the standpoint of the user's workload in order to suggest how spoken dialog systems should be used while driving. We evaluate several dialog strategies, which are combinations of dialog initiative and confirmation manner, with an objective evaluation method using computer simulation. Two simulation conditions are set up: in one, the system speaks only if the user has leeway to talk with the system; in the other, the system speaks even if the user will fail to recognize the system's utterance. We also evaluate spoken dialog systems applying these methods with a subjective evaluation method. The evaluations show that the user-initiative strategy has advantages in lower turn number and lower task completion rate over both the system-initiative and mixed-initiative strategies when the cognitive rate is high. The results also show that the system-initiative and mixed-initiative strategies have advantages in lower turn number and lower task completion rate over the user-initiative strategy when the cognitive rate is low. Additionally, the results show that the method in which the system speaks only when the user has enough time to spare from the driving operation keeps the user's workload low; however, it needs more time to complete tasks.

    CiNii Article

    CiNii Books

    researchmap

  • Working Patterns Extractions by Applying Nonnegative Matrix Factorization to PC Operation Logs

    HIRAYAMA Akihiko, HARA Sunao, ABE Masanobu

    IEICE technical report. Life intelligence and office information systems   114 ( 32 )   33 - 38   2014.5

     More details

    Language:Japanese   Publisher:The Institute of Electronics, Information and Communication Engineers  

    In this paper, we tried to extract working patterns from PC operation logs by nonnegative matrix factorization. Experiments were carried out for three occupations. In terms of daily working patterns, we successfully extracted patterns particular to each occupation and some working patterns common to all occupations. We also showed that the extracted patterns can easily be interpreted as ways of working.

    CiNii Article

    CiNii Books

    researchmap

  • Working Patterns Extractions by Applying Nonnegative Matrix Factorization to PC Operation Logs

    2014 ( 6 )   1 - 6   2014.5

     More details

  • 滞在地の特徴量を利用した「特別な日」検索方式の検討

    林啓吾, 原直, 阿部匡伸

    第76回全国大会講演論文集   2014 ( 1 )   459 - 460   2014.3

     More details

    Language:Japanese  

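    A lifelog is a record of a person's activities kept as digital data. We consider using lifelog data to look back on one's own activities. For example, one can look back by reviewing past posts and photos on the now widely used Social Networking Services (SNS), but these contain only what the user deliberately chose to record. One type of lifelog data that can be recorded without conscious effort is location data from the Global Positioning System (GPS), i.e., GPS data. In this study, we proposed a method that uses features of staying places obtained from GPS data to search for "special days" that the user would want to look back on. Evaluation experiments showed that the proposed method can retrieve "special days" with high accuracy.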

    CiNii Article

    CiNii Books

    researchmap

  • 滞在地と経路に着目した生活圏抽出法の検討

    松尾雄二, 原直, 阿部匡伸

    第76回全国大会講演論文集   2014 ( 1 )   37 - 38   2014.3

     More details

    Language:Japanese  

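    In recent years, wandering behavior by elderly people has become a social problem. Existing watch-over systems require the activity range to be set in advance, and because that range cannot be specified in detail, adequate monitoring is not possible. In this study, we therefore examine a method that uses accumulated GPS data to extract a living area (activity range) for use in a watch-over system. The position information (latitude and longitude) contained in the GPS data is converted with GEOHEX into quantized codes called HEX codes. These HEX codes are classified into staying places and routes, and the frequently occurring HEX codes in each class are extracted as the respective living areas. We then evaluated the consistency between the living area derived from staying-place data and the living area derived from route data.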

    CiNii Article

    CiNii Books

    researchmap

  • D-9-4 Individual behavior analysis by comparing GPS logs with others

    Seto Ryo, Abe Masanobu, Hara Sunao

    Proceedings of the IEICE General Conference   2014 ( 1 )   88 - 88   2014.3

     More details

    Language:Japanese   Publisher:The Institute of Electronics, Information and Communication Engineers  

    CiNii Article

    CiNii Books

    researchmap

  • A development of a smart-device application for environmental sound collection based on crowdsourcing approach

    Sunao Hara, Akinori Kasai, Masanobu Abe, Noboru Sonehara

    IEICE Technical Report   113 ( 479 )   29 - 34   2014.3

     More details

    Authorship:Lead author   Language:Japanese   Publisher:The Institute of Electronics, Information and Communication Engineers  

    In this study, we aim to construct an environmental sound database covering various sounds as wisdom of the crowds. For example, when considering environmental noise pollution as a kind of environmental sound, we need to consider not only signature sounds, e.g., car noise and railway noise, but also life-space noise, e.g., festivity noise on streets or in parks. However, collecting sound daily and over a wide area is difficult to achieve with only a few participants. Therefore, we aim to measure environmental sound over a vast area by applying a crowdsourcing approach. First, we developed a prototype application that runs on smart devices with Android OS, and then a prototype server system for collecting and browsing the collected sound data. Then, we calibrated the noise level measured by the smart devices and carried out a sound collection experiment to validate the accuracy of the sensors on the smart devices. As a result of the experiment, we collected nine hundred minutes of sound data and analyzed the relationships between the measured noise level and some subjective evaluations.

    CiNii Article

    CiNii Books

    researchmap

  • Evaluation of HMM-based speech synthesis using high-frequency component of speech waveform

    349 - 352   2014

     More details

    Language:Japanese  

    CiNii Article

    researchmap

  • Relationship between the size of speech database and subjective scores on phone-sized unit selection speech synthesis

    331 - 334   2014

     More details

    Language:Japanese  

    CiNii Article

    researchmap

  • Sound-map construction method based on symbolization for environmental sounds collected by crowd-sensing

    1535 - 1538   2014

     More details

    Language:Japanese  

    CiNii Article

    researchmap

  • Estimation of fuel consumption using an acoustic signal and multi-sensing signals of smartphone

    NANBA Shohei, HARA Sunao, HARA Sunao

    IEICE technical report. Signal processing   113 ( 28 )   1 - 6   2013.5

     More details

    Language:Japanese   Publisher:The Institute of Electronics, Information and Communication Engineers  

    Fuel-consumption meters are installed in many vehicles; however, they only display fuel-consumption or fuel-efficiency values and do not allow the data to be used for other purposes, e.g., collection and analysis. One way to output vehicle data is to use a diagnostic connector compliant with the OBD2 standard, which can output several vehicle signals, such as velocity, engine revolutions, and fuel consumption. However, because the protocols depend on the manufacturer or type of vehicle, this method is not easy for the general public to use. In this study, we aim to estimate fuel consumption using acoustic signals and signals from several sensors built into a smartphone. Estimating fuel consumption requires estimates of the number of engine revolutions and of the torque. To estimate the number of revolutions, we analyze the acoustic signals from the engine with the fast Fourier transform and calculate the estimate from the acoustic signal after reducing road noise, which is approximated by a Gamma mixture distribution. To estimate the torque, we use the physics of the car together with the outputs of several sensors and the vehicle's data. We finally obtain the fuel consumption by referring to a table of fuel-consumption rates, created in advance, using the estimated number of revolutions and the estimated torque. In a fuel-consumption estimation experiment, we achieved acceptable values for instantaneous fuel consumption, although the values of average fuel consumption contained some errors.

    CiNii Article

    CiNii Books

    researchmap

  • The Individual Feature Analysis of the Network of the Stay Extracted from GPS Data

    FUJIOKA Daisuke, HARA Sunao, ABE Masanobu

    IEICE technical report. Life intelligence and office information systems   112 ( 466 )   179 - 184   2013.3

     More details

    Language:Japanese   Publisher:The Institute of Electronics, Information and Communication Engineers  

    In this paper, we propose a novel technique for behavior analysis focusing on the structure of a personal "stay" network obtained from GPS data. We analyzed network features, namely the scale-free, small-world, and cluster-state properties, for six subjects. We also analyzed the distribution of a feature called "motif" and then compared the stay networks with other networks based on these features. We evaluated biases in the degree of a hub and in the number of unique connected nodes, and clarified the differences between the distributions of the subjects' networks.

    CiNii Article

    CiNii Books

    researchmap

  • Examination of an event sampling process with the similar impression from the others' life log

    HAMANO Kazuto, ABE Masanobu, HARA Sunao, FUJIOKA Daisuke, MOTIZUKI Rika, WATANABE Tomoki

    IEICE technical report. Life intelligence and office information systems   112 ( 466 )   173 - 178   2013.3

     More details

    Language:Japanese   Publisher:The Institute of Electronics, Information and Communication Engineers  

    When we tell a story about our experience, we try to describe it by replacing our experience with another person's experience that evokes a similar impression, to help the listener understand the story. In this paper, we study a method for extracting a shared sense of impressive events among people with different backgrounds and experiences. First, we showed several emotional words to the subjects and asked them to recall events matching the words. Using the recalled events, we compared two methods for evaluating the similarity of events: one quantifies each event by the subject's emotions on a five-point scale, and the other decides the similarity of events through discussion between two subjects. Experimental results suggested that the similarity decided by discussion is heavily affected by the strong emotions associated with the event.

    CiNii Article

    CiNii Books

    researchmap

  • 音声情報案内システムにおけるBag‐of‐Wordsを用いた無効入力棄却モデルの可搬性の評価

    真嶋温佳, TORRES Rafael, 川波弘道, 原直, 松井知子, 猿渡洋, 鹿野清宏

    日本音響学会研究発表会講演論文集(CD-ROM)   2013   ROMBUNNO.3-9-5   2013.3

     More details

    Language:Japanese  

    J-GLOBAL

    researchmap

  • The 2nd stage activity report of ASJ students and young researchers forum

    Okamoto Takuma, Okuzono Takeshi, Kidani Shunsuke, Hara Sunao, Ohta Tatsuya, Imoto Keisuke

    THE JOURNAL OF THE ACOUSTICAL SOCIETY OF JAPAN   69 ( 9 )   519 - 520   2013

     More details

    Language:Japanese   Publisher:Acoustical Society of Japan  

    DOI: 10.20697/jasj.69.9_519

    CiNii Article

    researchmap

  • 音声情報システムにおける最大エントロピー法を用いた無効入力棄却の評価

    真嶋温佳, TORRES Rafael, 川波弘道, 原直, 松井知子, 猿渡洋, 鹿野清宏

    日本音響学会研究発表会講演論文集(CD-ROM)   2012   ROMBUNNO.3-1-8   2012.9

     More details

    Language:Japanese  

    J-GLOBAL

    researchmap

  • Invalid Input Rejection Using Bag-of-Words for Speech-Oriented Guidance System

    Majima Haruka, Fujita Yoko, Torres Rafael, Kawanami Hiromichi, Hara Sunao, Matsui Tomoko, Saruwatari Hiroshi, Shikano Kiyohiro

    IPSJ SIG Notes   2012 ( 7 )   1 - 6   2012.7

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    In a real-environment speech-oriented information guidance system, discriminating between valid and invalid inputs is important, because invalid inputs such as noise, laughter, coughs, and utterances between users lead to unpredictable system responses. Generally, acoustic features are used for discrimination. Comparing the acoustic likelihoods of GMMs (Gaussian Mixture Models) built from speech data and noise data is one typical method. In addition, using linguistic features is considered to improve discrimination accuracy as it reflects the task-domain of invalid inputs and meaningless recognition re...

    CiNii Article

    CiNii Books

    researchmap

  • New Speech Research Paradigm in the Cloud Era

    Tomoyoshi Akiba, Koji Iwano, Jun Ogata, Tetsuji Ogawa, Nobutaka Ono, Takahiro Shinozaki, Koichi Shinoda, Hiroaki Nanjo, Hiromitsu Nishizaki, Masafumi Nishida, Ryuichi Nishimura, Sunao Hara, Takaaki Hori

    IPSJ SIG Notes   2012 ( 4 )   1 - 7   2012.7

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    Recently, most individuals have come to use mobile information devices and upload the information obtained by such devices to the Internet cloud every day. Accordingly, the applications of speech information processing have been changing drastically. We need to create a new paradigm for the research and development of speech information processing to adapt to this change. In this paper, we summarize the state-of-the-art speech technologies, propose how to create a research platform for this new paradigm, and discuss the problems we should solve to realize it.

    CiNii Article

    CiNii Books

    researchmap

  • Design of a network service for developing a speech-oriented guidance system used on mobile comuputers

    Sunao Hara, Hiromichi Kawanami, Hiroshi Saruwatari, Kiyohiro Shikano

    IPSJ SIG Notes   2012 ( 1 )   1 - 6   2012.7

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    In this paper, we propose novel speech service software for speech-oriented guidance systems. This software has been developed based on the Takemaru-kun system, which has been in operation at a community center since Nov. 2002. It consists of several modules, such as Automatic Speech Recognition, Dialog Management, Text-to-Speech, an Internet browser, and a Computer Graphic Agent. The software and toolkit are planned to be freely distributed. It will be used as speech service software provided as Software-as-a-Service (SaaS) for WWW site developers, and also used for an upgrade system of our system for advanc...

    CiNii Article

    CiNii Books

    researchmap

  • D-9-36 DEVELOPMENT OF A SPEECH-ORIENTED GUIDANCE PLATFORM AS A SOFTWARE-AS-A-SERVICE FOR VARIOUS USAGE AND ENVIRONEMENTS

    Hara Sunao, Kawanami Hiromichi, Saruwatari Hiroshi, Shikano Kiyohiro

    Proceedings of the IEICE General Conference   2012 ( 1 )   168 - 168   2012.3

     More details

    Language:Japanese   Publisher:The Institute of Electronics, Information and Communication Engineers  

    CiNii Article

    CiNii Books

    researchmap

  • Multi-band Speech Recognition using Confidence of Blind Source Separation

    ANDO Atsushi, OHASHI Hiromasa, HARA Sunao, KITAOKA Norihide, TAKEDA Kazuya

    IEICE technical report. Speech   111 ( 431 )   219 - 224   2012.2

     More details

    Language:Japanese   Publisher:The Institute of Electronics, Information and Communication Engineers  

    One of the main applications of Blind Source Separation (BSS) is to improve the performance of Automatic Speech Recognition (ASR) systems. However, conventional BSS algorithms have been applied to speech signals only as a pre-processing step. In this paper, a closely coupled framework between an FDICA-based BSS algorithm and a speech recognition system is proposed. In the source separation step, a confidence score of the separation accuracy for each frequency bin is first estimated. Subsequently, by employing a multi-band speech recognition system, the acoustic likelihood is calculated on the Mel-scale filter bank energies using the estimated BSS confidence scores. Therefore, our proposed method can reduce the ASR errors caused by separation errors in BSS and permutation errors in ICA that occur in the conventional approach. Experimental results showed that our proposed method improved the word correct rate of ASR by 8.2% and the word accuracy rate by 5.7% on average.

    CiNii Article

    CiNii Books

    researchmap

  • 多様な利用環境における音声情報案内サービスソフトウェアの開発

    原直

    信学総大講演論文集, 2012   168   2012

  • Robust Acoustic Modeling Using MLLR Transformation-based Speech Feature Generation

    2010 ( 5 )   1 - 6   2011.2

     More details

  • MLLR変換行列により制約された音響特徴量生成による頑健な音響モデル (音声)

    伊藤 新, 原 直, 北岡 教英

    電子情報通信学会技術研究報告   110 ( 357 )   55 - 60   2010.12

     More details

    Publisher:電子情報通信学会  

    CiNii Article

    researchmap

  • Music recommendation system based on chat speech recognition

    OHASHI Hiromasa, KITAOKA Norihide, HARA Sunao, TAKEDA Kazuya

    IEICE technical report   110 ( 220 )   59 - 64   2010.10

     More details

    Language:Japanese   Publisher:The Institute of Electronics, Information and Communication Engineers  

    We developed an ambient system that plays music suited to the mood of a human-human conversation, using words obtained from a continuous speech recognition system. Using the correspondence between a document space based on texts related to the music and an acoustic space that expresses various audio features, the continuous speech recognition results are mapped to the acoustic space. Proper names, which are not covered by the continuous speech recognizer, are recognized by a word spotter. In this paper, we show the results of a performance evaluation of the system. For read music review texts, the system obtained a reasonably good MRR of 0.83, even with a high WER of 70.55% and an F-measure of 31.58, which is not low. We also show an example result for chat conversations.

  • Estimation method of user satisfaction based on dialog history N-gram for spoken dialog system

    Hara Sunao, Kitaoka Norihide, Takeda Kazuya

    IEICE technical report   109 ( 355 )   77 - 82   2009.12

    Language:Japanese   Publisher:The Institute of Electronics, Information and Communication Engineers  

  • Estimation method of user satisfaction based on dialog history N-gram for spoken dialog system

    HARA SUNAO, KITAOKA NORIHIDE, TAKEDA KAZUYA

    2009 ( 14 )   1 - 6   2009.12

  • User modeling for a satisfaction evaluation of a speech recognition system

    HARA Sunao, KITAOKA Norihide, TAKEDA Kazuya

    IEICE technical report   108 ( 338 )   61 - 66   2008.12

    Language:Japanese   Publisher:The Institute of Electronics, Information and Communication Engineers  

    A mathematical model for predicting user satisfaction with a speech dialogue system is studied based on a field trial of a voice-navigated music retrieval system. The user's Subjective Word Accuracy (subjective-WA) is introduced as a background psychometric measure of satisfaction. In the field test, subjective-WA is collected through questionnaires together with satisfaction indexes and various user profiles. First, we show that subjective-WA is more significant to user satisfaction than (Objective) Word Accuracy (objective-WA), which is calculated using manually prepared transcriptions of the recorded dialogue. Then, through top-down clustering of the joint distribution of subjective- and objective-WAs, we show that the user population can be grouped into several sub-groups in terms of sensitivity to recognition accuracy. The lower bound of the objective-WA for a given subjective-WA is also calculated from the joint distribution. Finally, a graphical model is built that predicts the user satisfaction index from user profiles and reduces the distribution uncertainty of user satisfaction by 13% of its variance.

  • Evaluation of training effects by long-term use of a spoken dialogue interface

    HARA Sunao, MIYAJIMA Chiyomi, ITOU Katsunobu, KITAOKA Norihide, TAKEDA Kazuya

    70 ( 5 (3L-4) )   5-341 - 5-342   2008.3

    Language:Japanese   Publishing type:Research paper, summary (national, other academic conference)  

  • User modeling for a satisfaction evaluation of a speech recognition system

    HARA Sunao, KITAOKA Norihide, TAKEDA Kazuya

    IPSJ SIG Notes   2008 ( 123 )   61 - 66   2008

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    A mathematical model for predicting user satisfaction with a speech dialogue system is studied based on a field trial of a voice-navigated music retrieval system. The user's Subjective Word Accuracy (subjective-WA) is introduced as a background psychometric measure of satisfaction. In the field test, subjective-WA is collected through questionnaires together with satisfaction indexes and various user profiles. First, we show that subjective-WA is more significant to user satisfaction than (Objective) Word Accuracy (objective-WA), which is calculated using manually prepared transcriptions of the recorded dialogue. Then, through top-down clustering of the joint distribution of subjective- and objective-WAs, we show that the user population can be grouped into several sub-groups in terms of sensitivity to recognition accuracy. The lower bound of the objective-WA for a given subjective-WA is also calculated from the joint distribution. Finally, a graphical model is built that predicts the user satisfaction index from user profiles and reduces the distribution uncertainty of user satisfaction by 13% of its variance.

  • Data Collection System for the Speech Utterances to an Automatic Speech Recognition System under Real Environments

    HARA Sunao, MIYAJIMA Chiyomi, ITOU Katsunobu, TAKEDA Kazuya

    The IEICE transactions on information and systems   90 ( 10 )   2807 - 2816   2007.10

    Language:Japanese   Publisher:The Institute of Electronics, Information and Communication Engineers  

  • Constructing Acoustic Model for User-specific Song List in a Music Retrieval System

    HARA Sunao, MIYAJIMA Chiyomi, KITAOKA Norihide, ITOU Katsunobu, TAKEDA Kazuya

    IPSJ SIG Notes   2007 ( 75 )   87 - 90   2007.7

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    This paper discusses a training method for an HMM acoustic model that efficiently covers a given vocabulary, in order to apply it to the speech interface of a music retrieval system. Customizing the acoustic model to each user is important in this application because 1) song titles and artist names contain many phonetic contexts that are rare in general corpora, e.g., text reading corpora, and 2) the songs stored in a device differ among users. In particular, finding an optimal state-tying structure for the given vocabulary is a new problem in acoustic model training. We propose a method for building a task-dependent acoustic model that uses task-related synthetic utterances of more than one hundred speakers generated by HMM-based speech synthesis. From an experimental evaluation using field test data, we confirmed that the task-dependent acoustic model trained by the proposed method can reduce the word error rate by 10% compared to a task-independent model.

  • Evaluation of a music retrieval system using spoken dialogue

    2007   47 - 50   2007

    Language:Japanese  

  • Speech data collection and evaluation by using a spoken dialogue system on general purpose PCs

    HARA Sunao, MIYAJIMA Chiyomi, ITOU Katsunobu, TAKEDA Kazuya

    IEICE technical report   106 ( 443 )   167 - 172   2006.12

    Language:Japanese   Publisher:The Institute of Electronics, Information and Communication Engineers  

    We developed a user-customizable speech dialogue system and a framework for automatic speech data collection in field experiments over the Internet. Users can download and install the speech dialogue system on their own PCs and customize the system on a remote server for their own use. The speech data recorded on their PCs are transferred to the remote server through the Internet. The system enables us to collect speech data spoken by many users in a wide variety of acoustic environments. During a two-month field test, we obtained 59 hours of recorded data, including 5 hours and 41 minutes detected as speech, which corresponds to 11351 speech segments. The word correct rate for the 4716 speech utterances spoken to the dialogue system was 66.0%, which improved to 70.5% after applying unsupervised MLLR for each user.

  • Speech data collection and evaluation by using a spoken dialogue system on general purpose PCs

    HARA Sunao, MIYAJIMA Chiyomi, ITOU Katsunobu, TAKEDA Kazuya

    IPSJ SIG Notes   2006 ( 136 )   167 - 172   2006

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    We developed a user-customizable speech dialogue system and a framework for automatic speech data collection in field experiments over the Internet. Users can download and install the speech dialogue system on their own PCs and customize the system on a remote server for their own use. The speech data recorded on their PCs are transferred to the remote server through the Internet. The system enables us to collect speech data spoken by many users in a wide variety of acoustic environments. During a two-month field test, we obtained 59 hours of recorded data, including 5 hours and 41 minutes detected as speech, which corresponds to 11351 speech segments. The word correct rate for the 4716 speech utterances spoken to the dialogue system was 66.0%, which improved to 70.5% after applying unsupervised MLLR for each user.

  • Evaluation of training effects of a spoken dialogue interface by long-term use

    HARA Sunao, SHIROSE Ayako, MIYAJIMA Chiyomi, ITOU Katsunobu, TAKEDA Kazuya

    2005 ( 1 )   153 - 154   2005.3

  • Evaluation of training effects by long-term use of a spoken dialogue interface

    HARA Sunao, SHIROSE Ayako, MIYAJIMA Chiyomi, ITOU Katsunobu, TAKEDA Kazuya

    IPSJ SIG Notes   2005 ( 12 )   17 - 22   2005

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    We are developing a music retrieval system for in-car use based on a spoken dialogue interface. The system can retrieve and play music that a user wants to listen to. We previously conducted experiments in which each subject used the system for one hour. In those experiments, we found that speech recognition performance improves as subjects get used to the system, although the degree of training depends on the subject. In this paper, we conduct extended experiments in which each subject uses the system over five one-hour sessions. Experimental results for twelve subjects show that the system achieves about 60% relative improvement in recognition performance at the fifth session compared to the first session.

  • A music searching system by spoken dialogue

    HARA Sunao, SHIROSE Ayako, MIYAJIMA Chiyomi, ITOU Katsunobu, TAKEDA Kazuya

    IPSJ SIG Notes   2004 ( 103 )   31 - 36   2004.2

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    Recently, various applications equipped with speech recognition have been developed; for example, speech recognition is used in car navigation systems for hands-free operation. Because music can now be downloaded via the Internet, an easy-to-use music search interface is expected. We therefore built a music search system with a spoken dialogue interface, intended for in-car use. The system can search for and play the music that the user wants to listen to. In this paper, we describe the system in detail and the spoken dialogue recordings made with it. Experimental results for 150 subjects with a prototype system show that the system achieved a word correct rate of about 80% in an indoor environment and about 76% in an in-car environment.

  • Preliminary study on the evaluation of a quality of spoken dialogue system in terms of user factors

    SHIROSE Ayako, HARA Sunao, FUJIMURA Hiroshi, ITO Katsunobu, TAKEDA Kazuya, ITAKURA Fumitada

    IEICE technical report. Natural language understanding and models of communication   103 ( 518 )   7 - 12   2003.12

    Language:Japanese   Publisher:The Institute of Electronics, Information and Communication Engineers  

    This study aims to describe user problems and the process of skill acquisition in using spoken dialogue systems, and to reveal how these affect the evaluation of system usefulness. To this end, we designed a new dialogue system, carried out a field test with a large number of subjects, and asked them to evaluate the usefulness of the system. The results showed that the evaluation of the system correlated not with the recognition rate but with user satisfaction and comprehension. This suggests that spoken dialogue systems should be evaluated in terms of user factors. Controlled experiments are needed for a more detailed discussion.

  • Preliminary study on the evaluation of a quality of spoken dialogue system in terms of user factors

    Ayako Shirose, Sunao Hara, Hiroshi Fujimura, Katsunobu Ito, Kazuya Takeda, Fumitada Itakura

    IPSJ SIG Technical Report   2003 ( 124 (2003-SLP-049) )   253 - 258   2003.12

    Language:Japanese   Publishing type:Research paper, summary (national, other academic conference)   Publisher:Information Processing Society of Japan (IPSJ)  

    This study aims to describe user problems and the process of skill acquisition in using spoken dialogue systems, and to reveal how these affect the evaluation of system usefulness. To this end, we designed a new dialogue system, carried out a field test with a large number of subjects, and asked them to evaluate the usefulness of the system. The results showed that the evaluation of the system correlated not with the recognition rate but with user satisfaction and comprehension. This suggests that spoken dialogue systems should be evaluated in terms of user factors. Controlled experiments are needed for a more detailed discussion.

  • Implementation and evaluation of Julius/Julian on the PDA environment

    Sunao Hara, Nobuo Kawaguchi, Kazuya Takeda, Fumitada Itakura

    IPSJ SIG Technical Report   2003 ( 14 (2002-SLP-045) )   131 - 136   2003.2

    Language:Japanese   Publishing type:Research paper, summary (national, other academic conference)   Publisher:Information Processing Society of Japan (IPSJ)  

    In order to develop an open source platform for speech recognition on Personal Digital Assistants (PDAs), the general-purpose speech recognition engine Julius/Julian is ported to the PDA environment. The following results are obtained from experimental evaluations. In isolated word recognition, 90% accuracy is obtained at about 1.9 times real time, which is about 73 times the processing time of a standard PC environment. In connected digit recognition, 99% word accuracy is obtained using HMMs trained on sentences recorded by the PDA.

Presentations

  • 人対人の会話で自然な話題展開を支援するための対話戦略の検討

    前薗そよぎ, 原直, 阿部匡伸

    音学シンポジウム2021(情報処理学会 音声言語処理研究会)  2021.6.18  情報処理学会

    Event date: 2021.6.18 - 2021.6.19

    Language:Japanese   Presentation type:Poster presentation  

  • 呼気流路の容易な制御を目的とした面接触型人工舌の構音改善に関する実験的研究

    長塚弘亮, 川上滋央, 古寺寛志, 佐藤匡晃, 田中祐貴, 兒玉直紀, 原直, 皆木省吾

    日本顎顔面補綴学会 第38回総会・学術大会  2021.6.4 

    Event date: 2021.6.3 - 2021.6.5

    Language:Japanese   Presentation type:Poster presentation  

  • ニューラル機械翻訳により推定された読み仮名・韻律記号を入力とする日本語 End-to-End 音声合成の評価

    懸川直人, 原直, 阿部匡伸, 井島勇祐

    日本音響学会2021年春季研究発表会  2021.3.11  日本音響学会

    Event date: 2021.3.10 - 2021.3.12

    Language:Japanese   Presentation type:Oral presentation (general)  

  • Evaluation of Concept Drift Adaptation for Acoustic Scene Classifier Based on Kernel Density Drift Detection and Combine Merge Gaussian Mixture Model

    Ibnu Daqiqil Id, Masanobu Abe, Sunao Hara

    2021.3.10 

    Event date: 2021.3.10 - 2021.3.12

    Language:Japanese   Presentation type:Oral presentation (general)  

  • 歌唱表現を付与できるBidirectional-LSTM を用いた歌声合成方式の検討

    金子隼人, 原直, 阿部匡伸

    日本音響学会2021年春季研究発表会  2021.3.10  日本音響学会

    Event date: 2021.3.10 - 2021.3.12

    Language:Japanese   Presentation type:Poster presentation  

  • TTSによる会話支援システムのための感圧センサを用いた手袋型入力デバイスの開発と入力速度の評価

    IPSJ SIG-HCI  2020.12.9 

    Event date: 2020.12.8 - 2020.12.9

    Language:Japanese   Presentation type:Oral presentation (general)  

  • Module Comparison of Transformer-TTS for Speaker Adaptation based on Fine-tuning International conference

    Katsuki Inoue, Sunao Hara, Masanobu Abe

    APSIPA Annual Summit and Conference 2020  2020.12  APSIPA

    Event date: 2020.12.7 - 2020.12.10

    Language:English   Presentation type:Oral presentation (general)  

    Venue:Online/Virtual Conference (Auckland, New Zealand)  

    Other Link: https://ieeexplore.ieee.org/document/9306250

  • Concept Drift Adaptation for Acoustic Scene Classifier Based on Gaussian Mixture Model International conference

    Ibnu Daqiqil Id, Masanobu Abe, Sunao Hara

    The 2020 IEEE Region 10 Conference (IEEE-TENCON 2020)  2020.11  IEEE

    Event date: 2020.11.16 - 2020.11.19

    Language:English   Presentation type:Oral presentation (general)  

    Venue:Online/Virtual Conference (Osaka, Japan)  

  • Controlling the Strength of Emotions in Speech-like Emotional Sound Generated by WaveNet International conference

    Kento Matsumoto, Sunao Hara, Masanobu Abe

    Interspeech 2020  2020.10  ISCA

    Event date: 2020.10.25 - 2020.10.29

    Language:English   Presentation type:Oral presentation (general)  

    Venue:Online/Virtual Conference (Shanghai, China)  

  • パーキンソン病重症度推定のためのインソール型圧力センサを用いた時間的特徴量の検討

    林倖生, 原直, 阿部匡伸

    2020年度(第71回)電気・情報関連学会中国支部連合大会  2020.10.24 

    Event date: 2020.10.24

    Language:Japanese   Presentation type:Oral presentation (general)  

  • Transformerを用いた日本語テキストからの読み仮名・韻律記号列推定

    懸川直人, 原直, 阿部匡伸, 井島勇祐

    日本音響学会2020年秋季研究発表会  2020.9.11  日本音響学会

    Event date: 2020.9.9 - 2020.9.11

    Language:Japanese   Presentation type:Oral presentation (general)  

    Venue:Online  

  • WaveNetを用いた言語情報なし感情音声合成における感情の強さ制御の検討

    松本剣斗, 原直, 阿部匡伸

    日本音響学会2020年秋季研究発表会  2020.9.10  日本音響学会

    Event date: 2020.9.9 - 2020.9.11

    Language:Japanese   Presentation type:Poster presentation  

    Venue:Online  

  • 映像と音声を用いた議論への関与姿勢や肯定的・否定的態度の推定方式の検討

    Tsubasa Kanaoka, Yutaro Uehara, Sunao Hara, Masanobu Abe

    DICOMO 2020  2020.6.26 

    Event date: 2020.6.24 - 2020.6.26

    Language:Japanese   Presentation type:Oral presentation (general)  

  • GPSデータのクラスタリングによる日常生活における場所の重要度の分析

    Rui Hirata, Sunao Hara, Masanobu Abe

    DICOMO 2020  2020.6.25 

    Event date: 2020.6.24 - 2020.6.26

    Language:Japanese   Presentation type:Oral presentation (general)  

  • End-to-End 音声認識を用いた音声合成の半教師あり話者適応 International coauthorship

    Katsuki Inoue, Sunao Hara, Masanobu Abe, Tomoki Hayashi, Ryuichi Yamamoto, Shinji Watanabe

    2020.6.7 

    Event date: 2020.6.6 - 2020.6.7

    Language:Japanese   Presentation type:Poster presentation  

  • 話題の対象に対する親密度に応じて応答する音声対話システムの検討

    Daichi Kato, Sunao Hara, Masanobu Abe

    2020.6.6 

    Event date: 2020.6.6 - 2020.6.7

    Language:Japanese   Presentation type:Poster presentation  

  • Semi-supervised speaker adaptation for end-to-end speech synthesis with the pretrained models International coauthorship International conference

    Katsuki Inoue, Sunao Hara, Masanobu Abe, Tomoki Hayashi, Ryuichi Yamamoto, Shinji Watanabe

    ICASSP 2020  2020.5  IEEE

    Event date: 2020.5.4 - 2020.5.8

    Language:English   Presentation type:Oral presentation (general)  

    Venue:Online/Virtual Conference (Barcelona, Spain)  

  • 舌亜全摘出者の音韻明瞭度改善のための推定音素事後確率を用いた声質変換の検討

    Seiya Ogino, Sunao Hara, Masanobu Abe

    2020.3.17 

    Event date: 2020.3.17 - 2020.3.18

    Language:Japanese   Presentation type:Poster presentation  

  • 言語情報なし感情合成音を学習に用いたCycleGANによる感情変換方式の検討

    Kento Matsumoto, Sunao Hara, Masanobu Abe

    2020.3.18 

    Event date: 2020.3.16 - 2020.3.18

    Language:Japanese   Presentation type:Oral presentation (general)  

  • End-to-End 音声認識を用いた音声合成の半教師あり話者適応 International coauthorship

    Katsuki Inoue, Sunao Hara, Masanobu Abe, Tomoki Hayashi, Ryuichi Yamamoto, Shinji Watanabe

    2020.3.17 

    Event date: 2020.3.16 - 2020.3.18

    Language:Japanese   Presentation type:Oral presentation (general)  

  • ウェアラブルデバイスによる曖昧な入力からのニューラル機械翻訳を用いた日本語文章推定方式

    Jun Watanabe, Sunao Hara, Masanobu Abe

    IPSJ SIG-HCI  2020.3.16 

    Event date: 2020.3.16 - 2020.3.17

    Language:Japanese   Presentation type:Oral presentation (general)  

  • End-to-End音声認識を用いたEnd-to-End音声合成の性能評価 International coauthorship

    Katsuki Inoue, Sunao Hara, Masanobu Abe, Shinji Watanabe

    2019.11.30 

    Event date: 2019.11.30

    Language:Japanese   Presentation type:Poster presentation  

  • DNN-based Voice Conversion with Auxiliary Phonemic Information to Improve Intelligibility of Glossectomy Patients’ Speech International conference

    Hiroki Murakami, Sunao Hara, Masanobu Abe

    APSIPA Annual Summit and Conference 2019  2019.11.19  APSIPA

    Event date: 2019.11.18 - 2019.11.21

    Language:English   Presentation type:Oral presentation (general)  

    Venue:Lanzhou, China  

  • Speech-like Emotional Sound Generator by WaveNet International conference

    Kento Matsumoto, Sunao Hara, Masanobu Abe

    APSIPA Annual Summit and Conference 2019  2019.11  APSIPA

    Event date: 2019.11.18 - 2019.11.21

    Language:English   Presentation type:Oral presentation (general)  

    Venue:Lanzhou, China  

  • WaveNet による言語情報を含まない感情音声合成方式における話者性の検討

    Kento Matsumoto, Sunao Hara, Masanobu Abe

    2019.9.6 

    Event date: 2019.9.4 - 2019.9.6

    Language:Japanese   Presentation type:Oral presentation (general)  

  • 新たにデザインされた人工舌と解剖学的人工舌の効果ならびにその選択基準

    佐藤匡晃, 長塚弘亮, 川上滋央, 兒玉直紀, 原直, 阿部匡伸, 皆木省吾

    日本補綴歯科学会 中国・四国支部学術大会  2019.9.1  日本補綴歯科学会 中国・四国支部

    Event date: 2019.8.31 - 2019.9.1

    Language:Japanese   Presentation type:Oral presentation (general)  

    Venue:Fukuyama, Hiroshima, Japan  

  • A signal processing perspective on human gait: Decoupling walking oscillations and gestures International coauthorship International conference

    Adrien Gregorj, Zeynep Yücel, Sunao Hara, Akito Monden, Masahiro Shiomi

    The 4th International Conference on Interactive Collaborative Robotics 2019 (ICR 2019)  2019.8 

    Event date: 2019.8.20 - 2019.8.25

    Language:English   Presentation type:Oral presentation (general)  

  • GPSデータに基づく日常生活における特別な行動の検出

    Makoto Kobayashi, Sunao Hara, Masanobu Abe

    DICOMO 2019  2019.7.4 

    Event date: 2019.7.3 - 2019.7.5

    Language:Japanese   Presentation type:Oral presentation (general)  

  • CNN Autoencoder から抽出したボトルネック特徴量を用いた環境音分類

    Takumi Matsubara, Sunao Hara, Masanobu Abe

    DICOMO 2019  2019.7.3 

    Event date: 2019.7.3 - 2019.7.5

    Language:Japanese   Presentation type:Oral presentation (general)  

  • WaveNetによる言語情報を含まない感情音声合成方式の検討

    Kento Matsumoto, Sunao Hara, Masanobu Abe

    2019.6.23 

    Event date: 2019.6.22 - 2019.6.23

    Language:Japanese   Presentation type:Poster presentation  

  • i-vectorに基づく賑わい音の推定方式の検討

    Zhenyang Wu, Kohei Tomoda, Sunao Hara, Masanobu Abe

    2019.6.22 

    Event date: 2019.6.22 - 2019.6.23

    Language:Japanese   Presentation type:Poster presentation  

  • 舌亜全摘出者の音韻明瞭度改善のための Bidirectional LSTM-RNN に基づく音素補助情報を用いた声質変換方式の検討

    Hiroki Murakami, Sunao Hara, Masanobu Abe

    2019.3.4 

    Event date: 2019.3.3 - 2019.3.5

    Language:Japanese   Presentation type:Poster presentation  

  • 声質変換による舌亜全摘出者の音韻明瞭度改善のための音素補助情報の推定方式の検討

    Seiya Ogino, Hiroki Murakami, Sunao Hara, Masanobu Abe

    2019.3.4 

    Event date: 2019.3.3 - 2019.3.5

    Language:Japanese   Presentation type:Poster presentation  

  • DNN音声合成における少量の目標感情音声を用いた感情付与方式の検討

    Katsuki Inoue, Sunao Hara, Masanobu Abe, Yusuke Ijima

    2019.3.3 

    Event date: 2019.3.3 - 2019.3.5

    Language:Japanese   Presentation type:Poster presentation  

  • ながら聴き用楽曲の作業負荷に及ぼす影響とその選択方式の検討

    Kaoru Takase, Masanobu Abe, Sunao Hara

    2018.11.21 

    Event date: 2018.11.21 - 2018.11.22

    Language:Japanese   Presentation type:Poster presentation  

  • クラウドソーシングによる環境音マップ構築のための主観的な騒々しさ推定方式の検討

    Sunao Hara, Masanobu Abe

    2018.9.20 

    Event date: 2018.9.19 - 2018.9.21

    Language:Japanese   Presentation type:Oral presentation (general)  

  • Speech Enhancement of Glossectomy Patient’s Speech using Voice Conversion Approach

    Masanobu Abe, Seiya Ogino, Hiroki Murakami, Sunao Hara

    日本生物物理学会第56回年会,シンポジウム:ヘルスシステムの理解と応用  2018.9.15  日本生物物理学会

    Event date: 2018.9.15 - 2018.9.17

    Language:English   Presentation type:Oral presentation (general)  

    Venue:Okayama University, Tsushima Campus  

  • 声質変換による舌亜全摘出者の音韻明瞭度改善のための補助情報の検討

    Hiroki Murakami, Sunao Hara, Masanobu Abe

    2018.9.12 

    Event date: 2018.9.12 - 2018.9.14

    Language:Japanese   Presentation type:Poster presentation  

  • DNN音声合成における感情付与方式の評価

    Katsuki Inoue, Sunao Hara, Masanobu Abe, Katsunobu Houjou, Yusuke Ijima

    2018.9.12 

    Event date: 2018.9.12 - 2018.9.14

    Language:Japanese   Presentation type:Poster presentation  

  • Naturalness Improvement Algorithm for Reconstructed Glossectomy Patient’s Speech Using Spectral Differential Modification in Voice Conversion International conference

    Hiroki Murakami, Sunao Hara, Masanobu Abe, Masaaki Sato, Shogo Minagi

    Interspeech 2018  2018.9.5  ISCA

    Event date: 2018.9.2 - 2018.9.6

    Language:English   Presentation type:Poster presentation  

    Venue:Hyderabad, India  

  • 音声と口唇形状を用いた声質変換による舌亜全摘出者の音韻明瞭度改善の検討

    Seiya Ogino, Hiroki Murakami, Sunao Hara, Masanobu Abe

    SP  2018.6.28 

    Event date: 2018.6.28 - 2018.6.29

    Language:Japanese   Presentation type:Oral presentation (general)  

  • 舌亜全摘出者の音韻明瞭性改善のためのマルチモーダルデータベースの構築

    Hiroki Murakami, Seiya Ogino, Sunao Hara, Masanobu Abe, Masaaki Sato, Shogo Minagi

    Event date: 2018.3.13 - 2018.3.15

    Language:Japanese   Presentation type:Poster presentation  

  • クラウドソーシングによる賑わい音識別方式のフィールド実験評価

    Kohei Tomoda, Sunao Hara, Masanobu Abe

    Event date: 2018.3.13 - 2018.3.15

    Language:Japanese   Presentation type:Poster presentation  

  • DNN音声合成における感情付与のための継続時間長モデルの検討

    Katsuki Inoue, Sunao Hara, Masanobu Abe, Katsunobu Houjou, Yusuke Ijima

    Event date: 2018.3.13 - 2018.3.15

    Language:Japanese   Presentation type:Poster presentation  

  • Sound sensing using smartphones as a crowdsourcing approach International conference

    Sunao Hara, Asako Hatakeyama, Shota Kobayashi, Masanobu Abe

    APSIPA Annual Summit and Conference 2017  2017.12.15  APSIPA

    Event date: 2017.12.12 - 2017.12.15

    Language:English   Presentation type:Oral presentation (general)  

    Venue:Kuala Lumpur, Malaysia  

  • An investigation to transplant emotional expressions in DNN-based TTS synthesis International conference

    Katsuki Inoue, Sunao Hara, Masanobu Abe, Nobukatsu Hojo, Yusuke Ijima

    APSIPA Annual Summit and Conference 2017  2017.12.14  APSIPA

    Event date: 2017.12.12 - 2017.12.15

    Language:English   Presentation type:Poster presentation  

    Venue:Kuala Lumpur, Malaysia  

  • New monitoring scheme for persons with dementia through monitoring-area adaptation according to stage of disease International conference

    Shigeki Kamada, Yuji Matsuo, Sunao Hara, Masanobu Abe

    ACM SIGSPATIAL Workshop on Recommendations for Location-based Services and Social Networks (LocalRec 2017)  ACM

    Event date: 2017.11.7 - 2017.11.10

    Language:English   Presentation type:Poster presentation  

    Venue:Redondo Beach, CA, USA  

  • DNN に基づく差分スペクトル補正を用いた声質変換による舌亜全摘出者の音韻明瞭性改善の検討

    Hiroki Murakami, Sunao Hara, Masanobu Abe, Masaaki Sato, Shogo Minagi

    2017.9.26 

    Event date: 2017.9.25 - 2017.9.27

    Language:Japanese   Presentation type:Poster presentation  

  • DNN による人間の感覚を考慮した騒々しさ推定方式に基づく騒音マップの作成

    Shota Kobayashi, Sunao Hara, Masanobu Abe

    2017.9.26 

    Event date: 2017.9.25 - 2017.9.27

    Language:Japanese   Presentation type:Poster presentation  

  • DNN音声合成における話者と感情の情報を扱うためのモデル構造の検討

    Katsuki Inoue, Sunao Hara, Masanobu Abe, Katsunobu Houjou, Yusuke Ijima

    2017.9.25 

    Event date: 2017.9.25 - 2017.9.27

    Language:Japanese   Presentation type:Poster presentation  

  • Prediction of subjective assessments for a noise map using deep neural networks International conference

    Shota Kobayashi, Sunao Hara, Masanobu Abe

    2017 ACM International Joint Conference on Pervasive and Ubiquitous Computing (UbiComp 2017)  2017.9.13  ACM

    Event date: 2017.9.11 - 2017.9.15

    Language:English   Presentation type:Poster presentation  

    Venue:Maui, Hawaii, USA  

  • Speaker Dependent Approach for Enhancing a Glossectomy Patient's Speech via GMM-based Voice Conversion International conference

    Kei Tanaka, Sunao Hara, Masanobu Abe, Masaaki Sato, Shogo Minagi

    Interspeech 2017  2017.8.23  ISCA

    Event date: 2017.8.20 - 2017.8.24

    Language:English   Presentation type:Poster presentation  

    Venue:Stockholm, Sweden  

  • 環境音収集に効果的なインセンティブを与える可視化方式の検討

    Asako Hatakeyama, Sunao Hara, Masanobu Abe

    DICOMO 2017  2017.6.28 

    Event date: 2017.6.28 - 2017.6.30

    Language:Japanese   Presentation type:Oral presentation (general)  

  • DNN音声合成における感情付与のためのモデル構造の検討

    Katsuki Inoue, Sunao Hara, Masanobu Abe, Katsunobu Houjou, Yusuke Ijima

    SP  2017.6.22 

    Event date: 2017.6.22 - 2017.6.23

    Language:Japanese   Presentation type:Oral presentation (general)  

  • 2つの粒度の生活圏に基づく見守りシステム

    Shigeki Kamada, Sunao Hara, Masanobu Abe

    2017.3.22 

    Event date: 2017.3.22 - 2017.3.25

    Language:Japanese   Presentation type:Oral presentation (general)  

  • スマートフォンで収録した環境音データベースを用いたCNNによる環境音分類

    Shunji Toba, Sunao Hara, Masanobu Abe

    2017.3.16 

    Event date: 2017.3.15 - 2017.3.17

    Language:Japanese   Presentation type:Poster presentation  

  • DNNによる人間の感覚を考慮した騒音マップ作成のための騒々しさ推定方式

    Shota Kobayashi, Sunao Hara, Masanobu Abe

    2017.3.16 

    Event date: 2017.3.15 - 2017.3.17

    Language:Japanese   Presentation type:Oral presentation (general)  

  • Enhancing a Glossectomy Patient’s Speech via GMM-based Voice Conversion International conference

    Kei Tanaka, Sunao Hara, Masanobu Abe, Shogo Minagi

    APSIPA Annual Summit and Conference 2016  2016.12.13  APSIPA

    Event date: 2016.12.13 - 2016.12.16

    Language:English   Presentation type:Oral presentation (general)  

    Venue:Jeju, Korea  

  • A classification method for crowded situation using environmental sounds based on Gaussian mixture model-universal background model International conference

    Tomoyasu Tanaka, Sunao Hara, Masanobu Abe

    ASA/ASJ 5th Joint Meeting  2016.11.29  米国音響学会/日本音響学会

     More details

    Event date: 2016.11.28 - 2016.12.2

    Language:English   Presentation type:Poster presentation  

    Venue:Honolulu, Hawaii  

    researchmap

  • 基本周波数変形を考慮したスペクトル変換手法の検討

    Kengo Toko, Sunao Hara, Masanobu Abe

    2016.11.19 

     More details

    Event date: 2016.11.19 - 2016.11.20

    Language:Japanese   Presentation type:Poster presentation  

    researchmap

  • スマートフォンで収録した環境音に含まれるタップ音除去方式の検討

    Kohei Tomoda, Sunao Hara, Masanobu Abe

    2016.9.15 

     More details

    Event date: 2016.9.14 - 2016.9.16

    Language:Japanese   Presentation type:Poster presentation  

    researchmap

  • 重複音を含む環境音データベースにおける環境音検出のための特徴量の基本検討

    Sunao Hara, Tomoyasu Tanaka, Masanobu Abe

    2016.9.15 

     More details

    Event date: 2016.9.14 - 2016.9.16

    Language:Japanese   Presentation type:Oral presentation (general)  

    researchmap

  • GMMに基づく声質変換を用いた舌亜全摘出者の音韻明瞭性改善の検討

    Kei Tanaka, Sunao Hara, Masanobu Abe, Shogo Minagi

    2016.9.15 

     More details

    Event date: 2016.9.14 - 2016.9.16

    Language:Japanese   Presentation type:Oral presentation (general)  

    researchmap

  • RNNによる実環境データからのマルチ音響イベント検出

    Shunji Toba, Sunao Hara, Masanobu Abe

    2016.9.15 

     More details

    Event date: 2016.9.14 - 2016.9.16

    Language:Japanese   Presentation type:Poster presentation  

    researchmap

  • LiBS: Lifelog browsing system to support sharing of memories International conference

    Atsuya Namba, Sunao Hara, Masanobu Abe

    2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing (UbiComp 2016)  2016.9.13  ACM

     More details

    Event date: 2016.9.12 - 2016.9.16

    Language:English   Presentation type:Poster presentation  

    Venue:Heidelberg, Germany  

    researchmap

  • Safety vs. Privacy: User Preferences from the Monitored and Monitoring Sides of a Monitoring System International conference

    Shigeki Kamada, Sunao Hara, Masanobu Abe

    2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing (UbiComp 2016)  2016.9.13  ACM

     More details

    Event date: 2016.9.12 - 2016.9.16

    Language:English   Presentation type:Poster presentation  

    Venue:Heidelberg, Germany  

    researchmap

  • 主観的評価に基づいた騒音マップ構築のためのクラウドソーシングによる環境音収集システム

    Sunao Hara, Shota Kobayashi, Masanobu Abe

    SP  2016.8.25 

     More details

    Event date: 2016.8.24 - 2016.8.25

    Language:Japanese   Presentation type:Oral presentation (general)  

    researchmap

  • Sound collection systems using a crowdsourcing approach to construct sound map based on subjective evaluation International conference

    Sunao Hara, Shota Kobayashi, Masanobu Abe

    IEEE ICME Workshop on Multimedia Mobile Cloud for Smart City Applications (MMCloudCity-2016)  2016.7.15  IEEE

     More details

    Event date: 2016.7.11 - 2016.7.15

    Language:English   Presentation type:Oral presentation (general)  

    Venue:Seattle, WA, USA  

    researchmap

  • GPSデータ匿名化レベルの主観的許容度を客観的に表現する指標の検討

    Yusuke Mitou, Sunao Hara, Masanobu Abe

    DICOMO 2016  2016.7.7 

     More details

    Event date: 2016.7.6 - 2016.7.8

    Language:Japanese   Presentation type:Oral presentation (general)  

    researchmap

  • 人間の感覚を考慮した騒音マップ作成のための騒々しさ推定方式

    Shota Kobayashi, Sunao Hara, Masanobu Abe

    DICOMO 2016  2016.7.6 

     More details

    Event date: 2016.7.6 - 2016.7.8

    Language:Japanese   Presentation type:Oral presentation (general)  

    researchmap

  • A measure for transfer tendency between staying places

    Takashi Ofuji, Sunao Hara, Masanobu Abe

    LOIS  2016.5.13 

     More details

    Event date: 2016.5.12 - 2016.5.13

    Language:Japanese   Presentation type:Oral presentation (general)  

    researchmap

  • 賑わい度推定のための環境音データベースの構築

    Tomoyasu Tanaka, Sunao Hara, Masanobu Abe

    2016 Spring Meeting of ASJ  2016.3.11 

     More details

    Event date: 2016.3.9 - 2016.3.11

    Language:Japanese   Presentation type:Poster presentation  

    researchmap

  • A watching method to protect users' privacy using living area

    Shigeki Kamada, Sunao Hara, Masanobu Abe

    LOIS  2016.1.21 

     More details

    Event date: 2016.1.21 - 2016.1.22

    Language:Japanese   Presentation type:Oral presentation (general)  

    researchmap

  • A Spoken Dialog System with Redundant Response to Prevent User Misunderstanding International conference

    Masaki Yamaoka, Sunao Hara, Masanobu Abe

    APSIPA Annual Summit and Conference 2015  2015.12.19  APSIPA

     More details

    Event date: 2015.12.16 - 2015.12.19

    Language:English   Presentation type:Oral presentation (general)  

    Venue:Hong Kong  

    researchmap

  • 環境音検出器を用いた環境音の可視化に関する検討

    田中智康,原直,阿部匡伸

    第17回IEEE広島支部学生シンポジウム(HISS 17th)  IEEE広島支部

     More details

    Event date: 2015.11.21 - 2015.11.22

    Language:Japanese   Presentation type:Poster presentation  

    Venue:岡山大学  

    researchmap

  • 心地よく話すことができる聞き役音声対話システムのための対話戦略

    齊藤椋太,原直,阿部匡伸

    第17回IEEE広島支部学生シンポジウム(HISS 17th)  IEEE広島支部

     More details

    Event date: 2015.11.21 - 2015.11.22

    Language:Japanese   Presentation type:Poster presentation  

    Venue:岡山大学  

    researchmap

  • クラウドソーシングによる環境音収集システムを用いた予備収録実験

    原直,阿部匡伸,曽根原登

    2015年日本音響学会秋季研究発表会  2015.9.18  日本音響学会

     More details

    Event date: 2015.9.16 - 2015.9.18

    Language:Japanese   Presentation type:Poster presentation  

    Venue:会津大学  

    researchmap

  • 冗長なシステム応答を用いたユーザの誤認識に頑健な音声対話システムに関する検討

    山岡将綺,原直,阿部匡伸

    2015年日本音響学会秋季研究発表会  2015.9.18  日本音響学会

     More details

    Event date: 2015.9.16 - 2015.9.18

    Language:Japanese   Presentation type:Poster presentation  

    Venue:会津大学  

    researchmap

  • 音楽を用いた生活収録音の振り返り手法の検討

    鳥羽隼司,原直,阿部匡伸

    2015年日本音響学会秋季研究発表会  2015.9.17  日本音響学会

     More details

    Event date: 2015.9.16 - 2015.9.18

    Language:Japanese   Presentation type:Oral presentation (general)  

    Venue:会津大学  

    researchmap

  • A Sub-Band Text-to-Speech by Combining Sample-Based Spectrum with Statistically Generated Spectrum International conference

    Tadashi Inai, Sunao Hara, Masanobu Abe, Yusuke Ijima, Noboru Miyazaki and Hideyuki Mizuno

    Interspeech 2015  ISCA

     More details

    Event date: 2015.9.6 - 2015.9.10

    Language:English   Presentation type:Poster presentation  

    Venue:Dresden, Germany  

    researchmap

  • Algorithm to estimate a living area based on connectivity of places with home International conference

    Yuji Matsuo, Sunao Hara, Masanobu Abe

    HCI International 2015 

     More details

    Event date: 2015.8.2 - 2015.8.7

    Language:English   Presentation type:Poster presentation  

    Venue:Los Angeles, CA, USA  

    researchmap

  • Extraction of key segments from day-long sound data International conference

    Akinori Kasai, Sunao Hara, Masanobu Abe

    HCI International 2015 

     More details

    Event date: 2015.8.2 - 2015.8.7

    Language:English   Presentation type:Poster presentation  

    Venue:Los Angeles, CA, USA  

    researchmap

  • LiBS:発見と気付きを可能とするライフログブラウジング方式

    難波敦也, 原直,阿部匡伸

    マルチメディア,分散,協調とモバイルシンポジウム(DICOMO2015)  2015.7.10 

     More details

    Event date: 2015.7.8 - 2015.7.10

    Language:Japanese   Presentation type:Oral presentation (general)  

    Venue:岩手県 安比高原  

    researchmap

  • 長期取得音からライフログとして残したい音の抽出方法

    笠井昭範,原直,阿部匡伸

    マルチメディア,分散,協調とモバイルシンポジウム(DICOMO2015)  2015.7.10 

     More details

    Event date: 2015.7.8 - 2015.7.10

    Language:Japanese   Presentation type:Oral presentation (general)  

    Venue:岩手県 安比高原  

    researchmap

  • 振り返り支援における効率的な映像要約のための自動収集ライフログ活用法

    大西杏菜,原直,阿部匡伸

    ライフインテリジェンスとオフィス情報システム研究会 (LOIS)  電子情報通信学会

     More details

    Event date: 2015.5.14 - 2015.5.15

    Language:Japanese   Presentation type:Oral presentation (general)  

    Venue:津田塾大学小平キャンパス  

    researchmap

  • Sound collection and visualization system enabled participatory and opportunistic sensing approaches International conference

    Sunao Hara, Masanobu Abe, Noboru Sonehara

    2nd International Workshop on Crowd Assisted Sensing, Pervasive Systems and Communications (CASPer 2015)  IEEE

     More details

    Event date: 2015.3.27

    Language:English   Presentation type:Oral presentation (general)  

    Venue:St. Louis, Missouri, USA  

    researchmap

  • ミックスボイスの地声・裏声との類似度比較

    家村朋典,原直,阿部匡伸

    2015年日本音響学会春季研究発表会  2015.3.17  日本音響学会

     More details

    Event date: 2015.3.16 - 2015.3.18

    Language:Japanese   Presentation type:Poster presentation  

    Venue:日本大学理工学部  

    researchmap

  • 聴取者の主観評価に基づく音地図作成のための環境音収録

    原直,阿部匡伸,曽根原登

    2015年日本音響学会春季研究発表会  2015.3.17  日本音響学会

     More details

    Event date: 2015.3.16 - 2015.3.18

    Language:Japanese   Presentation type:Oral presentation (general)  

    Venue:日本大学理工学部  

    researchmap

  • 高域部への素片スペクトルとHMM生成スペクトルの導入によるHMM合成音声の品質改善の検討

    稻井禎,原直,阿部匡伸,井島勇祐,宮崎昇,水野秀之

    2015年日本音響学会春季研究発表会  2015.3.17  日本音響学会

     More details

    Event date: 2015.3.16 - 2015.3.18

    Language:Japanese   Presentation type:Poster presentation  

    Venue:日本大学理工学部  

    researchmap

  • 移動経路のパターン分類による人の行動分析

    瀬藤諒,原直,阿部匡伸

    ライフインテリジェンスとオフィス情報システム研究会 (LOIS)  電子情報通信学会

     More details

    Event date: 2015.3.5 - 2015.3.6

    Language:Japanese   Presentation type:Oral presentation (general)  

    Venue:沖縄科学技術大学院大学  

    researchmap

  • 滞在地の特徴量を利用した「特別な日」検索方式

    林啓吾,原直,阿部匡伸

    ライフインテリジェンスとオフィス情報システム研究会 (LOIS)  電子情報通信学会

     More details

    Event date: 2015.3.5 - 2015.3.6

    Language:Japanese   Presentation type:Oral presentation (general)  

    Venue:沖縄科学技術大学院大学  

    researchmap

  • 滞在地と経路に着目した生活圏抽出法

    松尾雄二,原直,阿部匡伸

    ライフインテリジェンスとオフィス情報システム研究会 (LOIS)  電子情報通信学会

     More details

    Event date: 2015.3.5 - 2015.3.6

    Language:Japanese   Presentation type:Oral presentation (general)  

    Venue:沖縄科学技術大学院大学  

    researchmap

  • クラウドセンシングデータによる地域の賑わい分析 -地域経済活性化- International conference

    Sunao Hara

    ISSI2014 

     More details

    Event date: 2015.2.16 - 2015.2.17

    Language:Japanese   Presentation type:Oral presentation (invited, special)  

    researchmap

  • Extracting Daily Patterns of Human Activity Using Non-Negative Matrix Factorization International conference

    Masanobu Abe, Akihiko Hirayama, Sunao Hara

    IEEE International Conference on Consumer Electronics  IEEE

     More details

    Event date: 2015.1.9 - 2015.1.12

    Language:English   Presentation type:Oral presentation (general)  

    Venue:Las Vegas, USA  

    researchmap

  • 発話への関心の有無判別における聞き手の判別基準の有効性

    Ryota Saito, Sunao Hara, Masanobu Abe

     More details

    Event date: 2014.12.14

    Language:Japanese   Presentation type:Poster presentation  

    researchmap

  • ロック歌唱における「歪み声」と「ミックスボイス」の音響的特徴分析

    Tomonori Iemura, Sunao Hara, Masanobu Abe

     More details

    Event date: 2014.12.14

    Language:Japanese   Presentation type:Poster presentation  

    researchmap

  • A Hybrid Text-to-Speech Based on Sub-Band Approach International conference

    Takuma Inoue, Sunao Hara, Masanobu Abe

    Asia-Pacific Signal and Information Processing Association 2014 Annual Summit and Conference (APSIPA ASC 2014)  Asia-Pacific Signal and Information Processing Association (APSIPA)

     More details

    Event date: 2014.12.9 - 2014.12.12

    Language:English   Presentation type:Poster presentation  

    Venue:Cambodia  

    researchmap

  • 音素波形選択型音声合成方式に用いるデータベースサイズと主観評価との関係分析

    Tadashi Inai, Sunao Hara, Masanobu Abe

     More details

    Event date: 2014.9.3 - 2014.9.5

    Language:Japanese   Presentation type:Poster presentation  

    researchmap

  • クラウドセンシングにより収集された環境音のシンボル表現を用いた音地図構築手法

    Sunao Hara, Masanobu Abe, Noboru Sonehara

     More details

    Event date: 2014.9.3 - 2014.9.5

    Language:Japanese   Presentation type:Oral presentation (general)  

    researchmap

  • FLAG: 位置情報を基軸としたライフログ集約システム

    Akinori Kasai, Sunao Hara, Masanobu Abe

     More details

    Event date: 2014.6.28 - 2014.6.29

    Language:Japanese   Presentation type:Oral presentation (general)  

    researchmap

  • GPSデータから構築したネットワーク構造におけるノード次数に基づく行動分析法の検討

    Daisuke Fujioka, Sunao Hara, Masanobu Abe

     More details

    Event date: 2014.5.29 - 2014.5.30

    Language:Japanese   Presentation type:Oral presentation (general)  

    researchmap

  • スマートデバイスを用いたクラウドソーシングによる環境音収集システムの開発

    Sunao Hara, Akinori Kasai, Masanobu Abe, Noboru Sonehara

     More details

    Event date: 2014.5.24 - 2014.5.25

    Language:Japanese   Presentation type:Poster presentation  

    researchmap

  • 車載用音声対話システムにおけるユーザ負荷を考慮した対話戦略の検討

    Masaki Yamaoka, Sunao Hara, Masanobu Abe

     More details

    Event date: 2014.5.22 - 2014.5.23

    Language:Japanese   Presentation type:Oral presentation (general)  

    researchmap

  • 非負値行列因子分解によるPC操作ログからの勤務パタン抽出

    Akihiko Hirayama, Sunao Hara, Masanobu Abe

     More details

    Event date: 2014.5.15 - 2014.5.16

    Language:Japanese   Presentation type:Oral presentation (general)  

    researchmap

  • クラウドソーシングによる環境音収集

    原直

    第15回 岡山情報通信技術研究会  岡山情報通信技術研究会

     More details

    Event date: 2014.4.30

    Language:Japanese   Presentation type:Oral presentation (general)  

    Venue:岡山大学  

    researchmap

  • New Approach to Emotional Information Exchange: Experience Metaphor Based on Life Logs International conference

    Masanobu Abe, Daisuke Fujioka, Kazuto Hamano, Sunao Hara, Rika Mochizuki, Tomoki Watanabe

    The 12th IEEE International Conference on Pervasive Computing and Communications (PerCom 2014)  IEEE

     More details

    Event date: 2014.3.24 - 2014.3.28

    Language:English   Presentation type:Poster presentation  

    Venue:Budapest, Hungary  

    researchmap

  • 他人との行動ログ比較による個人の行動特徴分析

    Ryo Seto, Masanobu Abe, Sunao Hara

     More details

    Event date: 2014.3.18 - 2014.3.21

    Language:Japanese   Presentation type:Oral presentation (general)  

    researchmap

  • 滞在地と経路に着目した生活圏抽出法の検討

    松尾雄二,原直,阿部匡伸

    情報処理学会第76回全国大会  2014 

     More details

    Event date: 2014.3.11 - 2014.3.13

    Language:Japanese   Presentation type:Oral presentation (general)  

    Venue:東京電機大学  

    researchmap

  • 滞在地の特徴量を利用した「特別な日」検索方式の検討

    林啓吾,原直,阿部匡伸

    情報処理学会第76回全国大会  2014 

     More details

    Event date: 2014.3.11 - 2014.3.13

    Language:Japanese   Presentation type:Oral presentation (general)  

    Venue:東京電機大学  

    researchmap

  • 音声波形の高域利用による HMM 音声合成方式の評価

    井上拓真,原直,阿部匡伸,井島勇祐,水野秀之

    2014年日本音響学会春季研究発表会  2014 

     More details

    Event date: 2014.3.10 - 2014.3.12

    Language:Japanese   Presentation type:Oral presentation (general)  

    Venue:日本大学  

    researchmap

  • クラウドソーシングによる環境音収集のためのスマートデバイス用アプリケーションの開発

    原直,笠井昭範,阿部匡伸,曽根原登

    電子情報通信学会 LOIS研究会  2014 

     More details

    Event date: 2014.3.7 - 2014.3.8

    Language:Japanese   Presentation type:Oral presentation (general)  

    researchmap

  • 地理情報を活用したモバイル音声対話システムに関する研究

    原直

    情報処理学会 音声言語処理研究会(SIG–SLP第100回シンポジウム) 

     More details

    Event date: 2014.1.31 - 2014.2.1

    Language:Japanese   Presentation type:Oral presentation (general)  

    researchmap

  • PC操作ログから抽出したソフトウェアの使用様態による働き方の分析

    平山明彦,原直,阿部匡伸

    第15回IEEE広島支部学生シンポジウム(HISS 15th)  2013 

     More details

    Event date: 2013.11

    Language:Japanese   Presentation type:Poster presentation  

    Venue:鳥取大学  

    researchmap

  • GPSデータの滞在地に着目した行動振り返り支援方式

    林啓吾,原直,阿部匡伸

    平成25年度(第64回)電気・情報関連学会中国支部連合大会 

     More details

    Event date: 2013.10.19

    Language:Japanese   Presentation type:Oral presentation (general)  

    Venue:岡山大学  

    researchmap

  • GPSデータを用いた生活圏の動的生成のためのデータ量に関する検討

    松尾雄二,原直,阿部匡伸

    平成25年度(第64回)電気・情報関連学会中国支部連合大会 

     More details

    Event date: 2013.10.19

    Language:Japanese   Presentation type:Oral presentation (general)  

    Venue:岡山大学  

    researchmap

  • 混合正規分布を用いた声質変換法における分布数とスペクトル変換精度との関係性の検討

    遠藤一輝,原直,阿部匡伸

    平成25年度(第64回)電気・情報関連学会中国支部連合大会 

     More details

    Event date: 2013.10.19

    Language:Japanese   Presentation type:Oral presentation (general)  

    Venue:岡山大学  

    researchmap

  • 位置情報による行動分析を行うための経由地検出の検討

    瀬藤諒,原直,阿部匡伸

    平成25年度(第64回)電気・情報関連学会中国支部連合大会 

     More details

    Event date: 2013.10.19

    Language:Japanese   Presentation type:Oral presentation (general)  

    Venue:岡山大学  

    researchmap

  • スペクトル包絡と基本周波数の変換が音声の個人性に与える影響の検討

    Keisuke Kawai, Sunao Hara, Masanobu Abe

     More details

    Event date: 2013.9.25 - 2013.9.27

    Language:Japanese   Presentation type:Poster presentation  

    researchmap

  • 音声波形の高域利用によるHMM音声合成の高品質化

    Takuma Inoue, Sunao Hara, Masanobu Abe, Yusuke Ijima, Hideyuki Mizuno

     More details

    Event date: 2013.9.25 - 2013.9.27

    Language:Japanese   Presentation type:Poster presentation  

    researchmap

  • 音響信号とマルチセンサー信号を利用したスマートフォンによる自動車の燃費推定

    Shohei Nanba, Sunao Hara, Masanobu Abe

     More details

    Event date: 2013.5.16 - 2013.5.17

    Language:Japanese   Presentation type:Oral presentation (general)  

    researchmap

  • 音声情報案内システムにおけるBag-of-Wordsを用いた無効入力棄却モデルの可搬性の評価

    真嶋温佳, トーレスラファエル, 川波弘道, 原直, 松井知子, 猿渡洋, 鹿野清宏

    日本音響学会 2013年春季研究発表会  日本音響学会

     More details

    Event date: 2013.3.13 - 2013.3.15

    Language:Japanese   Presentation type:Oral presentation (general)  

    Venue:東京工科大学  

    researchmap

  • HMM音声合成と波形音声合成の混在による方式の評価

    Takuma Inoue, Sunao Hara, Masanobu Abe, Yusuke Ijima, Hideyuki Mizuno

     More details

    Event date: 2013.3.13 - 2013.3.15

    Language:Japanese   Presentation type:Oral presentation (general)  

    researchmap

  • The Individual Feature Analysis of the Network of the Stay Extracted from GPS Data

    Daisuke Fujioka, Sunao Hara, Masanobu Abe

     More details

    Event date: 2013.3.7 - 2013.3.8

    Language:Japanese   Presentation type:Oral presentation (general)  

    researchmap

  • Examination of an event sampling process with the similar impression from the others' life log

    Kazuto Hamano, Sunao Hara, Masanobu Abe, Daisuke Fujioka, Rika Mochizuki, Tomoki Watanabe

     More details

    Event date: 2013.3.7 - 2013.3.8

    Language:Japanese   Presentation type:Oral presentation (general)  

    researchmap

  • Bag-of-Wordsを用いた音声情報案内システム無効入力棄却モデルの可搬性の評価

    真嶋温佳, トーレスラファエル, 川波弘道, 原直, 松井知子, 猿渡洋, 鹿野清宏

    第15回日本音響学会関西支部若手研究者交流研究発表会  日本音響学会関西支部

     More details

    Event date: 2012.12.9

    Language:Japanese   Presentation type:Poster presentation  

    researchmap

  • Development of a toolkit handling multiple speech-oriented guidance agents for mobile applications International conference

    Sunao Hara, Hiromichi Kawanami, Hiroshi Saruwatari, Kiyohiro Shikano

    The 4th International Workshop on Spoken Dialog Systems (IWSDS2012) 

     More details

    Event date: 2012.11.28 - 2012.11.30

    Language:English   Presentation type:Poster presentation  

    Venue:Paris, France  

    researchmap

  • Evaluation of invalid input discrimination using BOW for speech-oriented guidance system International conference

    Haruka Majima, Rafael Torres, Hiromichi Kawanami, Sunao Hara, Tomoko Matsui, Hiroshi Saruwatari, Kiyohiro Shikano

    The 4th International Workshop on Spoken Dialog Systems (IWSDS2012) 

     More details

    Event date: 2012.11.28 - 2012.11.30

    Language:English   Presentation type:Poster presentation  

    Venue:Paris, France  

    researchmap

  • 音声情報システムにおける最大エントロピー法を用いた無効入力棄却の評価

    真嶋温佳, トーレスラファエル, 川波弘道, 原直, 松井知子, 猿渡洋, 鹿野清宏

    日本音響学会 2012年秋季研究発表会  日本音響学会

     More details

    Event date: 2012.9

    Language:Japanese   Presentation type:Oral presentation (general)  

    researchmap

  • 携帯端末用の音声情報案内システム開発に向けたネットワークサービスの検討

    原直,川波弘道,猿渡洋,鹿野清宏

    音声言語処理研究会  2012  情報処理学会

     More details

    Event date: 2012.7

    Language:Japanese   Presentation type:Oral presentation (general)  

    researchmap

  • クラウド時代の新しい音声研究パラダイム

    秋葉友良,岩野公司,緒方淳,小川哲司,小野順貴,篠崎隆宏,篠田浩一,南條浩輝,西崎博光,西田昌史,西村竜一,原直,堀貴明

    音声言語処理研究会  2012  情報処理学会

     More details

    Event date: 2012.7

    Language:Japanese   Presentation type:Symposium, workshop panel (nominated)  

    researchmap

  • 音声情報案内システムにおけるBag-of-Wordsを特徴量とした無効入力の棄却

    真嶋温佳,トーレス・ラファエル,川波弘道,原直,松井知子,猿渡洋,鹿野清宏

    音声言語処理研究会  2012  情報処理学会

     More details

    Event date: 2012.7

    Language:Japanese   Presentation type:Oral presentation (general)  

    researchmap

  • Causal analysis of task completion errors in spoken music retrieval interactions International conference

    Sunao Hara, Norihide Kitaoka, Kazuya Takeda

    LREC 2012  2012  ELDA

     More details

    Event date: 2012.5

    Language:English   Presentation type:Poster presentation  

    Venue:Istanbul, Turkey  

    researchmap

  • Multi-band speech recognition using band-dependent confidence measures of blind source separation International conference

    Atsushi Ando, Hiromasa Ohashi, Sunao Hara, Norihide Kitaoka, Kazuya Takeda

    The Acoustics 2012  2012 

     More details

    Event date: 2012.5

    Language:English   Presentation type:Poster presentation  

    Venue:Hong Kong  

    researchmap

  • 携帯端末用音声情報案内システムのためのマイク入力に関する調査

    中清行,原直,川波弘道,猿渡洋,鹿野清宏

    電子情報通信学会 総合大会 情報・システムソサイエティ特別企画 学生ポスターセッション  2012  電子情報通信学会

     More details

    Event date: 2012.3

    Language:Japanese   Presentation type:Poster presentation  

    researchmap

  • 周波数帯域ごとの音源分離信頼度を利用したマルチバンド音声認識

    安藤厚志,大橋宏正,原直,北岡教英,武田一哉

    2012年春季研究発表会  2012  日本音響学会

     More details

    Event date: 2012.3

    Language:Japanese   Presentation type:Oral presentation (general)  

    researchmap

  • 多様な利用環境における音声情報案内サービスソフトウェアの開発

    原直,川波弘道,猿渡洋,鹿野清宏

    電子情報通信学会総合大会  2012  電子情報通信学会

     More details

    Event date: 2012.3

    Language:Japanese   Presentation type:Oral presentation (general)  

    researchmap

  • ブラインド音源分離の信頼度を用いたマルチバンド音声認識

    安藤厚志,大橋宏正,原直,北岡教英,武田一哉

    音声研究会  2012  電子情報通信学会

     More details

    Event date: 2012.2

    Language:Japanese   Presentation type:Oral presentation (general)  

    researchmap

  • Robust seed model training for speaker adaptation using pseudo-speaker features generated by inverse CMLLR transformation International conference

    Arata Itoh, Sunao Hara, Norihide Kitaoka, Kazuya Takeda

    ASRU 2011  2011  IEEE

     More details

    Event date: 2011.12

    Language:English   Presentation type:Poster presentation  

    Venue:Hawaii  

    researchmap

  • Training Robust Acoustic Models Using Features of Pseudo-Speakers Generated by Inverse CMLLR Transformations International conference

    Arata Itoh, Sunao Hara, Norihide Kitaoka, Kazuya Takeda

    APSIPA-ASC 2011  2011  APSIPA

     More details

    Event date: 2011.10

    Language:English   Presentation type:Poster presentation  

    Venue:Xi'an, China  

    researchmap

  • On-line detection of task incompletion for spoken dialog systems using utterance and behavior tag N-gram vectors International conference

    Sunao HARA, Norihide KITAOKA, Kazuya TAKEDA

    IWSDS 2011  2011 

     More details

    Event date: 2011.9

    Language:English   Presentation type:Poster presentation  

    Venue:Granada, Spain  

    researchmap

  • Detection of task-incomplete dialogs based on utterance-and-behavior tag N-gram for spoken dialog systems International conference

    Sunao HARA, Norihide KITAOKA, Kazuya TAKEDA

    Interspeech 2011  2011  ISCA

     More details

    Event date: 2011.8

    Language:English   Presentation type:Poster presentation  

    Venue:Florence, Italy  

    researchmap

  • Music recommendation system based on human-to-human conversation recognition International conference

    Hiromasa OHASHI, Sunao HARA, Norihide KITAOKA, Kazuya TAKEDA

    HCIAmI'11  2011 

     More details

    Event date: 2011.7

    Language:English   Presentation type:Oral presentation (general)  

    Venue:Nottingham, U.K.  

    researchmap

  • 雑談音声の認識に基づく楽曲連想再生システム

    大橋宏正,原直,北岡教英,武田一哉

    2011年春季研究発表会  2011  日本音響学会

     More details

    Event date: 2011.3

    Language:Japanese   Presentation type:Oral presentation (general)  

    researchmap

  • MLLR変換行列に基づいた音響特徴量生成による音響モデル学習

    伊藤新,原直,北岡教英,武田一哉

    2011年春季研究発表会  2011  日本音響学会

     More details

    Event date: 2011.3

    Language:Japanese   Presentation type:Oral presentation (general)  

    researchmap

  • 音声対話システムにおける発話・行動タグN-gram を用いた課題未達成対話の検出手法と分析

    原直,北岡教英,武田一哉

    2011年春季研究発表会  2011  日本音響学会

     More details

    Event date: 2011.3

    Language:Japanese   Presentation type:Oral presentation (general)  

    researchmap

  • MLLR変換行列により制約された音響特徴量生成による頑健な音響モデル

    伊藤新,北岡教英,原直,武田一哉

    音声言語シンポジウム  2010  情報処理学会

     More details

    Event date: 2010.12

    Language:Japanese   Presentation type:Oral presentation (general)  

    researchmap

  • 雑談音声の常時認識による楽曲提案システム

    大橋宏正,北岡教英,原直,武田一哉

    音声研究会  2010  電子情報通信学会

     More details

    Event date: 2010.10

    Language:Japanese   Presentation type:Oral presentation (general)  

    researchmap

  • Automatic detection of task-incompleted dialog for spoken dialog system based on dialog act N-gram International conference

    Sunao HARA, Norihide KITAOKA, Kazuya TAKEDA

    INTERSPEECH2010  2010.9 

     More details

    Event date: 2010.9

    Language:English   Presentation type:Poster presentation  

    researchmap

  • 音声対話システムの発話系列N-gram を用いた課題未達成対話のオンライン検出

    原直,北岡教英,武田一哉

    2011年秋季研究発表会  2010  日本音響学会

     More details

    Event date: 2010.9

    Language:Japanese   Presentation type:Oral presentation (general)  

    researchmap

  • Rapid acoustic model adaptation using inverse MLLR-based feature generation International conference

    Arata ITO, Sunao HARA, Norihide KITAOKA, Kazuya TAKEDA

    The 20th International Congress on Acoustics (ICA2010)  2010.8 

     More details

    Event date: 2010.8

    Language:English   Presentation type:Poster presentation  

    researchmap

  • Estimation method of user satisfaction using N-gram-based dialog history model for spoken dialog system International conference

    Sunao HARA, Norihide KITAOKA, Kazuya TAKEDA

    7th International Conference on Language Resources and Evaluation (LREC'10)  2010.5 

     More details

    Event date: 2010.5

    Language:English   Presentation type:Oral presentation (general)  

    researchmap

  • MLLR 変換行列により生成した音声特徴量に基づく高速モデル適応

    伊藤新,原直,北岡教英,武田一哉

    2011年秋季研究発表会  2010  日本音響学会

     More details

    Event date: 2010.3

    Language:Japanese   Presentation type:Oral presentation (general)  

    researchmap

  • 楽曲連想再生のための文書特徴量と音響特徴量の対応付け

    高橋量衛, 大石康智, 原直, 北岡教英, 武田一哉

    第4回音声ドキュメント処理ワークショップ  2010 

     More details

    Event date: 2010.2

    Language:Japanese   Presentation type:Oral presentation (general)  

    researchmap

  • 音声対話システムの対話履歴N-gramを利用したユーザ満足度推定手法

    原直,北岡教英,武田一哉

    音声言語シンポジウム  2009  情報処理学会

     More details

    Event date: 2009.12

    Language:Japanese   Presentation type:Oral presentation (general)  

    researchmap

  • 複数音響モデルからの最適選択による音声認識

    伊藤新,原直,宮島千代美,北岡教英,武田一哉

    平成21年度電気関係学会 東海支部連合大会  2009 

     More details

    Event date: 2009.9

    Language:Japanese   Presentation type:Oral presentation (general)  

    researchmap

  • 楽曲間の主観的類似度と音響的類似度との関連付けに関する検討

    平賀悠介,大石康智,原直,武田一哉

    2009年秋季研究発表会  2009  日本音響学会

     More details

    Event date: 2009.9

    Language:Japanese   Presentation type:Oral presentation (general)  

    researchmap

  • 音声対話システムのユーザ満足度推論におけるネットワークモデルの構築と評価

    原直,北岡教英,武田一哉

    2009年春季研究発表会  2009  日本音響学会

     More details

    Event date: 2009.3

    Language:Japanese   Presentation type:Oral presentation (general)  

    researchmap

  • 音声認識システムの満足度評価におけるユーザモデル

    原直,北岡教英,武田一哉

    音声言語シンポジウム  2008  情報処理学会

     More details

    Event date: 2008.12

    Language:Japanese   Presentation type:Oral presentation (general)  

    researchmap

  • Data collection and usability study of a PC-based speech application in various user environments International conference

    Sunao Hara, Chiyomi Miyajima, Katsunobu Ito, Kazuya Takeda

    Oriental-COCOSDA 2008  2008.11 

     More details

    Event date: 2008.11

    Language:English   Presentation type:Oral presentation (general)  

    researchmap

  • In-car speech data collection along with various multimodal signals International conference

    Akira Ozaki, Sunao Hara, Takashi Kusakawa, Chiyomi Miyajima, Takanori Nishino, Norihide Kitaoka, Katunobu Itou, Kazuya Takeda

    The 6th International Conference on Language Resources and Evaluation (LREC08)  2008.5 

     More details

    Event date: 2008.5

    Language:English   Presentation type:Oral presentation (general)  

    researchmap

  • DNN-based Voice Conversion with Auxiliary Phonemic Information to Improve Intelligibility of Glossectomy Patients' Speech International conference

    Hiroki Murakami, Sunao Hara, Masanobu Abe

    APSIPA Annual Summit and Conference 2019  2019.11  APSIPA

     More details

    Language:English   Presentation type:Oral presentation (general)  

    Venue:Lanzhou, China  

    researchmap

  • 舌亜全摘出者の音韻明瞭性改善のためのマルチモーダルデータベースの構築

    村上博紀, 荻野聖也, 原直, 阿部匡伸, 佐藤匡晃, 皆木省吾

    日本音響学会2018年春季研究発表会  2018.3.26  日本音響学会

     More details

    Language:Japanese   Presentation type:Poster presentation  

    Venue:日本工業大学 宮代キャンパス  

    researchmap

  • クラウドソーシングによる賑わい音識別方式のフィールド実験評価

    朝田興平, 原直, 阿部匡伸

    日本音響学会2018年春季研究発表会  2018.3.25  日本音響学会

     More details

    Language:Japanese   Presentation type:Poster presentation  

    Venue:日本工業大学 宮代キャンパス  

    researchmap

  • DNN音声合成における感情付与のための継続時間長モデルの検討

    井上勝喜, 原直, 阿部匡伸, 北条伸克, 井島勇祐

    日本音響学会2018年春季研究発表会  2018.3.25  日本音響学会

     More details

    Language:Japanese   Presentation type:Poster presentation  

    Venue:日本工業大学 宮代キャンパス  

    researchmap

  • An online customizable music retrieval system with a spoken dialogue interface International conference

    Sunao Hara, Chiyomi Miyajima, Katsunobu Itou, Kazuya Takeda

    4th Joint Meeting of ASA/ASJ  2006.11 

     More details

    Language:English   Presentation type:Poster presentation  

    researchmap

  • Preliminary Study of a Learning Effect on Users to Develop a New Evaluation of the Spoken Dialogue System International conference

    Sunao Hara, Ayako Shirose, Chiyomi Miyajima, Katsunobu Ito, Kazuya Takeda

    Oriental-COCOSDA 2005  2005.12 

     More details

    Language:English   Presentation type:Oral presentation (general)  

    researchmap

Works

  • ChartEx

    Sunao Hara

    2017.5

     More details

    Work type:Software   Location:GitHub  

    Excel add-in that exports charts to files such as PNG, JPEG, and PDF.

    researchmap

  • オトログマッパー

    原 直

    2014 - 2016

     More details

    Work type:Software   Location:Google Play  

    An Android application created for research purposes.

    researchmap

  • TTX KanjiMenu Plugin

    Sunao Hara

    2007.3

     More details

    Work type:Software  

    researchmap

  • Pocket Julius

    原直

    2003.1

     More details

    Work type:Software  

    This is a demo package of Pocket Julius, a port of the large-vocabulary speech recognition decoder Julius that runs in the Microsoft Pocket PC 2002 environment.

    researchmap

Awards

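  • Society Activity Contribution Award

    2023.3   Acoustical Society of Japan  

  • Education Contribution Award

    2022.3   Faculty of Engineering, Okayama University   Construction of an audio streaming environment for laboratory and exercise courses

    Sunao Hara, 右田 剛史

  • Education Contribution Award

    2022.3   Faculty of Engineering, Okayama University   Contribution to the enhancement of the educational computer system

    乃村 能成, 上野 史, Sunao Hara, 渡邊 誠也

  • Social Contribution Award

    2021.3   Faculty of Engineering, Okayama University  

  • Best Teacher Award

    2020.3   Faculty of Engineering, Okayama University  

  • Education Contribution Award

    2019.3   Faculty of Engineering, Okayama University  

    Sunao Hara

  • Society Activity Contribution Award

    2019.3   Acoustical Society of Japan  

    Sunao Hara

  • FIT Encouragement Award

    2018.9   17th Forum on Information Technology  

    Sunao Hara

  • Excellent Paper Award

    2016.8   Information Processing Society of Japan, DICOMO2016  

    小林将大, Sunao Hara, Masanobu Abe

  • Education Contribution Award

    2016.3   Faculty of Engineering, Okayama University  

    Sunao Hara

  • FY2013 Okayama Foundation for Science and Technology, Science and Technology Award

    2013.7   Okayama Foundation for Science and Technology (public interest incorporated foundation)  

    Sunao Hara

  • Poster Award, 2004 Autumn Meeting

    2004.9   Acoustical Society of Japan  

    Sunao Hara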

Research Projects

  • Research on a machine learning method for estimating atmospheres of tourist attractions from environmental sounds considering concept drift

    Grant number:23K11335  2023.04 - 2027.03

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research  Grant-in-Aid for Scientific Research (C)

    Sunao Hara

    Grant amount:¥4,680,000 ( Direct expense: ¥3,600,000, Indirect expense: ¥1,080,000 )

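  • Technical research on online active learning supported by collaborative live recording

    Grant number:21K12155  2021.04 - 2024.03

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research  Grant-in-Aid for Scientific Research (C)

    Ryuichi Nishimura, Sunao Hara

    Authorship:Coinvestigator(s) 

    Grant amount:¥4,030,000 ( Direct expense: ¥3,100,000, Indirect expense: ¥930,000 )

    In this research, we develop the elemental technologies needed to deploy active learning online. In particular, assuming that group work is conducted online, we develop technologies that support communication between students, between students and instructors, and between instructors.
    To realize a flexible interface that can adapt to its users, we studied speaker classification methods. In particular, we applied deep learning to the task of identifying young speakers and evaluated different classification models. The dataset consisted of utterances recorded in real online environments and collected through crowdsourcing.
    We studied a method for visualizing fast speech, which is a frequent problem in online communication. We attempted to measure the number of spoken characters per unit time (speech rate) by applying automatic speech recognition, but confirmed that the segments detected as fast speech do not always coincide with the moments at which listeners perceive the speech as fast. In an experiment that combined multiple automatic speech recognition engines, the number of characters output by automatic speech recognition tended to be smaller than in accurate manual transcriptions. We considered using this reduction as a factor for visualizing fast speech.
    We studied a method for evaluating the state of a discussion using combined audio and video features. Combining sound and images tended to improve the classification rate. We also confirmed that, even when the acoustic signal is unavailable, the state can sometimes be determined from the movements of people in the images. We will further study methods for finding appropriate features from diverse information sources.
    We evaluated a speaker anonymization method based on generative adversarial networks. For the anonymized speech, we investigated naturalness, speaker recognizability, and speaker distinguishability. For naturalness, the scores improved compared with the conventional method. We confirmed that identifying the speaker from the processed speech is difficult. The speaker discrimination accuracy showed that speakers can still be distinguished from one another in the processed speech.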

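  • Research on DNN-based speech synthesis capable of expressing emotion and speaker individuality with high quality

    Grant number:21K11963  2021.04 - 2024.03

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research  Grant-in-Aid for Scientific Research (C)

    Masanobu Abe, Sunao Hara

    Authorship:Coinvestigator(s) 

    Grant amount:¥4,160,000 ( Direct expense: ¥3,200,000, Indirect expense: ¥960,000 )

    The work carried out in FY2021 (Reiwa 3) on the issues described in the research proposal is as follows.
    (Issue 1) Representation model for non-verbal information. For "①-1 Study of an emotion representation model", we improved the model structure so that speaker identity can be controlled by adding a speaker ID as auxiliary information, and so that emotion intensity can be controlled at synthesis time through the weights of the one-hot emotion ID vector. For "①-2 Study of an emotion intensity representation method", we evaluated the emotion intensity control performance with MOS tests. The evaluation showed that manipulating the emotion ID makes it possible to control the intensity of "Happy". In contrast, the intensity of "Angry" could not be controlled as well as that of "Happy". Analysis revealed that the acoustic parameters of "Angry" are similar to those of "Normal", and that the "Angry" data used in this experiment is speech that is difficult to manipulate finely. For "①-3 Application to diversified speaker identity", we evaluated the speaker identity of synthesized speech with an ABX test. Either natural or synthesized speech was presented as X, and listeners judged whether X was closer to speaker A or speaker B. For natural speech, the accuracy was about 95% for "Happy" and "Normal" and about 85% for "Angry", which suggests that differences in speaker identity are smaller for "Angry" than for the other emotions. For synthesized speech, the accuracy was about 70% for every emotion; although the accuracy decreases, speaker identity can still be distinguished. In addition, "Happy" had a high speaker identification rate, and for "Angry" the rate was high for some speakers. Speaker identification relies on both differences in voice quality and differences in emotional expression, and further experiments are needed to determine which is the more important factor.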

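  • Research on a deep learning method based on simple annotation that enables visualization of the atmosphere of tourist attractions

    Grant number:20K12079  2020.04 - 2023.03

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research  Grant-in-Aid for Scientific Research (C)

    Sunao Hara

    Authorship:Principal investigator 

    Grant amount:¥4,290,000 ( Direct expense: ¥3,300,000, Indirect expense: ¥990,000 )

    For Issue 1, one annotator added detailed annotations to about 800 recordings collected so far. The annotations followed the items studied in Issue 3. While listening to the environmental sounds for annotation, the annotator was also shown Street View imagery, so that the impression and atmosphere of the place were annotated without relying on sound alone.
    For Issue 2, we used the data obtained in Issue 1 to classify regional characteristics with a simple DNN. Feeding sound source information into the classifier improves the estimation accuracy of regional characteristics. We showed that sound source information estimated from the acoustic signal and aerial photographs, rather than assigned manually, yields estimation accuracy comparable to that of manual information. This suggests that information comparable to detailed annotation may be obtained by supplementing simple annotation with additional information. We also advanced research on an adaptation method based on concept drift.
    For Issue 3, continuing from the previous fiscal year, we advanced research based on the soundscape concept of ISO 12913. We decided to use eight evaluation axes as annotations that express regional characteristics. However, considering the variability of human ratings, we adopted a method that represents the eight axes more concisely with two axes, and proceeded with the study of the estimation method in Issue 2.
    As research incorporating the concept drift idea raised in Issue 2, we presented one international conference paper and one journal paper. In addition, based on the content of each issue, we gave two presentations at domestic conferences.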

  • Development of PBL instruction support system to measure learners' activities using acoustic signals

    Grant number:18K02862  2018.04 - 2022.03

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research  Grant-in-Aid for Scientific Research (C)

    NISIMURA Ryuichi

    Authorship:Coinvestigator(s) 

    Grant amount:¥4,420,000 ( Direct expense: ¥3,400,000, Indirect expense: ¥1,020,000 )

    In this study, we developed technology for an instructor support system for group work by applying sound information processing. (1) Wearable devices worn by learners were improved by evaluating sound source separation features. (2) Deep learning identification algorithms were developed to visualize the participation attitudes of learners. (3) We developed a group work logging system and a support system for annotating group work participation information. (4) We developed a method for speaker anonymization of recorded group work speech by applying deep-learning-based voice conversion. Due to the impact of COVID-19, we had to change our original plan and discontinue the face-to-face experiments, but we obtained new knowledge that is useful for online education.

  • A Study on Algorithms to Improve Intelligibility of Glossectomy Patients' Speech Using Deep Neural Networks

    Grant number:18K11376  2018.04 - 2022.03

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research  Grant-in-Aid for Scientific Research (C)

    Abe Masanobu

    Authorship:Coinvestigator(s) 

    Grant amount:¥4,290,000 ( Direct expense: ¥3,300,000, Indirect expense: ¥990,000 )

    In this study, we investigated voice conversion algorithms to improve the intelligibility of speech uttered by patients who have articulation disorders caused by wide glossectomy and/or segmental mandibulectomy. To achieve real-time processing, the voice conversion directly modifies the waveform using the spectral differential between a healthy speaker and a glossectomy speaker. The spectral differential is estimated by deep neural networks (DNNs). To improve performance, we proposed using lip shapes as auxiliary inputs and introducing a knowledge distillation approach to make the best use of phoneme labels as auxiliary inputs. Experimental results showed that both approaches work well, and that phoneme labels with knowledge distillation perform better than lip shapes.

  • Research and development of a platform for surveying "liveliness" from acoustic signals to support regional revitalization policy planning

    2015.07 - 2018.03

    Ministry of Internal Affairs and Communications  Strategic Information and Communications R&D Promotion Programme 

    Masanobu Abe, Sunao Hara

    Authorship:Coinvestigator(s)  Grant type:Competitive

  • Development of activity sound visualization method for personal evaluation of PBL

    Grant number:15K01069  2015.04 - 2018.03

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research  Grant-in-Aid for Scientific Research (C)

    Ryuichi Nishimura, Sunao Hara

    Authorship:Coinvestigator(s) 

    Grant amount:¥4,680,000 ( Direct expense: ¥3,600,000, Indirect expense: ¥1,080,000 )

    In this study, we developed methods to support the evaluation of students participating in PBL (Project-Based Learning) based on technologies for visualizing sound information. We examined a method for detecting active communication in group work from dialogue speech. We developed a prototype system that presents the overall state of a group work session using wearable voice recording terminals. In addition, we investigated sound source information visualization methods based on deep neural networks.

  • A study on monitoring systems with privacy protection control

    Grant number:15K00128  2015.04 - 2018.03

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research  Grant-in-Aid for Scientific Research (C)

    Masanobu Abe, Sunao Hara

    Authorship:Coinvestigator(s) 

    Grant amount:¥4,420,000 ( Direct expense: ¥3,400,000, Indirect expense: ¥1,020,000 )

    In this study, we aimed to develop a monitoring system that takes privacy issues into account. The system uses a living area and controls the balance between monitoring and privacy protection by changing the granularity of that area. The living area is defined as the set consisting of the home, frequently visited places of stay, and the travel routes connecting them. We proposed an algorithm that generates the living area from GPS data collected over a long period. Experimental results show that the proposed algorithm can estimate the living area with a precision of 0.85.
    We also conducted a questionnaire survey on user preferences regarding the monitoring and privacy protection levels based on these living areas. The results showed that people on the monitoring side wanted the system to allow detailed monitoring. Conversely, for the people being monitored, the more detailed the monitoring, the stronger the feeling of being intrusively surveilled.

  • Study on spoken dialogue system with safety consideration based on automatic estimation of driving situation

    Grant number:26730092  2014.04 - 2017.03

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research  Grant-in-Aid for Young Scientists (B)

    Hara Sunao

    Authorship:Principal investigator 

    Grant amount:¥3,640,000 ( Direct expense: ¥2,800,000, Indirect expense: ¥840,000 )

    We conducted the following:
    (1) We used biosignals and driving information signals measured from the driver's body and the vehicle to estimate the driving load. Furthermore, we estimated the driving load assuming the use of sensor data obtained from smartphones. (2) We evaluated spoken dialog strategies from the viewpoint of the user's driving load. An objective evaluation by computer simulation was conducted that considered both dialog initiative and the existence of confirmation utterances. (3) We conducted a subjective evaluation of the proposed dialog strategy as a spoken dialog system. The subjects were asked to drive a simulator while talking with a spoken dialog system. (4) A dialog strategy based on graph search was introduced to realize a dialog strategy that considers the estimated mental load of the user. We evaluated the proposed system by objective and subjective evaluation.

Class subject in charge

  • Seminar in Pattern Information Processing (2024 academic year) Year-round  - Other

  • Exercises on Programming 1 (2024 academic year) 1st semester  - Wed 1-3

  • Exercises on Programming 2 (2024 academic year) 2nd semester  - Wed 1-3

  • Exercises on Programming 1 (2024 academic year) 1st semester  - Wed 1-3

  • Exercises on Programming 2 (2024 academic year) 2nd semester  - Wed 1-3

  • Advanced Internship for Interdisciplinary Medical Sciences and Engineering (2024 academic year) Year-round  - Other

  • Technical English for Interdisciplinary Medical Sciences and Engineering (2024 academic year) Second half  - Other

  • Research Works for Interdisciplinary Medical Sciences and Engineering (2024 academic year) Year-round  - Other

  • Research Works for Interdisciplinary Medical Sciences and Engineering (2024 academic year) Year-round  - Other

  • Advanced Research on Multimodal Information Processing (2024 academic year) First half  - Other

  • Introduction to Information Processing 2 (2024 academic year) 2nd semester  - Mon 1-2

  • Introduction to Information Processing 2 (2024 academic year) 2nd semester  - Thu 1-2

  • Information Technology Experiments B (Media Processing) (2024 academic year) 3rd semester  - Tue 3-7, Fri 3-7

  • Information Technology Experiments B (Media Processing) (2024 academic year) 3rd semester  - Tue 3-7, Fri 3-7

  • Engineering English (2024 academic year) Second half  - Other

  • Engineering English (2024 academic year) Second half  - Other

  • Advanced Study (2024 academic year) Other  - Other

  • Technical Writing 1 (2024 academic year) First half  - Other

  • Technical Writing 2 (2024 academic year) Second half  - Other

  • Technical Writing (2024 academic year) First half  - Other

  • Technical Presentation (2024 academic year) Second half  - Other

  • Advanced Research on Speech Processing I (2024 academic year) First half  - Mon 1-2

  • Advanced Research on Speech Processing II (2024 academic year) First half  - Thu 5-6

  • Speech and Sound Interface (2024 academic year) Second half  - Wed 1-2

  • Exercises on Programming 1 (2023 academic year) 1st semester  - Wed 1-3

  • Exercises on Programming 2 (2023 academic year) 2nd semester  - Wed 1-3

  • Exercises on Programming 1 (2023 academic year) 1st semester  - Wed 1-3

  • Exercises on Programming 2 (2023 academic year) 2nd semester  - Wed 1-3

  • Advanced Internship for Interdisciplinary Medical Sciences and Engineering (2023 academic year) Year-round  - Other

  • Technical English for Interdisciplinary Medical Sciences and Engineering (2023 academic year) Second half  - Other

  • Research Works for Interdisciplinary Medical Sciences and Engineering (2023 academic year) Year-round  - Other

  • Research Works for Interdisciplinary Medical Sciences and Engineering (2023 academic year) Year-round  - Other

  • Introduction to Information Processing 2 (2023 academic year) 2nd semester  - Mon 1-2

  • Introduction to Information Processing 2 (2023 academic year) 2nd semester  - Thu 1-2

  • Information Technology Experiments B (Media Processing) (2023 academic year) 3rd semester  - Tue 3-7, Fri 3-7

  • Information Technology Experiments B (Media Processing) (2023 academic year) 3rd semester  - Tue 3-7, Fri 3-7

  • Advanced Research on Speech Processing I (2023 academic year) First half  - Mon 1-2

  • Advanced Research on Speech Processing II (2023 academic year) First half  - Thu 5-6

  • Digital Signal Processing (2022 academic year) 3rd semester  - Tue 1-2, Thu 1-2

  • Exercises on Programming 1 (2022 academic year) 1st semester  - Wed 1-3

  • Exercises on Programming 2 (2022 academic year) 2nd semester  - Wed 1-3

  • Exercises on Programming 1 (2022 academic year) 1st semester  - Wed 1-3

  • Exercises on Programming 2 (2022 academic year) 2nd semester  - Wed 1-3

  • Technical English for Interdisciplinary Medical Sciences and Engineering (2022 academic year) Second half  - Other

  • Research Works for Interdisciplinary Medical Sciences and Engineering (2022 academic year) Year-round  - Other

  • Introduction to Information Processing 2 (2022 academic year) 2nd semester  - Thu 1-2

  • Introduction to Information Processing 2 (2022 academic year) 2nd semester  - Mon 1-2

  • Information Technology Experiments B (Media Processing) (2022 academic year) 3rd semester  - Tue 3-7, Thu 3-7

  • Advanced Research on Speech Processing I (2022 academic year) First half  - Mon 1-2

  • Advanced Research on Speech Processing II (2022 academic year) First half  - Thu 5-6

  • Digital Signal Processing (2021 academic year) 4th semester  - Mon 1, Mon 2, Thu 1, Thu 2

  • Exercises on Programming 1 (2021 academic year) 1st semester  - Wed 1, Wed 2, Wed 3

  • Exercises on Programming 2 (2021 academic year) 2nd semester  - Wed 1, Wed 2, Wed 3

  • Technical English for Interdisciplinary Medical Sciences and Engineering (2021 academic year) Second half  - Other

  • Research Works for Interdisciplinary Medical Sciences and Engineering (2021 academic year) Year-round  - Other

  • Introduction to Information Processing 2 (2021 academic year) 2nd semester  - Mon 1-2

  • Introduction to Information Processing 2 (2021 academic year) 2nd semester  - Thu 1-2

  • Information Technology Experiments B (Media Processing) (2021 academic year) 3rd semester  - Tue 3, Tue 4, Tue 5, Tue 6, Tue 7, Thu 3, Thu 4, Thu 5, Thu 6, Thu 7

  • Advanced Research on Speech Processing I (2021 academic year) First half  - Thu 5-6

  • Advanced Research on Speech Processing II (2021 academic year) First half  - Thu 5-6

  • Exercises on Programming (2020 academic year) 1st and 2nd semester  - Wed 1, Wed 2, Wed 3

  • Exercises on Programming 1 (2020 academic year) 1st semester  - Wed 1, Wed 2, Wed 3

  • Exercises on Programming 2 (2020 academic year) 2nd semester  - Wed 1, Wed 2, Wed 3

  • Technical English for Interdisciplinary Medical Sciences and Engineering (2020 academic year) Second half  - Other

  • Research Works for Interdisciplinary Medical Sciences and Engineering (2020 academic year) Year-round  - Other

  • Introduction to Information Processing 2 (2020 academic year) 2nd semester  - Mon 1, Mon 2

  • Introduction to Information Processing 2 (2020 academic year) 2nd semester  - Thu 1, Thu 2

  • Information Technology Experiments B (Media Processing) (2020 academic year) 3rd semester  - Tue 3, Tue 4, Tue 5, Tue 6, Thu 3, Thu 4, Thu 5, Thu 6

  • Laboratory Work on Information Technology III (2020 academic year) 3rd semester  - Tue 3, Tue 4, Tue 5, Tue 6

  • Laboratory Work on Information Technology IV (2020 academic year) 3rd semester  - Thu 3, Thu 4, Thu 5, Thu 6

  • Advanced Research on Speech Processing I (2020 academic year) First half  - Thu 5, Thu 6

Academic Activities

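  • 24th Young Researchers' Exchange Meeting of the Acoustical Society of Japan, Kansai Chapter

    Role(s):Planning, management, etc.

    Acoustical Society of Japan, Kansai Chapter  ( Online (Gather.Town) ) 2021.12.4

    Type:Academic society, research group, etc. 

    The Kansai Chapter of the Acoustical Society of Japan has held the "Young Researchers' Exchange Meeting" since 1998 to promote research exchange and mutual development among young researchers. Many young researchers have participated and presented so far. This year, in light of the social requirements placed on events due to COVID-19, the meeting will be held online. To deepen exchange not only among researchers but also between industry and academia, an exhibition by supporting-member companies is also planned.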